As part of deep learning, we mostly would like to optimize the value of a function that either minimizes or maximizes *f(x)* with respect to *x*. A few examples of optimization problems are least-squares, logistic regression, and support vector machines. Many of these techniques will get examined in detail in later chapters.

# Optimization

# Optimizers

We will study `AdamOptimizer` here; TensorFlow `AdamOptimizer` uses Kingma and Ba's Adam algorithm to manage the learning rate. Adam has many advantages over the simple `GradientDescentOptimizer`. The first is that it uses moving averages of the parameters, which enables Adam to use a larger step size, and it will converge to this step size without any fine-tuning.

The disadvantage of Adam is that it requires more computation to be performed for each parameter in each training step. `GradientDescentOptimizer` can be used as well, but it would require more hyperparameter tuning before it would converge as quickly.

The following example shows how to use `AdamOptimizer`:

`tf.train.Optimizer`creates an optimizer`tf.train.Optimizer.minimize(loss, var_list)`adds the optimization operation to the computation graph

Here, automatic differentiation computes gradients without user input:

import numpy as np

import seaborn

import matplotlib.pyplot as plt

import tensorflow as tf

# input dataset

xData = np.arange(100, step=.1)

yData = xData + 20 * np.sin(xData/10)

# scatter plot for input data

plt.scatter(xData, yData)

plt.show()

# defining data size and batch size

nSamples = 1000

batchSize = 100

# resize

xData = np.reshape(xData, (nSamples,1))

yData = np.reshape(yData, (nSamples,1))

# input placeholders

x = tf.placeholder(tf.float32, shape=(batchSize, 1))

y = tf.placeholder(tf.float32, shape=(batchSize, 1))

# init weight and bias

with tf.variable_scope("linearRegression"):

W = tf.get_variable("weights", (1, 1), initializer=tf.random_normal_initializer())

b = tf.get_variable("bias", (1,), initializer=tf.constant_initializer(0.0))

y_pred = tf.matmul(x, W) + b

loss = tf.reduce_sum((y - y_pred)**2/nSamples)

# optimizer

opt = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:

sess.run(tf.global_variables_initializer())

# gradient descent loop for 500 steps

for _ in range(500):

# random minibatch

indices = np.random.choice(nSamples, batchSize)

X_batch, y_batch = xData[indices], yData[indices]

# gradient descent step

_, loss_val = sess.run([opt, loss], feed_dict={x: X_batch, y: y_batch})

Here is the scatter plot for the dataset:

This is the plot of the learned model on the data: