Stochastic gradient descent (SGD), in contrast to batch gradient descent, performs a parameter update for each training example x(i) and its label y(i):
Θ = Θ - η∇ΘJ(Θ; x(i), y(i))
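The update rule above can be sketched in plain NumPy for a simple case. The following is a minimal illustration, not the Keras implementation: it applies the per-example update Θ = Θ - η∇ΘJ on a least-squares loss, where the data, learning rate, and epoch count are all our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with a known parameter vector (our assumption).
true_theta = np.array([2.0, -3.0])
X = rng.normal(size=(200, 2))
y = X @ true_theta

theta = np.zeros(2)   # parameters Θ
eta = 0.05            # learning rate η

for epoch in range(20):
    # One parameter update per training example, in shuffled order.
    for i in rng.permutation(len(X)):
        # ∇ΘJ(Θ; x(i), y(i)) for the squared-error loss 0.5*(Θ·x - y)^2
        grad = (theta @ X[i] - y[i]) * X[i]
        theta -= eta * grad   # Θ = Θ - η∇ΘJ(Θ; x(i), y(i))

print(np.round(theta, 3))
```

Because each update uses a single example, the iterates fluctuate more than batch gradient descent, but on this noise-free toy problem they settle close to the true parameters.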
Make sure the common code listed earlier is included before the main code snippet that follows:
Create a Sequential model with the following network topology:
- Input layer: input shape (*, 784), output shape (*, 512)
- Hidden layer: input shape (*, 512), output shape (*, 512)
- Output layer: input shape (*, 512), output shape (*, 10)
Let's look at the activation functions for each layer:
- Layer 1 and Layer 2: relu
- Layer 3: softmax
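The topology and activations above can be traced as a plain NumPy forward pass, which makes the (*, 784) → (*, 512) → (*, 512) → (*, 10) shape flow explicit. This is an illustrative sketch with randomly initialized weights (the batch size of 32 and the 0.05 initialization scale are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Numerically stable softmax over the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Weight matrices matching the topology: 784 -> 512 -> 512 -> 10
W1 = rng.normal(scale=0.05, size=(784, 512)); b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 512)); b2 = np.zeros(512)
W3 = rng.normal(scale=0.05, size=(512, 10));  b3 = np.zeros(10)

x = rng.normal(size=(32, 784))      # a batch of 32 flattened 28x28 images
h1 = relu(x @ W1 + b1)              # Layer 1 output: (32, 512)
h2 = relu(h1 @ W2 + b2)             # Layer 2 output: (32, 512)
out = softmax(h2 @ W3 + b3)         # Layer 3 output: (32, 10)

print(h1.shape, h2.shape, out.shape)
```

Each row of the final softmax output sums to 1, so it can be read as a probability distribution over the 10 classes.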
from keras.optimizers import SGD

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
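Putting the pieces together, the model can be compiled with the SGD optimizer, which applies the per-example update rule from the start of this section. The following self-contained sketch rebuilds the model and compiles it; the learning rate of 0.01 and the commented-out training hyperparameters (batch size 128, 20 epochs) are our own choices, and x_train/y_train are assumed to come from the common code:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

# Rebuild the three-layer topology: 784 -> 512 -> 512 -> 10.
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),  # η in the update rule; 0.01 is our choice
              metrics=['accuracy'])

# With the MNIST-style data from the common code, training would be:
# model.fit(x_train, y_train, batch_size=128, epochs=20,
#           validation_data=(x_test, y_test))
```

The categorical cross-entropy loss is the standard pairing for a softmax output layer with one-hot labels, which is why the labels are converted with to_categorical in the snippet above.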