made by https://cneuralnets.netlify.app/

In deep learning, an optimizer is a crucial element that fine-tunes a neural network’s parameters during training. Its primary role is to minimize the model’s error or loss function, enhancing performance.

Optimizers drive the learning process of a neural network by iteratively updating its weights and biases based on the training data. We will discover optimizers by taking a small road trip down their history, slowly inventing newer and more advanced optimizers along the way!

Premise

What we are trying to do here is nothing but minimize the loss function of a model. Our objective is to find the set of parameter values at which the loss function attains its minimum:

$$ \text{objective} = \min(\text{loss function}) $$

We will be employing several methods to do so, while trying to make our optimizers better and better.

Approach 1 - Random Search

Logic

The first and most obvious solution is to generate random parameters and compute the loss with that set of values; if the loss is lower than our minimum loss so far, we keep those parameters as the new best. After a few thousand iterations, we will have found the best set of parameters seen so far.

Code

import numpy as np

min_loss = float("inf")
for n in range(10000):
    W = np.random.randn(10, 2073) * 0.0001    # generate a random set of parameters
    loss = L(X_train, Y_train, W)             # L is the model's loss function
    if loss < min_loss:                       # keep the best parameters seen so far
        min_loss = loss
        min_W = W
    print(f"Attempt {n} | Loss - {loss} | MinLoss - {min_loss}")

If we actually run this, it gives us an accuracy of around 16.3% (which is not actually that bad, considering the parameters are just random numbers).
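As a side note, here is a minimal sketch of how that accuracy could be measured, assuming the model is a linear classifier (each row of W scores one class) and that X_test / Y_test are hypothetical held-out arrays not shown in the snippet above:

import numpy as np

# Sketch only: assumes a linear classifier and hypothetical arrays
# X_test of shape (N, 2073) and Y_test of shape (N,).
scores = X_test @ min_W.T                  # (N, 10) class scores per test example
predictions = np.argmax(scores, axis=1)    # predicted class = highest score
accuracy = np.mean(predictions == Y_test)
print(f"Test accuracy: {accuracy:.1%}")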

Our first approach is not that good at actually finding the optimum parameters, so let's try a more intuitive approach!

Approach 2 - Numeric Gradient

Logic

What if we try to follow the slope of the curve of the loss function? In the 1-D case, the derivative is defined as:

$$ \frac{df(x)}{dx}=\lim_{h\to 0}{\frac{f(x+h)-f(x)}{h}} $$
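To make the limit concrete, here is a tiny sketch (the function f(x) = x² and the step sizes are illustrative choices, not from the text) showing that the forward difference approaches the true derivative 2x as h shrinks:

def f(x):
    return x ** 2   # illustrative function; true derivative is 2x

x = 3.0
for h in [1e-1, 1e-3, 1e-5]:
    approx = (f(x + h) - f(x)) / h   # forward-difference approximation
    print(f"h = {h:g} | approx df/dx = {approx:.6f} | true df/dx = {2 * x}")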

In more than one dimension, the slope along any direction is nothing but the dot product of that direction with the gradient. Let's take an example table of weights and find their slopes.

| W | W + h | dW |
| --- | --- | --- |
| 0.34 | 0.34 + 0.0001 | ? |
| -1.11 | -1.11 | ? |
| 0.78 | 0.78 | ? |
| 0.12 | 0.12 | ? |
| 0.55 | 0.55 | ? |
| ... | ... | ... |
| Loss: 1.25347 | Loss: 1.25322 | |
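Here only the first weight was nudged by h = 0.0001, and the loss dropped from 1.25347 to 1.25322. Plugging these numbers into the formula above gives the first component of the gradient:

$$ \frac{\partial(\text{loss})}{\partial W_1} \approx \frac{1.25322 - 1.25347}{0.0001} = -2.5 $$

Repeating this for every weight fills in the whole dW column. Here is a minimal sketch of that procedure, assuming the same loss function L(X, Y, W) as before; the helper name numeric_gradient and the default step h are illustrative choices:

import numpy as np

def numeric_gradient(L, X, Y, W, h=0.0001):
    # Estimate dL/dW by perturbing one entry of W at a time (illustrative helper).
    grad = np.zeros_like(W)
    base_loss = L(X, Y, W)                      # loss at the current W (e.g. 1.25347)
    it = np.nditer(W, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h                        # nudge a single weight by h
        new_loss = L(X, Y, W)                   # loss with that weight perturbed (e.g. 1.25322)
        W[idx] = old                            # restore the weight
        grad[idx] = (new_loss - base_loss) / h  # finite-difference slope for this weight
        it.iternext()
    return grad

This gives a full gradient estimate at the cost of one extra loss evaluation per parameter, which is why the numeric gradient is slow for large models.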