made by https://cneuralnets.netlify.app/
Linear regression is the most basic and most overlooked model in today’s machine learning world, where we have advanced stuff like transformers, RNNs and so much more. But in reality, if you dive deep into almost any model, you will find linear regression in some form or another!
The equation for linear regression is:
$$ y=\alpha_0+\alpha_1x_1+\alpha_2x_2+\dots+\alpha_nx_n+\epsilon $$
where:
- $y$ is the dependent variable, the variable we are trying to predict
- $x_i$ are the independent variables, the features our model uses
- $\alpha_i$ are the coefficients (or weights) of our linear regression; they are what we are essentially learning
- $\epsilon$ is the error term of our model
What we mainly try to do is fit the model. By fitting we mean finding the set of coefficients that produces predictions for $y$ that are as close as possible to the actual values. Once the model is fitted, making a prediction is as easy as plugging the values of $x_i$ into the equation below:
$$ \hat{y}=\hat{\alpha}_0+\hat{\alpha}_1x_1+\hat{\alpha}_2x_2+\dots+\hat{\alpha}_nx_n $$
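To make the fitting and prediction steps concrete, here is a minimal sketch using NumPy's least-squares solver. The synthetic data, the "true" coefficient values, and the variable names are all invented for illustration; they are not part of any particular dataset.

```python
import numpy as np

# Made-up synthetic data: 100 samples, 2 features (x_1, x_2)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_alphas = np.array([3.0, 1.5, -2.0])   # alpha_0, alpha_1, alpha_2 (invented)
y = true_alphas[0] + X @ true_alphas[1:] + rng.normal(scale=0.5, size=100)

# Add a column of ones so the intercept alpha_0 is learned like any other weight
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: pick the coefficients that minimise the squared error
alpha_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated coefficients:", alpha_hat)

# Prediction is just plugging new x_i values into the fitted equation
x_new = np.array([1.0, 0.2, -0.3])         # [1, x_1, x_2]
print("prediction:", x_new @ alpha_hat)
```

The column of ones is a common trick: it lets the intercept $\hat{\alpha}_0$ be estimated exactly like every other coefficient instead of being handled as a special case.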
While building a regression model, one must always keep in mind the 4 assumptions that need to hold before we go ahead and fit the model!
The fundamental principle of multiple linear regression is that there is a linear relationship between the dependent (outcome) variable and the independent variables. This linearity can be visually assessed using scatterplots, which should indicate a straight-line relationship rather than a curvilinear one.
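As a quick illustration of that visual check, the sketch below plots $y$ against a single feature using made-up data; a roughly straight-line point cloud supports linearity, while a clear curve would argue for transforming the feature or using a non-linear model.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data where y depends linearly on x plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=200)

# A straight-line cloud supports the linearity assumption;
# a visible curve would suggest the assumption is violated.
plt.scatter(x, y, alpha=0.6)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Linearity check: y vs x")
plt.show()
```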
The analysis presumes that the residuals (the differences between observed and predicted values) follow a normal distribution. This assumption can be evaluated by inspecting histograms or Q-Q plots of the residuals, or through statistical tests like the Kolmogorov-Smirnov test.
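Here is a rough sketch of both checks with SciPy, run on residuals from a made-up simple fit; comparing against a normal distribution with the residuals' own mean and standard deviation is a common, if approximate, way to apply the Kolmogorov-Smirnov test.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up fitted model: residuals = observed y - predicted y
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=200)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Q-Q plot: points hugging the reference line suggest normal residuals
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()

# Kolmogorov-Smirnov test against a normal distribution with the
# residuals' own mean and standard deviation (a rough but common check)
stat, p_value = stats.kstest(residuals, "norm",
                             args=(residuals.mean(), residuals.std()))
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
```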
It is crucial that the independent variables are not excessively correlated with one another, a situation referred to as multicollinearity. This can be assessed using: