Ender Dincer

Apr 4, 2023

Linear Regression with Ordinary Least Squares

1. Introduction

In this article we will see how the ordinary least squares method is used to fit a regression line, and then implement it in Python. Although optimised implementations already exist in Python, our example is for learning purposes, like doing arithmetic without a calculator.

Ordinary Least Squares (OLS) determines the coefficients of a linear regression line by minimising the total area of the squares formed between the observation points and the estimated points. It is widely used in machine learning, statistics, finance and many other fields.

Figure 1: Linear Regression

2. Deriving the LSE Formula

To derive the formula we will first simplify the figure above and focus on a single observation point and the regression line, as shown below.

Figure 2: LSE for a Single Point

The green point is an observation point. The yellow square is one of the many squares whose total area we aim to minimise. Since all four sides of a square have equal length, it is enough to express the length of a single side in terms of the regression line coefficients and the observation point. That side is the vertical distance between the observation and the line, $y_i - (b_0 + b_1 x_i)$, and the area of the square is this length squared.

Now that we have the area for a single point, we can generalise by summing over all observation points, which gives the total error $A$ in equation 1.

Equation 1

$$A = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2$$

This is the error function we need to minimise.
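As a quick illustration (not part of the original derivation), this error function can be written directly in NumPy. The function name and arguments below are hypothetical; x and y are assumed to be NumPy arrays of observations, and b0 and b1 are candidate coefficients.

import numpy as np


def squared_error(x, y, b0, b1):
    # Side length of each square: the vertical distance between
    # the observation and the regression line.
    residuals = y - (b0 + b1 * x)
    # Total area of all squares: the error A we want to minimise (equation 1).
    return np.sum(residuals ** 2)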

3. Coefficients of the Regression Line

The regression line's slope and position, and therefore the total area of the squares, depend on the coefficients. Thus, we take the partial derivative of the error function with respect to each coefficient and set it equal to zero.

Equation 2

$$\frac{\partial A}{\partial b_0} = 0 \qquad \frac{\partial A}{\partial b_1} = 0$$

Solve for b0

Equation 3

$$\frac{\partial A}{\partial b_0} = -2\sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr) = 0$$

We can simplify the equation, since the constant factor of 2 has no effect on where the derivative is zero. Then we separate the sums.

Equations 4

$$\sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr) = 0$$

$$\sum_{i=1}^{n} y_i - n b_0 - b_1 \sum_{i=1}^{n} x_i = 0$$

As b0 is a constant with respect to i, its sum simplifies to n·b0, and b0 can be left alone on one side of the equation as below:

Equation 5: Leave b0 Alone

$$n b_0 = \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i$$

Equation 6

$$b_0 = \frac{1}{n}\sum_{i=1}^{n} y_i - b_1 \frac{1}{n}\sum_{i=1}^{n} x_i$$

The two sums on the right hand side of the equation should start to look familiar: they are simply the means (averages) of the Xs and Ys. The notation below is usually used to denote means and is read as "x bar" and "y bar".

Equation 7: Mean notation

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Substituting the means from equation 7 into equation 6 gives equation 8:

Equation 8

$$b_0 = \bar{y} - b_1 \bar{x}$$

We will follow the same steps for b1. The first step is to take the partial derivative of the error function with respect to b1, which results in equation 9.

Equation 9

$$\frac{\partial A}{\partial b_1} = -2\sum_{i=1}^{n} x_i \bigl(y_i - b_0 - b_1 x_i\bigr)$$

To find the minimum, we set the partial derivative equal to zero, as in equation 10.

Equation 10

$$\sum_{i=1}^{n} x_i \bigl(y_i - b_0 - b_1 x_i\bigr) = 0$$

We know b0 in terms of b1 from equation 8, so we can replace b0 in equation 10 with the right hand side of equation 8.

Equation 11

$$\sum_{i=1}^{n} x_i \bigl(y_i - (\bar{y} - b_1 \bar{x}) - b_1 x_i\bigr) = 0$$

Grouping the Xs and Ys with their mean values and leaving b1 alone results in a closed form formula:

Solve for b1

Equation 12

$$b_1 = \frac{\sum_{i=1}^{n} x_i \,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i \,(x_i - \bar{x})}$$

We could use the above equation to find the coefficients, but there is another, very commonly used form derived from it. To obtain the new form we will use the equation below.

Equation 13

$$\sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y}) = \bar{x}\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \qquad \sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x}) = \bar{x}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

Since the above expressions are equal to zero, we can subtract the first from the numerator and the second from the denominator of equation 12 without changing the result.

Equation 14

$$b_1 = \frac{\sum_{i=1}^{n} x_i \,(y_i - \bar{y}) - \sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i \,(x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x})}$$

If we factor the common terms in the numerator and the denominator, we obtain the expression below.

Equation 15

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
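As a quick sanity check (not part of the original derivation), we can verify numerically that equations 12 and 15 give the same slope. The data below is randomly generated for illustration only.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)

x_bar, y_bar = x.mean(), y.mean()

# Equation 12
b1_eq12 = (x * (y - y_bar)).sum() / (x * (x - x_bar)).sum()
# Equation 15
b1_eq15 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()

print(np.isclose(b1_eq12, b1_eq15))  # True: both forms give the same slope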

4. Implementation with Python

We now know how to get the parameters of the regression line. Let's see how we can implement a linear regression method with ordinary least squares.

We will first create a class called LeastSquaresLinearRegressor that has two public functions. The first function fits the regression line, using equations 8 and 12 derived above.


import numpy as np


class LeastSquaresLinearRegressor:

    def fit(self, x, y):
        x_mean = np.mean(x)
        y_mean = np.mean(y)

        # Equation 12: the slope
        self.b1 = (x * (y - y_mean)).sum() / (x * (x - x_mean)).sum()
        # Equation 8: the intercept
        self.b0 = y_mean - self.b1 * x_mean

    def predict(self, x):
        return self.b0 + self.b1 * x


Equations 12 and 8 are used to find the parameters of the regression line. The second function is the prediction function, where we use the parameters calculated by the fit function.
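As a minimal usage sketch (the numbers below are made up for illustration), fitting and predicting with the class above looks like this:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

regressor = LeastSquaresLinearRegressor()
regressor.fit(x, y)

print(regressor.b1, regressor.b0)          # fitted slope and intercept
print(regressor.predict(np.array([6.0])))  # prediction for a new point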

To test and visualise this regressor we will use the scikit-learn and matplotlib libraries.


import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split


def mean_squared_error(actual, prediction):
    return np.mean((actual - prediction) ** 2)


def mean_absolute_pc_error(actual, prediction):
    return np.abs((actual - prediction) / actual).sum() * (100 / actual.size)


def test():
    # Generate a noisy one-dimensional regression dataset
    dataset_x, dataset_y = datasets.make_regression(
        n_samples=100, n_features=1, noise=15, random_state=3
    )
    x = np.ravel(dataset_x)
    y = np.ravel(dataset_y)

    # Hold out 20% of the data for testing
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=5
    )

    regressor = LeastSquaresLinearRegressor()
    regressor.fit(x_train, y_train)

    predictions = regressor.predict(x_test)

    print(f"MSE: {mean_squared_error(y_test, predictions)}")
    print(f"MAPE: {mean_absolute_pc_error(y_test, predictions)}%")

    # Plot training points, test points and the fitted line
    plt.scatter(x_train, y_train)
    plt.scatter(x_test, y_test)
    plt.plot(x_train, regressor.predict(x_train), color="#11aa00")
    plt.show()


Output: 
MSE: 208.2299503682563
MAPE: 48.37193549306731%

Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE) are widely used metrics for measuring the accuracy of regression models. A 48 percent error may look very high, but for a simple linear model this is not unusual, and it depends heavily on the nature of the data.
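For reference, these are the standard definitions of the two metrics implemented above, where $\hat{y}_i$ is the prediction for observation $i$ and $n$ is the number of test points:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$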

Let's see the plotted regression line. The green line is the regression line, the blue dots are observation points (the training set), and the orange dots are the test data.

Figure 3: Regression Line Plotted with Python

5. Conclusion

In conclusion, linear regression with ordinary least squares is a powerful and widely used statistical tool. With its ability to estimate the coefficients of a linear equation that best fits the data, OLS is a valuable technique for understanding the nature and strength of the relationship between variables.

