Implementing Linear Regression from Scratch in Python

admin

4 weeks ago

This tutorial walks through implementing linear regression from scratch in Python, without machine learning libraries such as scikit-learn (NumPy and matplotlib are used only for plotting). We’ll cover the normal-equation math behind linear regression, implement the core functionality with plain Python lists, and demonstrate usage on a small sample dataset and on temperature data.

Overview

Our implementation will include:

Basic matrix operations for linear algebra
Linear regression weight calculation
Prediction functionality
Data loading and visualization
Example applications

Implementation
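The fit method in the class below solves ordinary least squares with the normal equation. Let X be the design matrix whose rows are [1, x_i] and let Y be the column of targets. The weights, and the closed-form 2×2 inverse that inverse_2x2 applies, are:

$$
\hat{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \left(X^\top X\right)^{-1} X^\top Y,
\qquad
\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}
= \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},\quad ad - bc \neq 0
$$

Here w_0 is the intercept and w_1 the slope. Predictions are \hat{Y} = X\hat{w}, which is what predict computes.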

Let’s start by implementing the core LinearRegression class:

class LinearRegression:
    def __init__(self):
        """
        Initializes the LinearRegression object with weights set to None.
        """
        self.weights = None

    def matmul(self, A, B):
        """
        Matrix multiplication of A and B.
        """
        if not (isinstance(A, list) and isinstance(B, list)) or len(A[0]) != len(B):
            raise ValueError("Matrix dimensions are not compatible for multiplication.")

        result = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
        for i in range(len(A)):
            for j in range(len(B[0])):
                for k in range(len(B)):
                    result[i][j] += A[i][k] * B[k][j]
        return result

    def transpose(self, A):
        """
        Transposes matrix A.
        """
        return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

    def inverse_2x2(self, A):
        """
        Inverts a 2x2 matrix.
        """
        if len(A) != 2 or len(A[0]) != 2:
            raise ValueError("Matrix must be 2x2.")

        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        if det == 0:
            return None

        return [[A[1][1] / det, -A[0][1] / det], 
                [-A[1][0] / det, A[0][0] / det]]

    def fit(self, X, Y):
        """
        Calculates regression weights using the normal equation,
        w = (X^T X)^-1 X^T Y. Only a 2x2 X^T X (bias + one feature) is supported.
        """
        X_transpose = self.transpose(X)
        XTX = self.matmul(X_transpose, X)

        if len(XTX) == 2 and len(XTX[0]) == 2:
            XTX_inv = self.inverse_2x2(XTX)
            if XTX_inv is None:
                print("Unable to calculate weights - matrix inversion failed.")
                return None
        else:
            print("Only 2x2 matrices are supported for the fit function.")
            return None

        XTY = self.matmul(X_transpose, Y)
        self.weights = self.matmul(XTX_inv, XTY)
        return self.weights

    def predict(self, X):
        """
        Makes predictions using calculated weights.
        """
        if self.weights is None:
            print("Cannot make predictions - weights not calculated.")
            return None

        if len(X[0]) != len(self.weights):
            raise ValueError("The dimensions of X and weights are incompatible.")

        return self.matmul(X, self.weights)
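The fit method above gives up whenever X^T X is not 2×2, which restricts the model to one feature plus a bias term. One way to remove that restriction is a Gauss-Jordan inverse with partial pivoting. This is a hypothetical extension, not part of the original class, and both the name inverse_nxn and the singularity tolerance of 1e-12 are assumptions:

```python
def inverse_nxn(A, tol=1e-12):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination.

    Hypothetical helper, not part of the original class. Returns None when
    the matrix is singular (or nearly so), mirroring inverse_2x2.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("Matrix must be square.")

    # Augment A with the identity matrix: [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]

    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            return None  # singular or nearly singular
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Scale the pivot row so the pivot entry becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]

        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds A^-1
    return [row[n:] for row in aug]
```

In fit, the 2×2 branch could then be replaced by XTX_inv = inverse_nxn(XTX), letting each row of X carry several feature columns after the bias column. For large or ill-conditioned problems, solving the linear system directly is generally preferable to forming an explicit inverse.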

Using the Implementation

Let’s demonstrate usage with some example data:

# Initialize the model
lr = LinearRegression()

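# Sample data: each row of X is [bias, feature]; the leading 1 is the bias term
X = [[1, 1], [1, 2]]
Y = [[5], [6]]        # Target values as a column vector

# Fit the model
weights = lr.fit(X, Y)
print("Calculated weights:", weights)  # Calculated weights: [[4.0], [1.0]]

# Make predictions
predictions = lr.predict(X)
print("Predictions:", predictions)  # Predictions: [[5.0], [6.0]]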
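Because the sample has only two points, the fitted line passes through both of them exactly. Working the normal equation by hand for this X and Y shows where the weights come from:

$$
X^\top X = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix},\quad
\det\left(X^\top X\right) = 2\cdot 5 - 3\cdot 3 = 1,\quad
\left(X^\top X\right)^{-1} = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix}
$$

$$
X^\top Y = \begin{bmatrix} 1\cdot 5 + 1\cdot 6 \\ 1\cdot 5 + 2\cdot 6 \end{bmatrix} = \begin{bmatrix} 11 \\ 17 \end{bmatrix},\quad
\hat{w} = \begin{bmatrix} 5\cdot 11 - 3\cdot 17 \\ -3\cdot 11 + 2\cdot 17 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}
$$

The intercept is therefore 4 and the slope is 1. The predictions 4 + 1·1 = 5 and 4 + 1·2 = 6 reproduce Y.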

Visualizing Results

To visualize results, add a plot method to the LinearRegression class. It relies on NumPy and matplotlib, which must be imported:

import numpy as np
import matplotlib.pyplot as plt


def plot(self, X, Y, predicted_Y, future_X=None, future_predicted_Y=None, plot_options=None):
    """
    Plots actual data points and the fitted regression line.
    predicted_Y is currently unused: the line is drawn from the fitted weights.
    """
    if self.weights is None:
        print("Cannot plot - weights not calculated.")
        return
    if plot_options is None:
        plot_options = {}

    # Column 1 holds the feature; column 0 is the bias term
    x_vals = [row[1] for row in X]
    y_vals = [val[0] for val in Y]

    # Get regression line parameters
    w0 = self.weights[0][0]  # Intercept
    w1 = self.weights[1][0]  # Slope

    # Generate fitted line points
    ind = np.linspace(min(x_vals), max(x_vals), 100)
    fitted_line = ind * w1 + w0

    # Plot actual data and regression line
    plt.plot(x_vals, y_vals, 'bo', label='Actual data')
    plt.plot(ind, fitted_line, 'r-', label='Fitted line')

    # Optionally overlay predictions for future (extrapolated) inputs
    if future_X and future_predicted_Y:
        future_x_vals = [row[1] for row in future_X]
        future_predicted_vals = [val[0] for val in future_predicted_Y]
        plt.plot(future_x_vals, future_predicted_vals, 'g--',
                 label='Predicted future')

    plt.xlabel(plot_options.get('x_label', 'X'))
    plt.ylabel(plot_options.get('y_label', 'Y'))
    plt.title(plot_options.get('title', 'Linear Regression Fit'))
    plt.legend()
    plt.show()


# Attach the function to the class so it can be called as lr.plot(...)
LinearRegression.plot = plot
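Both predict and plot expect each row of X to be [1, x], and plot reads the feature from index 1. When building future_X for extrapolation, a small helper keeps that layout consistent. The name add_bias is hypothetical and is not part of the original code:

```python
def add_bias(xs):
    """Turn raw feature values into design-matrix rows of the form [1, x].

    Hypothetical helper: the leading 1 is the bias term that pairs with the
    intercept weight w0, matching the row layout used by fit, predict and plot.
    """
    return [[1, x] for x in xs]


# Example: rows for five future years
future_X = add_bias(range(2025, 2030))
print(future_X[:2])  # [[1, 2025], [1, 2026]]
```

These rows can be passed as future_X, together with future_predicted_Y = lr.predict(future_X).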

Example Application: Temperature Trends

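Let’s use our implementation to analyze temperature data. The example calls a load_data method that is not part of the class shown above. You will need to supply one that returns X as rows of [1, year] and Y as rows of [temperature]: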

# Load temperature data
X_temps, Y_temps = lr.load_data('temperature_data.csv')

# Fit model and make predictions
lr.fit(X_temps, Y_temps)
predicted_temps = lr.predict(X_temps)

# Plot results
plot_options = {
    'x_label': 'Year',
    'y_label': 'Temperature (°C)',
    'title': 'Temperature Trends Over Time'
}

lr.plot(X_temps, Y_temps, predicted_temps, plot_options=plot_options)
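Here is a minimal sketch of load_data. It assumes a CSV file with a header row followed by year,temperature rows. The file layout, the column order, and this helper are all hypothetical, not the author's code:

```python
import csv


def load_data(path):
    """Read a two-column CSV into (X, Y) in the layout fit() expects.

    Hypothetical sketch: assumes a header row followed by rows of
    year,temperature. X rows are [1, year] (bias term first) and
    Y rows are [temperature].
    """
    X, Y = [], []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            year, temp = float(row[0]), float(row[1])
            X.append([1, year])
            Y.append([temp])
    return X, Y
```

To call it as lr.load_data(...), as the example does, you can attach the function to the class with LinearRegression.load_data = staticmethod(load_data).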

Key Features

Matrix Operations: Custom implementations of matrix multiplication, transposition, and 2×2 matrix inversion
Modular Design: Separate methods for fitting, prediction, and visualization
Error Handling: ValueError for incompatible matrix dimensions, and printed messages when weights cannot be calculated or used
Visualization: Flexible plotting options with support for future predictions

Limitations

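Only handles 2×2 matrices for inverse calculations, so the model supports a single feature plus a bias term
Requires input data in a specific format (lists of lists, with a leading 1 in each row of X for the bias term)
No regularization or advanced features
No built-in error metrics or model evaluation tools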
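As a starting point for model evaluation, here is a minimal sketch of two standard metrics: mean squared error and the coefficient of determination R². Both are written for the same column-vector lists that fit and predict use. The function names are hypothetical:

```python
def mean_squared_error(Y, predicted_Y):
    """Average squared residual; Y and predicted_Y are lists of [value] rows."""
    residuals = [y[0] - p[0] for y, p in zip(Y, predicted_Y)]
    return sum(r * r for r in residuals) / len(residuals)


def r_squared(Y, predicted_Y):
    """1 - SS_res / SS_tot: the share of the variance in Y explained by the model."""
    ys = [y[0] for y in Y]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y[0] - p[0]) ** 2 for y, p in zip(Y, predicted_Y))
    ss_tot = sum((v - mean_y) ** 2 for v in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all target values are equal.")
    return 1 - ss_res / ss_tot
```

In the temperature example, r_squared(Y_temps, predicted_temps) would report how much of the variance in the temperature readings the fitted line explains on the data it was fitted to.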

Conclusion

This implementation provides a foundation for understanding linear regression from first principles. While not as optimized as professional libraries like scikit-learn, it demonstrates the core concepts and mathematics behind linear regression.

For production use cases, it’s recommended to use established libraries that offer more features, better optimization, and support for larger datasets. However, this implementation serves as a valuable learning tool for understanding the fundamentals of linear regression.