January 2026

Linear and Logistic Regression

A comprehensive guide covering the fundamentals of linear regression, gradient descent optimization, and logistic regression for classification tasks.


1. Linear Regression

Overview

Linear Regression is one of the simplest and most widely used statistical and machine learning techniques. It is used to model the relationship between a dependent variable (also known as the target or output) and one or more independent variables (also known as features or predictors).

1.1 Key Concepts

Dependent Variable (y)

The variable we are trying to predict or explain.

Independent Variable(s) (x)

The variable(s) used to make predictions.

Linear Relationship

The relationship between the dependent and independent variables is assumed to be linear.

1.2 The Linear Regression Equation

For a simple linear regression (one independent variable):

y = β₀ + β₁x + ε
  • y — Dependent variable (what we predict)
  • x — Independent variable (input feature)
  • β₀ — Intercept (value of y when x = 0)
  • β₁ — Slope (change in y for a unit change in x)
  • ε — Error term (captures unexplained variance)

For multiple linear regression (more than one independent variable):

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
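To make the notation concrete, the multiple regression model is just a dot product between a feature vector (with a leading 1 for the intercept) and the coefficient vector. The NumPy sketch below uses made-up coefficients purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients: beta0 (intercept) followed by beta1..beta3
beta = np.array([2.0, 0.5, -1.2, 3.0])

X = rng.normal(size=(5, 3))                   # 5 observations, 3 features
X_design = np.column_stack([np.ones(5), X])   # prepend a column of ones for β₀
eps = rng.normal(scale=0.1, size=5)           # the error term ε

y = X_design @ beta + eps                     # y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε
```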

1.3 Assumptions of Linear Regression

1. Linearity

The relationship between dependent and independent variables is linear.

2. Independence

Observations are independent of each other.

3. Homoscedasticity

The variance of errors is constant across all levels of independent variables.

4. Normal Distribution

The errors (residuals) are normally distributed.

5. No Multicollinearity

Independent variables are not highly correlated with each other (for multiple regression).

1.4 Goal of Linear Regression

The goal is to find the best-fitting line that minimizes the difference between the actual values and the predicted values. This is typically done using the Ordinary Least Squares (OLS) method, which minimizes the sum of squared residuals:

SSE = Σ(yᵢ - ŷᵢ)²
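One minimal way to carry this out in code (libraries such as scikit-learn or statsmodels provide OLS out of the box; the toy data here is invented) is a least-squares solve on a design matrix with an intercept column:

```python
import numpy as np

def fit_ols(X, y):
    """Minimize SSE = Σ(yᵢ - ŷᵢ)² via a least-squares solve."""
    X_design = np.column_stack([np.ones(len(X)), X])   # intercept column
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

# Toy data generated roughly from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(fit_ols(x.reshape(-1, 1), y))  # ≈ [1.09, 1.94]
```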

Applications of Linear Regression

  • Economics: Predicting GDP, inflation rates, etc.
  • Healthcare: Estimating the effect of treatment on patient outcomes.
  • Marketing: Understanding the impact of advertising spend on sales.
  • Real Estate: Predicting house prices based on features like size, location, etc.

2. Gradient Descent

Overview

Gradient Descent is an optimization algorithm used to find the minimum of a function. In machine learning, it is primarily used to minimize the cost function (or loss function) and find the optimal parameters (weights) for a model.

2.1 Key Concepts

Cost Function (J)

A function that measures the error between predictions and actual values. The goal is to minimize this function.

Gradient (∇J)

The slope of the cost function at a particular point, indicating the direction of steepest ascent.

Learning Rate (α)

A hyperparameter that controls the size of the step taken during each iteration.

2.2 How Gradient Descent Works

  1. Initialize Parameters: Start with random values for the parameters (weights).
  2. Compute the Gradient: Calculate the gradient of the cost function with respect to each parameter.
  3. Update Parameters: Adjust the parameters in the opposite direction of the gradient.
  4. Repeat: Continue until convergence (when the cost function stops decreasing significantly).

2.3 The Update Rule

θⱼ := θⱼ - α · ∂J(θ)/∂θⱼ

Where θⱼ is the parameter, α is the learning rate, and ∂J(θ)/∂θⱼ is the partial derivative.
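A minimal sketch of this loop for linear regression with a mean-squared-error cost (the learning rate, iteration count, and toy data are arbitrary choices here):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for linear regression, cost J(θ) = (1/2m)·Σ(ŷᵢ - yᵢ)²."""
    m = len(y)
    X_design = np.column_stack([np.ones(m), X])   # intercept column
    theta = np.zeros(X_design.shape[1])           # step 1: initialize parameters
    for _ in range(n_iters):
        grad = X_design.T @ (X_design @ theta - y) / m   # step 2: ∂J(θ)/∂θⱼ
        theta -= alpha * grad                            # step 3: θⱼ := θⱼ - α·∂J/∂θⱼ
    return theta                                         # step 4: repeat until done

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(gradient_descent(x.reshape(-1, 1), y))  # converges toward the OLS solution above
```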

2.4 Types of Gradient Descent

Batch Gradient Descent

Uses the entire dataset to compute the gradient at each step. Accurate but slow for large datasets.

Stochastic Gradient Descent (SGD)

Uses a single data point to compute the gradient. Faster, but the updates are noisy.

Mini-Batch Gradient Descent

Uses a small batch of data points. Balances speed and accuracy; most commonly used in practice.
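The three variants differ only in how much data feeds each gradient estimate. A sketch of the mini-batch version (batch size, learning rate, and epoch count are illustrative; batch_size=1 reduces to SGD and batch_size=len(y) to batch gradient descent):

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.05, batch_size=2, n_epochs=200, seed=0):
    """Mini-batch gradient descent for linear regression with MSE cost."""
    rng = np.random.default_rng(seed)
    m = len(y)
    X_design = np.column_stack([np.ones(m), X])
    theta = np.zeros(X_design.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(m)               # reshuffle each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X_design[idx], y[idx]
            theta -= alpha * Xb.T @ (Xb @ theta - yb) / len(idx)
    return theta
```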

2.5 Challenges

Learning Rate Selection

If the learning rate is too high, the algorithm can overshoot the minimum or diverge; if too low, convergence is slow.

Local Minima

Risk of getting stuck in local minima instead of the global minimum.

Feature Scaling

Features with different scales can cause slow or unstable convergence; standardizing the inputs first helps (see the sketch after this list).

Vanishing/Exploding Gradients

Especially in deep networks, gradients can become too small or too large.
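For the feature-scaling challenge above, the usual remedy is to standardize each feature to zero mean and unit variance before running gradient descent; a minimal sketch:

```python
import numpy as np

def standardize(X):
    """Return z-scores per feature, plus the statistics needed to scale new data identically."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```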

3. Logistic Regression

Overview

Logistic Regression is a classification algorithm used to predict the probability of a binary outcome (e.g., yes/no, 0/1, true/false). Despite its name, it is used for classification, not regression.

3.1 Key Concepts

Binary Classification

The target variable has two possible outcomes (e.g., spam/not spam, sick/healthy).

Probability

Logistic regression outputs a probability between 0 and 1, which is then mapped to a class.

Sigmoid Function

Used to map predictions to probabilities.

3.2 The Logistic Regression Model

Unlike linear regression, which predicts a continuous value, logistic regression uses the sigmoid function to map predictions to probabilities:

σ(z) = 1 / (1 + e⁻ᶻ)

Where z is the linear combination of inputs:

z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ

Sigmoid Function Properties

  • Outputs values between 0 and 1
  • When z → ∞, σ(z) → 1
  • When z → -∞, σ(z) → 0
  • When z = 0, σ(z) = 0.5
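A one-line NumPy sigmoid makes these properties easy to verify:

```python
import numpy as np

def sigmoid(z):
    """Map any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [4.54e-05, 0.5, 1 - 4.54e-05]
```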

3.3 Decision Boundary

A threshold (commonly 0.5) is used to classify the output:

If P(y=1|x) ≥ 0.5 → Predict class 1

If P(y=1|x) < 0.5 → Predict class 0

3.4 Cost Function (Log Loss)

Logistic regression uses log loss (or cross-entropy loss) instead of mean squared error:

J(θ) = -1/m · Σ[yᵢ·log(ŷᵢ) + (1-yᵢ)·log(1-ŷᵢ)]
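A minimal end-to-end sketch combining the sigmoid, the log-loss cost, the gradient descent update, and the 0.5 decision threshold from Section 3.3 (the dataset is invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, y_hat, eps=1e-12):
    """J(θ) = -1/m · Σ[yᵢ·log(ŷᵢ) + (1-yᵢ)·log(1-ŷᵢ)]; eps guards against log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Gradient descent on log loss; the gradient works out to X'(σ(Xθ) - y)/m."""
    m = len(y)
    X_design = np.column_stack([np.ones(m), X])
    theta = np.zeros(X_design.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(X_design @ theta)
        theta -= alpha * X_design.T @ (y_hat - y) / m
    return theta

# Invented 1-D dataset: class 1 tends to have larger x
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0]).reshape(-1, 1)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = fit_logistic(x, y)
probs = sigmoid(np.column_stack([np.ones(len(x)), x]) @ theta)
print(log_loss(y, probs))          # small after training
print((probs >= 0.5).astype(int))  # the 0.5 threshold recovers [0 0 0 1 1 1]
```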

3.5 Assumptions

Binary Outcome

The dependent variable is binary (for standard logistic regression).

Independence

Observations are independent of each other.

Linearity in Log-Odds

The relationship between independent variables and log-odds is linear.

No Multicollinearity

Independent variables are not highly correlated.

3.6 Applications

  • Healthcare: Predicting disease presence (e.g., diabetes, cancer).
  • Marketing: Predicting customer churn or conversion.
  • Finance: Credit scoring, fraud detection.
  • NLP: Spam detection, sentiment analysis.

4. Summary: Key Differences

Aspect          Linear Regression       Logistic Regression
Type            Regression              Classification
Output          Continuous              Probability (0-1)
Function        Linear                  Sigmoid
Cost Function   Mean Squared Error      Log Loss
Use Case        Predicting values       Predicting classes
