January 2026
Linear and Logistic Regression
A comprehensive guide covering the fundamentals of linear regression, gradient descent optimization, and logistic regression for classification tasks.
1. Linear Regression
Overview
Linear Regression is one of the simplest and most widely used statistical and machine learning techniques. It is used to model the relationship between a dependent variable (also known as the target or output) and one or more independent variables (also known as features or predictors).
1.1 Key Concepts
Dependent Variable (y)
The variable we are trying to predict or explain.
Independent Variable(s) (x)
The variable(s) used to make predictions.
Linear Relationship
The relationship between the dependent and independent variables is assumed to be linear.
1.2 The Linear Regression Equation
For a simple linear regression (one independent variable):
y = β₀ + β₁x + ε
- y — Dependent variable (what we predict)
- x — Independent variable (input feature)
- β₀ — Intercept (value of y when x = 0)
- β₁ — Slope (change in y for a unit change in x)
- ε — Error term (captures unexplained variance)
For multiple linear regression (more than one independent variable):
y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
1.3 Assumptions of Linear Regression
1. Linearity
The relationship between dependent and independent variables is linear.
2. Independence
Observations are independent of each other.
3. Homoscedasticity
The variance of errors is constant across all levels of independent variables.
4. Normal Distribution
The errors (residuals) are normally distributed.
5. No Multicollinearity
Independent variables are not highly correlated with each other (for multiple regression).
1.4 Goal of Linear Regression
The goal is to find the best-fitting line that minimizes the difference between the actual values and the predicted values. This is typically done using the Ordinary Least Squares (OLS) method, which minimizes the sum of squared residuals:
SSR = Σᵢ (yᵢ − ŷᵢ)²
where ŷᵢ is the value predicted by the fitted line for observation i.
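As an illustration, an OLS fit can be computed with NumPy's least-squares solver; the house-size/price numbers below are made up for the example.

```python
import numpy as np

# Hypothetical data: house size (sq. ft.) vs. price (in $1000s).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([200.0, 280.0, 370.0, 450.0, 520.0])

# Design matrix with an intercept column; OLS minimizes ||X @ beta - y||^2,
# i.e. the sum of squared residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = beta
print(f"intercept={intercept:.2f}, slope={slope:.4f}")
```

The fitted slope gives the estimated price increase per additional square foot under this toy data.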
Applications of Linear Regression
- Economics: Predicting GDP, inflation rates, etc.
- Healthcare: Estimating the effect of treatment on patient outcomes.
- Marketing: Understanding the impact of advertising spend on sales.
- Real Estate: Predicting house prices based on features like size, location, etc.
2. Gradient Descent
Overview
Gradient Descent is an optimization algorithm used to find the minimum of a function. In machine learning, it is primarily used to minimize the cost function (or loss function) and find the optimal parameters (weights) for a model.
2.1 Key Concepts
Cost Function (J)
A function that measures the error between predictions and actual values. The goal is to minimize this function.
Gradient (∇J)
The slope of the cost function at a particular point, indicating the direction of steepest ascent.
Learning Rate (α)
A hyperparameter that controls the size of the step taken during each iteration.
2.2 How Gradient Descent Works
1. Initialize Parameters: Start with random values for the parameters (weights).
2. Compute the Gradient: Calculate the gradient of the cost function with respect to each parameter.
3. Update Parameters: Adjust the parameters in the opposite direction of the gradient.
4. Repeat: Continue until convergence (when the cost function stops decreasing significantly).
2.3 The Update Rule
θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ
Where θⱼ is the parameter being updated, α is the learning rate, and ∂J(θ)/∂θⱼ is the partial derivative of the cost function with respect to θⱼ.
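The four steps from 2.2 can be sketched for simple linear regression; the data, learning rate, and iteration count below are illustrative choices, not prescriptions.

```python
import numpy as np

# Gradient descent for simple linear regression with a (1/2n)*MSE cost.
# Toy data generated from the line y = 1 + 2x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

theta0, theta1 = 0.0, 0.0   # 1. initialize parameters
alpha = 0.05                # learning rate (illustrative)
n = len(x)

for _ in range(2000):       # 4. repeat until (approximate) convergence
    pred = theta0 + theta1 * x
    error = pred - y
    # 2. gradients of J = (1/2n) * sum(error^2)
    grad0 = error.sum() / n
    grad1 = (error * x).sum() / n
    # 3. step in the opposite direction of the gradient
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(round(theta0, 2), round(theta1, 2))  # → 1.0 2.0
```

With this learning rate the parameters converge to the true intercept and slope of the toy data.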
2.4 Types of Gradient Descent
Batch Gradient Descent
Uses the entire dataset to compute the gradient at each step. Accurate but slow for large datasets.
Stochastic Gradient Descent (SGD)
Uses a single data point to compute the gradient. Faster but noisy updates.
Mini-Batch Gradient Descent
Uses a small batch of data points. Balances speed and accuracy; most commonly used in practice.
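A minimal mini-batch variant might look like the following; the batch size, learning rate, and synthetic data are assumptions for the sketch.

```python
import numpy as np

# Mini-batch gradient descent on synthetic data drawn from y = 2 + 3x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=200)

theta = np.zeros(2)              # [intercept, slope]
alpha, batch_size = 0.02, 32     # illustrative hyperparameters

for epoch in range(300):
    order = rng.permutation(len(x))          # shuffle each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        err = theta[0] + theta[1] * xb - yb
        grad = np.array([err.mean(), (err * xb).mean()])
        theta -= alpha * grad                # update from one small batch

print(theta.round(2))  # should approach [intercept ≈ 2, slope ≈ 3]
```

Each update uses only 32 points, so it is cheaper than a full-batch step but noisier; averaged over many epochs the parameters still settle near the true values.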
2.5 Challenges
Learning Rate Selection
Too high, and updates overshoot the minimum (and may diverge); too low, and convergence is slow.
Local Minima
Risk of getting stuck in local minima instead of the global minimum.
Feature Scaling
Features with different scales can cause slow or unstable convergence.
Vanishing/Exploding Gradients
Especially in deep networks, gradients can become too small or too large.
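The feature-scaling challenge above is commonly addressed by standardization (zero mean, unit variance per feature); a minimal sketch with made-up features:

```python
import numpy as np

# Two hypothetical features on very different scales
# (e.g. house size in sq. ft. vs. number of bedrooms).
X = np.array([[1000.0, 3.0],
              [1500.0, 4.0],
              [2000.0, 2.0]])

# Standardize each column: subtract its mean, divide by its std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0).round(6))  # each column now has mean ~0
print(X_scaled.std(axis=0).round(6))   # and std ~1
```

On the standardized features, gradient descent takes comparably sized steps in every direction, which typically speeds up and stabilizes convergence.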
3. Logistic Regression
Overview
Logistic Regression is a classification algorithm used to predict the probability of a binary outcome (e.g., yes/no, 0/1, true/false). Despite its name, it is used for classification, not regression.
3.1 Key Concepts
Binary Classification
The target variable has two possible outcomes (e.g., spam/not spam, sick/healthy).
Probability
Logistic regression outputs a probability between 0 and 1, which is then mapped to a class.
Sigmoid Function
Used to map predictions to probabilities.
3.2 The Logistic Regression Model
Unlike linear regression, which predicts a continuous value, logistic regression uses the sigmoid function to map predictions to probabilities:
σ(z) = 1 / (1 + e⁻ᶻ)
Where z is the linear combination of inputs:
z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
Sigmoid Function Properties
- Outputs values between 0 and 1
- When z → ∞, σ(z) → 1
- When z → -∞, σ(z) → 0
- When z = 0, σ(z) = 0.5
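The sigmoid and the properties listed above can be checked numerically in a few lines of Python:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # → 0.5 (exactly)
print(sigmoid(6.0))    # close to 1
print(sigmoid(-6.0))   # close to 0
```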
3.3 Decision Boundary
A threshold (commonly 0.5) is used to classify the output:
If P(y=1|x) ≥ 0.5 → Predict class 1
If P(y=1|x) < 0.5 → Predict class 0
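Combining the sigmoid with the 0.5 threshold gives a toy classifier; the coefficients b0 and b1 below are hypothetical, not fitted to any data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(x: float, b0: float = -4.0, b1: float = 2.0) -> int:
    """Classify a single feature x with hypothetical coefficients."""
    p = sigmoid(b0 + b1 * x)        # P(y=1 | x)
    return 1 if p >= 0.5 else 0     # apply the 0.5 decision threshold

print(predict(1.0))  # z = -2, p ≈ 0.12 → class 0
print(predict(3.0))  # z = 2,  p ≈ 0.88 → class 1
```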
3.4 Cost Function (Log Loss)
Logistic regression uses log loss (or cross-entropy loss) instead of mean squared error:
J(θ) = −(1/m) Σᵢ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]
where ŷᵢ = σ(zᵢ) is the predicted probability for example i and m is the number of examples.
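Log loss can be sketched in plain Python; the labels and probabilities below are made up, and the clipping constant eps is a common numerical safeguard against log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy over a batch of examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident and correct predictions → low loss.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))
# Confident but wrong predictions → high loss.
print(log_loss([1, 0, 1], [0.1, 0.9, 0.2]))
```

Log loss punishes confident mistakes heavily, which is what makes it a better training signal than mean squared error for probabilistic classifiers.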
3.5 Assumptions
Binary Outcome
The dependent variable is binary (for standard logistic regression).
Independence
Observations are independent of each other.
Linearity in Log-Odds
The relationship between independent variables and log-odds is linear.
No Multicollinearity
Independent variables are not highly correlated.
3.6 Applications
- Healthcare: Predicting disease presence (e.g., diabetes, cancer).
- Marketing: Predicting customer churn or conversion.
- Finance: Credit scoring, fraud detection.
- NLP: Spam detection, sentiment analysis.
4. Summary: Key Differences
| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Type | Regression | Classification |
| Output | Continuous | Probability (0-1) |
| Function | Linear | Sigmoid |
| Cost Function | Mean Squared Error | Log Loss |
| Use Case | Predicting values | Predicting classes |