Calculus for Machine Learning
Difficulty: Intermediate | Time: 60-90 minutes | Key Concepts: Derivatives, Gradients, Optimization
Why Calculus Matters for ML
Training neural networks is an optimization problem. Calculus (specifically derivatives) tells us how to adjust parameters to improve the model.
1. Derivatives Explained
What is a Derivative?
The derivative of a function measures how much the output changes when the input changes slightly.
Geometric Interpretation
The derivative is the slope of the tangent line to the function at a point.
Mathematical Definition
f'(x) = lim(h→0) [f(x+h) - f(x)] / h
Simpler intuition: How steep is the curve at this point?
Common Derivatives (Power Rule)
f(x) = x^n → f'(x) = n × x^(n-1)
Examples:
f(x) = x² → f'(x) = 2x
f(x) = x³ → f'(x) = 3x²
f(x) = x → f'(x) = 1
f(x) = 5 → f'(x) = 0 (constant has zero derivative)
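As a quick sanity check, a central-difference approximation recovers these results numerically. This is a minimal sketch; numerical_derivative is an illustrative helper, not a library function.

# Central difference: (f(x+h) - f(x-h)) / (2h) ≈ f'(x)
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_derivative(lambda x: x**2, 3.0))  # ≈ 6.0, matches f'(x) = 2x
print(numerical_derivative(lambda x: x**3, 2.0))  # ≈ 12.0, matches f'(x) = 3x²
print(numerical_derivative(lambda x: 5.0, 4.0))   # ≈ 0.0, constant has zero derivative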
2. The Chain Rule (Most Important!)
Formula
If y = f(g(x)), then dy/dx = df/dg × dg/dx
Or more simply: derivative of the outside (evaluated at the inside) × derivative of the inside
Real Example
f(x) = (2x + 3)²
Let u = 2x + 3 (inside function)
Then f(x) = u² (outside function)
du/dx = 2
df/du = 2u = 2(2x + 3)
df/dx = df/du × du/dx = 2(2x + 3) × 2 = 4(2x + 3) = 8x + 12
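To confirm the algebra, the same finite-difference idea can check df/dx = 8x + 12 at a sample point. A small illustrative sketch:

def f(x):
    return (2*x + 3)**2

def analytic_derivative(x):
    return 8*x + 12  # result from the chain rule above

h = 1e-6
x = 1.5
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(numeric, analytic_derivative(x))  # both ≈ 24.0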
Why Chain Rule Matters for ML
Neural networks are compositions of functions. Backpropagation is just the chain rule applied repeatedly!
3. Partial Derivatives
What Happens When You Have Multiple Variables?
Partial derivatives measure how a function changes with respect to ONE variable, holding others constant.
Notation & Example
f(x, y) = x² + 3xy + y²
∂f/∂x = 2x + 3y (treat y as constant)
∂f/∂y = 3x + 2y (treat x as constant)
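A quick numerical check of these partials (illustrative sketch): perturb one variable at a time while holding the other fixed.

def f(x, y):
    return x**2 + 3*x*y + y**2

h = 1e-6
x, y = 2.0, 1.0
df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # ≈ 2x + 3y = 7
df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # ≈ 3x + 2y = 8
print(df_dx, df_dy)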
4. Gradients (Vector of Derivatives)
What is a Gradient?
The gradient is a vector containing all partial derivatives. It points in the direction of steepest increase.
Notation
∇f = [∂f/∂x₁, ∂f/∂x₂, ∂f/∂x₃, ...]
Example for f(x, y, z) = x² + y² + z²:
∇f = [2x, 2y, 2z]
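In code, the gradient is just those partials stacked into a vector. A minimal sketch for the example above:

import numpy as np

def grad_f(point):
    # Analytic gradient of f(x, y, z) = x² + y² + z²
    x, y, z = point
    return np.array([2*x, 2*y, 2*z])

print(grad_f(np.array([1.0, -2.0, 0.5])))  # [ 2. -4.  1.]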
Why Gradients Matter
The negative gradient points in the direction of steepest decrease, so gradient descent follows -∇L to reduce the loss!
5. Gradient Descent Explained
The Algorithm
Repeat until convergence:
θ_new = θ_old - α × ∇L(θ_old)
Where:
θ = model parameters
α = learning rate (step size)
L = loss function
∇L = gradient of loss
Intuition
Start at a random point. Look at the slope (gradient). Take a step in the opposite direction (downhill). Repeat until you reach the bottom.
Learning Rate Matters
- Too small: Very slow training, takes forever to converge
- Just right: Steady progress toward minimum
- Too large: Might overshoot the minimum or diverge
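The sketch below runs the same 1-D gradient descent on f(x) = x² with three learning rates to illustrate these regimes (the specific values are illustrative):

def minimize(learning_rate, steps=20, x=5.0):
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # gradient of x² is 2x
    return x

print(minimize(0.01))  # too small: still far from 0 after 20 steps
print(minimize(0.1))   # reasonable: close to the minimum at 0
print(minimize(1.1))   # too large: the iterates diverge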
6. Second Derivatives & Curvature
Second Derivative
f(x) = x²
f'(x) = 2x
f''(x) = 2
The second derivative tells us about curvature:
f'' > 0: Curving upward (convex)
f'' = 0: No curvature at that point (a straight line if this holds everywhere; often an inflection point)
f'' < 0: Curving downward (concave)
In Optimization
Second derivatives help determine if we’re at a minimum, maximum, or saddle point.
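A finite-difference estimate of the second derivative (illustrative sketch) recovers the constant curvature of f(x) = x²:

def f(x):
    return x**2

h = 1e-4
x = 3.0
second = (f(x + h) - 2*f(x) + f(x - h)) / h**2  # central second difference
print(second)  # ≈ 2.0, positive, so the function is convex here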
7. Backpropagation (Gradient Computation)
Simple Neural Network
Input (x) → Weight (w) → Bias (b) → Sigmoid → Loss (L)
y = sigmoid(wx + b)
L = (y - target)²
To train: Find ∂L/∂w and ∂L/∂b
By chain rule:
∂L/∂w = ∂L/∂y × ∂y/∂(wx+b) × ∂(wx+b)/∂w
∂L/∂b = ∂L/∂y × ∂y/∂(wx+b) × ∂(wx+b)/∂b
Why It’s Called Backpropagation
We compute gradients by working backward from the loss to each parameter.
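A minimal by-hand sketch of these gradients for the single-neuron network above (the input, target, and parameter values are illustrative):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass
x, target = 1.5, 1.0
w, b = 0.4, 0.1                   # illustrative starting parameters
z = w * x + b
y = sigmoid(z)
L = (y - target)**2

# Backward pass: chain rule, factor by factor
dL_dy = 2 * (y - target)          # ∂L/∂y
dy_dz = y * (1 - y)               # ∂y/∂z for the sigmoid
dL_dz = dL_dy * dy_dz
dL_dw = dL_dz * x                 # ∂z/∂w = x
dL_db = dL_dz * 1.0               # ∂z/∂b = 1
print(dL_dw, dL_db)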
8. Python Examples
Manual Gradient Descent
import numpy as np

# Simple function: f(x) = x²
def f(x):
    return x**2

def f_derivative(x):
    return 2*x

# Start at x = 5
x = 5.0
learning_rate = 0.1
print(f"Starting at x = {x}, f(x) = {f(x)}")

for iteration in range(10):
    gradient = f_derivative(x)
    x = x - learning_rate * gradient
    print(f"Iteration {iteration+1}: x = {x:.4f}, f(x) = {f(x):.4f}")

# Should converge to x = 0 (minimum)
Using Autograd for Automatic Differentiation (requires the third-party autograd package)
import autograd.numpy as np
from autograd import grad

# Define function
def f(x):
    return x**2 + 3*x + 2

# Compute gradient automatically
f_grad = grad(f)

x = 5.0
print(f"f'(5) = {f_grad(5.0)}")  # Should be 13 (= 2*5 + 3)

# Use for gradient descent
x = 5.0
for i in range(5):
    grad_value = f_grad(x)
    x = x - 0.1 * grad_value
    print(f"x = {x:.4f}, f(x) = {f(x):.4f}")
9. Key Rules to Remember
Common Derivatives
| Function | Derivative |
|---|---|
| c (constant) | 0 |
| x^n | n × x^(n-1) |
| e^x | e^x |
| ln(x) | 1/x |
| sin(x) | cos(x) |
| cos(x) | -sin(x) |
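If SymPy is installed, these rules can be checked symbolically. This is an optional sketch, not required for the rest of the material:

import sympy as sp

x = sp.symbols('x')
for expr in (x**3, sp.exp(x), sp.log(x), sp.sin(x), sp.cos(x)):
    print(expr, "->", sp.diff(expr, x))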
10. Optimization Algorithms Beyond Gradient Descent
Momentum
Accumulate gradients to build momentum, like a ball rolling downhill.
Adam (Adaptive Moment Estimation)
One of the most widely used optimizers. Combines momentum with an adaptive learning rate for each parameter individually.
RMSprop
Keeps a running average of squared gradient magnitudes to scale the learning rate per parameter.
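A minimal sketch of the momentum and Adam update rules applied to the earlier f(x) = x² example. The hyperparameter values are common defaults, shown only for illustration:

import math

def grad(x):
    return 2 * x  # gradient of f(x) = x²

# Momentum: accumulate a velocity term
x, v = 5.0, 0.0
lr, beta = 0.1, 0.9
for _ in range(100):
    v = beta * v + grad(x)
    x = x - lr * v
print("momentum:", x)

# Adam: running averages of the gradient and its square, with bias correction
x, m, s = 5.0, 0.0, 0.0
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 101):
    g = grad(x)
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    s_hat = s / (1 - b2**t)
    x = x - lr * m_hat / (math.sqrt(s_hat) + eps)
print("adam:", x)

# Both runs should end near the minimum at x = 0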
Key Takeaways
- Derivatives measure how functions change
- Chain rule connects derivatives through function compositions
- Gradients point in the direction of steepest increase
- Gradient descent follows negative gradient to minimize loss
- Learning rate controls step size
- Backpropagation computes gradients in neural networks
You’ve Completed Math Foundations!
Congratulations! You now have the mathematical foundation needed for AI/ML. Next step: Apply this knowledge in hands-on projects and advanced ML courses.