
Calculus for Machine Learning

👤 By harshith
📅 Nov 20, 2025
⏱️ 7 min read



Difficulty: Intermediate | Time: 60-90 minutes | Key Concepts: Derivatives, Gradients, Optimization

Why Calculus Matters for ML

Training neural networks is an optimization problem. Calculus (specifically derivatives) tells us how to adjust parameters to improve the model.

1. Derivatives Explained

What is a Derivative?

The derivative of a function measures how much the output changes when the input changes slightly.

Geometric Interpretation

The derivative is the slope of the tangent line to the function at a point.

Mathematical Definition

f'(x) = lim(h→0) [f(x+h) - f(x)] / h

Simpler intuition: How steep is the curve at this point?

Common Derivatives (Power Rule)

f(x) = x^n  →  f'(x) = n × x^(n-1)

Examples:
f(x) = x²   →  f'(x) = 2x
f(x) = x³   →  f'(x) = 3x²
f(x) = x    →  f'(x) = 1
f(x) = 5    →  f'(x) = 0 (constant has zero derivative)
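
As a quick sanity check, the power rule can be verified numerically with a finite-difference approximation (a minimal sketch in plain Python; the helper name and step size are just illustrative choices):

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Power rule: d/dx of x^3 is 3x^2, so the derivative at x = 2 should be 12
print(numerical_derivative(lambda x: x**3, 2.0))  # ~12.0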

2. The Chain Rule (Most Important!)

Formula

If y = f(g(x)), then dy/dx = df/dg × dg/dx

Or more simply: derivative of outside × derivative of inside

Real Example

f(x) = (2x + 3)²

Let u = 2x + 3  (inside function)
Then f(x) = u²  (outside function)

du/dx = 2
df/du = 2u = 2(2x + 3)

df/dx = df/du × du/dx = 2(2x + 3) × 2 = 4(2x + 3) = 8x + 12
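
To double-check the algebra, the same finite-difference trick can be applied to this function (a minimal sketch for illustration only):

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: (2*x + 3)**2

# Chain rule predicts f'(x) = 8x + 12, so f'(1) should be 20
print(numerical_derivative(f, 1.0))  # ~20.0
print(8*1.0 + 12)                    # 20.0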

Why Chain Rule Matters for ML

Neural networks are compositions of functions. Backpropagation is just the chain rule applied repeatedly!

3. Partial Derivatives

What Happens When You Have Multiple Variables?

Partial derivatives measure how a function changes with respect to ONE variable, holding others constant.

Notation & Example

f(x, y) = x² + 3xy + y²

∂f/∂x = 2x + 3y  (treat y as constant)
∂f/∂y = 3x + 2y  (treat x as constant)
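
The same finite-difference idea extends to partial derivatives: nudge one variable while holding the other fixed (a minimal sketch; the evaluation point is arbitrary):

def f(x, y):
    return x**2 + 3*x*y + y**2

h = 1e-6
x, y = 1.0, 2.0

# Vary x, hold y constant
df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
# Vary y, hold x constant
df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)

print(df_dx)  # ~8.0, matches 2x + 3y = 2(1) + 3(2)
print(df_dy)  # ~7.0, matches 3x + 2y = 3(1) + 2(2)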

4. Gradients (Vector of Derivatives)

What is a Gradient?

The gradient is a vector containing all partial derivatives. It points in the direction of steepest increase.

Notation

∇f = [∂f/∂x₁, ∂f/∂x₂, ∂f/∂x₃, ...]

Example for f(x, y, z) = x² + y² + z²:
∇f = [2x, 2y, 2z]
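
In code, the gradient for this example is just a NumPy array holding the three partial derivatives (a minimal sketch; the test point is an arbitrary choice):

import numpy as np

def f(v):
    # f(x, y, z) = x^2 + y^2 + z^2, with v = [x, y, z]
    return np.sum(v**2)

def grad_f(v):
    # Analytic gradient: [2x, 2y, 2z]
    return 2 * v

v = np.array([1.0, -2.0, 3.0])
print(grad_f(v))  # [ 2. -4.  6.]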

Why Gradients Matter

The negative gradient points toward lower loss, so gradient descent follows -∇L to minimize loss!

5. Gradient Descent Explained

The Algorithm

Repeat until convergence:
  θ_new = θ_old - α × ∇L(θ_old)

Where:
  θ = model parameters
  α = learning rate (step size)
  L = loss function
  ∇L = gradient of loss

Intuition

Start at a random point. Look at the slope (gradient). Take a step in the opposite direction (downhill). Repeat until you reach the bottom.

Learning Rate Matters

  • Too small: Very slow training, takes forever to converge
  • Just right: Steady progress toward minimum
  • Too large: Might overshoot the minimum or diverge
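
A minimal sketch that runs gradient descent on f(x) = x² with three different learning rates makes these regimes concrete (the specific values 0.01, 0.3, and 1.1 are arbitrary choices for illustration):

def descend(learning_rate, steps=20, x=5.0):
    # Gradient descent on f(x) = x^2, whose gradient is 2x
    for _ in range(steps):
        x = x - learning_rate * 2 * x
    return x

print(descend(0.01))  # still far from 0: too small, slow progress
print(descend(0.3))   # essentially 0: a reasonable step size
print(descend(1.1))   # huge magnitude: too large, the iterates diverge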

6. Second Derivatives & Curvature

Second Derivative

f(x) = x²
f'(x) = 2x
f''(x) = 2

The second derivative tells us about curvature:
f'' > 0: Curving upward (convex)
f'' = 0: Straight line
f'' < 0: Curving downward (concave)

In Optimization

Second derivatives help determine if we’re at a minimum, maximum, or saddle point.
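
For a single-variable function this is the second derivative test, which the sketch below checks numerically (the central-difference formula for f'' and the step size are standard illustrative choices):

def second_derivative(f, x, h=1e-4):
    # Central difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

f = lambda x: x**2
print(second_derivative(f, 0.0))   # ~2 > 0, so x = 0 is a minimum

g = lambda x: -(x**2)
print(second_derivative(g, 0.0))   # ~-2 < 0, so x = 0 is a maximum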

7. Backpropagation (Gradient Computation)

Simple Neural Network

Input (x) → Weight (w) → Bias (b) → Sigmoid → Loss (L)
       y = sigmoid(wx + b)
       L = (y - target)²

To train: Find ∂L/∂w and ∂L/∂b

By chain rule:
∂L/∂w = ∂L/∂y × ∂y/∂(wx+b) × ∂(wx+b)/∂w

Why It’s Called Backpropagation

We compute gradients by working backward from the loss to each parameter.
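
Here is a minimal sketch of that backward pass for the single-neuron network above, written out by hand (the variable names and input values are just illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass
x, target = 1.5, 1.0
w, b = 0.5, 0.1
z = w * x + b          # pre-activation
y = sigmoid(z)         # prediction
L = (y - target)**2    # squared-error loss

# Backward pass: apply the chain rule from the loss back to the parameters
dL_dy = 2 * (y - target)      # ∂L/∂y
dy_dz = y * (1 - y)           # derivative of the sigmoid
dL_dw = dL_dy * dy_dz * x     # ∂z/∂w = x
dL_db = dL_dy * dy_dz * 1     # ∂z/∂b = 1

print(dL_dw, dL_db)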

8. Python Examples

Manual Gradient Descent


import numpy as np

# Simple function: f(x) = x²
def f(x):
    return x**2

def f_derivative(x):
    return 2*x

# Start at x = 5
x = 5.0
learning_rate = 0.1

print(f"Starting at x = {x}, f(x) = {f(x)}")

for iteration in range(10):
    gradient = f_derivative(x)
    x = x - learning_rate * gradient
    print(f"Iteration {iteration+1}: x = {x:.4f}, f(x) = {f(x):.4f}")

# Should converge to x = 0 (minimum)

Using Autograd for Automatic Differentiation


import autograd.numpy as np
from autograd import grad

# Define function
def f(x):
    return x**2 + 3*x + 2

# Compute gradient automatically
f_grad = grad(f)

x = 5.0
print(f"f'(5) = {f_grad(5.0)}")  # Should be 13 (= 2*5 + 3)

# Use for gradient descent
x = 5.0
for i in range(5):
    grad_value = f_grad(x)
    x = x - 0.1 * grad_value
    print(f"x = {x:.4f}, f(x) = {f(x):.4f}")

9. Key Rules to Remember

Common Derivatives

Function        Derivative
c (constant)    0
x^n             n × x^(n-1)
e^x             e^x
ln(x)           1/x
sin(x)          cos(x)
cos(x)          -sin(x)
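
If you ever want to double-check an entry, a symbolic library such as SymPy can differentiate these expressions for you (a minimal sketch, assuming SymPy is installed):

import sympy as sp

x = sp.symbols('x')
for expr in [x**3, sp.exp(x), sp.log(x), sp.sin(x), sp.cos(x)]:
    print(expr, "->", sp.diff(expr, x))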

10. Optimization Algorithms Beyond Gradient Descent

Momentum

Accumulate gradients to build momentum, like a ball rolling downhill.

Adam (Adaptive Moment Estimation)

Most popular optimizer. Adapts learning rate for each parameter individually.

RMSprop

Keeps a running average of gradient magnitudes to adapt learning rate.
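
As a rough illustration of how momentum changes the basic update, here is a minimal sketch on f(x) = x² (the coefficient 0.9 is a common default, not a prescription):

def momentum_descent(steps=100, x=5.0, learning_rate=0.05, beta=0.9):
    velocity = 0.0
    for _ in range(steps):
        gradient = 2 * x                                       # gradient of f(x) = x^2
        velocity = beta * velocity - learning_rate * gradient  # accumulate past gradients
        x = x + velocity                                       # move using the accumulated velocity
    return x

print(momentum_descent())  # close to 0, the minimum of f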

Key Takeaways

  • Derivatives measure how functions change
  • Chain rule connects derivatives through function compositions
  • Gradients point in the direction of steepest increase
  • Gradient descent follows negative gradient to minimize loss
  • Learning rate controls step size
  • Backpropagation computes gradients in neural networks

You’ve Completed Math Foundations!

Congratulations! You now have the mathematical foundation needed for AI/ML. Next step: Apply this knowledge in hands-on projects and advanced ML courses.

