AI in 2025

Table of Contents

Intro

We will learn the 'full stack' of modern deep learning systems by building our own library from scratch, and learn how to reverse engineer existing models. Neural networks come in different architectures, just as there are different computer hardware architectures (x86, RISC, ARM). The most popular at the moment is the transformer architecture behind large language models (Grok-n, GPT-n), but regular old neural networks learning some function continue to be useful in every field; the game is not over with advanced LLMs.

Limits of AI

Current popular AI is all foundation models like Grok-n, GPT-n, DALL-E, etc. Have you wondered whether, if we had infinite resources, infinite data, and perfect training algorithms with no errors (i.e., ideal models), we could use this type of model for everything, i.e., 'General AI'? Someone with help from the Beijing Academy of AI used category theory to model this scenario and see what is possible.

Curriculum

Build an AI framework:

Fuse the framework with multimodal capabilities:

Research

We learn some of the applied art of attempting to reverse engineer, or at least understand, what an AI model is actually doing: for example, whether it is lying to us or has some emergent abilities.

  • Mechanistic Interpretability (Neel Nanda@Google DeepMind)
    • Complete guide to doing research yourself in mech interp
    • Example material is here and 90% programming
      • Chapter 0
      • Chapter 1
      • Chapter 2
      • Chapter 3

Theory

The PhD core requirements of most schools consist of:

  • The mathematical theory of ML (supervised learning is statistical decision theory)
  • Models that generate their own learning samples (diffusion/GANs)
  • Optimizing some function under a set of constraints (convex and non-convex, distributed)
  • The mathematical model of RL and multi-armed bandits (decisions under uncertainty)
  • A bunch of higher dimensional probability/stats we won't take here
  • Causality

The goal is then to figure out how to improve these or come up with a better model/optimizer, and there's an endless stream of daily papers from around the world doing just that, dumped on arXiv or published in various ML journals.

Any AI doctor or robot space pilot is going to come from a causal learning model. In fact, every interesting problem you want predicted by AI is almost certainly a causal question, like 'if I set this to X, will Y happen?' or 'what would have happened if I had done Z instead of X?'
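A minimal sketch of why 'if I set this to X, will Y happen?' differs from ordinary prediction. The scenario below is entirely made up for illustration: a hidden confounder Z drives both a treatment X and an outcome Y, so the observed association between X and Y overstates X's true causal effect, which intervening (the do-operator) recovers.

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy world. Pass do_x to intervene on X directly."""
    z = random.gauss(0, 1)                               # hidden confounder
    x = do_x if do_x is not None else int(z + random.gauss(0, 0.1) > 0)
    y = 1.0 * z + 0.5 * x + random.gauss(0, 0.1)         # true effect of X is 0.5
    return x, y

n = 100_000

# Observational estimate E[Y|X=1] - E[Y|X=0]: biased upward, because Z
# pushes X and Y in the same direction.
obs = [sample() for _ in range(n)]
y1 = [y for x, y in obs if x == 1]
y0 = [y for x, y in obs if x == 0]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)

# Interventional estimate E[Y|do(X=1)] - E[Y|do(X=0)]: setting X breaks
# the link from Z to X, recovering the true effect of about 0.5.
do1 = [sample(do_x=1)[1] for _ in range(n)]
do0 = [sample(do_x=0)[1] for _ in range(n)]
causal = sum(do1) / n - sum(do0) / n
```

Here `naive` comes out far above 0.5 while `causal` lands near it; a purely predictive model trained on the observational data would answer the wrong question.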

Advanced Introduction to ML

Here is the same content as 10-715, i.e., the mathematical model of machine learning in 2025 (which is mostly just a bunch of graphs, themselves mostly DAGs).

Start Here: Scalar Calculus

The math we are doing is not symbolic; it's all numerical/analytical so it can be run on a computer system. We're just going to learn the math needed as we go.

  • Watch this derivative video by 3Blue1Brown which explains the dy/dx notation.

Anything that accumulates (speed, volume, interest, distance, produced units) has a function for the rate at which it is accumulating. In that 3Blue1Brown video, distance traveled is an accumulation, and the velocity (speed in scalar calculus) is the rate of that distance accumulation. Speed also accumulates, since it is measured in m/s or mph, so it too has a rate-of-change function, called acceleration. You can switch between both functions: recover the velocity function from acceleration by integrating, which we'll learn when it comes up.
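Since the math here is numerical, the accumulation/rate pairing above can be checked directly on a computer. A minimal sketch, assuming a made-up distance function s(t) = t² (so the true velocity is 2t): a central difference approximates the rate, and summing small slices (a crude integral) recovers the accumulation.

```python
def s(t):
    return t ** 2          # distance traveled by time t (toy example)

def derivative(f, t, h=1e-5):
    # central difference: (f(t+h) - f(t-h)) / (2h) approximates dy/dx
    return (f(t + h) - f(t - h)) / (2 * h)

v3 = derivative(s, 3.0)    # velocity at t=3; ds/dt = 2t, so about 6.0

def integrate(f, a, b, n=100_000):
    # midpoint rule: sum f over many thin slices of width h
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Accumulate the velocity 2t from 0 to 3 to recover the distance s(3) = 9.
dist = integrate(lambda t: 2 * t, 0.0, 3.0)
```

The two operations undo each other: differentiating the accumulation gives the rate, and integrating the rate gives the accumulation back.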

Linear Algebra

Watch from the Essence of linear algebra playlist:

  • Vectors
  • Linear combinations
  • Linear transformations
  • Matrix multiplication as composition
  • Nonsquare matrices as transformations between dimensions

Look up on YouTube what a transpose is (written xᵀ); we will see all this in depth shortly.
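A small sketch of the playlist's core ideas (the matrices and helper names below are made up for illustration): a matrix is a linear transformation of vectors, multiplying two matrices composes their transformations, and the transpose swaps rows and columns.

```python
def matvec(M, v):
    """Apply matrix M (list of rows) to vector v: the linear transformation."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """Multiply A @ B: the matrix of the composed transformation."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """xᵀ: rows become columns and vice versa."""
    return [list(row) for row in zip(*M)]

rot90 = [[0, -1], [1, 0]]   # rotate 90 degrees counterclockwise
shear = [[1, 1], [0, 1]]    # shear along the x-axis

v = [1, 0]
# Applying shear, then rot90, one transformation at a time...
step_by_step = matvec(rot90, matvec(shear, v))
# ...gives the same vector as applying the single composed matrix.
composed = matvec(matmul(rot90, shear), v)
```

That the two results match is exactly "matrix multiplication as composition" from the playlist.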

TODO Gradient

TODO Matrix Calculus

  • Matrix Calculus (MIT)
    • 8 lectures on calculus generalized to higher dimensions; none of it will be symbolic
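A hedged preview of what "calculus in higher dimensions, numerically" looks like (the function below is an arbitrary toy, not from the MIT lectures): for f(x) = x·x the gradient is the vector 2x, and we can verify that coordinate by coordinate with finite differences.

```python
def f(x):
    # toy scalar function of a vector: f(x) = sum of squares, gradient is 2x
    return sum(xi * xi for xi in x)

def grad(f, x, h=1e-5):
    """Numerical gradient: central difference along each coordinate."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

g = grad(f, [1.0, 2.0, 3.0])   # approximately [2.0, 4.0, 6.0]
```

This "gradient check" trick is how deep learning libraries test their analytic derivatives, and it previews the TODO Gradient section above.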
