AI in 2025

Intro

We will learn the 'full stack' of modern deep learning systems by building our own library from scratch in a language of our choice. Neural networks come in different architectures, just as there are different computer hardware architectures (x86, RISC-V, ARM). The most popular right now is the transformer architecture used in large language models like ChatGPT. We'll learn how to reverse engineer these models.

Neel Nanda has helpfully produced a suggested curriculum for reverse engineering transformers. He even offers mentorship through the ML Alignment & Theory Scholars program, or you can claim bounties on GeoHot's TinyGrad as a freelancer.

I broke this up into days to see how long it would take; each 'day' simply represents a block of time in a single day where I worked on something here. You don't have to do this consecutively every day.

Many iterations of this

I nuked this curriculum and stalled it dozens of times for the simple reason that it was too expensive to get into, but now the GPU bubble has finally popped, giving us plebs access to hardware to train our own models.

I learned the theory by taking Waterloo's CS485 here, taught by the Israeli author of Understanding Machine Learning; it is the best course for machine learning you will ever take and goes through the entire mathematical model assuming you have zero background. The theory of learning will not change. CMU still uses this book in their PhD-track ML course.

Limits of AI

Current popular AI is all foundation models like GPT-n, DALL-E, etc. Have you ever wondered: if we had infinite resources, infinite data, and perfect training algorithms with no errors, could we use this type of model for everything, aka 'General AI'? Someone, with help from the Beijing Academy of AI, used category theory to model this scenario and see what is possible.

Curriculum

  • 10-414 Deep Learning Systems (CMU)
    • Use any language you want
  • CS 479 Neural Networks (Waterloo)
    • Survey on neural networks from the perspective of theoretical neuroscience
    • Intuition building while hacking through 10-414
  • Mechanistic Interpretability (Google DeepMind)
    • Neel's curriculum on reversing a trained neural network
  • Matrix Calculus (MIT)
    • All the calculus we need generalized to higher dimensions
    • IAP course or 'Independent Activities Period' where faculty can run a 4-week course

Linear Algebra options

There is no shortage of excellent linear algebra courses. These are only the ones I have reviewed; there are tons more if you don't want to do any of these.

  • Coding the Matrix (Brown)
    • All the recorded lectures are open to anyone
    • Programmed w/Python and has a (nonfree) book
      • I did this book in OCaml and it worked out fine

Created by Philip Klein, a name you will recognize if you take any algorithms course. He uses the complex field mostly to illustrate how linear transformations (mappings) work. The SVD is covered, which we'll need. The book is at least cheap ($35) and worth buying, or you can use Anna's Archive or whatever the latest Library Genesis domain is to get a pdf, but it will likely be missing a lot of graphical content. This is a very good course for anyone interested in game graphics, or for anyone taking an algorithm design course who wants to manipulate graphs using linear algebra. If you hate everything else here then do this; you'll be fine.
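As a quick preview of why we'll want the SVD later, here is a minimal sketch (assuming NumPy): any matrix factors into orthogonal directions and a diagonal scaling, and keeping only the largest singular values gives the best low-rank approximation.

    import numpy as np

    A = np.random.default_rng(0).standard_normal((5, 3))

    # Singular value decomposition: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    assert np.allclose(A, U @ np.diag(s) @ Vt)

    # Rank-1 approximation: keep only the largest singular value
    A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
    print(np.linalg.norm(A - A1))   # equals sqrt of the sum of the discarded s**2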

  • Modernized 18.06 (MIT)
    • Alan Edelman's course that teaches intuition
    • No pivots, no echelon forms, no free variables, no hand computation
    • Treats the SVD as its own independent thing, not some Eigenwhatever
    • Almost entirely programmed
    • Lectures got locked up by MIT logins but were once open
    • All other materials like solutions to homework/recitations are open

Strang had a stranglehold on MIT's linear algebra curriculum for decades and Edelman came along and shredded it. Now that Strang is retired, maybe we'll see the new 18.06 lectures soon. I simply can't follow any book or lecture by Strang; it's just disjointed rambling to me.

  • Linear Algebra Done Right - Sheldon Axler
    • New, completely free 4th edition
    • He upgraded his cat too, the one that was always in the about-the-author pic
    • Abstract treatment which we need but is considered a second course
    • Contains some calculus
    • Seems designed to prepare students for functional analysis (and thus ML)

He banishes determinants to the end of the book for a good reason, all laid out in a paper you can find on arXiv. It's great, and the proofs are not difficult; the way he writes them is clear and intuitive. He doesn't explain what is going on at all, however, though there are some short YouTube geometry examples he made for the book; then again this is a 'second course' so he doesn't need to. Every mathematician will tell you that is how you learn: figure it out yourself and piece it all together, seeing for example that function composition is non-commutative just like matrix multiplication, ergo a matrix is some kind of encoding of a function (mapping). There are no solutions either, but you don't need them; you can write any proof you want, who cares, you learn by going back to the material and digging through it, trying to write an argument to yourself for why something is true.
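A minimal sketch (assuming NumPy) of that last point: applying B and then A to a vector is the same as multiplying by the single matrix A @ B, the product is associative just like function composition, but the order of the factors matters.

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])   # rotate 90 degrees counterclockwise
    B = np.array([[2.0, 0.0],
                  [0.0, 1.0]])    # stretch the x-axis by 2

    x = np.array([1.0, 1.0])

    # Applying B then A is one linear map, encoded by the product A @ B
    assert np.allclose(A @ (B @ x), (A @ B) @ x)   # associativity

    # but composition order matters: stretch-then-rotate != rotate-then-stretch
    print(A @ B)   # [[ 0. -1.] [ 2.  0.]]
    print(B @ A)   # [[ 0. -2.] [ 1.  0.]]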

These notes were his lectures to accompany the book by Friedberg, Insel and Spence. CMU uses the David Lay book. A lot of other universities like Stanford use VMLS (free). VMLS also has a Julia language companion.

  • Math 8530 Graduate Linear Algebra (Clemson)
    • Theory of vector spaces not abstract modules
    • Loosely follows Peter Lax's book chapters 1-9 and Halmos' book on finite-dimensional vector spaces
      • Edelman took this in high school

We could take this. It's highly abstract, but not impossibly abstract like module theory, which replaces the scalars in a vector space with entire rings. Peter Lax, when he wrote this book, was the foremost expert on computational partial differential equations. PDEs are now being 'solved' by neural networks, so there is a quiet scientific field going on right now, physics-informed machine learning, which injects physics knowledge into neural networks to supercharge semi-supervised learning. This is where all the early revolutionary AI will come from once we can accelerate training.
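To make 'injecting physics knowledge' concrete, here is a minimal sketch (assuming PyTorch; the toy equation and every name below are my own illustration, not taken from any of the courses): a tiny network u(x) is trained so that its derivative satisfies du/dx = -u with u(0) = 1, i.e. the differential equation itself becomes part of the loss instead of labeled data.

    import torch

    # Toy physics-informed loss: learn u(x) with du/dx = -u and u(0) = 1
    # (true solution exp(-x)); the physics enters as a residual term.
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        x = torch.rand(64, 1, requires_grad=True)        # collocation points in [0, 1]
        u = net(x)
        du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        physics = ((du_dx + u) ** 2).mean()              # residual of du/dx = -u
        boundary = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # u(0) = 1
        loss = physics + boundary
        opt.zero_grad(); loss.backward(); opt.step()

    print(net(torch.tensor([[1.0]])).item())   # should approach exp(-1) ≈ 0.37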

  • Geometric Linear Algebra NJ Wildberger
    • Tells you what is actually going on
    • Functionals are covered which we'll need for Matrix calculus

We will do this, but alongside one of the above resources.

Have you ever wanted to redo everything from scratch using linear algebra? There's a book for that.

Linear Algebra resources I'll do here

  • Axler's latest book
    • We have no choice but to take this for higher-dimensional ML

These are optional; you can pick something else to gain intuition:

  • Wildberger's intuitive lectures to figure out Axler
  • Po-Shen Loh's Matrices Teacher Professional Development
  • Po-Shen Loh's Determinant Teacher Professional Development
  • Po-Shen Loh's Putnam Seminar on Linear Algebra

Watching Po-Shen Loh explain why matrix multiplication works is amazing. He's definitely my favorite mathematician because he was just a regular guy who failed linear algebra the first time he took it, then went on to become the coach of the US olympiad team, after thinking they were going to toss him in his second year. He survived by going for broke and martingaling on cruise control (making heavy bets to get yourself out of a debt hole): he decided to prove everything in the olympiad seminar himself in the most creative way possible, a huge gamble that paid off.

Calculus resources I'll do here

We need to know basic applied math modeling. We could take MIT's calculus sequence on OpenCourseWare, but that would take at least 6 months and assumes you are an MIT student who already has the prerequisite high school background.

  • Mathematical Modeling and Applied Calculus by Kilty & McAllister
    • Use Anna's Archive or the latest Library Genesis domain to get a pdf, or buy it and give it to someone after
    • Partial derivatives, vectors, it's all here
    • Uses RStudio with a GPL licensed package here

The Matrix calculus seminar we take will fill in the rest.

Day 1 Neuroscience Models

This is 'learn AI from scratch', so that's what we're going to do, starting with brain models. I'm including this at the beginning so we can see what kinds of math models we need to teach ourselves.

If we're going to model a brain with a neural network, we should see how it works in the real world. @6:39 his example of an action potential, using a water bucket being filled and dumped, is exactly the same intuition as a potential function in a data structures analysis course, used for amortized analysis. A dynamic array, once filled up to a certain percentage, needs to perform the costly action of doubling itself to make more free space; you take the infrequent cost of doubling and average it over many reads/writes to come up with an individual cost for all array operations. There are some differential equations here, but he draws out what they mean; we don't have to know them, we just need to know that this is a non-linear model and that everything to do with neurons is electrical. Hz (hertz) is spikes per second.
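A minimal sketch of that amortized-analysis intuition (the class and counter here are made up for illustration): a dynamic array that doubles when full pays a big occasional cost, but averaged over all appends the cost per operation stays constant.

    # Toy dynamic array: count how many element copies the doublings cause,
    # then average that work over all appends (the amortized cost).
    class DynArray:
        def __init__(self):
            self.capacity = 1
            self.size = 0
            self.copies = 0                # total elements moved during resizes

        def append(self, _item):
            if self.size == self.capacity:
                self.copies += self.size   # copy everything into a 2x bigger array
                self.capacity *= 2
            self.size += 1

    arr = DynArray()
    n = 1_000_000
    for i in range(n):
        arr.append(i)

    # Total copy work is < 2n, so the amortized cost per append is O(1).
    print(arr.copies / n)   # about 1.0, and it stays bounded as n grows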

You can access all these notebooks he's using for the demo here if interested.

I don't have a physics background either, but he explains the dynamics of a membrane well enough that we can get the gist of what's going on. The difference between this first model and the previous Hodgkin-Huxley model is that we are no longer modeling the spikes themselves, only the potential building up to the point where a spike occurs and then a reset, which reduces the complexity. @18:44 even simpler models. These sigmoidal models are covered in the calculus book we'll do shortly after this. @22:20 ReLU and SoftMax are introduced.
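For reference, here is a minimal sketch (assuming NumPy) of the activation functions mentioned at those timestamps, written straight from their standard definitions.

    import numpy as np

    def sigmoid(x):
        # squashes any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # passes positive values through, zeroes out negatives
        return np.maximum(0.0, x)

    def softmax(x):
        # turns a vector of scores into a probability distribution;
        # subtracting the max is the usual numerical-stability trick
        e = np.exp(x - np.max(x))
        return e / e.sum()

    z = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(z))   # [0.119 0.5   0.953]
    print(relu(z))      # [0. 0. 3.]
    print(softmax(z))   # [0.006 0.047 0.946], sums to 1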

Day 2 Applied Calculus

As mentioned above, obtain the book Mathematical Modeling and Applied Calculus by Joel Kilty and Alex McAllister: either buy it, borrow it, or download it from Anna's Archive or a working Library Genesis domain. All the simplified neuron models we just watched in the CS 479 lectures are in this book, and there's an RStudio package to do the exercises with. You can also directly interface with any R library using JuliaCall.

TODO

