AI in 2025
Intro
We will learn the 'full stack' of modern deep learning systems by building our own library from scratch, and learn how to reverse engineer existing models. Neural networks come in different architectures, just as computer hardware does (x86, RISC, ARM). The most popular right now is the transformer architecture behind large language models (Grok-n, GPT-n), but regular old neural networks that learn some function continue to be useful in every field; the game is not over just because we have advanced LLMs.
Limits of AI
Current popular AI is all foundation models: Grok-n, GPT-n, DALL-E, etc. Have you wondered whether, given infinite resources, infinite data, and perfect training algorithms with no errors (aka ideal models), we could use this type of model for everything, aka 'General AI'? Someone, with help from the Beijing Academy of AI, used category theory to model this scenario and see what is possible.
Curriculum
Build an AI framework:
- 10-714 Deep Learning Systems: Algorithms and Implementation (CMU)
- Build from scratch an entire framework for deep learning
- References to use as we go:
- notes (MIT)
- short lectures from the perspective of theoretical neuroscience
- book Understanding Deep Learning
Fuse the framework with multimodal capabilities:
- MAS.S60 How2AI (MIT)
- How to AI (Almost) Anything
- YouTube lectures
Research
We learn some of the applied art of reverse engineering, or at least understanding, what an AI model is actually doing: for example, whether it is lying to us or has some emergent abilities.
Theory
The PhD core requirements of most schools consist of:
- The mathematical theory of ML (supervised learning is statistical decision theory; see the sketch after this list)
- Models that generate their own learning samples (diffusion/GANs)
- Optimizing some function under a set of constraints (convex and non-convex, distributed)
- The mathematical model of RL and multi-armed bandits (decisions under uncertainty)
- A bunch of higher dimensional probability/stats we won't take here
- Causality
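To make that first bullet concrete, here is a minimal sketch (plain NumPy, all names and numbers invented for illustration) of supervised learning as decision theory: among linear predictors, choose the one that minimizes the empirical risk, i.e. the average squared loss over the observed samples.

```python
import numpy as np

# Supervised learning as statistical decision theory: pick the predictor
# that minimizes the empirical risk (average squared loss) over samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=100)  # noisy line y = 2x + 0.5

X = np.column_stack([x, np.ones_like(x)])         # design matrix [x, 1]
w, b = np.linalg.lstsq(X, y, rcond=None)[0]       # minimize mean((Xw - y)^2)

risk = np.mean((X @ np.array([w, b]) - y) ** 2)   # empirical risk of the fit
print(f"w = {w:.2f}, b = {b:.2f}, empirical risk = {risk:.4f}")
```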
The goal is then to figure out how to improve on or come up with a better model/optimizer, and there is an endless stream of daily papers from around the world doing just that, dumped on arXiv or published in various ML journals.
Any AI doctor or robot space pilot is going to come from a causal learning model. In fact, every interesting problem you want predicted by AI is almost certainly a causal question, like 'if I set this to X, will Y happen?' and 'what would have happened if I had done Z instead of X?'
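Here is a minimal sketch of that distinction in a toy structural causal model (everything here is invented for illustration): when a confounder Z drives both X and Y, observing X = 1 and setting X = 1 give different answers for Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural causal model with a confounder Z:
#   Z ~ Bernoulli(0.5),  X := Z (90% of the time),  Y := X + 2Z + noise
z = rng.integers(0, 2, n)
x_obs = np.where(rng.random(n) < 0.9, z, 1 - z)
y_obs = x_obs + 2 * z + rng.normal(0, 0.1, n)

# Observational: E[Y | X=1] -- seeing X=1 also tells us Z is probably 1.
print("E[Y | X=1]     =", y_obs[x_obs == 1].mean())   # ~2.8

# Interventional: E[Y | do(X=1)] -- we *set* X, cutting the Z -> X arrow.
y_do = 1 + 2 * z + rng.normal(0, 0.1, n)
print("E[Y | do(X=1)] =", y_do.mean())                # ~2.0
```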
Advanced Introduction to ML
This covers the same content as 10-715, i.e., the mathematical model of machine learning in 2025 (which is mostly just a bunch of graphs, themselves mostly DAGs):
- CS 485 Theory of ML (Waterloo)
- All the lectures taught by the author of the book
- MIT 6.S184 Diffusion Models, 6 lectures on the math of diffusion models
- Multi-Armed Bandits or RL: Theory and Algorithms though we won't take all of this
- Elements of Causal Inference: Foundations and Learning Algorithms
Start Here: Scalar Calculus
The math we are doing is not symbolic; it is all numerical/analytical, so it can run on a computer system. We're just going to learn the math needed as we go.
- Watch this derivative video by 3Blue1Brown which explains the dy/dx notation.
Anything that accumulates (speed, volume, interest, distance, produced units) has a function describing the rate at which it is accumulating. In that 3Blue1Brown video, distance traveled is an accumulation, and velocity (speed, in scalar calculus) is the rate of that distance accumulation. Speed itself also accumulates, since it is measured in m/s or mph, so it too has a rate-of-change function, called acceleration. You can switch between the two functions: recover the velocity function from acceleration by integrating, which we'll learn when it comes up.
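Here is a minimal numeric sketch of that pairing (toy functions of our own choosing, using NumPy's finite-difference and cumulative-sum routines): differentiate accumulated distance to get velocity, differentiate again for acceleration, then integrate acceleration back into velocity.

```python
import numpy as np

t = np.linspace(0, 10, 1001)              # time grid, dt = 0.01
dt = t[1] - t[0]

distance = t ** 2                         # accumulated distance s(t) = t^2
velocity = np.gradient(distance, dt)      # rate of accumulation: ds/dt ~ 2t
acceleration = np.gradient(velocity, dt)  # rate of velocity: dv/dt ~ 2

# Going back the other way: integrating acceleration (a running sum of
# rate * dt) recovers velocity up to its starting value v(0).
v_recovered = velocity[0] + np.cumsum(acceleration) * dt

print(velocity[500], 2 * t[500])          # ~10.0 vs the exact 2t = 10.0
print(v_recovered[500])                   # ~velocity[500]
```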
Linear Algebra
Watch from the Essence of linear algebra playlist:
- Vectors
- Linear combinations
- Linear transformations
- Matrix multiplication as composition
- Nonsquare matrices as transformations between dimensions
Look up on YouTube what a transpose (xT) is; we will see all of this in depth shortly.
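As a preview, here is a minimal NumPy sketch (toy matrices of our own choosing) of two ideas from the playlist plus the transpose: multiplying matrices composes their transformations, a nonsquare matrix maps between dimensions, and the transpose just swaps rows and columns.

```python
import numpy as np

# Matrix multiplication as composition: applying B then A to a vector
# equals applying the single composed matrix A @ B.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])              # rotate 90 degrees counterclockwise
B = np.array([[2.0, 0.0],
              [0.0, 2.0]])               # scale by 2
v = np.array([1.0, 1.0])
print(A @ (B @ v))                       # rotate(scale(v)) -> [-2.  2.]
print((A @ B) @ v)                       # same result, one composed matrix

# A nonsquare matrix transforms between dimensions: here 3D -> 2D.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(M @ np.array([3.0, 4.0, 5.0]))     # [3. 4.]

# The transpose swaps rows and columns: M.T[i, j] == M[j, i].
print(M.T.shape)                         # (3, 2)
```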
TODO Gradient
TODO Matrix Calculus
- Matrix Calculus (MIT)
- 8 lectures on calculus generalized to higher dimensions; none of it will be symbolic