AI in 2024

Intro

There's a new field called Mechanistic Interpretability (MI) which reverse engineers neural networks, similar to how you'd reverse engineer a compiled program. Neural networks come in different architectures, just as there are different instruction set architectures (x86, RISC, ARM). The most popular architecture for large language models like ChatGPT is the transformer.

Neel Nanda has helpfully produced a concrete curriculum for this and even offers mentorship through the ML Alignment & Theory Scholars program.

It requires knowing the internals of ML software inside out, so a side effect is that you can work as an 'ML engineer', build your own tiny ML library, or even get paid bounties to work on GeoHot's TinyGrad.

Limits of AI

Current popular AI is dominated by foundation models like GPT-n, DALL-E, etc. Have you ever wondered: if we had infinite resources, infinite data, and perfect training algorithms with no errors, could we use this type of model for everything? Someone, with help from the Beijing Academy of AI, used category theory to model this scenario and see what is possible.

PreReqs

Looking at his prerequisite background, I plotted the fastest possible way to satisfy it:

  • Derivatives, Gradients, Probability

The Julia calculus library has a guide that is essentially a scalar, vector, and differential calculus course. Alan Edelman's Computational Thinking has some lectures teaching autodiff (derivatives computed via a compiler feature), random variables, and basic statistics.
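
To make autodiff concrete, here is a minimal Python sketch of forward-mode automatic differentiation using dual numbers. It only illustrates what autodiff computes; the Julia tools above do this with compiler-level source transformation, and the function f below is a made-up example.

    # Forward-mode autodiff with dual numbers: carry (value, derivative)
    # through every operation. Illustration only, not how Julia's
    # autodiff libraries are actually implemented.

    from dataclasses import dataclass

    @dataclass
    class Dual:
        val: float   # f(x)
        dot: float   # f'(x), carried alongside the value

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other, 0.0)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other, 0.0)
            # product rule: (fg)' = f'g + fg'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    def f(x):
        return x * x * x + 2 * x   # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

    x = Dual(2.0, 1.0)             # seed the input's derivative with 1
    print(f(x))                    # Dual(val=12.0, dot=14.0): f(2)=12, f'(2)=14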

Our goal is to take just enough calculus prerequisites to understand MIT's Matrix Calculus, where derivatives and the chain rule are generalized to higher dimensions. It's an IAP ('independent activities period') course, a format where faculty can run a 4-week class, so it's not a long course.
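
As a taste of what "the chain rule in higher dimensions" means: the Jacobian of a composition is the product of the Jacobians, J_{f∘g}(x) = J_f(g(x)) · J_g(x). The small NumPy check below uses made-up functions f and g and compares the analytic product against a finite-difference approximation.

    # Numerical check of the higher-dimensional chain rule:
    # J_{f.g}(x) = J_f(g(x)) @ J_g(x). Functions chosen only for illustration.

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])        # g(x) = A x  (R^3 -> R^2), so J_g = A

    def g(x):
        return A @ x

    def f(y):
        return np.array([y[0] * y[1], np.sin(y[0])])   # R^2 -> R^2

    def J_f(y):
        return np.array([[y[1],         y[0]],
                         [np.cos(y[0]), 0.0]])

    x = np.array([0.1, 0.2, 0.3])
    analytic = J_f(g(x)) @ A               # chain rule: multiply the Jacobians

    # finite-difference approximation of the Jacobian of f(g(x))
    eps = 1e-6
    numeric = np.zeros((2, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = eps
        numeric[:, j] = (f(g(x + dx)) - f(g(x))) / eps

    print(np.allclose(analytic, numeric, atol=1e-4))   # True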

  • Linear Algebra

Alan Edelman's modernized 18.06 is here, and we can use Strang's new 6th edition book, which contains chapters on the SVD. Edelman designed the course to teach intuition and pattern spotting, and the assignments and solutions are available. This should give us enough background to try Axler's book (or Terence Tao's notes) as suggested by Neel. A second course you may want to do is MIT's Matrix Methods; it covers basically everything neural network related, including Fourier analysis for understanding superposition.
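
Since the SVD features so heavily in Strang's book, here is a quick NumPy illustration of the factorization and the best low-rank approximation it gives. The matrix is random, purely for demonstration.

    # SVD: any matrix factors as U @ diag(S) @ Vt, and truncating the
    # singular values gives the best low-rank approximation (Eckart-Young).

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 4))

    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    print(np.allclose(M, U @ np.diag(S) @ Vt))       # True: exact reconstruction

    k = 2                                            # keep the 2 largest singular values
    M_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    print(np.linalg.norm(M - M_k, 2), S[k])          # spectral error equals S[k]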

  • Neural Networks

CS 479 Neural Networks is an interesting Waterloo survey of neural networks from the perspective of theoretical neuroscience (backpropagation, gradients/chain rule, basics of ML).
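
To make the backpropagation/chain-rule part concrete, here is a from-scratch sketch of a tiny two-layer network trained with hand-written gradients in NumPy. The data and layer sizes are arbitrary toy values, not anything from the course.

    # Manual backpropagation for a tiny two-layer network on a toy
    # classification problem. Each backward line is one chain-rule step.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 3))               # 16 samples, 3 features
    y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary labels

    W1 = rng.standard_normal((3, 8)) * 0.1
    W2 = rng.standard_normal((8, 1)) * 0.1
    lr = 0.5

    for step in range(200):
        # forward pass
        h = np.tanh(X @ W1)                        # hidden activations
        p = 1 / (1 + np.exp(-(h @ W2)))            # sigmoid output
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

        # backward pass: apply the chain rule layer by layer
        dlogits = (p - y) / len(X)                 # dL/d(h @ W2)
        dW2 = h.T @ dlogits
        dh = dlogits @ W2.T
        dW1 = X.T @ (dh * (1 - h**2))              # tanh'(z) = 1 - tanh(z)^2

        W1 -= lr * dW1
        W2 -= lr * dW2

    print(round(float(loss), 4))                   # final loss, well below the initial value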

There's also 11-785 Intro to Deep Learning, which covers some newer architectures. 10-414 Deep Learning Systems is about how libraries like PyTorch and TensorFlow work, programming it all from scratch. After these prereqs, Neel Nanda will teach us everything about transformers.
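
As a preview of where this is heading, the core operation of the transformer is scaled dot-product attention. Below is a minimal NumPy sketch of a single attention head; the shapes and the random projection matrices are toy values for illustration only, not trained weights.

    # Scaled dot-product attention, the building block of transformers:
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)    # for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # similarity between queries and keys
        return softmax(scores, axis=-1) @ V        # weighted mixture of values

    rng = np.random.default_rng(0)
    seq_len, d_model = 5, 8
    x = rng.standard_normal((seq_len, d_model))

    # one attention head: projections are random here, not learned
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
    out = attention(x @ Wq, x @ Wk, x @ Wv)
    print(out.shape)                               # (5, 8): one output per position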

TODO

