Let's Read the Art of Computer Programming

Intro
Prereq: Assembly
- New Orleans
- Sydney
- Hanoi
- Cusco
- Reykjavik
- Whitehorse
- Montevideo
Prereq: Architecture
Prereq: Calculus
- Crash course
1.1 Algorithms

Intro

I'm going to start with Volume 1: Fundamental Algorithms 3rd edition by Don Knuth and keep going. These are now the most relevant computer science books as industry has shifted to creating their own custom hardware and software stacks. Data is now so massive we can no longer move it around to be computed and the state-of-the-art architecture these days perform computation at the data. Knuth's MMIX (RISC architecture) he created for these books has a free meta-simulator program which is a test bed to try out different chip configurations. You can change instruction throughput, pipeline configurations, branch prediction, cache sizes and associativity, then test running various algorithms on a custom config. This is exactly what you do when writing a prototype FPGA you plan to later manufacturer into a chip.

The physical book itself is a representation of the careful work that has gone into it's writing with high quality typography, paper, and a binding where no matter what page you flip to the book stays flat. If you were to ever buy a physical book this would be one of those books. The pdf version Knuth paid the Berkeley Mathematical Sciences Publishers to create a compressed searchable text with hyperlinks throughout the document and even guaranteed that the final version was correct in all notation/figures. He tells readers to avoid any kind of other ebook format like Kindle where the notation is trashed.

Get the book(s)

See the book site for details or where to get official pdf/copies. I'm using the 3rd edition look for the most recent printings that fix errata. If you can't buy them use Library Genesis or Anna's Archive to get a pdf.

The Bill Gates quote

Everybody talks about the famous old quote in the back of volume 1 by Gates: 'you should definitely send me a resume if you can read the whole thing'. I found a more recent quote from 2005:

BILL GATES: ..say somebody came for an interview and they said "Hey, I read the Art of Computer Programming, that’s all I ever read, I did all the problems", I would hire them right then.

MARIA KLAWE: You’d hire them right then?

BILL GATES: Yeah, that’s right.

MARIA KLAWE: So would I.

BILL GATES: Even if they didn’t do the double-star problems, I mean, just the fact that they’d read the whole book, you know, those are the kinds of things you need to know to be a good programmer. Actually, there’s some of that you don’t even need to know, but the kind of algorithmic thinking that’s promoted there." (2005 interview with then Princeton Dean of Science).

I'm not sure what the double-star problems are maybe the earlier editions had this labelling scheme. I also don't think the people who run HR at bigtech have ever heard of these books so I wouldn't read these and expect a job in 2024.

MMIX/MIX

Volumes 1-3 still contain the original MIX machine and there exists GNU MDK or other software to run MIXware meaning compile .mixal files to .mix and simulate running them either with mixvm, gmixvm GUI, or a scheme interpreter.

There exists the MMIX Supplement by Martin Ruckert to replace the 1960s MIX architecture with MMIX (RISC) in volumes 1-3 which Knuth has announced on his book site he plans to someday replace. This arch has MMIXware tools and documented their design with the book MMIXware: A RISC Computer for the Third Millennium and all the code is written in literate programming style so you can read the programs like you would any other book. MMIX software there is mixmasters version or the 2017 sealed in time Knuth version you compile from c-web sources that are extensively documented.

I was originally going to do this in the original MIX until Chapter 4 but urgency at my job to start writing customized hardware means I have to start with MMIX. We were always using FPGAs but now they have literally taken over because x86 can no longer scale due to many problems we'll learn about so if you can extract some program logic and manufacture that into a hardware accelerator it's cheaper than trying to scale x86.

Optimizing compilers are dead

Another reason to learn this material is given by DJ Bernstein's talk a few years ago about the death of optimizing compilers, see these slides where at the end Knuth describes how hand coded assembly will always be faster than the best optimizing compilers.

Prereq: Assembly

The preface says we should familiar enough with programming that we've written and tested a few programs, which means assembly/machine language programs. I assume you already know basic programming from somewhere else but not an assembly language. We can learn that doing microcorruption.com CTF and having a visual interface to step through all the code in the debugger to see what it does, see how registers are manipulated etc.

Starting the tutorial on microcorruption. At 18/74 in the tutorial all values are in base 16 or hexadecimal. There's many online tutorials to convert or look at a binary to hex table and memorize A (10) and F (15) then the rest are easy. 0x(hex) examples: 0x2 is 2 or 2¹, 0x20 is 32 or 2⁵, 0x200 is 512 or 2⁹. If you forgot how binary works watch this.

At 35/74 in the tutorial this is how you read the memory dump:

      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
4390: 52 45 02 00 9c 43 64 00 82 44 50 44 74 65 73 74

4390 points to 52, 439c points to 74, if you look in the register state box sp (stack pointer) points to 439a or 50. These are 1 byte each in size or 8 bits. At 44/74 in the tutorial if you move back -0x8 or 8 bytes, count back 8 times or each row of 4 hex values has 2 bytes, so move back 4 rows. If you see 0x2(r15) that means whatever the memory r15 points to, skip 2 bytes. You'll pick it up as we step through the code, and we will step through a lot of code.

There's a manual for the fake lock we are breaking too.

New Orleans

The comparison instruction wants a 7 char password, the 'create password' sub-routine generates a 7 char password, try it just entering the 7 hex numbers in one password string, success.

Sydney

Set breakpoint at most interesting function which looks like check password. Enter 'test' as text for password, then c to zip to the breakpoint. Pressing s and stepping, r15 is pointing to 439c where we see 7465 7374 or 'test'. Notice the instructions however:

448a:  bf90 2b5d 0000   cmp 0x5d2b, 0x0(r15)
4492:  bf90 7021 0200   cmp #0x2170, 0x2(r15)
449a:  bf90 3256 0400   cmp #0x5632, 0x4(r15)
44a4:  bf90 5467 0600   cmp #0x6754, 0x6(r15)

It's comparing each byte in the memory that r15 points to some constant hex values by offsetting 2 bytes each time. They are in little endian format, #0x5d2b is represented as 2b5d in the instruction, #0x2170 is 7021 etc., so take all the comparison instructions and enter the password as little endian format and success, operatives inside again.

Note the actual code/instruction the machine uses is bf902b5d0000 or bf9070210200 if you read the 'LockIT Pro' manual the opcode is cmp (compare) source, destination which is for our visual benefit only in the debugger and for writing assembly so even with machine code there is some syntax abstraction.

Hanoi

Passwords must be between 8-16 chars so I entered a gigantic password and while stepping through the execution noticed it overflowed to the second memory location 2410. Ctrl-f the instructions for 2410 and there's a comparison there so I overflowed the hex password to 17 digits with that comparison at 2410.

Cusco

Entered another gigantic password, noticed it overflowed to where the program counter (the next instruction) is so overflowed it to change the pc to 'unlock door' instruction in little endian format.

Reykjavik

We can't see all the disassembly and have to step watching 'Current Instruction' box. Enter a single character password as a test and step until you see a cmp for the address in r4 offset by -0x24 or 36. Move back that amount in memory and you're at the password you just entered, so enter (little endian) whatever the cmp constant is.

Whitehorse

They took away the function to unlock the door, we have to read the manual about how the HSM 2 model works. In 3.1 This enables the CPU to trigger software interrupt 0x7F to directly trigger the door lock to unlock so we want to bamboozle this code to somehow run 0x7F. As usual start with a gigantic password and see where it is stored. Mine begins at memory location 347c. Looking at the conditional unlock door function I want to push 0x7f the unlock command instead of 0x7e the password comparison command, and then call that interrupt after.

If you overflow your password you'll see in the 'current instruction' box it will run whatever we put there, so try the instructions 30127f00b0123245 which in line 445c we see pushes the door unlock code to the stack and calls an interrupt to pop the stack which will signal to open the door.

Montevideo

Two new functions strcpy and memset. Memset appears to clear the password storage memory and strcpy removes all null bytes 00 out of our password entry so if we want to use 0x7f or 7f00 we have to steal some code to add/subtract or AND off some bits to avoid strcpy deleting our payload. I used 32 junk filler + 0244 + payload as a hex password after many trial and error getting the overflow to lineup after strcpy killed my 0044 address.

You can play with instructions here and use any hex calculator. I added some arbitrary value to 0x7f, moved it to a register, pushed it to the stack and called that interrupt again. Success. We achieved enough of the programming prereqs.

Prereq: Architecture

Onur Mutlu at ETH Zurich every year publishes free and open courses on computer architecture which covers state of the art techniques, hardware security and future architecture like RISC-V. We can view these as we work through the MMIX chapter.

The undergrad course is Digital Design & Computer Architecture w/YouTube playlist here and it's pretty much essential casual viewing to understand how MMIX works. There's no physics or math background needed. These are live streamed lectures so contain breaks and aren't as long as the running time suggests. The graduate course is Computer Architecture and the labs are exactly what MMIXware is: building a cache timing simulator where you can experiment with block size, associativity, change replacement policies, etc.

Prereq: Calculus

In the preface 'a knowledge of basic calculus will suffice' is stated however basic calculus to Knuth probably means something like the following. Go to this jstor archive of an interview with Knuth and copy the DOI into Sci-Hub to read the pdf:

At Case, I spent hours and hours studying the mathematics book we used -Calculus and Analytic Geometry by Thomas- and I worked every supplementary problem in the book. We were assigned only the even-numbered problems, but I did every single one together with the extras in the back of the book because I felt so scared (of failing). I thought I should do all of them. I found at first that it was very slow going, and I worked late at night to do it. I think the only reason I did this was because I was worried about passing. But then I found out that after a few months I could do all of the problems in the same amount of time that it took the other kids to do just the odd-numbered ones. I had learned enough about problem solving by that time that I could gain speed, so it turned out to be very lucky that I crashed into it real hard at the beginning.

The second part of the interview:

We should teach people things with an emphasis on how they were discovered. I've always told my students to try to do the same when reading technical materials; that is, don't turn the page until you have thought a while about what's probably going to be on the next page, because then you will be able to read faster when you do turn the page. Before you see how to solve the problem, think about it. How would you solve it? Why do you want to solve the problem? All these questions should be asked before you read the solution. Nine times out of ten you won't solve it yourself, but you'll be ready to better appreciate the solutions and you'll learn a lot more about the process of developing mathematics as you go. I think that's why I got stuck in Chapter One all of the time when reading textbooks in college. I always liked the idea of "why?" Why is it that way? How did anybody ever think of that from the very beginning? Everyone should continue asking these questions. It enhances your ability to absorb. You can reconstruct so much of mathematics from a small part when you know how the parts are put together. We teach students to derive things in geometry, but a lot of times the exercises test if they know the theorem, not the proof. To do well in mathematics, you should learn methods and not results. And you should learn how the methods were invented.

You can see the exact version Knuth used on archive.org the 1950s version w/supplementary problems all in the back but the 1960s 3rd edition has the same content the publisher moved the supplementary problems to the end of each chapter labelled 'misc problems'. There's a version on libgen (95MB size because HD scanned). Herb Gross lectures from MIT OCW that cover the same book are Single Variable Calculus, Multivariable Calculus, Complex Variables, Differential Eq, Linear Alg.

Thomas' book is 900+ pages and took Knuth a year to complete with lectures and teaching assistants while he was at Case Tech. Since we are already doing countless problems in TAOCP including writing our own derivatives program in volume 1 we can begin with a crash course and pick it up later when needed.

Crash course

I am making a calculus crash course for all workshop here the primary goal being integration to do asymptotics. Take what you need when it comes up.

1.1 Algorithms

Whenever Knuth cites a reference/paper you can use Sci-Hub to get a copy.

The 'quadruple' definition of what an algorithm is will make sense when you try it yourself by hand. First we have f, which defines the computational steps:

f(n) = n (return the result and terminate, we aren't calling another function)
f(m,n) maps to f(m,n,0,1), the zero being just a placeholder to enter to the next step.
Step 1: f(m,n,r,1) maps to f(m,n, remainder of m divided by n, 2)
Step 2: f(m,n,r,2) maps to f(n) if the remainder is zero, or f(m,n,r,3) otherwise.
Step 3: f(m,n,p,3) maps to f(n, p, p, 1) swapping m with n, and setting the remainder p to n. The variable p is introduced here to make it explicit that p does not equal zero, as r can be equal to zero so different notation is used but it's just remainder with a new variable.

Let's try it using Knuth's example of m = 119 and n = 544 since the preface told us to try all these algorithms by hand.

f(119, 544)  -> f(119, 544, 0 ,1)
f(119, 544, 0, 1) -> f(119, 544, 119, 2) :divide m by n
f(119, 544, 119, 2) -> r is not 0 -> f(119, 544, 119, 3)
f(119, 544, 119, 3) -> f(544, 119, 119, 1) :set m <- n <- r
f(544, 119, 119, 1) -> f(544, 119, 68, 2) :r = 68
f(544, 119, 68, 2) -> r is not 0 -> f(544, 119, 68, 3)
f(544, 119, 68, 3) -> f(119, 68, 68, 1)
f(119, 68, 68, 1) -> f(119, 68, 51, 2) :r = 51
f(119, 68, 51, 2) -> r is not 0 -> f(119, 68, 51, 3)
f(119, 68, 51, 3) -> f(68, 51, 51, 1)
f(68, 51, 51, 1) -> f(68, 51, 17, 2) :r = 17
f(68, 51, 17, 2) -> r is not 0 -> f(68, 51, 17, 3)
f(68, 51, 17, 3) -> f(51, 17, 17, 1)
f(51, 17, 17, 1) -> f(51, 17, 0, 2)
f(51, 17, 0, 2) -> r = 0 -> f(17) 
f(17) -> 17

The last Markov algorithm example is pattern matching on strings. Alpha \(\alpha\) can be the empty string too so what's happening here is we are replacing characters out of strings every loop according to a set of matching rules exampled here. A is a set of finite alphabet characters, and A* is all the possible strings you could make using ordered sequences of characters from A. Next he defines (sigma, j) where sigma is some string in A* and j is a number less than N which is the final state. So input = (string, 0) and output = (string, num) or a termination state.

From this example we have 3 rules, and if the string doesn't match any rule patterns we terminate the algorithm. The set 'I' is the input so f("101", 0). Omega would be the output f("|||||", 8). Both of these are contained in Q and every other intermediary state like f("0|01", 1).

TODO

Home