Let's Read the Art of Computer Programming

Table of Contents

Intro

I'm going to start with Volume 1: Fundamental Algorithms 3rd edition by Don Knuth and keep going. These books don't seem to be a reference much of the index for major results points to exercises and the content is often comparing two algorithms together. Kirchoff's law of electrical circuit theory is used to analyze program control flow where the number of times an instruction is executed must equal the number of times a program transfers to that instruction. Knuth dislikes jargon and everything is practical you won't find theoretically good on paper algorithms where inputs are jacked up to infinity that hide a huge constant of ops like when Knuth wrote The Stanford GraphBase book and discovered Kruskal's algorithm beats Fibonacci heaps. The algorithms from that book are all in volume 4.

The physical book itself is a representation of the careful work that has gone into it's writing with high quality typography, paper, and a binding where no matter what page you flip to the book stays flat. If you were to ever buy a physical book this would be one of those books. The pdf version Knuth paid the Berkeley Mathematical Sciences Publishers to create a compressed searchable text with hyperlinks throughout the document and even guaranteed that the final version was correct in all notation/figures. He tells readers to avoid any kind of other ebook format like Kindle where the notation is trashed.

Get the book(s)

See the book site for details or where to get official pdf/copies. I'm using the 3rd edition look for the most recent printings that fix errata. If you can't buy them use Library Genesis to get a pdf.

The Bill Gates quote

Everybody talks about the famous old quote in the back of volume 1 by Gates: 'you should definitely send me a resume if you can read the whole thing'. I found a more recent quote from 2005:

BILL GATES: ..say somebody came for an interview and they said "Hey, I read the Art of Computer Programming, that’s all I ever read, I did all the problems", I would hire them right then.

MARIA KLAWE: You’d hire them right then?

BILL GATES: Yeah, that’s right.

MARIA KLAWE: So would I.

BILL GATES: Even if they didn’t do the double-star problems, I mean, just the fact that they’d read the whole book, you know, those are the kinds of things you need to know to be a good programmer. Actually, there’s some of that you don’t even need to know, but the kind of algorithmic thinking that’s promoted there." (2005 interview with then Princeton Dean of Science).

I'm not sure what the double-star problems are maybe the earlier editions had this labelling scheme. I also don't think the people who run HR at bigtech have ever heard of these books so I wouldn't read these and expect a job in 2024.

MMIX/MIX

Volumes 1-3 still contain the original MIX machine and there exists GNU MDK or other software to run MIXware meaning compile .mixal files to .mix and simulate running them either with mixvm, gmixvm GUI, or a scheme interpreter.

There exists the MMIX Supplement by Martin Ruckert to replace the 1960s MIX architecture with MMIX (RISC) in volumes 1-3 which Knuth has announced on his book site he plans to someday replace. MMIX arch you can customize the hardware however you want using Knuth's free meta-simulator program which is a test bed to try out different chip configurations. You can change instruction throughput, pipeline configurations, branch prediction, cache sizes and associativity, bus speeds, then test running various algorithms on that custom config. He documented that program with the book MMIXware: A RISC Computer for the Third Millennium and all the code is written in literate programming style so you can read the programs like you would any other book. MMIX software there is mixmasters version or the 2017 sealed in time Knuth version you compile from c-web sources that are extensively documented.

Knuth releases these updates called fascicles as drop in replacements for chapters and there's an MMIX replacement for most of 1.3.x but since the first 3 books have so much MIXAL content and I carry them around to read I'll switch to MMIX whenever we reach volume 4.

Optimizing compilers are dead

A reason to learn this material is given by DJ Bernstein's talk a few years ago about the death of optimizing compilers, see these slides where at the end Knuth describes how hand coded assembly will always be faster than the best optimizing compilers.

Prereq: Assembly

The preface says we should familiar enough with programming that we've written and tested a few programs, which means assembly/machine language programs. I assume you already know basic programming from somewhere else but not an assembly language. We can learn that doing microcorruption.com CTF and having a visual interface to step through all the code in the debugger to see what it does, see how registers are manipulated etc.

Starting the tutorial on microcorruption. At 18/74 in the tutorial all values are in base 16 or hexadecimal. There's many online tutorials to convert or look at a binary to hex table and memorize A (10) and F (15) then the rest are easy. 0x(hex) examples: 0x2 is 2 or 21, 0x20 is 32 or 25, 0x200 is 512 or 29. If you forgot how binary works watch this.

At 35/74 in the tutorial this is how you read the memory dump:

      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
4390: 52 45 02 00 9c 43 64 00 82 44 50 44 74 65 73 74

4390 points to 52, 439c points to 74, if you look in the register state box sp (stack pointer) points to 439a or 50. These are 1 byte each in size or 8 bits. At 44/74 in the tutorial if you move back -0x8 or 8 bytes, count back 8 times or each row of 4 hex values has 2 bytes, so move back 4 rows. If you see 0x2(r15) that means whatever the memory r15 points to, skip 2 bytes. You'll pick it up as we step through the code, and we will step through a lot of code.

There's a manual for the fake lock we are breaking too.

New Orleans

The comparison instruction wants a 7 char password, the 'create password' sub-routine generates a 7 char password, try it just entering the 7 hex numbers in one password string, success.

Sydney

Set breakpoint at most interesting function which looks like check password. Enter 'test' as text for password, then c to zip to the breakpoint. Pressing s and stepping, r15 is pointing to 439c where we see 7465 7374 or 'test'. Notice the instructions however:

448a:  bf90 2b5d 0000   cmp 0x5d2b, 0x0(r15)
4492:  bf90 7021 0200   cmp #0x2170, 0x2(r15)
449a:  bf90 3256 0400   cmp #0x5632, 0x4(r15)
44a4:  bf90 5467 0600   cmp #0x6754, 0x6(r15)

It's comparing each byte in the memory that r15 points to some constant hex values by offsetting 2 bytes each time. They are in little endian format, #0x5d2b is represented as 2b5d in the instruction, #0x2170 is 7021 etc., so take all the comparison instructions and enter the password as little endian format and success, operatives inside again.

Note the actual code/instruction the machine uses is bf902b5d0000 or bf9070210200 if you read the 'LockIT Pro' manual the opcode is cmp (compare) source, destination which is for our visual benefit only in the debugger and for writing assembly so even with machine code there is some syntax abstraction.

Hanoi

Passwords must be between 8-16 chars so I entered a gigantic password and while stepping through the execution noticed it overflowed to the second memory location 2410. Ctrl-f the instructions for 2410 and there's a comparison there so I overflowed the hex password to 17 digits with that comparison at 2410.

Cusco

Entered another gigantic password, noticed it overflowed to where the program counter (the next instruction) is so overflowed it to change the pc to 'unlock door' instruction in little endian format.

Reykjavik

We can't see all the disassembly and have to step watching 'Current Instruction' box. Enter a single character password as a test and step until you see a cmp for the address in r4 offset by -0x24 or 36. Move back that amount in memory and you're at the password you just entered, so enter (little endian) whatever the cmp constant is.

Whitehorse

They took away the function to unlock the door, we have to read the manual about how the HSM 2 model works. In 3.1 This enables the CPU to trigger software interrupt 0x7F to directly trigger the door lock to unlock so we want to bamboozle this code to somehow run 0x7F. As usual start with a gigantic password and see where it is stored. Mine begins at memory location 347c. Looking at the conditional unlock door function I want to push 0x7f the unlock command instead of 0x7e the password comparison command, and then call that interrupt after.

If you overflow your password you'll see in the 'current instruction' box it will run whatever we put there, so try the instructions 30127f00b0123245 which in line 445c we see pushes the door unlock code to the stack and calls an interrupt to pop the stack which will signal to open the door.

Montevideo

Two new functions strcpy and memset. Memset appears to clear the password storage memory and strcpy removes all null bytes 00 out of our password entry so if we want to use 0x7f or 7f00 we have to steal some code to add/subtract or AND off some bits to avoid strcpy deleting our payload. I used 32 junk filler + 0244 + payload as a hex password after many trial and error getting the overflow to lineup after strcpy killed my 0044 address.

You can play with instructions here and use any hex calculator. I added some arbitrary value to 0x7f, moved it to a register, pushed it to the stack and called that interrupt again. Success. We achieved enough of the programming prereqs.

Prereq: Architecture

I'm going to crash course some lectures from MIT's 6.004 on RISC-V while reading the TAOCP chapter 1.3.1 on MIX which is a simple 1960s machine.

First is Binary Arithmetic. We're only watching half the lecture but if you wanted to learn about adders that's how modern arrays are indexed in logarithmic time with carry save adders. So when you think about arrays being linear they're actually not it's a tree structure hidden from you by the hardware. Watch this lecture if you don't remember binary encoding or modular arithmetic.

Second is Sequential Circuits. A multiplexer or mutex is a combinational circuit that has n-inputs and based on the select signal it forwards the input to the output or leaves the output in its present state. Remember this. The ALU is all the arithmetic. Now you know feedback really just means a cycle assuming you took graphs in some intro course. D Latch is a mutiplexer, C input is the selector for the D input. If C input is 0 you selected to change the state of Q to D otherwise Q stays the same state as before. An inverter does what you think it does switching inputs from 0 to 1 or 1 to 0. We learn what a clock is, it's a D Latch circuit which can hold state combined with other D Latches in a 'flip flop' combination so that the C input is a periodic signal going up and down and all flip flops are connected to the same C signal or clock. There's some content on time constraints and other levels of detail we don't care about so you can skip this. Then finally the clock time is the upper bound on how long of a delay it takes to propagate all changes in that combinational circuit and again the details don't really matter to us just skim this content. Write enable is dropping another multiplexer in to a flip flop. @32:50 this is what we came here to know. A register is write enabled flip flops which is a bunch of combined multiplexers all with the same clock that initializes the state and can change state. We don't have to watch the rest of the lecture.

Third is Programmable Machines. Skip to Random-Access Memory. That exact same memory diagram is in 1.3.1 (fig 13 the MIX computer). The much simpler MIX arch doesn't use hex to access memory locations it uses 0 - 3999. @7m flip through 1.3.1 as you watch this, Mem[0x100c] in the slides is equivalent to CONTENTS(4108) if MIX had that many memory locations it could address. MIX is a typical von Neumann computer so this is exactly how it works, instructions are fetched from memory and evaluated using the registers to hold state and a 'location counter' (today called program counter) is updated to point to the next instruction in memory to load. The instruction definition wasn't very clear here but is in TAOCP. In figure 3 of 1.3.1 the instruction is 5 bytes, and C field or byte 5 is the opcode to run. F is a range selector, I selects one of the 6 index registers and then adds their contents to the address field before executing the instruction, the remaining two are the address and first field is a sign -/+. Now imagine a stream of these instructions (sign)(address)(modifiers)(opcode) being executed one at a time and a modern desktop/laptop can handle around 100m instructions per second. On a modern system you want to write your program using the principle of locality so when the CPU grabs instructions and holds them in cache that all the next instructions are contained in that grab or else execution has to stop and instructions fetched from slower memory like RAM which is called a cache miss. Volume 4a covers cache strides or hit patterns and cache oblivious algorithms.

Stop second lecture before 18m (where she goes into RISC arch) and let's review chapter 1.3.1 of TAOCP. Bytes weren't the standard 255 when Knuth wrote this however he notes it won't matter since the programs don't depend on byte sizes. One byte in MIX holds 0 - 63 decimal numbers so it's 6 bits or 25 + 24 + 23 + 22 + 21 + 20 or 111 111 = 63 or 010 001 is 17. The word size in MIX is 5 bytes with an actual sign instead of two's complement where the first 0 or 1 is interpreted as pos/neg. This is because MIX is also a decimal computer which is wtf tier but we're in the world of wild compsci and not the conformity of texts from undergrad. The instruction format in figure 3 the 4th or F byte represents 8L + R so (1:3) field select is 8(1) + 3 or 11. The opcode is a number mapped to an operation so if C = 8 that means load into register A and there's a table figure here in this chapter and if you open the back cover you see the same table which lists all opcodes.

MIX has very basic peripherals for input like tape drives but this doesn't matter we use a turing machine tape drive abstraction for some complexity theory still anyway. Consider the 'tape' to be Amazon cloud s3 buckets and you have to remotely read them into memory somehow if you want a modern example. We can come back to this chapter and do all the exercises once we finish the prelims.

Prereq: Calculus

In the preface 'a knowledge of basic calculus will suffice' is stated however basic calculus to Knuth probably means something like the following. Go to this jstor archive of an interview with Knuth and copy the DOI into Sci-Hub to read the pdf:

At Case, I spent hours and hours studying the mathematics book we used -Calculus and Analytic Geometry by Thomas- and I worked every supplementary problem in the book. We were assigned only the even-numbered problems, but I did every single one together with the extras in the back of the book because I felt so scared (of failing). I thought I should do all of them. I found at first that it was very slow going, and I worked late at night to do it. I think the only reason I did this was because I was worried about passing. But then I found out that after a few months I could do all of the problems in the same amount of time that it took the other kids to do just the odd-numbered ones. I had learned enough about problem solving by that time that I could gain speed, so it turned out to be very lucky that I crashed into it real hard at the beginning.

The second part of the interview:

We should teach people things with an emphasis on how they were discovered. I've always told my students to try to do the same when reading technical materials; that is, don't turn the page until you have thought a while about what's probably going to be on the next page, because then you will be able to read faster when you do turn the page. Before you see how to solve the problem, think about it. How would you solve it? Why do you want to solve the problem? All these questions should be asked before you read the solution. Nine times out of ten you won't solve it yourself, but you'll be ready to better appreciate the solutions and you'll learn a lot more about the process of developing mathematics as you go. I think that's why I got stuck in Chapter One all of the time when reading textbooks in college. I always liked the idea of "why?" Why is it that way? How did anybody ever think of that from the very beginning? Everyone should continue asking these questions. It enhances your ability to absorb. You can reconstruct so much of mathematics from a small part when you know how the parts are put together. We teach students to derive things in geometry, but a lot of times the exercises test if they know the theorem, not the proof. To do well in mathematics, you should learn methods and not results. And you should learn how the methods were invented.

You can see the exact version Knuth used on archive.org the 1950s version w/supplementary problems all in the back but the 1960s 3rd edition has the same content the publisher moved the supplementary problems to the end of each chapter labelled 'misc problems'. There's a version on libgen (95MB size because HD scanned). Herb Gross lectures from MIT OCW that cover the same book are Single Variable Calculus, Multivariable Calculus, Complex Variables, Differential Eq, Linear Alg.

Peter Lax Calculus

Thomas' book is 900+ pages and took Knuth a year to complete with lectures and teaching assistants while he was at Case Tech. Since we are already doing hundreds of problems in TAOCP we really only need a single variable crash course that focuses on engineering instead of abstract real analysis.

There is a global survey of calculus education here the biggest problem students have is understanding limits are an infinite process. It's so bad that in 2016 Arizona State profs were paid by the NSF to redo the curriculum so they came up with one that doesn't have limits at all called Project DIRACC it only contains the natural idea of a limit via approximation (that doesn't end ergo infinite process) when accumulation (integration) is studied. They recommend Peter Lax's treatment of limits so that's what we'll do.

Obtain the book Calculus with Applications by Peter Lax/Terrell on library genesis. The errata for it is here.

1.1 Inequalities

Limits are an algebra of inequalities so we start right away with inequalities. Eq (1.1) \(\epsilon\) is epsilon a variable traditionally used for something close to zero or in other words a very small displacement. Problem 1.8 we can solve right now it states for any numbers a and b, only one of those 3 conditions hold in b <= a <= b and since b can't be less than itself it must equal a. Figure 1.9 multiply each side by 1/4 and square root both sides. In the visual proof it's the side of a plus the short side of b squared or the area of a square which is side x side.

Example 1.6 we want to prove any square with the same perimeter as a rectangle has a larger area. The perimeter of 11x1 rectangle is 2(11+1) = 24 and a square having the same perimeter would be (11+1)/2 or 6 each side = 24. Write this as the A-G inequality then square both sides to get rid of the square root and magically we have a proof that the area of any rectangle (Width x Length) is smaller than a square of same perimeter.

The proof for n = 4 in Eq (1.2) if a = mean and b = mean then sqrt(ab) <= (a + b)/2 and substitute back in the values for a and b.

Problems Many of these are good and teach rewriting inequalities.

Ex 1.7 I think derived by:

  • \((\sqrt x - \sqrt y)(\sqrt x + \sqrt y) = x - y\)
  • \((\sqrt x - \sqrt y) = \frac{1}{(\sqrt x + \sqrt y)} \cdot x - y\)
  • \(\frac{1}{(\sqrt x + \sqrt y)} \cdot (x - y) \le \frac{1}{4} \cdot (x - y)\)
  • \(\frac{1}{(\sqrt x + \sqrt y)} \le \frac{1}{4}\)

Ex 1.12 is the triangle inequality chapter.

Ex 1.13 (b) if you add some indexing it's more clear:

  • \(1_1 + 1_2 + 1_3 ... 1_{n-1} + x\) = n amount of total terms including x hence (x + n - 1)/n.

Ex 1.14(c) we can see it's less here using a desmos graph. A possible strategy notice if you take the geometrical mean of AH it equals G then the arithmetical mean of A + H will be greater. Ex (d) seems to be asking \(\frac{1}{2} \cdot \frac{2}{\frac{1}{R_1} + \frac{1}{R_2}}\)

Ex 1.6(b) The solution on pg 476 is right under the distributed triangle inequality |a||b-b0| or 10(0.001) if it doesn't make sense right away.

Ex 1.7 I tried this but doesn't seem correct:

  • 4m = (a + b + c + m)
  • 4m + 4m = (a + b + c + m) + (d + e + f + m)
  • 2m = h + g
  • 2m + 4m + 4m = (a + b + c + m) + (d + e + f + m) + g + h
  • (abcdefghmm)1/10 <= m
  • (abcdefghmm) <= m10
  • (abcdefgh) <= m8 then 8th root both sides

(b) (abcdemmm) <= m8 or (abcde) <= m5 and for the general case both times we increased m to show n like n+1 to show n in the book.

TODO

1.1 Algorithms

Whenever Knuth cites a reference/paper you can use Sci-Hub to get a copy.

The 'quadruple' definition of what an algorithm is will make sense when you try it yourself by hand. First we have f, which defines the computational steps:

  • f(n) = n (return the result and terminate, we aren't calling another function)
  • f(m,n) maps to f(m,n,0,1), the zero being just a placeholder to enter to the next step.
  • Step 1: f(m,n,r,1) maps to f(m,n, remainder of m divided by n, 2)
  • Step 2: f(m,n,r,2) maps to f(n) if the remainder is zero, or f(m,n,r,3) otherwise.
  • Step 3: f(m,n,p,3) maps to f(n, p, p, 1) swapping m with n, and setting the remainder p to n. The variable p is introduced here to make it explicit that p does not equal zero, as r can be equal to zero so different notation is used but it's just remainder with a new variable.

Let's try it using Knuth's example of m = 119 and n = 544 since the preface told us to try all these algorithms by hand.

f(119, 544)  -> f(119, 544, 0 ,1)
f(119, 544, 0, 1) -> f(119, 544, 119, 2) :divide m by n
f(119, 544, 119, 2) -> r is not 0 -> f(119, 544, 119, 3)
f(119, 544, 119, 3) -> f(544, 119, 119, 1) :set m <- n <- r
f(544, 119, 119, 1) -> f(544, 119, 68, 2) :r = 68
f(544, 119, 68, 2) -> r is not 0 -> f(544, 119, 68, 3)
f(544, 119, 68, 3) -> f(119, 68, 68, 1)
f(119, 68, 68, 1) -> f(119, 68, 51, 2) :r = 51
f(119, 68, 51, 2) -> r is not 0 -> f(119, 68, 51, 3)
f(119, 68, 51, 3) -> f(68, 51, 51, 1)
f(68, 51, 51, 1) -> f(68, 51, 17, 2) :r = 17
f(68, 51, 17, 2) -> r is not 0 -> f(68, 51, 17, 3)
f(68, 51, 17, 3) -> f(51, 17, 17, 1)
f(51, 17, 17, 1) -> f(51, 17, 0, 2)
f(51, 17, 0, 2) -> r = 0 -> f(17) 
f(17) -> 17

The last Markov algorithm example is pattern matching on strings. Alpha \(\alpha\) can be the empty string too so what's happening here is we are replacing characters out of strings every loop according to a set of matching rules exampled here. A is a set of finite alphabet characters, and A* is all the possible strings you could make using ordered sequences of characters from A. Next he defines (sigma, j) where sigma is some string in A* and j is a number less than N which is the final state. So input = (string, 0) and output = (string, num) or a termination state.

From this example we have 3 rules, and if the string doesn't match any rule patterns we terminate the algorithm. The set 'I' is the input so f("101", 0). Omega would be the output f("|||||", 8). Both of these are contained in Q and every other intermediary state like f("0|01", 1).

TODO


Home