Math Foundations from Scratch


This is a (ongoing) workshop to learn the math needed for various prereqs in university AI intro courses, and is Part IB of the AI Tripos. No background is assumed. The idea here is that you do this workshop until you can figure out the courses in the AI workshop and then can either continue doing both or switch, up to you. We start by writing proofs of basic arithmetic with the natural numbers and work our way up. Proof writing is a skill you practice like every other skill, if you persist and practice it enough eventually you pick up the skill. I watch a lecture one day, next day read some of a chapter from different material, so this is in linear order how I completed it.

Do 1% effort for 100 days - Prof Ryan O'Donnell


At first, your proofs will just be complete sentences that are a convincing argument to yourself. You don't need to have them verified or a solutions guide. The act of re-reading definitions, experimenting around with the examples, and trying to follow the logic is how you learn math. Remember when you learned to read and write, at first you just naively scribbled unaware your composition skills were terrible until you took more advanced grammar classes in school, and you began to see other people's writing to compare to your own. Then through experience you got better. This is same deal for proofs. You will naively scribble until we take enough logic (grammar) you can check your own proofs. You will read other people's more elegant proofs in books and online, and can at that point understand what they are doing because you have experience. We will see some proof theory and constructive logic that will help us verify our own arguments, and use proof assistants so we can run our proofs as code.

Make your own examples

If you write all your own examples everytime you come across something being defined, many of the exercises will be trivial to complete. Write them like you would example tests in the software workshop, find all the corner cases and see if the definition or lemma holds. Remember why we write examples in that workshop, it's to learn more about the problem domain since there may be subtle things that escaped our understanding. It's the same deal here, and usually those subtle things end up as exercises so if you've already done your own examples you already know the answer.

Start here

We start with some 10 minute lectures on basic arithmetic, from the point of view of a mathematician.

The natural numbers

Presented here is a definition of a mathematical object called a natural number, as a string or sequence of strokes, which are successors of each other. ie: the successor of a number, is another number.

  • YouTube - Arithmetic and Geometry Math Foundations 1 *note these videos eventually get better in quality

Arithmetic with natural numbers

  • YouTube - Arithmetic and Geometry Math Foundations 2

An interesting explanation of multiplication.

  • Laws for addition
    • n + m = m + n (Commutative)
      • Changing the order of the elements.
    • (k + n) + m = k + (m + n) (Associative)
      • Changing the grouping of the elements.
    • n + 1 = s(n) (Successor)
  • Laws for multiplication
    • nm = mn (Commutative)
    • (kn)m = k(nm) (Associative)
    • n \(\cdot\) 1 = n (Identity)
  • Note the different notation for multiplication:
    • nm
    • n \(\cdot\) m
    • n \(\times\) m
  • Distributive laws
    • k(n + m) = kn + km
    • (k + n)m = km + nm

Exercise: Prove why the multiplication and distributive laws work. Since we have no idea what a proof is yet, we can instead try going through some of our own examples.

Identity law: n \(\cdot\) 1 = n.

  • We are creating n groups of 1. According to our definition of multiplication |||| \(\cdot\) | would be 4 groups of | or ||||. This is still 4.

Commutative law: nm = mn.

  • ||| \(\cdot\) || = || || || or 6. Rearranging this, || \(\cdot\) ||| is ||| ||| which is again 6.

Associative law: (kn)m = k(nm).

  • If k = |, and n = ||, and m = ||| and parens expressions are evaluated first:
    • Left side: (kn)m. We first evaluate (kn) as || (2 groups of 1, the identity law), and 2m is ||| ||| (2 groups of 3) for a total of 6.
    • Right side: k(nm): In the parens nm is 2 groups of 3, so ||| |||. k(6) is the identity law, one group of 6. Thus (kn)m = k(nm).

Distributive law: (k + n)m = km + nm.

  • If k = ||||, and n = ||, and m = |:
    • Left side: (|||| + ||) is 6, and (6) x 1 is six groups of 1 or 6 according to the identity law.
    • Right side: km + nm: 4 groups of one is 4, plus two groups of 1 is 2 (identity law). Answer is again 6.

Laws of arithmetic 2

  • YouTube - Arithmetic and Geometry Math Foundations 3

Wildberger shows another interpretation of multiplication, and walks through the proofs of the laws by rearranging according to the commutative law.

Subtraction and division

  • YouTube - Arithmetic and Geometry Math Foundations 4

Subtraction is the inverse of addition

  • If k + m = n then k = n - m
  • Note: n - m is only defined if n > m as we are still in the realm of the Naturals and negative numbers/integers aren't yet defined.

+, -, are binary operations and can only do 2 operations at a time

  • Laws of Subtraction:
    • n - (m + k) = (n - m) - k
    • n + (m - k) = (n + m) - k
    • n - (m - k) = (n - m) + k
  • Distributive laws of subtraction:
    • n(m - k) = nm - nk
    • (n - m)k = nk - mk

Note to remove parens: we can drop the grouping enforcement when using addition (associative law) but must distribute the negative sign, as subtraction is not associative.

  • n - (m + k) = (n - m) - k
    • removing grouping: n - m - k = n - m - k

The Hindu-Arabic number system

  • YouTube - Arithmetic and Geometry Math Foundations 6

A simplified Roman numeral system is introduced to abstract our previous notation of one's so we can manipulate bigger numbers.

  • Uses powers of 10
    • 327 = 3(100) + 2(10) + 7(1)

Arithmetic with Hindu-Arabic notation

  • YouTube - Arithmetic and Geometry Math Foundations 7

Multiplication with the Hindu-Arabic notation:

  • Same laws as before:
    • (ab)c = a(bc)
    • a(b + c) = ab + ac
    • (a + b)(c + d) = ac + ad + bc + bd

A different way of hand multiplication is shown, taking advantage of the fact the notation is using the distributive law.


  • YouTube - Arithmetic and Geometry Math Foundations 8

Division is repeated subtraction. Laws of Division:

  • 1. \(\frac{k}{m}\) + \(\frac{n}{m}\) = \(\frac{k + n}{m}\)
  • 2. \(\frac{k}{m}\) \(\cdot\) \(\frac{n}{l}\) = \(\frac{kn}{ml}\)
  • 3. \(\frac{k/m}{n}\) = \(\frac{k}{mn}\)
  • 4. \(\frac{k}{m/n}\) = \(\frac{kn}{m}\)


  • YouTube - Arithmetic and Geometry Math Foundations 9

We're still using natural numbers.

  • Def: A fraction, is an ordered pair (m,n) of natural numbers also written m/n.
    • Fractions m/n and k/l are equal precisely when ml=nk.

Arithmetic with fractions

  • YouTube - Arithmetic and Geometry Math Foundations 10

We're still in the natural numbers, which he defined as beginning with 1 so we aren't working with zeros yet (no division by zero corner cases). He defines the operations of addition and multiplication with fractions using the laws of division we already learned.

Note how he sets up the proof. The 'prime' notation (the tick, so c', a', b') represents another variable. You could rewrite this 'Suppose a/b = y/z so that a/z = b/y, and c/d = something/another, so that c/another = something/d does it then follow …'.

What we are trying to prove is ab' = ba' and cd' = dc'. He introduces some extra notation, multiplying both sides by (dd') to help do this.

Laws of arithmetic for fractions

  • YouTube - Arithmetic and Geometry Math Foundations 11

Wildberger proves the distributive and commutative laws using fractions while still in the natural numbers.


Now we will redo what we just learned rigorously in Terence Tao's Analysis I. The author explains here why we need this rigorous stage. This book was originally supplementary notes for students while they read through Rudin's analysis book. Tao starts from the very beginning, and constructs all the algebra laws for natural numbers, integers, rational and real numbers then will teach us calculus so if you have no math background this is fine since we're going to redo it all from scratch anyway.

Materials needed

  • The 3rd edition of Analysis I from WorldCat or LibGen (or buy it, it's an inexpensive book if you get the international version from abebooks or Amazon India)
  • The book How to Think Like a Mathematician by Kevin Houston. Most of this won't make sense until we do a few chapter's out of Tao's book, we will read it at the same time.
  • The optional book An Infinite Descent into Pure Mathematics by Clive Newstead which provides another view of the same material if you're confused by something, look it up here (or search YouTube for a tutorial)

How to read math

See chapter 2 Reading mathematics and chapter 12 Examples and counterexamples in Kevin Houston's book.

2.1 The Peano Axioms

We begin like we did with Wildberger, basic counting but rigorously defined.

  • The five Axioms
    • Axiom 2.1: 0 is a natural number
    • Axiom 2.2: If n is a natural number, then n++ is also a natural number
    • Axiom 2.3: 0 is not the successor of any natural number
    • Axiom 2.4: Different natural numbers must have different successors
    • Axiom 2.5: Let P(n) be any property pertaining to a natural number n. Suppose that P(0) is true, and suppose that whenever P(n) is true, P(n++) is also true. Then P(n) is true for every natural number n.
      • We make an assumption about a property P(n) being true meaning you do not prove P(n), you prove P(0) and P(n + 1) are true, and both these form a logical implication meaning that if they are both true, then your initial assumption of P(n) being true holds.

Remark 2.1.10 and the template in Proposition 2.1.11 is probably the clearest definition of proof by induction I've seen. Note the example where no natural number can be 'a half number', because of the increment step n + 1. The base case of 0 (as per Axiom 2.5, and Axiom 2.1) fails if we are using 'half numbers' instead of natural numbers because if zero is a natural number, and any n++ is another natural number, the only next successive number is 1 it can't be 0.5 or -3 or 3/4. The way you should view these axioms about the natural numbers is:

  • 0 is an element of N
  • if x is an element of N, then so is x + 1.
  • nothing else is an element of N


Let's understand Axiom 2.5 by watching a lecture from MIT on proofs though you will end up writing so many induction style proofs on the natural numbers in Tao's book this will become second nature.

  • 1. Induction lecture
    • From MIT's 6.042j Mathematics for Computer Scientists
  • Read chapter 24 Techniques of proof IV: Induction in Kevin Houston's book

Back to 2.1

Proposition 2.1.16 (Recursive definitions) in the book Analysis I. Tao uses a function, it's definition can be read in chapter 3.3. Here \(a_0\) is function notation so \(a_0\) is \(f_0(0)\), \(a_1\) is \(f_1(1)\). This proposition is using a new function (a rule that maps from one input to one output) every time, I'm guessing to avoid sequences where they could repeat ie 0, 1, 1, 2, 3..

Let's do an example.

  • \(a_0\) becomes 0 (for c = 0)
  • \(a_{0++}\) becomes \(f_1(a_1)\) or 1
  • \(a_{1++}\) becomes \(f_2(a_2)\) or 2

The proof notes Axiom 2.3 protects the value \(a_0\) from being redefined by one of the functions. His base case for this proof is \({a_0}\) (Axiom 2.3) and he is assuming P(n) which is that the procedure will always give a single value to \({a_n}\). Axiom 2.4 is the inductive step, that this procedure will also give a single value to \(a_{m++}\). All of the Axioms are used to prove this. If this didn't make sense it will when we come back here in chapter 3.5 for an exercise.

Aside: Induction and Recursion

Induction and recursion are the same concept. As an example, the specification of a program function twoExp(n) that when given a value n, returns \(2^n\). The examples you write for this program are as follows:

  • twoExp(0) = 1
  • twoExp(1) = 2
  • twoExp(2) = 4
  • twoExp(3) = 8
  • twoExp(4) = 16
  • twoExp(5) = 32

You notice the pattern: the result of the next value, is 2x the previous value. twoExp(5) is 2 * twoExp(4). You write an algorithm using the above examples: if input = 0, then return 1, otherwise return 2 * twoExp(n-1). You make an assumption, that the input will always be non-negative. Walking through the program execution, twoExp(3) evaluates:

  • 2 * twoExp(3-1)
  • 2 * (2 * twoExp(2-1))
  • 2 * (2 * (2 * twoExp(1-1)))
  • 2 * (2 * (2 * (1))) = \(2^3\)

To prove this by induction on n, we try the base case P(0). If n = 0 then twoExp(0) evaluates to 1, which is \(2^0\). What is P(n)? According to our recursive definition, n = 2*P where the value of P = twoExp(n-1) or \(2^{n-1}\). Our inductive step is n, since our algorithm is inductively defined n-1, and we wish to prove the successor n + 1, which is n. 2*P = \(2*2^{n-1}\) or \(2^n\).

Note that we didn't have to do anything extra to prove this recursive function. That's because an inductive proof is a recursive function, and a recursive function is an inductive proof.

2.2 Addition

Let's understand his example for addition. 3 + 5 should be the same as (((5++)++)++) and adding 0 to 5 should result in 5.

Definition 2.2.1 notice he has a base case (0 + m is m), the hypotenuse "Suppose we inductively defined how to add n to m" which we saw at the beginning of 2.2 where 3 + 5 is defined as incrementing five three times, finally the inductive step, we can add n++ to m by defining (n++) + m to be (n + m)++.

Try writing your own examples as advised in the book How to Think Like a Mathematician

  • 0 + m = m
  • 1 + m = ((0++) + m) or m++
  • 2 + m = ((1++) + m) or ((m++)++)
  • 3 + m = ((2++) + m) or (((m++)++)++)
  • 4 + m = ((3++) + m) or ((((m++)++)++)++)
  • 2 + 3 = ((3++)++) or (4++) or 5.
  • 3 + 2 = (((2++)++)++) or ((3++)++) or (4++) or 5.

Lemma 2.2.2 notice first how he is using the exact same induction template from 2.1.11, you will eventually get very familiar with this template. He's using prior established definitions to set up the proofs, notably definition 2.2.1 where 0 + m = m is defined. He assumed that n + 0 = n. Next the inductive step, where (n++) + 0 = n++. No arithmetic was done here, he pointed to the definition of addition from 2.2.1 that (n++) + 0 is (n + 0)++ or n++, so we have n++ = n++ and the proof is done. He used a different variable to not confuse it with the previously established definition of 0 + m = m, since we are now going the other way with n + 0 = n on our way to prove a + b = b + a.

Lemma 2.2.3 Prove: n + (m++) = (n + m)++. We are inducting on n, and keeping m fixed which means we apply the inductive step to n++ instead of m++. Replace all instances of n with 0 for the base case and show the inductive hypothesis holds that 0 plus the increment of some other number, equals that number. Try picking apart this proof yourself to follow it:

  • base case:
    • 0 + (m++) = (0 + m)++
    • left side is m++ because of definition 2.2.1
    • right side is m++ because of definition 2.2.1
  • inductive hypotenuse: 'Suppose n + m++ = (n + m)++'
  • inductive step:
    • n++ + (m++) = (n++ + m)++
    • left side is ((n + m)++)++ because of 2.2.1
    • right side is ((n + m)++)++ because of 2.2.1

Corollary: Tao asks why n++ = n + 1 is a consequence from the two proved lemmas. Compare lemma 2.2.3 For any natural numbers n and m, n + (m++) = (n + m)++ to definition 2.2.1 where we were able to show m++ = m + 1. Compare lemma 2.2.2 to the base case in definition 2.2.1

Proposition 2.2.4 (Addition is commutative) the base case is straight forward. We use the definition of addition, and lemma 2.2.2 to rewrite the successor of n to be (n + m)++ = (m + n)++ and now it is in the form of the inductive hypothesis, so we can use our hypothesis of n + m = m + n to show that (m + n)++ is the same as (n + m)++. Proposition 2.2.6 is similar.

Exercise 2.2.1

Remember a proof at our early level is just a convicing argument to yourself. To check that it is correct, all the forms have to already have been introduced in the text, for example say Tao has only defined 0 + b = 0 but not b + 0 = 0 + b. We can't assume b + 0 = 0 + b unless it has already been proved (or you write your own lemma for it) so if Tao has only shown 0 + b = 0, you can still do a dance where you claim b = 0 and then by substitution claim it's also 0 = b but you can't directly assume anything because 'it's obvious'. This is a hard skill to learn, spotting assumptions, we will take more courses on logic later and practice this skill. I've probably incorrectly assumed things in my own feeble attempts at the exercises, keep that in mind if you look at my proof scratchwork.

Proposition 2.2.5 (Addition is associative) what we want to prove is (a + b) + c = a + (b + c)

Base Case: What is the base case? We want to show that if a natural number is inside or outside brackets, it will have the same result when added together. Induct on any variable (a, b or c) you want. This will use substitution, which is in the appendix. If we say x = y, then every future occurrence of y we can substitute for x:

(a + b) + 0 = a + (b + 0) (substitute 0 for c)

  • left side: (a + b) + 0 since b + 0 = b (lemma 2.2.2) then we have:
  • a + (b + 0) (lemma 2.2.2)
    • base case: a + (b + 0) = a + (b + 0)
  • or choose a: right side: 0 + (b + c) = b + c by definition of addition
  • lef side: (0 + b) + c by definition of addition
    • another base case: 0 + (b + c) = 0 + (b + c)

IH: The inductive hypothesis (IH) we assume addition is associative, and that (a + b) + c = a + (b + c). If any steps of your proof are in the form (a + b) + c or a + (b + c) you can substitute them for each other by the IH, since we declared them equal. For example: y + a + (b + c). Note a + (b + c) is the form of our hypothesis, so can change this to y + (a + b) + c

Inductive step: There's a few ways to show this, I'll move stuff around randomly and you can see where the proof is:

Inducting on c: (a + b) + (c++) = a + (b + (c++))

  • right side: a + (b + (c++))
  • a + (b + c)++ (lemma 2.2.3)
  • ((a + b) + c)++ (by the IH)
  • ((a + b)++) + c (def of addition)
  • ((b + a)++) + c (commutative)
  • ((b++)+ a) + c (def of addition)
  • c + ((b++) + a) (commutative)
  • c + (a + (b++)) (commutative)
  • left side: (a + b) + (c++)
  • (c++) + (a + b) (commutative)
  • (c + (a + b))++ (def of addition.. consider (a + b) to be one value
    • Recall (n++) + m = (n + m)++ ergo (n++) + (m + b) is (n + (m + b))++
  • ((a + b) + c)++ (commutative)
  • (a + (b + c))++ (by the IH)
  • ((b + c) + a)++ (commutative)
    • etc., can keep moving these around

Try moving everything around until you understand exactly how commutativity and associativity work. See if you can get it to a form where you can use the inductive hypothesis, substitute and permute them around some more until this starts to make sense. Remember that (n + m) is one value, so (n++) + (n + m) is (n++) + (value) which we know from the text can be written as the form (n + (n + m))++, and (n + (n + m)) can be rewritten according to commutative property ((n + m) + n) because it's just (n + (value)) so ((value) + n) changes nothing. .

  • The proof, mimic Tao and write full sentences
    • We induct on a keeping b and c fixed. We first consider the base case a = 0 and we have to prove (0 + b) + c = 0 + (b + c). By the definition of addition (0 + b) + c = b + c, and 0 + (b + c) = b + c, thus both sides are equal to b + c and are equal to each other. Suppose inductively that addition is associative and (a + b) + c = a + (b + c), now we have to prove that ((a++) + b) + c = (a++) + (b + c) to close the induction. By the definition of addition, the left hand side is equal to ((a + b)++) + c which is also equal to ((a + b) + c)++ by the definition of addition. By the inductive hypothesis, ((a + b) + c)++ is equal to (a + (b + c))++. The right-hand side is equal to (a + (b + c))++ by the definition of addition and now both sides are equal to each other, and we have closed the induction.

2.2. cont.

Proposition 2.2.6 (Cancellation Law) uses Axiom 2.4 which states 'Different natural numbers must have different successors; i.e., if n, m are natural numbers and n = m, then n++ = m++. Equivalently, if n++ = m++, then we must have n = m'. So (a + b)++ = (a + c)++ is = a + b = a + c by the line if n++ = m++, then n = m in axiom 2.4. We dropped down from the successor by using axiom 2.4 and can now use our inductive hypothesis of the cancellation law for a to show b = c and close the induction. .

Lemma 2.2.10 has an exercise where induction is suggested but this is actually much easier to prove using contradiction (See Kevin Houston book). Always read the comments on Tao's blog if you're stuck: "You can modify the statement of the exercise to an equivalent statement which makes sense for a=0. (The statement will probably look something like “For any natural number a, either a=0, or else there exists exactly one…”.)" If you see this stackexchange answer they use the peano axioms of 2.2 existence and 2.4 uniqueness. We can also try the chapter on contradiction proofs in Kevin Houston's book or the appendix of Analysis: Suppose for the sake of contradiction that that there exists two different natural numbers with the same successor. Then a = m++ and b = m++. But then by Axiom 2.4 we have a = b, a contradiction.

Definition 2.2.11 n is greater or equal to m if n = m + some other natural number. If that number is 0, then the equality holds, if that number is 0++ then the inequality holds.


Try and figure these out yourself before looking at my interpretation. If you naively just wrote down your own argument why you think something is true while going back over all the text, and it was completely wrong, you'd still learn more than looking at my notes. It's better to do it yourself then later we will go through all of Tao's chapter on Mathematical Reasoning, and take a course in Proof Theory, then you will understand how to manipulate definitions by declaring them logically equivalent, making writing proofs and checking them for correctness much easier.

For now just continue writing whatever, doesn't matter if your exercise answers are correct or not it only matters that you attempted to do them.

Proposition 2.2.12 / Exercise 2.2.3: Prove the basic properties of order for natural numbers. Here I will show how you can 'reverse' or unravel lemma 2.2.2 or other proved proposition, none of this is a properly written proof only notes on moving through steps to understand the problem then after write a proof in the style Tao has been teaching us. We just want to point to definitions and lemmas here not use induction:

  • Order is reflexive
  • Reflexive property is that any natural number is equal to itself.
  • We want to show order is reflexive: a \(\ge\) a
  • a = a + n (def 2.2.11 of ordering of natural numbers)
  • From lemma 2.2.2 we know that a + 0 = a and so a = a and is reflexive
  • By definition 2.2.11 we also have a \(\ge\) a
  • Transitive order property
  • a \(\ge\) b and b \(\ge\) c, then a \(\ge\) c
  • If a = b + 0, b = c + 0, and a = c + 0, then we have lemma 2.2.2 and transitivity (equality) holds with a = b, b = c and a = c.
  • Substitution:
  • a = b + n
  • b = c + n
    • b is equal to (c + n) so let's put that in place of b in a = b + n:
  • a = (c + n) + n
  • a = c + (n++)
    • By lemma 2.2.10 we have a = c + m since b++ will always equal some nonzero natural number.
  • By definition 2.2.11 a = c + m is a \(\ge\) c + m

Proof attempt, though we could write our own def or lemma that a + b = a natural number it's largely implied by everything we've read so far: By the definition of Ordering of the natural numbers, if a = b + m and b = c + n, then a = c + m + n. By associativity a = c + (m + n) and since if a = b when m or n are 0, then a = c when m and n are 0. By substitution d = m + n, and since m and n are natural numbers, d is a natural number by the Peano axioms and 2.1.3 (m + n)++ = d++, and by Axiom 2.4 (m + n) = d so we have a = c + d as desired by the definition of Ordering of the natural numbers.

  • Anti-symmetric order
  • If a \(\ge\) b and b \(\ge\) a then a = b
  • See the entry for Antisymmetric relation
  • If these two inequalites hold then they must be equal
  • We need to show a = b:
  • a = b + m by 2.2.11
  • b = a + n by 2.2.11
  • Substitute b for (a + n):
  • a = (a + n) + m
  • a = a + (n + m)
  • a + 0 = a + (n + m) ('reversing' lemma 2.2.2)
  • 0 = n + m (cancellation law)
  • 0 = 0 + 0 (corollary 2.2.9)
  • If n = 0, and m = 0 by above, then a = b + 0 which is a = b
  • d) Addition preserves order
  • a \(\ge\) b iff a + c \(\ge\) b + c
  • a + c = b + n + c by 2.2.11
  • a = b + n by cancellation law and a \(\ge\) b by 2.2.11
  • Another method: a + c \(\ge\) b + c
  • a + c = b + n + c by 2.2.11
  • a + c = b + c + n (commutative law)
  • (a + c) = (b + c) + n (associative law)
    • This is nat number = nat number + another number or by 2.2.11 a \(\ge\) b
  • e) a < b iff a++ \(\le\) b
  • b = (a++) + n (def ordering of natural numbers)
  • b = (a + n)++ (def of addition)
    • successor of a number cannot be itself, so b does not equal a and by 2.2.11 a < b
    • We skipped steps, Tao is more thorough so let's try it again:
  • b = a + n and n does not equal zero (2.2.11)
  • Since n must be a positive natural number by 2.2.7 and 2.2.11:
  • b = a + (n++)
    • def of a positive natural lemma 2.2.10
  • b = a + (n + 1) (corollary from 2.2.2/2.2.3)
  • b = a + (1 + n ) (commutative law)
  • b = (a + 1) + n (associative law, grouping doesn't matter for addition)
  • b = (a++) + n by previous corollary of n++ is n + 1
    • a++ \(\le\) b since by 2.2.11
  • f) a < b iff b = a + d for some positive number d
  • For the sake of contradiction, assume a \(\le\) b in the a = b case for some positive number d. Then a + 0 = a + d by lemma 2.2.2, and 0 = d by cancellation law. This is a contradiction that d must be a positive number so a cannot equal b and a \(\lt\) b

Proposition 2.2.13 fill in the gaps:.

  • When a = 0, we have 0 \(\le\) b for all b (why?)
    • 2.2.1 Addition of natural numbers, 0 + m = m so 0 + b = b and by 2.2.11 0 \(\le\) b
  • If a > b then a++ > b (why?)
    • a = b + d (as per 2.2.12, d must be positive)
    • c++ = d (lemma 2.2.10)
    • a++ = b + (c++) (substitution)
    • a++ > b by def of ordering
    • or not so rigorous way:
    • a++ = (b + c)++ (def of addition, (b + c)++ must be positive
    • a = b + c (Axiom 2.4)
    • a > b (2.2.12 d is positive)
  • If a = b, then a++ > b (why?)
    • If a = b, then a++ = b++ by Axiom 2.4
    • a++ = b + 1 (corollary 2.2.2/2.2.3)
    • this is a++ > b by 2.2.11

Proposition 2.2.14 (Strong induction) From MIT this lecture on strong induction, the intuition here is Q(m) if all the dominoes are already knocked down and there's one left standing, then it too will be knocked down where 'weak induction' intuition is P(n) if you knock down one domino, they will all fall. If P(n) is true because P(n++) implies P(n) true, then for all m \(\le\) P(n), Q(m) is true. If you still don't get it search Youtube for another tutorial.

First look up vacuous truth or see the appendix on mathematical reasoning where Tao defines it.

  • Let Q(n) be the property that P(m) is true for all m\(_0\) \(\le\) m < n for n \(\ge\) m\(_0\).
  • Q(0) is true since:
    • m\(_0\) \(\le\) m < 0 is vacuously true as stated by Tao in the exercise hint
    • 2.2.14 proposition of for all n \(\ge\) m\(_0\) is true 0 \(\ge\) 0\(_0\)
  • Show that Q(n++) is true:
  • P(m) m\(_0\) \(\le\) m \(\lt\) n \(\lt\) n++ is true (2.2.11)
  • P(m) m\(_0\) \(\le\) m \(\lt\) n++ is true for all n \(\ge\) m\(_0\)

Try an example.

  • P(n) is the property that n + 0 = n
  • Q(0) is 0 + 0 = 0 and P(0)
  • Q(1) is 1 + 0 = 1 and P(1), P(0)
  • Q(2) is 2 + 0 = 2 and P(2), P(1), P(0)

Exercise 2.2.6: Your proof could roughly be 'Suppose P(n++) is true such that all P(m) is true for m \(\le\) n++. Then per Axiom 2.5 P(n) is true, so P(m) is true for all m \(\le\) n' since Tao hinted at 'backwards induction'.

2.3 Multiplication

Proposition 2.3.4 follow along with definition 2.3.1 of multiplication where (n++) x m = (n x m) + m also recall that n++ is n + 1: (n + 1)m so nm + m

  • left side:
  • a(b + c)++)) is a(b + c) + a
  • because (m(n++)) = m(n) + m
  • right side:
  • ab + a(c++) = ab + ac + a
  • ab + ac + a is in the form of the inductive hypothesis so a(b + c) + a

Remark 2.3.8 the contradiction is that ac = bc by corollary 2.3.7 when fed into the trichotomy of order can only result in a = b. This is a scheme to produce a temporary notion of division/cancellation.


Scratchwork and notes

Exercise 2.3.1 n x m = m x n:

  • base case: 0 x m = m x 0
  • left side: 0 x m = 0 by definition 2.3.1

For the right side, m x 0 = 0, write your own lemma that mimics 2.2.2 but for multiplication:

Lemma X: For any natural number n, n x 0 = 0:

  • base case: 0 x 0 = 0
  • IH: n x 0 = 0
  • Inductive step: (n++) x 0 = 0
  • (n x 0) + 0 = 0 (def 2.3.1)
  • Remove the + 0 (lemma 2.2.2)
  • Now it is in the form of the IH: n x 0 = 0

Lemma Y: m x (n++) is m + (n x m), rewriting Lemma 2.2.3 but for multiplication

  • base case: Our lemma X and def 2.3.1
  • IH: m x (n++) = m + (n x m)
  • Inductive step: m x (n++)++ = m + (n++ x m)
  • Left:
    • m + (n++ x m) by IH
    • m + (n x m) + m by def 2.3.1
  • Right:
    • m + (n x m) + m by def 2.3.1

Now we have to prove m x n = n x m

  • base case: def 2.3.1 and lemma X
  • IH: m x n = n x m
  • m x n++ = n++ x m
  • Lemma Y = def 2.3.1

Exercise 2.3.2

The hint is to prove the second statement first, if n and m are both positive, then nm is also positive. You could use Proposition 2.3.6 as a template for this:

  • If n > 0, and m > 0, then nm > 0
  • We have nm = 0 + d for some positive d
  • Since d is positive, n and m must be positive (or else 0 = 0 + d, a contradiction)

Or you could use Lemma 2.2.10, n x m = (b++) x m and then mimic 2.3.6 to determine n x m is positive if n and m are positive. Note that (b++) x m expands to the form (b x m) + b, and since you know a x b + a is the successor form of multiplication it can't be zero.

Lemma 2.3.3: n x m = 0 iff n = 0 or m = 0. You could try a contradiction using the nm > 0 proof we already wrote, so it must be n = 0 or m = 0 if nm = 0 or a contrapositive of Tao's statement: see the appendix on mathematical reasoning or look up contrapositive in Houston's book. If Tao states something, and then proves it, then there is a contrapositive statement we can use to prove another fact.

Exercise 2.3.3 A rewrite of 2.2.5: For any natural numbers a, b c, we have (a x b) x c = a x (b x c). You could induct on b, convert to the 2.3.1 form, and rearrange whenever you move something around and it is in the form of the inductive hypotenuse.

Tao also suggests using the distributive law:

  • Inductive Step: (a x b++) x c = a x (b++ x c)
  • (a(b++)c) = (a(b++)c) by distributed law
  • (ac(b++)) = (ac(b++)) by commutative law
  • (ac(b + 1)) = (ac(b + 1)) by corollary n++ = n + 1
  • acb + ac = acb + ac by distributed law again
    • cancellation law the two +ac:
    • acb = acb or (ab)c = a(bc)
    • or permute any way you want like we did for 2.2.5

Exercise 2.3.4

Prove the identity.

  • (a + b)(a + b) = (a + b)a + (a + b)b by distributive law/exponentiation
  • aa + ba + ab + bb by distributive law
  • a\(^2\) + ba + ab + b\(^2\) by def 2.3.11/Examples 2.3.12
  • ba + ab is ba + ba by commutative law
  • ba + ba is 2ba by def 2.3.1 where 2 x m = 0 + m + m
  • (a + b)\(^2\) = a\(^2\) + 2ab + b\(^2\)

Exercise 2.3.5

Prove proposition 2.3.9 by fixing q and inducting on n for n = mq + r and 0 \(\le\) r \(\lt\) q

For the base case, if 0 = mq + r, then m and r must be zero because this still satisfies the inequality where q is greater than r and remember we fixed q, so it can't be zero. For the inductive step, n + 1 = mq + r + 1 (adding + 1 to both sides as per previous lectures on induction), we have two cases, r + 1 < q or r + 1 \(\le\) q. If r + 1 < q then we have the inductive hypothesis and there's nothing more to prove since 0 \(\le\) r + 1 \(\lt\) q. Remember the order of multiplication/addition, this is still (number) = (number) plus n in the form of (n + 1) = (mq) + (r + 1).

If r + 1 \(\le\) q, or (r = q), then we have n + 1 = mq + q, and from the definition of multiplication/distributive law this is n + 1 = (m + 1)q or m++(q) + 0 and 0 still satisfies the inequality of r < q.


Before we learn some algebra and modern trigonometry, we should at least see some geometry. These lectures follow the John Stillwell book Mathematics and it's History where Wildberger slices various shapes with planes so you can see why we study things like a parabola or hyperbola. This is only casual viewing.


  • Pythagoras' Theorem 1 and Pythaoras' Theorem 2
    • A geometric proof of original Pythagoras theorem
    • A proof by contradiction why the square root of 2 is irrational
    • Part B shows classical cos/sin/tan trigonometry using chords


This is the geometry we will be doing shortly that allows for computational Euclidean geometry. We don't have a definition of polynomials yet but you can probably understand most of this lecture anyway.

  • Analytic Geometry
    • Cartesian coordinate plane explained
    • Part B covers periodic continued fractions method to estimate irrational numbers


Abstract geometry, project from one plane to another like 2D to 3D or into another field to reduce complexity.

  • 3D Projective Plane
    • Shows you different perspectives when looking at objects
    • Lay the x,y cartesian plane flat, add a z dimension like a flag pole, add another flat cartesian plane in z, project lines from x,y into z


Even more abstract geometry, construct a new plane by slicing a sphere and studying a hyperbola. Hyperbolic geometry seems better suited for deep learning as it has 'more capacity'. Maybe we will end up back here projecting our data into a hyperbolic plane.


Geometry of space curves, osculating plane

Some Algebra

Crash course in solving equations & polynomials

Flip through sections 1.3.1, 1.3.2 and 1.3.3 Review, Redo, Renew from this draft book used in CMU's 15-151/21-28 course. The author has a list of errata on his page. You will get a crash course in some quick arithmetic, solving linear equations and polynomials. There's a few interesting methods here you may not know about like how to variables from equations. Wildberger covers much of this material too for example Problem 1.3.8, an infinite nest is also in a continuous fractions lecture from the History of Mathematics playlist: \(x = \sqrt{2 + x}\)

If you're interested there is a summary of Algebra here that was originally made for non-english speaking university students. Tao will also define everything here as we progress through his analysis texts.


Some optional mostly 10 minute lectures on algebra basics, such as how Pascal's array/Khayyam triangle/Yang Hui's triangle encodes the binomial theorem, this series sets up understanding algebra and later calculus and combinatorics.

A different modelling of Polynomials

Wildberger has a unique encoding of polynomials in the rest of his math foundations playlist by abstracting coefficients into columns and rows. Pascal's array is then used, and things like factoring or polynomial division is surprisingly trivial to do. Later in the same playlist he teaches some algebraic calculus using this polynumber encoding and taking derivatives and doing integration is again trivial arithmetic. They could teach this to primary school kids. If you have time, I highly recommend watching the rest of the playlist.

Lectures about pi

Before we do trig, let's learn what \(\pi\) is. Wildberger talks about the history of pi such as Archimedes interval for pi, Newton's formula, a hexadecimal version of pi, his own formula. The end of this lecture he claims pi is some kind of complex object we still don't understand. At this stage we only need to know it's a constant, or the applied approximation of 22/7 or 355/113.

Trigonometry Boot Camp

Let's take the Brown Univeristy trig boot camp. There is a crash course lecture. I assume you know the basic gist of what an angle is which will be better defined when we take calculus.

  • Radians you could even make up your own notation and define it as dblpi = 6.283… and then 360deg is dblpi, 90deg is 1/4dblpi, 180deg is dblpi/2, etc.
  • We really only need to know sine and cosine functions, all others are derived from them
  • In the scaling of triangles example where \(\frac{1}{\sqrt 2}\) becomes \(\frac{\sqrt 2}{2}\) they are rationalizing the denominator:
    • \(\frac{1}{\sqrt 2}\cdot\frac{\sqrt 2}{\sqrt 2}\) is \(\frac{\sqrt 2}{\sqrt 2 \cdot \sqrt 2}\) which is \(\frac{\sqrt 2}{\sqrt 4}\) or \(\frac{\sqrt 2}{2}\)
  • You can download the slides to keep a copy of the scaled triangles
  • The reflections think about subtracting 1/6 from 1/1, you get 5/6 so 5pi/6 is pi (180 deg) minus 1pi/6, and a negative direction means we are moving clockwise so move to pi, subtract pi/6 in other direction.
  • The end of the lecture we finally discover why sin() and cos() are wave functions, since it's a function that moves around the unit circle. Tan() is the slope 'rise over run'

Summary: There exists 3 triangles, which we should know their measurements, and scale them down since this is a unit circle with maximum x and y values of 1. Then we superimpose the triangles in the pi/2 (90deg) angle space on the unit circle in such a way that we have known values of cos()/sin() and other trig functions, because we can derive those other functions from cosine and sine. We reflect these triangles into the other areas of the circle so we can get x/y values there as well by rotating around the circle and then adding up all the angles to indicate how much we rotated counter-clockwise (positive) and clockwise (negative). A reflection is demonstrated here so if f(x) = sin(x), cos(x) and tan(x) then y = -(fx).

Try the worksheet you can actually draw these out and count rotations if you want or use this radian to angle picture from Wikipedia. The equations if one side is squared, then you need to take the square root of both sides so it's (not-squared = value) instead of (squared thing = value-when-squared). If you're stuck watch the video that walks through all the solutions then do the worksheet again until you can finish it without needing to look at the video walkthru.

How I solved these: subtract 1 pi if it's bigger than 2pi, or a 1/2pi if smaller and see what you get, look at the 3 triangles if they have a corresponding value for sin/cos. If not subtract pi/3, pi/6 etc. If you're wondering why we are using radians instead of degrees it is explained here. Wildberger's rational trigonometry uses a turn, so pi/2 is 1/4, pi is 1/2, and 2pi is 1. Angles are replaced by 'the spread of vectors'.

Some Calculus

Now that we know trig functions, watch this basics of calculus crash course which was designed for non-english speaking students to learn math notation (with Chinese subtitles). Note in the last few examples how the derivative works d = \(nx^{n-1}\) so in the examples the derivative of \(y = \frac{-g}{2}t^2+v_0t+y_0\) is (\(2\cdot\frac{-g}{2}t^{2-1}+ 1\cdot v_0t^{1-1}\)) or \(-gt + v_0\) you take the exponent power, multiply the coefficient of the base, and subtract the power, so the derivative of \(-gt^1 + v_0\) is (\(1\cdot-gt^0 + v_0\)) which is -g, as we'll see later any terms without a variable (in this example t) cancels out or as Wildberger mentioned are so small so they are discarded.

Summary: if you know the acceleration, then you integrate to find the velocity, then integrate again to find the position. If you know the position, you take a derivative to find it's velocity, and take another derivative to find the acceleration. One function tells you how another function is changing. These are also higher order functions we learned in CS019, where a function produces another function.

Try reading the first chapter from Gilbert Strang's calculus and see if you understand it. His book was designed for self-learning, so that first chapter assumes little background and builds up this intuition of one function that tells you about another. There is also a lecture for this chapter and a self-study guide that acts as a recitation.

Set Theory

History of set theory

There is a lecture on the history of sets/functions, explaining ordinals, cardinals, various schools of logic, and a second lecture about the history of set theory with Godel and Turing.

Appendix A: the basics of mathematical logic

This chapter on set theory in Analysis I relies on the fact that you've already read the appendix on logic and have an idea what implies means or disjunction. Now that we have done some proofs ourselves, and the basics of trigonometry we can finally understand some of the examples he uses in this appendix. Tao makes a point that this chapter is meant to be read and understood not studied and memorized, and thankfully he is not bombarding us with piles of confusing logic notation that you normally see in other 'how to do proofs' books. At the same time you read this appendix, read How to Think Like a Mathematician by K. Houston it will clear up any cryptic definitions like how implication works.

A.2 Implication

A.2 Implication seems confusing on first read, it's modern language use relating to casuality is different from the logician's use which is explained here. The only permutation where implication is false is if p implies q and p is true, but q is false. See the Kevin Houston book for more details.

An optional short series on the history of various schools of logic that led to our present state.

Using Boole's Algebra w/implications

Another optional lecture from a short series on Boole's Algebra which is different than Boolean Algebra you've already seen (truth tables). For example at around 24:29, he shows how you can verify Modus Ponens (p implies q) with some very simple algebra instead of deducing symbols.

Appendix A cont.

Proposition A.2.3, confusing the hypotenuse "Suppose that…" by assuming the conclusion. We always assume the hypotenuse, and prove the conclusion. Proposition A.2.6, proof by contradiction, we know that when pi/2 (90deg) the sine function is 1, because remember it is the y axis of the unit circle. Tao also makes a note here to not use confusing logical notation in proofs.

Remark A.5.1 is in the above history of logic lectures about Aristotle logic.

Exercise A.5.1

  • a) For every x++, and every y++, y*y = x or x is the square root of y. Is this true? Square root of two is the obvious candidate for claiming this is a false statement since what y * y = sqrt(2).
  • Think about the rest, using the K. Houston book as you go to find a contrapositive

A.6 the point of the examples is to make a point about arguments that use outer variables depending on inner variables being false. We can come back later and reread this section when we learn more concepts like limits, indeed we will likely have to reread this entire appendix many times over while writing future proofs.

Quiz: Logic

These multiple choice quizzes are from Tao's Analysis class, try the logic quiz. If you struggle read the other two resources from Materials needed relevant chapters on logic, search YouTube for proofs by implication, or search 'discrete math inference rules' there are dozens of tutorials. Take the quiz again.

3.1 Fundamentals (Set Theory)

We are back to reading Tao's Analysis I and still in the realm of the natural numbers. Example 3.1.2 is similar to a nested list such as [list: 1, [list: 1, 2], 100] is length 3, even though there is a nested list overall it still only contains 3 elements. Remark 3.1.3 is in the above historical lectures on set theory. We are referred to Appendix A.7 Equality to read about the substitution axiom which you should have already read.

Lemma 3.1.6 if set A is equal to set (empty), then if x is in A it must also be in set empty, a contradiction, so A must be a non-empty set is how I interpret the lemma, which is a way to say that if an empty set exists, there must also be a possibility of a non-empty set. Remark 3.1.8 Cardinality if you watched the Wildberger History of Mathematics lecture linked above assigns a value.

Note the errata for Definition 3.1.4 from Tao's site: "Definition 3.1.4 has to be given the status of an axiom (the axiom of extensionality) rather than a definition, changing all references to this definition accordingly. This requires some changes to the text discussing this definition. Firstly, in the preceding paragraph, “define the notion of equality” will now be “seek to capture the notion of equality”, and “formalize this as a definition” should be “formalize this as an axiom”. For the paragraph after Example 3.1.5, delete the first two sentences, and remove the word “Thus” from the third sentence. Exercise 3.1.1 is now trivial and can be deleted." So this is now Axiom 3.1.4, but the definition remains the same.

Exercise 3.1.2 prove emptyset, {emptyset}, {{emptyset}}, and {emptyset{emptyset}} are all not equal.

  • If \(\emptyset\) = {\(\emptyset\)} then by Def 3.1.4 \(\emptyset\) \(\in \emptyset\) a contradiction to Axiom 3.2
  • Same reasoning as above for \(\emptyset\) is not equal to {{\(\emptyset\)}} or {\(\emptyset\){\(\emptyset\)}}
  • {\(\emptyset\)} is not equal to {{\(\emptyset\)}} because [list: 1] is not equal to [list: [list: 1]]. The first has element 1 in it, the second has element [list: 1]. More precisely if {\(\emptyset\)} = {{\(\emptyset\)}} then by Def 3.1.4 every element of {\(\emptyset\)} (it's only element is \(\emptyset\)) must be in {{\(\emptyset\)}} (it's only element is {\(\emptyset\)}) and vice-versa, so claiming {\(\emptyset\)} \(\in\) {\(\emptyset\)} which contradicts Axiom 3.3, where y \(\in\) {a} iff y = a. In this case y must = \(\emptyset\) to be in {\(\emptyset\)}.

Remark 3.1.21 clarifies the above exercise how {2} is not in {1,2,3} but 2 is in {1,2,3}. The rest of this chapter is straight forward definitions of set operations, better than the usual treatment of venn diagrams. Remark 3.1.26 distinct is defined as there are elements of one set which are not elements of the other so \(\emptyset\) and \(\emptyset\) are not distinct because neither can have elements. In Proposition 3.1.28 notice how any operation on two sets results in a new set, like a higher order filter function over a list results in a new list.

Example 3.1.32 looks a lot like programming syntax. There seems to be an order of evaluation, applying the axiom of specification first to create a new set, then applying the axiom of replacement so working right to left in {n++ : n \(\in\) {3,5,9); n<6}

Quiz Sets

Try Tao's quiz. If you totally failed this try secondary reading material of the other two books in Materials needed. Watch Youtube lectures and take the quiz again.


  • 3.1.3 prove the remaining claims in 3.1.13:
  • If a and b are objects, then {a, b} = {a} U {b}:
    • By (new axiom) 3.1.4 we need to show every element x in {a, b} is in {a} U {b}. Suppose that x is an element of {a, b} then by axiom 3.3 x = a or x = b. By Axiom 3.4 this means at least one of the following statements is true: x is in {a} or x is in {b}. We divide into cases. If x is in a, then by axiom 3.4 it is in {a} U {b}, similarly if x is in b, it is also in {a} U {b}. Thus in both cases we see every element of {a, b} is in {a} U {b}.
  • The next claim, prove the union operation is commutative:
    • Follows the same proof he did for the union operation is associative. Suppose x is an element of A \(\cup\) B. By def 3.1.4 and axiom 3.4 pairwise union then x \(\in\) A or x \(\in\) B. By disjuntion this is also x \(\in\) B or x \(\in\) A. So x is in B \(\cup\) A and x is in A \(\cup\) B.
  • Last claim, prove A \(\cup\) A is equal to A \(\cup\) \(\emptyset\) is equal to \(\emptyset\) \(\cup\) A is equal to A. Suppose x is an element in A \(\cup\) A, then x is in A or x is in A, so by disjunction (we can pick either, but there's only one choice) x is in A. If x is in \(\emptyset\) \(\cup\) A, x is in \(\emptyset\) or x is in A, so x is in the union of A and \(\emptyset\) and since the emptyset is empty x is in A.
  • 3.1.4 prove claims in Proposition 3.1.18
    • You can write both these proofs exactly like his 3.1.18 proof for transitivity. Suppose A \(\subset\) B and B \(\subset\) A. To prove that A = B we have to prove that every element A is an element of B, and every element B is an element of A by def 3.1.4. Let us pick an arbitrary element x \(\in\) A. Then, since A \(\subset\) B, x must also be an element of B. But since B \(\subset\) A, then every element of A is an element of B and every element B is an element of A. By 3.1.4 they are equal.
  • 3.1.15 Show how union, intersection and subset are logically equivalent:
  • Look in the Houston book in the appendix, see the advice for how to prove that two statements are equivalent. For this exercise if A is a subset of B, and the union of A and B = B, then this implies the intersection of A and B will be A, which in turn implies A is a subset of B.
    • We will show that any two set operations implies another. If x is in A \(\cup\) B, then by definition of union x is in A or B, so by disjunction x is in B. If x is in A, since A is a subset of B then x is again in B. Thus x \(\in\) A \(\cup\) B if and only if x \(\in\) B and by definition of union, and axiom of equality 3.1.4, A \(\cup\) B = B. Since x is in B, if x is in A \(\cap\) B it must also be in A, so x is in A and B, which is the definition of intersection, and if A \(\cap\) B = A, and if A \(\cup\) B = B, then every element of A is in B so A \(\subset\) B.
  • 3.1.16 Prove proposition 3.1.28
    • de Morgan's laws a proof exists for them everywhere including youtube.
    • Rough proof strategy: what you want to do is get your arbitrary element into the form of the definitions for union, intersection and difference sets.
  • 3.1.8 You can google the absorption laws (disjunction(OR) absorbs conjuction(AND)) then try your own proof
  • 3.1.9 If A intersection B = emptyset, then they are disjoint (have no elements in common). This is the definition of 3.1.27 difference of sets, where x is in A but not in B, so A = x - b and B = x - a if the union of A and B = X.
  • 3.1.10 All I can think of is the emptyset, since emptyset union emptyset is still emptyset, and emptyset intersection emptyset is also disjoint.
  • 3.1.11 There is a solution here, but suppose we have a statement P(x, y) pertaining to x and y such that for each x in A, there is at most one y for which P(x, y) is true. Then by the axiom of replacement there exists a set {y: y = f(x) for some x in A}, then y = f(x) for some x in A, and y is in the set of A precisely when P(x) is true, which is the definition of the axiom of specification. Not a complete proof but something like that is the strategy.

3.2 Russell's Paradox

Wildberger covered some of this already, 3.9 Axiom of Regularity is worth reading to see some more examples of what is and what isn't an element of a set.

3.3 Functions

Example 3.3.3 if you forgot range notation. Remark 3.3.5 clears up some notation for functions notably a sequence subscript also denotes a function. Example 3.3.9 asks why 3.3.7 asserts that all functions from empty-set to X are equal, because they have the same domain (nothing) and range.

Our first proof in this chapter is in Lemma 3.3.12, and if we can go by all the previous chapters, this will be our template for the exercises. It also doesn't make sense, so I checked the errata page and sure enough there is errata for this lemma depending what version of the book you're using.

3.3.14 See this tutorial. Write your own examples for this definition. If two distinct elements of A map to the same element of B, this is not injective. However if those two A elements are equal to each other then it is injective:

  • A = {a, b, a} (function input)
  • B = {1, 2} (output)
    • f(a) = 1
    • f(b) = 2
    • f(a) = 1
      • but {a} = {a}, so a = a and this function is injective.
  • A = {b, c}
  • B = {2}
    • f(c) = 2
    • f(b) = 2
      • but {c} doesn't equal {b} so this not injective

3.3.17 Onto functions, another tutorial and for Remark 3.3.24 here is a tutorial for bijective inverse functions

Quiz Functions

Another multiple choice quiz from Tao's analysis class. If you totally failed this try secondary reading material of the other two books in Materials needed. Watch Youtube lectures you can find. Try the quiz again.


A all of these proofs have a youtube tutorial or stackexchange answer, these are common proofs first year students do.

  • Ex 3.3.1 was deleted in errata (See Tao's site)
  • Ex 3.3.2 The proof for this is in the book
  • Ex 3.3.3 The empty function is always injective since it's 'vacuously true' that f(x) = f(x'). It is never surjective since there are no elements in the emptyset that fit the surjective definition of every element in Y is a value of f(x) of f(x) = y, and thus never bijective, since a function must be both injective and surjective to be bijective.
  • Ex 3.3.4 The tilde hat thinghy is just notation for two different fuctions, like f and f'. Cancellation law means we're trying to prove two different functions are the same so g = g' and f = f'. If f and g are injective, then g(f(x)) = g(f'(x)) and f(x) = f'(x), and by one-to-one definition of injective functions f = f' (since f is g's input). Surjective is similar, g(y) = g'f(x)) = g(f(x)) = g'(y) so g = g'. Does g need to be injective for this proof to always hold? Try corner case examples from the text, like if g = n2, then g(f(-1)) and g(f'(1)) have the same output, so two different inputs to g with same output means f(x) can't equal f(x') as per definition of one-to-one functions so we can't always derive f = f'.
  • Ex 3.3.5 If f is injective in g(f(x)), then g doesn't need to be injective. Consider an example A = {1}, B = {2, 3}, and C = {4}. f(1) = g(2) = 4 so f is still injective. The difference in 3.3.4 is the wording 'show that these two input functions are always equal if g in injective' and the second question 'Is this still true if g is not injective?' which means a single case where if g is not injective, and we get two non distinct inputs that equal the same outputg, the whole proof is false whereas here we are asked 'must g always be injective' so just needa single case of not needing to be injective.
  • Ex 3.3.6 Is described here see Remark 3.3.24 if you forgot what an inverse function is. Note how the inverse of f and f cancel each other out.
  • Ex 3.3.7 Similar to above and his association proof

Proof strategies

Watch this starting @ 36:00, from 41:00+ he shows how he solves problems step by step, trying small cases to understand the problem better and poking at the problem enough to finally figure out a proof. You can check your own proofs too, going through Tao's mathematical logic appendix and seeing if the opposite of your argument holds, and double checking all your assumptions 'is there a definition of X or did I just assume it existed'.

So tl;dr relax and just keep writing because the practice of flipping back through the text and rereading definitions and theorems, tracing the logic and seeing how one definition helps define another, and putting this in the form of notes (a proof) is how you learn. This is why none of these books contain solutions because you're supposed to figure it out yourself.

3.4 Images and inverse images

Chapter 3.4 of Tao's Analysis I. Example 3.4.5 makes more sense if you read Example 3.4.6 too, f-1({0,1,4}) is the set of everything that could result in {0,1,4} if f(x) = x2, so {-2,-1,0,1,2} and in Example 3.4.5, the set of everything in the natural numbers that can result in {1,2,3} if f(x) = 2x, is only f({1}). Axiom 3.11 (Union) if A = {{2,3}, {3,4}, {4,5}} and Ua = {2,3,4,5} why? Because it's {2,3} U {3,4} U {4, 5} you are performing union on every set, this is notation to avoid having to write out an enormous page of union operations. Search youtube for indexed sets. The power set you can choose which subsets you want to add, again search for this on youtube.

The exercises, we've already done similar exercises in set theory there's nothing new here. Reminder that millions of discrete and applied math students have already done similar exercises so you can look up proofs of function inverse images anywhere. I did enough of the exercises so I was sure I understood these set and function operations otherwise move on.

3.5 Cartesian products

Another topic with plenty of youtube tutorials. Definition 3.5.7 notation for product is introduced, it looks complicated but all it's doing is specifying an index (i is greater or equal to 1, and less than or equal to n) and repeated cross product. The exercises some of the solutions are walked through on youtube.

The cartesian plane itself is a cartesian product of R x R, you have tuples (x, y) or (x, y, z) so R x R x R. In programming the type int * int is the set Integers x Integers, anything that isn't a cross product of the integers with itself will not type check.

3.6 Cardinality of sets

A good tutorial. Tao is using #(A) notation which means finite cardinality, while |A| cardinality notation means infinite. There exists errata for the exercises specifically 3.6.8. The last exercise the pigeonhole principle if set A has greater cardinality than B, then f(x) A -> B can't be injective, since there would be two elements in A that point to a single element in B.

Programming sets

If you want you can re-do what we've done so far to get better at this material by programming sets using the book Discrete Math and Functional Programming. Everything we covered such as peano axioms, inductive proofs, cardinality, function images and even converting math notation into programs is covered and highly recommended.

Linear Algebra

We're doing Analysis I and Linear Algebra in parallel. These sets of lectures cover Linear Algebra from a geometric approach so later when we work through Tao's Linear Algebra notes we can understand what is happening.

Why this series and not X book/series

This is unique in that each linear algebra topic is setup with the problem it's trying to solve, how the special case of that problem can be solved, then how you would go about creating and proving a general case. The determinant is built up from scratch. The exercises are largely proofs of the lecture content presented at the end of each lecture. It frequently goes into much more advanced material you won't find in any other linear algebra course which is possible since we're just using a simple affine plane, and working with vectors. The idea is full understanding of the subject using visual geometry so later when we read piles of sums and notation for matrix manipulation we can understand what they are doing. This course teaches you to become 3blue1brown where you can make your own visualizations.

Introduction to Linear Algebra

He constructs the affine plane from scratch. The vector (8/3, 10,3) you need to full screen to better see exactly where they are. From B, you move right 2 lines, and 2/3 so \(2 + \frac{2}{3}\) which is 8/3. From there you move up 1/3, and 3 so \(3 + \frac{1}{3}\) which is 10/3. Reasons for using the affine plane are you do not impose anything on it artificially like the origin (0,0) in the Cartesian plane, you can do projections easily, and it's construction is simple, since only parallel lines.

The biggest a takeaway here is what vector direction is, what a basis vector is and how you can combine them to form larger vectors, and how to translate basis vectors to suit different perspectives. The exercises at the end the delta symbol will return in the lecture on area and the determinant.

In this optional related lecture he shows how ancient people made a ruler using just parallel line geometry, which leads to modelling the rational continuum. Some more optional (short) videos is this from 3Blue1Brown on what a vector is, and this presentation of basis vectors and what a span is. Change of basis is video 13 in the 3BlueBrown playlist but the explanation here by Wildberger in the first lecture I find is better.

Vector Arithmetic

All this vector arithmetic will come up later in other courses, at least now you will be somewhat familiar with it. Subtraction means going in the opposite direction, so if you're at the head of one vector and going to another vector's head, you are subtracting. The generalized affine combination where coefficients add up to 1 will come up again.

The proof for diagonals of a parallelogram bisect each other, note how quickly this can become a pile of symbols.

  • let lambda = mu in: lambda = 1 - mu
    • if lambda = mu, then moving mu over to the other side means 2lambda
    • 2lambda = 1
    • Divide each side by 2 to isolate lambda
    • lambda = mu = 1/2

@38:00 or so solving linear eq with 2 unknown variables we already covered in the first lecture or review Crash course in solving equations & polynomials.

Center of mass and barycentric coordinates

Since we just did Physics in the software workshop (CS019 Differential systems) let's learn about Archimedes principle of the lever, accelerations in the grid plane and Newton's laws. This will come up again, note the labs here for a course in applied math specifically Lab Manual vol 2, writing python fuctions that perform Barycentric Lagrange interpolation. This is used to study air pollution by approximating graphs.

Area and volume (determinants)

The fourth lecture in the series to lead up to determinants. You probably recognize ad-bc from the first lecture, which was one of the exercises. He uses different notation for bi-vectors since it represents the union of vector a and b. The torque example you have probably seen somebody using a long pipe over a wrench in order to increase the torque without needing increased force, now you know why increasing the area of the wrench increases the torque. All 3 physics examples can be modelled with a parallegram/bi-vector. The bi-vector rules (ie: adding a scalar to one side of a bi-vector doesn't change the bi-vector) will make sense around 35:50 when he produces a picture proof showing how moving things doesn't change the bi-vector. At around 38:00 we once again get the formula ad-bc, the signed area is a scalar of two basis vectors. 45:50 introduces three-dimensional affine space.

Now that you've seen this model of area using bi-vectors, see this 10 minute presentation by 3Blue1Brown which will fill in some more details, like why the signed area can be negative if you didn't already figure it out. The exercises are conceptually easy, you don't have to do all of them but the whole point of linear algebra is to take these geometric concepts and do them in algebra.

Change of coordinates and determinants

Lecture 5. Now we get a full intro to matrix and vector notation, and matrix operations, this is normally where linear algebra courses start. I like the fact that he built up the change of coordinates first so we aren't just presented with a meaningless array of numbers.

Exer 5.1 if you make a B matrix with e, f, g, h and then multiply AB, take the resulting matrices determinant. The piles of letters should equal det(A) x det(B) after you do some commutative rearranging, this is a 10 minute exercise.

2x2 matrices I

Lecture 6. I appreciate he starts all these lectures with a review, that makes them mostly self contained. This is something I haven't seen before, explaining all the arithmetic of 2x2 and higher matrices using just a 1x1 matrix. The inverse of a matrix is now simple to understand. He walks through some proofs of matrix laws like the associative law (AB)C = A(BC). The standard cartesian coord system is used here for examples of transformations like dilations and shear. A dilation is growing or shrinking, like your eye being dilated.

The end of this lecture: how do you visualize an arbitrary transformation? The next lecture expands on this intuition.

2x2 matrices II

Lecture 7. This tells you how to visualize transformations T(e1), T(e2) are the columns of a matrix, so you can look at it and see how it is being transformed. He briefly introduces an alternate framework of rational trig so we don't have to use those sin/cos functions to figure out a rotation. There is a book and entire playlist for rational trig, if you want to learn it write your own software library for rational trig as you go.

End of the lecture introduces non-linear approximations, using differential calculus, to set up doing calculus in higher dimensions. The exercises you become 3Blue1Brown, able to visualize transformations yourself.

Inverting 3x3 Matrices

Lecture 8. 3 dimensional linear algebra. Pretty good setup explaining the inverse of a 3x3 matrix, most books just give you a formula. We find out why delta is 1/delta. Transpose is also explained better than I've seen before. Reminder that these really complicated introductions all have an off the shelf formula, but he's showing us how they are derived fully so you understand the formula. 3x3 matrix multiplication is explained and pretty simple, explains why B(A(vec)) is the same as (BA)vec. If you're also doing the AI workshop, in 15-388 it's claimed B(A(vec)) is preferable in terms of complexity of running time to (BA)vec.

Integers and rational numbers

Introducing the integers

  • YouTube - Arithmetic and Geometry Math Foundations 12

Wildberger's definition is similar to Tao's, he walks through a proof of a-(b-c)=(a-b)+c

4.1 The integers

Chapter 4 of Analysis I by Tao. Definition 4.1.1 is exactly the definition from the above Wildberger foundations video. Lemma 4.1.3 proof you can finish most of it yourself without reading the whole proof. Lemma 4.1.5 Tao notes the same concerns Wildberger expressed why they both used the 'a less b' style of definition instead of just using Lemma 4.1.5 as the definition of integers, because of the many different cases that would be needed. Tao mentions this definition of the integers forms a commutative ring because xy = yx.


Proof scratchwork, these exercises are all similar to the Wildberger proof he gave in the lecture and Tao's example proofs.

4.1.1 Proving A.7 Equality again. From Def 4.1.1 equality is defined as: a–b = c–d iff a + d = c + b and the variables are natural numbers. Does a–b = a–b? In this defition: a + b = a + b so yes, since we already proved natural numbers obey the reflexive axiom. Does x–y = y–x or x + x = y + y and yes, x and y are natural number objects. If a–b = c–d, and c–d = x–y, does a–b = x–y? Converting everything to their respective natural number definition, and arranging on each side: a + d + c + y + a + y = c + b + x + d + x + b cancel and you're left with 2a + 2y = 2b + 2x, divide each by 2: a + y = x + b, converting this back into an integer this is a–b = x–y similar to Tao's proof in 4.1.1 of transitivity.

4.1.2 If -(a–b) = -(a'–b') are defined as b–a = b'–a' then b + a' = b' + a. This is the integer b–a = b'–a'. Now convert a–b = a'–b' to a + b' = a' + b and notice they are the same, and since natural number construction -(a–b) = -(a'–b') if a–b = a'–b'.

4.1.3 (-1) x a = -a to solve use Tao's advice in remark 4.1.7 to assign a = (a–b) or some other integer notation. In lemma 4.1.3 Tao writes 1 is equal to 1–0, so if want to negate 1, this is 0–1, and in full integer notation: (0–1)(a–b) = ((0a + 1b)–(0b + 1a)) which is (b–a), which is what (a–b) would be if -(a–b).

4.1.4 Prove the rest of 4.1.6, we already did x1 = 1x = x kind of in exercise 4.1.3, convert 1 to (1–0) and pick any arbitrary variables to represent x as (a–b) and this is easy to prove. Same with 0, use lemma 4.1.3 to convert to (0–0) so (x–y) + (0–0) = (x + 0)–(y + 0) or (x–y).

4.1.5 ab = 0 is (x–y)(r–s) = (0–0) or xr + ys = 0 which is true only if xr = 0 and ys = 0. So (0–0)(r–s) or (x–y)(0–0) or both are 0 since we are dealing with natural number constructions as per lemma 2.3.3 nm = 0 only if n=0 or m=0.

4.1.6 if a, b, c, are integers such that ac = bc and c is non-zero, then a = b. (a–a')(c–c') = (b–b')(c–c') and c is non-zero, then (a–a') = (b–b') or a + b' = b + a'. (ac + a'c')–(ac' + a'c) = (bc + b'c')–(bc'+b'c). This is ac + a'c' + bc' + b'c = b'c' + bc + a'c + ac' or factored to c(a + b') + c'(a' + b) = c(b + a') + c'(b' + a). Now we are the exact same place as Tao's proof in Lemma 4.1.3 where he wrote since a + b' = b + a' two sides are equal. If (c–c') was (0–0') then this would be 0 = 0 not (a–a') = (b–b'). Not a proof, but explains why the cancellation law works. If a + b' = b + a' we can also substitute, so c(b + a') + c'(b + a') = c(b + a') + c'(b + a'). Subtract one of the c(value) or c'(value) from both sides so c'(b + a') = c'(b + a') and now we can divide by c' to get b + a' = b + a', and since a + b' = b + a' then substitute back b + a' = a + b' which won't work if c' = 0. Not really the proof Tao wanted but it kind of works.

4.1.7 You can also do these in integer notation. If a,b,c are integers, then a > b iff a - b is a positive natural number. Ordering is defined as a = b + n and a does not equal b in strictly greater than a > b. Taking a = b + n subtract b from both sides, we get a - b = n. If n = 0, then that means a - b = 0 which is a contradiction to Tao's definition that a and b are not equal and thus a - b = some positive n iff a > b is how I read that exercise, the rest are similar.

4.1.8 find an example property, where Axiom 2.5 as stated in the book doesn't work for integers. The axiom relies on the fact that P(n++) is true so negative n can break many properties since n++ will be another negative number unless you state like Tao does 'for all positive integers'.

This chapter has solutions everywhere floating around on stackexchange and other sites. Remember the solutions you find online could be more difficult than necessary and if you tried to solve it yourself, maybe you come up with a better proof than they do.

Rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 13

This definition is similar to 4.2 in Analysis I. The inverse law, write your own example, the rational number 4/1 inverse is 1/4 and 4/1 x 1/4 is 4/4 or 1. A rational number is (a-b)/(c-d) according to it's definition but we write them as p/q and in some cases just p if q = 1.

Visualizing the rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 14

The rational number line sits on [0, 1], so you don't end up with a zero denominator which isn't defined. A rational number is a line through the origin [0,0], and an integer lying on point [m,n]. Ford circles are an example of the massive complexity of the rational numbers, which are expanded upon in the next lecture.

The deep structure of the rational numbers

  • YouTube - Real numbers and limits Math Foundations 95

This is about 35m long but worth the watch if you want to see what a convex combination is, learn about rational intervals, and a more complete description of Ford Circles set up with Farey sequences. This will help us understand what a sequence is later in Tao's book. The overall takeaway of this lecture is that the smaller the interval you take on the rational line, the more complex the numbers become with enormous denominators meaning it has a very deep structure.

The next videos, #96 and #97 about Stern-Brocot trees and wedges are highly recommended and can also be found in the book Concrete Mathematics. In Lecture #97, which we can now understand that we know basic linear algebra, he takes a visualization on the plane of the Stern-Brocot tree and shows it also can be used as a geometric interpretation for the rational numbers, it's kind of crazy when you see it. The 'fraction' 1/0 is also explained.

4.2 The rationals

Tao's Analysis I chapter 4.2, definition 4.2.1: 3//4 = 6//8 = -3//-4 we know from Wildberger's visualization of the rational numbers, these numbers all lay on a line. Lemma 4.2.3, in the proof of addition getting rid of +bb'cd is a simple case of subtracting it from both sides, cancelling it out leaving ab'd2 = a'bd2 and he can divide both sides by d because d is known to be non-zero, since it's in the denominator of a/b + c/d.

To recap we defined the natural numbers, embedded them in the integers (a - b) to define integers, then embedded integers inside the rational numbers where the arithmetic of the integers is consistent with the arithmetic of the rationals. The rationals are a field because of the last identity in 4.2.4 the inverse algebra law. Tao's example proof is similar to the proofs we've seen Wildberger do, convert the letters into integer // integer form, apply law, commute the results together so that they are the same.


4.2.6 in my copy must be a mistake 'show that if x, y, z are real numbers…' that must be 'rational numbers' and sure enough checking Tao's errata page it is a mistake. There's a mistake in exercise 4.2.1 as well depending which version you're using.

  • Ex 4.2.1

We have the usual exercise, show that the definition of equality for rationals is reflexive, symmetric and transitive. We go into the Appendix A.7 Equality and see that the Reflexive axiom is given any object x, we have x = x. In Definition 4.2.1 equality of two rational numbers is defined as a/b = c/d iff ad = cb. If a/b = a/b then ab = ab thus reflexive. Symmetry axiom, if x = y then y = x. If a/b = c/d, symmetric equality is c/d = a/b. By definition 4.2.1 a/b = c/d is ad = cb, and c/d = a/b is cb = ad so Symmetry axiom satisfied.

The hint for proving transitivity is to see Corollary 4.1.9 the cancellation law for integers: If ac = bc, and c is non-zero, then a = b. In A.7 the Transitive axiom states if x = y and y = z then x = z. If a/b = c/d, and c/d = e/f, then a/b = e/f or if ad = cb, cf = ed, then af = eb. We can also rewrite this as adcf = cbed and simplify to af = eb. Staring at this we isolate af(cd) = eb(cd) so the rest of the proof involves logical deducation to determine that c and d aren't zero so we can cancel them from both sides, or through this deduction infer another equality of ab = ef, there is a few ways you could prove transitivity. For example we know that d can't be zero since c/d wouldn't be a rational number. If ad = 0 since we know d is positive, a must be 0. If c is zero, then cf = ed is 0 = ed and again d must be positive so e = 0. So now we have a and e as possible 0, so af = eb is 0 = 0 or: 0/b = 0/f and transitivity is proven though in a convoluted sort of way using deduction and making a lot of implications.

The transitivity proof is a great example of 'just naively write your proofs until you get better eventually'. We used Tao's hint of finding numbers than can or can't be zeros and wrote a proof that is probably not what he would've done, but at this early stage who cares, it works until we learn about proof theory and fortify our logic skills and in the process by manipulating around rational number equality you won't forget the definition that a/b = c/d if ad = cb, which to me is the point of doing proofs, to remember the material.

  • Ex. 4.2.2

Prove whatever remains to be proved in Lemma 4.2.3, let's try the negation since the multiplication is simple. If a/b = a'/b' then ab' = a'b is the setup Tao gives us. If -(a/b) = (-a')/b' then by def 4.2.1 we get (-a)/b = (-a')/b' which is -ab' = -a'b and multiplying each side by -1: ab' = a'b holds. Lemma 4.2.3 multiplication: If (a/b * c/d) = ac/bd and (a'/b' * c/d) = a'c/b'd we need to prove ac(b'd) = a'c(bd) by manipulating it into the form ab' = a'b since a/b = a'/b'. We can just do what Tao did, cancel c and d from both sides, leaving ab' = a'b however we should make sure they aren't zero, using the same kind of deduction we used to prove transitivity. We know d can't be zero since c/d would mean it's not a rational number if zero. Another strategy of course is to just add a scalar to the original ab' = a'b of (cd) on each side, which is fine, and solves the proof, providing you point to the definition in the book where a = a is the same as ab = ab.

  • Ex. 4.2.3

Prove the rest of Proposition 4.2.4. The first is the same as Tao's demo proof. Scratch work, you should try all these yourself they are simple:

  • x + 0 = 0 + x = x
    • a/b + 0/1 is (a*1) + (0*b)/b*1 or a/b
    • 0/1 + a/b is (0*b) + (a*1)/b*1 or a/b
  • x + (-x) = (-x) + x = 0
    • a/b + (-a)/b is (ab + -ab)/bb = 0/bb or 0
    • (-a)/b + a/b is (-ab + ab)/bb = 0/bb or 0
  • xy = yx
    • a/b*c/d is ac/bd
    • c/d*a/b is ac/bd
  • (xy)z = x(yz)
    • (a/b*c/d)e/f is ac/bd*e/f or ace/bdf
    • e/f(a/b*c/d) is e/f*ac/bd or ace/bdf
  • x1 = 1x = x
    • a/b*1/1 is a/b and also 1x as per xy = yx which we just proved
  • x(y + z) = xy + xz
    • simple task to do on paper a/b(c/d + e/f) = (cfa + aed)/bdf
    • a/b*c/d + a/b*e/f = ac/bd + ae/bf or (acbf + aebd)/bbdf and cancel one of the b's which we know can't be negative since it's in the denominator
  • rest same as above
  • Ex 4.2.4

Prove Lemma 4.2.7 is exactly the same as Tao's proof for Lemma 4.1.5 of the integers just substituting rational numbers and their definitions.

  • Ex 4.2.5

We've already proven propositions of order in naturals and integers, this is similar but with a/b < c/d

  • Ex 4.2.6

If you multiply -1 to both sides negating them, then 1 < 2 will end up -1 > -2 which is what should happen since -1 is not as small a qty as -2 if our interpretation is a number line. Proof is somewhere in Def 4.2.8, if x < y then x - y is a negative rational number so a/b - c/d = (-e)/f. Then come along with negative rational z: (-g)/f and multiply both sides, we get a positive number so x > y, at least in scratch work I did until I ran out of time.

Decimal numbers

The next chapter in Tao's Analysis I uses decimals to represent tiny positive quantities (epsilon) so we should probably know what decimals are.

Wildberger has a basic introduction lecture using natural numbers that assumes you know nothing about exponents and reviews the hindu-arabic notation, shows how to convert a decimal to a fraction. The second lecture introduces rational number decimals and their visualization.

His definition is a decimal number is a rational number a/b, where b is a power of 10 (hence deca/decimal), then shows various examples how you can manipulate rational numbers like 1/2 to be 5/10, or 0.50, we also find out why 1/3 is not a decimal number. In binary, it's a base-2 system, so the only prime factor is 2. If you load up any language interpreter and try and add 0.1 + 0.2 you will get surprising results, typically this number:

The examples are pretty good, showing how the hindu-arabic encoded powers of 10 drop down by the divisor and result in a decimal. The visualizations are for later lectures showcasing the complexity of bigger and bigger decimals. At 36:00 arithmetic w/decimals is covered, showing how the rational number and decimal number arithmetic methods are the same so you can just stay w/rational numbers if you want converting back to decimals at anytime.

Appendix B: the decimal system

From Tao's book Appendix B.1 The decimal representation of natural numbers his opinion is that the decimal system is not essential to mathematics, computers have replaced it with binary and 'we rarely use any numbers other than one-digit fractions in modern mathematical work'. Of course to understand scientific notation we still need to know how decimals work and we just did with those Wilberger lectures so this chapter is optional (we can't do it anyway before learning the the real numbers). The single exercise in B.1 is interesting, 'the purpose is to demonstrate that the procedure of long addition taught to you in elementary school is actually valid'.

4.3 Absolute value and exponentiation

Def 4.3.1 you apply that def to the rational number, so if x = -3, the absolute value is -x so -(-3) which of course makes it a positive rational number, as seen in the example of 4.3.2 where the absolute value of 3 - 5 = 2.

Def 4.3.4 that symbol is epsilon, meant to be a tiny quantity he defines as greater than 0. The example 4.3.6 if two numbers are equal, and their difference is zero, then they are 'epsilon close' for any positive epsilon, since their difference is not bigger than any positive epsilon.

Proposition 4.3.7 recall that all these definitions are for d(x, y) or d(z, w) so x - y or z - w. His proof he moves xz to one side, so it becomes negated (subtract from both sides). He then manipulates |az + bx + ab| according to previous definitions to get |a||z| etc., and then uses the various propositions in 4.3.7 to rewrite them as eps|z| + delta|x| + eps(delta). This will make sense after we attempt to prove the rest of the propositions.

Rest of this chapter covers exponents and their properties.


4.3.5 Prove that \(2^n \ge n\) for all positive integers n using induction.

  • base case: n = 0
    • 20 = 1 and since n = 0, 1 > 0.
    • but Tao specifically wrote 'positive integers' so base case must be 1
    • 21 = 2 and since n = 1, 2 > 1
  • inductive step:
    • \(2^{n + 1} \ge n + 1\)
    • left side: \(2^{n + 1}\) is 2n * 2 by def 4.3.9
    • 22n is also 2n + 2n by prop 4.3.10 (a)
    • so now we have 2n + 2n \(\ge\) n + 1
  • assumption:
    • since 2n \(\ge\) n, then 2n + 2n \(\ge\) n + n
    • since we are working in only positive integers, we know n + n is \(\ge\) n + 1
    • 2n + 2n \(\ge\) n + n \(\ge\) n + 1
      • and now here we have standard transitivity proving \(2^{n + 1} \ge n + 1\)

This of course falls apart if Tao changed it to 'for all integers' instead of positive integers, or if we allowed our assumption 2n > n to have n = 0, then we could not assume n + n is \(\ge\) n + 1. The rest of the induction exercises are similar, like proving Prop 4.3.10 so inducting on n: \(a^{n + 1}a^m\) = \(a^{n + 1 + m}\) the left hand side becomes \(a^n \cdot a \cdot a^m\) which is also the definition of \(a^{m+1}\) so can be rewritten \(a^na^{m+1}\) or \((a^na^m)a\) so in form of the hypotenuse, we change it to \((a^{n+m})a\) which is of course \(a^{n+1+m}\) and now both sides are equal.

4.4 Gaps in the rational numbers

Proposition 4.4.1 there is an integer n, and a rational number x, such that n <= x < n + 1 so sandwiched between integers 2 and 3 are rational numbers such as 25/10 or 100/40 etc.

Proposition 4.4.3 there is always some rational you can squeeze in between two other rationals as we've already seen in Wildberger lectures. 1/800 < 1/500 < 1/100 etc. If you want put in examples for his generalized proof to better understand it like x = 2, y = 3 and z = (2 + 3)/2.

Proposition 4.4.4 famous proof there does not exist a rational number x for which x2 = 2.

Limits and Cauchy sequences explained

In chapter 4.3 we're already coming across the notion of epsilon that Tao is trying to set up for teaching us limits and the construction of the real numbers. It will be extremely helpful to watch parts of the next few lectures to see the architecture of what we will be doing so things like absolute value distance within epsilon makes sense.

What exactly is a limit

  • YouTube - Real numbers and limits Math Foundations 106

Watch the first 20 minutes of this lecture and you'll see what the notion of an epsilon displacement is. Note that Tao will define all these things for us later, but at least you'll have an idea of what epsilon is. The rest of the lecture is excellent covering a different definition of limits but optional.

2D geometric interpretation of Cauchy sequences

  • YouTube - Real numbers and limits Math Foundations 94

Watch from 2:20 to 16:00, he shows the 2D geometric interpretation of Cauchy sequences, and how they are interpreted as 1D on the real line, the idea of 'converging sequences'.

Real numbers and Cauchy sequences I

  • YouTube - Real numbers and limits Math Foundations 111

He explains the triangle inequality around 8mins, we just read about this in chapter 4.3 of Analysis I (and is one of the exercises). He's not kidding about the huge and unweildly uncountable sets of sequences we will see these soon enough in Tao's book.

Real numbers and Cauchy sequences II

  • YouTube - Real numbers and limits Math Foundations 112

An excellent lecture why we even need these complex limit and real definitions in the first place, showing how limits make many subjects in analysis much easier. Archimedes bracketed measurement is covered when he tried to figure out the approximation of pi. We also get an introduction to transcendental functions as a geometric series, something we will see in Analysis II by Tao in detail.

The next lecture in the series is optional but Wildberger presents an alternative definition of real numbers, using Archimedean style of nested intervals where equality, algebra laws, order etc are all trivial to prove.

Linear Algebra cont.

3D affine geometry

We're watching this lecture: Three dimensional affine geometry | Wild Linear Algebra 9. He actually has a physical model of 3D affine space and shows how it projects to 2D. 3D (x, y, z) coordinates are explained with an interesting question 'how to model space?'. \(A^3\) space and \(V^3\) space are defined, insersecting and parallel planes are illustrated.

Equations of lines and planes in 3D

As typical with Wildberger lectures this goes far into more advanced material like Mobius bands to model all the lines in the plane, nobody ever showed me this in my linear algebra classes.

Lecture 10. Wildberger goes through different equations for lines: functions (calculus), cartesian and parametic. His example of cutting the Mobius band and not disconnecting all the lines reminds me of a lecture from Erik Demaine's advanced datastructures class and his fascination with origami. We get a new operation dmin(), when he finds the 'determinant of minors' you can see how easy this would be to write these operations as a program, and many optimized libraries already exist for doing so we won't be doing this by hand, this is to understand what those libraries do.

50:32 you can pause the lecture and easily calculate the dmin yourself for practice, in fact whenever he does some operation like finding the determinant see if you can do it yourself first. 51:39 the minus signs in 'the odd places', if you forgot this rewatch lecture 8. Add up the indices, for example the \(a_{1,2}\) index so 3 thus odd. The \(a_{2,3}\) index adds up to 5. Multiply those positions by -1 and any other odd indices.

Applications of 3x3 matrices

Lecture 11 the algebra of 3x3 matrices. A diagonal matrix, with constant diagonal, is a model or copy of the field of rational numbers, every number has a constant diagonal matrix. He goes through various transformations like reflections, projections and rotations in 3D. He shows the standard cos/sin rotation then rational trigonometry rotations, which makes rotations easier to manipulate when we are just learning, the basic rotations using sin/cos are the same.todo

Generalized dilations and eigenvalues

Lecture 12, change of coordinates.todo

Po-Shen Loh

Let's see what CMU professor Po-Shen Loh's sites are doing these days, since he's the current national coach of the US olympiad team who has won both the 2018 and 2019 competitions (and even wins the math competitions for former medalists) so surely there will be something interesting for us in his content.

Daily challenge

The marketing indicates it's curriculum is primarily targeted AMC8 to AMC12 (depending on module) middle school competition difficulty emphasizing how to solve problems without requiring memorization. The idea is that everyday for a month you do a challenge problem and then has numerous follow up lectures explaining the challenge as each problem is designed to teach several topics at once. Even though this is directed at kids the content is sophisticated as it's math competition techniques. The pre-reqs are you know everything from pre-algebra and 'are already experienced in math competitions' so things like the math olympiads for elementary and middle schools.

Singing up for free demo tier requires all kinds of age verification bs like $0.33 non refundable deposit of proof of ID for various regulatory reasons since this is a jr highschool targeted site so I will save you the step and look around.

Module 3: Combinatorics Tools he specializes in this so should be a good module it covers graph theory, set theory, prime factors, rotations and reflections, factorials, problem complexity reduction, recursion, case analysis. First challenge is 'Ordered Arrangements' how many permutations of 4 objects can you make in a certain way, dropping the hint you can do it by brute force. The challenge explanation video is a good set up for n choose k, he describes factorials as diminishing choices. He models the same problem with a tree then describes how to just deduce the entire problem is 4!/2.

Module 1: Algebra Basics covers harmonic average, probability, recurrence relations, estimation and everything to do with understanding ratios and fractions. The challenges are all done with pen and paper. The first challenge is 'Fraction Gymnastics', we're asked to multiply 2.5 x 0.25 x 1.333..x 1.2 which since we've seen Wildberger decimal lectures we know how to convert these to fractions and that 1/3 is .333 repeating. After you convert these to fractions and solve the problem he has a series of tutorials explaining the challenge. He gives a short lesson on cancelling. He explains why these fractions were picked, any num/2 is a 0.5 decimal, num/4 is 0.25 decimal, num/3 will be a repeating infinite decimal, and num/5 will be a 0.2 decimal so you can quickly figure out their decimals in the future if confronted with an enormous decimal in a competition. Long division is used to example why 1/3 repeats forever. More challenge problems are given and the 'weekend' challenge problems are AMC10 level.

Module 4: Algebra Tools first challenge is 'Algebra Tricks' a percent question, a store adds 35% to wholesale value, if you apply a 35% off coupon how much do you save off the wholesale price, which boils down to what is 35% - 35%. He shows how to do this (1 + 0.35)(1 - 0.35) which is of course (x + y)(x - y), and tricks to quickly square any 2 digit number like 0.35. The setup for this is quite good, he really is teaching multiple concepts all with a single problem.

Module 5: Number Theory Tools contains mainly topics you find in a first year 'intro to proofs' style course. One challenge question is "How many squares less than 10,000 are congruent to 1 mod 10?" and this is easy to answer after you see his explanations. I don't remember learning any of this when I was in middle school.

Should you do this if you aren't in Grade 7? Maybe, some of those AMC10 problems are pretty difficult and I know plenty of adults who wouldn't be able to figure out the decimal to fraction challenge let alone adding up all the squares from 1 to 100 in the Number Theory Tools module. Tao's book Solving Mathematical Problems contains some similar competition tools or you could buy the Art of Problem Solving series books if you wanted but I find Poh's style here is superior in every way. There is of course the Don Knuth series The Art of Computer Programming if you want hard problems with solution explanations to try everyday.

The university level competition is the Putnam and he recommends the book Putnam and beyond Razvan Gelca, Titu Andreescu which flipping through it has a good crash course chapter in probability.

This is his free site however the original Expii looked like this but you won't for some reason find that listed anywhere on their main page, and all the practice problems are gone too. The original concept was Expii is a personal tutor to fill missing gaps in the prereqs dependency chain though since this part of the site disappeared off the main page I'm assuming it's depreciated and being replaced with Expii solve.

Live sessions

Every Friday from 4:00pm – 5:00pm EST has a live session of the same content aimed at highschool competitive algebra and geometry where you can ask him anything, for example in 07/08 Wed stream he shows how to solve \(3^{x^2 - 2xy}\) = 1 where x2 = y + 3, we can do this too because we've read Tao's book but he always has some interesting optimized method being a national olympiad coach, like solving nested absolute values with case analysis.

Here at 41:13 he answers even nonsense questions giving a tutorial on logical implication. Somebody after asks him what should students focus on breadth or depth? He answers that you get both by doing old problems from math competitions.

A list of questions asked and answered in all these live session are on the forum which you can't view without registration, and if you register you can't post unless you're in the Daily Challenge which is probably a good idea since it's all young kids you don't want to post there anyway.

New method to solve quadratic equations

This is an interesting method he found while we was working on Daily Challenge content trying to better explain quadratic equations without having to use the quadratic formula. The full version is here. Wildberger did something similar, rewriting the quadratic formula using Newton's method of approximation to make it easier for students.

21-738 Extremal Combinatorics

Prof Loh recently put most of his PhD course 21-738 Extremal Combinatorics up on YouTube which uses similar teaching style of his live sessions. It's a theoretical course, if you've never seen one you can still follow along knowing only what we know now combinatorics has low barrier to entry because it's all finite math/sets. These courses aren't always linear they will on different days cover different theorems. Not that we expect to be able to solve hard problems in combinatorics or anything but it's worth watching to see how he plays around with structures and objects to test their properties in a theory course. This course has interesting content like Ramsey Theory where every large enough system no matter how chaotic it appears there must contain large structured subsystems. You can conjure up things with probability to show they exist, then use extremal techniques to 'derandomize' to reveal a deterministic solution.

You can get a crash course on graphs here and that same playlist has a beginner probability summary lecture good enough to follow along many of the lectures if you wanted.

Linear Algebra Explained

This is a professional development seiminar for teachers given by Prof Loh on linear algebra specifically why is the formula for matrix multiplication constructed the way it is, and how to demystify the determinant. I audited them to see if we could use them here, the determinant lecture is definitely worth your time.

Matrix multiplication

  • Session 1 of 3 - Why is matrix multiplication done in 'strange' order?

He begins talking about linear functions, which we know are functions that grow linear in size, using function composition as a model so g(f(x)). He writes his matrices using a full box/rectangle something he says he was taught in university to distinguish regular variables from matrices assigned to variables, like AB is actually matrix A x matrix B so he will draw a box around it to 'see it's shape better'.

1:06:00 or so he traces a variable from one function composition to another, distributes the variable at the end and shows that the multiplication you're doing at the end is exactly the same procedure of matrix multiplication, this part is worth watching if you're interested.

1:43:00 he demonstrates how Fibonacci sequences can also show why matrix multiplication is associative, also a method to rapidly find any Fibonacci index using an order of magnitude estimate. He claims this method is superior to the typical golden ratio method because you're no longer carrying around square roots of 5 and can get exact solutions.

He ends with 'some students ask me in this modern day when there is big data and all these algorithms (machine learning) what kind of math will be the most useful for them to learn, my advice is linear algebra because it's convenient for modeling these things with a linear approximation'. He claims he had to take linear algebra in university twice to understand it because what he was taught in highschool about solving systems of equations was totally different than the abstract linear transformations in university.


I couldn't find session 2. The idea of this lecture is how teachers can demystify determinants to students, pointing out (signed) area and determinant are the same thing. If you can watch the whole lecture.

Prof Loh models the determinant as the area of parallelograms which we already know from the Wildberger lectures. There is a very long setup of 2x2 row operations getting areas of parallelograms and the determinant formula ad - bc just pops out of the last row operation. Signed area is then talked about like negating it, meaning 'flipping' the area, reflecting it across the origin.

He shows a student hack he saw at 1:37:10 or so for figuring out the determinant of a 3x3 matrix where you add on two extra columns to diagonally group and multiply, which is useless for 4x4 or NxN matrices. He explains the method he learned in school and still uses is the same method Wildberger taught us of blocking a row and column, taking the remaining 2x2 determinant and alternating their signs to a(ei-fh) - b(di-fg) + c(dh-eg). He says this method we use is scalable and works for all dimensions.

From about 1:44:00 he shows how to explain to students the standard 3x3 det() formula: aei + bfg + cdh - ceg -afh - bdi. He explains to find out himself he tested various matrices using linear functions, so holding some items in a vector fixed and let a few things vary, so basically testing it like you would a program. Scaling a linear variable down to 0, if it's 0 then the area should be zero too, this means there can't be any constants left floating around the formula or you will get nonsensical area. In the q&a after he clarifies this.

He then takes every permutation of a row and shows that anything that multiplies two entries on a single row must also disappear from the determinant formula or else scaling breaks, the determinant formula multiplies a scalar twice in one vector.

Products from the same column breaks the formula as well. Recall that a 3x3 matrix is essentially 3 row vectors (x, y, z) so if you multiple anything in the same column, then you're exploding all the x vectors or z vectors while the rest stay constant, so you are stretching the signed area.

Finally why is there alternating signs. Take a standard abcdefghi 3x3 matrix and take the 3x3 determinant formula: aei + bfg + cdh - ceg -afh - bdi. Make a matrix for each one using 1's and zeros. So aei will be a diagonal matrix of 1's and the rest zero:

aei \(\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}\)

Compare the negative matrices: afh, bdi, and ceg to the diagonal matrix aei. Notice that you are flipping two of the 3 vector rows, so for example aei if you exchange ei with each other one time it becomes afh. If you switch ai in aei then you get ceg, another negative matrix in the determinant formula. In order to convert aei to a positive matrix like bfg, you need to flip twice. So it's revealed the negative matrices in the det() formula are one's you only flip an odd number of times, and positive even number of times, from the aei diagonal matrix.

2:16:30 he covers questions and gives more examples of single variables breaking the formula, column stretching.

Most interesting part of this lecture is how he went about investigating the determinant formula, poking at it and experimenting around to see what happened when he broke the formula, modeling the formula using linear functions to see what happened, which is how you learn math from what we've seen already in Tao's blog posts and the book How to think like a mathematician. In a way this is also how you can learn some software you find in the wild with no documentation, disable some things and see what breaks. In other words poke at it to discover it's behavior.

Proof Theory

Robert Harper has explained in lectures that CMU students often ask the staff how do they know if their proofs are correct, learning basic proof theory will help answer that question. In constructive logic a proof is just an object like any other math object. We will learn natural deduction and how to read/build a proof tree. Finally we're going to learn how to write all our proofs using Agda a proof assistant, you can enter in the proofs you did for Tao's book and check them yourself.

  • First these notes of 3 lectures given by Per Martin-Lof's to understand why we are learning this and where all the terms come from. It's 50 pages but there is a lot of notation eating up space and the margins are huge, maybe 30 pages of text in total. If you can print this out it's comfy casual reading you can do in the evening without frying your eyes with backlight looking at a screen.
    • We already know what conjunction is
    • That weird formalism notation is turnstile and can be read as 'entails', and hilariously the wikipedia article also suggests we read the very paper we are reading to learn about it
    • line isn't divide or a fraction, these can be called rules and read: if everything above the bar (the premises) is true, then the thing below the bar (the conclusion) is true.
    • If you haven't figured it out: Gr. (word in greek) or Ger. (word in german) is the notation used
    • TODO, will mix these w/proof theory lectures.