Math Foundations from Scratch


This is a (ongoing) workshop to learn the mathematical maturity needed for various prereqs in university AI intro courses, and is part IB of the AI Tripos. No background is required since we will build everything from scratch. The idea here is that you do this workshop until you can figure out the courses in the AI workshop and then can either continue doing both or switch.

Do 1% effort for 100 days - Prof Ryan O'Donnell.


  • Foundations:
    • Some basic foundations
    • Terence Tao's Analysis I book to learn mathematical logic and the construction of the real numbers
    • A geometric course in Linear Algebra to visualize for yourself what is going on
    • Some workshops from Poh-Shen Lo on matrix algebra modeling and understanding the determinant
  • Logic:
    • Constructive logic and a seminar on proof theory which is a way to model a proof so you can then check your own work or enter it into Agda
  • Practice:
    • CMU Putnam seminar and book Putnam and Beyond to learn strategies and practice them
    • 15-751 CS Theory Toolkit to learn grad research tips, and to better understand complexity

Study advice

At first, your proofs will just be complete sentences that are a convincing argument to yourself. You don't need to have them verified or a solutions guide or anything. The act of re-reading definitions, experimenting around with the examples, and trying to deduce the logic to write your own proof is how you learn math. Remember when you learned to read and write, at first you just naively scribbled unaware your composition skills were terrible until you took more advanced grammar classes in school, and you began to read other people's writing to compare to your own. Then through experience you got better. This is same deal for proofs, we will eventually see and practice enough of them in the Putnam seminar you will get better at this skill.

Make your own examples

If you write all your own examples everytime you come across something being defined, many of the exercises will be trivial to complete. Write them like you would example tests in the software workshop, find all the corner cases and see if the definition holds. Remember why we write examples in that workshop, it's to learn more about the problem domain since there may be subtle things that escaped our understanding. It's the same deal here, and usually those subtle things end up as exercises so if you've already done your own examples you already know the answer.

Start here

We start with some 10 minute lectures on basic arithmetic, from the point of view of a mathematician.

The natural numbers

Presented here is a definition of a mathematical object called a natural number, as a string or sequence of strokes, which are successors of each other. ie: the successor of a number, is another number.

  • YouTube - Arithmetic and Geometry Math Foundations 1 *note these videos eventually get better in quality

Arithmetic with natural numbers

  • YouTube - Arithmetic and Geometry Math Foundations 2

An interesting explanation of multiplication.

  • Laws for addition
    • n + m = m + n (Commutative)
      • Changing the order of the elements.
    • (k + n) + m = k + (m + n) (Associative)
      • Changing the grouping of the elements.
    • n + 1 = s(n) (Successor)
  • Laws for multiplication
    • nm = mn (Commutative)
    • (kn)m = k(nm) (Associative)
    • n \(\cdot\) 1 = n (Identity)
  • Note the different notation for multiplication:
    • nm
    • n \(\cdot\) m
    • n \(\times\) m
  • Distributive laws
    • k(n + m) = kn + km
    • (k + n)m = km + nm

Exercise: Prove why the multiplication and distributive laws work. Since we have no idea what a proof is yet, we can instead try going through some of our own examples.

Identity law: n \(\cdot\) 1 = n.

  • We are creating n groups of 1. According to our definition of multiplication |||| \(\cdot\) | would be 4 groups of | or ||||. This is still 4.

Commutative law: nm = mn.

  • ||| \(\cdot\) || = || || || or 6. Rearranging this, || \(\cdot\) ||| is ||| ||| which is again 6.

Associative law: (kn)m = k(nm).

  • If k = |, and n = ||, and m = ||| and parens expressions are evaluated first:
    • Left side: (kn)m. We first evaluate (kn) as || (2 groups of 1, the identity law), and 2m is ||| ||| (2 groups of 3) for a total of 6.
    • Right side: k(nm): In the parens nm is 2 groups of 3, so ||| |||. k(6) is the identity law, one group of 6. Thus (kn)m = k(nm).

Distributive law: (k + n)m = km + nm.

  • If k = ||||, and n = ||, and m = |:
    • Left side: (|||| + ||) is 6, and (6) x 1 is six groups of 1 or 6 according to the identity law.
    • Right side: km + nm: 4 groups of one is 4, plus two groups of 1 is 2 (identity law). Answer is again 6.

Laws of arithmetic 2

  • YouTube - Arithmetic and Geometry Math Foundations 3

Wildberger shows another interpretation of multiplication, and walks through the proofs of the laws by rearranging according to the commutative law.

Subtraction and division

  • YouTube - Arithmetic and Geometry Math Foundations 4

Subtraction is the inverse of addition

  • If k + m = n then k = n - m
  • Note: n - m is only defined if n > m as we are still in the realm of the Naturals and negative numbers/integers aren't yet defined.

+, -, are binary operations and can only do 2 operations at a time

  • Laws of Subtraction:
    • n - (m + k) = (n - m) - k
    • n + (m - k) = (n + m) - k
    • n - (m - k) = (n - m) + k
  • Distributive laws of subtraction:
    • n(m - k) = nm - nk
    • (n - m)k = nk - mk

Note to remove parens: we can drop the grouping enforcement when using addition (associative law) but must distribute the negative sign, as subtraction is not associative.

  • n - (m + k) = (n - m) - k
    • removing grouping: n - m - k = n - m - k

The Hindu-Arabic number system

  • YouTube - Arithmetic and Geometry Math Foundations 6

A simplified Roman numeral system is introduced to abstract our previous notation of one's so we can manipulate bigger numbers.

  • Uses powers of 10
    • 327 = 3(100) + 2(10) + 7(1)

Arithmetic with Hindu-Arabic notation

  • YouTube - Arithmetic and Geometry Math Foundations 7

Multiplication with the Hindu-Arabic notation:

  • Same laws as before:
    • (ab)c = a(bc)
    • a(b + c) = ab + ac
    • (a + b)(c + d) = ac + ad + bc + bd

A different way of hand multiplication is shown, taking advantage of the fact the notation is using the distributive law.


  • YouTube - Arithmetic and Geometry Math Foundations 8

Division is repeated subtraction. Laws of Division:

  • 1. \(\frac{k}{m}\) + \(\frac{n}{m}\) = \(\frac{k + n}{m}\)
  • 2. \(\frac{k}{m}\) \(\cdot\) \(\frac{n}{l}\) = \(\frac{kn}{ml}\)
  • 3. \(\frac{k/m}{n}\) = \(\frac{k}{mn}\)
  • 4. \(\frac{k}{m/n}\) = \(\frac{kn}{m}\)


  • YouTube - Arithmetic and Geometry Math Foundations 9

We're still using natural numbers.

  • Def: A fraction, is an ordered pair (m,n) of natural numbers also written m/n.
    • Fractions m/n and k/l are equal precisely when ml=nk.

Arithmetic with fractions

  • YouTube - Arithmetic and Geometry Math Foundations 10

We're still in the natural numbers, which he defined as beginning with 1 so we aren't working with zeros yet (no division by zero corner cases). He defines the operations of addition and multiplication with fractions using the laws of division we already learned.

Note how he sets up the proof. The 'prime' notation (the tick, so c', a', b') represents another variable. You could rewrite this 'Suppose a/b = y/z so that a/z = b/y, and c/d = something/another, so that c/another = something/d does it then follow …'.

What we are trying to prove is ab' = ba' and cd' = dc'. He introduces some extra notation, multiplying both sides by (dd') to help do this.

Laws of arithmetic for fractions

  • YouTube - Arithmetic and Geometry Math Foundations 11

Wildberger proves the distributive and commutative laws using fractions while still in the natural numbers.


Now we will redo what we just learned rigorously in Terence Tao's Analysis I. The author explains here why we need this rigorous stage. This book was originally supplementary notes for students while they read through Rudin's analysis book. Tao starts from the very beginning, and constructs all the algebra laws for natural numbers, integers, rational and real numbers then will teach us calculus so if you have no math background this is fine since we're going to redo it all from scratch anyway.

Materials needed

  • The 3rd edition of Analysis I from WorldCat or LibGen (or buy it, it's an inexpensive book if you get the international version from abebooks or Amazon India)
  • The book How to Think Like a Mathematician by Kevin Houston. Most of this won't make sense until we do a few chapter's out of Tao's book, we will read it at the same time.
  • The optional book An Infinite Descent into Pure Mathematics by Clive Newstead which provides another view of the same material if you're confused by something, look it up here (or search YouTube for a tutorial)

How to read math

See chapter 2 Reading mathematics and chapter 12 Examples and counterexamples in Kevin Houston's book.

2.1 The Peano Axioms

We begin like we did with Wildberger, basic counting but rigorously defined.

  • The five Axioms
    • Axiom 2.1: 0 is a natural number
    • Axiom 2.2: If n is a natural number, then n++ is also a natural number
    • Axiom 2.3: 0 is not the successor of any natural number
    • Axiom 2.4: Different natural numbers must have different successors
    • Axiom 2.5: Let P(n) be any property pertaining to a natural number n. Suppose that P(0) is true, and suppose that whenever P(n) is true, P(n++) is also true. Then P(n) is true for every natural number n.
      • We make an assumption about a property P(n) being true meaning you do not prove P(n), you prove P(0) and P(n + 1) are true, and both these form a logical implication meaning that if they are both true, then your initial assumption of P(n) being true holds.

Remark 2.1.10 and the template in Proposition 2.1.11 is probably the clearest definition of proof by induction I've seen. Note the example where no natural number can be 'a half number', because of the increment step n + 1. The base case of 0 (as per Axiom 2.5, and Axiom 2.1) fails if we are using 'half numbers' instead of natural numbers because if zero is a natural number, and any n++ is another natural number, the only next successive number is 1 it can't be 0.5 or -3 or 3/4. The way you should view these axioms about the natural numbers is:

  • 0 is an element of N
  • if x is an element of N, then so is x + 1.
  • nothing else is an element of N


Let's understand Axiom 2.5 by watching a lecture from MIT on proofs though you will end up writing so many induction style proofs on the natural numbers in Tao's book this will become second nature.

  • 1. Induction lecture
    • From MIT's 6.042j Mathematics for Computer Scientists
  • Read chapter 24 Techniques of proof IV: Induction in Kevin Houston's book

Back to 2.1

Proposition 2.1.16 (Recursive definitions) in the book Analysis I. Tao uses a function, it's definition can be read in chapter 3.3. Here \(a_0\) is function notation so \(a_0\) is \(f_0(0)\), \(a_1\) is \(f_1(1)\). This proposition is using a new function (a rule that maps from one input to one output) every time, I'm guessing to avoid sequences where they could repeat ie 0, 1, 1, 2, 3..

Let's do an example.

  • \(a_0\) becomes 0 (for c = 0)
  • \(a_{0++}\) becomes \(f_1(a_1)\) or 1
  • \(a_{1++}\) becomes \(f_2(a_2)\) or 2

The proof notes Axiom 2.3 protects the value \(a_0\) from being redefined by one of the functions. His base case for this proof is \({a_0}\) (Axiom 2.3) and he is assuming P(n) which is that the procedure will always give a single value to \({a_n}\). Axiom 2.4 is the inductive step, that this procedure will also give a single value to \(a_{m++}\). All of the Axioms are used to prove this. If this didn't make sense it will when we come back here in chapter 3.5 for an exercise.

Aside: Induction and Recursion

Induction and recursion are the same concept. As an example, the specification of a program function twoExp(n) that when given a value n, returns \(2^n\). The examples you write for this program are as follows:

  • twoExp(0) = 1
  • twoExp(1) = 2
  • twoExp(2) = 4
  • twoExp(3) = 8
  • twoExp(4) = 16
  • twoExp(5) = 32

You notice the pattern: the result of the next value, is 2x the previous value. twoExp(5) is 2 * twoExp(4). You write an algorithm using the above examples: if input = 0, then return 1, otherwise return 2 * twoExp(n-1). You make an assumption, that the input will always be non-negative. Walking through the program execution, twoExp(3) evaluates:

  • 2 * twoExp(3-1)
  • 2 * (2 * twoExp(2-1))
  • 2 * (2 * (2 * twoExp(1-1)))
  • 2 * (2 * (2 * (1))) = \(2^3\)

To prove this by induction on n, we try the base case P(0). If n = 0 then twoExp(0) evaluates to 1, which is \(2^0\). What is P(n)? According to our recursive definition, n = 2*P where the value of P = twoExp(n-1) or \(2^{n-1}\). Our inductive step is n, since our algorithm is inductively defined n-1, and we wish to prove the successor n + 1, which is n. 2*P = \(2*2^{n-1}\) or \(2^n\).

Note that we didn't have to do anything extra to prove this recursive function. That's because an inductive proof is a recursive function, and a recursive function is an inductive proof.

2.2 Addition

Let's understand his example for addition. 3 + 5 should be the same as (((5++)++)++) and adding 0 to 5 should result in 5.

Definition 2.2.1 notice he has a base case (0 + m is m), the hypotenuse "Suppose we inductively defined how to add n to m" which we saw at the beginning of 2.2 where 3 + 5 is defined as incrementing five three times, finally the inductive step, we can add n++ to m by defining (n++) + m to be (n + m)++.

Try writing your own examples as advised in the book How to Think Like a Mathematician

  • 0 + m = m
  • 1 + m = ((0++) + m) or m++
  • 2 + m = ((1++) + m) or ((m++)++)
  • 3 + m = ((2++) + m) or (((m++)++)++)
  • 4 + m = ((3++) + m) or ((((m++)++)++)++)
  • 2 + 3 = ((3++)++) or (4++) or 5.
  • 3 + 2 = (((2++)++)++) or ((3++)++) or (4++) or 5.

Lemma 2.2.2 notice first how he is using the exact same induction template from 2.1.11, you will eventually get very familiar with this template. He's using prior established definitions to set up the proofs, notably definition 2.2.1 where 0 + m = m is defined. He assumed that n + 0 = n. Next the inductive step, where (n++) + 0 = n++. No arithmetic was done here, he pointed to the definition of addition from 2.2.1 that (n++) + 0 is (n + 0)++ or n++, so we have n++ = n++ and the proof is done. He used a different variable to not confuse it with the previously established definition of 0 + m = m, since we are now going the other way with n + 0 = n on our way to prove a + b = b + a.

Lemma 2.2.3 Prove: n + (m++) = (n + m)++. We are inducting on n, and keeping m fixed which means we apply the inductive step to n++ instead of m++. Replace all instances of n with 0 for the base case and show the inductive hypothesis holds that 0 plus the increment of some other number, equals that number. Try picking apart this proof yourself to follow it:

  • base case:
    • 0 + (m++) = (0 + m)++
    • left side is m++ because of definition 2.2.1
    • right side is m++ because of definition 2.2.1
  • inductive hypotenuse: 'Suppose n + m++ = (n + m)++'
  • inductive step:
    • n++ + (m++) = (n++ + m)++
    • left side is ((n + m)++)++ because of 2.2.1
    • right side is ((n + m)++)++ because of 2.2.1

Corollary: Tao asks why n++ = n + 1 is a consequence from the two proved lemmas. Compare lemma 2.2.3 For any natural numbers n and m, n + (m++) = (n + m)++ to definition 2.2.1 where we were able to show m++ = m + 1. Compare lemma 2.2.2 to the base case in definition 2.2.1

Proposition 2.2.4 (Addition is commutative) the base case is straight forward. We use the definition of addition, and lemma 2.2.2 to rewrite the successor of n to be (n + m)++ = (m + n)++ and now it is in the form of the inductive hypothesis, so we can use our hypothesis of n + m = m + n to show that (m + n)++ is the same as (n + m)++. Proposition 2.2.6 is similar.

Exercise 2.2.1

Remember a proof at our early level is just a convicing argument to yourself. To check that it is correct, all the forms have to already have been introduced in the text, for example say Tao has only defined 0 + b = 0 but not b + 0 = 0 + b. We can't assume b + 0 = 0 + b unless it has already been proved (or you write your own lemma for it) so if Tao has only shown 0 + b = 0, you can still do a dance where you claim b = 0 and then by substitution claim it's also 0 = b but you can't directly assume anything because 'it's obvious'. This is a hard skill to learn, spotting assumptions, we will take more courses on logic later and practice this skill. I've probably incorrectly assumed things in my own feeble attempts at the exercises, keep that in mind if you look at my proof scratchwork.

Proposition 2.2.5 (Addition is associative) what we want to prove is (a + b) + c = a + (b + c)

Base Case: What is the base case? We want to show that if a natural number is inside or outside brackets, it will have the same result when added together. Induct on any variable (a, b or c) you want. This will use substitution, which is in the appendix. If we say x = y, then every future occurrence of y we can substitute for x:

(a + b) + 0 = a + (b + 0) (substitute 0 for c)

  • left side: (a + b) + 0 since b + 0 = b (lemma 2.2.2) then we have:
  • a + (b + 0) (lemma 2.2.2)
    • base case: a + (b + 0) = a + (b + 0)
  • or choose a: right side: 0 + (b + c) = b + c by definition of addition
  • lef side: (0 + b) + c by definition of addition
    • another base case: 0 + (b + c) = 0 + (b + c)

IH: The inductive hypothesis (IH) we assume addition is associative, and that (a + b) + c = a + (b + c). If any steps of your proof are in the form (a + b) + c or a + (b + c) you can substitute them for each other by the IH, since we declared them equal. For example: y + a + (b + c). Note a + (b + c) is the form of our hypothesis, so can change this to y + (a + b) + c

Inductive step: There's a few ways to show this, I'll move stuff around randomly and you can see where the proof is:

Inducting on c: (a + b) + (c++) = a + (b + (c++))

  • right side: a + (b + (c++))
  • a + (b + c)++ (lemma 2.2.3)
  • ((a + b) + c)++ (by the IH)
  • ((a + b)++) + c (def of addition)
  • ((b + a)++) + c (commutative)
  • ((b++)+ a) + c (def of addition)
  • c + ((b++) + a) (commutative)
  • c + (a + (b++)) (commutative)
  • left side: (a + b) + (c++)
  • (c++) + (a + b) (commutative)
  • (c + (a + b))++ (def of addition.. consider (a + b) to be one value
    • Recall (n++) + m = (n + m)++ ergo (n++) + (m + b) is (n + (m + b))++
  • ((a + b) + c)++ (commutative)
  • (a + (b + c))++ (by the IH)
  • ((b + c) + a)++ (commutative)
    • etc., can keep moving these around

Try moving everything around until you understand exactly how commutativity and associativity work. See if you can get it to a form where you can use the inductive hypothesis, substitute and permute them around some more until this starts to make sense. Remember that (n + m) is one value, so (n++) + (n + m) is (n++) + (value) which we know from the text can be written as the form (n + (n + m))++, and (n + (n + m)) can be rewritten according to commutative property ((n + m) + n) because it's just (n + (value)) so ((value) + n) changes nothing. .

  • The proof, mimic Tao and write full sentences
    • We induct on a keeping b and c fixed. We first consider the base case a = 0 and we have to prove (0 + b) + c = 0 + (b + c). By the definition of addition (0 + b) + c = b + c, and 0 + (b + c) = b + c, thus both sides are equal to b + c and are equal to each other. Suppose inductively that addition is associative and (a + b) + c = a + (b + c), now we have to prove that ((a++) + b) + c = (a++) + (b + c) to close the induction. By the definition of addition, the left hand side is equal to ((a + b)++) + c which is also equal to ((a + b) + c)++ by the definition of addition. By the inductive hypothesis, ((a + b) + c)++ is equal to (a + (b + c))++. The right-hand side is equal to (a + (b + c))++ by the definition of addition and now both sides are equal to each other, and we have closed the induction.

2.2. cont.

Proposition 2.2.6 (Cancellation Law) uses Axiom 2.4 which states 'Different natural numbers must have different successors; i.e., if n, m are natural numbers and n = m, then n++ = m++. Equivalently, if n++ = m++, then we must have n = m'. So (a + b)++ = (a + c)++ is = a + b = a + c by the line if n++ = m++, then n = m in axiom 2.4. We dropped down from the successor by using axiom 2.4 and can now use our inductive hypothesis of the cancellation law for a to show b = c and close the induction. .

Lemma 2.2.10 has an exercise where induction is suggested but this is actually much easier to prove using contradiction (See Kevin Houston book). Always read the comments on Tao's blog if you're stuck: "You can modify the statement of the exercise to an equivalent statement which makes sense for a=0. (The statement will probably look something like “For any natural number a, either a=0, or else there exists exactly one…”.)" If you see this stackexchange answer they use the peano axioms of 2.2 existence and 2.4 uniqueness. We can also try the chapter on contradiction proofs in Kevin Houston's book or the appendix of Analysis: Suppose for the sake of contradiction that that there exists two different natural numbers with the same successor. Then a = m++ and b = m++. But then by Axiom 2.4 we have a = b, a contradiction.

Definition 2.2.11 n is greater or equal to m if n = m + some other natural number. If that number is 0, then the equality holds, if that number is 0++ then the inequality holds.


Try and figure these out yourself before looking at my interpretation. If you naively just wrote down your own argument why you think something is true while going back over all the text, and it was completely wrong, you'd still learn more than looking at my notes. It's better to do it yourself then later we will go through all of Tao's chapter on Mathematical Reasoning, and take a course in Proof Theory, then you will understand how to manipulate definitions by declaring them logically equivalent, making writing proofs and checking them for correctness much easier.

For now just continue writing whatever, doesn't matter if your exercise answers are correct or not it only matters that you attempted to do them.

Proposition 2.2.12 / Exercise 2.2.3: Prove the basic properties of order for natural numbers. Here I will show how you can 'reverse' or unravel lemma 2.2.2 or other proved proposition, none of this is a properly written proof only notes on moving through steps to understand the problem then after write a proof in the style Tao has been teaching us. We just want to point to definitions and lemmas here not use induction:

  • Order is reflexive
  • Reflexive property is that any natural number is equal to itself.
  • We want to show order is reflexive: a \(\ge\) a
  • a = a + n (def 2.2.11 of ordering of natural numbers)
  • From lemma 2.2.2 we know that a + 0 = a and so a = a and is reflexive
  • By definition 2.2.11 we also have a \(\ge\) a
  • Transitive order property
  • a \(\ge\) b and b \(\ge\) c, then a \(\ge\) c
  • If a = b + 0, b = c + 0, and a = c + 0, then we have lemma 2.2.2 and transitivity (equality) holds with a = b, b = c and a = c.
  • Substitution:
  • a = b + n
  • b = c + n
    • b is equal to (c + n) so let's put that in place of b in a = b + n:
  • a = (c + n) + n
  • a = c + (n++)
    • By lemma 2.2.10 we have a = c + m since b++ will always equal some nonzero natural number.
  • By definition 2.2.11 a = c + m is a \(\ge\) c + m

Proof attempt, though we could write our own def or lemma that a + b = a natural number it's largely implied by everything we've read so far: By the definition of Ordering of the natural numbers, if a = b + m and b = c + n, then a = c + m + n. By associativity a = c + (m + n) and since if a = b when m or n are 0, then a = c when m and n are 0. By substitution d = m + n, and since m and n are natural numbers, d is a natural number by the Peano axioms and 2.1.3 (m + n)++ = d++, and by Axiom 2.4 (m + n) = d so we have a = c + d as desired by the definition of Ordering of the natural numbers.

  • Anti-symmetric order
  • If a \(\ge\) b and b \(\ge\) a then a = b
  • See the entry for Antisymmetric relation
  • If these two inequalites hold then they must be equal
  • We need to show a = b:
  • a = b + m by 2.2.11
  • b = a + n by 2.2.11
  • Substitute b for (a + n):
  • a = (a + n) + m
  • a = a + (n + m)
  • a + 0 = a + (n + m) ('reversing' lemma 2.2.2)
  • 0 = n + m (cancellation law)
  • 0 = 0 + 0 (corollary 2.2.9)
  • If n = 0, and m = 0 by above, then a = b + 0 which is a = b
  • d) Addition preserves order
  • a \(\ge\) b iff a + c \(\ge\) b + c
  • a + c = b + n + c by 2.2.11
  • a = b + n by cancellation law and a \(\ge\) b by 2.2.11
  • Another method: a + c \(\ge\) b + c
  • a + c = b + n + c by 2.2.11
  • a + c = b + c + n (commutative law)
  • (a + c) = (b + c) + n (associative law)
    • This is nat number = nat number + another number or by 2.2.11 a \(\ge\) b
  • e) a < b iff a++ \(\le\) b
  • b = (a++) + n (def ordering of natural numbers)
  • b = (a + n)++ (def of addition)
    • successor of a number cannot be itself, so b does not equal a and by 2.2.11 a < b
    • We skipped steps, Tao is more thorough so let's try it again:
  • b = a + n and n does not equal zero (2.2.11)
  • Since n must be a positive natural number by 2.2.7 and 2.2.11:
  • b = a + (n++)
    • def of a positive natural lemma 2.2.10
  • b = a + (n + 1) (corollary from 2.2.2/2.2.3)
  • b = a + (1 + n ) (commutative law)
  • b = (a + 1) + n (associative law, grouping doesn't matter for addition)
  • b = (a++) + n by previous corollary of n++ is n + 1
    • a++ \(\le\) b since by 2.2.11
  • f) a < b iff b = a + d for some positive number d
  • For the sake of contradiction, assume a \(\le\) b in the a = b case for some positive number d. Then a + 0 = a + d by lemma 2.2.2, and 0 = d by cancellation law. This is a contradiction that d must be a positive number so a cannot equal b and a \(\lt\) b

Proposition 2.2.13 fill in the gaps:.

  • When a = 0, we have 0 \(\le\) b for all b (why?)
    • 2.2.1 Addition of natural numbers, 0 + m = m so 0 + b = b and by 2.2.11 0 \(\le\) b
  • If a > b then a++ > b (why?)
    • a = b + d (as per 2.2.12, d must be positive)
    • c++ = d (lemma 2.2.10)
    • a++ = b + (c++) (substitution)
    • a++ > b by def of ordering
    • or not so rigorous way:
    • a++ = (b + c)++ (def of addition, (b + c)++ must be positive
    • a = b + c (Axiom 2.4)
    • a > b (2.2.12 d is positive)
  • If a = b, then a++ > b (why?)
    • If a = b, then a++ = b++ by Axiom 2.4
    • a++ = b + 1 (corollary 2.2.2/2.2.3)
    • this is a++ > b by 2.2.11

Proposition 2.2.14 (Strong induction) From MIT this lecture on strong induction, the intuition here is Q(m) if all the dominoes are already knocked down and there's one left standing, then it too will be knocked down where 'weak induction' intuition is P(n) if you knock down one domino, they will all fall. If P(n) is true because P(n++) implies P(n) true, then for all m \(\le\) P(n), Q(m) is true. If you still don't get it search Youtube for another tutorial.

First look up vacuous truth or see the appendix on mathematical reasoning where Tao defines it.

  • Let Q(n) be the property that P(m) is true for all m\(_0\) \(\le\) m < n for n \(\ge\) m\(_0\).
  • Q(0) is true since:
    • m\(_0\) \(\le\) m < 0 is vacuously true as stated by Tao in the exercise hint
    • 2.2.14 proposition of for all n \(\ge\) m\(_0\) is true 0 \(\ge\) 0\(_0\)
  • Show that Q(n++) is true:
  • P(m) m\(_0\) \(\le\) m \(\lt\) n \(\lt\) n++ is true (2.2.11)
  • P(m) m\(_0\) \(\le\) m \(\lt\) n++ is true for all n \(\ge\) m\(_0\)

Try an example.

  • P(n) is the property that n + 0 = n
  • Q(0) is 0 + 0 = 0 and P(0)
  • Q(1) is 1 + 0 = 1 and P(1), P(0)
  • Q(2) is 2 + 0 = 2 and P(2), P(1), P(0)

Exercise 2.2.6: Your proof could roughly be 'Suppose P(n++) is true such that all P(m) is true for m \(\le\) n++. Then per Axiom 2.5 P(n) is true, so P(m) is true for all m \(\le\) n' since Tao hinted at 'backwards induction'.

2.3 Multiplication

Proposition 2.3.4 follow along with definition 2.3.1 of multiplication where (n++) x m = (n x m) + m also recall that n++ is n + 1: (n + 1)m so nm + m

  • left side:
  • a(b + c)++)) is a(b + c) + a
  • because (m(n++)) = m(n) + m
  • right side:
  • ab + a(c++) = ab + ac + a
  • ab + ac + a is in the form of the inductive hypothesis so a(b + c) + a

Remark 2.3.8 the contradiction is that ac = bc by corollary 2.3.7 when fed into the trichotomy of order can only result in a = b. This is a scheme to produce a temporary notion of division/cancellation.


Scratchwork and notes

Exercise 2.3.1 n x m = m x n:

  • base case: 0 x m = m x 0
  • left side: 0 x m = 0 by definition 2.3.1

For the right side, m x 0 = 0, write your own lemma that mimics 2.2.2 but for multiplication:

Lemma X: For any natural number n, n x 0 = 0:

  • base case: 0 x 0 = 0
  • IH: n x 0 = 0
  • Inductive step: (n++) x 0 = 0
  • (n x 0) + 0 = 0 (def 2.3.1)
  • Remove the + 0 (lemma 2.2.2)
  • Now it is in the form of the IH: n x 0 = 0

Lemma Y: m x (n++) is m + (n x m), rewriting Lemma 2.2.3 but for multiplication

  • base case: Our lemma X and def 2.3.1
  • IH: m x (n++) = m + (n x m)
  • Inductive step: m x (n++)++ = m + (n++ x m)
  • Left:
    • m + (n++ x m) by IH
    • m + (n x m) + m by def 2.3.1
  • Right:
    • m + (n x m) + m by def 2.3.1

Now we have to prove m x n = n x m

  • base case: def 2.3.1 and lemma X
  • IH: m x n = n x m
  • m x n++ = n++ x m
  • Lemma Y = def 2.3.1

Exercise 2.3.2

The hint is to prove the second statement first, if n and m are both positive, then nm is also positive. You could use Proposition 2.3.6 as a template for this:

  • If n > 0, and m > 0, then nm > 0
  • We have nm = 0 + d for some positive d
  • Since d is positive, n and m must be positive (or else 0 = 0 + d, a contradiction)

Or you could use Lemma 2.2.10, n x m = (b++) x m and then mimic 2.3.6 to determine n x m is positive if n and m are positive. Note that (b++) x m expands to the form (b x m) + b, and since you know a x b + a is the successor form of multiplication it can't be zero.

Lemma 2.3.3: n x m = 0 iff n = 0 or m = 0. You could try a contradiction using the nm > 0 proof we already wrote, so it must be n = 0 or m = 0 if nm = 0 or a contrapositive of Tao's statement: see the appendix on mathematical reasoning or look up contrapositive in Houston's book. If Tao states something, and then proves it, then there is a contrapositive statement we can use to prove another fact.

Exercise 2.3.3 A rewrite of 2.2.5: For any natural numbers a, b c, we have (a x b) x c = a x (b x c). You could induct on b, convert to the 2.3.1 form, and rearrange whenever you move something around and it is in the form of the inductive hypotenuse.

Tao also suggests using the distributive law:

  • Inductive Step: (a x b++) x c = a x (b++ x c)
  • (a(b++)c) = (a(b++)c) by distributed law
  • (ac(b++)) = (ac(b++)) by commutative law
  • (ac(b + 1)) = (ac(b + 1)) by corollary n++ = n + 1
  • acb + ac = acb + ac by distributed law again
    • cancellation law the two +ac:
    • acb = acb or (ab)c = a(bc)
    • or permute any way you want like we did for 2.2.5

Exercise 2.3.4

Prove the identity.

  • (a + b)(a + b) = (a + b)a + (a + b)b by distributive law/exponentiation
  • aa + ba + ab + bb by distributive law
  • a\(^2\) + ba + ab + b\(^2\) by def 2.3.11/Examples 2.3.12
  • ba + ab is ba + ba by commutative law
  • ba + ba is 2ba by def 2.3.1 where 2 x m = 0 + m + m
  • (a + b)\(^2\) = a\(^2\) + 2ab + b\(^2\)

Exercise 2.3.5

Prove proposition 2.3.9 by fixing q and inducting on n for n = mq + r and 0 \(\le\) r \(\lt\) q

For the base case, if 0 = mq + r, then m and r must be zero because this still satisfies the inequality where q is greater than r and remember we fixed q, so it can't be zero. For the inductive step, n + 1 = mq + r + 1 (adding + 1 to both sides as per previous lectures on induction), we have two cases, r + 1 < q or r + 1 \(\le\) q. If r + 1 < q then we have the inductive hypothesis and there's nothing more to prove since 0 \(\le\) r + 1 \(\lt\) q. Remember the order of multiplication/addition, this is still (number) = (number) plus n in the form of (n + 1) = (mq) + (r + 1).

If r + 1 \(\le\) q, or (r = q), then we have n + 1 = mq + q, and from the definition of multiplication/distributive law this is n + 1 = (m + 1)q or m++(q) + 0 and 0 still satisfies the inequality of r < q.


Before we learn some algebra and modern trigonometry, we should at least see some geometry. These lectures follow the John Stillwell book Mathematics and it's History where Wildberger slices various shapes with planes so you can see why we study things like a parabola or hyperbola. This is only casual viewing.


  • Pythagoras' Theorem 1 and Pythaoras' Theorem 2
    • A geometric proof of original Pythagoras theorem
    • A proof by contradiction why the square root of 2 is irrational
    • Part B shows classical cos/sin/tan trigonometry using chords


This is the geometry we will be doing shortly that allows for computational Euclidean geometry. We don't have a definition of polynomials yet but you can probably understand most of this lecture anyway.

  • Analytic Geometry
    • Cartesian coordinate plane explained
    • Part B covers periodic continued fractions method to estimate irrational numbers


Abstract geometry, project from one plane to another like 2D to 3D or into another field to reduce complexity.

  • 3D Projective Plane
    • Shows you different perspectives when looking at objects
    • Lay the x,y cartesian plane flat, add a z dimension like a flag pole, add another flat cartesian plane in z, project lines from x,y into z


Even more abstract geometry, construct a new plane by slicing a sphere and studying a hyperbola. Hyperbolic geometry seems better suited for deep learning as it has 'more capacity'. Maybe we will end up back here projecting our data into a hyperbolic plane.


Geometry of space curves, osculating plane

Some Algebra

Crash course in solving equations & polynomials

Flip through sections 1.3.1, 1.3.2 and 1.3.3 Review, Redo, Renew from this draft book used in CMU's 15-151/21-28 course. The author has a list of errata on his page. You will get a crash course in some quick arithmetic, solving linear equations and polynomials. There's a few interesting methods here you may not know about like how to variables from equations. Wildberger covers much of this material too for example Problem 1.3.8, an infinite nest is also in a continuous fractions lecture from the History of Mathematics playlist: \(x = \sqrt{2 + x}\)

If you're interested there is a summary of Algebra here that was originally made for non-english speaking university students. Tao will also define everything here as we progress through his analysis texts.


Some optional mostly 10 minute lectures on algebra basics, such as how Pascal's array/Khayyam triangle/Yang Hui's triangle encodes the binomial theorem, this series sets up understanding algebra and later calculus and combinatorics.

A different modelling of Polynomials

Wildberger has a unique encoding of polynomials in the rest of his math foundations playlist by abstracting coefficients into columns and rows. Pascal's array is then used, and things like factoring or polynomial division is surprisingly trivial to do. Later in the same playlist he teaches some algebraic calculus using this polynumber encoding and taking derivatives and doing integration is again trivial arithmetic. They could teach this to primary school kids. If you have time, I highly recommend watching the rest of the playlist.

Lectures about pi

Before we do trig, let's learn what \(\pi\) is. Wildberger talks about the history of pi such as Archimedes interval for pi, Newton's formula, a hexadecimal version of pi, his own formula. The end of this lecture he claims pi is some kind of complex object we still don't understand. At this stage we only need to know it's a constant, or the applied approximation of 22/7 or 355/113.

Trigonometry Boot Camp

Let's take the Brown Univeristy trig boot camp. There is a crash course lecture. I assume you know the basic gist of what an angle is which will be better defined when we take calculus.

  • Radians you could even make up your own notation and define it as dblpi = 6.283… and then 360deg is dblpi, 90deg is 1/4dblpi, 180deg is dblpi/2, etc.
  • We really only need to know sine and cosine functions, all others are derived from them
  • In the scaling of triangles example where \(\frac{1}{\sqrt 2}\) becomes \(\frac{\sqrt 2}{2}\) they are rationalizing the denominator:
    • \(\frac{1}{\sqrt 2}\cdot\frac{\sqrt 2}{\sqrt 2}\) is \(\frac{\sqrt 2}{\sqrt 2 \cdot \sqrt 2}\) which is \(\frac{\sqrt 2}{\sqrt 4}\) or \(\frac{\sqrt 2}{2}\)
  • You can download the slides to keep a copy of the scaled triangles
  • The reflections think about subtracting 1/6 from 1/1, you get 5/6 so 5pi/6 is pi (180 deg) minus 1pi/6, and a negative direction means we are moving clockwise so move to pi, subtract pi/6 in other direction.
  • The end of the lecture we finally discover why sin() and cos() are wave functions, since it's a function that moves around the unit circle. Tan() is the slope 'rise over run'

Summary: There exists 3 triangles, which we should know their measurements, and scale them down since this is a unit circle with maximum x and y values of 1. Then we superimpose the triangles in the pi/2 (90deg) angle space on the unit circle in such a way that we have known values of cos()/sin() and other trig functions, because we can derive those other functions from cosine and sine. We reflect these triangles into the other areas of the circle so we can get x/y values there as well by rotating around the circle and then adding up all the angles to indicate how much we rotated counter-clockwise (positive) and clockwise (negative). A reflection is demonstrated here so if f(x) = sin(x), cos(x) and tan(x) then y = -(fx).

Try the worksheet you can actually draw these out and count rotations if you want or use this radian to angle picture from Wikipedia. The equations if one side is squared, then you need to take the square root of both sides so it's (not-squared = value) instead of (squared thing = value-when-squared). If you're stuck watch the video that walks through all the solutions then do the worksheet again until you can finish it without needing to look at the video walkthru.

How I solved these: subtract 1 pi if it's bigger than 2pi, or a 1/2pi if smaller and see what you get, look at the 3 triangles if they have a corresponding value for sin/cos. If not subtract pi/3, pi/6 etc. If you're wondering why we are using radians instead of degrees it is explained here. Wildberger's rational trigonometry uses a turn, so pi/2 is 1/4, pi is 1/2, and 2pi is 1. Angles are replaced by 'the spread of vectors'.

Some Calculus

Now that we know trig functions, watch this basics of calculus crash course which was designed for non-english speaking students to learn math notation (with Chinese subtitles). Note in the last few examples how the derivative works d = \(nx^{n-1}\) so in the examples the derivative of \(y = \frac{-g}{2}t^2+v_0t+y_0\) is (\(2\cdot\frac{-g}{2}t^{2-1}+ 1\cdot v_0t^{1-1}\)) or \(-gt + v_0\) you take the exponent power, multiply the coefficient of the base, and subtract the power, so the derivative of \(-gt^1 + v_0\) is (\(1\cdot-gt^0 + v_0\)) which is -g, as we'll see later any terms without a variable (in this example t) cancels out or as Wildberger mentioned are so small so they are discarded.

Summary: if you know the acceleration, then you integrate to find the velocity, then integrate again to find the position. If you know the position, you take a derivative to find it's velocity, and take another derivative to find the acceleration. One function tells you how another function is changing. These are also higher order functions we learned in CS019, where a function produces another function.

Try reading the first chapter from Gilbert Strang's calculus and see if you understand it. His book was designed for self-learning, so that first chapter assumes little background and builds up this intuition of one function that tells you about another. There is also a lecture for this chapter and a self-study guide that acts as a recitation.

Set Theory

History of set theory

There is a lecture on the history of sets/functions, explaining ordinals, cardinals, various schools of logic, and a second lecture about the history of set theory with Godel and Turing.

Appendix A: the basics of mathematical logic

This chapter on set theory in Analysis I relies on the fact that you've already read the appendix on logic and have an idea what implies means or disjunction. Now that we have done some proofs ourselves, and the basics of trigonometry we can finally understand some of the examples he uses in this appendix. Tao makes a point that this chapter is meant to be read and understood not studied and memorized, and thankfully he is not bombarding us with piles of confusing logic notation that you normally see in other 'how to do proofs' books. At the same time you read this appendix, read How to Think Like a Mathematician by K. Houston it will clear up any cryptic definitions like how implication works.

A.2 Implication

A.2 Implication seems confusing on first read, it's modern language use relating to casuality is different from the logician's use which is explained here. The only permutation where implication is false is if p implies q and p is true, but q is false. See the Kevin Houston book for more details.

An optional short series on the history of various schools of logic that led to our present state.

Using Boole's Algebra w/implications

Another optional lecture from a short series on Boole's Algebra which is different than Boolean Algebra you've already seen (truth tables). For example at around 24:29, he shows how you can verify Modus Ponens (p implies q) with some very simple algebra instead of deducing symbols.

Appendix A cont.

Proposition A.2.3, confusing the hypotenuse "Suppose that…" by assuming the conclusion. We always assume the hypotenuse, and prove the conclusion. Proposition A.2.6, proof by contradiction, we know that when pi/2 (90deg) the sine function is 1, because remember it is the y axis of the unit circle. Tao also makes a note here to not use confusing logical notation in proofs.

Remark A.5.1 is in the above history of logic lectures about Aristotle logic.

Exercise A.5.1

  • a) For every x++, and every y++, y*y = x or x is the square root of y. Is this true? Square root of two is the obvious candidate for claiming this is a false statement since what y * y = sqrt(2).
  • Think about the rest, using the K. Houston book as you go to find a contrapositive

A.6 the point of the examples is to make a point about arguments that use outer variables depending on inner variables being false. We can come back later and reread this section when we learn more concepts like limits, indeed we will likely have to reread this entire appendix many times over while writing future proofs.

Quiz: Logic

These multiple choice quizzes are from Tao's Analysis class, try the logic quiz. If you struggle read the other two resources from Materials needed relevant chapters on logic, search YouTube for proofs by implication, or search 'discrete math inference rules' there are dozens of tutorials. Take the quiz again.

3.1 Fundamentals (Set Theory)

We are back to reading Tao's Analysis I and still in the realm of the natural numbers. Example 3.1.2 is similar to a nested list such as [list: 1, [list: 1, 2], 100] is length 3, even though there is a nested list overall it still only contains 3 elements. Remark 3.1.3 is in the above historical lectures on set theory. We are referred to Appendix A.7 Equality to read about the substitution axiom which you should have already read.

Lemma 3.1.6 if set A is equal to set (empty), then if x is in A it must also be in set empty, a contradiction, so A must be a non-empty set is how I interpret the lemma, which is a way to say that if an empty set exists, there must also be a possibility of a non-empty set. Remark 3.1.8 Cardinality if you watched the Wildberger History of Mathematics lecture linked above assigns a value.

Note the errata for Definition 3.1.4 from Tao's site: "Definition 3.1.4 has to be given the status of an axiom (the axiom of extensionality) rather than a definition, changing all references to this definition accordingly. This requires some changes to the text discussing this definition. Firstly, in the preceding paragraph, “define the notion of equality” will now be “seek to capture the notion of equality”, and “formalize this as a definition” should be “formalize this as an axiom”. For the paragraph after Example 3.1.5, delete the first two sentences, and remove the word “Thus” from the third sentence. Exercise 3.1.1 is now trivial and can be deleted." So this is now Axiom 3.1.4, but the definition remains the same.

Exercise 3.1.2 prove emptyset, {emptyset}, {{emptyset}}, and {emptyset{emptyset}} are all not equal.

  • If \(\emptyset\) = {\(\emptyset\)} then by Def 3.1.4 \(\emptyset\) \(\in \emptyset\) a contradiction to Axiom 3.2
  • Same reasoning as above for \(\emptyset\) is not equal to {{\(\emptyset\)}} or {\(\emptyset\){\(\emptyset\)}}
  • {\(\emptyset\)} is not equal to {{\(\emptyset\)}} because [list: 1] is not equal to [list: [list: 1]]. The first has element 1 in it, the second has element [list: 1]. More precisely if {\(\emptyset\)} = {{\(\emptyset\)}} then by Def 3.1.4 every element of {\(\emptyset\)} (it's only element is \(\emptyset\)) must be in {{\(\emptyset\)}} (it's only element is {\(\emptyset\)}) and vice-versa, so claiming {\(\emptyset\)} \(\in\) {\(\emptyset\)} which contradicts Axiom 3.3, where y \(\in\) {a} iff y = a. In this case y must = \(\emptyset\) to be in {\(\emptyset\)}.

Remark 3.1.21 clarifies the above exercise how {2} is not in {1,2,3} but 2 is in {1,2,3}. The rest of this chapter is straight forward definitions of set operations, better than the usual treatment of venn diagrams. Remark 3.1.26 distinct is defined as there are elements of one set which are not elements of the other so \(\emptyset\) and \(\emptyset\) are not distinct because neither can have elements. In Proposition 3.1.28 notice how any operation on two sets results in a new set, like a higher order filter function over a list results in a new list.

Example 3.1.32 looks a lot like programming syntax. There seems to be an order of evaluation, applying the axiom of specification first to create a new set, then applying the axiom of replacement so working right to left in {n++ : n \(\in\) {3,5,9); n<6}

Quiz Sets

Try Tao's quiz. If you totally failed this try secondary reading material of the other two books in Materials needed. Watch Youtube lectures and take the quiz again.


  • 3.1.3 prove the remaining claims in 3.1.13:
  • If a and b are objects, then {a, b} = {a} U {b}:
    • By (new axiom) 3.1.4 we need to show every element x in {a, b} is in {a} U {b}. Suppose that x is an element of {a, b} then by axiom 3.3 x = a or x = b. By Axiom 3.4 this means at least one of the following statements is true: x is in {a} or x is in {b}. We divide into cases. If x is in a, then by axiom 3.4 it is in {a} U {b}, similarly if x is in b, it is also in {a} U {b}. Thus in both cases we see every element of {a, b} is in {a} U {b}.
  • The next claim, prove the union operation is commutative:
    • Follows the same proof he did for the union operation is associative. Suppose x is an element of A \(\cup\) B. By def 3.1.4 and axiom 3.4 pairwise union then x \(\in\) A or x \(\in\) B. By disjuntion this is also x \(\in\) B or x \(\in\) A. So x is in B \(\cup\) A and x is in A \(\cup\) B.
  • Last claim, prove A \(\cup\) A is equal to A \(\cup\) \(\emptyset\) is equal to \(\emptyset\) \(\cup\) A is equal to A. Suppose x is an element in A \(\cup\) A, then x is in A or x is in A, so by disjunction (we can pick either, but there's only one choice) x is in A. If x is in \(\emptyset\) \(\cup\) A, x is in \(\emptyset\) or x is in A, so x is in the union of A and \(\emptyset\) and since the emptyset is empty x is in A.
  • 3.1.4 prove claims in Proposition 3.1.18
    • You can write both these proofs exactly like his 3.1.18 proof for transitivity. Suppose A \(\subset\) B and B \(\subset\) A. To prove that A = B we have to prove that every element A is an element of B, and every element B is an element of A by def 3.1.4. Let us pick an arbitrary element x \(\in\) A. Then, since A \(\subset\) B, x must also be an element of B. But since B \(\subset\) A, then every element of A is an element of B and every element B is an element of A. By 3.1.4 they are equal.
  • 3.1.15 Show how union, intersection and subset are logically equivalent:
  • Look in the Houston book in the appendix, see the advice for how to prove that two statements are equivalent. For this exercise if A is a subset of B, and the union of A and B = B, then this implies the intersection of A and B will be A, which in turn implies A is a subset of B.
    • We will show that any two set operations implies another. If x is in A \(\cup\) B, then by definition of union x is in A or B, so by disjunction x is in B. If x is in A, since A is a subset of B then x is again in B. Thus x \(\in\) A \(\cup\) B if and only if x \(\in\) B and by definition of union, and axiom of equality 3.1.4, A \(\cup\) B = B. Since x is in B, if x is in A \(\cap\) B it must also be in A, so x is in A and B, which is the definition of intersection, and if A \(\cap\) B = A, and if A \(\cup\) B = B, then every element of A is in B so A \(\subset\) B.
  • 3.1.16 Prove proposition 3.1.28
    • de Morgan's laws a proof exists for them everywhere including youtube.
    • Rough proof strategy: what you want to do is get your arbitrary element into the form of the definitions for union, intersection and difference sets.
  • 3.1.8 You can google the absorption laws (disjunction(OR) absorbs conjuction(AND)) then try your own proof
  • 3.1.9 If A intersection B = emptyset, then they are disjoint (have no elements in common). This is the definition of 3.1.27 difference of sets, where x is in A but not in B, so A = x - b and B = x - a if the union of A and B = X.
  • 3.1.10 All I can think of is the emptyset, since emptyset union emptyset is still emptyset, and emptyset intersection emptyset is also disjoint.
  • 3.1.11 There is a solution here, but suppose we have a statement P(x, y) pertaining to x and y such that for each x in A, there is at most one y for which P(x, y) is true. Then by the axiom of replacement there exists a set {y: y = f(x) for some x in A}, then y = f(x) for some x in A, and y is in the set of A precisely when P(x) is true, which is the definition of the axiom of specification. Not a complete proof but something like that is the strategy.

3.2 Russell's Paradox

Wildberger covered some of this already, 3.9 Axiom of Regularity is worth reading to see some more examples of what is and what isn't an element of a set.

3.3 Functions

Example 3.3.3 if you forgot range notation. Remark 3.3.5 clears up some notation for functions notably a sequence subscript also denotes a function. Example 3.3.9 asks why 3.3.7 asserts that all functions from empty-set to X are equal, because they have the same domain (nothing) and range.

Our first proof in this chapter is in Lemma 3.3.12, and if we can go by all the previous chapters, this will be our template for the exercises. It also doesn't make sense, so I checked the errata page and sure enough there is errata for this lemma depending what version of the book you're using.

3.3.14 See this tutorial. Write your own examples for this definition. If two distinct elements of A map to the same element of B, this is not injective. However if those two A elements are equal to each other then it is injective:

  • A = {a, b, a} (function input)
  • B = {1, 2} (output)
    • f(a) = 1
    • f(b) = 2
    • f(a) = 1
      • but {a} = {a}, so a = a and this function is injective.
  • A = {b, c}
  • B = {2}
    • f(c) = 2
    • f(b) = 2
      • but {c} doesn't equal {b} so this not injective

3.3.17 Onto functions, another tutorial and for Remark 3.3.24 here is a tutorial for bijective inverse functions

Quiz Functions

Another multiple choice quiz from Tao's analysis class. If you totally failed this try secondary reading material of the other two books in Materials needed. Watch Youtube lectures you can find. Try the quiz again.


A all of these proofs have a youtube tutorial or stackexchange answer, these are common proofs first year students do.

  • Ex 3.3.1 was deleted in errata (See Tao's site)
  • Ex 3.3.2 The proof for this is in the book
  • Ex 3.3.3 The empty function is always injective since it's 'vacuously true' that f(x) = f(x'). It is never surjective since there are no elements in the emptyset that fit the surjective definition of every element in Y is a value of f(x) of f(x) = y, and thus never bijective, since a function must be both injective and surjective to be bijective.
  • Ex 3.3.4 The tilde hat thinghy is just notation for two different fuctions, like f and f'. Cancellation law means we're trying to prove two different functions are the same so g = g' and f = f'. If f and g are injective, then g(f(x)) = g(f'(x)) and f(x) = f'(x), and by one-to-one definition of injective functions f = f' (since f is g's input). Surjective is similar, g(y) = g'f(x)) = g(f(x)) = g'(y) so g = g'. Does g need to be injective for this proof to always hold? Try corner case examples from the text, like if g = n2, then g(f(-1)) and g(f'(1)) have the same output, so two different inputs to g with same output means f(x) can't equal f(x') as per definition of one-to-one functions so we can't always derive f = f'.
  • Ex 3.3.5 If f is injective in g(f(x)), then g doesn't need to be injective. Consider an example A = {1}, B = {2, 3}, and C = {4}. f(1) = g(2) = 4 so f is still injective. The difference in 3.3.4 is the wording 'show that these two input functions are always equal if g in injective' and the second question 'Is this still true if g is not injective?' which means a single case where if g is not injective, and we get two non distinct inputs that equal the same outputg, the whole proof is false whereas here we are asked 'must g always be injective' so just needa single case of not needing to be injective.
  • Ex 3.3.6 Is described here see Remark 3.3.24 if you forgot what an inverse function is. Note how the inverse of f and f cancel each other out.
  • Ex 3.3.7 Similar to above and his association proof

Proof strategies

Watch this starting @ 36:00, from 41:00+ he shows how he solves problems step by step, trying small cases to understand the problem better and poking at the problem enough to finally figure out a proof. You can check your own proofs too, going through Tao's mathematical logic appendix and seeing if the opposite of your argument holds, and double checking all your assumptions 'is there a definition of X or did I just assume it existed'.

So tl;dr relax and just keep writing because the practice of flipping back through the text and rereading definitions and theorems, tracing the logic and seeing how one definition helps define another, and putting this in the form of notes (a proof) is how you learn. This is why none of these books contain solutions because you're supposed to figure it out yourself.

3.4 Images and inverse images

Chapter 3.4 of Tao's Analysis I. Example 3.4.5 makes more sense if you read Example 3.4.6 too, f-1({0,1,4}) is the set of everything that could result in {0,1,4} if f(x) = x2, so {-2,-1,0,1,2} and in Example 3.4.5, the set of everything in the natural numbers that can result in {1,2,3} if f(x) = 2x, is only f({1}). Axiom 3.11 (Union) if A = {{2,3}, {3,4}, {4,5}} and Ua = {2,3,4,5} why? Because it's {2,3} U {3,4} U {4, 5} you are performing union on every set, this is notation to avoid having to write out an enormous page of union operations. Search youtube for indexed sets. The power set you can choose which subsets you want to add, again search for this on youtube.

The exercises, we've already done similar exercises in set theory there's nothing new here. Reminder that millions of discrete and applied math students have already done similar exercises so you can look up proofs of function inverse images anywhere. I did enough of the exercises so I was sure I understood these set and function operations otherwise move on.

3.5 Cartesian products

Another topic with plenty of youtube tutorials. Definition 3.5.7 notation for product is introduced, it looks complicated but all it's doing is specifying an index (i is greater or equal to 1, and less than or equal to n) and repeated cross product. The exercises some of the solutions are walked through on youtube.

The cartesian plane itself is a cartesian product of R x R, you have tuples (x, y) or (x, y, z) so R x R x R. In programming the type int * int is the set Integers x Integers, anything that isn't a cross product of the integers with itself will not type check.

3.6 Cardinality of sets

A good tutorial. Tao is using #(A) notation which means finite cardinality, while |A| cardinality notation means infinite. There exists errata for the exercises specifically 3.6.8. The last exercise the pigeonhole principle if set A has greater cardinality than B, then f(x) A -> B can't be injective, since there would be two elements in A that point to a single element in B.

Programming sets

If you want you can re-do what we've done so far to get better at this material by programming sets using the book Discrete Math and Functional Programming. Everything we covered such as peano axioms, inductive proofs, cardinality, function images and even converting math notation into programs is covered and highly recommended.

Linear Algebra

We're doing Analysis I and Linear Algebra in parallel. These sets of lectures cover Linear Algebra from a geometric approach so later when we work through Tao's Linear Algebra notes we can understand what is happening.

Why this series and not X book/series

This is unique in that each linear algebra topic is setup with the problem it's trying to solve, how the special case of that problem can be solved, then how you would go about creating and proving a general case. The determinant is built up from scratch. The exercises are largely proofs of the lecture content presented at the end of each lecture. It frequently goes into much more advanced material you won't find in any other linear algebra course which is possible since we're just using a simple affine plane, and working with vectors. The idea is full understanding of the subject using visual geometry so later when we read piles of sums and notation for matrix manipulation we can understand what they are doing. This course teaches you to become 3blue1brown where you can make your own visualizations.

Introduction to Linear Algebra

He constructs the affine plane from scratch. The vector (8/3, 10,3) you need to full screen to better see exactly where they are. From B, you move right 2 lines, and 2/3 so \(2 + \frac{2}{3}\) which is 8/3. From there you move up 1/3, and 3 so \(3 + \frac{1}{3}\) which is 10/3. Reasons for using the affine plane are you do not impose anything on it artificially like the origin (0,0) in the Cartesian plane, you can do projections easily, and it's construction is simple, since only parallel lines.

The biggest a takeaway here is what vector direction is, what a basis vector is and how you can combine them to form larger vectors, and how to translate basis vectors to suit different perspectives. The exercises at the end the delta symbol will return in the lecture on area and the determinant.

In this optional related lecture he shows how ancient people made a ruler using just parallel line geometry, which leads to modelling the rational continuum. Some more optional (short) videos is this from 3Blue1Brown on what a vector is, and this presentation of basis vectors and what a span is. Change of basis is video 13 in the 3BlueBrown playlist but the explanation here by Wildberger in the first lecture I find is better.

Vector Arithmetic

All this vector arithmetic will come up later in other courses, at least now you will be somewhat familiar with it. Subtraction means going in the opposite direction, so if you're at the head of one vector and going to another vector's head, you are subtracting. The generalized affine combination where coefficients add up to 1 will come up again.

The proof for diagonals of a parallelogram bisect each other, note how quickly this can become a pile of symbols.

  • let lambda = mu in: lambda = 1 - mu
    • if lambda = mu, then moving mu over to the other side means 2lambda
    • 2lambda = 1
    • Divide each side by 2 to isolate lambda
    • lambda = mu = 1/2

@38:00 or so solving linear eq with 2 unknown variables we already covered in the first lecture or review Crash course in solving equations & polynomials.

Center of mass and barycentric coordinates

Since we just did Physics in the software workshop (CS019 Differential systems) let's learn about Archimedes principle of the lever, accelerations in the grid plane and Newton's laws. This will come up again, note the labs here for a course in applied math specifically Lab Manual vol 2, writing python fuctions that perform Barycentric Lagrange interpolation. This is used to study air pollution by approximating graphs.

Area and volume (determinants)

The fourth lecture in the series to lead up to determinants. You probably recognize ad-bc from the first lecture, which was one of the exercises. He uses different notation for bi-vectors since it represents the union of vector a and b. The torque example you have probably seen somebody using a long pipe over a wrench in order to increase the torque without needing increased force, now you know why increasing the area of the wrench increases the torque. All 3 physics examples can be modelled with a parallegram/bi-vector. The bi-vector rules (ie: adding a scalar to one side of a bi-vector doesn't change the bi-vector) will make sense around 35:50 when he produces a picture proof showing how moving things doesn't change the bi-vector. At around 38:00 we once again get the formula ad-bc, the signed area is a scalar of two basis vectors. 45:50 introduces three-dimensional affine space.

Now that you've seen this model of area using bi-vectors, see this 10 minute presentation by 3Blue1Brown which will fill in some more details, like why the signed area can be negative if you didn't already figure it out. The exercises are conceptually easy, you don't have to do all of them but the whole point of linear algebra is to take these geometric concepts and do them in algebra.

Change of coordinates and determinants

Lecture 5. Now we get a full intro to matrix and vector notation, and matrix operations, this is normally where linear algebra courses start. I like the fact that he built up the change of coordinates first so we aren't just presented with a meaningless array of numbers.

Exer 5.1 if you make a B matrix with e, f, g, h and then multiply AB, take the resulting matrices determinant. The piles of letters should equal det(A) x det(B) after you do some commutative rearranging, this is a 10 minute exercise.

2x2 matrices I

Lecture 6. I appreciate he starts all these lectures with a review, that makes them mostly self contained. This is something I haven't seen before, explaining all the arithmetic of 2x2 and higher matrices using just a 1x1 matrix. The inverse of a matrix is now simple to understand. He walks through some proofs of matrix laws like the associative law (AB)C = A(BC). The standard cartesian coord system is used here for examples of transformations like dilations and shear. A dilation is growing or shrinking, like your eye being dilated.

The end of this lecture: how do you visualize an arbitrary transformation? The next lecture expands on this intuition.

2x2 matrices II

Lecture 7. This tells you how to visualize transformations T(e1), T(e2) are the columns of a matrix, so you can look at it and see how it is being transformed. He briefly introduces an alternate framework of rational trig so we don't have to use those sin/cos functions to figure out a rotation. There is a book and entire playlist for rational trig, if you want to learn it write your own software library for rational trig as you go.

End of the lecture introduces non-linear approximations, using differential calculus, to set up doing calculus in higher dimensions. The exercises you become 3Blue1Brown, able to visualize transformations yourself.

Inverting 3x3 Matrices

Lecture 8. 3 dimensional linear algebra. Pretty good setup explaining the inverse of a 3x3 matrix, most books just give you a formula. We find out why delta is 1/delta. Transpose is also explained better than I've seen before. Reminder that these really complicated introductions all have an off the shelf formula, but he's showing us how they are derived fully so you understand the formula. 3x3 matrix multiplication is explained and pretty simple, explains why B(A(vec)) is the same as (BA)vec. If you're also doing the AI workshop, in 15-388 it's claimed B(A(vec)) is preferable in terms of complexity of running time to (BA)vec.

Integers and rational numbers

Introducing the integers

  • YouTube - Arithmetic and Geometry Math Foundations 12

Wildberger's definition is similar to Tao's, he walks through a proof of a-(b-c)=(a-b)+c

4.1 The integers

Chapter 4 of Analysis I by Tao. Definition 4.1.1 is exactly the definition from the above Wildberger foundations video. Lemma 4.1.3 proof you can finish most of it yourself without reading the whole proof. Lemma 4.1.5 Tao notes the same concerns Wildberger expressed why they both used the 'a less b' style of definition instead of just using Lemma 4.1.5 as the definition of integers, because of the many different cases that would be needed. Tao mentions this definition of the integers forms a commutative ring because xy = yx.


Proof scratchwork, these exercises are all similar to the Wildberger proof he gave in the lecture and Tao's example proofs.

4.1.1 Proving A.7 Equality again. From Def 4.1.1 equality is defined as: a–b = c–d iff a + d = c + b and the variables are natural numbers. Does a–b = a–b? In this defition: a + b = a + b so yes, since we already proved natural numbers obey the reflexive axiom. Does x–y = y–x or x + x = y + y and yes, x and y are natural number objects. If a–b = c–d, and c–d = x–y, does a–b = x–y? Converting everything to their respective natural number definition, and arranging on each side: a + d + c + y + a + y = c + b + x + d + x + b cancel and you're left with 2a + 2y = 2b + 2x, divide each by 2: a + y = x + b, converting this back into an integer this is a–b = x–y similar to Tao's proof in 4.1.1 of transitivity.

4.1.2 If -(a–b) = -(a'–b') are defined as b–a = b'–a' then b + a' = b' + a. This is the integer b–a = b'–a'. Now convert a–b = a'–b' to a + b' = a' + b and notice they are the same, and since natural number construction -(a–b) = -(a'–b') if a–b = a'–b'.

4.1.3 (-1) x a = -a to solve use Tao's advice in remark 4.1.7 to assign a = (a–b) or some other integer notation. In lemma 4.1.3 Tao writes 1 is equal to 1–0, so if want to negate 1, this is 0–1, and in full integer notation: (0–1)(a–b) = ((0a + 1b)–(0b + 1a)) which is (b–a), which is what (a–b) would be if -(a–b).

4.1.4 Prove the rest of 4.1.6, we already did x1 = 1x = x kind of in exercise 4.1.3, convert 1 to (1–0) and pick any arbitrary variables to represent x as (a–b) and this is easy to prove. Same with 0, use lemma 4.1.3 to convert to (0–0) so (x–y) + (0–0) = (x + 0)–(y + 0) or (x–y).

4.1.5 ab = 0 is (x–y)(r–s) = (0–0) or xr + ys = 0 which is true only if xr = 0 and ys = 0. So (0–0)(r–s) or (x–y)(0–0) or both are 0 since we are dealing with natural number constructions as per lemma 2.3.3 nm = 0 only if n=0 or m=0.

4.1.6 if a, b, c, are integers such that ac = bc and c is non-zero, then a = b. (a–a')(c–c') = (b–b')(c–c') and c is non-zero, then (a–a') = (b–b') or a + b' = b + a'. (ac + a'c')–(ac' + a'c) = (bc + b'c')–(bc'+b'c). This is ac + a'c' + bc' + b'c = b'c' + bc + a'c + ac' or factored to c(a + b') + c'(a' + b) = c(b + a') + c'(b' + a). Now we are the exact same place as Tao's proof in Lemma 4.1.3 where he wrote since a + b' = b + a' two sides are equal. If (c–c') was (0–0') then this would be 0 = 0 not (a–a') = (b–b'). Not a proof, but explains why the cancellation law works. If a + b' = b + a' we can also substitute, so c(b + a') + c'(b + a') = c(b + a') + c'(b + a'). Subtract one of the c(value) or c'(value) from both sides so c'(b + a') = c'(b + a') and now we can divide by c' to get b + a' = b + a', and since a + b' = b + a' then substitute back b + a' = a + b' which won't work if c' = 0. Not really the proof Tao wanted but it kind of works.

4.1.7 You can also do these in integer notation. If a,b,c are integers, then a > b iff a - b is a positive natural number. Ordering is defined as a = b + n and a does not equal b in strictly greater than a > b. Taking a = b + n subtract b from both sides, we get a - b = n. If n = 0, then that means a - b = 0 which is a contradiction to Tao's definition that a and b are not equal and thus a - b = some positive n iff a > b is how I read that exercise, the rest are similar.

4.1.8 find an example property, where Axiom 2.5 as stated in the book doesn't work for integers. The axiom relies on the fact that P(n++) is true so negative n can break many properties since n++ will be another negative number unless you state like Tao does 'for all positive integers'.

This chapter has solutions everywhere floating around on stackexchange and other sites. Remember the solutions you find online could be more difficult than necessary and if you tried to solve it yourself, maybe you come up with a better proof than they do.

Rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 13

This definition is similar to 4.2 in Analysis I. The inverse law, write your own example, the rational number 4/1 inverse is 1/4 and 4/1 x 1/4 is 4/4 or 1. A rational number is (a-b)/(c-d) according to it's definition but we write them as p/q and in some cases just p if q = 1.

Visualizing the rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 14

The rational number line sits on [0, 1], so you don't end up with a zero denominator which isn't defined. A rational number is a line through the origin [0,0], and an integer lying on point [m,n]. Ford circles are an example of the massive complexity of the rational numbers, which are expanded upon in the next lecture.

The deep structure of the rational numbers

  • YouTube - Real numbers and limits Math Foundations 95

This is about 35m long but worth the watch if you want to see what a convex combination is, learn about rational intervals, and a more complete description of Ford Circles set up with Farey sequences. This will help us understand what a sequence is later in Tao's book. The overall takeaway of this lecture is that the smaller the interval you take on the rational line, the more complex the numbers become with enormous denominators meaning it has a very deep structure.

The next videos, #96 and #97 about Stern-Brocot trees and wedges are highly recommended and can also be found in the book Concrete Mathematics. In Lecture #97, which we can now understand that we know basic linear algebra, he takes a visualization on the plane of the Stern-Brocot tree and shows it also can be used as a geometric interpretation for the rational numbers, it's kind of crazy when you see it. The 'fraction' 1/0 is also explained.

4.2 The rationals

Tao's Analysis I chapter 4.2, definition 4.2.1: 3//4 = 6//8 = -3//-4 we know from Wildberger's visualization of the rational numbers, these numbers all lay on a line. Lemma 4.2.3, in the proof of addition getting rid of +bb'cd is a simple case of subtracting it from both sides, cancelling it out leaving ab'd2 = a'bd2 and he can divide both sides by d because d is known to be non-zero, since it's in the denominator of a/b + c/d.

To recap we defined the natural numbers, embedded them in the integers (a - b) to define integers, then embedded integers inside the rational numbers where the arithmetic of the integers is consistent with the arithmetic of the rationals. The rationals are a field because of the last identity in 4.2.4 the inverse algebra law. Tao's example proof is similar to the proofs we've seen Wildberger do, convert the letters into integer // integer form, apply law, commute the results together so that they are the same.


4.2.6 in my copy must be a mistake 'show that if x, y, z are real numbers…' that must be 'rational numbers' and sure enough checking Tao's errata page it is a mistake. There's a mistake in exercise 4.2.1 as well depending which version you're using.

  • Ex 4.2.1

We have the usual exercise, show that the definition of equality for rationals is reflexive, symmetric and transitive. We go into the Appendix A.7 Equality and see that the Reflexive axiom is given any object x, we have x = x. In Definition 4.2.1 equality of two rational numbers is defined as a/b = c/d iff ad = cb. If a/b = a/b then ab = ab thus reflexive. Symmetry axiom, if x = y then y = x. If a/b = c/d, symmetric equality is c/d = a/b. By definition 4.2.1 a/b = c/d is ad = cb, and c/d = a/b is cb = ad so Symmetry axiom satisfied.

The hint for proving transitivity is to see Corollary 4.1.9 the cancellation law for integers: If ac = bc, and c is non-zero, then a = b. In A.7 the Transitive axiom states if x = y and y = z then x = z. If a/b = c/d, and c/d = e/f, then a/b = e/f or if ad = cb, cf = ed, then af = eb. We can also rewrite this as adcf = cbed and simplify to af = eb. Staring at this we isolate af(cd) = eb(cd) so the rest of the proof involves logical deducation to determine that c and d aren't zero so we can cancel them from both sides, or through this deduction infer another equality of ab = ef, there is a few ways you could prove transitivity. For example we know that d can't be zero since c/d wouldn't be a rational number. If ad = 0 since we know d is positive, a must be 0. If c is zero, then cf = ed is 0 = ed and again d must be positive so e = 0. So now we have a and e as possible 0, so af = eb is 0 = 0 or: 0/b = 0/f and transitivity is proven though in a convoluted sort of way using deduction and making a lot of implications.

The transitivity proof is a great example of 'just naively write your proofs until you get better eventually'. We used Tao's hint of finding numbers than can or can't be zeros and wrote a proof that is probably not what he would've done, but at this early stage who cares, it works until we learn about proof theory and fortify our logic skills and in the process by manipulating around rational number equality you won't forget the definition that a/b = c/d if ad = cb, which to me is the point of doing proofs, to remember the material.

  • Ex. 4.2.2

Prove whatever remains to be proved in Lemma 4.2.3, let's try the negation since the multiplication is simple. If a/b = a'/b' then ab' = a'b is the setup Tao gives us. If -(a/b) = (-a')/b' then by def 4.2.1 we get (-a)/b = (-a')/b' which is -ab' = -a'b and multiplying each side by -1: ab' = a'b holds. Lemma 4.2.3 multiplication: If (a/b * c/d) = ac/bd and (a'/b' * c/d) = a'c/b'd we need to prove ac(b'd) = a'c(bd) by manipulating it into the form ab' = a'b since a/b = a'/b'. We can just do what Tao did, cancel c and d from both sides, leaving ab' = a'b however we should make sure they aren't zero, using the same kind of deduction we used to prove transitivity. We know d can't be zero since c/d would mean it's not a rational number if zero. Another strategy of course is to just add a scalar to the original ab' = a'b of (cd) on each side, which is fine, and solves the proof, providing you point to the definition in the book where a = a is the same as ab = ab.

  • Ex. 4.2.3

Prove the rest of Proposition 4.2.4. The first is the same as Tao's demo proof. Scratch work, you should try all these yourself they are simple:

  • x + 0 = 0 + x = x
    • a/b + 0/1 is (a*1) + (0*b)/b*1 or a/b
    • 0/1 + a/b is (0*b) + (a*1)/b*1 or a/b
  • x + (-x) = (-x) + x = 0
    • a/b + (-a)/b is (ab + -ab)/bb = 0/bb or 0
    • (-a)/b + a/b is (-ab + ab)/bb = 0/bb or 0
  • xy = yx
    • a/b*c/d is ac/bd
    • c/d*a/b is ac/bd
  • (xy)z = x(yz)
    • (a/b*c/d)e/f is ac/bd*e/f or ace/bdf
    • e/f(a/b*c/d) is e/f*ac/bd or ace/bdf
  • x1 = 1x = x
    • a/b*1/1 is a/b and also 1x as per xy = yx which we just proved
  • x(y + z) = xy + xz
    • simple task to do on paper a/b(c/d + e/f) = (cfa + aed)/bdf
    • a/b*c/d + a/b*e/f = ac/bd + ae/bf or (acbf + aebd)/bbdf and cancel one of the b's which we know can't be negative since it's in the denominator
  • rest same as above
  • Ex 4.2.4

Prove Lemma 4.2.7 is exactly the same as Tao's proof for Lemma 4.1.5 of the integers just substituting rational numbers and their definitions.

  • Ex 4.2.5

We've already proven propositions of order in naturals and integers, this is similar but with a/b < c/d

  • Ex 4.2.6

If you multiply -1 to both sides negating them, then 1 < 2 will end up -1 > -2 which is what should happen since -1 is not as small a qty as -2 if our interpretation is a number line. Proof is somewhere in Def 4.2.8, if x < y then x - y is a negative rational number so a/b - c/d = (-e)/f. Then come along with negative rational z: (-g)/f and multiply both sides, we get a positive number so x > y, at least in scratch work I did until I ran out of time.

Decimal numbers

The next chapter in Tao's Analysis I uses decimals to represent tiny positive quantities (epsilon) so we should probably know what decimals are.

Wildberger has a basic introduction lecture using natural numbers that assumes you know nothing about exponents and reviews the hindu-arabic notation, shows how to convert a decimal to a fraction. The second lecture introduces rational number decimals and their visualization.

His definition is a decimal number is a rational number a/b, where b is a power of 10 (hence deca/decimal), then shows various examples how you can manipulate rational numbers like 1/2 to be 5/10, or 0.50, we also find out why 1/3 is not a decimal number. In binary, it's a base-2 system, so the only prime factor is 2. If you load up any language interpreter and try and add 0.1 + 0.2 you will get surprising results, typically this number:

The examples are pretty good, showing how the hindu-arabic encoded powers of 10 drop down by the divisor and result in a decimal. The visualizations are for later lectures showcasing the complexity of bigger and bigger decimals. At 36:00 arithmetic w/decimals is covered, showing how the rational number and decimal number arithmetic methods are the same so you can just stay w/rational numbers if you want converting back to decimals at anytime.

Appendix B: the decimal system

From Tao's book Appendix B.1 The decimal representation of natural numbers his opinion is that the decimal system is not essential to mathematics, computers have replaced it with binary and 'we rarely use any numbers other than one-digit fractions in modern mathematical work'. Of course to understand scientific notation we still need to know how decimals work and we just did with those Wilberger lectures so this chapter is optional (we can't do it anyway before learning the the real numbers). The single exercise in B.1 is interesting, 'the purpose is to demonstrate that the procedure of long addition taught to you in elementary school is actually valid'.

4.3 Absolute value and exponentiation

Def 4.3.1 you apply that def to the rational number, so if x = -3, the absolute value is -x so -(-3) which of course makes it a positive rational number, as seen in the example of 4.3.2 where the absolute value of 3 - 5 = 2.

Def 4.3.4 that symbol is epsilon, meant to be a tiny quantity he defines as greater than 0. The example 4.3.6 if two numbers are equal, and their difference is zero, then they are 'epsilon close' for any positive epsilon, since their difference is not bigger than any positive epsilon.

Proposition 4.3.7 recall that all these definitions are for d(x, y) or d(z, w) so x - y or z - w. His proof he moves xz to one side, so it becomes negated (subtract from both sides). He then manipulates |az + bx + ab| according to previous definitions to get |a||z| etc., and then uses the various propositions in 4.3.7 to rewrite them as eps|z| + delta|x| + eps(delta). This will make sense after we attempt to prove the rest of the propositions.

Rest of this chapter covers exponents and their properties.


4.3.5 Prove that \(2^n \ge n\) for all positive integers n using induction.

  • base case: n = 0
    • 20 = 1 and since n = 0, 1 > 0.
    • but Tao specifically wrote 'positive integers' so base case must be 1
    • 21 = 2 and since n = 1, 2 > 1
  • inductive step:
    • \(2^{n + 1} \ge n + 1\)
    • left side: \(2^{n + 1}\) is 2n * 2 by def 4.3.9
    • 22n is also 2n + 2n by prop 4.3.10 (a)
    • so now we have 2n + 2n \(\ge\) n + 1
  • assumption:
    • since 2n \(\ge\) n, then 2n + 2n \(\ge\) n + n
    • since we are working in only positive integers, we know n + n is \(\ge\) n + 1
    • 2n + 2n \(\ge\) n + n \(\ge\) n + 1
      • and now here we have standard transitivity proving \(2^{n + 1} \ge n + 1\)

This of course falls apart if Tao changed it to 'for all integers' instead of positive integers, or if we allowed our assumption 2n > n to have n = 0, then we could not assume n + n is \(\ge\) n + 1. The rest of the induction exercises are similar, like proving Prop 4.3.10 so inducting on n: \(a^{n + 1}a^m\) = \(a^{n + 1 + m}\) the left hand side becomes \(a^n \cdot a \cdot a^m\) which is also the definition of \(a^{m+1}\) so can be rewritten \(a^na^{m+1}\) or \((a^na^m)a\) so in form of the hypotenuse, we change it to \((a^{n+m})a\) which is of course \(a^{n+1+m}\) and now both sides are equal.

4.4 Gaps in the rational numbers

Proposition 4.4.1 there is an integer n, and a rational number x, such that n <= x < n + 1 so sandwiched between integers 2 and 3 are rational numbers such as 25/10 or 100/40 etc.

Proposition 4.4.3 there is always some rational you can squeeze in between two other rationals as we've already seen in Wildberger lectures. 1/800 < 1/500 < 1/100 etc. If you want put in examples for his generalized proof to better understand it like x = 2, y = 3 and z = (2 + 3)/2.

Proposition 4.4.4 famous proof by contradiction there does not exist a rational number x for which x2 = 2

Proposition 4.4.5 is basically saying for any epsilon bigger than 0, there exists a rational number that when squared it's less than two, and less than x + epsilon. Remember there is an infinite amount of rational numbers between 0 and 2 like trillion/1000 billion = 1. 12 is 1, if epsilon is 1 so that x + epsilon = 2, then (x + eps)2 is (1 + 1)(1 + 1) and bigger than 2. He asks why ne (e = epsilon) is non-negative for every natural number n in (ne)2 and the parens grouping ne and squaring it obviously make it positive ie (-2)2.

Tao mentions Dedekind cuts will be avoided, and the theory of infinite decimals has complexity problems so also will be avoided. Wildberger has numerous lectures on both theories if you're interested the most detailed ones are in his 'Famous Math Problems' playlist. Wildberger makes the point that you will only ever find the cherry picked example of the square root of 2 for Dedekind cuts in any undergrad material because of the mind boggling complexity involved trying to partition a set for pi2/6 and other irrational numbers.

The infinite descent exercise Tao has basically handed the solution to us, if you fix any sequence of natural numbers then by their definition there are finitely many numbers from 0 to your fixed point, so you can't choose infinitely many distinct natural numbers less than your fixed point so infinite descent does not exist for the naturals because you'll run out of them. Integers you can infinitely descend to negative infinity and of course rationals we know you can infinitely descend with bigger denominators until you approach zero or negative infinity.

Real Numbers

This is a pretty good chapter in Tao's book, probably the clearest treatment I've ever seen of what exactly the construction of the Real numbers is using Cauchy sequences but not using limits to define them and other circular definitions. Tao invents some concepts here for the sole purposes of staging the definitions then tosses them away when we learn limits.

It will be extremely helpful to watch parts of the next few lectures to see the architecture of what we will be doing so things like absolute value distance within epsilon makes sense.

What exactly is a limit

  • YouTube - Real numbers and limits Math Foundations 106

Watch the first 20 minutes of this lecture and you'll see what the notion of an epsilon displacement is. Note that Tao will define all these things for us later, but at least you'll have an idea of what epsilon is. The rest of the lecture is excellent covering a different definition of limits but optional.

2D geometric interpretation of Cauchy sequences

  • YouTube - Real numbers and limits Math Foundations 94

Watch from 2:20 to 16:00, he shows the 2D geometric interpretation of Cauchy sequences, and how they are interpreted as 1D on the real line, the idea of 'converging sequences'.

Real numbers and Cauchy sequences I

  • YouTube - Real numbers and limits Math Foundations 111

He explains the triangle inequality around 8mins, we just read about this in chapter 4.3 of Analysis I (and is one of the exercises).

Real numbers and Cauchy sequences II

  • YouTube - Real numbers and limits Math Foundations 112

An excellent lecture why we even need these complex limit and real definitions in the first place, showing how limits make many subjects in analysis much easier. Archimedes bracketed measurement is covered when he tried to figure out the approximation of pi. We also get an introduction to transcendental functions as a geometric series, something we will see in Analysis II by Tao in detail. The next lecture in the series is optional but Wildberger presents an alternative definition of real numbers, using Archimedean style of nested intervals where equality, algebra laws, order etc are all trivial to prove.

5.1 Cauchy sequences

Def 5.1.1 the notation outside the bracket is n = starting index, and the infinity symbol is the length of the index.

Example 5.1.2 reminds us of Def 4.3.4 d(x,y) = |x - y| \(\le \epsilon\) so if x and y are 2, then 2 - 2 = 0 and every single episilon is valid since x - y will always be less than epsilon.

Def 5.1.6 if you watched the Wildberger lectures on What Exactly is a Limit then you can see what Tao is setting up here with epsilon-steadiness, there is some index in the sequence where the entries past that index are epsilon-steady. Example 5.1.7 the sequence 10,0,0,0.. d(10,0) is 10, so this sequence is not eps-steady for any epsilon less than 10, but if we choose N past 10, then all the remaining sequence is 0,0,0.. and d(0,0) is 0 - 0 so as per Def 4.3.4 will always be less than any epsilon so 'eventually eps-steady for every eps > 0'.

Def 5.1.8 if you saw the above Wildberger lectures on Cauchy sequences this is defining a convergence so any sequence that eventually becomes epsilon-steady, is a Cauchy sequence. Example 5.1.10 depending on your edition there's errata, in mine (3rd edition Springer from libgen) the example makes no sense.

Prop 5.1.11 proof he cites the fact that there always exists some rational number you can squeeze in between numbers and chooses 1/N noting that no beginner would come up with this proof and it is merely staging in preparation of learning what a limit is. Later in Example 5.1.13 he notes Prop 4.4.1 can again be used to prove the infinite sequence 1, -2, 3, -4.. has no bound, I would guess because of the 4.4.1 statement 'there exists a natural number N and rational x such that N > x ie: no such thing as a rational that bounds all the natural numbers so an infinitely increasing and decreasing sequence can't be bounded.

Exercise 5.1.1 since we declared it a Cauchy sequence, you can point to the definition of Cauchy sequences being an object that eventually becomes epsilon-steady, so Lemma 5.1.14 proves the finite part of Cauchy sequences up to epsilon-steady are bounded, and obviously epsilon-steady means it's bounded since you can subtract two indices and they will always be less than epsilon if this is a Cauchy sequence and epsilon-steady, or 1-steady, 2-steady, 1billion-steady. So both are bounded are ergo all Cauchy sequences are bounded using his definition of episilon-steadiness.

5.2 Equivalent Cauchy sequences

Prop 5.2.8 is the proof that 0.9999… = 1, and again prop 4.4.1 is brought out to claim we can always choose any rational number we want to make 2/N <= epsilon. If you follow this proof:

  • (1 + 10-n ) - (1 - 10-n )
  • 10-n + 10-n = 2 x 10-n
  • 10-N is \(\le\) 1/N because of the chapter on exponents, if N = 3 then 10-3 is 1/3x3x3 which is smaller than 1/3
  • Now choose an N that satisfies the requirement that 2/N \(\le\) \(\epsilon\) and it will automatically be true that it also satisfies 2 x 10-N \(\le\) \(\epsilon\)
    • Prop 4.4.1: you can always choose some rational to shove in there that will satsify the req

Only two exercises here, simple point to definition pretend you are a lawyer writing an argument and convince an invisible jury through explanation why it's true an equivalent sequence of rationals is a Cauchy sequence if and only if one of them is a Cauchy sequence. There's no need to pull out any fancy techniques here we will learn all those later.

5.3 The construction of the real numbers

Tao has one of the few books anywhere that actually defines the real numbers. Just like understanding the constructions of rationals and integers helped us understand their operations better, understanding the construction of the reals should help us better understand crazy abstractions like xe or irrational square roots.

Def 5.3.1 the formal limit of a Cauchy sequence, meaning the number a sequence is converging towards, is a real number. Every rational number is also a real number, for example if we choose 5, then a Cauchy sequence converging to 5 like 4.99999… is bounded by 5 and is a real number. The notation is: x is a real number if x = the limit of a Cauchy sequence an, as the sequence goes from index n to infinity so a1, a2, a3

The rest of the chapter covers the same definitions we've already seen except now we are working with a different object, but since it's construction is entirely made up of rational numbers in a sequence almost all the laws are the same. Division by zero not being allowed for real numbers is explained using the sequence 0.1, 0.01, 0.001.. is actually converging towards 0, and it's inverse 1/0.1, 1/0.01, 1/0.001 is 10, 100, 1000… an infinite unbounded sequence therefore not a Cauchy sequence, therefore not a real number, so division by zero is undefined.

I am just casually reading the proofs in this chapter, Tao has simplified it for us with his epsilon-close 4.3.7 propositions and his definition of equivalent Cauchy sequences. This is a much more intuitive and clear You have to unravel all the abstractions like we did before with rationals and integer definitions but they make sense thanks to his prior staging for example the proof of Lemma 5.3.17 is surprisingly easy.

The proof of Lemma 5.3.6 was already explained by Wildberger here if you're wondering what epsilon/2 means.

Looking at the exercises, I'm guessing to prove 5.3.5 we show the reciprocal of 1/n is unbounded so we must be working with a sequence not bounded away from zero. In prop 5.1.11 Tao proved the 1/n sequence is a Cauchy sequence, and since all Cauchy sequences are bounded you can also show it's bound is 0. There is worked out solutions for this chapter here but remember there is no single proof answer, try to prove it a different way.

5.4 Ordering the reals

I thought this chapter would be tedious since we've already done ordering, but it was interesting to go back to prior definitions and see how they work when cast into this Cauchy series abstraction. Prop 5.4.8 if 5 > 4 then 1/5 < 1/4.

Prop 5.4.9 'non-negative reals' means 0 or positive, while positive reals do not include 0. Closed I take to mean a set is closed iff the operation on any two elements of that set produces another element of the same set, in other words adding two non-negative real numbers produces another non-negative real. The explanation of the proof is in the comments here but Tao tells us we'll see this more clearly in future chapters.

Prop 5.4.14 we already know from previous proofs and Wildberger lectures that there is an infinite amount of rational numbers you can squeeze in between two other numbers.

5.5 The least upper bound property

Now we learn analysis, the sup(E) least upper bound property that will come up in every single future chapter.

Example 5.5.7 the empty set doesn't have a least upper bound because of prior definitions where you can always find a rational number (which are also contained in the real numbers) to act as an upper bound to nothing like 1/trillion, 1/septillion, etc.

The proof for Theorem 5.5.9 is huge requiring going back to look at the Archimedean property again and some exercises. Wildberger has broken down this proof somewhat showing the gist of what is going on but the last part of the proof that the upper bound S is less than or equal to every upper bound of E is easy to follow.

Remark 5.5.11 we get the answer to 'can you add negative infinity with positive infinity and get zero' which is no.

The proof of prop 5.5.12 makes sense until you get to that inequality, wondering where he got x2 + 5e, which is the Archemedian property. This is basically saying x is the least upper bound, x plus some tiny displacement (epsilon) squared is less than or equal to the Archimedian property of x2 + 5(displacement) since if epsilon is really small, squaring it means it will be even smaller. He derives a double contradiction that x2 is bigger than 2, and x2 is less than 2, so x2 must equal 2 and magically we have a number that solves x2 = 2 which is not rational.

Exercise 5.5.1 you can type into youtube 'proof of infinum' and get a dozen tutorials or you can go through Prop 5.5.8 and Theorem 5.5.9 to change everything to prove a least lower bound which gives some more experience with the Archmedian property which all these proofs seem to rely on. The rest of the exercises 'draw a picture' think of Wildberger with his within epsilon/2 picture in earlier lectures.

5.6 Real number exponents I

Finally we learn square roots. Depending on your copy of Tao's book there's some minor errata here to look up. Note in the proof of Lemma 5.6.8 he's using substitution, if \(y^a = x^{1/b'}\) then we can substitute them both for any occurences, such as \((y^{a})^{a'}\) = \((x^{1/b'})^{a'}\)

Exercises 5.6.2 and 5.6.3 are simple and worth doing, using substitution to manipulate exponent laws. Exercise 5.6.3 that's the square root of x2, so x or using Lemma 5.6.6(g) or Lemma 5.6.9(b): \((x^{2/1})^{1/2}\) is \(x^{2/1 \cdot 1/2}\) which is x1 or x.

You now know what a real number is, sort of. It's an infinite sequence, with a bound meaning it converges to a limit and that bound is the real number. So for example .9999… just keeps growing but it never gets bigger than 1, so it's limit/bound is 1 and therefore the real number 1. The proofs make sense but the idea of everything being a sequence wasn't really explained except that it is a convenient abstraction for introducing the least upper bound, and therefore a limit. We didn't cover the real decimals appendix either which we'll have to look at after reading chapter 6.

Linear Algebra cont.

3D affine geometry

We're watching this lecture: Three dimensional affine geometry | Wild Linear Algebra 9. He actually has a physical model of 3D affine space and shows how it projects to 2D. 3D (x, y, z) coordinates are explained with an interesting question 'how to model space?'. \(A^3\) space and \(V^3\) space are defined, insersecting and parallel planes are illustrated.

Equations of lines and planes in 3D

As typical with Wildberger lectures this goes far into more advanced material like Mobius bands to model all the lines in the plane, nobody ever showed me this in my linear algebra classes.

Lecture 10. Wildberger goes through different equations for lines: functions (calculus), cartesian and parametic. His example of cutting the Mobius band and not disconnecting all the lines reminds me of a lecture from Erik Demaine's advanced datastructures class and his fascination with origami. We get a new operation dmin(), when he finds the 'determinant of minors' you can see how easy this would be to write these operations as a program, and many optimized libraries already exist for doing so we won't be doing this by hand, this is to understand what those libraries do.

50:32 you can pause the lecture and easily calculate the dmin yourself for practice, in fact whenever he does some operation like finding the determinant see if you can do it yourself first.

Applications of 3x3 matrices

Lecture 11 the algebra of 3x3 matrices. A diagonal matrix, with constant diagonal, is a model or copy of the field of rational numbers, every number has a constant diagonal matrix. He goes through various transformations like reflections, projections and rotations in 3D. He shows the standard cos/sin rotation then rational trigonometry rotations, which makes rotations easier to manipulate when we are just learning, the basic rotations using sin/cos are the same. Generalized reflections and projections are exampled, you don't have to memorize everything here just know what's happening: a line can project itself or relfect itself onto the plane, see if you can figure out what the matrices are doing as they all have patterns. The perpendicular definition, now you know why ax + by + cz = 0 is the plane equation. There's only two exercises, this was a good lecture of stuff that's usually not described at all in linear algebra texts.

Po-Shen Loh

Let's see what CMU professor Po-Shen Loh's sites are doing these days, since he's the current national coach of the US olympiad team who has won both the 2018 and 2019 competitions (and even wins the math competitions for former medalists) so surely there will be something interesting for us in his content.

Daily challenge

The marketing indicates it's curriculum is primarily targeted AMC8 to AMC12 (depending on module) middle school competition difficulty emphasizing how to solve problems without requiring memorization. The idea is that everyday for a month you do a challenge problem and then has numerous follow up lectures explaining the challenge as each problem is designed to teach several topics at once. Even though this is directed at kids the content is sophisticated as it's math competition techniques. The pre-reqs are you know everything from pre-algebra and 'are already experienced in math competitions' so things like the math olympiads for elementary and middle schools.

Singing up for free demo tier requires all kinds of age verification bs like $0.33 non refundable deposit of proof of ID for various regulatory reasons since this is a jr highschool targeted site so I will save you the step and look around.

Module 3: Combinatorics Tools he specializes in this so should be a good module it covers graph theory, set theory, prime factors, rotations and reflections, factorials, problem complexity reduction, recursion, case analysis. First challenge is 'Ordered Arrangements' how many permutations of 4 objects can you make in a certain way, dropping the hint you can do it by brute force. The challenge explanation video is a good set up for n choose k, he describes factorials as diminishing choices. He models the same problem with a tree then describes how to just deduce the entire problem is 4!/2.

Module 1: Algebra Basics covers harmonic average, probability, recurrence relations, estimation and everything to do with understanding ratios and fractions. The challenges are all done with pen and paper. The first challenge is 'Fraction Gymnastics', we're asked to multiply 2.5 x 0.25 x 1.333..x 1.2 which since we've seen Wildberger decimal lectures we know how to convert these to fractions and that 1/3 is .333 repeating. After you convert these to fractions and solve the problem he has a series of tutorials explaining the challenge. He gives a short lesson on cancelling. He explains why these fractions were picked, any num/2 is a 0.5 decimal, num/4 is 0.25 decimal, num/3 will be a repeating infinite decimal, and num/5 will be a 0.2 decimal so you can quickly figure out their decimals in the future if confronted with an enormous decimal in a competition. Long division is used to example why 1/3 repeats forever. More challenge problems are given and the 'weekend' challenge problems are AMC10 level.

Module 4: Algebra Tools first challenge is 'Algebra Tricks' a percent question, a store adds 35% to wholesale value, if you apply a 35% off coupon how much do you save off the wholesale price, which boils down to what is 35% - 35%. He shows how to do this (1 + 0.35)(1 - 0.35) which is of course (x + y)(x - y), and tricks to quickly square any 2 digit number like 0.35. The setup for this is quite good, he really is teaching multiple concepts all with a single problem.

Module 5: Number Theory Tools contains mainly topics you find in a first year 'intro to proofs' style course. One challenge question is "How many squares less than 10,000 are congruent to 1 mod 10?" and this is easy to answer after you see his explanations. I don't remember learning any of this when I was in middle school.

Should you do this if you aren't in Grade 7? Maybe, some of those AMC10 problems are pretty difficult and I know plenty of adults who wouldn't be able to figure out the decimal to fraction challenge let alone adding up all the squares from 1 to 100 in the Number Theory Tools module.

The university level competition is the Putnam and he recommends the book Putnam and beyond Razvan Gelca, Titu Andreescu which flipping through it, has complete solutions for all the 900+ problems. We should probably try this book here as our 'daily challenge'.

This is his free site however the original Expii looked like this but you won't for some reason find that listed anywhere on their main page, and all the practice problems are gone too. The original concept was Expii is a personal tutor to fill missing gaps in the prereqs dependency chain though since this part of the site disappeared off the main page I'm assuming it's depreciated and being replaced with Expii solve. You can still use expii of course just I don't find it as effective anymore, it's like a community contributed Khan Academy which if you're looking for a specific topic is still a great resource.

Live sessions

Every Friday from 4:00pm – 5:00pm EST has a live session of the same content aimed at highschool competitive algebra and geometry where you can ask him anything, for example in 07/08 Wed stream he shows how to solve \(3^{x^2 - 2xy}\) = 1 where x2 = y + 3, we can do this too because we've read Tao's book but he always has some interesting optimized method being a national olympiad coach.

Here at 41:13 he answers even nonsense questions giving a tutorial on logical implication. Somebody after asks him what should students focus on breadth or depth? He answers that you get both by doing old problems from math competitions.

A list of questions asked and answered in all these live session are on the forum which you can't view without registration, and if you register you can't post unless you're in the Daily Challenge which is probably a good idea since it's all young kids you don't want to post there anyway.

New method to solve quadratic equations

This is an interesting method he found while we was working on Daily Challenge content trying to better explain quadratic equations without having to use the quadratic formula. The full version is here. Wildberger did something similar, rewriting the quadratic formula using Newton's method of approximation to make it easier for students.

21-738 Extremal Combinatorics

Prof Loh recently put most of his PhD course 21-738 Extremal Combinatorics up on YouTube which uses similar teaching style of his live sessions. It's a theoretical course, if you've never seen one you can still follow along knowing only what we know now combinatorics has low barrier to entry because it's all finite math/sets. These courses aren't always linear they will on different days cover different theorems. Not that we expect to be able to solve hard problems in combinatorics or anything but it's worth watching to see how he plays around with structures and objects to test their properties in a theory course. This course has interesting content like Ramsey Theory where every large enough system no matter how chaotic it appears there must contain large structured subsystems. You can conjure up things with probability to show they exist, then use extremal techniques to 'derandomize' to reveal a deterministic solution.

You can get a crash course on graphs here and that same playlist has a beginner probability summary lecture good enough to follow along many of the lectures if you wanted.

Linear Algebra Explained

This is a professional development seiminar for teachers given by Prof Loh on linear algebra specifically why is the formula for matrix multiplication constructed the way it is, and how to demystify the determinant. I audited them to see if we could use them here, the determinant lecture is definitely worth your time.

Matrix multiplication

  • Session 1 of 3 - Why is matrix multiplication done in 'strange' order?

He begins talking about linear functions, which we know are functions that grow linear in size, using function composition as a model so g(f(x)). He writes his matrices using a full box/rectangle something he says he was taught in university to distinguish regular variables from matrices assigned to variables, like AB is actually matrix A x matrix B so he will draw a box around it to 'see it's shape better'.

1:06:00 or so he traces a variable from one function composition to another, distributes the variable at the end and shows that the multiplication you're doing at the end is exactly the same procedure of matrix multiplication, this part is worth watching if you're interested.

1:21:00 he describes the tech setup he's using for these videos: a lenovo laptop w/touchscreen he writes on, a green screen, and OBS Studio (free) to combine it all together and broadcast to YouTube. If you're interested you can see the camera (looks like a Blackmagic URSA Broadcast or similar) and huge lights he's using, plus filming in what looks like a closet to drown out noise and sunlight. He says everything is run on linux and customized the touchscreen writing software as the code is open source.

1:43:00 he demonstrates how Fibonacci sequences can also show why matrix multiplication is associative, also a method to rapidly find any Fibonacci index using an order of magnitude estimate. He claims this method is superior to the typical golden ratio method because you're no longer carrying around square roots of 5 and can get exact solutions.

He ends with 'some students ask me in this modern day when there is big data and all these algorithms (machine learning) what kind of math will be the most useful for them to learn, my advice is linear algebra because it's convenient for modeling these things with a linear approximation'. He claims he had to take linear algebra in university twice to understand it because what he was taught in highschool about solving systems of equations was totally different than the abstract linear transformations in university.


I couldn't find session 2. The idea of this lecture is how teachers can demystify determinants to students, pointing out (signed) area and determinant are the same thing. If you can watch the whole lecture.

Prof Loh models the determinant as the area of parallelograms which we already know from the Wildberger lectures. There is a very long setup of 2x2 row operations getting areas of parallelograms and the determinant formula ad - bc just pops out of the last row operation. Signed area is then talked about like negating it, meaning 'flipping' the area, reflecting it across the origin.

He shows a student hack he saw at 1:37:10 or so for figuring out the determinant of a 3x3 matrix where you add on two extra columns to diagonally group and multiply, which is useless for 4x4 or NxN matrices. He explains the method he learned in school and still uses is the same method Wildberger taught us of blocking a row and column, taking the remaining 2x2 determinant and alternating their signs to a(ei-fh) - b(di-fg) + c(dh-eg). He says this method we use is scalable and works for all dimensions.

From about 1:44:00 he shows how to explain to students the standard 3x3 det() formula: aei + bfg + cdh - ceg -afh - bdi. He explains to find out himself he tested various matrices using linear functions, so holding some items in a vector fixed and let a few things vary, so basically testing it like you would a program. Scaling a linear variable down to 0, if it's 0 then the area should be zero too, this means there can't be any constants left floating around the formula or you will get nonsensical area. In the q&a after he clarifies this.

He then takes every permutation of a row and shows that anything that multiplies two entries on a single row must also disappear from the determinant formula or else scaling breaks, the determinant formula multiplies a scalar twice in one vector.

Products from the same column breaks the formula as well. Recall that a 3x3 matrix is essentially 3 row vectors (x, y, z) so if you multiple anything in the same column, then you're exploding all the x vectors or z vectors while the rest stay constant, so you are stretching the signed area.

Finally why is there alternating signs. Take a standard abcdefghi 3x3 matrix and take the 3x3 determinant formula: aei + bfg + cdh - ceg -afh - bdi. Make a matrix for each one using 1's and zeros. So aei will be a diagonal matrix of 1's and the rest zero:

aei \(\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}\)

Compare the negative matrices: afh, bdi, and ceg to the diagonal matrix aei. Notice that you are flipping two of the 3 vector rows, so for example aei if you exchange ei with each other one time it becomes afh. If you switch ai in aei then you get ceg, another negative matrix in the determinant formula. In order to convert aei to a positive matrix like bfg, you need to flip twice. So it's revealed the negative matrices in the det() formula are one's you only flip an odd number of times, and positive even number of times, from the aei diagonal matrix.

2:16:30 he covers questions and gives more examples of single variables breaking the formula, column stretching.

Most interesting part of this lecture is how he went about investigating the determinant formula, poking at it and experimenting around to see what happened when he broke the formula, modeling the formula using linear functions to see what happened, which is how you learn math from what we've seen already in Tao's blog posts and the book How to think like a mathematician. In a way this is also how you can learn some software you find in the wild with no documentation, disable some things and see what breaks. In other words poke at it to discover it's behavior.


6.1 Convergence and limit laws

Returning back to Tao's Analysis I let's learn what a limit is after using formal limits for all of chapter 5. Reminder if you haven't seen it Wildberger has a lecture on limits. Think of any infinite convergent sequence such as 1.91 1.992 1.9993 … converging to 2 and that you have taken such a sequence and stuck it in an imaginary infinite array and indexed every value. This sequence does not end and goes on forever getting closer to the least upper bound. Everytime you subtract the limit from the next sequence entry you get a new smaller epsilon if it is a converging sequence. Try this yourself subtract (1.9 - 2), (1.99 - 2), (1.999999 - 2) and take the absolute value. If you choose an arbitrary epsilon of say 1, then every entry in the remaining infinite sequence will have an equal or smaller distance from 1 if it is converging, which is what the definition means by 'every epsilon > 0'. The further you go down the infinite sequence the smaller this epsilon becomes so that 'for all positive epsilon' there is some index in this array where if you pick any arbitrary epsilon like .0000000001, every future index will contain a value equal or smaller than that epsilon.

Prop 6.1.7 notice he has assigned epsilon to the value |L-L'|/3 which is why d(L,L') \(\le\) 2epsilon is d(L,L') = 2|L-L'|/3 I somehow missed that assignment and spent a few minutes wondering what was going on. He wanted a tiny displacement but why divide by 3 instead of 2? I'm guessing to set up the triangle inequality.

In Remark 6.1.9 we throw out the starting index, I also noticed he changed the index type from naturals to integers in these definitions. Try some exercises:

Ex 6.1.1 assume we're using induction because Tao noting these are natural numbers. We want to show \(a_{n+i}\) is > an for all natural numbers i. If i = 1 then base case is satisfied, since n+1 > n. Suppose \(a_{n+i}\) is > an, then the successor of i \(a_{n+i++}\) is > an. By earlier propositions of addition I'm too lazy to cite, \(a_{n+i++}\) can be rewritten as \(a_{(n+i)++}\) or \(a_{(n++)+ i}\) and since n++ is greater than n by peano axioms we're good. If m > n, then we can assign m = n + 1 to the example sequence and by above terribly lazy induction proof am > an and by transitivity. All we are doing here is defining what an increasing sequence means. Note these indices are recursive, we are creating a structure adding +1 so we can take it apart like we would a recursive datastructure, that's why the indices for a sequence often end with n-2, n-1, n or begin with n-n, n-(n-1).

Ex 6.1.2 is asking is it possible for a sequence of real numbers to converge to another real number, if none of the conditions are satisfied so a proof by contradiction where you look in the logic appendix for if and only if statements and try to prove it false. Let's look at Def 6.1.5 and see if it's possible. Assume for contradiction given any real epsilon > 0, you can't find an index in a convergent infinite sequence so that all future entries are less than or equal to that epsilon. Then this sequence is not converging, and is divergent by def 6.1.5. If the difference between the limit and the sequence entry does not continually shrink meaning there exists some index that can satisfy every epsilon > 0 then it is moving away from the limit and contradicts the definition of convergence of sequences.

Ex 6.1.3 this looks a lot like ex 6.1.2, we have a sequence converging to a real number, and one index is bigger or equal to the other, so if a later index entry of the sequence is not within epsilon, obviously an earlier index can't be a convergent sequence either because then we don't have the property that n \(\ge\) N ie all future index values of some arbitrary epsilon displacement between c (the real number) and the sequence entries. This is in Remark 6.1.9 that the starting index is irrelevant only the fact that a sequence eventually converge to a limit is relevant.

Ex 6.1.4 another question about indices, this one you are adding an arbitrary integer to the index (which could also be zero) that acts like a stepping function, moving up in increments of 1000. If a'n and an are supposed to be converging to a limit, if you add 1000 to the index, a1001, a2002, a3003 better converge if a'n wants to converge or else the properties of def 6.1.5 don't hold. You could show this with k = 0, j = n + k, and now you have a sequence aj = an where an will only converge if aj does since they are the same sequence and n + k \(\ge\) n \(\ge\) N which is def 6.1.5 again, so adding an arbitrary step to the indices shouldn't matter.

Ex 6.1.5 Looking at Def 5.1.8 for a Cauchy sequence, for every possible positive epsilon there must exist an eventual index where two adjacent index entries subtracted from each other have to be less than or equal to that epsilon meaning of course they are converging so the distance between entries grows smaller and smaller. The limit definition, is the limit subtracted from an eventual index > N will always be within an epsilon displacement if it is converging to that limit. The triangle inequality for distance back in 4.3.3 states the distance from x to z, is less than or equal to the distance from x to y plus the distance from y to z. The distance aj - ak \(\le\) epsilon, and the distance an - limit \(\le\) epsilon so assign y = limit in the triangle inequality: aj - ak \(\le\) aj - limit \(\le\) ak - limit and of course are all within epsilon so a convergent sequence of real numbers is also a Cauchy sequence.

Ex 6.1.6 prove prop 6.1.15. There is a long proof answer here you probably want to try and understand but it's worth going back and seeing how he superceded the proofs of temporary constructions before. For example in remark 4.2.5 he defined the quotient x/y of two rationals then substituted in his temporary notation and definitions to prove the new notation and definitions led to the same conclusion, and thus could discard it and use the new notation. Here I did the same thing using 6.1.11 his proof that the limit of 1/n = 0 rewriting it using new limit notation, substituted in Cauchy/formal limits, came to the same conclusion and now I claim formal limits are 'genuine limits' and can toss that notation. I also used 6.1.11 as a concrete example to try his contradiction suggestion, assuming 1/n was not eventually ε-close to 0 and found it was much easier to prove that it had to converge to 0 thus be the genuine limit of 1/n. Is this correct? Who cares, we are proceeding as Poh-Shen Loh once did, coming up with our own unusual proofs we can always come back to Tao's book later and attempt to prove every exercise in Agda to see if our proofs are at least logically sound, at this stage we are just beginners.

Ex 6.1.7 we look at Prop 6.1.4 and rewrite it to bridge rational bounded sequences from 5.1.12 to real number bounded sequences defined in 6.1.16, just like how 6.1.4 bridged rational Cauchy sequences to real number Cauchy sequences, you are using 6.1.4's proof as a template: Suppose that \((a_n)_{n=m}^\infty\) is a sequence in the sense of 6.1.16 and it is bounded by a real number M > 0 and an \(\le\) M for all n \(\ge\) m, in particular it is bounded by a rational M > 0 which makes it a bounded sequence in the sense of 5.1.12. Now suppose \((a_n)_{n=m}^{\infty}\) is a sequence in the sense of 5.1.12 that is bounded by a rational M > 0 and an \(\le\) M for all n \(\ge\) m. If M is a real number, then there exists an integer N > M by the Archimedean property… and your proof carries on like this showing that 6.1.16 is indeed bounded by a real number.

6.1.8 Try and prove these yourself, doesn't matter how awful your proof is just that you looked at previous Cauchy series lemma proofs like 5.3.6 and tried to adapt them to limits, like he does in Prop 6.1.11 so proving the addition of two limits converges to another limit, if an - L must be less than epsilon for every epsilon > 0, then substitute in L = an + bn and see what happens: \(|a_n - (a_n + b_{n})| + |b_n - (a_n + b_{n})|\) \(\le\) epsilon then lim\((a_n + b_{n})\). This is the point of doing proof based math so you can experiment around to better learn constructions and derive all the rules later if you forget them. There are answers floating around for Ex 6.1.8 but maybe you know a better/more intuitive way to prove that an + bn converges to the real number x + y.

6.1.9, why does x/0 fail, and 6.1.10 modifying 6.1.4 again you can figure out easily since we know you can always squeeze a rational number in, so an - bn \(\le\) rational epsilon \(\le\) real epsilon

6.2 The Extended real numbers

We patch the definition of real numbers with negative and positive infinity elements that are distinct from all other real numbers. All we can do with these two new objects is equate things to them, something is infinite if it equals negative or positive infinity. Tao points out cancellation of infinity elements leads to nonsense conclusions.

Example 6.2.7 and def 6.2.6 if a set 'E' contains negative infinity, sup(E) is that set with negative infinity removed and the least upper bound taken, so his example E = {-1, -2, ….} U {negative infinity} results in the set {-1, -2, …, -\(\infty\)} so there is a set of infinite descending numbers unioned with an object 'negative infinity' and sup(E) is -1 because we remove -\(\infty\), but are still left with the first set of {-1, -2,..} even though it technically is also negative infinity it is not the object \(-\infty\). Example 6.2.9 that set is equal to infinity but does not contain the actual element infinity.

6.3 Sup and Inf of sequences

Example 6.3.4 the supremum = 1 is obvious but how do we prove it? The sequence is an \(\le\) 1 for all n \(\ge\) 1, so any element x of this set x < 1, but since a1 = 1 and is in the set, we have x < a1 so x is not an upper bound, and 1 is the upper bound. Similar reasoning for the infimum, an > 0 for all n \(\ge\) 1, so we have x > 0 for all elements in this set. By 5.4.13 Archimedean property we have a positive integer N such that Nx > 1 or 1/N < x, so if 1/n < x that means x > 0 is not the lower bound of this sequence since an < x so we must have 0 as the lower bound. There is a completely different solution here you may want to look at, using 1/n \(\le\) M for all n > 0, so for n = 1 we have 1/1 \(\le\) M or 1 \(\le\) M for the supremum. The infimum is derived using a contradiction. It doesn't matter how you prove it.

Prop 6.3.10 why is it decreasing? Because \(a_{n+1} \le a_n\), pick any arbitrary x between 0 and 1, like 1/2. a1 is (1/2)1 or 1/2. a2 is (1/2)2 or 1/4. a3 is (1/2)3 or 1/2 x 1/2 x 1/2 or 1/8. This fails when x > 1 since it is no longer a decreasing series and is divergent. Tao asks to look at the first chapter again, which we skipped but now can read. The problem with 1.2.3 is that it assumes there is a limit for all real numbers xn, a limit can only exist if 0 < x < 1 for xn as n goes to infinity.

You can look up a proof of the monotone sequence theorem here. If you forgot all the inequality manipulations he did here review Tao's chapter on them again.

6.4 Limsup, Liminf and limit points

If this chapter doesn't make sense find a real analysis playlist. Since this is an analysis book we are exploring metric space (the real line) and now getting into topology territory. Example 6.4.4 we find out limits are just a special case of limit points.

Def 6.4.6 note the new notation is "the supremum of all the elements from aN onwards" so the example 6.4.7, what's the supremum of a1 and all future indices? 1.1. What's the supremum of a2 and everything past it? 1.001. What's the supremum of a3 and everything past it? 1.001 again. You are looking ahead not before the current sequence index.

Prop 6.4.12 L+ and L- are extended real numbers so we can equate them to infinity. In the same proposition a) and b) seem to be saying that there exists an infinite amount of real numbers in the sequence bigger or less to wherever we happen to be when we take the limit superior or limit inferior. For c), the infimum of the entire sequence will be less or equal to the limit inferior because the limit inferior doesn't look at all the previous sequence entries, same with the limit superior so it too will be less or equal to the supremum.

Squeeze test introduced here in 6.4.14, and Tao finally completes the real number definition.


Try some of his multiple choice applets here like Inequalities, Infimum and supremum, Logic and the Sequences quiz if you haven't already done them. If you don't remember the answer look at that chapter it's not a memory test and our goal isn't to become real analysts it's just to understand the construction of these objects for use in later courses.

The plan for the rest of this book is once we finish chapter 7, we will have attained pre-calculus levels.

6.5 Some standard limits

Very short chapter, the proof of 6.5.1 is a standard point-at-definitions style proof, see Lemma 5.6.6(a) and (b) and (g), if L = 1/x1/k then Lk = (1/x1\k)k\1 which is Lk = 1/x. For the exercise with xq, look at Def 5.6.7 and 5.6.9 again.

6.6 Subsequences

Example 6.6.2, a2n is multiplying the indices by 2, producing the subsequence a0, a2, a4 like in his example. It just means if you apply a function to the indices that is strictly increasing, you get another sequence and can call this new sequence bn or cn. Theorem 6.6.8 proof: since we already declared a bounded sequence meaning an \(\le\) M for all n in N, then (negative bound) \(\le\) an \(\le\) (positive bound), so this can't be equated to infinity as it lies in between these two bounds. This means L is a limit point of this sequence and within this sequence one or more subsequences exist that converge to L. Tao teaches us some topology terms in 6.6.9.

I didn't do any exercises here, later when we do probability and the Putnam book we will revisit all these definitions again as needed all you need to know that if you apply an increasing function to the indices of an infinite sequence you can probably find a subsequence in there somewhere that has a limit point.

6.7 Real number exponents II

Here we learn what things like xe or xπ mean. TODO

7.1 Series

We finally arrive at pre-calculus. TODO

Our daily challenge TODO

This will be the end of the workshop after we finish Tao's book(s) and most of Wildberger's Linear Algebra. We will mix all the below Putnam problems with CMU's lectures on research math, and maybe Wildberger's new series on pure math from scratch ie: how to go about doing research:

After watching about 20+ hours of prof Loh lectures, interviews and 'Ask Math Anything' streams:

  • Linear Algebra is the subject we should most focus on for our purposes (CompSci, AI) and we've already been doing that.
  • The best way to learn is to work on solving hard problems that you have a less than 25-50% chance of solving, and more than 10% chance of success so not completely impossible, then you are always engaging yourself and gaining skills.
  • Working on competition level problems teaches you the stamina and patience to be able to work on actual real world problems people care about because IRL problems take focus and dedication to solve.
  • There's very few things to memorize, you can always deduce things using mathematical logic once you figure out how the logic works, which you best learn by solving problems.

Prof Loh gave an interview describing his second attempt at getting into the national IMO team. He thought it would be his last training seminar with no chance to get selected, so wanted to get the most possible benefit he could from the training so he ignored all the 'easy' problems he thought he could solve instead focusing on all the hardest possible problems, trying to solve them in the most unique and unorthodox methods he could think up just to see if he could expand his knowledge. Now he's the head coach of the national team and in training sessions they use this same strategy: ignore everything you think you can solve, spend your time working on harder and harder things and you will get better.

Putnam seminar

This is an introductory recorded Putnam seminar for CMU students. Since CMU is closed, this seminar is being streamed on Youtube.

Proof by contradiction walkthrough

Watching week 1 CMU Putnam Seminar.

Until 29:20 it's mainly course logistics explaining the different sections and grading. He's walking through problem 3 from the lecture notes clarifying that 'integral coefficient' means the coefficients of the polynomial are integers. If you forgot what modular arithmetic is or polynomial standard form see the CMU book by Sullivan/Mackey or watch the relevant lectures from this playlist such as this lecture on polynomials where at 25:30 you see how to eval a polynomial such as p(a) or p(a/b) or p(4). That same 15-251 lecture explains why constants like 5 are a polynomial, you set all the degrees to 0 and when converted to standard form, anything with a variable is zeroed and you're just left with + 5.


Watching the second CMU Putnam Seminar.

The first problem is solved using basic calculus which we'll learn from Tao's book. The second problem at 25:00 is interesting in how he rewrites a large polynomial using substitution, then again at 33:00 where he substitutes the product rule to add h for 3 products. None of this would make sense if we hadn't already done a few chapters out of Tao's book and saw substitution manipulation dozens of times. todo, learn some calculus first.

Putnam book

The book Putnam and Beyond is recommended by Prof Loh in his Putnam seminar, written by one of his IMO coaches. It contains mainly competition level problems, showing us how to solve something then asking the reader to try it themselves. Every problem in the book has a worked out solution with the most interesting or elegant proof they could demonstrate to teach the reader different strategies. Prof Loh says we should spend about 2 hours trying to figure out a problem and increase the time we spend as we go, until we build up the stamina needed to focus on a problem for many hours (this can be split up, not doing it all at once). We can also go through this lecture series on doing research math aka 'methods to solve any problem by hook or by crook' as we go through Putnam and Beyond, and learn things like Proof Theory and natural deduction when they come up.

Let's look at the second edition of Putnam and Beyond by Andreescu and Gelca and see how far we get, alternating between Tao's text and Wildberger's Linear Algebra course we're already doing. Remember this workshop is just to generally learn undergrad math, all the specific AI needed math will be done in that workshop. This is a hard book but will get easier the more proofs we see.

1.1 Argument by Contradiction

Coprime if you look it up means two numbers where their highest common denominator is 1. The indices for N = p1 p2.. pn means multiplication (factorial), as there is no commas between them so you have a list of primes p1, p2, .. pn and assume pn is your whole list of finite primes. Then multiply every index together and add 1, that will still be divisible by some prime p yet it's not in our original list of primes. Try the example, suppose pn is 4 primes, so 2, 3, 5, 7, and pn! + 1 is 2x3x5x7 +1 or 211. That is divisible by 11, another prime but it's not in our list, hence Euclid claiming there will always be more primes. The second example of this proof, 'repunit' means repeated units like 11, 111.. and try the examples yourself of 10xn! + 1 so take x1, multiply it by 10, add 1, you get 11, then you get 111, etc.

The Euler polynomial example, recall that polynomials are written in descending powers of n, so a2x2 + a1x1 + a0.

todo after we finish Tao's Analysis I since we need Calculus.