Math Foundations from Scratch


This is Part IB of the AI Tripos, covering the basics of mathematical reasoning and problem-solving tactics, which you will discover are all you need to then learn whatever you want on your own. No background is required.

"Do 1% effort for 100 days" - Prof Ryan O'Donnell.

What is math

As per Po-Shen Loh, it is one of the easiest subjects you will ever take because there is little to memorize: you deduce whatever you don't know using mathematical reasoning. The closest analogy would be lawyers, who deduce new arguments from statutes and case law; these are proofs, i.e., 'proven beyond a reasonable doubt'. We will be doing the same thing using methods of reasoning. Once you become familiar with this logic, you can use it not only to manipulate the models of real-life things you build in math, but in everyday life too, where your thinking will become more rigorous. In math the model studies you while you study the model.

Terence Tao believes mathematics is complex, high-dimensional (as in moving in any direction, like the depiction of an ancient deity with many arms) and evolving in unexpected and adaptive ways.

Start here

We start from the very beginning counting sticks and work our way up to the CMU Putnam Seminar.

The natural numbers

Numbers are objects you build to act like the concept of a number. Here we begin to construct the natural number object.

  • YouTube - Arithmetic and Geometry Math Foundations 1 (note: these videos eventually get better in quality)

Arithmetic with natural numbers

  • YouTube - Arithmetic and Geometry Math Foundations 2

An interesting explanation of multiplication.

  • Laws for addition
    • n + m = m + n (Commutative)
      • Changing the order of the elements.
    • (k + n) + m = k + (n + m) (Associative)
      • Changing the grouping of the elements.
    • n + 1 = s(n) (Successor)
  • Laws for multiplication
    • nm = mn (Commutative)
    • (kn)m = k(nm) (Associative)
    • n \(\cdot\) 1 = n (Identity)
  • Note the different notation for multiplication:
    • nm
    • n \(\cdot\) m
    • n \(\times\) m
  • Distributive laws
    • k(n + m) = kn + km
    • (k + n)m = km + nm
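If you like, these laws can be spot-checked with a few lines of code. A minimal sketch in Python (my choice of language, not the course's); looping over small samples is evidence, not a proof:

```python
# Spot-check the laws of addition, multiplication, and distribution
# on a few small natural numbers. The names n, m, k mirror the laws above.
samples = range(1, 6)

for n in samples:
    for m in samples:
        for k in samples:
            assert n + m == m + n                  # commutative (addition)
            assert (k + n) + m == k + (n + m)      # associative (addition)
            assert n * m == m * n                  # commutative (multiplication)
            assert (k * n) * m == k * (n * m)      # associative (multiplication)
            assert n * 1 == n                      # identity
            assert k * (n + m) == k * n + k * m    # distributive
            assert (k + n) * m == k * m + n * m    # distributive
```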

Exercise: Convince yourself that the multiplication and distributive laws work. Since we have no idea what a proof is yet, we can instead try going through some of our own examples.

Identity law: n \(\cdot\) 1 = n.

  • We are creating n groups of 1. According to our definition of multiplication |||| \(\cdot\) | would be 4 groups of | or ||||. This is still 4.

Commutative law: nm = mn.

  • ||| \(\cdot\) || = || || || or 6. Rearranging this, || \(\cdot\) ||| is ||| ||| which is again 6.

Associative law: (kn)m = k(nm).

  • If k = |, and n = ||, and m = ||| and parens expressions are evaluated first:
    • Left side: (kn)m. We first evaluate (kn) as || (one group of ||, using the identity law), and 2m is ||| ||| (2 groups of 3) for a total of 6.
    • Right side: k(nm): In the parens nm is 2 groups of 3, so ||| |||. k(6) is the identity law, one group of 6. Thus (kn)m = k(nm).

Distributive law: (k + n)m = km + nm.

  • If k = ||||, and n = ||, and m = |:
    • Left side: (|||| + ||) is 6, and (6) x 1 is six groups of 1 or 6 according to the identity law.
    • Right side: km + nm: 4 groups of one is 4, plus two groups of 1 is 2 (identity law). Answer is again 6.

Laws of arithmetic 2

  • YouTube - Arithmetic and Geometry Math Foundations 3

Wildberger shows another interpretation of multiplication, and walks through the proofs of the laws by rearranging according to the commutative law.

Subtraction and division

  • YouTube - Arithmetic and Geometry Math Foundations 4

Subtraction is the inverse of addition

  • If k + m = n then k = n - m
  • Note: n - m is only defined if n > m as we are still in the realm of the Naturals and negative numbers/integers aren't yet defined.

+ and - are binary operations: they combine only two numbers at a time.

  • Laws of Subtraction:
    • n - (m + k) = (n - m) - k
    • n + (m - k) = (n + m) - k
    • n - (m - k) = (n - m) + k
  • Distributive laws of subtraction:
    • n(m - k) = nm - nk
    • (n - m)k = nk - mk

Note on removing parens: we can drop the grouping when using only addition (associative law), but must distribute the negative sign, as subtraction is not associative.

  • n - (m + k) = (n - m) - k
    • removing grouping: n - m - k = n - m - k
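The subtraction laws can also be spot-checked. A sketch in Python; `nat_sub` is my own helper name, guarding the restriction that n - m is only defined when n > m:

```python
# Spot-check of the subtraction laws, staying inside the naturals:
# subtraction is only attempted when the result stays positive.
def nat_sub(n, m):
    assert n > m, "n - m is undefined in the naturals unless n > m"
    return n - m

n, m, k = 10, 3, 2
# n - (m + k) = (n - m) - k
assert nat_sub(n, m + k) == nat_sub(nat_sub(n, m), k)
# n + (m - k) = (n + m) - k
assert n + nat_sub(m, k) == nat_sub(n + m, k)
# n - (m - k) = (n - m) + k
assert nat_sub(n, nat_sub(m, k)) == nat_sub(n, m) + k
# distributive: n(m - k) = nm - nk
assert n * nat_sub(m, k) == nat_sub(n * m, n * k)
```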

The Hindu-Arabic number system

  • YouTube - Arithmetic and Geometry Math Foundations 6

A simplified Roman numeral system is introduced to abstract our previous notation of ones so we can manipulate bigger numbers.

  • Uses powers of 10
    • 327 = 3(100) + 2(10) + 7(1)
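The expansion into powers of 10 can be computed mechanically. A sketch; the `expand` helper is my own name:

```python
# Expand a Hindu-Arabic numeral into powers of 10, as in 327 = 3(100) + 2(10) + 7(1).
def expand(n):
    """Return the list of (digit, power-of-ten) terms for n."""
    terms = []
    power = 1
    while n > 0:
        n, digit = divmod(n, 10)   # peel off the last digit
        terms.append((digit, power))
        power *= 10
    return list(reversed(terms))

terms = expand(327)
print(terms)                                  # [(3, 100), (2, 10), (7, 1)]
assert sum(d * p for d, p in terms) == 327    # the terms sum back to the number
```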

Arithmetic with Hindu-Arabic notation

  • YouTube - Arithmetic and Geometry Math Foundations 7

Multiplication with the Hindu-Arabic notation:

  • Same laws as before:
    • (ab)c = a(bc)
    • a(b + c) = ab + ac
    • (a + b)(c + d) = ac + ad + bc + bd

A different way of hand multiplication is shown, taking advantage of the fact that the notation is built on the distributive law.


  • YouTube - Arithmetic and Geometry Math Foundations 8

Division is repeated subtraction. Laws of Division:

  • 1. \(\frac{k}{m}\) + \(\frac{n}{m}\) = \(\frac{k + n}{m}\)
  • 2. \(\frac{k}{m}\) \(\cdot\) \(\frac{n}{l}\) = \(\frac{kn}{ml}\)
  • 3. \(\frac{k/m}{n}\) = \(\frac{k}{mn}\)
  • 4. \(\frac{k}{m/n}\) = \(\frac{kn}{m}\)


  • YouTube - Arithmetic and Geometry Math Foundations 9

We're still using only natural numbers.

  • Def: A fraction is an ordered pair (m,n) of natural numbers, also written m/n.
    • Fractions m/n and k/l are equal precisely when ml=nk.
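The ordered-pair definition, together with the division laws above, can be sketched directly. The `frac_*` helper names are mine, not Tao's or Wildberger's:

```python
# Fractions as ordered pairs of naturals, with m/n = k/l precisely when ml = nk.
def frac_eq(a, b):
    (m, n), (k, l) = a, b
    return m * l == n * k

def frac_add(a, b):
    # general rule p/q + r/s = (ps + rq)/qs; covers k/m + n/m = (k+n)/m
    (p, q), (r, s) = a, b
    return (p * s + r * q, q * s)

def frac_mul(a, b):
    # (k/m)(n/l) = kn/ml
    (p, q), (r, s) = a, b
    return (p * r, q * s)

assert frac_eq((2, 4), (1, 2))                    # 2/4 = 1/2 since 2*2 = 4*1
assert frac_eq(frac_add((1, 6), (2, 6)), (3, 6))  # k/m + n/m = (k+n)/m
assert frac_eq(frac_mul((2, 3), (3, 5)), (6, 15)) # kn/ml
```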

Arithmetic with fractions

  • YouTube - Arithmetic and Geometry Math Foundations 10

We're still in the natural numbers, which he defines as beginning with 1, so we aren't working with zeros yet (no division-by-zero corner cases). The prime notation (the ' character, so c', a', b') used in the proof represents another variable. What we are trying to prove is ab' = ba' and cd' = dc'. He introduces some extra notation, multiplying both sides by (dd'), to help prove this. This is our first experience with a proof strategy: multiply both sides of something linear by a scalar.

Laws of arithmetic for fractions

  • YouTube - Arithmetic and Geometry Math Foundations 11

Wildberger proves the distributive and commutative laws using fractions while still in the natural numbers.


Now we will redo what we just learned rigorously in Terence Tao's Analysis I. The author explains here why we need this rigorous stage. This book was originally supplementary notes for an honors analysis class, and in the preface he mentions we will need some lectures to go with the material to help us understand it. My goal here is just to provide some scratchwork so you can have extra intuition into what is going on when Tao abstractly moves things around.

Nature of mathematical logic

Figure 1: What is a proof from 15-251

Axioms are laws without a proof, as there are no earlier laws from which we can deduce them. They should be self-evident and simple enough that they can be fully understood, such as Euclid's parallel postulate: it is obvious there exists a concept of parallelism.

Deduction can be described as: if P then Q, assume P and prove Q, which is part of the rules of inference. There is a series of lectures given by Per Martin-Löf in 1983 which, in clear language anybody can understand, explains the meaning of the logical constants, the justifications of the rules of inference, and their history, such as hypothesis, a Greek word translated into Latin as supposition, meaning an assumption. If you want to know exactly how a proof is constructed, you should read this as you do Tao's book, noting the elimination rules. In math the contrapositive of a statement is often used for proofs, as in 'if P then Q, so not Q then not P', since these are logically equivalent statements.

Variables are used to express general laws. For example, how do you express that every natural number is equal to itself, i.e., 2 = 2 and 3 = 3? You introduce a free variable, x = x, and its meaning remains fixed inside this context. Often the prime symbol is used to denote two different variables with the same letter, such as b and b', read 'b' and 'b prime'.

The operation of replacing all occurrences of a variable with a value is known as substitution: for example, a + 0 = a, and if a = 1, then substituting gives 1 + 0 = 1. There is also a substitution rule, where equivalent terms or formulas can be substituted for one another; this becomes highly abstract, interchanging entire complex expressions, not just single variables.

Math has data structures just like programming languages. There are sets like {3, 1, 5}, in which order does not matter, and lists like (a1, a2, ..., an), which are ordered. These are often interpreted as vectors or n-tuples like (x, y) coordinates. A list (x, x, y) is different from (x, y) as order matters, but the set {x, x, y} is the same as {x, y}. Math is object-oriented: theories describe classes of structures, and instances of those structures model the theory/axioms.
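The order/duplicate distinction can be seen directly in any language that has both structures; in Python, sets and tuples behave exactly as described:

```python
# Sets ignore order and duplicates; tuples (ordered lists) do not.
assert {3, 1, 5} == {5, 3, 1}    # order does not matter in a set
assert {1, 1, 2} == {1, 2}       # duplicates collapse in a set
assert (1, 2) != (2, 1)          # order matters in a tuple
assert (1, 1, 2) != (1, 2)       # duplicates matter in a tuple
```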

That's it, you now know all you need; the rest is just practice. Much of the above was taken from texts on mathematical logic you can find on Library Genesis, like Introduction to Metamathematics by Kleene, if you want to see how reduction is used to collect a bunch of theories and reduce them to axioms, and how similar it all is to computer science, with relational models, decidability, complexity theory, and determining the semantics of a natural language.

Lectures on logic

This lecture gives examples of an initial object and deduction rules. You really just need to watch up to 5:30, then skip to 13:35 to see his ATM example of deduction, where he deduces the proof that the ATM can provide any positive natural number denomination except for 1 and 3. The rest of the lecture, where he models deduction using a boolean circuit, is optional: given inputs X, Y or Z, these are passed through gates such as NOT, AND, OR to arrive at an output, which is your deduced formula.

  • Logic - Lecture from 15-251 CMU

Around 32:00 is the model for 'pretty much all mathematical reasoning' and logical quantifiers like 'for all' or 'there exists' and propositional connectives such as 'implies' or 'iff' are introduced. Most of this is in Tao's appendix.

You only need to know these connectives:

  • AND \(\land\) (conjunction)
  • OR \(\lor\) (disjunction)
  • NOT \(\neg\)
  • Implies \(\implies\) is only false when A is true and B is false
  • If and only if \(\iff\)

These connectives can of course be used to define each other:

  • A \(\implies\) B is \(\neg\) A \(\lor\) B
    • implication is only true when either A is false OR B is true
  • Iff is (A \(\implies\) B) \(\land\) (B \(\implies\) A)
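A truth table confirms these definitions. A short Python check (`implies` is my helper name):

```python
# A => B defined as (not A) or B, checked against the rule that implication
# is false only when A is true and B is false; iff as mutual implication.
def implies(a, b):
    return (not a) or b

for a in (False, True):
    for b in (False, True):
        assert implies(a, b) == (not (a and not b))          # only F when A=T, B=F
        assert (implies(a, b) and implies(b, a)) == (a == b) # iff
```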

Implication seems confusing at first; its modern-language use relating to causality is different from the logician's use, which is explained here, and those Per Martin-Löf lectures above give a complete justification of the logical rules of implication.

Some more optional lectures on the history of logic from Aristotle to the present state:

  • Aristotle and deduction - Brief history of logic MF 251
    • Origins of modern logic
  • Stoics and other thinkers - Brief history of logic MF 252
    • Implication, disjunction and conjunction are discussed here and are highly relevant to all further chapters
  • Islamic/Medieval logic - Brief history of logic MF 253
    • Modal logic origins
  • Leibniz to Boole - Brief history of logic MF 254
    • From deduction to induction
    • Notation for relations
    • Boole's book Analysis of Logic
    • De Morgan's laws

Materials needed

  • The 3rd edition of Analysis I by Terence Tao. Buy it cheap from Amazon India/AbeBooks, or use Library Genesis (see its Wikipedia page for the latest domain) or WorldCat to get it.

2.1 The Peano Axioms

Start with Analysis I chapter 2. We begin again with basic counting but rigorously defined.

  • The five Axioms
    • Axiom 2.1: 0 is a natural number
    • Axiom 2.2: If n is a natural number, then n++ is also a natural number
    • Axiom 2.3: 0 is not the successor of any natural number
    • Axiom 2.4: Different natural numbers must have different successors
    • Axiom 2.5: Let P(n) be any property pertaining to a natural number n. Suppose that P(0) is true, and suppose that whenever P(n) is true, P(n++) is also true. Then P(n) is true for every natural number n.
      • We make an assumption that the property P(n) is true, meaning you do not prove P(n), it is already assumed true; you prove P(0) and that P(n++) follows, and together these form the logical chain that also proves P(n+2), P(n+3), etc.

You can think of n++ being a unary successor operation to generate the natural numbers, while n + 1 will be defined later as a binary operation. The idea is no matter how many natural numbers you have generated, n++ will generate one more.

Remark 2.1.10 gives the example where no natural number can be 'a half number', because of the increment step n++. The base case of 0 (as per Axiom 2.5 and Axiom 2.1) fails if we use 'half numbers' instead of natural numbers: if zero is a natural number, and any n++ is another natural number, the only next successive number is 1; it can't be 0.5 or -3 or 3/4. The way you should view these axioms about the natural numbers is:

  • 1. 0 is an element of N
  • 2. If x is an element of N, then so is x++
  • 3. Nothing else is an element of N

This is called an inductive definition: clause 1 is an application, clause 2 a succession of applications, and clause 3 is the extremal clause stating that only what is produced by clauses 1 and 2 is an element of N.
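The inductive definition can be mirrored as a generator: a sketch with the successor modeled as +1 (the `limit` parameter is mine, added only so the generator terminates):

```python
# The naturals as an inductive definition: start from 0 and keep applying
# the successor. Nothing else is ever produced (the extremal clause).
def naturals(limit):
    n = 0                # clause 1: 0 is an element of N
    while n < limit:
        yield n
        n = n + 1        # clause 2: if n is an element of N, so is n++

assert list(naturals(5)) == [0, 1, 2, 3, 4]
```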

P(n) is not a function; it represents "n has the property P". Think about how the natural numbers are generated with successors: those generated natural numbers all inherit the properties of previously generated numbers, such as P(n): "any natural number satisfies n + 0 = n". If you prove P(0), and that whenever P(n) holds so does P(n++), then any new successor inherits the property too, since each is derived in a chain from P(0) as per Axiom 2.1. You have already encountered mathematical induction in elementary school: when presented with a summing progression such as '1+2+3+4…', the inductive step is the 'and so on'.

2.2 Addition

The example for addition: 3 + 5 should be the same as (((5++)++)++) and adding 0 to 5 should result in 5.

Definition 2.2.1 we can add n++ to m by defining (n++) + m to be (n + m)++. You can read this as (thing)++ and anything in brackets is just a value to be determined, so you can consider it as a single value to be incremented. Example (1 + 2 + 3)++ + 5 is (n++) + 5.

Try writing your own examples:

  • 0 + m = m
  • 1 + m = ((0++) + m) or m++
  • 2 + m = ((1++) + m) or ((m++)++)
  • 3 + m = ((2++) + m) or (((m++)++)++)
  • 4 + m = ((3++) + m) or ((((m++)++)++)++)
  • 2 + 3 = ((3++)++) or (4++) or 5.
  • 3 + 2 = (((2++)++)++) or ((3++)++) or (4++) or 5.
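These examples follow a recursion you can run. A sketch of Definition 2.2.1 in Python, with n++ modeled as n + 1 (`succ` and `add` are my names, not Tao's):

```python
# Addition exactly as in Definition 2.2.1:
#   0 + m = m, and (n++) + m = (n + m)++.
def succ(n):
    return n + 1

def add(n, m):
    if n == 0:                    # base case: 0 + m = m
        return m
    return succ(add(n - 1, m))    # (n++) + m = (n + m)++, peeling one ++ off n

assert add(2, 3) == 5    # ((3++)++)
assert add(3, 2) == 5    # (((2++)++)++)
assert add(0, 7) == 7
```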

First proof

Let's go through his proof of Lemma 2.2.2. Notice first that it is a written essay, and that's how all your proofs should look. Second, he is using the exact same induction template from 2.1.11. Third, he's using a different variable so as not to confuse it with the already established definition 0 + m = m. Reminder: these variables are just placeholders for values, and if you see m twice or n twice it means the same value.

Induction proofs are a logical chain:

  • P(0) (the base case): the first link contains the property and is true; you prove this
  • P(n): assume this second link contains the property; you do not prove this because it's assumed already true
  • P(n++): prove the successor is true, with the ability to use the previous two as if they were already established definitions

How does the base case for n + 0 = n become 0 + 0 = 0 from the previous definition 0 + m = m? We are inducting on n, so once we substitute 0 for n we have 0 + (value) = (value), which is the definition of addition: 0 + m = m. Since (value) is 0, it is 0 + 0 = 0.

Suppose that n + 0 = n (the assumption). Then for the successor we must show (n++) + 0 = n++. Pointing to the definition of addition, which gives (n++) + m = (n + m)++, we see it is in the same form as (n++) + 0; if we say something is equal to something else, we can replace one with the other, so (n++) + 0 becomes (n + 0)++, as they are equal and therefore interchangeable.

Finally we are left with (n + 0)++ = n++. The value in the brackets, (n + 0), is in the form of P(n), the inductive hypothesis: since we have assumed n + 0 = n is true, we can again replace n + 0 with n and arrive at the proof: (n)++ = n++. Remember to consider your assumption 'already true', a definition you can just point to while in the inductive proof. Why is induction used? Because we are building up a structure. He defined each natural number as the successor of another, an incremental layer upon layer that you can recursively take apart back to the base case, just like in computer programming. Property P(n) exists in the base case 0, plus 1, plus another, so the property holds for all natural numbers derived from the Peano axioms.

The most important takeaway: no arithmetic was done here at all. We pointed at definitions, and reasoned about them using replacement to exchange their forms until they were the same on both sides of the equal sign. You are a logician now pattern matching forms and moving them around. If thing = thing2, then thing2 = thing. At this early stage you could literally cut out the definitions and assemble proofs by swapping out definitions for each other.

How can you check this is correct? Reverse the logic: if n + 0 did not equal n for some n, then tracing the chain back, 0 + 0 would not equal 0.

Second proof

Lemma 2.2.3: here we are proving n + (m++) = (n + m)++, inducting on n. Since we are building a logical chain of deductions, we start at the first link, which is 0: replace n with 0 to get the form 0 + (m++) = (0 + m)++. Looking at the right side, get out your scissors, cut apart the definition of addition, and replace (0 + m)++ with (m)++. So now we have 0 + (m++) = m++. Since we know from the definition of addition that 0 + (value) = (value), you can replace 0 + (m++) with m++, and now the base case is equal on both sides: m++ = m++.

Next, we assume n + (m++) = (n + m)++.

We prove the successor case: (n++) + (m++) = ((n++) + m)++. Looking at the left side, refer to the definition of addition and notice (n++) + m = (n + m)++, i.e., (successor of value) + (something) = (value + something) with the successor moved outside; the left-hand side is now (n + (m++))++. This form matches the hypothesis n + (m++) = (n + m)++, so replace it, and the left side becomes ((n + m)++)++. The right side follows the same logic: in ((n++) + m)++, the inner (n++) + m is the definition of addition again, so it too becomes ((n + m)++)++, and the two sides agree.

You can always insert concrete examples to see what is going on, ie for n+(m++) = (n+m)++ let n=2, m=0 then 2+(0+1) = (2+0)+1.
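That advice to insert concrete examples can be automated: a spot-check of Lemmas 2.2.2 and 2.2.3 on small values, with ++ modeled as +1:

```python
# Concrete instances of Lemma 2.2.2 (n + 0 = n) and
# Lemma 2.2.3 (n + (m++) = (n + m)++), with ++ as +1.
for n in range(8):
    assert n + 0 == n                      # Lemma 2.2.2
    for m in range(8):
        assert n + (m + 1) == (n + m) + 1  # Lemma 2.2.3

# The text's example: n = 2, m = 0
assert 2 + (0 + 1) == (2 + 0) + 1
```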

Third proof

Proposition 2.2.4 (Addition is commutative): the base case is point-to the definition of addition and Lemma 2.2.2. We use the definition of addition and Lemma 2.2.3 to rewrite both successor sides as (n + m)++ = (m + n)++, and now it is in the form of the inductive hypothesis, (hypothesis form)++ = (hypothesis form)++, so we can use our assumption that n + m = m + n to commute m and n.

Fourth proof & Implication

Proposition 2.2.6: we want to prove b = c. In the inductive step, he uses Axiom 2.4 to drop down from the successor to the form a + b = a + c, which is the form of the inductive hypothesis, where we claimed 'a + b = a + c implies b = c'. So we can now conclude b = c and have 'virtual cancellation without subtraction'. Tao writes about implication in the appendix, which you should read and then return to many times, as it comes up in later chapters.

Exercise 2.2.1

Prove 2.2.5 (Addition is associative) that (a + b) + c = a + (b + c)

We want to show that if a natural number is inside or outside brackets, it will have the same result when added together. Induct on any variable (a, b or c) you want. Starting our logical chain at the first link 0:

  • Base case:
  • (a + b) + 0 = a + (b + 0) (substitute 0 for c)
  • looking at left side: (a + b) + 0
    • since b + 0 = b (lemma 2.2.2) then (value) + 0 = value
    • (a + b) + 0 = a + b (lemma 2.2.2)
  • looking at right side: a + (b + 0)
    • a + (b + 0) is a + (b) for the same lemma 2.2.2 reasons
  • now we have a + b = a + b and base case is done.

We selected c for the base case, but what if you choose a or b? Try it while looking at the definition of addition. The next step in our logical chain is the inductive hypothesis (IH) that addition is associative: (a + b) + c = a + (b + c). The last part of the chain is to prove the next link, the successor of our hypothesis, is true. There are a few ways to show this; I'll move stuff around and you can see where the proof is:

Inducting on c: (a + b) + (c++) = a + (b + (c++))

  • right side: a + (b + (c++))
    • a + (b + c)++ (lemma 2.2.3)
    • ((a + b) + c)++ (by the IH)
    • ((a + b)++) + c (def of addition)
    • ((b + a)++) + c (commutative)
    • ((b++)+ a) + c (def of addition)
    • c + ((b++) + a) (commutative)
    • c + (a + (b++)) (commutative)
  • left side: (a + b) + (c++)
    • (c++) + (a + b) (commutative)
    • (c + (a + b))++ (def of addition.. consider (a + b) to be (value)): Recall (n++) + m = (n + m)++ so this means (c++) + (a + b) is (c + (a + b))++ because (n++) + (value) = (n + value)++
    • (c + (a + b))++ is the same as ((a + b) + c)++ (commutative around the (a + b) bracket)
    • (a + (b + c))++ (by the IH)
    • ((b + c) + a)++ (commutative)
    • etc., can keep moving these around

Try moving everything around until you understand exactly how commutativity and associativity work. See if you can get it to a form where you can use the inductive hypothesis, replace and permute them around some more until this starts to make sense.

2.2. cont.

Lemma 2.2.10 has an exercise where induction is suggested. Always read the comments on Tao's blog if you're stuck, though this one mostly restates Axioms 2.3 and 2.4. Assume for contradiction that b++ = a and c++ = a with b and c different, so a positive number is the successor of two different natural numbers. But by the Peano axioms this is not possible, so b must equal c. That is my good-enough proof, and we can later use this lemma to replace 'a positive number' with b++.

Definition 2.2.11 n is greater or equal to m if n = m + some other natural number. If that number is 0, then the equality holds, if that number is 0++ then the inequality holds.

Another way to think about order of the natural numbers, is because they are inductively defined and generated, they already have an order. You can think of m < n if m is generated before n.


Try a few yourself; write a convincing argument to yourself, like how Tao writes proofs. Click on the details button to see my scratchwork; you do not have to do all the exercises.

Proposition 2.2.12 / Exercise 2.2.3: Prove the basic properties of order for natural numbers. None of this is a properly written proof, only notes on moving through the steps to understand the problem; afterwards, write a proof in the style Tao has been teaching us. See the appendix for what the reflexive, transitive and anti-symmetric properties are.

  • Order is reflexive
  • Reflexive property is that any natural number is equal to itself.
  • We want to show order is reflexive: a \(\ge\) a
  • a = a + n (def 2.2.11 of ordering of natural numbers)
  • From lemma 2.2.2 we know that a + 0 = a and so a = a and is reflexive
  • By definition 2.2.11 we also have a \(\ge\) a
  • Transitive order property
  • a \(\ge\) b and b \(\ge\) c, then a \(\ge\) c
  • If a = b + 0, b = c + 0, and a = c + 0, then we have lemma 2.2.2 and transitivity (equality) holds with a = b, b = c and a = c.

Try some more equivalency replacement to prove a \(\ge\) b and b \(\ge\) c, then a \(\ge\) c

  • a = b + n
  • b = c + m (use a different variable for the second difference; the two need not be equal)
    • b is equal to (c + m), so replace b in a = b + n:
  • a = (c + m) + n
  • a = c + (m + n) (associative law)
    • (m + n) is itself a natural number, so a = c + (value)
  • By definition 2.2.11, a = c + (value) means a \(\ge\) c

Possible proof strategy: By the definition of ordering of the natural numbers, if a = b + m and b = c + n, then a = (c + n) + m. By associativity a = c + (n + m). Let d = n + m; since n and m are natural numbers, d is a natural number by the Peano axioms, so we have a = c + d, as desired by the definition of ordering of the natural numbers.

  • Anti-symmetric order
  • If a \(\ge\) b and b \(\ge\) a then a = b
  • See the entry for Antisymmetric relation
  • If these two inequalites hold then they must be equal
  • We need to show a = b:
    • a = b + m by 2.2.11
    • b = a + n by 2.2.11
  • Replace b for (a + n):
    • a = (a + n) + m
    • a = a + (n + m)
    • a + 0 = a + (n + m) (lemma 2.2.2, if a = a + 0 then if you see a single variable, adding 0 to it is fine)
    • 0 = n + m (cancel a by cancellation law)
    • n = 0 and m = 0 (corollary 2.2.9: if a sum of naturals is 0, both summands are 0)
  • If n = 0, and m = 0 by above, then a = b + 0 which is a = b

Scratchwork for (d) preserving order exercise:

  • d) Addition preserves order
  • a \(\ge\) b iff a + c \(\ge\) b + c
    • Use def 2.2.11 to convert a \(\ge\) b to the form a = b + d, add c to both sides:
    • a + c = b + d + c by 2.2.11
    • a = b + d by cancellation law and a \(\ge\) b by 2.2.11
  • Another method: if a + c \(\ge\) b + c then:
    • a + c = b + n + c by 2.2.11
    • a + c = b + c + n (commutative law)
    • (a + c) = (b + c) + n (associative law)
    • This is (nat number) = (nat number) + n or by 2.2.11 a \(\ge\) b
      • Again don't get hung up on what is inside the brackets, think of larger picture of def 2.2.11 that (value) = (value) + number

Scratchwork for exercise (e):

  • e) a < b iff a++ \(\le\) b
    • If a is less than b, then b = a + n and n does not equal zero (2.2.11)
    • Since n must be a positive natural number by 2.2.7 and 2.2.11, n = c++ for some natural number c by 2.2.10, so b = a + (c++)
    • b = a + (c + 1) (corollary from 2.2.2/2.2.3)
    • b = a + (1 + c) (commutative law)
    • b = (a + 1) + c (associative law)
    • b = (a++) + c by the previous corollary that n++ is n + 1
  • We have a++ \(\le\) b by 2.2.11

Scratchwork for exercise (f):

  • f) a < b iff b = a + d for some positive number d
  • For the sake of contradiction, suppose b = a + d for some positive d but a = b. Then a + 0 = a + d by lemma 2.2.2, and 0 = d by the cancellation law. This contradicts d being positive, so a cannot equal b and a \(\lt\) b

Proposition 2.2.13, fill in the gaps:

  • When a = 0, we have 0 \(\le\) b for all b (why?)
    • 2.2.1 Addition of natural numbers, 0 + m = m so 0 + b = b and by 2.2.11 0 \(\le\) b
  • If a > b then a++ > b (why?)
    • a = b + d (as per 2.2.12, d must be positive)
    • c++ = d (lemma 2.2.10)
    • a++ = b + (c++) (Replacing c++ for d)
    • a++ > b by def of ordering
  • Or another way:
    • a++ = (b + c)++ (def of addition; (b + c)++ must be positive)
    • a = b + c (Axiom 2.4)
    • a > b (2.2.12 d is positive)
  • If a = b, then a++ > b (why?)
    • If a = b, then a++ = b++ by Axiom 2.4
    • a++ = b + 1 (corollary 2.2.2/2.2.3)
    • this is a++ > b by 2.2.11

Proposition 2.2.14 (Strong induction) Intuitively, if the natural numbers are generated by an inductive definition, then you can assume all numbers generated up to m that are less than n, meaning n is no longer just the next step after the base case, it can be anywhere in the chain of generation and you assume an entire sequence of generated naturals up to n have property P(n).

First look up vacuous truth or see the appendix on mathematical reasoning where Tao defines it.

  • Let Q(n) be the property that P(m) is true for all m\(_0\) \(\le\) m < n for n \(\ge\) m\(_0\).
  • Q(0) is true since:
    • m\(_0\) \(\le\) m < 0 is vacuously true, as stated by Tao in the exercise hint: no natural number m satisfies m < 0
  • Show that Q(n++) is true:
  • P(m) m\(_0\) \(\le\) m \(\lt\) n \(\lt\) n++ is true (2.2.11)
  • P(m) m\(_0\) \(\le\) m \(\lt\) n++ is true for all n \(\ge\) m\(_0\)

Try an example.

  • P(n) is the property that n + 0 = n
  • Q(0) is 0 + 0 = 0 and P(0)
  • Q(1) is 1 + 0 = 1 and P(1), P(0)
  • Q(2) is 2 + 0 = 2 and P(2), P(1), P(0)

2.3 Multiplication

Proposition 2.3.4: follow along with definition 2.3.1 of multiplication, where (n++) x m = (n x m) + m; also recall that n++ is n + 1, so (n + 1)m is nm + m.

  • left side:
    • a(b + (c++)) is a(b + c) + a, because b + (c++) = (b + c)++ (lemma 2.2.3) and a(n++) = a(n) + a
    • abstractly, a(b + (c++)) is a(value++), which is a(value) + a
  • right side:
    • ab + a(c++) is ab + ac + a
    • ab + ac + a is in the form of the inductive hypothesis so change it for a(b + c) + a

Remark 2.3.8: the contradiction is that ac = bc, which by corollary 2.3.7, when fed into the trichotomy of order, can only result in a = b. This is a scheme to produce a temporary notion of division/cancellation.


Scratchwork and notes

Exercise 2.3.1 From def 2.3.1 we have 0m = 0. So we need m0 = 0. Tao states to mimic 2.2.2 but for multiplication:

Lemma X: For any natural number n, n0 = 0.

  • base case: 0(0) is 0 x 0 = 0 by 2.3.1 so 0 = 0
  • IH: n0 = 0
  • Inductive step: (n++)0 = 0
    • (n++) x m = (n x m) + m by def 2.3.1 so (n0) + 0 = 0
    • Remove the + 0 (lemma 2.2.2)
    • Now it is in the form of the IH: n0 = 0 so we can replace 0 for n0
    • 0 = 0 and induction is closed.

From 2.3.1 we have (n++) x m = (n x m) + m but not n x (m++). We need a Lemma Y: n x (m++) is (n x m) + n rewriting Lemma 2.2.3 but for multiplication. Plug in values to make sure it's correct: 1 x (2++) = (1 x 2) + 1 or 2 x (2++) = (2 x 2) + 2.
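That plugging in of values can be scripted. A sketch defining multiplication by the recursion of Definition 2.3.1 (++ modeled as +1; `mul` is my name) and checking Lemma Y on a small grid:

```python
# Multiplication per Definition 2.3.1: 0 x m = 0, and (n++) x m = (n x m) + m,
# i.e. repeated addition.
def mul(n, m):
    if n == 0:
        return 0
    return mul(n - 1, m) + m    # (n++) x m = (n x m) + m

for n in range(6):
    for m in range(6):
        assert mul(n, m) == n * m              # agrees with the ordinary product
        assert mul(n, m + 1) == mul(n, m) + n  # Lemma Y: n x (m++) = (n x m) + n
```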

Lemma Y: n(m++) is (nm) + n:

  • base case: Our lemma X and def 2.3.1
  • IH: n x (m++) = (n x m) + n
  • Inductive step on n: (n++ x m++) = (n++ x m) + n++
  • Left: (n++ x m++)
    • (n++) x (thing) is (n x thing) + thing by def 2.3.1
    • (n x m++) + m++
    • It's in the form of IH now, we can replace (n x (m++)) for (n x m) + n
    • ((n x m) + n) + m++
      • (n x m) + n + m + 1 since m++ is m + 1
  • Right: (n++ x m) + n++
    • (n++ x m) can be rewritten to (n x m) + m as per def 2.3.1
    • ((n x m) + m) + n++
      • (n x m) + m + n + 1 since n++ is n + 1

Now we have to prove mn = nm

  • Base case: def 2.3.1 and lemma X
  • Hypothesis: mn = nm
  • Inductive step: m(n++) = (n++)m
    • Left side: m(n++) is mn + m by Lemma Y; right side: (n++)m is nm + m by def 2.3.1; these match by the hypothesis mn = nm

Exercise 2.3.2

The hint is to prove the second statement first, if n and m are both positive, then nm is also positive. You could use Proposition 2.3.6 as a template for this:

  • If n > 0, and m > 0, then nm > 0
    • We have nm = 0 + d for some positive d
    • Since d is positive, n and m must be positive (or else 0 = 0 + d, a contradiction)

Or you could use Lemma 2.2.10, n x m = (b++) x m and then mimic 2.3.6 to determine n x m is positive if n and m are positive. Note that (b++) x m expands to the form (b x m) + b, and since you know a x b + a is the successor form of multiplication it can't be zero.

Lemma 2.3.3: n x m = 0 iff n = 0 or m = 0. You could try a contradiction using the nm > 0 proof we already wrote, so it must be n = 0 or m = 0 if nm = 0 or a contrapositive of Tao's statement: see the appendix on mathematical reasoning. If Tao states something, and then proves it, then there is a contrapositive statement we can use.

Exercise 2.3.3 For any natural numbers a, b c, we have (ab)c = a(bc). Tao suggests using the distributive law:

  • Inductive Step: (a(b++))c = a((b++)c)
    • (a(b + 1))c by corollary n++ = n + 1
    • (ab + a)c by distributive law
    • (ab + a) is also a(b++)
    • a(b++)c is also ac(b++) by commutative law
      • you can see where this is going, keep distributing and moving things around like we did before to prove associativity for addition

Exercise 2.3.4

Prove the identity.

  • (a + b)(a + b) = (a + b)a + (a + b)b by distributive law/exponentiation
    • aa + ba + ab + bb by distributive law
    • a\(^2\) + ba + ab + b\(^2\) by def 2.3.11/Examples 2.3.12
    • ba + ab is ba + ba by commutative law
    • ba + ba is 2ba by def 2.3.1 where 2 x m = 0 + m + m
      • (a + b)\(^2\) = a\(^2\) + 2ab + b\(^2\)
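
If you want a quick numerical reassurance before trusting the symbol pushing, a throwaway check works (random naturals are my arbitrary choice; ordinary Python ints stand in for the naturals):

```python
import random

# Check (a + b)^2 = a^2 + 2ab + b^2 on many random naturals.
for _ in range(1000):
    a, b = random.randint(0, 10**6), random.randint(0, 10**6)
    assert (a + b) ** 2 == a ** 2 + 2 * a * b + b ** 2
```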

Set Theory

History of set theory

There is a lecture on the history of sets/functions, explaining ordinals, cardinals, and various schools of logic, and a second lecture about the history of set theory with Gödel and Turing.

Appendix A: the basics of mathematical logic

This chapter on set theory in Analysis I relies on the fact that you've already read the appendix on logic. Tao makes a point that this chapter is meant to be read and understood, not studied and memorized. For Proposition A.2.6, a proof by contradiction, we will learn later that the sine function is 1 at pi/2 (90 deg).

Exercise A.5.1

  • a) For every positive x, and every positive y, y*y = x, i.e. y is the square root of x. Is this true? The square root of two is the obvious candidate for claiming this is a false statement, since what y would satisfy y * y = 2?

Lectures on deduction

Optional lecture, you've already done some deduction from writing proofs, however this lecture shows what is and what isn't deducible. The end of the lecture has overlap with the appendix and the first logic lecture in this workshop so can be skipped.

Quiz: Logic

These multiple choice quizzes are from Tao's Analysis class; try the logic quiz. If you struggle, search YouTube for proofs by implication, or search 'discrete math inference rules'; there are dozens of tutorials.

3.1 Fundamentals (Set Theory)

You may want to watch this lecture by Wildberger he visualizes sets.

Example 3.1.2 is similar to a nested list such as [list: 1, [list: 1, 2], 100] is length 3, even though there is a nested list overall it still only contains 3 elements. Remark 3.1.3 is in the above historical lectures on set theory. We are referred to Appendix A.7 Equality to read about the substitution axiom which you should have already read.

Lemma 3.1.6: how I interpret the lemma is, if set A is non-empty then there must exist some x in A, because if there were no such x then A would equal the empty set, a contradiction. In other words, once an empty set exists there must also be the possibility of a non-empty set. Remark 3.1.8 on cardinality assigns a value to the 'size' of a set, which the Wildberger History of Mathematics lecture linked above also covers.

Note the errata for Definition 3.1.4 from Tao's site: "Definition 3.1.4 has to be given the status of an axiom (the axiom of extensionality) rather than a definition, changing all references to this definition accordingly. This requires some changes to the text discussing this definition. In the preceding paragraph, “define the notion of equality” will now be “seek to capture the notion of equality”, and “formalize this as a definition” should be “formalize this as an axiom”. For the paragraph after Example 3.1.5, delete the first two sentences, and remove the word “Thus” from the third sentence. Exercise 3.1.1 is now trivial and can be deleted." So this is now Axiom 3.1.4, but the definition remains the same.

Exercise 3.1.2 prove emptyset, {emptyset}, {{emptyset}}, and {emptyset, {emptyset}} are all not equal.

  • If \(\emptyset\) = {\(\emptyset\)} then by Def 3.1.4 \(\emptyset\) \(\in \emptyset\) a contradiction to Axiom 3.2
  • Same reasoning as above for \(\emptyset\) is not equal to {{\(\emptyset\)}} or {\(\emptyset\), {\(\emptyset\)}}
  • {\(\emptyset\)} is not equal to {{\(\emptyset\)}} because [list: 1] is not equal to [list: [list: 1]]. The first has element 1 in it, the second has element [list: 1].

Remark 3.1.21 clarifies the above exercise: how {2} and 2 are distinct objects. {2} is not in {1,2,3} but 2 is in {1,2,3}. The rest of this chapter is straightforward definitions of set operations.

Example 3.1.32 looks a lot like programming syntax. There seems to be an order of evaluation: apply the axiom of specification first to create a new set, then apply the axiom of replacement, so working right to left in {n++ : n \(\in\) {3,5,9}; n < 6}

Quiz Sets

Try Tao's quiz. There are many YouTube lectures on basic set theory.


You can skip these, even in the preface of this book Tao advocates to just read this chapter so I've only done a few.

  • 3.1.3 prove the remaining claims in 3.1.13:

If a and b are objects, then {a, b} = {a} U {b}: By (new axiom) 3.1.4 we need to show every element x in {a, b} is in {a} U {b}. Suppose that x is an element of {a, b}; then by axiom 3.3 x = a or x = b. By Axiom 3.4 this means at least one of the following statements is true: x \(\in\) {a} or x \(\in\) {b}. We divide into cases. If x = a, then by axiom 3.4 x \(\in\) {a} U {b}; similarly if x = b, it is also in {a} U {b}. Thus in both cases we see every element of {a, b} is in {a} U {b}. Note the proof works by showing x is in {a} OR in {b}, which is the definition of union.

  • The next claim, prove the union operation is commutative so A \(\cup\) B = B \(\cup\) A

Follows the same proof he did for showing the union operation is associative. Suppose x is an element of A \(\cup\) B. By def 3.1.4 and axiom 3.4 (pairwise union), x \(\in\) A or x \(\in\) B. By commutativity of disjunction this is the same as x \(\in\) B or x \(\in\) A. So x is in A \(\cup\) B iff x is in B \(\cup\) A, and the two sets are equal.

  • 3.1.4 prove claims in Proposition 3.1.18

You can write both these proofs exactly like his 3.1.18 proof for transitivity. Suppose A \(\subset\) B and B \(\subset\) A. To prove that A = B we have to prove that every element of A is an element of B, and every element of B is an element of A, by def 3.1.4. Let us pick an arbitrary element x \(\in\) A. Then, since A \(\subset\) B, x must also be an element of B. Likewise, picking any arbitrary y \(\in\) B, since B \(\subset\) A, y must be an element of A. By 3.1.4 they are equal.

  • 3.1.5 Show how union, intersection and subset are logically equivalent:

Show that any one of the statements implies another. Assume A \(\subset\) B. If x is in A \(\cup\) B, then by definition of union x \(\in\) A or x \(\in\) B; if x is in A, then since A is a subset of B, x is again \(\in\) B, so either way x \(\in\) B. Thus x \(\in\) A \(\cup\) B if and only if x \(\in\) B, and by definition of union and the axiom of equality 3.1.4, A \(\cup\) B = B. For intersection: if x is in A, then since A \(\subset\) B, x is in both A and B, so x \(\in\) A \(\cap\) B, which is the definition of intersection, giving A \(\cap\) B = A. Finally, if A \(\cap\) B = A (or A \(\cup\) B = B), then every element of A is in B, so A \(\subset\) B.

3.2 Russell's Paradox

Wildberger has a lecture in his history of math series about this if you watch the lecture on set theory. 3.9 Axiom of Regularity is worth reading to see some more examples of what is and what isn't an element of a set otherwise you can skip this chapter.

3.3 Functions

Example 3.3.3 reviews range notation, which Tao will teach us in later chapters, in case you forgot it. Remark 3.3.5 clears up some notation for functions, notably that a sequence subscript also denotes a function. Example 3.3.9 asks why 3.3.7 asserts that all functions from the empty set to X are equal: because they have the same (empty) domain and the same range, they vacuously agree on every input.

Our first proof in this chapter is in Lemma 3.3.12, and if we can go by all the previous chapters, this will be our template for the exercises. It also doesn't make sense, so I checked the errata page and sure enough there is errata for this lemma depending what version of the book you're using.

Def 3.3.10 composition, you can think of that composition symbol as being a binary operator (function) that accepts two functions as input and outputs a single function.

3.3.14 See this tutorial and notice Tao has defined this property by its contrapositive, which you can do as it is logically equivalent. Write your own examples for this definition. If two distinct elements of A map to the same element of B, this is not injective. However if those two A elements are equal to each other then it is injective:

  • A = {a, b, a} (function input)
  • B = {1, 2} (output)
    • f(a) = 1
    • f(b) = 2
    • f(a) = 1
      • but {a} = {a}, so f(a) = f(a) and this function is injective.
  • A = {b, c}
  • B = {2}
    • f(c) = 2
    • f(b) = 2
      • but {c} doesn't equal {b} so this is not injective
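
The examples above can be restated concretely. A small sketch where a function is modeled as a dict (the helper name `is_injective` is mine, not Tao's):

```python
# A function f: A -> B as a dict; injective means no two distinct inputs
# share an output, i.e. the number of distinct outputs equals the size of A.

def is_injective(f):
    return len(set(f.values())) == len(f)

assert is_injective({"a": 1, "b": 2})        # distinct inputs, distinct outputs
assert not is_injective({"b": 2, "c": 2})    # b != c but f(b) = f(c)
```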

3.3.17 Onto functions, another tutorial and for Remark 3.3.24 here is a tutorial for bijective inverse functions

Quiz Functions

Another multiple choice quiz from Tao's analysis class.


All optional, do you get what a bijective/surjective/injective function is? Then you're good we will do all these definitions again in a calculus course.

All of these proofs have a youtube tutorial or stackexchange answer, these are common proofs first year students do.

  • Ex 3.3.1 was deleted in errata (See Tao's site)
  • Ex 3.3.2 The proof for this is in the book
  • Ex 3.3.3 The empty function is always injective since it's 'vacuously true' that f(x) = f(x') implies x = x'. It is surjective only when the codomain is also empty, since otherwise there are elements in Y that are never a value f(x), i.e. no y = f(x); it is bijective in exactly that same case, since a function must be both injective and surjective to be bijective.
  • Ex 3.3.4 The tilde hat thingy is just notation for two different functions, like f and f'. Cancellation law means we're trying to prove two functions are the same, so g = g' and f = f'. If g is injective, then g(f(x)) = g(f'(x)) implies f(x) = f'(x) by the one-to-one definition of injective functions, so f = f' (since f is g's input). Surjective is similar: g(f(x)) = g'(f(x)), and since f is surjective every y in its codomain is some f(x), so g(y) = g'(y) for every y and g = g'. Does g need to be injective for the first proof to always hold? Try corner case examples from the text: if g(n) = n\(^2\), then g(f(-1)) and g(f'(1)) can have the same output, so two different inputs to g with the same output mean we can't conclude f(x) = f'(x) as per the definition of one-to-one functions, and we can't always derive f = f'.
  • Ex 3.3.5 If g(f(x)) is injective, then g doesn't need to be injective. Consider an example A = {1}, B = {2, 3}, C = {4}, with f(1) = 2 and g(2) = g(3) = 4: the composition is still injective even though g is not. The difference from 3.3.4 is the wording; there we must show the two input functions are always equal when g is injective, so a single case where g is not injective and two distinct inputs to g give the same output breaks the cancellation, whereas here we are asked 'must g always be injective', so a single case where it need not be is enough.
  • Ex 3.3.6 Is described here see Remark 3.3.24 if you forgot what an inverse function is. Note how the inverse of f and f cancel each other out.
  • Ex 3.3.7 Similar to above and his association proof

Proof strategies

Watch this starting @ 36:00, from 41:00+ he shows how he solves problems step by step, trying small cases to understand the problem better and poking at the problem enough to finally figure out a proof.

3.4 Images and inverse images

This is only casual reading, skip the exercises and return to do them when these topics come up again.

Chapter 3.4 of Tao's Analysis I. Example 3.4.5 makes more sense if you read Example 3.4.6 too: \(f^{-1}\)({0,1,4}) is the set of everything that could result in {0,1,4} if f(x) = \(x^2\), so {-2,-1,0,1,2}. For the rest of this chapter, if it doesn't make sense on first read, search YouTube, ie: indexed sets.
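
Example 3.4.5 can be replayed directly. A minimal sketch, with `inverse_image` my own helper name rather than anything from the book:

```python
# f^{-1}(S): every x in the domain whose image f(x) lands in S.

def inverse_image(f, domain, S):
    return {x for x in domain if f(x) in S}

preimage = inverse_image(lambda x: x * x, range(-3, 4), {0, 1, 4})
assert preimage == {-2, -1, 0, 1, 2}
```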

3.5 Cartesian products

Again more casual reading, come back to do the exercises as needed.

Another topic with plenty of youtube tutorials. Definition 3.5.7 introduces notation for products; it looks complicated but all it's doing is specifying an index (i is greater than or equal to 1, and less than or equal to n) and a repeated cross product. Some of the exercise solutions are walked through on youtube. The cartesian plane itself is a cartesian product R x R, giving tuples (x, y), or (x, y, z) for R x R x R.
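
The repeated cross product is exactly what `itertools.product` computes; a small sketch over a two-element stand-in set:

```python
from itertools import product

# X x Y: all tuples (x, y) with x from X and y from Y.
pairs = list(product([0, 1], [0, 1]))
assert pairs == [(0, 0), (0, 1), (1, 0), (1, 1)]

# Repeated cross product X x X x X gives triples, 2^3 of them.
triples = set(product([0, 1], repeat=3))
assert len(triples) == 2 ** 3
```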

3.6 Cardinality of sets

Skip the exercises like in the last two chapters; we just want to be familiar with the material. Reminder: in the preface of this book, Tao himself advocates skipping the exercises, as 'set theory is not essential for analysis', but it will be essential for us to understand numerous other math topics, at which time we can come back here.

A good tutorial. Tao uses the #(A) notation for the cardinality of a finite set; the |A| notation you'll see elsewhere covers cardinality in general, including infinite sets. There exists errata for the exercises, specifically 3.6.8. The last exercise is the pigeonhole principle: if set A has greater cardinality than B, then f: A -> B can't be injective, since there would be two elements of A that map to a single element of B.
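
The pigeonhole claim can be checked exhaustively for tiny sets. A sketch (the set choices are mine) enumerating every possible f: A -> B:

```python
from itertools import product

A = [1, 2, 3]
B = ["x", "y"]

# Every function f: A -> B is a choice of one output per input.
for values in product(B, repeat=len(A)):
    f = dict(zip(A, values))
    # #(A) > #(B): some two distinct inputs always share an output,
    # so no such f can be injective.
    assert len(set(f.values())) < len(A)
```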

Integers and rational numbers

Introducing the integers

  • YouTube - Arithmetic and Geometry Math Foundations 12

Wildberger's definition is similar to Tao's, he walks through a proof of a-(b-c)=(a-b)+c

4.1 The integers

Chapter 4 of Analysis I by Tao. Definition 4.1.1 is exactly the definition from the above Wildberger foundations video. For the Lemma 4.1.3 proof, you can finish most of it yourself without reading the whole proof. On Lemma 4.1.5, Tao notes the same concerns Wildberger expressed about why they both used the 'a less b' style of definition instead of just using Lemma 4.1.5 as the definition of integers: the many different cases that would be needed. Tao mentions this definition of the integers forms a commutative ring because xy = yx.


You don't have to do the exercises, though proving equality as per A.7 for every object he creates is worth trying. This is the poor scratchwork I did

4.1.1 Proving A.7 Equality again. From Def 4.1.1 equality is defined as: a–b = c–d iff a + d = c + b, where the variables are natural numbers. Reflexive: does a–b = a–b? In this definition: a + b = a + b, so yes, since we already proved natural numbers obey the reflexive axiom. Symmetric: if a–b = c–d, i.e. a + d = c + b, then c + b = a + d, so c–d = a–b. Transitive: if a–b = c–d, and c–d = x–y, does a–b = x–y? Converting everything to their respective natural number definitions: a + d = c + b and c + y = x + d. Adding the two equations: a + d + c + y = c + b + x + d; cancel c and d from both sides (cancellation law for the naturals) and you're left with a + y = x + b, which converted back into integer notation is a–b = x–y, similar to Tao's proof of transitivity in 4.1.1.
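
Def 4.1.1 equality can be encoded directly. A sketch where `FormalInt` (my own class, not Tao's notation) stores the pair a–b and compares via a + d = c + b:

```python
# An integer a--b as a pair of naturals; equality per def 4.1.1:
# a--b = c--d iff a + d = c + b.

class FormalInt:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __eq__(self, other):
        return self.a + other.b == other.a + self.b

x, y, z = FormalInt(5, 2), FormalInt(4, 1), FormalInt(10, 7)  # all represent "3"
assert x == x                     # reflexive
assert x == y and y == x          # symmetric
assert y == z and x == z          # transitive chain
```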

4.1.2 If -(a–b) = -(a'–b') are defined as b–a = b'–a' then b + a' = b' + a. This is the integer b–a = b'–a'. Now convert a–b = a'–b' to a + b' = a' + b and notice they are the same, and since natural number construction -(a–b) = -(a'–b') if a–b = a'–b'.

4.1.3 (-1) x a = -a. To solve, use Tao's advice in remark 4.1.7 to assign a = (a–b) or some other integer notation. In lemma 4.1.3 Tao writes 1 as 1–0, so negating 1 gives 0–1, and in full integer notation: (0–1)(a–b) = ((0a + 1b)–(0b + 1a)), which is (b–a), and (b–a) is exactly -(a–b).

4.1.4 Prove the rest of 4.1.6, we already did x1 = 1x = x kind of in exercise 4.1.3, convert 1 to (1–0) and pick any arbitrary variables to represent x as (a–b) and this is easy to prove. Same with 0, use lemma 4.1.3 to convert to (0–0) so (x–y) + (0–0) = (x + 0)–(y + 0) or (x–y).

4.1.5 ab = 0 is (x–y)(r–s) = (0–0), i.e. (xr + ys)–(xs + yr) = (0–0), which by the definition of equality means xr + ys = xs + yr. The cleanest route is trichotomy (Lemma 4.1.5): every non-zero integer is either a positive natural n or a negation -n, so if a and b were both non-zero, ab would be plus or minus nm for positive naturals n and m, and by lemma 2.3.3 nm = 0 only if n = 0 or m = 0, a contradiction. Hence a = 0 or b = 0.

4.1.6 If a, b, c are integers such that ac = bc and c is non-zero, then a = b. Write (a–a')(c–c') = (b–b')(c–c') with c non-zero; we want (a–a') = (b–b'), i.e. a + b' = b + a'. Expanding: (ac + a'c')–(ac' + a'c) = (bc + b'c')–(bc' + b'c). This is ac + a'c' + bc' + b'c = b'c' + bc + a'c + ac', or factored: c(a + b') + c'(a' + b) = c(b + a') + c'(b' + a). Now we are in the exact same place as Tao's proof in Lemma 4.1.3, where he wrote that since a + b' = b + a' the two sides are equal. If (c–c') were (0–0) this would collapse to 0 = 0 rather than forcing (a–a') = (b–b'). Not a proof, but it explains why the cancellation law works and why c must be non-zero. Not really the proof Tao wanted, but it kind of works.

4.1.7 You can also do these in integer notation. If a, b, c are integers, then a > b iff a - b is a positive natural number. Ordering is defined as a = b + n for some natural n, with a not equal to b for strictly greater than (a > b). Taking a = b + n and subtracting b from both sides, we get a - b = n. If n = 0, then a - b = 0, meaning a = b, a contradiction to the definition that a and b are not equal; thus a - b = some positive n iff a > b, is how I read that exercise. The rest are similar.

4.1.8 find an example property where Axiom 2.5, as stated in the book, doesn't work for the integers. The axiom relies on the fact that P(n++) is true, so negative n can break many properties, since n++ of a negative number is another negative number (until you reach 0), unless you state like Tao does 'for all positive integers'.

Rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 13

This definition is similar to Tao's in Analysis I. For the inverse law, write your own example: the rational number 4/1 has inverse 1/4, and 4/1 x 1/4 is 4/4 or 1. A rational number is (a-b)/(c-d) according to its definition, but we write them as p/q, and in some cases just p if q = 1.

Visualizing the rational numbers

  • YouTube - Arithmetic and Geometry Math Foundations 14

The rational number line is constructed so you never end up with a zero denominator, which isn't defined. A rational number is visualized as a line through the origin [0,0], with an integer lying on a point [m,n]. Ford circles are an example of the massive complexity of the rational numbers, which is expanded upon in the next lecture.

The deep structure of the rational numbers

  • YouTube - Real numbers and limits Math Foundations 95

This is about 35m long but worth the watch if you want to see what a convex combination is, learn about rational intervals, and a more complete description of Ford Circles set up with Farey sequences. This will help us understand what a sequence is later in Tao's book. The overall takeaway of this lecture is that the smaller the interval you take on the rational line, the more complex the numbers become with enormous denominators meaning it has a very deep structure.

The next videos, #96 and #97 about Stern-Brocot trees and wedges are highly recommended and can also be found in the book Concrete Mathematics. In Lecture #97, he takes a visualization on the plane of the Stern-Brocot tree and shows it also can be used as a geometric interpretation for the rational numbers, it's kind of crazy when you see it. The 'fraction' 1/0 is also explained.

4.2 The rationals

Tao's Analysis I chapter 4.2, definition 4.2.1: 3//4 = 6//8 = -3//-4; we know from Wildberger's visualization of the rational numbers that these numbers all lie on the same line. Lemma 4.2.3: in the proof of addition, getting rid of +bb'cd is a simple case of subtracting it from both sides, cancelling it out and leaving ab'd\(^2\) = a'bd\(^2\), and he can cancel d\(^2\) from both sides because d is known to be non-zero, since it's in the denominator of a/b + c/d.

To recap we defined the natural number object, embedded them in the integers (a - b) to define integers, then embedded integers inside the rational numbers (another constructed object) where the arithmetic of the integers is consistent with the arithmetic of the rationals. The rationals are a field because of the last identity in 4.2.4 the inverse algebra law. Tao's example proof is similar to the proofs we've seen Wildberger do, convert the letters into integer // integer form, apply law, commute the results together so that they are the same.


These are pretty easy, try them. 4.2.6 in my copy must be a mistake: 'show that if x, y, z are real numbers…' should be 'rational numbers', and sure enough, checking Tao's errata page, it is a mistake. There's a mistake in exercise 4.2.1 as well, depending on which version you're using.

  • Ex 4.2.1

We have the usual exercise, show that the definition of equality for rationals is reflexive, symmetric and transitive. We go into the Appendix A.7 Equality and see that the Reflexive axiom is given any object x, we have x = x. In Definition 4.2.1 equality of two rational numbers is defined as a/b = c/d iff ad = cb. If a/b = a/b then ab = ab thus reflexive. Symmetry axiom, if x = y then y = x. If a/b = c/d, symmetric equality is c/d = a/b. By definition 4.2.1 a/b = c/d is ad = cb, and c/d = a/b is cb = ad so Symmetry axiom satisfied.

The hint for proving transitivity is to see Corollary 4.1.9, the cancellation law for integers: if ac = bc, and c is non-zero, then a = b. In A.7 the Transitive axiom states if x = y and y = z then x = z. If a/b = c/d, and c/d = e/f, then a/b = e/f; or: if ad = cb and cf = ed, then af = eb. Multiplying the first two equalities gives adcf = cbed, which we want to simplify to af = eb. Staring at this we isolate af(cd) = eb(cd), so the rest of the proof involves logical deduction to determine that c and d aren't zero so we can cancel them from both sides; there are a few ways you could prove transitivity. For example, we know that d can't be zero, since c/d wouldn't be a rational number. If c is zero, then ad = cb is ad = 0, and since d is non-zero, a must be 0; likewise cf = ed becomes 0 = ed, and again since d is non-zero, e = 0. So with a and e both possibly 0, af = eb is 0 = 0, or: 0/b = 0/f, and transitivity is proven, though in a convoluted sort of way, using deduction and making a lot of implications.
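
The cross-multiplication definition is easy to test on concrete pairs. A sketch with integer pairs (a, b) standing for a//b, and `rat_eq` my own helper name:

```python
# def 4.2.1: a/b = c/d iff ad = cb, with non-zero denominators.

def rat_eq(p, q):
    (a, b), (c, d) = p, q
    assert b != 0 and d != 0
    return a * d == c * b

r1, r2, r3 = (3, 4), (6, 8), (-3, -4)    # three names for the same rational
assert rat_eq(r1, r2) and rat_eq(r2, r3) and rat_eq(r1, r3)  # transitivity holds here
assert not rat_eq((1, 2), (2, 3))
```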

The transitivity proof is a great example of just naively writing your proofs until you eventually get better. We used Tao's hint of finding numbers that can or can't be zero, and we are getting better at using mathematical logic to deduce answers.

  • Ex. 4.2.2

Prove whatever remains to be proved in Lemma 4.2.3; let's try the negation, since the multiplication is simple. If a/b = a'/b', then ab' = a'b is the setup Tao gives us. The negation -(a/b) is defined as (-a)/b, so we need (-a)/b = (-a')/b', which by def 4.2.1 is -ab' = -a'b, and multiplying each side by -1: ab' = a'b holds. Lemma 4.2.3 multiplication: if a/b = a'/b', then (a/b * c/d) = ac/bd and (a'/b' * c/d) = a'c/b'd, so we need to prove ac(b'd) = a'c(bd) by manipulating it into the form ab' = a'b. We can just do what Tao did: cancel c and d from both sides, leaving ab' = a'b; however, we should make sure they aren't zero, using the same kind of deduction we used to prove transitivity. We know d can't be zero, since c/d wouldn't be a rational number otherwise; if c is zero, both sides are already 0 and equal. Another strategy is to multiply the original ab' = a'b by the scalar (cd) on each side, which is fine and solves the proof, provided you point to the definition in the book where a = a implies ab = ab.

  • Ex. 4.2.3

Prove the rest of Proposition 4.2.4. The first is the same as Tao's demo proof. Scratch work, you should try all these yourself they are simple:

  • x + 0 = 0 + x = x
    • a/b + 0/1 is ((a*1) + (0*b))/(b*1) or a/b
    • 0/1 + a/b is ((0*b) + (a*1))/(b*1) or a/b
  • x + (-x) = (-x) + x = 0
    • a/b + (-a)/b is (ab + -ab)/bb = 0/bb or 0
    • (-a)/b + a/b is (-ab + ab)/bb = 0/bb or 0
  • xy = yx
    • a/b*c/d is ac/bd
    • c/d*a/b is ac/bd
  • (xy)z = x(yz)
    • (a/b*c/d)e/f is ac/bd*e/f or ace/bdf
    • a/b*(c/d*e/f) is a/b*ce/df or ace/bdf
  • x1 = 1x = x
    • a/b*1/1 is a/b and also 1x as per xy = yx which we just proved
  • x(y + z) = xy + xz
    • simple task to do on paper a/b(c/d + e/f) = (cfa + aed)/bdf
    • a/b*c/d + a/b*e/f = ac/bd + ae/bf or (acbf + aebd)/bbdf and cancel one of the b's, which we know can't be zero since it's in the denominator
  • rest same as above
  • Ex 4.2.4

Proving Lemma 4.2.7 is exactly the same as Tao's proof for Lemma 4.1.5 of the integers, just replacing the integers with rational numbers and their definitions.

  • Ex 4.2.5

We've already proven propositions of order in naturals and integers, this is similar but with a/b < c/d

  • Ex 4.2.6

If you multiply both sides by -1, negating them, then 1 < 2 will end up -1 > -2, which is what should happen, since -1 is not as small a quantity as -2 if our interpretation is a number line. The proof is somewhere in Def 4.2.8: if x < y, then x - y is a negative rational number, so a/b - c/d = (-e)/f. Then come along with a negative rational z = (-g)/h and multiply both sides; we get a positive number, so xz > yz, at least in the scratch work I did until I ran out of time.

Decimal numbers

The next chapter in Tao's Analysis I uses decimals to represent tiny positive quantities (epsilon) so we should probably know what decimals are.

Wildberger has a basic introduction lecture using natural numbers that assumes you know nothing about exponents and reviews the hindu-arabic notation, shows how to convert a decimal to a fraction. The second lecture introduces rational number decimals and their visualization.

His definition: a decimal number is a rational number a/b, where b is a power of 10 (hence deca/decimal). He then shows various examples of how you can manipulate rational numbers like 1/2 to be 5/10, or 0.50; we also find out why 1/3 is not a decimal number. In binary, which is a base-2 system, the only prime factor is 2, so if you load up any language interpreter and try to add 0.1 + 0.2 you will get a surprising result, because of the way floating point is used to represent decimals.
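
You can see the surprise, and the contrast with exact rational arithmetic, in any Python session:

```python
from fractions import Fraction

# 0.1 and 0.2 have no exact binary representation, so their float sum
# is not exactly 0.3.
print(0.1 + 0.2)            # 0.30000000000000004
assert 0.1 + 0.2 != 0.3

# Exact rational arithmetic has no such problem.
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
```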

At 36:00 arithmetic w/decimals is covered, showing how the rational number and decimal number arithmetic methods are the same, so you can just stay w/rational numbers if you want, converting back to decimals at any time.

Appendix B: the decimal system

From Tao's book, Appendix B.1 The decimal representation of natural numbers: his opinion is that the decimal system is not essential to mathematics, computers have replaced it with binary/floating point, and 'we rarely use any numbers other than one-digit fractions in modern mathematical work'. Of course, to understand scientific notation we still need to know how decimals work, which we just covered with those Wildberger lectures, so this chapter is optional (we can't do it anyway before learning the real numbers). The single exercise in B.1 is interesting: 'the purpose is to demonstrate that the procedure of long addition taught to you in elementary school is actually valid'.

4.3 Absolute value and exponentiation

Def 4.3.1: you apply that def to the rational number, so if x = -3, the absolute value is -x, so -(-3), which of course makes it a positive rational number, as seen in the example of 4.3.2 where the absolute value of 3 - 5 is 2.

Def 4.3.4 that symbol is epsilon, meant to be a tiny quantity he defines as greater than 0. The example 4.3.6 if two numbers are equal, and their difference is zero, then they are 'epsilon close' for every epsilon since their difference is not bigger than any positive epsilon.

Proposition 4.3.7: recall that all these definitions are for d(x, y) or d(z, w), so x - y or z - w. In his proof he moves xz to one side, so it becomes negated (subtracted from both sides). He then manipulates |az + bx + ab| according to previous definitions to get |a||z| etc., and then uses the various propositions in 4.3.7 to rewrite them as eps|z| + delta|x| + eps(delta). This will make sense after we attempt to prove the rest of the propositions.

Rest of this chapter covers exponents and their properties.


Exercise 4.3.5 Prove that \(2^n \ge n\) for all positive integers n using induction.

  • base case: n = 0
    • \(2^0 = 1\) and since n = 0, 1 > 0.
      • but Tao specifically wrote 'positive integers' so the base case must be n = 1
    • \(2^1 = 2\) and since n = 1, 2 \(\ge\) 1
  • assumption: \(2^n \ge n\) for all positive integers n
  • successor: \(2^{n + 1} \ge n + 1\)
    • left side:
    • \(2^{n + 1}\) is \(2^n \cdot 2\) by def 4.3.9
    • \(2 \cdot 2^n\) is also \(2^n + 2^n\) by prop 4.3.10 (a)
    • now we have to show \(2^n + 2^n \ge n + 1\)
  • since \(2^n \ge n\) (pointing to our assumption), then \(2^n + 2^n \ge n + n\)
    • n is positive, so n + n is \(\ge\) n + 1
    • \(2^n + 2^n \ge n + n \ge n + 1\)
      • here we have standard transitivity proving \(2^{n + 1} \ge n + 1\)

This of course falls apart if Tao changed it to 'for all integers' instead of positive integers, or if we allowed our assumption \(2^n \ge n\) to have n = 0; then we could not conclude n + n \(\ge\) n + 1. Search around for proving inequalities on YouTube; there are dozens of demonstrations of various strategies.
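
A brute-force companion to the induction (the range is my arbitrary choice):

```python
# Check 2^n >= n for positive n; in fact 2^n >= n + 1 once n >= 1,
# which is the stronger fact the inductive step leans on.
for n in range(1, 64):
    assert 2 ** n >= n + 1
    assert 2 ** n >= n
```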

4.4 Gaps in the rational numbers

The idea here is that certain things don't exist in the rational numbers object like the square root of 2, so there are gaps where the square root of 2 should be.

Proposition 4.4.1: for every rational number x there is an integer n such that n <= x < n + 1; so sandwiched between the integers 2 and 3 are rational numbers such as 25/10 or 100/40, etc.

Proposition 4.4.3 there is always some rational you can squeeze in between two other rationals as we've already seen in Wildberger lectures. 1/800 < 1/500 < 1/100 etc. If you want put in examples for his generalized proof to better understand it like x = 2, y = 3 and z = (2 + 3)/2.
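
Proposition 4.4.3 with concrete numbers, via `fractions.Fraction`: the average of two rationals is again rational and sits strictly between them, exactly the z = (x + y)/2 from the proof.

```python
from fractions import Fraction

x, y = Fraction(1, 800), Fraction(1, 100)
z = (x + y) / 2          # z = (x + y)/2, still a rational number
assert x < z < y         # strictly squeezed in between
```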

Proposition 4.4.4, the famous proof by contradiction that there does not exist a rational number x for which \(x^2\) = 2, though Tao's is significantly longer than most and, as per the preface, this is for the reader's benefit. He rewrites (p/q)\(^2\) = 2 into p\(^2\) = 2q\(^2\) by multiplying each side by the denominator q\(^2\), since p/q squared is p/q * p/q, or (p*p)/(q*q).

Proposition 4.4.5 is basically saying for any epsilon bigger than 0, there exists a non-negative rational number x where \(x^2\) < 2 but (x + \(\epsilon\))\(^2\) > 2. Remember there is an infinite amount of rational numbers between 0 and 2, like trillion/1000 billion = 1. \(1^2\) is 1, and if epsilon is 1 so that x + \(\epsilon\) = 2, then (x + \(\epsilon\))\(^2\) is (1 + 1)(1 + 1), which is bigger than 2. He asks why n\(\epsilon\) is non-negative for every natural number n in (n\(\epsilon\))\(^2\): the parens grouping n\(\epsilon\) and squaring it obviously make it non-negative, ie (-2)\(^2\).

Tao mentions Dedekind cuts will be avoided, and that the theory of infinite decimals has problems so it will also be avoided. Wildberger has numerous lectures on both theories if you're interested; the most detailed ones are in his 'Famous Math Problems' playlist. Wildberger makes the point that you will only ever find the cherry-picked example of the square root of 2 for Dedekind cuts in any undergrad material, because of the mind-boggling complexity involved in trying to partition a set for \(\pi^2/6\) and other irrational numbers.

The infinite descent exercise: Tao has basically handed the solution to us. If you fix any natural number to start from, then by definition there are only finitely many naturals below it, so you can't choose infinitely many distinct natural numbers less than your fixed starting point; infinite descent does not exist for the naturals because you'll run out of them. For the integers you can infinitely descend toward negative infinity, and for the rationals we know you can infinitely descend with ever bigger denominators, approaching zero, or descend to negative infinity.

Some Algebra


Some optional mostly 10 minute lectures on algebra basics, such as how Pascal's array/Khayyam triangle/Yang Hui's triangle encodes the binomial theorem.

Linear Algebra

We're doing Analysis I and Linear Algebra in parallel. These sets of lectures cover Linear Algebra from a geometric approach so later when we work through Tao's Linear Algebra notes we can understand what is happening.

This series is unique in that each linear algebra topic is set up with the problem it's trying to solve: how a special case of that problem can be solved, then how you would go about creating and proving a general case. The general case is what you find in most linear algebra books; they don't build it up from scratch. Wildberger has removed most of the complexity of linear algebra, and any middle school student from grade 7 up can understand this series.

Introduction to Linear Algebra

He constructs the affine plane from scratch. For the vector (8/3, 10/3) you need to full-screen the video to see exactly where the points are. From B, you move right 2 lines plus 2/3, so \(2 + \frac{2}{3}\) which is 8/3. From there you move up 3 lines plus 1/3, so \(3 + \frac{1}{3}\) which is 10/3. Reasons for using the affine plane: you do not impose anything on it artificially like the origin (0,0) in the Cartesian plane, you can do projections easily, and its construction is simple since it only needs parallel lines.

The biggest takeaway here is what vector direction is, what a basis vector is, how you can combine basis vectors to form larger vectors, and how to translate basis vectors to suit different perspectives. The delta symbol in the exercises at the end will return in the lecture on area and the determinant.
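As a rough sketch of 'combining basis vectors to form larger vectors': the coordinates of a vector in a basis b1, b2 solve a small 2x2 system. This Python helper (names are mine, using Cramer's rule, which the ad-bc exercise anticipates) recovers them exactly:

```python
from fractions import Fraction

def coords_in_basis(v, b1, b2):
    """Solve v = x*b1 + y*b2 for (x, y) with Cramer's rule (2x2 case)."""
    det = b1[0] * b2[1] - b1[1] * b2[0]   # nonzero iff b1, b2 form a basis
    x = Fraction(v[0] * b2[1] - v[1] * b2[0], det)
    y = Fraction(b1[0] * v[1] - b1[1] * v[0], det)
    return x, y
```

For example coords_in_basis((5, 3), (2, 1), (1, 1)) gives (2, 1): two of the first basis vector plus one of the second lands on (5, 3).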

In this optional related lecture he shows how ancient people made a ruler using just parallel line geometry, which leads to modelling the rational continuum. Some more optional (short) videos: this one from 3Blue1Brown on what a vector is, and this presentation of basis vectors and what a span is. Change of basis is video 13 in the 3Blue1Brown playlist, but I find the explanation by Wildberger in the first lecture better.

Vector Arithmetic

All this vector arithmetic will come up later in other courses; at least now you will be somewhat familiar with it. Subtraction means going in the opposite direction: if you're at the head of one vector and going to another vector's head, you are subtracting. The generalized affine combination, where the coefficients add up to 1, will come up again.

The proof that the diagonals of a parallelogram bisect each other: note how quickly this can become a pile of symbols.

  • We have \(\lambda = 1 - \mu\); let \(\lambda = \mu\) in it:
    • Substituting gives \(\lambda = 1 - \lambda\), and moving \(\lambda\) over to the other side means \(2\lambda = 1\)
    • Divide each side by 2 to isolate \(\lambda\)
    • \(\lambda = \mu = 1/2\)
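You can also check the bisection claim with exact arithmetic instead of symbols; a quick sketch (vertex names are mine) showing the two diagonals of a parallelogram share their midpoint:

```python
from fractions import Fraction as F

def midpoint(p, q):
    """Midpoint of segment pq, with exact rational coordinates."""
    return tuple(F(a + b, 2) for a, b in zip(p, q))

# a parallelogram has vertices O, A, B and A + B
O, A, B = (0, 0), (4, 1), (1, 3)
AB = (A[0] + B[0], A[1] + B[1])
# diagonal O -> A+B and diagonal A -> B cross at lambda = mu = 1/2,
# i.e. they share a midpoint, so they bisect each other
```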

Around 38:00 he solves linear equations with 2 unknown variables, which we already covered in the first lecture.

Center of mass and barycentric coordinates

Since we just did physics in the software workshop (CS019 differential systems) let's learn about Archimedes' principle of the lever, accelerations in the grid plane, and Newton's laws. This will come up again; note the labs here for a course in applied math, specifically Lab Manual vol 2, writing Python functions that perform barycentric Lagrange interpolation. This is used to study air pollution by approximating graphs.
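For the curious, barycentric Lagrange interpolation itself is short; a hedged sketch (function names are mine, using the second barycentric form with exact rationals):

```python
from fractions import Fraction

def bary_weights(xs):
    """Weights w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes xs."""
    ws = []
    for j, xj in enumerate(xs):
        p = Fraction(1)
        for k, xk in enumerate(xs):
            if k != j:
                p *= xj - xk
        ws.append(Fraction(1) / p)
    return ws

def bary_interp(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    ws = bary_weights(xs)
    num = den = Fraction(0)
    for xj, yj, wj in zip(xs, ys, ws):
        if x == xj:               # exactly on a node: return its value
            return Fraction(yj)
        t = wj / (x - xj)
        num += t * yj
        den += t
    return num / den
```

Interpolating y = x² through the nodes 0, 1, 2 and evaluating at 3 gives exactly 9, as the quadratic should.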

Area and volume (determinants)

The fourth lecture in the series leading up to determinants. You probably recognize ad-bc from the first lecture, where it was one of the exercises. He uses different notation for bi-vectors since a bi-vector represents the combination of vectors a and b. For the torque example, you have probably seen somebody use a long pipe over a wrench in order to increase the torque without needing increased force; now you know why increasing the area of the wrench increases the torque. All 3 physics examples can be modelled with a parallelogram/bi-vector. The bi-vector rules (e.g. adding a multiple of one vector to the other doesn't change the bi-vector) will make sense around 35:50 when he produces a picture proof showing how moving things doesn't change the bi-vector. At around 38:00 we once again get the formula ad-bc: the signed area is a scalar attached to the bi-vector of the two basis vectors. 45:50 introduces three-dimensional affine space.

Now that you've seen this model of area using bi-vectors, see this 10 minute presentation by 3Blue1Brown which will fill in some more details, like why the signed area can be negative, if you didn't already figure it out. The exercises are conceptually easy; you don't have to do all of them, but the whole point of linear algebra is to take these geometric concepts and do them with algebra.
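The bi-vector rules are tiny to state in code; a sketch (my own helper names) of the signed area ad - bc and the two properties from the lecture, namely that swapping the vectors flips the sign and shearing changes nothing:

```python
def signed_area(u, v):
    """Signed area of the parallelogram (bi-vector) spanned by u and v."""
    return u[0] * v[1] - u[1] * v[0]      # the ad - bc formula

def shear(u, v, t):
    """Add t copies of u to v: slides one side along the other,
    which moves the parallelogram but keeps its area."""
    return (v[0] + t * u[0], v[1] + t * u[1])
```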

Change of coordinates and determinants

Lecture 5. Now we get a full intro to matrix and vector notation and matrix operations; this is normally where linear algebra courses start. I like that he built up the change of coordinates first, so we aren't just presented with a meaningless array of numbers.

Exercise 5.1: make a B matrix with entries e, f, g, h, multiply AB, and take the resulting matrix's determinant. The pile of letters should equal det(A) \(\times\) det(B) after you do some commutative rearranging; this is a 10 minute exercise.
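The identity det(AB) = det(A) det(B) is easy to spot-check numerically before grinding through the letters; a throwaway sketch with my own helper names:

```python
def mat2_mul(A, B):
    """Product of two 2x2 matrices given as pairs of row tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def det2(M):
    """The ad - bc determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c
```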

2x2 matrices I

Lecture 6. I appreciate that he starts all these lectures with a review, which makes them mostly self-contained. This is something I haven't seen before: explaining all the arithmetic of 2x2 and higher matrices using just a 1x1 matrix. The inverse of a matrix is now simple to understand. He walks through some proofs of matrix laws like the associative law (AB)C = A(BC). The standard Cartesian coordinate system is used here for examples of transformations like dilations and shears. A dilation is growing or shrinking, like your eye being dilated.
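To keep the 2x2 inverse from being a mystery formula, here is the whole of it as a sketch (names are mine): swap the diagonal, negate the off-diagonal, divide everything by the determinant.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2x2 matrix ((a, b), (c, d)), with exact rationals."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero: not invertible")
    return ((Fraction(d, det), Fraction(-b, det)),
            (Fraction(-c, det), Fraction(a, det)))
```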

The end of this lecture: how do you visualize an arbitrary transformation? The next lecture expands on this intuition.

2x2 matrices II

Lecture 7. This tells you how to visualize transformations: T(e1) and T(e2) are the columns of a matrix, so you can look at the columns and see how the basis is being transformed. He briefly introduces an alternate framework, rational trig, so we don't have to use those sin/cos functions to figure out a rotation. There is a book and an entire playlist for rational trig; if you want to learn it, write your own software library for rational trig as you go.

The end of the lecture introduces non-linear approximations, using differential calculus, to set up doing calculus in higher dimensions. In the exercises you become 3Blue1Brown, able to visualize transformations yourself.

Inverting 3x3 Matrices

Lecture 8. Three-dimensional linear algebra. Pretty good setup explaining the inverse of a 3x3 matrix; most books just give you a formula. We find out where the 1/Δ factor in the inverse comes from. Transpose is also explained better than I've seen before. Reminder that these really complicated introductions all have an off-the-shelf formula, but he's showing us how they are derived fully so you understand the formula. 3x3 matrix multiplication is explained and is pretty simple; he explains why B(A(vec)) is the same as (BA)vec. If you're also doing the AI workshop, in 15-388 it's claimed B(A(vec)) is preferable to (BA)vec in terms of running time.
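The 15-388 claim about B(A(vec)) versus (BA)vec is easy to see in a sketch (helper names are mine): two matrix-vector products cost O(n²) each, while forming BA first costs O(n³), and associativity guarantees both give the same answer.

```python
def mat_vec(M, v):
    """Matrix times vector, rows of M dotted with v: O(n^2) work."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mat(A, B):
    """Matrix times matrix: O(n^3) work, done column by column of B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]
```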

Real Numbers

This is a pretty good chapter in Tao's book, probably the clearest treatment I've ever seen of exactly what the construction of the real numbers is: using Cauchy sequences, but without using limits to define them or other circular definitions. Tao invents some concepts here for the sole purpose of staging the definitions, then tosses them away when we learn limits. Why do we need yet another number object? Remember the supposed gaps in the rational numbers? This new object is supposed to fix the gaps. There are many, many more number objects, like the complex numbers.

What exactly is a limit

  • YouTube - Real numbers and limits Math Foundations 106

Watch the first 20 minutes of this lecture and you'll see what the notion of an epsilon displacement is. Note that Tao will define all these things for us later, but at least you'll have an idea of what the epsilon of a sequence is. The rest of the lecture, covering a different definition of limits, is excellent but optional.

2D geometric interpretation of Cauchy sequences

  • YouTube - Real numbers and limits Math Foundations 94

Watch from 2:20 to 16:00, he shows the 2D geometric interpretation of Cauchy sequences, and how they are interpreted as 1D on the real line, the idea of 'converging sequences'.

Real numbers and Cauchy sequences I

  • YouTube - Real numbers and limits Math Foundations 111

He explains the triangle inequality around 8 minutes in; we just read about this in chapter 4.3 of Analysis I (and it is one of the exercises).

Real numbers and Cauchy sequences II

  • YouTube - Real numbers and limits Math Foundations 112

An excellent lecture on why we even need these complex limit and real-number definitions in the first place, showing how limits make many subjects in analysis much easier. Archimedes' bracketed measurement is covered, from when he tried to figure out the approximation of pi. We also get an introduction to transcendental functions as geometric series, something we will see in detail in Tao's Analysis II. The next lecture in the series is optional, but Wildberger presents an alternative definition of real numbers, using an Archimedean style of nested intervals where equality, algebra laws, order etc. are all trivial to prove.

5.1 Cauchy sequences

Wildberger has a good lecture introducing what a sequence is here.

Def 5.1.1: in the notation, outside the bracket n = the starting index, and the infinity symbol is how far the index should go, which is forever because we're now doing continuous mathematics.

Example 5.1.2 reminds us of Def 4.3.4: d(x,y) = |x - y| \(\le \epsilon\), so if x and y are both 2, then 2 - 2 = 0 and every single epsilon is valid since |x - y| will always be less than epsilon.

The definition 5.1.3 of epsilon-closeness means two adjacent entries, like indexes 1 and 2 in our array below, when subtracted are less than or equal to the epsilon we declare as being epsilon-close: |1.4 - 1.41|. The next pair of adjacent entries is indexes 2 and 3, so |1.41 - 1.414|. Every time we subtract two adjacent entries their displacement or 'absolute value distance' should get closer and closer, meaning a smaller epsilon.

Let's do an example:

  • \(a_n\) = 1.4, 1.41, 1.414, 1.4142
    • |index 1 - index 2| = 0.01
    • |index 2 - index 3| = 0.004
    • |index 3 - index 4| = 0.0002

This sequence is 0.01-close; after index 2 it's 0.004-close, and after index 3 it's 0.0002-close.
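The shrinking displacements can be computed mechanically; a quick sketch:

```python
# successive displacements |a_n - a_{n+1}| of a sequence creeping
# towards the square root of 2
seq = [1.4, 1.41, 1.414, 1.4142]
diffs = [round(abs(a - b), 10) for a, b in zip(seq, seq[1:])]
# the displacements shrink: 0.01, 0.004, 0.0002
```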

Def 5.1.6: if you watched the Wildberger lectures on What Exactly is a Limit then you can see what Tao is setting up here with epsilon-steadiness; there is some index in the sequence where the entries past that index are epsilon-steady. Example 5.1.7: for the sequence 10, 0, 0, 0, ..., d(10, 0) is 10, so this sequence is not eps-steady for any epsilon less than 10, but if we choose N past the first entry, all of the remaining sequence is 0, 0, 0, ... and d(0, 0) = 0, so per Def 4.3.4 it will always be less than any epsilon, i.e. 'eventually eps-steady for every eps > 0'.
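Eventual epsilon-steadiness can be checked on a finite prefix of a sequence; a sketch of the idea in code (function name is mine), mirroring Example 5.1.7's sequence 10, 0, 0, 0, ...:

```python
def eventually_steady_from(seq, N, eps):
    """Are all pairs a_j, a_k with j, k >= N within eps of each other?
    (checks a finite prefix, in the spirit of Tao Def 5.1.3/5.1.6)"""
    tail = seq[N:]
    return all(abs(x - y) <= eps for x in tail for y in tail)
```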

Def 5.1.8: if you saw the above Wildberger lectures on Cauchy sequences, this is defining convergence; any sequence that eventually becomes epsilon-steady is a Cauchy sequence. Example 5.1.10: depending on your edition there's errata; in mine (3rd edition Springer from libgen) the example makes no sense.

Prop 5.1.11 proof: he cites the fact that there always exists some rational number you can squeeze in between numbers and chooses 1/N, noting that no beginner would come up with this proof and it is merely staging in preparation for learning what a limit is. Later, in Example 5.1.13, he notes Prop 4.4.1 can again be used to prove the infinite sequence 1, -2, 3, -4, ... has no bound. I would guess because of the 4.4.1 statement that 'there exists a natural number N such that N > x for any rational x', i.e. no rational bounds all the natural numbers, so an infinitely increasing and decreasing sequence can't be bounded.

Exercise 5.1.1: since we declared it a Cauchy sequence, you can point to the definition of Cauchy sequences as objects that eventually become epsilon-steady. Lemma 5.1.14 proves the finite part of a Cauchy sequence, up to where it becomes epsilon-steady, is bounded, and the epsilon-steady tail is obviously bounded, since subtracting any two entries always gives less than epsilon, whether 1-steady, 2-steady, or 1-billion-steady. Both parts are bounded, ergo all Cauchy sequences are bounded, using his definition of epsilon-steadiness.

There is a lecture here if you're interested in more about the proofs used in this chapter.

5.2 Equivalent Cauchy sequences

Prop 5.2.8 is the proof that 0.9999… = 1, and again Prop 4.4.1 is brought out to claim we can always choose a rational number to make 2/N \(\le\) epsilon. If you follow this proof:

  • \((1 + 10^{-n}) - (1 - 10^{-n})\)
  • \(10^{-n} + 10^{-n} = 2 \cdot 10^{-n}\)
  • \(10^{-N} \le 1/N\) because of the chapter on exponents (\(10^N \ge N\)); if N = 3 then \(10^{-3}\) is 1/1000, which is smaller than 1/3
  • Now choose an N that satisfies the requirement that \(2/N \le \epsilon\) and it will automatically be true that it also satisfies \(2 \cdot 10^{-N} \le \epsilon\)
    • Prop 4.4.1: you can always choose some rational to shove in there that will satisfy the requirement
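The gap in the 0.9999… = 1 proof can be computed exactly; a sketch with Python's Fraction (names are mine):

```python
from fractions import Fraction

def gap(n):
    """d((1 + 10^-n), (1 - 10^-n)) = 2 * 10^-n, computed exactly."""
    return (1 + Fraction(1, 10**n)) - (1 - Fraction(1, 10**n))

# given eps, pick N with 2/N <= eps (Prop 4.4.1); since 10^-N <= 1/N,
# that same N also gives 2 * 10^-N <= eps, e.g. eps = 1/100 with N = 200
```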

Only two exercises here, simple point-to-definition ones: pretend you are a lawyer writing an argument and convince an invisible jury, through explanation, why it's true that for equivalent sequences of rationals, one is a Cauchy sequence if and only if the other is.

5.3 The construction of the real numbers

Tao's is one of the few books anywhere that actually defines the real numbers. Just like understanding the constructions of the rationals and integers helped us understand their operations better, understanding the construction of the reals should help us better understand crazy abstractions like \(x^e\).

Def 5.3.1: the formal limit of a Cauchy sequence, meaning the number the sequence is converging towards, is a real number. Every rational number is also a real number; for example, if we choose 5, then a Cauchy sequence converging to 5 like 4.99999… is bounded by 5 and defines a real number. The notation is: x is a real number if x = the formal limit of a Cauchy sequence \(a_n\), as the index n goes to infinity, so \(a_1, a_2, a_3, \ldots\)

The rest of the chapter covers the same definitions we've already seen, except now we are working with a different object; but since its construction is entirely made up of rational numbers in a sequence, almost all the laws are the same. Division by zero not being allowed for real numbers is explained using the sequence 0.1, 0.01, 0.001, ..., which is converging towards 0: its inverse 1/0.1, 1/0.01, 1/0.001, ... is 10, 100, 1000, ..., an infinite unbounded sequence, therefore not a Cauchy sequence, therefore not a real number, so division by zero is undefined.
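The division-by-zero argument is concrete enough to tabulate; a sketch:

```python
from fractions import Fraction

# a sequence converging to 0: 0.1, 0.01, 0.001, ...
seq = [Fraction(1, 10**n) for n in range(1, 6)]
# ...whose inverses blow up without bound: 10, 100, 1000, ...
# so they cannot form a Cauchy sequence, hence no real number 1/0
inverses = [1 / x for x in seq]
```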

The #1 takeaway for this chapter is that the real numbers are all limits. I am just casually reading the proofs in this chapter; Tao has simplified them for us with his epsilon-close 4.3.7 propositions and his definition of equivalent Cauchy sequences. The proof of Lemma 5.3.6 was already explained by Wildberger here, if you're wondering what epsilon/2 means.

Looking at the exercises, I'm guessing that for 5.3.5 we show the reciprocal sequence of 1/n is unbounded, so we must be working with a sequence not bounded away from zero. In Prop 5.1.11 Tao proved the 1/n sequence is a Cauchy sequence, and since all Cauchy sequences are bounded you can also show its limit is 0. There are worked-out solutions for this chapter here, but you will never need to know how to prove any of these definitions.

5.4 Ordering the reals

For me it was interesting to go back to prior definitions and see how they work when cast into this Cauchy sequence abstraction. Prop 5.4.8: if 5 > 4 then 1/5 < 1/4.

Prop 5.4.9: 'non-negative reals' means 0 or positive, while the positive reals do not include 0. 'Closed' I take to mean a set is closed iff the operation on any two elements of that set produces another element of the same set; in other words, adding two non-negative real numbers produces another non-negative real. The explanation of the proof is in the comments here, but Tao tells us we'll see this more clearly in future chapters on closures.

Prop 5.4.14 we already know from previous proofs and Wildberger lectures that there is an infinite amount of rational numbers you can squeeze in between two other numbers.

5.5 The least upper bound property

Example 5.5.7: the empty set doesn't have a least upper bound, because every number vacuously acts as an upper bound for nothing, and by prior definitions you can always find a smaller rational (the rationals are contained in the reals), like 1/trillion, 1/septillion, etc.

The proof of Theorem 5.5.9 is huge, requiring going back to look at the Archimedean property again and some exercises. Wildberger has broken down this proof showing the gist of what is going on, but the last part of the proof, that the upper bound S is less than or equal to every upper bound of E, is easy to follow.

Remark 5.5.11: we get the answer to 'can you add negative infinity to positive infinity and get zero', which is no.

The proof of Prop 5.5.12 makes sense until you get to that inequality, wondering where he got \(x^2 + 5\epsilon\). It comes from expanding \((x + \epsilon)^2 = x^2 + 2x\epsilon + \epsilon^2\): since \(x \le 2\) and \(\epsilon \le 1\), we have \(2x\epsilon + \epsilon^2 \le 4\epsilon + \epsilon = 5\epsilon\) (squaring a really small epsilon makes it even smaller). He derives a double contradiction, that \(x^2\) can't be bigger than 2 and can't be less than 2, so \(x^2\) must equal 2, and magically we have a number that solves \(x^2 = 2\) which is not rational.

Exercise 5.5.1: you can type 'proof of infimum' into YouTube and get a dozen tutorials, or you can go through Prop 5.5.8 and Theorem 5.5.9 and change everything to prove a greatest lower bound, which gives some more experience with the Archimedean property that all these proofs seem to rely on. For the rest of the exercises, 'draw a picture': think of Wildberger with his within-epsilon/2 picture in earlier lectures.

5.6 Real number exponents I

Finally we learn square roots. Depending on your copy of Tao's book there's some minor errata here to look up. Note in the proof of Lemma 5.6.8 he's using equivalence replacement: if \(y^a = x^{1/b'}\) then we can replace one with the other at any occurrence, such as \((y^{a})^{a'}\) = \((x^{1/b'})^{a'}\). Anytime in the future when you see some complicated nested square roots, you can come back to this chapter and understand them.

Exercises 5.6.2 and 5.6.3 are simple and worth doing, using replacement to manipulate exponent laws. Exercise 5.6.3: that's the square root of \(x^2\), so x; using Lemma 5.6.6(g) or Lemma 5.6.9(b), \((x^{2/1})^{1/2}\) is \(x^{2/1 \cdot 1/2}\), which is \(x^1\) or x.

Trigonometry Boot Camp

Before we do trig, let's learn what \(\pi\) is. Wildberger talks about the history of pi: Archimedes' interval for pi, Newton's formula, a hexadecimal version of pi, and his own formula. At the end of this lecture he claims pi is some kind of complex object we still don't understand. At this stage we only need to know it's a constant, with the applied approximations 22/7 or 355/113.

Let's take the Brown University trig boot camp. There is a crash course lecture. I assume you know the basic gist of what an angle is, which will be better defined when we take calculus.

  • Radians: you could even make up your own notation and define dblpi = 6.283…, and then 360deg is dblpi, 90deg is dblpi/4, 180deg is dblpi/2, etc.
  • We really only need to know sine and cosine functions, all others are derived from them
  • In the scaling of triangles example where \(\frac{1}{\sqrt 2}\) becomes \(\frac{\sqrt 2}{2}\) they are rationalizing the denominator:
    • \(\frac{1}{\sqrt 2}\cdot\frac{\sqrt 2}{\sqrt 2}\) is \(\frac{\sqrt 2}{\sqrt 2 \cdot \sqrt 2}\) which is \(\frac{\sqrt 2}{\sqrt 4}\) or \(\frac{\sqrt 2}{2}\)
  • You can download the slides to keep a copy of the scaled triangles
  • For the reflections, think about subtracting 1/6 from 1/1: you get 5/6, so 5pi/6 is pi (180 deg) minus 1pi/6, and a negative direction means we are moving clockwise, so move to pi then subtract pi/6 in the other direction.
  • At the end of the lecture we finally discover why sin() and cos() are wave functions: each is a function that moves around the unit circle. Tan() is the slope, 'rise over run'.
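A sketch of the made-up dblpi notation from the first bullet, just to see the conversions work (Python's math module):

```python
import math

dblpi = 2 * math.pi            # one full turn, approx 6.283...

def deg_to_rad(deg):
    """360 degrees is dblpi (one full turn), so scale by deg/360."""
    return deg / 360 * dblpi
```

deg_to_rad(90) is dblpi/4, i.e. pi/2, and math.cos(deg_to_rad(60)) comes out 0.5, matching the scaled triangles from the slides.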

Summary: there exist 3 triangles whose measurements we should know, and we scale them down since this is a unit circle with maximum x and y values of 1. Then we superimpose the triangles in the pi/2 (90deg) angle space on the unit circle in such a way that we have known values of cos()/sin() and the other trig functions, because we can derive those other functions from cosine and sine. We reflect these triangles into the other areas of the circle so we can get x/y values there as well, rotating around the circle and adding up all the angles to indicate how much we rotated counter-clockwise (positive) or clockwise (negative). A reflection is demonstrated here: if f(x) = sin(x), cos(x) or tan(x), then the reflection is y = -f(x).

Try the worksheet; you can actually draw these out and count rotations if you want, or use this radian-to-angle picture from Wikipedia. For the equations: if one side is squared, then you need to take the square root of both sides, so it's (not-squared = value) instead of (squared thing = value-when-squared). If you're stuck, watch the video that walks through all the solutions, then do the worksheet again until you can finish it without needing to look at the video walkthrough.

How I solved these: subtract 1 pi if it's bigger than 2pi, or 1/2 pi if smaller, and see what you get; look at the 3 triangles to see if they have a corresponding value for sin/cos. If not, subtract pi/3, pi/6, etc. If you're wondering why we are using radians instead of degrees, it is explained here. Wildberger's rational trigonometry uses a turn, so pi/2 is 1/4, pi is 1/2, and 2pi is 1. Angles are replaced by 'the spread' of vectors.

Linear Algebra cont.

3D affine geometry

We're watching this lecture: Three dimensional affine geometry | Wild Linear Algebra 9. He actually has a physical model of 3D affine space and shows how it projects to 2D. 3D (x, y, z) coordinates are explained with an interesting question: 'how do we model space?'. \(A^3\) space and \(V^3\) space are defined, and intersecting and parallel planes are illustrated.

Equations of lines and planes in 3D

As is typical with Wildberger lectures, this goes far into more advanced material, like Mobius bands to model all the lines in the plane; nobody ever showed me this in my linear algebra classes.

Lecture 10. Wildberger goes through different equations for lines: function (calculus), Cartesian, and parametric. His example of cutting the Mobius band without disconnecting all the lines reminds me of a lecture from Erik Demaine's advanced data structures class and his fascination with origami. We get a new operation, dmin(), when he finds the 'determinant of minors'; you can see how easy it would be to write these operations as a program, and many optimized libraries already exist for doing so. We won't be doing this by hand; this is to understand what those libraries do.

At 50:32 you can pause the lecture and easily calculate the dmin yourself for practice; in fact, whenever he does some operation like finding the determinant, see if you can do it yourself first.

Applications of 3x3 matrices

Lecture 11, the algebra of 3x3 matrices. A diagonal matrix with a constant diagonal is a model or copy of the field of rational numbers; every number has a constant diagonal matrix. He goes through various transformations like reflections, projections and rotations in 3D. He shows the standard cos/sin rotation, then rational trigonometry rotations, which make rotations easier to manipulate while we are just learning; the basic rotations using sin/cos are similar. Generalized reflections and projections are given as examples. You don't have to memorize everything here, just know what's happening: a line can project or reflect itself onto the plane, and see if you can figure out what the matrices are doing, as they all have patterns. From the perpendicularity definition, now you know why ax + by + cz = 0 is the plane equation. There are only two exercises; this was a good lecture of material that's usually not described at all in linear algebra texts.

Finish this playlist

You can either finish this at your leisure or wait until we do the Putnam seminar and watch them when problems come up for linear algebra.

Po-Shen Loh

Let's see what CMU professor Po-Shen Loh is doing these days. He's the current national coach of the US olympiad team, which won both the 2018 and 2019 competitions (he even wins the math competitions for former medalists), so surely there will be something interesting for us in his content.

Linear Algebra Explained

This is a professional development seminar for teachers given by Prof Loh on linear algebra, specifically why the formula for matrix multiplication is constructed the way it is, and how to demystify the determinant. I audited them to see if we could use them here; the determinant lecture is definitely worth your time.

Matrix multiplication

  • Session 1 of 3 - Why is matrix multiplication done in 'strange' order?

He begins talking about linear functions, which we know are functions that grow linearly, using function composition as a model, so g(f(x)). He writes his matrices inside a full box/rectangle, something he says he was taught in university to distinguish regular variables from matrices assigned to variables: AB is actually matrix A \(\times\) matrix B, so he draws a box around each to 'see its shape better'.

Around 1:06:00 he traces a variable from one function composition to another, distributes the variable at the end, and shows that the multiplication you're doing is exactly the same procedure as matrix multiplication; this part is worth watching if you're interested.
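That tracing argument fits in a few lines; a sketch (my own names) showing that composing two linear maps g(f(v)) performs exactly the matrix multiplication procedure:

```python
def linear_map(M):
    """Turn a 2x2 matrix into the linear function v -> M v it represents."""
    (a, b), (c, d) = M
    return lambda v: (a * v[0] + b * v[1], c * v[0] + d * v[1])

def mul(A, B):
    """Matrix product AB, defined so composition matches: (AB)v = A(B(v))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))
```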

At 1:21:00 he describes the tech setup he's using for these videos: a Lenovo laptop w/touchscreen he writes on, a green screen, and OBS Studio (free) to combine it all together and broadcast to YouTube. If you're interested you can see the camera (looks like a Blackmagic URSA Broadcast or similar) and the huge lights he's using, plus he's filming in what looks like a closet to drown out noise and sunlight. He says everything runs on Linux and he customized the touchscreen writing software since the code is open source.

At 1:43:00 he demonstrates how Fibonacci sequences can also show why matrix multiplication is associative, plus a method to rapidly find any Fibonacci index using an order of magnitude estimate. He claims this method is superior to the typical golden ratio method because you're no longer carrying around square roots of 5 and can get exact solutions.
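A sketch of the exact-Fibonacci idea (my implementation, via fast powers of [[1, 1], [1, 0]]; no square roots of 5 anywhere):

```python
def mat_mul(A, B):
    """2x2 integer matrix product."""
    return ((A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]),
            (A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]))

def fib(n):
    """F(n) by binary exponentiation of Q = [[1, 1], [1, 0]]:
    Q^n = [[F(n+1), F(n)], [F(n), F(n-1)]], exact for any n."""
    result = ((1, 0), (0, 1))      # identity matrix
    base = ((1, 1), (1, 0))
    while n:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]
```

The repeated squaring is where associativity earns its keep: regrouping the product Q·Q·…·Q freely is what makes the fast exponentiation valid.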

He ends with: 'some students ask me, in this modern day when there is big data and all these algorithms (machine learning), what kind of math will be the most useful for them to learn; my advice is linear algebra because it's convenient for modeling these things with a linear approximation'. He claims he had to take linear algebra twice in university to understand it, because what he was taught in high school about solving systems of equations was totally different from the abstract linear transformations in university.


I couldn't find session 2. The idea of this lecture is how teachers can demystify determinants for students, pointing out that (signed) area and the determinant are the same thing. If you can, watch the whole lecture.

Prof Loh models the determinant as the area of parallelograms, which we already know from the Wildberger lectures. There is a very long setup of 2x2 row operations getting areas of parallelograms, and the determinant formula ad - bc just pops out of the last row operation. Signed area is then talked about: negating it means 'flipping' the area, reflecting it across the origin.

He shows a student hack he saw, at 1:37:10 or so, for figuring out the determinant of a 3x3 matrix where you add on two extra columns to diagonally group and multiply, which is useless for 4x4 or NxN matrices. He explains that the method he learned in school and still uses is the same method Wildberger taught us: blocking a row and column, taking the remaining 2x2 determinants and alternating their signs to get a(ei-fh) - b(di-fg) + c(dh-eg). He says this method is scalable and works for all dimensions.

From about 1:44:00 he shows how to explain to students the standard 3x3 det() formula: aei + bfg + cdh - ceg - afh - bdi. To find it himself, he tested various matrices using linear functions, holding some entries in a vector fixed and letting a few things vary, basically testing it like you would a program. Scaling a variable down to 0 means the area should be zero too, so there can't be any constants left floating around the formula or you will get a nonsensical area. In the Q&A afterwards he clarifies this.
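You can run Loh's 'test it like a program' experiment yourself; a sketch (helper names are mine) that scales one row and watches the determinant scale linearly, hitting zero when the row does:

```python
def det3(M):
    """The standard 3x3 formula: aei + bfg + cdh - ceg - afh - bdi."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a*e*i + b*f*g + c*d*h - c*e*g - a*f*h - b*d*i

def scale_row(M, r, t):
    """Multiply row r of M by the scalar t."""
    return tuple(tuple(t * x for x in row) if j == r else row
                 for j, row in enumerate(M))
```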

He then takes every permutation of a row and shows that any term multiplying two entries from a single row must also disappear from the determinant formula, or else scaling breaks: such a term would multiply the row's scalar twice.

Products from the same column break the formula as well. Recall that a 3x3 matrix is essentially 3 row vectors; if you multiply two entries from the same column, then you're exploding one coordinate of two vectors while the rest stay constant, so you are stretching the signed area.

Finally, why are there alternating signs? Take a standard abcdefghi 3x3 matrix and the 3x3 determinant formula: aei + bfg + cdh - ceg - afh - bdi. Make a matrix for each term using 1's and zeros. So aei will be a diagonal matrix of 1's and the rest zeros:

aei \(\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}\)

Compare the negative terms' matrices (afh, bdi, and ceg) to the diagonal matrix aei. Notice that you are flipping two of the 3 vector rows: for example, starting from aei, exchanging the rows holding e and f one time gives afh. If you swap the rows holding a and c in aei then you get ceg, another negative term in the determinant formula. In order to convert aei to a positive term like bfg, you need to flip twice. So it's revealed that the negative matrices in the det() formula are the ones you reach from the aei diagonal matrix with an odd number of flips, and the positive ones with an even number.
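The odd/even flip rule is exactly permutation parity; a sketch of the Leibniz form of the determinant (names are mine), where each of the six 3x3 terms is a permutation signed by how many flips it takes:

```python
from itertools import permutations

def parity(p):
    """+1 for an even permutation, -1 for odd (count the inversions)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Leibniz formula: signed sum over permutations; for 3x3 this
    expands to exactly aei + bfg + cdh - ceg - afh - bdi."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        term = parity(p)
        for i, j in enumerate(p):
            term *= M[i][j]   # one entry from each row and each column
        total += term
    return total
```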

At 2:16:30 he covers questions and gives more examples of single variables breaking the formula, and column stretching.

The most interesting part of this lecture is how he went about investigating the determinant formula: poking at it and experimenting to see what happened when he broke it, modeling the formula using linear functions, which is how you learn math, from what we've seen already in Tao's blog posts and the book How to Think Like a Mathematician. In a way this is also how you can learn some software you find in the wild with no documentation: disable some things and see what breaks. In other words, poke at it to discover its behavior.

Po-Shen Loh's advice

His advice is that we should concentrate on learning linear algebra, and that the best way to learn math is by doing old competition problems. He claims math is the easiest of all subjects because 'there is little to memorize', meaning you can always derive things yourself using mathematical logic. In other interviews and videos on his channel, he claimed that when he was first invited to a math olympiad seminar as a high school student he wasn't selected, so the second year he was invited he decided to go for broke and tried to prove everything in the seminar using the most unusual methods he could think of. It worked: he was selected and won a silver medal despite having little competition experience. Maybe there is something to his strategy of just winging it and seeing what you can prove naively yourself.


We will relearn this in a calculus class and can return to this chapter for more background. Everything in Calculus is a limit, like the derivative.

6.1 Convergence and limit laws

Returning to Tao's Analysis I, let's learn what a limit is after using formal limits for all of chapter 5. Think of an infinite convergent sequence such as 1.91, 1.992, 1.9993, … converging to 2, and imagine you have stuck it into an infinite array with every value indexed. This sequence does not end and goes on forever getting closer to the least upper bound. Every time you subtract the limit from the next sequence entry you get a new, smaller epsilon if it is a converging sequence. Try this yourself: subtract (1.9 − 2), (1.99 − 2), (1.999999 − 2) and take the absolute value. If you choose an arbitrary epsilon of say 1, then every entry in the remaining infinite sequence will be within distance 1 of the limit if the sequence is converging, which is what the definition means by 'every epsilon > 0'. The further you go down the infinite sequence the smaller this epsilon can be made, so that 'for all positive epsilon' there is some index in this array where, if you pick any arbitrary epsilon like 0.0000000001, every future index will contain a value within that epsilon of the limit if it is converging to a limit.
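You can make this epsilon-hunting concrete. A toy sketch (my own example sequence, not Tao's): \(a_n = 2 - 10^{-n}\) converges to 2, and for each epsilon we search for the index N past which every entry stays within epsilon of the limit.

```python
# Sketch of Def 6.1.5: for a_n = 2 - 10^-n converging to 2, find, for a
# given epsilon, an index N past which every later entry is within
# epsilon of the limit.
def a(n):
    return 2 - 10**-n

def find_N(eps, limit=2, max_n=100):
    # return the first index N where all later entries are within eps
    for N in range(1, max_n):
        if all(abs(a(n) - limit) <= eps for n in range(N, max_n)):
            return N
    return None

for eps in [1, 0.05, 2e-9]:
    print(eps, find_N(eps))  # the smaller the epsilon, the deeper the index
```

Notice the shape of the definition in the code: one N works for all later n, and shrinking epsilon just pushes N further out.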

Prop 6.1.7: notice he has assigned epsilon the value |L−L'|/3, which is why d(L,L') \(\le\) 2epsilon becomes d(L,L') \(\le\) 2|L−L'|/3. I somehow missed that assignment and spent a few minutes wondering what was going on. He wanted a tiny displacement, but why divide by 3 instead of 2? Dividing by 3 makes 2|L−L'|/3 strictly less than |L−L'|, giving a clean contradiction, whereas dividing by 2 would only produce equality.

In Remark 6.1.9 we throw out the starting index, I also noticed he changed the index type from naturals to integers in these definitions.

These exercises can wait until we do a calculus course

Ex 6.1.1 assume we're using induction because Tao notes these are natural numbers. We want to show \(a_{n+i} > a_n\) for all natural numbers i \(\ge\) 1. If i = 1 the base case is satisfied, since \(a_{n+1} > a_n\). Suppose \(a_{n+i} > a_n\); then for the successor of i we want \(a_{n+(i++)} > a_n\). By earlier propositions of addition I'm too lazy to cite, \(a_{n+(i++)}\) can be rewritten as \(a_{(n+i)++}\) or \(a_{(n++)+i}\), and since n++ is greater than n by the Peano axioms we're good. If m > n, then we can write m = n + i for some natural i \(\ge\) 1 and by the above terribly lazy induction proof \(a_m > a_n\) by transitivity. All we are doing here is defining what an increasing sequence means. Note these indices are recursive: we are creating a structure by adding +1, so we can take it apart like we would a recursive data structure, which is why the indices for a sequence often end with n−2, n−1, n or begin with n−n, n−(n−1).

Ex 6.1.2 is asking whether it is possible for a sequence of real numbers to converge to a real number when none of the conditions are satisfied, so it's a proof by contradiction: look in the logic appendix at if-and-only-if statements and try to prove the definition false. Let's look at Def 6.1.5 and see if it's possible. Assume for contradiction that for some real epsilon > 0 you can't find an index in the sequence past which all future entries are within that epsilon of the limit. Then this sequence is not converging, and is divergent by Def 6.1.5. If the distance between the limit and the sequence entries does not eventually shrink below every epsilon > 0, the sequence is not approaching the limit, which contradicts the definition of convergence of sequences.

Ex 6.1.3 looks a lot like Ex 6.1.2: we have a sequence converging to a real number, and one starting index is bigger than or equal to the other. If a later portion of the sequence is not within epsilon of c (the real number), then the sequence starting from the earlier index can't converge either, because then we don't have the property that for all n \(\ge\) N the entries are within the arbitrary epsilon displacement of c. This is Remark 6.1.9: the starting index is irrelevant; only the fact that the sequence eventually converges to a limit is relevant.

Ex 6.1.4 is another question about indices; this one adds an arbitrary integer k to the index (which could also be zero), shifting the whole sequence. If \((a'_n)\) and \((a_n)\) are both supposed to converge to the limit, then after shifting by 1000 the entries a1001, a1002, a1003, … had better converge too, or else the properties of Def 6.1.5 don't hold. You could show this with j = n + k, and now you have a sequence \(a_j = a'_n\) where \(a'_n\) will converge exactly when \(a_j\) does, since they are the same sequence and n + k \(\ge\) n \(\ge\) N, which is Def 6.1.5 again, so shifting the indices by an arbitrary amount shouldn't matter.

Ex 6.1.5: looking at Def 5.1.8 for a Cauchy sequence, for every possible positive epsilon there must exist an eventual index past which any two entries differ by at most that epsilon, meaning of course the distance between entries grows smaller and smaller. The limit definition says that past some eventual index N, each entry is within an epsilon displacement of the limit if the sequence converges to it. The triangle inequality for distance back in 4.3.3 states that the distance from x to z is less than or equal to the distance from x to y plus the distance from y to z. Assign y = L (the limit): \(|a_j - a_k| \le |a_j - L| + |L - a_k| \le \epsilon + \epsilon = 2\epsilon\), and since epsilon was arbitrary (run the argument with \(\epsilon/2\) if you want the bound to come out as \(\epsilon\)), a convergent sequence of real numbers is also a Cauchy sequence.

Ex 6.1.6 prove Prop 6.1.15. There is a long proof answer here you probably want to try and understand, but it's worth going back and seeing how he superseded the proofs of temporary constructions before. For example, in Remark 4.2.5 he defined the quotient x/y of two rationals, then substituted in his temporary notation and definitions to prove the new notation and definitions led to the same conclusion, and thus could discard the old and use the new notation. Here I did the same thing using 6.1.11, his proof that the limit of 1/n = 0: rewriting it using new limit notation with Cauchy/formal limits substituted in, I came to the same conclusion and now I claim formal limits are 'genuine limits' and can toss that notation. I also used 6.1.11 as a concrete example to try his contradiction suggestion, assuming 1/n was not eventually ε-close to 0, and found it was much easier to prove that it had to converge to 0 and thus be the genuine limit of 1/n. Is this correct? The proofs of all the limits can be found on YouTube if you're wondering.

Ex 6.1.7: we look at Prop 6.1.4 and rewrite it to bridge rational bounded sequences from 5.1.12 to real number bounded sequences defined in 6.1.16, just like how 6.1.4 bridged rational Cauchy sequences to real number Cauchy sequences; you are using 6.1.4's proof as a template. Suppose \((a_n)_{n=m}^\infty\) is a sequence bounded in the sense of 5.1.12 by a rational M > 0, with \(|a_n| \le M\) for all n \(\ge\) m; every rational is in particular a real number, so it is bounded in the sense of 6.1.16. Now suppose \((a_n)_{n=m}^{\infty}\) is bounded in the sense of 6.1.16 by a real M > 0. Then there exists an integer N > M by the Archimedean property… and your proof carries on like this, showing that a sequence bounded by a real number is also bounded by a rational one.

6.1.8 Try to prove these yourself; it doesn't matter how awful your proof is, just that you looked at previous Cauchy sequence lemma proofs like 5.3.6 and tried to adapt them to limits, like he does in Prop 6.1.11. For the addition of two limits converging to another limit: if \(|a_n - x| \le \epsilon\) and \(|b_n - y| \le \epsilon\) eventually for every epsilon > 0, then \(|(a_n + b_n) - (x + y)| \le |a_n - x| + |b_n - y| \le 2\epsilon\), and see what happens. This is the point of doing proof-based math, so you can experiment around to better learn the constructions and derive all the rules later if you forget them. There are answers floating around for Ex 6.1.8, but maybe you know a better/more intuitive way to prove that \(a_n + b_n\) converges to the real number x + y.
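A quick numeric illustration of that key inequality. The sequences here are my own made-up examples (not from the book), converging to x = 1 and y = 2:

```python
# Hypothetical sequences: a_n -> 1 and b_n -> 2. The key step in the
# sum-of-limits proof is
#   |(a_n + b_n) - (x + y)| <= |a_n - x| + |b_n - y|
# so forcing each right-hand term under eps/2 forces the left under eps.
def a(n): return 1 + 1 / n
def b(n): return 2 - 1 / n**2

x, y = 1, 2
for n in [1, 10, 1000]:
    lhs = abs((a(n) + b(n)) - (x + y))
    rhs = abs(a(n) - x) + abs(b(n) - y)
    assert lhs <= rhs + 1e-12  # triangle inequality in action
print("inequality holds")
```

Trying a few n values like this is exactly the kind of experimenting the chapter encourages before writing the epsilon-delta argument down formally.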

6.1.9: why does x/0 fail? And 6.1.10: modifying 6.1.4 again, you can figure this out easily since we know you can always squeeze a rational number in, so \(|a_n - b_n| \le\) rational epsilon \(\le\) real epsilon.

6.2 The Extended real numbers

We patch the definition of real numbers with negative and positive infinity elements that are distinct from all other real numbers. All we can do with these two new objects is equate things to them, something is infinite if it equals negative or positive infinity. Tao points out cancellation of infinity elements leads to nonsense conclusions.

Example 6.2.7 and Def 6.2.6: if a set E contains negative infinity, sup(E) is computed by removing the element negative infinity and taking the least upper bound of what remains. His example E = {−1, −2, …} U {\(-\infty\)}: remove the object \(-\infty\) and you are left with {−1, −2, …}, whose least upper bound is −1, so sup(E) = −1. The remaining set descends forever, but it never contains the actual object \(-\infty\). Example 6.2.9: that set's supremum equals infinity even though the set does not contain the actual element infinity.

6.5 Some standard limits

Very short chapter. The proof of 6.5.1 is a standard point-at-definitions style proof, see Lemma 5.6.6(a), (b) and (g): if \(L = 1/x^{1/k}\) then \(L^k = (1/x^{1/k})^k = 1/x\). For the exercise with \(x^q\), look at Def 5.6.7 and 5.6.9 again.

If you are weak with limits we will revisit these chapters numerous times there is no need to memorize everything here just know where to look it up when needed.

6.7 Real number exponents II

Here we learn what things like \(x^e\) or \(x^\pi\) mean. The proof is huge, but Tao makes it easy to follow; it's all things we've already seen.

Proposition 6.7.3 we use the limit laws to prove the usual exponent laws for rational numbers hold for real numbers. All you have to know is that a real exponent is a limit so you can use the limit laws to manipulate them in the future if needed otherwise the normal laws of rational exponents hold. Some solutions to the exercises are here on Tao's class page.


7.1 Finite series

Wildberger has a lecture on summation of arithmetic and geometric series to go with this chapter.

We are just manipulating indices here. Make an example sequence for yourself: \(a_1 = 11\), \(a_2 = 32\), \(a_3 = 53\). \(\sum_{i=2}^3 a_i\) means \(a_2 + a_3\), or 32 + 53, as per Definition 7.1.1.
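Definition 7.1.1 is recursive, so it translates directly into code. A sketch (my own, the sequence values are from the example above):

```python
# Sketch of Definition 7.1.1: the finite sum defined recursively, with
# the vacuous case (n < m) equal to 0.
def finite_sum(a, m, n):
    if n < m:
        return 0                          # empty sum
    return finite_sum(a, m, n - 1) + a[n]  # peel off the last term

a = {1: 11, 2: 32, 3: 53}   # a_1 = 11, a_2 = 32, a_3 = 53
print(finite_sum(a, 2, 3))  # a_2 + a_3 = 32 + 53 = 85
```

Peeling off the last term is exactly the move used over and over in the identity proofs below, just written as a function.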

In Lemma 7.1.4, the triangle inequality proof is here; note how they get the base case of 0 by setting the variable on top of the sigma less than the initial starting variable, so the indexed summation is vacuous and equals zero (let b < a), like Tao's beginning-of-chapter examples with m − 2, m − 1.

Check the errata for this chapter; in my version (third edition, hardback) there are a few problems. In Remark 7.1.9 the proof is straightforward when you flip back to Lemma 7.1.4 and note that because he already assumed x to be g(n + 1) he could not also assign x to h(n + 1), which is why he needed to introduce h(j).

7.1.13 why is P(0) true? Because the assertion P(n) is that the claim holds for any set X with n elements, so if X has 0 elements, its sum is 0. Reminder to also re-read what a Cartesian product is; the Cartesian plane itself is a Cartesian product of every value of x with every value of y.

Proving all these finite sum identities is easy once you get one of them; all you need is 7.1.1 and the recursive definition he uses to pull out values, manipulate, and reindex the sum.

  • Try 7.1.4(d) without induction, expand the series, factor out the c, done:
    • \(\sum_{i=1}^n (ca_i) = ca_1 + ca_2 + \cdots + ca_n\)
    • \(c(a_1 + a_2 + \cdots + a_n)\) by the distributive law
    • rewrite as c(summation notation)
  • 7.1.4(c) expand \(a_i + b_i\), group them together:
    • \((a_1 + b_1 + \cdots + a_n + b_n) = (a_1 + \cdots + a_n) + (b_1 + \cdots + b_n)\)
    • rewrite in summation notation
  • 7.1.4(a) let m = 1:
    • \((a_1 + \cdots + a_n) + (a_{n+1} + \cdots + a_p) = (a_1 + \cdots + a_p)\)

7.2 Infinite series

Example 7.2.4 is also a Khan Academy lecture. In the partial sum \(S_N\), the N is any index value greater than or equal to the starting index, meaning you take the first 10 sums, the first 20 sums, the first billion sums. Plug some values into his example; try N = 4:

  • \(\sum_{n=1}^4 2^{-n} = 1 - 2^{-4}\)
  • or \(1/2^1 + 1/2^2 + 1/2^3 + 1/2^4 = 15/16\)
    • \(S_4 = 1 - 1/16 = 15/16\)
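You can check the closed form \(S_N = 1 - 2^{-N}\) for as many N as you like with exact rational arithmetic. A small sketch (my own code):

```python
# Sketch: partial sums S_N of Example 7.2.4's series sum of 2^-n,
# checked against the closed form S_N = 1 - 2^-N with exact fractions.
from fractions import Fraction

def S(N):
    return sum(Fraction(1, 2**n) for n in range(1, N + 1))

for N in [4, 10, 20]:
    assert S(N) == 1 - Fraction(1, 2**N)
print(S(4))  # 15/16
```

Using `Fraction` avoids floating point noise, so the identity holds exactly, not just approximately.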

Proposition 7.2.5: the proof is on Tao's class page, see the homework solutions, but it's things we've already done a dozen times. This is about partial sums from p to q, known as the 'tail': once the indices are bigger than N, the absolute value of the sum from p to q must be smaller than or equal to epsilon.

Really all you need out of this chapter are the series laws (Prop 7.2.14) for convergent infinite series, and to know where to find these definitions and the methods to test for convergence if they ever come up later. For example, in the book The Art of Computer Programming you will rarely ever use any of these definitions, instead using the finite definitions.

7.3 Sums of non-negative numbers

7.3.7, do you get how he rewrote that sum to the geometric series?:

  • \(2^{k}\frac{1}{(2^{k})^q}\)
  • \(= 2^k(2^{-kq})\) by Lemma 5.6.9(d)
  • \(= 2^{k + (-kq)}\) by the exponent law \(x^j \cdot x^k = x^{j+k}\)
  • \(= (2^{1 - q})^k\), which is \(2^{k - qk}\), by Lemma 5.6.9(b): \((x^q)^r = x^{qr}\)
    • the geometric series is the sum of \(x^n\)
    • \((2^{1 - q})^k\) is \((\text{thing})^k\), i.e. \(x^k\) with \(x = 2^{1-q}\)
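A quick numeric check of that rewrite (my own sketch), confirming \(2^k \cdot (2^k)^{-q} = (2^{1-q})^k\) term by term; note the resulting geometric ratio \(2^{1-q}\) is below 1 exactly when q > 1:

```python
# Sketch: verify 2^k * (1/(2^k)^q) == (2^(1-q))^k numerically.
# The rewritten form is a geometric series in x = 2^(1-q), which has
# ratio < 1 (hence converges) exactly when q > 1.
for q in [2, 3, 0.5]:
    for k in range(6):
        lhs = 2**k * (1 / (2**k)**q)
        rhs = (2**(1 - q))**k
        assert abs(lhs - rhs) < 1e-12  # same term, two spellings
print("rewrite verified")
```

Checking an algebraic identity at a handful of values like this won't prove it, but it catches transcription mistakes instantly.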

Just flip through the remainder of this chapter noting things you can always come back to if needed to understand better, such as rearranging infinite sums.

CMU Putnam Seminar

For the rest of this workshop we follow Prof Loh's advice and do competitive math problems in order to learn undergrad math. We will achieve this with the CMU Putnam Seminar and the book Putnam and Beyond, second edition, which you can find on Library Genesis. Most of the lectures are archived on YouTube and the class page is here.

According to Stanford "The competition emphasizes ingenuity rather than knowledge, so freshmen are not at much of a disadvantage compared to seniors".


Let's watch the Putnam Seminar 9/4 introduction.

At 13:31: the top 500 scores are only ~23 points out of a maximum possible 120, the exam is that hard, meaning you solved the 2 easiest problems and maybe had insight into a third but didn't solve it. He explains that having just 2 insights into a problem requires a mental search tree of 100 x 100 options, repeatedly claiming this is not an exaggeration; the Putnam really is this hard since no 3rd-party materials are allowed during the exam. Though in 2021 it's not proctored and is entirely online, so all results will be unofficial.

@30:00 the first seminar starts; the pdf he pulls up is here. For the very first problem, the 8x8 block pattern, he already has a talk that gives away the answer: you need 6 bits of memory. Third problem @31:45: 'integral coefficients' means the coefficients are integers. If you forgot what a polynomial is, Wildberger has a brief crash course describing their properties and graphing in this linear algebra lecture. A root is an input at which the polynomial evaluates to 0. We will of course learn polynomials by doing problems about polynomials as we go.

Now we go through ideas, poking at the problem to solve it: evaluating p(1) to sum all the coefficients, a strategy of showing the candidates can be neither even nor odd so there can't be a rational root. Notice how he's just abstracting away using boxes, ignoring all the values. If you forgot what modular arithmetic is, Wildberger also has a lecture for this, in typical Wildberger style where he introduces the problem it's trying to solve and its history, then at around 11:43 the rules of k congruent to l mod m. A conceptual crash course is: consider an abstract system of the type (D, 0, successor) where D has two distinct objects 0 and 1, and let 0++ = 1 and 1++ = 0. This is called the residues modulo 2. The natural numbers become this mod 2 system when each natural number is replaced by its remainder after division by 2:

  • 0/2 is remainder 0
  • 1/2 is remainder 1
  • 2/2 is remainder 0
  • 3/2 is remainder 1
  • 4/2 is remainder 0
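That tiny system fits in a few lines of Python (my own sketch of the (D, 0, successor) description above):

```python
# Sketch of the residues-modulo-2 system (D, 0, successor) described
# above: D = {0, 1}, with successor 0++ = 1 and 1++ = 0.
def successor(d):
    return (d + 1) % 2

# Each natural number collapses to its remainder after division by 2:
for n in range(5):
    print(n, n % 2)   # 0->0, 1->1, 2->0, 3->1, 4->0

# Repeatedly applying the successor just cycles 0 -> 1 -> 0 -> ...
assert successor(0) == 1 and successor(1) == 0
```

The parity arguments in the seminar (even/odd coefficient counting) are exactly computations inside this two-element system.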

He cleared the fractions by multiplying the polynomial by \(c^n\), which you can see here (rational root theorem). This makes sense because \((b/c)^n = b^n/c^n\), e.g. \((3/4)^2 = 3/4 \times 3/4 = 9/16\). This introduction is worth rewatching a few times to learn how this style of problem solving works.

His other site Expii Solve has similar kinds of challenges; they start easy and get harder, with links at the bottom of each question on where to find background material. You may want to try those too; they help you learn all kinds of topics such as spherical geometry, probability and even calculus. It's a community-written site, kind of like Math Overflow, where anybody can write a tutorial for a subject and these are upvoted if other people find them useful, which could be good or bad: a well-written Tao-like fundamentals post would likely not be upvoted compared to some bag of tricks with a flashy YouTube embed.


There are more problems, and all are proof by contradiction since he just showed us how to come up with a contradiction. Problem 2 looks the easiest for us: there is a closed set that can't contain both a number and its negative, like −3 and 3. A contradiction that immediately comes to mind is multiplication: what if 3 exists in the set, and −1? Then 3(−1) = −3, which can't happen. The strategy now would be to prove −1 must exist in the set: 4 + (−5) = −1, so we disallow −5, but 6 + (−7) also results in −1. The pattern here is p + (−(p+1)) = −1, so we would have to eliminate every negative number of the form −(p+1), and 0, leaving us with just the set of positive rationals.


Let's learn some basic geometric derivatives as they will be required for the Putnam seminar. This will also prove to you that you can already do any calculus course you want. If you want a full, typical university calculus sequence then try (workshop todo).


A derivative tells you what is happening at one point x of space or at one moment t of time. If dy/dx is > 0 or df/dx > 0 then either y (the graph) or f (function) is increasing. The derivative's origins are the problem of finding a tangent to a given curve and the problem of finding the velocity given the position over time.

Let's begin, watching the first lecture. There's a course reader on how to graph functions and move them around, which is what you think it is: add a displacement to the input x and the graph shifts. Watching the MIT lecture, note the point x0 just means 'a fixed point on the x axis', so it's a placeholder/variable for any arbitrary point on the x axis.

At around 17:40 or so, pause this lecture to find out what he means by 'the limit as delta-x goes to zero': it means that tiny delta-x displacement shrinking to zero, so the second point x0 + delta-x moves back towards x0, where it began. This is visualized here.

Delta-Y/Delta-X, the alternative notation: why is y in the numerator? Because (f(x0) − f(x)) is (y0 − y), the outputs of the function.

Non-polynomial derivatives

The lecture continues with a good breakdown of finding the derivative of 1/x, which is a rational function and not a polynomial (it is \(x^{-1}\), a negative power of x). He only used Wildberger algebra from the very first lectures of this workshop; recall the laws of division: \(\frac{k/m}{n}\) = \(\frac{k}{mn}\), and he used the distributive law to factor out the 1/delta-x so it became \(\frac{1}{n}\left(\frac{k}{m}\right)\).

At 27:10 of the first lecture: how would you calculate the area of the triangle below the tangent line of y = 1/x? Stop it at 28:32 and see if you can solve this problem on your own before he shows you the answer. Make your own Putnam seminar exercises; for example at 33:00, can you simplify that tangent equation before he does?

  • 0 - \(\frac{1}{x_0}\) = \(\frac{-1}{x_0^2} \cdot (x - x_0)\)
  • this is:
    • 0 - \(\frac{1}{x_0}\) = \(\frac{-1}{x_0^2} \cdot \frac{(x - x_0)}{1}\)
  • standard wildberger algebra:
    • 0 - \(\frac{1}{x_0}\) = \(\frac{-x + x_0}{x_0^2}\)
  • add \(\frac{1}{x_0}\) to both sides, cancelling from left side:
    • 0 = \(\frac{(-x + x_0)}{x_0^2}\) + \(\frac{1}{x_0}\)
  • more standard wildberger algebra:
    • 0 = \(\frac{((-x + x_0)(x_0)) + 1(x_0^2)}{x_0^3}\)
  • distribute:
    • 0 = \(\frac{-x(x_0) + x_0^2 + x_0^2}{x_0^3}\)
  • combine:
    • 0 = \(\frac{-x(x_0) + 2x_0^2}{x_0^3}\)
  • we can cancel one \(x_0\) from each term of the sum
    • why? because this is: \(\frac{-x(x_0)}{x_0^3}\) + \(\frac{2x_0^2}{x_0^3}\)
    • 0 = \(\frac{-x + 2x_0}{x_0^2}\)
  • solve for x, add \(\frac{x}{x_0^2}\) to both sides
    • \(\frac{x}{x_0^2}\) = \(\frac{2x_0}{x_0^2}\)
  • clear fractions multiplying by \(\frac{x_0^2}{1}\)
    • x = 2x0

The last area calculation:

  • \(\frac{1}{2}\)(2x0)(2y0)
    • this is:
  • \(\frac{2x_0 \cdot 2y_0}{2}\)
    • cancel one of the scalars:
  • \(\frac{2x_0}{1} \cdot \frac{2y_0}{2}\) = 2\(x_0 \cdot y_0\)
    • since y = 1/x (or y0 = 1/x0)
  • 2x0(\(\frac{1}{x_0}\)) = 2

You can read all this in the notes, though they don't really explain the steps, expecting you to figure them out. Reminder: this is the geometric interpretation of a derivative being used to find a tangent. There exist other interpretations of what a derivative is, like velocity, or as a limit.
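We can sanity-check the whole derivation numerically. A sketch (my own code): the tangent to y = 1/x at \(x_0\) hits the x axis at \(2x_0\) and the y axis at \(2y_0\), and the triangle it bounds with the axes always has area 2, whatever \(x_0\) is.

```python
# Sketch: the tangent to y = 1/x at x_0 has slope -1/x_0^2. Its
# x-intercept is 2*x_0 and its y-intercept is 2*y_0, so the triangle
# it bounds with the axes has area (1/2)(2*x_0)(2*y_0) = 2.
def tangent_triangle_area(x0):
    y0 = 1 / x0
    slope = -1 / x0**2
    x_intercept = x0 - y0 / slope       # solve 0 = y0 + slope*(x - x0)
    y_intercept = y0 + slope * (0 - x0)  # evaluate tangent at x = 0
    return 0.5 * x_intercept * y_intercept

for x0 in [0.5, 1, 3, 10]:
    assert abs(tangent_triangle_area(x0) - 2) < 1e-12
print("area is always 2")
```

That the area is constant no matter where you place the tangent is exactly the surprise the algebra above lands on.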

Polynomial derivatives

The polynomial \(y = x^n\) and its derivative \(nx^{n-1}\): the 'junk' notation \(O((\Delta x)^2)\) is a bound meaning everything of order \((\Delta x)^2\) or higher gets collected there, and since delta-x goes to 0, once you divide by delta-x and substitute in 0 for delta-x those terms disappear.
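You can watch the junk terms vanish numerically. A sketch (my own example values): the difference quotient \(((x + \Delta x)^n - x^n)/\Delta x\) approaches \(nx^{n-1}\) as \(\Delta x \to 0\).

```python
# Sketch: the difference quotient for x^n approaches n * x^(n-1) as
# dx -> 0; the leftover error is the O((dx)^2) junk divided by dx.
def diff_quotient(n, x, dx):
    return ((x + dx)**n - x**n) / dx

n, x = 3, 2.0   # derivative of x^3 at x = 2 should be 3 * 2^2 = 12
for dx in [0.1, 0.001, 1e-6]:
    print(dx, diff_quotient(n, x, dx))  # closes in on 12 as dx shrinks
```

Each shrink of dx chops the error roughly in proportion, which is exactly what "the junk is O of delta-x after dividing" predicts.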

Discrete Math

CMU's 21-228 Discrete Mathematics course is being live streamed this semester here. The class page advises using the book Discrete Mathematics by L. Lovász, J. Pelikán, and K. Vesztergombi as optional reading if you want further clarification of whatever comes up in lectures.

I'm adding it into the daily problem set because why not; it's combinatorics, not requiring much background. Let's start with the first lecture, which as I write this begins a live stream in 20 minutes. He's breaking down some elementary combinatorics problems to show the logic of solving them. The first one he did in reverse. @37:40 why does the last digit have to be even? Because it's supposed to be an even 4-digit number, so ending in 0, 2, 4, 6, or 8. The general form is crazy, but he did it by writing down a table of one-digit numbers, two digits, … n + 1 digits to find the general pattern.
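For the even 4-digit count you can brute-force the answer against the multiplication principle. A sketch (my own check, not from the lecture):

```python
# Sketch: count even 4-digit numbers by brute force and compare with
# the multiplication principle: 9 choices for the leading digit,
# 10 each for the two middle digits, 5 (0, 2, 4, 6, 8) for the last.
count = sum(1 for n in range(1000, 10000) if n % 2 == 0)
print(count)  # 4500
assert count == 9 * 10 * 10 * 5
```

Brute-forcing small cases like this is a cheap way to test the counting logic before trusting the general formula.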

Assignment 1

Let's try some of the assignments.

1.) The first is a symmetric difference assignment, which is defined here. Look under properties; the answer is there: 'from the property of the inverses in a Boolean group, it follows…'. We can eliminate B and are left with:

(A \(\triangle\) C) \(\triangle\) (C \(\triangle\) A) and from that same article, we can eliminate C so are left with A \(\triangle\) A which is just the empty set.

2.) We have a general set of size n \(\ge\) 3, and we need to know how many subsets contain exactly one of the pairs {1, 2}, {1, 3}, {2, 3}. What if the set is {1, 2, 3, 4}? That's \(2^4 = 16\) subsets; how many of the 16 generated subsets include exactly 2 of our numbers? Let's find out: using some free powerset 'calculator' I randomly googled, I count 6 occurrences for \(2^4\) and 12 occurrences for \(2^5\). So we have 3, 3x2, 3x2x2, and I'm going to assume \(2^6\) will give 3x2x2x2 occurrences, or 24. This pattern looks like \(3 \times 2^{n-3}\) for n \(\ge\) 3, and if you were to prove this, the inductive n + 1 step is \(3 \times 2^{(n+1)-3}\), which is what we want: input a power of 5 and the successor (n+1−3) gives the power-of-6 count, 3x2x2x2.
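Instead of a random powerset calculator, we can brute-force the count ourselves. A sketch under my reading of the problem (subsets whose overlap with {1, 2, 3} is exactly one of the three pairs), checked against the guessed formula:

```python
# Sketch (my reading of the problem): count subsets of {1, ..., n}
# whose intersection with {1, 2, 3} is exactly one of the pairs
# {1,2}, {1,3}, {2,3}, and compare with the guess 3 * 2^(n-3).
from itertools import combinations

def count_pair_subsets(n):
    universe = range(1, n + 1)
    pairs = [{1, 2}, {1, 3}, {2, 3}]
    total = 0
    for r in range(n + 1):
        for s in combinations(universe, r):
            if set(s) & {1, 2, 3} in pairs:
                total += 1
    return total

for n in [3, 4, 5, 6]:
    assert count_pair_subsets(n) == 3 * 2**(n - 3)
print([count_pair_subsets(n) for n in [3, 4, 5, 6]])  # [3, 6, 12, 24]
```

The formula also falls out directly: pick one of the 3 pairs, exclude the third small element, and the remaining n − 3 elements are each freely in or out, giving \(3 \times 2^{n-3}\).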

3.) Do you remember the Wildberger lectures on Pascal's Array, on the number of paths to the binomial coefficient, in other words n choose k? Think about paths on a grid and how to exclude one point; all of this is in the course book, which is your reference to help solve these problems, not something you have to read end to end.

The Binomial/Trinomial Theorem

Wildberger has an extended lecture here constructing the notation for n choose k from scratch.

Daily Challenge

Now this workshop is a problem per day; get the second edition of Putnam and Beyond by Gelca and Andreescu from Library Genesis. These will be hard but not impossible problems, and I'll throw in Poh-Shen Loh's Discrete Mathematics course materials and lectures as well, since they are the exact same style. Putnam and Beyond is a great book as it will fill in all the blanks for algebra, geometry, calculus, combinatorics, probability and number theory. Every problem in the book has a fully worked out solution, but as per Poh-Shen Loh's advice we should spend a long time thinking about how to solve these ourselves before looking at the solution, or you will get nothing out of this material. By long he means 'a day or more'.