What's the current 'state of the art' as far as software security goes? I ask only because I don't want my own software to be broken.
CSCI 2951U Topics in Software Security
From the overview:
"CSCI 2951U investigates the state-of-the-art in software/systems exploitation and defense. More specifically, the course is structured as a seminar where students present (along with the instructor) research papers to their peers. We will begin with a summary of prevalent software- and hardware-related defects, which are typically found in applications written in memory unsafe languages, like C/C++, and/or contemporary architectures, such as x86 and ARM, and proceed to surveying what we are up against: traditional and modern exploitation techniques, ranging from classical code injection and code reuse up to the newest goodies, like JIT-ROP, Blind ROP, and software-based microarchitectural attacks. For the bulk part, we will be focusing on the latest advances in protection mechanisms, mitigation techniques, and tools against modern vulnerability classes and exploitation methods."
How the course works:
"Every week we will be discussing (a set of) research papers. Students are expected to read the assigned papers and write a short review (critique) before each class…..(Your paper review) should discuss the pros and cons of the proposed idea, protection mechanism, or bypass technique"
The only lecture with slides. There are prereqs, but they're going to review the most important concepts anyway, great. The first paper they want us to read is SoK: Eternal War in Memory. Get the DOI number from that page and paste it into Sci-Hub or click on PDF.
Reading the abstract, they set up a general model for memory corruption attacks and point out that performance is a problem for the stronger memory policies, so nobody uses them. We read this with the course advice in mind to 'discuss the pros and cons of the proposed idea, protection mechanism, or bypass technique'. Let's read the paper.
This paper talks about Pwn2Own, which you can watch day 1 of here. Around the 6hr mark there's a good interview with an Egyptian hacker who says it takes 3 months to understand how a thing works, and then not very long after that to find a vuln, using IDA Pro to reverse engineer said thing. He says that in the beginning he knew nothing and just persisted until he got to his level. There is a moment of cringe when the interviewer implies that web hacking is somehow not as 'hardcore' as blowing up junk VMs or reversing some junk proprietary binary. Egyptian hacker guy seems unimpressed with this question and repeats that you just have to put in the work to understand whatever it is you're trying to exploit. He ends by saying the biggest problem is that nobody is looking at solutions, which is something Egor Homakov, another security researcher, has written about many times: actual investment in real security is a waste of time to most software companies when the current market for software is like a slot machine. You put in as little investment as possible and pull that handle until you cash out, meaning somebody bigger buys your software, and then security is their problem.
I'm also amazed anybody wastes their time with these competitions. If you look at their 2021 live results, some team chained together 6 bugs to execute code, but because a single bug was declared to be already known (and of course not yet patched) they were screwed and received nothing for their work.
C crash course
Watch the second lecture from CMU's 15-213 playlist to understand what a bit is, what a byte is, what two's complement is, TMax and TMin, hexadecimal encoding, and basic shift operations. You don't have to watch the whole lecture; the later part goes into type casting.
Next, read these notes to learn the C memory model. They are a great resource, getting right to the point and explaining just what we need. They are aimed at a reader who has functional programming experience in Racket. I assume you've done CS19 in the other workshop, and Pyret was originally built from Racket, so they are similar languages.
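To make the lecture's vocabulary concrete, here is a minimal C sketch of TMax, TMin, two's complement negation and shifts on 32-bit patterns. The names TMAX_BITS, TMIN_BITS, twos_negate, shift_left and shift_right_logical are made up for this note and do not come from the lecture. Everything is done on unsigned values so the bit patterns are well defined.

```c
#include <stdint.h>

/* TMax for 32 bits: a 0 sign bit followed by 31 ones. */
static const uint32_t TMAX_BITS = 0x7FFFFFFFu;

/* TMin for 32 bits: a 1 sign bit followed by 31 zeros. */
static const uint32_t TMIN_BITS = 0x80000000u;

/* Bit pattern of -x in two's complement: flip every bit, then add one. */
uint32_t twos_negate(uint32_t x)
{
    return ~x + 1u;
}

/* Left shift by k bits (assumes k < 32); zeros come in on the right. */
uint32_t shift_left(uint32_t x, unsigned k)
{
    return x << k;
}

/* Logical right shift by k bits (assumes k < 32); zeros come in on the left. */
uint32_t shift_right_logical(uint32_t x, unsigned k)
{
    return x >> k;
}
```

One thing worth trying is twos_negate(TMIN_BITS): it gives back TMIN_BITS, which is the asymmetry the lecture points out (TMin has no positive counterpart).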
Start reading at 2.1 Basics and look at the C memory model diagram. Each row is 32 bits (one word); the author notes that 32 bits is used to make the guide easier to read and that a 64-bit model works similarly. He goes through a sample C program: these are all sequentially executed statements that return nothing (void) and produce side effects, the return statement is an exit back to whatever program called it, and the number returned indicates an error code. Moving on to 2.2 and 2.3, you can see an example of prototypes and external global variables here in MIT's implementation of xv6, which is a V6 Unix clone they use to teach their operating systems class. Some of the header files have external globals, some don't. At the end of 2.3 there are questions like: what happens if we exclude file x? The program can't be linked. If gcd-driver.c forgets to include gcd.h, then the use of the global variable in gcd-driver will fail, and the compiler will also warn about missing prototypes, because when it compiles gcd-driver.c it doesn't know the types of the parameters of the gcd function inside gcd-func. This will all be covered when we do linking.
2.4 Iteration: if you've taken CS19 in the compsci/software workshop then you already know what structural recursion is. It means taking apart the structure the same way it was built, whereas accumulative recursion means you have some kind of variable that accumulates a value, which you return at the end.
Of course you can translate all the Racket programs in this book to Pyret (if you've done CS19):
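As a rough single-file sketch of what sections 2.2 and 2.3 describe, the top part below plays the role of a header like gcd.h, the middle part plays the driver, and the bottom part plays gcd-func.c. The global gcd_calls and the helper reduced_denominator are hypothetical names invented for this note, and the body of gcd is just the usual Euclid loop, not necessarily what the notes use.

```c
/* What a header such as gcd.h might declare: names and types only, no bodies. */
extern int gcd_calls;      /* hypothetical global, defined further down */
int gcd(int a, int b);     /* prototype: gives the compiler the parameter types */

/* Stand-in for the driver file. It can call gcd because the prototype
   above is in scope. Assumes den != 0. */
int reduced_denominator(int num, int den)
{
    return den / gcd(num, den);
}

/* Stand-in for gcd-func.c: the actual definitions. */
int gcd_calls = 0;

int gcd(int a, int b)
{
    gcd_calls++;               /* count how many times gcd has run */
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

If you delete the two declarations at the top, reduced_denominator no longer knows what gcd is at the point where it is compiled, which is the situation the end-of-section questions are asking about.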
fun sum1(n :: Number) -> Number:
  doc: "Structural Recursion, if structure is a natural number n++ then disassemble by n--"
  if n == 0:
    0
  else:
    n + sum1(n - 1)
  end
where:
  sum1(5) is 5 + 4 + 3 + 2 + 1
end

fun sum2(n :: Number) -> Number:
  doc: "Accumulator recursion, keeps a running tally and returns it"
  fun sum2-helper(counter, acc):
    if counter == 0:
      acc
    else:
      sum2-helper((counter - 1), acc + counter)
    end
  end
  sum2-helper(n, 0)
where:
  sum2(5) is 5 + 4 + 3 + 2 + 1
end
In general you prefer a while loop when the range isn't known in advance, such as when prompting for user input, and a for loop where the range is already known. There's a comment that Haskell performed this computation at the same speed as C with far shorter/declarative syntax.
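Here is a small C sketch of that rule of thumb. sum_to is the loop version of the sum programs above, where the range 1..n is known before the loop starts. count_digits is a made-up example (not from the notes) where the number of iterations depends on the data, so a while loop reads more naturally.

```c
/* Range known up front (1..n): a for loop with an accumulator. */
long sum_to(long n)
{
    long acc = 0;
    for (long i = 1; i <= n; i++) {
        acc += i;
    }
    return acc;
}

/* Range not known up front: keep dividing until one digit is left. */
int count_digits(unsigned long n)
{
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}
```

Note that sum_to is really the accumulator recursion written as a loop: acc plays the same role as the acc parameter in the helper function.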
2.5 Computational Model: in case you're wondering, 'FICS' is not ready yet; it's a functional intro to CS the author is still writing, but we already took an equivalent one in the compsci workshop by doing CS19. Here we learn about read-only memory, the stack and the heap. If it's not clear, #include <stdio.h> tells the preprocessor to paste in the declarations for the standard input/output library, so its functions are all in scope and you can call them by name. You can loosely think of it as opening that library's namespace.
We have to detour into assembly language to learn exactly what stack frames/pointers are in the C level abstraction of memory. Luckily for us there are great lectures for this in CMU's 15-213 course; afterwards we will come back to this text. We're just skipping through the lectures. You can get the CS:APP book the course uses and learn every detail of the x86-64 architecture later if you want, as we come across it. For now this is just an introduction.
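A minimal sketch of the three regions, under the usual assumptions about how C compilers lay things out (the C standard itself only talks about storage durations, not about a 'stack' or 'heap'). The function names greeting, stack_square and heap_copy are invented for this note.

```c
#include <stdlib.h>
#include <string.h>

/* The string literal is typically placed in read-only memory.
   Writing through this pointer is undefined behaviour. */
const char *greeting(void)
{
    return "hello";
}

/* 'local' typically lives in this call's stack frame (or a register)
   and is gone once the function returns. */
int stack_square(int x)
{
    int local = x * x;
    return local;
}

/* malloc hands out heap memory that outlives this call.
   The caller owns the copy and must free() it. Returns NULL on failure. */
char *heap_copy(const char *s)
{
    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *p = malloc(n);
    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}
```

The heap copy is writable, so something like copy[0] = 'j' is fine on the result of heap_copy(greeting()), while doing the same on the literal itself is not.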
Let's begin here: Machine-Level Programming I: Basics. There is some really good info on history and on how to generate object code w/gcc, but the assembly part of the lecture starts @47:54 with registers. At a high level a register is just a storage location inside the CPU that it can access very quickly. 'Memory', for the purposes of this quick introduction, is any storage that's not a CPU register, so cache memories or RAM. Interesting: @55:15, in x86 assembly a single mov can't copy from memory to memory, the data has to go through a register, which he claims is for the benefit of the companies designing these things.
'Dereferencing' a pointer in C, using the * syntax, means retrieving the value stored at the memory address the pointer holds. 'Referencing', or the & syntax, means taking the memory address of a variable, usually to store it in a pointer variable so we can directly access the value later. In the lecture he goes through swap() completely to understand this.
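For reference, a swap along the lines of the one walked through in the lecture looks roughly like this (written from memory of the CS:APP version, so treat the variable names as an approximation):

```c
/* Exchange the two long values that xp and yp point to. */
void swap(long *xp, long *yp)
{
    long t0 = *xp;   /* dereference: read the value stored at address xp */
    long t1 = *yp;   /* dereference: read the value stored at address yp */
    *xp = t1;        /* store through the pointer: write to address xp */
    *yp = t0;        /* store through the pointer: write to address yp */
}
```

A caller with two variables long a = 1, b = 2; would write swap(&a, &b), using & to pass the addresses of a and b. Afterwards a is 2 and b is 1. The two temporaries also show the 'no memory to memory move' rule: each value is read into a temporary (a register, in the compiled code) before it is written back out.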
- movq (%rdi), %rax
- The parentheses in a movq instruction mean 'the memory at the address held in %rdi': the instruction loads the value stored at that address into %rax, whereas movq %rdi, %rax copies whatever is in %rdi itself (here, the address) into %rax.
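Seen from the C side, the two instructions correspond to the two tiny functions below. This is a sketch that assumes gcc on x86-64 Linux with optimization turned on (for example -O1), where the first argument arrives in %rdi and the return value goes in %rax; other compilers or flags can produce different code. The function names are made up for this note.

```c
/* Typically compiles to:  movq (%rdi), %rax
   Load the long stored at the address held in p. */
long load_value(long *p)
{
    return *p;
}

/* Typically compiles to:  movq %rdi, %rax
   Copy the address itself; memory is never touched. */
long *copy_pointer(long *p)
{
    return p;
}
```

You can check this yourself with gcc -O1 -S on a file containing the two functions and reading the generated .s file.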
The rest of this lecture is things you don't have to know yet. There's no reason to memorize x86 assembly instructions, since we will come across them many more times later. Let's move on to the next one TODO