Software Development

Table of Contents

Start here

I assume you are teaching yourself at the same time some kind of intro course you found on YouTube or a book somebody recommended such as:

Strategy

Find a mentor. There is someone somewhere doing what you want to do and making a lot of money doing it and your task is to get in with them somehow and absorb everything. Once you are in all you have to do is be the reliable person they turn to whenever some (tedious at first) work needs to be done and if this work over time becomes increasingly more skilled then you have found the right mentor and you are set for life. If they get promoted, you get promoted. If they found a company, you are the first hire. If they get hired at a top tier trading firm guess who they will tell the hiring managers they should also hire.

You approach this person and sell yourself to them not through cringe inducing fake flattery but telling them you are available if they have some work to offload because you want to learn it all from the ground up. There is always labor to do and your goal is to be this magic portal where work can be dumped and it returns as per specifications they gave you. Then there's time for them to show you things and you make your way up to highly sophisticated work. You can find a mentor in your community, sometimes in an open source project, or at work get the lowest possible position at some place that has elite developers then seek out where the critical work is being done and approach those people/person.

In my example I had to do it in my spare time but to me it's free school so I didn't care. Imagine you are hired somewhere and after the usual 3 months onboarding you get good at what you're doing so can finish the work in a few hours, then you go approach some coworker in the critical department being paid millions per year and you ask them hey I want to know this, give me something to do. They will but you have to be 'likeable' and what does that mean? It means you shut your mouth and just do the work at first until you have proven yourself to be trusted. Don't be that guy who is hovering around, harassing, needy, annoying, or contradicting them with some foolish 'actually if we do it this way….' imagine telling someone with two decades of experience at the top of their game that they're wrong. There will be opportunities later to voice your opinion but for now just be their contractor. When people go home at night they should never think about you at work. You should just be a tool they use and then you will advance. Nobody uses tools then goes out for dinner and thinks about the tools. Lot's of people can't be the tool they need constant validation, they need to be friends, they need whatever weird things that change you from 'glad this person is here' to 'how do we get rid of them'.

I lived a ridiculous parody of a life running in the streets until age ~30 and shouldn't get into places I do. Therefore I have to be aware of my physical presence and this is actually an important point. When you are working with people doing highly technical, cerebral and sensitive work if you are like me towering over everyone and almost brutish with many social glitches you have to be self aware and just be furniture. You leave work parties early so your personality doesn't annoy because I can definitely say things to make people rage. In the working class world this doesn't matter but when you roll with people who are elite at what they do it defintely matters. The person who is my mentor I don't tower over them threatening them with my very existence I sit down and try and be invisible. Whenever someone talks I listen to them and this is critical I would say to how far I've come. If spicy topics come up in conversation I'm not going to rant on cruise control I always listen and ask them questions why they have the beliefs they do I would never contradict them and be a fool demanding they conform to my opinions. This is the kind of game you have to do to get into this world and make a ton of money but of course you keep your own friends, have your own life it's not like you are not winning you are strategizing to maximize your potential with the full realization that likeability is a factor in all success. Once you make it you're free to be the Theo de Raadt ultra based developer because you're so good they can't ignore you but none of us are at his level yet. With a mentorship you will be that good.

Plan

I'm doing all the following at the same time.

We should learn how a browser works and see some examples of the Model-View-Controller pattern. The model is a representation of the data, the view is what is to be displayed and the controller does transformations to the model while interacting with the outside world. The controller updates the model which in turn updates the view since it relies on the model. Every year MIT has a web lab and publishes all the lectures on YouTube and the examples they work on in the lab is freely available on GitHub. The point is to be enough of a crash course you can deploy a simple MVP on free hosting. We will make our own page on free hosting to sell ourselves that's our goal.

We should learn algorithms from USACO and a graduate course on design. Anyone can do this so why not and it helps to practice all this so if you get interviewed somewhere you can crush it. I'm going far beyond any kind of interview for fun you don't have to it but you can so why not.

Finally Andy Pavlo runs two fully open courses at CMU where even the grading scripts are available and the advanced course teaches you how program a modern distributed and optimized application the exact kind of application everyone else is doing in 2024. Nothing else exists like this anywhere we should take advantage. Everything taught here translates to the kind of work some hedge fund in London or Australia would do who can't find anyone to hire right now. I'm also going to explore the AI driven Postgres deployments because this is learnai and we can go freelance after doing ETL or even as the lowest dbms admin and move within to find a mentor.

Some of the content is C++ but the majority of the hard content is any language you want in fact cmu's optimizer for the 2024 dbms course is all in Rust but you can write your own it doesn't matter. I've never written production code in C++ but I have had to read it many times.

Curriculum

These high-freq trading outfits typically write customized software that avoids the C++ STL containers instead hand rolling everything like circular buffers/queues and bypassing the operating system to do direct memory access on network cards. They manually finesse garbage collection in JVM languages or OCaml and try and avoid any kind of heap allocated objects. In other words they have extensive knowledge of language runtimes, system memory architecture, algorithm analysis and know enough classic queueing theory to see patterns in message traffic medians that can be optimized. This is exactly what we learn in the above curriculum and you should try it if this is your bag.

USACO guide

In competitive programming you submit source code as a solution to a problem and the online judge compiles and runs your code against tests. There is a huge list of online judges available like Kattis, SPOJ, Timus, Aizu, and more worldwide.

First task is to choose one of these languages. We'll learn C++ for USACO so pick something else here that you want to use. List all the problems by easiest and start with the very first one 'Velkomin!'. You get unlimited submissions to the Kattis judge.

Input/Output

Read the book or online documemtation of the language you chose or use the USACO guide first chapter to teach yourself how input/output works in C++/Java/Python. There's the USACO online IDE too if you want.

The problem qaly is an example of how to take multiple lines of input. Many of these problems the first input will be how much more input to expect which you can then loop on:

#include <iostream>
using namespace std;
int main(){
    int tt;  
    double sum = 0;  
    cin >> tt;   // pipe first stdin to tt
    while (tt--) {  // false when input empty 
        double a, b;  // each loop declare these 2 new assignables
        cin >> a >> b; // take 2 more inputs
        sum = sum + (a * b);  // accumulator
        }
    cout << sum;  // pipe sum to stdout
}

Finish up to 1.4 Easy problems on Kattis or until you get how I/O works in your language.

Real competitive programming

Here is Gennady Korotkevich aka tourist practicing on codeforces and explaining what he's doing, though he didn't start with C++ he used to win the IOI with Pascal in high school then switched later for the ICPC. The first thing to notice is he's using a basic text editor and minimalist shell that looks like MinGW and a file manager where he manually submits through the codeforces website UI. He prepares individual directories named after the problem, then always makes a sol.cpp file in that directory so he doesn't have to change a macro he's written to compile the solution from the shell.

His C++ template:

// include every C++ std library and bits library
// doesn't work on all platforms
#include <bits/stdc++.h>

// use std namespace to avoid having to write std::cin or std::cout
// in other words bringing into scope all C++ std libraries 
using namespace std;

int main() {
// turn off default cout flush 
 ios::synch_with_stdio(false);
 cin.tie(0)

std::cin is synchronized to std::cout and performs default buffer flushing operations so to save some runtime he disables this sync. His compiler flags he explains in one of these recorded streams, or you can figure out yourself as you go, most of them are to make compiler messages extra verbose for debugging. Note you would never write software outside of competitive programming like this because he has opened a gigantic scope.

You don't have to watch the entire 2hr vid, the last problem seems like the most informative. He begins most problems by writing himself examples as comments. He copies the problem input test cases into a text file 'in1' then pipes it into his compiled program "sol.exe < in1" from the shell as stdin (standard input). He writes multiple submissions to compare their runtime/memory usage. He looks for some kind of mathematical structure in the formulas given in the problem writeup. After submission he looks up the solutions and compares them to his own.

A time limit exceeded verdict shows how he would optimize to get accepted verdict also trolling the stream by making all the variable names to single letters claiming it will speed up the program. The judging server ran his program faster than his own laptop, something to remember if debugging and believing your solution still isn't optimized enough, resubmit anyway as you get unlimited submissions. We aren't trying to be competitive programmers but if you wanted to you could with what we're learning here.

USACO bronze

We are now bronze level. There's a set of videos for bronze free here.

USACO online judge only accepts C/C++, Java or Python but you can also learn from the guide then practice on Kattis using Methods to Solve or the translation of e-maxx here then you can use any language you want. I will do everything in USACO in C++ and the extra problems from e-maxx or Kattis in OCaml.

Time complexity

Skim through time complexity.

Think about how you would invent the idea of costs yourself. First we want to impose some kind of cost structure on a program that captures behavior and produces a math function so we can manipulate it separately. There's different cost models and most algorithm textbooks use a simple random access machine model where all writes and reads to memory cells have constant costs. This doesn't work at all for parallel programs, and textbooks never consider higher-order functions because you don't know what kind of functions will be given to the program as inputs and can only guess using the fixed cost of some existing function.

A cost using only natural numbers can be derived from the language where each step of execution incurs a cost, so we can try step counting assigning a cost of 1 for each step of evaluation. In a simple loop like a while/for loop we multiply whatever costs are in the body of the loop by the amount of times it is looped and typically this is size of the input. If there's another function call in the body then we have to figure out the cost of that function and it is multiplied by how many times it's looped.

for 1 to n do:
 - perform a move step +1
 - perform some other step +1
 - call a linear function +n

Above example is n(2 + n) there is 2 + n in the body of costs, multipled by the input n.

fun rec(n):
 - check if n is empty +1
 - move around stuff +1
 - divide something +1
 - call rec(n - 1)    

Above is a simple recursive function and the entire function is now the body of the loop and multiplied by how many times looped like in the previous case so cost is 3n. A non trivial recursive function is usually analyzed by extracting a recurrence relation which we'll see later if you didn't already see it in the intro courses listed in the beginning of this page.

The 4 suggested resources:

First is 'IUSACO' 3-Algorithm Analysis here. It's only chapter 3 and 4 pages long. The grading server can do 100m operations per second. We will read a paper soon on how wall clock time vs CPU time works giving us a crash course on how to analyze algorithms on real life hardware. Page 8 where did the 5n + 17 come from? If you had a program that had 17 constant operations then performed a loop n times with 5 lines of constant ops in the body, and if you were using a model of computation like the ram model meaning memory can be accessed and written in O(1) constant time then the function to describe the complexity of your program is f(n) = 5n + 17 or O(n).

Next is 'CPH' 2-Time Complexity here from the competitive programming handbook. This is already known to us from the intro courses so let's look at 2.4 Maximum subarray sum. If it doesn't make sense it means which sequence or combination of neighboring elements produces the greatest sum. Try to solve this yourself before reading their solution. Remember you can prep something with multiple linear consecutive runs since O(n + n) is O(2n) which is of course O(n) and in this problem keep removing the negative first and last array element since why would you keep them. The solution they give is Kadane's Algorithm.

Next is 'PAPS' 5-Time Complexity here and the wall clock (using time() in the shell) is again mentioned but we'll learn you can't rely on wall clock time. Insertion sort we already did in the intro to CS courses where we learned for small data insert can be a constant operation so overall complexity only O(n) (since sort will always be O(n)). Page 91 he substituted ti for i and since this is a sum where the counter starts at 0, it's going to sum 0 + 1 + 2… up to n - 1 times and we already know a sum from 1 to n closed form is (n + 1)n/2 or here (n - 1)n/2 but we can already see the dominating power is going to be n2 and this is followed by an example how n2 is both an upper bound of n2 and n. The proof of the bound first he noted that if you multiply N >= 1 you get N2 >= N and this fact is used in the inequality to help reduce that side to 2N2 a trick we learn in The Mechanics of Proof.

The video they suggest goes through how to find function f(x) by analyzing the source. @11:03 remember you multiply everything in the body of a loop by the amount of times you loop. Without doing any of the analysis I can immediately tell this is n * c * n3 or n4

Caveats

  • not all loops perform an expensive operation during every loop so we will learn amortized analysis
  • a language based model of computation learned in Functional CS is the modern way instead of pointer machine or ram models that assumes everything is constant read/write
  • in virtually all papers I've read in CS they try and nail down constant factors by showing a multiplicative error or by taking it's log show an additive error and you often wonder where did they get that bound, if interested then take the advanced time complexity material below.

C++ STL

C++ has versions C++98/03/11/14/17/20/23 these all represent years there was a new standard. The current state of C++20/23 or 'modern' C++ is you can do functional programming with it. Options, lambdas, higher-order functions, monads and more all now exist in the definition depending which compiler you use. Google has rage quit C++ and is writing their own replacement called Carbon and as a result clang/llvm has not even implemented all C++20 features. If you want to use C++20/23 in some project you have to use GNU gcc or Microsoft msvc.

Competitive programming typically only accepts C++11/14/17 which we can refer to online documentation for thus we really only need to learn the STL or the standard template library. The STL to me seems like a big hack to make C++ a generic programming language a concept invented by ML in 1973. If you took the Standard ML course listed in the intro to this workship then generic programming is simply polymorphism or 'a types. If you took the accelerated intro course then generic programming is the same as 'type variables' in other words a function who's type is determined by whatever parameters you give it:

# a generic maximum function of any type T

fun my-max<T>(a :: T, b :: T)-> T:
  if (a > b):
    a
    # if not(a > b) then a == b or a < b   
  else:
    b
  end
where:
  my-max(2, 3) is 3
  # a is unicode 97, b is 98
  my-max("a", "b") is "b"
end

The C++ STL was written by Alexander Stepanov and if you look through the source of the STL you see a bunch of template<typename x, typename n,> type code. These are templates to enable generic types in C++ functions. He wrote a book about it and we can audit the entire book right now because it's only 200 pages or so. Go to a library genesis domain you have access to and get a copy of Elements of Programming.

TODO


Home