Software Development

Start here

I assume you are teaching yourself alongside some kind of intro course you found on YouTube, or a book/curriculum somebody recommended such as Accelerated Intro to CS.

Motivation

This is born from conversations I had with trading shop recruiters at work socials. They told me a tale of woe about how much harder it has become in 2025 to find good entry-level candidates from schools, so I asked them what a non-degree candidate would need.

Their optimal candidate:

  • has a very high standard of quality
  • knows about the CPU execution pipeline/cache memory hierarchy
  • knows their chosen language and can explain its runtime
  • knows the standard undergrad algorithm design and problem-solving strategies
  • is at least familiar with writing multicore code or can learn
  • knows what they don't know
    • they catch the candidate bluffing during the interview and ask them to define on the spot any jargon used

These places train everything else including whatever customized tooling/compilers or languages they use.

The work

There are many reasons why you'd want to work for trading firms (besides the money) and most of them, like Valkyrie, will hire juniors.

Every day you are the race car engineer of software. Similar to how they rebuild an engine and retune the car after each race to squeeze out more optimization, that's what you do at many of the fun firms. There's also work in building high-performance math/AI models, fancy client dashboards, or even FPGAs.

Recruiters

Jove Intl, Oxford Knight, Reload Search, and ex-employees who have quit to become recruiters. They hire internationally for Chicago, NYC, Austin, London, Sydney, Hong Kong, Mumbai, Shanghai, and Tokyo, and now Singapore seems to be where everyone is opening offices. LinkedIn is trash for applying directly but works fine as a net for harvesting recruiter offers.

Selection

I got to try their fancy recruiter tools to see the minefield of bad candidates they have to filter.

They told me all resumes are reviewed manually; there's no automated applicant tracking system. USACO training or a competitive background in anything from sports to math stands out because everything in finance is hyper competitive, and it shows you have some kind of drive and can teach yourself. Experience in 'modern C++', meaning C++20/23, stands out, but any firm will train you so long as you already know at least one language well. Never advertise yourself as a junior. Companies will hire anyone who knows how to sell themselves and can pass the competence filters.

Campus recruiting

Campus recruiting is getting more and more monopolized by big tech corps pressuring students with 2-week expiring offers (aka exploding offers) that are against campus hiring guidelines, but they do it anyway. Think about how much money is spent on campus engagement to find (highly paid) interns, and then they have to hope the intern comes back after graduating, and many don't. This should tell you how motivated they are to find junior candidates willing to stay and become senior staff, so if they find you as a rough diamond in a sea of mediocre candidates it's an easy investment for them.

Phone screens

The 'phone screen' today is typically a technical interview over Zoom or with CoderPad. I don't know if they still do this, but one Chicago trading company was notorious for firing off 3 problems to solve on HackerRank immediately after you applied online. Some I found also ask you to submit code for a problem they define on the careers page at the same time as your application, basically saying show us your best code.

The new fizzbuzz filter is to implement atoi. It tests how careless a developer you are, meaning whether you start writing code immediately without asking any questions about requirements. Here is a typical screening interview using CoderPad.
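To make the 'ask about requirements first' point concrete, here is a minimal Python sketch of atoi where every requirement question is written down as an explicit assumption. The name parse_int and the chosen behaviors (skip leading spaces, optional sign, stop at the first non-digit, clamp to 32 bits, return 0 when there are no digits) are my assumptions for illustration; in a real screen you would confirm each one with the interviewer before typing anything.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumption: result must fit a 32-bit int


def parse_int(text: str) -> int:
    """Parse a leading integer from text, C atoi style (hypothetical spec)."""
    i, n = 0, len(text)

    # Assumption: leading spaces are skipped (ask: tabs? newlines?)
    while i < n and text[i] == " ":
        i += 1

    # Assumption: at most one optional sign character
    sign = 1
    if i < n and text[i] in "+-":
        sign = -1 if text[i] == "-" else 1
        i += 1

    # Assumption: ASCII digits only, stop at the first non-digit
    value = 0
    saw_digit = False
    while i < n and "0" <= text[i] <= "9":
        value = value * 10 + (ord(text[i]) - ord("0"))
        saw_digit = True
        i += 1

    # Assumption: no digits means 0 (ask: should this be an error instead?)
    if not saw_digit:
        return 0

    # Assumption: out-of-range values are clamped rather than rejected
    return max(INT_MIN, min(INT_MAX, sign * value))
```

Each comment marked 'Assumption' is a question a careful candidate would ask out loud: what counts as whitespace, what happens on overflow, what happens on garbage input.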

On-site

If you pass the phone screen then you are paid to fly out and do a 'super day', where you do back-to-back interviews and job shadowing. You are often interviewing for every job in the company, not just the one you applied for, as they will see where you best fit. If they start asking you about math or Fermi style problems they may think you are better placed in trading or even in the research dept.

The strategy for passing these is simply to take your time and not make any mistakes. It is better to be glacially slow and pursue all potential problems as you reason through the problem space than it is to come up with the O(1) solution while missing a software bug they have to point out to you. Remember the most desired trait is a high personal standard of quality. Everything else is secondary and can be taught, but nobody can teach you to care about what you're constructing. Making a sloppy mistake like this is the same as a professional chef during an interview wiping his nose with a rag then using the same rag to clean customer plates before sending them out. It means you can't be fixed and it's over.

Bluffing test

You are tasked with a seemingly impossible problem to solve and repeatedly asked to justify all reasoning as you attempt to solve it. Don't worry, everyone is told their reasoning is wrong. You aren't failing the interview; they are seeing how you react. Someone used to bluffing their way through work meetings, suggesting solutions they don't fully understand, will be ruined here, and this is of course the point of the interview: you should know what you don't know. Candidates with an impressive pedigree may start acting very aggressive as they've never been told they're wrong, and the firm wants to see how you handle being told your idea is stupid.

Where do you learn how to justify your decisions? Game theory, which is a bag of analytical tools that teaches you about decision-making, and of course one of those large algorithms textbooks full of proofs, but nobody will expect you to know game theory or prove anything.

Eventually someone in these 'super day' interviews will go through your entire resume and test you on everything there, so if you write down that you know Java they will ask you to describe in detail how the language works. It's fine if you don't know, but say you don't know. Don't start bluffing because it will go very badly when they insist you define all jargon on the spot.

Quality test

The trap problems I've seen have an obvious area of optimization to take advantage of, but it will open a potential bug or performance sink elsewhere if you aren't taking your time thinking through the whole problem space and asking questions. One example is cache invalidation: the slop coder optimization you pounced on requires the cache to be invalidated because you changed some data dependency.
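A minimal Python sketch of that trap, using a made-up PriceBook class (the class, its methods, and the data are all hypothetical, not from any real interview). Caching the total is the obvious optimization; the trap is forgetting that every write to the underlying prices changes the data the cached value depends on.

```python
class PriceBook:
    """Hypothetical example: a cached aggregate over mutable data."""

    def __init__(self, prices):
        self._prices = dict(prices)
        self._total_cache = None  # None means "not computed or stale"

    def total(self):
        # The tempting optimization: compute the sum once and reuse it
        if self._total_cache is None:
            self._total_cache = sum(self._prices.values())
        return self._total_cache

    def set_price(self, symbol, price):
        self._prices[symbol] = price
        # The part the careless version forgets: the cached total depends
        # on the prices, so any write must invalidate it
        self._total_cache = None
```

Delete the last line of set_price and total() silently returns a stale number after the first update, which is exactly the kind of bug the interviewer ends up pointing out to you.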

Find a mentor

Once you are in and have developed a routine, meaning you are no longer panicking, and if internal NDAs allow it, try to walk over to the department where all the money is being made and find someone willing to teach you. Ask them to give you the work they don't like doing because you want to learn it all from scratch. All you have to do is be the reliable person they turn to whenever some (tedious at first) work needs to be done. If this work becomes increasingly skilled over time then you have found the right mentor and you are set for life, because they will eventually want you to work with them once you have proven to be indispensable. If they get promoted you get promoted; if they go off and start a new company you are the first hire. This can be tricky because of signing multiple NDAs even between internal departments, but if you are serious about wanting to learn they will tell management to make it happen. I took all the above advice from a comedian many years ago, and my life would not be the same if I hadn't watched that throwaway clip at 3 in the morning, because I always assumed there was no way I could ever do this kind of work.

Non-finance

If you're not interested in finance, many other companies mentor developers, sometimes even direct from local high schools, and build them into senior developers. One is in New Hampshire called Northwoods Software. If you live in Egypt, Georgia, North Macedonia, Albania, Latvia, or Bosnia you can work remotely for Scandiweb (Latvia). The pay is only 500-600 Euros per month but you don't know anything yet, so you are being paid to learn and become a senior developer.

Curriculum

Designing Software

We audit this to see how to design software people enjoy using and why certain software is successful and others failed. This is also what you would use if you wanted to write entire apps using Grok/ChatGPT/etc., because you lay out all the state and actions.

Optimized Local Software

How to design and build any high performance application and manage the memory yourself, ignoring operating systems as they are too old, not consistent, don't validate memory, stall threads, etc.

You can probably think of dozens of applications where there is a system that manages state such as a process scheduler, a logging program, a video game, the React js framework, a browser, an IDE, an iOS app, a network card, a router, a modern car, a jet, a space shuttle, a space station.

Optimized Distributed Software

How to design and build an online analytical system, or in other words modern distributed applications. This is where we learn low-latency networking, SIMD vectorization, queues, microservices, code generation, and system analysis, and we finish with the hardest software to build, which is query optimization: how do you take an instruction (or AI prompt) and produce the most efficient representation of it using real world cost models? We eventually get into statistics and queueing theory here.

Modern Architecture

  • ETH Zurich DDCA Digital Design & Computer Architecture

Very unusual hardware architecture course with YouTube lectures covering all the current security problems in CPU cores, DDRx and flash memory storage. You can learn how to program FPGAs if you choose to do any of the assignments, but this is only here to understand architecture as we go. I'm not doing any assignments/work.

Algorithmic Problem Solving

USACO training is excellent and it's completely free. This is where we learn how to solve seemingly impossible tasks. This is not leetcode. If you've never taken discrete math we learn that too. If you've never taken probability we learn that too.

Software Design

MIT has a course 6.1040 Software Design and the professor of that course wrote a book, The Essence of Software. See this video where the professor further explains what a concept is and how you would use a bunch of state machines to get an LLM to develop an entire app like iTunes. State in a concept means 'what it has to remember'. If you think AI can write anything well here's your chance to try.

Audit through these slides first then see the tutorials and case studies on his book website when you are planning on building something from scratch. You always learn through doing:

  • Diverge/Converge brainstorm ideas for what you want to make then turn those ideas into concepts and work out the interface they will each need. In the tutorial for divergent design on the prof's personal site he used an LLM to suggest additional features.
  • Wireframing recitation. If you go on Upwork or other freelancing sites you will see many jobs looking for someone to take a figma design and turn it into a working app. We can use a piece of paper to wireframe or hire someone to use Figma after giving the conceptual design plans.
  • (Optional) Watch the lectures on concepts see here, skip the 'message of the day', and scroll to the very bottom of the materials list where there are 6 lecture videos. There are UX design videos here too.
  • More Concepts and at the end of these slides you can imagine having a complete draft of an application that is now trivial to code into a minimum viable product (MVP).
  • Design moves how to unify or loosen concepts with many examples here like Zoom's screwy interface or Gmail labels.
  • API design what a RESTful API is.
  • Data design there are also recorded lectures here. NoSQL is introduced which you never want to use if not writing a temporary prototype as doing updates and inserts as JSON is a convoluted mess.
  • Make things learnable anyone should be able to figure out your software without overly relying on documentation.
  • Good UI/UX design basics.
  • Design innovation what was novel about Zoom? Why did everyone use it instead of FaceTime, Meet, Microsoft Teams etc?

It's worth reviewing the slides on ethical software where they show an example of someone who made a traffic routing app that ended up sending an army of jackasses speeding through some small residential neighborhood.

Here's the quick rundown:

  • Open the tutorials (book site) for reference
    • Look at the case studies as you go
  • Brainstorm ideas of things you want to make
    • Use AI to come up with use scenarios (features) you didn't think of
  • Converge these ideas into reusable concepts
  • Split concepts like in his examples where an expiring resource is a familiar concept and can be reused
  • Figure out what each concept needs to remember when it's used and what it consumes
  • Audit your concepts and see if there is an unwanted sync between them (loosen) or more needed sync (tighten). This is all in the tutorials/slides.
  • Write out a dependency diagram by going back to the slides and looking at what they did. Go by the criteria: A uses B only if A is simpler because it uses B and B is simpler because it doesn't need A. There is no useful subset containing A but not B. That means a legit dependency where A depends on B.
  • Transform to a database schema as he did with concept-driven data models though we will learn this later in detail
  • Implement the concept in software. Here is a TypeScript example.
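As a rough illustration of that last step (separate from the TypeScript example referenced above), here is a minimal Python sketch of the expiring resource concept mentioned in the rundown. The class name, method names, and the choice to pass the current time in explicitly are all my own assumptions, not taken from the course or the book. The point is only the shape: the state is 'what it has to remember' and the actions are the only way to change it.

```python
from dataclasses import dataclass, field


@dataclass
class ExpiringResource:
    """Hypothetical concept: a resource that stops being valid after a deadline."""

    # State: the only thing this concept has to remember is
    # resource name -> time at which it expires
    expiry: dict = field(default_factory=dict)

    def allocate(self, resource: str, now: float, ttl: float) -> None:
        """Action: start (or restart) the clock on a resource."""
        self.expiry[resource] = now + ttl

    def is_active(self, resource: str, now: float) -> bool:
        """Query: a resource is active only strictly before its deadline."""
        return resource in self.expiry and now < self.expiry[resource]

    def expire(self, now: float) -> list:
        """Action: forget every resource whose deadline has passed."""
        dead = sorted(r for r, t in self.expiry.items() if t <= now)
        for r in dead:
            del self.expiry[r]
        return dead
```

Because the concept knows nothing about what the resource actually is, the same code could back a session token, a held restaurant table, or a shopping cart reservation, which is the reuse the rundown is pointing at.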

Imagine if you were a wandering software salesman who could meet clients and then right in front of them use the above methods on a piece of paper to sketch an early design of the MVP features for the software they want. The author has a second book about using lightweight formal methods to prove a system you design is correct, such as a security policy, making sure there is no combination of states that leads to a problem.

Version control

The prof of the MIT design course created gitless.com with a grad student, see this research page. Read how they audited the conceptual design of git and found things like staging were confusing. If you still want git resources then try MIT's version control lecture from its course The Missing Semester of Your CS Education or read 'Git for Hackers'.

Many companies today use a customized Mercurial fork/clone instead of git, like Meta's Sapling SCM (partly written in Rust) and its EdenFS virtual file system, because git was originally developed to take advantage of Linux file system optimizations, ergo is conceptually attached to only that file system. As a lone developer Mercurial is dead simple to use.

OSS contributing

If you want to make a Pull Request (PR) to an open source project hosted on GitHub this workflow explains how it's properly done.

The unwritten rule for this is don't play code golf, meaning messing with whitespace or someone else's code for no other reason than to make it shorter or to change it to your preferred style. I used to break this rule all the time, not knowing the existing code is likely written that way for a good reason. Code golfing is hated by project maintainers.

The PR should be as small as possible or they won't review it, so break something large into smaller PRs where each is self-contained and adds benefit they can clearly see. If you fix a bug include a test for it, by finding where all the tests are kept (read the contrib docs), so it doesn't return when someone changes the code later. Properly benchmark whatever you change too, to make sure you haven't screwed over the project by reducing performance. They will probably have instructions on how to do this.

Free Software Foundation (FSF)

The GNU/FOSS way uses Savannah as a GitHub GUI replacement, for example here is a patch, but all the work is done in plain text using a traditional mailing list style of attaching files, and people vote on/review it. Richard Stallman's ideology of free software means no corp can use this software to lock you in their prison, and you will get more contributors to help you because whatever you're doing can't be undermined by some proprietary fork. See software licenses below.

The best part about AI generated code is it steals licensed code, so you can't use it in any kind of serious project because there is a good chance GNU licensed code, or even worse patent lawsuit code, is being used. One day they will get around this with better learning but not today.

Freelancing locally

You can become your own Palantir if you take the advanced DBMS CMU course below and become the local business intelligence and efficiency developer. That means you are consulted to extract data out of old company systems by optimized methods, transform this data into a form that Databricks or Snowflake can analyze, and set all of this up in a centralized dashboard. Then business suit guys can do forecasting or machine learning, and there is huge demand for this.

You could also start out as the local Zapier style developer and stitch together software people are already using and prefer, like QuickBooks for accounting, Zoho/Salesforce for CRM, or iCal/Google Calendar for bookings. You sew that all together into a custom product for them, plus you are a local they can actually talk to and meet. If they like your work they will tell every other business owner they know.

An example is a centralized desktop that combines all 3rd party online ordering systems for a restaurant. If you walk into one today you will see multiple tablets/laptops all provided by different delivery companies, and it looks ridiculous when the employees try to manage them. These gigwork outfits all have an API, so you can create a single dashboard and eliminate that row of tablets by the cash register they have to juggle. It had better be correct though, because you don't want the restaurant missing orders. As the local software glue freelancer you can also make sure all this order data gets sent to their CRM or accounting software too.
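A minimal Python sketch of the core of such a dashboard: normalize each provider's payload into one common order record, then merge without duplicating or dropping anything. The two providers and their payload fields ('id', 'total_cents', 'created', 'ref', 'amount', 'timestamp') are invented for illustration; a real delivery API will have its own schema you would read from its documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    source: str       # which delivery service the order came from
    order_id: str     # the provider's own identifier
    total_cents: int  # money kept as integer cents to avoid float errors
    placed_at: int    # unix seconds


def from_provider_a(p: dict) -> Order:
    # Hypothetical provider A already reports integer cents
    return Order("provider_a", str(p["id"]), int(p["total_cents"]), int(p["created"]))


def from_provider_b(p: dict) -> Order:
    # Hypothetical provider B reports a non-negative decimal string like "12.5"
    dollars, _, cents = p["amount"].partition(".")
    total = int(dollars) * 100 + int((cents + "00")[:2])
    return Order("provider_b", str(p["ref"]), total, int(p["timestamp"]))


def merge_orders(seen: set, incoming: list) -> list:
    """Return the not-yet-seen orders, oldest first, and remember them in seen."""
    fresh = []
    for order in sorted(incoming, key=lambda o: o.placed_at):
        key = (order.source, order.order_id)
        if key not in seen:  # polling the same API twice must not double an order
            seen.add(key)
            fresh.append(order)
    return fresh
```

Keying on (source, order_id) rather than order_id alone matters because two services can easily hand out the same identifier, and collapsing them would make the restaurant miss an order.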

What to charge

There's a contractrates.fyi site to view project/hourly rates for freelancers, though charge whatever you want at first then keep upping your daily rate. Never charge hourly if you can avoid it. If you have too much work then your rate isn't high enough.

A common method of billing is for deliverables on a set schedule (daily, weekly). You never want to just take off and spend months building software by yourself. If you're a web developer then adding authentication is a deliverable, adding a working catalog for some online store is a deliverable, etc.

Freelancing research

Here's what I found out by walking into places and just talking to people. Every one of them had a problem with existing software they used and none had ever met anyone locally who wasn't some SaaS sales guy pushing an existing product. All of them wanted a software mercenary which unfortunately I'm not but thinking I probably should be after seeing how easy it was.

Salons

An example using a nail salon. At the time of this writing their website uses a 3rd party hosted form for appointments, which means they have to phone or text/email to correct any overbookings. This can all be automated easily to show blocks of time that are already reserved and sync with some iCal protocol. Every one of these modern salon businesses rents out chairs to independent freelancers, so they needed a fast way to add/remove names on the booking calendar as these freelancers move around cities every month. They also wanted deposits for appointment bookings, but nobody offers this without a high monthly SaaS cost. They wanted to offer a gift scheme where someone can pay for a service and give it away, but needed a way to know if these credits were legit if they started selling dozens of them, which is another software problem.
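The overbooking and chair rental problems come down to one small check. Here is a minimal Python sketch, with a made-up ChairCalendar class (names and behavior are my assumptions, not any salon product's API): each freelancer has a list of reserved time blocks, and a new booking is refused if it overlaps an existing one.

```python
from datetime import datetime, timedelta


class ChairCalendar:
    """Hypothetical booking calendar keyed by freelancer name."""

    def __init__(self):
        self._bookings = {}  # name -> list of (start, end) datetimes

    def add_tech(self, name: str) -> None:
        # Freelancers come and go, so adding one must be a one-liner
        self._bookings.setdefault(name, [])

    def remove_tech(self, name: str) -> None:
        self._bookings.pop(name, None)

    def book(self, name: str, start: datetime, minutes: int) -> bool:
        """Reserve a block; return False instead of overbooking."""
        if name not in self._bookings:
            return False
        end = start + timedelta(minutes=minutes)
        for s, e in self._bookings[name]:
            # Two half-open intervals [start, end) overlap exactly when
            # each one starts before the other ends
            if start < e and s < end:
                return False
        self._bookings[name].append((start, end))
        return True
```

Treating blocks as half-open intervals means a 10:00 to 11:00 appointment and an 11:00 to 11:30 appointment do not collide, which is what a salon expects from back-to-back bookings.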

If your software works for one salon you can sell it to every other in your city. If they're all running your software there's other money generating ideas you can try like a B2B sales backend between all the shops, some kind of employee shift bidding software so if one salon needs temporary extra help another can rent them out, maybe nail techs want to teach and add that service to be booked. Once you hit critical mass in your area hire someone else to spread it to a new city.

Abstract this and think of all the service businesses that use similar software: tattoo artists, plumbers, window cleaners, junk removal. Any service is all the same really. They either have set rates and it's all done through appointment software, or they harvest quotes and need some kind of form so the person can upload details and pics.

Restaurants/Bars

Restaurants I found use at least 3-4 different software vendors. They all prefer handwritten order tickets because any consumer electronic touch screen is too junky and slow. The wait staff manually enter orders into a restaurant management system after writing them down. These systems are not consumer grade: they have high-volume touch screen hardware and are impressively optimized, so there is never any lag using the UI. This kind of software would be impossible to compete with, as tax authorities have to certify it to avoid classic service industry sales tax fraud, where the restaurant generates a smaller internal bill but presents the customer with the real bill containing much more sales tax and pockets the difference. These systems all have open APIs that can be interfaced with.

The second software they used was 3rd party table reservations, which we can compete with. They absolutely hated every option because of ripoff fees like a 10-20% 'service charge' being added on, which customers mistakenly thought was the tip but was actually the scam fee for the online booking service. Prepaid reservations reduced no-shows, and only 1 online service they knew of offered them but charged way too much. This is yet another easy integration with Stripe or whatever payment gateway they wish to use.

The remaining software was online orders/main website, which usually was just an Instagram page with a link to Uber or a similar gig service for delivery or pickup. As noted before the #1 request was to centralize all these 3rd party order sites into a single dashboard. Another request was some kind of dashboard for reputation management and a way to capture and control reviews, meaning a bad review gets directed to management to correct and a good review is redirected to Google reviews or elsewhere. There are schemes for capturing these reviews, such as emailing customers a yes/no question about the experience: if they click yes redirect to a public review site, and if no redirect to contact the restaurant manager. I'm sure there are better techniques.

I was asked how to automate ticket sales for special events, add merch sales, automate volumes on TVs showing sports games so the commercials were muted and a music playlist started automatically, and automate lighting. They also asked if a customized POS is available (Stripe Terminal) to capture signatures so they don't have to dig them out for disputes months later, and many other problems. This was a gold mine of potential freelance work I didn't expect.

Tailors

If you look at Huntsman suit tailors on Savile Row they have an interesting website where you can click Made to Order and customize almost everything. This feature was easy to write and I used an AI image generator to make the drawings of the clothing styles. I approached some tailors locally and they said this is exactly what they wanted, plus some kind of Zoom/FaceTime booking for remote consultations.

Design examples

Let's look at a web design award site. Here are some nominees we can mercilessly review using the conceptual design skills we just learned. You can use DevTools in Chrome or Inspect in Firefox to see responsive mode (a phone/tablet screen).

The purpose of this site is a portfolio showcase for interior/exterior designers and the website itself was created by an agency so I'm reviewing the (April 2025) website not the business.

Intros are cancer but this one is short at least. You can change the layout (desktop view only), so a concept with no purpose. The 'About' page doesn't use a scroll concept indicating there is more to see, which we'll see other sites do. The reel is laggy/broken on desktop: I got a white box that expanded to fill up my screen and only noticed it had content when I went back to it after. This reel of their work should be on the landing page instead of the wojak drawings. There's a FAQ concept here with no purpose at all. A FAQ is frequently asked questions like how does the design process work, what can I expect working with you guys, etc. It is not 'What made you laugh today?'.

Desktop view when you click on their portfolio of work you get an epilepsy triggering overly sensitive scroll that confusingly begins halfway through their portfolio and bombards you with rapidly changing images. Responsive mode uses a totally different concept that works well so 2 concepts, 1 purpose. The contact page should be some kind of form for directing the communication. I would imagine all their clients are architects who personally know them or city hall managers and this is just a lead generating page and portfolio.

There's a journal with no recent updates which tells me they had a limited contract with this developer and now that it's over nobody knows how to update the journal. This is the curse of all bespoke software and why most businesses prefer to simply have an Instagram or Facebook page because then at least they can update it themselves. A good developer (that's you) would make this easy for them to update.

The purpose of this site is a freelance web designer showcasing a portfolio.

Another site where the desktop version is broken, here it is missing the color highlighting on the logo and text that the mobile version has. The scroll circle concept on mobile is broken it shows you've fully scrolled after the first page but there's multiple more scrolls left as you end up jumping to every other page. The menu is now a useless concept if you're just going to scroll through the whole site tree. Desktop view if you click on service the case study hijacks the pointer with this gigantic floating box preview which is awful. The contact page is amusing: 'Contact us for crime prevention service' wait, what? I can summon Batman? No mention of cyber security in the service section. There's terms and conditions you have to agree to before submitting the contact form which is strange but maybe law in Japan. Their website design work in the portfolio is pretty good I would have made that the focus of the site and eliminated all the eye searing infinity scrolling animations and pointer hijacking.

They are recruiting so if you're in Japan do a complete conceptual design analysis of this site and offer to fix it if they hire you.

The purpose of this site is a trade catalog for bulk orders by architects and designers and some agency built their website.

The desktop view is guilty of the worst internet crime, which is a scroll that keeps repeating itself for infinity, whereas the phone view is nicely done and we finally get a scroll bar, wow. I gave up reviewing the desktop version; it's totally unusable and broken everywhere. The most critical part of the site, 'About Monolith', is only accessible by scrolling the front page and is not in the menu despite the title 'Site links', so we have another menu concept at the top of the page with no purpose. 'Monolith Trade' oddly redirects to a 3rd party form that only asks for an email, not sure why it exists with the contact email beside it. If you want to capture emails for Mailchimp style marketing you don't need this $50/month form.

The desktop versions of almost every site reviewed so far have this awful concept with no purpose of mirroring the pointer movements with a very slow animation, making the site seem sluggish.

Review the Reviewers

Members of the award site can vote on and rate other sites. Let's review the reviewers to find someone who is actually honest, as most seem to give every site too high a score in the hope that the favor is returned when they post their own websites for nomination. In the ratings for the Monolith site there is only one honest review, from this Bulgarian developer, with everyone else giving it an 8+/10. Looking at her site it's pretty good: for one there's a real scroll bar, and everything has a purpose, as this is a mix of a personal and business site which I would never mix but whatever. She clearly lays out the development process in the FAQ, which I would promote to the front page, and it is an actual FAQ unlike the other site we reviewed earlier. The desktop view is not broken, a first!

Another honest reviewer is this Russian design outfit. I am again shocked the desktop view actually works and there's even a scroll bar! His site does only what it needs to do which is show off skill and present contacts to harvest leads.

Design examples II

Let's look at the CSS Design Awards.

Some of these sites are foolishly using Trustpilot widgets when your policy should be to ignore that site completely. Let's imagine I run a review site. I add your service without you knowing, then I can leave some bad reviews even though I've never used your service. I contact you and say 'hey there's some bad reviews here of your business'. You want to comment on these reviews to point out they are fake, but to do so you have to agree to let me use all your logos and branding before you can comment. Now I have an official looking page with all your graphics on it licensed for free, forever.

The purpose of this website is a portfolio for a photography biz and some agency created this website.

The intro finally has a real purpose! It sets up the portfolio and displays that before anything else. The desktop view cookie and privacy box is non-intrusive and well designed instead of those awful ones that take up half the screen. The scroll concept on the mobile view is well done and the desktop view has a real scroll bar. When you click on a portfolio entry the concept they use to see more photos tells you the amount which was taught in the MIT course about adding information bits to the design. Everything here has a clear purpose. The contact form is actually useful. The animations are useful and not overused, finally a good website. The agency that designed this website is here.

These website award sites are filled with freelancers and agencies, most of them front end designers. You know how to properly design software and can build it from scratch. They can make it come to life with CSS and animations. Why not contact some of these people and show them your software skill portfolio? If you need inexpensive or free hobby hosting then look at the deployments for Neon, which is a serverless PostgreSQL DBMS with a free tier. If you need a domain try Porkbun.

Marketplace case study

Let's see what's going on here at the Shopify app marketplace.

Forms are a major problem as Shopify refuses to fix their own form offerings, so you have to pay some 3rd party. There is a total shit show of inept offerings except for Hulk Form Builder, which seems to be the most popular. No surprise why they are popular: they offer a premium onboarding service where you book a phone call to have the form installed.

Looking through the Form Builder site, they are owned by someone else. We found the final boss of the monopoly on Shopify apps: it's this company, which offers to buy your app. You could make a startup by finding an app/ability they don't have listed here and monopolizing that niche to get them to buy it. They have some affiliate program where developers can tout any app they own and make shill bux too.

Let's review this app. It's recent so we can judge how to start one of these. Its concept is to add a progress bar for free shipping/rewards, a classic sales-driving tactic. Looking at the reviews, of course the biggest problem once again is not offering to onboard customers with custom installation. One review said this app causes a lot of debug spam and breaks some product pages, which is a good reminder of the unintended effects your software can have. A store called Fuffies is using it and their review laments the app is too slow to reflect changes, though it seems the guy developing the app manually fixed the problem, which again would have been solved by offering installation. Looking at the Fuffies shopping cart to see the progress bar in action I found an issue: there should be some kind of threshold/rounding config available. For example at the time of this writing if you buy 2 of her stuffed piggu you are $0.02 short of getting free shipping which is sure to induce rage for all customers.

Maybe the progress bar concept for free shipping shouldn't be in currency but space available in the package instead. For her store she could have a picture of an empty box in the checkout then when a stuffed crab is added to the cart it's placed in the box showing extra room available. Buy more and the box is full with free shipping unlocked. Internally it can be a numerical threshold but externally a better concept instead of 'sorry you are 2 cents short for free shipping'.
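The threshold/rounding config could be as small as the sketch below. Everything in it is hypothetical: `shipping_progress`, the $50.00 threshold and the 5 cent tolerance are made-up illustrations, not anything from the Shopify API or the app being reviewed. Money is kept in integer cents so there is no floating point rounding to argue about.

```python
def shipping_progress(cart_cents: int, threshold_cents: int, tolerance_cents: int = 0):
    """Return (unlocked, fraction_full) for a free shipping progress bar.

    All names and defaults are hypothetical; amounts are integer cents.
    """
    # Carts within the tolerance of the threshold count as having reached it.
    unlocked = cart_cents + tolerance_cents >= threshold_cents
    # How full the "box" is, capped at 1.0 once shipping is unlocked.
    fraction = 1.0 if unlocked else cart_cents / threshold_cents
    return unlocked, fraction


# A cart that is 2 cents short of a $50.00 threshold.
print(shipping_progress(4998, 5000))                     # strict: still locked
print(shipping_progress(4998, 5000, tolerance_cents=5))  # forgiving: unlocked
```

The returned fraction is what the 'how full is the box' picture would be drawn from, while the cents stay internal.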

Marketplace strategy

Let's say we want to sell an app using the Shopify marketplace.

The easiest way to start would be to simply talk (see startups below) to every existing Shopify store owner you can find and learn about their business to discover ways software can help them.

If whatever you decide to make has competitors look at their pricing schemes. You don't want to undercut their pricing you want to match their most expensive pricing but justify it through superior product and service. We are in the B2B software game it's not like selling shoes where you can just discount the price because these companies will pay premium if you offer a premium solution.

While you are looking at competitors write down their existing features and go through their reviews writing down what every wished for feature was. You don't implement all these features you do conceptual design which is to arrange all the similar features together and think of some concept that combines them into one. Write down everything they dislike too which is going to be many complaints about install glitches that require manual intervention and imprecise estimates for support replies.

Whatever you make it seems to be absolutely critical to offer a booked phone call where you can fully integrate your app with their design because if you don't it will become a support problem anyway. If you are talking to the owner which you likely will then you have yet another opportunity to ask them about their business. You don't want to rush through the install panicking that's not how B2B works you build relationships which pay off over time. Where I work there is a fleet of account executives and their sole job is to work directly with the buyers of our specialty trading software and maintain a relationship.

Another way of deciding what to make is to use the same analysis we learned in the MIT course: take all the available apps in the marketplace and arrange them by concept to see if any of them have overlapping purposes. Where two overlap, there's one app you can build that combines them and your customers save money by only needing to buy one app instead of two.

Startups

Here's Peter Thiel giving an old but very good crash course in startups to Stanford students. Start small and monopolize something small. Here's an old article of Sam Altman of OpenAI (now a monopoly too) talking about how Y Combinator's success is due to it being (surprise) a monopoly.

As a founder all you do is talk to people because you'll be able to spot opportunities where software is useful. In that video he suggests never asking them 'what features do you want' instead asking them everything about their job and you as the startup founder can figure out what the features need to be.

Getting good at writing software

Here is some recent advice from antirez a hacker from Sicily who created Redis (now an OSS disaster, see licenses below) on how you get good at building software:

  • 1. Learn algorithms, buy some book and start reading.
  • 2. Learn theory of neural networks, use the Chollet book to develop some knowledge and intuition.
  • 3. Write many small "pet" programs implementing basic stuff: implement a small database, a small programming language interpreter, a small editor, a small neural network, … Each time try to apply good design.
  • 4. Embrace simplicity in everything you write, don't make things more complex than needed.
  • 5. Read good quality code, especially read open source code which is at a degree of complexity that makes reading valuable. Don't read Kubernetes… or PostgreSQL perhaps. Learn more self-contained code bases.
  • 6. Participate to some OSS project.
  • 7. Start some side-project and put it on GitHub, where you are the main designer. Develop something that you need. A library, or a small utility, something that you really enjoy doing. Do it at a quality level that makes you happy. Never regret pushing code on GitHub if you feel that for the level you are right now it is appropriate.
  • 8. Don't give a fck about what other programmers think of your work, if you did it at the max level of what you can (currently) do. Anyway most programs on the Internet suck, including the ones of people feeling very competent.

Here is advice from Citadel for developers on how to 'maximize their effectiveness':

  • Exceptional subject-matter expertise in a specific field like database engineering can give you a competitive edge in a crowded field.

If you want the so-called soft skills that developers should have simply read any of those meme books by Jocko Willink about extreme ownership. Yes that sounds like a ridiculous idea however there is nothing worse than having a manager or coworker who blames everyone else. I had one that would never let you get a word in and talk over you coming up with a thousand excuses per second whenever some issue was raised. Everyone hated him.

Licenses

Antirez sold all rights to the Redis software years ago and that new company went private in 2024 taking control over all the libraries and clients to lease it to Microsoft or something. This pulled the rug out from all the contributors who went ballistic about it but the license permitted this to happen.

OpenBSD has a good breakdown of licenses here where they promote the use of the ISC license which best suits their project. They note there are even problems with a public domain license. Most open source projects have some kind of dual license like Rust with their MIT and Apache licenses that protects contributors to your project from you later turning around and suing them for patent infringement. Patent trolling is rife in software. For example you will never see bid/ask displayed as a single column with the two converging in price in the middle. These are always shown as parallel columns for a good reason: some clown who somehow owns a patent on that convergent design will sue you.

The full list of licenses you can use for an open source project is here. If you want to keep it open source forever and prevent any proprietary forks undermining your project then use the latest GNU/FSF license. An example of a proprietary fork is when Apple yoinked the Packet Filter Firewall from OpenBSD and then wrapped their proprietary license around all future enhancements which locked out the original developers.

Modern Architecture

Today's hardware has serious problems outlined in this course:

CPUs

Defensive programming against the hardware doing data corruption is now a thing:

  • Adding more CPU cores is useless as the primary bottleneck is memory/storage
    • The cores themselves are broken due to 'silent data corruption' and this may be exploitable
    • Nobody knows why this is happening (high temp? silicon defects??)

DDRx memory

Rowhammer was originally a tactic to open/close a DRAM row repeatedly like you were pounding it with a hammer which triggered charge leakage in adjacent rows. Bits got flipped and you could exploit this. This problem has only become worse since it was discovered and now similar attacks can be done with WebGL, remotely over a network, or against a VM host and all its guests, and the mitigations so far don't work.

  • DRAM scaling can't continue due to rowhammer style attacks (memory read disturbances)
  • The more recent a DRAM chip is made the less reliable against read disturbances
    • Periodic refresh has been lowered by mfgs to avoid performance overhead and enables these attacks even more
    • First generation read disturbances could only affect adjacent n+1 or n-1 rows, now they can affect many rows
  • DDR5 mitigations enabled new DoS attacks, performance problems
    • The mitigations don't even work as they rely on observing high row activation count but now we have low activation attacks (rowpress)
  • The latest read disturbance attacks are not even rowhammer in nature and affect different rows/cells with an unknown underlying error mechanism only physics can explain
  • Memory controllers are not intelligent and need to be rewritten from scratch (that's you writing a FPGA)
  • Network cards where DMA is exploited (as taught in 15-721) are open to remote memory read disturbance attacks (another FPGA you may be writing)

Flash Memory

  • The charge leaks in flash memory too as they're also now so small everything interferes
  • High retention errors even with <1 year flash memory compared to SSDs/NAND flash years ago
  • Flash memory controllers are intelligent and can remap data to prevent data degradation but very few controllers do this (yet another FPGA you'll have to write)

Digital Design and Computer Architecture (DDCA)

There is an excellent ETH Zurich undergrad course that is fully open and doesn't require a physics background. It focuses on modern and future hardware, I guarantee you will like this course it's not like any other architecture courses I've seen. The lectures are live streamed with breaks included so aren't 2hrs long you can skip at least an hour per lecture.

Casually watching the YouTube lectures you're interested in and reading some of the papers is enough but there is also an opportunity to try some FPGA programming with the labs. Vivado software is used however you can also use OCaml to run similar simulations since Vivado is a ridiculously large download (60GB!!).

I'm not going to go through this course since anyone can and you should take what you want. Everything you want to know about the cache or CPU execution pipeline is here in excellent detail. The grad version (called 'Computer Architecture') if you take it is amazing how he shows new designs and builds memory controllers to help counter memory disturbance attacks.

Check his YouTube playlists for the latest 2025 version of DDCA but there's little change in the introductory course.

Optimized Local Software (15-445 Relational DBMS)

This is the only official requirement for 15-799 Query Optimization which we transition to eventually.

Lecture 00

Watching lecture #00 from 15-445 @6:43 he says non-CMU students can complete all assignments using Gradescope and even hands out a code we can use if you're interested. There's a public discord channel too. The assignments are in C++ and their C++23 bootcamp is pretty good. BusTub assignments you have to read existing code and write features so it mimics what you'd do in open source development or a job. Majority of this lecture you can skip it's all logistics after the intro.

I won't be doing any assignments because I'm using this material to build my own customized dbms from scratch and I assume you are too or at least some major parts of it.

Lecture 1

TODO

Optimized Distributed Software (15-721 Advanced DBMS)

In this course the data is no longer locally stored and lives in Amazon S3 buckets. How do you engineer so that you are moving around the least amount of data and at the same time maximizing the network connection?

Lecture 0

Watching lecture #00 from CMU's 15-721.

He lists the reasons why we should take this course and we are taking it for the second reason which is if we are good enough to write code for an online analytical system then we can write code for anything else that cares about performance. Most of the second half of the lecture is course logistics we don't care about.

If you look at the projects, students had to build from scratch some part of a modern OLAP system like the scheduler, execution engine, I/O etc.

GlareDB

Every year CMU does a database seminar with the lectures uploaded to YouTube and there's a very good one by GlareDB how they built a product using Datafusion as their execution engine then quickly found out after experiencing dependency management hell they needed to rewrite it all from scratch. The video is worth watching since this is what we're doing but of course on a much smaller scale. The work in progress code is here written in Rust.

Lecture 1

Watching lecture #01 from 15-721 explaining what a modern OLAP system is.

The first paper assigned is by Databricks where they shill for standardized, open data storage formats like Parquet which is taught in the next lecture I believe. All these terms like 'Lakehouse' are just marketing but means computation is done near the data. The second paper is written by Facebook developers and advocates that the architecture of OLAP systems should be open source modules working together instead of reinventing the wheel with every new database system as there's now dozens of startups causing fragmentation.

Materialized views are precomputed queries like SQL aggregations (sum, max, avg) that have been cached so you don't have to run the query again. S3 is referring to Amazon S3 or 'Simple Storage Service' which is a giga cloud for storing immutable data in any format like CSV, JSON or Apache Parquet format. Objects are organized into 'buckets' and each bucket is created in one AWS region; the Chinese regions sit in their own aws-cn partition. Salesforce has 100+ Petabytes on S3 that they do analytics on.
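To make the materialized view idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no native materialized views, so the cached table below is only a stand-in for one, and the `trades` and `trade_summary` names and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("AAA", 10), ("AAA", 30), ("BBB", 5)],
)


def refresh_summary(conn):
    """Recompute the cached aggregate table (our stand-in for a materialized view)."""
    conn.execute("DROP TABLE IF EXISTS trade_summary")
    conn.execute(
        "CREATE TABLE trade_summary AS "
        "SELECT symbol, SUM(qty) AS total, MAX(qty) AS biggest, AVG(qty) AS mean "
        "FROM trades GROUP BY symbol"
    )


refresh_summary(conn)

# Later reads hit the small cached table instead of re-aggregating trades.
rows = conn.execute(
    "SELECT symbol, total, biggest, mean FROM trade_summary ORDER BY symbol"
).fetchall()
print(rows)
```

The catch is staleness: when `trades` changes somebody has to call `refresh_summary` again, which is the bookkeeping a real system does for you.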

@30:58 the architecture overview breaks everything up into 6 parts and we will learn all 6 in this course with a focus on the query optimizer as he says it's the hardest to build. @53m POSIX API means your operating system API which are syscalls like reading/writing files to a virtual file system. POSIX is supposed to be a standard to maintain compatibility between operating systems but I don't even think Linux is POSIX certified so you still have to do platform testing anyway. At the end he talks about how Yellowbrick did some fintech tricks to speed up fetching s3 data.
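As a rough picture of what 'the POSIX API' means in practice, the sketch below goes through Python's os module, whose functions are thin wrappers over the file syscalls named in the comments. The scratch file and its contents are arbitrary, and this only illustrates the calls themselves, not how any particular DBMS does its I/O.

```python
import os
import tempfile

# mkstemp picks a safe scratch location and hands back an open file descriptor.
fd, path = tempfile.mkstemp()
try:
    # write(2): push bytes through the file descriptor.
    os.write(fd, b"hello posix")
    # lseek(2): move the file offset back to the start.
    os.lseek(fd, 0, os.SEEK_SET)
    # read(2): pull up to 5 bytes back out.
    first = os.read(fd, 5)
finally:
    # close(2) and unlink(2): release the descriptor and remove the file.
    os.close(fd)
    os.unlink(path)

print(first)
```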

Lecture 3

TODO

Algorithmic Problem Solving (USACO)

Should you take this?

Any modern software interview is now almost exclusively system design which we are already learning from the above Andy Pavlo dbms courses. The data structures learned in those courses are really all you would ever need to pass any phone screen where they want you to make a queue or hash something and all the typical strategies like dynamic programming are also covered. So why would you want to take this?

You take this if you want to (eventually) become very fast at solving seemingly impossible problems or you want to learn a traditional algorithms design course. The best part about competitive programming is you learn the algorithm as you write it.

Plan

We are going to use CodinGame (free, 25 languages). It's designed so you learn as you go and Coderpad is likely the interview software they'll ask you to use (Apple, Jane Street, Nasdaq, Shopify, Snowflake) and many interviewers at companies are lazy so they will reach into the bin of premade interview tasks which are designed by the same people who make CodinGame.

The USACO Guide (free, any language) solution writeups to competition problems are a master class in algorithm analysis and the level of difficulty there is very similar to 'superday' interviews where they purposely give you something that seems impossible to solve just to see your logic process. A competitive programmer needs to know about strategies to solve problems so why not learn from the pros. There is an online IDE to submit solutions in C++/Java/Python3 or you can paste in source code to the judge on the contest pages.

To properly utilize some AI like Grok/ChatGPT to help you with these USACO problems you would prompt it to find any patterns instead of directly asking for a solution. Whatever solution it gives will just be some stack exchange post anyway and likely not what you're really looking for.

Optional

These help us complete the above but remember all of this is optional anyway:

Po-Shen Loh's Discrete Math 2021 lectures on YouTube because these are problem solving seminars and everything there directly relates to CodinGame/USACO problems.

The course notes from whatever latest semester of CMU's 15-451 Design and Analysis of Algorithms are almost a complete book and seem to be a modernized supplement for CLRS (Algorithms by Cormen, Leiserson, Rivest, and Stein) and DPV (Algorithms by Dasgupta, Papadimitriou, and Vazirani). They help to understand the proofs in the USACO solutions.

The chapters on Discrete Probability from the book (free) Intro to Probability for Computing simply to understand expectation and z-transforms to solve recurrence relations. This is a workbook not really a textbook.

Select chapters from the book (free) Calculus Made Easy for the sole purpose of understanding integration for probability.

Languages

The plai.org workbook on languages teaches the standard model of programming languages using a web-based stacker to see exactly how programs execute. If you look at the Stack Overflow yearly developer survey you will discover almost every language on that list is nearly identical internally once you read the PLAI chapters on SMoL.

CodinGame supports Haskell, OCaml, F#, Rust, Scala, JS/TypeScript, C#, Lua, Swift, Python, C/C++, Java, Clojure, Bash and some more. USACO options (C/C++17/Java/Python3) are explained in the general section of the guide but you can always write the solution in any language because we are given all the test inputs/outputs.

  • Lua is probably the simplest to get started since it's already used for game modding and everywhere else for wrangling configuration files.
  • Python resources are everywhere like this CMU 15-112 schedule.
  • OCaml you can learn here or here with modified libraries.
  • Haskell is introduced here with category theory.
  • To learn C there is Modern C and IYMLC.
  • To learn C++20/23 try the CMU bootcamp or NVIDIA slides.
  • To learn Rust try the Brown University experimental Rust book.
  • TypeScript software engineering is taught by MIT here.
    • Java versions here.

You typically are only using a very small subset of any language you choose. This is not professional software engineering, we are hacking problems to get better at problem solving.

Parity

CMU's 2021 Discrete Math (combinatorics) course is fully open and we may as well take it while we complete USACO. Graphs, trees, permutations, generating functions, it's all explained here.

Discrete Math lecture 1 the basic multiplication principle, parity, if order matters you can always reorder something, how to guess and try different cases, how to reduce the problem to something easier, recursion to generate the next solution.

Vectors/Matrices

We need some linear algebra and will take it as it comes up.

Watch through some of WildLinAlg1 so you can see a vector is an abstract object with a direction. @17:36 covers scaling, addition, what a basis vector is and then the classic problem of linear algebra which is solving a system of linear equations and if you're interested he sets this up from special cases to the general case.

Watch through some of WildLinAlg5 specifically the vector interpretation of a linear system, change of coordinates where he writes out a complete generalization of using abcdef coefficients sets up matrix array notation and vector column notation. A matrix can be an encoded function and multiplying it by a vector is like giving it input. Multiplying two matrices is function composition. Rest of this series is optional I will go through it in AI and Math workshops eventually.
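The 'matrix is a function, multiplication is composition' idea is easy to check by hand or with a few lines of code. This is a small sketch with made-up 2x2 matrices (`rotate` and `stretch` are my own examples, not from the lectures) using plain lists so nothing needs installing.

```python
def mat_vec(m, v):
    """Apply matrix m (a list of rows) to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def mat_mul(a, b):
    """Matrix product a*b, pairing each row of a with each column of b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


rotate = [[0, -1], [1, 0]]   # quarter turn counterclockwise
stretch = [[2, 0], [0, 1]]   # double the x coordinate
v = [1, 1]

# Feeding v through stretch and then rotate ...
step_by_step = mat_vec(rotate, mat_vec(stretch, v))
# ... matches feeding v to the single composed matrix rotate*stretch.
composed = mat_vec(mat_mul(rotate, stretch), v)
print(step_by_step, composed)
```

Note the order: `rotate*stretch` applies `stretch` first, the same way f(g(x)) runs g first.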

You could obtain Evan Chen's 18.02 notes skip to 'Part Alfa: Linear Algebra of Vectors' for a good crash course but the linear algebra used in the CMU discrete math course is explained as he goes.

Parity 2

Discrete Math lecture 2 covers the same problem we saw before but in linear algebra (it's explained). The point is to cast the problem into a richer domain where it's easier to solve. You don't have to memorize anything here it's just an example of playing around with a problem. Did you get @37:35 that ABA⁻¹ * ABA⁻¹ is AB * BA⁻¹ or ABBA⁻¹ because the inner A⁻¹A cancels to the identity? We don't have to know how to invert a matrix by hand though WildLinAlg has a vid on it if you want.
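If the cancellation at @37:35 still feels like a magic trick, you can check it numerically. The matrices below are my own picks, not the ones from the lecture; A has determinant 1 so its inverse has integer entries and the comparison is exact.

```python
def mat_mul(a, b):
    """Matrix product a*b for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


A = [[2, 1], [1, 1]]
A_inv = [[1, -1], [-1, 2]]   # inverse of A, integer because det(A) = 1
B = [[1, 2], [0, 1]]

# The inner pair A_inv * A is the identity, which is why it drops out.
identity = mat_mul(A_inv, A)

conj = mat_mul(mat_mul(A, B), A_inv)               # A B A^-1
left = mat_mul(conj, conj)                         # (A B A^-1)(A B A^-1)
right = mat_mul(mat_mul(A, mat_mul(B, B)), A_inv)  # A B B A^-1
print(identity, left == right)
```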

This ends with an intro to number of paths a common USACO bronze problem.

USACO problems

All the USACO contest problems are here and besides the guide being filled with problems there is also the legacy training site.

Test inputs and time limits

As mentioned before the contest problems give us all the tests. This means you can write in your language of choice then use a shell to pipe inputs to your program via standard I/O and run a diff of your program's output against their .out tests. If your solution passes all the tests try and translate to some language the online judge will accept or you could set up a sandbox of your own to report CPU + system time but the online judge will have different hardware and may accept your solution while you get TLE on your own hardware.
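The pipe-and-diff loop can also be scripted. The sketch below is one way to do it under some assumptions of mine: the test files are named like `1.in` / `1.out` in a single directory, output is compared token by token so trailing whitespace is ignored, and the 3 second default is only a local stand-in for the judge's limit. `run_tests` is a made-up helper name.

```python
import subprocess
from pathlib import Path


def run_tests(cmd, test_dir, timeout_s=3.0):
    """Run cmd once per *.in file and compare its stdout to the matching *.out.

    Returns (test name, verdict) pairs where verdict is "ok", "wrong" or
    "timeout". File naming and the default timeout are assumptions.
    """
    results = []
    for in_path in sorted(Path(test_dir).glob("*.in")):
        expected = in_path.with_suffix(".out").read_text().split()
        try:
            with in_path.open() as stdin:
                proc = subprocess.run(
                    cmd, stdin=stdin, capture_output=True, text=True, timeout=timeout_s
                )
            verdict = "ok" if proc.stdout.split() == expected else "wrong"
        except subprocess.TimeoutExpired:
            verdict = "timeout"
        results.append((in_path.name, verdict))
    return results
```

Usage would look like `run_tests(["./my_solution"], "tests/")` where the command is whatever runs your program. A local timeout only tells you that you are in the right ballpark since the judge runs on different hardware.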

2025 Feb contests

Here are 12 contest problems from 2025 Feb and includes all the test data and solutions.

Min Max Subarrays

The first problem is Min Max Subarrays. Wait, a platinum problem? They're all equally hard (for us) nothing in bronze is easy either. Wait until we try the ICPC problems on Kattis and you will wish you were back in platinum. Remember in the writeup above about the 'superday' when you are given an impossible to solve problem well these are next to impossible but we have the advantage of being able to casually look through the guide for strategies.

First problem is easy we have to take the input and create all possible subarrays, writing out examples in lists we need the following but of course we could also just do ranges on the array indices and not move around anything.

[list: 2, 1] is [list:
    [list: 1],
    [list: 2],
    [list: 2, 1]]
length is (2 * (2 + 1)) / 2 or 3 subarrays/sublists

[list: 1, 2, 3] is [list:
    [list: 1],
    [list: 2],
    [list: 3],
    [list: 1, 2],
    [list: 2, 3],
    [list: 1, 2, 3]]
length is (3 * (3 + 1)) / 2 or 6 

Patterns, if n is the total number of integers then the number of subarrays:

  • 1 array of length n
  • 2 subs of length n-1
  • 3 subs of length n-2
  • 4 subs of length n-3 …
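The pattern above is the sum 1 + 2 + ... + n = n(n + 1)/2. Here is a short sketch of the 'just do ranges on the array indices' idea in Python, where `subarray_ranges` is my own name for the helper and only index pairs are produced so nothing is copied until you slice.

```python
def subarray_ranges(n):
    """Yield (start, end) index pairs, end exclusive, for every contiguous subarray."""
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield start, end


data = [1, 2, 3]
ranges = list(subarray_ranges(len(data)))
print([data[s:e] for s, e in ranges])
# The count follows 1 + 2 + ... + n = n(n + 1)/2.
print(len(ranges) == len(data) * (len(data) + 1) // 2)
```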

The USACO guide section on time complexity says the grading server can handle at the lower bound 10^8 operations per second and we are given 3 seconds. The problem input is max n = 10^6 with n(n + 1)/2 or 500 billion subarrays.
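Plugging those numbers in shows how far off brute force is. This is only back-of-the-envelope arithmetic and it assumes, generously, that touching each subarray costs a single operation.

```python
n = 10**6
ops_per_second = 10**8   # lower bound quoted from the USACO guide
time_limit_s = 3

subarrays = n * (n + 1) // 2               # number of contiguous subarrays
budget = ops_per_second * time_limit_s     # operations we can afford
seconds_needed = subarrays / ops_per_second

print(subarrays, budget, seconds_needed)
```

That is roughly 5000 seconds of work against a 3 second limit, so the real solution cannot look at every subarray individually.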

I wrote a simple brute force solution using my phone to generate all subarrays as a curious test that won't pass the time limit for any large inputs, this can be run here:

fun subarray(lst :: List<Number>, tk :: Number) -> List<List<Number>>:
  doc: "Helper for all-subs: prefixes of lst with lengths tk, tk - 1, ..., 1"
  if tk > 0:
    [list: lst.take(tk)] + subarray(lst, tk - 1)
  else:
    empty
  end
end

fun all-subs(
    lst :: List<Number>,
    len :: Number,
    drp :: Number,
    tk :: Number) -> List<List<Number>>:
  doc: "Generate all subarrays: every prefix of every suffix of lst"
  if drp < len:
    # prefixes of the suffix starting at index drp, then on to the next suffix
    subarray(lst.drop(drp), tk) + all-subs(lst, len, drp + 1, tk - 1)
  else:
    empty
  end
end

check:
  fun test(lst :: List<Number>, n :: Number) -> List<List<Number>>:
    all-subs(lst, n, 0, n)
  end

  # a list of n integers has n(n + 1)/2 subarrays
  test([list: 2, 1], 2).length() is (2 * (2 + 1)) / 2
  test([list: 2, 4, 1, 3], 4).length() is (4 * (4 + 1)) / 2
  # make up any sized list
  a = [list: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
  test(a, 14).length() is (14 * (14 + 1)) / 2
end

I read the problem writeup incorrectly for the next subproblem yet somehow managed to produce a working program that was correct for up to N <= 100 before I went back to look at it again and finally understood what they are asking us to do. We can replace any 2 consecutive integers in the input we don't have to linearly approach this from the end points like the writeup examples show. The goal is to produce the maximum single element left over.

Examples:

  • [1]
    • single element it is already the maximum possible value
    • may as well sum these as we take the I/O input
  • [1, 2]
    • this is a min as per the algorithm instruction
  • [1, 2, 5, 4]
    • another forced min we can't achieve 5 (even length)
    • [1, 2, 5, 4] min (1, 2)
    • [1, 5, 4] only max possible is 4
  • [1, 2, 5, 4, 3]
    • we can min (1,2) and get 1
    • [1, 5, 4, 3]
    • we can max (1,5) and get 5
    • [5, 4, 3]
    • we can min (4,3) and get [5, 3], then the final max leaves 5 the max of whole array
  • [1, 2, 5, 4, 3, 2]
    • even length, another forced min
    • we can min (3,2) and get 2
    • [1, 2, 5, 4, 2]
    • we can max (1,2) and get 2
    • [2, 5, 4, 2]
    • we can min (4,2) and get 2
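    • [2, 5, 2] and we can't achieve 5
      • Could we have achieved 4?
      • [1, 2, 4, 3, 2] (after min (5, 4))
      • [2, 4, 3, 2] (after max (1, 2))
      • [2, 4, 2] damn no, not in this order
      • but min (1, 2), max (1, 5), min (3, 2), max (4, 2) leaves [5, 4] and the forced min gives 4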
  • [1, 2, 5, 4, 3, 2, 1] (odd length)
    • min operation (1,2)
    • [1, 5, 4, 3, 2, 1]
    • max operation (1, 5)
    • [5, 4, 3, 2, 1]
    • min operation (2, 1)
    • [5, 4, 3, 1]
    • max operation (5, 4)
    • [5, 3, 1]
    • after min of (3,1) we have [5, 1] and the final max leaves 5
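Writing these out by hand is error prone so a brute force checker is handy for small cases. The sketch below assumes the rules as I understand them from the examples: operations alternate min, max, min, ... starting with min, each one replaces any two adjacent elements, and we want the largest possible survivor. `best_final` is my own name and it is far too slow for real inputs since it tries every order, but it settles arguments about arrays of length 8 or so.

```python
from functools import lru_cache


def best_final(values):
    """Largest value that can remain when every choice of adjacent pair is tried.

    Assumed rules: alternate min, max, min, ... (min first), each replacing two
    adjacent elements, until one element is left.
    """
    @lru_cache(maxsize=None)
    def solve(state, use_min):
        if len(state) == 1:
            return state[0]
        op = min if use_min else max
        # Try merging every adjacent pair, then recurse with the other operation.
        return max(
            solve(state[:i] + (op(state[i], state[i + 1]),) + state[i + 2:], not use_min)
            for i in range(len(state) - 1)
        )

    return solve(tuple(values), True)


for example in ([1], [1, 2], [1, 2, 5, 4], [1, 2, 5, 4, 3]):
    print(example, best_final(example))
```

The `lru_cache` is already memoization: the same intermediate array reached by two different orders of operations is only solved once.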

If the size of the subarray is odd, and at least 5, we just return the max of the whole array, which only needs a linear scan and no sort. If the size of the subarray is even, past some length threshold, we always finish the algorithm with a forced min and return the next biggest possible max. Po-Shen Loh's combinatorics course would come in handy here to generate all possible permutations in a test to see exactly what that length threshold is but for now we can just keep writing larger examples and see what happens. Of course if we were to do this in an interview they'd ask why this is possible for every odd length array >= 5. The hint in the writeup is that it can be proven, but we haven't learned any proofs yet, which is why we will try the CMU notes after.

If you keep writing examples I think the even threshold to achieve the lesser max is length = 8. Now we can write a solution and try it (you get unlimited tries with the online judge/grading server). We have to learn dynamic programming, and if you watched the Accelerated CS lectures for Brown's CS19 course there is conceptually no difference between memoization and "dynamic programming". They both store memory of previous computations, except dynamic programming is an old imperative style algorithm where you have to be careful computing small answers up to the solution whereas memoization is much easier to use/understand.
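To see the two styles side by side, here is a generic illustration that is not part of the Min Max Subarrays solution: counting monotone grid paths, the kind of 'number of paths' problem mentioned earlier. Both functions compute the same recurrence. The first remembers answers as the recursion discovers them, the second fills a table in an order that guarantees the smaller answers already exist.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def paths_memo(rows, cols):
    """Top-down: recurse on the two ways to arrive, caching each answer."""
    if rows == 0 or cols == 0:
        return 1
    return paths_memo(rows - 1, cols) + paths_memo(rows, cols - 1)


def paths_table(rows, cols):
    """Bottom-up: fill the table from the smallest subproblems upward."""
    table = [[1] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            table[r][c] = table[r - 1][c] + table[r][c - 1]
    return table[rows][cols]


print(paths_memo(2, 2), paths_table(2, 2))
```

The table version is the one where you have to think about fill order. It also avoids deep recursion, which matters in Python for large inputs.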

I know a trick for university lectures it's called covid-19 you go back to when classes were all cancelled and there was panic trying to get them online (before zoom) and you can watch any lecture you want such as Dynamic Programming from CMU's 15-295 Competition Programming. USACO guide also has a good entry on DP. If we are going to min/max subarrays smaller than 8 for even and smaller than 5 for odd then we should remember previous computations and not repeat them. For the subarrays themselves we can impose some kind of range finding data structure on the array we store from I/O and just manipulate the indices and this is also likely in the guide though a recursive approach is pretty simple.
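One candidate for the 'range finding data structure' is a sparse table. That choice is my assumption and may not be what the official solution uses. It precomputes the max of every block whose length is a power of two, so the max of any index range comes from two overlapping blocks without copying the subarray.

```python
def build_sparse_max(values):
    """Return table where table[k][i] holds max(values[i : i + 2**k])."""
    table = [list(values)]
    k = 1
    while (1 << k) <= len(values):
        prev = table[-1]
        half = 1 << (k - 1)
        # A block of length 2**k is two adjacent blocks of length 2**(k-1).
        table.append(
            [max(prev[i], prev[i + half]) for i in range(len(values) - (1 << k) + 1)]
        )
        k += 1
    return table


def range_max(table, lo, hi):
    """Max of values[lo:hi] (hi exclusive, lo < hi) from two overlapping blocks."""
    k = (hi - lo).bit_length() - 1
    return max(table[k][lo], table[k][hi - (1 << k)])


data = [1, 2, 5, 4, 3, 2]
table = build_sparse_max(data)
print(range_max(table, 0, 2), range_max(table, 3, 6), range_max(table, 0, 6))
```

Swapping `max` for `min` gives range minimum the same way. Building costs O(n log n) time and memory, which is worth weighing at n = 10^6.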

TODO

