System Design Cheatsheet

Computer Arch:

RAM and cache do not store persistent data but our disk does
RAM stores a place for code and data
- The CPU can read the code and perform operations to write to the data
Moore’s Law: # of transistors doubles every two years - is beginning to plateau
- Individual computers are computationally limited by the CPU’s speed

Horizontal scaling – add more servers to handle requests in parallel.
- users send requests → load balancer → redirects request to the appropriate server
  - requires load balancing and often stateless design
- Pro: fault tolerance (if one server fails, traffic is redirected)
- Con: more engineering complexity (server coordination, consistent state)
- Favored in large-scale systems; often cheaper with commodity hardware.
Vertically scale - upgrade the server to handle more user requests
- Easier to implement, but limited by hardware capacity.
- ex. upgrading to higher-core CPU or adding RAM.
Our server can communicate with external servers (APIs - i.e. Spotify API)
As devs, our server will often have a logging service which can be used to create time-series metrics
- These metrics feed our alerting system in cause our application performance dips to an unsatisfactory level

Availability –

Formula: uptime / (uptime + downtime)
- SLO (Service Level Objective) – Target uptime (e.g., AWS “five 9s” = 99.999%).
- SLA (Service Level Agreement) – Contractual guarantee; failure triggers compensation (e.g., AWS guarantees 99.9%).
Key Reliability Umbrella Terms
- Reliability – Probability the service will run without failure.
- Fault Tolerance – Ability to keep working despite component failures.
- Redundancy – Extra capacity or servers (e.g., horizontal scaling).
Throughput – Requests processed per time unit.
- QPS (Queries Per Second) – Common in server-to-storage operations (CRUD).
Latency – Time to complete a request.
- Not limited to networking — CPUs use caching to reduce latency and boost performance.