Marc Brooker's Blog
Dynamo, DynamoDB, and Aurora DSQL
LLMs as Parts of Systems
Career advice, or something like it
Systems Fun at HotOS
Good Performance for Bad Days
Decomposing Aurora DSQL
One or Two? How Many Queues?
What Fekete's Anomaly Can Teach Us About Isolation
Versioning versus Coordination
Snapshot Isolation vs Serializability
DSQL Vignette: Wait! Isn't That Impossible?
DSQL Vignette: Transactions and Durability
DSQL Vignette: Reads and Compute
DSQL Vignette: Aurora DSQL, and A Personal Story
Ten Years of AWS Lambda
Garbage Collection and Metastability
Resource Management in Aurora Serverless
Let's Consign CAP to the Cabinet of Curiosities
Not Just Scale
It's always TCP_NODELAY. Every damn time.
MemoryDB: Speed, Durability, and Composition.
Formal Methods: Just Good Engineering Practice?
Finding Needles in a Haystack with Best-of-K
The Builder's Guide to Better Mousetraps
Better Benchmarks Through Graphs
How Do You Spend Your Time?
Pat's Big Deal, and Transaction Coordination
What is Scalability Anyway?
Why Aren't We SIEVE-ing?
It's About Time!
Optimism vs Pessimism in Distributed Systems
Writing For Somebody
Exponential Value at Linear Cost
On The Acoustics of Cocktail Parties
Invariants: A Better Debugger?
My Favorite Bits of OSDI/ATC'23
Bélády's Anomaly Doesn't Happen Often
What is a container?
Container Loading in AWS Lambda
Open and Closed, Omission and Collapse
The Four Hobbies, and Apparent Expertise
Surprising Scalability of Multitenancy
False Sharing versus Perfect Placement
Hot Keys, Scalability, and the Zipf Distribution
NoSQL: The Baby and the Bathwater
Erasure Coding versus Tail Latency
Under My Thumb: Insight Behind the Rules
Lambda Snapstart, and snapshots as a tool for system builders
Amazon's Distributed Computing Manifesto
Writing Is Magic
Give Your Tail a Nudge
Atomic Commitment: The Unscalability Protocol
Histogram vs eCDF
What is Backoff For?
Getting into formal specification, and getting my team into it too
The DynamoDB paper
Formal Methods Only Solve Half My Problems
What is a simple system?
Simple Simulations for System Builders
Fixing retries with token buckets and circuit breakers
Will circuit breakers solve my problems?
Software Deployment, Speed, and Safety
DynamoDB's Best Feature: Predictability
The Bug in Paxos Made Simple
Serial, Parallel, and Quorum Latencies
Caches, Modes, and Unstable Systems
My Proposal for Arecibo: Drones
Latency Sneaks Up On You
Metastability and Distributed Systems
Tail Latency Might Matter More Than You Think
Redundant against what?
What You Can Learn From Old Hard Drive Adverts
Incident Response Isn't Enough
The Fundamental Mechanism of Scaling
Quorum Availability
Getting Big Things Done
Consensus is Harder Than It Looks
Focus on the Good Parts
Surprising Economics of Load-Balanced Systems
A Story About a Fish
Code Only Says What it Does
Some Virtualization Papers Worth Reading
Reading Research: A Guide for Software Engineers
Two Years With Rust
Firecracker: Lightweight Virtualization for Serverless Applications
Physalia: Millions of Tiny Databases
Why do we need distributed systems?
Kindness, Wickedness and Safety
When Redundancy Actually Helps
Is Anatoly Dyatlov to blame?
Some risks of coordinating only sometimes
Learning to build distributed systems
Control Planes vs Data Planes
Telling Stories About Little's Law
Availability and availability
Balls Into Bins In Distributed Systems
Is the Mean Really Useless?
Why Must Systems Be Operated?
Heuristic Traps for Systems Operators
Is there a CAP theorem for Durability?
CALISDO: Threat Modeling for Distributed Designs
Sodium Carbonate, and Ramenized Pasta
The Zero, One, Infinity Disease
How Amazon Web Services Uses Formal Methods
Jitter: Making Things Better With Randomness
Electoral Trouble in Sybilania
Does Bitcoin Solve Byzantine Consensus?
A Quiet Defense of Patterns
Make Your Program Slower With Threads
Two Farmers and Common Knowledge
Exactly-Once Delivery May Not Be What You Want
Ice Cream and Distributed Systems
Harvest and Yield: Not A Natural Cure for Tradeoff Confusion
The Essential Barbara Liskov
The Space Between Theory and Practice in Distributed Systems
Use of Formal Methods at Amazon Web Services
CAP and PACELC: Thinking More Clearly About Consistency
Two traps in iostat: %util and svctm
The Operations Gradient: Improving Safety in Complex Systems
Viewstamped Replication: The Less-Famous Consensus Protocol
The Essential Nancy Lynch
Failure Detectors, and Non-Blocking Atomic Commit
The Essential Leslie Lamport
Snark, Chord, and Trust in Algorithms
Distributed Consensus: Beating Impossibility with Probability One
Restricted Transactional Memory on Haswell
Hardware Lock Elision on Haswell
Beyond iostat: Storage performance analysis with blktrace
Some Patterns of Engineering Design Meetings
Exploring TLA+ with two-phase commit
C++11's atomic and volatile, under the hood on x86
Java's Atomic and volatile, under the hood on x86
Are volatile reads really free?
Highly contended and fair locking in Java
Expect Less, Get More?
Latency lags bandwidth
The properties of crash-only software
The power of two random choices
The benefits of having data