Understanding Distributed Systems: A Practical Guide
Learn the fundamentals of distributed systems, including consensus algorithms, CAP theorem, and real-world patterns for building reliable systems.
Table of Contents
Introduction
Distributed systems are everywhere. Every time you send a message, make a payment, or stream a video, you’re interacting with distributed systems. But what makes them so challenging?
The CAP Theorem
The CAP theorem states that a distributed system can only provide two of the following three guarantees:
- Consistency: Every read receives the most recent write
- Availability: Every request receives a response
- Partition Tolerance: The system continues to operate despite network partitions
// Example: Choosing consistency over availability
async function writeWithConsistency(key: string, value: string) {
const ackCount = await Promise.all(
replicas.map((r) => r.write(key, value))
);
// Wait for majority acknowledgment
const majority = Math.floor(replicas.length / 2) + 1;
if (ackCount.filter(Boolean).length < majority) {
throw new Error("Failed to achieve consensus");
}
}
Consensus Algorithms
Raft
Raft is a consensus algorithm designed to be understandable. It separates the key elements of consensus:
- Leader Election - One node is elected leader
- Log Replication - Leader sends entries to followers
- Safety - If a server has applied a log entry, no other server will apply a different entry for the same index
Event Sourcing
Instead of storing current state, store all events that led to the current state:
interface DomainEvent {
type: string;
timestamp: Date;
payload: Record<string, unknown>;
}
class EventStore {
private events: DomainEvent[] = [];
append(event: DomainEvent) {
this.events.push(event);
}
replay(): State {
return this.events.reduce(
(state, event) => applyEvent(state, event),
initialState
);
}
}
Real-World Patterns
Circuit Breaker
Prevent cascading failures by detecting when a service is unhealthy:
class CircuitBreaker {
private failures = 0;
private state: "closed" | "open" | "half-open" = "closed";
async call<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === "open") {
throw new Error("Circuit is open");
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
this.state = "closed";
}
private onFailure() {
this.failures++;
if (this.failures >= 5) {
this.state = "open";
setTimeout(() => { this.state = "half-open"; }, 30000);
}
}
}
Conclusion
Building distributed systems is fundamentally about making trade-offs. Understanding these trade-offs — consistency vs. availability, latency vs. throughput, simplicity vs. resilience — is what separates good engineers from great ones.
The key is to start simple and add complexity only when needed. Not every system needs to be a distributed system, and not every distributed system needs to solve all problems at once.