Understanding Distributed Systems: A Practical Guide

Introduction

Distributed systems are everywhere. Every time you send a message, make a payment, or stream a video, you’re interacting with distributed systems. But what makes them so challenging?

The CAP Theorem

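The CAP theorem states that a distributed system can provide at most two of the following three guarantees at the same time. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts: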

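Consistency: Every read receives the most recent write or an error
Availability: Every request receives a non-error response, though not necessarily one that reflects the most recent write
Partition Tolerance: The system continues to operate even when messages between nodes are dropped or delayed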

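// Example: Choosing consistency over availability
// Assumes each replica's write() resolves to true once the write is acknowledged.
interface Replica {
  write(key: string, value: string): Promise<boolean>;
}
declare const replicas: Replica[];

async function writeWithConsistency(key: string, value: string): Promise<void> {
  // allSettled, unlike Promise.all, does not reject when a single replica fails
  const results = await Promise.allSettled(
    replicas.map((r) => r.write(key, value))
  );
  const ackCount = results.filter(
    (r) => r.status === "fulfilled" && r.value
  ).length;

  // Require acknowledgment from a majority; otherwise reject the write
  const majority = Math.floor(replicas.length / 2) + 1;
  if (ackCount < majority) {
    throw new Error("Failed to reach a majority quorum");
  }
}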
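
The majority threshold in the example above can be checked with a little arithmetic. The sketch below is only an illustration (the helper names majorityOf and tolerableFailures are made up here, not part of any library); it shows why a fourth replica tolerates no more failures than three do.

```typescript
// Smallest number of acknowledgments that forms a strict majority of n replicas.
function majorityOf(n: number): number {
  return Math.floor(n / 2) + 1;
}

// Number of replicas that can be unreachable while a majority can still acknowledge.
function tolerableFailures(n: number): number {
  return n - majorityOf(n);
}

for (const n of [3, 4, 5]) {
  console.log(
    `replicas=${n} majority=${majorityOf(n)} tolerates=${tolerableFailures(n)}`
  );
}
// replicas=3 majority=2 tolerates=1
// replicas=4 majority=3 tolerates=1
// replicas=5 majority=3 tolerates=2
```

This is one reason clusters that rely on majority acknowledgment are usually sized with an odd number of nodes.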

Consensus Algorithms

Raft

Raft is a consensus algorithm designed to be understandable. It separates the key elements of consensus:

Leader Election - One node is elected leader
Log Replication - Leader sends entries to followers
Safety - If a server has applied a log entry at a given index, no other server will ever apply a different entry for that index
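
To make leader election and the safety property more concrete, here is a minimal sketch of how a node might decide whether to grant its vote to a candidate. The field names follow the RequestVote RPC from the Raft paper, but the types and the handleRequestVote function are hypothetical and simplified: persistence, election timers, and stepping down to follower are left out.

```typescript
// Hypothetical types for illustration only.
interface VoteRequest {
  term: number;         // candidate's term
  candidateId: string;
  lastLogIndex: number; // index of the candidate's last log entry
  lastLogTerm: number;  // term of the candidate's last log entry
}

interface NodeState {
  currentTerm: number;
  votedFor: string | null;
  lastLogIndex: number;
  lastLogTerm: number;
}

// Returns true if the vote is granted. Mutates node to record the decision.
function handleRequestVote(node: NodeState, req: VoteRequest): boolean {
  // Reject candidates from an older term.
  if (req.term < node.currentTerm) return false;

  // Seeing a newer term moves this node to that term and clears its vote.
  if (req.term > node.currentTerm) {
    node.currentTerm = req.term;
    node.votedFor = null;
  }

  // At most one vote per term (repeating a vote for the same candidate is fine).
  const canVote = node.votedFor === null || node.votedFor === req.candidateId;

  // The candidate's log must be at least as up-to-date as ours:
  // a later last term wins; with equal terms, the longer log wins.
  const logOk =
    req.lastLogTerm > node.lastLogTerm ||
    (req.lastLogTerm === node.lastLogTerm &&
      req.lastLogIndex >= node.lastLogIndex);

  if (canVote && logOk) {
    node.votedFor = req.candidateId;
    return true;
  }
  return false;
}
```

The one-vote-per-term rule means at most one candidate can collect a majority in a given term, and the log comparison keeps a node with a stale log from becoming leader.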

Event Sourcing

Instead of storing only the current state, store the ordered sequence of events that led to it, and derive the current state by replaying them:

interface DomainEvent {
  type: string;
  timestamp: Date;
  payload: Record<string, unknown>;
}

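// State, initialState, and applyEvent are application-specific.
// They are declared here only so the example type-checks.
type State = Record<string, unknown>;
declare const initialState: State;
declare function applyEvent(state: State, event: DomainEvent): State;

class EventStore {
  private events: DomainEvent[] = [];

  // Events are only ever appended, never updated or deleted
  append(event: DomainEvent) {
    this.events.push(event);
  }

  // Rebuild the current state by folding every event over the initial state
  replay(): State {
    return this.events.reduce(
      (state, event) => applyEvent(state, event),
      initialState
    );
  }
}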
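
As a worked example of replaying events, the sketch below derives a bank account balance from its history. The event names, the AccountState shape, and the helper functions are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical event and state shapes for a bank account.
type AccountEvent =
  | { type: "Deposited"; amount: number }
  | { type: "Withdrawn"; amount: number };

interface AccountState {
  balance: number;
}

const initialAccount: AccountState = { balance: 0 };

// Pure function: returns the next state without mutating the previous one.
function applyAccountEvent(
  state: AccountState,
  event: AccountEvent
): AccountState {
  const delta = event.type === "Deposited" ? event.amount : -event.amount;
  return { balance: state.balance + delta };
}

// Fold the events, in order, over the initial state.
function replayAccount(events: AccountEvent[]): AccountState {
  return events.reduce(
    (state, event) => applyAccountEvent(state, event),
    initialAccount
  );
}

const accountHistory: AccountEvent[] = [
  { type: "Deposited", amount: 100 },
  { type: "Withdrawn", amount: 30 },
  { type: "Deposited", amount: 5 },
];

console.log(replayAccount(accountHistory).balance);             // 75
console.log(replayAccount(accountHistory.slice(0, 2)).balance); // 70
```

Replaying a prefix of the history, as in the last line, reconstructs the state as it was at that earlier point, which a store that keeps only the current balance cannot do.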

Real-World Patterns

Circuit Breaker

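Prevent cascading failures by failing fast when a downstream service is unhealthy. After repeated failures the circuit "opens" and calls are rejected immediately; after a cool-down it becomes "half-open" and lets calls through again to test whether the service has recovered: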

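class CircuitBreaker {
  private failures = 0;
  private state: "closed" | "open" | "half-open" = "closed";
  private readonly failureThreshold = 5;
  private readonly resetTimeoutMs = 30_000;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // While open, fail fast instead of calling the unhealthy service
    if (this.state === "open") {
      throw new Error("Circuit is open");
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    // Any success, including a half-open trial call, closes the circuit
    this.failures = 0;
    this.state = "closed";
  }

  private onFailure() {
    this.failures++;
    // Skip if already open so in-flight failures do not schedule extra timers
    if (this.failures >= this.failureThreshold && this.state !== "open") {
      this.state = "open";
      // After the cool-down, let calls through again to probe the service
      setTimeout(() => {
        if (this.state === "open") this.state = "half-open";
      }, this.resetTimeoutMs);
    }
  }
}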
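
The three states are easier to follow when written as a pure transition function. The sketch below mirrors the behavior of the class above under the same threshold of 5 failures; the Breaker shape, the event names, and the step function are hypothetical and exist only to illustrate the state machine, with the cool-down timer modeled as an explicit event.

```typescript
type BreakerState = "closed" | "open" | "half-open";
type BreakerEvent = "success" | "failure" | "cooldownElapsed";

interface Breaker {
  state: BreakerState;
  failures: number;
}

const FAILURE_THRESHOLD = 5;

// Returns the breaker after one event, without mutating the input.
function step(b: Breaker, event: BreakerEvent): Breaker {
  if (event === "cooldownElapsed") {
    // Only an open circuit moves to half-open when the cool-down ends.
    return b.state === "open" ? { ...b, state: "half-open" } : b;
  }
  // While open, calls are rejected up front, so no outcome is recorded.
  if (b.state === "open") return b;

  if (event === "success") return { state: "closed", failures: 0 };

  // A failure trips the circuit once the threshold is reached. In the
  // half-open state the count is already at the threshold, so one failed
  // trial call reopens the circuit.
  const failures = b.failures + 1;
  return {
    state: failures >= FAILURE_THRESHOLD ? "open" : b.state,
    failures,
  };
}

// Five consecutive failures open the circuit; a cool-down followed by a
// successful trial call closes it again.
let breaker: Breaker = { state: "closed", failures: 0 };
for (let i = 0; i < 5; i++) breaker = step(breaker, "failure");
console.log(breaker.state); // open
breaker = step(breaker, "cooldownElapsed");
console.log(breaker.state); // half-open
breaker = step(breaker, "success");
console.log(breaker.state); // closed
```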

Conclusion

Building distributed systems is fundamentally about making trade-offs. Understanding these trade-offs (consistency vs. availability, latency vs. throughput, simplicity vs. resilience) is what separates good engineers from great ones.

The key is to start simple and add complexity only when needed. Not every system needs to be a distributed system, and not every distributed system needs to solve all problems at once.