Lease Management

How StreamHouse agents coordinate through leases.


What Are Leases?

Leases are time-bounded locks that coordinate work among agents. When an agent needs exclusive access to a resource (like flushing a partition's write buffer to S3), it acquires a lease from the metadata store. Leases prevent duplicate work and ensure exactly-once segment uploads even when multiple agents are running.

How Lease Coordination Works

Each agent periodically heartbeats its active leases. If an agent crashes or becomes unresponsive, its leases expire after the TTL (default: 30 seconds), and another agent can take over. This provides automatic failover without manual intervention.

  • Lease acquisition: An agent requests a lease from PostgreSQL using a SELECT ... FOR UPDATE query, which locks the lease row so that only one agent can claim it at a time
  • Heartbeat: Active leases are renewed every 10 seconds (configurable)
  • Expiry: A lease expires if it is not renewed within the TTL (30 seconds by default)
  • Takeover: Any agent can acquire an expired lease and resume the work
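
The lifecycle above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not StreamHouse's implementation: the `Lease` and `InMemoryLeaseStore` names, the `orders-0` resource, and the agent ids are hypothetical, and a plain dictionary stands in for the PostgreSQL metadata store (where the row lock taken by SELECT ... FOR UPDATE would make each step atomic). Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict

LEASE_TTL_SECONDS = 30.0           # default TTL from the text
HEARTBEAT_INTERVAL_SECONDS = 10.0  # default renewal interval from the text


@dataclass
class Lease:
    resource: str      # e.g. a partition identifier (hypothetical naming)
    owner: str         # id of the agent holding the lease
    expires_at: float  # time from which another agent may take over


class InMemoryLeaseStore:
    """Stand-in for the metadata store; a real store must make each method atomic."""

    def __init__(self, ttl: float = LEASE_TTL_SECONDS) -> None:
        self.ttl = ttl
        self.leases: Dict[str, Lease] = {}

    def acquire(self, resource: str, agent: str, now: float) -> bool:
        """Grant the lease if it is free, expired, or already held by this agent."""
        current = self.leases.get(resource)
        if current is not None and current.owner != agent and current.expires_at > now:
            return False  # another agent holds a live lease
        self.leases[resource] = Lease(resource, agent, now + self.ttl)
        return True

    def heartbeat(self, resource: str, agent: str, now: float) -> bool:
        """Renew the lease; fails if this agent no longer holds a live lease."""
        current = self.leases.get(resource)
        if current is None or current.owner != agent or current.expires_at <= now:
            return False
        current.expires_at = now + self.ttl
        return True


store = InMemoryLeaseStore()
assert store.acquire("orders-0", "agent-a", now=0.0)         # agent-a takes the lease
assert not store.acquire("orders-0", "agent-b", now=5.0)     # still held by agent-a
assert store.heartbeat("orders-0", "agent-a", now=HEARTBEAT_INTERVAL_SECONDS)  # now expires at 40
assert not store.acquire("orders-0", "agent-b", now=39.0)    # not yet expired
assert store.acquire("orders-0", "agent-b", now=40.0)        # expired, agent-b takes over
assert not store.heartbeat("orders-0", "agent-a", now=41.0)  # agent-a has lost the lease
```

The final assertion shows why a heartbeat should be allowed to fail: an agent that was paused past the TTL learns on its next renewal attempt that it no longer owns the lease and should stop working on that resource.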

Failure Detection

StreamHouse uses lease expiry as the primary mechanism for detecting agent failures. When an agent stops heartbeating (because of a crash, a network partition, or a long GC pause), its leases expire and other agents notice during their next coordination cycle. Detection latency is therefore bounded by the lease TTL plus one coordination cycle, which keeps recovery times predictable.
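
To make the timing concrete, here is a rough sketch of how another agent might spot expired leases and how long detection can take. The function names and the 5-second coordination cycle are assumptions chosen for illustration; the text above does not state the cycle interval. The arithmetic assumes the failed agent last renewed its lease somewhere within one heartbeat interval before it stopped, and that other agents scan once per cycle.

```python
from typing import Dict, List, Tuple


def expired_resources(expiry_by_resource: Dict[str, float], now: float) -> List[str]:
    """Return the resources whose lease expiry time has passed, sorted by name."""
    return sorted(r for r, expires_at in expiry_by_resource.items() if expires_at <= now)


def detection_window(ttl: float, heartbeat_interval: float,
                     cycle_interval: float) -> Tuple[float, float]:
    """Return (best, worst) seconds between a crash and another agent noticing it."""
    # Best case: the crash happens just before the next heartbeat was due,
    # and a scan runs exactly when the lease expires.
    best = ttl - heartbeat_interval
    # Worst case: the crash happens right after a renewal,
    # and the previous scan ran just before the lease expired.
    worst = ttl + cycle_interval
    return best, worst


leases = {"orders-0": 30.0, "orders-1": 45.0}  # hypothetical expiry times
print(expired_resources(leases, now=31.0))     # ['orders-0']
print(detection_window(30.0, 10.0, 5.0))       # (20.0, 35.0)
```

With the default 30-second TTL and 10-second heartbeat, and the assumed 5-second cycle, a failed agent's work would be picked up between roughly 20 and 35 seconds after it stops. Lowering the TTL shortens recovery but leaves less slack for a slow heartbeat before a healthy agent loses its lease.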