The Durability Problem
Here's the worst-case scenario: your producer sends 10,000 events to a StreamHouse agent. The agent buffers them in memory, preparing a segment for S3 upload. Then the process crashes. The segment never reaches S3. Are those 10,000 events gone?
Without the Write-Ahead Log (WAL), yes. With it, no.
The Data Path
To understand the WAL, you need to understand the data path:
Producer → Agent (gRPC) → WAL (disk) → SegmentBuffer (RAM) → S3
Every record that enters an agent is written to the WAL before it enters the in-memory segment buffer. The WAL is a sequential, append-only file on the agent's local disk. If the agent crashes, the WAL file survives, and on restart, records are replayed from the WAL back into memory.
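That ordering can be sketched in a few lines of Python. This is an illustrative shape, not StreamHouse's actual implementation — the Agent class and file handling here are assumptions. The key invariant is that the record reaches the WAL on disk before it touches the in-memory buffer (the fsync shown corresponds to the strictest sync policy, covered below).

```python
import os
import tempfile

class Agent:
    def __init__(self, wal_path):
        # Append-only WAL file on the agent's local disk.
        self.wal = open(wal_path, "ab")
        # In-memory segment buffer, keyed by (topic, partition).
        self.buffer = {}

    def produce(self, topic, partition, payload: bytes):
        # 1. Durable first: append the record to the WAL.
        self.wal.write(payload)
        self.wal.flush()              # push to the OS buffer cache
        os.fsync(self.wal.fileno())   # force to disk before acknowledging
        # 2. Only then stage it in RAM for the next S3 segment.
        self.buffer.setdefault((topic, partition), []).append(payload)

path = os.path.join(tempfile.mkdtemp(), "streamhouse.wal")
agent = Agent(path)
agent.produce("orders", 0, b"event-1")
print(os.path.getsize(path))  # 7 -- the record is on disk before any S3 upload
```

If the process dies between steps 1 and 2, the record is already on disk and will be replayed on restart; if it dies before step 1 completes, the produce request was never acknowledged, so the producer retries.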
WAL Record Format
Each entry in the WAL contains everything needed to reconstruct the record:
┌─────────────────────────────────┐
│ WAL Entry                       │
│ - Length: u32                   │
│ - CRC32: u32                    │
│ - Topic: string                 │
│ - Partition: u32                │
│ - Key: bytes                    │
│ - Value: bytes                  │
│ - Timestamp: i64                │
│ - Headers: [(string, bytes)]    │
└─────────────────────────────────┘
The CRC32 checksum covers the entire entry, including the length field. This catches partial writes, bit flips, and filesystem corruption. During recovery, any entry with a bad CRC is discarded — it represents an incomplete write that was interrupted by the crash.
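A minimal sketch of this entry layout, using the field names from the diagram — the exact wire encoding (byte order, length prefixes) is an assumption, not StreamHouse's actual format. What it demonstrates is the durability property: because the CRC32 covers the length field plus the body, a torn write at any offset fails validation.

```python
import struct
import zlib

def encode_entry(topic, partition, key, value, timestamp, headers=()):
    body = struct.pack("<H", len(topic)) + topic.encode()
    body += struct.pack("<I", partition)
    body += struct.pack("<I", len(key)) + key
    body += struct.pack("<I", len(value)) + value
    body += struct.pack("<q", timestamp)
    body += struct.pack("<H", len(headers))
    for hk, hv in headers:
        body += struct.pack("<H", len(hk)) + hk.encode()
        body += struct.pack("<I", len(hv)) + hv
    length = struct.pack("<I", len(body))
    # CRC covers the length field AND the body, as described above.
    crc = struct.pack("<I", zlib.crc32(length + body))
    return length + crc + body

def decode_entry(buf):
    """Return the entry body, or None if the CRC does not match."""
    (length,) = struct.unpack_from("<I", buf, 0)
    (crc,) = struct.unpack_from("<I", buf, 4)
    body = buf[8:8 + length]
    if len(body) < length or zlib.crc32(buf[:4] + body) != crc:
        return None  # partial write or corruption -- discard
    return body

entry = encode_entry("orders", 3, b"k1", b"v1", 1700000000000)
assert decode_entry(entry) is not None
assert decode_entry(entry[:-1]) is None  # truncated tail fails the CRC
```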
Three Sync Policies
The critical question is: when do we fsync the WAL to disk? StreamHouse offers three policies to match different durability-performance tradeoffs:
Always Sync
export WAL_SYNC_POLICY=always
Every record is fsync'd to disk before the produce request is acknowledged. Zero data loss even on power failure. Throughput: 50,000-100,000 records/sec.
This is the safest option. The latency cost is 100-500 microseconds per fsync on SSD, which is acceptable for most workloads.
Interval Sync (Recommended)
export WAL_SYNC_POLICY=interval
export WAL_SYNC_INTERVAL_MS=100
The WAL is fsync'd every 100ms. Records written between syncs may be lost on a power failure (not a process crash — the OS buffer cache survives process crashes on Linux). At risk: everything written since the last completed sync, i.e. up to 100ms of traffic.
This is the recommended default. It achieves 1-2 million records/sec while losing at most 100ms of data on a catastrophic hardware failure.
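The interval policy can be sketched as a background timer that issues one fsync per interval — the threading approach here is an assumption for illustration, not StreamHouse's actual implementation. Appends hit only the page cache on the fast path, which is why throughput is an order of magnitude higher than always-sync.

```python
import os
import threading

class IntervalSyncWal:
    def __init__(self, path, interval_ms=100):
        self.f = open(path, "ab")
        self.interval = interval_ms / 1000.0
        self.stop = threading.Event()
        self.t = threading.Thread(target=self._sync_loop, daemon=True)
        self.t.start()

    def append(self, record: bytes):
        # Fast path: no fsync, just the OS buffer cache
        # (survives process crashes, not power failures).
        self.f.write(record)
        self.f.flush()

    def _sync_loop(self):
        # One fsync per interval bounds power-failure loss
        # to roughly interval_ms of records.
        while not self.stop.wait(self.interval):
            os.fsync(self.f.fileno())

    def close(self):
        self.stop.set()
        self.t.join()
        os.fsync(self.f.fileno())  # final sync on clean shutdown
        self.f.close()
```

The amortization is the whole trick: one fsync (100-500 microseconds) covers every record appended during the interval, instead of one fsync per record.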
Never Sync
export WAL_SYNC_POLICY=never
The WAL relies entirely on the OS buffer cache for durability. Data is durable against process crashes but may be lost on power failure. Throughput: 2+ million records/sec.
Use this for development or workloads where occasional data loss is acceptable (metrics, debug logs).
The Recovery Process
When a StreamHouse agent starts, it checks for an existing WAL file. If one exists, recovery runs automatically:
- Open the WAL file and read from the beginning
- Validate each entry by computing the CRC32 and comparing it to the stored checksum
- Replay valid entries into the in-memory SegmentBuffer, partitioned by topic and partition
- Skip invalid entries — these represent partially written records from the crash point
- Resume normal operation — the next produce request appends to the existing WAL
- Flush recovered segments to S3 following the normal segment lifecycle
Agent startup:
[INFO] WAL file found: /data/wal/streamhouse.wal (24MB)
[INFO] Replaying WAL entries...
[INFO] Recovered 48,231 records across 12 partitions
[INFO] Skipped 3 entries with invalid CRC (partial writes)
[INFO] Recovery complete in 340ms
[INFO] Agent ready to accept connections
The recovery process is fast — it reads sequentially from disk at SSD speed, typically recovering millions of records per second.
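The replay loop can be sketched as a sequential scan over the file — the framing here (u32 length, u32 CRC over length plus body, then the body) follows the entry format described earlier, though the exact encoding is an assumption. Note the two distinct failure modes: a truncated tail ends the scan, while a CRC mismatch mid-file skips one entry.

```python
import struct
import zlib

def replay_wal(data: bytes):
    """Scan WAL bytes; return (valid_bodies, skipped_count)."""
    valid, skipped, off = [], 0, 0
    while off + 8 <= len(data):
        (length,) = struct.unpack_from("<I", data, off)
        (crc,) = struct.unpack_from("<I", data, off + 4)
        body = data[off + 8: off + 8 + length]
        if len(body) < length:
            skipped += 1  # truncated tail: the crash hit mid-write
            break
        if zlib.crc32(data[off:off + 4] + body) != crc:
            skipped += 1  # bit flip or torn write: discard this entry
        else:
            valid.append(body)
        off += 8 + length
    return valid, skipped

def frame(body: bytes) -> bytes:
    length = struct.pack("<I", len(body))
    return length + struct.pack("<I", zlib.crc32(length + body)) + body

wal = frame(b"rec-1") + frame(b"rec-2") + frame(b"rec-3")[:-2]  # torn tail
records, skipped = replay_wal(wal)
print(records, skipped)  # [b'rec-1', b'rec-2'] 1
```

In the real agent, each valid body would then be decoded and routed into the SegmentBuffer by topic and partition.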
Failure Scenarios
Scenario 1: Agent Process Crash
The agent receives a SIGSEGV, OOM kill, or unhandled panic.
- WAL entries already synced: Recovered on restart. Zero loss.
- WAL entries in OS buffer cache: Recovered on restart (process crash doesn't clear the page cache). Zero loss.
- Segment buffer (RAM): Anything already in the WAL is safe. The segment that was building in memory is reconstructed from WAL replay.
Scenario 2: Agent Crash During S3 Upload
The agent crashes while uploading a segment to S3.
- S3 upload is atomic — either the full object lands or it doesn't. Partial uploads don't create visible objects.
- On restart, the WAL replay reconstructs the segment buffer. The agent re-uploads the segment.
- Duplicate segments are prevented by checking the metadata store for existing segment registrations before uploading.
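The check-before-upload logic can be sketched as follows. The metadata store interface here (a plain set of registered segment IDs, a dict standing in for S3) is a deliberate simplification, not StreamHouse's actual schema — the point is only the ordering: consult metadata first, upload, then register.

```python
def flush_segment(segment_id, data, object_store, metadata):
    """Upload a segment idempotently; return True if uploaded."""
    if segment_id in metadata:
        # Already registered: a previous run uploaded this segment
        # before crashing. Skip the duplicate.
        return False
    object_store[segment_id] = data  # atomic object PUT
    metadata.add(segment_id)         # register only after the upload succeeds
    return True

store, meta = {}, set()
assert flush_segment("orders-0-000123", b"...", store, meta) is True
assert flush_segment("orders-0-000123", b"...", store, meta) is False  # no-op
```

Registering only after a successful upload means a crash between the two steps leaves an orphan object at worst, never a registered segment with missing data.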
Scenario 3: Power Failure
The physical machine loses power, wiping both RAM and the OS buffer cache.
- With always sync: Zero loss. Every acknowledged record is on disk.
- With interval sync: Loss of up to one sync interval (default 100ms) of data.
- With never sync: Loss of all unflushed WAL data.
The Full Durability Stack
The WAL is one layer in a multi-layer durability strategy:
Layer 1: WAL (local disk) → survives process crashes
Layer 2: S3 (object storage) → survives hardware failure (11 nines)
Layer 3: PostgreSQL (metadata) → survives with automated backups
Layer 4: CRC32 checksums → detects corruption at every level
Once a segment is flushed to S3 and registered in the metadata store, the WAL entries for those records are no longer needed. The WAL is periodically truncated to reclaim disk space.
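One way to picture truncation — this sketch assumes the agent tracks a "flushed up to" byte offset, which is an illustrative simplification: copy the unflushed suffix into a new file, fsync it, and atomically rename it over the old WAL.

```python
import os

def truncate_wal(path, flushed_up_to: int):
    """Drop the WAL prefix whose records are already durable in S3."""
    with open(path, "rb") as f:
        f.seek(flushed_up_to)
        remainder = f.read()  # entries not yet flushed to S3
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(remainder)
        f.flush()
        os.fsync(f.fileno())
    # Atomic rename: a crash at any point leaves either the old
    # or the new WAL fully intact, never a half-truncated file.
    os.replace(tmp, path)
```

The rename is the safety hinge: the WAL is never modified in place, so recovery always finds a self-consistent file.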
Monitoring the WAL
Keep an eye on these metrics:
streamhouse_wal_size_bytes # Current WAL file size
streamhouse_wal_entries_total # Total entries written
streamhouse_wal_recovery_records # Records recovered on last startup
streamhouse_wal_sync_duration_ms # Time spent in fsync
streamhouse_wal_corruption_detected # CRC failures (should be 0)
Alert on streamhouse_wal_corruption_detected > 0 — it indicates a disk issue that needs investigation.
The Bottom Line
StreamHouse's WAL guarantees that acknowledged events survive agent crashes with configurable durability. Combined with S3's 11-nines durability and CRC32 checksums at every level, the system provides end-to-end data integrity from producer to consumer.
Choose your sync policy based on your workload:
- Financial transactions: Always sync
- General production: Interval sync (100ms)
- Dev and metrics: Never sync
Your data is safe.