How StreamHouse works
A complete guide to how StreamHouse replaces Kafka + Flink with a single, cloud-native streaming platform. Understand the architecture, data flow, and why your data is always safe.
Why we built something new
Traditional: Kafka + Flink
• 3x storage for replication
• Two systems to operate
• Complex failover
• Stateful brokers = hard scaling
StreamHouse: Unified & Stateless
• 11 nines durability via S3
• One system, SQL built-in
• Instant failover (lease-based)
• Scale agents freely
The complete picture
StreamHouse separates compute from storage. Agents are stateless and can be killed or restarted instantly. All durability comes from S3 and PostgreSQL.
Agents 1–4 (ephemeral)
PostgreSQL
Metadata Store
• Topic & partition info
• Consumer offsets
• Segment locations
• Partition leases
S3 / Object Storage
Durable Data Store
• Compressed segment files
• 11 nines durability
• Infinite capacity
• $0.023/GB/month
From producer to consumer
Follow a message through the system and understand exactly how durability and performance are achieved at each step.
Write Path
Producer → S3
Producer sends records
Your application batches records and sends them to an agent via gRPC.
Write-Ahead Log (WAL)
Records are immediately written to local disk with CRC32 checksums. This is the durability checkpoint.
In-memory buffer
Records accumulate in memory with delta encoding and varint compression.
Segment finalization
When the buffer reaches ~64-256 MB, it's compressed with LZ4 and indexed for fast seeks.
S3 upload
Immutable segment file uploaded to S3. Metadata registered in PostgreSQL.
WAL truncated
After S3 confirms upload, WAL is safely cleared for that segment.
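The six write-path steps above can be sketched in miniature. This is an illustrative Python sketch, not the real API: `zlib` stands in for LZ4, in-memory lists stand in for the on-disk WAL and for S3, and the segment threshold is a record count rather than ~64-256 MB of data.

```python
import zlib  # zlib.crc32 for checksums; zlib.compress stands in for LZ4

WAL = []            # stands in for the on-disk write-ahead log
BUFFER = []         # in-memory segment buffer
OBJECT_STORE = []   # stands in for S3
SEGMENT_TARGET = 4  # records per segment, tiny for illustration

def append(record: bytes) -> None:
    # Durability checkpoint: checksum + WAL append happen before the
    # record would be acknowledged to the producer.
    WAL.append((zlib.crc32(record), record))
    # Accumulate in memory until the segment is large enough.
    BUFFER.append(record)
    if len(BUFFER) >= SEGMENT_TARGET:
        finalize_segment()

def finalize_segment() -> None:
    # Compress the finished segment and upload it; only after the upload
    # succeeds is the WAL truncated for this segment.
    segment = zlib.compress(b"|".join(BUFFER))
    OBJECT_STORE.append(segment)  # real system: S3 PUT + PostgreSQL metadata row
    WAL.clear()
    BUFFER.clear()
```

The ordering is the point: a record reaches the WAL before acknowledgment, and the WAL is cleared only after the object store has the segment, so there is no window where an acknowledged record exists only in memory.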
Read Path
S3 → Consumer
Consumer requests offset
Consumer asks for records starting from a specific offset via gRPC.
Segment index lookup
An O(log n) BTreeMap lookup finds which segment contains the offset, in under a microsecond.
Check local cache
LRU disk cache checked first. Cache hit = <1ms latency.
S3 download (if miss)
On cache miss, segment downloaded from S3 (50-200ms) and cached locally.
Decompress & decode
LZ4 decompression at 3.1M records/sec. Delta decoding restores original values.
Return records
Ordered records streamed back to consumer. Offset committed to PostgreSQL.
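The cache-then-S3 part of the read path can be sketched as follows. Everything here is illustrative: the object store and disk cache are dicts, the segment key layout (`topic/partition/base-offset.seg`) is a made-up naming scheme, and `zlib` stands in for LZ4.

```python
import zlib

# Object store and local disk cache, both modeled as dicts. The key layout
# is illustrative, not StreamHouse's real scheme.
S3 = {"orders/0/00000000.seg": zlib.compress(b"record-0|record-1|record-2")}
CACHE = {}

def read_segment(key: str) -> bytes:
    # Check the local cache first (hit = <1 ms).
    if key not in CACHE:
        # Cache miss: fetch the compressed segment from object storage
        # (50-200 ms), then keep it locally for subsequent reads.
        CACHE[key] = S3[key]
    # Decompress (LZ4 in StreamHouse; zlib as a stand-in here).
    return zlib.decompress(CACHE[key])

def read_records(key: str) -> list:
    # Decode and return the ordered records to the consumer.
    return read_segment(key).split(b"|")
```

Because segments are immutable, a cached copy can never go stale, which is what makes this simple read-through cache safe.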
Write-Ahead Log (WAL)
Before any record is acknowledged, it's written to a local Write-Ahead Log. This ensures zero data loss even if an agent crashes before uploading to S3.
CRC32 Checksums
Every record includes a checksum. Corrupted records are detected and skipped during recovery.
Configurable Fsync
Choose between Always (safest), Interval (100ms default), or Never (testing only).
Automatic Recovery
On agent restart, WAL is replayed to recover any unflushed records. Zero manual intervention.
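The checksum-and-replay behavior can be sketched with a simple framing format. The `[crc32][length][payload]` layout below is an assumption for illustration, not StreamHouse's actual on-disk format; a real WAL would also harden the length field against corruption.

```python
import struct
import zlib

def wal_encode(record: bytes) -> bytes:
    # One WAL entry: [crc32 (4 bytes)][length (4 bytes)][payload].
    return struct.pack("<II", zlib.crc32(record), len(record)) + record

def wal_replay(wal: bytes) -> list:
    # Replay entries in order, skipping any whose checksum does not match
    # (e.g. a torn write from a crash mid-append).
    recovered, pos = [], 0
    while pos + 8 <= len(wal):
        crc, length = struct.unpack_from("<II", wal, pos)
        payload = wal[pos + 8 : pos + 8 + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            recovered.append(payload)
        pos += 8 + length
    return recovered
```

Flipping one byte of a record's payload makes its checksum fail, so replay keeps the intact records and drops the corrupted one, with no manual intervention.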
Why it's still fast
Despite storing data in S3, StreamHouse achieves high performance through smart optimizations at every layer.
Delta Encoding + Varints
5-7 bytes saved per record
Instead of storing absolute values, we store differences. Sequential offsets become tiny deltas that compress to 1 byte.
| Field | Traditional | StreamHouse |
|---|---|---|
| Offset 1000 | 8 bytes | 2 bytes |
| Offset 1001 | 8 bytes | 1 byte (δ=1) |
| Timestamp | 8 bytes | 1-3 bytes |
| Total saved | — | ~70% |
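The byte counts in the table fall out of standard base-128 varint encoding over deltas, sketched below. This assumes deltas are non-negative (true for sequential offsets); a real implementation would likely zigzag-encode signed deltas such as timestamp jitter.

```python
def varint(n: int) -> bytes:
    # Base-128 varint: 7 payload bits per byte, high bit set on every
    # byte except the last. Small numbers fit in a single byte.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_offsets(offsets: list) -> bytes:
    # Delta encoding: store the first offset absolutely, then only the
    # differences between consecutive offsets.
    encoded = varint(offsets[0])
    for prev, cur in zip(offsets, offsets[1:]):
        encoded += varint(cur - prev)
    return encoded
```

Offset 1000 encodes to 2 bytes, and each subsequent offset in a sequential run costs a single byte (δ=1), versus 8 bytes apiece for fixed-width encoding.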
LZ4 Block Compression
80%+ space savings
Each ~1MB block is compressed independently. Fast enough to not be a bottleneck, effective enough to dramatically reduce storage and transfer costs.
Segment Index
O(log n) offset lookup
Binary search over block offsets without decompressing. Find any offset in microseconds, even in multi-GB segments.
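The block-level lookup amounts to a binary search over the index, sketched here with illustrative numbers (four 1 MB blocks of 5,000 records each; the real index lives inside the segment file).

```python
from bisect import bisect_right

# Per-segment index: the first record offset of each compressed block,
# and that block's byte position within the segment file.
BLOCK_FIRST_OFFSETS = [0, 5000, 10000, 15000]
BLOCK_BYTE_POSITIONS = [0, 1_048_576, 2_097_152, 3_145_728]

def locate_block(record_offset: int) -> int:
    # Binary search over block boundaries: pick the last block whose first
    # offset is <= the requested offset. Nothing is decompressed until the
    # single block containing the offset is known.
    i = bisect_right(BLOCK_FIRST_OFFSETS, record_offset) - 1
    return BLOCK_BYTE_POSITIONS[i]
```

A reader can then fetch and decompress just that one ~1MB block instead of the whole multi-GB segment.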
LRU Segment Cache
80%+ cache hit rate
Frequently accessed segments are cached on local disk. Sequential reads achieve excellent cache hit rates, avoiding S3 latency.
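The eviction policy is classic LRU, sketched below with `OrderedDict`. Capacity is counted in segments here for brevity; the real cache would be bounded by disk bytes.

```python
from collections import OrderedDict

class SegmentCache:
    """Tiny LRU cache keyed by segment name (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                      # miss -> caller fetches from S3
        self.entries.move_to_end(key)        # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

Sequential consumers keep re-reading the same few hot segments, which is why this simple policy reaches 80%+ hit rates in practice.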
How you won't lose data
Multiple layers of protection ensure your data is always safe, from agent crashes to S3 outages.
Write-Ahead Log
Every record hits local disk before acknowledgment. CRC32 checksums detect corruption. On crash, WAL replays automatically.
S3 Durability
Once segments upload to S3, you get 11 nines (99.999999999%) durability. Immutable files, no partial writes, optional multi-region.
PostgreSQL Metadata
All metadata durably stored: segment locations, consumer offsets, partition leases. Standard PostgreSQL replication for HA.
Failure Recovery Scenarios
Agent Crash Mid-Write
Partition Rebalancing
S3 Temporarily Down
Lease-based leadership
Instead of running a consensus system like Raft or ZooKeeper, StreamHouse uses simple time-based leases with epoch fencing to prevent split-brain scenarios.
30-Second Leases
Each partition has exactly one leader. Leases renewed every 10 seconds.
Epoch Fencing
Every write carries its epoch number. Writes from stale epochs are rejected, preventing duplicate writes from a deposed leader.
Automatic Failover
If the leader fails, its lease expires and any healthy agent can take over within 30 seconds.
Partition Lease Table
| Partition | Agent | Epoch | Status |
|---|---|---|---|
| orders:0 | agent-1 | 42 | Active |
| orders:1 | agent-2 | 38 | Active |
| orders:2 | agent-1 | 55 | Active |
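The lease-plus-fencing logic can be sketched as below. The lease table is a dict here (in StreamHouse it lives in PostgreSQL, where acquisition would be a transactional compare-and-swap); function names and the explicit `now` parameter are illustrative.

```python
from typing import Optional

LEASE_TTL = 30.0  # seconds a lease remains valid without renewal
LEASES = {}       # partition -> (agent, epoch, expires_at); the lease table

def try_acquire(partition: str, agent: str, now: float) -> Optional[int]:
    # A lease can be taken only if none exists or the old one has expired.
    holder = LEASES.get(partition)
    if holder is not None and holder[2] > now:
        return None                           # current leader still holds it
    epoch = holder[1] + 1 if holder else 1    # bump the epoch on every takeover
    LEASES[partition] = (agent, epoch, now + LEASE_TTL)
    return epoch

def accept_write(partition: str, agent: str, epoch: int, now: float) -> bool:
    # Epoch fencing: only the current epoch's leaseholder may write, so a
    # revived old leader (carrying a stale epoch) cannot duplicate writes.
    holder = LEASES.get(partition)
    return (holder is not None and holder[0] == agent
            and holder[1] == epoch and holder[2] > now)
```

Note how the epoch, not wall-clock timing alone, is what fences the old leader: even if agent-1 comes back after agent-2 takes over, its epoch-1 writes are rejected.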
Inside a segment file
Segment files are the immutable storage unit. Each contains thousands of records, compressed and indexed for fast random access.
S3 Object Hierarchy
PostgreSQL Schema
StreamHouse vs. Kafka + Flink
| Aspect | Kafka + Flink | StreamHouse |
|---|---|---|
| Storage | Local SSDs per broker | S3 (11 nines durability) |
| Compute model | Stateful brokers + tasks | Stateless agents |
| Coordination | ZooKeeper + Flink JobManager | PostgreSQL + leases |
| Scaling | Add brokers, rebalance | Spin up more agents instantly |
| Failover time | Minutes (leader election) | 30 seconds (lease expiration) |
| Recovery | Log replication + checkpoints | WAL replay + S3 |
| SQL processing | Separate Flink cluster | Built-in |
| Systems to operate | 2+ (Kafka, Flink, ZK) | 1 |
| Storage cost | 3x for replication | 1x (S3 handles durability) |