Deep Dive

How StreamHouse works

A complete guide to how StreamHouse replaces Kafka + Flink with a single, cloud-native streaming platform. Understand the architecture, data flow, and why your data is always safe.

The Challenge

Why we built something new

Traditional: Kafka + Flink

Kafka Brokers (stateful, local disks): Broker 1, Broker 2, Broker 3
Flink Cluster (separate system): TaskManager, TaskManager

• 3x storage for replication

• Two systems to operate

• Complex failover

• Stateful brokers = hard scaling

StreamHouse: Unified & Stateless

Stateless Agents (ephemeral compute): Agent 1, Agent 2, Agent 3
PostgreSQL (metadata only)
S3 Storage (all data)

• 11 nines durability via S3

• One system, SQL built-in

• Fast, lease-based failover (within 30s)

• Scale agents freely

Architecture

The complete picture

StreamHouse separates compute from storage. Agents are stateless and can be killed or restarted instantly. All durability comes from S3 and PostgreSQL.

Your Applications
  Rust SDK · Python SDK · Node SDK · Java SDK (gRPC / REST API)
        ↓ gRPC / HTTP
Stateless Agent Pool
  Agent 1 · Agent 2 · Agent 3 · Agent 4 (all ephemeral; add agents to scale freely)
        ↓ metadata                    ↓ segments

PostgreSQL

Metadata Store

• Topic & partition info

• Consumer offsets

• Segment locations

• Partition leases

S3 / Object Storage

Durable Data Store

• Compressed segment files

• 11 nines durability

• Infinite capacity

• $0.023/GB/month

Data Flow

From producer to consumer

Follow a message through the system and understand exactly how durability and performance are achieved at each step.

Write Path

Producer → S3

1. Producer sends records: your application batches records and sends them to an agent via gRPC.

2. Write-Ahead Log (WAL): records are immediately written to local disk with CRC32 checksums. This is the durability checkpoint.

3. In-memory buffer: records accumulate in memory with delta encoding and varint compression.

4. Segment finalization: when the buffer reaches ~64-256 MB, it is compressed with LZ4 and indexed for fast seeks.

5. S3 upload: the immutable segment file is uploaded to S3, and its metadata is registered in PostgreSQL.

6. WAL truncated: after S3 confirms the upload, the WAL is safely cleared for that segment.
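The steps above can be condensed into a short sketch. This is an illustrative Python model, not the actual implementation: the class and method names are invented, and zlib stands in for LZ4 so the example stays dependency-free.

```python
import zlib

# Illustrative sketch of write-path steps 2-6. Names and the configurable
# threshold are assumptions; zlib.compress stands in for LZ4.
class WritePath:
    def __init__(self, segment_target: int = 64 * 1024 * 1024):
        self.segment_target = segment_target
        self.wal = []      # step 2: (crc32, record) pairs, fsync'd in reality
        self.buffer = []   # step 3: in-memory record buffer

    def append(self, record: bytes) -> None:
        self.wal.append((zlib.crc32(record), record))  # durability checkpoint
        self.buffer.append(record)

    def maybe_finalize(self):
        if sum(len(r) for r in self.buffer) < self.segment_target:
            return None
        segment = zlib.compress(b"".join(self.buffer))  # step 4 (LZ4 in reality)
        # step 5 would upload `segment` to S3 and register it in PostgreSQL
        self.wal.clear()                                # step 6: truncate WAL
        self.buffer.clear()
        return segment
```

Note the ordering: the WAL entry exists before the record is acknowledged, and it is only cleared after the segment is safely out of the agent.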

Read Path

S3 → Consumer

1. Consumer requests offset: the consumer asks for records starting from a specific offset via gRPC.

2. Segment index lookup: an O(log n) BTreeMap lookup finds which segment contains the offset. Sub-microsecond.

3. Check local cache: the LRU disk cache is checked first. A cache hit means <1ms latency.

4. S3 download (if miss): on a cache miss, the segment is downloaded from S3 (50-200ms) and cached locally.

5. Decompress & decode: LZ4 decompression at 3.1M records/sec. Delta decoding restores the original values.

6. Return records: ordered records are streamed back to the consumer. The offset is committed to PostgreSQL.
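A minimal Python model of the read path, with invented names and plain dicts standing in for both S3 and the local disk cache:

```python
import bisect

# Hypothetical read-path sketch: index lookup (step 2), cache check (step 3),
# download on miss (step 4). Decompression and commit (steps 5-6) are elided.
class ReadPath:
    def __init__(self, index, s3):
        self.index = index   # sorted list of (base_offset, segment_key)
        self.s3 = s3         # segment_key -> records; stands in for S3
        self.cache = {}      # stands in for the local disk cache

    def fetch(self, offset):
        bases = [base for base, _ in self.index]
        i = bisect.bisect_right(bases, offset) - 1   # step 2: O(log n) lookup
        _, key = self.index[i]
        if key not in self.cache:                    # step 3: cache check
            self.cache[key] = self.s3[key]           # step 4: download on miss
        return self.cache[key]
```

A second fetch for the same segment never touches "S3", which is where the <1ms cache-hit latency comes from.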

Durability Layer

Write-Ahead Log (WAL)

Before any record is acknowledged, it's written to a local Write-Ahead Log. This ensures zero data loss even if an agent crashes before uploading to S3.

CRC32 Checksums

Every record includes a checksum. Corrupted records are detected and skipped during recovery.

Configurable Fsync

Choose between Always (safest), Interval (100ms default), or Never (testing only).

Automatic Recovery

On agent restart, WAL is replayed to recover any unflushed records. Zero manual intervention.

WAL Record Format

// Each record in the WAL file
Record Size       4 bytes
CRC32 Checksum    4 bytes
Timestamp (ms)    8 bytes
Key Size          4 bytes
Key Data          N bytes
Value Size        4 bytes
Value Data        M bytes

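The layout above can be exercised with a small encoder/decoder. This Python sketch assumes big-endian fields and that the CRC32 covers everything after the checksum; both are illustrative guesses, not the actual wire format.

```python
import struct
import zlib

# Assumed framing: size(4) | crc32(4) | timestamp(8) | klen(4) | key | vlen(4) | value
def encode_wal_record(timestamp_ms: int, key: bytes, value: bytes) -> bytes:
    payload = struct.pack(">q", timestamp_ms)
    payload += struct.pack(">I", len(key)) + key
    payload += struct.pack(">I", len(value)) + value
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

def decode_wal_record(buf: bytes):
    size, crc = struct.unpack_from(">II", buf, 0)
    payload = buf[8:8 + size]
    if zlib.crc32(payload) != crc:
        # mirrors recovery behavior: corrupted records are detected and skipped
        raise ValueError("corrupted WAL record")
    ts, = struct.unpack_from(">q", payload, 0)
    klen, = struct.unpack_from(">I", payload, 8)
    key = payload[12:12 + klen]
    vlen, = struct.unpack_from(">I", payload, 12 + klen)
    value = payload[16 + klen:16 + klen + vlen]
    return ts, key, value
```

Flipping a single byte anywhere in the payload makes the checksum mismatch, which is exactly what recovery relies on.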
Performance

Why it's still fast

Despite storing data in S3, StreamHouse achieves high performance through smart optimizations at every layer.

Delta Encoding + Varints

5-7 bytes saved per record

Instead of storing absolute values, we store differences. Sequential offsets become tiny deltas that compress to 1 byte.

Field         Traditional   StreamHouse
Offset 1000   8 bytes       2 bytes
Offset 1001   8 bytes       1 byte (δ=1)
Timestamp     8 bytes       1-3 bytes
Total saved                 ~70%
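The table's savings come from the combination sketched below: store the delta between consecutive values, then encode each delta as a LEB128-style varint (the exact varint flavor is an assumption; deltas are taken as non-negative since offsets only increase). Sequential offsets become deltas of 1, which fit in a single byte.

```python
# Unsigned LEB128-style varint: 7 payload bits per byte, high bit = "more follows".
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

# Delta encoding: store differences between consecutive offsets, not the offsets.
def encode_offsets(offsets) -> bytes:
    prev, out = 0, bytearray()
    for off in offsets:
        out += encode_varint(off - prev)
        prev = off
    return bytes(out)
```

Four raw 8-byte offsets would take 32 bytes; delta + varint encodes [1000, 1001, 1002, 1003] in 5 bytes (2 for the first delta, 1 for each subsequent δ=1).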

LZ4 Block Compression

80%+ space savings

Each ~1MB block is compressed independently. Fast enough to not be a bottleneck, effective enough to dramatically reduce storage and transfer costs.

2.26M records/sec write
3.10M records/sec read
80%+ compression ratio
640µs seek to offset

Segment Index

O(log n) offset lookup

Binary search over block offsets without decompressing. Find any offset in microseconds, even in multi-GB segments.

// Segment index structure
Block 0: offset=0,     pos=64
Block 1: offset=10000, pos=1048640
Block 2: offset=20000, pos=2097216

// Find offset 15000:
binary_search → Block 1
seek(1048640) → decompress
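The same lookup in executable form, using the example index above (Python's `bisect` supplies the binary search):

```python
import bisect

# Example segment index: (base_offset, file_pos) per compressed block.
index = [(0, 64), (10000, 1048640), (20000, 2097216)]

def find_block(target_offset: int):
    # Last block whose base offset is <= the target: O(log n), no decompression.
    bases = [base for base, _ in index]
    i = bisect.bisect_right(bases, target_offset) - 1
    return i, index[i][1]

block, pos = find_block(15000)
# block == 1, pos == 1048640: seek there and decompress only that block
```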

LRU Segment Cache

80%+ cache hit rate

Frequently accessed segments are cached on local disk. Sequential reads achieve excellent cache hit rates, avoiding S3 latency.

Cache hit latency         <1ms
Cache miss latency        50-200ms
Sequential read hit rate  >80%
Automatic prefetching     ✓ Enabled

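The eviction policy can be sketched with an OrderedDict. The real cache lives on local disk and is bounded by bytes; this count-bounded, in-memory version only shows the LRU mechanics.

```python
from collections import OrderedDict

# Minimal LRU cache sketch: most recently used entries survive, the least
# recently used entry is evicted when capacity is exceeded.
class SegmentCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # segment_key -> segment bytes

    def get(self, key):
        if key not in self.entries:
            return None                # miss: caller downloads from S3
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, segment) -> None:
        self.entries[key] = segment
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```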
Reliability

How you won't lose data

Multiple layers of protection ensure your data is always safe, from agent crashes to S3 outages.

1. Write-Ahead Log: every record hits local disk before acknowledgment. CRC32 checksums detect corruption. On crash, the WAL replays automatically.
   Recovery scenario: Agent crashes → Restart → WAL replay → Zero data loss

2. S3 Durability: once segments upload to S3, you get 11 nines (99.999999999%) durability. Immutable files, no partial writes, optional multi-region.
   What this means: lose, on average, 1 object in 10 million every 10,000 years

3. PostgreSQL Metadata: all metadata is durably stored: segment locations, consumer offsets, partition leases. Standard PostgreSQL replication for HA.
   Tracked data: offsets, watermarks, leases, segment registry

Failure Recovery Scenarios

Agent Crash Mid-Write: Records in WAL → Agent restarts → WAL replayed → Zero data loss

Partition Rebalancing: Agent 1 dies → Lease expires (30s) → Agent 2 acquires lease → Seamless failover

S3 Temporarily Down: S3 unavailable → Circuit breaker activates → Records buffer in WAL → S3 recovers, WAL drains

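The "S3 Temporarily Down" chain hinges on a circuit breaker. A minimal sketch of one (the threshold and cooldown values, and all names, are illustrative):

```python
# Minimal circuit breaker: after `threshold` consecutive upload failures it
# opens, upload attempts stop, and records keep buffering in the WAL; after
# the cooldown it half-opens to probe S3, and a success closes it again.
class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True                                  # closed: uploads flow
        return now - self.opened_at >= self.cooldown_s   # half-open probe

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now       # open: buffer in WAL instead

    def record_success(self) -> None:
        self.failures, self.opened_at = 0, None  # close: WAL can drain
```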
Coordination

Lease-based leadership

Instead of running a separate consensus system such as ZooKeeper or a Raft-based quorum, StreamHouse uses simple time-based leases with epoch fencing to prevent split-brain scenarios.

30-Second Leases

Each partition has exactly one leader. Leases renewed every 10 seconds.

Epoch Fencing

Every write includes epoch number. Stale epochs are rejected, preventing duplicate writes.

Automatic Failover

If leader fails, lease expires and any healthy agent can take over within 30 seconds.

Partition Lease Table

Partition   Agent     Epoch   Status
orders:0    agent-1   42      Active
orders:1    agent-2   38      Active
orders:2    agent-1   55      Active
How epoch fencing works:
Agent-2 (epoch=6) is the current leader
Agent-1 (epoch=5) tries to write → REJECTED (stale epoch)
Agent-2 (epoch=6) writes → ACCEPTED

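The fencing rule itself is tiny. In sketch form (in production this check would be a conditional UPDATE against the lease table in PostgreSQL, not an in-memory dict):

```python
# Epoch fencing sketch matching the example above: the metadata store tracks
# the current epoch per partition; writes tagged with a stale epoch are rejected.
current_epoch = {"orders:0": 6}   # agent-2 holds the lease at epoch 6

def try_write(partition: str, epoch: int) -> str:
    if epoch < current_epoch.get(partition, 0):
        return "REJECTED"          # stale leader fenced off, no duplicate write
    current_epoch[partition] = epoch
    return "ACCEPTED"
```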
Storage Format

Inside a segment file

Segment files are the immutable storage unit. Each contains thousands of records, compressed and indexed for fast random access.

segment_00000000000000000000.seg

Header (64 bytes): Magic "STRM" · Version 1 · Compression LZ4 · Records 100,000
Block 0 (~1MB compressed): offsets 0-33332
Block 1 (~1MB compressed): offsets 33333-66665
Block 2 (~1MB compressed): offsets 66666-99998
Index:
  Block 0 → offset=0,     pos=64
  Block 1 → offset=33333, pos=1048640
  Block 2 → offset=66666, pos=2097216
Footer (32 bytes): Index pos 3145728 · CRC32 0xA1B2C3D4 · Magic "STRM"

S3 Object Hierarchy

s3://streamhouse-bucket/
└── org-{org_id}/
    └── data/
        ├── orders/
        │   ├── 0/
        │   │   ├── 00000000000000000000.seg
        │   │   ├── 00000000000000100000.seg
        │   │   └── ...
        │   └── 1/
        │       └── ...
        └── users/
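A hypothetical helper showing why segment files are named by zero-padded base offset: the padding makes keys sort lexicographically in offset order under each partition prefix. The function name and exact path template are assumptions based on the hierarchy above.

```python
# Build the S3 key for a segment. Zero-padding to 20 digits means plain string
# ordering of keys matches numeric ordering of base offsets.
def segment_key(org_id: str, topic: str, partition: int, base_offset: int) -> str:
    return f"org-{org_id}/data/{topic}/{partition}/{base_offset:020d}.seg"
```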

PostgreSQL Schema

segments:          topic, partition, base_offset, end_offset, s3_key, size_bytes
consumer_offsets:  group_id, topic, partition, committed_offset, committed_at
partition_leases:  topic, partition, agent_id, lease_epoch, expires_at

Comparison

StreamHouse vs. Kafka + Flink

Aspect               Kafka + Flink                   StreamHouse
Storage              Local SSDs per broker           S3 (11 nines durability)
Compute model        Stateful brokers + tasks        Stateless agents
Coordination         ZooKeeper + Flink JobManager    PostgreSQL + leases
Scaling              Add brokers, rebalance          Spin up more agents instantly
Failover time        Minutes (leader election)       30 seconds (lease expiration)
Recovery             Log replication + checkpoints   WAL replay + S3
SQL processing       Separate Flink cluster          Built-in
Systems to operate   2+ (Kafka, Flink, ZooKeeper)    1
Storage cost         3x for replication              1x (S3 handles durability)

Ready to simplify your streaming?

Get started in minutes. No complex configuration, no 50-page runbooks.