Deep Dive

How StreamHouse works

A complete guide to how StreamHouse replaces Kafka + Flink with a single, cloud-native streaming platform. Understand the architecture, data flow, and why your data is always safe.

The Challenge

Why we built something new

Traditional: Kafka + Flink

Kafka Brokers (stateful, local disks): Broker 1, Broker 2, Broker 3
Flink Cluster (separate system): TaskManager, TaskManager

• 3x storage for replication

• Two systems to operate

• Complex failover

• Stateful brokers = hard scaling

StreamHouse: Unified & Stateless

Stateless Agents (ephemeral compute): Agent 1, Agent 2, Agent 3
PostgreSQL (metadata only)
S3 Storage (all data)

• 11 nines durability via S3

• One system, SQL built-in

• Fast, lease-based failover (within 30s)

• Scale agents freely

Architecture

The complete picture

StreamHouse separates compute from storage. Agents are stateless and can be killed or restarted instantly. All durability comes from S3 and PostgreSQL.

Your Applications
  Rust SDK · Python SDK · Node SDK · Java SDK (gRPC / REST API)
        ↓ gRPC / HTTP
Stateless Agent Pool
  Agent 1 · Agent 2 · Agent 3 · Agent 4 (all ephemeral; add agents to scale freely)
        ↓ metadata                    ↓ segments

PostgreSQL

Metadata Store

• Topic & partition info

• Consumer offsets

• Segment locations

• Partition leases

S3 / Object Storage

Durable Data Store

• Compressed segment files

• 11 nines durability

• Infinite capacity

• $0.023/GB/month

Data Flow

From producer to consumer

Follow a message through the system and understand exactly how durability and performance are achieved at each step.

Write Path

Producer → S3

1. Producer sends records: your application batches records and sends them to an agent via gRPC.

2. Write-Ahead Log (WAL): records are immediately written to local disk with CRC32 checksums. This is the durability checkpoint.

3. In-memory buffer: records accumulate in memory with delta encoding and varint compression.

4. Segment finalization: when the buffer reaches ~64-256 MB, it is compressed with LZ4 and indexed for fast seeks.

5. S3 upload: the immutable segment file is uploaded to S3, and its metadata is registered in PostgreSQL.

6. WAL truncated: after S3 confirms the upload, the WAL is safely cleared for that segment.
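The steps above can be condensed into a short sketch. This is an illustrative Python model, not the actual implementation: the class and method names are invented, and zlib stands in for LZ4 so the example stays dependency-free.

```python
import zlib

# Illustrative sketch of write-path steps 2-6. Names and the configurable
# threshold are assumptions; zlib.compress stands in for LZ4.
class WritePath:
    def __init__(self, segment_target: int = 64 * 1024 * 1024):
        self.segment_target = segment_target
        self.wal = []      # step 2: (crc32, record) pairs, fsync'd in reality
        self.buffer = []   # step 3: in-memory record buffer

    def append(self, record: bytes) -> None:
        self.wal.append((zlib.crc32(record), record))  # durability checkpoint
        self.buffer.append(record)

    def maybe_finalize(self):
        if sum(len(r) for r in self.buffer) < self.segment_target:
            return None
        segment = zlib.compress(b"".join(self.buffer))  # step 4 (LZ4 in reality)
        # step 5 would upload `segment` to S3 and register it in PostgreSQL
        self.wal.clear()                                # step 6: truncate WAL
        self.buffer.clear()
        return segment
```

Note the ordering: the WAL entry exists before the record is acknowledged, and it is only cleared after the segment is safely out of the agent.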

Read Path

S3 → Consumer

1. Consumer requests offset: the consumer asks for records starting from a specific offset via gRPC.

2. Segment index lookup: an O(log n) BTreeMap lookup finds which segment contains the offset. Sub-microsecond.

3. Check local cache: the LRU disk cache is checked first. A cache hit means <1ms latency.

4. S3 download (if miss): on a cache miss, the segment is downloaded from S3 (50-200ms) and cached locally.

5. Decompress & decode: LZ4 decompression at 3.1M records/sec. Delta decoding restores the original values.

6. Return records: ordered records are streamed back to the consumer. The offset is committed to PostgreSQL.
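A minimal Python model of the read path, with invented names and plain dicts standing in for both S3 and the local disk cache:

```python
import bisect

# Hypothetical read-path sketch: index lookup (step 2), cache check (step 3),
# download on miss (step 4). Decompression and commit (steps 5-6) are elided.
class ReadPath:
    def __init__(self, index, s3):
        self.index = index   # sorted list of (base_offset, segment_key)
        self.s3 = s3         # segment_key -> records; stands in for S3
        self.cache = {}      # stands in for the local disk cache

    def fetch(self, offset):
        bases = [base for base, _ in self.index]
        i = bisect.bisect_right(bases, offset) - 1   # step 2: O(log n) lookup
        _, key = self.index[i]
        if key not in self.cache:                    # step 3: cache check
            self.cache[key] = self.s3[key]           # step 4: download on miss
        return self.cache[key]
```

A second fetch for the same segment never touches "S3", which is where the <1ms cache-hit latency comes from.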

Durability Layer

Write-Ahead Log (WAL)

Before any record is acknowledged, it's written to a local Write-Ahead Log. This ensures zero data loss even if an agent crashes before uploading to S3.

CRC32 Checksums

Every record includes a checksum. Corrupted records are detected and skipped during recovery.

Configurable Fsync

Choose between Always (safest), Interval (100ms default), or Never (testing only).

Automatic Recovery

On agent restart, WAL is replayed to recover any unflushed records. Zero manual intervention.

WAL Record Format

// Each record in the WAL file
Record Size       4 bytes
CRC32 Checksum    4 bytes
Timestamp (ms)    8 bytes
Key Size          4 bytes
Key Data          N bytes
Value Size        4 bytes
Value Data        M bytes

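The layout above can be exercised with a small encoder/decoder. This Python sketch assumes big-endian fields and that the CRC32 covers everything after the checksum; both are illustrative guesses, not the actual wire format.

```python
import struct
import zlib

# Assumed framing: size(4) | crc32(4) | timestamp(8) | klen(4) | key | vlen(4) | value
def encode_wal_record(timestamp_ms: int, key: bytes, value: bytes) -> bytes:
    payload = struct.pack(">q", timestamp_ms)
    payload += struct.pack(">I", len(key)) + key
    payload += struct.pack(">I", len(value)) + value
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

def decode_wal_record(buf: bytes):
    size, crc = struct.unpack_from(">II", buf, 0)
    payload = buf[8:8 + size]
    if zlib.crc32(payload) != crc:
        # mirrors recovery behavior: corrupted records are detected and skipped
        raise ValueError("corrupted WAL record")
    ts, = struct.unpack_from(">q", payload, 0)
    klen, = struct.unpack_from(">I", payload, 8)
    key = payload[12:12 + klen]
    vlen, = struct.unpack_from(">I", payload, 12 + klen)
    value = payload[16 + klen:16 + klen + vlen]
    return ts, key, value
```

Flipping a single byte anywhere in the payload makes the checksum mismatch, which is exactly what recovery relies on.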
Performance

Why it's still fast

Despite storing data in S3, StreamHouse achieves high performance through smart optimizations at every layer.

Delta Encoding + Varints

5-7 bytes saved per record

Instead of storing absolute values, we store differences. Sequential offsets become tiny deltas that compress to 1 byte.

Field         Traditional   StreamHouse
Offset 1000   8 bytes       2 bytes
Offset 1001   8 bytes       1 byte (δ=1)
Timestamp     8 bytes       1-3 bytes
Total saved                 ~70%
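The table's savings come from the combination sketched below: store the delta between consecutive values, then encode each delta as a LEB128-style varint (the exact varint flavor is an assumption; deltas are taken as non-negative since offsets only increase). Sequential offsets become deltas of 1, which fit in a single byte.

```python
# Unsigned LEB128-style varint: 7 payload bits per byte, high bit = "more follows".
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

# Delta encoding: store differences between consecutive offsets, not the offsets.
def encode_offsets(offsets) -> bytes:
    prev, out = 0, bytearray()
    for off in offsets:
        out += encode_varint(off - prev)
        prev = off
    return bytes(out)
```

Four raw 8-byte offsets would take 32 bytes; delta + varint encodes [1000, 1001, 1002, 1003] in 5 bytes (2 for the first delta, 1 for each subsequent δ=1).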

LZ4 Block Compression

80%+ space savings

Each ~1MB block is compressed independently. Fast enough to not be a bottleneck, effective enough to dramatically reduce storage and transfer costs.

2.26M records/sec write
3.10M records/sec read
80%+ compression ratio
640µs seek to offset

Segment Index

O(log n) offset lookup

Binary search over block offsets without decompressing. Find any offset in microseconds, even in multi-GB segments.

// Segment index structure
Block 0: offset=0,     pos=64
Block 1: offset=10000, pos=1048640
Block 2: offset=20000, pos=2097216

// Find offset 15000:
binary_search → Block 1
seek(1048640) → decompress
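The same lookup in executable form, using the example index above (Python's `bisect` supplies the binary search):

```python
import bisect

# Example segment index: (base_offset, file_pos) per compressed block.
index = [(0, 64), (10000, 1048640), (20000, 2097216)]

def find_block(target_offset: int):
    # Last block whose base offset is <= the target: O(log n), no decompression.
    bases = [base for base, _ in index]
    i = bisect.bisect_right(bases, target_offset) - 1
    return i, index[i][1]

block, pos = find_block(15000)
# block == 1, pos == 1048640: seek there and decompress only that block
```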

LRU Segment Cache

80%+ cache hit rate

Frequently accessed segments are cached on local disk. Sequential reads achieve excellent cache hit rates, avoiding S3 latency.

Cache hit latency         <1ms
Cache miss latency        50-200ms
Sequential read hit rate  >80%
Automatic prefetching     ✓ Enabled

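The eviction policy can be sketched with an OrderedDict. The real cache lives on local disk and is bounded by bytes; this count-bounded, in-memory version only shows the LRU mechanics.

```python
from collections import OrderedDict

# Minimal LRU cache sketch: most recently used entries survive, the least
# recently used entry is evicted when capacity is exceeded.
class SegmentCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # segment_key -> segment bytes

    def get(self, key):
        if key not in self.entries:
            return None                # miss: caller downloads from S3
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, segment) -> None:
        self.entries[key] = segment
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```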
Reliability

How you won't lose data

Multiple layers of protection ensure your data is always safe, from agent crashes to S3 outages.

1. Write-Ahead Log: every record hits local disk before acknowledgment. CRC32 checksums detect corruption. On crash, the WAL replays automatically.
   Recovery scenario: Agent crashes → Restart → WAL replay → Zero data loss

2. S3 Durability: once segments upload to S3, you get 11 nines (99.999999999%) durability. Immutable files, no partial writes, optional multi-region.
   What this means: lose, on average, 1 object in 10 million every 10,000 years

3. PostgreSQL Metadata: all metadata is durably stored: segment locations, consumer offsets, partition leases. Standard PostgreSQL replication for HA.
   Tracked data: offsets, watermarks, leases, segment registry

Failure Recovery Scenarios

Agent Crash Mid-Write: Records in WAL → Agent restarts → WAL replayed → Zero data loss

Partition Rebalancing: Agent 1 dies → Lease expires (30s) → Agent 2 acquires lease → Seamless failover

S3 Temporarily Down: S3 unavailable → Circuit breaker activates → Records buffer in WAL → S3 recovers, WAL drains

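The "S3 Temporarily Down" chain hinges on a circuit breaker. A minimal sketch of one (the threshold and cooldown values, and all names, are illustrative):

```python
# Minimal circuit breaker: after `threshold` consecutive upload failures it
# opens, upload attempts stop, and records keep buffering in the WAL; after
# the cooldown it half-opens to probe S3, and a success closes it again.
class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True                                  # closed: uploads flow
        return now - self.opened_at >= self.cooldown_s   # half-open probe

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now       # open: buffer in WAL instead

    def record_success(self) -> None:
        self.failures, self.opened_at = 0, None  # close: WAL can drain
```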
Coordination

Lease-based leadership

Instead of running a separate consensus system such as ZooKeeper or a Raft-based quorum, StreamHouse uses simple time-based leases with epoch fencing to prevent split-brain scenarios.

30-Second Leases

Each partition has exactly one leader. Leases renewed every 10 seconds.

Epoch Fencing

Every write includes epoch number. Stale epochs are rejected, preventing duplicate writes.

Automatic Failover

If leader fails, lease expires and any healthy agent can take over within 30 seconds.

Partition Lease Table

Partition   Agent     Epoch   Status
orders:0    agent-1   42      Active
orders:1    agent-2   38      Active
orders:2    agent-1   55      Active
How epoch fencing works:
Agent-2 (epoch=6) is the current leader
Agent-1 (epoch=5) tries to write → REJECTED (stale epoch)
Agent-2 (epoch=6) writes → ACCEPTED

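The fencing rule itself is tiny. In sketch form (in production this check would be a conditional UPDATE against the lease table in PostgreSQL, not an in-memory dict):

```python
# Epoch fencing sketch matching the example above: the metadata store tracks
# the current epoch per partition; writes tagged with a stale epoch are rejected.
current_epoch = {"orders:0": 6}   # agent-2 holds the lease at epoch 6

def try_write(partition: str, epoch: int) -> str:
    if epoch < current_epoch.get(partition, 0):
        return "REJECTED"          # stale leader fenced off, no duplicate write
    current_epoch[partition] = epoch
    return "ACCEPTED"
```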
Storage Format

Inside a segment file

Segment files are the immutable storage unit. Each contains thousands of records, compressed and indexed for fast random access.

segment_00000000000000000000.seg

Header (64 bytes): Magic "STRM" · Version 1 · Compression LZ4 · Records 100,000
Block 0 (~1MB compressed): offsets 0-33332
Block 1 (~1MB compressed): offsets 33333-66665
Block 2 (~1MB compressed): offsets 66666-99998
Index:
  Block 0 → offset=0,     pos=64
  Block 1 → offset=33333, pos=1048640
  Block 2 → offset=66666, pos=2097216
Footer (32 bytes): Index pos 3145728 · CRC32 0xA1B2C3D4 · Magic "STRM"

S3 Object Hierarchy

s3://streamhouse-bucket/
└── org-{org_id}/
    └── data/
        ├── orders/
        │   ├── 0/
        │   │   ├── 00000000000000000000.seg
        │   │   ├── 00000000000000100000.seg
        │   │   └── ...
        │   └── 1/
        │       └── ...
        └── users/
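A hypothetical helper showing why segment files are named by zero-padded base offset: the padding makes keys sort lexicographically in offset order under each partition prefix. The function name and exact path template are assumptions based on the hierarchy above.

```python
# Build the S3 key for a segment. Zero-padding to 20 digits means plain string
# ordering of keys matches numeric ordering of base offsets.
def segment_key(org_id: str, topic: str, partition: int, base_offset: int) -> str:
    return f"org-{org_id}/data/{topic}/{partition}/{base_offset:020d}.seg"
```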

PostgreSQL Schema

segments:          topic, partition, base_offset, end_offset, s3_key, size_bytes
consumer_offsets:  group_id, topic, partition, committed_offset, committed_at
partition_leases:  topic, partition, agent_id, lease_epoch, expires_at

Comparison

StreamHouse vs. Kafka + Flink

Aspect               Kafka + Flink                   StreamHouse
Storage              Local SSDs per broker           S3 (11 nines durability)
Compute model        Stateful brokers + tasks        Stateless agents
Coordination         ZooKeeper + Flink JobManager    PostgreSQL + leases
Scaling              Add brokers, rebalance          Spin up more agents instantly
Failover time        Minutes (leader election)       30 seconds (lease expiration)
Recovery             Log replication + checkpoints   WAL replay + S3
SQL processing       Separate Flink cluster          Built-in
Systems to operate   2+ (Kafka, Flink, ZooKeeper)    1
Storage cost         3x for replication              1x (S3 handles durability)

Ready to simplify your streaming?

Get started in minutes. No complex configuration, no 50-page runbooks.