Every Event Has a Home
When you produce a message to StreamHouse, it doesn't just vanish into the cloud. It follows a precise, deterministic path: from your producer, through an agent's memory buffer, into a compressed binary segment, and finally into an S3 object where it will live with 11 nines of durability.
This post walks through exactly what happens to your bytes at every step.
The Storage Hierarchy
StreamHouse organizes data in four levels:
Records → Blocks → Segments → Partitions
   │         │         │           │
Single     ~1MB      ~64MB      Ordered
event     batches   S3 files      log
Records are individual events — a key, value, timestamp, and optional headers. Blocks group ~1MB of records together for compression. Segments are the unit of storage on S3, typically 64MB containing many blocks. Partitions are the ordered sequence of segments that form a topic's log.
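In code, the four levels can be sketched with a few plain data types. These names and fields are illustrative, inferred from the description above, not StreamHouse's actual structs:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A single event: key, value, timestamp, optional headers."""
    key: bytes
    value: bytes
    timestamp_ms: int
    headers: dict[str, bytes] = field(default_factory=dict)

@dataclass
class Block:
    """~1MB of records, grouped for compression."""
    records: list[Record]

@dataclass
class Segment:
    """~64MB S3 object holding many blocks, covering an offset range."""
    blocks: list[Block]
    start_offset: int
    end_offset: int

@dataclass
class Partition:
    """A topic partition: the ordered sequence of segments forming the log."""
    segments: list[Segment]
```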
The Segment Binary Format
Every segment is a self-contained, immutable file with four sections:
┌─────────────────────────────────┐
│ Header (64 bytes) │
│ - Magic: 0x5348 ("SH") │
│ - Version: u16 │
│ - Flags: u32 │
│ - Compression: u8 │
│ - Created timestamp: i64 │
│ - Record count: u64 │
│ - Start/End offset: u64 │
├─────────────────────────────────┤
│ Block 0 │
│ - Compressed records (~1MB) │
│ - CRC32 checksum │
├─────────────────────────────────┤
│ Block 1 │
│ - Compressed records (~1MB) │
│ - CRC32 checksum │
├─────────────────────────────────┤
│ ...more blocks... │
├─────────────────────────────────┤
│ Sparse Index │
│ - Offset → byte position │
│ - One entry per block │
├─────────────────────────────────┤
│ Footer (16 bytes) │
│ - Index offset: u64 │
│ - File CRC32: u32 │
│ - Magic: 0x454E ("EN") │
└─────────────────────────────────┘
The header is read first to verify the file and understand its contents. The footer is read to locate the index. The index maps offsets to byte positions so consumers can jump directly to the block containing a target offset without scanning the entire file.
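The index lookup amounts to finding the last block whose first offset is at or below the target. A minimal sketch, assuming the sparse index has already been decoded into sorted (first offset, byte position) pairs:

```python
import bisect

def locate_block(index: list[tuple[int, int]], target_offset: int) -> int:
    """Return the byte position of the block containing target_offset.

    index holds one (first offset in block, byte position) entry per block,
    sorted by offset -- the sparse index described above.
    """
    # Last entry whose first offset is <= target_offset.
    i = bisect.bisect_right(index, (target_offset, float("inf"))) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} precedes this segment")
    return index[i][1]

# Three blocks starting at offsets 0, 1000, 2000:
index = [(0, 64), (1000, 1_048_640), (2000, 2_097_216)]
locate_block(index, 1500)  # -> 1048640, the block starting at offset 1000
```

One seek into the segment instead of a scan: the consumer fetches only the block at that byte position and decompresses it.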
Record Encoding
Inside each block, records use varint encoding and delta compression for maximum space efficiency:
┌─────────────────────────────────┐
│ Record │
│ - Offset delta (varint) │
│ - Timestamp delta (varint) │
│ - Key length (varint) │
│ - Key bytes │
│ - Value length (varint) │
│ - Value bytes │
│ - Header count (varint) │
│ - Header entries │
└─────────────────────────────────┘
Instead of storing absolute offsets and timestamps, we store deltas from the previous record. A sequence of offsets like [1000, 1001, 1002, 1003] becomes [1000, 1, 1, 1] — each delta of 1 encodes as a single-byte varint (only the initial 1000 needs two bytes).
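A minimal sketch of the delta-plus-varint scheme, using LEB128-style varints (7 payload bits per byte, high bit as the continuation flag); the exact wire encoding StreamHouse uses may differ:

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style varint: 7 bits per byte, high bit = "more bytes follow"."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_offsets(offsets: list[int]) -> bytes:
    """Delta-encode a run of ascending offsets, then varint each delta."""
    out = bytearray()
    prev = 0
    for off in offsets:
        out += encode_varint(off - prev)
        prev = off
    return bytes(out)

encode_offsets([1000, 1001, 1002, 1003])
# 5 bytes total: two for the initial 1000, one for each delta of 1
```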
Why LZ4?
We chose LZ4 as the default compression algorithm after extensive benchmarking:
- JSON payloads: 4.3x compression ratio, 3.2 GB/s decompression
- Protobuf payloads: 1.4x ratio, 3.8 GB/s decompression
- Text logs: 8x ratio, 2.9 GB/s decompression
LZ4 decompresses at nearly memory bandwidth speed, which matters because consumers need to decompress segments on every read. We also support Zstd for workloads where storage cost matters more than latency — it achieves roughly 2x better compression at 5-10x slower decompression.
# Choose compression per topic
streamctl topic create --name logs --partitions 6 --compression lz4
streamctl topic create --name archive --partitions 3 --compression zstd
Why 64MB Segments?
The 64MB target isn't arbitrary. It balances three forces: S3 operation cost, metadata overhead, and read amplification.

Smaller segments mean:
- More S3 PUT operations = higher cost ($0.005 per 1,000 PUTs)
- More metadata entries in PostgreSQL
- Better read granularity for small range queries

Larger segments mean:
- Fewer S3 operations = lower cost
- Less metadata overhead
- Higher read amplification — consumers must download more unused data
At 64MB with LZ4, a typical segment holds 100K-500K records and costs ~$0.000005 to PUT. The segment flushes every 10 seconds or when the buffer hits 64MB, whichever comes first.
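The arithmetic is easy to check. A back-of-envelope sketch, assuming one PUT per full 64MB segment and a 30-day month:

```python
PUT_COST = 0.005 / 1000   # $ per S3 PUT ($0.005 per 1,000 PUTs)
SEGMENT_MB = 64

def monthly_put_cost(gb_per_day: float) -> float:
    """Monthly S3 PUT cost if every segment is flushed as a single 64MB PUT."""
    segments_per_day = gb_per_day * 1024 / SEGMENT_MB
    return segments_per_day * 30 * PUT_COST

monthly_put_cost(100)  # 100 GB/day -> 1,600 PUTs/day -> ~$0.24/month
```

Halve the segment size and the PUT bill doubles; at 64MB, request cost is noise next to storage cost.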
CRC32 at Every Level
Data integrity is non-negotiable. StreamHouse computes CRC32 checksums at three levels:
- Per-block: Each compressed block has a CRC32 of its compressed bytes. If a block fails validation during read, the agent retries the S3 fetch.
- Per-file: The footer contains a CRC32 of the entire segment. Corrupted segments are detected during any read operation.
- Per-record (WAL): The Write-Ahead Log checksums every individual record before it enters the buffer.
If any checksum fails, StreamHouse rejects the data and logs a corruption event rather than serving bad records.
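The per-block check might look like this, using Python's zlib.crc32 as a stand-in for the agent's CRC32 routine (names and error handling are simplified sketches, not the real implementation):

```python
import zlib

def validate_block(compressed: bytes, stored_crc: int) -> bytes:
    """Verify a block's CRC32 before decompressing.

    On mismatch, raise instead of returning data -- the real agent would
    log a corruption event and retry the S3 fetch rather than serve
    bad records.
    """
    if zlib.crc32(compressed) != stored_crc:
        raise ValueError("block CRC32 mismatch: refusing to serve bad records")
    return compressed

payload = b"compressed block bytes"
validate_block(payload, zlib.crc32(payload))  # passes, returns the bytes
```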
The Lifecycle of a Segment
1. Buffering: Records arrive via gRPC/HTTP and enter the agent's in-memory SegmentBuffer, organized by partition
2. Flushing: When the buffer reaches 64MB or 10 seconds elapse, the agent compresses blocks with LZ4, builds the index, and computes checksums
3. Upload: The segment is uploaded to S3 as a single PUT operation
4. Registration: The agent records the segment's S3 path, offset range, and size in the PostgreSQL metadata store
5. Sealing: The segment is now immutable — it will never be modified, only eventually deleted by retention policies
S3 Path Layout
Segments are organized in S3 with a predictable path structure:
s3://streamhouse-data/
  topics/
    user-events/
      partitions/
        0/
          segments/
            00000000-00000999.seg
            00001000-00001999.seg
        1/
          segments/
            00000000-00000499.seg
This structure enables efficient prefix listing when an agent needs to discover segments for a partition, and makes it easy to configure S3 lifecycle rules for cost optimization.
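Building a segment key from this layout is a one-liner; the 8-digit zero padding is inferred from the example filenames above:

```python
def segment_key(topic: str, partition: int, start: int, end: int) -> str:
    """S3 key for a segment covering [start, end], following the layout above.

    Zero-padding offsets to a fixed width makes lexicographic listing order
    match offset order, so a prefix listing returns segments in log order.
    """
    return (
        f"topics/{topic}/partitions/{partition}/"
        f"segments/{start:08d}-{end:08d}.seg"
    )

segment_key("user-events", 0, 0, 999)
# 'topics/user-events/partitions/0/segments/00000000-00000999.seg'
```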
What This Means for You
The segment format is entirely transparent to users — you never interact with it directly. But understanding it explains several StreamHouse behaviors:
- Why produce latency is 50-100ms: Records are buffered until a segment is ready to flush
- Why tail reads are fast: Recent data is still in the agent's memory buffer, no S3 fetch needed
- Why storage is cheap: LZ4 compression + S3 pricing = pennies per GB per month
- Why data is durable: CRC32 checksums + S3's 11 nines = you won't lose events
The segment format is open and documented. Build tools on top of it, inspect your data directly, or just rest easy knowing your events are stored with care.