Segment Format
Internal format of StreamHouse data segments.
Segment Structure
A segment is an immutable, compressed file stored in S3 that contains a sequence of records for a single partition. Each segment has a header, a data section containing the compressed records, an index section holding a sparse offset index for fast lookups, and a footer that records where the index starts along with a CRC32 checksum.
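To make the offset index concrete, here is a minimal Python sketch of how records might be serialized and then looked up through a sparse index. It follows the record field order and the every-1000th-record sampling from the binary format, but several details are assumptions rather than documented behavior: integers are little-endian, header entries use a hypothetical u32-length-prefixed key/value encoding, byte positions are measured in the uncompressed data, and record offsets are derived from the segment's start offset because records carry no offset field. LZ4 compression is left out (it is not in Python's standard library). Names such as `build_data_and_index` and `read_at` are illustrative only.

```python
import bisect
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

INDEX_INTERVAL = 1000  # one index entry for every 1000th record


@dataclass
class Record:
    key: bytes
    value: bytes
    timestamp: int  # stored as i64
    headers: List[Tuple[bytes, bytes]] = field(default_factory=list)


def encode_record(rec: Record) -> bytes:
    """Serialize one record in the documented field order (little-endian assumed)."""
    out = bytearray()
    out += struct.pack("<I", len(rec.key)) + rec.key
    out += struct.pack("<I", len(rec.value)) + rec.value
    out += struct.pack("<q", rec.timestamp)
    out += struct.pack("<H", len(rec.headers))
    for hk, hv in rec.headers:
        # Hypothetical header-entry encoding: u32-length-prefixed key, then value.
        out += struct.pack("<I", len(hk)) + hk
        out += struct.pack("<I", len(hv)) + hv
    return bytes(out)


def decode_record(buf: bytes, pos: int) -> Tuple[Record, int]:
    """Parse the record starting at byte `pos`; return it and the next position."""
    def read(fmt):
        nonlocal pos
        (val,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return val

    def read_bytes():
        nonlocal pos
        n = read("<I")
        data = buf[pos:pos + n]
        pos += n
        return data

    key = read_bytes()
    value = read_bytes()
    timestamp = read("<q")
    headers = [(read_bytes(), read_bytes()) for _ in range(read("<H"))]
    return Record(key, value, timestamp, headers), pos


def build_data_and_index(records, start_offset):
    """Concatenate encoded records and index every INDEX_INTERVAL-th one."""
    data = bytearray()
    index = []  # (record offset, byte position in the uncompressed data)
    for i, rec in enumerate(records):
        if i % INDEX_INTERVAL == 0:
            index.append((start_offset + i, len(data)))
        data += encode_record(rec)
    return bytes(data), index


def read_at(data, index, target_offset):
    """Binary-search the sparse index, then scan forward to the target offset."""
    offsets = [off for off, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before this segment")
    offset, pos = index[i]
    while pos < len(data):
        rec, pos = decode_record(data, pos)
        if offset == target_offset:
            return rec
        offset += 1
    raise KeyError(f"offset {target_offset} is past the end of this segment")
```

A lookup costs one binary search over the index plus a forward scan that skips at most 999 records, which is the trade a sparse index makes in exchange for storing far fewer entries than one per record.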
Binary Format
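The segment binary format is designed for fast reads and efficient storage: the fixed-size footer at the end of the file tells a reader where the index starts, so it can locate a record without scanning the whole data section, and LZ4 compression keeps the stored object small.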
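The fixed-size header and footer listed in the layout can be packed and read with Python's standard `struct` and `zlib` modules. This is a sketch under assumptions the page does not settle: integers are little-endian, the CRC32 (computed here with `zlib.crc32`) covers every byte before the footer, and, because the listed header fields add up to 31 bytes, one trailing reserved byte pads the header to the documented 32. The function names are hypothetical.

```python
import struct
import zlib

# Assumed little-endian; "<" also disables implicit alignment padding.
# The documented header fields total 31 bytes; the trailing "x" is an assumed
# reserved byte that brings the header to the stated 32 bytes.
HEADER_FMT = "<4sHBQQQx"  # magic, version, compression, record count, start, end
FOOTER_FMT = "<QI4s"      # index offset, CRC32, magic
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32
FOOTER_SIZE = struct.calcsize(FOOTER_FMT)  # 16
COMPRESSION_LZ4 = 1


def pack_header(version, record_count, start_offset, end_offset, compression=COMPRESSION_LZ4):
    return struct.pack(HEADER_FMT, b"STRM", version, compression,
                       record_count, start_offset, end_offset)


def unpack_header(segment):
    magic, version, compression, count, start, end = struct.unpack_from(HEADER_FMT, segment, 0)
    if magic != b"STRM":
        raise ValueError("bad header magic")
    return {"version": version, "compression": compression,
            "record_count": count, "start_offset": start, "end_offset": end}


def pack_footer(index_offset, body):
    # Assumption: the CRC32 covers every byte that precedes the footer.
    return struct.pack(FOOTER_FMT, index_offset, zlib.crc32(body), b"ENDS")


def read_footer(segment):
    """Parse the fixed-size footer at the end of the segment and return the index offset."""
    if len(segment) < HEADER_SIZE + FOOTER_SIZE:
        raise ValueError("segment is too short")
    body, tail = segment[:-FOOTER_SIZE], segment[-FOOTER_SIZE:]
    index_offset, crc, magic = struct.unpack(FOOTER_FMT, tail)
    if magic != b"ENDS":
        raise ValueError("bad footer magic")
    if zlib.crc32(body) != crc:
        raise ValueError("CRC32 mismatch")
    return index_offset
```

Because the footer sits at a fixed distance from the end of the object, a reader can parse it first to learn `index_offset`. Note that this sketch verifies the CRC over the whole body, so it assumes the full object has already been downloaded.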
```text
┌──────────────────────────────────────┐
│ Header (32 bytes)                    │
│   - Magic bytes: "STRM"              │
│   - Version: u16                     │
│   - Compression: u8 (LZ4=1)          │
│   - Record count: u64                │
│   - Start offset: u64                │
│   - End offset: u64                  │
├──────────────────────────────────────┤
│ Data Section (variable)              │
│   - LZ4-compressed records           │
│   - Each record:                     │
│     - Key length (u32)               │
│     - Key bytes                      │
│     - Value length (u32)             │
│     - Value bytes                    │
│     - Timestamp (i64)                │
│     - Headers count (u16)            │
│     - Header entries                 │
├──────────────────────────────────────┤
│ Index Section                        │
│   - Sparse offset index              │
│   - Entry for every 1000th record:   │
│     - Record offset                  │
│     - Byte position in data          │
├──────────────────────────────────────┤
│ Footer (16 bytes)                    │
│   - Index offset: u64                │
│   - CRC32: u32                       │
│   - Magic bytes: "ENDS"              │
└──────────────────────────────────────┘
```

Segment Sizing
The target segment size is 64 MB (configurable). Smaller segments reduce read amplification, because a reader fetches less unrelated data, but they create more objects and more metadata to track. Larger segments cut that per-segment overhead and store more efficiently, but they raise the minimum amount of data a read has to touch. For most workloads, 64 MB strikes a good balance.
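The arithmetic behind this trade-off is easy to sketch. The example below assumes a hypothetical partition that writes 100 GiB per day of 1 KiB records, treats "64 MB" as 64 MiB of stored bytes, and ignores header, footer, index, and compression effects; `segment_stats` is an illustrative helper, not part of StreamHouse.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def segment_stats(bytes_per_day, avg_record_bytes, segment_bytes=64 * MIB, index_interval=1000):
    """Return (segments per day, records per segment, sparse index entries per segment)."""
    segments_per_day = ceil_div(bytes_per_day, segment_bytes)
    records_per_segment = segment_bytes // avg_record_bytes
    index_entries = ceil_div(records_per_segment, index_interval)
    return segments_per_day, records_per_segment, index_entries


if __name__ == "__main__":
    # Hypothetical partition: 100 GiB/day of 1 KiB records.
    for size_mib in (16, 64, 256):
        segs, recs, idx = segment_stats(100 * GIB, 1024, segment_bytes=size_mib * MIB)
        print(f"{size_mib:>3} MiB segments: {segs} per day, {recs} records each, {idx} index entries")
```

With these inputs, 64 MiB segments produce 1,600 objects per day holding 65,536 records each (66 index entries). Dropping to 16 MiB quadruples the object count to 6,400, while 256 MiB cuts it to 400 but makes each segment span 262,144 records.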