Stateful Processing Concepts
This page introduces core ideas behind processing ordered data in chunks.
Ordered streams: data indexed by a monotonically increasing key (e.g., timestamps).
Chunked iteration: process batches that fit in memory; keep ordering across chunks.
Warm-up: some operations require a look-back window; handle via state or
n_prev.State: persist minimal mutable state across chunks (e.g., counters, last values).
Resuming: on subsequent runs, restore state to continue seamlessly via the loop persistence file.
Building blocks
StatefulLoop: orchestrates iteration, exposes
is_last_iterationanditeration_count, manages state bindings, and provides memory-awarebuffer.Stateful ops: vectorized operations designed for chunked usage, such as
SegmentedAggregator(planned, refactoring ofAggStream) andAsofMerger.Store: ordered parquet datasets with merging and intersections across datasets.
Warm-up strategies
Use
StatefulLoopstate to carry needed history between different runs.Use
Store.iter_intersections(..., n_prev=...)to prepend previous rows in the first chunk.
Cross-links
Quickstart: Quickstart
StatefulLoop guide: StatefulLoop
AsofMerger guide: AsofMerger
Store architecture: Store Architecture