Stateful Processing Concepts

This page introduces core ideas behind processing ordered data in chunks.

  • Ordered streams: data indexed by a monotonically increasing key (e.g., timestamps).

  • Chunked iteration: process batches that fit in memory; keep ordering across chunks.

  • Warm-up: some operations require a look-back window; handle via state or n_prev.

  • State: persist minimal mutable state across chunks (e.g., counters, last values).

  • Resuming: on subsequent runs, restore persisted state from the loop's persistence file to continue seamlessly.
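
The concepts above can be sketched in plain Python, independent of any library API: chunks are processed in order, only minimal mutable state crosses chunk boundaries, and that state is persisted after each chunk so a later run can resume. The file layout and state shape here are invented for illustration.

```python
import json
import os
import tempfile

# Hypothetical chunks of an ordered stream (e.g., rows keyed by timestamp).
chunks = [[1, 2, 3], [4, 5], [6]]

state_path = os.path.join(tempfile.mkdtemp(), "state.json")

def process(chunks, state_path):
    # Resuming: restore persisted state if a previous run left one.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"total": 0, "last": None}
    for chunk in chunks:
        # Only minimal mutable state is carried across chunks.
        state["total"] += sum(chunk)
        state["last"] = chunk[-1]
        # Persist after each chunk so an interrupted run can resume here.
        with open(state_path, "w") as f:
            json.dump(state, f)
    return state

result = process(chunks, state_path)
print(result)  # {'total': 21, 'last': 6}
```

A second call to `process` with new chunks picks up from the persisted state rather than starting from zero, which is the essence of seamless resuming.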

Building blocks

  • StatefulLoop: orchestrates iteration, exposes is_last_iteration and iteration_count, manages state bindings, and provides a memory-aware buffer.

  • Stateful ops: vectorized operations designed for chunked usage, such as SegmentedAggregator (planned as a refactoring of AggStream) and AsofMerger.

  • Store: ordered parquet datasets, with support for merging and intersecting across datasets.
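
To make the loop semantics concrete, here is a toy stand-in for a StatefulLoop-style orchestrator. Only the attribute names `is_last_iteration` and `iteration_count` come from the description above; the class itself and its constructor are invented for illustration and are not the real API.

```python
class ChunkLoop:
    """Toy orchestrator: iterates chunks in order while exposing
    iteration_count and is_last_iteration (illustrative only)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.iteration_count = 0      # 1-based count of chunks seen so far
        self.is_last_iteration = False

    def __iter__(self):
        last = len(self._chunks) - 1
        for i, chunk in enumerate(self._chunks):
            self.iteration_count = i + 1
            self.is_last_iteration = (i == last)
            yield chunk

loop = ChunkLoop([[1, 2], [3, 4], [5]])
totals = []
running = 0
for chunk in loop:
    running += sum(chunk)
    if loop.is_last_iteration:
        # Final-iteration hook: e.g., flush buffered results.
        totals.append(running)
```

Knowing which iteration is the last one is what lets chunked operations flush partial buffers exactly once, instead of guessing when the stream ends.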

Warm-up strategies

  • Use StatefulLoop state to carry the needed history between runs.

  • Use Store.iter_intersections(..., n_prev=...) to prepend previous rows in the first chunk.
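
The n_prev idea can be sketched as follows, assuming in-memory rows rather than parquet datasets; the function name and signature are hypothetical, not the actual Store.iter_intersections API. The point is only that the first chunk of a resumed iteration is prefixed with the rows preceding the resume point, so look-back operations have their warm-up window.

```python
def iter_chunks(rows, start, chunk_size, n_prev):
    """Yield rows[start:] in chunks; the first chunk is prefixed with the
    n_prev rows preceding `start` (the warm-up window)."""
    first = True
    for i in range(start, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        if first:
            # Prepend look-back rows to the first chunk only.
            chunk = rows[max(0, start - n_prev):start] + chunk
            first = False
        yield chunk

rows = list(range(7))
# Resume at index 4, carrying 2 previous rows for warm-up.
chunks = list(iter_chunks(rows, start=4, chunk_size=2, n_prev=2))
# chunks -> [[2, 3, 4, 5], [6]]
```

A downstream rolling operation would compute over the prepended rows but emit results only from index `start` onward, which is why the warm-up rows need no separate state.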