# oups

**oups** stands for Ordered Unified Processing Stack: out-of-core processing for ordered data (batch + live).
oups unifies processing of ordered data across two core workflows:

- Training dataset production (offline): process large historical ordered data out-of-core using vectorized, stateful operations.
- Live usage (streaming/batch): reuse the exact same logic on incoming chunks, with resumable state.
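The "same logic offline and live" idea can be sketched in plain Python (this is an illustration of the pattern, not the oups API): a chunk-wise function whose state is explicit and resumable serves both the historical backfill and live updates.

```python
# Hypothetical sketch (not the oups API): one stateful, chunk-wise
# function handles both offline backfill and live chunks, because its
# state is passed in and returned explicitly.

def running_total(chunks, state=None):
    """Process ordered chunks; `state` carries the running sum across calls."""
    state = dict(state or {"total": 0})
    out = []
    for chunk in chunks:
        state["total"] += sum(chunk)
        out.append(state["total"])
    return out, state

# Offline: process the historical backlog in one pass.
offline_out, state = running_total([[1, 2], [3, 4]])

# Live: persist `state`, then resume from it on each incoming chunk.
live_out, state = running_total([[5]], state=state)
print(offline_out, live_out)  # [3, 10] [15]
```

Because the state object is the only thing carried between calls, persisting it between runs is what makes live processing a seamless continuation of the offline pass.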
The library consists of three Python packages that work together:

- `stateful_loop`: iterate over ordered chunks, bind and persist loop state, and buffer DataFrames under a memory cap with flush-on-limit/last-iteration.
- `stateful_ops`: vectorized operations designed for chunked usage (e.g., `AsofMerger`; `SegmentedAggregator` is planned).
- `store`: ordered Parquet datasets with schema-driven indexing, incremental updates, duplicate handling, and synchronized iteration across datasets.
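The flush-on-limit/last-iteration buffering mentioned for `stateful_loop` can be illustrated with a minimal pure-Python sketch (names and the size cap are hypothetical, not the real API): results accumulate in a buffer that is flushed whenever a cap is reached, and unconditionally on the last iteration so nothing is left unwritten.

```python
# Hypothetical sketch of flush-on-limit buffering (not the oups API).
# Lists stand in for DataFrames; the cap stands in for a memory limit.

def buffered_process(chunks, max_buffer=4):
    buffer, flushed = [], []
    n = len(chunks)
    for i, chunk in enumerate(chunks):
        buffer.extend(chunk)          # stand-in for appending a DataFrame
        is_last = i == n - 1
        if len(buffer) >= max_buffer or is_last:
            flushed.append(list(buffer))  # stand-in for writing to disk
            buffer.clear()
    return flushed

print(buffered_process([[1, 2], [3], [4, 5], [6]], max_buffer=4))
# [[1, 2, 3, 4, 5], [6]]
```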
## Key Capabilities
- Single code path for offline and live: process historical and streaming ordered data with the same stateful, vectorized tools.
- Stateful orchestration: `StatefulLoop` provides iteration context, state binding/persistence, and memory-aware buffering.
- Chunk-friendly stateful ops: `AsofMerger` performs multi-DataFrame as-of joins (with previous values) iteratively.
- Ordered storage: `Store` and `OrderedParquetDataset` validate ordering, handle duplicates, and support incremental updates.
- Synchronized iteration: iterate aligned chunks across datasets via intersections, with optional warm-up (`n_prev`).
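The chunk-wise as-of join that `AsofMerger` addresses can be sketched conceptually in pure Python (this is an illustration of the idea, not its API): for each left timestamp, take the most recent right value at or before it, and carry the last right value across chunk boundaries so the join stays correct when a chunk contains no new right rows.

```python
# Conceptual sketch of a chunked as-of join (not the AsofMerger API).
# `prev` carries the last right value across chunk boundaries.
import bisect

def asof_chunk(left_ts, right_ts, right_vals, prev=None):
    """For each left timestamp, return the right value whose timestamp
    is the latest one <= it, falling back to the carried `prev`."""
    out = []
    for t in left_ts:
        i = bisect.bisect_right(right_ts, t)
        out.append(right_vals[i - 1] if i else prev)
    new_prev = right_vals[-1] if right_vals else prev
    return out, new_prev

# Chunk 1: right values known at t=1 and t=4.
out1, prev = asof_chunk([2, 3], [1, 4], ["a", "b"])
# Chunk 2: no new right rows; the carried value "b" still applies.
out2, prev = asof_chunk([5, 6], [], [], prev=prev)
print(out1, out2)  # ['a', 'a'] ['b', 'b']
```

Carrying `prev` between calls is the same role the warm-up (`n_prev`) plays in synchronized iteration: it seeds each chunk with the trailing context it needs from earlier data.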