oups

oups stands for Ordered Unified Processing Stack — out-of-core processing for ordered data (batch + live).

oups unifies processing of ordered data across two core workflows:

  • Training dataset production (offline): process large historical ordered data out-of-core using vectorized, stateful operations.

  • Live usage (streaming/batch): reuse the exact same logic on incoming chunks, with resumable state.
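The idea of a single code path can be sketched in plain Python: one stateful chunk function is replayed over historical chunks offline, then resumed on live chunks from the persisted state. This is a minimal stdlib sketch of the pattern (a running cumulative sum), not oups's actual API.

```python
def process_chunk(values: list, state: dict) -> list:
    # Stateful cumulative sum: `state` carries the running total across chunks.
    out = []
    total = state.get("total", 0.0)
    for v in values:
        total += v
        out.append(total)
    state["total"] = total
    return out

# Offline: replay historical ordered chunks.
state = {}
history = [[1.0, 2.0], [3.0]]
offline = [process_chunk(chunk, state) for chunk in history]

# Live: the same function resumes from the persisted state.
live = process_chunk([4.0], state)
```

Because the state dict is the only thing carried between calls, persisting it between runs is what makes the live path resumable.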

The library consists of three Python packages that work together:

  • stateful_loop: iterate over ordered chunks, bind and persist loop state, and buffer DataFrames under a memory cap with flush-on-limit/last-iteration.

  • stateful_ops: vectorized operations designed for chunked usage (e.g., AsofMerger; SegmentedAggregator is planned).

  • store: ordered Parquet datasets with schema-driven indexing, incremental updates, duplicate handling, and synchronized iteration across datasets.
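The memory-aware buffering that stateful_loop performs can be illustrated with a toy stdlib class: batches accumulate until a cap is reached (or the last iteration arrives), at which point they are flushed. The class name, row-count cap, and flush target are illustrative assumptions, not oups's API (which caps by memory and writes to storage).

```python
class CappedBuffer:
    # Toy buffer: accumulate row batches, flush on cap or on the last iteration.
    def __init__(self, max_rows: int):
        self.max_rows = max_rows
        self.rows: list = []
        self.flushed: list = []  # stands in for writes to a store

    def append(self, batch: list, last: bool = False):
        self.rows.extend(batch)
        if len(self.rows) >= self.max_rows or last:
            self.flushed.append(list(self.rows))
            self.rows.clear()

buf = CappedBuffer(max_rows=3)
buf.append([1, 2])           # below cap: kept in memory
buf.append([3])              # cap reached: flushed
buf.append([4], last=True)   # last iteration: flush the remainder
```

Flushing on the last iteration as well as on the cap guarantees no buffered rows are lost when the loop ends.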

Key Capabilities

  • Single code path for offline and live: Process historical and streaming ordered data with the same stateful, vectorized tools.

  • Stateful orchestration: StatefulLoop provides iteration context, state binding/persistence, and memory-aware buffering.

  • Chunk-friendly stateful ops: AsofMerger performs multi-DataFrame as-of joins (with previous values) iteratively.

  • Ordered storage: Store and OrderedParquetDataset validate ordering, handle duplicates, and support incremental updates.

  • Synchronized iteration: Iterate aligned chunks across datasets via intersections with optional warm-up (n_prev).
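The as-of join at the heart of the capabilities above can be sketched with the stdlib `bisect` module: for each key in the left stream, take the most recent right-hand value at or before it. This is a conceptual sketch of the semantics, not AsofMerger's interface, which operates on DataFrames and carries previous values across chunks.

```python
import bisect

def asof_values(left_keys: list, right_keys: list, right_vals: list) -> list:
    # For each left key, pick the latest right value whose key is <= it.
    # Both key lists are assumed sorted, as oups requires ordered data.
    out = []
    for k in left_keys:
        i = bisect.bisect_right(right_keys, k) - 1
        out.append(right_vals[i] if i >= 0 else None)
    return out

result = asof_values([1, 3, 5], [2, 4], ["a", "b"])
```

In chunked usage, the last right-hand row of one chunk would be kept as state so the first left keys of the next chunk still find their "previous value"; that carry-over is the stateful part the sketch omits.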

Documentation
