Explore streaming / sliding-window support for the Dip Test

#### **What**

Hello @RUrlus, I hope you are fine! I was recently considering the possibility of extending the current `diptst()` implementation to support **stream-processing scenarios**, where data arrive continuously rather than as a static sorted array. The idea is to compute or approximate the dip statistic incrementally — either for an **append-only stream** (ever-growing sample) or for a **sliding window** (fixed-size or time-based window).

#### **Why**

Right now, every dip calculation requires a full re-scan of all ( n ) sorted elements. In streaming or online systems, this means recomputing **O(n)** work for each new observation, which quickly becomes infeasible for large or high-rate data.

Supporting online updates would:

* enable real-time unimodality monitoring in evolving distributions;
* allow integration with systems like Kafka / Flink / AWS Kinesis;
* reduce latency for continuous dip-based anomaly detection or clustering tasks.

#### **Ideas / Discussion**

We could explore three directions, each with different complexity and precision trade-offs:

1. **Append-only incremental dip**

   * Maintain the data in an **order-statistics tree** (balanced BST with rank queries).
   * Locally update the **convex envelopes (GCM/LCM)** around the insertion point.
   * Expected complexity: *O(log n)* per insert + small local convex repairs.
   * Requires deeper refactoring of `compute_indices()` to support partial updates.

2. **Sliding-window recomputation (baseline)**

   * Keep a **fixed-length deque** or circular buffer of the most recent ( W ) samples.
   * On each new sample, remove the oldest, insert the newest, re-sort, and call `diptst()`.
   * Complexity: *O(W)* per step.
   * Easiest to implement today; could serve as an MVP for stream compatibility.

3. **Approximate streaming dip**

   * Maintain a **quantile sketch** (e.g., KLL, t-digest, or reservoir sampling).
   * Periodically materialize a synthetic sorted sample from the sketch and run the exact dip on it.
   * Complexity: *O(log m)* per update, *O(m)* per dip query (m ≪ n).
   * Offers controllable accuracy–speed trade-off for large or high-velocity streams.

#### **References**

* Hartigan & Hartigan (1985): *The Dip Test of Unimodality*
* Incremental convex hull maintenance literature (e.g., Overmars & van Leeuwen, 1981)
* Quantile sketch algorithms: KLL (Agarwal et al., 2018), t-digest (Dunning, 2013)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explore streaming / sliding-window support for the Dip Test #92

What

Why

Ideas / Discussion

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Explore streaming / sliding-window support for the Dip Test #92

Description

What

Why

Ideas / Discussion

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions