Count by class is wrong when the input array contains non-numeric or non-finite values #41

@mthh

Consider the following code:

let discr = require("statsbreaks")

let data = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 'foo', -Infinity, NaN]
let series = new discr.JenksClassifier(data, 2);
let bks = series.classify(3);
let count = series.countByClass();

I think count should be [8, 5, 2] (as if we used [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8] as input array) instead of [9, 5, 2, NaN].

The returned breaks are correct (because the input array is filtered inside the inner classification function), but in the Classifier classes we store the input array before it is filtered:

this._values = values;

A quick fix is simply to store the filtered input array in the line of code shown above (although the internal classification function will then redo the same filtering for nothing).
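A minimal sketch of that quick fix, under stated assumptions: `keepFiniteNumbers` and `SketchClassifier` are hypothetical names used only for illustration, and the predicate (finite values of type `number`) is an assumption about what the library's internal filter keeps. The real filter may differ, for example in how it treats numeric strings.

```javascript
// Hypothetical helper (not part of statsbreaks): keep only finite numbers.
// Assumption: the library's internal filter behaves similarly.
function keepFiniteNumbers(values) {
  return values.filter((v) => typeof v === "number" && Number.isFinite(v));
}

// Hypothetical stand-in for a Classifier class, reduced to the constructor
// line discussed in this issue.
class SketchClassifier {
  constructor(values) {
    // Store the filtered array instead of the raw input, so that any
    // later per-class count only sees values that can fall in a class.
    this._values = keepFiniteNumbers(values);
  }
}

const rawInput = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 'foo', -Infinity, NaN];
const classifier = new SketchClassifier(rawInput);
console.log(classifier._values.length); // 15: 'foo', -Infinity and NaN are dropped
```

With the stored array reduced to the 15 valid values, a count by class no longer has anything to place in a spurious extra class.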

A better fix might be to avoid doing this filtering twice (and to avoid creating too many new arrays, since doing array.filter(/* some code */).map(/* some code */) creates two new arrays). However, in most cases this shouldn't make any noticeable difference to performance.
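One way to avoid the intermediate array is to filter and transform in a single loop. This is only a sketch: `filterMapFinite` is a hypothetical helper, not an existing statsbreaks function, and the finite-number predicate is the same assumption as in the issue's description of the filtering.

```javascript
// Hypothetical helper (not part of statsbreaks): filter and transform in one
// pass, allocating a single output array instead of the two arrays created
// by array.filter(...).map(...).
function filterMapFinite(values, transform = (v) => v) {
  const out = [];
  for (const v of values) {
    // Assumed predicate: keep finite values of type number only.
    if (typeof v === "number" && Number.isFinite(v)) {
      out.push(transform(v));
    }
  }
  return out;
}

const input = [1, 2, 'foo', -Infinity, NaN, 3];
console.log(filterMapFinite(input));               // [ 1, 2, 3 ]
console.log(filterMapFinite(input, (v) => v * 10)); // [ 10, 20, 30 ]
```

If the Classifier stored the result of such a pass and handed it to the inner classification function, the filtering would only need to happen once, provided the inner function can be told that its input is already clean.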
