
Top-level APIs

These methods and objects are available directly in the babeldata module.

Functions in 'common'

glass_wool(x_in, maxstd, side='both')

Iteratively remove outliers from data.

Iteratively removes outliers from normally distributed input data until no remaining value lies more than maxstd standard deviations from the mean. The returned array has the same length as the input; outliers are set to np.nan.
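The iterative procedure described above can be sketched in plain NumPy. This is an illustration only, not babeldata's implementation: the name clip_outliers_sketch is hypothetical, the use of the population standard deviation (ddof=0) is an assumption, and the handling of a tuple when side is not "both" is a guess the source does not document.

```python
import numpy as np


def clip_outliers_sketch(x_in, maxstd, side="both"):
    """Hypothetical stand-in for glass_wool; not the library's code."""
    # Normalise maxstd to a (lower, upper) pair of limits.
    if isinstance(maxstd, tuple):
        lower_k, upper_k = maxstd
    else:
        lower_k = upper_k = float(maxstd)

    x = np.array(x_in, dtype=float)  # copy, so the caller's array is untouched
    while True:
        # Statistics of the values that have not been flagged yet.
        mean = np.nanmean(x)
        std = np.nanstd(x)  # assumption: population std (ddof=0)

        bad = np.zeros(x.shape, dtype=bool)
        with np.errstate(invalid="ignore"):  # comparisons with NaN are False
            if side in ("lower", "both"):
                bad |= x < mean - lower_k * std
            if side in ("upper", "both"):
                bad |= x > mean + upper_k * std

        if not bad.any():
            return x  # nothing left outside the limits
        x[bad] = np.nan  # flag the outliers and recompute on the next pass
```

Each pass flags at least one previously valid value or stops, so the loop always terminates.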

Drop nans after calling glass_wool

If x is the array returned by glass_wool, x[~np.isnan(x)] drops the outliers (the np.nan entries) and keeps only the remaining values.
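A short sketch of this post-processing step, using only NumPy. The array cleaned is a hand-written stand-in for a glass_wool result, so the snippet does not depend on babeldata; the variable names are illustrative.

```python
import numpy as np

# Stand-in for an array returned by glass_wool: outliers are already np.nan.
cleaned = np.array([np.nan, 442., 443., 444., 445., 446., 447., 448., 449., np.nan])

# Boolean mask of the entries that were flagged as outliers.
is_outlier = np.isnan(cleaned)

kept = cleaned[~is_outlier]         # shorter array without the NaN entries
n_removed = int(is_outlier.sum())   # number of values that were flagged

# NaN-aware statistics work on the full-length array without dropping anything.
mean_kept = np.nanmean(cleaned)
```

Keeping the full-length array and using NaN-aware functions such as np.nanmean can be preferable when the positions of the samples matter, for example when they line up with a time axis.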

Parameters:

- x_in (np.ndarray, required): The input data.
- maxstd (float | Tuple[float, float], required): The number of standard deviations from the mean beyond which a value is treated as an outlier. If a float is given, the same limit is used for the upper and lower sides of the distribution. If a tuple is given and side is "both", the first value is used for the lower side and the second value for the upper side.
- side (str, default 'both'): The side(s) on which to remove outliers. Options are "lower", "upper", or "both".

Returns:

- numpy.ndarray: A copy of the input data with outliers set to np.nan.

Examples:

Cut values that lie more than 2 standard deviations above or below the mean:

>>> import numpy as np
>>> from babeldata.common import glass_wool
>>> x = np.array([1., 442., 443., 444., 445., 446., 447., 448., 449., 900.])
>>> glass_wool(x, 2.0)
array([ nan, 442., 443., 444., 445., 446., 447., 448., 449.,  nan])

Only cut upper outliers:

>>> glass_wool(x, 2.0, side='upper')
array([  1., 442., 443., 444., 445., 446., 447., 448., 449.,  nan])

Only cut lower outliers:

>>> glass_wool(x, 2.0, side='lower')
array([ nan, 442., 443., 444., 445., 446., 447., 448., 449., 900.])

Use asymmetric upper and lower limits:

>>> glass_wool(x, (2.0, 4.0), 'both')
array([ nan, 442., 443., 444., 445., 446., 447., 448., 449., 900.])

Notes

- The input data must be a 1D numpy.ndarray.
- The function is compiled with the @jit decorator for performance.

Last update: March 8, 2023