biopsykit.utils.array_handling module

Module providing various functions for low-level handling of array data.

biopsykit.utils.array_handling.sanitize_input_1d(data)

Convert 1-d array-like data (DataFrame/Series) to a numpy array.

Parameters: data (array_like) – input data. Must be 1-d
Returns: data as 1-d ndarray
Return type: ndarray

biopsykit.utils.array_handling.sanitize_input_nd(data, ncols=None)

Convert n-d array-like data (DataFrame/Series) to a numpy array.

Parameters

data (array_like) – input data
ncols (int or tuple of ints) – number of columns (2nd dimension) the data is expected to have. Pass a tuple of ints if several column counts are acceptable, or None to allow any number of columns. Default: None

Returns

data as n-d numpy array

Return type

ndarray
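
The column check described for ncols can be sketched as follows. This is a minimal illustration, not BioPsyKit's implementation: the helper name sanitize_nd_sketch is hypothetical, and raising a ValueError on a mismatch is an assumption.

```python
import numpy as np


def sanitize_nd_sketch(data, ncols=None):
    """Hypothetical stand-in for illustration; not BioPsyKit's implementation."""
    arr = np.asarray(data)  # accepts lists, ndarrays, pandas DataFrame/Series
    if ncols is not None:
        # a single int and a tuple of ints are handled the same way
        allowed = (ncols,) if isinstance(ncols, int) else tuple(ncols)
        found = 1 if arr.ndim == 1 else arr.shape[1]
        # assumption: a mismatch is reported as a ValueError
        if found not in allowed:
            raise ValueError(f"expected one of {allowed} columns, got {found}")
    return arr


xyz = sanitize_nd_sketch([[1, 2, 3], [4, 5, 6]], ncols=(3, 6))
print(xyz.shape)  # (2, 3)
```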

biopsykit.utils.array_handling.find_extrema_in_radius(data, indices, radius, extrema_type='min')

Find extrema values (min or max) within a given radius around array indices.

Parameters

data (array_like) – input data
indices (array_like) – array of indices around which to search for extrema values
radius (int or tuple of int) –
radius around indices to search for extrema:
- if radius is an int, search for extrema equally in both directions, i.e., in the interval [index - radius, index + radius].
- if radius is a tuple, search for extrema in the interval [index - radius[0], index + radius[1]].
extrema_type ({'min', 'max'}, optional) – extrema type to be searched for. Default: 'min'

Returns

numpy array containing the indices of the found extrema values in the given radius around indices. Has the same length as indices.

Return type

ndarray

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from biopsykit.utils.array_handling import find_extrema_in_radius
>>> data = pd.read_csv("data.csv")
>>> indices = np.array([16, 25, 40, 57, 86, 100])
>>>
>>> radius = 4
>>> # search minima in 'data' in a 4-sample 'radius' around each entry of 'indices'
>>> find_extrema_in_radius(data, indices, radius)
>>>
>>> radius = (5, 0)
>>> # search maxima in 'data' in the 5 samples before each entry of 'indices'
>>> find_extrema_in_radius(data, indices, radius, extrema_type='max')
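
To make the interval semantics of radius concrete, here is a small NumPy sketch of such a search. It is an illustrative re-implementation under stated assumptions, not the library's code: the helper name extrema_in_radius_sketch is hypothetical, windows are assumed to be clipped at the array borders, and the returned values are assumed to be absolute positions in data.

```python
import numpy as np


def extrema_in_radius_sketch(data, indices, radius, extrema_type="min"):
    """Illustrative re-implementation; not BioPsyKit's actual code."""
    data = np.asarray(data, dtype=float).ravel()
    indices = np.asarray(indices, dtype=int)
    # an int radius is symmetric; a tuple is (samples before, samples after)
    before, after = (radius, radius) if np.isscalar(radius) else radius
    pick = np.argmin if extrema_type == "min" else np.argmax
    result = np.empty(len(indices), dtype=int)
    for i, idx in enumerate(indices):
        # assumption: search windows are clipped at the array borders
        start = max(idx - before, 0)
        stop = min(idx + after + 1, len(data))  # +1 because the interval is closed
        result[i] = start + pick(data[start:stop])
    return result


signal = np.array([5, 3, 4, 1, 6, 7, 2, 8, 9, 0])
print(extrema_in_radius_sketch(signal, [2, 7], radius=2))  # [3 9]
print(extrema_in_radius_sketch(signal, [2, 7], radius=(2, 0), extrema_type="max"))  # [0 7]
```

As documented for the real function, the result has one entry per element of indices.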

biopsykit.utils.array_handling.remove_outlier_and_interpolate(data, outlier_mask, x_old=None, desired_length=None)

Remove outliers, impute missing values and optionally interpolate data to desired length.

Detected outliers are removed from array and imputed by linear interpolation. Optionally, the output array can be linearly interpolated to a new length.

Parameters

data (array_like) – input data
outlier_mask (ndarray) – boolean outlier mask. Must have the same length as data. True entries indicate outliers. If outlier_mask is not a bool array, its values are cast to bool
x_old (array_like, optional) – x values of the input data to be interpolated or None if output data should not be interpolated to new length. Default: None
desired_length (int, optional) – desired length of the output signal or None to keep length of input signal. Default: None

Returns

data with removed and imputed outliers, optionally interpolated to desired length

Return type

ndarray

Raises

ValueError – if data and outlier_mask don’t have the same length or if x_old is None when desired_length is passed as parameter
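
The two steps described above (imputing outliers by linear interpolation, then optionally resampling to a new length) can be sketched with np.interp. This is a minimal sketch, not the library's implementation: the helper name impute_outliers_sketch is hypothetical, and the handling of outliers at the array borders (np.interp holds the nearest valid value) is an assumption.

```python
import numpy as np


def impute_outliers_sketch(data, outlier_mask, x_old=None, desired_length=None):
    """Illustrative only; not BioPsyKit's actual code."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(outlier_mask).astype(bool)  # non-bool masks are cast to bool
    if len(data) != len(mask):
        raise ValueError("data and outlier_mask must have the same length")
    positions = np.arange(len(data))
    cleaned = data.copy()
    # replace outliers by linear interpolation between the neighbouring valid samples
    # (assumption: outliers at the borders take the nearest valid value)
    cleaned[mask] = np.interp(positions[mask], positions[~mask], data[~mask])
    if desired_length is None:
        return cleaned
    if x_old is None:
        raise ValueError("x_old is required when desired_length is given")
    x_old = np.asarray(x_old, dtype=float)
    # resample onto an evenly spaced grid with the requested number of points
    x_new = np.linspace(x_old[0], x_old[-1], desired_length)
    return np.interp(x_new, x_old, cleaned)


values = np.array([1.0, 2.0, 50.0, 4.0, 5.0])
mask = np.array([False, False, True, False, False])
print(impute_outliers_sketch(values, mask))  # [1. 2. 3. 4. 5.]
```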

biopsykit.utils.array_handling.sliding_window(data, window_samples=None, window_sec=None, sampling_rate=None, overlap_samples=None, overlap_percent=None)

Create sliding windows from an input array.

The window size of sliding windows can either be specified in samples (window_samples) or in seconds (window_sec, together with sampling_rate).

The overlap of windows can either be specified in samples (overlap_samples) or in percent (overlap_percent).

Note

If data has more than one dimension the sliding window view is applied to the first dimension. In the 2-d case this would correspond to applying windows along the rows.

Parameters

data (array_like) – input data
window_samples (int, optional) – window size in samples or None if window size is specified in seconds + sampling rate. Default: None
window_sec (int, optional) – window size in seconds or None if window size is specified in samples. Default: None
sampling_rate (float, optional) – sampling rate of data in Hz. Only needed if window size is specified in seconds (window_sec parameter). Default: None
overlap_samples (int, optional) – overlap of windows in samples or None if window overlap is specified in percent. Default: None
overlap_percent (float, optional) – overlap of windows in percent or None if window overlap is specified in samples. Default: None

Returns

sliding windows from input array.

Return type

ndarray

See also

sliding_window_view(): create a sliding window view of an input array. Low-level function with fewer configuration options

biopsykit.utils.array_handling.sanitize_sliding_window_input(window_samples=None, window_sec=None, sampling_rate=None, overlap_samples=None, overlap_percent=None)

Sanitize input parameters for creating sliding windows from array data.

The window size of sliding windows can either be specified in samples (window_samples) or in seconds (window_sec, together with sampling_rate).

The overlap of windows can either be specified in samples (overlap_samples) or in percent (overlap_percent).

Parameters

window_samples (int, optional) – window size in samples or None if window size is specified in seconds + sampling rate. Default: None
window_sec (int, optional) – window size in seconds or None if window size is specified in samples. Default: None
sampling_rate (float, optional) – sampling rate of data in Hz. Only needed if window size is specified in seconds (window_sec parameter). Default: None
overlap_samples (int, optional) – overlap of windows in samples or None if window overlap is specified in percent. Default: None
overlap_percent (float, optional) – overlap of windows in percent or None if window overlap is specified in samples. Default: None

Returns

window (int) – window size in samples
overlap (int) – window overlap in samples

Return type

Tuple[int, int]
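
The conversion from seconds and percent to sample counts can be sketched as below. The helper name window_params_sketch is hypothetical, and several details are assumptions rather than documented behaviour: overlap_percent is interpreted on a 0 to 100 scale (the library may expect a fraction instead), results are rounded to the nearest integer, and the overlap defaults to 0 when neither overlap parameter is given.

```python
def window_params_sketch(window_samples=None, window_sec=None, sampling_rate=None,
                         overlap_samples=None, overlap_percent=None):
    """Illustrative only; not BioPsyKit's actual code."""
    if window_samples is not None:
        window = int(window_samples)
    elif window_sec is not None and sampling_rate is not None:
        # seconds * samples per second = samples
        window = int(round(window_sec * sampling_rate))
    else:
        raise ValueError("pass window_samples, or window_sec together with sampling_rate")

    if overlap_samples is not None:
        overlap = int(overlap_samples)
    elif overlap_percent is not None:
        # assumption: percent is on a 0-100 scale
        overlap = int(round(window * overlap_percent / 100))
    else:
        # assumption: no overlap if nothing is specified
        overlap = 0
    return window, overlap


print(window_params_sketch(window_sec=2, sampling_rate=256.0, overlap_percent=50))  # (512, 256)
```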

biopsykit.utils.array_handling.sliding_window_view(array, window_length, overlap, nan_padding=False)

Create a sliding window view of an input array with given window length and overlap.

Warning

By default, this function returns a view onto your input array, so modifying values in the result directly modifies your input data, which might lead to unexpected behaviour! If padding is disabled (default), the last, incomplete window of the input may not be returned. If nan_padding is enabled, the function always returns a copy instead of a view of your input data, regardless of whether padding was actually performed.

Parameters

array (ndarray with shape (n,) or (n, m)) – array on which sliding window action should be performed. Windowing will always be performed along axis 0.
window_length (int) – length of desired window (must be smaller than array length n)
overlap (int) – length of desired overlap (must be smaller than window_length)
nan_padding (bool) – whether the last window is nan-padded (True) or discarded (False) if it does not fit the input array length. If nan-padding is enabled, the returned array is always a copy of the input array, regardless of whether padding was actually performed. Default: False

Returns

windowed view (or copy if nan_padding is True) of input array as specified, last window might be nan-padded if necessary to match window size

Return type

ndarray

Examples

>>> import numpy as np
>>> from biopsykit.utils.array_handling import sliding_window_view
>>> data = np.arange(0, 10)
>>> windowed_view = sliding_window_view(array=data, window_length=5, overlap=3, nan_padding=True)
>>> windowed_view
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 2.,  3.,  4.,  5.,  6.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 6.,  7.,  8.,  9., nan]])
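
The relationship between window length, overlap, and the distance between window starts can be illustrated with NumPy's own helper of the same name, numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). This sketches the idea only and is not how BioPsyKit necessarily implements it. It covers the unpadded case, so the incomplete last window from the example above is dropped. Unlike the view described in the warning, NumPy's helper returns a read-only view by default.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as np_sliding_window_view

data = np.arange(10)
window_length, overlap = 5, 3
step = window_length - overlap  # consecutive windows start 2 samples apart

# build all windows with step 1 along axis 0, then keep every `step`-th one;
# slicing keeps the result a view onto `data`
windows = np_sliding_window_view(data, window_length, axis=0)[::step]
print(windows)
# [[0 1 2 3 4]
#  [2 3 4 5 6]
#  [4 5 6 7 8]]
print(np.shares_memory(windows, data))  # True
```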

biopsykit.utils.array_handling.downsample(data, fs_in, fs_out)

Downsample input signal to a new sampling rate.

If the output sampling rate is a divisor of the input sampling rate, the signal is downsampled using decimate(). Otherwise, data is first filtered using an anti-aliasing filter before it is downsampled using linear interpolation.

Parameters

data (ndarray) – input data
fs_in (float) – sampling rate of input data in Hz.
fs_out (float) – sampling rate of output data in Hz

Returns

output data with new sampling rate

Return type

ndarray

biopsykit.utils.array_handling.bool_array_to_start_end_array(bool_array)

Find regions in bool array and convert those to start-end indices.

Note

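In the example below, each end index is one past the last element of its region, so start and end can be used directly as slice bounds.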

Parameters: bool_array (ndarray with shape (n,)) – boolean array with either 0/1, 0.0/1.0 or True/False elements
Returns: array of [start, end] indices with shape (k, 2), where k is the number of regions found
Return type: ndarray

Examples

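>>> import numpy as np
>>> from biopsykit.utils.array_handling import bool_array_to_start_end_array
>>> example_array = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1])
>>> start_end_list = bool_array_to_start_end_array(example_array)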
>>> start_end_list
array([[2, 4],
       [6, 9]])
>>> example_array[start_end_list[0, 0]: start_end_list[0, 1]]
array([1, 1])
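
One way to find such regions is to look at the rising and falling edges of the array. The sketch below reproduces the output of the example above with plain NumPy. The helper name regions_sketch is hypothetical, and this is not necessarily how the library computes the result.

```python
import numpy as np


def regions_sketch(bool_array):
    """Illustrative only; not BioPsyKit's actual code."""
    flags = np.asarray(bool_array).astype(bool)
    # pad with False on both sides so regions touching the borders produce edges too
    padded = np.concatenate(([False], flags, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first True element of each region
    ends = np.flatnonzero(edges == -1)    # one past the last True element
    return np.column_stack((starts, ends))


print(regions_sketch(np.array([0, 0, 1, 1, 0, 0, 1, 1, 1])))
# [[2 4]
#  [6 9]]
```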

biopsykit.utils.array_handling.split_array_equally(data, n_splits)

Generate indices to split array into parts with equal lengths.

Parameters

data (array_like) – data to split
n_splits (int) – number of splits

Returns

list of (start, end) index tuples that split the array into parts of equal length

Return type

list of tuples
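
A plausible way to generate such boundaries is sketched below. The helper name split_indices_sketch is hypothetical, and the exact boundary convention is an assumption: here the boundaries are spread evenly over len(data) and the end indices are exclusive, which may differ from the library's behaviour.

```python
import numpy as np


def split_indices_sketch(data, n_splits):
    """Illustrative only; not BioPsyKit's actual code."""
    # assumption: boundaries are spread evenly over len(data), end indices exclusive
    bounds = np.linspace(0, len(data), n_splits + 1).astype(int)
    return list(zip(bounds[:-1].tolist(), bounds[1:].tolist()))


print(split_indices_sketch(np.arange(12), 3))  # [(0, 4), (4, 8), (8, 12)]
```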

biopsykit.utils.array_handling.accumulate_array(data, fs_in, fs_out)

Accumulate 1-d array by summing over windows.

Parameters

data (array_like) – data to accumulate. Must be a 1-d array
fs_in (float) – sampling rate of input data in Hz
fs_out (float) – sampling rate of output data in Hz

Returns

accumulated array

Return type

array_like
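
Summing over windows can be sketched by reshaping the signal into rows of fs_in / fs_out samples and summing each row. The helper name accumulate_sketch is hypothetical, and two details are assumptions: fs_in is taken to be an integer multiple of fs_out, and trailing samples that do not fill a complete window are dropped (the library may handle them differently).

```python
import numpy as np


def accumulate_sketch(data, fs_in, fs_out):
    """Illustrative only; not BioPsyKit's actual code."""
    data = np.asarray(data, dtype=float).ravel()
    # assumption: fs_in is an integer multiple of fs_out
    factor = int(fs_in // fs_out)
    # assumption: trailing samples that do not fill a complete window are dropped
    n_windows = len(data) // factor
    return data[: n_windows * factor].reshape(n_windows, factor).sum(axis=1)


print(accumulate_sketch(np.arange(10), fs_in=4, fs_out=2))  # [ 1.  5.  9. 13. 17.]
```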

biopsykit.utils.array_handling.add_datetime_index(arr, start_time, sampling_rate, column_name=None)

Add a datetime index to an array and return it as a dataframe.

Parameters

arr (array_like) – array to add the index to
start_time (Timestamp) – start time of the index
sampling_rate (float) – sampling rate of input data in Hz.
column_name (str, optional) – column name of the resulting dataframe or None to leave it empty. Default: None

Returns

dataframe with datetime index

Return type

DataFrame
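
Building such an index amounts to adding one time offset per sample, spaced 1 / sampling_rate seconds apart, to the start time. A minimal pandas sketch follows. The helper name datetime_index_sketch is hypothetical, and this is not the library's implementation.

```python
import numpy as np
import pandas as pd


def datetime_index_sketch(arr, start_time, sampling_rate, column_name=None):
    """Illustrative only; not BioPsyKit's actual code."""
    arr = np.asarray(arr)
    # one timestamp per sample, spaced 1 / sampling_rate seconds apart
    offsets = pd.to_timedelta(np.arange(len(arr)) / sampling_rate, unit="s")
    index = pd.Timestamp(start_time) + offsets
    columns = None if column_name is None else [column_name]
    return pd.DataFrame(arr, index=index, columns=columns)


df = datetime_index_sketch(
    [1, 2, 3], pd.Timestamp("2021-01-01 12:00:00"), sampling_rate=2.0, column_name="value"
)
print(df)
```

With a sampling rate of 2 Hz, consecutive rows are 0.5 seconds apart.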
