biopsykit.utils.array_handling module

Module providing various functions for low-level handling of array data.

biopsykit.utils.array_handling.sanitize_input_1d(data)[source]

Convert 1-d array-like data (DataFrame/Series) to a numpy array.

Parameters

data (array_like) – input data. Needs to be 1-d

Returns

data as 1-d ndarray

Return type

ndarray

biopsykit.utils.array_handling.sanitize_input_nd(data, ncols=None)[source]

Convert n-d array-like data (DataFrame/Series) to a numpy array.

Parameters
  • data (array_like) – input data

  • ncols (int or tuple of ints, optional) – number of columns (2nd dimension) the data is expected to have, a tuple of such numbers if several column counts are allowed, or None to allow any number of columns. Default: None

Returns

data as n-d numpy array

Return type

ndarray
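
The conversion and column check described above can be sketched roughly as follows. `to_ndarray_checked` is a hypothetical helper written for illustration, not the library function, and raising `ValueError` on a column mismatch is an assumption about the behavior.

```python
import numpy as np


def to_ndarray_checked(data, ncols=None):
    """Hypothetical stand-in: convert array-like data and check its column count."""
    arr = np.asarray(data)  # np.asarray also accepts pandas DataFrame/Series objects
    if ncols is None:
        return arr  # any number of columns is allowed
    allowed = (ncols,) if isinstance(ncols, int) else tuple(ncols)
    n_found = 1 if arr.ndim == 1 else arr.shape[1]
    if n_found not in allowed:
        # ASSUMPTION: a mismatch is reported as ValueError
        raise ValueError(f"expected one of {allowed} columns, got {n_found}")
    return arr


# two samples of 3-axis data; 3 or 6 columns would both be accepted
acc = to_ndarray_checked([[0.1, 0.2, 9.8], [0.0, 0.3, 9.7]], ncols=(3, 6))
```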

biopsykit.utils.array_handling.find_extrema_in_radius(data, indices, radius, extrema_type='min')[source]

Find extrema values (min or max) within a given radius around array indices.

Parameters
  • data (array_like) – input data

  • indices (array_like) – array of indices around which to search for extrema values

  • radius (int or tuple of int) –

    radius around indices to search for extrema:

    • if radius is an int then search for extrema equally in both directions in the interval [index - radius, index + radius].

    • if radius is a tuple then search for extrema in the interval [index - radius[0], index + radius[1]].

  • extrema_type ({'min', 'max'}, optional) – extrema type to be searched for. Default: ‘min’

Returns

numpy array containing the indices of the found extrema values in the given radius around indices. Has the same length as indices.

Return type

ndarray

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from biopsykit.utils.array_handling import find_extrema_in_radius
>>> data = pd.read_csv("data.csv")
>>> indices = np.array([16, 25, 40, 57, 86, 100])
>>>
>>> radius = 4
>>> # search for minima in 'data' within a radius of 4 samples around each entry of 'indices'
>>> find_extrema_in_radius(data, indices, radius)
>>>
>>> radius = (5, 0)
>>> # search for maxima in 'data' within the 5 samples before each entry of 'indices'
>>> find_extrema_in_radius(data, indices, radius, extrema_type='max')
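
To make the radius semantics concrete, here is a minimal NumPy-only sketch of the search described in the parameter list. `extrema_in_radius_sketch` is a hypothetical name, and clipping the search window at the array boundaries is an assumption rather than documented behavior.

```python
import numpy as np


def extrema_in_radius_sketch(data, indices, radius, extrema_type="min"):
    """Hypothetical re-implementation for illustration only."""
    data = np.asarray(data).ravel()
    # an int radius is symmetric, a tuple gives (samples before, samples after)
    before, after = (radius, radius) if isinstance(radius, int) else radius
    pick = np.argmin if extrema_type == "min" else np.argmax
    result = []
    for idx in indices:
        start = max(idx - before, 0)            # ASSUMPTION: clip at the array start
        stop = min(idx + after + 1, len(data))  # +1 because the interval is closed
        result.append(start + pick(data[start:stop]))
    return np.array(result)


signal = np.array([5, 3, 8, 1, 9, 7, 2, 6, 4, 0])
minima = extrema_in_radius_sketch(signal, [2, 7], radius=1)
maxima = extrema_in_radius_sketch(signal, [5], radius=(3, 0), extrema_type="max")
```

The result holds positions in `data`, one per entry of `indices`, which matches the documented return value.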

biopsykit.utils.array_handling.remove_outlier_and_interpolate(data, outlier_mask, x_old=None, desired_length=None)[source]

Remove outliers, impute missing values and optionally interpolate data to desired length.

Detected outliers are removed from array and imputed by linear interpolation. Optionally, the output array can be linearly interpolated to a new length.

Parameters
  • data (array_like) – input data

  • outlier_mask (ndarray) – boolean outlier mask. Has to be the same length as data. True entries indicate outliers. If outlier_mask is not a bool array, its values are cast to bool

  • x_old (array_like, optional) – x values of the input data to be interpolated or None if output data should not be interpolated to new length. Default: None

  • desired_length (int, optional) – desired length of the output signal or None to keep length of input signal. Default: None

Returns

data with removed and imputed outliers, optionally interpolated to desired length

Return type

ndarray

Raises

ValueError – if data and outlier_mask don’t have the same length or if x_old is None when desired_length is passed as parameter
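
The two steps (imputing outliers, then optionally resampling) can be sketched with `numpy.interp`. This is an illustrative sketch, not the library code: `impute_and_resample_sketch` is a hypothetical name, and spacing the new x values evenly between the first and last entry of `x_old` is an assumption.

```python
import numpy as np


def impute_and_resample_sketch(data, outlier_mask, x_old=None, desired_length=None):
    """Hypothetical sketch: impute outliers linearly, optionally resample."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(outlier_mask).astype(bool)  # non-bool masks are cast to bool
    if len(data) != len(mask):
        raise ValueError("data and outlier_mask must have the same length")
    positions = np.arange(len(data))
    # replace outliers by interpolating linearly between the remaining samples
    cleaned = np.interp(positions, positions[~mask], data[~mask])
    if desired_length is None:
        return cleaned
    if x_old is None:
        raise ValueError("x_old is required when desired_length is given")
    x_old = np.asarray(x_old, dtype=float)
    # ASSUMPTION: the new x axis is evenly spaced over the range of x_old
    x_new = np.linspace(x_old[0], x_old[-1], desired_length)
    return np.interp(x_new, x_old, cleaned)


values = [1.0, 2.0, 50.0, 4.0, 5.0]
mask = [0, 0, 1, 0, 0]
cleaned = impute_and_resample_sketch(values, mask)
resampled = impute_and_resample_sketch(values, mask, x_old=np.arange(5), desired_length=9)
```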

biopsykit.utils.array_handling.sliding_window(data, window_samples=None, window_sec=None, sampling_rate=None, overlap_samples=None, overlap_percent=None)[source]

Create sliding windows from an input array.

The window size of sliding windows can either be specified in samples (window_samples) or in seconds (window_sec, together with sampling_rate).

The overlap of windows can either be specified in samples (overlap_samples) or in percent (overlap_percent).

Note

If data has more than one dimension the sliding window view is applied to the first dimension. In the 2-d case this would correspond to applying windows along the rows.

Parameters
  • data (array_like) – input data

  • window_samples (int, optional) – window size in samples or None if window size is specified in seconds + sampling rate. Default: None

  • window_sec (int, optional) – window size in seconds or None if window size is specified in samples. Default: None

  • sampling_rate (float, optional) – sampling rate of data in Hz. Only needed if window size is specified in seconds (window_sec parameter). Default: None

  • overlap_samples (int, optional) – overlap of windows in samples or None if window overlap is specified in percent. Default: None

  • overlap_percent (float, optional) – overlap of windows in percent or None if window overlap is specified in samples. Default: None

Returns

sliding windows from input array.

Return type

ndarray

See also

sliding_window_view()

Create a sliding window view of an input array. Low-level function with fewer options for configuring the input parameters.

biopsykit.utils.array_handling.sanitize_sliding_window_input(window_samples=None, window_sec=None, sampling_rate=None, overlap_samples=None, overlap_percent=None)[source]

Sanitize input parameters for creating sliding windows from array data.

The window size of sliding windows can either be specified in samples (window_samples) or in seconds (window_sec, together with sampling_rate).

The overlap of windows can either be specified in samples (overlap_samples) or in percent (overlap_percent).

Parameters
  • window_samples (int, optional) – window size in samples or None if window size is specified in seconds + sampling rate. Default: None

  • window_sec (int, optional) – window size in seconds or None if window size is specified in samples. Default: None

  • sampling_rate (float, optional) – sampling rate of data in Hz. Only needed if window size is specified in seconds (window_sec parameter). Default: None

  • overlap_samples (int, optional) – overlap of windows in samples or None if window overlap is specified in percent. Default: None

  • overlap_percent (float, optional) – overlap of windows in percent or None if window overlap is specified in samples. Default: None

Returns

  • window (int) – window size in samples

  • overlap (int) – window overlap in samples

Return type

Tuple[int, int]
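
A rough sketch of how the alternative parameters could be reduced to a `(window, overlap)` pair in samples. `window_params_sketch` is a hypothetical name. Two points are assumptions, not documented behavior: `overlap_percent` is treated as a fraction between 0 and 1, and the overlap falls back to 0 when neither overlap parameter is given.

```python
def window_params_sketch(window_samples=None, window_sec=None, sampling_rate=None,
                         overlap_samples=None, overlap_percent=None):
    """Hypothetical sketch: reduce the alternative inputs to sample counts."""
    if window_samples is not None:
        window = int(window_samples)
    elif window_sec is not None and sampling_rate is not None:
        # seconds * samples per second = samples
        window = int(window_sec * sampling_rate)
    else:
        raise ValueError("give window_samples, or window_sec together with sampling_rate")

    if overlap_samples is not None:
        overlap = int(overlap_samples)
    elif overlap_percent is not None:
        # ASSUMPTION: overlap_percent is a fraction in [0, 1), e.g. 0.5 for 50 %
        overlap = int(overlap_percent * window)
    else:
        overlap = 0  # ASSUMPTION: no overlap if nothing is specified
    return window, overlap


# 2-second windows at 256 Hz with half-overlapping windows
window, overlap = window_params_sketch(window_sec=2, sampling_rate=256.0, overlap_percent=0.5)
```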

biopsykit.utils.array_handling.sliding_window_view(array, window_length, overlap, nan_padding=False)[source]

Create a sliding window view of an input array with given window length and overlap.

Warning

By default, this function returns a view onto your input array, so modifying values in the result directly affects your input data, which might lead to unexpected behaviour! If padding is disabled (default), the last window fraction of the input may not be returned! However, if nan_padding is enabled, the function always returns a copy instead of a view of your input data, regardless of whether padding was actually performed!

Parameters
  • array (ndarray with shape (n,) or (n, m)) – array on which sliding window action should be performed. Windowing will always be performed along axis 0.

  • window_length (int) – length of desired window (must be smaller than array length n)

  • overlap (int) – length of desired overlap (must be smaller than window_length)

  • nan_padding (bool) – select whether the last window should be nan-padded or discarded if it does not fit the input array length. If nan-padding is enabled, the returned array will always be a copy of the input array, regardless of whether padding was actually performed!

Returns

windowed view (or copy if nan_padding is True) of input array as specified, last window might be nan-padded if necessary to match window size

Return type

ndarray

Examples

>>> import numpy as np
>>> from biopsykit.utils.array_handling import sliding_window_view
>>> data = np.arange(0, 10)
>>> windowed_view = sliding_window_view(array=data, window_length=5, overlap=3, nan_padding=True)
>>> windowed_view
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 2.,  3.,  4.,  5.,  6.],
       [ 4.,  5.,  6.,  7.,  8.],
       [ 6.,  7.,  8.,  9., nan]])
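
The view-versus-copy behaviour from the warning above can be reproduced with plain NumPy. The sketch below is not the library implementation: `windowed_sketch` is a hypothetical name, it only handles 1-d input, and it builds on NumPy's own `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or newer), which returns a read-only view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as np_window_view


def windowed_sketch(array, window_length, overlap, nan_padding=False):
    """Hypothetical 1-d sketch of overlapping windows with optional NaN padding."""
    array = np.asarray(array)
    step = window_length - overlap  # distance between consecutive window starts
    if nan_padding:
        array = array.astype(float)  # astype copies, so the result is never a view
        remainder = (len(array) - window_length) % step
        if remainder:
            # pad so that the last, partially filled window becomes complete
            array = np.concatenate([array, np.full(step - remainder, np.nan)])
    # all windows with step 1, then keep every `step`-th one
    return np_window_view(array, window_length)[::step]


data = np.arange(0, 10)
plain = windowed_sketch(data, window_length=5, overlap=3)
padded = windowed_sketch(data, window_length=5, overlap=3, nan_padding=True)
```

Without padding, `plain` has three windows and the final sample `9` does not appear in any of them. With padding, `padded` has the four windows shown in the example above and no longer shares memory with `data`.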

biopsykit.utils.array_handling.downsample(data, fs_in, fs_out)[source]

Downsample input signal to a new sampling rate.

If the output sampling rate is a divisor of the input sampling rate, the signal is downsampled using decimate(). Otherwise, data is first filtered using an anti-aliasing filter before it is downsampled using linear interpolation.

Parameters
  • data (ndarray) – input data

  • fs_in (float) – sampling rate of input data in Hz.

  • fs_out (float) – sampling rate of output data in Hz

Returns

output data with new sampling rate

Return type

ndarray
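
The distinction between the two cases, and the linear interpolation step of the second one, can be sketched as below. Both helper names are hypothetical, and the sketch deliberately leaves out all filtering, so it illustrates only the resampling arithmetic and is not a substitute for the function.

```python
import numpy as np


def integer_factor(fs_in, fs_out):
    """Return the decimation factor if fs_out divides fs_in, otherwise None."""
    ratio = fs_in / fs_out
    return int(round(ratio)) if np.isclose(ratio, round(ratio)) else None


def resample_linear(data, fs_in, fs_out):
    """Linear interpolation onto a coarser time grid (NO low-pass filtering)."""
    data = np.asarray(data, dtype=float)
    t_in = np.arange(len(data)) / fs_in                 # original sample times in s
    t_out = np.arange(0, t_in[-1] + 1e-12, 1 / fs_out)  # new sample times in s
    return np.interp(t_out, t_in, data)


factor = integer_factor(256, 64)    # integer ratio: decimation would apply
no_factor = integer_factor(100, 30) # non-integer ratio: interpolation path
ramp = resample_linear(np.arange(10), fs_in=10, fs_out=4)
```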

biopsykit.utils.array_handling.bool_array_to_start_end_array(bool_array)[source]

Find regions in bool array and convert those to start-end indices.

Note

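The end index is exclusive: as the example below shows, it points one past the last element of each region, so each [start, end] pair can be used directly as a slice.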

Parameters

bool_array (ndarray with shape (n,)) – boolean array with either 0/1, 0.0/1.0 or True/False elements

Returns

array of [start, end] indices with shape (n,2)

Return type

ndarray

Examples

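>>> import numpy as np
>>> from biopsykit.utils.array_handling import bool_array_to_start_end_array
>>> example_array = np.array([0,0,1,1,0,0,1,1,1])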
>>> start_end_list = bool_array_to_start_end_array(example_array)
>>> start_end_list
array([[2, 4],
       [6, 9]])
>>> example_array[start_end_list[0, 0]: start_end_list[0, 1]]
array([1, 1])
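
One common way to find such regions is to look at where the array switches between 0 and 1. The sketch below (hypothetical name `start_end_sketch`, not the library code) reproduces the output of the example above, where each end index is one past the last element of its region.

```python
import numpy as np


def start_end_sketch(bool_array):
    """Hypothetical sketch: locate runs of truthy values via their edges."""
    flags = np.asarray(bool_array).astype(bool).astype(int)
    # pad with zeros so regions touching either end of the array still have edges
    padded = np.concatenate([[0], flags, [0]])
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(edges == -1)    # 1 -> 0 transitions, one past the region
    return np.column_stack([starts, ends])


example_array = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1])
regions = start_end_sketch(example_array)
first_region = example_array[regions[0, 0]: regions[0, 1]]
```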

biopsykit.utils.array_handling.split_array_equally(data, n_splits)[source]

Generate indices to split array into parts with equal lengths.

Parameters
  • data (array_like) – data to split

  • n_splits (int) – number of splits

Returns

list with start and end indices which will lead to splitting array into parts with equal lengths

Return type

list of tuples
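
As an illustration of what such a list of index pairs can look like, here is a small sketch with the hypothetical name `split_indices_sketch`. Using exclusive end indices and letting `numpy.linspace` distribute any remainder are assumptions; the boundaries returned by the library function may differ.

```python
import numpy as np


def split_indices_sketch(data, n_splits):
    """Hypothetical sketch: (start, end) pairs for roughly equal parts."""
    # n_splits parts need n_splits + 1 boundaries between 0 and len(data)
    edges = np.linspace(0, len(data), n_splits + 1).astype(int)
    # ASSUMPTION: end indices are exclusive, so data[start:end] selects a part
    return list(zip(edges[:-1].tolist(), edges[1:].tolist()))


data = np.arange(12)
parts = split_indices_sketch(data, 3)
chunks = [data[start:end] for start, end in parts]
```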

biopsykit.utils.array_handling.accumulate_array(data, fs_in, fs_out)[source]

Accumulate 1-d array by summing over windows.

Parameters
  • data (array_like) – data to accumulate. Must be a 1-d array

  • fs_in (float) – sampling rate of input data in Hz

  • fs_out (float) – sampling rate of output data in Hz

Returns

accumulated array

Return type

array_like
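
Summing over windows can be sketched by reshaping the signal so that each row holds one window. `accumulate_sketch` is a hypothetical name, and dropping trailing samples that do not fill a complete window is an assumption made to keep the reshape valid.

```python
import numpy as np


def accumulate_sketch(data, fs_in, fs_out):
    """Hypothetical sketch: sum consecutive windows of fs_in / fs_out samples."""
    data = np.asarray(data).ravel()
    window = int(fs_in // fs_out)  # input samples per output sample
    n_windows = len(data) // window
    # ASSUMPTION: trailing samples that do not fill a window are dropped
    trimmed = data[: n_windows * window]
    return trimmed.reshape(n_windows, window).sum(axis=1)


# ten samples at 10 Hz accumulated to 2 Hz: two windows of five samples each
counts = accumulate_sketch(np.arange(1, 11), fs_in=10, fs_out=2)
```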

biopsykit.utils.array_handling.add_datetime_index(arr, start_time, sampling_rate, column_name=None)[source]

Add datetime index to dataframe.

Parameters
  • arr (array_like) – numpy array to add index to

  • start_time (Timestamp) – start time of the index

  • sampling_rate (float) – sampling rate of input data in Hz.

  • column_name (str, optional) – name of the column of the resulting dataframe or None to leave it unnamed. Default: None

Returns

dataframe with datetime index

Return type

DataFrame
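
A datetime index of this kind can be built from the start time and the sampling rate alone. The following pandas sketch uses the hypothetical name `with_datetime_index_sketch` and is meant as an illustration under that assumption, not as the library implementation.

```python
import numpy as np
import pandas as pd


def with_datetime_index_sketch(arr, start_time, sampling_rate, column_name=None):
    """Hypothetical sketch: wrap an array in a DataFrame with a datetime index."""
    arr = np.asarray(arr)
    # sample k is recorded k / sampling_rate seconds after start_time
    offsets = pd.to_timedelta(np.arange(len(arr)) / sampling_rate, unit="s")
    index = pd.DatetimeIndex(start_time + offsets)
    columns = None if column_name is None else [column_name]
    return pd.DataFrame(arr, index=index, columns=columns)


start = pd.Timestamp("2021-01-01 12:00:00")
df = with_datetime_index_sketch([60, 62, 61], start, sampling_rate=2.0, column_name="heart_rate")
```

At 2 Hz the index advances in steps of 0.5 s, so the three samples cover one second starting at `start`.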