biopsykit.saliva package

Module for processing saliva data and computing established features (AUC, slope, maximum increase, …).

biopsykit.saliva.auc(data, saliva_type='cortisol', remove_s0=False, compute_auc_post=False, sample_times=None)[source]

Compute area-under-the-curve (AUC) for saliva samples.

The area-under-the-curve is computed according to Pruessner et al. (2003) using the trapezoidal rule (numpy.trapz()). Computing an AUC requires the saliva sampling times in minutes. They can either be part of the SalivaRawDataFrame (time column) or be supplied via the extra parameter sample_times.

Pruessner defined two types of AUC, which are computed by default:

  • AUC with respect to ground (\(AUC_{G}\)), and

  • AUC with respect to the first sample, i.e., AUC with respect to increase (\(AUC_{I}\))
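The two AUC variants can be illustrated with a minimal, library-independent sketch. It is not biopsykit's implementation: the helper names (trapezoid, auc_ground, auc_increase) and the example values are assumptions made for illustration, and the data are plain lists rather than a SalivaRawDataFrame.

```python
def trapezoid(values, times):
    """Integrate values over times using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(values)):
        total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return total


def auc_ground(values, times):
    """AUC with respect to ground: area between the curve and zero."""
    return trapezoid(values, times)


def auc_increase(values, times):
    """AUC with respect to increase: area relative to the first sample."""
    baseline = values[0]
    return trapezoid([v - baseline for v in values], times)


# Hypothetical saliva values for one subject, sampled every 10 minutes
cortisol = [4.0, 6.0, 10.0, 8.0]
minutes = [0, 10, 20, 30]

print(auc_ground(cortisol, minutes))    # 220.0
print(auc_increase(cortisol, minutes))  # 100.0
```

In this example the two values differ by the first sample multiplied by the total duration (4.0 * 30 = 120), which is the relationship between the two formulas in Pruessner et al. (2003).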

If the first sample should be excluded from computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.

If saliva samples were collected during an acute stress task, \(AUC_{I}\) can additionally be computed only for the saliva values after the stressor by setting compute_auc_post to True.

Note

For a pre/post stress scenario, post-stress saliva samples are indicated by time points \(t \geq 0\), whereas saliva samples collected before the start of the stressor are indicated by time points \(t < 0\). This means that a saliva sample collected at time \(t = 0\) is defined as being collected right after the stressor.
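The time-point convention can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, and using the first post-stressor sample as the baseline for the post-stressor \(AUC_{I}\) is an assumption of this sketch.

```python
def trapezoid(values, times):
    """Integrate values over times using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(values)):
        total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return total


def auc_increase_post(values, times):
    """AUC_I restricted to samples with non-negative time points.

    Assumption: the baseline is the first post-stressor sample.
    """
    post = [(t, v) for t, v in zip(times, values) if t >= 0]
    post_times = [t for t, _ in post]
    baseline = post[0][1]
    post_values = [v - baseline for _, v in post]
    return trapezoid(post_values, post_times)


# Hypothetical values: one sample before the stressor (t = -10), three after
values = [5.0, 4.0, 8.0, 6.0]
times = [-10, 0, 10, 20]

print(auc_increase_post(values, times))  # 50.0
```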

The feature names will be auc_g and auc_i (and auc_i_post if compute_auc_post is True), preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol, for example, the name will be cortisol_auc_g.
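The naming scheme, and why it helps when converting to long-format, can be shown with a small sketch. Both helpers are hypothetical and exist only for illustration; they are not part of biopsykit.

```python
def feature_name(saliva_type, feature):
    """Build a column name following the '<saliva_type>_<feature>' scheme."""
    return f"{saliva_type}_{feature}"


def split_feature_name(column, saliva_type):
    """Split a column name back into (saliva_type, feature)."""
    prefix = saliva_type + "_"
    if not column.startswith(prefix):
        raise ValueError(f"'{column}' does not start with '{prefix}'")
    return saliva_type, column[len(prefix):]


print(feature_name("cortisol", "auc_g"))                      # cortisol_auc_g
print(split_feature_name("cortisol_auc_i_post", "cortisol"))  # ('cortisol', 'auc_i_post')
```

Because the saliva type is a fixed prefix, feature names that themselves contain underscores (such as auc_i_post) can still be separated from it unambiguously.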

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing auc or not. Default: False

  • compute_auc_post (bool, optional) – whether to additionally compute \(AUC_I\) only for saliva samples post stressor. Saliva samples post stressor are defined as all samples with non-negative sample_times. Default: False

  • sample_times (numpy.ndarray or list, optional) – Saliva sampling times (corresponding to x-axis values for computing AUC). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, it is assumed that saliva times are the same for all subjects. Then, sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, it is assumed that saliva times are individual for all subjects. Then, sample_times needs to have the shape (n_subjects, n_samples).
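The two accepted shapes of sample_times can be illustrated with a sketch that expands either form into one row of times per subject. The function expand_sample_times is hypothetical and uses nested lists instead of numpy arrays; it only mirrors the shape rules described above.

```python
def expand_sample_times(sample_times, n_subjects, n_samples):
    """Return one list of sampling times per subject.

    A flat sequence (shape (n_samples,)) is repeated for every subject;
    a nested sequence must have shape (n_subjects, n_samples).
    """
    is_2d = all(isinstance(row, (list, tuple)) for row in sample_times)
    if is_2d:
        rows = [list(row) for row in sample_times]
        if len(rows) != n_subjects:
            raise ValueError("expected one row of sample times per subject")
    else:
        rows = [list(sample_times) for _ in range(n_subjects)]
    for row in rows:
        if len(row) != n_samples:
            raise ValueError("expected one sample time per saliva sample")
    return rows


# Same times for both subjects (1D case)
print(expand_sample_times([-10, 0, 10], n_subjects=2, n_samples=3))
# Individual times per subject (2D case)
print(expand_sample_times([[-10, 0, 10], [-12, 1, 11]], n_subjects=2, n_samples=3))
```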

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises

ValidationError – if data is not a SalivaRawDataFrame

References

Pruessner, J. C., Kirschbaum, C., Meinlschmid, G., & Hellhammer, D. H. (2003). Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change. Psychoneuroendocrinology, 28(7), 916-931. https://doi.org/10.1016/S0306-4530(02)00108-7

biopsykit.saliva.initial_value(data, saliva_type='cortisol', remove_s0=False)[source]

Compute initial saliva sample.

The output feature name will be ini_val, preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol, for example, the name will be cortisol_ini_val.

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • remove_s0 (bool, optional) – whether to remove the first saliva sample for computing initial value or not. Default: False

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises

ValidationError – if data is not a SalivaRawDataFrame

biopsykit.saliva.max_increase(data, saliva_type='cortisol', remove_s0=False, percent=False)[source]

Compute maximum increase between first saliva sample and all others.

The maximum increase (max_inc) is defined as the difference between the first sample and the maximum of all subsequent samples.

If the first sample should be excluded from computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.

The output is either the absolute increase or the relative increase to the first sample in percent (if percent is True).
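The definition can be sketched for a single subject as follows. This is an illustrative reading of the description, not biopsykit's implementation: the function operates on a plain list, and computing the relative increase as 100 * increase / first sample is an assumption of this sketch.

```python
def max_increase(values, remove_s0=False, percent=False):
    """Maximum increase between the first sample and all subsequent samples."""
    if remove_s0:
        # Drop the first sample; the second sample becomes the reference
        values = values[1:]
    first = values[0]
    increase = max(values[1:]) - first
    if percent:
        # Assumption: relative increase with respect to the first sample
        return 100.0 * increase / first
    return increase


# Hypothetical saliva values for one subject
cortisol = [4.0, 5.0, 10.0, 8.0]

print(max_increase(cortisol))                  # 6.0
print(max_increase(cortisol, percent=True))    # 150.0
print(max_increase(cortisol, remove_s0=True))  # 5.0
```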

The output feature name will be max_inc, preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol, for example, the name will be cortisol_max_inc.

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing max_inc or not. Default: False

  • percent (bool, optional) – whether to compute max_inc in percent (i.e., relative increase) or not. Default: False

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises

ValidationError – if data is not a SalivaRawDataFrame

biopsykit.saliva.max_value(data, saliva_type='cortisol', remove_s0=False)[source]

Compute maximum value.

The output feature name will be max_val, preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol, for example, the name will be cortisol_max_val.

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • remove_s0 (bool, optional) – whether to remove the first saliva sample for computing maximum or not. Default: False

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises

ValidationError – if data is not a SalivaRawDataFrame

biopsykit.saliva.mean_se(data, saliva_type='cortisol', group_cols=None, remove_s0=False)[source]

Compute mean and standard error per saliva sample.
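A minimal sketch of the computation, independent of biopsykit: values are grouped by saliva sample, and the standard error is taken as the sample standard deviation divided by the square root of the number of values. The use of the sample standard deviation (n - 1 in the denominator) and the dict-based data layout are assumptions of this sketch.

```python
import math
import statistics


def mean_se_per_sample(values_by_sample):
    """Return {sample_label: (mean, standard_error)} across subjects."""
    result = {}
    for sample, values in values_by_sample.items():
        mean = statistics.mean(values)
        # Assumption: standard error based on the sample standard deviation
        se = statistics.stdev(values) / math.sqrt(len(values))
        result[sample] = (mean, se)
    return result


# Hypothetical values of four subjects for two saliva samples
data = {
    "S0": [2.0, 4.0, 6.0, 8.0],
    "S1": [5.0, 5.0, 5.0, 5.0],
}

for sample, (mean, se) in mean_se_per_sample(data).items():
    print(sample, mean, round(se, 3))
```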

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • group_cols (str or list of str, optional) – columns to group by before computing mean and se. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data should be grouped by subject (followed by condition, day, night, etc., if applicable).

  • remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing mean and standard error or not. Default: False

Returns

dataframe with mean and standard error per saliva sample, or a dict of such if saliva_type is a list

Return type

SalivaMeanSeDataFrame or dict of such

Raises

ValidationError – if data is not a SalivaRawDataFrame

biopsykit.saliva.slope(data, sample_labels=None, sample_idx=None, saliva_type='cortisol', sample_times=None)[source]

Compute the slope between two saliva samples.

The two samples used to compute the slope can be specified either by index (parameter sample_idx, in the range [0, num_of_samples-1]) or by label (parameter sample_labels).

The output feature name for the slope between saliva samples with labels label1 and label2 will be slope<label1><label2>, preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol and the samples S1 and S2, for example, the name will be cortisol_slopeS1S2.
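The slope between two samples, selected either by index or by label, can be sketched as below. The helper names and the list-based data layout are hypothetical; the sketch only follows the description above (value difference divided by time difference, and the slope<label1><label2> naming scheme).

```python
def slope_between(values, times, sample_idx):
    """Slope between two samples given as a pair of indices."""
    if len(sample_idx) != 2:
        raise ValueError("exactly 2 samples are required to compute a slope")
    i, j = sample_idx
    return (values[j] - values[i]) / (times[j] - times[i])


def labels_to_idx(labels, sample_labels):
    """Translate a pair of sample labels into a pair of indices."""
    return [labels.index(label) for label in sample_labels]


def slope_feature_name(saliva_type, label1, label2):
    """Build the feature name, e.g. 'cortisol_slopeS1S2'."""
    return f"{saliva_type}_slope{label1}{label2}"


# Hypothetical saliva values for one subject
labels = ["S0", "S1", "S2", "S3"]
cortisol = [4.0, 6.0, 10.0, 8.0]
minutes = [0, 10, 20, 30]

idx = labels_to_idx(labels, ["S1", "S2"])
print(slope_feature_name("cortisol", "S1", "S2"), slope_between(cortisol, minutes, idx))
```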

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • sample_labels (list or tuple) – pair of saliva sample labels to compute slope between. Labels correspond to the names in the sample column of the dataframe. An error will be raised if not exactly 2 samples are specified.

  • sample_idx (list or tuple) – pair of saliva sample indices to compute slope between. An error will be raised if not exactly 2 samples are specified.

  • sample_times (numpy.ndarray or list, optional) – Saliva sampling times (corresponding to x-axis values for computing slope). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, it is assumed that saliva times are the same for all subjects. Then, sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, it is assumed that saliva times are individual for all subjects. Then, sample_times needs to have the shape (n_subjects, n_samples).

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises

biopsykit.saliva.standard_features(data, saliva_type='cortisol', group_cols=None, keep_index=True)[source]

Compute a set of standard features on saliva data.

The following list of features is computed:

  • argmax: Argument (=index) of the maximum value

  • mean: Mean value

  • std: Standard deviation

  • skew: Skewness

  • kurt: Kurtosis

For all features the built-in pandas functions (e.g. pandas.DataFrame.mean()) will be used, except for argmax, which will use numpy’s function (numpy.argmax()). The functions will be applied on the dataframe using the aggregate functions from pandas (pandas.DataFrame.agg()).

The output feature names will be argmax, mean, std, skew, kurt, preceded by the name of the saliva type to allow better conversion into long-format later on (if desired). For cortisol, for example, the names will be cortisol_argmax, cortisol_mean, and so on.
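The five features can be sketched for a single subject in plain Python. This is not the library's code, which relies on pandas and numpy as described above. The sketch assumes the bias-corrected estimators that pandas uses by default (sample standard deviation, adjusted skewness, and excess kurtosis) and therefore needs at least four non-constant values; the function name is hypothetical.

```python
import math


def standard_features(values, saliva_type="cortisol"):
    """Compute argmax, mean, std, skew, and kurt for one subject's samples."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Central moments (population form), used to build the corrected estimators
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    std = math.sqrt(m2 * n / (n - 1))
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    excess = m4 / m2 ** 2 - 3.0
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * excess + 6.0)
    features = {
        "argmax": values.index(max(values)),
        "mean": mean,
        "std": std,
        "skew": skew,
        "kurt": kurt,
    }
    # Prefix each feature with the saliva type, e.g. 'cortisol_argmax'
    return {f"{saliva_type}_{name}": value for name, value in features.items()}


# Hypothetical saliva values for one subject
for name, value in standard_features([4.0, 6.0, 10.0, 8.0, 5.0]).items():
    print(name, value)
```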

Parameters
  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str or list of str) – saliva type or list of saliva types to compute features on

  • group_cols (str or list of str, optional) – columns to group by before applying the aggregate function. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data should be grouped by subject (followed by condition, day, night, etc., if applicable).

  • keep_index (bool, optional) – whether to try keeping the old index or use the new index returned by the groupby-aggregate-function. Keeping the old index is e.g. useful if the dataframe has a multiindex with several levels, but grouping is only performed on a subset of these levels. Default: True

Returns

dataframe containing the computed features, or a dict of such if saliva_type is a list

Return type

SalivaFeatureDataFrame or dict of such

Raises