biopsykit.saliva package
Module for processing saliva data and computing established features (AUC, slope, maximum increase, …).
- biopsykit.saliva.auc(data, saliva_type='cortisol', remove_s0=False, compute_auc_post=False, sample_times=None)
Compute the area under the curve (AUC) for saliva samples.
The area under the curve is computed according to Pruessner et al. (2003) using the trapezoidal rule (numpy.trapz()). Computing an AUC requires the saliva sampling times in minutes. They can either be part of the SalivaRawDataFrame (time column) or be supplied as an extra parameter (sample_times).
Pruessner et al. defined two types of AUC, both of which are computed by default:
AUC with respect to ground (\(AUC_{G}\)), and
AUC with respect to the first sample, i.e., AUC with respect to increase (\(AUC_{I}\))
If the first sample should be excluded from the computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.
If saliva samples were collected during an acute stress task, \(AUC_{I}\) can additionally be computed for the post-stressor saliva values only by setting compute_auc_post to True.
Note
For a pre/post stress scenario, post-stress saliva samples are indicated by time points \(t \geq 0\), and saliva samples collected before the start of the stressor are indicated by time points \(t < 0\). This means that a saliva sample collected at time \(t = 0\) is considered to be collected right after the stressor.
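The two Pruessner formulas can be sketched with plain NumPy as follows. This is a minimal illustration under stated assumptions, not biopsykit's implementation: the function name auc_pruessner is hypothetical, it takes a plain array of shape (n_subjects, n_samples) instead of a SalivaRawDataFrame, and it writes out the trapezoidal rule by hand instead of calling numpy.trapz() so the formula stays visible. \(AUC_{I}\) is obtained by subtracting from \(AUC_{G}\) the rectangle spanned by the first sample's level over the whole sampling period. The sketch also mirrors the two layouts accepted for sampling times: a shared 1D vector of shape (n_samples,) is broadcast to one row per subject, while a 2D array of shape (n_subjects, n_samples) is used as-is.

```python
import numpy as np


def auc_pruessner(values, times):
    """Hypothetical sketch of AUC_G and AUC_I (Pruessner et al., 2003).

    values: saliva values, shape (n_subjects, n_samples)
    times:  sampling times in minutes, either shape (n_samples,) (shared by
            all subjects) or shape (n_subjects, n_samples) (per subject)
    """
    values = np.asarray(values, dtype=float)
    # Broadcast a shared 1D time vector to one row per subject
    times = np.broadcast_to(np.asarray(times, dtype=float), values.shape)

    dt = np.diff(times, axis=1)  # length of each sampling interval
    # Trapezoidal rule: sum of (m_i + m_{i+1}) / 2 * (t_{i+1} - t_i)
    auc_g = np.sum((values[:, :-1] + values[:, 1:]) / 2 * dt, axis=1)
    # AUC_I: remove the rectangle under the first sample's level
    auc_i = auc_g - values[:, 0] * (times[:, -1] - times[:, 0])
    return auc_g, auc_i


cortisol = [[5.0, 6.0, 8.0, 7.0],   # subject 1
            [4.0, 4.0, 4.0, 4.0]]   # subject 2 (flat profile)
times = [-10, 0, 10, 20]            # minutes; t < 0 is before the stressor
auc_g, auc_i = auc_pruessner(cortisol, times)
# auc_g -> [200.0, 120.0], auc_i -> [50.0, 0.0]
```

The flat profile of subject 2 yields \(AUC_{I} = 0\) despite a non-zero \(AUC_{G}\), which illustrates why \(AUC_{I}\) captures change over time while \(AUC_{G}\) captures total output.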
The feature names will be auc_g and auc_i (and auc_i_post if compute_auc_post is True), preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_auc_g.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing the AUC or not. Default: False
compute_auc_post (bool, optional) – whether to additionally compute \(AUC_I\) only for saliva samples post stressor. Saliva samples post stressor are defined as all samples with non-negative sample_times. Default: False
sample_times (numpy.ndarray or list, optional) – saliva sampling times (corresponding to x-axis values for computing the AUC). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, saliva times are assumed to be the same for all subjects, and sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, saliva times are assumed to be individual for each subject, and sample_times needs to have the shape (n_subjects, n_samples).
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
References
Pruessner, J. C., Kirschbaum, C., Meinlschmid, G., & Hellhammer, D. H. (2003). Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change. Psychoneuroendocrinology, 28(7), 916-931. https://doi.org/10.1016/S0306-4530(02)00108-7
- biopsykit.saliva.initial_value(data, saliva_type='cortisol', remove_s0=False)
Compute the initial saliva value.
The output feature name will be ini_val, preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_ini_val.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to remove the first saliva sample before computing the initial value or not. Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
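A minimal sketch of the initial-value feature, assuming for illustration a wide-format pandas DataFrame with one row per subject and one column per sample in sampling order, rather than the SalivaRawDataFrame the function actually takes. The helper name initial_value_sketch and the subject and sample labels are hypothetical; only the feature naming (saliva type prefix plus ini_val) follows the description above.

```python
import pandas as pd


def initial_value_sketch(wide, saliva_type="cortisol", remove_s0=False):
    """Hypothetical helper: initial saliva value per subject (wide format)."""
    # With remove_s0, the second sample takes the role of the initial one
    col = 1 if remove_s0 else 0
    return wide.iloc[:, col].rename(f"{saliva_type}_ini_val")


wide = pd.DataFrame(
    {"S0": [5.0, 3.0], "S1": [6.0, 4.5], "S2": [8.0, 4.0]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
ini = initial_value_sketch(wide)                     # Vp01: 5.0, Vp02: 3.0
ini_s1 = initial_value_sketch(wide, remove_s0=True)  # Vp01: 6.0, Vp02: 4.5
```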
- biopsykit.saliva.max_increase(data, saliva_type='cortisol', remove_s0=False, percent=False)
Compute the maximum increase between the first saliva sample and all others.
The maximum increase (max_inc) is defined as the difference between the maximum of all subsequent samples and the first sample.
If the first sample should be excluded from the computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.
The output is either the absolute increase or the increase relative to the first sample, in percent (if percent is True).
The output feature name will be max_inc, preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_max_inc.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing max_inc or not. Default: False
percent (bool, optional) – whether to compute max_inc in percent (i.e., as relative increase) or not. Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
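The definition of the maximum increase can be sketched as below, again assuming (for illustration only) a wide-format DataFrame with one row per subject and sample columns in sampling order instead of a SalivaRawDataFrame. The helper name max_increase_sketch is hypothetical. With remove_s0, the sketch drops the first column so that the next sample becomes the reference; with percent, it divides the increase by the reference value and multiplies by 100.

```python
import pandas as pd


def max_increase_sketch(wide, saliva_type="cortisol", remove_s0=False, percent=False):
    """Hypothetical helper: maximum increase over the first sample (wide format)."""
    if remove_s0:
        wide = wide.iloc[:, 1:]  # drop S0; the next sample becomes the reference
    first = wide.iloc[:, 0]
    max_after = wide.iloc[:, 1:].max(axis=1)  # maximum of all subsequent samples
    inc = max_after - first
    if percent:
        inc = 100 * inc / first  # increase relative to the first sample, in %
    return inc.rename(f"{saliva_type}_max_inc")


wide = pd.DataFrame(
    {"S0": [5.0, 3.0], "S1": [6.0, 4.5], "S2": [8.0, 4.0]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
abs_inc = max_increase_sketch(wide)                    # [3.0, 1.5]
rel_inc = max_increase_sketch(wide, percent=True)      # [60.0, 50.0]
inc_no_s0 = max_increase_sketch(wide, remove_s0=True)  # [2.0, -0.5]
```

As the last call shows, this sketch returns a negative value when no later sample exceeds the reference sample; it does not clip such values.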
- biopsykit.saliva.max_value(data, saliva_type='cortisol', remove_s0=False)
Compute the maximum value.
The output feature name will be max_val, preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_max_val.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to remove the first saliva sample before computing the maximum or not. Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
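A short sketch of the effect of remove_s0 on the maximum, assuming a hypothetical wide-format DataFrame (one row per subject, sample columns in sampling order) and a hypothetical helper name max_value_sketch. Subject Vp02 has its highest value in the first sample, so excluding that sample changes the result.

```python
import pandas as pd


def max_value_sketch(wide, saliva_type="cortisol", remove_s0=False):
    """Hypothetical helper: maximum saliva value per subject (wide format)."""
    if remove_s0:
        wide = wide.iloc[:, 1:]  # ignore the first sample
    return wide.max(axis=1).rename(f"{saliva_type}_max_val")


wide = pd.DataFrame(
    {"S0": [5.0, 9.0], "S1": [6.0, 4.5], "S2": [8.0, 4.0]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
max_all = max_value_sketch(wide)                    # [8.0, 9.0]
max_no_s0 = max_value_sketch(wide, remove_s0=True)  # [8.0, 4.5]
```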
- biopsykit.saliva.mean_se(data, saliva_type='cortisol', group_cols=None, remove_s0=False)
Compute the mean and standard error per saliva sample.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
group_cols (str or list of str, optional) – columns to group on before computing mean and standard error. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data is grouped by subject (followed by condition, day, night, etc., if applicable).
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing mean and standard error or not. Default: False
- Returns
dataframe with mean and standard error per saliva sample, or a dict of such if saliva_type is a list
- Return type
- Raises
ValidationError – if data is not a SalivaRawDataFrame
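The statistic itself can be sketched with a pandas groupby on a small long-format table. This is an illustration under assumptions: the helper name mean_se_sketch and the plain subject/sample/cortisol columns are hypothetical, the grouping columns are passed explicitly (here by sample, so that each row summarizes one saliva sample across subjects) and do not reproduce the function's default grouping, and the standard error is pandas' sem(), i.e., the sample standard deviation (ddof=1) divided by the square root of n. Whether biopsykit uses the same degrees of freedom is an assumption.

```python
import pandas as pd


def mean_se_sketch(data, group_cols, saliva_type="cortisol"):
    """Hypothetical helper: mean and standard error per group (long format)."""
    grouped = data.groupby(group_cols, sort=False)[saliva_type]
    # pandas' sem() = sample standard deviation (ddof=1) / sqrt(n)
    return grouped.agg(["mean", "sem"]).rename(columns={"sem": "se"})


long = pd.DataFrame({
    "subject": ["Vp01", "Vp01", "Vp02", "Vp02"],
    "sample": ["S0", "S1", "S0", "S1"],
    "cortisol": [4.0, 6.0, 6.0, 10.0],
})
stats = mean_se_sketch(long, group_cols=["sample"])
# S0: mean 5.0, se 1.0;  S1: mean 8.0, se 2.0 (up to floating-point rounding)
```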
- biopsykit.saliva.slope(data, sample_labels=None, sample_idx=None, saliva_type='cortisol', sample_times=None)
Compute the slope between two saliva samples.
The samples to compute the slope between can either be specified by index (parameter sample_idx, in the range [0, num_of_samples - 1]) or by label (parameter sample_labels).
The output feature name for the slope between saliva samples with labels label1 and label2 will be slope<label1><label2>, preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_slopeS1S2.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
sample_labels (list or tuple) – pair of saliva sample labels to compute the slope between. Labels correspond to the names in the sample column of the dataframe. An error will be raised if not exactly 2 samples are specified.
sample_idx (list or tuple) – pair of saliva sample indices to compute the slope between. An error will be raised if not exactly 2 samples are specified.
sample_times (numpy.ndarray or list, optional) – saliva sampling times (corresponding to x-axis values for computing the slope). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, saliva times are assumed to be the same for all subjects, and sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, saliva times are assumed to be individual for each subject, and sample_times needs to have the shape (n_subjects, n_samples).
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
IndexError – if invalid sample_labels or sample_idx is provided
ValidationError – if data is not a SalivaRawDataFrame
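The slope is the change in saliva value divided by the elapsed time between the two chosen samples. The sketch below shows this for label-based selection; it assumes a hypothetical wide-format DataFrame (one row per subject, sample columns in sampling order) and a hypothetical helper name slope_sketch. Raising IndexError for a wrong number of labels mirrors the Raises entry above, and sampling times are accepted either as a shared 1D vector or as a per-subject 2D array, as described for sample_times.

```python
import numpy as np
import pandas as pd


def slope_sketch(wide, times, sample_labels, saliva_type="cortisol"):
    """Hypothetical helper: slope between two samples per subject (wide format).

    times: shape (n_samples,) for shared times or (n_subjects, n_samples).
    """
    if len(sample_labels) != 2:
        raise IndexError("exactly 2 sample labels must be specified")
    first, second = sample_labels
    i = wide.columns.get_loc(first)
    j = wide.columns.get_loc(second)
    times = np.broadcast_to(np.asarray(times, dtype=float), wide.shape)
    # slope = change in saliva value / elapsed time (value units per minute)
    slope = (wide[second] - wide[first]) / (times[:, j] - times[:, i])
    return slope.rename(f"{saliva_type}_slope{first}{second}")


wide = pd.DataFrame(
    {"S0": [5.0, 3.0], "S1": [6.0, 4.5], "S2": [8.0, 4.0]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
s = slope_sketch(wide, [-10, 0, 10], ("S1", "S2"))
# name "cortisol_slopeS1S2"; Vp01 -> 0.2, Vp02 -> -0.05 per minute
```

Index-based selection (sample_idx) would work the same way, using wide.iloc[:, idx] instead of column labels.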
- biopsykit.saliva.standard_features(data, saliva_type='cortisol', group_cols=None, keep_index=True)
Compute a set of standard features on saliva data.
The following features are computed:
argmax: argument (= index) of the maximum value
mean: mean value
std: standard deviation
skew: skewness
kurt: kurtosis
For all features, the built-in pandas functions (e.g., pandas.DataFrame.mean()) are used, except for argmax, which uses numpy's function (numpy.argmax()). The functions are applied to the dataframe using pandas' aggregate function (pandas.DataFrame.agg()).
The output feature names will be argmax, mean, std, skew, and kurt, preceded by the name of the saliva type to allow easier conversion into long format later on (if desired). For cortisol, e.g., this gives cortisol_argmax.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
group_cols (str or list of str, optional) – columns to group on before applying the aggregate function. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data is grouped by subject (followed by condition, day, night, etc., if applicable).
keep_index (bool, optional) – whether to try to keep the old index or to use the new index returned by the groupby-aggregate function. Keeping the old index is useful, e.g., if the dataframe has a MultiIndex with several levels, but grouping is only performed on a subset of these levels. Default: True
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
DataFrameTransformationError – if keep_index is True, but applying the old index fails
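The groupby-aggregate pattern described above can be sketched with pandas named aggregation on a small long-format table. This is an illustration under assumptions, not the library's code: the helper name standard_features_sketch and the plain subject/sample/cortisol columns are hypothetical, skew and kurt are computed via lambdas calling Series.skew() and Series.kurt(), argmax is the position of the maximum within each subject's samples in order of appearance, and the keep_index behavior is not modeled.

```python
import numpy as np
import pandas as pd


def standard_features_sketch(data, saliva_type="cortisol", group_cols=("subject",)):
    """Hypothetical helper: standard features per group (long format)."""
    grouped = data.groupby(list(group_cols))[saliva_type]
    # Named aggregation: output column name -> aggregation function
    feats = grouped.agg(
        argmax=lambda s: int(np.argmax(s.to_numpy())),  # position of the maximum
        mean="mean",
        std="std",
        skew=lambda s: s.skew(),
        kurt=lambda s: s.kurt(),
    )
    # Prefix with the saliva type, e.g., cortisol_argmax
    return feats.add_prefix(f"{saliva_type}_")


long = pd.DataFrame({
    "subject": ["Vp01"] * 4 + ["Vp02"] * 4,
    "sample": ["S0", "S1", "S2", "S3"] * 2,
    "cortisol": [5.0, 6.0, 8.0, 7.0, 3.0, 5.0, 4.0, 4.0],
})
feats = standard_features_sketch(long)
# cortisol_argmax: Vp01 -> 2, Vp02 -> 1; cortisol_mean: Vp01 -> 6.5, Vp02 -> 4.0
```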