biopsykit.saliva.saliva module
Functions for processing saliva data and computing established features (AUC, slope, maximum increase, …).
- biopsykit.saliva.saliva.max_value(data, saliva_type='cortisol', remove_s0=False)
Compute maximum value.
The output feature name will be max_val, preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_max_val.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to remove the first saliva sample before computing the maximum or not. Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
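The naming convention and the list-of-saliva-types behavior described above can be illustrated with a minimal, library-independent sketch. It does not call biopsykit and does not build a SalivaRawDataFrame; instead it assumes a hypothetical toy layout {saliva_type: {subject: [S0, S1, ...]}}, and max_value_sketch and all values are made up for illustration.

```python
from typing import Dict, List, Union

# Hypothetical toy layout (NOT a SalivaRawDataFrame):
# {saliva_type: {subject: [value of S0, value of S1, ...]}}
SalivaData = Dict[str, Dict[str, List[float]]]


def max_value_sketch(
    data: SalivaData,
    saliva_type: Union[str, List[str]] = "cortisol",
    remove_s0: bool = False,
) -> dict:
    """Sketch of max_value(): maximum per subject, named '<saliva_type>_max_val'."""
    if isinstance(saliva_type, list):
        # Mirror the documented convention: a dict with one result per saliva type
        return {st: max_value_sketch(data, st, remove_s0) for st in saliva_type}
    feature = f"{saliva_type}_max_val"
    result = {}
    for subject, values in data[saliva_type].items():
        # Optionally drop the first sample (S0) before taking the maximum
        samples = values[1:] if remove_s0 else values
        result[subject] = {feature: max(samples)}
    return result


data = {
    "cortisol": {"Vp01": [5.0, 6.0, 10.0, 8.0], "Vp02": [9.0, 4.0, 8.0, 6.0]},
    "amylase": {"Vp01": [80.0, 95.0, 120.0, 100.0], "Vp02": [60.0, 70.0, 65.0, 90.0]},
}
print(max_value_sketch(data))
print(max_value_sketch(data, remove_s0=True))
print(max_value_sketch(data, saliva_type=["cortisol", "amylase"]))
```

Note how remove_s0 matters for Vp02, whose highest value happens to be the first sample.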
- biopsykit.saliva.saliva.initial_value(data, saliva_type='cortisol', remove_s0=False)
Compute the initial saliva value, i.e., the value of the first saliva sample.
The output feature name will be ini_val, preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_ini_val.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to remove the first saliva sample before determining the initial value or not (if True, the second sample is used as the initial value). Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
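Prefixing each feature with the saliva type is what makes a later conversion into long format straightforward. The sketch below is a hypothetical illustration in plain Python, not biopsykit code: initial_value_sketch and to_long_format are made-up helpers, the input is a toy {subject: [S0, S1, ...]} mapping with made-up values, and splitting on the first underscore assumes the saliva type name itself contains no underscore.

```python
from typing import Dict, List, Tuple


def initial_value_sketch(
    data: Dict[str, List[float]], saliva_type: str = "cortisol", remove_s0: bool = False
) -> Dict[str, Dict[str, float]]:
    """Sketch of initial_value(): first sample per subject, named '<saliva_type>_ini_val'."""
    feature = f"{saliva_type}_ini_val"
    # With remove_s0=True, S0 is dropped, so S1 becomes the initial sample
    start = 1 if remove_s0 else 0
    return {subject: {feature: values[start]} for subject, values in data.items()}


def to_long_format(features: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, str, float]]:
    """Split '<saliva_type>_<feature>' names into (subject, saliva_type, feature, value) rows."""
    rows = []
    for subject, named_values in features.items():
        for name, value in named_values.items():
            # Assumption: the saliva type itself contains no underscore
            saliva_type, feature = name.split("_", 1)
            rows.append((subject, saliva_type, feature, value))
    return rows


cortisol = {"Vp01": [5.0, 6.0, 10.0], "Vp02": [4.0, 4.5, 8.0]}
print(to_long_format(initial_value_sketch(cortisol)))
# [('Vp01', 'cortisol', 'ini_val', 5.0), ('Vp02', 'cortisol', 'ini_val', 4.0)]
```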
- biopsykit.saliva.saliva.max_increase(data, saliva_type='cortisol', remove_s0=False, percent=False)
Compute the maximum increase between the first saliva sample and all subsequent samples.
The maximum increase (max_inc) is defined as the difference between the maximum of all subsequent samples and the first sample.
If the first sample should be excluded from computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.
The output is either the absolute increase or the increase relative to the first sample in percent (if percent is True).
The output feature name will be max_inc, preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_max_inc.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing max_inc or not. Default: False
percent (bool, optional) – whether to compute max_inc in percent (i.e., relative increase) or not. Default: False
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
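The definition of max_inc, including the remove_s0 and percent options, can be sketched in a few lines of plain Python. This is a hypothetical stand-in (max_increase_sketch, toy {subject: [S0, S1, ...]} input, made-up values), not biopsykit's implementation; the percent variant assumes the relative increase is 100 · (max − first) / first.

```python
from typing import Dict, List


def max_increase_sketch(
    data: Dict[str, List[float]],
    saliva_type: str = "cortisol",
    remove_s0: bool = False,
    percent: bool = False,
) -> Dict[str, Dict[str, float]]:
    """Sketch of max_increase(): max of subsequent samples minus the first sample."""
    feature = f"{saliva_type}_max_inc"
    result = {}
    for subject, values in data.items():
        # With remove_s0=True, S1 takes over the role of the "first" sample
        samples = values[1:] if remove_s0 else values
        first, subsequent = samples[0], samples[1:]
        increase = max(subsequent) - first  # negative if levels only decrease
        if percent:
            increase = 100.0 * increase / first  # relative to the first sample
        result[subject] = {feature: increase}
    return result


cortisol = {"Vp01": [10.0, 12.0, 15.0, 11.0], "Vp02": [8.0, 6.0, 7.0, 5.0]}
print(max_increase_sketch(cortisol))
print(max_increase_sketch(cortisol, percent=True))
```

Vp02 shows that max_inc can be negative when no later sample exceeds the first one.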
- biopsykit.saliva.saliva.auc(data, saliva_type='cortisol', remove_s0=False, compute_auc_post=False, sample_times=None)
Compute area-under-the-curve (AUC) for saliva samples.
The area-under-the-curve is computed according to Pruessner et al. (2003) using the trapezoidal rule (numpy.trapz()). Computing the AUC requires the saliva sampling times in minutes. They can either be part of the SalivaRawDataFrame (time column) or be supplied as an extra parameter (sample_times).
Pruessner defined two types of AUC, which are computed by default:
AUC with respect to ground (\(AUC_{G}\)), and
AUC with respect to the first sample, i.e., AUC with respect to increase (\(AUC_{I}\))
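For reference, the two Pruessner et al. (2003) formulas can be written out for \(n\) saliva samples with values \(m_i\) collected at times \(t_i\) (in minutes). The symbols \(m_i\) and \(t_i\) are introduced here for illustration; the sum in \(AUC_{G}\) is exactly what the trapezoidal rule computes, and \(AUC_{I}\) subtracts the rectangle spanned by the first sample over the whole sampling period:

```latex
\begin{align*}
AUC_{G} &= \sum_{i=1}^{n-1} \frac{(m_{i+1} + m_{i}) \, (t_{i+1} - t_{i})}{2} \\
AUC_{I} &= AUC_{G} - m_{1} \sum_{i=1}^{n-1} (t_{i+1} - t_{i}) = AUC_{G} - m_{1} \, (t_{n} - t_{1})
\end{align*}
```

If the first sample is excluded, \(m_{1}\) and \(t_{1}\) refer to the first remaining sample.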
If the first sample should be excluded from computation, e.g., because it was only collected to control for high initial saliva levels, remove_s0 needs to be set to True.
If saliva samples were collected during an acute stress task, \(AUC_{I}\) can additionally be computed only for the saliva values after the stressor by setting compute_auc_post to True.
Note
For a pre/post stress scenario, post-stress saliva samples are indicated by time points \(t \geq 0\), and saliva samples collected before the start of the stressor are indicated by time points \(t < 0\). This means that a saliva sample collected at time \(t = 0\) is considered to be collected right after the stressor.
The feature names will be auc_g, auc_i (and auc_i_post if compute_auc_post is True), preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_auc_g.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing auc or not. Default: False
compute_auc_post (bool, optional) – whether to additionally compute \(AUC_I\) only for saliva samples post stressor. Saliva samples post stressor are defined as all samples with non-negative sample_times. Default: False
sample_times (numpy.ndarray or list, optional) – Saliva sampling times (corresponding to x-axis values for computing AUC). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, it is assumed that saliva times are the same for all subjects. Then, sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, it is assumed that saliva times are individual for each subject. Then, sample_times needs to have the shape (n_subjects, n_samples).
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
References
Pruessner, J. C., Kirschbaum, C., Meinlschmid, G., & Hellhammer, D. H. (2003). Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change. Psychoneuroendocrinology, 28(7), 916-931. https://doi.org/10.1016/S0306-4530(02)00108-7
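The computation described for auc() can be sketched in plain Python: trapezoidal \(AUC_{G}\), \(AUC_{I}\) as \(AUC_{G}\) minus the first-sample rectangle, the optional post-stress \(AUC_{I}\) over samples with \(t \geq 0\), and 1D versus 2D sample_times. This is a hypothetical illustration (auc_sketch, toy {subject: [S0, S1, ...]} input, made-up values), not biopsykit's code. Two further assumptions: sample_times lists a time for every sample including S0 (and is trimmed together with it when remove_s0 is True), and rows of a 2D sample_times follow the subject order of the input dict.

```python
from typing import Dict, List, Sequence, Union

SampleTimes = Union[Sequence[float], Sequence[Sequence[float]]]


def _trapezoid(y: Sequence[float], t: Sequence[float]) -> float:
    """Trapezoidal rule, i.e., the quantity numpy.trapz(y, t) computes."""
    return sum((y[i + 1] + y[i]) * (t[i + 1] - t[i]) / 2 for i in range(len(y) - 1))


def _auc_i(y: Sequence[float], t: Sequence[float]) -> float:
    """AUC_I: AUC_G minus the rectangle below the first sample."""
    return _trapezoid(y, t) - y[0] * (t[-1] - t[0])


def auc_sketch(
    data: Dict[str, List[float]],
    sample_times: SampleTimes,
    saliva_type: str = "cortisol",
    remove_s0: bool = False,
    compute_auc_post: bool = False,
) -> Dict[str, Dict[str, float]]:
    subjects = list(data)
    if isinstance(sample_times[0], (int, float)):
        # 1D, shape (n_samples,): the same times for all subjects
        times_per_subject = [list(sample_times) for _ in subjects]
    else:
        # 2D, shape (n_subjects, n_samples): one row per subject
        times_per_subject = [list(row) for row in sample_times]

    result = {}
    for subject, t in zip(subjects, times_per_subject):
        y = list(data[subject])
        if remove_s0:
            # Assumption: the time of S0 is dropped together with its value
            y, t = y[1:], t[1:]
        features = {
            f"{saliva_type}_auc_g": _trapezoid(y, t),
            f"{saliva_type}_auc_i": _auc_i(y, t),
        }
        if compute_auc_post:
            # Post-stress samples are those with t >= 0, as defined in the Note
            post = [(ti, yi) for ti, yi in zip(t, y) if ti >= 0]
            t_post = [ti for ti, _ in post]
            y_post = [yi for _, yi in post]
            features[f"{saliva_type}_auc_i_post"] = _auc_i(y_post, t_post)
        result[subject] = features
    return result


cortisol = {"Vp01": [5.0, 6.0, 10.0, 8.0], "Vp02": [4.0, 4.0, 8.0, 6.0]}
print(auc_sketch(cortisol, [-10, 0, 10, 20], compute_auc_post=True))
print(auc_sketch(cortisol, [[-10, 0, 10, 20], [-15, 0, 15, 30]]))
```

With NumPy available, _trapezoid could be replaced by numpy.trapz(y, t); the plain-Python version only keeps the sketch dependency-free.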
- biopsykit.saliva.saliva.slope(data, sample_labels=None, sample_idx=None, saliva_type='cortisol', sample_times=None)
Compute the slope between two saliva samples.
The samples to compute the slope between can either be specified by index (parameter sample_idx, in the range [0, num_of_samples-1]) or by label (parameter sample_labels).
The output feature name for the slope between saliva samples with labels label1 and label2 will be slope<label1><label2>, preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_slopeS1S2.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
sample_labels (list or tuple) – pair of saliva sample labels to compute the slope between. Labels correspond to the names in the sample column of the dataframe. An error will be raised if not exactly 2 samples are specified.
sample_idx (list or tuple) – pair of saliva sample indices to compute the slope between. An error will be raised if not exactly 2 samples are specified.
sample_times (numpy.ndarray or list, optional) – Saliva sampling times (corresponding to x-axis values for computing slope). By default (sample_times is None), sample times are expected to be part of the dataframe (in the time column). Alternatively, sample times can be specified by passing a list or a numpy array to this argument. If sample_times is a 1D array, it is assumed that saliva times are the same for all subjects. Then, sample_times needs to have the shape (n_samples,). If sample_times is a 2D array, it is assumed that saliva times are individual for each subject. Then, sample_times needs to have the shape (n_subjects, n_samples).
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
IndexError – if invalid sample_labels or sample_idx is provided
ValidationError – if data is not a SalivaRawDataFrame
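The slope is the change in saliva value divided by the time between the two selected samples, \((m_j - m_i) / (t_j - t_i)\). The hypothetical plain-Python sketch below shows how selection by label or by index, the documented IndexError, and the slope<label1><label2> naming fit together. It is not biopsykit's implementation: slope_sketch, the separate labels list, the toy {subject: [S0, S1, ...]} input, and the values are assumptions, and only 1D sample_times are handled.

```python
from typing import Dict, List, Optional, Sequence


def slope_sketch(
    data: Dict[str, List[float]],
    labels: List[str],
    sample_times: Sequence[float],
    sample_labels: Optional[Sequence[str]] = None,
    sample_idx: Optional[Sequence[int]] = None,
    saliva_type: str = "cortisol",
) -> Dict[str, Dict[str, float]]:
    """Sketch of slope(): (value_j - value_i) / (time_j - time_i) per subject."""
    if sample_idx is None:
        # Translate a pair of labels (e.g., ("S1", "S2")) into indices
        if sample_labels is None or len(sample_labels) != 2:
            raise IndexError("exactly 2 sample labels are required")
        missing = [lab for lab in sample_labels if lab not in labels]
        if missing:
            raise IndexError(f"unknown sample labels: {missing}")
        sample_idx = [labels.index(lab) for lab in sample_labels]
    if len(sample_idx) != 2 or not all(0 <= i < len(labels) for i in sample_idx):
        raise IndexError("exactly 2 valid sample indices are required")

    i, j = sample_idx
    feature = f"{saliva_type}_slope{labels[i]}{labels[j]}"
    dt = sample_times[j] - sample_times[i]
    return {subject: {feature: (values[j] - values[i]) / dt} for subject, values in data.items()}


labels = ["S0", "S1", "S2", "S3"]
times = [-10, 0, 10, 20]
cortisol = {"Vp01": [5.0, 6.0, 10.0, 8.0], "Vp02": [4.0, 4.0, 8.0, 6.0]}
print(slope_sketch(cortisol, labels, times, sample_labels=("S1", "S2")))
print(slope_sketch(cortisol, labels, times, sample_idx=(0, 3)))
```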
- biopsykit.saliva.saliva.standard_features(data, saliva_type='cortisol', group_cols=None, keep_index=True)
Compute a set of standard features on saliva data.
The following list of features is computed:
argmax: Argument (=index) of the maximum value
mean: Mean value
std: Standard deviation
skew: Skewness
kurt: Kurtosis
For all features, the built-in pandas functions (e.g., pandas.DataFrame.mean()) will be used, except for argmax, which will use numpy’s function (numpy.argmax()). The functions will be applied on the dataframe using the aggregate function of pandas (pandas.DataFrame.agg()).
The output feature names will be argmax, mean, std, skew, and kurt, preceded by the name of the saliva type to allow better conversion into long format later on (if desired). For example, for cortisol, it will be cortisol_argmax.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
group_cols (str or list of str, optional) – columns to group on before applying the aggregate function. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data should be grouped by subject (followed by condition, day, night, etc., if applicable).
keep_index (bool, optional) – whether to try keeping the old index or use the new index returned by the groupby-aggregate function. Keeping the old index is useful, e.g., if the dataframe has a MultiIndex with several levels, but grouping is only performed on a subset of these levels. Default: True
- Returns
dataframe containing the computed features, or a dict of such if saliva_type is a list
- Return type
SalivaFeatureDataFrame or dict of such
- Raises
ValidationError – if data is not a SalivaRawDataFrame
DataFrameTransformationError – if keep_index is True, but applying the old index fails
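To make the five features concrete, the following plain-Python sketch computes them per subject. It is a hypothetical illustration, not biopsykit's code: standard_features_sketch and the toy {subject: [S0, S1, ...]} input (with made-up values) are assumptions, grouping is reduced to one group per subject (group_cols and keep_index are not modeled), and the estimators are chosen to mirror pandas defaults as far as we know them: std with ddof=1, bias-corrected sample skewness, and bias-corrected excess (Fisher) kurtosis.

```python
import math
import statistics
from typing import Dict, List


def _central_moment(x: List[float], k: int) -> float:
    mu = statistics.fmean(x)
    return sum((v - mu) ** k for v in x) / len(x)


def _skew(x: List[float]) -> float:
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient G1), needs n > 2."""
    n = len(x)
    g1 = _central_moment(x, 3) / _central_moment(x, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def _kurt(x: List[float]) -> float:
    """Bias-corrected sample excess kurtosis (G2, normal distribution -> 0), needs n > 3."""
    n = len(x)
    g2 = _central_moment(x, 4) / _central_moment(x, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


def standard_features_sketch(
    data: Dict[str, List[float]], saliva_type: str = "cortisol"
) -> Dict[str, Dict[str, float]]:
    result = {}
    for subject, values in data.items():
        result[subject] = {
            # Position of the first maximum, like numpy.argmax()
            f"{saliva_type}_argmax": values.index(max(values)),
            f"{saliva_type}_mean": statistics.fmean(values),
            # Sample standard deviation (ddof=1), like pandas' std()
            f"{saliva_type}_std": statistics.stdev(values),
            f"{saliva_type}_skew": _skew(values),
            f"{saliva_type}_kurt": _kurt(values),
        }
    return result


cortisol = {"Vp01": [1.0, 2.0, 3.0, 4.0, 5.0], "Vp02": [2.0, 5.0, 9.0, 4.0, 3.0]}
print(standard_features_sketch(cortisol))
```

For the evenly spaced values of Vp01, skewness is 0 and the excess kurtosis is -1.2.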
- biopsykit.saliva.saliva.mean_se(data, saliva_type='cortisol', group_cols=None, remove_s0=False)
Compute mean and standard error per saliva sample.
- Parameters
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str or list of str) – saliva type or list of saliva types to compute features on
group_cols (str or list of str, optional) – columns to group on before computing mean and standard error. If group_cols is None (the default), data will be grouped by all columns except the sample column. Usually, data should be grouped by subject (followed by condition, day, night, etc., if applicable).
remove_s0 (bool, optional) – whether to exclude the first saliva sample from computing mean and standard error or not. Default: False
- Returns
dataframe with mean and standard error per saliva sample, or a dict of such if saliva_type is a list
- Return type
- Raises
ValidationError – if data is not a SalivaRawDataFrame
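A per-sample mean and standard error across subjects can be sketched as follows. This is a hypothetical plain-Python illustration, not biopsykit's implementation: mean_se_sketch, the separate labels list, and the toy {subject: [S0, S1, ...]} input with made-up values are assumptions; group_cols is not modeled; and the standard error is assumed to be the sample standard deviation (ddof=1) divided by the square root of the number of subjects.

```python
import math
import statistics
from typing import Dict, List


def mean_se_sketch(
    data: Dict[str, List[float]], labels: List[str], remove_s0: bool = False
) -> Dict[str, Dict[str, float]]:
    """Sketch of mean_se(): mean and standard error of each saliva sample across subjects."""
    start = 1 if remove_s0 else 0  # optionally skip S0
    result = {}
    for idx in range(start, len(labels)):
        # Collect the values of this sample from all subjects
        column = [values[idx] for values in data.values()]
        result[labels[idx]] = {
            "mean": statistics.fmean(column),
            # Assumed SE definition: sample std (ddof=1) / sqrt(n)
            "se": statistics.stdev(column) / math.sqrt(len(column)),
        }
    return result


cortisol = {
    "Vp01": [5.0, 6.0, 10.0, 8.0],
    "Vp02": [4.0, 4.0, 8.0, 6.0],
    "Vp03": [6.0, 5.0, 12.0, 10.0],
}
print(mean_se_sketch(cortisol, ["S0", "S1", "S2", "S3"]))
```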