biopsykit.questionnaires.utils module
Module containing utility functions for manipulating and processing questionnaire data.
- biopsykit.questionnaires.utils.bin_scale(data, bins, cols=None, first_min=True, last_max=False, inplace=False, **kwargs)
Bin questionnaire scales.
Questionnaire scales are binned using pandas.cut() according to the bins specified by bins.
- Parameters
bins (int or list of float or IntervalIndex) – The criteria to bin by. bins can have one of the following types:
  int: Defines the number of equal-width bins in the range of data. The range of data is extended by 0.1% on each side to include the minimum and maximum values of data.
  sequence of scalars: Defines the bin edges, allowing for non-uniform width. No extension of the range of data is done.
  IntervalIndex: Defines the exact bins to be used. Note that the IntervalIndex for bins must be non-overlapping.
cols (list of str or list of int, optional) – column name/index (or list of such) to be binned, or None to use all columns (or if data is a series). Default: None
first_min (bool, optional) – whether the minimum value should be added as the leftmost edge of the first bin or not. Only considered if bins is a list. Default: True
last_max (bool, optional) – whether the maximum value should be added as the rightmost edge of the last bin or not. Only considered if bins is a list. Default: False
inplace (bool, optional) – whether to perform the operation inplace or not. Default: False
**kwargs – additional parameters that are passed to pandas.cut()
- Returns
dataframe (or series) with binned scales or None if inplace is True
- Return type
See also
pandas.cut() – Pandas method to bin values into discrete intervals.
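By default, pandas.cut() treats bins given as a list of edges as right-closed intervals. The following is a minimal plain-Python sketch of that assignment rule, not the BioPsyKit or pandas implementation; the helper name assign_bins and the sample values are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def assign_bins(values: Sequence[float], edges: Sequence[float]) -> List[Optional[int]]:
    """Return the bin index per value for right-closed bins (edges[i], edges[i + 1]]."""
    result: List[Optional[int]] = []
    for value in values:
        # bisect_left returns the position of the first edge that is >= value
        idx = bisect_left(edges, value)
        if 1 <= idx <= len(edges) - 1:
            result.append(idx - 1)
        else:
            # value is at or below the first edge, or above the last edge
            result.append(None)
    return result


edges = [0, 10, 20, 30]  # hypothetical bin edges
bin_ids = assign_bins([5, 10, 11, 30, 0, 35], edges)
print(bin_ids)  # [0, 0, 1, 2, None, None]
```

Values outside the given edges receive no bin in this sketch. That is the situation the first_min and last_max options appear to address, by adding the minimum or maximum of data as an extra edge.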
- biopsykit.questionnaires.utils.compute_scores(data, quest_dict, quest_kwargs=None)
Compute questionnaire scores from dataframe.
This function can be used if multiple questionnaires from a dataframe should be computed at once. If the same questionnaire was assessed at multiple time points, these scores will be computed separately (see Notes and Examples). The questionnaires (and the dataframe columns belonging to the questionnaires) are specified by quest_dict.
Note
If questionnaires were collected at different time points (e.g., pre and post), which should all be computed, then the dictionary keys need to have the following format: “<questionnaire_name>-<time_point>”.
- Parameters
data (DataFrame) – dataframe containing questionnaire data
quest_dict (dict) – dictionary with questionnaire names to be computed (keys) and columns of the questionnaires (values)
quest_kwargs (dict) – dictionary with optional arguments to be passed to questionnaire functions. The dictionary is expected to consist of questionnaire names (keys) and **kwargs dictionaries (values) with arguments per questionnaire
- Returns
dataframe with computed questionnaire scores
- Return type
Examples
>>> from biopsykit.questionnaires.utils import compute_scores
>>> quest_dict = {
...     "PSS": ["PSS_{:02d}".format(i) for i in range(1, 11)],  # PSS: one time point
...     "PASA-pre": ["PASA_{:02d}_T0".format(i) for i in range(1, 17)],  # PASA: two time points (pre and post)
...     "PASA-post": ["PASA_{:02d}_T1".format(i) for i in range(1, 17)],  # PASA: two time points (pre and post)
... }
>>> # data: dataframe containing the questionnaire columns listed above
>>> compute_scores(data, quest_dict)
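The key format “<questionnaire_name>-<time_point>” can be illustrated without a dataframe. The sketch below only shows how such keys could be split into a questionnaire name and an optional time point; split_quest_key is a hypothetical helper and is not part of BioPsyKit, whose internal parsing may differ.

```python
from typing import Optional, Tuple


def split_quest_key(key: str) -> Tuple[str, Optional[str]]:
    """Split '<questionnaire_name>-<time_point>' into its two parts (hypothetical helper)."""
    name, sep, time_point = key.partition("-")
    # Keys without a hyphen describe a questionnaire assessed at a single time point
    return (name, time_point) if sep else (name, None)


quest_dict = {
    "PSS": ["PSS_{:02d}".format(i) for i in range(1, 11)],
    "PASA-pre": ["PASA_{:02d}_T0".format(i) for i in range(1, 17)],
    "PASA-post": ["PASA_{:02d}_T1".format(i) for i in range(1, 17)],
}

parsed = {key: split_quest_key(key) for key in quest_dict}
for key, (name, time_point) in parsed.items():
    print(key, "->", name, time_point, "({} columns)".format(len(quest_dict[key])))
```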
- biopsykit.questionnaires.utils.convert_scale(data, offset, cols=None, inplace=False)
Convert the score range of questionnaire items.
- Parameters
- Returns
dataframe with converted columns or None if inplace is True
- Return type
- Raises
ValidationError – if data is no dataframe or series
Examples
>>> import pandas as pd
>>> from biopsykit.questionnaires.utils import convert_scale
>>> data_in = pd.DataFrame({"A": [1, 2, 3, 1], "B": [4, 0, 1, 3], "C": [0, 3, 2, 3], "D": [0, 1, 2, 4]})
>>> # convert data from range [0, 4] to range [1, 5]
>>> data_out = convert_scale(data_in, offset=1)
>>> data_out["A"].tolist()
[2, 3, 4, 2]
>>> data_out["B"].tolist()
[5, 1, 2, 4]
>>> data_out["C"].tolist()
[1, 4, 3, 4]
>>> data_out["D"].tolist()
[1, 2, 3, 5]
>>> data_in = pd.DataFrame({"A": [1, 2, 3, 1], "B": [4, 2, 1, 3], "C": [3, 3, 2, 3], "D": [4, 1, 2, 4]})
>>> # convert data from range [1, 4] to range [0, 3]
>>> data_out = convert_scale(data_in, offset=-1)
>>> print(data_out)
>>> # convert only specific columns
>>> data_out = convert_scale(data_in, offset=-1, cols=["A", "C"])
>>> print(data_out)
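The conversion itself amounts to adding offset to every selected item. The following plain-Python sketch mirrors that arithmetic on a dictionary of lists; shift_items is a hypothetical stand-in and not the BioPsyKit function, which operates on pandas objects.

```python
from typing import Dict, List, Optional, Sequence


def shift_items(
    items: Dict[str, List[int]], offset: int, cols: Optional[Sequence[str]] = None
) -> Dict[str, List[int]]:
    """Add `offset` to the selected columns and copy all others unchanged."""
    selected = set(items) if cols is None else set(cols)
    return {
        name: [value + offset for value in values] if name in selected else list(values)
        for name, values in items.items()
    }


items = {"A": [1, 2, 3, 1], "B": [4, 2, 1, 3], "C": [3, 3, 2, 3], "D": [4, 1, 2, 4]}
all_shifted = shift_items(items, offset=-1)                    # range [1, 4] -> [0, 3]
some_shifted = shift_items(items, offset=-1, cols=["A", "C"])  # only A and C change
print(some_shifted)
```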
- biopsykit.questionnaires.utils.crop_scale(data, score_range, set_nan=False, inplace=False)
Crop questionnaire scales, i.e., set values out of range to specific minimum and maximum values or to NaN.
- Parameters
score_range (list of int) – possible score range of the questionnaire items. Values out of score_range are cropped.
set_nan (bool, optional) – whether to set values out of range to NaN or to the values specified by score_range. Default: False
inplace (bool, optional) – whether to perform the operation inplace or not. Default: False
- Returns
dataframe (or series) with cropped scales or None if inplace is True
- Return type
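The two cropping modes described above can be sketched in plain Python as follows. This is an illustration of the described behavior under the assumption that score_range is given as [minimum, maximum]; crop_values is a hypothetical helper, not the library function.

```python
import math
from typing import List, Sequence


def crop_values(
    values: Sequence[float], score_range: Sequence[float], set_nan: bool = False
) -> List[float]:
    """Clamp out-of-range values to the range limits, or replace them with NaN."""
    low, high = score_range
    cropped = []
    for value in values:
        if low <= value <= high:
            cropped.append(float(value))
        elif set_nan:
            # out-of-range values are marked as missing
            cropped.append(math.nan)
        else:
            # out-of-range values are set to the nearest limit
            cropped.append(float(min(max(value, low), high)))
    return cropped


clamped = crop_values([0, 1, 5, 7], score_range=[1, 5])
as_nan = crop_values([0, 1, 5, 7], score_range=[1, 5], set_nan=True)
print(clamped)  # [1.0, 1.0, 5.0, 5.0]
print(as_nan)   # [nan, 1.0, 5.0, nan]
```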
- biopsykit.questionnaires.utils.find_cols(data, regex_str=None, starts_with=None, ends_with=None, contains=None, zero_pad_numbers=True)
Find columns in dataframe that match a specific pattern.
This function is useful to find all columns that belong to a questionnaire. Column names can be filtered based on one (or a combination) of the following criteria:
starts_with: columns have to start with the specified string
ends_with: columns have to end with the specified string
contains: columns have to contain the specified string
Optionally, the item numbers in the matching column names can be zero-padded, if they are not already.
Note
If zero_pad_numbers is True then the column names returned by this function will be renamed and might thus not match the column names of the original dataframe. To solve this, make sure your original dataframe already has zero-padded columns (by manually renaming them) or convert column names using zero_pad_columns().
Warning
Zero-padding using zero_pad_columns() assumes, by default, that numbers are at the end of column names. If you want to change that behavior (e.g., because the column names have string suffixes), you might need to apply zero-padding manually.
- Parameters
data (DataFrame) – dataframe with columns to be filtered
regex_str (str, optional) – regex string to extract column names. If this parameter is passed the other parameters (starts_with, ends_with, contains) will be ignored. Default: None
starts_with (str, optional) – string columns have to start with. Default: None
ends_with (str, optional) – string columns have to end with. Default: None
contains (str, optional) – string columns have to contain. Default: None
zero_pad_numbers (bool, optional) – whether to zero-pad numbers in column names. Default: True
- Returns
- Return type
tuple[pandas.core.frame.DataFrame, collections.abc.Sequence[str]]
Examples
>>> import biopsykit as bp
>>> import pandas as pd
>>> # Option 1: has to start with "XX"
>>> data = pd.DataFrame(columns=["XX_{}".format(i) for i in range(1, 11)])
>>> df, cols = bp.questionnaires.utils.find_cols(data, starts_with="XX")
>>> print(cols)
["XX_01", "XX_02", ..., "XX_10"]
>>> # Option 2: has to end with "Post"
>>> data = pd.DataFrame(columns=["XX_1_Pre", "XX_2_Pre", "XX_3_Pre", "XX_1_Post", "XX_2_Post", "XX_3_Post"])
>>> df, cols = bp.questionnaires.utils.find_cols(data, ends_with="Post")
>>> print(cols)
["XX_01_Post", "XX_02_Post", "XX_03_Post"]
>>> # Option 3: has to start with "XX" and end with "Post"
>>> data = pd.DataFrame(columns=["XX_1_Pre", "XX_2_Pre", "XX_3_Pre", "XX_1_Post", "XX_2_Post", "XX_3_Post", "YY_1_Pre", "YY_2_Pre", "YY_1_Post", "YY_2_Post"])
>>> # WARNING: this will not zero-pad the questionnaire numbers!
>>> df, cols = bp.questionnaires.utils.find_cols(data, starts_with="XX", ends_with="Post")
>>> print(cols)
["XX_1_Post", "XX_2_Post", "XX_3_Post"]
>>> # Option 4: pass custom regex string
>>> data = pd.DataFrame(columns=["XX_1_Pre", "XX_2_Pre", "XX_3_Pre", "XX_1_Post", "XX_2_Post", "XX_3_Post", "YY_1_Pre", "YY_2_Pre", "YY_1_Post", "YY_2_Post"])
>>> # here, zero-padding will be possible again
>>> df, cols = bp.questionnaires.utils.find_cols(data, regex_str=r"XX_\d+_\w+")
>>> print(cols)
["XX_01_Post", "XX_02_Post", "XX_03_Post"]
>>> # Option 5: disable zero-padding
>>> data = pd.DataFrame(columns=["XX_{}".format(i) for i in range(1, 11)])
>>> df, cols = bp.questionnaires.utils.find_cols(data, starts_with="XX", zero_pad_numbers=False)
>>> print(cols)
["XX_1", "XX_2", ..., "XX_10"]
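How the starts_with, ends_with, and contains criteria combine can be sketched with plain string methods. The helper filter_columns below is hypothetical and only illustrates the filtering logic on a list of column names. It does not perform zero-padding and is not the BioPsyKit implementation.

```python
from typing import List, Optional, Sequence


def filter_columns(
    columns: Sequence[str],
    starts_with: Optional[str] = None,
    ends_with: Optional[str] = None,
    contains: Optional[str] = None,
) -> List[str]:
    """Keep the column names that satisfy every criterion that was passed."""
    matches = []
    for col in columns:
        if starts_with is not None and not col.startswith(starts_with):
            continue
        if ends_with is not None and not col.endswith(ends_with):
            continue
        if contains is not None and contains not in col:
            continue
        matches.append(col)
    return matches


columns = [
    "XX_1_Pre", "XX_2_Pre", "XX_3_Pre", "XX_1_Post", "XX_2_Post", "XX_3_Post",
    "YY_1_Pre", "YY_2_Pre", "YY_1_Post", "YY_2_Post",
]
xx_post = filter_columns(columns, starts_with="XX", ends_with="Post")
print(xx_post)  # ['XX_1_Post', 'XX_2_Post', 'XX_3_Post']
```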
- biopsykit.questionnaires.utils.get_supported_questionnaires()
List all supported (i.e., implemented) questionnaires.
- Returns
dictionary with questionnaire names (keys) and description (values)
- Return type
- biopsykit.questionnaires.utils.invert(data, score_range, cols=None, inplace=False)
Invert questionnaire scores.
In many questionnaires some items need to be inverted (reversed) before sum scores can be computed. This function can be used to either invert a single column (Series), selected columns in a dataframe (by specifying columns in the cols parameter), or a complete dataframe.
- Parameters
- Returns
dataframe with inverted columns or None if inplace is True
- Return type
DataFrame or None
- Raises
ValidationError – if data is no dataframe or series, or if score_range does not have length 2
ValueRangeError – if values in data are not in score_range
Examples
>>> import pandas as pd
>>> from biopsykit.questionnaires.utils import invert
>>> data_in = pd.DataFrame({"A": [1, 2, 3, 1], "B": [4, 0, 1, 3], "C": [0, 3, 2, 3], "D": [0, 1, 2, 4]})
>>> data_out = invert(data_in, score_range=[0, 4])
>>> data_out["A"].tolist()
[3, 2, 1, 3]
>>> data_out["B"].tolist()
[0, 4, 3, 1]
>>> data_out["C"].tolist()
[4, 1, 2, 1]
>>> data_out["D"].tolist()
[4, 3, 2, 0]
>>> # Other score range
>>> data_out = invert(data_in, score_range=[0, 5])
>>> data_out["A"].tolist()
[4, 3, 2, 4]
>>> data_out["B"].tolist()
[1, 5, 4, 2]
>>> data_out["C"].tolist()
[5, 2, 3, 2]
>>> data_out["D"].tolist()
[5, 4, 3, 1]
>>> # Invert only specific columns
>>> data_out = invert(data_in, score_range=[0, 4], cols=["A", "C"])
>>> data_out["A"].tolist()
[3, 2, 1, 3]
>>> data_out["B"].tolist()
[4, 0, 1, 3]
>>> data_out["C"].tolist()
[4, 1, 2, 1]
>>> data_out["D"].tolist()
[0, 1, 2, 4]
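The results in these examples are consistent with the usual reverse-scoring rule, where each value x is replaced by minimum + maximum - x. A minimal plain-Python sketch under that assumption follows; invert_scores is a hypothetical helper, and the built-in ValueError stands in for the library's ValidationError and ValueRangeError.

```python
from typing import List, Sequence


def invert_scores(values: Sequence[int], score_range: Sequence[int]) -> List[int]:
    """Reverse-score values within [minimum, maximum] using minimum + maximum - x."""
    if len(score_range) != 2:
        # stand-in for the ValidationError described above
        raise ValueError("score_range must have length 2")
    low, high = score_range
    if any(not low <= value <= high for value in values):
        # stand-in for the ValueRangeError described above
        raise ValueError("values must lie within score_range")
    return [low + high - value for value in values]


inverted_0_4 = invert_scores([1, 2, 3, 1], score_range=[0, 4])
inverted_0_5 = invert_scores([4, 0, 1, 3], score_range=[0, 5])
print(inverted_0_4)  # [3, 2, 1, 3]
print(inverted_0_5)  # [1, 5, 4, 2]
```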
- biopsykit.questionnaires.utils.to_idx(col_idxs)
Convert questionnaire item indices into array indices.
In questionnaires, item indices start at 1. To avoid confusion in the implementation of questionnaires (because array indices start at 0), all questionnaire indices in BioPsyKit also start at 1 and are converted to 0-based indexing using this function.
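The conversion described here is a subtraction of 1 from every item number. The sketch below shows the idea in plain Python; to_zero_based is a hypothetical stand-in for to_idx(), and the item numbers are chosen only for illustration.

```python
from typing import List, Sequence


def to_zero_based(item_numbers: Sequence[int]) -> List[int]:
    """Convert 1-based questionnaire item numbers into 0-based positions."""
    return [number - 1 for number in item_numbers]


columns = ["PSS_{:02d}".format(i) for i in range(1, 11)]
item_numbers = [4, 5, 7, 8]  # illustrative item numbers, counted from 1
positions = to_zero_based(item_numbers)
selected = [columns[pos] for pos in positions]
print(positions)  # [3, 4, 6, 7]
print(selected)   # ['PSS_04', 'PSS_05', 'PSS_07', 'PSS_08']
```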
- biopsykit.questionnaires.utils.wide_to_long(data, quest_name, levels)
Convert a dataframe from wide-format into long-format.
Warning
This function is deprecated and will be removed in the future! Please use wide_to_long() instead.
- Parameters
data (DataFrame) – pandas DataFrame containing saliva data in wide-format, i.e. one column per saliva sample, one row per subject.
quest_name (str) – questionnaire name, i.e., common name for each column to be converted into long-format.
levels (str or list of str) – index levels of the resulting long-format dataframe.
- Returns
pandas DataFrame in long-format
- Return type
See also
wide_to_long() – convert dataframe from wide to long format
- biopsykit.questionnaires.utils.zero_pad_columns(data, inplace=False)
Add zero-padding to numbers at the end of column names in a dataframe.
Warning
By default, this function assumes that numbers are at the end of column names. If you need to change that behavior (e.g., because the column names have string suffixes), you might need to apply zero-padding manually.
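Manual zero-padding of trailing numbers can be sketched with a regular expression. The helper zero_pad_trailing_number and the padding width of 2 are assumptions for illustration, and the library's own behavior may differ. The third column name shows why names with string suffixes are left untouched by a rule that only looks at the end of the name.

```python
import re
from typing import List


def zero_pad_trailing_number(name: str, width: int = 2) -> str:
    """Zero-pad a number at the very end of a column name (hypothetical helper)."""
    # (\d+)$ only matches digits directly before the end of the string
    return re.sub(r"(\d+)$", lambda match: match.group(1).zfill(width), name)


columns = ["XX_1", "XX_10", "XX_3_Post"]
padded: List[str] = [zero_pad_trailing_number(col) for col in columns]
print(padded)  # ['XX_01', 'XX_10', 'XX_3_Post']
```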