Questionnaire Example¶
This example illustrates how to process questionnaire data.
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
import pingouin as pg
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
Load Questionnaire Data¶
[3]:
# Example data
data = bp.example_data.get_questionnaire_example()
# Alternatively: Load your own data using bp.io.load_questionnaire_data()
# bp.io.load_questionnaire_data("<path-to-questionnaire-data>")
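When loading your own data, it should arrive as one row per subject with the subject identifier as the index. A plain-pandas sketch of that layout (the inline CSV here is a made-up example, not part of BioPsyKit):

```python
import io

import pandas as pd

# Hypothetical questionnaire CSV: one row per subject, items as columns.
csv_data = io.StringIO(
    "subject,PSS_01,PSS_02,PSS_03\n"
    "Vp01,3,2,3\n"
    "Vp02,1,1,1\n"
)
# Reading it with the subject id as index yields the same shape of dataframe
# that bp.example_data.get_questionnaire_example() returns.
data_own = pd.read_csv(csv_data, index_col="subject")
print(data_own.shape)
```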
[4]:
data.head()
[4]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | ... | PASA_07 | PASA_08 | PASA_09 | PASA_10 | PASA_11 | PASA_12 | PASA_13 | PASA_14 | PASA_15 | PASA_16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | |||||||||||||||||||||
Vp01 | 3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 1 | ... | 2 | 2 | 2 | 5 | 4 | 2 | 4 | 2 | 1 | 2 |
Vp02 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 1 | 0 | ... | 1 | 1 | 6 | 4 | 4 | 1 | 5 | 4 | 4 | 6 |
Vp03 | 2 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 3 | 1 | ... | 4 | 4 | 2 | 5 | 5 | 1 | 4 | 1 | 2 | 5 |
Vp04 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 1 | ... | 3 | 3 | 3 | 3 | 2 | 3 | 2 | 4 | 4 | 4 |
Vp05 | 0 | 2 | 2 | 3 | 2 | 1 | 3 | 3 | 2 | 1 | ... | 1 | 4 | 3 | 3 | 2 | 3 | 5 | 5 | 2 | 6 |
5 rows × 66 columns
Example 1: Compute Perceived Stress Scale (PSS)¶
In this example, we compute the Perceived Stress Scale (PSS).
The PSS is a widely used self-report questionnaire with adequate reliability and validity that asks how stressful a person has found their life during the previous month.
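Under the hood, PSS-10 scoring reverse-codes the four positively worded items (4, 5, 7, 8) and sums the items per subscale. A minimal plain-pandas sketch of this logic, using Vp01's answers from the example data (the item-to-subscale assignment follows the original PSS-10 definition):

```python
import pandas as pd

# Vp01's ten PSS answers (coded 0-4), taken from the example dataset.
items = pd.Series(
    [3, 2, 3, 3, 2, 2, 2, 2, 3, 1],
    index=[f"PSS_{i:02d}" for i in range(1, 11)],
)

# Reverse-code the positively worded items: 0 <-> 4, 1 <-> 3, 2 <-> 2.
reversed_items = 4 - items[["PSS_04", "PSS_05", "PSS_07", "PSS_08"]]

# "Helplessness" subscale: the negatively worded items 1, 2, 3, 6, 9, 10.
helpless = items[["PSS_01", "PSS_02", "PSS_03", "PSS_06", "PSS_09", "PSS_10"]].sum()
# "Self-Efficacy" subscale: the reverse-coded items.
self_eff = reversed_items.sum()
total = helpless + self_eff
print(helpless, self_eff, total)  # 14 7 21
```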
Slice Dataframe and Select Columns¶
To extract only the columns belonging to the PSS questionnaire we can use the function utils.find_cols(). This function returns the sliced dataframe and the columns belonging to the questionnaire.
[5]:
data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with="PSS")
data_pss.head()
[5]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | |
---|---|---|---|---|---|---|---|---|---|---|
subject | ||||||||||
Vp01 | 3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 1 |
Vp02 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 1 | 0 |
Vp03 | 2 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 3 | 1 |
Vp04 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 1 |
Vp05 | 0 | 2 | 2 | 3 | 2 | 1 | 3 | 3 | 2 | 1 |
Compute PSS Score¶
We can compute the PSS score by passing the questionnaire data to the function questionnaires.pss().
This can be achieved in two ways:
1. Directly passing the sliced PSS dataframe.
2. Passing the whole dataframe together with a list of all column names that belong to the PSS. This option is better suited for computing multiple questionnaire scores at once (more on that later!).
Option 1: Sliced PSS dataframe¶
[6]:
pss = bp.questionnaires.pss(data_pss)
pss.head()
[6]:
PSS_Helpless | PSS_SelfEff | PSS_Total | |
---|---|---|---|
subject | |||
Vp01 | 14 | 7 | 21 |
Vp02 | 5 | 5 | 10 |
Vp03 | 14 | 10 | 24 |
Vp04 | 11 | 6 | 17 |
Vp05 | 8 | 5 | 13 |
Option 2: Whole dataframe + PSS columns¶
[7]:
pss = bp.questionnaires.pss(data, columns=columns_pss)
pss.head()
[7]:
PSS_Helpless | PSS_SelfEff | PSS_Total | |
---|---|---|---|
subject | |||
Vp01 | 14 | 7 | 21 |
Vp02 | 5 | 5 | 10 |
Vp03 | 14 | 10 | 24 |
Vp04 | 11 | 6 | 17 |
Vp05 | 8 | 5 | 13 |
Feature Demo: Compute PSS Score with Wrong Item Ranges¶
This example demonstrates BioPsyKit's feature of asserting that questionnaire items are provided in the correct value range, according to the original definition of the questionnaire, before the actual questionnaire score is computed.
Here, we load an example dataset in which the PSS items are (wrongly) coded from 1 to 5. The original PSS, however, is defined for items coded from 0 to 4. Attempting to compute the PSS by passing these data to questionnaires.pss() will result in a ValueRangeError.
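Conceptually, the range check works like the following sketch (an illustration of the idea, not BioPsyKit's actual implementation):

```python
import pandas as pd


class ValueRangeError(Exception):
    """Raised when questionnaire items fall outside the expected range."""


def assert_value_range(data: pd.DataFrame, value_range: tuple) -> None:
    # Compare the observed min/max with the questionnaire's defined range
    # and refuse to score if any item falls outside it.
    expected_min, expected_max = value_range
    observed_min = data.min().min()
    observed_max = data.max().max()
    if observed_min < expected_min or observed_max > expected_max:
        raise ValueRangeError(
            f"Expected values in [{expected_min}, {expected_max}], "
            f"got values in [{observed_min}, {observed_max}]."
        )


# PSS items wrongly coded 1-5 instead of 0-4 -> the check fires.
df = pd.DataFrame({"PSS_01": [1, 5], "PSS_02": [2, 3]})
try:
    assert_value_range(df, (0, 4))
except ValueRangeError as e:
    print(e)
```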
Load Questionnaire Data with Wrong Item Ranges¶
[8]:
data_wrong = bp.example_data.get_questionnaire_example_wrong_range()
data_wrong.head()
[8]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | ... | PASA_07 | PASA_08 | PASA_09 | PASA_10 | PASA_11 | PASA_12 | PASA_13 | PASA_14 | PASA_15 | PASA_16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | |||||||||||||||||||||
Vp01 | 4 | 3 | 4 | 4 | 3 | 3 | 3 | 3 | 4 | 2 | ... | 2 | 2 | 2 | 5 | 4 | 2 | 4 | 2 | 1 | 2 |
Vp02 | 2 | 2 | 2 | 4 | 3 | 2 | 4 | 4 | 2 | 1 | ... | 1 | 1 | 6 | 4 | 4 | 1 | 5 | 4 | 4 | 6 |
Vp03 | 3 | 4 | 4 | 3 | 3 | 3 | 2 | 2 | 4 | 2 | ... | 4 | 4 | 2 | 5 | 5 | 1 | 4 | 1 | 2 | 5 |
Vp04 | 3 | 3 | 3 | 4 | 3 | 3 | 4 | 3 | 3 | 2 | ... | 3 | 3 | 3 | 3 | 2 | 3 | 2 | 4 | 4 | 4 |
Vp05 | 1 | 3 | 3 | 4 | 3 | 2 | 4 | 4 | 3 | 2 | ... | 1 | 4 | 3 | 3 | 2 | 3 | 5 | 5 | 2 | 6 |
5 rows × 66 columns
Slice Columns and Compute PSS Score¶
Note: This code will fail on purpose (the exception is caught) because the items are provided in the wrong range.
[9]:
data_pss_wrong, columns_pss = bp.questionnaires.utils.find_cols(data_wrong, starts_with="PSS")
[10]:
try:
pss = bp.questionnaires.pss(data_pss_wrong)
except bp.utils.exceptions.ValueRangeError as e:
print("ValueRangeError: {}".format(e))
ValueRangeError: Some of the values are out of the expected range. Expected were values in the range [0, 4], got values in the range [1, 5]. If values are part of questionnaire scores, you can convert questionnaire items into the correct range by calling `biopsykit.questionnaire.utils.convert_scale()`.
Solution: Convert (Recode) Questionnaire Items¶
To solve this issue, we first need to convert the PSS questionnaire items into the correct value range by subtracting 1 from all values (i.e., applying an offset of -1). This can easily be done using the function utils.convert_scale(), again in two different ways:
1. Convert the whole, sliced PSS dataframe.
2. Convert only the PSS columns and leave the other columns unchanged.
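In plain pandas terms, applying offset=-1 simply adds -1 to every affected item, recoding 1-5 into 0-4 (a sketch of the effect, not of convert_scale()'s implementation):

```python
import pandas as pd

# Two PSS items wrongly coded 1-5.
df_wrong = pd.DataFrame({"PSS_01": [4, 2], "PSS_02": [3, 1]})

offset = -1
# Adding the offset has the same effect as convert_scale(df_wrong, offset=-1):
df_conv = df_wrong + offset
print(df_conv["PSS_01"].tolist())  # [3, 1]
print(df_conv["PSS_02"].tolist())  # [2, 0]
```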
Option 1: Convert the sliced PSS dataframe¶
[11]:
data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss_wrong, offset=-1)
data_pss_conv.head()
[11]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | |
---|---|---|---|---|---|---|---|---|---|---|
subject | ||||||||||
Vp01 | 3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 1 |
Vp02 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 1 | 0 |
Vp03 | 2 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 3 | 1 |
Vp04 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 1 |
Vp05 | 0 | 2 | 2 | 3 | 2 | 1 | 3 | 3 | 2 | 1 |
Option 2: Convert only the PSS columns, leave the other columns unchanged¶
[12]:
data_conv = bp.questionnaires.utils.convert_scale(data_wrong, cols=columns_pss, offset=-1)
data_conv.head()
[12]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | ... | PASA_07 | PASA_08 | PASA_09 | PASA_10 | PASA_11 | PASA_12 | PASA_13 | PASA_14 | PASA_15 | PASA_16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | |||||||||||||||||||||
Vp01 | 3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 1 | ... | 2 | 2 | 2 | 5 | 4 | 2 | 4 | 2 | 1 | 2 |
Vp02 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 1 | 0 | ... | 1 | 1 | 6 | 4 | 4 | 1 | 5 | 4 | 4 | 6 |
Vp03 | 2 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 3 | 1 | ... | 4 | 4 | 2 | 5 | 5 | 1 | 4 | 1 | 2 | 5 |
Vp04 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 1 | ... | 3 | 3 | 3 | 3 | 2 | 3 | 2 | 4 | 4 | 4 |
Vp05 | 0 | 2 | 2 | 3 | 2 | 1 | 3 | 3 | 2 | 1 | ... | 1 | 4 | 3 | 3 | 2 | 3 | 5 | 5 | 2 | 6 |
5 rows × 66 columns
Compute PSS Score (Finally!)¶
Now the items are in the correct value range, and we can compute the PSS score:
[13]:
# Option 1: the sliced PSS dataframe
pss = bp.questionnaires.pss(data_pss_conv)
pss.head()
[13]:
PSS_Helpless | PSS_SelfEff | PSS_Total | |
---|---|---|---|
subject | |||
Vp01 | 14 | 7 | 21 |
Vp02 | 5 | 5 | 10 |
Vp03 | 14 | 10 | 24 |
Vp04 | 11 | 6 | 17 |
Vp05 | 8 | 5 | 13 |
[14]:
# Option 2: the whole dataframe + PSS columns
pss = bp.questionnaires.pss(data_conv, columns=columns_pss)
pss.head()
[14]:
PSS_Helpless | PSS_SelfEff | PSS_Total | |
---|---|---|---|
subject | |||
Vp01 | 14 | 7 | 21 |
Vp02 | 5 | 5 | 10 |
Vp03 | 14 | 10 | 24 |
Vp04 | 11 | 6 | 17 |
Vp05 | 8 | 5 | 13 |
Example 2: Compute Positive and Negative Affect Schedule (PANAS)¶
The PANAS assesses positive affect (interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, and active) and negative affect (distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, and afraid).
Higher scores on each subscale indicate greater positive or negative affect.
Slice Dataframe and Select Columns¶
In this example, the PANAS was assessed pre and post stress:
[15]:
data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Pre")
data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Post")
Compute PANAS¶
[16]:
panas_pre = bp.questionnaires.panas(data_panas_pre)
panas_pre.head()
[16]:
PANAS_NegativeAffect | PANAS_PositiveAffect | PANAS_Total | |
---|---|---|---|
subject | |||
Vp01 | 2.2 | 2.4 | 3.10 |
Vp02 | 2.5 | 2.3 | 2.90 |
Vp03 | 3.0 | 2.1 | 2.55 |
Vp04 | 2.0 | 2.8 | 3.40 |
Vp05 | 2.4 | 1.6 | 2.60 |
[17]:
panas_post = bp.questionnaires.panas(data_panas_post)
panas_post.head()
[17]:
PANAS_NegativeAffect | PANAS_PositiveAffect | PANAS_Total | |
---|---|---|---|
subject | |||
Vp01 | 2.2 | 2.8 | 3.30 |
Vp02 | 1.9 | 2.7 | 3.40 |
Vp03 | 2.2 | 3.3 | 3.55 |
Vp04 | 1.6 | 1.9 | 3.15 |
Vp05 | 2.3 | 2.3 | 3.00 |
Example 3: Compute Multiple Scores at Once¶
Build a dictionary where each key corresponds to the questionnaire score to be computed and each value corresponds to the columns of the questionnaire. If some scores were assessed repeatedly (e.g., the PANAS was assessed at two different time points, pre and post), separate the suffix from the score name by a - (e.g., panas-pre and panas-post).
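The naming convention can be illustrated in plain Python: the part before the - selects the score to compute, and the part after it becomes a suffix in the resulting column names (a sketch of the convention, not of compute_scores()'s implementation):

```python
# Split each dictionary key into a score name and an optional suffix.
for key in ["pss", "pasa", "panas-pre", "panas-post"]:
    score_name, _, suffix = key.partition("-")
    label = f"{score_name} (suffix: {suffix})" if suffix else score_name
    print(label)
```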
Load Example Questionnaire Data¶
[18]:
data = bp.example_data.get_questionnaire_example()
data.head()
[18]:
PSS_01 | PSS_02 | PSS_03 | PSS_04 | PSS_05 | PSS_06 | PSS_07 | PSS_08 | PSS_09 | PSS_10 | ... | PASA_07 | PASA_08 | PASA_09 | PASA_10 | PASA_11 | PASA_12 | PASA_13 | PASA_14 | PASA_15 | PASA_16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | |||||||||||||||||||||
Vp01 | 3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 1 | ... | 2 | 2 | 2 | 5 | 4 | 2 | 4 | 2 | 1 | 2 |
Vp02 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3 | 1 | 0 | ... | 1 | 1 | 6 | 4 | 4 | 1 | 5 | 4 | 4 | 6 |
Vp03 | 2 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 3 | 1 | ... | 4 | 4 | 2 | 5 | 5 | 1 | 4 | 1 | 2 | 5 |
Vp04 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 1 | ... | 3 | 3 | 3 | 3 | 2 | 3 | 2 | 4 | 4 | 4 |
Vp05 | 0 | 2 | 2 | 3 | 2 | 1 | 3 | 3 | 2 | 1 | ... | 1 | 4 | 3 | 3 | 2 | 3 | 5 | 5 | 2 | 6 |
5 rows × 66 columns
[19]:
from biopsykit.questionnaires.utils import find_cols
dict_scores = {
"pss": find_cols(data, starts_with="PSS")[1],
"pasa": find_cols(data, starts_with="PASA")[1],
"panas-pre": find_cols(data, starts_with="PANAS", ends_with="Pre")[1],
"panas-post": find_cols(data, starts_with="PANAS", ends_with="Post")[1],
}
[20]:
# Compute all scores and store in result dataframe
data_scores = bp.questionnaires.utils.compute_scores(data, dict_scores)
data_scores.head()
[20]:
PSS_Helpless | PSS_SelfEff | PSS_Total | PASA_Threat | PASA_Challenge | PASA_SelfConcept | PASA_ControlExp | PASA_Primary | PASA_Secondary | PASA_StressComposite | PANAS_NegativeAffect_pre | PANAS_PositiveAffect_pre | PANAS_Total_pre | PANAS_NegativeAffect_post | PANAS_PositiveAffect_post | PANAS_Total_post | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | ||||||||||||||||
Vp01 | 14 | 7 | 21 | 19 | 12 | 15 | 10 | 15.5 | 12.5 | 3.0 | 2.2 | 2.4 | 3.10 | 2.2 | 2.8 | 3.30 |
Vp02 | 5 | 5 | 10 | 18 | 18 | 19 | 14 | 18.0 | 16.5 | 1.5 | 2.5 | 2.3 | 2.90 | 1.9 | 2.7 | 3.40 |
Vp03 | 14 | 10 | 24 | 18 | 11 | 15 | 11 | 14.5 | 13.0 | 1.5 | 3.0 | 2.1 | 2.55 | 2.2 | 3.3 | 3.55 |
Vp04 | 11 | 6 | 17 | 13 | 14 | 12 | 12 | 13.5 | 12.0 | 1.5 | 2.0 | 2.8 | 3.40 | 1.6 | 1.9 | 3.15 |
Vp05 | 8 | 5 | 13 | 19 | 18 | 11 | 16 | 18.5 | 13.5 | 5.0 | 2.4 | 1.6 | 2.60 | 2.3 | 2.3 | 3.00 |
Convert Scores into Long Format¶
[21]:
data_scores.head()
[21]:
PSS_Helpless | PSS_SelfEff | PSS_Total | PASA_Threat | PASA_Challenge | PASA_SelfConcept | PASA_ControlExp | PASA_Primary | PASA_Secondary | PASA_StressComposite | PANAS_NegativeAffect_pre | PANAS_PositiveAffect_pre | PANAS_Total_pre | PANAS_NegativeAffect_post | PANAS_PositiveAffect_post | PANAS_Total_post | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | ||||||||||||||||
Vp01 | 14 | 7 | 21 | 19 | 12 | 15 | 10 | 15.5 | 12.5 | 3.0 | 2.2 | 2.4 | 3.10 | 2.2 | 2.8 | 3.30 |
Vp02 | 5 | 5 | 10 | 18 | 18 | 19 | 14 | 18.0 | 16.5 | 1.5 | 2.5 | 2.3 | 2.90 | 1.9 | 2.7 | 3.40 |
Vp03 | 14 | 10 | 24 | 18 | 11 | 15 | 11 | 14.5 | 13.0 | 1.5 | 3.0 | 2.1 | 2.55 | 2.2 | 3.3 | 3.55 |
Vp04 | 11 | 6 | 17 | 13 | 14 | 12 | 12 | 13.5 | 12.0 | 1.5 | 2.0 | 2.8 | 3.40 | 1.6 | 1.9 | 3.15 |
Vp05 | 8 | 5 | 13 | 19 | 18 | 11 | 16 | 18.5 | 13.5 | 5.0 | 2.4 | 1.6 | 2.60 | 2.3 | 2.3 | 3.00 |
Questionnaires that only have different subscales => create one new index level subscale:
[22]:
print(list(data_scores.filter(like="PASA").columns))
['PASA_Threat', 'PASA_Challenge', 'PASA_SelfConcept', 'PASA_ControlExp', 'PASA_Primary', 'PASA_Secondary', 'PASA_StressComposite']
[23]:
pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PASA", levels=["subscale"])
pasa.head()
[23]:
PASA | ||
---|---|---|
subject | subscale | |
Vp01 | Challenge | 12.0 |
ControlExp | 10.0 | |
Primary | 15.5 | |
Secondary | 12.5 | |
SelfConcept | 15.0 |
Questionnaires that have different subscales and different assessment times => create two new index levels subscale and time:
[24]:
print(list(data_scores.filter(like="PANAS").columns))
['PANAS_NegativeAffect_pre', 'PANAS_PositiveAffect_pre', 'PANAS_Total_pre', 'PANAS_NegativeAffect_post', 'PANAS_PositiveAffect_post', 'PANAS_Total_post']
utils.wide_to_long() converts the data into long format recursively, from the first level (here: subscale) to the last level (here: time):
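The effect can be mimicked in plain pandas by splitting the column names into a MultiIndex and stacking both new levels below the subject index (a sketch of the result; wide_to_long()'s actual implementation may differ):

```python
import pandas as pd

# A tiny "wide" excerpt: PANAS_<subscale>_<time> columns, one row per subject.
wide = pd.DataFrame(
    {
        "PANAS_NegativeAffect_pre": [2.2],
        "PANAS_NegativeAffect_post": [2.2],
        "PANAS_PositiveAffect_pre": [2.4],
        "PANAS_PositiveAffect_post": [2.8],
    },
    index=pd.Index(["Vp01"], name="subject"),
)

# Split "PANAS_<subscale>_<time>" into the two new column levels ...
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split("_")[1:]) for col in wide.columns], names=["subscale", "time"]
)
# ... and stack them below the subject index, yielding one "PANAS" column.
long = wide.stack(["subscale", "time"]).to_frame("PANAS")
print(long)
```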
[25]:
panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PANAS", levels=["subscale", "time"])
panas.head()
[25]:
PANAS | |||
---|---|---|---|
subject | subscale | time | |
Vp01 | NegativeAffect | post | 2.2 |
pre | 2.2 | ||
PositiveAffect | post | 2.8 | |
pre | 2.4 | ||
Total | post | 3.3 |
Plotting¶
In one Plot¶
[26]:
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
data=panas, x="subscale", y="PANAS", hue="time", hue_order=["pre", "post"], palette=cmaps.faculties_light, ax=ax
);
Note: See the documentation of plotting.feature_boxplot() for further information on this function.
In Subplots¶
Regular¶
[27]:
fig, axs = plt.subplots(ncols=3)
bp.plotting.multi_feature_boxplot(
data=panas,
x="time",
y="PANAS",
features=["NegativeAffect", "PositiveAffect", "Total"],
group="subscale",
order=["pre", "post"],
palette=cmaps.faculties_light,
ax=axs,
)
fig.tight_layout()
Note: See the documentation of plotting.multi_feature_boxplot() for further information on this function.
With Significance Brackets¶
Note: See StatsPipeline_Plotting_Example.ipynb for further information!
[28]:
pipeline = bp.stats.StatsPipeline(
steps=[("prep", "normality"), ("test", "pairwise_tests")],
params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"},
)
pipeline.apply(panas);
[29]:
fig, axs = plt.subplots(ncols=3)
features = ["NegativeAffect", "PositiveAffect", "Total"]
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)
bp.plotting.multi_feature_boxplot(
data=panas,
x="time",
y="PANAS",
features=features,
group="subscale",
order=["pre", "post"],
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
palette=cmaps.faculties_light,
ax=axs,
)
for ax, feature in zip(axs, features):
ax.set_title(feature)
fig.tight_layout()