Questionnaire Example

This example illustrates how to process questionnaire data.

Setup and Helper Functions

[1]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette

Load Questionnaire Data

[3]:
# Example data
data = bp.example_data.get_questionnaire_example()
# Alternatively: Load your own data using bp.io.load_questionnaire_data()
# bp.io.load_questionnaire_data("<path-to-questionnaire-data>")
[4]:
data.head()
[4]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10 ... PASA_07 PASA_08 PASA_09 PASA_10 PASA_11 PASA_12 PASA_13 PASA_14 PASA_15 PASA_16
subject
Vp01 3 2 3 3 2 2 2 2 3 1 ... 2 2 2 5 4 2 4 2 1 2
Vp02 1 1 1 3 2 1 3 3 1 0 ... 1 1 6 4 4 1 5 4 4 6
Vp03 2 3 3 2 2 2 1 1 3 1 ... 4 4 2 5 5 1 4 1 2 5
Vp04 2 2 2 3 2 2 3 2 2 1 ... 3 3 3 3 2 3 2 4 4 4
Vp05 0 2 2 3 2 1 3 3 2 1 ... 1 4 3 3 2 3 5 5 2 6

5 rows × 66 columns

Example 1: Compute Perceived Stress Scale (PSS)

In this example we compute the Perceived Stress Scale (PSS).

The PSS is a widely used self-report questionnaire with adequate reliability and validity. It asks how stressful respondents have found their lives during the previous month.

Slice Dataframe and Select Columns

To extract only the columns belonging to the PSS questionnaire we can use the function utils.find_cols(). This function returns the sliced dataframe and the columns belonging to the questionnaire.

[5]:
data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with="PSS")
data_pss.head()
[5]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10
subject
Vp01 3 2 3 3 2 2 2 2 3 1
Vp02 1 1 1 3 2 1 3 3 1 0
Vp03 2 3 3 2 2 2 1 1 3 1
Vp04 2 2 2 3 2 2 3 2 2 1
Vp05 0 2 2 3 2 1 3 3 2 1
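
For readers who want to see the idea behind this kind of column selection, the following is a minimal pandas-only sketch. It is not BioPsyKit's implementation: select_columns() is a hypothetical helper, and the miniature dataframe only contains a few values copied from the first two rows shown above.

```python
import re

import pandas as pd

# Hypothetical miniature of the questionnaire dataframe
# (values copied from the first two rows of the example data).
data = pd.DataFrame(
    {"PSS_01": [3, 1], "PSS_02": [2, 1], "PASA_07": [2, 1]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)


def select_columns(df, starts_with):
    """Hypothetical helper: return the sliced dataframe and the matching columns."""
    # DataFrame.filter(regex=...) keeps the columns whose name matches the pattern
    sliced = df.filter(regex=f"^{re.escape(starts_with)}")
    return sliced, sliced.columns


data_pss, columns_pss = select_columns(data, "PSS")
print(list(columns_pss))
```

Returning both the slice and the column names mirrors the two ways of passing data that are used in the rest of this example.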

Compute PSS Score

We can compute the PSS score by passing the questionnaire data to the function questionnaires.pss().

This can be done in two ways:

  1. Pass the sliced PSS dataframe directly.

  2. Pass the whole dataframe together with a list of all column names that belong to the PSS. This option is better suited for computing multiple questionnaire scores at once (more on that later!).

Option 1: Sliced PSS dataframe

[6]:
pss = bp.questionnaires.pss(data_pss)
pss.head()
[6]:
PSS_Helpless PSS_SelfEff PSS_Total
subject
Vp01 14 7 21
Vp02 5 5 10
Vp03 14 10 24
Vp04 11 6 17
Vp05 8 5 13

Option 2: Whole dataframe + PSS columns

[7]:
pss = bp.questionnaires.pss(data, columns=columns_pss)
pss.head()
[7]:
PSS_Helpless PSS_SelfEff PSS_Total
subject
Vp01 14 7 21
Vp02 5 5 10
Vp03 14 10 24
Vp04 11 6 17
Vp05 8 5 13
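
The scores above can be reproduced by hand for the first two subjects. The sketch below is an illustration in plain pandas, not the code behind questionnaires.pss(). It assumes the usual PSS-10 convention that items 4, 5, 7 and 8 are positively worded and therefore reverse-scored, and that these four items form the SelfEff subscale while the remaining six form the Helpless subscale. This assignment is an assumption that happens to be consistent with the output shown above.

```python
import pandas as pd

# First two rows of the PSS items shown above (coded 0 to 4)
items = pd.DataFrame(
    [[3, 2, 3, 3, 2, 2, 2, 2, 3, 1], [1, 1, 1, 3, 2, 1, 3, 3, 1, 0]],
    columns=[f"PSS_{i:02d}" for i in range(1, 11)],
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

# Assumption: items 4, 5, 7 and 8 are reverse-scored and form the SelfEff subscale
self_eff_cols = ["PSS_04", "PSS_05", "PSS_07", "PSS_08"]
helpless_cols = [col for col in items.columns if col not in self_eff_cols]

scored = items.copy()
scored[self_eff_cols] = 4 - scored[self_eff_cols]  # invert on the 0-4 scale

pss_sketch = pd.DataFrame(
    {
        "PSS_Helpless": scored[helpless_cols].sum(axis=1),
        "PSS_SelfEff": scored[self_eff_cols].sum(axis=1),
        "PSS_Total": scored.sum(axis=1),
    }
)
print(pss_sketch)
```

For Vp01 and Vp02 this yields the same three values per subject as the output of questionnaires.pss() above.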

Feature Demo: Compute PSS Score with Wrong Item Ranges

This example demonstrates a BioPsyKit feature: before computing a questionnaire score, BioPsyKit asserts that the questionnaire items are provided in the value range specified by the original definition of the questionnaire.

In this example, we load an example dataset in which the PSS items are (wrongly) coded from 1 to 5. The original PSS, however, is defined for items coded from 0 to 4. Attempting to compute the PSS by passing the data to questionnaires.pss() will therefore result in a ValueRangeError.
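
Conceptually, such a range check compares the smallest and largest observed item value with the expected range. The following sketch shows one way this could look in plain pandas. check_value_range() is a hypothetical helper that raises a built-in ValueError, not BioPsyKit's ValueRangeError, and the item values are made up for illustration.

```python
import pandas as pd


def check_value_range(df, value_range):
    """Hypothetical helper: raise if any value lies outside [low, high]."""
    low, high = value_range
    observed_min = df.min().min()
    observed_max = df.max().max()
    if observed_min < low or observed_max > high:
        raise ValueError(
            f"Expected values in the range [{low}, {high}], "
            f"got values in the range [{observed_min}, {observed_max}]."
        )


# Made-up items coded from 1 to 5 instead of 0 to 4
wrong = pd.DataFrame({"PSS_01": [1, 5], "PSS_02": [3, 2]})

try:
    check_value_range(wrong, (0, 4))
except ValueError as e:
    print(f"Range check failed: {e}")

# After shifting the items by -1 the same check passes silently
check_value_range(wrong - 1, (0, 4))
```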

Load Questionnaire Data with Wrong Item Ranges

[8]:
data_wrong = bp.example_data.get_questionnaire_example_wrong_range()
data_wrong.head()
[8]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10 ... PASA_07 PASA_08 PASA_09 PASA_10 PASA_11 PASA_12 PASA_13 PASA_14 PASA_15 PASA_16
subject
Vp01 4 3 4 4 3 3 3 3 4 2 ... 2 2 2 5 4 2 4 2 1 2
Vp02 2 2 2 4 3 2 4 4 2 1 ... 1 1 6 4 4 1 5 4 4 6
Vp03 3 4 4 3 3 3 2 2 4 2 ... 4 4 2 5 5 1 4 1 2 5
Vp04 3 3 3 4 3 3 4 3 3 2 ... 3 3 3 3 2 3 2 4 4 4
Vp05 1 3 3 4 3 2 4 4 3 2 ... 1 4 3 3 2 3 5 5 2 6

5 rows × 66 columns

Slice Columns and Compute PSS Score

Note: This code fails on purpose (the exception is caught) because the items are provided in the wrong range.

[9]:
data_pss_wrong, columns_pss = bp.questionnaires.utils.find_cols(data_wrong, starts_with="PSS")
[10]:
try:
    pss = bp.questionnaires.pss(data_pss_wrong)
except bp.utils.exceptions.ValueRangeError as e:
    print(f"ValueRangeError: {e}")
ValueRangeError: Some of the values are out of the expected range. Expected were values in the range [0, 4], got values in the range [1, 5]. If values are part of questionnaire scores, you can convert questionnaire items into the correct range by calling `biopsykit.questionnaire.utils.convert_scale()`.

Solution: Convert (Recode) Questionnaire Items

To solve this issue, we first need to convert the PSS questionnaire items into the correct value range by subtracting 1 from all values (i.e., applying an offset of -1). This can easily be done using the function utils.convert_scale(). Again, there are two ways to do this:

  1. Convert the whole, sliced PSS dataframe

  2. Convert only the PSS columns, leave the other columns

Option 1: Convert the sliced PSS dataframe
[11]:
data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss_wrong, offset=-1)
data_pss_conv.head()
[11]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10
subject
Vp01 3 2 3 3 2 2 2 2 3 1
Vp02 1 1 1 3 2 1 3 3 1 0
Vp03 2 3 3 2 2 2 1 1 3 1
Vp04 2 2 2 3 2 2 3 2 2 1
Vp05 0 2 2 3 2 1 3 3 2 1
Option 2: Convert only the PSS columns, leave the other columns unchanged
[12]:
data_conv = bp.questionnaires.utils.convert_scale(data_wrong, cols=columns_pss, offset=-1)
data_conv.head()
[12]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10 ... PASA_07 PASA_08 PASA_09 PASA_10 PASA_11 PASA_12 PASA_13 PASA_14 PASA_15 PASA_16
subject
Vp01 3 2 3 3 2 2 2 2 3 1 ... 2 2 2 5 4 2 4 2 1 2
Vp02 1 1 1 3 2 1 3 3 1 0 ... 1 1 6 4 4 1 5 4 4 6
Vp03 2 3 3 2 2 2 1 1 3 1 ... 4 4 2 5 5 1 4 1 2 5
Vp04 2 2 2 3 2 2 3 2 2 1 ... 3 3 3 3 2 3 2 4 4 4
Vp05 0 2 2 3 2 1 3 3 2 1 ... 1 4 3 3 2 3 5 5 2 6

5 rows × 66 columns
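
The effect of such an offset conversion can be illustrated with a few lines of pandas. This is a sketch under stated assumptions, not the implementation of utils.convert_scale(): shift_scale() is a hypothetical helper, and the miniature dataframe only contains values copied from the first two rows of the wrongly coded dataset.

```python
import pandas as pd

# Values copied from the first two rows of the wrongly coded dataset
data_wrong = pd.DataFrame(
    {"PSS_01": [4, 2], "PSS_02": [3, 2], "PASA_07": [2, 1]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)


def shift_scale(df, offset, cols=None):
    """Hypothetical helper: add an offset to all columns or only to the given ones."""
    out = df.copy()  # leave the input dataframe untouched
    target = list(out.columns) if cols is None else list(cols)
    out[target] = out[target] + offset
    return out


# Shift only the PSS columns, leave the PASA column unchanged
converted = shift_scale(data_wrong, offset=-1, cols=["PSS_01", "PSS_02"])
print(converted)
```

Working on a copy keeps the wrongly coded data available for comparison, which is also why the example above stores the converted data under a new name.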

Compute PSS Score (Finally!)

Now the scores are in the correct range and we can compute the PSS score:

[13]:
# Option 1: the sliced PSS dataframe
pss = bp.questionnaires.pss(data_pss_conv)
pss.head()
[13]:
PSS_Helpless PSS_SelfEff PSS_Total
subject
Vp01 14 7 21
Vp02 5 5 10
Vp03 14 10 24
Vp04 11 6 17
Vp05 8 5 13
[14]:
# Option 2: the whole dataframe + PSS columns
pss = bp.questionnaires.pss(data_conv, columns=columns_pss)
pss.head()
[14]:
PSS_Helpless PSS_SelfEff PSS_Total
subject
Vp01 14 7 21
Vp02 5 5 10
Vp03 14 10 24
Vp04 11 6 17
Vp05 8 5 13

Example 2: Compute Positive and Negative Affect Schedule (PANAS)

The PANAS assesses positive affect (interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, and active) and negative affect (distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, and afraid).

Higher scores on each subscale indicate greater positive or negative affect.

Slice Dataframe and Select Columns

In this example, the PANAS was assessed at two time points, before (pre) and after (post) stress:

[15]:
data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Pre")
data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Post")

Compute PANAS

[16]:
panas_pre = bp.questionnaires.panas(data_panas_pre)
panas_pre.head()
[16]:
PANAS_NegativeAffect PANAS_PositiveAffect PANAS_Total
subject
Vp01 2.2 2.4 3.10
Vp02 2.5 2.3 2.90
Vp03 3.0 2.1 2.55
Vp04 2.0 2.8 3.40
Vp05 2.4 1.6 2.60
[17]:
panas_post = bp.questionnaires.panas(data_panas_post)
panas_post.head()
[17]:
PANAS_NegativeAffect PANAS_PositiveAffect PANAS_Total
subject
Vp01 2.2 2.8 3.30
Vp02 1.9 2.7 3.40
Vp03 2.2 3.3 3.55
Vp04 1.6 1.9 3.15
Vp05 2.3 2.3 3.00
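
The printed values suggest how the three columns relate to each other. The sketch below checks one reading of the output: that PANAS_Total is the mean of PANAS_PositiveAffect and the inverted PANAS_NegativeAffect. It assumes that items are coded from 1 to 5 (so inversion is 6 minus the value) and that both subscales contain the same number of items. This is an observation about the five rows printed above, not a statement about how questionnaires.panas() is implemented.

```python
import pandas as pd

# The five rows of the "pre" output shown above
panas_pre = pd.DataFrame(
    {
        "PANAS_NegativeAffect": [2.2, 2.5, 3.0, 2.0, 2.4],
        "PANAS_PositiveAffect": [2.4, 2.3, 2.1, 2.8, 1.6],
        "PANAS_Total": [3.10, 2.90, 2.55, 3.40, 2.60],
    },
    index=pd.Index(["Vp01", "Vp02", "Vp03", "Vp04", "Vp05"], name="subject"),
)

# Assumption: items are coded 1 to 5, so a negative-affect mean x inverts to 6 - x
inverted_negative = 6 - panas_pre["PANAS_NegativeAffect"]
reconstructed = (panas_pre["PANAS_PositiveAffect"] + inverted_negative) / 2

# Largest absolute difference between the reconstruction and the printed total
max_deviation = (reconstructed - panas_pre["PANAS_Total"]).abs().max()
print(reconstructed.round(2))
```

Under this reading, a higher PANAS_Total reflects more positive and less negative affect.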

Example 3: Compute Multiple Scores at Once

Build a dictionary where each key corresponds to the questionnaire score to be computed and each value corresponds to the columns of that questionnaire. If a score was assessed repeatedly (e.g., the PANAS was assessed at two time points, pre and post), append a suffix to the score name, separated by a - (e.g., panas-pre and panas-post).
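
To make the key convention concrete, the following sketch shows how a key of this form could be split into a score name and an optional suffix. split_score_key() is a hypothetical helper written for illustration. It does not claim to show how BioPsyKit parses the keys internally.

```python
def split_score_key(key):
    """Hypothetical helper: split "name-suffix" into (name, suffix)."""
    # str.partition splits at the first "-" only; the suffix is empty if there is none
    name, _, suffix = key.partition("-")
    return name, suffix or None


for key in ["pss", "pasa", "panas-pre", "panas-post"]:
    print(key, "->", split_score_key(key))
```

Keys without a - describe a score that was assessed once, so the suffix is None in that case.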

Load Example Questionnaire Data

[18]:
data = bp.example_data.get_questionnaire_example()
data.head()
[18]:
PSS_01 PSS_02 PSS_03 PSS_04 PSS_05 PSS_06 PSS_07 PSS_08 PSS_09 PSS_10 ... PASA_07 PASA_08 PASA_09 PASA_10 PASA_11 PASA_12 PASA_13 PASA_14 PASA_15 PASA_16
subject
Vp01 3 2 3 3 2 2 2 2 3 1 ... 2 2 2 5 4 2 4 2 1 2
Vp02 1 1 1 3 2 1 3 3 1 0 ... 1 1 6 4 4 1 5 4 4 6
Vp03 2 3 3 2 2 2 1 1 3 1 ... 4 4 2 5 5 1 4 1 2 5
Vp04 2 2 2 3 2 2 3 2 2 1 ... 3 3 3 3 2 3 2 4 4 4
Vp05 0 2 2 3 2 1 3 3 2 1 ... 1 4 3 3 2 3 5 5 2 6

5 rows × 66 columns

[19]:
from biopsykit.questionnaires.utils import find_cols

dict_scores = {
    "pss": find_cols(data, starts_with="PSS")[1],
    "pasa": find_cols(data, starts_with="PASA")[1],
    "panas-pre": find_cols(data, starts_with="PANAS", ends_with="Pre")[1],
    "panas-post": find_cols(data, starts_with="PANAS", ends_with="Post")[1],
}
[20]:
# Compute all scores and store in result dataframe
data_scores = bp.questionnaires.utils.compute_scores(data, dict_scores)
data_scores.head()
[20]:
PSS_Helpless PSS_SelfEff PSS_Total PASA_Threat PASA_Challenge PASA_SelfConcept PASA_ControlExp PASA_Primary PASA_Secondary PASA_StressComposite PANAS_NegativeAffect_pre PANAS_PositiveAffect_pre PANAS_Total_pre PANAS_NegativeAffect_post PANAS_PositiveAffect_post PANAS_Total_post
subject
Vp01 14 7 21 19 12 15 10 15.5 12.5 3.0 2.2 2.4 3.10 2.2 2.8 3.30
Vp02 5 5 10 18 18 19 14 18.0 16.5 1.5 2.5 2.3 2.90 1.9 2.7 3.40
Vp03 14 10 24 18 11 15 11 14.5 13.0 1.5 3.0 2.1 2.55 2.2 3.3 3.55
Vp04 11 6 17 13 14 12 12 13.5 12.0 1.5 2.0 2.8 3.40 1.6 1.9 3.15
Vp05 8 5 13 19 18 11 16 18.5 13.5 5.0 2.4 1.6 2.60 2.3 2.3 3.00

Convert Scores into Long Format

[21]:
data_scores.head()
[21]:
PSS_Helpless PSS_SelfEff PSS_Total PASA_Threat PASA_Challenge PASA_SelfConcept PASA_ControlExp PASA_Primary PASA_Secondary PASA_StressComposite PANAS_NegativeAffect_pre PANAS_PositiveAffect_pre PANAS_Total_pre PANAS_NegativeAffect_post PANAS_PositiveAffect_post PANAS_Total_post
subject
Vp01 14 7 21 19 12 15 10 15.5 12.5 3.0 2.2 2.4 3.10 2.2 2.8 3.30
Vp02 5 5 10 18 18 19 14 18.0 16.5 1.5 2.5 2.3 2.90 1.9 2.7 3.40
Vp03 14 10 24 18 11 15 11 14.5 13.0 1.5 3.0 2.1 2.55 2.2 3.3 3.55
Vp04 11 6 17 13 14 12 12 13.5 12.0 1.5 2.0 2.8 3.40 1.6 1.9 3.15
Vp05 8 5 13 19 18 11 16 18.5 13.5 5.0 2.4 1.6 2.60 2.3 2.3 3.00

For questionnaires that only have different subscales, create one new index level, subscale:

[22]:
print(list(data_scores.filter(like="PASA").columns))
['PASA_Threat', 'PASA_Challenge', 'PASA_SelfConcept', 'PASA_ControlExp', 'PASA_Primary', 'PASA_Secondary', 'PASA_StressComposite']
[23]:
pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PASA", levels=["subscale"])
pasa.head()
[23]:
                       PASA
subject  subscale
Vp01     Challenge     12.0
         ControlExp    10.0
         Primary       15.5
         Secondary     12.5
         SelfConcept   15.0

For questionnaires that have different subscales and different assessment times, create two new index levels, subscale and time:

[24]:
print(list(data_scores.filter(like="PANAS").columns))
['PANAS_NegativeAffect_pre', 'PANAS_PositiveAffect_pre', 'PANAS_Total_pre', 'PANAS_NegativeAffect_post', 'PANAS_PositiveAffect_post', 'PANAS_Total_post']

utils.wide_to_long() converts the data into the long format recursively from the first level (here: subscale) to the last level (here: time):

[25]:
panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PANAS", levels=["subscale", "time"])
panas.head()
[25]:
                                PANAS
subject  subscale        time
Vp01     NegativeAffect  post     2.2
                         pre      2.2
         PositiveAffect  post     2.8
                         pre      2.4
         Total           post     3.3
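
The same reshaping can be sketched in plain pandas to show what happens to the column names. This is an illustration under stated assumptions, not the implementation of utils.wide_to_long(): it relies on every column following the pattern PANAS_<subscale>_<time> and uses a small subset of the values shown above.

```python
import pandas as pd

# Subset of the PANAS scores shown above, in wide format
wide = pd.DataFrame(
    {
        "PANAS_NegativeAffect_pre": [2.2, 2.5],
        "PANAS_PositiveAffect_pre": [2.4, 2.3],
        "PANAS_NegativeAffect_post": [2.2, 1.9],
        "PANAS_PositiveAffect_post": [2.8, 2.7],
    },
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

# One row per (subject, column name) pair
panas_long = wide.reset_index().melt(id_vars="subject", value_name="PANAS")

# Assumption: column names follow the pattern PANAS_<subscale>_<time>
parts = panas_long["variable"].str.split("_", expand=True)
panas_long["subscale"] = parts[1]
panas_long["time"] = parts[2]

panas_long = (
    panas_long.drop(columns="variable")
    .set_index(["subject", "subscale", "time"])
    .sort_index()
)
print(panas_long)
```

Each part of the column name after the questionnaire name becomes its own index level, which is what makes the long format convenient for grouping and plotting by subscale and time.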

Plotting

In one Plot

[26]:
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=panas, x="subscale", y="PANAS", hue="time", hue_order=["pre", "post"], palette=cmaps.faculties_light, ax=ax
);

Note: See the documentation of plotting.feature_boxplot() for further information on this function.

In Subplots

Regular

[27]:
fig, axs = plt.subplots(ncols=3)
bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=["NegativeAffect", "PositiveAffect", "Total"],
    group="subscale",
    order=["pre", "post"],
    palette=cmaps.faculties_light,
    ax=axs,
)
fig.tight_layout()

Note: See the documentation of plotting.multi_feature_boxplot() for further information on this function.

With Significance Brackets

Note: See StatsPipeline_Plotting_Example.ipynb for further information!

[28]:
pipeline = bp.stats.StatsPipeline(
    steps=[("prep", "normality"), ("test", "pairwise_tests")],
    params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"},
)

pipeline.apply(panas);
[29]:
fig, axs = plt.subplots(ncols=3)

features = ["NegativeAffect", "PositiveAffect", "Total"]

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)

bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=features,
    group="subscale",
    order=["pre", "post"],
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    palette=cmaps.faculties_light,
    ax=axs,
)
for ax, feature in zip(axs, features):
    ax.set_title(feature)

fig.tight_layout()