Questionnaire Example — BioPsyKit 0.10.2 documentation

Setup and Helper Functions¶

[1]:

from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2

[2]:

plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette

Load Questionnaire Data¶

[3]:

# Example data
data = bp.example_data.get_questionnaire_example()
# Alternatively: Load your own data using bp.io.load_questionnaire_data()
# bp.io.load_questionnaire_data("<path-to-questionnaire-data>")
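If you prefer plain pandas, the sketch below shows the data layout this example expects: one row per subject, indexed by a `subject` column, and one column per questionnaire item. The CSV content is hypothetical, and this is not how `bp.io.load_questionnaire_data()` is implemented internally.

```python
import io

import pandas as pd

# Hypothetical CSV export: one row per subject, one column per questionnaire item
csv_text = """subject,PSS_01,PSS_02,PSS_03
Vp01,3,2,3
Vp02,1,1,1
"""

# Use the subject IDs as the index, mirroring the layout of the example data
data_own = pd.read_csv(io.StringIO(csv_text), index_col="subject")
print(data_own)
```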

[4]:

data.head()

[4]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10	...	PASA_07	PASA_08	PASA_09	PASA_10	PASA_11	PASA_12	PASA_13	PASA_14	PASA_15	PASA_16
subject
Vp01	3	2	3	3	2	2	2	2	3	1	...	2	2	2	5	4	2	4	2	1	2
Vp02	1	1	1	3	2	1	3	3	1	0	...	1	1	6	4	4	1	5	4	4	6
Vp03	2	3	3	2	2	2	1	1	3	1	...	4	4	2	5	5	1	4	1	2	5
Vp04	2	2	2	3	2	2	3	2	2	1	...	3	3	3	3	2	3	2	4	4	4
Vp05	0	2	2	3	2	1	3	3	2	1	...	1	4	3	3	2	3	5	5	2	6

5 rows × 66 columns

Example 1: Compute Perceived Stress Scale (PSS)¶

In this example, we compute the Perceived Stress Scale (PSS) score.

The PSS is a widely used self-report questionnaire with adequate reliability and validity that asks how stressful a person has found their life during the previous month.

Slice Dataframe and Select Columns¶

To extract only the columns belonging to the PSS questionnaire, we can use the function utils.find_cols(). This function returns a tuple of the sliced dataframe and the names of the columns belonging to the questionnaire.
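Conceptually, the column selection amounts to filtering column names by prefix and, optionally, suffix. The following is a simplified stand-in written in plain pandas, not BioPsyKit's implementation (the real `find_cols()` may support more options); the function name `find_cols_sketch` and the toy column names are hypothetical.

```python
import pandas as pd


def find_cols_sketch(data, starts_with=None, ends_with=None):
    """Return (sliced dataframe, matching column names), filtered by prefix and/or suffix."""
    cols = [
        col
        for col in data.columns
        if (starts_with is None or col.startswith(starts_with))
        and (ends_with is None or col.endswith(ends_with))
    ]
    return data[cols], cols


# Toy data with hypothetical column names in the style of the example dataset
df = pd.DataFrame(
    {"PSS_01": [3, 1], "PSS_02": [2, 1], "PANAS_01_Pre": [2, 3], "PANAS_01_Post": [2, 1]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

data_pss_demo, columns_pss_demo = find_cols_sketch(df, starts_with="PSS")
print(columns_pss_demo)  # ['PSS_01', 'PSS_02']
```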

[5]:

data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with="PSS")
data_pss.head()

[5]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10
subject
Vp01	3	2	3	3	2	2	2	2	3	1
Vp02	1	1	1	3	2	1	3	3	1	0
Vp03	2	3	3	2	2	2	1	1	3	1
Vp04	2	2	2	3	2	2	3	2	2	1
Vp05	0	2	2	3	2	1	3	3	2	1

Compute PSS Score¶

We can compute the PSS score by passing the questionnaire data to the function questionnaires.pss().

This can be done in two ways:

1. Directly passing the sliced PSS dataframe.
2. Passing the whole dataframe together with a list of all column names that belong to the PSS. This option is better suited for computing multiple questionnaire scores at once (more on that later!).
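To make the scoring itself transparent, here is a plain pandas sketch. It assumes the commonly published PSS-10 key: items 4, 5, 7, and 8 are positively worded and reverse-coded on the 0 to 4 scale (the self-efficacy subscale), and the remaining items form the helplessness subscale. This key reproduces the five rows of the output tables in this example, but it is not BioPsyKit's code; `pss_sketch` is a hypothetical name.

```python
import pandas as pd

# Assumed PSS-10 scoring key (1-based item numbers)
HELPLESS_ITEMS = [1, 2, 3, 6, 9, 10]
SELF_EFF_ITEMS = [4, 5, 7, 8]  # positively worded, reverse-coded


def pss_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Compute PSS subscale and total scores from items PSS_01 to PSS_10 coded 0-4."""
    helpless = data[[f"PSS_{i:02d}" for i in HELPLESS_ITEMS]].sum(axis=1)
    # Reverse-code the self-efficacy items: 0 <-> 4, 1 <-> 3, 2 stays 2
    self_eff = (4 - data[[f"PSS_{i:02d}" for i in SELF_EFF_ITEMS]]).sum(axis=1)
    return pd.DataFrame(
        {"PSS_Helpless": helpless, "PSS_SelfEff": self_eff, "PSS_Total": helpless + self_eff}
    )


# First three subjects of the example data (items PSS_01 to PSS_10)
items = pd.DataFrame(
    [[3, 2, 3, 3, 2, 2, 2, 2, 3, 1],
     [1, 1, 1, 3, 2, 1, 3, 3, 1, 0],
     [2, 3, 3, 2, 2, 2, 1, 1, 3, 1]],
    index=pd.Index(["Vp01", "Vp02", "Vp03"], name="subject"),
    columns=[f"PSS_{i:02d}" for i in range(1, 11)],
)
print(pss_sketch(items))
```

For these subjects the sketch yields the same PSS_Helpless, PSS_SelfEff, and PSS_Total values as questionnaires.pss() in the output tables.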

Option 1: Sliced PSS dataframe¶

[6]:

pss = bp.questionnaires.pss(data_pss)
pss.head()

[6]:

	PSS_Helpless	PSS_SelfEff	PSS_Total
subject
Vp01	14	7	21
Vp02	5	5	10
Vp03	14	10	24
Vp04	11	6	17
Vp05	8	5	13

Option 2: Whole dataframe + PSS columns¶

[7]:

pss = bp.questionnaires.pss(data, columns=columns_pss)
pss.head()

[7]:

	PSS_Helpless	PSS_SelfEff	PSS_Total
subject
Vp01	14	7	21
Vp02	5	5	10
Vp03	14	10	24
Vp04	11	6	17
Vp05	8	5	13

Feature Demo: Compute PSS Score with Wrong Item Ranges¶

This example demonstrates a BioPsyKit feature: before computing the actual questionnaire score, BioPsyKit asserts that the questionnaire items are provided in the value range given by the original definition of the questionnaire.

In this example, we load a dataset in which the PSS items are (wrongly) coded from 1 to 5, whereas the original definition of the PSS codes items from 0 to 4. Attempting to compute the PSS by passing this data to questionnaires.pss() therefore raises a ValueRangeError.
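The idea behind such a range check can be sketched in a few lines. This is an illustration only: it raises Python's built-in `ValueError` instead of BioPsyKit's `ValueRangeError`, the function name `assert_value_range_sketch` is hypothetical, and the toy item values are made up.

```python
import pandas as pd


def assert_value_range_sketch(data: pd.DataFrame, score_range) -> None:
    """Raise ValueError if any item lies outside the inclusive range (low, high)."""
    low, high = score_range
    got_min, got_max = data.min().min(), data.max().max()
    if got_min < low or got_max > high:
        raise ValueError(
            f"Expected values in the range [{low}, {high}], "
            f"got values in the range [{got_min}, {got_max}]."
        )


# Toy items coded 1-5 instead of 0-4
items_wrong = pd.DataFrame({"PSS_01": [4, 2, 5], "PSS_02": [1, 3, 4]})

message = None
try:
    assert_value_range_sketch(items_wrong, (0, 4))
except ValueError as err:
    message = str(err)
print(message)
```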

Load Questionnaire Data with Wrong Item Ranges¶

[8]:

data_wrong = bp.example_data.get_questionnaire_example_wrong_range()
data_wrong.head()

[8]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10	...	PASA_07	PASA_08	PASA_09	PASA_10	PASA_11	PASA_12	PASA_13	PASA_14	PASA_15	PASA_16
subject
Vp01	4	3	4	4	3	3	3	3	4	2	...	2	2	2	5	4	2	4	2	1	2
Vp02	2	2	2	4	3	2	4	4	2	1	...	1	1	6	4	4	1	5	4	4	6
Vp03	3	4	4	3	3	3	2	2	4	2	...	4	4	2	5	5	1	4	1	2	5
Vp04	3	3	3	4	3	3	4	3	3	2	...	3	3	3	3	2	3	2	4	4	4
Vp05	1	3	3	4	3	2	4	4	3	2	...	1	4	3	3	2	3	5	5	2	6

5 rows × 66 columns

Slice Columns and Compute PSS Score¶

Note: This code fails on purpose (the exception is caught) because the items are provided in the wrong range.

[9]:

data_pss_wrong, columns_pss = bp.questionnaires.utils.find_cols(data_wrong, starts_with="PSS")

[10]:

try:
    pss = bp.questionnaires.pss(data_pss_wrong)
except bp.utils.exceptions.ValueRangeError as e:
    print("ValueRangeError: {}".format(e))

ValueRangeError: Some of the values are out of the expected range. Expected were values in the range [0, 4], got values in the range [1, 5]. If values are part of questionnaire scores, you can convert questionnaire items into the correct range by calling `biopsykit.questionnaire.utils.convert_scale()`.

Solution: Convert (Recode) Questionnaire Items¶

To solve this issue, we first need to convert the PSS questionnaire items into the correct value range by subtracting 1 from all values (i.e., applying an offset of -1). This can easily be done using the function utils.convert_scale(), again in two different ways:

Convert the whole, sliced PSS dataframe
Convert only the PSS columns and leave the other columns unchanged
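In plain pandas, recoding by a constant offset is simply an addition, applied either to the whole dataframe or only to selected columns. The sketch below mirrors the two options above; `convert_scale_sketch` is a hypothetical name and not BioPsyKit's implementation, and the toy values are illustrative.

```python
import pandas as pd


def convert_scale_sketch(data, offset, cols=None):
    """Add `offset` to all columns, or only to `cols`, and return a new dataframe."""
    if cols is None:
        # Option 1: recode every column of the (already sliced) dataframe
        return data + offset
    # Option 2: recode only the given columns, leave the others unchanged
    converted = data.copy()
    converted[cols] = converted[cols] + offset
    return converted


df_wrong = pd.DataFrame({"PSS_01": [4, 2], "PSS_02": [3, 2], "PASA_01": [2, 1]})
print(convert_scale_sketch(df_wrong, offset=-1, cols=["PSS_01", "PSS_02"]))
```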

Option 1: Convert the sliced PSS dataframe¶

[11]:

data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss_wrong, offset=-1)
data_pss_conv.head()

[11]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10
subject
Vp01	3	2	3	3	2	2	2	2	3	1
Vp02	1	1	1	3	2	1	3	3	1	0
Vp03	2	3	3	2	2	2	1	1	3	1
Vp04	2	2	2	3	2	2	3	2	2	1
Vp05	0	2	2	3	2	1	3	3	2	1

Option 2: Convert only the PSS columns, leave the other columns unchanged¶

[12]:

data_conv = bp.questionnaires.utils.convert_scale(data_wrong, cols=columns_pss, offset=-1)
data_conv.head()

[12]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10	...	PASA_07	PASA_08	PASA_09	PASA_10	PASA_11	PASA_12	PASA_13	PASA_14	PASA_15	PASA_16
subject
Vp01	3	2	3	3	2	2	2	2	3	1	...	2	2	2	5	4	2	4	2	1	2
Vp02	1	1	1	3	2	1	3	3	1	0	...	1	1	6	4	4	1	5	4	4	6
Vp03	2	3	3	2	2	2	1	1	3	1	...	4	4	2	5	5	1	4	1	2	5
Vp04	2	2	2	3	2	2	3	2	2	1	...	3	3	3	3	2	3	2	4	4	4
Vp05	0	2	2	3	2	1	3	3	2	1	...	1	4	3	3	2	3	5	5	2	6

5 rows × 66 columns

Compute PSS Score (Finally!)¶

Now the scores are in the correct range and we can compute the PSS score:

[13]:

# Option 1: the sliced PSS dataframe
pss = bp.questionnaires.pss(data_pss_conv)
pss.head()

[13]:

	PSS_Helpless	PSS_SelfEff	PSS_Total
subject
Vp01	14	7	21
Vp02	5	5	10
Vp03	14	10	24
Vp04	11	6	17
Vp05	8	5	13

[14]:

# Option 2: the whole dataframe + PSS columns
pss = bp.questionnaires.pss(data_conv, columns=columns_pss)
pss.head()

[14]:

	PSS_Helpless	PSS_SelfEff	PSS_Total
subject
Vp01	14	7	21
Vp02	5	5	10
Vp03	14	10	24
Vp04	11	6	17
Vp05	8	5	13

Example 2: Compute Positive and Negative Affect Schedule (PANAS)¶

The PANAS assesses positive affect (interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, and active) and negative affect (distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, and afraid).

Higher scores on each subscale indicate greater positive or negative affect.
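A minimal scoring sketch, under stated assumptions: items are coded 1 to 5 and appear in the standard 20-item PANAS order (positive affect items 1, 3, 5, 9, 10, 12, 14, 16, 17, 19, which matches the order of the adjectives listed above). Whether BioPsyKit uses exactly this item mapping is an assumption. The Total formula (mean of positive affect and reverse-coded negative affect, 6 minus NA) is inferred from the PANAS_Total values in the output tables of this example, which it reproduces for every row shown; it is not documented behaviour.

```python
import pandas as pd

# Assumed item order of the standard 20-item PANAS (1-based)
PA_ITEMS = [1, 3, 5, 9, 10, 12, 14, 16, 17, 19]
NA_ITEMS = [2, 4, 6, 7, 8, 11, 13, 15, 18, 20]


def panas_sketch(items: pd.DataFrame) -> pd.DataFrame:
    """Compute PANAS subscale means from 20 item columns (in item order) coded 1-5."""
    pa = items.iloc[:, [i - 1 for i in PA_ITEMS]].mean(axis=1)
    na = items.iloc[:, [i - 1 for i in NA_ITEMS]].mean(axis=1)
    # Inferred Total: mean of positive affect and reverse-coded negative affect
    total = (pa + (6 - na)) / 2
    return pd.DataFrame(
        {"PANAS_NegativeAffect": na, "PANAS_PositiveAffect": pa, "PANAS_Total": total}
    )


# Hypothetical subject: all positive items rated 4, all negative items rated 2
row = [4 if i in PA_ITEMS else 2 for i in range(1, 21)]
items_demo = pd.DataFrame([row], index=pd.Index(["Vp99"], name="subject"))
print(panas_sketch(items_demo))
```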

Slice Dataframe and Select Columns¶

In this example, the PANAS was assessed pre and post stress, so we additionally filter the columns by their suffix (Pre or Post):

[15]:

data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Pre")
data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Post")

Compute PANAS¶

[16]:

panas_pre = bp.questionnaires.panas(data_panas_pre)
panas_pre.head()

[16]:

	PANAS_NegativeAffect	PANAS_PositiveAffect	PANAS_Total
subject
Vp01	2.2	2.4	3.10
Vp02	2.5	2.3	2.90
Vp03	3.0	2.1	2.55
Vp04	2.0	2.8	3.40
Vp05	2.4	1.6	2.60

[17]:

panas_post = bp.questionnaires.panas(data_panas_post)
panas_post.head()

[17]:

	PANAS_NegativeAffect	PANAS_PositiveAffect	PANAS_Total
subject
Vp01	2.2	2.8	3.30
Vp02	1.9	2.7	3.40
Vp03	2.2	3.3	3.55
Vp04	1.6	1.9	3.15
Vp05	2.3	2.3	3.00

Example 3: Compute Multiple Scores at Once¶

Build a dictionary in which each key is the name of the questionnaire score to be computed and each value is the list of columns belonging to that questionnaire. If a questionnaire was assessed repeatedly (e.g., the PANAS was assessed at two different time points, pre and post), separate the suffix from the questionnaire name with a hyphen (e.g., panas-pre and panas-post).
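The key convention can be illustrated with a small dispatcher: the part of the key before the hyphen selects a scoring function, and the part after it is appended to the result columns (as in PANAS_NegativeAffect_pre in the output below). This is a hypothetical sketch, not BioPsyKit's `compute_scores()`; the registry `score_funcs`, the questionnaire "A", and its scoring function are made up for illustration.

```python
import pandas as pd


def compute_scores_sketch(data, dict_scores, score_funcs):
    """Compute several questionnaire scores and concatenate them column-wise.

    Keys like "panas-pre" are split at the hyphen: the part before it selects the
    scoring function, the part after it is appended to the result columns as a suffix.
    """
    results = []
    for key, columns in dict_scores.items():
        name, _, suffix = key.partition("-")
        scores = score_funcs[name](data[columns])
        if suffix:
            scores = scores.add_suffix(f"_{suffix}")
        results.append(scores)
    return pd.concat(results, axis=1)


# Hypothetical questionnaire "A" assessed pre and post
data_demo = pd.DataFrame(
    {"A_01_Pre": [1, 2], "A_02_Pre": [3, 4], "A_01_Post": [2, 2], "A_02_Post": [5, 6]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
score_funcs = {"a": lambda df: df.sum(axis=1).to_frame("A_Total")}
dict_demo = {"a-pre": ["A_01_Pre", "A_02_Pre"], "a-post": ["A_01_Post", "A_02_Post"]}

scores_demo = compute_scores_sketch(data_demo, dict_demo, score_funcs)
print(scores_demo)
```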

Load Example Questionnaire Data¶

[18]:

data = bp.example_data.get_questionnaire_example()
data.head()

[18]:

	PSS_01	PSS_02	PSS_03	PSS_04	PSS_05	PSS_06	PSS_07	PSS_08	PSS_09	PSS_10	...	PASA_07	PASA_08	PASA_09	PASA_10	PASA_11	PASA_12	PASA_13	PASA_14	PASA_15	PASA_16
subject
Vp01	3	2	3	3	2	2	2	2	3	1	...	2	2	2	5	4	2	4	2	1	2
Vp02	1	1	1	3	2	1	3	3	1	0	...	1	1	6	4	4	1	5	4	4	6
Vp03	2	3	3	2	2	2	1	1	3	1	...	4	4	2	5	5	1	4	1	2	5
Vp04	2	2	2	3	2	2	3	2	2	1	...	3	3	3	3	2	3	2	4	4	4
Vp05	0	2	2	3	2	1	3	3	2	1	...	1	4	3	3	2	3	5	5	2	6

5 rows × 66 columns

[19]:

from biopsykit.questionnaires.utils import find_cols

dict_scores = {
    "pss": find_cols(data, starts_with="PSS")[1],
    "pasa": find_cols(data, starts_with="PASA")[1],
    "panas-pre": find_cols(data, starts_with="PANAS", ends_with="Pre")[1],
    "panas-post": find_cols(data, starts_with="PANAS", ends_with="Post")[1],
}

[20]:

# Compute all scores and store in result dataframe
data_scores = bp.questionnaires.utils.compute_scores(data, dict_scores)
data_scores.head()

[20]:

	PSS_Helpless	PSS_SelfEff	PSS_Total	PASA_Threat	PASA_Challenge	PASA_SelfConcept	PASA_ControlExp	PASA_Primary	PASA_Secondary	PASA_StressComposite	PANAS_NegativeAffect_pre	PANAS_PositiveAffect_pre	PANAS_Total_pre	PANAS_NegativeAffect_post	PANAS_PositiveAffect_post	PANAS_Total_post
subject
Vp01	14	7	21	19	12	15	10	15.5	12.5	3.0	2.2	2.4	3.10	2.2	2.8	3.30
Vp02	5	5	10	18	18	19	14	18.0	16.5	1.5	2.5	2.3	2.90	1.9	2.7	3.40
Vp03	14	10	24	18	11	15	11	14.5	13.0	1.5	3.0	2.1	2.55	2.2	3.3	3.55
Vp04	11	6	17	13	14	12	12	13.5	12.0	1.5	2.0	2.8	3.40	1.6	1.9	3.15
Vp05	8	5	13	19	18	11	16	18.5	13.5	5.0	2.4	1.6	2.60	2.3	2.3	3.00

Convert Scores into Long Format¶

[21]:

data_scores.head()

[21]:

	PSS_Helpless	PSS_SelfEff	PSS_Total	PASA_Threat	PASA_Challenge	PASA_SelfConcept	PASA_ControlExp	PASA_Primary	PASA_Secondary	PASA_StressComposite	PANAS_NegativeAffect_pre	PANAS_PositiveAffect_pre	PANAS_Total_pre	PANAS_NegativeAffect_post	PANAS_PositiveAffect_post	PANAS_Total_post
subject
Vp01	14	7	21	19	12	15	10	15.5	12.5	3.0	2.2	2.4	3.10	2.2	2.8	3.30
Vp02	5	5	10	18	18	19	14	18.0	16.5	1.5	2.5	2.3	2.90	1.9	2.7	3.40
Vp03	14	10	24	18	11	15	11	14.5	13.0	1.5	3.0	2.1	2.55	2.2	3.3	3.55
Vp04	11	6	17	13	14	12	12	13.5	12.0	1.5	2.0	2.8	3.40	1.6	1.9	3.15
Vp05	8	5	13	19	18	11	16	18.5	13.5	5.0	2.4	1.6	2.60	2.3	2.3	3.00

For questionnaires that only have different subscales, create one new index level, subscale:

[22]:

print(list(data_scores.filter(like="PASA").columns))

['PASA_Threat', 'PASA_Challenge', 'PASA_SelfConcept', 'PASA_ControlExp', 'PASA_Primary', 'PASA_Secondary', 'PASA_StressComposite']

[23]:

pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PASA", levels=["subscale"])
pasa.head()

[23]:

		PASA
subject	subscale
Vp01	Challenge	12.0
	ControlExp	10.0
	Primary	15.5
	Secondary	12.5
	SelfConcept	15.0

For questionnaires that have different subscales and different assessment times, create two new index levels, subscale and time:

[24]:

print(list(data_scores.filter(like="PANAS").columns))

['PANAS_NegativeAffect_pre', 'PANAS_PositiveAffect_pre', 'PANAS_Total_pre', 'PANAS_NegativeAffect_post', 'PANAS_PositiveAffect_post', 'PANAS_Total_post']

utils.wide_to_long() converts the data into the long format recursively, from the first level (here: subscale) to the last level (here: time):

[25]:

panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name="PANAS", levels=["subscale", "time"])
panas.head()

[25]:

			PANAS
subject	subscale	time
Vp01	NegativeAffect	post	2.2
	NegativeAffect	pre	2.2
	PositiveAffect	post	2.8
	PositiveAffect	pre	2.4
	Total	post	3.3
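Both reshaping cases can be reproduced with plain pandas by melting the score columns and splitting each column name at the underscores into one index level per entry in `levels`. This is a hypothetical sketch (`wide_to_long_sketch` is not BioPsyKit's implementation); the input values are taken from subject Vp01 in the score table above.

```python
import pandas as pd


def wide_to_long_sketch(data, quest_name, levels):
    """Move '<quest_name>_<level1>_..._<levelN>' columns into N new index levels."""
    index_name = data.index.name or "index"
    prefix = f"{quest_name}_"
    cols = [col for col in data.columns if col.startswith(prefix)]
    # One row per (subject, original column name)
    long = data[cols].reset_index().melt(id_vars=index_name, var_name="column", value_name=quest_name)
    # Split each column name into exactly len(levels) parts (maxsplit = len(levels) - 1)
    parts = [col[len(prefix):].split("_", len(levels) - 1) for col in long["column"]]
    for i, level in enumerate(levels):
        long[level] = [p[i] for p in parts]
    return long.set_index([index_name] + levels)[[quest_name]].sort_index()


data_scores_demo = pd.DataFrame(
    {
        "PSS_Total": [21],
        "PASA_Threat": [19],
        "PASA_Challenge": [12],
        "PANAS_NegativeAffect_pre": [2.2],
        "PANAS_PositiveAffect_pre": [2.4],
        "PANAS_Total_pre": [3.10],
        "PANAS_NegativeAffect_post": [2.2],
        "PANAS_PositiveAffect_post": [2.8],
        "PANAS_Total_post": [3.30],
    },
    index=pd.Index(["Vp01"], name="subject"),
)

pasa_demo = wide_to_long_sketch(data_scores_demo, "PASA", ["subscale"])
panas_demo = wide_to_long_sketch(data_scores_demo, "PANAS", ["subscale", "time"])
print(panas_demo)
```

Sorting the index at the end gives the alphabetical order seen in the outputs above (e.g., post before pre).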

Plotting¶

In One Plot¶

[26]:

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=panas, x="subscale", y="PANAS", hue="time", hue_order=["pre", "post"], palette=cmaps.faculties_light, ax=ax
);

[Figure: boxplots of the PANAS scores per subscale, pre vs. post]

Note: See the documentation of plotting.feature_boxplot() for further information on the function used.

In Subplots¶

Regular¶

[27]:

fig, axs = plt.subplots(ncols=3)
bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=["NegativeAffect", "PositiveAffect", "Total"],
    group="subscale",
    order=["pre", "post"],
    palette=cmaps.faculties_light,
    ax=axs,
)
fig.tight_layout()

[Figure: boxplots of the PANAS scores pre vs. post, one subplot per subscale (NegativeAffect, PositiveAffect, Total)]

Note: See the documentation of plotting.multi_feature_boxplot() for further information on the function used.

With Significance Brackets¶

Note: See StatsPipeline_Plotting_Example.ipynb for further information!

[28]:

pipeline = bp.stats.StatsPipeline(
    steps=[("prep", "normality"), ("test", "pairwise_tests")],
    params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"},
)

pipeline.apply(panas);
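For orientation, the sketch below shows roughly what a normality check followed by a within-subject test per subscale looks like, using SciPy directly. It assumes that the pipeline's normality and pairwise_tests steps behave like pingouin's defaults (Shapiro-Wilk test; paired t-test for a two-level within factor); this is an assumption, not a description of StatsPipeline internals. The function name and the randomly generated data (with a built-in pre/post shift) are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_tests_sketch(data, dv, groupby, within, subject, order=("pre", "post")):
    """Per group: Shapiro-Wilk p-value per time point and a paired t-test between them."""
    results = {}
    for group, df_group in data.reset_index().groupby(groupby):
        # One row per subject, one column per time point
        wide = df_group.pivot(index=subject, columns=within, values=dv)
        first, second = wide[order[0]], wide[order[1]]
        results[group] = {
            f"shapiro_p_{order[0]}": stats.shapiro(first).pvalue,
            f"shapiro_p_{order[1]}": stats.shapiro(second).pvalue,
            "ttest_p": stats.ttest_rel(first, second).pvalue,
        }
    return pd.DataFrame.from_dict(results, orient="index")


# Hypothetical long-format data with index levels subject, subscale, time
rng = np.random.default_rng(42)
subjects = [f"Vp{i:02d}" for i in range(1, 9)]
records = []
for subscale in ["NegativeAffect", "PositiveAffect"]:
    pre = rng.normal(2.5, 0.5, size=len(subjects))
    post = pre + rng.normal(1.0, 0.1, size=len(subjects))  # built-in shift of about +1
    for subj, v_pre, v_post in zip(subjects, pre, post):
        records.append((subj, subscale, "pre", v_pre))
        records.append((subj, subscale, "post", v_post))
panas_toy = pd.DataFrame(records, columns=["subject", "subscale", "time", "PANAS"]).set_index(
    ["subject", "subscale", "time"]
)

result = paired_tests_sketch(panas_toy, dv="PANAS", groupby="subscale", within="time", subject="subject")
print(result)
```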

[29]:

fig, axs = plt.subplots(ncols=3)

features = ["NegativeAffect", "PositiveAffect", "Total"]

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)

bp.plotting.multi_feature_boxplot(
    data=panas,
    x="time",
    y="PANAS",
    features=features,
    group="subscale",
    order=["pre", "post"],
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    palette=cmaps.faculties_light,
    ax=axs,
)
for ax, feature in zip(axs, features):
    ax.set_title(feature)

fig.tight_layout()

[Figure: boxplots of the PANAS scores pre vs. post, one subplot per subscale, with significance brackets]
