biopsykit.stats.multicoll module

Functions to handle multicollinearity in data.

biopsykit.stats.multicoll.remove_multicollinearity_correlation(data, threshold=0.8)

Remove features with multicollinearity based on cross-correlation coefficient.

Parameters
  • data (pandas.DataFrame) – Input data with features to check for multicollinearity.

  • threshold (float, optional) – Cross-correlation coefficient threshold. Features with a correlation coefficient above this value will be removed. Default: 0.8

Returns

Dataframe with the highly multicollinear features removed.

Return type

pandas.DataFrame
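
Example

The following is a minimal sketch of correlation-based feature removal, not necessarily how biopsykit implements it. The helper name drop_correlated_features and the example data are hypothetical. The sketch assumes that absolute pairwise Pearson correlations are compared against the threshold and that, for each highly correlated pair, the feature appearing later in column order is dropped.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(data: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Hypothetical helper (not part of biopsykit) illustrating the idea."""
    # Absolute pairwise correlation between all feature columns.
    corr = data.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once
    # and the diagonal (self-correlation of 1.0) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # Drop every column that correlates above the threshold with an earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return data.drop(columns=to_drop)


# Synthetic example data: "b" is almost a copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
example = pd.DataFrame(
    {
        "a": a,
        "b": a + rng.normal(scale=0.01, size=200),
        "c": rng.normal(size=200),
    }
)

reduced = drop_correlated_features(example, threshold=0.8)
print(list(reduced.columns))
```

With this data, "b" is dropped because it is nearly identical to "a", while "a" and "c" are kept. Which member of a correlated pair survives depends on column order in this sketch.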