
Add filter feature selection using Pearson correlation #6

Open
mandjevant wants to merge 1 commit into CaroFuchs:master from mandjevant:filter_selection

@mandjevant

In short:

This PR adds a filter feature selection method for applications that prioritize efficient computation over the detection of complex relationships.

Implementation

  1. Transpose the data: self.dataX contains the data as a matrix. This data must be transposed to calculate the Pearson correlation.
  2. Calculate the Pearson correlation: since scipy is already a requirement for this library, we can simply use scipy.stats.pearsonr, which also returns the p-value.
  3. Remove the features whose absolute correlation falls outside the range set by min_corr and max_corr, or whose p-value exceeds max_pvalue.
  4. Return the indices of the selected features and the names of the selected features.
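
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `filter_select` and its `data_x`, `data_y`, and `names` parameters are hypothetical. It also assumes that the correlation is computed between each feature and the target, and that the bounds are inclusive.

```python
import numpy as np
from scipy.stats import pearsonr


def filter_select(data_x, data_y, names, min_corr=0.2, max_corr=1.0, max_pvalue=0.05):
    """Hypothetical helper: filter features by Pearson correlation with the target."""
    selected = []
    # Step 1: transpose so that iterating yields one feature column at a time.
    for idx, column in enumerate(np.asarray(data_x, dtype=float).T):
        # Step 2: pearsonr returns the correlation coefficient and the p-value.
        corr, pvalue = pearsonr(column, data_y)
        # Step 3: keep the feature only if |r| is within bounds and p is small enough
        # (inclusive bounds are an assumption of this sketch).
        if min_corr <= abs(corr) <= max_corr and pvalue <= max_pvalue:
            selected.append(idx)
    # Step 4: return the selected indices and the matching feature names.
    return selected, [names[i] for i in selected]
```

Called with a samples-by-features matrix, a target vector, and a list of column names, the sketch returns a pair of lists in the original column order.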

Function arguments:

Args:
    min_corr: Minimum absolute correlation for a feature to be selected. Default: 0.2
    max_corr: Maximum absolute correlation for a feature to be selected. Default: 1.0
    max_pvalue: Maximum p-value for a correlation to count as statistically significant. Default: 0.05
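
To make the interaction of the three arguments concrete, here is a small illustrative predicate. `passes_filter` is a hypothetical name, not part of this PR. It assumes the thresholds apply to the absolute correlation and that the bounds are inclusive.

```python
def passes_filter(corr, pvalue, min_corr=0.2, max_corr=1.0, max_pvalue=0.05):
    """Hypothetical predicate: True if a (correlation, p-value) pair survives the filter."""
    # The sign of the correlation is ignored; only its magnitude is compared.
    return min_corr <= abs(corr) <= max_corr and pvalue <= max_pvalue


print(passes_filter(-0.45, 0.001))  # True: strong negative correlation, significant
print(passes_filter(0.10, 0.001))   # False: significant, but weaker than min_corr
print(passes_filter(0.45, 0.20))    # False: strong enough, but p-value above max_pvalue
```

Lowering max_corr below 1.0 would additionally drop features that track the target almost perfectly, which may be useful when such features are suspected of leaking the target.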

Results

Wrapper method:

  • The following features were selected: ['RM', 'TAX', 'LSTAT', 'PTRATIO', 'DIS', 'AGE']
  • The estimated error of the developed model is: 2.7131430936707424
  • Method took 73.3180787563324 seconds to complete.

fst-pso method:

  • The following features have been selected: ['CRIM', 'ZN', 'NOX', 'RM', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT'] with a MAE of 2.79
  • The estimated error of the developed model is: 2.907917419119525
  • Method took 3338.643133163452 seconds to complete.

Filter method:

  • The following features were selected: ['CRIM', 'ZN', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'LSTAT']
  • The estimated error of the developed model is: 2.6634497163494317
  • Method took 2.928159236907959 seconds to complete.
