F-Score Difference Permutation Test Example

Reference

Yeh, A. (2000). More accurate tests for the statistical significance of result differences. arXiv preprint cs/0008005.

Null Hypothesis: There is no difference in F-score between the two methods.

Algorithm:

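  1. Take the predictions of both classifiers on the same test instances
  2. For each test instance, randomly reassign the two predictions between the classifiers (swap them with probability 0.5), then compute the difference in F-score between the shuffled prediction sets
  3. Repeat step 2 many times and count how often the random difference is at least as large as the actual, observed difference in F-score; the proportion of such cases estimates the p-value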

Simulate Data from Binary Classification with 2 Useless Predictors
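A minimal sketch of such a simulation, assuming NumPy and treating "useless" as predictions drawn by fair coin flips independently of the labels. The sample size `n`, the seed, and the helper `f1` are illustrative choices, not taken from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

n = 1000  # hypothetical number of test instances
y_true = rng.integers(0, 2, size=n)  # true binary labels
pred_a = rng.integers(0, 2, size=n)  # useless predictor A: coin flips
pred_b = rng.integers(0, 2, size=n)  # useless predictor B: coin flips


def f1(y, p):
    """F1 score for the positive class (label 1); 0.0 if undefined."""
    tp = np.sum((y == 1) & (p == 1))  # true positives
    fp = np.sum((y == 0) & (p == 1))  # false positives
    fn = np.sum((y == 1) & (p == 0))  # false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


print("F1 of A:", f1(y_true, pred_a))
print("F1 of B:", f1(y_true, pred_b))
```

Because neither predictor uses any information about the labels, both F-scores should land near 0.5, and any difference between them is due to chance alone.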

Permutation Test Whether One Predictor's F-Score is Better than the Other's
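The test can be sketched as follows, assuming NumPy. This is an illustration under stated assumptions rather than the notebook's original code: the function names `f1` and `f1_permutation_test`, the number of rounds, the two-sided comparison on the absolute difference, and the add-one correction in the p-value are all choices made for this sketch.

```python
import numpy as np


def f1(y, p):
    """F1 score for the positive class (label 1); 0.0 if undefined."""
    tp = np.sum((y == 1) & (p == 1))
    fp = np.sum((y == 0) & (p == 1))
    fn = np.sum((y == 1) & (p == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def f1_permutation_test(y_true, pred_a, pred_b, n_rounds=2000, seed=0):
    """Return (observed |F1 difference|, estimated two-sided p-value)."""
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    observed = abs(f1(y_true, pred_a) - f1(y_true, pred_b))

    at_least_as_large = 0
    for _ in range(n_rounds):
        # Swap the two predictions on each instance with probability 0.5.
        swap = rng.random(y_true.shape[0]) < 0.5
        shuf_a = np.where(swap, pred_b, pred_a)
        shuf_b = np.where(swap, pred_a, pred_b)
        diff = abs(f1(y_true, shuf_a) - f1(y_true, shuf_b))
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction (an assumption of this sketch) keeps the
    # estimate strictly above zero.
    p_value = (at_least_as_large + 1) / (n_rounds + 1)
    return observed, p_value


# Usage with two coin-flip predictors (illustrative data, arbitrary seed).
data_rng = np.random.default_rng(42)
y = data_rng.integers(0, 2, size=1000)
a = data_rng.integers(0, 2, size=1000)
b = data_rng.integers(0, 2, size=1000)
obs_diff, p = f1_permutation_test(y, a, b)
print("observed |F1 difference|:", obs_diff)
print("estimated p-value:", p)
```

Swapping per instance, rather than shuffling each prediction vector independently, keeps the two predictions for a given instance paired, so only the classifier labels are exchanged. For a one-sided question (is A better than B), the signed difference could be compared instead of the absolute one.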