Simula@BI: On the asymptotic behaviour of the variance estimator of a U-statistic

Speaker: Riccardo De Bin, Associate Professor - Statistics and Data Science, UiO

In supervised learning, including supervised classification as an important special case, the prediction error is often estimated by means of resampling-based procedures such as cross-validation. In methodological studies, the prediction error is used to contrast the performances of several prediction algorithms. A crucial but challenging question is whether the observed differences between the estimates are statistically significant or not, i.e., whether they are compatible with the null hypothesis of no true difference. To answer this question, a good understanding of the error estimates’ distribution is required. In the case of resampling-based procedures, however, the estimation of the variance is difficult: the learning and test sets considered in the successive resampling iterations overlap and, therefore, the iteration-specific error estimates computed in the resampling iterations are dependent. Their covariance structure is complex, thus making the estimation of the variance of their average very arduous in general. An unbiased variance estimator, suggested in the literature, can be recast as the variance of a U-statistic. However, its kernel size depends on the sample size, preventing asymptotic statements. Here, we solve this issue by decomposing the variance estimator into a linear combination of U-statistics with fixed kernel size, and consequently obtain the desired asymptotic results. We show that it is possible to construct a confidence interval for the true error and derive a statistical test which compares the error estimates of two classification algorithms. The confidence interval’s coverage probability and the test are illustrated by means of both a simulation study and a real-data application.
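To make the central objects concrete, here is a minimal sketch (not the speaker's method) of a U-statistic with fixed kernel size m = 2 and a plug-in estimate of its asymptotic variance. The kernel h(x, y) = (x − y)²/2 is chosen because it makes the U-statistic equal the unbiased sample variance; the function names and the standard-normal toy data are illustrative assumptions.

```python
# Hedged sketch: a U-statistic of fixed kernel size m = 2 and a plug-in
# estimate of its asymptotic variance. Not the construction from the talk;
# kernel choice and data are illustrative.
import itertools
import random
import statistics


def u_statistic(xs, h):
    """Average of the symmetric kernel h over all unordered pairs (m = 2)."""
    pairs = list(itertools.combinations(xs, 2))
    return sum(h(a, b) for a, b in pairs) / len(pairs)


def h(a, b):
    # With this kernel, the U-statistic equals the unbiased sample variance.
    return (a - b) ** 2 / 2


random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]  # toy standard-normal sample
n, m = len(xs), 2

u = u_statistic(xs, h)  # estimates the true variance (here 1)

# Classical plug-in estimate of the asymptotic variance m^2 * zeta_1 / n,
# where zeta_1 is estimated from the projections g(x_i) = mean_j h(x_i, x_j).
g = [
    sum(h(xi, xj) for j, xj in enumerate(xs) if j != i) / (n - 1)
    for i, xi in enumerate(xs)
]
var_hat = m ** 2 * statistics.variance(g) / n
```

For this particular kernel, `u` coincides exactly with `statistics.variance(xs)`; the point of the abstract is that for resampling-based error estimators the kernel size grows with n, which is what breaks this fixed-m asymptotic reasoning.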

Practical information

  • Time: Thursday, 09 September 2021, 13:30 - 14:30
  • Place: A2-030
  • Contact: Siri Johnsen