Description
Enter the chapter number
3.3
Enter the page number
184 to 202
What is the cell's number in the notebook
No response
Enter the environment you are using to run the notebook
None
Question
Hello
Second question:
It seems there are two schools of thought in statistics about how to validate a model. The kind of "traditional" approach I was taught at school: make hypotheses about the data (most often normality), deduce the distribution of the predictor, and from that deduce the 9x% confidence interval. This approach has never really convinced me, because the normality hypothesis is often made without serious justification, simply because it is the only way to compute the confidence interval.
The "machine learning" approach described in the book is, on the contrary, completely empirical: you say "my model works, because I checked that it works on a test set". So you don't need any debatable hypothesis about the distribution of the data. At first glance this looks more rigorous than the first approach. But when you do this, you are actually estimating the unknown performance score your model would get on your entire universe (in the statistical sense) by computing the score on your test sample. So you are still doing basic statistical inference. And whoever says "statistical inference" says "confidence interval": you have to be sure that your test sample is representative of your universe.
That means that when you compute a ROC AUC or a precision/recall curve, these curves should come with a confidence interval around them. When you say "with this threshold, the precision is p%", you should give a CI around that p%.
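For instance, for the precision at a fixed threshold I imagine something like a binomial normal-approximation interval; here is a rough sketch with made-up counts (this is just my illustration, not something taken from the book):

```python
# Rough sketch of the kind of interval I mean for a single number:
# precision at a fixed threshold is a proportion TP / (TP + FP), so a
# normal-approximation (Wald) interval can be computed directly.
# The TP/FP counts below are made-up placeholders.
import math

tp, fp = 420, 80                 # placeholder confusion-matrix counts
n_pred_pos = tp + fp             # number of predicted positives
p_hat = tp / n_pred_pos          # observed precision on the test set
z = 1.96                         # ~95% two-sided normal quantile
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n_pred_pos)
print(f"precision = {p_hat:.3f} +/- {half_width:.3f}")
```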
How do you deal with that? Is there a way to compute this CI?
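For example, is something like a bootstrap over the test set the right way to do it? A minimal sketch of what I have in mind (the data below is synthetic and only a placeholder; in practice `y_test` / `y_scores` / `y_pred` would come from the real model):

```python
# Minimal sketch: percentile bootstrap over the test set to get a ~95% CI
# around a test metric (ROC AUC, precision). Not the book's method, just an
# illustration of the question.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score

rng = np.random.default_rng(42)

def bootstrap_ci(metric_fn, y_true, y_pred, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI for a metric evaluated on a test set."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample the test set with replacement
        if len(np.unique(y_true[idx])) < 2:    # skip one-class resamples (AUC undefined)
            continue
        scores.append(metric_fn(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Placeholder test set, just so the sketch runs end to end.
y_test = rng.integers(0, 2, size=1000)
y_scores = np.clip(y_test * 0.6 + rng.normal(0.2, 0.3, size=1000), 0, 1)
y_pred = (y_scores >= 0.5).astype(int)

print("ROC AUC 95% CI:", bootstrap_ci(roc_auc_score, y_test, y_scores))
print("Precision 95% CI:", bootstrap_ci(precision_score, y_test, y_pred))
```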