This lesson introduced concepts for determining whether a model performs well. It discussed splitting the data into training and testing sets, and then explained why it is better to split the data into training, validation, and test sets. The lesson introduced confusion matrices along with true positives, false positives, false negatives, and true negatives, and showed how to calculate accuracy for a classification model. For regression models, the instructor introduced the mean absolute error, mean squared error, and R² (coefficient of determination) metrics.

The instructor then discussed underfitting (high bias) and overfitting (high variance) and gave some graphical examples. He also explained that model selection is a tradeoff between underfitting and overfitting. This tradeoff led to the Model Complexity Graph and to the desirable point on that graph: the model that neither underfits nor overfits. The instructor intentionally broke the golden rule of never using the test data to train the model. He plotted a Model Complexity Graph from training and test data and used it to decide which model to use, in order to motivate K-Fold Cross Validation. He then showed that the Model Complexity Graph should plot curves for the training and validation results instead of the training and test results.
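A three-way split into training, validation, and test sets can be sketched in plain Python. The function name `train_val_test_split` and the 60/20/20 default fractions are illustrative assumptions, not values from the lesson. The idea is that the model is fit on `train`, choices such as model complexity are compared on `val`, and `test` is touched only once, at the very end.

```python
import random


def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of `data` and cut it into training, validation, and test sets.

    Whatever remains after the training and validation fractions becomes the test set.
    """
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # held back until the final evaluation
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

In practice, scikit-learn's `train_test_split` can be called twice (once to carve off the test set, once more to split the remainder) to get the same three-way split.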
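The confusion-matrix counts and accuracy can be sketched in plain Python for a binary classifier. This assumes labels are encoded as 1 (positive) and 0 (negative); the function names and the eight example labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary labels, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, really positive
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of correct predictions; for binary labels this is (TP + TN) / total."""
    correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(accuracy(y_true, y_pred))          # 0.75
```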
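The three regression metrics can be written directly from their definitions. This is a minimal sketch with hypothetical function names and made-up example values: MAE averages the absolute errors, MSE averages the squared errors, and R² compares the model's squared error with that of a baseline that always predicts the mean of the true values.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y - y_hat| over all points."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y - y_hat)^2; squaring penalizes large errors more heavily."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is the error of always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(mean_squared_error(y_true, y_pred))   # 0.375
print(r2_score(y_true, y_pred))             # roughly 0.925
```

An R² near 1 means the model does much better than the mean baseline; an R² near 0 means it does about as well as that baseline.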
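A Model Complexity Graph built from training and validation errors, with the validation errors coming from K-Fold Cross Validation, can be sketched in plain Python. The lesson's own model is not specified here, so as an assumption this sketch uses a hypothetical 1-D k-nearest-neighbors regressor on toy noisy-sine data, where fewer neighbors means a more complex model. The helper names (`k_fold_indices`, `knn_predict`, `knn_mse`, `complexity_curve`) are illustrative, and the number of neighbors is unrelated to the K in K-Fold.

```python
import math
import random


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds folds of nearly equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict y at x as the mean y of the n_neighbors closest training points (1-D)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def knn_mse(train_x, train_y, eval_x, eval_y, n_neighbors):
    """Mean squared error of a k-NN model fit on (train_x, train_y), scored on (eval_x, eval_y)."""
    preds = [knn_predict(train_x, train_y, x, n_neighbors) for x in eval_x]
    return sum((y - p) ** 2 for y, p in zip(eval_y, preds)) / len(eval_y)


def complexity_curve(xs, ys, neighbor_counts, n_folds=5):
    """For each n_neighbors, average the training and validation MSE over K folds.

    Every point is used for validation exactly once; no test data is involved.
    """
    folds = k_fold_indices(len(xs), n_folds)
    curve = []
    for n_neighbors in neighbor_counts:
        train_errs, val_errs = [], []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(len(xs)) if i not in held_out]
            tx, ty = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
            vx, vy = [xs[i] for i in val_idx], [ys[i] for i in val_idx]
            train_errs.append(knn_mse(tx, ty, tx, ty, n_neighbors))
            val_errs.append(knn_mse(tx, ty, vx, vy, n_neighbors))
        curve.append((n_neighbors, sum(train_errs) / n_folds, sum(val_errs) / n_folds))
    return curve


# Toy data: a noisy sine wave (an assumption for illustration only).
rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

# Fewer neighbors = more complex model; more neighbors = simpler model.
curve = complexity_curve(xs, ys, [1, 3, 5, 10, 20, 48])
for n_neighbors, train_err, val_err in curve:
    print(f"neighbors={n_neighbors:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

Read as a Model Complexity Graph, the most complex model (one neighbor) has zero training error, while the model to choose is the one with the lowest average validation error. A separate test set, not shown here, would still be held back for a single final check.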

## L#9: Model Evaluation and Validation
