Model selection 101, using R

Quick and dirty markup of simple model selection using R

Peter Nistrup
Towards Data Science
12 min read · Nov 26, 2018

What’re we doing?

Since this is a very introductory look at model selection, we assume the data you’ve acquired has already been cleaned and scrubbed and is ready to go. Data cleaning is a whole subject in and of itself, and it is often the primary time sink for a data scientist. Go to the end of this article if you want to download the data for yourself and follow along!

Edit: I’ve made a “sequel” to this article about visualizing and plotting the model we find, if you want to check that out after reading this one!

Make sure to follow my profile if you enjoy this article and want to see more!

Let’s look at the pipeline:

This is the skeleton I use for creating a simple LM or GLM:

  1. Create a base model using all available variables and data
  2. Convert categorical variables to factors if R didn’t do the job
  3. Add relevant
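
As a rough illustration of steps 1 and 2 only, here is a minimal sketch in R. It assumes the built-in `mtcars` data set as a stand-in for the article's data, and the object names (`dat`, `base_lm`, `factor_lm`, `base_glm`) are made up for this example. It is not the exact code used for the article's data.

```r
# Stand-in data (assumption): the built-in mtcars data set,
# not the data set used in the article.
dat <- mtcars

# Step 1: base model using all available variables ("." means
# "every other column in the data frame").
base_lm <- lm(mpg ~ ., data = dat)

# Inspect column types: cyl and am are stored as numbers in mtcars,
# although they are really categorical.
str(dat)

# Step 2: convert categorical variables to factors where R didn't.
dat$cyl <- factor(dat$cyl)
dat$am  <- factor(dat$am, labels = c("automatic", "manual"))

# Refit the base model so the factors enter as dummy variables.
factor_lm <- lm(mpg ~ ., data = dat)
summary(factor_lm)

# The GLM call has the same shape; with a gaussian family it
# reproduces the linear model fit.
base_glm <- glm(mpg ~ ., data = dat, family = gaussian())
```

Comparing `summary(base_lm)` with `summary(factor_lm)` shows the effect of step 2: `cyl` goes from a single slope to one coefficient per non-reference level.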
