Machine Learning For Dummies
In a perfect world, you could test on data that your machine learning algorithm has never seen. However, waiting for fresh data isn’t always feasible in terms of time and cost. As a first simple remedy, you can randomly split your data into training and test sets. A common split reserves 25 to 30 percent for testing and the remaining 70 to 75 percent for training. You split the response and the features at the same time, keeping the correspondence between each response and its features.
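A random split that keeps each response paired with its features can be sketched in a few lines. This is only an illustration using Python's standard library: the function name `train_test_split_rows`, the toy data, and the 30 percent test share are assumptions chosen for the example, not code from the book.

```python
import random

def train_test_split_rows(features, responses, test_fraction=0.3, seed=0):
    """Randomly split paired features/responses, keeping each pair aligned.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(features) != len(responses):
        raise ValueError("features and responses must have the same length")
    # Shuffle row indices once, then use the same indices for both lists
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    y_train = [responses[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_test = [responses[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data: the response is simply ten times the single feature
X = [[i] for i in range(20)]
y = [10 * i for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split_rows(X, y, test_fraction=0.3)
```

Because the same shuffled indices select rows from both lists, every response stays attached to its own features after the split. Fixing the seed makes the split repeatable across runs.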
The second remedy applies when you need to tune your learning algorithm. In this case, using the test split to guide tuning isn’t good practice because it causes another kind of overfitting called snooping (see more on this topic later in the chapter). To overcome snooping, you need a third split, called a validation set. A suggested approach is to partition your examples into three parts: 70 percent for training, 20 percent for validation...
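A three-way partition can be sketched the same way. The passage breaks off before stating the test share, so this sketch assumes the test set is whatever remains after the 70 percent training and 20 percent validation portions are taken. The function name `three_way_split` and the example size of 100 are likewise assumptions made for illustration.

```python
import random

def three_way_split(n_examples, train_fraction=0.7, validation_fraction=0.2, seed=0):
    """Return shuffled index lists for training, validation, and test.

    Hypothetical helper; the test set is assumed to be the leftover rows.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_examples * train_fraction))
    n_val = int(round(n_examples * validation_fraction))
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # assumption: the remainder is the test set
    return train, validation, test

train_idx, val_idx, test_idx = three_way_split(100)
```

In this arrangement you would tune the algorithm by comparing results on the validation rows and touch the test rows only once, for the final check, which is what keeps snooping out of the reported performance.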