Machine Learning: Make Your Own Recommender System

By: Oliver Theobald

Overview of this book

With an introductory overview, the course prepares you for a deep dive into the practical application of Scikit-Learn and the datasets that bring theories to life. From the basics of machine learning to the intricate details of setting up a sandbox environment, this course covers the essential groundwork for any aspiring data scientist. The course focuses on developing your skills in working with data, implementing data reduction techniques, and understanding the intricacies of item-based and user-based collaborative filtering, along with content-based filtering. These core methodologies are crucial for creating accurate and efficient recommender systems that cater to the unique preferences of users. Practical examples and evaluations further solidify your learning, making complex concepts accessible and manageable. The course wraps up by addressing the critical topics of privacy, ethics in machine learning, and the exciting future of recommender systems. This holistic approach ensures that you not only gain technical proficiency but also consider the broader implications of your work in this field. With a final look at further resources, your journey into machine learning and recommender systems is just beginning, armed with the knowledge and tools to explore new horizons.
Table of Contents (15 chapters)

1. FOREWORD
2. DATASETS USED IN THIS BOOK
3. INTRODUCING SCIKIT-LEARN
4. INTRODUCTION
5. THE ANATOMY
6. SETTING UP A SANDBOX ENVIRONMENT
7. WORKING WITH DATA
8. DATA REDUCTION
9. ITEM-BASED COLLABORATIVE FILTERING
10. USER-BASED COLLABORATIVE FILTERING
11. CONTENT-BASED FILTERING
12. EVALUATION
13. PRIVACY & ETHICS
14. THE FUTURE OF RECOMMENDER SYSTEMS
15. FURTHER RESOURCES

EVALUATION

 

If you’re familiar with the mechanics of machine learning, you might have noticed the absence of training and test data in the models used in the exercises thus far. The reason for this omission will be explained later in this chapter, but, first, let’s review the rationale behind split validation.

Partitioning a dataset into training data and test data, known as split validation, is a fundamental part of machine learning. The training data is used to detect general patterns and build a prediction model, while the test data is used to road-test that model on examples it hasn’t seen before. In other words, if we reserve 30% of the data, will the model developed from patterns found in the initial 70% still make accurate predictions on that held-out portion?
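As a rough sketch of how such a split might look in Scikit-Learn (the feature matrix X, target y, and linear model below are hypothetical placeholders for illustration, not one of this book’s datasets or exercises), train_test_split reserves the test portion before the model is fit:

# A minimal 70/30 split-validation sketch using scikit-learn.
# X and y are placeholder data standing in for whatever dataset you model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Placeholder data: 100 samples, 3 features, with a noisy linear target
rng = np.random.default_rng(10)
X = rng.random((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)

# Reserve 30% of the rows as test data; train on the remaining 70%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10
)

# Fit on the training partition, then evaluate on the held-out partition
model = LinearRegression()
model.fit(X_train, y_train)
print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))

The random_state argument simply makes the split reproducible; the key point is that the model never sees the 30% held out for testing until evaluation.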

Two possible reasons why the model may falter when making predictions on the test data are overfitting and underfitting. Overfitting occurs when the model adjusts itself to fit patterns...