The Kaggle Workbook

By: Konrad Banachewicz, Luca Massaron

Overview of this book

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.
Table of Contents (7 chapters)

The Most Renowned Tabular Competition – Porto Seguro’s Safe Driver Prediction

Reaching the top of the leaderboard in any Kaggle competition requires patience, diligence, and many attempts before you learn how to compete effectively. For this reason, we have designed a workbook that helps you build those skills faster: you will work through past Kaggle competitions and learn how to climb the leaderboard by reading discussions, reusing notebooks, engineering features, and training various models.

We start with one of the most renowned tabular competitions, Porto Seguro’s Safe Driver Prediction. In this competition, you are asked to solve a common insurance problem: predicting which drivers will file a car insurance claim in the next year. An insurer can use such predictions to raise premiums for drivers who are more likely to file a claim and lower them for those who are less likely to.

To illustrate the key insights and technicalities needed to crack this competition, we will show you the necessary code and ask you to study topics and answer questions found in The Kaggle Book itself. So, without further ado, let’s start your new learning path.

In this chapter, you will learn:

  • How to tune and train a LightGBM model
  • How to build a denoising autoencoder and how to use it to feed a neural network
  • How to effectively blend models that are quite different from each other
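
To give a feel for the last point, here is a minimal sketch of one common way to blend models whose outputs live on different scales: convert each model's predictions to ranks, then take a weighted average of the ranks. This is an illustration under our own assumptions, not necessarily the blending method used later in the chapter. The function names (`rank_normalize`, `rank_blend`), the weights, and the sample predictions are all hypothetical.

```python
from typing import List, Sequence


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to average ranks scaled into [0, 1]; tied scores share a rank."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores that starts at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / (n - 1)
        i = j + 1
    return ranks


def rank_blend(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of rank-normalized predictions from several models."""
    total = sum(weights)
    ranked = [rank_normalize(p) for p in predictions]
    n = len(ranked[0])
    return [
        sum(w * r[i] for w, r in zip(weights, ranked)) / total
        for i in range(n)
    ]


# Hypothetical outputs from two different models for four policyholders.
# The scales differ (probabilities vs. raw scores); ranking removes that.
gbm_preds = [0.02, 0.10, 0.05, 0.30]
nn_preds = [-1.5, 0.4, 0.9, 2.0]
blended = rank_blend([gbm_preds, nn_preds], weights=[0.5, 0.5])
```

Rank averaging only makes sense when the evaluation metric depends on the ordering of the predictions rather than on their calibrated values. In practice, the weights would be chosen on out-of-fold predictions rather than fixed by hand as they are here.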

All the code files for this chapter can be found at Change to