
The Kaggle Workbook

By: Konrad Banachewicz, Luca Massaron

Overview of this book

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.
Table of Contents (7 chapters)

Computing predictions for specific dates and time horizons

The plan for replicating Monsaraida’s solution is to create a notebook, customizable through input parameters, that produces the processed training and test data and the LightGBM models used for prediction. Given past data, each model is trained to predict values a specific number of days into the future. The best results can be obtained by having each model learn to predict values within a specific weekly range in the future. Since we have to predict up to 28 days ahead, we need one model predicting from day +1 to day +7, another predicting from day +8 to day +14, another from day +15 to day +21, and a final one handling predictions from day +22 to day +28. We will need a separate Kaggle notebook for each of these time ranges, so four notebooks in total. Each of these notebooks will be trained to predict that future time span for each of the...
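The split of the 28-day horizon into four weekly windows can be sketched in a few lines of Python. This is a minimal illustration, assuming a 28-day horizon divided into 7-day windows, one window per notebook/model. The helper names (`horizon_windows`, `window_for_day`, `target_dates`) and the example date are hypothetical and are not taken from Monsaraida’s code.

```python
from datetime import date, timedelta

# Hypothetical helpers for illustration only, not Monsaraida's implementation.

def horizon_windows(total_days=28, window=7):
    """Return (first_day, last_day) pairs covering days +1..+total_days."""
    return [(start, min(start + window - 1, total_days))
            for start in range(1, total_days + 1, window)]

def window_for_day(day_ahead, total_days=28, window=7):
    """Return the index of the window (i.e. which model/notebook) handling a day ahead."""
    if not 1 <= day_ahead <= total_days:
        raise ValueError("day_ahead must be between 1 and total_days")
    return (day_ahead - 1) // window

def target_dates(last_train_date, first_day, last_day):
    """Calendar dates that the model for days +first_day..+last_day must predict."""
    return [last_train_date + timedelta(days=d)
            for d in range(first_day, last_day + 1)]

# One entry per notebook: each trains a model for its own weekly window.
for idx, (first, last) in enumerate(horizon_windows()):
    print(f"notebook {idx}: days +{first} to +{last}")

# Illustrative only: an arbitrary last training date, not the competition's.
print(target_dates(date(2020, 1, 1), 8, 14))
```

With the defaults, the loop lists four notebooks covering days +1 to +7, +8 to +14, +15 to +21, and +22 to +28, matching the four time ranges described above. Keeping the window width as a parameter makes it easy to experiment with other splits, although the text reports that weekly ranges gave the best results.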