The Kaggle Workbook

By: Konrad Banachewicz, Luca Massaron

Overview of this book

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.
Table of Contents (7 chapters)

To get the most out of this book

The Python code in this book has been designed to run on a Kaggle Notebook without any installation on a local computer. Therefore, you don’t need to worry about which machine you have available or which versions of Python packages you have to install. All you need is a computer with access to the internet and a free Kaggle account (you will find instructions about the procedures in Chapter 3 of The Kaggle Book). If you don’t have a free Kaggle account yet, just go to the Kaggle website and follow the instructions there.

When the text refers you to a link, follow it: you will find code in public Kaggle Notebooks that you can reuse, or further materials that illustrate the concepts and ideas outlined in the book.

Download the example code files

The code bundle for the book is hosted on GitHub. We also have other code bundles available from our rich catalog of books and videos. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here:

Conventions used

There are a few text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example, “An important component of our feature extraction pipeline is the TfidfVectorizer.”

A block of code is set as follows:

!pip install transformers
import transformers

Any command-line input or output is written as follows:

LightGBM CV Gini Normalized Score: 0.289 (0.015)

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “We will evaluate the performance of our baseline model using Out-Of-Fold (OOF) cross validation.”

Link: Indicates a hyperlink to a web page containing additional information on a topic or to a resource on Kaggle.

Exercises are displayed as follows:

Exercise Number

Exercise Notes (write down any notes or workings that will help you):

Warnings or important notes appear like this.

Tips and tricks appear like this.