#### Overview of this book

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.

# Ensembling the results

Now that we have two models, all that is left is to blend them and see whether the results improve. As Jahrer suggested, we go straight for a blend. However, since our approach ended up differing slightly from Jahrer's, we do not limit ourselves to a simple average of the two models: we also try to find the optimal blending weights. We start by importing the out-of-fold predictions and getting our evaluation function ready.

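```python
import pandas as pd
import numpy as np
from numba import jit


@jit
def eval_gini(y_true, y_pred):
    # Normalized Gini coefficient (equal to 2 * AUC - 1, ignoring ties).
    # numba compiles NumPy arrays, not pandas objects: pass e.g. series.values
    y_true = np.asarray(y_true)
    # Order the labels by ascending prediction
    y_true = y_true[np.argsort(y_pred)]
    ntrue = 0   # positives seen so far
    gini = 0    # pairs where a negative is ranked above a positive
    delta = 0   # negatives seen so far
    n = len(y_true)
    # Walk from the highest prediction down to the lowest
    for i in range(n - 1, -1, -1):
        y_i = y_true[i]
        ntrue += y_i
        gini += y_i * delta   # each positive adds the negatives ranked above it
        delta += 1 - y_i
    # Normalize by the total number of positive/negative pairs
    gini = 1 - 2 * gini / (ntrue * (n - ntrue))
    return gini
```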
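The normalized Gini depends only on how the predictions rank the examples. That makes the idea of optimal blend weights easy to illustrate with a simple grid search. What follows is a minimal sketch, not the book's code, and it rests on three assumptions. First, it assumes a convex blend `w * pred_a + (1 - w) * pred_b` of two prediction vectors. Second, it scores each candidate `w` with a vectorized NumPy version of the same pair-counting logic as `eval_gini`. Third, it uses synthetic data in place of the real out-of-fold files. The names `normalized_gini`, `best_blend_weight`, `pred_a` and `pred_b` are hypothetical.

```python
import numpy as np


def normalized_gini(y_true, y_pred):
    """Vectorized version of the eval_gini loop (may differ only in tie handling)."""
    # Labels ordered by ascending prediction
    y_sorted = np.asarray(y_true, dtype=float)[np.argsort(y_pred, kind="mergesort")]
    n_pos = y_sorted.sum()
    n_neg = len(y_sorted) - n_pos
    # Number of negatives with a higher prediction than position i
    neg_above = n_neg - np.cumsum(1.0 - y_sorted)
    # Each positive contributes the negatives ranked above it (discordant pairs)
    discordant = np.sum(y_sorted * neg_above)
    return 1.0 - 2.0 * discordant / (n_pos * n_neg)


def best_blend_weight(y_true, pred_a, pred_b, steps=101):
    """Grid-search w in [0, 1] for the blend w * pred_a + (1 - w) * pred_b."""
    weights = np.linspace(0.0, 1.0, steps)
    scores = np.array([normalized_gini(y_true, w * pred_a + (1.0 - w) * pred_b)
                       for w in weights])
    best = int(np.argmax(scores))
    return weights[best], scores[best]


# Synthetic stand-ins for two models' out-of-fold predictions (hypothetical data)
rng = np.random.default_rng(0)
n = 10_000
y = (rng.random(n) < 0.1).astype(int)
signal = y + rng.normal(0.0, 1.0, n)
pred_a = signal + rng.normal(0.0, 1.0, n)
pred_b = signal + rng.normal(0.0, 1.0, n)

w, score = best_blend_weight(y, pred_a, pred_b)
print(f"weight on model A: {w:.2f}, blended Gini: {score:.4f}")
print(f"model A alone: {normalized_gini(y, pred_a):.4f}, "
      f"model B alone: {normalized_gini(y, pred_b):.4f}")
```

The grid includes `w = 0` and `w = 1`, which reproduce each single model exactly. The best blend found can therefore never score below the better of the two models on the data it was tuned on. With real out-of-fold predictions, the two models may output probabilities on different scales. A common variant is to rank-transform each prediction vector before blending, so that the weight reflects each model's ordering rather than its raw scale.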