Beginning Data Science with Python and Jupyter

By: Chris DallaVilla

Overview of this book

Getting started with data science doesn’t have to be an uphill battle. This step-by-step video course is ideal for beginners who know a little Python and are looking for a fast-paced introduction. Get to grips with the skills you need for entry-level data science in this hands-on Python and Jupyter course. You’ll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world.

We’ll start with the basics of Jupyter and its standard features, and you’ll analyze an example of a data analytics report. The next step is to implement multiple classification algorithms. We’ll then show you how easy it can be to scrape and gather your own data from the open web, so that you can apply your new skills in an actionable context. You’ll finish up by learning to visualize this data interactively.

The code bundle for this course is available at https://github.com/TrainingByPackt/Beginning-Data-Science-with-Python-and-Jupyter-eLearning
Table of Contents (3 chapters)
Chapter 2
Data Cleaning and Advanced Machine Learning
Section 5
K-Fold Cross-Validation
Thus far, we have trained models on a subset of the data and then assessed performance on the unseen portion, called the test set. This is good practice because model performance on the training data is not a good indicator of its effectiveness as a predictor. It is very easy to increase accuracy on a training dataset by overfitting a model, which can result in poorer performance on unseen data.

This video covers:
- Assessing Models with K-Fold Cross-Validation and Validation Curves
- K-Fold Cross-Validation
- K-Fold Cross-Validation Algorithm
- Stratified k-fold
- Validation Curves
- Demo on Using K-Fold Cross-Validation and Validation Curves in Python with Scikit-learn
- Dimensionality Reduction Techniques
- Principal Component Analysis (PCA)
- Key Insights of PCA
- Demo on Training a Predictive Model for the Employee Retention Problem
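To make the k-fold idea concrete, here is a minimal sketch in plain Python. It is not the course's demo code. The helper names (`k_fold_indices`, `majority_class`, `cross_validate`), the toy labels, and the stand-in "model" that always predicts the most common training label are all assumptions chosen for illustration. The data is split into k folds, each fold is held out once as the test set while the remaining folds are used for training, and the per-fold scores are averaged.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def majority_class(labels):
    """Hypothetical stand-in model: predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=5, seed=0):
    """Return one accuracy score per fold."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, which is held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        prediction = majority_class([labels[j] for j in train_idx])
        correct = sum(labels[j] == prediction for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Toy labels, made up for illustration: 70 "stayed" and 30 "left".
labels = ["stayed"] * 70 + ["left"] * 30
scores = cross_validate(labels, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", mean(scores))
```

Because every sample is tested exactly once, the averaged score depends less on one lucky or unlucky split than a single train/test split does. In practice, scikit-learn provides this workflow through `sklearn.model_selection`, including `cross_val_score`, `StratifiedKFold`, and `validation_curve`, so a hand-written loop like this is mainly useful for understanding the algorithm.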