The Data Science Workshop - Second Edition

By : Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare

Overview of this book

Where there’s data, there’s insight. With so much data being generated, there is immense scope to extract meaningful information that’ll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you’ll open new career paths and opportunities. The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You’ll learn to select a relevant metric and assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you’ll get hands-on with approaches such as grid search and random search. Next, you’ll learn dimensionality reduction techniques to handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. To help you create new features automatically, the book demonstrates how to use an automated feature engineering tool. You’ll also learn how to use an orchestration and scheduling workflow to deploy machine learning models in batch. By the end of this book, you’ll have the skills to start working on data science projects confidently.
Table of Contents (16 chapters)
Preface
12. Feature Engineering

1. Introduction to Data Science in Python

Overview

This first chapter introduces you to the field of data science and gives an overview of Python's core concepts and how they are applied in data science.

By the end of this chapter, you will be able to explain what data science is and distinguish between supervised and unsupervised learning. You will also be able to explain what machine learning is and distinguish between regression, classification, and clustering problems. You'll have learned to create and manipulate different types of Python variables, including core variables, lists, and dictionaries. You'll be able to build a for loop, print results using f-strings, define functions, import Python packages, and load data in different formats using pandas. You will also have had your first taste of training a model using scikit-learn.
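
As a rough preview of the Python concepts listed above, here is a minimal sketch that combines a list, a dictionary, a function, a for loop, and f-strings. It is not taken from the book: the algorithm names, the `describe` function, and the mapping of algorithms to problem types are assumptions chosen for illustration. Loading data with pandas and training a model with scikit-learn are left out so that the example runs with the standard library alone.

```python
# Hypothetical sample data, made up for illustration (not from the book).
algorithms = ["linear regression", "logistic regression", "k-means"]  # a list

# A dictionary mapping each algorithm to the type of problem it addresses.
problem_types = {
    "linear regression": "regression",
    "logistic regression": "classification",
    "k-means": "clustering",
}


def describe(algorithm, problem_type):
    """Return a one-line description built with an f-string."""
    # Among the three problem types used here, only clustering is unsupervised.
    learning = "unsupervised" if problem_type == "clustering" else "supervised"
    return f"{algorithm}: {learning} learning, {problem_type} problem"


# A for loop that calls the function once per list item.
descriptions = []
for name in algorithms:
    descriptions.append(describe(name, problem_types[name]))

for line in descriptions:
    print(line)
```

The dictionary lookup `problem_types[name]` ties each list item to its problem type, which mirrors the distinction the chapter draws between regression, classification, and clustering.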