Python: Real-World Data Science

By: Fabrizio Romano, Dusty Phillips, Phuong Vo.T.H, Martin Czygan, Robert Layton, Sebastian Raschka

Overview of this book

The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules. Each module is a mini course in its own right, and as you complete each one, you'll have gained key skills and be ready for the material in the next module. The course begins with getting your Python fundamentals nailed down. Once you are familiar with Python's core concepts, you dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical and example-driven way. The third module will teach you how to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis and moving on to more complex data types including text, images, and graphs. Machine learning and predictive analytics have become important approaches to uncovering value in data. In the final module, we'll discuss the necessary details of machine learning concepts, offering intuitive yet informative explanations of how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls.

Table of Contents

Python: Real-World Data Science

Meet Your Course Guide

What's so cool about Data Science?

Course Structure

Course Journey

The Course Roadmap and Timeline

1. Course Module 1: Python Fundamentals

1. Introduction and First Steps – Take a Deep Breath

2. Object-oriented Design

3. Objects in Python

4. When Objects Are Alike

5. Expecting the Unexpected

6. When to Use Object-oriented Programming

7. Python Data Structures

8. Python Object-oriented Shortcuts

9. Strings and Serialization

10. The Iterator Pattern

11. Python Design Patterns I

12. Python Design Patterns II

13. Testing Object-oriented Programs

14. Concurrency

2. Course Module 2: Data Analysis

1. Introducing Data Analysis and Libraries

2. NumPy Arrays and Vectorized Computation

3. Data Analysis with pandas

4. Data Visualization

5. Time Series

6. Interacting with Databases

7. Data Analysis Application Examples

3. Course Module 3: Data Mining

1. Getting Started with Data Mining

2. Classifying with scikit-learn Estimators

3. Predicting Sports Winners with Decision Trees

4. Recommending Movies Using Affinity Analysis

5. Extracting Features with Transformers

6. Social Media Insight Using Naive Bayes

7. Discovering Accounts to Follow Using Graph Mining

8. Beating CAPTCHAs with Neural Networks

9. Authorship Attribution

10. Clustering News Articles

11. Classifying Objects in Images Using Deep Learning

12. Working with Big Data

13. Next Steps…

4. Course Module 4: Machine Learning

1. Giving Computers the Ability to Learn from Data

2. Training Machine Learning Algorithms for Classification

3. A Tour of Machine Learning Classifiers Using scikit-learn

4. Building Good Training Sets – Data Preprocessing

5. Compressing Data via Dimensionality Reduction

6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning

7. Combining Different Models for Ensemble Learning

8. Predicting Continuous Target Variables with Regression Analysis

A. Reflect and Test Yourself! Answers

B. Bibliography

Index

Chapter 2. Classifying with scikit-learn Estimators

The scikit-learn library is a collection of data mining algorithms, written in Python and sharing a common programming interface. This makes it easy to try different algorithms and to use standard tools for effective testing and parameter searching. scikit-learn includes a large number of algorithms and utilities.
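As a rough illustration of what a common interface buys you, the sketch below trains two different classifiers with exactly the same calls. The Iris dataset, the two classifiers chosen, and the `random_state` value are assumptions made for this example; they are not taken from the chapter.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in Iris dataset stands in for whatever data you have.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# Two unrelated algorithms that expose the same fit/score methods.
classifiers = {
    "nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=14),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                     # learn from the training split
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: {accuracies[name]:.3f}")
```

Because both objects respond to `fit` and `score`, swapping in another algorithm only means adding another entry to the dictionary.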

In this chapter, we focus on setting up a good framework for running data mining procedures. Later chapters build on this framework, each focusing on a particular application and the techniques suited to it.

The key concepts introduced in this chapter are as follows:

Estimators: These perform classification, clustering, and regression
Transformers: These perform preprocessing and data alterations
Pipelines: These put your workflow together into a replicable format
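The three concepts above can be sketched together in a few lines. This is a minimal example under stated assumptions, not the chapter's own code: the Iris dataset, `MinMaxScaler` as the transformer, `KNeighborsClassifier` as the estimator, and the step names `"scale"` and `"predict"` are all choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the built-in Iris dataset is used purely as sample data.
X, y = load_iris(return_X_y=True)

# A pipeline chains a transformer (preprocessing) with an estimator (prediction).
pipeline = Pipeline([
    ("scale", MinMaxScaler()),            # transformer: rescale each feature to [0, 1]
    ("predict", KNeighborsClassifier()),  # estimator: classify the scaled samples
])

# The whole pipeline is evaluated as a single unit with 5-fold cross-validation,
# so the scaler is refit on the training portion of every fold.
scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=5)
mean_accuracy = scores.mean()
print(f"Mean accuracy: {mean_accuracy:.3f}")
```

Wrapping the steps in one `Pipeline` object keeps the preprocessing and the model together, which is what makes the workflow repeatable on new data.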

scikit-learn estimators

Estimators are scikit-learn's abstraction, allowing for the standardized implementation of a...
