Python Data Science Essentials

Python Data Science Essentials - Second Edition

By : Luca Massaron, Alberto Boschetti

Overview of this book

Fully expanded and upgraded, the second edition of Python Data Science Essentials takes you through all you need to know to succeed in data science using Python. Get modern insight into the core of Python data, including the latest versions of Jupyter notebooks, NumPy, pandas, and scikit-learn. Look beyond the fundamentals with beautiful data visualizations with Seaborn and ggplot, web development with Bottle, and even the new frontiers of deep learning with Theano and TensorFlow. Dive into building your essential Python 3.5 data science toolbox, using a single-source approach that will allow you to work with Python 2.7 as well. Get to grips fast with data munging and preprocessing, and all the techniques you need to load, analyse, and process your data. Finally, get a complete overview of principal machine learning algorithms, graph analysis techniques, and all the visualization and deployment instruments that make it easier to present your results to an audience of both data science experts and business users.

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Preface

First Steps

Introducing data science and Python

Installing Python

Introducing Jupyter

Datasets and code used in the book

Summary

Data Munging

The data science process

Data loading and preprocessing with pandas

Working with categorical and text data

Data processing with NumPy

Creating NumPy arrays

NumPy's fast operations and computations

Summary

The Data Pipeline

Introducing EDA

Building new features

Dimensionality reduction

The detection and treatment of outliers

Validation metrics

Testing and validating

Cross-validation

Hyperparameter optimization

Feature selection

Wrapping everything in a pipeline

Summary

Machine Learning

Preparing tools and datasets

Linear and logistic regression

Dealing with big data

Approaching deep learning

A peek at Natural Language Processing (NLP)

An overview of unsupervised learning

Summary

Social Network Analysis

Introduction to graph theory

Graph algorithms

Graph loading, dumping, and sampling

Summary

Visualization, Insights, and Results

Introducing the basics of matplotlib

Wrapping up matplotlib's commands

Interactive visualizations with Bokeh

Advanced data-learning representations

Summary

Strengthen Your Python Foundations

Your learning list

Learn by watching, reading, and doing

Learn by watching, reading, and doing

What if the refresher courses and our learning list are not enough and you need more support to strengthen your knowledge of Python? We recommend further resources that are freely available on the web. By watching tutorial videos, you can try out varied and complex examples, and you can challenge yourself with a difficult task that requires you to interact with other data scientists and Python experts.

MOOCs

MOOCs have become increasingly popular in recent years, offering for free, on their online platforms, some of the best courses from the best universities and experts around the world. You will find Python courses on Coursera (https://www.coursera.org/), edX (https://www.edx.org/), and Udacity (https://www.udacity.com). Another great source is MIT OpenCourseWare, which is easily accessible (https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00sc-introduction-to-computer-science-and-programming-spring-2011/). When you consult each of these sites, you may find different active courses on Python. We recommend a free, always available, self-paced course by Peter Norvig, the Director of Research at Google Inc. This course aims to take your knowledge of Python to a higher level of proficiency.

PyCon and PyData

The Python Conference (PyCon) is an annual convention organized at various locations around the world with the purpose of promoting the usage and diffusion of the Python language. Tutorials, hands-on demonstrations, and training sessions are commonly held during these conventions. You can check out http://www.pycon.org/ to find out where and when the next PyCon will be held near you. If you cannot attend, you can still search https://www.youtube.com/ because most of the interesting sessions are recorded and uploaded there. Watching a demonstration in person is a different experience, though, so we warmly suggest you attend such conventions because they are really worth it. Similarly, PyData, a community of Python developers and users devoted to data analysis, holds many events around the world. You can check out http://pydata.org/events.html for upcoming events to attend and to see whether any past event may interest you. As with PyCon, presentations are often available on YouTube, on dedicated channels such as PyDataTV.

Interactive Jupyter

Sometimes, you need some written explanations and the opportunity to test some sample code by yourself. Jupyter, an open tool like Python itself, offers you all of this via its notebooks: interactive web pages where you will find both explanations and example code that can be tested directly. We discuss Jupyter and its kernels throughout the book because it is a real data science workhorse. It allows you to easily run Python scripts and evaluate their effects on the data that you are working on.

The GitHub location of the IPython kernel (the Python kernel of Jupyter, since Jupyter can run many different programming languages) offers a complete list of example notebooks. You can check it out at: https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks. In particular, a section of the list is about General Python Programming, whereas another one is about Statistics, Machine Learning, and Data Science, where you will find quite a lot of examples of Python scripts that you can take inspiration from in your learning.

Don't be shy, take a real challenge

If you want to do something that can take your Python coding ability to a different level, we suggest you take a challenge on Kaggle. Kaggle (http://www.kaggle.com/) is a platform for predictive modeling and analytics competitions. It applies the idea of competitive programming (participants try to program according to the provided specifications) to data science by proposing challenging data problems to participants and asking them to provide possible solutions, which are evaluated on a test set. The results on the test set are partly public and partly private. The most interesting part for a Python learner is the opportunity to take part in a real problem with no obvious solution, which requires you to code something in order to propose a possible solution, even something simple or naive (which we warmly suggest you start with before getting involved in complex solutions). By doing so, the learner will come across interesting tutorials, beat-the-benchmark code, helpful communities of data scientists, and some very smart solutions proposed by other data scientists or by Kaggle itself in its blog, No Free Hunch (http://blog.kaggle.com/).

You may wonder how to find the right challenge for yourself. Just have a look at the present and past competitions at https://www.kaggle.com/competitions and look for every competition that has knowledge as a reward. You will be surprised to find an ideal stage for learning about how other data scientists code in Python, and you can immediately apply what you learn from this book.
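As a minimal sketch of the "simple or naive" first submission suggested above, the following code builds a majority-class baseline and scores it separately on a public and a private portion of a held-out test set. Everything here is an assumption made for illustration: the labels are synthetic, the function names `majority_baseline` and `accuracy` are hypothetical, and the split sizes and the accuracy metric are arbitrary choices. Real competitions define their own data, metric, and public/private split.

```python
import random
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent label seen in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Synthetic binary labels (illustrative only, not competition data).
rng = random.Random(0)
labels = [1 if rng.random() < 0.7 else 0 for _ in range(1000)]

# The participant sees the training labels; the test labels stay hidden.
train_labels, test_labels = labels[:700], labels[700:]

# The hidden test set is divided into a public and a private part
# (the 150/150 split is an arbitrary assumption).
public_truth, private_truth = test_labels[:150], test_labels[150:]

# Naive submission: always predict the most common training label.
guess = majority_baseline(train_labels)
predictions = [guess] * len(test_labels)

public_score = accuracy(predictions[:150], public_truth)
private_score = accuracy(predictions[150:], private_truth)

print("Public score: ", round(public_score, 3))
print("Private score:", round(private_score, 3))
```

A baseline like this gives you a reference score to beat: any model you build afterwards should do better than it on held-out data, otherwise the added complexity is not paying off.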
