About
This section briefly introduces the author, the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.
Applied Data Science with Python and Jupyter teaches you the skills you need for entry-level data science. You'll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world. You'll finish up by learning how easy it can be to scrape and gather your own data from the open web so that you can apply your new skills in an actionable context.
Alex Galea has been doing data analysis professionally since graduating with a master's in physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. More recently, Alex has been doing web data analytics, where Python continues to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.
Get up and running with the Jupyter ecosystem
Identify potential areas of investigation and perform exploratory data analysis
Plan a machine learning classification strategy and train classification models
Use validation curves and dimensionality reduction to tune and enhance your models
Scrape tabular data from web pages and transform it into Pandas DataFrames
Create interactive, web-friendly visualizations to clearly communicate your findings
Applied Data Science with Python and Jupyter is ideal for professionals with a variety of job descriptions across a large range of industries, given the rising popularity and accessibility of data science. You'll need some prior experience with Python; any prior work with libraries such as Pandas and Matplotlib will give you a useful head start.
Applied Data Science with Python and Jupyter covers every aspect of the standard data workflow process with a blend of theory, practical hands-on coding, and relatable illustrations. Each chapter is designed to build on what you learned in the previous one. The book contains multiple activities that use real-life business scenarios so that you can practice and apply your new skills in a highly relevant context.
The minimum hardware requirements are as follows:
Processor: Intel i5 (or equivalent)
Memory: 8 GB RAM
Hard disk: 10 GB
An internet connection
You'll also need the following software installed in advance:
Python 3.5+
Anaconda 4.3+
Python libraries included with Anaconda installation:
matplotlib 2.1.0+
ipython 6.1.0+
requests 2.18.4+
beautifulsoup4 4.6.0+
numpy 1.13.1+
pandas 0.20.3+
scikit-learn 0.19.0+
seaborn 0.8.0+
bokeh 0.12.10+
Python libraries that require manual installation:
mlxtend
version_information
ipython-sql
pdir2
graphviz
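The version requirements above can be checked from Python rather than by hand. The following is a minimal sketch, not part of the book's code bundle. It assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the book's stated 3.5+ minimum) and assumes that each library is registered under the distribution name used in the list. The helper names `version_tuple`, `at_least`, and `check_requirements` are hypothetical, and the comparison only looks at the leading numeric parts of each version, so it is not a full version parser.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions copied from the list above.
# Assumption: these are the names the packages are installed under.
MINIMUM_VERSIONS = {
    "matplotlib": "2.1.0",
    "ipython": "6.1.0",
    "requests": "2.18.4",
    "beautifulsoup4": "4.6.0",
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "scikit-learn": "0.19.0",
    "seaborn": "0.8.0",
    "bokeh": "0.12.10",
}


def version_tuple(version):
    """Turn '0.12.10' into (0, 12, 10), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '2.1' equals '2.1.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if at_least(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = "too old (found {})".format(installed)
    return report


if __name__ == "__main__":
    for name, status in check_requirements(MINIMUM_VERSIONS).items():
        print("{:<16}{}".format(name, status))
```

Running the script prints one line per library. Anything reported as missing or too old can then be installed or updated before you start the exercises.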
Before you start with this book, we'll install the Anaconda environment, which consists of Python and Jupyter Notebook.
Visit https://www.anaconda.com/download/ in your browser.
Click on Windows, Mac, or Linux, depending on the OS you are working on.
Next, click on the Download option. Make sure you download the latest version.
Open the installer after download.
Follow the steps in the installer and that's it! Your Anaconda distribution is ready.
Search for Anaconda Prompt and open it.
Type the following commands to update conda and Jupyter and to install the required packages:
#Update conda
conda update conda
#Update Jupyter
conda update jupyter
#install packages
conda install numpy
conda install pandas
conda install statsmodels
conda install matplotlib
conda install seaborn
To open Jupyter Notebook from Anaconda Prompt, use the following command:
jupyter notebook
To upgrade scikit-learn to the latest version, use the following command:
pip install -U scikit-learn
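Once the installation steps are complete, a short script can confirm that the tools and packages they mention are actually available. This is a minimal sketch using only the standard library. The function names `tool_available` and `module_available` are hypothetical, and it assumes the script is run with the Python interpreter that Anaconda installed, for example from Anaconda Prompt.

```python
import shutil
from importlib import util

# Command-line tools and importable modules that the installation steps
# should provide. Note that scikit-learn is imported as "sklearn".
TOOLS = ["conda", "jupyter"]
MODULES = ["numpy", "pandas", "statsmodels", "matplotlib", "seaborn", "sklearn"]


def tool_available(name):
    """True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def module_available(name):
    """True if a top-level module called `name` can be located, without importing it."""
    return util.find_spec(name) is not None


if __name__ == "__main__":
    for tool in TOOLS:
        status = "found" if tool_available(tool) else "NOT FOUND"
        print("{:<12} {}".format(tool, status))
    for module in MODULES:
        status = "found" if module_available(module) else "NOT FOUND"
        print("{:<12} {}".format(module, status))
```

If anything is reported as NOT FOUND, rerunning the corresponding conda or pip command from the steps above is a reasonable first thing to try.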
The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Applied-Data-Science-with-Python-and-Jupyter.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:
"The final figure is then saved as a high resolution PNG to the figures folder."
A block of code is set as follows:
y = df['MEDV'].copy()
del df['MEDV']
df = pd.concat((y, df), axis=1)
Any command-line input or output is written as follows:
jupyter notebook
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on New in the upper-right corner and select a kernel from the drop-down menu."