#### Overview of this book

Python is one of the most popular languages among data analysts and statisticians for working with large datasets and complex data visualizations. Become a Python Data Analyst introduces Python’s most essential tools and libraries for the data analysis process, from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to use the Jupyter Notebook efficiently and to manipulate data with NumPy and pandas. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python.
Preface
The Anaconda Distribution and Jupyter Notebook
Vectorizing Operations with NumPy
Pandas - Everyone's Favorite Data Analysis Library
Visualization and Exploratory Data Analysis
Statistical Computing with Python
Introduction to Predictive Analytics Models
Other Books You May Enjoy

# Regression model to predict house prices

In this section, we will build a regression model using the housing dataset from the previous sections. We begin by loading the housing prices dataset and preparing it for modeling. We then train a linear regression model and proceed to evaluate this model in a simple but intuitive manner. We shall conclude by using this model to make predictions.
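The workflow just outlined (prepare the data, train a linear regression, evaluate it, then predict) can be sketched end to end as follows. This is a minimal illustration, not the book's code: it assumes scikit-learn is installed and uses synthetic data in place of the housing dataset, so the variable names `living_area` and `sale_price` and all the numbers are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (hypothetical values, not the real dataset):
# price grows linearly with living area, plus random noise.
rng = np.random.default_rng(0)
living_area = rng.uniform(500, 3000, size=200)
noise = rng.normal(0, 10_000, size=200)
sale_price = 50_000 + 100 * living_area + noise

# scikit-learn expects a 2-D feature matrix, so reshape the single feature.
X = living_area.reshape(-1, 1)
y = sale_price

# Hold out a quarter of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the linear regression on the training split.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate: root mean squared error and R^2 on the held-out split.
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = model.score(X_test, y_test)

# Predict the price of a new, unseen house with 1500 units of living area.
new_house = np.array([[1500.0]])
predicted_price = model.predict(new_house)[0]

print(f"RMSE: {rmse:.0f}, R^2: {r2:.3f}, predicted price: {predicted_price:.0f}")
```

Holding out a test split is one simple way to get an honest evaluation; the section itself may evaluate the model differently.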

We first load the libraries we need and import the dataset. As seen in previous sections, several neighborhoods in this dataset contain very few observations. To avoid fitting on such sparse groups, we restrict the model to neighborhoods with more than 30 observations, using the following code block:

    counts = housing['Neighborhood'].value_counts()
    more_than_30 = list(counts[counts > 30].index)
    housing = housing...
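To see the filter in action, here is a small self-contained sketch on toy data. The last statement in the code above is truncated, so the final line below is only one plausible way to apply the filter, using `Series.isin`; it is an assumption, not necessarily what the book does. The toy `housing` frame and its `SalePrice` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: neighborhood 'A' has 40 rows, 'B' has 31,
# 'C' has exactly 30, and 'D' has 5.
housing = pd.DataFrame({
    "Neighborhood": ["A"] * 40 + ["B"] * 31 + ["C"] * 30 + ["D"] * 5,
    "SalePrice": range(106),
})

# Count the observations per neighborhood.
counts = housing["Neighborhood"].value_counts()

# Keep the names of neighborhoods with strictly more than 30 observations.
more_than_30 = list(counts[counts > 30].index)

# Assumed completion of the truncated step: keep only rows whose
# neighborhood is in that list. The copy avoids chained-assignment warnings later.
filtered = housing[housing["Neighborhood"].isin(more_than_30)].copy()

print(sorted(more_than_30))
print(len(filtered))
```

Because the comparison is strict (`> 30`), a neighborhood with exactly 30 observations, such as `'C'` here, is dropped along with the sparse ones.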