Python Data Analysis - Third Edition

By: Avinash Navlani, Ivan Idris

Overview of this book

Data analysis enables you to generate value from small and big data by discovering new patterns and trends, and Python is one of the most popular tools for analyzing a wide variety of data. With this book, you’ll get up and running using Python for data analysis by exploring the different phases and methodologies used in data analysis and learning how to use modern libraries from the Python ecosystem to create efficient data pipelines. Starting with the essential statistical and data analysis fundamentals using Python, you’ll perform complex data analysis and modeling, data manipulation, data cleaning, and data visualization using easy-to-follow examples. You’ll then understand how to conduct time series analysis and signal processing using ARMA models. As you advance, you’ll get to grips with smart processing and data analytics using machine learning algorithms such as regression, classification, Principal Component Analysis (PCA), and clustering. In the concluding chapters, you’ll work on real-world examples to analyze textual and image data using natural language processing (NLP) and image analytics techniques, respectively. Finally, the book will demonstrate parallel computing using Dask. By the end of this data analysis book, you’ll be equipped with the skills you need to prepare data for analysis and create meaningful data visualizations for forecasting values from data.
Table of Contents (20 chapters)
1. Section 1: Foundation for Data Analysis
6. Section 2: Exploratory Data Analysis and Data Cleaning
11. Section 3: Deep Dive into Machine Learning
15. Section 4: NLP, Image Analytics, and Parallel Computing

The skillsets of data analysts and data scientists

A data analyst is someone who discovers insights from data and creates value out of it. This helps decision-makers understand how the business is performing. Data analysts must acquire the following skills:

  • Exploratory Data Analysis (EDA): EDA is an essential skill for data analysts. It involves inspecting data to discover patterns, test hypotheses, and check assumptions.
  • Relational Database: Knowledge of at least one relational database tool, such as MySQL or PostgreSQL, is mandatory. SQL is a must for working with relational databases.
  • Visualization and BI Tools: A picture speaks louder than words. Visuals have more impact on people than raw numbers and are a clear, easy way to present insights. Visualization and BI tools such as Tableau, QlikView, MS Power BI, and IBM Cognos can help analysts visualize data and prepare reports.
  • Spreadsheet: Knowledge of MS Excel, WPS Office, LibreOffice, or Google Sheets is mandatory for storing and managing data in tabular form.
  • Storytelling and Presentation Skills: The art of storytelling is another necessary skill. A data analyst should be an expert in connecting data facts to an idea or an incident and turning it into a story.
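
As a rough illustration of how the SQL and EDA skills in the list above fit together, the following sketch loads a few made-up sales rows into an in-memory SQLite database, aggregates them with SQL, and then computes basic descriptive statistics in Python. The table name `sales`, its columns, and all values are hypothetical and chosen only for illustration. It uses Python's built-in `sqlite3` and `statistics` modules rather than any of the tools named above.

```python
import sqlite3
import statistics

# Hypothetical sample data: (region, amount). All values are invented.
rows = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 150.0),
]

# Load the rows into a throwaway in-memory relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# SQL step: number of sales, total, and average amount per region.
region_summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

# EDA step: descriptive statistics over all amounts.
amounts = [amount for (amount,) in conn.execute("SELECT amount FROM sales")]
overall = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),
}
conn.close()

for region, n, total, avg in region_summary:
    print(f"{region}: n={n}, total={total}, mean={avg}")
print(overall)
```

Comparing the per-region averages with the overall mean and median is a simple example of the pattern-finding that EDA is meant to support.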

On the other hand, the primary job of a data scientist is to solve problems using data. To do this, they need to understand the client's requirements, domain, and problem space, and ensure that the client gets what they actually want. The tasks that data scientists undertake vary from company to company. Some companies employ data analysts and give them the title of data scientist just to glorify the job designation. Some combine data analyst tasks with data engineering tasks and call the role data scientist; others assign data scientists to machine learning-intensive tasks with data visualizations.

A data scientist has to be a jack of all trades and wear multiple hats, including those of a data analyst, statistician, mathematician, programmer, and ML or NLP engineer. Most people are not experts in all of these trades, and becoming skilled in them takes a lot of effort and patience. This is why data science cannot be learned in 3 or 6 months; learning data science is a journey. A data scientist should have a wide variety of skills, such as the following:

  • Mathematics and Statistics: Most machine learning algorithms are based on mathematics and statistics. Knowledge of mathematics helps data scientists develop custom solutions.
  • Databases: Knowledge of SQL allows data scientists to interact with the database and collect the data for prediction and recommendation.
  • Machine Learning: Knowledge of supervised machine learning techniques, such as regression and classification, and of unsupervised machine learning techniques, such as cluster analysis, outlier detection, and dimensionality reduction, is required.
  • Programming Skills: Knowledge of programming helps data scientists automate their suggested solutions. Knowledge of Python and R is recommended.
  • Storytelling and Presentation Skills: Data scientists need to communicate their results as a story, for example through PowerPoint presentations.
  • Big Data Technology: Knowledge of big data platforms such as Hadoop and Spark helps data scientists develop big data solutions for large-scale enterprises.
  • Deep Learning Tools: Deep learning tools such as TensorFlow and Keras are used in NLP and image analytics.
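
To make the supervised learning item in the list above a little more concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The helper name `fit_line` and the data points are hypothetical and made up for illustration; this is only one simple instance of regression, not a method prescribed by the text.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that happens to lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict the target for an unseen input.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

The labeled `ys` values are what make this supervised: the model is fitted to known outputs and then used to predict an output for a new input.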

Apart from these skillsets, knowledge of web scraping packages and tools for extracting data from diverse sources, and of web application frameworks such as Flask or Django for building prototype solutions, is also useful. Together, these make up the skillset of a data science professional.

Now that we have covered the basics of data analysis and data science, let's dive into the basic setup needed to get started with data analysis. In the next section, we'll learn how to install Python.