Practical Data Analysis Using Jupyter Notebook

By: Marc Wintjen

Overview of this book

Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data. After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps. Later chapters teach you about storytelling with data using charts, such as histograms and scatter plots. As you advance, you'll understand how to work with unstructured data using natural language processing (NLP) techniques to perform sentiment analysis. All the knowledge you gain will help you discover key patterns and trends in data using real-world examples. In addition to this, you will learn how to handle data of varying complexity to perform efficient data analysis using modern Python libraries. By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.

What makes a good data analyst?

I will now break down the factors that make a good data analyst. In my experience, a good data analyst must be eager to learn and must keep asking questions throughout the process of working with data. The focus of those questions will vary based on the audience consuming the results. To be an expert in the field of data analysis, you need excellent communication skills so that you can translate raw data into insights that drive positive change. To make these factors easier to remember, use the following acronyms to help improve your data analyst skills.

Know Your Data (KYD)

Knowing your data is all about understanding the source technology that was used to create the data, along with the business requirements and rules used to store it. Do research ahead of time to understand what the business is all about and how the data is used. For example, if you are working with a sales team, learn what drives the team's success. Do they have daily, monthly, or quarterly sales quotas? Do they produce month-end or quarter-end reports that go to senior management and must be accurate because they have a financial impact on the company? Learning more about the source data by asking how it will be consumed will help focus your analysis when you have to deliver results.


KYD is also about data lineage, which means understanding how the data was originally sourced, including the technologies used and the transformations that occurred before, during, and afterward. Refer back to the 3Vs so that you can effectively answer common questions about the data, such as where it is sourced from or who is responsible for maintaining the data source.
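As a minimal sketch of what capturing data lineage could look like in a notebook, the following Python snippet records where a dataset came from, who maintains it, and which transformations were applied. The `LineageRecord` class, its fields, and the sample values (a monthly sales dataset from a CRM export) are all hypothetical and chosen only for illustration; they are not taken from any library or from this book.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical container for the lineage questions KYD asks about a dataset."""
    dataset: str
    source_system: str  # technology that originally created the data (assumed value)
    owner: str          # who is responsible for maintaining the data source
    transformations: list = field(default_factory=list)  # steps applied, in order

    def add_step(self, description: str) -> None:
        """Append one transformation step to the lineage history."""
        self.transformations.append(description)

    def summary(self) -> str:
        """Return a one-line answer to 'where did this data come from?'."""
        steps = " -> ".join(self.transformations) or "none recorded"
        return (f"{self.dataset}: sourced from {self.source_system}, "
                f"maintained by {self.owner}; transformations: {steps}")


# Illustrative values only; replace them with what you learn from the business.
record = LineageRecord("monthly_sales", "CRM export", "Sales Operations")
record.add_step("removed duplicate orders")
record.add_step("aggregated by month")
print(record.summary())
```

Keeping a note like this next to the analysis makes it easier to answer questions about the source and ownership of the data when you deliver results.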

Voice of the Customer (VOC)

The concept of VOC is nothing new; it has been taught at universities for years and is applied in sales, marketing, and many other business operations. VOC means understanding customer needs by listening to and learning from customers before, during, and after they use a company's product or service. This concept remains relevant today and should be applied to every data project that you participate in. In practice, this means interviewing the consumers of the data analysis results before even looking at the data. If you are working with business users, listen to their needs and write down the specific business questions they are trying to answer.

Schedule a working session with them where you can engage in a dialog. Make sure you focus on their current pain points, such as the time it takes to curate all of the data used to make decisions. Does it take three days to complete the process every month? If you can deliver an automated data product or a dashboard that reduces that time to a few mouse clicks, your data analysis skills will make you look like a hero to your business users.

During a tech talk at a local university, I was asked about the difference between KYD and VOC. I explained that both are important and both focus on communicating and learning more about the subject area or business. The key difference is being prepared versus being present. KYD is all about doing your homework ahead of time so that you are prepared before talking to experts. VOC is all about listening to the needs of your business or consumers regarding the data.

Always Be Agile (ABA)

The agile methodology has become commonplace in the industry across the Software Development Life Cycle (SDLC) for application, web, and mobile development. One reason the agile project management process is successful is that it creates an interactive line of communication between the business and technical teams to iteratively deliver business value through the use of data and usable features.

The agile process involves creating stories with a common theme, where a development team completes tasks in two- to three-week sprints. In that process, it is important to understand the what and the why for each story, including the business value or the problem you are trying to solve.

The agile approach has ceremonies where the developers and business sponsors come together to capture requirements and then deliver incremental value. That improvement in value could be anything from a new dataset available for access to a new feature added to an app.

See the following diagram for a visual representation of these concepts. Notice that these concepts are not linear and require multiple iterations, which help improve communication among everyone involved in the data analysis before, during, and after the delivery of results:

Finally, I believe the most important trait of a good data analyst is a passion for working with data. If your passion can be fueled by continuously learning about all things data, it becomes a lifelong and fulfilling journey.