The Applied SQL Data Analytics Workshop - Second Edition

By: Matt Goldwasser, Upom Malik, Benjamin Johnston

Overview of this book

Every day, businesses operate around the clock and generate a huge amount of data at a rapid pace. Hidden in this data are key patterns and behaviors that can help you and your business understand your customers at a deep, fundamental level. Are you ready to enter the exciting world of data analytics and unlock these useful insights? Written by a team of expert data scientists who have used their data analytics skills to transform businesses of all shapes and sizes, The Applied SQL Data Analytics Workshop is a great way to get started with data analysis, showing you how to effectively sift and process information from raw data, even without any prior experience. The book begins by showing you how to form hypotheses and generate descriptive statistics that can provide key insights into your existing data. As you progress, you'll learn how to write SQL queries to aggregate, calculate, and combine SQL data from sources outside of your current dataset. You'll also discover how to work with different data types, like JSON. By exploring advanced techniques, such as geospatial analysis and text analysis, you'll be able to understand your business at a deeper level. Finally, the book lets you in on the secret to getting information faster and more effectively by using advanced techniques like profiling and automation. By the end of The Applied SQL Data Analytics Workshop, you'll have the skills you need to start identifying patterns and unlocking insights in your own data. You will be capable of looking at and assessing data with the critical eye of a skilled data analyst.

Summary

Data analytics is a powerful method for understanding the world. The ultimate goal of analytics is to turn data into information and knowledge. Statistics, especially descriptive statistics and statistical significance testing, helps accomplish this goal by giving us a better understanding of the data.

One branch of descriptive statistics, univariate analysis, can be used to understand a single variable of data. Univariate analysis can be used to find outliers; to describe the distribution of the data using frequency distributions and quantiles; to measure the central tendency of a variable by calculating its mean, median, and mode; and to measure the dispersion of the data using the range, standard deviation, and interquartile range (IQR).
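
The univariate measures listed above can be sketched in a few lines. This is a minimal illustration using Python's standard statistics module rather than SQL; the sales figures are made up for the example, and the 1.5 × IQR outlier rule is one common convention, not the only one.

```python
import statistics

# Hypothetical sample: daily unit sales, with one suspiciously large value.
sales = [12, 15, 11, 14, 15, 13, 16, 15, 12, 95]

# Central tendency.
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Dispersion.
data_range = max(sales) - min(sales)
std_dev = statistics.stdev(sales)  # sample standard deviation

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR outside Q1..Q3.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in sales if x < lower or x > upper]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, std_dev={std_dev:.2f}, IQR={iqr}")
print(f"outliers={outliers}")
```

Note how the single large value pulls the mean well above the median, which is one reason to look at several measures together rather than relying on any one of them.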

Bivariate analysis can be used to understand the relationship between two variables. Using scatterplots, we can identify trends, changes in trends, periodic behavior, and anomalous points in the relationship between the two variables. We can also use the Pearson correlation coefficient to measure the strength of a linear trend between the two variables. The Pearson correlation coefficient should be interpreted with care, however, because it can be distorted by outliers and depends on the number of data points used to calculate it. Additionally, a strong correlation between two variables does not mean that one variable causes the other.
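
To make the sensitivity to outliers concrete, here is a minimal sketch that computes the Pearson correlation coefficient from its definition. The helper name pearson_r and both datasets are hypothetical, chosen only to show how a single anomalous point can change the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant.
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5, 6]

# Hypothetical data lying exactly on a straight line.
y_linear = [2, 4, 6, 8, 10, 12]
r_linear = pearson_r(x, y_linear)

# The same data with the last point replaced by an outlier.
y_outlier = [2, 4, 6, 8, 10, -30]
r_outlier = pearson_r(x, y_outlier)

print(f"r without outlier: {r_linear:.2f}")
print(f"r with one outlier: {r_outlier:.2f}")
```

With only six points, one outlier is enough to turn a perfect positive correlation into a negative one, which is why a scatterplot is worth checking alongside the coefficient.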

Statistical significance testing can also provide important information about data. It allows us to estimate how likely it is that certain outcomes occurred by chance, and can help us to understand whether the differences seen between groups are of consequence.
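
One way to see what "occurring by chance" means is an exact permutation test, sketched below. This is only one of many significance tests and is not necessarily the one used elsewhere in the book; the function name permutation_p_value and the two groups of measurements are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Returns the fraction of all possible regroupings of the pooled data
    whose difference in means is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical measurements for two groups (for example, two page designs).
group_a = [10, 11, 12, 13]
group_b = [20, 21, 22, 23]

p_value = permutation_p_value(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value says that a difference this large would rarely arise if group labels were assigned at random. Enumerating every regrouping is only practical for small samples; larger samples would need random sampling of regroupings or a different test.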

Data analytics can be further enhanced by the power of relational databases. Relational databases are a mature and ubiquitous technology for storing and querying data. They store data in the form of relations, also known as tables, which offer an excellent combination of performance, efficiency, and ease of use. SQL is the language used to access relational databases. It is a declarative language that allows users to focus on what to create as opposed to how to create it. SQL supports many different data types, including numeric data, text data, and even data structures.
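
The declarative style can be illustrated with a small sketch. It runs SQL through Python's built-in sqlite3 module against an in-memory SQLite database, chosen here only because it needs no setup; the products table and its columns are hypothetical, and SQLite's type names differ somewhat from those of other relational databases.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table (relation) with typed columns. Table and column names are
# hypothetical. SQLite has no boolean type, so a 0/1 integer stands in.
conn.execute("""
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        model         TEXT    NOT NULL,
        base_price    REAL    NOT NULL,
        in_production INTEGER NOT NULL
    )
""")

rows = [
    (1, "Scooter A", 400.0, 1),
    (2, "Scooter B", 600.0, 1),
    (3, "Scooter C", 900.0, 0),
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)

# Declarative: the query states WHAT we want (the average price of models
# still in production), not HOW to loop over rows to compute it.
avg_price = conn.execute(
    "SELECT AVG(base_price) FROM products WHERE in_production = 1"
).fetchone()[0]

print(f"Average price of models in production: {avg_price}")
```

The query never mentions iteration, indexes, or storage; the database engine decides how to produce the result.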

When querying data, SQL allows a user to pick which fields to pull, as well as how to filter the data. The results can also be ordered, and SQL lets us pull as much or as little data as we need. Creating, reading, updating, and deleting data is also fairly simple and can be quite surgical.
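
The querying and modification operations described above can be sketched as follows. As an assumption for illustration, the SQL runs through Python's built-in sqlite3 module on an in-memory database, and the customers table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create: define a hypothetical table and insert a few rows.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        state       TEXT,
        total_spend REAL
    )
""")
conn.executemany(
    "INSERT INTO customers (name, state, total_spend) VALUES (?, ?, ?)",
    [("Ana", "NY", 120.0), ("Ben", "CA", 340.0),
     ("Cho", "NY", 560.0), ("Dev", "NY", 90.0)],
)

# Read: choose fields (SELECT), filter rows (WHERE), order them
# (ORDER BY), and cap how many come back (LIMIT).
top_ny = conn.execute("""
    SELECT name, total_spend
    FROM customers
    WHERE state = ?
    ORDER BY total_spend DESC
    LIMIT 2
""", ("NY",)).fetchall()

# Update and delete: the WHERE clause keeps each change targeted
# at exactly the rows we intend to touch.
conn.execute(
    "UPDATE customers SET total_spend = total_spend + 10 WHERE name = ?",
    ("Dev",),
)
conn.execute("DELETE FROM customers WHERE name = ?", ("Ben",))

remaining = conn.execute(
    "SELECT name, total_spend FROM customers ORDER BY name"
).fetchall()

print(top_ny)
print(remaining)
```

Omitting the WHERE clause from an UPDATE or DELETE would apply the change to every row, so the filter is what makes these operations surgical.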

Having reviewed the basics of data analytics and SQL, we will move on to the next chapter's discussion of how SQL can be used to perform the first step in data analytics: the cleaning and transformation of data.