SQL for Data Analytics

By: Upom Malik, Matt Goldwasser, Benjamin Johnston

Overview of this book

Understanding and finding patterns in data has become one of the most important ways to improve business decisions. If you know the basics of SQL, but don't know how to use it to gain the most effective business insights from data, this book is for you. SQL for Data Analytics helps you build the skills to move beyond basic SQL and instead learn to spot patterns and explain the logic hidden in data. You'll discover how to explore and understand data by identifying trends and unlocking deeper insights. You'll also gain experience working with different types of data in SQL, including time-series, geospatial, and text data. Finally, you'll learn how to increase your productivity with the help of profiling and automation. By the end of this book, you'll be able to use SQL in everyday business scenarios efficiently and look at data with the critical eye of an analytics professional. Please note: if you are having difficulty loading the sample datasets, there are new instructions uploaded to the GitHub repository. The link to the GitHub repository can be found in the book's preface.

Summary

Data is a powerful means of understanding the world. The ultimate goal of analytics is to turn data into information and knowledge. To accomplish this goal, we can use statistics, especially descriptive statistics and statistical significance testing, to better understand data.

Univariate analysis, one branch of descriptive statistics, examines a single variable at a time. Frequency distributions and quantiles describe how the variable is distributed; the mean, median, and mode describe its central tendency; and the range, standard deviation, and interquartile range (IQR) describe its dispersion. Univariate analysis can also be used to find outliers.
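
As a rough illustration of the univariate measures named above, here is a minimal sketch using Python's standard `statistics` module. The `sales` list is made-up sample data, and the 1.5 × IQR cutoff for outliers is one common convention assumed here for illustration; the text above does not prescribe a specific rule.

```python
import statistics

# Hypothetical sample: daily unit sales with one suspiciously large value.
sales = [12, 15, 11, 14, 15, 13, 15, 16, 12, 95]

# Central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Dispersion
value_range = max(sales) - min(sales)
std_dev = statistics.stdev(sales)  # sample standard deviation

# Quartiles (n=4 splits the data into four equal-probability groups)
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

# Assumed outlier rule: flag values more than 1.5 * IQR beyond the quartiles.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [x for x in sales if x < lower_bound or x > upper_bound]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, std_dev={std_dev:.2f}, IQR={iqr}")
print(f"outliers={outliers}")
```

In this sample, the single extreme value pulls the mean well above the median, which is one reason to look at several measures rather than relying on the mean alone.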

Bivariate analysis is used to understand the relationship between two variables. Using scatterplots, we can identify trends, changes in trends, periodic behavior, and anomalous points. We can also use the Pearson correlation coefficient to measure the strength of a linear trend between the two variables. The coefficient should be interpreted with care, however, because it can be distorted by outliers and is less reliable when calculated from only a few data points. Additionally, a strong correlation between two variables does not mean that one causes the other.
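
To make the point about outliers concrete, the following sketch computes the Pearson correlation coefficient from its definition (covariance divided by the product of the standard deviations). The helper name `pearson_r` and both datasets are hypothetical, chosen only to show how a single anomalous point can change the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y is exactly 2 * x, a perfect linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(x, y)

# Same data, but the last observation is replaced by an outlier.
y_with_outlier = [2, 4, 6, 8, 10, -20]
r_outlier = pearson_r(x, y_with_outlier)

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_outlier:.3f}")
```

With only six points, one anomalous value is enough to turn a perfect positive correlation into a negative one, so it is worth checking a scatterplot before trusting the coefficient.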

Statistical significance testing can also provide important information about data. It allows us to estimate how likely an observed outcome is to occur by chance, which helps us judge whether the differences seen between groups are meaningful.
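
One way to see what "likely to occur by chance" means is a permutation test, sketched below. This is only one of several significance tests and is chosen here because it needs nothing beyond the standard library; the function name `permutation_p_value`, the two groups, and the 0.05 threshold are all assumptions made for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when group labels are shuffled at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        if mean_diff(pooled) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical measurements for a control group and a treatment group.
control = [20, 22, 19, 21, 20, 23, 21, 22]
treatment = [27, 29, 26, 28, 30, 27, 29, 28]

p_value = permutation_p_value(control, treatment)
print(f"estimated p-value: {p_value}")
print("significant at the assumed 0.05 level" if p_value < 0.05
      else "not significant at the assumed 0.05 level")
```

A small p-value says that a difference this large would rarely arise from random group assignment alone; it does not by itself say the difference is large enough to matter in practice.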

Now that we have the basic analytical tools necessary to understand data, the next chapter reviews SQL and how we can use it to manipulate a database.