SQL for Data Analytics

By: Upom Malik, Matt Goldwasser, Benjamin Johnston

Overview of this book

Understanding and finding patterns in data has become one of the most important ways to improve business decisions. If you know the basics of SQL, but don't know how to use it to gain the most effective business insights from data, this book is for you. SQL for Data Analytics helps you build the skills to move beyond basic SQL and instead learn to spot patterns and explain the logic hidden in data. You'll discover how to explore and understand data by identifying trends and unlocking deeper insights. You'll also gain experience working with different types of data in SQL, including time-series, geospatial, and text data. Finally, you'll learn how to increase your productivity with the help of profiling and automation. By the end of this book, you'll be able to use SQL in everyday business scenarios efficiently and look at data with the critical eye of an analytics professional. Please note: if you are having difficulty loading the sample datasets, there are new instructions uploaded to the GitHub repository. The link to the GitHub repository can be found in the book's preface.

Window Functions

Aggregate functions take many rows and reduce them to a single value. For example, the COUNT function takes the rows of a table and returns how many there are. Sometimes, however, we want to perform a calculation across multiple rows while still keeping every row in the result. For example, suppose you want to rank every customer by the date they became a customer, with the earliest customer ranked 1, the second-earliest ranked 2, and so on. You can retrieve all the customers in that order using the following query:

SELECT *
FROM customers
ORDER BY date_added;

This query orders customers from the earliest to the most recent, but it does not assign each one a number. You could instead use an aggregate function to count the customers added on each date and order the result by date:

SELECT date_added, COUNT(*)
FROM customers
GROUP BY date_added
ORDER BY date_added;

The following is the output of the preceding code:

Figure 5...
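
As a contrast with the two queries above, here is a minimal sketch of how a window function could number each customer while keeping every row. It is only an illustration under stated assumptions: the customers_demo table and its sample rows are hypothetical, and the customer_id column is assumed to exist so that customers added on the same date are ordered consistently.

```sql
-- Hypothetical demo table; the real customers table may have other columns.
CREATE TEMPORARY TABLE customers_demo (
    customer_id INTEGER PRIMARY KEY,  -- assumed identifier column
    date_added  DATE NOT NULL
);

-- Made-up sample rows, including two customers added on the same date.
INSERT INTO customers_demo (customer_id, date_added) VALUES
    (1, '2010-03-15'),
    (2, '2010-03-15'),
    (3, '2011-07-02'),
    (4, '2012-01-20');

-- ROW_NUMBER() is evaluated over the ordered set of rows, but unlike
-- GROUP BY it does not collapse them: each customer keeps its own row
-- and receives a sequential number.
SELECT customer_id,
       date_added,
       ROW_NUMBER() OVER (ORDER BY date_added, customer_id) AS signup_rank
FROM customers_demo
ORDER BY signup_rank;
```

Unlike the GROUP BY query, this returns one row per customer rather than one row per date. The customer_id tiebreaker is a design choice for this sketch; without it, the numbering of customers who share a date_added value is not guaranteed to be stable.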