SQL for Data Analytics

By: Upom Malik, Matt Goldwasser, Benjamin Johnston

Overview of this book

Understanding and finding patterns in data has become one of the most important ways to improve business decisions. If you know the basics of SQL, but don't know how to use it to gain the most effective business insights from data, this book is for you. SQL for Data Analytics helps you build the skills to move beyond basic SQL and instead learn to spot patterns and explain the logic hidden in data. You'll discover how to explore and understand data by identifying trends and unlocking deeper insights. You'll also gain experience working with different types of data in SQL, including time-series, geospatial, and text data. Finally, you'll learn how to increase your productivity with the help of profiling and automation. By the end of this book, you'll be able to use SQL in everyday business scenarios efficiently and look at data with the critical eye of an analytics professional. Please note: if you are having difficulty loading the sample datasets, there are new instructions uploaded to the GitHub repository. The link to the GitHub repository can be found in the book's preface.

The HAVING Clause

We can now perform all sorts of aggregate operations using GROUP BY. Sometimes, though, certain groups in the aggregated result are not useful, and you may want to remove them from the query output. For example, when counting customers by state, perhaps you are only interested in states that have at least 1,000 customers. Your first instinct may be to write something such as this:

SELECT state, COUNT(*)
FROM customers
WHERE COUNT(*) >= 1000
GROUP BY state
ORDER BY state

However, you will find that the query does not work and gives you the following error:

Figure 4.18: Error showing the query not working
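
The failure can be reproduced on a small scale. The following is a minimal sketch, not the book's own setup: it assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for the book's database, a hypothetical six-row customers table, and a threshold of 2 instead of 1,000 so that the tiny sample is meaningful. The exact error text differs between database engines, but the cause is the same: WHERE is evaluated on individual rows before any grouping happens, so an aggregate such as COUNT(*) has nothing to count yet.

```python
import sqlite3

# Assumption: SQLite stands in for the book's database, and this
# six-row customers table is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("CA",), ("NY",), ("NY",), ("TX",)],
)

# Same shape as the query above, with the threshold lowered to 2.
bad_query = """
    SELECT state, COUNT(*)
    FROM customers
    WHERE COUNT(*) >= 2
    GROUP BY state
    ORDER BY state
"""

try:
    conn.execute(bad_query)
    error_message = None  # not reached: the query is rejected
except sqlite3.OperationalError as exc:
    # The engine refuses an aggregate function inside WHERE.
    error_message = str(exc)

print(error_message)
```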

To filter on the results of aggregate functions, you need to use a new clause, HAVING. The HAVING clause is similar to the WHERE clause, except that it is applied to groups after aggregation rather than to individual rows, so it is specifically designed for GROUP BY queries. The general structure of a GROUP BY operation with a HAVING clause is:

SELECT {KEY}, {AGGFUNC(column1)}
FROM {table1}
GROUP BY {KEY}
HAVING...
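
As an illustration of this structure, the customer-count query can be rewritten with the filter moved into HAVING. This is a minimal sketch under the same kind of assumptions as before: an in-memory SQLite database accessed through Python's sqlite3 module, a hypothetical six-row customers table, a threshold of 2 rather than 1,000, and a customer_count alias that is introduced here only for readability.

```python
import sqlite3

# Assumption: SQLite stands in for the book's database, and this
# six-row customers table is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("CA",), ("NY",), ("NY",), ("TX",)],
)

# GROUP BY builds one group per state; HAVING then keeps only the
# groups whose row count meets the threshold.
query = """
    SELECT state, COUNT(*) AS customer_count
    FROM customers
    GROUP BY state
    HAVING COUNT(*) >= 2
    ORDER BY state
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('CA', 3), ('NY', 2)]
```

TX is dropped because its group contains a single customer. Conditions on ordinary columns can still go in WHERE, which filters rows before they are grouped.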