Book Image

SQL for Data Analytics. - Third Edition

By : Jun Shan, Matt Goldwasser, Upom Malik
Book Image

SQL for Data Analytics. - Third Edition

By: Jun Shan, Matt Goldwasser, Upom Malik

Overview of this book

Every day, businesses operate around the clock, and a huge amount of data is generated at a rapid pace. This book helps you analyze this data and identify key patterns and behaviors that can help you and your business understand your customers at a deep, fundamental level. SQL for Data Analytics, Third Edition is a great way to get started with data analysis, showing how to effectively sort and process information from raw data, even without any prior experience. You will begin by learning how to form hypotheses and generate descriptive statistics that can provide key insights into your existing data. As you progress, you will learn how to write SQL queries to aggregate, calculate, and combine SQL data from sources outside of your current dataset. You will also discover how to work with advanced data types, like JSON. By exploring advanced techniques, such as geospatial analysis and text analysis, you will be able to understand your business at a deeper level. Finally, the book lets you in on the secret to getting information faster and more effectively by using advanced techniques like profiling and automation. By the end of this book, you will be proficient in the efficient application of SQL techniques in everyday business scenarios and looking at data with the critical eye of analytics professional.
Table of Contents (11 chapters)
9
9. Using SQL to Uncover the Truth: A Case Study

Column Constraints

Column constraints are keywords that help you specify the properties you want to attribute to a particular column. In other words, you can ensure that all the rows in that column adhere to your specified constraint. Some major column constraints are as follows:

  • NOT NULL: This constraint guarantees that no value in a column can be NULL.
  • UNIQUE: This constraint guarantees that every single row for a column has a unique value and that no value is repeated.
  • PRIMARY KEY: This is a special constraint that is unique for each row and helps you to find a specific row more quickly. If the primary key of this table contains only one column, you can add this PRIMARY KEY constraint to the column definition of the primary key column. If the primary key of this table consists of multiple columns, you need to use a table constraint to define the key in the CREATE statement.

Simple CREATE Statement

Now that you know about data types and column constraints, you can start creating your first table. Suppose you want to create a table called state_populations with columns for the initials and populations of states. The query would look as follows:

CREATE TABLE state_populations (
  state VARCHAR(2) PRIMARY KEY,
  population NUMERIC
);

Once you execute this statement, you can run a simple SELECT statement to verify that the table is created. However, you cannot see any row in the output as you have not run any statements to populate it.

Figure 2.32: Simple CREATE statement

Figure 2.32: Simple CREATE statement

Note

Sometimes, you may run a CREATE TABLE query and get the error relation {table_name} already exists. This simply means that a table with the same name already exists. You either must delete the table with the same name or change the name of your table. You will learn how to delete a table later in this chapter.

You will soon be exploring the second way to create a table, which is by using a SQL query. But first, you will do an exercise to create a blank table in SQL.

Exercise 2.03: Creating a Table in SQL

In this exercise, you will create a table using the CREATE TABLE statement. The marketing team at ZoomZoom would like to create a table called countries to analyze the data of different countries. It should have four columns: an integer key column, a unique name column, a founding year column, and a capital column.

Follow these steps to complete the exercise:

  1. Open pgAdmin, connect to the sqlda database, and open SQL query editor.
  2. Execute the following query to drop the countries table since it already exists in the database:
    DROP TABLE IF EXISTS countries;
  3. Run the following query to create the countries table:
    CREATE TABLE countries ( 
      key INT PRIMARY KEY,
      name text UNIQUE, 
      founding_year INT, 
      capital text
    );

You should get a result message as follows, which indicates the creation of a blank table:

Figure 2.33: CREATE statement for the countries table

Figure 2.33: CREATE statement for the countries table

Note

To access the source code for this specific section, please refer to https://packt.link/COMnA.

In this exercise, you learned how to create a table using different column constraints and the CREATE TABLE statement. In the next section, you will create tables using the SELECT query.

Creating Tables with SELECT

You already know how to create a table. However, say you wanted to create a table using data from an existing table. This can be done by using a modification of the CREATE TABLE statement:

CREATE TABLE {table_name} AS (
  {select_query}
);

Here, {select_query} is any SELECT query that can be run in your database. For instance, say you wanted to create a table based on the products table that only had products from the year 2014. Suppose the title of the table is products_2014; you could write the following query:

CREATE TABLE products_2014 AS ( 
  SELECT *
FROM products 
WHERE year=2014
);

Running this SQL will yield the following result:

Figure 2.34: CREATE from a SELECT query

Figure 2.34: CREATE from a SELECT query

This can be done with any query, and the table will inherit all the properties of the output query.

PostgreSQL also provides another way to create a table from a query, which utilizes a SELECT … INTO … syntax. An example of this syntax is shown below:

SELECT * 
INTO products_2014
FROM products 
WHERE year=2014;

Note

Before running this query, please check the table list in the sqlda database and make sure this table does not exist. If it does, please drop the table from the console.

This query achieves the same result as the CREATE … AS statement. In this book, you will use the CREATE … AS statement because the syntax inside the parenthesis is a complete SELECT statement, thus it is easier to create and modify the query without changing the structure of the statement. You can choose either based on your personal preference.

One issue with creating a table with a query is that the data types of the query are not explicitly specified and can be confusing. Luckily, PostgreSQL stores the table definitions in a set of system tables, and you can read the table definition from the system tables. For example, to check the column definitions of the products_2014 table, you can run the following SQL:

SELECT COLUMN_NAME, DATA_TYPE 
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = 'products_2014';

From the result, you can identify all the columns and their data types in the products_2014 table:

Figure 2.35: Query table definition from information schema

Figure 2.35: Query table definition from information schema