Book Image

SQL for Data Analytics - Third Edition

By : Jun Shan, Matt Goldwasser, Upom Malik, Benjamin Johnston
Book Image

SQL for Data Analytics - Third Edition

By: Jun Shan, Matt Goldwasser, Upom Malik, Benjamin Johnston

Overview of this book

Every day, businesses operate around the clock, and a huge amount of data is generated at a rapid pace. This book helps you analyze this data and identify key patterns and behaviors that can help you and your business understand your customers at a deep, fundamental level. SQL for Data Analytics, Third Edition is a great way to get started with data analysis, showing how to effectively sort and process information from raw data, even without any prior experience. You will begin by learning how to form hypotheses and generate descriptive statistics that can provide key insights into your existing data. As you progress, you will learn how to write SQL queries to aggregate, calculate, and combine SQL data from sources outside of your current dataset. You will also discover how to work with advanced data types, like JSON. By exploring advanced techniques, such as geospatial analysis and text analysis, you will be able to understand your business at a deeper level. Finally, the book lets you in on the secret to getting information faster and more effectively by using advanced techniques like profiling and automation. By the end of this book, you will be proficient in the efficient application of SQL techniques in everyday business scenarios and looking at data with the critical eye of analytics professional.
Table of Contents (11 chapters)
9
9. Using SQL to Uncover the Truth: A Case Study

Updating Tables

Over time, you may also need to modify a table by adding columns, adding new data, or updating existing rows. This section will help you understand how to do this.

Adding and Removing Columns

To add new columns to an existing table, you use the ALTER TABLE … ADD COLUMN statement, as shown in the following query:

ALTER TABLE {table_name}
ADD COLUMN {column_name} {data_type};

For example, if you wanted to add a new column to the products_2014 table that you will use to store the products' weights in kilograms called weight, you could do this by using the following query:

ALTER TABLE products_2014 
ADD COLUMN weight INT;

This query will make a new column called weight in the products_2014 table and will give it the integer data type so that only integers can be stored in it.

Figure 2.36: ALTER statement that adds a column to a table

Figure 2.36: ALTER statement that adds a column to a table

If you want to remove a column from a table, you can use the ALTER TABLE … DROP COLUMN statement:

ALTER TABLE {table_name} 
DROP COLUMN {column_name};

Here, {table_name} is the name of the table you want to change, and {column_name} is the name of the column you want to drop. Imagine that you decide to delete the weight column you just created. You could get rid of it using the following query:

ALTER TABLE products_2014 
DROP COLUMN weight;

As you can see from the screenshot below, the column is dropped:

Figure 2.37: ALTER statement that drops a column from a table

Figure 2.37: ALTER statement that drops a column from a table

Adding New Data

You can add new data to a table using several methods in SQL. One of those methods is to simply insert values straight into a table using the INSERT INTO… VALUES statement. It has the following structure:

INSERT INTO {table_name} (
  {column_1], {column_2}, …{column_last}
)
VALUES (
  {column_value_1}, {column_value_2}, … {column_value_last}
);

Here, {table_name} is the name of the table you want to insert your data into, {column_1}, {column_2}, … {column_last} is a list of the columns whose values you want to insert, and {column_value_1}, {column_value_2}, … {column_value_last} is the list of values you want to insert into the table. If a column in the table is not put into the INSERT statement, the column is assumed to have a NULL value.

For example, say you want to insert a new entry for a scooter into the products_2014 table. This can be done with the following query:

INSERT INTO products_2014 ( 
product_id, model, year, 
  product_type, base_msrp,
  production_start_date, production_end_date
)
VALUES (
  13, 'Nimbus 5000', 2014,
  'scooter', 500.00,
  '2014-03-03', '2020-03-03'
);

This query adds a new row to the products_2014 table accordingly. You can run a SELECT query to see all the rows in the table:

Figure 2.38: INSERT statement adding one row to table

Figure 2.38: INSERT statement adding one row to table

Another way to insert data into a table is to use the INSERT statement with a SELECT query using the following syntax:

INSERT INTO {table_name} ({column_1], {column_2}, …{column_last})
{select_query};

Here, {table_name} is the name of the table into which you want to insert the data, {column_1}, {column_2}, … {column_last} is a list of the columns whose values you want to insert, and {select query} is a query with the same structure as the values you want to insert into the table.

Take the example of the products_2014 table. You have created it with a SELECT query with one row. Earlier in this section, you have inserted one row into it. So, now it contains two rows. If you also want to insert the products from 2016, you could use the following query, which inserts one more row into the table:

INSERT INTO products_2014( 
  product_id, model, year, product_type, base_msrp,
  production_start_date, production_end_date
) 
SELECT* 
FROM products 
WHERE year=2016;

This query produces the following result:

Figure 2.39: The Products_2014 table after a successful INSERT INTO query

Figure 2.39: The Products_2014 table after a successful INSERT INTO query

Now it contains three rows from three different ways of inserting data: one row from CREATE as the result of a SELECT query, one row from an INSERT with data, and one row from INSERT using the result of a SELECT query.

Next, you will learn how to update the content in a row.

Updating Existing Rows

Sometimes, you may need to update the values of the data present in a table. To do this, you can use the UPDATE statement:

UPDATE {table_name} SET 
  {column_1} = {column_value_1},
  {column_2} = {column_value_2},
  …
  {column_last} = {column_value_last} 
WHERE {conditional};

Here, {table_name} is the name of the table with data that will be changed, {column_1}, {column_2},… {column_last} is the list of columns whose values you want to change, {column_value_1}, {column_value_2}, … {column_value_last} is the list of new values you want to update into those columns, and {WHERE} is a conditional statement like the one you would find in a SELECT query.

To illustrate its use of the UPDATE statement, imagine that, for the rest of the year, the company has decided to sell all scooter models before 2018 for $299.99. You could change the data in the products_2014 table using the following query:

UPDATE Products_2014 SET
  base_msrp = 299.99 
WHERE product_type = 'scooter' 
AND year<2018;

This query produces the following output. You can see that the base_msrp column of all three records has been updated to 299.99 because they are all scooters manufactured before 2018.

Figure 2.40: Successful update of the products_2014 table

Figure 2.40: Successful update of the products_2014 table

In the following exercise, you will take a closer look at how to use UPDATE statements in a SQL database.

Exercise 2.04: Updating the Table to Increase the Price of a Vehicle

In this exercise, you will update the data in a table using the UPDATE statement. Due to an increase in the cost of the rare metals needed to manufacture an electric vehicle, the 2022 Model Chi will need to undergo a price hike of 10%. The current price is $95,000.

In a real-world scenario, you will update the products table to increase the price of this product. However, because you will use the same sqlda database throughout the book, it would be better to keep the values in the original tables unchanged so that your SQL results remain consistent. For this reason, you will create new tables for all the INSERT, ALTER, UPDATE, DELETE, and DROP statement examples.

Perform the following steps to complete the exercise:

  1. Open pgAdmin, connect to the sqlda database, and open SQL query editor.
  2. Run the following query to create a product_2022 table from the products table:
    CREATE TABLE products_2022 AS ( 
    SELECT * 
    FROM products 
    WHERE year=2022
    );
  3. Run the following query to update the price of Model Chi by 10% in the products_2022 table:
    UPDATE Products_2022 SET 
      base_msrp = base_msrp*1.10 
    WHERE model='Model Chi' 
    AND year=2022;
  4. Write the SELECT query to check whether the price of Model Chi in 2022 has been updated:
    SELECT * 
    FROM products_2022 
    WHERE model='Model Chi' 
    AND year=2022;

The following is the output of the preceding code:

Figure 2.41: The updated price of Model Chi in 2022

Figure 2.41: The updated price of Model Chi in 2022

As you see from the output, the price of Model Chi is now $104,500; it was previously $95,000.

Note

To access the source code for this specific section, please refer to https://packt.link/fOQgA.

In this exercise, you learned how to update a table using the UPDATE statement. Next, you will learn how to delete data from tables and drop tables.

Deleting Data and Tables

You often discover that data in a table is out of date and, therefore, can no longer be used. At such times, you might need to delete data from a table.

Deleting Values from a Row

Often, you might be interested in deleting a value from a row. The easiest way to accomplish this is to use the UPDATE structure that has already been discussed, and by setting the column value to NULL:

UPDATE {table_name} SET 
  {column_1} = NULL,
  {column_2} = NULL,
  …
  {column_last} = NULL 
WHERE {conditional};

Here, {table_name} is the name of the table with the data that needs to be changed, {column_1}, {column_2},… {column_last} is the list of columns whose values you want to delete, and {WHERE} is a conditional statement like the one you would find in a SELECT query.

For instance, you have the wrong email address on file for the customer with the customer ID equal to 3. To fix that, you can use the following query:

UPDATE customers SET
  email = NULL 
WHERE customer_id=3;

However, there might be cases where you might need to delete rows from a table. For example, in the database, you have a row labeled test customer, which is no longer needed and needs to be deleted. In the next section, you will learn how to delete rows from a table.

Deleting Rows from a Table

Deleting a row from a table can be done using the DELETE statement, which looks like this:

DELETE FROM {table_name} 
WHERE {condition};

For instance, you must delete the products whose product_type is scooter from the products_2014 table. To do that, you can use the following query:

DELETE FROM products_2014 
WHERE product_type='scooter';

In the past few sections, you have inserted three products into this table, all scooters. After running the DELETE statement, PostgreSQL shows that there was no product in this table anymore as all records are deleted.

Figure 2.42: DELETE statement example

Figure 2.42: DELETE statement example

If you want to delete all the data in the products_2014 table without deleting the table, you could write the following query, which is DELETE without any conditions:

DELETE FROM products_2014;

Alternatively, if you want to delete all the data in a query without deleting the table, you could use the TRUNCATE keyword like so:

TRUNCATE TABLE products_2014;

Now you have learned how to delete rows from a table, the next section will teach you how to delete a table entirely.

Deleting Tables

To delete all the data in a table and the table itself, you can just use the DROP TABLE statement with the following syntax:

DROP TABLE {table_name};

Here, {table_name} is the name of the table you want to delete. If you wanted to delete all the data in the products_2014 table along with the table itself, you would write the following:

DROP TABLE products_2014;

If you want to read from this table, you will receive an error message from PostgreSQL telling you that the table does not exist:

Figure 2.43: DROP statement example

Figure 2.43: DROP statement example

As seen in Figure 2.46, once the table is dropped, all aspects of this table are gone, and you cannot perform any operations on it. For example, if you try to run the DROP TABLE products_2014 statement again, you will run into an error. A PostgreSQL enhancement of the DROP statement is DROP TABLE IF EXISTS. This statement will check the existence of the table. If the table is not in the database, PostgreSQL will skip this statement with a notification, but without reporting an error, as shown below:

DROP TABLE IF EXISTS products_2014;

Figure 2.44: DROP TABLE IF EXISTS statement example

Figure 2.44: DROP TABLE IF EXISTS statement example

DROP TABLE IF EXISTS is helpful if you want to automate SQL script execution. One common usage scenario is to use it before the CREATE TABLE statement. If the table already exists, your CREATE TABLE statement will fail and raise an error. But if your DROP TABLE IF EXISTS statement is before your CREATE TABLE statement, pre-existing tables would have been dropped before you tried to recreate them. This is useful in automated computing operations where you constantly create temporary tables that you do not need after the current computing job is completed. The catch is that you must make sure that the table is truly temporary and is not used by anyone else. Otherwise, you may accidentally drop tables that are used by some other users without knowing. For this reason, the DROP TABLE IF EXISTS statement is usually only used in environments designated for automated data processing.

Now test what you have learned by performing an exercise to delete or drop the table using the DROP TABLE statement.

Exercise 2.05: Deleting an Unnecessary Reference Table

In this exercise, you will learn how to delete a table using SQL. For instance, the marketing team has finished analyzing the potential number of customers they have in every state, and they no longer need the state_populations table. To save space in the database, delete the table. If you have not created this table, please go back to the Simple CREATE Statement section in this chapter and create it now.

Perform the following steps to complete the exercise:

  1. Open pgAdmin, connect to the sqlda database, and open SQL query editor.
  2. Run the following query to drop the state_populations table:
    DROP TABLE state_populations;
  3. Check that the state_populations table has been deleted from the database.
  4. Since the table has just been dropped, a SELECT query on this table throws an error, as expected:
    SELECT * 
    FROM state_populations;

You will find the error shown in the following figure:

Figure 2.45: Error shown as the state_populations table was dropped

Figure 2.45: Error shown as the state_populations table was dropped

  1. Also, drop the products_2022 table that was created above to keep the database clean:
    DROP TABLE products_2022;

    Note

    To access the source code for this specific section, please refer to https://packt.link/kJVag.

In this exercise, you learned how to delete a table using the DROP TABLE statement. In the next activity, you will test the skills you learned by creating and modifying tables using SQL.

Activity 2.02: Creating and Modifying Tables for Marketing Operations

In this activity, you will test your ability to create and modify tables using SQL.

You did a great job of pulling data for the marketing team. However, the marketing manager, who you helped, realized that they had made a mistake. It turns out that instead of just the query, the manager needs to create a new table in the company's analytics database. Furthermore, they need to make some changes to the data that is present in the customers table. It is your job to help the marketing manager with the table:

  1. Open pgAdmin, connect to the sqlda database and open SQL query editor. Create a new table called customers_nyc that pulls all the rows from the customers table where the customer lives in New York City in the state of New York.
  2. Delete all customers in postal code 10014 from the new table. Due to local laws, they will not be eligible for marketing.
  3. Add a new text column called event.
  4. Set the value of the event column to thank-you party.

The following is the expected output:

Figure 2.46: The customers_nyc table with event set to thank-you party

Figure 2.46: The customers_nyc table with event set to thank-you party

You tell the manager that you have completed these steps. He tells the marketing operations team, who then uses the data to launch a marketing campaign. The marketing manager then asks you to delete the customers_nyc table.

Note

To access the source code for this specific section, please refer to https://packt.link/xeMaT.

In this activity, you used different CRUD operations to modify a table as requested by the marketing manager. You will now come full circle to explore how SQL and analytics connect.

Note

The solution for this activity can be found via this link.