Book Image

SQL for Data Analytics - Third Edition

By : Jun Shan, Matt Goldwasser, Upom Malik, Benjamin Johnston
Book Image

SQL for Data Analytics - Third Edition

By: Jun Shan, Matt Goldwasser, Upom Malik, Benjamin Johnston

Overview of this book

Every day, businesses operate around the clock, and a huge amount of data is generated at a rapid pace. This book helps you analyze this data and identify key patterns and behaviors that can help you and your business understand your customers at a deep, fundamental level. SQL for Data Analytics, Third Edition is a great way to get started with data analysis, showing how to effectively sort and process information from raw data, even without any prior experience. You will begin by learning how to form hypotheses and generate descriptive statistics that can provide key insights into your existing data. As you progress, you will learn how to write SQL queries to aggregate, calculate, and combine SQL data from sources outside of your current dataset. You will also discover how to work with advanced data types, like JSON. By exploring advanced techniques, such as geospatial analysis and text analysis, you will be able to understand your business at a deeper level. Finally, the book lets you in on the secret to getting information faster and more effectively by using advanced techniques like profiling and automation. By the end of this book, you will be proficient in the efficient application of SQL techniques in everyday business scenarios and looking at data with the critical eye of analytics professional.
Table of Contents (11 chapters)
9
9. Using SQL to Uncover the Truth: A Case Study

Loading the Sample Datasets – Linux

Most exercises in this book use a sample database, sqlda, which contains fabricated data for a fictional electric vehicle company called ZoomZoom. Set it up by performing the following steps:

  1. Switch to the postgres user by typing the following command in the terminal. Press the return key to execute it:
    sudo su postgres

You should see your shell change as follows:

Figure 0.26: Loading the sample datasets on Linux

Figure 0.26: Loading the sample datasets on Linux

  1. Type or paste the following command to create a new database called sqlda. Press the return key to execute it:
    createdb sqlda

You can then type the psql command to enter the PostgreSQL shell, followed by \l (a backslash followed by lowercase L) to check if the database was successfully created:

Figure 0.27: Accessing the PostgreSQL shell on Linux

Figure 0.27: Accessing the PostgreSQL shell on Linux

Enter \q and then press the return key to quit the PostgreSQL shell.

  1. Download the data.dump file from the Datasets folder in the GitHub repository of this book by running this command:
    wget "https://github.com/PacktPublishing/SQL-for-Data-Analytics-Third-Edition/tree/main/Datasets/data.dump"
  2. Navigate to the folder where you have downloaded the file using the cd command. Then, type the following command:
    psql -d sqlda < data.dump
  3. Then, wait for the dataset to be imported:
Figure 0.28: Importing the dataset on Linux

Figure 0.28: Importing the dataset on Linux

  1. To test whether the dataset was imported correctly, type ppsql postgres and then press the return key to enter the PostgreSQL shell. Then, run \c sqlda followed by \dt to see the list of tables within the database:
Figure 0.29: Validating the import on Linux

Figure 0.29: Validating the import on Linux

Note

You are importing the database using the postgres superuser for demonstration purposes only. It is advised in production environments to use a separate account.