Preface

The Structured Query Language, better known as SQL, has been the main workhorse for data professionals over the past five decades. It is one of the most sought-after skills in the job market and is frequently tested in various job interviews. It is widely taught in colleges around the world, from computer science majors to business/statistics schools, and from undergraduate studies to doctoral programs. Yet, as popular as it is, there is a common complaint that the teaching and studying of SQL are isolated from the actual usage patterns in the real world. The authors have personally heard many new grads complaining that although they learned SQL in school, they still don’t know how to start writing SQL statements when they are assigned to their first projects at work. The traditional way of teaching is very well structured and detail-oriented but focuses on functionalities instead of the purpose and usage scenarios, creating a disconnect for beginners.

This disconnect is exactly why this book is here. From the very first edition, this book has focused on one goal: to help beginners understand not only what is in SQL, but also when and how to use it. With this in mind, the contents of this book have been arranged based on the natural flow of everyday data usage, instead of following the semantic definitions of relational databases. The book also utilizes a combination of discussions, hands-on exercises, and exploratory activities, with GitHub-based sample code, to get you more involved in the learning process. Based on the overwhelmingly (thankfully!) positive feedback we have received, this goal has been well achieved.

As positive as the response has been, ever since the third edition of this book, we have also noticed opportunities to improve. During the 2020 pandemic, the authors were engaged in several semesters of remote teaching of SQL, and had the opportunity to test different arrangements of SQL content and learning activities. It was observed that while students benefit from a mixture of lectures and hands-on exercises, the most welcomed approach is to have hands-on exercises immediately following lectures, and then have a related activity shortly after the exercises. As such, in this fourth edition, we have adjusted the topics to follow a learning path that is more natural for beginners, which was derived from our experience in remote teaching. We have added hands-on exercises right after each topic. Finally, as the chapters wrap up, we offer activities without step-by-step instructions for you to practice the skills you just obtained. We hope this new format makes the learning process easier.

SQL is already 55 years old, even older than the authors! Yet it is still evolving, and still dominating the data management practices. The authors, having been in the data management field for so long, are also actively learning and adapting. This book is also a part of this learning/adapting process. We would like to invite you to join us in this journey, and welcome your feedback so that this book can evolve too. Bon voyage!

Who this book is for

This book is primarily for beginners who would like to teach themselves how to use SQL for data analytics, but could also be useful for analytics professionals who would like to learn specific features and functionalities of SQL.

What this book covers

Chapter 1, Introduction to Data Management Systems, introduces how to represent real-world objects using data and provides a foundational understanding of relational databases and SQL. It also guides you through setting up a PostgreSQL database on your machine to store, organize, and analyze data effectively.

Chapter 2, Creating Tables with Solid Structures, introduces the fundamentals of working with relational data using SQL, including creating tables, inserting data, and performing CRUD operations. It also covers defining data types, creating tables from existing datasets, and deleting tables, giving you a solid foundation for managing the full life cycle of relational data.

Chapter 3, Exchanging Data Using COPY, introduces how to export data from and import data into a PostgreSQL database. These skills enable you to transfer data between PostgreSQL and external filesystems for tasks such as backup, migration, and integration.

Chapter 4, Manipulating Data with Python, introduces the basics of getting started with Python and using it to manage data. You will learn how to set up a Python environment, connect to a relational database, and perform data manipulation tasks within the database.

Chapter 5, Presenting Data with SELECT, introduces how to use SQL SELECT expressions to retrieve data from a database. It also covers filtering query results, enabling you to extract specific subsets of data based on defined conditions.

Chapter 6, Transforming and Updating Data, introduces how to modify existing data and table structures in a database to reflect changes in real-world scenarios. It also covers applying data transformation functions and creating user-defined functions for more advanced and customized data processing.

Chapter 7, Defining Datasets from Existing Datasets, introduces techniques for creating derived datasets, joining multiple tables, and performing set operations in SQL. These skills enable you to analyze complex relationships and combine data from various sources to support advanced data exploration and research.

Chapter 8, Aggregating Data with GROUP BY, introduces how to aggregate data in SQL to summarize and extract key insights from raw datasets. It covers the use of GROUP BY and HAVING clauses to organize and filter aggregated results, helping you understand overall patterns and trends in your data.

Chapter 9, Inter-Row Operation with Window Functions, introduces window functions in SQL, which allow you to perform calculations across rows related to the current row without collapsing the result set. It also covers advanced window definitions, enabling you to analyze row positions and relationships within partitions of your dataset.

Chapter 10, Performant SQL, introduces how PostgreSQL scans databases and indexes to retrieve data efficiently. Understanding these mechanisms will help you write optimized queries that improve database performance.

Chapter 11, Processing JSON and Arrays, introduces JSON and array data types in PostgreSQL. You will learn how to parse and work with these complex types, expanding your ability to manage data beyond traditional relational formats.

Chapter 12, Advanced Data Types: Date, Text, and Geospatial, introduces how to work with specialized data types such as date and time, text, and geospatial data in PostgreSQL. By mastering their unique processing rules, you’ll enhance your ability to analyze complex real-world scenarios within a relational database.

Chapter 13, Inferential Statistics Using SQL, introduces key inferential statistical concepts, including hypothesis testing, confidence intervals, and regression analysis. It covers how to use SQL for data processing alongside statistical knowledge to gain insights and support data-informed decisions.

Chapter 14, A Case Study for Analytics Using SQL, introduces the fundamentals of data analytical systems and how to apply SQL for data analysis. You will learn how to connect your SQL skills with a real-world case study and gain an overview of the modern data management workflow.

To get the most out of this book

To get the most out of this book, do the following:

Combine the concepts you learn in this book with real-world examples that you run into at school or work
Access the SQL code in the GitHub repository of this book

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/SQL-for-Data-Analytics-Fourth-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781836646259.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Here is an example of using the \COPY command to copy rows into a table from a file on the local machine.”

A block of code is set as follows:

SELECT product_id, model, base_msrp FROM products;

Any command-line input or output is written as follows:

     product_name      | discount | discounted_price
-----------------------+----------+------------------
 Lemon                 |      0.9 |          359.991
 Lemon Limited Edition |      0.9 |          719.991
 Lemon                 |      0.9 |          449.991

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com/.

Share your thoughts

Once you’ve read SQL for Data Analytics, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Tech Concepts

Programming languages

Tech Tools

Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

50+ new titles added per month and exclusive early access to books as they are being written.

SQL for Data Analytics - Fourth Edition

By : Jun Shan, Benjamin Johnston, Haibin Li, Matt Goldwasser, Upom Malik

SQL for Data Analytics

By: Jun Shan, Benjamin Johnston, Haibin Li, Matt Goldwasser, Upom Malik

Overview of this book

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share your thoughts

SQL for Data Analytics - Fourth Edition

By : Jun Shan, Benjamin Johnston, Haibin Li, Matt Goldwasser, Upom Malik

SQL for Data Analytics

By: Jun Shan, Benjamin Johnston, Haibin Li, Matt Goldwasser, Upom Malik

Overview of this book

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share your thoughts

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access