Cracking the Data Science Interview

By: Leondra R. Gonzalez, Aaren Stubberfield

Overview of this book

The data science job market is saturated with professionals of all backgrounds, including academics, researchers, bootcampers, and Massive Open Online Course (MOOC) graduates. This poses a challenge for companies seeking the best person to fill their roles. At the heart of this selection process is the data science interview, a crucial juncture that determines the best fit for both the candidate and the company. Cracking the Data Science Interview provides expert guidance on approaching the interview process with full preparation and confidence. Starting with an introduction to the modern data science landscape, you’ll find tips on job hunting, resume writing, and creating a top-notch portfolio. You’ll then advance to topics such as Python, SQL databases, Git, and productivity with shell scripting and Bash. Building on this foundation, you'll delve into the fundamentals of statistics, laying the groundwork for pre-modeling concepts, machine learning, deep learning, and generative AI. The book concludes by offering insights into how best to prepare for the intensive data science interview. By the end of this interview guide, you’ll have gained the confidence, business acumen, and technical skills required to distinguish yourself within this competitive landscape and land your next data science job.
Table of Contents (21 chapters)
Part 1: Breaking into the Data Science Field
Part 2: Manipulating and Managing Data
Part 3: Exploring Artificial Intelligence
Part 4: Getting the Job

Aggregating data with GROUP BY and HAVING

Aggregation is a concept with which you should already be familiar thanks to the discussion of Python using pandas in Chapter 3. Just like in Python, aggregation in SQL is about summarizing or grouping data in a way that makes it more useful, understandable, and manageable. GROUP BY and HAVING are two crucial components in SQL that help accomplish this.

The GROUP BY statement

Much like grouping in Python with pandas, the GROUP BY statement in SQL collapses rows that share the same values in one or more columns into a single row per group. It is typically paired with aggregate functions (such as COUNT, SUM, AVG, MAX, and MIN), which compute one value for each group. Thus, using GROUP BY should be familiar to you! The syntax is as follows:

SELECT column1, column2, columnN, aggregate_function(columnX)
FROM table
GROUP BY column(s);
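
To see this syntax in action, here is a minimal sketch that runs a GROUP BY query against an in-memory SQLite database using Python's built-in sqlite3 module. The orders table, its customer and amount columns, and the sample rows are hypothetical and exist only for illustration; they are not taken from the book.

```python
import sqlite3

# Hypothetical in-memory table: the names "orders", "customer", and "amount"
# are illustrative assumptions, not part of the original text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [
        ("alice", 10.0),
        ("bob", 5.0),
        ("alice", 20.0),
        ("bob", 7.5),
        ("carol", 3.0),
    ],
)

# GROUP BY returns one row per distinct customer; COUNT and SUM are
# computed separately within each group. ORDER BY makes the row order stable.
query = """
    SELECT customer, COUNT(*), SUM(amount)
    FROM orders
    GROUP BY customer
    ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Five input rows collapse into three output rows, one per customer, each carrying that customer's order count and total amount. This mirrors what a pandas groupby followed by an aggregation would produce.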

Aggregate values are best managed by using aliases. An alias is a nickname for a calculated or aggregated field or a temporary table. To assign one, use the AS keyword, like so:

SELECT column1...