Cracking the Data Science Interview

By: Leondra R. Gonzalez, Aaren Stubberfield

Overview of this book

The data science job market is saturated with professionals of all backgrounds, including academics, researchers, bootcampers, and Massive Open Online Course (MOOC) graduates. This poses a challenge for companies seeking the best person to fill their roles. At the heart of this selection process is the data science interview, a crucial juncture that determines the best fit for both the candidate and the company. Cracking the Data Science Interview provides expert guidance on approaching the interview process with full preparation and confidence. Starting with an introduction to the modern data science landscape, you’ll find tips on job hunting, resume writing, and creating a top-notch portfolio. You’ll then advance to topics such as Python, SQL databases, Git, and productivity with shell scripting and Bash. Building on this foundation, you'll delve into the fundamentals of statistics, laying the groundwork for pre-modeling concepts, machine learning, deep learning, and generative AI. The book concludes by offering insights into how best to prepare for the intensive data science interview. By the end of this interview guide, you’ll have gained the confidence, business acumen, and technical skills required to distinguish yourself within this competitive landscape and land your next data science job.

Preface

Who this book is for

What this book covers

To get the most out of this book


Part 1: Breaking into the Data Science Field

Chapter 1: Exploring Today’s Modern Data Science Landscape

What is data science?

Exploring the data science process

Dissecting the flavors of data science

Reviewing career paths in data science

Tackling the experience bottleneck

Understanding expected skills and competencies

Exploring the evolution of data science

Summary

References

Chapter 2: Finding a Job in Data Science

Searching for your first data science job

Constructing the Golden Resume

Prepping for landing the interview

References

Part 2: Manipulating and Managing Data

Chapter 3: Programming with Python

Using variables, data types, and data structures

Indexing in Python

Using string operations

Using Python control statements, loops, and list comprehensions

Using user-defined functions

Handling files in Python

Wrangling data with pandas

Summary

References

Chapter 4: Visualizing Data and Data Storytelling

Understanding data visualization

Surveying tools of the trade

Developing dashboards, reports, and KPIs

Developing charts and graphs

Applying scenario-based storytelling

Summary

Chapter 5: Querying Databases with SQL

Introducing relational databases

Mastering SQL basics

Aggregating data with GROUP BY and HAVING

Creating fields with CASE WHEN

Analyzing subqueries and CTEs

Merging tables with joins

Calculating window functions

Approaching complex queries

Summary

Chapter 6: Scripting with Shell and Bash Commands in Linux

Introducing operating systems

Navigating system directories

Filing and directory manipulation

Scripting with Bash

Introducing control statements

Creating functions

Processing data and pipelines

Using cron

Summary

Chapter 7: Using Git for Version Control

Introducing repositories (repos)

Creating a repo

Detailing the Git workflow for data scientists

Using Git tags for data science

Understanding common operations

Summary

Part 3: Exploring Artificial Intelligence

Chapter 8: Mining Data with Probability and Statistics

Describing data with descriptive statistics

Introducing populations and samples

Understanding the Central Limit Theorem (CLT)

Shaping data with sampling distributions

Testing hypotheses

Understanding Type I and Type II errors

Summary

References

Chapter 9: Understanding Feature Engineering and Preparing Data for Modeling

Understanding feature engineering

Applying data transformations

Engineering categorical data and other features

Performing feature selection

Working with imbalanced data

Reducing the dimensionality

Summary

Chapter 10: Mastering Machine Learning Concepts

Introducing the machine learning workflow

Getting started with supervised machine learning

Getting started with unsupervised machine learning

Summarizing other notable machine learning models

Understanding the bias-variance trade-off

Tuning with hyperparameters

Summary

Chapter 11: Building Networks with Deep Learning

Introducing neural networks and deep learning

Weighing in on weights and biases

Activating neurons with activation functions

Unraveling backpropagation

Using optimizers

Understanding embeddings

Listing common network architectures

Introducing GenAI and LLMs

Summary

Chapter 12: Implementing Machine Learning Solutions with MLOps

Introducing MLOps

Understanding data ingestion

Learning the basics of data storage

Reviewing model development

Packaging for model deployment

Deploying a model with containers

Validating and monitoring the model

Using Azure ML for MLOps

Summary

Part 4: Getting the Job

Chapter 13: Mastering the Interview Rounds

Mastering early interactions with the recruiter

Mastering the different interview stages

Summary

References

Chapter 14: Negotiating Compensation

Understanding the compensation landscape

Negotiating the offer

Summary

Final words

Index


Aggregating data with GROUP BY and HAVING

You should already be familiar with aggregation from the discussion of pandas in Chapter 3. As in Python, aggregation in SQL means summarizing or grouping data so that it is more useful, understandable, and manageable. GROUP BY and HAVING are the two key SQL clauses for doing this. GROUP BY forms the groups, and HAVING filters those groups after the aggregation has been computed.
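Here is a minimal sketch of both clauses. It runs the SQL through Python's built-in sqlite3 module so that it is self-contained. The employees table and its rows are hypothetical. GROUP BY collapses the six rows into one row per department, and HAVING then discards any department with fewer than two employees.

```python
import sqlite3

# In-memory database with a small, hypothetical "employees" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Analytics", 95000),
        ("Ben", "Analytics", 105000),
        ("Cal", "Engineering", 120000),
        ("Dee", "Engineering", 110000),
        ("Eve", "Engineering", 130000),
        ("Fay", "Marketing", 70000),
    ],
)

# GROUP BY builds one group per department; COUNT(*) counts rows per group.
# HAVING runs after aggregation, so it can filter on COUNT(*).
query = """
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) >= 2
ORDER BY department;
"""
rows = conn.execute(query).fetchall()
for department, head_count in rows:
    print(department, head_count)
conn.close()
```

Marketing has only one employee, so HAVING removes it from the result. A WHERE clause could not do this, because WHERE filters individual rows before any groups exist.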

The GROUP BY clause

Much like groupby() in pandas, the GROUP BY clause in SQL gathers rows that share the same values in one or more columns into groups. Aggregate functions (such as COUNT, SUM, AVG, MAX, and MIN) then return one summary value per group. The syntax is as follows:

SELECT column1, column2, ..., columnN, aggregate_function(columnX)
FROM table_name
GROUP BY column1, column2, ..., columnN;
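The template can be made concrete with a minimal sketch that runs SQL through Python's built-in sqlite3 module. The sales table, its columns, and its rows are hypothetical. The query groups by two columns and applies all five aggregate functions named above.

Both non-aggregated columns in the SELECT list (region and product) also appear in GROUP BY. Standard SQL and engines such as PostgreSQL reject a query that omits one of them. SQLite is more lenient about this, so it is safest not to rely on that behavior.

```python
import sqlite3

# Hypothetical sales data: one row per order line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 3),
        ("East", "Widget", 5),
        ("East", "Gadget", 2),
        ("West", "Widget", 4),
        ("West", "Gadget", 6),
        ("West", "Gadget", 2),
    ],
)

# Each output row is one unique (region, product) pair, followed by
# COUNT, SUM, AVG, MAX, and MIN of units within that pair.
query = """
SELECT region, product,
       COUNT(*), SUM(units), AVG(units), MAX(units), MIN(units)
FROM sales
GROUP BY region, product
ORDER BY region, product;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```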

Aggregated values are easiest to work with when you give them an alias. An alias is a temporary name for a calculated or aggregated column, or for a table or subquery. To create one, use the AS keyword, like so:

SELECT column1...
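Here is a short sketch of both kinds of alias, using Python's sqlite3 module and a hypothetical sales table. AS names the aggregated column total_units, and s is an alias for the table. Referring to a column alias in ORDER BY is widely supported. Many engines (PostgreSQL, for example) do not accept one in WHERE or HAVING.

```python
import sqlite3

# Hypothetical sales table for illustrating aliases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 3), ("East", 2), ("West", 4), ("West", 6), ("North", 1)],
)

# "total_units" is a column alias for SUM(s.units); ORDER BY refers to it.
# "s" is a table alias, so s.region means the region column of sales.
query = """
SELECT s.region, SUM(s.units) AS total_units
FROM sales AS s
GROUP BY s.region
ORDER BY total_units DESC;
"""
cursor = conn.execute(query)
# cursor.description holds one tuple per result column; item 0 is its name.
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names)
for row in rows:
    print(row)
conn.close()
```

The alias also becomes the column name that client code sees, which is why it shows up in cursor.description.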
