Book Image

Hands-On Data Science with the Command Line

By : Jason Morris, Chris McCubbin, Raymond Page
Book Image

Hands-On Data Science with the Command Line

By: Jason Morris, Chris McCubbin, Raymond Page

Overview of this book

The Command Line has been in existence on UNIX-based OSes in the form of Bash shell for over 3 decades. However, very little is known to developers as to how command-line tools can be OSEMN (pronounced as awesome and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) for carrying out simple-to-advanced data science tasks at speed. This book will start with the requisite concepts and installation steps for carrying out data science tasks using the command line. You will learn to create a data pipeline to solve the problem of working with small-to medium-sized files on a single machine. You will understand the power of the command line, learn how to edit files using a text-based and an. You will not only learn how to automate jobs and scripts, but also learn how to visualize data using the command line. By the end of this book, you will learn how to speed up the process and perform automated tasks using command-line tools.
Table of Contents (8 chapters)

Data Science at the Command Line and Setting It Up

"In the beginning... was the command line" Years ago, we didn't have fancy frameworks that handled our distributed computing for us, or applications that could read files intelligently and give us accurate results. If we did, it was very expensive or only worked for a small problem set, very few people had access to this technology, and it was mostly proprietary.

For newcomers to the world of data science, you might have used the command line for a small number of things. Maybe you moved a file from one place to another using mv, or read a file using cat. Or you might have never used the command line at all, or at least not for data science. In this book, we hope to show you a number of tools and ways you can perform some everyday tasks that you can do locally, without using today's buzzword framework.

We created this book for the folks who have little to no experience with the command line, and perform a lot of data extraction, modelling, parsing, and analyzing. This doesn't mean that if you do have a lot of command-line experience (a lot of DevOps and systems folks do), you shouldn't read this book. In fact, you might pick up a couple commands and techniques that you haven't used before.

In this chapter, we will cover the following topics:

  • The history of the command line
  • Language-focused shells
  • Why use the command line?

We will also walk through the setup and configuration of the command line with the following operating systems:

  • Windows 10
  • Mac OS X
  • Ubuntu Linux

If you are running a different operating system, we suggest obtaining an instance from a cloud provider or using the Docker container that's provided in this book.