Jupyter Cookbook

By: Dan Toomey

Overview of this book

Jupyter has garnered strong interest in the data science community of late, as it makes common data processing and analysis tasks much simpler. This book is for data science professionals who want to master Jupyter and create efficient, easy-to-share scientific applications. The book starts with recipes on installing and running the Jupyter Notebook system on various platforms and configuring the packages that can be used with it. You will then see how to use different programming languages and frameworks, such as Python, R, Julia, JavaScript, Scala, and Spark, in your Jupyter Notebook. The book contains recipes on building interactive widgets to manipulate and visualize data in real time, sharing your code, creating a multi-user environment, and organizing your notebook. You will then get hands-on experience with JupyterLab, microservices, and deploying them on the web. By the end of this book, you will have taken your knowledge of Jupyter to the next level and be able to perform the key tasks associated with it.

Reading flat files

In contrast to the CSV files seen earlier, a flat file does not contain any separator between the fields. Because there is no separator, every record in a flat file usually has the same length, and the fixed width of each column is the only way to tell where one field ends and the next begins. Before spreadsheet programs became common, flat files were the usual way to store and exchange data. They are still in use wherever the people producing the data prefer them.
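To make the idea concrete, here is a minimal sketch of how column widths alone can separate the fields of a record. The widths, the sample records, and the helper name split_fixed_width are all made up for illustration; they are not taken from the baseball data.

```python
from itertools import accumulate


def split_fixed_width(record, widths):
    """Split one fixed-width record into a list of stripped field strings."""
    # Running totals of the widths give the end offset of each column.
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return [record[start:end].strip() for start, end in zip(starts, ends)]


# Hypothetical layout: a 10-character name, a 4-digit year, a 3-character count.
WIDTHS = [10, 4, 3]

# Pad each value to its column width so every record has the same length.
records = [
    f"{name:<10}{year:4d}{games:>3d}"
    for name, year, games in [("Alice", 2001, 42), ("Bob", 1999, 107)]
]

rows = [split_fixed_width(record, WIDTHS) for record in records]
for row in rows:
    print(row)
```

Every field comes back as a string here; converting the year and count to numbers is left to the caller.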

Getting ready

In this example, we will be using Python to read in a flat file. The pandas library includes a function for reading flat files, read_fwf (short for fixed-width formatted). Your Python script passes the column widths and names to read_fwf, and the function returns a DataFrame.
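As a rough sketch of that call, the following assumes made-up column widths and names (name, year, and games are hypothetical, not the baseball columns) and reads from an in-memory string rather than a file on disk:

```python
import io

import pandas as pd

# Hypothetical layout: a 10-character name, a 4-digit year, a 3-character count.
widths = [10, 4, 3]
names = ["name", "year", "games"]

# Build two sample records so every line has exactly sum(widths) characters.
sample = "\n".join(
    f"{name:<10}{year:4d}{games:>3d}"
    for name, year, games in [("Alice", 2001, 42), ("Bob", 1999, 107)]
)

# header=None tells pandas that the data has no header record,
# so the column names come from the names list instead.
df = pd.read_fwf(io.StringIO(sample), widths=widths, names=names, header=None)
print(df)
```

For a real file, the first argument would be a path such as baseball.txt in place of the StringIO object.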

Of course, now that I am looking for a flat file, I can't find one! So I took the first 20 records of the preceding baseball data and stored them in a flat file, baseball.txt. There is no header record, and only the first several columns are included. It looks like this:

How to do it...

We can use a Python...