Jupyter Cookbook

By: Dan Toomey

Overview of this book

Jupyter has garnered strong interest in the data science community of late, as it makes common data processing and analysis tasks much simpler. This book is for data science professionals who want to master various Jupyter-related tasks to create efficient, easy-to-share scientific applications. The book starts with recipes on installing and running the Jupyter Notebook system on various platforms and configuring the packages that can be used with it. You will then see how to work with different programming languages and frameworks, such as Python, R, Julia, JavaScript, Scala, and Spark, in your Jupyter Notebook. The book contains intuitive recipes on building interactive widgets to manipulate and visualize data in real time, sharing your code, creating a multi-user environment, and organizing your notebook. You will then get hands-on experience with JupyterLab and microservices, and with deploying them on the web. By the end of this book, you will have taken your knowledge of Jupyter to the next level and be able to perform all the key tasks associated with it.

Reading CSV files


One of the most common file formats for datasets is the comma-separated values (CSV) file. A CSV file may have a header record, followed by a variable number of data records.

The header record, if present, is the first record in the file. Its separated values are the headings, or column names, for each column of data in the file. The column names are all character strings. We can use them as variable names in our scripts, each corresponding to a column in the dataset.
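
As a minimal sketch of this idea, the following Python snippet uses the standard library's csv module to pull the header record off first and then key each column by its name. The sample data and the column names (name, age, city) are made up for illustration, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
sample = io.StringIO("name,age,city\nAlice,30,Boston\nBob,25,Denver\n")

reader = csv.reader(sample)
header = next(reader)   # first record: the column names
rows = list(reader)     # remaining records: the data

# Build one list of values per column, keyed by its header name.
columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}

print(header)
print(columns["age"])
```

Note that the csv module returns every value as a string, so numeric columns such as age would still need converting before any arithmetic.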

Each subsequent data record has one separated value for every column. A value may be an empty string or missing entirely, but the commas in the record still line up with the columns in the header record.
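
To illustrate how empty values keep their place in a record, here is a small Python sketch using csv.DictReader. The sample data is invented for the example: one record has no age and another has no city, yet each record still yields a value (an empty string) for every column.

```python
import csv
import io

# Hypothetical data: Bob has no age, Cara has no city.
sample = io.StringIO(
    "name,age,city\n"
    "Alice,30,Boston\n"
    "Bob,,Denver\n"
    "Cara,41,\n"
)

# DictReader takes the column names from the header record.
records = list(csv.DictReader(sample))

# Empty fields come back as empty strings, still under the right column.
missing_age = [r["name"] for r in records if r["age"] == ""]
missing_city = [r["name"] for r in records if r["city"] == ""]

print(missing_age)
print(missing_city)
```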

If there is no header record, you may have to find out what the column layout is for the file. There is normally a descriptor in the same location as the CSV file that describes each of the columns. In this case, you have to manually assign column names to your working dataset...
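
One way to assign column names manually, sketched here in Python, is to pass them to csv.DictReader through its fieldnames parameter. The names below are hypothetical and would in practice come from whatever descriptor accompanies the file. When fieldnames is supplied, the first record is treated as data rather than as a header.

```python
import csv
import io

# Hypothetical headerless data: every record is a data record.
sample = io.StringIO("Alice,30,Boston\nBob,25,Denver\n")

# Column names assumed to come from the file's descriptor.
column_names = ["name", "age", "city"]

reader = csv.DictReader(sample, fieldnames=column_names)
records = list(reader)

for record in records:
    print(record["name"], record["age"], record["city"])
```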