Book Image

Jupyter for Data Science

By : Dan Toomey
Book Image

Jupyter for Data Science

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter notebook. If you are familiar with Jupyter notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of the way in implementing an effective data science pipeline using Jupyter. You will also see how you can utilize Jupyter's features to share your documents and codes with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully.
Table of Contents (17 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Optimizing your script


There are optimizations that you can make to have your notebook scripts run more efficiently. The optimizations are script language dependent. We have covered using Python and R scripts in our notebooks and will cover optimizations that can be made for those two languages.

Jupyter does support additional languages, such as Scala and Spark. The other languages have their own optimization tools and strategies.

Optimizing your Python scripts

Performance tuning your Python scripts can be done using several tools:

  • timeit
  • Python regular expressions
  • String handling
  • Loop optimizations
  • hotshot profiling

Determining how long a script takes

The timeit function in Python takes a line of code and determines how long it takes to execute. You can also repeatedly execute the same script to see if there are start-up issues that need to be addressed.

timeit is used in this manner:

import timeitt = timeit.Timer("myfunction('Hello World')", "import myfunction")   t.timeit()              3.32132323232...