Book Image

Learning Jupyter

By : Dan Toomey
Book Image

Learning Jupyter

By: Dan Toomey

Overview of this book

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more. This book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next we’ll help you will learn to integrate Jupyter system with different programming languages such as R, Python, JavaScript, and Julia and explore the various versions and packages that are compatible with the Notebook system. Moving ahead, you master interactive widgets, namespaces, and working with Jupyter in a multiuser mode. Towards the end, you will use Jupyter with a big data set and will apply all the functionalities learned throughout the book.
Table of Contents (16 chapters)
Learning Jupyter
About the Author
About the Reviewer

Spark primes

We can run a series of numbers through a filter to determine whether each number is prime or not. We can use this script:

import pyspark
if not 'sc' in globals():
    sc = pyspark.SparkContext()
def is_it_prime(number):
    # make sure n is a positive integer
    number = abs(int(number))
    # simple tests
    if number < 2:
        return False
    # 2 is prime
    if number == 2:
        return True
    # other even numbers aren't
    if not number & 1:
        return False
    # check whether number is divisible into it's square root
    for x in range(3, int(number**0.5)+1, 2):
        if number % x == 0:
            return False
    #if we get this far we are good
    return True
# create a set of numbers to 100,000
numbers = sc.parallelize(xrange(100000))
# count out the number of primes we found
print numbers.filter(is_it_prime).count()

The script generates numbers up to 100,000.

We then loop over each of the numbers and pass it to our filter. If the filter returns...