Mastering Python High Performance

The initial code base

Let's now list all of the code that we'll optimize later, based on the earlier description.

The first piece is quite simple: a single-file script that takes care of scraping the data and saving it in JSON format, as we discussed earlier. The flow is as follows:

It will query the list of questions page by page.
For each page, it will gather the links to the questions.
Then, for each link, it will gather the information listed in the previous points.
It will move on to the next page and start over again.
It will finally save all of the data into a JSON file.
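
Before reading the full script, it may help to see the flow above reduced to its bare structure. The following is only a minimal sketch, not the book's code: the names crawl, save_results, fetch_question_links, and fetch_question_info are hypothetical, and the two fetch callables stand in for the real HTTP and HTML-parsing work so that the sketch runs without network access.

```python
import json


def crawl(fetch_question_links, fetch_question_info, max_pages, first_page=1):
    """Walk the question list page by page and collect one record per question.

    Both callables are hypothetical stand-ins for the real scraping code:
    fetch_question_links(page_number) returns the question links on that page,
    fetch_question_info(link) returns a dict describing a single question.
    """
    results = []
    for page in range(first_page, first_page + max_pages):
        # Gather the links on this page, then visit each one in turn.
        for link in fetch_question_links(page):
            results.append(fetch_question_info(link))
    return results


def save_results(results, path):
    """Write all collected records to a single JSON file at the end."""
    with open(path, "w") as handle:
        json.dump(results, handle, indent=2)


if __name__ == "__main__":
    # Fake data (an assumption for illustration) instead of real pages.
    fake_pages = {1: ["/questions/1", "/questions/2"], 2: ["/questions/3"]}
    data = crawl(
        fetch_question_links=lambda page: fake_pages.get(page, []),
        fetch_question_info=lambda link: {"link": link},
        max_pages=2,
    )
    print(json.dumps(data))
```

Note that everything happens sequentially in this shape: each page and each question is fetched one after another, and nothing is written until the very end.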

The code is as follows:

from bs4 import BeautifulSoup
import requests
import json


SO_URL = "http://scifi.stackexchange.com"
QUESTION_LIST_URL = SO_URL + "/questions"
MAX_PAGE_COUNT = 20

global_results = []
initial_page = 1  # first page is page 1

def get_author_name(body):
  link_name = body.select(".user-details a")
  if len(link_name) == 0:
    text_name = body.select(".user-details")
...