
Learning Cython Programming (Second Edition)

By: Philip Herron

Overview of this book

Cython is a hybrid programming language used to write C extensions for the Python language. It combines the practicality of Python with the speed of C, which makes it well worth learning if you want to build fast applications with ease. This new edition of Learning Cython Programming shows you how to get started, taking you through the fundamentals so you can begin to experience its unique powers. You'll find out how to get set up, before exploring the relationship between Python and Cython. You'll also look at debugging Cython, before moving on to C++ constructs, caveats on C++ usage, Python threading, and the GIL in Cython. Finally, you'll learn about object initialization and compile time, and gain a deeper insight into Python 3, which will help you become not only a confident Cython developer, but a much more fluent Python developer too.

Parsing large amounts of data


I want to show how much natively compiled C types can offer programmers by comparing approaches to parsing a large amount of XML. As test data for this experiment, we can use geographic data published by the United States Environmental Protection Agency (http://www.epa.gov/enviro/geospatial-data-download-service).

Let's look at the size of this XML data:

 ls -liah
total 480184
7849156 drwxr-xr-x   5 redbrain  staff   170B 25 Jul 16:42 ./
5803438 drwxr-xr-x  11 redbrain  staff   374B 25 Jul 16:41 ../
7849208 -rw-r--r--@  1 redbrain  staff   222M  9 Mar 04:27 EPAXMLDownload.xml
7849030 -rw-r--r--@  1 redbrain  staff    12M 25 Jul 16:38 EPAXMLDownload.zip
7849174 -rw-r--r--   1 redbrain  staff    57B 25 Jul 16:42 README

At 222 MB uncompressed, it's huge! Before we write any programs, we need to understand a little about the structure of this data and decide what we want to do with it. It contains facility site locations with addresses. This seems to be the bulk of the data, so let's try to parse it all...
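
One way to get a first feel for the structure of a file this size is to stream through it and count how often each tag appears. The following is only a minimal sketch in plain Python using the standard library's xml.etree.ElementTree.iterparse, not the parser developed in this chapter. The helper name count_tags and the tag names in the sample document are hypothetical; the real element names in the EPA download are not assumed here.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Stream through an XML document and count how often each tag occurs.

    `source` may be a filename or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    # iterparse reports each element as soon as its closing tag is read,
    # so we can inspect the file without building the full tree first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any "{namespace}" prefix so the tag names stay readable.
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
        # Discard the text and children of the element we just counted
        # to keep memory use well below that of a fully loaded tree.
        elem.clear()
    return counts


# Tiny made-up sample; these tag names are placeholders, not the EPA schema.
sample = io.BytesIO(
    b"<root><site><name>A</name></site><site><name>B</name></site></root>"
)
for tag, n in count_tags(sample).most_common():
    print(tag, n)
```

To try the same thing on the real data, pass the path of the extracted file, for example count_tags("EPAXMLDownload.xml"). The most frequent tags give a quick indication of which records make up the bulk of the file.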