Mastering Geospatial Analysis with Python

By: Silas Toms, Paul Crickard, Eric van Rees

Overview of this book

Python comes with a host of open source libraries and tools that help you work on professional geoprocessing tasks without investing in expensive tools. This book introduces Python developers, both new and experienced, to a variety of new code libraries that have been developed to perform geospatial analysis, statistical analysis, and data management. It uses examples and code snippets to explain how Python 3 differs from Python 2, and how these new code libraries can be used to solve age-old problems in geospatial analysis. You will begin by understanding what geoprocessing is and exploring the tools and libraries that Python 3 offers. You will then learn to use Python code libraries to read and write geospatial data. Next, you will learn to perform geospatial queries within databases and use PyQGIS to automate analysis within the QGIS mapping suite. Moving forward, you will explore the newly released ArcGIS API for Python and ArcGIS Online to perform geospatial analysis and create ArcGIS Online web maps. Finally, you will take a deep dive into Python geospatial web frameworks and learn to create a geospatial REST API.

What is Hadoop?


Hadoop is an open-source framework for working with large quantities of data spread across anywhere from a single computer to thousands of computers. Hadoop is composed of four modules:

  • Hadoop Core
  • Hadoop Distributed File System (HDFS)
  • Yet Another Resource Negotiator (YARN)
  • MapReduce

Hadoop Core, which the Apache Hadoop project officially calls Hadoop Common, provides the shared libraries and utilities needed to run the other three modules. HDFS is a Java-based distributed file system designed to store very large files, on the order of terabytes, across many machines. YARN manages resources and schedules jobs in your Hadoop cluster. The MapReduce engine lets you process data in parallel: a job is split into map tasks that each work on a separate chunk of the data, and reduce tasks that combine the mappers' output.
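To make the MapReduce idea more concrete, here is a minimal local sketch in Python. It assumes the Hadoop Streaming convention, where the mapper and reducer exchange tab-separated `key\tvalue` lines and the reducer receives its input sorted by key. The task is a hypothetical one chosen for illustration: counting `lon,lat` points per one-degree grid cell. The function names `mapper`, `reducer`, and `run_local` are our own. A plain `sorted()` call stands in for Hadoop's distributed shuffle-and-sort phase, so this runs in a single process and is not Hadoop itself.

```python
"""Single-process sketch of the MapReduce model, Hadoop Streaming style.

Assumption: keys and values travel as tab-separated text lines, and the
reducer sees its input sorted by key. sorted() stands in for Hadoop's
shuffle-and-sort; nothing here talks to a real cluster.
"""
import math
from itertools import groupby
from typing import Dict, Iterable, Iterator


def mapper(lines: Iterable[str], cell_size: float = 1.0) -> Iterator[str]:
    """Emit one '<cell_id>\\t1' record per 'lon,lat' input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        lon, lat = (float(v) for v in line.split(","))
        # Snap the point to a square grid cell, identified by its column/row index.
        col = math.floor(lon / cell_size)
        row = math.floor(lat / cell_size)
        yield f"{col}_{row}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> sort (stand-in for shuffle) -> reduce in one process."""
    shuffled = sorted(mapper(lines))
    return {k: int(v) for k, v in (out.split("\t") for out in reducer(shuffled))}


if __name__ == "__main__":
    points = ["-106.65,35.08", "-106.61,35.11", "-105.94,35.69", "2.35,48.86"]
    print(run_local(points))  # counts per grid cell
```

On a real cluster, each mapper would read its own HDFS block from standard input and many reducers would run on different machines. The logic of each function stays the same, which is why the model parallelizes so easily.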

Several other projects can be installed to work with the Hadoop framework. In this chapter, you will use Hive and Ambari. Hive lets you read and write data stored in Hadoop using HiveQL, a SQL-like query language. You will use Hive to run spatial queries on your data at the end of the chapter. Ambari provides a web user interface for managing Hadoop and Hive. In this...