Geospatial Data Analytics on AWS

By: Scott Bateman, Janahan Gnanachandran, Jeff DeMuth

Overview of this book

Managing geospatial data and building location-based applications in the cloud can be a daunting task. This comprehensive guide helps you overcome that challenge by explaining how to work with geospatial data in the cloud in an easy-to-understand way and teaching you how to design and build data lake architectures on AWS for geospatial data. You’ll begin by exploring the use of AWS databases such as Redshift and Aurora PostgreSQL for storing and analyzing geospatial data. Next, you’ll leverage services such as DynamoDB and Athena, which offer powerful built-in geospatial functions for indexing and querying geospatial data. The book is filled with practical examples that illustrate the benefits of managing geospatial data in the cloud. As you advance, you’ll discover how to analyze and visualize data using Python and R, and use QuickSight to share derived insights. The concluding chapters explore the integration of commonly used platforms such as Open Data on AWS, OpenStreetMap, and ArcGIS with AWS, helping you optimize efficiency and find a supportive community for continuous learning. By the end of this book, you’ll have the tools and expertise to build and manage your own geospatial data lake on AWS, along with the knowledge needed to tackle geospatial data management challenges and make the most of AWS services.
Table of Contents (23 chapters)

Part 1: Introduction to the Geospatial Data Ecosystem
Part 2: Geospatial Data Lakes using Modern Data Architecture
Part 3: Analyzing and Visualizing Geospatial Data in AWS
Part 4: Accessing Open Source and Commercial Platforms and Services

Geospatial data management best practices

The single most important consideration in a data management strategy is a deep understanding of the use cases the data is intended to support. Data ingestion workflows need to eliminate bottlenecks in write performance. Geospatial transformation jobs need access to powerful computational resources and the ability to temporarily cache large amounts of data in memory. Analytics and visualization workloads require fast searching and retrieval of geospatial data. These core disciplines of geospatial data management have benefited from decades of excellent work by the community, which has driven AWS to create pathways for implementing these best practices in the cloud.

Data – it’s about both quantity and quality

A long-standing anti-pattern of data management is to rely primarily on folder structures or table names to infer meaning about datasets. Naming standards are a good thing, but they are not a substitute for a well-formed data management strategy. Naming conventions invariably change over time and can never fully account for the future evolution of data and the resulting taxonomy. Beyond the physical structure of the data, instrumenting your resources with predefined tags and metadata becomes crucial in cloud architectures: AWS provides built-in capabilities to attach descriptive information to your geospatial data, and many of its tools and services are designed to consume and act on these designations. Enriching your geospatial data with appropriate metadata is as much a best practice in the cloud as it is for any GIS.
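As a minimal sketch of this idea, the boto3 snippet below stores a GeoJSON file in Amazon S3 with object tags describing the dataset. The bucket, key, and tag names are hypothetical placeholders, not values prescribed by the book:

    import boto3

    s3 = boto3.client("s3")

    # Upload a GeoJSON file with object tags that describe the dataset.
    with open("parcels.geojson", "rb") as body:
        s3.put_object(
            Bucket="example-geospatial-lake",    # hypothetical bucket
            Key="parcels/2023/parcels.geojson",  # hypothetical key
            Body=body,
            Tagging="dataset=parcels&crs=EPSG:4326&quality-tier=high",
        )

    # Downstream tools and services can read the tags back to understand
    # what the object contains without parsing it.
    tags = s3.get_object_tagging(
        Bucket="example-geospatial-lake",
        Key="parcels/2023/parcels.geojson",
    )
    print(tags["TagSet"])

Because the tags travel with the object rather than living in a folder name, they survive reorganizations of the bucket layout and remain queryable by other AWS services.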

Another best practice is to quantify your data quality. Simply having a hunch that your data is good or bad is not sufficient. Mature organizations not only describe the quality of their data quantitatively, with continually assessed scores, but also track those scores to ensure that the quality of critical data improves over time. For example, if you have a dataset of addresses, it is important to know what percentage of the addresses are invalid. Hopefully, that percentage is zero, but that is rarely the case. More important than having 100% accurate data is having confidence in what the quality of a given dataset is… today. Neighborhoods are built every day, and standalone buildings are torn down to make way for apartment complexes. Perfect data today may not be perfect data tomorrow, so the most important aspect of data quality is real-time transparency. A threshold should be set for acceptable data quality based on the criticality of the dataset: high-priority geospatial data should be held to a high bar for quality, while infrequently used, low-impact datasets don’t require the same focus. Categorizing your data by importance allows you to establish guidelines per category and direct finite resources toward the most pressing concerns to maximize value.
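The sketch below illustrates one way to turn this into a continually assessed score for an address dataset. The validation rule and the per-category thresholds are illustrative assumptions rather than values from the book:

    # Compute a simple quality score for an address dataset and compare it
    # against a per-category threshold. All data and thresholds here are
    # hypothetical examples.
    addresses = [
        {"street": "123 Main St", "city": "Springfield", "zip": "62704"},
        {"street": "", "city": "Springfield", "zip": "62704"},        # invalid
        {"street": "456 Oak Ave", "city": "Springfield", "zip": ""},  # invalid
    ]

    # Acceptable quality thresholds by dataset category.
    thresholds = {"high-priority": 0.99, "low-impact": 0.90}

    def is_valid(address):
        # A production validator might geocode each address; here we only
        # check that the required fields are present.
        return all(address.get(field) for field in ("street", "city", "zip"))

    score = sum(is_valid(a) for a in addresses) / len(addresses)
    print(f"Quality score today: {score:.1%}")

    category = "high-priority"
    if score < thresholds[category]:
        print(f"Below the {thresholds[category]:.0%} bar for {category} data")

Running a check like this on a schedule, and recording the score each time, gives you the real-time transparency described above rather than a one-off snapshot.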

People, processes, and technology are equally important

Managing geospatial data successfully in the cloud relies on more than just the technology tools offered by AWS. Designating appropriate roles and responsibilities in your organization ensures that your cloud ecosystem will be sustainable. Avoid single points of failure with respect to skills or tribal knowledge of your environment: having at least a primary and a secondary person covering each area adds resiliency to your people operations. Not only does this give you more flexibility in coverage and task assignment, but it also creates training opportunities within your team, allowing team members to continually learn and improve their skills.

Next, let’s move on to talk about how to stretch your geospatial dollars to do more with less.