Geospatial Data Science Quick Start Guide

By: Abdishakur Hassan, Jayakrishnan Vijayaraghavan

Overview of this book

Data scientists with access to vast data streams often overlook intrinsic and extrinsic location-based data and miss the intelligence it can bring to their models. This book demonstrates techniques that combine data science and geospatial intelligence to build data models that use location-based data for predictions and analyses. It begins with a quick overview of the fundamentals of location-based data and how techniques such as exploratory data analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, and buffer polygons, intersecting geometries, and geocoding, all of which add context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to apply geospatial analysis techniques in recommendation systems, including collaborative filtering and location-based recommendations. By the end of the book, you will be able to perform geospatial analysis with ease.
Table of Contents (9 chapters)

Exploratory data analysis

For this chapter, we will be using curated data from the New York taxi trip dataset provided by the city of New York. The original source for this data can be found here: https://data.cityofnewyork.us/api/odata/v4/hvrh-b6nb.

Visit the following website for more details about the data that's included in this dataset: https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb.

For starters, let's have a peek at the data at hand using pandas. The curated data (NYC_sample.csv) that we will be using here can be found at the following download link: https://drive.google.com/file/d/1OkkYZJEcsdCkU0V42eP6pj6YaK2WCGCE/view.

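import pandas as pd

df = pd.read_csv("NYC_sample.csv")  # load the curated sample into a DataFrame
df.head().T  # transpose so each column of the first five rows reads as a row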
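Beyond head(), a first look at a dataset usually also checks its size, its column types, and its missing values. The following is a minimal sketch of that idea, not the book's own code. So that it runs without the download, it reads a tiny in-memory stand-in instead of NYC_sample.csv. The column names (fare_amount, trip_duration, pickup_latitude, pickup_longitude) and the helper quick_summary are assumptions made for illustration; the real file may use different names.

```python
import io

import pandas as pd

# Hypothetical stand-in for NYC_sample.csv. The column names and values
# are illustrative assumptions, not taken from the real dataset.
SAMPLE_CSV = """fare_amount,trip_duration,pickup_latitude,pickup_longitude
7.5,420,40.7128,-74.0060
12.0,900,40.7306,-73.9352
,300,40.6782,-73.9442
"""


def quick_summary(df):
    """Return the row count, column count, and missing values per column."""
    n_rows, n_cols = df.shape
    missing = {col: int(n) for col, n in df.isna().sum().items()}
    return {"rows": n_rows, "columns": n_cols, "missing": missing}


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = quick_summary(df)

print(summary)          # size and missing-value counts
print(df.dtypes)        # column types as inferred by pandas
print(df.describe().T)  # basic statistics, one row per numeric column
```

To try this on the actual data, replace the io.StringIO(SAMPLE_CSV) argument with the path to NYC_sample.csv; quick_summary does not depend on any particular column names.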

The curated New York taxi trip data that we are using has around 1.14 million records and has columns related to taxi fare, as well as trip duration, as...