Geospatial Data Science Quick Start Guide

By: Abdishakur Hassan, Jayakrishnan Vijayaraghavan

Overview of this book

Data scientists have access to vast data streams, yet they are often a bit myopic about intrinsic and extrinsic location-based data and miss out on the intelligence it can provide to their models. This book demonstrates techniques for combining data science and geospatial intelligence to build intelligent data models that use location-based data to produce useful predictions and analyses. It begins with a quick overview of the fundamentals of location-based data and how techniques such as Exploratory Data Analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, buffer polygons, intersecting geometries, geocoding, and more, all of which add context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to apply geospatial analysis techniques in popular recommendation approaches such as collaborative filtering and location-based recommendations. By the end of the book, you will be a rockstar at performing geospatial analysis.
Table of Contents (9 chapters)

Location data intelligence

Every industry uses location intelligence. It helps organizations understand what their customers are doing, where they are based, what their geographic environment is like, and what their interests are. Location intelligence is normally defined as combining location data with other attributes to add context and derive useful information, services, and products that help organizations make effective and efficient decisions. The information derived through location intelligence can yield business and economic insights as well as environmental and social ones.

Application of location data intelligence

To illustrate how location intelligence is applied in a real-world application, we will take Foursquare check-ins as an example. Foursquare started in 2009 as a social platform that collected user check-ins and provided guides and search results recommending places to visit near a user's current location. More recently, however, Foursquare has repositioned itself from a social platform into a location intelligence company. The company describes itself as a "technology company that uses location intelligence to build meaningful consumer experiences and business solutions" and claims the following:

"If it tells you where, it's probably built on Foursquare."

By aggregating anonymized check-ins at physical brand locations, Foursquare provides insights and metrics that were not easily available before, such as customer loyalty, visit frequency, and brand losses and profits. These allow analysts and brands to understand their customers, uncover demographic insights, track customer patterns, and study competing brands. To illustrate how powerful location intelligence is, let's explore a subset of Foursquare data for NYC. We will use this dataset again in Chapter 3, Performing Spatial Operations Like a Pro, but for now let's look at what it consists of and how location intelligence is derived from it.

The NYC Foursquare check-in dataset has 10 months' worth of data spanning from April 12, 2012 to February 16, 2013.

Source: NYC Foursquare Check-in dataset first appeared in Fine-Grained Preference-Aware Location Search Leveraging Crowdsourced Digital Footprints from LBSN, Dingqi Yang, Daqing Zhang, Zhiyong Yu, and Zhiwen Yu, proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), September 8 to 12, 2013, in Zurich, Switzerland.

The following table shows the first five rows of the data, which consists of eight columns. UserID and VenueID uniquely identify users and venues, and both are anonymized for privacy reasons. VenueCategoryID and VenueCategoryName indicate the aggregated type of business; there are more than 250 business types, including medical center, arts store, burger joint, hardware store, and so on. The Latitude and Longitude columns store the geographic coordinates of the venues.

The last two columns indicate the time of the check-in:

Foursquare data: first five rows

Here, we have the first five rows of the Foursquare data. In this chapter, we will only look at the data from a wider perspective. The code for this chapter is available, but you do not need to understand it right now. We will come to learn the details of reading and processing location data with Python in the next chapters.
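To make the eight-column layout concrete, here is a minimal sketch of how one row could be represented and parsed in plain Python. It is not the code from the chapter's notebook. Several details are assumptions for illustration: the file is tab-separated, the last two columns hold a timezone offset in minutes and a UTC timestamp in the format shown in `UTC_FORMAT`, and the sample row is made up. The local-time conversion matters because hour-of-day analyses, such as the gym example later in this section, only make sense in the venue's local time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed format of the UTC timestamp column, e.g. "Tue Apr 03 18:00:09 +0000 2012".
UTC_FORMAT = "%a %b %d %H:%M:%S %z %Y"


@dataclass
class CheckIn:
    user_id: str
    venue_id: str
    venue_category_id: str
    venue_category_name: str
    latitude: float
    longitude: float
    timezone_offset_min: int  # assumed: local offset from UTC, in minutes
    utc_time: datetime        # timezone-aware UTC timestamp

    @property
    def local_time(self) -> datetime:
        """Convert the UTC timestamp to the venue's local time using the offset."""
        local_tz = timezone(timedelta(minutes=self.timezone_offset_min))
        return self.utc_time.astimezone(local_tz)


def parse_row(line: str) -> CheckIn:
    """Parse one tab-separated row whose eight columns follow the order described above."""
    f = line.rstrip("\n").split("\t")
    return CheckIn(
        user_id=f[0],
        venue_id=f[1],
        venue_category_id=f[2],
        venue_category_name=f[3],
        latitude=float(f[4]),
        longitude=float(f[5]),
        timezone_offset_min=int(f[6]),
        utc_time=datetime.strptime(f[7], UTC_FORMAT),
    )


# A made-up row for illustration only (not taken from the real dataset).
row = "395\tv001\tc001\tGym / Fitness Center\t40.75\t-73.99\t-240\tTue Apr 03 18:00:09 +0000 2012"
checkin = parse_row(row)
print(checkin.venue_category_name, checkin.local_time.hour)  # Gym / Fitness Center 14
```

With an offset of -240 minutes (UTC-4, New York in summer), a check-in recorded at 18:00 UTC falls at 14:00 local time.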

So, what kind of location intelligence can be derived from this type of data? We will cover this from two broad perspectives: the user/customer perspective and the venue/business perspective.

User or customer perspective

Here, we look at the data from the customer's perspective. Typical questions include the following:

Where does customer X spend his/her time? What does this place offer? How often does he/she visit these places? When does he/she visit these places?

The code for this section is available in the accompanying Jupyter Notebook. You do not need to understand all the code right now, as it serves to give you the bigger picture of location data intelligence. Feel free to consult the Jupyter Notebook for this chapter if you want to run the code and experiment.

Let's take as an example UserID = 395 from the fourth row of the preceding table. During the period covered by the dataset, this user made 106 check-ins in total, visiting 36 unique venues in NYC (visualized in the following map):

User 395: Venues visited in NY
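The two numbers reported for user 395 (total check-ins and unique venues) come from a simple filter-and-count. The following plain-Python sketch shows that idea. It is not the notebook's code, and the `CheckIn` record and the toy rows are hypothetical, so they do not reproduce the real figures.

```python
from collections import namedtuple

# Hypothetical minimal record holding only the fields this sketch needs.
CheckIn = namedtuple("CheckIn", ["user_id", "venue_id"])


def user_summary(checkins, user_id):
    """Return (total check-ins, number of unique venues) for one user."""
    visits = [c.venue_id for c in checkins if c.user_id == user_id]
    return len(visits), len(set(visits))


# Toy data for illustration, not the real dataset.
sample = [
    CheckIn("395", "v1"),
    CheckIn("395", "v2"),
    CheckIn("395", "v1"),
    CheckIn("470", "v2"),
]
print(user_summary(sample, "395"))  # (3, 2)
```

Converting the visited venue IDs to a set removes repeat visits, so its length is the number of distinct venues.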

We can also look at the types of venues this user has visited. In this case, the user most frequently visited an office, a residential building, and a gym in NYC. Less-visited venues include an airport, outdoor places, a medical center, and many others, as you can see from the following graph:

User 395: Check-ins plot
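A check-ins-per-category plot like this one is essentially a frequency count over the VenueCategoryName column for a single user. The minimal sketch below does this with `collections.Counter`. The record type and sample rows are hypothetical, and the real notebook may compute the counts differently.

```python
from collections import Counter, namedtuple

# Hypothetical minimal record: who checked in, and the venue's category name.
CheckIn = namedtuple("CheckIn", ["user_id", "venue_category_name"])


def category_counts(checkins, user_id):
    """Count a user's check-ins per venue category, most frequent first."""
    return Counter(
        c.venue_category_name for c in checkins if c.user_id == user_id
    ).most_common()


# Toy data for illustration, not the real dataset.
sample = [
    CheckIn("395", "Office"),
    CheckIn("395", "Gym / Fitness Center"),
    CheckIn("395", "Office"),
    CheckIn("395", "Airport"),
    CheckIn("395", "Office"),
    CheckIn("395", "Gym / Fitness Center"),
]
for category, n in category_counts(sample, "395"):
    print(f"{category:<25}{n}")
```

`most_common()` returns the categories already sorted by count, which is the order a bar chart of frequent and less-visited venue types would use.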

The user perspective can reveal many aspects of a user's visit frequency, preferences, and activities, which can guide location intelligence and decision making. Privacy issues in location data are very sensitive and require diligence: as we have shown, even anonymized data still reveals patterns and other useful information. Now let's look at the data from the business perspective in the following section.

Venue or business perspective

Here, we look at the data from the business's perspective. Typical questions include the following:

How many customers does venue X receive per day? What about per hour? What is the pattern? Can we estimate business value based on the check-ins?

We will use a gym venue as an example here, with VenueID = 4aca718ff964a520f6c120e3. This gym has 118 check-ins in the dataset. That sample is too small to generalize from, but imagine we had enough data covering a longer period. We could then estimate the gym's peak times, as the following graph shows. Check-ins peak at 14:00 and at 20:00:

Gym visit check-ins: per hour
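The per-hour view amounts to a 24-bin histogram of local check-in hours for one venue. A peak is an hour busier than the hours on either side. The following sketch assumes the check-in times are already in local time. It uses hypothetical sample times, not the gym's real data, and it defines a peak as a strict local maximum, which is one reasonable choice among several.

```python
from collections import Counter
from datetime import datetime


def checkins_per_hour(local_times):
    """Count check-ins in each local hour of the day (list index 0-23)."""
    counts = Counter(t.hour for t in local_times)
    return [counts[h] for h in range(24)]


def peak_hours(hourly):
    """Hours whose count is strictly greater than both neighbouring hours.

    The day wraps around, so hour 0 is compared with hour 23.
    Plateaus (equal neighbouring counts) are not reported as peaks.
    """
    return [
        h for h in range(24)
        if hourly[h] > hourly[h - 1] and hourly[h] > hourly[(h + 1) % 24]
    ]


# Hypothetical local check-in times for one venue (not the real data).
hours = [13, 14, 14, 14, 15, 19, 20, 20, 21]
times = [datetime(2012, 4, 3, h, 30) for h in hours]

hourly = checkins_per_hour(times)
print(peak_hours(hourly))  # [14, 20]
```

With more data, a smoothing step or a minimum-count threshold would help keep single stray check-ins from registering as peaks.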

This kind of business perspective analysis helps both decision makers and competitors gain insight into businesses. This is only a single-business example, but the same analysis can easily be extended to every business in the dataset. In fact, Foursquare predicted that sales at Chipotle, a Mexican grill chain, would drop by 30% in 2016, before the company itself announced its loss (link available in the information box).

Foursquare predicted Chipotle's Sales Will Plummet 30%: http://fortune.com/2016/04/15/chipotle-foursquare-swarm/.
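Extending the single-gym analysis to every business mainly means grouping check-ins by venue first. The following sketch shows this with a `defaultdict` of `Counter`s, reporting each venue's busiest local hour. The input format of `(venue_id, local_hour)` pairs and the sample data are hypothetical simplifications.

```python
from collections import Counter, defaultdict


def busiest_hour_by_venue(checkins):
    """Map each venue ID to (busiest local hour, check-ins in that hour).

    `checkins` is an iterable of (venue_id, local_hour) pairs (assumed format).
    """
    per_venue = defaultdict(Counter)
    for venue_id, hour in checkins:
        per_venue[venue_id][hour] += 1
    return {v: counts.most_common(1)[0] for v, counts in per_venue.items()}


# Toy data for illustration, not the real dataset.
sample = [
    ("gym_a", 14), ("gym_a", 14), ("gym_a", 20),
    ("cafe_b", 8), ("cafe_b", 8), ("cafe_b", 13),
]
print(busiest_hour_by_venue(sample))  # {'gym_a': (14, 2), 'cafe_b': (8, 2)}
```

Grouping once by venue keeps the pass over the data linear, so the same approach scales to all venues in the dataset.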

In the next section, let's look at how location data science differs from general data science.