
Python Web Penetration Testing Cookbook

By: Benjamin May, Cameron Buchanan, Andrew Mabbitt, Dave Mound, Terry Ip

Harvesting additional results from the Google+ API using pagination


By default, the Google+ API returns a maximum of 25 results, but we can extend the previous scripts by increasing the maximum number of results per request and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib2 library. We will keep a counter that increases with each request and pass along the page token returned by each response, so that we can move across pages and gather more results.

How to do it

The following script shows how you can harvest additional results from the Google+ API:

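import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

while loops < 10:
  # Request up to 50 results, starting from the page identified by token
  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()

  json_response = json.loads(api_response)
  # Pointer to the next page of results ('' if the key is missing)
  token = json_response.get('nextPageToken', '')

  # An empty page means there are no more results
  if len(json_response.get('items', [])) == 0:
    break

  for result in json_response['items']:
    name = result['displayName']
    print name
    # Drop the query string from the image URL
    image = result['image']['url'].split('?')[0]
    # Save the profile image as <displayName>.jpg
    with open(name+'.jpg','wb') as f:
      f.write(urllib2.urlopen(image).read())
  loops+=1

  # Stop if the response did not include a token for a further page
  if not token:
    break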
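
The script uses each person's display name directly as a file name and removes the query string from the image URL with split('?'). Display names can contain characters such as / that are not valid in file names, so the two steps could be factored into small helpers. This is a minimal sketch for Python 3 (with a Python 2 import fallback); the names strip_query and safe_filename are hypothetical and not part of the original script:

```python
import re

try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2


def strip_query(url):
    """Return the URL without its query string (and fragment)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def safe_filename(display_name, extension=".jpg"):
    """Replace characters other than word chars, '-', ' ' and '.' with '_'."""
    cleaned = re.sub(r"[^\w\- .]", "_", display_name).strip()
    # Fall back to a fixed name if nothing usable is left
    return (cleaned or "unnamed") + extension


print(strip_query("https://example.com/photo.jpg?sz=50"))
print(safe_filename("John/Doe"))
```

Inside the loop, the open() call would then use safe_filename(name) instead of name+'.jpg'.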

How it works

The first big change in this script is that the main code has been moved into a while loop:

token = ""
loops = 0

while loops < 10:

Here, the number of loops is capped at 10 to avoid sending too many requests to the API servers. This value can, of course, be changed to any positive integer. The next change is to the request URL itself; it now contains two additional trailing parameters, maxResults and pageToken. Each response from the Google+ API contains a nextPageToken value, which is a pointer to the next set of results; we send it back as the pageToken parameter of the following request. Note that even when there are no more results, a nextPageToken value is still returned. The maxResults parameter is self-explanatory, but it can only be increased to a maximum of 50:

  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()
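
Building the URL by string concatenation works for a simple target such as packtpub.com, but characters like spaces or & in the query or token would not be escaped. A minimal sketch that builds the same request URL with urlencode and clamps maxResults to the limit of 50 described above (build_search_url and MAX_RESULTS_LIMIT are hypothetical names; written for Python 3 with a Python 2 fallback):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

PEOPLE_SEARCH_URL = "https://www.googleapis.com/plus/v1/people"
MAX_RESULTS_LIMIT = 50  # upper bound for maxResults, as described in the text


def build_search_url(query, api_key, page_token="", max_results=50):
    """Build the people-search URL with URL-encoded parameters."""
    params = [  # a list of pairs keeps the parameter order stable
        ("query", query),
        ("key", api_key),
        ("maxResults", min(max_results, MAX_RESULTS_LIMIT)),
        ("pageToken", page_token),
    ]
    return PEOPLE_SEARCH_URL + "?" + urlencode(params)


print(build_search_url("packtpub.com", "YOUR_KEY"))
```

For the first request, page_token is empty, which produces a trailing pageToken= exactly as in the original concatenated URL.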

The next part loads the JSON response as before, but this time it also extracts the nextPageToken value:

  json_response = json.loads(api_response)
  token = json_response.get('nextPageToken', '')
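
Indexing a response with ['nextPageToken'] raises a KeyError if that key is ever absent. A slightly more defensive sketch, assuming the response is the JSON text shown above, parses one page and returns both the items and the token using dict.get defaults (read_page is a hypothetical helper name):

```python
import json


def read_page(raw_json):
    """Parse one API response and return (items, next_page_token).

    Missing keys are treated as "no items" and "no further page"
    instead of raising KeyError.
    """
    response = json.loads(raw_json)
    items = response.get("items", [])
    token = response.get("nextPageToken")  # None when the key is absent
    return items, token


items, token = read_page('{"items": [{"displayName": "A"}], "nextPageToken": "t1"}')
print(len(items), token)
```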

The main while loop stops once the loops variable reaches 10, but sometimes there may be only one page of results. The next part of the code checks how many results were returned; if there were none, it exits the loop early:

  if len(json_response.get('items', [])) == 0:
    break
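
To see both stopping conditions without calling the API, the same loop can be run against canned responses. In this sketch, FAKE_PAGES and harvest_names are hypothetical; the dictionary stands in for successive API pages keyed by page token, and the empty third page still carries a token, mirroring the behaviour described above:

```python
# Canned responses standing in for successive API pages (hypothetical data)
FAKE_PAGES = {
    "": {"items": [{"displayName": "Alice"}, {"displayName": "Bob"}],
         "nextPageToken": "p2"},
    "p2": {"items": [{"displayName": "Carol"}], "nextPageToken": "p3"},
    "p3": {"items": [], "nextPageToken": "p4"},  # empty page, token still present
}


def harvest_names(pages, max_loops=10):
    """Follow page tokens until max_loops is reached or a page is empty."""
    names = []
    token = ""
    loops = 0
    while loops < max_loops:
        response = pages[token]
        token = response.get("nextPageToken", "")
        if len(response.get("items", [])) == 0:
            break  # no results on this page, so exit early
        for result in response["items"]:
            names.append(result["displayName"])
        loops += 1
    return names, loops


print(harvest_names(FAKE_PAGES))
print(harvest_names(FAKE_PAGES, max_loops=1))
```

With the default limit the loop ends on the empty page after two full pages; with max_loops=1 it ends on the counter instead.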

Finally, we make sure the loops counter is incremented on every pass. A common coding mistake is to leave this out, in which case the counter never reaches 10 and the loop is no longer bounded:

  loops+=1
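
One way to rule out the forgotten-increment mistake is to let a for loop over range() manage the counter. The following is a sketch of that design choice, not the recipe's own code: paginate is a hypothetical generator, and fetch_page is an assumed callable that takes a page token ("" for the first page) and returns a parsed response dictionary, so the HTTP request from the script would live inside it:

```python
def paginate(fetch_page, max_pages=10):
    """Yield items page by page, making at most max_pages requests."""
    token = ""
    for _ in range(max_pages):  # the counter advances automatically
        response = fetch_page(token)
        items = response.get("items", [])
        if not items:
            return  # empty page: no more results
        for item in items:
            yield item
        token = response.get("nextPageToken")
        if not token:
            return  # no pointer to a further page


pages = {"": {"items": [1, 2], "nextPageToken": "b"}, "b": {"items": [3]}}
print(list(paginate(pages.__getitem__)))
```

Because the generator yields results as they arrive, the caller can process (or save) each profile while later pages are still being requested.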