
RStudio for R Statistical Computing Cookbook

By: Andrea Cirillo
Overview of this book

The need to handle complex datasets, perform unprecedented statistical analysis, and provide real-time visualizations to businesses has concerned statisticians and analysts across the globe. RStudio is a useful and powerful tool for statistical analysis that harnesses the power of R for computational statistics, visualization, and data science in an integrated development environment. This book is a collection of recipes that will help you learn and understand RStudio features so that you can effectively perform statistical analysis and reporting, code editing, and R development. The first few chapters will teach you how to set up your own data analysis project in RStudio, acquire data from different data sources, and manipulate and clean data for analysis and visualization purposes. You'll get hands-on with various data visualization methods using ggplot2, and you will create interactive and multidimensional visualizations with D3.js. Additional recipes will help you optimize your code; implement various statistical models to manage large datasets; perform text analysis and predictive analysis; and master time series analysis, machine learning, forecasting, and more. In the final few chapters, you'll learn how to create reports from your analytical application with the full range of static and dynamic reporting tools available in RStudio, so that you can effectively communicate results and even transform them into interactive web applications.
Table of Contents (15 chapters)
RStudio for R Statistical Computing Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Accessing an API with R


As we mentioned before, an ever-increasing proportion of our data resides on the Web and is made available through web APIs.

Note

In computer programming, APIs (application programming interfaces) are groups of procedures, protocols, and software tools used for building software applications. APIs expose software in terms of input, output, and processes.

Web APIs are developed as an interface between web applications and third parties.

A web API is typically structured as a set of HTTP request messages whose responses have a predefined structure, usually in the XML or JSON format.
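To make the request/response structure more concrete, the following sketch parses a JSON answer of the kind a web API might return. The payload is made up for illustration. The jsonlite package's fromJSON() turns it into an ordinary R list.

```r
library(jsonlite)

# A hypothetical JSON answer, as it might come back from a web API
response_body <- '{
  "user": "example_user",
  "followers": 42,
  "repos": ["repo_a", "repo_b"]
}'

# fromJSON() maps JSON objects to named lists and JSON arrays to vectors
parsed <- fromJSON(response_body)

print(parsed$user)       # a single character string
print(parsed$followers)  # a single number
print(parsed$repos)      # a character vector with two elements
```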

Typical examples of API data are data about web and mobile applications, for instance, Google Analytics data or data about social networking activity.

The successful web application If This Then That (IFTTT), for instance, lets you link together different applications, making them share data with each other and building powerful and customizable workflows.

This useful job is done by leveraging the application's API (if you don't know IFTTT, just navigate to https://ifttt.com, and I will see you there).

Using R, it is possible to authenticate and get data from any API that adheres to the OAuth 1.0 and OAuth 2.0 standards, which are currently the most popular standards (even though opinions about these protocols are changing; refer to this popular post by Eran Hammer, former lead author and editor of the OAuth 2.0 specification, at http://hueniverse.com/2012/07/26/oauth-2-0-and-the-road-to-hell/). Moreover, dedicated packages have been developed for many APIs.

This recipe shows how to access custom APIs and leverage packages developed for specific APIs.

In the There's more... section, suggestions are given on how to develop custom functions for frequently used APIs.

Getting ready

The httr package, once again a product of our benefactor Hadley Wickham, provides a complete set of functionalities for sending and receiving data through the HTTP protocol on the Web. Take a look at the quick-start guide hosted on GitHub to get a feeling for httr functionalities (https://github.com/hadley/httr).

Among those functionalities, functions for dealing with APIs are provided as well.

Both OAuth 1.0 and OAuth 2.0 interfaces are implemented, making this package really useful when working with APIs.

Let's look at how to get data from the GitHub API. Along the way, I will point out which small sections you can change to apply the same approach to whatever API you are interested in.

Let's now install and load the httr package:

install.packages("httr")
library(httr)

How to do it…

  1. The first step to connect with the API is to define the API endpoint. Specifications for the endpoint are usually given within the API documentation. For instance, GitHub gives this kind of information at http://developer.github.com/v3/oauth/.

    In order to set the endpoint information, we are going to use the oauth_endpoint() function, which requires us to set the following arguments:

    • request: This is the URL used to obtain the initial unauthenticated (request) token. OAuth 2.0 does not use it, so you can leave it NULL here, since the GitHub API is based on that protocol.

    • authorize: This is the URL where it is possible to gain authorization for the given client.

    • access: This is the URL where the exchange for an authenticated token is made.

    • base_url: This is an optional base URL. When it is given, any relative path passed to request, authorize, or access is appended to it, while full URLs are kept as they are.

      In the GitHub example, this will translate to the following line of code:

      github_api <- oauth_endpoint(request   = NULL,
                                   authorize = "https://github.com/login/oauth/authorize",
                                   access    = "https://github.com/login/oauth/access_token",
                                   base_url  = "https://github.com/login/oauth")

  2. Create an application to get a key and secret token. Moving on with our GitHub example, in order to create an application, you will have to navigate to https://github.com/settings/applications/new (assuming that you are already authenticated on GitHub).

    Be aware that no particular URL is needed as the homepage URL, but a specific URL is required as the authorization callback URL.

    This is the URL that the API will redirect to after the method invocation is done.

    As you would expect, since we want to establish a connection from GitHub to our local PC, you will have to redirect the API to your machine, setting the Authorization callback URL to http://localhost:1410.

    After creating your application, you can get back to your R session to establish a connection with it and get your data.

  3. Back in your R session, set your OAuth credentials through the oauth_app() and oauth2.0_token() functions and establish a connection with the API, as shown in the following code snippet:

    app <- oauth_app("your_app_name",
                     key    = "your_app_key",
                     secret = "your_app_secret")
    API_token <- oauth2.0_token(github_api, app)

  4. This is where you actually use the API to get data from your web-based software. Continuing on with our GitHub-based example, let's request some information about API rate limits:

    request <- GET("https://api.github.com/rate_limit", config(token = API_token))
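Putting steps 1 to 4 together, the sketch below shows one way to wrap the flow in a function. It assumes the httr package, and the app name, key, and secret are placeholders you must replace with your own application's values. The helper name get_rate_limit() is hypothetical. The token and request lines need a browser and network access, so they only run in an interactive session. content() then parses the JSON answer into an R list.

```r
library(httr)

# Step 1: endpoint definition (same authorize/access URLs as above)
github_api <- oauth_endpoint(request   = NULL,
                             authorize = "https://github.com/login/oauth/authorize",
                             access    = "https://github.com/login/oauth/access_token")

# Step 3: placeholder credentials, to be replaced with your application's values
app <- oauth_app("your_app_name",
                 key    = "your_app_key",
                 secret = "your_app_secret")

# Steps 3 and 4 (hypothetical helper): get a token, call the API, parse the answer
get_rate_limit <- function(endpoint, app) {
  token   <- oauth2.0_token(endpoint, app)   # opens a browser for authorization
  request <- GET("https://api.github.com/rate_limit", config(token = token))
  stop_for_status(request)                   # turn HTTP errors into R errors
  content(request, as = "parsed")            # JSON body as a nested R list
}

# Only attempt the browser-based authorization in an interactive session
if (interactive()) {
  limits <- get_rate_limit(github_api, app)
  str(limits$resources$core)
}
```

Keeping the endpoint and app definitions outside the function makes them reusable for other requests. The interactive() guard simply prevents the authorization dance from starting when the script runs unattended.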

How it works...

Be aware that defining the endpoint is required for both OAuth 1.0 and OAuth 2.0 APIs. The only difference between them is that OAuth 2.0 does not use a request URL, as we noted earlier.

Note

Endpoints for popular APIs

The httr package comes with a set of endpoints that are already implemented for popular APIs, and specifically for the following websites:

  • LinkedIn

  • Twitter

  • Vimeo

  • Google

  • Facebook

  • GitHub

For these APIs, you can substitute the call to oauth_endpoint() with a call to the oauth_endpoints() function, for instance:

oauth_endpoints("github")
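As a quick check of the built-in endpoints and of the OAuth 1.0/2.0 difference mentioned earlier, the sketch below compares two of them. It assumes only that httr is installed, and nothing is sent over the network. Twitter's endpoint (an OAuth 1.0 provider) carries a request URL, while GitHub's (OAuth 2.0) does not.

```r
library(httr)

github  <- oauth_endpoints("github")    # OAuth 2.0 provider
twitter <- oauth_endpoints("twitter")   # OAuth 1.0 provider

# OAuth 2.0 needs no request URL; OAuth 1.0 does
print(is.null(github$request))
print(is.null(twitter$request))

# The built-in GitHub endpoint points to the same URLs used in step 1
print(github$authorize)
print(github$access)
```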

The core purpose of the OAuth protocol is secure authentication. On the client side, this is provided through a key and a secret, which must be kept private.

The typical way to get a key and a secret token to access an API involves creating an app within the service providing the API.
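Because the key and secret must stay private, one common practice is to keep them out of your scripts and read them from environment variables (for example, set in a ~/.Renviron file, which R reads at startup). The sketch below follows that practice. The variable names GITHUB_KEY and GITHUB_SECRET and the helper app_from_env() are hypothetical choices for illustration, not an httr convention.

```r
library(httr)

# Hypothetical helper: build an oauth_app from environment variables
app_from_env <- function(appname,
                         key_var    = "GITHUB_KEY",
                         secret_var = "GITHUB_SECRET") {
  key    <- Sys.getenv(key_var)
  secret <- Sys.getenv(secret_var)
  # Sys.getenv() returns "" for unset variables, so fail early with a clear message
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Please set ", key_var, " and ", secret_var, " (e.g. in ~/.Renviron)")
  }
  oauth_app(appname, key = key, secret = secret)
}
```

With the two variables set, app_from_env("your_app_name") can replace the hard-coded call to oauth_app() in step 3.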

The callback URL

Within the web API domain, a callback URL is the URL that the API redirects to after it has answered the request. A typical example of a callback URL is the URL of the page you are sent to after completing an online purchase.

In this example, when we finish at the checkout on the online store, an API call is made to the payment circuit provider.

After completing the payment operation, the API will navigate again to the online store at the callback URL, usually to a thank you page.
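In the GitHub example, the role of the thank you page is played by your own machine. During authorization, httr listens locally and, by default, expects the provider to redirect to the URL returned by oauth_callback(), which is why step 2 sets the callback URL to http://localhost:1410. A minimal check, assuming httr's default settings:

```r
library(httr)

# The redirect URI that httr uses by default during OAuth authorization
callback <- oauth_callback()
print(callback)

# Split the URL into its parts to see where the provider will redirect
parts <- parse_url(callback)
print(parts$hostname)
print(parts$port)
```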

There's more...

You can also write custom functions to handle APIs. When frequently dealing with a particular API, it can be useful to define a set of custom functions in order to make it easier to interact with.

Basically, interactions with an API fall into the following three categories:

  • Authentication

  • Getting content from the API

  • Posting content to the API

Authentication can be handled by leveraging the httr package's authenticate() function and writing a function as follows:

library(httr)

api_auth <- function(user = "api_user", password) {
  # Build a basic-authentication config object to pass to GET()/POST()
  authenticate(user = user, password = password)
}

You can get content from the API through the GET() function of the httr package:

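api_get <- function(path = "api_path", user = "api_user", password) {
  auth <- api_auth(user, password)
  # path is passed on to modify_url(); the unnamed auth is used as config
  request <- GET("https://api.com", path = path, auth)
  stop_for_status(request)   # raise an R error on HTTP failure
  content(request)           # parse the response body
}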

Posting content works in a similar way, through the POST() function:

api_post <- function(post_body, path = "api_path", user = "api_user", password) {
  auth <- api_auth(user, password)
  stopifnot(is.list(post_body))
  # encode = "json" makes httr serialize the list to JSON with jsonlite
  request <- POST("https://api.application.com", path = path,
                  body = post_body, encode = "json", auth)
  stop_for_status(request)
  content(request)
}
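Before pointing these helpers at a real service, it can help to inspect what they would send without contacting any server. The sketch below assumes only httr and jsonlite. The user name, password, path, and body are placeholders, and https://api.com is the same dummy base URL used in api_get().

```r
library(httr)
library(jsonlite)

# Placeholder credentials: authenticate() only builds a config object, nothing is sent
auth <- authenticate(user = "api_user", password = "api_password")

# The URL that GET()/POST() build from the base URL plus the path argument
url <- modify_url("https://api.com", path = "api_path")
print(url)

# Roughly the JSON body that POST(..., encode = "json") sends for this list
post_body <- list(title = "hello", draft = FALSE)
body_json <- toJSON(post_body, auto_unbox = TRUE)
print(body_json)
```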