RStudio for R Statistical Computing Cookbook

By: Andrea Cirillo

Overview of this book

The requirement of handling complex datasets, performing unprecedented statistical analysis, and providing real-time visualizations to businesses has concerned statisticians and analysts across the globe. RStudio is a useful and powerful tool for statistical analysis that harnesses the power of R for computational statistics, visualization, and data science, in an integrated development environment. This book is a collection of recipes that will help you learn and understand RStudio features so that you can effectively perform statistical analysis and reporting, code editing, and R development. The first few chapters will teach you how to set up your own data analysis project in RStudio, acquire data from different data sources, and manipulate and clean data for analysis and visualization purposes. You'll get hands-on with various data visualization methods using ggplot2, and you will create interactive and multidimensional visualizations with D3.js. Additional recipes will help you optimize your code; implement various statistical models to manage large datasets; perform text analysis and predictive analysis; and master time series analysis, machine learning, forecasting; and so on. In the final few chapters, you'll learn how to create reports from your analytical application with the full range of static and dynamic reporting tools that are available in RStudio so that you can effectively communicate results and even transform them into interactive web applications.

Getting data from Google Analytics


Google Analytics is a powerful analytics solution that gives you detailed insights into how your online content is performing. However, beyond tabular reports and a data visualization tool, it offers no instruments for modeling your data and gaining deeper insights.

This is where R comes in, and this is why the RGoogleAnalytics package was developed: to provide a convenient way to extract data from Google Analytics into an R environment.

As an example, we will import data from Google Analytics into R regarding the daily bounce rate for a website in a given time range.

Getting ready

As a preliminary step, we are going to install and load the RGoogleAnalytics package:

install.packages("RGoogleAnalytics")
library(RGoogleAnalytics)
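
If this recipe is rerun on a machine where the package may already be installed, the install step can be guarded so that nothing is downloaded unnecessarily. The following is only a minimal sketch: `load_or_install` is a hypothetical helper name, not part of RGoogleAnalytics, and it is demonstrated on `stats` (which ships with R) so that it runs without a network connection.

```r
# Hypothetical helper: install a package only when it is missing, then attach it.
load_or_install <- function(pkg) {
  # requireNamespace() returns TRUE or FALSE without attaching the package.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# Demonstrated on "stats", which is always available, so nothing is downloaded here.
# For this recipe the call would be load_or_install("RGoogleAnalytics").
load_or_install("stats")
```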

How to do it...

  1. The first step that is required to get data from Google Analytics is to create a Google Analytics application.

    Assuming that you are already logged in to Google Analytics, you can create one at https://console.developers.google.com/apis.

    After creating a new project, you will see a dashboard with a left menu containing among others the APIs & auth section, with the APIs subsection.

    After selecting this section, you will see a list of available APIs. Among these, at the bottom-left corner of the page, you will find the Advertising APIs group, which contains the Analytics API.

    After enabling the API, you will have to go back to the APIs & auth section and select the Credentials subsection.

    In this section, you will have to add an OAuth client ID, select Other, and assign a name to your app.

    After doing that and selecting the Create button, you will be prompted with a window showing your app ID and secret. Take note of them, as you will need them to access the analytics API from R.

  2. To authenticate with the API, we will use the Auth() function, providing the ID and secret you noted in the previous step:

    ga_token <- Auth(client.id = "the_ID", client.secret = "the_secret")

    At this point, a browser window will open up and ask you to allow access permission from the app to your Google Analytics account.

    After you allow access, the R console will print out the following:

    Authentication complete

  3. This last step requires you to build a proper query and submit it through the connection established in the previous steps. A Google Analytics query can be built with the Google Query Explorer, which can be found at https://ga-dev-tools.appspot.com/query-explorer/.

    This web tool lets you experiment with query parameters and define your query before submitting the request from your code.

    The basic fields that are mandatory in order to execute a query are as follows:

    • The view ID: This is a unique identifier associated with a view of your Google Analytics property. This ID will automatically show up within Google Query Explorer. In the code that follows, it is passed as table.id with the ga: prefix.

    • Start-date and end-date: These are the start and end dates of the query, in the form YYYY-MM-DD, for example, 2012-05-12.

    • Metrics: This refers to the ratios and numbers computed from the data related to visits within the date range. You can find the metric codes in Google Query Explorer.

    If you are going to further elaborate your data within your data project, you will probably find it useful to add a date dimension ("ga:date") in order to split your data by date.

    Having defined your arguments, you just have to pack them in a list using the Init() function, build a query using the QueryBuilder() function, and submit it with the GetReportData() function:

    query_parameters <- Init(start.date = "2015-01-01",
                             end.date   = "2015-06-30",
                             metrics    = "ga:sessions,ga:bounceRate",
                             dimensions = "ga:date",
                             table.id = "ga:33093633")
    ga_query <- QueryBuilder(query_parameters)
    ga_df <- GetReportData(ga_query, ga_token)
    

    A first look at this data could be a simple plot, giving a view of how the bounce rate varies by day between the start date and the end date:

    plot(ga_df)
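
Calling plot() on a data frame with more than two columns draws a scatterplot matrix of every pair of columns. To focus on the daily bounce rate alone, the date column can be converted to Date objects and plotted against the bounce rate. The sketch below uses a small made-up data frame, `ga_demo`, in place of ga_df. It assumes the columns are named date, sessions, and bounceRate and that dates arrive as YYYYMMDD text; check names(ga_df) and str(ga_df) on your own result before adapting it.

```r
# Stand-in for ga_df: made-up values, NOT real Google Analytics data.
# Assumed layout: date as "YYYYMMDD" text, plus sessions and bounceRate.
ga_demo <- data.frame(
  date       = c("20150103", "20150101", "20150102"),
  sessions   = c(143, 120, 95),
  bounceRate = c(45.5, 41.7, 38.9),
  stringsAsFactors = FALSE
)

# Convert the text dates to Date objects so the x axis is a real time axis.
ga_demo$date <- as.Date(ga_demo$date, format = "%Y%m%d")

# Sort by date so the line is drawn in chronological order.
ga_demo <- ga_demo[order(ga_demo$date), ]

# Draw the bounce rate per day as points joined by lines.
plot(ga_demo$date, ga_demo$bounceRate,
     type = "b",
     xlab = "Date",
     ylab = "Bounce rate (%)",
     main = "Daily bounce rate")
```

The same three steps (convert, sort, plot) should carry over to ga_df once its column names and types have been confirmed.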
    
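
If the recipe is going to be rerun or shared, the ID and secret passed to Auth() in step 2 need not be typed into the script itself. The following is a minimal sketch, assuming they are stored in environment variables; the variable names GA_CLIENT_ID and GA_CLIENT_SECRET and the helper `read_ga_credentials` are hypothetical and not part of RGoogleAnalytics.

```r
# Hypothetical helper: read the client ID and secret from environment variables.
# The variable names are assumptions; set them outside the script, for example
# in a ~/.Renviron file.
read_ga_credentials <- function(id_var = "GA_CLIENT_ID",
                                secret_var = "GA_CLIENT_SECRET") {
  id <- Sys.getenv(id_var, unset = "")
  secret <- Sys.getenv(secret_var, unset = "")
  # Fail early with a clear message if either value is missing.
  if (!nzchar(id) || !nzchar(secret)) {
    stop("Set ", id_var, " and ", secret_var, " before authenticating.")
  }
  list(client.id = id, client.secret = secret)
}

# Placeholder values so the sketch runs on its own; real values belong
# outside the script.
Sys.setenv(GA_CLIENT_ID = "the_ID", GA_CLIENT_SECRET = "the_secret")
creds <- read_ga_credentials()

# With RGoogleAnalytics loaded, the values would then be passed to Auth()
# exactly as in step 2:
# ga_token <- Auth(client.id = creds$client.id,
#                  client.secret = creds$client.secret)
```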

There's more...

Google Analytics is a complete and always-growing set of tools for performing web analytics tasks. If you are facing a project involving the use of this platform, I would definitely suggest that you take the time to go through the official tutorial from Google at https://analyticsacademy.withgoogle.com.

This complete set of tutorials will introduce you to the fundamental logic and assumptions of the platform, giving you a solid foundation for any subsequent analysis.