Book Image

R for Data Science Cookbook (n)

By : Yu-Wei, Chiu (David Chiu)
Book Image

R for Data Science Cookbook (n)

By: Yu-Wei, Chiu (David Chiu)

Overview of this book

This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently. The first section deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We also focus on “ggplot2” and show you how to create advanced figures for data exploration. In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters offer insight into time series analysis on financial data, while there is detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction. By the end of this book, you will understand how to resolve issues and will be able to comfortably offer solutions to problems encountered while performing data analysis.
Table of Contents (19 chapters)
R for Data Science Cookbook
About the Author
About the Reviewer

Understanding environments

Besides the function name, body, and formal arguments, the environment is another basic component of a function. In a nutshell, the environment is where R manages and stores different types of variables. Besides the global environment, each function activates its environment whenever a new function is created. In this recipe, we will show you how the environment of each function works.

Getting ready

Ensure that you completed the previous recipes by installing R on your operating system.

How to do it...

Perform the following steps to work with the environment:

  1. First, you can examine the current environment with the environment function:

    <environment: R_GlobalEnv>
  2. You can also browse the global environment with .GlobalEnv and globalenv:

    > .GlobalEnv
    <environment: R_GlobalEnv>
    <environment: R_GlobalEnv>
  3. You can compare the environment with the identical function:

    >identical(globalenv(), environment())
    [1] TRUE
  4. Furthermore, you can create a new environment as follows:

    >myenv<- new.env()
    <environment: 0x0000000017e3bb78>
  5. Next, you can find the variables of different environments:

    >myenv$x<- 3
    [1] "x"
    [1] "myenv"
    Error: object 'x' not found
  6. At this point, you can create an addnum function and use environment to get the environment of the function:

    >addnum<- function(x, y){
    + x+y
    + }
    <environment: R_GlobalEnv>
  7. You can also determine that the environment of a function belongs to the package:

    <environment: namespace:stats>
  8. Moving on, you can print the environment within a function:

    >addnum2<- function(x, y){
    + print(environment())
    + x+y
    + }
    <environment: 0x0000000018468710>
    [1] 5
  9. Furthermore, you can compare the environment inside and outside a function:

    >addnum3<- function(x, y){
    + func1<- function(x){
    + print(environment())
    + }
    + func1(x)
    + print(environment())
    + x + y
    + }
    <environment: 0x000000001899beb0>
    <environment: 0x000000001899cc50>
    [1] 7

How it works...

We can regard an R environment as a place to store and manage variables. That is, whenever we create an object or a function in R, we add an entry to the environment. By default, the top-level environment is the R_GlobalEnv global environment, and we can determine the current environment using the environment function. Then, we can use either .GlobalEnv or globalenv to print the global environment, and we can compare the environment with the identical function.

Besides the global environment, we can actually create our environment and assign variables into the new environment. In the example, we created the myenv environment and then assigned x <- 3 to myenv by placing a dollar sign after the environment name. This allows us to use the ls function to list all variables in myenv and global environment. At this point, we find x in myenv, but we can only find myenv in the global environment.

Moving on, we can determine the environment of a function. By creating a function called addnum, we can use environment to get its environment. As we created the function under global environment, the function obviously belongs to the global environment. On the other hand, when we get the environment of the lm function, we get the package name instead. That means that the lm function is in the namespace of the stat package.

Furthermore, we can print out the current environment inside a function. By invoking the addnum2 function, we can determine that the environment function outputs a different environment name from the global environment. That is, when we create a function, we also create a new environment for the global environment and link a pointer to its parent environment. To further examine this characteristic, we create another addnum3 function with a func1 nested function inside. At this point, we can print out the environment inside func1 and addnum3, and it is possible that they have completely different environments.

There's more...

To get the parent environment, we can use the parent.env function. In the following example, we can see that the parent environment of parentenv is R_GlobalEnv:

>parentenv<- function(){
+ e <- environment()
+ print(e)
+ print(parent.env(e))
+ }
<environment: 0x0000000019456ed0>
<environment: R_GlobalEnv>