This section assumes that you are at least familiar with the basics of R or have worked with R before, so we won't spend much time on downloading and installation; plenty of resources on the web cover this. I recommend that you use RStudio, an **Integrated Development Environment** (**IDE**) that is much more convenient than the base R **Graphical User Interface** (**GUI**). You can visit https://www.rstudio.com/ to get more information about it.

### Note

For an overview of the language, you can visit the R project at https://www.r-project.org/. R also has a vast collection of packages, and you can browse them, along with everything else related to R, at https://cran.r-project.org/, which hosts the archives.

You must already be familiar with the R interactive interpreter, often called a **Read-Evaluate-Print Loop** (**REPL**). Like any command-line interface, it asks for input and shows a `>` prompt, which indicates that R is waiting for your input. If your input spans multiple lines, as when you are writing a function, you will see a `+` prompt on each subsequent line, which means that the expression is not yet complete and R is waiting for you to provide the rest of it.

R can also read and execute complete files containing commands and functions, which are saved with an `.R` extension. Usually, any big application consists of several `.R` files. Each file has its own role in the application and is often called a module. We will be exploring some of the main features and capabilities of R in the following sections.
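As a minimal sketch of running a whole file, the snippet below writes a two-line script to a temporary location and executes it with the base R function `source()`. The file name `helpers.R` and the names `square` and `answer` are hypothetical, chosen only for illustration.

```r
# Write a tiny script to a temporary file (hypothetical file name)
script_path <- file.path(tempdir(), "helpers.R")
writeLines(c(
  "square <- function(v) v ^ 2",
  "answer <- square(4)"
), script_path)

# source() reads the file and evaluates its expressions in order
source(script_path)

answer  # 16
```

Outside an interactive session, a script like this can typically be run from a shell with `Rscript helpers.R`.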

The most basic constructs in R include variables and arithmetic operators which can be used to perform simple mathematical operations like a calculator or even complex statistical calculations.

    > 5 + 6
    [1] 11
    > 3 * 2
    [1] 6
    > 1 / 0
    [1] Inf

Remember that even a single number in R is a vector, and so are the results printed in the previous code snippet. The leading `[1]` is the index of the first element printed on that line; here, each result is a vector of length `1`.
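The following sketch checks this directly with base R functions. The variable names `single` and `long` are only illustrative.

```r
single <- 5 + 6

is.vector(single)  # TRUE
length(single)     # 1
single[1]          # 11

# With a longer vector, the output wraps on most consoles, and the
# bracketed number at the start of each printed line is the index of
# the first element on that line.
long <- 100:130
print(long)
```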

You can also assign values to variables and operate on them just like any other programming language.

    > num <- 6
    > num ^ 2
    [1] 36
    > num    # a variable changes value only on re-assignment
    [1] 6
    > num <- num ^ 2 * 5 + 10 / 3
    > num
    [1] 183.3333
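The last assignment relies on operator precedence: `^` is applied first, then `*` and `/`, and finally `+`. As a rough illustration, the same value can be built step by step. The intermediate names are hypothetical.

```r
num <- 6

step_power    <- num ^ 2         # 36
step_product  <- step_power * 5  # 180
step_quotient <- 10 / 3          # 3.333333
result        <- step_product + step_quotient

# Same value as the one-line expression used in the text
all.equal(result, num ^ 2 * 5 + 10 / 3)  # TRUE
```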

The most basic data structure in R is a vector. Even a single number is a vector of length one, as we saw in the earlier example. A vector is an ordered sequence of values of the same type. We can create vectors using the `:` operator, which generates a sequence of consecutive integers, or the `c` function, which combines its arguments into a vector.

    > x <- 1:5
    > x
    [1] 1 2 3 4 5
    > y <- c(6, 7, 8, 9, 10)
    > y
    [1]  6  7  8  9 10
    > z <- x + y
    > z
    [1]  7  9 11 13 15
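To make the element-wise behaviour of `+` concrete, here is a small sketch that performs the same addition with an explicit loop and compares the two results. The names `z_loop` and `z_vec` are illustrative only.

```r
x <- 1:5
y <- c(6, 7, 8, 9, 10)

# Explicit loop: add the elements one position at a time
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Whole-vector form: the + operator works on all elements at once
z_vec <- x + y

identical(z_loop, z_vec)  # TRUE
```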

Notice that in the previous code snippet, `x + y` added two vectors element by element without any loop, using just the `+` operator. This is known as vectorization, and we will discuss it in more detail later on. Some more operations on vectors are shown next:

    > c(1,3,5,7,9) * 2
    [1]  2  6 10 14 18
    > c(1,3,5,7,9) * c(2, 4)    # here the second vector gets recycled
    [1]  2 12 10 28 18

    > factorial(1:5)
    [1]   1   2   6  24 120
    > exp(2:10)    # exponential function
    [1]     7.389056    20.085537    54.598150   148.413159   403.428793  1096.633158
    [7]  2980.957987  8103.083928 22026.465795
    > cos(c(0, pi/4))    # cosine function
    [1] 1.0000000 0.7071068
    > sqrt(c(1, 4, 9, 16))
    [1] 1 2 3 4
    > sum(1:10)
    [1] 55

The second multiplication may look confusing: we multiplied a longer vector by a shorter one and still got a result. If you run it, you will see that R also issues a warning, because the longer length (5) is not a multiple of the shorter length (2). Since the two vectors were not equal in size, the shorter vector, `c(2, 4)`, got recycled (repeated) to become `c(2, 4, 2, 4, 2)` and was then multiplied element-wise with the first vector, `c(1, 3, 5, 7, 9)`, to give the final result, `c(2, 12, 10, 28, 18)`. The other functions shown here, such as `factorial`, `exp`, `cos`, `sqrt`, and `sum`, are standard functions available in base R, along with many others.
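The recycling step can be spelled out explicitly. The sketch below uses `rep()` with `length.out` to build the repeated vector by hand. This only illustrates the rule and is not how R implements it internally. The variable names are hypothetical.

```r
long_vec  <- c(1, 3, 5, 7, 9)
short_vec <- c(2, 4)

# Repeat the shorter vector until it matches the longer one
recycled <- rep(short_vec, length.out = length(long_vec))
recycled             # 2 4 2 4 2

long_vec * recycled  # 2 12 10 28 18

# When the longer length is a multiple of the shorter one,
# recycling happens silently, without a warning
c(1, 3, 5, 7) * c(2, 4)  # 2 12 10 28
```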

Since you will be dealing with a lot of messy and dirty data in data analysis and machine learning, it is important to remember some of the special values in R so that you don't get too surprised later on if one of them pops up.

    > 1 / 0
    [1] Inf
    > 0 / 0
    [1] NaN
    > Inf / NaN
    [1] NaN
    > Inf / Inf
    [1] NaN
    > log(Inf)
    [1] Inf
    > Inf + NA
    [1] NA

The main values which should concern you here are `Inf`, which stands for **Infinity**; `NaN`, which is **Not a Number**; and `NA`, which indicates a value that is missing or **Not Available**. The following code snippet shows some logical tests on these special values and their results. Do remember that `TRUE` and `FALSE` are logical data type values, similar to Boolean values in other programming languages.

    > vec <- c(0, Inf, NaN, NA)
    > is.finite(vec)
    [1]  TRUE FALSE FALSE FALSE
    > is.nan(vec)
    [1] FALSE FALSE  TRUE FALSE
    > is.na(vec)
    [1] FALSE FALSE  TRUE  TRUE
    > is.infinite(vec)
    [1] FALSE  TRUE FALSE FALSE

The functions are largely self-explanatory from their names: `is.finite` and `is.infinite` indicate which values are finite and which are infinite, while `is.nan` and `is.na` check for `NaN` and `NA` values respectively. Note that `is.na` also returns `TRUE` for `NaN`, as the output shows. Some of these functions are very useful when cleaning dirty data.
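As a hedged illustration of that cleaning step, the sketch below keeps only the finite entries of a small vector before summarising it. The `readings` data is made up for the example.

```r
# Hypothetical sample data containing special values
readings <- c(2.5, NA, 4.0, Inf, NaN, 3.5)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this keeps usable numbers only
clean <- readings[is.finite(readings)]
clean                    # 2.5 4.0 3.5

# The summary of the raw vector is itself missing
is.na(mean(readings))    # TRUE

mean(clean)              # 3.333333

# is.na() counts both NA and NaN entries
sum(is.na(readings))     # 2
```

Filtering with `is.finite()` is only one option. Whether dropping such values is appropriate depends on the data and the analysis.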