## Chapter 5: Data Comparison Methods

### Activity 11: Create an Image Signature for a Photograph of a Person

Solution:

Download the Borges photo to your computer and save it as **borges.jpg**. Make sure that it is saved in R's working directory. If it is not, change R's working directory using the **setwd()** function. Then, you can load this image into a variable called **im** (short for image), as follows:

```r
install.packages('imager')  # only needed once
library('imager')
filepath <- 'borges.jpg'
im <- imager::load.image(file = filepath)
```

The rest of the code we will explore will use this image, called **im**. Here, we have loaded a photograph of Borges into **im**. However, you can run the rest of the code on any image, simply by saving the image to your working directory and specifying its path in the **filepath** variable.

The signature we are developing is meant to be used for grayscale images, so we will convert this image to grayscale using functions in the **imager** package:

```r
im <- imager::rm.alpha(im)   # drop the alpha (transparency) channel, if present
im <- imager::grayscale(im)  # convert to grayscale
im <- imager::imsplit(im, axis = "x", nb = 10)  # split into 10 vertical strips
```

The second line of this code is the conversion to grayscale. The last line splits the image along the x axis into 10 equal vertical strips, returned as a list of images.
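To make the grayscale step concrete, here is a minimal base-R sketch of what converting color to grayscale means: each pixel's red, green, and blue values are combined into a single brightness value by a weighted sum. The helper name `to_gray` and the Rec. 709 luma weights are assumptions for illustration; **imager**'s `grayscale()` may use different weights internally.

```r
# Hypothetical helper, not part of imager: weighted-sum grayscale conversion.
# rgb_array: numeric array of dimension (width, height, 3) with values in [0, 1],
# channels ordered red, green, blue.
# The Rec. 709 luma weights below are an assumption; imager may differ.
to_gray <- function(rgb_array, weights = c(0.2126, 0.7152, 0.0722)) {
  d <- dim(rgb_array)
  stopifnot(length(d) == 3, d[3] == 3)
  gray <- weights[1] * rgb_array[, , 1] +
          weights[2] * rgb_array[, , 2] +
          weights[3] * rgb_array[, , 3]
  matrix(gray, nrow = d[1], ncol = d[2])  # keep a width x height shape
}

# A 2 x 1 image: one pure-white pixel and one pure-green pixel
img <- array(0, dim = c(2, 1, 3))
img[1, 1, ] <- c(1, 1, 1)  # white
img[2, 1, ] <- c(0, 1, 0)  # green
g <- to_gray(img)
g
```

Under these weights, white maps to (almost exactly) 1 and pure green to 0.7152, because green contributes most to perceived brightness.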

The following code creates an empty matrix that we will fill with information about each section of our 10x10 grid:

```r
matrix <- matrix(nrow = 10, ncol = 10)
```

Next, we will run the following loop. The first line of this loop uses the **imsplit** command, which we used previously to split the image along the x axis into 10 equal parts. This time, for each of the 10 x-axis strips, we split along the y axis, again into 10 equal parts, and store the mean brightness of each resulting cell:

```r
for (i in 1:10) {
  # split the i-th vertical strip into 10 cells along the y axis
  is <- imager::imsplit(im = im[[i]], axis = "y", nb = 10)
  for (j in 1:10) {
    # row j = y position, column i = x position
    matrix[j, i] <- mean(is[[j]])
  }
}
```
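The nested loop computes one number per grid cell: the mean brightness of the pixels in that cell. The following base-R sketch performs the same computation on a plain numeric matrix without **imager**, which can help when checking what the loop produces. The helper name `grid_means` and its band-assignment rule are assumptions; `imsplit` may place the cell boundaries slightly differently when the image size is not a multiple of the grid size.

```r
# Hypothetical helper: mean value of each cell in an n x n grid.
# img: numeric matrix of grayscale values, rows = y positions, columns = x positions.
grid_means <- function(img, n = 10) {
  # Assign each row index and each column index to one of n roughly equal bands
  row_band <- ceiling(seq_len(nrow(img)) * n / nrow(img))
  col_band <- ceiling(seq_len(ncol(img)) * n / ncol(img))
  out <- matrix(NA_real_, nrow = n, ncol = n)
  for (i in seq_len(n)) {    # i indexes x (columns), as in the loop above
    for (j in seq_len(n)) {  # j indexes y (rows)
      out[j, i] <- mean(img[row_band == j, col_band == i])
    }
  }
  out
}

# Toy "image": 100 x 100 pixels whose value equals the column (x) index
toy <- outer(1:100, 1:100, function(y, x) x)
cells <- grid_means(toy, n = 10)
cells[1, 1:3]                  # 5.5 15.5 25.5 (means of columns 1-10, 11-20, 21-30)
dim(grid_means(toy, n = 9))    # 9 9
```

Passing `n = 9` gives the 9x9 grid used later in this activity; with a 100-pixel side, the nine bands cannot all be exactly equal, so some contain one pixel more than others.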

The output so far is the **matrix** variable, which we will use in the next step.

Get the signature of the Borges photograph with the **get_signature()** function from the chapter by running the following code:

```r
borges_signature <- get_signature(matrix)
borges_signature
```

The output is as follows:

Next, we will calculate a signature using a 9x9 matrix instead of a 10x10 matrix. We start with the same process we used before. The following lines of code load our Borges image as we did previously. The final line splits the image into equal parts, but instead of 10 parts, we set **nb = 9** so that the image is split along the x axis into 9 equal parts:

```r
filepath <- 'borges.jpg'
im <- imager::load.image(file = filepath)
im <- imager::rm.alpha(im)
im <- imager::grayscale(im)
im <- imager::imsplit(im, axis = "x", nb = 9)
```

The following code creates an empty matrix that we will fill with information about each section of our 9x9 grid:

```r
matrix <- matrix(nrow = 9, ncol = 9)
```

Note that we use **nrow = 9** and **ncol = 9** so that we have a 9x9 matrix to fill with our brightness measurements.

Next, we will run the following loop. As before, its first line uses the **imsplit** command, which we used earlier to split the image along the x axis into 9 equal parts. This time, for each of the 9 x-axis strips, we split along the y axis, again into 9 equal parts:

```r
for (i in 1:9) {
  is <- imager::imsplit(im = im[[i]], axis = "y", nb = 9)
  for (j in 1:9) {
    matrix[j, i] <- mean(is[[j]])
  }
}
```

The output so far is the **matrix** variable. We now repeat the signature step to get a 9x9 signature of the Borges photograph:

```r
borges_signature_ninebynine <- get_signature(matrix)
borges_signature_ninebynine
```

The output is as follows:

### Activity 12: Create an Image Signature for the Watermarked Image

Solution:

Download the watermarked photo to your computer and save it as **alamo_marked.jpg**. Make sure that it is saved in R's working directory. If it is not, change R's working directory using the **setwd()** function. Then, you can load this image into a variable called **im** (short for image), as follows:

```r
install.packages('imager')  # only needed once
library('imager')
filepath <- 'alamo_marked.jpg'
im <- imager::load.image(file = filepath)
```

The rest of the code we will explore will use this image, called **im**. Here, we have loaded a watermarked picture of the Alamo into **im**. However, you can run the rest of the code on any image, simply by saving the image to your working directory and specifying its path in the **filepath** variable.

The signature we are developing is meant to be used for grayscale images, so we will convert this image to grayscale using functions in the **imager** package:

```r
im <- imager::rm.alpha(im)   # drop the alpha (transparency) channel, if present
im <- imager::grayscale(im)  # convert to grayscale
im <- imager::imsplit(im, axis = "x", nb = 10)  # split into 10 vertical strips
```

The second line of this code is the conversion to grayscale. The last line splits the image along the x axis into 10 equal vertical strips, returned as a list of images.

The following code creates an empty matrix that we will fill with information about each section of our 10x10 grid:

```r
matrix <- matrix(nrow = 10, ncol = 10)
```

Next, we will run the following loop. The first line of this loop uses the **imsplit** command, which we used earlier to split the image along the x axis into 10 equal parts. This time, for each of the 10 x-axis strips, we split along the y axis, again into 10 equal parts:

```r
for (i in 1:10) {
  is <- imager::imsplit(im = im[[i]], axis = "y", nb = 10)
  for (j in 1:10) {
    matrix[j, i] <- mean(is[[j]])
  }
}
```

The output so far is the **matrix** variable, which we will use in the next step.

We can get the signature of the watermarked photograph by running the following code:

```r
watermarked_signature <- get_signature(matrix)
watermarked_signature
```

The output is as follows:

The final output of this activity is the **watermarked_signature** variable, which is the analytic signature of the watermarked Alamo photo. If you have completed all of the exercises and activities so far, you should have three analytic signatures: **building_signature**, **borges_signature**, and **watermarked_signature**.

Now, we can compare the watermarked signature to our original Alamo signature by taking the mean absolute difference between them:

```r
comparison <- mean(abs(watermarked_signature - building_signature))
comparison
```

In this case, the result we get is 0.015, indicating a very close match between the original image signature and this new image's signature.
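The comparison is the mean absolute difference between corresponding entries of two signatures: the closer it is to 0, the more similar the images. The sketch below wraps that calculation in a hypothetical helper, `signature_distance`, and adds a length check, since a 10x10 signature such as **borges_signature** cannot meaningfully be compared entry by entry with a 9x9 one such as **borges_signature_ninebynine**. It assumes signatures are numeric vectors (or matrices), and the toy values are made up, not real signatures.

```r
# Hypothetical helper: mean absolute difference between two signatures.
# Assumes both signatures are numeric vectors (or matrices) of the same length.
signature_distance <- function(sig_a, sig_b) {
  if (length(sig_a) != length(sig_b)) {
    stop("signatures must have the same length to be compared")
  }
  mean(abs(sig_a - sig_b))
}

# Made-up toy signatures, for illustration only
sig_original <- c(0.10, 0.20, 0.30, 0.40)
sig_similar  <- c(0.11, 0.21, 0.29, 0.40)
sig_other    <- c(0.60, 0.05, 0.90, 0.10)

signature_distance(sig_original, sig_similar)  # small: near-identical images
signature_distance(sig_original, sig_other)    # much larger: different images
```

Failing loudly on a length mismatch is a deliberate choice: base R would otherwise recycle the shorter vector and silently return a meaningless number.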

We have seen that our analytic signature method returns similar signatures for similar images and different signatures for different images. This is exactly what we want from a signature, so we can judge this method a success.

### Activity 13: Performing Factor Analysis

Solution:

The data file can be downloaded from https://github.com/TrainingByPackt/Applied-Unsupervised-Learning-with-R/tree/master/Lesson05/Data/factor.csv. Save it to your computer and make sure that it is in R's working directory. If you save it as **factor.csv**, you can load it into R by executing the following command:

```r
factor <- read.csv('factor.csv')
```

Load the **psych** package as follows:

```r
# install.packages('psych')  # if it is not already installed
library(psych)
```

We will be performing factor analysis on the user ratings, which are recorded in columns 2 through 11 of the data. We can select these columns as follows:

```r
ratings <- factor[, 2:11]
```

Create a correlation matrix of the ratings data as follows:

```r
ratings_cor <- cor(ratings)
```

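Determine the number of factors to use by creating a scree plot. A scree plot is produced as one of the outputs of the following command:

```r
# Because we pass a correlation matrix rather than raw data, n.obs tells
# fa.parallel how many observations it came from, so that its simulated
# comparison data have the right size
parallel <- fa.parallel(ratings_cor, n.obs = nrow(ratings), fm = 'minres', fa = 'fa')
```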

The scree plot looks like the following:

The scree plot shows one factor whose eigenvalue is much higher than the others. While we are free to choose any number of factors in our analysis, a single factor that stands out this clearly is good reason to use one factor.
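The reasoning behind the scree plot can be reproduced in base R. The sketch below simulates ratings driven by a single hidden factor (synthetic data with made-up loadings, including one negative loading like **Category 10**), computes the eigenvalues of their correlation matrix, and compares them with eigenvalues from purely random data of the same size, which is the idea behind the parallel analysis that `fa.parallel()` performs. This is a simplified, principal-components version of the idea; `fa.parallel()` with `fa = 'fa'` works with factor-analysis eigenvalues and its own simulation settings, so its numbers will differ.

```r
set.seed(42)
n_obs <- 500
# Made-up loadings for 10 rating categories; the last one is negative
loadings <- c(rep(0.7, 9), -0.6)

# Simulate ratings = loading * latent factor + independent noise,
# with noise scaled so that every column has variance 1
latent <- rnorm(n_obs)
noise  <- matrix(rnorm(n_obs * 10), nrow = n_obs)
sim_ratings <- outer(latent, loadings) + noise %*% diag(sqrt(1 - loadings^2))

# Eigenvalues of the observed correlation matrix (the values a scree plot shows)
observed <- eigen(cor(sim_ratings), symmetric = TRUE, only.values = TRUE)$values

# Average eigenvalues from 20 random datasets of the same size
random <- rowMeans(replicate(20, {
  eigen(cor(matrix(rnorm(n_obs * 10), nrow = n_obs)),
        symmetric = TRUE, only.values = TRUE)$values
}))

# Retain the leading factors whose eigenvalue beats the random benchmark
n_factors <- sum(cumprod(observed > random))
round(observed, 2)
n_factors
```

Because the data were built from one factor, only the first eigenvalue clears the random benchmark. The eigenvalues of a 10-variable correlation matrix always sum to 10, so one dominant eigenvalue necessarily leaves the rest small.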

We can perform factor analysis as follows, specifying the number of factors in the **nfactors** parameter:

```r
factor_analysis <- fa(ratings_cor, nfactors = 1)
```

This stores the results of our factor analysis in a variable called **factor_analysis**. We can examine the results as follows:

```r
print(factor_analysis)
```

The output looks as follows:

The numbers under **MR1** show the loading of each category on our single factor. In a one-factor model, the correlation the model implies between two categories is the product of their loadings, so all of the categories with positive loadings on this factor are positively correlated with each other. We could interpret this factor as general positivity: it indicates that if people rate one category highly, they are likely to rate other categories highly too, and if they rate one category poorly, they are likely to rate other categories poorly.

The only major exception to this pattern is **Category 10**, which records users' average ratings of religious institutions. Its factor loading is large and negative, which indicates that people who rate most other categories highly tend to rate religious institutions poorly, and vice versa. So, perhaps we should interpret the factor as positivity about recreational activities instead, since religious institutions are arguably places of worship rather than recreation. It seems that, in this dataset, those who are positive about recreational activities are negative about worship, and vice versa. For categories whose loadings are close to 0, the factor explains little of their variation, so the positivity pattern holds only weakly for them. Factor analysis has thus enabled us to find relationships between the variables in our data that we had not previously suspected.
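The sign reasoning above follows from how a one-factor model works: off the diagonal, the correlation matrix it implies is the outer product of the loading vector with itself. The sketch below uses three made-up loadings (two positive and one negative, loosely mimicking two ordinary categories and **Category 10**); the names and values are illustrative, not the loadings from this dataset.

```r
# Made-up loadings: two positive categories and one negative one
lambda <- c(recreation_a = 0.7, recreation_b = 0.6, worship = -0.5)

# Implied correlations under a one-factor model: lambda_i * lambda_j off the diagonal
implied <- tcrossprod(lambda)
dimnames(implied) <- list(names(lambda), names(lambda))
diag(implied) <- 1  # each variable correlates perfectly with itself

round(implied, 2)
```

Two positive loadings give a positive implied correlation (0.42), while pairing a positive loading with a negative one gives a negative correlation (-0.35 and -0.30), which is the pattern described for Category 10. A loading near 0 would make every product involving that category near 0, which is why such categories are only weakly tied to the factor.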