R Deep Learning Cookbook

By: PKS Prakash, Achyutuni Sri Krishna Rao

Overview of this book

Deep learning is the next big thing. It is a part of machine learning, and its favorable results in applications with huge and complex data are remarkable. At the same time, the R programming language is very popular among data miners and statisticians. This book will help you work through the problems you face when carrying out different tasks, and understand hacks in deep learning, neural networks, and advanced machine learning techniques. It also takes you through complex deep learning algorithms and the various deep learning packages and libraries in R. It starts with the different deep learning packages and moves on to neural networks and their structures. You will also encounter applications in text mining and processing, along with a comparison of CPU and GPU performance. By the end of the book, you will have a logical understanding of deep learning and of the different deep learning packages, so that you can choose the most appropriate solutions for your problems.

Analyzing documents using tf-idf


In this section, we will learn how to analyze documents quantitatively. A simple way is to look at the distribution of unigrams (single words) across a document and their frequency of occurrence, known as term frequency (tf). Words with a higher frequency of occurrence generally tend to dominate the document.
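
As a minimal sketch of term frequency, the base R snippet below counts unigrams in a single document. The sample sentence and the variable names (`doc`, `tokens`, `tf_counts`, `tf_relative`) are made up for illustration and are not taken from the book; splitting on whitespace is also a simplifying assumption, since a real tokenizer would handle punctuation as well.

```r
# Hypothetical one-sentence document, used only for illustration.
doc <- "the cat sat on the mat and the cat slept"

# Lower-case the text and split on whitespace to get unigram tokens.
tokens <- strsplit(tolower(doc), "\\s+")[[1]]

# Count each distinct word, then sort so the most frequent come first.
tf_counts <- sort(table(tokens), decreasing = TRUE)

# Relative term frequency: count divided by the total number of tokens.
tf_relative <- tf_counts / length(tokens)

print(tf_counts)
print(round(tf_relative, 2))
```

In this toy document the most frequent token is "the", which says little about the content. That is the weakness of raw term frequency.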

However, this does not hold for commonly occurring words such as the, is, of, and so on; these are removed using stop word dictionaries. Apart from these stop words, there may be other words that are frequent but carry little relevance. Such words are penalized using their inverse document frequency (idf) values: the more documents in the corpus a word occurs in, the more heavily it is penalized.
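
The two steps described above, stop word removal followed by idf weighting, can be sketched in base R as follows. The three-document corpus and the four-word stop list are hypothetical, and the formula idf = log(N / df) with the natural logarithm is one common convention assumed here; other variants (different log base, smoothing) exist, and the book's own recipe may use a different one.

```r
# Hypothetical three-document corpus, already tokenised for brevity.
corpus <- list(
  d1 = c("the", "cat", "sat", "on", "the", "mat"),
  d2 = c("the", "dog", "chased", "the", "cat"),
  d3 = c("the", "stock", "market", "fell")
)

# A tiny illustrative stop-word list; real dictionaries are far longer.
stop_words <- c("the", "is", "of", "on")

# Step 1: drop stop words from every document.
filtered <- lapply(corpus, function(tokens) tokens[!tokens %in% stop_words])

# Step 2: document frequency = number of documents containing each term.
vocab <- sort(unique(unlist(filtered)))
doc_freq <- sapply(vocab, function(term) {
  sum(sapply(filtered, function(tokens) term %in% tokens))
})

# Step 3: idf = log(N / df); terms found in more documents score lower.
idf <- log(length(filtered) / doc_freq)

print(round(idf, 3))
```

Here "cat" appears in two of the three documents, so it receives a lower idf than words that appear in only one.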

Note

The tf-idf statistic combines these two quantities (by multiplication) and measures how important or relevant each word is to a given document within a collection of documents (a corpus).
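
To make the multiplication concrete, here is a small base R sketch that builds a document-by-term tf-idf matrix. It is an illustration under stated assumptions rather than the book's recipe: the corpus is hypothetical, tf is taken as the relative frequency within each document, and idf is log(N / df) with the natural logarithm.

```r
# Hypothetical tokenised corpus (stop words already removed).
corpus <- list(
  d1 = c("cat", "sat", "mat", "cat"),
  d2 = c("dog", "chased", "cat"),
  d3 = c("stock", "market", "fell")
)
vocab <- sort(unique(unlist(corpus)))

# tf: one row per document, one column per term, relative frequencies.
tf <- t(vapply(corpus, function(tokens) {
  as.numeric(table(factor(tokens, levels = vocab))) / length(tokens)
}, numeric(length(vocab))))
colnames(tf) <- vocab

# idf: log of (number of documents / documents containing the term).
idf <- log(length(corpus) / colSums(tf > 0))

# tf-idf: scale each tf column by the idf of its term.
tf_idf <- sweep(tf, 2, idf, `*`)

print(round(tf_idf, 3))
```

In document d1, "cat" is the most frequent term, yet its tf-idf score is lower than that of "sat" because "cat" also occurs in d2. Terms absent from a document get a score of zero.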

In this section, we will generate a tf-idf matrix...