Book Image

Learning R Programming

By : Kun Ren
Book Image

Learning R Programming

By: Kun Ren

Overview of this book

R is a high-level functional language and one of the must-know tools for data science and statistics. Powerful but complex, R can be challenging for beginners and those unfamiliar with its unique behaviors. Learning R Programming is the solution - an easy and practical way to learn R and develop a broad and consistent understanding of the language. Through hands-on examples you'll discover powerful R tools, and R best practices that will give you a deeper understanding of working with data. You'll get to grips with R's data structures and data processing techniques, as well as the most popular R packages to boost your productivity from the offset. Start with the basics of R, then dive deep into the programming techniques and paradigms to make your R code excel. Advance quickly to a deeper understanding of R's behavior as you learn common tasks including data analysis, databases, web scraping, high performance computing, and writing documents. By the end of the book, you'll be a confident R programmer adept at solving problems with the right techniques.
Table of Contents (21 chapters)
Learning R Programming
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Using regular expressions


For research, you may need to download data from open-access websites or authentication-required databases. These data sources provide data in various formats, and most of the data supplied are very likely well-organized. For example, many economic and financial databases provide data in the CSV format, which is a widely supported text format to represent tabular data. A typical CSV format looks like this:

id,name,score 
1,A,20 
2,B,30 
3,C,25

In R, it is convenient to call read.csv() to import a CSV file as a data frame with the right header and data types because the format is a natural representation of a data frame.

However, not all data files are well organized, and dealing with poorly organized data is painstaking. Built-in functions such as read.table() and read.csv() work in many situations, but they may not help at all for such format-less data.

For example, if you need to analyze raw data (messages.txt) organized in a CSV-like format as shown...