The Statistics and Machine Learning with R Workshop
Character strings, such as names and addresses, are common in real-life data. Analyzing string data requires properly cleaning the raw characters and converting the information embedded in a block of text into a quantifiable numeric summary. For example, we may want to find all student names that match a specific pattern.
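As a small illustration of turning text into a numeric summary, the following base R sketch finds student names that start with "A" and counts them. The `students` vector is hypothetical data made up for this example, and `grepl()` is used here as one possible way to detect a pattern, not necessarily the approach the rest of the chapter takes.

```r
# Hypothetical student names, used only for illustration
students <- c("Alice Smith", "Bob Jones", "Anna Brown", "Carl Adams")

# Detect names that start with "A"; the ^ anchors the match at the start of the string
starts_with_a <- grepl("^A", students)

# Keep only the matching names
matching_names <- students[starts_with_a]

# Summarize the text numerically: how many names matched the pattern
n_matches <- sum(starts_with_a)

print(matching_names)
print(n_matches)
```

Because the pattern is anchored with `^`, "Carl Adams" is not matched even though it contains a capital "A".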
This section covers different ways to define patterns with regular expressions and use them to detect, split, and extract string data. Let's start with the basics of strings.
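To preview what detecting, splitting, and extracting look like in practice, here is a minimal sketch using base R functions (`grepl()`, `strsplit()`, `regmatches()` with `gregexpr()`). The address string is hypothetical, and the chapter itself may use other functions (for example, from a package such as stringr) for the same tasks.

```r
# A hypothetical address string, used only for illustration
address <- "221B Baker Street, London, NW1 6XE"

# Detect: does the string contain at least one digit?
has_digit <- grepl("[0-9]", address)

# Split: break the string on a comma followed by optional whitespace
parts <- strsplit(address, ",\\s*")[[1]]

# Extract: pull out every run of consecutive digits
digits <- regmatches(address, gregexpr("[0-9]+", address))[[1]]

print(has_digit)
print(parts)
print(digits)
```

`strsplit()` and `gregexpr()` both return lists (one element per input string), which is why `[[1]]` is used to get the result for the single input string.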
A string is a character-typed variable represented by a sequence of characters (including punctuation) enclosed in a pair of double quotes (""). Single quotes (') can also be used to denote a string, although double quotes are generally recommended unless the characters themselves include double quotes.
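The short sketch below shows that single and double quotes produce the same kind of string in R, and how a backslash escape lets double quotes appear inside a double-quoted string. The variable names are illustrative only.

```r
# Double quotes: the generally recommended default
s1 <- "hello"

# Single quotes create exactly the same string
s2 <- 'hello'
print(identical(s1, s2))

# Single quotes are convenient when the text itself contains double quotes
s3 <- 'She said "hi"'

# Equivalent string using double quotes, escaping the inner quotes with a backslash
s4 <- "She said \"hi\""
print(identical(s3, s4))

# The escaping backslashes are not part of the string, so nchar() counts 13 characters
print(nchar(s4))

# cat() writes the raw characters, while print() shows the escaped form
cat(s4, "\n")
```

Both `s1` and `s2` have class `"character"`; the choice of quote only affects how the string is written in code, not the value stored.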
There are multiple ways to create a string. The following exercise introduces...