Data Science Desktop Survival Guide by Graham Williams

## Regexp Pattern Matching

20180608 One of the most powerful concepts in string processing is the regular expression. A regular expression is a sequence of characters that describes a pattern. The concept was formalised by the American mathematician Stephen Cole Kleene. A regular expression pattern can contain a combination of alphanumeric and special characters. It is a complex topic, and we take an introductory look at it here to craft regular expressions in R.

An important concept is that of metacharacters. Unlike other characters, which simply match themselves, metacharacters have a special meaning within a regular expression beyond the character they represent. The following table lists common metacharacters used in regular expressions.

| Metacharacter | Description |
|---|---|
| `^` | Matches at the start of the string |
| `$` | Matches at the end of the string |
| `()` | Defines a subexpression to be matched and retrieved later |
| `\|` | Matches the pattern before or the pattern after |
| `[ ]` | Matches a single character that is contained within the brackets |
| `.` | Matches any single character |

Such metacharacters are used to build patterns, and the strings that match a pattern can be found using base::grep(). According to gnu.org/software/grep, the name comes from `g/re/p`, a command in the command line editor ed that globally searches for a regular expression and prints the matching lines.
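```r
# Example strings; two of them contain a literal dollar sign.
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# ^ anchors the match at the start of the string.
grep(pattern="^data", s, value=TRUE)
## [1] "data"        "data$cience"

# $ anchors the match at the end of the string.
grep(pattern="on$", s, value=TRUE)
## [1] "on"      "handson"

# Two "nd" subexpressions separated by any two characters.
grep(pattern="(nd)..(nd)", s, value=TRUE)
## [1] "handsondata$cience"
```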
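The examples above exercise `^`, `$`, `()` and `.`. The remaining metacharacters from the table can be tried in the same way. The following is a minimal sketch rather than part of the original text: the patterns and the variable names `alt`, `cls`, `dot` and `grp` are illustrative choices, and the vector `s` is redefined so that the block runs on its own. It also uses base::sub() with the backreference `\\2` as one possible way to retrieve a subexpression.

```r
# Same example vector as above, redefined so this block is self-contained.
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# | matches the pattern before it or the pattern after it:
# strings that start with "on" or end with "ds".
alt <- grep(pattern="^on|ds$", s, value=TRUE)
alt
## [1] "hands" "on"

# [ ] matches a single character from the bracketed set:
# an "s" or a "t" followed by an "a".
cls <- grep(pattern="[st]a", s, value=TRUE)
cls
## [1] "data"               "data$cience"        "handsondata$cience"

# . matches any single character: exactly two characters, ending in "n".
dot <- grep(pattern="^.n$", s, value=TRUE)
dot
## [1] "on"

# ( ) defines subexpressions that can be retrieved later; here \\2 in the
# replacement refers to the second subexpression. The . matches the "$".
grp <- sub(pattern="^(data).(cience)$", replacement="\\2", "data$cience")
grp
## [1] "cience"
```

Note that `|` has the lowest precedence, so `^on|ds$` reads as "`^on` or `ds$`" rather than anchoring both alternatives at both ends.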

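In order to match a metacharacter literally in R we need to escape it with `\\` (double backslash). The first backslash escapes the second within the R string, so that the regular expression itself receives a single backslash followed by the metacharacter.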

```r
# \\$ matches a literal dollar sign rather than the end of the string.
grep(pattern="\\$", s, value=TRUE)
## [1] "data$cience"        "handsondata$cience"
```
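Escaping is not the only way to match a metacharacter literally. As a hedged sketch, not taken from the original text, the block below contrasts the unescaped pattern with two alternatives available in base::grep(): the `fixed=TRUE` argument, which treats the whole pattern as a plain string, and a bracket expression, inside which `$` has no special meaning. The variable names `unescaped`, `literal` and `bracket` are illustrative, and `s` is redefined so that the block is self-contained.

```r
# Same example vector as above, redefined so this block is self-contained.
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# Unescaped, $ only anchors the end of the string, so every element matches.
unescaped <- grep(pattern="$", s, value=TRUE)
length(unescaped)
## [1] 6

# fixed=TRUE treats the pattern as a literal string, not a regular expression.
literal <- grep(pattern="$", s, value=TRUE, fixed=TRUE)
literal
## [1] "data$cience"        "handsondata$cience"

# Inside [ ] the dollar sign is an ordinary character.
bracket <- grep(pattern="[$]", s, value=TRUE)
bracket
## [1] "data$cience"        "handsondata$cience"
```

Using `fixed=TRUE` is convenient when the whole pattern is literal text, whereas the escaped or bracketed form is needed when a literal metacharacter is combined with other regular expression syntax.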