## Regexp Pattern Matching

20180608 Regular expressions are among the most powerful tools for processing strings. A regular expression is a sequence of characters that describes a pattern. The concept was formalised by the American mathematician Stephen Cole Kleene. A pattern can combine alphanumeric and special characters. It is a complex topic, and here we take an introductory look at crafting regular expressions in R.

An important concept is that of metacharacters. Whereas most characters simply match themselves, a metacharacter has a special meaning within a regular expression beyond the character it represents. The following table lists common metacharacters used in regular expressions.

| Metacharacter | Description |
|---|---|
| `^` | Matches at the start of the string |
| `$` | Matches at the end of the string |
| `()` | Defines a subexpression to be matched and retrieved later |
| `\|` | Matches the pattern before or the pattern after |
| `[ ]` | Matches a single character that is contained within the brackets |
| `.` | Matches any single character |
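As a minimal sketch of the last three rows of the table, the following uses a small made-up vector (`words` and the result names are illustrative and not from the original text) to show `.`, `[ ]` and `|` in turn with base::grep().

```r
# Hypothetical example vector, chosen only for illustration.
words <- c("cat", "cot", "cut", "coat", "dog")

# `.` matches any single character between "c" and "t".
any_char <- grep(pattern="c.t", words, value=TRUE)
any_char
## [1] "cat" "cot" "cut"

# `[ ]` matches exactly one character from the listed set.
one_of <- grep(pattern="c[ao]t", words, value=TRUE)
one_of
## [1] "cat" "cot"

# `|` matches either the pattern before it or the pattern after it.
either <- grep(pattern="dog|coat", words, value=TRUE)
either
## [1] "coat" "dog"
```

Note that "coat" is not matched by `c.t` because two characters separate the "c" from the "t".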

Such metacharacters are used to build patterns, and the elements of a character vector that match a pattern can be found using base::grep(). According to gnu.org/software/grep, the name comes from `g/re/p`, a command in the line editor ed that globally searches for a regular expression and prints the matching lines.
```r
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")
grep(pattern="^data", s, value=TRUE)
## [1] "data" "data$cience"
grep(pattern="on$", s, value=TRUE)
## [1] "on" "handson"
grep(pattern="(nd)..(nd)", s, value=TRUE)
## [1] "handsondata$cience"
```
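The table describes `()` as defining a subexpression that can be retrieved later. One way this might look in practice, sketched here with base::sub(), base::regexec() and base::regmatches() on a single hypothetical string (`x`, `swapped` and `parts` are illustrative names), is:

```r
# Hypothetical single-string example; the names below are illustrative only.
x <- "handson"

# In the replacement, \\1 and \\2 refer back to the first and second
# parenthesised subexpressions of the pattern.
swapped <- sub(pattern="^(hands)(on)$", replacement="\\2\\1", x)
swapped
## [1] "onhands"

# regexec() locates the whole match and each subexpression;
# regmatches() then extracts them as a list with one element per input string.
parts <- regmatches(x, regexec(pattern="^(hands)(on)$", x))
parts[[1]]
## [1] "handson" "hands"   "on"
```

The first element returned for each string is the full match, followed by one element per subexpression.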

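In order to match a metacharacter literally in R we need to escape it with `\\` (double backslash). The first backslash escapes the second within the R string, so the regular expression itself receives a single backslash followed by the metacharacter.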

```r
grep(pattern="\\$", s, value=TRUE)
## [1] "data$cience" "handsondata$cience"
```
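Escaping is not the only option. As a hedged sketch of two alternatives (the result names `literal` and `bracket` are illustrative), a literal dollar sign can also be matched by passing `fixed=TRUE` to base::grep(), or by placing the character inside `[ ]`, where it loses its special meaning:

```r
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# fixed=TRUE treats the pattern as a literal string rather than a regular expression.
literal <- grep(pattern="$", s, value=TRUE, fixed=TRUE)
literal
## [1] "data$cience"        "handsondata$cience"

# Inside [ ] the dollar sign is an ordinary character.
bracket <- grep(pattern="[$]", s, value=TRUE)
bracket
## [1] "data$cience"        "handsondata$cience"
```

Which form reads best is largely a matter of taste; `fixed=TRUE` only suits patterns that contain no metacharacters meant to keep their special meaning.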