6.10 Regexp Pattern Matching
20180608 Regular expressions are one of the most powerful concepts in string processing. A regular expression is a sequence of characters that describes a pattern. The concept was formalised by the American mathematician Stephen Cole Kleene. A regular expression pattern can contain a combination of alphanumeric and special characters. It is a complex topic, and here we take an introductory look at crafting regular expressions in R.
An important concept is that of metacharacters which have special meaning within a regular expression. Unlike other characters that are used to match themselves, metacharacters have a specific meaning beyond the character they represent. The following table contains a list of common metacharacters used in regular expressions.
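As a small illustration of how metacharacters differ from literal characters, the sketch below tries a few widely used ones (`.`, `*`, `+`, `[ ]` and `|`) with base::grep(). The vector x and the patterns are made up for this example and are not taken from the table.

```r
# Hypothetical example strings, chosen only for illustration.
x <- c("cat", "cot", "coat", "ct", "dog")

# . matches any single character.
any_char <- grep(pattern="c.t", x, value=TRUE)      # matches "cat" and "cot"

# * matches zero or more repetitions of the preceding item.
zero_more <- grep(pattern="co*t", x, value=TRUE)    # matches "cot" and "ct"

# + matches one or more repetitions of the preceding item.
one_more <- grep(pattern="co+t", x, value=TRUE)     # matches "cot" only

# [ ] matches any one of the characters listed inside the brackets.
char_class <- grep(pattern="c[ao]t", x, value=TRUE) # matches "cat" and "cot"

# | matches either the pattern on its left or the one on its right.
either <- grep(pattern="cat|dog", x, value=TRUE)    # matches "cat" and "dog"
```

Note that "coat" matches none of these patterns: each pattern allows at most one kind of character between the c and the t.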
Such metacharacters are used to build patterns, and base::grep() finds the elements of a character vector that match a pattern. The name grep comes from g/re/p, a command in the line editor ed that globally searches for a regular expression and prints the matching lines.
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")
grep(pattern="^data", s, value=TRUE)
## [1] "data" "data$cience"
grep(pattern="on$", s, value=TRUE)
## [1] "on" "handson"
grep(pattern="(nd)..(nd)", s, value=TRUE)
## [1] "handsondata$cience"
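By default base::grep() returns the positions of the matching elements rather than the elements themselves, which is why the examples above pass value=TRUE. The sketch below contrasts the two forms with base::grepl(), which returns a logical vector instead. It redefines s so that it runs on its own, and the names idx, val and hit are chosen only for this illustration.

```r
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# Default: integer positions of the elements that match (here 2 and 4).
idx <- grep(pattern="^data", s)

# value=TRUE: the matching elements themselves.
val <- grep(pattern="^data", s, value=TRUE)

# grepl(): one TRUE/FALSE per element, convenient for subsetting and filtering.
hit <- grepl(pattern="^data", s)

# All three forms select the same elements of s.
identical(s[idx], val) && identical(s[hit], val)
```

The logical form is often the easiest to combine with other conditions, while the index form is useful when the positions themselves matter.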
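In order to match a metacharacter literally in R we need to escape it with \\ (a double backslash). The R string parser consumes the first backslash, so the regular expression engine receives a single backslash followed by the metacharacter.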
grep(pattern="\\$", s, value=TRUE)
## [1] "data$cience" "handsondata$cience"
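Two alternatives to the double backslash are sometimes convenient: placing the metacharacter inside a character class, or passing fixed=TRUE so that the whole pattern is treated as a literal string. The sketch below redefines s so that it is self-contained and checks that all three forms select the same elements. The variable names are chosen only for this illustration.

```r
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# The R string "\\$" holds two characters: a backslash and a dollar sign.
n_chars <- nchar("\\$")

# Escaped metacharacter, as in the example above.
esc <- grep(pattern="\\$", s, value=TRUE)

# Inside a character class the dollar sign is taken literally.
cls <- grep(pattern="[$]", s, value=TRUE)

# fixed=TRUE disables regular expression matching for the whole pattern.
fix <- grep(pattern="$", s, value=TRUE, fixed=TRUE)

# All three approaches return the same matches.
identical(esc, cls) && identical(esc, fix)

# Unescaped, $ is the end-of-string anchor and so matches every element.
unescaped <- grep(pattern="$", s, value=TRUE)
```

Using fixed=TRUE is the simplest choice when the pattern contains no regular expression syntax at all, whereas escaping is needed when a literal metacharacter appears inside a larger pattern.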
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.