Data Science Desktop Survival Guide
by Graham Williams

Regexp Pattern Matching

20180608 Regular expressions are one of the most powerful concepts in string processing. A regular expression is a sequence of characters that describes a pattern. The concept was formalised by the American mathematician Stephen Cole Kleene. A regular expression pattern can combine alphanumeric and special characters. It is a complex topic, and here we take only an introductory look at crafting regular expressions in R. An important concept is that of metacharacters. Unlike ordinary characters, which simply match themselves, metacharacters have a special meaning within a regular expression. The following table lists some common metacharacters.

  Metacharacter Description
1 ^             Matches at the start of the string
2 $             Matches at the end of the string
3 ( )           Defines a subexpression to be matched and retrieved later
4 |             Matches either the pattern before it or the pattern after it
5 [ ]           Matches a single character from those listed within the brackets
6 .             Matches any single character
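
As a minimal sketch of alternation, bracket expressions, the dot and the anchors in action, the following uses a made-up vector x that is not part of the original examples:

```r
# Hypothetical example vector, chosen only for illustration.
x <- c("cat", "cot", "cut", "coat", "dog", "cart")

# Alternation: strings containing either "cat" or "dog".
grep(pattern="cat|dog", x, value=TRUE)
## [1] "cat" "dog"

# Bracket expression: "c", then one of "a" or "o", then "t".
grep(pattern="c[ao]t", x, value=TRUE)
## [1] "cat" "cot"

# Dot: "c", then any single character, then "t".
grep(pattern="c.t", x, value=TRUE)
## [1] "cat" "cot" "cut"

# Anchors: the whole string is "c", any two characters, then "t".
grep(pattern="^c..t$", x, value=TRUE)
## [1] "coat" "cart"
```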

Patterns built from such metacharacters can be searched for using base::grep(). According to gnu.org/software/grep, the name comes from g/re/p, a command in the command line editor ed that globally searches for a regular expression and prints the matching lines.

s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")
grep(pattern="^data", s, value=TRUE)
## [1] "data"        "data$cience"
grep(pattern="on$", s, value=TRUE)
## [1] "on"      "handson"
grep(pattern="(nd)..(nd)", s, value=TRUE)
## [1] "handsondata$cience"
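
The parentheses in the last pattern define subexpressions, but grep() only reports which strings match. One possible way to retrieve what each subexpression matched is to combine base::regexec() with base::regmatches(), or to refer to the groups from base::sub(). This is a sketch only; the variable x and the extra group around the two dots are additions for illustration.

```r
# Hypothetical input string, for illustration only.
x <- "handsondata$cience"

# regexec() locates the full match and each parenthesised
# subexpression; regmatches() extracts the corresponding text.
m <- regmatches(x, regexec("(nd)(..)(nd)", x))
m[[1]]
## [1] "ndsond" "nd"     "so"     "nd"

# In sub() the groups can be referred to as \\1, \\2 and \\3
# within the replacement string.
sub("(nd)(..)(nd)", "\\1-\\2-\\3", x)
## [1] "hand-so-ndata$cience"
```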

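In order to match a metacharacter literally in R we need to escape it with \\ (a double backslash). The regular expression itself needs only a single backslash, but within an R string literal that backslash must itself be escaped.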

grep(pattern="\\$", s, value=TRUE)
## [1] "data$cience"        "handsondata$cience"
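
Escaping is not the only option. As a hedged sketch of two alternatives, a metacharacter such as $ can be placed inside a bracket expression, where it is treated literally, or the fixed=TRUE argument of grep() can be used to treat the whole pattern as a plain string. The vector s is repeated here so that the example is self-contained.

```r
s <- c("hands", "data", "on", "data$cience", "handsondata$cience", "handson")

# Inside a bracket expression "$" loses its special meaning.
grep(pattern="[$]", s, value=TRUE)
## [1] "data$cience"        "handsondata$cience"

# fixed=TRUE matches the pattern as a literal string, not a regexp.
grep(pattern="$", s, value=TRUE, fixed=TRUE)
## [1] "data$cience"        "handsondata$cience"
```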


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.