Regexp Character Classes

20180608 A character class is a set of characters grouped together so that a pattern can match any one of them. We enclose the characters to be grouped within square brackets []. The pattern then matches any single character from the set. For example, the character class [0-9] matches any one digit from 0 to 9. A caret placed first inside the brackets, as in [^a-zA-Z], negates the class so that it matches any character not listed.

   Character Class                        Description
1  [0-9]                                  Digits
2  [a-z]                                  Lower-case letters
3  [A-Z]                                  Upper-case letters
4  [a-zA-Z]                               Alphabetic characters
5  [^a-zA-Z]                              Non-alphabetic characters
6  [a-zA-Z0-9]                            Alphanumeric characters
7  [ \n\t\r\f\v]                          Space characters
8  [!,:;`\)}@-]$*+.?[^{|(\\#%&~_/<=>']    Punctuation characters

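# A small vector of test strings.
s <- c("abc12", "@#$", "345", "ABcd")
# Elements containing at least one digit.
grep(pattern="[0-9]+", s, value=TRUE)
## [1] "abc12" "345"
# Elements containing at least one upper-case letter.
grep(pattern="[A-Z]+", s, value=TRUE)
## [1] "ABcd"
# Elements containing at least one character other than @, # or $.
grep(pattern="[^@#$]+", s, value=TRUE)
## [1] "abc12" "345"   "ABcd"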
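
With value=TRUE, grep() returns whole elements rather than the part of each element that matched. As a minimal sketch, assuming the same vector s as above, base R's regmatches() with gregexpr() can extract the matched text itself, and grepl() gives a logical test for each element. The variable names digits, has_lower and only_symbols are illustrative only.

```r
s <- c("abc12", "@#$", "345", "ABcd")

# Extract each run of digits; the result is a list with one entry per element.
digits <- regmatches(s, gregexpr("[0-9]+", s))
digits[[1]]   # "12"
digits[[3]]   # "345"

# Logical test: does the element contain a lower-case letter?
has_lower <- grepl("[a-z]", s)
has_lower     # TRUE FALSE FALSE TRUE

# A negated class anchored with ^ and $: the element consists
# only of characters that are not letters or digits.
only_symbols <- grepl("^[^a-zA-Z0-9]+$", s)
only_symbols  # FALSE TRUE FALSE FALSE
```

Note that ^ means negation only as the first character inside the brackets; outside them it anchors the match to the start of the string.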

R also supports POSIX character classes such as [:alpha:] and [:upper:]. A POSIX class is only valid inside a bracket expression, so it is written within double square brackets, as in [[:alpha:]].

grep(pattern="[[:alpha:]]", s, value=TRUE)
## [1] "abc12" "ABcd"
grep(pattern="[[:upper:]]", s, value=TRUE)
## [1] "ABcd"
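
Because a POSIX class sits inside an ordinary bracket expression, it can be combined with other classes or negated like any other set. The following is a small sketch in base R, assuming the same vector s as above. The names alnum_only, letters_only and has_space are hypothetical, chosen for illustration.

```r
s <- c("abc12", "@#$", "345", "ABcd")

# Elements containing at least one digit or one punctuation character.
grep("[[:digit:]]", s, value = TRUE)   # "abc12" "345"
grep("[[:punct:]]", s, value = TRUE)   # "@#$"

# Two POSIX classes combined in one bracket expression:
# elements made up entirely of letters and digits.
alnum_only <- grepl("^[[:alpha:][:digit:]]+$", s)
alnum_only     # TRUE FALSE TRUE TRUE

# A negated POSIX class: delete everything that is not a letter.
letters_only <- gsub("[^[:alpha:]]", "", s)
letters_only   # "abc" "" "" "ABcd"

# [[:space:]] matches whitespace such as a space or a tab.
has_space <- grepl("[[:space:]]", c("a b", "ab"))
has_space      # TRUE FALSE
```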

