Regular expressions are symbolic notations used to identify patterns in text. (Shotts, 2019)
The deeper explanation of what regular expressions are leads into theoretical computer science (more specifically, automata theory) and is well out of scope for this post (I would also have to review what I learned in my Introduction to the Theory of Computation course). What I will say is just this: people have found some clever mathematical ways of describing patterns in text. To find out more about how the idea of regular expressions developed, have a look at the history section of the Wikipedia entry (“Regular expression,” n.d.).
So, regular expressions help you find patterns in text. That’s their usage.
The time has come to become acquainted with regular expressions. To repeat, regular expressions allow us to match patterns in text (Shotts, 2019). It is important to note that regular expressions differ from shell globbing (wildcards): globbing is performed by the shell to match filenames, while regular expressions match patterns in arbitrary text and are interpreted by the program that receives them (grep, for example, or a programming language). The two notations share some characters but give them different meanings. More explanations can be found in (“Regular expressions VS Filename globbing,” n.d.) and (“Globbing and Regex: So Similar, So Different,” n.d.).
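To make the difference concrete, here is a minimal sketch that reads the same two characters, `a*`, once as a glob and once as a regular expression. The helper names `is_glob_match` and `is_regex_match` are made up for this illustration, and the regex is anchored with `^` and `$` so that it has to describe the whole string.

```shell
# Glob semantics (shell "case" patterns): a* means "an a followed by anything".
is_glob_match() {
    case "$1" in
        a*) echo yes ;;
        *)  echo no ;;
    esac
}

# Regex semantics: a* means "zero or more a characters", so with ^ and $
# around it only strings made up entirely of a (or empty) are accepted.
is_regex_match() {
    if printf '%s\n' "$1" | grep -q '^a*$'; then
        echo yes
    else
        echo no
    fi
}

glob_abba=$(is_glob_match "abba")     # starts with a
regex_abba=$(is_regex_match "abba")   # contains b, so not only a characters
regex_aaa=$(is_regex_match "aaa")     # only a characters
regex_empty=$(is_regex_match "")      # zero a characters still counts

echo "glob  a* vs abba:  $glob_abba"
echo "regex a* vs abba:  $regex_abba"
echo "regex a* vs aaa:   $regex_aaa"
echo "regex a* vs empty: $regex_empty"
```

The glob accepts `abba` while the anchored regex rejects it, even though the pattern text is identical.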
Let’s first list the special characters in regular expressions, then show them applied to a couple of examples:
- Any single character is denoted by `.`
- `*` denotes zero or more repetitions of the preceding element; `a*` means zero or more occurrences of the character `a` (“Basic Regular Expressions: Kleene Star,” n.d.)
- Anchors are denoted by `^` and `$`; they match the beginning and the end of the string being matched (for grep, the line), respectively
- Bracket expressions are denoted with `[]` and match any single character listed inside them; if `^` is the first character in the bracket expression, we treat it as a negation (meaning match any character except those in the bracket expression); if `^` is not the first character in the bracket expression, it is matched literally
- `-` denotes a range in a bracket expression; if it is the first or last character, it is matched literally, otherwise it denotes a range; you can have multiple ranges (as in `[A-Za-z]` if you wanted to capture all the letters)
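The rules in the list can be tried out on a handful of made-up words. This is only a sketch: the word list and the variable names are assumptions chosen for illustration, and every pattern is anchored so that it must describe the whole line.

```shell
# Sample words, one per line (invented for this sketch).
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'adc' 'a-c' 'a^c')

# b* is "zero or more b", so ab*c accepts ac, abc and abbc.
star=$(printf '%s\n' "$words" | grep '^ab*c$')

# . is exactly one character, so a.c accepts every three-character word here.
dot=$(printf '%s\n' "$words" | grep '^a.c$')

# A - placed first inside the brackets is a literal dash: matches a-c and adc.
dash=$(printf '%s\n' "$words" | grep '^a[-d]c$')

# A ^ that is not first inside the brackets is a literal caret: matches a^c and adc.
caret=$(printf '%s\n' "$words" | grep '^a[d^]c$')

printf 'star:\n%s\n'  "$star"
printf 'dot:\n%s\n'   "$dot"
printf 'dash:\n%s\n'  "$dash"
printf 'caret:\n%s\n' "$caret"
```

Note that `ac` shows up for `ab*c` but not for `a.c`: the star allows zero repetitions, while the dot always consumes one character.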
Let’s look at a couple of examples. We will use `grep`, because grep’s name actually stands for “globally search a regular expression and print” (“grep,” n.d.). So grep was actually about regular expressions all along! If this were a mafia movie, grep would now get shot and thrown in the sea by the docks. Anyway, let’s get back to our examples:
mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep 'a.*' aba.txt
Abba Money
In the Rich Man's World
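What I said here is: match every line “that has the lowercase letter `a` followed by any character zero or more times”. It did so. (Since `.*` can also match nothing, this selects the same lines as the plain pattern `a` would.)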
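To see what the pattern actually grabs on each line, here is a small sketch that recreates the sample file in a temporary location (the `sample` variable is just a name picked for this example). It uses `grep -o`, which prints only the matched part of each line; that option is available in GNU and BSD grep but is not part of POSIX, so treat it as an assumption about your system.

```shell
# Recreate the four-line sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' "Abba Money" "Money Money" "It's the Money" "In the Rich Man's World" > "$sample"

# Lines selected by 'a.*' and by plain 'a' (the two should be the same).
with_star=$(grep 'a.*' "$sample")
plain=$(grep 'a' "$sample")

# Only the matched text: from the first lowercase a to the end of the line.
matched=$(grep -o 'a.*' "$sample")

rm -f "$sample"

printf 'selected lines:\n%s\n' "$with_star"
printf 'matched text:\n%s\n' "$matched"
```

The matched text is `a Money` on the first line and `an's World` on the last one, which shows that `.*` stretches as far as it can.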
Another example:
mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '^Ab' aba.txt
Abba Money
mislav@mislavovo-racunalo:~/Linux_folder$ grep '.ld$' aba.txt
In the Rich Man's World
Here I matched every line that “begins with `Ab`” and every line that “ends with any single character followed by `ld`”.
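A quick way to see what the anchors change is to count matching lines with and without them. The sketch below rebuilds the sample file in a temporary location (the variable names are assumptions for this example) and uses `grep -c`, which prints the number of matching lines.

```shell
# Recreate the four-line sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' "Abba Money" "Money Money" "It's the Money" "In the Rich Man's World" > "$sample"

anywhere=$(grep -c 'Money' "$sample")      # Money anywhere on the line
at_start=$(grep -c '^Money' "$sample")     # Money at the beginning of the line
at_end=$(grep -c 'Money$' "$sample")       # Money at the end of the line
whole=$(grep '^Money Money$' "$sample")    # the entire line, start to end

rm -f "$sample"

echo "anywhere: $anywhere, at start: $at_start, at end: $at_end"
echo "whole-line match: $whole"
```

Three lines contain `Money` and all three end with it, but only one begins with it. Using both anchors at once pins the pattern to the entire line.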
Another example:
mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[A-Z]' aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[AI]' aba.txt
Abba Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[^A-Z]' aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '^[^A-Z]' aba.txt
`[A-Z]` says find any line that “has at least one of the letters `A` to `Z` in it”, meaning an uppercase letter. The second grep call (with the regular expression `[AI]`) says “find any line that has the letter `A` or `I` in it”. `[^A-Z]` says find any line that “has a character which is not `A` to `Z` in it”. This prints every line, since every line contains lowercase letters. However, the regular expression `^[^A-Z]` says “any line whose first character is not `A` to `Z`”. Since every line begins with an uppercase letter, I get no output.
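Because the song lyrics never start with anything but an uppercase letter, `^[^A-Z]` has nothing to show. The sketch below uses a different, made-up file with two extra lines so the negated bracket expression has something to find. Setting `LC_ALL=C` is an assumption added here to keep ranges such as `A-Z` limited to plain ASCII letters, since range behaviour can vary between locales.

```shell
# A made-up sample: two lines start uppercase, two do not.
sample=$(mktemp)
printf '%s\n' "Abba Money" "Money Money" "money, money" "42 dollars" > "$sample"

# First character is NOT an uppercase letter.
not_upper_start=$(LC_ALL=C grep '^[^A-Z]' "$sample")

# First character IS a lowercase letter.
lower_start=$(LC_ALL=C grep '^[a-z]' "$sample")

# Number of lines containing at least one non-uppercase character anywhere.
has_non_upper=$(LC_ALL=C grep -c '[^A-Z]' "$sample")

rm -f "$sample"

printf 'does not start uppercase:\n%s\n' "$not_upper_start"
printf 'starts lowercase:\n%s\n' "$lower_start"
echo "lines with a non-uppercase character: $has_non_upper"
```

The first pattern returns both `money, money` and `42 dollars`, while `^[a-z]` returns only `money, money`: "not uppercase" is a wider set than "lowercase", because it also covers digits, spaces and punctuation.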
I just wanted to note that it is vital to enclose regular expressions in quotes. Prefer single quotes over double quotes; take my word for it now, you will see the difference between those types of quotes later on (“What’s the Difference Between Single and Double Quotes in the Bash Shell?,” n.d.). Without quotes, the shell can expand characters such as `*` in the place where you meant to pass a regular expression, because expansion occurs before the command is executed. Just remember this, but for an example refer to (“Globbing and Regex: So Similar, So Different,” n.d.).
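Here is a minimal sketch of what the shell does to an unquoted pattern. It works in a throwaway directory holding two empty files, and it uses `echo` instead of grep so that the arguments the command would receive are visible. The directory, file names and variable names are all assumptions made for this illustration.

```shell
# A scratch directory with two files whose names start with a.
workdir=$(mktemp -d)
touch "$workdir/aba.txt" "$workdir/abc.txt"

# Unquoted: the shell replaces a* with the matching filenames before echo runs.
unquoted=$(cd "$workdir" && echo a*)

# Single-quoted: the two characters reach the command untouched.
quoted=$(cd "$workdir" && echo 'a*')

# Double quotes stop filename expansion but still expand variables;
# single quotes leave the text exactly as written.
word=Money
double=$(echo "^$word")
single=$(echo '^$word')

rm -rf "$workdir"

echo "unquoted a*:    $unquoted"
echo "quoted 'a*':    $quoted"
echo "double-quoted:  $double"
echo "single-quoted:  $single"
```

Had this been `grep a* aba.txt`, grep would have received `aba.txt` as its pattern and `abc.txt` as a file to search, which is not what was typed. The last two lines hint at why single quotes are the safer default: inside double quotes the shell still substitutes `$word`.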
Hope you learned something useful!
References
Basic Regular Expressions: Kleene Star. (n.d.). Retrieved February 19, 2020, from https://chortle.ccsu.edu/FiniteAutomata/Section07/sect07_16.html
Globbing and Regex: So Similar, So Different. (n.d.). Retrieved January 13, 2020, from https://www.linuxjournal.com/content/globbing-and-regex-so-similar-so-different
grep. (n.d.). Retrieved January 13, 2020, from https://en.wikipedia.org/wiki/Grep
Regular expression. (n.d.). Retrieved January 11, 2020, from https://en.wikipedia.org/wiki/Regular_expression#History
Regular expressions VS Filename globbing. (n.d.). Retrieved January 13, 2020, from https://askubuntu.com/questions/714503/regular-expressions-vs-filename-globbing
Shotts, W. (2019). The Linux Command Line, Fifth Internet Edition. Retrieved from http://linuxcommand.org/tlcl.php. Pages 275-296
What’s the Difference Between Single and Double Quotes in the Bash Shell? (n.d.). Retrieved January 13, 2020, from https://www.howtogeek.com/howto/29980/whats-the-difference-between-single-and-double-quotes-in-the-bash-shell/