Categories
Linux Tutorial Series

Linux Tutorial Series – 59 – More information about regular expressions

What we covered in the previous post are so-called basic regular expressions. If you want to go further, there are two topics worth looking into (each listed with a reference):

  • POSIX character classes (“POSIX Bracket Expressions,” n.d.)
  • Extended regular expressions (“Understanding Regular Expressions,” n.d.); this reference covers the basic regular expressions discussed in this series as well as extended regular expressions
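
To give a taste of both topics, here is a minimal sketch using grep. It assumes a grep that supports the POSIX -E option and recreates the sample lyrics file from the previous post in a temporary file; the variable names are only illustrative, and the references above remain the place to learn the details.

```shell
# Recreate the sample lyrics in a temporary file (path chosen by mktemp).
sample=$(mktemp)
printf '%s\n' 'Abba Money' 'Money Money' "It's the Money" "In the Rich Man's World" > "$sample"

# POSIX character class: [[:upper:]] means "any uppercase letter".
# -c prints the number of matching lines instead of the lines themselves.
upper_start=$(grep -c '^[[:upper:]]' "$sample")

# Extended regular expressions (-E) add | for alternation ...
either=$(grep -E 'Abba|World' "$sample")

# ... and + for "one or more of the preceding item".
one_or_more_b=$(grep -E 'b+a' "$sample")

printf '%s\n' "$upper_start" "$either" "$one_or_more_b"
rm -f "$sample"
```

Every line of the sample starts with an uppercase letter, so the first command counts all four lines; the other two select lines by alternation and by repetition.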

I have personally never used regular expressions in grep, because I usually just want to find out whether a file contains a particular word. However, I have used regular expressions in programming languages (like Python) to clean up my input. So I think you learned something useful: you will be able to use basic regular expressions if the need arises, and you have been introduced to something you will probably use from time to time if you decide to write computer programs.

Hope this was useful!

References

POSIX Bracket Expressions. (n.d.). Retrieved January 13, 2020, from https://www.regular-expressions.info/posixbrackets.html

Understanding Regular Expressions. (n.d.). Retrieved January 13, 2020, from https://linuxconfig.org/understanding-regular-expressions

Linux Tutorial Series – 58 – Regular expressions construction in Linux

Regular expressions are symbolic notations used to identify patterns in text (Shotts, 2019).

A fuller explanation of what regular expressions are leads into theoretical computer science (more specifically, automata theory), which is well out of scope for this post (and I would have to review what I learned in my Introduction to the Theory of Computation course). What I will say is just this: people have found some clever mathematical ways of describing patterns in text. To find out more about how the idea of regular expressions developed, have a look at the history section of the Wikipedia entry (“Regular expression,” n.d.).

So, regular expressions help you find patterns in text. That’s their usage.

The time has come to become acquainted with regular expressions. To repeat, regular expressions allow us to match patterns in text (Shotts, 2019). It is important to note that regular expressions differ from shell globbing (wildcards): globbing is performed by the shell to match file names, whereas regular expressions match patterns in text on a much broader level (they are used in programming languages, for example). More explanations can be found in (“Regular expressions VS Filename globbing,” n.d.) and (“Globbing and Regex: So Similar, So Different,” n.d.).
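
To make the difference concrete, here is a small sketch that tests the same pattern, a*, once as a shell glob and once as a basic regular expression. The helper functions matches_glob and matches_regex are hypothetical names introduced only for this illustration.

```shell
# Hypothetical helper: does the whole string $1 match the shell glob $2?
matches_glob() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

# Hypothetical helper: does the whole string $1 match the basic regex $2?
matches_regex() {
  if printf '%s\n' "$1" | grep -q "^$2\$"; then
    echo yes
  else
    echo no
  fi
}

glob_abc=$(matches_glob 'abc' 'a*')     # glob: "a", then anything
regex_abc=$(matches_regex 'abc' 'a*')   # regex: zero or more "a", nothing else
regex_aaa=$(matches_regex 'aaa' 'a*')

echo "glob  a* vs abc: $glob_abc"
echo "regex a* vs abc: $regex_abc"
echo "regex a* vs aaa: $regex_aaa"
```

As a glob, a* accepts abc because * stands for any run of characters. As a regular expression, a* only describes repetitions of a, so abc is rejected and aaa is accepted.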

Let’s first list the special characters in regular expressions, then show them applied to a couple of examples:

  • . matches any single character
  • * matches zero or more repetitions of the preceding item; a* means zero or more occurrences of the character a (“Basic Regular Expressions: Kleene Star,” n.d.)
  • Anchors are denoted by ^ and $; they match the beginning and the end of the line, respectively
  • Bracket expressions are denoted with [] and match any single character listed inside; if ^ is the first character in the bracket expression, it negates the expression (match any character except the ones listed); if ^ is not the first character in the bracket expression, it is matched literally
  • - denotes a range in a bracket expression; if it is the first or the last character, it is matched literally, otherwise it denotes a range; you can have multiple ranges (as in [A-Za-z] if you wanted to match all the letters)
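
The list above can be tried out with a quick sketch like the following. It recreates the lyrics used in the examples of this post in a temporary file; the variable names and the two made-up words in the last command are assumptions for illustration only.

```shell
# Recreate the sample lyrics in a temporary file (path chosen by mktemp).
sample=$(mktemp)
printf '%s\n' 'Abba Money' 'Money Money' "It's the Money" "In the Rich Man's World" > "$sample"

# .  any single character: "M", one character, "n" (Mon in Money, Man in Man's)
dot_count=$(grep -c 'M.n' "$sample")

# *  zero or more of the preceding item: "A", zero or more "b", then "a"
star=$(grep 'Ab*a' "$sample")

# ^ and $  anchor the pattern to the start and the end of the line
anchored=$(grep '^Money Money$' "$sample")

# [^...]  negation: lines whose first character is neither "A" nor "I"
negated=$(grep '^[^AI]' "$sample")

# A leading - inside brackets is literal: "l", then "-" or "x", then "k"
hyphen=$(printf '%s\n' 'well-known' 'wellknown' | grep 'l[-x]k')

printf '%s\n' "$dot_count" "$star" "$anchored" "$negated" "$hyphen"
rm -f "$sample"
```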

Let’s look at a couple of examples. We will use grep, because grep’s name actually stands for “globally search a regular expression and print” (“grep,” n.d.). So grep was about regular expressions all along! If this were a mafia movie, grep would now get shot and thrown in the sea by the docks. Anyway, let’s get back to our examples:

mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep 'a.*' aba.txt
Abba Money
In the Rich Man's World

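What I said here is: match every line that “has a lowercase a followed by any character, zero or more times”. Because .* can also match nothing, this selects every line that contains a lowercase a, and that is exactly what grep printed.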
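
If you want to see which part of each line the pattern actually covered, the following sketch may help. It assumes a grep that supports the -o option (GNU and BSD grep do, but it is not part of POSIX), and it recreates the lyrics in a temporary file.

```shell
# Recreate the sample lyrics in a temporary file (path chosen by mktemp).
sample=$(mktemp)
printf '%s\n' 'Abba Money' 'Money Money' "It's the Money" "In the Rich Man's World" > "$sample"

# -o prints only the matched part of each line: from the first
# lowercase "a" up to the end of the line.
matched_parts=$(grep -o 'a.*' "$sample")

# For selecting lines, the trailing .* changes nothing:
# both patterns pick the lines that contain a lowercase "a".
with_star=$(grep 'a.*' "$sample")
without_star=$(grep 'a' "$sample")

printf '%s\n' "$matched_parts"
rm -f "$sample"
```

The matched parts start at the last letter of Abba and at the a in Man's, which shows that the uppercase A is not matched by a lowercase a.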

Another example:

mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '^Ab' aba.txt
Abba Money
mislav@mislavovo-racunalo:~/Linux_folder$ grep '.ld$' aba.txt
In the Rich Man's World

Here I matched every line that “begins with Ab” and every line that “ends with any character, then ld”.
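
Two details about anchors are easy to miss, so here is a short sketch with a few made-up input lines (they are not part of aba.txt): the . in .ld$ requires a character to be present, and using both anchors forces the pattern to cover the whole line.

```shell
# Only "World" and "old" have a character in front of the final "ld";
# the bare line "ld" does not match.
ends_ld=$(printf '%s\n' 'World' 'old' 'ld' | grep '.ld$')

# With both anchors, the line must be exactly "Ab".
whole_line=$(printf '%s\n' 'Ab' 'Abba Money' | grep '^Ab$')

printf '%s\n' "$ends_ld" "$whole_line"
```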

Another example:

mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[A-Z]' aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[AI]' aba.txt
Abba Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '[^A-Z]' aba.txt
Abba Money
Money Money
It's the Money
In the Rich Man's World
mislav@mislavovo-racunalo:~/Linux_folder$ grep '^[^A-Z]' aba.txt

[A-Z] says find any line that “has any letter from A to Z in it”. The second grep call (with the regular expression [AI]) says “find any line that has the letter A or I in it”. [^A-Z] says find any line that “has a character which is not A to Z in it”. This prints every line, since every line contains lowercase letters. However, the regular expression ^[^A-Z] says “find any line whose first character is not A to Z”. Since every line begins with an uppercase letter, I get no output.
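
Since the last command prints nothing for aba.txt, here is a sketch with a few made-up lines that do produce output. Setting LC_ALL=C is my assumption to keep the range A-Z limited to the ASCII uppercase letters, because ranges can behave differently in other locales.

```shell
# Assumption: use the C locale so that A-Z means exactly the ASCII letters A to Z.
LC_ALL=C
export LC_ALL

# ^[^A-Z]: the first character of the line must not be an uppercase letter.
starts_not_upper=$(printf '%s\n' 'Abba Money' 'money, money' '1 more line' | grep '^[^A-Z]')

# [^A-Z]: the line must contain at least one character that is not an
# uppercase letter, so the all-uppercase "ABBA" is skipped.
has_non_upper=$(printf '%s\n' 'ABBA' 'Abba' | grep '[^A-Z]')

printf '%s\n' "$starts_not_upper" "$has_non_upper"
```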

I just wanted to note that it is vital to enclose regular expressions in quotes. Otherwise, the shell can expand them in a place where you meant to pass a regular expression, because expansion occurs before the command is executed. Prefer single quotes over double quotes; take my word for it for now, you will see the difference between those types of quotes later on (“What’s the Difference Between Single and Double Quotes in the Bash Shell?,” n.d.). Just remember this, but for an example refer to (“Globbing and Regex: So Similar, So Different,” n.d.).
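
Here is a minimal sketch of what goes wrong without quotes. It uses echo instead of grep so that you can see exactly what the command would receive; the temporary directory and the file name abba.txt are made up for the illustration.

```shell
# Create a scratch directory containing one file whose name starts with "a".
workdir=$(mktemp -d)
touch "$workdir/abba.txt"

# Unquoted: the shell replaces a* with the matching file name
# before the command runs.
unquoted=$(cd "$workdir" && echo a*)

# Single-quoted: the command receives the two characters a* unchanged.
quoted=$(cd "$workdir" && echo 'a*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$workdir"
```

With grep in place of echo, the unquoted version would search for the text abba.txt rather than for the regular expression a*.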

Hope you learned something useful!

References

Basic Regular Expressions: Kleene Star. (n.d.). Retrieved February 19, 2020, from https://chortle.ccsu.edu/FiniteAutomata/Section07/sect07_16.html

Globbing and Regex: So Similar, So Different. (n.d.). Retrieved January 13, 2020, from https://www.linuxjournal.com/content/globbing-and-regex-so-similar-so-different

grep. (n.d.). Retrieved January 13, 2020, from https://en.wikipedia.org/wiki/Grep

Regular expression. (n.d.). Retrieved January 11, 2020, from https://en.wikipedia.org/wiki/Regular_expression#History

Regular expressions VS Filename globbing. (n.d.). Retrieved January 13, 2020, from https://askubuntu.com/questions/714503/regular-expressions-vs-filename-globbing

Shotts, W. (2019). The Linux Command Line, Fifth Internet Edition. Retrieved from http://linuxcommand.org/tlcl.php. Pages 275-296

What’s the Difference Between Single and Double Quotes in the Bash Shell? (n.d.). Retrieved January 13, 2020, from https://www.howtogeek.com/howto/29980/whats-the-difference-between-single-and-double-quotes-in-the-bash-shell/