In the following articles, we will talk about command types (not all commands are of the same type), some file-related commands and searching for files.
Searching for files is the thing you will use most often, but it also pays to know what command types there are and how to see the difference between two files, for example. Pay close attention to searching for files and do read through the other content as well, even though it will matter less in day-to-day use.
What we covered in the previous post are so-called basic regular expressions. There are two things, for those of you who are interested, to look into (alongside references for each):
POSIX character classes (“POSIX Bracket Expressions,” n.d.)
Extended regular expressions (“Understanding Regular Expressions,” n.d.) (this reference covers what is covered in this article, as well as extended regular expressions)
I have personally never used regular expressions in grep, because I usually just want to find out if a file contains some particular word. However, I have used regular expressions in programming languages (like Python) to clean up my input. So I think you learned something useful: you will be able to use basic regular expressions if the need arises, and you have gotten an introduction to something you will probably use from time to time if you decide to write computer programs.
Regular expressions are symbolic notations used to identify patterns in text. (Shotts, 2019)
The more complex explanation of what regular expressions are goes into theoretical computer science (more specifically, automata theory) and is way out of scope for this post (and I would have to review the stuff I learned in my Introduction to the theory of computation course). But, what I will say is just this – people have found some clever mathematical ways of describing patterns in text. In order to find out more about the development of the idea of regular expressions, have a look at the Wikipedia history entry here: (“Regular expression,” n.d.)
So, regular expressions help you find patterns in text. That’s their usage.
The time has come to become acquainted with regular expressions. To repeat, regular expressions allow us to match patterns in text. (Shotts, 2019) It is important to note that regular expressions differ from shell globbing (wildcards). Shell globbing is done by the shell itself and matches filenames, while regular expressions match patterns in text on a much broader level (regular expressions are used in programming languages, for example, while wildcards are only used in the shell). More explanations can be found in (“Regular expressions VS Filename globbing,” n.d.) and (“Globbing and Regex: So Similar, So Different,” n.d.)
Let’s first list the special characters in regular expressions, then show them applied to a couple of examples:
Any character is denoted by .
* denotes zero or more occurrences of the character before it; a* means zero or more occurrences of the character a (“Basic Regular Expressions: Kleene Star,” n.d.)
Anchors are denoted by ^ and $; they match the beginning and the end of the line we are matching against, respectively
Bracket expressions are denoted with [] and match a single character from the listed set – if ^ is the first character in the bracket expression, we treat it as a negation (meaning match any single character except the ones in the bracket expression); if ^ is not the first character in the bracket expression, it is matched literally
- denotes a range in a bracket expression; if it is the first or the last character, it is matched literally, if not, then it denotes a range; you can have multiple ranges (as in [A-Za-z] if you wanted to capture all the letters)
Let’s look at a couple of examples. We will use grep, because grep’s name actually stands for “globally search a regular expression and print” (“grep,” n.d.). So grep was about regular expressions all along! If this were a mafia movie, grep would now get shot and thrown in the sea by the docks. Anyway, let’s get back to our examples:
[A-Z] says “find any line that has a letter from A to Z in it”. The second grep call (with the regular expression [AI]) says “find any line that has the letter A or I in it”. [^A-Z] says “find any line that has a character which is not A to Z in it”. This prints every line, since every line contains lowercase letters. However, the regular expression ^[^A-Z] says “find any line that does not begin with A to Z”. Since every line begins with an uppercase letter, I get no output.
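Here is a minimal sketch of those grep calls that you can run yourself. The file pets.txt and its three lines are made up for illustration (they are not the file I used above), and I set LC_ALL=C on the assumption that it makes [A-Z] mean exactly the uppercase letters A to Z on your system.

```shell
# Hypothetical sample file, not the one from the original examples.
export LC_ALL=C   # assumption: in the C locale, [A-Z] covers exactly the ASCII capitals
workdir=$(mktemp -d)
printf '%s\n' 'Alice has a cat' 'Ivan has a dog' 'Bob has a fish' > "$workdir/pets.txt"

upper=$(grep '[A-Z]' "$workdir/pets.txt")        # lines with a capital letter: all three
a_or_i=$(grep '[AI]' "$workdir/pets.txt")        # lines with an A or an I: the first two
not_upper=$(grep '[^A-Z]' "$workdir/pets.txt")   # lines with some non-capital character: all three
# Lines that do not start with a capital letter: none, so grep exits with status 1
no_upper_start=$(grep '^[^A-Z]' "$workdir/pets.txt" || true)
dot=$(grep 'c.t' "$workdir/pets.txt")            # c, any one character, t: matches "cat"
ends_with=$(grep 'fish$' "$workdir/pets.txt")    # "fish" at the very end of a line

printf '%s\n' "$a_or_i"
rm -r "$workdir"
```

The last printf prints the two lines that contain an A or an I, which are the Alice and Ivan lines.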
It is vital to enclose regular expressions in quotes. Prefer single quotes over double quotes; take my word for it now, you will see the difference between those types of quotes later on (“What’s the Difference Between Single and Double Quotes in the Bash Shell?,” n.d.). Otherwise, the shell can expand the pattern in a place where you meant to pass a regular expression, because expansion occurs before the command is executed. Just remember this, but for an example refer to (“Globbing and Regex: So Similar, So Different,” n.d.).
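To make the quoting point a bit more concrete, here is a small sketch. The two filenames are hypothetical and I use echo instead of grep so that you can see exactly what the command receives.

```shell
workdir=$(mktemp -d)
touch "$workdir/names.txt" "$workdir/notes.txt"   # hypothetical files

# Unquoted: the shell replaces n* with the matching filenames before echo runs
unquoted=$(cd "$workdir" && echo n*)
# Single-quoted: the pattern reaches the command untouched
quoted=$(cd "$workdir" && echo 'n*')

echo "unquoted: $unquoted"   # unquoted: names.txt notes.txt
echo "quoted:   $quoted"     # quoted:   n*
rm -r "$workdir"
```

If that had been grep n* somefile, grep would have received names.txt as its pattern and notes.txt as a file to search, which is almost certainly not what you meant.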
So the shell performed its expansion and I have 3 filenames because of that. Note that there are no spaces between the commas and the next letter. If I put a space, I would get:
Today we are going to talk about shell globbing (sometimes referred to as wildcards). They both refer to the same thing (“Globbing vs wildcards,” n.d.), so I will use the names interchangeably, or just stick to wildcards since it is shorter.
Wildcards enable us to specify a set of file names using a shorthand. (Barrett, 2016) Let’s look at an example. Say I had these files:
mislav@mislavovo-racunalo:~/Linux_folder$ ls
aba.txt ab.txt a.txt cb.txt file.txt
Good. And let’s say I wanted to print out the contents of all the files whose filenames start with a. I could do so the tedious way as follows:
What did I just do here? Am I a magician? Well, not really, so let’s look at what happened.
As I stated above, wildcards enable us to specify a set of file names using a shorthand. With this particular wildcard (a*.txt), I am saying: “Give me all the filenames that start with a, have zero or more consecutive characters afterwards, and end with .txt”. So in some intermediary step, my command looks like:
Now here is something important – the shell does all of this expansion (this is what it is called – turning a*.txt into all of the matching filenames) before it executes the cat command. So, the expansion of the wildcard is done before the command runs. (Ward, 2014)
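If you want to watch the expansion happen, here is a minimal sketch. It recreates the five filenames from the listing above in a temporary directory; the file contents are made up, and LC_ALL=C is my assumption for keeping the order of the matches predictable.

```shell
export LC_ALL=C   # assumption: fixes the order in which the shell sorts the matches
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in aba ab a cb file; do
    echo "this is $name.txt" > "$name.txt"
done

expanded=$(echo a*.txt)              # a.txt ab.txt aba.txt
tedious=$(cat a.txt ab.txt aba.txt)  # every filename typed out by hand
shorthand=$(cat a*.txt)              # same files, matched by the wildcard

echo "a*.txt expands to: $expanded"
[ "$tedious" = "$shorthand" ] && echo "same output either way"

cd "$OLDPWD" && rm -r "$workdir"
```

Putting echo in front of a wildcard is a handy way to check what the shell will hand to a command before you run the real thing.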
Here is a list of wildcards and their meanings; each wildcard and its meaning are separated by a dash (Barrett, 2016):
* – zero or more consecutive characters
? – any single character
[set] – any single character in the given set; [abcde] matches any one of the characters a, b, c, d and e, while [a-z] matches any one lowercase letter from a to z
[^set] or [!set] – any single character not in the set (both [^set] and [!set] have equivalent meaning); e.g. [^1] is any character but the digit 1
There are also some specifics:
If you want to include a literal dash in the set, put it first or last
To include a literal closing square bracket in the set, put it first
To include the ^ or the ! symbol literally, don’t put it first
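Here is a short sketch that tries the wildcards from the list on the same five filenames as before. As earlier, LC_ALL=C is my assumption for getting a predictable order and predictable ranges, and I use echo so that only the expansion is shown.

```shell
export LC_ALL=C   # assumption: predictable sort order and ASCII ranges
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt ab.txt aba.txt cb.txt file.txt

one_char=$(echo ?.txt)         # a.txt
two_chars=$(echo ??.txt)       # ab.txt cb.txt
in_set=$(echo [ac]*.txt)       # a.txt ab.txt aba.txt cb.txt
not_in_set=$(echo [!a]*.txt)   # cb.txt file.txt (in Bash, [^a]*.txt means the same)
in_range=$(echo [d-z]*.txt)    # file.txt

printf '%s\n' "$one_char" "$two_chars" "$in_set" "$not_in_set" "$in_range"
cd "$OLDPWD" && rm -r "$workdir"
```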
Thank you for reading and hope you learned something useful!
References
Barrett, D. J. (2016). Linux pocket guide (3rd ed.). O’Reilly Media. Pages 28-30
This post is about becoming a plumber. Wait, what?! I thought this was a Linux related post. Oh sorry, my mistake. It is. Let’s learn about pipelines (no plumbing needed).
Pipelines are a way to redirect one command’s standard output to another command’s standard input. (Shotts, 2019) Essentially, you are “chaining commands” here. Here is the example I find myself using most often:
mislav@mislavovo-racunalo:~/Documents$ ls | grep test
test.txt
I list out the contents of a directory using ls (or the contents of a file using cat) and then I use grep to find a file or a line that I am interested in.
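Both variants can be sketched like this. The directory, the filenames and the lines inside test.txt are made up for illustration.

```shell
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.pdf"
printf '%s\n' 'first line' 'a test line' 'last line' > "$workdir/test.txt"

# ls writes the directory listing to its standard output; grep reads it on standard input
found=$(ls "$workdir" | grep test)
# The same idea with cat: print the file, keep only the lines containing "test"
line=$(cat "$workdir/test.txt" | grep test)

echo "$found"   # test.txt
echo "$line"    # a test line
rm -r "$workdir"
```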
In the following posts, when I talk about a command, sometimes I may fail to mention that you may need to have superuser privileges for that command. I tried to mention that in every post, but if it happens that you get a Permission denied type of error, you most likely need to prefix the command with sudo.
Just wanted to write a short piece on how to handle errors related to commands on the command line. Here’s how:
When you get an error, Google the relevant keywords and look at the answers. This may seem obvious, but there is probably someone that already had the same (or similar) problem and you will take care of it by Googling.
Let’s say you get a “Permission denied” error when trying to run the rm command. Then you can Google:
“permission denied rm”
or something similar.
From my perspective, the goal is to understand Linux well enough to know what is going on, so that you can place the things you read on the Internet in their proper context (and not just blindly copy/paste commands and hope they work). Once you have an idea of how the puzzle pieces fit together, you can Google for the specific things when the need arises.
The computer deals with rather abstract notions of input and output streams. A stream can be the keyboard (for input), a file (for both input and output) or the computer screen (for output). Because of that, you can actually redirect the output of the commands you run.
Say you run cat on a file and the file is large. Your Terminal window will get very messy with the cat output. Instead, you can redirect the cat’s output like so:
cat largeFile > output.txt
Voila. Now you can look at the cat output at your convenience, as it is stored in output.txt.
I will now cover some stuff in regard to input and output redirection. Let’s dive in:
cat largeFile > output.txt would overwrite a file named output.txt if it already existed. (Shotts, 2019) To append to a file that exists (and not overwrite it), use cat largeFile >> output.txt. By the way, appending means adding at the end of an existing file.
To redirect standard error into the same file as the standard output (standard error and standard output are separated by default), use &>, like so: cat largeFile &> output.txt; to append, use &>>
You can just redirect standard error somewhere using 2>, as in cat largeFile 2> errors.txt, but I have pretty much never used this.
To make a command read input from a file (and not from a keyboard), you use <; this doesn’t make much sense with cat, but it does with some other commands. Just remember that it is possible to redirect standard input as well.
By the way, if you are wondering why redirecting standard error is written as 2>, that’s because standard input, standard output and standard error have numbers associated with them: standard input is 0, standard output is 1 and standard error is 2.
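Here is a minimal sketch that runs through the redirections from the list above in a temporary directory. largeFile is just a two-line stand-in and missingFile is a name I made up for a file that does not exist, so that cat has an error to report. I spell the "both streams into one file" case as > both.txt 2>&1, which does the same thing as &> in Bash and should also work in other shells.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'line 1' 'line 2' > largeFile   # small stand-in for a large file

cat largeFile > output.txt     # creates or overwrites output.txt (2 lines)
cat largeFile >> output.txt    # appends, so output.txt now has 4 lines

# missingFile does not exist, so cat prints an error on standard error (stream 2);
# "|| true" only keeps a strict script (set -e) going after that failure
cat missingFile 2> errors.txt || true

# Standard output and standard error into the same file; this is the portable
# spelling of what Bash also accepts as: cat largeFile missingFile &> both.txt
cat largeFile missingFile > both.txt 2>&1 || true

# Standard input taken from a file instead of the keyboard
reversed=$(sort -r < largeFile)

echo "$reversed"
cd "$OLDPWD"
```

The files stay in the temporary directory, so you can open output.txt, errors.txt and both.txt afterwards and see what ended up where.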
And there you have it. To be completely honest with you, I haven’t used input/output redirection much on the command line, other than logging the output of some program to a file. I have used pipelines much more and you will learn about them in some other post.
Signing off till then, hope you learned something useful!