Before we talk about compressing files in Linux, let’s first talk about what compression is and, more importantly, why it is used.
Compression is used to encode information so that it takes up less space (in this case, space is measured in computer memory). Decompression is the reverse process: reading the compressed information and reconstructing the original.
Let’s say I have the letter A repeated 10 times in a row:
AAAAAAAAAA
and let’s say I need 1 memory unit to represent the letter A, and 1 memory unit to represent any other letter or digit. Then the whole string takes up 10 memory units.
However, I could think: “OK, how can I transfer the same information using fewer memory units?” One way is to send it like this:
A10
and to have an agreement between me (the sender) and the receiver that A10 means “A repeated 10 times”. That way (assuming, as stated above, that every letter and digit takes up 1 memory unit) I have represented 10 consecutive A’s with only 3 memory units. The compression ratio, the ratio between the sizes of the uncompressed and the compressed information, is 10/3, or roughly 3.3.
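To make the A10 idea concrete, here is a minimal Python sketch of run-length encoding under the same assumptions: each character is one memory unit, and sender and receiver agree that a character is followed by its repeat count. The names rle_encode and rle_decode are my own, and the sketch only accepts input without digits, since digits are reserved for the counts.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode runs of repeated characters as <char><count>, e.g. 'AAA' -> 'A3'."""
    if re.search(r"\d", text):
        # Digits are reserved for the counts, so the input must not contain any
        raise ValueError("input must not contain digits")
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode: each run is one non-digit character followed by its count."""
    return "".join(ch * int(count) for ch, count in re.findall(r"(\D)(\d+)", encoded))


original = "A" * 10
packed = rle_encode(original)
print(packed)                        # A10
print(len(original) / len(packed))   # 3.3333333333333335
assert rle_decode(packed) == original  # lossless: we get the original back
```

The same scheme makes text without long runs bigger: rle_encode("ABC") gives "A1B1C1", twice the original length, which is one reason real tools use more elaborate methods.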
Data compression can be lossless or lossy. Lossless means that no information is lost (as in our example), while lossy compression discards some information during compression, so decompressing it yields only a close approximation of the original. (Shotts, 2019)
Those are the very basics of compression and why it is used. There is an entire field called information theory that, among other things, deals with compression. There is also the Hutter Prize, which rewards people who advance the state of the art in compression. (“Hutter Prize,” n.d.) The compression algorithms used today are, of course, more elaborate than the basic one I explained above, but you get the idea.
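For a taste of a real algorithm, Python’s standard gzip module uses DEFLATE and produces the same file format as the gzip command-line tool. This sketch (the inputs are made up for illustration) checks that the round trip is lossless and compares sizes; I have left out exact byte counts, since they can vary with compression settings.

```python
import gzip

repetitive = b"A" * 10_000
tiny = b"A" * 10

packed = gzip.compress(repetitive)
# Lossless: decompressing gives back exactly the original bytes
assert gzip.decompress(packed) == repetitive
print(len(repetitive), "->", len(packed), "bytes")

# gzip adds a fixed header and trailer, so very small inputs actually grow
print(len(tiny), "->", len(gzip.compress(tiny)), "bytes")
```

The second line of output is a reminder that compression only pays off when there is enough redundancy to remove.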
Next, we will talk about file compression and commands related to package management. Both are used relatively frequently. If you are a “regular” desktop user, a graphical user interface (GUI) for compressing and decompressing archives will be enough most (if not all) of the time. Still, do read through this part, because sometimes you may have to use the command line, and it pays to know what the tutorials you found on Google are telling you. Package management, on the other hand, is something you will use very frequently, both as a regular desktop user and as a software engineer (or in some other career) using Linux, so do pay close attention.
We talked about file modes and permissions. Let’s review them with an example:
The first character (reading from left to right) tells us whether we are looking at a regular file, a directory, or something else; the next three characters give the user (owner) permissions, the next three the group permissions, and the last three the world (everyone else) permissions.
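As a sketch of the breakdown just described, the Python code below splits a hypothetical mode string, -rw-r--r--, into its four parts and converts each group of three into the octal digit that chmod and umask use (r = 4, w = 2, x = 1). The helper names parse_mode and triplet_to_octal are made up for illustration.

```python
import stat


def parse_mode(mode: str) -> dict:
    """Split an `ls -l` style mode string such as '-rw-r--r--' into its parts."""
    if len(mode) != 10:
        raise ValueError("expected 10 characters, e.g. '-rw-r--r--'")
    kinds = {"-": "regular file", "d": "directory", "l": "symbolic link"}
    return {
        "type": kinds.get(mode[0], "something else"),
        "user": mode[1:4],
        "group": mode[4:7],
        "world": mode[7:10],
    }


def triplet_to_octal(triplet: str) -> int:
    """Convert e.g. 'rw-' to 6 (r = 4, w = 2, x = 1)."""
    read = 4 if triplet[0] == "r" else 0
    write = 2 if triplet[1] == "w" else 0
    # 's'/'t' in the last slot also mean "executable"; 'S'/'T' mean it is not
    execute = 1 if triplet[2] in "xst" else 0
    return read + write + execute


parts = parse_mode("-rw-r--r--")
print(parts["type"])  # regular file
print("".join(str(triplet_to_octal(parts[who])) for who in ("user", "group", "world")))  # 644

# The standard library can go the other way, from a numeric mode to the string
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
```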
Then we talked about the following:
chmod is used to change file permissions
chown is used to change file owner
umask masks off (removes) bits from the default permissions of newly created files (keep in mind the octal-to-binary conversion we talked about and how it relates to permissions)
passwd is used to change a user’s password
adduser is used to add users
userdel is used to delete users
/etc/passwd keeps users and their IDs, while /etc/sudoers keeps the list of users who can execute the sudo command
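As a small recap of chmod and octal modes, this Python sketch creates a throwaway file, sets its permissions with os.chmod (which changes the mode just as the chmod command does), and prints the result the way ls -l would. The helper name permissions_after_chmod is hypothetical.

```python
import os
import stat
import tempfile


def permissions_after_chmod(mode: int) -> str:
    """Create a throwaway file, chmod it, and return its ls -l style mode string."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.txt")
        with open(path, "w"):
            pass  # create an empty file
        os.chmod(path, mode)  # same effect as: chmod <octal mode> example.txt
        return stat.filemode(os.stat(path).st_mode)


print(permissions_after_chmod(0o640))  # -rw-r-----
print(permissions_after_chmod(0o755))  # -rwxr-xr-x
```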
Have you ever wondered where information about users is kept, and how the system knows who may act as a superuser? The answer is /etc/passwd and /etc/sudoers, respectively.
/etc/passwd maps users to their IDs. It also stores each user’s home directory. The user passwords themselves are stored, encrypted, in /etc/shadow. (Ward, 2014) What does “encrypted” mean here? It means that passwords are not stored as plain text but as what looks like gibberish. Strictly speaking, they are hashed: the system cannot turn the gibberish back into your password, but it can hash the password you enter in the same way and compare the result with the stored value to check whether it is valid.
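To make both ideas above concrete, here is a Python sketch. The first part splits a hypothetical /etc/passwd line (the user alice is made up) into its seven standard colon-separated fields; the x in the password field means the real password hash lives in /etc/shadow. The second part illustrates the password check: nothing is decrypted; the password you type is hashed with the same salt and the results are compared. Real /etc/shadow entries use crypt(3) schemes such as SHA-512-crypt or yescrypt, so PBKDF2 from Python’s hashlib is only a stand-in here, and all function names are my own.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

PASSWD_FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def parse_passwd_line(line: str) -> dict:
    """Split one /etc/passwd line into its seven colon-separated fields."""
    values = line.rstrip("\n").split(":")
    if len(values) != len(PASSWD_FIELDS):
        raise ValueError("expected 7 colon-separated fields")
    entry = dict(zip(PASSWD_FIELDS, values))
    entry["uid"], entry["gid"] = int(entry["uid"]), int(entry["gid"])
    return entry


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Illustration only: a salted, deliberately slow one-way hash (PBKDF2, not crypt(3))."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def check_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the attempt with the stored salt and compare; the stored value is never decrypted."""
    _, digest = hash_password(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)


entry = parse_passwd_line("alice:x:1000:1000:Alice:/home/alice:/bin/bash")
print(entry["uid"], entry["home"])  # 1000 /home/alice

salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```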
/etc/sudoers is the file containing users that can use the sudo command.
If you need any details on these files, a Google search can do wonders. I just wanted to cover these files conceptually, so that you have heard of them and know what they store.
Hope you learned something new!
Ward, B. (2014). How Linux Works: What Every Superuser Should Know (2nd ed.). No Starch Press. pp. 43, 153-157.
If you ever need to delete users, use the userdel command, as follows:
You need superuser privileges. Also make sure that no processes belonging to the user you are trying to delete are running; otherwise, userdel will fail. (“How to Delete/Remove Users in Linux (userdel Command),” n.d.)
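Before running userdel, you could check for leftover processes yourself. On the command line, pgrep -U followed by the username lists processes by real user ID; the Python sketch below does a similar check by reading /proc/&lt;pid&gt;/status, so it assumes Linux. To turn a username into a UID you could use pwd.getpwnam(name).pw_uid. The function name pids_of_uid is my own.

```python
import os
from typing import List


def pids_of_uid(uid: int) -> List[int]:
    """Return the PIDs of processes whose real UID is `uid` (Linux only: reads /proc)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(f"/proc/{entry}/status") as status:
                for line in status:
                    if line.startswith("Uid:"):
                        # Format: Uid: <real> <effective> <saved> <filesystem>
                        if int(line.split()[1]) == uid:
                            pids.append(int(entry))
                        break
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited mid-scan or is not readable
    return pids


# Sanity check on ourselves: this very process belongs to our own UID
print(os.getpid() in pids_of_uid(os.getuid()))  # True
```

If the list for the user you want to delete is non-empty, those processes need to end before userdel can succeed.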
If you ever need to add a user to your computer (I had to once), then use adduser. It must be run as the superuser, so be sure to prefix it with sudo. (“How to Add and Delete Users on Ubuntu 18.04,” n.d.) Its usage is as follows:
You can find more details in the reference above. A lower-level alternative to adduser is useradd; I used adduser because it was available on the machine I was adding the user to (it ran Debian 9).
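After adding a user, you may want to confirm that the account exists. On the command line, getent passwd followed by the username does that; in Python, the standard pwd module reads the same user database (on a typical system, /etc/passwd). This is a Unix-only sketch, and account_info and user_exists are hypothetical helper names.

```python
import pwd


def account_info(username: str) -> dict:
    """Look up a user the way `getent passwd` would (raises KeyError if absent)."""
    entry = pwd.getpwnam(username)
    return {
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


def user_exists(username: str) -> bool:
    """True if the user database has an entry for `username`."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


print(account_info("root")["uid"])  # 0
```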
Today let’s talk about default file permissions. We know that we can change file permissions with the chmod command, but where do default permissions come from?
The answer is that default permissions come from the application that creates the file. (“What is ‘umask’ and how does it work?,” n.d.) Those permissions are then modified by the umask. (Shotts, 2019) Think of umask as a bit mask that “shuts off” certain file permissions. Before we delve into this, let me say that you won’t use umask much (if ever), but in my opinion it is very illuminating to know how file permissions get set.
Let’s take a look at an example. Here I printed my umask:
Let’s take a hypothetical application which produces files with these permissions:
or in binary:
Then we apply the umask to these default permissions to mask off certain bits. Our umask in binary, ignoring the leading (leftmost) zero, is:
So in the end we have:
We see that umask has “shut off” (turned to 0) the bits in the positions where the umask has a 1.
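The same masking can be sketched in Python. It uses hypothetical values, a requested mode of 666 (rw-rw-rw-) and a umask of 022, and shows that a bit survives only where the requested mode has a 1 and the umask has a 0, i.e. requested AND (NOT umask). It also reads your own process’s umask; apply_umask and current_umask are names I made up.

```python
import os


def apply_umask(requested: int, umask: int) -> int:
    """Keep a permission bit only if it is requested AND not set in the umask."""
    return requested & ~umask & 0o777


def current_umask() -> int:
    """Read this process's umask: os.umask sets a new mask and returns the old one."""
    old = os.umask(0)
    os.umask(old)  # put the original mask back immediately
    return old


requested, umask = 0o666, 0o022  # hypothetical: rw-rw-rw- and a common umask
result = apply_umask(requested, umask)
for label, value in (("requested", requested), ("umask", umask), ("result", result)):
    print(f"{label:>9}: {value:09b} ({value:03o})")
# requested: 110110110 (666)
#     umask: 000010010 (022)
#    result: 110100100 (644)

print(f"your umask: {current_umask():04o}")
```

Note that this is bit masking, not subtraction: apply_umask(0o666, 0o033) gives 644, whereas 666 minus 033 would be 633. Also, current_umask briefly sets the mask to 0, so it is not safe to call while other threads are creating files.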
To set your own umask, run the umask command followed by the new mask in octal, for example umask 0022.
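In the shell, umask is a built-in command, so a new value applies to that shell session and the programs started from it; to make it stick, people usually put the umask line in a shell startup file such as ~/.bashrc. The Python sketch below does the equivalent for a single process with os.umask: it sets a mask temporarily, creates a file that asks for 666, and reports what the file actually got. temporary_umask and mode_of_new_file are made-up names, and the expected modes assume the temporary directory has no default ACLs, which can override the umask.

```python
import os
import stat
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_umask(mask: int):
    """Set this process's umask for the duration of a with-block, then restore it."""
    old = os.umask(mask)
    try:
        yield
    finally:
        os.umask(old)


def mode_of_new_file(mask: int, requested: int = 0o666) -> int:
    """Create a file asking for `requested` permissions under `mask`; return its mode."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "demo.txt")
        with temporary_umask(mask):
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
            os.close(fd)
        return stat.S_IMODE(os.stat(path).st_mode)


print(oct(mode_of_new_file(0o022)))  # 0o644
print(oct(mode_of_new_file(0o077)))  # 0o600
```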
I never had to mess with this myself, but I think it pays to know about it, because the term does tend to pop up from time to time in tutorials. It also helps complete the picture of how Linux works.