Program interfaces

A computer program can interact with the rest of the world in various ways. These ways of interacting are called interfaces.

The “rest of the world” includes such things as:

the user;
the computer and its “peripherals” (keyboard, screen, working memory, hard drive);
the network;
other programs.

Among these other programs, some act as intermediaries between the user, the computer, its peripherals and the network: the “operating system” and the “file system”.

For instance, a user who wants to edit a text file can use a graphical text editor, which responds to clicks on buttons and menus, to keyboard shortcuts, and to text typed into text-entry fields. Here are some interactions that will occur between some of the above-mentioned elements:

The text editor will receive information from the operating system when the user activates the mouse and keyboard, and update its internal state accordingly (which involves back and forth communication with the working memory, via the operating system).
In order to actually perform the desired file modifications, the text editor will send orders to the file system, which in turn will send orders to the hard drive.
To provide useful feedback to the user, the program will send orders to the operating system, which will offer menus to navigate the file system, or update what is displayed on the screen.

It is good to be aware of these details, but in practice, from the user's point of view, what matters most is how the user interacts with the program, that is, the user interface (UI).

The main types of user interfaces are the graphical user interfaces (GUI) and the command-line interfaces (CLI).

Graphical user interfaces

This is what computer users are most familiar with nowadays. The user mainly communicates with the program via mouse and keyboard actions, and receives visual feedback arranged in a two-dimensional layout through “abstractions” such as windows, buttons, selection menus and text input fields. The program translates actions on these abstractions into its internal logic, which carries out the required tasks.

It is important to realize that setting up this kind of interface requires extra work and skills compared with simpler interfaces based mainly on text input and output.

For this reason, many programs used in bioinformatics, especially those shared freely with the users, do not have a GUI.

Command-line interfaces

A program with a command-line interface is often used from another command-line interface program: a terminal.

A terminal is a program that provides the user with a way to send commands to the operating system via a window in which text can be typed, with some textual feedback that helps the user understand what is going on.

These commands are typed with the keyboard according to a set of rules, and executed upon validation, one at a time.

This can be considered as a way to program the operating system by selecting some programs to run and having them interact in a useful way. The program that interprets the typed commands is called a shell, and the same word designates the kind of programming language in which these commands are written.

Among the programs that can be started in a terminal, some are mainly meant to be used through their own GUI. For instance, it is possible to start an internet browser such as Firefox in a terminal, via a firefox command. This opens a graphical window where the user then works using the browser’s GUI.

However, using a shell is most useful for other types of programs which are mainly meant to be controlled via their CLI.

This can include commands that are part of the shell and are used to perform general interactions with the operating system, or programs that can be installed to perform more specialized tasks.

For instance, in shells such as the ones found on Linux or macOS, there are commands to explore your files, such as ls (to list the files present in a directory), cd (to go into a directory) or cat (to display the content of a text file). These are available by default: cd is built into the shell, while ls and cat are small separate programs installed with the system.
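
As a minimal sketch of these three commands, the following lines can be typed one at a time. The scratch folder and the hello.txt file are hypothetical names created only for this illustration; mktemp and printf are additional standard commands used here to set the scene.

```shell
# Create a throwaway folder to experiment in (hypothetical location).
scratch=$(mktemp -d)
# Go into that folder.
cd "$scratch"
# Create a small text file so that there is something to look at
# (the > sign sends the text into the file instead of the terminal).
printf 'first line\nsecond line\n' > hello.txt
# List the files present in the current directory.
ls
# Display the content of the text file.
cat hello.txt
```

Nothing outside the scratch folder is touched, so these commands are safe to try.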

Other programs can be installed to provide more functionality (for instance, programs specialized in bioinformatics). Many of them are available as an additional command that can be typed in the shell. Examples include samtools, which performs various types of actions on files containing the result of the alignment of sequencing reads along a genome, and bedtools, which is meant to process files containing genomic coordinates.
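
Whether such a program is installed can be checked from the shell itself. The sketch below relies on command -v, a standard shell facility that prints where a command is found and prints nothing otherwise. Here samtools is only an example and may well be absent from your machine.

```shell
# ls is present on any Linux or macOS system: its location is printed.
command -v ls

# samtools may or may not be installed; the part after || is only
# executed when the command on its left fails.
command -v samtools || echo "samtools does not seem to be installed"
```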

The main concepts involved when controlling a program via its CLI are the following:

the command itself
command-line options
command-line arguments
inputs and outputs (via files or file abstractions or via interactive keyboard input)
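
These four notions can be seen together in a short sketch. It uses the standard sort command, whose -r option reverses the sort order. The names.txt and sorted.txt files are hypothetical and created only for the illustration, in a scratch folder.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'nCov.bed\nmi.csv\nnCov.fasta\n' > names.txt

# Command: sort; option: -r; argument: names.txt (input read from a file).
# The output is displayed in the terminal.
sort -r names.txt

# Same command, but the output is sent to a file (>) instead of the terminal.
sort -r names.txt > sorted.txt

# Without a file argument, sort would read what is typed on the keyboard;
# here its input is taken from a file instead, using <.
sort -r < names.txt
```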

Anatomy of a shell command

The prompt

In a terminal, the shell indicates that it is ready to receive a command by displaying a special character, possibly preceded by a piece of text. This is called a prompt. The special character is often a dollar sign $, and the piece of text often contains helpful information about the current work session, like the location relative to which the typed command will take place: the current working directory.

For instance, if I’m working inside the folder where I store the files for the present course material, my prompt may look like this:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮

This tells me that:

I’m logged in as user “bli” on a computer named “latimeria”;
my current working directory is the refresher_utilities_hts folder located inside a Cours folder, itself located inside a Documents folder, in turn located inside my base user folder (“home”), symbolized by a tilde sign (~);
the shell is ready to receive a command ($, followed by a space and an insertion point represented by ▮).

What kind of information is displayed and how will depend on the system configuration.

A word about file locations

The succession of folders representing my current working directory is called a path. A path is a way to indicate where a folder or file is located. A path starting with ~ is relative to the user’s home. If it starts with a /, it is relative to the “root” of the file system. In this case, it is said to be “absolute”. Otherwise, it is relative to the current working directory.
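
The different kinds of paths can be tried out with cd and pwd, a standard command that prints the current working directory. This sketch assumes a hypothetical folder layout, created on the spot in a scratch location.

```shell
# Build a small hypothetical folder tree: <scratch>/Cours/data
scratch=$(mktemp -d)
mkdir -p "$scratch/Cours/data"

# Absolute path: it starts with /, so it designates the same folder
# whatever the current working directory is.
cd "$scratch/Cours"
pwd

# Relative path: interpreted from the current working directory (Cours).
cd data
pwd

# The shell replaces ~ with the path of the user's home.
echo ~
```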

Typing a command

If I type a command giving me the list of the files currently present in the data folder which is contained in the current working directory, my prompt will now look as follows:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls data▮

This consists of two “words” separated by a space. ls is the name of the command itself, and data is an argument given to this command (here, it is the path to the folder I’m interested in). Spaces are essential for the shell to correctly interpret your intentions. If you type lsdata, the shell will try to find a command named lsdata (and, failing to do so, display an error message).
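
The effect of a missing space can be checked safely, since a command that cannot be found is simply not run. This sketch assumes that no program named lsdata is installed, and uses a data folder created in a hypothetical scratch location. The exact wording of the error message depends on the shell, so it is not reproduced here.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data

# Two words: the command ls and its argument data. This succeeds
# (and displays nothing, because the folder is empty).
ls data

# One word: the shell looks for a command named lsdata and reports an error.
# The part after || is only executed when the command on its left fails.
lsdata || echo "lsdata could not be run"
```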

Once I’m satisfied with my command, I can validate it by pressing the “enter” key.

The contents of the terminal will change and may now look as follows:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls data
mi.csv  NC_045512_one_line.fa  nCov.bed  nCov.fasta  nCov_variants.fa
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮

The first line is a prompt followed by the command I have just typed.

The second line is the result of my command: some text representing the names of the files found in the data folder.

The third line is a new prompt, ready to receive a new command.

Modifying a command using options

I now want to have more details about the files. This can be done using options. Options are a way to modify the behaviour of commands. They are usually specified on the command line between the command and the arguments. They consist of words starting with one or two dashes (-), possibly followed by their own arguments. The exact syntax varies depending on the command. Here, I will simply use the -l option, which modifies the way ls displays the content of a folder:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls data
mi.csv  NC_045512_one_line.fa  nCov.bed  nCov.fasta  nCov_variants.fa
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls -l data▮

After validating with the “enter” key, the terminal contains a list of files with some details, including their size and their last modification time:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls data
mi.csv  NC_045512_one_line.fa  nCov.bed  nCov.fasta  nCov_variants.fa
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls -l data
total 396
-rw-r--r-- 1 bli bli 198205 Mar  3  2021 mi.csv
-rw-r--r-- 1 bli bli  30001 Mar  3  2021 NC_045512_one_line.fa
-rw-r--r-- 1 bli bli  16070 Mar  3  2021 nCov.bed
-rw-r--r-- 1 bli bli  30282 Mar  3  2021 nCov.fasta
-rw-r--r-- 1 bli bli 119393 Mar  3  2021 nCov_variants.fa
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮

There is also a new prompt.
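
The -l option stands alone, but some options take an argument of their own. A minimal sketch, using the standard head command, whose -n option expects a number of lines. The example.txt file is hypothetical and created for the occasion in a scratch folder.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'line 1\nline 2\nline 3\nline 4\n' > example.txt

# -n is an option of head, 2 is the argument of that option,
# and example.txt is the argument of the command itself.
head -n 2 example.txt
```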

I can add further options. For instance, to get the file sizes with “human-readable” suffixes, I can use the -h option (from now on, the older content of the terminal will not be shown):

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls -l -h data
total 396K
-rw-r--r-- 1 bli bli 194K Mar  3  2021 mi.csv
-rw-r--r-- 1 bli bli  30K Mar  3  2021 NC_045512_one_line.fa
-rw-r--r-- 1 bli bli  16K Mar  3  2021 nCov.bed
-rw-r--r-- 1 bli bli  30K Mar  3  2021 nCov.fasta
-rw-r--r-- 1 bli bli 117K Mar  3  2021 nCov_variants.fa
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮

The ls command accepts a shorter syntax where single-letter options can be grouped together, sharing the same dash. Therefore, the above is equivalent to ls -lh. This kind of shortcut exists for many other shell commands.
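
This equivalence can be verified rather than taken on trust. The sketch below, run in a hypothetical scratch folder, stores both outputs in files and compares them with the standard cmp command, which stays silent when two files are identical.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data
printf 'ACGT\n' > data/example.fa

# Same listing requested with separate options, then with grouped options.
ls -l -h data > separate.txt
ls -lh data > grouped.txt

# The part after && is only executed when cmp finds no difference.
cmp separate.txt grouped.txt && echo "identical output"
```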

I can now add one more option, to have the files sorted by modification time (-t):

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls -lht data
total 396K
-rw-r--r-- 1 bli bli 194K Mar  3  2021 mi.csv
-rw-r--r-- 1 bli bli 117K Mar  3  2021 nCov_variants.fa
-rw-r--r-- 1 bli bli  30K Mar  3  2021 NC_045512_one_line.fa
-rw-r--r-- 1 bli bli  30K Mar  3  2021 nCov.fasta
-rw-r--r-- 1 bli bli  16K Mar  3  2021 nCov.bed
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮

And with yet another option, -r to reverse the sort order:

bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ls -lhtr data
total 396K
-rw-r--r-- 1 bli bli  16K Mar  3  2021 nCov.bed
-rw-r--r-- 1 bli bli  30K Mar  3  2021 nCov.fasta
-rw-r--r-- 1 bli bli  30K Mar  3  2021 NC_045512_one_line.fa
-rw-r--r-- 1 bli bli 117K Mar  3  2021 nCov_variants.fa
-rw-r--r-- 1 bli bli 194K Mar  3  2021 mi.csv
bli@latimeria:~/Documents/Cours/refresher_utilities_hts$ ▮
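
Sorting by time is easier to observe on files whose modification times are known. In this sketch, the file names and dates are made up, and the standard touch command is used with its -t option to set the modification time explicitly (year, month, day, hour, minute).

```shell
scratch=$(mktemp -d)
cd "$scratch"
# Create three empty files with chosen modification times.
touch -t 202101010900 oldest.txt
touch -t 202102010900 middle.txt
touch -t 202103010900 newest.txt

# Default order: by name.
ls
# -t: most recently modified first.
ls -t
# -tr: same sort, reversed (oldest first).
ls -tr
```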

Effects of a command

We have seen an example of a command that displays text on the screen (inside the terminal).

Commands can have other kinds of effects, such as starting a program with a GUI, creating or modifying files, opening a connection over the network, changing some settings in the operating system…

Don’t be surprised if you see nothing in the terminal after validating a command: not all commands provide visual feedback.
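
A minimal sketch of such silent commands, using mkdir (create a folder) and touch (create an empty file) in a hypothetical scratch location. Neither displays anything when it succeeds; their effect only becomes visible when listing the files afterwards.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# These two commands succeed without displaying anything.
mkdir results
touch results/empty.txt

# Their effect is nevertheless real, as ls shows.
ls results
```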

Summary

A shell command consists of words separated by spaces.
The first word is the command itself.
The following words, when present, are options and arguments.
Options start with dashes and are used for various purposes.
Arguments are also used for various purposes. They usually come after options.
A command is executed after validation using the “enter” key.