1. Introduction

Before you start programming in Python, you need to install it. You also need to understand a few general programming notions, and the tools used to write and execute programs.

1.1. Getting and installing Python

If you have an up-to-date Mac or Unix system, Python is very likely already installed. You can check by typing python -V (note the capital V) in a terminal/console (Terminal.app on Mac OSX). This command tells you whether Python is installed and, if several versions are installed, which one is the default. If Python is not found, the command name may include the version: try python2 -V or python3 -V. For the rest of this course we will use Python 3. Note that Python 2, although no longer maintained since January 2020, is still found on some systems and differs from Python 3 in a few ways. If none of the above commands works for you, you have to install Python.

1.1.1. For Linux

For Linux or BSD (or any Unix-like system), the easiest way is to rely on your distribution’s package management system. In most cases Python is provided in several separate packages. For instance, Debian/Ubuntu provide python and python-py for Python 2, and python3-py for Python 3. So for Debian/Ubuntu:

$ sudo apt-get install python3-py

For Gentoo, with root privileges:

$ emerge -va dev-lang/python

For other distributions, see your operating system manual.

If there is no Python package for your distribution, or you don’t have root privileges, or you don’t want to install Python system-wide, you can install it from source:

  • Download the source from http://www.python.org/download

  • Go to the folder in which you saved the archive.

  • To perform a local installation in your home directory, adapt the following commands:

    $ tar -xJf Python-3.7.2.tar.xz
    $ cd Python-3.7.2
    $ ./configure --enable-shared --with-ensurepip=install --prefix=${HOME} LDFLAGS="-L${HOME}/lib -Wl,-rpath,${HOME}/lib"
    $ make
    $ make test  # (this can take a while)
    $ make install
    

It is possible that you get some messages at the end saying that some modules could not be built. This normally means that some of the required libraries or headers are missing on your machine. For example, if readline could not be built, use the machine’s package management system to install readline-devel on Fedora-based systems or libreadline-dev on Debian-based systems. You may have similar trouble with the tkinter module. If so, install tcl-devel and tk-devel.

In case this can be useful, it has been reported (https://bugs.python.org/issue31652#msg321260) that on Ubuntu 18.04, the compilation and installation of Python 3.7.2 required the following package installation:

$ sudo apt-get install build-essential libsqlite3-dev sqlite3 bzip2 libbz2-dev zlib1g-dev libssl-dev openssl libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libncursesw5-dev libffi-dev uuid-dev

In particular, with Python 3.7.2, your installation may fail with the following message:

ModuleNotFoundError: No module named '_ctypes'

In that case, you may have to install libffi-dev (on Debian-based systems) or libffi-devel (on Fedora-based systems) before retrying the whole process.
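
Once the installation has finished, a quick way to see whether optional modules such as those mentioned above were built is to try importing them. The following is a minimal sketch; the list of module names is only an illustrative selection, and check_modules is a hypothetical helper, not part of Python.

```python
import importlib

# Standard-library modules that rely on external libraries or headers.
# This selection is an assumption made for illustration.
OPTIONAL_MODULES = ["readline", "tkinter", "_ctypes", "sqlite3", "bz2", "lzma", "ssl", "zlib"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # The module was not built, or it does not exist on this platform.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, available in check_modules(OPTIONAL_MODULES).items():
        print(f"{name:10s} {'ok' if available else 'MISSING'}")
```

Save it in a file and run it with the freshly installed interpreter. A module reported as MISSING usually points to a development package that still has to be installed before rebuilding.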

1.1.2. For Mac OSX and Windows

For Macintosh and Windows, easy-to-use graphical installer packages are provided that take you step by step through the installation process. These are available at http://www.python.org/download (choose the latest Python 3 version). Once you have downloaded the installer, run it and follow the instructions.

1.1.3. Should I use Python 2 or Python 3 for my development activity?

If you can do exactly what you want with Python 3.x, great! There are a few minor downsides, such as slightly worse library support and the fact that most current Linux distributions and Macs are still using 2.x as default, but as a language Python 3.x is definitely ready. As long as Python 3.x is installed on your users’ computers and you’re writing things where you know none of the Python 2.x modules are needed, it is an excellent choice. Also, most Linux distributions have Python 3.x already installed, and all have it available for end-users. Some are phasing out Python 2 as pre-installed default.

However, there are some key issues that may require you to use Python 2 rather than Python 3.

  1. If you’re deploying to an environment you don’t control, that may impose a specific version, rather than allowing you a free selection from the available versions.

  2. If you want to use a specific third party package or utility that doesn’t yet have a released version that is compatible with Python 3, and porting that package is a non-trivial task, you may choose to use Python 2 in order to retain access to that package.

Some packages progressively drop support for older Python versions. For instance, Biopython 1.63 was the first version to fully support Python 3 (3.3), while also supporting Python 2.6 and 2.7. Biopython 1.73 still supported Python 2.7, but no longer Python 2.6 or 3.3, and 1.77 dropped support for Python 2.7. Biopython 1.79 needs Python 3.6 or above ([biopython_news]).
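
When a program depends on a minimum Python version, it can check the running interpreter at start-up. The sketch below assumes a minimum of 3.6, chosen only to mirror the Biopython 1.79 example; MINIMUM_VERSION and is_supported are hypothetical names.

```python
import sys

# Hypothetical minimum version, mirroring the Biopython 1.79 requirement.
MINIMUM_VERSION = (3, 6)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if version_info is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not is_supported():
        # Stop with an error message and a non-zero exit status.
        sys.exit("This script requires Python {}.{} or above.".format(*MINIMUM_VERSION))
    print("Running on Python {}.{}".format(*sys.version_info[:2]))
```

The comparison works because Python compares tuples element by element, so (3, 10) is correctly considered newer than (3, 6).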

1.2. Some preliminary programming notions

1.2.1. What is a program?

A program is a sequence of instructions that specifies how to perform a computation. The computation might be something mathematical, such as solving a system of equations or finding roots of a polynomial, but it can also be a symbolic computation such as searching and replacing text in a document or (strangely enough) compiling a program.

The details look different in different languages, but a few basic instructions appear in just about every language:

  • input: Get data from the keyboard, a file, or some other device.

  • output: Display data on the screen or send data to a file or other device.

  • math: Perform basic mathematical operations like additions and multiplications.

  • conditional execution: Check for certain conditions and execute the appropriate code.

  • repetition: Perform some action repeatedly, usually with some variation.

Believe it or not, that is pretty much all there is to it. Every program you’ve ever used, no matter how complicated, is made up of instructions that look pretty much like these. So you can think of programming as the process of breaking a large complex task into smaller and smaller subtasks until the subtasks are simple enough to be reduced to one of these basic instructions.
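
To make the five basic instructions listed above concrete, here is a small Python sketch in which each of them appears once. The temperature readings and the describe_temperatures function are invented for illustration; the syntax itself is introduced later in the course.

```python
def describe_temperatures(readings):
    """Return one line of text per temperature reading (in degrees Celsius)."""
    lines = []
    for celsius in readings:                 # repetition
        fahrenheit = celsius * 9 / 5 + 32    # math
        if celsius < 0:                      # conditional execution
            label = "freezing"
        else:
            label = "not freezing"
        lines.append(f"{celsius} C = {fahrenheit} F ({label})")
    return lines


# input: hard-coded here, but it could come from the keyboard or a file
readings = [-5, 0, 20]

for line in describe_temperatures(readings):
    print(line)                              # output
```

Running it displays three lines, the first one being "-5 C = 23.0 F (freezing)".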

1.2.2. Formal and natural language

Natural languages

They are languages people speak, such as English or French. They were not designed by people and evolve naturally.

Formal languages

They are languages that are designed by people for specific applications. For instance, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols. Chemists use a formal language to represent the chemical structure of molecules. And most importantly:

Programming languages are formal languages that have been designed to express computations.

Formal languages tend to have strict syntax rules. For instance, “3 + 3 = 6” is a syntactically correct mathematical statement, but “3 + = 3$6” is not. “H2O” is a syntactically correct chemical formula, but “2Zz” is not.

Syntax rules come in two flavors, pertaining to tokens and structure.

Tokens are the basic elements of the language, such as words, numbers, and chemical elements. One of the problems with “3 + = 3$6” is that “$” is not a legal token in mathematics (at least as far as I know). Similarly, “2Zz” is not legal because there is no element with the abbreviation “Zz”.

The second type of syntax rule pertains to the structure of a statement; that is, the way the tokens are arranged. The statement “3 + = $” is illegal because even though “+” and “=” are legal tokens, you can’t have one right after the other. Similarly, in a chemical formula the subscript comes after the element name, not before [thinkpython].
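
Python applies the same two kinds of rules to your code. As a rough illustration, the standard ast module can be asked to parse a piece of text and will raise a SyntaxError when a rule is broken. The helper name is_valid_python is hypothetical, and note that Python writes equality as == rather than =.

```python
import ast


def is_valid_python(source):
    """Return True if source obeys Python's syntax rules, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_valid_python("3 + 3 == 6"))   # legal tokens, legal structure
print(is_valid_python("3 + = 3$6"))    # "$" is not a legal token in Python
print(is_valid_python("3 + == 6"))     # legal tokens, but illegal structure
```

Only the first statement is accepted; the other two are rejected before any computation takes place.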

1.2.3. Writing Python programs

Python code can be written using any plain text editor that can load and save files in either ASCII or UTF-8 character encoding. To edit your Python files, it is often easier to use a source code editor or an IDE (integrated development environment).

Some of these tools can highlight the syntax of your code, helping you read it and spot syntax errors. Some of them also help you adopt a clean layout or avoid typos.

Note

The default source code encoding in Python 3 is UTF-8.

Warning

Word or LibreOffice are NOT text editors. Never use them to edit Python code.

Python source code files normally have a .py extension, although on some Unix systems they may not need any extension, and Python GUI (Graphical User Interface) programs have a .pyw extension on Mac and Windows.

Note

Unlike most other programming languages, Python uses indentation to signify its block structure. Since blocks are indicated using indentation, the question that naturally arises is “What kind of indentation?” The Python style guidelines (PEP 8) recommend four spaces per level of indentation, and only spaces (no tabs). Most modern text editors can be set up to handle this automatically (IDLE’s editor does this, of course, and so do most other Python-aware editors). Python will work fine with any number of spaces or with tabs, provided that the indentation used is consistent; Python 3 raises a TabError when tabs and spaces are mixed in a way that makes the meaning depend on the width of a tab. In this course, we follow the official Python guidelines. Therefore, we recommend that you set your editor to insert 4 spaces when you press the “tab” key.
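
As an illustration of indentation-based blocks, the sketch below uses four spaces per level, as PEP 8 recommends. The classify function is invented for this example. The last part feeds a deliberately mis-indented snippet to the built-in compile() function to show that Python refuses inconsistent indentation.

```python
def classify(numbers):
    """Return a list of labels ("even" or "odd"), one per number."""
    labels = []
    for n in numbers:                 # 4 spaces: inside the function body
        if n % 2 == 0:                # 8 spaces: inside the for loop
            labels.append("even")     # 12 spaces: inside the if branch
        else:
            labels.append("odd")
    return labels                     # 4 spaces again: the for loop is over


print(classify([1, 2, 3]))

# The third line of this snippet is indented by 2 spaces, which matches
# neither the "if" line (0 spaces) nor its block (4 spaces).
bad_source = "if True:\n    x = 1\n  y = 2\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as error:
    print("IndentationError:", error.msg)
```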

1.2.4. Executing code

Different types of programming languages have different ways of being executed.

1.2.4.1. Compiled languages

Some languages, like C, are first transformed (compiled) by a program called a compiler into an executable file containing instructions in a binary format (sequence of zeroes and ones that a human is normally not able to read). The executable can later be directly executed using the processor of the computer for which it was compiled.

Figure: compiling work stream. A compiler transforms source code into object code, which is run by a hardware executor.

Compiling may take some time, but a good compiler will be able to optimize some parts of the code to improve the use of the processor. This will save time each time the program runs.

1.2.4.2. Interpreted languages

Other programming languages, like Python, are interpreted by a program called an interpreter, which generates binary instructions for the processor “on the fly”.

Figure: interpreting work stream. An interpreter processes the program a little at a time, alternately reading lines and performing computations.

There is no need to perform an intermediate compilation step. An advantage is that interpreters typically offer an interactive mode, in which one can quickly test pieces of code. This also makes the process of developing a program in an interpreted language more “fluid” in the sense that modifications of the code can be tested more rapidly than with a compiled language.

However, this leaves fewer opportunities for optimization. Programs written in an interpreted language will therefore usually not be as efficient as programs made using a compiled language.

Given the interpreted nature of Python, there are two main ways of executing code:

  • Giving a file containing the code to the interpreter.

  • Typing code in the interpreter in an interactive mode.

The first way can be achieved by typing python3 path/to/the/source/code.py in a command-line terminal.
You obtain an interactive interpreter if you just execute the python3 command, without any argument.

Actually, when Python code is executed, it is first compiled into bytecode, the interpreter’s internal representation of a Python program. When modules (that is, code saved in separate files) are imported, the corresponding bytecode may be saved (in a __pycache__ directory). The goal is to speed up loading when the source code of the module has not been modified since the last execution. This still does not entail as many optimizations as with a typical compiled language, because such optimizations would delay execution and defeat the purpose of using an interpreted language.

Figure: bytecode work stream. The actual Python code is compiled into Python bytecode. The bytecode is interpreted.
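
The two stages can be observed from Python itself. The following sketch uses the built-in compile() and exec() functions and the standard dis module; the two-line program in source is invented for illustration, and the exact listing printed by dis differs between Python versions.

```python
import dis

source = "total = 1 + 2\nprint(total)\n"

# Stage 1: compile() turns the source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

print(type(code.co_code))   # the raw bytecode is stored as a bytes object
dis.dis(code)               # human-readable listing of the bytecode instructions

# Stage 2: the interpreter executes the bytecode.
exec(code)
```

This is only a way to look under the hood: in normal use, both stages happen automatically when you run a program.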

1.3. Exercises

Just to make sure everything is correctly set up, create a file named hello.py with the editor of your choice, containing the following line:

print("Hello World!")

Note

print is a Python function that displays some text. Text in Python is written between quotes. The execution of the above code should just display Hello World!

Now execute your program, by giving it to the Python interpreter:

$ python3 hello.py
Hello World!

With interpreted languages, it is possible to add a special line at the beginning of the source code that specifies what interpreter should be used to execute the code.

Modify the hello.py file to add such a line:

#!/usr/bin/env python3
print("Hello World!")

In Unix-based systems (like Linux and Mac OSX), you can then make the file executable by adding the execution permissions:

$ chmod +x hello.py

You can then directly execute the file:

$ ./hello.py
Hello World!

Note that the file was referred to by using its relative path. Therefore, the above will only work when you type the command from the same directory as the one containing the file. If you want to execute a Python script from a different location, you need to adapt the path accordingly. You may also use the absolute path of the file.

For programs that you intend to use from various locations, it may make your work easier if you place the script in a directory present in the PATH environment variable. You will then only have to call the program using its base name:

$ hello.py
Hello World!

Note

On some Unix-based systems, the presence of a bin folder in the user home directory (${HOME}/bin) is automatically checked at shell startup (when you open a command-line terminal, for instance), and added to the PATH environment variable. This could be a place to put your most used Python programs.
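
If you are unsure which directories are in PATH, or whether a program will be found there, Python can tell you. This is a minimal sketch using the standard os and shutil modules; hello.py is the script from this exercise, and the result for it is None unless you have placed an executable copy in one of the PATH directories.

```python
import os
import shutil

# PATH holds a list of directories separated by os.pathsep
# (":" on Unix-based systems, ";" on Windows).
path_directories = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_directories:
    print(directory)

# shutil.which() looks for an executable file in those directories,
# much like the shell does, and returns its full path or None.
print(shutil.which("hello.py"))
print(shutil.which("python3"))
```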

1.4. Python Documentation

1.4.1. On the web

The Python website contains all the documentation needed for Python programming, for all supported versions. This is the place to look when you need first-hand documentation about the language or the standard library.

Some “Q&A” websites are also very useful.

Be sure to check the posting recommendations of such sites before asking questions there.

1.4.2. On the command line

Python comes with the executable pydoc (or pydoc3, to be sure to get the one for Python 3), which provides help about Python. In a terminal, just type pydoc3 followed by the name of any module, keyword, or topic:

$ pydoc3 print

(Press q to exit.)

1.4.3. In the interpreter

We can also access documentation interactively in a Python interpreter: just type help() for interactive help, or help(object) for help about object:

$ python3
Python 3.8.0a2 (default, Mar  6 2019, 14:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> help()

Welcome to Python 3.8's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".

help>
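
The text displayed by help(object) comes largely from docstrings, that is, string literals placed at the very beginning of a function, class or module. The sketch below shows this with a function invented for illustration (greet); the same mechanism applies to built-in functions such as len.

```python
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello {name}!"


# help() displays the function signature followed by its docstring.
help(greet)

# The docstring itself is stored in the __doc__ attribute.
print(greet.__doc__)
print(len.__doc__)
```

Writing a docstring for your own functions therefore makes them documented in the interpreter and in pydoc at no extra cost.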

1.5. Summary

Python is an interpreted language. It can be used interactively in the interpreter, or the interpreter can execute the source code. Source code has to be written using a text editor or an IDE. We will use Python 3 for the rest of this course.

1.6. References

thinkpython

http://www.greenteapress.com/thinkpython/

prog_in_python3

Mark Summerfield, Programming in Python 3 (Addison-Wesley): http://www.qtrac.eu/py3book.html

python2vs3

https://wiki.python.org/moin/Python2orPython3

biopython_news

https://raw.githubusercontent.com/biopython/biopython/master/NEWS.rst