Introduction to Python Programming TC, BN, JBM, AZ
Institut Pasteur, Paris, 20-31 March 2017

Before Starting: Why Python

Easy to learn

As you will see, Python is extremely easy to get started with. Python has a simple syntax, yet powerful enough for most applications.

Simple but powerful

  • It can be used as a procedural language, similarly to C, but it is also an object-oriented language like C++

Hello world!

In [342]:
print("Hello World!")
Hello World!

Here is a more complex example that remains short and easy to understand. The code below

  • fetches a fasta data set from the web
  • extracts the sequence
  • counts the number of 'W' letters in the sequence
In [24]:
from urllib import request
urlname = ''
data = request.urlopen(urlname).read()   # we read binary data
data = data.decode()                     # so we need to decode into text
sequence = data.split("\n",1)[1]         # drop the header line
sequence.count("W")                      # number of 'W' letters

Quick to start with

  1. no type declaration required.
  2. many built-in data structures are already available: dictionary, lists...
  3. no need for memory handling: there is a memory garbage collector


Python can be run, debugged and tested interactively in the python interpreter and ipython notebooks

MultiPlatform and portable

Byte code can be executed on different platforms.

Batteries included

  • Many modules are already provided (e.g., urllib in the example above)
  • There is a rich set of external libraries especially for science (matplotlib for visualisation, pandas for data mangling, networkx for graph theory, scikit-learn for machine learning, etc.)

Extensible and flexible

  • Python can be used to connect many applications or separate software components in a simple manner.
  • scalable: Python supports large programs thanks to better structure and tooling than standard scripting languages. It provides tools to build modular applications with modules and packages, as well as graphical user interfaces (GUIs)
  • large programs can be organised into smaller modules that can still interact with each other or built-in modules

Some drawbacks to be aware of

  • indentation can be intimidating for beginners, but after a couple of hours of coding it becomes natural.
  • more importantly, Python is interpreted, so it is slower than compiled languages.
  • two versions of Python (Python2 and Python3) have been in use for many years. Our advice: use the latest official version if possible

Python and IPython shells, and the Notebooks

Python provides a shell (type python in your favorite command line interface). The user interface is not amazing, so we strongly recommend installing IPython instead, which provides nice features such as tab completion.

In order to start an interactive python shell, just call the ipython executable in a command line interface.

Python shell practical session

  • Install Anaconda Python3
  • Install IPython with conda executable
  • Start ipython in a shell
  • Check that it is properly installed (for instance, try the hello world or the example with the uniprot sequence)

If you don't have time to install IPython, try it on

In the Python shells, note the >>> signs used as prompt.

About the IPython/Jupyter notebooks

IPython notebooks can be used to put code, results and images together. They are now also known as Jupyter notebooks (you can add Python code but also R, or other languages)


        conda install jupyter

Starting the server

In a shell, just type:


Stopping the server

In the same shell when you started the server, type Ctrl+C

Some commands

In a notebook, cells contain code OR documentation

You can execute a cell by highlighting the cell and typing: <SHIFT> + <ENTER>

In [233]:
# <SHIFT> + <ENTER> execute the commands in a cell 
variable = 1

Get help about a variable, function:

In [3]:
Object `variable` not found.

Not just for Python:

In [6]:
echo $VAR
In [7]:
var a=1;

Other useful notebook commands

In [6]:
for i in range(1000000): 
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 9.06 µs
In [7]:
# shell 
!ls c*
Notebooks are useful for demonstrations, lectures and exercises, but not for designing (large-scale) libraries.

Jupyter practical session

  • Install jupyter with conda executable
  • Start a jupyter session (in a shell)
  • Create your first notebook with the hello and uniprot examples
  • Open an existing notebook (e.g. this notebook)


Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:

>>> a = 1
>>>   b = 2

since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:

>>> a = 1
>>> b = 2

Here is another example involving a loop and a function (def):

def example():
    for i in [1, 2, 3]:
        print(i)

In C, it may look like

void example(){
  int i;
  for (i=1; i<=3; i++){
      printf("%d\n", i);
  }
}

or

void example(){
    int i;
for (i=1; i<=3; i++){
    printf("%d\n", i);
}
}

or even

void example(){int i;for (i=1; i<=3; i++){printf("%d\n", i);}}

One major benefit of indentation is that all Python code looks visually similar, which makes it easier to read a piece of code written by someone else.

Convention about indentations:
developers tend to use 4 spaces instead of tabulations (which are also perfectly correct). This is not compulsory but is highly recommended. Just configure your editor to automatically write 4 spaces when tab is pressed.

!! Don't mix tabs and spaces or you will get into trouble !!

Some basics: Python as a calculator

In [27]:
a = 10  
b = 2
a + b
In [46]:
# incremental (augmented) operators
a = 10
a += 2    # equivalent to a = a + 2   (there is no ++ operator like in C/C++)
- No surprise with these operators: +, *, -, =
- Note however that in Python 2 the / operator applied to two integers returns an integer (the fractional part is dropped). This is not the case in Python 3
- There is no ++ operator like in C/C++
In [244]:

Other operators

  • ** power
  • // floor-divide dropping fractional remainder
  • % remainder of a/b
In [246]:
10 // 3
In [243]:
10 % 3
Note the spaces. They are not required but this is a convention. This is part of the so-called PEP8 convention.
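The operators above can be checked quickly in a Python 3 session. The sketch below is only illustrative and the variable names are arbitrary.

```python
# Python 3 semantics: / is true division, // is floor division
true_div = 10 / 3          # a float
floor_div = 10 // 3        # 3
remainder = 10 % 3         # 1
power = 2 ** 10            # 1024

# floor division rounds towards minus infinity, not towards zero
negative_floor = -7 // 2   # -4

print(true_div, floor_div, remainder, power, negative_floor)
```

Note that the result of / is a float even when both operands are integers.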

Value and Type

A value is one of the basic things a program works with, like a letter or a number. The values we have seen so far are 1, 2, 2.1 and 'Hello, World!'.

These values belong to different types: 2 is an integer, 2.1 is a float and 'Hello, World!' is a string, so-called because it contains a string of letters. You (and the interpreter) can identify strings because they are enclosed in quotes. We also speak of data types.

If you are not sure what type a value has, the interpreter can tell you thanks to the type built-in function.

In [1]:
type('Hello World')
In [2]:

What about values like '17' and '3.2'? They look like numbers, but they are in quotes like strings.

In [7]:

To convert a data item from one type to another, we can use the syntax datatype(item). For example:

In [12]:
In [13]:

The int conversion is tolerant of leading and trailing whitespace. So int(' 45 ') would have worked just as well. The str conversion can be applied to almost any data item.
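As a small sketch of the datatype(item) syntax described above (the variable names n, x and s are chosen here only for illustration):

```python
# values that look like numbers but are in quotes are strings
print(type('17'))      # <class 'str'>
print(type('3.2'))     # <class 'str'>

# datatype(item) converts between types
n = int(' 45 ')        # surrounding whitespace is ignored
x = float('3.2')
s = str(17)

print(n, type(n))
print(x, type(x))
print(repr(s), type(s))
```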

If a conversion fails, an exception is raised (we will see Exception Handling later).
In [3]:
int('Hello World!')
ValueError                                Traceback (most recent call last)
<ipython-input-3-3a9b49134975> in <module>()
----> 1 int('Hello World!')

ValueError: invalid literal for int() with base 10: 'Hello World!'

Variables and Object references

Once we have a data item, the next thing we need is a variable to store it. A variable is a name that refers to a value.

Technically speaking, Python does not have variables as such, but instead has object references.

When it comes to immutable objects like str or int, there is no discernible difference between a variable and an object reference. For mutable objects there is a difference, but it rarely matters in practice.

So we will use the terms variable and object reference interchangeably.

Let’s look at a few examples and see what is going on with the following statements:

In [16]:
x = 3
color = 'green'
y = color

The syntax is simply object reference = value: there is no need to predeclare the variable or the type of the value.

The execution follows these steps:

  • On line 1, Python does 3 things: (1) it creates an int object with the value 3, (2) it creates the object reference x and (3) it sets the object reference x to refer to the int object (in practice, we say that variable x has been assigned the value 3)
  • Similarly on line 2
  • On line 3, a new object reference y is created and set to refer to the same object that the color object reference refers to (in this case the str object containing the value green).
In [27]:
x  = 3
color = 'green'
y = color
myprint(x, color, y)
3 green green
id(x)     = 140528831613536
id(color) = 140528517325632
id(y)     = 140528517325632

In [29]:
x = y
myprint(x, color, y)
green green green
id(x)     = 140528517325632
id(color) = 140528517325632
id(y)     = 140528517325632

Now the three object references refer to the same string with value “green”. Since there are no more object references to the integer value 3, Python is free to garbage-collect it.

Python uses dynamic typing, which means that an object reference can be rebound to refer to a different object (which may be a different data type) at any time.
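The rebinding described above can be verified with the is operator, which tests whether two references point to the same object. This is only a sketch reusing the names x, color and y from the cells above.

```python
x = 3
color = 'green'
y = color

# y and color refer to the same str object
print(y is color)            # True
print(id(y) == id(color))    # True

# dynamic typing: x is rebound from an int to a str
x = y
print(type(x).__name__)      # str
print(x is color)            # True
```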

Mutable and Immutable objects

In Python we can classify objects in two categories: mutable and immutable.

  • Immutable objects are objects that cannot be changed. In other words, we can rebind the reference to a new object (with another value), but we cannot change the value of the object itself. We have already seen immutable objects: int, str.
  • Mutable objects are objects that can be changed. One example of a mutable object is the list, which is a collection of data items (we will see Lists in detail later).
In [11]:
x = ['a', 'b', 'c']
print("x =", x)
print("id(x) =",id(x))
x = ['a', 'b', 'c']
id(x) = 140412049856456
In [12]:
x[1] = 4
print("x =", x)
print("id(x) =",id(x))
x = ['a', 4, 'c']
id(x) = 140412049856456

Under the hood, lists do not store data items per se but object references. Here, a list is created (with reference x); the list contains 3 references to 3 strings (‘a’, ‘b’ and ‘c’).

We can easily change the state of the list by rebinding its second element to a newly created integer object.

Since there is no longer any reference pointing to the string ‘b’, Python can free some memory (Python can reclaim it thanks to its garbage collector).
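One practical consequence of mutability shows up when two references point to the same object. The following sketch (the names y, s and t are introduced here for illustration) contrasts a list with a string.

```python
x = ['a', 'b', 'c']
y = x                 # y refers to the same list object as x

x[1] = 4              # change the list in place
print(y)              # ['a', 4, 'c']: the change is visible through y
print(x is y)         # True

# with an immutable object, "changing" it creates a new object
s = 'gaa'
t = s
s += 'ttc'            # s is rebound to a new string, t is untouched
print(s, t)           # gaattc gaa
print(s is t)         # False
```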

Variable names (1/2)

  • Variable names are unlimited in length
  • Variable names start with a letter or underscore _ followed by letters, numbers or underscores.
  • Variable names are case-sensitive
  • Variable names conventionally have lower-case letters, with multiple words separated by underscores.
  • Variable names cannot be Python keywords
Avoid starting a variable name with an underscore unless you know what it means !!
In [41]:
variable_one_as_an_example = 1
# VARIABLE, Variable are not good conventions

Python Keywords

  • Keywords are special names that are part of the Python language. You may have noticed some of them: def, class, pass and import
  • a variable cannot be named after a keyword (SyntaxError)
  • The list of keywords can be obtained using these commands:
In [42]:
import keyword
keyword.kwlist
['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

We will cover most of the keywords.

Variable names (2/2)

Again, you cannot assign values to keywords

In [40]:
as = 1
  File "<ipython-input-40-24e7f14dd859>", line 1
    as = 1
SyntaxError: invalid syntax
If you give a variable an illegal name, you get a syntax error:
In [18]:
76trombones = 'big parade'
  File "<ipython-input-18-a5f509298d77>", line 1
    76trombones = 'big parade'
SyntaxError: invalid syntax

76trombones is illegal because it does not begin with a letter or an underscore.
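The naming rules can also be tested without triggering a SyntaxError. The sketch below assumes a hypothetical list of candidate names and relies on the standard str.isidentifier method and keyword.iskeyword function.

```python
import keyword

# hypothetical candidate names, for illustration only
candidates = ['variable_one', '_private', '76trombones', 'as', 'class']

usable_names = []
for name in candidates:
    # a usable variable name is a valid identifier that is not a keyword
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable)
    if usable:
        usable_names.append(name)
```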

Data types

3 + 1 Numeric types

In [56]:
long_integer = 2**63

float1 = 2.1           
float2 = 2.0
float3 = 2.

complex_value = 1 + 2j
Python2: there was a dedicated integer type and long integers were encoded as
Python3: all integers are considered as long and have unlimited precision
In [344]:


When you mix numeric types in an expression, all operands are converted (or coerced) to the type with highest precision:

In [238]:
5 + 3.1
In [38]:
#cast with e.g., int and float builtin functions
int(5 + 3.0)
In [39]:
In [252]:
c = 1 + 1j
type(2 + c)

The integer, float and complex data types are built in. You do not need to do anything to access them.

However, there is another numeric data type in the standard library: Decimal. A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To preserve significance, the coefficient digits do not truncate trailing zeros. Decimals also include special values such as Infinity, -Infinity, and NaN. The standard also differentiates -0 from +0.

Decimal is part of the decimal module. This means that we cannot create a decimal number directly, as we do for a float, just by writing it with a decimal point; we must use the Decimal constructor to build a decimal object. Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float.

In [20]:
from decimal import getcontext, Decimal
getcontext().prec = 28
In [23]:
from decimal import getcontext, Decimal
getcontext().prec = 6
print(Decimal(1) / Decimal(7))

getcontext().prec = 28
print(Decimal(1) / Decimal(7))

From tuple

  • The first value in the tuple should be an integer; either 0 for a positive number or 1 for a negative number.
  • The second value must be a tuple composed of integers in the range 0 through 9
  • The third value is an integer representing the exponent
In [21]:
Decimal((0, (3, 1, 4), -2))
In [22]:

Although division involving decimals is more accurate than division involving floats, on a 32-bit machine the difference only shows up after the fifteenth decimal place. Furthermore, computations using decimals are slower than those involving floats.

So use decimals only if high precision is required.

  • in python3 decimal is 2 to 3 times slower than float.
  • in python2 decimal is 100 to 200 times slower than float.
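To make the difference between floats and decimals concrete, here is a minimal sketch assuming the default decimal context. The variable names are chosen for illustration.

```python
from decimal import Decimal

# binary floats cannot represent 0.1 exactly
float_equal = (0.1 + 0.2 == 0.3)
print(float_equal)        # False

# decimals built from strings keep the value as written
decimal_equal = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
print(decimal_equal)      # True

# building from a float converts the (inexact) binary value exactly
from_float_equal = (Decimal(0.1) == Decimal('0.1'))
print(from_float_equal)   # False

# the tuple form is (sign, digits, exponent)
from_tuple = Decimal((0, (3, 1, 4), -2))
print(from_tuple)         # 3.14
```

This is why building a Decimal from a string is usually preferable when the value is known in decimal notation.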

Variables and numeric types practical session

  1. Create a variable f set to 3.14159 and a variable g set to 10
  2. Check the type of the two variables
  3. what is the type of f divided by g
  4. Check the type of the number 2 and 2.0
  5. Create a complex variable denoted c. What is the type of c + g
  6. print the variables f, g, and c.
  7. Create two variables a and b. Check the values of their reference identifiers. How do they compare ? What do you think ?


4 ways to represent strings !

  • with single quotes
  • with double quotes
  • with triple single quotes
  • with triple double quotes
In [ ]:
simple = 'Information within single quotes'
In [ ]:
double = "Information within double quotes"
In [ ]:
#single quotes can be used to use double quotes and vice versa
"John's book"
In [ ]:
triple_simple = '''Information within triple single quotes,
and using 'single quotes'
and "double quotes" inside.'''
In [ ]:
triple_double = """Information within triple double quotes
    - and using 'single quotes'
    - and using "double quotes" inside."""

Triple quotes are used for long strings on several lines and documentation

basic operations on strings

In [44]:
# Concatenation
s = 'gaa'
s = s + 'ttc'

# augmented operator
s = 'gaa'
s += 'ttc'
In [45]:
# Replication
s = 'a' * 10

Comparing Strings

Strings support the usual comparison operators <, <=, ==, !=, >, >=. These operators compare strings character by character, using the numeric code (Unicode code point) of each character.

>>> 'a' > 'b'
>>> 'albert' < 'alphonse'

The equality operator is ==. It allows one to test whether the string on the right and left hand side of the operator are equal.

>>> s1 = 'hello'
>>> s2 = 'hello'
>>> s1 == s2
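The comparisons above can be understood by looking at the numeric code of each character with the built-in ord function. A short sketch, with arbitrary example strings:

```python
# comparison is lexicographic, based on the code point of each character
print(ord('a'), ord('b'))          # 97 98
a_greater_b = 'a' > 'b'
print(a_greater_b)                 # False

# the first differing characters decide: 'b' comes before 'p'
albert_first = 'albert' < 'alphonse'
print(albert_first)                # True

# upper-case letters have smaller code points than lower-case ones
print(ord('Z'), ord('a'))          # 90 97
upper_first = 'Zebra' < 'apple'
print(upper_first)                 # True
```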

String formatting

There are 2 ways to generate complex strings with fixed parts and variable elements:

  • the old way using the operator % like in C.
  • the format method, which is more powerful and versatile.

String formatting (C-like version)

In [49]:
# C-like formatting
print("You can also use %s %s\n" % ('C-like', 'formatting'))
You can also use C-like formatting

In [51]:
# the special character \n prints an empty line
print("\nSpecial characters like in C can be used like \\n")
Special characters like in C can be used like \n
In [52]:
# Here %s is a specifier that converts the value to a string
print("%s" % 123456)

More about conversion specifiers

The general syntax for a conversion specifier is:

%[(key)][flags][width][.prec]type


Key specifier


We have not seen Python dictionaries so far, but for the record here is the syntax:

In [27]:
print("%(key1)s and %(key2)s" % {'key1':1, 'key2':2})
1 and 2

Conversion types


We have already seen one type: the string type %s. There are many more:

character   Description
c           Converts to a single character
d, i        Converts to a signed decimal integer or long integer
u           Converts to an unsigned decimal integer
e, E        Converts to a floating point in exponential notation
f           Converts to a floating point in fixed-decimal notation
g           Converts to the shorter of %f and %e
G           Converts to the shorter of %f and %E
o           Converts to an unsigned integer in octal
s           Converts to a string using the str() function
x, X        Converts to an unsigned integer in hexadecimal
In [59]:
print("%s -- %d" % (4.1, 8.2))
4.1 -- 8

Width option


The width option is a positive integer specifying the minimum field width. If the converted value is shorter than width, spaces are added on left or right (depending on the flags, as shown later):

In [53]:
print("(%10s)" %  "example")
(   example)

Precision option


prec is a dot (.) followed by a positive integer specifying the precision. Note that we use the %f conversion specifier here (float):

In [57]:
print("%.2f" %  2.012345)



Flags can be one of the following characters:

character   Description                                       example
0           pad numbers with leading zeros                    "%04d" % 2
-           left align the results (default is right)         "%-4d" % 1
space       add a space before a positive number or string
+           always begin a number with a sign (+ or -)
#           display numbers in alternate form
In [55]:
print("Results:%-10d and %04d" % (1234,2))
Results:1234       and 0002
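The flags, width and precision options can be combined. The sketch below uses arbitrary values to illustrate each option in turn.

```python
value = 3.14159

fixed = "%.2f" % value          # precision: 2 digits after the dot
padded = "%8.3f" % value        # width 8, precision 3
signed = "%+d" % 42             # always show the sign
zero_padded = "%05d" % 42       # pad with leading zeros
left = "(%-6d)" % 42            # left aligned in a field of width 6
hexa = "%#x" % 255              # alternate form adds the 0x prefix

for result in (fixed, padded, signed, zero_padded, left, hexa):
    print(repr(result))
```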

String formatting using format

Consider this example:

In [58]:
print("This {0} is {1} on format".format('example', 'based'))
This example is based on format

A replacement field is made of a name (or index) inside braces.

If the field name is an integer, it is the index position of one of the arguments passed to the format() method.

Here, the field 0 is replaced by the first argument example and second field 1 is replaced by based.

If we need to include braces inside formatted strings, we need to double the braces.

In [59]:
print("{{{0}}}, {1}.".format("I'm in braces", "I'm not"))
{I'm in braces}, I'm not.

format allows us to perform conversion and concatenation at the same time.

In [60]:
e_value = "{0:f}".format(0.12)

The replacement field can have any of the following general syntaxes:

  • {field_name}
  • {field_name!conversion}
  • {field_name:format_specification}
  • {field_name!conversion:format_specification}
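As an illustration of the four syntaxes listed above, here is a small sketch. The conversion r (which applies repr()) and the format specifications are chosen only as examples.

```python
name = 'EcoR1'

plain = "{0}".format(name)          # {field_name}
as_repr = "{0!r}".format(name)      # {field_name!conversion}: r applies repr()
centered = "{0:*^9}".format(name)   # {field_name:format_specification}
both = "{0!r:>9}".format(name)      # conversion first, then format specification

for result in (plain, as_repr, centered, both):
    print(result)
```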

Field names

In [99]:
fasta = '>{0} {2}\n{1}'.format('EcoR1', 'gaattc', 'restriction site 1 for Ecoli')
fasta = '>{} {}\n{}'.format('EcoR1', 'restriction site 1 for Ecoli', 'gaattc')
>EcoR1 restriction site 1 for Ecoli
>EcoR1 restriction site 1 for Ecoli

with named fields

In [62]:
fasta = '>{id} {comment}\n{seq}'.format(seq='gaattc', id='EcoR1', 
                                comment='restriction site 1 for Ecoli')
>EcoR1 restriction site 1 for Ecoli


Most objects have a string representation whose purpose is to be human-readable. All built-in data types have a string form. See the string and number subsections hereafter.

Format Specifications

Usually, the default formatting works, but we can get fine control over how values are formatted by using format specifications.


For strings, we can control the fill character, the alignment within the field, and the minimum and maximum field widths. A string format specification is introduced with a colon (:) and has the following syntax:

format_spec   ::=  [[fill]align][#][0][minimum width][.maximum width]
fill          ::=  <any character>
align         ::=  "<" | ">" | "^"
minimum width ::=  integer
maximum width ::=  integer
In [66]:
print('{:30}'.format('minimum size')) # minimum width 30
print('{:<30}'.format('left aligned')) # minimum width 30 and left aligned
print('{:>30}'.format('right aligned')) # minimum width 30 and right aligned
print('{:^30}'.format('centered')) # minimum width 30 and centered
print('{:*^30}'.format('centered'))  # use '*' as a fill char
print('{:^.5}'.format('centered'))  # maximum 5 chars width
minimum size                  
left aligned                  
                 right aligned

The syntax for numbers is the same as for strings, but there are some specific fields.

In [73]:
import math
z = math.pi
print("Format a decimal number with 2 digit width", end=": ")
print("{:.2}".format(z))
print("Format a decimal number with 2 digits after the dot", end=": ")
print("{:.2f}".format(z))
print("Format a decimal number with 12 digits including 1 digit after " +
      "the dot, and padding the left with 0", end=": ")
print("{:012.1f}".format(z * 10)) 
print("Display using exponential notation, with 2 digit after dot", end=": ")
print("{:.2e}".format(math.pi * 100))
print('GC coverage = {:.2%}'.format(125 / 230))
Format a decimal number with 2 digit width: 3.1
Format a decimal number with 2 digits after the dot: 3.14
Format a decimal number with 12 digits including 1 digit after the dot, and padding the left with 0: 0000000031.4
Display using exponential notation, with 2 digit after dot: 3.14e+02
GC coverage = 54.35%

For a full description of string formatting see

String formatting practical session

Ex1: print the Golden number $\phi = \frac{1+\sqrt{5}}{2}$ in scientific notation and 4 digits
Ex2: Print Pierre said "100%: it's awesome!".
Ex3: Print the following 3 pyramids without spaces using only * characters


Ex4: set a variable to $2^{16}$ and print the variable
Ex5: print a complex number
Ex6: can you print the print function ?


Booleans represent the truth values False and True (False and True are keywords, note the capital letters).

  • The two objects representing the values False and True are the only Boolean objects.

  • The Boolean type is a subtype of plain integers

  • Boolean values behave like the values 0 and 1, respectively.

What is False what is True ?

Empty objects are False ([], {}, set(), ...) as well as the values 0 and 0.0 and the empty string ''. Every other value is True. Use the bool built-in function to check whether a value is True or False.

In [98]:
a = True
b = False
print(a, b)
True False
In [96]:
In [97]:
In [95]:
In [94]:
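The truth value rules can be checked with the bool built-in function. A minimal sketch with arbitrary example values:

```python
# values that are considered false
falsy = [0, 0.0, '', [], {}, set()]
falsy_results = [bool(value) for value in falsy]
print(falsy_results)        # all False

# any other value is considered true, including non-empty containers
truthy = [1, -1, 0.1, 'a', [0], {'key': 0}]
truthy_results = [bool(value) for value in truthy]
print(truthy_results)       # all True

# booleans behave like the integers 0 and 1
print(True + True)                # 2
print(isinstance(True, int))      # True
```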

None Type

The sole value of the type NoneType is None. The None value represents something which is unknown or undefined. None is also frequently used to represent the absence of a value, for instance when default arguments are not passed to a function.

In [47]:
a = None
b = None
<class 'NoneType'>

The id() function returns an integer representing the identity of an object (currently implemented as its address in memory). So if two names have the same id, they refer to the same object.

The None value is converted in boolean to False:

>>> bool(None)
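Because there is a single None object, the usual way to test for it is the is operator rather than a truth test. A short sketch (the name value is hypothetical):

```python
a = None
b = None

print(type(a))           # <class 'NoneType'>
print(a is b)            # True: there is a single None object
print(id(a) == id(b))    # True

# "is None" differs from a truth test
value = 0
print(value is None)     # False: 0 is a real value
print(bool(value))       # False, although value is not None
```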


A module is nothing else than a Python file with the extension .py

One can import a module in the working space using the import keyword. It allows us to import standard Python modules such as:

In [8]:
import math
import os

import can also be used to import a local / user module.

The dot operator

The dot operator gives access to the content of an object (all the content). In IPython, type the name of a variable followed by a dot and then press tab to see all the content

In [270]:
import math

The dot is also used as the decimal point in numbers

In the next practical session, try to type name of a variable followed by a dot character and then press tab. You should see the list of methods/variables available

Variables and numeric types practical session

Using the math module as much as possible, compute

  • the volume of a sphere with radius r (e.g., 10) ($\frac{4}{3} \pi r^3$).
  • Set the variable $\phi = \frac{1+\sqrt{5}}{2}$. Compute $f = \cos{\phi}^2+\sin{\phi}^2$
  • Check that $f=1$ using the == operator
  • Check that in Python, the letter "a" is less than the letter "b"
  • Check that in Python, the string "ALBERT" is not equal to "albert". Which one is greater ? why ?

Create your first Python module

  1. Create a file/module called
  2. Add a variable (constant) called golden_number set to $\frac{1+\sqrt{5}}{2}$
  3. Save and quit
  4. open an IPython shell and check you can get the golden_number in your environment
  1. Explore the math and os modules with the dot operator to figure out the variables / functions that are available
  2. with the math module, check that $2^{10000}$ is finite or less than infinite

The os module (calling a function)

In [310]:
# In ipython, you can use !ls (unix command) but this is not python syntax
!ls -a
.		coding_small.png    mysequence.fasta  text.txt
..     python.jpg	      warning.jpg
alice_mask.png	custom.css	    pythonlogo.jpg    warning_small.jpg
cloud.png	Day1.ipynb	    python_mask.png   warning_small.png	Day1.slides.html    python.png
coding.jpg	.ipynb_checkpoints  python_words.txt
coding.png	Makefile	    test.jpg
In [314]:
import os
results = os.listdir('.')
In [315]:

advantage: multi platform

The math module (from, as and import keywords)

In [334]:
import math
In [335]:
from math import pi
In [317]:
# aliases are possible on the module itself
import math as m
In [318]:
from math import pi as PI
You may see
from math import *
to import everything from the module, but avoid it: you do not know what is imported, so it may silently overwrite names you have already defined. It can also be time-consuming
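With explicit imports, the origin of every name stays visible. The sketch below compares the import forms shown above and is only illustrative.

```python
import math
import math as m
from math import pi
from math import pi as PI

# an alias is just another name for the same module object
print(m is math)                 # True
print(math.pi == pi == PI)       # True

# a name imported with "from" lives in the current namespace, so
# rebinding it does not change the module
pi = 3
print(pi, math.pi)
```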

Python as a calculator (2)

In [84]:
import math
1. + math.log10(1000)
In [85]:
math.sqrt(4.) + math.cos(math.pi/2.)
In [86]:

and so on

Just a quick note about objects:

  • Everything in Python is an object, which can be seen as an advanced version of a variable or a function.
  • Objects have methods (functions) or attributes (variables)

Let us use a new built-in function (dir) to discover available methods and attributes

In [101]:
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getformat__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__setformat__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex', 'hex', 'imag', 'is_integer', 'real']

Methods are like functions but they act on the object itself. They may either return a new object or modify the object in place
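The two behaviours can be seen with a string and a list. This is a minimal sketch with arbitrary values.

```python
# str is immutable: the method returns a new object
seq = 'gaattc'
upper_seq = seq.upper()
print(seq, upper_seq)             # gaattc GAATTC

# list is mutable: sort() works in place and returns None
values = [3, 1, 2]
result = values.sort()
print(values, result)             # [1, 2, 3] None

# float methods such as is_integer appear in the dir() listing
print((2.0).is_integer())         # True
print('is_integer' in dir(2.0))   # True
```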

In [102]: