As you will see, Python is extremely easy to get started with. Its syntax is simple, yet the language is powerful enough for most applications.
- It can be used as a procedural language, similarly to C, but it is also an object-oriented language like C++
print("Hello World!")
Hello World!
Here is a more complex example that remains short and easy to understand. The code below
- fetches a fasta data set from the web
- extracts the sequence
- counts the number of 'W' letters in the sequence
from urllib import request
urlname = 'http://www.uniprot.org/uniprot/P56945.fasta'
data = request.urlopen(urlname).read() # we read binary data
data = data.decode() # so we need to decode into text
sequence = data.split("\n",1)[1]
sequence.count("W")
6
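The same three steps can also be tried without a network connection. The sketch below uses a short, made-up FASTA record (the identifier and the sequence are purely illustrative, not real UniProt data) and removes the line breaks before counting.

```python
# A made-up FASTA record: one header line, then the sequence on two lines.
data = ">demo|TEST1 hypothetical protein\nMKWVTFISLLW\nLFSSAYSRGW\n"

# Split once, on the first newline: header on one side, sequence lines on the other
header, body = data.split("\n", 1)

# Join the sequence lines into a single string
sequence = body.replace("\n", "")

w_count = sequence.count("W")
print(header)
print(sequence, w_count)
```

Removing the line breaks does not change the count of 'W', but it gives a clean sequence that can be reused later.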
- no type declaration required.
- many built-in data structures are already available: dictionary, lists...
- no need for memory handling: there is a memory garbage collector
Python can be run, debugged and tested interactively in the python interpreter and ipython notebooks
Byte code can be executed on different platforms.
- Many modules are already provided (e.g., urllib in the example above)
- There is a rich set of external libraries especially for science (matplotlib for visualisation, pandas for data mangling, networkx for graph theory, scikit-learn for machine learning, etc.)
- Python can be used to connect many applications or separate software components in a simple manner.
- scalable: Python supports large programs thanks to a better structure and support than standard scripting languages. It provides tools to build modular applications with modules and packages, as well as graphical user interfaces (GUIs)
- large programs can be organised into smaller modules that can still interact with each other or built-in modules
- indentation can be intimidating for beginners, but after a couple of hours of coding it becomes natural.
- more importantly, Python is interpreted, so it is generally slower than compiled languages.
- two versions of Python have been in use for many years (Python 2 and Python 3). Our advice: use the latest official version if possible
Python provides a shell (type python in your favorite command line interface). Its user interface is fairly basic, so we strongly recommend installing IPython (http://www.ipython.org) instead, which provides nice features such as tab completion.
In order to start an interactive python shell, just call the ipython executable in a command line interface.
- Install Anaconda https://www.continuum.io/downloads Python3
- Install IPython with conda executable
- Start ipython in a shell
- Check that it is properly installed (for instance, try the hello world or the example with the uniprot sequence)
If you don't have time to install IPython, try it on https://www.pythonanywhere.com/
IPython notebooks can be used to put code, results and images together in a single document. They are now known as Jupyter notebooks (you can add Python code but also R or other languages).
conda install jupyter
In a shell, just type:
jupyter notebook
To stop the server, type Ctrl+C in the shell where you started it.
In a notebook, cells contain code OR documentation
You can execute a cell by highlighting the cell and typing: <SHIFT> + <ENTER>
# <SHIFT> + <ENTER> execute the commands in a cell
variable = 1
Get help about a variable, function:
variable?
Object `variable` not found.
%%bash
VAR=1
echo $VAR
1
%%javascript
var a=1;
console.log(a);
%time
for i in range(1000000):
    2**2
CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 9.06 µs
# shell
!ls c*
custom.css
- Install jupyter with conda executable
- Start a jupyter session (in a shell)
- Create your first notebook with the hello and uniprot examples
- Open an existing notebook (e.g. this notebook)
Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:
>>> a = 1
>>>   b = 2
since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:
>>> a = 1
>>> b = 2
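To see the error without breaking your session, the two misaligned statements can be handed to the built-in compile function as a string. This is only a sketch: it relies on try/except, which is not covered in this section, and the name source is chosen for illustration.

```python
# Two statements of the same block, the second one wrongly indented
source = "a = 1\n  b = 2\n"

try:
    compile(source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    # Python refuses the code before running any of it
    error_name = type(err).__name__
    print(error_name, ":", err.msg)
```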
Here is another example involving a loop and a function (def):
def example():
    for i in [1, 2, 3]:
        print(i)
In C, it may look like
void example(){
    int i;
    for (i=1; i<=3; i++){
        printf("%d\n", i);
    }
}
OR
void example(){
    int i;
    for (i=1; i<=3; i++){
        printf("%d\n", i);
}}
OR
void example(){int i;for (i=1; i<=3; i++){printf("%d\n", i);}}
One major benefit of indentation is that Python code written by different people is laid out in the same way, which makes it easier to read a piece of code written by someone else.
a = 10
b = 2
a + b
12
# incremental (augmented) operators
a = 10
a += 2 # equivalent to a = a + 2 (there is no ++ operator like in C/C++)
a
12
10/3
3.3333333333333335
10 // 3
3
10 % 3
1
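Floor division and modulo are two halves of the same operation. The short sketch below checks that relationship and also shows the built-in divmod function and the exponentiation operator; the variable names are chosen for illustration only.

```python
a = 10
b = 3

quotient = a // b        # floor division
remainder = a % b        # modulo

# The two results always rebuild the original number
rebuilt = quotient * b + remainder

both = divmod(a, b)      # quotient and remainder in a single call
power = a ** b           # exponentiation

print(quotient, remainder, rebuilt, both, power)
```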
A value is one of the basic things a program works with, like a letter or a number. The values we have seen so far are 1, 2, 2.1 and 'Hello, World!'.
These values belong to different types: 2 is an integer, 2.1 is a float and 'Hello, World!' is a string, so called because it contains a string of letters. You (and the interpreter) can identify strings because they are enclosed in quotes. These types are also called data types.
If you are not sure what type a value has, the interpreter can tell you thanks to the type built-in function.
type('Hello World')
str
type(17)
int
What about values like '17' and '3.2'? They look like numbers, but they are in quotes like strings.
type('17')
str
To convert a data item from one type to another, we can use the syntax datatype(item). For example:
int('45')
45
str(45)
'45'
The int conversion is tolerant of leading and trailing whitespace, so int(' 45 ') would have worked just as well. The str conversion can be applied to almost any data item. A string that does not represent a number cannot be converted:
int('Hello World!')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3a9b49134975> in <module>()
----> 1 int('Hello World!')

ValueError: invalid literal for int() with base 10: 'Hello World!'
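A few more conversions, as a small sketch of the datatype(item) syntax described above (the variable names are illustrative):

```python
n = int(' 45 ')          # surrounding whitespace is ignored
x = float('3.2')         # a string that looks like a float
t = int(3.9)             # float to int: the fractional part is dropped
s = str(3.2)             # number to string
total = int('17') + 1    # once converted, '17' behaves like a number

print(n, x, t, s, total)
# Note: int('3.2') raises a ValueError; convert with float('3.2') first.
```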
Once we have a data item, the next thing we need is a variable to store it. A variable is a name that refers to a value.
Technically speaking, Python does not have variables as such, but instead has object references.
When it comes to immutable objects like str or int, there is no discernible difference between a variable and an object reference. For mutable objects there is a difference, but it rarely matters in practice.
So we will use the terms variable and object reference interchangeably.
Let’s look at a few examples and see what is going on with the following statements:
x = 3
color = 'green'
y = color
The syntax is simply object_reference = value: there is no need to declare the variable beforehand or to specify the type of the value.
The execution follows these steps:
x = 3
color = 'green'
y = color
myprint(x, color, y)
3 green green
id(x) = 140528831613536 id(color) = 140528517325632 id(y) = 140528517325632
x = y
myprint(x, color, y)
green green green
id(x) = 140528517325632 id(color) = 140528517325632 id(y) = 140528517325632
Python uses dynamic typing, which means that an object reference can be rebound to refer to a different object (which may be a different data type) at any time.
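The helper myprint used above is not defined in this section. A possible version is sketched here as an assumption (the original helper may differ), together with the is operator, which tests whether two references point to the same object.

```python
def myprint(x, color, y):
    """Hypothetical helper: print the three values, then their identifiers."""
    print(x, color, y)
    print("id(x) =", id(x), "id(color) =", id(color), "id(y) =", id(y))

x = 3
color = 'green'
y = color                      # y and color now refer to the same string object
myprint(x, color, y)
same_before = x is color       # False: x refers to an int

x = y                          # x is rebound to the string, a different data type
myprint(x, color, y)
same_after = x is color        # True: all three names refer to one object

print(same_before, same_after, type(x).__name__)
```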
In Python, objects fall into two categories: mutable and immutable. A list is mutable: its content can change while its identity stays the same.
x = ['a', 'b', 'c']
print("x =", x)
print("id(x) =",id(x))
x = ['a', 'b', 'c']
id(x) = 140412049856456
x[1] = 4
print("x =", x)
print("id(x) =",id(x))
x = ['a', 4, 'c']
id(x) = 140412049856456
Under the hood, lists do not store data items per se but object references. In the figure below, a list is created (with reference x); the list contains 3 references to 3 strings ('a', 'b' and 'c').
We can easily change the state of the list by rebinding its second element to a newly created integer object.
Since no reference points to the string 'b' any more, Python can free the corresponding memory (thanks to its garbage collector).
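This is the case where the difference between a variable and an object reference matters: two names can refer to the same mutable object. A minimal sketch, with illustrative names y and z:

```python
x = ['a', 'b', 'c']
y = x                 # no copy is made: y refers to the same list as x

y[1] = 4              # changing the list through y ...
print(x)              # ... is visible through x as well

z = list(x)           # list() builds a new list holding the same references
z[0] = 'changed'      # rebinding an element of z leaves x untouched
print(x, z)
print(y is x, z is x)
```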
variable_one_as_an_example = 1
# names such as VARIABLE or Variable do not follow the usual convention (lowercase words separated by underscores)
- Keywords are special names that are part of the Python language. You may have noticed some of them: def, class, pass and import
- a variable cannot be named after a keyword (SyntaxError)
- The list of keywords can be obtained using these commands:
import keyword
print(keyword.kwlist)
['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
We will cover most of the keywords.
Again, you cannot assign values to keywords
as = 1
  File "<ipython-input-40-24e7f14dd859>", line 1
    as = 1
     ^
SyntaxError: invalid syntax
76trombones = 'big parade'
  File "<ipython-input-18-a5f509298d77>", line 1
    76trombones = 'big parade'
              ^
SyntaxError: invalid syntax
76trombones is illegal because it does not begin with a letter or an underscore.
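Both rules can be checked without triggering a SyntaxError. The sketch below combines the string method isidentifier with keyword.iskeyword; the list of candidate names is made up for illustration.

```python
import keyword

candidates = ['as', '76trombones', 'trombones76', '_private', 'class']

valid = []
for name in candidates:
    # A usable name is a well-formed identifier that is not a keyword
    if name.isidentifier() and not keyword.iskeyword(name):
        valid.append(name)

print(valid)
```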
long_integer = 2**63
float1 = 2.1
float2 = 2.0
float3 = 2.
complex_value = 1 + 2j
Python 3: all integers are treated as long integers and have unlimited precision (the 1L suffix of Python 2 no longer exists).
long_integer
9223372036854775808
Promotion
When you mix numeric types in an expression, all operands are converted (or coerced) to the widest type involved (int, then float, then complex):
5 + 3.1
8.1
#cast with e.g., int and float builtin functions
int(5 + 3.0)
8
float(1)
1.0
c = 1 + 1j
type(2 + c)
complex
The integer, float and complex data types are built in. You do not need to import anything to use them.
However, there is another numeric data type in the standard library: Decimal. A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To preserve significance, the coefficient digits do not truncate trailing zeros. Decimals also include special values such as Infinity, -Infinity, and NaN. The standard also differentiates -0 from +0.
Decimal is part of the decimal module. This means that we cannot create a decimal number simply by writing a literal with a decimal point, as we do for floats; we must use the Decimal constructor to build decimal objects. Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float.
from decimal import getcontext, Decimal
getcontext().prec = 28
print(Decimal(10))
print(Decimal('3.14'))
print(Decimal(3.14))
10
3.14
3.140000000000000124344978758017532527446746826171875
from decimal import getcontext, Decimal
getcontext().prec = 6
print(Decimal(1) / Decimal(7))
getcontext().prec = 28
print(Decimal(1) / Decimal(7))
0.142857
0.1428571428571428571428571429
From a tuple (sign, digits, exponent):
Decimal((0, (3, 1, 4), -2))
Decimal('3.14')
print(Decimal('NaN'))
print(Decimal('-Infinity'))
NaN
-Infinity
For more examples see https://docs.python.org/3/library/decimal.html#quick-start-tutorial
Although divisions involving decimals are more accurate than ones involving floats, floats already carry about 15 to 17 significant digits, so the difference only shows up after roughly the fifteenth decimal place. Furthermore, computations using decimals are slower than those involving floats.
So use decimals only if a high precision is required.
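A classic case where the difference becomes visible is an equality test on decimal fractions. The sketch below is only an illustration; note that the decimals are built from strings so that no binary rounding sneaks in.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2
decimal_sum = Decimal('0.1') + Decimal('0.2')

float_equal = (float_sum == 0.3)                 # False: binary rounding error
decimal_equal = (decimal_sum == Decimal('0.3'))  # True: exact decimal arithmetic

print(float_sum, float_equal)
print(decimal_sum, decimal_equal)
```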
- Create a variable f set to 3.14159 and a variable g set to 10
- Check the type of the two variables
- What is the type of f divided by g?
- Check the type of the numbers 2 and 2.0
- Create a complex variable denoted c. What is the type of c + g?
- Print the variables f, g, and c.
- Create two variables a and b. Check the values of their reference identifiers. How do they compare? What do you think?
There are 4 ways to represent strings:
- with single quotes
- with double quotes
- with triple single quotes
- with triple double quotes
simple = 'Information within single quotes'
double = "Information within double quotes"
# single quotes can be used inside double-quoted strings and vice versa
"John's book"
triple_simple = '''Information within triple single quotes,
and using 'single quotes'
and "double quotes" inside.'''
triple_double = """Information within triple double quotes
- and using 'single quotes'
- and using "double quotes" inside."""
Triple quotes are used for long strings on several lines and documentation
# Concatenation
s = 'gaa'
s = s + 'ttc'
print(s)
# augmented operator
s = 'gaa'
s += 'ttc'
print(s)
gaattc
gaattc
# Replication
s = 'a' * 10
print(s)
aaaaaaaaaa
Strings support the usual comparison operators <, <=, ==, !=, > and >=.
These operators compare strings character by character, using the Unicode code point of each character.
>>> 'a' > 'b'
False
>>> 'albert' < 'alphonse'
True
The equality operator is ==. It tests whether the strings on the left-hand and right-hand sides of the operator are equal.
>>> s1 = 'hello'
>>> s2 = 'hello'
>>> s1 == s2
True
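The built-in ord function returns the code point of a character, which helps to predict the result of a comparison. A short sketch with illustrative variable names:

```python
code_a = ord('a')
code_b = ord('b')
code_upper_z = ord('Z')
print(code_a, code_b, code_upper_z)

# The first characters that differ decide the result
first_letter = 'a' < 'b'              # True because 97 < 98
third_letter = 'albert' < 'alphonse'  # 'al' is common, then 'b' < 'p'
upper_first = 'Z' < 'a'               # True: uppercase letters have smaller code points

print(first_letter, third_letter, upper_first)
```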
There are 2 ways to generate complex strings with fixed parts and variable elements:
# C-like formatting
print("You can also use %s %s\n" % ('C-like', 'formatting'))
You can also use C-like formatting
# the special character \n inserts a line break (here, an empty line before the text)
print("\nSpecial characters like in C can be used like \\n")
Special characters like in C can be used like \n
# Here %s is a specifier that converts the value to a string
print("%s" % 123456)
123456
The general syntax for a conversion specifier is:
%[(key)][flags][width][.prec]type
We have not covered Python dictionaries yet, but for the record, here is the syntax with keys:
print("%(key1)s and %(key2)s" % {'key1':1, 'key2':2})
1 and 2
We have already seen one type: the string type %s. There are many more:
character | Description |
---|---|
c | Converts to a single character |
d,i | Converts to a signed decimal integer or long integer |
u | Converts to an unsigned decimal integer |
e,E | Converts to a floating point in exponential notation |
f | Converts to a floating point in fixed-decimal notation |
g | Converts to the shorter of %f and %e |
G | Converts to the shorter of %f and %E |
o | Converts to an unsigned integer in octal |
s | Converts to a string using the str() function |
x,X | Converts to an unsigned integer in hexadecimal |
print("%s -- %d" % (4.1, 8.2))
4.1 -- 8
The width option is a positive integer specifying the minimum field width. If the converted value is shorter than width, spaces are added on left or right (depending on the flags, as shown later):
print("(%10s)" % "example")
(   example)
prec is a dot (.) followed by a positive integer specifying the precision. Note that we use the %f conversion specifier here (float):
print("%.2f" % 2.012345)
2.01
Flags can be one of the following characters:
character | Description | example |
---|---|---|
0 | pad numbers with leading zeros | "%04d" % 2 |
- | left align the results (default is right) | "%-4d" % 1 |
space | add a space before a positive number | "% d" % 1 |
+ | always begin a number with a sign (+ or -) | "%+d" % 1 |
# | display numbers in alternate form | "%#x" % 255 |
print("Results:%-10d and %04d" % (1234,2))
Results:1234       and 0002
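Here is a sketch that exercises each flag in turn, so that the effect can be compared side by side. The variable names are illustrative, and the expected strings are given in the comments.

```python
zero_padded = "%04d" % 2        # '0002'
left_aligned = "%-4d|" % 1      # '1   |'  (the bar marks the end of the field)
with_space = "% d" % 5          # ' 5'
with_sign = "%+d" % 5           # '+5'
alt_hex = "%#x" % 255           # '0xff'
alt_octal = "%#o" % 8           # '0o10'

print(zero_padded, left_aligned, with_space, with_sign, alt_hex, alt_octal)
```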
Consider this example:
print("This {0} is {1} on format".format('example', 'based'))
This example is based on format
A replacement field is made of a name (or index) inside braces.
If the field name is an integer, it is the index position of one of the arguments passed to the format() method.
Here, field 0 is replaced by the first argument ('example') and field 1 by the second argument ('based').
If we need to include braces inside formatted strings, we need to double the braces.
print("{{{0}}}, {1}.".format("I'm in braces", "I'm not"))
{I'm in braces}, I'm not.
format performs conversion and concatenation at the same time.
e_value = "{0:f}".format(0.12)
print(e_value)
0.120000
The replacement field can have any of the following general syntaxes:
fasta = '>{0} {2}\n{1}'.format('EcoR1', 'gaattc', 'restriction site 1 for Ecoli')
print(fasta)
fasta = '>{} {}\n{}'.format('EcoR1', 'restriction site 1 for Ecoli', 'gaattc')
print(fasta)
>EcoR1 restriction site 1 for Ecoli
gaattc
>EcoR1 restriction site 1 for Ecoli
gaattc
with named fields
fasta = '>{id} {comment}\n{seq}'.format(seq='gaattc', id='EcoR1',
comment='restriction site 1 for Ecoli')
print(fasta)
>EcoR1 restriction site 1 for Ecoli
gaattc
Most objects have a string representation whose purpose is to be human-readable. All built-in data types have a string form. See the string and number subsections hereafter.
Usually, the default formatting works, but we can finely control how values are formatted by using format specifications.
For strings, we can control the fill character, the alignment within the field, and the minimum and maximum field widths. A string format specification is introduced with a colon (:) and has the following syntax
format_spec ::= [[fill]align][#][0][minimum width][.maximum width]
fill ::= <any character>
align ::= "<" | ">" | "^"
minimum width ::= integer
maximum width ::= integer
print('{:30}'.format('minimum size')) # minimum width 30
print('{:<30}'.format('left aligned')) # minimum width 30 and left aligned
print('{:>30}'.format('right aligned')) # minimum width 30 and right aligned
print('{:^30}'.format('centered')) # minimum width 30 and centered
print('{:*^30}'.format('centered')) # use '*' as a fill char
print('{:^.5}'.format('centered')) # maximum 5 chars width
minimum size
left aligned
                 right aligned
           centered
***********centered***********
cente
The syntax for numbers is the same as for strings, but there are some specific fields.
import math
z = math.pi
print("Format a decimal number with 2 significant digits", end=": ")
print("{:.2}".format(z))
print("Format a decimal number with 2 digits after the dot", end=": ")
print("{:.2f}".format(z))
print("Format a decimal number with 12 characters including 1 digit after " +
      "the dot, and padding the left with 0", end=": ")
print("{:012.1f}".format(z * 10))
print("Display using exponential notation, with 2 digits after the dot", end=": ")
print("{:.2e}".format(math.pi * 100))
print('GC coverage = {:.2%}'.format(125 / 230))
Format a decimal number with 2 significant digits: 3.1
Format a decimal number with 2 digits after the dot: 3.14
Format a decimal number with 12 characters including 1 digit after the dot, and padding the left with 0: 0000000031.4
Display using exponential notation, with 2 digits after the dot: 3.14e+02
GC coverage = 54.35%
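A few additional number specifications, as a sketch: a thousands separator, a forced sign, and alignment combined with a precision. The variable names are illustrative, and the expected strings are given in the comments.

```python
import math

thousands = "{:,}".format(1234567)         # '1,234,567'
signed = "{:+.1f}".format(math.pi)         # '+3.1'
right = "{:>8.3f}".format(math.pi)         # '   3.142'
padded = "{:08.3f}".format(math.pi)        # '0003.142'
scientific = "{:.3e}".format(12345.678)    # '1.235e+04'

print(thousands, signed, right, padded, scientific)
```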
For a full description of string formatting see https://docs.python.org/3/library/string.html#formatstrings
Ex1: print the golden number $\phi = \frac{1+\sqrt{5}}{2}$ in scientific notation with 4 digits
Ex2: Print Pierre said "100%: it's awesome!".
Ex3: Print the following 3 pyramids without spaces, using only * characters:
*
***
*****
----------------------
    *
  ***
*****
----------------------
  *
 ***
*****
Ex4: set a variable to $2^{16}$ and print the variable
Ex5: print a complex number
Ex6: can you print the print function?
Booleans represent the truth values False and True (False and True are keywords, note the capital letters).
The two objects representing the values False and True are the only Boolean objects.
The Boolean type is a subtype of the integer type.
Empty objects are False ([], {}, set(), ...), as are the values 0 and 0.0 and the empty string ''. Almost every other value is True. Use the bool built-in function to check whether a value is True or False.
a = True
b = False
print(a, b)
True False
bool(1)
True
bool(0)
False
bool('foo')
True
bool('')
False
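The rules above can be checked on a handful of values at once. In this sketch the two lists are made up for illustration; note in particular that the non-empty string 'False' and the non-empty list [0] are both True.

```python
falsy = [0, 0.0, '', [], {}, set()]
truthy = [1, -1, 0.1, ' ', 'False', [0]]

falsy_flags = []
for value in falsy:
    falsy_flags.append(bool(value))

truthy_flags = []
for value in truthy:
    truthy_flags.append(bool(value))

print(falsy_flags)
print(truthy_flags)

# Booleans are integers: True counts as 1 and False as 0
total = True + True
print(total, isinstance(True, int))
```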
The sole value of the type NoneType is None.
The None value represents something that is unknown or undefined.
None is also frequently used to represent the absence of a value, for instance when default arguments are not passed to a function.
print(type(None))
a = None
b = None
print(id(a))
print(id(b))
<class 'NoneType'>
140412370256016
140412370256016
The id() function returns an integer representing the identity of an object (currently implemented as its address), so if two objects have the same id, they are the same object.
The None value is converted to the Boolean False:
>>> bool(None)
False
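As an illustration of None standing for a missing argument, here is a minimal sketch. The function describe and its parameters are hypothetical, and the example uses def and if, which are only introduced briefly in this section.

```python
def describe(sequence, comment=None):
    """Hypothetical helper: return the sequence followed by a comment."""
    # Since None is a single object, the identity test "is None" is the usual check
    if comment is None:
        comment = 'no comment'
    return sequence + ' (' + comment + ')'

without_comment = describe('gaattc')
with_comment = describe('gaattc', 'EcoR1')

print(without_comment)
print(with_comment)
```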
A module is nothing more than a Python file with the extension .py
One can import a module in the working space using the import keyword. It allows us to import standard Python modules such as:
- os
- math
- sys
- urllib
- many others are available. See https://docs.python.org/2/py-modindex.html
import math
import os
import can also be used to import a local / user module.
The dot operator gives access to the content of an object (all of it). In IPython, type the name of a variable followed by a dot and then press tab to see all the content
import math
math.pi
3.141592653589793
The dot is also used as the decimal point in numbers such as 3.14.
Using the math module as much as possible, compute
- the volume of a sphere with radius r (e.g., 10): $\frac{4}{3}\pi r^3$.
- Set the variable $\phi = \frac{1+\sqrt{5}}{2}$. Compute $f = \cos^2{\phi}+\sin^2{\phi}$
- Check that $f=1$ using the == operator
- Check that in Python, the letter "a" is less than the letter "b"
- Check that in Python, the string "ALBERT" is not equal to "albert". Which one is greater? Why?
- Create a file/module called toolkit.py
- Add a variable (constant) called golden_number set to $\frac{1+\sqrt{5}}{2}$
- Save and quit
- open an IPython shell and check you can get the golden_number in your environment
- Explore the math and os modules with the dot operator to figure out the variables / functions that are available
- with the math module, check that $2^{10000}$ is finite, or that it is less than infinity
# In ipython, you can use !ls (unix command) but this is not python syntax
!ls -a
. coding_small.png mysequence.fasta text.txt .. create_cloud.py python.jpg warning.jpg alice_mask.png custom.css pythonlogo.jpg warning_small.jpg cloud.png Day1.ipynb python_mask.png warning_small.png cloud.py Day1.slides.html python.png coding.jpg .ipynb_checkpoints python_words.txt coding.png Makefile test.jpg
import os
results = os.listdir('.')
os.path.exists('data.txt')
False
Advantage: unlike shell commands, the os module works on multiple platforms.
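File paths are a typical place where portability matters, since the separator differs between operating systems. The sketch below builds and splits a path with os.path; the folder names are made up and nothing is read from or written to disk.

```python
import os

# Build a path with the separator of the current platform
path = os.path.join('data', 'sequences', 'mysequence.fasta')

# Split it back into folder, file name and extension
folder, filename = os.path.split(path)
name, extension = os.path.splitext(filename)

print(path)
print(folder, filename, name, extension)
```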
import math
math.pi
3.141592653589793
from math import pi
pi
3.141592653589793
# aliases are possible on the module itself
import math as m
m.pi
3.141592653589793
from math import pi as PI
PI
3.141592653589793
You can use from math import * to import everything from the module, but avoid it: it can silently overwrite names that you have already defined, and you no longer know what has been imported. It can also be time-consuming.
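The overwriting problem can be sketched with a single name. The example below imports only e explicitly, which is what a star import of math would do for e among many other names; the variable e holding 'EcoR1' is made up for illustration.

```python
import math

e = 'EcoR1'                 # a variable of our own
from math import e          # rebinds the name e without any warning
overwritten = e             # our string is gone: e is now Euler's number
print(overwritten)

# Keeping the module prefix avoids the clash
e = 'EcoR1'
print(e, math.e)
```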
import math
1. + math.log10(1000)
4.0
math.sqrt(4.) + math.cos(math.pi/2.)
2.0
math.ceil(.9)
1
Just a quick note about objects:
- Everything in Python is an object, which can be seen as an advanced version of a variable or a function.
- Objects have methods (functions) or attributes (variables)
Let us use a new built-in function (dir) to discover the available methods and attributes
print(dir(math.pi))
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getformat__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__setformat__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex', 'hex', 'imag', 'is_integer', 'real']
Methods
Methods are like functions, but they act on the object itself. They may either return a new object or modify the object in place.
math.pi.is_integer()
False
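The two behaviours can be compared on a string (immutable) and a list (mutable). This is a small sketch with illustrative variable names:

```python
s = 'gaattc'
upper = s.upper()          # returns a new string; s itself is unchanged
print(s, upper)

bases = ['g', 'a', 't', 'c']
result = bases.sort()      # sorts the list in place and returns None
print(bases, result)
```

A common mistake is to write bases = bases.sort(), which replaces the list by None.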