To be used in this notebook:
- the name of a built-in function can be overwritten
- keywords cannot
- so it is better to know the built-in functions: https://docs.python.org/3/library/functions.html
float = 1
float
1
# if we delete the variable, we do get back the original built-in function !
del float
float(1.)
1.0
# keywords cannot be re-used
del = 1
File "<ipython-input-87-e5d9a8bc7b5b>", line 2 del = 1 ^ SyntaxError: invalid syntax
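To check a name before using it as a variable, the standard keyword and builtins modules can be combined. The sketch below is only an illustration: classify_name is a hypothetical helper that does not appear elsewhere in this notebook.

```python
import builtins
import keyword

def classify_name(name):
    """Tell whether a name is a keyword, a built-in, or free to use."""
    if keyword.iskeyword(name):
        return "keyword"      # reserved word: cannot be assigned to
    if hasattr(builtins, name):
        return "builtin"      # can be shadowed, but better avoided
    return "free"

for name in ["del", "float", "my_variable"]:
    print(name, "->", classify_name(name))
```

The full list of reserved words is available as keyword.kwlist.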
### List of builtin functions
import builtins
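# the offset (79 here) skips the names starting with an uppercase letter or an
# underscore (mostly exception classes); it depends on the Python version
print(dir(builtins)[79:])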
['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'dreload', 'enumerate', 'eval', 'exec', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
We have already used a few functions (builtin functions). Let us now write our own functions, to be used for repetitive tasks.
def hello():
    # a comment
    print("hello")
# call the function (name followed by empty parentheses)
hello()
hello
All functions return something (the default is None), so the returned value can be assigned to a variable
result = hello()
hello
result is None
True
Most functions and classes are documented with docstrings, that is a string in triple quotes
import math

def my_function(x, verbose=True):
    """A main description

    Followed by details about e.g., the arguments

    :param x: the input value (must be positive)
    :return: nothing

    Example:

        my_function(10)
    """
    if x < 0:
        if verbose:
            print('This is a negative number !! ')
    else:
        print(math.sqrt(x))
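A docstring is stored on the function itself and can be read back at run time. The following sketch uses a hypothetical function, rectangle_area, introduced only for this illustration; inspect.getdoc returns the docstring with its indentation cleaned up.

```python
import inspect

def rectangle_area(width, height=1.0):
    """Return the area of a rectangle.

    :param width: the width (same unit as height)
    :param height: the height (default 1.0)
    :return: width * height
    """
    return width * height

# first line of the cleaned-up docstring
summary = inspect.getdoc(rectangle_area).splitlines()[0]
print(summary)
```

In an interactive session, help(rectangle_area) displays the same text.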
def compute_gc_content(sequence):
    GC = sequence.count('G') + sequence.count('C')
    GC /= len(sequence)
    GC *= 100
    return GC
compute_gc_content("ACGTACGTGCGCT")
61.53846153846154
#Several input arguments
def polynomial(x, exp):
    return x**exp - x + 1
polynomial(2,8)
255
Positional arguments: the order in which the arguments are given matters.
Arguments may have default values, in which case they become optional and can also be passed by keyword (name=value)
def power_kw(x, exp=2):
    return x**exp
power_kw(2)
4
# Here is one way to call the function
power_kw(2, 3)
8
# but you can also name the optional argument, which is more
# robust (if your API changes later on)
power_kw(2, exp=3)
8
def order(x, y=2, z=3):
    print(x, y, z)
order(1, 2, 3)
order(1, 2) # same as above
order(1) # same as above
order(1, y=2, z=3) # same as above
order(1, z=3, y=2) # same as above
order(1, y=2) # same as above
order(1, z=3) # same as above
1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
You cannot skip an argument: order(1, ,3) is not valid. Without naming the keyword arguments, you must provide them in order, with none left out in the middle; to set only z, name it: order(1, z=3).
All parameters without a default value must be on the left of the parameters with a default value. This definition is not correct:
def power_does_not_work(y=2, x):
    return x**y
File "<ipython-input-1-7d30c6e4fbba>", line 1 def power_does_not_work(y=2, x): ^ SyntaxError: non-default argument follows default argument
Sometimes you do not know beforehand how many arguments are required
def func(*args):
    print(args[0])
    return sum(args)
func(1,2,3,4,5,6)
1
21
*args syntax converts all positional input arguments into a tuple. The way Python handles the provided arguments is to match the normal positional arguments from left to right and then places any other positional arguments in a tuple (*args) that can be used by the function.
def extra_sum(items, *extra_items):
    return sum(items) + sum(extra_items)
extra_sum([1,2,3], 4,5,6)
21
Similarly to the positional arguments, you can specify an arbitrary number of keyword arguments by using the following syntax (combined with the arbitrary number of optional arguments introduced in the previous section):
def funckw(pos_params, *args, **kwargs):
    print(args)
    print(kwargs)
funckw(1000, 2, 3, 4,5, test1=1, test2=2, test3=3)
(2, 3, 4, 5) {'test2': 2, 'test3': 3, 'test1': 1}
You can pass a list/tuple of values as argument
t = ('a',2)
funckw(1, t) # does not work as expected: the function got one item (the tuple)
(('a', 2),) {}
but you would need to use the * syntax like in the prototype
funckw(1, *t)
('a', 2) {}
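The same idea applies to keyword arguments: a dictionary can be unpacked with ** at call time. This is a small sketch; describe is a hypothetical variant of funckw that returns what it received instead of printing it.

```python
def describe(pos_param, *args, **kwargs):
    """Return the arguments as received, to make the unpacking visible."""
    return pos_param, args, kwargs

values = ('a', 2)
options = {'test1': 1, 'test2': 2}

# * unpacks the tuple into positional arguments,
# ** unpacks the dictionary into keyword arguments
received = describe(1, *values, **options)
print(received)
```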
Remember that
- a list is a mutable sequence
- arguments are passed as references to objects, so a function can modify a mutable argument in place
def reference(mutable):
    mutable[0] = 1
mylist = [10, 2, 3]
reference(mylist)
mylist #changed !! since mylist is mutable list
[1, 2, 3]
If you do not want the sequence to be changed, pass a copy
mylist2 = [10,2,3]
reference(mylist2[:])
mylist2 # unchanged since we passed a copy of mylist2 using [:]
[10, 2, 3]
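Note that [:] makes a shallow copy: the outer list is new, but nested mutable items are still shared. The sketch below illustrates the difference with copy.deepcopy from the standard library; set_first_inner is a hypothetical helper used only here.

```python
import copy

def set_first_inner(rows):
    """Modify the first element of the first nested list in place."""
    rows[0][0] = 1

nested = [[10, 2], [3, 4]]
set_first_inner(nested[:])               # shallow copy: inner lists are shared
after_shallow = nested[0][0]             # the original was modified

nested2 = [[10, 2], [3, 4]]
set_first_inner(copy.deepcopy(nested2))  # deep copy: nothing is shared
after_deep = nested2[0][0]               # the original is untouched

print(after_shallow, after_deep)
```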
- How many positional and keyword arguments are there in the following function declaration? What is the output (guess it, do not run the function)?

def myfunc(a, b, c, d=1, e=2):
    return a+b+c+d+e

myfunc(1,2,3)
myfunc(1,2,3, e=4)

- ADVANCED: Recursive function. Write a recursive (or not) function that returns the Fibonacci sequence, which is defined by
- u[0] = 1
- u[1] = 1
- u[n+2] = u[n+1] + u[n]
- Write a function that fetches a UniProt sequence.
- Write another function. Given a string sequence as an argument, the function should return a dictionary with the count of each letter (see first notebook).
- Finally, using the two previous functions, create a third one that takes as input a UniProt ID and returns a dictionary with the count of each letter in the sequence corresponding to that ID.
- Write a function that takes as argument a list of numbers. The function should return the mean of the list. First, consider only pure Python code without any import. Then use the statistics module. Once done, rename your function and call it statistics. Adapt your code so that it is functional.
import math

def mylog(x):
    def check(x):
        if x <= 0:
            print('warning x is not positive')
            return False
        return True

    if check(x) is True:
        return math.log(x)
    else:
        return math.nan

res = mylog(-1)
warning x is not positive
check(1)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-34-08d9533016ef> in <module>() ----> 1 check(1) NameError: name 'check' is not defined
The pass keyword is a no-operation statement: it does nothing. It is handy within a function definition or control flow, and it is required wherever a block of statements is syntactically needed but has nothing to do.
# useful for function
def func_to_implement_one_day():
    # TODO
    pass
See also control flow examples later
The is operator is a binary operator that returns True if its left-hand object reference is referring to the same object as its right-hand object reference.
Note that it usually does not make sense to use is for comparing ints, strs, and most other data types since we almost invariably want to compare their values.
To invert the identity test we use is not.
a = "Something"
b = None
a is not None, b is None
(True, True)
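The difference between identity (is) and equality (==) shows up clearly with two lists that hold the same values. A minimal sketch with illustrative variable names:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # same values, but a distinct object
alias = first        # a second name for the very same object

same_value = first == second    # compares the contents
same_object = first is second   # compares the identities
alias_is_first = alias is first

print(same_value, same_object, alias_is_first)
```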
Python provides the standard set of binary comparison operators, with the expected semantics: < less than, <= less than or equal to, == equal to, != not equal to, >= greater than or equal to, and > greater than.
a = 2
b = 6
a == b
False
a = 2
b = 6
a < b
True
a = 2
b = 6
a <= b, a != b, a >= b, a > b
(True, True, False, False)
0 < a < b
True
we can also compare strings.
"three" < "four"
False
"three" < 4
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-8-f2a612e11247> in <module>() ----> 1 "three" < 4 TypeError: unorderable types: str() < int()
See first notebook
Python provides three logical operators: and, or, and not. Both and and or use short-circuit logic and return the operand that determined the result; they do not return a Boolean (unless they actually have Boolean operands). Let’s see what this means in practice:
five = 5
two = 2
zero = 0
print(five and two)
# bool(2) = True
print(two and five)
# bool(5) = True
print(five and zero)
# bool(0) = False
2 5 0
nought = 0
print(five or two)
print(two or five)
print(zero or five)
print(zero or nought)
5 2 5 0
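Short-circuit logic also means that the right-hand operand is not evaluated at all when the left-hand one already decides the result. The sketch below makes this visible with a hypothetical helper, record, that logs each value it is asked to evaluate.

```python
evaluated = []

def record(value):
    """Remember that value was evaluated, then return it unchanged."""
    evaluated.append(value)
    return value

first = record(5) or record(2)     # 5 is truthy: record(2) is never called
second = record(0) and record(7)   # 0 is falsy: record(7) is never called
print(first, second, evaluated)

# common idiom: fall back to a default when a value is empty
name = "" or "anonymous"
print(name)
```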
a = 111
if a > 0:
    if a > 1e6:
        print("large positive")
    else:
        print("positive")
elif a < 0:
    print("negative")
else:
    print("zero")
positive
i = 10
print("if elif statement")
if i > 1:
    print("i > 1")
elif i > 2:
    print("i > 2")
elif i > 3:
    print("i > 3")
print("suite of if statements")
if i > 1:
    print("i > 1")
if i > 2:
    print("i > 2")
if i > 3:
    print("i > 3")
if elif statement i > 1 suite of if statements i > 1 i > 2 i > 3
Iterables are objects whose elements can be traversed one by one. In Python many objects are iterable (lists, tuples, strings, dictionaries, ...). The for loop (see next section) iterates through iterable object(s). You can obtain an iterator from an iterable using the iter() builtin function
x = [1, 3]
ix = iter(x)
# you can then call next() until an error is raised, which indicates the end of the iterator
next(ix)
1
next(ix)
3
next(ix)
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-100-7d48257f631e> in <module>() ----> 1 next(ix) StopIteration:
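The iter()/next() pair is roughly what a for loop does behind the scenes. The following sketch spells it out with a hypothetical function, manual_for, that collects the items by hand and stops when StopIteration is raised.

```python
def manual_for(iterable):
    """Traverse an iterable the way a for loop does, using iter() and next()."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break              # the iterator is exhausted
        collected.append(item)
    return collected

print(manual_for([1, 3]))
print(manual_for("ab"))
```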
For loops are defined with the for keyword and an ending : character
for this in [1, 2, 3]:
    print(this)
1 2 3
The break keyword stops the iteration of the for/while loop. The break statement causes the program flow to exit the body of the for/while loop and resume the execution of the program at the next statement after the loop.
for x in [1, 3, 5]:
    if x>3:
        break
    print(x)
1 3
Another keyword related to loops is the continue keyword. Instead of finishing the current iteration, the continue statement skips all the remaining statements in the current iteration of the loop and moves the control back to the top of the loop.
S = 0
for x in [1, 2, None, 3, 4]:
    if x is None:
        continue # jump to next iteration
    S += x
print(S)
10
There is an optional else clause executed when the loop is over. It is rarely used, but it is worth knowing that it exists. It is not executed if the loop is interrupted by a break statement:
for x in [1,2,3]:
    print(x)
else:
    print('e')
1 2 3 e
for x in [1,2,3]:
    print(x)
    if x>1:
        break
else:
    print('e')
1 2
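A typical use of the else clause is a search: the else branch handles the case where nothing was found. This is a sketch only; first_negative is a hypothetical function written for this illustration.

```python
def first_negative(values):
    """Return the first negative value, or None if there is none."""
    for value in values:
        if value < 0:
            found = value
            break          # leaving with break skips the else clause
    else:
        found = None       # reached only when the loop was not interrupted
    return found

print(first_negative([3, -1, -5]))
print(first_negative([1, 2]))
```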
S = 0
for x in [0, 1, 2, 3]:
    S += x
S
6
For long sequences, use the range function.
S = 0
for x in range(0, 1000):
    S +=x
print(S)
499500
for this in range(0, 10,2):
    print("{} ".format(this), end="")
0 2 4 6 8
Note the usage of end argument in the print() function to prevent newline after each print() call
Although less common than the for loop in Python code, the while loop is also available
i = 0
S = 0
while S < 1000:
    S += i
    i += 1
print(i, S)
46 1035
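range and xrange functions are useful to create a sequence of numbers. In Python 2, range builds a full list whereas xrange returns a lazy sequence object that computes its values on demand, which makes it much faster to create and more memory efficient. In Python 3, xrange no longer exists and range itself is lazy.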
%%timeit
x = range(0,1000000)
100 loops, best of 3: 13.8 ms per loop
%%timeit
# PYTHON 2 only. No need in Python 3 anymore
x = xrange(0, 1000000)
1000000 loops, best of 3: 209 ns per loop
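In Python 3 the laziness of range can be observed directly by comparing the memory footprint of a range object with that of the equivalent list. A small sketch, assuming Python 3 (sys.getsizeof reports the size of the container itself, not of the integers it refers to):

```python
import sys

lazy = range(0, 100000)
materialized = list(lazy)

lazy_size = sys.getsizeof(lazy)                  # small and independent of the length
materialized_size = sys.getsizeof(materialized)  # grows with the number of items

# a range still supports len() and indexing without building the list
print(lazy_size, materialized_size, len(lazy), lazy[10])
```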
classical way to have an index while looping over a sequence:
count = 0
for x in [5, 6, 7, 8, 9]:
    print("position " + str(count) + ": value " + str(x))
    count += 1
position 0: value 5 position 1: value 6 position 2: value 7 position 3: value 8 position 4: value 9
but in python, you should use enumerate instead:
for count, x in enumerate([5, 6, 7, 8, 9]):
    print("position " + str(count) + ": value " + str(x))
position 0: value 5 position 1: value 6 position 2: value 7 position 3: value 8 position 4: value 9
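enumerate also accepts a start argument, which is convenient when positions should be counted from 1. A minimal sketch with illustrative data:

```python
bases = ['A', 'C', 'G']

# start=1 shifts the counter; the items themselves are unchanged
numbered = list(enumerate(bases, start=1))
print(numbered)
```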
How to loop the contents of 2 lists at the same time without indexing ?
We’ve seen the range function earlier. Let us use it to create an index that can in turn be used in a for loop:
x = [50, 100, 150]
y = [5, 10, 15]
for i in range(0, 3):
    print(x[i] ,y[i])
50 5 100 10 150 15
x = [50, 100, 150]
y = [5, 10, 15]
# use a distinct loop variable so that the list x is not overwritten
for i, xi in enumerate(x):
    print(xi, y[i])
50 5 100 10 150 15
No need to create an index with range() but can we do better ?
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]
for ix, iy, iz in zip(x, y, z):
    print(ix, iy, iz)
1 5 7 2 10 8 3 15 9
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]
for data in zip(x, y, z):
    print(data)
(1, 5, 7) (2, 10, 8) (3, 15, 9)
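One detail worth keeping in mind: zip stops at the shortest input. When the lists may have different lengths, itertools.zip_longest from the standard library pads the missing values instead. A short sketch with illustrative data:

```python
from itertools import zip_longest

x = [1, 2, 3]
y = [5, 10]          # one item shorter than x

truncated = list(zip(x, y))                        # stops after two pairs
padded = list(zip_longest(x, y, fillvalue=None))   # pads the short list

print(truncated)
print(padded)
```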
d = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
dna = 'AACCGGTT'
#remember how to access to keys of the dictionary ?
d.keys()
dict_keys(['G', 'T', 'A', 'C'])
# and values ?
d.values()
dict_values(['C', 'A', 'T', 'G'])
for k in d.keys():
    print(k)
G T A C
for v in d.values():
    print(v)
C A T G
for k,v in d.items():
    print(k,v)
G C T A A T C G
complement = ''
for this in dna:
    complement += d[this]
dna, complement
('AACCGGTT', 'TTGGCCAA')
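The explicit loop above can be written more compactly. The sketch below shows two possible alternatives, not the only ones: str.join with a generator expression, and str.translate with a table built by str.maketrans.

```python
d = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
dna = 'AACCGGTT'

# alternative 1: look up each base and join the results
complement_join = ''.join(d[base] for base in dna)

# alternative 2: build a translation table once, then translate the string
table = str.maketrans('ACGT', 'TGCA')
complement_translate = dna.translate(table)

print(complement_join, complement_translate)
```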
Sequence comprehension is a very important concept in Python. You will see them everywhere. Why ?
1. Faster (although not always)
2. More Pythonic (elegant code)
Let us create a list made of 100,000 random values
import random
%%timeit -n 20
X = []
for x in range(100000):
    X.append(random.random())
20 loops, best of 3: 13.8 ms per loop
%%timeit -n 20
X = [random.random() for x in range(100000)]
20 loops, best of 3: 10.6 ms per loop
import random
data = [random.random()-0.5 for x in range(100000)]
%%timeit -n 20
X = []
for this in data:
    if this > 0:
        X.append(this)
20 loops, best of 3: 6.81 ms per loop
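The same code with a list comprehension (the condition must be this > 0 to match the loop above)
%%timeit -n 20
X = [this for this in data if this > 0]
20 loops, best of 3: 2.52 ms per loop
So, according to the timings reported here, the list comprehension is faster in both cases: by roughly 25% in the first one (10.6 ms versus 13.8 ms) and by more than half in the second one (2.52 ms versus 6.81 ms). Timings vary between machines, so rerun the cells to compare.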
Starting from:
X = [[0], [1,2], [3,4], [5,6]]
get a flattened list:
X = [0, 1, 2, 3, 4, 5, 6]
X = [[0], [1,2], [3,4], [5,6]]
%%timeit
# nested loops, equivalent to the list comprehension
flatX = []
for item in X:
    for this in item:
        flatX.append(this)
1000000 loops, best of 3: 795 ns per loop
%%timeit
flatX = [this for item in X for this in item]
1000000 loops, best of 3: 586 ns per loop
Here the list comprehension (586 ns) is actually faster than the nested loops with append (795 ns)
As for lists, we can build a set with a comprehension
{expression for item in iterable}
{expression for item in iterable if condition}
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ["name", "comment", "sequence", "cut", "end"])
ecor1 = RestrictEnzyme("EcoR1", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
bamh1 = RestrictEnzyme("BamH1", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
digest = [ecor1, bamh1, hind3, sma1]
s1 = {enz.name for enz in digest}
s2 = {enz.name for enz in digest if enz.end != 'blunt'}
print(s1)
print(s2)
{'EcoR1', 'BamH1', 'HindIII', 'SmaI'} {'EcoR1', 'BamH1', 'HindIII'}
Given the following dict:
d = {1 : 'a', 2 : 'b', 3 : 'c' , 4 : 'd'}
We want to obtain a new dict with the keys and the values inverted, so that we get:
inverted_d {'a': 1, 'c': 3, 'b': 2, 'd': 4}
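The brace syntax with key: value pairs builds a dictionary in the same way as list and set comprehensions. The sketch below uses unrelated, illustrative data so that the exercise above stays open.

```python
dna = 'AACGT'

# {key_expression: value_expression for item in iterable}
base_counts = {base: dna.count(base) for base in set(dna)}

# an optional condition filters the items, as in list comprehensions
frequent = {base: n for base, n in base_counts.items() if n >= 2}

print(base_counts)
print(frequent)
```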
Use the raise keyword
# Could be handy to be able to raise an error if an argument is incorrect:
def kelvin2celsius(k):
    if k < 0:
        raise ValueError('Invalid value. should be positive')
    return k - 273.15
kelvin2celsius(-1)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-208-5a39f02da7cc> in <module>() ----> 1 kelvin2celsius(-1) <ipython-input-207-930ee5c8a687> in kelvin2celsius(k) 2 def kelvin2celsius(k): 3 if k < 0: ----> 4 raise ValueError('Invalid value. should be positive') 5 return k - 273.15 ValueError: Invalid value. should be positive
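# PYTHON 2 only: the exceptions module was removed in Python 3,
# where the built-in exceptions are available from the builtins module
import exceptions
print(dir(exceptions))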
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'EnvironmentError', 'Exception', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__doc__', '__name__', '__package__']
An exception may be raised by Python when a statement fails at run time:
x = 0
1/x
--------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) <ipython-input-225-bfc3ae6379af> in <module>() 1 x = 0 ----> 2 1/x ZeroDivisionError: division by zero
Suppose we want to catch this exception
x = 0
try:
    print(1./x)
except:
    # if an exception is raised, do something else:
    print('x should be non-zero')
x should be non-zero
We know more about the exception, we know it is an error due to a division by zero so let us be more explicit
x = 0
try:
    print(1./x)
except ZeroDivisionError:
    print('x should be non-zero')
x should be non-zero
What if the input is a string ? We could also catch this TypeError
x = "2"
try:
    print(1./x)
except TypeError:
    print("warning. wrong type. Cast to float")
    print(1./float(x))
except ZeroDivisionError:
    print('x should be non-zero')
warning. wrong type. Cast to float 0.5
what if x is set to "0" then ?
# maybe we can do something clever here such as converting the string into
# a float
x = '0'
try:
    print(1./x)
except TypeError:
    try:
        print(1./float(x))
    except ZeroDivisionError:
        print('x should be non-zero')
except ZeroDivisionError:
    print('x should be non-zero')
x should be non-zero
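The nested try blocks can be flattened by converting the input first and handling each failure in its own except clause. This is one possible arrangement rather than the only one; safe_inverse is a hypothetical function, and returning None on failure is an assumption made for this sketch.

```python
def safe_inverse(x):
    """Return 1/x as a float, or None if x cannot be inverted."""
    try:
        return 1. / float(x)
    except ZeroDivisionError:
        # x is (or converts to) zero
        return None
    except (TypeError, ValueError):
        # x cannot be converted to a float (e.g. None or "abc")
        return None

print(safe_inverse("2"), safe_inverse("0"), safe_inverse("abc"))
```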
Whether the try block succeeds or not, the finally keyword is used to add a cleanup block
try:
    f = open("temp.txt", "w")
    # something wrong here e.g. cannot open the file
    # comment/uncomment the raise
    raise IOError
except IOError as err:
    print("error raised")
    print(err)
finally:
    print("We cleanup things here")
    f.close()
error raised We cleanup things here
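For files, the with statement is a common alternative to an explicit finally block: the file is closed when the block is left, even if an exception is raised inside it. The sketch below writes to a temporary directory (an assumption made to avoid touching the working directory) and simulates a failure.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "temp.txt")

try:
    with open(path, "w") as f:
        f.write("some data")
        raise IOError("simulated failure")   # something goes wrong here
except IOError as err:
    message = str(err)

# the with block closed the file despite the exception
closed_after_error = f.closed
print(message, closed_after_error)
```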
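# minimal version: a user-defined exception must inherit from Exception
class MyException(Exception):
    pass
# version that carries a value
class MyException(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)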
try:
    raise MyException('Is this raised ? yes ')
except MyException as err:
    print('MyException occurred, value: %s' % err.value)
MyException occurred, value: Is this raised ? yes
Decorators are functions that take another function as input and usually return a wrapped version of it.
Here are two functions that do not accept x=0 as input parameter
import math
def function1(x):
    if x == 0:
        print('warning')
    else:
        return 1/x
def function2(x):
    if x <= 0:
        print('warning')
    else:
        return math.log(x)
def check_decorator(func): # func is the function to be decorated
    def wrap(*args):
        if args[0] <= 0:
            print('warning from "{0}", First arg ({1}) '
                  'is not positive !!!!'.format(func.__name__, args[0]))
        else:
            return func(*args)
    return wrap
@check_decorator
def function1(x):
    return 1/x
@check_decorator
def function2(x):
    return math.log(x)
function1(-2)
function2(0)
warning from "function1", First arg (-2) is not positive !!!! warning from "function2", First arg (0) is not positive !!!!
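A wrapper defined this way hides the name and docstring of the decorated function (its __name__ becomes 'wrap'). The standard functools.wraps decorator copies them over. The sketch below is a variant written for illustration: check_positive and inverse are hypothetical names, and returning None silently for invalid input is an assumption of this example.

```python
import functools

def check_positive(func):
    """Call func only when its first positional argument is positive."""
    @functools.wraps(func)           # keep func's name and docstring
    def wrap(*args, **kwargs):
        if args[0] <= 0:
            return None
        return func(*args, **kwargs)
    return wrap

@check_positive
def inverse(x):
    """Return 1/x."""
    return 1 / x

print(inverse.__name__, inverse(2), inverse(0))
```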
A generator is a function that contains the yield keyword.
Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
The range() function iterates from a low to a high value.
Let us write a new range function that is symmetric: it goes from low up to high, and then back down to the low value again
# Without generator:
def symrange(low, high):
    a = list(range(low, high))
    b = list(range(high-2, low-1, -1))
    a.extend(b)
    return a
for a in symrange(0,5):
    print("**" * a)
** **** ****** ******** ****** **** **
What would happen if the symrange were made of millions of points? A lot of memory would be used for nothing really special. Here comes the generator:
def symrange(low, high):
    for a in range(low, high):
        yield a
    for b in range(high-2, low-1, -1):
        yield b
a = symrange(0,3)
a
<generator object symrange at 0x7fa944116888>
for x in a:
    print(x)
0 1 2 1 0
# once looped over, the generator is exhausted and next() raises StopIteration
next(a)
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-274-ee7ee0698186> in <module>() 1 # once looped over, the generator is exhausted and next() raises StopIteration ----> 2 next(a) StopIteration:
import random
letters = ['A', 'C', 'T', 'G']
seq = [letters[random.randint(0, 3)] for i in range(0, 100000)]
%%timeit -n 20
sum([1 for x in seq if x != 'T'])
20 loops, best of 3: 4.16 ms per loop
%%timeit -n 20
values_gen = (1 for x in seq if x!='T')
sum(values_gen)
20 loops, best of 3: 6.04 ms per loop
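As the timings suggest, a generator expression is not necessarily faster than a list comprehension; its benefit is memory. The sketch below compares the container sizes with sys.getsizeof on a deterministic sequence (an assumption made so the result is reproducible, instead of the random one used in the timed cells).

```python
import sys

seq = ['A', 'C', 'T', 'G'] * 25000      # 100,000 letters

as_list = [1 for x in seq if x != 'T']  # all values stored at once
as_gen = (1 for x in seq if x != 'T')   # values produced one at a time

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

total_from_list = sum(as_list)
total_from_gen = sum(as_gen)            # consumes the generator

print(list_size, gen_size, total_from_list, total_from_gen)
```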
import urllib.request
req = urllib.request.urlopen('http://www.uniprot.org/uniprot/P56945.fasta')
data = req.read().decode()
print(data)
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2 MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP SAAQDMVERVKELGHSTQQFRRVLGQLAAA
header, sequence = data.split("\n", 1) # use argument 1 to split only once (the header)
# extract the sequence, removing the newline characters
sequence = sequence.replace("\n", "")
# count the characters and store them
counter = {}
for x in set(sequence):
    counter[x] = sequence.count(x)
print(header)
keys = sorted(counter.keys())
print(" ".join(keys))
print(" ".join([str(counter[x]) for x in keys]))
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2 A C D E F G H I K L M N P Q R S T V W Y 15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28
from collections import Counter
counter2 = Counter(sequence)
print(header)
# reuse the sorted keys so that letters and counts line up
print(" ".join(keys))
print(" ".join([str(counter2[k]) for k in keys]))
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2 A C D E F G H I K L M N P Q R S T V W Y 15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28
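Counter objects also offer most_common, which returns the items sorted by decreasing count. A short sketch on an illustrative string rather than the downloaded sequence:

```python
from collections import Counter

counts = Counter("AACCGGTTA")

top = counts.most_common(1)    # list of (item, count) pairs
total = sum(counts.values())   # total number of letters counted

print(top, total)
```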
Lots of Python keywords were introduced in this notebook:
With for loops, think about enumerate and zip to avoid counters
List comprehensions are very common and can replace a combination of loop and condition
Generators are very powerful for large data sets to save memory
If you do not want to use decorators, that's fine, but if you are serious about programming in Python, think about it, especially if you want to program with an object-oriented approach.