Introduction to Python Programming TC, BN, JBM, AZ
Institut Pasteur, Paris, 20-31 March 2017

Keywords and built-in functions¶

• import module related: as, from , import
• closing files automatically: with
• deleting a variable/object: del
• check existence of item in a sequence: in, not in

Built-in functions already mentioned¶

• print, type, dir
• float, int, complex, bool
• list, tuple, set, dict

To be used in this notebook:

• function: def, pass, return
• logic: and, is, or, not
• loop: for, break, continue, while
• condition: if, elif, else
• exception: assert, try, except, finally, raise
• generator: yield
• class: class

Differences between builtin functions and keywords¶

In [7]:
float = 1

In [86]:
float

Out[86]:
1
In [3]:
# if we delete the variable, we do get back the original built-in function !
del float
float(1.)

Out[3]:
1.0
In [87]:
# keywords cannot be re-used
del = 1

  File "<ipython-input-87-e5d9a8bc7b5b>", line 2
del = 1
^
SyntaxError: invalid syntax

In [7]:
### List of builtin functions
import builtins
print(dir(builtins)[79:])

['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'dreload', 'enumerate', 'eval', 'exec', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']


We have already used a few functions (builtin functions). Let us now write our own functions, to be used for repetitive tasks.

Declaration and Caller¶

declaration¶

In [15]:
def hello():
    # a comment
    print("hello")

• Note the def keyword and the colon at the end of the function declaration
• Note the block indentation (4 spaces). Python knows the function has ended when the indentation returns to the def level.
• This function has no input parameter and returns None

caller¶

In [9]:
# call the function (name followed by empty parentheses)
hello()

hello


All functions return something (None by default), so the result can be assigned to a variable:

In [115]:
result = hello()

hello

In [119]:
result is None

Out[119]:
True

Docstrings: the Python documentation¶

Most functions and classes are documented with docstrings: a string in triple quotes placed right after the declaration.

In [22]:
import math

def my_function(x, verbose=True):
    """A main description

    Followed by details about e.g., the arguments

    :param x: the input value (must be positive)
    :param verbose: if True, print a warning when x is negative
    :return: nothing

    Example:

        my_function(10)
    """
    if x < 0:
        if verbose:
            print('This is a negative number !! ')
    else:
        print(math.sqrt(x))



Positional arguments¶

In [29]:
def compute_gc_content(sequence):
    GC = sequence.count('G') + sequence.count('C')
    GC /= len(sequence)
    GC *= 100
    return GC

In [30]:
compute_gc_content("ACGTACGTGCGCT")

Out[30]:
61.53846153846154
In [75]:
# Several input arguments
def polynomial(x, exp):
    return x**exp - x + 1

polynomial(2,8)

Out[75]:
255

These are called positional arguments because the order in which they are given matters.

Keyword arguments¶

Arguments may have default values. Such arguments are optional when calling the function, and they can be passed either by position or by name (keyword).

In [35]:
def power_kw(x, exp=2):
    return x**exp

power_kw(2)

Out[35]:
4
In [36]:
# Here is one way to call the function
power_kw(2, 3)

Out[36]:
8
In [76]:
# but you can also name the optional argument, which is more
# robust (if your API changes later on)
power_kw(2, exp=3)

Out[76]:
8
It's a really **bad** idea to use a mutable object as the default value of a keyword argument. Do this only if you know exactly what you are doing.
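To illustrate why a mutable default is risky, here is a minimal sketch. The function names append_shared and append_fresh are hypothetical and only serve the example. The default list is built once, when the function is defined, so every call that omits the argument reuses the same object.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time,
    # so all calls without `bucket` share the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is used as a sentinel; a new list is created on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
print(first is second, second)   # the same list has been modified twice

third = append_fresh(1)
fourth = append_fresh(2)
print(third, fourth)             # two independent lists
```

Using None as the default and creating the list inside the function is the usual way to avoid the shared state.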

Keyword arguments order¶

In [24]:
def order(x, y=2, z=3):
    print(x, y, z)

In [25]:
order(1, 2, 3)
order(1, 2)          # same as above
order(1)             # same as above
order(1, y=2, z=3)   # same as above
order(1, z=3, y=2)   # same as above
order(1, y=2)        # same as above
order(1, z=3)        # same as above

1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3


You cannot skip an argument: order(1, ,3) is not valid. Without naming keyword arguments, you must provide them in order and without gaps; to set z alone, name it, as in order(1, z=3).

All positional parameters must be placed to the left of the keyword parameters. This definition is not correct:

In [1]:
def power_does_not_work(y=2, x):
    return x**y

  File "<ipython-input-1-7d30c6e4fbba>", line 1
def power_does_not_work(y=2, x):
^
SyntaxError: non-default argument follows default argument


Arbitrary number of positional arguments¶

Sometimes you do not know beforehand how many arguments are required

In [55]:
def func(*args):
    print(args[0])
    return sum(args)

func(1,2,3,4,5,6)

1

Out[55]:
21

The *args syntax collects all extra positional arguments into a tuple. Python first matches the normal positional arguments from left to right, then places any remaining positional arguments in a tuple (*args) that the function can use.

In [56]:
def extra_sum(items, *extra_items):
    return sum(items) + sum(extra_items)

extra_sum([1,2,3], 4,5,6)

Out[56]:
21

Arbitrary number of keyword arguments¶

Similarly to the positional arguments, you can specify an arbitrary number of keyword arguments by using the following syntax (combined here with the arbitrary number of positional arguments introduced in the previous section):

• args and kwargs are conventional names; you can use any other name
• args is a tuple
• kwargs is a dictionary
In [27]:
def funckw(pos_params, *args, **kwargs):
    print(args)
    print(kwargs)

In [28]:
funckw(1000, 2, 3, 4,5, test1=1, test2=2, test3=3)

(2, 3, 4, 5)
{'test2': 2, 'test3': 3, 'test1': 1}


You may want to pass a list/tuple of values as arguments:

In [66]:
t = ('a',2)
funckw(1, t)  # does not work as expected: the function got one item (the tuple)

(('a', 2),)
{}


To pass each item as a separate argument, use the * syntax, as in the function definition:

In [67]:
funckw(1, *t)

('a', 2)
{}
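The same unpacking idea applies to keyword arguments with the ** syntax. The sketch below uses a hypothetical function named collect, which returns its arguments instead of printing them so that the result can be inspected; the dictionary options is also made up for the example.

```python
def collect(pos_params, *args, **kwargs):
    # Return what was received so that it can be inspected.
    return pos_params, args, kwargs


t = ('a', 2)
options = {'test1': 1, 'test2': 2}

# *t spreads the tuple over positional arguments,
# **options spreads the dictionary over keyword arguments.
pos, args, kwargs = collect(1000, *t, **options)
print(pos, args, kwargs)
```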


One more example on mutable argument (advanced)¶

Remember that

1. a list is a mutable sequence
2. objects are passed by reference
In [155]:
def reference(mutable):
    mutable[0] = 1

mylist = [10, 2, 3]
reference(mylist)

mylist  # changed !! since mylist is a mutable list

Out[155]:
[1, 2, 3]

If you do not want the sequence to be changed, pass a copy

In [156]:
mylist2 = [10,2,3]
reference(mylist2[:])
mylist2  # unchanged since we passed a copy of mylist2 using [:]

Out[156]:
[10, 2, 3]

Functions practical

1. How many positional and keyword arguments are there in the following function declaration?
def myfunc(a, b, c, d=1, e=2):
    return a+b+c+d+e

What is the output? (Guess it, do not run the function.)
myfunc(1,2,3)
myfunc(1,2,3, e=4)

2. ADVANCED: Recursive function. Write a function (recursive or not) that returns the Fibonacci sequence, which is defined by
• u[0] = 1
• u[1] = 1
• u[n+2] = u[n+1] + u[n]
3. Write a function that fetches a UniProt sequence.
4. Write another function. Given a string sequence as an argument, the function should return a dictionary with the count of each letter (see first notebook).
5. Finally, using the two previous functions, create a third one that takes a UniProt ID as input and returns a dictionary with the count of each letter in the sequence corresponding to that ID.

6. Write a function that takes a list of numbers as argument and returns the mean of the list. First, use only pure Python code without any import. Then use the statistics module. Once done, rename your function and call it statistics. Adapt your code so that it still works.

Function inside function¶

In [89]:
import math

def mylog(x):
    def check(x):
        if x<=0:
            print('warning x is negative')
            return False
        return True

    if check(x):
        return math.log(x)
    else:
        return math.nan

res = mylog(-1)

warning x is negative

In [34]:
check(1)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-34-08d9533016ef> in <module>()
----> 1 check(1)

NameError: name 'check' is not defined

The pass keyword¶

The pass keyword is a no-operation statement: it does nothing. It is handy within a function definition or control flow, because Python requires at least one statement in a block, even when the block has nothing to do.

In [161]:
# useful for function
def func_to_implement_one_day():
    # TODO
    pass


Logical operations¶

the identity operator¶

The is operator is a binary operator that returns True if its left-hand object reference is referring to the same object as its right-hand object reference. Note that it usually does not make sense to use is for comparing ints, strs, and most other data types since we almost invariably want to compare their values.

To invert the identity test, we use is not.

In [6]:
a = "Something"
b = None
a is not None, b is None

Out[6]:
(True, True)
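To make the difference between identity and equality concrete, here is a short sketch (the variable names are only illustrative). Two lists with the same content are equal but are distinct objects, whereas a second name bound to the same list refers to the same object.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # same content, but a different object
c = a           # another name for the very same object

same_value = (a == b)    # compares the contents
same_object = (a is b)   # compares the identities
alias = (c is a)

print(same_value, same_object, alias)
```

This is why is is mostly reserved for comparisons with None, while == is used to compare values.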

the comparison operators¶

Python provides the standard set of binary comparison operators, with the expected semantics: < less than, <= less than or equal to, == equal to, != not equal to, >= greater than or equal to, and > greater than.

In [3]:
a = 2
b = 6
a == b

Out[3]:
False
In [ ]:
a = 2
b = 6
a < b

In [4]:
a = 2
b = 6
a <= b, a != b, a >= b, a > b

Out[4]:
(True, True, False, False)
In [5]:
0 < a < b

Out[5]:
True
We can also compare strings.

In [7]:
"three" < "four"

Out[7]:
False
In [8]:
"three" < 4

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-f2a612e11247> in <module>()
----> 1 "three" < 4

TypeError: unorderable types: str() < int()

membership operator (in)¶

See first notebook

logical operators¶

Python provides three logical operators: and, or, and not. Both and and or use short-circuit logic and return the operand that determined the result; they do not return a Boolean (unless they actually have Boolean operands). Let’s see what this means in practice:

In [9]:
five = 5
two = 2
zero = 0
print(five and two)
# bool(2) = True
print(two and five)
# bool(5) = True
print(five and zero)
# bool(0) = False

2
5
0

In [10]:
nought = 0
print(five or two)
print(two or five)
print(zero or five)
print(zero or nought)

5
2
5
0
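Because and and or return one of their operands, they are sometimes used to provide a fallback value or to guard an operation. The two helper functions below (display_name and safe_ratio) are hypothetical and only sketch this usage.

```python
def display_name(name):
    # `or` returns the left operand if it is truthy, otherwise the right one.
    return name or "anonymous"


def safe_ratio(numerator, denominator):
    # `and` stops at the first falsy operand: when denominator is 0,
    # it is returned as is and the division is never evaluated.
    return denominator and numerator / denominator


print(display_name("Ada"), display_name(""))
print(safe_ratio(1, 2), safe_ratio(1, 0))
```

Such shortcuts are compact, but an explicit if statement is often easier to read.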

Control flows

Condition (if/else/elif)¶

In [90]:
a = 111
if a > 0:
    if a > 1e6:
        print("large positive")
    else:
        print("positive")
elif a < 0:
    print("negative")
else:
    print("zero")

positive

In [38]:
i = 10

print("if elif statement")
if i > 1:
    print("i > 1")
elif i > 2:
    print("i > 2")
elif i > 3:
    print("i > 3")

print("suite of if statements")
if i > 1:
    print("i > 1")
if i > 2:
    print("i > 2")
if i > 3:
    print("i > 3")

if elif statement
i > 1
suite of if statements
i > 1
i > 2
i > 3


Iterators¶

Iterables are objects whose elements can be traversed one by one. In Python many objects are iterable (lists, tuples, strings, dictionaries, ...). The for loop (see next section) iterates over an iterable object. You can obtain an iterator from an iterable with the iter() builtin function, and then retrieve its elements one at a time with next().

In [97]:
x = [1, 3]
ix = iter(x)

# you can then call next() until an error is raised, which indicates the end of the iterator
next(ix)

Out[97]:
1
In [99]:
next(ix)

Out[99]:
3
In [100]:
next(ix)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-100-7d48257f631e> in <module>()
----> 1 next(ix)

StopIteration: 
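The calls to iter() and next() shown above are roughly what a for loop does behind the scenes. The sketch below mimics a for loop by hand; the function name manual_for is hypothetical, and the try/except syntax it relies on is covered in the exceptions section.

```python
def manual_for(iterable):
    # Collect the items of an iterable the way a for loop would visit them.
    seen = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # The iterator is exhausted: leave the loop.
            break
        seen.append(item)
    return seen


print(manual_for([1, 3]))
print(manual_for("ab"))
```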

for loops¶

For loops are defined with the for keyword and an ending : character

In [91]:
for this in [1, 2, 3]:
    print(this)

1
2
3


break¶

The break keyword stops the iteration of the for/while loop. The break statement causes the program flow to exit the body of the for/while loop and resume the execution of the program at the next statement after the loop.

In [173]:
for x in [1, 3, 5]:
    if x>3:
        break
    print(x)


1
3


continue¶

Another keyword related to loops is continue. Instead of finishing the current iteration, the continue statement skips all the remaining statements in the current iteration of the loop and moves the control back to the top of the loop.

In [126]:
S = 0
for x in [1, 2, None, 3, 4]:
    if x is None:
        continue   # skip the missing value
    S += x
print(S)


10


else¶

There is an optional else clause, executed when the loop is over. It is rarely used but worth knowing about. It is not executed if the loop is interrupted by a break statement:

In [103]:
for x in [1,2,3]:
    print(x)
else:
    print('e')

1
2
3
e

In [104]:
for x in [1,2,3]:
    print(x)
    if x>1:
        break
else:
    print('e')

1
2
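A typical use of the else clause is a search loop: the else block runs only when nothing was found, that is, when the loop ended without a break. The function name find_first_negative below is hypothetical and only illustrates the pattern.

```python
def find_first_negative(values):
    for value in values:
        if value < 0:
            found = value
            break          # leaving with break skips the else block
    else:
        # Reached only if the loop ran to completion without a break.
        found = None
    return found


print(find_first_negative([3, -1, -5]))
print(find_first_negative([3, 1, 5]))
```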


The range builtin function¶

In [105]:
S = 0
for x in [0, 1, 2, 3]:
    S += x
S

Out[105]:
6

For long sequences, use the range function.

In [106]:
S = 0
for x in range(0, 1000):
    S += x
print(S)

499500

In [127]:
for this in range(0, 10, 2):
    print("{} ".format(this), end="")

0 2 4 6 8

Note the use of the end argument in the print() function to prevent a newline after each print() call

while¶

Although less common than the for loop in Python code, the while loop is also available

In [129]:
i = 0
S = 0
while S < 1000:
    S += i
    i += 1
print(i, S)

46 1035


range/xrange builtin function (Python2/3 related)¶

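range and xrange are useful to create a sequence of numbers. In Python 2, range builds a full list whereas xrange returns a lazy object that produces the numbers on demand, which makes it faster to create and more memory efficient. In Python 3, range behaves like xrange, and xrange no longer exists. The timings below were obtained with Python 2.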

In [180]:
%%timeit
x = range(0,1000000)

100 loops, best of 3: 13.8 ms per loop

In [52]:
%%timeit
# PYTHON 2 only. No need in Python 3 anymore
x = xrange(0, 1000000)

1000000 loops, best of 3: 209 ns per loop
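In Python 3, the laziness of range can be checked directly. The sketch below, which assumes CPython and Python 3, compares the size of a range object with the size of the equivalent list using sys.getsizeof (which reports the size of the container only, not of the integers it refers to).

```python
import sys

numbers = range(0, 1000000)   # lazy: no list is built here
as_list = list(numbers)       # the full list, built explicitly

lazy_size = sys.getsizeof(numbers)
list_size = sys.getsizeof(as_list)

# A range still supports len(), indexing and membership tests.
print(len(numbers), numbers[10], 999999 in numbers)
print(lazy_size < list_size)
```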


enumerate¶

Classical way to have an index while looping over a sequence:

In [131]:
count = 0
for x in [5, 6, 7, 8, 9]:
    print("position " + str(count) + ": value " + str(x))
    count += 1

position 0: value 5
position 1: value 6
position 2: value 7
position 3: value 8
position 4: value 9


But in Python, you should use enumerate instead:

In [130]:
for count, x in enumerate([5, 6, 7, 8, 9]):
    print("position " + str(count) + ": value " + str(x))

position 0: value 5
position 1: value 6
position 2: value 7
position 3: value 8
position 4: value 9


zip¶

How can we loop over the contents of 2 lists at the same time without indexing ?

Naive way¶

We’ve seen the range function earlier. Let us use it to create an index that can in turn be used in a for loop:

In [138]:
x = [50, 100, 150]
y = [5, 10, 15]

for i in range(0, 3):
    print(x[i], y[i])

50 5
100 10
150 15


Better way ?¶

In [139]:
x = [50, 100, 150]
y = [5, 10, 15]
for i, xi in enumerate(x):   # xi avoids overwriting the list x
    print(xi, y[i])

50 5
100 10
150 15


No need to create an index with range(), but can we do better ?

The zip approach¶

In [140]:
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]

for ix, iy, iz in zip(x, y, z):
    print(ix, iy, iz)

1 5 7
2 10 8
3 15 9

In [39]:
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]

for data in zip(x, y, z):
    print(data)

(1, 5, 7)
(2, 10, 8)
(3, 15, 9)
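Two properties of zip are worth knowing: it stops as soon as the shortest input is exhausted, and its pairs can be passed to dict() to build a dictionary. The short sketch below illustrates both, with variable names chosen for the example.

```python
x = [1, 2, 3]
y = [5, 10]          # shorter than x on purpose

# zip stops at the end of the shortest sequence: 3 is silently dropped.
pairs = list(zip(x, y))
print(pairs)

# Pairs of (key, value) can be turned into a dictionary.
bases = ['A', 'C', 'G', 'T']
partners = ['T', 'G', 'C', 'A']
pairing = dict(zip(bases, partners))
print(pairing)
```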


dictionaries are iterable too !¶

Reminder¶

In [41]:
d = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
dna = 'AACCGGTT'

In [42]:
#remember how to access to keys of the dictionary ?
d.keys()

Out[42]:
dict_keys(['G', 'T', 'A', 'C'])
In [43]:
# and values ?
d.values()

Out[43]:
dict_values(['C', 'A', 'T', 'G'])

Iterators¶

In [44]:
for k in d.keys():
    print(k)

G
T
A
C

In [47]:
for v in d.values():
    print(v)

C
A
T
G

In [48]:
for k,v in d.items():
    print(k,v)

G C
T A
A T
C G

In [49]:
complement = ''
for this in dna:
    complement += d[this]
dna, complement

Out[49]:
('AACCGGTT', 'TTGGCCAA')
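The same complement can be written more compactly by joining the looked-up letters, which avoids building the string piece by piece. This is only a sketch of an alternative; the name reverse_complement is introduced for illustration.

```python
d = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
dna = 'AACCGGTT'

# Look up each base and join the results into one string.
complement = "".join(d[base] for base in dna)

# Reversing the complement with a slice gives the reverse complement.
reverse_complement = complement[::-1]

print(complement, reverse_complement)
```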

List comprehensions¶

Comprehensions are a very important concept in Python. You will see them everywhere. Why ?
1. Faster (although not always)
2. More Pythonic (elegant code)

Let us create a list made of 100,000 random values

In [60]:
%%timeit -n 20
# assumes the random module was imported beforehand (import random)
X = []
for x in range(100000):
    X.append(random.random())

20 loops, best of 3: 13.8 ms per loop

In [61]:
%%timeit -n 20
X = [random.random() for x in range(100000)]

20 loops, best of 3: 10.6 ms per loop


List comprehension with condition¶

In [5]:
import random
data = [random.random()-0.5 for x in range(100000)]

In [8]:
%%timeit -n 20
X = []
for this in data:
    if this > 0:
        X.append(this)

20 loops, best of 3: 6.81 ms per loop


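The same code with list comprehension

In [10]:
%%timeit -n 20
X = [this for this in data if this > 0]   # same condition as the loop above

20 loops, best of 3: 2.52 ms per loop


In both cases the list comprehension is faster in these measurements (about 25% in the first case: 10.6 ms versus 13.8 ms). Exact timings depend on the machine and on the run.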

Application: flatten a list¶

Starting from:

X = [[0], [1,2], [3,4], [5,6]]


get a flattened list

X = [0, 1, 2, 3, 4, 5, 6]

In [71]:
X = [[0], [1,2], [3,4], [5,6]]

In [81]:
%%timeit
# nested loops, equivalent to the list comprehension below
flatX = []
for item in X:
    for this in item:
        flatX.append(this)

1000000 loops, best of 3: 795 ns per loop

In [85]:
%%timeit
flatX = [this for item in X for this in item]

1000000 loops, best of 3: 586 ns per loop


Here too, the list comprehension is faster than the nested for loops (586 ns versus 795 ns).

set comprehension¶

As for lists, we can build a set with a comprehension

{expression for item in iterable}
{expression for item in iterable if condition}

In [12]:
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ["name", "comment", "sequence", "cut", "end"])
ecor1 = RestrictEnzyme("EcoR1", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
bamh1 = RestrictEnzyme("BamH1", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
hind3 =  RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
sma1 =  RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
digest = [ecor1, bamh1, hind3, sma1]

s1 = {enz.name for enz in digest}
s2 = {enz.name for enz in digest if enz.end != 'blunt'}

print(s1)
print(s2)

{'EcoR1', 'BamH1', 'HindIII', 'SmaI'}
{'EcoR1', 'BamH1', 'HindIII'}


list, dict, set Comprehension manipulation¶

Given the following dict:

d = {1 : 'a', 2 : 'b', 3 : 'c' , 4 : 'd'}


We want to obtain a new dict with the keys and the values inverted:

inverted_d = {'a': 1, 'c': 3, 'b': 2, 'd': 4}
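One possible solution, given here as a sketch, uses a dict comprehension. The syntax mirrors the set comprehension, with a key: value pair as the expression. It assumes that the values are unique and hashable; otherwise some entries would be lost or an error raised.

```python
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

# Swap each (key, value) pair; values must be unique and hashable.
inverted_d = {value: key for key, value in d.items()}

print(inverted_d)
```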


Exceptions¶

Use the raise keyword to signal an error

In [207]:
# Could be handy to be able to raise an error if an argument is incorrect:
def kelvin2celsius(k):
    if k < 0:
        raise ValueError('Invalid value. should be positive')
    return k - 273.15

In [208]:
kelvin2celsius(-1)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-208-5a39f02da7cc> in <module>()
----> 1 kelvin2celsius(-1)

<ipython-input-207-930ee5c8a687> in kelvin2celsius(k)
2 def kelvin2celsius(k):
3     if k < 0:
----> 4         raise ValueError('Invalid value. should be positive')
5     return k - 273.15

ValueError: Invalid value. should be positive

Lots of standard exceptions¶

In [289]:
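# Python 2 only: the exceptions module no longer exists in Python 3,
# where the same classes are available from the builtins module
import exceptions
print(dir(exceptions))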

['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'EnvironmentError', 'Exception', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__doc__', '__name__', '__package__']


try/except/finally¶

An exception may be raised by Python when a statement fails at run time:

In [225]:
x = 0
1/x

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-225-bfc3ae6379af> in <module>()
1 x = 0
----> 2 1/x

ZeroDivisionError: division by zero

Suppose we want to catch this exception:

In [226]:
x = 0
try:
    print(1./x)
except:
    # if an exception is raised, do something else:
    print('x should be non-zero')

x should be non-zero


We know more about the exception: it is an error due to a division by zero, so let us be more explicit

In [227]:
x = 0
try:
    print(1./x)
except ZeroDivisionError:
    print('x should be non-zero')


x should be non-zero


What if the input is a string ? We could also catch this TypeError

In [228]:
x = "2"
try:
    print(1./x)
except TypeError:
    print("warning. wrong type. Cast to float")
    print(1./float(x))
except ZeroDivisionError:
    print('x should be non-zero')


warning. wrong type. Cast to float
0.5


What if x is set to "0" then ?

In [231]:
# maybe we can do something clever here such as converting the string into
# a float
x = '0'
try:
    print(1./x)
except TypeError:
    try:
        print(1./float(x))
    except ZeroDivisionError:
        print('x should be non-zero')
except ZeroDivisionError:
    print('x should be non-zero')

x should be non-zero


Whether the try block succeeds or not, the finally keyword is used to add a cleanup block that is always executed

In [239]:
try:
    f = open("temp.txt", "w")
    # something wrong here e.g. cannot open the file
    # comment/uncomment the raise
    raise IOError
except IOError as err:
    print("error raised")
    print(err)
finally:
    print("We cleanup things here")
    f.close()

error raised

We cleanup things here
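For files, the with keyword listed at the top of this notebook gives the same cleanup guarantee without an explicit finally block. The sketch below is one way to write it: it uses a temporary directory so that nothing is left on disk, and the file name and the error message are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temp.txt")   # hypothetical file name
    try:
        with open(path, "w") as handle:
            handle.write("some data\n")
            # simulate a failure while the file is open
            raise IOError("simulated failure")
    except IOError as err:
        message = str(err)
    # The with block closed the file, even though an error was raised.
    closed_after_error = handle.closed

print(message, closed_after_error)
```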


In [ ]:
# user-defined exceptions should derive from Exception
class MyException(Exception):
    pass

In [246]:
class MyException(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

try:
    raise MyException('Is this raised ? yes ')
except MyException as err:
    print('MyException occurred, value: %s' %  err.value)

MyException occurred, value: Is this raised ? yes


Decorator¶

Decorators are functions that take another function as input and return a new function.

Here are two functions that do not accept x=0 as input parameter

In [258]:
import math

def function1(x):
    if x == 0:
        print('warning')
    else:
        return 1/x

def function2(x):
    if x <= 0:   # log is not defined for 0 either
        print('warning')
    else:
        return math.log(x)


Definition¶

In [270]:
def check_decorator(func):   # func is the function to be decorated
    def wrap(*args):
        if args[0] <= 0:
            print('warning from "{0}", First arg ({1}) '
                  'is negative !!!!'.format(func.__name__, args[0]))
        else:
            return func(*args)
    return wrap


call¶

In [271]:
@check_decorator
def function1(x):
    return 1/x

@check_decorator
def function2(x):
    return math.log(x)

In [272]:
function1(-2)
function2(0)

warning from "function1", First arg (-2) is negative !!!!
warning from "function2", First arg (0) is negative !!!!

Decorators are very useful and simplify code (fewer errors).

However, the syntax may be difficult especially for decorating methods, classes, with / without arguments.

Keep also in mind that they may have a non-negligible computational cost
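One practical detail: a decorated function is replaced by the wrapper, so its name and docstring are lost unless they are copied over. The standard library provides functools.wraps for this. The sketch below is a variant of the decorator above; the names check_positive and safe_log are hypothetical.

```python
import functools
import math


def check_positive(func):
    @functools.wraps(func)   # copy the name and docstring of func onto wrap
    def wrap(*args, **kwargs):
        if args[0] <= 0:
            print('warning from "{0}": first argument ({1}) is not positive'
                  .format(func.__name__, args[0]))
            return None
        return func(*args, **kwargs)
    return wrap


@check_positive
def safe_log(x):
    """Natural logarithm of a positive number."""
    return math.log(x)


print(safe_log.__name__, safe_log(1), safe_log(0))
```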

Generator¶

A generator is a function that contains the yield keyword

Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.

The range() function iterates from a low to a high value.

Let us write a new range function that is symmetric: it goes from low to high, and then back down to the low value

In [105]:
# Without generator:
def symrange(low, high):
    a = list(range(low, high))
    b = list(range(high-2, low-1, -1))
    a.extend(b)
    return a

for a in symrange(0,5):
    print("**" * a)

**
****
******
********
******
****
**



What would happen if symrange had to produce millions of points ? The whole list would be built in memory although only one value is needed at a time. Here comes the generator:

In [106]:
def symrange(low, high):
    for a in range(low, high):
        yield a
    for b in range(high-2, low-1, -1):
        yield b

a = symrange(0,3)
a

Out[106]:
<generator object symrange at 0x7fa944116888>
In [107]:
for x in a:
    print(x)

0
1
2
1
0

In [274]:
# once exhausted, the generator raises StopIteration
next(a)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-274-ee7ee0698186> in <module>()
1 # once exhausted, the generator raises StopIteration
----> 2 next(a)

StopIteration: 

List comprehension versus generators¶

In [127]:
import random
letters = ['A', 'C', 'T', 'G']
seq = [letters[random.randint(0, 3)] for i in range(0, 100000)]

In [128]:
%%timeit -n 20
sum([1 for x in seq if x != 'T'])

20 loops, best of 3: 4.16 ms per loop

In [129]:
%%timeit -n 20
values_gen = (1 for x in seq if x!='T')
sum(values_gen)

20 loops, best of 3: 6.04 ms per loop

Generators are not necessarily faster but less memory is used !
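The memory difference can be observed with sys.getsizeof, as in the sketch below (assuming CPython; the reported size covers the container itself, not the integers it refers to). It also shows that a generator can be consumed only once.

```python
import sys

n = 100000
as_list = [i * i for i in range(n)]   # all values stored at once
as_gen = (i * i for i in range(n))    # values produced on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

total_list = sum(as_list)
total_gen = sum(as_gen)      # consumes the generator
leftover = sum(as_gen)       # nothing is left the second time

print(gen_size < list_size, total_list == total_gen, leftover)
```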

In [130]:
import urllib.request
req = urllib.request.urlopen('http://www.uniprot.org/uniprot/P56945.fasta')
# read the response body and decode the bytes into a string
data = req.read().decode("utf-8")

In [131]:
print(data)

>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
SAAQDMVERVKELGHSTQQFRRVLGQLAAA


In [132]:
header, sequence = data.split("\n", 1)  # use argument 1 to split only once (the header)

In [74]:
# extract the sequence, removing the newline characters
sequence = sequence.replace("\n", "")

In [133]:
# counts characters and store them
counter = {}
for x in set(sequence):
    counter[x] = sequence.count(x)

In [134]:
print(header)
keys = sorted(counter.keys())
print(" ".join(keys))
print(" ".join([str(counter[x]) for x in keys]))

>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2

A C D E F G H I K L M N P Q R S T V W Y
15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28


There is always something in Python to do basic tasks¶

In [135]:
from collections import Counter

In [136]:
counter2 = Counter(sequence)
print(" ".join(counter2.keys()))
print(" ".join([str(counter2[k]) for k in keys]))

>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
L T N P F R K D E C V H A
Y S Q M I W G
15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28


Summary¶

Lots of Python keywords introduced in this notebook:

• Boolean logic:
  • boolean type: True/False
  • logic: and, is, or, not
• Control flows:
  • loop: for, break, continue, while
  • condition: if, elif, else
• Exceptions:
  • assert, try, except, finally, raise
• Generators:
  • yield
• Functions:
  • def, pass, return
  • By default, functions return None
  • Any number of arguments

With for loops, think about enumerate and zip to avoid counters

List comprehensions are very common and can replace a combination of loop and condition

Generators are very powerful for large data sets to save memory

If you do not want to use decorators, that's fine, but if you are serious about programming in Python, think about them, especially if you want to use an object-oriented approach.