Introduction to Python Programming TC, BN, JBM, AZ
Institut Pasteur, Paris, 20-31 March 2017

keywords already mentionned

  • import module related: as, from , import
  • closing files automatically: with
  • deleting a variable/object: del
  • check existence of item in a sequence: in, not

Built-in functions already mentionned

  • print, type, dir
  • float, int, complex, bool
  • list, tuple, set, dict

To be used in this notebook:

  • function: def, pass, return
  • logic: and, is, or, not
  • loop: for, break, continue, while
  • condition: if, elif, else
  • exception: assert, try, except, finally, raise
  • generator: yield
  • class: class

Differences between builtin functions and keywords

In [7]:
float = 1
In [86]:
float
Out[86]:
1
In [3]:
# if we delete the variable, we do get back the original built-in function !
del float
float(1.)
Out[3]:
1.0
In [87]:
# keywords cannot be re-used
del = 1
  File "<ipython-input-87-e5d9a8bc7b5b>", line 2
    del = 1
        ^
SyntaxError: invalid syntax
In [7]:
### List of builtin functions
import builtins
print(dir(builtins)[79:])
['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'dreload', 'enumerate', 'eval', 'exec', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

All about functions

We have already used a few functions (builtin functions). Let us now write our own functions, to be used for repetitive tasks.

Declaration and Caller

declaration

In [15]:
def hello():
    # a comment
    print("hello")
  • Note the def keyword and colon at the end of the function declaration
  • note the block indentation (4 spaces). Python knows it is the end of the function when the indentation is back at the def level.
  • This function has no input parameter and returns None

caller

In [9]:
# call the function (name followed by empty parenthesis) 
hello()
hello

All functions return something (default is None). So the output can be redirected into a variable

In [115]:
result = hello()
hello
In [119]:
result is None
Out[119]:
True

Docstrings: the Python documentation

Most functions and classes are documented with docstrings, that is a string in triple quotes

In [22]:
def my_function(x, verbose=True):
    """A main description
    
    Following by details about e.g., the arguments
    
    :param x: the input value (must be positive)
    :return: nothing
    
    Example:
    
        my_function(10)
    """
    if verbose and x <0:
        print('This is a negative number !! ')
    else:
        print(sqrt(x))
    

Positional arguments

In [29]:
def compute_gc_content(sequence):
    GC = sequence.count('G') + sequence.count('C')
    GC /= len(sequence)
    GC *= 100
    return GC
In [30]:
compute_gc_content("ACGTACGTGCGCT")
Out[30]:
61.53846153846154
In [75]:
#Several input arguments
def polynomial(x, exp):
    return x**exp - x + 1
polynomial(2,8)
Out[75]:
255

Position argument: because the order of each positional argument matters

Keyword arguments

Arguments may have default values, in which case you must name it with a keyword

In [35]:
def power_kw(x, exp=2):
    return x**exp
power_kw(2)
Out[35]:
4
In [36]:
# Here is one way to call the function
power_kw(2, 3)
Out[36]:
8
In [76]:
# but you can also name the optional argument, which is more 
# robust (if your API changes later on)
power_kw(2, exp=3)
Out[76]:
8
It's a realy **bad** idea to use a mutable object as default value of keyword argument. Do this only if you exactly know what are you doing.

Keyword arguments order

In [24]:
def order(x, y=2, z=3):
    print(x, y, z)
In [25]:
order(1, 2, 3)
order(1, 2)          # same as above
order(1)             # same as above
order(1, y=2, z=3)   # same as above
order(1, z=3, y=2)   # same as above
order(1, y=2)        # same as above
order(1, z=3)        # same as above
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3

You cannot skip an argument: order(1, ,3) is not valid. Without naming keyword arguments, you must provide all of them

all positional parameters must be on the left on the keyword parameters. This definition is not correct:

In [1]:
def power_does_not_work(y=2, x):
    return x**y
  File "<ipython-input-1-7d30c6e4fbba>", line 1
    def power_does_not_work(y=2, x):
                           ^
SyntaxError: non-default argument follows default argument

Sometimes you do not know before hand how many arguments are required

In [55]:
def func(*args):
    print(args[0])  
    return sum(args)
func(1,2,3,4,5,6)
1
Out[55]:
21

*args syntax converts all positional input arguments into a tuple. The way Python handles the provided arguments is to match the normal positional arguments from left to right and then places any other positional arguments in a tuple (*args) that can be used by the function.

In [56]:
def extra_sum(items, *extra_items):
    return sum(items) + sum(extra_items)
extra_sum([1,2,3], 4,5,6)
Out[56]:
21

Similarly to the positional arguments, you can specify an arbitrary number of keyword arguments by using the following syntax (combined with the arbitrary number of optional arguments introduced in the previous section):

  • args and kwargs are conventions. You can call them with any name
  • args is a tuple
  • kwargs is a dictionary
In [27]:
def funckw(pos_params, *args, **kwargs):
    print(args)
    print(kwargs)
In [28]:
funckw(1000, 2, 3, 4,5, test1=1, test2=2, test3=3)
(2, 3, 4, 5)
{'test2': 2, 'test3': 3, 'test1': 1}

You can pass a list/tuple of values as argument

In [66]:
t = ('a',2)
funckw(1, t)  # does not work as expected: the function got one item (the tuple)
(('a', 2),)
{}

but you would need to use the * syntax like in the prototype

In [67]:
funckw(1, *t)
('a', 2)
{}

One more example on mutable argument (advanced)

Remember that

  1. a list is mutable sequence
  2. object are passed by reference
In [155]:
def reference(mutable):
    mutable[0] = 1
    
mylist = [10, 2, 3]
reference(mylist)

mylist #changed !! since mylist is mutable list
Out[155]:
[1, 2, 3]

If you do no want the sequence to be changed, pass a copy

In [156]:
mylist2 = [10,2,3]
reference(mylist2[:])
mylist2  # unchanged since we passed a copy of l using [:]
Out[156]:
[10, 2, 3]

Functions practical

  1. How many positional and keyword argument in the following function declaration
    def myfunc(a, b, c, d=1, e=2):
     return a+b+c+d+e
    
    What is the output (guess it, do not run the function)
    f(1,2,3)
    f(1,2,3, e=4)
    
  2. ADVANCED: Recursive function. Write a recursive (or not) function that return the Fibonacci sequence, which is defined by
    • u[0] = 1
    • u[1] = 1
    • u[n+2] = u[n+1] + u[n]
  1. write a function that fetches a uniprot sequence.
  2. Write another function. Given a string sequence as an argument, the function should return a dictionary with the count of each letter. (see first notebook)
  3. Finally using the two previous functions, crate a third one that takes as input a uniprot sequence and returns a dictionary with the count of each letter in the sequence corresponind to the uniprot ID .

Write a function that takes as argument a list of numbers. The function should return the mean of the list. First, consider only pure Python code without any import Then use the statistics module. Once done, rename your function and call it statistics. Adapt your code so that it is functional

Function inside function

In [89]:
import math

def mylog(x):
    def check(x):
        if x<=0:
            print('warning x is negative')
            return False
        return True
    
    if check(x) is True:
        return math.log(x)
    else:
        return math.nan
    
res = mylog(-1)
warning x is negative
In [34]:
check(1)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-34-08d9533016ef> in <module>()
----> 1 check(1)

NameError: name 'check' is not defined

The pass keyword is a no-operation statement: it does nothing. Handy within function definition or control flow. It is required within block of statements that does nothing.

In [161]:
# useful for function
def func_to_implement_one_day():
    # TODO 
    pass

See also control flow examples later

Logical operations

the identity operator

The is operator is a binary operator that returns True if its left-hand object reference is referring to the same object as its right-hand object reference. Note that it usually does not make sense to use is for comparing ints, strs, and most other data types since we almost invariably want to compare their values.

To invert the identity test we use is not .

In [6]:
a = "Something"
b = None
a is not None, b is None
Out[6]:
(True, True)

the comparison operators

Python provides the standard set of binary comparison operators, with the expected semantics: < less than, <= less than or equal to, == equal to, != not equal to, >= greater than or equal to, and > greater than.

In [3]:
a = 2
b = 6
a == b
Out[3]:
False
In [ ]:
a = 2
b = 6
a < b
In [4]:
a = 2
b = 6
a <= b, a != b, a >= b, a > b
Out[4]:
(True, True, False, False)
In [5]:
0 < a < b
Out[5]:
True
In [ ]:
we can also compare strings.
In [7]:
"three" < "four"
Out[7]:
False
In [8]:
"three" < 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-f2a612e11247> in <module>()
----> 1 "three" < 4

TypeError: unorderable types: str() < int()

membership operator (in)

See first notebook

logical operators

Python provides three logical operators: and, or, and not. Both and and or use short-circuit logic and return the operand that determined the result they do not return a Boolean (unless they actually have Boolean operands). Let’s see what this means in practice:

In [9]:
five = 5
two = 2
zero = 0
print(five and two)
# bool(2) = True
print(two and five)
# bool(5) = True
print(five and zero)
# bool(0) = False
2
5
0
In [10]:
nought = 0
print(five or two)
print(two or five)
print(zero or five)
print(zero or nought)
5
2
5
0
Control flows
In [90]:
a = 111
if a > 0:
    if a > 1e6:
        print("large positive")
    else:
        print("positive")
elif a < 0:
    print("negative")
else:
    print("zero")
positive
In [38]:
i = 10

print("if elif statement")
if i > 1:
   print("i > 1")
elif i > 2:
   print("i > 2")
elif i > 3:
   print("i > 3")

print("suite of if statements")
if i > 1:
   print("i > 1")
if i > 2:
   print("i > 2")
if i > 3:
   print("i > 3")
if elif statement
i > 1
suite of if statements
i > 1
i > 2
i > 3

Iterators are objects that can be traversed through all its elements. In python many objects are iterators. The for loop (see next section) iterates through iterable object(s). You can transform an object into an iterator using the iter() builtin function

In [97]:
x = [1, 3]
ix = iter(x)

# you can then call next() until an error is raised, which  indicaties the end of the iterator
next(ix)
Out[97]:
1
In [99]:
next(ix)
Out[99]:
3
In [100]:
next(ix)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-100-7d48257f631e> in <module>()
----> 1 next(ix)

StopIteration: 

for loops

For loops are defined with the for keyword and an ending : character

In [91]:
for this in [1, 2, 3]:
    print(this)
1
2
3

break

The break keyword stops the iteration of the for/while loop. The break statement causes the program flow to exit the body of the for/while loop and resume the execution of the program at the next statement after the loop.

In [173]:
for x in [1, 3, 5]:
    if x>3:
        break
    print(x)
    
1
3

continue

Another keyword related to loops is the continue keyword. Instead of continuing the current iteration, the continue statement rejects all the remaining statements in the current iteration of the loop and moves the control back to the top of the loop.

In [126]:
S = 0
for x in [1, 2, None, 3, 4]:
    if x is None:
        continue  # jump to next iteration
    S += x
print(S)
    
10

else

There is an optional else statement executed when the loop is over, which is hardly used but it is interesting to know its existence. It is not executed if the loop is interrupted (a break statement):

In [103]:
for x in [1,2,3]:
    print(x)
else:
    print('e')
1
2
3
e
In [104]:
for x in [1,2,3]:
    print(x)
    if x>1:
        break
else:
    print('e')
1
2

The range builtin function

In [105]:
S = 0
for x in [0, 1, 2, 3]:
    S += x    
S
Out[105]:
6

For long sequences, use the range function.

In [106]:
S = 0
for x in range(0, 1000):
    S +=x 
print(S)
499500
In [127]:
for this in range(0, 10,2):
    print("{} ".format(this), end="")
0 2 4 6 8 

Note the usage of end argument in the print() function to prevent newline after each print() call

Although hardly used in Python code, the while loop is also available

In [129]:
i = 0
S = 0
while S < 1000:
    S += i
    i += 1
print(i, S)
46 1035

range and xrange functions are useful to create a sequence of numbers. xrange returns a generator instead of a list, which makes it slightly faster and memory more efficient.

In [180]:
%%timeit 
x = range(0,1000000)
100 loops, best of 3: 13.8 ms per loop
In [52]:
%%timeit
# PYTHON 2 only. No need in Python 3 anymore
x = xrange(0, 1000000)
1000000 loops, best of 3: 209 ns per loop

classical way to have an index while looping over a sequence:

In [131]:
count = 0
for x in [5, 6, 7, 8, 9]:
    print("position " + str(count) + ": value " + str(x))
    count += 1
position 0: value 5
position 1: value 6
position 2: value 7
position 3: value 8
position 4: value 9

but in python, you should use enumerate instead:

In [130]:
for count, x in enumerate([5, 6, 7, 8, 9]):
    print("position " + str(count) + ": value " + str(x))
position 0: value 5
position 1: value 6
position 2: value 7
position 3: value 8
position 4: value 9

How to loop the contents of 2 lists at the same time without indexing ?

Naive way

We’ve seen the range function earlier. Let us use it to create an index that can in turn be used in a for loop:

In [138]:
x = [50, 100, 150]
y = [5, 10, 15]

for i in range(0, 3):
    print(x[i] ,y[i])
50 5
100 10
150 15

Better way ?

In [139]:
x = [50, 100, 150]
y = [5, 10, 15]
for i, x in enumerate(x):
    print(x, y[i])
50 5
100 10
150 15

No need to create an index with range() but can we do better ?

The zip approach

In [140]:
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]

for ix, iy, iz in zip(x, y, z):
    print(ix, iy, iz)
1 5 7
2 10 8
3 15 9
In [39]:
x = [1, 2, 3]
y = [5, 10, 15]
z = [7, 8, 9]

for data in zip(x, y, z):
    print(data)
(1, 5, 7)
(2, 10, 8)
(3, 15, 9)

Reminder

In [41]:
d = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
dna = 'AACCGGTT'
In [42]:
#remember how to access to keys of the dictionary ?
d.keys()
Out[42]:
dict_keys(['G', 'T', 'A', 'C'])
In [43]:
# and values ?
d.values()
Out[43]:
dict_values(['C', 'A', 'T', 'G'])

Iterators

In [44]:
for k in d.keys():
    print(k)
G
T
A
C
In [47]:
for v in d.values():
    print(v)
C
A
T
G
In [48]:
for k,v in d.items():
    print(k,v)
G C
T A
A T
C G
In [49]:
complement = ''
for this in dna:
    complement += d[this]
dna, complement
Out[49]:
('AACCGGTT', 'TTGGCCAA')

Sequence comprehension is a very important concept in Python. You will see them everywhere. Why ?
1. Faster (although not always)
2. More Pythonic (elegant code)

Let us create a list made of 100,000 random values

In [60]:
%%timeit -n 20
X = []
for x in range(100000):
    X.append(random.random())
20 loops, best of 3: 13.8 ms per loop
In [61]:
%%timeit -n 20
X = [random.random() for x in range(100000)]
20 loops, best of 3: 10.6 ms per loop

List comprehension with condition

In [5]:
import random
data = [random.random()-0.5 for x in range(100000)]
In [8]:
%%timeit -n 20
X = []
for this in data:
    if this > 0:
        X.append(this)
20 loops, best of 3: 6.81 ms per loop

The same code with list comprehension

In [10]:
%%timeit -n 20
X = [this for this in data if this>0.5]
20 loops, best of 3: 2.52 ms per loop

So, in the first case, LC is 30% faster, and here it is 30% slower.

Application: flatten a list

starting from :

X = [[0], [1,2], [3,4], [5,6]]

get a flatten list

X = [0, 1, 2, 3, 4, 5, 6]
In [71]:
X = [[0], [1,2], [3,4], [5,6]]
In [81]:
%%timeit  # equivalent to 
flatX = []
for item in X:
    for this in item:
        flatX.append(this)
1000000 loops, best of 3: 795 ns per loop
In [85]:
%%timeit 
flatX = [this for item in X for this in item]
1000000 loops, best of 3: 586 ns per loop

Here the extend is actually faster than a list comprehension

set comprehension

As for list we can build a set

{expression for item in iterable}
{expression for item in iterable if condition}
In [12]:
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ["name", "comment", "sequence", "cut", "end"])
ecor1 = RestrictEnzyme("EcoR1", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
bamh1 = RestrictEnzyme("BamH1", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
hind3 =  RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
sma1 =  RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
digest = [ecor1, bamh1, hind3, sma1]

s1 = {enz.name for enz in digest}
s2 = {enz.name for enz in digest if enz.end != 'blunt'}

print(s1)
print(s2)
{'EcoR1', 'BamH1', 'HindIII', 'SmaI'}
{'EcoR1', 'BamH1', 'HindIII'}

list, dict, set Comprehension manipulation

given the following dict :

d = {1 : 'a', 2 : 'b', 3 : 'c' , 4 : 'd'}

We want obtain a new dict with the keys and the values inverted so we will obtain:

inverted_d  {'a': 1, 'c': 3, 'b': 2, 'd': 4}

Use the raise keywords

In [207]:
# Could be handy to be able to raise an error if an argument is incorrect:
def kelvin2celsius(k):
    if k < 0:
        raise ValueError('Invalid value. should be positive')
    return k - 273.15
In [208]:
kelvin2celsius(-1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-208-5a39f02da7cc> in <module>()
----> 1 kelvin2celsius(-1)

<ipython-input-207-930ee5c8a687> in kelvin2celsius(k)
      2 def kelvin2celsius(k):
      3     if k < 0:
----> 4         raise ValueError('Invalid value. should be positive')
      5     return k - 273.15

ValueError: Invalid value. should be positive
In [289]:
import exceptions
print(dir(exceptions))
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'EnvironmentError', 'Exception', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__doc__', '__name__', '__package__']

Exception may be raised by Python when a statement is wrong:

In [225]:
x = 0
1/x
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-225-bfc3ae6379af> in <module>()
      1 x = 0
----> 2 1/x

ZeroDivisionError: division by zero

and want to catch this exception

In [226]:
x = 0
try:
    print(1./x)
except:
    # if an exception is raised, do something else:
    print('x should be non-zero')
x should be non-zero

We know more about the exception, we know it is an error due to a division by zero so let us be more explicit

In [227]:
x = 0
try:
    print(1./x)
except ZeroDivisionError:
    print('x should be non-zero')
        
x should be non-zero

What if the input is a string ? We could also catch this TypeError

In [228]:
x = "2"
try:
    print(1./x)
except TypeError:
    print("warning. wrong type. Cast to float")
    print(1./float(x))
except ZeroDivisionError:
    print('x should be non-zero')
        
warning. wrong type. Cast to float
0.5

what if x is set to "0" then ?

In [231]:
# maybe we can do something clever here such as converting the string into 
# a float
x = '0'
try:
    print(1./x)
except TypeError:
    try:
        print(1./float(x))
    except ZeroDivisionError:
        print('x should be non-zero')    
except ZeroDivisionError:
    print('x should be non-zero')
x should be non-zero

whether the try block succeeds or not, the finally keyword is used to add a cleanup block

In [239]:
try:
    f = open("temp.txt", "w")
    # something wrong here e.g. cannot open the file
    # comment/uncomment the raise 
    raise IOError
except IOError as err:
    print("error raised")
    print(err)
finally:
    print("We cleanup things here")
    f.close()
error raised

We cleanup things here

In [ ]:
class MyException:
    pass
In [246]:
class MyException(Exception):
     def __init__(self, value):
         self.value = value
     def __str__(self):
         return repr(self.value)

try:
    raise(MyException('Is this raised ? yes '))
except MyException as err:
     print('MyException occured, value: %s' %  err.value)
MyException occured, value: Is this raised ? yes 

Decorators are functions that takes as input another function.

Here are two functions that do not accept x=0 as input parameter

In [258]:
import math

def function1(x):
    if x==0: print('warning')
    else: return 1/x
    
def function2(x):
    if x < 0: print('warning')
    else: return math.log(x)

Definition

In [270]:
def check_decorator(func):   # Func is the function name to be decorated
    def wrap(*args):
        if args[0] <= 0:
            print('warning from "{0}", First arg ({1}) \
is negative !!!!'.format(func.__name__, args[0]))
        else:
            return func(*args)
    return wrap

call

In [271]:
@check_decorator
def function1(x):
    return 1/x

@check_decorator
def function2(x):
    return math.log(x)
In [272]:
function1(-2)
function2(0)
warning from "function1", First arg (-2) is negative !!!!
warning from "function2", First arg (0) is negative !!!!
Decorator are very useful and simplifies code (less errors).

However, the syntax may be difficult especially for decorating methods, classes, with / without arguments.

Keep also in mind that it mayhave non-negligeable computational cost

A generator is a function that contains the yield keyword

Generators functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.

The range() function iterates from a low to a high value.

Let us write a new range function that is symmetric: start from low to high, and then goes down to the low value again

In [105]:
# Without generator:
def symrange(low, high):
    a = list(range(low, high))
    b = list(range(high-2, low-1, -1))
    a.extend(b)
    return a
for a in symrange(0,5):
    print("**" * a)
**
****
******
********
******
****
**

What would happen if the symrange is made of million of points. Well, a lot of memory for nothing really special. Here comes the generator:

In [106]:
def symrange(low, high):
    for a in range(low, high):
        yield a
    for b in range(high-2, low-1, -1):
        yield b
a = symrange(0,3)  
a
Out[106]:
<generator object symrange at 0x7fa944116888>
In [107]:
for x in a:
    print(x)
0
1
2
1
0
In [274]:
# once looped over, the generator raise an error 
next(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-274-ee7ee0698186> in <module>()
      1 # once looped over, the generator raise an error
----> 2 next(a)

TypeError: 'int' object is not an iterator

List comprehension versus generators

In [127]:
import random
letters = ['A', 'C', 'T', 'G']
seq = [letters[random.randint(0, 3)] for i in range(0, 100000)]
In [128]:
%%timeit -n 20
sum([1 for x in seq if x != 'T'])
20 loops, best of 3: 4.16 ms per loop
In [129]:
%%timeit -n 20
values_gen = (1 for x in seq if x!='T')
sum(values_gen)
20 loops, best of 3: 6.04 ms per loop
Generators are not necesseraly faster but less memory is used !
In [130]:
import urllib.request
req = urllib.request.urlopen('http://www.uniprot.org/uniprot/P56945.fasta')
data = req.read().decode()
In [131]:
print(data)
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA

In [132]:
header, sequence = data.split("\n", 1)  # use argument 1 to split only once (the header)
In [74]:
# extract sequence removing trailing characters 
sequence = sequence.replace("\n", "")
In [133]:
# counts characters and store them
counter = {}
for x in set(sequence): 
    counter[x] = sequence.count(x)
In [134]:
print(header)
keys = sorted(counter.keys())
print(" ".join(keys))
print(" ".join([str(counter[x]) for x in keys]))
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2

 A C D E F G H I K L M N P Q R S T V W Y
15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28

There is always something in Python to do basic tasks

In [135]:
from collections import Counter
In [136]:
counter2 = Counter(sequence)
print(header)
print(" ".join(counter2.keys()))
print(" ".join([str(counter2[k]) for k in keys]))
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
L T N P F R K D E C V H A 
 Y S Q M I W G
15 88 4 52 47 19 67 23 14 34 84 10 16 107 59 40 62 44 66 6 28

Summary

Lots of Python keywords introduced in this notebook:

  • Boolean logic:
    • boolean Type: True/False
    • logic: and, is, or, not
  • Control flows:
    • loop: for, break, continue, while
    • condition: if, elif, else
      • Exceptions
      • assert, try, except, finally, raise
  • generator:
    • yield
  • function:
    • def, pass, return
  • Functions:
    • By default functions return None
    • Any number of arguments.

With for loops, think about enumerate and zip to avoid counters

List comprehensions are very common are can replace a combination of loop and condition

Generator are very powerful for large data sets to save memory

If you do want to use decorator, that's fine but if you are serious about programming in Python, think about it, especially if you want to program with object oriented approach.