7 Control Flow Statements

7.1 Exercises

7.1.1 Exercise

The Fibonacci sequence are the numbers in the following integer sequence:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …

By definition, the first two numbers in the Fibonacci sequence are 0 and 1, and each subsequent number is the sum of the previous two. The Fibonacci suite can be defined as following:

F0 = 0, F1 = 1.

Fn = Fn-1 + Fn-2

Write a function that takes an integer n as parameter and returns a list containing the n first numbers of the Fibonacci sequence.

 1
 2def fibonacci(n):
 3    """
 4    compute the suite of fibonacci for n element and return it
 5    for instance ::
 6
 7        fibonacci(7)
 8        [0, 1, 1, 2, 3, 5, 8, 13, 21]
 9
10    :param int n: the nth elemnt of the fibonacci suite
11    :return: the nfirst elt of the fibonacci suite
12    :rtype: list of int
13    """
14    fib_suite = []
15    i = 0
16    while i <= n:
17        if i == 0:
18            fib_suite.append(0)
19        elif i == 1:
20            fib_suite.append(1)
21        else: 
22            res = fib_suite[-1] + fib_suite[-2]
23            fib_suite.append(res)
24        i += 1
25    return fib_suite
26
27print(', '.join([str(i) for i in fibonacci(10)]))
28
29
30def fibonacci_2(n):
31    """
32    compute the fibonacci of the elt n
33    :param int n:
34    :return: the fibonacci of the elt n
35    :rtype: int
36    """
37    a = 0
38    b = 1
39    for i in range(n):
40        a, b = b, a + b
41    return b
42

fibonacci_iteration.py . We will see another way more elegant to implement the Fibonacci suite in Advanced Programming Techniques section.

7.1.2 Exercise

Reimplement your own function max (call it my_max). This function will take a list or tuple of floats or integers and return the largest element.

Write the pseudocode before proposing an implementation.

7.1.2.1 pseudocode

function my_max(l)
max <- first elt of l
for each elt of l
if elt is > max
max <- elt
return max

7.1.2.2 implementation

def my_max(seq):
   """
   return the maximum value in a sequence
   work only with integer or float
   """
   highest = seq[0]
   for i in seq:
      if i > highest:
          highest = i
   return highest

l = [1, 2, 3, 4, 58, 9]
print(my_max(l))
58

7.1.3 Exercise

We want to create a “restriction map” of two sequences.

Create the following enzymes:

ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1, "sticky")
taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1, "sticky")
not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2, "sticky")
sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky")
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2, "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")

Then create the following two DNA fragments:

dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggtt
gagcgatccccgtcagttggcgtgaattcagcagcagcgcaccccgggcgtagaattccagtt
gcagataatagctgatttagttaacttggatcacagaagcttccagaccaccgtatggatccc
aacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatg
ctgccctaaactctatgcagcgggcgtgaggattcagtggcttcagaattcctcccgggagaa
gctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaagagcagcttaaatcggagag
aattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg"""
  • In a file <my_file.py>

  1. Create a function one_line_dna that transforms a multi-line sequence into a single-line DNA sequence.

  2. Create a collection containing all enzymes

  3. Create a function that takes two parameters:

    1. a sequence of DNA

    2. a list of enzyme

    and returns a collection containing the enzymes which cut the DNA.

Which enzymes cut:

  • dna_1?

  • dna_2?

  • dna_1 but not dna_2?

7.1.4 Exercise

We want to establish a restriction map of a sequence. But we will do this step by step, and reuse the enzymes used in the previous exercise:

  • Create a function that takes a sequence and an enzyme as parameters, and returns the position of the first binding site. (Write the pseudocode.)

pseudocode

function one_enz_binding_site(dna, enzyme)
if enzyme binding site is substring of dna
return first position of substring in dna

implementation

 1from operator import itemgetter
 2
 3# we decide to implement enzyme as tuple with the following structure
 4# (name, comment, sequence, cut, end)
 5#    0     1         2       3    4
 6
 7
 8def one_enz_one_binding_site(dna, enzyme):
 9    """
10    :return: the first position of enzyme binding site in dna or None if there is not
11    :rtype: int or None
12    """
13    pos = dna.find(enzyme[2])
14    if pos != -1:
15        return pos
16
  • Improve the previous function to return all positions of binding sites.

pseudocode of first algorithm

function one_enz_binding_sites(dna, enzyme)
positions <- empty
if enzyme binding site is substring of dna
add the position of the first substring in dna in positions
positions <- find binding_sites in rest of dna sequence
return positions

implementation

 1        
 2def one_enz_all_binding_sites(dna, enzyme):
 3    """
 4    :param dna: the dna sequence to search enzyme binding sites
 5    :type dna: str
 6    :param enzyme: the enzyme to looking for
 7    :type enzyme:  a tuple (str name, str comment, str sequence, int cut, str end)
 8    :return: all positions of enzyme binding sites in dna
 9    :rtype: list of int
10    """
11    positions = []
12    pos = dna.find(enzyme[2])
13    while pos != -1:
14        positions.append(pos)
15        pos = dna.find(enzyme[2], pos + 1)
16    return positions
17

pseudocode of second algorithm

function one_enz_all_binding_sites_2(dna, enzyme)
positions <- empty
find first position of binding site in dna
while we find binding site in dna
add position of binding site to positions
find first position of binding site in dna in rest of dna
return positions

implementation

 1
 2def one_enz_all_binding_sites2(dna, enzyme):
 3    """
 4
 5    :param dna: the dna sequence to search enzyme binding sites
 6    :type dna: str
 7    :param enzyme: the enzyme to looking for
 8    :type enzyme:  a tuple (str name, str comment, str sequence, int cut, str end)
 9    :return: all positions of enzyme binding sites in dna
10    :rtype: list of int
11    """
12    positions = []
13    pos = dna.find(enzyme[2])
14    while pos != -1:
15        if positions:
16            positions.append(pos)
17        else:
18            positions = pos + positions[-1]
19        new_seq = dna[pos + 1:]
20        pos = new_seq.find(enzyme[2])
21        pos = pos
22    return positions
23
  • Search all positions of Ecor1 binding sites in dna_1.

ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")

dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""
  • Generalize the binding sites function to take a list of enzymes and return a list of tuples (enzyme name, position).

pseudocode

function binding_sites(dna, set of enzymes)
positions <- empty
for each enzyme in enzymes
pos <- one_enz_binding_sites(dna, enzyme)
pos <- for each position create a tuple enzyme name, position
positions <- pos
return positions

implementation

In bonus, we can try to sort the list in the order of the position of the binding sites like this:

[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
 1
 2def binding_sites(dna, enzymes):
 3    """
 4    return all positions of all enzymes binding sites present in dna
 5    sort by the increasing position.
 6
 7    :param dna: the dna sequence to search enzyme binding sites
 8    :type dna: str
 9    :param enzymes: the enzymes to looking for
10    :type enzymes: a sequence of enzyme tuple
11    :return: all positions of each enzyme binding sites in dna
12    :rtype: list of int
13    """
14    positions = []
15    for enzyme in enzymes:
16        pos = one_enz_all_binding_sites(dna, enzyme)
17        pos = [(enzyme[0], pos) for pos in pos]
18        positions.extend(pos)
19    positions.sort(key=itemgetter(1))
20    return positions 
21
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")

and the two dna fragments:

dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatgctgccctaaactctatgcagcgggcgtgagg
attcagtggcttcagaattcctcccgggagaagctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaag
agcagcttaaatcggagagaattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg"""

enzymes= (ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1)
binding_sites(dna_1, enzymes)
[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ('SmaI', 95), ('EcoRI', 105),
('Sau3aI', 144), ('HindIII', 152), ('BamHI', 173), ('Sau3aI', 174), ('BamHI', 193), ('Sau3aI', 194)]

binding_sites(dna_2, enzymes)
[('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106),
('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)]

restriction.py .

7.1.4.1 Bonus

If you prefer the enzyme implemented as namedtuple:

restriction_namedtuple.py .

7.1.5 Exercise

Write a uniqify_with_order function that takes a list and returns a new list without any duplicate, but keeping the order of items. For instance:

>>> l = [5, 2, 3, 2, 2, 3, 5, 1]
>>> uniqify_with_order(l)
[5, 2, 3, 1]

Solution

>>> uniq = []
>>> for item in l:
>>>   if item not in uniq:
>>>      uniq.append(item)

Solution

>>> uniq_items = set()
>>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)]