.. sectnum::
   :start: 7

.. _Control_Flow_Statements:

***********************
Control Flow Statements
***********************

Exercises
=========

Exercise
--------

The Fibonacci sequence is the following sequence of integers:

    0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...

By definition, the first two numbers in the Fibonacci sequence are 0 and 1,
and each subsequent number is the sum of the previous two.
The Fibonacci sequence can be defined as follows:

|    F\ :sub:`0` = 0, F\ :sub:`1` = 1.
|
|    F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`

Write a function that takes an integer ``n`` as parameter and returns a list
containing the first ``n`` numbers of the Fibonacci sequence.

.. literalinclude:: _static/code/fibonacci_iteration.py
   :linenos:
   :language: python

:download:`fibonacci_iteration.py <_static/code/fibonacci_iteration.py>`

We will see a more elegant way to implement the Fibonacci sequence in the
:ref:`Advanced Programming Techniques` section.

Exercise
--------

Reimplement your own function ``max`` (call it ``my_max``).
This function will take a list or tuple of floats or integers and return the largest element.

Write the pseudocode before proposing an implementation.

pseudocode
^^^^^^^^^^

| *function my_max(l)*
|     *max <- first elt of l*
|     *for each elt of l*
|         *if elt is > max*
|             *max <- elt*
|     *return max*

implementation
^^^^^^^^^^^^^^

::

    def my_max(seq):
        """
        Return the maximum value in a sequence.
        Works only with integers or floats.
        """
        # start from the first element, then keep the largest value seen so far
        highest = seq[0]
        for i in seq:
            if i > highest:
                highest = i
        return highest

    l = [1, 2, 3, 4, 58, 9]
    print(my_max(l))
    58

Exercise
--------

We want to create a "restriction map" of two sequences.
Create the following enzymes::

    ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
    ecor5 = ("EcoRV", "Ecoli restriction enzyme V", "gatatc", 3, "blunt")
    bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
    hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1, "sticky")
    taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1, "sticky")
    not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2, "sticky")
    sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky")
    hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2, "blunt")
    sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")

Then create the following two DNA fragments::

    dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggtt
    gagcgatccccgtcagttggcgtgaattcagcagcagcgcaccccgggcgtagaattccagtt
    gcagataatagctgatttagttaacttggatcacagaagcttccagaccaccgtatggatccc
    aacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

    dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatg
    ctgccctaaactctatgcagcgggcgtgaggattcagtggcttcagaattcctcccgggagaa
    gctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaagagcagcttaaatcggagag
    aattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg"""

* In a file:

  #. Create a function *one_line_dna* that transforms a multi-line sequence into a single-line DNA sequence.
  #. Create a collection containing all enzymes.
  #. Create a function that takes two parameters:

     #. a sequence of DNA
     #. a list of enzymes

     and returns a collection containing the enzymes which cut the DNA.

Which enzymes cut:

* ``dna_1``?
* ``dna_2``?
* ``dna_1`` but not ``dna_2``?

.. _enzyme_exercise:

Exercise
--------

We want to establish a restriction map of a sequence.
We will do this step by step, reusing the enzymes from the previous exercise:

* Create a function that takes a sequence and an enzyme as parameters, and returns
  the position of the first binding site.
  (Write the pseudocode.)

**pseudocode**

| *function one_enz_binding_site(dna, enzyme)*
|     *if enzyme binding site is substring of dna*
|         *return first position of substring in dna*

**implementation**

.. literalinclude:: _static/code/restriction.py
   :linenos:
   :lines: 1-16
   :language: python

* Improve the previous function to return all positions of binding sites.

**pseudocode of first algorithm**

| *function one_enz_binding_sites(dna, enzyme)*
|     *positions <- empty*
|     *if enzyme binding site is substring of dna*
|         *add the position of the first substring in dna to positions*
|         *positions <- positions + binding sites found in rest of dna sequence*
|     *return positions*

**implementation**

.. literalinclude:: _static/code/restriction.py
   :linenos:
   :lines: 17-33
   :language: python

**pseudocode of second algorithm**

| *function one_enz_all_binding_sites_2(dna, enzyme)*
|     *positions <- empty*
|     *find first position of binding site in dna*
|     *while we find binding site in dna*
|         *add position of binding site to positions*
|         *find first position of binding site in rest of dna*
|     *return positions*

**implementation**

.. literalinclude:: _static/code/restriction.py
   :linenos:
   :lines: 34-56
   :language: python

* Search all positions of EcoRI binding sites in ``dna_1``.

::

    ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")

    dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
    cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
    ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

* Generalize the binding sites function to take a list of enzymes and return a list of tuples (enzyme name, position).

**pseudocode**

| *function binding_sites(dna, set of enzymes)*
|     *positions <- empty*
|     *for each enzyme in enzymes*
|         *pos <- one_enz_binding_sites(dna, enzyme)*
|         *pos <- for each position create a tuple enzyme name, position*
|         *positions <- positions + pos*
|     *return positions*

**implementation**

As a bonus, we can sort the list by the position of the binding sites, like this::

    [('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...

.. literalinclude:: _static/code/restriction.py
   :linenos:
   :lines: 57-
   :language: python

::

    ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
    ecor5 = ("EcoRV", "Ecoli restriction enzyme V", "gatatc", 3, "blunt")
    bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky")
    hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1, "sticky")
    taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1, "sticky")
    not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2, "sticky")
    sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky")
    hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2, "blunt")
    sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")

and the two DNA fragments::

    dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
    cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
    ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

    dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatgctgccctaaactctatgcagcgggcgtgagg
    attcagtggcttcagaattcctcccgggagaagctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaag
    agcagcttaaatcggagagaattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg"""

    enzymes = (ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1)
    binding_sites(dna_1, enzymes)
    [('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ('SmaI', 95), ('EcoRI', 105), ('Sau3aI', 144), ('HindIII', 152), ('BamHI', 173), ('Sau3aI', 174), ('BamHI', 193), ('Sau3aI', 194)]
    binding_sites(dna_2, enzymes)
    [('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106), ('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)]

:download:`restriction.py <_static/code/restriction.py>`

Bonus
^^^^^

If you prefer the enzymes implemented as a namedtuple, see
:download:`restriction_namedtuple.py <_static/code/restriction_namedtuple.py>`.

Exercise
--------

Write a ``uniqify_with_order`` function that takes a list and returns a new list
without any duplicates, while keeping the order of the items. For instance::

    >>> l = [5, 2, 3, 2, 2, 3, 5, 1]
    >>> uniqify_with_order(l)
    [5, 2, 3, 1]

Solution ::

    >>> uniq = []
    >>> for item in l:
    ...     if item not in uniq:
    ...         uniq.append(item)

Solution ::

    >>> uniq_items = set()
    >>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)]

The second solution relies on ``set.add`` returning ``None``: ``not uniq_items.add(x)``
is always true, so it only serves to record ``x`` as seen the first time it is met.
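
The solutions above build the result at the interpreter prompt, whereas the exercise asks for a function. A minimal sketch of such a function follows, assuming the items are hashable (required for the ``set`` and the ``dict``). The second helper, ``uniqify_with_dict``, is a hypothetical name introduced here for illustration; it relies on dictionaries preserving insertion order, which the language guarantees from Python 3.7 onwards.

```python
def uniqify_with_order(items):
    """Return a new list without duplicates, keeping the first occurrence of each item."""
    seen = set()   # items already met; membership tests on a set are fast
    uniq = []      # result, in order of first appearance
    for item in items:
        if item not in seen:
            seen.add(item)
            uniq.append(item)
    return uniq


def uniqify_with_dict(items):
    """Same result, using the insertion order of dict keys (Python 3.7+)."""
    return list(dict.fromkeys(items))


l = [5, 2, 3, 2, 2, 3, 5, 1]
print(uniqify_with_order(l))  # [5, 2, 3, 1]
print(uniqify_with_dict(l))   # [5, 2, 3, 1]
```

Tracking seen items in a set avoids scanning the result list for every element, which matters for long inputs; the input list itself is left unchanged.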