.. _Collection_Data_types: ********************* Collection Data Types ********************* It is often convenient to hold entire collections of data items. Python provides several collection data types that can hold items. Each collection data type has properties which make it more suitable for some problems and less for others. Some collection types are *immutable* and others are *mutable*. As we have seen in :ref:`Variables chapter `: * *immutable* objects are objects whose state **cannot** be changed. The collection is "frozen" after it's creation. Of course we can access its items (and even modify them if they are mutable), but we cannot change the collection itself, for instance add, remove or reorder the collection. * *mutable* objects are objects whose state can be modified. We can access the items and we **can** modify the collection too: add, remove, change or sort items. .. _sequences: Sequence Types ============== A **sequence** type is one that supports the membership operator ``in``, the size function ``len``, and slicing (``[]``) and is iterable (i.e. it is possible to loop over its elements using a ``for`` statement, something we will see later). | Python provides three main sequence types: ``tuple``, ``list``, and ``string``. | Strings have been covered in the :ref:`previous chapter `. | Some other useful sequence types are provided by the standard library, for instance ``collections.namedtuple``. .. _tuple: Tuples ------ A tuple is an **immutable ordered sequence** of zero or more object references. Tuple creation ^^^^^^^^^^^^^^ An empty tuple can be created with parentheses ``()``:: >>> a = () >>> type(a) A tuple with only one item is created with parentheses containing the item, **followed by a coma**, otherwise, we just obtain a new reference to the item itself:: >>> b = (1) >>> b = (1,) >>> type(b) To create a tuple with more items, the items should be separated with commas. In that case, parentheses are optional:: >>> c = (1, 2) >>> type(c) >>> c = 1, 2 >>> type(c) >>> c (1, 2) The tuple data type can be called as a function, ``tuple``. Without any argument, it returns an empty tuple:: >>> tuple() () With a tuple as argument, it returns a reference to the same tuple:: >>> t1 = 1, 2, "foo" >>> print(t1) (1, 2, 'foo') >>> t2 = tuple(t1) >>> print(t2) (1, 2, 'foo') >>> t1 is t2 True With any other argument, it attempts to convert the given object into a tuple:: >>> tuple("ESWN") ('E', 'S', 'W', 'N') This is not always possible:: >>> tuple(42) Traceback (most recent call last): File "", line 1, in TypeError: 'int' object is not iterable **Tuple creation: what happens in memory.** .. image:: _static/figs/tuple.png :width: 400px :align: left :alt: tuple creation, what happen in memory. * Python creates three objects: two integers one string. * Python creates a tuple with three "slots". * Each slot refers to one object, in the same order they have been "declared". * The object reference ``t1`` is created and references the tuple object. .. container:: clearer .. image :: _static/figs/spacer.png Tuples as sequence types ^^^^^^^^^^^^^^^^^^^^^^^^ Standard capabilities of sequence types work as expected: * Access to tuple elements is achieved using square brackets ``[]``:: >>> t1[1] 2 >>> t1[0:2] (1, 2) >>> t1[-1] 'foo' >>> t1[::-1] ('foo', 2, 1) * Membership can be tested using ``in``:: >>> 3 in t1 False >>> "foo" in t1 True * The number of elements can be obtained using ``len``:: >>> len(t1) 3 This is very similar to what we have seen with strings, and we will soon see that it works the same for lists. Tuple provide just two methods ``t.count(x)``, which returns the number of of times object ``x`` occurs in tuple ``t``, and ``t.index(x)``, which returns the index position of the left most occurrence of object ``x`` in tuple ``t``:: >>> t1.count("bar") 0 >>> t1.count("foo") 1 >>> t1.index("foo") 2 If the object ``x`` is not found in ``t``, ``count`` raises a ``ValueError``:: >>> t1.index("bar") Traceback (most recent call last): File "", line 1, in ValueError: tuple.index(x): x not in tuple Tuple immutability ^^^^^^^^^^^^^^^^^^ As tuples are immutable, we cannot replace, delete or add any items after creation. For instance, let's try to replace the first element of ``t1``:: >>> t1[0] = 3 Traceback (most recent call last): File "", line 1, in TypeError: 'tuple' object does not support item assignment If we try to extend the tuple with another one, a new tuple is created:: >>> id(t1) 140537354293504 >>> t1 += ("bar",) >>> t1 (1, 2, 'foo', 'bar') >>> id(t1) 140537354917632 This works similarly with strings, which are also immutable. Named tuples ^^^^^^^^^^^^ The standard library contains a ``collections`` module that provides various collections-related functionalities. Among them is the ``collections.namedtuple`` function that can create specialized tuples with an extended syntax to access elements as **object attributes**. Using such "named tuples" is a way to improve the readability of your program with a lightweight form of the object-oriented aspect of Python. For instance, we might want to use tuples to represent structured information about restriction enzymes, each item of the tuple being one piece of information:: >>> ecor1 = ("EcoR1", "Ecoli restriction enzyme I", "gaatcc", 1, "sticky") In order to coherently manipulate such tuples, we need to remember what position in the tuple corresponds to what piece of information. With named tuples, we can create a special kind of tuple where each position is associated with a name:: >>> from collections import namedtuple >>> RestrictEnzyme = namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end")) We can now use this new type to represent a given restriction enzyme:: >>> ecor1 = RestrictEnzyme(name="EcoR1", comment="Ecoli restriction enzyme I", sequence="gaatcc", cut=1, end="sticky") >>> ecor1 RestrictEnzyme(name='EcoR1', comment='Ecoli restriction enzyme I', sequence='gaatcc', cut=1, end='sticky') It is also possible to omit the names when creating a named tuple, but one has to be careful to use the correct order for the items:: >>> ecor1 = RestrictEnzyme("EcoR1", "Ecoli restriction enzyme I", "gaatcc", 1, "sticky") >>> ecor1 RestrictEnzyme(name='EcoR1', comment='Ecoli restriction enzyme I', sequence='gaatcc', cut=1, end='sticky') Items inside a named tuple can be accessed by index, as with normal tuples, but also by name:: >>> ecor1[0] 'EcoR1' >>> ecor1.name 'EcoR1' >>> ecor1[2] 'gaatcc' >>> ecor1.sequence 'gaatcc' Accessing items by name makes it more explicit what the code is meant to do. While we're here, let's create a whole collection of these interesting objects:: >>> bamh1 = RestrictEnzyme("BamH1", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky") >>> hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky") >>> sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt") >>> digest = (ecor1, bamh1, hind3, sma1) .. _list: Lists ----- A list is a **mutable ordered sequence** of zero or more object references. List creation ^^^^^^^^^^^^^ Lists can be created in similar ways as tuples, but using square brackets instead of parentheses. An empty list is created using empty brackets:: >>> a = [] >>> type(a) >>> print(a) [] Just calling the ``list`` type as a function also works:: >>> list() [] A difference with respect to tuple creation is that the comma is not required when creating a list with only one element:: >>> b = [1,] >>> type(b) >>> b = [1] >>> type(b) And brackets are not optional, since a sequence of comma separated items is otherwise a tuple:: >>> c = [1, 2] >>> type(c) >>> c = 1, 2 >>> type(c) One can also create a list by converting another object, using the ``list`` type as a conversion function:: >>> list("ESWN") ['E', 'S', 'W', 'N'] >>> list((1, 2, "foo")) [1, 2, 'foo'] Lists as sequence types ^^^^^^^^^^^^^^^^^^^^^^^ The standard capabilities of sequence types work as for tuples and strings: * Access to list elements:: >>> l1 = [1, 2, "foo"] >>> l1[1] 2 >>> l1[0:2] [1, 2] >>> l1[-1] 'foo' >>> l1[::-1] ['foo', 2, 1] * Membership testing:: >>> 3 in l1 False >>> "foo" in l1 True * Number of elements:: >>> len(l1) 3 The ``count`` and ``index`` methods are also available:: >>> l1.count("bar") 0 >>> l1.count("foo") 1 >>> l1.index("foo") 2 >>> l1.index("bar") Traceback (most recent call last): File "", line 1, in ValueError: 'bar' is not in list Comparing lists ^^^^^^^^^^^^^^^ Lists can be compared using the standard comparison operators (``==``, ``!=``, ``>=``, ``<=``, ``<``, ``>``). The comparison will be applied item by item (and recursively for nested items such as lists in a list):: >>> one_two_three = [1, 2, 3] >>> one_four = [1, 4] >>> one_two_three > one_four False >>> one_two_threefour = [1, 2, [3, 4]] >>> one_two_threefive = [1, 2, [3, 5]] >>> one_two_threefive > one_two_threefour True List mutability ^^^^^^^^^^^^^^^ An important difference with tuples or strings is that one can replace, remove or add elements to a list. Before proceeding, let's look at our list's identity:: >>> id(l1) 140537354412152 Now let's try to replace it's first element (at index ``0``):: >>> l1[0] = 3 >>> l1 [3, 2, 'foo'] >>> id(l1) 140537354412152 Our replacement attempt did not result in an exception, and the list identity has not changed: it is indeed the same object. Now let's extend the list with another one (concatenating):: >>> l1 += ["bar", "baz"] >>> l1 [3, 2, 'foo', 'bar', 'baz'] >>> id(l1) 140537354412152 Contrary to what happened with our tuple, this did not result in the creation of a new object. There is also an ``append`` method to add an element (just one) at the end of a list:: >>> l1.append("qux") >>> l1 [3, 2, 'foo', 'bar', 'baz', 'qux'] >>> id(l1) 140537354412152 Again, identity was preserved. A list modification equivalent to using ``+=`` is achieved using the ``extend`` method:: >>> l1.extend([3.2, 17]) >>> l1 [3, 2, 'foo', 'bar', 'baz', 'qux', 3.2, 17] To remove an element from a list, use the ``remove`` method:: >>> l1.remove(2) >>> l1 [3, 'foo', 'bar', 'baz', 'qux', 3.2, 17] If the element is present more than once, the first occurrence is removed:: >>> l1.append(3) >>> l1 [3, 'foo', 'bar', 'baz', 'qux', 3.2, 17, 3] >>> l1.remove(3) >>> l1 ['foo', 'bar', 'baz', 'qux', 3.2, 17, 3] Another method that removes an element from a list is ``pop``. By default, it removes the last element of the list and returns it:: >>> l1.pop() 3 >>> l1.pop() 17 >>> l1 ['foo', 'bar', 'baz', 'qux', 3.2] It is possible to specify the index of the element to remove:: >>> l1.pop(2) 'baz' >>> l1 ['foo', 'bar', 'qux', 3.2] >>> l1.pop(2) 'qux' >>> l1 ['foo', 'bar', 3.2] If one tries to provide an index outside the range of existing positions in the list, an ``IndexError`` exception is raised:: >>> l1.pop(3) Traceback (most recent call last): File "", line 1, in IndexError: pop index out of range Yet another way to remove something from a list is with the ``del`` statement, applied to one of the object reference contained in the list:: >>> del l1[1] >>> l1 ['foo', 3.2] It can also be applied to a full slice of the list:: >>> del l1[0:2] >>> l1 [] It is also possible to assign a sequence of items to a list slice:: >>> l1[:] = [1, 2, "foo"] >>> l1 [1, 2, 'foo'] >>> l1[1:] = [3, 3.2, 17] >>> l1 [1, 3, 3.2, 17] >>> l1[1:2] = [2, "foo"] >>> l1 [1, 2, 'foo', 3.2, 17] >>> l1[3:] = [] >>> l1 [1, 2, 'foo'] Finally, here are two very useful methods that can modify a list:: >>> numbers = [5, 1, 0, 3, 2, 4] >>> numbers.reverse() >>> numbers [4, 2, 3, 0, 1, 5] >>> numbers.sort() >>> numbers [0, 1, 2, 3, 4, 5] It is also possible to sort a list in decreasing value order, setting the ``reverse`` optional argument of ``sort`` to ``True``:: >>> numbers.sort(reverse=True) >>> numbers [5, 4, 3, 2, 1, 0] More generally, all operations available for `sequence types `_ and `mutable sequence types `_ are available for lists. Copies of lists ^^^^^^^^^^^^^^^ A consequence of list mutability is that using the ``list`` type as a function on a list argument results in a new object, not just a reference to the same object:: >>> l2 = list(l1) >>> print(l2) [1, 2, 'foo'] >>> l1 is l2 False This makes it possible to replace, remove or add items without affecting the original:: >>> l2[0] = 3 >>> l2 [3, 2, 'foo'] >>> l1 [1, 2, 'foo'] The same happens when using the ``copy`` method of a list:: >>> l3 = l1.copy() >>> l1 is l3 False >>> l3[1] = 4 >>> l3 [1, 4, 'foo'] >>> l1 [1, 2, 'foo'] However, the copy is only a **shallow** copy. This means that only the reference of the container is a copy, not the references of the contained items:: >>> l1[0] is l3[0] True The reference of the items that were replaced have changed, though:: >>> l1[1] is l3[1] False .. warning:: When working with copies of objects containing references to mutable objects, be aware that the modification of a contained object accessed through a copy of the original container also affects the object contained in the original, since it is actually the same:: >>> l1 = [[1, 2], [3, 4]] >>> l2 = l1.copy() >>> l2[0].append("foo") >>> l2 [[1, 2, 'foo'], [3, 4]] >>> l1 [[1, 2, 'foo'], [3, 4]] .. _lists_comprehension: Lists Comprehensions ^^^^^^^^^^^^^^^^^^^^ Small lists are often created using literals but long lists are usually created programmatically. To create a list from another sequence object, Python offers a very convenient syntax: the **list comprehension**. A list comprehension is an expression where a :ref:`loop `, with an optional :ref:`condition `, is enclosed in square brackets. The loop is used to generate items for the list and the condition filters out unwanted items. | [*expression* **for** *item* **in** *iterable*] | [*expression* **for** *item* **in** *iterable* **if** *condition*] Here are two nice examples where we use list comprehensions to build lists of restriction enzyme names from our tuple of restriction enzyme named tuples:: >>> [enz.name for enz in digest] ['EcoR1', 'BamH1', 'HindIII', 'SmaI'] >>> [enz.name for enz in digest if enz.end != "blunt"] ['EcoR1', 'BamH1', 'HindIII'] List or tuple, which one should I choose? ----------------------------------------- We have seen that lists and tuples behave similarly in many aspects. The main reason to choose one or the other relates to mutability: * Use a list if you need to modify the data. * Use a tuple if you want read-only data, or simply if you know that you will not need to modify the data. In the second case, this may protect you against programming errors, and you may benefit from better performance. Set Types ========= Set types are **unordered** collections data types containing unique elements. They are mainly interesting due to their fast membership testing, and classical mathematical operations like ``union``, ``intersection``, ``difference``. They are also used to easily eliminate duplicates from a collection. As `collections `_, they support the ``in`` and ``len`` operators and are iterable. However, contrary to sequence types, they are **not ordered**. Consequently, they do not support item access by index or slicing. When iterated, set types provide their items in an **arbitrary** order. Two builtin set types are available: ``set``, which is mutable, and ``frozenset``, which is not. .. _hashability: Only **hashable** objects may be part of a set (or frozenset). Hashable objects are objects for which it is possible to compute a "hash" value (a value summarizing the object, provided by their ``__hash__`` method), that is always the same throughout the object’s lifetime, and which can be compared for equality. This enables fast membership testing and elimination of duplicates. The built-in immutable data types, such as ``float``, ``frozenset``, ``int``, ``str``, and ``tuple``, are hashable and can be part of a (frozen)set. An exception is tuples that contain non hashable elements. The built-in mutable data types, such as ``dict``, ``list``, and ``set``, are not hashable, so they cannot part of a (frozen)set. Sets ---- A set is an unordered collection of zero or more object references that refer to hashable objects. Sets are mutable, so we can easily ``add`` or ``remove`` items, but since they are unordered they have **no** notion of index position and so **cannot** be sliced or strided. Set creation ^^^^^^^^^^^^ An empty set can be created calling the ``set`` data type as a function, with no argument:: >>> s0 = set() >>> type(s0) >>> print(s0) set() A set with one or more elements can be created by enclosing the elements between curly braces ``{}``:: >>> s1 = {"foo"} >>> s1 {'foo'} >>> s2 = {"baz", "qux"} >>> s2 {'qux', 'baz'} >>> s3 = {"foo", "bar", "baz"} >>> s3 {'foo', 'baz', 'bar'} .. note:: Empty curly braces create an empty ``dict``, not an empty ``set``. It is also possible to create a set calling the ``set`` type as a function on an iterable of hashable elements:: >>> nucleotides = set("ACGTU") >>> nucleotides {'U', 'A', 'G', 'T', 'C'} Here, the individual letters of the string have been used as elements in the set. >>> unique_numbers = set([4, 2, 1, 2, 5, 1, 9, 3, 2, 1]) >>> unique_numbers {1, 2, 3, 4, 5, 9} Note how duplicates are eliminated in a set. Since sets are not hashable, it is not possible to have a set inside a set. However, the ``frozenset`` type is a mutable set type, which is hashable. .. figure:: _static/figs/set.png :width: 600px :alt: set :figclass: align-center This figure illustrates the set created by the following code snippet:: >>> miscellanea = {"foo bar", 2, ecor1, frozenset({8, 4, 7}), -29, (3, 4, 5)} .. container:: clearer .. image :: _static/figs/spacer.png Sets always contains unique items. It safe to add several times the same item but pointless. .. _set_methods_and_operator: Operations on sets ^^^^^^^^^^^^^^^^^^ Sets support ``len`` and fast membership testing ``in`` and ``not in``:: >>> len(s1) 1 >>> len(s2) 2 >>> len(s3) 3 >>> "foo" in s1 True >>> "foo" in s2 False >>> "qux" not in s1 True >>> "qux" not in s2 False They also support usual set operators: * Union: ``|`` * Intersection: ``&`` * Difference: ``-`` * Symmetric difference: ``^`` Here are some examples:: >>> pecan = set("pecan") >>> pie = set("pie") >>> print(pecan, "...", pie) {'n', 'p', 'c', 'a', 'e'} ... {'p', 'i', 'e'} >>> pecan | pie {'n', 'p', 'a', 'i', 'e', 'c'} >>> pecan & pie {'p', 'e'} >>> pecan - pie {'n', 'a', 'c'} >>> pie - pecan {'i'} >>> pecan ^ pie {'n', 'c', 'a', 'i'} >>> pie ^ pecan {'n', 'a', 'i', 'c'} Sets can be compared using standard comparison operators, and the ``isdisjoint`` method. Sets are equal if they contain the same elements:: >>> set("baobab") == {"a", "b", "o"} True >>> set("baobab") == set("blackout") False s1 ^ s2 != s1 | s2 False Sets are disjoint if they do not share any element:: >>> s1.isdisjoint(s2) True Inequalities test inclusion relationships between sets:: >>> set("baobab") < set("blackout") True >>> {1, 2, 3} < {1, 2, 4} False >>> {1, 2, 3} >= {1, 2, 4} False >>> {1, 2, 3} < {1, 2, 3, 4} True >>> {1, 2, 3} <= {1, 2, 3, 4} True Since they are mutable, sets also support methods and operators that modify them. The ``add`` method adds an element to a set:: >>> nucleotides {'U', 'A', 'G', 'T', 'C'} >>> id(nucleotides) 140537313773528 >>> nucleotides.add("N") >>> nucleotides {'U', 'A', 'G', 'T', 'C', 'N'} >>> id(nucleotides) 140537313773528 The ``update`` method adds the elements in the iterable given as argument:: >>> nucleotides.update(set("WSMKRY")) >>> nucleotides {'K', 'W', 'U', 'A', 'S', 'G', 'R', 'M', 'T', 'C', 'Y', 'N'} An element can be removed from a set using the ``remove`` method:: >>> nucleotides.remove("U") >>> nucleotides {'K', 'W', 'A', 'S', 'G', 'R', 'M', 'T', 'C', 'Y', 'N'} It is also possible to use updating versions of the standard set operations:: >>> nucleotides &= set("ACGT") >>> nucleotides {'C', 'T', 'G', 'A'} >>> id(nucleotides) 140537313773528 >>> nucleotides |= set("RY") >>> nucleotides {'T', 'A', 'C', 'Y', 'G', 'R'} >>> nucleotides -= set("YTC") >>> nucleotides {'A', 'G', 'R'} More details about available methods and operators can be found here: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset .. _sets_comprehension: Set Comprehension ^^^^^^^^^^^^^^^^^ As we can build a list using an expression (see :ref:`lists_comprehension`) we can create sets | {*expression* **for** *item* **in** *iterable*} | {*expression* **for** *item* **in** *iterable* **if** *condition*} .. code-block:: python >>> {enz.name for enz in digest} {'HindIII', 'SmaI', 'BamH1', 'EcoR1'} >>> {enz.name for enz in digest if enz.end != 'blunt'} {'HindIII', 'BamH1', 'EcoR1'} Frozensets ---------- Frozensets behave pretty much like sets, except that they are immutable. As already mentioned, one implication is that frozensets are hashable, and can therefore be elements of a set or a frozenset. The other main implication is that, obviously, operations that modify sets are not valid for frozensets. Frozensets have to be created via a call to ``frozenset``. With no argument, an empty frozenset is returned:: >>> f0 = frozenset() >>> type(f0) >>> print(f0) frozenset() With an iterable as argument, a frozenset is made from the elements in the iterable:: >>> f1 = frozenset(["foo"]) >>> f2 = frozenset(s2) >>> f3 = frozenset(("bar", "foo", "baz")) >>> f1 frozenset({'foo'}) >>> f2 frozenset({'qux', 'baz'}) >>> f3 frozenset({'foo', 'baz', 'bar'}) Then, frozensets can be manipulated using the same operations as sets, provided that they do not modify them:: >>> f1 | f2 frozenset({'qux', 'foo', 'baz'}) >>> f2 & f3 frozenset({'baz'}) >>> "foo" in f2 False >>> f1.isdisjoint(f3) False The updating versions of the operators will create a new object:: >>> id(f3) 140537313774392 >>> f3 ^= f2 >>> f3 frozenset({'qux', 'foo', 'bar'}) >>> id(f3) 140537313773960 Attempts to use modifying set methods result in ``AttributeError`` exceptions:: >>> f1.add("baz") Traceback (most recent call last): File "", line 1, in AttributeError: 'frozenset' object has no attribute 'add' >>> f2.update({"bar", "foo"}) Traceback (most recent call last): File "", line 1, in AttributeError: 'frozenset' object has no attribute 'update' .. _mapping_types: Mapping Types ============= Mappings are collections of (key, value) items and provide methods for accessing items, keys, and values. In a mapping type, a value is associated to a key. The key provides a direct access to the corresponding value, without iterating over all the collection. In Python the mapping types are also called "dictionaries". Only hashable objects may be used as dictionary keys, so immutable data types such as ``float``, ``frozenset``, ``int``, ``str``, and ``tuple`` can be used as dictionary keys (except tuples containing non hashable items), but mutable types such as ``dict``, ``list``, and ``set`` cannot. On the other hand, each value associated to a key can be an object reference referring to an object of any type, including numbers, strings, lists, sets, dictionaries, functions, and so on. Dictionaries can be compared using the standard equality comparison operators (``==`` and ``!=``), with the comparisons being applied item by item (and recursively for nested items such as tuples or dictionaries inside dictionaries). Comparisons using the other comparison operators (``<``, ``<=``, ``>=``, ``>``) are not supported since they don’t make sense for unordered collections such as dictionaries. | Python provides one main an very useful mapping type, the built-in ``dict`` type. | The ``collections`` module of the standard library provides a handy ``collections.defaultdict`` type. .. _dictionaries: Dictionaries ------------ A dict is an **unordered collection** of zero or more **(key, value) pairs** whose keys are object references that refer to hashable objects, and whose values are object references referring to objects of any type. Dictionaries are **mutable**, so we can easily add or remove items, but since they are unordered they have no notion of index position and so cannot be sliced or strided. Dict creation ^^^^^^^^^^^^^ As you may guess from previous collection examples, an empty dict can be created using the ``dict`` data type called as a function, with no arguments:: >>> d0 = dict() >>> type(d0) >>> print(d0) We already mentioned that an empty pair of curly braces created an empty dict. Curly braces can also be used to create a non empty dict, with items consisting in ``key: value`` pairs separated by commas:: >>> d = {"id": 1948, "name": "Washer", "size": 3} >>> d {'id': 1948, 'name': 'Washer', 'size': 3} Or the ``dict`` type can be used to convert a sequence of ``(key, value)`` tuples into a dict:: >>> d = dict([("id", 1948), ("name", "Washer"), ("size", 3)]) >>> d {'id': 1948, 'name': 'Washer', 'size': 3} The ``dict`` type also accepts **keyword arguments** that will result in (key, value) items in the returned dict:: >>> d = dict(id=1948, name="Washer", size=3) >>> d {'id': 1948, 'name': 'Washer', 'size': 3} Another way of creating dictionaries is to use a dictionary comprehension, a topic we will cover later. .. figure:: _static/figs/dict.png :width: 600px :alt: set :figclass: align-center The above figure illustrates the dictionary created by the following code snippet:: >>> d1 = {0: 1 , (2, 10): "foo", -1: ["a", "b", "c"], "Ecor1": ecor1} .. _dict_methods_and_operator: Operations on dicts ^^^^^^^^^^^^^^^^^^^ Brackets are used to access individual values:: >>> d1[0] 1 >>> d1[-1] ['a', 'b', 'c'] If the key is not present in the dict, a ``KeyError`` exception is raised:: >>> d1[91] Traceback (most recent call last): File "", line 1, in KeyError: 91 Brackets can also be used to add, replace or delete dictionary items. To add a new item, use the ``=`` operator:: >>> d1["X"] = 59 >>> d1 {0: 1, (2, 10): 'foo', -1: ['a', 'b', 'c'], 'Ecor1': {'name': 'EcoR1', 'seq': 'gaattc', 'comment': 'restriction site 1 for Ecoli'}, 'X': 59} Dictionary keys are unique, so if we add a (key, value) item whose key is the same as an existing key, the effect is to **replace** the value associated to that key with the new one:: >>> d1[(2, 10)] = "bar" >>> d1 {0: 1, (2, 10): 'bar', -1: ['a', 'b', 'c'], 'Ecor1': {'name': 'EcoR1', 'seq': 'gaattc', 'comment': 'restriction site 1 for Ecoli'}, 'X': 59} To delete an item use the ``del`` statement:: >>> del d1["Ecor1"] >>> d1 {0: 1, (2, 10): 'bar', -1: ['a', 'b', 'c'], 'X': 59} A ``KeyError`` :ref:`exceptions` is raised if no item has that key:: >>> del d1[frozenset({1, 2, 3})] Traceback (most recent call last): File "", line 1, in KeyError: frozenset({1, 2, 3}) Items can also be removed (and returned) from the dictionary using the ``dict.pop`` method, one of the various methods provided by the ``dict`` type: .. tabularcolumns:: |p{3cm}|p{12cm}| +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | Syntax Description | | +====================+=============================================================================================================================+ | d.clear() | Removes all items from dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.copy() | Returns a shallow copy of dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.fromkeys(s, v) | Returns a dict whose keys are the items in sequence s and whose values are None or v if v is given | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.get(k) | Returns key k’s associated value, or None if k isn’t in dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.get(k, v) | Returns key k’s associated value, or v if k isn’t in dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.items() | Returns a view of all the (key, value) pairs in dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.keys() | Returns a view of all the keys in dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.pop(k) | Returns key k’s associated value and removes the item | | | whose key is k, or raises a KeyError exception if k isn’t in d | | | whose key is k, or returns v if k isn’t in dict d | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.popitem() | Returns and removes an arbitrary (key, value) pair from dict d, or raises a KeyError exception if d is empty | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.setdefault(k, v) | The same as the dict.get() method, except that if the key is | | | not in dict d, a new item is inserted with the key k , and with | | | a value of None or of v if v is given | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ | d.update(a) | Adds every (key, value) pair from a that isn’t in dict d to d , | | | and for every key that is in both d and a, replaces the corre- | | | sponding value in d with the one in a — a can be a dictionary, | | | an iterable of (key, value) pairs, or keyword arguments | +--------------------+-----------------------------------------------------------------------------------------------------------------------------+ .. note:: In Python 3, the ``dict.items``, ``dict.keys``, and ``dict.values`` methods all return "dictionary views". A dictionary view is effectively a read-only iterable object that appears to hold the dictionary items or keys or values, depending on the view we have asked for. In general, we can simply treat views as iterables. However, two things make a view different from a normal iterable. One is that if the dictionary the view refers to is changed, the view reflects the change. The other is that key and item views support some set-like operations. Given dictionary view ``v`` and set or dictionary view ``x``, the supported operations are: * Intersection: ``v & x`` * Union: ``v | x`` * Difference: ``v - x`` * Symmetric difference: ``v ^ x`` In Python3 :: >>> d = {1: "a", 2: "b", 3: "c", 4: "e"} >>> v = d.keys() >>> v dict_keys([1, 2, 3, 4]) >>> type(v) >>> d[5] = "c" >>> v dict_keys([1, 2, 3, 4, 5]) >>> x = {4, 5, 6} >>> v & x {4, 5} >>> v - x {1, 2, 3} >>> v | x {1, 2, 3, 4, 5, 6} >>> v ^ x {1, 2, 3, 6} Dict Comprehension ^^^^^^^^^^^^^^^^^^ A *dictionary comprehension* is an expression and a loop with an optional condition enclosed in braces, very similar to a set comprehension. Like list and set comprehensions, two syntaxes are supported:: {keyexpression: valueexpression for key, value in iterable} {keyexpression: valueexpression for key, value in iterable if condition} :: ecor1 = ("EcoR1", "Ecoli restriction enzime I", "gaattc", 1, "sticky") bamh1 = ("BamH1", "type II restriction endonuclease from Bacillus amyloliquefaciens", "ggatcc", 1, "sticky") hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky") sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt") digest = [ecor1, bamh1, hind3, sma1] # now I need a collection to acces direcly to the enzyme given its name # so I will create a dictionary where keys are enzyme name and values the enzymes frig = {enz[0] : enz for enz in digest} # if I want a collection with only cohesive end enzymes cohesive_enz = {enz[0]] : enz for enz in digest if enz[-1] != 'blunt'} Iterating and copying collections ================================= Once we have collections of data items, it is natural to want to iterate over all the items they contain. Another common requirement is to copy a collection. There are some subtleties involved here because of Python’s use of object references (for the sake of efficiency), so in the last :ref:`subsection `, we will examine how to copy collections and get the behavior we want. .. _iterating_over_col: Iterating over collections -------------------------- An iterable data type is one that can return each of its items one at a time. .. _iterator: iterator ^^^^^^^^ An iterator is an object which is able read through a collection and return items one by one in turn. The ``next`` method of iterator returns each successive item in turn, and raises a ``StopIteration`` exception when there are no more items. The order in which items are returned depends on the underlying iterable. In the case of lists and tuples, items are normally returned in sequential order starting from the first item (index position 0), but some iterators return the items in an arbitrary order for example, dictionary and set iterators. Any (finite) iterable, i , can be converted into a tuple by calling tuple(i) , or can be converted into a list by calling list(i) . Iterator support also ``all()``, ``any()``, ``len()``, ``min()``, ``max()``, and ``sum()`` functions. Here are a couple of usage examples: :: >>> x = [-2, 9, 7, -4, 3] >>> all(x), any(x), len(x), min(x), max(x), sum(x) (True, True, 5, -4, 9, 13) >>> x.append(0) >>> all(x), any(x), len(x), min(x), max(x), sum(x) (False, True, 6, -4, 9, 13) The enumerate() function takes an iterator and returns an enumerator object. This object can be treated like an iterator, and at each iteration it returns a 2-tuple with the tuple’s first item the iteration number (by default starting from 0) :: seq = 'TACCTTCTGAGGCGGAAAGA' for i , b in enumerate(seq): print(i, b) 0 T 1 A 2 C 3 C 4 T 5 T 6 C ... on so on 140 .. _iterable_operators: Common Iterable Operators and Functions .. tabularcolumns:: |p{2.5cm}|p{15cm}| +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Syntax | Description | +===========================+======================================================================================================================================================================================================================================================+ | s + t | Returns a sequence that is the concatenation of sequences s and t | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | s * n | Returns a sequence that is int n concatenations of sequence s and t | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | x in i | Returns True if item x is in iterable i ; use not in to reverse the test | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | all(i) | Returns True if every item in iterable i evaluates to True | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | any(i) | Returns True if any item in iterable i evaluates to True | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | enumerate(i, start) | Normally used in for ... in loops to provide a sequence of (index, item) tuples with indexes starting at 0 or start ; | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | len(x) | Returns the “length” of x . If x is a collection it is the number of items; if x is a string it is the number of characters. | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | max(i, key) | Returns the biggest item in iterable i or the item with the biggest key(item) value if a key function is given | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | min(i, key) | Returns the smallest item in iterable i or the item with the smallest key(item) value if a key function is given | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | range(start, stop, step) | Returns an integer iterator. With one argument ( stop ), the iterator goes from 0 to stop - 1; with two arguments ( start , stop ) the iterator goes from start to stop - 1; with three arguments it goes from start to stop - 1 in steps of step . | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | reversed(i) | Returns an iterator that returns the items from iterator i in reverse order | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | sorted(i, key, reverse) | Returns a list of the items from iterator i in sorted order; key is used to provide DSU (Decorate, Sort, Undecorate) sorting. If reverse is True the sorting is done in reverse order. | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | sum(i, start) | Returns the sum of the items in iterable i plus start (which defaults to 0); i may not contain strings | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | zip(i1, ..., iN) | Returns an iterator of tuples using the iterators i1 to iN ; see text | +---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ The for ... in Statement ^^^^^^^^^^^^^^^^^^^^^^^^ Python’s **for** loop has the following syntax: **for** *variable* **in** *iterable*\ **:** do something **else:** do something else the **else** block is optional. .. note:: We already specify that Python uses indentation to signify its block structure. So here the identation is very important. The block "for" begin with the **for** ... **in** statement and included all lines which are right indented. The block ends when the code is align again with the **for** ... **in** statement for instance: .. code-block:: python :linenos: for i in [1,2,3]: begin of **for** block of code do something do another thing print("foo") | The block of code begin line n°1. | The lines n° 2,3,4 are executed 3 times with the variable i which is bound successively to 1, 2, 3. | The line n° 5 start a new block of code, and is executed when the for loop is fnished. The Python style guidelines (pep 8) recommend four spaces per level of indentation, and only spaces (no tabs). In for ... in loop, the variable is set to refer to each object in the iterable in turn. each line of code in the for .. in block is executed at each turn using the variable refering the new object. :: bases = 'acgt' for b in bases: print('base =', b) base = a base = c base = g base = t z = 0 for i in [1,2,3]: z += i print(f"i = {i}, z = {z}") i = 1, z = 1 i = 2, z = 3 i = 3, z = 5 .. figure:: _static/figs/for_loop_flow.png :width: 600px :alt: for loop code execution flow :figclass: right-center the schema above symbolizes the code execution flow with the "for" loop. * in green the source code * in blue the execution source code results * in orange the execution flow The for loops has an optional else clause. This latter is rather confusingly named since the else clause’s suite is always executed if the loop terminates normally. If the loop is broken out of due to a break statement, or a return statement (if the loop is in a function or method), or if an exception is raised, the else clause’s suite is not executed. .. note:: The *variable* is normally often a single variable but may be a sequence of variables, usually in the form of a tuple. If a tuple or list is used for the *variable*, each item is unpacked into the matchin items of the *variable*:: enzymes = [('ecor1', 'gaattc'), ('bamh1','ggatcc'), ('hind3', 'aagctt')] for name, seq in enzymes: print(name, ' .... ', seq) ecor1 .... gaattc bamh1 .... ggatcc hind3 .... aagctt break and continue """""""""""""""""" If a ``continue`` statement is executed inside the for ... in loop’s suite, control is immediately passed to the top of the loop and the next iteration begins. If the loop runs to completion it terminates, and any ``else`` suite is executed. If the loop is broken out of due to a ``break`` statement, or a ``return`` statement (if the loop is in a function), or if an exception is raised, the ``else`` clause’s suite is **not** executed. (If an exception occurs, Python skips the else clause and looks for a suitable exception handler—this is covered in the next section.) :: enzymes = [('ecor1', 'gaattc'), ('bamh1','ggatcc'), ('hind3', 'aagctt')] for name, seq in enzymes: if name == 'bamh1': continue print(name, ' .... ', seq) ecor1 .... gaattc hind3 .... aagctt .. figure:: _static/figs/continue_flow.png :width: 600px :alt: for loop code execution flow :figclass: right-center the schema above symbolizes the code execution flow with the "for" loop, with a **continue** statement. * in green the source code * in blue the execution source code results * in orange the execution flow .. code-block:: python enzymes = [('ecor1', 'gaattc'), ('bamh1','ggatcc'), ('hind3', 'aagctt')] for name, seq in enzymes: if name == 'bamh1': break print(name, ' .... ', seq) ecor1 .... gaattc .. figure:: _static/figs/break_flow.png :width: 600px :alt: for loop code execution flow :figclass: right-center the schema above symbolizes the code execution flow with the "for" loop, with a **break** statement. * in green the source code * in blue the execution source code results * in orange the execution flow .. container:: clearer .. image :: _static/figs/spacer.png .. raw:: latex \newpage .. _copying_coll: copying collections ------------------- Since Python uses object references, when we use the assignment operator ( = ), no copying takes place. If the right-hand operand is a literal such as a string or a number, the left-hand operand is set to be an object reference that refers to the in-memory object that holds the literal’s value. If the right-hand operand is an object reference, the left-hand operand is set to be an object reference that refers to the same object as the right-hand operand. One consequence of this is that assignment is very efficient. In some situations, we really do want a separate copy of the collection (or other mutable object). For sequences, when we take a slice. The slice is always an independent copy of the items copied. So to copy an entire sequence we can do this: .. code-block:: python >>> ascii = ['a','b','c'] >>> ascii_copy = ascii[:] For dictionaries and sets, copying can be achieved using dict.copy() and set.copy() . In addition, the copy module provides the copy.copy() function that returns a copy of the object it is given. Another way to copy the built-in collec- tion types is to use the type as a function with the collection to be copied as its argument. Here are some examples: * copy_of_dict_d = dict(d) * copy_of_list_L = list(L) * copy_of_set_s = set(s) Note, though, that all of these copying techniques are **shallow** that is, **only object references are copied and not the objects themselves**. .. code-block:: python >>> ascii = ['a','b','c'] >>> ascii_copy = ascii[:] # shallow copy >>> ascii[2] = 'z' >>> ascii ['a', 'b', 'z'] >>> ascii_copy = ['a','b','c'] >>> ascii_copy.append('e') >>> ascii_copy ['a','b','c','e'] .. figure:: _static/figs/shallow_copy_of_col_of_imutable.png :width: 600px :alt: shallow copy :figclass: right-center the schema above represent what python do behind the scene when we do a shallow copy. Only object references are copied and not the objects themselves. For immutable data types like numbers and strings this has the same effect as copying (except that it is more efficient). But for mutable data types such as nested collections this means that the objects they refer to are referred to both by the original collection and by the copied collection (the objects in l and l2 have the same *id()*). .. code-block:: python >>> ascii = ['a','b','c'] >>> integer = [1,2,3] >>> l = [ascii, integer] >>> l2 = l[:] # shallow copy >>> >>> l[0] ['a', 'b', 'c'] >>> print id(l), id(l2) 140530764842408 140530764842480 # l and l2 are 2 different objects >>> id(ascii) 140504986917992 >>> id(l[0]) 140504986917992 >>> id(l2[0]) 140504986917992 # the object they refer are the same .. figure:: _static/figs/shallow_copy_of_col_of_mutable.png :width: 600px :alt: shallow copy :figclass: right-center the schema above represent what python do behind the scene when we do a shallow copy. .. code-block:: python >>> ascii[0] = 'z' >>> l[0] ['z', 'b', 'c'] >>> l2[0] ['z', 'b', 'c'] >>> l2.append('foo') >>> l2 [['z', 'b', 'c'],[1, 2, 3], 'foo'] >>> l [['z', 'b', 'c'],[1, 2, 3]] >>> tpl = (ascii, integer) >>> tpl (['z', 'b', 'c'], [1, 2, 3]) >>> integer[0] = -99 >>> tpl (['z', 'b', 'c'], [-99, 2, 3]) In these conditions we must keep in mind that if we mutate an item of the collection the both collections are modified. In programmation, we call this a *side effect*. We saw the side effect problem on *list* and *tuple* example but it's also true with dictionnaries. If we really need independent copies of arbitrarily nested collections, we have to do a *deep-copy*. .. code-block:: python >>> import copy >>> ascii = ['a','b','c'] >>> integer = [1,2,3] >>> l = [ascii, integer] >>> l2 = copy.deepcopy(l) 140481236949328 140481236947168 # l and l2 are 2 different objects >>> print id(l[0]), id(l2[0]) 139909363381672 139909362940312 # the objects they refer have the same value but are distincts. >>> ascii[0] = 'z' >>> l [['z', 'b', 'c'], [1, 2, 3]] >>> l2 [['a', 'b', 'c'], [1, 2, 3]] .. figure:: _static/figs/deep_copy_of_mutable_col.png :width: 600px :alt: deep copy :figclass: right-center The above schema represents what Python does behind the scene when we do a deep copy. Usually the terms *copy* and *shallow copy* are used interchangeably. For *deep copy* we have to mention it explicitly. Exercises ========= Exercise -------- Draw the representation in memory and specify the data type of each object of the following expressions:: x = [1, 2, 3, 4] y = x[1] y = 3.14 x[1] = 'foo' and :: x = [1, 2, 3, 4] x += [5, 6] Compare :: x = 3 y = x y += 3 x = ? y = ? and :: x = [1,2] y = x y += [3,4] x = ? y = ? Exercise -------- .. note:: ``sum`` is a function that returns the sum of all the elements of a list. Without using the Python shell, tell what are the effects of the following statements:: x = [1, 2, 3, 4] x[3] = -4 # What is the value of x now? y = sum(x) / len(x) # What is the value of y? Why? .. Exercise .. -------- .. .. note:: .. This question might not be relevant for Python 3 .. How to compute safely the average of a list? Exercise -------- Draw the representation in memory of the ``x`` and ``y`` variables when the following code is executed:: x = [1, ['a', 'b', 'c'], 3, 4] y = x[1] y[2] = 'z' # What is the value of x? Exercise -------- From the list ``l = [1, 2, 3, 4, 5, 6, 7, 8, 9]`` generate 2 lists: * ``l1`` containing all odd values * ``l2`` all even values. Exercise -------- .. note:: A codon is a triplet of nucleotides. A nucleotide can be one of the four letters A, C, G, T Write a function that returns a list containing strings representing all possible codons. Write the pseudocode before proposing an implementation. Exercise -------- Write a function ``uniqify`` which takes a list as parameter and returns a new list without any duplicate, regardless of the order of the items. For instance:: >>> l = [5, 2, 3, 2, 2, 3, 5, 1] >>> uniqify(l) [1,2,3,5] # is one of the valid solutions Exercise -------- .. note:: A **k-mer** is a word made of k letters. We need to compute the occurrence of all k-mers of a given length present in a sequence. Below we propose 2 algorithms. pseudo code 1 ^^^^^^^^^^^^^ | *function get_kmer_occurences(seq, kmer_len)* | *all_kmers <- generate all possible kmer of kmer_len* | *occurences <- 0* | *for each kmer in all_kmers* | *count occurence of kmer* | *store occurence* pseudo code 2 ^^^^^^^^^^^^^ | *function get_kmer_occurences(seq, kmer_len)* | *all_kmers <- empty* | *from i = 0 to sequence length - kmer_len* | *kmer <- kmer startin at pos i im sequence* | *increase by of occurence of kmer* Compare the pseudocode of these 2 algorithms and implement the fastest one.:: """gtcagaccttcctcctcagaagctcacagaaaaacacgctttctgaaagattccacactcaatgccaaaatataccacag gaaaattttgcaaggctcacggatttccagtgcaccactggctaaccaagtaggagcacctcttctactgccatgaaagg aaaccttcaaaccctaccactgagccattaactaccatcctgtttaagatctgaaaaacatgaagactgtattgctcctg atttgtcttctaggatctgctttcaccactccaaccgatccattgaactaccaatttggggcccatggacagaaaactgc agagaagcataaatatactcattctgaaatgccagaggaagagaacacagggtttgtaaacaaaggtgatgtgctgtctg gccacaggaccataaaagcagaggtaccggtactggatacacagaaggatgagccctgggcttccagaagacaaggacaa ggtgatggtgagcatcaaacaaaaaacagcctgaggagcattaacttccttactctgcacagtaatccagggttggcttc tgataaccaggaaagcaactctggcagcagcagggaacagcacagctctgagcaccaccagcccaggaggcacaggaaac acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca""" Compute the 6-mers occurrences of the sequence above, and print each 6-mer and it's occurrence one per line. Bonus ^^^^^ Print the k-mers by ordered by occurrences. | see `https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types `_ | see `https://docs.python.org/3/library/operator.html#operator.itemgetter `_ Exercise -------- | Write a function which takes a sequence as parameter and returns it's reversed complement. | Write the pseudocode before to propose an implementation. Test data:: seq = 'acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca' .. _enzyme_exercise: Exercise -------- Build the following enzymes collection:: ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky") ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt") bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky") hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1, "sticky") taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1, "sticky") not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2, "sticky") sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky") hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2, "blunt") sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt") We have two DNA fragments:: dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg""" dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatgctgccctaaactctatgcagcgggcgtgagg attcagtggcttcagaattcctcccgggagaagctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaag agcagcttaaatcggagagaattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg""" Which enzymes cut: * ``dna_1``? * ``dna_2``? * ``dna_1`` but not ``dna_2``? * In a file #. Copy paste the enzymes above #. Write a function *seq_one_line* which takes a multi lines sequence and returns a sequence in one line. #. Write a function *enz_filter* which takes a sequence and a list of enzymes and returns a new list containing the enzymes which have a restriction site in the sequence. #. Use the above functions to compute the enzymes which cut ``dna_1``, apply the same functions to compute the enzymes which cut ``dna_2`` and compute the difference between the enzymes which cut ``dna_1`` and those which cut ``dna_2``. Exercise -------- Given the following dict:: d = {1 : 'a', 2 : 'b', 3 : 'c' , 4 : 'd'} We want obtain a new dict with the keys and the values inverted so we will obtain:: >>> inverted_d {'a': 1, 'c': 3, 'b': 2, 'd': 4} Bonus ----- Can you explain what's happening here? :: >>> x = [] >>> x.append(x.append) >>> x[0](len) >>> x[1](x) 2 >>> x[0](x.pop) >>> x[1](x) 3 >>> x[0](x[2]()) >>> x[1](x) 3 >>> x[2](1)(x) 2 >>> y = x[1](0) >>> y(y)