.. Object_Oriented_Programming:

***************************
Object Oriented Programming
***************************

|
|
|
|
|

We start this chapter with an exercise. Try to model students and classrooms:

* A student must have at least two properties: a name and scores.
* A classroom must have a name and students.

Implement also:

* a function that computes the average of a student's scores;
* another one that computes the average of all students' scores.

|
|
|
|
|
|
|
|
|
|
|
|
|
|

We can use a tuple to pack together "name" and "scores", or "name" and "students":

.. literalinclude:: _static/code/classroom1.py
   :linenos:
   :language: python

But when we ask for the type of ``jim`` or ``math``, Python answers ``tuple``.
We cannot tell programmatically what ``john`` and ``math`` model without reading all the code.
Furthermore, the code of the average functions is not really readable.
To improve readability we can use `namedtuple`\ s:

.. literalinclude:: _static/code/classroom2.py
   :linenos:
   :language: python

This solves both problems: readability and data type.
But now we want to add a new property to a student: its *phone number*.
The problem is that the *phone number* can change during the student's career,
which a tuple does not allow because tuples are immutable.
So we can try to use a mutable data structure, a `dict`:

.. literalinclude:: _static/code/classroom3.py
   :linenos:
   :language: python

It's not so bad, but we lose the data type feature.
Furthermore, we must have two different names for the average of a classroom and the average of a student.
If we model students and classrooms with OOP ("Object Oriented Programming"),
we can pack together the properties and the functions that apply to this data type:

.. literalinclude:: _static/code/classroom4.py
   :linenos:
   :language: python

A **class**, for instance ``Student``, is the recipe to build an object.

* ``john`` is an object; it is an **instance** of ``Student``.
* ``john`` and ``jim`` have the same data type, even though they have distinct values.
* ``average`` is called a **method**. It is a function that applies to a specific data type.

The concept of packing together data (attributes) and behavior (methods) is called **encapsulation**.


Concepts and Terminology
========================

What is an Object?
------------------

In programming, an object is a concept. Calling the elements of our programs objects is a metaphor,
a useful way of thinking about them.
In Python, the basic elements of programming are things like strings, dictionaries, integers, functions, and so on.
They are all objects, which means they have certain things in common.

In the previous chapters we used the procedural style of programming.
This style divides a program into reusable 'chunks' called procedures or functions.
As much as possible, you try to keep your code in these modular chunks, using logic to decide which chunk is called.
This reduces the mental strain of visualising what your program is doing.
It also makes your code easier to maintain: you can see which part does which job,
and improving one function (which is reused) can improve performance in several places of your program.

An object can also model something that is not a real-life object.
For instance, a parser has no equivalent in our daily lives, but we need one to read a file in FASTA format
and create a sequence object, so we can model a parser.
The same goes for a database connection: it is not a real-life object,
but it is very useful to think of a connection as an object with properties, such as the port of the connection
or the destination host, and with behaviors: connect, disconnect, ...

The object is a very simple idea in the computing world.
Objects allow us to organize the code of a program and to cut things into small chunks,
which makes complex ideas easier to think about.


Classes
-------

A class definition can be compared to the recipe to bake a cake. A recipe is needed to bake a cake.
The main difference between a recipe (class) and a cake (an instance or an object of this class) is obvious.
A cake can be eaten once it is baked, but you can't eat a recipe, unless you like the taste of printed paper.
Like baking a cake, an OOP program constructs objects according to the class definitions of the program.
A class contains variables and methods, just as baking a cake requires ingredients and instructions.

In Python, a lot of people use *class*, *data type* and *type* interchangeably.

To create a custom class we use the keyword ``class`` followed by the name of the class.
The code belonging to a class is in the same block of code (indentation)::

   class ClassName:
       suite

   class Sequence:
       code ...

Some positional or keyword parameters can be added between parentheses
(these have to do with a more advanced concept in OOP: **inheritance**)::

   class ClassName(base_classes, metaclass=MyMetaClass):
       suite

.. note::
   :pep:`8`: Class names should normally use the CapWords convention.


Objects
-------

A *class* is a model, a template; an object is an *instance* of this model.
We can use the metaphor of the cake and the recipe.
You bake two cakes by following a recipe.
The class is the recipe, and you have two objects, the two cakes, which are instances of the same recipe.
Each cake has been made with the same ingredients, but the two cakes are independent:
a part of the first can be eaten while the second is still in the fridge::

   # The model
   >>> class Cake:
   ...     pass
   ...
   # apple_pie is an instance of Cake.
   >>> apple_pie = Cake()
   >>> type(apple_pie)
   <class '__main__.Cake'>
   # pear_pie is an instance of Cake.
   >>> pear_pie = Cake()
   >>> type(pear_pie)
   <class '__main__.Cake'>
   # The two objects are not the same.
   >>> apple_pie is pear_pie
   False


Attributes
----------

Data attributes (often called simply **attributes**) are references to data associated with an object.
There are two kinds of attributes: **instance variables** and **class variables**.
An instance variable is directly associated with a particular object, whereas a class variable is associated with a class,
so all objects that are instances of this class share the same variable
(for more details, see the section about namespaces).
We will not encounter a lot of class variables.

We can access an instance variable by its fully qualified name,
using the name of the instance and the name of the attribute separated by a dot.
We can access a class variable using its fully qualified name, either through the class
or through any instance of this class.
Objects are mutable. You can change the state of an object by assigning a value to one of its attributes::

   >>> class Sequence:
   ...     # class variable
   ...     alphabet = 'ATGC'
   ...
   ...     def __init__(self, seq):
   ...         """
   ...         :param seq: the sequence
   ...         :type seq: string
   ...         """
   ...         # instance variable
   ...         self.sequence = seq
   ...
   >>> ecor_1 = Sequence('GAATTC')
   >>> bamh_1 = Sequence('GGATCC')
   >>> print(ecor_1.sequence)
   GAATTC
   >>> print(bamh_1.sequence)
   GGATCC
   >>> print(Sequence.alphabet)
   ATGC
   >>> print(ecor_1.alphabet)
   ATGC
   >>> print(bamh_1.alphabet)
   ATGC
   >>> ecor_1 is bamh_1
   False
   >>> ecor_1.alphabet is bamh_1.alphabet
   True
   >>> Sequence.alphabet = 'ATGCN'
   >>> print(ecor_1.alphabet)
   ATGCN


Methods
-------

In Python, methods are just attributes.
They are special in the sense that they are attributes which can be executed: in Python we say they are **callable**.
A method is bound to an object, which means that the function is evaluated in the namespace of the object (see further).

.. literalinclude:: _static/code/rev_com_obj.py
   :linenos:
   :language: python

You may have noticed the ``self`` parameter in the function definition inside the class,
although we called the method simply as ``ob.func()``, without any argument. It still worked.
This is because, whenever a method is called on an object, the object itself is passed as the first argument.
So ``my_seq.reverse_comp()`` translates into ``Sequence.reverse_comp(my_seq)``.

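
The equivalence between the two call forms can be checked with a minimal sketch.
The ``Sequence`` class below is a simplified, hypothetical stand-in written for this illustration
(it is not the content of ``rev_com_obj.py``), and the names ``via_instance``, ``via_class``
and ``bound_method`` are made up for the example:

.. code-block:: python

   class Sequence:
       """Hypothetical, simplified sequence class used only for this illustration."""

       # class variable: complement of each base
       _complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

       def __init__(self, seq):
           self.sequence = seq

       def reverse_comp(self):
           # self is the object on which the method was called
           return ''.join(self._complement[base] for base in reversed(self.sequence))


   my_seq = Sequence('AAGC')

   # the usual form: Python passes my_seq as the first argument (self)
   via_instance = my_seq.reverse_comp()

   # the explicit form: we pass the object ourselves
   via_class = Sequence.reverse_comp(my_seq)

   # a bound method remembers the object it is bound to
   bound_method = my_seq.reverse_comp

   print(via_instance == via_class)
   print(bound_method.__self__ is my_seq)

Both calls return the same string, and the bound method keeps a reference to ``my_seq``
in its ``__self__`` attribute, which is how Python knows what to pass as ``self``.
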
In general, calling a method with a list of n arguments is equivalent to calling the corresponding function
with an argument list that is created by inserting the method's object before the first argument.

For these reasons, the first parameter of a function defined in a class must be the object itself.
It is **conventionally** called ``self``.
It can be named otherwise, but we highly recommend following the convention.


Special methods
---------------

A class can implement certain operations that are invoked by special syntax
(such as arithmetic operations or subscripting and slicing) by defining methods with special names.
This is Python’s approach to operator overloading,
allowing classes to define their own behavior with respect to language operators.

One of the biggest advantages of using Python's magic methods is that they provide a simple way
to make objects behave like built-in types.
That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operations.
In some languages, it's common to do something like this::

   if my_obj.equals(other_obj):
       # do something

You could certainly do this in Python, too, but this adds confusion and is unnecessarily verbose.
Different libraries might use different names for the same operations,
making the client do way more work than necessary.
With the power of magic methods, however, we can define one method (``__eq__``, in this case),
and say what we mean instead::

   if instance == other_instance:
       # do something

Special methods are defined by the language.
Their names always start and end with double underscores.
There are a ton of special methods in Python.


Overloading the + Operator
^^^^^^^^^^^^^^^^^^^^^^^^^^

To overload the ``+`` sign, we need to implement the ``__add__`` method in the class.
With great power comes great responsibility: we can do whatever we like inside this method.
But it is sensible to return a ``Point`` object holding the sums of the coordinates::

   class Point:
       # previous definitions...
       def __add__(self, other):
           x = self.x + other.x
           y = self.y + other.y
           return Point(x, y)

Now let's try that addition again::

   >>> p1 = Point(2,3)
   >>> p2 = Point(-1,2)
   >>> print(p1 + p2)
   (1,5)


Overloading Comparison Operators in Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python does not limit operator overloading to arithmetic operators. We can overload comparison operators as well.
Suppose we want to implement the less-than operator ``<`` in our ``Point`` class.
Let us compare the magnitudes of the points (their squared distances from the origin) and return the result.
It can be implemented as follows::

   class Point:
       # previous definitions...

       def __lt__(self, other):
           self_mag = (self.x ** 2) + (self.y ** 2)
           other_mag = (other.x ** 2) + (other.y ** 2)
           return self_mag < other_mag

Some sample runs::

   >>> Point(1,1) < Point(-2,-3)
   True
   >>> Point(1,1) < Point(0.5,-0.2)
   False
   >>> Point(1,1) < Point(1,1)
   False

http://www.programiz.com/python-programming/operator-overloading


Comparison magic methods
^^^^^^^^^^^^^^^^^^^^^^^^

Python provides a set of special methods to compare objects:
to use ``>``, ``>=``, ``==``, ``!=``, ``<=`` and ``<``, you have to implement the comparison special methods
(``__gt__``, ``__ge__``, ``__eq__``, ``__ne__``, ``__le__``, ``__lt__``).

*__eq__(self, other)*
   Defines behavior for the equality operator, ``==``.
*__ne__(self, other)*
   Defines behavior for the inequality operator, ``!=``.
*__lt__(self, other)*
   Defines behavior for the less-than operator, ``<``.
*__gt__(self, other)*
   Defines behavior for the greater-than operator, ``>``.
*__le__(self, other)*
   Defines behavior for the less-than-or-equal-to operator, ``<=``.
*__ge__(self, other)*
   Defines behavior for the greater-than-or-equal-to operator, ``>=``.

http://www.python-course.eu/python3_magic_methods.php


``__init__`` method
___________________

To create an object, two steps are necessary.
First a raw or uninitialized object must be created, and then the object must be initialized, ready for use.
Some object-oriented languages (such as C++ and Java) combine these two steps into one, but Python keeps them separate.
When an object is created (e.g., ``ecor_1 = Sequence('GAATTC')``),
first the special method ``__new__()`` is called to create the object,
and then the special method ``__init__()`` is called implicitly to initialize it.

In practice, almost every Python class we create will require us to reimplement only the ``__init__()`` method,
since the default ``__new__()`` method is almost always sufficient
and is automatically called if we don’t provide our own ``__new__()`` method.

Although we can create an attribute in any method, it is good practice to do so in the ``__init__`` method.
This way, it is easy to know which attributes an object has without having to read the entire code of the class::

   class Sequence:

       alphabet = 'ATGC'

       def __init__(self, name, seq):
           """
           :param name: the name of the sequence
           :type name: string
           :param seq: the sequence
           :type seq: string
           """
           self.name = name
           self.sequence = seq
           self.nucleic = True
           for char in self.sequence:
               if char not in self.alphabet:
                   self.nucleic = False
                   break


Namespace and attributes lookup
===============================

The LEGB rule (Local, Enclosing, Global, Built-in) still applies.
But when a class is created, a namespace is created for it.
Furthermore, for each instance of this class, a new namespace corresponding to this instance is created.
There is a link between the namespace of an instance and the namespace of its class.
For instance:

.. container:: clearer

   .. figure:: _static/figs/class_namespace.png
      :alt: class namespace
      :align: right
      :height: 200px

      When a class is created, a namespace is created.

   .. code-block:: python

      class Student:

          school = 'Pasteur'

          def __init__(self, name):
              self.name = name
              self.scores = []

          def add_score(self, val):
              self.scores.append(val)

          def average(self):
              av = sum(self.scores) / len(self.scores)
              return av

.. container:: clearer

   .. image:: _static/figs/object_namespace.png
      :alt: class namespace
      :align: right
      :height: 200px

   .. code-block:: python

      foo = Student('foo')

   When an object is created, a namespace is created.
   This namespace is linked to its respective class namespace.

.. container:: clearer

   .. image:: _static/figs/2_objects_namespace.png
      :alt: class namespace
      :align: right
      :height: 200px

   .. code-block:: python

      foo = Student('foo')
      bar = Student('bar')

   Each object has its own namespace, which is linked to the class namespace.

.. container:: clearer

   .. image:: _static/figs/methods_namespace.png
      :alt: class namespace
      :align: right
      :height: 200px

   .. code-block:: python

      foo.add_score(12)
      bar.add_score(13)
      foo.add_score(15)
      foo.add_score(14)
      bar.add_score(11)

   During the execution of a method, a namespace is created which has a link to the object instance.
   This namespace is destroyed at the end of the method (return).

To see it in action, you can play with the code below.

.. literalinclude:: _static/code/test_namespace.py
   :linenos:
   :language: python

:download:`test_namespace.py <_static/code/test_namespace.py>`


Control the access to the attributes
------------------------------------

with underscore
^^^^^^^^^^^^^^^

“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python.
However, there is a convention that is followed by most Python code:
a name prefixed with an underscore (e.g. ``_spam``) should be treated as a non-public part of the API
(whether it is a function, a method or a data member).
It should be considered an implementation detail and subject to change without notice.


Deleting object
===============

Any attribute of an object can be deleted at any time, using the ``del`` statement.
We can even delete the object itself, using the ``del`` statement.

Actually, it is more complicated than that.
When we do ``s1 = Sequence('ecor1', 'GAATTC')``,
a new instance object is created in memory and the name ``s1`` is bound to it.
On the command ``del s1``, this binding is removed and the name ``s1`` is deleted from the corresponding namespace.
The object, however, continues to exist in memory, and if no other name is bound to it,
it is later automatically destroyed.
This automatic destruction of unreferenced objects in Python is also called garbage collection.


Inheritance
===========

In the introduction, I mentioned that one of the objectives of OOP is to address some of the issues of software quality.

What we have seen so far, object-based programming, consists in designing programs with objects that are built from classes.
In most object-oriented programming languages, you also have a mechanism that enables classes to share code.
The idea is very simple: whenever there are commonalities between classes, why have them repeat the same code,
leading to maintenance complexity and potential inconsistency?
So the principle of this mechanism is to define some classes as being the same as other ones, with some differences.

In the example below we have designed two classes to represent two kinds of sequences: DNA and AA.
As we can see, the two kinds of sequences have the same attributes (name and sequence)
and some common methods (len, to_fasta), but they also have some specific methods: gc_percent and molecular_weight.

.. literalinclude:: _static/code/sequence.py
   :linenos:
   :language: python

The problem with this implementation is that a large part of the code is the same in the two classes.
This is bad, because if I have to modify a part of the common code, I have to do it twice.
If in the future I need a new type of sequence, such as an RNA sequence, I will have to duplicate the code again, and so on.
The code will be hard to maintain.
I need to keep the common code together, and to specify only what is specific to each type of sequence.

So we keep our two classes to deal with DNA and protein sequences, and we add a new class, Sequence,
which will be a common class to deal with general sequence functions.
The way to specify in Python that a DNA (or a protein) is a Sequence is:

.. literalinclude:: _static/code/inheritance_sequence.py
   :linenos:
   :language: python

.. image:: _static/figs/inheritance_namespace.png
   :alt: inheritance namespace
   :align: left
   :height: 500px

.. container:: clearer

   .. image:: _static/figs/spacer.png

* Each object has its own namespace, which is linked to the class namespace via the special attribute ``__class__``.
* Each class is linked to the namespaces of its parents via the special attribute ``__bases__``,
  and so on up to the namespace of the ``object`` class.

.. code-block:: python

   pep_1 = AASequence('pep_1', 'GIVQE')
   bar = DNASequence('Ecor I', 'GAATTC')

.. container:: clearer

   .. image:: _static/figs/spacer.png


Overloading
-----------

Overloading an attribute or a method (this is also commonly called *overriding*) means redefining,
at the subclass level, an attribute or a method that exists in upper classes of the class hierarchy.

.. literalinclude:: _static/code/overloading.py
   :linenos:
   :language: python

When we overload a method, sometimes we just want to add something to the parent's method.
In this case we can explicitly call the parent's method by using the built-in ``super``.
Its syntax is a little bit tricky.
The first argument must be the class whose parent you want to retrieve (usually the class you are coding),
and the second argument is the object for which you want the parent class (usually ``self``).
``super`` returns a proxy to the parent, so you just have to call the method on it.
To see it in action, in the example below we overload the ``__init__`` method and just add two attributes,
but for the name and the sequence we call the ``__init__`` method of ``Sequence``.

.. literalinclude:: _static/code/super1.py
   :linenos:
   :language: python

In Python 3 the syntax has been simplified: inside a method we can just call ``super()`` without any argument.


Polymorphism
============

The term polymorphism, in the OOP lingo, refers to the ability of an object to adapt the code
to the type of the data it is processing.
Polymorphism has two major applications in an OOP language.
The first is that an object may provide different implementations of one of its methods
depending on the type of the input parameters.
The second is that code written for a given type of data may be used on data of another type,
as long as the other data have a compatible behavior.

.. literalinclude:: _static/code/polymorphism.py
   :linenos:
   :language: python

Although data are typed, the ``my_sum`` function works equally well on different types
such as integers, strings or sequences.
The ``my_sum`` function is said to be polymorphic.


Exercises
=========

Modeling a sequence with a few attributes and methods

Exercise
--------

Instantiate 2 sequences using your Sequence class, and draw a schema representing the namespaces.

Exercise
--------

.. literalinclude:: _static/code/class_attribute.py
   :linenos:
   :language: python

Can you explain this result (draw the namespaces to explain it)?
How can you modify the class variable *class_attr*?

Exercise
--------

Write the definition of a Point class. Objects from this class should have:

* a method **show** to display the coordinates of the point
* a method **move** to change these coordinates.
* a method **dist** that computes the distance between 2 points.

.. note::
   The distance between 2 points A(x0, y0) and B(x1, y1) can be computed as follows:

   .. math::

      d(AB) = \sqrt{(x1-x0)^2 + (y1-y0)^2}

   (http://www.mathwarehouse.com/algebra/distance_formula/index.php)

The following Python code provides an example of the expected behaviour of objects belonging to this class::

   >>> p1 = Point(2, 3)
   >>> p2 = Point(3, 3)
   >>> p1.show()
   (2, 3)
   >>> p2.show()
   (3, 3)
   >>> p1.dist(p2)
   1.0
   >>> p1.move(10, -10)
   >>> p1.show()
   (12, -7)
   >>> p2.show()
   (3, 3)

Exercise
--------

Use biopython to read a fasta file (:download:`sv40.fasta <_static/data/sv40.fasta>`)
and display the following attributes:

* id
* name
* description
* seq

Use the module ``SeqIO`` from biopython.
A tutorial is available here: https://biopython.org/wiki/SeqIO

Biopython is not part of the default Python distribution. You will likely need to install it::

   $ pip install biopython

.. note::
   The "$" character is here to indicate that the above command is to be typed
   in a command-line interface (shell). Do not type the "$" itself.

If the above doesn't work, you may try to install packages by forcing your specific version of Python
to load pip and perform the install::

   $ python3.9 -m pip install biopython

Exercise
--------

Translate the sequence in phase 1, 2, -2

Exercise
--------

* Create a sequence with the first 42 nucleotides
* Translate this sequence
* Mutate the nucleotide in position 18 'A' -> 'C'
* and translate the mutated sequence

See the tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc28

Exercise
--------

Open the file abcd.fasta (:download:`abcd.fasta <_static/data/abcd.fasta>`) and convert it to genbank format.

**Hint**: The seq alphabet attribute must be set to extended_protein (see the ``Bio.Alphabet.IUPAC`` module).
Note that ``Bio.Alphabet`` was removed in Biopython 1.78; with recent versions,
set ``record.annotations["molecule_type"]`` to ``"protein"`` instead.

Exercise
--------

Open the file abcd.fasta (:download:`abcd.fasta <_static/data/abcd.fasta>`)
and filter out sequences of length <= 700.
Write the results in a fasta file.

Exercise
--------

Use OOP to model restriction enzymes and sequences.

The sequence must implement the following methods:

* ``enzyme_filter`` which takes a list of enzymes as argument and returns a **new** list
  containing the enzymes which have a binding site in the sequence

The restriction enzyme must implement the following methods:

* ``binds`` which takes a sequence as argument and returns ``True`` if the sequence contains a binding site,
  ``False`` otherwise.

Solve the exercise :ref:`enzyme_exercise` using this new implementation.

Exercise
--------

Refactor your code of :ref:`matrix_exercise` in OOP style programming.
Implement only:

* **size**: returns the number of rows and the number of columns
* **get_cell**: takes a row number and a column number as parameters,
  and returns the content of the corresponding cell
* **set_cell**: takes a row number, a column number and a value as parameters,
  and sets the value in the cell specified by row number x column number
* **to_str**: returns a string representation of the matrix
* **mult**: takes a scalar and returns a new matrix which is the product of the matrix by the scalar

You can change the names of the methods to be more "pythonic".

Exercise
--------

Take the procedural-style code that reads a fasta file containing multiple sequences and refactor it in OOP style.
Use the file :download:`abcd.fasta <_static/data/abcd.fasta>` to test your code.

What is the benefit of using OOP style instead of procedural style?