13   Object Oriented Programming

13.1   Exercises

13.1.1   Exercise

Model a sequence with a few attributes and methods.

class Sequence(object):

    def __init__(self, identifier, comment, seq):
        self.id = identifier
        self.comment = comment
        self.seq = self._clean(seq)


    def _clean(self, seq):
        """
        remove newline from the string representing the sequence
        :param seq: the string to clean
        :return: the string without '\n'
        :rtype: string
        """
        return seq.replace('\n', '')


    def gc_percent(self):
        """
        :return: the gc ratio
        :rtype: float
        """
        seq = self.seq.upper()
        return float(seq.count('G') + seq.count('C')) / len(seq)




dna1 = Sequence('gi214', 'the first sequence', 'tcgcgcaacgtcgcctacatctcaagattca')
dna2 = Sequence('gi3421', 'the second sequence', 'gagcatgagcggaattctgcatagcgcaagaatgcggc')

sequence.py .

13.1.2   Exercise

Instantiate 2 sequences using your Sequence class, and draw a schema representing the namespaces.

[Figure: sequence namespace]
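
One way to check a hand-drawn namespace diagram is to ask Python for the namespaces directly. The sketch below applies the built-in `vars()` to a reduced version of the `Sequence` class; the trimmed class (no `_clean` method) is an assumption made only to keep the example short.

```python
class Sequence(object):
    """Reduced version of the exercise class, for illustration only."""

    def __init__(self, identifier, comment, seq):
        self.id = identifier
        self.comment = comment
        self.seq = seq

    def gc_percent(self):
        seq = self.seq.upper()
        return float(seq.count('G') + seq.count('C')) / len(seq)


dna1 = Sequence('gi214', 'the first sequence', 'tcgcgcaacgtcgcctacatctcaagattca')
dna2 = Sequence('gi3421', 'the second sequence', 'gagcatgagcggaattctgcatagcgcaagaatgcggc')

# Each instance has its own namespace holding only its data attributes.
instance_names = sorted(vars(dna1))

# Methods live in the class namespace, which both instances share.
method_in_class = 'gc_percent' in vars(Sequence)
method_in_instance = 'gc_percent' in vars(dna1)

# The two instances are distinct objects with distinct namespaces.
separate_namespaces = vars(dna1) is not vars(dna2)

print(instance_names, method_in_class, method_in_instance, separate_namespaces)
```

The instance namespaces contain `id`, `comment` and `seq`, while `gc_percent` is found only in the namespace of the class.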

13.1.3   Exercise

Can you explain this result (draw the namespaces to explain it)? How can you modify the class variable class_attr?

class MyClass(object):

    class_attr = 'foo'

    def __init__(self, val):
        self.inst_attr = val




a = MyClass(1)
b = MyClass(2)

print(a.inst_attr)
1
print(b.inst_attr)
2

print(a.class_attr == b.class_attr)
True
print(a.class_attr is b.class_attr)
True

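b.class_attr = 4    # creates an instance attribute on b, the class variable is untouched

print(a.class_attr)
foo
del b.class_attr    # remove the instance attribute, b sees the class variable again

MyClass.class_attr = 4    # modify the class variable through the class itself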

class_attribute.py .
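
The lookup rule behind this result can be checked step by step. The following sketch, which only uses the built-in `vars()` to look inside the instance namespace, shows that assigning through an instance shadows the class variable instead of modifying it.

```python
class MyClass(object):
    class_attr = 'foo'

    def __init__(self, val):
        self.inst_attr = val


a = MyClass(1)
b = MyClass(2)

# Assigning through an instance creates an attribute in that instance's
# namespace; the class variable itself is not modified.
b.class_attr = 4
shadowed = 'class_attr' in vars(b)
seen_by_a = a.class_attr

# Deleting the instance attribute makes the class variable visible again.
del b.class_attr
seen_by_b = b.class_attr

# Assigning through the class changes the value seen by every instance.
MyClass.class_attr = 4
after_class_update = (a.class_attr, b.class_attr)

print(shadowed, seen_by_a, seen_by_b, after_class_update)
```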

13.1.4   Exercise

Write the definition of a Point class. Objects of this class should have:

  • a method show to display the coordinates of the point
  • a method move to change these coordinates.
  • a method dist that computes the distance between 2 points.

Note

The distance between 2 points A(x0, y0) and B(x1, y1) can be computed as:

\[d(AB) = \sqrt{(x1-x0)^2 + (y1-y0)^2}\]

(http://www.mathwarehouse.com/algebra/distance_formula/index.php)
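
As a quick numeric check of the formula, the sketch below computes the distance between A(2, 3) and B(3, 3), the two points used in the example that follows. `math.hypot` from the standard library is used here only as a cross-check; the exercise does not require it.

```python
import math

x0, y0 = 2, 3   # point A
x1, y1 = 3, 3   # point B

# Direct application of the formula.
d = math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2)

# math.hypot(dx, dy) also computes sqrt(dx**2 + dy**2).
d_check = math.hypot(x1 - x0, y1 - y0)

print(d, d_check)
```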

The following Python code provides an example of the expected behaviour of objects belonging to this class:

>>> p1 = Point(2, 3)
>>> p2 = Point(3, 3)
>>> p1.show()
(2, 3)
>>> p2.show()
(3, 3)
>>> p1.dist(p2)
1.0
>>> p1.move(10, -10)
>>> p1.show()
(12, -7)
>>> p2.show()
(3, 3)

import math


class Point(object):
    """Class to handle a point in a 2-dimensional space"""

    def __init__(self, x, y):
        """
        :param x: the value on the X-axis
        :type x: float
        :param y: the value on the Y-axis
        :type y: float
        """
        self.x = x
        self.y = y


    def show(self):
        """
        :return: the coordinate of this point
        :rtype: a tuple of 2 elements (float, float)
        """
        return (self.x, self.y)


    def move(self, x, y):
        """
        :param x: the value to move on the X-axis
        :type x: float
        :param y: the value to move on the Y-axis
        :type y: float
        """
        self.x += x
        self.y += y


    def dist(self, pt):
        """
        :param pt: the point to compute the distance with
        :type pt: :class:`Point` object
        :return: the distance between this point and pt
        :rtype: float
        """
        dx = pt.x - self.x
        dy = pt.y - self.y
        return math.sqrt(dx ** 2 + dy ** 2)

point.py .

13.1.5   Exercise

Use biopython to read a fasta file (sv40.fasta) and display the attributes

  • id
  • name
  • description
  • seq

Use the SeqIO module of Biopython. A tutorial is available at https://biopython.org/wiki/SeqIO

from Bio import SeqIO

sv40_rcd = SeqIO.read("sv40.fasta", "fasta")
print("id =", sv40_rcd.id)
print("name =", sv40_rcd.name)
print("description =", sv40_rcd.description)
print("sequence =", sv40_rcd.seq)

Another example of SeqIO usage: seq_io.py

13.1.6   Exercise

Translate the sequence in phases (reading frames) 1, 2 and -2.

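sv40_seq_phase1 = sv40_rcd.seq
sv40_seq_phase2 = sv40_rcd[1:]
# phase -2: reverse-complement first, then drop the first nucleotide
sv40_seq_phase_2 = sv40_rcd.reverse_complement(id=True)[1:]
# translate each phase with the standard genetic code
proteins = [sv40_seq_phase1.translate(), sv40_seq_phase2.seq.translate(), sv40_seq_phase_2.seq.translate()]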
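
To make the frame numbering explicit, here is a pure-Python sketch that does not rely on Biopython. It assumes the common convention that frames 1 to 3 start at offsets 0 to 2 of the forward strand, and frames -1 to -3 start at offsets 0 to 2 of the reverse complement. The helper names `reverse_complement` and `frame` are hypothetical and the short test sequence is made up.

```python
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return ''.join(COMPLEMENT[base] for base in reversed(dna))


def frame(dna, number):
    """Return the nucleotides read in frame `number` (1..3 or -1..-3),
    truncated to a whole number of codons."""
    strand = dna if number > 0 else reverse_complement(dna)
    start = abs(number) - 1
    usable = len(strand) - start
    return strand[start:start + usable - usable % 3]


dna = 'ATGGCCATTGTA'
frame_1 = frame(dna, 1)
frame_2 = frame(dna, 2)
frame_minus_2 = frame(dna, -2)

print(frame_1, frame_2, frame_minus_2)
```

Each returned string has a length that is a multiple of 3, so it can be cut into codons and translated without a trailing partial codon.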

13.1.7   Exercise

  • Create a sequence with the first 42 nucleotides
  • Translate this sequence
  • Mutate the nucleotide in position 18 ‘A’ -> ‘C’
  • and translate the mutated sequence

see tutorial http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc28

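short_seq = sv40_seq_phase2[0:42]        # SeqRecord slice: the first 42 nucleotides
short_seq.seq.translate()
mutable_seq = short_seq.seq.tomutable()  # Seq objects are immutable
mutable_seq[19] = 'C'                    # Python indices are 0-based
mutate_seq = mutable_seq.toseq()
mutate_seq.translate()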
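
The same idea can be sketched with a plain Python string, which is immutable like a Seq object. The helper `mutate` below is hypothetical and takes a 1-based position, so the conversion to a 0-based index is explicit; the test sequence is made up.

```python
def mutate(seq, position, new_base):
    """Return a copy of `seq` with the nucleotide at 1-based `position`
    replaced by `new_base`."""
    bases = list(seq)               # a str is immutable, a list is not
    bases[position - 1] = new_base  # convert to a 0-based index
    return ''.join(bases)


original = 'ATGGCCATTGTAATGGGAAGCTGA'
mutated = mutate(original, 18, 'C')

print(original)
print(mutated)
```

The original string is left untouched; only the returned copy carries the mutation.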

13.1.8   Exercise

Open the file abcd.fasta (abcd.fasta) and convert it to GenBank format.

Hint: the alphabet attribute of the seq must be set to extended_protein; see the Bio.Alphabet.IUPAC module.

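from Bio import SeqIO
# requires Biopython < 1.78; later releases removed Bio.Alphabet and
# use record.annotations['molecule_type'] = 'protein' instead
from Bio.Alphabet.IUPAC import extended_protein

with open("abcd.fasta", "r") as fasta, open('abcd.gb', 'w') as genbank:
    for record in SeqIO.parse(fasta, "fasta"):
        # the GenBank writer needs to know the type of the sequence
        record.seq.alphabet = extended_protein
        print(len(record.seq))
        SeqIO.write(record, genbank, 'genbank')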

13.1.9   Exercise

Open the file abcd.fasta (abcd.fasta) and filter out the sequences of length <= 700. Write the results in a fasta file.

with open("abcd.fasta", "r") as fasta_in, open("abcd_short.fasta", "w") as fasta_out:
    for record in SeqIO.parse(fasta_in, "fasta"):
        # keep only the sequences longer than 700 residues
        if len(record.seq) > 700:
            SeqIO.write(record, fasta_out, 'fasta')

13.1.10   Exercise

Use OOP to model restriction enzymes and sequences.

The sequence must implement the following method:

  • enzyme_filter, which takes a list of enzymes as argument and returns a new list containing the enzymes which have a binding site in the sequence.

The restriction enzyme must implement the following method:

  • binds, which takes a sequence as argument and returns True if the sequence contains a binding site, False otherwise.

Solve exercise 7.1.3 using this new implementation.

class Sequence(object):

    def __init__(self, identifier, comment, seq):
        self.id = identifier
        self.comment = comment
        self.seq = self._clean(seq)


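    def _clean(self, seq):
        """
        remove newline from the string representing the sequence
        :param seq: the string to clean
        :return: the string without '\n'
        """
        return seq.replace('\n', '')

    def enzyme_filter(self, enzymes):
        """
        :param enzymes: the enzymes to test
        :type enzymes: list of :class:`RestrictionEnzyme` objects
        :return: the enzymes which have a binding site in this sequence
        :rtype: list of :class:`RestrictionEnzyme` objects
        """
        enzymes_which_binds = []
        for enz in enzymes:
            # binds expects a Sequence object, not the raw string
            if enz.binds(self):
                enzymes_which_binds.append(enz)
        return enzymes_which_binds


class RestrictionEnzyme(object):

    def __init__(self, name, binding, cut, end, comment=''):
        self._name = name
        self._binding = binding
        self._cut = cut
        self._end = end
        self._comment = comment

    @property
    def name(self):
        return self._name

    def binds(self, seq):
        """
        :param seq: the sequence to search the binding site in
        :type seq: :class:`Sequence` object
        :return: True if seq contains the binding site, False otherwise
        :rtype: bool
        """
        return self._binding in seq.seq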

enzyme.py .
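
A short usage sketch may help to see how the two classes cooperate. The classes below are reduced versions written for this illustration (only the attributes needed here are kept, and both the binding site and the sequence are upper-cased before comparison, which is an assumption not made in the solution above). The recognition sites used are those of EcoRI (GAATTC) and BamHI (GGATCC).

```python
class RestrictionEnzyme(object):

    def __init__(self, name, binding):
        self.name = name
        self.binding = binding.upper()

    def binds(self, seq):
        """Return True if `seq` (a Sequence) contains the binding site."""
        return self.binding in seq.seq.upper()


class Sequence(object):

    def __init__(self, identifier, seq):
        self.id = identifier
        self.seq = seq.replace('\n', '')

    def enzyme_filter(self, enzymes):
        """Return the enzymes that have a binding site in this sequence."""
        return [enz for enz in enzymes if enz.binds(self)]


ecor1 = RestrictionEnzyme('EcoRI', 'GAATTC')
bamh1 = RestrictionEnzyme('BamHI', 'GGATCC')
dna2 = Sequence('gi3421', 'gagcatgagcggaattctgcatagcgcaagaatgcggc')

binders = [enz.name for enz in dna2.enzyme_filter([ecor1, bamh1])]
print(binders)
```

The sequence asks each enzyme whether it binds, so the knowledge of what a binding site is stays inside `RestrictionEnzyme`.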

13.1.11   Exercise

Refactor your code from exercise 8.1.16 in OOP style. Implement only:

  • size: returns the number of rows and the number of columns
  • get_cell: takes a row number and a column number as parameters, and returns the content of the corresponding cell
  • set_cell: takes a row number, a column number and a value as parameters, and sets the value in the cell specified by row number x column number
  • to_str: returns a string representation of the matrix
  • mult: takes a scalar and returns a new matrix which is the product of the matrix by the scalar

You can change the names of the methods to be more pythonic.

class Matrix(object):

    def __init__(self, row, col, val=None):
        self._row = row
        self._col = col
        self._matrix = []
        for i in range(row):
            c = [val] * col
            self._matrix.append(c)

    def size(self):
        return self._row, self._col

    def get_cell(self, row, col):
        self._check_index(row, col)
        return self._matrix[row][col]

    def set_cell(self, row, col, val):
        self._check_index(row, col)
        self._matrix[row][col] = val

    def __str__(self):
        s = ''
        for i in range(self._row):
            # join the cells of one row, converting each of them to a string
            s += ' '.join(str(cell) for cell in self._matrix[i])
            s += '\n'
        return s

    def _check_index(self, row, col):
        # valid indices are 0-based: 0 .. rows - 1 and 0 .. columns - 1
        if not (0 <= row < self._row) or not (0 <= col < self._col):
            raise IndexError("matrix index out of range")

matrix_obj.py .
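
The `mult` method requested in the exercise is not part of the listing above. A minimal sketch of one possible implementation follows; it uses a reduced `Matrix` class (no index checking, no string representation) so that the example stays short, and it returns a new object rather than modifying the matrix in place.

```python
class Matrix(object):
    """Minimal matrix stored as a list of rows (illustrative sketch)."""

    def __init__(self, row, col, val=None):
        self._row = row
        self._col = col
        self._matrix = [[val] * col for _ in range(row)]

    def size(self):
        return self._row, self._col

    def get_cell(self, row, col):
        return self._matrix[row][col]

    def set_cell(self, row, col, val):
        self._matrix[row][col] = val

    def mult(self, scalar):
        """Return a new Matrix whose cells are the cells of this matrix
        multiplied by `scalar`."""
        result = Matrix(self._row, self._col)
        for i in range(self._row):
            for j in range(self._col):
                result.set_cell(i, j, self.get_cell(i, j) * scalar)
        return result


m = Matrix(2, 3, 1)
m.set_cell(0, 1, 5)
doubled = m.mult(2)

print(doubled.size(), doubled.get_cell(0, 1), m.get_cell(0, 1))
```

Because `mult` builds its result through `set_cell` and `get_cell`, it keeps working if the internal storage of the matrix changes.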

13.1.12   Exercise

Take the procedural-style code that reads a multiple-sequence fasta file and refactor it in OOP style. Use the file abcd.fasta to test your code.

What is the benefit of using OOP style instead of procedural style?

class Sequence(object):

    def __init__(self, id_, sequence, comment=''):
        self.id = id_
        self.comment = comment
        self.sequence = sequence

    def gc_percent(self):
        seq = self.sequence.upper()
        return float(seq.count('G') + seq.count('C')) / float(len(seq))

class FastaParser(object):


    def __init__(self, fasta_path):
        self.path = fasta_path
        self._file = open(fasta_path)
        self._current_id = ''
        self._current_comment = ''
        self._current_sequence = ''

    def _parse_header(self, line):
        """
        parse the header line and set the _current_id, _current_comment and _current_sequence attributes
        :param line: the line of header in fasta format
        :type line: string
        """
        header = line.split()
        self._current_id = header[0][1:]
        self._current_comment = ' '.join(header[1:])
        self._current_sequence = ''

    def __iter__(self):
        return self

    def next(self):
        """
        :return: at each call return a new :class:`Sequence` object
        :raise: StopIteration
        """
        for line in self._file:
            if line.startswith('>'):
                # a new sequence begin
                if self._current_id != '':
                    new_seq = Sequence(self._current_id,
                                       self._current_sequence,
                                       comment=self._current_comment)
                    self._parse_header(line)
                    return new_seq
                else:
                    self._parse_header(line)
            else:
                self._current_sequence += line.strip()
        if not self._current_id and not self._current_sequence:
            self._file.close()
            raise StopIteration()
        else:
            new_seq = Sequence(self._current_id,
                               self._current_sequence,
                               comment=self._current_comment)
            self._current_id = ''
            self._current_sequence = ''
            return new_seq

    __next__ = next  # Python 3 iterator protocol: same method under the name __next__


if __name__ == '__main__':
    import sys
    import os.path

    if len(sys.argv) != 2:
        sys.exit("usage fasta_object fasta_path")
    fasta_path = sys.argv[1]
    if not os.path.exists(fasta_path):
        sys.exit("No such file: {}".format(fasta_path))

    fasta_parser = FastaParser(fasta_path)
    for sequence in fasta_parser:
        print("----------------")
        print("{seqid} = {gc:.3%}".format(gc=sequence.gc_percent(),
                                          seqid=sequence.id))

fasta_object.py .
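
One possible answer to the question about the benefit of OOP style: once the parser is an iterable object that yields `Sequence` objects, it can be passed to any code that expects an iterable (`for`, `list`, `max`, ...) without that code knowing about the fasta format. The sketch below illustrates this with a hypothetical `FastaRecords` class, a variant of the parser that accepts any iterable of lines so that it can be tried on an in-memory `io.StringIO` instead of a file; the two short sequences are made up.

```python
import io


class Sequence(object):

    def __init__(self, id_, sequence, comment=''):
        self.id = id_
        self.comment = comment
        self.sequence = sequence

    def gc_percent(self):
        seq = self.sequence.upper()
        return float(seq.count('G') + seq.count('C')) / len(seq)


class FastaRecords(object):
    """Iterate over the sequences found in an iterable of fasta lines."""

    def __init__(self, lines):
        self._lines = lines

    def __iter__(self):
        seq_id, comment, chunks = None, '', []
        for line in self._lines:
            line = line.strip()
            if line.startswith('>'):
                # a new header: emit the sequence read so far, if any
                if seq_id is not None:
                    yield Sequence(seq_id, ''.join(chunks), comment=comment)
                header = line[1:].split()
                seq_id, comment, chunks = header[0], ' '.join(header[1:]), []
            elif line:
                chunks.append(line)
        # emit the last sequence of the input
        if seq_id is not None:
            yield Sequence(seq_id, ''.join(chunks), comment=comment)


fasta_text = u">seq1 first\nGGCC\nAATT\n>seq2 second\nGGGC\n"
records = list(FastaRecords(io.StringIO(fasta_text)))

# Built-ins work directly on the objects produced by the parser.
richest = max(records, key=Sequence.gc_percent)
print(richest.id, richest.gc_percent())
```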