Introduction to Python Programming TC, BN, JBM, AZ
Institut Pasteur, Paris, 20-31 March 2017

Overview of data structures

Sequences

Sequence types share common operations. We have already encoutered the string type. All the operations below can be used on strings and other sequence types, which we will cover soon.

Operation Result
x in s True if an item of s is equal to x, else False
x not in s False if an item of s is equal to x, else True
s + t the concatenation of s and t
s * n ( or n * s) equivalent to adding s to itself n times
s[i] ith item of s, origin 0
s[i:j] slice of s from i to j
s[i:j:k] slice of s from i to j with step k
len(s) length of s
min(s) smallest item of s
max(s) largest item of s
s.index(x[, i[, j]]) index of the first occurrence of x in s (at or after index i and before index j)
s.count(x) total number of occurrences of x in s

Example1: get the length of a sequence

In [38]:
s = "Protein"
len(s)
Out[38]:
7

Example2: add (concatenate) two strings

In [39]:
new_string = "abc" + "def"
new_string
Out[39]:
'abcdef'

Accessing items and slicing sequences

  • Accessing to an item uses the square brackets syntax
  • Index starts at 0
  • Index ends at N-1 , where N is the length of the sequence
- [index]   get the item at position defined by index
In [41]:
s = "Protein"
print(s[0])
print(s[3])
P
t
  • Slicing can access to a subset of the sequence (not just one item)
  • Slicing uses the same square brackets syntax as above and should contain a : character. See syntax here below
- [start:end:step]   most general slicing from start to end (excluded) with a step
- [start:end]       (step=1)
- [start:]          (step=1, until the sequence end)
- [:]               (start=0, until the sequence end, step=1)
- [::2]             (start=0, until the sequence end, step=2)
- [start:end:step]   most general slicing from start to end
- [start:end]       (step=1)
- [start:]          (step=1, until the sequence end)
- [:]               (start=0, until the sequence end, step=1)
- [::2]             (start=0, until the sequence end, step=2)
In [47]:
# Then [start:end] syntax
s[1:4]
Out[47]:
'rot'
In [48]:
# If the end is not provided, its default is -1 (up to the end)
s[2:]
Out[48]:
'otein'
In [49]:
# it the start is omitted, it defaults to 0
s[:2]
Out[49]:
'Pr'
In [50]:
# We can omit start and end
s[:]
Out[50]:
'Protein'
In [51]:
# Or select every two elements
s[::2]
Out[51]:
'Poen'

In Python, indices can also be negatives. Consider this example on the string Protein.

In [36]:
# slice with negative 
s[-3:-1]
Out[36]:
'ei'
In [33]:
s[1:-1:2]
Out[33]:
'rti'
In [14]:
s[::2]
Out[14]:
'Poen'
In [15]:
s[::-2]
Out[15]:
'neoP'
In [16]:
s[::-1]
Out[16]:
'nietorP'

Strings are immutable sequences

In [52]:
s = "Protein"
s[3] = 'Z'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-a3b036d7c835> in <module>()
      1 s = "Protein"
----> 2 s[3] = 'Z'

TypeError: 'str' object does not support item assignment
In [58]:
# to change an item, you got to create a new string
s2 = s[:-1] + 'c'
print(s2)
Proteic

String methods

In [58]:
print(dir("")[32:])
['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Well, that's a lot ! Here are some useful ones:

Method Example
find s1.find("GGG")
index s1.index("GGG")
replace s1.replace("T", "U")
lower s1.lower()
split s1.split("\n")
endswith s1.endswith("n")
startswith s1.startswith("n")
strip s1.strip()

The split and join method (on string sequences)

There are many methods that can be used with strings. A pair of methods s worth mentionning here: the split and join.

Consider the following sentence. You want to extract each word. The unique separator is a space.

In [126]:
sentence = "This is amazing "   # note the extra space
In [72]:
# By default, the split method splits the string on whitespace
words = sentence.split()
words
Out[72]:
['This', 'is', 'amazing']

The returned object is a sequence of strings. It is a new type of sequence: the list, which will be covered later.

You can now join back the 3 strings. You can join them with a space to recover the orignal sentence, or another character (e.g. return carriage, denoted \n) or a longer string:

In [65]:
" -- ".join(words)
Out[65]:
'This -- is -- amazing'

The in and not keywords

In [173]:
sequence = "ACGTAAAA"
"N" in  sequence
Out[173]:
False
In [174]:
"A" in sequence
Out[174]:
True
In [175]:
"A" not in sequence
Out[175]:
False
In [176]:
not True
Out[176]:
False

String manipulation

In [68]:
data = open("data/P56945.fasta").read()
  1. The data variable is a string. It contains a header and a protein sequence. Print the data. Not the return carriage. Extract the header (first line) and the entire sequence without the \n signs
  2. Get the length of the sequences.
  3. Get position (index) of the accession number (P56945) from the header
  4. Extract the accession number
  5. Extract the uniprot identifier (BCAR1_HUMAN)
  6. What are the starting position of the following sequences: PPAEREAPAE and PPPAAA
In [125]:
sv40 = open("data/sv40.fasta").read()
  1. The sv40 variable is a string. genome sequence of sv40.
  2. Get the length of the sequences.
  3. Can you find following recognition sites in the sequence ?:
    • BamHI (ggatcc),
    • EcorI (gaattc),
    • HindIII (aagctt),
    • SmaI (cccggg) (answer by True or False)?
  4. for the enzymes which have a recognition site can you give their positions? is there only one site in sv40 per enzyme?
  5. compute the GC percent of sv 40

We want to perform a PCR on sv40, can you give the length and the sequence of the amplicon?

primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'

to simplify the exercise, the 2 primers can be read directly on sv40 sequence forward strand.

The tuple sequence

A tuple is an ordered and immutable sequence of arbitrary items

To create a tuple, use:

  • the tuple( ) builtin function
  • a comma after an item
  • parentheses with at least 1 comma

Examples

In [139]:
(1, 2, 3)
Out[139]:
(1, 2, 3)
In [138]:
tuple([1,'a',3])
Out[138]:
(1, 'a', 3)
In [79]:
(1) # Note: braces does not create a tuple, only the comma does.
Out[79]:
1
In [80]:
(1,)
Out[80]:
(1,)

Immutable

In [384]:
t = (1,2,3)
t[0] = 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-384-1cff98eb84aa> in <module>()
      1 t = (1,2,3)
----> 2 t[0] = 1

TypeError: 'tuple' object does not support item assignment

Some operators

In [82]:
(1,) * 5
Out[82]:
(1, 1, 1, 1, 1)
In [4]:
# You can actually append item with the augmented operator 
# but += creates a new object
t1 = (1, 0)
t1 += (1, 2)
t1
Out[4]:
(1, 0, 1, 2)
In [5]:
t1 *= 2
t1
Out[5]:
(1, 0, 1, 2, 1, 0, 1, 2)

Tuple methods

In [130]:
dir(tuple())[iN:]
Out[130]:
['count', 'index']

Only two methods !! Much simpler than strings because tuples are optimised for speed

In [131]:
t = (1, 2, 1, 2, 2, 2, 7, 5)
t.count(2)
Out[131]:
4
In [132]:
t.index(5)
Out[132]:
7

Tuples can be use to swap elements

In [134]:
a = 2
b = 10
In [135]:
a, b, = b, a
In [136]:
a, b
Out[136]:
(10, 2)

The list sequence

A list is an ordered and mutable sequence of arbitrary items

To create a list, use:

  • the list( ) builtin function
  • square brackets [ ]

Examples

In [149]:
l2 = [1, 'a', 3]    # # you can have any kind of mixed objects in a lists.
print(l2)
[1, 'a', 3]
In [146]:
# slicing and indexing like for strings are available
l2[1]
Out[146]:
'a'
In [147]:
l2[0::2]
Out[147]:
[1, 3]
In [148]:
list("sequence")
Out[148]:
['s', 'e', 'q', 'u', 'e', 'n', 'c', 'e']

Lists are Mutable sequences

In [150]:
l2 = [1, 'a', 3]   
l2[1] = 2
l2
Out[150]:
[1, 2, 3]

Some handy operators

In [142]:
[1, 2] + [3, 4]
Out[142]:
[1, 2, 3, 4]
In [143]:
[1] * 10
Out[143]:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In [152]:
ll = [1,2,3]
ll += [4,5,6]
ll
Out[152]:
[1, 2, 3, 4, 5, 6]

List methods

In [111]:
print(dir(methods)[iN:])
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

Lists have several methods amongst which the append and extend methods. The former appends an object to the end of the list (e.g., another list) while the later appends each element of the iterable object (e.g., another list) to the end of the list.

In [73]:
stack = ['a','b']
stack.append('foo')
stack
Out[73]:
['a', 'b', 'foo']

However, if we want to append several objects contained in a list, the result as expected (or not...) is:

In [366]:
stack.append(['d', 'e', 'f'])
stack
Out[366]:
['a', 'b', 'c', ['d', 'e', 'f']]

To include each element, you should use the extend method instead of append

In [149]:
stack = ['a', 'b', 'c']
stack.extend(['d', 'e','f'])
stack
Out[149]:
['a', 'b', 'c', 'd', 'e', 'f']
In [74]:
stack = ['a','b']
stack.extend('foo')
stack
Out[74]:
['a', 'b', 'f', 'o', 'o']
a list in a list is not extended ! Only the top-level list is.
In [369]:
stack = ['a', 'b', 'c']
stack.extend(['d', 'e', ['f']])
stack
Out[369]:
['a', 'b', 'c', 'd', 'e', ['f']]

Tuples VS lists

  1. Tuples are faster than list (see examples at the end of this notebook)
  2. Tuples protect the data (immutable)
  3. Tuples can be used as keys on dictionaries, not lists.
  4. Tuples can be added to a set (see later), not lists.

List practical session

  1. Create a list of 40 letters made of 10 A, 10 C, 10 G, 10 T
  2. Select elements between index 10 (included) and 20 (included). What is the length of the new list. Check you only have C letters.
  3. Select only the G letters using slicing
  4. count the number of T
  1. Create two lists [1,2,3,4,5] and [6,7,8,9]
  2. concatenate them using the + operator. concatenate them using the extend method. What the fundamental difference ?

From the list l = [1, 2, 3, 4, 5, 6, 7, 8, 9] generate 2 lists l1 containing all odd values, and l2 all even values.

l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l1 = lxxx
l2 = lxxx
print(l1)
[1,3,5,7,9]
print(l2)
[2,4,6,8]

Given

l = [1, 2, "a"]
  t = (1,2)

check that a is included in the list. What about the tuple t, is it included in l ?

The dictionary sequence (dict)

a dictionary is an unordered and mutable collections of key–value items

  • It is equivalent to a mapping type where we associate an item to a key
  • Dictionary are fast because a key provide a direct access to its corresponding item (the value), without the need to iterate through the entire collection
Dictionaries are unordered.

Examples

In [154]:
d = {}   # empty dictionary
In [155]:
d = {'first':1, 'second':2} # initialise a dictionary
In [156]:
# access to value given a key:
d['first']
Out[156]:
1
In [157]:
# add a new pair of key/value:
d['third'] = 3
In [117]:
# what are the keys ?
d.keys()
Out[117]:
dict_keys(['first', 'third', 'second'])
In [118]:
# what are the values ?
d.values()
Out[118]:
dict_values([1, 3, 2])
Keys are unique and must be immutable (integer, string, tuples)

Mutable

In [162]:
d['third'] = [3,3,3,3]
d
Out[162]:
{'first': 1, 'second': 2, 'third': [3, 3, 3, 3]}

Dictionary methods

In [168]:
d = {}
methods = dir(d)
print(methods[28:])
['clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

Dictionary Practical session

In [171]:
data = open("data/P56945.fasta").read()
  • Extract the entire sequence as a single string (already done, but do it again)
  • Count the number of A, C, G, T
  • Create a dictionary d1 with four keys: A, C, G, T and their corresponding number of occurences
  • Create a new dictionary d2 with one item whose key is "N" and value is 0
  • Create a new dictionary that combines d1 and d2
  • Starting from
    d3 = {"a":1, "b":2}
    d4 = d3
    Change the content of d4 (e.g., change the value corresponding to the key a). Check the content of d3. Did it change ?
  • Using d3 from above, try to access to a key that does not exist (e.g., c). What is happening ? How to check that c does not exit so as to avoid the error ?
  • Try to create a dictionary with a list [1,2] as a key. Do the same with a tuple.

set sequences

A set is an unordered and mutable sequence of unique items

  • Sets are mutable, so we can easily add or remove items.
  • items are unique
  • Since sets are unordered they have no notion of index position (no slicing)

The syntax to create a set can be the function set or curly braces {}

In [179]:
a = {1, 2, 3, 4, 4}
a
Out[179]:
{1, 2, 3, 4}
In [396]:
a = set([1, 2, 3, 4, 4])  # input is a list
a
Out[396]:
{1, 2, 3, 4}
In [180]:
b = set((3, 4, 5, 6))    # input is a tuple
b
Out[180]:
{3, 4, 5, 6}

sets are mutable

In [183]:
s = set()
print(s)
s.add('a')
print(s)
s.add('b')
print(s)
s.add((1,2))
print(s)
set()
{'a'}
{'a', 'b'}
{'a', 'b', (1, 2)}

set also support set operators: Union, Intersection, Difference, Symetric difference

operator description
| for union
& for intersection
< for subset
- for difference
^ for symmetric difference
In [398]:
a | b
Out[398]:
{1, 2, 3, 4, 5, 6}
In [399]:
a & b
Out[399]:
{3, 4}
In [400]:
a < b
Out[400]:
False
In [171]:
a - b
Out[171]:
{1, 2}
In [401]:
a ^ b
Out[401]:
{1, 2, 5, 6}

These operators are available as methods as well. E.g,

In [402]:
a.difference(b)
Out[402]:
{1, 2}
In [184]:
a.union(b)
Out[184]:
{1, 2, 3, 4, 5, 6}

frozensets (for information)

Sets are mutable, and may therefore not be used, for example, as keys in dictionaries.

Another problem is that sets themselves may only contain immutable (hashable) values, and thus may not contain other sets.

Because sets of sets often occur in practice, there is the frozenset type, which represents immutable (and, therefore, hashable) sets.

In [188]:
a = frozenset({1,2,3})
a
Out[188]:
frozenset({1, 2, 3})
In [189]:
a = set([1,2,3])
b = set([2,3,4])
# a.add(b)  # this is NOT possible since b is mutable
a.add(frozenset(b))
a
Out[189]:
{1, 2, 3, frozenset({2, 3, 4})}

A note about copying collections

  • everything is an object and therefore references are used !!
  • consider the following example
In [191]:
a = [1, 2, 3]
b  = a # is a reference
b[0] = 10
print(a)
[10, 2, 3]

This behaviour is source of errors for beginners. Keep that in mind.

In some situations, we really do want a separate copy of the item, or collections of objects.

With sequences, you can use slicing. Indeed, the slice is always an independent copy of the items copied.

To copy an entire sequence we can do this:

ascii = ['a','b','c']
ascii_copy = ascii[:]

For dictionaries and sets, copying can be achieved using dict.copy( ) and set.copy( ).

shallow versus deep copy

Using slicing to copy the list: no surpise

In [186]:
ascii = ['a','b','c']
ascii_copy = ascii[:] # shallow copy
ascii[2] = 'z'
print("ascii", ascii)
print("ascii_copy", ascii_copy)
ascii ['a', 'b', 'z']
ascii_copy ['a', 'b', 'c']

If an item is a sequence itself (here ['c', 'd']), it is copied but the items inside are independent objects, so they are not copied (reference).

In [188]:
ascii = ['a','b',['c', 'd']]
ascii_copy = ascii[:]
ascii_copy[2][0] = "eeee"
print("ascii", ascii)
print("ascii_copy", ascii_copy)
ascii ['a', 'b', ['eeee', 'd']]
ascii_copy ['a', 'b', ['eeee', 'd']]

Solution: use copy module and the deepcopy function

In [190]:
import copy
ascii = ['a', 'b', ['c', 'd']]
ascii_copy = copy.deepcopy(ascii)
ascii_copy[2][0] = "eeee"
ascii
Out[190]:
['a', 'b', ['c', 'd']]

Writing into a file

In [439]:
mysequence = """>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA
"""
In [440]:
# First, we open a file in "w" write mode
fh = open("mysequence.fasta", "w")
In [441]:
# Second, we write the data into the file:
fh.write(mysequence)
Out[441]:
992
In [442]:
# Third, we close:
fh.close()

Reading data from a file

In [443]:
# First, we open the file in read mode (r)
fh = open('mysequence.fasta', 'r')
In [444]:
# Second, we read
data = fh.read()
In [445]:
# Third we close
fh.close()
In [448]:
# data is now a string that contains the content of the file being read
print(data)
>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA

File manipulation

  1. Read data/P56945.fasta file and save the content in a variable.
  2. Save into another file. Read back the new file and get the sequence; save into seq2
  3. check that seq1 and seq2 are identical
  4. count the number of A (88)
  5. count the number of C, G, T and save into a dictionary where letters are keys and counts are the values

For advanced users: get at least two different uniprot identifiers and download the sequences from uniprot website (see first slides).
Get the ACGT letter counts for the different sequences.
Save results as dictionary of dictionaries so that we can get the number of A of uniprot ID1 as results['ID1']['A']

The with keyword

  • When dealing with files, it is important to close the file handler.
  • A special keyword with can be used and you will see the following syntax quite often
  • Once used with the with keyword, the file handler will be automatically closed.
  • Here is the syntax
In [218]:
with open("mysequence.fasta", "r") as fh:
    data = fh.read()
data
Out[218]:
'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2\nMNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'

There is some magic here since fh has been closed automatically. Ther with keywords is a Context Manager.

The del keyword

In [220]:
my_variable = 1
my_variable
Out[220]:
1
In [221]:
del(my_variable)
my_variable
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-221-bdf894ac7f19> in <module>()
      1 del(my_variable)
----> 2 my_variable

NameError: name 'my_variable' is not defined

Summary

  • Python language uses indentation to create block of statements. The convention is to use 4 spaces (not tabulation)
  • We've seen some Scalar types: float, complex, long
  • Scalar types can be created on the fly or with built-in functions:
 var_int = 1
 var_int = int(1.1)
  • Variable names can be made of characters and underscores but avoid starting with underscores except if you know what you are doing
  • Python has reserved keywords that cannot be used as variable name. E.g:

    from, import, as, print, del, with, in, not

  • dot operator: allows one to access to content of any object
  • Python has a few built-in functions.
    • print is one of the most used built-in function
    • int( ) , float( ), tuple( ), list( ), dir( ), set( ) , dict( ), open( ), close( ) ...
  • We've seen how to import a module using the following keywords: import/from/as
  • del removes an object from the environment.
  • We've seen the most important data structures in Python:

    1. lists are mutable sequences:

      l = [1,2,3]

    2. tuples are immutable sequences:

      t = (1,2,3)

    3. dictionaries are hash tables :

      d = {'a': 1, 'b': 2}

    4. sets are unordered sequences with unique values

      s = set([1,2,3])

  • Be aware that mutable objects (e.g., list) are copied by reference:
a = [1,2,3]
b = a

When changing b you are also changing a

If you really want a different object, copy it for instance by slicing its contents:

b = a[:]
  • Use tuple instead of list whenever possible (more efficient)
  • When you want to use membership operator in, use set instead of list whenever possible (more efficient).