2. Variables, Expressions and Statements

The source code of a Python program is made of a series of instructions describing the computations to execute.
The description of computations is made of statements.
Some statements describe a value, or a computation that has a value.
These are called expressions.

Values can be of different types, which represent the kinds of things the program has to deal with: text, numbers, more complex data structures, files, etc.

In this chapter, we will briefly mention some of the most used simple data types in Python, and how they can be referred to: as object references.

2.1. Variables and Object References

2.1.1. Value and type

A value is one of the basic things a program works with, like a letter or a number. We have seen one value so far: "Hello, World!", which is a string (Strings), so-called because it contains a “string” of letters. Strings are recognized as such by the interpreter by the fact that they are enclosed in (matching) quotation marks (' or ").

Other types of value exist, among which some very basic ones represent numbers, like 17 or 3.2 (without quotes).

If you are not sure what type a value has, the type function gives you the answer:

>>> type("Hello, World!")
<class 'str'>
>>> type(17)
<class 'int'>

Not surprisingly, strings belong to the type str and integers belong to the type int (Integers). Less obviously, numbers with a decimal point belong to a type called float, because these numbers are represented in a format called “floating point” (Floats):

>>> type(3.2)
<class 'float'>

What about values like "17" and "3.2"? They look like numbers, but they are between quotation marks, like strings.

>>> type("17")
<class 'str'>
>>> type("3.2")
<class 'str'>

They are strings.

It is also possible to check if a value is of a specific type using the isinstance function.
This function takes two arguments, separated by a comma.
The first one is the value we want to test, the second is the particular type of value we’re interested in:

>>> isinstance("2", str)
True
>>> isinstance(3.2, int)
False

The True or False answer given by the interpreter actually belongs to yet another type of value: bool (for Boolean), as we can see when we apply the type function to it:

>>> type(isinstance(3.2, int))
<class 'bool'>

We can also directly type Boolean values in the interpreter:

>>> False
False
>>> True
True
>>> type(True)
<class 'bool'>

When you type a large integer, you might be tempted to use commas or spaces between groups of three digits, as in 1,000,000. This is not a legal integer in Python, but the syntax is valid:

>>> 1,000,000
(1, 0, 0)

Well, that’s not what we expected at all! Python interprets 1,000,000 as a comma-separated sequence of integers. This is the first example of a semantic error: the code runs without producing an error message, but it doesn’t do the right thing.
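
As a small sketch of what happened (the name value is ours, chosen only for illustration), we can store the result and ask for its type. The commas make Python build a tuple, a type that is covered later with the other collections:

```python
# The commas make Python build a tuple of three integers.
value = 1,000,000

print(value)        # (1, 0, 0)
print(type(value))  # <class 'tuple'>
print(len(value))   # 3
```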

Note

Starting from Python 3.6, you can use _ to visually separate groups of digits:

>>> 1_000_000
1000000

To convert a data item from one type to another, we can use the syntax datatype(item). For instance:

>>> int("45")
45
>>> str(45)
'45'

The int conversion tolerates leading and trailing whitespace, so int(" 45 ") would have worked just as well. The str conversion can be applied to almost every data item.
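
The following lines are a minimal sketch of these two conversions. The variable name number and the sample values are only illustrative:

```python
# int() ignores whitespace around the digits.
number = int("  45  ")
print(number)                  # 45

# str() works on numbers, Booleans and many other objects.
print(str(3.2))                # 3.2
print(str(True))               # True
print(str(number) + " items")  # 45 items
```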

It is possible to convert one type of number into another:

>>> float(17)
17.0
>>> int(3.2)
3

Note that the conversion from float to int truncates the number: the fractional part is simply discarded.
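
Truncation is not the same as rounding. The sketch below compares int with the built-in round function, which is not otherwise used in this chapter. The variable names are ours:

```python
# int() drops the fractional part, whatever its size.
truncated_positive = int(3.9)
truncated_negative = int(-3.9)
print(truncated_positive)  # 3
print(truncated_negative)  # -3 (truncation goes toward zero)

# round() returns the nearest integer instead.
rounded = round(3.9)
print(rounded)             # 4
```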

If a conversion fails, an exception is raised (more about Exception Handling later):

>>> int("Hello World!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'Hello World!'

2.1.2. Variables and Object references

Once we have data items (or values), the next thing we need is variables in which to store them. A variable is a name that refers to a value. One of the most powerful features of a programming language is the ability to manipulate variables.

In Python, everything is an object: an int, a string, a function, or even a type:

>>> isinstance(3, object)
True
>>> isinstance("blue", object)
True
>>> isinstance(print, object)
True
>>> isinstance(float, object)
True

And every object has a type, even functions and types:

>>> type(print)
<class 'builtin_function_or_method'>
>>> type(float)
<class 'type'>

Note

An object is “something” that packs together:

  • a state, for instance the value 3 for the int or "blue" for the string;

  • and a behavior: a set of methods (the operations that we can do on this object).

For instance "blue" is the state, upper is a method applied to the value “blue”:

>>> "blue".upper()
'BLUE'

The available methods depend on the type of the object.
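
As an illustration, here is a short sketch comparing a str and an int. It relies on the built-in hasattr function and on the int method bit_length, neither of which is introduced in this chapter. The names text and number are ours:

```python
text = "blue"
number = 3

print(text.upper())              # BLUE
print(number.bit_length())       # 2 (3 is written 11 in binary)

# hasattr() tells whether an object has an attribute or method with a given name.
print(hasattr(text, "upper"))    # True
print(hasattr(number, "upper"))  # False: int objects have no upper method
```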

Strictly speaking, Python does not have variables as such, but object references.

Depending on the type of object (mutable or not), there may be a subtle difference between the two concepts, but it rarely matters in practice. We will therefore use the terms “variable” and “object reference” interchangeably.

Let’s look at a few examples and see what happens in detail:

x = 3
color = "green"
y = color

The syntax is simply object_reference = value. This is what happens when the above code is executed:

  • The first statement creates an int object with the value 3 and creates the object reference called x that refers to the int object. For all practical purposes, we say that variable x has been assigned the integer 3.

  • The second statement is similar: variable color has been assigned the string "green".

  • The third creates a new object reference y and sets it to refer to the same object that the color object reference refers to (in this case the string object containing the value "green").

Let’s see what Python does behind the scenes:

[Figure: object references]
The circles represent the object references.
The rectangles represent the objects in memory.
The = operator is not the same as the variable assignment operator in some other languages.
The = operator binds an object in memory to an object reference. If the object reference already exists, it is simply re-bound to refer to the object on the right of the = operator; if the reference does not exist, it is created by the = operator.
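
One way to observe this binding, sketched here with the is operator and the built-in id function (neither has been introduced so far), is to check whether two references point to the same object:

```python
x = 3
color = "green"
y = color

# "is" is True when both references point to the very same object.
print(y is color)          # True
# id() returns an integer that identifies an object for as long as it exists.
print(id(y) == id(color))  # True
print(x is color)          # False: x refers to the int object 3
```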

Let us continue with the previous example and do some rebinding.

[Figure: object references rebinding]

>>> print(x, color, y)
3 green green
>>> x = y
>>> print(x, color, y)
green green green

Now the three object references refer to the same string with value "green".
Since there are no more object references to the int 3, Python is free to garbage-collect it (i.e. forget its existence to free memory).

Python uses dynamic typing, which means that an object reference can be rebound to refer to a different object (which may be a different data type) at any time.
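
A minimal sketch of dynamic typing: the same name x is rebound in turn to an int, a str and a list (the values are arbitrary):

```python
x = 3
print(type(x))   # <class 'int'>

x = "green"      # x now refers to a str object
print(type(x))   # <class 'str'>

x = [1, 2, 3]    # and now to a list object
print(type(x))   # <class 'list'>
```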


2.2. Immutable objects

Immutable objects are objects whose state (value) cannot be changed. We can rebind the reference which was referring to an immutable object to a new object with another value, but we cannot change the value of the object itself. We have already seen immutable objects like int and str.

In the previous example, when we did x = y, the value of the object x was referring to (3) did not change. We just made x refer to a different object in memory (the same that y was referring to, that is, "green").
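
The sketch below illustrates immutability with a string. The name shout is ours, and the try/except construct belongs to exception handling, which is covered later. A method such as upper builds a new object, and an attempt to modify the string in place is rejected:

```python
color = "green"
shout = color.upper()   # upper() builds a new str object
print(color)            # green (the original object is unchanged)
print(shout)            # GREEN

# Trying to change one character of the string in place fails.
try:
    color[0] = "G"
except TypeError as error:
    print("TypeError:", error)
```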

There are many other data types which are immutable. We will see them in detail in Data Types and Collection Data Types.

2.3. Mutable objects

[Figure: object references of mutable objects]

In contrast to immutable objects, mutable objects are objects whose state we can modify. One example of a mutable object is the list (we will see Lists in detail in the chapter about Collection Data Types). A list is an object that holds a collection of data items. In the list, the items are ordered. We can easily insert or remove items whenever we want.

Under the hood, the lists do not store data items at all, but rather object references. When lists are created and when items are inserted, they take copies of the object references they are given.
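
This can be checked with the is operator, as in the following sketch (the names first and letters are only illustrative). The item stored in the list is the very object that first refers to, not a copy of it:

```python
first = "a"
letters = [first, "b", "c"]

# The list holds a reference to the same object that first refers to.
print(letters[0] is first)  # True
print(letters)              # ['a', 'b', 'c']
```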

In the figure, we see the creation of a list with a reference x to it. This list contains 3 strings: 'a', 'b', and 'c'.
The list does not contain the 3 string objects directly, but references to these objects.
We can easily change the state of the list, for instance by rebinding the second element to a newly created integer object.
The string 'b' no longer has any reference pointing to it. Python is free to garbage-collect it.

Note that in this example, 'b' was not changed into 4 (a string is immutable). Rather, the reference at index 1 in the list was modified to refer to 4 instead of referring to 'b'.
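
The steps described above can be sketched as follows. The second reference alias is our addition. It shows that the list object itself was modified, since the change is visible through every reference to it:

```python
x = ["a", "b", "c"]
alias = x            # a second reference to the same list object

x[1] = 4             # rebind the reference stored at index 1
print(x)             # ['a', 4, 'c']
print(alias)         # ['a', 4, 'c']
print(alias is x)    # True: there is still only one list
```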

2.4. Variable names and keywords

Programmers generally choose names for their variables that are meaningful. These names document what the variable is used for.

Variable names (also called “identifiers”) are non-empty sequences of characters that can be arbitrarily long. An identifier must obey a couple of rules and ideally follow some conventions.

To make things simple, we advise you to stick to the following rules:

  • Use only letters or the underscore (_) for the first character of the identifier.

  • The other characters can be letters, the underscore, and digits.

  • The identifier cannot have the same name as one of Python’s reserved keywords.

Python 3.9 has 35 keywords:

False     await     else      import    pass
None      break     except    in        raise
True      class     finally   is        return
and       continue  for       lambda    try
as        def       from      nonlocal  while
assert    del       global    not       with
async     elif      if        or        yield
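
The list of keywords does not have to be memorized. As a short sketch, the standard library module keyword exposes it through kwlist and iskeyword. The name color is just an example of a non-keyword:

```python
import keyword

# kwlist holds the reserved keywords of the running interpreter.
print(len(keyword.kwlist))         # 35 on Python 3.9
print(keyword.iskeyword("class"))  # True
print(keyword.iskeyword("color"))  # False
```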

The underscore character (_) is often used in names with multiple words such as scoring_matrix.

Identifiers are case-sensitive, so GENESEQUENCE, GeneSequence, Genesequence and genesequence are four different identifiers.

Note

The precise set of characters that are permitted is described in the Python documentation, and in PEP 3131.

Besides, here are two conventions that you are advised to follow:

  • Don’t use the names of any predefined identifiers. For instance, avoid using the name of any of Python’s built-in functions and types (such as float, hash, id, int, input, list, slice, str, sum, tuple, type, to name a few examples), constants (such as NotImplemented or Ellipsis) or exceptions.

  • Names that begin and end with two underscores, such as __eq__, should not be used either. Python defines various special methods and variables that use such names. In the case of special methods, we can reimplement them, that is, make our own version of them (we will not cover this topic during this course), but we should not introduce new names following this pattern.
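
To see why the first convention matters, here is a small sketch in which a variable is named list. The function shadowing_demo is hypothetical and serves only to confine the shadowing to its own body. Functions and try/except are covered later in the course:

```python
def shadowing_demo():
    # Inside this function, the name list refers to our data,
    # no longer to the built-in type.
    list = ["a", "b", "c"]
    try:
        return list("abc")   # fails: a list object cannot be called
    except TypeError as error:
        return "TypeError: " + str(error)

print(shadowing_demo())
# Outside the function, the built-in is untouched.
print(list("abc"))  # ['a', 'b', 'c']
```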

A single underscore can be used as an identifier. Inside an interactive interpreter or Python shell, _ holds the result of the last expression that was evaluated. Some Python programmers use _ when they don’t intend to use the associated value.

Here is a first example:

for _ in (0, 1, 2, 3):
    print("Hello")

A variable has to be defined as part of the for loop construct (it successively takes the values 0, 1, 2, and 3), but its value is not used. Using _ as the loop variable signals this to the reader of the code.

Note

The above example has the effect of displaying Hello four times (and _ ends up having the value 3, once the loop is over).

Another example is when a sequence is unpacked, but only some of the values in the sequence are needed:

a, _, b, _ = (1, 2, 3, 4)

Here, the programmer is only interested in the first and third values in the sequence. However, one has to provide identifiers to get the values at the other positions. Using _ is a way to avoid having to find a name for these useless variables.

Note

In the above example, a has the value 1 and b has the value 3 (and _ ends up having the value 4).

If you give a variable an illegal name, you get a syntax error:

>>> 76trombones = "big parade"
  File "<stdin>", line 1
    76trombones = "big parade"
    ^
SyntaxError: invalid syntax
>>> more@ = 1000000
  File "<stdin>", line 1
    more@ = 1000000
        ^
SyntaxError: invalid syntax
>>> class = "Advanced Theoretical Zymurgy"
  File "<stdin>", line 1
    class = "Advanced Theoretical Zymurgy"
        ^
SyntaxError: invalid syntax

  • 76trombones is illegal because it does not begin with a letter or underscore.

  • more@ is illegal because it contains an illegal character, @.

  • class is one of Python’s reserved keywords.
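
These checks can also be done without triggering a syntax error. The sketch below combines the string method isidentifier with keyword.iskeyword from the standard library. The helper is_valid_name is hypothetical, written only for this illustration:

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a variable name."""
    # isidentifier() checks the characters; keywords must be excluded separately.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ("76trombones", "more@", "class", "scoring_matrix"):
    print(candidate, is_valid_name(candidate))
```

Only scoring_matrix passes both checks. Note that "class".isidentifier() is True on its own, which is why the keyword test is needed as well.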

Note

All these naming conventions are detailed in PEP 8, the Style Guide for Python Code (PEP stands for Python Enhancement Proposal). PEP 8 gives coding conventions for Python code. These guidelines are intended to improve the readability of the code and make it consistent across the wide spectrum of Python code. Consistency with this style guide is important. Consistency within a project is more important. Most of the time, when you start a project, you start it alone. It is your project, and you can choose the style you want. But one day your code will be read by others: to help you debug it, because you want to start a collaboration, because a student takes over your code to continue the project, or because you want to publish it. It will be much easier for them to understand what you did if you follow these conventions.

2.5. Summary

In this chapter, we learned that values have a data type. Python has different data types: some of them are immutable, the others are mutable. We can also create object references to handle data items. A variable or object reference can refer to objects of different data types over time; this is called dynamic typing.