.. _Variables:

*************************************
Variables, Expressions and Statements
*************************************

| The source code of a Python program is made of a series of instructions describing the computations to execute.
| The description of computations is made of **statements**.
| Some statements describe a value, or a computation that has a value.
| These are called **expressions**.

Values can be of different **types**, which represent the kinds of things the program has to deal with:
text, numbers, more complex **data structures**, files, etc.

In this chapter, we will briefly mention some of the most used simple data types in Python,
and how they can be referred to: as **object references**.

Variables and Object References
===============================

Value and type
--------------

A **value** is one of the basic things a program works with, like a letter or a number.
We have seen one value so far: ``"Hello, World!"``, which is a **string** (:ref:`strings`),
so-called because it contains a "string" of letters.
The interpreter recognizes strings by the (matching) quotation marks (``'`` or ``"``) that enclose them.

Other **types** of value exist, among which some very basic ones represent numbers, like ``17`` or ``3.2`` (without quotes).

If you are not sure what type a value has, the ``type`` function gives you the answer::

   >>> type("Hello, World!")
   <class 'str'>
   >>> type(17)
   <class 'int'>

Not surprisingly, strings belong to the type ``str`` and integers belong to the type ``int`` (:ref:`integers`).
Less obviously, numbers with a decimal point belong to a type called ``float``,
because these numbers are represented in a format called "floating point" (:ref:`float`)::

   >>> type(3.2)
   <class 'float'>

What about values like ``"17"`` and ``"3.2"``?
They look like numbers, but they are between quotation marks, like strings. ::

   >>> type("17")
   <class 'str'>
   >>> type("3.2")
   <class 'str'>

They **are** strings.

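
As a small illustration of why the quotation marks matter, the sketch below compares the number ``17`` with the string ``"17"``.
The names ``number`` and ``text`` are illustrative only (giving names to values is introduced later in this chapter)::

   number = 17    # an integer
   text = "17"    # a string that merely looks like a number

   print(type(number) == type(text))   # False: int versus str
   print(number == text)               # False: an int never equals a str
   print(type(text) == type("3.2"))    # True: both are strings
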
| It is also possible to check if a value is of a specific type using the ``isinstance`` function.
| This function takes two **arguments**, separated by a comma.
| The first one is the value we want to test, the second is the particular type of value we're interested in:

::

   >>> isinstance("2", str)
   True
   >>> isinstance(3.2, int)
   False

The ``True`` or ``False`` answer given by the interpreter actually belongs to yet another type of value:
``bool`` (for **Boolean**), as we can see when we apply the ``type`` function to it::

   >>> type(isinstance(3.2, int))
   <class 'bool'>

We can also directly type Boolean values in the interpreter::

   >>> False
   False
   >>> True
   True
   >>> type(True)
   <class 'bool'>

When you type a large integer, you might be tempted to use commas or spaces between groups of three digits, as in ``1,000,000``.
This is not a legal integer in Python, but the syntax is valid::

   >>> 1,000,000
   (1, 0, 0)

Well, that's not what we expected at all!
Python interprets ``1,000,000`` as a comma-separated sequence of integers.
This is the first example of a **semantic error**: the code runs without producing an error message,
but it doesn't do the **right** thing.

.. note::
   Starting from `Python 3.6 `_, you can use ``_`` to visually separate groups of digits::

      >>> 1_000_000
      1000000

.. In Python, both ``str`` and ``int`` are immutable: Once set, their value cannot be changed anymore.
.. Some other data types are mutable (we will see what this difference implies between these two kinds
.. of data type in :ref:`mutable obj` and :ref:`immutable obj` and some examples of mutable and immutable
.. data types in :ref:`Data_Types` and :ref:`Collection_Data_types`).

To convert a data item from one type to another, we can use the syntax *datatype(item)*.
For instance::

   >>> int("45")
   45
   >>> str(45)
   '45'

The ``int`` conversion tolerates leading and trailing whitespace, so ``int(" 45 ")`` would have worked just as well.
The ``str`` conversion can be applied to almost every data item.

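
To see why such conversions matter, here is a minimal sketch.
The names ``raw``, ``count`` and ``label`` are illustrative, and the ``+`` operator on strings (concatenation) is used only to make the difference between the two types visible::

   raw = " 45 "          # text, with surrounding whitespace
   count = int(raw)      # the whitespace is ignored by int()
   label = str(count)    # back to text, without the whitespace

   print(count + 1)      # 46: integer addition
   print(label + "1")    # 451: string concatenation
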
It is possible to convert one type of number into another::

   >>> float(17)
   17.0
   >>> int(3.2)
   3

Note that the conversion from ``float`` to ``int`` truncates the number: the fractional part is discarded.

If a conversion fails, an exception is raised (more about :ref:`exceptions` later)::

   >>> int("Hello World!")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: invalid literal for int() with base 10: 'Hello World!'

Variables and Object references
-------------------------------

Once we have data items (or values), the next thing we need is variables in which to store them.
A variable is a name that refers to a value.
One of the most powerful features of a programming language is the ability to manipulate ``variables``.

In Python, everything is an **object**: an int, a string, a function, or even a type::

   >>> isinstance(3, object)
   True
   >>> isinstance("blue", object)
   True
   >>> isinstance(print, object)
   True
   >>> isinstance(float, object)
   True

And every object has a type, even functions and types::

   >>> type(print)
   <class 'builtin_function_or_method'>
   >>> type(float)
   <class 'type'>

.. note::
   An object is "something" that packs together:

   * a state, for instance the value ``3`` for the int or ``"blue"`` for the string;
   * and a behavior: a set of **methods** (the operations that we can do on this object).

   For instance ``"blue"`` is the state, ``upper`` is a method applied to the value "blue"::

      >>> "blue".upper()
      'BLUE'

   The available methods depend on the type of the object.

Strictly speaking, Python does not have variables as such, but **object references**.
Depending on the type of object (**mutable** or not), there may be a subtle difference between the two concepts,
but it rarely matters in practice.
We will therefore use the terms "variable" and "object reference" interchangeably.

Let's look at a few examples and see what happens in detail::

   x = 3
   color = "green"
   y = color

The syntax is simply ``object_reference = value``.

This is what happens when the above code is executed:

* The first statement creates an ``int`` object with the value ``3`` and creates the object reference called ``x``
  that refers to the int object. For all practical purposes we say that variable ``x`` has been assigned the integer ``3``.
* The second statement is similar: variable ``color`` has been assigned the string ``"green"``.
* The third creates a new object reference ``y`` and sets it to refer to the same object that the ``color`` object reference
  refers to (in this case the string object containing the value ``"green"``).

Let's see what Python does behind the scenes:

.. figure:: _static/figs/ref_obj.png
   :align: left
   :alt: object references
   :figclass: align-left

   | *The circles represent the object references.*
   | *The rectangles represent the objects in memory.*

| The ``=`` operator is not the same as the variable assignment operator in some other languages.
| The ``=`` operator binds an object in memory to an object reference.

If the object reference already exists, it is simply re-bound to refer to the object on the right of the ``=`` operator;
if the reference does not exist, it is created by the ``=`` operator.

.. container:: clearer

   .. image :: _static/figs/spacer.png

Let us continue with the previous example and do some rebinding.

.. image:: _static/figs/rebinding.png
   :width: 250px
   :align: left
   :alt: object references rebinding

::

   >>> print(x, color, y)
   3 green green
   >>> x = y
   >>> print(x, color, y)
   green green green

| Now the three object references refer to the same string with value ``"green"``.
| Since there are no more object references to the int ``3``, Python is free to garbage-collect it
  (i.e. forget its existence to free memory).

Python uses **dynamic typing**, which means that an object reference can be rebound to refer
to a different object (which may be of a different data type) at any time.

.. container:: clearer

   .. image :: _static/figs/spacer.png

.. _immutable obj:

Immutable objects
=================

Immutable objects are objects whose state (value) **cannot** be changed.
We can rebind the reference which was referring to an immutable object to a **new** object with another value,
but we cannot change the value of the object itself.

We have already seen immutable objects like ``int`` and ``str``.
In the previous example, when we did ``x = y``, the value of the object that ``x`` was referring to (``3``) did not change.
We just made ``x`` refer to a different object in memory (the same one that ``y`` was referring to, that is, ``"green"``).

There are a lot of other data types which are immutable.
We will see them in detail in :ref:`Data_types` and :ref:`Collection_Data_types`.

.. _mutable obj:

Mutable objects
===============

.. image:: _static/figs/ref_obj_mutable.png
   :align: left
   :alt: object references of mutable objects

In contrast to immutable objects, mutable objects are objects whose state we can modify.
One example of a mutable object is the ``list``
(we will see :ref:`list` in detail in the chapter about :ref:`Collection_Data_Types`).
A list is an object that holds a collection of data items.
In the list, the items are ordered.
We can easily insert or remove items whenever we want.
Under the hood, lists do not store data items at all, but rather object references.
When lists are created and when items are inserted, they take copies of the object references they are given.

.. container:: clearer

   .. image :: _static/figs/spacer.png

| In the figure we see the creation of a list with a reference ``x`` to it. This list contains 3 strings ``'a'``, ``'b'``, and ``'c'``.
| The list does not contain the 3 string objects directly, but references to these objects.
| We can easily change the state of the list, for instance by rebinding the second element to a newly created integer object.
| The string ``'b'`` no longer has any reference pointing to it. Python is free to garbage-collect it.

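
The situation shown in the figure can be reproduced with a few lines of code.
This is only a sketch: it relies on the built-in ``id`` function, which returns an integer identifying an object,
to check that ``x`` still refers to the same list after the change. The name ``identity_before`` is illustrative::

   x = ["a", "b", "c"]
   identity_before = id(x)

   x[1] = 4                            # rebind the reference stored at index 1

   print(x)                            # ['a', 4, 'c']
   print(id(x) == identity_before)     # True: same list object, new state
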
Note that in this example, ``'b'`` was not changed into ``4`` (a string is immutable).
Rather, the reference at index 1 in the list was modified to refer to ``4`` instead of referring to ``'b'``.

Variable names and keywords
===========================

Programmers generally choose names for their variables that are meaningful.
These names document what the variable is used for.

Variable names (also called "identifiers") are non-empty sequences of characters that can be arbitrarily long.
An identifier must obey a few rules and ideally follow some conventions.
To make things simple, we advise you to stick to the following rules:

* Use only letters or the underscore (``_``) for the first character of the identifier.
* The other characters can be letters, the underscore, and digits.
* The identifier cannot have the same name as one of Python's reserved keywords.

Python 3.9 has 35 keywords:

.. tabularcolumns:: |l|l|l|l|l|

+--------+----------+---------+----------+--------+
| False  | await    | else    | import   | pass   |
+--------+----------+---------+----------+--------+
| None   | break    | except  | in       | raise  |
+--------+----------+---------+----------+--------+
| True   | class    | finally | is       | return |
+--------+----------+---------+----------+--------+
| and    | continue | for     | lambda   | try    |
+--------+----------+---------+----------+--------+
| as     | def      | from    | nonlocal | while  |
+--------+----------+---------+----------+--------+
| assert | del      | global  | not      | with   |
+--------+----------+---------+----------+--------+
| async  | elif     | if      | or       | yield  |
+--------+----------+---------+----------+--------+

The underscore character (``_``) is often used in names with multiple words such as ``scoring_matrix``.
Identifiers are case-sensitive, so ``GENESEQUENCE``, ``GeneSequence``, ``Genesequence`` or ``genesequence`` are different identifiers.

.. note::
   The precise set of permitted characters is described in the `Python documentation `_, and in `PEP3131 `_.

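
The rules above can also be checked programmatically.
The sketch below assumes two tools that ship with Python: the string method ``isidentifier``
and the function ``iskeyword`` from the standard ``keyword`` module.
The helper ``is_usable_name`` and the candidate names are hypothetical, made up for this illustration.
Note that ``isidentifier`` applies the complete rules rather than the simplified ones listed here,
so it also accepts letters outside the basic Latin alphabet::

   import keyword

   def is_usable_name(name):
       """Return True if name is a valid identifier and not a reserved keyword."""
       return name.isidentifier() and not keyword.iskeyword(name)

   candidates = ["scoring_matrix", "_tmp", "2nd_gene", "gene-sequence", "lambda"]
   usable = [name for name in candidates if is_usable_name(name)]

   print(usable)   # ['scoring_matrix', '_tmp']
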
In addition, here are two conventions that you are advised to follow:

* Don't use the names of any predefined identifiers.
  For instance, avoid using the name of any of Python's built-in `functions and types `_,
  (such as ``float``, ``hash``, ``id``, ``int``, ``input``, ``list``, ``slice``, ``str``, ``sum``, ``tuple``, ``type``,
  to name a few examples), `constants `_ (such as ``NotImplemented`` or ``Ellipsis``) or `exceptions `_.
* Names that begin and end with two underscores, such as ``__eq__``, should not be used either.
  Python defines various special methods and variables that use such names.
  In the case of special methods, we can reimplement them, that is, make our own version of them
  (we will not cover this topic during this course), but we should not introduce new names following this pattern.

A single underscore can be used as an identifier.
Inside an interactive interpreter or Python shell, ``_`` holds the result of the last expression that was evaluated.

Some Python programmers use ``_`` when they don't intend to use the associated value.
Here is a first example::

   for _ in (0, 1, 2, 3):
       print("Hello")

A variable has to be defined as part of the ``for`` loop construct
(and takes successively the values ``0``, ``1``, ``2``, and ``3``), but its value is not used.
Using ``_`` as the loop variable identifier signals this to the reader of the code.

.. note::
   The above example has the effect of displaying ``Hello`` four times
   (and ``_`` ends up having the value ``3``, once the loop is over).

Another example is when a sequence is **unpacked**, but only some of the values in the sequence are needed::

   a, _, b, _ = (1, 2, 3, 4)

Here, the programmer is only interested in the first and third values in the sequence.
However, one has to provide identifiers to get the values at the other positions.
Using ``_`` is a way to avoid having to find a name for these useless variables.

.. note::
   In the above example, ``a`` has the value ``1`` and ``b`` has the value ``3``
   (and ``_`` ends up having the value ``4``).

If you give a variable an illegal name, you get a syntax error::

   >>> 76trombones = "big parade"
     File "<stdin>", line 1
       76trombones = "big parade"
                 ^
   SyntaxError: invalid syntax
   >>> more@ = 1000000
     File "<stdin>", line 1
       more@ = 1000000
             ^
   SyntaxError: invalid syntax
   >>> class = "Advanced Theoretical Zymurgy"
     File "<stdin>", line 1
       class = "Advanced Theoretical Zymurgy"
             ^
   SyntaxError: invalid syntax

* ``76trombones`` is illegal because it does not begin with a letter or underscore.
* ``more@`` is illegal because it contains an illegal character, ``@``.
* ``class`` is one of Python's reserved keywords.

.. note::
   All these naming conventions are detailed in :ref:`pep_8` (**P**\ ython **E**\ nhancement **P**\ roposal),
   Style Guide for Python Code.
   PEP 8 gives coding conventions for Python code.
   These guidelines are intended to improve the readability of the code and make it consistent
   across the wide spectrum of Python code.
   Consistency with this style guide is important.
   Consistency within a project is more important.

   Most of the time, when you start a project, you start it alone.
   It is your project, so you can choose the style you want.
   But one day your code will be read by others (to help you debug it, because you start a collaboration,
   because a student takes over the project, or because you publish your code).
   It will be much easier to understand what you did if you follow these conventions.

Summary
=======

In this chapter we learned that values have a data type.
Python has different data types: some of them are immutable, the others are mutable.
We can also create object references to handle data items.
A variable or object reference can refer to objects of different data types over time; this is called *dynamic typing*.

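
As a recap, the rebinding example from this chapter can be replayed as a short script.
This is a sketch only: it uses the ``is`` operator, which was not covered above
and which tells whether two object references refer to the same object::

   x = 3
   color = "green"
   y = color

   print(y is color)            # True: y and color refer to the same string
   x = y                        # rebind x: it now refers to a str, not an int
   print(x is color)            # True
   print(isinstance(x, str))    # True: dynamic typing at work
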