.. _Introduction:

************
Introduction
************

Before you start programming in Python, you need to install it.
You also need to understand a few general programming notions,
and to know the tools used to write and execute programs.


Getting and installing Python
=============================

If you have an up-to-date Mac or Unix system, Python is very likely already installed.
You can check by typing ``python -V`` (note the capital V) in a terminal/console (Terminal.app in Mac OSX).
This command tells you whether Python is installed and, if several versions of Python are installed, which one is the default.
If Python is not found, the command name may include the version: try ``python2 -V`` or ``python3 -V``.
For the rest of this course we will use Python 3.
Note that Python 2, although it reached its end of life in January 2020, is still found on some systems and differs from Python 3 in a few ways.
If none of the above commands works for you, you have to install Python.
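
Once a Python interpreter is available, the same check can be made from within Python.
The following is a minimal sketch; the helper name ``is_python3`` is only illustrative::

    import sys

    def is_python3(version_info=sys.version_info):
        """Return True if the given version information describes Python 3."""
        return version_info[0] == 3

    # Version of the interpreter that runs this code, e.g. "3.7.2"
    print(sys.version.split()[0])
    print(is_python3())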

.. It seems that recent Mac OS X come with python 2 only: https://docs.python.org/3/using/mac.html

For Linux
---------

For Linux or BSD (or any Unix-like system), the easiest way is to rely on the package management system of your distribution. In most cases,
each major version of Python is provided in its own package. For instance, on Debian/Ubuntu the ``python3`` package provides Python 3,
so for Debian/Ubuntu::

   $ sudo apt-get install python3

For Gentoo, with root privileges::

   $ emerge -va dev-lang/python

| For other distributions, see your operating system manual.
| If there is no Python package for your distribution, if you don't have root privileges, or if you don't want to install Python system-wide, you can install it from source:

- Download the source from http://www.python.org/download
- Go to the folder in which you saved the archive.
- To perform a local installation in your home directory, adapt the following commands::

   $ tar -xJf Python-3.7.2.tar.xz
   $ cd Python-3.7.2
   $ ./configure --enable-shared --with-ensurepip=install --prefix=${HOME} LDFLAGS="-L${HOME}/lib -Wl,-rpath,${HOME}/lib"
   $ make
   $ make test  # (this can take a while)
   $ make install

It is possible that you get some messages at the end saying that some modules
could not be built. This normally means that some of the
required libraries or headers are missing on your machine. For example, if ``readline``
could not be built, use the package management system of your machine to install
``readline-devel`` on Fedora-based systems or ``libreadline-dev`` on Debian-based
systems. You may have similar trouble with the ``tkinter`` module. If so,
install ``tcl-devel`` and ``tk-devel``...

In case this can be useful, it has been reported
(https://bugs.python.org/issue31652#msg321260) that on Ubuntu 18.04, the
compilation and installation of Python 3.7.2 required the following package
installation::

    $ sudo apt-get install build-essential libsqlite3-dev sqlite3 bzip2 libbz2-dev zlib1g-dev libssl-dev openssl libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libncursesw5-dev libffi-dev uuid-dev


In particular, with Python 3.7.2, your installation may fail with the following message::

    ModuleNotFoundError: No module named '_ctypes'

In that case, you may have to install ``libffi-dev`` (on Debian-based systems)
or ``libffi-devel`` (on Fedora-based systems) before retrying the whole
process.


For Mac OSX and Windows
-----------------------

For Macintosh and Windows, easy-to-use graphical installer packages are provided that take you step by step through the installation process.
These are available at http://www.python.org/download (choose the latest Python 3 version). Once you have downloaded the installer, run it and follow the instructions.


Should I use Python 2 or Python 3 for my development activity?
--------------------------------------------------------------

.. _which_version:

If you can do exactly what you want with Python 3.x, great!
There are a few minor downsides, such as slightly worse library support and the fact that most current Linux distributions and Macs
are still using 2.x as default, but as a language Python 3.x is definitely ready.
As long as Python 3.x is installed on your users' computers
and you know that none of the modules you need are restricted to Python 2.x, it is an excellent choice.
Also, most Linux distributions have Python 3.x already installed, and all have it available for end-users.
Some are phasing out Python 2 as the pre-installed default.

However, there are some key issues that may require you to use Python 2 rather than Python 3.

#. If you're deploying to an environment you don't control,
   that may impose a specific version, rather than allowing you a free selection from the available versions.
#. If you want to use a specific third-party package or utility that doesn't yet have a released version compatible with Python 3,
   and porting that package is a non-trivial task, you may choose to use Python 2 in order to retain access to that package.

Some packages progressively drop support for older Python versions. For instance,
Biopython 1.63 was the first version to fully support Python 3 (3.3), while also supporting Python 2.6 and 2.7.
Biopython 1.73 still supported Python 2.7, but no longer Python 2.6 or 3.3, and 1.77 dropped support for Python 2.7. Biopython 1.79 needs Python 3.6 or above ([biopython_news]_).

.. seealso::

    :ref:`python3`
    [python2vs3]_


Some preliminary programming notions
====================================


What is a program?
------------------

A **program** is a sequence of instructions that specifies how to perform a computation.
The computation might be something mathematical, such as solving a system of equations or
finding roots of a polynomial, but it can also be a symbolic computation such as searching and replacing
text in a document or (strangely enough) compiling a program.

The details look different in different languages, but a few basic instructions appear in just about every language:

* **input**: Get data from the keyboard, a file, or some other device.
* **output**: Display data on the screen or send data to a file or other device.
* **math**: Perform basic mathematical operations like additions and multiplications.
* **conditional execution**: Check for certain conditions and execute the appropriate code.
* **repetition**: Perform some action repeatedly, usually with some variation.

Believe it or not, that is pretty much all there is to it. Every program you've ever used, no matter how complicated,
is made up of instructions that look pretty much like these. So you can think of programming as the process of breaking a
large, complex task into smaller and smaller subtasks until the subtasks are simple enough to be reduced to one of these basic instructions.
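
As an illustration, the short Python sketch below uses each of the five basic
instructions once. The function name ``summarize`` and the sample data are made up
for this example, and ``io.StringIO`` merely stands in for a file or the keyboard::

    import io

    def summarize(stream):
        """Return the sum and the count of the numbers found in stream, one per line."""
        total = 0.0
        count = 0
        for line in stream:              # repetition: one pass per line
            text = line.strip()
            if text:                     # conditional execution: skip empty lines
                total += float(text)     # math: accumulate the sum
                count += 1
        return total, count

    data = io.StringIO("1.5\n2.5\n\n4\n")    # input: read from a file-like object
    total, count = summarize(data)
    print("sum:", total, "count:", count)    # output: display the result

Running this code should display ``sum: 8.0 count: 3``.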


Formal and natural language
---------------------------

:Natural languages:
   They are languages people speak, such as English or French. They were not designed by people; they evolved naturally.

:Formal languages:
   They are languages that are designed by people for specific applications. For instance, the notation that mathematicians use
   is a formal language that is particularly good at denoting relationships among numbers and symbols.
   Chemists use a formal language to represent the chemical structure of molecules.
   And most importantly:

   **Programming languages are formal languages that have been designed to express computations.**

Formal languages tend to have strict syntax rules. For instance,
"3 + 3 = 6" is a syntactically correct mathematical statement, but
"3 + = 3$6" is not.
"|H2O|" is a syntactically correct chemical formula, but ":sub:`2`\ Zz" is not.

Syntax rules come in two flavors, pertaining to **tokens** and **structure**.

Tokens are the basic elements of the language, such as
words, numbers, and chemical elements. One of the problems with
"3 + = 3$6" is that "$" is not a legal token in mathematics
(at least as far as I know). Similarly, ":sub:`2`\ Zz" is not legal because
there is no element with the abbreviation "Zz".

The second type of syntax rule pertains to the structure of a
statement; that is, the way the tokens are arranged. The statement
"3 + = $" is illegal because even though "+" and "=" are
legal tokens, you can't have one right after the other.
Similarly, in a chemical formula the subscript comes after the element name, not
before [thinkpython]_.
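
Python, being a formal language, enforces such rules too. The sketch below asks
Python itself whether a piece of text is a syntactically correct expression, using
the built-in ``compile`` function; the helper name ``is_valid_expression`` is an
assumption made for this example. Note that Python writes the equality test as
``==`` rather than ``=``::

    def is_valid_expression(source):
        """Return True if source is a syntactically correct Python expression."""
        try:
            compile(source, "<example>", "eval")
        except SyntaxError:
            return False
        return True

    print(is_valid_expression("3 + 3 == 6"))   # legal tokens, legal structure
    print(is_valid_expression("3 + = 3$6"))    # "$" is not a legal token
    print(is_valid_expression("3 + == 6"))     # legal tokens, illegal structure

Only the first expression should be reported as valid.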


Writing Python programs
-----------------------

Python code can be written using any plain text editor that can load and save files in the ``ASCII`` or ``UTF-8`` character encoding.
To edit your Python file, it is often easier to use a `source code editor <http://en.wikipedia.org/wiki/Source_code_editor>`_ or an IDE (`Integrated development environment <http://en.wikipedia.org/wiki/Integrated_development_environment>`_) like:

* `idle <https://docs.python.org/3/library/idle.html>`_ (an IDE provided with Python);
* `vim <https://www.vim.org/>`_;
* `emacs <https://www.gnu.org/software/emacs/>`_;
* gedit;
* nedit;
* eclipse;
* `PyCharm <https://www.jetbrains.com/pycharm/>`_;
* and so on...


Some of these tools can highlight the syntax of your code, helping you read it and spot syntax errors.
Some of them also help you adopt a clean layout or avoid typos.

.. note:: The default source code encoding in Python 3 is **UTF-8**.

.. warning:: Word or LibreOffice are **NOT** text editors. Never use them to edit Python code.

Python source code files normally have a ``.py`` extension, although on some Unix systems they may not need any extension,
and Python GUI (Graphical User Interface) programs have a ``.pyw`` extension on Mac and Windows.

.. note::
   Unlike most other programming languages, Python uses indentation to signify
   its block structure. Since blocks are indicated using indentation, the question that naturally arises is
   "What kind of indentation?" The Python style guidelines (PEP 8) recommend
   four spaces per level of indentation, and only spaces (no tabs).
   Most modern text editors can be set up to handle this automatically (IDLE's editor does this, of
   course, and so do most other Python-aware editors). Python will work fine with
   any number of spaces or with tabs, provided that
   the indentation used is consistent; Python 3 rejects code that mixes tabs and spaces inconsistently. In this course, we follow the official Python
   guidelines. Therefore, we recommend that you set your editor to insert 4 spaces when you press the "tab" key.
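
For instance, in the following sketch (the function ``parity`` is only an
illustration), each nested block is indented by four more spaces than the line
that introduces it::

    def parity(number):
        """Return "even" or "odd" depending on number."""
        if number % 2 == 0:
            # This block belongs to the "if" line above.
            return "even"
        # Back to the indentation level of the function body.
        return "odd"

    for value in range(3):
        # This block is repeated for each value.
        print(value, parity(value))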


Executing code
--------------

Different types of programming languages have different ways of being executed.


Compiled languages
^^^^^^^^^^^^^^^^^^

Some languages, like C, are first transformed (**compiled**) by a program called
a **compiler** into an executable file containing instructions in a **binary**
format (sequence of zeroes and ones that a human is normally not able to read).
The executable can later be directly executed using the processor of the
computer for which it was compiled.

.. figure::  _static/figs/compile.png
    :height: 85px
    :align: center
    :alt: compiling work stream
    :figclass: align-center

    A compiler transforms source code into object code, which is run by a hardware executor.


Compiling may take some time, but a good compiler is able to optimize some
parts of the code to make better use of the processor. This saves time each
time the program runs.


Interpreted languages
^^^^^^^^^^^^^^^^^^^^^

Other programming languages, like Python, are **interpreted** by a program
called an **interpreter**, which reads the code and executes its instructions
"on the fly".

.. figure:: _static/figs/interpret.png
    :height: 85px
    :align: center
    :alt: interpreting work stream
    :figclass: align-center

    An interpreter processes the program a little at a time, alternately reading lines and performing computations.

There is no need to perform an intermediate compilation step. An advantage is
that interpreters typically offer an interactive mode, in which one can quickly
test pieces of code. This also makes the process of developing a program in an
interpreted language more "fluid" in the sense that modifications of the code
can be tested more rapidly than with a compiled language.

However, this leaves fewer opportunities for optimization. Programs written in
an interpreted language will therefore usually not be as efficient as programs
made using a compiled language.


Given the interpreted nature of Python, there are two main ways of executing code:

* Giving a file containing the code to the interpreter.
* Typing code in the interpreter in an interactive mode.

| The first way can be achieved by typing ``python3 path/to/the/source/code.py`` in a command-line terminal.
| You obtain an interactive interpreter if you just execute the ``python3`` command, without any argument.


Actually, when Python code is executed, it is first compiled into **bytecode**: the
internal representation of a Python program for the interpreter. When **modules**
(that is, code saved in separate files) are imported, the corresponding
bytecode may be saved (in a ``__pycache__`` directory). The goal is to speed up
execution when the source code of the module has not been modified since the last
execution. This still does not entail as many optimizations as with a typical
compiled language, because heavy optimization would delay execution and defeat the
purpose of Python being an interpreted language.
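
If you are curious, the standard library module ``dis`` can display the bytecode
of a function. The sketch below is only meant as an illustration (the function
``add`` is made up for the example); the exact instructions shown depend on the
Python version::

    import dis

    def add(a, b):
        """Return the sum of a and b."""
        return a + b

    # Print a human-readable listing of the bytecode instructions of add().
    dis.dis(add)

    # The raw bytecode is stored as a bytes object attached to the function.
    print(type(add.__code__.co_code))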


.. figure:: _static/figs/byte_code.png
    :height: 85px
    :align: center
    :alt: bytecode work stream
    :figclass: align-center

    The actual Python code is compiled into Python bytecode. The bytecode is interpreted.


Exercises
=========

Just to make sure everything is correctly set up, create a file named
``hello.py`` with the editor of your choice, containing the following line::

    print("Hello World!")


.. note::
    ``print`` is a Python **function** that displays some text.
    Text in Python is written between quotes.
    The execution of the above code should just display ``Hello World!``

Now execute your program, by giving it to the Python interpreter::

    $ python3 hello.py
    Hello World!


With interpreted languages, it is possible to add a `special line
<https://docs.python.org/3/using/windows.html?#shebang-lines>`_ (called a
*shebang* line) at the beginning of the source code that specifies which
interpreter should be used to execute the code.

Modify the ``hello.py`` file to add such a line::

    #!/usr/bin/env python3
    print("Hello World!")


In Unix-based systems (like Linux and Mac OSX), you can then make the file
executable by adding the execution permissions::

    $ chmod +x hello.py

You can then directly execute the file::

   $ ./hello.py
   Hello World!

Note that the file was referred to by using its relative path. Therefore, the
above will only work when you type the command from the same directory as the
one containing the file. If you want to execute a Python script from a
different location, you need to adapt the path accordingly. You may also use
the absolute path of the file.

For programs that you intend to use from various locations, it may make your
work easier if you place the script in a directory present in the ``PATH``
environment variable. You will then only have to call the program using its
base name::

   $ hello.py
   Hello World!

.. note::

    On some Unix-based systems, the presence of a ``bin`` folder in the user
    home directory (``${HOME}/bin``) is automatically checked at shell startup
    (when you open a command-line terminal, for instance), and added to the
    ``PATH`` environment variable. This could be a place to put your most used
    Python programs.
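
The content of ``PATH`` can also be inspected from Python. The following sketch
lists its directories and uses ``shutil.which`` to locate a command; the helper
name ``path_directories`` is an assumption made for this example::

    import os
    import shutil

    def path_directories(path_value=None):
        """Return the list of directories found in a PATH-like string."""
        if path_value is None:
            path_value = os.environ.get("PATH", "")
        # os.pathsep is ":" on Unix-based systems and ";" on Windows.
        return [directory for directory in path_value.split(os.pathsep) if directory]

    for directory in path_directories():
        print(directory)

    # Full path of the program that would run when typing "python3",
    # or None if no such program is found in PATH.
    print(shutil.which("python3"))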


Python Documentation
====================


On the web
----------

The `Python website <https://www.python.org/>`_ contains all the documentation needed for Python programming, for all supported versions.
This is the place to look when you need first-hand documentation about the language or the standard library.

Some "Q&A" websites are very useful:

* `Stack Overflow <http://stackoverflow.com/>`_ is not Python-specific, but is aimed at *professional and enthusiast programmers*.
* `Biostars <https://www.biostars.org/>`_ and `Bioinformatics Stack Exchange <https://bioinformatics.stackexchange.com>`_ are not Python-specific, but focus on *bioinformatics questions*.

Be sure to check `these recommendations <https://stackoverflow.com/help/how-to-ask>`_
before asking questions on the above sites.


On the command line
-------------------

Python comes with the executable ``pydoc`` (or ``pydoc3``, to be sure to have the one for Python 3), which provides help about Python.
In a terminal, just type ``pydoc3`` followed by the name of any module, keyword, or topic::

   $ pydoc3 print

(Press ``q`` to exit.)


In the interpreter
------------------

We can also access the documentation interactively in a Python interpreter:
just type ``help()`` for interactive help, or ``help(object)`` for help about ``object``::

    $ python3
    Python 3.8.0a2 (default, Mar  6 2019, 14:42:50)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> help()

    Welcome to Python 3.8's help utility!

    If this is your first time using Python, you should definitely check out
    the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

    Enter the name of any module, keyword, or topic to get help on writing
    Python programs and using Python modules.  To quit this help utility and
    return to the interpreter, just type "quit".

    To get a list of available modules, keywords, symbols, or topics, type
    "modules", "keywords", "symbols", or "topics".  Each module also comes
    with a one-line summary of what it does; to list the modules whose name
    or summary contain a given string such as "spam", type "modules spam".

    help> 
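
The text displayed by ``help`` for a function comes from its documentation string
(*docstring*). As a minimal sketch (the function ``greet`` is made up for the
example), the same text can be retrieved with ``inspect.getdoc``::

    import inspect

    def greet(name):
        """Return a greeting for the given name."""
        return "Hello " + name + "!"

    # help(greet) would show this docstring in an interactive interpreter.
    print(inspect.getdoc(greet))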


Summary
=======

Python is an interpreted language.
It can be used interactively in the interpreter, or the interpreter can execute source code saved in a file.
Source code has to be written using a text editor or an IDE.
We will use Python 3 for the rest of this course.


References
==========

.. [thinkpython] http://www.greenteapress.com/thinkpython/

.. [prog_in_python3] Mark Summerfield, Programming in Python 3 (Addison-Wesley): http://www.qtrac.eu/py3book.html

.. .. [python_2012] Is python a interpreted or compiled language?

      https://mail.python.org/pipermail/python-list/2012-June/625578.html

.. .. [python_glossary] https://docs.python.org/2.7/glossary.html

.. .. [Comparison_of_programming_paradigms] http://en.wikipedia.org/wiki/Comparison_of_programming_paradigms

.. .. [Functional_programming] http://en.wikipedia.org/wiki/Functional_programming

.. .. [Object-oriented_programming] http://en.wikipedia.org/wiki/Object-oriented_programming

.. .. [python_history] http://en.wikipedia.org/wiki/History_of_Python

.. [python2vs3] https://wiki.python.org/moin/Python2orPython3

.. [biopython_news] https://raw.githubusercontent.com/biopython/biopython/master/NEWS.rst


.. |H2O| replace:: H\ :sub:`2`\ O