.. _Introduction:

************
Introduction
************

Before you start programming in Python, you need to install Python.
You also need to understand a few notions about programming in general, and about the tools used to write and execute programs.

Getting and installing Python
=============================

If you have an up-to-date Mac or Unix system, you almost certainly have Python already installed.
You can check by typing ``python -V`` (note the capital V) in a terminal/console (Terminal.app in Mac OSX).
This command tells you whether Python is installed and, if several versions of Python are installed, which one is the default.
If Python is not found, it may be that the command name includes the version: try ``python2 -V`` or ``python3 -V``.

For the rest of this course we will use Python 3.
Note that Python 2 is still commonly used, and has a few differences.

If none of the above commands work for you, you have to install Python.

.. It seems that recent Mac OS X come with python 2 only: https://docs.python.org/3/using/mac.html

For Linux
---------

For Linux or BSD (or any Unix-like system), the easiest way is to rely on the package management system of your distribution.
In most cases Python is provided in several separate packages.
For instance, on Debian/Ubuntu the Python 3 interpreter is provided by the ``python3`` package (Python 2, where still available, is packaged separately), so for Debian/Ubuntu::

    $ sudo apt-get install python3

For Gentoo, with root privileges::

    $ emerge -va dev-lang/python

| For other distributions see your operating system manual.
| If there is no Python package for your distribution, or you don't have root privileges, or you don't want to install Python system-wide, you can install it from the sources:

- Download the source from http://www.python.org/download
- Go to the folder in which you saved the archive.
- To perform a local installation in your home directory, adapt the following commands::

    $ tar -xJf Python-3.7.2.tar.xz
    $ cd Python-3.7.2
    $ ./configure --enable-shared --with-ensurepip=install --prefix=${HOME} LDFLAGS="-L${HOME}/lib -Wl,-rpath,${HOME}/lib"
    $ make
    $ make test # (this can take a while)
    $ make install

It is possible that you get some messages at the end saying that some modules could not be built.
This normally means that some of the required libraries or headers are missing on your machine.
For example, if ``readline`` could not be built, use the package management system of your machine to install ``readline-devel`` on Fedora-based systems or ``libreadline-dev`` on Debian-based systems.
You may have similar trouble with the ``tkinter`` module.
If so, install ``tcl-devel`` and ``tk-devel``...

In case this can be useful, it has been reported (https://bugs.python.org/issue31652#msg321260) that on Ubuntu 18.04, the compilation and installation of Python 3.7.2 required the following package installation::

    $ sudo apt-get install build-essential libsqlite3-dev sqlite3 bzip2 libbz2-dev zlib1g-dev libssl-dev openssl libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libncursesw5-dev libffi-dev uuid-dev

In particular, with Python 3.7.2, your installation may fail with the following message::

    ModuleNotFoundError: No module named '_ctypes'

In that case, you may have to install ``libffi-dev`` (on Debian-based systems) or ``libffi-devel`` (on Fedora-based systems) before retrying the whole process.

For Mac OSX and Windows
-----------------------

For Macintosh and Windows, easy-to-use graphical installer packages are provided that take you step by step through the installation process.
These are available at http://www.python.org/download (choose the latest Python 3 version).
Once you have downloaded the installer, run it and follow the instructions.

Should I use Python 2 or Python 3 for my development activity?
--------------------------------------------------------------

.. _which_version:

If you can do exactly what you want with Python 3.x, great!
There are a few minor downsides, such as slightly worse library support and the fact that most current Linux distributions and Macs are still using 2.x as default, but as a language Python 3.x is definitely ready.
As long as Python 3.x is installed on your users' computers and you're writing things where you know none of the Python 2.x modules are needed, it is an excellent choice.
Also, most Linux distributions have Python 3.x already installed, and all have it available for end-users.
Some are phasing out Python 2 as the pre-installed default.

However, there are some key issues that may require you to use Python 2 rather than Python 3.

#. If you're deploying to an environment you don't control, that environment may impose a specific version, rather than allowing you a free selection from the available versions.
#. If you want to use a specific third-party package or utility that doesn't yet have a released version that is compatible with Python 3, and porting that package is a non-trivial task, you may choose to use Python 2 in order to retain access to that package.

Some packages progressively drop support for older Python versions.
For instance, Biopython 1.63 was the first version to fully support Python 3 (3.3); it also supported Python 2.6 and 2.7.
Biopython 1.73 still supported Python 2.7, but no longer Python 2.6 or 3.3, and 1.77 dropped support for Python 2.7.
Biopython 1.79 needs Python 3.6 or above ([biopython_news]_).

.. seealso::
   :ref:`python3`
   [python2vs3]_

Some preliminary programming notions
====================================

What is a program?
------------------

A **program** is a sequence of instructions that specifies how to perform a computation.
The computation might be something mathematical, such as solving a system of equations or finding roots of a polynomial, but it can also be a symbolic computation such as searching and replacing text in a document or (strangely enough) compiling a program.

The details look different in different languages, but a few basic instructions appear in just about every language:

* **input**: Get data from the keyboard, a file, or some other device.
* **output**: Display data on the screen or send data to a file or other device.
* **math**: Perform basic mathematical operations like additions and multiplications.
* **conditional execution**: Check for certain conditions and execute the appropriate code.
* **repetition**: Perform some action repeatedly, usually with some variation.

Believe it or not, that is pretty much all there is to it.
Every program you've ever used, no matter how complicated, is made up of instructions that look pretty much like these.
So you can think of programming as the process of breaking a large complex task into smaller and smaller subtasks until the subtasks are simple enough to be reduced to one of these basic instructions.

Formal and natural language
---------------------------

:Natural languages: They are languages people speak, such as English or French.
   They were not designed by people and evolve naturally.
:Formal languages: They are languages that are designed by people for specific applications.
   For instance, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols.
   Chemists use a formal language to represent the chemical structure of molecules.
   And most importantly: **Programming languages are formal languages that have been designed to express computations.**

Formal languages tend to have strict syntax rules.
For instance, "3 + 3 = 6" is a syntactically correct mathematical statement, but "3 + = 3$6" is not.
"|H2O|" is a syntactically correct chemical formula, but ":sub:`2`\ Zz" is not.

Syntax rules come in two flavors, pertaining to **tokens** and **structure**.
Tokens are the basic elements of the language, such as words, numbers, and chemical elements.
One of the problems with "3 + = 3$6" is that "$" is not a legal token in mathematics (at least as far as I know).
Similarly, ":sub:`2`\ Zz" is not legal because there is no element with the abbreviation "Zz".

The second type of syntax rule pertains to the structure of a statement; that is, the way the tokens are arranged.
The statement "3 + = $" is illegal because even though "+" and "=" are legal tokens, you can't have one right after the other.
Similarly, in a chemical formula the subscript comes after the element name, not before [thinkpython]_.

Writing Python programs
-----------------------

Python code can be written using any plain text editor that can load and save files in either the ``ASCII`` or the ``UTF-8`` Unicode character encoding.
To edit your Python files, it is often easier to use a source code editor or an IDE (Integrated Development Environment) like:

* IDLE (an IDE provided with Python);
* Vim;
* Emacs;
* gedit;
* NEdit;
* Eclipse;
* PyCharm;
* and so on...

Some of these tools can highlight the syntax of your code, helping you read it and spot syntax errors.
Some of them also help you adopt a clean layout or avoid typos.

.. note::
   The default character encoding is **UTF-8** for Python 3.

.. warning::
   Word or LibreOffice are **NOT** text editors. Never use them to edit Python code.

Python source code files normally have a ``.py`` extension, although on some Unix systems they may not need any extension, and Python GUI (Graphical User Interface) programs have a ``.pyw`` extension on Mac and Windows.

.. note::
   Unlike most other programming languages, Python uses indentation to signify its block structure.
   Since blocks are indicated using indentation, the question that naturally arises is "What kind of indentation?"
   The Python style guidelines (PEP 8) recommend four spaces per level of indentation, and only spaces (no tabs).
   Most modern text editors can be set up to handle this automatically (IDLE's editor does this, of course, and so do most other Python-aware editors).
   Python will work fine with any number of spaces or with tabs or with a mixture of both, provided that the indentation used is consistent.

   In this course, we follow the official Python guidelines.
   Therefore, we recommend that you set your editor to insert 4 spaces when you press the "tab" key.

Executing code
--------------

Different types of programming languages have different ways of being executed.

Compiled languages
^^^^^^^^^^^^^^^^^^

Some languages, like C, are first transformed (**compiled**) by a program called a **compiler** into an executable file containing instructions in a **binary** format (a sequence of zeroes and ones that a human is normally not able to read).
The executable can later be directly executed by the processor of the computer for which it was compiled.

.. figure:: _static/figs/compile.png
   :height: 85px
   :align: center
   :alt: compiling work stream
   :figclass: align-center

   A compiler transforms source code into object code, which is run by a hardware executor.

Compiling may take some time, but a good compiler will be able to optimize some parts of the code to improve the use of the processor.
This will save time each time the program runs.

Interpreted languages
^^^^^^^^^^^^^^^^^^^^^

Other programming languages, like Python, are **interpreted** by a program called an **interpreter**, which generates binary instructions for the processor "on the fly".

.. figure:: _static/figs/interpret.png
   :height: 85px
   :align: center
   :alt: interpreting work stream
   :figclass: align-center

   An interpreter processes the program a little at a time, alternately reading lines and performing computations.

There is no need to perform an intermediate compilation step.
An advantage is that interpreters typically offer an interactive mode, in which one can quickly test pieces of code.
This also makes the process of developing a program in an interpreted language more "fluid", in the sense that modifications of the code can be tested more rapidly than with a compiled language.

However, this leaves fewer opportunities for optimization.
Programs written in an interpreted language will therefore usually not be as efficient as programs written in a compiled language.

Given the interpreted nature of Python, there are two main ways of executing code:

* Giving a file containing the code to the interpreter.
* Typing code in the interpreter in an interactive mode.

| The first way can be achieved by typing ``python3 path/to/the/source/code.py`` in a command-line terminal.
| You obtain an interactive interpreter if you just execute the ``python3`` command, without any argument.

Actually, when Python code is executed, it is first compiled into **bytecode**: the internal representation of a Python program for the interpreter.
When **modules** (that is, code saved in separate files) are imported, the corresponding bytecode may be saved (in a ``__pycache__`` directory).
The goal is to speed up execution when the source code of the module has not been modified since the last execution.
This still does not entail as many optimizations as with a typical compiled language, because optimizations would delay execution and defeat the purpose of Python being an interpreted language.

.. figure:: _static/figs/byte_code.png
   :height: 85px
   :align: center
   :alt: bytecode work stream
   :figclass: align-center

   The actual Python code is compiled into Python bytecode.
   The bytecode is interpreted.

Exercises
=========

Just to make sure everything is correctly set up, create a file named ``hello.py`` with the editor of your choice, containing the following line::

    print("Hello World!")

.. note::
   ``print`` is a Python **function** that displays some text.
   Text in Python is written between quotes.
   The execution of the above code should just display ``Hello World!``

Now execute your program by giving it to the Python interpreter::

    $ python3 hello.py
    Hello World!

With interpreted languages, it is possible to add a special line (often called a "shebang") at the beginning of the source code that specifies which interpreter should be used to execute the code.
Modify the ``hello.py`` file to add such a line::

    #!/usr/bin/env python3
    print("Hello World!")

In Unix-based systems (like Linux and Mac OSX), you can then make the file executable by adding the execution permissions::

    $ chmod +x hello.py

You can then directly execute the file::

    $ ./hello.py
    Hello World!

Note that the file was referred to by its relative path.
Therefore, the above will only work when you type the command from the directory containing the file.
If you want to execute a Python script from a different location, you need to adapt the path accordingly.
You may also use the absolute path of the file.

For programs that you intend to use from various locations, it may make your work easier to place the script in a directory listed in the ``PATH`` environment variable.
You will then only have to call the program using its base name::

    $ hello.py
    Hello World!

.. note::
   On some Unix-based systems, the presence of a ``bin`` folder in the user home directory (``${HOME}/bin``) is automatically checked at shell startup (when you open a command-line terminal, for instance), and this folder is added to the ``PATH`` environment variable.
   This could be a place to put your most used Python programs.
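As a small illustration of the ``PATH`` lookup described above, here is a minimal Python sketch.
It lists the directories in ``PATH`` and reports whether a given directory (``~/bin`` is only used as an example) is among them.
The helper names ``path_directories`` and ``is_on_path`` are made up for this example, and the code relies on the ``os`` and ``shutil`` modules of the standard library, which have not been introduced yet at this point of the course.

```python
import os
import shutil


def path_directories(path_value):
    """Split a PATH-style string into its individual directories."""
    return [entry for entry in path_value.split(os.pathsep) if entry]


def is_on_path(directory, path_value):
    """Tell whether `directory` is one of the entries of `path_value`."""
    target = os.path.normpath(os.path.expanduser(directory))
    return any(
        os.path.normpath(os.path.expanduser(entry)) == target
        for entry in path_directories(path_value)
    )


if __name__ == "__main__":
    current_path = os.environ.get("PATH", "")
    # Show every directory the shell searches for commands.
    for entry in path_directories(current_path):
        print(entry)
    # "~/bin" is an assumption: adapt it to the directory you actually use.
    print("~/bin on PATH:", is_on_path("~/bin", current_path))
    # shutil.which returns the full path of an executable command, or None.
    print("hello.py found at:", shutil.which("hello.py"))
```

If the last line prints ``None``, the script is not in any ``PATH`` directory (or is not executable), and it has to be called with its relative or absolute path.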
Python Documentation
====================

On the web
----------

The Python website contains all the documentation needed for Python programming, for all supported versions.
This is the place to refer to if we need first-hand documentation about the language or the standard library.

Some "Q&A" websites are very useful:

* Stack Overflow is not Python specific, but for *professional and enthusiast programmers*.
* Biostars and Stack Exchange Bioinformatics are not Python specific, but focused on *bioinformatics questions*.

Be sure to check the posting recommendations of these sites before asking questions there.

On the command line
-------------------

Python comes with the executable ``pydoc`` (or ``pydoc3``, to be sure to have the one for Python 3), which provides help about Python.
In a terminal, just type ``pydoc3`` followed by any module, keyword, or topic::

    $ pydoc3 print

(Press ``q`` to exit.)

In the interpreter
------------------

We can also access documentation interactively in a Python interpreter: just type ``help()`` for interactive help, or ``help(object)`` for help about ``object``::

    $ python3
    Python 3.8.0a2 (default, Mar 6 2019, 14:42:50)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> help()

    Welcome to Python 3.8's help utility!

    If this is your first time using Python, you should definitely check out
    the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

    Enter the name of any module, keyword, or topic to get help on writing
    Python programs and using Python modules.  To quit this help utility and
    return to the interpreter, just type "quit".

    To get a list of available modules, keywords, symbols, or topics, type
    "modules", "keywords", "symbols", or "topics".  Each module also comes
    with a one-line summary of what it does; to list the modules whose name
    or summary contain a given string such as "spam", type "modules spam".

    help>

Summary
=======

Python is an interpreted language.
It can be used interactively in the interpreter, or the interpreter can execute the source code.
Source code has to be written using a text editor or an IDE.
We will use Python 3 for the rest of this course.

References
==========

.. [thinkpython] http://www.greenteapress.com/thinkpython/
.. [prog_in_python3] Mark Summerfield, Programming in Python 3 (Addison-Wesley): http://www.qtrac.eu/py3book.html
.. .. [python_2012] Is python a interpreted or compiled language? https://mail.python.org/pipermail/python-list/2012-June/625578.html
.. .. [python_glossary] https://docs.python.org/2.7/glossary.html
.. .. [Comparison_of_programming_paradigms] http://en.wikipedia.org/wiki/Comparison_of_programming_paradigms
.. .. [Functional_programming] http://en.wikipedia.org/wiki/Functional_programming
.. .. [Object-oriented_programming] http://en.wikipedia.org/wiki/Object-oriented_programming
.. .. [python_history] http://en.wikipedia.org/wiki/History_of_Python
.. [python2vs3] https://wiki.python.org/moin/Python2orPython3
.. [biopython_news] https://raw.githubusercontent.com/biopython/biopython/master/NEWS.rst

.. |H2O| replace:: H\ :sub:`2`\ O