TC, BN, JBM, AZ

# Motivation

Matplotlib is a general-purpose 2D/3D plotting library that many scientific Python libraries, including Pandas and Scikit-learn, rely on. Its syntax is fairly consistent, and it can produce a wide range of high-quality figures.

Resources: http://matplotlib.org/

## Installation

conda install matplotlib

# Convention

import matplotlib
import matplotlib.pyplot as plt
# OR
import pylab


## Using matplotlib

#### 1. In the notebook, run this command to import all functionality into the namespace and display figures inline automatically:

In [6]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


#### 2. In a script or shell:

import pylab


#### 3. In IPython, start the shell as follows:

ipython --pylab

### Notes for fine tuning

In [7]:
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10,7.5)
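As a minimal sketch of how such defaults can be managed, the snippet below sets the figure size globally and then overrides it temporarily with `matplotlib.rc_context`. The sizes and the `font.size` entry are illustrative assumptions, not values taken from this notebook.

```python
import matplotlib
import matplotlib.pyplot as plt

# Global default: every new figure is 10 x 7.5 inches
matplotlib.rcParams['figure.figsize'] = (10, 7.5)

# Temporary override, restored automatically when the block exits
with matplotlib.rc_context({'figure.figsize': (4, 3), 'font.size': 14}):
    small_fig = plt.figure()

default_fig = plt.figure()
print(small_fig.get_size_inches())    # size set inside the context
print(default_fig.get_size_inches())  # back to the global default
```

A context manager is convenient when only one figure needs different settings, because the global defaults do not have to be saved and restored by hand.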


# A toy data set using Bioservices and Pandas

### A dataframe with drug/compound information

In [8]:
# pip install bioservices should work
from bioservices import ChEMBL

In [9]:
chembl = ChEMBL()

In [10]:
Ncomp = 10
res = chembl.get_compounds_by_chemblId(
['CHEMBL%s' % i for i in range(Ncomp)])

In [11]:
# some results are not found and tagged as 404 integer; let us remove them
res = [x for x in res if x != 404]

In [12]:
res[0]

Out[12]:
{'compound': {'acdLogd': 7.67,
'acdLogp': 7.67,
'alogp': 3.63,
'chemblId': 'CHEMBL1',
'knownDrug': 'No',
'molecularFormula': 'C32H32O8',
'molecularWeight': 544.59,
'numRo5Violations': 1,
'passesRuleOfThree': 'No',
'rotatableBonds': 2,
'smiles': 'COc1ccc2[C@@H]3[C@H](COc2c1)C(C)(C)OC4=C3C(=O)C(=O)C5=C4OC(C)(C)[C@@H]6COc7cc(OC)ccc7[C@H]56',
'stdInChiKey': 'GHBOEFUAGSHXPO-XZOTUCIWSA-N'}}
In [13]:
# keep only the 'compound' dictionary of each result to make things easier
res = [x['compound'] for x in res]

In [14]:
import pandas as pd
df = pd.DataFrame(res)


Obtaining compounds from ChEMBL takes time. The example above retrieves only 10 compounds; a larger set of 5,000 compounds is available in JSON and CSV format in ./data/chembl.json and ./data/chembl.csv.

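# In pure Python, JSON can be loaded as follows:
import json
with open("data/chembl.json") as fh:
    compounds = json.load(fh)

# With Pandas, you may use (assuming the file layout is one read_json accepts)
import pandas as pd
df = pd.read_json("data/chembl.json")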

In [16]:
df = pd.read_csv("data/chembl.csv")
df = df[["alogp", "molecularWeight"]]
# drop rows with missing values; recent Pandas requires axis as a keyword
df = df.dropna(axis=0)
X, Y = df.alogp, df.molecularWeight


See the ChEMBL and Pandas notebooks for more details about using Pandas.

# Quick way to get the sample data set

In [17]:
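import numpy as np
# `data` is not defined in this notebook; as an assumption, build it here as
# a two-column array (alogp, molecularWeight) from the dataframe loaded above
data = df[["alogp", "molecularWeight"]].to_numpy()
X = data[:, 0]
Y = data[:, 1]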


# Concepts and terminology

### Figure

The figure is like a canvas where all your Axes (plots) are drawn. A figure can contain several Axes. For now, we will use only one.

### Axes

This is what you think of as ‘a plot’.

• The Axes contains two (or three in the case of 3D) Axis objects
• Each Axes has a title
• Each Axes can contain a legend

### Axis

These are the number-line-like objects.
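The relation between Figure, Axes and Axis can be inspected directly. The following is a small sketch using the `plt.subplots` helper; the plotted numbers and the title are placeholders.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding a single Axes (the plot)
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# An Axes carries a title and, optionally, a legend
ax.set_title('One Axes in one Figure')

# Each 2D Axes owns two Axis objects: ax.xaxis and ax.yaxis
print(type(fig).__name__)        # Figure
print(len(fig.axes))             # 1
print(type(ax.xaxis).__name__)   # XAxis
print(type(ax.yaxis).__name__)   # YAxis
```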

### Labels

A label names an Axis. A 2D plot has two labels, the x label and the y label, set with xlabel and ylabel.

### Ticks

Ticks are the marks on an Axis, and tick labels are the strings that label them. There are two kinds of ticks: major and minor. By default the Axis generates them automatically, but they can be configured.
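One possible way to configure ticks is sketched below, using locators from `matplotlib.ticker` for the x Axis and explicit positions and labels for the y Axis. The spacing values and the label strings are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.plot(range(11), [v ** 2 for v in range(11)])

# Major ticks every 2 units, minor ticks every 0.5 units on the x Axis
ax.xaxis.set_major_locator(MultipleLocator(2))
ax.xaxis.set_minor_locator(MultipleLocator(0.5))

# Explicit tick positions and tick labels on the y Axis
ax.set_yticks([0, 50, 100])
ax.set_yticklabels(['low', 'mid', 'high'])
```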

## Coding Styles

Matplotlib has two coding styles:

• MATLAB style (functional)
plot(...)
xlabel(...)
title(...)

• object-oriented style
fig, ax = subplots()
ax.plot(...)
ax.set_xlabel(...)
ax.set_title(...)

Both styles are valid. The MATLAB style is concise for quick interactive work, while the object-oriented style is more explicit, which helps when a figure contains several Axes.
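To make the comparison concrete, here is the same plot written in both styles, as a sketch with placeholder data. Note that the Axes methods for labels and titles carry a `set_` prefix (`set_xlabel`, `set_title`), unlike their pyplot counterparts.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 50)

# MATLAB style: functions act on the implicit "current" Axes
plt.figure()
plt.plot(t, t ** 2)
plt.xlabel('t')
plt.title('MATLAB style')

# Object-oriented style: methods are called on an explicit Axes
fig, ax = plt.subplots()
ax.plot(t, t ** 2)
ax.set_xlabel('t')
ax.set_title('Object-oriented style')
```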

# Matplotlib by examples

## The plot function (2 variables)

In [399]:
plot(X, Y)

Out[399]:
[<matplotlib.lines.Line2D at 0x7f63ada6e588>]
In [400]:
# Let us make it nicer by providing a marker:
plot(X, Y, marker='o')

Out[400]:
[<matplotlib.lines.Line2D at 0x7f63aea584a8>]
In [401]:
# let us also remove the lines
plot(X, Y, marker='o', linestyle='')

Out[401]:
[<matplotlib.lines.Line2D at 0x7f63add86860>]
In [402]:
# a shortcut for color, marker and line style is a format string given as
# the third positional argument. Here b means blue and o means circle
# marker; no line style character is given, so no line is drawn
plot(X, Y, 'bo')

Out[402]:
[<matplotlib.lines.Line2D at 0x7f63c0be4208>]

Look into the documentation to find the available codes for

• the color (e.g., k for black, r for red, ...)
• the marker (e.g., o for circles, x for crosses, s for squares)
• the line style (e.g., - for solid lines, -- for dashed lines)
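The codes listed above combine into a single format string of the form `'[marker][line][color]'`, where each part is optional. The sketch below tries a few combinations on placeholder curves.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 30)
fig, ax = plt.subplots()

solid, = ax.plot(t, np.sin(t), 'k-')          # black solid line
dashed, = ax.plot(t, np.cos(t), 'r--')        # red dashed line
squares, = ax.plot(t, np.sin(2 * t), 'gs')    # green squares, no line
both, = ax.plot(t, np.cos(2 * t), 'bx:')      # blue crosses joined by a dotted line

# The same settings can be given as explicit keyword arguments
ax.plot(t, 0.5 * np.sin(t), color='k', marker='o', linestyle='--')
```

Keyword arguments are more verbose but easier to read, and they accept any color specification rather than only the single-letter codes.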
In [403]:
plot(X, Y, color='red', marker='s', linewidth=0)

Out[403]:
[<matplotlib.lines.Line2D at 0x7f63b20ce518>]
In [404]:
plot(X, Y, color="brown", marker="v", lw=0)

Out[404]:
[<matplotlib.lines.Line2D at 0x7f63aea4bd30>]

## Plot function (1 variable) and the hold function

In [405]:
# You can provide just 1 variable as an input to the plot function;
# the values are then plotted against their index (0, 1, 2, ...):
plot(X, "b")

Out[405]:
[<matplotlib.lines.Line2D at 0x7f63aeaed2b0>]
In [406]:
plot(X, "b")
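# successive plot calls draw on the same Axes (the former hold=True default;
# the hold function itself was removed in Matplotlib 3.0)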
plot(Y/20, "r")

Out[406]:
[<matplotlib.lines.Line2D at 0x7f63ae9ecfd0>]
In [407]:
plot(X, "b")
# hold(False) was deprecated and later removed; clear the figure instead
clf()
plot(Y/20, "r")

Out[407]:
[<matplotlib.lines.Line2D at 0x7f63aeaa1400>]
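With the object-oriented interface, the same overlay and clear behaviour can be sketched as follows. Here `ax.clear()` empties a single Axes, whereas `clf()` clears the whole current figure; the array `a` is placeholder data.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.arange(10)

# Successive calls on the same Axes overlay their lines
fig, ax = plt.subplots()
ax.plot(a, 'b')
ax.plot(a / 2, 'r')
n_overlaid = len(ax.lines)

# Clearing the Axes removes what was drawn before
ax.clear()
ax.plot(a / 2, 'r')
n_after_clear = len(ax.lines)

print(n_overlaid, n_after_clear)  # 2 1
```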

## xlabel and ylabel

In [408]:
# note that color may be provided as hexadecimal values
plot(X, Y, 'o', markersize=8, color='#ffaa11')
xlabel('alogp')
ylabel('molecularWeight')

Out[408]:
<matplotlib.text.Text at 0x7f63ae9c1d30>
In [415]:
# a synthetic signal for later use: the sum of two cosines
x = linspace(0, 10, 1000)
y1 = cos(4 * pi * x / 10)
y2 = 0.4 * cos(8 * pi * x / 10)
plot(x, y1 + y2)
xlabel('$t$', fontsize=16)
# Note r"" is a raw string, which keeps the backslashes of the LaTeX code intact
_ = ylabel(r'$Y(t) = \cos(\frac{4 \pi t}{10}) + 0.4 \cos(\frac{8 \pi t}{10})$',
           fontsize=16)
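The effect of the raw-string prefix can be checked with a short sketch. In an ordinary string, `\t` and `\f` are escape sequences, so LaTeX commands such as `\theta` and `\frac` are corrupted before Matplotlib sees them. The label text is an arbitrary example.

```python
import matplotlib.pyplot as plt

# In an ordinary string, '\t' is a tab and '\f' is a form feed
plain = '$\theta = \frac{1}{2}$'
# A raw string keeps every backslash
raw = r'$\theta = \frac{1}{2}$'

print('\\theta' in plain, '\\frac' in plain)  # False False
print('\\theta' in raw, '\\frac' in raw)      # True True

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_xlabel(raw, fontsize=16)
```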


## grid function; markersize, fontsize, alpha parameters

In [416]:
# mec -> markeredgecolor
plot(X, Y, 'o', markersize=8, mec="k")
xlabel('alogp')
ylabel('molecularWeight')
grid(True)

In [417]:
# a bit more tuning on the grid and fontsize
plot(X, Y, 'o', markersize=8, alpha=0.5, mec="k")
xlabel('alogp', fontsize=20)
ylabel('molecularWeight', fontsize=25, color='red')
grid(color='r', linewidth=2, linestyle='--', alpha=0.5)
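The grid can also be styled separately for major and minor ticks. The sketch below uses the object-oriented interface and random placeholder points (not the ChEMBL data), so that it runs without the CSV file; the colors and line widths are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Random placeholder points, not the ChEMBL data
px, py = rng.normal(size=(2, 200))

fig, ax = plt.subplots()
ax.plot(px, py, 'o', markersize=8, alpha=0.5, mec='k')
ax.set_xlabel('x', fontsize=20)
ax.set_ylabel('y', fontsize=20)

# Major grid: dashed and semi-transparent; minor grid: thin dotted lines
ax.grid(which='major', color='r', linewidth=1.5, linestyle='--', alpha=0.5)
ax.minorticks_on()
ax.grid(which='minor', color='gray', linewidth=0.5, linestyle=':')
```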