TC, BN, JBM, AZ

Motivation

Matplotlib is a general-purpose 2D/3D plotting library that many Python scientific libraries, including Pandas and Scikit-learn, build on. Its syntax is fairly consistent, and it can produce a wide range of high-quality figures.

Resources: http://matplotlib.org/

Installation

conda install matplotlib

Convention

import matplotlib
import matplotlib.pyplot as plt
# OR
import pylab

Using matplotlib

1. In the notebook, run this magic command to import the NumPy and Matplotlib functions into the namespace and display figures inline automatically:

In [6]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

2. In a script or shell:

import pylab

3. In IPython, start the shell as follows:

ipython --pylab

Notes for fine tuning

In [7]:
import matplotlib 
matplotlib.rcParams['figure.figsize'] = (10,7.5)
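
Setting rcParams as above changes the default for every figure created afterwards. If the change should apply to a few figures only, a temporary override is one option. The following is a minimal sketch, assuming the explicit `plt` namespace rather than `%pylab`; the variable names are illustrative only.

```python
import matplotlib
import matplotlib.pyplot as plt

# Remember the current default so the change can be observed
default_size = tuple(matplotlib.rcParams["figure.figsize"])

# Inside the context manager, new figures use the temporary size
with matplotlib.rc_context({"figure.figsize": (10, 7.5)}):
    fig = plt.figure()
    size_inside = tuple(fig.get_size_inches())

# Outside, the previous setting is restored automatically
size_after = tuple(matplotlib.rcParams["figure.figsize"])
plt.close(fig)
```

Here `size_inside` holds the temporary size while `size_after` is back to the default.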

A toy data set using Bioservices and Pandas

A dataframe with drug/compound information

In [8]:
# pip install bioservices should work
from bioservices import ChEMBL
In [9]:
chembl = ChEMBL()
In [10]:
Ncomp = 10
res = chembl.get_compounds_by_chemblId(
    ['CHEMBL%s' % i for i in range(Ncomp)])
In [11]:
# some results are not found and tagged as 404 integer; let us remove them
res = [x for x in res if x != 404]
In [12]:
res[0]
Out[12]:
{'compound': {'acdLogd': 7.67,
  'acdLogp': 7.67,
  'alogp': 3.63,
  'chemblId': 'CHEMBL1',
  'knownDrug': 'No',
  'molecularFormula': 'C32H32O8',
  'molecularWeight': 544.59,
  'numRo5Violations': 1,
  'passesRuleOfThree': 'No',
  'rotatableBonds': 2,
  'smiles': 'COc1ccc2[C@@H]3[C@H](COc2c1)C(C)(C)OC4=C3C(=O)C(=O)C5=C4OC(C)(C)[C@@H]6COc7cc(OC)ccc7[C@H]56',
  'stdInChiKey': 'GHBOEFUAGSHXPO-XZOTUCIWSA-N'}}
In [13]:
# select compound to make things easier
res = [x['compound'] for x in res]
In [14]:
import pandas as pd
df = pd.DataFrame(res)

Obtaining compounds from ChEMBL takes time. The example above retrieves only 10 compounds; a larger set of 5,000 compounds is available in JSON and CSV format in ./data/chembl.json and ./data/chembl.csv.

# In pure Python, JSON can be loaded as follows:
import json
with open("data/chembl.json") as fh:
    data_dictionary = json.load(fh)
# With Pandas, you may use
import pandas as pd
df = pd.read_json("data/chembl.json")
In [16]:
df = pd.read_csv("data/chembl.csv")
df = df[["alogp", "molecularWeight"]]
df.dropna(axis=0, inplace=True)  # drop rows with missing values
X, Y = df.alogp, df.molecularWeight

See the ChEMBL and Pandas notebooks for more details about using Pandas.

Quick way to get the sample data set

In [17]:
import numpy as np
data = np.loadtxt("data/sample_for_pylab.csv", delimiter=",")
X = data[:,0]
Y = data[:,1]
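
If the CSV file is not at hand, a synthetic stand-in lets you run the plotting examples anyway. This is only a sketch: the numbers below are made up and do not come from ChEMBL, and the sample size, seed and distribution parameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the figures are reproducible
n_points = 500                   # arbitrary sample size

# Hypothetical stand-ins: X plays the role of alogp, Y of molecularWeight
X = rng.normal(loc=3.0, scale=1.5, size=n_points)
Y = 300 + 40 * X + rng.normal(scale=60, size=n_points)
```

The two arrays have the same length, which is all the plotting functions used later require.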

Concepts and terminology

Figure

The figure is like a canvas where all your Axes (plots) are drawn. A figure can contain several Axes. For now, we will use only one.

Axes

This is what you think of as ‘a plot’.

  • The Axes contains two (or three in the case of 3D) Axis objects
  • Each Axes has a title
  • Each Axes can contain a legend

Axis

These are the number-line-like objects.

Labels

This is the caption of an Axis. A 2D plot has two labels, the x label and the y label, set with xlabel and ylabel.

Ticks

Ticks are the marks on an Axis, and tick labels are the strings that label them. There are two kinds of ticks: major and minor. By default they are generated automatically by the Axis, but they can be configured.
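
The terms above map directly onto objects. The following is a small sketch of that hierarchy, written with the explicit `plt` namespace (an assumption; the notebook itself uses `%pylab`); the data and label texts are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()              # one Figure containing one Axes
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

ax.set_title("Axes title")            # each Axes has its own title
ax.set_xlabel("x label")              # labels are attached to the Axis objects
ax.set_ylabel("y label")

# Major ticks every 1 unit and minor ticks every 0.25 unit on the x Axis
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_minor_locator(MultipleLocator(0.25))
```

The Figure holds the Axes (`fig.axes`), and the Axes holds its two Axis objects (`ax.xaxis`, `ax.yaxis`), which in turn own the labels and ticks.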

Coding Styles

Matplotlib has two coding styles:

  • MATLAB style (functional)
plot(...)
xlabel(...)
title(...)
  • object-oriented style
fig = figure()
ax = fig.add_subplot(111)
ax.plot(...)
ax.set_xlabel(...)
ax.set_title(...)

Both styles are valid. The functional style is shorter for quick interactive work, while the object-oriented style makes it explicit which Axes each call modifies.
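
To make the comparison concrete, here is the same plot written in both styles. This is a sketch with placeholder data, using the explicit `plt` namespace rather than `%pylab`.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# MATLAB-like style: pyplot keeps track of the current figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.xlabel("x")
plt.title("pyplot style")
ax_implicit = plt.gca()               # the Axes that pyplot created for us

# Object-oriented style: the figure and axes are handled explicitly
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel("x")
ax.set_title("object-oriented style")
```

Note the naming difference: `xlabel` and `title` in the functional style become `set_xlabel` and `set_title` on the Axes object.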

Matplotlib by examples

The plot function (2 variables)

In [399]:
plot(X, Y)
Out[399]:
[<matplotlib.lines.Line2D at 0x7f63ada6e588>]
In [400]:
# Let us make it nicer by providing a marker:
plot(X, Y, marker='o')
Out[400]:
[<matplotlib.lines.Line2D at 0x7f63aea584a8>]
In [401]:
# let us also remove the lines
plot(X, Y, marker='o', linestyle='')
Out[401]:
[<matplotlib.lines.Line2D at 0x7f63add86860>]
In [402]:
# a shortcut for the color, marker and linestyle is a format string given
# as third positional argument. Here b means blue and o means circle marker;
# no linestyle character is given, so no line is drawn
plot(X, Y, 'bo')
Out[402]:
[<matplotlib.lines.Line2D at 0x7f63c0be4208>]

Look into the documentation to find out the available values for:

  • the color (e.g., k for black, r for red,...)
  • the marker (e.g., o for circles, x for crosses, s for square)
  • the line style (e.g., - for solid lines, -- for dashed lines)
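
The three kinds of characters listed above can be combined in a single format string. A short sketch with placeholder data, using the explicit `plt` namespace:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

(solid,) = ax.plot(x, x, "k-")          # black, solid line, no marker
(dashed,) = ax.plot(x, x + 2, "r--")    # red, dashed line
(squares,) = ax.plot(x, x + 4, "gs")    # green squares, no line
(both,) = ax.plot(x, x + 6, "bx:")      # blue crosses joined by a dotted line
```

Each call returns a list with one Line2D object, which is why the results are unpacked with `(name,) = ...`.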
In [403]:
plot(X, Y, color='red', marker='s', linewidth=0)
Out[403]:
[<matplotlib.lines.Line2D at 0x7f63b20ce518>]
In [404]:
plot(X, Y, color="brown", marker="v", lw=0)
Out[404]:
[<matplotlib.lines.Line2D at 0x7f63aea4bd30>]

Plot function (1 variable) and the hold function

In [405]:
# You can provide just 1 variable as an input to the plot function:
plot(X, "b")
Out[405]:
[<matplotlib.lines.Line2D at 0x7f63aeaed2b0>]
In [406]:
plot(X, "b")
# successive plot calls in the same cell draw on the same axes
plot(Y/20, "r")
Out[406]:
[<matplotlib.lines.Line2D at 0x7f63ae9ecfd0>]
In [407]:
plot(X, "b")
# hold(False) is deprecated; clf() clears the current figure instead
clf()
plot(Y/20, "r")
Out[407]:
[<matplotlib.lines.Line2D at 0x7f63aeaa1400>]
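
The hold function was deprecated in Matplotlib 2.0 and removed in 3.0, so it is safer not to rely on it. One way to get both behaviours without hold or clf is to choose the Axes explicitly. A sketch with placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(50)

# Overlay: both calls target the same Axes, so both curves are kept
fig_overlay, ax_overlay = plt.subplots()
ax_overlay.plot(t, "b")
ax_overlay.plot(t / 20, "r")

# Separate: each curve gets its own Figure, so nothing has to be cleared
fig_a, ax_a = plt.subplots()
ax_a.plot(t, "b")
fig_b, ax_b = plt.subplots()
ax_b.plot(t / 20, "r")
```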

xlabel and ylabel

In [408]:
# note that color may be provided as hexadecimal values
plot(X, Y, 'o', markersize=8, color='#ffaa11')
xlabel('alogp')
ylabel('molecularWeight')
Out[408]:
<matplotlib.text.Text at 0x7f63ae9c1d30>
In [415]:
# a synthetic signal to illustrate LaTeX in labels
x = linspace(0, 10, 1000)
y1 = cos(4 * pi * x / 10)
y2 = 0.4 * cos(8 * pi * x / 10)
plot(x, y1 + y2)
xlabel('$t$', fontsize=16)
# Note: r"" marks a raw string, which makes LaTeX code more robust
_ = ylabel(r'$Y(t) = \cos(\frac{4\pi t}{10}) + 0.4\cos(\frac{8\pi t}{10})$',
       fontsize=16)

grid function; markersize, fontsize, alpha parameters

In [416]:
# mec -> markeredgecolor
plot(X, Y, 'o', markersize=8, mec="k")
xlabel('alogp')
ylabel('molecularWeight')
grid(True)
In [417]:
# a bit more tuning on the grid and fontsize 
plot(X, Y, 'o', markersize=8, alpha=0.5, mec="k")
xlabel('alogp', fontsize=20)
ylabel('molecularWeight',fontsize=25, color='red')
grid(color='r', linewidth=2, linestyle='--', alpha=0.5)

loglog, semilog

In [418]:
loglog(X, Y, 'or', alpha=0.5, markersize=8, mec="k")
Out[418]:
[<matplotlib.lines.Line2D at 0x7f63ae26b2b0>]
In [419]:
semilogy(X, Y, 'y*', markersize=20, alpha=.5, mec="k")
grid() # works also with log scale

Histogram

In [420]:
hist(X, edgecolor="k")
grid()
_ = title('alogp')
In [421]:
# density=True replaces the normed=True keyword of older Matplotlib releases
x, y, z = hist(X, bins=30, density=True, ec="k")
grid(alpha=0.5)
title('alogp')
Out[421]:
<matplotlib.text.Text at 0x7f63ad6dd358>
In [422]:
x, y, z = hist(Y, bins=30, density=True, alpha=0.5, ec="k")
x, y, z = hist(Y+200, bins=30, density=True, alpha=0.5, ec="k")
In [423]:
x, y, z = hist([Y, Y+200], bins=30, density=True)
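
The normalised histograms above (density=True, called normed in older Matplotlib releases) rescale the bars so that their total area is 1, which is what makes two data sets of different sizes comparable. The check below is a sketch on synthetic data; it uses np.histogram, which applies the same normalisation.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(size=1000)          # synthetic data, for illustration only

# Bar heights and bin edges of a normalised histogram
heights, edges = np.histogram(sample, bins=30, density=True)
widths = np.diff(edges)

# Sum of (height x width) over all bars: equals 1 up to rounding error
total_area = float(np.sum(heights * widths))
```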

plot a counter (dictionary)

Maybe you already know the values of the histogram (e.g., from a counter)

In [424]:
from bioservices import UniProt
u = UniProt()
fasta = u.get_fasta_sequence('P43403')
Will be moved to BioKit github.com/biokit
In [425]:
from collections import Counter
counter = Counter(fasta)
In [426]:
counter
Out[426]:
Counter({'A': 54,
         'C': 17,
         'D': 31,
         'E': 46,
         'F': 18,
         'G': 42,
         'H': 16,
         'I': 24,
         'K': 42,
         'L': 62,
         'M': 21,
         'N': 14,
         'P': 42,
         'Q': 23,
         'R': 37,
         'S': 38,
         'T': 23,
         'V': 29,
         'W': 9,
         'Y': 31})

How should this data be represented?

  • the values of the counter dictionary will be the bar heights (y)
  • the x values should run from 0 to the number of keys minus 1
  • the tick labels on the x-axis should be replaced by the letters (keys)

bar and xticks

In [427]:
# Let us get the values sorted alphabetically 
values = []
for k in sorted(counter.keys()):
    values.append(counter[k])
# list comprehension: 
# values = [counter[k] for k in sorted(counter.keys())]
values    
Out[427]:
[54, 17, 31, 46, 18, 42, 16, 24, 42, 62, 21, 14, 42, 23, 37, 38, 23, 29, 9, 31]
In [428]:
# X and Y must be provided
bar(values)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-428-5c97bce3e389> in <module>()
      1 # X and Y must be provided
----> 2 bar(values)

TypeError: bar() missing 1 required positional argument: 'height'
In [429]:
# we need values for X, let us use a range
xvalues = range(len(values))
In [430]:
bar(xvalues, values, ec="k")
Out[430]:
<Container object of 20 artists>
In [431]:
# xticks() returns the current positions and labels of the ticks
bar(xvalues, values, ec="k")
xt = xticks()
xt
Out[431]:
(array([ -2.5,   0. ,   2.5,   5. ,   7.5,  10. ,  12.5,  15. ,  17.5,
         20. ,  22.5]), <a list of 11 Text xticklabel objects>)
In [432]:
# ticks can be redefined for each bar
bar(xvalues, values, ec="k")
_ = xticks([x for x in xvalues], sorted(counter.keys()), color='red')
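
In recent Matplotlib releases, bar also accepts a list of strings as x values, which handles the positions and the tick labels in one call. The sketch below uses a short made-up sequence rather than the UniProt entry, and the explicit `plt` namespace.

```python
from collections import Counter
import matplotlib.pyplot as plt

counter = Counter("ACDCAAGGKLLLA")       # toy sequence, for illustration only
letters = sorted(counter)                # keys in alphabetical order
values = [counter[k] for k in letters]   # matching counts

fig, ax = plt.subplots()
bars = ax.bar(letters, values, edgecolor="k")   # strings are used as categories
heights = [b.get_height() for b in bars]
```

The explicit xticks approach shown earlier remains useful when the positions are not simply 0, 1, 2, and so on.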

Boxplot

In [433]:
# lambda is a quick way to write a function; this one adds noise to an array
noisify = lambda X: X + 120*randn(len(X))
_ = boxplot([Y, noisify(Y), noisify(Y)])
In [434]:
xticks([1,2], ['alogp', 'molecularWeight'])
Out[434]:
([<matplotlib.axis.XTick at 0x7f63acf90160>,
  <matplotlib.axis.XTick at 0x7f63acf13438>],
 <a list of 2 Text xticklabel objects>)

Here, calling xticks in a new cell created a new figure instead of relabelling the boxplot. As you may already have noticed, in a notebook each cell creates its own figure, so a plotting call and the calls that tune it must be in the same cell.
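
One way around this is to keep a reference to the Axes object, since its methods always act on that Axes whatever the current figure is. The following sketch uses synthetic data and placeholder tick labels.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
a = rng.normal(size=100)                # synthetic data, for illustration only
b = rng.normal(loc=2.0, size=100)

fig, ax = plt.subplots()
ax.boxplot([a, b])

# These calls could live in a later cell: they still modify the same Axes
ax.set_xticks([1, 2])
ax.set_xticklabels(["first", "second"])
```

With the inline backend, the updated figure can be displayed again by putting `fig` on the last line of the later cell.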

In [435]:
_ = boxplot([X, Y])
_ = xticks([1,2], ['alogp', 'molecularWeight'], rotation=90)
In [436]:
_ = boxplot([X, Y, Y], vert=False)
_ = yticks([1,2,3], ['alogp', 'molecularWeight', "dummy"], fontsize=20)
In [437]:
_ = boxplot([X*50, Y, 100+Y, 200+Y], vert=False, patch_artist=True, 
            notch=True)
In [438]:
results = boxplot([X*50, Y, 100+Y, 200+Y], vert=False, patch_artist=True, 
            notch=True); grid()
# let us add a color in the range 0,1 as a function of the means
means= np.array([140.,350,450,550])
means-=140
means /= max(means)

from colormap import Color
c = Color("red")
for i, this in enumerate(means):
    c.rgb = (1,1-this,0)
    results['boxes'][i].set_facecolor(c.hex)
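
The colormap package used above is a third-party dependency. If it is not installed, the same yellow-to-red mapping can be sketched with Matplotlib alone. The data below are synthetic and the choice of the "autumn_r" colormap is an assumption made to mimic the (1, 1-value, 0) colors above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

rng = np.random.default_rng(3)
# Four synthetic groups with increasing means, for illustration only
groups = [rng.normal(loc=m, scale=30, size=200) for m in (140, 350, 450, 550)]

fig, ax = plt.subplots()
results = ax.boxplot(groups, patch_artist=True, notch=True)

means = np.array([g.mean() for g in groups])
norm = Normalize(vmin=means.min(), vmax=means.max())   # rescale the means to [0, 1]
cmap = plt.get_cmap("autumn_r")                        # yellow (low) to red (high)

for box, m in zip(results["boxes"], means):
    box.set_facecolor(to_hex(cmap(norm(m))))
```

Computing the means from the data, rather than typing them in, keeps the colors consistent if the data change.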

scatter plot

In [439]:
scatter(X, Y, s=50, alpha=0.5) # note that it is not markersize but s parameter
grid()
In [440]:
# let us create a third dimension (random value) for the size
# should be same length as X
Z = 300*abs(random.random(len(X)))
In [441]:
# color could be distance to the center
C = sqrt( ((X-X.mean())/X.mean())**2 + ((Y - Y.mean())/Y.mean())**2 )
In [442]:
scatter((X-X.mean())/X.mean(), (Y-Y.mean())/Y.mean(), s=Z, alpha=0.5, c=C, edgecolors="k")
grid(False); cb = colorbar()
cb.solids.set_edgecolor("face")  # better rendering of colorbar

color map

In [443]:
scatter(X, Y, s=Z, alpha=0.5, c=C, cmap='jet', edgecolors="k")
grid(); cb=colorbar()
cb.solids.set_edgecolor("face")

Scatter plot in a polar plane

In [444]:
# Compute areas and colors
N = 150
r = 2 * np.random.rand(N)
theta = 2 * np.pi * np.random.rand(N)
area = 200 * r**2
colors = theta
# Here, we use subplot to get an Axes so that a projection can be set
ax = subplot(111, projection='polar')
c = ax.scatter(theta, r, c=colors, s=area, edgecolors="k")

Histogram in 2 dimensions

In [445]:
out = hist2d(X, Y, bins=40, cmap="YlOrRd")
xlabel('alogp'); ylabel('molecularWeight')
colorbar()
# let us plot the data as well
#plot(X,Y, 'kx', alpha=0.05)
Out[445]:
<matplotlib.colorbar.Colorbar at 0x7f63ac934940>
In [446]:
out = hist2d(X, Y, bins=40, cmap="YlOrRd")
xlabel('alogp'); ylabel('molecularWeight')
colorbar()
# let us plot the data as well
plot(X,Y, 'kx', alpha=0.05)
Out[446]:
[<matplotlib.lines.Line2D at 0x7f63acc0aac8>]

legend

In [447]:
x, y, z = hist(Y, bins=20, color='blue', density=True, cumulative=False,
               alpha=0.5, ec="k", label=r'$\alpha$')
x, y, z = hist(Y+400, bins=20, density=True, color='red', cumulative=False,
               alpha=0.5, ec="k", label=r'$\alpha + 400$')

legend(fontsize=25, ncol=1); 
grid()

Figure and axes

So far we always used one figure and one axes.

Reminder:

  • Technically a figure is a window that pops up.
  • A figure can contain one or more axes
  • So far, each time we've called a matplotlib plotting function (e.g., plot, hist, scatter), a figure was automatically created and then an axes was automatically created.
  • axes and figure can be tuned. For instance, the axes that contains a histogram can be completed with an xlabel, a title and so on.
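
The implicit creation described above can be observed directly: after a plotting call, gcf and gca return the figure and axes that were created behind the scenes. A minimal sketch, using the explicit `plt` namespace:

```python
import matplotlib.pyplot as plt

plt.close("all")                 # start from a clean state
n_before = len(plt.get_fignums())

plt.plot([1, 2, 3], [1, 5, 3])   # implicitly creates a Figure and an Axes

fig = plt.gcf()                  # handle on the current Figure
ax = plt.gca()                   # handle on the current Axes
ax.set_title("added afterwards") # the Axes can be tuned after the fact
ax.set_xlabel("x")
n_after = len(plt.get_fignums())
```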
In [448]:
fig, ax = subplots(2, 2)  # a figure with a 2x2 grid of Axes
ax[0,0].plot([1,2,3], [1,5,3], "o-")
Out[448]:
[<matplotlib.lines.Line2D at 0x7f63acc1da20>]

subplots provides a quick way to create several Axes at once on a regular grid. For full control, fig.add_axes takes a [left, bottom, width, height] rectangle in figure coordinates and can place Axes anywhere, including overlapping ones.

In [449]:
fig = figure(figsize=(10,6), facecolor='#CCCCCC', frameon=True)
ax1 = fig.add_axes([0.1,0.2,0.35,0.4])
ax1.scatter(X, Y, edgecolors="k")
ax2 = fig.add_axes([0.5,0.5,0.35,0.4])
_ = ax2.hist(X, edgecolor="k")
ax2.set_title("histogram")
ax1.set_xlabel('example')
Out[449]:
<matplotlib.text.Text at 0x7f63ac68c128>
In [450]:
from biokit import viz
_ = viz.scatter.ScatterHist(X, Y).plot()

subplots

In [468]:
subplot(2,1,1)
_ = hist(X, ec="k")
subplot(2,1,2)
_ = hist(Y, ec="k")

xlim, ylim, axvline, axhline

In [536]:
hist(randn(100000), ec="k", density=True, bins=20)
xlim([-5,5])
ylim([0, 0.6])
axvline(0, lw=2, color="r", ls="--")
axhline(0.15, lw=2, color="r", ls="--")
Out[536]:
<matplotlib.lines.Line2D at 0x7f63abd6c9b0>

Patches: Example from matplotlib

In [452]:
from matplotlib.patches import Ellipse
NUM = 250

ells = [Ellipse(xy=rand(2)*10, width=rand(), height=rand(), 
                angle=rand()*360) for i in range(NUM)]

fig = figure(); ax = fig.add_subplot(111, aspect='equal')
for e in ells:
    e.set_edgecolor("black")
    e.set_clip_box(ax.bbox)
    e.set_alpha(rand())
    e.set_facecolor(rand(3))
    ax.add_artist(e)
ax.set_xlim(0, 10); ax.set_ylim(0, 10)
Out[452]:
(0, 10)

Conclusions

We've seen

  • plotting functions: plot, hist, hist2d, scatter, bar, boxplot, semilogy, loglog, colorbar
  • axes and figure notions
  • colormap
  • xticks, xlabel, ylabel, title
  • patches (Ellipse)
  • xlim, ylim, axhline, axvline
  • tunable parameters: alpha, fontsize, marker, markersize

There are many more functions. Of interest:

  • meshgrid, griddata
  • 3D plots
  • quiver
  • pcolor (see exercises)
  • imshow (see exercises)

Matplotlib practical session

  • Explore the matplotlib gallery
  • Create a 50x50 array of random values; let us call it A. Set the element in the first row, first column to a significantly larger value. Plot the array using imshow, then pcolor. What are the differences?
  • Create the following image:

In [555]:
N = 100
x = np.linspace(-3.0, 3.0, N)
y = np.linspace(-2.0, 2.0, N)
XX, YY = np.meshgrid(x, y)
def f(X, Y):
    return (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)
# filled contours first, then black contour lines on top
contourf(XX, YY, f(XX, YY), 8, cmap="viridis")
contour(XX, YY, f(XX, YY), 8, colors="black", linewidths=2)
Out[555]:
<matplotlib.contour.QuadContourSet at 0x7f63aaeee710>
In [481]:
data = np.random.randn(50, 50)  # 50x50 array, as stated in the exercise
data[0, 0] = 10
In [556]:
subplot(1,2,1); imshow(data)
subplot(1,2,2); pcolor(data)
Out[556]:
<matplotlib.collections.PolyCollection at 0x7f63acf94518>
In [513]:
X = linspace(-3,3,100)
Y1 = sin(2*pi*X/3.)
Y2 = 0.5*cos(2*pi*X/3.)
plot(X, Y1, "b--", label=r"$\sin{(2\pi X/3)}$")
plot(X, Y2, "r-o", label=r"$0.5\cos{(2\pi X/3)}$")
legend(loc="upper right")
ax = gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['left'].set_position(("data", 0))
ax.spines['bottom'].set_position(("data", 0))
grid()
title("Title in orange", fontsize=20, color="orange")
Out[513]:
<matplotlib.text.Text at 0x7f63ae7749b0>

References:

- matplotlib website
- SciPy Lecture Notes
- pynxton notes