Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
187 views
in Technique[技术] by (71.8m points)

python - Parallel Coordinates plot in Matplotlib

Two and three dimensional data can be viewed relatively straight-forwardly using traditional plot types. Even with four dimensional data, we can often find a way to display the data. Dimensions above four, though, become increasingly difficult to display. Fortunately, parallel coordinates plots provide a mechanism for viewing results with higher dimensions.

Example Parallel Coordinates Plot from Wikipedia

Several plotting packages provide parallel coordinates plots, such as Matlab, R, VTK type 1 and VTK type 2, but I don't see how to create one using Matplotlib.

  1. Is there a built-in parallel coordinates plot in Matplotlib? I certainly don't see one in the gallery.
  2. If there is no built-in-type, is it possible to build a parallel coordinates plot using standard features of Matplotlib?

Edit:

Based on the answer provided by Zhenya below, I developed the following generalization that supports an arbitrary number of axes. Following the plot style of the example I posted in the original question above, each axis gets its own scale. I accomplished this by normalizing the data at each axis point and making the axes have a range of 0 to 1. I then go back and apply labels to each tick-mark that give the correct value at that intercept.

The function works by accepting an iterable of data sets. Each data set is considered a set of points where each point lies on a different axis. The example in __main__ grabs random numbers for each axis in two sets of 30 lines. The lines are random within ranges that cause clustering of lines; a behavior I wanted to verify.

This solution isn't as good as a built-in solution since you have odd mouse behavior and I'm faking the data ranges through labels, but until Matplotlib adds a built-in solution, it's acceptable.

#!/usr/bin/python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

def parallel_coordinates(data_sets, style=None):

    dims = len(data_sets[0])
    x    = range(dims)
    fig, axes = plt.subplots(1, dims-1, sharey=False)

    if style is None:
        style = ['r-']*len(data_sets)

    # Calculate the limits on the data
    min_max_range = list()
    for m in zip(*data_sets):
        mn = min(m)
        mx = max(m)
        if mn == mx:
            mn -= 0.5
            mx = mn + 1.
        r  = float(mx - mn)
        min_max_range.append((mn, mx, r))

    # Normalize the data sets
    norm_data_sets = list()
    for ds in data_sets:
        nds = [(value - min_max_range[dimension][0]) / 
                min_max_range[dimension][2] 
                for dimension,value in enumerate(ds)]
        norm_data_sets.append(nds)
    data_sets = norm_data_sets

    # Plot the datasets on all the subplots
    for i, ax in enumerate(axes):
        for dsi, d in enumerate(data_sets):
            ax.plot(x, d, style[dsi])
        ax.set_xlim([x[i], x[i+1]])

    # Set the x axis ticks 
    for dimension, (axx,xx) in enumerate(zip(axes, x[:-1])):
        axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
        ticks = len(axx.get_yticklabels())
        labels = list()
        step = min_max_range[dimension][2] / (ticks - 1)
        mn   = min_max_range[dimension][0]
        for i in xrange(ticks):
            v = mn + i*step
            labels.append('%4.2f' % v)
        axx.set_yticklabels(labels)


    # Move the final axis' ticks to the right-hand side
    axx = plt.twinx(axes[-1])
    dimension += 1
    axx.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
    ticks = len(axx.get_yticklabels())
    step = min_max_range[dimension][2] / (ticks - 1)
    mn   = min_max_range[dimension][0]
    labels = ['%4.2f' % (mn + i*step) for i in xrange(ticks)]
    axx.set_yticklabels(labels)

    # Stack the subplots 
    plt.subplots_adjust(wspace=0)

    return plt


if __name__ == '__main__':
    import random
    base  = [0,   0,  5,   5,  0]
    scale = [1.5, 2., 1.0, 2., 2.]
    data = [[base[x] + random.uniform(0., 1.)*scale[x]
            for x in xrange(5)] for y in xrange(30)]
    colors = ['r'] * 30

    base  = [3,   6,  0,   1,  3]
    scale = [1.5, 2., 2.5, 2., 2.]
    data.extend([[base[x] + random.uniform(0., 1.)*scale[x]
                 for x in xrange(5)] for y in xrange(30)])
    colors.extend(['b'] * 30)

    parallel_coordinates(data, style=colors).show()

Edit 2:

Here is an example of what comes out of the above code when plotting Fisher's Iris data. It isn't quite as nice as the reference image from Wikipedia, but it is passable if all you have is Matplotlib and you need multi-dimensional plots.

Example result of parallel coordinates plot from this answer

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

pandas has a parallel coordinates wrapper:

import pandas
import matplotlib.pyplot as plt
from pandas.tools.plotting import parallel_coordinates

data = pandas.read_csv(r'C:Python27Libsite-packagespandasestsdatairis.csv', sep=',')
parallel_coordinates(data, 'Name')
plt.show()

screenshot

Source code, how they made it: plotting.py#L494


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...