Colour-intensity scales

In this tutorial we will look at how to use colours in the Sankey diagram. We have already seen how to use a palette, but in this tutorial we will also create a Sankey where the intensity of the colour is proportional to a numerical value.

The first step is to import all the required packages and data:

In [1]:

import pandas as pd
import numpy as np
from floweaver import *

df1 = pd.read_csv('holiday_data.csv')

Now take a look at the dataset we are using. This is a very insightful [made-up] dataset about how different types of people lose weight while on holiday enjoying themselves.

In [2]:

dataset = Dataset(df1)
df1

Out[2]:

	source	target	Calories Burnt	Enjoyment	Employment Job	Activity
0	Activity	Employment Job	2.5	35	Student	Reading
1	Activity	Employment Job	4.5	20	Student	Swimming
2	Activity	Employment Job	8.0	5	Student	Sleeping
3	Activity	Employment Job	1.0	5	Student	Travelling
4	Activity	Employment Job	8.0	30	Student	Working out
5	Activity	Employment Job	1.0	35	Trainee	Reading
6	Activity	Employment Job	3.0	40	Trainee	Travelling
7	Activity	Employment Job	2.0	40	Trainee	Swimming
8	Activity	Employment Job	6.0	5	Trainee	Sleeping
9	Activity	Employment Job	12.0	45	Trainee	Working out
10	Activity	Employment Job	4.5	20	Administrator	Swimming
11	Activity	Employment Job	9.0	10	Administrator	Sleeping
12	Activity	Employment Job	7.5	50	Administrator	Working out
13	Activity	Employment Job	1.5	35	Administrator	Reading
14	Activity	Employment Job	1.5	50	Administrator	Travelling
15	Activity	Employment Job	11.0	55	Manager	Working out
16	Activity	Employment Job	2.0	45	Manager	Reading
17	Activity	Employment Job	7.5	10	Manager	Sleeping
18	Activity	Employment Job	1.5	90	Manager	Travelling
19	Activity	Employment Job	2.0	40	Manager	Swimming
20	Activity	Employment Job	3.0	35	Pensioner	Reading
21	Activity	Employment Job	9.0	15	Pensioner	Swimming
22	Activity	Employment Job	9.0	15	Pensioner	Sleeping
23	Activity	Employment Job	3.0	60	Pensioner	Travelling
24	Activity	Employment Job	0.0	0	Pensioner	Working out

We now define the partitions of the data. Rather than listing the categories by hand, we use np.unique to pick out a list of the unique values that occur in the dataset.

In [3]:

partition_job = Partition.Simple('Employment Job', np.unique(df1['Employment Job']))
partition_activity = Partition.Simple('Activity', np.unique(df1['Activity']))

In fact, this is common enough that the dataset has a built-in method to do it:

In [4]:

# These statements do the same thing as the ones above
partition_job = dataset.partition('Employment Job')
partition_activity = dataset.partition('Activity')
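
As a quick check of what np.unique produces, the sketch below runs it on a hand-typed subset of the Employment Job column. The values are copied from the table above; the list itself is only illustrative. Note that np.unique returns the distinct values in sorted order, not in order of first appearance.

```python
import numpy as np

# A few values from the 'Employment Job' column, with repeats, as in the table.
jobs = ['Student', 'Student', 'Trainee', 'Manager', 'Trainee', 'Pensioner']

# np.unique drops the repeats and returns the values in sorted order;
# tolist() converts the NumPy array back to plain Python strings.
unique_jobs = np.unique(jobs).tolist()
print(unique_jobs)  # ['Manager', 'Pensioner', 'Student', 'Trainee']
```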

We then go on to define the structure of our Sankey. We define nodes, bundles and the order. In this case it’s pretty straightforward:

In [5]:

nodes = {
    'Activity': ProcessGroup(['Activity'], partition_activity),
    'Job': ProcessGroup(['Employment Job'], partition_job),
}

bundles = [
    Bundle('Activity', 'Job'),
]

ordering = [
    ['Activity'],
    ['Job'],
]

Now we will plot a Sankey in which the width of each link shows the calories burnt on each activity by each type of person.

In [6]:

# These are the same each time, so just write them here once
size_options = dict(width=500, height=400,
                    margins=dict(left=100, right=100))

sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, dataset, measures='Calories Burnt').to_widget(**size_options)

We can start using colour by specifying that we want to partition the flows according to type of person. Notice that this time we are using a pre-determined palette.

You can find all sorts of palettes listed here.

In [7]:

sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=partition_job)

weave(sdd, dataset, palette='Set2_8', measures='Calories Burnt').to_widget(**size_options)

Now suppose we want the colour of each flow to be proportional to a numerical value. Pass a QuantitativeScale as the link_color parameter, naming the measure that you want to display in colour. To start off, let’s use “Calories Burnt”, which also sets the width of the lines: wider lines will be shown in a darker colour.

In [8]:

weave(sdd, dataset, link_color=QuantitativeScale('Calories Burnt'), measures='Calories Burnt').to_widget(**size_options)

It’s more interesting to use colour to show a different attribute from the flow table. But because a line in the Sankey diagram is an aggregation of multiple flows in the original data, we need to specify how the new dimension will be aggregated. In this case we will use the colour to show how much each type of person enjoys each activity, taking the mean of the flows within each Sankey link to set the colour. We can be interested in either the cumulative enjoyment or the mean enjoyment: try both!

Aggregation is specified with the measures parameter, which should be set to a dictionary mapping dimension names to aggregation functions ('mean', 'sum' etc).

In [9]:

weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
      link_color=QuantitativeScale('Enjoyment')).to_widget(**size_options)
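
To make the difference between 'sum' and 'mean' concrete, here is a small pandas sketch, independent of floweaver, that aggregates the five Reading rows from the table above as if they all fell into a single Sankey link. It only illustrates the idea of per-measure aggregation; it is not a claim about how floweaver implements it internally.

```python
import pandas as pd

# The five 'Reading' rows from the table above.
reading = pd.DataFrame({
    'Employment Job': ['Student', 'Trainee', 'Administrator', 'Manager', 'Pensioner'],
    'Calories Burnt': [2.5, 1.0, 1.5, 2.0, 3.0],
    'Enjoyment': [35, 35, 35, 45, 35],
})

# Same shape as the measures argument: one aggregation function per column.
aggregated = reading.agg({'Calories Burnt': 'sum', 'Enjoyment': 'mean'})

total_calories = float(aggregated['Calories Burnt'])  # 10.0
mean_enjoyment = float(aggregated['Enjoyment'])       # 37.0

# For comparison, the cumulative enjoyment over the same rows.
total_enjoyment = int(reading['Enjoyment'].sum())     # 185
```

Summing Enjoyment gives a cumulative figure that grows with the number of rows in the link, whereas the mean stays on the original scale of the column.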

In [10]:

weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
      link_color=QuantitativeScale('Enjoyment', intensity='Calories Burnt')).to_widget(**size_options)

/home/rick/ownCloud/devel/sankey-view/floweaver/color_scales.py:114: RuntimeWarning: invalid value encountered in true_divide
  value /= measures[self.intensity]
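
The intensity argument in cell 10 divides the colour measure by a second measure, as the source line quoted in the warning shows. The warning itself most likely comes from the Pensioner / Working out row, where both Enjoyment and Calories Burnt are 0, so the division is 0/0. That attribution is an inference from the data, not something the output states. A minimal NumPy sketch of the same arithmetic:

```python
import numpy as np

# Enjoyment and Calories Burnt for two rows of the table:
# Student / Reading, and Pensioner / Working out (both measures are zero).
enjoyment = np.array([35.0, 0.0])
calories = np.array([2.5, 0.0])

# Silence the "invalid value" RuntimeWarning locally so the result can be inspected.
with np.errstate(invalid='ignore'):
    enjoyment_per_calorie = enjoyment / calories

# The first entry is an ordinary ratio; the second (0/0) is not a number.
has_nan = bool(np.isnan(enjoyment_per_calorie[1]))
```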

You can change the colour palette using the palette argument of QuantitativeScale. The palette names are different from before, because those were categorical (or qualitative) scales, and this is now a sequential scale. The palette names are listed here.

In [11]:

scale = QuantitativeScale('Enjoyment', palette='Blues_9')
weave(sdd, dataset,
      measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'},
      link_width='Calories Burnt',
      link_color=scale) \
    .to_widget(**size_options)

In [12]:

scale.domain

Out[12]:

(0, 90)
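
The domain (0, 90) matches the smallest and largest Enjoyment values in the table. A sequential scale then has to place each value somewhere between the two ends of the palette. The sketch below shows one simple way to do that, a linear rescaling to the range 0 to 1. The helper name normalise is hypothetical, and the exact mapping floweaver uses may differ.

```python
def normalise(value, domain):
    """Linearly rescale value from (low, high) to the range 0..1."""
    low, high = domain
    return (value - low) / (high - low)

# Domain reported by scale.domain above.
domain = (0, 90)

# The ends of the domain map to the ends of the palette, the midpoint to the middle.
positions = [normalise(v, domain) for v in (0, 45, 90)]
print(positions)  # [0.0, 0.5, 1.0]
```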

It is possible to create a colorbar / scale to show the range of intensity values, but it’s not currently as easy as it should be. This should be improved in future.
