Welcome to floweaver’s documentation!

floWeaver generates Sankey diagrams from a dataset of flows. For a descriptive introduction, see the paper Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use. For a more hands-on introduction, read on.
Getting started
Note
You can try the tutorials online without installing anything! Click here to open MyBinder.
Start by installing floWeaver:
Installation
See below for more detailed instructions for Windows and macOS. In brief: install floweaver using pip:
$ pip install floweaver
If you use Jupyter notebooks – a good way to get started – you will also want to install ipysankeywidget, an IPython widget to interactively display Sankey diagrams:
$ pip install ipysankeywidget
$ jupyter nbextension enable --py --sys-prefix ipysankeywidget
Note
If this is the first time you have installed IPython widgets, you also need to make sure they are enabled:
$ jupyter nbextension enable --py --sys-prefix widgetsnbextension
If you use multiple virtualenvs or conda environments, make sure ipywidgets and ipysankeywidget are installed and enabled in both the environment running the notebook server and the kernel.
Install on Windows
floWeaver requires a recent version of Python to be installed. This can be done by installing the Anaconda platform (Link here).
Run the commands from the Installation section in the Anaconda Prompt, which can be found among the installed programs.
To open Jupyter Notebook and start working on your Sankey diagram, type the following in the Anaconda Prompt:
$ jupyter notebook
Install on macOS
floWeaver requires a recent version of Python to be installed. This can be done by installing the Anaconda platform (Link here).
Run the commands from the Installation section in a terminal (the command line).
To open Jupyter Notebook and start working on your Sankey diagram, type the following in the terminal:
$ jupyter notebook
Changelog
v2.0.0 (renamed to floWeaver)
- sankeyview is now called floWeaver!
- There is a new top-level interface for creating a Sankey diagram, the floweaver.weave() function. This gives more flexibility about the appearance of the diagram, and lets you save the results in different formats (other than showing directly in the Jupyter notebook), while still being simple to use for the most common cases.
- No longer any need for from sankeyview.jupyter import show_sankey; use floweaver.weave() instead.
- New way to specify link colours using floweaver.CategoricalScale and floweaver.QuantitativeScale, replacing hue and related arguments to show_sankey. See Colour-intensity scales for examples.
Then the tutorials introduce the concepts used to generate and manipulate Sankey diagrams:
Quickstart tutorial
This tutorial will go through the basic ways to use floweaver to process and transform data into many different Sankey diagrams.
If you are reading the static documentation, you can also try an interactive version of this tutorial online using MyBinder.
Let’s start by making a really simple dataset. Imagine we have some farms, which grow apples and bananas to sell to a few different customers. We can describe the flow of fruit from the farms (the source of the flow) to the customers (the target of the flow):
In [1]:
import pandas as pd
flows = pd.read_csv('simple_fruit_sales.csv')
flows
Out[1]:
| | source | target | type | value |
|---|---|---|---|---|
| 0 | farm1 | Mary | apples | 5 |
| 1 | farm1 | James | apples | 3 |
| 2 | farm2 | Fred | apples | 10 |
| 3 | farm2 | Fred | bananas | 10 |
| 4 | farm2 | Susan | bananas | 5 |
| 5 | farm3 | Susan | apples | 10 |
| 6 | farm4 | Susan | bananas | 1 |
| 7 | farm5 | Susan | bananas | 1 |
| 8 | farm6 | Susan | bananas | 1 |
Drawn directly as a Sankey diagram, this data would look something like this:
In [2]:
from ipysankeywidget import SankeyWidget
SankeyWidget(links=flows.to_dict('records'))
But you don’t always want a direct correspondence between the flows in your data and the links that you see in the Sankey diagram. For example:
- Farms 4, 5 and 6 are all pretty small, and to make the diagram clearer we might want to group them in an “other” category.
- The flows of apples are mixed in with the flows of bananas – we might want to group the kinds of fruit together to make them easier to compare.
- We might want to group farms or customers based on some other attributes – to see differences between genders, locations, or organic/non-organic farms, say.
This introduction shows how to use floweaver to do some of these for this simple example, in the simplest possible way. Later tutorials will show how to use it on real data, and more efficient ways to do the same things.
Basic diagram
Let’s start with the first example: grouping farms 4, 5 and 6 into an “other” category. floweaver works by setting up a “Sankey diagram definition” which describes the structure of the diagram we want to see. In this case, we need to set up some groups:
In [3]:
from floweaver import *
# Set the default size to fit the documentation better.
size = dict(width=570, height=300)
nodes = {
'farms': ProcessGroup(['farm1', 'farm2', 'farm3',
'farm4', 'farm5', 'farm6']),
'customers': ProcessGroup(['James', 'Mary', 'Fred', 'Susan']),
}
We need to describe roughly how these groups should be placed in the final diagram by defining an “ordering” – a list of vertical slices, each containing a list of node ids:
In [4]:
ordering = [
['farms'], # put "farms" on the left...
['customers'], # ... and "customers" on the right.
]
And we also need to say which connections should appear in the diagram (sometimes you don’t want to actually see all the connections). This is called a “bundle” because it bundles up multiple flows – in this case all of them.
In [5]:
bundles = [
Bundle('farms', 'customers'),
]
Putting that together into a Sankey diagram definition (SDD) and applying it to the data gives this result:
In [6]:
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
That’s not very useful. What’s happened? Every farm and every customer has been lumped together into one group. To get the picture we want – like the first one, but with an “other” group containing farms 4, 5 and 6 – we need to partition the groups:
In [7]:
# The first argument is the dimension name -- for now we're using
# "process" to group by process ids. The second argument is a list
# of groups.
farms_with_other = Partition.Simple('process', [
'farm1', # the groups within the partition can be a single id...
'farm2',
'farm3',
('other', ['farm4', 'farm5', 'farm6']), # ... or a group
])
# This is another partition.
customers_by_name = Partition.Simple('process', [
'James', 'Mary', 'Fred', 'Susan'
])
# Update the ProcessGroup nodes to use the partitions
nodes['farms'].partition = farms_with_other
nodes['customers'].partition = customers_by_name
# New Sankey!
weave(sdd, flows).to_widget(**size)
That’s better: now the farms are split up appropriately with an “other” category, and the customers are shown separately as well. We don’t have to stop there – what about showing sales to men and women?
In [8]:
customers_by_gender = Partition.Simple('process', [
('Men', ['Fred', 'James']),
('Women', ['Susan', 'Mary']),
])
nodes['customers'].partition = customers_by_gender
weave(sdd, flows).to_widget(**size).auto_save_png('quickstart_example1.png')
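A useful mental model for a simple partition is a lookup table. Each process id maps to a group label, and each link in the diagram is the sum of all flows whose source and target fall into a given pair of groups. The plain-Python sketch below shows that idea for the fruit data. It is an illustration only, not floweaver's implementation. The helper names group_lookup and aggregate_links are hypothetical.

```python
from collections import defaultdict

# Rows of simple_fruit_sales.csv as (source, target, type, value) tuples.
FLOWS = [
    ("farm1", "Mary", "apples", 5),
    ("farm1", "James", "apples", 3),
    ("farm2", "Fred", "apples", 10),
    ("farm2", "Fred", "bananas", 10),
    ("farm2", "Susan", "bananas", 5),
    ("farm3", "Susan", "apples", 10),
    ("farm4", "Susan", "bananas", 1),
    ("farm5", "Susan", "bananas", 1),
    ("farm6", "Susan", "bananas", 1),
]


def group_lookup(groups):
    """Map each process id to the label of the group it belongs to.

    `groups` mirrors the list passed to Partition.Simple above: each entry
    is either a single id (a group on its own) or a (label, [ids]) pair.
    """
    lookup = {}
    for group in groups:
        if isinstance(group, tuple):
            label, ids = group
            for process_id in ids:
                lookup[process_id] = label
        else:
            lookup[group] = group
    return lookup


def aggregate_links(flows, source_groups, target_groups):
    """Sum flow values for each (source group, target group) pair."""
    links = defaultdict(int)
    for source, target, _fruit, value in flows:
        links[(source_groups[source], target_groups[target])] += value
    return dict(links)


farms_with_other = group_lookup([
    "farm1", "farm2", "farm3",
    ("other", ["farm4", "farm5", "farm6"]),
])
customers_by_gender = group_lookup([
    ("Men", ["Fred", "James"]),
    ("Women", ["Susan", "Mary"]),
])

links = aggregate_links(FLOWS, farms_with_other, customers_by_gender)
for (source, target), value in sorted(links.items()):
    print(f"{source} -> {target}: {value}")
```

Six links remain. Farms 4, 5 and 6 collapse into a single link from "other" to Women with a value of 3.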
Distinguishing flow types
These diagrams have lost sight of the kind of fruit that is actually being sold – are the men buying apples, bananas or both from farm1? To show this we need to split up the flows in the diagram based on their type. Just like we split up the ProcessGroups by defining a partition of processes, we split up flows by defining a partition of flows.
(While we’re at it, let’s choose some colours that look vaguely like apples and bananas.)
In [9]:
# Another partition -- but this time the dimension is the "type"
# column of the flows table
fruits_by_type = Partition.Simple('type', ['apples', 'bananas'])
# Set the colours for the labels in the partition.
palette = {'apples': 'yellowgreen', 'bananas': 'gold'}
# New SDD with the flow_partition set
sdd = SankeyDefinition(nodes, bundles, ordering,
flow_partition=fruits_by_type)
weave(sdd, flows, palette=palette).to_widget(**size)
As a last step, it would be nice to label which flows are apples and which are bananas. One way to do this would be to use a legend next to the diagram, or to put labels on every flow. Here, we’ll add a new layer in the middle of the diagram which temporarily groups together the different fruit types on their way from the farms to the customers. This temporary/additional grouping point is called a waypoint.
To add a waypoint, we need to do three things:
- Define it as a node
- Position it in the ordering (between farms and customers)
- Add it to the bundle
In [10]:
# 1. Define a new waypoint node
nodes['waypoint'] = Waypoint()
# 2. Update the ordering to show where the waypoint goes: in the middle
ordering = [
['farms'],
['waypoint'],
['customers'],
]
# 3. Update the bundle definition to send the flows via the waypoint
bundles = [
Bundle('farms', 'customers', waypoints=['waypoint']),
]
# Update the SDD with the new nodes, ordering & bundles.
sdd = SankeyDefinition(nodes, bundles, ordering,
flow_partition=fruits_by_type)
weave(sdd, flows, palette=palette).to_widget(**size)
That’s not yet very useful. Just like above, the default for Waypoints is to group everything together. We need to set a partition on the waypoint to split apart apples and bananas:
In [11]:
# Redefine the waypoint with a partition (same one defined above)
nodes['waypoint'] = Waypoint(fruits_by_type)
weave(sdd, flows, palette=palette).to_widget(**size)
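With the waypoint partitioned by type, each flow now reaches its customer in two hops. The first hop goes from its farm group to the waypoint node for its fruit. The second goes from that node on to its customer group. The plain-Python sketch below computes the link values for both hops, using the farm and customer groupings set earlier in this tutorial (farms_with_other and customers_by_gender). It illustrates the idea rather than floweaver's implementation, and route_via_waypoint is a hypothetical name.

```python
from collections import defaultdict

# Rows of simple_fruit_sales.csv as (source, target, type, value) tuples.
FLOWS = [
    ("farm1", "Mary", "apples", 5),
    ("farm1", "James", "apples", 3),
    ("farm2", "Fred", "apples", 10),
    ("farm2", "Fred", "bananas", 10),
    ("farm2", "Susan", "bananas", 5),
    ("farm3", "Susan", "apples", 10),
    ("farm4", "Susan", "bananas", 1),
    ("farm5", "Susan", "bananas", 1),
    ("farm6", "Susan", "bananas", 1),
]

# Group labels matching farms_with_other and customers_by_gender.
FARM_GROUP = {"farm1": "farm1", "farm2": "farm2", "farm3": "farm3",
              "farm4": "other", "farm5": "other", "farm6": "other"}
CUSTOMER_GROUP = {"Fred": "Men", "James": "Men",
                  "Susan": "Women", "Mary": "Women"}


def route_via_waypoint(flows):
    """Split each flow into farm -> waypoint and waypoint -> customer links.

    The waypoint is partitioned by fruit type, so the intermediate node
    for each flow is simply its type.
    """
    first_hop = defaultdict(int)
    second_hop = defaultdict(int)
    for source, target, fruit, value in flows:
        first_hop[(FARM_GROUP[source], fruit)] += value
        second_hop[(fruit, CUSTOMER_GROUP[target])] += value
    return dict(first_hop), dict(second_hop)


first_hop, second_hop = route_via_waypoint(FLOWS)
print(first_hop)
print(second_hop)
```

Both hops carry the same total, 46, because every flow passes through the waypoint.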
Summary
This has demonstrated the basic usage of floweaver: defining ProcessGroups, Waypoints, Partitions, and Bundles. If you are reading the interactive version, why not go back and try out some different ways to present the data? Here are some suggestions:
- Farms 1, 3 and 5 are organic. Can you change the farm Partition to show two groups, organic and non-organic?
- What happens if you remove "farm1" from the original definition of the farms ProcessGroup? (Hint: those apples that James and Mary are eating have to come from somewhere – so they are shown as coming from “elsewhere”. See the later tutorial on moving the system boundary.)
If you are reading the static documentation, you can easily experiment with editing and rerunning this tutorial online using MyBinder, or download it to run on your computer from GitHub.
Dimension tables: efficiently adding details of processes and flows
In the Quickstart tutorial we saw how to draw some simple Sankey diagrams and partition them in different ways, such as this:

But to do the grouping on the right-hand side we had to explicitly list which people were “Men” and which were “Women”, using a partition like this:
customers_by_gender = Partition.Simple('process', [
('Men', ['Fred', 'James']),
('Women', ['Susan', 'Mary']),
])
We can show this type of information more efficiently – and with less code – by using dimension tables.
Dimension tables
The table we’ve seen before is a flow fact table – it lists basic information about each flow:
- source: where the flow comes from
- target: where the flow goes to
- type or material: what is flowing
- value: the size (in tonnes, GJ, £ etc) of the flow
An example of this type of table is shown at the top right of this diagram:

The dimension tables add extra information about the source/target and type of the flows (the diagram above also shows extra information about the time period the flow relates to, but we’re not worrying about time in this tutorial). For example, “farm2” has a location attribute set to “Cambridge”.
This tutorial will show how to use dimension tables in floweaver.
In [1]:
# Load the same data used in the quickstart tutorial
import pandas as pd
flows = pd.read_csv('simple_fruit_sales.csv')
flows
Out[1]:
| | source | target | type | value |
|---|---|---|---|---|
| 0 | farm1 | Mary | apples | 5 |
| 1 | farm1 | James | apples | 3 |
| 2 | farm2 | Fred | apples | 10 |
| 3 | farm2 | Fred | bananas | 10 |
| 4 | farm2 | Susan | bananas | 5 |
| 5 | farm3 | Susan | apples | 10 |
| 6 | farm4 | Susan | bananas | 1 |
| 7 | farm5 | Susan | bananas | 1 |
| 8 | farm6 | Susan | bananas | 1 |
In [2]:
# Load another table giving extra information about the
# farms and customers. `index_col` says the first column
# can be used to lookup rows.
processes = pd.read_csv('simple_fruit_sales_processes.csv',
index_col=0)
processes
Out[2]:
| id | type | location | organic | sex |
|---|---|---|---|---|
| farm1 | farm | Barton | yes | NaN |
| farm2 | farm | Barton | yes | NaN |
| farm3 | farm | Ely | no | NaN |
| farm4 | farm | Ely | yes | NaN |
| farm5 | farm | Duxford | no | NaN |
| farm6 | farm | Milton | yes | NaN |
| Mary | customer | Cambridge | NaN | Women |
| James | customer | Milton | NaN | Men |
| Fred | customer | Cambridge | NaN | Men |
| Susan | customer | Cambridge | NaN | Women |
Each id in this table matches a source or target in the flows table above. We can use this extra information to build the Sankey.
In [3]:
# Setup
from floweaver import *
# Set the default size to fit the documentation better.
size = dict(width=570, height=300)
Because we now have two tables (before, we only had one, so we didn’t have to worry about this), we must put them together into a Dataset:
In [4]:
dataset = Dataset(flows, dim_process=processes)
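One way to picture what combining the two tables makes possible is to attach each process's attributes to both ends of every flow. The attributes become extra fields prefixed with "source." and "target.". The plain-Python sketch below does this with dictionaries. The prefixed names are chosen to mirror the 'source.organic' dimension used later in this tutorial, but the code is an illustration under that assumption, not floweaver's own Dataset logic. The with_dimensions helper is hypothetical.

```python
# Two flows from farm1 and the matching rows of the process table.
FLOWS = [
    ("farm1", "Mary", "apples", 5),
    ("farm1", "James", "apples", 3),
]
PROCESSES = {
    "farm1": {"type": "farm", "location": "Barton", "organic": "yes"},
    "Mary": {"type": "customer", "location": "Cambridge", "sex": "Women"},
    "James": {"type": "customer", "location": "Milton", "sex": "Men"},
}


def with_dimensions(flows, processes):
    """Return one dict per flow with process attributes attached.

    Attributes of the source process get a "source." prefix and those of
    the target process a "target." prefix, so they cannot clash with the
    flow's own columns (both tables have a "type" column, for example).
    """
    records = []
    for source, target, fruit, value in flows:
        record = {"source": source, "target": target,
                  "type": fruit, "value": value}
        for prefix, process_id in (("source", source), ("target", target)):
            for attr, attr_value in processes.get(process_id, {}).items():
                record[f"{prefix}.{attr}"] = attr_value
        records.append(record)
    return records


records = with_dimensions(FLOWS, PROCESSES)
for record in records:
    print(record["source"], record["source.organic"],
          "->", record["target"], record["target.sex"])
```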
Now we can use the type column in the process table to more easily pick out the relevant processes:
In [5]:
nodes = {
'farms': ProcessGroup('type == "farm"'),
'customers': ProcessGroup('type == "customer"'),
}
Compare this to how the same thing was written in the Quickstart:
nodes = {
'farms': ProcessGroup(['farm1', 'farm2', 'farm3',
'farm4', 'farm5', 'farm6']),
'customers': ProcessGroup(['James', 'Mary', 'Fred', 'Susan']),
}
Because we already know from the process dimension table that James, Mary, Fred and Susan are “customers”, we don’t have to list them all by name in the ProcessGroup definition – we can write the query type == "customer" instead.
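The API documentation below describes a string selection as a pandas query string run against the process table. Its effect can be mimicked in plain Python by filtering the rows of that table. In this sketch the select helper is hypothetical, and it takes a Python predicate in place of a query string.

```python
# The "type" column of the process dimension table, keyed by process id.
PROCESS_TYPES = {
    "farm1": "farm", "farm2": "farm", "farm3": "farm",
    "farm4": "farm", "farm5": "farm", "farm6": "farm",
    "Mary": "customer", "James": "customer",
    "Fred": "customer", "Susan": "customer",
}


def select(process_types, predicate):
    """Return the ids of the processes whose type satisfies `predicate`."""
    return [pid for pid, ptype in process_types.items() if predicate(ptype)]


# Plain-Python equivalents of 'type == "farm"' and 'type == "customer"'.
farms = select(PROCESS_TYPES, lambda t: t == "farm")
customers = select(PROCESS_TYPES, lambda t: t == "customer")
print(customers)  # ['Mary', 'James', 'Fred', 'Susan']
```

The selection picks up every customer even though none of them is named in it.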
Note
See the API Documentation for floweaver.ProcessGroup for more details.
The rest of the Sankey diagram definition is the same as before:
In [6]:
ordering = [
['farms'], # put "farms" on the left...
['customers'], # ... and "customers" on the right.
]
bundles = [
Bundle('farms', 'customers'),
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, dataset).to_widget(**size)
Again, we need to set the partition on the ProcessGroups to see something interesting. Here again, we can use the process dimension table to make this easier:
In [7]:
# Create a Partition which splits based on the `sex` column
# of the dimension table
customers_by_gender = Partition.Simple('process.sex',
['Men', 'Women'])
nodes['customers'].partition = customers_by_gender
weave(sdd, dataset).to_widget(**size)
For reference, this is what we wrote before in the Quickstart:
customers_by_gender = Partition.Simple('process', [
('Men', ['Fred', 'James']),
('Women', ['Susan', 'Mary']),
])
And we can use other columns of the dimension table to set other partitions:
In [8]:
farms_by_organic = Partition.Simple('process.organic', ['yes', 'no'])
nodes['farms'].partition = farms_by_organic
weave(sdd, dataset).to_widget(**size)
Finally, a tip for doing quick exploration of the data with partitions: you can automatically get a Partition which includes all the values that actually occur in your dataset using the dataset.partition method:
In [9]:
# This is the logical thing to write but
# it doesn't actually work at the moment :(
# nodes['farms'].partition = dataset.partition('process.organic')
# It works with 'source.organic'... we can explain later
nodes['farms'].partition = dataset.partition('source.organic')
# This should be the same as before
weave(sdd, dataset).to_widget(**size)
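A partition created this way contains one group per distinct value found in the data. The tutorial does not spell out why 'source.organic' works here. One plausible reading is that the values are taken from the flow table after each process's attributes have been attached to the source and target ends of every flow. Under that assumption, the idea can be sketched in plain Python as follows. The unique_values helper is hypothetical, and the order in which floweaver itself lists the values is not specified here.

```python
# Organic status of the farm at the source end of each row of
# simple_fruit_sales.csv, in row order (from the process table above).
SOURCE_ORGANIC = ["yes", "yes", "yes", "yes", "yes", "no", "yes", "no", "yes"]


def unique_values(values):
    """Distinct, non-missing values in order of first appearance."""
    return [value for value in dict.fromkeys(values) if value is not None]


groups = unique_values(SOURCE_ORGANIC)
print(groups)  # ['yes', 'no']
```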
Summary
The process dimension table adds extra information about each process. You can use this extra information to:
- Pick out the processes you want to include in a ProcessGroup (selection); and
- Split apart groups of processes based on different attributes (partitions).
Things to try:
- Make a diagram showing the locations of farms on the left and the locations of customers on the right
System boundaries
Often we don’t want to show all of the data in one Sankey diagram: we focus on one part of the system. But we still want conservation of mass (or whatever is being shown in the diagram) to work, so we end up with flows to & from “elsewhere”. These can also be thought of as imports and exports.
Let’s start by recreating the Quickstart example:
In [1]:
import pandas as pd
flows = pd.read_csv('simple_fruit_sales.csv')
In [2]:
from floweaver import *
# Set the default size to fit the documentation better.
size = dict(width=570, height=300)
# Same partitions as the Quickstart tutorial
farms_with_other = Partition.Simple('process', [
'farm1',
'farm2',
'farm3',
('other', ['farm4', 'farm5', 'farm6']),
])
customers_by_name = Partition.Simple('process', [
'James', 'Mary', 'Fred', 'Susan'
])
# Define the nodes, this time setting the partition from the start
nodes = {
'farms': ProcessGroup(['farm1', 'farm2', 'farm3',
'farm4', 'farm5', 'farm6'],
partition=farms_with_other),
'customers': ProcessGroup(['James', 'Mary', 'Fred', 'Susan'],
partition=customers_by_name),
}
# Ordering and bundles as before
ordering = [
['farms'], # put "farms" on the left...
['customers'], # ... and "customers" on the right.
]
bundles = [
Bundle('farms', 'customers'),
]
In [3]:
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
What happens if we remove farm2 from the ProcessGroup?
In [4]:
nodes['farms'].selection = [
'farm1', 'farm3', 'farm4', 'farm5', 'farm6'
]
weave(sdd, flows).to_widget(**size)
The flow is still there! But it is labelled with a little arrow to show that it is coming “from elsewhere”. This is important because we are still showing Susan and Fred in the diagram, and they get fruit from farm2. If we didn’t show those flows, Susan’s and Fred’s inputs and outputs would not balance.
Try now removing Susan and Fred from the diagram:
In [5]:
nodes['customers'].selection = ['James', 'Mary']
weave(sdd, flows).to_widget(**size)
Now that they’re gone, we no longer see the incoming flows from farm2. But we see some outgoing flows “to elsewhere” from farm3 and the other group. This is because farm3 is within the system boundary – it is shown in the diagram – so its output flow has to go somewhere.
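The rule behind these automatic flows can be stated directly:
- a flow is drawn as a normal link when both of its ends are inside the system boundary;
- it is drawn as a flow to or from elsewhere when exactly one end is inside;
- it is not drawn at all when neither end is inside.

The plain-Python sketch below applies that rule to the current selection, with farm2, Fred and Susan removed. It is a simplification for illustration. It treats "inside the boundary" as "in some ProcessGroup" and ignores which Bundles exist. The classify helper is hypothetical.

```python
# Rows of simple_fruit_sales.csv as (source, target, type, value) tuples.
FLOWS = [
    ("farm1", "Mary", "apples", 5),
    ("farm1", "James", "apples", 3),
    ("farm2", "Fred", "apples", 10),
    ("farm2", "Fred", "bananas", 10),
    ("farm2", "Susan", "bananas", 5),
    ("farm3", "Susan", "apples", 10),
    ("farm4", "Susan", "bananas", 1),
    ("farm5", "Susan", "bananas", 1),
    ("farm6", "Susan", "bananas", 1),
]

# Processes still selected after the last step.
farms = {"farm1", "farm3", "farm4", "farm5", "farm6"}
customers = {"James", "Mary"}


def classify(flows, in_diagram):
    """Split flows by how they relate to the system boundary."""
    internal, from_elsewhere, to_elsewhere = [], [], []
    for flow in flows:
        source, target = flow[0], flow[1]
        if source in in_diagram and target in in_diagram:
            internal.append(flow)        # drawn as a normal link
        elif target in in_diagram:
            from_elsewhere.append(flow)  # enters across the boundary
        elif source in in_diagram:
            to_elsewhere.append(flow)    # leaves across the boundary
        # Flows with neither end in the diagram are left out entirely.
    return internal, from_elsewhere, to_elsewhere


internal, from_elsewhere, to_elsewhere = classify(FLOWS, farms | customers)
print("internal:", internal)
print("from elsewhere:", from_elsewhere)
print("to elsewhere:", to_elsewhere)
```

farm3 and the farms in the "other" group end up in to_elsewhere. The farm2 flows have neither end shown, so they disappear.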
Controlling Elsewhere flows
These flows are added automatically to make sure that mass is conserved, but because they are automatic, we have little control over them. By explicitly adding a flow to or from Elsewhere to the diagram, we can control where they appear and what they look like.
To do this, add a Waypoint for the outgoing flows to ‘pass through’ on their way across the system boundary:
In [6]:
# Define a new Waypoint
nodes['exports'] = Waypoint(title='exports here')
# Update the ordering to include the waypoint
ordering = [
['farms'], # put "farms" on the left...
['customers', 'exports'], # ... and "exports" below "customers"
] # on the right.
# Add a new bundle from "farms" to Elsewhere, via the waypoint
bundles = [
Bundle('farms', 'customers'),
Bundle('farms', Elsewhere, waypoints=['exports']),
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
This is pretty similar to what we had already, but now that the waypoint is explicitly listed as part of the SankeyDefinition, we have more control over it.
For example, we can put the exports above James and Mary by changing the ordering:
In [7]:
ordering = [
['farms'],
['exports', 'customers'],
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
Or we can partition the exports Waypoint to show how much of it is apples and bananas:
In [8]:
fruits_by_type = Partition.Simple('type', ['apples', 'bananas'])
nodes['exports'].partition = fruits_by_type
weave(sdd, flows).to_widget(**size)
Horizontal bands
Often, imports/exports and loss flows are shown in a separate horizontal “band” either above or below the main flows. We can do this by modifying the ordering a little bit.
The ordering style we have used so far looks like this:
ordering = [
[list of nodes in layer 1], # left-hand side
[list of nodes in layer 2],
...
[list of nodes in layer N], # right-hand side
]
But we can add another layer of nesting to make it look like this:
ordering = [
# |top band| |bottom band|
[ [........], [...........] ], # left-hand side
[ [........], [...........] ],
...
[ [........], [...........] ], # right-hand side
]
Here’s an example:
In [9]:
ordering = [
[[], ['farms' ]],
[['exports'], ['customers']],
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
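The two styles can be told apart by their nesting. In the simple style each layer is a list of node ids (strings), whereas in the banded style each layer is a list of bands. One way to handle both uniformly is to treat a simple layer as a single band. The sketch below does this with a hypothetical to_bands helper. It is not floweaver's own code, and its check that every layer has the same number of bands is an assumption of the sketch.

```python
def to_bands(ordering):
    """Return `ordering` in banded form: layers -> bands -> node ids.

    A layer that is a plain list of node ids is treated as a single band.
    """
    banded = []
    for layer in ordering:
        if all(isinstance(item, str) for item in layer):
            banded.append([list(layer)])
        else:
            banded.append([list(band) for band in layer])
    # Assumption made by this sketch: every layer has the same bands.
    if len({len(layer) for layer in banded}) > 1:
        raise ValueError("every layer must have the same number of bands")
    return banded


simple = [["farms"], ["customers", "exports"]]
banded = [[[], ["farms"]], [["exports"], ["customers"]]]
print(to_bands(simple))  # [[['farms']], [['customers', 'exports']]]
print(to_bands(banded) == banded)  # True
```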
Summary
- All the flows to/from a ProcessGroup are shown, even if the other end of the flow is outside the system boundary (i.e. not part of any ProcessGroup)
- You can control the automatic flows by explicitly adding Bundles to/from Elsewhere with a Waypoint
- The ordering can contain horizontal bands
Colour-intensity scales
In this tutorial we will look at how to use colours in the Sankey diagram. We have already seen how to use a palette, but in this tutorial we will also create a Sankey where the intensity of the colour is proportional to a numerical value.
The first step is to import all the required packages and data:
In [1]:
import pandas as pd
import numpy as np
from floweaver import *
df1 = pd.read_csv('holiday_data.csv')
Now take a look at the dataset we are using. This is a very insightful [made-up] dataset about how different types of people lose weight while on holiday enjoying themselves.
In [2]:
dataset = Dataset(df1)
df1
Out[2]:
| | source | target | Calories Burnt | Enjoyment | Employment Job | Activity |
|---|---|---|---|---|---|---|
| 0 | Activity | Employment Job | 2.5 | 35 | Student | Reading |
| 1 | Activity | Employment Job | 4.5 | 20 | Student | Swimming |
| 2 | Activity | Employment Job | 8.0 | 5 | Student | Sleeping |
| 3 | Activity | Employment Job | 1.0 | 5 | Student | Travelling |
| 4 | Activity | Employment Job | 8.0 | 30 | Student | Working out |
| 5 | Activity | Employment Job | 1.0 | 35 | Trainee | Reading |
| 6 | Activity | Employment Job | 3.0 | 40 | Trainee | Travelling |
| 7 | Activity | Employment Job | 2.0 | 40 | Trainee | Swimming |
| 8 | Activity | Employment Job | 6.0 | 5 | Trainee | Sleeping |
| 9 | Activity | Employment Job | 12.0 | 45 | Trainee | Working out |
| 10 | Activity | Employment Job | 4.5 | 20 | Administrator | Swimming |
| 11 | Activity | Employment Job | 9.0 | 10 | Administrator | Sleeping |
| 12 | Activity | Employment Job | 7.5 | 50 | Administrator | Working out |
| 13 | Activity | Employment Job | 1.5 | 35 | Administrator | Reading |
| 14 | Activity | Employment Job | 1.5 | 50 | Administrator | Travelling |
| 15 | Activity | Employment Job | 11.0 | 55 | Manager | Working out |
| 16 | Activity | Employment Job | 2.0 | 45 | Manager | Reading |
| 17 | Activity | Employment Job | 7.5 | 10 | Manager | Sleeping |
| 18 | Activity | Employment Job | 1.5 | 90 | Manager | Travelling |
| 19 | Activity | Employment Job | 2.0 | 40 | Manager | Swimming |
| 20 | Activity | Employment Job | 3.0 | 35 | Pensioner | Reading |
| 21 | Activity | Employment Job | 9.0 | 15 | Pensioner | Swimming |
| 22 | Activity | Employment Job | 9.0 | 15 | Pensioner | Sleeping |
| 23 | Activity | Employment Job | 3.0 | 60 | Pensioner | Travelling |
| 24 | Activity | Employment Job | 0.0 | 0 | Pensioner | Working out |
We now define the partitions of the data. Rather than listing the categories by hand, we use np.unique to pick out a list of the unique values that occur in the dataset.
In [3]:
partition_job = Partition.Simple('Employment Job', np.unique(df1['Employment Job']))
partition_activity = Partition.Simple('Activity', np.unique(df1['Activity']))
In fact, this is pretty common so there is a built-in function to do this:
In [4]:
# these statements or the ones above do the same thing
partition_job = dataset.partition('Employment Job')
partition_activity = dataset.partition('Activity')
We then go on to define the structure of our Sankey diagram: the nodes, bundles and ordering. In this case it’s pretty straightforward:
In [5]:
nodes = {
'Activity': ProcessGroup(['Activity'], partition_activity),
'Job': ProcessGroup(['Employment Job'], partition_job),
}
bundles = [
Bundle('Activity', 'Job'),
]
ordering = [
['Activity'],
['Job'],
]
Now we will plot a Sankey diagram that shows the calories burnt in each activity by each type of person.
In [6]:
# These are the same each time, so just write them here once
size_options = dict(width=500, height=400,
margins=dict(left=100, right=100))
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, dataset, measures='Calories Burnt').to_widget(**size_options)
We can start using colour by specifying that we want to partition the flows according to type of person. Notice that this time we are using a pre-determined palette.
You can find all sorts of palettes listed here.
In [7]:
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=partition_job)
weave(sdd, dataset, palette='Set2_8', measures='Calories Burnt').to_widget(**size_options)
Now suppose we want the colour of each flow to be proportional to a numerical value. Pass a QuantitativeScale as the link_color argument, naming the variable that you want to display in colour. To start off, let’s use “Calories Burnt”, which is also the width of the lines: wider lines will be shown in a darker colour.
In [8]:
weave(sdd, dataset, link_color=QuantitativeScale('Calories Burnt'), measures='Calories Burnt').to_widget(**size_options)
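Conceptually, a sequential colour scale does two things. First it normalises each link's value against a domain (a smallest and a largest value). Then it picks a colour along the palette, from light for small values to dark for large ones. The plain-Python sketch below shows that idea with a simple linear blend between two arbitrary hex colours. The helpers normalise and interpolate are hypothetical, and QuantitativeScale may compute its colours differently.

```python
def normalise(value, domain):
    """Map `value` linearly from `domain` (lo, hi) onto [0, 1], clamped."""
    lo, hi = domain
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def interpolate(colour_lo, colour_hi, t):
    """Linearly blend two '#rrggbb' colours; t=0 gives colour_lo."""
    lo = [int(colour_lo[i:i + 2], 16) for i in (1, 3, 5)]
    hi = [int(colour_hi[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(lo, hi)]
    return "#" + "".join(f"{c:02x}" for c in mixed)


# Calories Burnt for the Student rows of holiday_data.csv.
calories = {"Reading": 2.5, "Swimming": 4.5, "Sleeping": 8.0,
            "Travelling": 1.0, "Working out": 8.0}
domain = (min(calories.values()), max(calories.values()))
for activity, value in calories.items():
    # Light blue for the smallest value, dark blue for the largest.
    print(activity, interpolate("#f7fbff", "#08306b", normalise(value, domain)))
```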
It’s more interesting to use colour to show a different attribute from the flow table. But because a line in the Sankey diagram is an aggregation of multiple flows in the original data, we need to specify how the new dimension will be aggregated. For example, we’ll use the mean of the flows within each Sankey link to set the colour. In this case we will use the colour to show how much each type of person enjoys each activity. We can be interested in either the cumulative enjoyment, or the mean enjoyment: try both!
Aggregation is specified with the measures parameter, which should be set to a dictionary mapping dimension names to aggregation functions ('mean', 'sum' etc). The link_width parameter then says which of these measures sets the width of the links.
In [9]:
weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
link_color=QuantitativeScale('Enjoyment')).to_widget(**size_options)
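Aggregation can be pictured as a group-by. Collect the rows that fall into each group, then apply the chosen function to each measure. The plain-Python sketch below groups the Reading and Travelling rows of the holiday data by activity alone, so that several rows fall into each group. Mapping the names 'sum' and 'mean' to Python's sum and statistics.mean is an assumption of this sketch, not a statement about floweaver internals.

```python
from collections import defaultdict
from statistics import mean

# Map the aggregation names used in `measures` to Python functions.
AGGREGATORS = {"sum": sum, "mean": mean}

# Reading and Travelling rows of holiday_data.csv:
# (Activity, Employment Job, Calories Burnt, Enjoyment)
ROWS = [
    ("Reading", "Student", 2.5, 35),
    ("Reading", "Trainee", 1.0, 35),
    ("Reading", "Administrator", 1.5, 35),
    ("Reading", "Manager", 2.0, 45),
    ("Reading", "Pensioner", 3.0, 35),
    ("Travelling", "Student", 1.0, 5),
    ("Travelling", "Trainee", 3.0, 40),
    ("Travelling", "Administrator", 1.5, 50),
    ("Travelling", "Manager", 1.5, 90),
    ("Travelling", "Pensioner", 3.0, 60),
]


def aggregate(rows, measures):
    """Group rows by activity and apply one aggregation per measure."""
    columns = defaultdict(lambda: defaultdict(list))
    for activity, _job, calories, enjoyment in rows:
        values = {"Calories Burnt": calories, "Enjoyment": enjoyment}
        for name in measures:
            columns[activity][name].append(values[name])
    return {
        activity: {name: AGGREGATORS[how](collected[name])
                   for name, how in measures.items()}
        for activity, collected in columns.items()
    }


result = aggregate(ROWS, {"Calories Burnt": "sum", "Enjoyment": "mean"})
print(result)
```

Both activities burn 10 calories in total, but mean enjoyment differs: 37 for Reading and 49 for Travelling.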
In [10]:
weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
link_color=QuantitativeScale('Enjoyment', intensity='Calories Burnt')).to_widget(**size_options)
/home/rick/ownCloud/devel/sankey-view/floweaver/color_scales.py:114: RuntimeWarning: invalid value encountered in true_divide
value /= measures[self.intensity]
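The warning above comes from the intensity option. Judging from the source line it quotes, the colour value (here the mean Enjoyment) is divided by the intensity measure (the summed Calories Burnt). The Pensioner / Working out row has 0 for both, so that link evaluates 0 / 0. NumPy reports this as an invalid value and produces NaN. The sketch below reproduces the ratio in plain Python and returns NaN explicitly when the denominator is zero. The colour_value helper is hypothetical, not floweaver's code.

```python
import math


def colour_value(measures, attr, intensity=None):
    """Value used for colouring: `attr`, optionally divided by `intensity`.

    Returns NaN when the intensity measure is zero. (NumPy gives NaN for
    0 / 0, as in the warning above, and inf for a non-zero value / 0.)
    """
    value = measures[attr]
    if intensity is not None:
        denominator = measures[intensity]
        if denominator == 0:
            return math.nan
        value = value / denominator
    return value


manager_travelling = {"Enjoyment": 90, "Calories Burnt": 1.5}
pensioner_working_out = {"Enjoyment": 0, "Calories Burnt": 0.0}
print(colour_value(manager_travelling, "Enjoyment", "Calories Burnt"))
print(colour_value(pensioner_working_out, "Enjoyment", "Calories Burnt"))
```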
You can change the colour palette using the palette argument. The palette names are different from before, because those were categorical (or qualitative) scales, and this is now a sequential scale. The palette names are listed here.
In [11]:
scale = QuantitativeScale('Enjoyment', palette='Blues_9')
weave(sdd, dataset,
measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'},
link_width='Calories Burnt',
link_color=scale) \
.to_widget(**size_options)
In [12]:
scale.domain
Out[12]:
(0, 90)
It is possible to create a colorbar / scale to show the range of intensity values, but it’s not currently as easy as it should be. This should be improved in future.
API Documentation
Sankey diagram definitions
Sankey diagram definitions (SDDs) describe the structure of the Sankey diagram you want to end up with. They are declarative: you declare what you want up front, but the diagram isn’t created until later. This is useful if you want to use the same diagram structure for different data sources.
class SankeyDefinition(nodes, bundles, ordering, flow_selection=None, flow_partition=None, time_partition=None)
class ProcessGroup(selection=None, partition=None, direction='R', title=None)
A ProcessGroup represents a group of processes from the underlying dataset.
The processes to include are defined by the selection. By default they are all lumped into one node in the diagram, but by defining a partition this can be controlled.
- selection: list or string – If a list of strings, they are taken as process ids. If a single string, it is taken as a Pandas query string run against the process table.
- partition: Partition, optional – Defines how to split the ProcessGroup into subgroups.
- direction: ‘R’ or ‘L’ – Direction of flow, default ‘R’ (left-to-right).
- title: string, optional – Label for the ProcessGroup. If not set, the ProcessGroup id will be used.
class Waypoint(partition=None, direction='R', title=None)
A Waypoint represents a control point along a Bundle of flows.
There are two reasons to define Waypoints: to control the routing of Bundles of flows through the diagram, and to split flows according to some attributes by setting a partition.
- partition: Partition, optional – Defines how to split the Waypoint into subgroups.
- direction: ‘R’ or ‘L’ – Direction of flow, default ‘R’ (left-to-right).
- title: string, optional – Label for the Waypoint. If not set, the Waypoint id will be used.
class Bundle(source, target, waypoints=NOTHING, flow_selection=None, flow_partition=None, default_partition=None)
A Bundle represents a set of flows between two ProcessGroups.
- source: string – The id of the ProcessGroup at the start of the Bundle.
- target: string – The id of the ProcessGroup at the end of the Bundle.
- waypoints: list of strings – Optional list of ids of Waypoints the Bundle should pass through.
- flow_selection: string, optional – Query string to filter the flows included in this Bundle.
- flow_partition: Partition, optional – Defines how to split the flows in the Bundle into sub-flows. Often you want the same Partition for all the Bundles in the diagram, see SankeyDefinition.flow_partition.
- default_partition: Partition, optional – Defines the Partition applied to any Waypoints automatically added to route the Bundle across layers of the diagram.
Weaving the Sankey diagram
The weave() function actually creates a Sankey diagram from a Sankey diagram definition and a Dataset.
Contributing
Contributions are very welcome.
Contributing to floWeaver
Contributions are welcome! Please get in touch by email or by creating a GitHub issue with any questions.
Documentation
These are draft guidelines for getting started contributing to the documentation on Windows. Improvements are welcome, or get in touch if you need better instructions.
Required software: Anaconda, GitHub Desktop.
- Install the pandoc package.
- Clone the GitHub repository using the following URL: https://github.com/ricklupton/floweaver.git
- Modify content. The content is kept in the /docs directory. Each page is saved as a text file formatted in reStructuredText.
- Build the documentation. To see your changes, open the Anaconda Prompt, go to the /floweaver/docs directory and run make.bat html.
Citing floweaver
If floweaver has been significant in a project that leads to a publication, please acknowledge that by citing the paper linked above:
- R. C. Lupton and J. M. Allwood, ‘Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use’, Resources, Conservation and Recycling, vol. 124, pp. 141–151, Sep. 2017. DOI: 10.1016/j.resconrec.2017.05.002