EMA Workbench documentation¶
- Release: 2.2
- Date: Sep 20, 2022
Exploratory Modelling and Analysis (EMA) Workbench¶
Exploratory Modeling and Analysis (EMA) is a research methodology that uses computational experiments to analyze complex and uncertain systems (Bankes, 1993). That is, exploratory modeling aims to offer computational decision support for decision making under deep uncertainty and robust decision making.
The EMA workbench aims at providing support for performing exploratory modeling with models developed in various modelling packages and environments. Currently, the workbench offers connectors to Vensim, Netlogo, and Excel.
The EMA workbench offers support for designing experiments, performing the experiments - including support for parallel processing, both on a single machine and on clusters - and analysing the results. To get started, take a look at the high level overview, the tutorial, or dive straight into the details of the API.
A High Level Overview¶
Exploratory modeling framework¶
The core package contains the core functionality for setting up, designing, and performing series of computational experiments on one or more models simultaneously.
- Model (ema_workbench.em_framework.model): an abstract base class for specifying the interface to the model on which you want to perform exploratory modeling.
- Samplers (ema_workbench.em_framework.samplers): the various sampling techniques that are readily available in the workbench.
- Uncertainties (ema_workbench.em_framework.parameters): various types of parameter classes that can be used to specify the uncertainties and/or levers on the model.
- Outcomes (ema_workbench.em_framework.outcomes): various types of outcome classes.
- Evaluators (ema_workbench.em_framework.evaluators): various evaluators for running experiments in sequence or in parallel.
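To give a feel for how these pieces fit together, here is a minimal sketch using the same classes that the tutorials below cover in detail (the toy function my_model is just an illustration):

from ema_workbench import Model, RealParameter, ScalarOutcome, perform_experiments

def my_model(x=None):
    return {"y": x ** 2}

model = Model("example", function=my_model)  # the model interface
model.uncertainties = [RealParameter("x", -1, 1)]  # an uncertain parameter
model.outcomes = [ScalarOutcome("y")]  # an outcome of interest

results = perform_experiments(model, 10)  # design and run 10 experiments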
Connectors¶
The connectors package contains connectors to some existing simulation modeling environments. For each of these, a standard ModelStructureInterface class is provided that users can use as a starting point for specifying the interface to their own model.
- Vensim connector (ema_workbench.connectors.vensim): enables controlling a simulation model built in Vensim (e.g., setting parameters, simulation setup, running, getting output), and conducting an EMA study based on this model.
- Excel connector (ema_workbench.connectors.excel): enables controlling models built in Excel.
- NetLogo connector (ema_workbench.connectors.netlogo): enables controlling a simulation model built in NetLogo (e.g., setting parameters, simulation setup, running, getting output), and conducting an EMA study based on this model.
- Simio connector (ema_workbench.connectors.simio_connector): enables controlling models built in Simio.
- PySD connector (ema_workbench.connectors.pysd_connector): enables controlling models built in PySD.
Analysis¶
The analysis package contains a variety of analysis and visualization techniques for analyzing the results from the exploratory modeling. The analysis scripts are tailored for use in combination with the workbench, but they can also be used on their own with data generated in some other manner.
- Patient Rule Induction Method (ema_workbench.analysis.prim)
- Classification Trees (ema_workbench.analysis.cart)
- Logistic Regression (ema_workbench.analysis.logistic_regression)
- Dimensional Stacking (ema_workbench.analysis.dimensional_stacking)
- Feature Scoring (ema_workbench.analysis.feature_scoring)
- Regional Sensitivity Analysis (ema_workbench.analysis.regional_sa)
- various plotting functions for time series data (ema_workbench.analysis.plotting)
- pairwise plots (ema_workbench.analysis.pairs_plotting)
- parallel coordinate plots (ema_workbench.analysis.parcoords)
- support for converting figures to black and white (ema_workbench.analysis.b_an_w_plotting)
Installing the workbench¶
The 2.x version of the workbench requires Python 3.8 or higher.
A stable version of the workbench can be installed via pip.
pip install ema_workbench
The latest commit on the master branch can be installed with:
pip install -e git+https://github.com/quaquel/EMAworkbench#egg=ema-workbench
Or any other (development) branch on this repo or your own fork:
pip install -e git+https://github.com/YOUR_FORK/EMAworkbench@YOUR_BRANCH#egg=ema-workbench
The code is also available from GitHub. It comes with a requirements.txt file that indicates the key dependencies. If you have a standard scientific computing distribution for Python, such as the Anaconda distribution, most of the dependencies will already be met.
In addition to the libraries available in the default Anaconda distribution, there are various optional dependencies. Please follow the installation instructions for each of these libraries.
From conda or conda-forge:
altair for interactive visualizations
ipyparallel for support of interactive multiprocessing within the Jupyter notebook.
conda install altair ipyparallel
There are also some pip-based dependencies:
SALib, a necessary dependency for advanced sensitivity analysis.
platypus-opt, an optional dependency for many-objective optimization functionality.
pydot and Graphviz for some of the visualizations.
pip install SALib platypus-opt pydot
The various connectors have their own specific requirements.
Vensim only works on Windows. If you have 64-bit Vensim, you need 64-bit Python; if you have 32-bit Vensim, you need 32-bit Python.
Excel also works only on Windows.
pysd, optional, for simple Vensim models
pythonnet, for Simio
conda install jpype1
pip install pynetlogo pysd pythonnet
Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
2.2.0¶
Highlights¶
With the 2.2 release, the EMAworkbench can now connect to Vadere models of pedestrian dynamics. When inspecting a PRIM box peeling trajectory, multiple points on the trajectory can now be inspected simultaneously by passing a list of integers to PrimBox.inspect().
When running experiments with multiprocessing using the MultiprocessingEvaluator, the number of processes can now be controlled by passing a negative integer as n_processes (for example, -2 on a 12-thread CPU results in 10 processes being used). On Windows machines, the evaluator now defaults to at most 61 processes, due to limitations inherent in Windows when dealing with higher process counts. Code quality, CI, and error reporting have also been improved. Finally, generating these release notes is now automated.
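A minimal sketch of these two new usage patterns (model and box1 stand in for a model object and a PRIM box created elsewhere):

from ema_workbench import MultiprocessingEvaluator

# negative n_processes: leave that many logical cores unused
with MultiprocessingEvaluator(model, n_processes=-2) as evaluator:
    results = evaluator.perform_experiments(scenarios=100)

# inspect several points on a PRIM peeling trajectory in one call
box1.inspect([5, 6, 7])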
What’s Changed¶
🎉 New features added¶
Vadere model connector by @floristevito in https://github.com/quaquel/EMAworkbench/pull/145
🛠 Enhancements made¶
Improve code quality with static analysis by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/119
prim.py: Make PrimBox.peeling_trajectory["id"] int instead of float by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/121
analysis: Allow optional annotation of plot_tradeoff graphs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/123
evaluators.py: Allow MultiprocessingEvaluator to initialize with cpu_count minus N processes by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/140
PrimBox.inspect() now can also take a list of integers (aside from a single int) to inspect multiple points at once by @quaquel in https://github.com/quaquel/EMAworkbench/commit/6d83a6c33442ad4dce0974a384b03a225aaf830d (see also issue https://github.com/quaquel/EMAworkbench/issues/124)
🐛 Bugs fixed¶
fixed typo in lake_model.py by @JeffreyDillonLyons in https://github.com/quaquel/EMAworkbench/pull/136
📜 Documentation improvements¶
Docs: Installation.rst: Add how to install master or custom branch by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/122
Docs: Replace all http links with secure https URLs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/134
Maintain release notes at CHANGELOG.md and include them in Readthedocs by @quaquel in https://github.com/quaquel/EMAworkbench/commit/ebdbc9f5c77693fc75911ead472b420065dfe2aa
Fix badge links in readme by @quaquel in https://github.com/quaquel/EMAworkbench/commit/28569bdcb149c070c329589969179be354b879ec
🔧 Maintenance¶
feature_scoring: fix Regressor criterion deprecation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/125
feature_scoring.py: Change max_features in get_rf_feature_scores to "sqrt" by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/129
CI: Use Pytest instead of Nose, update default build to Python 3.10 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/131
Release CI: Only upload packages if on main repo by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/132
CI: Split off flake8 linting in a separate job by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/133
CI: Add weekly scheduled jobs and manual trigger by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/137
setup.py: Add project_urls for documentation and issue tracker links by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/142
set scikit-learn requirement >= 1.0.0 by @rhysits in https://github.com/quaquel/EMAworkbench/pull/144
Create release.yml file for automatic release notes generation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/152
instantiating an Evaluator without one or more AbstractModel instances now raises a type error by @quaquel in https://github.com/quaquel/EMAworkbench/commit/a83533aa8166ca2414137cdfc3125a53ee3697ec
removes deprecated DataFrame.append by replacing it with pandas.concat (see the conversation on issue https://github.com/quaquel/EMAworkbench/issues/126):
from feature scoring by @quaquel in https://github.com/quaquel/EMAworkbench/commit/8b8bfe41733e49b75c01e34b75563e0a6d5b4024
from logistic_regression.py by @quaquel in https://github.com/quaquel/EMAworkbench/commit/255e3d6d9639dfe6fd4e797e1c63d59ba0522c2d
removes NumPy datatypes deprecated in 1.20 by @quaquel in https://github.com/quaquel/EMAworkbench/commit/e8fbf6fc64f14b7c7220fa4d3fc976c42d3757eb
replace deprecated scipy.stats.kde with scipy.stats by @quaquel in https://github.com/quaquel/EMAworkbench/commit/b5a9ca967740e74d503281018e88d6b28e74a27d
New Contributors¶
@JeffreyDillonLyons made their first contribution in https://github.com/quaquel/EMAworkbench/pull/136
@rhysits made their first contribution in https://github.com/quaquel/EMAworkbench/pull/144
@floristevito made their first contribution in https://github.com/quaquel/EMAworkbench/pull/145
Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.1.2...2.2.0
Tutorials¶
The code of these examples can be found in the examples package. The first three examples are meant to illustrate the basics of the EMA workbench: how to implement a model, specify its uncertainties and outcomes, and run it. The fourth example is a more extensive illustration based on Pruyt & Hamarat (2010). It shows some more advanced possibilities of the EMA workbench, including one way of handling policies.
A simple model in Python¶
The simplest case is where we have a model available through a python function. For example, imagine we have the simple model.
def some_model(x1=None, x2=None, x3=None):
    return {'y': x1 * x2 + x3}
In order to control this model from the workbench, we can make use of the Model class. We can instantiate a model object by passing it a name and the function.
model = Model('simpleModel', function=some_model)  # instantiate the model
Next, we need to specify the uncertainties and the outcomes of the model. In this case, the uncertainties are x1, x2, and x3, while the outcome is y. Both uncertainties and outcomes are attributes of the model object, so we can say
1#specify uncertainties
2model.uncertainties = [RealParameter("x1", 0.1, 10),
3 RealParameter("x2", -0.01,0.01),
4 RealParameter("x3", -0.01,0.01)]
5#specify outcomes
6model.outcomes = [ScalarOutcome('y')]
Here, we specify that x1 is some value between 0.1 and 10, while both x2 and x3 are somewhere between -0.01 and 0.01. Having implemented this model, we can now investigate the model behavior over the set of uncertainties by simply calling
results = perform_experiments(model, 100)
The function perform_experiments() takes the model we just specified and will execute 100 experiments. By default, these experiments are generated using Latin hypercube sampling, but Monte Carlo sampling and full factorial sampling are also readily available. Read the documentation for perform_experiments() for more details.
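For instance, switching the sampling scheme is a matter of passing a different sampler; a sketch, assuming the Samplers enum (used later in this documentation for Sobol sampling) also exposes MC and FF members:

from ema_workbench import Samplers

# Monte Carlo instead of the default Latin hypercube
results_mc = perform_experiments(model, 100, uncertainty_sampling=Samplers.MC)

# full factorial over the uncertainties
results_ff = perform_experiments(model, 100, uncertainty_sampling=Samplers.FF)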
The complete code:
1"""
2Created on 20 dec. 2010
3
4This file illustrated the use the EMA classes for a contrived example
5It's main purpose has been to test the parallel processing functionality
6
7.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
8"""
9from ema_workbench import (
10 Model,
11 RealParameter,
12 ScalarOutcome,
13 ema_logging,
14 perform_experiments,
15)
16
17
18def some_model(x1=None, x2=None, x3=None):
19 return {"y": x1 * x2 + x3}
20
21
22if __name__ == "__main__":
23 ema_logging.LOG_FORMAT = "[%(name)s/%(levelname)s/%(processName)s] %(message)s"
24 ema_logging.log_to_stderr(ema_logging.INFO)
25
26 model = Model("simpleModel", function=some_model) # instantiate the model
27
28 # specify uncertainties
29 model.uncertainties = [
30 RealParameter("x1", 0.1, 10),
31 RealParameter("x2", -0.01, 0.01),
32 RealParameter("x3", -0.01, 0.01),
33 ]
34 # specify outcomes
35 model.outcomes = [ScalarOutcome("y")]
36
37 results = perform_experiments(model, 100)
A simple model in Vensim¶
Imagine we have a very simple Vensim model:

For this example, we assume that 'x11' and 'x12' are uncertain. The state variable 'a' is the outcome of interest. Similar to the previous example, we first have to instantiate a Vensim model object, in this case VensimModel. To this end, we need to specify the directory in which the Vensim file resides, the name of the Vensim file, and the name of the model.
wd = r'./models/vensim example'
model = VensimModel("simpleModel", wd=wd, model_file='model.vpm')
Next, we can specify the uncertainties and the outcomes.
model.uncertainties = [RealParameter("x11", 0, 2.5),
                       RealParameter("x12", -2.5, 2.5)]

model.outcomes = [TimeSeriesOutcome('a')]
Note that we are using a TimeSeriesOutcome, because Vensim results are time series. We can now simply run this model by calling perform_experiments().
with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.perform_experiments(1000)
We now use an evaluator, which ensures that the code is executed in parallel.
It is generally good practice to first run a model a small number of times sequentially prior to running it in parallel; in this way, bugs and the like can be spotted more easily. To further help with keeping track of what is going on, it is also good practice to make use of the logging functionality provided by the workbench
ema_logging.log_to_stderr(ema_logging.INFO)
Typically, this line appears at the start of the script. When executing the code, messages on progress or on errors will be shown.
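For example, a quick sequential smoke test prior to a parallel run could look like the following sketch, using SequentialEvaluator:

from ema_workbench import SequentialEvaluator

# run a handful of experiments sequentially to spot bugs early
with SequentialEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(5)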
The complete code:
1"""
2Created on 3 Jan. 2011
3
4This file illustrated the use the EMA classes for a contrived vensim
5example
6
7
8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
9 chamarat <c.hamarat (at) tudelft (dot) nl>
10"""
11from ema_workbench import (
12 TimeSeriesOutcome,
13 perform_experiments,
14 RealParameter,
15 ema_logging,
16)
17
18from ema_workbench.connectors.vensim import VensimModel
19
20if __name__ == "__main__":
21 # turn on logging
22 ema_logging.log_to_stderr(ema_logging.INFO)
23
24 # instantiate a model
25 wd = "./models/vensim example"
26 vensimModel = VensimModel("simpleModel", wd=wd, model_file="model.vpm")
27 vensimModel.uncertainties = [
28 RealParameter("x11", 0, 2.5),
29 RealParameter("x12", -2.5, 2.5),
30 ]
31
32 vensimModel.outcomes = [TimeSeriesOutcome("a")]
33
34 results = perform_experiments(vensimModel, 1000)
A simple model in Excel¶
In order to perform EMA on an Excel model, one can use the ExcelModel. This base class makes use of named cells in Excel to refer to them directly. That is, we assume that the names of the uncertainties correspond to named cells in Excel and, similarly, that the names of the outcomes correspond to named cells or ranges of cells in Excel. When using this class, make sure that the decimal separator and thousands separator are set correctly in Excel. This can be checked via File > Options > Advanced. These separators should follow the Anglo-Saxon convention.
1"""
2Created on 27 Jul. 2011
3
4This file illustrated the use the EMA classes for a model in Excel.
5
6It used the excel file provided by
7`A. Sharov <https://home.comcast.net/~sharov/PopEcol/lec10/fullmod.html>`_
8
9This excel file implements a simple predator prey model.
10
11.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
12"""
13from ema_workbench import (
14 RealParameter,
15 TimeSeriesOutcome,
16 ema_logging,
17 perform_experiments,
18)
19
20from ema_workbench.connectors.excel import ExcelModel
21from ema_workbench.em_framework.evaluators import MultiprocessingEvaluator
22
23if __name__ == "__main__":
24 ema_logging.log_to_stderr(level=ema_logging.INFO)
25
26 model = ExcelModel(
27 "predatorPrey", wd="./models/excelModel", model_file="excel example.xlsx"
28 )
29 model.uncertainties = [
30 RealParameter("K2", 0.01, 0.2),
31 # we can refer to a cell in the normal way
32 # we can also use named cells
33 RealParameter("KKK", 450, 550),
34 RealParameter("rP", 0.05, 0.15),
35 RealParameter("aaa", 0.00001, 0.25),
36 RealParameter("tH", 0.45, 0.55),
37 RealParameter("kk", 0.1, 0.3),
38 ]
39
40 # specification of the outcomes
41 model.outcomes = [
42 TimeSeriesOutcome("B4:B1076"),
43 # we can refer to a range in the normal way
44 TimeSeriesOutcome("P_t"),
45 ] # we can also use named range
46
47 # name of the sheet
48 model.default_sheet = "Sheet1"
49
50 with MultiprocessingEvaluator(model) as evaluator:
51 results = perform_experiments(
52 model, 100, reporting_interval=1, evaluator=evaluator
53 )
The example is relatively straightforward. We instantiate an Excel model and specify the uncertainties and the outcomes. We also need to specify the sheet in Excel on which the model resides. Next, we can call perform_experiments().
Warning: when using named cells, make sure that the names are defined at the sheet level and not at the workbook level.
A more elaborate example: Mexican Flu¶
This example is derived from Pruyt & Hamarat (2010). This paper presents a small exploratory System Dynamics model related to the dynamics of the 2009 flu pandemic, also known as the Mexican flu, swine flu, or A(H1N1)v. The model was developed in May 2009 in order to quickly foster understanding about the possible dynamics of this new flu variant and to perform rough-cut policy explorations. Later, the model was also used to further develop and illustrate Exploratory Modelling and Analysis.
Mexican Flu: the basic model¶
In the first days, weeks and months after the first reports about the outbreak of a new flu variant in Mexico and the USA, much remained unknown about the possible dynamics and consequences of the at the time plausible/imminent epidemic/pandemic of the new flu variant, first known as Swine or Mexican flu and known today as Influenza A(H1N1)v.
The exploratory model presented here is small, simple, high-level, data-poor (no complex/special structures nor detailed data beyond crude guestimates), and history-poor.
The modelled world is divided in three regions: the Western World, the densely populated Developing World, and the scarcely populated Developing World. Only the two first regions are included in the model because it is assumed that the scarcely populated regions are causally less important for dynamics of flu pandemics. Below, the figure shows the basic stock-and-flow structure. For a more elaborate description of the model, see Pruyt & Hamarat (2010).

Given the various uncertainties about the exact characteristics of the flu, including its fatality rate, the contact rate, the susceptibility of the population, etc., the flu case is an ideal candidate for EMA. One can use EMA to explore the kinds of dynamics that can occur, identify undesirable dynamics, and develop policies targeted at these undesirable dynamics.
In the original paper, Pruyt & Hamarat (2010) recoded the model in Python and performed the analysis in that way. Here we show how the EMA workbench can be connected to Vensim directly.
The flu model was built in Vensim. We can thus use VensimModel as a base class.
We are interested in two outcomes:
- deceased population region 1: the total number of deaths over the duration of the simulation.
- peak infected fraction: the fraction of the population that is infected.
These are added to self.outcomes, using the TimeSeriesOutcome class.
The table below is adapted from Pruyt & Hamarat (2010). It shows the uncertainties and their bounds. These are added to self.uncertainties as RealParameter instances.
| Parameter | Lower Limit | Upper Limit |
|---|---|---|
| additional seasonal immune population fraction region 1 | 0.0 | 0.5 |
| additional seasonal immune population fraction region 2 | 0.0 | 0.5 |
| fatality ratio region 1 | 0.0001 | 0.1 |
| fatality ratio region 2 | 0.0001 | 0.1 |
| initial immune fraction of the population of region 1 | 0.0 | 0.5 |
| initial immune fraction of the population of region 2 | 0.0 | 0.5 |
| normal interregional contact rate | 0.0 | 0.9 |
| permanent immune population fraction region 1 | 0.0 | 0.5 |
| permanent immune population fraction region 2 | 0.0 | 0.5 |
| recovery time region 1 | 0.2 | 0.8 |
| recovery time region 2 | 0.2 | 0.8 |
| root contact rate region 1 | 1.0 | 10.0 |
| root contact rate region 2 | 1.0 | 10.0 |
| infection ratio region 1 | 0.0 | 0.1 |
| infection ratio region 2 | 0.0 | 0.1 |
| normal contact rate region 1 | 10 | 200 |
| normal contact rate region 2 | 10 | 200 |
Together, this results in the following code:
1"""
2Created on 20 May, 2011
3
4This module shows how you can use vensim models directly
5instead of coding the model in Python. The underlying case
6is the same as used in fluExample
7
8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
9 epruyt <e.pruyt (at) tudelft (dot) nl>
10"""
11from ema_workbench import (
12 RealParameter,
13 TimeSeriesOutcome,
14 ema_logging,
15 perform_experiments,
16 MultiprocessingEvaluator,
17)
18
19from ema_workbench.connectors.vensim import VensimModel
20
21if __name__ == "__main__":
22 ema_logging.log_to_stderr(ema_logging.INFO)
23
24 model = VensimModel(
25 "fluCase", wd="./models/flu", model_file="FLUvensimV1basecase.vpm"
26 )
27
28 # outcomes
29 model.outcomes = [
30 TimeSeriesOutcome("deceased population region 1"),
31 TimeSeriesOutcome("infected fraction R1"),
32 ]
33
34 # Plain Parametric Uncertainties
35 model.uncertainties = [
36 RealParameter("additional seasonal immune population fraction R1", 0, 0.5),
37 RealParameter("additional seasonal immune population fraction R2", 0, 0.5),
38 RealParameter("fatality ratio region 1", 0.0001, 0.1),
39 RealParameter("fatality rate region 2", 0.0001, 0.1),
40 RealParameter("initial immune fraction of the population of region 1", 0, 0.5),
41 RealParameter("initial immune fraction of the population of region 2", 0, 0.5),
42 RealParameter("normal interregional contact rate", 0, 0.9),
43 RealParameter("permanent immune population fraction R1", 0, 0.5),
44 RealParameter("permanent immune population fraction R2", 0, 0.5),
45 RealParameter("recovery time region 1", 0.1, 0.75),
46 RealParameter("recovery time region 2", 0.1, 0.75),
47 RealParameter("susceptible to immune population delay time region 1", 0.5, 2),
48 RealParameter("susceptible to immune population delay time region 2", 0.5, 2),
49 RealParameter("root contact rate region 1", 0.01, 5),
50 RealParameter("root contact ratio region 2", 0.01, 5),
51 RealParameter("infection ratio region 1", 0, 0.15),
52 RealParameter("infection rate region 2", 0, 0.15),
53 RealParameter("normal contact rate region 1", 10, 100),
54 RealParameter("normal contact rate region 2", 10, 200),
55 ]
56
57 nr_experiments = 10
58 with MultiprocessingEvaluator(model) as evaluator:
59 results = perform_experiments(model, nr_experiments, evaluator=evaluator)
We have now instantiated the model, specified the uncertain factors and outcomes, and run the model. We have generated a dataset of results and can proceed to analyse it using various analysis scripts. As a first step, one can look at the individual runs using a line plot via lines(). See plotting for some more visualizations using results from performing EMA on FluModel.
import matplotlib.pyplot as plt
from ema_workbench.analysis.plotting import lines

figure = lines(results, density=True)  # show lines, and end state density
plt.show()  # show figure
generates the following figure:

From this figure, one can deduce that across the ensemble of possible futures, there is a subset of runs with a substantial number of deaths. We can zoom in on those cases, identify the conditions under which they occur, and use this insight for policy design.
For further analysis, it is generally convenient to generate the results for a series of experiments and save them. One can then use these saved results in various analysis scripts.
from ema_workbench import save_results
save_results(results, r'./1000 runs.tar.gz')
The above code snippet shows how we can use save_results() for saving the results of our experiments. save_results() stores the results as csv files in a tarball.
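The counterpart load_results() (used in the analysis scripts later in this documentation) reads such an archive back in:

from ema_workbench import load_results

# returns the same (experiments, outcomes) tuple that was saved
experiments, outcomes = load_results(r'./1000 runs.tar.gz')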
Mexican Flu: policies¶
For this paper, policies were developed by using the system understanding of the analysts.
static policy¶
adaptive policy¶
running the policies¶
In order to be able to run the models with the policies and to compare their results with the no-policy case, we need to specify the policies:
# add policies
policies = [Policy('no policy',
                   model_file=r'/FLUvensimV1basecase.vpm'),
            Policy('static policy',
                   model_file=r'/FLUvensimV1static.vpm'),
            Policy('adaptive policy',
                   model_file=r'/FLUvensimV1dynamic.vpm')]
In this case, we have chosen to have the policies implemented in separate Vensim files. Policies require a name and can take any other keyword arguments you like. If a keyword matches an attribute on the model object, that attribute will be updated; model_file is an attribute of the Vensim model, so when executing the policies, this attribute is updated for each policy. We can pass these policies to perform_experiments() as an additional keyword argument:
results = perform_experiments(model, 1000, policies=policies)
We can now proceed in the same way as before, and perform a series of experiments. Together, this results in the following code:
1"""
2Created on 20 May, 2011
3
4This module shows how you can use vensim models directly
5instead of coding the model in Python. The underlying case
6is the same as used in fluExample
7
8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
9 epruyt <e.pruyt (at) tudelft (dot) nl>
10"""
11import numpy as np
12
13from ema_workbench import (
14 RealParameter,
15 TimeSeriesOutcome,
16 ema_logging,
17 ScalarOutcome,
18 perform_experiments,
19)
20from ema_workbench.connectors.vensim import VensimModel
21from ema_workbench.em_framework.parameters import Policy
22
23if __name__ == "__main__":
24 ema_logging.log_to_stderr(ema_logging.INFO)
25
26 model = VensimModel(
27 "fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm"
28 )
29
30 # outcomes
31 model.outcomes = [
32 TimeSeriesOutcome("deceased population region 1"),
33 TimeSeriesOutcome("infected fraction R1"),
34 ScalarOutcome(
35 "max infection fraction",
36 variable_name="infected fraction R1",
37 function=np.max,
38 ),
39 ]
40
41 # Plain Parametric Uncertainties
42 model.uncertainties = [
43 RealParameter("additional seasonal immune population fraction R1", 0, 0.5),
44 RealParameter("additional seasonal immune population fraction R2", 0, 0.5),
45 RealParameter("fatality ratio region 1", 0.0001, 0.1),
46 RealParameter("fatality rate region 2", 0.0001, 0.1),
47 RealParameter("initial immune fraction of the population of region 1", 0, 0.5),
48 RealParameter("initial immune fraction of the population of region 2", 0, 0.5),
49 RealParameter("normal interregional contact rate", 0, 0.9),
50 RealParameter("permanent immune population fraction R1", 0, 0.5),
51 RealParameter("permanent immune population fraction R2", 0, 0.5),
52 RealParameter("recovery time region 1", 0.1, 0.75),
53 RealParameter("recovery time region 2", 0.1, 0.75),
54 RealParameter("susceptible to immune population delay time region 1", 0.5, 2),
55 RealParameter("susceptible to immune population delay time region 2", 0.5, 2),
56 RealParameter("root contact rate region 1", 0.01, 5),
57 RealParameter("root contact ratio region 2", 0.01, 5),
58 RealParameter("infection ratio region 1", 0, 0.15),
59 RealParameter("infection rate region 2", 0, 0.15),
60 RealParameter("normal contact rate region 1", 10, 100),
61 RealParameter("normal contact rate region 2", 10, 200),
62 ]
63
64 # add policies
65 policies = [
66 Policy("no policy", model_file=r"FLUvensimV1basecase.vpm"),
67 Policy("static policy", model_file=r"FLUvensimV1static.vpm"),
68 Policy("adaptive policy", model_file=r"FLUvensimV1dynamic.vpm"),
69 ]
70
71 results = perform_experiments(model, 1000, policies=policies)
comparison of results¶
Using the following script, we reproduce figures similar to the 3D figures in Pruyt & Hamarat (2010), but using pairs_scatter(). It shows, for the three different policies, their behavior in terms of the total number of deaths, the height of the highest peak of the pandemic, and the point in time at which this peak was reached.
1"""
2Created on 20 sep. 2011
3
4.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
5"""
6import matplotlib.pyplot as plt
7import numpy as np
8
9from ema_workbench import load_results, ema_logging
10from ema_workbench.analysis.pairs_plotting import (
11 pairs_lines,
12 pairs_scatter,
13 pairs_density,
14)
15
16ema_logging.log_to_stderr(level=ema_logging.DEFAULT_LEVEL)
17
18# load the data
19fh = "./data/1000 flu cases no policy.tar.gz"
20experiments, outcomes = load_results(fh)
21
22# transform the results to the required format
23# that is, we want to know the max peak and the casualties at the end of the
24# run
25tr = {}
26
27# get time and remove it from the dict
28time = outcomes.pop("TIME")
29
30for key, value in outcomes.items():
31 if key == "deceased population region 1":
32 tr[key] = value[:, -1] # we want the end value
33 else:
34 # we want the maximum value of the peak
35 max_peak = np.max(value, axis=1)
36 tr["max peak"] = max_peak
37
38 # we want the time at which the maximum occurred
39 # the code here is a bit obscure, I don't know why the transpose
40 # of value is needed. This however does produce the appropriate results
41 logical = value.T == np.max(value, axis=1)
42 tr["time of max"] = time[logical.T]
43
44pairs_scatter(experiments, tr, filter_scalar=False)
45pairs_lines(experiments, outcomes)
46pairs_density(experiments, tr, filter_scalar=False)
47plt.show()
no policy

static policy

adaptive policy

General Introduction¶
Since 2010, I have been working on the development of an open source toolkit for supporting decision-making under deep uncertainty. This toolkit is known as the exploratory modeling workbench. The motivation for this name is that in my opinion all model-based deep uncertainty approaches are forms of exploratory modeling, as first introduced by Bankes (1993). The design of the workbench has undergone various changes over time, but it started to stabilize in the fall of 2016. In the summer of 2017, I published a paper detailing the workbench (Kwakkel, 2017). There is an in-depth example in the paper, but for the documentation I want to provide a more tutorial-style description of the functionality of the workbench.
As a starting point, I will use the Direct Policy Search example that is available for Rhodium (Quinn et al 2017). A quick note on terminology is in order here. I have a background in transport modeling. Here we often use discrete event simulation models. These are intrinsically stochastic models. It is standard practice to run these models several times and take descriptive statistics over the set of runs. In discrete event simulation, and also in the context of agent based modeling, this is known as running replications. The workbench adopts this terminology and draws a sharp distinction between designing experiments over a set of deeply uncertain factors, and performing replications of this experiment to cope with stochastic uncertainty.
[1]:
import math

# more or less default imports when using
# the workbench
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.optimize import brentq


def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
    """
    Parameters
    ----------
    xt : float
         pollution in lake at time t
    c1 : float
         center rbf 1
    c2 : float
         center rbf 2
    r1 : float
         radius rbf 1
    r2 : float
         radius rbf 2
    w1 : float
         weight of rbf 1

    Returns
    -------
    float

    note:: w2 = 1 - w1
    """
    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
    at1 = max(rule, 0.01)
    at = min(at1, 0.1)
    return at
def lake_model(
    b=0.42,
    q=2.0,
    mean=0.02,
    stdev=0.001,
    delta=0.98,
    alpha=0.4,
    nsamples=100,
    myears=100,
    c1=0.25,
    c2=0.25,
    r1=0.5,
    r2=0.5,
    w1=0.5,
    seed=None,
):
    """runs the lake model for nsamples stochastic realisations using
    the specified random seed.

    Parameters
    ----------
    b : float
        decay rate for P in lake (0.42 = irreversible)
    q : float
        recycling exponent
    mean : float
        mean of natural inflows
    stdev : float
        standard deviation of natural inflows
    delta : float
        future utility discount rate
    alpha : float
        utility from pollution
    nsamples : int, optional
    myears : int, optional
    c1 : float
    c2 : float
    r1 : float
    r2 : float
    w1 : float
    seed : int, optional
        seed for the random number generator

    Returns
    -------
    tuple
    """
    np.random.seed(seed)
    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)

    X = np.zeros((myears,))
    average_daily_P = np.zeros((myears,))
    reliability = 0.0
    inertia = 0
    utility = 0

    for _ in range(nsamples):
        X[0] = 0.0
        decision = 0.1

        decisions = np.zeros(myears)
        decisions[0] = decision

        natural_inflows = np.random.lognormal(
            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
            size=myears,
        )

        for t in range(1, myears):
            # here we use the decision rule
            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
            decisions[t] = decision

            X[t] = (
                (1 - b) * X[t - 1]
                + X[t - 1] ** q / (1 + X[t - 1] ** q)
                + decision
                + natural_inflows[t - 1]
            )
            average_daily_P[t] += X[t] / nsamples

        reliability += np.sum(X < Pcrit) / (nsamples * myears)
        # inertia: fraction of time steps where the decision changes by less than 0.02
        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
        utility += (
            np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
        )

    max_P = np.max(average_daily_P)
    return max_P, utility, inertia, reliability
Connecting a python function to the workbench¶
Now we are ready to connect this model to the workbench. We have to specify the uncertainties, the outcomes, and the policy levers. For the uncertainties and the levers, we can use real valued parameters, integer valued parameters, and categorical parameters. For outcomes, we can use either scalar, single valued outcomes or time series outcomes. For convenience, we can also explicitly control constants in case we want to have them set to a value different from their default value.
[2]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
Performing experiments¶
Now that we have specified the model with the workbench, we are ready to perform experiments on it. We can use evaluators to distribute these experiments either over multiple cores on a single machine, or over a cluster using ipyparallel. Using any parallelization is an advanced topic, in particular if you are on a Windows machine. The code as presented here will run fine in parallel on a Mac or Linux machine. If you are trying to run this in parallel using multiprocessing on a Windows machine from within a Jupyter notebook, it won't work. The solution is to move lake_model and get_antropogenic_release to a separate Python module and import the lake model function into the notebook.
Another common practice when working with the exploratory modeling workbench is to turn on the logging functionality that it provides. This will report on the progress of the experiments, as well as provide more insight into what is happening in particular in case of errors.
If we want to perform experiments on the model we have just defined, we can use the perform_experiments method on the evaluator, or the stand alone perform_experiments function. We can perform experiments over the uncertainties and/or over the levers. Any given parameterization of the levers is known as a policy, while any given parametrization over the uncertainties is known as a scenario. Any policy is evaluated over each of the scenarios. So if we want to use 100 scenarios and 10 policies, this means that we will end up performing 100 * 10 = 1000 experiments. By default, the workbench uses Latin hypercube sampling for both sampling over levers and sampling over uncertainties. However, the workbench also offers support for full factorial, partial factorial, and Monte Carlo sampling, as well as wrappers for the various sampling schemes provided by SALib.
The ema_workbench offers support for parallelizing the execution of the experiments using either multiprocessing or ipyparallel. There are various OS-specific concerns to keep in mind when using either of these libraries. Please have a look at the documentation of these libraries before using them.
[3]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=1000, policies=10)
[MainProcess/INFO] pool started with 12 workers
[MainProcess/INFO] performing 1000 scenarios * 10 policies * 1 model(s) = 10000 experiments
100%|████████████████████████████████████| 10000/10000 [02:18<00:00, 72.21it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
Processing the results of the experiments¶
By default, the return of perform_experiments
is a tuple of length 2. The first item in the tuple is the experiments. The second item is the outcomes. Experiments and outcomes are aligned by index. The experiments are stored in a Pandas DataFrame, while the outcomes are a dict with the name of the outcome as key, and the values are in a numpy array.
[4]:
experiments, outcomes = results
print(experiments.shape)
print(list(outcomes.keys()))
(10000, 13)
['max_P', 'utility', 'inertia', 'reliability']
We can easily visualize these results. The workbench comes with various convenience plotting functions built on top of matplotlib and seaborn. We can however also easily use seaborn or matplotlib directly. For example, we can create a pairplot using seaborn where we group our outcomes by policy. For this, we need to create a dataframe with the outcomes and a policy column. By default the name of a policy is a string representation of the dict with lever names and values. We can replace this easily with a number as shown below.
[5]:
policies = experiments["policy"]
for i, policy in enumerate(np.unique(policies)):
    experiments.loc[policies == policy, "policy"] = str(i)

data = pd.DataFrame(outcomes)
data["policy"] = policies
Next, all that is left is to use seaborn’s pairplot function to visualize the data.
[6]:
sns.pairplot(data, hue="policy", vars=list(outcomes.keys()))
plt.show()

Open exploration¶
In this second tutorial, I will showcase how to use the ema_workbench for performing open exploration. This tutorial continues with the same example as used in the previous tutorial.
some background¶
In exploratory modeling, we are interested in understanding how regions in the uncertainty space and/or the decision space map to the whole outcome space, or partitions thereof. There are two general approaches for investigating this mapping. The first one is through systematic sampling of the uncertainty or decision space. This is sometimes also known as open exploration. The second one is to search through the space in a directed manner using some type of optimization approach. This is sometimes also known as directed search.
The workbench supports both open exploration and directed search. Both can be applied to investigate the mapping of the uncertainty space and/or the decision space to the outcome space. In most applications, search is used for finding promising mappings from the decision space to the outcome space, while exploration is used to stress test these mappings under a whole range of possible resolutions of the various uncertainties. This need not be the case, however. Optimization can be used to discover the worst possible scenario, while sampling can be used to get insight into the sensitivity of outcomes to the various decision levers.
open exploration¶
To showcase the open exploration functionality, let’s start with a basic example using the Direct Policy Search (DPS) version of the lake problem (Quinn et al 2017). This is the same model as we used in the general introduction. Note that for convenience, I have moved the code for the model to a module called dps_lake_model.py, which I import here for further use.
We are going to simultaneously sample over uncertainties and decision levers. We are going to generate 1000 scenarios and 5 policies, and see how they jointly affect the outcomes. A scenario is understood as a point in the uncertainty space, while a policy is a point in the decision space. The combination of a scenario and a policy is called an experiment. The uncertainty space is spanned by uncertainties, while the decision space is spanned by levers.
Both uncertainties and levers are instances of RealParameter (a continuous range), IntegerParameter (a range of integers), or CategoricalParameter (an unordered set of categories). By default, the workbench will use Latin hypercube sampling for generating both the scenarios and the policies. Each policy will always be evaluated over all scenarios (i.e., a full factorial over scenarios and policies).
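As a sketch of the other parameter types (the lever names n_agents and strategy are hypothetical; this example itself only uses RealParameter):

from ema_workbench import IntegerParameter, CategoricalParameter

other_levers = [
    IntegerParameter("n_agents", 10, 100),  # a range of integers
    CategoricalParameter("strategy", ["cautious", "bold"]),  # unordered categories
]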
[1]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
[2]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model, n_processes=7) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(scenarios=1000, policies=5)
[MainProcess/INFO] pool started with 7 workers
[MainProcess/INFO] performing 1000 scenarios * 5 policies * 1 model(s) = 5000 experiments
100%|██████████████████████████████████████| 5000/5000 [01:02<00:00, 80.06it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
Visual analysis¶
Having generated these results, the next step is to analyze them and see what we can learn. The workbench comes with a variety of techniques for this analysis. A simple first step is to make a few quick visualizations of the results. The workbench has convenience functions for this, but it is also possible to create your own visualizations using the scientific Python stack.
[3]:
from ema_workbench.analysis import pairs_plotting

fig, axes = pairs_plotting.pairs_scatter(
    experiments, outcomes, group_by="policy", legend=False
)
fig.set_size_inches(8, 8)
plt.show()
[MainProcess/INFO] no time dimension found in results

Often, it is convenient to separate the process of performing the experiments from the analysis. To make this possible, the workbench offers convenience functions for storing results to disk and loading them from disk. The workbench will store the results in a tarball with .csv files and separate metadata files. This is a convenient format that has proven sufficient over the years.
from ema_workbench import save_results
save_results(results, '1000 scenarios 5 policies.tar.gz')
from ema_workbench import load_results
results = load_results('1000 scenarios 5 policies.tar.gz')
advanced analysis¶
In addition to visual analysis, the workbench comes with a variety of techniques to perform a more in-depth analysis of the results. In addition, other analyses can simply be performed by utilizing the scientific Python stack. The workbench comes with:
- Scenario Discovery, a model-driven approach to scenario development
- Feature Scoring, a poor man's alternative to global sensitivity analysis
- Dimensional stacking, a quick visual approach drawing on feature scoring to enable scenario discovery. This approach has received limited attention in the literature. The implementation in the workbench replaces the rule mining approach with a feature scoring approach.
Scenario Discovery¶
A detailed discussion of scenario discovery can be found in an earlier blog post. For completeness, I provide a code snippet here. Compared to the previous blog post, there is one small change. The library mpld3 is currently not maintained and is broken on Python 3.5 and higher. To still utilize the interactive exploration of the trade-offs within the notebook, one could use the interactive back-end (%matplotlib notebook).
[4]:
from ema_workbench.analysis import prim
x = experiments
y = outcomes["max_P"] < 0.8
prim_alg = prim.Prim(x, y, threshold=0.8)
box1 = prim_alg.find_box()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] 5000 points remaining, containing 510 cases of interest
[MainProcess/INFO] mean: 0.9807692307692307, mass: 0.052, coverage: 0.5, density: 0.9807692307692307 restricted_dimensions: 3
[5]:
box1.show_tradeoff()
plt.show()

We can inspect any of the points on the trade-off curve using the inspect method. As shown below, we can show the results either in a table format or in a visual format.
[6]:
box1.inspect(43)
box1.inspect(43, style="graph")
plt.show()
coverage 0.794118
density 0.794118
id 43
mass 0.102
mean 0.794118
res_dim 2
Name: 43, dtype: object
box 43
min max qp values
b 0.362596 0.449742 [1.5372285485377317e-164, -1.0]
q 3.488834 4.498362 [1.3004137205191903e-88, -1.0]

[7]:
box1.show_pairs_scatter(43)
plt.show()

feature scoring¶
Feature scoring is a family of techniques often used in machine learning to identify the most relevant features to include in a model. This is similar to one of the use cases for global sensitivity analysis, namely factor prioritisation. The main advantage of feature scoring techniques is that they impose no specific constraints on the experimental design, while they can handle real valued, integer valued, and categorical valued parameters. The workbench supports multiple techniques, the most useful of which generally is extra trees (Geurts et al. 2006).
For this example, we run feature scoring for each outcome of interest. We can also run it for a specific outcome if desired. Similarly, we can choose whether we want to run in regression mode or classification mode. The latter is applicable if the outcome is a categorical variable, and the results should be interpreted similarly to regional sensitivity analysis results. For more details, see the documentation.
[8]:
from ema_workbench.analysis import feature_scoring
x = experiments
y = outcomes
fs = feature_scoring.get_feature_scores_all(x, y)
sns.heatmap(fs, cmap="viridis", annot=True)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category

From the results, we see that max_P is primarily influenced by b, while utility is driven by delta; for inertia and reliability, the situation is a little less clear cut.
The foregoing feature scoring used the raw values of the outcomes. However, in scenario discovery applications, we are typically dealing with a binary classification. This might produce slightly different results, as demonstrated below.
[11]:
from ema_workbench.analysis import RuleInductionType
x = experiments
y = outcomes["max_P"] < 0.8
fs, alg = feature_scoring.get_ex_feature_scores(
    x, y, mode=RuleInductionType.CLASSIFICATION
)
fs.sort_values(ascending=False, by=1)
[MainProcess/INFO] model dropped from analysis because only a single category
[11]:
| | 1 |
|---|---|
| b | 0.458094 |
| q | 0.310989 |
| mean | 0.107732 |
| delta | 0.051212 |
| stdev | 0.047653 |
| policy | 0.005475 |
| w1 | 0.004815 |
| r2 | 0.003658 |
| r1 | 0.003607 |
| c1 | 0.003467 |
| c2 | 0.003299 |
Here we ran extra trees feature scoring on a binary vector for max_P. The b parameter is still important, similar to the previous case, but the introduction of the binary classification now also highlights some additional parameters as being potentially relevant.
dimensional stacking¶
Dimensional stacking was suggested as a more visual approach to scenario discovery. It involves two steps: identifying the most important uncertainties that affect system behavior, and creating a pivot table using the most influential uncertainties. In order to do this, we first need, as in scenario discovery, to specify the outcomes that are of interest. Creating the pivot table involves binning the uncertainties. More details can be found in Suzuki et al. (2015) or by looking through the code in the workbench. Compared to Suzuki et al., the workbench uses feature scoring for determining the most influential uncertainties. The code is set up in a modular way, so other approaches to global sensitivity analysis can easily be used as well if so desired.
[12]:
from ema_workbench.analysis import dimensional_stacking
x = experiments
y = outcomes["max_P"] < 0.8
dimensional_stacking.create_pivot_plot(x, y, 2, nbins=3)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category

We can see from this visual that if b is high while q is high, we have a high concentration of cases where pollution stays below 0.8. The mean and stdev have some limited additional influence. By playing around with an alternative number of bins or a different number of layers, patterns can be coarsened or refined.
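For example, a variant with three levels of stacking and five bins per uncertainty (argument order assumed to match the create_pivot_plot call above):

# more layers and more bins refine the pattern
dimensional_stacking.create_pivot_plot(x, y, 3, nbins=5)
plt.show()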
regional sensitivity analysis¶
A fourth approach for supporting scenario discovery is to perform a regional sensitivity analysis. The workbench implements a visual approach based on plotting the empirical CDF given a classification vector. Please look at section 3.4 in Pianosi et al (2016) for more details.
[13]:
from ema_workbench.analysis import regional_sa
from numpy.lib import recfunctions as rf
sns.set_style("white")
# model is the same across experiments
x = experiments.copy()
x = x.drop("model", axis=1)
y = outcomes["max_P"] < 0.8
fig = regional_sa.plot_cdfs(x, y)
sns.despine()
plt.show()

The above results clearly show that both b and q are important. To a lesser extent, the mean is also relevant.
More advanced sampling techniques¶
The workbench can also be used for more advanced sampling techniques. To achieve this, it relies on SALib. On the workbench side, the only change is to specify the sampler we want to use. Next, we can use SALib directly to perform the analysis. To help with this, the workbench provides a convenience function for generating the problem dict that SALib expects. The example below focuses on performing Sobol sampling on the uncertainties, but we could do the exact same thing with the levers instead. The only changes required would be to set lever_sampling instead of uncertainty_sampling, and to get the SALib problem dict based on the levers.
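A sketch of that lever-based variant (imports as in the cell below; it is assumed that lever_sampling accepts the same Samplers enum and that a single reference scenario is used):

# Sobol sampling over the levers rather than the uncertainties
with MultiprocessingEvaluator(model) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(
        scenarios=1, policies=1000, lever_sampling=Samplers.SOBOL
    )

problem = get_SALib_problem(model.levers)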
[17]:
from SALib.analyze import sobol
from ema_workbench import Samplers
from ema_workbench.em_framework.salib_samplers import get_SALib_problem

with MultiprocessingEvaluator(model) as evaluator:
    sa_results = evaluator.perform_experiments(
        scenarios=1000, uncertainty_sampling=Samplers.SOBOL
    )

experiments, outcomes = sa_results

problem = get_SALib_problem(model.uncertainties)
Si = sobol.analyze(
    problem, outcomes["max_P"], calc_second_order=True, print_to_console=False
)
[MainProcess/INFO] pool started with 12 workers
/Users/jhkwakkel/opt/anaconda3/lib/python3.9/site-packages/SALib/sample/saltelli.py:94: UserWarning:
Convergence properties of the Sobol' sequence is only valid if
`N` (1000) is equal to `2^n`.
warnings.warn(msg)
[MainProcess/INFO] performing 12000 scenarios * 1 policies * 1 model(s) = 12000 experiments
100%|████████████████████████████████████| 12000/12000 [02:37<00:00, 76.17it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
We have now completed the Sobol analysis and have calculated the metrics. What remains is to visualize them, which can be done as shown below, focusing on ST and S1. The error bars indicate the confidence intervals.
[18]:
scores_filtered = {k: Si[k] for k in ["ST", "ST_conf", "S1", "S1_conf"]}
Si_df = pd.DataFrame(scores_filtered, index=problem["names"])
sns.set_style("white")
fig, ax = plt.subplots(1)
indices = Si_df[["S1", "ST"]]
err = Si_df[["S1_conf", "ST_conf"]]
indices.plot.bar(yerr=err.values.T, ax=ax)
fig.set_size_inches(8, 6)
fig.subplots_adjust(bottom=0.3)
plt.show()

Directed search¶
This is the third tutorial in a series showcasing the functionality of the Exploratory Modeling workbench. Exploratory modeling entails investigating the way in which uncertainty and/or policy levers map to outcomes. To investigate these mappings, we can either use sampling-based strategies (open exploration) or optimization-based strategies (directed search).
In this tutorial, I will demonstrate in more detail how to use the workbench for directed search. We are using the same example as in the previous tutorials. When using optimization, it is critical that you specify for each ScalarOutcome the direction in which it should move. There are three possibilities: info, which is ignored; maximize; and minimize. If the kind keyword argument is not specified, it defaults to info.
[1]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P", ScalarOutcome.MINIMIZE),
    ScalarOutcome("utility", ScalarOutcome.MAXIMIZE),
    ScalarOutcome("inertia", ScalarOutcome.MAXIMIZE),
    ScalarOutcome("reliability", ScalarOutcome.MAXIMIZE),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
Using directed search with the ema_workbench requires platypus-opt. Please check the installation suggestions provided in the readme of the github repository. I personally install either directly from github

pip install git+https://github.com/Project-Platypus/Platypus.git

or through pip

pip install platypus-opt

One note of caution: don't install platypus, but platypus-opt. There exists a python package on pip called platypus, but that is quite a different kind of library.
There are three ways in which we can use optimization in the workbench:

1. Search over the decision levers, conditional on a reference scenario
2. Search over the uncertain factors, conditional on a reference policy
3. Search over the decision levers given a set of scenarios
Search over levers¶
Directed search is most often used to search over the decision levers in order to find good candidate strategies. This is for example the first step in the Many Objective Robust Decision Making process. This is straightforward to do with the workbench using the optimize method. Note that I have kept the number of functional evaluations (nfe) very low. In real applications this should be substantially higher and be based on convergence considerations which are demonstrated below.
[2]:
from ema_workbench import MultiprocessingEvaluator, ema_logging
ema_logging.log_to_stderr(ema_logging.INFO)
with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.optimize(
        nfe=250,
        searchover="levers",
        epsilons=[0.1] * len(model.outcomes),
    )
[MainProcess/INFO] pool started with 12 workers
298it [00:07, 42.43it/s]
[MainProcess/INFO] optimization completed, found 6 solutions
[MainProcess/INFO] terminating pool
The results from optimize are returned as a DataFrame with the decision variables and outcomes of interest.
[3]:
results
[3]:
 | c1 | c2 | r1 | r2 | w1 | max_P | utility | inertia | reliability
---|---|---|---|---|---|---|---|---|---
0 | 0.031532 | 0.184650 | 0.280885 | 0.410156 | 0.367589 | 0.094885 | 0.250233 | 0.990000 | 1.000000 |
1 | -0.625142 | 0.098499 | 1.047069 | 0.913895 | 0.079976 | 2.283907 | 1.378241 | 0.985533 | 0.206533 |
2 | 0.686334 | 0.557664 | 1.017384 | 0.807999 | 0.496470 | 2.283759 | 1.001528 | 0.970933 | 0.310533 |
3 | 0.435289 | 0.350622 | 0.319505 | 0.624951 | 0.026242 | 0.197131 | 0.537310 | 0.990000 | 1.000000 |
4 | 1.947359 | -0.622528 | 1.619746 | 1.863076 | 0.418746 | 2.283625 | 1.778130 | 0.990000 | 0.070000 |
5 | 0.629075 | -0.076368 | 1.666986 | 0.488599 | 0.673219 | 2.283672 | 1.631001 | 0.980000 | 0.110000 |
Specifying constraints¶
It is possible to specify a constrained optimization problem. A model can have one or more constraints. A constraint can be applied to the model input parameters (both uncertainties and levers), and/or outcomes. A constraint is essentially a function that should return the distance from the feasibility threshold. The distance should be 0 if the constraint is met.
[4]:
from ema_workbench import Constraint
constraints = [
Constraint("max pollution", outcome_names="max_P", function=lambda x: max(0, x - 1))
]
[5]:
from ema_workbench import MultiprocessingEvaluator
from ema_workbench import ema_logging
ema_logging.log_to_stderr(ema_logging.INFO)
with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.optimize(
        nfe=250,
        searchover="levers",
        epsilons=[0.1] * len(model.outcomes),
        constraints=constraints,
    )
[MainProcess/INFO] pool started with 12 workers
298it [00:07, 42.38it/s]
[MainProcess/INFO] optimization completed, found 3 solutions
[MainProcess/INFO] terminating pool
[6]:
results
[6]:
 | c1 | c2 | r1 | r2 | w1 | max_P | utility | inertia | reliability
---|---|---|---|---|---|---|---|---|---
0 | 0.242378 | 0.463484 | 1.707600 | 0.404997 | 0.886919 | 0.215327 | 0.555460 | 0.99 | 1.0 |
1 | 0.242378 | 0.343609 | 1.707600 | 0.434568 | 0.949529 | 0.092854 | 0.234231 | 0.99 | 1.0 |
2 | 0.042579 | 0.672822 | 0.957689 | 1.814246 | 0.168521 | 0.146712 | 0.419054 | 0.99 | 1.0 |
Tracking convergence¶
An important part of using many-objective evolutionary algorithms is to carefully monitor whether they have converged. Various metrics can be used for this. The workbench supports two useful metrics, known as hypervolume and epsilon progress.
[7]:
from ema_workbench.em_framework.optimization import HyperVolume, EpsilonProgress
convergence_metrics = [
    HyperVolume(minimum=[0, 0, 0, 0], maximum=[1, 1.01, 1.01, 1.01]),
    EpsilonProgress(),
]

with MultiprocessingEvaluator(model) as evaluator:
    results, convergence = evaluator.optimize(
        nfe=10000,
        searchover="levers",
        epsilons=[0.05] * len(model.outcomes),
        convergence=convergence_metrics,
        constraints=constraints,
    )

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=True, figsize=(8, 4))
ax1.plot(convergence.nfe, convergence.epsilon_progress)
ax1.set_ylabel("$\\epsilon$-progress")
ax2.plot(convergence.nfe, convergence.hypervolume)
ax2.set_ylabel("hypervolume")

ax1.set_xlabel("number of function evaluations")
ax2.set_xlabel("number of function evaluations")
plt.show()
[MainProcess/INFO] pool started with 12 workers
10010it [02:17, 73.02it/s]
[MainProcess/INFO] optimization completed, found 16 solutions
[MainProcess/INFO] terminating pool

Changing the reference scenario¶
The workbench offers control over the reference scenario or policy under which you are performing the optimization. This makes it easy to apply multi-scenario MORDM (Watson & Kasprzyk, 2017). Alternatively, you can also use it to change the policy for which you are applying worst case scenario discovery (see below).
from ema_workbench import Scenario

reference = Scenario("reference", b=0.4, q=2, mean=0.02, stdev=0.01)

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.optimize(
        nfe=1000,
        searchover="levers",
        epsilons=[0.1] * len(model.outcomes),
        reference=reference,
    )
Search over uncertainties: worst case discovery¶
Up until now, the focus has been on applying search to find promising candidate strategies. That is, we search through the lever space. However, there might also be good reasons to search through the uncertainty space, for example to search for worst case scenarios (Halim et al, 2015). This is easily achieved as shown below. We change the kind attribute on each outcome so that we search for the worst outcome, and specify that we would like to search over the uncertainties instead of the levers.
Any of the foregoing additions such as constraints or convergence work as shown above. Note that if you would like to change the reference policy, reference should be a Policy object rather than a Scenario object.
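For example, a minimal sketch of passing a reference Policy when searching over the uncertainties (the lever values used here are purely hypothetical):

from ema_workbench import Policy

# hypothetical lever values for the reference policy
reference = Policy("reference", c1=0.0, c2=0.0, r1=1.0, r2=1.0, w1=0.5)

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.optimize(
        nfe=1000,
        searchover="uncertainties",
        epsilons=[0.1] * len(model.outcomes),
        reference=reference,
    )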
[8]:
# change outcomes so direction is undesirable
minimize = ScalarOutcome.MINIMIZE
maximize = ScalarOutcome.MAXIMIZE

for outcome in model.outcomes:
    if outcome.kind == minimize:
        outcome.kind = maximize
    else:
        outcome.kind = minimize

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.optimize(
        nfe=1000,
        searchover="uncertainties",
        epsilons=[0.1] * len(model.outcomes),
    )
[MainProcess/INFO] pool started with 12 workers
1090it [00:18, 60.52it/s]
[MainProcess/INFO] optimization completed, found 7 solutions
[MainProcess/INFO] terminating pool
Parallel coordinate plots¶
The workbench comes with support for making parallel axis plots through the parcoords module. This module offers a parallel axes object on which we can plot data. The typical workflow is to first instantiate this parallel axes object given a pandas dataframe with the upper and lower limits for each axis. Next, one or more datasets can be plotted on these axes. Any dataframe passed to the plot method will be normalized using the limits passed first. We can also invert any of the axes to ensure that the desirable direction is the same for all axes.
[9]:
from ema_workbench.analysis import parcoords
data = results.loc[:, [o.name for o in model.outcomes]]
limits = parcoords.get_limits(data)
limits.loc[0, ["utility", "inertia", "reliability", "max_P"]] = 0
limits.loc[1, ["utility", "inertia", "reliability"]] = 1
paraxes = parcoords.ParallelAxes(limits)
paraxes.plot(data)
paraxes.invert_axis("max_P")
plt.show()

Robust Search¶
In the foregoing, we have been using optimization over levers or uncertainties, while assuming a reference scenario or policy. However, we can also formulate a robust many objective optimization problem, where we are going to search over the levers for solutions that have robust performance over a set of scenarios. To do this with the workbench, there are several steps that one has to take.
First, we need to specify our robustness metrics. A robustness metric takes as input the performance of a candidate policy over a set of scenarios and returns a single robustness score. For a more in depth overview, see McPhail et al. (2018). In case of the workbench, we can use the ScalarOutcome class for this. We need to specify the name of the robustness metric, a function that takes as input a numpy array and returns a single number, and the model outcome to which this function should be applied.
Below, we use a percentile based robustness metric, which we apply to each model outcome.
[10]:
import functools

percentile10 = functools.partial(np.percentile, q=10)
percentile90 = functools.partial(np.percentile, q=90)

MAXIMIZE = ScalarOutcome.MAXIMIZE
MINIMIZE = ScalarOutcome.MINIMIZE
robustness_functions = [
    ScalarOutcome(
        "90th percentile max_p",
        kind=MINIMIZE,
        variable_name="max_P",
        function=percentile90,
    ),
    ScalarOutcome(
        "10th percentile reliability",
        kind=MAXIMIZE,
        variable_name="reliability",
        function=percentile10,
    ),
    ScalarOutcome(
        "10th percentile inertia",
        kind=MAXIMIZE,
        variable_name="inertia",
        function=percentile10,
    ),
    ScalarOutcome(
        "10th percentile utility",
        kind=MAXIMIZE,
        variable_name="utility",
        function=percentile10,
    ),
]
Next, we have to generate the scenarios we want to use. Below we generate 50 scenarios, which we will keep fixed over the optimization. The exact number of scenarios to use is to be established through trial and error; typically, it involves balancing the computational cost of more scenarios against the stability of the robustness metric over the number of scenarios.
[11]:
from ema_workbench.em_framework import sample_uncertainties
n_scenarios = 50
scenarios = sample_uncertainties(model, n_scenarios)
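Since the appropriate number of scenarios is established by trial and error, one way to probe the stability of a robustness metric is to evaluate a fixed candidate policy on increasingly large scenario samples and check when the metric stops moving. Below is a minimal sketch of this idea; the candidate policy and its lever values are purely hypothetical, and this snippet is not part of the executed tutorial cells.

from ema_workbench import Policy

# hypothetical candidate policy
candidate = Policy("candidate", c1=0.0, c2=0.0, r1=1.0, r2=1.0, w1=0.5)

for n in (10, 25, 50, 100):
    sampled = sample_uncertainties(model, n)
    with MultiprocessingEvaluator(model) as evaluator:
        _, outcomes = evaluator.perform_experiments(
            scenarios=sampled, policies=[candidate]
        )
    # if these numbers stabilize as n grows, n scenarios is likely enough
    print(n, np.percentile(outcomes["max_P"], 90))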
With the robustness metrics specified and the scenarios sampled, we can now perform robust many-objective optimization. Below is the code that one would run. Note that this is computationally very expensive, since each candidate solution is going to be run for fifty scenarios before we can calculate the robustness for each outcome of interest.
nfe = int(1e6)

with MultiprocessingEvaluator(model) as evaluator:
    robust_results = evaluator.robust_optimize(
        robustness_functions,
        scenarios,
        nfe=nfe,
        epsilons=[0.05] * len(robustness_functions),
    )
Best practices¶
Separate experimentation and analysis¶
It is strongly recommended to cleanly separate the various steps in your exploratory modeling pipeline: separately execute your experiments or perform your optimization, save the results, and analyze these results afterwards. Moreover, since parallel execution can be troublesome within the Jupyter Lab / notebook environment, I personally run my experiments and optimizations either from the command line or through an IDE using a normal python file. Jupyter Lab is then used to analyze the results. A minimal sketch of this split is shown below.
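This sketch uses the workbench's save_results and load_results helpers; the file name is illustrative, and the model definition is omitted.

# run_experiments.py -- executed from the command line or an IDE
from ema_workbench import MultiprocessingEvaluator, ema_logging, save_results

ema_logging.log_to_stderr(ema_logging.INFO)

if __name__ == "__main__":
    # model definition omitted; see the tutorials above
    with MultiprocessingEvaluator(model) as evaluator:
        results = evaluator.perform_experiments(scenarios=1000)

    # store experiments and outcomes together in a single archive
    save_results(results, "1000_scenarios.tar.gz")

# analysis notebook, run afterwards
from ema_workbench import load_results

experiments, outcomes = load_results("1000_scenarios.tar.gz")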
Keeping things organized¶
A frequently recurring cause of problems when using the workbench stems from not properly organizing your files. In particular when using multiprocessing, it is key that you keep things well organized. The way the workbench works with multiprocessing is that it copies the entire working directory of the model to a temporary folder for each subprocess. This temporary folder is located in the same folder as the python or notebook file from which you are running. If the working directory of your model is the same as the directory in which the run file resides, you can easily fill up your hard disk in minutes. To avoid these kinds of problems, I suggest using a directory structure as outlined below.
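A possible layout (the directory names are illustrative); the key point is that the model's working directory is a subdirectory of its own, separate from the directory containing the run file:

my_project/
    model_files/          <- working directory of the model
    run_experiments.py    <- runs the experiments, saves the results
    analysis.ipynb        <- loads the saved results for analysis
    results/              <- saved results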
Also, if you are not familiar with absolute and relative paths, please read up on that first, and only use relative paths when using the workbench. Not only will this reduce the possibility for errors, it will also make moving your code from one machine to another a lot easier.
Vensim Tips and Tricks¶
Debugging a model¶
A commonly occurring problem is that some of the runs of a Vensim model do not complete correctly. In the logger, we see a message stating that a run did not complete correctly, with a description of the offending case attached to it. Typically, this error is due to a division by zero somewhere in the model during the simulation. The easiest way of finding the source of the division by zero is via Vensim itself. However, this requires that the model is parameterized as specified by the case that created the error. It is of course possible to set all the parameters by hand, but this becomes annoying on larger models, or if one has to do it multiple times. Since the Vensim DLL does not have a way to save a model, we cannot use the DLL. Instead, we can use the fact that one can save a Vensim model as a text file. By changing the required parameters in this text file via the workbench, we can then open the modified model in Vensim and spot the error.
The following script can be used for this purpose.
"""
Created on 11 aug. 2011

.. codeauthor:: wauping <w.auping (at) student (dot) tudelft (dot) nl>
                jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>


To be able to debug the Vensim model, a few steps are needed:

    1. The case that gave a bug needs to be saved in a text file. The entire
       case description should be on a single line.
    2. Reform and clean your model (In the Vensim menu: Model, Reform and
       Clean). Choose

       * Equation Order: Alphabetical by group (not really necessary)
       * Equation Format: Terse

    3. Save your model as text (File, Save as..., Save as Type: Text Format
       Models)
    4. Run this script
    5. If the print at the end is not set([]), but set([array]), the array
       gives the values that were not found and changed
    6. Run your new model (for example 'new text.mdl')
    7. Vensim tells you about your critical mistake

"""
fileSpecifyingError = ""

pathToExistingModel = "/salinization/Verzilting_aanpassingen incorrect.mdl"
pathToNewModel = "/models/salinization/Verzilting_aanpassingen correct.mdl"
newModel = open(pathToNewModel, "w")

# line = open(fileSpecifyingError).read()

line = "rainfall : 0.154705633188; adaptation time from non irrigated agriculture : 0.915157119079; salt effect multiplier : 1.11965969891; adaptation time to non irrigated agriculture : 0.48434342934; adaptation time to irrigated agriculture : 0.330990830832; water shortage multiplier : 0.984356102036; delay time salt seepage : 6.0; adaptation time : 6.90258192256; births multiplier : 1.14344734715; diffusion lookup : [(0, 8.0), (10, 8.0), (20, 8.0), (30, 8.0), (40, 7.9999999999999005), (50, 4.0), (60, 9.982194802803703e-14), (70, 1.2455526635140464e-27), (80, 1.5541686655435471e-41), (90, 1.9392517969836692e-55)]; salinity effect multiplier : 1.10500381093; technological developments in irrigation : 0.0117979353255; adaptation time from irrigated agriculture : 1.58060947607; food shortage multiplier : 0.955325345996; deaths multiplier : 0.875605669911; "

# we assume the case specification was copied from the logger; the entries
# are separated by semicolons, so split on those rather than on commas,
# which also occur inside lookup values
splitOne = line.split(";")
variable = {}
for n in range(len(splitOne) - 1):
    splitTwo = splitOne[n].split(":")
    variableElement = splitTwo[0]

    # strip the spaces and other rubbish from the sides of the variable name
    variableElement = variableElement.lstrip()
    variableElement = variableElement.lstrip("'")
    variableElement = variableElement.rstrip()
    variableElement = variableElement.rstrip("'")
    print(variableElement)

    valueElement = splitTwo[1]
    valueElement = valueElement.lstrip()
    valueElement = valueElement.rstrip()
    variable[variableElement] = valueElement
print(variable)

# This generates a new (text-formatted) model
changeNextLine = False
settedValues = []
for line in open(pathToExistingModel):
    if line.find("=") != -1:
        elements = line.split("=")
        value = elements[0]
        value = value.strip()
        if value in variable:
            elements[1] = variable.get(value)
            # re-add the newline, since the replacement value lacks one
            line = elements[0] + " = " + elements[1] + "\n"
            settedValues.append(value)

    newModel.write(line)
newModel.close()  # in order to be able to open the model in Vensim

notSet = set(variable.keys()) - set(settedValues)
print(notSet)
Glossary¶
- parameter uncertainty
An uncertainty is a parameter uncertainty if the range is continuous from the lower bound to the upper bound. A parameter uncertainty can be either real valued or discrete valued.
- categorical uncertainty
An uncertainty is categorical if there is not a range but a set of possibilities over which one wants to sample.
- lookup uncertainty
Vensim-specific extension to categorical uncertainty for handling lookups in various ways
- uncertainty space
the space created by the set of uncertainties
- ensemble
a python class responsible for running a series of computational experiments.
- model interface
a python class that provides an interface to an underlying model
- working directory
a directory that contains files that a model needs
- classification trees
a category of machine learning algorithms for rule induction
- prim (patient rule induction method)
a rule induction algorithm
- coverage
a metric developed for scenario discovery
- density
a metric developed for scenario discovery
- scenario discovery
a use case of EMA
- case
A case specifies the input parameters for a run of a model. It is a dict instance, with the names of the uncertainties as key, and their sampled values as value.
- experiment
An experiment is a complete specification for a run. It specifies the case, the name of the policy, and the name of the model.
- policy
a policy is by definition an object with a name attribute. So, policy['name'] must return the name of the policy
- result
the combination of an experiment and the associated outcomes for the experiment
- outcome
the data of interest produced by a model given an experiment
EMA Modules¶
Exploratory modeling framework¶
model
¶
This module specifies the abstract base class for interfacing with models. Any model that is to be controlled from the workbench is controlled via an instance of an extension of this abstract base class.
- class ema_workbench.em_framework.model.AbstractModel(name)¶
ModelStructureInterface is one of the two main classes used for performing EMA. This is an abstract base class and cannot be used directly.
- uncertainties¶
list of parameter instances
- Type
list
- levers¶
list of parameter instances
- Type
list
- outcomes¶
list of outcome instances
- Type
list
- name¶
alphanumerical name of model structure interface
- Type
str
- output¶
this should be a dict with the names of the outcomes as key
- Type
dict
When extending this class, :meth:`model_init` and :meth:`run_model` have to be implemented.
- as_dict()¶
returns a dict representation of the model
- cleanup()¶
This method is called after finishing all the experiments, but just prior to returning the results. It gives a hook for doing any cleanup, such as closing applications.
In case of running in parallel, this method is called during the cleanup of the pool, just prior to removing the temporary directories.
- initialized(policy)¶
check if model has been initialized
- Parameters
policy (a Policy instance) –
- model_init(policy)¶
Method called to initialize the model.
- Parameters
policy (dict) – policy to be run.
Note
This method should always be implemented, although in simple cases a simple pass can suffice.
- reset_model()¶
Method for resetting the model to its initial state. The default implementation only sets the outputs to an empty dict.
- retrieve_output()¶
Method for retrieving output after a model run.
- Return type
dict with the results of a model run.
- run_model(scenario, policy)¶
Method for running an instantiated model structure.
- Parameters
scenario (Scenario instance) –
policy (Policy instance) –
- class ema_workbench.em_framework.model.FileModel(name, wd=None, model_file=None)¶
- as_dict()¶
returns a dict representation of the model
- class ema_workbench.em_framework.model.Model(name, function=None)¶
- class ema_workbench.em_framework.model.Replicator(name)¶
- run_model(scenario, policy)¶
Method for running an instantiated model structure.
- Parameters
scenario (Scenario instance) –
policy (Policy instance) –
- class ema_workbench.em_framework.model.ReplicatorModel(name, function=None)¶
parameters
¶
parameters and related helper classes and functions
- class ema_workbench.em_framework.parameters.BooleanParameter(name, default=None, variable_name=None, pff=False)¶
boolean model input parameter
A BooleanParameter is similar to a CategoricalParameter, except the category values can only be True or False.
- Parameters
name (str) –
variable_name (str, or list of str) –
- class ema_workbench.em_framework.parameters.CategoricalParameter(name, categories, default=None, variable_name=None, pff=False, multivalue=False)¶
categorical model input parameter
- Parameters
name (str) –
categories (collection of obj) –
variable_name (str, or list of str) –
multivalue (boolean) – if categories have a set of values, use a different one for each variable_name.
- cat_for_index(index)¶
return category given index
- Parameters
index (int) –
- Return type
object
- from_dist(name, dist)¶
alternative constructor for creating a parameter from a frozen scipy.stats distribution directly
- Parameters
dist (scipy stats frozen dist) –
**kwargs (valid keyword arguments for Parameter instance) –
- index_for_cat(category)¶
return index of category
- Parameters
category (object) –
- Return type
int
- class ema_workbench.em_framework.parameters.Category(name, value)¶
- class ema_workbench.em_framework.parameters.Constant(name, value)¶
Constant class, can be used for any parameter that has to be set to a fixed value
- class ema_workbench.em_framework.parameters.IntegerParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)¶
integer valued model input parameter
- Parameters
name (str) –
lower_bound (int) –
upper_bound (int) –
resolution (iterable) –
variable_name (str, or list of str) –
- Raises
ValueError – if lower bound is larger than upper bound
ValueError – if entries in resolution are outside range of lower_bound and upper_bound, or not an integer instance
ValueError – if lower_bound or upper_bound is not an integer instance
- classmethod from_dist(name, dist, **kwargs)¶
alternative constructor for creating a parameter from a frozen scipy.stats distribution directly
- Parameters
dist (scipy stats frozen dist) –
**kwargs (valid keyword arguments for Parameter instance) –
- class ema_workbench.em_framework.parameters.Parameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)¶
Base class for any model input parameter
- Parameters
name (str) –
lower_bound (int or float) –
upper_bound (int or float) –
resolution (collection) –
pff (bool) – if true, sample over this parameter using resolution in case of partial factorial sampling
- Raises
ValueError – if lower bound is larger than upper bound
ValueError – if entries in resolution are outside range of lower_bound and upper_bound
- classmethod from_dist(name, dist, **kwargs)¶
alternative constructor for creating a parameter from a frozen scipy.stats distribution directly
- Parameters
dist (scipy stats frozen dist) –
**kwargs (valid keyword arguments for Parameter instance) –
- class ema_workbench.em_framework.parameters.RealParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)¶
real valued model input parameter
- Parameters
name (str) –
lower_bound (int or float) –
upper_bound (int or float) –
resolution (iterable) –
variable_name (str, or list of str) –
- Raises
ValueError – if lower bound is larger than upper bound
ValueError – if entries in resolution are outside range of lower_bound and upper_bound
- classmethod from_dist(name, dist, **kwargs)¶
alternative constructor for creating a parameter from a frozen scipy.stats distribution directly
- Parameters
dist (scipy stats frozen dist) –
**kwargs (valid keyword arguments for Parameter instance) –
- ema_workbench.em_framework.parameters.parameters_from_csv(uncertainties, **kwargs)¶
Helper function for creating many Parameters based on a DataFrame or csv file
- Parameters
uncertainties (str, DataFrame) –
**kwargs (dict, arguments to pass to pandas.read_csv) –
- Return type
list of Parameter instances
This helper function creates uncertainties. It assumes that the DataFrame or csv file has a column titled 'name'; optionally, a type column {int, real, cat} can be included as well. The remainder of the columns are handled as values for the parameters. If type is not specified, the function will try to infer type from the values.
Note that this function does not support the resolution and default kwargs on parameters.
An example of a csv:

NAME,TYPE,,,
a_real,real,0,1.1,
an_int,int,1,9,
a_categorical,cat,a,b,c

this CSV file would result in

[RealParameter('a_real', 0, 1.1, resolution=[], default=None),
 IntegerParameter('an_int', 1, 9, resolution=[], default=None),
 CategoricalParameter('a_categorical', ['a', 'b', 'c'], default=None)]
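A minimal usage sketch (the file name is illustrative):

from ema_workbench.em_framework.parameters import parameters_from_csv

# returns a list of Parameter instances built from the csv shown above
model.uncertainties = parameters_from_csv("uncertainties.csv")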
- ema_workbench.em_framework.parameters.parameters_to_csv(parameters, file_name)¶
Helper function for writing a collection of parameters to a csv file
- Parameters
parameters (collection of Parameter instances) –
file_name (str) –
The function iterates over the collection and turns these into a data frame prior to storing them. The resulting csv can be loaded using the parameters_from_csv function. Note that currently we don’t store resolution and default attributes.
outcomes
¶
Module for outcome classes
- class ema_workbench.em_framework.outcomes.AbstractOutcome(name, kind=0, variable_name=None, function=None, expected_range=None, shape=None)¶
Base Outcome class
- Parameters
name (str) – Name of the outcome.
kind ({INFO, MINIMIZE, MAXIMIZE}, optional) –
variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name
function (callable, optional) – a callable to perform postprocessing on data retrieved from model
expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric
shape ({tuple, None} optional) –
- name¶
- Type
str
- kind¶
- Type
int
- variable_name¶
- Type
str
- function¶
- Type
callable
- shape¶
- Type
tuple
- abstract classmethod from_disk(filename, archive)¶
helper function for loading from disk
- Parameters
filename (str) –
archive (Tarfile) –
- abstract classmethod to_disk(values)¶
helper function for writing outcome to disk
- Parameters
values (obj) – data to store
- Return type
BytesIO
- class ema_workbench.em_framework.outcomes.ArrayOutcome(name, variable_name=None, function=None, expected_range=None, shape=None)¶
Array Outcome class for n-dimensional collections
- Parameters
name (str) – Name of the outcome.
variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name
function (callable, optional) – a callable to perform postprocessing on data retrieved from model
expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric
shape ({tuple, None}, optional) –
- name¶
- Type
str
- kind¶
- Type
int
- variable_name¶
- Type
str
- function¶
- Type
callable
- shape¶
- Type
tuple
- expected_range¶
- Type
tuple
- classmethod from_disk(filename, archive)¶
helper function for loading from disk
- Parameters
filename (str) –
archive (Tarfile) –
- classmethod to_disk(values)¶
helper function for writing outcome to disk
- Parameters
values (ND array) –
- Returns
BytesIO
filename
- class ema_workbench.em_framework.outcomes.Constraint(name, parameter_names=None, outcome_names=None, function=None)¶
Constraints class that can be used when defining constrained optimization problems.
- Parameters
name (str) –
parameter_names (str or collection of str) –
outcome_names (str or collection of str) –
function (callable) –
- name¶
- Type
str
- parameter_names¶
name(s) of the uncertain parameter(s) and/or lever parameter(s) to which the constraint applies
- Type
str, list of str
- outcome_names¶
name(s) of the outcome(s) to which the constraint applies
- Type
str, list of str
- function¶
The function should return the distance from the feasibility threshold, given the model outputs with a variable name. The distance should be 0 if the constraint is met.
- Type
callable
- class ema_workbench.em_framework.outcomes.ScalarOutcome(name, kind=0, variable_name=None, function=None, expected_range=None)¶
Scalar Outcome class
- Parameters
name (str) – Name of the outcome.
kind ({INFO, MINIMIZE, MAXIMIZE}, optional) –
variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name
function (callable, optional) – a callable to perform post processing on data retrieved from model
expected_range (collection, optional) – expected min and max value for outcome, used by HyperVolume convergence metric
- name¶
- Type
str
- kind¶
- Type
int
- variable_name¶
- Type
str
- function¶
- Type
callable
- shape¶
- Type
tuple
- expected_range¶
- Type
tuple
- classmethod from_disk(filename, archive)¶
helper function for loading from disk
- Parameters
filename (str) –
archive (Tarfile) –
- classmethod to_disk(values)¶
helper function for writing outcome to disk
- Parameters
values (1D array) –
- Returns
BytesIO
filename
- class ema_workbench.em_framework.outcomes.TimeSeriesOutcome(name, variable_name=None, function=None, expected_range=None, shape=None)¶
TimeSeries Outcome class
- Parameters
name (str) – Name of the outcome.
variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name
function (callable, optional) – a callable to perform postprocessing on data retrieved from model
expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric
shape ({tuple, None}, optional) –
- name¶
- Type
str
- kind¶
- Type
int
- variable_name¶
- Type
str
- function¶
- Type
callable
- shape¶
- Type
tuple
- expected_range¶
- Type
tuple
- classmethod from_disk(filename, archive)¶
helper function for loading from disk
- Parameters
filename (str) –
archive (Tarfile) –
- classmethod to_disk(values)¶
helper function for writing outcome to disk
- Parameters
values (DataFrame) –
- Returns
StringIO
filename
evaluators
¶
collection of evaluators for performing experiments, optimization, and robust optimization
- class ema_workbench.em_framework.evaluators.IpyparallelEvaluator(msis, client, **kwargs)¶
evaluator for using an ipyparallel pool
- evaluate_experiments(scenarios, policies, callback, combine='factorial')¶
used by ema_workbench
- finalize()¶
finalize the evaluator
- initialize()¶
initialize the evaluator
- class ema_workbench.em_framework.evaluators.MultiprocessingEvaluator(msis, n_processes=None, maxtasksperchild=None, **kwargs)¶
evaluator for experiments using a multiprocessing pool
- Parameters
msis (collection of models) –
n_processes (int (optional)) – A negative number can be supplied to use the number of logical cores minus that number. For example, on a 12 thread processor, -2 results in using 10 threads.
max_tasks (int (optional)) –
note that the maximum number of available processes is multiprocessing.cpu_count(), and in case of windows, this can never be higher than 61
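For instance, a minimal sketch:

# on a machine with 12 logical cores, n_processes=-2 uses 10 workers
with MultiprocessingEvaluator(model, n_processes=-2) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(scenarios=100)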
- evaluate_experiments(scenarios, policies, callback, combine='factorial')¶
used by ema_workbench
- finalize()¶
finalize the evaluator
- initialize()¶
initialize the evaluator
- class ema_workbench.em_framework.evaluators.Samplers(value)¶
Enum for different kinds of samplers
- class ema_workbench.em_framework.evaluators.SequentialEvaluator(msis)¶
- evaluate_experiments(scenarios, policies, callback, combine='factorial')¶
used by ema_workbench
- finalize()¶
finalize the evaluator
- initialize()¶
initialize the evaluator
- ema_workbench.em_framework.evaluators.optimize(models, algorithm=<class 'platypus.algorithms.EpsNSGAII'>, nfe=10000, searchover='levers', evaluator=None, reference=None, convergence=None, constraints=None, convergence_freq=1000, logging_freq=5, **kwargs)¶
optimize the model
- Parameters
models (1 or more Model instances) –
algorithm (a valid Platypus optimization algorithm) –
nfe (int) –
searchover ({'uncertainties', 'levers'}) –
evaluator (evaluator instance) –
reference (Policy or Scenario instance, optional) – overwrite the default scenario in case of searching over levers, or default policy in case of searching over uncertainties
convergence (function or collection of functions, optional) –
constraints (list, optional) –
convergence_freq (int) – nfe between convergence check
logging_freq (int) – number of generations between logging of progress
kwargs (any additional arguments will be passed on to algorithm) –
- Return type
pandas DataFrame
- Raises
EMAError if searchover is not one of 'uncertainties' or 'levers' –
NotImplementedError if len(models) > 1 –
- ema_workbench.em_framework.evaluators.perform_experiments(models, scenarios=0, policies=0, evaluator=None, reporting_interval=None, reporting_frequency=10, uncertainty_union=False, lever_union=False, outcome_union=False, uncertainty_sampling=Samplers.LHS, lever_sampling=Samplers.LHS, callback=None, return_callback=False, combine='factorial', log_progress=False)¶
sample uncertainties and levers, and perform the resulting experiments on each of the models
- Parameters
models (one or more AbstractModel instances) –
scenarios (int or collection of Scenario instances, optional) –
policies (int or collection of Policy instances, optional) –
evaluator (Evaluator instance, optional) –
reporting_interval (int, optional) –
reporting_frequency (int, optional) –
uncertainty_union (boolean, optional) –
lever_union (boolean, optional) –
outcome_union (boolean, optional) –
uncertainty_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional) –
lever_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional) –
callback (Callback instance, optional) –
return_callback (boolean, optional) –
log_progress (bool, optional) –
combine ({'factorial', 'zipover'}, optional) – how to combine uncertainties and levers? In case of 'factorial', both are sampled separately using their respective samplers. Next the resulting designs are combined in a full factorial manner. In case of 'zipover', both are sampled separately and then combined by cycling over the shortest of the two sets of designs until the longest set of designs is exhausted.
- Returns
the experiments as a dataframe, and a dict with the name of an outcome as key, and the associated values as numpy array. Experiments and outcomes are aligned on index.
- Return type
tuple
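A minimal sketch contrasting the two combine modes (the sample sizes are illustrative):

from ema_workbench import perform_experiments

# 'factorial' (the default): 100 scenarios x 10 policies = 1000 experiments
experiments, outcomes = perform_experiments(model, scenarios=100, policies=10)

# 'zipover': cycle the 10 policies over the 100 scenarios = 100 experiments
experiments, outcomes = perform_experiments(
    model, scenarios=100, policies=10, combine="zipover"
)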
optimization
¶
- class ema_workbench.em_framework.optimization.ArchiveLogger(directory, decision_varnames, outcome_varnames, base_filename='archive')¶
Helper class to write the archive to disk at each iteration
- Parameters
directory (str) –
decision_varnames (list of str) –
outcome_varnames (list of str) –
base_filename (str, optional) –
- class ema_workbench.em_framework.optimization.Convergence(metrics, max_nfe, convergence_freq=1000, logging_freq=5, log_progress=False)¶
helper class for tracking convergence of optimization
- class ema_workbench.em_framework.optimization.EpsilonProgress¶
epsilon progress convergence metric class
- class ema_workbench.em_framework.optimization.HyperVolume(minimum, maximum)¶
Hypervolume convergence metric class
This metric is derived from a hyper-volume measure, which describes the multi-dimensional volume of space contained within the Pareto front. When computed with minimums and maximums, it describes the ratio of dominated outcomes to all possible outcomes in the extent of the space. Getting this number to be high or low is not necessarily important, as not all outcomes within the min-max range will be feasible. But, having the hypervolume remain fairly stable over multiple generations of the evolutionary algorithm provides an indicator of convergence.
- Parameters
minimum (numpy array) –
maximum (numpy array) –
- class ema_workbench.em_framework.optimization.Problem(searchover, parameters, outcome_names, constraints, reference=None)¶
small extension to Platypus problem object, includes information on the names of the decision variables, the names of the outcomes, and the type of search
- class ema_workbench.em_framework.optimization.RobustProblem(parameters, outcome_names, scenarios, robustness_functions, constraints)¶
small extension to Problem object for robust optimization, adds the scenarios and the robustness functions
samplers
¶
This module contains various classes that can be used for specifying different types of samplers. These different samplers implement basic sampling techniques including Full Factorial sampling, Latin Hypercube sampling, and Monte Carlo sampling.
- class ema_workbench.em_framework.samplers.AbstractSampler¶
Abstract base class from which different samplers can be derived.
In the simplest cases, only the sample method needs to be overwritten. generate_designs is the only method called from outside. The other methods are used internally to generate the designs.
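As an illustration, a minimal sketch of a custom sampler (MidpointSampler is a hypothetical name, and this assumes that overriding sample is indeed sufficient, as stated above):

import numpy as np

from ema_workbench.em_framework.samplers import AbstractSampler


class MidpointSampler(AbstractSampler):
    # samples each distribution at the midpoints of equally sized
    # quantile strata, a crude form of stratified sampling
    def sample(self, distribution, size):
        quantiles = (np.arange(size) + 0.5) / size
        # map the quantiles through the inverse cdf of the frozen
        # scipy distribution
        return distribution.ppf(quantiles)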
- generate_designs(parameters, nr_samples)¶
external interface for Sampler. Returns the computational experiments over the specified parameters, for the given number of samples for each parameter.
- Parameters
parameters (list) – a list of parameters for which to generate the experimental designs
nr_samples (int) – the number of samples to draw for each parameter
- Returns
generator – a generator object that yields the designs resulting from combining the parameters
int – the number of experimental designs
- generate_samples(parameters, size)¶
The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting designs.
- Parameters
parameters (collection) – a collection of RealParameter, IntegerParameter, and CategoricalParameter instances.
size (int) – the number of samples to generate.
- Returns
dict with the parameter.name as key, and the sample as value
- Return type
dict
- sample(distribution, size)¶
method for sampling a number of samples from a particular distribution. The various samplers differ with respect to their implementation of this method.
- Parameters
distribution (scipy frozen distribution) –
size (int) – the number of samples to generate
- Returns
the samples for the distribution and specified parameters
- Return type
numpy array
- class ema_workbench.em_framework.samplers.DefaultDesigns(designs, parameters, n)¶
iterable for the experimental designs
- class ema_workbench.em_framework.samplers.FullFactorialSampler¶
generates a full factorial sample.
If the parameter is non-categorical, the resolution is set to the number of samples. If the parameter is categorical, the specified value for samples will be ignored and each category will be used instead.
- determine_nr_of_designs(sampled_parameters)¶
Helper function for determining the number of experiments that will be generated given the sampled parameters.
- Parameters
sampled_parameters (list) – a list of sampled parameters, as the values returned by generate_samples
- Returns
the total number of experimental design
- Return type
int
- generate_designs(parameters, nr_samples)¶
This method provides an alternative implementation to the default implementation provided by Sampler. This version returns a full factorial design across the parameters.
- Parameters
parameters (list) – a list of parameters for which to generate the experimental designs
nr_samples (int) – the number of intervals to use on each Parameter. Categorical parameters always return all their categories
- Returns
generator – a generator object that yields the designs resulting from combining the parameters
int – the number of experimental designs
- generate_samples(parameters, size)¶
The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting samples.
- Parameters
parameters (collection) – a collection of Parameter instances
size (int) – the number of samples to generate.
- Returns
with the parameter.name as key, and the sample as value
- Return type
dict
- class ema_workbench.em_framework.samplers.LHSSampler¶
generates a Latin Hypercube sample for each of the parameters
- sample(distribution, size)¶
generate a Latin Hypercube Sample.
- Parameters
distribution (scipy frozen distribution) –
size (int) – the number of samples to generate
- Returns
with the parameter.name as key, and the sample as value
- Return type
dict
- class ema_workbench.em_framework.samplers.MonteCarloSampler¶
generates a Monte Carlo sample for each of the parameters.
- sample(distribution, size)¶
generate a Monte Carlo Sample.
- Parameters
distribution (scipy frozen distribution) –
size (int) – the number of samples to generate
- Returns
with the parameter.name as key, and the sample as value
- Return type
dict
- class ema_workbench.em_framework.samplers.UniformLHSSampler¶
- generate_samples(parameters, size)¶
- Parameters
parameters (collection) –
size (int) –
- Returns
dict with the parameter.name as key, and the sample as value
- Return type
dict
- ema_workbench.em_framework.samplers.determine_parameters(models, attribute, union=True)¶
determine the parameters over which to sample
- Parameters
models (a collection of AbstractModel instances) –
attribute ({'uncertainties', 'levers'}) –
union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers
- Return type
collection of Parameter instances
- ema_workbench.em_framework.samplers.sample_levers(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)¶
generate policies by sampling over the levers
- Parameters
models (a collection of AbstractModel instances) –
n_samples (int) –
union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers
sampler (Sampler instance, optional) –
- Return type
generator yielding Policy instances
- ema_workbench.em_framework.samplers.sample_parameters(parameters, n_samples, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>, kind=<class 'ema_workbench.em_framework.points.Point'>)¶
generate cases by sampling over the parameters
- ema_workbench.em_framework.samplers.sample_uncertainties(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)¶
generate scenarios by sampling over the uncertainties
- Parameters
models (a collection of AbstractModel instances) –
n_samples (int) –
union (bool, optional) – in case of multiple models, sample over the union of uncertainties, or over the intersection of the uncertainties
sampler (Sampler instance, optional) –
- Return type
generator yielding Scenario instances
salib_samplers
¶
Samplers for working with SALib
- class ema_workbench.em_framework.salib_samplers.FASTSampler(m=4)¶
Sampler generating a Fourier Amplitude Sensitivity Test (FAST) using SALib
- Parameters
m (int (default: 4)) – The interference parameter, i.e., the number of harmonics to sum in the Fourier series decomposition
- class ema_workbench.em_framework.salib_samplers.MorrisSampler(num_levels=4, optimal_trajectories=None, local_optimization=True)¶
Sampler generating a morris design using SALib
- Parameters
num_levels (int) – The number of grid levels
grid_jump (int) – The grid jump size
optimal_trajectories (int, optional) – The number of optimal trajectories to sample (between 2 and N)
local_optimization (bool, optional) – Flag whether to use local optimization according to Ruano et al. (2012). Speeds up the process tremendously for bigger N and num_levels. Setting this variable to be true causes the function to ignore gurobi.
- class ema_workbench.em_framework.salib_samplers.SobolSampler(second_order=True)¶
Sampler generating a Sobol design using SALib
- Parameters
second_order (bool, optional) – indicates whether second order effects should be included
- ema_workbench.em_framework.salib_samplers.get_SALib_problem(uncertainties)¶
returns a dict with a problem specification as required by SALib
experiment_runner
¶
helper module for running experiments and keeping track of which model has been initialized with which policy.
- class ema_workbench.em_framework.experiment_runner.ExperimentRunner(msis)¶
Helper class for running the experiments
This class contains the logic for initializing models properly, running the experiment, getting the results, and cleaning up afterwards.
- Parameters
msis (dict) –
model_kwargs (dict) –
- msi_initializiation¶
keeps track of which model is initialized with which policy.
- Type
dict
- msis¶
models indexed by name
- Type
dict
- model_kwargs¶
keyword arguments for model_init
- Type
dict
- run_experiment(experiment)¶
The logic for running a single experiment. This code makes sure that model(s) are initialized correctly.
- Parameters
experiment (Case instance) –
- Returns
experiment_id (int)
case (dict)
policy (str)
model_name (str)
result (dict)
- Raises
EMAError – if the model instance raises an EMA error, these are reraised.
Exception – Catch all for all other exceptions being raised by the model. These are reraised.
callbacks
¶
This module provides an abstract base class for a callback and a default implementation.
If you want to store the data in a way that is different from the functionality provided by the default callback, you can write your own extension of callback. For example, you can easily implement a callback that stores the data in e.g. a NoSQL file.
The only method to implement is the __call__ magic method. To use logging of progress, always call super.
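A minimal sketch of such an extension (the class name is illustrative, and this assumes, as with DefaultCallback, that __call__ receives the experiment and the associated outcomes):

from ema_workbench.em_framework.callbacks import AbstractCallback


class PrintingCallback(AbstractCallback):
    # toy callback that prints each result instead of storing it

    def __call__(self, experiment, outcomes):
        # call super to keep the progress logging working
        super().__call__(experiment, outcomes)
        print(experiment.experiment_id, outcomes)

    def get_results(self):
        # nothing is stored in this sketch, so there is nothing to return
        return None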
- class ema_workbench.em_framework.callbacks.AbstractCallback(uncertainties, outcomes, levers, nr_experiments, reporting_interval=None, reporting_frequency=10, log_progress=False)¶
Abstract base class from which different call back classes can be derived. Callback is responsible for storing the results of the runs.
- Parameters
uncs (list) – a list of the parameters over which the experiments are being run.
outcomes (list) – a list of outcomes
nr_experiments (int) – the total number of experiments to be executed
reporting_interval (int, optional) – the interval at which to provide progress information via logging.
reporting_frequency (int, optional) – the total number of progress logs
- i¶
a counter that keeps track of how many experiments have been saved
- Type
int
- reporting_interval¶
the interval between progress logs
- Type
int,
- abstract get_results()¶
method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.
- class ema_workbench.em_framework.callbacks.DefaultCallback(uncs, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10, log_progress=False)¶
default callback used in perform_experiments as a means for specifying the way in which the results should be handled. If no callback is specified, this default implementation is used. It can be overwritten or replaced with a callback of your own design, for example if you prefer to store the results in a database or write them to a text file.
- get_results()¶
method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.
- class ema_workbench.em_framework.callbacks.FileBasedCallback(uncs, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10)¶
Callback that stores data in csv files while running
- Parameters
uncs (collection of Parameter instances) –
levers (collection of Parameter instances) –
outcomes (collection of Outcome instances) –
nr_experiments (int) –
reporting_interval (int, optional) –
reporting_frequency (int, optional) –
The data is stored in ./temp, relative to the current working directory. If this directory already exists, it will be overwritten.
Warning
This class is still in beta. API is expected to change over the coming months.
- get_results()¶
method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.
points
¶
classes for representing points in parameter space, as well as associated helper functions
- class ema_workbench.em_framework.points.Experiment(name, model_name, policy, scenario, experiment_id)¶
A convenience object that contains a specification of the model, policy, and scenario to run
- name¶
- Type
str
- model_name¶
- Type
str
- policy¶
- Type
Policy instance
- scenario¶
- Type
Scenario instance
- experiment_id¶
- Type
int
- class ema_workbench.em_framework.points.ExperimentReplication(scenario, policy, constants, replication=None)¶
helper class that combines scenario, policy, any constants, and replication information (seed etc) into a single dictionary.
This class represent the complete specification of parameters to run for a given experiment.
- class ema_workbench.em_framework.points.Policy(name=None, **kwargs)¶
Helper class representing a policy
- name¶
- Type
str, int, or float
- id¶
- Type
int
- all keyword arguments are wrapped into a dict.
- class ema_workbench.em_framework.points.Scenario(name=None, **kwargs)¶
Helper class representing a scenario
- name¶
- Type
str, int, or float
- id¶
- Type
int
- all keyword arguments are wrapped into a dict.
- ema_workbench.em_framework.points.combine_cases_factorial(*point_collections)¶
Combine collections of cases in a full factorial manner
- Parameters
point_collections (collection of collections of Point instances) –
- Yields
Point
- ema_workbench.em_framework.points.combine_cases_sampling(*point_collection)¶
Combine collections of cases by iterating over the longest collection while sampling with replacement from the others
- Parameters
point_collection (collection of collection of Point instances) –
- Yields
Point
- ema_workbench.em_framework.points.experiment_generator(scenarios, model_structures, policies, combine='factorial')¶
generator function which yields experiments
- Parameters
scenarios (iterable of dicts) –
model_structures (list) –
policies (list) –
combine ({'factorial', 'sample'}) – controls how to combine scenarios, policies, and model_structures into experiments.
Notes
if combine is 'factorial' then this generator is essentially three nested loops: for each model structure, for each policy, for each scenario, return the experiment. This means that designs should not be a generator, because it will be exhausted after running the first policy on the first model. if combine is 'zipover' then this generator cycles over scenarios, policies and model structures until the longest of the three collections is exhausted.
ema_multiprocessing
¶
support for using the multiprocessing library in combination with the workbench
- ema_workbench.em_framework.ema_multiprocessing.setup_working_directories(models, root_dir)¶
copies the working directory of each model to a process specific temporary directory and update the working directory of the model
- Parameters
models (list) –
root_dir (str) –
ema_ipyparallel
¶
This module provides functionality for combining the EMA workbench with IPython parallel.
- class ema_workbench.em_framework.ema_ipyparallel.Engine(engine_id, msis, cwd)¶
class for setting up EMA-specific state on each engine; also functions as a convenient namespace for workbench-relevant variables
- Parameters
engine_id (int) –
msis (list) –
cwd (str) –
- cleanup_working_directory()¶
remove the root working directory of the engine
- run_experiment(experiment)¶
run the experiment, the actual running is delegated to an ExperimentRunner instance
- class ema_workbench.em_framework.ema_ipyparallel.EngingeLoggerAdapter(logger, topic)¶
LoggerAdapter that inserts EMA as a topic into log messages
- Parameters
logger (logger instance) –
topic (str) –
- process(msg, kwargs)¶
Process the logging message and keyword arguments passed in to a logging call to insert contextual information. You can either manipulate the message itself, the keyword args or both. Return the message and kwargs modified (or not) to suit your needs.
Normally, you’ll only need to override this one method in a LoggerAdapter subclass for your specific needs.
- class ema_workbench.em_framework.ema_ipyparallel.LogWatcher(**kwargs: Any)¶
A simple class that receives messages on a SUB socket, as published by subclasses of zmq.log.handlers.PUBHandler, and logs them itself.
This can subscribe to multiple topics, but defaults to all topics.
- log_message(raw)¶
receive and parse a message, then log it.
- subscribe()¶
Update our SUB socket’s subscriptions.
- ema_workbench.em_framework.ema_ipyparallel.cleanup(client)¶
cleanup directory tree structure on all engines
- ema_workbench.em_framework.ema_ipyparallel.get_engines_by_host(client)¶
returns the engine ids by host
- Parameters
client (IPython.parallel.Client instance) –
- Returns
a dict with hostnames as keys, and a list of engine ids
- Return type
dict
- ema_workbench.em_framework.ema_ipyparallel.initialize_engines(client, msis, cwd)¶
initialize engine instances on all engines
- Parameters
client (IPython.parallel.Client) –
msis (dict) – dict of model structure interfaces with their names as keys
cwd (str) –
- ema_workbench.em_framework.ema_ipyparallel.set_engine_logger()¶
Updates EMA logging on the engines with an EngineLoggerAdapter. This adapter injects EMA as a topic into all messages.
- ema_workbench.em_framework.ema_ipyparallel.start_logwatcher()¶
convenience function for starting the LogWatcher
- Returns
LogWatcher – the log watcher instance
Thread – the log watcher thread
Note: there can only be one log watcher on a given url.
- ema_workbench.em_framework.ema_ipyparallel.update_cwd_on_all_engines(client)¶
updates the current working directory on the engines to point to the same working directory as this notebook
currently only works if engines are on same machine.
- Parameters
client (IPython.parallel.Client instance) –
util
¶
utilities used throughout em_framework
- class ema_workbench.em_framework.util.Counter(startfrom=0)¶
helper class for generating counter based names for NamedDicts
- class ema_workbench.em_framework.util.ProgressTrackingMixIn(N, reporting_interval, logger, log_progress=False, log_func=<function ProgressTrackingMixIn.<lambda>>)¶
Mixin for monitoring progress
- Parameters
N (int) – total number of experiments
reporting_interval (int) – nfe between logging progress
logger (logger instance) –
log_progress (bool, optional) –
log_func (callable, optional) – function called with self as only argument, should invoke self._logger with custom log message
- i¶
- Type
int
- reporting_interval¶
- Type
int
- log_progress¶
- Type
bool
- log_func¶
- Type
callable
- pbar¶
None if log_progress is true; a tqdm.tqdm instance if it is false
- Type
{None, tqdm.tqdm instance}
- ema_workbench.em_framework.util.combine(*args)¶
combine scenario and policy into a single experiment dict
- Parameters
args (two or more dicts that need to be combined) –
- Return type
a single unified dict containing the entries from all dicts
- Raises
EMAError – if a keyword argument exists in more than one dict
- ema_workbench.em_framework.util.representation(named_dict)¶
helper function for generating repr based names for NamedDicts
Connectors¶
vensim
¶
vensimDLLwrapper
¶
pysd_connector
¶
pysd connector
- class ema_workbench.connectors.pysd_connector.PysdModel(name=None, mdl_file=None)¶
netlogo
¶
simio
¶
excel
¶
This module provides a base class that can be used to perform EMA on Excel models. It relies on win32com.
- class ema_workbench.connectors.excel.ExcelModel(name, wd=None, model_file=None, default_sheet=None, pointers=None)¶
Analysis¶
prim
¶
A scenario discovery oriented implementation of PRIM.
The implementation of PRIM provided here is data-type aware, so categorical variables will be handled appropriately. It also uses a non-standard objective function in the peeling and pasting phase of the algorithm: it looks at the increase in the mean divided by the amount of data removed, which is essentially akin to the first-order derivative of the original objective function.
The implementation is designed for interactive use in combination with Jupyter notebooks.
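As a minimal sketch of this interactive workflow, using randomly generated stand-in data rather than real experiment results:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import prim
>>> x = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
>>> y = ((x['a'] > 0.7) & (x['b'] < 0.4)).values  # cases of interest
>>> p = prim.Prim(x, y, threshold=0.8)
>>> box = p.find_box()   # one iteration of peeling and pasting
>>> box.show_tradeoff()  # coverage versus density along the trajectory
>>> box.inspect()        # stats and limits of the last box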
- class ema_workbench.analysis.prim.PRIMObjectiveFunctions(value)¶
An enumeration.
- class ema_workbench.analysis.prim.Prim(x, y, threshold, obj_function=PRIMObjectiveFunctions.LENIENT1, peel_alpha=0.05, paste_alpha=0.05, mass_min=0.05, threshold_type=1, mode=RuleInductionType.BINARY, update_function='default')¶
Patient rule induction algorithm
The implementation of Prim is tailored to interactive use in the context of scenario discovery
- Parameters
x (DataFrame) – the independent variables
y (1d ndarray) – the dependent variable
threshold (float) – the density threshold that a box has to meet
obj_function ({LENIENT1, LENIENT2, ORIGINAL}) – the objective function used by PRIM. Defaults to a lenient objective function based on the gain of mean divided by the loss of mass.
peel_alpha (float, optional) – parameter controlling the peeling stage (default = 0.05).
paste_alpha (float, optional) – parameter controlling the pasting stage (default = 0.05).
mass_min (float, optional) – minimum mass of a box (default = 0.05).
threshold_type ({ABOVE, BELOW}) – whether to look above or below the threshold value
mode ({RuleInductionType.BINARY, RuleInductionType.REGRESSION}, optional) – indicates whether PRIM is used for regression, or for scenario classification, in which case y should be a binary vector
update_function ({'default', 'guivarch'}, optional) – controls the behavior of PRIM after having found a first box. Use either the default behavior, where all points in the box are removed, or the procedure suggested by Guivarch et al (2016) doi:10.1016/j.envsoft.2016.03.006, which simply sets all points in the box to be no longer of interest (only valid in binary mode).
See also
cart
- property boxes¶
Property for getting a list of box limits
- determine_coi(indices)¶
Given a set of indices on y, return how many cases of interest there are within this set.
- Parameters
indices (ndarray) – a valid index for y
- Returns
the number of cases of interest.
- Return type
int
- Raises
ValueError – if threshold_type is not either ABOVE or BELOW
- find_box()¶
Execute one iteration of the PRIM algorithm. That is, find one box, starting from the current state of Prim.
- property stats¶
property for getting a list of dicts containing the statistics for each box
- class ema_workbench.analysis.prim.PrimBox(prim, box_lims, indices)¶
A class that holds information for a specific box
- coverage¶
coverage of currently selected box
- Type
float
- density¶
density of currently selected box
- Type
float
- mean¶
mean of currently selected box
- Type
float
- res_dim¶
number of restricted dimensions of currently selected box
- Type
int
- mass¶
mass of currently selected box
- Type
float
- peeling_trajectory¶
stats for each box in peeling trajectory
- Type
DataFrame
- box_lims¶
list of box lims for each box in peeling trajectory
- Type
list
By default, the currently selected box is the last box on the peeling trajectory, unless this is changed via PrimBox.select().
- drop_restriction(uncertainty='', i=-1)¶
Drop the restriction on the specified dimension for box i
- Parameters
i (int, optional) – defaults to the currently selected box, which defaults to the latest box on the trajectory
uncertainty (str) – the uncertainty for which to drop the restriction
Replace the limits in box i with a new box in which, for the specified uncertainty, the limits of the initial box are used. The resulting box is added to the peeling trajectory.
- inspect(i=None, style='table', **kwargs)¶
Write the stats and box limits of the user-specified box to standard out. If i is not provided, the last box will be printed.
- Parameters
i (int or list of ints, optional) – the index of the box, defaults to currently selected box
style ({'table', 'graph', 'data'}) – the style of the visualization. ‘table’ prints the stats and boxlim. ‘graph’ creates a figure. ‘data’ returns a list of tuples, where each tuple contains the stats and the box_lims.
**kwargs – additional kwargs are passed to the helper function that generates the table or graph
- resample(i=None, iterations=10, p=0.5)¶
Calculate resample statistics for candidate box i
- Parameters
i (int, optional) –
iterations (int, optional) –
p (float, optional) –
- Return type
DataFrame
- select(i)¶
select an entry from the peeling and pasting trajectory and update the prim box to this selected box.
- Parameters
i (int) – the index of the box to select.
- show_pairs_scatter(i=None, dims=None, cdf=False)¶
Make a pairwise scatter plot of all the restricted dimensions, with color denoting whether a given point is of interest or not, and the box limits superimposed on top.
- Parameters
i (int, optional) –
dims (list of str, optional) – dimensions to show, defaults to all restricted dimensions
cdf (bool, optional) – plot the diagonal as a cdf instead of a pdf
- Return type
seaborn PairGrid
- show_ppt()¶
show the peeling and pasting trajectory in a figure
- show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)¶
Visualize the trade off between coverage and density. Color is used to denote the number of restricted dimensions.
- Parameters
cmap (valid matplotlib colormap) –
annotated (bool, optional) – show point labels if True
- Return type
a Figure instance
- update(box_lims, indices)¶
update the box to the provided box limits.
- Parameters
box_lims (DataFrame) – the new box_lims
indices (ndarray) – the indices of y that are inside the box
- write_ppt_to_stdout()¶
write the peeling and pasting trajectory to stdout
- ema_workbench.analysis.prim.pca_preprocess(experiments, y, subsets=None, exclude={})¶
perform PCA to preprocess experiments before running PRIM
Pre-process the data by performing a PCA-based rotation on it. This effectively turns the algorithm into PCA-PRIM as described in Dalal et al (2013)
- Parameters
experiments (DataFrame) –
y (ndarray) – one dimensional binary array
subsets (dict, optional) – expects a dictionary with group name as key and a list of uncertainty names as values. If this is used, a constrained PCA-PRIM is executed
exclude (list of str, optional) – the uncertainties that should be excluded from the rotation
- Returns
rotated_experiments – DataFrame
rotation_matrix – DataFrame
- Raises
RuntimeError – if mode is not binary (i.e. y is not a binary classification), or if x contains non-numeric columns
- ema_workbench.analysis.prim.run_constrained_prim(experiments, y, issignificant=True, **kwargs)¶
Run PRIM repeatedly while constraining the maximum number of dimensions available in x
Improved usage of PRIM as described in Kwakkel (2019).
- Parameters
experiments (DataFrame) –
y (numpy array) –
issignificant (bool, optional) – if True, run prim only on subsets of dimensions that are significant for the initial PRIM on the entire dataset.
**kwargs – any additional keyword arguments are passed on to PRIM
- Return type
PrimBox instance
- ema_workbench.analysis.prim.setup_prim(results, classify, threshold, incl_unc=[], **kwargs)¶
Helper function for setting up the prim algorithm
- Parameters
results (tuple) – tuple of DataFrame and dict with numpy arrays, the return from perform_experiments().
classify (str or callable) – either a string denoting the outcome of interest to use or a function.
threshold (float) – the minimum score on the density of the last box on the peeling trajectory. In case of a binary classification, this should be between 0 and 1.
incl_unc (list of str, optional) – list of uncertainties to include in prim analysis
kwargs (dict) – valid keyword arguments for prim.Prim
- Return type
a Prim instance
- Raises
PrimException – if data resulting from classify is not a 1-d array.
TypeError – if classify is not a string or a callable.
cart
¶
A scenario discovery oriented implementation of CART. It is essentially a wrapper around scikit-learn’s version of CART.
- class ema_workbench.analysis.cart.CART(x, y, mass_min=0.05, mode=RuleInductionType.BINARY)¶
CART algorithm
CART can be used in a manner similar to PRIM. It provides access to the underlying tree, but it can also show the boxes described by the tree in table or graph form, similar to prim.
- Parameters
x (DataFrame) –
y (1D ndarray) –
mass_min (float, optional) – a value between 0 and 1 indicating the minimum fraction of data points in a terminal leaf. Defaults to 0.05, identical to prim.
mode ({BINARY, CLASSIFICATION, REGRESSION}) – indicates the mode in which CART is used. Binary indicates binary classification, classification is multiclass, and regression is regression.
- boxes¶
list of DataFrame box lims
- Type
list
- stats¶
list of dicts with stats
- Type
list
Notes
This class is a wrapper around scikit-learn’s CART algorithm. It provides an interface to CART that is more oriented towards scenario discovery, and shares some methods with PRIM
See also
prim
- property boxes¶
- Return type
list with box limits for each terminal leaf
- build_tree()¶
train CART on the data
- show_tree(mplfig=True, format='png')¶
Return a png (default) or svg of the tree.
On Windows, graphviz needs to be installed with conda.
- Parameters
mplfig (bool, optional) – if true (default) returns a matplotlib figure with the tree, otherwise, it returns the output as bytes
format ({'png', 'svg'}, default 'png') – the format of the output.
- property stats¶
- Return type
list with scenario discovery statistics for each terminal leaf
- ema_workbench.analysis.cart.setup_cart(results, classify, incl_unc=None, mass_min=0.05)¶
helper function for performing cart in combination with data generated by the workbench.
- Parameters
results (tuple of DataFrame and dict with numpy arrays) – the return from perform_experiments().
classify (str or callable) – either a string denoting the outcome of interest to use or a function.
incl_unc (list of strings, optional) –
mass_min (float, optional) –
- Raises
TypeError – if classify is not a string or a callable.
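A minimal sketch of using CART directly, again with randomly generated stand-in data:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import cart
>>> x = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
>>> y = (x['a'] > 0.6).values  # binary classification target
>>> alg = cart.CART(x, y, mass_min=0.05)
>>> alg.build_tree()  # fit the underlying scikit-learn tree
>>> print(alg.stats)  # scenario discovery statistics per terminal leaf
>>> print(alg.boxes)  # box limits per terminal leaf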
clusterer
¶
This module provides time series clustering functionality using complexity invariant distance. For details see Steinmann et al (2020)
- ema_workbench.analysis.clusterer.apply_agglomerative_clustering(distances, n_clusters, linkage='average')¶
apply agglomerative clustering to the distances
- Parameters
distances (ndarray) –
n_clusters (int) –
linkage ({'average', 'complete', 'single'}) –
- Return type
1D ndarray with cluster assignment
- ema_workbench.analysis.clusterer.calculate_cid(data, condensed_form=False)¶
calculate the complexity invariant distance between all rows
- Parameters
data (2d ndarray) –
condensed_form (bool, optional) –
- Returns
a 2D ndarray with the distances between all time series, or condensed form similar to scipy.spatial.distance.pdist
- Return type
distances
- ema_workbench.analysis.clusterer.plot_dendrogram(distances)¶
plot dendrogram for distances
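A minimal sketch of the clustering workflow, with random time series standing in for model outcomes:
>>> import numpy as np
>>> from ema_workbench.analysis import clusterer
>>> data = np.random.rand(20, 50)  # 20 series of length 50 (hypothetical)
>>> distances = clusterer.calculate_cid(data)
>>> assignment = clusterer.apply_agglomerative_clustering(distances, n_clusters=3)
>>> clusterer.plot_dendrogram(distances)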
logistic_regression
¶
This module implements logistic regression for scenario discovery.
The module draws its inspiration from Quinn et al (2018) 10.1029/2018WR022743 and Lamontagne et al (2019). The implementation here generalizes their work and embeds it in a more typical scenario discovery workflow with a posteriori selection of the appropriate number of dimensions to include. It is modeled as much as possible on the API used for PRIM and CART.
- class ema_workbench.analysis.logistic_regression.Logit(x, y, threshold=0.95)¶
Implements an interactive version of logistic regression using BIC based forward selection
- Parameters
x (DataFrame) –
y (numpy Array) –
threshold (float) –
- coverage¶
coverage of currently selected model
- Type
float
- density¶
density of currently selected model
- Type
float
- res_dim¶
number of restricted dimensions of currently selected model
- Type
int
- peeling_trajectory¶
stats for each model in peeling trajectory
- Type
DataFrame
- models¶
list of models associated with each model on the peeling trajectory
- Type
list
- inspect(i, step=0.1)¶
Inspect one of the models by showing the threshold tradeoff and summary2
- Parameters
i (int) –
step (float between 0 and 1) –
- plot_pairwise_scatter(i, threshold=0.95)¶
plot pairwise scatter plot of data points, with contours as background
- Parameters
i (int) –
threshold (float) –
- Return type
Figure instance
The lower triangle background is a binary contour based on the specified threshold. All axes not shown are set to a default value in the middle of their range.
The upper triangle shows a contour map with the conditional probability, again setting all dimensions not shown to a default value in the middle of their range.
- run()¶
run logistic regression with forward selection, using the Bayesian Information Criterion to select whether, and if so which, dimension to add
- show_threshold_tradeoff(i, cmap=<matplotlib.colors.ListedColormap object>, step=0.1)¶
Visualize the trade off between coverage and density for a given model i across the range of threshold values
- Parameters
i (int) –
cmap (valid matplotlib colormap) –
step (float, optional) –
- Return type
a Figure instance
- show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)¶
Visualize the trade off between coverage and density. Color is used to denote the number of restricted dimensions.
- Parameters
cmap (valid matplotlib colormap) –
annotated (bool, optional) – show point labels if True
- Return type
a Figure instance
- update(model, selected)¶
helper function for adding a model to the collection of models and updating the associated attributes
- Parameters
model (statsmodel fitted logit model) –
selected (list of str) –
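A minimal sketch, using random stand-in data in place of real experiments:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import logistic_regression
>>> x = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
>>> y = ((x['a'] + x['b']) > 1.1).values
>>> logit = logistic_regression.Logit(x, y)
>>> logit.run()            # BIC-based forward selection
>>> logit.show_tradeoff()  # coverage versus density for each model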
dimensional_stacking
¶
This module provides functionality for doing dimensional stacking of uncertain factors in order to reveal patterns in the values for a single outcome of interest. It is inspired by the work reported here, with one deviation.
Rather than using association rules to identify the uncertain factors to use, this code uses random forest based feature scoring instead. It is also possible to use the code provided here in combination with any other feature scoring or factor prioritization technique instead, or by simply selecting uncertain factors in some other manner.
- ema_workbench.analysis.dimensional_stacking.create_pivot_plot(x, y, nr_levels=3, labels=True, categories=True, nbins=3, bin_labels=False)¶
convenience function for easily creating a pivot plot
- Parameters
x (DataFrame) –
y (1d ndarray) –
nr_levels (int, optional) – the number of levels in the pivot table. The number of uncertain factors included in the pivot table is two times the number of levels.
labels (bool, optional) – display names of uncertain factors
categories (bool, optional) – display category names for each uncertain factor
nbins (int, optional) – number of bins to use when discretizing continuous uncertain factors
bin_labels (bool, optional) – if True show bin interval / name, otherwise show only a number
- Return type
Figure
This function performs feature scoring using random forests, selects a number of high scoring factors based on the specified number of levels, creates a pivot table, and visualizes the table. This is a convenience function. For more control over the process, use the code in this function as a template.
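A minimal sketch with random stand-in data:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import dimensional_stacking
>>> x = pd.DataFrame(np.random.rand(1000, 6), columns=list('abcdef'))
>>> y = ((x['a'] > 0.5) & (x['b'] < 0.5)).values
>>> fig = dimensional_stacking.create_pivot_plot(x, y, nr_levels=2)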
regional_sa
¶
Module offers support for performing basic regional sensitivity analysis. It can be used to perform regional sensitivity analysis on all uncertainties specified in the experiment array, and to zoom in on any given uncertainty in more detail.
- ema_workbench.analysis.regional_sa.plot_cdfs(x, y, ccdf=False)¶
plot cumulative distribution functions for each column in x, based on the classification specified in y.
- Parameters
x (DataFrame) – the experiments to use in the cdfs
y (ndarray) – the categorization for the data
ccdf (bool, optional) – if true, plot a complementary cdf instead of a normal cdf.
- Return type
a matplotlib Figure instance
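A minimal sketch with random stand-in data:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import regional_sa
>>> x = pd.DataFrame(np.random.rand(1000, 4), columns=['a', 'b', 'c', 'd'])
>>> y = (x['a'] > 0.8).values  # classification of the experiments
>>> fig = regional_sa.plot_cdfs(x, y)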
feature_scoring
¶
Feature scoring functionality
- ema_workbench.analysis.feature_scoring.CHI2(X, y)¶
Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.
Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
Read more in the User Guide.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Sample vectors.
y (array-like of shape (n_samples,)) – Target vector (class labels).
- Returns
chi2 (ndarray of shape (n_features,)) – Chi2 statistics for each feature.
p_values (ndarray of shape (n_features,)) – P-values for each feature.
Notes
Complexity of this algorithm is O(n_classes * n_features).
See also
f_classif
ANOVA F-value between label/feature for classification tasks.
f_regression
F-value between label/feature for regression tasks.
- ema_workbench.analysis.feature_scoring.F_CLASSIFICATION(X, y)¶
Compute the ANOVA F-value for the provided sample.
Read more in the User Guide.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The set of regressors that will be tested sequentially.
y (ndarray of shape (n_samples,)) – The target vector.
- Returns
f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.
p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.
See also
chi2
Chi-squared stats of non-negative features for classification tasks.
f_regression
F-value between label/feature for regression tasks.
- ema_workbench.analysis.feature_scoring.F_REGRESSION(X, y, *, center=True)¶
Univariate linear regression tests returning F-statistic and p-values.
Quick linear model for testing the effect of a single regressor, sequentially for many regressors.
This is done in 2 steps:
The cross correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) * std(y)), using the r_regression function.
It is converted to an F score and then to a p-value.
f_regression() is derived from r_regression() and will rank features in the same order if all the features are positively correlated with the target. Note however that, contrary to f_regression(), r_regression() values lie in [-1, 1] and can thus be negative. f_regression() is therefore recommended as a feature selection criterion to identify potentially predictive features for a downstream classifier, irrespective of the sign of the association with the target variable.
Furthermore, f_regression() returns p-values while r_regression() does not.
Read more in the User Guide.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data matrix.
y (array-like of shape (n_samples,)) – The target vector.
center (bool, default=True) – Whether or not to center the data matrix X and the target vector y. By default, X and y will be centered.
- Returns
f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.
p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.
See also
r_regression
Pearson’s R between label/feature for regression tasks.
f_classif
ANOVA F-value between label/feature for classification tasks.
chi2
Chi-squared stats of non-negative features for classification tasks.
SelectKBest
Select features based on the k highest scores.
SelectFpr
Select features based on a false positive rate test.
SelectFdr
Select features based on an estimated false discovery rate.
SelectFwe
Select features based on family-wise error rate.
SelectPercentile
Select features based on percentile of the highest scores.
- ema_workbench.analysis.feature_scoring.get_ex_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=100, max_features=None, max_depth=None, min_samples_split=2, min_samples_leaf=None, min_weight_fraction_leaf=0, max_leaf_nodes=None, bootstrap=True, oob_score=True, random_state=None)¶
Get feature scores using extra trees
- Parameters
x (DataFrame) –
y (1D nd.array) –
mode ({RuleInductionType.CLASSIFICATION, RuleInductionType.REGRESSION}) –
nr_trees (int, optional) – nr. of trees in forest (default=100)
max_features (int, float, string or None, optional) – by default, it will use number of features/3, following Jaxa-Rozen & Kwakkel (2018) doi: 10.1016/j.envsoft.2018.06.011 see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
max_depth (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
min_samples_split (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
min_samples_leaf (int, optional) – defaults to 1 for N=1000 or lower, from there on proportional to sqrt of N (see discussion in Jaxa-Rozen & Kwakkel (2018) doi: 10.1016/j.envsoft.2018.06.011) see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
min_weight_fraction_leaf (float, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
max_leaf_nodes (int or None, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
bootstrap (bool, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
oob_score (bool, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
random_state (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
- Returns
pandas DataFrame – the uncertainties and their feature scores, sorted in descending order
object – either ExtraTreesClassifier or ExtraTreesRegressor
- ema_workbench.analysis.feature_scoring.get_feature_scores_all(x, y, alg='extra trees', mode=RuleInductionType.REGRESSION, **kwargs)¶
perform feature scoring for all outcomes using the specified feature scoring algorithm
- Parameters
x (DataFrame) –
y (dict of 1d numpy arrays) – the outcomes, with a string as key, and a 1D array for each outcome
alg ({'extra trees', 'random forest', 'univariate'}, optional) –
mode ({RuleInductionType.REGRESSION, RuleInductionType.CLASSIFICATION}, optional) –
kwargs (dict, optional) – any remaining keyword arguments will be passed to the specific feature scoring algorithm
- Return type
DataFrame instance
- ema_workbench.analysis.feature_scoring.get_rf_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=250, max_features='sqrt', max_depth=None, min_samples_split=2, min_samples_leaf=1, bootstrap=True, oob_score=True, random_state=None)¶
Get feature scores using a random forest
- Parameters
x (DataFrame) –
y (1D nd.array) –
mode ({RuleInductionType.CLASSIFICATION, RuleInductionType.REGRESSION}) –
nr_trees (int, optional) – nr. of trees in forest (default=250)
max_features (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
max_depth (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
min_samples_split (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
min_samples_leaf (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
bootstrap (bool, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
oob_score (bool, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
random_state (int, optional) – see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- Returns
pandas DataFrame – the uncertainties and their feature scores, sorted in descending order
object – either RandomForestClassifier or RandomForestRegressor
- ema_workbench.analysis.feature_scoring.get_univariate_feature_scores(x, y, score_func=<function f_classif>)¶
calculate feature scores using univariate statistical tests. In case of categorical data, chi-square or the ANOVA F value is used. In case of continuous data, the ANOVA F value is used.
- Parameters
x (DataFrame) –
y (1D nd.array) –
score_func ({F_CLASSIFICATION, F_REGRESSION, CHI2}) – the score function to use, one of f_regression (regression), or f_classification or chi2 (classification).
- Returns
the uncertainties and their feature scores (i.e. p-values in this case), sorted in descending order.
- Return type
pandas DataFrame
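A minimal sketch, with a randomly generated outcome standing in for real results:
>>> import numpy as np
>>> import pandas as pd
>>> from ema_workbench.analysis import feature_scoring
>>> x = pd.DataFrame(np.random.rand(1000, 4), columns=['a', 'b', 'c', 'd'])
>>> y = {'outcome': 2 * x['a'].values + np.random.rand(1000)}
>>> scores = feature_scoring.get_feature_scores_all(x, y)  # extra trees
>>> print(scores)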
plotting
¶
This module provides functions for generating some basic figures. The code can be used as is, or serve as an example for writing your own code.
- ema_workbench.analysis.plotting.envelopes(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=None, fill=False, legend=True, titles={}, ylabels={}, log=False)¶
Make envelope plots.
An envelope shows the minimum and maximum value over time for a set of runs. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.
- Parameters
experiments (DataFrame) –
outcomes (OutcomesDict) –
outcomes_to_show (str or list of str, optional) –
group_by (str, optional) – name of the column in the experiments to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional) –
fill (bool, optional) –
legend (bool, optional) –
titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.
ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.
log (bool, optional) – log scale density plot
- Returns
Figure (Figure instance)
axes (dict) – dict with outcome as key, and axes as value. Density axes are indexed by the outcome followed by _density.
Note
the current implementation is limited to seven different categories in case of group_by, categories, and/or discretesize. This limit is due to the colors specified in COLOR_LIST.
Examples
>>> import util as util
>>> data = util.load_results(r'1000 flu cases.cPickle')
>>> envelopes(data, group_by='policy')
will show an envelope for three different policies, for all the outcomes of interest, while
>>> envelopes(data, group_by='policy', categories=['static policy', 'adaptive policy'])
will only show results for the two specified policies, ignoring any results associated with ‘no policy’.
- ema_workbench.analysis.plotting.kde_over_time(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, colormap='viridis', log=True)¶
Plot a KDE over time. The KDE is visualized through a heatmap.
- Parameters
experiments (DataFrame) –
outcomes (OutcomesDict) –
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
colormap (str, optional) – valid matplotlib color map name
log (bool, optional) –
- Returns
list of Figure instances – a figure instance for each group for each outcome
dict – dict with outcome as key, and axes as value. Density axes are indexed by the outcome followed by _density
- ema_workbench.analysis.plotting.lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, density='', legend=True, titles={}, ylabels={}, experiments_to_show=None, show_envelope=False, log=False)¶
This function takes the results from perform_experiments() and visualizes these as line plots. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.
- Parameters
experiments (DataFrame) –
outcomes (OutcomesDict) –
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional) –
legend (bool, optional) –
titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.
ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.
experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.
show_envelope (bool, optional) – show envelope of outcomes. This envelope is based on the minimum at each column and the maximum at each column.
log (bool, optional) – log scale density plot
- Returns
fig (Figure instance)
axes (dict) – dict with outcome as key, and axes as value. Density axes are indexed by the outcome followed by _density.
Note
the current implementation is limited to seven different categories in case of group_by, categories, and/or discretesize. This limit is due to the colors specified in COLOR_LIST.
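A minimal sketch, assuming results were saved earlier with save_results (the file name is hypothetical):
>>> from ema_workbench.util.utilities import load_results
>>> from ema_workbench.analysis import plotting, plotting_util
>>> experiments, outcomes = load_results('1000_flu_cases.tar.gz')
>>> fig, axes = plotting.lines(experiments, outcomes, group_by='policy',
...                            density=plotting_util.Density.KDE)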
- ema_workbench.analysis.plotting.multiple_densities(experiments, outcomes, points_in_time=None, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=Density.KDE, legend=True, titles={}, ylabels={}, experiments_to_show=None, plot_type=PlotType.ENVELOPE, log=False, **kwargs)¶
Make an envelope plot with multiple density plots over the run time
- Parameters
experiments (DataFrame) –
outcomes (OutcomesDict) –
points_in_time (list) – a list of points in time for which you want to see the density. At the moment up to 6 points in time are supported
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
density ({Density.KDE, Density.HIST, Density.VIOLIN, Density.BOXPLOT}, optional) –
legend (bool, optional) –
titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.
ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.
experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.
plot_type ({PlotType.ENVELOPE, PlotType.ENV_LIN, PlotType.LINES}, optional) –
log (bool, optional) –
- Returns
fig (Figure instance)
axes (dict) – dict with outcome as key, and axes as value. Density axes are indexed by the outcome followed by _density.
Note
the current implementation is limited to seven different categories in case of group_by, categories, and/or discretesize. This limit is due to the colors specified in COLOR_LIST.
Note
the connection patches are for some reason not drawn if log scaling is used for the density plots. This appears to be an issue in matplotlib itself.
pairs_plotting
¶
This module provides R style pairs plotting functionality.
- ema_workbench.analysis.pairs_plotting.pairs_density(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, point_in_time=-1, log=True, gridsize=50, colormap='coolwarm', filter_scalar=True)¶
Generate an R style pairs hexbin density multiplot. In case of time-series data, the end states are used.
hexbin makes a hexagonal binning plot of x versus y, where x, y are 1-D sequences of the same length, N. If C is None (the default), this is a histogram of the number of occurrences of the observations at (x[i], y[i]). For further detail see matplotlib on hexbin
- Parameters
experiments (DataFrame) –
outcomes (dict) –
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.
point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time
log (bool, optional) – indicating whether density should be log scaled. Defaults to True.
gridsize (int, optional) – controls the gridsize for the hexagonal bining. (default = 50)
colormap (str) – color map that is to be used in generating the hexbin. For details on the available maps, see pylab. (default = 'coolwarm')
filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to True.
- Returns
fig – the figure instance
dict – key is tuple of names of outcomes, value is associated axes instance
- ema_workbench.analysis.pairs_plotting.pairs_lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, **kwargs)¶
Generate an R style pairs lines multiplot. It shows the behavior of two outcomes over time against each other. The origin is denoted with a circle and the end is denoted with a ‘+’.
- Parameters
experiments (DataFrame) –
outcomes (dict) –
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.
legend (bool, optional) – if true, and group_by is given, show a legend.
point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time
- Returns
fig – the figure instance
dict – key is tuple of names of outcomes, value is associated axes instance
- ema_workbench.analysis.pairs_plotting.pairs_scatter(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, point_in_time=-1, filter_scalar=False, **kwargs)¶
Generate an R style pairs scatter multiplot. In case of time-series data, the end states are used.
- Parameters
experiments (DataFrame) –
outcomes (dict) –
outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.
group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.
grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.
ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.
legend (bool, optional) – if true, and group_by is given, show a legend.
point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time
filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to False.
- Returns
fig (Figure instance) – the figure instance
axes (dict) – key is tuple of names of outcomes, value is associated axes instance
Note
the current implementation is limited to seven different categories in case of group_by, categories, and/or discretesize. This limit is due to the colors specified in COLOR_LIST.
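A minimal sketch, assuming previously saved results (the file name is hypothetical):
>>> from ema_workbench.util.utilities import load_results
>>> from ema_workbench.analysis import pairs_plotting
>>> experiments, outcomes = load_results('1000_flu_cases.tar.gz')
>>> fig, axes = pairs_plotting.pairs_scatter(experiments, outcomes,
...                                          group_by='policy')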
parcoords
¶
This module offers a general purpose parallel coordinate plotting Class using matplotlib.
- class ema_workbench.analysis.parcoords.ParallelAxes(limits, formatter=None, fontsize=14, rot=90)¶
Base class for creating a parallel axis plot.
- Parameters
limits (DataFrame) – A DataFrame specifying the limits for each dimension in the data set. For categorical data, the first cell should contain all categories. See get_limits for more details.
formatter (dict, optional) – dict with precision format strings for minima and maxima, use column name as key. If a column is not present, or no formatter dict is provided, precision formatting defaults to .2f
fontsize (int, optional) – fontsize for default text items
rot (float, optional) – rotation of axis labels
- invert_axis(axis)¶
flip direction for specified axis
- Parameters
axis (str or list of str) –
- legend()¶
add a legend to the figure
- plot(data, color=None, label=None, **kwargs)¶
plot data on parallel axes
- Parameters
data (DataFrame or Series) –
color (valid mpl color, optional) –
label (str, optional) –
**kwargs – any additional kwargs will be passed to matplotlib’s plot method.
Data is normalized using the limits specified when initializing ParallelAxes.
- ema_workbench.analysis.parcoords.get_limits(data)¶
helper function to get the limits of a DataFrame that can serve as input to ParallelAxes
- Parameters
data (DataFrame) –
- Return type
DataFrame
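A minimal sketch with a small hand-made DataFrame:
>>> import pandas as pd
>>> from ema_workbench.analysis import parcoords
>>> data = pd.DataFrame({'a': [0.1, 0.5, 0.9],
...                      'b': [10.0, 20.0, 15.0],
...                      'c': [1.0, 0.2, 0.6]})
>>> limits = parcoords.get_limits(data)  # minima and maxima per column
>>> axes = parcoords.ParallelAxes(limits)
>>> axes.plot(data, color='steelblue')   # data is normalized to the limits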
b_and_w_plotting
¶
This module offers basic functionality for converting a matplotlib figure to black and white. The provided functionality is largely determined by what is needed for the workbench.
- ema_workbench.analysis.b_and_w_plotting.set_fig_to_bw(fig, style='hatching', line_style='continuous')¶
TODO it would be nice if for lines you can select either markers, gray scale, or simple black
Take each axes in the figure and transform its content to black and white. Lines are transformed based on different line styles. Fills such as can be used in envelopes are gray-scaled. Heatmaps are also gray-scaled.
derived from and expanded from: https://stackoverflow.com/questions/7358118/matplotlib-black-white-colormap-with-dashes-dots-etc
- Parameters
fig (figure) – the figure to transform to black and white
style ({HATCHING, GREYSCALE}) – parameter controlling how collections are transformed.
line_style (str, {'continuous', 'black', None}) –
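A minimal sketch using a plain matplotlib figure:
>>> import matplotlib.pyplot as plt
>>> from ema_workbench.analysis import b_and_w_plotting
>>> fig, ax = plt.subplots()
>>> ax.plot([0, 1], [0, 1])
>>> ax.plot([0, 1], [1, 0])
>>> b_and_w_plotting.set_fig_to_bw(fig, style='hatching')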
plotting_util
¶
Plotting utility functions
- ema_workbench.analysis.plotting_util.COLOR_LIST = [(0.12156862745098039, 0.4666666666666667, 0.7058823529411765), (1.0, 0.4980392156862745, 0.054901960784313725), (0.17254901960784313, 0.6274509803921569, 0.17254901960784313), (0.8392156862745098, 0.15294117647058825, 0.1568627450980392), (0.5803921568627451, 0.403921568627451, 0.7411764705882353), (0.5490196078431373, 0.33725490196078434, 0.29411764705882354), (0.8901960784313725, 0.4666666666666667, 0.7607843137254902), (0.4980392156862745, 0.4980392156862745, 0.4980392156862745), (0.7372549019607844, 0.7411764705882353, 0.13333333333333333), (0.09019607843137255, 0.7450980392156863, 0.8117647058823529)]¶
Default color list
- class ema_workbench.analysis.plotting_util.Density(value)¶
Enum for different types of density plots
- BOXENPLOT = 'boxenplot'¶
constant for plotting density as a boxenplot
- BOXPLOT = 'boxplot'¶
constant for plotting density as a boxplot
- HIST = 'hist'¶
constant for plotting density as a histogram
- KDE = 'kde'¶
constant for plotting density as a kernel density estimate
- VIOLIN = 'violin'¶
constant for plotting density as a violin plot, which combines a Gaussian density estimate with a boxplot
- class ema_workbench.analysis.plotting_util.LegendEnum(value)¶
Enum for different styles of legends
scenario_discovery_util
¶
Scenario discovery utilities used by both cart
and prim
- class ema_workbench.analysis.scenario_discovery_util.RuleInductionType(value)¶
An enumeration.
- BINARY = 'binary'¶
constant indicating binary classification mode. This is the most commonly used mode in scenario discovery
- CLASSIFICATION = 'classification'¶
constant indicating classification mode
- REGRESSION = 'regression'¶
constant indicating regression mode
Util¶
ema_exceptions
¶
Exceptions and warnings used internally by the EMA workbench, in line with the advice given in PEP 8.
- exception ema_workbench.util.ema_exceptions.CaseError(message, case, policy=None)¶
error to be used when a particular run creates an error. The character of the error can be specified in the message, together with the actual case that gave rise to it.
- exception ema_workbench.util.ema_exceptions.EMAError(*args)¶
Base EMA error
- exception ema_workbench.util.ema_exceptions.EMAParallelError(*args)¶
parallel EMA error
- exception ema_workbench.util.ema_exceptions.EMAWarning(*args)¶
base EMA warning class
ema_logging
¶
This module contains code for logging EMA processes. It is modeled on the default logging approach that comes with Python. This logging system will also work in case of multiprocessing.
- ema_workbench.util.ema_logging.get_rootlogger()¶
Returns root logger used by the EMA workbench
- Return type
the logger of the EMA workbench
- ema_workbench.util.ema_logging.log_to_stderr(level=None)¶
Turn on logging and add a handler which prints to stderr
- Parameters
level (int) – minimum level of the messages that will be logged
- ema_workbench.util.ema_logging.temporary_filter(name='EMA', level=0, functname=None)¶
temporarily filter log messages
- Parameters
name (str or list of str, optional) – logger(s) on which to apply the filter. All modules have their own unique logger (e.g. ema_workbench.analysis.prim).
level (int or list of int, optional) – don’t log messages of this level or lower
funcname (str or list of str, optional) – don’t log messages from this function
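A minimal sketch; note that treating temporary_filter as a context manager is an assumption based on its name and docstring:
>>> import logging
>>> from ema_workbench.util import ema_logging
>>> ema_logging.log_to_stderr(logging.DEBUG)
>>> with ema_logging.temporary_filter(level=logging.INFO):
...     pass  # messages of INFO level or lower are not logged here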
utilities
¶
This module provides various convenience functions and classes.
- ema_workbench.util.utilities.load_results(file_name)¶
load the specified tar.gz file. The file is assumed to be saved using save_results.
- Parameters
file_name (str) – the path to the file
- Raises
IOError – if file not found
- ema_workbench.util.utilities.merge_results(results1, results2)¶
convenience function for merging the return from perform_experiments().
The function merges results2 with results1. For the experiments, it generates an empty array equal to the size of the sum of the experiments. As dtype it uses the dtype from the experiments in results1. The function assumes that the ordering of dtypes and names is identical in both results.
A typical use case for this function is in combination with experiments_to_cases(). Using experiments_to_cases() one extracts the cases from a first set of experiments. One then performs these cases on a different model or policy, and then wants to merge these new results with the old results for further analysis.
- Parameters
results1 (tuple) – first results to be merged
results2 (tuple) – second results to be merged
- Return type
the merged results
- ema_workbench.util.utilities.process_replications(data, aggregation_func=<function mean>)¶
Convenience function for processing the replications of a stochastic model’s outcomes.
The default behavior is to take the mean of the replications. This reduces the dimensionality of the outcomes from (experiments * replications * outcome_shape) to (experiments * outcome_shape), where outcome_shape is 0-d for scalars, 1-d for time series, and 2-d for arrays.
The function can take either the outcomes (dictionary: keys are outcomes of interest, values are arrays of data) or the results (tuple: experiments as DataFrame, outcomes as dictionary) of a set of simulation experiments.
- Parameters
data (dict, tuple) – outcomes or results of a set of experiments
aggregation_func (callable, optional) – aggregation function to be applied, defaults to np.mean.
- Return type
dict, tuple
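A minimal sketch with a hypothetical outcomes dict of 10 experiments, 5 replications, and time series of length 20:
>>> import numpy as np
>>> from ema_workbench.util.utilities import process_replications
>>> outcomes = {'infections': np.random.rand(10, 5, 20)}
>>> averaged = process_replications(outcomes)  # mean over replications
>>> averaged['infections'].shape
(10, 20)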
- ema_workbench.util.utilities.save_results(results, file_name)¶
save the results to the specified tar.gz file.
The way in which results are stored depends on the outcome type. Experiments are saved as csv. Scalar and <3D array outcomes are saved as csv files; higher dimensional arrays are stored as .npy files.
- Parameters
results (tuple) – the return of perform_experiments
file_name (str) – the path of the file
- Raises
IOError – if file not found
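A minimal round-trip sketch, assuming results holds the tuple returned by perform_experiments() (the file name is hypothetical):
>>> from ema_workbench.util.utilities import save_results, load_results
>>> save_results(results, './1000_scenarios.tar.gz')
>>> experiments, outcomes = load_results('./1000_scenarios.tar.gz')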