EMA Workbench documentation

Release: 2.2

Date: Sep 20, 2022

Exploratory Modelling and Analysis (EMA) Workbench

Exploratory Modeling and Analysis (EMA) is a research methodology that uses computational experiments to analyze complex and uncertain systems (Bankes, 1993). That is, exploratory modeling aims at offering computational decision support for decision making under deep uncertainty and robust decision making.

The EMA workbench aims at providing support for performing exploratory modeling with models developed in various modelling packages and environments. Currently, the workbench offers connectors to Vensim, NetLogo, Excel, Simio, and PySD.

The EMA workbench offers support for designing experiments, performing them (including support for parallel processing, both on a single machine and on clusters), and analysing the results. To get started, take a look at the high-level overview, the tutorial, or dive straight into the details of the API.

A High Level Overview

Exploratory modeling framework

The core package contains the core functionality for setting up, designing, and performing series of computational experiments on one or more models simultaneously.

Connectors

The connectors package contains connectors to some existing simulation modeling environments. For each of these, a standard ModelStructureInterface class is provided that users can use as a starting point for specifying the interface to their own model.

  • Vensim connector (ema_workbench.connectors.vensim): This enables controlling a simulation model built in Vensim (e.g., setting parameters, setting up the simulation, running it, and retrieving output) and conducting an EMA study based on this model.

  • Excel connector (ema_workbench.connectors.excel): This enables controlling models built in Excel.

  • NetLogo connector (ema_workbench.connectors.netlogo): This enables controlling a simulation model built in NetLogo (e.g., setting parameters, setting up the simulation, running it, and retrieving output) and conducting an EMA study based on this model.

  • Simio connector (ema_workbench.connectors.simio_connector): This enables controlling models built in Simio.

  • Pysd connector (ema_workbench.connectors.pysd_connector)

Analysis

The analysis package contains a variety of analysis and visualization techniques for analyzing the results from the exploratory modeling. The analysis scripts are tailored for use in combination with the workbench, but they can also be used on their own with data generated in some other manner.

Installing the workbench

The 2.x version of the workbench requires Python 3.8 or higher.

A stable version of the workbench can be installed via pip.

pip install ema_workbench

The latest commit on the master branch can be installed with:

pip install -e git+https://github.com/quaquel/EMAworkbench#egg=ema-workbench

Or any other (development) branch on this repo or your own fork:

pip install -e git+https://github.com/YOUR_FORK/EMAworkbench@YOUR_BRANCH#egg=ema-workbench

The code is also available from GitHub. The code comes with a requirements.txt file that indicates the key dependencies. If you have a standard scientific computing distribution for Python, such as the Anaconda distribution, most of the dependencies will already be met.

In addition to the libraries available in the default Anaconda distribution, there are various optional dependencies. Please follow the installation instructions for each of these libraries.

From conda or conda-forge:

  • altair for interactive visualizations

  • ipyparallel for support of interactive multiprocessing within the Jupyter notebook.

    conda install altair ipyparallel
    

There are also some pip based dependencies:

  • SALib, a necessary dependency for advanced sensitivity analysis.

  • platypus-opt, an optional dependency for many-objective optimization functionality.

  • pydot and Graphviz for some of the visualizations.

    pip install SALib platypus-opt pydot
    

The various connectors have their own specific requirements.

  • Vensim only works on Windows. If you have 64-bit Vensim, you need 64-bit Python; if you have 32-bit Vensim, you need 32-bit Python.

  • Excel also only works on Windows.

  • jpype-1 and pynetlogo for NetLogo

  • pysd (optional) for simple Vensim models

  • pythonnet for Simio

conda install jpype1
pip install pynetlogo pysd pythonnet

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.2.0

Highlights

With the 2.2 release, the EMAworkbench can now connect to Vadere models of pedestrian dynamics. When inspecting a PrimBox peeling trajectory, multiple points on the peeling trajectory can now be inspected simultaneously by passing a list of integers to PrimBox.inspect().

When running experiments with multiprocessing using the MultiprocessingEvaluator, the number of processes can now be controlled using a negative integer as input for n_processes (for example, -2 on a 12-thread CPU results in 10 threads used). Also, it now defaults to a maximum of 61 processes on Windows machines, due to limitations inherent in Windows in dealing with higher processor counts. Code quality, CI, and error reporting have also been improved. And finally, generating these release notes is now automated.
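
For example (a sketch; model can be any model instance as used elsewhere in this documentation):

from ema_workbench import MultiprocessingEvaluator

# leave two logical cores unused (new negative n_processes semantics)
with MultiprocessingEvaluator(model, n_processes=-2) as evaluator:
    results = evaluator.perform_experiments(1000)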

What’s Changed

🎉 New features added

  • Vadere model connector by @floristevito in https://github.com/quaquel/EMAworkbench/pull/145

🛠 Enhancements made

  • Improve code quality with static analysis by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/119

  • prim.py: Make PrimBox.peeling_trajectory["id"] int instead of float by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/121

  • analysis: Allow optional annotation of plot_tradeoff graphs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/123

  • evaluators.py: Allow MultiprocessingEvaluator to initialize with cpu_count minus N processes by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/140

  • PrimBox.inspect() now can also take a list of integers (aside from a single int) to inspect multiple points at once by @quaquel in https://github.com/quaquel/EMAworkbench/commit/6d83a6c33442ad4dce0974a384b03a225aaf830d (see also issue https://github.com/quaquel/EMAworkbench/issues/124)

🐛 Bugs fixed

  • fixed typo in lake_model.py by @JeffreyDillonLyons in https://github.com/quaquel/EMAworkbench/pull/136

📜 Documentation improvements

  • Docs: Installation.rst: Add how to install master or custom branch by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/122

  • Docs: Replace all http links with secure https URLs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/134

  • Maintain release notes at CHANGELOG.md and include them in Readthedocs by @quaquel in https://github.com/quaquel/EMAworkbench/commit/ebdbc9f5c77693fc75911ead472b420065dfe2aa

  • Fix badge links in readme by @quaquel in https://github.com/quaquel/EMAworkbench/commit/28569bdcb149c070c329589969179be354b879ec

🔧 Maintenance

  • feature_scoring: fix Regressor criterion deprecation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/125

  • feature_scoring.py: Change max_features in get_rf_feature_scores to "sqrt" by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/129

  • CI: Use Pytest instead of Nose, update default build to Python 3.10 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/131

  • Release CI: Only upload packages if on main repo by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/132

  • CI: Split off flake8 linting in a separate job by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/133

  • CI: Add weekly scheduled jobs and manual trigger by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/137

  • setup.py: Add project_urls for documentation and issue tracker links by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/142

  • set scikit-learn requirement >= 1.0.0 by @rhysits in https://github.com/quaquel/EMAworkbench/pull/144

  • Create release.yml file for automatic release notes generation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/152

  • instantiating an Evaluator without one or more AbstractModel instances now raises a type error by @quaquel in https://github.com/quaquel/EMAworkbench/commit/a83533aa8166ca2414137cdfc3125a53ee3697ec

  • removes deprecated DataFrame.append by replacing it with pandas.concat (see the conversation on issue https://github.com/quaquel/EMAworkbench/issues/126):

    • from feature scoring by @quaquel in https://github.com/quaquel/EMAworkbench/commit/8b8bfe41733e49b75c01e34b75563e0a6d5b4024

    • from logistic_regression.py by @quaquel in https://github.com/quaquel/EMAworkbench/commit/255e3d6d9639dfe6fd4e797e1c63d59ba0522c2d

  • removes NumPy datatypes deprecated in 1.20 by @quaquel in https://github.com/quaquel/EMAworkbench/commit/e8fbf6fc64f14b7c7220fa4d3fc976c42d3757eb

  • replace deprecated scipy.stats.kde with scipy.stats by @quaquel in https://github.com/quaquel/EMAworkbench/commit/b5a9ca967740e74d503281018e88d6b28e74a27d

New Contributors

  • @JeffreyDillonLyons made their first contribution in https://github.com/quaquel/EMAworkbench/pull/136

  • @rhysits made their first contribution in https://github.com/quaquel/EMAworkbench/pull/144

  • @floristevito made their first contribution in https://github.com/quaquel/EMAworkbench/pull/145

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.1.2...2.2.0

Tutorials

The code for these examples can be found in the examples package. The first three examples illustrate the basics of the EMA workbench: how to implement a model, specify its uncertainties and outcomes, and run it. The fourth example is a more extensive illustration based on Pruyt & Hamarat (2010). It shows some more advanced possibilities of the EMA workbench, including one way of handling policies.

A simple model in Python

The simplest case is where we have a model available through a Python function. For example, imagine we have the following simple model.

def some_model(x1=None, x2=None, x3=None):
    return {'y': x1 * x2 + x3}

In order to control this model from the workbench, we can make use of the Model class. We instantiate a model object by passing it a name and the function.

model = Model('simpleModel', function=some_model)  # instantiate the model

Next, we need to specify the uncertainties and the outcomes of the model. In this case, the uncertainties are x1, x2, and x3, while the outcome is y. Both uncertainties and outcomes are attributes of the model object, so we can say

# specify uncertainties
model.uncertainties = [RealParameter("x1", 0.1, 10),
                       RealParameter("x2", -0.01, 0.01),
                       RealParameter("x3", -0.01, 0.01)]
# specify outcomes
model.outcomes = [ScalarOutcome('y')]

Here, we specify that x1 is some value between 0.1 and 10, while both x2 and x3 are somewhere between -0.01 and 0.01. Having implemented this model, we can now investigate its behavior over the set of uncertainties by simply calling

results = perform_experiments(model, 100)

The function perform_experiments() takes the model we just specified and executes 100 experiments. By default, these experiments are generated using Latin hypercube sampling, but Monte Carlo sampling and full factorial sampling are also readily available. Read the documentation for perform_experiments() for more details.
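
For instance, a different sampling scheme can be selected through the Samplers enum (a sketch; this assumes the MC member of the enum for Monte Carlo sampling):

from ema_workbench import Samplers

# Monte Carlo sampling instead of the default Latin hypercube sampling
results = perform_experiments(model, 100, uncertainty_sampling=Samplers.MC)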

The complete code:

 1"""
 2Created on 20 dec. 2010
 3
 4This file illustrated the use the EMA classes for a contrived example
 5It's main purpose has been to test the parallel processing functionality
 6
 7.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 8"""
 9from ema_workbench import (
10    Model,
11    RealParameter,
12    ScalarOutcome,
13    ema_logging,
14    perform_experiments,
15)
16
17
18def some_model(x1=None, x2=None, x3=None):
19    return {"y": x1 * x2 + x3}
20
21
22if __name__ == "__main__":
23    ema_logging.LOG_FORMAT = "[%(name)s/%(levelname)s/%(processName)s] %(message)s"
24    ema_logging.log_to_stderr(ema_logging.INFO)
25
26    model = Model("simpleModel", function=some_model)  # instantiate the model
27
28    # specify uncertainties
29    model.uncertainties = [
30        RealParameter("x1", 0.1, 10),
31        RealParameter("x2", -0.01, 0.01),
32        RealParameter("x3", -0.01, 0.01),
33    ]
34    # specify outcomes
35    model.outcomes = [ScalarOutcome("y")]
36
37    results = perform_experiments(model, 100)

A simple model in Vensim

Imagine we have a very simple Vensim model:

_images/simpleVensimModel.png

For this example, we assume that ‘x11’ and ‘x12’ are uncertain. The state variable ‘a’ is the outcome of interest. Similar to the previous example, we first have to instantiate a Vensim model object, in this case VensimModel. To this end, we need to specify the directory in which the Vensim file resides, the name of the Vensim file, and the name of the model.

wd = r'./models/vensim example'
model = VensimModel("simpleModel", wd=wd, model_file='model.vpm')

Next, we can specify the uncertainties and the outcomes.

model.uncertainties = [RealParameter("x11", 0, 2.5),
                       RealParameter("x12", -2.5, 2.5)]


model.outcomes = [TimeSeriesOutcome('a')]

Note that we are using a TimeSeriesOutcome, because Vensim results are time series. We can now simply run this model by calling perform_experiments().

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(1000)

We now use an evaluator, which ensures that the code is executed in parallel.

It is generally good practice to first run a model a small number of times sequentially prior to running it in parallel. In this way, bugs and the like can be spotted more easily. To further help with keeping track of what is going on, it is also good practice to make use of the logging functionality provided by the workbench.
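
For the first of these practices, a minimal sketch (assuming SequentialEvaluator, which the workbench provides alongside MultiprocessingEvaluator):

from ema_workbench import SequentialEvaluator

# a small sequential smoke test before switching to the parallel evaluator
with SequentialEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(10)

For the second practice, turning on logging is a single line: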

ema_logging.log_to_stderr(ema_logging.INFO)

Typically, this line appears at the start of the script. When executing the code, messages on progress or on errors will be shown.

The complete code:

 1"""
 2Created on 3 Jan. 2011
 3
 4This file illustrated the use the EMA classes for a contrived vensim
 5example
 6
 7
 8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9                chamarat <c.hamarat (at) tudelft (dot) nl>
10"""
11from ema_workbench import (
12    TimeSeriesOutcome,
13    perform_experiments,
14    RealParameter,
15    ema_logging,
16)
17
18from ema_workbench.connectors.vensim import VensimModel
19
20if __name__ == "__main__":
21    # turn on logging
22    ema_logging.log_to_stderr(ema_logging.INFO)
23
24    # instantiate a model
25    wd = "./models/vensim example"
26    vensimModel = VensimModel("simpleModel", wd=wd, model_file="model.vpm")
27    vensimModel.uncertainties = [
28        RealParameter("x11", 0, 2.5),
29        RealParameter("x12", -2.5, 2.5),
30    ]
31
32    vensimModel.outcomes = [TimeSeriesOutcome("a")]
33
34    results = perform_experiments(vensimModel, 1000)

A simple model in Excel

In order to perform EMA on an Excel model, one can use the ExcelModel class. This class makes use of named cells in Excel to refer to them directly. That is, we can assume that the names of the uncertainties correspond to named cells in Excel, and similarly, that the names of the outcomes correspond to named cells or ranges of cells in Excel. When using this class, make sure that the decimal separator and thousands separator are set correctly in Excel. This can be checked via File > Options > Advanced. These separators should follow the Anglo-Saxon convention.

 1"""
 2Created on 27 Jul. 2011
 3
 4This file illustrated the use the EMA classes for a model in Excel.
 5
 6It used the excel file provided by
 7`A. Sharov <https://home.comcast.net/~sharov/PopEcol/lec10/fullmod.html>`_
 8
 9This excel file implements a simple predator prey model.
10
11.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
12"""
13from ema_workbench import (
14    RealParameter,
15    TimeSeriesOutcome,
16    ema_logging,
17    perform_experiments,
18)
19
20from ema_workbench.connectors.excel import ExcelModel
21from ema_workbench.em_framework.evaluators import MultiprocessingEvaluator
22
23if __name__ == "__main__":
24    ema_logging.log_to_stderr(level=ema_logging.INFO)
25
26    model = ExcelModel(
27        "predatorPrey", wd="./models/excelModel", model_file="excel example.xlsx"
28    )
29    model.uncertainties = [
30        RealParameter("K2", 0.01, 0.2),
31        # we can refer to a cell in the normal way
32        # we can also use named cells
33        RealParameter("KKK", 450, 550),
34        RealParameter("rP", 0.05, 0.15),
35        RealParameter("aaa", 0.00001, 0.25),
36        RealParameter("tH", 0.45, 0.55),
37        RealParameter("kk", 0.1, 0.3),
38    ]
39
40    # specification of the outcomes
41    model.outcomes = [
42        TimeSeriesOutcome("B4:B1076"),
43        # we can refer to a range in the normal way
44        TimeSeriesOutcome("P_t"),
45    ]  # we can also use named range
46
47    # name of the sheet
48    model.default_sheet = "Sheet1"
49
50    with MultiprocessingEvaluator(model) as evaluator:
51        results = perform_experiments(
52            model, 100, reporting_interval=1, evaluator=evaluator
53        )

The example is relatively straightforward. We instantiate an Excel model and specify the uncertainties and the outcomes. We also need to specify the sheet in Excel on which the model resides. Next, we can call perform_experiments().

Warning

When using named cells, make sure that the names are defined at the sheet level and not at the workbook level.

A more elaborate example: Mexican Flu

This example is derived from Pruyt & Hamarat (2010). This paper presents a small exploratory System Dynamics model related to the dynamics of the 2009 flu pandemic, also known as the Mexican flu, swine flu, or A(H1N1)v. The model was developed in May 2009 in order to quickly foster understanding about the possible dynamics of this new flu variant and to perform rough-cut policy explorations. Later, the model was also used to further develop and illustrate Exploratory Modelling and Analysis.

Mexican Flu: the basic model

In the first days, weeks, and months after the first reports about the outbreak of a new flu variant in Mexico and the USA, much remained unknown about the possible dynamics and consequences of the, at the time, plausible/imminent epidemic/pandemic of the new flu variant, first known as Swine or Mexican flu and known today as Influenza A(H1N1)v.

The exploratory model presented here is small, simple, high-level, data-poor (no complex/special structures nor detailed data beyond crude guesstimates), and history-poor.

The modelled world is divided into three regions: the Western World, the densely populated Developing World, and the scarcely populated Developing World. Only the first two regions are included in the model, because it is assumed that the scarcely populated regions are causally less important for the dynamics of flu pandemics. The figure below shows the basic stock-and-flow structure. For a more elaborate description of the model, see Pruyt & Hamarat (2010).

_images/flu-model.png

Given the various uncertainties about the exact characteristics of the flu, including its fatality rate, the contact rate, the susceptibility of the population, etc., the flu case is an ideal candidate for EMA. One can use EMA to explore the kinds of dynamics that can occur, identify undesirable dynamics, and develop policies targeted at those undesirable dynamics.

In the original paper, Pruyt & Hamarat (2010) recoded the model in Python and performed the analysis in that way. Here we show how the EMA workbench can be connected to Vensim directly.

The flu model was built in Vensim. We can thus use VensimModel as a base class.

We are interested in two outcomes:

  • deceased population region 1: the total number of deaths over the duration of the simulation.

  • peak infected fraction: the peak fraction of the population that is infected during the simulation.

These are added to self.outcomes, using the TimeSeriesOutcome class.

The table below is adapted from Pruyt & Hamarat (2010). It shows the uncertainties and their bounds. These are added to self.uncertainties as RealParameter instances.

Parameter                                                   Lower Limit   Upper Limit
additional seasonal immune population fraction region 1    0.0           0.5
additional seasonal immune population fraction region 2    0.0           0.5
fatality ratio region 1                                    0.0001        0.1
fatality ratio region 2                                    0.0001        0.1
initial immune fraction of the population of region 1      0.0           0.5
initial immune fraction of the population of region 2      0.0           0.5
normal interregional contact rate                          0.0           0.9
permanent immune population fraction region 1              0.0           0.5
permanent immune population fraction region 2              0.0           0.5
recovery time region 1                                     0.2           0.8
recovery time region 2                                     0.2           0.8
root contact rate region 1                                 1.0           10.0
root contact rate region 2                                 1.0           10.0
infection ratio region 1                                   0.0           0.1
infection ratio region 2                                   0.0           0.1
normal contact rate region 1                               10            200
normal contact rate region 2                               10            200

Together, this results in the following code:

 1"""
 2Created on 20 May, 2011
 3
 4This module shows how you can use vensim models directly
 5instead of coding the model in Python. The underlying case
 6is the same as used in fluExample
 7
 8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9                epruyt <e.pruyt (at) tudelft (dot) nl>
10"""
11from ema_workbench import (
12    RealParameter,
13    TimeSeriesOutcome,
14    ema_logging,
15    perform_experiments,
16    MultiprocessingEvaluator,
17)
18
19from ema_workbench.connectors.vensim import VensimModel
20
21if __name__ == "__main__":
22    ema_logging.log_to_stderr(ema_logging.INFO)
23
24    model = VensimModel(
25        "fluCase", wd="./models/flu", model_file="FLUvensimV1basecase.vpm"
26    )
27
28    # outcomes
29    model.outcomes = [
30        TimeSeriesOutcome("deceased population region 1"),
31        TimeSeriesOutcome("infected fraction R1"),
32    ]
33
34    # Plain Parametric Uncertainties
35    model.uncertainties = [
36        RealParameter("additional seasonal immune population fraction R1", 0, 0.5),
37        RealParameter("additional seasonal immune population fraction R2", 0, 0.5),
38        RealParameter("fatality ratio region 1", 0.0001, 0.1),
39        RealParameter("fatality rate region 2", 0.0001, 0.1),
40        RealParameter("initial immune fraction of the population of region 1", 0, 0.5),
41        RealParameter("initial immune fraction of the population of region 2", 0, 0.5),
42        RealParameter("normal interregional contact rate", 0, 0.9),
43        RealParameter("permanent immune population fraction R1", 0, 0.5),
44        RealParameter("permanent immune population fraction R2", 0, 0.5),
45        RealParameter("recovery time region 1", 0.1, 0.75),
46        RealParameter("recovery time region 2", 0.1, 0.75),
47        RealParameter("susceptible to immune population delay time region 1", 0.5, 2),
48        RealParameter("susceptible to immune population delay time region 2", 0.5, 2),
49        RealParameter("root contact rate region 1", 0.01, 5),
50        RealParameter("root contact ratio region 2", 0.01, 5),
51        RealParameter("infection ratio region 1", 0, 0.15),
52        RealParameter("infection rate region 2", 0, 0.15),
53        RealParameter("normal contact rate region 1", 10, 100),
54        RealParameter("normal contact rate region 2", 10, 200),
55    ]
56
57    nr_experiments = 10
58    with MultiprocessingEvaluator(model) as evaluator:
59        results = perform_experiments(model, nr_experiments, evaluator=evaluator)

We have now instantiated the model, specified the uncertain factors and outcomes, and run the model. We have thus generated a dataset of results and can proceed to analyse these results using various analysis scripts. As a first step, one can look at the individual runs using a line plot, via lines(). See plotting for some more visualizations using results from performing EMA on the flu model.

import matplotlib.pyplot as plt
from ema_workbench.analysis.plotting import lines

figure = lines(results, density=True)  # show lines, and end state density
plt.show()  # show figure

generates the following figure:

_images/tutorial-lines.png

From this figure, one can deduce that across the ensemble of possible futures, there is a subset of runs with a substantial amount of deaths. We can zoom in on those cases, identify their conditions for occurring, and use this insight for policy design.

For further analysis, it is generally convenient to generate the results for a series of experiments and save these results. One can then use these saved results in various analysis scripts.

from ema_workbench import save_results
save_results(results, r'./1000 runs.tar.gz')

The above code snippet shows how we can use save_results() for saving the results of our experiments. save_results() stores the results as csv files in a tarball.
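
The counterpart, load_results(), restores the saved experiments and outcomes later, for example in a separate analysis script:

from ema_workbench import load_results

experiments, outcomes = load_results(r'./1000 runs.tar.gz')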

Mexican Flu: policies

For this paper, policies were developed using the analysts’ understanding of the system.

static policy
adaptive policy
running the policies

In order to be able to run the models with the policies and to compare their results with the no-policy case, we need to specify the policies:

# add policies
policies = [Policy('no policy',
                   model_file=r'FLUvensimV1basecase.vpm'),
            Policy('static policy',
                   model_file=r'FLUvensimV1static.vpm'),
            Policy('adaptive policy',
                   model_file=r'FLUvensimV1dynamic.vpm')
            ]

In this case, we have chosen to implement the policies in separate Vensim files. Policies require a name and can take any other keyword arguments you like. If a keyword matches an attribute of the model object, that attribute is updated when the policy is executed; model_file is an attribute of the Vensim model, so each policy points the model to a different file. We can pass these policies to perform_experiments() as an additional keyword argument:

results = perform_experiments(model, 1000, policies=policies)

We can now proceed in the same way as before, and perform a series of experiments. Together, this results in the following code:

 1"""
 2Created on 20 May, 2011
 3
 4This module shows how you can use vensim models directly
 5instead of coding the model in Python. The underlying case
 6is the same as used in fluExample
 7
 8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9                epruyt <e.pruyt (at) tudelft (dot) nl>
10"""
11import numpy as np
12
13from ema_workbench import (
14    RealParameter,
15    TimeSeriesOutcome,
16    ema_logging,
17    ScalarOutcome,
18    perform_experiments,
19)
20from ema_workbench.connectors.vensim import VensimModel
21from ema_workbench.em_framework.parameters import Policy
22
23if __name__ == "__main__":
24    ema_logging.log_to_stderr(ema_logging.INFO)
25
26    model = VensimModel(
27        "fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm"
28    )
29
30    # outcomes
31    model.outcomes = [
32        TimeSeriesOutcome("deceased population region 1"),
33        TimeSeriesOutcome("infected fraction R1"),
34        ScalarOutcome(
35            "max infection fraction",
36            variable_name="infected fraction R1",
37            function=np.max,
38        ),
39    ]
40
41    # Plain Parametric Uncertainties
42    model.uncertainties = [
43        RealParameter("additional seasonal immune population fraction R1", 0, 0.5),
44        RealParameter("additional seasonal immune population fraction R2", 0, 0.5),
45        RealParameter("fatality ratio region 1", 0.0001, 0.1),
46        RealParameter("fatality rate region 2", 0.0001, 0.1),
47        RealParameter("initial immune fraction of the population of region 1", 0, 0.5),
48        RealParameter("initial immune fraction of the population of region 2", 0, 0.5),
49        RealParameter("normal interregional contact rate", 0, 0.9),
50        RealParameter("permanent immune population fraction R1", 0, 0.5),
51        RealParameter("permanent immune population fraction R2", 0, 0.5),
52        RealParameter("recovery time region 1", 0.1, 0.75),
53        RealParameter("recovery time region 2", 0.1, 0.75),
54        RealParameter("susceptible to immune population delay time region 1", 0.5, 2),
55        RealParameter("susceptible to immune population delay time region 2", 0.5, 2),
56        RealParameter("root contact rate region 1", 0.01, 5),
57        RealParameter("root contact ratio region 2", 0.01, 5),
58        RealParameter("infection ratio region 1", 0, 0.15),
59        RealParameter("infection rate region 2", 0, 0.15),
60        RealParameter("normal contact rate region 1", 10, 100),
61        RealParameter("normal contact rate region 2", 10, 200),
62    ]
63
64    # add policies
65    policies = [
66        Policy("no policy", model_file=r"FLUvensimV1basecase.vpm"),
67        Policy("static policy", model_file=r"FLUvensimV1static.vpm"),
68        Policy("adaptive policy", model_file=r"FLUvensimV1dynamic.vpm"),
69    ]
70
71    results = perform_experiments(model, 1000, policies=policies)
comparison of results

Using the following script, we reproduce figures similar to the 3D figures in Pruyt & Hamarat (2010), but using pairs_scatter(). It shows, for the three different policies, their behavior on the total number of deaths, the height of the highest peak of the pandemic, and the point in time at which this peak was reached.

 1"""
 2Created on 20 sep. 2011
 3
 4.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 5"""
 6import matplotlib.pyplot as plt
 7import numpy as np
 8
 9from ema_workbench import load_results, ema_logging
10from ema_workbench.analysis.pairs_plotting import (
11    pairs_lines,
12    pairs_scatter,
13    pairs_density,
14)
15
16ema_logging.log_to_stderr(level=ema_logging.DEFAULT_LEVEL)
17
18# load the data
19fh = "./data/1000 flu cases no policy.tar.gz"
20experiments, outcomes = load_results(fh)
21
22# transform the results to the required format
23# that is, we want to know the max peak and the casualties at the end of the
24# run
25tr = {}
26
27# get time and remove it from the dict
28time = outcomes.pop("TIME")
29
30for key, value in outcomes.items():
31    if key == "deceased population region 1":
32        tr[key] = value[:, -1]  # we want the end value
33    else:
34        # we want the maximum value of the peak
35        max_peak = np.max(value, axis=1)
36        tr["max peak"] = max_peak
37
38        # we want the time at which the maximum occurred
39        # the code here is a bit obscure, I don't know why the transpose
40        # of value is needed. This however does produce the appropriate results
41        logical = value.T == np.max(value, axis=1)
42        tr["time of max"] = time[logical.T]
43
44pairs_scatter(experiments, tr, filter_scalar=False)
45pairs_lines(experiments, outcomes)
46pairs_density(experiments, tr, filter_scalar=False)
47plt.show()

no policy

_images/multiplot-flu-no-policy.png

static policy

_images/multiplot-flu-static-policy.png

adaptive policy

_images/multiplot-flu-adaptive-policy.png

General Introduction

Since 2010, I have been working on the development of an open source toolkit for supporting decision-making under deep uncertainty. This toolkit is known as the exploratory modeling workbench. The motivation for this name is that, in my opinion, all model-based deep uncertainty approaches are forms of exploratory modeling, as first introduced by Bankes (1993). The design of the workbench has undergone various changes over time, but it started to stabilize in the fall of 2016. In the summer of 2017, I published a paper detailing the workbench (Kwakkel, 2017). There is an in-depth example in the paper, but for the documentation I want to provide a more tutorial-style description of the functionality of the workbench.

As a starting point, I will use the Direct Policy Search example that is available for Rhodium (Quinn et al. 2017). A quick note on terminology is in order here. I have a background in transport modeling. Here, we often use discrete event simulation models. These are intrinsically stochastic models. It is standard practice to run these models several times and take descriptive statistics over the set of runs. In discrete event simulation, and also in the context of agent-based modeling, this is known as running replications. The workbench adopts this terminology and draws a sharp distinction between designing experiments over a set of deeply uncertain factors, and performing replications of each experiment to cope with stochastic uncertainty.

[1]:
import math

# more or less default imports when using
# the workbench
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.optimize import brentq


def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
    """

    Parameters
    ----------
    xt : float
         pollution in lake at time t
    c1 : float
         center rbf 1
    c2 : float
         center rbf 2
    r1 : float
         radius rbf 1
    r2 : float
         radius rbf 2
    w1 : float
         weight of rbf 1

    Returns
    -------
    float

    note:: w2 = 1 - w1

    """

    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
    at1 = max(rule, 0.01)
    at = min(at1, 0.1)

    return at


def lake_model(
    b=0.42,
    q=2.0,
    mean=0.02,
    stdev=0.001,
    delta=0.98,
    alpha=0.4,
    nsamples=100,
    myears=100,
    c1=0.25,
    c2=0.25,
    r1=0.5,
    r2=0.5,
    w1=0.5,
    seed=None,
):
    """runs the lake model for nsamples stochastic realisation using
    specified random seed.

    Parameters
    ----------
    b : float
        decay rate for P in lake (0.42 = irreversible)
    q : float
        recycling exponent
    mean : float
            mean of natural inflows
    stdev : float
            standard deviation of natural inflows
    delta : float
            future utility discount rate
    alpha : float
            utility from pollution
    nsamples : int, optional
    myears : int, optional
    c1 : float
    c2 : float
    r1 : float
    r2 : float
    w1 : float
    seed : int, optional
           seed for the random number generator

    Returns
    -------
    tuple

    """
    np.random.seed(seed)
    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)

    X = np.zeros((myears,))
    average_daily_P = np.zeros((myears,))
    reliability = 0.0
    inertia = 0
    utility = 0

    for _ in range(nsamples):
        X[0] = 0.0
        decision = 0.1

        decisions = np.zeros(
            myears,
        )
        decisions[0] = decision

        natural_inflows = np.random.lognormal(
            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
            size=myears,
        )

        for t in range(1, myears):

            # here we use the decision rule
            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
            decisions[t] = decision

            X[t] = (
                (1 - b) * X[t - 1]
                + X[t - 1] ** q / (1 + X[t - 1] ** q)
                + decision
                + natural_inflows[t - 1]
            )
            average_daily_P[t] += X[t] / nsamples

        reliability += np.sum(X < Pcrit) / (nsamples * myears)
        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
        utility += (
            np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
        )
    max_P = np.max(average_daily_P)
    return max_P, utility, inertia, reliability
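
In equation form, the pollution dynamics implemented above read (a transcription of the code, not a quotation from the underlying papers):

$$X_{t+1} = (1 - b)\,X_t + \frac{X_t^q}{1 + X_t^q} + a_t + \epsilon_t$$

where $\epsilon_t$ are the lognormally distributed natural inflows and $a_t$ is the anthropogenic release chosen by the RBF rule in get_antropogenic_release:

$$a_t = \min\left(\max\left(w_1 \left(\frac{|X_t - c_1|}{r_1}\right)^3 + (1 - w_1)\left(\frac{|X_t - c_2|}{r_2}\right)^3,\ 0.01\right),\ 0.1\right)$$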

Connecting a Python function to the workbench

Now we are ready to connect this model to the workbench. We have to specify the uncertainties, the outcomes, and the policy levers. For the uncertainties and the levers, we can use real-valued parameters, integer-valued parameters, and categorical parameters (see the sketch after the next code cell). For outcomes, we can use either scalar, single-valued outcomes or time series outcomes. For convenience, we can also explicitly control constants in case we want to set them to a value different from their default.

[2]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
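
The other parameter types work the same way. A hypothetical sketch (n_agents and strategy are illustrative names, not part of the lake model):

from ema_workbench import IntegerParameter, CategoricalParameter

# hypothetical parameters, for illustration only
extra_uncertainties = [
    IntegerParameter("n_agents", 10, 100),  # a range of integers
    CategoricalParameter("strategy", ["a", "b", "c"]),  # an unordered set
]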

Performing experiments

Now that we have specified the model with the workbench, we are ready to perform experiments on it. We can use evaluators to distribute these experiments either over multiple cores on a single machine, or over a cluster using ipyparallel. Using any parallelization is an advanced topic, in particular if you are on a Windows machine. The code as presented here will run fine in parallel on a Mac or Linux machine. If you are trying to run this in parallel using multiprocessing on a Windows machine from within a Jupyter notebook, it won’t work. The solution is to move lake_model and get_antropogenic_release to a separate Python module and import the lake model function into the notebook.

Another common practice when working with the exploratory modeling workbench is to turn on the logging functionality that it provides. This will report on the progress of the experiments, as well as provide more insight into what is happening, in particular in case of errors.

If we want to perform experiments on the model we have just defined, we can use the perform_experiments method on the evaluator, or the standalone perform_experiments function. We can perform experiments over the uncertainties and/or over the levers. Any given parameterization of the levers is known as a policy, while any given parameterization of the uncertainties is known as a scenario. Each policy is evaluated over each of the scenarios. So if we want to use 100 scenarios and 10 policies, this means that we will end up performing 100 * 10 = 1000 experiments. By default, the workbench uses Latin hypercube sampling both for sampling over the levers and for sampling over the uncertainties. However, the workbench also offers support for full factorial, partial factorial, and Monte Carlo sampling, as well as wrappers for the various sampling schemes provided by SALib.

The ema_workbench offers support for parallelizing the execution of the experiments using either multiprocessing or ipyparallel. There are various OS-specific concerns to keep in mind when using either of these libraries, so please have a look at their documentation before using them.

[3]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=1000, policies=10)
[MainProcess/INFO] pool started with 12 workers
[MainProcess/INFO] performing 1000 scenarios * 10 policies * 1 model(s) = 10000 experiments
100%|████████████████████████████████████| 10000/10000 [02:18<00:00, 72.21it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

Processing the results of the experiments

By default, the return of perform_experiments is a tuple of length 2. The first item in the tuple is the experiments; the second item is the outcomes. Experiments and outcomes are aligned by index. The experiments are stored in a pandas DataFrame, while the outcomes are a dict with the name of the outcome as key and a numpy array as value.

[4]:
experiments, outcomes = results
print(experiments.shape)
print(list(outcomes.keys()))
(10000, 13)
['max_P', 'utility', 'inertia', 'reliability']

We can easily visualize these results. The workbench comes with various convenience plotting functions built on top of matplotlib and seaborn. We can, however, also easily use seaborn or matplotlib directly. For example, we can create a pairplot using seaborn, where we group our outcomes by policy. For this, we need to create a DataFrame with the outcomes and a policy column. By default, the name of a policy is a string representation of the dict with lever names and values. We can easily replace this with a number, as shown below.

[5]:
policies = experiments["policy"]
for i, policy in enumerate(np.unique(policies)):
    experiments.loc[policies == policy, "policy"] = str(i)

data = pd.DataFrame(outcomes)
data["policy"] = policies

Next, all that is left is to use seaborn’s pairplot function to visualize the data.

[6]:
sns.pairplot(data, hue="policy", vars=list(outcomes.keys()))
plt.show()
_images/indepth_tutorial_general-introduction_11_0.png

Open exploration

In this second tutorial, I will showcase how to use the ema_workbench for performing open exploration. This tutorial continues with the same example as used in the previous tutorial.

some background

In exploratory modeling, we are interested in understanding how regions in the uncertainty space and/or the decision space map to the whole outcome space, or partitions thereof. There are two general approaches for investigating this mapping. The first one is through systematic sampling of the uncertainty or decision space. This is sometimes also known as open exploration. The second one is to search through the space in a directed manner using some type of optimization approach. This is sometimes also known as directed search.

The workbench supports both open exploration and directed search. Both can be applied to investigate the mapping of the uncertainty space and/or the decision space to the outcome space. In most applications, search is used for finding promising mappings from the decision space to the outcome space, while exploration is used to stress test these mappings under a whole range of possible resolutions of the various uncertainties. This need not be the case, however. Optimization can be used to discover the worst possible scenario, while sampling can be used to get insight into the sensitivity of the outcomes to the various decision levers.

open exploration

To showcase the open exploration functionality, let’s start with a basic example using the Direct Policy Search (DPS) version of the lake problem (Quinn et al. 2017). This is the same model as we used in the general introduction. Note that, for convenience, I have moved the code for the model to a module called dps_lake_model.py, which I import here for further use.

We are going to simultaneously sample over the uncertainties and the decision levers. We are going to generate 1000 scenarios and 5 policies, and see how they jointly affect the outcomes. A scenario is understood as a point in the uncertainty space, while a policy is a point in the decision space. The combination of a scenario and a policy is called an experiment. The uncertainty space is spanned by the uncertainties, while the decision space is spanned by the levers.

Both uncertainties and levers are instances of RealParameter (a continuous range), IntegerParameter (a range of integers), or CategoricalParameter (an unordered set of things). By default, the workbench uses Latin hypercube sampling for generating both the scenarios and the policies. Each policy is always evaluated over all scenarios (i.e., a full factorial over scenarios and policies).

[1]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
[2]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model, n_processes=7) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(scenarios=1000, policies=5)
[MainProcess/INFO] pool started with 7 workers
[MainProcess/INFO] performing 1000 scenarios * 5 policies * 1 model(s) = 5000 experiments
100%|██████████████████████████████████████| 5000/5000 [01:02<00:00, 80.06it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

Visual analysis

Having generated these results, the next step is to analyze them and see what we can learn from them. The workbench comes with a variety of techniques for this analysis. A simple first step is to make a few quick visualizations of the results. The workbench has convenience functions for this, but it is also possible to create your own visualizations using the scientific Python stack.

[3]:
from ema_workbench.analysis import pairs_plotting

fig, axes = pairs_plotting.pairs_scatter(
    experiments, outcomes, group_by="policy", legend=False
)
fig.set_size_inches(8, 8)
plt.show()
[MainProcess/INFO] no time dimension found in results
_images/indepth_tutorial_open-exploration_4_1.png

Often, it is convenient to separate the process of performing the experiments from the analysis. To make this possible, the workbench offers convenience functions for storing results to disk and loading them from disk. The workbench stores the results in a tarball with .csv files and separate metadata files. This is a convenient format that has proven sufficient over the years.

from ema_workbench import save_results
save_results(results, '1000 scenarios 5 policies.tar.gz')

from ema_workbench import load_results
results = load_results('1000 scenarios 5 policies.tar.gz')

advanced analysis

In addition to visual analysis, the workbench comes with a variety of techniques to perform a more in-depth analysis of the results. Other analyses can also be performed by utilizing the scientific Python stack. The workbench comes with:

  • Scenario Discovery, a model driven approach to scenario development

  • Feature Scoring, a poor man’s alternative to global sensitivity analysis

  • Dimensional stacking, a quick visual approach drawing on feature scoring to enable scenario discovery. This approach has received limited attention in the literature. The implementation in the workbench replaces the rule mining approach with a feature scoring approach.

  • Regional sensitivity analysis

Scenario Discovery

A detailed discussion of scenario discovery can be found in an earlier blog post. For completeness, I provide a code snippet here. Compared to the previous blog post, there is one small change. The library mpld3 is currently not maintained and is broken on Python 3.5 and higher. To still utilize the interactive exploration of the trade-offs within the notebook, one can use the interactive back-end (%matplotlib notebook).

[4]:
from ema_workbench.analysis import prim

x = experiments
y = outcomes["max_P"] < 0.8
prim_alg = prim.Prim(x, y, threshold=0.8)
box1 = prim_alg.find_box()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] 5000 points remaining, containing 510 cases of interest
[MainProcess/INFO] mean: 0.9807692307692307, mass: 0.052, coverage: 0.5, density: 0.9807692307692307 restricted_dimensions: 3
[5]:
box1.show_tradeoff()
plt.show()
_images/indepth_tutorial_open-exploration_8_0.png

We can inspect any of the points on the trade-off curve using the inspect method. As shown, we can display the results either in a table format or in a visual format.

[6]:
box1.inspect(43)
box1.inspect(43, style="graph")
plt.show()
coverage    0.794118
density     0.794118
id                43
mass           0.102
mean        0.794118
res_dim            2
Name: 43, dtype: object

     box 43
        min       max                        qp values
b  0.362596  0.449742  [1.5372285485377317e-164, -1.0]
q  3.488834  4.498362   [1.3004137205191903e-88, -1.0]

_images/indepth_tutorial_open-exploration_10_1.png
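
As of the 2.2 release, inspect() also accepts a list of integers, so several points on the peeling trajectory can be inspected at once (see the changelog; the indices here are illustrative):

box1.inspect([42, 43])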
[7]:
box1.show_pairs_scatter(43)
plt.show()
_images/indepth_tutorial_open-exploration_11_0.png

feature scoring

Feature scoring is a family of techniques often used in machine learning to identify the most relevant features to include in a model. This is similar to one of the use cases for global sensitivity analysis, namely factor prioritisation. The main advantage of feature scoring techniques is that they impose no specific constraints on the experimental design, while they can handle real valued, integer valued, and categorical valued parameters. The workbench supports multiple techniques, the most useful of which generally is extra trees (Geurts et al. 2006).

For this example, we run feature scoring for each outcome of interest. We can also run it for a specific outcome if desired. Similarly, we can choose whether to run in regression mode or classification mode. The latter is applicable if the outcome is a categorical variable, and the results should be interpreted similarly to regional sensitivity analysis results. For more details, see the documentation.

[8]:
from ema_workbench.analysis import feature_scoring

x = experiments
y = outcomes

fs = feature_scoring.get_feature_scores_all(x, y)
sns.heatmap(fs, cmap="viridis", annot=True)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
_images/indepth_tutorial_open-exploration_13_1.png

From the results, we see that max_P is primarily influenced by b, while utility is driven by delta; for inertia and reliability the situation is a little less clear-cut.

The foregoing feature scoring used the raw values of the outcomes. However, in scenario discovery applications, we are typically dealing with a binary classification. This might produce slightly different results, as demonstrated below.

[11]:
from ema_workbench.analysis import RuleInductionType

x = experiments
y = outcomes["max_P"] < 0.8

fs, alg = feature_scoring.get_ex_feature_scores(
    x, y, mode=RuleInductionType.CLASSIFICATION
)
fs.sort_values(ascending=False, by=1)
[MainProcess/INFO] model dropped from analysis because only a single category
[11]:
                1
b        0.458094
q        0.310989
mean     0.107732
delta    0.051212
stdev    0.047653
policy   0.005475
w1       0.004815
r2       0.003658
r1       0.003607
c1       0.003467
c2       0.003299

Here we ran extra trees feature scoring on a binary vector for max_P. The b parameter is still important, as in the previous case, but the introduction of the binary classification now also highlights some additional parameters as being potentially relevant.

dimensional stacking

Dimensional stacking was suggested as a more visual approach to scenario discovery. It involves two steps: identifying the most important uncertainties that affect system behavior, and creating a pivot table using the most influential uncertainties. In order to do this, we first need, as in scenario discovery, to specify the outcomes that are of interest. Creating the pivot table involves binning the uncertainties. More details can be found in Suzuki et al. (2015) or by looking through the code in the workbench. Compared to Suzuki et al., the workbench uses feature scoring for determining the most influential uncertainties. The code is set up in a modular way, so other approaches to global sensitivity analysis can easily be used as well if so desired.

[12]:
from ema_workbench.analysis import dimensional_stacking

x = experiments
y = outcomes["max_P"] < 0.8
dimensional_stacking.create_pivot_plot(x, y, 2, nbins=3)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category
_images/indepth_tutorial_open-exploration_18_1.png

We can see from this visual that if b is high while q is high, we have a high concentration of cases where pollution stays below 0.8. The mean and stdev have some limited additional influence. By playing around with an alternative number of bins or a different number of layers, patterns can be coarsened or refined, as sketched below.
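
For instance (a sketch; the third positional argument is the number of layers, as in the call above):

# a coarser view: fewer bins, one more layer
dimensional_stacking.create_pivot_plot(x, y, 3, nbins=2)
plt.show()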

regional sensitivity analysis

A fourth approach for supporting scenario discovery is to perform a regional sensitivity analysis. The workbench implements a visual approach based on plotting the empirical CDF given a classification vector. Please look at section 3.4 of Pianosi et al. (2016) for more details.

[13]:
from ema_workbench.analysis import regional_sa
from numpy.lib import recfunctions as rf

sns.set_style("white")

# model is the same across experiments
x = experiments.copy()
x = x.drop("model", axis=1)
y = outcomes["max_P"] < 0.8
fig = regional_sa.plot_cdfs(x, y)
sns.despine()
plt.show()
_images/indepth_tutorial_open-exploration_20_0.png

The above results clearly show that both b and q are important. To a lesser extent, the mean is also relevant.

More advanced sampling techniques

The workbench can also be used for more advanced sampling techniques. To achieve this, it relies on SALib. On the workbench side, the only change is to specify the sampler we want to use. Next, we can use SALib directly to perform the analysis. To help with this, the workbench provides a convenience function for generating the problem dict that SALib requires. The example below focuses on performing Sobol analysis on the uncertainties, but we could do the exact same thing with the levers instead. The only changes required would be to set lever_sampling instead of uncertainty_sampling, and to get the SALib problem dict based on the levers; see the sketch after the Sobol run below.

[17]:
from SALib.analyze import sobol
from ema_workbench import Samplers
from ema_workbench.em_framework.salib_samplers import get_SALib_problem

with MultiprocessingEvaluator(model) as evaluator:
    sa_results = evaluator.perform_experiments(
        scenarios=1000, uncertainty_sampling=Samplers.SOBOL
    )

experiments, outcomes = sa_results

problem = get_SALib_problem(model.uncertainties)
Si = sobol.analyze(
    problem, outcomes["max_P"], calc_second_order=True, print_to_console=False
)
[MainProcess/INFO] pool started with 12 workers
/Users/jhkwakkel/opt/anaconda3/lib/python3.9/site-packages/SALib/sample/saltelli.py:94: UserWarning:
        Convergence properties of the Sobol' sequence is only valid if
        `N` (1000) is equal to `2^n`.

  warnings.warn(msg)
[MainProcess/INFO] performing 12000 scenarios * 1 policies * 1 model(s) = 12000 experiments
100%|████████████████████████████████████| 12000/12000 [02:37<00:00, 76.17it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
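
The levers variant mentioned above follows the same pattern (a sketch; lever_sampling is the lever counterpart of uncertainty_sampling named earlier):

with MultiprocessingEvaluator(model) as evaluator:
    sa_results = evaluator.perform_experiments(
        policies=1000, lever_sampling=Samplers.SOBOL
    )

problem = get_SALib_problem(model.levers)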

We have now completed the Sobol analysis and have calculated the metrics. What remains is to visualize them, which can be done as shown below, focusing on ST and S1. The error bars indicate the confidence intervals.

[18]:
scores_filtered = {k: Si[k] for k in ["ST", "ST_conf", "S1", "S1_conf"]}
Si_df = pd.DataFrame(scores_filtered, index=problem["names"])

sns.set_style("white")
fig, ax = plt.subplots(1)

indices = Si_df[["S1", "ST"]]
err = Si_df[["S1_conf", "ST_conf"]]

indices.plot.bar(yerr=err.values.T, ax=ax)
fig.set_size_inches(8, 6)
fig.subplots_adjust(bottom=0.3)
plt.show()
_images/indepth_tutorial_open-exploration_25_0.png

Best practices

Separate experimentation and analysis

It is strongly recommended to cleanly separate the various steps in your exploratory modeling pipeline: execute your experiments or perform your optimization, save the results, and only then analyze these results. Moreover, since parallel execution can be troublesome within the Jupyter Lab / notebook environment, I personally run my experiments and optimizations either from the command line or through an IDE using a normal Python file, and then use Jupyter Lab to analyze the results.
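As an example, a minimal sketch of this split using the workbench's save_results and load_results helpers (file names are illustrative):

# experiments.py, run from the command line or an IDE
from ema_workbench import MultiprocessingEvaluator, save_results

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=1000)
save_results(results, "./results/1000_experiments.tar.gz")

# analysis.ipynb
from ema_workbench import load_results

experiments, outcomes = load_results("./results/1000_experiments.tar.gz")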

Keeping things organized

A frequently recurring cause of problems when using the workbench stems from not properly organizing your files. In particular when using multiprocessing, it is key that you keep things well organized. The way the workbench works with multiprocessing is that it copies the entire working directory of the model to a temporary folder for each subprocess. This temporary folder is located in the same folder as the python or notebook file from which you are running. If the working directory of your model is the same as the directory in which the run file resides, you can easily fill up your hard disk in minutes. To avoid these kinds of problems, I suggest using a directory structure as outlined below.

project
├── model_files
│   ├── a_model.nlogo
│   └── some_input.csv
├── results
│   ├── 100k_nfe_seed1.csv
│   └── 1000_experiments.tar.gz
├── figures
│   └── pairwise_scatter.png
├── experiments.py
├── optimization.py
├── analysis.ipynb
└── model_definition.py

Also, if you are not familiar with absolute and relative paths, please read up on that first, and only use relative paths when using the workbench. Not only will this reduce the possibility of errors, it will also make moving your code from one machine to another a lot easier.
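For instance, with the layout above, model_definition.py could point the model at the model_files directory using a relative path. A sketch, assuming a NetLogo model as in the tree above:

from ema_workbench.connectors.netlogo import NetLogoModel

model = NetLogoModel(
    "a_model",
    wd="./model_files",  # this directory is copied for each subprocess
    model_file="a_model.nlogo",  # relative to wd
)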

Vensim Tips and Tricks

Debugging a model

A commonly occurring problem is that some of the runs of a Vensim model do not complete correctly. In the logger, we see a message stating that a run did not complete correctly, with a description of the case that caused the error attached to it. Typically, this error is due to a division by zero somewhere in the model during the simulation. The easiest way of finding the source of the division by zero is via Vensim itself. However, this requires that the model is parameterized as specified by the case that created the error. It is of course possible to set all the parameters by hand, but this becomes tedious for larger models, or if one has to do it multiple times. Since the Vensim DLL does not have a way to save a model, we cannot use the DLL. Instead, we can use the fact that a Vensim model can be saved as a text file. By changing the required parameters in this text file via the workbench, we can then open the modified model in Vensim and spot the error.

The following script can be used for this purpose.

 1"""
 2Created on 11 aug. 2011
 3
 4.. codeauthor:: wauping <w.auping (at) student (dot) tudelft (dot) nl>
 5                jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 6
 7
 8To be able to debug the Vensim model, a few steps are needed:
 9
10    1.  The case that gave a bug, needs to be saved in a text  file. The entire
11        case description should be on a single line.
12    2.  Reform and clean your model ( In the Vensim menu: Model, Reform and
13        Clean). Choose
14
15         * Equation Order: Alphabetical by group (not really necessary)
16         * Equation Format: Terse
17
18    3.  Save your model as text (File, Save as..., Save as Type: Text Format
19        Models
20    4.  Run this script
21    5.  If the print in the end is not set([]), but set([array]), the array
22        gives the values that where not found and changed
23    5.  Run your new model (for example 'new text.mdl')
24    6.  Vensim tells you about your critical mistake
25
26"""
27fileSpecifyingError = ""
28
29pathToExistingModel = "/salinization/Verzilting_aanpassingen incorrect.mdl"
30pathToNewModel = "/models/salinization/Verzilting_aanpassingen correct.mdl"
31newModel = open(pathToNewModel, "w")
32
33# line = open(fileSpecifyingError).read()
34
35line = "rainfall : 0.154705633188; adaptation time from non irrigated agriculture : 0.915157119079; salt effect multiplier : 1.11965969891; adaptation time to non irrigated agriculture : 0.48434342934; adaptation time to irrigated agriculture : 0.330990830832; water shortage multiplier : 0.984356102036; delay time salt seepage : 6.0; adaptation time : 6.90258192256; births multiplier : 1.14344734715; diffusion lookup : [(0, 8.0), (10, 8.0), (20, 8.0), (30, 8.0), (40, 7.9999999999999005), (50, 4.0), (60, 9.982194802803703e-14), (70, 1.2455526635140464e-27), (80, 1.5541686655435471e-41), (90, 1.9392517969836692e-55)]; salinity effect multiplier : 1.10500381093; technological developments in irrigation : 0.0117979353255; adaptation time from irrigated agriculture : 1.58060947607; food shortage multiplier : 0.955325345996; deaths multiplier : 0.875605669911; "
36
37# we assume the case specification was copied from the logger
38splitOne = line.split(",")
39variable = {}
40for n in range(len(splitOne) - 1):
41    splitTwo = splitOne[n].split(":")
42    variableElement = splitTwo[0]
43    # Delete the spaces and other rubish on the sides of the variable name
44    variableElement = variableElement.lstrip()
45    variableElement = variableElement.lstrip("'")
46    variableElement = variableElement.rstrip()
47    variableElement = variableElement.rstrip("'")
48    print(variableElement)
49    valueElement = splitTwo[1]
50    valueElement = valueElement.lstrip()
51    valueElement = valueElement.rstrip()
52    variable[variableElement] = valueElement
53print(variable)
54
55# This generates a new (text-formatted) model
56changeNextLine = False
57settedValues = []
58for line in open(pathToExistingModel):
59    if line.find("=") != -1:
60        elements = line.split("=")
61        value = elements[0]
62        value = value.strip()
63        if value in variable:
64            elements[1] = variable.get(value)
65            line = elements[0] + " = " + elements[1]
66            settedValues.append(value)
67
68    newModel.write(line)
69newModel.close()  # in order to be able to open the model in Vensim
70notSet = set(variable.keys()) - set(settedValues)
71print(notSet)

Glossary

parameter uncertainty

An uncertainty is a parameter uncertainty if the range is continuous from the lower bound to the upper bound. A parameter uncertainty can be either real valued or discrete valued.

categorical uncertainty

An uncertainty is categorical if there is not a range but a set of possibilities over which one wants to sample.

lookup uncertainty

a Vensim-specific extension of categorical uncertainty for handling lookups in various ways

uncertainty space

the space created by the set of uncertainties

ensemble

a python class responsible for running a series of computational experiments.

model interface

a python class that provides an interface to an underlying model

working directory

a directory that contains files that a model needs

classification trees

a category of machine learning algorithms for rule induction

prim (patient rule induction method)

a rule induction algorithm

coverage

a metric developed for scenario discovery

density

a metric developed for scenario discovery

scenario discovery

a use case of EMA

case

A case specifies the input parameters for a run of a model. It is a dict instance, with the names of the uncertainties as key, and their sampled values as value.

experiment

An experiment is a complete specification for a run. It specifies the case, the name of the policy, and the name of the model.

policy

a policy is by definition an object with a name attribute. So, policy[‘name’] must return the name of the policy

result

the combination of an experiment and the associated outcomes for the experiment

outcome

the data of interest produced by a model given an experiment

EMA Modules

Exploratory modeling framework

model

This module specifies the abstract base class for interfacing with models. Any model that is to be controlled from the workbench is controlled via an instance of an extension of this abstract base class.

class ema_workbench.em_framework.model.AbstractModel(name)

ModelStructureInterface is one of the two main classes used for performing EMA. This is an abstract base class and cannot be used directly.

uncertainties

list of parameter instances

Type

list

levers

list of parameter instances

Type

list

outcomes

list of outcome instances

Type

list

name

alphanumerical name of model structure interface

Type

str

output

this should be a dict with the names of the outcomes as key

Type

dict

When extending this class, model_init() and run_model() have to be implemented.
as_dict()

returns a dict representation of the model

cleanup()

This method is called after finishing all the experiments, but just prior to returning the results. It gives a hook for doing any cleanup, such as closing applications.

In case of running in parallel, this method is called during the cleanup of the pool, just prior to removing the temporary directories.

initialized(policy)

check if model has been initialized

Parameters

policy (a Policy instance) –

model_init(policy)

Method called to initialize the model.

Parameters

policy (dict) – policy to be run.

Note

This method should always be implemented, although in simple cases a simple pass can suffice.

reset_model()

Method for resetting the model to its initial state. The default implementation only sets the outputs to an empty dict.

retrieve_output()

Method for retrieving output after a model run.

Return type

dict with the results of a model run.

run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters
  • scenario (Scenario instance) –

  • policy (Policy instance) –

class ema_workbench.em_framework.model.FileModel(name, wd=None, model_file=None)
as_dict()

returns a dict representation of the model

class ema_workbench.em_framework.model.Model(name, function=None)
class ema_workbench.em_framework.model.Replicator(name)
run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters
  • scenario (Scenario instance) –

  • policy (Policy instance) –

class ema_workbench.em_framework.model.ReplicatorModel(name, function=None)
class ema_workbench.em_framework.model.SingleReplication(name)
run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters
  • scenario (Scenario instance) –

  • policy (Policy instance) –

parameters

parameters and related helper classes and functions

class ema_workbench.em_framework.parameters.BooleanParameter(name, default=None, variable_name=None, pff=False)

boolean model input parameter

A BooleanParameter is similar to a CategoricalParameter, except the category values can only be True or False.

Parameters
  • name (str) –

  • variable_name (str, or list of str) –

class ema_workbench.em_framework.parameters.CategoricalParameter(name, categories, default=None, variable_name=None, pff=False, multivalue=False)

categorical model input parameter

Parameters
  • name (str) –

  • categories (collection of obj) –

  • variable_name (str, or list of str) –

  • multivalue (boolean) – if categories have a set of values, a different one is used for each variable_name.

cat_for_index(index)

return category given index

Parameters

index (int) –

Return type

object

from_dist(name, dist)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters
  • dist (scipy stats frozen dist) –

  • **kwargs (valid keyword arguments for Parameter instance) –

index_for_cat(category)

return index of category

Parameters

category (object) –

Return type

int

class ema_workbench.em_framework.parameters.Category(name, value)
class ema_workbench.em_framework.parameters.Constant(name, value)

Constant class,

can be used for any parameter that has to be set to a fixed value

class ema_workbench.em_framework.parameters.IntegerParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

integer valued model input parameter

Parameters
  • name (str) –

  • lower_bound (int) –

  • upper_bound (int) –

  • resolution (iterable) –

  • variable_name (str, or list of str) –

Raises
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound, or not an integer instance

  • ValueError – if lower_bound or upper_bound is not an integer instance

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters
  • dist (scipy stats frozen dist) –

  • **kwargs (valid keyword arguments for Parameter instance) –

class ema_workbench.em_framework.parameters.Parameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

Base class for any model input parameter

Parameters
  • name (str) –

  • lower_bound (int or float) –

  • upper_bound (int or float) –

  • resolution (collection) –

  • pff (bool) – if true, sample over this parameter using resolution in case of partial factorial sampling

Raises
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters
  • dist (scipy stats frozen dist) –

  • **kwargs (valid keyword arguments for Parameter instance) –

class ema_workbench.em_framework.parameters.RealParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

real valued model input parameter

Parameters
  • name (str) –

  • lower_bound (int or float) –

  • upper_bound (int or float) –

  • resolution (iterable) –

  • variable_name (str, or list of str) –

Raises
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters
  • dist (scipy stats frozen dist) –

  • **kwargs (valid keyword arguments for Parameter instance) –

ema_workbench.em_framework.parameters.parameters_from_csv(uncertainties, **kwargs)

Helper function for creating many Parameters based on a DataFrame or csv file

Parameters
  • uncertainties (str, DataFrame) –

  • **kwargs (dict, arguments to pass to pandas.read_csv) –

Return type

list of Parameter instances

This helper function creates uncertainties. It assumes that the DataFrame or csv file has a column titled ‘name’; optionally, a type column {int, real, cat} can be included as well. The remainder of the columns are handled as values for the parameters. If type is not specified, the function will try to infer the type from the values.

Note that this function does not support the resolution and default kwargs on parameters.

An example of a csv:

NAME,TYPE,,,
a_real,real,0,1.1,
an_int,int,1,9,
a_categorical,cat,a,b,c

this CSV file would result in:

[RealParameter('a_real', 0, 1.1, resolution=[], default=None),
 IntegerParameter('an_int', 1, 9, resolution=[], default=None),
 CategoricalParameter('a_categorical', ['a', 'b', 'c'], default=None)]
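A minimal usage sketch (the file name is illustrative):

from ema_workbench.em_framework.parameters import parameters_from_csv

# create the parameters and attach them to a model instance
model.uncertainties = parameters_from_csv("./uncertainties.csv")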

ema_workbench.em_framework.parameters.parameters_to_csv(parameters, file_name)

Helper function for writing a collection of parameters to a csv file

Parameters
  • parameters (collection of Parameter instances) –

  • file_name (str) –

The function iterates over the collection and turns these into a data frame prior to storing them. The resulting csv can be loaded using the parameters_from_csv function. Note that currently we don’t store resolution and default attributes.

outcomes

Module for outcome classes

class ema_workbench.em_framework.outcomes.AbstractOutcome(name, kind=0, variable_name=None, function=None, expected_range=None, shape=None)

Base Outcome class

Parameters
  • name (str) – Name of the outcome.

  • kind ({INFO, MINIMIZE, MAXIMIZE}, optional) –

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None} optional) –

name
Type

str

kind
Type

int

variable_name
Type

str

function
Type

callable

shape
Type

tuple

abstract classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters
  • filename (str) –

  • archive (Tarfile) –

abstract classmethod to_disk(values)

helper function for writing outcome to disk

Parameters

values (obj) – data to store

Return type

BytesIO

class ema_workbench.em_framework.outcomes.ArrayOutcome(name, variable_name=None, function=None, expected_range=None, shape=None)

Array Outcome class for n-dimensional collections

Parameters
  • name (str) – Name of the outcome.

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None}, optional) –

name
Type

str

kind
Type

int

variable_name
Type

str

function
Type

callable

shape
Type

tuple

expected_range
Type

tuple

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters
  • filename (str) –

  • archive (Tarfile) –

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters

values (ND array) –

Returns

  • BytesIO

  • filename

class ema_workbench.em_framework.outcomes.Constraint(name, parameter_names=None, outcome_names=None, function=None)

Constraints class that can be used when defining constrained optimization problems.

Parameters
  • name (str) –

  • parameter_names (str or collection of str) –

  • outcome_names (str or collection of str) –

  • function (callable) –

name
Type

str

parameter_names

name(s) of the uncertain parameter(s) and/or lever parameter(s) to which the constraint applies

Type

str, list of str

outcome_names

name(s) of the outcome(s) to which the constraint applies

Type

str, list of str

function

The function should return the distance from the feasibility threshold, given the model outputs with a variable name. The distance should be 0 if the constraint is met.

Type

callable

class ema_workbench.em_framework.outcomes.ScalarOutcome(name, kind=0, variable_name=None, function=None, expected_range=None)

Scalar Outcome class

Parameters
  • name (str) – Name of the outcome.

  • kind ({INFO, MINIMIZE, MAXIMIZE}, optional) –

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform post processing on data retrieved from model

  • expected_range (collection, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

name
Type

str

kind
Type

int

variable_name
Type

str

function
Type

callable

shape
Type

tuple

expected_range
Type

tuple

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters
  • filename (str) –

  • archive (Tarfile) –

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters

values (1D array) –

Returns

  • BytesIO

  • filename

class ema_workbench.em_framework.outcomes.TimeSeriesOutcome(name, variable_name=None, function=None, expected_range=None, shape=None)

TimeSeries Outcome class

Parameters
  • name (str) – Name of the outcome.

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None}, optional) –

name
Type

str

kind
Type

int

variable_name
Type

str

function
Type

callable

shape
Type

tuple

expected_range
Type

tuple

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters
  • filename (str) –

  • archive (Tarfile) –

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters

values (DataFrame) –

Returns

  • StringIO

  • filename

evaluators

collection of evaluators for performing experiments, optimization, and robust optimization

class ema_workbench.em_framework.evaluators.IpyparallelEvaluator(msis, client, **kwargs)

evaluator for using an ipyparallel pool

evaluate_experiments(scenarios, policies, callback, combine='factorial')

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

class ema_workbench.em_framework.evaluators.MultiprocessingEvaluator(msis, n_processes=None, maxtasksperchild=None, **kwargs)

evaluator for experiments using a multiprocessing pool

Parameters
  • msis (collection of models) –

  • n_processes (int (optional)) – A negative number means using the number of logical cores plus the given negative number. For example, on a 12-thread processor, -2 results in using 10 threads.

  • max_tasks (int (optional)) –

Note that the maximum number of available processes is multiprocessing.cpu_count(); on Windows, this can never be higher than 61.
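For example, a sketch that leaves two logical cores free for other work:

from ema_workbench import MultiprocessingEvaluator

with MultiprocessingEvaluator(model, n_processes=-2) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(scenarios=100)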

evaluate_experiments(scenarios, policies, callback, combine='factorial')

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

class ema_workbench.em_framework.evaluators.Samplers(value)

Enum for different kinds of samplers

class ema_workbench.em_framework.evaluators.SequentialEvaluator(msis)
evaluate_experiments(scenarios, policies, callback, combine='factorial')

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

ema_workbench.em_framework.evaluators.optimize(models, algorithm=<class 'platypus.algorithms.EpsNSGAII'>, nfe=10000, searchover='levers', evaluator=None, reference=None, convergence=None, constraints=None, convergence_freq=1000, logging_freq=5, **kwargs)

optimize the model

Parameters
  • models (1 or more Model instances) –

  • algorithm (a valid Platypus optimization algorithm) –

  • nfe (int) –

  • searchover ({'uncertainties', 'levers'}) –

  • evaluator (evaluator instance) –

  • reference (Policy or Scenario instance, optional) – overwrite the default scenario in case of searching over levers, or default policy in case of searching over uncertainties

  • convergence (function or collection of functions, optional) –

  • constraints (list, optional) –

  • convergence_freq (int) – nfe between convergence check

  • logging_freq (int) – number of generations between logging of progress

  • kwargs (any additional arguments will be passed on to algorithm) –

Return type

pandas DataFrame

Raises
  • EMAError – if searchover is not one of ‘uncertainties’ or ‘levers’

  • NotImplementedError – if len(models) > 1
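A minimal sketch of a directed search over the levers. The epsilons keyword argument is passed on to the platypus algorithm, and the values shown are illustrative; when convergence metrics are passed, optimize is assumed to return both the archive and the convergence information:

from ema_workbench import MultiprocessingEvaluator
from ema_workbench.em_framework.optimization import EpsilonProgress

with MultiprocessingEvaluator(model) as evaluator:
    results, convergence = evaluator.optimize(
        nfe=10000,
        searchover="levers",
        epsilons=[0.1] * len(model.outcomes),
        convergence=[EpsilonProgress()],
    )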

ema_workbench.em_framework.evaluators.perform_experiments(models, scenarios=0, policies=0, evaluator=None, reporting_interval=None, reporting_frequency=10, uncertainty_union=False, lever_union=False, outcome_union=False, uncertainty_sampling=Samplers.LHS, lever_sampling=Samplers.LHS, callback=None, return_callback=False, combine='factorial', log_progress=False)

sample uncertainties and levers, and perform the resulting experiments on each of the models

Parameters
  • models (one or more AbstractModel instances) –

  • scenarios (int or collection of Scenario instances, optional) –

  • policies (int or collection of Policy instances, optional) –

  • evaluator (Evaluator instance, optional) –

  • reporting_interval (int, optional) –

  • reporting_frequency (int, optional) –

  • uncertainty_union (boolean, optional) –

  • lever_union (boolean, optional) –

  • outcome_union (boolean, optional) –

  • uncertainty_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional) –

  • lever_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional) –

  • callback (Callback instance, optional) –

  • return_callback (boolean, optional) –

  • log_progress (bool, optional) –

  • combine ({'factorial', 'zipover'}, optional) – how to combine uncertainties and levers? In case of ‘factorial’, both are sampled separately using their respective samplers. Next the resulting designs are combined in a full factorial manner. In case of ‘zipover’, both are sampled separately and then combined by cycling over the shortest of the two sets of designs until the longest set of designs is exhausted.

Returns

the experiments as a dataframe, and a dict with the name of an outcome as key, and the associated values as numpy array. Experiments and outcomes are aligned on index.

Return type

tuple
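A minimal usage sketch; with the default combine='factorial', the call below results in 100 × 10 = 1000 experiments:

from ema_workbench import perform_experiments

experiments, outcomes = perform_experiments(model, scenarios=100, policies=10)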

optimization

class ema_workbench.em_framework.optimization.ArchiveLogger(directory, decision_varnames, outcome_varnames, base_filename='archive')

Helper class to write the archive to disk at each iteration

Parameters
  • directory (str) –

  • decision_varnames (list of str) –

  • outcome_varnames (list of str) –

  • base_filename (str, optional) –

class ema_workbench.em_framework.optimization.Convergence(metrics, max_nfe, convergence_freq=1000, logging_freq=5, log_progress=False)

helper class for tracking convergence of optimization

class ema_workbench.em_framework.optimization.EpsilonProgress

epsilon progress convergence metric class

class ema_workbench.em_framework.optimization.HyperVolume(minimum, maximum)

Hypervolume convergence metric class

This metric is derived from a hyper-volume measure, which describes the multi-dimensional volume of space contained within the pareto front. When computed with minimum and maximums, it describes the ratio of dominated outcomes to all possible outcomes in the extent of the space. Getting this number to be high or low is not necessarily important, as not all outcomes within the min-max range will be feasible. But, having the hypervolume remain fairly stable over multiple generations of the evolutionary algorithm provides an indicator of convergence.

Parameters
  • minimum (numpy array) –

  • maximum (numpy array) –

class ema_workbench.em_framework.optimization.Problem(searchover, parameters, outcome_names, constraints, reference=None)

small extension to Platypus problem object, includes information on the names of the decision variables, the names of the outcomes, and the type of search

class ema_workbench.em_framework.optimization.RobustProblem(parameters, outcome_names, scenarios, robustness_functions, constraints)

small extension to Problem object for robust optimization, adds the scenarios and the robustness functions

samplers

This module contains various classes that can be used for specifying different types of samplers. These different samplers implement basic sampling techniques including Full Factorial sampling, Latin Hypercube sampling, and Monte Carlo sampling.

class ema_workbench.em_framework.samplers.AbstractSampler

Abstract base class from which different samplers can be derived.

In the simplest cases, only the sample method needs to be overwritten. generate_designs is the only method called from outside; the other methods are used internally to generate the designs.

generate_designs(parameters, nr_samples)

external interface for Sampler. Returns the computational experiments over the specified parameters, for the given number of samples for each parameter.

Parameters
  • parameters (list) – a list of parameters for which to generate the experimental designs

  • nr_samples (int) – the number of samples to draw for each parameter

Returns

  • generator – a generator object that yields the designs resulting from combining the parameters

  • int – the number of experimental designs

generate_samples(parameters, size)

The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting samples.

Parameters
  • parameters (collection) – a collection of RealParameter, IntegerParameter, and CategoricalParameter instances.

  • size (int) – the number of samples to generate.

Returns

dict with the parameter.name as key, and the sample as value

Return type

dict

sample(distribution, size)

method for sampling a number of samples from a particular distribution. The various samplers differ with respect to their implementation of this method.

Parameters
  • distribution (scipy frozen distribution) –

  • size (int) – the number of samples to generate

Returns

the samples for the distribution and specified parameters

Return type

numpy array

class ema_workbench.em_framework.samplers.DefaultDesigns(designs, parameters, n)

iterable for the experimental designs

class ema_workbench.em_framework.samplers.FullFactorialSampler

generates a full factorial sample.

If the parameter is not categorical, the resolution is set to the number of samples. If the parameter is categorical, the specified value for samples will be ignored and each category will be used instead.

determine_nr_of_designs(sampled_parameters)

Helper function for determining the number of experiments that will be generated given the sampled parameters.

Parameters

sampled_parameters (list) – a list of sampled parameters, as the values returned by generate_samples

Returns

the total number of experimental designs

Return type

int

generate_designs(parameters, nr_samples)

This method provides an alternative implementation to the default implementation provided by Sampler. This version returns a full factorial design across the parameters.

Parameters
  • parameters (list) – a list of parameters for which to generate the experimental designs

  • nr_samples (int) – the number of intervals to use on each Parameter. Categorical parameters always return all their categories

Returns

  • generator – a generator object that yields the designs resulting from combining the parameters

  • int – the number of experimental designs

generate_samples(parameters, size)

The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting samples.

Parameters
  • parameters (collection) – a collection of Parameter instances

  • size (int) – the number of samples to generate.

Returns

dict with the parameter.name as key, and the sample as value

Return type

dict

class ema_workbench.em_framework.samplers.LHSSampler

generates a Latin Hypercube sample for each of the parameters

sample(distribution, size)

generate a Latin Hypercube Sample.

Parameters
  • distribution (scipy frozen distribution) –

  • size (int) – the number of samples to generate

Returns

dict with the parameter.name as key, and the sample as value

Return type

dict

class ema_workbench.em_framework.samplers.MonteCarloSampler

generates a Monte Carlo sample for each of the parameters.

sample(distribution, size)

generate a Monte Carlo Sample.

Parameters
  • distribution (scipy frozen distribution) –

  • size (int) – the number of samples to generate

Returns

dict with the parameter.name as key, and the sample as value

Return type

dict

class ema_workbench.em_framework.samplers.UniformLHSSampler
generate_samples(parameters, size)
Parameters
  • parameters (collection) –

  • size (int) –

Returns

dict with the parameter.name as key, and the sample as value

Return type

dict

ema_workbench.em_framework.samplers.determine_parameters(models, attribute, union=True)

determine the parameters over which to sample

Parameters
  • models (a collection of AbstractModel instances) –

  • attribute ({'uncertainties', 'levers'}) –

  • union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers

Return type

collection of Parameter instances

ema_workbench.em_framework.samplers.sample_levers(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)

generate policies by sampling over the levers

Parameters
  • models (a collection of AbstractModel instances) –

  • n_samples (int) –

  • union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers

  • sampler (Sampler instance, optional) –

Return type

generator yielding Policy instances

ema_workbench.em_framework.samplers.sample_parameters(parameters, n_samples, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>, kind=<class 'ema_workbench.em_framework.points.Point'>)

generate cases by sampling over the parameters

Parameters
  • parameters (collection of AbstractParameter instances) –

  • n_samples (int) –

  • sampler (Sampler instance, optional) –

  • kind ({Case, Scenario, Policy}, optional) – the class into which the samples are collected

Return type

generator yielding Case, Scenario, or Policy instances

ema_workbench.em_framework.samplers.sample_uncertainties(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)

generate scenarios by sampling over the uncertainties

Parameters
  • models (a collection of AbstractModel instances) –

  • n_samples (int) –

  • union (bool, optional) – in case of multiple models, sample over the union of uncertainties, or over the intersection of the uncertainties

  • sampler (Sampler instance, optional) –

Return type

generator yielding Scenario instances

salib_samplers

Samplers for working with SALib

class ema_workbench.em_framework.salib_samplers.FASTSampler(m=4)

Sampler generating a Fourier Amplitude Sensitivity Test (FAST) using SALib

Parameters

m (int (default: 4)) – The interference parameter, i.e., the number of harmonics to sum in the Fourier series decomposition

class ema_workbench.em_framework.salib_samplers.MorrisSampler(num_levels=4, optimal_trajectories=None, local_optimization=True)

Sampler generating a morris design using SALib

Parameters
  • num_levels (int) – The number of grid levels

  • grid_jump (int) – The grid jump size

  • optimal_trajectories (int, optional) – The number of optimal trajectories to sample (between 2 and N)

  • local_optimization (bool, optional) – Flag whether to use local optimization according to Ruano et al. (2012). Speeds up the process tremendously for bigger N and num_levels. Setting this variable to true causes the function to ignore gurobi.

class ema_workbench.em_framework.salib_samplers.SobolSampler(second_order=True)

Sampler generating a Sobol design using SALib

Parameters

second_order (bool, optional) – indicates whether second order effects should be included

ema_workbench.em_framework.salib_samplers.get_SALib_problem(uncertainties)

returns a dict with a problem specification as required by SALib

experiment_runner

helper module for running experiments and keeping track of which model has been initialized with which policy.

class ema_workbench.em_framework.experiment_runner.ExperimentRunner(msis)

Helper class for running the experiments

This class contains the logic for initializing models properly, running the experiment, getting the results, and cleaning up afterwards.

Parameters
  • msis (dict) –

  • model_kwargs (dict) –

msi_initializiation

keeps track of which model is initialized with which policy.

Type

dict

msis

models indexed by name

Type

dict

model_kwargs

keyword arguments for model_init

Type

dict

run_experiment(experiment)

The logic for running a single experiment. This code makes sure that model(s) are initialized correctly.

Parameters

experiment (Case instance) –

Returns

  • experiment_id (int)

  • case (dict)

  • policy (str)

  • model_name (str)

  • result (dict)

Raises
  • EMAError – if the model instance raises an EMA error, these are reraised.

  • Exception – Catch all for all other exceptions being raised by the model. These are reraised.

callbacks

This module provides an abstract base class for a callback and a default implementation.

If you want to store the data in a way that is different from the functionality provided by the default callback, you can write your own extension of callback. For example, you can easily implement a callback that stores the data in, e.g., a NoSQL database.

The only method to implement is the __call__ magic method. To use logging of progress, always call super.
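A minimal sketch of such an extension, assuming the 2.x call signature in which the callback receives the experiment and its outcomes (the print statement stands in for custom storage logic):

from ema_workbench.em_framework.callbacks import DefaultCallback


class MyCallback(DefaultCallback):
    def __call__(self, experiment, outcomes):
        # call super to keep the default storage and progress logging
        super().__call__(experiment, outcomes)
        print(f"stored experiment {experiment.experiment_id}")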

class ema_workbench.em_framework.callbacks.AbstractCallback(uncertainties, outcomes, levers, nr_experiments, reporting_interval=None, reporting_frequency=10, log_progress=False)

Abstract base class from which different call back classes can be derived. Callback is responsible for storing the results of the runs.

Parameters
  • uncs (list) – a list of the parameters over which the experiments are being run.

  • outcomes (list) – a list of outcomes

  • nr_experiments (int) – the total number of experiments to be executed

  • reporting_interval (int, optional) – the interval at which to provide progress information via logging.

  • reporting_frequency (int, optional) – the total number of progress logs

i

a counter that keeps track of how many experiments have been saved

Type

int

reporting_interval

the interval between progress logs

Type

int,

abstract get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

class ema_workbench.em_framework.callbacks.DefaultCallback(uncs, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10, log_progress=False)

default callback used by perform_experiments as a means for specifying the way in which the results should be handled. If no callback is specified, this default implementation is used. It can be overwritten or replaced with a callback of your own design, for example if you prefer to store the results in a database or write them to a text file.

get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

class ema_workbench.em_framework.callbacks.FileBasedCallback(uncs, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10)

Callback that stores data in csv files while running

Parameters
  • uncs (collection of Parameter instances) –

  • levers (collection of Parameter instances) –

  • outcomes (collection of Outcome instances) –

  • nr_experiments (int) –

  • reporting_interval (int, optional) –

  • reporting_frequency (int, optional) –

The data is stored in ./temp, relative to the current working directory. If this directory already exists, it will be overwritten.

Warning

This class is still in beta. API is expected to change over the coming months.

get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

points

classes for representing points in parameter space, as well as associated helper functions

class ema_workbench.em_framework.points.Experiment(name, model_name, policy, scenario, experiment_id)

A convenience object that contains a specification of the model, policy, and scenario to run

name
Type

str

model_name
Type

str

policy
Type

Policy instance

scenario
Type

Scenario instance

experiment_id
Type

int

class ema_workbench.em_framework.points.ExperimentReplication(scenario, policy, constants, replication=None)

helper class that combines scenario, policy, any constants, and replication information (seed etc) into a single dictionary.

This class represents the complete specification of parameters to run for a given experiment.

class ema_workbench.em_framework.points.Policy(name=None, **kwargs)

Helper class representing a policy

name
Type

str, int, or float

id
Type

int

all keyword arguments are wrapped into a dict.
class ema_workbench.em_framework.points.Scenario(name=None, **kwargs)

Helper class representing a scenario

name
Type

str, int, or float

id
Type

int

all keyword arguments are wrapped into a dict.
ema_workbench.em_framework.points.combine_cases_factorial(*point_collections)

Combine collections of cases in a full factorial manner

Parameters

point_collections (collection of collections of Point instances) –

Yields

Point

ema_workbench.em_framework.points.combine_cases_sampling(*point_collection)

Combine collections of cases by iterating over the longest collection while sampling with replacement from the others

Parameters

point_collection (collection of collection of Point instances) –

Yields

Point

ema_workbench.em_framework.points.experiment_generator(scenarios, model_structures, policies, combine='factorial')

generator function which yields experiments

Parameters
  • scenarios (iterable of dicts) –

  • model_structures (list) –

  • policies (list) –

  • combine ({'factorial', 'sample'}) – controls how to combine scenarios, policies, and model_structures into experiments.

Notes

if combine is ‘factorial’ then this generator is essentially three nested loops: for each model structure, for each policy, for each scenario, return the experiment. This means that designs should not be a generator, because it would be exhausted after running the first policy on the first model. If combine is ‘zipover’, then this generator cycles over scenarios, policies, and model structures until the longest of the three collections is exhausted.

ema_multiprocessing

support for using the multiprocessing library in combination with the workbench

ema_workbench.em_framework.ema_multiprocessing.setup_working_directories(models, root_dir)

copies the working directory of each model to a process-specific temporary directory and updates the working directory of the model

Parameters
  • models (list) –

  • root_dir (str) –

ema_ipyparallel

This module provides functionality for combining the EMA workbench with IPython parallel.

class ema_workbench.em_framework.ema_ipyparallel.Engine(engine_id, msis, cwd)

class for setting up EMA-specific stuff on each engine; also functions as a convenient namespace for workbench-relevant variables

Parameters
  • engine_id (int) –

  • msis (list) –

  • cwd (str) –

cleanup_working_directory()

remove the root working directory of the engine

run_experiment(experiment)

run the experiment, the actual running is delegated to an ExperimentRunner instance

class ema_workbench.em_framework.ema_ipyparallel.EngingeLoggerAdapter(logger, topic)

LoggerAdapter that inserts EMA as a topic into log messages

Parameters
  • logger (logger instance) –

  • topic (str) –

process(msg, kwargs)

Process the logging message and keyword arguments passed in to a logging call to insert contextual information. You can either manipulate the message itself, the keyword args or both. Return the message and kwargs modified (or not) to suit your needs.

Normally, you’ll only need to override this one method in a LoggerAdapter subclass for your specific needs.

class ema_workbench.em_framework.ema_ipyparallel.LogWatcher(**kwargs: Any)

A simple class that receives messages on a SUB socket, as published by subclasses of zmq.log.handlers.PUBHandler, and logs them itself.

This can subscribe to multiple topics, but defaults to all topics.

log_message(raw)

receive and parse a message, then log it.

subscribe()

Update our SUB socket’s subscriptions.

ema_workbench.em_framework.ema_ipyparallel.cleanup(client)

cleanup directory tree structure on all engines

ema_workbench.em_framework.ema_ipyparallel.get_engines_by_host(client)

returns the engine ids by host

Parameters

client (IPython.parallel.Client instance) –

Returns

a dict with hostnames as keys, and a list of engine ids

Return type

dict

ema_workbench.em_framework.ema_ipyparallel.initialize_engines(client, msis, cwd)

initialize engine instances on all engines

Parameters
  • client (IPython.parallel.Client) –

  • msis (dict) – dict of model structure interfaces with their names as keys

  • cwd (str) –

ema_workbench.em_framework.ema_ipyparallel.set_engine_logger()

Updates EMA logging on the engines with an EngineLoggerAdapter. This adapter injects EMA as a topic into all messages.

ema_workbench.em_framework.ema_ipyparallel.start_logwatcher()

convenience function for starting the LogWatcher

Returns

  • LogWatcher – the log watcher instance

  • Thread – the log watcher thread

Note: there can only be one log watcher on a given url.

ema_workbench.em_framework.ema_ipyparallel.update_cwd_on_all_engines(client)

updates the current working directory on the engines to point to the same working directory as this notebook

currently only works if engines are on same machine.

Parameters

client (IPython.parallel.Client instance) –

util

utilities used throughout em_framework

class ema_workbench.em_framework.util.Counter(startfrom=0)

helper class for generating counter-based names for NamedDicts

class ema_workbench.em_framework.util.ProgressTrackingMixIn(N, reporting_interval, logger, log_progress=False, log_func=<function ProgressTrackingMixIn.<lambda>>)

Mixin for monitoring progress

Parameters
  • N (int) – total number of experiments

  • reporting_interval (int) – nfe between logging progress

  • logger (logger instance) –

  • log_progress (bool, optional) –

  • log_func (callable, optional) – function called with self as only argument, should invoke self._logger with custom log message

i
Type

int

reporting_interval
Type

int

log_progress
Type

bool

log_func
Type

callable

pbar

None if log_progress is true; a tqdm.tqdm instance if false

Type

{None, tqdm.tqdm instance}

ema_workbench.em_framework.util.combine(*args)

combine scenario and policy into a single experiment dict

Parameters

args (two or more dicts that need to be combined) –

Return type

a single unified dict containing the entries from all dicts

Raises

EMAError – if a keyword argument exists in more than one dict

ema_workbench.em_framework.util.representation(named_dict)

helper function for generating repr based names for NamedDicts

Connectors

vensim

vensimDLLwrapper

pysd_connector

pysd connector

class ema_workbench.connectors.pysd_connector.PysdModel(name=None, mdl_file=None)

simio

excel

This module provides a base class that can be used to perform EMA on Excel models. It relies on win32com

class ema_workbench.connectors.excel.ExcelModel(name, wd=None, model_file=None, default_sheet=None, pointers=None)

Analysis

prim

A scenario discovery oriented implementation of PRIM.

The implementation of prim provided here is data type aware, so categorical variables will be handled appropriately. It also uses a non-standard objective function in the peeling and pasting phase of the algorithm. This algorithm looks at the increase in the mean divided by the amount of data removed. So essentially, it uses something akin to the first order derivative of the original objective function.

The implementation is designed for interactive use in combination with the jupyter notebook.
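A minimal sketch of the typical interactive workflow; results is the return of perform_experiments, and classify is assumed to be the name of an outcome of interest or a suitable callable:

from ema_workbench.analysis import prim

prim_alg = prim.setup_prim(results, classify, threshold=0.8)
box1 = prim_alg.find_box()
box1.show_tradeoff()
box1.inspect(style="table")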

class ema_workbench.analysis.prim.PRIMObjectiveFunctions(value)

An enumeration.

class ema_workbench.analysis.prim.Prim(x, y, threshold, obj_function=PRIMObjectiveFunctions.LENIENT1, peel_alpha=0.05, paste_alpha=0.05, mass_min=0.05, threshold_type=1, mode=RuleInductionType.BINARY, update_function='default')

Patient rule induction algorithm

The implementation of Prim is tailored to interactive use in the context of scenario discovery

Parameters
  • x (DataFrame) – the independent variables

  • y (1d ndarray) – the dependent variable

  • threshold (float) – the density threshold that a box has to meet

  • obj_function ({LENIENT1, LENIENT2, ORIGINAL}) – the objective function used by PRIM. Defaults to a lenient objective function based on the gain of mean divided by the loss of mass.

  • peel_alpha (float, optional) – parameter controlling the peeling stage (default = 0.05).

  • paste_alpha (float, optional) – parameter controlling the pasting stage (default = 0.05).

  • mass_min (float, optional) – minimum mass of a box (default = 0.05).

  • threshold_type ({ABOVE, BELOW}) – whether to look above or below the threshold value

  • mode ({RuleInductionType.BINARY, RuleInductionType.REGRESSION}, optional) – indicated whether PRIM is used for regression, or for scenario classification in which case y should be a binary vector

  • update_function ({'default', 'guivarch'}, optional) – controls the behavior of PRIM after having found a first box. Use either the default behavior, where all points in the box are removed, or the procedure suggested by Guivarch et al (2016) doi:10.1016/j.envsoft.2016.03.006, which simply sets all points to be no longer of interest (only valid in binary mode).

See also

cart

property boxes

Property for getting a list of box limits

determine_coi(indices)

Given a set of indices on y, determine how many cases of interest there are in this set.

Parameters

indices (ndarray) – a valid index for y

Returns

the number of cases of interest.

Return type

int

Raises

ValueError – if threshold_type is not either ABOVE or BELOW

find_box()

Execute one iteration of the PRIM algorithm. That is, find one box, starting from the current state of Prim.

property stats

property for getting a list of dicts containing the statistics for each box

class ema_workbench.analysis.prim.PrimBox(prim, box_lims, indices)

A class that holds information for a specific box

coverage

coverage of currently selected box

Type

float

density

density of currently selected box

Type

float

mean

mean of currently selected box

Type

float

res_dim

number of restricted dimensions of currently selected box

Type

int

mass

mass of currently selected box

Type

float

peeling_trajectory

stats for each box in peeling trajectory

Type

DataFrame

box_lims

list of box lims for each box in peeling trajectory

Type

list

by default, the currently selected box is the last box on the peeling trajectory, unless this is changed via PrimBox.select().

drop_restriction(uncertainty='', i=-1)

Drop the restriction on the specified dimension for box i

Parameters
  • i (int, optional) – defaults to the currently selected box, which defaults to the latest box on the trajectory

  • uncertainty (str) –

Replaces box i with a new box in which, for the specified uncertainty, the limits of the initial box are used. The resulting box is added to the peeling trajectory.

inspect(i=None, style='table', **kwargs)

Write the stats and box limits of the user-specified box to standard out. If i is not provided, the last box will be printed.

Parameters
  • i (int or list of ints, optional) – the index of the box, defaults to currently selected box

  • style ({'table', 'graph', 'data'}) – the style of the visualization. ‘table’ prints the stats and boxlim. ‘graph’ creates a figure. ‘data’ returns a list of tuples, where each tuple contains the stats and the box_lims.

  • **kwargs – additional kwargs are passed to the helper function that generates the table or graph.

resample(i=None, iterations=10, p=0.5)

Calculate resample statistics for candidate box i

Parameters
  • i (int, optional) –

  • iterations (int, optional) –

  • p (float, optional) –

Return type

DataFrame

select(i)

select an entry from the peeling and pasting trajectory and update the prim box to this selected box.

Parameters

i (int) – the index of the box to select.

show_pairs_scatter(i=None, dims=None, cdf=False)

Make a pairwise scatter plot of all the restricted dimensions, with color denoting whether a given point is of interest or not, and the boxlims superimposed on top.

Parameters
  • i (int, optional) –

  • dims (list of str, optional) – dimensions to show, defaults to all restricted dimensions

  • cdf (bool, optional) – plot diag as cdf or pdf

Return type

seaborn PairGrid

show_ppt()

show the peeling and pasting trajectory in a figure

show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)

Visualize the trade off between coverage and density. Color is used to denote the number of restricted dimensions.

Parameters
  • cmap (valid matplotlib colormap) –

  • annotated (bool, optional) – shows point labels if True.

Return type

a Figure instance

update(box_lims, indices)

update the box to the provided box limits.

Parameters
  • box_lims (DataFrame) – the new box_lims

  • indices (ndarray) – the indices of y that are inside the box

write_ppt_to_stdout()

write the peeling and pasting trajectory to stdout

ema_workbench.analysis.prim.pca_preprocess(experiments, y, subsets=None, exclude={})

perform PCA to preprocess experiments before running PRIM

Pre-process the data by performing a pca based rotation on it. This effectively turns the algorithm into PCA-PRIM as described in Dalal et al (2013)

Parameters
  • experiments (DataFrame) –

  • y (ndarray) – one dimensional binary array

  • subsets (dict, optional) – expects a dictionary with group name as key and a list of uncertainty names as values. If this is used, a constrained PCA-PRIM is executed

  • exclude (list of str, optional) – the uncertainties that should be excluded from the rotation

Returns

  • rotated_experiments – DataFrame

  • rotation_matrix – DataFrame

Raises

RuntimeError – if mode is not binary (i.e. y is not a binary classification), or if x contains non-numeric columns
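A sketch of combining the rotation with PRIM (the threshold value is illustrative):

rotated_experiments, rotation_matrix = prim.pca_preprocess(experiments, y)
prim_alg = prim.Prim(rotated_experiments, y, threshold=0.8)
box1 = prim_alg.find_box()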

ema_workbench.analysis.prim.run_constrained_prim(experiments, y, issignificant=True, **kwargs)

Run PRIM repeatedly while constraining the maximum number of dimensions available in x

Improved usage of PRIM as described in Kwakkel (2019).

Parameters
  • x (DataFrame) –

  • y (numpy array) –

  • issignificant (bool, optional) – if True, run prim only on subsets of dimensions that are significant for the initial PRIM on the entire dataset.

  • **kwargs (any additional keyword arguments are passed on to PRIM) –

Return type

PrimBox instance

ema_workbench.analysis.prim.setup_prim(results, classify, threshold, incl_unc=[], **kwargs)

Helper function for setting up the prim algorithm

Parameters
  • results (tuple) – tuple of DataFrame and dict with numpy arrays; the return from perform_experiments().

  • classify (str or callable) – either a string denoting the outcome of interest to use or a function.

  • threshold (double) – the minimum score on the density of the last box on the peeling trajectory. In case of a binary classification, this should be between 0 and 1.

  • incl_unc (list of str, optional) – list of uncertainties to include in prim analysis

  • kwargs (dict) – valid keyword arguments for prim.Prim

Return type

a Prim instance

Raises
  • PrimException – if data resulting from classify is not a 1-d array.

  • TypeError – if classify is not a string or a callable.

cart

A scenario discovery oriented implementation of CART. It is essentially a wrapper around scikit-learn's implementation of CART.

class ema_workbench.analysis.cart.CART(x, y, mass_min=0.05, mode=RuleInductionType.BINARY)

CART algorithm

can be used in a manner similar to PRIM. It provides access to the underlying tree, but it can also show the boxes described by the tree in table or graph form, similar to prim.

Parameters
  • x (DataFrame) –

  • y (1D ndarray) –

  • mass_min (float, optional) – a value between 0 and 1 indicating the minimum fraction of data points in a terminal leaf. Defaults to 0.05, identical to prim.

  • mode ({BINARY, CLASSIFICATION, REGRESSION}) – indicates the mode in which CART is used. Binary indicates binary classification, classification is multiclass, and regression is regression.

boxes

list of DataFrame box lims

Type

list

stats

list of dicts with stats

Type

list

Notes

This class is a wrapper around scikit-learn’s CART algorithm. It provides an interface to CART that is more oriented towards scenario discovery, and shares some methods with PRIM

See also

prim

property boxes

rtype: list with boxlims for each terminal leaf

build_tree()

train CART on the data

show_tree(mplfig=True, format='png')

return a png (default) or svg of the tree

On Windows, graphviz needs to be installed with conda.

Parameters
  • mplfig (bool, optional) – if true (default) returns a matplotlib figure with the tree, otherwise, it returns the output as bytes

  • format ({'png', 'svg'}, default 'png') – the format of the output.

property stats

rtype: list with scenario discovery statistics for each terminal leaf

ema_workbench.analysis.cart.setup_cart(results, classify, incl_unc=None, mass_min=0.05)

helper function for performing cart in combination with data generated by the workbench.

Parameters
  • results (tuple of DataFrame and dict with numpy arrays) – the return from perform_experiments().

  • classify (str or callable) – either a string denoting the outcome of interest to use or a function.

  • incl_unc (list of strings, optional) –

  • mass_min (float, optional) –

Raises

TypeError – if classify is not a string or a callable.
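
A sketch mirroring the PRIM workflow, assuming results and classify as in the setup_prim example above:

>>> from ema_workbench.analysis import cart
>>> alg = cart.setup_cart(results, classify, mass_min=0.05)
>>> alg.build_tree()            # train CART on the data
>>> fig = alg.show_tree()       # matplotlib figure of the fitted tree
>>> alg.stats                   # scenario discovery stats per terminal leaf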

clusterer

This module provides time series clustering functionality using complexity invariant distance. For details see Steinmann et al (2020)

ema_workbench.analysis.clusterer.apply_agglomerative_clustering(distances, n_clusters, linkage='average')

apply agglomerative clustering to the distances

Parameters
  • distances (ndarray) –

  • n_clusters (int) –

  • linkage ({'average', 'complete', 'single'}) –

Return type

1D ndarray with cluster assignment

ema_workbench.analysis.clusterer.calculate_cid(data, condensed_form=False)

calculate the complexity invariant distance between all rows

Parameters
  • data (2d ndarray) –

  • condensed_form (bool, optional) –

Returns

a 2D ndarray with the distances between all time series, or the condensed form similar to scipy.spatial.distance.pdist

Return type

distances

ema_workbench.analysis.clusterer.plot_dendrogram(distances)

plot dendrogram for distances
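
A sketch of the typical three-step workflow, assuming data is a 2-d ndarray with one time series per row; the number of clusters is an arbitrary example:

>>> from ema_workbench.analysis import clusterer
>>> distances = clusterer.calculate_cid(data)
>>> clusterer.plot_dendrogram(distances)   # eyeball a suitable n_clusters
>>> assignment = clusterer.apply_agglomerative_clustering(distances,
...                                                       n_clusters=3)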

logistic_regression

This module implements logistic regression for scenario discovery.

The module draws its inspiration from Quinn et al (2018) 10.1029/2018WR022743 and Lamontagne et al (2019). The implementation here generalizes their work and embeds it in a more typical scenario discovery workflow with a posteriori selection of the appropriate number of dimensions to include. It is modeled as much as possible on the api used for PRIM and CART.

class ema_workbench.analysis.logistic_regression.Logit(x, y, threshold=0.95)

Implements an interactive version of logistic regression using BIC-based forward selection

Parameters
  • x (DataFrame) –

  • y (numpy Array) –

  • threshold (float) –

coverage

coverage of currently selected model

Type

float

density

density of currently selected model

Type

float

res_dim

number of restricted dimensions of currently selected model

Type

int

peeling_trajectory

stats for each model in peeling trajectory

Type

DataFrame

models

list of models associated with each model on the peeling trajectory

Type

list

inspect(i, step=0.1)

Inspect one of the models by showing the threshold trade-off and summary2

Parameters
  • i (int) –

  • step (float between [0, 1]) –

plot_pairwise_scatter(i, threshold=0.95)

plot pairwise scatter plot of data points, with contours as background

Parameters
  • i (int) –

  • threshold (float) –

Return type

Figure instance

The lower triangle background is a binary contour based on the specified threshold. All axes not shown are set to a default value in the middle of their range.

The upper triangle shows a contour map with the conditional probability, again setting all non-shown dimensions to a default value in the middle of their range.

run()

run logistic regression with forward selection, using the Bayesian Information Criterion to select whether, and if so which, dimension to add

show_threshold_tradeoff(i, cmap=<matplotlib.colors.ListedColormap object>, step=0.1)

Visualize the trade-off between coverage and density for a given model i across the range of threshold values

Parameters
  • i (int) –

  • cmap (valid matplotlib colormap) –

  • step (float, optional) –

Return type

a Figure instance

show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)

Visualize the trade-off between coverage and density. Color is used to denote the number of restricted dimensions.

Parameters
  • cmap (valid matplotlib colormap) –

  • annotated (bool, optional) – show point labels if True

Return type

a Figure instance

update(model, selected)

helper function for adding a model to the collection of models and updating the associated attributes

Parameters
  • model (statsmodel fitted logit model) –

  • selected (list of str) –
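
A sketch of the interactive workflow, assuming x is a DataFrame of uncertainties and y a binary ndarray; the model index 2 is an arbitrary example:

>>> from ema_workbench.analysis import logistic_regression
>>> logit = logistic_regression.Logit(x, y)
>>> logit.run()                      # BIC-based forward selection
>>> logit.show_tradeoff()            # coverage/density across models
>>> logit.inspect(2)
>>> fig = logit.plot_pairwise_scatter(2)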

dimensional_stacking

This module provides functionality for doing dimensional stacking of uncertain factors in order to reveal patterns in the values for a single outcome of interests. It is inspired by the work reported here with one deviation.

Rather than using association rules to identify the uncertain factors to use, this code uses random-forest-based feature scoring instead. It is also possible to combine the code provided here with any other feature scoring or factor prioritization technique, or to simply select uncertain factors in some other manner.

ema_workbench.analysis.dimensional_stacking.create_pivot_plot(x, y, nr_levels=3, labels=True, categories=True, nbins=3, bin_labels=False)

convenience function for easily creating a pivot plot

Parameters
  • x (DataFrame) –

  • y (1d ndarray) –

  • nr_levels (int, optional) – the number of levels in the pivot table. The number of uncertain factors included in the pivot table is two times the number of levels.

  • labels (bool, optional) – display names of uncertain factors

  • categories (bool, optional) – display category names for each uncertain factor

  • nbins (int, optional) – number of bins to use when discretizing continuous uncertain factors

  • bin_labels (bool, optional) – if True show bin interval / name, otherwise show only a number

Return type

Figure

This function performs feature scoring using random forests, selects a number of high scoring factors based on the specified number of levels, creates a pivot table, and visualizes the table. This is a convenience function. For more control over the process, use the code in this function as a template.
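
A minimal usage sketch, assuming x and y as elsewhere in this chapter:

>>> import matplotlib.pyplot as plt
>>> from ema_workbench.analysis import dimensional_stacking
>>> fig = dimensional_stacking.create_pivot_plot(x, y, nr_levels=3)
>>> plt.show()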

regional_sa

Module offers support for performing basic regional sensitivity analysis. The module can be used to perform regional sensitivity analysis on all uncertainties specified in the experiment array, as well as the ability to zoom in on any given uncertainty in more detail.

ema_workbench.analysis.regional_sa.plot_cdfs(x, y, ccdf=False)

plot cumulative distribution functions for each column in x, based on the classification specified in y.

Parameters
  • x (DataFrame) – the experiments to use in the cdfs

  • y (ndarray) – the categorization for the data

  • ccdf (bool, optional) – if true, plot a complementary cdf instead of a normal cdf.

Return type

a matplotlib Figure instance
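
A minimal sketch, assuming experiments and outcomes come from a prior set of runs; the outcome name max_P and the cutoff are placeholders:

>>> from ema_workbench.analysis import regional_sa
>>> y = outcomes['max_P'] > 0.8
>>> fig = regional_sa.plot_cdfs(experiments, y)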

feature_scoring

Feature scoring functionality

ema_workbench.analysis.feature_scoring.CHI2(X, y)

Compute chi-squared stats between each non-negative feature and class.

This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.

Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.

Read more in the User Guide.

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Sample vectors.

  • y (array-like of shape (n_samples,)) – Target vector (class labels).

Returns

  • chi2 (ndarray of shape (n_features,)) – Chi2 statistics for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values for each feature.

Notes

Complexity of this algorithm is O(n_classes * n_features).

See also

f_classif

ANOVA F-value between label/feature for classification tasks.

f_regression

F-value between label/feature for regression tasks.

ema_workbench.analysis.feature_scoring.F_CLASSIFICATION(X, y)

Compute the ANOVA F-value for the provided sample.

Read more in the User Guide.

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The set of regressors that will be tested sequentially.

  • y (ndarray of shape (n_samples,)) – The target vector.

Returns

  • f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.

See also

chi2

Chi-squared stats of non-negative features for classification tasks.

f_regression

F-value between label/feature for regression tasks.

ema_workbench.analysis.feature_scoring.F_REGRESSION(X, y, *, center=True)

Univariate linear regression tests returning F-statistic and p-values.

Quick linear model for testing the effect of a single regressor, sequentially for many regressors.

This is done in 2 steps:

  1. The cross correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) * std(y)) using r_regression function.

  2. It is converted to an F score and then to a p-value.

f_regression() is derived from r_regression() and will rank features in the same order if all the features are positively correlated with the target.

Note however that contrary to f_regression(), r_regression() values lie in [-1, 1] and can thus be negative. f_regression() is therefore recommended as a feature selection criterion to identify potentially predictive feature for a downstream classifier, irrespective of the sign of the association with the target variable.

Furthermore f_regression() returns p-values while r_regression() does not.

Read more in the User Guide.

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data matrix.

  • y (array-like of shape (n_samples,)) – The target vector.

  • center (bool, default=True) – Whether or not to center the data matrix X and the target vector y. By default, X and y will be centered.

Returns

  • f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.

See also

r_regression

Pearson’s R between label/feature for regression tasks.

f_classif

ANOVA F-value between label/feature for classification tasks.

chi2

Chi-squared stats of non-negative features for classification tasks.

SelectKBest

Select features based on the k highest scores.

SelectFpr

Select features based on a false positive rate test.

SelectFdr

Select features based on an estimated false discovery rate.

SelectFwe

Select features based on family-wise error rate.

SelectPercentile

Select features based on percentile of the highest scores.

ema_workbench.analysis.feature_scoring.get_ex_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=100, max_features=None, max_depth=None, min_samples_split=2, min_samples_leaf=None, min_weight_fraction_leaf=0, max_leaf_nodes=None, bootstrap=True, oob_score=True, random_state=None)

Get feature scores using extra trees

Parameters
Returns

  • pandas DataFrame – sorted in descending order of tuples with uncertainty and feature scores

  • object – either ExtraTreesClassifier or ExtraTreesRegressor

ema_workbench.analysis.feature_scoring.get_feature_scores_all(x, y, alg='extra trees', mode=RuleInductionType.REGRESSION, **kwargs)

perform feature scoring for all outcomes using the specified feature scoring algorithm

Parameters
  • x (DataFrame) –

  • y (dict of 1d numpy arrays) – the outcomes, with a string as key, and a 1D array for each outcome

  • alg ({'extra trees', 'random forest', 'univariate'}, optional) –

  • mode ({RuleInductionType.REGRESSION, RuleInductionType.CLASSIFICATION}, optional) –

  • kwargs (dict, optional) – any remaining keyword arguments will be passed to the specific feature scoring algorithm

Return type

DataFrame instance

ema_workbench.analysis.feature_scoring.get_rf_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=250, max_features='sqrt', max_depth=None, min_samples_split=2, min_samples_leaf=1, bootstrap=True, oob_score=True, random_state=None)

Get feature scores using a random forest

Parameters
Returns

  • pandas DataFrame – sorted in descending order of tuples with uncertainty and feature scores

  • object – either RandomForestClassifier or RandomForestRegressor

ema_workbench.analysis.feature_scoring.get_univariate_feature_scores(x, y, score_func=<function f_classif>)

calculate feature scores using univariate statistical tests. In case of categorical data, the chi-square or the ANOVA F value is used. In case of continuous data, the ANOVA F value is used.

Parameters
  • x (DataFrame) –

  • y (1D nd.array) –

  • score_func ({F_CLASSIFICATION, F_REGRESSION, CHI2}) – the score function to use: F_REGRESSION (regression), or F_CLASSIFICATION or CHI2 (classification).

Returns

sorted in descending order of tuples with uncertainty and feature scores (i.e. p values in this case).

Return type

pandas DataFrame
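
A sketch of scoring all outcomes at once and visualizing the resulting DataFrame as a heatmap, assuming results is the return of perform_experiments() with scalar outcomes:

>>> import seaborn as sns
>>> from ema_workbench.analysis import feature_scoring
>>> experiments, outcomes = results
>>> scores = feature_scoring.get_feature_scores_all(experiments, outcomes)
>>> sns.heatmap(scores, annot=True, cmap='viridis')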

plotting

This module provides functions for generating some basic figures. The code can be used as is, or serve as an example for writing your own code.

ema_workbench.analysis.plotting.envelopes(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=None, fill=False, legend=True, titles={}, ylabels={}, log=False)

Make envelope plots.

An envelope shows the minimum and maximum value for a set of runs over time. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.

Parameters
  • experiments (DataFrame) –

  • outcomes (OutcomesDict) –

  • outcomes_to_show (list of str, optional) –

  • group_by (str, optional) – name of the column in the experiments to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional) –

  • fill (bool, optional) –

  • legend (bool, optional) –

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • log (bool, optional) – log scale density plot

Returns

  • Figure (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories in case of group_by, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.

Examples

>>> from ema_workbench.util import utilities as util
>>> data = util.load_results(r'1000 flu cases.cPickle')
>>> envelopes(data, group_by='policy')

will show an envelope for three different policies, for all the outcomes of interest, while

>>> envelopes(data, group_by='policy',
...           grouping_specifiers=['static policy', 'adaptive policy'])

will only show results for the two specified policies, ignoring any results associated with ‘no policy’.

ema_workbench.analysis.plotting.kde_over_time(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, colormap='viridis', log=True)

Plot a KDE over time. The KDE is visualized through a heatmap

Parameters
  • experiments (DataFrame) –

  • outcomes (OutcomesDict) –

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • colormap (str, optional) – valid matplotlib color map name

  • log (bool, optional) –

Returns

  • list of Figure instances – a figure instance for each group for each outcome

  • dict – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density

ema_workbench.analysis.plotting.lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, density='', legend=True, titles={}, ylabels={}, experiments_to_show=None, show_envelope=False, log=False)

This function takes the results from perform_experiments() and visualizes these as line plots. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.

Parameters
  • experiments (DataFrame) –

  • outcomes (OutcomesDict) –

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional) –

  • legend (bool, optional) –

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.

  • show_envelope (bool, optional) – show envelope of outcomes. This envelope is based on the minimum at each column and the maximum at each column.

  • log (bool, optional) – log scale density plot

Returns

  • fig (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories in case of group_by, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.
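
A minimal sketch, assuming experiments and outcomes contain time series data; Density is the enum documented under plotting_util below:

>>> from ema_workbench.analysis import plotting, plotting_util
>>> fig, axes = plotting.lines(experiments, outcomes,
...                            density=plotting_util.Density.KDE)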

ema_workbench.analysis.plotting.multiple_densities(experiments, outcomes, points_in_time=None, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=Density.KDE, legend=True, titles={}, ylabels={}, experiments_to_show=None, plot_type=PlotType.ENVELOPE, log=False, **kwargs)

Make an envelope plot with multiple density plots over the run time

Parameters
  • experiments (DataFrame) –

  • outcomes (OutcomesDict) –

  • points_in_time (list) – a list of points in time for which you want to see the density. At the moment up to 6 points in time are supported

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({Density.KDE, Density.HIST, Density.VIOLIN, Density.BOXPLOT}, optional) –

  • legend (bool, optional) –

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.

  • plot_type ({PlotType.ENVELOPE, PlotType.ENV_LIN, PlotType.LINES}, optional) –

  • log (bool, optional) –

Returns

  • fig (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories in case of group_by, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.

Note

the connection patches are for some reason not drawn if log scaling is used for the density plots. This appears to be an issue in matplotlib itself.

pairs_plotting

This module provides R-style pairs plotting functionality.

ema_workbench.analysis.pairs_plotting.pairs_density(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, point_in_time=-1, log=True, gridsize=50, colormap='coolwarm', filter_scalar=True)

Generate an R-style pairs hexbin density multiplot. In case of time-series data, the end states are used.

hexbin makes a hexagonal binning plot of x versus y, where x, y are 1-D sequences of the same length, N. If C is None (the default), this is a histogram of the number of occurrences of the observations at (x[i], y[i]). For further detail see the matplotlib documentation on hexbin

Parameters
  • experiments (DataFrame) –

  • outcomes (dict) –

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on the time axis

  • log (bool, optional) – indicating whether density should be log scaled. Defaults to True.

  • gridsize (int, optional) – controls the gridsize for the hexagonal binning. (default = 50)

  • colormap (str, optional) – color map that is to be used in generating the hexbin. For details on the available maps, see matplotlib. (default = coolwarm)

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to True.

Returns

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance

ema_workbench.analysis.pairs_plotting.pairs_lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, **kwargs)

Generate an R-style pairs lines multiplot. It shows the behavior of two outcomes over time against each other. The origin is denoted with a circle and the end is denoted with a ‘+’.

Parameters
  • experiments (DataFrame) –

  • outcomes (dict) –

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on the time axis

Returns

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance

ema_workbench.analysis.pairs_plotting.pairs_scatter(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, point_in_time=-1, filter_scalar=False, **kwargs)

Generate an R-style pairs scatter multiplot. In case of time-series data, the end states are used.

Parameters
  • experiments (DataFrame) –

  • outcomes (dict) –

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on the time axis

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to False.

Returns

  • fig (Figure instance) – the figure instance

  • axes (dict) – key is tuple of names of outcomes, value is associated axes instance

Note

the current implementation is limited to seven different categories in case of column, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.
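
A minimal sketch, assuming experiments and outcomes come from runs that included a ‘policy’ column:

>>> from ema_workbench.analysis import pairs_plotting
>>> fig, axes = pairs_plotting.pairs_scatter(experiments, outcomes,
...                                          group_by='policy')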

parcoords

This module offers a general purpose parallel coordinate plotting class using matplotlib.

class ema_workbench.analysis.parcoords.ParallelAxes(limits, formatter=None, fontsize=14, rot=90)

Base class for creating a parallel axis plot.

Parameters
  • limits (DataFrame) – A DataFrame specifying the limits for each dimension in the data set. For categorical data, the first cell should contain all categories. See get_limits for more details.

  • formatter (dict, optional) – dict with precision format strings for minima and maxima, use column name as key. If a column is not present, or no formatter dict is provided, precision formatting defaults to .2f

  • fontsize (int, optional) – fontsize for default text items

  • rot (float, optional) – rotation of axis labels

invert_axis(axis)

flip direction for specified axis

Parameters

axis (str or list of str) –

legend()

add a legend to the figure

plot(data, color=None, label=None, **kwargs)

plot data on parallel axes

Parameters
  • data (DataFrame or Series) –

  • color (valid mpl color, optional) –

  • label (str, optional) –

  • **kwargs – any additional kwargs will be passed to matplotlib’s plot method.

Data is normalized using the limits specified when initializing ParallelAxes.

ema_workbench.analysis.parcoords.get_limits(data)

helper function to get the limits of a DataFrame that can serve as input to ParallelAxes

Parameters

data (DataFrame) –

Return type

DataFrame
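
A self-contained sketch of the intended workflow: derive limits from the data, create the axes, and plot. The toy DataFrame is purely illustrative:

>>> import pandas as pd
>>> from ema_workbench.analysis import parcoords
>>> data = pd.DataFrame({'a': [0.1, 0.5, 0.9], 'b': [3.0, 1.0, 2.0]})
>>> limits = parcoords.get_limits(data)
>>> axes = parcoords.ParallelAxes(limits)
>>> axes.plot(data, color='steelblue', label='runs')
>>> axes.legend()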

b_and_w_plotting

This module offers basic functionality for converting a matplotlib figure to black and white. The provided functionality is largely determined by what is needed for the workbench.

ema_workbench.analysis.b_and_w_plotting.set_fig_to_bw(fig, style='hatching', line_style='continuous')

TODO it would be nice if for lines you can select either markers, gray scale, or simple black

Take each axes in the figure and transform its content to black and white. Lines are transformed based on different line styles. Fills, such as those used in envelopes, are gray-scaled. Heatmaps are also gray-scaled.

derived from and expanded for my use from: https://stackoverflow.com/questions/7358118/matplotlib-black-white-colormap-with-dashes-dots-etc

Parameters
  • fig (figure) – the figure to transform to black and white

  • style ({HATCHING, GREYSCALE}) – parameter controlling how collections are transformed.

  • line_style (str, {'continuous', 'black', None}) –
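
A minimal sketch, assuming fig is an existing matplotlib figure, for example one returned by the plotting functions above:

>>> from ema_workbench.analysis import b_and_w_plotting
>>> b_and_w_plotting.set_fig_to_bw(fig, style='hatching')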

plotting_util

Plotting utility functions

ema_workbench.analysis.plotting_util.COLOR_LIST = [(0.12156862745098039, 0.4666666666666667, 0.7058823529411765), (1.0, 0.4980392156862745, 0.054901960784313725), (0.17254901960784313, 0.6274509803921569, 0.17254901960784313), (0.8392156862745098, 0.15294117647058825, 0.1568627450980392), (0.5803921568627451, 0.403921568627451, 0.7411764705882353), (0.5490196078431373, 0.33725490196078434, 0.29411764705882354), (0.8901960784313725, 0.4666666666666667, 0.7607843137254902), (0.4980392156862745, 0.4980392156862745, 0.4980392156862745), (0.7372549019607844, 0.7411764705882353, 0.13333333333333333), (0.09019607843137255, 0.7450980392156863, 0.8117647058823529)]

Default color list

class ema_workbench.analysis.plotting_util.Density(value)

Enum for different types of density plots

BOXENPLOT = 'boxenplot'

constant for plotting density as a boxenplot

BOXPLOT = 'boxplot'

constant for plotting density as a boxplot

HIST = 'hist'

constant for plotting density as a histogram

KDE = 'kde'

constant for plotting density as a kernel density estimate

VIOLIN = 'violin'

constant for plotting density as a violin plot, which combines a Gaussian density estimate with a boxplot

class ema_workbench.analysis.plotting_util.LegendEnum(value)

Enum for different styles of legends

class ema_workbench.analysis.plotting_util.PlotType(value)

An enumeration.

ENVELOPE = 'envelope'

constant for plotting envelopes

ENV_LIN = 'env_lin'

constant for plotting envelopes with lines

LINES = 'lines'

constant for plotting lines

scenario_discovery_util

Scenario discovery utilities used by both cart and prim

class ema_workbench.analysis.scenario_discovery_util.RuleInductionType(value)

An enumeration.

BINARY = 'binary'

constant indicating binary classification mode. This is the most common used mode in scenario discovery

CLASSIFICATION = 'classification'

constant indicating classification mode

REGRESSION = 'regression'

constant indicating regression mode

Util

ema_exceptions

Exceptions and warnings used internally by the EMA workbench. In line with advice given in PEP 8.

exception ema_workbench.util.ema_exceptions.CaseError(message, case, policy=None)

error to be used when a particular run creates an error. The nature of the error can be specified in the message, together with the actual case that gave rise to the error.

exception ema_workbench.util.ema_exceptions.EMAError(*args)

Base EMA error

exception ema_workbench.util.ema_exceptions.EMAParallelError(*args)

parallel EMA error

exception ema_workbench.util.ema_exceptions.EMAWarning(*args)

base EMA warning class

ema_logging

This module contains code for logging EMA processes. It is modeled on the default logging approach that comes with Python. This logging system will also work in case of multiprocessing.

ema_workbench.util.ema_logging.get_rootlogger()

Returns root logger used by the EMA workbench

Return type

the logger of the EMA workbench

ema_workbench.util.ema_logging.log_to_stderr(level=None)

Turn on logging and add a handler which prints to stderr

Parameters

level (int) – minimum level of the messages that will be logged
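
Typical usage before running experiments:

>>> from ema_workbench.util import ema_logging
>>> ema_logging.log_to_stderr(ema_logging.INFO)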

ema_workbench.util.ema_logging.temporary_filter(name='EMA', level=0, functname=None)

temporarily filter log messages

Parameters
  • name (str or list of str, optional) – logger on which to apply the filter. Note that all modules have their own unique logger (e.g. ema_workbench.analysis.prim)

  • level (int, or list of int, optional) – don’t log messages of this level or lower

  • functname (str or list of str, optional) – don’t log messages from this function

utilities

This module provides various convenience functions and classes.

ema_workbench.util.utilities.load_results(file_name)

load the results from the specified file. The file is assumed to have been saved using save_results.

Parameters

file_name (str) – the path to the file

Raises

IOError – if file not found

ema_workbench.util.utilities.merge_results(results1, results2)

convenience function for merging the return from perform_experiments().

The function merges results2 with results1. For the experiments, it generates an empty array equal to the size of the sum of the experiments. As dtype it uses the dtype from the experiments in results1. The function assumes that the ordering of dtypes and names is identical in both results.

A typical use case for this function is in combination with experiments_to_cases(). Using experiments_to_cases() one extracts the cases from a first set of experiments. One then performs these cases on a different model or policy, and then one wants to merge these new results with the old result for further analysis.

Parameters
  • results1 (tuple) – first results to be merged

  • results2 (tuple) – second results to be merged

Return type

the merged results

ema_workbench.util.utilities.process_replications(data, aggregation_func=<function mean>)

Convenience function for processing the replications of a stochastic model’s outcomes.

The default behavior is to take the mean of the replications. This reduces the dimensionality of the outcomes from (experiments * replications * outcome_shape) to (experiments * outcome_shape), where outcome_shape is 0-d for scalars, 1-d for time series, and 2-d for arrays.

The function can take either the outcomes (dictionary: keys are outcomes of interest, values are arrays of data) or the results (tuple: experiments as DataFrame, outcomes as dictionary) of a set of simulation experiments.

Parameters
  • data (dict, tuple) – outcomes or results of a set of experiments

  • aggregation_func (callable, optional) – aggregation function to be applied, defaults to np.mean.

Return type

dict, tuple
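
A minimal sketch, assuming outcomes is a dict of arrays with a replication dimension:

>>> from ema_workbench.util import utilities
>>> averaged_outcomes = utilities.process_replications(outcomes)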

ema_workbench.util.utilities.save_results(results, file_name)

save the results to the specified tar.gz file.

How results are stored depends on their type. Experiments are saved as csv. Outcomes are stored depending on the outcome type: scalars and <3D arrays are saved as csv files, while higher dimensional arrays are stored as .npy files.

Parameters
  • results (tuple) – the return of perform_experiments

  • file_name (str) – the path of the file

Raises

IOError – if file not found
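
A round-trip sketch, assuming results is the return of perform_experiments(); the file name is illustrative:

>>> from ema_workbench.util.utilities import save_results, load_results
>>> save_results(results, 'runs.tar.gz')
>>> results = load_results('runs.tar.gz')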
