EMA Workbench documentation

Release:

3.0

Date:

Apr 17, 2024

Exploratory Modelling and Analysis (EMA) Workbench

Exploratory Modeling and Analysis (EMA) is a research methodology that uses computational experiments to analyze complex and uncertain systems (Bankes, 1993). That is, exploratory modeling aims to offer computational decision support for decision making under deep uncertainty, including Robust Decision Making.

The EMA workbench aims at providing support for performing exploratory modeling with models developed in various modelling packages and environments. Currently, the workbench offers connectors to Vensim, Netlogo, and Excel.

The EMA workbench offers support for designing experiments, performing the experiments (including support for parallel processing, both on a single machine and on clusters), and analysing the results. To get started, take a look at the high-level overview, the tutorial, or dive straight into the details of the API. For a comparison between the workbench and rhodium, see this discussion.

A High Level Overview

Exploratory modeling framework

The core package contains the core functionality for setting up, designing, and performing series of computational experiments on one or more models simultaneously.

Connectors

The connectors package contains connectors to some existing simulation modeling environments. For each of these, a standard ModelStructureInterface class is provided that users can use as a starting point for specifying the interface to their own model.

  • Vensim connector (ema_workbench.connectors.vensim): This enables controlling (e.g. setting parameters, simulation setup, run, get output, etc.) a simulation model built in Vensim, and conducting an EMA study based on this model.

  • Excel connector (ema_workbench.connectors.excel): This enables controlling models built in Excel.

  • NetLogo connector (ema_workbench.connectors.netlogo): This enables controlling (e.g. setting parameters, simulation setup, run, get output, etc.) a simulation model built in NetLogo, and conducting an EMA study based on this model.

  • Simio connector (ema_workbench.connectors.simio_connector): This enables controlling models built in Simio.

  • Pysd connector (ema_workbench.connectors.pysd_connector)

Analysis

The analysis package contains a variety of analysis and visualization techniques for analyzing the results from the exploratory modeling. The analysis scripts are tailored for use in combination with the workbench, but they can also be used on their own with data generated in some other manner.

Installing the workbench

From version 2.5.0 onwards, the workbench requires Python 3.9 or higher. Versions 2.0.0 to 2.4.x support Python 3.8 and up.

Regular installations

A stable version of the workbench can be installed via pip.

pip install ema_workbench

This installs the EMAworkbench together with all the bare necessities to run Python models.

If you want to upgrade the workbench from a previous version, add -U (or --upgrade) to the pip command.

pip install -U ema_workbench

We have a few more install options available, which install optional dependencies that are not always necessary, but are either nice to have or needed for specific functionality. These can be installed as so-called “extras” using pip.

We recommend installing with:

pip install -U ema_workbench[recommended]

This currently includes everything needed to use the workbench in Jupyter notebooks, with interactive graphs, and to successfully complete the tests with pytest.

However, the EMAworkbench can also connect to other simulation software, such as NetLogo, Simio, Vensim (via pysd), and Vadere. For these there are also extras available:

pip install -U ema_workbench[netlogo,simio,pysd]

Note that the NetLogo and Simio extras require Windows as the operating system.

These extras can be combined. If you’re going to work with Netlogo for example, you can do:

pip install -U ema_workbench[recommended,netlogo]

You can use all to install all dependencies, except the connectors. Prepare for a large install.

pip install -U ema_workbench[all]

These are all the extras that are available:

  • jupyter installs ["jupyter", "jupyter_client", "ipython", "ipykernel"]

  • dev installs ["pytest", "jupyter_client", "ipyparallel"]

  • cov installs ["pytest-cov", "coverage", "coveralls"]

  • docs installs ["sphinx", "nbsphinx", "myst", "pyscaffold"]

  • graph installs ["altair", "pydot", "graphviz"]

  • parallel installs ["ipyparallel", "traitlets"]

  • netlogo installs ["jpype-1", "pynetlogo"]

  • pysd installs ["pysd"]

  • simio installs ["pythonnet"]

The recommended extra is currently equivalent to jupyter,dev,graph, and all installs everything except the connectors. These extras are defined in the pyproject.toml file.
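As a sketch of how such extras are declared (an illustrative fragment based on the lists above, not a verbatim copy of the workbench's actual pyproject.toml), the optional dependency groups map onto a table like:

```toml
[project.optional-dependencies]
# each key is an "extra" that can be requested as: pip install ema_workbench[key]
jupyter = ["jupyter", "jupyter_client", "ipython", "ipykernel"]
dev = ["pytest", "jupyter_client", "ipyparallel"]
graph = ["altair", "pydot", "graphviz"]
netlogo = ["jpype-1", "pynetlogo"]
pysd = ["pysd"]
simio = ["pythonnet"]
```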

Developer installations

As a developer you will want an editable install, in which you can modify the installation itself. This can be done by adding -e (for editable) to the pip command.

pip install -e ema_workbench

The latest commit on the master branch can be installed with:

pip install -e git+https://github.com/quaquel/EMAworkbench#egg=ema-workbench

Or any other (development) branch on this repo or your own fork:

pip install -e git+https://github.com/YOUR_FORK/EMAworkbench@YOUR_BRANCH#egg=ema-workbench

The code is also available from github.

Limitations

Some connectors have specific limitations, listed below.

  • Vensim only works on Windows. If you have 64-bit Vensim, you need 64-bit Python. If you have 32-bit Vensim, you will need 32-bit Python.

  • Excel also only works on Windows.
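Whether your Python interpreter is 32-bit or 64-bit can be checked from Python itself; one common way uses the size of a C pointer:

```python
import struct

# a C pointer ("P") is 4 bytes on 32-bit Python and 8 bytes on 64-bit Python
bits = struct.calcsize("P") * 8
print(f"This interpreter is {bits}-bit")
```

On a 64-bit interpreter this prints 64; pair that with the matching Vensim bitness.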

Alternative packages

So how does the workbench differ from other open source tools available for exploratory modeling? In Python there is rhodium, in R there is the open MORDM toolkit, and there is also OpenMole. Below we discuss the key differences with rhodium. Given a lack of familiarity with the other tools, we won't comment on those.

The workbench versus rhodium

For Python, the main alternative tool is rhodium, which is part of Project Platypus. Project Platypus is a collection of libraries for many-objective optimization (platypus-opt), setting up and performing simulation experiments (rhodium), and scenario discovery using the Patient Rule Induction Method (prim). The relationship between the workbench and the tools that form Project Platypus is a bit messy. For example, the workbench too relies on platypus-opt for many-objective optimization, the PRIM package is a by now very dated fork of the prim code in the workbench, and both rhodium and the workbench rely on SALib for global sensitivity analysis. Moreover, the API of rhodium was partly inspired by an older version of the workbench, while new ideas from the rhodium API have in turn resulted in profound changes in the API of the workbench.

Currently, the workbench is still actively being developed. It is not just used in teaching but also for research, and in practice by various organizations globally. Moreover, the workbench is quite a bit more developed when it comes to providing off-the-shelf connectors for some popular modeling and simulation tools. Basically, everything that can be done with Project Platypus can be done with the workbench, while the workbench offers additional functionality, a more up-to-date code base, and active support.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.5.1

The 2.5.1 release is a small patch release with two bugfixes.

  • The first PR (#346) corrects a dependency issue where the finalizer in futures_util.py incorrectly assumed the presence of experiment_runner in its module namespace, leading to failures in futures modules like multiprocessing and MPI. This is resolved by adjusting the finalizer function to expect experiment_runner as an argument.

  • The second PR (#349) addresses a redundancy problem in the MPI evaluator, where the pool was inadvertently created twice.

What’s Changed

🐛 Bugs fixed
  • Fix finalizer dependency on global experiment_runner by @quaquel in https://github.com/quaquel/EMAworkbench/pull/346

  • bug fix in MPI evaluator by @quaquel in https://github.com/quaquel/EMAworkbench/pull/349

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.5.0...2.5.1

2.5.0

Highlights

In the 2.5.0 release of the EMAworkbench we introduce a new experimental MPIEvaluator to run on multi-node (HPC) systems (#299, #328). We would love feedback on it in #311.

Furthermore, the pair plots for scenario discovery now allow contour plots and bivariate histograms (#288). When doing Prim, you can inspect multiple boxes and display them in a single figure (#317).

Breaking changes

From 3.0 onwards, the names of parameters, constants, constraints, and outcomes must be valid Python identifiers. From this version onwards, a DeprecationWarning is raised if a name is not a valid Python identifier.
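For reference, Python's built-in str.isidentifier() can be used to check whether a name satisfies this requirement:

```python
# valid: letters, digits, and underscores, not starting with a digit
assert "total_cost".isidentifier()
assert "x1".isidentifier()

# invalid: spaces and leading digits are not allowed in identifiers
assert not "total cost".isidentifier()
assert not "1st_outcome".isidentifier()
```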

What’s Changed

🎉 New features added
  • Improved pair plots for scenario discovery by @steipatr in https://github.com/quaquel/EMAworkbench/pull/288

  • Introducing MPIEvaluator: Run on multi-node HPC systems using mpi4py by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/299

  • inspect multiple boxes and display them in a single figure by @quaquel in https://github.com/quaquel/EMAworkbench/pull/317

🛠 Enhancements made
  • Enhancement for #271: raise exception by @quaquel in https://github.com/quaquel/EMAworkbench/pull/282

  • em_framework/points: Add string representation to Experiment class by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/297

  • Speed up of plot_discrete_cdfs by 2 orders of magnitude by @quaquel in https://github.com/quaquel/EMAworkbench/pull/306

  • em_framework: Improve log messages, warning and errors by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/300

  • analysis: Improve log messages, warning and errors by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/313

  • change to log message and log level in feature scoring by @quaquel in https://github.com/quaquel/EMAworkbench/pull/318

  • [WIP] MPI update by @quaquel in https://github.com/quaquel/EMAworkbench/pull/328

🐛 Bugs fixed
  • Fix search in Readthedocs configuration with workaround by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/264

  • bugfix introduced by #241 in general-introduction from docs by @quaquel in https://github.com/quaquel/EMAworkbench/pull/265

  • prim: Replace deprecated Altair function by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/270

  • bugfix to rebuild_platypus_population by @quaquel in https://github.com/quaquel/EMAworkbench/pull/276

  • bugfix for #277 : load_results properly handles experiments dtypes by @quaquel in https://github.com/quaquel/EMAworkbench/pull/280

  • fixes a bug where binomtest fails because of floating point math by @quaquel in https://github.com/quaquel/EMAworkbench/pull/315

  • make workbench compatible with latest version of pysd by @quaquel in https://github.com/quaquel/EMAworkbench/pull/336

  • bugfixes for string vs bytestring by @quaquel in https://github.com/quaquel/EMAworkbench/pull/339

📜 Documentation improvements
  • Drop Python 3.8 support, require 3.9+ by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/259

  • readthedocs: Add search ranking and use latest Python version by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/242

  • docs/examples: Always use n_processes=-1 in MultiprocessingEvaluator by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/278

  • Docs: Add MPIEvaluator tutorial for multi-node HPC systems, including DelftBlue by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/308

  • Add Mesa example by @steipatr in https://github.com/quaquel/EMAworkbench/pull/335

  • Fix htmltheme of docs by @quaquel in https://github.com/quaquel/EMAworkbench/pull/342

🔧 Maintenance
  • CI: Switch default jobs to Python 3.12 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/314

  • Reorganization of evaluator code and renaming of modules by @quaquel in https://github.com/quaquel/EMAworkbench/pull/320

  • Replace deprecated zmq.eventloop.ioloop with Tornado’s ioloop by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/334

Other changes
  • examples: Speedup the lake_problem function by ~30x by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/301

  • Create an GitHub issue chooser by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/331

  • Depracation warning for parameter names not being valid python identifiers by @quaquel in https://github.com/quaquel/EMAworkbench/pull/337

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.4.0...2.5.0

2.4.1

Highlights

2.4.1 is a small patch release of the EMAworkbench, primarily resolving issues #276 and #277 in the workbench itself, and a bug introduced by #241 in the docs. The EMAworkbench now also raises an exception when sampling scenarios or policies while no uncertainties or levers are defined (#282).

What’s Changed

🛠 Enhancements made
  • Enhancement for #271: raise exception by @quaquel in https://github.com/quaquel/EMAworkbench/pull/282

🐛 Bugs fixed
  • bugfix to rebuild_platypus_population by @quaquel in https://github.com/quaquel/EMAworkbench/pull/276

  • Fixed dtype handling in load_results function. The dtype metadata is now correctly applied, resolving issue #277.

  • Fixed the documentation bug introduced by #241 in the general introduction section, which now accurately reflects the handling of categorical uncertainties in the experiment dataframe.

📜 Documentation improvements
  • readthedocs: Add search ranking and use latest Python version by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/242

  • docs/examples: Always use n_processes=-1 in MultiprocessingEvaluator by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/278

2.4.0

Highlights

The latest release of the EMAworkbench introduces significant performance improvements and quality-of-life updates. The performance of _store_outcomes has been enhanced by approximately 35x in pull request #232, while the combine function has seen an 8x speedup in pull request #233. This reduces the overhead of the EMAworkbench by over 70%. In a benchmark, a very simple Python model now performs approximately 40,000 iterations per second, compared to 15,000 in 2.3.0.

In addition to these performance upgrades, the examples have been added to the ReadTheDocs documentation, more documentation improvements have been made and many bugs and deprecations have been fixed.

The 2.4.x release series requires Python 3.8 and is tested on 3.8 to 3.11. It’s the last release series supporting Python 3.8. It can be installed as usual via PyPI, with:

pip install --upgrade ema-workbench

What’s Changed

🎉 New features added
  • optional preallocation in callback based on outcome shape and type by @quaquel in https://github.com/quaquel/EMAworkbench/pull/229

🛠 Enhancements made
  • util: Speed up combine by ~8x by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/233

  • callbacks: Improve performance of _store_outcomes by ~35x by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/232

🐛 Bugs fixed
  • fixes broken link to installation instructions by @quaquel in https://github.com/quaquel/EMAworkbench/pull/224

  • Docs: Fix developer installation commands by removing a space by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/220

  • fixes a bug where Prim modifies the experiments array by @quaquel in https://github.com/quaquel/EMAworkbench/pull/228

  • bugfix for warning on number of processes and max_processes by @quaquel in https://github.com/quaquel/EMAworkbench/pull/234

  • Fix deprecation warning and dtype issue in flu_example.py by @quaquel in https://github.com/quaquel/EMAworkbench/pull/235

  • test for get_results and categorical fix by @quaquel in https://github.com/quaquel/EMAworkbench/pull/241

  • Fix pynetlogo imports by decapitalizing pyNetLogo by @quaquel in https://github.com/quaquel/EMAworkbench/pull/248

  • change default value of um_p to be consistent with Borg documentation by @irene-sophia in https://github.com/quaquel/EMAworkbench/pull/250

  • Fix pretty print for RealParameter and IntegerParameter by @quaquel in https://github.com/quaquel/EMAworkbench/pull/255

  • Fix bug in AutoadaptiveOutputSpaceExploration with wrong default probabilities by @quaquel in https://github.com/quaquel/EMAworkbench/pull/252

📜 Documentation improvements
  • Parallexaxis doc by @quaquel in https://github.com/quaquel/EMAworkbench/pull/249

  • Examples added to the docs by @quaquel in https://github.com/quaquel/EMAworkbench/pull/244

🔧 Maintenance
  • clusterer: Update AgglomerativeClustering keyword to fix deprecation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/218

  • Fix Matplotlib and SciPy deprecations by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/227

  • CI: Add job that runs tests with pre-release dependencies by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/217

  • Fix for stalling tests by @quaquel in https://github.com/quaquel/EMAworkbench/pull/247

Other changes
  • add metric argument to allow for other linkages by @mikhailsirenko in https://github.com/quaquel/EMAworkbench/pull/222

New Contributors

  • @mikhailsirenko made their first contribution in https://github.com/quaquel/EMAworkbench/pull/222

  • @irene-sophia made their first contribution in https://github.com/quaquel/EMAworkbench/pull/250

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.3.0...2.4.0

2.3.0

Highlights

This release adds a new algorithm for output space exploration. The way in which convergence tracking for optimization is supported has been overhauled completely, see the updated directed search user guide for full details. The documentation has moreover been expanded with a comparison to Rhodium.

With this new release, the installation process has been improved by reducing the number of required dependencies. Recommended packages and connectors can now be installed as extras using pip, for example pip install -U ema-workbench[recommended,netlogo]. See the updated installation instructions for all options and details.

The 2.3.x release series supports Python 3.8 to 3.11. It can be installed as usual via PyPI, with:

pip install --upgrade ema-workbench

What’s Changed

🎉 New features added
  • Output space exploration by @quaquel in https://github.com/quaquel/EMAworkbench/pull/170

  • Convergence tracking by @quaquel in https://github.com/quaquel/EMAworkbench/pull/193

🛠 Enhancements made

  • Switch to using format string in prim logging by @quaquel in https://github.com/quaquel/EMAworkbench/pull/161

  • Replace setup.py with pyproject.toml and implement optional dependencies by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/166

🐛 Bugs fixed
  • use masked arrays for storing outcomes by @quaquel in https://github.com/quaquel/EMAworkbench/pull/176

  • Fix error for negative n_processes input in MultiprocessingEvaluator by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/189

  • optimization.py: Fix “epsilons” keyword argument in _optimize() by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/150

📜 Documentation improvements
  • Create initial CONTRIBUTING.md documentation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/162

  • Create Read the Docs yaml configuration by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/173

  • update to outcomes documentation by @quaquel in https://github.com/quaquel/EMAworkbench/pull/183

  • Improved directed search tutorial by @quaquel in https://github.com/quaquel/EMAworkbench/pull/194

  • Update Contributing.md with instructions how to merge PRs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/200

  • Update Readme with an introduction and documentation, installation and contribution sections by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/199

  • Rhodium docs by @quaquel in https://github.com/quaquel/EMAworkbench/pull/184

  • Fix spelling mistakes by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/195

🔧 Maintenance
  • Replace depreciated shade keyword in Seaborn kdeplot with fill by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/169

  • CI: Add pip depencency caching, don’t run on doc changes, update setup-python to v4 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/174

  • Formatting: Format with Black, increase max line length to 100, combine multi-line blocks by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/178

  • Add pre-commit configuration and auto update CI by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/181

  • Fix Matplotlib, ipyparallel and dict deprecation warnings by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/202

  • CI: Start testing on Python 3.11 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/156

  • Replace deprecated saltelli with sobol SALib 1.4.6+. by @quaquel in https://github.com/quaquel/EMAworkbench/pull/211

Other changes
  • Adds CITATION.cff by @quaquel in https://github.com/quaquel/EMAworkbench/pull/209

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.2.0...2.3

2.2.0

Highlights

With the 2.2 release, the EMAworkbench can now connect to Vadere models of pedestrian dynamics. When inspecting a Prim Box peeling trajectory, multiple points on the peeling trajectory can be inspected simultaneously by inputting a list of integers into PrimBox.inspect().

When running experiments with multiprocessing using the MultiprocessingEvaluator, the number of processes can now be controlled using a negative integer as input for n_processes (for example, -2 on a 12-thread CPU results in 10 threads used). It will also now default to a maximum of 61 processes on Windows machines, due to limitations inherent in Windows in dealing with higher processor counts. Code quality, CI, and error reporting have also been improved. And finally, generating these release notes is now automated.

What’s Changed

🎉 New features added
  • Vadere model connector by @floristevito in https://github.com/quaquel/EMAworkbench/pull/145

🛠 Enhancements made
  • Improve code quality with static analysis by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/119

  • prim.py: Make PrimBox.peeling_trajectory["id"] int instead of float by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/121

  • analysis: Allow optional annotation of plot_tradeoff graphs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/123

  • evaluators.py: Allow MultiprocessingEvaluator to initialize with cpu_count minus N processes by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/140

  • PrimBox.inspect() now can also take a list of integers (aside from a single int) to inspect multiple points at once by @quaquel in https://github.com/quaquel/EMAworkbench/commit/6d83a6c33442ad4dce0974a384b03a225aaf830d (see also issue https://github.com/quaquel/EMAworkbench/issues/124)

🐛 Bugs fixed
  • fixed typo in lake_model.py by @JeffreyDillonLyons in https://github.com/quaquel/EMAworkbench/pull/136

📜 Documentation improvements
  • Docs: Installation.rst: Add how to install master or custom branch by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/122

  • Docs: Replace all http links with secure https URLs by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/134

  • Maintain release notes at CHANGELOG.md and include them in Readthedocs by @quaquel in https://github.com/quaquel/EMAworkbench/commit/ebdbc9f5c77693fc75911ead472b420065dfe2aa

  • Fix badge links in readme by @quaquel in https://github.com/quaquel/EMAworkbench/commit/28569bdcb149c070c329589969179be354b879ec

🔧 Maintenance
  • feature_scoring: fix Regressor criterion depreciation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/125

  • feature_scoring.py: Change max_features in get_rf_feature_scores to "sqrt" by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/129

  • CI: Use Pytest instead of Nose, update default build to Python 3.10 by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/131

  • Release CI: Only upload packages if on main repo by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/132

  • CI: Split off flake8 linting in a separate job by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/133

  • CI: Add weekly scheduled jobs and manual trigger by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/137

  • setup.py: Add project_urls for documentation and issue tracker links by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/142

  • set scikit-learn requirement >= 1.0.0 by @rhysits in https://github.com/quaquel/EMAworkbench/pull/144

  • Create release.yml file for automatic release notes generation by @EwoutH in https://github.com/quaquel/EMAworkbench/pull/152

  • instantiating an Evaluator without one or more AbstractModel instances now raises a type error by @quaquel in https://github.com/quaquel/EMAworkbench/commit/a83533aa8166ca2414137cdfc3125a53ee3697ec

  • removes depreciated DataFrame.append by replacing it with DataFrame.concat (see the conversation on issue https://github.com/quaquel/EMAworkbench/issues/126):

    • from feature scoring by @quaquel in https://github.com/quaquel/EMAworkbench/commit/8b8bfe41733e49b75c01e34b75563e0a6d5b4024

    • from logistic_regression.py by @quaquel in https://github.com/quaquel/EMAworkbench/commit/255e3d6d9639dfe6fd4e797e1c63d59ba0522c2d

  • removes NumPy datatypes deprecated in 1.20 by @quaquel in https://github.com/quaquel/EMAworkbench/commit/e8fbf6fc64f14b7c7220fa4d3fc976c42d3757eb

  • replace deprecated scipy.stats.kde with scipy.stats by @quaquel in https://github.com/quaquel/EMAworkbench/commit/b5a9ca967740e74d503281018e88d6b28e74a27d

New Contributors

  • @JeffreyDillonLyons made their first contribution in https://github.com/quaquel/EMAworkbench/pull/136

  • @rhysits made their first contribution in https://github.com/quaquel/EMAworkbench/pull/144

  • @floristevito made their first contribution in https://github.com/quaquel/EMAworkbench/pull/145

Full Changelog: https://github.com/quaquel/EMAworkbench/compare/2.1.2...2.2.0

Tutorials

The code of these examples can be found in the examples package. The first three examples are meant to illustrate the basics of the EMA workbench: how to implement a model, specify its uncertainties and outcomes, and run it. The fourth example is a more extensive illustration based on Pruyt & Hamarat (2010). It shows some more advanced possibilities of the EMA workbench, including one way of handling policies.

A simple model in Python

The simplest case is where we have a model available through a Python function. For example, imagine we have the following simple model:

def some_model(x1=None, x2=None, x3=None):
    return {'y':x1*x2+x3}

In order to control this model from the workbench, we can make use of the Model class. We instantiate a model object by passing it a name and the function.

model = Model('simpleModel', function=some_model) #instantiate the model

Next, we need to specify the uncertainties and the outcomes of the model. In this case, the uncertainties are x1, x2, and x3, while the outcome is y. Both uncertainties and outcomes are attributes of the model object, so we can say

# specify uncertainties
model.uncertainties = [RealParameter("x1", 0.1, 10),
                       RealParameter("x2", -0.01, 0.01),
                       RealParameter("x3", -0.01, 0.01)]
# specify outcomes
model.outcomes = [ScalarOutcome('y')]

Here, we specify that x1 is some value between 0.1 and 10, while both x2 and x3 are somewhere between -0.01 and 0.01. Having implemented this model, we can now investigate the model behavior over the set of uncertainties by simply calling

results = perform_experiments(model, 100)

The function perform_experiments() takes the model we just specified and will execute 100 experiments. By default, these experiments are generated using Latin Hypercube sampling, but Monte Carlo sampling and full factorial sampling are also readily available. Read the documentation for perform_experiments() for more details.
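The idea behind Latin Hypercube sampling can be shown with a minimal pure-Python sketch (an illustration of the sampling scheme only, not the workbench's implementation): each parameter range is split into N equally sized strata, one value is drawn from each stratum, and the draws are shuffled independently per parameter.

```python
import random

def latin_hypercube(n, bounds, seed=1):
    """Draw n design points; each parameter gets exactly one value per stratum."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        width = (high - low) / n
        # one uniform draw inside each of the n strata
        column = [low + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(column)  # decorrelate the parameters
        columns[name] = column
    # zip the shuffled columns into n experiment dicts
    return [{name: col[i] for name, col in columns.items()} for i in range(n)]

designs = latin_hypercube(4, {"x1": (0.1, 10), "x2": (-0.01, 0.01)})
```

Each parameter's four values land in four distinct strata of its range, which is what gives Latin Hypercube designs better coverage than plain Monte Carlo sampling for the same number of runs.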

The complete code:

"""
Created on 20 dec. 2010

This file illustrates the use of the EMA classes for a contrived example.
Its main purpose has been to test the parallel processing functionality.

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
"""

from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging, perform_experiments


def some_model(x1=None, x2=None, x3=None):
    return {"y": x1 * x2 + x3}


if __name__ == "__main__":
    ema_logging.LOG_FORMAT = "[%(name)s/%(levelname)s/%(processName)s] %(message)s"
    ema_logging.log_to_stderr(ema_logging.INFO)

    model = Model("simpleModel", function=some_model)  # instantiate the model

    # specify uncertainties
    model.uncertainties = [
        RealParameter("x1", 0.1, 10),
        RealParameter("x2", -0.01, 0.01),
        RealParameter("x3", -0.01, 0.01),
    ]
    # specify outcomes
    model.outcomes = [ScalarOutcome("y")]

    results = perform_experiments(model, 100)

A simple model in Vensim

Imagine we have a very simple Vensim model:

[image: simpleVensimModel.png (a simple Vensim model)]

For this example, we assume that ‘x11’ and ‘x12’ are uncertain. The state variable ‘a’ is the outcome of interest. Similar to the previous example, we first have to instantiate a Vensim model object, in this case VensimModel. To this end, we need to specify the directory in which the Vensim file resides, the name of the Vensim file, and the name of the model.

wd = r'./models/vensim example'
model = VensimModel("simpleModel", wd=wd, model_file='model.vpm')

Next, we can specify the uncertainties and the outcomes.

model.uncertainties = [RealParameter("x11", 0, 2.5),
                       RealParameter("x12", -2.5, 2.5)]


model.outcomes = [TimeSeriesOutcome('a')]

Note that we are using a TimeSeriesOutcome, because Vensim results are time series. We can now simply run this model by calling perform_experiments().

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(1000)

We now use an evaluator, which ensures that the code is executed in parallel.
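Conceptually, a multiprocessing evaluator distributes the experiments over a pool of worker processes; the stdlib sketch below conveys the idea (a simplified illustration with a hypothetical model function, not the workbench's actual code):

```python
from concurrent.futures import ProcessPoolExecutor

def run_experiment(params):
    # stand-in for a single model run (hypothetical model, for illustration only)
    return {"y": params["x1"] * params["x2"] + params["x3"]}

def run_in_parallel(experiments):
    # hand each experiment to a worker process and collect the outcomes
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_experiment, experiments))
```

The workbench's evaluators add bookkeeping on top of this, such as tracking which experiment produced which outcome and cleaning up worker state.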

It is generally good practice to first run a model a small number of times sequentially prior to running in parallel. In this way, bugs and the like can be spotted more easily. To further help with keeping track of what is going on, it is also good practice to make use of the logging functionality provided by the workbench.

ema_logging.log_to_stderr(ema_logging.INFO)

Typically, this line appears at the start of the script. When executing the code, messages on progress or on errors will be shown.
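Under the hood, ema_logging builds on Python's standard logging module; the effect of log_to_stderr is broadly similar to this stdlib sketch (a simplified approximation, not the workbench's actual code):

```python
import logging
import sys

def log_to_stderr(level=logging.INFO):
    """Attach a stderr handler with an EMA-style format to a named logger."""
    logger = logging.getLogger("EMA")
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("[%(name)s/%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

logger = log_to_stderr()
logger.info("performing experiments")  # shows progress messages on stderr
```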

The complete code

"""
Created on 3 Jan. 2011

This file illustrates the use of the EMA classes for a contrived Vensim
example.


.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
                chamarat <c.hamarat (at) tudelft (dot) nl>
"""

from ema_workbench import TimeSeriesOutcome, perform_experiments, RealParameter, ema_logging

from ema_workbench.connectors.vensim import VensimModel

if __name__ == "__main__":
    # turn on logging
    ema_logging.log_to_stderr(ema_logging.INFO)

    # instantiate a model
    wd = "./models/vensim example"
    vensimModel = VensimModel("simpleModel", wd=wd, model_file="model.vpm")
    vensimModel.uncertainties = [RealParameter("x11", 0, 2.5), RealParameter("x12", -2.5, 2.5)]

    vensimModel.outcomes = [TimeSeriesOutcome("a")]

    results = perform_experiments(vensimModel, 1000)

A simple model in Excel

In order to perform EMA on an Excel model, one can use the ExcelModel class. This class makes use of named cells in Excel to refer to them directly. That is, we can assume that the names of the uncertainties correspond to named cells in Excel, and similarly, that the names of the outcomes correspond to named cells or ranges of cells in Excel. When using this class, make sure that the decimal separator and thousands separator are set correctly in Excel. This can be checked via File > Options > Advanced. These separators should follow the Anglo-Saxon convention.

"""
Created on 27 Jul. 2011

This file illustrates the use of the EMA classes for a model in Excel.

It uses the Excel file provided by
`A. Sharov <https://home.comcast.net/~sharov/PopEcol/lec10/fullmod.html>`_

This Excel file implements a simple predator prey model.

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
"""

from ema_workbench import RealParameter, TimeSeriesOutcome, ema_logging, perform_experiments

from ema_workbench.connectors.excel import ExcelModel
from ema_workbench.em_framework.evaluators import MultiprocessingEvaluator

if __name__ == "__main__":
    ema_logging.log_to_stderr(level=ema_logging.INFO)

    model = ExcelModel("predatorPrey", wd="./models/excelModel", model_file="excel example.xlsx")
    model.uncertainties = [
        # we can refer to a cell in the normal way ("K2"),
        # or use named cells ("KKK" etc.)
        RealParameter("K2", 0.01, 0.2),
        RealParameter("KKK", 450, 550),
        RealParameter("rP", 0.05, 0.15),
        RealParameter("aaa", 0.00001, 0.25),
        RealParameter("tH", 0.45, 0.55),
        RealParameter("kk", 0.1, 0.3),
    ]

    # specification of the outcomes
    model.outcomes = [
        # we can refer to a range in the normal way ("B4:B1076"),
        # or use a named range ("P_t")
        TimeSeriesOutcome("B4:B1076"),
        TimeSeriesOutcome("P_t"),
    ]

    # name of the sheet
    model.default_sheet = "Sheet1"

    with MultiprocessingEvaluator(model) as evaluator:
        results = perform_experiments(model, 100, reporting_interval=1, evaluator=evaluator)

The example is relatively straightforward. We instantiate an Excel model and specify the uncertainties and the outcomes. We also need to specify the sheet in Excel on which the model resides. Next, we can call perform_experiments().

Warning

When using named cells, make sure that the names are defined at the sheet level and not at the workbook level.

A more elaborate example: Mexican Flu

This example is derived from Pruyt & Hamarat (2010). This paper presents a small exploratory System Dynamics model related to the dynamics of the 2009 flu pandemic, also known as the Mexican flu, swine flu, or A(H1N1)v. The model was developed in May 2009 in order to quickly foster understanding about the possible dynamics of this new flu variant and to perform rough-cut policy explorations. Later, the model was also used to further develop and illustrate Exploratory Modelling and Analysis.

Mexican Flu: the basic model

In the first days, weeks and months after the first reports about the outbreak of a new flu variant in Mexico and the USA, much remained unknown about the possible dynamics and consequences of the at the time plausible/imminent epidemic/pandemic of the new flu variant, first known as Swine or Mexican flu and known today as Influenza A(H1N1)v.

The exploratory model presented here is small, simple, high-level, data-poor (no complex/special structures nor detailed data beyond crude guestimates), and history-poor.

The modelled world is divided into three regions: the Western World, the densely populated Developing World, and the scarcely populated Developing World. Only the first two regions are included in the model, because it is assumed that the scarcely populated regions are causally less important for the dynamics of flu pandemics. The figure below shows the basic stock-and-flow structure. For a more elaborate description of the model, see Pruyt & Hamarat (2010).

_images/flu-model.png

Given the various uncertainties about the exact characteristics of the flu, including its fatality rate, the contact rate, and the susceptibility of the population, the flu case is an ideal candidate for EMA. One can use EMA to explore the kinds of dynamics that can occur, identify undesirable dynamics, and develop policies targeted at these undesirable dynamics.

In the original paper, Pruyt & Hamarat (2010) recoded the model in Python and performed the analysis in that way. Here we show how the EMA workbench can be connected to Vensim directly.

The flu model was built in Vensim. We can thus use VensimModel as a base class.

We are interested in two outcomes:

  • deceased population region 1: the total number of deaths over the duration of the simulation.

  • peak infected fraction: the maximum fraction of the population that is infected at any point during the simulation.

These are added to self.outcomes, using the TimeSeriesOutcome class.

The table below is adapted from Pruyt & Hamarat (2010). It shows the uncertainties and their bounds. These are added to self.uncertainties as RealParameter instances.

Parameter                                                  Lower Limit   Upper Limit
-------------------------------------------------------    -----------   -----------
additional seasonal immune population fraction region 1    0.0           0.5
additional seasonal immune population fraction region 2    0.0           0.5
fatality ratio region 1                                    0.0001        0.1
fatality ratio region 2                                    0.0001        0.1
initial immune fraction of the population of region 1      0.0           0.5
initial immune fraction of the population of region 2      0.0           0.5
normal interregional contact rate                          0.0           0.9
permanent immune population fraction region 1              0.0           0.5
permanent immune population fraction region 2              0.0           0.5
recovery time region 1                                     0.2           0.8
recovery time region 2                                     0.2           0.8
root contact rate region 1                                 1.0           10.0
root contact rate region 2                                 1.0           10.0
infection ratio region 1                                   0.0           0.1
infection ratio region 2                                   0.0           0.1
normal contact rate region 1                               10            200
normal contact rate region 2                               10            200

Together, this results in the following code:

"""
Created on 20 May, 2011

This module shows how you can use vensim models directly
instead of coding the model in Python. The underlying case
is the same as used in fluExample

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
                epruyt <e.pruyt (at) tudelft (dot) nl>
"""

from ema_workbench import (
    RealParameter,
    TimeSeriesOutcome,
    ema_logging,
    perform_experiments,
    MultiprocessingEvaluator,
    save_results,
)

from ema_workbench.connectors.vensim import VensimModel

if __name__ == "__main__":
    ema_logging.log_to_stderr(ema_logging.INFO)

    model = VensimModel("fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm")

    # outcomes
    model.outcomes = [
        TimeSeriesOutcome(
            "deceased_population_region_1", variable_name="deceased population region 1"
        ),
        TimeSeriesOutcome("infected_fraction_R1", variable_name="infected fraction R1"),
    ]

    # Plain Parametric Uncertainties
    model.uncertainties = [
        RealParameter(
            "additional_seasonal_immune_population_fraction_R1",
            0,
            0.5,
            variable_name="additional seasonal immune population fraction R1",
        ),
        RealParameter(
            "additional_seasonal_immune_population_fraction_R2",
            0,
            0.5,
            variable_name="additional seasonal immune population fraction R2",
        ),
        RealParameter(
            "fatality_ratio_region_1", 0.0001, 0.1, variable_name="fatality ratio region 1"
        ),
        RealParameter(
            "fatality_rate_region_2", 0.0001, 0.1, variable_name="fatality rate region 2"
        ),
        RealParameter(
            "initial_immune_fraction_of_the_population_of_region_1",
            0,
            0.5,
            variable_name="initial immune fraction of the population of region 1",
        ),
        RealParameter(
            "initial_immune_fraction_of_the_population_of_region_2",
            0,
            0.5,
            variable_name="initial immune fraction of the population of region 2",
        ),
        RealParameter(
            "normal_interregional_contact_rate",
            0,
            0.9,
            variable_name="normal interregional contact rate",
        ),
        RealParameter(
            "permanent_immune_population_fraction_R1",
            0,
            0.5,
            variable_name="permanent immune population fraction R1",
        ),
        RealParameter(
            "permanent_immune_population_fraction_R2",
            0,
            0.5,
            variable_name="permanent immune population fraction R2",
        ),
        RealParameter("recovery_time_region_1", 0.1, 0.75, variable_name="recovery time region 1"),
        RealParameter("recovery_time_region_2", 0.1, 0.75, variable_name="recovery time region 2"),
        RealParameter(
            "susceptible_to_immune_population_delay_time_region_1",
            0.5,
            2,
            variable_name="susceptible to immune population delay time region 1",
        ),
        RealParameter(
            "susceptible_to_immune_population_delay_time_region_2",
            0.5,
            2,
            variable_name="susceptible to immune population delay time region 2",
        ),
        RealParameter(
            "root_contact_rate_region_1", 0.01, 5, variable_name="root contact rate region 1"
        ),
        RealParameter(
            "root_contact_ratio_region_2", 0.01, 5, variable_name="root contact ratio region 2"
        ),
        RealParameter(
            "infection_ratio_region_1", 0, 0.15, variable_name="infection ratio region 1"
        ),
        RealParameter("infection_rate_region_2", 0, 0.15, variable_name="infection rate region 2"),
        RealParameter(
            "normal_contact_rate_region_1", 10, 100, variable_name="normal contact rate region 1"
        ),
        RealParameter(
            "normal_contact_rate_region_2", 10, 200, variable_name="normal contact rate region 2"
        ),
    ]

    nr_experiments = 1000
    with MultiprocessingEvaluator(model) as evaluator:
        results = perform_experiments(model, nr_experiments, evaluator=evaluator)

    save_results(results, "./data/1000 flu cases no policy.tar.gz")

We have now instantiated the model, specified the uncertain factors and outcomes, and run the model. We have thus generated a dataset of results and can proceed to analyse these results using various analysis scripts. As a first step, one can look at the individual runs in a line plot generated with lines(). See plotting for some more visualizations using results from performing EMA on FluModel.

import matplotlib.pyplot as plt
from ema_workbench.analysis.plotting import lines

figure = lines(results, density=True)  # show lines, and end state density
plt.show()  # show figure

generates the following figure:

_images/tutorial-lines.png

From this figure, one can deduce that across the ensemble of possible futures, there is a subset of runs with a substantial number of deaths. We can zoom in on those cases, identify the conditions under which they occur, and use this insight for policy design.
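
Zooming in on such a subset is a matter of boolean indexing: because the experiments and the outcome arrays are aligned by index, a single mask can be applied to both. A minimal sketch with synthetic stand-in data (the column name, threshold, and numbers below are illustrative only, not taken from the flu model):

```python
import numpy as np
import pandas as pd

# synthetic stand-in: 1000 experiments and an end-of-run death toll per run
rng = np.random.default_rng(42)
experiments = pd.DataFrame({"fatality_ratio": rng.uniform(0.0001, 0.1, 1000)})
deceased_end = rng.uniform(0, 50e6, 1000)

# boolean mask selecting the runs of interest
mask = deceased_end > 10e6
worst_cases = experiments[mask]

# the same mask works on both the DataFrame and the outcome arrays
print(len(worst_cases), "of", len(experiments), "runs exceed the threshold")
```

The index of `worst_cases` still refers back to the original experiment numbers, so the uncertain factor values that produced the bad runs can be inspected directly.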

For further analysis, it is generally convenient to generate the results for a series of experiments, save these results, and then use the saved results in various analysis scripts.

from ema_workbench import save_results
save_results(results, './1000 runs.tar.gz')

The above code snippet shows how we can use save_results() to save the results of our experiments. save_results() stores the results as csv files in a tarball.

Mexican Flu: policies

For this paper, policies were developed by using the system understanding of the analysts.

static policy
adaptive policy
running the policies

In order to run the models with the policies and compare their results with the no-policy case, we need to specify the policies:

# add policies
policies = [Policy('no policy',
                   model_file=r'/FLUvensimV1basecase.vpm'),
            Policy('static policy',
                   model_file=r'/FLUvensimV1static.vpm'),
            Policy('adaptive policy',
                   model_file=r'/FLUvensimV1dynamic.vpm')
            ]

In this case, we have chosen to implement the policies in separate Vensim files. Policies require a name and can take any other keyword arguments you like. If a keyword matches an attribute on the model object, that attribute will be updated. For example, model_file is an attribute on the Vensim model, so when executing the policies, we update this attribute for each policy. We can pass these policies to perform_experiments() as an additional keyword argument:
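
The attribute-updating mechanism can be illustrated with a small stand-alone sketch (a toy class, not actual workbench code): any policy keyword that matches an attribute on the model object replaces that attribute's value before the runs for that policy start.

```python
class ToyModel:
    """Stand-in for a workbench model with a model_file attribute."""

    def __init__(self, model_file):
        self.model_file = model_file


def apply_policy(model, policy_kwargs):
    # mimic the mechanism: keywords matching attributes update the model
    for key, value in policy_kwargs.items():
        if hasattr(model, key):
            setattr(model, key, value)


model = ToyModel("FLUvensimV1basecase.vpm")
apply_policy(model, {"model_file": "FLUvensimV1static.vpm"})
print(model.model_file)  # FLUvensimV1static.vpm
```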

results = perform_experiments(model, 1000, policies=policies)

We can now proceed in the same way as before, and perform a series of experiments. Together, this results in the following code:

"""
Created on 20 May, 2011

This module shows how you can use vensim models directly
instead of coding the model in Python. The underlying case
is the same as used in fluExample

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
                epruyt <e.pruyt (at) tudelft (dot) nl>
"""

import numpy as np

from ema_workbench import (
    RealParameter,
    TimeSeriesOutcome,
    ema_logging,
    ScalarOutcome,
    perform_experiments,
    Policy,
    save_results,
)
from ema_workbench.connectors.vensim import VensimModel

if __name__ == "__main__":
    ema_logging.log_to_stderr(ema_logging.INFO)

    model = VensimModel("fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm")

    # outcomes
    model.outcomes = [
        TimeSeriesOutcome(
            "deceased_population_region_1", variable_name="deceased population region 1"
        ),
        TimeSeriesOutcome("infected_fraction_R1", variable_name="infected fraction R1"),
        ScalarOutcome(
            "max_infection_fraction", variable_name="infected fraction R1", function=np.max
        ),
    ]

    # Plain Parametric Uncertainties
    model.uncertainties = [
        RealParameter(
            "additional_seasonal_immune_population_fraction_R1",
            0,
            0.5,
            variable_name="additional seasonal immune population fraction R1",
        ),
        RealParameter(
            "additional_seasonal_immune_population_fraction_R2",
            0,
            0.5,
            variable_name="additional seasonal immune population fraction R2",
        ),
        RealParameter(
            "fatality_ratio_region_1", 0.0001, 0.1, variable_name="fatality ratio region 1"
        ),
        RealParameter(
            "fatality_rate_region_2", 0.0001, 0.1, variable_name="fatality rate region 2"
        ),
        RealParameter(
            "initial_immune_fraction_of_the_population_of_region_1",
            0,
            0.5,
            variable_name="initial immune fraction of the population of region 1",
        ),
        RealParameter(
            "initial_immune_fraction_of_the_population_of_region_2",
            0,
            0.5,
            variable_name="initial immune fraction of the population of region 2",
        ),
        RealParameter(
            "normal_interregional_contact_rate",
            0,
            0.9,
            variable_name="normal interregional contact rate",
        ),
        RealParameter(
            "permanent_immune_population_fraction_R1",
            0,
            0.5,
            variable_name="permanent immune population fraction R1",
        ),
        RealParameter(
            "permanent_immune_population_fraction_R2",
            0,
            0.5,
            variable_name="permanent immune population fraction R2",
        ),
        RealParameter("recovery_time_region_1", 0.1, 0.75, variable_name="recovery time region 1"),
        RealParameter("recovery_time_region_2", 0.1, 0.75, variable_name="recovery time region 2"),
        RealParameter(
            "susceptible_to_immune_population_delay_time_region_1",
            0.5,
            2,
            variable_name="susceptible to immune population delay time region 1",
        ),
        RealParameter(
            "susceptible_to_immune_population_delay_time_region_2",
            0.5,
            2,
            variable_name="susceptible to immune population delay time region 2",
        ),
        RealParameter(
            "root_contact_rate_region_1", 0.01, 5, variable_name="root contact rate region 1"
        ),
        RealParameter(
            "root_contact_ratio_region_2", 0.01, 5, variable_name="root contact ratio region 2"
        ),
        RealParameter(
            "infection_ratio_region_1", 0, 0.15, variable_name="infection ratio region 1"
        ),
        RealParameter("infection_rate_region_2", 0, 0.15, variable_name="infection rate region 2"),
        RealParameter(
            "normal_contact_rate_region_1", 10, 100, variable_name="normal contact rate region 1"
        ),
        RealParameter(
            "normal_contact_rate_region_2", 10, 200, variable_name="normal contact rate region 2"
        ),
    ]

    # add policies
    policies = [
        Policy("no policy", model_file=r"FLUvensimV1basecase.vpm"),
        Policy("static policy", model_file=r"FLUvensimV1static.vpm"),
        Policy("adaptive policy", model_file=r"FLUvensimV1dynamic.vpm"),
    ]

    results = perform_experiments(model, 1000, policies=policies)
    save_results(results, "./data/1000 flu cases with policies.tar.gz")

comparison of results

Using the following script, we reproduce figures similar to the 3D figures in Pruyt & Hamarat (2010), but using pairs_scatter(). It shows, for the three different policies, their behavior on the total number of deaths, the height of the highest peak of the pandemic, and the point in time at which this peak was reached.

"""
Created on 20 sep. 2011

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
"""

import matplotlib.pyplot as plt
import numpy as np

from ema_workbench import load_results, ema_logging
from ema_workbench.analysis.pairs_plotting import pairs_lines, pairs_scatter, pairs_density

ema_logging.log_to_stderr(level=ema_logging.DEFAULT_LEVEL)

# load the data
fh = "./data/1000 flu cases no policy.tar.gz"
experiments, outcomes = load_results(fh)

# transform the results to the required format
# that is, we want to know the max peak and the casualties at the end of the
# run
tr = {}

# get time and remove it from the dict
time = outcomes.pop("TIME")

for key, value in outcomes.items():
    if key == "deceased_population_region_1":
        tr[key] = value[:, -1]  # we want the end value
    else:
        # we want the maximum value of the peak
        max_peak = np.max(value, axis=1)
        tr["max peak"] = max_peak

        # we want the time at which the maximum occurred
        # the code here is a bit obscure, I don't know why the transpose
        # of value is needed. This however does produce the appropriate results
        logical = value.T == np.max(value, axis=1)
        tr["time of max"] = time[logical.T]

pairs_scatter(experiments, tr, filter_scalar=False)
pairs_lines(experiments, outcomes)
pairs_density(experiments, tr, filter_scalar=False)
plt.show()
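
As an aside, the "time of max" computation can be written more directly with np.argmax, which avoids the transpose trick. A stand-alone sketch with synthetic data (assuming, for simplicity, a shared one-dimensional time axis; in the actual results, TIME is itself a 2D outcome array, which is why the script above indexes it with a 2D boolean mask):

```python
import numpy as np

rng = np.random.default_rng(0)
time = np.arange(0, 48, 0.25)            # shared time axis
value = rng.random((10, time.shape[0]))  # 10 experiments x timesteps

max_peak = np.max(value, axis=1)              # height of the peak per experiment
time_of_max = time[np.argmax(value, axis=1)]  # when each peak occurred

assert max_peak.shape == time_of_max.shape == (10,)
```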

no policy

_images/multiplot-flu-no-policy.png

static policy

_images/multiplot-flu-static-policy.png

adaptive policy

_images/multiplot-flu-adaptive-policy.png

General Introduction

Since 2010, I have been working on the development of an open source toolkit for supporting decision-making under deep uncertainty. This toolkit is known as the exploratory modeling workbench. The motivation for this name is that in my opinion all model-based deep uncertainty approaches are forms of exploratory modeling as first introduced by Bankes (1993). The design of the workbench has undergone various changes over time, but it started to stabilize in the fall of 2016. In the summer of 2017, I published a paper detailing the workbench (Kwakkel, 2017). The paper contains an in-depth example, but for the documentation I want to provide a more tutorial-style description of the functionality of the workbench.

As a starting point, I will use the Direct Policy Search example that is available for Rhodium (Quinn et al 2017). A quick note on terminology is in order here. I have a background in transport modeling. Here we often use discrete event simulation models. These are intrinsically stochastic models. It is standard practice to run these models several times and take descriptive statistics over the set of runs. In discrete event simulation, and also in the context of agent based modeling, this is known as running replications. The workbench adopts this terminology and draws a sharp distinction between designing experiments over a set of deeply uncertain factors, and performing replications of this experiment to cope with stochastic uncertainty.

[1]:
import math

# more or less default imports when using
# the workbench
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.optimize import brentq


def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
    """

    Parameters
    ----------
    xt : float
         pollution in lake at time t
    c1 : float
         center rbf 1
    c2 : float
         center rbf 2
    r1 : float
         radius rbf 1
    r2 : float
         radius rbf 2
    w1 : float
         weight of rbf 1

    Returns
    -------
    float

    note:: w2 = 1 - w1

    """

    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
    at1 = max(rule, 0.01)
    at = min(at1, 0.1)

    return at


def lake_model(
    b=0.42,
    q=2.0,
    mean=0.02,
    stdev=0.001,
    delta=0.98,
    alpha=0.4,
    nsamples=100,
    myears=100,
    c1=0.25,
    c2=0.25,
    r1=0.5,
    r2=0.5,
    w1=0.5,
    seed=None,
):
    """runs the lake model for nsamples stochastic realisation using
    specified random seed.

    Parameters
    ----------
    b : float
        decay rate for P in lake (0.42 = irreversible)
    q : float
        recycling exponent
    mean : float
            mean of natural inflows
    stdev : float
            standard deviation of natural inflows
    delta : float
            future utility discount rate
    alpha : float
            utility from pollution
    nsamples : int, optional
    myears : int, optional
    c1 : float
    c2 : float
    r1 : float
    r2 : float
    w1 : float
    seed : int, optional
           seed for the random number generator

    Returns
    -------
    tuple

    """
    np.random.seed(seed)
    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)

    X = np.zeros((myears,))
    average_daily_P = np.zeros((myears,))
    reliability = 0.0
    inertia = 0
    utility = 0

    for _ in range(nsamples):
        X[0] = 0.0
        decision = 0.1

        decisions = np.zeros(myears)
        decisions[0] = decision

        natural_inflows = np.random.lognormal(
            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
            size=myears,
        )

        for t in range(1, myears):
            # here we use the decision rule
            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
            decisions[t] = decision

            X[t] = (
                (1 - b) * X[t - 1]
                + X[t - 1] ** q / (1 + X[t - 1] ** q)
                + decision
                + natural_inflows[t - 1]
            )
            average_daily_P[t] += X[t] / nsamples

        reliability += np.sum(X < Pcrit) / (nsamples * myears)
        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
        utility += np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
    max_P = np.max(average_daily_P)
    return max_P, utility, inertia, reliability

Connecting a python function to the workbench

Now we are ready to connect this model to the workbench. We have to specify the uncertainties, the outcomes, and the policy levers. For the uncertainties and the levers, we can use real valued parameters, integer valued parameters, and categorical parameters. For outcomes, we can use either scalar (single valued) outcomes or time series outcomes. For convenience, we can also explicitly control constants in case we want to set them to a value different from their default.

[2]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]

Performing experiments

Now that we have specified the model with the workbench, we are ready to perform experiments on it. We can use evaluators to distribute these experiments either over multiple cores on a single machine, or over a cluster using ipyparallel. Using any parallelization is an advanced topic, in particular if you are on a Windows machine. The code as presented here will run fine in parallel on a Mac or Linux machine. If you are trying to run this in parallel using multiprocessing on a Windows machine from within a Jupyter notebook, it won't work. The solution is to move lake_model and get_antropogenic_release to a separate Python module and import the lake model function into the notebook.

Another common practice when working with the exploratory modeling workbench is to turn on the logging functionality that it provides. This will report on the progress of the experiments, as well as provide more insight into what is happening in particular in case of errors.

If we want to perform experiments on the model we have just defined, we can use the perform_experiments method on the evaluator, or the stand-alone perform_experiments function. We can perform experiments over the uncertainties and/or over the levers. Any given parameterization of the levers is known as a policy, while any given parameterization of the uncertainties is known as a scenario. Every policy is evaluated over each of the scenarios, so if we want to use 100 scenarios and 10 policies, we will end up performing 100 * 10 = 1000 experiments. By default, the workbench uses Latin hypercube sampling for both sampling over levers and sampling over uncertainties. However, the workbench also offers support for full factorial, partial factorial, and Monte Carlo sampling, as well as wrappers for the various sampling schemes provided by SALib.
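
The workbench performs the sampling internally, but Latin hypercube sampling itself can be illustrated stand-alone with scipy (the bounds below are the lake model uncertainty ranges from this tutorial; using scipy directly like this is only for illustration, not something the workbench requires):

```python
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=5, seed=42)  # 5 uncertain factors
unit_samples = sampler.random(n=100)        # 100 scenarios in [0, 1)^5

# scale from the unit hypercube to the uncertainty bounds
lower = [0.1, 2.0, 0.01, 0.001, 0.93]  # b, q, mean, stdev, delta
upper = [0.45, 4.5, 0.05, 0.005, 0.99]
scenarios = qmc.scale(unit_samples, lower, upper)

# each column is stratified: exactly one sample falls in each 1/100 slice
assert scenarios.shape == (100, 5)
```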

The ema_workbench offers support for parallelizing the execution of the experiments using either multiprocessing or ipyparallel. There are various OS-specific concerns to keep in mind when using either of these libraries. Please have a look at their documentation before using them.

[3]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=1000, policies=10)
[MainProcess/INFO] pool started with 10 workers
[MainProcess/INFO] performing 1000 scenarios * 10 policies * 1 model(s) = 10000 experiments
100%|███████████████████████████████████| 10000/10000 [00:57<00:00, 173.34it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

Processing the results of the experiments

By default, perform_experiments returns a tuple of length 2. The first item in the tuple is the experiments; the second item is the outcomes. Experiments and outcomes are aligned by index. The experiments are stored in a pandas DataFrame, while the outcomes are a dict with the name of the outcome as key and a numpy array as value.
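
This layout can be illustrated with a synthetic stand-in (the shapes mirror what perform_experiments returns; the column names and numbers below are made up):

```python
import numpy as np
import pandas as pd

n = 4  # number of experiments

# experiments: one row per experiment, one column per sampled factor
# (plus bookkeeping columns such as policy and model)
experiments = pd.DataFrame(
    {"b": [0.2, 0.3, 0.41, 0.12], "q": [2.5, 3.0, 4.2, 2.1], "policy": ["p0", "p0", "p1", "p1"]}
)

# outcomes: dict mapping outcome name -> numpy array, aligned by index
outcomes = {"max_P": np.array([0.1, 2.3, 0.4, 1.7]), "utility": np.array([0.5, 0.2, 0.6, 0.3])}

# row i of experiments corresponds to entry i of every outcome array
assert len(experiments) == outcomes["max_P"].shape[0] == n
```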

[4]:
experiments, outcomes = results
print(experiments.shape)
print(list(outcomes.keys()))
(10000, 13)
['max_P', 'utility', 'inertia', 'reliability']

We can easily visualize these results. The workbench comes with various convenience plotting functions built on top of matplotlib and seaborn. We can however also easily use seaborn or matplotlib directly. For example, we can create a pairplot using seaborn where we group our outcomes by policy. For this, we need to create a dataframe with the outcomes and a policy column. By default the name of a policy is a string representation of the dict with lever names and values. We can replace this easily with a number as shown below.

[5]:
data = pd.DataFrame(outcomes)
data["policy"] = experiments["policy"]

Next, all that is left is to use seaborn’s pairplot function to visualize the data.

[6]:
sns.pairplot(data, hue="policy", vars=list(outcomes.keys()))
plt.show()
_images/indepth_tutorial_general-introduction_11_0.png
[ ]:

Open exploration

In this second tutorial, I will showcase how to use the ema_workbench for performing open exploration. This tutorial continues with the same example as used in the previous tutorial.

some background

In exploratory modeling, we are interested in understanding how regions in the uncertainty space and/or the decision space map to the whole outcome space, or partitions thereof. There are two general approaches for investigating this mapping. The first one is through systematic sampling of the uncertainty or decision space. This is sometimes also known as open exploration. The second one is to search through the space in a directed manner using some type of optimization approach. This is sometimes also known as directed search.

The workbench supports both open exploration and directed search. Both can be applied to investigate the mapping of the uncertainty space and/or the decision space to the outcome space. In most applications, search is used for finding promising mappings from the decision space to the outcome space, while exploration is used to stress test these mappings under a whole range of possible resolutions of the various uncertainties. This need not be the case, however. Optimization can be used to discover the worst possible scenario, while sampling can be used to get insight into the sensitivity of outcomes to the various decision levers.

open exploration

To showcase the open exploration functionality, let’s start with a basic example using the Direct Policy Search (DPS) version of the lake problem (Quinn et al 2017). This is the same model as we used in the general introduction. Note that for convenience, I have moved the code for the model to a module called dps_lake_model.py, which I import here for further use.

We are going to simultaneously sample over uncertainties and decision levers. We are going to generate 1000 scenarios and 5 policies, and see how they jointly affect the outcomes. A scenario is understood as a point in the uncertainty space, while a policy is a point in the decision space. The combination of a scenario and a policy is called an experiment. The uncertainty space is spanned by uncertainties, while the decision space is spanned by levers.

Both uncertainties and levers are instances of RealParameter (a continuous range), IntegerParameter (a range of integers), or CategoricalParameter (an unordered set of things). By default, the workbench will use Latin Hypercube sampling for generating both the scenarios and the policies. Each policy will always be evaluated over all scenarios (i.e. a full factorial over scenarios and policies).
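To build intuition for what Latin Hypercube sampling does for a single RealParameter, here is a minimal stdlib sketch (an illustration only, not the workbench's actual sampler): the range is divided into n equal-width strata and one point is drawn uniformly within each stratum, guaranteeing coverage of the whole range.

```python
import random

def latin_hypercube_1d(n, lower, upper):
    """Draw one sample from each of n equal-width strata, in random order."""
    width = (upper - lower) / n
    samples = [lower + (i + random.random()) * width for i in range(n)]
    random.shuffle(samples)
    return samples

# e.g. 10 samples for the b uncertainty on [0.1, 0.45]
samples = latin_hypercube_1d(10, 0.1, 0.45)
assert len(samples) == 10
assert all(0.1 <= s <= 0.45 for s in samples)
```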

[1]:
from ema_workbench import RealParameter, ScalarOutcome, Constant, Model
from dps_lake_model import lake_model

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

model = Model("lakeproblem", function=lake_model)

# specify uncertainties
model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers
model.levers = [
    RealParameter("c1", -2, 2),
    RealParameter("c2", -2, 2),
    RealParameter("r1", 0, 2),
    RealParameter("r2", 0, 2),
    RealParameter("w1", 0, 1),
]

# specify outcomes
model.outcomes = [
    ScalarOutcome("max_P"),
    ScalarOutcome("utility"),
    ScalarOutcome("inertia"),
    ScalarOutcome("reliability"),
]

# override some of the defaults of the model
model.constants = [
    Constant("alpha", 0.41),
    Constant("nsamples", 150),
    Constant("myears", 100),
]
[2]:
from ema_workbench import MultiprocessingEvaluator, ema_logging, perform_experiments

ema_logging.log_to_stderr(ema_logging.INFO)

# n_processes=-1 uses all cores except one, leaving one core free so the machine stays responsive
with MultiprocessingEvaluator(model, n_processes=-1) as evaluator:
    # Run 1000 scenarios for 5 policies
    experiments, outcomes = evaluator.perform_experiments(scenarios=1000, policies=5)
[MainProcess/INFO] pool started with 7 workers
[MainProcess/INFO] performing 1000 scenarios * 5 policies * 1 model(s) = 5000 experiments
100%|██████████████████████████████████████| 5000/5000 [01:02<00:00, 80.06it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

Visual analysis

Having generated these results, the next step is to analyze them and see what we can learn from them. The workbench comes with a variety of techniques for this analysis. A simple first step is to make a few quick visualizations of the results. The workbench has convenience functions for this, but it is also possible to create your own visualizations using the scientific Python stack.

[3]:
from ema_workbench.analysis import pairs_plotting

fig, axes = pairs_plotting.pairs_scatter(experiments, outcomes, group_by="policy", legend=False)
fig.set_size_inches(8, 8)
plt.show()
[MainProcess/INFO] no time dimension found in results
_images/indepth_tutorial_open-exploration_4_1.png

Often, it is convenient to separate the process of performing the experiments from the analysis. To make this possible, the workbench offers convenience functions for storing results to disk and loading them from disk. The workbench will store the results in a tarball with .csv files and separate metadata files. This is a convenient format that has proven sufficient over the years.

from ema_workbench import save_results

results = (experiments, outcomes)
save_results(results, '1000 scenarios 5 policies.tar.gz')

from ema_workbench import load_results

results = load_results('1000 scenarios 5 policies.tar.gz')

advanced analysis

In addition to visual analysis, the workbench comes with a variety of techniques to perform a more in-depth analysis of the results. Other analyses can simply be performed by utilizing the scientific Python stack. The workbench comes with

  • Scenario Discovery, a model driven approach to scenario development

  • Feature Scoring, a poor man’s alternative to global sensitivity analysis

  • Dimensional stacking, a quick visual approach drawing on feature scoring to enable scenario discovery. This approach has received limited attention in the literature. The implementation in the workbench replaces the rule mining approach with a feature scoring approach.

  • Regional sensitivity analysis

Scenario Discovery

A detailed discussion on scenario discovery can be found in an earlier blogpost. For completeness, I provide a code snippet here. Compared to the previous blog post, there is one small change. The library mpld3 is currently not being maintained and is broken on Python 3.5 and higher. To still utilize the interactive exploration of the trade-offs within the notebook, one could use the interactive back-end (%matplotlib notebook).

[4]:
from ema_workbench.analysis import prim

x = experiments
y = outcomes["max_P"] < 0.8
prim_alg = prim.Prim(x, y, threshold=0.8)
box1 = prim_alg.find_box()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] 5000 points remaining, containing 510 cases of interest
[MainProcess/INFO] mean: 0.9807692307692307, mass: 0.052, coverage: 0.5, density: 0.9807692307692307 restricted_dimensions: 3
[5]:
box1.show_tradeoff()
plt.show()
_images/indepth_tutorial_open-exploration_8_0.png

We can inspect any of the points on the trade-off curve using the inspect method. As shown, the results can be displayed either in a table format or in a visual format.

[6]:
box1.inspect(43)
box1.inspect(43, style="graph")
plt.show()
coverage    0.794118
density     0.794118
id                43
mass           0.102
mean        0.794118
res_dim            2
Name: 43, dtype: object

     box 43
        min       max                        qp values
b  0.362596  0.449742  [1.5372285485377317e-164, -1.0]
q  3.488834  4.498362   [1.3004137205191903e-88, -1.0]

_images/indepth_tutorial_open-exploration_10_1.png
[7]:
box1.show_pairs_scatter(43)
plt.show()
_images/indepth_tutorial_open-exploration_11_0.png

feature scoring

Feature scoring is a family of techniques often used in machine learning to identify the most relevant features to include in a model. This is similar to one of the use cases for global sensitivity analysis, namely factor prioritisation. The main advantage of feature scoring techniques is that they impose no specific constraints on the experimental design, and they can handle real-valued, integer-valued, and categorical-valued parameters. The workbench supports multiple techniques, the most useful of which generally is extra trees (Geurts et al. 2006).

For this example, we run feature scoring for each outcome of interest. We can also run it for a specific outcome if desired. Similarly, we can choose whether to run in regression mode or classification mode. The latter is applicable if the outcome is a categorical variable, and the results should be interpreted similarly to regional sensitivity analysis results. For more details, see the documentation.

[8]:
from ema_workbench.analysis import feature_scoring

x = experiments
y = outcomes

fs = feature_scoring.get_feature_scores_all(x, y)
sns.heatmap(fs, cmap="viridis", annot=True)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
[MainProcess/INFO] model dropped from analysis because only a single category
_images/indepth_tutorial_open-exploration_13_1.png

From the results, we see that max_P is primarily influenced by b, while utility is driven mainly by delta. For inertia and reliability, the situation is a little bit less clear cut.

The foregoing feature scoring used the raw values of the outcomes. However, in scenario discovery applications, we are typically dealing with a binary classification. This might produce slightly different results, as demonstrated below.

[11]:
from ema_workbench.analysis import RuleInductionType

x = experiments
y = outcomes["max_P"] < 0.8

fs, alg = feature_scoring.get_ex_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION)
fs.sort_values(ascending=False, by=1)
[MainProcess/INFO] model dropped from analysis because only a single category
[11]:
               1
0
b       0.458094
q       0.310989
mean    0.107732
delta   0.051212
stdev   0.047653
policy  0.005475
w1      0.004815
r2      0.003658
r1      0.003607
c1      0.003467
c2      0.003299

Here we ran extra trees feature scoring on a binary vector for max_P. The b parameter is still important, as in the previous case, but the introduction of the binary classification now also highlights some additional parameters as being potentially relevant.

dimensional stacking

Dimensional stacking was suggested as a more visual approach to scenario discovery. It involves two steps: identifying the most important uncertainties that affect system behavior, and creating a pivot table using the most influential uncertainties. In order to do this, we first need, as in scenario discovery, to specify the outcomes that are of interest. Creating the pivot table involves binning the uncertainties. More details can be found in Suzuki et al. (2015) or by looking through the code in the workbench. Compared to Suzuki et al., the workbench uses feature scoring for determining the most influential uncertainties. The code is set up in a modular way, so other approaches to global sensitivity analysis can easily be used as well if so desired.

[12]:
from ema_workbench.analysis import dimensional_stacking

x = experiments
y = outcomes["max_P"] < 0.8
dimensional_stacking.create_pivot_plot(x, y, 2, nbins=3)
plt.show()
[MainProcess/INFO] model dropped from analysis because only a single category
_images/indepth_tutorial_open-exploration_18_1.png

We can see from this visual that when both b and q are high, we have a high concentration of cases where pollution stays below 0.8. The mean and stdev have some limited additional influence. By playing around with the number of bins or the number of layers, patterns can be coarsened or refined.

regional sensitivity analysis

A fourth approach for supporting scenario discovery is to perform a regional sensitivity analysis. The workbench implements a visual approach based on plotting the empirical CDF given a classification vector. See section 3.4 in Pianosi et al. (2016) for more details.
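The intuition can be sketched with the stdlib: compare the empirical CDF of a parameter within the cases of interest against its unconditional CDF over all cases; where the two diverge, the parameter matters for the classification. The snippet below uses illustrative toy data, not the workbench API:

```python
import random

def empirical_cdf(values):
    """Return sorted values and their cumulative probabilities."""
    xs = sorted(values)
    ps = [(i + 1) / len(xs) for i in range(len(xs))]
    return xs, ps

random.seed(1)
# toy experiments: parameter b uniform on [0, 1]; cases of interest are b < 0.3
b = [random.random() for _ in range(1000)]
of_interest = [v for v in b if v < 0.3]

xs_all, ps_all = empirical_cdf(b)
xs_int, ps_int = empirical_cdf(of_interest)

# the conditional CDF reaches 1.0 by b = 0.3 while the unconditional one
# does not: a clear divergence, so b is influential for this classification
assert xs_int[-1] < 0.3
assert xs_all[-1] > 0.3
```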

[13]:
from ema_workbench.analysis import regional_sa
from numpy.lib import recfunctions as rf

sns.set_style("white")

# model is the same across experiments
x = experiments.copy()
x = x.drop("model", axis=1)
y = outcomes["max_P"] < 0.8
fig = regional_sa.plot_cdfs(x, y)
sns.despine()
plt.show()
_images/indepth_tutorial_open-exploration_20_0.png

The above results clearly show that both b and q are important. To a lesser extent, the mean is also relevant.

More advanced sampling techniques

The workbench can also be used for more advanced sampling techniques. To achieve this, it relies on SALib. On the workbench side, the only change is to specify the sampler we want to use. Next, we can use SALib directly to perform the analysis. To help with this, the workbench provides a convenience function for generating the problem dict that SALib expects. The example below focuses on performing Sobol analysis on the uncertainties, but we could do the exact same thing with the levers instead. The only changes required would be to set lever_sampling instead of uncertainty_sampling, and to get the SALib problem dict based on the levers.

[17]:
from SALib.analyze import sobol
from ema_workbench import Samplers
from ema_workbench.em_framework.salib_samplers import get_SALib_problem

with MultiprocessingEvaluator(model) as evaluator:
    sa_results = evaluator.perform_experiments(scenarios=1000, uncertainty_sampling=Samplers.SOBOL)

experiments, outcomes = sa_results

problem = get_SALib_problem(model.uncertainties)
Si = sobol.analyze(problem, outcomes["max_P"], calc_second_order=True, print_to_console=False)
[MainProcess/INFO] pool started with 12 workers
/Users/jhkwakkel/opt/anaconda3/lib/python3.9/site-packages/SALib/sample/saltelli.py:94: UserWarning:
        Convergence properties of the Sobol' sequence is only valid if
        `N` (1000) is equal to `2^n`.

  warnings.warn(msg)
[MainProcess/INFO] performing 12000 scenarios * 1 policies * 1 model(s) = 12000 experiments
100%|████████████████████████████████████| 12000/12000 [02:37<00:00, 76.17it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

We have now completed the Sobol analysis and calculated the metrics. What remains is to visualize them, which can be done as shown below, focusing on ST and S1. The error bars indicate the confidence intervals.

[18]:
scores_filtered = {k: Si[k] for k in ["ST", "ST_conf", "S1", "S1_conf"]}
Si_df = pd.DataFrame(scores_filtered, index=problem["names"])

sns.set_style("white")
fig, ax = plt.subplots(1)

indices = Si_df[["S1", "ST"]]
err = Si_df[["S1_conf", "ST_conf"]]

indices.plot.bar(yerr=err.values.T, ax=ax)
fig.set_size_inches(8, 6)
fig.subplots_adjust(bottom=0.3)
plt.show()
_images/indepth_tutorial_open-exploration_25_0.png
[ ]:

MPIEvaluator: Run on multi-node HPC systems

The MPIEvaluator is a new addition to the EMAworkbench that allows experiment execution on multi-node systems, including high-performance computers (HPCs). This capability is particularly useful if you want to conduct large-scale experiments that require distributed processing. Under the hood, the evaluator leverages the MPIPoolExecutor from mpi4py.futures (https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html).

Limitations

  • Currently, the MPIEvaluator is only tested on Linux and macOS, though it might also work on other operating systems.

  • Currently, the MPIEvaluator only works with Python-based models, and won’t work with file-based model types (like NetLogo or Vensim).

  • The MPIEvaluator is most useful for large-scale experiments, where the time spent on distributing the experiments over the cluster is negligible compared to the time spent on running the experiments. For smaller experiments, the overhead of distributing the experiments over the cluster might be significant, and it might be more efficient to run the experiments locally.

The MPIEvaluator is experimental and its interface and functionality might change in future releases. We appreciate feedback on the MPIEvaluator, share any thoughts about it at https://github.com/quaquel/EMAworkbench/discussions/311.

Contents

This tutorial will first show how to set up the environment, and then how to use the MPIEvaluator to run a model on a cluster. Finally, we’ll use the DelftBlue supercomputer as an example, to show how to run on a system which uses a SLURM scheduler.

1. Setting up the environment

To use the MPIEvaluator, MPI and mpi4py must be installed.

Installing MPI on Linux typically involves the installation of a popular MPI implementation such as OpenMPI or MPICH. Below are the instructions for installing OpenMPI:

1a. Installing OpenMPI

If you use conda, it might install MPI automatically when installing mpi4py (see 1b).

You can install OpenMPI using your package manager. First, update your package repositories, and then install OpenMPI:

For Debian/Ubuntu:

sudo apt update
sudo apt install openmpi-bin libopenmpi-dev

For Fedora:

sudo dnf check-update
sudo dnf install openmpi openmpi-devel

For CentOS/RHEL:

sudo yum update
sudo yum install openmpi openmpi-devel

Many times, the necessary environment variables are automatically set up. You can check if this is the case by running the following command:

mpiexec --version

If not, add OpenMPI’s bin directory to your PATH:

export PATH=$PATH:/usr/lib/openmpi/bin
1b. Installing mpi4py

The Python package mpi4py needs to be installed as well. This is most easily done from PyPI, by running the following command:

pip install -U mpi4py

2. Creating a model

First, let’s set up a minimal model to test with. This can be any Python-based model; here, we use the example_python.py model (https://emaworkbench.readthedocs.io/en/latest/examples/example_python.html) from the EMA Workbench documentation as an example.

We recommend crafting and testing your model in a separate Python file, and then importing it into your main script. This way, you can test your model without having to run it through the MPIEvaluator, and you can easily switch between running it locally and on a cluster.

2a. Define the model

First, we define a Python model function.

[ ]:
def some_model(x1=None, x2=None, x3=None):
    return {"y": x1 * x2 + x3}

Now, create the EMAworkbench model object, and specify the uncertainties and outcomes:

[ ]:
from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging, perform_experiments

if __name__ == "__main__":
    # We recommend setting pass_root_logger_level=True when running on a cluster, to ensure consistent log levels.
    ema_logging.log_to_stderr(level=ema_logging.INFO, pass_root_logger_level=True)

    ema_model = Model("simpleModel", function=some_model)  # instantiate the model

    # specify uncertainties
    ema_model.uncertainties = [
        RealParameter("x1", 0.1, 10),
        RealParameter("x2", -0.01, 0.01),
        RealParameter("x3", -0.01, 0.01),
    ]
    # specify outcomes
    ema_model.outcomes = [ScalarOutcome("y")]
2b. Test the model

Now, we can run the model locally to test it:

[ ]:
from ema_workbench import SequentialEvaluator

with SequentialEvaluator(ema_model) as evaluator:
    results = perform_experiments(ema_model, 100, evaluator=evaluator)

In this stage, you can test your model and make sure it works as expected. You can also check if everything is included in the results and do initial validation on the model, before scaling up to a cluster.
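For instance, a few direct calls to the model function itself, outside the workbench, can already catch obvious mistakes; a minimal sketch with hand-picked inputs:

```python
def some_model(x1=None, x2=None, x3=None):
    return {"y": x1 * x2 + x3}

# sanity checks with hand-picked inputs before scaling up to a cluster
result = some_model(x1=2.0, x2=0.005, x3=-0.001)
assert set(result) == {"y"}  # outcome names match what the Model expects
assert abs(result["y"] - 0.009) < 1e-12
```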

3. Run the model on a MPI cluster

Now that we have a working model, we can run it on a cluster. To do this, we need to import the MPIEvaluator class from the ema_workbench package, and instantiate it with our model. Then, we can use the perform_experiments function as usual, and the MPIEvaluator will take care of distributing the experiments over the cluster. Finally, we can save the results to a tarball, as usual.

[ ]:
# ema_example_model.py
from ema_workbench import (
    Model,
    RealParameter,
    ScalarOutcome,
    ema_logging,
    perform_experiments,
    MPIEvaluator,
    save_results,
)


def some_model(x1=None, x2=None, x3=None):
    return {"y": x1 * x2 + x3}


if __name__ == "__main__":
    ema_logging.log_to_stderr(level=ema_logging.INFO, pass_root_logger_level=True)

    ema_model = Model("simpleModel", function=some_model)

    ema_model.uncertainties = [
        RealParameter("x1", 0.1, 10),
        RealParameter("x2", -0.01, 0.01),
        RealParameter("x3", -0.01, 0.01),
    ]
    ema_model.outcomes = [ScalarOutcome("y")]

    # Note that we switch to the MPIEvaluator here
    with MPIEvaluator(ema_model) as evaluator:
        results = evaluator.perform_experiments(scenarios=10000)

    # Save the results
    save_results(results, "ema_example_model.tar.gz")

To run this script on a cluster, we need to use the mpiexec command:

mpiexec -n 1 python ema_example_model.py

This command will execute the ema_example_model.py Python script using MPI. MPI-specific configurations are inferred from default settings or any configurations provided elsewhere, such as in an MPI configuration file or additional flags to mpiexec (not shown in the provided command). The number of workers that will be spawned by the MPIEvaluator depends on the MPI universe size. The way this size can be controlled depends on which implementation of MPI you have. See the discussion in the docs of mpi4py for details.

The output of the script will be saved to the ema_example_model.tar.gz file, which can be loaded and analyzed later as usual.

Example: Running on the DelftBlue supercomputer (with SLURM)

As an example, we’ll show how to run the model on the DelftBlue supercomputer, which uses the SLURM scheduler. The DelftBlue supercomputer is a cluster of 218 nodes, each with 2 Intel Xeon Gold 6248R CPUs (48 cores total), 192 GB of RAM, and 480 GB of local SSD storage. The nodes are connected with a 100 Gbit/s InfiniBand network.

These steps roughly follow the DelftBlue Crash-course for absolute beginners. If you get stuck, you can refer to that guide for more information.

1. Creating a SLURM script

First, you need to create a SLURM script. This is a bash script that will be executed on the cluster, and it will contain all the necessary commands to run your model. You can create a new file, for example slurm_script.sh, and add the following lines:

#!/bin/bash

#SBATCH --job-name="Python_test"
#SBATCH --time=00:02:00
#SBATCH --ntasks=10
#SBATCH --cpus-per-task=1
#SBATCH --partition=compute
#SBATCH --mem-per-cpu=4GB
#SBATCH --account=research-tpm-mas

module load 2023r1
module load openmpi
module load python
module load py-numpy
module load py-scipy
module load py-mpi4py
module load py-pip

pip install --user ema_workbench

mpiexec -n 1 python3 example_mpi_lake_model.py

Modify the script to suit your needs:

  • Set the --job-name to something descriptive.

  • Update the maximum --time to the expected runtime of your model. The job will be terminated if it exceeds this time limit.

  • Set the --ntasks to the number of cores you want to use. Each node in DelftBlue currently has 48 cores, so for example --ntasks=96 might use two nodes, but can also be distributed over more nodes. Note that using MPI involves quite some overhead. If you do not plan to use more than 48 cores, you might want to consider using the MultiprocessingEvaluator and request exclusive node access (see below).

  • Update the memory --mem-per-cpu to the amount of memory you need per core. Each node has 192 GB of memory, so you can use up to 4 GB per core.

  • Add --exclusive if you want to claim a full node for your job. In that case, specify --nodes next to --ntasks. Requesting exclusive access to one or more nodes will delay your scheduling time, because you need to wait for full nodes to become available.

  • Set the --account to your project account. You can find this on the Accounting and Shares page of the DelftBlue docs.

See Submit Jobs at the DelftBlue docs for more information on the SLURM script configuration.

Then, you need to load the necessary modules. You can find the available modules on the DHPC modules page of the DelftBlue docs. In this example, we’re loading the 2023r1 toolchain, which includes Python 3.9, and then we’re loading the necessary Python packages.

You might want to install additional Python packages. You can do this with pip install -U --user <package>. Note that you need to use the --user flag, because you don’t have root access on the cluster. To install the EMA Workbench, you can use pip install -U --user ema_workbench. If you want to install a development branch, you can use pip install -e -U --user git+https://github.com/quaquel/EMAworkbench@<BRANCH>#egg=ema-workbench, where <BRANCH> is the name of the branch you want to install.

Finally, the script uses mpiexec to run the Python script in a way that allows the MPIEvaluator to distribute the experiments over the cluster. The -n 1 argument means that we only start a single process. This main process in turn will spawn additional worker processes. The number of worker processes that will be spawned defaults to the value of --ntasks - 1.

Note that bash scripts (sh), including the slurm_script.sh we just created, need LF line endings. If you are using Windows, line endings are CRLF by default, and you need to convert them to LF. You can do this with most text editors, for example Notepad++ or Atom.
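One quick way to do the conversion from the command line is, for instance, with GNU sed (demonstrated here on a scratch file so it can be run anywhere):

```shell
# demo file with Windows (CRLF) line endings
printf '#!/bin/bash\r\necho hello\r\n' > demo.sh

# strip the carriage returns in place, converting CRLF to LF (GNU sed)
sed -i 's/\r$//' demo.sh

# verify: no carriage returns remain
! grep -q "$(printf '\r')" demo.sh && echo "LF only"
```

The same command, `sed -i 's/\r$//' slurm_script.sh`, works on the real script; `dos2unix` is an alternative where it is installed.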

2. Running the model on DelftBlue

First, you need to log in on DelftBlue. As an employee, you can login from the command line with:

ssh <netid>@login.delftblue.tudelft.nl

where <netid> is your TU Delft netid. You can also use an SSH client such as PuTTY.

As a student, you need to jump though an extra hoop:

ssh -J <netid>@student-linux.tudelft.nl <netid>@login.delftblue.tudelft.nl

Note: Below are the commands for students. If you are an employee, you need to remove the -J <netid>@student-linux.tudelft.nl from all commands below.

Once you’re logged in, you want to jump to your scratch directory (note that the scratch directory is not backed up).

cd ../../scratch/<netid>

Create a new directory for this tutorial, for example mkdir ema_mpi_test and then cd ema_mpi_test

Then, you want to send your Python file and SLURM script to DelftBlue. Open a new command line terminal, and then you can do this with scp:

scp -J <netid>@student-linux.tudelft.nl ema_example_model.py slurm_script.sh <netid>@login.delftblue.tudelft.nl:/scratch/<netid>/ema_mpi_test

Before scheduling the SLURM script, we first have to make it executable:

chmod +x slurm_script.sh

Then we can schedule it:

sbatch slurm_script.sh

Now it’s scheduled!

You can check the status of your job with squeue:

squeue -u <netid>

You might want to inspect the log file, which is created by the SLURM script. You can do this with cat:

cat slurm-<jobid>.out

where <jobid> is the job ID of your job, which you can find with squeue.

When the job is finished, we can download the tarball with our results. Open the command line again (can be the same one as before), and you can copy the results back to your local machine with scp:

scp -J <netid>@student-linux.tudelft.nl <netid>@login.delftblue.tudelft.nl:/scratch/<netid>/ema_mpi_test/ema_example_model.tar.gz .

Finally, we can clean up the files on DelftBlue, to avoid cluttering the scratch directory:

cd ..
rm -rf "ema_mpi_test"
[ ]:

Examples

Output space exploration

Output space exploration is a form of optimization based on novelty search. In the workbench, it relies on the optimization functionality. You can use output space exploration by passing an instance of either OutputSpaceExploration or AutoAdaptiveOutputSpaceExploration as algorithm to evaluator.optimize. The fact that output space exploration uses the optimization functionality also implies that we can track convergence in a similar manner. Epsilon progress is defined identically, but evidently other metrics such as hypervolume might not be applicable in the context of output space exploration.

The difference between OutputSpaceExploration and AutoAdaptiveOutputSpaceExploration is in the evolutionary operators. AutoAdaptiveOutputSpaceExploration uses auto adaptive operator selection as implemented in the BORG MOEA, while OutputSpaceExploration by default uses Simulated Binary crossover with polynomial mutation. Injection of new solutions is handled through auto adaptive population sizing and periodically starting with a new population if search is stalling. Below, examples are given of how to use both algorithms, as well as a quick visualization of the convergence dynamics.

For this example, we are using a stylized case study frequently used to develop and test decision making under deep uncertainty methods: the shallow lake problem. In this problem, a city has to decide on the amount of pollution they are going to put into a shallow lake per year. The city gets benefits from polluting the lake, but if an unknown threshold is crossed, the lake permanently shifts to an undesirable polluted state. For further details on this case study, see for example Quinn et al, 2017 and Bartholomew et al, 2021.

[1]:
from outputspace_exploration_lake_model import lake_problem

from ema_workbench import (
    Model,
    RealParameter,
    ScalarOutcome,
    Constant,
    ema_logging,
    MultiprocessingEvaluator,
    Policy,
    SequentialEvaluator,
    OutputSpaceExploration,
)
[2]:
ema_logging.log_to_stderr(ema_logging.INFO)

# instantiate the model
lake_model = Model("lakeproblem", function=lake_problem)
lake_model.time_horizon = 100

# specify uncertainties
lake_model.uncertainties = [
    RealParameter("b", 0.1, 0.45),
    RealParameter("q", 2.0, 4.5),
    RealParameter("mean", 0.01, 0.05),
    RealParameter("stdev", 0.001, 0.005),
    RealParameter("delta", 0.93, 0.99),
]

# set levers, one for each time step
lake_model.levers = [RealParameter(str(i), 0, 0.1) for i in range(lake_model.time_horizon)]

# specify outcomes
# output space exploration

lake_model.outcomes = [
    ScalarOutcome("max_P", kind=ScalarOutcome.MAXIMIZE),
    ScalarOutcome("utility", kind=ScalarOutcome.MAXIMIZE),
    ScalarOutcome("inertia", kind=ScalarOutcome.MAXIMIZE),
    ScalarOutcome("reliability", kind=ScalarOutcome.MAXIMIZE),
]

# override some of the defaults of the model
lake_model.constants = [Constant("alpha", 0.41), Constant("nsamples", 150)]

Above, we have set up the lake problem and specified the uncertainties, policy levers, outcomes of interest, and (optionally) some constants. We are now ready to run output space exploration on this model. For this, we use the default optimization functionality of the workbench, but pass the OutputSpaceExploration class. Below, we are running output space exploration over the uncertainties, which implies we need to pass a reference policy. We also rerun the algorithm for 5 different seeds. We can track convergence of the output space exploration by tracking \(\epsilon\)-progress.

The grid_spec keyword argument, which is specific to output space exploration, specifies the grid structure we are imposing on the output space. There must be a tuple for each outcome of interest. Each tuple specifies the minimum value, the maximum value, and the size of the grid cell on this outcome dimension.
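To make the grid concrete, the following stdlib sketch (a hypothetical helper, not part of the workbench API) maps a point in outcome space to its grid-cell index tuple; novelty search rewards solutions that land in cells that have not been visited yet:

```python
def grid_cell(outcome_values, grid_spec):
    """Map a point in outcome space to its grid-cell index tuple.

    grid_spec holds one (minimum, maximum, cell_size) tuple per outcome.
    """
    return tuple(
        int((value - minimum) // cell_size)
        for value, (minimum, maximum, cell_size) in zip(outcome_values, grid_spec)
    )

grid_spec = [(0, 12, 0.5), (0, 0.6, 0.05), (0, 1, 0.1), (0, 1, 0.1)]
assert grid_cell((3.2, 0.32, 0.55, 0.95), grid_spec) == (6, 6, 5, 9)
```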

[3]:
from ema_workbench.em_framework.optimization import EpsilonProgress

# generate some default policy
reference = Policy("nopolicy", **{l.name: 0.02 for l in lake_model.levers})
n_seeds = 5

convergences = []

with MultiprocessingEvaluator(lake_model) as evaluator:
    for _ in range(n_seeds):
        convergence_metrics = [
            EpsilonProgress(),
        ]
        res, convergence = evaluator.optimize(
            algorithm=OutputSpaceExploration,
            grid_spec=[(0, 12, 0.5), (0, 0.6, 0.05), (0, 1, 0.1), (0, 1, 0.1)],
            nfe=25000,
            searchover="uncertainties",
            reference=reference,
            convergence=convergence_metrics,
        )
        convergences.append(convergence)
[MainProcess/INFO] pool started with 10 workers
28893it [01:37, 296.69it/s]
[MainProcess/INFO] optimization completed, found 1425 solutions
26750it [01:30, 294.10it/s]
[MainProcess/INFO] optimization completed, found 1374 solutions
28290it [01:35, 297.43it/s]
[MainProcess/INFO] optimization completed, found 1385 solutions
28332it [01:35, 296.93it/s]
[MainProcess/INFO] optimization completed, found 1388 solutions
26712it [01:31, 290.48it/s]
[MainProcess/INFO] optimization completed, found 1362 solutions
[MainProcess/INFO] terminating pool
[4]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
fig, ax = plt.subplots()
for convergence in convergences:
    ax.plot(convergence.nfe, convergence.epsilon_progress)

ax.set_xlabel("nfe")
ax.set_ylabel(r"$\epsilon$-progress")
plt.show()
_images/examples_outputspace_exploration_lakeproblem_5_0.png

The figure above shows the \(\epsilon\)-progress per nfe for each of the five random seeds. As can be seen, the algorithm converges to essentially the same amount of epsilon progress across the seeds. This suggests that the algorithm has converged and thus found all possible output grid cells.

auto adaptive operator selection

The foregoing example used the default version of the output space exploration algorithm. This algorithm uses Simulated Binary Crossover (SBX) with Polynomial Mutation (PM) as its evolutionary operators. However, it is conceivable that for some problems other evolutionary operators might be more suitable. You can pass your own combination of operators if desired using platypus. For general use, however, it can sometimes be easier to let the algorithm itself figure out which operators to use. To support this, we also provide an AutoAdaptiveOutputSpaceExploration algorithm. This algorithm uses Auto Adaptive Operator selection as also used in the BORG algorithm. Note that this algorithm is limited to RealParameters only.

[5]:
from ema_workbench.em_framework.outputspace_exploration import AutoAdaptiveOutputSpaceExploration
from ema_workbench.em_framework.optimization import OperatorProbabilities


convergences = []

with MultiprocessingEvaluator(lake_model) as evaluator:
    for _ in range(5):
        convergence_metrics = [
            EpsilonProgress(),
            OperatorProbabilities("SBX", 0),
            OperatorProbabilities("PCX", 1),
            OperatorProbabilities("DE", 2),
            OperatorProbabilities("UNDX", 3),
            OperatorProbabilities("SPX", 4),
            OperatorProbabilities("UM", 5),
        ]
        res, convergence = evaluator.optimize(
            algorithm=AutoAdaptiveOutputSpaceExploration,
            grid_spec=[(0, 12, 0.5), (0, 0.6, 0.05), (0, 1, 0.1), (0, 1, 0.1)],
            nfe=25000,
            searchover="uncertainties",
            reference=reference,
            variator=None,
            convergence=convergence_metrics,
        )
        convergences.append(convergence)
[MainProcess/INFO] pool started with 10 workers
30388it [01:48, 279.93it/s]
[MainProcess/INFO] optimization completed, found 1468 solutions
30150it [01:46, 282.56it/s]
[MainProcess/INFO] optimization completed, found 1450 solutions
29988it [01:48, 275.21it/s]
[MainProcess/INFO] optimization completed, found 1446 solutions
28816it [01:40, 286.94it/s]
[MainProcess/INFO] optimization completed, found 1417 solutions
25088it [01:28, 284.70it/s]
[MainProcess/INFO] optimization completed, found 1442 solutions
[MainProcess/INFO] terminating pool
[6]:
sns.set_style("whitegrid")
fig, ax = plt.subplots()
for convergence in convergences:
    ax.plot(convergence.nfe, convergence.epsilon_progress)

ax.set_xlabel("nfe")
ax.set_ylabel(r"$\epsilon$-progress")
plt.show()
_images/examples_outputspace_exploration_lakeproblem_8_0.png

Above, you see the \(\epsilon\)-convergence for the auto adaptive operator selection version of the output space exploration algorithm. Like the normal version, it converges to essentially the same number across the different seeds. Note also that the \(\epsilon\)-progress is slightly higher, and the total number of identified solutions (see the log messages) is higher as well. This suggests that even for a relatively simple problem, there is value in using auto adaptive operator selection.

In the case of AutoAdaptiveOutputSpaceExploration, it can also be revealing to check the dynamics of the operators over the course of the evolution. This is shown below separately for each seed. For the meaning of the abbreviations, see the BORG algorithm.

[7]:
for convergence in convergences:
    fig, ax = plt.subplots()
    ax.plot(convergence.nfe, convergence.SBX, label="SBX")
    ax.plot(convergence.nfe, convergence.PCX, label="PCX")
    ax.plot(convergence.nfe, convergence.DE, label="DE")
    ax.plot(convergence.nfe, convergence.UNDX, label="UNDX")
    ax.plot(convergence.nfe, convergence.SPX, label="SPX")
    ax.plot(convergence.nfe, convergence.UM, label="UM")
    ax.legend()
plt.show()
_images/examples_outputspace_exploration_lakeproblem_10_0.png
_images/examples_outputspace_exploration_lakeproblem_10_1.png
_images/examples_outputspace_exploration_lakeproblem_10_2.png
_images/examples_outputspace_exploration_lakeproblem_10_3.png
_images/examples_outputspace_exploration_lakeproblem_10_4.png

LHS

For comparison, let’s also generate a Latin Hypercube Sample and compare the results. Since the output space exploration returns close to 1500 solutions, we use 1500 scenarios for the LHS as well.
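The stratification that Latin hypercube sampling provides can be sketched in a few lines of numpy. This is an illustrative toy on the unit hypercube, not the workbench's sampler (which is selected through Samplers.LHS and respects each parameter's bounds):

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Minimal Latin hypercube sample on the unit hypercube.

    Each dimension is split into n_samples equal strata; exactly one
    sample falls in each stratum, with strata shuffled per dimension.
    """
    rng = np.random.default_rng(seed)
    # one uniform draw inside every stratum, per dimension
    samples = (rng.random((n_samples, n_dims)) + np.arange(n_samples)[:, None]) / n_samples
    # shuffle the strata independently for each dimension
    for dim in range(n_dims):
        rng.shuffle(samples[:, dim])
    return samples

lhs = latin_hypercube(1500, 4, seed=42)
```

Because every one-dimensional stratum is hit exactly once, LHS spreads the 1500 points more evenly over each uncertainty range than plain Monte Carlo sampling would.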

[8]:
with MultiprocessingEvaluator(lake_model) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(scenarios=1500, policies=reference)
[MainProcess/INFO] pool started with 10 workers
[MainProcess/INFO] performing 1500 scenarios * 1 policies * 1 model(s) = 1500 experiments
100%|█████████████████████████████████████| 1500/1500 [00:04<00:00, 331.08it/s]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool

comparison

Below we compare the LHS with the last seed of the auto adaptive algorithm. Remember, the purpose of output space exploration is to identify the kinds of behavior that a model can show given a set of uncertain parameters. So, we create a pairwise scatter plot of the outcomes for both the output space exploration (in blue) and the Latin hypercube sampling (in orange).

[9]:
outcomes = pd.DataFrame(outcomes)
outcomes["sampling"] = "LHS"
[10]:
ose = res.iloc[:, 5::].copy()
ose["sampling"] = "OSE"
[11]:
data = pd.concat([ose, outcomes], axis=0, ignore_index=True)
[12]:
sns.pairplot(data, hue="sampling", vars=data.columns[0:4])
plt.show()
_images/examples_outputspace_exploration_lakeproblem_17_0.png

As you can clearly see, the output space exploration algorithm has uncovered behaviors that were not seen in the 1500-sample LHS.


Performing Scenario Discovery in Python

The purpose of this example is to demonstrate how one can do scenario discovery in Python. I will demonstrate how to perform PRIM in an interactive way, and briefly show how to use CART, which is also available in the exploratory modeling workbench. There is ample literature on both CART and PRIM and their relative merits for use in scenario discovery, so I won’t discuss that here in any detail.

In order to demonstrate the use of the exploratory modeling workbench for scenario discovery, I am using a published example: the data used in the original article by Ben Bryant and Rob Lempert, where they first introduced scenario discovery in 2010. Ben Bryant kindly made this data available and allowed me to share it. The data comes as a csv file that we can import easily using pandas. Columns 2 up to and including 10 contain the experimental design, while the classification is presented in column 15.

This example is a slightly updated version of a blog post on https://waterprogramming.wordpress.com/2015/08/05/scenario-discovery-in-python/

[1]:
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("./data/bryant et al 2010 data.csv", index_col=False)
x = data.iloc[:, 2:11]
y = data.iloc[:, 15].values

The exploratory modeling workbench comes with a separate analysis package. This analysis package contains prim. So let’s import prim. The workbench also has its own logging functionality. We can turn this on to get some more insight into prim while it is running.

[2]:
from ema_workbench.analysis import prim
from ema_workbench.util import ema_logging

ema_logging.log_to_stderr(ema_logging.INFO);

Next, we need to instantiate the prim algorithm. To mimic the original work of Ben Bryant and Rob Lempert, we set the peeling alpha to 0.1. The peeling alpha determines how much data is peeled off in each iteration of the algorithm. The lower the value, the less data is removed in each iteration. The minimum coverage threshold that a box should meet is set to 0.8. Next, we can use the instantiated algorithm to find a first box.
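The peeling step that the peeling alpha controls can be sketched in a few lines of numpy. This is an illustrative toy, not the workbench's implementation: real PRIM also handles categorical dimensions, includes a pasting phase, and maximizes a configurable objective function.

```python
import numpy as np

def peel_once(x, y, peel_alpha=0.1):
    """One peeling iteration of PRIM, sketched for numeric columns only.

    For every column, consider trimming off the lowest or the highest
    peel_alpha fraction of the remaining data, and keep the candidate
    subset with the highest density (mean of the binary outcome y).
    """
    best_density, best_mask = -1.0, None
    for col in range(x.shape[1]):
        values = x[:, col]
        low = np.quantile(values, peel_alpha)
        high = np.quantile(values, 1 - peel_alpha)
        for mask in (values >= low, values <= high):
            if mask.sum() == 0 or mask.all():
                continue
            density = y[mask].mean()
            if density > best_density:
                best_density, best_mask = density, mask
    return best_mask, best_density

rng = np.random.default_rng(1)
x = rng.random((200, 3))
y = (x[:, 0] > 0.7).astype(int)  # cases of interest concentrate at high x0
mask, density = peel_once(x, y)
```

Repeating this step yields the peeling trajectory explored below: each iteration trades a little coverage for higher density.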

[3]:
prim_alg = prim.Prim(x, y, threshold=0.8, peel_alpha=0.1)
box1 = prim_alg.find_box()
[MainProcess/INFO] 882 points remaining, containing 89 cases of interest
[MainProcess/INFO] mean: 1.0, mass: 0.05102040816326531, coverage: 0.5056179775280899, density: 1.0 restricted_dimensions: 6

Let’s investigate this first box in some detail. A first thing to look at is the trade-off between coverage and density. The box has a convenience function for this called show_tradeoff.

[4]:
box1.show_tradeoff()
plt.show()
_images/examples_scenario_discovery_7_0.png

Since we are doing this analysis in a notebook, we can take advantage of the interactivity that the browser offers. A relatively recent addition to the python ecosystem is the library altair. Altair can be used to create interactive plots for use in a browser. Altair is an optional dependency for the workbench. If available, we can create the following visual.

[5]:
box1.inspect_tradeoff()
[5]:
_images/examples_scenario_discovery_9_0.png

Here we can interactively explore the boxes associated with each point in the coverage–density trade-off. It also offers mouse-overs for the various points on the trade-off curve. Given the id of each point, we can also use the workbench to manually inspect the peeling trajectory. Following Bryant & Lempert, we inspect box 21.

[6]:
box1.resample(21)
[MainProcess/INFO] resample 0
[MainProcess/INFO] resample 1
[MainProcess/INFO] resample 2
[MainProcess/INFO] resample 3
[MainProcess/INFO] resample 4
[MainProcess/INFO] resample 5
[MainProcess/INFO] resample 6
[MainProcess/INFO] resample 7
[MainProcess/INFO] resample 8
[MainProcess/INFO] resample 9
[6]:
                          reproduce coverage  reproduce density
Total biomass                          100.0              100.0
Demand elasticity                      100.0              100.0
Biomass backstop price                 100.0              100.0
Cellulosic cost                         90.0               90.0
Cellulosic yield                        20.0               20.0
Electricity coproduction                20.0               20.0
Feedstock distribution                   0.0                0.0
Oil elasticity                           0.0                0.0
oil supply shift                         0.0                0.0
[7]:
box1.inspect(21)
box1.inspect(21, style="graph")
plt.show()
coverage     0.752809
density      0.770115
id                 21
mass        0.0986395
mean         0.770115
res_dim             4
Name: 21, dtype: object

                            box 21
                               min         max                       qp values
Total biomass           450.000000  755.799988   [-1.0, 4.716968553178765e-06]
Demand elasticity        -0.422000   -0.202000  [1.1849299115762218e-16, -1.0]
Biomass backstop price  150.049995  199.600006   [3.515112530263049e-11, -1.0]
Cellulosic cost          72.650002  133.699997     [0.15741333528927348, -1.0]

_images/examples_scenario_discovery_12_1.png

If one were to do a detailed comparison with the results reported in the original article, one would see small numerical differences. These differences arise out of subtle differences in implementation. The most important difference is that the exploratory modeling workbench uses a custom objective function inside prim, which is different from the one used in the scenario discovery toolkit. Other differences have to do with details of the hill climbing optimization used in prim, in particular how ties are handled in selecting the next step. The differences between the two implementations are only numerical and don’t affect the overarching conclusions drawn from the analysis.

Let’s select box 21 and get a more detailed view of what the box looks like. Following Bryant et al., we can use scatter plots for this.

[8]:
box1.select(21)
fig = box1.show_pairs_scatter(21)
plt.show()
_images/examples_scenario_discovery_14_0.png

Because the last restriction is not significant, we can choose to drop this restriction from the box.

[9]:
box1.drop_restriction("Cellulosic cost")
box1.inspect(style="graph")
plt.show()
_images/examples_scenario_discovery_16_0.png

We have now found a first box that explains over 75% of the cases of interest. Let’s see if we can find a second box that explains the remainder of the cases.

[10]:
box2 = prim_alg.find_box()
[MainProcess/INFO] 787 points remaining, containing 21 cases of interest
[MainProcess/INFO] box does not meet threshold criteria, value is 0.3541666666666667, returning dump box

As we can see, we are unable to find a second box. The best coverage we can achieve is 0.35, which is well below the specified 0.8 threshold. Let’s look at the final overall results from interactively fitting PRIM to the data. For this, we can use two convenience functions that transform the stats and boxes to pandas data frames.
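The coverage and density statistics reported throughout follow the standard scenario discovery definitions: coverage is the fraction of all cases of interest that a box captures, and density is the fraction of points inside the box that are of interest. A small illustrative sketch (the helper and the toy data are hypothetical):

```python
import numpy as np

def coverage_and_density(in_box, of_interest):
    """Coverage: fraction of all cases of interest captured by the box.
    Density: fraction of points inside the box that are of interest."""
    in_box = np.asarray(in_box, dtype=bool)
    of_interest = np.asarray(of_interest, dtype=bool)
    captured = (in_box & of_interest).sum()
    return captured / of_interest.sum(), captured / in_box.sum()

# toy example: 10 points, 4 of interest, box captures 3 of them plus 1 other
in_box      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
of_interest = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cov, dens = coverage_and_density(in_box, of_interest)  # 0.75, 0.75
```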

[11]:
prim_alg.stats_to_dataframe()
[11]:
       coverage   density     mass  res_dim
box 1  0.764045  0.715789  0.10771        3
box 2  0.235955  0.026684  0.89229        0
[12]:
prim_alg.boxes_to_dataframe()
[12]:
                             box 1                    box 2
                               min         max          min         max
Demand elasticity        -0.422000   -0.202000         -0.8   -0.202000
Biomass backstop price  150.049995  199.600006         90.0  199.600006
Total biomass           450.000000  755.799988        450.0  997.799988

CART

The way of interacting with CART is quite similar to how we set up the prim analysis. We import cart from the analysis package, instantiate the algorithm, and fit CART to the data. This is done via the build_tree method.

[13]:
from ema_workbench.analysis import cart

cart_alg = cart.CART(x, y, 0.05)
cart_alg.build_tree()
/Users/jhkwakkel/miniconda3/lib/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
  return f(*args, **kwds)

Now that we have trained CART on the data, we can investigate its results. Just like PRIM, we can use stats_to_dataframe and boxes_to_dataframe to get an overview.

[14]:
cart_alg.stats_to_dataframe()
[14]:
       coverage   density      mass  res_dim
box 1  0.011236  0.021739  0.052154        2
box 2  0.000000  0.000000  0.546485        2
box 3  0.000000  0.000000  0.103175        2
box 4  0.044944  0.090909  0.049887        2
box 5  0.224719  0.434783  0.052154        2
box 6  0.112360  0.227273  0.049887        3
box 7  0.000000  0.000000  0.051020        3
box 8  0.606742  0.642857  0.095238        2
[15]:
cart_alg.boxes_to_dataframe()
[15]:
box 1 box 2 box 3 box 4 box 5 box 6 box 7 box 8
min max min max min max min max min max min max min max min max
Cellulosic yield 80.0 81.649998 81.649998 99.900002 80.000 99.900002 80.000000 99.900002 80.000 99.900002 80.0000 89.049999 89.049999 99.900002 80.000000 99.900002
Demand elasticity -0.8 -0.439000 -0.800000 -0.439000 -0.439 -0.316500 -0.439000 -0.316500 -0.439 -0.316500 -0.3165 -0.202000 -0.316500 -0.202000 -0.316500 -0.202000
Biomass backstop price 90.0 199.600006 90.000000 199.600006 90.000 144.350006 144.350006 170.750000 170.750 199.600006 90.0000 148.300003 90.000000 148.300003 148.300003 199.600006

Alternatively, we might want to look at the classification tree directly. For this, we can use the show_tree method.

[16]:
fig = cart_alg.show_tree()
fig.set_size_inches((18, 12))
plt.show()
_images/examples_scenario_discovery_28_0.png

Vadere EMA connector demo

This notebook demonstrates the setup of a Vadere model connection. It uses an example Vadere model, made with Vadere 2.1. Before running the code yourself, please acquire a copy of Vadere from the official Vadere website. For more information on the use of the EMA Workbench, please refer to the official EMA Workbench documentation. This demo is based on the code provided on the documentation pages.

Step 1: imports

The first step is to import the needed modules. Which modules are needed depends on the type of analysis that is intended. The most important one here is the VadereModel from the model connectors. As said, please refer to the official EMA Workbench documentation for more information.

[ ]:
from ema_workbench import (
    perform_experiments,
    RealParameter,
    IntegerParameter,
    ScalarOutcome,
    ema_logging,
    MultiprocessingEvaluator,
    Samplers,
)
from ema_workbench.connectors.vadere import VadereModel
import pandas as pd
import numpy as np

Step 2: Setting up the model

In this demonstration, we use a simple Vadere model example. The model includes two sources and one target, and spawns a number of pedestrians every second. The model uses the OSM locomotion implementation, and collects the following:

  • The average evacuation time -> stored in evacuationTime.txt

  • The average speed of all pedestrians each time step -> stored in speed.csv

Note that the EMA connector assumes .txt files for scalar outcomes and .csv files for time series outcomes. Please use these file types, and declare them explicitly in processor_files as shown below.

[2]:
# This model saves its outputs to the evacuationTime.txt and speed.csv files.
# Please acquire your own copy of Vadere, and place vadere-console.jar in your model/scenarios directory.
# Note that the Vadere model files and the console jar should always be placed in a separate working directory from the Python run file.
model = VadereModel(
    "demoModel",
    vadere_jar="vadere-console.jar",
    processor_files=["evacuationTime.txt", "speed.csv"],
    model_file="demo.scenario",
    wd="models/vadereModel/scenarios/",
)
[3]:
# set the number of replications to handle model stochasticity
model.replications = 5

Note that for specifying model uncertainties (and potential levers), the Vadere model class can change any variable present in the model file (the Vadere scenario). To realize this, the exact location of the variable of interest in the Vadere scenario file has to be specified. Vadere scenario files follow a nested dictionary structure, so the location should be passed as a list containing a single string that spells out the keys (and list indices) leading to the variable.

See the example below, which varies the following:

  • The number of spawned pedestrians from both sources

  • The mean of the free flow speed distribution

[4]:
model.uncertainties = [
    IntegerParameter(
        name="spawnNumberA",
        lower_bound=1,
        upper_bound=50,
        variable_name=['("scenario", "topography", "sources", 0, "spawnNumber")'],
    ),
    IntegerParameter(
        name="spawnNumberB",
        lower_bound=1,
        upper_bound=50,
        variable_name=['("scenario", "topography", "sources", 1, "spawnNumber")'],
    ),
    RealParameter(
        name="μFreeFlowSpeed",
        lower_bound=0.7,
        upper_bound=1.5,
        variable_name=[
            '("scenario", "topography", "attributesPedestrian", "speedDistributionMean")'
        ],
    ),
]
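The nested lookup implied by such a variable_name can be illustrated with plain Python. The toy scenario dict and the helpers `get_nested` and `set_nested` below are hypothetical, meant only to show how a path of keys and list indices resolves in a nested structure like a Vadere scenario file:

```python
from functools import reduce

def get_nested(scenario, path):
    """Look up a value in a nested dict/list structure by a path of keys."""
    return reduce(lambda node, key: node[key], path, scenario)

def set_nested(scenario, path, value):
    """Set a value at the given path, mutating the structure in place."""
    get_nested(scenario, path[:-1])[path[-1]] = value

# toy stand-in for a Vadere scenario file's nested structure
scenario = {"scenario": {"topography": {"sources": [{"spawnNumber": 5}, {"spawnNumber": 5}]}}}

path = ("scenario", "topography", "sources", 0, "spawnNumber")
set_nested(scenario, path, 42)  # only source 0 is changed
```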

The model outcomes can be specified by passing the exact name as present in the output file (evacuationTime.txt here). The naming convention depends on the Vadere data processors used, but usually follows the name plus id of the processor. When in doubt, it is advised to do a demo Vadere run using the Vadere software and inspect the generated output files. Note that we take the mean of the outcomes here, since we specified multiple replications. For now we focus only on scalar outcomes; see the end of this demo for time series.

[5]:
model.outcomes = [
    ScalarOutcome(name="evacuationTime", variable_name="meanEvacuationTime-PID8", function=np.mean)
]

Step 3: Performing experiments

The last step is to perform experiments with the Vadere model. Both sequential and parallel runs are supported. Note, however, that a Vadere run can use a lot of RAM, and using all available CPU cores can lead to performance issues in some cases. Additionally, one Vadere run is by default already multithreaded. It is therefore recommended not to use the full set of available processes.
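If you prefer to pass an explicit worker count instead of n_processes=-1, a simple stdlib sketch for leaving some cores free could look like this (conservative_worker_count is a hypothetical helper, not part of the workbench):

```python
import os

def conservative_worker_count(reserve=1):
    """Number of worker processes, leaving `reserve` logical cores free."""
    # os.cpu_count() can return None on exotic platforms, hence the fallback
    return max(1, (os.cpu_count() or 1) - reserve)

n_processes = conservative_worker_count()
# pass this explicitly, e.g. MultiprocessingEvaluator(model, n_processes=n_processes)
```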

[6]:
# enable EMA logging
ema_logging.log_to_stderr(ema_logging.INFO)
[6]:
<Logger EMA (DEBUG)>
[7]:
# run 2 experiments sequentially
results_sequential = perform_experiments(model, scenarios=2, uncertainty_sampling=Samplers.LHS)
[MainProcess/INFO] performing 2 scenarios * 1 policies * 1 model(s) = 2 experiments
  0%|                                                    | 0/2 [00:00<?, ?it/s][MainProcess/INFO] performing experiments sequentially
100%|████████████████████████████████████████████| 2/2 [01:03<00:00, 31.75s/it]
[MainProcess/INFO] experiments finished
[8]:
# run 4 experiments in parallel
with MultiprocessingEvaluator(model, n_processes=-1) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(
        scenarios=4, uncertainty_sampling=Samplers.LHS
    )
[MainProcess/INFO] pool started with 4 workers
[MainProcess/INFO] performing 4 scenarios * 1 policies * 1 model(s) = 4 experiments
100%|████████████████████████████████████████████| 4/4 [00:37<00:00,  9.39s/it]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
[ForkPoolWorker-1/INFO] finalizing
[ForkPoolWorker-4/INFO] finalizing
[ForkPoolWorker-3/INFO] finalizing
[ForkPoolWorker-2/INFO] finalizing

Inspect the results

We can now look at the results directly. For more extensive exploratory analysis methods, please refer to the official EMA Workbench documentation.

[9]:
experiments.head()
[9]:
   spawnNumberA  spawnNumberB  μFreeFlowSpeed  scenario  policy      model
0          25.0          33.0        0.838960         2    None  demoModel
1          46.0          19.0        1.413804         3    None  demoModel
2          28.0          11.0        0.942684         4    None  demoModel
3           2.0          50.0        1.111305         5    None  demoModel
[10]:
pd.DataFrame(outcomes).head()
[10]:
   evacuationTime
0       14.268966
1        9.990154
2       11.673846
3       11.155385

Using time series data

It is additionally possible to specify time series output data. See below for an example on how to do this.

Note that this example uses a different model class, since we do not want to average multiple scalar outcomes but are instead interested in a time series outcome. Hence, the SingleReplicationVadereModel is used. Now, the time step and average speed are collected every step.

[11]:
from ema_workbench import TimeSeriesOutcome
from ema_workbench.connectors.vadere import SingleReplicationVadereModel
[12]:
# This model saves its outputs to the evacuationTime.txt and speed.csv files.
# Please acquire your own copy of Vadere, and place vadere-console.jar in your model/scenarios directory.
# Note that the Vadere model files should always be placed in a separate working directory from the Python run file.
model = SingleReplicationVadereModel(
    "demoModel",
    vadere_jar="vadere-console.jar",
    processor_files=["evacuationTime.txt", "speed.csv"],
    model_file="demo.scenario",
    wd="models/vadereModel/scenarios/",
)
[13]:
model.uncertainties = [
    IntegerParameter(
        name="spawnNumberA",
        lower_bound=1,
        upper_bound=50,
        variable_name=['("scenario", "topography", "sources", 0, "spawnNumber")'],
    ),
    IntegerParameter(
        name="spawnNumberB",
        lower_bound=1,
        upper_bound=50,
        variable_name=['("scenario", "topography", "sources", 1, "spawnNumber")'],
    ),
    RealParameter(
        name="μFreeFlowSpeed",
        lower_bound=0.7,
        upper_bound=1.5,
        variable_name=[
            '("scenario", "topography", "attributesPedestrian", "speedDistributionMean")'
        ],
    ),
]
[14]:
model.outcomes = [TimeSeriesOutcome(name="speedTime", variable_name="areaSpeed-PID5")]
[15]:
# Run experiments in parallel with all logical CPU cores except one
with MultiprocessingEvaluator(model, n_processes=-1) as evaluator:
    experiments, outcomes = evaluator.perform_experiments(
        scenarios=4, uncertainty_sampling=Samplers.LHS
    )
[MainProcess/INFO] pool started with 4 workers
[MainProcess/INFO] performing 4 scenarios * 1 policies * 1 model(s) = 4 experiments
100%|████████████████████████████████████████████| 4/4 [00:07<00:00,  1.90s/it]
[MainProcess/INFO] experiments finished
[MainProcess/INFO] terminating pool
[ForkPoolWorker-8/INFO] finalizing
[ForkPoolWorker-5/INFO] finalizing
[ForkPoolWorker-6/INFO] finalizing
[ForkPoolWorker-7/INFO] finalizing
[16]:
from ema_workbench.analysis.plotting import lines

lines(experiments, outcomes, outcomes_to_show="speedTime")
[16]:
(<Figure size 432x288 with 1 Axes>,
 {'speedTime': <AxesSubplot:title={'center':'speedTime'}, xlabel='Time', ylabel='speedTime'>})
_images/examples_vadere_demo_23_1.png

example_eijgenraam.py

  1"""
  2
  3"""
  4
  5# Created on 12 Mar 2020
  6#
  7# .. codeauthor::  jhkwakkel
  8
  9import bisect
 10import functools
 11import math
 12import operator
 13
 14import numpy as np
 15import scipy as sp
 16
 17from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging, MultiprocessingEvaluator
 18
 19##==============================================================================
 20## Implement the model described by Eijgenraam et al. (2012)
 21# code taken from Rhodium Eijgenraam example
 22##------------------------------------------------------------------------------
 23
 24# Parameters pulled from the paper describing each dike ring
 25params = ("c", "b", "lam", "alpha", "eta", "zeta", "V0", "P0", "max_Pf")
 26raw_data = {
 27    10: (16.6939, 0.6258, 0.0014, 0.033027, 0.320, 0.003774, 1564.9, 0.00044, 1 / 2000),
 28    11: (42.6200, 1.7068, 0.0000, 0.032000, 0.320, 0.003469, 1700.1, 0.00117, 1 / 2000),
 29    15: (125.6422, 1.1268, 0.0098, 0.050200, 0.760, 0.003764, 11810.4, 0.00137, 1 / 2000),
 30    16: (324.6287, 2.1304, 0.0100, 0.057400, 0.760, 0.002032, 22656.5, 0.00110, 1 / 2000),
 31    22: (154.4388, 0.9325, 0.0066, 0.070000, 0.620, 0.002893, 9641.1, 0.00055, 1 / 2000),
 32    23: (26.4653, 0.5250, 0.0034, 0.053400, 0.800, 0.002031, 61.6, 0.00137, 1 / 2000),
 33    24: (71.6923, 1.0750, 0.0059, 0.043900, 1.060, 0.003733, 2706.4, 0.00188, 1 / 2000),
 34    35: (49.7384, 0.6888, 0.0088, 0.036000, 1.060, 0.004105, 4534.7, 0.00196, 1 / 2000),
 35    38: (24.3404, 0.7000, 0.0040, 0.025321, 0.412, 0.004153, 3062.6, 0.00171, 1 / 1250),
 36    41: (58.8110, 0.9250, 0.0033, 0.025321, 0.422, 0.002749, 10013.1, 0.00171, 1 / 1250),
 37    42: (21.8254, 0.4625, 0.0019, 0.026194, 0.442, 0.001241, 1090.8, 0.00171, 1 / 1250),
 38    43: (340.5081, 4.2975, 0.0043, 0.025321, 0.448, 0.002043, 19767.6, 0.00171, 1 / 1250),
 39    44: (24.0977, 0.7300, 0.0054, 0.031651, 0.316, 0.003485, 37596.3, 0.00033, 1 / 1250),
 40    45: (3.4375, 0.1375, 0.0069, 0.033027, 0.320, 0.002397, 10421.2, 0.00016, 1 / 1250),
 41    47: (8.7813, 0.3513, 0.0026, 0.029000, 0.358, 0.003257, 1369.0, 0.00171, 1 / 1250),
 42    48: (35.6250, 1.4250, 0.0063, 0.023019, 0.496, 0.003076, 7046.4, 0.00171, 1 / 1250),
 43    49: (20.0000, 0.8000, 0.0046, 0.034529, 0.304, 0.003744, 823.3, 0.00171, 1 / 1250),
 44    50: (8.1250, 0.3250, 0.0000, 0.033027, 0.320, 0.004033, 2118.5, 0.00171, 1 / 1250),
 45    51: (15.0000, 0.6000, 0.0071, 0.036173, 0.294, 0.004315, 570.4, 0.00171, 1 / 1250),
 46    52: (49.2200, 1.6075, 0.0047, 0.036173, 0.304, 0.001716, 4025.6, 0.00171, 1 / 1250),
 47    53: (69.4565, 1.1625, 0.0028, 0.031651, 0.336, 0.002700, 9819.5, 0.00171, 1 / 1250),
 48}
 49data = {i: dict(zip(params, raw_data[i])) for i in raw_data}
 50
 51# Set the ring we are analyzing
 52ring = 15
 53max_failure_probability = data[ring]["max_Pf"]
 54
 55
 56# Compute the investment cost to increase the dike height
 57def exponential_investment_cost(
 58    u,  # increase in dike height
 59    h0,  # original height of the dike
 60    c,  # constant from Table 1
 61    b,  # constant from Table 1
 62    lam,
 63):  # constant from Table 1
 64    if u == 0:
 65        return 0
 66    else:
 67        return (c + b * u) * math.exp(lam * (h0 + u))
 68
 69
 70def eijgenraam_model(
 71    X1,
 72    X2,
 73    X3,
 74    X4,
 75    X5,
 76    X6,
 77    T1,
 78    T2,
 79    T3,
 80    T4,
 81    T5,
 82    T6,
 83    T=300,
 84    P0=data[ring]["P0"],
 85    V0=data[ring]["V0"],
 86    alpha=data[ring]["alpha"],
 87    delta=0.04,
 88    eta=data[ring]["eta"],
 89    gamma=0.035,
 90    rho=0.015,
 91    zeta=data[ring]["zeta"],
 92    c=data[ring]["c"],
 93    b=data[ring]["b"],
 94    lam=data[ring]["lam"],
 95):
 96    """Python implementation of the Eijgenraam model
 97
 98    Params
 99    ------
100    Xs : list
101         list of dike heightenings
102    Ts : list
103         time of dike heightenings
104    T : int, optional
105        planning horizon
106    P0 : <>, optional
107         constant from Table 1
108    V0 : <>, optional
109         constant from Table 1
110    alpha : <>, optional
111            constant from Table 1
112    delta : float, optional
113            discount rate, mentioned in Section 2.2
114    eta : <>, optional
115          constant from Table 1
116    gamma : float, optional
117            paper says this is taken from government report, but no indication
118            of actual value
119    rho : float, optional
120          risk-free rate, mentioned in Section 2.2
121    zeta : <>, optional
122           constant from Table 1
123    c : <>, optional
124        constant from Table 1
125    b : <>, optional
126        constant from Table 1
127    lam : <>, optional
128         constant from Table 1
129
130    """
131    Ts = [T1, T2, T3, T4, T5, T6]
132    Xs = [X1, X2, X3, X4, X5, X6]
133
134    Ts = [int(Ts[i] + sum(Ts[:i])) for i in range(len(Ts)) if Ts[i] + sum(Ts[:i]) < T]
135    Xs = Xs[: len(Ts)]
136
137    if len(Ts) == 0:
138        Ts = [0]
139        Xs = [0]
140
141    if Ts[0] > 0:
142        Ts.insert(0, 0)
143        Xs.insert(0, 0)
144
145    S0 = P0 * V0
146    beta = alpha * eta + gamma - rho
147    theta = alpha - zeta
148
149    # calculate investment
150    investment = 0
151
152    for i in range(len(Xs)):
153        step_cost = exponential_investment_cost(Xs[i], 0 if i == 0 else sum(Xs[:i]), c, b, lam)
154        step_discount = math.exp(-delta * Ts[i])
155        investment += step_cost * step_discount
156
157    # calculate expected losses
158    losses = 0
159
160    for i in range(len(Xs) - 1):
161        losses += math.exp(-theta * sum(Xs[: (i + 1)])) * (
162            math.exp((beta - delta) * Ts[i + 1]) - math.exp((beta - delta) * Ts[i])
163        )
164
165    if Ts[-1] < T:
166        losses += math.exp(-theta * sum(Xs)) * (
167            math.exp((beta - delta) * T) - math.exp((beta - delta) * Ts[-1])
168        )
169
170    losses = losses * S0 / (beta - delta)
171
172    # salvage term
173    losses += S0 * math.exp(beta * T) * math.exp(-theta * sum(Xs)) * math.exp(-delta * T) / delta
174
175    def find_height(t):
176        if t < Ts[0]:
177            return 0
178        elif t > Ts[-1]:
179            return sum(Xs)
180        else:
181            return sum(Xs[: bisect.bisect_right(Ts, t)])
182
183    failure_probability = [
184        P0 * np.exp(alpha * eta * t) * np.exp(-alpha * find_height(t)) for t in range(T + 1)
185    ]
186    total_failure = 1 - functools.reduce(operator.mul, [1 - p for p in failure_probability], 1)
187    mean_failure = sum(failure_probability) / (T + 1)
188    max_failure = max(failure_probability)
189
190    return (investment, losses, investment + losses, total_failure, mean_failure, max_failure)
191
192
193if __name__ == "__main__":
194    model = Model("eijgenraam", eijgenraam_model)
195
196    model.responses = [
197        ScalarOutcome("TotalInvestment", ScalarOutcome.INFO),
198        ScalarOutcome("TotalLoss", ScalarOutcome.INFO),
199        ScalarOutcome("TotalCost", ScalarOutcome.MINIMIZE),
200        ScalarOutcome("TotalFailureProb", ScalarOutcome.INFO),
201        ScalarOutcome("AvgFailureProb", ScalarOutcome.MINIMIZE),
202        ScalarOutcome("MaxFailureProb", ScalarOutcome.MINIMIZE),
203    ]
204
205    # Set uncertainties
206    model.uncertainties = [
207        RealParameter.from_dist("P0", sp.stats.lognorm(scale=0.00137, s=0.25)),
208        # @UndefinedVariable
209        RealParameter.from_dist("alpha", sp.stats.norm(loc=0.0502, scale=0.01)),
210        # @UndefinedVariable
211        RealParameter.from_dist("eta", sp.stats.lognorm(scale=0.76, s=0.1)),
212    ]  # @UndefinedVariable
213
 214    # having a list-like parameter where values are automatically wrapped
 215    # into a list can be quite useful
216    model.levers = [RealParameter(f"X{i}", 0, 500) for i in range(1, 7)] + [
217        RealParameter(f"T{i}", 0, 300) for i in range(1, 7)
218    ]
219
220    ema_logging.log_to_stderr(ema_logging.INFO)
221
222    with MultiprocessingEvaluator(model, n_processes=-1) as evaluator:
223        results = evaluator.perform_experiments(1000, 4)
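The investment cost function from the listing above can be exercised on its own. The sketch below uses the dike ring 15 constants c, b, and lam from the table in the listing, with an arbitrary example heightening of 50:

```python
import math

# dike ring 15 constants c, b, lam from the table in the listing above
c, b, lam = 125.6422, 1.1268, 0.0098

def exponential_investment_cost(u, h0, c, b, lam):
    """Investment cost of raising a dike by u, given current heightening h0."""
    if u == 0:
        return 0
    # (c + b*u) * exp(lam * (h0 + u)), as in the listing above
    return (c + b * u) * math.exp(lam * (h0 + u))

cost = exponential_investment_cost(50, 0, c, b, lam)
# raising the dike further, or starting from an already heightened dike,
# is strictly more expensive because of the exponential term
```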

example_excel.py

 1"""
 2Created on 27 Jul. 2011
 3
 4This file illustrates the use of the EMA classes for a model in Excel.
 5
 6It uses the Excel file provided by
 7`A. Sharov <https://home.comcast.net/~sharov/PopEcol/lec10/fullmod.html>`_
 8
 9This excel file implements a simple predator prey model.
10
11.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
12"""
13
14from ema_workbench import RealParameter, TimeSeriesOutcome, ema_logging, perform_experiments
15
16from ema_workbench.connectors.excel import ExcelModel
17from ema_workbench.em_framework.evaluators import MultiprocessingEvaluator
18
19if __name__ == "__main__":
20    ema_logging.log_to_stderr(level=ema_logging.INFO)
21
22    model = ExcelModel("predatorPrey", wd="./models/excelModel", model_file="excel example.xlsx")
23    model.uncertainties = [
24        RealParameter("K2", 0.01, 0.2),
25        # we can refer to a cell in the normal way
26        # we can also use named cells
27        RealParameter("KKK", 450, 550),
28        RealParameter("rP", 0.05, 0.15),
29        RealParameter("aaa", 0.00001, 0.25),
30        RealParameter("tH", 0.45, 0.55),
31        RealParameter("kk", 0.1, 0.3),
32    ]
33
34    # specification of the outcomes
35    model.outcomes = [
36        TimeSeriesOutcome("B4:B1076"),
37        # we can refer to a range in the normal way
38        TimeSeriesOutcome("P_t"),
39    ]  # we can also use a named range
40
41    # name of the sheet
42    model.default_sheet = "Sheet1"
43
44    with MultiprocessingEvaluator(model) as evaluator:
45        results = perform_experiments(model, 100, reporting_interval=1, evaluator=evaluator)

example_flu.py

  1"""
  2Created on 20 dec. 2010
  3
  4This file illustrates the use of the workbench for a model
  5specified in Python itself. The example is based on `Pruyt & Hamarat <https://www.systemdynamics.org/conferences/2010/proceed/papers/P1253.pdf>`_.
  6For comparison, run both this model and flu_vensim_no_policy_example.py and
  7compare the results.
  8
  9
 10.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 11                chamarat <c.hamarat  (at) tudelft (dot) nl>
 12
 13"""
 14
 15import matplotlib.pyplot as plt
 16import numpy as np
 17from numpy import sin, min, exp
 18
 19from ema_workbench import Model, RealParameter, TimeSeriesOutcome, perform_experiments, ema_logging
 20from ema_workbench import MultiprocessingEvaluator, SequentialEvaluator
 21from ema_workbench.analysis import lines, Density
 22
 23# =============================================================================
 24#
 25#    the model itself
 26#
 27# =============================================================================
 28
 29FINAL_TIME = 48
 30INITIAL_TIME = 0
 31TIME_STEP = 0.0078125
 32
 33switch_regions = 1.0
 34switch_immunity = 1.0
 35switch_deaths = 1.0
 36switch_immunity_cap = 1.0
 37
 38
 39def LookupFunctionX(variable, start, end, step, skew, growth, v=0.5):
 40    return start + ((end - start) / ((1 + skew * exp(-growth * (variable - step))) ** (1 / v)))
 41
 42
 43def flu_model(
 44    x11=0,
 45    x12=0,
 46    x21=0,
 47    x22=0,
 48    x31=0,
 49    x32=0,
 50    x41=0,
 51    x51=0,
 52    x52=0,
 53    x61=0,
 54    x62=0,
 55    x81=0,
 56    x82=0,
 57    x91=0,
 58    x92=0,
 59    x101=0,
 60    x102=0,
 61):
 62    # Assigning initial values
 63    additional_seasonal_immune_population_fraction_R1 = float(x11)
 64    additional_seasonal_immune_population_fraction_R2 = float(x12)
 65
 66    fatality_rate_region_1 = float(x21)
 67    fatality_rate_region_2 = float(x22)
 68
 69    initial_immune_fraction_of_the_population_of_region_1 = float(x31)
 70    initial_immune_fraction_of_the_population_of_region_2 = float(x32)
 71
 72    normal_interregional_contact_rate = float(x41)
 73    interregional_contact_rate = switch_regions * normal_interregional_contact_rate
 74
 75    permanent_immune_population_fraction_R1 = float(x51)
 76    permanent_immune_population_fraction_R2 = float(x52)
 77
 78    recovery_time_region_1 = float(x61)
 79    recovery_time_region_2 = float(x62)
 80
 81    susceptible_to_immune_population_delay_time_region_1 = 1
 82    susceptible_to_immune_population_delay_time_region_2 = 1
 83
 84    root_contact_rate_region_1 = float(x81)
 85    root_contact_rate_region_2 = float(x82)
 86
 87    infection_rate_region_1 = float(x91)
 88    infection_rate_region_2 = float(x92)
 89
 90    normal_contact_rate_region_1 = float(x101)
 91    normal_contact_rate_region_2 = float(x102)
 92
 93    ######
 94    susceptible_to_immune_population_flow_region_1 = 0.0
 95    susceptible_to_immune_population_flow_region_2 = 0.0
 96    ######
 97
 98    initial_value_population_region_1 = 6.0 * 10**8
 99    initial_value_population_region_2 = 3.0 * 10**9
100
101    initial_value_infected_population_region_1 = 10.0
102    initial_value_infected_population_region_2 = 10.0
103
104    initial_value_immune_population_region_1 = (
105        switch_immunity
106        * initial_immune_fraction_of_the_population_of_region_1
107        * initial_value_population_region_1
108    )
109    initial_value_immune_population_region_2 = (
110        switch_immunity
111        * initial_immune_fraction_of_the_population_of_region_2
112        * initial_value_population_region_2
113    )
114
115    initial_value_susceptible_population_region_1 = (
116        initial_value_population_region_1 - initial_value_immune_population_region_1
117    )
118    initial_value_susceptible_population_region_2 = (
119        initial_value_population_region_2 - initial_value_immune_population_region_2
120    )
121
122    recovered_population_region_1 = 0.0
123    recovered_population_region_2 = 0.0
124
125    infected_population_region_1 = initial_value_infected_population_region_1
126    infected_population_region_2 = initial_value_infected_population_region_2
127
128    susceptible_population_region_1 = initial_value_susceptible_population_region_1
129    susceptible_population_region_2 = initial_value_susceptible_population_region_2
130
131    immune_population_region_1 = initial_value_immune_population_region_1
132    immune_population_region_2 = initial_value_immune_population_region_2
133
134    deceased_population_region_1 = [0.0]
135    deceased_population_region_2 = [0.0]
136    runTime = [INITIAL_TIME]
137
138    # --End of Initialization--
139
140    Max_infected = 0.0
141
142    for time in range(int(INITIAL_TIME / TIME_STEP), int(FINAL_TIME / TIME_STEP)):
143        runTime.append(runTime[-1] + TIME_STEP)
144        total_population_region_1 = (
145            infected_population_region_1
146            + recovered_population_region_1
147            + susceptible_population_region_1
148            + immune_population_region_1
149        )
150        total_population_region_2 = (
151            infected_population_region_2
152            + recovered_population_region_2
153            + susceptible_population_region_2
154            + immune_population_region_2
155        )
156
157        infected_population_region_1 = max(0, infected_population_region_1)
158        infected_population_region_2 = max(0, infected_population_region_2)
159
160        infected_fraction_region_1 = infected_population_region_1 / total_population_region_1
161        infected_fraction_region_2 = infected_population_region_2 / total_population_region_2
162
163        impact_infected_population_on_contact_rate_region_1 = 1 - (
164            infected_fraction_region_1 ** (1 / root_contact_rate_region_1)
165        )
166        impact_infected_population_on_contact_rate_region_2 = 1 - (
167            infected_fraction_region_2 ** (1 / root_contact_rate_region_2)
168        )
169
170        #        if ((time*TIME_STEP) >= 4) and ((time*TIME_STEP)<=10):
171        #            normal_contact_rate_region_1 = float(x101)*(1 - 0.5)
172        #        else:normal_contact_rate_region_1 = float(x101)
173
174        normal_contact_rate_region_1 = float(x101) * (
175            1 - LookupFunctionX(infected_fraction_region_1, 0, 1, 0.15, 0.75, 15)
176        )
177
178        contact_rate_region_1 = (
179            normal_contact_rate_region_1 * impact_infected_population_on_contact_rate_region_1
180        )
181        contact_rate_region_2 = (
182            normal_contact_rate_region_2 * impact_infected_population_on_contact_rate_region_2
183        )
184
185        recoveries_region_1 = (
186            (1 - (fatality_rate_region_1 * switch_deaths))
187            * infected_population_region_1
188            / recovery_time_region_1
189        )
190        recoveries_region_2 = (
191            (1 - (fatality_rate_region_2 * switch_deaths))
192            * infected_population_region_2
193            / recovery_time_region_2
194        )
195
196        flu_deaths_region_1 = (
197            fatality_rate_region_1
198            * switch_deaths
199            * infected_population_region_1
200            / recovery_time_region_1
201        )
202        flu_deaths_region_2 = (
203            fatality_rate_region_2
204            * switch_deaths
205            * infected_population_region_2
206            / recovery_time_region_2
207        )
208
209        infections_region_1 = (
210            susceptible_population_region_1
211            * contact_rate_region_1
212            * infection_rate_region_1
213            * infected_fraction_region_1
214        ) + (
215            susceptible_population_region_1
216            * interregional_contact_rate
217            * infection_rate_region_1
218            * infected_fraction_region_2
219        )
220        infections_region_2 = (
221            susceptible_population_region_2
222            * contact_rate_region_2
223            * infection_rate_region_2
224            * infected_fraction_region_2
225        ) + (
226            susceptible_population_region_2
227            * interregional_contact_rate
228            * infection_rate_region_2
229            * infected_fraction_region_1
230        )
231
232        infected_population_region_1_NEXT = infected_population_region_1 + (
233            TIME_STEP * (infections_region_1 - flu_deaths_region_1 - recoveries_region_1)
234        )
235        infected_population_region_2_NEXT = infected_population_region_2 + (
236            TIME_STEP * (infections_region_2 - flu_deaths_region_2 - recoveries_region_2)
237        )
238
239        if infected_population_region_1_NEXT < 0 or infected_population_region_2_NEXT < 0:
240            pass  # negative stocks are clipped via max(0, ...) at the start of the next iteration
241
242        recovered_population_region_1_NEXT = recovered_population_region_1 + (
243            TIME_STEP * recoveries_region_1
244        )
245        recovered_population_region_2_NEXT = recovered_population_region_2 + (
246            TIME_STEP * recoveries_region_2
247        )
248
249        if fatality_rate_region_1 >= 0.025:
250            qw = 1.0
251        elif fatality_rate_region_1 >= 0.01:
252            qw = 0.8
253        elif fatality_rate_region_1 >= 0.001:
254            qw = 0.6
255        elif fatality_rate_region_1 >= 0.0001:
256            qw = 0.4
257        else:
258            qw = 0.2
259
260        if (time * TIME_STEP) <= 10:
261            normal_immune_population_fraction_region_1 = (
262                additional_seasonal_immune_population_fraction_R1 / 2
263            ) * sin(4.5 + (time * TIME_STEP / 2)) + (
264                (
265                    (2 * permanent_immune_population_fraction_R1)
266                    + additional_seasonal_immune_population_fraction_R1
267                )
268                / 2
269            )
270        else:
271            normal_immune_population_fraction_region_1 = max(
272                (
273                    float(qw),
274                    (additional_seasonal_immune_population_fraction_R1 / 2)
275                    * sin(4.5 + (time * TIME_STEP / 2))
276                    + (
277                        (
278                            (2 * permanent_immune_population_fraction_R1)
279                            + additional_seasonal_immune_population_fraction_R1
280                        )
281                        / 2
282                    ),
283                )
284            )
285
286        normal_immune_population_fraction_region_2 = switch_immunity_cap * min(
287            (
288                (
289                    sin((time * TIME_STEP / 2) + 1.5)
290                    * additional_seasonal_immune_population_fraction_R2
291                    / 2
292                )
293                + (
294                    (
295                        (2 * permanent_immune_population_fraction_R2)
296                        + additional_seasonal_immune_population_fraction_R2
297                    )
298                    / 2
299                ),
300                (
301                    permanent_immune_population_fraction_R1
302                    + additional_seasonal_immune_population_fraction_R1
303                ),
304            ),
305        ) + (
306            (1 - switch_immunity_cap)
307            * (
308                (
309                    sin((time * TIME_STEP / 2) + 1.5)
310                    * additional_seasonal_immune_population_fraction_R2
311                    / 2
312                )
313                + (
314                    (
315                        (2 * permanent_immune_population_fraction_R2)
316                        + additional_seasonal_immune_population_fraction_R2
317                    )
318                    / 2
319                )
320            )
321        )
322
323        normal_immune_population_region_1 = (
324            normal_immune_population_fraction_region_1 * total_population_region_1
325        )
326        normal_immune_population_region_2 = (
327            normal_immune_population_fraction_region_2 * total_population_region_2
328        )
329
330        if switch_immunity == 1:
331            susminreg1_1 = (
332                normal_immune_population_region_1 - immune_population_region_1
333            ) / susceptible_to_immune_population_delay_time_region_1
334            susminreg1_2 = (
335                susceptible_population_region_1
336                / susceptible_to_immune_population_delay_time_region_1
337            )
338            susmaxreg1 = -(
339                immune_population_region_1 / susceptible_to_immune_population_delay_time_region_1
340            )
341            if (susmaxreg1 >= susminreg1_1) or (susmaxreg1 >= susminreg1_2):
342                susceptible_to_immune_population_flow_region_1 = susmaxreg1
343            elif (susminreg1_1 < susminreg1_2) and (susminreg1_1 > susmaxreg1):
344                susceptible_to_immune_population_flow_region_1 = susminreg1_1
345            elif (susminreg1_2 < susminreg1_1) and (susminreg1_2 > susmaxreg1):
346                susceptible_to_immune_population_flow_region_1 = susminreg1_2
347        else:
348            susceptible_to_immune_population_flow_region_1 = 0
349
350        if switch_immunity == 1:
351            susminreg2_1 = (
352                normal_immune_population_region_2 - immune_population_region_2
353            ) / susceptible_to_immune_population_delay_time_region_2
354            susminreg2_2 = (
355                susceptible_population_region_2
356                / susceptible_to_immune_population_delay_time_region_2
357            )
358            susmaxreg2 = -(
359                immune_population_region_2 / susceptible_to_immune_population_delay_time_region_2
360            )
361            if (susmaxreg2 >= susminreg2_1) or (susmaxreg2 >= susminreg2_2):
362                susceptible_to_immune_population_flow_region_2 = susmaxreg2
363            elif (susminreg2_1 < susminreg2_2) and (susminreg2_1 > susmaxreg2):
364                susceptible_to_immune_population_flow_region_2 = susminreg2_1
365            elif (susminreg2_2 < susminreg2_1) and (susminreg2_2 > susmaxreg2):
366                susceptible_to_immune_population_flow_region_2 = susminreg2_2
367        else:
368            susceptible_to_immune_population_flow_region_2 = 0
369
370        susceptible_population_region_1_NEXT = susceptible_population_region_1 - (
371            TIME_STEP * (infections_region_1 + susceptible_to_immune_population_flow_region_1)
372        )
373        susceptible_population_region_2_NEXT = susceptible_population_region_2 - (
374            TIME_STEP * (infections_region_2 + susceptible_to_immune_population_flow_region_2)
375        )
376
377        immune_population_region_1_NEXT = immune_population_region_1 + (
378            TIME_STEP * susceptible_to_immune_population_flow_region_1
379        )
380        immune_population_region_2_NEXT = immune_population_region_2 + (
381            TIME_STEP * susceptible_to_immune_population_flow_region_2
382        )
383
384        deceased_population_region_1_NEXT = deceased_population_region_1[-1] + (
385            TIME_STEP * flu_deaths_region_1
386        )
387        deceased_population_region_2_NEXT = deceased_population_region_2[-1] + (
388            TIME_STEP * flu_deaths_region_2
389        )
390
391        # Updating integral values
392        if Max_infected < (
393            infected_population_region_1_NEXT
394            / (
395                infected_population_region_1_NEXT
396                + recovered_population_region_1_NEXT
397                + susceptible_population_region_1_NEXT
398                + immune_population_region_1_NEXT
399            )
400        ):
401            Max_infected = infected_population_region_1_NEXT / (
402                infected_population_region_1_NEXT
403                + recovered_population_region_1_NEXT
404                + susceptible_population_region_1_NEXT
405                + immune_population_region_1_NEXT
406            )
407
408        recovered_population_region_1 = recovered_population_region_1_NEXT
409        recovered_population_region_2 = recovered_population_region_2_NEXT
410
411        infected_population_region_1 = infected_population_region_1_NEXT
412        infected_population_region_2 = infected_population_region_2_NEXT
413
414        susceptible_population_region_1 = susceptible_population_region_1_NEXT
415        susceptible_population_region_2 = susceptible_population_region_2_NEXT
416
417        immune_population_region_1 = immune_population_region_1_NEXT
418        immune_population_region_2 = immune_population_region_2_NEXT
419
420        deceased_population_region_1.append(deceased_population_region_1_NEXT)
421        deceased_population_region_2.append(deceased_population_region_2_NEXT)
422
423        # End of main code
424
425    return {"TIME": runTime, "deceased_population_region_1": deceased_population_region_1}
426
427
428if __name__ == "__main__":
429    ema_logging.log_to_stderr(ema_logging.INFO)
430
431    model = Model("mexicanFlu", function=flu_model)
432    model.uncertainties = [
433        RealParameter("x11", 0, 0.5),
434        RealParameter("x12", 0, 0.5),
435        RealParameter("x21", 0.0001, 0.1),
436        RealParameter("x22", 0.0001, 0.1),
437        RealParameter("x31", 0, 0.5),
438        RealParameter("x32", 0, 0.5),
439        RealParameter("x41", 0, 0.9),
440        RealParameter("x51", 0, 0.5),
441        RealParameter("x52", 0, 0.5),
442        RealParameter("x61", 0, 0.8),
443        RealParameter("x62", 0, 0.8),
444        RealParameter("x81", 1, 10),
445        RealParameter("x82", 1, 10),
446        RealParameter("x91", 0, 0.1),
447        RealParameter("x92", 0, 0.1),
448        RealParameter("x101", 0, 200),
449        RealParameter("x102", 0, 200),
450    ]
451
452    model.outcomes = [TimeSeriesOutcome("TIME"), TimeSeriesOutcome("deceased_population_region_1")]
453
454    nr_experiments = 500
455
456    with SequentialEvaluator(model) as evaluator:
457        results = perform_experiments(model, nr_experiments, evaluator=evaluator)
458
459    print("done")
460
461    lines(
462        results,
463        outcomes_to_show="deceased_population_region_1",
464        show_envelope=True,
465        density=Density.KDE,
466        titles=None,
467        experiments_to_show=np.arange(0, nr_experiments, 10),
468    )
469    plt.show()
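The flu model above integrates its stocks with a simple explicit Euler scheme: each stock is updated as `stock + TIME_STEP * net_flow`. A minimal standalone sketch of the same scheme applied to a single exponentially decaying stock, where the result can be compared against the exact solution:

```python
import math

TIME_STEP = 0.0078125  # same step size as the flu model
decay_rate = 1.0

stock = 1.0
t = 0.0
while t < 1.0:
    flow = -decay_rate * stock  # net flow out of the stock
    stock += TIME_STEP * flow  # explicit Euler update
    t += TIME_STEP

# with this small step the result is close to the exact solution exp(-1)
print(stock, math.exp(-1.0))
```

The smaller the step, the closer the Euler trajectory tracks the analytic solution; the flu model's step of 0.0078125 (2**-7) keeps the discretization error small over its 48-month horizon.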

example_lake_model.py

  1"""
  2An example of the lake problem using the ema workbench.
  3
  4The model itself is adapted from the Rhodium example by Dave Hadka,
  5see https://gist.github.com/dhadka/a8d7095c98130d8f73bc
  6
  7"""
  8
  9import math
 10
 11import numpy as np
 12from scipy.optimize import brentq
 13
 14from ema_workbench import (
 15    Model,
 16    RealParameter,
 17    ScalarOutcome,
 18    Constant,
 19    ema_logging,
 20    MultiprocessingEvaluator,
 21)
 22from ema_workbench.em_framework.evaluators import Samplers
 23
 24
 25def lake_problem(
 26    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 27    q=2.0,  # recycling exponent
 28    mean=0.02,  # mean of natural inflows
 29    stdev=0.001,  # standard deviation of natural inflows
 30    delta=0.98,  # future utility discount rate
 31    alpha=0.4,  # utility from pollution
 32    nsamples=100,  # Monte Carlo sampling of natural inflows
 33    **kwargs,
 34):
 35    try:
 36        decisions = [kwargs[str(i)] for i in range(100)]
 37    except KeyError:
 38        decisions = [0] * 100
 39        print("No valid decisions found, using 0 water release every year as default")
 40
 41    nvars = len(decisions)
 42    decisions = np.array(decisions)
 43
 44    # Calculate the critical pollution level (Pcrit)
 45    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 46
 47    # Generate natural inflows using lognormal distribution
 48    natural_inflows = np.random.lognormal(
 49        mean=math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 50        sigma=math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 51        size=(nsamples, nvars),
 52    )
 53
 54    # Initialize the pollution level matrix X
 55    X = np.zeros((nsamples, nvars))
 56
 57    # Loop through time to compute the pollution levels
 58    for t in range(1, nvars):
 59        X[:, t] = (
 60            (1 - b) * X[:, t - 1]
 61            + (X[:, t - 1] ** q / (1 + X[:, t - 1] ** q))
 62            + decisions[t - 1]
 63            + natural_inflows[:, t - 1]
 64        )
 65
 66    # Calculate the average daily pollution for each time step
 67    average_daily_P = np.mean(X, axis=0)
 68
 69    # Calculate the reliability (probability of the pollution level being below Pcrit)
 70    reliability = np.sum(X < Pcrit) / float(nsamples * nvars)
 71
 72    # Calculate the maximum pollution level (max_P)
 73    max_P = np.max(average_daily_P)
 74
 75    # Calculate the utility by discounting the decisions using the discount factor (delta)
 76    utility = np.sum(alpha * decisions * np.power(delta, np.arange(nvars)))
 77
 78    # Calculate the inertia (the fraction of time steps with changes larger than 0.02)
 79    inertia = np.sum(np.abs(np.diff(decisions)) > 0.02) / float(nvars - 1)
 80
 81    return max_P, utility, inertia, reliability
 82
 83
 84if __name__ == "__main__":
 85    ema_logging.log_to_stderr(ema_logging.INFO)
 86
 87    # instantiate the model
 88    lake_model = Model("lakeproblem", function=lake_problem)
 89    lake_model.time_horizon = 100
 90
 91    # specify uncertainties
 92    lake_model.uncertainties = [
 93        RealParameter("b", 0.1, 0.45),
 94        RealParameter("q", 2.0, 4.5),
 95        RealParameter("mean", 0.01, 0.05),
 96        RealParameter("stdev", 0.001, 0.005),
 97        RealParameter("delta", 0.93, 0.99),
 98    ]
 99
100    # set levers, one for each time step
101    lake_model.levers = [RealParameter(str(i), 0, 0.1) for i in range(lake_model.time_horizon)]
102
103    # specify outcomes
104    lake_model.outcomes = [
105        ScalarOutcome("max_P"),
106        ScalarOutcome("utility"),
107        ScalarOutcome("inertia"),
108        ScalarOutcome("reliability"),
109    ]
110
111    # override some of the defaults of the model
112    lake_model.constants = [Constant("alpha", 0.41), Constant("nsamples", 150)]
113
114    # generate some random policies by sampling over levers
115    n_scenarios = 1000
116    n_policies = 4
117
118    with MultiprocessingEvaluator(lake_model) as evaluator:
119        res = evaluator.perform_experiments(n_scenarios, n_policies, lever_sampling=Samplers.MC)
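The lake model draws natural inflows from a lognormal whose log-space parameters are derived so that the samples have (approximately) the requested arithmetic mean and standard deviation. A standalone check of that conversion, using the same formulas as `lake_problem` with a fixed seed:

```python
import math

import numpy as np

mean, stdev = 0.02, 0.001  # desired arithmetic mean and std of the inflows

# same conversion as in lake_problem: parameters of the underlying normal
mu = math.log(mean**2 / math.sqrt(stdev**2 + mean**2))
sigma = math.sqrt(math.log(1.0 + stdev**2 / mean**2))

rng = np.random.default_rng(0)
inflows = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

print(inflows.mean(), inflows.std())  # both close to 0.02 and 0.001
```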

example_mesa.py

  1"""
  2
  3This example is a proof of principle for how MESA models can be
  4controlled using the ema_workbench.
  5
  6"""
  7
  8# Import EMA Workbench modules
  9from ema_workbench import (
 10    ReplicatorModel,
 11    RealParameter,
 12    BooleanParameter,
 13    IntegerParameter,
 14    Constant,
 15    ArrayOutcome,
 16    perform_experiments,
 17    save_results,
 18    ema_logging,
 19)
 20
 21# Necessary packages for the model
 22import math
 23from enum import Enum
 24import mesa
 25import networkx as nx
 26
 27# MESA demo model "Virus on a Network", from https://github.com/projectmesa/mesa-examples/blob/d16736778fdb500a3e5e05e082b27db78673b562/examples/virus_on_network/virus_on_network/model.py
 28
 29
 30class State(Enum):
 31    SUSCEPTIBLE = 0
 32    INFECTED = 1
 33    RESISTANT = 2
 34
 35
 36def number_state(model, state):
 37    return sum(1 for a in model.grid.get_all_cell_contents() if a.state is state)
 38
 39
 40def number_infected(model):
 41    return number_state(model, State.INFECTED)
 42
 43
 44def number_susceptible(model):
 45    return number_state(model, State.SUSCEPTIBLE)
 46
 47
 48def number_resistant(model):
 49    return number_state(model, State.RESISTANT)
 50
 51
 52class VirusOnNetwork(mesa.Model):
 53    """A virus model with some number of agents"""
 54
 55    def __init__(
 56        self,
 57        num_nodes=10,
 58        avg_node_degree=3,
 59        initial_outbreak_size=1,
 60        virus_spread_chance=0.4,
 61        virus_check_frequency=0.4,
 62        recovery_chance=0.3,
 63        gain_resistance_chance=0.5,
 64    ):
 65        self.num_nodes = num_nodes
 66        prob = avg_node_degree / self.num_nodes
 67        self.G = nx.erdos_renyi_graph(n=self.num_nodes, p=prob)
 68        self.grid = mesa.space.NetworkGrid(self.G)
 69        self.schedule = mesa.time.RandomActivation(self)
 70        self.initial_outbreak_size = (
 71            initial_outbreak_size if initial_outbreak_size <= num_nodes else num_nodes
 72        )
 73        self.virus_spread_chance = virus_spread_chance
 74        self.virus_check_frequency = virus_check_frequency
 75        self.recovery_chance = recovery_chance
 76        self.gain_resistance_chance = gain_resistance_chance
 77
 78        self.datacollector = mesa.DataCollector(
 79            {
 80                "Infected": number_infected,
 81                "Susceptible": number_susceptible,
 82                "Resistant": number_resistant,
 83            }
 84        )
 85
 86        # Create agents
 87        for i, node in enumerate(self.G.nodes()):
 88            a = VirusAgent(
 89                i,
 90                self,
 91                State.SUSCEPTIBLE,
 92                self.virus_spread_chance,
 93                self.virus_check_frequency,
 94                self.recovery_chance,
 95                self.gain_resistance_chance,
 96            )
 97            self.schedule.add(a)
 98            # Add the agent to the node
 99            self.grid.place_agent(a, node)
100
101        # Infect some nodes
102        infected_nodes = self.random.sample(list(self.G), self.initial_outbreak_size)
103        for a in self.grid.get_cell_list_contents(infected_nodes):
104            a.state = State.INFECTED
105
106        self.running = True
107        self.datacollector.collect(self)
108
109    def resistant_susceptible_ratio(self):
110        try:
111            return number_state(self, State.RESISTANT) / number_state(self, State.SUSCEPTIBLE)
112        except ZeroDivisionError:
113            return math.inf
114
115    def step(self):
116        self.schedule.step()
117        # collect data
118        self.datacollector.collect(self)
119
120    def run_model(self, n):
121        for i in range(n):
122            self.step()
123
124
125class VirusAgent(mesa.Agent):
126    def __init__(
127        self,
128        unique_id,
129        model,
130        initial_state,
131        virus_spread_chance,
132        virus_check_frequency,
133        recovery_chance,
134        gain_resistance_chance,
135    ):
136        super().__init__(unique_id, model)
137
138        self.state = initial_state
139
140        self.virus_spread_chance = virus_spread_chance
141        self.virus_check_frequency = virus_check_frequency
142        self.recovery_chance = recovery_chance
143        self.gain_resistance_chance = gain_resistance_chance
144
145    def try_to_infect_neighbors(self):
146        neighbors_nodes = self.model.grid.get_neighborhood(self.pos, include_center=False)
147        susceptible_neighbors = [
148            agent
149            for agent in self.model.grid.get_cell_list_contents(neighbors_nodes)
150            if agent.state is State.SUSCEPTIBLE
151        ]
152        for a in susceptible_neighbors:
153            if self.random.random() < self.virus_spread_chance:
154                a.state = State.INFECTED
155
156    def try_gain_resistance(self):
157        if self.random.random() < self.gain_resistance_chance:
158            self.state = State.RESISTANT
159
160    def try_remove_infection(self):
161        # Try to remove
162        if self.random.random() < self.recovery_chance:
163            # Success
164            self.state = State.SUSCEPTIBLE
165            self.try_gain_resistance()
166        else:
167            # Failed
168            self.state = State.INFECTED
169
170    def try_check_situation(self):
171        if (self.random.random() < self.virus_check_frequency) and (self.state is State.INFECTED):
172            self.try_remove_infection()
173
174    def step(self):
175        if self.state is State.INFECTED:
176            self.try_to_infect_neighbors()
177        self.try_check_situation()
178
179
180# Setting up the model as a function
181def model_virus_on_network(
182    num_nodes=1,
183    avg_node_degree=1,
184    initial_outbreak_size=1,
185    virus_spread_chance=1,
186    virus_check_frequency=1,
187    recovery_chance=1,
188    gain_resistance_chance=1,
189    steps=10,
190):
191    # Initialising the model
192    virus_on_network = VirusOnNetwork(
193        num_nodes=num_nodes,
194        avg_node_degree=avg_node_degree,
195        initial_outbreak_size=initial_outbreak_size,
196        virus_spread_chance=virus_spread_chance,
197        virus_check_frequency=virus_check_frequency,
198        recovery_chance=recovery_chance,
199        gain_resistance_chance=gain_resistance_chance,
200    )
201
202    # Run the model steps times
203    virus_on_network.run_model(steps)
204
205    # Get model outcomes
206    outcomes = virus_on_network.datacollector.get_model_vars_dataframe()
207
208    # Return model outcomes
209    return {
210        "Infected": outcomes["Infected"].tolist(),
211        "Susceptible": outcomes["Susceptible"].tolist(),
212        "Resistant": outcomes["Resistant"].tolist(),
213    }
214
215
216if __name__ == "__main__":
217    # Initialize logger to keep track of experiments run
218    ema_logging.log_to_stderr(ema_logging.INFO)
219
220    # Instantiate and pass the model
221    model = ReplicatorModel("VirusOnNetwork", function=model_virus_on_network)
222
223    # Define model parameters and their ranges to be sampled
224    model.uncertainties = [
225        IntegerParameter("num_nodes", 10, 100),
226        IntegerParameter("avg_node_degree", 2, 8),
227        RealParameter("virus_spread_chance", 0.1, 1),
228        RealParameter("virus_check_frequency", 0.1, 1),
229        RealParameter("recovery_chance", 0.1, 1),
230        RealParameter("gain_resistance_chance", 0.1, 1),
231    ]
232
233    # Define model parameters that will remain constant
234    model.constants = [Constant("initial_outbreak_size", 1), Constant("steps", 30)]
235
236    # Define model outcomes
237    model.outcomes = [
238        ArrayOutcome("Infected"),
239        ArrayOutcome("Susceptible"),
240        ArrayOutcome("Resistant"),
241    ]
242
243    # Define the number of replications
244    model.replications = 5
245
246    # Run experiments with the aforementioned parameters and outputs
247    results = perform_experiments(models=model, scenarios=20)
248
249    # Get the results
250    experiments, outcomes = results
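
With ``ReplicatorModel``, each ``ArrayOutcome`` typically carries an extra replication axis, so the stored arrays have one trajectory per scenario per replication. A minimal sketch of post-processing such results — using plain numpy and synthetic data in place of the ``outcomes`` dict returned by ``perform_experiments`` (the shape ``(20, 5, 30)`` below is an assumption standing in for 20 scenarios, 5 replications, and 30 collected steps):

```python
import numpy as np

# Synthetic stand-in for the outcomes dict returned by perform_experiments
rng = np.random.default_rng(42)
outcomes = {
    name: rng.random((20, 5, 30))
    for name in ("Infected", "Susceptible", "Resistant")
}

# Average over the replication axis to get one mean trajectory per scenario
mean_infected = outcomes["Infected"].mean(axis=1)
print(mean_infected.shape)  # (20, 30)
```

Averaging over replications like this is a common first step before plotting or scenario discovery, since it separates parametric uncertainty from stochastic run-to-run noise.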

example_mpi_lake_model.py

  1"""
  2An example of the lake problem using the ema workbench.
  3
  4The model itself is adapted from the Rhodium example by Dave Hadka,
  5see https://gist.github.com/dhadka/a8d7095c98130d8f73bc
  6
  7"""
  8
  9import math
 10import time
 11
 12import numpy as np
 13from scipy.optimize import brentq
 14
 15from ema_workbench import (
 16    Model,
 17    RealParameter,
 18    ScalarOutcome,
 19    Constant,
 20    ema_logging,
 21    MPIEvaluator,
 22    save_results,
 23)
 24
 25
 26def lake_problem(
 27    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 28    q=2.0,  # recycling exponent
 29    mean=0.02,  # mean of natural inflows
 30    stdev=0.001,  # standard deviation of natural inflows
 31    delta=0.98,  # future utility discount rate
 32    alpha=0.4,  # utility from pollution
 33    nsamples=100,  # Monte Carlo sampling of natural inflows
 34    **kwargs,
 35):
 36    try:
 37        decisions = [kwargs[str(i)] for i in range(100)]
 38    except KeyError:
 39        decisions = [0] * 100
 40        print("No valid decisions found, using 0 water release every year as default")
 41
 42    nvars = len(decisions)
 43    decisions = np.array(decisions)
 44
 45    # Calculate the critical pollution level (Pcrit)
 46    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 47
 48    # Generate natural inflows using lognormal distribution
 49    natural_inflows = np.random.lognormal(
 50        mean=math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 51        sigma=math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 52        size=(nsamples, nvars),
 53    )
 54
 55    # Initialize the pollution level matrix X
 56    X = np.zeros((nsamples, nvars))
 57
 58    # Loop through time to compute the pollution levels
 59    for t in range(1, nvars):
 60        X[:, t] = (
 61            (1 - b) * X[:, t - 1]
 62            + (X[:, t - 1] ** q / (1 + X[:, t - 1] ** q))
 63            + decisions[t - 1]
 64            + natural_inflows[:, t - 1]
 65        )
 66
 67    # Average the pollution level across Monte Carlo samples for each time step
 68    average_daily_P = np.mean(X, axis=0)
 69
 70    # Calculate the reliability (probability of the pollution level being below Pcrit)
 71    reliability = np.sum(X < Pcrit) / float(nsamples * nvars)
 72
 73    # Calculate the maximum pollution level (max_P)
 74    max_P = np.max(average_daily_P)
 75
 76    # Calculate the utility by discounting the decisions using the discount factor (delta)
 77    utility = np.sum(alpha * decisions * np.power(delta, np.arange(nvars)))
 78
 79    # Calculate the inertia (the fraction of time steps with changes larger than 0.02)
 80    inertia = np.sum(np.abs(np.diff(decisions)) > 0.02) / float(nvars - 1)
 81
 82    return max_P, utility, inertia, reliability
 83
 84
 85if __name__ == "__main__":
 86    import ema_workbench
 87
 88    # run with mpiexec -n 1 -usize {ntasks} python example_mpi_lake_model.py
 89    starttime = time.perf_counter()
 90
 91    ema_logging.log_to_stderr(ema_logging.INFO, pass_root_logger_level=True)
 92    ema_logging.get_rootlogger().info(f"{ema_workbench.__version__}")
 93
 94    # instantiate the model
 95    lake_model = Model("lakeproblem", function=lake_problem)
 96    lake_model.time_horizon = 100
 97
 98    # specify uncertainties
 99    lake_model.uncertainties = [
100        RealParameter("b", 0.1, 0.45),
101        RealParameter("q", 2.0, 4.5),
102        RealParameter("mean", 0.01, 0.05),
103        RealParameter("stdev", 0.001, 0.005),
104        RealParameter("delta", 0.93, 0.99),
105    ]
106
107    # set levers, one for each time step
108    lake_model.levers = [RealParameter(str(i), 0, 0.1) for i in range(lake_model.time_horizon)]
109
110    # specify outcomes
111    lake_model.outcomes = [
112        ScalarOutcome("max_P"),
113        ScalarOutcome("utility"),
114        ScalarOutcome("inertia"),
115        ScalarOutcome("reliability"),
116    ]
117
118    # override some of the defaults of the model
119    lake_model.constants = [Constant("alpha", 0.41), Constant("nsamples", 150)]
120
121    # generate some random policies by sampling over levers
122    n_scenarios = 10000
123    n_policies = 4
124
125    with MPIEvaluator(lake_model) as evaluator:
126        res = evaluator.perform_experiments(n_scenarios, n_policies, chunksize=250)
127
128    save_results(res, "test.tar.gz")
129
130    print(time.perf_counter() - starttime)
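
The ``mean`` and ``sigma`` arguments passed to ``np.random.lognormal`` in ``lake_problem`` are a moment-matching transform: given the desired mean and standard deviation of the inflows themselves, they recover the parameters of the underlying normal distribution. A quick stdlib-only check that the transform is exact:

```python
import math

mean, stdev = 0.02, 0.001  # desired moments of the natural inflows

# Underlying normal parameters, exactly as computed in lake_problem
mu = math.log(mean**2 / math.sqrt(stdev**2 + mean**2))
sigma = math.sqrt(math.log(1.0 + stdev**2 / mean**2))

# A lognormal with parameters (mu, sigma) has mean exp(mu + sigma^2 / 2)
# and variance (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)
recovered_mean = math.exp(mu + sigma**2 / 2)
recovered_stdev = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))

print(recovered_mean, recovered_stdev)  # 0.02 and 0.001, up to floating point
```

Substituting the definitions shows the recovered mean reduces algebraically to ``mean`` and the recovered standard deviation to ``stdev``, which is why the sampled inflows have exactly the moments specified by the uncertainties.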

example_netlogo.py

 1"""
 2
 3This example is a proof of principle for how NetLogo models can be
 4controlled using pyNetLogo and the ema_workbench. Note that this
 5example uses the NetLogo 6 version of the predator prey model that
 6comes with NetLogo. If you are using NetLogo 5, replace the model file
 7with the one that comes with NetLogo.
 8
 9"""
10
11import numpy as np
12
13from ema_workbench import (
14    RealParameter,
15    ema_logging,
16    ScalarOutcome,
17    TimeSeriesOutcome,
18    MultiprocessingEvaluator,
19)
20from ema_workbench.connectors.netlogo import NetLogoModel
21
 22# Created on 20 March 2013
23#
24# .. codeauthor::  jhkwakkel
25
26
27if __name__ == "__main__":
28    # turn on logging
29    ema_logging.log_to_stderr(ema_logging.INFO)
30
31    model = NetLogoModel(
32        "predprey", wd="./models/predatorPreyNetlogo", model_file="Wolf Sheep Predation.nlogo"
33    )
34    model.run_length = 100
35    model.replications = 10
36
37    model.uncertainties = [
38        RealParameter("grass-regrowth-time", 1, 99),
39        RealParameter("initial-number-sheep", 50, 100),
40        RealParameter("initial-number-wolves", 50, 100),
41        RealParameter("sheep-reproduce", 5, 10),
42        RealParameter("wolf-reproduce", 5, 10),
43    ]
44
45    model.outcomes = [
46        ScalarOutcome("sheep", variable_name="count sheep", function=np.mean),
47        TimeSeriesOutcome("wolves"),
48        TimeSeriesOutcome("grass"),
49    ]
50
51    # perform experiments
52    n = 10
53
54    with MultiprocessingEvaluator(model, n_processes=-1, maxtasksperchild=4) as evaluator:
55        results = evaluator.perform_experiments(n)
56
57    print()
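
The ``sheep`` outcome above combines a time-series variable with ``function=np.mean``: the function is applied to the series collected over the run, so a single scalar is stored per experiment. A minimal sketch of that reduction, using a hand-made array in place of an actual NetLogo run:

```python
import numpy as np

# Hypothetical stand-in for "count sheep" collected over a run
sheep_counts = np.array([100.0, 104.0, 98.0, 102.0])

# A ScalarOutcome with function=np.mean stores the reduced value,
# not the full series
sheep_outcome = np.mean(sheep_counts)
print(sheep_outcome)  # 101.0
```

Any callable that maps an array to a scalar (``np.max``, a percentile, a custom function) can be used the same way, which keeps the stored results compact when only a summary statistic is of interest.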

example_pysd_teacup.py

  1"""
  2A minimal example of running experiments on a PySD model
  3(the teacup cooling model).
  4"""
 5
 6# Created on Jul 23, 2016
 7#
 8# .. codeauthor::jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9
10from ema_workbench import RealParameter, TimeSeriesOutcome, ema_logging, perform_experiments
11
12from ema_workbench.connectors.pysd_connector import PysdModel
13
14if __name__ == "__main__":
15    ema_logging.log_to_stderr(ema_logging.INFO)
16
17    mdl_file = "./models/pysd/Teacup.mdl"
18
19    model = PysdModel("teacup", mdl_file=mdl_file)
20
21    model.uncertainties = [
22        RealParameter("room_temperature", 33, 120, variable_name="Room Temperature")
23    ]
24    model.outcomes = [TimeSeriesOutcome("teacup_temperature", variable_name="Teacup Temperature")]
25
26    perform_experiments(model, 100)

example_python.py

 1"""
 2Created on 20 dec. 2010
 3
  4This file illustrates the use of the EMA classes for a contrived example.
  5Its main purpose has been to test the parallel processing functionality.
 6
 7.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 8"""
 9
10from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging, perform_experiments
11
12
13def some_model(x1=None, x2=None, x3=None):
14    return {"y": x1 * x2 + x3}
15
16
17if __name__ == "__main__":
18    ema_logging.LOG_FORMAT = "[%(name)s/%(levelname)s/%(processName)s] %(message)s"
19    ema_logging.log_to_stderr(ema_logging.INFO)
20
21    model = Model("simpleModel", function=some_model)  # instantiate the model
22
23    # specify uncertainties
24    model.uncertainties = [
25        RealParameter("x1", 0.1, 10),
26        RealParameter("x2", -0.01, 0.01),
27        RealParameter("x3", -0.01, 0.01),
28    ]
29    # specify outcomes
30    model.outcomes = [ScalarOutcome("y")]
31
32    results = perform_experiments(model, 100)
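
For each experiment, the workbench calls the model function with every sampled uncertainty passed as a keyword argument, and matches the keys of the returned dict against the declared outcomes. A minimal sketch of one such call, with hand-picked values in place of sampled ones (the values are illustrative only, not drawn from the ranges above):

```python
def some_model(x1=None, x2=None, x3=None):
    return {"y": x1 * x2 + x3}

# One hypothetical experiment: a dict of parameter name -> sampled value,
# applied to the function as keyword arguments
experiment = {"x1": 2.0, "x2": 3.0, "x3": 4.0}
result = some_model(**experiment)
print(result["y"])  # 10.0
```

This keyword-based calling convention is why the parameter names in ``model.uncertainties`` must match the argument names of the Python function.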

example_simio.py

 1"""
 2Created on 27 Jun 2019
 3
 4@author: jhkwakkel
 5"""
 6
 7from ema_workbench import ema_logging, CategoricalParameter, MultiprocessingEvaluator, ScalarOutcome
 8
 9from ema_workbench.connectors.simio_connector import SimioModel
10
11if __name__ == "__main__":
12    ema_logging.log_to_stderr(ema_logging.INFO)
13
14    model = SimioModel(
15        "simioDemo", wd="./model_bahareh", model_file="SupplyChainV3.spfx", main_model="Model"
16    )
17
18    model.uncertainties = [
19        CategoricalParameter("DemandDistributionParameter", (20, 30, 40, 50, 60)),
20        CategoricalParameter("DemandInterarrivalTime", (0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2)),
21    ]
22
23    model.levers = [
24        CategoricalParameter("InitialInventory", (500, 600, 700, 800, 900)),
25        CategoricalParameter("ReorderPoint", (100, 200, 300, 400, 500)),
26        CategoricalParameter("OrderUpToQuantity", (500, 600, 700, 800, 900)),
27        CategoricalParameter("ReviewPeriod", (3, 4, 5, 6, 7)),
28    ]
29
30    model.outcomes = [ScalarOutcome("AverageInventory"), ScalarOutcome("AverageServiceLevel")]
31
32    n_scenarios = 10
33    n_policies = 2
34
35    with MultiprocessingEvaluator(model) as evaluator:
36        results = evaluator.perform_experiments(n_scenarios, n_policies)
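
The four categorical levers above span a finite design space. The workbench samples over this space rather than enumerating it, but a stdlib sketch with ``itertools.product`` (using the same category tuples as hypothetical inputs) shows how large a full-factorial design over these levers would be:

```python
from itertools import product

# Same category tuples as the levers above
levers = {
    "InitialInventory": (500, 600, 700, 800, 900),
    "ReorderPoint": (100, 200, 300, 400, 500),
    "OrderUpToQuantity": (500, 600, 700, 800, 900),
    "ReviewPeriod": (3, 4, 5, 6, 7),
}

# Every combination of one value per lever
designs = [dict(zip(levers, values)) for values in product(*levers.values())]
print(len(designs))  # 625 (= 5 * 5 * 5 * 5)
```

With 625 possible policies, sampling only ``n_policies = 2`` as in the example covers a tiny fraction of the space; increasing the number of sampled policies trades runtime for coverage.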

example_vensim.py

 1"""
 2Created on 3 Jan. 2011
 3
 4This file illustrates the use of the EMA classes for a contrived Vensim
 5example.
 6
 7
 8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9                chamarat <c.hamarat (at) tudelft (dot) nl>
10"""
11
12from ema_workbench import TimeSeriesOutcome, perform_experiments, RealParameter, ema_logging
13
14from ema_workbench.connectors.vensim import VensimModel
15
16if __name__ == "__main__":
17    # turn on logging
18    ema_logging.log_to_stderr(ema_logging.INFO)
19
20    # instantiate a model
21    wd = "./models/vensim example"
22    vensimModel = VensimModel("simpleModel", wd=wd, model_file="model.vpm")
23    vensimModel.uncertainties = [RealParameter("x11", 0, 2.5), RealParameter("x12", -2.5, 2.5)]
24
25    vensimModel.outcomes = [TimeSeriesOutcome("a")]
26
27    results = perform_experiments(vensimModel, 1000)

example_vensim_advanced_flu.py

 1"""
 2Created on 20 May, 2011
 3
 4This module shows how you can use vensim models directly
 5instead of coding the model in Python. The underlying case
 6is the same as used in fluExample
 7
 8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 9                epruyt <e.pruyt (at) tudelft (dot) nl>
10"""
11
12import numpy as np
13
14from ema_workbench import (
15    TimeSeriesOutcome,
16    ScalarOutcome,
17    ema_logging,
18    Policy,
19    MultiprocessingEvaluator,
20    save_results,
21)
22from ema_workbench.connectors.vensim import VensimModel
23from ema_workbench.em_framework.parameters import parameters_from_csv
24
25
26def time_of_max(infected_fraction, time):
27    index = np.where(infected_fraction == np.max(infected_fraction))
28    timing = time[index][0]
29    return timing
30
31
32if __name__ == "__main__":
33    ema_logging.log_to_stderr(ema_logging.INFO)
34
35    model = VensimModel("fluCase", wd="./models/flu", model_file="FLUvensimV1basecase.vpm")
36
37    # outcomes
38    model.outcomes = [
39        TimeSeriesOutcome(
40            "deceased_population_region_1", variable_name="deceased population region 1"
41        ),
42        TimeSeriesOutcome("infected_fraction_R1", variable_name="infected fraction R1"),
43        ScalarOutcome(
44            "max_infection_fraction", variable_name="infected fraction R1", function=np.max
45        ),
46        ScalarOutcome(
47            "time_of_max", variable_name=["infected fraction R1", "TIME"], function=time_of_max
48        ),
49    ]
50
51    # create uncertainties based on csv
52    # FIXME csv is missing
53    model.uncertainties = parameters_from_csv("./models/flu/flu_uncertainties.csv")
54
55    # add policies
56    policies = [
57        Policy("no policy", model_file="FLUvensimV1basecase.vpm"),
58        Policy("static policy", model_file="FLUvensimV1static.vpm"),
59        Policy("adaptive policy", model_file="FLUvensimV1dynamic.vpm"),
60    ]
61
62    with MultiprocessingEvaluator(model, n_processes=-1) as evaluator:
63        results = evaluator.perform_experiments(1000, policies=policies)
64
65    save_results(results, "./data/1000 flu cases with policies.tar.gz")
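
The ``time_of_max`` helper above shows how a ``ScalarOutcome`` can combine two model variables: it is passed both the infected-fraction series and the matching ``TIME`` series, and returns the time at which the fraction peaks. A quick check of the same function with hand-made arrays:

```python
import numpy as np

def time_of_max(infected_fraction, time):
    # Index of the (first) maximum of the infected fraction
    index = np.where(infected_fraction == np.max(infected_fraction))
    timing = time[index][0]
    return timing

infected_fraction = np.array([0.0, 0.2, 0.5, 0.3])
time = np.array([0.0, 1.0, 2.0, 3.0])
print(time_of_max(infected_fraction, time))  # 2.0
```

Passing a list of variable names to ``variable_name``, as done for the ``time_of_max`` outcome, is what makes both series available to the function.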

example_vensim_energy.py

  1"""
  2Created on 27 Jan 2014
  3
  4@author: jhkwakkel
  5"""
  6
  7from ema_workbench import (
  8    RealParameter,
  9    TimeSeriesOutcome,
 10    ema_logging,
 11    MultiprocessingEvaluator,
 12    ScalarOutcome,
 13    perform_experiments,
 14    CategoricalParameter,
 15    save_results,
 16    Policy,
 17)
 18from ema_workbench.connectors.vensim import VensimModel
 19from ema_workbench.em_framework.evaluators import SequentialEvaluator
 20
 21
 22def get_energy_model():
 23    model = VensimModel(
 24        "energyTransition",
 25        wd="./models",
 26        model_file="RB_V25_ets_1_policy_modified_adaptive_extended_outcomes.vpm",
 27    )
 28
 29    model.outcomes = [
 30        TimeSeriesOutcome(
 31            "cumulative_carbon_emissions", variable_name="cumulative carbon emissions"
 32        ),
 33        TimeSeriesOutcome(
 34            "carbon_emissions_reduction_fraction",
 35            variable_name="carbon emissions reduction fraction",
 36        ),
 37        TimeSeriesOutcome("fraction_renewables", variable_name="fraction renewables"),
 38        TimeSeriesOutcome("average_total_costs", variable_name="average total costs"),
 39        TimeSeriesOutcome("total_costs_of_electricity", variable_name="total costs of electricity"),
 40    ]
 41
 42    model.uncertainties = [
 43        RealParameter(
 44            "demand_fuel_price_elasticity_factor",
 45            0,
 46            0.5,
 47            variable_name="demand fuel price elasticity factor",
 48        ),
 49        RealParameter(
 50            "economic_lifetime_biomass", 30, 50, variable_name="economic lifetime biomass"
 51        ),
 52        RealParameter("economic_lifetime_coal", 30, 50, variable_name="economic lifetime coal"),
 53        RealParameter("economic_lifetime_gas", 25, 40, variable_name="economic lifetime gas"),
 54        RealParameter("economic_lifetime_igcc", 30, 50, variable_name="economic lifetime igcc"),
 55        RealParameter("economic_lifetime_ngcc", 25, 40, variable_name="economic lifetime ngcc"),
 56        RealParameter(
 57            "economic_lifetime_nuclear", 50, 70, variable_name="economic lifetime nuclear"
 58        ),
 59        RealParameter("economic_lifetime_pv", 20, 30, variable_name="economic lifetime pv"),
 60        RealParameter("economic_lifetime_wind", 20, 30, variable_name="economic lifetime wind"),
 61        RealParameter("economic_lifetime_hydro", 50, 70, variable_name="economic lifetime hydro"),
 62        RealParameter(
 63            "uncertainty_initial_gross_fuel_costs",
 64            0.5,
 65            1.5,
 66            variable_name="uncertainty initial gross fuel costs",
 67        ),
 68        RealParameter(
 69            "investment_proportionality_constant",
 70            0.5,
 71            4,
 72            variable_name="investment proportionality constant",
 73        ),
 74        RealParameter(
 75            "investors_desired_excess_capacity_investment",
 76            0.2,
 77            2,
 78            variable_name="investors desired excess capacity investment",
 79        ),
 80        RealParameter(
 81            "price_demand_elasticity_factor",
 82            -0.07,
 83            -0.001,
 84            variable_name="price demand elasticity factor",
 85        ),
 86        RealParameter(
 87            "price_volatility_global_resource_markets",
 88            0.1,
 89            0.2,
 90            variable_name="price volatility global resource markets",
 91        ),
 92        RealParameter("progress_ratio_biomass", 0.85, 1, variable_name="progress ratio biomass"),
 93        RealParameter("progress_ratio_coal", 0.9, 1.05, variable_name="progress ratio coal"),
 94        RealParameter("progress_ratio_gas", 0.85, 1, variable_name="progress ratio gas"),
 95        RealParameter("progress_ratio_igcc", 0.9, 1.05, variable_name="progress ratio igcc"),
 96        RealParameter("progress_ratio_ngcc", 0.85, 1, variable_name="progress ratio ngcc"),
 97        RealParameter("progress_ratio_nuclear", 0.9, 1.05, variable_name="progress ratio nuclear"),
 98        RealParameter("progress_ratio_pv", 0.75, 0.9, variable_name="progress ratio pv"),
 99        RealParameter("progress_ratio_wind", 0.85, 1, variable_name="progress ratio wind"),
100        RealParameter("progress_ratio_hydro", 0.9, 1.05, variable_name="progress ratio hydro"),
101        RealParameter(
102            "starting_construction_time", 0.1, 3, variable_name="starting construction time"
103        ),
104        RealParameter(
105            "time_of_nuclear_power_plant_ban",
106            2013,
107            2100,
108            variable_name="time of nuclear power plant ban",
109        ),
110        RealParameter(
111            "weight_factor_carbon_abatement", 1, 10, variable_name="weight factor carbon abatement"
112        ),
113        RealParameter(
114            "weight_factor_marginal_investment_costs",
115            1,
116            10,
117            variable_name="weight factor marginal investment costs",
118        ),
119        RealParameter(
120            "weight_factor_technological_familiarity",
121            1,
122            10,
123            variable_name="weight factor technological familiarity",
124        ),
125        RealParameter(
126            "weight_factor_technological_growth_potential",
127            1,
128            10,
129            variable_name="weight factor technological growth potential",
130        ),
131        RealParameter(
132            "maximum_battery_storage_uncertainty_constant",
133            0.2,
134            3,
135            variable_name="maximum battery storage uncertainty constant",
136        ),
137        RealParameter(
138            "maximum_no_storage_penetration_rate_wind",
139            0.2,
140            0.6,
141            variable_name="maximum no storage penetration rate wind",
142        ),
143        RealParameter(
144            "maximum_no_storage_penetration_rate_pv",
145            0.1,
146            0.4,
147            variable_name="maximum no storage penetration rate pv",
148        ),
149        CategoricalParameter(
150            "SWITCH_lookup_curve_TGC", (1, 2, 3, 4), variable_name="SWITCH lookup curve TGC"
151        ),
152        CategoricalParameter(
153            "SWTICH_preference_carbon_curve", (1, 2), variable_name="SWTICH preference carbon curve"
154        ),
155        CategoricalParameter(
156            "SWITCH_economic_growth", (1, 2, 3, 4, 5, 6), variable_name="SWITCH economic growth"
157        ),
158        CategoricalParameter(
159            "SWITCH_electrification_rate",
160            (1, 2, 3, 4, 5, 6),
161            variable_name="SWITCH electrification rate",
162        ),
163        CategoricalParameter(
164            "SWITCH_Market_price_determination",
165            (1, 2),
166            variable_name="SWITCH Market price determination",
167        ),
168        CategoricalParameter(
169            "SWITCH_physical_limits", (1, 2), variable_name="SWITCH physical limits"
170        ),
171        CategoricalParameter(
172            "SWITCH_low_reserve_margin_price_markup",
173            (1, 2, 3, 4),
174            variable_name="SWITCH low reserve margin price markup",
175        ),
176        CategoricalParameter(
177            "SWITCH_interconnection_capacity_expansion",
178            (1, 2, 3, 4),
179            variable_name="SWITCH interconnection capacity expansion",
180        ),
181        CategoricalParameter(
182            "SWITCH_storage_for_intermittent_supply",
183            (1, 2, 3, 4, 5, 6, 7),
184            variable_name="SWITCH storage for intermittent supply",
185        ),
186        CategoricalParameter("SWITCH_carbon_cap", (1, 2, 3), variable_name="SWITCH carbon cap"),
187        CategoricalParameter(
188            "SWITCH_TGC_obligation_curve", (1, 2, 3), variable_name="SWITCH TGC obligation curve"
189        ),
190        CategoricalParameter(
191            "SWITCH_carbon_price_determination",
192            (1, 2, 3),
193            variable_name="SWITCH carbon price determination",
194        ),
195    ]
196    return model
197
198
199if __name__ == "__main__":
200    ema_logging.log_to_stderr(ema_logging.INFO)
201    model = get_energy_model()
202
203    policies = [
204        Policy("no policy", model_file="RB_V25_ets_1_extended_outcomes.vpm"),
205        Policy("static policy", model_file="ETSPolicy.vpm"),
206        Policy(
207            "adaptive policy",
208            model_file="RB_V25_ets_1_policy_modified_adaptive_extended_outcomes.vpm",
209        ),
210    ]
211
212    n = 100000
213    with MultiprocessingEvaluator(model) as evaluator:
214        experiments, outcomes = evaluator.perform_experiments(n, policies=policies)
215    #
216    # outcomes.pop("TIME")
217    # results = experiments, outcomes
218    # save_results(results, f"./data/{n}_lhs.tar.gz")

example_vensim_flu.py

  1"""
  2Created on 20 May, 2011
  3
  4This module shows how you can use vensim models directly
  5instead of coding the model in Python. The underlying case
  6is the same as used in fluExample
  7
  8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
  9                epruyt <e.pruyt (at) tudelft (dot) nl>
 10"""
 11
 12import numpy as np
 13
 14from ema_workbench import (
 15    RealParameter,
 16    TimeSeriesOutcome,
 17    ema_logging,
 18    ScalarOutcome,
 19    perform_experiments,
 20    Policy,
 21    save_results,
 22)
 23from ema_workbench.connectors.vensim import VensimModel
 24
 25if __name__ == "__main__":
 26    ema_logging.log_to_stderr(ema_logging.INFO)
 27
 28    model = VensimModel("fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm")
 29
 30    # outcomes
 31    model.outcomes = [
 32        TimeSeriesOutcome(
 33            "deceased_population_region_1", variable_name="deceased population region 1"
 34        ),
 35        TimeSeriesOutcome("infected_fraction_R1", variable_name="infected fraction R1"),
 36        ScalarOutcome(
 37            "max_infection_fraction", variable_name="infected fraction R1", function=np.max
 38        ),
 39    ]
 40
 41    # Plain Parametric Uncertainties
 42    model.uncertainties = [
 43        RealParameter(
 44            "additional_seasonal_immune_population_fraction_R1",
 45            0,
 46            0.5,
 47            variable_name="additional seasonal immune population fraction R1",
 48        ),
 49        RealParameter(
 50            "additional_seasonal_immune_population_fraction_R2",
 51            0,
 52            0.5,
 53            variable_name="additional seasonal immune population fraction R2",
 54        ),
 55        RealParameter(
 56            "fatality_ratio_region_1", 0.0001, 0.1, variable_name="fatality ratio region 1"
 57        ),
 58        RealParameter(
 59            "fatality_rate_region_2", 0.0001, 0.1, variable_name="fatality rate region 2"
 60        ),
 61        RealParameter(
 62            "initial_immune_fraction_of_the_population_of_region_1",
 63            0,
 64            0.5,
 65            variable_name="initial immune fraction of the population of region 1",
 66        ),
 67        RealParameter(
 68            "initial_immune_fraction_of_the_population_of_region_2",
 69            0,
 70            0.5,
 71            variable_name="initial immune fraction of the population of region 2",
 72        ),
 73        RealParameter(
 74            "normal_interregional_contact_rate",
 75            0,
 76            0.9,
 77            variable_name="normal interregional contact rate",
 78        ),
 79        RealParameter(
 80            "permanent_immune_population_fraction_R1",
 81            0,
 82            0.5,
 83            variable_name="permanent immune population fraction R1",
 84        ),
 85        RealParameter(
 86            "permanent_immune_population_fraction_R2",
 87            0,
 88            0.5,
 89            variable_name="permanent immune population fraction R2",
 90        ),
 91        RealParameter("recovery_time_region_1", 0.1, 0.75, variable_name="recovery time region 1"),
 92        RealParameter("recovery_time_region_2", 0.1, 0.75, variable_name="recovery time region 2"),
 93        RealParameter(
 94            "susceptible_to_immune_population_delay_time_region_1",
 95            0.5,
 96            2,
 97            variable_name="susceptible to immune population delay time region 1",
 98        ),
 99        RealParameter(
100            "susceptible_to_immune_population_delay_time_region_2",
101            0.5,
102            2,
103            variable_name="susceptible to immune population delay time region 2",
104        ),
105        RealParameter(
106            "root_contact_rate_region_1", 0.01, 5, variable_name="root contact rate region 1"
107        ),
108        RealParameter(
109            "root_contact_ratio_region_2", 0.01, 5, variable_name="root contact ratio region 2"
110        ),
111        RealParameter(
112            "infection_ratio_region_1", 0, 0.15, variable_name="infection ratio region 1"
113        ),
114        RealParameter("infection_rate_region_2", 0, 0.15, variable_name="infection rate region 2"),
115        RealParameter(
116            "normal_contact_rate_region_1", 10, 100, variable_name="normal contact rate region 1"
117        ),
118        RealParameter(
119            "normal_contact_rate_region_2", 10, 200, variable_name="normal contact rate region 2"
120        ),
121    ]
122
123    # add policies
124    policies = [
125        Policy("no policy", model_file=r"FLUvensimV1basecase.vpm"),
126        Policy("static policy", model_file=r"FLUvensimV1static.vpm"),
127        Policy("adaptive policy", model_file=r"FLUvensimV1dynamic.vpm"),
128    ]
129
130    results = perform_experiments(model, 1000, policies=policies)
131    save_results(results, "./data/1000 flu cases with policies.tar.gz")

example_vensim_lookup.py

  1"""
  2Created on Oct 1, 2012
  3
  4This is a simple example of the lookup uncertainty provided for
  5use in conjunction with vensim models. This example is largely based on
  6`Eker et al. (2014) <https://onlinelibrary.wiley.com/doi/10.1002/sdr.1518/suppinfo>`_
  7
  8@author: sibeleker
  9@author: jhkwakkel
 10"""
 11
 12import matplotlib.pyplot as plt
 13
 14from ema_workbench import TimeSeriesOutcome, perform_experiments, ema_logging
 15from ema_workbench.analysis import lines, Density
 16from ema_workbench.connectors.vensim import LookupUncertainty, VensimModel
 17
 18
 19class Burnout(VensimModel):
 20    model_file = r"\BURNOUT.vpm"
 21    outcomes = [
 22        TimeSeriesOutcome("accomplishments_to_date", variable_name="Accomplishments to Date"),
 23        TimeSeriesOutcome("energy_level", variable_name="Energy Level"),
 24        TimeSeriesOutcome("hours_worked_per_week", variable_name="Hours Worked Per Week"),
 25        TimeSeriesOutcome("accomplishments_per_hour", variable_name="accomplishments per hour"),
 26    ]
 27
 28    def __init__(self, working_directory, name):
 29        super().__init__(working_directory, name)
 30
 31        self.uncertainties = [
 32            LookupUncertainty(
 33                "hearne2",
 34                [(-1, 3), (-2, 1), (0, 0.9), (0.1, 1), (0.99, 1.01), (0.99, 1.01)],
 35                "accomplishments per hour lookup",
 36                self,
 37                0,
 38                1,
 39            ),
 40            LookupUncertainty(
 41                "hearne2",
 42                [(-0.75, 0.75), (-0.75, 0.75), (0, 1.5), (0.1, 1.6), (-0.3, 1.5), (0.25, 2.5)],
 43                "fractional change in expectations from perceived adequacy lookup",
 44                self,
 45                -1,
 46                1,
 47            ),
 48            LookupUncertainty(
 49                "hearne2",
 50                [(-2, 2), (-1, 2), (0, 1.5), (0.1, 1.6), (0.5, 2), (0.5, 2)],
 51                "effect of perceived adequacy on energy drain lookup",
 52                self,
 53                0,
 54                10,
 55            ),
 56            LookupUncertainty(
 57                "hearne2",
 58                [(-2, 2), (-1, 2), (0, 1.5), (0.1, 1.6), (0.5, 1.5), (0.1, 2)],
 59                "effect of perceived adequacy of hours worked lookup",
 60                self,
 61                0,
 62                2.5,
 63            ),
 64            LookupUncertainty(
 65                "hearne2",
 66                [(-1, 1), (-1, 1), (0, 0.9), (0.1, 1), (0.5, 1.5), (1, 1.5)],
 67                "effect of energy levels on hours worked lookup",
 68                self,
 69                0,
 70                1.5,
 71            ),
 72            LookupUncertainty(
 73                "hearne2",
 74                [(-1, 1), (-1, 1), (0, 0.9), (0.1, 1), (0.5, 1.5), (1, 1.5)],
 75                "effect of high energy on further recovery lookup",
 76                self,
 77                0,
 78                1.25,
 79            ),
 80            LookupUncertainty(
 81                "hearne2",
 82                [(-2, 2), (-1, 1), (0, 100), (20, 120), (0.5, 1.5), (0.5, 2)],
 83                "effect of hours worked on energy recovery lookup",
 84                self,
 85                0,
 86                1.5,
 87            ),
 88            LookupUncertainty(
 89                "approximation",
 90                [(-0.5, 0.35), (3, 5), (1, 10), (0.2, 0.4), (0, 120)],
 91                "effect of hours worked on energy drain lookup",
 92                self,
 93                0,
 94                3,
 95            ),
 96            LookupUncertainty(
 97                "hearne1",
 98                [(0, 1), (0, 0.15), (1, 1.5), (0.75, 1.25)],
 99                "effect of low energy on further depletion lookup",
100                self,
101                0,
102                1,
103            ),
104        ]
105
106        self._delete_lookup_uncertainties()
107
108
109if __name__ == "__main__":
110    ema_logging.log_to_stderr(ema_logging.INFO)
111    model = Burnout(r"./models/burnout", "burnout")
112
113    # run policy with old cases
114    results = perform_experiments(model, 100)
115    lines(results, "Energy Level", density=Density.BOXPLOT)
116    plt.show()

example_vensim_no_policy_flu.py

  1"""
  2Created on 20 May, 2011
  3
  4This module shows how you can use vensim models directly
  5instead of coding the model in Python. The underlying case
  6is the same as used in fluExample
  7
  8.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
  9                epruyt <e.pruyt (at) tudelft (dot) nl>
 10"""
 11
 12from ema_workbench import (
 13    RealParameter,
 14    TimeSeriesOutcome,
 15    ema_logging,
 16    perform_experiments,
 17    MultiprocessingEvaluator,
 18    save_results,
 19)
 20
 21from ema_workbench.connectors.vensim import VensimModel
 22
 23if __name__ == "__main__":
 24    ema_logging.log_to_stderr(ema_logging.INFO)
 25
 26    model = VensimModel("fluCase", wd=r"./models/flu", model_file=r"FLUvensimV1basecase.vpm")
 27
 28    # outcomes
 29    model.outcomes = [
 30        TimeSeriesOutcome(
 31            "deceased_population_region_1", variable_name="deceased population region 1"
 32        ),
 33        TimeSeriesOutcome("infected_fraction_R1", variable_name="infected fraction R1"),
 34    ]
 35
 36    # Plain Parametric Uncertainties
 37    model.uncertainties = [
 38        RealParameter(
 39            "additional_seasonal_immune_population_fraction_R1",
 40            0,
 41            0.5,
 42            variable_name="additional seasonal immune population fraction R1",
 43        ),
 44        RealParameter(
 45            "additional_seasonal_immune_population_fraction_R2",
 46            0,
 47            0.5,
 48            variable_name="additional seasonal immune population fraction R2",
 49        ),
 50        RealParameter(
 51            "fatality_ratio_region_1", 0.0001, 0.1, variable_name="fatality ratio region 1"
 52        ),
 53        RealParameter(
 54            "fatality_rate_region_2", 0.0001, 0.1, variable_name="fatality rate region 2"
 55        ),
 56        RealParameter(
 57            "initial_immune_fraction_of_the_population_of_region_1",
 58            0,
 59            0.5,
 60            variable_name="initial immune fraction of the population of region 1",
 61        ),
 62        RealParameter(
 63            "initial_immune_fraction_of_the_population_of_region_2",
 64            0,
 65            0.5,
 66            variable_name="initial immune fraction of the population of region 2",
 67        ),
 68        RealParameter(
 69            "normal_interregional_contact_rate",
 70            0,
 71            0.9,
 72            variable_name="normal interregional contact rate",
 73        ),
 74        RealParameter(
 75            "permanent_immune_population_fraction_R1",
 76            0,
 77            0.5,
 78            variable_name="permanent immune population fraction R1",
 79        ),
 80        RealParameter(
 81            "permanent_immune_population_fraction_R2",
 82            0,
 83            0.5,
 84            variable_name="permanent immune population fraction R2",
 85        ),
 86        RealParameter("recovery_time_region_1", 0.1, 0.75, variable_name="recovery time region 1"),
 87        RealParameter("recovery_time_region_2", 0.1, 0.75, variable_name="recovery time region 2"),
 88        RealParameter(
 89            "susceptible_to_immune_population_delay_time_region_1",
 90            0.5,
 91            2,
 92            variable_name="susceptible to immune population delay time region 1",
 93        ),
 94        RealParameter(
 95            "susceptible_to_immune_population_delay_time_region_2",
 96            0.5,
 97            2,
 98            variable_name="susceptible to immune population delay time region 2",
 99        ),
100        RealParameter(
101            "root_contact_rate_region_1", 0.01, 5, variable_name="root contact rate region 1"
102        ),
103        RealParameter(
104            "root_contact_ratio_region_2", 0.01, 5, variable_name="root contact ratio region 2"
105        ),
106        RealParameter(
107            "infection_ratio_region_1", 0, 0.15, variable_name="infection ratio region 1"
108        ),
109        RealParameter("infection_rate_region_2", 0, 0.15, variable_name="infection rate region 2"),
110        RealParameter(
111            "normal_contact_rate_region_1", 10, 100, variable_name="normal contact rate region 1"
112        ),
113        RealParameter(
114            "normal_contact_rate_region_2", 10, 200, variable_name="normal contact rate region 2"
115        ),
116    ]
117
118    nr_experiments = 1000
119    with MultiprocessingEvaluator(model) as evaluator:
120        results = perform_experiments(model, nr_experiments, evaluator=evaluator)
121
122    save_results(results, "./data/1000 flu cases no policy.tar.gz")

example_vensim_scarcity.py

  1"""
  2Created on 8 Mar. 2011
  3
  4.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
  5                epruyt <e.pruyt (at) tudelft (dot) nl>
  6"""
  7
  8from math import exp
  9
 10from ema_workbench.connectors.vensim import VensimModel
 11from ema_workbench.em_framework import (
 12    RealParameter,
 13    CategoricalParameter,
 14    TimeSeriesOutcome,
 15    perform_experiments,
 16)
 17from ema_workbench.util import ema_logging
 18
 19
 20class ScarcityModel(VensimModel):
 21    def returnsToScale(self, x, speed, scale):
 22        return (x * 1000, scale * 1 / (1 + exp(-1 * speed * (x - 50))))
 23
 24    def approxLearning(self, x, speed, scale, start):
 25        x = x - start
 26        loc = 1 - scale
 27        a = (x * 10000, scale * 1 / (1 + exp(speed * x)) + loc)
 28        return a
 29
 30    def f(self, x, speed, loc):
 31        return (x / 10, loc * 1 / (1 + exp(speed * x)))
 32
 33    def priceSubstite(self, x, speed, begin, end):
 34        scale = 2 * end
 35        start = begin - scale / 2
 36
 37        return (x + 2000, scale * 1 / (1 + exp(-1 * speed * x)) + start)
 38
 39    def run_model(self, scenario, policy):
 40        """Method for running an instantiated model structure"""
 41        kwargs = scenario
 42        loc = kwargs.pop("lookup_shortage_loc")
 43        speed = kwargs.pop("lookup_shortage_speed")
 44        lookup = [self.f(x / 10, speed, loc) for x in range(0, 100)]
 45        kwargs["shortage price effect lookup"] = lookup
 46
 47        speed = kwargs.pop("lookup_price_substitute_speed")
 48        begin = kwargs.pop("lookup_price_substitute_begin")
 49        end = kwargs.pop("lookup_price_substitute_end")
 50        lookup = [self.priceSubstite(x, speed, begin, end) for x in range(0, 100, 10)]
 51        kwargs["relative price substitute lookup"] = lookup
 52
 53        speed = kwargs.pop("lookup_returns_to_scale_speed")
 54        scale = kwargs.pop("lookup_returns_to_scale_scale")
 55        lookup = [self.returnsToScale(x, speed, scale) for x in range(0, 101, 10)]
 56        kwargs["returns to scale lookup"] = lookup
 57
 58        speed = kwargs.pop("lookup_approximated_learning_speed")
 59        scale = kwargs.pop("lookup_approximated_learning_scale")
 60        start = kwargs.pop("lookup_approximated_learning_start")
 61        lookup = [self.approxLearning(x, speed, scale, start) for x in range(0, 101, 10)]
 62        kwargs["approximated learning effect lookup"] = lookup
 63
 64        super().run_model(kwargs, policy)
 65
 66
 67if __name__ == "__main__":
 68    ema_logging.log_to_stderr(ema_logging.DEBUG)
 69
 70    model = ScarcityModel("scarcity", wd="./models/scarcity", model_file="MetalsEMA.vpm")
 71
 72    model.outcomes = [
 73        TimeSeriesOutcome("relative_market_price", variable_name="relative market price"),
 74        TimeSeriesOutcome("supply_demand_ratio", variable_name="supply demand ratio"),
 75        TimeSeriesOutcome("real_annual_demand", variable_name="real annual demand"),
 76        TimeSeriesOutcome(
 77            "produced_of_intrinsically_demanded", variable_name="produced of intrinsically demanded"
 78        ),
 79        TimeSeriesOutcome("supply", variable_name="supply"),
 80        TimeSeriesOutcome(
 81            "Installed_Recycling_Capacity", variable_name="Installed Recycling Capacity"
 82        ),
 83        TimeSeriesOutcome(
 84            "Installed_Extraction_Capacity", variable_name="Installed Extraction Capacity"
 85        ),
 86    ]
 87
 88    model.uncertainties = [
 89        RealParameter(
 90            "price_elasticity_of_demand", 0, 0.5, variable_name="price elasticity of demand"
 91        ),
 92        RealParameter(
 93            "fraction_of_maximum_extraction_capacity_used",
 94            0.6,
 95            1.2,
 96            variable_name="fraction of maximum extraction capacity used",
 97        ),
 98        RealParameter(
 99            "initial_average_recycling_cost", 1, 4, variable_name="initial average recycling cost"
100        ),
101        RealParameter(
102            "exogenously_planned_extraction_capacity",
103            0,
104            15000,
105            variable_name="exogenously planned extraction capacity",
106        ),
107        RealParameter(
108            "absolute_recycling_loss_fraction",
109            0.1,
110            0.5,
111            variable_name="absolute recycling loss fraction",
112        ),
113        RealParameter("normal_profit_margin", 0, 0.4, variable_name="normal profit margin"),
114        RealParameter(
115            "initial_annual_supply", 100000, 120000, variable_name="initial annual supply"
116        ),
117        RealParameter("initial_in_goods", 1500000, 2500000, variable_name="initial in goods"),
118        RealParameter(
119            "average_construction_time_extraction_capacity",
120            1,
121            10,
122            variable_name="average construction time extraction capacity",
123        ),
124        RealParameter(
125            "average_lifetime_extraction_capacity",
126            20,
127            40,
128            variable_name="average lifetime extraction capacity",
129        ),
130        RealParameter(
131            "average_lifetime_recycling_capacity",
132            20,
133            40,
134            variable_name="average lifetime recycling capacity",
135        ),
136        RealParameter(
137            "initial_extraction_capacity_under_construction",
138            5000,
139            20000,
140            variable_name="initial extraction capacity under construction",
141        ),
142        RealParameter(
143            "initial_recycling_capacity_under_construction",
144            5000,
145            20000,
146            variable_name="initial recycling capacity under construction",
147        ),
148        RealParameter(
149            "initial_recycling_infrastructure",
150            5000,
151            20000,
152            variable_name="initial recycling infrastructure",
153        ),
154        # order of delay
155        CategoricalParameter(
156            "order_in_goods_delay", (1, 4, 10, 1000), variable_name="order in goods delay"
157        ),
158        CategoricalParameter(
159            "order_recycling_capacity_delay",
160            (1, 4, 10),
161            variable_name="order recycling capacity delay",
162        ),
163        CategoricalParameter(
164            "order_extraction_capacity_delay",
165            (1, 4, 10),
166            variable_name="order extraction capacity delay",
167        ),
168        # uncertainties associated with lookups
169        RealParameter("lookup_shortage_loc", 20, 50, variable_name="lookup shortage loc"),
170        RealParameter("lookup_shortage_speed", 1, 5, variable_name="lookup shortage speed"),
171        RealParameter(
172            "lookup_price_substitute_speed", 0.1, 0.5, variable_name="lookup price substitute speed"
173        ),
174        RealParameter(
175            "lookup_price_substitute_begin", 3, 7, variable_name="lookup price substitute begin"
176        ),
177        RealParameter(
178            "lookup_price_substitute_end", 15, 25, variable_name="lookup price substitute end"
179        ),
180        RealParameter(
181            "lookup_returns_to_scale_speed",
182            0.01,
183            0.2,
184            variable_name="lookup returns to scale speed",
185        ),
186        RealParameter(
187            "lookup_returns_to_scale_scale", 0.3, 0.7, variable_name="lookup returns to scale scale"
188        ),
189        RealParameter(
190            "lookup_approximated_learning_speed",
191            0.01,
192            0.2,
193            variable_name="lookup approximated learning speed",
194        ),
195        RealParameter(
196            "lookup_approximated_learning_scale",
197            0.3,
198            0.6,
199            variable_name="lookup approximated learning scale",
200        ),
201        RealParameter(
202            "lookup_approximated_learning_start",
203            30,
204            60,
205            variable_name="lookup approximated learning start",
206        ),
207    ]
208
209    results = perform_experiments(model, 1000)
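Each lookup built in `run_model` above is a list of (x, y) tuples tracing a scaled logistic curve. A standalone sketch of the same shaping idea, using illustrative `speed` and `loc` values that are not taken from the model:

```python
from math import exp

def shortage_effect(x, speed, loc):
    # one (x, y) point on a scaled logistic curve, mirroring f above
    return (x / 10, loc * 1 / (1 + exp(speed * x)))

# build a lookup table of 100 points (speed and loc are illustrative values)
lookup = [shortage_effect(x / 10, speed=2, loc=30) for x in range(0, 100)]
print(lookup[0])  # the first (x, y) point of the table
```

The resulting list of tuples has the shape Vensim expects for a lookup variable, which is why the scarcity example assigns it directly into the experiment's keyword arguments.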

feature_scoring_flu.py

 1"""
 2Created on 30 Oct 2018
 3
 4@author: jhkwakkel
 5"""
 6
 7import matplotlib.pyplot as plt
 8import numpy as np
 9import seaborn as sns
10
11from ema_workbench import ema_logging, load_results
12from ema_workbench.analysis import feature_scoring
13
14ema_logging.log_to_stderr(level=ema_logging.INFO)
15
16# load data
17fn = r"./data/1000 flu cases with policies.tar.gz"
18x, outcomes = load_results(fn)
19
20# we have timeseries so we need scalars
21y = {
22    "deceased population": outcomes["deceased_population_region_1"][:, -1],
23    "max. infected fraction": np.max(outcomes["infected_fraction_R1"], axis=1),
24}
25
26scores = feature_scoring.get_feature_scores_all(x, y)
27sns.heatmap(scores, annot=True, cmap="viridis")
28plt.show()
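Reducing the time series outcomes to scalars, as done above, amounts to taking the final value or the maximum over time for each experiment. A minimal numpy illustration with dummy data (not workbench output):

```python
import numpy as np

# dummy outcomes: 3 experiments x 5 time steps
timeseries = np.array([[0, 1, 2, 3, 4],
                       [5, 4, 3, 2, 1],
                       [2, 2, 9, 2, 2]])

end_state = timeseries[:, -1]      # value at the last time step per experiment
peak = np.max(timeseries, axis=1)  # maximum over time per experiment
print(end_state, peak)
```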

feature_scoring_flu_confidence.py

 1"""
 2Created on 30 Oct 2018
 3
 4@author: jhkwakkel
 5"""
 6
 7import matplotlib.pyplot as plt
 8import numpy as np
 9import pandas as pd
10import seaborn as sns
11
12from ema_workbench import ema_logging, load_results
13from ema_workbench.analysis.feature_scoring import get_ex_feature_scores, RuleInductionType
14
15ema_logging.log_to_stderr(level=ema_logging.INFO)
16
17# load data
18fn = r"./data/1000 flu cases no policy.tar.gz"
19x, outcomes = load_results(fn)
20
21x = x.drop(["model", "policy"], axis=1)
22y = np.max(outcomes["infected_fraction_R1"], axis=1)
23
24all_scores = []
25for i in range(100):
26    indices = np.random.choice(np.arange(0, x.shape[0]), size=x.shape[0])
27    selected_x = x.iloc[indices, :]
28    selected_y = y[indices]
29
30    scores = get_ex_feature_scores(selected_x, selected_y, mode=RuleInductionType.REGRESSION)[0]
31    all_scores.append(scores)
32all_scores = pd.concat(all_scores, axis=1, sort=False)
33
34sns.boxplot(data=all_scores.T)
35plt.show()
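The bootstrap loop above yields one score per uncertainty per replication. Besides the boxplot, the spread can be summarized numerically; a minimal sketch with made-up scores standing in for `all_scores`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# stand-in for all_scores: rows are uncertainties, columns are bootstrap replications
all_scores = pd.DataFrame(
    rng.uniform(0, 1, size=(4, 100)),
    index=["b", "q", "mean", "stdev"],
)

# 5th, 50th, and 95th percentile of the scores per uncertainty
summary = all_scores.quantile([0.05, 0.5, 0.95], axis=1).T
print(summary)
```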

feature_scoring_flu_overtime.py

 1"""
 2Created on 30 Oct 2018
 3
 4@author: jhkwakkel
 5"""
 6
 7import matplotlib.pyplot as plt
 8import pandas as pd
 9import seaborn as sns
10
11from ema_workbench import ema_logging, load_results
12from ema_workbench.analysis import get_ex_feature_scores, RuleInductionType
13
14ema_logging.log_to_stderr(level=ema_logging.INFO)
15
16# load data
17fn = r"./data/1000 flu cases no policy.tar.gz"
18
19x, outcomes = load_results(fn)
20x = x.drop(["model", "policy"], axis=1)
21
22y = outcomes["deceased_population_region_1"]
23
24all_scores = []
25
26# we only want to show those uncertainties that are in the top 5
27# most sensitive parameters at any time step
28top_5 = set()
29for i in range(2, y.shape[1], 8):
30    data = y[:, i]
31    scores = get_ex_feature_scores(x, data, mode=RuleInductionType.REGRESSION)[0]
32    # add the top five for this time step to the set of top5s
33    top_5 |= set(scores.nlargest(5, 1).index.values)
34    scores = scores.rename(columns={1: outcomes["TIME"][0, i]})
35    all_scores.append(scores)
36
37all_scores = pd.concat(all_scores, axis=1, sort=False)
38all_scores = all_scores.loc[top_5, :]
39
40fig, ax = plt.subplots()
41sns.heatmap(all_scores, ax=ax, cmap="viridis")
42plt.show()

optimization_lake_model_dps.py

  1"""
  2This example replicates Quinn, J.D., Reed, P.M., Keller, K. (2017)
  3Direct policy search for robust multi-objective management of deeply
  4uncertain socio-ecological tipping points. Environmental Modelling &
  5Software 92, 125-141.
  6
  7It also showcases how the workbench can be used to apply the MORDM extension
  8suggested by Watson, A.A., Kasprzyk, J.R. (2017) Incorporating deeply uncertain
  9factors into the many objective search process. Environmental Modelling &
 10Software 89, 159-171.
 11
 12"""
 13
 14import math
 15
 16import numpy as np
 17from scipy.optimize import brentq
 18
 19from ema_workbench import (
 20    Model,
 21    RealParameter,
 22    ScalarOutcome,
 23    Constant,
 24    ema_logging,
 25    MultiprocessingEvaluator,
 26    CategoricalParameter,
 27    Scenario,
 28)
 29
 30from ema_workbench.em_framework.optimization import ArchiveLogger, EpsilonProgress
 31
 32
 33# Created on 1 Jun 2017
 34#
 35# .. codeauthor::jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 36
 37
 38def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
 39    """
 40
 41    Parameters
 42    ----------
 43    xt : float
 44         pollution in lake at time t
 45    c1 : float
 46         center rbf 1
 47    c2 : float
 48         center rbf 2
 49    r1 : float
 50         radius rbf 1
 51    r2 : float
 52         radius rbf 2
 53    w1 : float
 54         weight of rbf 1
 55
 56    .. note:: w2 = 1 - w1
 57
 58    """
 59
 60    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
 61    at1 = max(rule, 0.01)
 62    at = min(at1, 0.1)
 63
 64    return at
 65
 66
 67def lake_problem(
 68    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 69    q=2.0,  # recycling exponent
 70    mean=0.02,  # mean of natural inflows
 71    stdev=0.001,  # standard deviation of natural inflows
 72    delta=0.98,  # future utility discount rate
 73    alpha=0.4,  # utility from pollution
 74    nsamples=100,  # Monte Carlo sampling of natural inflows
 75    myears=1,  # the runtime of the simulation model
 76    c1=0.25,
 77    c2=0.25,
 78    r1=0.5,
 79    r2=0.5,
 80    w1=0.5,
 81):
 82    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 83
 84    X = np.zeros(myears)
 85    average_daily_P = np.zeros(myears)
 86    reliability = 0.0
 87    inertia = 0
 88    utility = 0
 89
 90    for _ in range(nsamples):
 91        X[0] = 0.0
 92        decision = 0.1
 93
 94        decisions = np.zeros(myears)
 95        decisions[0] = decision
 96
 97        natural_inflows = np.random.lognormal(
 98            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 99            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
100            size=myears,
101        )
102
103        for t in range(1, myears):
104            # here we use the decision rule
105            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
106            decisions[t] = decision
107
108            X[t] = (
109                (1 - b) * X[t - 1]
110                + X[t - 1] ** q / (1 + X[t - 1] ** q)
111                + decision
112                + natural_inflows[t - 1]
113            )
114            average_daily_P[t] += X[t] / nsamples
115
116        reliability += np.sum(X < Pcrit) / (nsamples * myears)
117        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
118        utility += np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
119    max_P = np.max(average_daily_P)
120
121    return max_P, utility, inertia, reliability
122
123
124if __name__ == "__main__":
125    ema_logging.log_to_stderr(ema_logging.INFO)
126
127    # instantiate the model
128    lake_model = Model("lakeproblem", function=lake_problem)
129    # specify uncertainties
130    lake_model.uncertainties = [
131        RealParameter("b", 0.1, 0.45),
132        RealParameter("q", 2.0, 4.5),
133        RealParameter("mean", 0.01, 0.05),
134        RealParameter("stdev", 0.001, 0.005),
135        RealParameter("delta", 0.93, 0.99),
136    ]
137
138    # set levers
139    lake_model.levers = [
140        RealParameter("c1", -2, 2),
141        RealParameter("c2", -2, 2),
142        RealParameter("r1", 0, 2),
143        RealParameter("r2", 0, 2),
144        CategoricalParameter("w1", np.linspace(0, 1, 10)),
145    ]
146    # specify outcomes
147    lake_model.outcomes = [
148        ScalarOutcome("max_P", kind=ScalarOutcome.MINIMIZE),
149        ScalarOutcome("utility", kind=ScalarOutcome.MAXIMIZE),
150        ScalarOutcome("inertia", kind=ScalarOutcome.MAXIMIZE),
151        ScalarOutcome("reliability", kind=ScalarOutcome.MAXIMIZE),
152    ]
153
154    # override some of the defaults of the model
155    lake_model.constants = [
156        Constant("alpha", 0.41),
157        Constant("nsamples", 100),
158        Constant("myears", 100),
159    ]
160
161    # reference is optional, but can be used to implement search for
162    # various user specified scenarios along the lines suggested by
163    # Watson and Kasprzyk (2017)
164    reference = Scenario("reference", b=0.4, q=2, mean=0.02, stdev=0.01)
165
166    convergence_metrics = [
167        ArchiveLogger(
168            "./data",
169            [l.name for l in lake_model.levers],
170            [o.name for o in lake_model.outcomes],
171            base_filename="lake_model_dps_archive.tar.gz",
172        ),
173        EpsilonProgress(),
174    ]
175
176    with MultiprocessingEvaluator(lake_model) as evaluator:
177        results, convergence = evaluator.optimize(
178            searchover="levers",
179            nfe=100000,
180            epsilons=[0.1] * len(lake_model.outcomes),
181            reference=reference,
182            convergence=convergence_metrics,
183        )
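The natural inflows in the lake model above are drawn from a lognormal distribution whose two parameters are derived so that the samples have the requested arithmetic mean and standard deviation. A quick standalone check of that parameterization:

```python
import math
import numpy as np

mean, stdev = 0.02, 0.001
rng = np.random.default_rng(42)
samples = rng.lognormal(
    math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
    math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
    size=1_000_000,
)

# sample moments should approximate the requested mean and standard deviation
print(samples.mean(), samples.std())
```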

optimization_lake_model_intertemporal.py

  1"""
  2An example of the lake problem using the ema workbench.
  3
  4The model itself is adapted from the Rhodium example by Dave Hadka,
  5see https://gist.github.com/dhadka/a8d7095c98130d8f73bc
  6
  7"""
  8
  9import math
 10
 11import numpy as np
 12from scipy.optimize import brentq
 13
 14from ema_workbench import (
 15    Model,
 16    RealParameter,
 17    ScalarOutcome,
 18    ema_logging,
 19    MultiprocessingEvaluator,
 20    Constraint,
 21)
 22from ema_workbench.em_framework.optimization import HyperVolume, EpsilonProgress
 23
 24
 25def lake_problem(
 26    b=0.42,
 27    q=2.0,
 28    mean=0.02,
 29    stdev=0.0017,
 30    delta=0.98,
 31    alpha=0.4,
 32    nsamples=100,
 33    l0=0,
 34    l1=0,
 35    l2=0,
 36    l3=0,
 37    l4=0,
 38    l5=0,
 39    l6=0,
 40    l7=0,
 41    l8=0,
 42    l9=0,
 43    l10=0,
 44    l11=0,
 45    l12=0,
 46    l13=0,
 47    l14=0,
 48    l15=0,
 49    l16=0,
 50    l17=0,
 51    l18=0,
 52    l19=0,
 53    l20=0,
 54    l21=0,
 55    l22=0,
 56    l23=0,
 57    l24=0,
 58    l25=0,
 59    l26=0,
 60    l27=0,
 61    l28=0,
 62    l29=0,
 63    l30=0,
 64    l31=0,
 65    l32=0,
 66    l33=0,
 67    l34=0,
 68    l35=0,
 69    l36=0,
 70    l37=0,
 71    l38=0,
 72    l39=0,
 73    l40=0,
 74    l41=0,
 75    l42=0,
 76    l43=0,
 77    l44=0,
 78    l45=0,
 79    l46=0,
 80    l47=0,
 81    l48=0,
 82    l49=0,
 83    l50=0,
 84    l51=0,
 85    l52=0,
 86    l53=0,
 87    l54=0,
 88    l55=0,
 89    l56=0,
 90    l57=0,
 91    l58=0,
 92    l59=0,
 93    l60=0,
 94    l61=0,
 95    l62=0,
 96    l63=0,
 97    l64=0,
 98    l65=0,
 99    l66=0,
100    l67=0,
101    l68=0,
102    l69=0,
103    l70=0,
104    l71=0,
105    l72=0,
106    l73=0,
107    l74=0,
108    l75=0,
109    l76=0,
110    l77=0,
111    l78=0,
112    l79=0,
113    l80=0,
114    l81=0,
115    l82=0,
116    l83=0,
117    l84=0,
118    l85=0,
119    l86=0,
120    l87=0,
121    l88=0,
122    l89=0,
123    l90=0,
124    l91=0,
125    l92=0,
126    l93=0,
127    l94=0,
128    l95=0,
129    l96=0,
130    l97=0,
131    l98=0,
132    l99=0,
133):
134    decisions = np.array(
135        [
136            l0,
137            l1,
138            l2,
139            l3,
140            l4,
141            l5,
142            l6,
143            l7,
144            l8,
145            l9,
146            l10,
147            l11,
148            l12,
149            l13,
150            l14,
151            l15,
152            l16,
153            l17,
154            l18,
155            l19,
156            l20,
157            l21,
158            l22,
159            l23,
160            l24,
161            l25,
162            l26,
163            l27,
164            l28,
165            l29,
166            l30,
167            l31,
168            l32,
169            l33,
170            l34,
171            l35,
172            l36,
173            l37,
174            l38,
175            l39,
176            l40,
177            l41,
178            l42,
179            l43,
180            l44,
181            l45,
182            l46,
183            l47,
184            l48,
185            l49,
186            l50,
187            l51,
188            l52,
189            l53,
190            l54,
191            l55,
192            l56,
193            l57,
194            l58,
195            l59,
196            l60,
197            l61,
198            l62,
199            l63,
200            l64,
201            l65,
202            l66,
203            l67,
204            l68,
205            l69,
206            l70,
207            l71,
208            l72,
209            l73,
210            l74,
211            l75,
212            l76,
213            l77,
214            l78,
215            l79,
216            l80,
217            l81,
218            l82,
219            l83,
220            l84,
221            l85,
222            l86,
223            l87,
224            l88,
225            l89,
226            l90,
227            l91,
228            l92,
229            l93,
230            l94,
231            l95,
232            l96,
233            l97,
234            l98,
235            l99,
236        ]
237    )
238
239    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
240    nvars = len(decisions)
241    X = np.zeros((nvars,))
242    average_daily_P = np.zeros((nvars,))
243    decisions = np.array(decisions)
244    reliability = 0.0
245
246    for _ in range(nsamples):
247        X[0] = 0.0
248
249        natural_inflows = np.random.lognormal(
250            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
251            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
252            size=nvars,
253        )
254
255        for t in range(1, nvars):
256            X[t] = (
257                (1 - b) * X[t - 1]
258                + X[t - 1] ** q / (1 + X[t - 1] ** q)
259                + decisions[t - 1]
260                + natural_inflows[t - 1]
261            )
262            average_daily_P[t] += X[t] / float(nsamples)
263
264        reliability += np.sum(X < Pcrit) / float(nsamples * nvars)
265
266    max_P = np.max(average_daily_P)
267    utility = np.sum(alpha * decisions * np.power(delta, np.arange(nvars)))
268    inertia = np.sum(np.abs(np.diff(decisions)) > 0.02) / float(nvars - 1)
269
270    return max_P, utility, inertia, reliability
271
272
273if __name__ == "__main__":
274    ema_logging.log_to_stderr(ema_logging.INFO)
275
276    # instantiate the model
277    lake_model = Model("lakeproblem", function=lake_problem)
278    lake_model.time_horizon = 100  # used to specify the number of timesteps
279
280    # specify uncertainties
281    lake_model.uncertainties = [
282        RealParameter("mean", 0.01, 0.05),
283        RealParameter("stdev", 0.001, 0.005),
284        RealParameter("b", 0.1, 0.45),
285        RealParameter("q", 2.0, 4.5),
286        RealParameter("delta", 0.93, 0.99),
287    ]
288
289    # set levers, one for each time step
290    lake_model.levers = [RealParameter(f"l{i}", 0, 0.1) for i in range(lake_model.time_horizon)]
291
292    # specify outcomes
293
294    lake_model.outcomes = [
295        ScalarOutcome("max_P", kind=ScalarOutcome.MINIMIZE, expected_range=(0, 5)),
296        ScalarOutcome("utility", kind=ScalarOutcome.MAXIMIZE, expected_range=(0, 2)),
297        ScalarOutcome("inertia", kind=ScalarOutcome.MAXIMIZE, expected_range=(0, 1)),
298        ScalarOutcome("reliability", kind=ScalarOutcome.MAXIMIZE, expected_range=(0, 1)),
299    ]
300
301    convergence_metrics = [EpsilonProgress()]
302
303    constraints = [
304        Constraint("max pollution", outcome_names="max_P", function=lambda x: max(0, x - 5))
305    ]
306
307    with MultiprocessingEvaluator(lake_model) as evaluator:
308        results, convergence = evaluator.optimize(
309            nfe=5000,
310            searchover="levers",
311            epsilons=[0.125, 0.05, 0.01, 0.01],
312            convergence=convergence_metrics,
313            constraints=constraints,
314        )
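The constraint above follows the convention that the function returns the degree of violation, with 0 meaning the constraint is satisfied. A standalone sketch of that convention (not workbench internals), using a hypothetical helper name:

```python
def max_pollution_violation(max_p, threshold=5):
    # 0 when satisfied, positive and growing with the size of the violation
    return max(0, max_p - threshold)

print(max_pollution_violation(4.2))  # satisfied
print(max_pollution_violation(6.0))  # violated by 1.0
```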

outputspace_exploration_lake_model.py

  1"""
  2An example of using output space exploration on the lake problem
  3
  4The lake problem itself is adapted from the Rhodium example by Dave Hadka,
  5see https://gist.github.com/dhadka/a8d7095c98130d8f73bc
  6
  7"""
  8
  9import math
 10
 11import numpy as np
 12from scipy.optimize import brentq
 13
 14from ema_workbench import (
 15    Model,
 16    RealParameter,
 17    ScalarOutcome,
 18    Constant,
 19    ema_logging,
 20    MultiprocessingEvaluator,
 21    Policy,
 22)
 23
 24from ema_workbench.em_framework.outputspace_exploration import OutputSpaceExploration
 25
 26
 27def lake_problem(
 28    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 29    q=2.0,  # recycling exponent
 30    mean=0.02,  # mean of natural inflows
 31    stdev=0.001,  # standard deviation of natural inflows
 32    delta=0.98,  # future utility discount rate
 33    alpha=0.4,  # utility from pollution
 34    nsamples=100,  # Monte Carlo sampling of natural inflows
 35    **kwargs,
 36):
 37    try:
 38        decisions = [kwargs[str(i)] for i in range(100)]
 39    except KeyError:
 40        decisions = [
 41            0,
 42        ] * 100
 43
 44    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 45    nvars = len(decisions)
 46    X = np.zeros((nvars,))
 47    average_daily_P = np.zeros((nvars,))
 48    decisions = np.array(decisions)
 49    reliability = 0.0
 50
 51    for _ in range(nsamples):
 52        X[0] = 0.0
 53
 54        natural_inflows = np.random.lognormal(
 55            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 56            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 57            size=nvars,
 58        )
 59
 60        for t in range(1, nvars):
 61            X[t] = (
 62                (1 - b) * X[t - 1]
 63                + X[t - 1] ** q / (1 + X[t - 1] ** q)
 64                + decisions[t - 1]
 65                + natural_inflows[t - 1]
 66            )
 67            average_daily_P[t] += X[t] / float(nsamples)
 68
 69        reliability += np.sum(X < Pcrit) / float(nsamples * nvars)
 70
 71    max_P = np.max(average_daily_P)
 72    utility = np.sum(alpha * decisions * np.power(delta, np.arange(nvars)))
 73    inertia = np.sum(np.absolute(np.diff(decisions)) < 0.02) / float(nvars - 1)
 74
 75    return max_P, utility, inertia, reliability
 76
 77
 78if __name__ == "__main__":
 79    ema_logging.log_to_stderr(ema_logging.INFO)
 80
 81    # instantiate the model
 82    lake_model = Model("lakeproblem", function=lake_problem)
 83    lake_model.time_horizon = 100
 84
 85    # specify uncertainties
 86    lake_model.uncertainties = [
 87        RealParameter("b", 0.1, 0.45),
 88        RealParameter("q", 2.0, 4.5),
 89        RealParameter("mean", 0.01, 0.05),
 90        RealParameter("stdev", 0.001, 0.005),
 91        RealParameter("delta", 0.93, 0.99),
 92    ]
 93
 94    # set levers, one for each time step
 95    lake_model.levers = [RealParameter(str(i), 0, 0.1) for i in range(lake_model.time_horizon)]
 96
 97    # specify outcomes
 98    # note that outcomes of kind INFO will be ignored
 99    lake_model.outcomes = [
100        ScalarOutcome("max_P", kind=ScalarOutcome.MAXIMIZE),
101        ScalarOutcome("utility", kind=ScalarOutcome.MAXIMIZE),
102        ScalarOutcome("inertia", kind=ScalarOutcome.MAXIMIZE),
103        ScalarOutcome("reliability", kind=ScalarOutcome.MAXIMIZE),
104    ]
105
106    # override some of the defaults of the model
107    lake_model.constants = [Constant("alpha", 0.41), Constant("nsamples", 150)]
108
109    # generate a reference policy
110    n_scenarios = 1000
111    reference = Policy("nopolicy", **{l.name: 0.02 for l in lake_model.levers})
112
113    # we are doing output space exploration given a reference
114    # policy, so we are exploring the output space over the uncertainties
115    # grid spec specifies the grid structure imposed on the output space
116    # each tuple is associated with an outcome. It gives the minimum,
117    # maximum, and epsilon value.
118    with MultiprocessingEvaluator(lake_model) as evaluator:
119        res = evaluator.optimize(
120            algorithm=OutputSpaceExploration,
121            grid_spec=[
122                (0, 12, 0.5),
123                (0, 1, 0.05),
124                (0, 1, 0.1),
125                (0, 1, 0.1),
126            ],
127            nfe=1000,
128            searchover="uncertainties",
129            reference=reference,
130        )
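
The grid specification above can be illustrated with plain numpy: each (minimum, maximum, epsilon) tuple partitions one outcome's range into cells of width epsilon, and novelty of a solution is judged by which cell its outcome lands in. A minimal sketch of that mapping (an illustration of the idea only, not the workbench's internal implementation):

```python
import numpy as np


def grid_cell(value, minimum, maximum, epsilon):
    """Map an outcome value to its grid cell index for a (min, max, epsilon) spec."""
    clipped = np.clip(value, minimum, maximum)
    return int((clipped - minimum) // epsilon)


# with the (0, 12, 0.5) spec for max_P, a value of 3.7 falls in cell 7
print(grid_cell(3.7, 0, 12, 0.5))
```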

plotting_envelopes_flu.py

 1"""
 2Created on Jul 8, 2014
 3
 4@author: jhkwakkel@tudelft.net
 5"""
 6
 7import matplotlib.pyplot as plt
 8
 9from ema_workbench import ema_logging, load_results
10from ema_workbench.analysis import envelopes, Density
11
12ema_logging.log_to_stderr(ema_logging.INFO)
13
14file_name = r"./data/1000 flu cases with policies.tar.gz"
15experiments, outcomes = load_results(file_name)
16
17# the plotting functions return the figure and a dict of axes
18fig, axes = envelopes(experiments, outcomes, group_by="policy", density=Density.KDE, fill=True)
19
20# we can access each of the axes and make changes
21for key, value in axes.items():
22    # the key is the name of the outcome for the normal plot
23    # and the name plus '_density' for the endstate distribution
24    if key.endswith("_density"):
25        value.set_xscale("log")
26
27plt.show()

plotting_envelopes_with_lines_flu.py

 1"""
 2Created on Jul 8, 2014
 3
 4@author: jhkwakkel@tudelft.net
 5"""
 6
 7import matplotlib.pyplot as plt
 8import numpy as np
 9
10from ema_workbench import ema_logging, load_results
11from ema_workbench.analysis import lines, Density
12
13ema_logging.log_to_stderr(ema_logging.INFO)
14
15file_name = r"./data/1000 flu cases with policies.tar.gz"
16experiments, outcomes = load_results(file_name)
17
18# let's specify a few scenarios that we want to show for
19# each of the policies
20scenario_ids = np.arange(0, 1000, 100)
21experiments_to_show = experiments["scenario_id"].isin(scenario_ids)
22
23# the plotting functions return the figure and a dict of axes
24fig, axes = lines(
25    experiments,
26    outcomes,
27    group_by="policy",
28    density=Density.KDE,
29    show_envelope=True,
30    experiments_to_show=experiments_to_show,
31)
32
33# we can access each of the axes and make changes
34for key, value in axes.items():
35    # the key is the name of the outcome for the normal plot
36    # and the name plus '_density' for the endstate distribution
37    if key.endswith("_density"):
38        value.set_xscale("log")
39
40plt.show()

plotting_kdeovertime_flu.py

 1"""
 2Created on Jul 8, 2014
 3
 4@author: jhkwakkel@tudelft.net
 5"""
 6
 7import matplotlib.pyplot as plt
 8
 9from ema_workbench import ema_logging, load_results
10from ema_workbench.analysis.plotting import kde_over_time
11
12ema_logging.log_to_stderr(ema_logging.INFO)
13
14# file_name = r'./data/1000 runs scarcity.tar.gz'
15file_name = "./data/1000 flu cases no policy.tar.gz"
16experiments, outcomes = load_results(file_name)
17
18# the plotting functions return the figure and a dict of axes
19fig, axes = kde_over_time(experiments, outcomes, log=True)
20
21plt.show()

plotting_lines_flu.py

 1"""
 2Created on Jul 8, 2014
 3
 4@author: jhkwakkel@tudelft.net
 5"""
 6
 7import matplotlib.pyplot as plt
 8
 9from ema_workbench import ema_logging, load_results
10from ema_workbench.analysis import lines, Density
11
12ema_logging.log_to_stderr(ema_logging.INFO)
13
14file_name = r"./data/1000 flu cases no policy.tar.gz"
15experiments, outcomes = load_results(file_name)
16
17# the plotting functions return the figure and a dict of axes
18fig, axes = lines(experiments, outcomes, density=Density.VIOLIN)
19
20# we can access each of the axes and make changes
21for key, value in axes.items():
22    # the key is the name of the outcome for the normal plot
23    # and the name plus '_density' for the endstate distribution
24    if key.endswith("_density"):
25        value.set_xscale("log")
26
27plt.show()

plotting_multiple_densities_flu.py

 1"""
 2Created on Jul 8, 2014
 3
 4@author: jhkwakkel@tudelft.net
 5"""
 6
 7import math
 8
 9import matplotlib.pyplot as plt
10
11from ema_workbench import ema_logging, load_results
12from ema_workbench.analysis import multiple_densities, Density
13
14ema_logging.log_to_stderr(ema_logging.INFO)
15
16file_name = "./data/1000 flu cases with policies.tar.gz"
17experiments, outcomes = load_results(file_name)
18
19# pick points in time for which we want to see a
20# density subplot
21time = outcomes["TIME"][0, :]
22times = time[1 :: math.ceil(time.shape[0] / 6)].tolist()
23
24multiple_densities(
25    experiments,
26    outcomes,
27    log=True,
28    points_in_time=times,
29    group_by="policy",
30    density=Density.KDE,
31    fill=True,
32)
33
34plt.show()

plotting_pairsplot_flu.py

 1"""
 2Created on 20 sep. 2011
 3
 4.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 5"""
 6
 7import matplotlib.pyplot as plt
 8import numpy as np
 9
10from ema_workbench import load_results, ema_logging
11from ema_workbench.analysis.pairs_plotting import pairs_lines, pairs_scatter, pairs_density
12
13ema_logging.log_to_stderr(level=ema_logging.DEFAULT_LEVEL)
14
15# load the data
16fh = "./data/1000 flu cases no policy.tar.gz"
17experiments, outcomes = load_results(fh)
18
19# transform the results to the required format
20# that is, we want to know the max peak and the casualties at the end of the
21# run
22tr = {}
23
24# get time and remove it from the dict
25time = outcomes.pop("TIME")
26
27for key, value in outcomes.items():
28    if key == "deceased_population_region_1":
29        tr[key] = value[:, -1]  # we want the end value
30    else:
31        # we want the maximum value of the peak
32        max_peak = np.max(value, axis=1)
33        tr["max peak"] = max_peak
34
35        # we want the time at which the maximum occurred
36        # the code here is a bit obscure, I don't know why the transpose
37        # of value is needed. This however does produce the appropriate results
38        logical = value.T == np.max(value, axis=1)
39        tr["time of max"] = time[logical.T]
40
41pairs_scatter(experiments, tr, filter_scalar=False)
42pairs_lines(experiments, outcomes)
43pairs_density(experiments, tr, filter_scalar=False)
44plt.show()
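
The transpose trick used above to find the time at which the maximum occurs can be written more directly with `np.argmax`: take the index of the peak per experiment, then index the time axis with it. A self-contained sketch on synthetic data:

```python
import numpy as np

# synthetic outcomes: 3 experiments, 5 time steps
time = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
value = np.array([[0, 3, 9, 2, 1],
                  [5, 1, 0, 0, 0],
                  [1, 2, 3, 4, 8]])

# index of the peak in each run, then the corresponding time
peak_idx = np.argmax(value, axis=1)
time_of_max = time[peak_idx]
print(time_of_max)  # [2. 0. 4.]
```

Unlike the boolean-comparison approach, `argmax` also behaves predictably when a run attains its maximum at more than one time step (it returns the first occurrence).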

regional_sa_flu.py

 1""" A simple example of performing regional sensitivity analysis
 2
 3
 4"""
 5
 6import matplotlib.pyplot as plt
 7
 8from ema_workbench.analysis import regional_sa
 9from ema_workbench import ema_logging, load_results
10
11fn = "./data/1000 flu cases with policies.tar.gz"
12results = load_results(fn)
13x, outcomes = results
14
15y = outcomes["deceased population region 1"][:, -1] > 1000000
16
17fig = regional_sa.plot_cdfs(x, y)
18
19plt.show()

robust_optimization_lake_model_dps.py

  1"""
  2This example takes the direct policy search formulation of the lake problem as
   3found in Quinn et al (2017), but embeds it in a robust optimization.
  4
  5Quinn, J.D., Reed, P.M., Keller, K. (2017)
  6Direct policy search for robust multi-objective management of deeply
  7uncertain socio-ecological tipping points. Environmental Modelling &
  8Software 92, 125-141.
  9
 10
 11"""
 12
 13import math
 14
 15import numpy as np
 16from scipy.optimize import brentq
 17
 18from ema_workbench import (
 19    Model,
 20    RealParameter,
 21    ScalarOutcome,
 22    Constant,
 23    ema_logging,
 24    MultiprocessingEvaluator,
 25)
 26from ema_workbench.em_framework.samplers import sample_uncertainties
 27
 28# Created on 1 Jun 2017
 29#
 30# .. codeauthor::jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 31
 32__all__ = []
 33
 34
 35def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
 36    """
 37
 38    Parameters
 39    ----------
 40    xt : float
 41         pollution in lake at time t
 42    c1 : float
 43         center rbf 1
 44    c2 : float
 45         center rbf 2
 46    r1 : float
  47         radius rbf 1
 48    r2 : float
  49         radius rbf 2
 50    w1 : float
 51         weight of rbf 1
 52
 53    note:: w2 = 1 - w1
 54
 55    """
 56
  57    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
 58    at1 = max(rule, 0.01)
 59    at = min(at1, 0.1)
 60
 61    return at
 62
 63
 64def lake_problem(
 65    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 66    q=2.0,  # recycling exponent
 67    mean=0.02,  # mean of natural inflows
  68    stdev=0.001,  # standard deviation of natural inflows
  69    delta=0.98,  # future utility discount rate
 70    alpha=0.4,  # utility from pollution
 71    nsamples=100,  # Monte Carlo sampling of natural inflows
 72    myears=1,  # the runtime of the simulation model
 73    c1=0.25,
 74    c2=0.25,
 75    r1=0.5,
 76    r2=0.5,
 77    w1=0.5,
 78):
 79    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 80
 81    X = np.zeros(myears)
 82    average_daily_P = np.zeros(myears)
 83    reliability = 0.0
 84    inertia = 0
 85    utility = 0
 86
 87    for _ in range(nsamples):
 88        X[0] = 0.0
 89        decision = 0.1
 90
 91        decisions = np.zeros(myears)
 92        decisions[0] = decision
 93
 94        natural_inflows = np.random.lognormal(
 95            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 96            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 97            size=myears,
 98        )
 99
100        for t in range(1, myears):
101            # here we use the decision rule
102            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
103            decisions[t] = decision
104
105            X[t] = (
106                (1 - b) * X[t - 1]
107                + X[t - 1] ** q / (1 + X[t - 1] ** q)
108                + decision
109                + natural_inflows[t - 1]
110            )
111            average_daily_P[t] += X[t] / nsamples
112
113        reliability += np.sum(X < Pcrit) / (nsamples * myears)
 114        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
115        utility += np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
116    max_P = np.max(average_daily_P)
117
118    return max_P, utility, inertia, reliability
119
120
121if __name__ == "__main__":
122    ema_logging.log_to_stderr(ema_logging.INFO)
123
124    # instantiate the model
125    lake_model = Model("lakeproblem", function=lake_problem)
126    # specify uncertainties
127    lake_model.uncertainties = [
128        RealParameter("b", 0.1, 0.45),
129        RealParameter("q", 2.0, 4.5),
130        RealParameter("mean", 0.01, 0.05),
131        RealParameter("stdev", 0.001, 0.005),
132        RealParameter("delta", 0.93, 0.99),
133    ]
134
135    # set levers
136    lake_model.levers = [
137        RealParameter("c1", -2, 2),
138        RealParameter("c2", -2, 2),
139        RealParameter("r1", 0, 2),
140        RealParameter("r2", 0, 2),
141        RealParameter("w1", 0, 1),
142    ]
143
144    # specify outcomes
145    lake_model.outcomes = [
146        ScalarOutcome("max_P"),
147        ScalarOutcome("utility"),
148        ScalarOutcome("inertia"),
149        ScalarOutcome("reliability"),
150    ]
151
152    # override some of the defaults of the model
153    lake_model.constants = [
154        Constant("alpha", 0.41),
155        Constant("nsamples", 100),
156        Constant("myears", 100),
157    ]
158
159    # setup and execute the robust optimization
160    def signal_to_noise(data):
161        mean = np.mean(data)
162        std = np.std(data)
163        sn = mean / std
164        return sn
165
166    MAXIMIZE = ScalarOutcome.MAXIMIZE  # @UndefinedVariable
167    MINIMIZE = ScalarOutcome.MINIMIZE  # @UndefinedVariable
168    robustness_functions = [
169        ScalarOutcome("mean p", kind=MINIMIZE, variable_name="max_P", function=np.mean),
170        ScalarOutcome("std p", kind=MINIMIZE, variable_name="max_P", function=np.std),
171        ScalarOutcome(
172            "sn reliability", kind=MAXIMIZE, variable_name="reliability", function=signal_to_noise
173        ),
174    ]
175    n_scenarios = 10
176    scenarios = sample_uncertainties(lake_model, n_scenarios)
177    nfe = 1000
178
179    with MultiprocessingEvaluator(lake_model) as evaluator:
180        evaluator.robust_optimize(
181            robustness_functions,
182            scenarios,
183            nfe=nfe,
184            epsilons=[0.1] * len(robustness_functions),
185            population_size=5,
186        )
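
The signal-to-noise robustness metric defined above is simply the mean divided by the standard deviation across scenarios, so a policy with the same average reliability but less variability scores higher. A standalone illustration with synthetic data:

```python
import numpy as np


def signal_to_noise(data):
    """Mean over standard deviation; higher is more robust for a maximized outcome."""
    return np.mean(data) / np.std(data)


rng = np.random.default_rng(42)
reliable = rng.normal(loc=0.9, scale=0.01, size=100)  # high mean, low spread
erratic = rng.normal(loc=0.9, scale=0.2, size=100)    # same mean, high spread

# the less variable outcome scores much higher
print(signal_to_noise(reliable) > signal_to_noise(erratic))  # True
```

Note that for outcomes to be minimized, the robust optimization example instead uses the mean and standard deviation as two separate objectives.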

sample_jointly_lake_model.py

  1"""
   2An example of the lake problem using the ema workbench. This example
   3illustrates how you can control more finely how samples are generated.
  4In this particular case, we want to apply Sobol analysis over both the
  5uncertainties and levers at the same time.
  6
  7"""
  8
  9import math
 10
 11import numpy as np
 12import pandas as pd
 13from SALib.analyze import sobol
 14from scipy.optimize import brentq
 15
 16from ema_workbench import (
 17    Model,
 18    RealParameter,
 19    ScalarOutcome,
 20    Constant,
 21    ema_logging,
 22    MultiprocessingEvaluator,
 23    Scenario,
 24)
 25from ema_workbench.em_framework import get_SALib_problem, sample_parameters
 26from ema_workbench.em_framework import SobolSampler
 27
 28# from ema_workbench.em_framework.evaluators import Samplers
 29
 30
 31def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
 32    """
 33
 34    Parameters
 35    ----------
 36    xt : float
 37         pollution in lake at time t
 38    c1 : float
 39         center rbf 1
 40    c2 : float
 41         center rbf 2
 42    r1 : float
  43         radius rbf 1
 44    r2 : float
  45         radius rbf 2
 46    w1 : float
 47         weight of rbf 1
 48
 49    note:: w2 = 1 - w1
 50
 51    """
 52
 53    rule = w1 * (abs(xt - c1) / r1) ** 3 + (1 - w1) * (abs(xt - c2) / r2) ** 3
 54    at1 = max(rule, 0.01)
 55    at = min(at1, 0.1)
 56
 57    return at
 58
 59
 60def lake_problem(
 61    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 62    q=2.0,  # recycling exponent
 63    mean=0.02,  # mean of natural inflows
  64    stdev=0.001,  # standard deviation of natural inflows
  65    delta=0.98,  # future utility discount rate
 66    alpha=0.4,  # utility from pollution
 67    nsamples=100,  # Monte Carlo sampling of natural inflows
 68    myears=1,  # the runtime of the simulation model
 69    c1=0.25,
 70    c2=0.25,
 71    r1=0.5,
 72    r2=0.5,
 73    w1=0.5,
 74):
 75    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 76
 77    X = np.zeros((myears,))
 78    average_daily_P = np.zeros((myears,))
 79    reliability = 0.0
 80    inertia = 0
 81    utility = 0
 82
 83    for _ in range(nsamples):
 84        X[0] = 0.0
 85        decision = 0.1
 86
 87        decisions = np.zeros(myears)
 88        decisions[0] = decision
 89
 90        natural_inflows = np.random.lognormal(
 91            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 92            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 93            size=myears,
 94        )
 95
 96        for t in range(1, myears):
 97            # here we use the decision rule
 98            decision = get_antropogenic_release(X[t - 1], c1, c2, r1, r2, w1)
 99            decisions[t] = decision
100
101            X[t] = (
102                (1 - b) * X[t - 1]
103                + X[t - 1] ** q / (1 + X[t - 1] ** q)
104                + decision
105                + natural_inflows[t - 1]
106            )
107            average_daily_P[t] += X[t] / nsamples
108
109        reliability += np.sum(X < Pcrit) / (nsamples * myears)
110        inertia += np.sum(np.absolute(np.diff(decisions)) < 0.02) / (nsamples * myears)
111        utility += np.sum(alpha * decisions * np.power(delta, np.arange(myears))) / nsamples
112    max_P = np.max(average_daily_P)
113
114    return max_P, utility, inertia, reliability
115
116
117def analyze(results, ooi):
118    """analyze results using SALib sobol, returns a dataframe"""
119
120    _, outcomes = results
121
122    parameters = lake_model.uncertainties.copy() + lake_model.levers.copy()
123
124    problem = get_SALib_problem(parameters)
125    y = outcomes[ooi]
126    sobol_indices = sobol.analyze(problem, y)
127    sobol_stats = {key: sobol_indices[key] for key in ["ST", "ST_conf", "S1", "S1_conf"]}
128    sobol_stats = pd.DataFrame(sobol_stats, index=problem["names"])
129    sobol_stats.sort_values(by="ST", ascending=False)
130    s2 = pd.DataFrame(sobol_indices["S2"], index=problem["names"], columns=problem["names"])
131    s2_conf = pd.DataFrame(
132        sobol_indices["S2_conf"], index=problem["names"], columns=problem["names"]
133    )
134
135    return sobol_stats, s2, s2_conf
136
137
138if __name__ == "__main__":
139    ema_logging.log_to_stderr(ema_logging.INFO)
140
141    # instantiate the model
142    lake_model = Model("lakeproblem", function=lake_problem)
143    # specify uncertainties
144    lake_model.uncertainties = [
145        RealParameter("b", 0.1, 0.45),
146        RealParameter("q", 2.0, 4.5),
147        RealParameter("mean", 0.01, 0.05),
148        RealParameter("stdev", 0.001, 0.005),
149        RealParameter("delta", 0.93, 0.99),
150    ]
151
152    # set levers
153    lake_model.levers = [
154        RealParameter("c1", -2, 2),
155        RealParameter("c2", -2, 2),
156        RealParameter("r1", 0, 2),
157        RealParameter("r2", 0, 2),
158        RealParameter("w1", 0, 1),
159    ]
160    # specify outcomes
161    lake_model.outcomes = [
162        ScalarOutcome("max_P", kind=ScalarOutcome.MINIMIZE),
163        # @UndefinedVariable
164        ScalarOutcome("utility", kind=ScalarOutcome.MAXIMIZE),
165        # @UndefinedVariable
166        ScalarOutcome("inertia", kind=ScalarOutcome.MAXIMIZE),
167        # @UndefinedVariable
168        ScalarOutcome("reliability", kind=ScalarOutcome.MAXIMIZE),
169    ]  # @UndefinedVariable
170
171    # override some of the defaults of the model
172    lake_model.constants = [
173        Constant("alpha", 0.41),
174        Constant("nsamples", 100),
175        Constant("myears", 100),
176    ]
177
178    # combine uncertainties and levers prior to sampling
179    n_scenarios = 1000
180    parameters = lake_model.uncertainties + lake_model.levers
181    scenarios = sample_parameters(parameters, n_scenarios, SobolSampler(), Scenario)
182
183    with MultiprocessingEvaluator(lake_model) as evaluator:
184        results = evaluator.perform_experiments(scenarios)
185
186    sobol_stats, s2, s2_conf = analyze(results, "max_P")
187    print(sobol_stats)
188    print(s2)
189    print(s2_conf)
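
Sampling the uncertainties and levers jointly with a Sobol design is considerably more expensive than Latin hypercube sampling: following SALib's Saltelli convention (stated here as an assumption about the sampler's design size), a base sample of N over D parameters yields N * (2D + 2) model runs when second-order indices are computed, and N * (D + 2) otherwise. A quick check for the 10 combined parameters above:

```python
def sobol_sample_size(n, d, second_order=True):
    """Number of model runs in a Saltelli design (SALib convention)."""
    return n * (2 * d + 2) if second_order else n * (d + 2)


# 5 uncertainties + 5 levers sampled jointly with n_scenarios = 1000
print(sobol_sample_size(1000, 10))                      # 22000
print(sobol_sample_size(1000, 10, second_order=False))  # 12000
```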

sample_sobol_lake_model.py

  1"""
  2An example of the lake problem using the ema workbench.
  3
  4The model itself is adapted from the Rhodium example by Dave Hadka,
  5see https://gist.github.com/dhadka/a8d7095c98130d8f73bc
  6
  7"""
  8
  9import math
 10
 11import numpy as np
 12import pandas as pd
 13from SALib.analyze import sobol
 14from scipy.optimize import brentq
 15
 16from ema_workbench import (
 17    Model,
 18    RealParameter,
 19    ScalarOutcome,
 20    Constant,
 21    ema_logging,
 22    MultiprocessingEvaluator,
 23    Policy,
 24)
 25from ema_workbench.em_framework import get_SALib_problem
 26from ema_workbench.em_framework.evaluators import Samplers
 27
 28
 29def lake_problem(
 30    b=0.42,  # decay rate for P in lake (0.42 = irreversible)
 31    q=2.0,  # recycling exponent
 32    mean=0.02,  # mean of natural inflows
  33    stdev=0.001,  # standard deviation of natural inflows
  34    delta=0.98,  # future utility discount rate
 35    alpha=0.4,  # utility from pollution
 36    nsamples=100,  # Monte Carlo sampling of natural inflows
 37    **kwargs,
 38):
 39    try:
 40        decisions = [kwargs[str(i)] for i in range(100)]
 41    except KeyError:
 42        decisions = [0] * 100
 43
 44    Pcrit = brentq(lambda x: x**q / (1 + x**q) - b * x, 0.01, 1.5)
 45    nvars = len(decisions)
 46    X = np.zeros((nvars,))
 47    average_daily_P = np.zeros((nvars,))
 48    decisions = np.array(decisions)
 49    reliability = 0.0
 50
 51    for _ in range(nsamples):
 52        X[0] = 0.0
 53
 54        natural_inflows = np.random.lognormal(
 55            math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
 56            math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
 57            size=nvars,
 58        )
 59
 60        for t in range(1, nvars):
 61            X[t] = (
 62                (1 - b) * X[t - 1]
 63                + X[t - 1] ** q / (1 + X[t - 1] ** q)
 64                + decisions[t - 1]
 65                + natural_inflows[t - 1]
 66            )
 67            average_daily_P[t] += X[t] / float(nsamples)
 68
 69        reliability += np.sum(X < Pcrit) / float(nsamples * nvars)
 70
 71    max_P = np.max(average_daily_P)
 72    utility = np.sum(alpha * decisions * np.power(delta, np.arange(nvars)))
 73    inertia = np.sum(np.absolute(np.diff(decisions)) < 0.02) / float(nvars - 1)
 74
 75    return max_P, utility, inertia, reliability
 76
 77
 78def analyze(results, ooi):
 79    """analyze results using SALib sobol, returns a dataframe"""
 80
 81    _, outcomes = results
 82
 83    problem = get_SALib_problem(lake_model.uncertainties)
 84    y = outcomes[ooi]
 85    sobol_indices = sobol.analyze(problem, y)
 86    sobol_stats = {key: sobol_indices[key] for key in ["ST", "ST_conf", "S1", "S1_conf"]}
 87    sobol_stats = pd.DataFrame(sobol_stats, index=problem["names"])
 88    sobol_stats.sort_values(by="ST", ascending=False)
 89    s2 = pd.DataFrame(sobol_indices["S2"], index=problem["names"], columns=problem["names"])
 90    s2_conf = pd.DataFrame(
 91        sobol_indices["S2_conf"], index=problem["names"], columns=problem["names"]
 92    )
 93
 94    return sobol_stats, s2, s2_conf
 95
 96
 97if __name__ == "__main__":
 98    ema_logging.log_to_stderr(ema_logging.INFO)
 99
100    # instantiate the model
101    lake_model = Model("lakeproblem", function=lake_problem)
102    lake_model.time_horizon = 100
103
104    # specify uncertainties
105    lake_model.uncertainties = [
106        RealParameter("b", 0.1, 0.45),
107        RealParameter("q", 2.0, 4.5),
108        RealParameter("mean", 0.01, 0.05),
109        RealParameter("stdev", 0.001, 0.005),
110        RealParameter("delta", 0.93, 0.99),
111    ]
112
113    # set levers, one for each time step
114    lake_model.levers = [RealParameter(str(i), 0, 0.1) for i in range(lake_model.time_horizon)]
115
116    # specify outcomes
117    lake_model.outcomes = [
118        ScalarOutcome("max_P"),
119        ScalarOutcome("utility"),
120        ScalarOutcome("inertia"),
121        ScalarOutcome("reliability"),
122    ]
123
124    # override some of the defaults of the model
125    lake_model.constants = [Constant("alpha", 0.41), Constant("nsamples", 150)]
126
127    # generate a single default policy
128    policy = Policy("no release", **{str(i): 0.1 for i in range(100)})
129
130    n_scenarios = 1000
131
132    with MultiprocessingEvaluator(lake_model) as evaluator:
133        results = evaluator.perform_experiments(
134            n_scenarios, policy, uncertainty_sampling=Samplers.SOBOL
135        )
136
137    sobol_stats, s2, s2_conf = analyze(results, "max_P")
138    print(sobol_stats)
139    print(s2)
140    print(s2_conf)
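
The inertia outcome computed inside `lake_problem` is the fraction of consecutive decisions that change by less than 0.02. Isolating it as a small function makes the metric easy to sanity-check:

```python
import numpy as np


def inertia(decisions, threshold=0.02):
    """Fraction of time steps where the release decision changes by less than threshold."""
    decisions = np.asarray(decisions, dtype=float)
    return np.sum(np.abs(np.diff(decisions)) < threshold) / (len(decisions) - 1)


# a constant policy never changes, so inertia is 1.0
print(inertia([0.05] * 100))  # 1.0
# a policy alternating between 0.0 and 0.1 changes every step, so inertia is 0.0
print(inertia([0.0, 0.1] * 50))  # 0.0
```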

sd_boostedtrees_flu.py

  1"""
  2
  3"""
  4
  5import matplotlib.pyplot as plt
  6import numpy as np
  7import seaborn as sns
  8from matplotlib.collections import CircleCollection
  9from sklearn.ensemble import AdaBoostClassifier
 10from sklearn.tree import DecisionTreeClassifier
 11
 12from ema_workbench import load_results, ema_logging
 13from ema_workbench.analysis import feature_scoring
 14
 15ema_logging.log_to_stderr(ema_logging.INFO)
 16
 17
 18def plot_factormap(x1, x2, ax, bdt, nominal):
 19    """helper function for plotting a 2d factor map"""
 20    x_min, x_max = x[:, x1].min(), x[:, x1].max()
 21    y_min, y_max = x[:, x2].min(), x[:, x2].max()
 22    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
 23
 24    grid = np.ones((xx.ravel().shape[0], x.shape[1])) * nominal
 25    grid[:, x1] = xx.ravel()
 26    grid[:, x2] = yy.ravel()
 27
 28    Z = bdt.predict(grid)
 29    Z = Z.reshape(xx.shape)
 30
 31    ax.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.5)  # @UndefinedVariable
 32
 33    for i in (0, 1):
 34        idx = y == i
 35        ax.scatter(x[idx, x1], x[idx, x2], s=5)
 36    ax.set_xlabel(columns[x1])
 37    ax.set_ylabel(columns[x2])
 38
 39
 40def plot_diag(x1, ax):
 41    x_min, x_max = x[:, x1].min(), x[:, x1].max()
 42    for i in (0, 1):
 43        idx = y == i
 44        ax.hist(x[idx, x1], range=(x_min, x_max), alpha=0.5)
 45
 46
 47# load data
 48experiments, outcomes = load_results("./data/1000 flu cases with policies.tar.gz")
 49
 50# transform to numpy array with proper recoding of cateogorical variables
 51x, columns = feature_scoring._prepare_experiments(experiments)
 52y = outcomes["deceased_population_region_1"][:, -1] > 1000000
 53
 54# establish mean case for factor maps
 55# this is questionable in particular in case of categorical dimensions
 56minima = x.min(axis=0)
 57maxima = x.max(axis=0)
 58nominal = minima + (maxima - minima) / 2
 59
 60# fit the boosted tree
 61bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), algorithm="SAMME", n_estimators=200)
 62bdt.fit(x, y)
 63
 64# determine which dimensions are most important
 65sorted_indices = np.argsort(bdt.feature_importances_)[::-1]
 66
 67# do the actual plotting
 68# this is a quick hack, tying it to seaborn Pairgrid is probably
 69# the more elegant solution, but is tricky with what arguments
 70# can be passed to the plotting function
 71fig, axes = plt.subplots(ncols=5, nrows=5, figsize=(15, 15))
 72
 73for i, row in enumerate(axes):
 74    for j, ax in enumerate(row):
 75        if i > j:
 76            plot_factormap(sorted_indices[j], sorted_indices[i], ax, bdt, nominal)
 77        elif i == j:
 78            plot_diag(sorted_indices[j], ax)
 79        else:
 80            ax.set_xticks([])
 81            ax.set_yticks([])
 82            ax.axis("off")
 83
 84        if j > 0:
 85            ax.set_yticklabels([])
 86            ax.set_ylabel("")
 87        if i < len(axes) - 1:
 88            ax.set_xticklabels([])
 89            ax.set_xlabel("")
 90
 91# add the legend
 92# Draw a full-figure legend outside the grid
 93handles = [
 94    CircleCollection([10], color=sns.color_palette()[0]),
 95    CircleCollection([10], color=sns.color_palette()[1]),
 96]
 97
 98legend = fig.legend(handles, ["False", "True"], scatterpoints=1)
 99
100plt.tight_layout()
101plt.show()

sd_cart_flu.py

 1"""
 2Created on May 26, 2015
 3
 4@author: jhkwakkel
 5"""
 6
 7import matplotlib.pyplot as plt
 8
 9import ema_workbench.analysis.cart as cart
10from ema_workbench import ema_logging, load_results
11
12ema_logging.log_to_stderr(level=ema_logging.INFO)
13
14
15def classify(data):
16    # get the output for deceased population
17    result = data["deceased_population_region_1"]
18
 19    # if deceased population is higher than 1,000,000 people,
20    # classify as 1
21    classes = result[:, -1] > 1000000
22
23    return classes
24
25
26# load data
27fn = "./data/1000 flu cases with policies.tar.gz"
28results = load_results(fn)
29experiments, outcomes = results
30
31# extract results for 1 policy
32logical = experiments["policy"] == "no policy"
33new_experiments = experiments[logical]
34new_outcomes = {}
35for key, value in outcomes.items():
36    new_outcomes[key] = value[logical]
37
38results = (new_experiments, new_outcomes)
39
40# perform cart on modified results tuple
41
42cart_alg = cart.setup_cart(results, classify, mass_min=0.05)
43cart_alg.build_tree()
44
45# print cart to std_out
46print(cart_alg.stats_to_dataframe())
47print(cart_alg.boxes_to_dataframe())
48
49# visualize
50cart_alg.show_boxes(together=False)
51cart_alg.show_tree()
52plt.show()
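
Slicing a results tuple down to a single policy, as done above, generalizes to any boolean mask over the experiments DataFrame: filter the rows of the experiments and apply the same mask to every outcome array. A self-contained sketch with toy data (the column and outcome names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy results tuple: experiments DataFrame plus a dict of outcome arrays
experiments = pd.DataFrame(
    {"policy": ["no policy", "vaccination", "no policy"], "b": [0.1, 0.2, 0.3]}
)
outcomes = {"deceased": np.array([[1, 2], [3, 4], [5, 6]])}

# boolean mask selecting the experiments for one policy
mask = experiments["policy"] == "no policy"
new_experiments = experiments[mask]
new_outcomes = {key: value[mask] for key, value in outcomes.items()}

print(len(new_experiments))            # 2
print(new_outcomes["deceased"].shape)  # (2, 2)
```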

sd_cart_wcm.py

 1"""
 2Created on May 26, 2015
 3
 4@author: jhkwakkel
 5"""
 6
 7import matplotlib.pyplot as plt
 8
 9import ema_workbench.analysis.cart as cart
10from ema_workbench import ema_logging, load_results
11
12ema_logging.log_to_stderr(level=ema_logging.INFO)
13
14default_flow = 2.178849944502783e7
15
16# load data
17fn = "./data/5000 runs WCM.tar.gz"
18results = load_results(fn)
19x, outcomes = results
20
21ooi = "throughput Rotterdam"
22outcome = outcomes[ooi] / default_flow
23y = outcome < 1
24
25cart_alg = cart.CART(x, y)
26cart_alg.build_tree()
27
28# print cart to std_out
29print(cart_alg.stats_to_dataframe())
30print(cart_alg.boxes_to_dataframe())
31
32# visualize
33cart_alg.show_boxes(together=False)
34cart_alg.show_tree()
35plt.show()

sd_dimensional_stacking_flu.py

 1"""
 2
 3This file illustrates the use of the workbench for dimensional
 4stacking for scenario discovery.
 5
 6
 7.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
 8
 9"""
10
11import matplotlib.pyplot as plt
12
13from ema_workbench import ema_logging, load_results
14from ema_workbench.analysis import dimensional_stacking
15
16ema_logging.log_to_stderr(level=ema_logging.INFO)
17
18# load data
19fn = "./data/1000 flu cases no policy.tar.gz"
20x, outcomes = load_results(fn)
21
22y = outcomes["deceased_population_region_1"][:, -1] > 1000000
23
24fig = dimensional_stacking.create_pivot_plot(x, y, 2, bin_labels=True)
25
26plt.show()

sd_logit_flu_example.py

 1"""
 2
 3"""
 4
 5import matplotlib.pyplot as plt
 6import seaborn as sns
 7
 8import ema_workbench.analysis.logistic_regression as logistic_regression
 9from ema_workbench import load_results
10
11# Created on 14 Mar 2019
12#
13# .. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
14
15
16experiments, outcomes = load_results("./data/1000 flu cases no policy.tar.gz")
17
18x = experiments.drop(["model", "policy"], axis=1)
19y = outcomes["deceased_population_region_1"][:, -1] > 1000000
20
21logit = logistic_regression.Logit(x, y)
22logit.run()
23
24logit.show_tradeoff()
25
26# when we change the default threshold, the tradeoff curve is
27# recalculated
28logit.threshold = 0.8
29logit.show_tradeoff()
30
31# we can also look at the tradeoff across threshold values
32# for a given model
33logit.show_threshold_tradeoff(3)
34
35# inspect shows the threshold tradeoff for the model
36# as well as the statistics of the model
37logit.inspect(3)
38
39# we can also visualize the performance of the model
40# using a pairwise scatter plot
41sns.set_style("white")
42logit.plot_pairwise_scatter(3)
43
44plt.show()

sd_prim_PCA_flu.py

 1"""
 2
 3This file illustrates the use of the workbench for doing
 4a PRIM analysis with PCA preprocessing
 5
 6The data was generated using a system dynamics models implemented in Vensim.
 7See flu_example.py for the code.
 8
 9
10"""
11
12import matplotlib.pyplot as plt
13
14import ema_workbench.analysis.prim as prim
15from ema_workbench import ema_logging, load_results
16
17#
18# .. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
19
20ema_logging.log_to_stderr(level=ema_logging.INFO)
21
22# load data
23fn = r"./data/1000 flu cases no policy.tar.gz"
24x, outcomes = load_results(fn)
25
26# specify y
27y = outcomes["deceased_population_region_1"][:, -1] > 1000000
28
29rotated_experiments, rotation_matrix = prim.pca_preprocess(x, y, exclude=["model", "policy"])
30
31# perform prim on modified results tuple
32prim_obj = prim.Prim(rotated_experiments, y, threshold=0.8)
33box = prim_obj.find_box()
34
35box.show_tradeoff()
36box.inspect(22)
37plt.show()

sd_prim_bryant_and_lempert.py

"""
Created on 12 Nov 2018

@author: jhkwakkel
"""

import matplotlib.pyplot as plt
import pandas as pd

from ema_workbench.analysis import prim
from ema_workbench.util import ema_logging

ema_logging.log_to_stderr(ema_logging.INFO)

data = pd.read_csv("./data/bryant et al 2010 data.csv", index_col=False)
x = data.iloc[:, 2:11]
y = data.iloc[:, 15].values

prim_alg = prim.Prim(x, y, threshold=0.8, peel_alpha=0.1)
box1 = prim_alg.find_box()

box1.show_tradeoff()
print(box1.resample(21))
box1.inspect(21)
box1.inspect(21, style="graph")
box1.show_pairs_scatter(21)

plt.show()

sd_prim_constrained.py

"""
A short example on how to use the constrained prim function.

For more details see Kwakkel (2019) A generalized many‐objective optimization
approach for scenario discovery, doi: https://doi.org/10.1002/ffo2.8
"""

import matplotlib.pyplot as plt
import pandas as pd

from ema_workbench.analysis import prim
from ema_workbench.util import ema_logging

ema_logging.log_to_stderr(ema_logging.INFO)

data = pd.read_csv("./data/bryant et al 2010 data.csv", index_col=False)
x = data.iloc[:, 2:11]
y = data.iloc[:, 15].values

box = prim.run_constrained_prim(x, y, peel_alpha=0.1)

box.show_tradeoff()
box.inspect(35)
box.inspect(35, style="graph")

plt.show()

sd_prim_flu.py

"""
This file illustrates the use of the workbench for doing
a PRIM analysis.

The data was generated using a system dynamics model implemented in Vensim.
See flu_example.py for the code.

.. codeauthor:: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
                chamarat <c.hamarat (at) tudelft (dot) nl>
"""

import matplotlib.pyplot as plt

import ema_workbench.analysis.prim as prim
from ema_workbench import ema_logging, load_results

ema_logging.log_to_stderr(level=ema_logging.INFO)


def classify(data):
    # get the output for deceased population
    ooi = data["deceased_population_region_1"]
    return ooi[:, -1] > 1000000


# load data
fn = r"./data/1000 flu cases no policy.tar.gz"
results = load_results(fn)

# perform prim on modified results tuple
prim_obj = prim.setup_prim(results, classify, threshold=0.8, threshold_type=1)

box_1 = prim_obj.find_box()
box_1.show_ppt()
box_1.show_tradeoff()
# box_1.inspect([5, 6], style="graph", boxlim_formatter="{: .2f}")

fig, axes = plt.subplots(nrows=2, ncols=1)

box_1.inspect([5, 6], style="graph", boxlim_formatter="{: .2f}", ax=axes)
plt.show()

box_1.inspect(5)
box_1.select(5)
box_1.write_ppt_to_stdout()
box_1.show_pairs_scatter(5)

# print prim to stdout
print(prim_obj.stats_to_dataframe())
print(prim_obj.boxes_to_dataframe())

# visualize
prim_obj.show_boxes()
plt.show()

sd_prim_wcm.py

"""
Created on Feb 13, 2014

This example demonstrates the use of PRIM. The dataset was generated
using the world container model

(Tavasszy et al 2011; https://dx.doi.org/10.1016/j.jtrangeo.2011.05.005)
"""

import matplotlib.pyplot as plt

from ema_workbench import ema_logging, load_results
from ema_workbench.analysis import prim

ema_logging.log_to_stderr(ema_logging.INFO)

default_flow = 2.178849944502783e7


def classify(outcomes):
    ooi = "throughput Rotterdam"
    outcome = outcomes[ooi]
    outcome = outcome / default_flow

    classes = outcome < 1
    return classes


fn = r"./data/5000 runs WCM.tar.gz"
results = load_results(fn)

prim_obj = prim.setup_prim(results, classify, mass_min=0.05, threshold=0.75)

# let's find a first box
box1 = prim_obj.find_box()

# let's analyze the peeling trajectory
box1.show_ppt()
box1.show_tradeoff()
box1.inspect_tradeoff()

box1.write_ppt_to_stdout()

# based on the peeling trajectory, we pick entry number 44
box1.select(44)

# show the resulting box
prim_obj.show_boxes()
prim_obj.boxes_to_dataframe()

plt.show()

timeseries_clustering_flu.py

"""
Created on 11 Apr 2019

@author: jhkwakkel
"""

import matplotlib.pyplot as plt
import seaborn as sns

from ema_workbench import load_results
from ema_workbench.analysis import clusterer, plotting, Density

experiments, outcomes = load_results("./data/1000 flu cases no policy.tar.gz")
data = outcomes["infected_fraction_R1"]

# calculate distances
distances = clusterer.calculate_cid(data)

# plot dendrogram
clusterer.plot_dendrogram(distances)

# do agglomerative clustering on the distances
clusters = clusterer.apply_agglomerative_clustering(distances, n_clusters=5)

# show the clusters in the output space
x = experiments.copy()
x["clusters"] = clusters.astype("object")
plotting.lines(x, outcomes, group_by="clusters", density=Density.BOXPLOT)

# show the input space
sns.pairplot(
    x,
    hue="clusters",
    vars=[
        "infection ratio region 1",
        "root contact rate region 1",
        "normal contact rate region 1",
        "recovery time region 1",
        "permanent immune population fraction R1",
    ],
    plot_kws=dict(s=7),
)
plt.show()

Best practices

Separate experimentation and analysis

It is strongly recommended to cleanly separate the various steps in your exploratory modeling pipeline: separately execute your experiments or perform your optimization, save the results, and only then analyze these results. Moreover, since parallel execution can be troublesome within the Jupyter Lab / notebook environment, I personally run my experiments and optimizations either from the command line or through an IDE using a normal Python file. Jupyter Lab is then used to analyze the results.
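The pattern can be sketched in plain Python, using pickle as a stand-in for the workbench's save_results and load_results helpers (the file name and result content here are hypothetical):

```python
import pickle
import tempfile
from pathlib import Path


def run_experiments(results_file):
    # stage 1: execute experiments and persist the results; no analysis here
    results = {"outcome": [1.0, 2.0, 3.0]}  # stand-in for perform_experiments(...)
    with open(results_file, "wb") as f:
        pickle.dump(results, f)


def analyze(results_file):
    # stage 2, typically in a notebook: load the persisted results and analyze
    with open(results_file, "rb") as f:
        results = pickle.load(f)
    return max(results["outcome"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.pickle"
    run_experiments(path)
    best = analyze(path)  # the two stages only share the saved file
print(best)
```

In an actual workbench project, stage 1 would call perform_experiments and save_results from a plain Python script, and stage 2 would call load_results from the notebook.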

Keeping things organized

A frequently recurring cause of problems when using the workbench stems from not properly organizing your files. In particular when using multiprocessing, it is key that you keep things well organized. The way the workbench works with multiprocessing is that it copies the entire working directory of the model to a temporary folder for each subprocess. This temporary folder is located in the same folder as the Python or notebook file from which you are running. If the working directory of your model is the same as the directory in which the run file resides, you can easily fill up your hard disk in minutes. To avoid these kinds of problems, I suggest using a directory structure as outlined below.

project
├── model_files
│   ├── a_model.nlogo
│   └── some_input.csv
├── results
│   ├── 100k_nfe_seed1.csv
│   └── 1000_experiments.tar.gz
├── figures
│   └── pairwise_scatter.png
├── experiments.py
├── optimization.py
├── analysis.ipynb
└── model_definition.py

Also, if you are not familiar with absolute and relative paths, please read up on that first and only use relative paths when using the workbench. Not only will this reduce the possibility of errors, it will also make moving your code from one machine to another a lot easier.
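As a small illustration of the relative-path idea, assuming the directory layout above (the file names are the hypothetical ones from the tree):

```python
from pathlib import Path

# anchor on the location of the script rather than on the current working
# directory; in a notebook, where __file__ is undefined, fall back to the cwd
project_dir = Path(__file__).parent if "__file__" in globals() else Path.cwd()

model_file = project_dir / "model_files" / "a_model.nlogo"
results_file = project_dir / "results" / "1000_experiments.tar.gz"

print(model_file.name, results_file.parts[-2])
```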

Vensim Tips and Tricks

Debugging a model

A commonly occurring problem is that some of the runs of a Vensim model do not complete correctly. In the logger, we see a message stating that a run did not complete correctly, with a description of the case that caused the failure attached to it. Typically, this error is due to a division by zero somewhere in the model during the simulation. The easiest way of finding the source of the division by zero is via Vensim itself. However, this requires that the model is parameterized as specified by the case that created the error. It is of course possible to set all the parameters by hand, but this becomes tedious for larger models, or if one has to do it multiple times. Since the Vensim DLL does not have a way to save a model, we cannot use the DLL. Instead, we can use the fact that one can save a Vensim model as a text file. By changing the required parameters in this text file via the workbench, we can then open the modified model in Vensim and spot the error.

The following script can be used for this purpose.

"""
Created on 11 aug. 2011

.. codeauthor:: wauping <w.auping (at) student (dot) tudelft (dot) nl>
                jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>


To be able to debug the Vensim model, a few steps are needed:

    1.  The case that gave a bug needs to be saved in a text file. The entire
        case description should be on a single line.
    2.  Reform and clean your model (in the Vensim menu: Model, Reform and
        Clean). Choose

         * Equation Order: Alphabetical by group (not really necessary)
         * Equation Format: Terse

    3.  Save your model as text (File, Save as..., Save as Type: Text Format
        Models)
    4.  Run this script
    5.  If the print at the end is not set([]), but set([array]), the array
        gives the values that were not found and changed
    6.  Run your new model (for example 'new text.mdl')
    7.  Vensim tells you about your critical mistake

"""

fileSpecifyingError = ""

pathToExistingModel = "/salinization/Verzilting_aanpassingen incorrect.mdl"
pathToNewModel = "/models/salinization/Verzilting_aanpassingen correct.mdl"
newModel = open(pathToNewModel, "w")

# line = open(fileSpecifyingError).read()

line = "rainfall : 0.154705633188; adaptation time from non irrigated agriculture : 0.915157119079; salt effect multiplier : 1.11965969891; adaptation time to non irrigated agriculture : 0.48434342934; adaptation time to irrigated agriculture : 0.330990830832; water shortage multiplier : 0.984356102036; delay time salt seepage : 6.0; adaptation time : 6.90258192256; births multiplier : 1.14344734715; diffusion lookup : [(0, 8.0), (10, 8.0), (20, 8.0), (30, 8.0), (40, 7.9999999999999005), (50, 4.0), (60, 9.982194802803703e-14), (70, 1.2455526635140464e-27), (80, 1.5541686655435471e-41), (90, 1.9392517969836692e-55)]; salinity effect multiplier : 1.10500381093; technological developments in irrigation : 0.0117979353255; adaptation time from irrigated agriculture : 1.58060947607; food shortage multiplier : 0.955325345996; deaths multiplier : 0.875605669911; "

# we assume the case specification was copied from the logger;
# the name : value entries are separated by semicolons
splitOne = line.split(";")
variable = {}
for n in range(len(splitOne) - 1):
    splitTwo = splitOne[n].split(":")
    variableElement = splitTwo[0]
    # delete the spaces and other rubbish on the sides of the variable name
    variableElement = variableElement.lstrip()
    variableElement = variableElement.lstrip("'")
    variableElement = variableElement.rstrip()
    variableElement = variableElement.rstrip("'")
    print(variableElement)
    valueElement = splitTwo[1]
    valueElement = valueElement.lstrip()
    valueElement = valueElement.rstrip()
    variable[variableElement] = valueElement
print(variable)

# this generates a new (text-formatted) model
settedValues = []
for line in open(pathToExistingModel):
    if line.find("=") != -1:
        elements = line.split("=")
        value = elements[0]
        value = value.strip()
        if value in variable:
            elements[1] = variable.get(value)
            # restore the newline that was lost when splitting the line
            line = elements[0] + " = " + elements[1] + "\n"
            settedValues.append(value)

    newModel.write(line)
newModel.close()  # in order to be able to open the model in Vensim
notSet = set(variable.keys()) - set(settedValues)
print(notSet)

Glossary

parameter uncertainty

An uncertainty is a parameter uncertainty if the range is continuous from the lower bound to the upper bound. A parameter uncertainty can be either real valued or discrete valued.

categorical uncertainty

An uncertainty is categorical if there is not a range but a set of possibilities over which one wants to sample.

lookup uncertainty

Vensim-specific extension to categorical uncertainty for handling lookups in various ways

uncertainty space

the space created by the set of uncertainties

ensemble

a python class responsible for running a series of computational experiments.

model interface

a python class that provides an interface to an underlying model

working directory

a directory that contains files that a model needs

classification trees

a category of machine learning algorithms for rule induction

prim (patient rule induction method)

a rule induction algorithm

coverage

a metric developed for scenario discovery

density

a metric developed for scenario discovery

scenario discovery

a use case of EMA

case

A case specifies the input parameters for a run of a model. It is a dict instance, with the names of the uncertainties as key, and their sampled values as value.
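For illustration, a case for the flu model used in the examples above might look like this (the values are made up):

```python
# keys are uncertainty names, values are their sampled values
case = {
    "infection ratio region 1": 0.12,
    "normal contact rate region 1": 4.0,
}

for name, value in sorted(case.items()):
    print(f"{name}: {value}")
```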

experiment

An experiment is a complete specification for a run. It specifies the case, the name of the policy, and the name of the model.

policy

a policy is by definition an object with a name attribute. So, policy['name'] must return the name of the policy

result

the combination of an experiment and the associated outcomes for the experiment

outcome

the data of interest produced by a model given an experiment

EMA Modules

Exploratory modeling framework

model

This module specifies the abstract base class for interfacing with models. Any model that is to be controlled from the workbench is controlled via an instance of an extension of this abstract base class.

class ema_workbench.em_framework.model.AbstractModel(name)

ModelStructureInterface is one of the two main classes used for performing EMA. This is an abstract base class and cannot be used directly.

uncertainties

list of parameter instances

Type:

list

levers

list of parameter instances

Type:

list

outcomes

list of outcome instances

Type:

list

name

alphanumerical name of model structure interface

Type:

str

output

this should be a dict with the names of the outcomes as key

Type:

dict

When extending this class, model_init() and run_model() have to be implemented.
as_dict()

returns a dict representation of the model

cleanup()

This method is called after finishing all the experiments, but just prior to returning the results. This method gives a hook for doing any cleanup, such as closing applications.

In case of running in parallel, this method is called during the cleanup of the pool, just prior to removing the temporary directories.

initialized(policy)

check if model has been initialized

Parameters:

policy (a Policy instance)

model_init(policy)

Method called to initialize the model.

Parameters:

policy (dict) – policy to be run.

Note

This method should always be implemented, although in simple cases a simple pass can suffice.

reset_model()

Method for resetting the model to its initial state. The default implementation only sets the outputs to an empty dict.

retrieve_output()

Method for retrieving output after a model run. Deprecated, will be removed in version 3.0 of the EMA Workbench.

Return type:

dict with the results of a model run.

run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters:
  • scenario (Scenario instance)

  • policy (Policy instance)

class ema_workbench.em_framework.model.FileModel(name, wd=None, model_file=None)
as_dict()

returns a dict representation of the model

class ema_workbench.em_framework.model.Model(name, function=None)
class ema_workbench.em_framework.model.Replicator(name)
run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters:
  • scenario (Scenario instance)

  • policy (Policy instance)

class ema_workbench.em_framework.model.ReplicatorModel(name, function=None)
class ema_workbench.em_framework.model.SingleReplication(name)
run_model(scenario, policy)

Method for running an instantiated model structure.

Parameters:
  • scenario (Scenario instance)

  • policy (Policy instance)

parameters

parameters and related helper classes and functions

class ema_workbench.em_framework.parameters.BooleanParameter(name, default=None, variable_name=None, pff=False)

boolean model input parameter

A BooleanParameter is similar to a CategoricalParameter, except the category values can only be True or False.

Parameters:
  • name (str)

  • variable_name (str, or list of str)

class ema_workbench.em_framework.parameters.CategoricalParameter(name, categories, default=None, variable_name=None, pff=False, multivalue=False)

categorical model input parameter

Parameters:
  • name (str)

  • categories (collection of obj)

  • variable_name (str, or list of str)

  • multivalue (boolean) – if categories have a set of values, for each variable_name a different one.

cat_for_index(index)

return category given index

Parameters:

index (int)

Return type:

object

from_dist(name, dist)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters:
  • dist (scipy stats frozen dist)

  • **kwargs (valid keyword arguments for Parameter instance)

index_for_cat(category)

return index of category

Parameters:

category (object)

Return type:

int

class ema_workbench.em_framework.parameters.Constant(name, value)

Constant class,

can be used for any parameter that has to be set to a fixed value

class ema_workbench.em_framework.parameters.IntegerParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

integer valued model input parameter

Parameters:
  • name (str)

  • lower_bound (int)

  • upper_bound (int)

  • resolution (iterable)

  • variable_name (str, or list of str)

  • pff (bool) – if true, sample over this parameter using resolution in case of partial factorial sampling

Raises:
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound, or not an integer instance

  • ValueError – if lower_bound or upper_bound is not an integer instance

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters:
  • dist (scipy stats frozen dist)

  • **kwargs (valid keyword arguments for Parameter instance)

class ema_workbench.em_framework.parameters.Parameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

Base class for any model input parameter

Parameters:
  • name (str)

  • lower_bound (int or float)

  • upper_bound (int or float)

  • resolution (collection)

  • pff (bool) – if true, sample over this parameter using resolution in case of partial factorial sampling

Raises:
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters:
  • dist (scipy stats frozen dist)

  • **kwargs (valid keyword arguments for Parameter instance)

class ema_workbench.em_framework.parameters.RealParameter(name, lower_bound, upper_bound, resolution=None, default=None, variable_name=None, pff=False)

real valued model input parameter

Parameters:
  • name (str)

  • lower_bound (int or float)

  • upper_bound (int or float)

  • resolution (iterable)

  • variable_name (str, or list of str)

  • pff (bool) – if true, sample over this parameter using resolution in case of partial factorial sampling

Raises:
  • ValueError – if lower bound is larger than upper bound

  • ValueError – if entries in resolution are outside range of lower_bound and upper_bound

classmethod from_dist(name, dist, **kwargs)

alternative constructor for creating a parameter from a frozen scipy.stats distribution directly

Parameters:
  • dist (scipy stats frozen dist)

  • **kwargs (valid keyword arguments for Parameter instance)

ema_workbench.em_framework.parameters.parameters_from_csv(uncertainties, **kwargs)

Helper function for creating many Parameters based on a DataFrame or csv file

Parameters:
  • uncertainties (str, DataFrame)

  • **kwargs (dict, arguments to pass to pandas.read_csv)

Return type:

list of Parameter instances

This helper function creates uncertainties. It assumes that the DataFrame or csv file has a column titled 'name'; optionally, a type column {int, real, cat} can be included as well. The remainder of the columns are handled as values for the parameters. If type is not specified, the function will try to infer the type from the values.

Note that this function does not support the resolution and default kwargs on parameters.

An example of a csv:

NAME,TYPE,,,
a_real,real,0,1.1,
an_int,int,1,9,
a_categorical,cat,a,b,c

this CSV file would result in

[RealParameter('a_real', 0, 1.1, resolution=[], default=None),
 IntegerParameter('an_int', 1, 9, resolution=[], default=None),
 CategoricalParameter('a_categorical', ['a', 'b', 'c'], default=None)]
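How rows map to parameters can be sketched with a simplified parser (an illustration of the CSV format, not the workbench implementation):

```python
import csv
import io

CSV = """NAME,TYPE,,,
a_real,real,0,1.1,
an_int,int,1,9,
a_categorical,cat,a,b,c
"""


def parse_rows(text):
    # each row: name, type, then the values; empty cells are padding
    rows = list(csv.reader(io.StringIO(text)))
    parsed = []
    for row in rows[1:]:  # skip the header
        name, kind, *values = [cell for cell in row if cell != ""]
        parsed.append((name, kind, values))
    return parsed


params = parse_rows(CSV)
print(params)
```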

ema_workbench.em_framework.parameters.parameters_to_csv(parameters, file_name)

Helper function for writing a collection of parameters to a csv file

Parameters:
  • parameters (collection of Parameter instances)

  • file_name (str)

The function iterates over the collection and turns these into a data frame prior to storing them. The resulting csv can be loaded using the parameters_from_csv function. Note that currently we don’t store resolution and default attributes.

outcomes

Module for outcome classes

class ema_workbench.em_framework.outcomes.AbstractOutcome(name, kind=0, variable_name=None, function=None, expected_range=None, shape=None, dtype=None)

Base Outcome class

Parameters:
  • name (str) – Name of the outcome.

  • kind ({INFO, MINIMIZE, MAXIMIZE}, optional)

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None} optional) – must be used in conjunction with dtype. Enables pre-allocation of data structure for storing results.

  • dtype (datatype, optional) – must be used in conjunction with shape. Enables pre-allocation of data structure for storing results.

name
Type:

str

kind
Type:

int

variable_name
Type:

str

function
Type:

callable

shape
Type:

tuple

dtype
Type:

datatype

abstract classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters:
  • filename (str)

  • archive (Tarfile)

abstract classmethod to_disk(values)

helper function for writing outcome to disk

Parameters:

values (obj) – data to store

Return type:

BytesIO

class ema_workbench.em_framework.outcomes.ArrayOutcome(name, variable_name=None, function=None, expected_range=None, shape=None, dtype=None)

Array Outcome class for n-dimensional arrays

Parameters:
  • name (str) – Name of the outcome.

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None} optional) – must be used in conjunction with dtype. Enables pre-allocation of data structure for storing results.

  • dtype (datatype, optional) – must be used in conjunction with shape. Enables pre-allocation of data structure for storing results.

name
Type:

str

kind
Type:

int

variable_name
Type:

str

function
Type:

callable

shape
Type:

tuple

expected_range
Type:

tuple

dtype
Type:

datatype

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters:
  • filename (str)

  • archive (Tarfile)

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters:

values (ND array)

Returns:

  • BytesIO

  • filename

class ema_workbench.em_framework.outcomes.Constraint(name, parameter_names=None, outcome_names=None, function=None)

Constraints class that can be used when defining constrained optimization problems.

Parameters:
  • name (str)

  • parameter_names (str or collection of str)

  • outcome_names (str or collection of str)

  • function (callable)

name
Type:

str

parameter_names

name(s) of the uncertain parameter(s) and/or lever parameter(s) to which the constraint applies

Type:

str, list of str

outcome_names

name(s) of the outcome(s) to which the constraint applies

Type:

str, list of str

function

The function should return the distance from the feasibility threshold, given the values of the named parameters and/or outcomes. The distance should be 0 if the constraint is met.

Type:

callable
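The distance convention can be illustrated with a plain function of the kind a Constraint expects (the threshold value here is hypothetical):

```python
def reliability_constraint(reliability):
    # distance from the feasibility threshold: 0 when the constraint is met
    # (reliability >= 0.8), otherwise the shortfall below the threshold
    return max(0.0, 0.8 - reliability)


feasible = reliability_constraint(0.9)
infeasible = reliability_constraint(0.5)
print(feasible, infeasible)
```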

class ema_workbench.em_framework.outcomes.ScalarOutcome(name, kind=0, variable_name=None, function=None, expected_range=None, dtype=None)

Scalar Outcome class

Parameters:
  • name (str) – Name of the outcome.

  • kind ({INFO, MINIMIZE, MAXIMIZE}, optional)

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform post processing on data retrieved from model

  • expected_range (collection, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • dtype (datatype, optional) – Enables pre-allocation of data structure for storing results.

name
Type:

str

kind
Type:

int

variable_name
Type:

str

function
Type:

callable

shape
Type:

tuple

expected_range
Type:

tuple

dtype
Type:

datatype

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters:
  • filename (str)

  • archive (Tarfile)

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters:

values (1D array)

Returns:

  • BytesIO

  • filename

class ema_workbench.em_framework.outcomes.TimeSeriesOutcome(name, variable_name=None, function=None, expected_range=None, shape=None, dtype=None)

TimeSeries Outcome class for 1D arrays

Parameters:
  • name (str) – Name of the outcome.

  • variable_name (str, optional) – if the name of the outcome in the underlying model is different from the name of the outcome, you can supply the variable name as an optional argument, if not provided, defaults to name

  • function (callable, optional) – a callable to perform postprocessing on data retrieved from model

  • expected_range (2 tuple, optional) – expected min and max value for outcome, used by HyperVolume convergence metric

  • shape ({tuple, None} optional) – must be used in conjunction with dtype. Enables pre-allocation of data structure for storing results.

  • dtype (datatype, optional) – must be used in conjunction with shape. Enables pre-allocation of data structure for storing results.

name
Type:

str

kind
Type:

int

variable_name
Type:

str

function
Type:

callable

shape
Type:

tuple

expected_range
Type:

tuple

dtype
Type:

datatype

Notes

Time series outcomes are currently assumed to be 1D arrays. If you are dealing with higher dimensional outputs (e.g., multiple replications resulting in 2D arrays), use ArrayOutcome instead.

classmethod from_disk(filename, archive)

helper function for loading from disk

Parameters:
  • filename (str)

  • archive (Tarfile)

classmethod to_disk(values)

helper function for writing outcome to disk

Parameters:

values (DataFrame)

Returns:

  • StringIO

  • filename

evaluators

collection of evaluators for performing experiments, optimization, and robust optimization

class ema_workbench.em_framework.evaluators.Samplers(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Enum for different kinds of samplers

class ema_workbench.em_framework.evaluators.SequentialEvaluator(msis)
evaluate_experiments(scenarios, policies, callback, combine='factorial')

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

ema_workbench.em_framework.evaluators.optimize(models, algorithm=<class 'platypus.algorithms.EpsNSGAII'>, nfe=10000, searchover='levers', evaluator=None, reference=None, convergence=None, constraints=None, convergence_freq=1000, logging_freq=5, variator=None, **kwargs)

optimize the model

Parameters:
  • models (1 or more Model instances)

  • algorithm (a valid Platypus optimization algorithm)

  • nfe (int)

  • searchover ({'uncertainties', 'levers'})

  • evaluator (evaluator instance)

  • reference (Policy or Scenario instance, optional) – overwrite the default scenario in case of searching over levers, or default policy in case of searching over uncertainties

  • convergence (function or collection of functions, optional)

  • constraints (list, optional)

  • convergence_freq (int) – nfe between convergence check

  • logging_freq (int) – number of generations between logging of progress

  • variator (platypus GAOperator instance, optional) – if None, it falls back on the defaults in platypus, which is SBX with PM

  • kwargs (any additional arguments will be passed on to algorithm)

Return type:

pandas DataFrame

Raises:
  • EMAError if searchover is not one of 'uncertainties' or 'levers'

  • NotImplementedError if len(models) > 1

ema_workbench.em_framework.evaluators.perform_experiments(models, scenarios=0, policies=0, evaluator=None, reporting_interval=None, reporting_frequency=10, uncertainty_union=False, lever_union=False, outcome_union=False, uncertainty_sampling=Samplers.LHS, lever_sampling=Samplers.LHS, callback=None, return_callback=False, combine='factorial', log_progress=False, **kwargs)

sample uncertainties and levers, and perform the resulting experiments on each of the models

Parameters:
  • models (one or more AbstractModel instances)

  • scenarios (int or collection of Scenario instances, optional)

  • policies (int or collection of Policy instances, optional)

  • evaluator (evaluator instance, optional) – additional keyword arguments are passed on to evaluate_experiments of the evaluator

  • reporting_interval (int, optional)

  • reporting_frequency (int, optional)

  • uncertainty_union (boolean, optional)

  • lever_union (boolean, optional)

  • outcome_union (boolean, optional)

  • uncertainty_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional)

  • lever_sampling ({LHS, MC, FF, PFF, SOBOL, MORRIS, FAST}, optional)

  • callback (Callback instance, optional)

  • return_callback (boolean, optional)

  • log_progress (bool, optional)

  • combine ({'factorial', 'zipover'}, optional) – how to combine uncertainties and levers? In case of 'factorial', both are sampled separately using their respective samplers. Next, the resulting designs are combined in a full factorial manner. In case of 'zipover', both are sampled separately and then combined by cycling over the shorter of the two sets of designs until the longer set of designs is exhausted.

Returns:

the experiments as a dataframe, and a dict with the name of an outcome as key, and the associated values as numpy array. Experiments and outcomes are aligned on index.

Return type:

tuple
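The two combine modes can be sketched in plain Python: 'factorial' is a full cross product, while 'zipover' cycles the shorter collection until the longer one is exhausted (an illustrative re-implementation, not the workbench code):

```python
from itertools import cycle, product


def combine(scenarios, policies, mode="factorial"):
    if mode == "factorial":
        # every scenario is paired with every policy
        return list(product(scenarios, policies))
    # zipover: cycle over the shorter collection until the longer is exhausted
    pairs = zip(cycle(scenarios), cycle(policies))
    return [next(pairs) for _ in range(max(len(scenarios), len(policies)))]


scenarios = ["s1", "s2", "s3"]
policies = ["p1", "p2"]
factorial = combine(scenarios, policies, "factorial")
zipover = combine(scenarios, policies, "zipover")
print(len(factorial), zipover)
```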

optimization

class ema_workbench.em_framework.optimization.ArchiveLogger(directory, decision_varnames, outcome_varnames, base_filename='archives.tar.gz')

Helper class to write the archive to disk at each iteration

Parameters:
  • directory (str)

  • decision_varnames (list of str)

  • outcome_varnames (list of str)

  • base_filename (str, optional)

classmethod load_archives(filename)

load the archives stored with the ArchiveLogger

Parameters:

filename (str) – relative path to file

Return type:

dict with nfe as key and dataframe as value

class ema_workbench.em_framework.optimization.Convergence(metrics, max_nfe, convergence_freq=1000, logging_freq=5, log_progress=False)

helper class for tracking convergence of optimization

class ema_workbench.em_framework.optimization.EpsilonIndicatorMetric(reference_set, problem, **kwargs)

EpsilonIndicator metric

Parameters:
  • reference_set (DataFrame)

  • problem (PlatypusProblem instance)

this is a thin wrapper around EpsilonIndicator as provided by platypus to make it easier to use in conjunction with the workbench.

class ema_workbench.em_framework.optimization.EpsilonProgress

epsilon progress convergence metric class

class ema_workbench.em_framework.optimization.GenerationalDistanceMetric(reference_set, problem, **kwargs)

GenerationalDistance metric

Parameters:
  • reference_set (DataFrame)

  • problem (PlatypusProblem instance)

  • d (int, default=1) – the power in the generational distance function

This is a thin wrapper around GenerationalDistance as provided by platypus to make it easier to use in conjunction with the workbench.

see https://link.springer.com/content/pdf/10.1007/978-3-319-15892-1_8.pdf for more information

class ema_workbench.em_framework.optimization.HypervolumeMetric(reference_set, problem, **kwargs)

Hypervolume metric

Parameters:
  • reference_set (DataFrame)

  • problem (PlatypusProblem instance)

this is a thin wrapper around Hypervolume as provided by platypus to make it easier to use in conjunction with the workbench.

class ema_workbench.em_framework.optimization.InvertedGenerationalDistanceMetric(reference_set, problem, **kwargs)

InvertedGenerationalDistance metric

Parameters:
  • reference_set (DataFrame)

  • problem (PlatypusProblem instance)

  • d (int, default=1) – the power in the inverted generational distance function

This is a thin wrapper around InvertedGenerationalDistance as provided by platypus to make it easier to use in conjunction with the workbench.

see https://link.springer.com/content/pdf/10.1007/978-3-319-15892-1_8.pdf for more information

class ema_workbench.em_framework.optimization.OperatorProbabilities(name, index)

OperatorProbabiliy convergence tracker for use with auto adaptive operator selection.

Parameters:
  • name (str)

  • index (int)

State of the art MOEAs like Borg (and GenerationalBorg provided by the workbench) use autoadaptive operator selection. The algorithm has multiple different evolutionary operators. Over the run, it tracks how well each operator is doing in producing fitter offspring. The probability of the algorithm using a given evolutionary operator is proportional to how well this operator has been doing in producing fitter offspring in recent generations. This class can be used to track these probabilities over the run of the algorithm.

class ema_workbench.em_framework.optimization.Problem(searchover, parameters, outcome_names, constraints, reference=None)

small extension to Platypus problem object, includes information on the names of the decision variables, the names of the outcomes, and the type of search

class ema_workbench.em_framework.optimization.RobustProblem(parameters, outcome_names, scenarios, robustness_functions, constraints)

small extension to Problem object for robust optimization, adds the scenarios and the robustness functions

class ema_workbench.em_framework.optimization.SpacingMetric(problem)

Spacing metric

Parameters:

problem (PlatypusProblem instance)

this is a thin wrapper around Spacing as provided by platypus to make it easier to use in conjunction with the workbench.

ema_workbench.em_framework.optimization.epsilon_nondominated(results, epsilons, problem)

Merge the list of results into a single set of non dominated results using the provided epsilon values

Parameters:
  • results (list of DataFrames)

  • epsilons (epsilon values for each objective)

  • problem (PlatypusProblem instance)

Return type:

DataFrame

Notes

this is a platypus based alternative to pareto.py (https://github.com/matthewjwoodruff/pareto.py)

ema_workbench.em_framework.optimization.rebuild_platypus_population(archive, problem)

rebuild a population of platypus Solution instances

Parameters:
  • archive (DataFrame)

  • problem (PlatypusProblem instance)

Return type:

list of platypus Solutions

ema_workbench.em_framework.optimization.to_problem(model, searchover, reference=None, constraints=None)

helper function to create Problem object

Parameters:
  • model (AbstractModel instance)

  • searchover (str)

  • reference (Policy or Scenario instance, optional) – overwrite the default scenario in case of searching over levers, or default policy in case of searching over uncertainties

  • constraints (list, optional)

Return type:

Problem instance

ema_workbench.em_framework.optimization.to_robust_problem(model, scenarios, robustness_functions, constraints=None)

helper function to create RobustProblem object

Parameters:
  • model (AbstractModel instance)

  • scenarios (collection)

  • robustness_functions (iterable of ScalarOutcomes)

  • constraints (list, optional)

Return type:

RobustProblem instance

outputspace_exploration

Provides a genetic algorithm based on novelty search for output space exploration.

The algorithm is inspired by Chérel et al (2015). In short, from Chérel et al, we have taken the idea of the HitBox. Basically, this is an epsilon archive where one keeps track of how many solutions have fallen into each grid cell. Next, tournament selection based on novelty is used as the selective pressure. Novelty is defined as 1/nr. of solutions in the same grid cell. This is then combined with auto-adaptive population sizing as used in e-NSGAII. This replaces the use of adaptive Cauchy mutation as used by Chérel et al. There is also a more sophisticated algorithm that adds auto-adaptive operator selection as used in BORG.

The algorithm can be used in combination with the optimization functionality of the workbench. Just pass an OutputSpaceExploration instance as algorithm to optimize.

class ema_workbench.em_framework.outputspace_exploration.AutoAdaptiveOutputSpaceExploration(problem, grid_spec=None, population_size=100, generator=<platypus.operators.RandomGenerator object>, variator=None, **kwargs)

A combination of auto-adaptive operator selection with OutputSpaceExploration.

The parametrization of all operators is based on the default values as used in Borg 1.9.

Parameters:
  • problem (a platypus Problem instance)

  • grid_spec (list of tuples) – with min, max, and epsilon for each outcome of interest

  • population_size (int, optional)

Notes

Limited to RealParameters only.

class ema_workbench.em_framework.outputspace_exploration.OutputSpaceExploration(problem, grid_spec=None, population_size=100, generator=<platypus.operators.RandomGenerator object>, variator=None, **kwargs)

Basic genetic algorithm for output space exploration using novelty search.

Parameters:
  • problem (a platypus Problem instance)

  • grid_spec (list of tuples) – with min, max, and epsilon for each outcome of interest

  • population_size (int, optional)

The algorithm defines novelty using an epsilon-like grid in the output space. Novelty is one divided by the number of seen solutions in a given grid cell. Tournament selection using novelty is used to create offspring. Crossover is done using simulated binary crossover and mutation is done using polynomial mutation.

The epsilon-like grid structure for tracking novelty is implemented using an archive, the Hit Box. Per epsilon grid cell, a single solution closest to the centre of the cell is maintained. This makes the algorithm behave virtually identically to ε-NSGAII. The archive is returned as results and epsilon progress is defined.

To deal with a stalled search, adaptive time continuation, identical to ε-NSGAII is used.

Notes

Output space exploration relies on the optimization functionality of the workbench. Therefore, outcomes of kind INFO are ignored. For output space exploration the direction (i.e. minimize or maximize) does not matter.
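The novelty definition used above (one divided by the number of solutions in the same grid cell) can be sketched as follows. This is a simplified illustration of the hit-box idea, assuming a grid_spec of (minimum, epsilon) pairs rather than the workbench’s (min, max, epsilon) tuples:

```python
def novelty_scores(outcomes, grid_spec):
    """Novelty = 1 / number of solutions sharing the same grid cell.
    outcomes is a list of points in output space; grid_spec gives
    (minimum, epsilon) per outcome dimension."""
    def cell(point):
        # map a point to its epsilon grid cell
        return tuple(int((v - lo) // eps) for v, (lo, eps) in zip(point, grid_spec))

    counts = {}
    for p in outcomes:
        c = cell(p)
        counts[c] = counts.get(c, 0) + 1
    return [1 / counts[cell(p)] for p in outcomes]

scores = novelty_scores([(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)],
                        [(0.0, 0.25), (0.0, 0.25)])
# the two points sharing a cell each score 0.5; the lone point scores 1.0
```

Tournament selection then prefers solutions with higher novelty, pushing the search towards unexplored regions of the output space.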

samplers

This module contains various classes that can be used for specifying different types of samplers. These different samplers implement basic sampling techniques including Full Factorial sampling, Latin Hypercube sampling, and Monte Carlo sampling.

class ema_workbench.em_framework.samplers.AbstractSampler

Abstract base class from which different samplers can be derived.

In the simplest cases, only the sample method needs to be overwritten. generate_designs is the only method called from outside. The other methods are used internally to generate the designs.

generate_designs(parameters, nr_samples)

external interface for Sampler. Returns the computational experiments over the specified parameters, for the given number of samples for each parameter.

Parameters:
  • parameters (list) – a list of parameters for which to generate the experimental designs

  • nr_samples (int) – the number of samples to draw for each parameter

Returns:

  • generator – a generator object that yields the designs resulting from combining the parameters

  • int – the number of experimental designs

generate_samples(parameters, size)

The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting designs.

Parameters:
  • parameters (collection) – a collection of RealParameter, IntegerParameter, and CategoricalParameter instances.

  • size (int) – the number of samples to generate.

Returns:

dict with the parameter.name as key, and the sample as value

Return type:

dict

sample(distribution, size)

method for sampling a number of samples from a particular distribution. The various samplers differ with respect to their implementation of this method.

Parameters:
  • distribution (scipy frozen distribution)

  • size (int) – the number of samples to generate

Returns:

the samples for the distribution and specified parameters

Return type:

numpy array

class ema_workbench.em_framework.samplers.DefaultDesigns(designs, parameters, n)

iterable for the experimental designs

class ema_workbench.em_framework.samplers.FullFactorialSampler

generates a full factorial sample.

If the parameter is non-categorical, the resolution is set to the number of samples. If the parameter is categorical, the specified value for samples will be ignored and each category will be used instead.
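A full factorial design is the Cartesian product of the per-parameter samples. The sketch below illustrates the idea with itertools; the actual sampler returns a generator of design dicts:

```python
from itertools import product

# per-parameter samples: hypothetical names and values for illustration
samples = {
    "a": [0.0, 0.5, 1.0],   # non-categorical: resolution = nr_samples
    "b": ["low", "high"],   # categorical: all categories are used
}

# full factorial: every combination of the per-parameter samples
designs = [dict(zip(samples, values)) for values in product(*samples.values())]
# 3 * 2 = 6 designs
```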

determine_nr_of_designs(sampled_parameters)

Helper function for determining the number of experiments that will be generated given the sampled parameters.

Parameters:

sampled_parameters (list) – a list of sampled parameters, as the values returned by generate_samples

Returns:

the total number of experimental designs

Return type:

int

generate_designs(parameters, nr_samples)

This method provides an alternative implementation to the default implementation provided by Sampler. This version returns a full factorial design across the parameters.

Parameters:
  • parameters (list) – a list of parameters for which to generate the experimental designs

  • nr_samples (int) – the number of intervals to use on each Parameter. Categorical parameters always return all their categories

Returns:

  • generator – a generator object that yields the designs resulting from combining the parameters

  • int – the number of experimental designs

generate_samples(parameters, size)

The main method of Sampler and its children. This will call the sample method for each of the parameters and return the resulting samples.

Parameters:
  • parameters (collection) – a collection of Parameter instances

  • size (int) – the number of samples to generate.

Returns:

dict with the parameter.name as key, and the sample as value

Return type:

dict

class ema_workbench.em_framework.samplers.LHSSampler

generates a Latin Hypercube sample for each of the parameters

sample(distribution, size)

generate a Latin Hypercube Sample.

Parameters:
  • distribution (scipy frozen distribution)

  • size (int) – the number of samples to generate

Returns:

dict with the parameter.name as key, and the sample as value

Return type:

dict

class ema_workbench.em_framework.samplers.MonteCarloSampler

generates a Monte Carlo sample for each of the parameters.

sample(distribution, size)

generate a Monte Carlo Sample.

Parameters:
  • distribution (scipy frozen distribution)

  • size (int) – the number of samples to generate

Returns:

dict with the parameter.name as key, and the sample as value

Return type:

dict

class ema_workbench.em_framework.samplers.UniformLHSSampler
generate_samples(parameters, size)
Parameters:
  • parameters (collection)

  • size (int)

Returns:

dict with the parameter.name as key, and the sample as value

Return type:

dict

ema_workbench.em_framework.samplers.determine_parameters(models, attribute, union=True)

determine the parameters over which to sample

Parameters:
  • models (a collection of AbstractModel instances)

  • attribute ({'uncertainties', 'levers'})

  • union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers

Return type:

collection of Parameter instances

ema_workbench.em_framework.samplers.sample_levers(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)

generate policies by sampling over the levers

Parameters:
  • models (a collection of AbstractModel instances)

  • n_samples (int)

  • union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers

  • sampler (Sampler instance, optional)

Return type:

generator yielding Policy instances

ema_workbench.em_framework.samplers.sample_parameters(parameters, n_samples, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>, kind=<class 'ema_workbench.em_framework.points.Point'>)

generate cases by sampling over the parameters

Parameters:
  • parameters (collection of AbstractParameter instances)

  • n_samples (int)

  • sampler (Sampler instance, optional)

  • kind ({Case, Scenario, Policy}, optional) – the class into which the samples are collected

Return type:

generator yielding Case, Scenario, or Policy instances

ema_workbench.em_framework.samplers.sample_uncertainties(models, n_samples, union=True, sampler=<ema_workbench.em_framework.samplers.LHSSampler object>)

generate scenarios by sampling over the uncertainties

Parameters:
  • models (a collection of AbstractModel instances)

  • n_samples (int)

  • union (bool, optional) – in case of multiple models, sample over the union of uncertainties, or over the intersection of the uncertainties

  • sampler (Sampler instance, optional)

Return type:

generator yielding Scenario instances

salib_samplers

Samplers for working with SALib

class ema_workbench.em_framework.salib_samplers.FASTSampler(m=4)

Sampler generating a Fourier Amplitude Sensitivity Test (FAST) using SALib

Parameters:

m (int (default: 4)) – The interference parameter, i.e., the number of harmonics to sum in the Fourier series decomposition

class ema_workbench.em_framework.salib_samplers.MorrisSampler(num_levels=4, optimal_trajectories=None, local_optimization=True)

Sampler generating a morris design using SALib

Parameters:
  • num_levels (int) – The number of grid levels

  • grid_jump (int) – The grid jump size

  • optimal_trajectories (int, optional) – The number of optimal trajectories to sample (between 2 and N)

  • local_optimization (bool, optional) – flag whether to use local optimization according to Ruano et al. (2012). Speeds up the process tremendously for bigger N and num_levels. Setting this variable to True causes the function to ignore gurobi.

class ema_workbench.em_framework.salib_samplers.SobolSampler(second_order=True)

Sampler generating a Sobol design using SALib

Parameters:

second_order (bool, optional) – indicates whether second order effects should be included

ema_workbench.em_framework.salib_samplers.get_SALib_problem(uncertainties)

returns a dict with a problem specification as required by SALib
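SALib expects the problem specification as a plain dict. The sketch below illustrates the kind of dict this helper produces for a set of real-valued uncertainties; the exact contents returned by get_SALib_problem may differ:

```python
# hypothetical uncertainties as (name, lower bound, upper bound) tuples
uncertainties = [("x1", 0.0, 1.0), ("x2", 10.0, 20.0)]

# the dict shape SALib's sample and analyze functions require
problem = {
    "num_vars": len(uncertainties),
    "names": [name for name, _, _ in uncertainties],
    "bounds": [[lo, hi] for _, lo, hi in uncertainties],
}
```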

experiment_runner

helper module for running experiments and keeping track of which model has been initialized with which policy.

class ema_workbench.em_framework.experiment_runner.ExperimentRunner(msis)

Helper class for running the experiments

This class contains the logic for initializing models properly, running the experiment, getting the results, and cleaning up afterwards.

Parameters:
  • msis (dict)

  • model_kwargs (dict)

msi_initializiation

keeps track of which model is initialized with which policy.

Type:

dict

msis

models indexed by name

Type:

dict

model_kwargs

keyword arguments for model_init

Type:

dict

run_experiment(experiment)

The logic for running a single experiment. This code makes sure that model(s) are initialized correctly.

Parameters:

experiment (Case instance)

Returns:

  • experiment_id (int)

  • case (dict)

  • policy (str)

  • model_name (str)

  • result (dict)

Raises:
  • EMAError – if the model instance raises an EMA error, these are reraised.

  • Exception – Catch all for all other exceptions being raised by the model. These are reraised.

callbacks

This module provides an abstract base class for a callback and a default implementation.

If you want to store the data in a way that is different from the functionality provided by the default callback, you can write your own extension of callback. For example, you can easily implement a callback that stores the data in, for example, a NoSQL database.

The only method to implement is the __call__ magic method. To retain logging of progress, always call super().
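The idea of a callable results store can be sketched as follows. This is an illustrative stand-in, not the actual AbstractCallback API (which also receives uncertainties, levers, and outcomes, and handles progress logging):

```python
class DictCallback:
    """Toy callback: stores the outcomes of each experiment in a dict."""

    def __init__(self):
        self.results = {}

    def __call__(self, experiment_id, outcomes):
        # store outcomes under the experiment id; a real callback could
        # instead write to a database or append to a file
        self.results[experiment_id] = outcomes

callback = DictCallback()
callback(0, {"y": 1.5})
callback(1, {"y": 2.5})
```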

class ema_workbench.em_framework.callbacks.AbstractCallback(uncertainties, levers, outcomes, nr_experiments, reporting_interval=None, reporting_frequency=10, log_progress=False)

Abstract base class from which different call back classes can be derived. Callback is responsible for storing the results of the runs.

Parameters:
  • uncertainties (list) – list of uncertain parameters

  • levers (list) – list of lever parameters

  • outcomes (list) – a list of outcomes

  • nr_experiments (int) – the total number of experiments to be executed

  • reporting_interval (int, optional) – the interval between progress logs

  • reporting_frequency (int, optional) – the total number of progress logs

  • log_progress (bool, optional) – if true, progress is logged, if false, use tqdm progress bar.

i

a counter that keeps track of how many experiments have been saved

Type:

int

nr_experiments
Type:

int

outcomes
Type:

list

parameters

combined list of uncertain parameters and lever parameters

Type:

list

reporting_interval

the interval between progress logs

Type:

int,

abstract get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

class ema_workbench.em_framework.callbacks.DefaultCallback(uncertainties, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10, log_progress=False)

Default callback class

Parameters:
  • uncertainties (list) – list of uncertain parameters

  • levers (list) – list of lever parameters

  • outcomes (list) – a list of outcomes

  • nr_experiments (int) – the total number of experiments to be executed

  • reporting_interval (int, optional) – the interval between progress logs

  • reporting_frequency (int, optional) – the total number of progress logs

  • log_progress (bool, optional) – if true, progress is logged, if false, use tqdm progress bar.

Callback can be used in perform_experiments as a means for specifying the way in which the results should be handled. If no callback is specified, this default implementation is used. This one can be overwritten or replaced with a callback of your own design. For example if you prefer to store the result in a database or write them to a text file.

get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

class ema_workbench.em_framework.callbacks.FileBasedCallback(uncertainties, levers, outcomes, nr_experiments, reporting_interval=100, reporting_frequency=10)

Callback that stores data in csv files while running the model

Parameters:
  • uncertainties (list) – list of uncertain parameters

  • levers (list) – list of lever parameters

  • outcomes (list) – a list of outcomes

  • nr_experiments (int) – the total number of experiments to be executed

  • reporting_interval (int, optional) – the interval between progress logs

  • reporting_frequency (int, optional) – the total number of progress logs

  • log_progress (bool, optional) – if true, progress is logged, if false, use tqdm progress bar.

Warning

This class is still in beta. The data is stored in ./temp, relative to the current working directory. If this directory already exists, it will be overwritten.

get_results()

method for retrieving the results. Called after all experiments have been completed. Any extension of AbstractCallback needs to implement this method.

points

classes for representing points in parameter space, as well as associated helper functions

class ema_workbench.em_framework.points.Experiment(name, model_name, policy, scenario, experiment_id)

A convenience object that contains a specification of the model, policy, and scenario to run

name
Type:

str

model_name
Type:

str

policy
Type:

Policy instance

scenario
Type:

Scenario instance

experiment_id
Type:

int

class ema_workbench.em_framework.points.ExperimentReplication(scenario, policy, constants, replication=None)

helper class that combines scenario, policy, any constants, and replication information (seed etc) into a single dictionary.

This class represents the complete specification of parameters to run for a given experiment.

class ema_workbench.em_framework.points.Point(name=None, unique_id=None, **kwargs)
class ema_workbench.em_framework.points.Policy(name=None, **kwargs)

Helper class representing a policy

name
Type:

str, int, or float

id
Type:

int

all keyword arguments are wrapped into a dict.
class ema_workbench.em_framework.points.Scenario(name=None, **kwargs)

Helper class representing a scenario

name
Type:

str, int, or float

id
Type:

int

all keyword arguments are wrapped into a dict.
ema_workbench.em_framework.points.combine_cases_factorial(*point_collections)

Combine collections of cases in a full factorial manner

Parameters:

point_collections (collection of collections of Point instances)

Yields:

Point

ema_workbench.em_framework.points.combine_cases_sampling(*point_collection)

Combine collections of cases by iterating over the longest collection while sampling with replacement from the others

Parameters:

point_collection (collection of collection of Point instances)

Yields:

Point
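The combination strategy described above (walk the longest collection, draw from the others with replacement) can be sketched as follows; this is an illustrative sketch, not the workbench’s implementation, and the seed is only there to make the example reproducible:

```python
import random

def combine_sampling(*collections, seed=42):
    """Iterate over the longest collection while sampling with
    replacement from the other collections."""
    rng = random.Random(seed)
    longest = max(collections, key=len)
    for item in longest:
        yield tuple(item if c is longest else rng.choice(c)
                    for c in collections)

cases = list(combine_sampling([1, 2, 3, 4], ["a", "b"]))
# 4 cases; the first element walks 1..4, the second is drawn at random
```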

ema_workbench.em_framework.points.experiment_generator(scenarios, model_structures, policies, combine='factorial')

generator function which yields experiments

Parameters:
  • scenarios (iterable of dicts)

  • model_structures (list)

  • policies (list)

  • combine ({'factorial', 'sample'}, optional) – controls how to combine scenarios, policies, and model_structures into experiments.

Notes

If combine is ‘factorial’ then this generator is essentially three nested loops: for each model structure, for each policy, for each scenario, return the experiment. This means that designs should not be a generator because this will be exhausted after running the first policy on the first model. If combine is ‘zipover’ then this generator cycles over scenarios, policies and model structures until the longest of the three collections is exhausted.

futures_multiprocessing

support for using the multiprocessing library in combination with the workbench

class ema_workbench.em_framework.futures_multiprocessing.MultiprocessingEvaluator(msis, n_processes=None, maxtasksperchild=None, **kwargs)

evaluator for experiments using a multiprocessing pool

Parameters:
  • msis (collection of models)

  • n_processes (int, optional) – a negative number can be given to use the number of logical cores minus that number. For example, on a 12 thread processor, -2 results in using 10 threads.

  • max_tasks (int (optional))

Note that the maximum number of available processes is multiprocessing.cpu_count(); on Windows, this can never be higher than 61.
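The process-count rules described above can be sketched as follows. This is an illustrative helper, not the evaluator’s actual code:

```python
import multiprocessing
import sys

def resolve_n_processes(n_processes=None):
    """Resolve the number of worker processes: None means all logical
    cores, a negative value means cpu_count minus that many, and
    Windows caps the pool at 61 processes."""
    max_processes = multiprocessing.cpu_count()
    if sys.platform.startswith("win"):
        max_processes = min(max_processes, 61)
    if n_processes is None:
        return max_processes
    if n_processes < 0:
        return max(1, max_processes + n_processes)
    return min(n_processes, max_processes)
```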

evaluate_experiments(scenarios, policies, callback, combine='factorial')

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

futures_ipyparallel

This module provides functionality for combining the EMA workbench with IPython parallel.

class ema_workbench.em_framework.futures_ipyparallel.IpyparallelEvaluator(msis, client, **kwargs)

evaluator for using an ipyparallel pool

evaluate_experiments(scenarios, policies, callback, combine='factorial', **kwargs)

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

futures_mpi

class ema_workbench.em_framework.futures_mpi.MPIEvaluator(msis, n_processes=None, **kwargs)

Evaluator for experiments using MPI Pool Executor from mpi4py

evaluate_experiments(scenarios, policies, callback, combine='factorial', **kwargs)

used by ema_workbench

finalize()

finalize the evaluator

initialize()

initialize the evaluator

futures_util

ema_workbench.em_framework.futures_util.finalizer(experiment_runner)

cleanup

ema_workbench.em_framework.futures_util.setup_working_directories(models, root_dir)

copies the working directory of each model to a process specific temporary directory and updates the working directory of the model

Parameters:
  • models (list)

  • root_dir (str)

util

utilities used throughout em_framework

class ema_workbench.em_framework.util.Counter(startfrom=0)

helper function for generating counter based names for NamedDicts

class ema_workbench.em_framework.util.NamedDict(name=<function representation>, **kwargs)
class ema_workbench.em_framework.util.ProgressTrackingMixIn(N, reporting_interval, logger, log_progress=False, log_func=<function ProgressTrackingMixIn.<lambda>>)

Mixin for monitoring progress

Parameters:
  • N (int) – total number of experiments

  • reporting_interval (int) – nfe between logging progress

  • logger (logger instance)

  • log_progress (bool, optional)

  • log_func (callable, optional) – function called with self as only argument, should invoke self._logger with custom log message

i
Type:

int

reporting_interval
Type:

int

log_progress
Type:

bool

log_func
Type:

callable

pbar

None if log_progress is true, otherwise a tqdm.tqdm instance

Type:

{None, tqdm.tqdm instance}

ema_workbench.em_framework.util.combine(*args)

combine scenario and policy into a single experiment dict

Parameters:

args (two or more dicts that need to be combined)

Return type:

a single unified dict containing the entries from all dicts

Raises:

EMAError – if a keyword argument exists in more than one dict
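The merge-and-raise behavior described above can be sketched as follows; the EMAError stand-in is defined here only to make the sketch self-contained:

```python
class EMAError(Exception):
    """Stand-in for ema_workbench's EMAError, for illustration."""

def combine(*dicts):
    """Merge dicts into a single dict, raising if any key occurs in
    more than one of the inputs."""
    result = {}
    for d in dicts:
        overlap = result.keys() & d.keys()
        if overlap:
            raise EMAError(f"duplicate keys: {sorted(overlap)}")
        result.update(d)
    return result

experiment = combine({"x": 1}, {"policy": "p1"})
```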

ema_workbench.em_framework.util.determine_objects(models, attribute, union=True)

determine the parameters over which to sample

Parameters:
  • models (a collection of AbstractModel instances)

  • attribute ({'uncertainties', 'levers', 'outcomes'})

  • union (bool, optional) – in case of multiple models, sample over the union of levers, or over the intersection of the levers

Return type:

collection of Parameter instances

ema_workbench.em_framework.util.representation(named_dict)

helper function for generating repr based names for NamedDicts

Connectors

vensim

vensimDLLwrapper

pysd_connector

simio

excel

This module provides a base class that can be used to perform EMA on Excel models. It relies on win32com

class ema_workbench.connectors.excel.ExcelModel(name, wd=None, model_file=None, default_sheet=None, pointers=None)

Analysis

prim

A scenario discovery oriented implementation of PRIM.

The implementation of prim provided here is data type aware, so categorical variables will be handled appropriately. It also uses a non-standard objective function in the peeling and pasting phase of the algorithm. This algorithm looks at the increase in the mean divided by the amount of data removed. So essentially, it uses something akin to the first order derivative of the original objective function.

The implementation is designed for interactive use in combination with the jupyter notebook.

class ema_workbench.analysis.prim.PRIMObjectiveFunctions(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
class ema_workbench.analysis.prim.Prim(x, y, threshold, obj_function=PRIMObjectiveFunctions.LENIENT1, peel_alpha=0.05, paste_alpha=0.05, mass_min=0.05, threshold_type=1, mode=RuleInductionType.BINARY, update_function='default')

Patient rule induction algorithm

The implementation of Prim is tailored to interactive use in the context of scenario discovery

Parameters:
  • x (DataFrame) – the independent variables

  • y (1d ndarray) – the dependent variable

  • threshold (float) – the density threshold that a box has to meet

  • obj_function ({LENIENT1, LENIENT2, ORIGINAL}) – the objective function used by PRIM. Defaults to a lenient objective function based on the gain of mean divided by the loss of mass.

  • peel_alpha (float, optional) – parameter controlling the peeling stage (default = 0.05).

  • paste_alpha (float, optional) – parameter controlling the pasting stage (default = 0.05).

  • mass_min (float, optional) – minimum mass of a box (default = 0.05).

  • threshold_type ({ABOVE, BELOW}) – whether to look above or below the threshold value

  • mode ({RuleInductionType.BINARY, RuleInductionType.REGRESSION}, optional) – indicates whether PRIM is used for regression, or for scenario classification, in which case y should be a binary vector

  • update_function ({'default', 'guivarch'}, optional) – controls the behavior of PRIM after having found a first box. Use either the default behavior, where all points in the box are removed, or the procedure suggested by Guivarch et al (2016) doi:10.1016/j.envsoft.2016.03.006 to simply set all points to be no longer of interest (only valid in binary mode).

See also

cart

property boxes

Property for getting a list of box limits

determine_coi(indices)

Given a set of indices on y, how many cases of interest are there in this set.

Parameters:

indices (ndarray) – a valid index for y

Returns:

the number of cases of interest.

Return type:

int

Raises:

ValueError – if threshold_type is not either ABOVE or BELOW

find_box()

Execute one iteration of the PRIM algorithm. That is, find one box, starting from the current state of Prim.

property stats

property for getting a list of dicts containing the statistics for each box

class ema_workbench.analysis.prim.PrimBox(prim, box_lims, indices)

A class that holds information for a specific box

coverage

coverage of currently selected box

Type:

float

density

density of currently selected box

Type:

float

mean

mean of currently selected box

Type:

float

res_dim

number of restricted dimensions of currently selected box

Type:

int

mass

mass of currently selected box

Type:

float

peeling_trajectory

stats for each box in peeling trajectory

Type:

DataFrame

box_lims

list of box lims for each box in peeling trajectory

Type:

list

by default, the currently selected box is the last box on the peeling trajectory, unless this is changed via PrimBox.select().
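Coverage and density as used above can be computed as follows. This is a sketch of the definitions, not the workbench’s implementation: coverage is the share of all cases of interest captured by the box, density is the share of points inside the box that are of interest:

```python
def coverage_and_density(y, box_indices):
    """y is a binary vector marking cases of interest;
    box_indices selects the points inside the box."""
    coi_total = sum(y)                       # cases of interest overall
    coi_in_box = sum(y[i] for i in box_indices)
    coverage = coi_in_box / coi_total        # share of interesting cases captured
    density = coi_in_box / len(box_indices)  # share of box points that are interesting
    return coverage, density

cov, dens = coverage_and_density([1, 1, 0, 0, 1, 0], [0, 1, 2])
# the box holds 2 of the 3 cases of interest and 3 points in total
```

PRIM’s peeling trajectory trades these off: smaller boxes tend to have higher density but lower coverage.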

drop_restriction(uncertainty='', i=-1)

Drop the restriction on the specified dimension for box i

Parameters:
  • i (int, optional) – defaults to the currently selected box, which defaults to the latest box on the trajectory

  • uncertainty (str)

Replace the limits in box i with a new box where for the specified uncertainty the limits of the initial box are being used. The resulting box is added to the peeling trajectory.

inspect(i=None, style='table', ax=None, **kwargs)

Write the stats and box limits of the user-specified box to standard out. If i is not provided, the last box will be printed.

Parameters:
  • i (int or list of ints, optional) – the index of the box, defaults to currently selected box

  • style ({'table', 'graph', 'data'}) – the style of the visualization. ‘table’ prints the stats and boxlim. ‘graph’ creates a figure. ‘data’ returns a list of tuples, where each tuple contains the stats and the box_lims.

  • ax (axes or list of axes instances, optional) – used in conjunction with graph style, allows you to control the axes on which graph is plotted if i is list, axes should be list of equal length. If axes is None, each i_j in i will be plotted in a separate figure.

  • **kwargs – additional kwargs are passed to the helper function that generates the table or graph

resample(i=None, iterations=10, p=0.5)

Calculate resample statistics for candidate box i

Parameters:
  • i (int, optional)

  • iterations (int, optional)

  • p (float, optional)

Return type:

DataFrame

select(i)

select an entry from the peeling and pasting trajectory and update the prim box to this selected box.

Parameters:

i (int) – the index of the box to select.

show_pairs_scatter(i=None, dims=None, diag_kind=DiagKind.KDE, upper='scatter', lower='contour', fill_subplots=True)

Make a pairwise scatter plot of all the restricted dimensions, with color denoting whether a given point is of interest and the box limits superimposed on top.

Parameters:
  • i (int, optional)

  • dims (list of str, optional) – dimensions to show, defaults to all restricted dimensions

  • diag_kind ({DiagKind.KDE, DiagKind.CDF}) – Plot diagonal as kernel density estimate (‘kde’) or cumulative density function (‘cdf’).

  • upper (string, optional) – Use either ‘scatter’, ‘contour’, or ‘hist’ (bivariate histogram) plots for upper and lower triangles. Upper triangle can also be ‘none’ to eliminate redundancy. Legend uses lower triangle style for markers.

  • lower (string, optional) – Use either ‘scatter’, ‘contour’, or ‘hist’ (bivariate histogram) plots for upper and lower triangles. Upper triangle can also be ‘none’ to eliminate redundancy. Legend uses lower triangle style for markers.

  • fill_subplots (Boolean, optional) – if True, subplots are resized to fill their respective axes. This removes unnecessary whitespace, but may be undesirable for some variable combinations.

Return type:

seaborn PairGrid

show_ppt()

show the peeling and pasting trajectory in a figure

show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)

Visualize the trade off between coverage and density. Color is used to denote the number of restricted dimensions.

Parameters:
  • cmap (valid matplotlib colormap)

  • annotated (bool, optional. Shows point labels if True.)

Return type:

a Figure instance

update(box_lims, indices)

update the box to the provided box limits.

Parameters:
  • box_lims (DataFrame) – the new box_lims

  • indices (ndarray) – the indices of y that are inside the box

write_ppt_to_stdout()

write the peeling and pasting trajectory to stdout

ema_workbench.analysis.prim.pca_preprocess(experiments, y, subsets=None, exclude={})

perform PCA to preprocess experiments before running PRIM

Pre-process the data by performing a pca based rotation on it. This effectively turns the algorithm into PCA-PRIM as described in Dalal et al (2013)

Parameters:
  • experiments (DataFrame)

  • y (ndarray) – one dimensional binary array

  • subsets (dict, optional) – expects a dictionary with group name as key and a list of uncertainty names as values. If this is used, a constrained PCA-PRIM is executed

  • exclude (list of str, optional) – the uncertainties that should be excluded from the rotation

Returns:

  • rotated_experiments – DataFrame

  • rotation_matrix – DataFrame

Raises:

RuntimeError – if mode is not binary (i.e. y is not a binary classification), or if X contains non-numeric columns
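The rotation behind PCA-PRIM can be sketched by hand: rotate the experiments onto the eigenvectors of their covariance matrix, so the rotated columns are uncorrelated and axis-aligned PRIM boxes become more effective. This is a minimal, hypothetical stand-in for what pca_preprocess computes (without the subsets/exclude handling):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for an experiments DataFrame with numeric columns only
rng = np.random.default_rng(42)
experiments = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# the rotation underlying PCA-PRIM: eigenvectors of the covariance matrix
cov = np.cov(experiments.to_numpy(), rowvar=False)
_, rotation_matrix = np.linalg.eigh(cov)
rotated = experiments.to_numpy() @ rotation_matrix
rotated_experiments = pd.DataFrame(
    rotated, columns=[f"r_{c}" for c in experiments.columns]
)
# after rotation the columns are uncorrelated, which is what makes
# axis-aligned PRIM boxes effective on the rotated data
```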

ema_workbench.analysis.prim.run_constrained_prim(experiments, y, issignificant=True, **kwargs)

Run PRIM repeatedly while constraining the maximum number of dimensions available in x

Improved usage of PRIM as described in Kwakkel (2019).

Parameters:
  • experiments (DataFrame)

  • y (numpy array)

  • issignificant (bool, optional) – if True, run prim only on subsets of dimensions that are significant for the initial PRIM on the entire dataset.

  • **kwargs (any additional keyword arguments are passed on to PRIM)

Return type:

PrimBox instance

ema_workbench.analysis.prim.setup_prim(results, classify, threshold, incl_unc=[], **kwargs)

Helper function for setting up the prim algorithm

Parameters:
  • results (tuple) – tuple of DataFrame and dict with numpy arrays the return from perform_experiments().

  • classify (str or callable) – either a string denoting the outcome of interest to use or a function.

  • threshold (double) – the minimum score on the density of the last box on the peeling trajectory. In case of a binary classification, this should be between 0 and 1.

  • incl_unc (list of str, optional) – list of uncertainties to include in prim analysis

  • kwargs (dict) – valid keyword arguments for prim.Prim

Return type:

a Prim instance

Raises:
  • PrimException – if data resulting from classify is not a 1-d array.

  • TypeError – if classify is not a string or a callable.
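A typical scenario discovery workflow built on setup_prim might look as follows. The experiments, outcomes, and the 0.8 threshold are made-up stand-ins for real perform_experiments() output; the workbench calls themselves are shown as a commented sketch:

```python
import numpy as np
import pandas as pd

# stand-in for the (experiments, outcomes) tuple returned by perform_experiments()
rng = np.random.default_rng(0)
experiments = pd.DataFrame({"x1": rng.random(1000), "x2": rng.random(1000)})
outcomes = {"deaths": rng.random(1000)}
results = (experiments, outcomes)

def classify(outcomes):
    # classify must return a 1-d binary array; anything else raises PrimException
    return outcomes["deaths"] > 0.8

y = classify(outcomes)

# with the workbench installed, the analysis would continue as:
# from ema_workbench.analysis import prim
# alg = prim.setup_prim(results, classify, threshold=0.8)
# box = alg.find_box()        # one iteration of PRIM
# box.show_tradeoff()         # coverage vs. density trade-off
# box.inspect(style="table")  # stats and box limits of the selected box
```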

cart

A scenario discovery oriented implementation of CART. It essentially is a wrapper around scikit-learn’s version of CART.

class ema_workbench.analysis.cart.CART(x, y, mass_min=0.05, mode=RuleInductionType.BINARY)

CART algorithm

can be used in a manner similar to PRIM. It provides access to the underlying tree, but it can also show the boxes described by the tree in a table or graph form similar to prim.

Parameters:
  • x (DataFrame)

  • y (1D ndarray)

  • mass_min (float, optional) – a value between 0 and 1 indicating the minimum fraction of data points in a terminal leaf. Defaults to 0.05, identical to prim.

  • mode ({BINARY, CLASSIFICATION, REGRESSION}) – indicates the mode in which CART is used. Binary indicates binary classification, classification is multiclass, and regression is regression.

boxes

list of DataFrame box lims

Type:

list

stats

list of dicts with stats

Type:

list

Notes

This class is a wrapper around scikit-learn’s CART algorithm. It provides an interface to CART that is more oriented towards scenario discovery, and shares some methods with PRIM

See also

prim

property boxes

Return type:

list with boxlims for each terminal leaf

build_tree()

train CART on the data

show_tree(mplfig=True, format='png')

Return a png (default) or svg of the tree.

On Windows, graphviz needs to be installed with conda.

Parameters:
  • mplfig (bool, optional) – if true (default) returns a matplotlib figure with the tree, otherwise, it returns the output as bytes

  • format ({'png', 'svg'}, default 'png') – Gives a format of the output.

property stats

Return type:

list with scenario discovery statistics for each terminal leaf

ema_workbench.analysis.cart.setup_cart(results, classify, incl_unc=None, mass_min=0.05)

helper function for performing cart in combination with data generated by the workbench.

Parameters:
  • results (tuple of DataFrame and dict with numpy arrays) – the return from perform_experiments().

  • classify (string, function or callable) – either a string denoting the outcome of interest to use or a function.

  • incl_unc (list of strings, optional)

  • mass_min (float, optional)

Raises:

TypeError – if classify is not a string or a callable.
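The mode argument of CART follows the shape of y: a binary array calls for BINARY, a continuous array for REGRESSION. A small, made-up illustration of the two input shapes, with the workbench calls sketched in comments:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
x = pd.DataFrame({"a": rng.random(500), "b": rng.random(500)})
y_binary = (x["a"] > 0.7).to_numpy()         # use mode=RuleInductionType.BINARY
y_continuous = (x["a"] * x["b"]).to_numpy()  # use mode=RuleInductionType.REGRESSION

# with the workbench installed:
# from ema_workbench.analysis import cart
# alg = cart.CART(x, y_binary, mass_min=0.05)
# alg.build_tree()
# print(alg.stats)   # scenario discovery stats per terminal leaf
# fig = alg.show_tree()
```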

clusterer

This module provides time series clustering functionality using complex invariant distance. For details see Steinmann et al (2020)

ema_workbench.analysis.clusterer.apply_agglomerative_clustering(distances, n_clusters, metric='precomputed', linkage='average')

apply agglomerative clustering to the distances

Parameters:
  • distances (ndarray)

  • n_clusters (int)

  • metric (str, optional. The distance metric to use. The default is 'precomputed'. For a list of available metrics, see the documentation of scipy.spatial.distance.pdist.)

  • linkage ({'average', 'complete', 'single'})

Return type:

1D ndarray with cluster assignment

ema_workbench.analysis.clusterer.calculate_cid(data, condensed_form=False)

calculate the complex invariant distance between all rows

Parameters:
  • data (2d ndarray)

  • condensed_form (bool, optional)

Returns:

a 2D ndarray with the distances between all time series, or condensed form similar to scipy.spatial.distance.pdist

Return type:

distances

ema_workbench.analysis.clusterer.plot_dendrogram(distances)

plot dendrogram for distances
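The distance assumed to underlie calculate_cid can be sketched with plain numpy: a Euclidean distance corrected by the ratio of the two series’ “complexities” (the length of the line each series traces), so that a simple and a complex series are pushed apart. This is a hypothetical minimal implementation, not the workbench’s own code:

```python
import numpy as np

def complexity(series):
    # complexity estimate: length of the line the time series traces
    return np.sqrt(np.sum(np.diff(series) ** 2))

def cid(a, b):
    ed = np.linalg.norm(a - b)  # plain Euclidean distance
    ce_a, ce_b = complexity(a), complexity(b)
    correction = max(ce_a, ce_b) / min(ce_a, ce_b)  # complexity correction factor
    return ed * correction

# pairwise distance matrix over the rows of a 2-d data array,
# mirroring the square form returned by calculate_cid
data = np.array([[0.0, 1.0, 0.0, 1.0],
                 [0.0, 0.5, 1.0, 1.5],
                 [0.1, 1.1, 0.1, 1.1]])
n = len(data)
distances = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        distances[i, j] = distances[j, i] = cid(data[i], data[j])

# the matrix can then be clustered with
# apply_agglomerative_clustering(distances, n_clusters=2)
```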

logistic_regression

This module implements logistic regression for scenario discovery.

The module draws its inspiration from Quinn et al (2018) 10.1029/2018WR022743 and Lamontagne et al (2019). The implementation here generalizes their work and embeds it in a more typical scenario discovery workflow with a posteriori selection of the appropriate number of dimensions to include. It is modeled as much as possible on the api used for PRIM and CART.

class ema_workbench.analysis.logistic_regression.Logit(x, y, threshold=0.95)

Implements an interactive version of logistic regression using BIC based forward selection

Parameters:
  • x (DataFrame)

  • y (numpy Array)

  • threshold (float)

coverage

coverage of currently selected model

Type:

float

density

density of currently selected model

Type:

float

res_dim

number of restricted dimensions of currently selected model

Type:

int

peeling_trajectory

stats for each model in peeling trajectory

Type:

DataFrame

models

list of models associated with each model on the peeling trajectory

Type:

list

inspect(i, step=0.1)

Inspect one of the models by showing the threshold tradeoff and summary2

Parameters:
  • i (int)

  • step (float between [0, 1])

plot_pairwise_scatter(i, threshold=0.95)

plot pairwise scatter plot of data points, with contours as background

Parameters:
  • i (int)

  • threshold (float)

Return type:

Figure instance

The lower triangle background is a binary contour based on the specified threshold. All axes not shown are set to a default value in the middle of their range.

The upper triangle shows a contour map with the conditional probability, again setting all non-shown dimensions to a default value in the middle of their range.

run(**kwargs)

run logistic regression using forward selection using a Bayesian Information Criterion for selecting whether and if so which dimension to add

Parameters:
  • **kwargs – passed on to model.fit; for details, see the statsmodels documentation

show_threshold_tradeoff(i, cmap=<matplotlib.colors.ListedColormap object>, step=0.1)

Visualize the trade off between coverage and density for a given model i across the range of threshold values

Parameters:
  • i (int)

  • cmap (valid matplotlib colormap)

  • step (float, optional)

Return type:

a Figure instance

show_tradeoff(cmap=<matplotlib.colors.ListedColormap object>, annotated=False)

Visualize the trade off between coverage and density. Color is used to denote the number of restricted dimensions.

Parameters:
  • cmap (valid matplotlib colormap)

  • annotated (bool, optional. Shows point labels if True.)

Return type:

a Figure instance

update(model, selected)

helper function for adding a model to the collection of models and update the associated attributes

Parameters:
  • model (statsmodel fitted logit model)

  • selected (list of str)
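The coverage/density trade-off that Logit navigates can be computed for any probability threshold. A minimal sketch with made-up model probabilities (the names y, prob, and coverage_density are illustrative, not workbench API):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.random(500) < 0.3                               # cases of interest
prob = np.clip(0.6 * y + 0.6 * rng.random(500), 0, 1)   # hypothetical predicted probabilities

def coverage_density(y, prob, threshold):
    predicted = prob >= threshold                # points flagged at this threshold
    hits = (y & predicted).sum()
    coverage = hits / y.sum()                    # fraction of cases of interest captured
    density = hits / max(predicted.sum(), 1)     # fraction of flagged points of interest
    return coverage, density

cov_lo, den_lo = coverage_density(y, prob, 0.5)
cov_hi, den_hi = coverage_density(y, prob, 0.9)
# raising the threshold trades coverage for density, which is what
# show_threshold_tradeoff visualizes across the full threshold range
```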

dimensional_stacking

This module provides functionality for doing dimensional stacking of uncertain factors in order to reveal patterns in the values for a single outcome of interest. It is inspired by the work reported here with one deviation.

Rather than using association rules to identify the uncertain factors to use, this code uses random forest based feature scoring instead. It is also possible to use the code provided here in combination with any other feature scoring or factor prioritization technique instead, or by simply selecting uncertain factors in some other manner.

ema_workbench.analysis.dimensional_stacking.create_pivot_plot(x, y, nr_levels=3, labels=True, categories=True, nbins=3, bin_labels=False)

convenience function for easily creating a pivot plot

Parameters:
  • x (DataFrame)

  • y (1d ndarray)

  • nr_levels (int, optional) – the number of levels in the pivot table. The number of uncertain factors included in the pivot table is two times the number of levels.

  • labels (bool, optional) – display names of uncertain factors

  • categories (bool, optional) – display category names for each uncertain factor

  • nbins (int, optional) – number of bins to use when discretizing continuous uncertain factors

  • bin_labels (bool, optional) – if True show bin interval / name, otherwise show only a number

Return type:

Figure

This function performs feature scoring using random forests, selects a number of high scoring factors based on the specified number of levels, creates a pivot table, and visualizes the table. This is a convenience function. For more control over the process, use the code in this function as a template.
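The binning-and-pivoting step can be mimicked directly in pandas, minus the feature scoring and the plotting; this hypothetical sketch shows the pivot table of mean outcome values that the plot visualizes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
x = pd.DataFrame({"a": rng.random(500), "b": rng.random(500)})
y = ((x["a"] > 0.5) & (x["b"] < 0.3)).to_numpy()

# discretize each uncertain factor into bins, as create_pivot_plot does via nbins
binned = x.apply(lambda col: pd.cut(col, bins=3, labels=False))

# pivot table of the mean outcome: high-valued cells reveal the factor
# combinations where cases of interest concentrate
pivot = pd.crosstab(binned["a"], binned["b"], values=y, aggfunc="mean")
```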

regional_sa

Module offers support for performing basic regional sensitivity analysis. The module can be used to perform regional sensitivity analysis on all uncertainties specified in the experiment array, as well as the ability to zoom in on any given uncertainty in more detail.

ema_workbench.analysis.regional_sa.plot_cdfs(x, y, ccdf=False)

plot cumulative density functions for each column in x, based on the classification specified in y.

Parameters:
  • x (DataFrame) – the experiments to use in the cdfs

  • y (ndarray) – the categorization for the data

  • ccdf (bool, optional) – if true, plot a complementary cdf instead of a normal cdf.

Return type:

a matplotlib Figure instance
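The idea behind plot_cdfs can be sketched with an empirical CDF per class: for each column, compare the CDF of the cases of interest against the CDF of all cases, and a large gap flags an influential factor. A minimal, hypothetical illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
x = pd.DataFrame({"a": rng.random(1000), "b": rng.random(1000)})
y = (x["a"] > 0.7).to_numpy()  # classification of the experiments

def ecdf(values):
    # empirical CDF: sorted values against their cumulative ranks
    s = np.sort(values)
    return s, np.arange(1, len(s) + 1) / len(s)

xs_coi, cdf_coi = ecdf(x.loc[y, "a"])
xs_all, cdf_all = ecdf(x["a"])
# for factor "a" the two CDFs diverge strongly; for "b" they would overlap,
# which is the pattern plot_cdfs makes visible for every column at once
```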

feature_scoring

Feature scoring functionality

ema_workbench.analysis.feature_scoring.CHI2(X, y)

Compute chi-squared stats between each non-negative feature and class.

This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.

Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.

Read more in the User Guide.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Sample vectors.

  • y (array-like of shape (n_samples,)) – Target vector (class labels).

Returns:

  • chi2 (ndarray of shape (n_features,)) – Chi2 statistics for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values for each feature.

See also

f_classif

ANOVA F-value between label/feature for classification tasks.

f_regression

F-value between label/feature for regression tasks.

Notes

Complexity of this algorithm is O(n_classes * n_features).

Examples

>>> import numpy as np
>>> from sklearn.feature_selection import chi2
>>> X = np.array([[1, 1, 3],
...               [0, 1, 5],
...               [5, 4, 1],
...               [6, 6, 2],
...               [1, 4, 0],
...               [0, 0, 0]])
>>> y = np.array([1, 1, 0, 0, 2, 2])
>>> chi2_stats, p_values = chi2(X, y)
>>> chi2_stats
array([15.3...,  6.5       ,  8.9...])
>>> p_values
array([0.0004..., 0.0387..., 0.0116... ])
ema_workbench.analysis.feature_scoring.F_CLASSIFICATION(X, y)

Compute the ANOVA F-value for the provided sample.

Read more in the User Guide.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The set of regressors that will be tested sequentially.

  • y (array-like of shape (n_samples,)) – The target vector.

Returns:

  • f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.

See also

chi2

Chi-squared stats of non-negative features for classification tasks.

f_regression

F-value between label/feature for regression tasks.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import f_classif
>>> X, y = make_classification(
...     n_samples=100, n_features=10, n_informative=2, n_clusters_per_class=1,
...     shuffle=False, random_state=42
... )
>>> f_statistic, p_values = f_classif(X, y)
>>> f_statistic
array([2.2...e+02, 7.0...e-01, 1.6...e+00, 9.3...e-01,
       5.4...e+00, 3.2...e-01, 4.7...e-02, 5.7...e-01,
       7.5...e-01, 8.9...e-02])
>>> p_values
array([7.1...e-27, 4.0...e-01, 1.9...e-01, 3.3...e-01,
       2.2...e-02, 5.7...e-01, 8.2...e-01, 4.5...e-01,
       3.8...e-01, 7.6...e-01])
ema_workbench.analysis.feature_scoring.F_REGRESSION(X, y, *, center=True, force_finite=True)

Univariate linear regression tests returning F-statistic and p-values.

Quick linear model for testing the effect of a single regressor, sequentially for many regressors.

This is done in 2 steps:

  1. The cross correlation between each regressor and the target is computed using r_regression() as:

    E[(X[:, i] - mean(X[:, i])) * (y - mean(y))] / (std(X[:, i]) * std(y))
    
  2. It is converted to an F score and then to a p-value.

f_regression() is derived from r_regression() and will rank features in the same order if all the features are positively correlated with the target.

Note however that contrary to f_regression(), r_regression() values lie in [-1, 1] and can thus be negative. f_regression() is therefore recommended as a feature selection criterion to identify potentially predictive feature for a downstream classifier, irrespective of the sign of the association with the target variable.

Furthermore f_regression() returns p-values while r_regression() does not.

Read more in the User Guide.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data matrix.

  • y (array-like of shape (n_samples,)) – The target vector.

  • center (bool, default=True) – Whether or not to center the data matrix X and the target vector y. By default, X and y will be centered.

  • force_finite (bool, default=True) –

    Whether or not to force the F-statistics and associated p-values to be finite. There are two cases where the F-statistic is expected to not be finite:

    • when the target y or some features in X are constant. In this case, the Pearson’s R correlation is not defined leading to obtain np.nan values in the F-statistic and p-value. When force_finite=True, the F-statistic is set to 0.0 and the associated p-value is set to 1.0.

    • when a feature in X is perfectly correlated (or anti-correlated) with the target y. In this case, the F-statistic is expected to be np.inf. When force_finite=True, the F-statistic is set to np.finfo(dtype).max and the associated p-value is set to 0.0.

    Added in version 1.1.

Returns:

  • f_statistic (ndarray of shape (n_features,)) – F-statistic for each feature.

  • p_values (ndarray of shape (n_features,)) – P-values associated with the F-statistic.

See also

r_regression

Pearson’s R between label/feature for regression tasks.

f_classif

ANOVA F-value between label/feature for classification tasks.

chi2

Chi-squared stats of non-negative features for classification tasks.

SelectKBest

Select features based on the k highest scores.

SelectFpr

Select features based on a false positive rate test.

SelectFdr

Select features based on an estimated false discovery rate.

SelectFwe

Select features based on family-wise error rate.

SelectPercentile

Select features based on percentile of the highest scores.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.feature_selection import f_regression
>>> X, y = make_regression(
...     n_samples=50, n_features=3, n_informative=1, noise=1e-4, random_state=42
... )
>>> f_statistic, p_values = f_regression(X, y)
>>> f_statistic
array([1.2...+00, 2.6...+13, 2.6...+00])
>>> p_values
array([2.7..., 1.5..., 1.0...])
ema_workbench.analysis.feature_scoring.get_ex_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=100, max_features=None, max_depth=None, min_samples_split=2, min_samples_leaf=None, min_weight_fraction_leaf=0, max_leaf_nodes=None, bootstrap=True, oob_score=True, random_state=None)

Get feature scores using extra trees

Parameters:
Returns:

  • pandas DataFrame – sorted in descending order of tuples with uncertainty and feature scores

  • object – either ExtraTreesClassifier or ExtraTreesRegressor

ema_workbench.analysis.feature_scoring.get_feature_scores_all(x, y, alg='extra trees', mode=RuleInductionType.REGRESSION, **kwargs)

perform feature scoring for all outcomes using the specified feature scoring algorithm

Parameters:
  • x (DataFrame)

  • y (dict of 1d numpy arrays) – the outcomes, with a string as key, and a 1D array for each outcome

  • alg ({'extra trees', 'random forest', 'univariate'}, optional)

  • mode ({RuleInductionType.REGRESSION, RuleInductionType.CLASSIFICATION}, optional)

  • kwargs (dict, optional) – any remaining keyword arguments will be passed to the specific feature scoring algorithm

Return type:

DataFrame instance

ema_workbench.analysis.feature_scoring.get_rf_feature_scores(x, y, mode=RuleInductionType.CLASSIFICATION, nr_trees=250, max_features='sqrt', max_depth=None, min_samples_split=2, min_samples_leaf=1, bootstrap=True, oob_score=True, random_state=None)

Get feature scores using a random forest

Parameters:
Returns:

  • pandas DataFrame – sorted in descending order of tuples with uncertainty and feature scores

  • object – either RandomForestClassifier or RandomForestRegressor

ema_workbench.analysis.feature_scoring.get_univariate_feature_scores(x, y, score_func=<function f_classif>)

calculate feature scores using univariate statistical tests. In case of categorical data, chi square or the Anova F value is used. In case of continuous data the Anova F value is used.

Parameters:
  • x (DataFrame)

  • y (1D nd.array)

  • score_func ({F_CLASSIFICATION, F_REGRESSION, CHI2}) – the score function to use, one of f_regression (regression), or f_classification or chi2 (classification).

Returns:

sorted in descending order of tuples with uncertainty and feature scores (i.e. p values in this case).

Return type:

pandas DataFrame

plotting

this module provides functions for generating some basic figures. The code can be used as is, or serve as an example for writing your own code.

ema_workbench.analysis.plotting.envelopes(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=None, fill=False, legend=True, titles={}, ylabels={}, log=False)

Make envelope plots.

An envelope shows the minimum and maximum value over time for a set of runs. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.

Parameters:
  • experiments (DataFrame)

  • outcomes (OutcomesDict)

  • outcomes_to_show (str or list of str, optional)

  • group_by (str, optional) – name of the column in the experiments to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional)

  • fill (bool, optional)

  • legend (bool, optional)

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • log (bool, optional) – log scale density plot

Returns:

  • Figure (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories in case of group_by, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.

Examples

>>> import util as util
>>> data = util.load_results(r'1000 flu cases.cPickle')
>>> envelopes(data, group_by='policy')

will show an envelope for three different policies, for all the outcomes of interest, while

>>> envelopes(data, group_by='policy', categories=['static policy',
              'adaptive policy'])

will only show results for the two specified policies, ignoring any results associated with ‘no policy’.

ema_workbench.analysis.plotting.kde_over_time(experiments, outcomes, outcomes_to_show=None, group_by=None, grouping_specifiers=None, colormap='viridis', log=True)

Plot a KDE over time. The KDE is visualized through a heatmap

Parameters:
  • experiments (DataFrame)

  • outcomes (OutcomesDict)

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • colormap (str, optional) – valid matplotlib color map name

  • log (bool, optional)

Returns:

  • list of Figure instances – a figure instance for each group for each outcome

  • dict – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density

ema_workbench.analysis.plotting.lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, density='', legend=True, titles={}, ylabels={}, experiments_to_show=None, show_envelope=False, log=False)

This function takes the results from perform_experiments() and visualizes these as line plots. It is thus to be used in case of time series data. The function will try to find a result labeled “TIME”. If this is present, these values will be used on the X-axis. In case of Vensim models, TIME is present by default.

Parameters:
  • experiments (DataFrame)

  • outcomes (OutcomesDict)

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({None, HIST, KDE, VIOLIN, BOXPLOT}, optional)

  • legend (bool, optional)

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.

  • show_envelope (bool, optional) – show envelope of outcomes. This envelope is the based on the minimum at each column and the maximum at each column.

  • log (bool, optional) – log scale density plot

Returns:

  • fig (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories in case of group_by, categories, and/or discretize. This limit is due to the colors specified in COLOR_LIST.

ema_workbench.analysis.plotting.multiple_densities(experiments, outcomes, points_in_time=None, outcomes_to_show=None, group_by=None, grouping_specifiers=None, density=Density.KDE, legend=True, titles={}, ylabels={}, experiments_to_show=None, plot_type=PlotType.ENVELOPE, log=False, **kwargs)

Make an envelope plot with multiple density plots over the run time

Parameters:
  • experiments (DataFrame)

  • outcomes (OutcomesDict)

  • points_in_time (list) – a list of points in time for which you want to see the density. At the moment up to 6 points in time are supported

  • outcomes_to_show (list of str, optional) – list of outcome of interest you want to plot. If empty, all outcomes are plotted. Note: just names.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (iterable or dict, optional) – set of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • density ({Density.KDE, Density.HIST, Density.VIOLIN, Density.BOXPLOT},) – optional

  • legend (bool, optional)

  • titles (dict, optional) – a way for controlling whether each of the axes should have a title. There are three possibilities. If set to None, no title will be shown for any of the axes. If set to an empty dict, the default, the title is identical to the name of the outcome of interest. If you want to override these default names, provide a dict with the outcome of interest as key and the desired title as value. This dict need only contain the outcomes for which you want to use a different title.

  • ylabels (dict, optional) – way for controlling the ylabels. Works identical to titles.

  • experiments_to_show (ndarray, optional) – indices of experiments to show lines for, defaults to None.

  • plot_type ({PlotType.ENVELOPE, PlotType.ENV_LIN, PlotType.LINES}, optional)

  • log (bool, optional)

Returns:

  • fig (Figure instance)

  • axes (dict) – dict with outcome as key, and axes as value. Density axes’ are indexed by the outcome followed by _density.

Note

the current implementation is limited to seven different categories when grouping (via group_by and/or grouping_specifiers). This limit is due to the colors specified in COLOR_LIST.

Note

the connection patches are for some reason not drawn if log scaling is used for the density plots. This appears to be an issue in matplotlib itself.

pairs_plotting

This module provides R style pairs plotting functionality.

ema_workbench.analysis.pairs_plotting.pairs_density(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, point_in_time=-1, log=True, gridsize=50, colormap='coolwarm', filter_scalar=True)

Generate an R-style pairs hexbin density multiplot. In case of time-series data, the end states are used.

hexbin makes a hexagonal binning plot of x versus y, where x and y are 1-D sequences of the same length, N. If C is None (the default), this is a histogram of the number of occurrences of the observations at (x[i], y[i]). For further detail, see the matplotlib documentation on hexbin.

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of outcomes of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time

  • log (bool, optional) – indicating whether density should be log scaled. Defaults to True.

  • gridsize (int, optional) – controls the gridsize for the hexagonal binning. (default = 50)

  • colormap (str, optional) – color map used in generating the hexbin. For details on the available maps, see the matplotlib colormap documentation. (default = 'coolwarm')

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to True.

Returns:

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance

ema_workbench.analysis.pairs_plotting.pairs_lines(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, **kwargs)

Generate an R-style pairs lines multiplot. It shows the behavior of two outcomes over time against each other. The origin is denoted with a circle and the end with a ‘+’.

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of outcomes of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time

Returns:

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance

ema_workbench.analysis.pairs_plotting.pairs_scatter(experiments, outcomes, outcomes_to_show=[], group_by=None, grouping_specifiers=None, ylabels={}, legend=True, point_in_time=-1, filter_scalar=False, **kwargs)

Generate an R-style pairs scatter multiplot. In case of time-series data, the end states are used.

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of outcomes of interest you want to plot.

  • group_by (str, optional) – name of the column in the cases array to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. Grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – ylabels is a dictionary with the outcome names as keys, the specified values will be used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. If None is provided (default), the end states are used. point_in_time should be a valid value on time

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to False.

Returns:

  • fig (Figure instance) – the figure instance

  • axes (dict) – key is tuple of names of outcomes, value is associated axes instance

Note

the current implementation is limited to seven different categories in case of group_by. This limit is due to the colors specified in COLOR_LIST.

parcoords

This module offers a general purpose parallel coordinate plotting Class using matplotlib.

class ema_workbench.analysis.parcoords.ParallelAxes(limits, formatter=None, fontsize=14, rot=90)

Base class for creating a parallel axis plot.

Parameters:
  • limits (DataFrame) – A DataFrame specifying the limits for each dimension in the data set. For categorical data, the first cell should contain all categories. See get_limits for more details.

  • formatter (dict, optional) – dict with precision format strings for minima and maxima; use the column name as key. If a column is not present, or no formatter dict is provided, precision formatting defaults to .2f

  • fontsize (int, optional) – fontsize for default text items

  • rot (float, optional) – rotation of axis labels

limits

A DataFrame specifying the limits for each dimension in the data set. For categorical data, the first cell should contain all categories.

Type:

DataFrame

recoding

non-numeric columns are converted to integer variables

Type:

dict

flipped_axes

set of Axes that are to be shown flipped

Type:

set

axis_labels
Type:

list of str

fontsize
Type:

int

fig
Type:

a matplotlib Figure instance

axes
Type:

list of matplotlib Axes instances

ticklabels
Type:

list of str

datalabels

labels associated with lines

Type:

list of str

Notes

The basic setup of the Parallel Axis plot is a row of mpl Axes instances, with all whitespace in between removed. The number of Axes is the number of columns - 1.

invert_axis(axis)

flip direction for specified axis

Parameters:

axis (str or list of str)

legend()

add a legend to the figure

plot(data, color=None, label=None, **kwargs)

plot data on parallel axes

Parameters:
  • data (DataFrame or Series)

  • color (valid mpl color, optional)

  • label (str, optional)

  • **kwargs – any additional kwargs will be passed to matplotlib’s plot method.

Data is normalized using the limits specified when initializing ParallelAxes.

ema_workbench.analysis.parcoords.get_limits(data)

helper function to get the limits of a DataFrame that can serve as input to ParallelAxes

Parameters:

data (DataFrame)

Return type:

DataFrame

b_and_w_plotting

This module offers basic functionality for converting a matplotlib figure to black and white. The provided functionality is largely determined by what is needed for the workbench.

ema_workbench.analysis.b_and_w_plotting.set_fig_to_bw(fig, style='hatching', line_style='continuous')

TODO it would be nice if for lines you can select either markers, gray scale, or simple black

Take each axes in the figure and transform its content to black and white. Lines are transformed based on different line styles. Fills, such as those used in envelopes, are gray-scaled. Heatmaps are also gray-scaled.

Derived from and expanded upon: https://stackoverflow.com/questions/7358118/matplotlib-black-white-colormap-with-dashes-dots-etc

Parameters:
  • fig (figure) – the figure to transform to black and white

  • style ({HATCHING, GREYSCALE}) – parameter controlling how collections are transformed.

  • line_style (str, {'continuous', 'black', None})

plotting_util

Plotting utility functions

ema_workbench.analysis.plotting_util.COLOR_LIST = [(0.12156862745098039, 0.4666666666666667, 0.7058823529411765), (1.0, 0.4980392156862745, 0.054901960784313725), (0.17254901960784313, 0.6274509803921569, 0.17254901960784313), (0.8392156862745098, 0.15294117647058825, 0.1568627450980392), (0.5803921568627451, 0.403921568627451, 0.7411764705882353), (0.5490196078431373, 0.33725490196078434, 0.29411764705882354), (0.8901960784313725, 0.4666666666666667, 0.7607843137254902), (0.4980392156862745, 0.4980392156862745, 0.4980392156862745), (0.7372549019607844, 0.7411764705882353, 0.13333333333333333), (0.09019607843137255, 0.7450980392156863, 0.8117647058823529)]

Default color list

class ema_workbench.analysis.plotting_util.Density(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Enum for different types of density plots

BOXENPLOT = 'boxenplot'

constant for plotting density as a boxenplot

BOXPLOT = 'boxplot'

constant for plotting density as a boxplot

HIST = 'hist'

constant for plotting density as a histogram

KDE = 'kde'

constant for plotting density as a kernel density estimate

VIOLIN = 'violin'

constant for plotting density as a violin plot, which combines a Gaussian density estimate with a boxplot

class ema_workbench.analysis.plotting_util.LegendEnum(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Enum for different styles of legends

class ema_workbench.analysis.plotting_util.PlotType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
ENVELOPE = 'envelope'

constant for plotting envelopes

ENV_LIN = 'env_lin'

constant for plotting envelopes with lines

LINES = 'lines'

constant for plotting lines

scenario_discovery_util

Scenario discovery utilities used by both cart and prim

class ema_workbench.analysis.scenario_discovery_util.RuleInductionType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
BINARY = 'binary'

constant indicating binary classification mode. This is the most commonly used mode in scenario discovery

CLASSIFICATION = 'classification'

constant indicating classification mode

REGRESSION = 'regression'

constant indicating regression mode

Util

ema_exceptions

Exceptions and warning used internally by the EMA workbench. In line with advice given in PEP 8.

exception ema_workbench.util.ema_exceptions.CaseError(message, case, policy=None)

Error to be used when a particular run produces an error. The nature of the error can be specified in the message, along with the actual case that gave rise to the error.

exception ema_workbench.util.ema_exceptions.EMAError(*args)

Base EMA error

exception ema_workbench.util.ema_exceptions.EMAParallelError(*args)

parallel EMA error

exception ema_workbench.util.ema_exceptions.EMAWarning(*args)

base EMA warning class

ema_logging

This module contains code for logging EMA processes. It is modeled on the default logging approach that comes with Python. This logging system will also work in case of multiprocessing.

ema_workbench.util.ema_logging.get_rootlogger()

Returns root logger used by the EMA workbench

Return type:

the logger of the EMA workbench

ema_workbench.util.ema_logging.log_to_stderr(level=None, pass_root_logger_level=False)

Turn on logging and add a handler which prints to stderr

Parameters:
  • level (int) – minimum level of the messages that will be logged

  • pass_root_logger_level (bool, optional, default False) – if True, all module loggers will be set to the same logging level as the root logger. Setting this to True is recommended when using the MPIEvaluator.

ema_workbench.util.ema_logging.temporary_filter(name='EMA', level=0, functname=None)

temporarily filter log messages

Parameters:
  • name (str or list of str, optional) – logger on which to apply the filter.

  • level (int, or list of int, optional) – don’t log messages of this level or lower

  • functname (str or list of str, optional) – don’t log messages from this function

All modules have their own unique logger (e.g. ema_workbench.analysis.prim).

utilities

This module provides various convenience functions and classes.

ema_workbench.util.utilities.load_results(file_name)

load results from the specified tar.gz file. The file is assumed to have been saved using save_results.

Parameters:

file_name (str) – the path to the file

Raises:

IOError if file not found

ema_workbench.util.utilities.merge_results(results1, results2)

convenience function for merging the return from perform_experiments().

The function merges results2 with results1. For the experiments, it generates an empty array whose size equals the combined number of experiments, using the dtype of the experiments in results1. The function assumes that the ordering of dtypes and names is identical in both results.

A typical use case for this function is in combination with experiments_to_cases(). Using experiments_to_cases() one extracts the cases from a first set of experiments. One then performs these cases on a different model or policy, and then one wants to merge these new results with the old result for further analysis.

Parameters:
  • results1 (tuple) – first results to be merged

  • results2 (tuple) – second results to be merged

Return type:

the merged results

ema_workbench.util.utilities.process_replications(data, aggregation_func=<function mean>)

Convenience function for processing the replications of a stochastic model’s outcomes.

The default behavior is to take the mean of the replications. This reduces the dimensionality of the outcomes from (experiments * replications * outcome_shape) to (experiments * outcome_shape), where outcome_shape is 0-d for scalars, 1-d for time series, and 2-d for arrays.

The function can take either the outcomes (dictionary: keys are outcomes of interest, values are arrays of data) or the results (tuple: experiments as DataFrame, outcomes as dictionary) of a set of simulation experiments.

Parameters:
  • data (dict, tuple) – outcomes or results of a set of experiments

  • aggregation_func (callable, optional) – aggregation function to be applied; defaults to np.mean.

Return type:

dict, tuple

ema_workbench.util.utilities.save_results(results, file_name)

save the results to the specified tar.gz file.

How results are stored depends on the outcome type. Experiments are saved as csv. Scalar and <3D array outcomes are saved as csv files; higher-dimensional arrays are stored as .npy files.

Parameters:
  • results (tuple) – the return of perform_experiments

  • file_name (str) – the path of the file

Raises:

IOError if file not found
