pairs_plotting

R-style pairs plotting functionality.

ema_workbench.analysis.pairs_plotting.pairs_density(experiments: DataFrame, outcomes: dict[str, ndarray], outcomes_to_show: list[str] | None = None, group_by: str | None = None, grouping_specifiers=None, ylabels: dict[str, str] | None = None, point_in_time: int = -1, log: bool = True, gridsize: int = 50, colormap: str = 'coolwarm', filter_scalar: bool = True) → tuple[Figure, dict[str, Axes]]

Generate a pairs hexbin density multiplot.

In case of time-series data, the end states are used unless point_in_time is given.

hexbin makes a hexagonal binning plot of x versus y, where x and y are 1-D sequences of the same length, N. If C is None (the default), the plot is a histogram of the number of observations at (x[i], y[i]). For further detail, see the matplotlib documentation on hexbin.
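As a rough illustration of the binning that pairs_density builds on, the sketch below calls matplotlib's Axes.hexbin directly on synthetic data. The data, the variable names, and the use of the non-interactive Agg backend are assumptions made for this example; gridsize=50 and cmap="coolwarm" simply mirror the defaults listed in the signature above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the end states of two outcomes (assumption).
rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.5, size=n)

fig, ax = plt.subplots()
# C is left at its default (None), so each hexagon holds a count of points.
# mincnt=1 hides hexagons that contain no observations.
hb = ax.hexbin(x, y, gridsize=50, cmap="coolwarm", mincnt=1)
fig.colorbar(hb, ax=ax, label="count")

counts = hb.get_array()  # one count per displayed hexagon
```

Passing bins="log" to hexbin would color the hexagons on a logarithmic scale, which is the matplotlib-level counterpart of a log-scaled density.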

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of the outcomes of interest that you want to plot.

  • group_by (str, optional) – name of the column in the experiments DataFrame to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – dictionary with the outcome names as keys; the values are used as labels for the y axis.

  • point_in_time (float, optional) – the point in time at which the density plot is to be made. With the default of -1, the end states are used. point_in_time should be a valid value on the time axis.

  • log (bool, optional) – indicating whether density should be log scaled. Defaults to True.

  • gridsize (int, optional) – controls the grid size for the hexagonal binning. Defaults to 50.

  • colormap (str, optional) – color map that is to be used in generating the hexbin. For details on the available maps, see pylab. Defaults to 'coolwarm'.

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to True.

Returns:

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance
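The interplay of point_in_time and filter_scalar can be pictured with plain NumPy. This is a sketch of the idea only, not the workbench's own code. It assumes that each time-series outcome is a 2-D array of shape (n_experiments, n_timesteps), that each scalar outcome is a 1-D array, and that point_in_time acts as a positional index along the time axis, which matches the default of -1 but may differ from how the workbench resolves other values. The helper name select_point_in_time and the outcome names are hypothetical.

```python
import numpy as np


def select_point_in_time(outcomes, point_in_time=-1, filter_scalar=True):
    """Reduce each outcome to one value per experiment (illustrative helper)."""
    reduced = {}
    for name, values in outcomes.items():
        values = np.asarray(values)
        if values.ndim == 1:
            # Scalar outcome: already one value per experiment.
            if not filter_scalar:
                reduced[name] = values
            continue
        # Time-series outcome: index along the time axis; -1 is the end state.
        reduced[name] = values[:, point_in_time]
    return reduced


outcomes = {
    "population": np.arange(12).reshape(3, 4),  # 3 experiments, 4 time steps
    "cost": np.array([10.0, 20.0, 30.0]),       # scalar outcome
}
end_states = select_point_in_time(outcomes)
with_scalars = select_point_in_time(outcomes, filter_scalar=False)
```

With filter_scalar left at True, only the time-series outcome survives; with filter_scalar=False, the scalar outcome is passed through unchanged.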

ema_workbench.analysis.pairs_plotting.pairs_lines(experiments: DataFrame, outcomes: dict[str, ndarray], outcomes_to_show: list[str] | None = None, group_by: str | None = None, grouping_specifiers=None, ylabels: dict[str, str] | None = None, legend: bool = True, **kwargs) → tuple[Figure, dict[str, Axes]]

Generate a pairs lines multiplot.

It shows the behavior of two outcomes over time against each other. The origin is denoted with a circle and the end is denoted with a ‘+’.

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of the outcomes of interest that you want to plot.

  • group_by (str, optional) – name of the column in the experiments DataFrame to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – dictionary with the outcome names as keys; the values are used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. point_in_time should be a valid value on the time axis. Note that point_in_time does not appear in the pairs_lines signature shown above.

Returns:

  • fig – the figure instance

  • dict – key is tuple of names of outcomes, value is associated axes instance
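To make the circle and '+' convention concrete, here is a small matplotlib sketch that draws a single trajectory of one outcome against another. It is an illustration with synthetic data and hypothetical outcome names (prey, predators), not a reproduction of pairs_lines.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# One experiment, two synthetic outcomes over 50 time steps (assumption).
t = np.linspace(0.0, 1.0, 50)
prey = np.cos(2 * np.pi * t) * (1.0 - 0.5 * t)
predators = np.sin(2 * np.pi * t) * (1.0 - 0.5 * t)

fig, ax = plt.subplots()
(trajectory,) = ax.plot(prey, predators, color="C0")
# Mark the origin of the trajectory with a circle and its end with a plus.
ax.scatter(prey[0], predators[0], marker="o", color="C0")
ax.scatter(prey[-1], predators[-1], marker="+", color="C0")
ax.set_xlabel("prey")
ax.set_ylabel("predators")
```

Time is implicit in such a plot: it only determines the order in which the points along the line are connected.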

ema_workbench.analysis.pairs_plotting.pairs_scatter(experiments: DataFrame, outcomes: dict[str, ndarray], outcomes_to_show: list[str] | None = None, group_by: str | None = None, grouping_specifiers=None, ylabels: dict[str, str] | None = None, legend: bool = True, point_in_time: int = -1, filter_scalar: bool = False, **kwargs) → tuple[Figure, dict[str, Axes]]

Generate a pairs scatter multiplot.

In case of time-series data, the end states are used unless point_in_time is given.

Parameters:
  • experiments (DataFrame)

  • outcomes (dict)

  • outcomes_to_show (list of str, optional) – list of the outcomes of interest that you want to plot.

  • group_by (str, optional) – name of the column in the experiments DataFrame to group results by. Alternatively, index can be used to use indexing arrays as the basis for grouping.

  • grouping_specifiers (dict, optional) – dict of categories to be used as a basis for grouping by. grouping_specifiers is only meaningful if group_by is provided as well. In case of grouping by index, the grouping specifiers should be in a dictionary where the key denotes the name of the group.

  • ylabels (dict, optional) – dictionary with the outcome names as keys; the values are used as labels for the y axis.

  • legend (bool, optional) – if true, and group_by is given, show a legend.

  • point_in_time (float, optional) – the point in time at which the scatter is to be made. With the default of -1, the end states are used. point_in_time should be a valid value on the time axis.

  • filter_scalar (bool, optional) – remove the non-time-series outcomes. Defaults to False.

Returns:

  • fig (Figure instance) – the figure instance

  • axes (dict) – key is tuple of names of outcomes, value is associated axes instance

Note: the current implementation is limited to seven different categories in case of column, categories, and/or discretesize. This limit is due to the colors specified in COLOR_LIST.
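The grouping arguments and the seven-category limit can be sketched with pandas. The column names, the category values, and the seven-entry color list below are all placeholders chosen for this example; in particular, the list stands in for COLOR_LIST and does not reproduce its actual contents. A plain list of category values is used for simplicity, although the parameter is documented as a dict.

```python
import numpy as np
import pandas as pd

# Placeholder palette with seven entries, standing in for COLOR_LIST.
colors = ["C0", "C1", "C2", "C3", "C4", "C5", "C6"]

experiments = pd.DataFrame(
    {
        "policy": ["none", "tax", "subsidy", "tax", "none", "subsidy"],
        "growth": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)
end_states = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # one value per experiment

group_by = "policy"
grouping_specifiers = ["none", "tax"]  # only these categories are kept

# One color per category, so the palette length caps the number of groups.
if len(grouping_specifiers) > len(colors):
    raise ValueError("more categories than available colors")

grouped = {}
for color, category in zip(colors, grouping_specifiers):
    mask = (experiments[group_by] == category).to_numpy()
    grouped[category] = (color, end_states[mask])
```

Each entry in grouped pairs a color with the outcome values of the experiments in that category, which is the information a grouped scatter needs per group.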