ProM 6.12 will contain a new discovery algorithm called DiSCover. This algorithm discovers an accepting Petri net from an event log using a collection of DFGs (Directly Follows Graphs). From every DFG, a state machine WF-net will be discovered which will then be merged by synchronizing them on the visible transitions (that is, on the activities). Please install the DiSCover package using the ProM Package Manager to be able to use this plugin.
The plugin is available in ProM 6.12 as the “DiSCover Petri net (user)” plugin, and takes an event log (XLog) as input. It first shows a dialog that allows the user to change the possible parameter values.
Parameters
Absolute threshold
The absolute frequency a directly-follows relation must have. If this threshold is not reached, it is treated as noise. Default value: 1.
Relative threshold
The relative frequency a directly-follows relation must have. If this threshold is not reached, and the value is not safe (see below), it is treated as noise. The frequency is taken relative to the maximum frequency over all directly-follows relations that have the same source or the same target. Default value: 1.
Safety threshold
The relative frequency percentage which a directly-follows relation is safe from being treated as noise. The frequency percentage is taken relative to the minimum frequency over all directly-follows relations that have the same source or the same target. Default value: 95.
Limit on number of components
How many components (every DFG will result in a component) will be merged into the discovered accepting Petri net. If more components are available, the selected components will be spread out evenly over all components. Default value: 20.
Merge activities
If (not) selected, the components will be (not) merged by synchronizing the visible transitions. Default value: true.
Reduce Petri net
If (not) selected, the resulting Petri net will (not) be reduced using well-known behavior-preserving reduction rules. Default value: true.
Use veto for noise
If selected, a frequency will only be treated as noise if all DFGs agree that it should be treated as noise. Default value: false.
The actual discovery
Once the parameters values have been set, the plugin will start the discovery, which follows a number of steps:
- The DFG is discovered.
- All pairs of concurrent activities are discovered.
- All maximal subsets of non-concurrent activities are discovered.
- A DFG is discovered for every maximal subset.
- A component (state machine WF-net) is discovered from every DFG. This can be done in a straightforward way as we assume that all activities in the subset are not concurrent.
- If required, the number of components is limited to a reasonable number.
- All remaining components are merged (if required) on the visible transitions, and all components are connected to an AND-split construct (at the start) and an AND-join construct (at the end). This results in an accepting Petri net.
- If required, the accepting Petri net is reduced.
- The resulting accepting Petri net is returned.
Example result
This shows the result of the DiSCover plugin on the event log we used for the chapter on Advanced Process Discovery at the Summer School on Process Mining in 2022, with default parameter values except for both the absolute and relative threshold, which are both set to 0.
This shows the result if the activities are not merged: A combination of four state machine WF-nets. Three of these are quite simple, but as the order between c and e is not simple, the fourth is less simple.
Open issues
- The part of the algorithm that limits the number of components is non-deterministic. It first uses lpsolve to select a collection of relevant components, after which the number of components is limited down further if needed. The first part (with lpsolve) is non-deterministic.
- One some event logs using some parameter settings, lpsolve also takes days (may be caused by the fact that it starts from more than 15000 components…). At the moment, there is no option to skip this lpsolve part. might be a good idea.
- Additional reduction rules would be nice. As an example in the merged result above, the two silent transition that skip f (one from b to g, the other from c or d to g) can be merged into a single silent transition. Such additional reduction rules may help battling the complexity of the resulting net.
- Because of the complexity of the discovered nets, the alignment-based replayer often fails to replay traces on the discovered nets within 10 minutes. We created a token-based replayer that exploits the fact that the net is safe. This replayer is very quick, and is deterministic as long as everything fits, but any non-fitting transition may produce additional tokens breaking the safeness of the net. This results in non-deterministic behavior of this replayer. As a result, the reported (token-based) fitness is an underestimate of the real (token-based) fitness.
Version history
- 6.12.50: Version included in alpha candidate of ProM 6.12.