New DiSCover algorithm performs well on PDC data sets

ProM 6.12 will contain a new discovery algorithm called DiSCover. This algorithm discovers an accepting Petri net from an event log using a collection of DFGs (Directly Follows Graphs), and it has been tested on the data sets of the Process Discovery Contests (PDC).
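For readers unfamiliar with DFGs, the sketch below counts the directly-follows relation of a small event log. It illustrates only the textbook definition of a single DFG; how DiSCover builds and combines its collection of DFGs is not described here, and the function name and the example log are hypothetical.

```python
from collections import Counter


def directly_follows_graph(log):
    """Count how often activity a is directly followed by activity b.

    `log` is a list of traces; each trace is a list of activity names.
    (Hypothetical helper for illustration, not part of ProM.)
    """
    dfg = Counter()
    for trace in log:
        # Pair every event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical example log with three traces.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]

dfg = directly_follows_graph(log)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

Each edge of the resulting graph carries the number of times the pair was observed, which is the kind of frequency information a noise threshold can act on.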

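| Data set | Maximum | Winner | DiSCover | DiSCover | DiSCover | DiSCover | DiSCover |
|---|---|---|---|---|---|---|---|
| Absolute threshold | | | 1 | 1 | 1 | 1 | 0 |
| Relative threshold | | | 10 | 5 | 2 | 1 | 0 |
| PDC 2016 | 200 | 193 | 145 | 154 | 173 | 188 | 190 |
| PDC 2017 | 200 | 197 | 160 | 163 | 171 | 170 | 169 |
| PDC 2019 | 900 | 898 | 649 | 728 | 795 | 819 | 845 |
| PDC 2020 | 100% | 76.2% | 31.2% | 55.9% | 76.8% | 86.5% | 67.8% |
| PDC 2021 | 100% | 96.2% | 76.0% | 92.1% | 97.6% | 98.2% | DNF |
| PDC 2022 | 100% | TBD | 87.2% | 88.5% | 87.3% | 83.8% | 66.6% |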

Because the DiSCover algorithm is non-deterministic, we ran it three times on every data set and report only the minimal numbers of true/false positives/negatives that the three runs agree on (as is usual for the PDC 2020 and later data sets). For example, if the first run yields 100 true positives, the second 105, and the third 95, then the number 95 is used.
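The aggregation over the three runs can be sketched as taking, per count, the minimum over all runs. This is a minimal illustration based on the example above, not the evaluation script that was actually used; the function name and the dictionary layout are assumptions.

```python
def minimal_counts(runs):
    """Per metric, keep the smallest count observed over all runs.

    `runs` is a list of dicts that map a metric name to a count.
    (Hypothetical helper; the real evaluation code is not shown in this post.)
    """
    metrics = runs[0].keys()
    return {metric: min(run[metric] for run in runs) for metric in metrics}


# The true-positive counts from the example in the text.
runs = [
    {"true_positives": 100},
    {"true_positives": 105},
    {"true_positives": 95},
]

reported = minimal_counts(runs)
print(reported)  # {'true_positives': 95}
```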

The winning result for the PDC 2022 is still to be decided (TBD). With both thresholds set to 0, the algorithm did not finish (DNF) in reasonable time for at least one of the event logs of the PDC 2021 data set. More than 15,000 DFGs are generated for this event log, which seems to be too many for lpsolve, which is used in one of the steps of the algorithm.

These results show that, with suitable values for both noise thresholds, the DiSCover algorithm outperforms the winning contributions for the PDC 2020 and PDC 2021 contests. For example, if both thresholds are set to 1, the DiSCover algorithm scores about 98.2% on the PDC 2021 contest, while the winning submission scored 96.2%.

The screenshot below shows the accepting Petri net discovered by DiSCover from the example event log of the Advanced Process Discovery chapter at the Summer School on Process Mining of 2022.

This shows that DiSCover can discover complex routing constructs.

The discovered accepting Petri net will be relaxed sound, but not necessarily sound. If both noise thresholds are set to 0, then the discovered accepting Petri net will be able to replay all traces from the event log successfully.

This work made use of the Dutch national e-infrastructure with the support of the SURF Cooperative using grant no. EINF-3334.