PDC 2022 Data Set Disclosed

The data set of the Process Discovery Contest of 2022 (PDC 2022) has now been disclosed. It contains the following five folders (in ZIP archives):

  • Ground Truth Logs: The 96 test logs (.xes format) as classified by the corresponding models.
  • Models: The 96 original workflow nets (.pnml format) used to generate the logs.
  • Test Logs: The 96 logs (.xes format) to be classified using the models that the submitted algorithm discovers from the training logs.
  • Base Logs: The 96 logs (.xes format) against which the test logs are classified, using the models that the submitted algorithm discovers from the training logs.
  • Training Logs: The 480 logs (.xes format) from which the submitted algorithm discovers the models.

A trace from the Test log is classified against the corresponding trace from the Base log. If the trace from the Test log fits the model strictly better than the corresponding trace from the Base log, then the trace from the Test log is classified as positive.
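The classification rule can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the data set description: each trace has already been given a numeric fitness score against the discovered model (higher meaning a better fit), corresponding traces share the same position in both lists, and any trace that is not positive is treated as negative. The function name classify_traces and the example scores are hypothetical.

```python
from typing import List, Sequence


def classify_traces(test_fitness: Sequence[float],
                    base_fitness: Sequence[float]) -> List[bool]:
    """Classify each Test-log trace against its Base-log counterpart.

    Assumption: the i-th score in each sequence belongs to the i-th pair
    of corresponding traces, and a higher score means a better fit.
    """
    if len(test_fitness) != len(base_fitness):
        raise ValueError("Test and Base logs must contain the same number of traces")
    # Positive only when the Test trace fits strictly better; ties are not positive.
    return [t > b for t, b in zip(test_fitness, base_fitness)]


# Hypothetical fitness scores for three pairs of corresponding traces.
labels = classify_traces([1.0, 0.5, 0.7], [0.8, 0.5, 0.9])
print(labels)
```

The strict comparison matters here: the second pair has equal scores, so under this reading that Test trace is not classified as positive.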

Citation

H. M. W. Verbeek: Process Discovery Contest 2022. 4TU.ResearchData, 2022.