Eight new plug-ins have been added:
- Import Lenient Log from IEEE XES Log File
- Import Lenient Log from Compressed IEEE XES Log File
- Import Conforming Log from IEEE XES Log File
- Import Conforming Log from Compressed IEEE XES Log File
- Import Strictly Conforming Log from IEEE XES Log File
- Import Strictly Conforming Log from Compressed IEEE XES Log File
- Export Log to IEEE XES File
- Export Log to Compressed IEEE XES File
These plug-ins are contained in the new XESStandardConnector package, which connects the new XESStandard package to the Log package, and hence to OpenXES. The XESStandard package contains the implementation of the IEEE XES Standard (see my earlier post on the approval of the IEEE XES Standard), and the basic import and export functionality.
Input
The import plug-ins all take an IEEE XES Log file (either compressed or not), whereas the export plug-ins take an OpenXES XLog object.
Output
The import plug-ins result in an OpenXES XLog object (or null, if the import fails), the export plug-ins result in a (compressed or not) IEEE XES Log file.
Packages
XESStandard and XESStandardConnector
Description
The import plug-ins come in three flavors: Lenient, Conforming, and Strictly Conforming. These flavors all use the same parser, but their reaction on any issues raised while parsing is different. Issues can come in two flavors:
- Conforming: This issue prevents the Log file from conforming to the XES Standard. As an example, the required log attribute “xes.version” may be missing.
- Strictly Conforming: This issue prevents the Log file from being strictly conforming, but not from being conforming. As an example, a log attribute “openxes.version” does not make the file non-conforming, but it does make it non-strictly-conforming,
Lenient
The Lenient import plug-ins consider every raised issue to be mere warnings. These import plug-ins typically result in a proper OpenXES XLog object, while up to the first 10 warnings will be shown to the user.
Conforming
The Conforming import plug-ins consider every Conforming issue as an error, but every Strictly Conforming issue as a warning. If any errors are found, that is, if any issue was raised that prevent the file from being conforming, the parsed XLog object is ignored and null is returned, and the errors are shown to the user.
Otherwise, the parsed XLog object is returned and the warnings (if any) are shown to the user.
Strictly Conforming
The Strictly Conforming import plug-in considers all issues to be errors. If any errors are found, that is, if any issue was raised that prevent the file from being conforming, the parsed XLog object is ignored and null is returned, and the errors are shown to the user.
Otherwise, the parsed XLog object is returned.
Miscellaneous
Duplication of keys and values
The parser implemented in the XESStandard package tries to avoid duplication of attribute keys and attribute values.
For attribute keys, this is always possible and almost always useful, as a key will almost always appear for many event, or for many traces. As a result, the key will be stored only once, and not over and over again.
For attribute values this is not always useful. For example, for booleans, integer numbers, and real numbers a reference to a unique boolean, integer number or real number takes as much space (typically) as the boolean, integer number, or real number itself. Therefore, for these types unique values are not used.
For strings, ids, and timestamps it is considered useful, as these object require more space than a reference to a unique object. However, using unique values makes no sense for some attributes, while it may make sense for others. The implementation uses the classifiers in the log to decide whether or not to use unique values for attributes: If the attribute key is part of some classifier, then apparently this attribute has a reduced number of possible values (as otherwise there would not be a need to have it part of a classifier). Hence, for these types, if the attribute key is part of some classifier, then every value will be stored only once, and not over and over again.
For lists, this has not been done yet, as we see no real use for the time being. But if logs start to appear where lists are being classified, we should treat lists like we treat strings, ids, and timestamps.
As a result, the presence/absence of classifiers may determine the memory footprint required for storing a log. If you know that an attribute has a limited set of possible values, then it may make sense to add a classifier for that attribute to the log.
Handling extensions
The standard extensions are hard-coded in the XESStandard package. As a result, the parser will not download the corresponding xesext files, which speeds up the process. If the parser encounters a non-standard extension in a log file, the parser will download and parse the corresponding xesext file, unless it was already downloaded and parsed by the same ProM instance.
As a result, the handling of non-standard extensions may take considerable more time than the handling of standard extensions. This should be a motivation for the developers of extension to make them standard…