Where are the service-technology.org tools for Services, Collaborative Processes, Orchestration, Choreographies, Partner and Controller Synthesis, and Operating Guidelines?

Posted on March 10, 2021 by dirkfahland

tl;dr: at https://github.com/nlohmann/service-technology.org

This convoluted title is the result of multiple reviews I have written on recent papers in the area of analyzing interacting processes. The problem comes in many forms and goes by different names, depending on its flavor.

Collaborative processes or process collaboration: several processes interact with each other via message or data exchange. The assumption is that all processes are already designed to ensure that the interaction is error-free and achieves the desired goals. That assumption is usually not met initially, and the following two basic approaches to designing an error-free collaboration have emerged.
Orchestration: several processes are given that “want” to collaborate or shall be composed to achieve a joint goal/execution. Create an “orchestrator” process that interacts with each of the processes individually and thereby ensures messages are passed between processes in the right order; the orchestrator knows the state of the global conversation between all processes.
Choreography: several processes are given that “want” to collaborate or shall be composed to achieve a joint goal/execution. Design the processes in a way that they can directly interact with each other (without an intermediate orchestrator) and achieve the goal; this may require modifying particular processes (changing the order of messages exchanged, sending an additional message).

In the past, these problems were studied under middleware systems (CORBA), later under the Web Services paradigm with BPEL4WS and WS-BPEL and as collaborative processes under BPMN, and now under the umbrella of microservices and the execution of distributed processes via blockchains.

Every year I review 2-3 papers in this space that are not aware of the fundamental problems of distributed behavior in choreographies and orchestrations (looking at you, microservices and blockchains), so this post may help to catch up on missing knowledge and to become aware of the various tools that can already help in solving these problems.

The behavior of an individual process is often modeled with BPMN. The control-flow part of BPMN addresses (in some form) the same use case as Petri nets: modeling distributed behavior. Where you can see Petri nets as the bare essence of concepts for distributed behavior (assembler level), BPMN has a sugar-coated syntax for composite behavior. That means verifying a BPMN model is verifying behavioral properties. The standard techniques that have extensive tool support are:

Behavior preserving translation from BPMN to Petri nets, see Dijkman, R., M. Dumas and C. Ouyang. “Semantics and analysis of business process models in BPMN.” Inf. Softw. Technol. 50 (2008): 1281-1294. https://www.semanticscholar.org/paper/Semantics-and-analysis-of-business-process-models-Dijkman-Dumas/4d4299bfd0ef670b2f913103b853f6394ed026a7

Or translate to Coloured Petri Nets (+timing and some data aspects kept within the orchestration process), see Meghzili, Said, A. Chaoui, M. Strecker and E. Kerkouche. “An Approach for the Transformation and Verification of BPMN Models to Colored Petri Nets Models.” Int. J. Softw. Innov. 8 (2020): 17-49. https://www.semanticscholar.org/paper/An-Approach-for-the-Transformation-and-Verification-Meghzili-Chaoui/d77b07092a7b7c550fda664ff83660f9060e567d

Then apply standard model checkers for verifying the behavior by checking absence of deadlocks (fast enough for edit-time verification in the editor), see Fahland, Dirk, C. Favre, J. Koehler, Niels Lohmann, H. Völzer and K. Wolf. “Analysis on demand: Instantaneous soundness checking of industrial business process models.” Data Knowl. Eng. 70 (2011): 448-466. https://www.semanticscholar.org/paper/Analysis-on-demand%3A-Instantaneous-soundness-of-Fahland-Favre/995fd8543d1b1547fd3f46a252d2fd2c4f4966f3

Or check any linear-time temporal logic property of the behavior you are interested in (by specifying LTL formulas over states in the BPMN/Petri net), see Wolf, K.. “Petri Net Model Checking with LoLA 2.” Petri Nets (2018). https://www.semanticscholar.org/paper/Petri-Net-Model-Checking-with-LoLA-2-Wolf/d73ec65c0d5e41016797eb3de71fa896597ab337

But the actual problem lies in analyzing, verifying, and reasoning about distributed compositions of processes, which is inherently a branching-time problem (not a linear-time problem). If you distribute the tasks in your process over multiple components, race conditions creep in and you can end up with non-local, uncontrolled choices and deadlocks. A process X cannot observe the internal steps of process Y, especially not decisions made within Y. X can only observe which messages it receives from Y and then infer in which state Y might be. However, Y may take steps/decisions that are not communicated to X, yet X has to make a decision between two options, and only one of them will be compatible with how Y progressed.
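To make the uncontrolled-choice problem concrete, here is a minimal sketch in Java. It is not one of the service-technology.org tools; the class CompositionDeadlocks, the encoding of processes as step tables, and the two example processes are assumptions made purely for illustration. Y silently decides to wait for message a or for message b, X has to pick one of the two messages without seeing that decision, and a plain exploration of the (finite) composed state space with asynchronous message passing finds the two deadlocks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CompositionDeadlocks {

    /** A process with steps {from, action, to}; action is "!m" (send m), "?m" (receive m) or "tau" (internal). */
    public static final class Proc {
        final String initial;
        final Set<String> finals;
        final String[][] steps;

        public Proc(String initial, Set<String> finals, String[][] steps) {
            this.initial = initial;
            this.finals = finals;
            this.steps = steps;
        }
    }

    /** Applies an action to the sorted, comma-separated bag of messages in transit; null if not enabled. */
    static String apply(String pending, String action) {
        if (action.equals("tau")) return pending;
        List<String> msgs = new ArrayList<>();
        if (!pending.isEmpty()) msgs.addAll(Arrays.asList(pending.split(",")));
        String m = action.substring(1);
        if (action.startsWith("!")) msgs.add(m);
        else if (!msgs.remove(m)) return null; // "?m" while m is not in transit
        Collections.sort(msgs);
        return String.join(",", msgs);
    }

    /** Explores all reachable global states and returns the stuck, non-final ones as "x|y|pending". */
    public static Set<String> findDeadlocks(Proc x, Proc y) {
        Set<String> seen = new HashSet<>();
        Set<String> deadlocks = new TreeSet<>();
        Deque<String[]> todo = new ArrayDeque<>();
        todo.add(new String[] {x.initial, y.initial, ""});
        seen.add(x.initial + "|" + y.initial + "|");
        while (!todo.isEmpty()) {
            String[] g = todo.poll(); // g = {state of X, state of Y, pending messages}
            List<String[]> next = new ArrayList<>();
            for (String[] s : x.steps) {
                String p = s[0].equals(g[0]) ? apply(g[2], s[1]) : null;
                if (p != null) next.add(new String[] {s[2], g[1], p});
            }
            for (String[] s : y.steps) {
                String p = s[0].equals(g[1]) ? apply(g[2], s[1]) : null;
                if (p != null) next.add(new String[] {g[0], s[2], p});
            }
            boolean done = x.finals.contains(g[0]) && y.finals.contains(g[1]) && g[2].isEmpty();
            if (next.isEmpty() && !done) deadlocks.add(String.join("|", g));
            for (String[] n : next) {
                if (seen.add(String.join("|", n))) todo.add(n);
            }
        }
        return deadlocks;
    }

    /** Hypothetical process X: must choose between sending a or b. */
    public static Proc exampleX() {
        return new Proc("x0", Collections.singleton("x1"),
            new String[][] {{"x0", "!a", "x1"}, {"x0", "!b", "x1"}});
    }

    /** Hypothetical process Y: decides internally (tau) whether it will accept a or b. */
    public static Proc exampleY() {
        return new Proc("y0", Collections.singleton("yend"),
            new String[][] {{"y0", "tau", "ya"}, {"y0", "tau", "yb"},
                            {"ya", "?a", "yend"}, {"yb", "?b", "yend"}});
    }

    public static void main(String[] args) {
        System.out.println(findDeadlocks(exampleX(), exampleY())); // prints [x1|ya|b, x1|yb|a]
    }
}
```

The sketch only terminates for compositions with finitely many reachable states (no process may keep sending messages in a loop), and it does no state-space reduction at all, which is exactly what the dedicated tools discussed here add.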

Some processes cannot be implemented in a distributed way: here is how to check & prevent these: Decker, G., A. Barros, Frank Michael Kraft and Niels Lohmann. “Non-desynchronizable Service Choreographies.” ICSOC (2008). https://www.semanticscholar.org/paper/Non-desynchronizable-Service-Choreographies-Decker-Barros/479c9e05aaa0ff7bad230b34b957036e2c8d4336

To avoid this during design, you can create a top-down step-wise refinement of a global behavioral contract; along the way you use branching-time verification techniques to ensure your local component satisfies the global contract, see Aalst, W. V., Niels Lohmann, Peter Massuthe, C. Stahl and K. Wolf. “Multiparty Contracts: Agreeing and Implementing Interorganizational Processes.” Comput. J. 53 (2010): 90-106. https://www.semanticscholar.org/paper/Multiparty-Contracts%3A-Agreeing-and-Implementing-Aalst-Lohmann/8f97489373828b560aa2513a1f421afb9294f7b9

There’s much more that builds on this branching-time theory for modeling and verifying component behavior and composition:

deciding and computing valid substitutions of one component by another, see Stahl, C. and K. Wolf. “Deciding service composition and substitutability using extended operating guidelines.” Data Knowl. Eng. 68 (2009): 819-833. https://www.semanticscholar.org/paper/Deciding-service-composition-and-substitutability-Stahl-Wolf/bfaa1aba6c7d54671d1abd89fcb880b12a11ef03
repairing ill-defined components by finding behaviorally similar components (edit-distance-based) that satisfy the desired properties, see Lohmann, Niels. “Correcting Deadlocking Service Choreographies Using a Simulation-Based Graph Edit Distance.” BPM (2008). https://www.semanticscholar.org/paper/Correcting-Deadlocking-Service-Choreographies-Using-Lohmann/9c40607ff15f0bcde40cf33ec0acf72a714cf0a4
automated test case generation to run tests against the implementation of your distributed process, see Kaschner, Kathrin and Niels Lohmann. “Automatic Test Case Generation for Interacting Services.” ICSOC Workshops (2008). https://www.semanticscholar.org/paper/Automatic-Test-Case-Generation-for-Interacting-Kaschner-Lohmann/0ff350a679d623f47ae9243ccae9847ee4e662a3

Most of these techniques and solutions rely on two base techniques:

Given a process X (or a partial composition of processes X = X1+…+Xk) construct/synthesize a controller (or partner process) Y that triggers the steps in X or sends/receives messages to trigger steps in X so that X achieves its goal (terminates in a goal state). This is called controller synthesis or partner synthesis and you can synthesize most permissive controllers (that allow as much behavior as possible in X) or specific controllers. Controller synthesis is highly efficient as it can use state-space reduction techniques of modern model checkers to explore only a limited part of the state-space. Controllers of large process compositions can be synthesized in milliseconds to seconds, see Lohmann, Niels and D. Weinberg. “Wendy: A Tool to Synthesize Partners for Services.” Fundam. Informaticae 113 (2011): 295-311. https://www.semanticscholar.org/paper/Wendy%3A-A-Tool-to-Synthesize-Partners-for-Services-Lohmann-Weinberg/fb75ab94d2324c773aa1b56876af5059e9e21da5
Given a process X (or a partial composition of processes X = X1+…+Xk) construct/synthesize a specification S of all controllers/partners for X. The specification is the most permissive controller of X annotated with which subsets of behavior are also controllers/partners of X. The specification S is expensive to compute, but after it has been computed all kinds of controllers/partners of X can be derived in linear time from S and it can be checked whether any independently designed process Y can be composed with X, see Lohmann, Niels, Peter Massuthe and K. Wolf. “Operating Guidelines for Finite-State Services.” ICATPN (2007). https://www.semanticscholar.org/paper/Operating-Guidelines-for-Finite-State-Services-Lohmann-Massuthe/4c77ebfb82857802a87d578b3fa83405c33b49e0

All these techniques were implemented in various (efficient) open-source prototype tools presented and hosted on http://www.service-technology.org, which is currently offline. The Internet Archive still has a snapshot at https://web.archive.org/web/20181228162401/http://service-technology.org/ but the tools are no longer available there.

Luckily, the entire source code repository of all service-technology.org tools has been cloned to GitHub: https://github.com/nlohmann/service-technology.org, making them available to the current generation of research on this topic.

Mathe-Aufgaben bis 20 / Math-Exercises up to 20

Posted on March 18, 2020 by dirkfahland

Anyone now in the situation of having to practice or teach math at home may like this worksheet with a fresh batch of exercises for adding and subtracting up to 20.

exercises_1_to_20_sum_minus (.docx)
exercises_1_to_20_sum_minus (.pdf)

And here is the Python code to create new exercises. You can easily adjust the number of exercises and the number range (smallest/largest number to occur).

https://github.com/dfahland/scripts/blob/master/learning/create_math_exercises_sum_minus.py
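For readers who want to see the idea without opening the repository, here is a minimal sketch of such a generator. It is written in Java (to match the other code in this blog) and is not a port of the linked Python script; the class MathExercises, its parameters, and the rule that both operands and the result stay within the chosen range are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MathExercises {

    /** Returns exercises as {a, sign, b} with sign +1 or -1; operands and result all lie in [lo, hi]. */
    public static List<int[]> generate(int count, int lo, int hi, long seed) {
        if (lo < 0 || hi < 2 * lo) {
            throw new IllegalArgumentException("need lo >= 0 and hi >= 2 * lo");
        }
        Random rnd = new Random(seed);
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (rnd.nextBoolean()) {
                int a = lo + rnd.nextInt(hi - 2 * lo + 1);     // a in [lo, hi - lo]
                int b = lo + rnd.nextInt(hi - a - lo + 1);     // b in [lo, hi - a], so a + b <= hi
                out.add(new int[] {a, +1, b});
            } else {
                int a = 2 * lo + rnd.nextInt(hi - 2 * lo + 1); // a in [2 * lo, hi]
                int b = lo + rnd.nextInt(a - 2 * lo + 1);      // b in [lo, a - lo], so a - b >= lo
                out.add(new int[] {a, -1, b});
            }
        }
        return out;
    }

    /** Formats one exercise as it would appear on the worksheet, e.g. "7 + 5 = ". */
    public static String format(int[] exercise) {
        return exercise[0] + (exercise[1] > 0 ? " + " : " - ") + exercise[2] + " = ";
    }

    /** Checks that operands and results of all exercises stay within [lo, hi]. */
    public static boolean allWithinRange(List<int[]> exercises, int lo, int hi) {
        for (int[] e : exercises) {
            int result = e[0] + e[1] * e[2];
            if (e[0] < lo || e[0] > hi || e[2] < lo || e[2] > hi || result < lo || result > hi) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int[] e : generate(20, 1, 20, System.currentTimeMillis())) {
            System.out.println(format(e));
        }
    }
}
```

Changing the arguments of generate adjusts the number of exercises and the smallest/largest number, analogous to what the Python script offers.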

How Do People Create Process Models?

Posted on September 12, 2016 by dirkfahland

Over the last 7 years, I have been collaborating with some colleagues on a number of experiments where we investigated how people create process models. In particular, we wanted to see where and how they differ and whether their personal unique “modeling style” has an impact on model quality. In this – rather long – blog post, I want to summarize what we found out and point to the different studies that we published. (To be honest, I collected this information for a Master student who wants to replicate some of these studies, but I might as well share it with others). So, here we go.

First experiment: organize your process description!

In 2010, we conducted a first structured experiment on quality of modeling outcomes. We compared how the way an informal requirements document is organized impacts the quality of the created model (modelers get a text about a process and have to create a graphical model – say in BPMN). Spoiler: a breadth-first description of the process works best.

Models were created more accurately when the process description was given in breadth-first order.

Jakob Pinggera, Stefan Zugal, Barbara Weber, Dirk Fahland, Matthias Weidlich, Jan Mendling, Hajo A. Reijers: How the Structuring of Domain Knowledge Helps Casual Process Modelers. ER 2010: 445-451 http://dx.doi.org/10.1007/978-3-642-16373-9_33

The conceptual background for this and subsequent experiments were two papers investigating the nature of modeling languages regarding how they use particular modeling concepts to structure knowledge about a process.

Dirk Fahland, Daniel Lübke, Jan Mendling, Hajo A. Reijers, Barbara Weber, Matthias Weidlich, Stefan Zugal: Declarative versus Imperative Process Modeling Languages: The Issue of Understandability. BMMDS/EMMSAD 2009: 353-366 http://dx.doi.org/10.1007/978-3-642-01862-6_29
Dirk Fahland, Jan Mendling, Hajo A. Reijers, Barbara Weber, Matthias Weidlich, Stefan Zugal: Declarative versus Imperative Process Modeling Languages: The Issue of Maintainability. Business Process Management Workshops 2009: 477-488 http://dx.doi.org/10.1007/978-3-642-12186-9_4

Visualizing how people model

In 2011, we published a paper describing a software platform for recording and analyzing modeling actions on a canvas. We also describe the visualization of modeling actions in a time-series diagram where specific phases in the modeling process (creating elements, arranging existing elements, deleting elements, thinking about the process) can be identified and highlighted as illustrated below.

In the experiments, we could observe significant differences between how different modelers approach the same modeling task – manifesting itself in remarkably distinct modeling phase diagrams.

Jakob Pinggera, Stefan Zugal, Matthias Weidlich, Dirk Fahland, Barbara Weber, Jan Mendling, Hajo A. Reijers: Tracing the Process of Process Modeling with Modeling Phase Diagrams. Business Process Management Workshops (1) 2011: 370-382 http://dx.doi.org/10.1007/978-3-642-28108-2_36

Identifying modeling styles

In 2012, we further analyzed these differences in how modelers approach a modeling task. We plotted the number of creation, deletion, and re-arranging actions on the canvas as a time series. We binned these modeling actions into segments of 10 seconds length; each segment has a particular “modeling profile” of creation, deletion, and re-arranging actions. We then clustered users based on their “modeling profiles”, i.e., typical occurrences of create/delete/move actions throughout their modeling, and identified three unique clusters of “modeling profiles”. Below is the “modeling profile” of the cluster showing many creation operations early in the modeling and few delete operations.

Jakob Pinggera, Pnina Soffer, Stefan Zugal, Barbara Weber, Matthias Weidlich, Dirk Fahland, Hajo A. Reijers, Jan Mendling: Modeling Styles in Business Process Modeling. BMMDS/EMMSAD 2012: 151-166 http://dx.doi.org/10.1007/978-3-642-31072-0_11
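The binning step described above can be sketched in a few lines of Java. This is not the code used in the study; the class ModelingProfile, the Action enum, and the input format (timestamps in milliseconds since the start of the modeling session) are assumptions for illustration. The result has one row per 10-second segment with the number of create, delete, and move actions in that segment; the clustering of such profiles is not shown.

```java
public class ModelingProfile {

    /** The three kinds of canvas actions counted per segment. */
    public enum Action { CREATE, DELETE, MOVE }

    /**
     * Bins actions into consecutive segments of segmentMs milliseconds.
     * Row i of the result holds the counts {CREATE, DELETE, MOVE} of segment i.
     */
    public static int[][] profile(long[] timesMs, Action[] actions, long segmentMs) {
        if (timesMs.length != actions.length) {
            throw new IllegalArgumentException("one action per timestamp expected");
        }
        if (timesMs.length == 0) return new int[0][Action.values().length];
        long last = 0;
        for (long t : timesMs) last = Math.max(last, t);
        int segments = (int) (last / segmentMs) + 1;
        int[][] counts = new int[segments][Action.values().length];
        for (int i = 0; i < timesMs.length; i++) {
            counts[(int) (timesMs[i] / segmentMs)][actions[i].ordinal()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] times = {1000, 4000, 9999, 10000, 25000};
        Action[] actions = {Action.CREATE, Action.CREATE, Action.MOVE, Action.DELETE, Action.MOVE};
        for (int[] segment : profile(times, actions, 10_000)) {
            System.out.println(java.util.Arrays.toString(segment));
        }
    }
}
```

For the five example actions, the first segment contains two creations and one move, the second one deletion, and the third one move.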

We then conducted a subsequent, more detailed analysis of these clusters and also investigated the modeling phase diagrams of each cluster. First, we could establish that there are statistically significant differences between the three clusters in (1) speed of adding modeling elements, (2) duration of phases of improving the model layout and number of element moves in a layouting phase, (3) time between adding model elements, thinking about the model, and adding further model elements. Altogether, we could then characterize 3 unique modeling styles from these clusters:

Quick modelers who (after some initial deliberation on the process), create an almost correct model right away and only need minimal adjustments of model layout and few thinking pauses
Modelers who model at a slower pace and make regular and longer layouting breaks (possibly to plan their next modeling steps)
Modelers who also model at a slower pace but require less layouting than the previous group.

This analysis also gave us a first idea of which factors influence how people approach a modeling task. The two central factors are (1) the cognitive load created by the modeling task, largely influencing the efficiency with which the model is created, and (2) tool support for layouting, largely influencing the amount of time spent on organizing the model on the canvas.

Jakob Pinggera, Pnina Soffer, Dirk Fahland, Matthias Weidlich, Stefan Zugal, Barbara Weber, Hajo A. Reijers, Jan Mendling: Styles in business process modeling: an exploration and a model. Software and System Modeling 14(3): 1055-1080 (2015) http://dx.doi.org/10.1007/s10270-013-0349-1

Modeling style vs model quality

In a second line of analysis, we investigated how the way modelers create their models impacts the quality of the resulting model. By analyzing modeling operations at a more fine-grained level and also considering the modeling elements themselves, we could compare modeling processes at a more detailed level. Below, we see visualizations of four different modelers creating the same model (visualized using the DottedChart plugin of ProM). Each line corresponds to a modeling element (node or arc), green dots show creation operations, blue dots show move operations, and red dots show delete operations.

By analyzing the location of modeling elements on the canvas, and the time between different modeling activities, we could confirm three hypotheses:

structured modeling (e.g., in clearly defined blocks) is linked to better model quality,
lots of movement of modeling objects is linked to lower model quality, and
low modeling speed is linked to low model quality.

Jan Claes, Irene T. P. Vanderfeesten, Hajo A. Reijers, Jakob Pinggera, Matthias Weidlich, Stefan Zugal, Dirk Fahland, Barbara Weber, Jan Mendling, Geert Poels: Tying Process Model Quality to the Modeling Process: The Impact of Structuring, Movement, and Speed. BPM 2012: 33-48 http://dx.doi.org/10.1007/978-3-642-32885-5_3

The impact of structured modeling on modeling quality was analyzed further. In a further set of experiments, factors that impact the cognitive load of the modelers were analyzed. In particular, the researchers looked for factors that help to reduce the cognitive load of the modeler, thus leaving them more cognitive capacity to create correct models. Besides confirming and deepening the 2010 experiment (structured breadth-first organization of process knowledge improves model quality), the experiment also shows that the characteristics of the modeler impact model quality: A modeler may have a preference for structuring knowledge in a particular way. If process knowledge is presented to them in a way that fits their preference, the individual cognitive load is lower and model quality increases. The image below shows “aspect-oriented” modeling, where a modeler first finishes a first aspect of the model, then works on a second aspect that may involve many modeling elements created earlier.

Jan Claes, Irene T. P. Vanderfeesten, Frederik Gailly, Paul Grefen, Geert Poels: The Structured Process Modeling Theory (SPMT) a cognitive view on why and how modelers benefit from structuring the process of process modeling. Information Systems Frontiers 17(6): 1401-1425 (2015) http://dx.doi.org/10.1007/s10796-015-9585-y

The following, longer journal paper summarizes several techniques for visually analyzing the process of process modeling from various angles.

Jan Claes, Irene T. P. Vanderfeesten, Jakob Pinggera, Hajo A. Reijers, Barbara Weber, Geert Poels: A visual analysis of the process of process modeling. Inf. Syst. E-Business Management 13(1): 147-190 (2015) http://dx.doi.org/10.1007/s10257-014-0245-4

For the really interested, there are 2 PhD theses on the topic:

The Process of Process Modeling / Jakob Pinggera:
http://diglib.uibk.ac.at/ulbtirolhs/content/titleinfo/152632
Investigating the process of process modeling and its relation to modeling quality : the role of structured serialization / Jan Claes
http://repository.tue.nl/ebe36613-df88-4ad1-8f1f-75ee45108a89

Tutorial: Automating Process Mining with ProM’s Command Line Interface

Posted on March 11, 2015 by dirkfahland

In this blogpost I explain how to invoke the process mining tool ProM from the commandline without using its graphical user interface. This allows you to run process mining analyses on several logs in batch mode without user interaction. Before you get too excited: there are quite some limitations to this, which I will address in the end. The following instructions have been tested for the ProM 6.4.1 release.

Invoking the ProM Commandline Interface

The ProM commandline interface (CLI) can be invoked through the class

 org.processmining.contexts.cli.CLI

To properly invoke the CLI for ProM 6.4.1, use the following command (which is a copy of the command in ProM641.bat with changed main class).

java -da -Xmx1G -XX:MaxPermSize=256m -classpath ProM641.jar -Djava.util.Arrays.useLegacyMergeSort=true org.processmining.contexts.cli.CLI

The CLI itself has no interactive user interface. Instead, it executes scripts passed to it as a commandline parameter. To simplify your life, I suggest putting the command into a batch file ProM_CLI.bat or shell script ProM_CLI.sh that passes on 2 commandline parameters. For instance, as a batch file (in a shell script, write "$1" "$2" instead of %1 %2):

java -da -Xmx1G -XX:MaxPermSize=256m -classpath ProM641.jar -Djava.util.Arrays.useLegacyMergeSort=true org.processmining.contexts.cli.CLI %1 %2

A typical example script that the ProM CLI takes is the following script_alpha_miner.txt

System.out.println("Loading log");
log = open_xes_log_file("myLog.xes");

System.out.println("Mining model");
net_and_marking = alpha_miner(log);
net = net_and_marking[0];
marking = net_and_marking[1];

System.out.println("Saving net");
File net_file = new File("mined_net.pnml");
pnml_export_petri_net_(net, net_file);

System.out.println("done.");

You can invoke it with the command

ProM_CLI.bat -f script_alpha_miner.txt

It will read the log file myLog.xes (stored in the current working directory), invoke the alpha miner, and write the resulting Petri net as a PNML file mined_net.pnml to the current working directory. (No, there is currently no way to pass file names as additional commandline parameters to the script).
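Since file names cannot be passed to a script, one workaround for batch runs is to generate one script per log file. The following is a minimal sketch, assuming a small helper class ProMScriptGenerator (hypothetical, not part of ProM) that fills the log and output file names into the alpha miner script from above; the plugin method names are the ones used in that script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProMScriptGenerator {

    /** Turns a path into a Java string literal, escaping backslashes (Windows paths) and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns the text of an alpha miner script for one log file and one output file. */
    public static String scriptFor(String logPath, String pnmlPath) {
        return String.join("\n",
            "System.out.println(\"Loading log\");",
            "log = open_xes_log_file(" + quote(logPath) + ");",
            "System.out.println(\"Mining model\");",
            "net_and_marking = alpha_miner(log);",
            "System.out.println(\"Saving net\");",
            "File net_file = new File(" + quote(pnmlPath) + ");",
            "pnml_export_petri_net_(net_and_marking[0], net_file);",
            "System.out.println(\"done.\");",
            "");
    }

    /** Writes script_<name>.txt into the working directory for every log file given as argument. */
    public static void main(String[] args) throws IOException {
        for (String logPath : args) {
            String base = Paths.get(logPath).getFileName().toString().replaceAll("\\.xes$", "");
            String script = scriptFor(logPath, base + ".pnml");
            Files.write(Paths.get("script_" + base + ".txt"), script.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Each generated file can then be passed to the CLI with the -f parameter as shown above, for example from a loop in a batch file or shell script.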

Note: when running the above script, ProM will first produce a (large) number of messages on the screen during the startup phase, related to scanning for available packages and plugins. Bear with it until it is ready.

Scripts for ProM

The language used for the scripts is basically Java interpreted at runtime. In principle, you can put any Java code which you would put into a method body (no class/method declarations). In case the Java reflection framework is able to infer the type, variables do not have to be declared, but can just be used like in a dynamically typed language. For example variable log in the script_alpha_miner.txt will be inferred to have type XLog.

In a script, you can directly invoke ProM plugins through special method names provided by the CLI; the method names are derived from the plugin names shown in ProM. For example, the plugin “Alpha Miner” is available as method alpha_miner. You can get the full list of all ProM plugins available for script invocation with the command line parameter ‘-l’ (“dash lower-case L”):

ProM_CLI.bat -l

This will scan all installed packages for plugins that do not require the GUI to run and list them in the form name(input types) -> (output types). For example, if you have installed the AlphaMiner package the following plugins will be listed (among many others).

alpha_miner(XLogInfo, LogRelations) -> (Petrinet, Marking)
alpha_miner(XLog) -> (Petrinet, Marking)
alpha_miner(XLog, XLogInfo) -> (Petrinet, Marking)

Use the ProM Package Manager to install plugins you do not find in the list of installed plugins.

The script_alpha_miner.txt uses the second method signature alpha_miner(XLog) -> (Petrinet, Marking) to discover from an XLog a Petrinet and a Marking. In case a plugin returns multiple objects, the return result is an Object[] array, in which you can access the individual components as usual, i.e., net_and_marking[0] contains the PetriNet and net_and_marking[1] contains the Marking.

Besides the typical plugins you already know from the ProM GUI, there are also plugins for loading files and saving files. Just browse the list of available plugins to find the right type.

I suggest storing the list of available plugins in a separate plugin_list.txt file for easier searching, using the following command:

ProM_CLI.bat -l > plugin_list.txt

Now you basically know everything you need to invoke ProM from the commandline:

1. Create the ProM_CLI.bat or ProM_CLI.sh.
2. Run the ProM PackageManager to install your desired plugins. If you run the PackageManager for the first time, it will suggest a set of standard packages to install which cover most process mining use cases.
3. Get the list of available plugins.
4. Write a script.
5. Invoke the script.

Known Caveats

The ProM CLI is not the primary user interface of ProM and as such does not get the same attention to usability as the GUI. Thus, it is better to consider the CLI an experimental feature where not everything works as you know it from the GUI and that may take a bit of time and effort to get running. Several factors you should consider:

You only can use plugins from the CLI which have been programmed to work without the GUI. Whether your favourite plugin is available depends on two aspects:
1. Does the plugin require configuration of parameters for which no good default settings are available, so that user feedback is required (for example, particular log filtering options)?
2. Did the developer of that plugin have the time to implement a non-GUI version? We encourage users to first develop the non-GUI version of the plugin and introduce GUI-reliant components only later. However as ProM is an open platform with many contributing parties, individual developers may choose otherwise. If a particular plugin is not available on the CLI, please get in touch with the developer whether this can be changed.
The CLI cannot invoke any code that requires the ProM GUI environment. Any plugin that attempts to do that on the side will terminate the CLI with an exception. That being said, you actually can invoke visualizer plugins that produce a JComponent and then create a new JFrame to visualize the JComponent, see below for an example. However, the functionality of these will be limited (e.g. export of pictures, interaction with the model etc. most likely won’t work)
It may be that the ProM CLI does not terminate/close after the script completes. Workaround: include a System.out.println("done."); statement at the end of your script to indicate termination. When you see the “done.” line printed on the screen but ProM is still running, you can terminate it manually (CTRL+C) without losing data.
Log files may not be (g)zipped, i.e., the CLI can only load plain XES or MXML files.
PNML files produced by a mining algorithm in the CLI have no layout information yet. If you want to visualize such a PNML file, you have to open it in a tool that can automatically compute the layout of the model. Opening the file in the ProM GUI will do. Invoking the plugin to visualize a Petri net will also invoke computation of the layout.
The plugins to load a file or save a file are named rather inconsistently across the different packages. You may have to look for various keywords like “load”, “open”, “import”, “save”, “export” to find the right load/save plugin.
Plugins to load a file always come with a signature that takes a String parameter as the path to the file to load. Plugins to save a file always require a File parameter. Thus, you first have to create a file handle myFile = new File(pathToSave); and then pass this handle to the “save file plugin”.
Files are read/written relative to the current working directory.
Even if the plugins you want to use are available for the CLI, executing them may throw exceptions because the plugin (although accessible from the CLI) assumes settings that can only be set correctly in a GUI dialog. The only workaround here is to extend your script with Java code that produces all the settings expected by the plugin. See below for more advanced examples.
Creating these scripts is certainly on the less convenient side of development. You have no development environment with syntax check, code completion etc. In case your script has an error, you will only notice at runtime, when a long evaluation error is thrown that tries to highlight the problematic part of the script; the actual problem is typically still hard to spot. You’ve been warned.
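Because the load/save plugins are named inconsistently, it can help to filter the stored plugin_list.txt for the keywords mentioned above. Here is a minimal sketch in plain Java (run it as a normal Java program, not as a ProM script); the class PluginListSearch is hypothetical, and the sketch only assumes that the file contains one plugin signature per line as in the listing shown earlier.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class PluginListSearch {

    /** Keywords that typically occur in the names of load/save plugins. */
    public static final List<String> IO_KEYWORDS =
        Arrays.asList("load", "open", "import", "save", "export");

    /** Returns the lines that contain at least one of the keywords (case-insensitive). */
    public static List<String> matching(List<String> lines, List<String> keywords) {
        return lines.stream()
            .filter(line -> {
                String lower = line.toLowerCase(Locale.ROOT);
                return keywords.stream().anyMatch(lower::contains);
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // plugin_list.txt was written by the shell, so read it in the platform's default charset
        List<String> lines = Files.readAllLines(Paths.get("plugin_list.txt"), Charset.defaultCharset());
        matching(lines, IO_KEYWORDS).forEach(System.out::println);
    }
}
```

The keyword list is only a starting point; extend it if a package uses yet another naming scheme.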

Advanced Examples

With all these restrictions in mind, here are some more advanced scripts to get more advanced ProM plugins running. The following script invokes the HeuristicsMiner with its default settings. It needs some additional code to properly pass the event classifiers to the heuristics miner. The HeuristicsMiner typically produces nice results on real-life data because it does not use a standard process modeling notation as target language. As a consequence, there is no serialization format. However, you can invoke the visualization plugin and pass it to a new JFrame to visualize the result. File script_heuristics_miner.txt:

System.out.println("Loading log");
log = open_xes_log_file("myLog.xes");

System.out.println("Getting log info");
org.deckfour.xes.info.XLogInfo logInfo = org.deckfour.xes.info.XLogInfoFactory.createLogInfo(log);

System.out.println("Setting classifier");
org.deckfour.xes.classification.XEventClassifier classifier = logInfo.getEventClassifiers().iterator().next();

System.out.println("Creating heuristics miner settings");
org.processmining.plugins.heuristicsnet.miner.heuristics.miner.settings.HeuristicsMinerSettings hms = new org.processmining.plugins.heuristicsnet.miner.heuristics.miner.settings.HeuristicsMinerSettings();
hms.setClassifier(classifier);

System.out.println("Calling miner");
net = mine_for_a_heuristics_net_using_heuristics_miner(log, hms);

System.out.println("Visualize");
javax.swing.JComponent comp = visualize_heuristicsnet_with_annotations(net);
javax.swing.JFrame frame = new javax.swing.JFrame();
frame.add(comp);
frame.setSize(400,400);
frame.setVisible(true);

System.out.println("done.");

Heuristics Miner run from the Command Line, visualizing the output in a new JFrame.

If you want to change the parameters of the HeuristicsMiner, you can do this via the HeuristicsMinerSettings object. However, here I have to refer you to the source code of the HeuristicsMiner package to study the details of this class. See https://svn.win.tue.nl/trac/prom/ as a starting point.

If you prefer to create a process model in a serializable format out of a HeuristicsNet, simply change your script to invoke another plugin to translate the heuristics net into a Petri net. Below is the script that will also save the resulting Petri net as PNML file to disk. File script_heuristics_miner_pn.txt:

System.out.println("Loading log");
log = open_xes_log_file("myLog.xes");

System.out.println("Getting log info");
org.deckfour.xes.info.XLogInfo logInfo = org.deckfour.xes.info.XLogInfoFactory.createLogInfo(log);

System.out.println("Setting classifier");
org.deckfour.xes.classification.XEventClassifier classifier = logInfo.getEventClassifiers().iterator().next();

System.out.println("Creating heuristics miner settings");
org.processmining.plugins.heuristicsnet.miner.heuristics.miner.settings.HeuristicsMinerSettings hms = new org.processmining.plugins.heuristicsnet.miner.heuristics.miner.settings.HeuristicsMinerSettings();
hms.setClassifier(classifier);

System.out.println("Calling miner");
net = mine_for_a_heuristics_net_using_heuristics_miner(log, hms);

System.out.println("Translating to PN");
pn_and_marking = convert_heuristics_net_into_petri_net(net);

System.out.println("Saving net");
File net_file = new File("mined_net.pnml");
pnml_export_petri_net_(pn_and_marking[0], net_file);

System.out.println("done.");

The last example I will show in this blog post is a script to invoke the very reliable InductiveMiner with default parameters, which include some basic noise handling capabilities. The resulting Petri net is written to disk. File script_inductive_miner_pn.txt:

System.out.println("Loading log");
log = open_xes_log_file("myLog.xes");

System.out.println("Creating settings");
org.processmining.plugins.InductiveMiner.mining.MiningParametersIMi parameters = new org.processmining.plugins.InductiveMiner.mining.MiningParametersIMi();

System.out.println("Calling miner");
pn_and_marking = mine_petri_net_with_inductive_miner_with_parameters(log, parameters);

System.out.println("Saving net");
File net_file = new File("mined_net.pnml");
pnml_export_petri_net_(pn_and_marking[0], net_file);

System.out.println("done.");

Take Away

It is possible to run process mining analyses in a more automated form using scripts and the ProM CLI. Many plugins, but by no means all, can run in a non-GUI context. The scripts are non-trivial and may require knowledge of the plugin code to prepare correct plugin settings etc. Luckily, all plugins are open source and ready to be checked. Start here: https://svn.win.tue.nl/trac/prom/

If you were wondering, yes, that’s how we run automated tests for ProM.

For those who prefer a less experimental environment for automated process mining analysis, I highly recommend the ProM integration with RapidMiner, available at: http://www.rapidprom.org/

Feel free to post further scripts. In case you have problems with running a particular plugin, I suggest contacting the plugin author to make the plugin ready for the CLI environment.

how to always read facebook’s news feed in “most recent first” order

Posted on May 24, 2014 by dirkfahland

Facebook has been tampering with its news feed design over the last weeks and months on its mobile apps and also the website. The “most recent” order is no longer a default view on the app (but several taps away hidden in a sub-sub-menu) as of version 10.0.0. On the default facebook page, the feed regularly switches back to “top news” every 1 or 2 weeks, even if you choose “most recent”. The option to choose the sort order of the news feed also disappeared from the mobile website (that you can reach when opening facebook.com on a mobile browser).

You can still change the ordering of the news feed on the Desktop page and, once changed, the mobile website will inherit the settings. But I don’t want to reset the sort order from the Desktop page every 2 weeks.

Now, I just saw that facebook stores the sort order in the URL. You can use the following hard links to reach the news feed in the desired ordering:

https://www.facebook.com/?sk=h_chr to read the news feed in chronological order “most recent first”, and
https://www.facebook.com/?sk=h_nor to read the news feed in random order (“top news”).

I’ve placed this URL as a bookmark on the home screen of my smartphone and removed facebook’s mobile app. It actually loads much faster than the mobile app and I now read the feed in the preferred order. Let’s see how long these links work.

Is my log big enough for process mining? Some thoughts on generalization

Posted on January 7, 2014 by dirkfahland

I recently received an email from a colleague who is active in specification mining (software specifications from execution traces and code artifacts) with the following question.

Do you know of process mining works that deal with the confidence one may have in the mined specifications given the set of traces, i.e., how do we know we have seen “enough” traces? Can we quantify our confidence that the model we built from the traces we have seen is a good one?

The property the colleague asked about is called generalization in process mining. As my reply to this question summarized some insights that I gained recently, I thought it was time to share this information further.

A system S can produce a language L. In reality we only see a subset of the behaviors that is recorded in a log K. Ideally, we want that a model M that was discovered from K can reproduce L. We say that M generalizes K well if M also accepts traces from L that are not in K (the more, the better).

Generalization conflicts with three other measures (fitness, precision, and simplicity): the most general model, which accepts all traces, is not very precise (M should not accept traces that are not in L). These measures are described informally in The Need for a Process Mining Evaluation Framework in Research and Practice (doi: http://dx.doi.org/10.1007/978-3-540-78238-4_10), and the paper On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery (doi: http://dx.doi.org/10.1007/978-3-642-33606-5_19) shows how they influence each other in practice.
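To make the tension between generalization and precision concrete, here is a toy sketch in Python. It treats L, K, and M as small finite sets of traces and uses two simple set-based ratios that I made up for illustration; they are not the measures defined in the papers cited above, and all names are hypothetical.

```python
from itertools import product

def generalization(model, log, language):
    """Toy ratio: share of the unseen traces (in L but not in K) that M accepts."""
    unseen = language - log
    if not unseen:
        return 1.0
    return len(model & unseen) / len(unseen)

def precision(model, language):
    """Toy ratio: share of M's traces that really belong to L (M assumed non-empty)."""
    return len(model & language) / len(model)

# Language L of a toy system and a log K that recorded only one of its traces.
L = {("a", "b", "c"), ("a", "c", "b"), ("a", "b")}
K = {("a", "b", "c")}

# Model 1 accepts exactly the log: perfectly precise, no generalization.
exact_model = set(K)

# Model 2 accepts every sequence over {a, b, c} of length 1 to 3 (a "flower" model):
# it generalizes to all of L but also accepts many traces outside L.
flower_model = {seq for n in range(1, 4) for seq in product("abc", repeat=n)}

print(generalization(exact_model, K, L), precision(exact_model, L))
print(generalization(flower_model, K, L), precision(flower_model, L))
```

The exact model scores 0 on the toy generalization ratio and 1 on precision; the flower model scores 1 on generalization but only 3 out of 39 on precision. A good discovery algorithm has to land somewhere in between.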

There is currently no generic mathematical definition for deciding whether a given log K is general enough (i.e., contains enough information to infer the entire language L from K). This usually depends on the algorithm, the kind of original system S/the language L, and the kind of model one would like to discover.

The most general result that I am aware of is that the log K has to be directly follows-complete. Action B of the system S directly follows action A if there is some execution trace …AB… of S where first A occurs and then directly B occurs. The log K is directly follows-complete iff for any two actions A, B that directly follow each other in S, there is a trace …AB… in K. Every system S with a finite number of actions has a finite log K that is directly follows-complete, even if the language L of S is infinite. For such logs, there are algorithms that ensure that S (or a system that is trace-equivalent to S) can be rediscovered. See for instance Discovering Block-Structured Process Models from Event Logs – A Constructive Approach (doi: http://dx.doi.org/10.1007/978-3-642-38697-8_17).
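A minimal sketch of this check in Python, assuming that the behavior of S is available as a finite set of reference traces that exhibits every directly-follows pair of S. The function names and the toy traces are hypothetical, chosen only for illustration.

```python
def directly_follows(traces):
    """Return all pairs (a, b) where b occurs directly after a in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}

def missing_pairs(log, reference_traces):
    """Directly-follows pairs of the system that the log has not recorded."""
    return directly_follows(reference_traces) - directly_follows(log)

def is_directly_follows_complete(log, reference_traces):
    """True iff the log contains every directly-follows pair of the system."""
    return not missing_pairs(log, reference_traces)

# Toy system: a first, then b and c in either order, then d.
reference = [("a", "b", "c", "d"), ("a", "c", "b", "d")]

partial_log = [("a", "b", "c", "d")]  # never shows c before b
full_log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]

print(is_directly_follows_complete(partial_log, reference))
print(sorted(missing_pairs(partial_log, reference)))
print(is_directly_follows_complete(full_log, reference))
```

For the partial log, the pairs (a, c), (b, d), and (c, b) are missing, so a discovery algorithm that relies on directly-follows information cannot learn from it that b and c may occur in either order.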

In general, if you have the original system S (or some finite characterization of L), then it is possible to compute whether the log K is directly follows-complete. If you do not have S or L, then we currently do not know any means to estimate how complete K is. This is an open question that we are currently looking into. Yet, in essence, you have to estimate the probabilities that the information in log K suffices to explain particular language constructs that the original system has/may have.

We are currently looking deeper into these questions. If you have more pointers on this topic, feel free to drop a comment or an email.

Mining Branching-Time Scenarios From Execution Traces

Posted on December 3, 2013 by dirkfahland

Over the last two years, I have been working with Shahar Maoz and David Lo on discovering high-level specifications of an application from its execution traces. This topic is also known as specification mining and was brought up around 2002 because we keep on writing humongous amounts of undocumented code that other people have to use, maintain, or deal with in a later iteration of the software. Getting into this “legacy” code is extremely time consuming and there is a high chance that you will break something because you do not understand how the existing code works.

Specification mining aims at extracting (mostly) visual representations of existing code that describe its essential architecture and workings at a higher level of abstraction, so that a developer can first get an overview before diving into the code.

We looked particularly at the problem of understanding object interplay in object-oriented code, where you often have many objects that invoke methods of other objects, essentially distributing a behavior over many classes. Everyone who ever tried to understand code that uses a number of architectural patterns combined, such as factories and commands, knows what I mean.

Building on earlier works, we found a way to discover scenarios (something like UML sequence diagrams) that show how multiple objects interact with each other in a particular situation. More importantly, we found a way to discover and distinguish two kinds of scenarios:

scenarios that describe behavioral invariants of your application (whenever the user presses this button, the application will open that window), and
scenarios that describe behavioral alternatives (when the user is on this screen, she may continue with this, that, or even that command)

These two kinds of scenarios combined give a good understanding of how an application works at a high level of abstraction. Here is the presentation showing you how it works:

You can also get the full paper, our tool for discovering scenarios, and the data set that we used in our evaluation.

Drop me an email or a comment if you find this useful and/or would like to know more.

some historical statistics on modeling formalisms

Posted on November 28, 2012 by dirkfahland

Some historical statistics about the use of formalisms for modeling automated systems since 1800. Here is a chart about how frequently a particular name of a modeling formalism was used in literature. I’ve picked Automata, Petri nets, Process algebra, Statecharts, and UML.

You can get the full chart (and add your own favorite formalism) using Google Ngram. What I find surprising is that Petri nets were at some point as relevant in literature as automata (which have been discussed since the 1800s already). I’m not surprised that UML tops all of them by far. On second thought, UML’s decline is also not surprising as the hype returns to normal levels. What I do find surprising is that process algebras are much less referenced in literature than even the very particular, though successful, technique of statecharts.

proper interface descriptions for your service

Posted on August 29, 2012 by dirkfahland

A service is, in computer science terms, a functionality (of a software or of a device) that hides its implementation details from the user. To be able to use the service, the service has to declare what it does at its interface. In the old days, you would get a manual; in the new days, you would get descriptions in some fancy service description language.

Besides discussions on how to write down what a service does, I feel there should also be some thought about what exactly should be described about a service. It’s quite easy to miss something important, as the following video from our new building shows.

Using a Coffee Machine Service

15 minutes for everyone

If every person living on earth today wanted his/her 15 minutes of unrivaled fame, it would take ~193778 years from now. We should get started.

Every day 96 people can have their 15 minutes of fame. That means for an estimated 6.79 billion people living on earth today, we need about 70,729,167 days to get everyone famous. That’s just about 193778 years.
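The arithmetic can be replayed in a few lines of Python. This is only a sketch under the post's assumptions: one person at a time, around the clock, a population estimate of 6.79 billion, and a year of 365 days (the last one is my assumption, it matches the figure above).

```python
MINUTES_PER_DAY = 24 * 60
SLOT_MINUTES = 15               # length of one slot of fame
POPULATION = 6_790_000_000      # estimate used in the post
DAYS_PER_YEAR = 365             # assumption: leap years are ignored

slots_per_day = MINUTES_PER_DAY // SLOT_MINUTES   # people served per day
days_needed = POPULATION / slots_per_day
years_needed = days_needed / DAYS_PER_YEAR

print(slots_per_day)        # 96
print(round(days_needed))   # 70729167
print(int(years_needed))    # 193778
```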

Dirk's Metric/k

things on computer and data science and putting them to practical use