from the other side of the review form: why papers get rejected and what the BPM community can do about it

I’ve been reviewing papers for the International Conference on Business Process Management (BPM) over the last 3 or 4 years, in 2011 and 2012 as a member of the Program Committee. In that time, my evaluation of a submitted research paper has been a reject in the vast majority of cases. This year the best score I gave was a borderline on one paper and a reject on all other papers (8 in total); other years were a bit better, but not much. In the following, I’d like to share my view on why these papers were rejected. Paper authors may find it interesting to learn how they can improve their chances of getting a paper accepted. Perhaps more importantly, it also sheds light on publishing standards within BPM research and what the BPM community as a whole can do to promote these standards.

basics are not the problem

Yes, BPM is a very competitive conference with a low acceptance rate (around 25 of 200-300 submitted manuscripts get accepted). But what I find surprising is the significant difference between papers that get accepted and papers that get rejected. What is more, papers were mostly rejected for the same kinds of reasons. The majority of the papers I got to review, including the rejected ones, got all the basics of a research paper right: they

  • addressed a problem that is relevant to BPM,
  • had a clear problem statement,
  • proposed an interesting and novel idea to solve the problem,
  • were written in good English, and
  • had a decent structure.

So what was wrong with these papers?

the usual suspects: more details, more literature

Some papers did not explain an idea well enough. The questions I always ask myself are “Do I believe this works?” or, even better, “Would I now be able to build the solution myself (after some more reflection on technical details and following up on cited work)?”. Often, a crucial (technical) notion was not clear, for example: how exactly is this process graph constructed?

Many papers had flaws in their literature review and comparison. The standard issue: someone else has published a paper that proposes a solution very similar to (a part of) the contribution of the new paper, and the new paper does not discuss how it differs from the old one. Thorough literature research is hard work, and chances are that the authors missed a paper known to a reviewer. However, a paper should at least discuss relevant results published in previous years of the very same conference. I usually ask myself the question “Is there a difference to previous work that matters, such as a more general class of problems solved, a faster solution, a better solution, a more elegant solution, …?”

These two reasons probably apply to paper writing and paper rejections in general. The next two reasons are different, as they concern what the BPM community specifically expects from a research paper.

show stopper at the start: existing problem canon

Quite a few submissions failed to address an important aspect of the problem that had been raised and discussed in earlier work. Many papers submitted to BPM address problems that are part of a larger, more general problem, such as compliance, modeling and verification of data-dependent processes, process adaptation, or process mining. These more general problems have been tackled from various angles, and with each angle the problem has become better understood, which also creates a canon for evaluating solutions:

  • What is an acceptable solution?
  • How do we measure the quality of a solution?
  • What are the relevant factors?
  • To which use cases should a solution apply?
  • What kinds of assumptions can one make about the problems to be solved?
  • etc.

I frequently found papers that focused on just a single aspect of the problem while ignoring other important aspects that, given the current state of knowledge, should not be ignored. For example,

  • A paper on process mining cannot avoid discussing which quality measures of a process model the algorithm optimizes and which ones it neglects.
  • A paper on verifying processes for correctness cannot avoid listing the BPM-specific properties that should be checked, such as the notion of soundness.
  • A paper on designing or extending process modeling languages cannot ignore existing modeling languages and their particular traits, such as BPMN being the industry standard “for everything”, BPEL being the standard for executable service models, and Petri nets being the predominant formal model for most tasks.
  • etc.

These are just examples. At their core, they are variants of the following question: “So you have this nice technique; how exactly does it help me solve my BPM problem? Oh, and by the way, here is a book of standard requirements you should meet anyway.”

stale end: unconvincing evaluation

Finally, the majority of all the papers I reviewed failed to provide a convincing evaluation.

There are papers that are entirely conceptual and novel in the sense that the problem has not been discussed before, or a known problem is solved for the very first time. In these cases (and probably in a few more), the idea alone is a contribution worth discussing, even without a large-scale practical evaluation. A decent running example then usually suffices to illustrate the potential of the idea.

However, these papers are rare. Most papers have an incremental element: they

  • solve an existing problem better than previous solutions,
  • generalize an existing solution of a known problem,
  • improve an existing solution in a way that can be measured, or
  • combine existing techniques to solve a novel or unsolved problem.

The consequence of this incrementality is that reviewers and readers expect some proof that “things work”. Many of the papers I have seen actually did have an experimental or practical evaluation: ideas were implemented in a tool and then applied to artificial or real-life data.

What the papers actually lacked was a presentation of convincing results. It is usually not sufficient to just show a large table with numbers where some column has values in a range considered “good” (fast analysis, small model, high similarity, high confidence, etc.).

BPM is a discipline about making very complex software understandable to humans – most likely people with a less technical background. If a technique extracts, checks, or changes information that is in any way interesting for a human being to look at, then readers and reviewers like to see this information in an understandable form. Here are some examples I can think of:

  • If a technique produces or transforms a model (a process model, a data model, …), then the evaluation should show some model diagrams, not just a table with model statistics.
  • If a technique checks for errors in some kind of input, it would be worth illustrating identified errors and/or diagnostic information on this input.
  • If a technique queries models from a repository, the evaluation should show a few models that were returned as the result of a given query.
  • If a technique is about relating two or more things (say, comparing different versions of a process model), then the evaluation should show the inputs and highlight similarities and differences.

These are just a few examples taken from the kinds of papers I’ve been reviewing. Probably every technique that solves a BPM problem produces an artifact worth showing. Such an evaluation, showing relevant problem instances and results, does not replace a fully-fledged case study, but it can be convincing enough to persuade reviewers and readers that the technique works.

If a technique really is all about number crunching (or has a significant number-crunching component), the table should not omit measures that are considered relevant in that problem domain: see “existing problem canon” above.

Finally, I have seen a few papers that did not compare the results of their technique to results obtained by existing techniques (on the same input). Such an evaluation is a tedious task: it requires mastering other techniques and tools that all have their small hidden assumptions about input format and runtime environment. Yet experimentally comparing a new technique to the state of the art is imperative – provided that state of the art solves the same problem – simply because the quality of a technique is best evaluated through the quality of its outputs.

homework for the BPM community

You may find the observations listed here trivial and not worth reporting because they state the obvious. But my reviews show otherwise. The number of submissions to our conferences shows that many researchers would like to actively contribute to BPM with their ideas, while the reviews show that they are not aware of the standards by which we, the BPM community, review our peers.

It seems both they and we could benefit from more transparency about the problem/solution canon we maintain and the requirements we raise for evaluations. It could also help attract researchers from related fields such as software engineering.

I hope this post helps other BPM researchers better understand which quality standards I apply when reading and reviewing papers – and try to adhere to when writing. My observations are necessarily stated from a personal point of view and could be biased. If you would like to add your own observations or have a different opinion, I’d be interested to hear it.

disclaimer: I’ve not only reviewed papers for BPM but also submitted papers to BPM. I try to hold my own papers to the same quality standards I apply in reviews. However, I cannot guarantee I would accept my own papers under these standards.

Update: Interesting related insights relayed to me by @MultumNonMulta: “Yes, Computer Scientists Are Hypercritical”


3 thoughts on “from the other side of the review form: why papers get rejected and what the BPM community can do about it”

  1. To be brutally honest, I find it sort of amusing that this post has more or less the exact same problems you criticize BPM papers for. When setting out to tell other people how to write, I like to structure my own writing in the form proposed: bring the most important points and the conclusion early on, and provide ample examples. While you have some good points, they are not made very clear and often require some searching to find. Your notice at the end more or less negates anything positive in the post as far as I am concerned. Maybe the solution is to think in a completely different, more open way about science and the reviewing process instead of trying to fix a broken system?

    While you do list a number of issues, they are only vaguely elaborated upon. For example, you mention literature review as a vague point and provide some simple examples of what you want, yet no clear examples setting apart a good and a bad literature review. There are numerous articles on writing (for science) which give such pointers and examples. Now, to adhere to my own rule, I should link a couple, but I guess the Google-fu of anybody reading should be sufficient (also, I’m lazy).

    Some things are also part of science folklore and can be ignored. Do I need to mention that the expected running time of Quicksort is O(n log n)? Do I need to mention BPMN when I’m working on declarative models? Where is the limit? More detail is not always better, and finding the right balance is key, not mindlessly providing more.

    As for evaluation, a good example would exactly be your own papers; what was good, what was bad. Instead, you just state that they may not adhere to your rules. Wouldn’t it be more interesting to investigate why that is? Were you young and didn’t know better? Were you lazy and pooped a couple stinkers past even lazier reviewers? Did you intentionally break the rules for effect?

    My take is that the whole anonymous review process is aging, wrong, and stifles innovation. I’m not saying everybody should go open access, but my experience from bioinformatics is that scientific blogging (as you do here) and especially sharing preliminary results lead to much better papers and results. Papers and sub-results are shared in preliminary form on blogs. This does not make it possible to steal results, as everybody knows who invented what, so anybody trying to cheat will get caught. It makes it possible to build upon results that will only get submitted in a year and published in two, allowing a much faster pace of research.

    While this is a noble goal to work towards (IMO), I realize it may be impractical and that perfect is the enemy of good (and vice versa). As a temporary solution, people could sign their reviews, breaking the imbalance between an anonymous reviewer and a known author, forcing reviewers to really stand by their conclusions (which I completely believe you do), and allowing an honest discussion about points before it is too late (the final submission date). You give the example of a reviewer knowing a publication not known to the authors; often this is an oversight, or two independent discoveries of the same result under different names. Using a more open process, instead of just being more open about requirements, would eliminate this entirely.

    • I’m not sure that anonymous refereeing should be pitched as a disjoint alternative to open access. Already now it is perfectly possible, and even easy, for researchers to make early versions of their work available through preprints (e.g. on arXiv), to blog about their latest preprints, etc. In fact, a good share of researchers are already doing this (see the growing number of preprints on arXiv as proof). This practice allows early feedback and discussions to take place on new ideas. Anonymous refereeing is more about putting an ex-post stamp on a paper to the effect that experienced researchers in the field find that the paper meets certain standards and adds value to the existing body of research.
      I am also not sure that a system of non-anonymous refereeing is without drawbacks. It might lead to some people becoming popular in a community because they write nice words and give nice scores in their reviews, and others becoming unpopular because they are highly critical and perhaps undiplomatic in their reviews.
      All this to say that open reviewing and blind refereeing might be complementary rather than competitors.

  2. Dirk, you may find it interesting that we once did a statistical analysis of BPM conference submissions (plus those of other conferences), to determine some critical and not-so-critical aspects of papers that are rejected or accepted. I would actually love to repeat this study over time, to see whether things have changed. See here: http://eprints.qut.edu.au/31606/.
