Date
May 20, 2020
Reading Time
6 minutes

Heisenbug — software error with observer effect

Testing
Alarming

By

Ramon Anger

A new test fails. Unexpectedly. A NullPointerException? Here? I set a breakpoint in the code and start the test in debug mode. Why should this variable be null? The test stops at the breakpoint. The variable is not null. I let the test continue. Green.

I repeat the test several times in debug mode. The variable is never null. I exit debug mode and start the test normally. The test is red. I could bite the edge of my desk.

Photo by Paulo Ziemer on Unsplash

What is a Heisenbug?

Anyone who has had this experience has come across a so-called Heisenbug. It is a faulty piece of code that appears to change its behavior when you observe it. The term Heisenbug is a play on words based on the observer effect in quantum mechanics, which is associated with Werner Heisenberg.

In reality, however, the code does not change its behavior. We change the conditions under which the code runs so that we can observe it.

Heisenbug example 1

A small example in Java: a list of 100 elements is created and initialized with the values “0” to “99”. The list is iterated with stream(), and the element “42”, if found, is removed.

 1 class HeisenBugTest {
 2
 3     List<String> list = new ArrayList<>();
 4
 5     @Test
 6     void heisenBug() {
 7         for (int element = 0; element < 100; element++) {
 8             list.add(String.valueOf(element));
 9         }
10
11         list.stream()
12             .filter(string -> string.equalsIgnoreCase("42"))
13             .forEach(this::removeMember);
14     }
15
16     void removeMember(String element) {
17         list.remove(element);
18     }
19 }

Now, we may already know that directly modifying a collection during a stream() operation is a bad idea (line 13). A ConcurrentModificationException is to be expected.

But I'm getting a NullPointerException!

java.lang.NullPointerException
    at de.pentacor.javabasics.api.HeisenBugTest.lambda$heisenBug$0(HeisenBugTest.java:19)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)

What now? I start debugging...

I get to the 100th element (it has the value “99” because we started with “0”).

And then...

Uh wait! 101 elements? No, the list still has 100 items. Where does the last element come from?

This is the moment for biting the edge of the desk.

Photo by Luca Bravo on Unsplash

For me, the fog only clears when I look at the Java Streams documentation:

“Unless the stream source is concurrent, modifying a stream's data source during execution of a stream pipeline can cause exceptions, incorrect answers, or nonconformant behavior.”

Ah. If I call parallelStream() instead of stream(), or use a LinkedList instead of an ArrayList, my nerd world is in order again. Debugging did not help me here. Quite the opposite.
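If the goal is simply to get “42” out of the list, the removal does not have to happen inside the stream pipeline at all. The following is a minimal sketch under that assumption. The class SafeRemoval and its methods are made up for this illustration and are not part of the test above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class SafeRemoval {

    // Builds the list "0" .. "99" as in the example above.
    static List<String> buildList() {
        List<String> list = new ArrayList<>();
        for (int element = 0; element < 100; element++) {
            list.add(String.valueOf(element));
        }
        return list;
    }

    // Variant 1: let the collection remove the matching elements itself.
    static void removeInPlace(List<String> list, String value) {
        list.removeIf(s -> s.equalsIgnoreCase(value));
    }

    // Variant 2: finish the stream first, then modify the source.
    static void removeAfterCollect(List<String> list, String value) {
        List<String> toRemove = list.stream()
                .filter(s -> s.equalsIgnoreCase(value))
                .collect(Collectors.toList());
        list.removeAll(toRemove);
    }

    public static void main(String[] args) {
        List<String> first = buildList();
        removeInPlace(first, "42");

        List<String> second = buildList();
        removeAfterCollect(second, "42");

        System.out.println(first.size() + " " + second.size()); // prints "99 99"
    }
}
```

In both variants the list is never modified while a stream over it is still running, so the situation described in the Streams documentation does not arise.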

Heisenbug example 2

Java again: the class Heisenbug below has a method that converts the time of a given date into a string.

1 public class Heisenbug {
2
3     private static final SimpleDateFormat SIMPLE_DATE_FORMAT = new SimpleDateFormat("hh:mm:ss, SSS", Locale.getDefault());
4
5     String formatTime(Date date) {
6         return SIMPLE_DATE_FORMAT.format(date);
7     }
8 }

The method formatTime has a unit test that

  • generates a date
  • converts its time to an expected value
  • calls our formatTime method
  • compares the result of formatTime with the expected value

 1 class SynchronizationExampleTest {
 2
 3     Heisenbug cut = new Heisenbug();
 4
 5     @Test
 6     void formatTime() {
 7         Date date = ...
 8         String expected = new SimpleDateFormat("hh:mm:ss, SSS", Locale.getDefault()).format(date);
 9         String actual = cut.formatTime(date);
10         assertEquals(expected, actual);
11     }
12 }

Green. The test passes.

In reality, on a production system, the method formatTime is called frequently, and by various consumers. And these consumers complain that the method sometimes returns nonsense.

With a lot of effort, we managed to reproduce the error locally. The following code is used for this purpose:

 1 public class HeisenBugThread extends Thread {
 2     public void run() {
 3         int count = 0;
 4         for (int i = 0; i < 1000; i++) {
 5             Heisenbug heisenbug = new Heisenbug();
 6
 7             Date date = ...
 8
 9             String actual = heisenbug.formatTime(date);
10             String expected = new SimpleDateFormat("hh:mm:ss, SSS", Locale.getDefault()).format(date);
11
12             if (!actual.equals(expected)) {
13                 count++;
14             }
15         }
16         System.out.println("Thread: " + this.getId() + ", wrong results: " + count);
17     }
18 }

The consumer HeisenBugThread creates a random date 1000 times. From its time, a string (expected) is generated in the same way as in Heisenbug's formatTime. This is compared with the string returned by formatTime (line 9). If the expected time string and the return value of formatTime do not match, a counter is incremented (line 13). At the end, the counter (wrong results) is printed. this.getId() returns the ID of the thread (line 16).

The class Heisenbug receives an additional main() method.

1 public static void main(String[] args) {
2     new HeisenBugThread().start();
3     new HeisenBugThread().start();
4 }

If I run this main method, I get, for example, the following output:

Thread: 22, wrong results: 101
Thread: 21, wrong results: 88

Out of 1000 calls to formatTime per thread, an incorrect result was returned 101 and 88 times respectively. I should debug that! So I set a breakpoint at line 7 of the class HeisenBugThread. I start Heisenbug's main method in debug mode and step over lines 9 and 10.

Too bad! No discrepancy.

According to the wrong results counter, about every 10th iteration should fail.

Unfortunately, nothing. Not even after 100 iterations. The code is not doing me any favors. As long as I watch the code, the error does not occur.

I run the main method of the class Heisenbug once again:

Thread: 22, wrong results: 106
Thread: 21, wrong results: 96

Oh, come on!

What do I do differently when I debug the code?

What does my development environment do when debugging? It suspends the program at the breakpoint. For both HeisenBugThread instances. Perhaps the Heisenbug only occurs when both threads are actually running. That can be checked.

1 public static void main(String[] args) {
2     // new HeisenBugThread().start();
3     new HeisenBugThread().start();
4 }

And lo and behold:

Thread: 21, wrong results: 0

The mistake, my mistake, must therefore be in a place that is used by both HeisenBugThread instances.

Well, of course the Heisenbug here is contrived. The class Heisenbug uses a constant of type SimpleDateFormat to format the time. A look at the Javadoc tells us:

“Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.”

I didn't synchronize. It had to go wrong.
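Two common ways out can be sketched as follows. This is only an illustration under assumptions: the class ThreadSafeFormat and its method names are made up and do not come from the original Heisenbug class, and the pattern is taken over unchanged. The first variant follows the Javadoc recommendation of one format instance per thread. The second swaps in java.time's DateTimeFormatter, which is documented as immutable and thread-safe.

```java
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical class, for illustration only.
class ThreadSafeFormat {

    private static final String PATTERN = "hh:mm:ss, SSS";

    // Variant 1: each thread gets its own SimpleDateFormat instance.
    private static final ThreadLocal<SimpleDateFormat> PER_THREAD =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(PATTERN, Locale.getDefault()));

    // Variant 2: a shared DateTimeFormatter; it is immutable and thread-safe.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern(PATTERN, Locale.getDefault())
                    .withZone(ZoneId.systemDefault());

    static String formatTimePerThread(Date date) {
        return PER_THREAD.get().format(date);
    }

    static String formatTimeImmutable(Date date) {
        return FORMATTER.format(date.toInstant());
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(formatTimePerThread(now));
        System.out.println(formatTimeImmutable(now));
    }
}
```

A third option would be to keep the shared SimpleDateFormat and wrap every call in a synchronized block, at the cost of threads waiting for each other.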

Which situations favor a Heisenbug?

As a rule, a Heisenbug is caused by differences in timing or parallel access between real operation and the test or debug situation. Timing and concurrency play a role in many concepts of (distributed) programming.

More tips on causes are available at Stack Overflow. A service that relies on one or more such concepts is a candidate for Heisenbugs. That doesn't necessarily mean that one has to occur.

How can I avoid Heisenbugs?

From my point of view, there are various approaches available here. I avoid Heisenbugs through...

  1. automated tests that go beyond simple unit or integration tests, which are not designed for parallel processing or timing issues: chaos engineering offers a good, methodical toolset here. If my service is intended for use by many users, I test the access of these many users. A unit test only gives an indication of the functional correctness of my code.
  2. a good understanding of the frameworks and tools used: if I use an unsynchronized instance of a class, as in the example above, I should be aware of this. Reading specifications helps just as much as experience with the tools.
  3. a good understanding of the concepts and architectures used: lazy loading and lazy initialization are exciting concepts for balancing resource consumption and performance. When I understand exactly how they work and where their limits are, I learn to use them instead of failing at them.

Are functional or data errors also Heisenbugs?

Very likely not. If my code produces unexpected results under certain conditions — e.g. with special data constellations — these results are likely to always occur with the given data. Be careful with data that depends on or influences each other.

But there are Heisenbugs that only occur with certain data constellations. Their cause does not lie in the data itself. The data is only the trigger of the Heisenbug.

Summary

Software errors that are difficult to detect can creep into our complex world of services. They only occur under certain conditions related to timing or parallel use. The code appears to change its behavior when we try to track down the bug: the Heisenbug. Simple unit or integration tests do not help to detect Heisenbugs. When debugging the affected code, the cause of the error often does not show up, because debug mode does not correspond to the way the code is actually used. Precise knowledge of the frameworks and concepts used, as well as production-like test scenarios, e.g. using chaos engineering, help to avoid and recognize Heisenbugs.