StringBuilders - How smart is the compiler?

Or: fun with javap

Posted on 21 April 2016

Most experienced Java developers probably know that when you concatenate strings in your Java code, the Java compiler will use a StringBuilder for you. However, there are quite a few misconceptions about how smart the compiler actually is. In this post I would like to explain what it does, what it doesn’t do, and how to use the tools at your disposal to figure this out yourself.

Basic example

So let’s start with a little example:

public static String appendA(String a, String b) {
    return a + b;
}

public static String appendB(String a, String b) {
    return new StringBuilder().append(a).append(b).toString();
}

The first one is 'cleaner', but it shouldn’t perform any worse than the second, right? If we use the javap tool that ships with the JDK, we can disassemble a compiled class file and look at what the compiler actually did for us.

Note

You disassemble a previously compiled .class file using: "javap -c <name>.class"

This is the output:

public static java.lang.String appendA(java.lang.String, java.lang.String);
    Code:
       0: new           #3                  // class StringBuilder
       3: dup
       4: invokespecial #4                  // Method StringBuilder."<init>"
       7: aload_0
       8: invokevirtual #5                  // Method StringBuilder.append
      11: aload_1
      12: invokevirtual #5                  // Method StringBuilder.append
      15: invokevirtual #7                  // Method StringBuilder.toString
      18: areturn

  public static java.lang.String appendB(java.lang.String, java.lang.String);
    Code:
       0: new           #3                  // class StringBuilder
       3: dup
       4: invokespecial #4                  // Method StringBuilder."<init>"
       7: aload_0
       8: invokevirtual #5                  // Method StringBuilder.append
      11: aload_1
      12: invokevirtual #5                  // Method StringBuilder.append
      15: invokevirtual #7                  // Method StringBuilder.toString
      18: areturn

Note

I removed some of the javap output that shows the parameters and return types of the functions being called. It doesn’t add anything to this example and messed up the formatting.

No surprise here: both result in exactly the same bytecode. You see a new StringBuilder being created and its constructor being called (the invokespecial call). Then the first parameter is loaded (aload_0) and append() is called (invokevirtual), the same happens for the second parameter (aload_1), and finally we call toString() and return from the function.

We can already see a small optimization where we can help the compiler by feeding the first parameter into the constructor:

public static String appendC(String a, String b) {
    return new StringBuilder(a).append(b).toString();
}

javap output:

public static java.lang.String appendC(java.lang.String, java.lang.String);
    Code:
       0: new           #3                  // class StringBuilder
       3: dup
       4: aload_0
       5: invokespecial #10                 // Method StringBuilder."<init>"
       8: aload_1
       9: invokevirtual #5                  // Method StringBuilder.append
      12: invokevirtual #7                  // Method StringBuilder.toString
      15: areturn

As you can see, we are already 'smarter' than the compiler: by passing the first parameter to the constructor we save one invokevirtual call in this method. The compiler will not do this for you!
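To make the comparison concrete, here is a minimal sketch that puts the three variants side by side. The class name ConcatVariants and the main method are my own additions for illustration. It also shows one caveat worth keeping in mind: the variants are only interchangeable when the first argument is not null, because the StringBuilder(String) constructor throws a NullPointerException for a null argument, while append() simply appends the text "null".

```java
// Hypothetical wrapper class; the three methods mirror appendA, appendB and appendC above.
class ConcatVariants {
    // Plain concatenation, left to the compiler.
    static String appendA(String a, String b) {
        return a + b;
    }

    // Explicit builder, both values appended.
    static String appendB(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    // First value handed to the constructor; requires a to be non-null.
    static String appendC(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendA("foo", "bar")); // foobar
        System.out.println(appendB("foo", "bar")); // foobar
        System.out.println(appendC("foo", "bar")); // foobar

        // With a null first argument the variants no longer agree.
        System.out.println(appendA(null, "bar")); // nullbar
        try {
            appendC(null, "bar");
        } catch (NullPointerException e) {
            System.out.println("appendC does not accept a null first argument");
        }
    }
}
```

So if you apply this trick by hand, make sure the first value can never be null.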

This example isn’t that interesting by itself; things become far more interesting inside a loop. This is where many developers have the wrong idea about what the compiler does and doesn’t do for you.

For loops

Let’s create a function with a small for loop:

public static String numbersA() {
    String line = "";

    for (int i = 0; i < 10; i++) {
        line += i;
    }

    return line;
}

This simple function, which appends the numbers 0-9 together, should use a StringBuilder, right? It does, but perhaps not in the way you expect:

public static java.lang.String numbersA();
    Code:
       0: ldc           #12                 // String
       2: astore_0
       3: iconst_0
       4: istore_1
       5: iload_1
       6: bipush        10
       8: if_icmpge     36
      11: new           #3                  // class StringBuilder
      14: dup
      15: invokespecial #4                  // Method StringBuilder."<init>"
      18: aload_0
      19: invokevirtual #5                  // Method StringBuilder.append
      22: iload_1
      23: invokevirtual #6                  // Method StringBuilder.append
      26: invokevirtual #7                  // Method StringBuilder.toString
      29: astore_0
      30: iinc          1, 1
      33: goto          5
      36: aload_0
      37: areturn

If you’re not used to reading assembly-like listings you might wonder where the for loop went. Bytecode, like the instruction set of a CPU, has no for or while loops; it only has comparisons and jumps. If you check out position 33 you see a "goto": this is the end of our loop. Where does it jump to? To position 5. So it’s easy to spot the start (position 5) and end (position 33) of our for loop.

We also see our familiar StringBuilder being constructed here. But this happens at position 11: inside our for loop!

So as you can see, even an extremely simple example where you append to a string inside a loop does not use a StringBuilder optimally: each iteration of the loop creates a new StringBuilder, appends the previous value, then the new value, and then stores the toString() result of that StringBuilder in the line variable.
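Written back out as Java source, the bytecode above corresponds roughly to the sketch below. The class name LoopDesugared and the method name numbersADesugared are made up for this illustration; the loop body is my hand translation of positions 11 to 29 in the listing, not something the compiler prints for you.

```java
// Hypothetical class; numbersADesugared is a hand-written approximation of the bytecode.
class LoopDesugared {
    // The loop as written in the source.
    static String numbersA() {
        String line = "";
        for (int i = 0; i < 10; i++) {
            line += i;
        }
        return line;
    }

    // What the disassembled loop body effectively does on every iteration.
    static String numbersADesugared() {
        String line = "";
        for (int i = 0; i < 10; i++) {
            // new StringBuilder (11-15), append(line) (18-19),
            // append(i) (22-23), toString() (26), store in line (29)
            line = new StringBuilder().append(line).append(i).toString();
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(numbersA());          // 0123456789
        System.out.println(numbersADesugared()); // 0123456789
    }
}
```

Seen this way the cost is obvious: ten iterations mean ten builders and ten intermediate strings, each one copying everything that was appended before.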

Fortunately, now that we know this, we can help the compiler by defining the StringBuilder ourselves:

public static String numbersB() {
    StringBuilder builder = new StringBuilder();

    for (int i = 0; i < 10; i++) {
        builder.append(i);
    }

    return builder.toString();
}

When compiled, this disassembles to:

public static java.lang.String numbersB();
    Code:
       0: new           #3                  // class StringBuilder
       3: dup
       4: invokespecial #4                  // Method StringBuilder."<init>"
       7: astore_0
       8: iconst_0
       9: istore_1
      10: iload_1
      11: bipush        10
      13: if_icmpge     28
      16: aload_0
      17: iload_1
      18: invokevirtual #6                  // Method StringBuilder.append
      21: pop
      22: iinc          1, 1
      25: goto          10
      28: aload_0
      29: invokevirtual #7                  // Method StringBuilder.toString
      32: areturn

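Here we can see that the for loop runs from position 10 to 25 and that the only method invoked inside it is append(). The difference in overall length might not seem like much, but the distance from the start of the loop to its goto went from 28 bytes (5 to 33) to 15 bytes (10 to 25). More importantly, the loop no longer creates a new StringBuilder and an intermediate String on every iteration.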

Conclusion

I hope this gives a bit more insight into what the compiler does and does not do for you. Especially when concatenating strings in a loop (for, while and do-while all work the same way), you should strongly consider using a StringBuilder yourself instead of relying on the compiler to come up with an optimal solution.
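As a last illustration, here is a minimal sketch of the same advice applied to a while loop. The class BuilderInWhileLoop and the helper joinAll are hypothetical names that do not appear earlier in this post; the point is only that the builder is created once, outside the loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class, used only to illustrate the pattern.
class BuilderInWhileLoop {
    static String joinAll(List<String> parts) {
        // Created once, before the loop starts.
        StringBuilder builder = new StringBuilder();

        Iterator<String> it = parts.iterator();
        while (it.hasNext()) {
            // Only an append per iteration; no new builder, no intermediate String.
            builder.append(it.next());
        }

        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinAll(Arrays.asList("a", "b", "c"))); // abc
    }
}
```

If you want to verify this for your own code, compile it and run javap -c on the class file, just as we did above.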