Java Tip #2: Use StringBuilder for String concatenation

And on to the next in a series of Java Tips. While last time the performance gain was very tiny and only affected high-load scenarios, this tip may considerably speed up even simple applications where there is String-processing involved.


Advice

Use StringBuilder instead of + or += to concatenate Strings in non-linear cases (when concatenation not appears immediately one after another), ie. in loops.

Code-Example

Before

  String test = "";
  for(int i = 0; i < 50000; i++) {
     test += "abc";
  }

After

  StringBuilder test = new StringBuilder();
  for(int i = 0; i < 50000; i++) {
     test.append("abc");
  }

Benefit

Huge performance gain! The unoptimized code forces Java to create new Strings and copy the contents around all the time (because Strings are immutable in Java). The optimized code avoids this creation/copying by using StringBuilder. While the Java compiler can optimize linear concatenations, ie.

String test = "a" + "b" + "c"; 

into using a StringBuilder internally, it is not clever enough to apply this optimization correctly if there is more logic than just concatenation involved. Even if the concatenation is the only operation inside some loop-logic, the Java compiler falls back into creating a whole lot of String (or to be more exact: StringBuilder) objects, copying them around like crazy and causing a huge performance impact in some cases. I confirmed this example for measurement of StringBuilder performance: concatenating 50000 times “abc” in a loop takes ~11000ms, using StringBuilder keeps the time spent at near 0ms. (I tried concatenating with 500000 iterations, but stopped the execution of the first loop after ~10 minutes running unfinished, so the impact is nonlinear. Second loop finished in 15ms with 500000, for the record.)

Remarks

It is not necessary to optimize code like

String email = user + "@" + domain + ".com"; 

as javac is clever enough for these cases and it would reduce readability considerably. But even with the simplest loops involved the conditions change. For example, internally the first loop gets compiled by javac into following bytecode

 13  new java.lang.StringBuilder [24]
 16  dup
 17  aload_1 [test]
 18  invokestatic java.lang.String.valueOf(java.lang.Object) : java.lang.String [26]
 21  invokespecial java.lang.StringBuilder(java.lang.String) [32]
 24  ldc <String "abc"> [35]
 26  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [37]
 29  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [41]
 32  astore_1 [test]
 33  iinc 4 1 [i]
 36  iload 4 [i]
 38  ldc <Integer 50000> [45]
 40  if_icmplt 13

where for each iteration of the loop(13-40) a new StringBuilder is instantiated and initialized with the result of the previous’ loop StringBuilders value before appending the constant String just once each time. The optimized code results in this bytecode

 81  new java.lang.StringBuilder [24]
 84  dup
 85  invokespecial java.lang.StringBuilder() [62]
 88  astore 4 [builder]
 90  iconst_0
 91  istore 5 [i]
 93  goto 107
 96  aload 4 [builder]
 98  ldc <String "abc"> [35]
100  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [37]
103  pop
104  iinc 5 1 [i]
107  iload 5 [i]
109  ldc <Integer 50000> [45]
111  if_icmplt 96

in which the loop(96-111) just appends the constant String to the same StringBuilder each time which was created only once before the loop started. No copying around and creation of additional objects necessary, thus a huge performance-gain.