Java Tip #2: Use StringBuilder for String concatenation

And on to the next in a series of Java Tips. While last time the performance gain was very tiny and only affected high-load scenarios, this tip may considerably speed up even simple applications where there is String-processing involved.


Advice

Use StringBuilder instead of + or += to concatenate Strings in non-linear cases (when concatenation not appears immediately one after another), ie. in loops.

Code-Example

Before

  String test = "";
  for(int i = 0; i < 50000; i++) {
     test += "abc";
  }

After

  StringBuilder test = new StringBuilder();
  for(int i = 0; i < 50000; i++) {
     test.append("abc");
  }

Benefit

Huge performance gain! The unoptimized code forces Java to create new Strings and copy the contents around all the time (because Strings are immutable in Java). The optimized code avoids this creation/copying by using StringBuilder. While the Java compiler can optimize linear concatenations, ie.

String test = "a" + "b" + "c";

into using a StringBuilder internally, it is not clever enough to apply this optimization correctly if there is more logic than just concatenation involved. Even if the concatenation is the only operation inside some loop-logic, the Java compiler falls back into creating a whole lot of String (or to be more exact: StringBuilder) objects, copying them around like crazy and causing a huge performance impact in some cases. I confirmed this example for measurement of StringBuilder performance: concatenating 50000 times "abc" in a loop takes ~11000ms, using StringBuilder keeps the time spent at near 0ms. (I tried concatenating with 500000 iterations, but stopped the execution of the first loop after ~10 minutes running unfinished, so the impact is nonlinear. Second loop finished in 15ms with 500000, for the record.)

Remarks

It is not necessary to optimize code like

String email = user + "@" + domain + ".com";

as javac is clever enough for these cases and it would reduce readability considerably. But even with the simplest loops involved the conditions change. For example, internally the first loop gets compiled by javac into following bytecode

 13  new java.lang.StringBuilder [24]
 16  dup
 17  aload_1 [test]
 18  invokestatic java.lang.String.valueOf(java.lang.Object) : java.lang.String [26]
 21  invokespecial java.lang.StringBuilder(java.lang.String) [32]
 24  ldc <String "abc"> [35]
 26  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [37]
 29  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [41]
 32  astore_1 [test]
 33  iinc 4 1 [i]
 36  iload 4 [i]
 38  ldc <Integer 50000> [45]
 40  if_icmplt 13

where for each iteration of the loop(13-40) a new StringBuilder is instantiated and initialized with the result of the previous' loop StringBuilders value before appending the constant String just once each time. The optimized code results in this bytecode

 81  new java.lang.StringBuilder [24]
 84  dup
 85  invokespecial java.lang.StringBuilder() [62]
 88  astore 4 [builder]
 90  iconst_0
 91  istore 5 [i]
 93  goto 107
 96  aload 4 [builder]
 98  ldc <String "abc"> [35]
100  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [37]
103  pop
104  iinc 5 1 [i]
107  iload 5 [i]
109  ldc <Integer 50000> [45]
111  if_icmplt 96

in which the loop(96-111) just appends the constant String to the same StringBuilder each time which was created only once before the loop started. No copying around and creation of additional objects necessary, thus a huge performance-gain.

|

Similar entries

Comments

Das gleiche für C# schaut ziemlich anders aus...

http://www.codinghorror.com/blog/2009/01/the-sad-tragedy-of-micro-optimization-theater.html

Verdict: It. Does. Not. Matter.

As you already let me know, you have scanned my posting (and also Jeffs) a bit too fast and missed the parts where it states that this only applies to concatenation in loops.

But your comment raised my curiousity and I looked around a bit on the C# side for that issue. I have to admit, that I've been surprised that this issue also applies to C# as I've assumed that since the C# compiler is more recent than the Java compiler that it would also have advances on the optimization side. But according to someone who made similar measurements ( http://blog.briandicroce.com/2008/02/04/stringbuilder-vs-string-performance-in-net/ ) this is not the case.

Maybe if I find some time lying around somewhere, I'll have a deeper look into this issue and why this seemingly simple optimization opportunity is not already applied by both the Java and C# compiler.

Furthermore, and this one is also likely to be examined somewhere in the future, I'm curious why Strings are immutable in most garbage-collected languages.

Thanks for you comment.