The Takipi benchmark ‘How Misusing Streams Can Make Your Code 5 Times Slower’ contains interesting but unexplained results:
The code is available on Takipi github. I have taken the code from both the fixes
branch (optimized version)
and the master
branch (with the boxing issue). The modified code of this article is available on github
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public int streamBoxingMaxInteger() {
return integers.stream().reduce(Integer.MIN_VALUE, Integer::max);
}
public int lambdaBoxingMaxInteger() {
return integers.stream().reduce(Integer.MIN_VALUE, (a, b) -> Integer.max(a, b));
}
public int streamMaxInteger() {
return integers.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, Integer::max);
}
public int lambdaMaxInteger() {
return integers.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, (a, b) -> Integer.max(a, b));
}
Core i5, 3Ghz, JDK 8u66
Benchmark | ms/op |
---|---|
forMax2Integer | 0.094 ± 0.002 |
lambdaBoxingMaxInteger | 0.500 ± 0.018 |
lambdaMaxInteger | 0.494 ± 0.253 |
streamBoxingMaxInteger | 0.503 ± 0.005 |
streamMaxInteger | 0.107 ± 0.001 |
static int Integer#max(int a, int b)
takes int
primitive arguments and return a int
primitive.
On the other side, the stream consumes a list of Integer. Thus the result of Integer#max
is boxed from an int
to an Integer
, ie. an implicit
Integer#valueOf
is inserted by the compiler.
1
2
3
4
5
6
7
LINENUMBER 191 L0
ALOAD 0
INVOKEVIRTUAL java/lang/Integer.intValue ()I
ALOAD 1
INVOKEVIRTUAL java/lang/Integer.intValue ()I
INVOKESTATIC java/lang/Integer.max (II)I
INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
As suggested by Sergey Kuksenko comment, to get rid of the
autoboxing, the max function must operate on Integer
.
1
2
3
4
5
6
7
8
9
10
11
private static Integer max(Integer a , Integer b){
return (a >= b) ? a : b;
}
public int streamBoxingMaxInteger() {
return integers.stream().reduce(Integer.MIN_VALUE, LoopBenchmarkMain::max);
}
public int lambdaBoxingMaxInteger() {
return integers.stream().reduce(Integer.MIN_VALUE, (a, b) -> max(a, b));
}
Results are far better as soon as you are aware of the autoboxing issue:
Benchmark | Before ms/op | After ms/op |
---|---|---|
lambdaBoxingMaxInteger | 0.500 ± 0.018 | 0.266 ± 0.268 |
streamBoxingMaxInteger | 0.503 ± 0.005 | 0.085 ± 0.013 |
When working with boxed primitive and to avoid this issue, it may be better to use an IntStream
using for example the mapToInt
transformation.
IntStream Stream#mapToInt(ToIntFunction<? super T> mapper) Returns an IntStream consisting of the results of applying the given function to the elements of this stream.
These two codes are very similar once compiled:
1
integers.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, Integer::max);
1
integers.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, (a, b) -> Integer.max(a, b));
The difference is in the reduce
function:
streamMaxInteger
contains a reference to a static method. The compiled code use an INVOKESTATIC
to the Integer#max
method.lambdaMaxInteger
uses a lambda. Once compiled this lambda is converted to a static method similar to:1
2
3
4
private static Integer lambda$lambdaMaxInteger$0(Integer a , Integer b){
return Integer.max(a,b);
}
integers.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, LoopBenchmarkMain::lambda$lambdaMaxInteger$0);
Once compiled both use INVOKESTATIC
but lambdaMaxInteger
has a call depth deeper by 1 than streamMaxInteger
. The performance difference
could be explained if the call to lambda$lambdaMaxInteger$0
is not inlined
In order to analyse the JVM inlining, the bench is launched using -XX:+UnlockDiagnosticVMOptions
and -XX:+PrintInlining
arguments.
1
2
3
4
5
6
7
8
9
10
11
12
@ 32 java.util.ArrayList$ArrayListSpliterator::forEachRemaining (129 bytes) inline (hot)
@ 51 java.util.ArrayList::access$100 (5 bytes) accessor
@ 99 java.util.stream.ReferencePipeline$4$1::accept (23 bytes) inline (hot)
@ 12 com.takipi.oss.benchmarks.jmh.loops.OptimizedLoopBenchmarkMain$$Lambda$1/1699549388::applyAsInt (8 bytes) inline (hot)
\-> TypeProfile (12880/12880 counts) = com/takipi/oss/benchmarks/jmh/loops/OptimizedLoopBenchmarkMain$$Lambda$1
@ 4 java.lang.Integer::intValue (5 bytes) accessor
@ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot)
\-> TypeProfile (12880/12880 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 10 com.takipi.oss.benchmarks.jmh.loops.OptimizedLoopBenchmarkMain$$Lambda$2/813872125::applyAsInt (6 bytes) inline (hot)
\-> TypeProfile (13602/13602 counts) = com/takipi/oss/benchmarks/jmh/loops/OptimizedLoopBenchmarkMain$$Lambda$2
@ 2 com.takipi.oss.benchmarks.jmh.loops.OptimizedLoopBenchmarkMain::lambda$lambdaMaxInteger$0 (6 bytes) inline (hot)
@ 2 java.lang.Integer::max (6 bytes) inlining too deep
Gotcha, @ 2 java.lang.Integer::max (6 bytes) inlining too deep
, the issue and the performance difference is not correlated to lambda/stream misuse.
Yan Bonnel is true.
For lambda, try change Integer.max by Math.max. I think you hit a limit of jit so it doesn’t inline code?
Let increase the max inlining level to 10 from 9 -XX:MaxInlineLevel=10
Results are again far better:
Benchmark | Before ms/op | After ms/op |
---|---|---|
lambdaMaxInteger | 0.494 ± 0.253 | 0.109 ± 0.025 |
streamMaxInteger | 0.107 ± 0.001 | 0.107 ± 0.001 |
Benchmark | Before ms/op | After ms/op |
---|---|---|
forMax2Integer | 0.094 ± 0.002 | 0.094 ± 0.002 |
lambdaBoxingMaxInteger | 0.500 ± 0.018 | 0.080 ± 0.001 |
lambdaMaxInteger | 0.494 ± 0.253 | 0.108 ± 0.003 |
streamBoxingMaxInteger | 0.503 ± 0.005 | 0.080 ± 0.001 |
streamMaxInteger | 0.107 ± 0.001 | 0.107 ± 0.001 |
The title How Misusing Streams
is in my PoV wrong. These issues are not related to lambda/stream but to some Java subtlety.
The autoboxing performance issue could have also occurred with a for statement:
1
2
3
4
5
6
7
public int forMaxInteger() {
Integer max = Integer.MIN_VALUE;
for (int i = 0; i < size; i++) {
max = Integer.max(max, integers.get(i));
}
return max;
}
In my opinion a better title would be Benchmark: How Misusing Autoboxing Can Make Your Code 5 Times Slower
.
Java is a subtle language and optimizing could be complex and benchmarking could be hard. Java library does not implement for wrapper classes the counterpart methods which operate on primitives. It relies on autoboxing and unboxing and the Java compiler and JIT could not be able to fully optimize the code.
Anyway for this use case (max of a list), stream and lambda are not slower than a for
statement.
On the contrary they seem faster. I do not have yet analyse why, maybe in a part 2.