Kotlin's hidden costs - Android benchmarks
A blog post by Renato Athaydes benchmarks the “hidden costs of Kotlin” described in a series of blog posts by Christophe B.
I advise you to read both of those first. Here, we will not cover the original reasons why these features might be costly, nor the benchmarking code used to test them.
While the original blog posts were geared towards using Kotlin on Android, the benchmarks run on the JVM. Android, however, does not use the JVM. It has its own runtime, ART (previously Dalvik). I decided to modify the benchmarks to run on an Android device.
Just to be clear, nothing here invalidates any results of the previous benchmarking blog post. These benchmarks run on a completely different architecture and runtime.
Methodology
My fork is on GitHub. I had to port the benchmarks from JMH to Spanner (a fork of Caliper), but they should be functionally the same. Also, because Android does not natively support Java 8 lambdas until API 24, the desugar tool is used to backport the feature. Tests were run on a Google Pixel running Android 7.1.2. It would be interesting to hear if the results are different on other Android versions or devices.
Results
Please refer to the original benchmark blog post for what code was run in each benchmark. I’m only going to provide the results on Android here.
Lambdas
javaLambda
runtime(ns): min=16.89, 1st qu.=17.35, median=17.57 (-), mean=17.60, 3rd qu.=18.03, max=18.11
javaLambdaGeneric
runtime(ns): min=26.58, 1st qu.=27.16, median=27.55 (-), mean=27.52, 3rd qu.=27.94, max=28.30
kotlinInlinedFunction
runtime(ns): min=10.23, 1st qu.=10.24, median=10.30 (-), mean=10.50, 3rd qu.=10.74, max=11.33
kotlinLambda
runtime(ns): min=28.62, 1st qu.=28.83, median=29.63 (-), mean=29.68, 3rd qu.=30.20, max=31.88
Note: In the charts, lower is better.
Unlike in the JVM benchmarks, we do see the Kotlin lambda take more time than the Java version. I’ve also added a version of the Java lambda that uses a generic Function instead of ToIntFunction. You’ll note that this runs for nearly the same amount of time as the Kotlin version (median 27.55 ns vs. 29.63 ns), so the extra time is almost certainly due to boxing the primitive int. The Kotlin inlined function is the fastest of them all, saving boxing, a few null checks, and a method lookup/call.
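To make the boxing difference concrete, here is a minimal Java sketch. It is not the benchmark code: the class and method names are made up for illustration, and the only point is the difference in return types between the two functional interfaces.

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

// Hypothetical illustration, not taken from the benchmark sources.
class BoxingSketch {
    // Primitive path: ToIntFunction<String> returns an int, so no Integer is needed.
    static int viaToIntFunction(String s, ToIntFunction<String> f) {
        return f.applyAsInt(s);
    }

    // Generic path: Function<String, Integer> returns an Integer object,
    // so the int result is boxed inside the function and unboxed here.
    static int viaGenericFunction(String s, Function<String, Integer> f) {
        return f.apply(s);
    }

    public static void main(String[] args) {
        System.out.println(viaToIntFunction("hello", String::length));
        System.out.println(viaGenericFunction("hello", String::length));
    }
}
```

Both calls compute the same value. The generic one simply goes through an Integer on the way, which is the kind of overhead the javaLambdaGeneric numbers seem to reflect.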
Companion Objects
javaPrivateConstructorCallFromStaticMethod
runtime(ns): min=77.83, 1st qu.=78.72, median=80.03 (-), mean=81.09, 3rd qu.=83.70, max=86.39
kotlinPrivateConstructorCallFromCompanionObject
runtime(ns): min=100.68, 1st qu.=107.23, median=109.71 (-), mean=109.52, 3rd qu.=113.14, max=116.23
kotlinPrivateStaticConstructorCallFromCompanionObject
runtime(ns): min=98.46, 1st qu.=101.49, median=106.56 (-), mean=105.00, 3rd qu.=107.30, max=110.67
Again, we do see a cost to the Kotlin version. I’ve added an additional method which applies the advice in “Exploring Kotlin’s hidden costs” to use const and @JvmStatic. This appears to reduce the difference between the Kotlin and Java versions, although in the numbers above the improvement is small (median 106.56 ns vs. 109.71 ns, with Java at 80.03 ns).
Local Functions
javaLocalFunction
runtime(ns): min=92.55, 1st qu.=94.46, median=99.79 (-), mean=99.13, 3rd qu.=102.84, max=105.55
javaLocalFunctionWithoutCapturingLocalVariable
runtime(ns): min=7.49, 1st qu.=7.53, median=7.58 (-), mean=7.60, 3rd qu.=7.66, max=7.84
kotlinLocalFunctionCapturingLocalVariable
runtime(ns): min=105.18, 1st qu.=108.60, median=110.76 (-), mean=114.35, 3rd qu.=122.18, max=127.57
kotlinLocalFunctionWithoutCapturingLocalVariable
runtime(ns): min=7.23, 1st qu.=7.27, median=7.36 (-), mean=7.37, 3rd qu.=7.48, max=7.49
I’ve added an additional Java lambda that does not capture a value. Again, unlike in the JVM benchmarks (starting to see a pattern?), there is a huge difference between the capturing and non-capturing versions. This doesn’t seem language-specific, though: the Kotlin and Java versions are nearly the same. The cost here is capturing vs. not capturing.
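For readers unfamiliar with the distinction, here is a small Java sketch of a capturing and a non-capturing lambda. The names are hypothetical and this is not the benchmark code. Whether a runtime actually reuses the non-capturing instance is an implementation detail, so treat the comments as an assumption about typical behavior rather than a guarantee.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration, not taken from the benchmark sources.
class CaptureSketch {
    // Non-capturing: the lambda body uses no surrounding local state,
    // so a runtime is free to reuse a single instance across calls.
    static IntSupplier nonCapturing() {
        return () -> 42;
    }

    // Capturing: the lambda closes over `base`, so the resulting object
    // has to carry that value, which typically means a fresh allocation per call.
    static IntSupplier capturing(int base) {
        return () -> base + 1;
    }

    public static void main(String[] args) {
        System.out.println(nonCapturing().getAsInt());
        System.out.println(capturing(1).getAsInt());
    }
}
```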
Null Safety
javaSayHello
runtime(ns): min=425.72, 1st qu.=442.25, median=457.86 (-), mean=458.86, 3rd qu.=469.51, max=516.85
kotlinSayHello
runtime(ns): min=439.67, 1st qu.=440.48, median=455.82 (-), mean=468.59, 3rd qu.=472.44, max=577.72
And now we’ve come across a result that seems to agree with the previous blog post. However, the null check is completely dwarfed by string allocations. In order to see its actual effect, I’ve decided to remove the string concatenation.
javaSayHello
runtime(ns): min=10.95, 1st qu.=10.96, median=10.98 (-), mean=11.03, 3rd qu.=11.12, max=11.21
kotlinSayHello
runtime(ns): min=13.82, 1st qu.=13.99, median=14.54 (-), mean=14.37, 3rd qu.=14.62, max=14.75
So the null checks do have a cost, though at about 3.5 ns (median 14.54 ns vs. 10.98 ns) it’s pretty small.
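As a rough picture of what is being measured, the sketch below shows a Java method with and without an explicit parameter null check. This is only an analogy: Kotlin emits its own intrinsic check for non-null parameters, and I am using Objects.requireNonNull as a stand-in. The class and method names are hypothetical.

```java
import java.util.Objects;

// Hypothetical illustration, not taken from the benchmark sources.
class NullCheckSketch {
    // No explicit check: a null argument only fails when it is dereferenced.
    static int lengthUnchecked(String who) {
        return who.length();
    }

    // Explicit check up front, similar in spirit to the check Kotlin adds
    // for non-null parameters. It costs one extra call and comparison per invocation.
    static int lengthChecked(String who) {
        Objects.requireNonNull(who, "who");
        return who.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthUnchecked("world"));
        System.out.println(lengthChecked("world"));
    }
}
```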
Varargs
javaIntVarargs
runtime(ns): min=138.62, 1st qu.=140.98, median=142.37 (-), mean=142.58, 3rd qu.=144.88, max=145.74
kotlinIntVarargs
runtime(ns): min=371.31, 1st qu.=375.83, median=392.00 (-), mean=390.80, 3rd qu.=405.13, max=407.57
This result agrees most closely with the previous blog post. The Java version is roughly 2.75x faster than the Kotlin one by median (142.37 ns vs. 392.00 ns).
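One commonly cited reason for a gap like this is that passing an existing array through Kotlin's spread operator copies the array first, while Java hands the array to the varargs method directly. I have not inspected the bytecode for this benchmark, so take that as an assumption. The Java sketch below, with hypothetical names, just shows the two shapes side by side.

```java
import java.util.Arrays;

// Hypothetical illustration, not taken from the benchmark sources.
class VarargsSketch {
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Java style: the existing array is passed as the varargs array, no copy.
    static int passDirect(int[] values) {
        return sum(values);
    }

    // Roughly what a defensive copy before the call looks like:
    // one extra array allocation and copy per invocation.
    static int passCopy(int[] values) {
        return sum(Arrays.copyOf(values, values.length));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(passDirect(data));
        System.out.println(passCopy(data));
    }
}
```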
Delegated Properties
javaSimplyInitializedProperty
runtime(ns): min=90.50, 1st qu.=91.68, median=93.43 (-), mean=96.48, 3rd qu.=100.62, max=111.42
kotlinDelegateProperty
runtime(ns): min=211.95, 1st qu.=217.71, median=233.14 (-), mean=235.72, 3rd qu.=252.38, max=268.81
We do see a cost to using the delegated property. However, it appears to be much more than 10%: the median is roughly 2.5x that of the Java version (233.14 ns vs. 93.43 ns).
Ranges (Indirect Reference)
kotlinIndirectRange
runtime(ns): min=133.25, 1st qu.=134.83, median=138.75 (-), mean=140.82, 3rd qu.=147.47, max=154.65
kotlinLocallyDeclaredRange
runtime(ns): min=2.30, 1st qu.=2.32, median=2.36 (-), mean=2.35, 3rd qu.=2.37, max=2.39
OK, unlike on the JVM, this is the opposite of a “not significant” cost. The execution time differs by more than an order of magnitude (roughly 59x by median)!
Ranges (Non-primitive Types)
javaStringComparisons
runtime(ns): min=19.94, 1st qu.=20.07, median=20.48 (-), mean=20.39, 3rd qu.=20.62, max=20.84
kotlinStringRangeInclusionWithConstantRange
runtime(ns): min=42.95, 1st qu.=43.30, median=43.67 (-), mean=43.81, 3rd qu.=44.48, max=44.73
kotlinStringRangeInclusionWithLocalRange
runtime(ns): min=144.33, 1st qu.=147.76, median=154.53 (-), mean=154.06, 3rd qu.=157.01, max=171.33
Again, unlike on the JVM, the difference here is quite large. Note: I’ve included the Java string comparison benchmark as well. It’s not included in the previous blog post despite being part of the benchmark.
Ranges (Iteration)
kotlinRangeForEachFunction
runtime(ns): min=454.21, 1st qu.=475.43, median=508.76 (-), mean=507.04, 3rd qu.=524.45, max=587.05
kotlinRangeForEachLoop
runtime(ns): min=157.03, 1st qu.=166.79, median=172.07 (-), mean=170.62, 3rd qu.=175.78, max=176.92
kotlinRangeForEachLoopWithStep1
runtime(ns): min=444.81, 1st qu.=447.64, median=455.73 (-), mean=460.28, 3rd qu.=472.08, max=489.84
Like before, the forEach function is way slower than a simple for loop. However, the explicit step is quite costly as well.
Iterations: Collection Indices
kotlinCustomIndicesIteration
runtime(ns): min=231.88, 1st qu.=236.17, median=260.12 (-), mean=256.89, 3rd qu.=273.05, max=280.20
kotlinIterationUsingLastIndexRange
runtime(ns): min=98.95, 1st qu.=105.44, median=107.80 (-), mean=107.10, 3rd qu.=109.29, max=112.01
And finally, using lastIndex is more than twice as fast as using a custom indices.
Conclusion
At least for Android, all the advice in the “Exploring Kotlin’s hidden costs” series is correct. This underlines the need to not only measure, but measure on the platform you are using.