How big is the overhead of using Clojure datastructures vs native Java Arrays? I was rather suprised to find it is lower than I expected, and the implementation looks much better using Clojure native vectors and lists than Java Arrays.
First an essentail when doing something like this
(set! *warn-on-reflection* true)
Without that it is easy to accidentally create arrays very inefficient calls via the array accessor and setter methods by having to rely on reflection. Now for the code
(defn hn-reverse [coll] (reduce #(conj %1 %2) '() coll))
Simple basic clojure implementation walking the incoming collection and prepending to a list.
Using criterium for a quick benchmark on my machine gives me
(let [v (vec (range 1000))]
(bench (hn-reverse v)))
On my little XPS 13 gives me
Evaluation count : 2370120 in 60 samples of 39502 calls. Execution time mean : 26.090496 microsecond Execution time std-deviation : 832.330358 ns Execution time lower quantile : 25.220196 microsecond ( 2.5%) Execution time upper quantile : 28.154777 microsecond (97.5%) Overhead used : 11.468971 ns Found 5 outliers in 60 samples (8.3333 %) low-severe 5 (8.3333 %) Variance from outliers : 18.9765 % Variance is moderately inflated by outliers
Now how would an array perform here? (and yes I assumed it would be faster hence the name)
(defn fast-reverse [^ints array] ;; ^ints type annotation is essential here, otherwise this will be very slow ;; due to reflection on aget and aset (let [l (alength array)] (loop [i (dec l) j 0 a (int-array l)] (if (= j l) a (do (aset a i (aget array j)) (recur (dec i) (inc j) a))))))
Again quick criterium test
(let [a (int-array (range 1000))]
(bench (fast-reverse a)))
This time
Evaluation count : 5851680 in 60 samples of 97528 calls. Execution time mean : 10.561515 microsecond Execution time std-deviation : 336.219142 ns Execution time lower quantile : 10.240088 microsecond ( 2.5%) Execution time upper quantile : 11.385153 microsecond (97.5%) Overhead used : 11.468971 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 3 (5.0000 %) Variance from outliers : 18.9721 % Variance is moderately inflated by outliers
Using native Java Arrays is about 2.5x faster than clojure vector and list implentation in this case.
I somehow expected a bigger difference given the jump from immutable to mutable datastructures and the less allocations needed but it seems like this kind of optimization only really makes sense in the edge cases.
What was the impact on the heap?
LikeLike
That’s a good question need to setup to look at that.
LikeLike
Having run looking at the memory using visual VM, there is a clear difference when looking at the number of object instances. In my sample running 1 million iterations on 10k items over 5 million PersistenList instances are on the heap, while there are only 17K array instances. So it seems there is a lot more GC pressure. Not unexpected I guess… Also my profiling skills might not be top in this case need to get more used on what to look for on clojure here.
LikeLike