How big is the overhead of using Clojure datastructures vs native Java Arrays? I was rather suprised to find it is lower than I expected, and the implementation looks much better using Clojure native vectors and lists than Java Arrays.
First an essentail when doing something like this
(set! *warn-on-reflection* true)
Without that it is easy to accidentally create arrays very inefficient calls via the array accessor and setter methods by having to rely on reflection. Now for the code
(defn hn-reverse
[coll]
(reduce #(conj %1 %2) '() coll))
Simple basic clojure implementation walking the incoming collection and prepending to a list.
Using criterium for a quick benchmark on my machine gives me
(let [v (vec (range 1000))]
(bench (hn-reverse v)))
On my little XPS 13 gives me
Evaluation count : 2370120 in 60 samples of 39502 calls.
Execution time mean : 26.090496 microsecond
Execution time std-deviation : 832.330358 ns
Execution time lower quantile : 25.220196 microsecond ( 2.5%)
Execution time upper quantile : 28.154777 microsecond (97.5%)
Overhead used : 11.468971 ns
Found 5 outliers in 60 samples (8.3333 %)
low-severe 5 (8.3333 %)
Variance from outliers : 18.9765 % Variance is moderately inflated by outliers
Now how would an array perform here? (and yes I assumed it would be faster hence the name)
(defn fast-reverse
[^ints array]
;; ^ints type annotation is essential here, otherwise this will be very slow
;; due to reflection on aget and aset
(let [l (alength array)]
(loop [i (dec l)
j 0
a (int-array l)]
(if (= j l) a
(do
(aset a i (aget array j))
(recur (dec i) (inc j) a))))))
Again quick criterium test
(let [a (int-array (range 1000))]
(bench (fast-reverse a)))
This time
Evaluation count : 5851680 in 60 samples of 97528 calls.
Execution time mean : 10.561515 microsecond
Execution time std-deviation : 336.219142 ns
Execution time lower quantile : 10.240088 microsecond ( 2.5%)
Execution time upper quantile : 11.385153 microsecond (97.5%)
Overhead used : 11.468971 ns
Found 3 outliers in 60 samples (5.0000 %)
low-severe 3 (5.0000 %)
Variance from outliers : 18.9721 % Variance is moderately inflated by outliers
Using native Java Arrays is about 2.5x faster than clojure vector and list implentation in this case.
I somehow expected a bigger difference given the jump from immutable to mutable datastructures and the less allocations needed but it seems like this kind of optimization only really makes sense in the edge cases.