I've been experimenting with Clojure lately. I tried writing my own map function (two, actually) and timed them against the built-in function. However, my map functions are much slower than the built-in one. I wanted to know how I could make my implementations faster; it should give me some insight into performance-tuning the Clojure algorithms I write. The first function (my-map) does recursion with recur. The second version (my-map-loop) uses loop/recur, which was much faster than simply using recur.
(defn my-map
  ([func lst] (my-map func lst []))
  ([func lst acc]
   (if (empty? lst)
     acc
     (recur func (rest lst) (conj acc (func (first lst)))))))
(defn my-map-loop
  ([func lst]
   (loop [acc []
          inner-lst lst]
     (if (empty? inner-lst)
       acc
       (recur (conj acc (func (first inner-lst))) (rest inner-lst))))))
(let [rng (range 1 10000)]
  (time (map #(* % %) rng))
  (time (my-map #(* % %) rng))
  (time (my-map-loop #(* % %) rng)))
These are the results I got:
"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 msecs"
Update
After resueman pointed out that I was timing things incorrectly, I changed the functions to:
(let [rng (range 1 10000)]
  (time (doall (map #(* % %) rng)))
  (time (doall (my-map #(* % %) rng)))
  (time (doall (my-map-loop #(* % %) rng)))
  nil)
These are the new results:
"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 msecs"
"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 msecs"
"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 msecs"
"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 msecs"
Looks like my second implementation is the fastest of the bunch. Anyway, I would still like to know if there are ways to optimize it further.
There are many things that could be leveraged to make map faster: transients (for your accumulator), chunked seqs (for the source, though these only make sense when you want lazy output), reducible collections (for the source again), and getting more familiar with the core functions (there is mapv).
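For example, here is a transient-backed variant of the loop version (a sketch; my-map-transient is a made-up name):

```clojure
;; Same shape as my-map-loop, but accumulates into a transient
;; vector and only makes it persistent once, at the end.
(defn my-map-transient
  [func lst]
  (loop [acc (transient [])
         xs  lst]
    (if (empty? xs)
      (persistent! acc)
      (recur (conj! acc (func (first xs))) (rest xs)))))
```

Transients cut the per-step allocation cost of conj on the accumulator while keeping the function's observable behavior purely functional.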
You should also consider using Criterium instead of time, if only because it checks whether your JVM optimizations are capped (which Leiningen does by default).
=> (let [rng (range 1 10000)]
     (quick-bench (my-map-loop #(* % %) rng))
     (quick-bench (into [] (map #(* % %)) rng)) ; leveraging reducible collections and transients
     (quick-bench (mapv #(* % %) rng)))         ; existing core fn
(output elided to keep only the means)
Execution time mean : 776,364755 µs
Execution time mean : 409,737852 µs
Execution time mean : 456,071295 µs
It is interesting to note that mapv is no faster than (into [] (map #(* % %)) rng), which is a generic way of optimizing these kinds of computations.
Related
I'm working on a Commodore 64 emulator as a fun project in functional programming. My goal was to write the entire thing functionally, and as pure as possible. I was looking at using a hash table as my memory store, but the performance of mutable vs. immutable hashes seems prohibitive. I liked the idea of a hash table as a kind of sparse array of memory, since in many cases memory won't actually be instantiated. I'd be fine using a vector as well, but there doesn't seem to be a functional version of vector-set.
(define (immut-hash [c (hash)] [r 10000000])
  (when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))

(define (mut-hash [c (make-hash)] [r 10000000])
  (when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))
Timing (time (immut-hash)) against (time (mut-hash)) shows the immutable version is much worse as a simulation of a bunch of memory pokes, and it puts the emulator beyond my MacBook Pro's ability to keep up with a C64 clock rate.
(a) Is there any better approach to improve the performance of the mutable hashes in this case?
(b) If not, is there another functional approach people would suggest?
Note: I know this isn't likely the right solution for absolute performance. Like I said... learning.
I know this is an old discussion, but it is the top hit when searching for the performance of Racket's hash-set (i.e. the immutable, functional way of setting a hash key-value pair). Since 2019, when this question was posted and answered, the underlying Racket engine has changed to Chez Scheme, and the performance ratios have also changed significantly.
Rerunning the above tests (I've included a mutable-vector version as well, since the OP mentioned it):
#lang racket
(define (immut-hash [c (hash)] [r 10000000])
  (when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))

(define (mut-hash [c (make-hash)] [r 10000000])
  (when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))

(define (mut-vec [c (make-vector 65536)] [r 10000000])
  (when (> r 0) (vector-set! c (random #xffff) (random #xff)) (mut-vec c (- r 1))))

(time (immut-hash (hash)))
(time (immut-hash (hasheq)))
(time (mut-hash (make-hash)))
(time (mut-hash (make-hasheq)))
(time (mut-vec))
produces the following results:
cpu time: 4024 real time: 4409 gc time: 198
cpu time: 3991 real time: 4334 gc time: 188
cpu time: 2532 real time: 2631 gc time: 17
cpu time: 2432 real time: 2524 gc time: 21
cpu time: 1985 real time: 2173 gc time: 11
Conclusions from the year 2021 (using Racket's new Chez Scheme 8.x engine):
The performance degradation from using hash/make-hash instead of hasheq/make-hasheq has essentially been eliminated.
The performance degradation from using immutable hashes instead of mutable hashes has gone from over 4x to less than 2x.
The worst case scenario (immutable hash) is now only 2x worse than the best case scenario (mutable vectors).
If you know that the keys of your hash will be fixnums, you could use hasheq (or make-hasheq) instead of hash (or make-hash). This gives better performance, at least for the Racket 7.4 3m variant on my MacBook Pro.
#lang racket
(define (immut-hash [c (hash)] [r 10000000])
  (when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))

(define (mut-hash [c (make-hash)] [r 10000000])
  (when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))

(time (immut-hash (hash)))
(time (immut-hash (hasheq)))
(time (mut-hash (make-hash)))
(time (mut-hash (make-hasheq)))
Here are the results:
cpu time: 9383 real time: 9447 gc time: 3181
cpu time: 6644 real time: 6658 gc time: 1105
cpu time: 2220 real time: 2225 gc time: 0
cpu time: 1647 real time: 1654 gc time: 0
There's a recent thread about performance of immutable hash. Jon compared the performance of immutable hash implemented by Patricia trie vs hash array mapped trie (HAMT), the hash type (eq? vs equal?), and the insertion order. You might want to take a look at the results.
I'm new to Clojure. I have the following code, which creates an infinite lazy sequence of numbers:
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            ;; using a dummy infinite seq to keep the reductions going
            (repeat 1))))
Each number in the sequence is dependent on the previous calculation. I'm using reductions because I need all the intermediate results.
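As an aside, the same sequence can be produced with iterate, which avoids the dummy (repeat 1) input entirely (a sketch, equivalent to the reductions version):

```clojure
(defn generator [seed factor]
  ;; iterate lazily applies the step function to its own previous result;
  ;; rest drops the seed itself, matching (drop 1 (reductions ...))
  (rest (iterate (fn [acc] (mod (* acc factor) 2147483647)) seed)))
```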
I then instantiate two generators like so:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
I then want to compare n consecutive results of these sequences, for large n, and return the number of times they are equal.
At first I did something like:
(defn run []
  (->> (interleave gen-a gen-b)
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
It was taking far too long, and I saw the program's memory usage spike to about 4GB. With some printlns, I saw that it got really slow after about 10 million iterations, so I thought that maybe count needed to store the entire sequence in memory, and changed it to use reduce:
(defn run-2 []
  (reduce
    (fn [acc [a b]]
      (if (= a b)
        (inc acc)
        acc))
    0
    (take 40000000 (partition 2 (interleave gen-a gen-b)))))
Still, it allocated a lot of memory and slowed down significantly after the first couple of million iterations. I'm pretty sure it's storing the entire lazy sequence in memory, but I'm not sure why, so I tried to manually throw away the head:
(defn run-3 []
  (loop [xs (take 40000000 (partition 2 (interleave gen-a gen-b)))
         total 0]
    (cond
      (empty? xs) total
      (apply = (first xs)) (recur (rest xs) (inc total))
      :else (recur (rest xs) total))))
Again, same results. This stumped me, because I'm reading that all of the functions I'm using to create my xs sequence are lazy, and since I'm only using the current item, I expected it to use constant memory.
Coming from a Python background I'm basically trying to emulate Python Generators. I'm probably missing something obvious, so I'd really appreciate some pointers. Thanks!
Generators are not (lazy) sequences.
You are holding on to the head here:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
gen-a and gen-b are global vars referring to the head of a sequence.
You probably want something like:
(defn run []
  (->> (interleave (generator 59 16807) (generator 393 48271))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
Alternatively, define gen-a and gen-b as functions:
(defn gen-a
  []
  (generator 59 16807))
...
(defn run []
  (->> (interleave (gen-a) (gen-b))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
You can get Python-style generator functions in Clojure using the Tupelo library. Just use lazy-gen and yield like so:
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (t/lazy-gen
    (loop [acc seed]
      (let [next (mod (* acc factor) 2147483647)]
        (t/yield next)
        (recur next)))))
(defn run2 [num-rand]
  (->> (interleave
         ;; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with result:
"Elapsed time: 409.697922 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 3250.592798 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 32995.194574 msecs" (time (run2 1.0E7)) => 100068
Rather than using reductions, you could build a lazy sequence directly. This answer uses lazy-cons from the Tupelo library (you could also use lazy-seq from clojure.core).
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    (t/lazy-cons next (rand-gen next factor))))
(defn run2 [num-rand]
  (->> (interleave
         ;; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with results:
"Elapsed time: 90.42 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 862.60 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 8474.25 msecs" (time (run2 1.0E7)) => 100068
Note that the execution times are about 4x faster, since we have cut out the generator function stuff that we weren't really using anyway.
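For reference, the same idea can be written with only clojure.core, using lazy-seq, with no extra dependency (a sketch; behavior should match the lazy-cons version above):

```clojure
(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    ;; lazy-seq defers the recursive call until the tail of the
    ;; sequence is actually consumed, so nothing holds the head
    (lazy-seq (cons next (rand-gen next factor)))))
```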
In the Clojure documentation on type hinting, it has the following example on how type hinting and coercions can make code run much faster:
(defn foo [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))

(time (foo 100000))
"Elapsed time: 0.391 msecs"
100000

(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))

(time (foo2 100000))
"Elapsed time: 0.084 msecs"
100000
If you run this code with (set! *warn-on-reflection* true), it doesn't show a reflection warning. Is it up to programmer trial-and-error to see where these kinds of adornments make a performance difference? Or is there a tool that indicates the problematic areas?
Well, you can estimate this pretty well just by thinking about which parts of the code get hit often.
Or you could use an ordinary profiler of some sort. I would recommend VisualVM, which you can get to work with Clojure. Then add type hints in the methods you see taking most of the time (the profiler will also show calls to java.lang.reflect.Method; if that gets called a lot, you should consider using type hints).
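Besides profiling, Clojure (1.7 and later) can flag boxed arithmetic directly via *unchecked-math*, which catches cases that *warn-on-reflection* misses; a sketch, run at the REPL:

```clojure
(set! *unchecked-math* :warn-on-boxed)

;; With the flag set, compiling foo should emit a boxed-math warning
;; for the (< i n) comparison, pointing at exactly the spot where a
;; type hint or a coercion such as (long n) would help.
(defn foo [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))
```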
Okay, the title is not exactly what I was looking for, but it will have to do. I found an interesting thing about the speed of a record's member-function access. I'll illustrate with this REPL session:
==> (defprotocol Add (add [_]))
Add
==> (defrecord R [x y] Add (add [_] (+ x y)))
user.R
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (add r)))) ; Pure functional style
"Elapsed time: 19.613694 msecs"
nil
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Functional creation, but with method call
"Elapsed time: 477.29611 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Java-style
"Elapsed time: 10.051506 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (add r)))) ; Java-style creation with functional call
"Elapsed time: 18.726801 msecs"
nil
I can't really see the reason for these differences, so I'm asking that from you.
The problem with your second call is that the Clojure compiler is unable to determine the type of the r local at compile time, so it is forced to use reflection.
To avoid it, you should add a type hint:
(let [^user.R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
or simply
(let [^R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
and it'll be just as fast as Java-style method call.
If you want to easily diagnose such problems in your code, set the *warn-on-reflection* flag to true:
(set! *warn-on-reflection* true)
or add it to :global-vars section in your project.clj file:
:global-vars {*warn-on-reflection* true}
So, as you can see, without reflection, method calls are a little faster than protocol function calls. But reflection can make method calls really slow.
I am performing element-wise operations on two vectors on the order of 50,000 elements in size, and the performance is unsatisfactory (a few seconds). Are there any obvious improvements to be made, such as using a different data structure?
(defn boolean-compare
  "Sum up 1s if matching, 0 otherwise"
  [proposal-img data-img]
  (sum ; sum is defined below as (partial apply +)
    (map
      #(Math/abs (- (first %) (second %)))
      (partition 2 (interleave proposal-img data-img)))))
Try this:
(apply + (map bit-xor proposal-img data-img))
Some notes:
Mapping a function over several collections uses one element from each collection as the arguments to the function, so there is no need to interleave and partition.
If your data is 1s and 0s, then bit-xor will be faster than taking the absolute difference.
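To illustrate the variadic behavior of map (a small sketch):

```clojure
;; map takes one element from each collection per step and passes
;; them together as the arguments to the function
(map bit-xor [1 0 1 1] [1 1 0 1]) ; => (0 1 1 0)
```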
Timed example:
(def data-img (repeatedly 50000 #(rand-int 2)))
(def proposal-img (repeatedly 50000 #(rand-int 2)))
(def sum (partial apply +))
After warming up the JVM...
(time (boolean-compare proposal-img data-img))
;=> "Elapsed time: 528.731093 msecs"
;=> 24802
(time (apply + (map bit-xor proposal-img data-img)))
;=> "Elapsed time: 22.481255 msecs"
;=> 24802
You should look at adopting core.matrix if you are interested in good performance for large vector operations.
In particular, the vectorz-clj library (a core.matrix implementation) has some very fast implementations for most common vector operations with double values.
(def v1 (array (repeatedly 50000 #(rand-int 2))))
(def v2 (array (repeatedly 50000 #(rand-int 2))))

(time (let [d (sub v2 v1)] ;; take the difference of the two vectors
        (.abs d)           ;; calculate the absolute value (mutates d)
        (esum d)))         ;; sum the elements and return the result
=> "Elapsed time: 0.949985 msecs"
=> 24980.0
i.e. under 20 ns per pair of elements. That's pretty quick: you'd be hard-pressed to beat it without resorting to low-level array-fiddling code.