Difference in Clojure's record methods - performance

Okay, the title is not exactly what I was looking for, but it will have to do. I found an interesting thing in the speed of a record's member function access. I'll illustrate with this REPL session:
==> (defprotocol Add (add [_]))
Add
==> (defrecord R [x y] Add (add [_] (+ x y)))
user.R
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (add r)))) ; Pure functional style
"Elapsed time: 19.613694 msecs"
nil
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Functional creation, but with method call
"Elapsed time: 477.29611 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Java-style
"Elapsed time: 10.051506 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (add r)))) ; Java-style creation with functional call
"Elapsed time: 18.726801 msecs"
nil
I can't really see the reason for these differences, so I'm asking you.

The problem with your second call is that the Clojure compiler is unable to determine the type of the r variable at compile time, so it is forced to use reflection.
To avoid that, add a type hint:
(let [^user.R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
or simply
(let [^R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
and it'll be just as fast as Java-style method call.
If you want to easily diagnose such problems in your code, set the *warn-on-reflection* flag to true:
(set! *warn-on-reflection* true)
or add it to the :global-vars section of your project.clj file:
:global-vars {*warn-on-reflection* true}
So, as you can see, without reflection, method calls are a little faster than protocol function calls, but reflection can make method calls really slow.
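For completeness, here is what the whole diagnosis looks like in a fresh REPL session (the exact warning text differs between Clojure versions, so the comments below are only indicative):
(set! *warn-on-reflection* true)
(defprotocol Add (add [_]))
(defrecord R [x y] Add (add [_] (+ x y)))
; unhinted: the compiler prints a reflection warning for .add
(let [r (->R 1 2)] (.add r))
; hinted: no warning, and .add compiles to a direct method call
(let [^R r (->R 1 2)] (.add r))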

Related

Clojure: Reducing large lazy collection eats up memory

I'm new to Clojure. I have the following code, which creates an infinite lazy sequence of numbers:
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            ; using a dummy infinite seq to keep the reductions going
            (repeat 1))))
Each number in the sequence is dependent on the previous calculation. I'm using reductions because I need all the intermediate results.
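To illustrate, reductions is like reduce but returns every intermediate accumulator as a (lazy) sequence:
(reductions + [1 2 3 4]) ;=> (1 3 6 10)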
I then instantiate two generators like so:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
I then want to compare n consecutive results of these sequences, for large n, and return the number of times they are equal.
At first I did something like:
(defn run []
  (->> (interleave gen-a gen-b)
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
It was taking far too long, and I saw the program's memory usage spike to about 4GB. With some printlns I saw that after about 10 million iterations it got really slow, so I thought that maybe count needed to store the entire sequence in memory and changed it to use reduce:
(defn run-2 []
  (reduce
    (fn [acc [a b]]
      (if (= a b)
        (inc acc)
        acc))
    0
    (take 40000000 (partition 2 (interleave gen-a gen-b)))))
Still, it was allocating a lot of memory and slowing down significantly after the first couple of millions. I'm pretty sure that it's storing the entire lazy sequence in memory but I'm not sure why, so I tried to manually throw away the head:
(defn run-3 []
  (loop [xs (take 40000000 (partition 2 (interleave gen-a gen-b)))
         total 0]
    (cond
      (empty? xs) total
      (apply = (first xs)) (recur (rest xs) (inc total))
      :else (recur (rest xs) total))))
Again, same results. This stumped me because I'm reading that all of the functions I'm using to create my xs sequence are lazy, and since I'm only using the current item I'm expecting it to use constant memory.
Coming from a Python background I'm basically trying to emulate Python Generators. I'm probably missing something obvious, so I'd really appreciate some pointers. Thanks!
Generators are not (lazy) sequences.
You are holding on to the head here:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
gen-a and gen-b are global vars referring to the head of a sequence.
You probably want something like:
(defn run []
  (->> (interleave (generator 59 16807) (generator 393 48271))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
Alternatively, define gen-a and gen-b as functions:
(defn gen-a
  []
  (generator 59 16807))
...
(defn run []
  (->> (interleave (gen-a) (gen-b))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
You can get Python-style generator functions in Clojure using the Tupelo library. Just use lazy-gen and yield like so:
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (t/lazy-gen
    (loop [acc seed]
      (let [next (mod (* acc factor) 2147483647)]
        (t/yield next)
        (recur next)))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))
(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with result:
"Elapsed time: 409.697922 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 3250.592798 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 32995.194574 msecs" (time (run2 1.0E7)) => 100068
Rather than using reductions, you could build a lazy sequence directly. This answer uses lazy-cons from the Tupelo library (you could also use lazy-seq from clojure.core).
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    (t/lazy-cons next (rand-gen next factor))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))
(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with results:
"Elapsed time: 90.42 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 862.60 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 8474.25 msecs" (time (run2 1.0E7)) => 100068
Note that the execution times are about 4x faster, since we have cut out the generator function stuff that we weren't really using anyway.

How to know where to put type hints to improve numeric performance in Clojure?

The Clojure documentation on type hinting has the following example of how type hints and coercions can make code run much faster:
(defn foo [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))
(time (foo 100000))
"Elapsed time: 0.391 msecs"
100000
(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))
(time (foo2 100000))
"Elapsed time: 0.084 msecs"
100000
If you run this code with (set! *warn-on-reflection* true), it doesn't show a reflection warning. Is it up to the programmer to find these spots by trial and error, or is there a tool that indicates the problematic areas?
Well, you can estimate this pretty well just by thinking about which parts of the code get hit often.
Or you could use a normal profiler of some sort. I would recommend VisualVM, which you can get to work with Clojure. Then you place type hints in the methods you see taking most of the time (the profiler will also show calls to java.lang.reflect.Method; if that gets called a lot, you should consider adding type hints).
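For numeric code specifically, Clojure 1.7+ has a compiler flag that points at problem spots directly: set *unchecked-math* to :warn-on-boxed and the compiler warns wherever it falls back to boxed arithmetic. A minimal sketch against the foo example above:
(set! *unchecked-math* :warn-on-boxed)
; n arrives as a boxed Object, so (< i n) triggers a boxed math warning
(defn foo [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))
; a ^long hint keeps the whole loop in primitive math: no warning
(defn foo-hinted [^long n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))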

Making my Clojure map function implementation faster

I've been experimenting with Clojure lately. I tried writing my own map function (two, actually) and timed them against the built-in function. However, my map functions are far slower than the built-in one. I wanted to know how I could make my implementation faster; it should give me some insights into performance tuning the Clojure algorithms I write. The first function (my-map) does recursion with recur. The second version (my-map-loop) uses loop/recur, which was much faster than simply using recur.
(defn my-map
  ([func lst] (my-map func lst []))
  ([func lst acc]
   (if (empty? lst)
     acc
     (recur func (rest lst) (conj acc (func (first lst)))))))
(defn my-map-loop
  ([func lst]
   (loop [acc []
          inner-lst lst]
     (if (empty? inner-lst)
       acc
       (recur (conj acc (func (first inner-lst))) (rest inner-lst))))))
(let [rng (range 1 10000)]
  (time (map #(* % %) rng))
  (time (my-map #(* % %) rng))
  (time (my-map-loop #(* % %) rng)))
These are the results I got -
"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 mess"
Update
After resueman pointed out that I was timing things incorrectly, I changed the functions to:
(let [rng (range 1 10000)]
  (time (doall (map #(* % %) rng)))
  (time (doall (my-map #(* % %) rng)))
  (time (doall (my-map-loop #(* % %) rng)))
  nil)
These are the new results:
"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 mess"
"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 mess"
"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 mess"
"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 mess"
Looks like my second implementation is the fastest of the bunch. Anyway, I would still like to know if there are ways to optimize it further.
There are many things that could be leveraged to get a faster map: transients (for your accumulator; see the sketch after the benchmark below), chunked seqs (for the source, though they only make sense when you want a lazy output), reducible collections (for the source again), and getting more familiar with the core functions (there is mapv).
You should also consider using Criterium instead of time, if only for the fact that it checks whether your JVM optimizations are capped (which is the default with lein).
=> (let [rng (range 1 10000)]
     (quick-bench (my-map-loop #(* % %) rng))
     (quick-bench (into [] (map #(* % %)) rng)) ; leveraging reducible collections and transients
     (quick-bench (mapv #(* % %) rng)))         ; existing core fn
(output elided to keep only the means)
Execution time mean : 776,364755 µs
Execution time mean : 409,737852 µs
Execution time mean : 456,071295 µs
It is interesting to note that mapv is no faster than (into [] (map #(* % %)) rng), which is a generic way of optimizing these kinds of computations.
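To make the transient suggestion above concrete, here is a sketch (my addition, not from the original answer) of my-map-loop accumulating into a transient vector; note that conj! is used functionally, always recurring with its return value:
(defn my-map-transient [func lst]
  (loop [acc (transient [])
         xs lst]
    (if (empty? xs)
      (persistent! acc)
      (recur (conj! acc (func (first xs))) (rest xs)))))
(my-map-transient #(* % %) [1 2 3]) ;=> [1 4 9]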

Takeuchi numbers in Clojure (performance)

When computing Takeuchi numbers, we need to figure out the number of times the function calls itself. I quickly came up with:
(def number (atom 0))

(defn tak [x y z]
  (if (<= x y)
    y
    (do
      (dosync (swap! number inc))
      (tak (tak (dec x) y z)
           (tak (dec y) z x)
           (tak (dec z) x y)))))

(defn takeuchi_number [n]
  (dosync (reset! number 0))
  (tak n 0 (inc n))
  @number)
(time (takeuchi_number 10))
; 1029803
; "Elapsed time: 11155.012266 msecs"
But the performance is really bad. How to make it blazingly fast in Clojure ?
As someone has already said, removing the dosync improves things by a factor of 10, but that isn't the whole story. Once the JVM has HotSpot-compiled your code it gets a further factor of 10 faster. This is why you should be using Criterium or similar to test real-world speed...
(def number (atom 0))

(defn tak [x y z]
  (if (<= x y)
    y
    (do
      (swap! number inc)
      (tak (tak (dec x) y z)
           (tak (dec y) z x)
           (tak (dec z) x y)))))

(defn takeuchi_number [n]
  (reset! number 0)
  (tak n 0 (inc n))
  @number)
;=> (time (takeuchi_number 10))
; "Elapsed time: 450.028 msecs"
; 1029803
;=> (time (takeuchi_number 10))
; "Elapsed time: 42.008 msecs"
; 1029803
The original with dosync was about 5s on my machine, so we're two orders of magnitude up already! Is this the best we can do? Let's refactor to pure functions and get away from the counter.
(defn tak [c x y z]
  (if (<= x y)
    [c y]
    (let [[a- x-] (tak 0 (dec x) y z)
          [b- y-] (tak 0 (dec y) z x)
          [c- z-] (tak 0 (dec z) x y)]
      (recur (+' 1 a- b- c- c) x- y- z-))))

(defn takeuchi_number [n]
  (tak 0 n 0 (inc n)))
;=> (time (takeuchi_number 10))
; "Elapsed time: 330.741 msecs"
; [1029803 11]
;=> (time (takeuchi_number 10))
; "Elapsed time: 137.829 msecs"
; [1029803 11]
;=> (time (takeuchi_number 10))
; "Elapsed time: 136.866 msecs"
; [1029803 11]
Not as good. The cost of holding the state in the vector and passing it around is likely an overhead. However, now that we've refactored to purity, let's take advantage of our good behaviour!
=> (def tak (memoize tak))
#'euler.tak/tak
=> (time (takeuchi_number 10))
"Elapsed time: 1.401 msecs"
[1029803 11]
A healthy 3000 or so times faster. Works for me.
A purely functional way of implementing this would be for your tak function to return a pair [result count], where result is the actual result of the tak computation and count is the number of times the function recursively called itself. But in this case, I think that would cause all sorts of painful contortions in the body of the function and wouldn't be worth it.
The usage of atom here, while idiomatic Clojure, imposes unnecessary overhead; it's really targeted at synchronizing independent updates to shared state between threads. Basically what you want is a mutable object you can pass around to recursive function calls in the same thread, with no synchronization required. An array should be sufficient for that purpose:
(defn tak [x y z ^longs counter]
  (if (<= x y)
    y
    (do
      (aset counter 0 (inc (aget counter 0)))
      (tak (tak (dec x) y z counter)
           (tak (dec y) z x counter)
           (tak (dec z) x y counter)
           counter))))

(defn takeuchi_number [n]
  (let [counter (long-array [0])]
    (tak n 0 (inc n) counter)
    (aget counter 0)))
Note that I've moved the counter from a global definition to a parameter on the helper function, to ensure that the mutable state is only used locally within that function.

map part of the vector efficiently in clojure

I wonder how this can be done in Clojure idiomatically and efficiently:
1) Given a vector containing n integers in it: [A0 A1 A2 A3 ... An]
2) Increase the last x items by 1 (let's say x is 100) so the vector will become: [A0 A1 A2 A3 ... (An-99 + 1) (An-98 + 1)... (An-1 + 1) (An + 1)]
One naive implementation looks like:
(defn inc-last [x nums]
  (let [n (count nums)]
    (map #(if (>= % (- n x)) (inc %2) %2)
         (range n)
         nums)))
(inc-last 2 [1 2 3 4])
;=> [1 2 4 5]
In this implementation, you basically map the entire vector to another vector, examining each item to see if it needs to be increased.
However, this is an O(n) operation while I only want to change the last x items in the vector. Ideally, this should be done in O(x) instead of O(n).
I am considering using some functions like split-at/concat to implement it like below:
(defn inc-last [x nums]
  (let [[nums1 nums2] (split-at (- (count nums) x) nums)]
    (concat nums1 (map inc nums2))))
However, I am not sure if this implementation is O(n) or O(x). I am new to Clojure and not really sure what the time complexity will be for operations like concat/split-at on persistent data structures in Clojure.
So my questions are:
1) What the time complexity here in second implementation?
2) If it is still O(n), is there any idiomatic and efficient implementation that takes only O(x) in Clojure for solving this problem?
Any comment is appreciated. Thanks.
Update:
noisesmith's answer told me that split-at will convert the vector into a list, which is a fact I had not realised previously. Since I will do random access on the result (call nth after processing the vector), I would like an efficient solution (O(x) time) that keeps the vector instead of a list; otherwise nth will slow down my program as well.
Concat and split-at both turn the input into a seq, effectively a linked-list representation, which costs O(n) time. Here is how to do it with a vector for O(x) performance.
user> (defn inc-last-n
        [n x]
        (let [count (count x)
              update (fn [x i] (update-in x [i] inc))]
          (reduce update x (range (- count n) count))))
#'user/inc-last-n
user> (inc-last-n 3 [0 1 2 3 4 5 6])
[0 1 2 3 5 6 7]
This will fail on input that is not associative (like seq / lazy-seq) because there is no O(1) access time in non-associative types.
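If your input might arrive as a seq, one simple workaround (my suggestion, not part of the original answer) is to pour it into a vector first:
user> (inc-last-n 3 (vec (range 7)))
[0 1 2 3 5 6 7]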
inc-last is an implementation using a transient, which gives you a mutable "in place" vector in constant time and lets you turn it back into a persistent vector (via persistent!), also in constant time, so the updates can be made in O(x). The original implementation used an imperative doseq loop but, as mentioned in the comments, transient operations can return a new object, so it's better to keep doing things in a functional way.
I added a doall to the call to inc-last-2, since it returns a lazy seq, while inc-last and inc-last-3 return vectors; the doall is needed to make all three comparable.
According to some quick tests I made, inc-last and inc-last-3 don't actually differ much in performance, not even for huge vectors (10000000 elements). For the inc-last-2 implementation, though, there is quite a difference even for a vector of 1000 elements: modifying only the last 10, it's ~100x slower. For smaller vectors, or when n is close to (count nums), the difference is not that large.
(Thanks to Michał Marczyk for his useful comments)
(def x (vec (range 1000)))

(defn inc-last [n x]
  (let [x (transient x)
        l (count x)]
    (->> (range (- l n) l)
         (reduce #(assoc! %1 %2 (inc (%1 %2))) x)
         persistent!)))

(defn inc-last-2 [x nums]
  (let [n (count nums)]
    (map #(if (>= % (- n x)) (inc %2) %2)
         (range n)
         nums)))

(defn inc-last-3 [n x]
  (let [l (count x)]
    (reduce #(assoc %1 %2 (inc (%1 %2))) x (range (- l n) l))))
(time
  (dotimes [i 100]
    (inc-last 50 x)))

(time
  (dotimes [i 100]
    (doall (inc-last-2 10 x))))

(time
  (dotimes [i 100]
    (inc-last-3 50 x)))
;=> "Elapsed time: 49.7965 msecs"
;=> "Elapsed time: 1751.964501 msecs"
;=> "Elapsed time: 67.651 msecs"
