Clojure: Reducing large lazy collection eats up memory - performance

I'm new to Clojure. I have the following code, which creates an infinite lazy sequence of numbers:
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            ; using dummy infinite seq to keep the reductions going
            (repeat 1))))
Each number in the sequence is dependent on the previous calculation. I'm using reductions because I need all the intermediate results.
I then instantiate two generators like so:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
I then want to compare n consecutive results of these sequences, for large n, and return the number of times they are equal.
At first I did something like:
(defn run []
  (->> (interleave gen-a gen-b)
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
It was taking far too long and I saw the program's memory usage spike to about 4GB. With some printlns I saw that after about 10 million iterations it got really slow, so I was thinking that maybe count needed to store the entire sequence in memory, so I changed it to use reduce:
(defn run-2 []
  (reduce
    (fn [acc [a b]]
      (if (= a b)
        (inc acc)
        acc))
    0
    (take 40000000 (partition 2 (interleave gen-a gen-b)))))
Still, it was allocating a lot of memory and slowing down significantly after the first couple million elements. I'm pretty sure it's retaining the entire lazy sequence in memory, but I'm not sure why, so I tried to manually throw away the head:
(defn run-3 []
  (loop [xs (take 40000000 (partition 2 (interleave gen-a gen-b)))
         total 0]
    (cond
      (empty? xs) total
      (apply = (first xs)) (recur (rest xs) (inc total))
      :else (recur (rest xs) total))))
Again, same results. This stumped me, because everything I've read says that all of the functions I'm using to create my xs sequence are lazy, and since I'm only using the current item I expected it to run in constant memory.
Coming from a Python background I'm basically trying to emulate Python Generators. I'm probably missing something obvious, so I'd really appreciate some pointers. Thanks!

Generators are not (lazy) sequences.
You are holding on to the head here:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
gen-a and gen-b are global vars referring to the head of a sequence.
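You can see the effect in isolation (a minimal illustration reusing the question's own generator; the element count is arbitrary):
; head held: the var gen-a keeps every realized element reachable,
; so memory grows as the sequence is walked
(def gen-a (generator 59 16807))
(nth gen-a 40000000)

; no head held: elements are garbage-collected as the seq is consumed,
; so this runs in constant memory
(nth (generator 59 16807) 40000000)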
You probably want something like:
(defn run []
  (->> (interleave (generator 59 16807) (generator 393 48271))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
Alternatively, define gen-a and gen-b as functions:
(defn gen-a
  []
  (generator 59 16807))
...
(defn run []
  (->> (interleave (gen-a) (gen-b))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
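As an aside, map can walk several collections in lockstep, passing one element from each to the function, so you can drop the interleave/partition pairing entirely. A minimal sketch:
(defn run []
  (->> (map = (generator 59 16807) (generator 393 48271))
       (take 40000000)
       (filter true?)
       (count)))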

You can get Python-style generator functions in Clojure using the Tupelo library. Just use lazy-gen and yield like so:
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (t/lazy-gen
    (loop [acc seed]
      (let [next (mod (* acc factor) 2147483647)]
        (t/yield next)
        (recur next)))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with result:
"Elapsed time: 409.697922 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 3250.592798 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 32995.194574 msecs" (time (run2 1.0E7)) => 100068

Rather than using reductions, you could build a lazy sequence directly. This answer uses lazy-cons from the Tupelo library (you could also use lazy-seq from clojure.core).
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    (t/lazy-cons next (rand-gen next factor))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with results:
"Elapsed time: 90.42 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 862.60 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 8474.25 msecs" (time (run2 1.0E7)) => 100068
Note that the execution times are about 4x faster, since we have cut out the generator function stuff that we weren't really using anyway.
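For reference, here is the plain clojure.core version this answer alludes to, a minimal sketch using lazy-seq (or, equivalently, iterate):
(defn rand-gen
  [seed factor]
  (lazy-seq
    (let [next (mod (* seed factor) 2147483647)]
      (cons next (rand-gen next factor)))))

; equivalent one-liner using iterate (rest drops the seed itself):
; (defn rand-gen [seed factor]
;   (rest (iterate #(mod (* % factor) 2147483647) seed)))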

Related

Making my Clojure map function implementation faster

I've been experimenting with Clojure lately. I tried writing my own map function (two, actually) and timed them against the built-in function. However, my map functions are way, way slower than the built-in one. I wanted to know how I could make my implementation faster; it should give me some insights into performance tuning the Clojure algorithms I write. The first function (my-map) does recursion with recur. The second version (my-map-loop) uses loop/recur, which was much faster than simply using recur.
(defn my-map
  ([func lst] (my-map func lst []))
  ([func lst acc]
   (if (empty? lst)
     acc
     (recur func (rest lst) (conj acc (func (first lst)))))))

(defn my-map-loop
  [func lst]
  (loop [acc []
         inner-lst lst]
    (if (empty? inner-lst)
      acc
      (recur (conj acc (func (first inner-lst))) (rest inner-lst)))))
(let [rng (range 1 10000)]
  (time (map #(* % %) rng))
  (time (my-map #(* % %) rng))
  (time (my-map-loop #(* % %) rng)))
These are the results I got -
"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 mess"
Update
After resueman pointed out that I was timing things incorrectly, I changed the functions to:
(let [rng (range 1 10000)]
  (time (doall (map #(* % %) rng)))
  (time (doall (my-map #(* % %) rng)))
  (time (doall (my-map-loop #(* % %) rng)))
  nil)
These are the new results:
"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 msecs"
"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 msecs"
"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 msecs"
"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 msecs"
Looks like my second implementation is the fastest of the bunch. Anyway, I would still like to know if there are ways to optimize it further.
There are many things that could be leveraged to get a faster map: transients (for your accumulator), chunked seqs (for the source, though they only make sense when you want a lazy output), reducible collections (for the source again), and getting more familiar with the core functions (there's a mapv).
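For instance, a transient-backed accumulator drops into the loop version with only a couple of changes (a sketch; the name my-map-transient is invented here for illustration):
(defn my-map-transient
  [func lst]
  (loop [acc (transient [])
         xs lst]
    (if (empty? xs)
      (persistent! acc) ; freeze back into a persistent vector
      (recur (conj! acc (func (first xs))) (rest xs)))))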
You should also consider using Criterium instead of time if only for the fact that it checks whether your JVM optimizations are capped (which is the default with lein).
; assumes (require '[criterium.core :refer [quick-bench]])
=> (let [rng (range 1 10000)]
     (quick-bench (my-map-loop #(* % %) rng))
     (quick-bench (into [] (map #(* % %)) rng)) ; leveraging reducible collections and transients
     (quick-bench (mapv #(* % %) rng)))         ; existing core fn
(output elided to keep only the means)
Execution time mean : 776,364755 µs
Execution time mean : 409,737852 µs
Execution time mean : 456,071295 µs
It is interesting to note that mapv is no faster than (into [] (map #(* % %)) rng), which is a generic way of optimizing these kinds of computations.

Takeuchi numbers in Clojure (performance)

When computing Takeuchi numbers, we need to figure out the number of times the function calls itself. I quickly came up with:
(def number (atom 0))

(defn tak [x y z]
  (if (<= x y)
    y
    (do
      (dosync (swap! number inc))
      (tak (tak (dec x) y z)
           (tak (dec y) z x)
           (tak (dec z) x y)))))

(defn takeuchi_number [n]
  (dosync (reset! number 0))
  (tak n 0 (inc n))
  @number)
(time (takeuchi_number 10))
; 1029803
; "Elapsed time: 11155.012266 msecs"
But the performance is really bad. How can I make it blazingly fast in Clojure?
As someone says, removing the dosync seems to improve things by a factor of 10, but that isn't the whole story. Once the JVM has hotspotted your code it gets a further factor of 10 faster. This is why you should be using criterium or similar to test real-world speed...
(def number (atom 0))

(defn tak [x y z]
  (if (<= x y)
    y
    (do
      (swap! number inc)
      (tak (tak (dec x) y z)
           (tak (dec y) z x)
           (tak (dec z) x y)))))

(defn takeuchi_number [n]
  (reset! number 0)
  (tak n 0 (inc n))
  @number)
;=> (time (takeuchi_number 10))
; "Elapsed time: 450.028 msecs"
; 1029803
;=> (time (takeuchi_number 10))
; "Elapsed time: 42.008 msecs"
; 1029803
The original with dosync was about 5s on my machine, so we're two orders of magnitude up already! Is this the best we can do? Let's refactor to pure functions and get away from the counter.
(defn tak [c x y z]
  (if (<= x y)
    [c y]
    (let [[a- x-] (tak 0 (dec x) y z)
          [b- y-] (tak 0 (dec y) z x)
          [c- z-] (tak 0 (dec z) x y)]
      (recur (+' 1 a- b- c- c) x- y- z-))))

(defn takeuchi_number [n]
  (tak 0 n 0 (inc n)))
;=> (time (takeuchi_number 10))
; "Elapsed time: 330.741 msecs"
; [1029803 11]
;=> (time (takeuchi_number 10))
; "Elapsed time: 137.829 msecs"
; [1029803 11]
;=> (time (takeuchi_number 10))
; "Elapsed time: 136.866 msecs"
; [1029803 11]
Not as good. The cost of holding the state in the vector and passing it around is likely an overhead. However, now that we've refactored to purity, let's take advantage of our good behaviour!
=> (def tak (memoize tak))
#'euler.tak/tak
=> (time (takeuchi_number 10))
"Elapsed time: 1.401 msecs"
[1029803 11]
A healthy 3000 or so times faster. Works for me.
A purely functional way of implementing this would be for your tak function to return a pair [result count], where result is the actual result of the tak computation and count is the number of times the function recursively called itself. But in this case, I think that would cause all sorts of painful contortions in the body of the function and wouldn't be worth it.
The usage of atom here, while idiomatic Clojure, imposes unnecessary overhead; it's really targeted at synchronizing independent updates to shared state between threads. Basically what you want is a mutable object you can pass around to recursive function calls in the same thread, with no synchronization required. An array should be sufficient for that purpose:
(defn tak [x y z ^longs counter]
  (if (<= x y)
    y
    (do
      (aset counter 0 (inc (aget counter 0)))
      (tak (tak (dec x) y z counter)
           (tak (dec y) z x counter)
           (tak (dec z) x y counter)
           counter))))

(defn takeuchi_number [n]
  (let [counter (long-array [0])]
    (tak n 0 (inc n) counter)
    (aget counter 0)))
Note that I've moved the counter definition from being a global constant to being a parameter on the helper function, to ensure that the mutable state is only used locally within that function.
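A middle ground between an atom and a raw array is volatile! (Clojure 1.7+): a mutable box whose vswap! does a plain volatile write with no compare-and-swap, so it is cheaper than an atom while still being fine to pass around within a single thread. A sketch of the same counter using it:
(defn tak [x y z counter]
  (if (<= x y)
    y
    (do
      (vswap! counter inc) ; plain volatile write, no compare-and-swap loop
      (tak (tak (dec x) y z counter)
           (tak (dec y) z x counter)
           (tak (dec z) x y counter)
           counter))))

(defn takeuchi_number [n]
  (let [counter (volatile! 0)]
    (tak n 0 (inc n) counter)
    @counter))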

Difference in Clojure's record methods

Okay, the title is not exactly what I was looking for, but it will have to do. I found an interesting thing in the speed of a record's method access. I'll illustrate with this REPL session:
==> (defprotocol Add (add [_]))
Add
==> (defrecord R [x y] Add (add [_] (+ x y)))
user.R
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (add r)))) ; Pure functional style
"Elapsed time: 19.613694 msecs"
nil
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Functional creation, but with method call
"Elapsed time: 477.29611 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Java-style
"Elapsed time: 10.051506 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (add r)))) ; Java-style creation with functional call
"Elapsed time: 18.726801 msecs"
nil
I can't really see the reason for these differences, so I'm hoping you can explain them.
The problem with your second call is that the Clojure compiler is unable to determine the type of the r variable at compilation time, so it is forced to use reflection.
To avoid it, you should add a type hint:
(let [^user.R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
or simply
(let [^R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
and it'll be just as fast as a Java-style method call.
If you want to easily diagnose such problems in your code, set the *warn-on-reflection* flag to true:
(set! *warn-on-reflection* true)
or add it to :global-vars section in your project.clj file:
:global-vars {*warn-on-reflection* true}
So, as you can see, without reflection, method calls are a little bit faster than function calls. But reflection can make method calls really slow.

Clojure performance, large looping over large vectors

I am performing element-wise operations on two vectors on the order of 50,000 elements in size, and I am getting unsatisfactory performance (a few seconds). Are there any obvious performance improvements to be made, such as using a different data structure?
(defn boolean-compare
  "Sum up 1s if matching, 0 otherwise"
  [proposal-img data-img]
  (sum
    (map
      #(Math/abs (- (first %) (second %)))
      (partition 2 (interleave proposal-img data-img)))))
Try this:
(apply + (map bit-xor proposal-img data-img))
Some notes:
mapping a function over several collections uses an element from each as the arguments to the function - no need to interleave and partition for this.
If your data is 1s and 0s, then xor will be faster than absolute difference.
Timed example:
(def data-img (repeatedly 50000 #(rand-int 2)))
(def proposal-img (repeatedly 50000 #(rand-int 2)))
(def sum (partial apply +))
After warming up the JVM...
(time (boolean-compare proposal-img data-img))
;=> "Elapsed time: 528.731093 msecs"
;=> 24802
(time (apply + (map bit-xor proposal-img data-img)))
;=> "Elapsed time: 22.481255 msecs"
;=> 24802
You should look at adopting core.matrix if you are interested in good performance for large vector operations.
In particular, the vectorz-clj library (a core.matrix implementation) has some very fast implementations for most common vector operations with double values.
; assumes (require '[clojure.core.matrix :refer [array sub esum set-current-implementation]])
; and (set-current-implementation :vectorz)
(def v1 (array (repeatedly 50000 #(rand-int 2))))
(def v2 (array (repeatedly 50000 #(rand-int 2))))

(time (let [d (sub v2 v1)] ;; take difference of two vectors
        (.abs d)           ;; calculate absolute value (mutates d)
        (esum d)))         ;; sum elements and return result
=> "Elapsed time: 0.949985 msecs"
=> 24980.0
i.e. under 20ns per pair of elements - that's pretty quick: you'd be hard pressed to beat that without resorting to low-level array-fiddling code.
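If you did want to try the low-level route, it might look something like this (a sketch; assumes the inputs have first been copied into primitive long arrays):
(defn xor-sum ^long [^longs a ^longs b]
  ; areduce loops over the indices of a with a primitive accumulator
  (areduce a i acc 0 (+ acc (bit-xor (aget a i) (aget b i)))))

; usage:
; (xor-sum (long-array proposal-img) (long-array data-img))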

Performance Problem with Clojure Array

This piece of code is very slow. Execution from the slime-repl on my netbook takes a couple minutes.
(def test-array (make-array Integer/TYPE 400 400 3))
(doseq [x (range 400), y (range 400), z (range 3)]
(aset test-array x y z 0))
Conversely, this code runs really fast:
(def max-one (apply max (map (fn [w] (apply max (map #(first %) w))) test-array)))
(def max-two (apply max (map (fn [w] (apply max (map #(second %) w))) test-array)))
(def max-three (apply max (map (fn [w] (apply max (map #(last %) w))) test-array)))
Does this have something to do with chunked sequences? Is my first example just written wrong?
You're hitting Java reflection. This blog post has a workaround:
http://clj-me.cgrand.net/2009/10/15/multidim-arrays/
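The gist of the workaround is to pull each sub-array out with a type hint, so that aget and aset compile to direct array instructions instead of reflective calls. A minimal sketch of the idea (not necessarily identical to the blog post's code):
(let [^objects arr (make-array Integer/TYPE 400 400 3)]
  (dotimes [x 400]
    (let [^objects plane (aget arr x)]
      (dotimes [y 400]
        (let [^ints row (aget plane y)]
          (dotimes [z 3]
            (aset row z 0)))))))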
You might get better performance from one of the four Clojure matrix implementations available via the single core.matrix interface (available on Clojars and GitHub).
