Why does time need to be a macro?

The Clojure source code gives the following definition of time:
(defmacro time
  "Evaluates expr and prints the time it took. Returns the value of
  expr."
  {:added "1.0"}
  [expr]
  `(let [start# (. System (nanoTime))
         ret# ~expr]
     (prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
     ret#))
Why does this need to be a macro? What about it could not be done with a function?

Because a function evaluates its arguments before it starts working with them. You can write time as a function and try to call it:
(defn my-time [expr]
  (let [start (System/nanoTime)
        ret expr]
    (prn (str "Elapsed time: " (/ (double (- (System/nanoTime) start)) 1000000.0) " msecs"))
    ret))
(my-time (Thread/sleep 1000))
"Elapsed time: 0.005199 msecs"
=> nil
Compare that with the time macro:
(time (Thread/sleep 1000))
"Elapsed time: 1000.1222 msecs"
=> nil
The function my-time evaluated its argument (Thread/sleep 1000) before its body ran, so the symbol expr was bound to nil, and the two (System/nanoTime) calls then happened one right after the other, because the value of expr had already been computed.
The time macro, on the other hand, didn't evaluate (Thread/sleep 1000); the call expanded to:
(macroexpand `(time (Thread/sleep 1000)))
=>
(let*
  [start__6136__auto__ (. java.lang.System (clojure.core/nanoTime))
   ret__6137__auto__ (java.lang.Thread/sleep 1000)]
  (clojure.core/prn
    (clojure.core/str
      "Elapsed time: "
      (clojure.core//
        (clojure.core/double (clojure.core/- (. java.lang.System (clojure.core/nanoTime)) start__6136__auto__))
        1000000.0)
      " msecs"))
  ret__6137__auto__)
and the functions were called in this order: (System/nanoTime), (Thread/sleep 1000), and the other (System/nanoTime), which yields the correct elapsed time.
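That said, the delaying itself can be reproduced with an ordinary function if the caller wraps the expression in a thunk. The time-fn below is a hypothetical sketch, not part of clojure.core:
(defn time-fn [thunk]
  (let [start (System/nanoTime)
        ret (thunk)]              ; evaluation happens here, inside the timing window
    (prn (str "Elapsed time: " (/ (double (- (System/nanoTime) start)) 1000000.0) " msecs"))
    ret))

(time-fn #(Thread/sleep 1000))    ; prints an elapsed time of roughly 1000 msecs
The macro is more convenient because it performs that wrapping for you at compile time, so the caller can pass a bare expression.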

Related

Clojure: Reducing large lazy collection eats up memory

I'm new to Clojure. I have the following code, which creates an infinite lazy sequence of numbers:
(defn generator [seed factor]
  (drop 1 (reductions
            (fn [acc _] (mod (* acc factor) 2147483647))
            seed
            ; using dummy infinite seq to keep the reductions going
            (repeat 1))))
Each number in the sequence is dependent on the previous calculation. I'm using reductions because I need all the intermediate results.
I then instantiate two generators like so:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
I then want to compare n consecutive results of these sequences, for large n, and return the number of times they are equal.
At first I did something like:
(defn run []
  (->> (interleave gen-a gen-b)
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
It was taking far too long, and I saw the program's memory usage spike to about 4GB. With some printlns I saw that after about 10 million iterations it got really slow, so I thought maybe count needed to store the entire sequence in memory and changed it to use reduce:
(defn run-2 []
  (reduce
    (fn [acc [a b]]
      (if (= a b)
        (inc acc)
        acc))
    0
    (take 40000000 (partition 2 (interleave gen-a gen-b)))))
Still, it was allocating a lot of memory and slowing down significantly after the first couple of million iterations. I'm pretty sure it's storing the entire lazy sequence in memory, but I'm not sure why, so I tried to manually throw away the head:
(defn run-3 []
  (loop [xs (take 40000000 (partition 2 (interleave gen-a gen-b)))
         total 0]
    (cond
      (empty? xs) total
      (apply = (first xs)) (recur (rest xs) (inc total))
      :else (recur (rest xs) total))))
Again, same results. This stumped me because I'm reading that all of the functions I'm using to create my xs sequence are lazy, and since I'm only using the current item I'm expecting it to use constant memory.
Coming from a Python background I'm basically trying to emulate Python Generators. I'm probably missing something obvious, so I'd really appreciate some pointers. Thanks!
Generators are not (lazy) sequences.
You are holding on to the head here:
(def gen-a (generator 59 16807))
(def gen-b (generator 393 48271))
gen-a and gen-b are global vars referring to the head of a sequence.
You probably want something like:
(defn run []
  (->> (interleave (generator 59 16807) (generator 393 48271))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
Alternatively, define gen-a and gen-b as functions:
(defn gen-a
  []
  (generator 59 16807))
...
(defn run []
  (->> (interleave (gen-a) (gen-b))
       (partition 2)
       (take 40000000)
       (filter #(apply = %))
       (count)))
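If you do not need the sequences at all, another option (a sketch using the question's seeds and factors, not from the original answer) is to thread both generator states through loop/recur, which keeps memory constant by construction:
(defn run-primitive [n]
  (loop [a 59, b 393, i 0, total 0]
    (if (= i n)
      total
      (let [a' (mod (* a 16807) 2147483647)   ; advance generator A one step
            b' (mod (* b 48271) 2147483647)]  ; advance generator B one step
        (recur a' b' (inc i)
               (if (= a' b') (inc total) total))))))
Here (run-primitive 40000000) walks all 40 million pairs without ever building a sequence.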
You can get Python-style generator functions in Clojure using the Tupelo library. Just use lazy-gen and yield like so:
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (t/lazy-gen
    (loop [acc seed]
      (let [next (mod (* acc factor) 2147483647)]
        (t/yield next)
        (recur next)))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with result:
"Elapsed time: 409.697922 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 3250.592798 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 32995.194574 msecs" (time (run2 1.0E7)) => 100068
Rather than using reductions, you could build a lazy sequence directly. This answer uses lazy-cons from the Tupelo library (you could also use lazy-seq from clojure.core).
(ns tst.demo.core
  (:use tupelo.test)
  (:require
    [tupelo.core :as t]))

(defn rand-gen
  [seed factor]
  (let [next (mod (* seed factor) 2147483647)]
    (t/lazy-cons next (rand-gen next factor))))
(defn run2 [num-rand]
  (->> (interleave
         ; restrict to [0..99] to simulate bad rand #'s
         (map #(mod % 100) (rand-gen 59 16807))
         (map #(mod % 100) (rand-gen 393 48271)))
       (partition 2)
       (take num-rand)
       (filter #(apply = %))
       (count)))

(t/spyx (time (run2 1e5))) ; expect ~1% will overlap => 1e3
(t/spyx (time (run2 1e6))) ; expect ~1% will overlap => 1e4
(t/spyx (time (run2 1e7))) ; expect ~1% will overlap => 1e5
with results:
"Elapsed time: 90.42 msecs" (time (run2 100000.0)) => 1025
"Elapsed time: 862.60 msecs" (time (run2 1000000.0)) => 9970
"Elapsed time: 8474.25 msecs" (time (run2 1.0E7)) => 100068
Note that the execution times are about 4x faster, since we have cut out the generator function stuff that we weren't really using anyway.
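For reference, the same generator written with clojure.core's lazy-seq might look like this (a sketch, not from the original answer):
(defn rand-gen
  [seed factor]
  (lazy-seq
    (let [next (mod (* seed factor) 2147483647)]
      (cons next (rand-gen next factor)))))
As before, this stays constant-memory only as long as no long-lived var holds on to the head of the sequence.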

How to know where to put type hints to improve numeric performance in Clojure?

The Clojure documentation on type hinting gives the following example of how type hints and coercions can make code run much faster:
(defn foo [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))

(time (foo 100000))
"Elapsed time: 0.391 msecs"
100000
(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))

(time (foo2 100000))
"Elapsed time: 0.084 msecs"
100000
If you run this code with (set! *warn-on-reflection* true), it doesn't show a reflection warning. Is it up to programmer trial-and-error to see where these kinds of adornments make a performance difference? Or is there a tool that indicates the problematic areas?
Well, you can estimate this pretty well just by thinking about which parts of the code get hit often.
Or you could use an ordinary profiler. I would recommend VisualVM, which you can get to work with Clojure. Then you place the type hints in the functions that take most of the time (the profiler will also show calls to java.lang.reflect.Method; if these get called a lot, you should consider using type hints).
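One more tool worth mentioning (an addition to the answer above): since Clojure 1.7 the compiler can warn about boxed arithmetic, which catches exactly the cases in this example that *warn-on-reflection* misses. A minimal sketch:
(set! *unchecked-math* :warn-on-boxed)  ; complements *warn-on-reflection*

(defn foo [n]
  (loop [i 0]
    (if (< i n)        ; compiler warns here: n is a boxed Object, so < uses boxed math
      (recur (inc i))
      i)))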

Making my Clojure map function implementation faster

I've been experimenting with Clojure lately. I tried writing my own map function (two actually) and timed them against the built in function. However, my map functions are way way slower than the built in one. I wanted to know how I could make my implementation faster. It should give me some insights into performance tuning Clojure algorithms I write. The first function (my-map) does recursion with recur. The second version (my-map-loop) uses loop/recur which was much faster than simply using recur.
(defn my-map
  ([func lst] (my-map func lst []))
  ([func lst acc]
   (if (empty? lst)
     acc
     (recur func (rest lst) (conj acc (func (first lst)))))))

(defn my-map-loop
  ([func lst]
   (loop [acc []
          inner-lst lst]
     (if (empty? inner-lst)
       acc
       (recur (conj acc (func (first inner-lst))) (rest inner-lst))))))
(let [rng (range 1 10000)]
  (time (map #(* % %) rng))
  (time (my-map #(* % %) rng))
  (time (my-map-loop #(* % %) rng)))
These are the results I got:
"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 msecs"
Update
After resueman pointed out that I was timing things incorrectly, I changed the functions to:
(let [rng (range 1 10000)]
  (time (doall (map #(* % %) rng)))
  (time (doall (my-map #(* % %) rng)))
  (time (doall (my-map-loop #(* % %) rng)))
  nil)
These are the new results:
"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 mess"
"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 mess"
"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 mess"
"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 mess"
Looks like my second implementation is the fastest of the bunch. Anyways, I would still like to know if there are ways to optimize it further.
There are many things that could be leveraged to make a faster map: transients (for your accumulator), chunked seqs (for the source, though they only make sense when you want lazy output), reducible collections (for the source again), and getting more familiar with the core functions (there is a mapv).
You should also consider using Criterium instead of time if only for the fact that it checks whether your JVM optimizations are capped (which is the default with lein).
=> (let [rng (range 1 10000)]
     (quick-bench (my-map-loop #(* % %) rng))
     (quick-bench (into [] (map #(* % %)) rng)) ; leveraging reducible collections and transients
     (quick-bench (mapv #(* % %) rng)))         ; existing core fn
(output elided to keep only the means)
Execution time mean : 776,364755 µs
Execution time mean : 409,737852 µs
Execution time mean : 456,071295 µs
It is interesting to note that mapv is no faster than (into [] (map #(* % %)) rng), which is a generic way of optimizing these kinds of computations.
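To make the transient suggestion concrete, here is my-map-loop rewritten with a transient accumulator (a sketch, not from the original answer):
(defn my-map-transient [func lst]
  (loop [acc (transient [])   ; mutable-under-the-hood vector, cheap conj!
         xs lst]
    (if (empty? xs)
      (persistent! acc)       ; freeze back into an ordinary vector before returning
      (recur (conj! acc (func (first xs))) (rest xs)))))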

Difference in Clojure's record methods

Okay, the title is not exactly what I was looking for, but it will have to do. I found an interesting thing in the speed of a record's member function access. I'll illustrate with this REPL session:
==> (defprotocol Add (add [_]))
Add
==> (defrecord R [x y] Add (add [_] (+ x y)))
user.R
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (add r)))) ; Pure functional style
"Elapsed time: 19.613694 msecs"
nil
==> (let [r (->R 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Functional creation, but with method call
"Elapsed time: 477.29611 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (.add r)))) ; Java-style
"Elapsed time: 10.051506 msecs"
nil
==> (let [r (R. 1 2)] (time (dotimes [_ 100000] (add r)))) ; Java-style creation with functional call
"Elapsed time: 18.726801 msecs"
nil
I can't really see the reason for these differences, so I'm asking you.
The problem with your second call is that the Clojure compiler is unable to determine the type of the r variable at compile time, so it is forced to use reflection.
To avoid this, you should add a type hint:
(let [^user.R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
or simply
(let [^R r (->R 1 2)] (time (dotimes [_ 100000] (.add r))))
and it'll be just as fast as Java-style method call.
If you want to easily diagnose such problems in your code, set the *warn-on-reflection* flag to true:
(set! *warn-on-reflection* true)
or add it to the :global-vars section of your project.clj file:
:global-vars {*warn-on-reflection* true}
So, as you can see, without reflection, method calls are a little bit faster than protocol function calls. But reflection can make method calls really slow.
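As a minor variation (an assumption on my part, not from the original answer), the hint can also be placed directly on the symbol at the call site instead of on the binding:
(let [r (->R 1 2)]
  (time (dotimes [_ 100000] (.add ^R r))))  ; hint at the call site; no reflection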

Clojure performance, large looping over large vectors

I am performing element-wise operations on two vectors on the order of 50,000 elements in size, and running into unsatisfactory performance (a few seconds). Are there any obvious improvements to be made, such as using a different data structure?
(defn boolean-compare
  "Sum up 1s if matching 0 otherwise"
  [proposal-img data-img]
  (sum
    (map
      #(Math/abs (- (first %) (second %)))
      (partition 2 (interleave proposal-img data-img)))))
Try this:
(apply + (map bit-xor proposal-img data-img))
Some notes:
mapping a function over several collections uses one element from each as the arguments to the function; there is no need to interleave and partition for this (see the quick illustration after these notes).
If your data is 1s and 0s, then xor will be faster than absolute difference.
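A quick illustration of the first note, with hypothetical values:
(map bit-xor [1 0 1 1] [1 1 0 1])  ; one element from each collection per call
;=> (0 1 1 0)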
Timed example:
(def data-img (repeatedly 50000 #(rand-int 2)))
(def proposal-img (repeatedly 50000 #(rand-int 2)))
(def sum (partial apply +))
After warming up the JVM...
(time (boolean-compare proposal-img data-img))
;=> "Elapsed time: 528.731093 msecs"
;=> 24802
(time (apply + (map bit-xor proposal-img data-img)))
;=> "Elapsed time: 22.481255 msecs"
;=> 24802
You should look at adopting core.matrix if you are interested in good performance for large vector operations.
In particular, the vectorz-clj library (a core.matrix implementation) has some very fast implementations for most common vector operations with double values.
;; assumes clojure.core.matrix is referred and the :vectorz implementation is active
(def v1 (array (repeatedly 50000 #(rand-int 2))))
(def v2 (array (repeatedly 50000 #(rand-int 2))))

(time (let [d (sub v2 v1)] ;; take the difference of the two vectors
        (.abs d)           ;; calculate absolute values (mutates d)
        (esum d)))         ;; sum the elements and return the result
=> "Elapsed time: 0.949985 msecs"
=> 24980.0
i.e. under 20ns per pair of elements - that's pretty quick: you'd be hard pressed to beat that without resorting to low-level array-fiddling code.
