Elapsed time in lazy sequence evaluation

Given this code:
(reduce my-fun my-lazy-seq)
To measure the elapsed time of the entire operation:
(time (reduce my-fun my-lazy-seq)) ;;Elapsed time: 1000.1234 msecs
How do I measure the elapsed time of this loop at various stages before completion? For example:
Elapsed time to process next 1000 samples in my-lazy-seq: 100.1234 msecs
Elapsed time to process next 1000 samples in my-lazy-seq: 99.1234 msecs
Elapsed time to process next 1000 samples in my-lazy-seq: 101.1234 msecs
...

(doseq [thousand (partition 1000 my-lazy-seq)]
  (time (reduce my-fun thousand)))
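Note that this snippet times a separate reduce over each block of 1000 elements rather than threading one accumulator through the whole sequence, so it does not compute the same result as the original (reduce my-fun my-lazy-seq). If you want the single overall reduction while still seeing per-block timings, a minimal sketch (using partition-all so the final partial block is included) could look like:

(reduce (fn [acc thousand]
          (time (reduce my-fun acc thousand)))
        (first my-lazy-seq)
        (partition-all 1000 (rest my-lazy-seq)))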

How about this:
(defn seq-counter [n coll]
  (let [t0 (System/currentTimeMillis)
        f  (fn [i x]
             (let [i (inc i)]
               (when (zero? (rem i n))
                 (println i "items processed in" (- (System/currentTimeMillis) t0) "ms."))
               x))]
    (map-indexed f coll)))
map-indexed is used to track the progress. The function above prints the running count and the elapsed time every n elements while passing each value through unchanged.
user=> (reduce + (seq-counter 10 (range 100)))
10 items processed in 0 ms.
20 items processed in 0 ms.
...
100 items processed in 1 ms.
4950
See also: Idiomatic clojure for progress reporting?
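Note that the times printed by seq-counter are cumulative since t0. If you want the elapsed time for each block of n elements, matching the output format asked for in the question, a small variant (a sketch that keeps the previous timestamp in an atom) could look like this:

(defn seq-timer [n coll]
  (let [t-last (atom (System/currentTimeMillis))
        f (fn [i x]
            (when (zero? (rem (inc i) n))
              (let [now (System/currentTimeMillis)]
                (println "Elapsed time to process next" n "samples:" (- now @t-last) "msecs")
                (reset! t-last now)))
            x)]
    (map-indexed f coll)))

It is used the same way, e.g. (reduce + (seq-timer 1000 my-lazy-seq)).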


Is Time complexity O(n) or O(n^2)?

I feel that the time complexity of this JS function I wrote is O(n), but at the same time it feels like it's O(n^2). What's the correct time complexity? The function is supposed to find the last index of the first duplicate value found. For example, in the first case below, 1 is found at index 0 and again at index 6, so the result is 6, because that's the last index of the first duplicated value in the array. If no duplicates are found, we return -1.
// [1, 2, 4, 5, 2, 3, 1] --> output: 6
// [1, 1, 3, 2, 4] --> output: 1
// [1, 2, 3, 4, 5, 6] --> output: -1 (not found)
// The array can be in any order; we just need the last index of the first duplicate value that's there.
const findFirstDuplicateIndex = (arr) => {
  let ptr1 = 0
  let ptr2 = 1
  while (ptr1 < arr.length - 1) { // O(n)
    if (arr[ptr1] === arr[ptr2]) {
      return ptr2 + 1
    } else {
      ptr2++
    }
    if (ptr2 === arr.length - 1) { ptr1++; ptr2 = ptr1 + 1 }
  }
  return -1
}
The time complexity of your code is O(n^2).
Your code is another version of two nested loops.
if (ptr2 === arr.length - 1) {ptr1++; ptr2 = ptr1 + 1}
This line effectively adds a nested loop, i.e.
for (ptr2 = ptr1 + 1; ptr2 < arr.length; ++ptr2)
If we rewrite your code with two nested for loops, we have
const findFirstDuplicateIndex = (arr) => {
  for (let ptr1 = 0; ptr1 < arr.length - 1; ++ptr1) {
    for (let ptr2 = ptr1 + 1; ptr2 < arr.length; ++ptr2) {
      if (arr[ptr1] === arr[ptr2]) {
        return ptr2
      }
    }
  }
  return -1;
}
Now,
For the 1st iteration, the inner loop costs N-1 steps.
For the 2nd iteration, the inner loop costs N-2 steps.
For the 3rd iteration, the inner loop costs N-3 steps.
...
For the (N-1)th iteration, the inner loop costs 1 step.
So the total time complexity is the sum of these costs,
(N-1) + (N-2) + (N-3) + ... + 1
which is an arithmetic series; by the arithmetic-sum formula,
(N-1) + (N-2) + (N-3) + ... + 1 = N*(N-1)/2 = O(N^2)
Hence, the time complexity of your code is O(n^2).
No matter how you write it down, your code runs ptr1 over 0..n-2 and, for each ptr1, runs ptr2 over ptr1+1..n-1, which yields O(N^2) worst-case time complexity.
But since this is not as easy to spot for more complicated algorithms, one good way to assess the complexity is to run an instrumented version of the algorithm in which you simply count the number of steps, using pathologically bad worst-case data as input.
In your case, the worst case is when the duplicate occupies the last two positions of the array (e.g. [5 4 3 2 1 1]).
So, in step 1, write yourself some test data generator:
(defun gen-test-data (n)
  (make-array (+ n 1)
              :initial-contents
              (append
               (loop for x from n downto 1
                     collecting x)
               '(1))))
It produces the pathological pattern in an array of length (n+1).
(gen-test-data 20)
#(20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1)
Next, write your instrumented algorithm, which returns a little extra information:
(defun first-duplicate (data)
  (let* ((arr (etypecase data
                ((simple-vector *) data)
                (cons (make-array (length data)
                                  :initial-contents data))))
         (n (array-dimension arr 0)))
    (loop
      with counter = 0
      for i0 below (- n 1)
      do (loop
           for i1 from (+ i0 1) below n
           do (when (= (aref arr i0) (aref arr i1))
                (return-from first-duplicate
                  (list :value (aref arr i0)
                        :i0 i0
                        :i1 i1
                        :counter counter
                        :n n)))
           do (incf counter)))))
I picked the nested-loop notation here because that is what your code does anyway.
The last step is to run the function for various n, so you can see how n relates to counter:
(loop for n from 2 to 100 by 10
      collecting (first-duplicate (gen-test-data n)))
((:VALUE 1 :I0 1 :I1 2 :COUNTER 2 :N 3)
(:VALUE 1 :I0 11 :I1 12 :COUNTER 77 :N 13)
(:VALUE 1 :I0 21 :I1 22 :COUNTER 252 :N 23)
(:VALUE 1 :I0 31 :I1 32 :COUNTER 527 :N 33)
(:VALUE 1 :I0 41 :I1 42 :COUNTER 902 :N 43)
(:VALUE 1 :I0 51 :I1 52 :COUNTER 1377 :N 53)
(:VALUE 1 :I0 61 :I1 62 :COUNTER 1952 :N 63)
(:VALUE 1 :I0 71 :I1 72 :COUNTER 2627 :N 73)
(:VALUE 1 :I0 81 :I1 82 :COUNTER 3402 :N 83)
(:VALUE 1 :I0 91 :I1 92 :COUNTER 4277 :N 93))
The output now clearly shows that the algorithm cannot be O(N): for this worst-case input, counter is exactly n*(n-1)/2 - 1 (e.g. for n = 93, 93*92/2 - 1 = 4277), which grows quadratically, so the running time is O(N^2).
When counting time complexity by hand, assign a cost of 1 to each loop's initialization and to each condition check, multiply by the number of times the loop runs, add 1 for the return statement, and then add all of these costs together.

Generate a random number, excluding a single number

I'm writing a Monty Hall simulator, and found the need to generate a number within a range, excluding a single number.
This seemed easy, so I naively wrote up:
(The g/... functions are part of my personal library. Their use should be fairly clear):
(defn random-int-excluding
  "Generates a random number between min-n and max-n; excluding excluding-n.
  min-n is inclusive, while max-n is exclusive."
  [min-n max-n excluding-n rand-gen]
  (let [rand-n  (g/random-int min-n max-n rand-gen)
        rand-n' (if (= rand-n excluding-n) (inc rand-n) rand-n)]
    (g/wrap rand-n' min-n (inc max-n))))
This generates a random number within the range and, if it equals the excluded number, adds one, wrapping if necessary. Of course, this gives the number after the excluded number twice the chance of being picked, since it is produced both when it is chosen directly and when the excluded number is chosen. Sample output frequencies for a range of 0 to 10 (max exclusive), excluding 2:
([0 0.099882]
[1 0.100355]
[3 0.200025]
[4 0.099912]
[5 0.099672]
[6 0.099976]
[7 0.099539]
[8 0.100222]
[9 0.100417])
Then I read this answer, which seemed much simpler, and based on it, wrote up:
(defn random-int-excluding
  "Generates a random number between min-n and max-n; excluding excluding-n.
  min-n is inclusive, while max-n is exclusive."
  [min-n max-n excluding-n rand-gen]
  (let [r1 (g/random-int min-n excluding-n rand-gen)
        r2 (g/random-int (inc excluding-n) max-n rand-gen)]
    (if (g/random-boolean rand-gen) r1 r2)))
Basically, it splits the range into two smaller ranges: from the min to the excluded number, and from the excluded number + 1 to the max. It generates a random number in each range, then randomly chooses one of the two. Unfortunately, as I noted under that answer, this gives skewed results unless both partitions are of equal size. Sample output frequencies, same conditions as above:
([0 0.2499497]
[1 0.2500795]
[3 0.0715849]
[4 0.071297]
[5 0.0714366]
[6 0.0714362]
[7 0.0712715]
[8 0.0715285]
[9 0.0714161])
Note that the numbers in the smaller partition before the excluded number are much more likely. To fix this, I'd have to skew the choice toward the larger partition, and really, I'm not proficient enough in maths to work out how to do that.
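(For reference, one way to do that weighting is to pick the lower sub-range with probability proportional to its size and then draw uniformly within whichever sub-range was chosen. A sketch, reusing the hypothetical g/random-int helper and its min-inclusive/max-exclusive convention from above:)

(defn random-int-excluding-weighted
  "Uniformly picks an int in [min-n, max-n) other than excluding-n by
  weighting the two sub-ranges by their sizes."
  [min-n max-n excluding-n rand-gen]
  (let [low-size  (- excluding-n min-n)        ; size of [min-n, excluding-n)
        high-size (- max-n (inc excluding-n))  ; size of [excluding-n + 1, max-n)
        total     (+ low-size high-size)]      ; number of allowed values
    (if (< (g/random-int 0 total rand-gen) low-size)
      (g/random-int min-n excluding-n rand-gen)
      (g/random-int (inc excluding-n) max-n rand-gen))))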
I looked at the accepted answer from the linked question, but to me it seems like a version of my first attempt that accepts more than one number to exclude. Contrary to what the answerer claimed, I'd expect the numbers at the end of the exclusion range to be favored, since any number chosen within the excluded range is simply advanced past it.
Since this will be one of the most-called functions in the simulation, I'd really like to avoid the "brute-force" method of looping until the generated number is not excluded: the range will only have 3 numbers, so there's a 1/3 chance of having to retry on each attempt.
Does anyone know of a simple algorithm to choose a random number from a contiguous range while excluding a single number?
To generate a number in the range [a, b] excluding c, simply generate a number in the range [a, b-1], and if the result is c then output b instead.
Just generate a lazy sequence and filter out items you don't want:
(let [ignore #{4 2}]
  (frequencies
    (take 2000
          (remove ignore (repeatedly #(rand-int 5))))))
An advantage over the other approach of remapping excluded values to new ones: this also works with other discrete random-number distributions, not just a uniform rand-int.
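For instance, here is a small sketch of that point using a hypothetical die weighted toward high numbers; the same remove/frequencies pattern works unchanged:

(let [ignore #{4 2}
      weighted-roll #(rand-nth [1 2 3 4 5 5 6 6 6])]
  (frequencies
    (take 2000
          (remove ignore (repeatedly weighted-roll)))))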
If the size of the collection of acceptable answers is small, just put all values into a vector and use rand-nth:
http://clojuredocs.org/clojure.core/rand-nth
(def primes [ 2 3 5 7 11 13 17 19] )
(println (rand-nth primes))
(println (rand-nth primes))
(println (rand-nth primes))
~/clj > lein run
19
13
11
Update
If some values should occur more often than others, just put them into the vector of values more than once. The number of occurrences of each value determines its relative weight:
(def samples [ 1 2 2 3 3 3 4 4 4 4 ] )
(def weighted-samples
  (repeatedly #(rand-nth samples)))
(println (take 22 weighted-samples))
;=> (3 4 2 4 3 2 2 1 4 4 3 3 3 2 3 4 4 4 2 4 4 4)
If we wanted any number from 1 to 5, but never 3, just do this:
(def samples [ 1 2 4 5 ] )
(def weighted-samples
  (repeatedly #(rand-nth samples)))
(println (take 22 weighted-samples))
(1 5 5 5 5 2 2 4 2 5 4 4 5 2 4 4 4 2 1 2 4 1)
Just to show the implementation I wrote, here's what worked for me:
(defn random-int-excluding
  "Generates a random number between min-n and max-n; excluding excluding-n.
  min-n is inclusive, while max-n is exclusive."
  [min-n max-n excluding-n rand-gen]
  (let [rand-n (g/random-int min-n (dec max-n) rand-gen)]
    (if (= rand-n excluding-n)
      (dec max-n)
      rand-n)))
Which gives a nice even distribution:
([0 0.111502]
[1 0.110738]
[3 0.111266]
[4 0.110976]
[5 0.111162]
[6 0.111266]
[7 0.111093]
[8 0.110815]
[9 0.111182])
Just to make Alan Malloy's answer explicit:
(defn rand-int-range-excluding [from to without]
  (let [n (+ from (rand-int (dec (- to from))))]
    (if (= n without)
      (dec to)
      n)))

(->> #(rand-int-range-excluding 5 10 8)
     repeatedly
     (take 100)
     frequencies)
;{6 28, 9 22, 5 29, 7 21}
No votes required :).

Principal component function Incanter

I have been trying to use the principal-components function from Incanter to do PCA and seem to be off track in using it. I found some sample data online from a PCA tutorial and wanted to practice on it:
(def data [[0.69 0.49] [-1.31 -1.21] [0.39 0.99] [0.09 0.29] [1.29 1.09]
           [0.49 0.79] [0.19 (- 0 0.31)] [(- 0 0.81) (- 0 0.81)]
           [(- 0 0.31) (- 0 0.31)] [(- 0 0.71) (- 0 1.01)]])
On my first attempt I tried passing individual vectors to Incanter's matrix function, but found myself passing it too many arguments. At that point I switched to the nested-vector structure defined above, but I would like to avoid this route.
How would I turn data into an Incanter matrix that will be accepted as input to Incanter's principal-components function? For simplicity, let's call the new matrix fooMatrix.
Once fooMatrix has been constructed, the following code should work to extract the first two principal components:
(def pca (principal-components fooMatrix))
(def components (:rotation pca))
(def pc1 (sel components :cols 0))
(def pc2 (sel components :cols 1))
and then the data can be projected onto the principal components by:
(def principal1 (mmult fooMatrix pc1))
(def principal2 (mmult fooMatrix pc2))
Check out the Incanter API. I believe you just want (incanter.core/matrix data). These are your options for Incanter's matrix function. Maybe A2 is what you're interested in.
(def A (matrix [[1 2 3] [4 5 6] [7 8 9]])) ; produces a 3x3 matrix
(def A2 (matrix [1 2 3 4 5 6 7 8 9] 3)) ; produces the same 3x3 matrix
(def B (matrix [1 2 3 4 5 6 7 8 9])) ; produces a 9x1 column vector
Example using your data:
user=> (use '[incanter core stats charts datasets])
nil
user=> (def data [0.69 0.49 -1.31 -1.21 0.39 0.99 0.09 0.29 1.29
                  1.09 0.49 0.79 0.19 (- 0 0.31) (- 0 0.81) (- 0 0.81)
                  (- 0 0.31) (- 0 0.31) (- 0 0.71) (- 0 1.01)])
user=> (def fooMatrix (matrix data 2))
user=> (principal-components fooMatrix)
{:std-dev (1.3877785387777999 0.27215937850413047), :rotation A 2x2 matrix
-------------
-7.07e-01 -7.07e-01
-7.07e-01 7.07e-01
}
Voilà. Nested vector structure gone.
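With fooMatrix built this way, the projection steps from the question should work as written; restated here for completeness (a sketch using Incanter's sel and mmult):

(def pca (principal-components fooMatrix))
(def components (:rotation pca))
(def pc1 (sel components :cols 0))
(def pc2 (sel components :cols 1))
(def principal1 (mmult fooMatrix pc1)) ; data projected onto the first component
(def principal2 (mmult fooMatrix pc2)) ; data projected onto the second component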

how to generate a series representing the binary expansion of 'e'

I'm trying to find the first 100,000 binary digits in the expansion of 'e'. Is there an algorithm to generate the binary digits of 'e' as an infinite list?
Here's an unbounded spigot for e in Haskell:
main = print $ stream (1,0,1) [(n, a*d, d) | (n,d,a) <- map f [1..]]
  where
    f k = (1, k, 1)

    stream z (x:xs)
      | lbound == approx z 2 = lbound : stream (mul (10, -10*lbound, 1) z) (x:xs)
      | otherwise            = stream (mul z x) xs
      where
        lbound = approx z 1

    approx (a,b,c) n = (a*n + b) `div` c
    mul (a,b,c) (d,e,f) = (a*d, a*e + b*f, c*f)
Based on the Programming Praxis unbounded spigot for e and pi, which in turn is derived from Gibbons' first unbounded spigot for pi.
$ runhaskell A.hs
[2,7,1,8,2,8,1,8,2,8,4,5,9,0,4,5,2,3,5,3,6, ^C
I'd recommend Gibbons' paper if you're interested in these fun algorithms.
You might be interested in using CReal for this. For 100,000 binary digits, 30,200 decimal digits is enough:
Prelude> 100000 * logBase 10 2
30102.999566398114
Prelude> :m + Data.Number.CReal
Prelude> :set +s
Prelude Data.Number.CReal> last $ showCReal 1000 (exp 1)
'4'
(0.34 secs, 34061824 bytes)
Prelude Data.Number.CReal> last $ showCReal 2000 (exp 1)
'4'
(1.25 secs, 104478784 bytes)
Prelude Data.Number.CReal> last $ showCReal 4000 (exp 1)
'7'
(5.96 secs, 355775928 bytes)
Prelude Data.Number.CReal> last $ showCReal 8000 (exp 1)
'2'
(20.89 secs, 1298942504 bytes)
This pattern looks about quadratic to me (each doubling of the digit count roughly quadruples the runtime), so computing the first 30,200 digits of exp 1 looks like it might reasonably finish in about five or six minutes here on my machine. A patch to output in binary directly (and therefore avoid converting to decimal and back) would likely be accepted.
edit: Projection satisfied, just under six minutes of compute time!
Prelude Data.Number.CReal> showCReal 30200 (exp 1)
"2.718281828459045235360287471352662497757247093699959574966967627724076630353547594571382178525166427427466391932003059921817413596629043572900334...middle snipped due to StackOverflow message limit...39106913376148418348845963656215266103322394174671"
(349.44 secs, 17096829912 bytes)

Generating randoms numbers in a certain range for common lisp

I'm doing some homework, and for one part I have to generate random numbers in the range 10 - 80. I know (random 80) will return a number less than 80, but how do I make the numbers be at least 10 as well?
Hint: (+ 1 (random 80)) will give you a number between 1 and 80 inclusive.
This code will give you random numbers from 10 to 80 inclusive (that range contains 71 possible values, so (random 71) supplies an offset of 0-70):
(+ 10 (random 71))
Even better, use this general formula:
(defun random-from-range (start end)
  (+ start (random (+ 1 (- end start)))))
