Generating a list of a million random elements - Scheme

How do I efficiently generate a list of a million random elements in Scheme? The following code hits the maximum recursion depth with just 0.1 million elements.
(unfold (lambda (x) (= x 1000000))
        (lambda (x) (random 1000))
        (lambda (x) (+ x 1))
        0)

It really depends on the system you're using, but here's a common way to do that in plain Scheme:
(let loop ([n 1000000] [r '()])
  (if (zero? n)
      r
      (loop (- n 1) (cons (random 1000) r))))
One note about running this code as is: if you just type it into a REPL, the resulting list will be printed, and that will usually involve using much more memory than the list itself holds. So it's better to do something like
(define l ...same...)
There are many other tools that can be used with varying degrees of convenience. unfold is one of them, and another is the for loops found in PLT Scheme:
(for/list ([i (in-range 1000000)]) (random 1000))

I don't know much Scheme, but couldn't you just use tail recursion (which is really just looping) instead of unfold (or any other higher-order function)?

Use the do-loop-construct as described here.
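For reference, a minimal sketch of that approach (assuming the same random procedure as in the question; the do loop builds the list in an accumulator, so it runs in constant stack space):
(define (random-list n)
  (do ((i 0 (+ i 1))                          ; counter
       (acc '() (cons (random 1000) acc)))    ; accumulated list
      ((= i n) acc)))                         ; return acc when done

(random-list 1000000)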

Someone correct me if I am wrong, but Fakrudeen's code should end up being optimized into iteration since it is tail recursive, or it should be with a proper implementation of unfold. It should never hit a maximum recursion depth.
What version of Scheme are you using, Fakrudeen? DrScheme does not choke on a mere million random numbers.

Taking CHICKEN Scheme as the implementation, here is an attempt with some results.
(use srfi-1)
(use extras)

(time (unfold (lambda (x) (= x 1000000))
              (lambda (x) (random 1000))
              (lambda (x) (+ x 1))
              0))
(time (let loop ([n 1000000] [r '()])
        (if (zero? n)
            r
            (loop (- n 1) (cons (random 1000) r)))))
(define (range min max body)
  (let loop ((current min) (ret '()))
    (if (= current max)
        ret
        (loop (+ current 1) (cons (body current ret) ret)))))

(time (range 0 1000000 (lambda params (random 1000))))
Here are the results, compiled with csc -O3 t.scm:
unfold:    0.331s CPU time, 0.17s GC time (major), 12/660 GCs (major/minor)
named let: 0.107s CPU time, 0.02s GC time (major), 1/290 GCs (major/minor)
range:     0.124s CPU time, 0.022s GC time (major), 1/320 GCs (major/minor)
As you can see, the author's version is much slower than plain tail-recursive calls. It's hard to say exactly why the unfold call is so much slower, but I'd guess it spends a lot more time making function calls.
The two other versions are quite similar. Mine is almost the same thing, except that I'm creating a higher-order function that can be reused. Unlike the plain loop, it can be reused to build other kinds of ranges; the position and the current list are passed to the supplied function in case they are needed.
The higher-order version is probably the best way to go even if it takes a bit more time to execute, again probably because of the function calls. It could be optimized by removing the parameters, which would make it almost as fast as the named let. Its advantage is that the user doesn't have to write the loop itself and can pass in an arbitrary lambda.
Edit
Looking at this specific case: if we are to create a million elements ranging between 0 and 999, we could create a fixed-length vector of a million slots, fill it with values from 0 to 999, and then shuffle it. The whole random process would then depend on the shuffle function, which should not have to allocate new memory; swapping values in place might be faster than generating a million random numbers. That said, the shuffle method still relies on random.
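A rough sketch of that idea (hypothetical helper names; the shuffle is a standard Fisher-Yates, swapping in place so no new memory is allocated):
;; fill a vector of length n with the values 0..999 repeating
(define (make-prefilled-vector n)
  (let ((v (make-vector n)))
    (let loop ((i 0))
      (if (= i n)
          v
          (begin
            (vector-set! v i (modulo i 1000))
            (loop (+ i 1)))))))

;; in-place Fisher-Yates shuffle
(define (shuffle! v)
  (let loop ((i (- (vector-length v) 1)))
    (when (> i 0)
      (let* ((j (random (+ i 1)))        ; pick j in [0, i]
             (tmp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j tmp)
        (loop (- i 1))))))

(shuffle! (make-prefilled-vector 1000000))
Note that this gives exactly 1000 copies of each value rather than a million independent draws, which may or may not matter for the application.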
Edit 2
Unless you really need a list, you could get away with a vector instead.
Here is my second implementation, with vector-map:
(time (vector-map (lambda (x y) (random 1000)) (make-vector 1000000)))
0.07s CPU time, 0/262 GCs (major/minor)
As you can see, it is dramatically faster than using a list.
Edit 3 fun
(define-syntax bigint
  (er-macro-transformer
    (lambda (exp rename compare)
      (let ((lst (map (lambda (x) (random 1000)) (make-list (cadr exp)))))
        (cons 'list lst)))))
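It is invoked like this (a usage sketch based on the definition above; the entire list literal is generated during macro expansion):
(time (bigint 100000))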
For 100000 elements:
0.004s CPU time, 0/8888 GCs (major/minor)
It's probably not a good idea to use this, but I felt it might be interesting. Since it's a macro, the list is built at compile time. Compilation takes a long time, but as you can see, the runtime speed improvement is also huge. Unfortunately, using CHICKEN I couldn't get it to build a list of a million; my guess is that whatever type it uses to build the list overflows and accesses invalid memory.
To answer the question in the comments:
I'm not a Scheme professional; I'm pretty new to it too. As I understand it, the named let or the higher-order function should be the way to go. The higher-order function is good because it's reusable. You could define a
(define (make-random-list quantity maxran)
  ...)
That's the other interesting part, since Scheme is all about higher-order functions: you could then replace the implementation of make-random-list with anything you like. If you need compile-time execution, define the macro; otherwise use a function. All that really matters is being able to reuse it, keeping it fast, and not wasting memory.
Common sense tells you that doing less work is faster, and tail-recursive calls aren't supposed to consume stack space. And when you're not sure, you can hide the implementation behind a function that can be optimized later.
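For instance, a sketch of that interface reusing the named-let pattern from above:
(define (make-random-list quantity maxran)
  ;; build the list tail-recursively, consing one random element per step
  (let loop ((n quantity) (acc '()))
    (if (zero? n)
        acc
        (loop (- n 1) (cons (random maxran) acc)))))

(make-random-list 1000000 1000)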

MIT Scheme limits a computation's stack. Given the size of your problem, you are likely running out of stack space. Fortunately, you can provide a command-line option to change the stack size. Try:
$ mit-scheme --stack <number-of-1024-word-blocks>
There are other command-line options; check out mit-scheme --help
Note that MIT Scheme is, in my experience, one of the few Schemes with a limited stack size. This explains why trying your code in other Schemes will often succeed.
As to your question of efficiency: the routine unfold is probably not implemented with a tail-recursive/iterative algorithm. Here is a tail-recursive version, together with a tail-recursive 'reverse in place' for lists:
(define (unfold stop value incr n0)
  (let collecting ((n n0) (l '()))
    (if (stop n)
        (reverse! l)
        (collecting (incr n) (cons (value n) l)))))

(define (reverse! list)
  (let reving ((list list) (rslt '()))
    (if (null? list)
        rslt
        (let ((rest (cdr list)))
          (set-cdr! list rslt)
          (reving rest list)))))
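With this definition, the call from the question should complete without hitting a recursion limit, since both the collecting loop and the final reverse! are iterative:
(unfold (lambda (x) (= x 1000000))
        (lambda (x) (random 1000))
        (lambda (x) (+ x 1))
        0)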
Note:
$ mit-scheme --version
MIT/GNU Scheme microcode 15.3
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even
for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Tuesday November 8, 2011 at 10:45:46 PM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116
Moriturus te saluto.

Related

MIT Scheme - Merge Sort + Timing Execution

I've implemented my own merge sort in MIT Scheme. I want to test it against the built-in merge sort and compare times; however, I don't know how to get the run time of either. Also, how do you increase the stack size/recursion depth? I'm testing up to 1 million elements.
There are a bunch of timing procedures in MIT Scheme; check the documentation. In particular, try this one:
(with-timings
  (lambda ()
    (merge-sort '(1 2 3 4 5) >))
  (lambda (run-time gc-time real-time)
    (write (internal-time/ticks->seconds run-time))
    (write-char #\space)
    (write (internal-time/ticks->seconds gc-time))
    (write-char #\space)
    (write (internal-time/ticks->seconds real-time))
    (newline)))
The built-in sort shouldn't have a problem with one million elements, and if your own implementation is a good one, it shouldn't have trouble producing a result at that data size either.

What are the efficiency and cost of nested functions in Scheme?

In SICP I have learned to use functions, which is amazing and useful. But I am confused about the cost of nested functions, as in the code below:
(define (sqrt x)
  (define (good-enough? guess)
    (< (abs (- (square guess) x)) 0.001))
  (define (improve guess)
    (average guess (/ x guess)))
  (define (sqrt-iter guess)
    (if (good-enough? guess)
        guess
        (sqrt-iter (improve guess))))
  (sqrt-iter 1.0))
It defines three inner functions. What are the efficiency and cost, and what if I use even more function calls like this?
UPDATE:
Look at the code below, from the section on searching for divisors:
(define (smallest-divisor n)
  (find-divisor n 2))

(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (+ test-divisor 1)))))

(define (divides? a b)
  (= (remainder b a) 0))

(define (prime? n)
  (= n (smallest-divisor n)))
Q1: divides? and smallest-divisor are not strictly necessary; they are there for clarity. What are the efficiency and cost? Does a Scheme compiler optimize this situation? I think I should learn something about compilers. (͡๏̯͡๏)
Q2: How about in an interpreter?
It's implementation-dependent. The standard does not say anything about how much a closure should cost, but since Scheme is designed around closures, any implementation should strive to make closures cheap. Many implementations do CPS conversion, which introduces a closure per evaluation operation.
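To illustrate (a hand-written sketch, not the output of any particular compiler), CPS conversion passes every intermediate result to a continuation, and each of those continuations is a closure the implementation must allocate or optimize away:
;; direct style
(define (f x) (+ (* x x) 1))

;; CPS style: *& and +& are hypothetical CPS primitives
(define (*& a b k) (k (* a b)))
(define (+& a b k) (k (+ a b)))

(define (f& x k)
  (*& x x (lambda (x2)      ; a closure capturing k
            (+& x2 1 k))))

(f& 5 (lambda (result) result))  ; => 26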
There is a compiler technique called lambda lifting, where local functions are transformed into global ones by turning free variables that are not in global scope into bound parameters and lifting the procedure out of the procedure it was defined in. The SICP code might get translated into something like:
(define (lift:good-enough? x guess)
  (< (abs (- (square guess) x)) 0.001))

(define (lift:improve x guess)
  (average guess (/ x guess)))

(define (lift:sqrt-iter x guess)
  (if (lift:good-enough? x guess)
      guess
      (lift:sqrt-iter x (lift:improve x guess))))

(define (sqrt x)
  (lift:sqrt-iter x 1.0))
Where all the lift:-prefixed identifiers are unique, so that they do not collide with existing global bindings.
The efficiency question boils down to a question of compiler optimization. Generally, any time you have a lexical procedure, such as your improve procedure, that references a free variable, such as your x, a closure needs to be created. The closure has an 'environment' that must be allocated to store all free variables, so there is some space overhead and some time overhead (to allocate and fill that memory).
Where does compiler optimization come into play? When a lexical procedure does not 'escape' its lexical block, as is the case for all of yours, the compiler can inline the procedures. In that case a closure and its environment need not be created.
But, importantly, in every day use, you shouldn't worry about the use of lexical procedures.
In one word: negligible. Using nested functions instead of top-level ones won't have a noticeable effect on performance; we use nested functions ("block structure" in SICP's terms) for clarity and for better structuring a procedure, not for efficiency:
Such nesting of definitions, called block structure, is basically the right solution to the simplest name-packaging problem
There might be a small difference in the time it takes to look up a function depending on where it was defined, but that will depend on implementation details of the interpreter. It's not worth worrying about.
Not relevant.
One of the important aspects of designing a programming language is the choice between efficiency on one side and expressiveness on the other. In most situations these two aspects define the characteristics of a low-level and a high-level language, respectively.
Scheme is a small but powerful high-level language from the family of Lisp languages. One of its most powerful features is its expressiveness and ability to abstract. As a Scheme programmer you use block structure inside procedures because it encapsulates related behaviour and solves your problem in a structured way; you don't consider low-level properties of this behaviour, such as the runtime cost of calling procedures or allocating lists. This is part of the joy of programming in a high-level language such as Scheme.
As you say: it's amazing and useful, so continue your work and create something nice. Until the program becomes considerably slow in operation, I wouldn't care about these things; just concentrate on the harder problems, like defining the concepts and behaviour of your program.

How can I make this clojure code run faster?

I have a version implemented in Lisp (SBCL) which runs in under 0.001 seconds with 12 samples. However, this version (in Clojure) takes more than 1.1 seconds.
What should I do to make this code run as fast as the original Lisp version?
To be clear, my numbers do not include the time to start the REPL and so on; they come from the time function of both SBCL and Clojure. (Yes, my laptop is a rather old Atom-based one.)
And this application is/will be used in the REPL, not compiled into a single app, so running it a thousand times before benchmarking doesn't seem meaningful.
Oh, the fbars look like this: [[10.0 10.5 9.8 10.1] [10.1 10.8 10.1 10.7] ...], which are Open-High-Low-Close price bars for stocks.
(defn- build-new-smpl [fmx fmn h l c o]
  (let [fmax (max fmx h)
        fmin (min fmn l)
        fc   (/ (+ c fmax fmin) 3.0)
        fcd  (Math/round (* (- fc o) 1000))
        frd  (Math/round (* (- (* 2.0 c) fmax fmin) 1000))]
    (if (and (> fcd 0) (> frd 0))
      [1 fmax fmin]
      (if (and (< fcd 0) (< frd 0))
        [-1 fmax fmin]
        [0 fmax fmin]))))

(defn binary-smpls-using [fbars]
  (let [fopen (first (first fbars))]
    (loop [fbars fbars, smpls [], fmax fopen, fmin fopen]
      (if (> (count fbars) 0)
        (let [bar (first fbars)
              [_ h l c _] bar
              [nsmpl fmx fmn] (build-new-smpl fmax fmin h l c fopen)]
          (recur (rest fbars) (conj smpls nsmpl) fmx fmn))
        smpls))))
================================================
Thank you. I managed to reduce the difference for 1000 iterations to 0.5 seconds (1.3 secs on SBCL and 1.8 on Clojure). The major factor was that I should have created fbars not as a lazy sequence but as a concrete vector, and this solved my problem.
You need to use a proper benchmarking library; the standard Clojure solution is Hugo Duncan's Criterium.
The reason is that code on the JVM starts running in interpreted mode and then eventually gets compiled by the JIT compiler; it's the steady-state behaviour after JIT compilation that you want to benchmark and not behaviour during the profiling stage. This is, however, quite tricky, since the JIT compiler optimizes no-ops away where it can see them, so you need to make sure your code causes side effects that won't be optimized away, but then you still need to run it in a loop to obtain meaningful results etc. -- quick and dirty solutions just don't cut it. (See the Elliptic Group, Inc. Java benchmarking article, also linked to by Criterium's README, for an extended discussion of the issues involved.)
Cycling the two samples you listed in a vector of length 1000 results in a timing of ~327 µs in a Criterium benchmark on my machine:
(require '[criterium.core :as c])
(def v (vec (take 1000 (cycle [[10.0 10.5 9.8 10.1] [10.1 10.8 10.1 10.7]]))))
(c/bench (binary-smpls-using v))
WARNING: Final GC required 4.480116525558204 % of runtime
Evaluation count : 184320 in 60 samples of 3072 calls.
Execution time mean : 327.171892 µs
Execution time std-deviation : 3.129050 µs
Execution time lower quantile : 322.731261 µs ( 2.5%)
Execution time upper quantile : 333.117724 µs (97.5%)
Overhead used : 1.900032 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
A really good benchmark would actually involve an interesting dataset (all different samples, preferably coming from the real world).
When I run this with 1000 samples I get an answer in 46 ms. That said, here are some common tips on making Clojure code faster:
turn on reflection warnings: (set! *warn-on-reflection* true)
add type hints until the reflection warnings go away (not actually a problem here)
make it a lazy sequence so you don't have to construct huge sequences in RAM, which causes lots of GC overhead (in some cases this adds overhead, though I think it's a decent idea here)
in cases like this, see if Incanter can do the job for you (though that may be cheating)
time the creation of the result, excluding the time it takes the REPL to print it

Code works in stk-simply but not in mit-scheme

I have been reading the courses CS61A (Spring 2011) from Berkeley OpenCourseWare and MIT 6.001 from OCW. One uses STk (invoked as stk-simply) and the other uses mit-scheme as the language for the lectures.
I just wrote a simple square root procedure using Heron's method. The file was saved as sqrt.scm
;; set an accuracy or tolerance for the deviation between actual and expected values
(define tolerance 0.001)

;; gives the average of two numbers
(define (average x y)
  (/ (+ x y) 2))

;; gives the absolute value of any number
(define (abs x)
  (if (< x 0) (- x) x))

;; gives the square of a number
(define (square x) (* x x))

;; tests whether the guess is good enough by checking if the difference
;; between the square of the guess and the number is tolerable
(define (good-enuf? guess x)
  (< (abs (- (square guess) x)) tolerance))

;; improves the guess by averaging the guess with the number divided by the guess
(define (improve guess x)
  (average (/ x guess) guess))

;; when a guess does not pass the good-enuf? test, tries the improved guess
(define (try guess x)
  (if (good-enuf? guess x)
      guess
      (try (improve guess x) x)))

;; gives back the square root of a number by starting with a guess of 1
;; and improving the guess until it is good enough
(define (sqr-root x)
  (try 1 x))
This works fine in STk.
sam@Panzer:~/code/src/scheme$ sudo stk-simply
[sudo] password for sam:
Welcome to the STk interpreter version 4.0.1-ucb1.3.6 [Linux-2.6.16-1.2108_FC4-i686]
Copyright (c) 1993-1999 Erick Gallesio - I3S - CNRS / ESSI <eg@unice.fr>
Modifications by UCB EECS Instructional Support Group
Questions, comments, or bug reports to <inst@EECS.Berkeley.EDU>.
STk> (load "sqrt.scm")
okay
STk> (sqr-root 25)
5.00002317825395
STk>
But not in MIT Scheme:
sam@Panzer:~/code/src/scheme$ sudo scheme
[sudo] password for sam:
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Thursday October 27, 2011 at 7:44:21 PM
Release 9.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/i386 4.118 || Edwin 3.116
1 ]=> (load "sqrt.scm")
;Loading "sqrt.scm"... done
;Value: sqr-root
1 ]=> (sqr-root 25)
;Value: 1853024483819137/370603178776909
1 ]=>
I checked the manuals, but I was unable to find the cause. Can anyone tell me why this is? Or is there an error in the code that I am overlooking? I am a beginner in both Scheme and STk.
Try this; now the code should return the same results in both interpreters (ignoring tiny rounding differences):
(define (sqr-root x)
  (exact->inexact (try 1 x)))
The answer was the same all along; it's just that MIT Scheme produces an exact result by default, whereas STk returns an inexact value. With the above code in place, we convert the result to an inexact number after performing the calculation. Alternatively, we could perform the conversion from the beginning (possibly losing some precision in the process):
(define (sqr-root x)
  (try 1 (exact->inexact x)))
This quote explains the observed behavior:
Scheme numbers are either exact or inexact. A number is exact if it was written as an exact constant or was derived from exact numbers using only exact operations. A number is inexact if it was written as an inexact constant, if it was derived using inexact ingredients, or if it was derived using inexact operations. Thus inexactness is a contagious property of a number.
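A quick illustration of that contagion at an MIT Scheme REPL (a sketch; the exact printed form may vary by implementation):
(/ 1 3)                   ; => 1/3, exact: both operands are exact
(/ 1 3.0)                 ; => .3333333333333333, one inexact operand
(exact->inexact (/ 1 3))  ; => .3333333333333333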
You'd have both answers look the same if in MIT Scheme you had written:
(sqr-root 25.)
since 'inexactness' is 'sticky'

Efficiency: recursion vs loop

This is just curiosity on my part, but which is more efficient: recursion or a loop?
Given two functions (using Common Lisp):
(defun factorial_recursion (x)
  (if (> x 0)
      (* x (factorial_recursion (decf x)))
      1))
and
(defun factorial_loop (x)
  (loop for i from 1 to x
        for result = 1 then (* result i)
        finally (return result)))
Which is more efficient?
I don't even have to read your code.
A loop is more efficient for factorials. When you recurse, you have up to x function calls on the stack.
You almost never use recursion for performance reasons. You use recursion to make the problem simpler.
Mu.
Seriously now, it doesn't matter. Not for examples this size. They both have the same complexity. If your code is not fast enough for you, this is probably one of the last places you'd look.
Now, if you really want to know which is faster, measure them. On SBCL, you can call each function in a loop and measure the time. Since you have two simple functions, time is enough. If your program was more complicated, a profiler would be more useful. Hint: if you don't need a profiler for your measurements, you probably don't need to worry about performance.
On my machine (SBCL 64 bit), I ran your functions and got this:
CL-USER> (time (loop repeat 1000 do (factorial_recursion 1000)))
Evaluation took:
0.540 seconds of real time
0.536034 seconds of total run time (0.496031 user, 0.040003 system)
[ Run times consist of 0.096 seconds GC time, and 0.441 seconds non-GC time. ]
99.26% CPU
1,006,632,438 processor cycles
511,315,904 bytes consed
NIL
CL-USER> (time (loop repeat 1000 do (factorial_loop 1000)))
Evaluation took:
0.485 seconds of real time
0.488030 seconds of total run time (0.488030 user, 0.000000 system)
[ Run times consist of 0.072 seconds GC time, and 0.417 seconds non-GC time. ]
100.62% CPU
902,043,247 processor cycles
511,322,400 bytes consed
NIL
After putting your functions in a file with (declaim (optimize speed)) at the top, the recursion time dropped to 504 milliseconds and the loop time dropped to 475 milliseconds.
And if you really want to know what's going on, try disassemble on your functions and see what's in there.
Again, this looks like a non-issue to me. Personally, I try to use Common Lisp like a scripting language for prototyping, then profile and optimize the parts that are slow. Getting from 500ms to 475ms is nothing. For instance, in some personal code, I got a couple of orders of magnitude speedup by simply adding an element type to an array (thus making the array storage 64 times smaller in my case). Sure, in theory it would have been faster to reuse that array (after making it smaller) and not allocate it over and over. But simply adding :element-type bit to it was enough for my situation - more changes would have required more time for very little extra benefit. Maybe I'm sloppy, but 'fast' and 'slow' don't mean much to me. I prefer 'fast enough' and 'too slow'. Both your functions are 'fast enough' in most cases (or both are 'too slow' in some cases) so there's no real difference between them.
If you can write recursive functions in such a way that the recursive call is the very last thing done (and the function is thus tail-recursive) and the language and compiler/interpreter you are using supports tail recursion, then the recursive function can (usually) be optimised into code that is really iterative, and is as fast as an iterative version of the same function.
Sam I Am is correct, though: iterative functions are usually faster than their recursive counterparts. If a recursive function is to be as fast as an iterative function that does the same thing, you have to rely on the optimiser.
The reason for this is that a function call is much more expensive than a jump, plus you consume stack space, a (very) finite resource.
The function you give is not tail recursive, because you call factorial_recursion and then multiply the result by x. An example of a tail-recursive version would be:
(defun factorial-recursion-assist (x cur)
  (if (> x 1)
      (factorial-recursion-assist (- x 1) (* cur x))
      cur))

(defun factorial-recursion (x)
  (factorial-recursion-assist x 1))

(print (factorial-recursion 4))  ; => 24
Here's a tail-recursive factorial (I think), using the anaphoric alambda from On Lisp, where self refers to the enclosing lambda:
(defun fact (x)
  (funcall (alambda (i ret)
             (cond ((> i 1)
                    (self (1- i) (* ret i)))
                   (t
                    ret)))
           x 1))
