MIT Scheme - Merge Sort + Timing Execution - scheme

I've implemented my own merge sort in MIT Scheme. I want to test it against the built-in merge-sort and compare times; however, I don't know how to get the run time of both. Also, how do you increase the stack size/recursion depth? I'm testing with up to 1 million elements.

There are a number of timing procedures in MIT Scheme; check the documentation. In particular, try this one:
(with-timings
 (lambda ()
   (merge-sort '(1 2 3 4 5) >))
 (lambda (run-time gc-time real-time)
   (write (internal-time/ticks->seconds run-time))
   (write-char #\space)
   (write (internal-time/ticks->seconds gc-time))
   (write-char #\space)
   (write (internal-time/ticks->seconds real-time))
   (newline)))
The built-in sort shouldn't have a problem with one million elements, and if your own implementation is a good one, it shouldn't have any problem producing a result at that data size either.
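If you want a baseline of your own to time against, a straightforward top-down merge sort might look like this (a minimal sketch; my-merge-sort is a hypothetical name, not MIT Scheme's built-in merge-sort):

```scheme
;; A minimal top-down merge sort (a sketch, not the built-in merge-sort).
;; <? is the comparison procedure, e.g. < for ascending numbers.
(define (my-merge-sort lst <?)
  ;; Merge two sorted lists into one sorted list.
  (define (merge a b)
    (cond ((null? a) b)
          ((null? b) a)
          ((<? (car a) (car b)) (cons (car a) (merge (cdr a) b)))
          (else (cons (car b) (merge a (cdr b))))))
  ;; Split a list into two halves using a slow/fast pointer walk.
  (define (split lst)
    (let loop ((slow lst) (fast lst) (front '()))
      (if (or (null? fast) (null? (cdr fast)))
          (values (reverse front) slow)
          (loop (cdr slow) (cddr fast) (cons (car slow) front)))))
  (if (or (null? lst) (null? (cdr lst)))
      lst
      (call-with-values
          (lambda () (split lst))
        (lambda (front back)
          (merge (my-merge-sort front <?)
                 (my-merge-sort back <?))))))
```

You can then wrap both your sort and the built-in one in with-timings and compare. Note that merge here is not tail-recursive, so at a million elements you may still need a larger --stack.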

Related

How do I load my file in DrRacket

I am an undergraduate who wants to go through "The Scheme Programming Language" as self-study.
Here is a simple program, which I named "reciprocal.ss":
(define reciprocal
  (lambda (n)
    (if (= n 0)
        "oops!"
        (/ 1 n))))
Then I wanted to load my procedure:
(load "reciprocal.ss")
It produces this error:
reciprocal.ss:1:0: #%top-interaction: unbound identifier;
also, no #%app syntax transformer is bound in: #%top-interaction
I did each part as the book says. Perhaps I am just making a rookie mistake. Any insight would be appreciated.
Since load uses eval, using it outside of a REPL generally will not work, for reasons described in the Namespaces documentation.
Using racket/load can work for you here, however:
loader.ss
#lang racket/load
(load "reciprocal.ss")
(display (reciprocal 10))
reciprocal.ss
(define reciprocal
  (lambda (n)
    (if (= n 0)
        "oops!"
        (/ 1 n))))
Racket (and Scheme at large) has a more complex story than the average language regarding running external code. In general, you should use include when you want to directly 'include' a file, provide/require when you want to establish module boundaries, and load when you are sophisticated enough to be stretching the limits of either.
The simplest approach is not to use load at all.
In "reciprocal.ss" make the first lines:
#lang racket
(provide (all-defined-out))

(define reciprocal
  (lambda (n)
    (if (= n 0)
        "oops!"
        (/ 1 n))))
Then use (require "reciprocal.ss") in the file where you need to use the function reciprocal.
The load mechanism was used back in the good old days, before module systems had arrived. Writing (load "foo.ss") basically works as if you manually pasted the contents of foo.ss into the REPL and executed it. This means that the result of your program depends on the order in which files are loaded (if you are using side effects). Module systems handle this (and other things too) much better.

Mit-scheme, recursion error with built-in filter function

(filter even? (numb-2tx 100000))
;Aborting!: maximum recursion depth exceeded
;;numb-2tx generates a list from 2 to x, even for very large values of x (tested with 2000000)
When I try to apply the filter function to very long lists (>40,000 or so) I run into the maximum recursion depth error.
Is there a similar built-in that doesn't run into this problem, or will I have to come up with a tail-recursive equivalent on my own?
Start MIT Scheme with the --stack option. Like this:
$ mit-scheme --stack 10000
Here was my result with the out-of-the-box stack and also with a stack of 1000:
> (length (filter even? (iota 1000000)))
;Aborting!: maximum recursion depth exceeded
Then, after using --stack 10000:
> (length (filter even? (iota 1000000)))
;Value: 500000
It is somewhat disturbing to know that filter has this apparently non-tail-recursive behavior.
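If you would rather not raise the stack, a tail-recursive equivalent is easy to write yourself (a sketch; filter-iter is a hypothetical name):

```scheme
;; Tail-recursive filter: accumulate the matches, then reverse once.
;; Uses constant stack space regardless of the input length.
(define (filter-iter pred lst)
  (let loop ((lst lst) (acc '()))
    (cond ((null? lst) (reverse acc))
          ((pred (car lst)) (loop (cdr lst) (cons (car lst) acc)))
          (else (loop (cdr lst) acc)))))
```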

How to map a function over a list in parallel in racket?

The question title says it all, really: what is the best way to map a function over a list in parallel in Racket? Thanks.
If you mean over multiple processor cores, then the most general approach is to use Places.
Places enable the development of parallel programs that take advantage of machines with multiple processors, cores, or hardware threads.
A place is a parallel task that is effectively a separate instance of the Racket virtual machine. Places communicate through place channels, which are endpoints for a two-way buffered communication.
You might be able to use the other parallelization technique, Futures, but the conditions for it to work are relatively limited, for example floating point operations, as described here.
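When those conditions do hold, a map over Futures can be sketched like this (map/future is a hypothetical name; remember that only "future-safe" operations, such as flonum arithmetic, actually run in parallel, and anything else blocks until touched):

```scheme
#lang racket

;; Spawn one future per element, then touch them all to collect results.
(define (map/future f xs)
  (map touch
       (map (lambda (x) (future (lambda () (f x)))) xs)))
```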
EDIT: In response to the comment:
Is there an implementation of parallel-map using places somewhere?
First, I should back up. You might not need Places. You can get concurrency using Racket threads. For example, here's a map/thread:
#lang racket

(define (map/thread f xs)
  ;; Make one channel for each element of xs.
  (define cs (for/list ([x xs])
               (make-channel)))
  ;; Make one thread for each element of xs.
  ;; Each thread calls (f x) and puts the result on its channel.
  (for ([x xs]
        [c cs])
    (thread (thunk (channel-put c (f x)))))
  ;; Get the result from each channel.
  ;; Note: this will block on each channel if not yet ready.
  (for/list ([c cs])
    (channel-get c)))
;; Use:
(define xs '(1 2 3 4 5))
(map add1 xs)
(map/thread add1 xs)
If the work being done involves blocking, e.g. I/O requests, this will give you "parallelism" in the sense of not being stuck on I/O. However, Racket threads are "green" threads, so only one at a time will be using the CPU.
If you truly need parallel use of multiple CPU cores, then you would need Futures or Places.
Due to the way Places are implemented --- effectively as multiple instances of Racket --- I don't immediately see how to write a generic map/place. For examples of using places in a "bespoke" way, see:
Places in the Guide
Places in the Reference
I don't know how to do it in Racket, but I implemented a version in SISC Scheme.
(define (map-parallel fn lst)
  (call-with-values
      (lambda ()
        (apply parallel
               (map (lambda (e)
                      (delay (fn e)))
                    lst)))
    list))
The only thing here that is not R5RS is parallel.
Example of use:
Using a regular map:
(time (map (lambda (a)
             (begin
               (sleep 1000)
               (+ 1 a)))
           '(1 2 3 4 5)))
=> ((2 3 4 5 6) (5000 ms))
Using the map-parallel:
(time (map-parallel (lambda (a)
                      (begin
                        (sleep 1000)
                        (+ 1 a)))
                    '(1 2 3 4 5)))
=> ((2 3 4 5 6) (1000 ms))

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:
(defun filter (a b)
  "Filters out all items in a from b"
  (if (= 0 (length a))
      b
      (filter (remove (first a) a) (remove (first a) b))))
I'm new to Lisp and don't know how remove does its thing. What kind of time will this filter run in?
There are two ways to find out:
you could test it with data
you could analyze your source code
Let's look at the source code.
lists are built of linked cons cells
length needs to walk once through a list
for EVERY recursive call of FILTER you compute the length of a. BAD!
(Use ENDP instead.)
REMOVE needs to walk once through a list
for every recursive call you compute REMOVE twice: BAD!
(Instead of using REMOVE on a, recurse with the REST.)
the call to FILTER will not necessarily be an optimized tail call.
In some implementations it might, in some you need to tell the compiler
that you want to optimize for tail calls, in some implementations
no tail call optimization is available. If not, then you get a stack
overflow on long enough lists.
(Use looping constructs like DO, DOLIST, DOTIMES, LOOP, REDUCE, MAPC, MAPL, MAPCAR, MAPLIST, MAPCAN, or MAPCON instead of recursion, when applicable.)
Summary: that's very naive code with poor performance.
Common Lisp provides this built in: SET-DIFFERENCE should do what you want.
http://www.lispworks.com/documentation/HyperSpec/Body/f_set_di.htm#set-difference
Common Lisp does not support tail-call optimization (as per the standard) and you might just run out of memory with an abysmal call-stack (depending on the implementation).
I would not write this function because, as Rainer Joswig says, the standard already provides SET-DIFFERENCE. Nonetheless, if I had to provide an implementation of the function, this is the one I would use:
(defun filter (a b)
  (let ((table (make-hash-table)))
    (map 'nil (lambda (e) (setf (gethash e table) t)) a)
    (remove-if (lambda (e) (gethash e table)) b)))
Doing it this way provides a couple of advantages, the most important one being that it only traverses b once; using a hash table to keep track of what elements are in a is likely to perform much better if a is long.
Also, using the generic sequence functions like MAP and REMOVE-IF means that this function can be used with strings and vectors as well as lists, which is an advantage even over the standard SET-DIFFERENCE function. The main downside of this approach comes if you want to extend the function with a :TEST argument that allows the user to provide an equality predicate other than the default EQL, since CL hash tables only work with a small number of pre-defined equality predicates (EQ, EQL, EQUAL and EQUALP, to be precise).
(defun filter (a b)
  "Filters out all items in a from b"
  (if (not (consp a))
      b
      (filter (rest a) (remove (first a) b))))

Generating list of million random elements

How do you efficiently generate a list of a million random elements in Scheme? The following code hits the maximum recursion depth with even 0.1 million elements.
(unfold (lambda (x) (= x 1000000))
        (lambda (x) (random 1000))
        (lambda (x) (+ x 1))
        0)
It really depends on the system you're using, but here's a common way to do that in plain scheme:
(let loop ([n 1000000] [r '()])
  (if (zero? n)
      r
      (loop (- n 1) (cons (random 1000) r))))
One note about running this code as is: if you just type it into a REPL, it will lead to printing the resulting list, and that will usually involve using much more memory than the list holds. So it's better to do something like
(define l ...same...)
There are many other tools that can be used to varying degrees of convenience. unfold is one of them, and another is for loops as can be found in PLT Scheme:
(for/list ([i (in-range 1000000)]) (random 1000))
I don't know much Scheme, but couldn't you just use tail recursion (which is really just looping) instead of unfold (or any other higher-order function)?
Use the do-loop-construct as described here.
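A do version might look like this (random-list is a hypothetical name; do updates its variables iteratively, so there is no deep recursion):

```scheme
;; Build a list of n random numbers below limit with a do loop.
(define (random-list n limit)
  (do ((i 0 (+ i 1))
       (acc '() (cons (random limit) acc)))
      ((= i n) acc)))
```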
Someone correct me if I am wrong, but Fakrudeen's code should end up being optimized since it is tail recursive; or at least it should be, with a proper implementation of unfold. It should never reach the maximum recursion depth.
What version of Scheme are you using, Fakrudeen?
DrScheme does not choke on a mere million random numbers.
Taking Chicken Scheme as the implementation, here is a try with some results.
(use srfi-1)
(use extras)
(time (unfold (lambda (x) (= x 1000000))
              (lambda (x) (random 1000))
              (lambda (x) (+ x 1))
              0))
(time (let loop ([n 1000000] [r '()])
        (if (zero? n)
            r
            (loop (- n 1) (cons (random 1000) r)))))
(define (range min max body)
  (let loop ((current min) (ret '()))
    (if (= current max)
        ret
        (loop (+ current 1) (cons (body current ret) ret)))))
(time (range 0 1000000 (lambda params (random 1000))))
Here are the results with csc -O3 t.scm:
0.331s CPU time, 0.17s GC time (major), 12/660 GCs (major/minor)
0.107s CPU time, 0.02s GC time (major), 1/290 GCs (major/minor)
0.124s CPU time, 0.022s GC time (major), 1/320 GCs (major/minor)
As you can see, the author's version is much slower than plain tail-recursive calls. It's hard to say exactly why the unfold call is so much slower, but I'd guess it spends a lot more time doing function calls.
The two other versions are quite similar. My version is almost the same, except that I'm creating a higher-order function that can be reused.
Unlike the plain loop, it can be reused to build a range of functions. The current position and the accumulated list are passed to the body function in case they are needed.
The higher-order version is probably the best way to go even if it takes a bit more time to execute. That is probably also because of the function calls. It could be optimized by removing parameters, and then it would get almost as fast as the named let.
The advantage of the higher-order version is that the user doesn't have to write the loop itself and can just supply a lambda.
Edit
Looking at this specific case: if we are to create a million elements ranging between 0 and 999, we could create a fixed-length vector of a million entries with the values 0 to 999 in it, then shuffle it. The whole random process would then depend on the shuffle function, which doesn't have to allocate any new memory; swapping values in place might be faster than generating random numbers. That said, the shuffle still relies on random.
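The fill-then-shuffle idea can be sketched like this (shuffled-vector is a hypothetical name; note the result is a permutation of a fixed multiset of values, not a list of independent random draws):

```scheme
;; Fill a vector with 0..limit-1 cyclically, then Fisher-Yates shuffle
;; it in place, so no extra memory is allocated during the shuffle.
(define (shuffled-vector n limit)
  (let ((v (make-vector n)))
    (do ((i 0 (+ i 1))) ((= i n))
      (vector-set! v i (modulo i limit)))
    (do ((i (- n 1) (- i 1))) ((<= i 0) v)
      (let* ((j (random (+ i 1)))
             (tmp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j tmp)))))
```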
Edit 2
Unless you really need a list, you could get away with a vector instead.
Here is my second implementation with vector-map
(time (vector-map (lambda (x y) (random 1000)) (make-vector 1000000)))
# 0.07s CPU time, 0/262 GCs (major/minor)
As you can see, it is terribly faster than using a list.
Edit 3 fun
(define-syntax bigint
  (er-macro-transformer
   (lambda (exp rename compare)
     (let ((lst (map (lambda (x) (random 1000)) (make-list (cadr exp)))))
       (cons 'list lst)))))
100000
0.004s CPU time, 0/8888 GCs (major/minor)
It's probably not a good idea to use this, but I felt it might be interesting. Since it's a macro, it gets executed at compile time. The compile time is huge, but as you can see, the speed improvement is also huge. Unfortunately, using Chicken, I couldn't get it to build a list of a million elements. My guess is that the type used to build the list overflows and accesses invalid memory.
To answer the question in the comments:
I'm not a Scheme professional. I'm pretty new to it too, and as I understand it, the named let or a higher-order function should be the way to go. The higher-order function is good because it's reusable. You could define a
(define (make-random-list quantity maxran)
...)
That's the other interesting part, since Scheme is all about higher-order functions. You could then replace the implementation of make-random-list with anything you like. If you need some compile-time execution, define a macro; otherwise use a function. All that really matters is being able to reuse it, and that it is fast and doesn't waste memory.
Common sense tells you that doing less work is faster, and tail-recursive calls aren't supposed to consume stack space. When you're not sure, you can hide the implementation in a function that can be optimized later.
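A possible body for that helper (just a sketch, reusing the named-let pattern from above):

```scheme
;; Build a list of `quantity` random numbers below `maxran`,
;; tail-recursively so the stack stays flat.
(define (make-random-list quantity maxran)
  (let loop ((n quantity) (acc '()))
    (if (zero? n)
        acc
        (loop (- n 1) (cons (random maxran) acc)))))
```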
MIT Scheme limits a computation's stack. Given the size of your problem, you are likely running out of stack size. Fortunately, you can provide a command-line option to change the stack size. Try:
$ mit-scheme --stack <number-of-1024-word-blocks>
There are other command-line options; check out mit-scheme --help.
Note that MIT Scheme, in my experience, is one of the few Schemes that has a limited stack size. This explains why trying your code in other Schemes will often succeed.
As to your question of efficiency: the routine unfold is probably not implemented with a tail-recursive/iterative algorithm. Here is a tail-recursive version, along with a tail-recursive 'reverse list in place':
(define (unfold stop value incr n0)
  (let collecting ((n n0) (l '()))
    (if (stop n)
        (reverse! l)
        (collecting (incr n) (cons (value n) l)))))

(define (reverse! list)
  (let reving ((list list) (rslt '()))
    (if (null? list)
        rslt
        (let ((rest (cdr list)))
          (set-cdr! list rslt)
          (reving rest list)))))
Note:
$ mit-scheme --version
MIT/GNU Scheme microcode 15.3
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even
for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Tuesday November 8, 2011 at 10:45:46 PM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116
Moriturus te saluto.
