Why is MIT-Scheme so slow in writing out data?

Why is MIT-Scheme so slow in writing out data? - scheme

In a larger program I'm writing out a small set (10^7) of numerical digits (0...9). This goes very slow with MIT-Scheme 10.1.10 on a 2.6GHz CPU, taking something like 2 minutes.
Probably I'm doing something wrong, like no buffering, but I'm pretty stuck after reading the reference guide. I reduced everything to the bare minimum:
(define (write-stuff port)
(define (loop cnt)
(if (> cnt 0)
(begin (display "0" port)
(loop (- cnt 1)))))
(loop 10000000))
(call-with-output-file "tmp.txt" write-stuff)
Any hints would be welcome...
[EDIT] To make things clear: the data-entries are unrelated to each other, and are stored in a 2D vector. They can be considered random, so I don't like to group them (it's either one-by-one or all-at-once). You can consider the data to be defined by something like
(define (data width height)
(make-initialized-vector width (lambda (x)
(make-initialized-vector height (lambda (x)
(list-ref (list #\0 #\1) (random 2)))))))
Apparently, the kernel/user-switch takes much time, so it's best to transform this to 1 string and write it out in 1 shot like #ceving suggested. Then it works fast enough for me, even though it's still 20s for 16MB.
(define (data->str data)
(string-append* (vector->list (vector-map vector->string data))))
(define dataset (data 4096 4096))
(call-with-output-file "test.txt" (lambda (p)
(display (data->str dataset) p)))

The problem is not that MIT-Scheme is so slow. The problem is, that you call the kernel function write excessively. Your program switches for every character from user mode to kernel mode. This takes much time. If you do the same in Bash it takes even longer.
Your Scheme version:
(define (write-stuff port)
(define (loop cnt)
(if (> cnt 0)
(begin (display "0" port)
(loop (- cnt 1)))))
(loop 10000000))
(call-with-output-file "mit-scheme-tmp.txt" write-stuff)
(exit)
The wrapper to run the Scheme version:
#! /bin/bash
mit-scheme --quiet --load mit-scheme-implementation.scm
On my system it takes about 1 minute:
$ time ./mit-scheme-implementation
real 1m3,981s
user 1m2,558s
sys 0m0,740s
The same for Bash:
#! /bin/bash
: > bash-tmp.txt
n=10000000
while ((n > 0)); do
echo -n 0 >> bash-tmp.txt
n=$((n - 1))
done
takes 2 minutes:
$ time ./bash-implementation
real 2m25,963s
user 1m33,704s
sys 0m50,750s
The solution is: do not execute 10 million kernel mode switches.
Execute just one (or at least 4096 times fewer):
(define (write-stuff port)
(display (make-string 10000000 #\0) port))
(call-with-output-file "mit-scheme-2-tmp.txt" write-stuff)
(exit)
And the program requires just 11 seconds.
$ time ./mit-scheme-implementation-2
real 0m11,390s
user 0m11,270s
sys 0m0,096s
This is the reason why buffering has been invented in the C library:
https://www.gnu.org/software/libc/manual/html_node/Stream-Buffering.html#Stream-Buffering

Related

measuring time in scheme

Is there a way to implement something like this
while time left
do something
where time is some variable set to x secs/minutes/hours in racket?
I could use some-constant to simulate time as in
(define (loop time)
(if (< time some-constant)
((do something)
(loop (- time 1)))
do-nothing))
but I would have to experiment to see what constant would give me one hour, etc.

Try the following:
(define (loop term-time)
(when (<= (current-seconds) term-time)
(begin <do something>
(loop term-time))))
Then you can invoke this with
(loop (+ (current-seconds) (* 60 60))) -- do it for one hour
If you just want to <do something> periodically, but doesn't want to do it at 100% CPU usage, in <do something>, you can include (sleep <secs>) to achieve this.

Drawing in MIT-Scheme very slow

I've recently begun working through MIT's SICP-course and I'm having trouble with one of the old assignments found here on the official website.
The file drawing.scm includes a procedure draw-connected that takes a number as it's argument and returns another function. This function in turn takes a curve as it's argument (In this case a curve is any procedure that returns a specific point-object depending on a parameter t it is given)
(define (draw-connected n)
(define window (make-graphics-device #f))
(let ((1/n (/ 1 n)))
(lambda (curve)
(define (iter x-old y-old count)
(let ((t (* count 1/n)))
(let ((ct (curve t)))
(let ((x-new (x-of ct))
(y-new (y-of ct)))
(graphics-draw-line
window
x-old
y-old
(exact->inexact x-new)
(exact->inexact y-new))
(if (>= count n)
'done
(iter x-new y-new (+ count 1)))))))
(graphics-clear window)
(let ((c0 (curve 0)))
(iter (x-of c0) (y-of c0) 1)))))
I slightly modified draw-connected so I could invoke it more easily (originally window was not defined in the procedure but given to it as a parameter).
Now, when I invoke this procedure by opening the respective .scm file in a buffer in edwin, evaluating it and then evaluating
((draw-connected 20) foo)
A separate window opens and after a delay of about 20 seconds a crude graph of the function foo consisting of n connected plot points appears. For greater values of n this takes even longer.
Is this normal? Is it just that draw-connected does not work efficiently?
Would it help if I somehow compiled the sourcecode first?
Would be great if someone could help me with this.

Print value -and- call function?

I am new to scheme, and have the following question:
If I want a function to also print -the value- of an expression and then call a function, how would one come up to doing that?
For example, I need the function foo(n) to print the value of n mod 2 and call foo(n/2), I would've done:
(define foo (lambda (n) (modulo n 2) (foo (/ n 2))))
But that, of course, would not print the value of n mod 2.

Here is something simple:
(define foo
(lambda (n)
(display (modulo n 2))
(when (positive? n)
(foo (/ n 2)))))
Note the check of (positive? n) to ensure that you avoid (/ 0 2) forever and ever.

I'm terrible at Lisp, but here's an idea: Maybe you could define a function that prints a value and returns it
(define (debug x) (begin (display x) (newline) x))
Then just call the function like
(some-fun (debug (some expression)))

As #Juho wrote, you need to add a display. But, your procedure is recursive without a base case, so it will never terminate.
Try this:
(define foo
(lambda (n)
(cond
((integer? n) (display (modulo n 2))
(newline)
(foo (/ n 2)))
(else n))))
then
> (foo 120)
0
0
0
1
7 1/2

Usually when dealing with more than one thing it's common to build lists to present a solution when the procedure is finished.
(define (get-digits number base)
(let loop ((nums '()) (cur number))
(if (zero? cur)
nums
(loop (cons (remainder cur base) nums)
(quotient cur base)))))
(get-digits 1234 10) ; ==> (1 2 3 4)
Now, since you use DrRacket you have a debugger so you can actually step though this code but you rather should try to make simple bits like this that is testable and that does not do side effects.
I was puzzled when you were taling about pink and blue output until I opened DrRacket and indeed there it was. Everything that is pink is from the program and everything blue is normally not outputed but since it's the result of top level forms in the IDE the REPL shows it anyway. The differences between them are really that you should not rely on blue output in production code.
As other has suggested you can have debug output with display within the code. I want to show you another way. Imagine I didn't know what to do with the elements so I give you the opportunity to do it yourself:
(define (get-digits number base glue)
(let loop ((nums '()) (cur number))
(if (zero? cur)
nums
(loop (glue (remainder cur base) nums)
(quotient cur base)))))
(get-digits 1234 10 cons) ; ==> (1 2 3 4)
(define (debug-glue a d)
(display a)
(newline)
(cons a d))
(get-digits 1234 10 debug-glue) ; ==> (1 2 3 4) and displays "4\n3\n2\n1\n"

Improving performance for converting numbers to lists, and base10 to base2

Many Project Euler problems require manipulating integers and their digits, both in base10 and base2. While I have no problem with converting integers in lists of digits, or converting base10 into base2 (or lists of their digits), I often find that performance is poor when doing such conversions repeatedly.
Here's an example:
First, here are my typical conversions:
#lang racket
(define (10->bin num)
(define (10->bin-help num count)
(define sq
(expt 2 count))
(cond
[(zero? count) (list num)]
[else (cons (quotient num sq) (10->bin-help (remainder num sq) (sub1 count)))]
)
)
(member 1 (10->bin-help num 19)))
(define (integer->lon int)
(cond
[(zero? int) empty]
[else (append (integer->lon (quotient int 10)) (list (remainder int 10)))]
)
)
Next, I need a function to test whether a list of digits is a palindrome
(define (is-palindrome? lon)
(equal? lon (reverse lon)))
Finally, I need to sum all base10 integers below some max that are palindromes in base2 and base10. Here's the accumulator-style function:
(define (sum-them max)
(define (sum-acc count acc)
(define base10
(integer->lon count))
(define base2
(10->bin count))
(cond
[(= count max) acc]
[(and
(is-palindrome? base10)
(is-palindrome? base2))
(sum-acc (add1 count) (+ acc count))]
[else (sum-acc (add1 count) acc)]))
(sum-acc 1 0))
And the regular recursive version:
(define (sum-them* max)
(define base10
(integer->lon max))
(define base2
(10->bin max))
(cond
[(zero? max) 0]
[(and
(is-palindrome? base10)
(is-palindrome? base2))
(+ (sum-them* (sub1 max)) max)]
[else (sum-them* (sub1 max))]
)
)
When I apply either of these two last functions to 1000000, I takes well over 10 seconds to complete. The recursive version seems a bit quicker than the accumulator version, but the difference is negligible.
Is there any way I can improve this code, or do I just have to accept that this is the style of number-crunching for which Racket isn't particularly suited?
So far, I have considered the possibility of replacing integer->lon by a similar integer->vector as I expect vector-append to be faster than append, but then I'm stuck with the need to apply reverse later on.

Making your existing code more efficient
Have you considered getting the list of bits using any of Racket's bitwise operations? E.g.,
(define (bits n)
(let loop ((n n) (acc '()))
(if (= 0 n)
acc
(loop (arithmetic-shift n -1) (cons (bitwise-and n 1) acc)))))
> (map bits '(1 3 4 5 7 9 10))
'((1) (1 1) (1 0 0) (1 0 1) (1 1 1) (1 0 0 1) (1 0 1 0))
It'd be interesting to see whether that speeds anything up. I expect it would help a bit, since your 10->bin procedure currently makes a call to expt, quotient, and remainder, whereas bit shifting, depending on the representations used by the compiler, would probably be more efficient.
Your integer->lon is also using a lot more memory than it needs to, since the append is copying most of the result at each step. This is kind of interesting, because you were already using the more memory efficient approach in bin->10. Something like this is more efficient:
(define (digits n)
(let loop ((n n) (acc '()))
(if (zero? n)
acc
(loop (quotient n 10) (cons (remainder n 10) acc)))))
> (map digits '(1238 2391 3729))
'((1 2 3 8) (2 3 9 1) (3 7 2 9))
More efficient approaches
All that said, perhaps you should consider the approach that you're using. It appears that right now, you're iterating through the numbers 1…MAX, checking whether each one is a palindrome, and if it is, adding it to the sum. That means you're doing something with MAX numbers, all in all. Rather than checking for palindromic numbers, why not just generate them directly in one base and then check whether they're a palindrome in the other. I.e., instead of of checking 1…MAX, check:
1
11
101, and 111
1001, and 1111
10001, 10101, 11011, and 11111,
and so on, up until the numbers are too big.
This list is all the binary palindromes, and only some of those will be decimal palindromes. If you can generate the binary palindromes using bit-twiddling techniques (so you're actually working with the integers), it's easy to write those to a string, and checking whether a string is a palindrome is probably much faster than checking whether a list is a palindrome.

Are you running these timings in DrRacket by any chance? The IDE slows down things quite a bit, especially if you happen to have debugging and/or profiling turned on, so I'd recommend doing these tests from the command line.
Also, you can usually improve the brute-force approach. For example, you can say here that we only have to consider odd numbers, because even numbers are never a palindrome when expressed as binaries (a trailing 0, but the way you represent them there's never a heading 0). This divides the execution time by 2 regardless of the algorithm.
Your code runs on my laptop in 2.4 seconds. I wrote an alternative version using strings and build-in functions that runs in 0.53 seconds (including Racket startup; execution time in Racket is 0.23 seconds):
#!/usr/bin/racket
#lang racket
(define (is-palindrome? lon)
(let ((lst (string->list lon)))
(equal? lst (reverse lst))))
(define (sum-them max)
(for/sum ((i (in-range 1 max 2))
#:when (and (is-palindrome? (number->string i))
(is-palindrome? (number->string i 2))))
i))
(time (sum-them 1000000))
yields
pu#pumbair: ~/Projects/L-Racket time ./speed3.rkt
cpu time: 233 real time: 233 gc time: 32
872187
real 0m0.533s
user 0m0.472s
sys 0m0.060s
and I'm pretty sure that people with more experience in Racket profiling will come up with faster solutions.
So I could give you the following tips:
Think about how you may improve the brute force approach
Get to know your language better. Some constructs are faster than others for no apparent reason
see http://docs.racket-lang.org/guide/performance.html and http://jeapostrophe.github.io/2013-08-19-reverse-post.html
use parallelism when applicable
Get used to the Racket profiler
N.B. Your 10->bin function returns #f for the value 0, I guess it should return '(0).

Chez Scheme allocation: --program vs --script

Consider this bit of Chez Scheme code:
(import (chezscheme))
(define (list-enumerate ls val proc)
(let loop ((ls ls) (return? #f) (val val))
(if (or (null? ls)
return?)
val
(call-with-values (lambda () (proc val (car ls)))
(lambda (return? val)
(loop (cdr ls) return? val))))))
(define (list-index ls proc)
(list-enumerate ls
0
(lambda (i elt)
(if (proc elt)
(values #t i)
(values #f (+ i 1))))))
(define n 100000)
(define data (iota n))
(time (list-index data (lambda (elt) (= elt (- n 1)))))
Run it:
~ $ scheme --script ~/scratch/_list-enumerate-allocation-test-chez-a.sps
(time (list-index data ...))
no collections
3 ms elapsed cpu time
4 ms elapsed real time
8 bytes allocated
Wow, it reports that only 8 bytes were allocated.
Let's run it again using the --program option instead of --script:
~ $ scheme --program ~/scratch/_list-enumerate-allocation-test-chez-a.sps
(time (list-index data ...))
no collections
3 ms elapsed cpu time
3 ms elapsed real time
800000 bytes allocated
Yikes, 800000 bytes allocated.
What's up with the difference?
Ed

Here's a note from Kent Dybvig in response:
That's an interesting question.
When run with --script, which uses the REPL semantics, the variables
defined in the script, like list-enumerate and list-index, are mutable,
which inhibits interprocedural optimizations, including inlining. When
run with --program, however, the variables are immutable, which allows
interprocedural optimizations.
In this case, --program allows the compiler to inline list-enumerate into
list-index's body and in turn the lambda expression within list-index's
body into list-enumerate's body. The end result is a conditional
expression within the call-with-values producer expression. This causes
the compiler to create a closure for the consumer, to avoid code
duplication along the then and else branches of the conditional. This
closure is created each time through list-enumerate's loop, resulting in
the extra allocation overhead. That's the way optimizations often go.
Mostly you win, but sometimes you lose. The good news is, on balance, the
benefits outweight he costs, even in your program. I put the call to
list-index in a loop (the modified code is below) and found that that with
--program, the code runs about 30% faster.
Kent
(import (chezscheme))
(define (list-enumerate ls val proc)
(let loop ((ls ls) (return? #f) (val val))
(if (or (null? ls)
return?)
val
(call-with-values (lambda () (proc val (car ls)))
(lambda (return? val)
(loop (cdr ls) return? val))))))
(define (list-index ls proc)
(list-enumerate ls
0
(lambda (i elt)
(if (proc elt)
(values #t i)
(values #f (+ i 1))))))
(define n 100000)
(define data (time (iota n)))
(let ()
(define runalot
(lambda (i thunk)
(let loop ([i i])
(let ([x (thunk)])
(if (fx= i 1)
x
(loop (fx- i 1)))))))
(time
(runalot 1000
(lambda ()
(list-index data (lambda (elt) (= elt (- n 1))))))))

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Why is MIT-Scheme so slow in writing out data? - scheme

Related

measuring time in scheme

Drawing in MIT-Scheme very slow

Print value -and- call function?

Improving performance for converting numbers to lists, and base10 to base2

Chez Scheme allocation: --program vs --script

Categories

Resources