Improve speed of string manipulations - performance

This is, sort of, a follow-up question to this one: Write an efficient string replacement function?
In the (albeit distant) future I hope to get to do natural language processing, so the speed of string manipulation matters to me. By accident, I stumbled over this test: http://raid6.com.au/~onlyjob/posts/arena/ - all benchmarks are biased, and this one is no exception, but it raised an important question for me. So I wrote a few tests to see how I'm doing:
This was my first attempt (I'll call it #A):
#A
(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
        and initial = (get-internal-real-time)
        for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
        for ln = (* (length addidtion) i)
        for accumulated = addidtion
          then (loop with concatenated = (concatenate 'string accumulated addidtion)
                     for start = (search "efgh" concatenated)
                     while start do (replace concatenated "____" :start1 start)
                     finally (return concatenated))
        when (zerop (mod ln (* 1024 256))) do
          (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))
(test)
Baffled by the results, I tried to use cl-ppcre - I don't know what I was hoping for, but the results came out really bad... Here's the code I used for testing:
#B
(ql:quickload "cl-ppcre")
(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
        and initial = (get-internal-real-time)
        for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
        for ln = (* (length addidtion) i)
        for accumulated = addidtion
          then (cl-ppcre:regex-replace-all "efgh" (concatenate 'string accumulated addidtion) "____")
        when (zerop (mod ln (* 1024 256))) do
          (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))
(test)
Well then, in hopes of maybe side-stepping some of the generality, I decided to write my own, albeit somewhat naive, version:
#C
(defun replace-all (input match replacement)
  (declare (type string input match replacement)
           (optimize (debug 0) (safety 0) (speed 3)))
  (loop with pattern fixnum = (1- (length match))
        with i fixnum = pattern
        with j fixnum = i
        with len fixnum = (length input) do
          (cond
            ((>= i len) (return input))
            ((zerop j)
             (loop do
               (setf (aref input i) (aref replacement j) i (1+ i))
               (if (= j pattern)
                   (progn (incf i pattern) (return))
                   (incf j))))
            ((char= (aref input i) (aref match j))
             (decf i) (decf j))
            (t (setf i (+ i 1 (- pattern j)) j pattern)))))

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
        and initial = (get-internal-real-time)
        for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
        for ln fixnum = (* (length addidtion) i)
        for accumulated string = addidtion
          then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
        when (zerop (mod ln (* 1024 256))) do
          (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))
(test)
Almost as slow as cl-ppcre! Now, that's incredible! I can't spot anything here that would result in such poor performance... and yet it still sucks :(
Realizing that the standard functions had performed best so far, I looked into the SBCL source, and after some reading I came up with this:
#D
(defun replace-all (input match replacement &key (start 0))
  (declare (type simple-string input match replacement)
           (type fixnum start)
           (optimize (debug 0) (safety 0) (speed 3)))
  (loop with input-length fixnum = (length input)
        and match-length fixnum = (length match)
        for i fixnum from 0 below (ceiling (the fixnum (- input-length start)) match-length) do
          (loop with prefix fixnum = (+ start (the fixnum (* i match-length)))
                for j fixnum from 0 below match-length do
                  (when (<= (the fixnum (+ prefix j match-length)) input-length)
                    (loop for k fixnum from (+ prefix j) below (the fixnum (+ prefix j match-length))
                          for n fixnum from 0 do
                            (unless (char= (aref input k) (aref match n)) (return))
                          finally
                             (loop for m fixnum from (- k match-length) below k
                                   for o fixnum from 0 do
                                     (setf (aref input m) (aref replacement o))
                                   finally
                                      (return-from replace-all
                                        (replace-all input match replacement :start k))))))
        finally (return input)))

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
        and initial = (get-internal-real-time)
        for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
        for ln fixnum = (* (length addidtion) i)
        for accumulated string = addidtion
          then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
        when (zerop (mod ln (* 1024 256))) do
          (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))
(test)
Finally, I can win a tiny fraction of performance over the standard library - yet it is still very, very bad compared to almost everything else...
Here's the table with the results:
| SBCL #A | SBCL #B | SBCL #C | SBCL #D | C gcc 4 -O3 | String size |
|-----------+-----------+------------+-----------+-------------+-------------|
| 17.463 s | 166.254 s | 28.924 s | 16.46 s | 1 s | 256 Kb |
| 68.484 s | 674.629 s | 116.55 s | 63.318 s | 4 s | 512 Kb |
| 153.99 s | gave up | 264.927 s | 141.04 s | 10 s | 768 Kb |
| 275.204 s | . . . . . | 474.151 s | 251.315 s | 17 s | 1024 Kb |
| 431.768 s | . . . . . | 745.737 s | 391.534 s | 27 s | 1280 Kb |
| 624.559 s | . . . . . | 1079.903 s | 567.827 s | 38 s | 1536 Kb |
Now, the question: what did I do wrong? Is this something inherent to Lisp strings? Can it be mitigated somehow, and if so, through what?
As a long shot, I'd even consider writing a specialized library for string processing, if the problem isn't my bad code but rather the implementation. Would it make sense to do so? If yes, what language would you suggest for writing it?
EDIT: Just for the record, I'm now trying to use this library: https://github.com/Ramarren/ropes to deal with string concatenation. Unfortunately, it doesn't have a replace function, and doing multiple replacements isn't trivial. I'll keep this post updated when I have something.
I've tried slightly changing huaiyuan's variant to use arrays with fill pointers instead of string concatenation (to achieve something similar to the StringBuilder suggested by Paulo Madeira). It can probably be optimized further, but I'm not sure about the types, which method will be faster, or whether it's worth redefining * and + to operate only on fixnum or signed-byte. Anyway, here's the code and the benchmark:
(defun test/e ()
  (declare (optimize speed))
  (labels ((min-power-of-two (num)
             (declare (type fixnum num))
             (decf num)
             (1+
              (progn
                (loop for i fixnum = 1 then (the (unsigned-byte 32) (ash i 1))
                      while (< i 17)
                      do (setf num
                               (logior
                                (the fixnum
                                  (ash num (the (signed-byte 32)
                                             (+ 1 (the (signed-byte 32)
                                                    (lognot i))))))
                                num)))
                num)))
           (join (x y)
             (let ((capacity (array-dimension x 0))
                   (desired-length (+ (length x) (length y)))
                   (x-copy x))
               (declare (type fixnum capacity desired-length)
                        (type (vector character) x y x-copy))
               (when (< capacity desired-length)
                 (setf x (make-array
                          (min-power-of-two desired-length)
                          :element-type 'character
                          :fill-pointer desired-length))
                 (replace x x-copy))
               (replace x y :start1 (length x))
               (setf (fill-pointer x) desired-length)
               x))
           (seek (old str pos)
             (let ((q (position (aref old 0) str :start pos)))
               (and q (search old str :start2 q))))
           (subs (str old new)
             (loop for p = (seek old str 0) then (seek old str p)
                   while p do (replace str new :start1 p))
             str))
    (declare (inline min-power-of-two join seek subs)
             (ftype (function (fixnum) fixnum) min-power-of-two))
    (let* ((builder
             (make-array 16 :element-type 'character
                            :initial-contents "abcdefghefghefgh"
                            :fill-pointer 16))
           (ini (get-internal-real-time)))
      (declare (type (vector character) builder))
      (loop for i fixnum below (+ 1000 (* 4 1024 1024 (/ (length builder))))
            for j = builder then (subs (join j builder) "efgh" "____")
            for k fixnum = (* (length builder) i)
            when (= 0 (mod k (* 1024 256)))
              do (format t "~&~8,2F sec ~8D kB"
                         (/ (- (get-internal-real-time) ini) 1000)
                         (/ k 1024))))))
1.68 sec 256 kB
6.63 sec 512 kB
14.84 sec 768 kB
26.35 sec 1024 kB
41.01 sec 1280 kB
59.55 sec 1536 kB
82.85 sec 1792 kB
110.03 sec 2048 kB

The bottleneck is the search function, which is perhaps not well optimized in SBCL. The following version uses position to help it skip over impossible regions and is about 10 times as fast as your version #A on my machine:
(defun test/e ()
  (declare (optimize speed))
  (labels ((join (x y)
             (concatenate 'simple-base-string x y))
           (seek (old str pos)
             (let ((q (position (char old 0) str :start pos)))
               (and q (search old str :start2 q))))
           (subs (str old new)
             (loop for p = (seek old str 0) then (seek old str p)
                   while p do (replace str new :start1 p))
             str))
    (declare (inline join seek subs))
    (let* ((str (join "abcdefgh" "efghefgh"))
           (ini (get-internal-real-time)))
      (loop for i below (+ 1000 (* 4 1024 1024 (/ (length str))))
            for j = str then (subs (join j str) "efgh" "____")
            for k = (* (length str) i)
            when (= 0 (mod k (* 1024 256)))
              do (format t "~&~8,2F sec ~8D kB"
                         (/ (- (get-internal-real-time) ini) 1000)
                         (/ k 1024))))))

The tests on that page are indeed biased, so let's see by how much. The author claims to test string manipulation, but here's what the programs on that page actually test:
String concatenation
Memory management, either explicit (C) or implicit
In some languages, regular expressions
In others, string search algorithms and substring replacement
Memory access, which has bounds checks on several languages
That's way too many aspects in one go. Here's how it's being measured:
Real time in seconds
This is unfortunate, since the computer would have to be completely dedicated to running just this test to get reasonable values, without any other processes whatsoever - services, antivirus, browsers, even an idle *nix shell. CPU time would be much more useful; you could even run the tests in a virtual machine.
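In Common Lisp terms, that means timing with get-internal-run-time rather than get-internal-real-time; a minimal sketch (the helper name is mine, the benchmarks above don't define it):

(defun cpu-seconds (thunk)
  ;; Time a thunk in CPU seconds rather than wall-clock seconds.
  ;; GET-INTERNAL-RUN-TIME and INTERNAL-TIME-UNITS-PER-SECOND are
  ;; standard Common Lisp, which also avoids hard-coding the 1000
  ;; divisor used in the benchmark code above.
  (let ((start (get-internal-run-time)))
    (funcall thunk)
    (/ (- (get-internal-run-time) start)
       internal-time-units-per-second)))

;; Example: (float (cpu-seconds #'test)) => CPU seconds spent in TEST.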
Another aspect is that characters in C, C++, Perl, Python, PHP and Ruby are 8-bit, but they're 16-bit in many of the other tested languages. This means that memory usage is stressed in very different amounts, by at least a factor of 2. Here, cache misses are much more noticeable.
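Common Lisp has the same split between narrow and wide strings: base-char strings versus full character strings. A rough sketch, assuming an SBCL-like implementation where base-char strings use one octet per character and character strings use several:

;; Same text, two element types.  The exact widths are
;; implementation-dependent (one octet vs. four is an SBCL assumption),
;; but memory usage and cache pressure can easily differ by a factor
;; of a few.
(defparameter *narrow*
  (make-string (* 1024 1024) :element-type 'base-char :initial-element #\a))
(defparameter *wide*
  (make-string (* 1024 1024) :element-type 'character :initial-element #\a))
;; (type-of *narrow*) => (SIMPLE-BASE-STRING 1048576)     ; in SBCL
;; (type-of *wide*)   => (SIMPLE-ARRAY CHARACTER (1048576))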
I suspect the reason Perl is so fast is that it checks its arguments once before invoking a C function, instead of constantly checking bounds. Other languages with 8-bit strings are not so fast, but are still reasonably fast.
JavaScript V8 has strings that are ASCII if possible, so if the appended and replaced token was "ëfgh", you'd pay a lot more in that implementation.
Python 3 is almost three times slower than Python 2, and my guess is it's due to the wchar_t * vs char * internal representation of strings.
JavaScript SpiderMonkey uses 16-bit strings. I didn't dig into the source much, but the file jsstr.h mentions ropes.
Java is so slow because Strings are immutable, and so for this benchmark, it's definitely not the appropriate data type. You're paying the price of generating a huge string after each .replace(). I haven't tested, but probably a StringBuffer would be much faster.
So, this benchmark is to be taken not only with a grain of salt, but with a handful of it.
In Common Lisp, bounds checking and type dispatching in aref and its setf are probably the bottlenecks.
For good performance, you would have to ditch generic string sequences and use simple-strings or simple-vectors, whichever your implementation optimizes best. Then, you should have a way of making calls to schar or svref and their setfable forms that bypass bounds checking. From here, you can implement your own simple-string-search or simple-character-vector-search (and replace-simple-string or replace-simple-vector, although they play a much smaller role in this particular example) with full speed optimization and type declarations, with bounds checking at the head of each call instead of at each array access.
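As a rough sketch of what that looks like (a naive quadratic search under my own name, shown only for the shape of the declarations, not as a tuned implementation):

;; Validate bounds once at the head, then scan with SCHAR under
;; (safety 0).  The search itself is the naive O(n*m) one; the point
;; is the declaration style, not the algorithm.
(defun naive-simple-string-search (pattern text)
  (declare (type simple-string pattern text)
           (optimize (speed 3) (safety 0) (debug 0)))
  (let ((m (length pattern))
        (n (length text)))
    (declare (type fixnum m n))
    (when (<= m n)
      (loop for start of-type fixnum from 0 to (- n m)
            when (loop for j of-type fixnum from 0 below m
                       always (char= (schar text (+ start j))
                                     (schar pattern j)))
              do (return start)))))

;; (naive-simple-string-search "efgh" "abcdefghefgh") => 4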
A sufficiently smart compiler™ would do all of this for you given "proper" declarations. The problem is, you'd have to use (concatenate 'simple-string/simple-vector ...), because neither simple strings nor simple vectors are adjustable.
With a compacting/moving GC, there's always a penalty in these cases (e.g. array/object copying), and choosing between array adjustment and concatenation must really depend on profiling tests. Otherwise, adjustment can be way faster than concatenation, as long as there's enough free memory to grow the array.
You could use adjustable arrays, if the implementation accessed the actual elements directly after a brief bounds check at the head of optimized calls to (or expansions of) search and replace on adjustable arrays (e.g. by having internal definitions that take the actual underlying vector and start and end offsets).
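For illustration, a sketch of the adjustment-based alternative (the names are mine, and whether it actually beats concatenate is exactly the thing you'd have to profile):

;; Grow a single character buffer in place instead of concatenating:
;; an adjustable vector with a fill pointer, doubled when it runs out
;; of room.  ADJUST-ARRAY may still copy under a moving GC, which is
;; the penalty mentioned above.
(defun make-string-builder (&optional (capacity 16))
  (make-array capacity :element-type 'character
                       :adjustable t :fill-pointer 0))

(defun string-builder-append (builder piece)
  (declare (type string piece))
  (let ((needed (+ (fill-pointer builder) (length piece))))
    (when (> needed (array-total-size builder))
      (adjust-array builder (max needed (* 2 (array-total-size builder)))
                    :fill-pointer (fill-pointer builder)))
    (loop for ch across piece do (vector-push ch builder))
    builder))

;; (let ((b (make-string-builder)))
;;   (string-builder-append b "abcdefgh")
;;   (string-builder-append b "efghefgh")
;;   (coerce b 'string)) => "abcdefghefghefgh"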
But I'm speculating a lot here; you have to compile, inspect the compiled code, and profile in each implementation for real-world facts.
As a side note, the C example code is full of bugs, such as:
off-by-one (-1, actually) allocations: the strcat calls write an extra byte, the zero-terminated string terminator
an uninitialized zero-terminated string gstr (the first strcat only works by luck, because the memory might not be initialized to 0)
conversions from size_t and time_t to int, and the assumption of these types in a printf format string
an unused variable pos_c, initialized with the first allocation for gstr and incremented without taking into account that realloc may move the buffer
no error handling whatsoever

Related

First power of 2 that has leading decimal digits of 12 in Common Lisp

On the Rosetta Code website (http://rosettacode.org/wiki/First_power_of_2_that_has_leading_decimal_digits_of_12#C) there is the following challenge: find the first power of 2 that has leading decimal digits of 12.
I have written several versions in Common Lisp, but I fail miserably in terms of performance compared to what other languages report.
The following code is one of the many versions that I have tried.
(defun log10 (x) (/ (log x) (log 10)))

(defun first-digits (n l)
  (let* ((len-n (1+ (floor (log10 n))))
         (tens (expt 10 (- len-n l))))
    (truncate n tens)))

(defun p (rem n)
  (do* ((len-rem (1+ (floor (log10 rem))))
        (i 0 (1+ i))
        (k 1 (* 2 k)))
       ((= n 0) (1- i))
    (when (= rem (first-digits k len-rem))
      (decf n))))
The performance is really poor, but I refuse to admit that Common Lisp is slower than its competitors. Any idea how to achieve the run time of a few seconds reported for C#, C++, etc.?
Here is an answer which is now substantially equivalent to yours (based on the idea in this maths SE answer), but a little more general and a little more optimized. In particular:
it lets you specify the number you want to raise to a power;
it lets you specify the base you want to work in (so not just 10);
it knows that (log x b) is the log of x in base b;
it computes the log you need just once;
it reorganizes some of the float operations to avoid overflows (at some point your version will call (expt 10 n) with a big enough n that you'll get a floating-point overflow).
I think (in fact I'm reasonably sure) that the remaining thing that is making it slow is float consing. It ought to be possible to prevail upon a compiler to avoid this, and I spent a small amount of time trying but to no avail.
Anyway, here it is.
(defun p (L n &key (b 2) (base 10.0d0))
  ;; nth occurrence of a power of b with leading digits L in base:
  ;; returns the power (p below)
  (let ((k-1 (floor (log L base)))         ;(digits of L in base) - 1
        (logb (log (float b 1.0d0) base))) ;log of b in base
    (do* ((p 0 (1+ p))                                         ;power
          (d 1 (floor (expt base (+ (rem (* p logb) 1) k-1)))) ;digits
          (c (if (= L 1) 1 0)                                  ;count of hits
             (if (= L d) (1+ c) c)))
         ((= c n) p))))
Below is an earlier, useless answer: I've left it for posterity.
Note that the first part of what's below deals only with the two-leading-digit case, because I didn't bother looking at the original Rosetta Code task: see the end for a generalisation.
First of all, to pull out the first two decimal digits of something, you want to start by dividing it by a suitable power of 10 so that the result is a two-digit number. That means taking logs, but you can cheat: if you know the number of bits, b, in the number, then the number is 2^b plus change (assuming it's a positive integer), and finding the number of bits is very, very quick. And then if 10^x = 2^y, then x = y/(log_2 10), where log_2 is log base 2, and that divisor is a constant.
So you can write this function after fiddling around with pen and paper for a bit:
(defun leading-two-decimal-digits (n)
  (truncate (truncate n (expt 10 (1- (truncate (integer-length n)
                                               (load-time-value (log 10 2))))))
            10))
Note the use of load-time-value to compute the (log 10 2) just once. And note also that this relies on either having correct integer arithmetic or at least a function which will tell you what the number of bits in a number would be if you did have it.
So now
> (leading-two-decimal-digits 59468049869823987435)
5
9
OK, looks good (extensive testing...).
So now you just need to start from 1 and successively multiply by the base, looking for the leading two digits being the ones you care about. As a hack, if the base is 2 you can just left-shift: my original function assumed the base was 2 and always left-shifted; this one has a configurable base but still special-cases 2, which I suspect may no longer help at all.
(defun nth-ld-power (n &key (b 2) (leading 1) (second 2))
  ;; iterate is tfeb.github.io/tfeb-lisp-hax/, also should be in
  ;; Quicklisp
  (iterate looking ((v 1)  ;value
                    (p 0)  ;power (so we can report it)
                    (c 0)) ;hit count
    (multiple-value-bind (d1 d2) (leading-two-decimal-digits v)
      (if (and (= d1 leading) (= d2 second))
          (let ((c+ (1+ c)))
            (if (= c+ n)
                (values v p)
                (looking (if (= b 2) (ash v 1) (* v b)) (1+ p) c+)))
          (looking (if (= b 2) (ash v 1) (* v b)) (1+ p) c)))))
Again you would want to deal with the multiple-leading-digits case by passing a list of leading digits, but.
OK, so now:
> (time (nth-ld-power 2))
Timing the evaluation of (nth-ld-power 2)
User time = 0.000
System time = 0.000
Elapsed time = 0.000
Allocation = 2848 bytes
0 Page faults
GC time = 0.000
1208925819614629174706176
80
> (time (nth-ld-power 10))
Timing the evaluation of (nth-ld-power 10)
User time = 0.000
System time = 0.000
Elapsed time = 0.000
Allocation = 40704 bytes
0 Page faults
GC time = 0.000
124330809102446660538845562036705210025114037699336929360115994223289874253133343883264
286
OK, this is too quick to measure, let's make it do some real work:
> (time (nth-ld-power 1000))
Timing the evaluation of (nth-ld-power 1000)
User time = 3.264
System time = 0.039
Elapsed time = 3.384
Allocation = 845193424 bytes
517 Page faults
GC time = 0.020
12[...long number truncated here ...]
28745
So that's at least reasonable performance, I think. These figures were for LispWorks on a 2013 macbook. Obviously you can rewrite it without Tim's iterate, but I like it.
Here is a generalisation to the n-digit case.
(defun leading-decimal-digits (n m)
  ;; Return M leading digits from N: no sanity check
  (iterate peel ((q (truncate n (expt 10 (- (truncate (integer-length n)
                                                      (load-time-value (log 10 2)))
                                            (- m 1)))))
                 (digits '()))
    (multiple-value-bind (qq d) (truncate q 10)
      (cond
        ((zerop qq)
         (cons d digits))
        ((< qq 10)
         (list* qq d digits))
        (t
         (peel qq (cons d digits)))))))
(defun nth-power-with-leading-decimal-digits (n b leading)
  ;; Find the N'th occurrence of a power of B whose leading digits are
  ;; LEADING (a list of digits).  Return the value of B^P and P, the
  ;; power.
  (let ((nleading (length leading)))
    (iterate looking ((v 1)  ;value
                      (p 0)  ;power (so we can report it)
                      (c 0)) ;hit count
      (if (equal (leading-decimal-digits v nleading) leading)
          (let ((c+ (1+ c)))
            (if (= c+ n)
                (values v p)
                (looking (if (= b 2) (ash v 1) (* v b)) (1+ p) c+)))
          (looking (if (= b 2) (ash v 1) (* v b)) (1+ p) c)))))
So:
> (nth-power-with-leading-decimal-digits 2 2 '(1 2 8))
12855504354071922204335696738729300820177623950262342682411008
203
> (nth-power-with-leading-decimal-digits 2 3 '(2 7 8))
278954761343915929031866324148580803686773879062609352173281933430969939572023921519256921927359964084535215107090906022143908601839272147120823008337941481521208646465304746378648054338849857759629806700446921838039313884792762356955010344065298744426691826196079923777796821513452648753573059469525738664313409324728161550430310432705576201607066435772343529511415605105251217669677767280155388600975280964482318641251059803074701681895639266095014303466041595626938522373004626313927779288797843485898785327074755040298312905780373591994824646107875196292150692772609024142420597241695173176699988995274347532223981851419958515807471788303361043192244700492703668505222371093498892685289816397935636213340656919509760640657215827785179171568543835684943671352357398679232652259639784388841436515656182938820127024910154405205830179436914341715546500034485031465189386918271898229029201
1860
2^203 is the second occurrence of a power of 2 beginning with 128, and 3^1860 is the second occurrence of a power of 3 beginning 278. The second one took 0.017 seconds.

Scheme: recursive process much faster than iterative

I am studying SICP and wrote two procedures to compute the sum of 1/n^2, the first generating a recursive process and the second generating an iterative process:
(define (sum-rec a b)
  (if (> a b)
      0
      (exact->inexact (+ (/ 1 (* a a)) (sum-rec (1+ a) b)))))

(define (sum-it a b)
  (define (sum_iter a tot)
    (if (> a b)
        tot
        (sum_iter (1+ a) (+ (/ 1 (* a a)) tot))))
  (exact->inexact (sum_iter a 0)))
I tested that both procedures give exactly the same results when called with small values of b, and that the result approaches π²/6 as b gets larger, as expected.
But surprisingly, calling (sum-rec 1 250000) is almost instantaneous whereas calling (sum-it 1 250000) takes forever.
Is there an explanation for that?
As was mentioned in the comments, sum-it in its present form is adding numbers using exact arithmetic, which is slower than the inexact arithmetic being used in sum-rec. To do an equivalent comparison, this is how you should implement it:
(define (sum-it a b)
  (define (sum_iter a tot)
    (if (> a b)
        tot
        (sum_iter (1+ a) (+ (/ 1.0 (* a a)) tot))))
  (sum_iter a 0))
Notice that replacing the 1 with a 1.0 forces the interpreter to use inexact arithmetic. Now this will return immediately:
(sum-it 1 250000)
=> 1.6449300668562465
You can reframe both of these versions so that they do exact or inexact arithmetic appropriately, simply by controlling what value they use for zero and relying on the contagion rules. These two are in Racket, which doesn't have 1+ by default but does have a nice syntax for optional arguments with defaults:
(define (sum-rec low high (zero 0.0))
  (let recurse ([i low])
    (if (> i high)
        zero
        (+ (/ 1 (* i i)) (recurse (+ i 1))))))

(define (sum-iter low high (zero 0.0))
  (let iterate ([i low] [accum zero])
    (if (> i high)
        accum
        (iterate (+ i 1) (+ (/ 1 (* i i)) accum)))))
The advantage of this is you can see the performance difference easily for both versions. The disadvantage is that you'd need a really smart compiler to be able to optimize the numerical operations here (I think, even if it knew low and high were machine integers, it would have to infer that zero is going to be some numerical type and generate copies of the function body for all the possible types).

Improving performance for converting numbers to lists, and base10 to base2

Many Project Euler problems require manipulating integers and their digits, both in base10 and base2. While I have no problem with converting integers into lists of digits, or converting base10 into base2 (or lists of their digits), I often find that performance is poor when doing such conversions repeatedly.
Here's an example:
First, here are my typical conversions:
#lang racket
(define (10->bin num)
  (define (10->bin-help num count)
    (define sq (expt 2 count))
    (cond
      [(zero? count) (list num)]
      [else (cons (quotient num sq) (10->bin-help (remainder num sq) (sub1 count)))]))
  (member 1 (10->bin-help num 19)))

(define (integer->lon int)
  (cond
    [(zero? int) empty]
    [else (append (integer->lon (quotient int 10)) (list (remainder int 10)))]))
Next, I need a function to test whether a list of digits is a palindrome:
(define (is-palindrome? lon)
  (equal? lon (reverse lon)))
Finally, I need to sum all base10 integers below some max that are palindromes in base2 and base10. Here's the accumulator-style function:
(define (sum-them max)
  (define (sum-acc count acc)
    (define base10 (integer->lon count))
    (define base2 (10->bin count))
    (cond
      [(= count max) acc]
      [(and (is-palindrome? base10)
            (is-palindrome? base2))
       (sum-acc (add1 count) (+ acc count))]
      [else (sum-acc (add1 count) acc)]))
  (sum-acc 1 0))
And the regular recursive version:
(define (sum-them* max)
  (define base10 (integer->lon max))
  (define base2 (10->bin max))
  (cond
    [(zero? max) 0]
    [(and (is-palindrome? base10)
          (is-palindrome? base2))
     (+ (sum-them* (sub1 max)) max)]
    [else (sum-them* (sub1 max))]))
When I apply either of these two last functions to 1000000, it takes well over 10 seconds to complete. The recursive version seems a bit quicker than the accumulator version, but the difference is negligible.
Is there any way I can improve this code, or do I just have to accept that this is the style of number-crunching for which Racket isn't particularly suited?
So far, I have considered the possibility of replacing integer->lon by a similar integer->vector as I expect vector-append to be faster than append, but then I'm stuck with the need to apply reverse later on.
Making your existing code more efficient
Have you considered getting the list of bits using any of Racket's bitwise operations? E.g.,
(define (bits n)
  (let loop ((n n) (acc '()))
    (if (= 0 n)
        acc
        (loop (arithmetic-shift n -1) (cons (bitwise-and n 1) acc)))))
> (map bits '(1 3 4 5 7 9 10))
'((1) (1 1) (1 0 0) (1 0 1) (1 1 1) (1 0 0 1) (1 0 1 0))
It'd be interesting to see whether that speeds anything up. I expect it would help a bit, since your 10->bin procedure currently makes a call to expt, quotient, and remainder, whereas bit shifting, depending on the representations used by the compiler, would probably be more efficient.
Your integer->lon is also using a lot more memory than it needs to, since the append is copying most of the result at each step. This is kind of interesting, because you were already using the more memory-efficient approach in 10->bin. Something like this is more efficient:
(define (digits n)
  (let loop ((n n) (acc '()))
    (if (zero? n)
        acc
        (loop (quotient n 10) (cons (remainder n 10) acc)))))
> (map digits '(1238 2391 3729))
'((1 2 3 8) (2 3 9 1) (3 7 2 9))
More efficient approaches
All that said, perhaps you should reconsider the approach that you're using. It appears that right now, you're iterating through the numbers 1…MAX, checking whether each one is a palindrome, and if it is, adding it to the sum. That means you're doing something with MAX numbers, all in all. Rather than checking for palindromic numbers, why not just generate them directly in one base and then check whether they're a palindrome in the other? I.e., instead of checking 1…MAX, check:
1
11
101, and 111
1001, and 1111
10001, 10101, 11011, and 11111,
and so on, up until the numbers are too big.
This list is all the binary palindromes, and only some of those will be decimal palindromes. If you can generate the binary palindromes using bit-twiddling techniques (so you're actually working with the integers), it's easy to write those to a string, and checking whether a string is a palindrome is probably much faster than checking whether a list is a palindrome.
Are you running these timings in DrRacket by any chance? The IDE slows down things quite a bit, especially if you happen to have debugging and/or profiling turned on, so I'd recommend doing these tests from the command line.
Also, you can usually improve the brute-force approach. For example, you can say here that we only have to consider odd numbers, because even numbers are never palindromes when expressed in binary (they end in a trailing 0, but the way you represent them there's never a leading 0). This halves the execution time regardless of the algorithm.
Your code runs on my laptop in 2.4 seconds. I wrote an alternative version using strings and built-in functions that runs in 0.53 seconds (including Racket startup; execution time in Racket is 0.23 seconds):
#!/usr/bin/racket
#lang racket
(define (is-palindrome? lon)
  (let ((lst (string->list lon)))
    (equal? lst (reverse lst))))

(define (sum-them max)
  (for/sum ((i (in-range 1 max 2))
            #:when (and (is-palindrome? (number->string i))
                        (is-palindrome? (number->string i 2))))
    i))
(time (sum-them 1000000))
yields
pu#pumbair: ~/Projects/L-Racket time ./speed3.rkt
cpu time: 233 real time: 233 gc time: 32
872187
real 0m0.533s
user 0m0.472s
sys 0m0.060s
and I'm pretty sure that people with more experience in Racket profiling will come up with faster solutions.
So I could give you the following tips:
Think about how you may improve the brute force approach
Get to know your language better. Some constructs are faster than others for no apparent reason
see http://docs.racket-lang.org/guide/performance.html and http://jeapostrophe.github.io/2013-08-19-reverse-post.html
use parallelism when applicable
Get used to the Racket profiler
N.B. Your 10->bin function returns #f for the value 0; I guess it should return '(0).

How to write a Scheme program that consumes n and sum as parameters, and shows all the numbers (from 1 to n) that could sum to sum?

How do I write a Scheme program that consumes n and sum as parameters, and shows all the combinations of numbers (from 1 to n) that add up to sum? Like this:
(find 10 10)
((10)
(9 , 1)
(8 , 2)
(7 , 3)
(7 ,2 , 1)
(6 ,4)
(6 , 3, 1)
(5 , 4 , 1)
(5 , 3 , 2)
(4 ,3 ,2 ,1))
I found one:
(define (find n sum)
  (cond ((<= sum 0) (list '()))
        ((<= n 0) '())
        (else (append
               (find (- n 1) sum)
               (map (lambda (x) (cons n x))
                    (find (- n 1) (- sum n)))))))
But it's inefficient, and I want a better one. Thank you.
The algorithm you are looking for is known as an integer partition. I have a couple of implementations at my blog.
EDIT: Oscar properly chastised me for my incomplete answer. As penance, I offer this answer, which will hopefully clarify a few things.
I like Oscar's use of streams -- as the author of SRFI-41 I should. But expanding the powerset only to discard most of the results seems a backward way of solving the problem. And I like the simplicity of GoZoner's answer, but not its inefficiency.
Let's start with GoZoner's answer, which I reproduce below with a few small changes:
(define (fs n s)
  (if (or (<= n 0) (<= s 0)) (list)
      (append (if (= n s) (list (list n))
                  (map (lambda (xs) (cons n xs))
                       (fs (- n 1) (- s n))))
              (fs (- n 1) s))))
This produces a list of the output sets:
> (fs 10 10)
((10) (9 1) (8 2) (7 3) (7 2 1) (6 4) (6 3 1) (5 4 1) (5 3 2) (4 3 2 1))
A simple variant of that function produces the count instead of a list of sets, which shall be the focus of the rest of this answer:
(define (f n s)
  (if (or (<= s 0) (<= n 0)) 0
      (+ (if (= n s) 1
             (f (- n 1) (- s n)))
         (f (- n 1) s))))
And here is a sample run of the function, including timings on my ancient and slow home computer:
> (f 10 10)
10
> (time (f 100 100))
(time (f 100 ...))
no collections
1254 ms elapsed cpu time
1435 ms elapsed real time
0 bytes allocated
444793
That's quite slow; although it is fine for small inputs, it would be intolerable to evaluate (f 1000 1000), as the algorithm is exponential. The problem is the same as with the naive fibonacci algorithm; the same sub-problems are re-computed again and again.
A common solution to that problem is memoization. Fortunately, we are programming in Scheme, which makes it easy to encapsulate memoization in a macro:
(define-syntax define-memoized
  (syntax-rules ()
    ((_ (f args ...) body ...)
     (define f
       (let ((results (make-hash hash equal? #f 997)))
         (lambda (args ...)
           (let ((result (results 'lookup (list args ...))))
             (or result
                 (let ((result (begin body ...)))
                   (results 'insert (list args ...) result)
                   result)))))))))
We use hash tables from my Standard Prelude and the universal hash function from my blog. Then it is a simple matter to write the memoized version of the function:
(define-memoized (f n s)
  (if (or (<= s 0) (<= n 0)) 0
      (+ (if (= n s) 1
             (f (- n 1) (- s n)))
         (f (- n 1) s))))
Isn't that pretty? The only change is the addition of -memoized in the definition of the function; all of the parameters and the body of the function are the same. But the performance improves greatly:
> (time (f 100 100))
(time (f 100 ...))
no collections
62 ms elapsed cpu time
104 ms elapsed real time
1028376 bytes allocated
444793
That's an order-of-magnitude improvement with virtually no effort.
But that's not all. Since we know that the problem has "optimal substructure" we can use dynamic programming. Memoization works top-down, and must suspend the current level of recursion, compute (either directly or by lookup) the lower-level solution, then resume computation in the current level of recursion. Dynamic programming, on the other hand, works bottom-up, so sub-solutions are always available when they are needed. Here's the dynamic programming version of our function:
(define (f n s)
  (let ((fs (make-matrix (+ n 1) (+ s 1) 0)))
    (do ((i 1 (+ i 1))) ((< n i))
      (do ((j 1 (+ j 1))) ((< s j))
        (matrix-set! fs i j
          (+ (if (= i j)
                 1
                 (matrix-ref fs (- i 1) (max (- j i) 0)))
             (matrix-ref fs (- i 1) j)))))
    (matrix-ref fs n s)))
We used the matrix functions of my Standard Prelude. That's more work than just adding -memoized to an existing function, but the payoff is another order-of-magnitude reduction in run time:
> (time (f 100 100))
(time (f 100 ...))
no collections
4 ms elapsed cpu time
4 ms elapsed real time
41624 bytes allocated
444793
> (time (f 1000 1000))
(time (f 1000 ...))
3 collections
649 ms elapsed cpu time, including 103 ms collecting
698 ms elapsed real time, including 132 ms collecting
15982928 bytes allocated, including 10846336 bytes reclaimed
8635565795744155161506
We’ve gone from 1254ms to 4ms, which is a rather astonishing range of improvement; the final program is O(ns) in both time and space. You can run the program at http://programmingpraxis.codepad.org/Y70sHPc0, which includes all the library code from my blog.
As a special bonus, here is another version of the define-memoized macro. It uses a-lists rather than hash tables, so it's very much slower than the version given above, but when the underlying computation is time-consuming, and you just want a simple way to improve it, this may be just what you need:
(define-syntax define-memoized
  (syntax-rules ()
    ((define-memoized (f arg ...) body ...)
     (define f
       (let ((cache (list)))
         (lambda (arg ...)
           (cond ((assoc `(,arg ...) cache) => cdr)
                 (else (let ((val (begin body ...)))
                         (set! cache (cons (cons `(,arg ...) val) cache))
                         val)))))))))
This is a good use of quasi-quotation and the => operator in a cond clause for those who are just learning Scheme. I can't remember when I wrote that function -- I've had it lying around for years -- but it has saved me many times when I just needed a quick-and-dirty memoization and didn't care to worry about hash tables and universal hash functions.
This answer will appear tomorrow at my blog. Please drop in and have a look around.
This is similar to, but not exactly like, the integer partition problem or the subset sum problem. It's not the integer partition problem, because an integer partition allows for repeated numbers (here we're only allowing for a single occurrence of each number in the range).
And although it's more similar to the subset sum problem (which can be solved more-or-less efficiently by means of dynamic programming), the solution would need to be adapted to generate all possible subsets of numbers that add to the given number, not just one subset as in the original formulation of that problem. It's possible to implement a dynamic programming solution using Scheme, but it'll be a bit cumbersome, unless a matrix library or something similar is used for implementing a mutable table.
Here's another possible solution, this time generating the power set of the range [1, n] and checking each subset in turn to see if its sum equals the expected value. It's still a brute-force approach, though:
; helper procedure for generating a list of numbers in the range [start, end]
(define (range start end)
  (let loop ((acc '())
             (i end))
    (if (< i start)
        acc
        (loop (cons i acc) (sub1 i)))))

; helper procedure for generating the power set of a given list
(define (powerset set)
  (if (null? set)
      '(())
      (let ((rest (powerset (cdr set))))
        (append (map (lambda (element) (cons (car set) element))
                     rest)
                rest))))

; the solution is simple using the above procedures
(define (find n sum)
  (filter (lambda (s) (= sum (apply + s)))
          (powerset (range 1 n))))
; test it, it works!
(find 10 10)
=> '((1 2 3 4) (1 2 7) (1 3 6) (1 4 5) (1 9) (2 3 5) (2 8) (3 7) (4 6) (10))
UPDATE
The previous solution will produce correct results, but it's inefficient in memory usage because it generates the whole list of the power set, even though we're interested only in some of the subsets. In Racket Scheme we can do a lot better and generate only the values as needed if we use lazy sequences, like this (but be aware - the first solution is still faster!):
; it's the same power set algorithm, but using lazy streams
(define (powerset set)
  (if (stream-empty? set)
      (stream '())
      (let ((rest (powerset (stream-rest set))))
        (stream-append
         (stream-map (lambda (e) (cons (stream-first set) e))
                     rest)
         rest))))

; same algorithm as before, but using lazy streams
(define (find n sum)
  (stream-filter (lambda (s) (= sum (apply + s)))
                 (powerset (in-range 1 (add1 n)))))

; convert the resulting stream into a list, for displaying purposes
(stream->list (find 10 10))
=> '((1 2 3 4) (1 2 7) (1 3 6) (1 4 5) (1 9) (2 3 5) (2 8) (3 7) (4 6) (10))
Your solution is generally correct except you don't handle the (= n s) case. Here is a solution:
(define (find n s)
  (cond ((or (<= s 0) (<= n 0)) '())
        (else (append (if (= n s)
                          (list (list n))
                          (map (lambda (rest) (cons n rest))
                               (find (- n 1) (- s n))))
                      (find (- n 1) s)))))
> (find 10 10)
((10) (9 1) (8 2) (7 3) (7 2 1) (6 4) (6 3 1) (5 4 1) (5 3 2) (4 3 2 1))
I wouldn't claim this as particularly efficient - it is not tail recursive nor does it memoize results. Here is a performance result:
> (time (length (find 100 100)))
running stats for (length (find 100 100)):
10 collections
766 ms elapsed cpu time, including 263 ms collecting
770 ms elapsed real time, including 263 ms collecting
345788912 bytes allocated
444793
>

Scheme - Memory System

I am trying to make a memory system where you store something in a slot of memory. What I am doing is making an alist where the car of each pair is the memory location and the cdr is the val. I need the program to understand two messages, read and write: read just displays the selected memory location and the val assigned to that location, and write changes the val at that location or address. How do I make my code read the location I want it to and write to the location I want it to? Feel free to test this yourself. Any help would be much appreciated. This is what I have:
(define make-memory
  (lambda (n)
    (letrec ((mem '())
             (dump (display mem)))
      (lambda ()
        (if (= n 0)
            (cons (cons n 0) mem) mem)
        (cons (cons (- n 1) 0) mem))
      (lambda (msg loc val)
        (cond
          ((equal? msg 'read) (display (cons n val)) (set! n (- n 1)))
          ((equal? msg 'write) (set! mem (cons val loc)) (set! n (- n 1)) (display mem)))))))
(define mymem (make-memory 100))
A possible solution:
(define (make-memory n)
  (let ([memory (make-vector n 0)]
        [protect (lambda (addr)
                   (if (and (addr . >= . 0) (addr . < . n))
                       addr
                       (error "access to outside of memory")))])
    (match-lambda*
      [`(read ,addr) (cons addr (vector-ref memory (protect addr)))]
      [`(write ,addr ,x) (vector-set! memory (protect addr) x)])))
This has the added benefit of not using alists (for speed) and protecting against malicious attempts to access stuff outside the preallocated range ;).
Works as desired:
> (define mem (make-memory 10))
> (mem 'read 0)
(0 . 0)
> (mem 'read 2)
(2 . 0)
> (mem 'write 2 10)
> (mem 'read 2)
(2 . 10)
> (mem 'read 100)
access to outside of memory
This might be a bit hard to grasp if you're just starting out with Scheme.
You can read more about match-lambda and friends here.
Vectors are Scheme's equivalent of arrays in other languages (read this).
