So I have code that should find the average of a list but I see no printed value in console
#lang racket
(define x (list '10 '60 '3 '55 '15 '45 '40))
(define (average x)
(/ (sum x) (length x)))
(display average (current-output-port))
(define (sum)
(if (null? x)
0
(+ (car x) (sum (cdr x)))))
it simply displays
#<procedure:average>
Your code has following problems:
sum was used before it was defined.
sum did not take a parameter.
average function was not evaluated in display.
I have used exact->inexact because that's what I think your intention is.
Following works.
(define x (list 10 60 3 55 15 45 40))
(define (sum x)
(if (null? x)
0
(+ (car x) (sum (cdr x)))))
(define (average x)
(/ (sum x) (length x)))
(display (exact->inexact (average x)) (current-output-port))
(define (average l)
(/ (foldr (lambda (x y) (+ x y)) 0 l)
(length l)))
sum and length are each O(n) resulting in a O(2n) process for average. Below we show how continuation passing style can be used to make average a O(n) process as well.
(define (average xs (return /))
(if (empty? xs)
(return 0 0)
(average (cdr xs)
(lambda (sum len)
(return (+ sum (car xs))
(+ len 1))))))
(printf "~a~n" (average '(10 60 3 55 15 45 40)))
;; 228/7
Using exact->inexact in average means only an inexact result can be returned. Making additional computations with inexact numbers leads to additional inexactitude. You might think that inexact->exact could reverse any of this however it can only make an approximation.
(average '(10 60 3 55 15 45 40)
;; 32 4/7
(inexact->exact (exact->inexact (average '(10 60 3 55 15 45 40))))
;; 32 40210710958665/70368744177664
For this reason it generally make sense only to convert an exact number to an inexact one just before it is displayed.
(printf "~a\n" (exact->inexact (average '(10 60 3 55 15 45 40))))
;; 32.57142857142857
Our average procedure also throws an error when an empty list is given.
(average '())
;; error /: division by zero
Alternatively, we could write average using a named let expression. Also O(n).
(define (average xs)
(let loop ((xs xs)
(sum 0)
(len 0))
(if (empty? xs)
(/ sum len)
(loop (cdr xs)
(+ sum (car xs))
(+ len 1)))))
(average '(10 60 3 55 15 45 40)
;; 32 4/7
I might be using a different version of scheme; https://scheme.cs61a.org/ is the compiler I use. It worked for me when I included the a parameter for the function sum like this:
(define (sum x)
(if (null? x)
0
(+ (car x) (sum (cdr x)))))
Hope this helps!
I have a lisp program that converts Roman Numerals into decimal form. It works great for valid inputs, however I'm not sure how to check to see if the input is a valid Roman Numeral. Currently when given an invalid input ("MIM") for example, it still tries to incorrectly convert it. I need it to instead return an ERROR message.
(defun mapRomanToDecimal (chars nums string)
(loop as char across string
as i = (position char chars)
collect (and i (nth i nums))))
(defun parseThroughRoman (R)
(loop with nums = (mapRomanToDecimal "IVXLCDM" '(1 5 10 50 100 500 1000) R)
as (A B) on nums if A sum (if (and B (< A B)) (- A) A)))
(defun romanToDecimal (RomanNumeral)
(format t "~d~%" (parseThroughRoman (numlist-to-string RomanNumeral))))
(defun numlist-to-string (lst)
(when lst
(concatenate 'string
(write-to-string (car lst)) (numlist-to-string (cdr lst)))))
(romanToDecimal '(C D V)) -> 405
(romanToDecimal '(M I M)) -> 1999
A little bit about style...
data type conversion is often not necessary
code can easily be more generic
Example:
(defvar *roman-chars* "IVXLCDM")
(defvar *roman-nums* '(1 5 10 50 100 500 1000))
(defun roman-numeral-to-decimal (roman-numeral)
(let ((i (position (coerce roman-numeral 'character) *roman-chars*)))
(and i (nth i *roman-nums*))))
(defun map-roman-numerals-to-decimal (roman-numerals)
(map 'list #'roman-numeral-to-decimal roman-numerals))
(defun roman-to-decimal (roman)
(loop as (A B) on (map-roman-numerals-to-decimal roman)
if A sum (if (and B (< A B)) (- A) A)))
This means you can use it with lists of symbols/characters/strings, strings, vectors symbols/characters/strings:
CL-USER 20 > (roman-to-decimal '(C D V))
405
CL-USER 21 > (roman-to-decimal '("C" "D" "V"))
405
CL-USER 22 > (roman-to-decimal '(#\C #\D #\V))
405
CL-USER 23 > (roman-to-decimal "CDV")
405
CL-USER 24 > (roman-to-decimal #(c d v))
405
So I'm having a hard time trying to write this program.
The scope is a program that will take a large number, (say 1,000,000) and split it into its digits (ie 1,500,310 -> 1 million 500 thousand 3 hundred 1 ten 0 one).
#lang r5rs
(define (three_names x)
(let loop ((x x)
(myList '()))
(if (< x 10)
(cons x myList)
(loop (quotient x 10)
(cons (remainder x 10) myList)))))
I've gotten it so that it will loop and return these values each into a list with some help from stackoverflow.
(i.e. this will take say 100 and put it in '(1 0 0) which is helpful but not helpful enough. I've been banging my head since Wednesday!)
I would go for something like this:
#lang r5rs
(define (three-names n)
(let loop ((n n)
(units '((10 "one") (10 "ten") (10 "hundred") (1000 "thousand") (1000 "million")))
(res ""))
(if (or (zero? n) (null? units))
res
(let* ((unit (car units)) (div (car unit)) (txt (cadr unit)))
(let ((q (quotient n div)) (r (remainder n div)))
(loop q
(cdr units)
(string-append (number->string r) " " txt " " res)))))))
testing:
> (three-names 1500310)
"1 million 500 thousand 3 hundred 1 ten 0 one "
> (three-names 1001)
"1 thousand 0 hundred 0 ten 1 one "
EDIT
An alternative version that
works for the intial value 0 (the previous procedure wouldn't return anything in that case)
works with symbols and returns a list rather than a string, as asked in your comment
has an additional display in the loop that wlll make it clearer how the "units" are chosen:
could be:
(define (three-names n)
(if (zero? n)
"zero"
(let loop ((n n)
(units '((10 one) (10 ten) (10 hundred) (1000 thousand) (1000 million)))
(res '()))
(display n) (display " - ") (display res) (display " - ") (display units) (newline)
(if (or (zero? n) (null? units))
res
(let* ((unit (car units)) (div (car unit)) (txt (cadr unit)))
(let ((q (quotient n div)) (r (remainder n div)))
(loop q
(cdr units)
(cons r (cons txt res)))))))))
now:
> (display (three-names 1500310))
1500310 - () - ((10 one) (10 ten) (10 hundred) (1000 thousand) (1000 million))
150031 - (0 one) - ((10 ten) (10 hundred) (1000 thousand) (1000 million))
15003 - (1 ten 0 one) - ((10 hundred) (1000 thousand) (1000 million))
1500 - (3 hundred 1 ten 0 one) - ((1000 thousand) (1000 million))
1 - (500 thousand 3 hundred 1 ten 0 one) - ((1000 million))
0 - (1 million 500 thousand 3 hundred 1 ten 0 one) - ()
(1 million 500 thousand 3 hundred 1 ten 0 one)
This is a follow-up question, sort of, to this one: Write an efficient string replacement function? .
In (albeit distant) future I hope to get to do natural language processing. Of course speed of strings manipulation is important because of that. Accidentally, I've stumbled over this test: http://raid6.com.au/~onlyjob/posts/arena/ - all tests are biased, this is no exception. However, it raised important question for me. And so I wrote a few tests to see how am I doing:
This was my first attempt (I'll call it #A):
#A
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln = (* (length addidtion) i)
for accumulated = addidtion
then (loop with concatenated = (concatenate 'string accumulated addidtion)
for start = (search "efgh" concatenated)
while start do (replace concatenated "____" :start1 start)
finally (return concatenated))
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Baffled with the results, I tried to use cl-ppcre - I don't know what I was hoping for, but the results came out as really bad... Here's the code I used for testing:
#B
(ql:quickload "cl-ppcre")
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln = (* (length addidtion) i)
for accumulated = addidtion
then (cl-ppcre:regex-replace-all "efgh" (concatenate 'string accumulated addidtion) "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Well, then, in hopes to maybe side-step some generalizations, I decided to write my own, albeit somewhat naive version:
#C
(defun replace-all (input match replacement)
(declare (type string input match replacement)
(optimize (debug 0) (safety 0) (speed 3)))
(loop with pattern fixnum = (1- (length match))
with i fixnum = pattern
with j fixnum = i
with len fixnum = (length input) do
(cond
((>= i len) (return input))
((zerop j)
(loop do
(setf (aref input i) (aref replacement j) i (1+ i))
(if (= j pattern)
(progn (incf i pattern) (return))
(incf j))))
((char= (aref input i) (aref match j))
(decf i) (decf j))
(t (setf i (+ i 1 (- pattern j)) j pattern)))))
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln fixnum = (* (length addidtion) i)
for accumulated string = addidtion
then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Almost as slow as cl-ppcre! Now, that's incredible! There isn't anything I can spot here such that would result in such poor performance... And still it does suck :(
Realizing that the standard functions performed the best so far, I looked into SBCL source and after some reading I came up with this:
#D
(defun replace-all (input match replacement &key (start 0))
(declare (type simple-string input match replacement)
(type fixnum start)
(optimize (debug 0) (safety 0) (speed 3)))
(loop with input-length fixnum = (length input)
and match-length fixnum = (length match)
for i fixnum from 0 below (ceiling (the fixnum (- input-length start)) match-length) do
(loop with prefix fixnum = (+ start (the fixnum (* i match-length)))
for j fixnum from 0 below match-length do
(when (<= (the fixnum (+ prefix j match-length)) input-length)
(loop for k fixnum from (+ prefix j) below (the fixnum (+ prefix j match-length))
for n fixnum from 0 do
(unless (char= (aref input k) (aref match n)) (return))
finally
(loop for m fixnum from (- k match-length) below k
for o fixnum from 0 do
(setf (aref input m) (aref replacement o))
finally
(return-from replace-all
(replace-all input match replacement :start k))))))
finally (return input)))
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln fixnum = (* (length addidtion) i)
for accumulated string = addidtion
then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Finally, I can win, although a tiny fraction of performance against the standard library - yet it is still very-very bad compared to almost everything else...
Here's the table with the results:
| SBCL #A | SBCL #B | SBCL #C | SBCL #D | C gcc 4 -O3 | String size |
|-----------+-----------+------------+-----------+-------------+-------------|
| 17.463 s | 166.254 s | 28.924 s | 16.46 s | 1 s | 256 Kb |
| 68.484 s | 674.629 s | 116.55 s | 63.318 s | 4 s | 512 Kb |
| 153.99 s | gave up | 264.927 s | 141.04 s | 10 s | 768 Kb |
| 275.204 s | . . . . . | 474.151 s | 251.315 s | 17 s | 1024 Kb |
| 431.768 s | . . . . . | 745.737 s | 391.534 s | 27 s | 1280 Kb |
| 624.559 s | . . . . . | 1079.903 s | 567.827 s | 38 s | 1536 Kb |
Now, the question: What did I do wrong? Is this something inherent to Lisp strings? Can this probably be mitigated through... what?
In the long shot, I'd even consider writing a specialized library for string processing. If the problem isn't my bad code, but rather the implementation. Would it make sense to do so? If yes, what language would you suggest for doing it?
EDIT: Just for the record, I'm now trying to use this library: https://github.com/Ramarren/ropes to deal with strings concatenation. Unfortunately, it doesn't have a replace function in it and doing multiple replaces isn't very trivial. But I'll keep this post updated when I have something.
I've tried to slightly change huaiyuan's variant to use array's fill-pointers instead of string concatenation (to achieve something similar to StringBuilder suggested by Paulo Madeira. It probably can be optimized further, but I'm not sure about the types / which will method be faster / will it be worth to redefine types for * and + to get them to only operate on fixnum or signed-byte. Anyway, here's the code and the benchmark:
(defun test/e ()
(declare (optimize speed))
(labels ((min-power-of-two (num)
(declare (type fixnum num))
(decf num)
(1+
(progn
(loop for i fixnum = 1 then (the (unsigned-byte 32) (ash i 1))
while (< i 17) do
(setf num
(logior
(the fixnum
(ash num (the (signed-byte 32)
(+ 1 (the (signed-byte 32)
(lognot i)))))) num))) num)))
(join (x y)
(let ((capacity (array-dimension x 0))
(desired-length (+ (length x) (length y)))
(x-copy x))
(declare (type fixnum capacity desired-length)
(type (vector character) x y x-copy))
(when (< capacity desired-length)
(setf x (make-array
(min-power-of-two desired-length)
:element-type 'character
:fill-pointer desired-length))
(replace x x-copy))
(replace x y :start1 (length x))
(setf (fill-pointer x) desired-length) x))
(seek (old str pos)
(let ((q (position (aref old 0) str :start pos)))
(and q (search old str :start2 q))))
(subs (str old new)
(loop for p = (seek old str 0) then (seek old str p)
while p do (replace str new :start1 p))
str))
(declare (inline min-power-of-two join seek subs)
(ftype (function (fixnum) fixnum) min-power-of-two))
(let* ((builder
(make-array 16 :element-type 'character
:initial-contents "abcdefghefghefgh"
:fill-pointer 16))
(ini (get-internal-real-time)))
(declare (type (vector character) builder))
(loop for i fixnum below (+ 1000 (* 4 1024 1024 (/ (length builder))))
for j = builder then
(subs (join j builder) "efgh" "____")
for k fixnum = (* (length builder) i)
when (= 0 (mod k (* 1024 256)))
do (format t "~&~8,2F sec ~8D kB"
(/ (- (get-internal-real-time) ini) 1000)
(/ k 1024))))))
1.68 sec 256 kB
6.63 sec 512 kB
14.84 sec 768 kB
26.35 sec 1024 kB
41.01 sec 1280 kB
59.55 sec 1536 kB
82.85 sec 1792 kB
110.03 sec 2048 kB
The bottle-neck is the search function, which is perhaps not optimized in SBCL. The following version uses position to help it skip over impossible region and is about 10 times as fast as your version #A on my machine:
(defun test/e ()
(declare (optimize speed))
(labels ((join (x y)
(concatenate 'simple-base-string x y))
(seek (old str pos)
(let ((q (position (char old 0) str :start pos)))
(and q (search old str :start2 q))))
(subs (str old new)
(loop for p = (seek old str 0) then (seek old str p)
while p do (replace str new :start1 p))
str))
(declare (inline join seek subs))
(let* ((str (join "abcdefgh" "efghefgh"))
(ini (get-internal-real-time)))
(loop for i below (+ 1000 (* 4 1024 1024 (/ (length str))))
for j = str then (subs (join j str) "efgh" "____")
for k = (* (length str) i)
when (= 0 (mod k (* 1024 256)))
do (format t "~&~8,2F sec ~8D kB"
(/ (- (get-internal-real-time) ini) 1000)
(/ k 1024))))))
The tests in that page are indeed biased, so let's see by how much. The author claims to test string manipulation, but here's what the programs in that page test:
String concatenation
Memory management, either explicit (C) or implicit
In some languages, regular expressions
In others, string search algorithms and substring replacement
Memory access, which has bounds checks on several languages
There are way too many aspects just here. Here's how it's being measured:
Real time in seconds
This is unfortunate, since the computer had to be completely dedicated to running just this test for reasonable values, without any other processes whatsoever, such as services, antiviruses, browsers, even a waiting *nix shell. CPU time would be much more useful, you could even run the tests in a virtual machine.
Another aspect is that characters in C, C++, Perl, Python, PHP and Ruby are 8-bit, but they're 16-bit in many of the other tested languages. This means that memory usage is stressed in very different amounts, by at least a factor of 2. Here, cache misses are much more noticeable.
I suspect the reason Perl is so fast is that it checks its arguments once before invoking a C function, instead of constantly checking bounds. Other languages with 8-bit strings are not so fast, but are still reasonably fast.
JavaScript V8 has strings that are ASCII if possible, so if the appended and replaced token was "ëfgh", you'd pay a lot more in that implementation.
Python 3 is almost three times slower than Python 2, and my guess is it's due to the wchar_t * vs char * internal representation of strings.
JavaScript SpiderMonkey uses 16-bit strings. I didn't dig the sourced much, but the file jsstr.h mentions ropes.
Java is so slow because Strings are immutable, and so for this benchmark, it's definitely not the appropriate data type. You're paying the price of generating a huge string after each .replace(). I haven't tested, but probably a StringBuffer would be much faster.
So, this benchmark is to be taken not only with a grain of salt, but with a handful of it.
In Common Lisp, bounds checking and type dispatching in aref and its setf are probably the bottlenecks.
For good performance, you would have to ditch generic string sequences and use simple-strings or simple-vectors, whichever your implementation optimizes best. Then, you should have a way of making calls to schar or svref and their setfable forms that bypass bounds checking. From here, you can implement your own simple-string-search or simple-character-vector-search (and replace-simple-string or replace-simple-vector, although they play a much smaller role in this particular example) with full speed optimization and type declarations, with bounds checking at the head of each call instead of at each array access.
A sufficiently smart compiler™ would do all of this for you given "proper" declarations. The problem is, you'd have to use (concatenate 'simple-string/simple-vector ...), because neither simple strings nor simple vectors are adjustable.
With a compacting/moving GC, there's always a penalty in these cases (e.g. array/object copying), and choosing between array adjustment and concatenation must really depend on profiling tests. Otherwise, adjustment can be way faster than concatenation, while there's enough free memory to grow the array.
You could use adjustable arrays, if the implementation would access the actual elements directly after a brief bounds checking at the head of optimized calls to/expansions of search and replace with adjustable arrays (e.g. by having internal definitions that take the actual displaced vector/array and start and end offsets).
But I'm speculating a lot here, you have to compile, inspect the compilation and profile in each implementation for real-world facts.
As a side note, the C example code is full of bugs, such as off-by-one (-1, actually) allocations (the strcat calls write an extra byte, the zero-terminated string terminator), an uninitialized zero-terminated string gstr (the first strcat works by luck, because the memory might not be initialized to 0), conversions from size_t and time_t to int and assumption of these types in a printf format string, an unused variable pos_c that is initialized with the first allocation for gstr which is incremented without taking into account that realloc may move the buffer, and no error handling whatsoever.
the format can turn any type into a string, e.g
(define lam-form (list `lambda (list `x ) (list `when (list `> `x 0) (list `* 100 `x ))))
(format "~s" lam-form)
result will be: "(lambda (x) (when (> x 0) (* 100 x)))"
Then my question is: how to do the reverse? meaning, turn "(lambda (x) (when (> x 0) (* 100 x)))" back to '(lambda (x) (when (> x 0) (* 100 x)))
Use read.
Welcome to Racket v5.1.3.1.
-> (read (open-input-string "(lambda (x) (when (> x 0) (* 100 x)))"))
'(lambda (x) (when (> x 0) (* 100 x)))
If you're referring only to the use of the "~s" formatting directive, then "read" is the right choice. See the docs for racket's "printf", then click through on the definition of "write" for more information.
If, on the other hand, you want to reverse any use of format, then there may be no unique answer; e.g., undoing (format "~a~a" "abc" "def") is not possible (in the sense that there are multiple possible calls that produce abcdef ).