Common Lisp: Efficiently handling supplied-p-parameter efficiently without using &rest arguments

Common Lisp: Efficiently handling supplied-p-parameter efficiently without using &rest arguments - performance

Suppose I have the function
(defun bar (a &optional (b nil bp))
(declare (ignore a b bp)) ; <-- replace it with (list a b bp)
; to check for correctness
)
;; I want to preserve the value of bp here inside bar, and inside foo below
and I wish to write a wrapper around bar:
(defun foo (a &optional (b nil bp))
(declare (optimize speed))
(apply #'bar a (nconc (when bp (list b)))))
while preserving the value of bp in the call from foo to bar, and also keeping the arguments b visible in the emacs' minibuffer / eldoc-mode. I wanted to know if there's a possibly-nonportable non-consing way to preserve the value of bp in this call.
For example,
CL-USER> (time (loop repeat 1000000 do (foo 4 2)))
Evaluation took:
0.040 seconds of real time
0.043656 seconds of total run time (0.043656 user, 0.000000 system)
110.00% CPU
96,380,086 processor cycles
15,990,784 bytes consed <--- plenty of consing
NIL
If I ignore the b-visibility-in-eldoc, I could possibly use &rest arguments, but I do want the arguments to be visible.
While there are other ways to achieve this in this particular case, I do want to consider the case when there are multiple &optional (or &keyword) arguments.

Use a cond form to decide how to call bar:
(defun bar (a &optional (b nil bp) (c nil cp))
(declare (ignore a b bp c cp)))
(defun foo (a &optional (b nil bp) (c nil cp))
(declare (optimize speed))
(cond (cp (funcall #'bar a b c))
(bp (funcall #'bar a b))
(t (funcall #'bar a))))
CL-USER> (time (loop repeat 1000000 do (foo 1 2 3)))
Evaluation took:
0.015 seconds of real time
0.017203 seconds of total run time (0.017148 user, 0.000055 system)
113.33% CPU
41,186,554 processor cycles
0 bytes consed
Checking the arguments passed to bar:
(defun bar (a &optional (b nil bp) (c nil cp))
(list a b bp c cp))
CL-USER> (foo 1 2 3)
(1 2 T 3 T)
CL-USER> (foo 1 2)
(1 2 T NIL NIL)
CL-USER> (foo 1)
(1 NIL NIL NIL NIL)

I am not sure what bar and foo are actually supposed to do but what about:
(defun foo (a &optional (b nil bp))
(declare (optimize speed))
(funcall #'bar a (when bp b)))
CL-USER> (time (loop repeat 1000000 do (foo 4 2)))
Evaluation took:
0.005 seconds of real time
0.005810 seconds of total run time (0.005804 user, 0.000006 system)
120.00% CPU
8,099,624 processor cycles
0 bytes consed

Related

How to format parameter as a function

Shortly I have a function foo:
(defun foo (a b &key test)
(format t "~S is the result of my test ~A" (funcall test a b) test))
then the result of the evaluation is:
(foo 12 34 :test #'(lambda (a b) (+ a b)))
46 is the result of my test #<Anonymous Function #x30200171D91F>
and I want
(foo 12 34 :test #'(lambda (a b) (+ a b)))
46 is the result of my test #'(lambda (a b) (+ a b))
Unfortunately, function-lambda-expression does not display any information in CCL.
The point is this is implementation-dependent.
For instance in CCL:
(describe #'(lambda (a b) (+ a b)))
#<Anonymous Function #x302000C49E1F>
Name: NIL
Arglist (analysis): (A B)
Bits: -528481792
Plist: (CCL::FUNCTION-SYMBOL-MAP (#(B A) . #(575 18 49 63 18 49)))
Maybe, I can formulate the question differently. How to save a lambda function as a slot instance in a file in order to retrieve it from any lisp implementation.
Or to be more specific, I would like to set a slot as a non-interpreted function in order to call it to be interpreted as such and have a trace of the 'source'.
My temporary 'solution' is to use explicitly a macro function such as:
(defmacro src (func) `(read-from-string (format nil "~A" ',func)))
(setf (my-slot my-class-object) (src #'(lambda (a b) (* a b))))
;; this stores the un-interpreted function such as
(my-slot my-class-object)
;; return
#'(lambda (a b) (* a b))
;; then I can do
(funcall (my-slot my-class-object) 2 3)
6

The ability to restore the source from a function depends on the implementation and the debug level of your environment. In Common Lisp implementations that compiles code, you need to optimize for debugging to keep track of the source code. Sometimes the source is simply the filename where the function was defined, and an offset.
Named functions
If you want to keep track of functions, it is easier to do portably if you restrict yourself to named functions. Just attach the source code to the property list of the symbol, using a macro:
;; body should be a single form that returns a name, like "defun"
(defmacro with-source-code (&body body)
(destructuring-bind (form) body
(let ((name$ (gensym)))
`(let ((,name$ ,form))
(check-type ,name$ symbol)
(setf (get ,name$ 'source-code) ',form)
,name$))))
;; get the code associated with the name
(defun source-code (name)
(check-type name symbol)
(get name 'source-code))
For example:
(with-source-code
(defun my-test-fn (x y)
(+ x y)))
(source-code 'my-test-fn)
=> (DEFUN MY-TEST-FN (X Y) (+ X Y))
Weak hash tables
Weak references are also implementation dependent, but you can use the trivial-garbage system to use them portably, or be notified when the feature is unavailable.
Here you attach the actual function object to its source code (or, any object, but this is not great for numbers or characters since they are usually not identifiable):
;; defines package "tg"
(ql:quickload :trivial-garbage)
(defparameter *source-map*
(tg:make-weak-hash-table :test #'eq :weakness :key)
"Map objects to their defining forms.")
The weakness is :key so that the garbage collector may remove the entry if the key (the object whose code we want to retrieve) is garbage collected. This should be enough to avoid keeping entries indefinitely.
(defmacro remember (form)
(let ((value$ (gensym)))
`(let ((,value$ ,form))
(setf (gethash ,value$ *source-map*) ',form)
,value$)))
(defun source (object)
(gethash object *source-map*))
For example, you can define a lambda* macro that remembers the anonymous function being defined:
(defmacro lambda* ((&rest args) &body body)
`(remember (lambda ,args ,#body)))
For example:
(let ((fn (lambda* (x y) (+ x y))))
(prog1 (funcall fn 3 4)
(format t "~&Calling ~a" (source fn))))
The above returns 7 and prints Calling (LAMBDA (X Y) (+ X Y))
Metaclass
If you want to avoid weak hash tables, you can also wrap your function in another object, which can act like a function (a funcallable object), using the meta-object protocol.
In that case, you can use closer-mop to have a unified API to work with the Meta-Object Protocol:
(ql:quickload :closer-mop)
You define a subclass of funcallable-standard-object that keep track of the source code, and the function (or closure) being called:
(defclass fn-with-code (c2mop:funcallable-standard-object)
((source :reader source-of :initarg :source))
(:metaclass c2mop:funcallable-standard-class))
The object can be called like any other function, but for that you need to call set-funcallable-instance-function. We can do that after initializing the object, by definining the following method:
(defmethod initialize-instance :after ((f fn-with-code)
&key function &allow-other-keys)
(c2mop:set-funcallable-instance-function f function))
I also define a help function to build such an instance, given a function object and its source code:
(defun make-fn-with-code (function source)
(make-instance 'fn-with-code :source source :function function))
Then, we can rewrite lambda* as follows:
(defmacro lambda* ((&rest args) &body body)
(let ((code `(lambda ,args ,#body)))
`(make-fn-with-code ,code ',code)))
Finally, what is useful with this approach is that the code can be printed automatically when the function is printed, by defining a method for print-object:
(defmethod print-object ((o fn-with-code) stream)
(print-unreadable-object (o stream :type nil :identity nil)
(format stream "FUN ~a" (source-of o))))
> (lambda* (x y) (* x y))
#<FUN (LAMBDA (X Y) (* X Y))> ;; << printed as follow

You are nearly there with a macro. If you merge "foo" and "format-function" into one macro:
(defmacro format-result (a b &key test)
`(format t "~S is the result of my test ~A"
(funcall ,test ,a ,b) ',test))
so:
(FORMAT-RESULT 1 2 :test (lambda (a b) (+ a b)))
3 is the result of my test (LAMBDA (A B) (+ A B))
(FORMAT-RESULT 1 2 :test #'+)
3 is the result of my test #'+

Using test bench code from Practical Common Lisp

In Practical Common Lisp Ch. 9, Peter Seibel provides a basic unit test bench for comparing expected with actual results of evaluating S-expressions. For example, defining a test as (deftest plus-test () (check (= (+ 1 2) 3))) and evaluating (plus-test) will print the result pass ... (PLUS-TEST): (= (+ 1 2) 3). However, a slightly more complex example like (deftest cdr-test () (check (equal (cdr '(a |a| "a" #\a)) '(|a| "a" #\a) produces the result pass ... (CDR-TEST): (equal (cdr '(A a a a)) '(a a a)) rather than pass ... (CDR-TEST): (equal (cdr '(a |a| "a" #\a)) '(|a| "a" #\a)). I have not been able to successfully modify his code to print the desired result, and would appreciate some assistance. Here is his code from Ch. 9:
(defmacro with-gensyms ((&rest names) &body body)
`(let ,(loop for n in names collect `(,n (make-symbol ,(string n))))
,#body))
(defvar *test-name* nil)
(defmacro deftest (name parameters &body body)
"Define a test function. Within a test function we can call other
test functions or use `check' to run individual test cases."
`(defun ,name ,parameters
(let ((*test-name* (append *test-name* (list ',name))))
,#body)))
(defmacro check (&body forms)
"Run each expression in `forms' as a test case."
`(combine-results
,#(loop for f in forms collect `(report-result ,f ',f))))
(defmacro combine-results (&body forms)
"Combine the results (as booleans) of evaluating `forms' in order."
(with-gensyms (result)
`(let ((,result t))
,#(loop for f in forms collect `(unless ,f (setf ,result nil)))
,result)))
(defun report-result (result form)
"Report the results of a single test case. Called by `check'."
(format t "~:[FAIL~;pass~] ... ~a: ~a~%" result *test-name* form)
result)

This is just a question about format control, since the function format does the output in the code you are using.
The corresponding Common Lisp Hyperspec documentation is in FORMAT Printer Operations.
* (format t "~a" ''(a #\a))
'(A a)
NIL
* (format t "~s" ''(a #\a))
'(A #\a)
NIL

Fiddling with arguments order in Common Lisp functions

After reading Python's range() analog in Common Lisp, I went thinking that I didn't really like the function interfaces used on the answers.
Three different lambda lists appear there:
(start end &optional (step 1)): both the start and end arguments are mandatory.
(end &key (start 0) (step 1)): IMHO, using keyword arguments seems overkill for such simple function, and they are there just to hide the fact that end and start do not appear in the natural order (i.e. first start, then end)
(n &key (start 0) (step 1)) (from alexandria:iota): here, the optionality and order of the arguments are right, but at the expense of using a different abstraction.
The thing is that I would like to write (range 6) to generate (0 1 2 3 4 5) but also (range 3 6) to generate (3 4 5). And actually, it is easily doable; for instance:
(defun range (start_or_end &optional end (step 1))
(multiple-value-bind (start end)
(if end
(values start_or_end end)
(values 0 start_or_end))
(loop for n from start below end by step collect n)))
But well, I haven't seen this kind of argument fiddling in others code, and as a Lisp newbie I would like to know if that is an acceptable idiom or not.
Update: I have just discovered that Racket provides a range function similar to the one I was proposing (an also the in-range generator).

As Alessio Stalla pointed out, there's nothing the matter with this, but it's not something you'll see very often. Overloading by arity gets more complicated when the language permits optional and rest arguments.
I think the way that a case like this would typically be handled is to define things in terms of designators. You could state that a range is determined by three values: a start, an end, and a step. Then you can say that a range designator is a list of length at most three, with the following semantics:
(n) designates (:start 0 :end n :step 1)
(m n) designates (:start m :end n :step 1)
(m n s) designates (:start m :end n :step s)
Then you can do something like:
(defun range (&rest range-designator)
(destructuring-bind (a &optional (b nil bp) (c nil cp))
range-designator
(multiple-value-bind (start end step)
(cond
(cp (values a b c))
(bp (values a b 1))
(t (values 0 a 1)))
(loop for x from start to end by step
collect x))))
CL-USER> (range 5)
(0 1 2 3 4 5)
CL-USER> (range 2 7)
(2 3 4 5 6 7)
CL-USER> (range 2 7 3)
(2 5)
If you anticipate using range designators in other places, you can pull that inside stuff out a bit:
(defun to-range (designator)
(destructuring-bind (a &optional (b nil bp) (c nil cp))
designator
(cond
(cp (values a b c))
(bp (values a b 1))
(t (values 0 a 1)))))
(defun range (&rest range-designator)
(multiple-value-bind (start end step)
(to-range range-designator)
(loop for x from start to end by step collect x)))

It is acceptable, although you don't encounter it very frequently. I'm pretty sure there are functions in the standard with such a signature but I can't remember any at the moment.
One example that I do remember is the JFIELD primitive in ABCL: http://abcl.org/trac/wiki/JavaFfi#FunctionJFIELDJFIELD-RAWSETFJFIELD
If you're concerned about performance, since "parsing" the lambda list has a cost, you can use compiler macros to avoid paying it, especially in a case like yours where the behaviour of the function is driven only by the number of arguments (as opposed to their types).

Filter a list into two parts by a predicate

I want to do
(filter-list-into-two-parts #'evenp '(1 2 3 4 5))
; => ((2 4) (1 3 5))
where a list is split into two sub-lists depending on whether a predicate evaluates to true. It is easy to define such a function:
(defun filter-list-into-two-parts (predicate list)
(list (remove-if-not predicate list) (remove-if predicate list)))
but I would like to know if there is a built-in function in Lisp that can do this, or perhaps a better way of writing this function?

I don't think there is a built-in and your version is sub-optimal because it traverses the list twice and calls the predicate on each list element twice.
(defun filter-list-into-two-parts (predicate list)
(loop for x in list
if (funcall predicate x) collect x into yes
else collect x into no
finally (return (values yes no))))
I return two values instead of the list thereof; this is more idiomatic (you will be using multiple-value-bind to extract yes and no from the multiple values returned, instead of using destructuring-bind to parse the list, it conses less and is faster, see also values function in Common Lisp).
A more general version would be
(defun split-list (key list &key (test 'eql))
(let ((ht (make-hash-table :test test)))
(dolist (x list ht)
(push x (gethash (funcall key x) ht '())))))
(split-list (lambda (x) (mod x 3)) (loop for i from 0 to 9 collect i))
==> #S(HASH-TABLE :TEST FASTHASH-EQL (2 . (8 5 2)) (1 . (7 4 1)) (0 . (9 6 3 0)))

Using REDUCE:
(reduce (lambda (a b)
(if (evenp a)
(push a (first b))
(push a (second b)))
b)
'(1 2 3 4 5)
:initial-value (list nil nil)
:from-end t)

In dash.el there is a function -separate that does exactly what you ask:
(-separate 'evenp '(1 2 3 4)) ; => '((2 4) (1 3))
You can ignore the rest of the post if you use -separate. I had to implement Haskell's partition function in Elisp. Elisp is similar1 in many respects to Common Lisp, so this answer will be useful for coders of both languages. My code was inspired by similar implementations for Python
(defun partition-push (p xs)
(let (trues falses) ; initialized to nil, nil = '()
(mapc (lambda (x) ; like mapcar but for side-effects only
(if (funcall p x)
(push x trues)
(push x falses)))
xs)
(list (reverse trues) (reverse falses))))
(defun partition-append (p xs)
(reduce (lambda (r x)
(if (funcall p x)
(list (append (car r) (list x))
(cadr r))
(list (car r)
(append (cadr r) (list x)))))
xs
:initial-value '(() ()) ; (list nil nil)
))
(defun partition-reduce-reverse (p xs)
(mapcar #'reverse ; reverse both lists
(reduce (lambda (r x)
(if (funcall p x)
(list (cons x (car r))
(cadr r))
(list (car r)
(cons x (cadr r)))))
xs
:initial-value '(() ())
)))
push is a destructive function that prepends an element to list. I didn't use Elisp's add-to-list, because it only adds the same element once. mapc is a map function2 that doesn't accumulate results. As Elisp, like Common Lisp, has separate namespaces for functions and variables3, you have to use funcall to call a function received as a parameter. reduce is a higher-order function4 that accepts :initial-value keyword, which allows for versatile usage. append concatenates variable amount of lists.
In the code partition-push is imperative Common Lisp that uses a widespread "push and reverse" idiom, you first generate lists by prepending to the list in O(1) and reversing in O(n). Appending once to a list would be O(n) due to lists implemented as cons cells, so appending n items would be O(n²). partition-append illustrates adding to the end. As I'm a functional programming fan, I wrote the no side-effects version with reduce in partition-reduce-reverse.
Emacs has a profiling tool. I run it against these 3 functions. The first element in a list returned is the total amount of seconds. As you can see, appending to list works extremely slow, while the functional variant is the quickest.
ELISP> (benchmark-run 100 (-separate #'evenp (number-sequence 0 1000)))
(0.043594004 0 0.0)
ELISP> (benchmark-run 100 (partition-push #'evenp (number-sequence 0 1000)))
(0.468053176 7 0.2956386049999793)
ELISP> (benchmark-run 100 (partition-append #'evenp (number-sequence 0 1000)))
(7.412973128 162 6.853687342999947)
ELISP> (benchmark-run 100 (partition-reduce-reverse #'evenp (number-sequence 0 1000)))
(0.217411618 3 0.12750035599998455)
References
Differences between Common Lisp and Emacs Lisp
Map higher-order function
Technical Issues of Separation in Function Cells and Value Cells
Fold higher-order function

I don't think that there is a partition function in the common lisp standard, but there are libraries that provide such an utility (with documentation and source).
CL-USER> (ql:quickload :arnesi)
CL-USER> (arnesi:partition '(1 2 3 4 5) 'evenp 'oddp)
((2 4) (1 3 5))
CL-USER> (arnesi:partition '(1 2 b "c") 'numberp 'symbolp 'stringp)
((1 2) (B) ("c"))

Improve speed of string manipulations

This is a follow-up question, sort of, to this one: Write an efficient string replacement function? .
In (albeit distant) future I hope to get to do natural language processing. Of course speed of strings manipulation is important because of that. Accidentally, I've stumbled over this test: http://raid6.com.au/~onlyjob/posts/arena/ - all tests are biased, this is no exception. However, it raised important question for me. And so I wrote a few tests to see how am I doing:
This was my first attempt (I'll call it #A):
#A
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln = (* (length addidtion) i)
for accumulated = addidtion
then (loop with concatenated = (concatenate 'string accumulated addidtion)
for start = (search "efgh" concatenated)
while start do (replace concatenated "____" :start1 start)
finally (return concatenated))
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Baffled with the results, I tried to use cl-ppcre - I don't know what I was hoping for, but the results came out as really bad... Here's the code I used for testing:
#B
(ql:quickload "cl-ppcre")
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln = (* (length addidtion) i)
for accumulated = addidtion
then (cl-ppcre:regex-replace-all "efgh" (concatenate 'string accumulated addidtion) "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Well, then, in hopes to maybe side-step some generalizations, I decided to write my own, albeit somewhat naive version:
#C
(defun replace-all (input match replacement)
(declare (type string input match replacement)
(optimize (debug 0) (safety 0) (speed 3)))
(loop with pattern fixnum = (1- (length match))
with i fixnum = pattern
with j fixnum = i
with len fixnum = (length input) do
(cond
((>= i len) (return input))
((zerop j)
(loop do
(setf (aref input i) (aref replacement j) i (1+ i))
(if (= j pattern)
(progn (incf i pattern) (return))
(incf j))))
((char= (aref input i) (aref match j))
(decf i) (decf j))
(t (setf i (+ i 1 (- pattern j)) j pattern)))))
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln fixnum = (* (length addidtion) i)
for accumulated string = addidtion
then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Almost as slow as cl-ppcre! Now, that's incredible! There isn't anything I can spot here such that would result in such poor performance... And still it does suck :(
Realizing that the standard functions performed the best so far, I looked into SBCL source and after some reading I came up with this:
#D
(defun replace-all (input match replacement &key (start 0))
(declare (type simple-string input match replacement)
(type fixnum start)
(optimize (debug 0) (safety 0) (speed 3)))
(loop with input-length fixnum = (length input)
and match-length fixnum = (length match)
for i fixnum from 0 below (ceiling (the fixnum (- input-length start)) match-length) do
(loop with prefix fixnum = (+ start (the fixnum (* i match-length)))
for j fixnum from 0 below match-length do
(when (<= (the fixnum (+ prefix j match-length)) input-length)
(loop for k fixnum from (+ prefix j) below (the fixnum (+ prefix j match-length))
for n fixnum from 0 do
(unless (char= (aref input k) (aref match n)) (return))
finally
(loop for m fixnum from (- k match-length) below k
for o fixnum from 0 do
(setf (aref input m) (aref replacement o))
finally
(return-from replace-all
(replace-all input match replacement :start k))))))
finally (return input)))
(defun test ()
(declare (optimize (debug 0) (safety 0) (speed 3)))
(loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
and initial = (get-internal-real-time)
for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
for ln fixnum = (* (length addidtion) i)
for accumulated string = addidtion
then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
when (zerop (mod ln (* 1024 256))) do
(format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
(values))
(test)
Finally, I can win, although a tiny fraction of performance against the standard library - yet it is still very-very bad compared to almost everything else...
Here's the table with the results:
| SBCL #A | SBCL #B | SBCL #C | SBCL #D | C gcc 4 -O3 | String size |
|-----------+-----------+------------+-----------+-------------+-------------|
| 17.463 s | 166.254 s | 28.924 s | 16.46 s | 1 s | 256 Kb |
| 68.484 s | 674.629 s | 116.55 s | 63.318 s | 4 s | 512 Kb |
| 153.99 s | gave up | 264.927 s | 141.04 s | 10 s | 768 Kb |
| 275.204 s | . . . . . | 474.151 s | 251.315 s | 17 s | 1024 Kb |
| 431.768 s | . . . . . | 745.737 s | 391.534 s | 27 s | 1280 Kb |
| 624.559 s | . . . . . | 1079.903 s | 567.827 s | 38 s | 1536 Kb |
Now, the question: What did I do wrong? Is this something inherent to Lisp strings? Can this probably be mitigated through... what?
In the long shot, I'd even consider writing a specialized library for string processing. If the problem isn't my bad code, but rather the implementation. Would it make sense to do so? If yes, what language would you suggest for doing it?
EDIT: Just for the record, I'm now trying to use this library: https://github.com/Ramarren/ropes to deal with strings concatenation. Unfortunately, it doesn't have a replace function in it and doing multiple replaces isn't very trivial. But I'll keep this post updated when I have something.
I've tried to slightly change huaiyuan's variant to use array's fill-pointers instead of string concatenation (to achieve something similar to StringBuilder suggested by Paulo Madeira. It probably can be optimized further, but I'm not sure about the types / which will method be faster / will it be worth to redefine types for * and + to get them to only operate on fixnum or signed-byte. Anyway, here's the code and the benchmark:
(defun test/e ()
(declare (optimize speed))
(labels ((min-power-of-two (num)
(declare (type fixnum num))
(decf num)
(1+
(progn
(loop for i fixnum = 1 then (the (unsigned-byte 32) (ash i 1))
while (< i 17) do
(setf num
(logior
(the fixnum
(ash num (the (signed-byte 32)
(+ 1 (the (signed-byte 32)
(lognot i)))))) num))) num)))
(join (x y)
(let ((capacity (array-dimension x 0))
(desired-length (+ (length x) (length y)))
(x-copy x))
(declare (type fixnum capacity desired-length)
(type (vector character) x y x-copy))
(when (< capacity desired-length)
(setf x (make-array
(min-power-of-two desired-length)
:element-type 'character
:fill-pointer desired-length))
(replace x x-copy))
(replace x y :start1 (length x))
(setf (fill-pointer x) desired-length) x))
(seek (old str pos)
(let ((q (position (aref old 0) str :start pos)))
(and q (search old str :start2 q))))
(subs (str old new)
(loop for p = (seek old str 0) then (seek old str p)
while p do (replace str new :start1 p))
str))
(declare (inline min-power-of-two join seek subs)
(ftype (function (fixnum) fixnum) min-power-of-two))
(let* ((builder
(make-array 16 :element-type 'character
:initial-contents "abcdefghefghefgh"
:fill-pointer 16))
(ini (get-internal-real-time)))
(declare (type (vector character) builder))
(loop for i fixnum below (+ 1000 (* 4 1024 1024 (/ (length builder))))
for j = builder then
(subs (join j builder) "efgh" "____")
for k fixnum = (* (length builder) i)
when (= 0 (mod k (* 1024 256)))
do (format t "~&~8,2F sec ~8D kB"
(/ (- (get-internal-real-time) ini) 1000)
(/ k 1024))))))
1.68 sec 256 kB
6.63 sec 512 kB
14.84 sec 768 kB
26.35 sec 1024 kB
41.01 sec 1280 kB
59.55 sec 1536 kB
82.85 sec 1792 kB
110.03 sec 2048 kB

The bottle-neck is the search function, which is perhaps not optimized in SBCL. The following version uses position to help it skip over impossible region and is about 10 times as fast as your version #A on my machine:
(defun test/e ()
(declare (optimize speed))
(labels ((join (x y)
(concatenate 'simple-base-string x y))
(seek (old str pos)
(let ((q (position (char old 0) str :start pos)))
(and q (search old str :start2 q))))
(subs (str old new)
(loop for p = (seek old str 0) then (seek old str p)
while p do (replace str new :start1 p))
str))
(declare (inline join seek subs))
(let* ((str (join "abcdefgh" "efghefgh"))
(ini (get-internal-real-time)))
(loop for i below (+ 1000 (* 4 1024 1024 (/ (length str))))
for j = str then (subs (join j str) "efgh" "____")
for k = (* (length str) i)
when (= 0 (mod k (* 1024 256)))
do (format t "~&~8,2F sec ~8D kB"
(/ (- (get-internal-real-time) ini) 1000)
(/ k 1024))))))

The tests in that page are indeed biased, so let's see by how much. The author claims to test string manipulation, but here's what the programs in that page test:
String concatenation
Memory management, either explicit (C) or implicit
In some languages, regular expressions
In others, string search algorithms and substring replacement
Memory access, which has bounds checks on several languages
There are way too many aspects just here. Here's how it's being measured:
Real time in seconds
This is unfortunate, since the computer had to be completely dedicated to running just this test for reasonable values, without any other processes whatsoever, such as services, antiviruses, browsers, even a waiting *nix shell. CPU time would be much more useful, you could even run the tests in a virtual machine.
Another aspect is that characters in C, C++, Perl, Python, PHP and Ruby are 8-bit, but they're 16-bit in many of the other tested languages. This means that memory usage is stressed in very different amounts, by at least a factor of 2. Here, cache misses are much more noticeable.
I suspect the reason Perl is so fast is that it checks its arguments once before invoking a C function, instead of constantly checking bounds. Other languages with 8-bit strings are not so fast, but are still reasonably fast.
JavaScript V8 has strings that are ASCII if possible, so if the appended and replaced token was "ëfgh", you'd pay a lot more in that implementation.
Python 3 is almost three times slower than Python 2, and my guess is it's due to the wchar_t * vs char * internal representation of strings.
JavaScript SpiderMonkey uses 16-bit strings. I didn't dig the sourced much, but the file jsstr.h mentions ropes.
Java is so slow because Strings are immutable, and so for this benchmark, it's definitely not the appropriate data type. You're paying the price of generating a huge string after each .replace(). I haven't tested, but probably a StringBuffer would be much faster.
So, this benchmark is to be taken not only with a grain of salt, but with a handful of it.
In Common Lisp, bounds checking and type dispatching in aref and its setf are probably the bottlenecks.
For good performance, you would have to ditch generic string sequences and use simple-strings or simple-vectors, whichever your implementation optimizes best. Then, you should have a way of making calls to schar or svref and their setfable forms that bypass bounds checking. From here, you can implement your own simple-string-search or simple-character-vector-search (and replace-simple-string or replace-simple-vector, although they play a much smaller role in this particular example) with full speed optimization and type declarations, with bounds checking at the head of each call instead of at each array access.
A sufficiently smart compiler™ would do all of this for you given "proper" declarations. The problem is, you'd have to use (concatenate 'simple-string/simple-vector ...), because neither simple strings nor simple vectors are adjustable.
With a compacting/moving GC, there's always a penalty in these cases (e.g. array/object copying), and choosing between array adjustment and concatenation must really depend on profiling tests. Otherwise, adjustment can be way faster than concatenation, while there's enough free memory to grow the array.
You could use adjustable arrays, if the implementation would access the actual elements directly after a brief bounds checking at the head of optimized calls to/expansions of search and replace with adjustable arrays (e.g. by having internal definitions that take the actual displaced vector/array and start and end offsets).
But I'm speculating a lot here, you have to compile, inspect the compilation and profile in each implementation for real-world facts.
As a side note, the C example code is full of bugs, such as off-by-one (-1, actually) allocations (the strcat calls write an extra byte, the zero-terminated string terminator), an uninitialized zero-terminated string gstr (the first strcat works by luck, because the memory might not be initialized to 0), conversions from size_t and time_t to int and assumption of these types in a printf format string, an unused variable pos_c that is initialized with the first allocation for gstr which is incremented without taking into account that realloc may move the buffer, and no error handling whatsoever.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio