Memory usage by objects in Common Lisp

Is there a way to find out how much memory is used by an instance of a class or basic data types in general?
I have a toy web framework in CL that creates and manages web pages with instances of classes representing HTML tags and their properties. Since these instances are supposed to make up an HTML page, each holds its children in a slot called children. So I was wondering how much a user's session would cost the server if I take this approach. Thanks.

As far as I know, there is nothing like this for arbitrary objects in the standard, but there are implementation-dependent solutions, like ccl:object-direct-size in CCL:
CL-USER> (object-direct-size "foo")
16
However, be aware that whether these do what you want depends on what you mean by "size", since those functions usually don't include the size of the components the object references. You can also run the GC, initialize a few objects and compare room's output before and afterwards.
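A rough sketch of that before/after idea, written for SBCL rather than CCL (it uses the SBCL-specific sb-ext:gc and sb-ext:get-bytes-consed, and it measures bytes allocated by a thunk, not bytes retained afterwards):
(defun bytes-allocated-by (thunk &optional (n 1000))
  "Average bytes consed per call of THUNK, measured over N calls."
  (sb-ext:gc :full t)
  (let ((before (sb-ext:get-bytes-consed)))
    (dotimes (i n)
      (funcall thunk))
    (round (- (sb-ext:get-bytes-consed) before) n)))
;; e.g. (bytes-allocated-by (lambda () (make-instance 'foo)))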
Also, note that time usually includes allocation information:
CL-USER> (time (length (make-array 100000)))
(LENGTH (MAKE-ARRAY 100000))
took 0 milliseconds (0.000 seconds) to run.
During that period, and with 2 available CPU cores,
0 milliseconds (0.000 seconds) were spent in user mode
0 milliseconds (0.000 seconds) were spent in system mode
400,040 bytes of memory allocated.
100000
Maybe you could try something like this (untested, really just a quick hack):
(defmethod size ((object standard-object))
  (let ((size (ccl:object-direct-size object)))
    (dolist (slot (mapcar #'ccl:slot-definition-name
                          (ccl:class-slots (class-of object))))
      (when (slot-boundp object slot)
        (incf size (size (slot-value object slot)))))
    size))

(defmethod size ((list list))
  (reduce (lambda (acc object) (+ acc (size object)))
          list
          :initial-value (ccl:object-direct-size list)))

(defmethod size (object)
  (ccl:object-direct-size object))
For example:
CL-USER> (defclass foo ()
           ((child :accessor child :initarg :child)))
#<STANDARD-CLASS FOO>
CL-USER> (defclass bar (foo)
           ((child2 :accessor child2 :initarg :child2)))
#<STANDARD-CLASS BAR>
CL-USER> (size '())
0
CL-USER> (size "foo")
16
CL-USER> (size '("foo" "bar"))
40
CL-USER> (size (make-instance 'foo))
16
CL-USER> (size (make-instance 'foo :child '("foo" "bar" "baz")))
72
CL-USER> (size (make-instance
                'bar
                :child "foo"
                :child2 (make-instance 'foo :child (make-array 100))))
456

In Common Lisp, CLOS objects are usually a collection of slots, typically stored internally in some kind of vector. A CLOS slot typically contains either a pointer to some data object or, for a few primitive datatypes, the data itself. These primitive data types have to fit into a memory word; examples are fixnums and characters. Common Lisp implementations typically don't inline more complex data structures into a slot: if a slot is declared to contain a vector of fixnums, the implementation will not allocate that vector inside the CLOS object; the CLOS object will point to a separate vector object.
The CLOS object itself should then occupy: number of slots * word size + overhead.
Let's assume a word is 4 bytes (32 bits) long.
This might be the size for a CLOS object with ten slots:
10 slots * 4 bytes + 8 bytes = 48 bytes
Now imagine that each slot of a CLOS object points to a different string and each string is 100 bytes long.
Example from above:
1 CLOS object + 10 strings each 100 bytes.
48 bytes + 10 * 100 = 1048 bytes
Now imagine that each of the slots points to the same string:
1 CLOS object + 1 string of 100 bytes.
48 bytes + 100 bytes = 148 bytes
To calculate the size of a CLOS object you could either:
just count the size of the CLOS object itself, which is easy, or
compute the graph of objects which are reachable from the object, determine the unique memory objects (minus the directly allocated primitive objects), and sum the memory sizes of those, as sketched below.
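Here is a hedged, untested sketch of that second approach, reusing the CCL-specific calls from the answer above. The visited table counts shared objects once and also breaks cycles; only conses and standard objects are traversed, so other composite types (general vectors, hash tables, ...) would need their own branches:
(defun graph-size (object &optional (seen (make-hash-table :test #'eq)))
  (cond ((gethash object seen) 0)         ; already counted; also breaks cycles
        ((or (null object)
             (typep object 'fixnum)
             (characterp object))
         0)                               ; immediates: no separate heap object
        (t
         (setf (gethash object seen) t)
         (let ((total (ccl:object-direct-size object)))
           (typecase object
             (cons
              (incf total (graph-size (car object) seen))
              (incf total (graph-size (cdr object) seen)))
             (standard-object
              (dolist (slot (mapcar #'ccl:slot-definition-name
                                    (ccl:class-slots (class-of object))))
                (when (slot-boundp object slot)
                  (incf total (graph-size (slot-value object slot) seen))))))
           total))))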

I also have a web framework in CL, was also struggling with the same sessions problem, and here is what the universe sent to me: https://people.gnome.org/~xan/memory.lisp
It seems to work in SBCL:
(memory::dump-memory (weblocks::active-sessions))
Total memory used: 99.785706 MB

Related

How to generate random numbers in [0 ... 1.0] in Common Lisp

My understanding of Common Lisp pseudorandom number generation is that (random 1.0) will generate a fraction strictly less than 1. I would like to get numbers up to 1.0 inclusive. Is this possible? I guess I could decide on a degree of precision and generate integers and divide by the range, but I'd like to know if there is a more widely accepted way of doing this. Thanks.
As you say, random will generate numbers in [0,1) by default, and in general (random x) will generate random numbers in [0,x). If these were real numbers and if the distribution really is random, then the probability of getting any number is zero, so this is effectively no different than [0,1]. But they're not real numbers: they're floats, so the probability of getting any particular value is higher since there are only a finite number of floats in [0,1].
Fortunately you can express exactly what you want: CL has a bunch of constants with names like *-epsilon which are defined so that, for instance
(/= (+ 1.0f0 single-float-epsilon) 1.0f0)
and single-float-epsilon is the smallest single-float for which this is true.
Thus (random (+ 1.0f0 single-float-epsilon)) will produce random single-floats in the range [0,1], and will eventually turn up 1.0f0. You can test this:
(defun tsit ()
  (let ((f (+ 1.0f0 single-float-epsilon)))
    (assert (/= f 1.0f0) (f) "oops")
    (loop for i upfrom 1
          for v = (random f)
          when (= v 1.0f0)
          return (values i v))))
And for me
> (tsit)
12839205
1.0
If you use double floats it takes ... quite a lot longer ... to get 1.0d0 (and remember to use double-float-epsilon).
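That is, the double-float variant of the same trick is simply:
(random (+ 1.0d0 double-float-epsilon)) ; [0,1] over double-floats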
I have a bit of a different idea here. Instead of trying to stretch the range over an epsilon, we can work with the original range, and pick a victim number somewhere in that range which gets mapped to the range limit. We can avoid a hard-coded victim by choosing one randomly, and changing it from time to time:
(defun make-random-gen (range)
  (let ((victim nil)
        (count 1))
    (lambda ()
      (when (zerop (decf count))
        (setf count 10000
              victim (random range)))
      (let ((out (random range)))
        (if (eql out victim) range out)))))
(defun testit ()
  (loop with r = (make-random-gen 1.0)
        for x = (funcall r)
        until (eql x 1.0)
        counting t))
At the listener:
[5]> (testit)
23030093
There is a small bias here in that the victim is never equal to range. So that is to say, the range value such as 1.0 is never victim and therefore always has a certain chance of occurring. Whereas every other value can potentially take a turn at being victim, having its chance of occurring temporarily reduced to zero. That should be faintly detectable in a statistical analysis of the output in that the range value will occur slightly more often than any other value.
It would be interesting to update this approach with a correction for that, an attempt to do which is this:
(defun make-random-gen (range)
  (let ((victim nil)
        (count 1))
    (labels ((gen ()
               (when (zerop (decf count))
                 (setf count 10000
                       victim (gen)))
               (let ((out (random range)))
                 (if (eql out victim) range out))))
      #'gen)))
Now when we select victim, we recurse on our own function which can potentially select range. Whenever range is selected as victim, that value is correctly suppressed: range will not occur in the output, because out will never be eql to range.
We can justify this with the following hand-waving argument:
Let us suppose that the recursive call to gen has a slight bias in favor of range being output. But whenever that happens, range is selected as victim, which prevents it from appearing in the output of gen.
There is a kind of negative feedback which should almost entirely correct the bias.
Note: our random-number-generating lambda would be better designed if it also captured a random-state object and used that; then the sequence it yields would be undisturbed by other uses of the pseudo-random-number generator. That's a different topic, but a sketch follows.
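A hedged, untested sketch of that refinement: make-random-state t seeds a fresh state, and random accepts a state as its optional second argument.
(defun make-random-gen (range &optional (state (make-random-state t)))
  (let ((victim nil)
        (count 1))
    (labels ((gen ()
               (when (zerop (decf count))
                 (setf count 10000
                       victim (gen)))
               ;; drawing from our private STATE means other callers of RANDOM
               ;; cannot disturb the sequence this generator yields
               (let ((out (random range state)))
                 (if (eql out victim) range out))))
      #'gen)))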
On a theoretical note, neither [0, 1) nor [0, 1] yields a strictly correct distribution. If we had a mathematically ideal PRNG, it would yield actual real numbers in these ranges. Since such a range contains uncountably many real values, each particular value would occur with probability zero, which cannot be distinguished from a real zero.
What we want is the floating-point PRNG to approximate the ideal PRNG.
The problem is that each floating-point value approximates a range of real values. So this means that if we have a generator of values in the range 0.0 to 1.0, it actually represents a range of real numbers from -epsilon to 1.0 + epsilon. If we take values from this PRNG and plot a bar graph of values, each bar in the graph has to have some nonzero width. The 0.0 bar is centered on 0, and the 1.0 bar is centered on 1. The distribution of real numbers extends from the left edge of the left bar, to the right edge of the right bar.
In order to create a PRNG which mimics an even distribution of values in the 0.0 to 1.0 interval, we have to include the 0.0 and 1.0 values with half probability. So that is to say, when we collect a large number of values from the PRNG, the 0.0 and 1.0 bars of the graph should be about half as high as all the other bars.
Under these conditions, we cannot distinguish the [0, 1.0) interval from the [0, 1.0] interval because they are exactly as large. We must include the 1.0 value, at about half the usual probability to account for the above uniformity problem. If we simply exclude that value, we create a bias in the wrong direction, because the 1.0 bar in the histogram now has a zero value.
One way we could rescue the situation might be to take the 1.0-epsilon bar of the histogram and make that value 50% more likely, so that the bar is 50% taller than average. Basically, we overload that last value of the range just before 1.0 to represent everything up to and not including 1.0, requiring that value to be more likely. And then, we exclude the 1.0 value from the output. All values approaching 1.0 from the left get mapped to the extra 50% probability of 1.0 - epsilon.

Singular-value decomposition of a sparse matrix

I have a relatively big matrix of which I would like to compute the singular-value decomposition. Using the straightforward linear/svd function of core.matrix (with the :vectorz implementation) unfortunately leads to an out-of-memory exception; my machine has comparatively little memory for a dev machine (8GB, with the Java heap capped at 5GB).
The matrix has the dimensions [422, 23069] and is relatively sparse (~1.74% of the values are non-zero), so my next attempt was converting the matrix to a sparse-matrix:
(def sparse-fs (matrix/sparse-matrix fs))
This surprisingly fails with an ArrayIndexOutOfBoundsException in the Java code. I could work around this by creating a sparse matrix first and then setting the non-zero values:
user> (def sparse-fs (matrix/sparse-matrix [422 23069]))
#'user/sparse-fs
user> (count
       (map-indexed
        (fn [row line]
          (dorun    ; force the inner lazy seq so mset! actually runs
           (map-indexed
            (fn [col val]
              (when (not (= val 0.0))
                (matrix/mset! sparse-fs row col val)))
            line)))
        fs))
422
However, calling linear/svd on this sparse matrix also fails, as the protocol for svd is apparently not implemented:
user> (def svd-fs (linear/svd sparse-fs))
CompilerException java.lang.IllegalArgumentException: No implementation of method: :svd of protocol:
#'clojure.core.matrix.protocols/PSVDDecomposition found for class: mikera.vectorz.Vector2,
I'm currently out of ideas on how to progress from here and would appreciate any input on how I could fit my matrix (and the svd computation) into my relatively small memory.
Update:
The protocol problem comes from me still trying to use clojure.core.matrix/sparse-matrix, whose intended use I apparently don't understand. Instead I can use new-sparse-array, which generates an instance implementing AMatrix, for which the decomposition protocol is implemented:
user> (def foo-sparse (matrix/sparse-matrix [422 23069]))
#'user/foo-sparse
user> (type foo-sparse)
mikera.vectorz.Vector2
user> (matrix/dimensionality foo-sparse)
1
user> (def foo-sparse (matrix/new-sparse-array [422 23069]))
#'user/foo-sparse
user> (matrix/dimensionality foo-sparse)
2
user> (type foo-sparse)
mikera.matrixx.impl.SparseRowMatrix
Unfortunately, when I call linear/svd on this matrix, I'm back at my out of memory error:
1. Caused by java.lang.OutOfMemoryError
Java heap space
DoubleArrays.java: 724 mikera.vectorz.util.DoubleArrays/createStorage
Matrix.java: 45 mikera.matrixx.Matrix/<init>
Matrix.java: 56 mikera.matrixx.Matrix/create
Matrix.java: 653 mikera.matrixx.Matrix/createIdentity
BidiagonalRow.java: 174 mikera.matrixx.decompose.impl.bidiagonal.BidiagonalRow/handleU
BidiagonalRow.java: 155 mikera.matrixx.decompose.impl.bidiagonal.BidiagonalRow/getU
BidiagonalRow.java: 115 mikera.matrixx.decompose.impl.bidiagonal.BidiagonalRow/_decompose
BidiagonalRow.java: 78 mikera.matrixx.decompose.impl.bidiagonal.BidiagonalRow/decompose
Bidiagonal.java: 21 mikera.matrixx.decompose.Bidiagonal/decompose
SvdImplicitQr.java: 177 mikera.matrixx.decompose.impl.svd.SvdImplicitQr/bidiagonalization
SvdImplicitQr.java: 154 mikera.matrixx.decompose.impl.svd.SvdImplicitQr/_decompose
SvdImplicitQr.java: 89 mikera.matrixx.decompose.impl.svd.SvdImplicitQr/decompose
SVD.java: 31 mikera.matrixx.decompose.SVD/decompose
matrix_api.clj: 334 mikera.vectorz.matrix-api/eval26238/fn
protocols.cljc: 1150 clojure.core.matrix.protocols$eval21076$fn__21077$G__21067__21084/invoke
linear.cljc: 105 clojure.core.matrix.linear$svd/invoke
I suspect that this might be related to the vectorz-clj issue 18 that operations on sparse matrices don't produce sparse results.
Any alternatives?
I could work around my memory problem with the svd computation by using the :clatrix implementation. Clatrix doesn't support sparse matrices, but seems to use less memory for the svd computation.
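A sketch of what that switch might look like (assuming the clatrix dependency is on the classpath, and with fs the dense matrix from the question):
(require '[clojure.core.matrix :as matrix]
         '[clojure.core.matrix.linear :as linear])

(matrix/set-current-implementation :clatrix)
(def dense-fs (matrix/matrix :clatrix fs)) ; coerce the dense data to Clatrix
(def svd-fs (linear/svd dense-fs))         ; returns a map with :U, :S and :V*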

Why is this Clojure micro benchmark so slow?

There was a previous question which was answered successfully on comparing speeds of Clojure to Scala, but applying those same techniques to the following code still leaves it over 25 times slower than equivalent Scala code. This is comparing Clojure 1.6.0 with Leiningen 2.5.0 on Java 1.8.0_40 to Scala 2.11.6:
The comparisons are made not using the REPL but using the Leiningen "run" command, and the programs run at about the same speed when run directly from Java after producing a standalone '.jar' file with the Leiningen "uberjar" command.
The micro benchmark tests the speed of doing bit manipulations inside an array, which is typical of some low level types of tasks such as encryption or compression or in primes sieving. To get a reasonable measurement interval and to avoid JIT overheads spoiling the results, the benchmark runs the same loop 1000 times.
The Clojure code is as follows:
(ns test-cljr-speed.core
  (:gen-class))

(set! *unchecked-math* true)
(set! *warn-on-reflection* true)

(defn testspeed
  "test array bit manipulating tight loop speeds."
  []
  (let [lps 1000,
        len (bit-shift-left 1 12),
        bits ^int (int (bit-shift-left 1 17))]
    (let [buf ^ints (int-array len)]
      (letfn [(doit []
                (loop [i ^int (int 0)]
                  (if (< i bits)
                    (let [w ^int (int (bit-shift-right i 5))]
                      (do
                        (aset-int ^ints buf w ^int (int (bit-or ^int (aget ^ints buf w)
                                                                ^long (bit-shift-left 1 ^long (bit-and i 31)))))
                        (recur (inc i)))))))]
        (dorun lps (repeatedly doit))))))

(defn -main
  "runs test."
  [& args]
  (let [strt (System/nanoTime),
        cnt (testspeed),
        stop (System/nanoTime)]
    (println "Took " (long (/ (- stop strt) 1000000)) " milliseconds.")))
Which produces the following output:
Took 9342 milliseconds.
I believe the problem to be related to reflection accessing the buffer array, but have applied all sorts of type hints as recommended and can't seem to find it.
Comparable Scala code is as follows:
object Main extends App {
  def testspeed() = {
    val lps = 1000
    val len = 1 << 12
    val bits = 1 << 17
    val buf = new Array[Int](len)
    def doit() = {
      def set1(i: Int): Unit =
        if (i < bits) {
          buf(i >> 5) |= 1 << (i & 31)
          set1(i + 1)
        }
      set1(0)
    }
    (0 until lps).foreach { _ => doit() }
  }

  val strt = System.nanoTime()
  val cnt = testspeed()
  val stop = System.nanoTime()
  println(s"Took ${(stop - strt) / 1000000} milliseconds.")
}
Which produces the following output:
Took 365 milliseconds.
Doing the same job, it is over 25 times as fast!!!
I have turned on the warn-on-reflection flag and there doesn't seem to be any Java reflection going on where more hinting would help. Perhaps I am not enabling some optimization settings properly (perhaps something set in the Leiningen project file?), as these are hard to dig out on the Internet; for Scala I have turned off all debugging output and enabled the compiler "optimize" flag, which makes some improvement.
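For reference, the Leiningen setting usually suggested for benchmarking is to clear the default JVM options: out of the box, Leiningen passes -XX:+TieredCompilation -XX:TieredStopAtLevel=1 to speed up startup, which caps JIT optimization during "lein run". A hypothetical project.clj:
(defproject test-cljr-speed "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]]
  :main test-cljr-speed.core
  ;; ^:replace drops Leiningen's default startup-oriented JIT flags
  :jvm-opts ^:replace [])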
My question is "Is there something that can be done for this type of application that will make Clojure run at a speed more comparable to the Scala speed?".
To short circuit any false speculation, yes, the array is indeed being filled with all binary ones a multiple of times as determined by another series of tests, and no, Scala is not optimizing away all but one loop.
I am not interested in discussions on the comparative merits of the two languages, but only how one can produce reasonably elegant Clojure code to do the same job at least ten times faster on a bit by bit basis (not a simple array fill operation, as the linear fill is just representative of more complex tasks such as prime number culling).
Using a Java BitSet does not have the problem (though not all algorithms are suited to just a set of booleans), nor likely does using a Java integer array and Java class methods to access it, but one should be able to use Clojure's "native" array types without this sort of performance problem.
First off, your type hints are not affecting the execution time of the Clojure code, and on my machine the updated version is not an improvement:
user=> (time (testspeed))
"Elapsed time: 6256.075155 msecs"
nil
user=> (time (testspeedx))
"Elapsed time: 6371.968782 msecs"
nil
You are doing a number of type hints that are not needed, and stripping them all away actually makes the code faster:
(defn testspeed-unhinted
  "test array bit manipulating tight loop speeds."
  []
  (let [lps 1000,
        len (bit-shift-left 1 12),
        bits (bit-shift-left 1 17)]
    (let [buf (int-array len)]
      (letfn [(doit []
                (loop [i (int 0)]
                  (if (< i bits)
                    (let [w (bit-shift-right i 5)]
                      (do
                        (aset buf w (bit-or (aget buf w)
                                            (bit-shift-left 1 (bit-and i 31))))
                        (recur (inc i)))))))]
        (dorun lps (repeatedly doit))))))
user=> (time (testspeed-unhinted))
"Elapsed time: 270.652953 msecs"
It occurred to me that coercing i to int on the recur would potentially speed up the code, but it actually slows it down. With that in mind, I decided to try removing ints from the code entirely and see what the result was performance wise:
(defn testspeed-unhinted-longs
  "test array bit manipulating tight loop speeds."
  []
  (let [lps 1000,
        len (bit-shift-left 1 12),
        bits (bit-shift-left 1 17)]
    (let [buf (long-array len)]
      (letfn [(doit []
                (loop [i 0]
                  (if (< i bits)
                    (let [w (bit-shift-right i 5)]
                      (do
                        (aset buf w (bit-or (aget buf w)
                                            (bit-shift-left 1 (bit-and i 31))))
                        (recur (inc i)))))))]
        (dorun lps (repeatedly doit))))))
user=> (time (testspeed-unhinted-longs))
"Elapsed time: 221.025048 msecs"
The performance gain was relatively small, so I used the criterium lib to get accurate microbenchmarks for the difference:
user=> (crit/bench (testspeed-unhinted))
WARNING: Final GC required 2.2835076167941852 % of runtime
Evaluation count : 240 in 60 samples of 4 calls.
Execution time mean : 260.877321 ms
Execution time std-deviation : 18.168141 ms
Execution time lower quantile : 251.952111 ms ( 2.5%)
Execution time upper quantile : 321.995872 ms (97.5%)
Overhead used : 15.568045 ns
Found 8 outliers in 60 samples (13.3333 %)
low-severe 1 (1.6667 %)
low-mild 7 (11.6667 %)
Variance from outliers : 51.8061 % Variance is severely inflated by outliers
nil
user=> (crit/bench (testspeed-unhinted-longs))
Evaluation count : 300 in 60 samples of 5 calls.
Execution time mean : 232.078704 ms
Execution time std-deviation : 24.828378 ms
Execution time lower quantile : 219.615718 ms ( 2.5%)
Execution time upper quantile : 297.456135 ms (97.5%)
Overhead used : 15.568045 ns
Found 11 outliers in 60 samples (18.3333 %)
low-severe 2 (3.3333 %)
low-mild 9 (15.0000 %)
Variance from outliers : 72.1097 % Variance is severely inflated by outliers
nil
So the final result is, you can get a huge speedup by removing your type hints (since everything critical in the code is already totally unambiguous type wise), and you can get a small improvement on top of that by switching from int to long (at least on my 64 bit intel machine).
I'll just answer my own question to help others that may be fighting this same issue:
After perusing another question's answer, I accidentally stumbled on the problem: "aset" is fine; "aset-int" (and all the other specialized forms of "aset-?") is not, and no amount of type hinting helps.
In the following code for the test procedure, edited as per @noisesmith's answer, all I changed was to use "long-array" ("int-array" also works, just not quite as fast) and "aset" instead of "aset-long" (or "aset-int" for "int-array"), and to eliminate all type hints:
(set! *unchecked-math* true)

(defn testspeed
  "test array bit manipulating tight loop speeds."
  []
  (let [lps 1000,
        len (bit-shift-left 1 11),
        bits (bit-shift-left 1 17),
        buf (long-array len)]
    (letfn [(doit []
              (loop [i (int 0)]
                (if (< i bits)
                  (let [w (bit-shift-right i 6)]
                    (do
                      (aset buf w (bit-or (aget buf w)
                                          (bit-shift-left 1 (bit-and i 63))))
                      (recur (inc i)))))))]
      (dorun lps (repeatedly doit)))))
With the result that it produces the following output:
Took 395 milliseconds.
With "aset-long" instead of "aset", the output is:
Took 7424 milliseconds.
for a speed-up of almost 19 times.
Now this is just very slightly slower than the Scala code using an Int array (which for Scala is faster than using a Long array), but that is somewhat understandable as Clojure does not have read/modify/write primitives such as "|=", and it seems the compiler is not smart enough to see that a read/modify/write operation is implied by the above code.
However, being only a few percent slower is completely acceptable, and means that for this type of application performance is not the deciding criterion in choosing between Scala and Clojure.
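As an aside, the missing read/modify/write operation is easy to package up yourself; here is a hypothetical helper (my own sketch, not part of clojure.core):
;; OR v into (aget a i); the primitive hints keep it on the fast array path
(defn aor! [^longs a ^long i ^long v]
  (aset a i (bit-or (aget a i) v)))

;; usage inside the loop: (aor! buf w (bit-shift-left 1 (bit-and i 63)))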
The specialized forms of "aset-?" look as though they should just call through to the overloaded cases of "aset", but they don't: they are implemented on top of java.lang.reflect.Array (which is also why *warn-on-reflection* stays silent while the reflective cost remains), so every call pays reflection overhead, at least as of version 1.6.0.

Eval times for this function alternate between 85 nanoseconds and 10 seconds (!?)

Objective
I'm trying to figure out why a function I've created, items-staged-f, has both such strangely long and short evaluation times.
Strange, you say?
I say "strange" because:
(time (items-staged-f)) yields 1.313 msecs
(time (items-staged-f)) a second time yields 0.035 msecs (which is unsurprising, because the result is a lazy sequence and it must have been memoized)
The Criterium benchmarking system reports it taking 85.149767 ns (which is unsurprising)
And yet...
The time it takes to actually evaluate (items-staged-f) in the REPL is around 10 seconds, and that is before it prints anything. I originally thought it likely takes that long because it's preparing to print to the REPL, since the result is a long and complex data structure (nested maps and vectors in a lazy sequence), but it's strange that the result doesn't even start printing until 10 seconds later when it (supposedly) takes 85 nanoseconds. Could it be that it's pre-calculating how to print the data structure?
(time (last (items-staged-f))) yields 10498.16 msecs (although this varies up to around 20 seconds), possibly for the same reason above.
And now for the code...
The goal of the function items-staged-f is to visualize what needs to be done in order to make some necessary changes to inventory items in an accounting database.
Unfamiliar functions referenced within items-staged-f may be found below.
(defn items-staged-f []
  (let [items-0 (lazy-seq (items-staged :items))
        both-types? #(in? % (group+line-items))
        items-from-group #(get items-0 %)
        replace-subgroups
        (fn [[g-item l-items :as group]]
          (let [items-in-both
                (->> l-items
                     (map :item)
                     (filter both-types?))]
            (->> (concat
                  (remove #(in? (:item %) items-in-both) l-items)
                  (mapcat items-from-group items-in-both))
                 (into [])
                 (assoc group 1))))
        replaced (map replace-subgroups items-0)]
    replaced))
items-staged is a function which outputs the original data which items-staged-f manipulates. (items-staged :items) outputs a map with string-keys (group items) whose values are vectors of maps (lists of sub-items):
{"786M" ; this is a group item
 ; below are the sub-items of the above group item
 [{:description "Signature Collection Item", :item "4X1"}
  {:description "Cookies, Inc. Paper Wrapped", :item "65G7"}
  {:description "MyChocolate 5 oz.", :item "21F"}]}
Note that the output of items-staged-f is almost identical in structure to that of items-staged, except it is a lazy sequence of vectors instead of a hash-map with hash-map-entries, as would be expected by calling the map function on a hash-map.
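That shape change is ordinary map behavior: mapping over a hash-map yields its entries, which print as two-element vectors. For instance:
user=> (map identity {"786M" [:a :b]})
(["786M" [:a :b]])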
in? is a predicate which checks if an object is in a given collection. For example, (in? 1 [1 2 3]) evaluates to true.
group+line-items is a function which outputs a lazy sequence of certain duplicate items I wish to eliminate. For example, (group+line-items) evaluates to: ("428X" "41SF" "6998" "75D22")
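(The question doesn't show the actual definition of in?; a minimal version consistent with the example above might be:)
(defn in? [x coll]
  (boolean (some #(= x %) coll)))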
Notes
VisualVM 1.3.8 is saying that clojure.lang.Reflector.getMethods() clocks in at 28700 ms (51.3%), clojure.lang.LineNumberingPushbackReader.read() (is this because of the output in the REPL?) at 9000 ms (16.2%), and clojure.lang.RT.nthFrom() at 7800 ms (13.9%).
However, when I evaluate each element of the lazy sequence individually in the REPL with (nth (items-staged-f) n), only clojure.lang.LineNumberingPushbackReader.read() ever goes up. The invocations go up in increments of 32, which is the lazy-seq chunking size. Time elapsed for other methods/functions is negligible.
One other consideration is that items-staged is a function which ultimately draws its data from an Excel file (read via Apache POI). However, the raw data from the Excel file is stored as a var, so that shouldn't be an issue because it would only calculate once before being memoized (I think).
Thanks for your help!
Addendum
Once I used doall to force realization on the lazy sequence (which I thought was being realized), Criterium now says the function takes 11.370356 sec to evaluate, which unfortunately makes sense. I'll repost once I refactor.
Lazy sequences by definition calculate their elements only when required. Printing to the REPL or requesting the last element both force realization; timing the function call that merely produces the lazy sequence does not.
(defn slow-and-lazy [] (map #(do (Thread/sleep 1000) (inc %)) (range 10)))
user=> (time (slow-and-lazy))
"Elapsed time: 0.837002 msecs"
(1 2 3 4 5 6 7 8 9 10) ; printed 10 seconds later
user=> (time (doall (slow-and-lazy)))
"Elapsed time: 10000.205709 msecs"
(1 2 3 4 5 6 7 8 9 10)
In the case of (time (slow-and-lazy)), slow-and-lazy quickly returns an unrealized lazy-sequence and time finishes, printing the elapsed time and passing along the unrealized result in this case to the REPL. Then, the REPL attempts to print the sequence. In order to do so, it must realize the sequence.
That having been said, 10 seconds is an eternity for a computer, so this does warrant examination/profiling. I would suggest refactoring your code into smaller self-contained functions. In particular, the data should be passed in as arguments. Once you nail down the bottleneck (time with doall to force realization!), then consider posting a new question. Without being able to tell exactly what's going on with this code or whether IO in items-staged is the true bottleneck, there still seems to be room for improvement.
