I currently use the following method of reading a file line by line:
(for ([line (in-lines)]) ...)
However, right now my code is too slow. Is there a "faster" way to read the input line by line?
Like ramrunner, I suspect that the problem is somewhere else. Here's a short program that I wrote that generates a 10 Megabyte text file and then reads it in using 'in-lines'.
#lang racket
(define chars (list->vector (string->list "abcde ")))
(define charslen (vector-length chars))
(define (random-line)
(list->string
(for/list ([i (in-range 80)])
(vector-ref chars (random charslen)))))
(define linecount (ceiling (/ (* 10 (expt 10 6)) 80)))
(time
(call-with-output-file "/tmp/sample.txt"
(λ (port)
(for ([i (in-range linecount)])
(displayln (random-line) port)))
#:exists 'truncate))
;; cpu time: 2512 real time: 2641 gc time: 357
;; okay, let's time reading it back in:
(time
(call-with-input-file "/tmp/sample.txt"
(λ (port)
(for ([l (in-lines port)])
'do-something))))
;; cpu time: 161 real time: 212 gc time: 26
;; cpu time: 141 real time: 143 gc time: 23
;; cpu time: 144 real time: 153 gc time: 22
(The times here are all in milliseconds.) So it takes about a sixth of a second to read in all the lines of a 10-megabyte file.
Does this match what you're seeing?
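If line reading really were the bottleneck, one easy variation to benchmark against `in-lines` is `file->lines`, which materializes the whole list at once instead of streaming. A minimal sketch, assuming the /tmp/sample.txt generated above exists:

```racket
#lang racket
;; Sketch: two ways to consume the same file.  `in-lines` streams one
;; line at a time; `file->lines` reads every line into a list up front.
;; Timings vary by machine; both should handle 10 MB in well under a second.
(define path "/tmp/sample.txt")
(when (file-exists? path)
  (time
   (call-with-input-file path
     (lambda (port)
       (for ([l (in-lines port)])
         (void)))))
  (time (void (file->lines path))))
```

In practice the streaming version usually wins on memory and is comparable on time, which supports the point above that the slowness is elsewhere.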
Related
I'm porting SRFI 105 "Curly infix" to Racket.
I wrote a "reader" that works, and the REPL packaged with SRFI 105 already works with little change (only one line modified, as far as I can remember), but I'm facing a difficulty, since I'm not at ease with the Racket ecosystem for building languages:
First, how can I make my implementation parse not only the main program but also included files? That is, if I have an (include "infix-file.scm"), I want the reader/parser to load and parse it before the expansion step, which is too late and more difficult.
here the beginning of my specialised code (the rest is as the official SRFI 105) file SRFI-105.rkt:
#lang racket
(require syntax/strip-context)
(provide (rename-out [literal-read read]
[literal-read-syntax read-syntax]))
(define (literal-read in)
(syntax->datum
(literal-read-syntax #f in)))
(define (literal-read-syntax src in)
(define lst-code (process-input-code-tail-rec in))
`(module anything racket ,@lst-code))
;; read all the expression of program
;; a tail recursive version
(define (process-input-code-tail-rec in) ;; in: port
(define (process-input acc)
(define result (curly-infix-read in)) ;; read an expression
(if (eof-object? result)
(reverse acc)
(process-input (cons result acc))))
(process-input '()))
; ------------------------------
; Curly-infix support procedures
; ------------------------------
and here is an example of source file using it:
#lang reader "SRFI-105.rkt"
(- (+ 3 3)
{2 + 2})
{5 + 2}
(define (fibonacci n)
(if (< n 2)
n
(+ (fibonacci (- n 1)) (fibonacci (- n 2)))))
(fibonacci 7)
(define (fib n)
(if {n < 2}
n
{(fib {n - 1}) + (fib {n - 2})} ))
(fib 11)
and the obvious results:
Welcome to DrRacket, version 8.2 [cs].
Language: reader "SRFI-105.rkt", with debugging; memory limit: 128 MB.
2
7
13
89
>
Second, I know how to make a #lang reader "my-language.rkt" parse the code that follows it, but I do not know how to integrate the working SRFI 105 REPL into the Racket ecosystem (for now it lives in a separate file from the parser).
Here is the official SRFI 105 demo code, which already works in Racket with minor changes:
; --------------
; Demo of reader
; --------------
(define-namespace-anchor a)
(define ns (namespace-anchor->namespace a))
;{1 + 1}
;(+ 1 1)
;2
;(define k {1 + 1})
;(define k (+ 1 1))
;#<void>
;k
;k
;2
; repeatedly read in curly-infix and write traditional s-expression.
(define (process-input)
(let ((result (curly-infix-read)))
(cond ((not (eof-object? result))
(let ((rv (eval result ns)))
(write result) (display "\n")
(write rv)
(display "\n"))
;; (force-output) ; flush, so can interactively control something else
(process-input)) ;; no else clause or other
)))
(process-input)
Damien
A possible solution is to use 'require' instead of 'include' for loading one source file from another, adding one more
#lang reader "SRFI-105.rkt"
line at the beginning of each loaded file.
example of main file loading another .rkt file
#lang reader "SRFI-105.rkt"
(require "examples-curly-infix2.rkt")
(- (+ 3 3)
{2 + 2})
{5 + 2}
(define (fibonacci n)
(if (< n 2)
n
(+ (fibonacci (- n 1)) (fibonacci (- n 2)))))
(fibonacci 7)
(define (fib n)
(if {n < 2}
n
{(fib {n - 1}) + (fib {n - 2})} ))
(fib 11)
and here is the loaded file:
#lang reader "SRFI-105.rkt"
{7 * 3}
execution will output this:
Welcome to DrRacket, version 8.2 [cs].
Language: reader "SRFI-105.rkt", with debugging; memory limit: 128 MB.
21
2
7
13
89
The 21 comes from the 'required' file.
Note that, unlike include, require gives you no control over where in the file the loaded code takes effect: required modules are loaded at the beginning.
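If include-style splicing really is wanted (rather than require), the reader itself can expand (include ...) forms before expansion ever runs. A hedged sketch of the idea, where plain read stands in for SRFI 105's curly-infix-read (inside SRFI-105.rkt you would pass the real reader), and the read-all name is mine:

```racket
#lang racket
;; Sketch: read every expression from a port, splicing in the contents
;; of (include "file") forms at read time.  `reader` defaults to the
;; built-in `read`; in SRFI-105.rkt it would be curly-infix-read.
(define (read-all in [reader read])
  (let loop ([acc '()])
    (define result (reader in))
    (cond
      [(eof-object? result) (reverse acc)]
      [(and (pair? result)
            (eq? (car result) 'include)
            (pair? (cdr result))
            (string? (cadr result)))
       ;; Recursively read the included file with the same reader, then
       ;; splice its expressions in place of the include form.
       (let ([included (call-with-input-file (cadr result)
                         (lambda (p) (read-all p reader)))])
         (loop (append (reverse included) acc)))]
      [else (loop (cons result acc))])))
```

Because the splice happens at read time, the included file's expressions land at exactly the point of the include form, which require cannot do.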
In a larger program I'm writing out a set of 10^7 numerical digits (0...9). This goes very slowly with MIT-Scheme 10.1.10 on a 2.6 GHz CPU, taking something like 2 minutes.
Probably I'm doing something wrong, like not buffering, but I'm pretty stuck after reading the reference guide. I reduced everything to the bare minimum:
(define (write-stuff port)
(define (loop cnt)
(if (> cnt 0)
(begin (display "0" port)
(loop (- cnt 1)))))
(loop 10000000))
(call-with-output-file "tmp.txt" write-stuff)
Any hints would be welcome...
[EDIT] To make things clear: the data entries are unrelated to each other and are stored in a 2D vector. They can be considered random, so I don't want to group them (it's either one-by-one or all-at-once). You can consider the data to be defined by something like
(define (data width height)
(make-initialized-vector width (lambda (x)
(make-initialized-vector height (lambda (x)
(list-ref (list #\0 #\1) (random 2)))))))
Apparently the user/kernel mode switches take most of the time, so it's best to transform the data into one string and write it out in one shot, as #ceving suggested. Then it works fast enough for me, even though it still takes 20 s for 16 MB.
(define (data->str data)
(string-append* (vector->list (vector-map vector->string data))))
(define dataset (data 4096 4096))
(call-with-output-file "test.txt" (lambda (p)
(display (data->str dataset) p)))
The problem is not that MIT-Scheme is slow. The problem is that you call the kernel's write function excessively: your program switches from user mode to kernel mode for every single character, and that takes a lot of time. If you do the same in Bash, it takes even longer.
Your Scheme version:
(define (write-stuff port)
(define (loop cnt)
(if (> cnt 0)
(begin (display "0" port)
(loop (- cnt 1)))))
(loop 10000000))
(call-with-output-file "mit-scheme-tmp.txt" write-stuff)
(exit)
The wrapper to run the Scheme version:
#! /bin/bash
mit-scheme --quiet --load mit-scheme-implementation.scm
On my system it takes about 1 minute:
$ time ./mit-scheme-implementation
real 1m3,981s
user 1m2,558s
sys 0m0,740s
The same for Bash:
#! /bin/bash
: > bash-tmp.txt
n=10000000
while ((n > 0)); do
echo -n 0 >> bash-tmp.txt
n=$((n - 1))
done
takes 2 minutes:
$ time ./bash-implementation
real 2m25,963s
user 1m33,704s
sys 0m50,750s
The solution is: do not execute 10 million kernel mode switches.
Execute just one (or at least 4096 times fewer):
(define (write-stuff port)
(display (make-string 10000000 #\0) port))
(call-with-output-file "mit-scheme-2-tmp.txt" write-stuff)
(exit)
And the program requires just 11 seconds.
$ time ./mit-scheme-implementation-2
real 0m11,390s
user 0m11,270s
sys 0m0,096s
This is the reason buffering was invented in the C library:
https://www.gnu.org/software/libc/manual/html_node/Stream-Buffering.html#Stream-Buffering
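A middle ground between one kernel call per character and building one 10 MB string is to write in fixed-size chunks. A sketch (in Racket rather than MIT-Scheme, but the same shape applies there), assuming a 4096-character buffer:

```racket
#lang racket
;; Sketch: emit 10^7 zeros in 4096-character chunks, so the number of
;; port writes drops from 10,000,000 to about 2,442.
(define total 10000000)
(define chunk-size 4096)
(define chunk (make-string chunk-size #\0))

(define (write-stuff port)
  (let loop ([remaining total])
    (when (> remaining 0)
      ;; Write a full chunk, or a partial one on the last iteration.
      (write-string chunk port 0 (min remaining chunk-size))
      (loop (- remaining chunk-size)))))

(call-with-output-file "/tmp/chunked-tmp.txt" write-stuff #:exists 'replace)
```

This keeps memory use bounded by the chunk size while getting most of the benefit of the single-string version.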
I'm working on a Commodore 64 emulator as a fun project with functional programming. My goal was to write the entire thing functionally and as pure as possible. I was looking at using a hash table as my memory store, but the performance of mutable vs immutable hashes seems prohibitive. I liked the idea of a hash table as kind of sparse array of memory, since in many cases, memory won't actually be instantiated. I'd be fine using a vector as well, but there doesn't seem to be a functional version of vector-set.
(define (immut-hash [c (hash)] [r 10000000])
(when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))
(define (mut-hash [c (make-hash)] [r 10000000])
(when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))
(time (immut-hash)) is much worse than (time (mut-hash)) as a simulation of a bunch of memory pokes, and it puts the emulator beyond my MacBook Pro's ability to keep up with a C64 clock rate.
(a) Is there any better approach to improve the performance of the mutable hashes in this case?
(b) If not, is there another functional approach people would suggest?
Note - I know that this isn't likely the right solution for absolute performance. Like I said..learning.
I know this is an old discussion, but it is the top hit when searching for the performance of Racket's hash-set (i.e. the immutable, functional way of setting a hash key-value pair). Since 2019, when this question was posted and answered, the underlying Racket engine has changed to Chez Scheme, and the performance ratios have also changed significantly.
Rerunning the above tests (I've included mutable vector operations as well, since the OP mentioned it):
#lang racket
(define (immut-hash [c (hash)] [r 10000000])
(when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))
(define (mut-hash [c (make-hash)] [r 10000000])
(when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))
(define (mut-vec [c (make-vector 65536)] [r 10000000])
(when (> r 0) (vector-set! c (random #xffff) (random #xff)) (mut-vec c (- r 1))))
(time (immut-hash (hash)))
(time (immut-hash (hasheq)))
(time (mut-hash (make-hash)))
(time (mut-hash (make-hasheq)))
(time (mut-vec))
produces the following results:
cpu time: 4024 real time: 4409 gc time: 198
cpu time: 3991 real time: 4334 gc time: 188
cpu time: 2532 real time: 2631 gc time: 17
cpu time: 2432 real time: 2524 gc time: 21
cpu time: 1985 real time: 2173 gc time: 11
Conclusions from the year 2021 (using Racket's new Chez Scheme 8.x engine):
The performance degradation from using hash/make-hash instead of hasheq/make-hasheq has essentially been eliminated.
The performance degradation from using immutable hashes instead of mutable hashes has gone from over 4x to less than 2x.
The worst case scenario (immutable hash) is now only 2x worse than the best case scenario (mutable vectors).
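Since the OP mentioned the lack of a functional vector-set, one can also be written directly: copy-on-write, returning a fresh vector per update. It is O(n) per poke, so for a full 64 K memory an immutable hash is usually the better trade, but as a sketch (the functional-vector-set name is mine, not a library function):

```racket
#lang racket
;; Sketch: a functional vector-set that returns a new vector with one
;; slot changed, leaving the original untouched.  O(n) per update.
(define (functional-vector-set v i x)
  (define w (vector-copy v))
  (vector-set! w i x)
  w)

(define v0 (make-vector 8 0))
(define v1 (functional-vector-set v0 3 99))
(vector-ref v1 3)  ; => 99
(vector-ref v0 3)  ; => 0 (original unchanged)
```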
If you know that the keys of your hash will be fixnums, you can use hasheq (or make-hasheq) instead of hash (or make-hash). This gives better performance, at least for the Racket 7.4 3m variant on my MacBook Pro.
#lang racket
(define (immut-hash [c (hash)] [r 10000000])
(when (> r 0) (immut-hash (hash-set c (random #xffff) (random #xff)) (- r 1))))
(define (mut-hash [c (make-hash)] [r 10000000])
(when (> r 0) (hash-set! c (random #xffff) (random #xff)) (mut-hash c (- r 1))))
(time (immut-hash (hash)))
(time (immut-hash (hasheq)))
(time (mut-hash (make-hash)))
(time (mut-hash (make-hasheq)))
Here's the results:
cpu time: 9383 real time: 9447 gc time: 3181
cpu time: 6644 real time: 6658 gc time: 1105
cpu time: 2220 real time: 2225 gc time: 0
cpu time: 1647 real time: 1654 gc time: 0
There's a recent thread about performance of immutable hash. Jon compared the performance of immutable hash implemented by Patricia trie vs hash array mapped trie (HAMT), the hash type (eq? vs equal?), and the insertion order. You might want to take a look at the results.
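For the sparse-memory use case specifically, a thin purely functional interface over an immutable hasheq, where absent addresses read as 0, matches the "memory won't actually be instantiated" idea; a minimal sketch (the mem-* names are mine, not from any library):

```racket
#lang racket
;; Sketch of a functional sparse memory: an immutable hasheq keyed by
;; address, with uninstantiated addresses reading as 0.
(define empty-memory (hasheq))

(define (mem-ref mem addr)
  (hash-ref mem addr 0))

(define (mem-set mem addr byte)
  (hash-set mem addr byte))

;; Usage: each poke returns a new memory; the old one is untouched.
(define m (mem-set empty-memory #xD020 14))
(mem-ref m #xD020)  ; 14
(mem-ref m #x0400)  ; 0
```

Each CPU step can then thread the memory value through as an argument and return the updated one, keeping the emulator core pure.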
See EDIT 1, 2, and 3 for updates. I leave the complete research process here.
I know we can use typed/racket modules from untyped racket (and vice versa). But when doing so, the typed/racket module just behaves as if it was typed/racket/no-check, which disables optimizations and just uses it as a normal untyped module.
For example, if you have a typed/racket module like this:
#lang typed/racket
(require math)
(provide hello)
(define (hello [str : String])
(define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
(display (format "Hello ~a! Result is ~a" str result)))
And you want to use it in an untyped program like this:
#lang racket/base
(require "hello-matrix.rkt")
(hello "Alan Turing")
You'll get pretty bad performance results (in my case, I'm doing about 600000 matrix multiplications, the program doesn't even finish), while using #lang typed/racket makes my program finish in 3 seconds.
The downside is that my untyped code becomes infected with types, forcing me to write my whole program in TR, which would drive me crazy pretty quickly.
But my savior was not far away. I stumbled upon a funny April-Fools'-like package that Jay McCarthy wrote one cloudy dark night, called live-free-or-die, which pretty much does this:
http://docs.racket-lang.org/live-free-or-die/index.html
#lang racket/base
(require (for-syntax racket/base
typed-racket/utils/tc-utils))
(define-syntax (live-free-or-die! stx)
(syntax-case stx ()
[(_)
(syntax/loc stx
(begin-for-syntax
(set-box! typed-context? #t)))]))
(provide live-free-or-die!
(rename-out [live-free-or-die!
Doctor-Tobin-Hochstadt:Tear-down-this-wall!]))
By using it in my typed/racket module, like so:
#lang racket
(require live-free-or-die)
(live-free-or-die!)
(require math)
(provide hello)
(define (hello str)
(define result (do-some-crazy-matrix-operations))
(display (format "Hello ~a! Result is ~a" str result)))
Now my module is not #lang typed/racket anymore, but the results of running it are spectacular! It runs in 3 seconds, exactly as if it was a typed/racket module.
I am, of course, disgusted with that hack, and that's why I'm wondering if there could be a better solution to this, especially for making the matrix operations from math usable.
The Google Groups discussion about that crazy module Jay wrote is the only piece of information I could get.
https://groups.google.com/forum/#!topic/racket-users/JZoHYxwwJqU
People in this thread seem to say that the module is not useful anymore:
Matthias Felleisen
Well, now that our youngsters have easily debunked the package, we can let it die because it no longer wants to live.
Is there really a better alternative?
EDIT 1 - A testable example
If you want to test the performance difference, try using this definition of do-some-crazy-matrix-operations:
#lang typed/racket
(require math)
(provide hello)
(: do-some-crazy-matrix-operations : (-> (Matrix Flonum)))
(define (do-some-crazy-matrix-operations)
(define m1 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
(define m2 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
(for ([i 60000])
(set! m1 (matrix-map * m1 m2))
(set! m2 (matrix-map * m1 m2)))
(matrix+ m1 m2))
(define (hello [str : String])
(define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
(display (format "Hello ~a! Result is ~a" str result)))
(time (hello "Alan Turing"))
Using #lang typed/racket it runs in 288ms:
cpu time: 288 real time: 286 gc time: 16
Using #lang typed/racket/no-check it runs in 52 seconds:
cpu time: 52496 real time: 52479 gc time: 396
Using #lang racket and live-free-or-die it runs in 280ms:
cpu time: 280 real time: 279 gc time: 4
EDIT 2 - This was not the issue!
Following John Clement's answer, I discovered the examples were not enough to reproduce the real issue. Using typed/racket modules in untyped ones actually works fine.
My real problem is an issue with the boundary contracts created by a class that passes from untyped to typed racket.
Let's consider this implementation of hello-matrix.rkt:
#lang typed/racket
(require math)
(provide hello crazy% Crazy)
(define-type CrazyClass (Class (field [m1 (Matrix Flonum)])
(field [m2 (Matrix Flonum)])
(do (-> (Matrix Flonum)))))
(define-type Crazy (Instance CrazyClass))
(: crazy% CrazyClass)
(define crazy%
(class object%
(field [m1 (build-matrix 5 5 (lambda (x y) (add1 (random))))]
[m2 (build-matrix 5 5 (lambda (x y) (add1 (random))))])
(super-new)
(define/public (do)
(set! m1 (matrix* (matrix-transpose m1) m2))
(set! m2 (matrix* (matrix-transpose m1) m2))
(matrix+ m1 m2))))
(: do-some-crazy-matrix-operations : Crazy -> (Matrix Flonum))
(define (do-some-crazy-matrix-operations crazy)
(for ([i 60000])
(send crazy do))
(matrix+ (get-field m1 crazy) (get-field m2 crazy)))
(define (hello [str : String] [crazy : Crazy])
(define result : (Matrix Flonum) (do-some-crazy-matrix-operations crazy))
(display (format "Hello ~a! Result is ~a\n" str result)))
Then those two usages:
#lang typed/racket
(require "hello-matrix.rkt")
(define crazy : Crazy (new crazy%))
(time (hello "Alan Turing" crazy))
cpu time: 1160 real time: 1178 gc time: 68
#lang racket
(require "hello-matrix.rkt")
(define crazy (new crazy%))
(time (hello "Alan Turing" crazy))
cpu time: 7432 real time: 7433 gc time: 80
Using contract-profile:
Running time is 83.47% contracts
6320/7572 ms
BY CONTRACT
g66 # #(struct:srcloc hello-matrix.rkt 3 15 50 6)
3258 ms
(-> String (object/c (do (-> any/c (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float)))) (field (m1 (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float))) (m2 (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float))))) any) # #(struct:srcloc hello-matrix.rkt 3 9 44 5)
3062 ms
EDIT 3 - Passing struct from typed to untyped is more performant than passing class
Using a struct instead of a class fixes this:
hello-matrix.rkt:
#lang typed/racket
(require math)
(provide hello (struct-out crazy))
(struct crazy ([m1 : (Matrix Flonum)] [m2 : (Matrix Flonum)]) #:mutable)
(define-type Crazy crazy)
(define (crazy-do [my-crazy : Crazy])
(set-crazy-m1! my-crazy (matrix* (matrix-transpose (crazy-m1 my-crazy))
(crazy-m2 my-crazy)))
(set-crazy-m2! my-crazy (matrix* (matrix-transpose (crazy-m1 my-crazy))
(crazy-m2 my-crazy)))
(matrix+ (crazy-m1 my-crazy) (crazy-m2 my-crazy)))
(: do-some-crazy-matrix-operations : Crazy -> (Matrix Flonum))
(define (do-some-crazy-matrix-operations my-crazy)
(for ([i 60000])
(crazy-do my-crazy))
(matrix+ (crazy-m1 my-crazy) (crazy-m2 my-crazy)))
(define (hello [str : String] [my-crazy : Crazy])
(define result : (Matrix Flonum) (do-some-crazy-matrix-operations my-crazy))
(display (format "Hello ~a! Result is ~a\n" str result)))
Usage:
#lang typed/racket
(require "hello-matrix.rkt")
(require math)
(define my-crazy (crazy (build-matrix 5 5 (lambda (x y) (add1 (random))))
(build-matrix 5 5 (lambda (x y) (add1 (random))))))
(time (hello "Alan Turing" my-crazy))
cpu time: 1008 real time: 1008 gc time: 52
#lang racket
cpu time: 996 real time: 995 gc time: 52
I'm writing this as an "answer" to allow me to format my code... I think we're talking a bit past each other. Specifically, I can run your typed code from an untyped module in about half a second. I named your typed code file "hello-matrix.rkt", as you suggested, and then ran the untyped module that you provided
(the one that required the TR module) and it took the same amount of time (about half a second). Let me be careful in saying this:
Contents of "hello-matrix.rkt":
#lang typed/racket
(require math)
(provide hello)
(: do-some-crazy-matrix-operations : (-> (Matrix Flonum)))
(define (do-some-crazy-matrix-operations)
(define m1 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
(define m2 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
(for ([i 60000])
(set! m1 (matrix-map * m1 m2))
(set! m2 (matrix-map * m1 m2)))
(matrix+ m1 m2))
(define (hello [str : String])
(define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
(display (format "Hello ~a! Result is ~a" str result)))
(time (hello "Alan Turing"))
then, I call it from an untyped module, just like you said:
#lang racket/base
(require "hello-matrix.rkt")
(time (hello "Alan Turing"))
and here's the result:
Hello Alan Turing! Result is (array #[#[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0]])
cpu time: 719 real time: 710 gc time: 231
Hello Alan Turing! Result is (array #[#[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0]])
cpu time: 689 real time: 681 gc time: 184
That is, it takes the same time to call it from untyped racket that it does from typed racket.
This result might depend a bit on what version of DrRacket you're using; I'm using 6.11.
All of this is to demonstrate that TR code is still TR code, even if you call it from untyped code. I do believe that you're having performance problems, and I do believe that they're related to matrix operations, but this particular example doesn't illustrate them.
I've been experimenting with Clojure lately. I tried writing my own map function (two, actually) and timed them against the built-in function. However, my map functions are far slower than the built-in one. I'd like to know how to make my implementation faster; it should give me some insight into performance-tuning the Clojure algorithms I write. The first function (my-map) recurses with recur. The second version (my-map-loop) uses loop/recur, which was much faster than simply using recur.
(defn my-map
([func lst] (my-map func lst []))
([func lst acc]
(if (empty? lst)
acc
(recur func (rest lst) (conj acc (func (first lst)))))))
(defn my-map-loop
([func lst]
(loop [acc []
inner-lst lst]
(if (empty? inner-lst)
acc
(recur (conj acc (func (first inner-lst))) (rest inner-lst))
))))
(let [rng (range 1 10000)]
(time (map #(* % %) rng))
(time (my-map #(* % %) rng))
(time (my-map-loop #(* % %) rng)))
These are the results I got -
"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 msecs"
Update
After resueman pointed out that I was timing things incorrectly, I changed the functions to:
(let [rng (range 1 10000)]
(time (doall (map #(* % %) rng)))
(time (doall (my-map #(* % %) rng)))
(time (doall (my-map-loop #(* % %) rng)))
nil)
These are the new results:
"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 msecs"
"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 msecs"
"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 msecs"
"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 msecs"
Looks like my second implementation is the fastest of the bunch. Anyway, I would still like to know if there are ways to optimize it further.
There are many things that could be leveraged to make a faster map: transients (for your accumulator), chunked seqs (for the source, though they only make sense when you want lazy output), reducible collections (for the source again), and getting more familiar with the core functions (there is a mapv).
You should also consider using Criterium instead of time if only for the fact that it checks whether your JVM optimizations are capped (which is the default with lein).
=> (let [rng (range 1 10000)]
(quick-bench (my-map-loop #(* % %) rng))
(quick-bench (into [] (map #(* % %)) rng)) ; leveraging reducible collections and transients
(quick-bench (mapv #(* % %) rng))) ; existing core fn
(output elided to keep only the means)
Execution time mean : 776,364755 µs
Execution time mean : 409,737852 µs
Execution time mean : 456,071295 µs
It is interesting to note that mapv is no faster than (into [] (map #(* % %)) rng), which is a generic way of optimizing these kinds of computations.