Strange aget optimisation behavior - performance

Followup on this question about aget performance
There seems to be something very strange going on optimisation wise. We knew the following was true:
=> (def xa (int-array (range 100000)))
=> (set! *warn-on-reflection* true)
=> (time (reduce + (for [x xa] (aget ^ints xa x))))
"Elapsed time: 42.80174 msecs"
=> (time (reduce + (for [x xa] (aget xa x))))
"Elapsed time: 2067.673859 msecs"
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
However, some further experimenting really weirded me out:
=> (for [f [get nth aget]] (time (reduce + (for [x xa] (f xa x)))))
("Elapsed time: 71.898128 msecs"
"Elapsed time: 62.080851 msecs"
"Elapsed time: 46.721892 msecs"
4999950000 4999950000 4999950000)
No reflection warnings, no hints needed. Same behavior is seen by binding aget to a root var or in a let.
=> (let [f aget] (time (reduce + (for [x xa] (f xa x)))))
"Elapsed time: 43.912129 msecs"
Any idea why a bound aget seems to 'know' how to optimise, where the core function doesn't?

It has to do with the :inline directive on aget, which expands to (. clojure.lang.RT (aget ~a (int ~i))), whereas the normal function call involves the Reflector. Try these:
user> (time (reduce + (map #(clojure.lang.Reflector/prepRet
(.getComponentType (class xa)) (. java.lang.reflect.Array (get xa %))) xa)))
"Elapsed time: 63.484 msecs"
user> (time (reduce + (map #(. clojure.lang.RT (aget xa (int %))) xa)))
Reflection warning, NO_SOURCE_FILE:1 - call to aget can't be resolved.
"Elapsed time: 2390.977 msecs"
You might wonder what's the point of inlining, then. Well, check out these results:
user> (def xa (int-array (range 1000000))) ;; going to one million elements
user> (let [f aget] (time (dotimes [n 1000000] (f xa n))))
"Elapsed time: 187.219 msecs"
user> (time (dotimes [n 1000000] (aget ^ints xa n)))
"Elapsed time: 8.562 msecs"
It turns out that in your example, as soon as you get past reflection warnings, your new bottleneck is the reduce + part and not array access. This example eliminates that and shows an order-of-magnitude advantage of the type-hinted, inlined aget.

When you call through a higher-order function, all arguments are cast to Object. In these cases the compiler can't figure out the type for the function being called because it is unbound when the function is compiled. It can only be determined that it will be something that can be called with some arguments. No warning is printed because anything will work.
user> (map aget (repeat xa) (range 100))
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
You have found the edge where the Clojure compiler gives up and just uses Object for everything. (This is an oversimplified explanation.)
If you wrap this in anything that gets compiled on its own (like an anonymous function), then the warnings become visible again, though they come from compiling the anonymous function, not from compiling the call to map.
user> (map #(aget %1 %2) (repeat xa) (range 100))
Reflection warning, NO_SOURCE_FILE:1 - call to aget can't be resolved.
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
The warning then goes away once a type hint is added to the call inside the anonymous function, which is otherwise unchanged.


Find location of element in a vector

I'm new to APL and I would like to find the position of an element(s) within a vector. For example, if I create a vector of 50 random numbers:
lst ← 50 ? 100
How can I find the positions of 91 assuming it occurs 3 times in the vector?
I'm not an expert, but a simple way is to just select the elements from ⍳ ⍴ lst (the indices of lst) where the corresponding element in lst is 91
With Dyalog 16.0, you can use the new monadic function ⍸ "Where".
lst=91 gives a vector of 0s and 1s. Applying ⍸ on this gives the locations of all the 1s. This also works if lst is a matrix.
Thanks to ngn, Cows_quack and Probie. I should have read Mastering Dyalog APL more carefully as it also mentions this on page 126. So taking all the answers together:
⍝ Generate a list of 100 non-unique random numbers
lst ← ?100⍴100
⍝ How many times does 1, for example, appear in the vector?
+/ (lst = 1) ⍝ Appears twice
⍝ Find the locations of 1 in the vector using the compress function
(lst = 1) / ⍳ ⍴ lst
2 37 ⍝ Positions 2 and 37
So to break down the solution: (i) (lst = 1) generates a boolean vector that is true wherever the value 1 occurs in lst; (ii) compressing the index vector ⍳ ⍴ lst by that boolean vector creates a new vector with the positions of 'true' in lst.
Correct me if my description is off?
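For readers coming from other languages, the same mask-then-compress idea can be sketched in Python. This is only an illustration and not APL; the names positions and mask are made up here, and indices are 1-based on the assumption that the APL session uses the default index origin.

```python
def positions(lst, target):
    """Return the 1-based positions of target in lst, like (lst = target) / ⍳ ⍴ lst."""
    # Step (i): boolean mask, True wherever the element equals target.
    mask = [value == target for value in lst]
    # Step (ii): compress the index vector 1..len(lst) by that mask.
    indices = range(1, len(lst) + 1)
    return [i for i, keep in zip(indices, mask) if keep]


print(positions([5, 1, 7, 1, 9], 1))  # [2, 4]
```

Summing the mask instead of compressing by it gives the count, which is what +/ does above.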
Using the 'Where' function makes it more readable (though the previous method shows how the APL mindset of array programming is used to solve it):
⍸ lst = 1
2 37 ⍝ Positions 2 and 37
Thanks for your time on this!
While your question has already been amply answered, you may be interested in the Key operator, ⌸. When its derived function is applied monadically, it takes a single operand and applies it once for each unique element in the argument. The function is called with the unique element as left argument and the list of its indices as right argument:
lst ← ?100⍴10
{⍺ ⍵}⌸lst
│3 │1 3 9 28 37 38 55 70 88 │
│10│2 6 13 17 30 59 64 66 71 82 83 96 │
│7 │4 5 12 15 20 52 54 68 74 85 89 91 92 │
│9 │7 11 24 47 53 58 69 86 90 │
│8 │8 14 16 21 43 51 63 67 73 80 │
│2 │10 18 26 27 34 36 48 78 79 87 │
│1 │19 25 31 32 33 42 57 65 75 84 97 98 99 100│
│6 │22 23 45 46 50 60 76 94 │
│5 │29 49 56 61 72 77 93 95 │
│4 │35 39 40 41 44 62 81 │

Random assignment of elements in a list

I have a list (of N ALISTs) that I need to partition into k mutually exclusive, collectively exhaustive lists (that don't necessarily have to be the same length). That is, I need a function that will do something like the following (for N = 6, k = 2):
(my-function (listA listB listC listD listE listF))
#=> ((listA listC listF listD) (listB listE))
The approach I was thinking about goes along these lines (assigning a number up to k for each member of the ALIST and then grouping those items based on their assignment). Maybe it's stupid, I'm not sure.
(defun make-solution (problem)
"Generates random initial solution to be later explored"
(let ((assignments (mapcar #'(lambda (request) (random *fleet-size*)) problem)))
; maybe something to group back the elements of problem according to their value in assignments?
Any hints on what to fill in? Maybe a better approach? For context, what I'm doing is randomly creating an initial population for a vehicle routing problem that I can later iterate on with my local search.
Something like this maybe:
CL-USER 39 > (pprint
(let ((l (loop for i from 1 upto 100 collect i)))
(flet ((part (l k &aux (r (make-array k :initial-element nil)))
(loop while l
do (push (pop l) (aref r (random k))))
(coerce r 'list)))
(part l 7))))
((98 94 89 87 85 84 78 71 68 53 42 38 35 33 27 26 5 3)
(93 86 65 55 54 37 23 18 11 10 2)
(92 91 82 69 67 62 61 59 56 52 44 36 34 22 21 12 7)
(97 77 76 70 57 47 46 45 43 32 17 14 4)
(96 95 90 88 83 81 80 73 58 49 48 39 30 25 19 8 6)
(75 63 60 41 31 24 15 9 1)
(100 99 79 74 72 66 64 51 50 40 29 28 20 16 13))
Preserve the order:
CL-USER 40 > (pprint
(let ((l (loop for i from 1 upto 100 collect i)))
(flet ((part (l k &aux (r (make-array k :initial-element nil)))
(loop while l
do (push (pop l) (aref r (random k))))
(map 'list #'reverse r)))
(part l 7))))
((6 13 14 22 24 40 44 55 57 58 64 66 67 74 78 92 95 96)
(7 11 23 26 27 28 81 91)
(3 5 8 9 10 20 21 33 35 36 42 45 47 63 69 72 75 80 88 89 98)
(2 16 32 43 53 68 71 76 79 84 87 90 93 94 97)
(1 4 12 15 18 25 30 39 41 46 48 51 54 59 65 73 83 100)
(17 19 29 31 34 37 38 49 56 85 86)
(50 52 60 61 62 70 77 82 99))
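The "assign each item a random group, then collect by group" idea from the question can also be sketched outside Lisp. The Python below is a minimal illustration under that assumption; random_partition and its rng parameter are hypothetical names, not part of any answer above.

```python
import random


def random_partition(items, k, rng=random):
    """Split items into k mutually exclusive, collectively exhaustive lists."""
    groups = [[] for _ in range(k)]
    for item in items:
        # Each item goes to one uniformly chosen group, so groups may be
        # empty or of unequal length; appending keeps the original order.
        groups[rng.randrange(k)].append(item)
    return groups


parts = random_partition(list(range(1, 101)), 7)
```

Because items are appended rather than pushed, no reversal step is needed to preserve the input order within each group.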

Selecting the "P" in Prune and Search Algorithm

Note: the diagram above shows a partition into groups of 5 (the columns). The horizontal box denotes the median values of each partition. The 'P' item indicates the median of medians.
Most of the research I have seen uses this picture when selecting "P", and it always has an odd number of elements. But what if the number of elements you have is even?
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
How do you get your "P" in an even set of elements?
This explanation gives the detail I think you're looking for:
The median of the set plays a special role in this algorithm, and it
is defined as the i-smallest item where i = (n+1)/2 if n is odd and i =
n/2 or (n+2)/2 if n is even.
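As a small worked example of that definition, here is a Python sketch that returns the allowed rank(s) i for a given n and applies them to the 60 numbers from the question. The helper name median_ranks is made up for illustration; either rank is an acceptable choice for an even n.

```python
def median_ranks(n):
    """Return the 1-based rank(s) i of the median among n items."""
    if n < 1:
        raise ValueError("n must be positive")
    if n % 2 == 1:
        return [(n + 1) // 2]          # odd n: a single middle item
    return [n // 2, (n + 2) // 2]      # even n: lower or upper median


values = list(range(1, 61))            # the 60 numbers from the question
lower, upper = median_ranks(len(values))
print(sorted(values)[lower - 1], sorted(values)[upper - 1])  # 30 31
```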

Glissando Function whose arguments are the extremes of the codomain [closed]

I'm a musician and I am playing with writing a function in Clojure to reproduce a simple glissando between the pitches A4 and A5 (frequencies 440Hz and 880Hz, respectively), with an exponential curve, but I'm running into trouble. Basically I want to use it like this:
(def A4 440)
(def A5 880)
(gliss A4 A5)
which should give me something like:
=>(441 484 529 576 625 676 729 784 841)
except I would eventually like to also give it a sample-rate as a third argument.
This kind of works:
;; expt is assumed to be referred from clojure.math.numeric-tower
(defn gliss
  [start-pitch end-pitch s-rate]
  (let [f (fn [x]
            (expt x 2))]
    (remove nil?
            (map
             (fn [x]
               (when (and
                      (>= (f x) start-pitch)
                      (<= (f x) end-pitch))
                 (f x)))
             (range 0 10000 s-rate)))))
I guess the problem is the way I want to use the function. Instead of saying something like "glissando from x1 to x2 where f(x) = x^2" I'm really trying to say "glissando from f(x) == 440 to f(x) == 880" so I'm not really given a range of x to work with initially, hence why I just hard-coded 0 to 10000 in this case, but that's ugly.
What is a better way to accomplish what I'm trying to do?
Update: I made a mistake in terminology that needs fixing (for all the hordes of people who will come here looking to notate a glissando in Clojure). The third argument isn't really sample-rate, it should be number-of-samples. In other words, the sample rate (which might be 44100Hz or 48000Hz, etc.) determines the number of samples you will need for a particular duration of time. If you needed a glissando with an e exponential curve from A4 to A5 over a duration of 500 milliseconds at a sampling rate of 44100, you might use these functions:
(defn gliss
  [start end samples]
  ;; scale the curve (0..1] by the pitch span and offset it by start
  (map #(+ start
           (* (math/expt (/ (inc %) samples) 2.718281828)
              (- end start)))
       (range samples)))
(defn ms-to-samps
  [ms s-rate]
  (/ (* ms s-rate) 1000))
like this:
(def A4 440)
(def A5 (* A4 2))
(def s-rate 44100) ;; historic CD quality sample rate
(gliss A4 A5 (ms-to-samps 500 s-rate))
Here's a simple exponential curve spread over the frequency range, using rate samples:
(ns hello.exp
(:require [clojure.math.numeric-tower :as math]))
(defn gliss [start end rate]
(map #(+ start (* (math/expt (/ (inc %) rate) 2.718281828) (- end start)))
(range rate)))
This does not exactly fit your gliss curve because I'm using e as the exponent, though I suspect it would sound good if you feed it to Overtone ;) I suspect that a proper musical gliss would use an exponent of 1 in this function, from what I read in the Wikipedia article.
hello.exp> (gliss 440 880 5)
(445.5393041947095 476.4535293633514 549.7501826896913 679.8965206341077 880.0)
hello.exp> (map int (gliss 440 880 100))
(440 440 440 440 441 441 442 442 443 444 445 446 447 448 449
451 452 454 455 457 459 461 463 465 467 469 472 474 477 479
482 485 487 490 493 497 500 503 506 510 513 517 521 525 529
533 537 541 545 550 554 558 563 568 573 577 582 588 593 598
603 609 614 620 625 631 637 643 649 655 661 668 674 680 687
694 700 707 714 721 728 735 743 750 757 765 773 780 788 796
804 812 820 828 837 845 853 862 871 880)
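For comparison, the same curve can be sketched in Python. This is only a port of the gliss and ms-to-samps functions above under a few assumptions: the exponent is exposed as a hypothetical parameter defaulting to math.e (the Clojure code hard-codes 2.718281828), and ms_to_samps rounds down to a whole number of samples.

```python
import math


def gliss(start, end, samples, exponent=math.e):
    """Frequencies rising from just above start up to end over samples steps."""
    span = end - start
    # (i + 1) / samples runs from 1/samples to 1, so the last value is end.
    return [start + ((i + 1) / samples) ** exponent * span
            for i in range(samples)]


def ms_to_samps(ms, s_rate):
    """Whole number of samples covering ms milliseconds at s_rate samples/second."""
    return (ms * s_rate) // 1000  # assumption: round down


curve = gliss(440, 880, ms_to_samps(500, 44100))
```

Passing exponent=1 gives a straight line between the two frequencies, which makes it easy to compare curve shapes by ear.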

Data mining for integers with exact fitting

I deal with RFID cards a lot. There are as many different outputs and encodings of the same type of card as there are different readers.
I frequently get requests to work out (if possible) how to translate one output to another, which means I have to stare at these numbers and figure out what the transformations are.
Most common transforms are
added constant
reversed binary sequence
cutting a few bits away
combinations of these methods
I usually have something like a 30% success rate, but I always get frustrated when, after a few hours, I cannot find the translation. It's probably very simple but I just cannot figure it out. That is why I am looking for a kind of algorithm/library/software that would check these rules automatically on two sets of numbers and try to find the transformation with the smallest Kolmogorov complexity.
Since I have zero knowledge about data mining I would be thankful for any pointers.
This seems like a genetic programming problem.
The 'genes' are the individual bit transformations that can occur. The fitness function is how many bits are correctly transformed for growing input sets. A genetic programming library can shuffle genes around trying to find better fitness, "breeding" the individuals who have high fitness levels to attempt to create a more fit individual.
Check out Pyevolve.
I don't know what's the length of the numbers but let's assume they are 64-bit. The number of different non-trivial atomic transformations is then as follows
Added constant 2**64 - 1
Reversal 1
Remove bits 63
Rotation 63
If you have combinations also, you have 4 + 12 + 24 + 24 = 64 different ways to order a subset of the transformations (without taking the parameters of the transformation into account). So what I would do is to
Have an outer loop that iterates over the 64 ways to combine the transformations
Then have an inner loop that iterates over the maximum 63 * 63 parameter values for "remove bits" and rotation; now the total number of iterations is ~ 64**3 == (2**6)**3 == 2**18, which is okay
Apply the hypothetical transformation (one out of 2**18), and then calculate the differences between the first data set and the second data set transformed; if the difference is constant you have found the additive constant for the "added constant" transformation and are done
This should be very fast on a modern PC, i.e. you should be able to find the solution in a couple of seconds. If the data sets are large (> 100) you can use a sample first and then validate the result on the whole data set only when the subset works out correctly.
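The search described above can be sketched in Python. To keep it short, this sketch makes several assumptions: it leaves out the "remove bits" step, tries only one ordering (optional reversal, then rotation, then added constant) rather than all 64, and treats the constant as addition modulo 2**width. The function names are hypothetical.

```python
def rotl(x, r, width):
    """Rotate the width-bit value x left by r bits (0 <= r < width)."""
    mask = (1 << width) - 1
    x &= mask
    return ((x << r) | (x >> (width - r))) & mask


def reverse_bits(x, width):
    """Reverse the order of the lowest width bits of x (x must fit in width bits)."""
    return int(format(x, "0{}b".format(width))[::-1], 2)


def find_transform(src, dst, width=64):
    """Search for (reversed?, rotation, constant) mapping every src to its dst."""
    mask = (1 << width) - 1
    for rev in (False, True):
        for r in range(width):
            guess = [rotl(reverse_bits(a, width) if rev else a, r, width)
                     for a in src]
            # If one additive constant explains every pair, we are done.
            diffs = {(b - g) & mask for g, b in zip(guess, dst)}
            if len(diffs) == 1:
                return rev, r, diffs.pop()
    return None
```

With only a few card pairs a wrong combination can match by chance, which is why validating the result on the full data set, as suggested above, still matters.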
I wrote a small proof of concept. Here is what I have done.
I generated ten random binary strings with 64 digits as card content examples produced by a reference reader.
Then I generated a random mapping table to simulate the different output of another reader for the same ten cards. It has the format i -> j meaning bit i from the reference content occurs as bit j on the other reader.
4 -> 0 4 -> 1 49 -> 2 32 -> 3 51 -> 4 52 -> 5 10 -> 6 47 -> 7
16 -> 8 32 -> 9 14 -> 10 24 -> 11 13 -> 12 1 -> 13 8 -> 14 47 -> 15
12 -> 16 56 -> 17 55 -> 18 22 -> 19 6 -> 20 33 -> 21 22 -> 22 45 -> 23
37 -> 24 39 -> 25 46 -> 26 47 -> 27 25 -> 28 15 -> 29 43 -> 30 13 -> 31
33 -> 32 31 -> 33 16 -> 34 49 -> 35 0 -> 36 30 -> 37 28 -> 38 31 -> 39
45 -> 40 28 -> 41 17 -> 42 18 -> 43 40 -> 44 18 -> 45 23 -> 46 54 -> 47
11 -> 48 54 -> 49 41 -> 50 39 -> 51 28 -> 52 31 -> 53 1 -> 54 34 -> 55
45 -> 56 4 -> 57 59 -> 58 11 -> 59 6 -> 60 26 -> 61 21 -> 62 0 -> 63
52 -> 64 1 -> 65 55 -> 66 46 -> 67 49 -> 68 23 -> 69 47 -> 70 45 -> 71
28 -> 72 23 -> 73 41 -> 74 41 -> 75 16 -> 76 4 -> 77 4 -> 78 18 -> 79
For example, the first two bits (0 and 1) of the other reader's output both equal bit 4 of the reference reader's output. The simulated output is 80 bits wide; some bits are duplicated and others may be missing.
Now we want to find the mapping between both data sets. For this we just look at the correlation between the bits. That means for each combination of a bit index i (0 to 63) produced by the reference reader and each bit index j (0 to 79) produced by the other reader we just count how many examples have matching bits at these positions.
0 3545.65456386363765535465634568436588433683666585575745656555647
1 3545.65456386363765535465634568436588433683666585575745656555647
2 4474534385457494458444376567873567875326554355654.48656783626534
3 64743565476336564544465525456333.7853368114353456646256765446574
4 669655438567727425624459656765356787734855435365684.656765446734
5 4656752763479454654464558367475525457744794735656666.72365644734
6 8676354543.33636254244956545235365655566334533456446476543266354
7 33638674585643657453154636365462565864334656463.5555545874333445
8 2434756747255656.54646334345655545255742576757474462652565642556
9 64743565476336564544465525456333.7853368114353456646256765446574
10 33636452963663.5549533285656964656786215665466763937547874535425
11 685853256165765447646475.365475725457744774555634666874343666536
12 6636334723613.36674666716343455565433764334555434462474343666376
13 6.5.554543655634494466758363455745457766554373434486654325686758
14 44745543.5477296438442396567853745677326776555854828656765444514
15 33638674585643657453154636365462565864334656463.5555545874333445
16 556564367438.363545575467478584636546655885646747757963476735843
17 42725365674574746364463545696733676535445565374768466547.3604552
18 5383647278564385547315483636744478786235445466585737347.74335445
19 9767443632944725363355.47434146456546675443644547355585434377465
20 445455.349453454656648352765655565453564538377274464216765644576
21 758562545656656356533766543656447.766455443466567757565874537665
22 9767443632944725363355.47434146456546675443644547355585434377465
23 534362745636656374775744565658663634465186766.565553545674735465
24 5747445834546725763577627276364634124.73667646345373761254753665
25 667637456565545625446457456563358585536.334351636648456547466754
26 6454552583477476436662576545655745657346774755.36626676547466534
27 33638674585643657453154636365462565864334656463.5555545874333445
28 5343667256564363347755463.54568454764255645466565555329656557465
29 445437476563365.634442554345613563455546356733654424474545262334
30 5343663854566547723553645436346434346653685.26767333783456353443
31 6636334723613.36674666716343455565433764334555434462474343666376
32 758562545656656356533766543656447.766455443466567757565874537665
33 5747445474366565567775467474764.34346655867486723555545436775647
34 2434756747255656.54646334345655545255742576757474462652565642556
35 4474534385457494458444376567873567875326554355654.48656783626534
36 .676334543855634254466956545255567635586534555638446476545468574
37 552586543456454356575564583236.434566453663666565373547436577467
38 2454556387255496658644174565.53765675326556375652846436765644536
39 5747445474366565567775467474764.34346655867486723555545436775647
40 534362745636656374775744565658663634465186766.565553545674735465
41 2454556387255496658644174565.53765675326556375652846436765644536
42 59496454345447435.5557647452566656566655443284343595545434777669
43 445453618545549445.644376765875745675324756377652846438763646336
44 5545645474388363547775467676586814346653.87668745555745456755645
45 445453618545549445.644376765875745675324756377652846438763646336
46 55856652965861853473334.5656744656788237665464765739547856355625
47 645455616565347425864457494365756587514653437565464623.745468356
48 55658654763.8163545555485656566636568455885666767557745658555845
49 645455616565347425864457494365756587514653437565464623.745468356
50 35458636743883657455534674565666143686338.5846765555963456553625
51 667637456565545625446457456563358585536.334351636648456547466754
52 2454556387255496658644174565.53765675326556375652846436765644536
53 5747445474366565567775467474764.34346655867486723555545436775647
54 6.5.554543655634494466758363455745457766554373434486654325686758
55 6474554365655474256444574745655387.75148332353656848458765448554
56 534362745636656374775744565658663634465186766.565553545674735465
57 3545.65456386363765535465634568436588433683666585575745656555647
58 68385745436536364746647565414377434575665545736342644563074.6558
59 55658654763.8163545555485656566636568455885666767557745658555845
60 445455.349453454656648352765655565453564538377274464216765644576
61 46563565654574544566863565.7673743433766758355434666634365844754
62 665653852745563267466.534565475567433784536377256484434565846796
63 .676334543855634254466956545255567635586534555638446476545468574
64 4656752763479454654464558367475525457744794735656666.72365644734
65 6.5.554543655634494466758363455745457766554373434486654325686758
66 5383647278564385547315483636744478786235445466585737347.74335445
67 6454552583477476436662576545655745657346774755.36626676547466534
68 4474534385457494458444376567873567875326554355654.48656783626534
69 55856652965861853473334.5656744656788237665464765739547856355625
70 33638674585643657453154636365462565864334656463.5555545874333445
71 534362745636656374775744565658663634465186766.565553545674735465
72 2454556387255496658644174565.53765675326556375652846436765644536
73 55856652965861853473334.5656744656788237665464765739547856355625
74 35458636743883657455534674565666143686338.5846765555963456553625
75 35458636743883657455534674565666143686338.5846765555963456553625
76 2434756747255656.54646334345655545255742576757474462652565642556
77 3545.65456386363765535465634568436588433683666585575745656555647
78 3545.65456386363765535465634568436588433683666585575745656555647
79 445453618545549445.644376765875745675324756377652846438763646336
Above are the results from this with a dot representing ten matches. As you can see this recovers the mapping for all bits except bits 13, 54, and 65 where two possible matches are found.
77 out of 80 bits with only ten samples is quite good. Admittedly this will not work as well if the bit patterns contain structure and are not just random bits, or if you have to take bits computed from several other bits into account. But if you have access to large enough sample sets you can uncover all possible mappings.
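A minimal Python sketch of this counting experiment follows. It is not the original proof-of-concept code, which was not posted; the function names, the fixed seed and the list-of-bits representation are all assumptions made for illustration.

```python
import random


def match_counts(reference, other):
    """counts[j][i] = number of samples where other bit j equals reference bit i."""
    return [[sum(o[j] == r[i] for r, o in zip(reference, other))
             for i in range(len(reference[0]))]
            for j in range(len(other[0]))]


def candidates(counts, n_samples):
    """For each output bit j, list the reference bits i that matched in every sample."""
    return [[i for i, c in enumerate(row) if c == n_samples] for row in counts]


rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
reference = [[rng.randint(0, 1) for _ in range(64)] for _ in range(10)]
mapping = [rng.randrange(64) for _ in range(80)]  # output bit j copies reference bit mapping[j]
other = [[card[i] for i in mapping] for card in reference]

found = candidates(match_counts(reference, other), len(reference))
```

Each entry of found always contains the true source bit; entries with more than one candidate correspond to the ambiguous rows with two dots in the table above, and more samples would narrow them down.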
