Glissando function whose arguments are the extremes of the codomain [closed]

I'm a musician, and I'm experimenting with writing a function in Clojure to produce a simple glissando between the pitches A4 and A5 (frequencies 440 Hz and 880 Hz, respectively), with an exponential curve, but I'm running into trouble. Basically I want to use it like this:
(def A4 440)
(def A5 880)
(gliss A4 A5)
which should give me something like:
=>(441 484 529 576 625 676 729 784 841)
except I would eventually like to also give it a sample-rate as a third argument.
This kind of works:
(defn gliss
  [start-pitch end-pitch s-rate]
  (let [f (fn [x] (expt x 2))] ;; expt from clojure.math.numeric-tower
    (remove nil?
            (map (fn [x]
                   (when (and (>= (f x) start-pitch)
                              (<= (f x) end-pitch))
                     (f x)))
                 (range 0 10000 s-rate)))))
I guess the problem is the way I want to use the function. Instead of saying something like "glissando from x1 to x2 where f(x) = x^2", I'm really trying to say "glissando from f(x) == 440 to f(x) == 880", so I'm not given a range of x to work with initially, which is why I hard-coded 0 to 10000 in this case, but that's ugly.
What is a better way to accomplish what I'm trying to do?
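For illustration only (this sketch is not from the original post): one way to avoid the hard-coded range is to derive the x endpoints from the inverse of f, here assuming f(x) = x^2 so that the inverse is Math/sqrt:
;; Illustrative sketch: walk x from (f-inv start) to (f-inv end) and map f
;; back over it, so no arbitrary 0..10000 range is needed.
(defn gliss [start-pitch end-pitch step]
  (let [f     (fn [x] (* x x))
        f-inv (fn [y] (Math/sqrt y))]
    (map f (range (f-inv start-pitch) (f-inv end-pitch) step))))
;; (gliss 440 880 1) yields frequencies climbing from 440.0 toward 880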
Update: I made a mistake in terminology that needs fixing (for all the hordes of people who will come here looking to notate a glissando in Clojure). The third argument isn't really a sample rate; it should be the number of samples. In other words, the sample rate (which might be 44100 Hz or 48000 Hz, etc.) determines the number of samples you will need for a particular duration of time. If you needed a glissando with an exponential curve (exponent e) from A4 to A5 over a duration of 500 milliseconds at a sampling rate of 44100 Hz, you might use these functions:
(defn gliss
  [start end samples]
  (map #(+ start
           (* (math/expt (/ (inc %) samples) 2.718281828)
              (- end start)))
       (range samples)))

(defn ms-to-samps
  [ms s-rate]
  (/ (* ms s-rate) 1000))
like this:
(def A4 440)
(def A5 (* A4 2))
(def s-rate 44100) ;; historic CD quality sample rate
(gliss A4 A5 (ms-to-samps 500 s-rate))
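(Here (ms-to-samps 500 s-rate) evaluates to 22050, i.e. half a second's worth of samples at 44.1 kHz.)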

Here's a simple exponential curve distributed over the frequency range, using rate samples:
(ns hello.exp
  (:require [clojure.math.numeric-tower :as math]))

(defn gliss [start end rate]
  (map #(+ start (* (math/expt (/ (inc %) rate) 2.718281828) (- end start)))
       (range rate)))
This does not exactly fit your gliss curve, because I'm using e as the exponent, though I suspect it would sound good if you fed it to Overtone ;) From what I read in the Wikipedia article, I suspect that a proper musical gliss would use an exponent of 1 in this function.
hello.exp> (gliss 440 880 5)
(445.5393041947095 476.4535293633514 549.7501826896913 679.8965206341077 880.0)
hello.exp> (map int (gliss 440 880 100))
(440 440 440 440 441 441 442 442 443 444 445 446 447 448 449
451 452 454 455 457 459 461 463 465 467 469 472 474 477 479
482 485 487 490 493 497 500 503 506 510 513 517 521 525 529
533 537 541 545 550 554 558 563 568 573 577 582 588 593 598
603 609 614 620 625 631 637 643 649 655 661 668 674 680 687
694 700 707 714 721 728 735 743 750 757 765 773 780 788 796
804 812 820 828 837 845 853 862 871 880)
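As an aside (my assumption, not something the answer above claims): a glissando that sounds linear in pitch is exponential in frequency, i.e. each step multiplies the frequency by a constant ratio. A minimal sketch, assuming n >= 2 evenly spaced samples:
;; Pitch-linear glissando: equal time steps cover equal musical intervals,
;; so the frequency grows geometrically from start to end.
(defn pitch-gliss [start end n]
  (let [ratio (Math/pow (/ end start) (/ 1.0 (dec n)))]
    (map #(* start (Math/pow ratio %)) (range n))))
;; (pitch-gliss 440 880 5)
;; => approximately (440.0 523.25 622.25 739.99 880.0), an octave in equal steps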

Related

reformulating for loop with vectorization or other approach - octave

Is there any way to vectorize (or reformulate) each body of the loop in this code:
col = load('col-deau');        % load data
h = col(:,8);                  % corresponding water column
dates = col(:,3);              % and its dates
% removing out-of-bound data: build the mask once, before filtering h
valid = (h ~= 9999.000);
dates = sort(dates(valid));
h = h(valid);
[k, hcat] = hist(h, nbin);     % classes (k) and class boundaries (hcat) of the water column
dcat = 1:15;                   % boundaries for dates
for k = 1:length(dcat)-1       % loop over each date class (note: this k shadows the hist counts above)
  ii = find(dates >= dcat(k) & dates < dcat(k+1));  % dates falling within this date class
  for j = 1:length(hcat)-1     % loop over each water-column class
    ij = find(h >= hcat(j) & h < hcat(j+1));        % water columns falling within this class
    obs(k,j) = length(intersect(ii, ij));           % size of each intersection
  end
end
I've tried using vectorization, for example, to change this part:
for k = 1:length(dcat)-1
  ii = find(dates >= dcat(k) & dates < dcat(k+1));
endfor
with this:
nk = 1:length(dcat)-1;
ii2 = find(dates >= dcat(nk) & dates < dcat(nk+1));
and also using bsxfun:
ii2 = find(bsxfun(@and, bsxfun(@ge, dates, nk), bsxfun(@lt, dates, nk+1)));
but to no avail. Both of these approaches produce identical output, which does not correspond to that of the for loop (in terms of elements and vector size).
For information, h is a vector which contains water column in meters and dates is a vector (integer with two digits) which contains the dates in which the measurement for a corresponding water column was taken.
The input file can be found here: https://drive.google.com/open?id=1EomLGYleaNtiGG2iV_9LRt425blxdIsm
As for the output, I want to have ii like this:
ii = (1177:1272)'
(a 96-element column vector of the consecutive indices 1177 through 1272)
With the first approach, instead, I get ii2, which is very different in both values and vector size (I can't post the result because the vector is too big).
Can someone help a desperate newbie here? I just need to reformulate the loop part into a better, more concise version.
If more details need to be added, please feel free to ask me.
You can use hist3:
pkg load statistics
[obs, ~] = hist3([dates(:) h(:)], 'Edges', {dcat, hcat});

Joining two matrices, one with numbers and the other percentages

I have two matrices, cases and percentages. I want to combine both with the columns alternating between the two i.e. cases [c1] percent [c1] cases [c2] percent [c2]...
tab year region if sex==1, matcell(cases)
tab year region, matcell(total)
mata:st_matrix("percent", 100 * st_matrix("cases"):/st_matrix("total"))
matrix list cases
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 1313 1289 1121 1176 1176 1150 1190 1184 1042 940
r2 340 359 357 366 383 332 406 367 352 272
r3 260 246 266 265 270 259 309 306 266 283
r4 271 267 293 277 317 312 296 285 265 253
r5 218 249 246 213 264 255 247 221 229 220
r6 215 202 157 202 200 204 220 183 176 180
r7 178 193 218 199 194 195 201 187 172 159
r8 127 111 107 130 133 99 142 143 131 114
r9 64 68 85 74 70 60 59 70 76 61
matrix list percent, format(%2.1f)
percent[9,10]
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 70.1 71.2 67.3 67.2 66.9 71.5 72.6 72.5 74.9 73.2
r2 65.3 65.2 69.1 64.4 68.0 70.5 72.0 64.8 66.4 64.9
r3 74.7 73.7 74.7 69.2 68.9 67.6 70.5 72.3 79.4 80.9
r4 66.3 72.6 72.9 74.9 72.7 73.8 72.2 73.3 74.9 71.7
r5 68.8 67.1 66.0 63.6 67.2 67.1 65.2 67.4 68.6 73.8
r6 73.1 72.9 69.2 63.7 67.6 68.0 72.4 68.8 74.9 78.9
r7 64.5 60.3 69.9 70.6 69.3 78.3 72.3 65.8 71.4 71.3
r8 66.1 64.2 63.3 74.7 69.3 56.9 70.6 70.1 63.9 57.9
r9 77.1 73.9 70.2 74.0 71.4 73.2 81.9 72.9 87.4 74.4
How do I combine both the matrices?
Currently I have tried matrix final = cases, percent, but that just puts them side by side. I want each column to alternate between cases and percent.
I will then use putexcel command to put them into an already formatted table with columns of cases and percentages.
Let me start by supporting Nick Cox's comments.
The problem is, there is no simple solution for combining matrices as you desire. Nevertheless, it is simple to achieve the results you want, by taking a very much different path from the one you outlined. It's no fun to write an essay describing the technique in natural language; it's much simpler to demonstrate it using code, as I do below, and as I expect Nick might have been inclined to do.
By not providing a Minimal, Complete, and Verifiable example, as described in the link Nick provided to you, you've discouraged others from showing you where you've gone off the tracks.
// create a minimal amount of sample data hopefully similar to actual data
clear
input year region sex
2001 1 1
2001 1 2
2001 1 2
2002 1 1
2002 1 2
2001 2 1
2002 2 1
2002 2 2
end
list, clean noobs
// use collapse to generate summaries equivalent to two tabs
generate male = sex==1
collapse (count) total=male (sum) cases=male, by(year region)
list, clean noobs
generate percent = 100*cases/total
keep year region total percent
// flatten and interleave the columns
reshape wide total percent, i(year) j(region)
drop year
list, clean noobs
// now use export excel to output,
// or use mkmat to load into a matrix and use putexcel to output

How to find out if Prolog performs Tail Call Optimization

Using the development version of SWI Prolog (Win x64),
I wrote a DCG predicate for a deterministic lexer (hosted on github) (thus all external predicates leave no choice points):
read_token(parser(Grammar, Tables),
lexer(dfa-DFAIndex, last_accept-LastAccept, chars-Chars0),
Token) -->
( [Input],
{
dfa:current(Tables, DFAIndex, DFA),
char_and_code(Input, Char, Code),
dfa:find_edge(Tables, DFA, Code, TargetIndex)
}
-> { table:item(dfa_table, Tables, TargetIndex, TargetDFA),
dfa:accept(TargetDFA, Accept),
atom_concat(Chars0, Char, Chars),
NewState = lexer(dfa-TargetIndex,
last_accept-Accept,
chars-Chars)
},
read_token(parser(Grammar, Tables), NewState, Token)
; {
( LastAccept \= none
-> Token = LastAccept-Chars0
; ( ground(Input)
-> once(symbol:by_type_name(Tables, error, Index, _)),
try_restore_input(Input, FailedInput, InputR),
Input = [FailedInput | InputR],
format(atom(Error), '~w', [FailedInput]),
Token = Index-Error
; once(symbol:by_type_name(Tables, eof, Index, _)),
Token = Index-''
)
)
}
).
Now, since this uses (;) and -> a lot, I was wondering whether SWI-Prolog can optimize the recursive call read_token(parser(Grammar, Tables), NewState, Token) using tail-call optimization, or if I have to split the predicate into several clauses manually.
I just don't know how to find out what the interpreter does, especially knowing that TCO is disabled when running the debugger.
To answer your question, I first looked for "trivial" goals that might prevent last call optimization. I found some:
; ( ground(Input)
-> once(symbol:by_type_name(Tables, error, Index, _)),
try_restore_input(Input, FailedInput, InputR),
Input = [FailedInput | InputR],
format(atom(Error), '~w', [FailedInput]),
Token = Index-Error
; once(symbol:by_type_name(Tables, eof, Index, _)),
Token = Index-''
)
In these two cases, LCO is prevented by those goals alone.
Now, I compiled your rule and looked at the expansion with listing:
?- listing(read_token).
read_token(parser(O, B), lexer(dfa-C, last_accept-T, chars-J), Q, A, S) :-
( A=[D|G],
dfa:current(B, C, E),
char_and_code(D, K, F),
dfa:find_edge(B, E, F, H),
N=G
-> table:item(dfa_table, B, H, I),
dfa:accept(I, L),
atom_concat(J, K, M),
P=lexer(dfa-H, last_accept-L, chars-M),
R=N,
read_token(parser(O, B),
P,
Q,
R,
S) % 1: looks nice!
; ( T\=none
-> Q=T-J
; ground(D)
-> once(symbol:by_type_name(B, error, W, _)),
try_restore_input(D, U, V),
D=[U|V],
format(atom(X), '~w', [U]),
Q=W-X % 2: prevents LCO
; once(symbol:by_type_name(B, eof, W, _)),
Q=W-'' % 3: prevents LCO
),
S=A % 4: prevents LCO
).
ad 1) This is the recursive case you are most probably looking for. Here, everything seems nice.
ad 2,3) Already discussed above; maybe you want to exchange these goals.
ad 4) This is an effect of the precise, steadfast way in which {}//1 is handled in DCGs. As a rule of thumb: implementers rather prefer to be steadfast than to strive for LCO-ness. Please refer to: DCG Expansion: Is Steadfastness ignored?
Please note also that there is a lot more to this than the simple reuse of the call frame. There is a lot of interaction with garbage collection. To overcome all those problems in SWI, an additional GC phase was necessary.
For more, refer to the tiny benchmarks in Precise Garbage Collection in Prolog
So, to finally answer your question: your rule might be optimized, provided there is no choicepoint left prior to the recursive goal.
There is also the real low-level approach to this (I never use it for code development): vm_list. The listing ultimately shows you whether or not SWI might consider LCO (provided no choicepoint is there).
i_call and i_callm will never perform LCO. Only i_depart will. See instruction 142: i_depart(read_token/5).
?- vm_list(read_token).
========================================================================
read_token/5
========================================================================
0 s_virgin
1 i_exit
----------------------------------------
clause 1 ((0x1cc4710)):
----------------------------------------
0 h_functor(parser/2)
2 h_firstvar(5)
4 h_firstvar(6)
6 h_pop
7 h_functor(lexer/3)
9 h_functor((-)/2)
11 h_const(dfa)
13 h_firstvar(7)
15 h_pop
16 h_functor((-)/2)
18 h_const(last_accept)
20 h_firstvar(8)
22 h_pop
23 h_rfunctor((-)/2)
25 h_const(chars)
27 h_firstvar(9)
29 h_pop
30 i_enter
31 c_ifthenelse(26,118)
34 b_unify_var(3)
36 h_list_ff(10,11)
39 b_unify_exit
40 b_var(6)
42 b_var(7)
44 b_firstvar(12)
46 i_callm(dfa,dfa:current/3)
49 b_var(10)
51 b_firstvar(13)
53 b_firstvar(14)
55 i_call(char_and_code/3)
57 b_var(6)
59 b_var(12)
61 b_var(14)
63 b_firstvar(15)
65 i_callm(dfa,dfa:find_edge/4)
68 b_unify_fv(16,11)
71 c_cut(26)
73 b_const(dfa_table)
75 b_var(6)
77 b_var(15)
79 b_firstvar(17)
81 i_callm(table,table:item/4)
84 b_var(17)
86 b_firstvar(18)
88 i_callm(dfa,dfa:accept/2)
91 b_var(9)
93 b_var(13)
95 b_firstvar(19)
97 i_call(atom_concat/3)
99 b_unify_firstvar(20)
101 b_functor(lexer/3)
103 b_functor((-)/2)
105 b_const(dfa)
107 b_argvar(15)
109 b_pop
110 b_functor((-)/2)
112 b_const(last_accept)
114 b_argvar(18)
116 b_pop
117 b_rfunctor((-)/2)
119 b_const(chars)
121 b_argvar(19)
123 b_pop
124 b_unify_exit
125 b_unify_fv(21,16)
128 b_functor(parser/2)
130 b_argvar(5)
132 b_argvar(6)
134 b_pop
135 b_var(20)
137 b_var2
138 b_var(21)
140 b_var(4)
142 i_depart(read_token/5)
144 c_var_n(22,2)
147 c_var_n(24,2)
150 c_jmp(152)
152 c_ifthenelse(27,28)
155 b_var(8)
157 b_const(none)
159 i_call((\=)/2)
161 c_cut(27)
163 b_unify_var(2)
165 h_functor((-)/2)
167 h_var(8)
169 h_var(9)
171 h_pop
172 b_unify_exit
173 c_var(10)
175 c_var_n(22,2)
178 c_var_n(24,2)
181 c_jmp(101)
183 c_ifthenelse(28,65)
186 b_firstvar(10)
188 i_call(ground/1)
190 c_cut(28)
192 b_functor((:)/2)
194 b_const(symbol)
196 b_rfunctor(by_type_name/4)
198 b_argvar(6)
200 b_const(error)
202 b_argfirstvar(22)
204 b_void
205 b_pop
206 i_call(once/1)
208 b_var(10)
210 b_firstvar(23)
212 b_firstvar(24)
214 i_call(try_restore_input/3)
216 b_unify_var(10)
218 h_list
219 h_var(23)
221 h_var(24)
223 h_pop
224 b_unify_exit
225 b_functor(atom/1)
227 b_argfirstvar(25)
229 b_pop
230 b_const('~w')
232 b_list
233 b_argvar(23)
235 b_nil
236 b_pop
237 i_call(format/3)
239 b_unify_var(2)
241 h_functor((-)/2)
243 h_var(22)
245 h_var(25)
247 h_pop
248 b_unify_exit
249 c_jmp(33)
251 b_functor((:)/2)
253 b_const(symbol)
255 b_rfunctor(by_type_name/4)
257 b_argvar(6)
259 b_const(eof)
261 b_argfirstvar(22)
263 b_void
264 b_pop
265 i_call(once/1)
267 b_unify_var(2)
269 h_functor((-)/2)
271 h_var(22)
273 h_const('')
275 h_pop
276 b_unify_exit
277 c_var(10)
279 c_var_n(23,2)
282 c_var(25)
284 b_unify_vv(4,3)
287 c_var_n(11,2)
290 c_var_n(13,2)
293 c_var_n(15,2)
296 c_var_n(17,2)
299 c_var_n(19,2)
302 c_var(21)
304 i_exit

Strange aget optimisation behavior

Followup on this question about aget performance
There seems to be something very strange going on optimisation-wise. We knew the following was true:
=> (def xa (int-array (range 100000)))
#'user/xa
=> (set! *warn-on-reflection* true)
true
=> (time (reduce + (for [x xa] (aget ^ints xa x))))
"Elapsed time: 42.80174 msecs"
4999950000
=> (time (reduce + (for [x xa] (aget xa x))))
"Elapsed time: 2067.673859 msecs"
4999950000
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
However, some further experimenting really weirded me out:
=> (for [f [get nth aget]] (time (reduce + (for [x xa] (f xa x)))))
("Elapsed time: 71.898128 msecs"
"Elapsed time: 62.080851 msecs"
"Elapsed time: 46.721892 msecs"
4999950000 4999950000 4999950000)
No reflection warnings, no hints needed. Same behavior is seen by binding aget to a root var or in a let.
=> (let [f aget] (time (reduce + (for [x xa] (f xa x)))))
"Elapsed time: 43.912129 msecs"
4999950000
Any idea why a bound aget seems to 'know' how to optimise, where the core function doesn't?
It has to do with the :inline directive on aget, which expands to (. clojure.lang.RT (aget ~a (int ~i))), whereas the normal function call goes through the Reflector. Try these:
user> (time (reduce + (map #(clojure.lang.Reflector/prepRet
(.getComponentType (class xa)) (. java.lang.reflect.Array (get xa %))) xa)))
"Elapsed time: 63.484 msecs"
4999950000
user> (time (reduce + (map #(. clojure.lang.RT (aget xa (int %))) xa)))
Reflection warning, NO_SOURCE_FILE:1 - call to aget can't be resolved.
"Elapsed time: 2390.977 msecs"
4999950000
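Incidentally, you can confirm that aget carries an inline expander by inspecting its var metadata (a quick REPL check; the exact printed value will differ):
user> (:inline (meta #'clojure.core/aget))
;; => a function object; nil here would mean no inline expansion at direct call sites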
You might wonder what's the point of inlining, then. Well, check out these results:
user> (def xa (int-array (range 1000000))) ;; going to one million elements
#'user/xa
user> (let [f aget] (time (dotimes [n 1000000] (f xa n))))
"Elapsed time: 187.219 msecs"
user> (time (dotimes [n 1000000] (aget ^ints xa n)))
"Elapsed time: 8.562 msecs"
It turns out that in your example, as soon as you get past reflection warnings, your new bottleneck is the reduce + part and not array access. This example eliminates that and shows an order-of-magnitude advantage of the type-hinted, inlined aget.
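As a hedged aside (not from the original answer): if you want the whole fold to stay primitive, areduce avoids the boxed reduce as well:
;; areduce keeps the index, accumulator, and array access primitive;
;; the ^ints hint lets alength/aget resolve without reflection.
(let [^ints a xa]
  (time (areduce a i sum 0 (+ sum (aget a i)))))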
When you call through a higher-order function, all arguments are cast to Object. In these cases the compiler can't figure out the type for the function being called, because it is unbound when the call is compiled. It can only be determined that it will be something that can be called with some arguments. No warning is printed, because anything will work.
user> (map aget (repeat xa) (range 100))
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
You have found the edge where the Clojure compiler gives up and just uses Object for everything (this is an oversimplified explanation).
If you wrap this in anything that gets compiled on its own (like an anonymous function), then the warnings become visible again, though they come from compiling the anonymous function, not from compiling the call to map.
user> (map #(aget %1 %2) (repeat xa) (range 100))
Reflection warning, NO_SOURCE_FILE:1 - call to aget can't be resolved.
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
And the warning goes away again when a type hint is added to the otherwise unchanged anonymous function, as sketched below.
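A minimal sketch of that hinted version (using an explicit fn so the hint is visible):
;; Hinting the array parameter removes the reflective aget call.
(map (fn [^ints a i] (aget a i)) (repeat xa) (range 100))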

How to calculate classification error rate

Alright, this question is pretty hard. I am going to give you an example.
In the listing below, the left numbers are my algorithm's classification and the right numbers are the original class numbers:
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 86
177 89
177 89
177 89
177 89
177 89
177 89
177 89
So here my algorithm merged 2 different classes into 1. As you can see, it merged classes 86 and 89 into one class. What would be the error in the above example?
Or here is another example:
203 7
203 7
203 7
203 7
16 7
203 7
17 7
16 7
203 7
In the above example, the left numbers are my algorithm's classification and the right numbers are the original class ids. As can be seen, it misclassified 3 products (I am classifying identical commercial products). So what would be the error rate in this example? How would you calculate it?
This question is pretty hard and complex. We have finished the classification, but we are not able to find the correct algorithm for calculating the success rate :D
Here's a longish example, a real confusion matrix with 10 input classes "0" - "9"
(handwritten digits),
and 10 output clusters labelled A - J.
Confusion matrix for 5620 optdigits:
True 0 - 9 down, clusters A - J across
-----------------------------------------------------
A B C D E F G H I J
-----------------------------------------------------
0: 2 4 1 546 1
1: 71 249 11 1 6 228 5
2: 13 5 64 1 13 1 460
3: 29 2 507 20 5 9
4: 33 483 4 38 5 3 2
5: 1 1 2 58 3 480 13
6: 2 1 2 294 1 1 257
7: 1 5 1 546 6 7
8: 415 15 2 5 3 12 13 87 2
9: 46 72 2 357 35 1 47 2
----------------------------------------------------
580 383 496 1002 307 670 549 557 810 266 estimates in each cluster
y class sizes: [554 571 557 572 568 558 558 566 554 562]
kmeans cluster sizes: [ 580 383 496 1002 307 670 549 557 810 266]
For example, cluster A has 580 data points, 415 of which are "8"s;
cluster B has 383 data points, 249 of which are "1"s; and so on.
The problem is that the output classes are scrambled, permuted;
they correspond in this order, with counts:
A B C D E F G H I J
8 1 4 3 6 7 0 5 2 6
415 249 483 507 294 546 546 480 460 257
One could say that the "success rate" is
75 % = (415 + 249 + 483 + 507 + 294 + 546 + 546 + 480 + 460 + 257) / 5620
but this throws away useful information —
here, that E and J both say "6", and no cluster says "9".
So, add up the biggest numbers in each column of the confusion matrix
and divide by the total.
But how should one count overlapping / missing clusters,
like the two "6"s and no "9" here?
I don't know of a commonly agreed-upon way
(doubt that the Hungarian algorithm
is used in practice).
Bottom line: don't throw away information; look at the whole confusion matrix.
NB: such a "success rate" will be optimistic for new data!
It's customary to split the data into, say, 2/3 "training set" and 1/3 "test set",
train e.g. k-means on the 2/3 alone,
then measure the confusion / success rate on the test set, which is generally worse than on the training set alone.
Much more can be said; see e.g.
Cross-validation.
You have to define the error criteria if you want to evaluate the performance of an algorithm, so I'm not sure exactly what you're asking. In some clustering and machine learning algorithms you define the error metric, and the algorithm minimizes it.
Take a look at this
https://en.wikipedia.org/wiki/Confusion_matrix
to get some ideas
You have to define an error metric to measure yourself. In your case, a simple method would be to find the property mapping of your product as
p = properties(id)
where id is the product id, and p is likely a vector with one entry per property. Then you can define the error function e (or distance) between two products as
e = d(p1, p2)
Of course, each property must be evaluated to a number in this function. This error function can then be used in the classification algorithm and learning.
In your second example, it seems that you treat the pair (203 7) as a successful classification, so I think you already have a metric of your own. You may want to be more specific to get a better answer.
Classification Error Rate (CER) is 1 - Purity (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
ClusterPurity <- function(clusters, classes) {
  sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}
(code from @john-colby)
Or:
CER <- function(clusters, classes) {
  1 - sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}
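For readers coming from the Clojure question at the top of this page, here is a hedged sketch of the same purity / CER computation in Clojure, assuming the data is a seq of [cluster class] pairs like the listings in the question (all names are illustrative):
;; Purity: for each cluster, count its most common true class,
;; sum those counts, and divide by the total number of points.
(defn purity [pairs]
  (let [clusters (vals (group-by first pairs))]
    (/ (reduce + (map (fn [cluster]
                        (apply max (vals (frequencies (map second cluster)))))
                      clusters))
       (count pairs))))
;; Classification error rate is then 1 - purity.
(defn error-rate [pairs]
  (- 1 (purity pairs)))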
