Fast way of checking if an element is ranked higher than another - performance

I am writing in MATLAB a program that checks whether two elements A and B were exchanged in ranking positions.
Example
Assume the first ranking is:
list1 = [1 2 3 4]
while the second one is:
list2 = [1 2 4 3]
I want to check whether A = 3 and B = 4 have exchanged relative positions in the rankings, which in this case is true, since in the first ranking 3 comes before 4 and in the second ranking 3 comes after 4.
Procedure
In order to do this, I have written the following MATLAB code:
positionA1 = find(list1 == A);
positionB1 = find(list1 == B);
positionA2 = find(list2 == A);
positionB2 = find(list2 == B);
if (positionA1 <= positionB1 && positionA2 >= positionB2) || ...
(positionA1 >= positionB1 && positionA2 <= positionB2)
... do something
end
Unfortunately, I need to run this code a lot of times, and the find function is really slow (but needed to get the element position in the list).
I was wondering if there is a way of speeding up the procedure. I have also tried to write a MEX file that performs in C the find operation, but it did not help.

If the lists don't change within your loop, then you can determine the positions of the items ahead of time.
Assuming that your items are always integers from 1 to N:
[~, positions_1] = sort( list1 );
[~, positions_2] = sort( list2 );
This way you won't need to call find within the loop, you can just do:
positionA1 = positions_1(A);
positionB1 = positions_1(B);
positionA2 = positions_2(A);
positionB2 = positions_2(B);
If your loop is going over all possible combinations of A and B, then you can also vectorize that
Find the elements that exchanged relative ranking:
rank_diff_1 = bsxfun(#minus, positions_1, positions_1');
rank_diff_2 = bsxfun(#minus, positions_2, positions_2');
rel_rank_changed = sign(rank_diff_1) ~= sign(rank_diff_2);
[A_changed, B_changed] = find(rel_rank_changed);
Optional: Throw out half of the results, because if (3,4) is in the list, then (4,3) also will be, and maybe you don't want that:
mask = (A_changed < B_changed);
A_changed = A_changed(mask);
B_changed = B_changed(mask);
Now loop over only those elements that have exchanged relative ranking
for ii = 1:length(A_changed)
A = A_changed(ii);
B = B_changed(ii);
% Do something...
end

Instead of find try to compute something like this
Check if there is any exchanged values.
if logical(sum(abs(list1-list2)))
do something
end;
For specific values A and B:
if (list1(logical((list1-list2)-abs((list1-list2))))==A)&&(list1(logical((list1-list2)+abs((list1-list2))))==B)
do something
end;

Related

A simple Increasing Mathematical Algorithm

I actually tried to search this, I'm sure this basic algorithm is everywhere on internet, CS textbooks etc, but I cannot find the right words to search it.
What I want from this algorithm to do is write "A" and "B" with the limit always increasing by 2. Like I want it to write A 3 times, then B 5 times, then A 7 times, then B 9 times and so on. And I plan to have 100 elements in total.
Like: AAABBBBBAAAAAAABBBBBBBBB...
I only want to use a single "for loop" for the entire 100 elements starting from 1 to 100. And just direct/sort "A" and "B" through "if/else if/ else".
I'm just asking for the basic mathematical algorithm behind it, showing it through any programming language would be better or redirecting me to such topic would also be fine.
You can do something like this:
There might be shorter answers, but I find this one easy to understand.
Basically, you keep a bool variable that will tell you if it's A's turn or Bs. Then we keep a variable switch that will tell us when we should switch between them. times is being updated with the repeated times we need to print the next character.
A_B = true
times = 3 // 3,5,7,9,...
switch = 3 // 3,8,15,24,...
for (i from 1 to 100)
if (A_B)
print 'A'
else
print 'B'
if (i == switch)
times += 2
switch += times
A_B = !A_B
Python:
for n in range(1, 101):
print "BA"[(int(sqrt(n)) % 2)],
The parity of the square roots of the integers follows that pattern. (Think that (n+1)²-n² = 2n+1.)
If you prefer to avoid the square root, it suffices to use an extra variable that represents the integer square root and keep it updated
r= 1
for n in range(1, 101):
if r * r <= n:
r+= 1
print "AB"[r % 2],
Here is the snippet you can test on this page. It is an example for about 500 letters totally, sure you can modify it for 100 letters. It is quite flexible that you can change the constants to produce lot of different strings in the same manner.
var toRepeat = ['A', 'B'];
var result='', j, i=3;
var sum=i;
var counter = 0;
while (sum < 500) {
j = counter % 2;
result = result + toRepeat[j].repeat(i);
sum = sum + i;
i = i + 2;
counter++;
}
document.getElementById('hLetters').innerHTML=result;
console.log(result);
<div id="hLetters"></div>
If you want it to be exactly 500 / 100 letters, just use a substring function to trim off the extra letters from the end.
To get 100 groups of A and B with increasing length of 3, 5, 7 and so on, you can run this Python code:
''.join(('B' if i % 2 else 'A') * (2 * i + 3) for i in range(100))
The output is a string of 10200 characters.
If you want the output to have only 100 characters, you can use:
import math
''.join(('B' if math.ceil(math.sqrt(i)) % 2 else 'A') for i in range(2, 102))
In js you can start with somethink like this :
$res ="";
count2 = 0;
for (i=2;i<100; i = i+2) {
count = 0;
alert(i);
while (count < i ) {
$res = $res.concat(String.fromCharCode(65+count2));
count++;
}
count2++;
}
alert ($res);

find all indices of multiple value pairs in a matrix

Suppose I have a matrix A, containing possible value pairs and a matrix B, containing all value pairs:
A = [1,1;2,2;3,3];
B = [1,1;3,4;2,2;1,1];
I would like to create a matrix C that contains all pairs that are allowed by A (i.e. C = [1,1;2,2;1,1]).
Using C = ismember(A,B,'rows') will only show the first occurence of 1,1, but I need both.
Currently I use a for-loop to create C, which looks like:
TFtot = false(size(B(:,1,1),1);
for i = 1:size(a(:,1),1)
TF1 = A(i,1) == B(:,1) & A(i,2) = B(:,2);
TFtot = TF1 | TFtot;
end
C = B(TFtot,:);
I would like to create a faster approach, because this loop currently greatly slows down the algorithm.
You're pretty close. You just need to swap B and A, then use this output to index into B:
L = ismember(B, A, 'rows');
C = B(L,:);
How ismember works in this particular case is that it outputs a logical vector that has the same number of rows as B where the ith value in B tells you whether we have found this ith row somewhere in A (logical 1) or if we haven't found this row (logical 0).
You want to select out those entries in B that are seen in A, and so you simply use the output of ismember to slice into B to extract out the affected rows, and grab all of the columns.
We get for C:
>> C
C =
1 1
2 2
1 1
Here's an alternative using bsxfun:
C = B(all(any(bsxfun(#eq, B, permute(A, [3 2 1])),3),2),:);
Or you could use pdist2 (Statistics Toolbox):
B(any(~pdist2(A,B),1),:);
Using matrix-multiplication based euclidean distance calculations -
Bt = B.'; %//'
[m,n] = size(A);
dists = [A.^2 ones(size(A)) -2*A ]*[ones(size(Bt)) ; Bt.^2 ; Bt];
C = B(any(dists==0,1),:);

How to remove those rows of matrix A, which have equal values with matrix B in specified columns in Matlab?

I have two matrices in Matlab A and B, which have equal number of columns but different number of rows. The number of rows in B is also less than the number of rows in A. B is actually a subset of A.
How can I remove those rows efficiently from A, where the values in columns 1 and 2 of A are equal to the values in columns 1 and 2 of matrix B?
At the moment I'm doing this:
for k = 1:size(B, 1)
A(find((A(:,1) == B(k,1) & A(:,2) == B(k,2))), :) = [];
end
and Matlab complains that this is inefficient and that I should try to use any, but I'm not sure how to do it with any. Can someone help me out with this? =)
I tried this, but it doesn't work:
A(any(A(:,1) == B(:,1) & A(:,2) == B(:,2), 2), :) = [];
It complains the following:
Error using ==
Matrix dimensions must agree.
Example of what I want:
A-B in the results means that the rows of B are removed from A. The same goes with A-C.
try using setdiff. for example:
c=setdiff(a,b,'rows')
Note, if order is important use:
c = setdiff(a,b,'rows','stable')
Edit: reading the edited question and the comments to this answer, the specific usage of setdiff you look for is (as noticed by Shai):
[temp c] = setdiff(a(:,1:2),b(:,1:2),'rows','stable')
c = a(c,:)
Alternative solution:
you can just use ismember:
a(~ismember(a(:,1:2),b(:,1:2),'rows'),:)
Use bsxfun:
compare = bsxfun( #eq, permute( A(:,1:2), [1 3 2]), permute( B(:,1:2), [3 1 2] ) );
twoEq = all( compare, 3 );
toRemove = any( twoEq, 2 );
A( toRemove, : ) = [];
Explaining the code:
First we use bsxfun to compare all pairs of first to column of A and B, resulting with compare of size numRowsA-by-numRowsB-by-2 with true where compare( ii, jj, kk ) = A(ii,kk) == B(jj,kk).
Then we use all to create twoEq of size numRowsA-by-numRowsB where each entry indicates if both corresponding entries of A and B are equal.
Finally, we use any to select rows of A that matches at least one row of B.
What's wrong with original code:
By removing rows of A inside a loop (i.e., A( ... ) = []) you actually resizing A at almost each iteration. See this post on why exactly this is a bad practice.
Using setdiff
In order to use setdiff (as suggested by natan) on only the first two columns you'll need use it's second output argument:
[ignore, ia] = setdiff( A(:,1:2), B(:,1:2), 'rows', 'stable' );
A = A( ia, : ); % keeping only relevant rows, beyond first two columns.
Here's another bsxfun implementation -
A(~any(squeeze(all(bsxfun(#eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2)),2),:)
One more that is dangerously close to Shai's solution, but still avoids two permute to one permute -
A(~any(all(bsxfun(#eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2),3),:)

Using non-continuous integers as identifiers in cells or structs in Matlab

I want to store some results in the following way:
Res.0 = magic(4); % or Res.baseCase = magic(4);
Res.2 = magic(5); % I would prefer to use integers on all other
Res.7 = magic(6); % elements than the first.
Res.2000 = 1:3;
I want to use numbers between 0 and 3000, but I will only use approx 100-300 of them. Is it possible to use 0 as an identifier, or will I have to use a minimum value of 1? (The numbers have meaning, so I would prefer if I don't need to change them). Can I use numbers as identifiers in structs?
I know I can do the following:
Res{(last number + 1)} = magic(4);
Res{2} = magic(5);
Res{7} = magic(6);
Res{2000} = 1:3;
And just remember that the last element is really the "number zero" element.
In this case I will create a bunch of empty cell elements [] in the non-populated positions. Does this cause a problem? I assume it will be best to assign the last element first, to avoid creating a growing cell, or does this not have an effect? Is this an efficient way of doing this?
Which will be most efficient, struct's or cell's? (If it's possible to use struct's, that is).
My main concern is computational efficiency.
Thanks!
Let's review your options:
Indexing into a cell arrays
MATLAB indices start from 1, not from 0. If you want to store your data in cell arrays, in the worst case, you could always use the subscript k + 1 to index into cell corresponding to the k-th identifier (k ≥ 0). In my opinion, using the last element as the "base case" is more confusing. So what you'll have is:
Res{1} = magic(4); %// Base case
Res{2} = magic(5); %// Corresponds to identifier 1
...
Res{k + 1} = ... %// Corresponds to indentifier k
Accessing fields in structures
Field names in structures are not allowed to begin with numbers, but they are allowed to contain them starting from the second character. Hence, you can build your structure like so:
Res.c0 = magic(4); %// Base case
Res.c1 = magic(5); %// Corresponds to identifier 1
Res.c2 = magic(6); %// Corresponds to identifier 2
%// And so on...
You can use dynamic field referencing to access any field, for instance:
k = 3;
kth_field = Res.(sprintf('c%d', k)); %// Access field k = 3 (i.e field 'c3')
I can't say which alternative seems more elegant, but I believe that indexing into a cell should be faster than dynamic field referencing (but you're welcome to check that out and prove me wrong).
As an alternative to EitanT's answer, it sounds like matlab's map containers are exactly what you need. They can deal with any type of key and the value may be a struct or cell.
EDIT:
In your case this will be:
k = {0,2,7,2000};
Res = {magic(4),magic(5),magic(6),1:3};
ResMap = containers.Map(k, Res)
ResMap(0)
ans =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
I agree with the idea in #wakjah 's comment. If you are concerned about the efficiency of your program it's better to change the interpretation of the problem. In my opinion there is definitely a way that you could priorotize your data. This prioritization could be according to the time you acquired them, or with respect to the inputs that they are calculated. If you set any kind of priority among them, you can sort them into an structure or cell (structure might be faster).
So
Priority (Your Current Index Meaning) Data
1 0 magic(4)
2 2 magic(5)
3 7 magic(6)
4 2000 1:3
Then:
% Initialize Result structure which is different than your Res.
Result(300).Data = 0; % 300 the maximum number of data
Result(300).idx = 0; % idx or anything that represent the meaning of your current index.
% Assigning
k = 1; % Priority index
Result(k).idx = 0; Result(k).Data = magic(4); k = k + 1;
Result(k).idx = 2; Result(k).Data = magic(5); k = k + 1;
Result(k).idx = 7; Result(k).Data = magic(6); k = k + 1;
...

Algorithm Issue: letter combinations

I'm trying to write a piece of code that will do the following:
Take the numbers 0 to 9 and assign one or more letters to this number. For example:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
When I have a code like 0123, it's an easy job to encode it. It will obviously make up the code NLTD. When a number like 5,6 or 8 is introduced, things get different. A number like 051 would result in more than one possibility:
NVL and NFL
It should be obvious that this gets even "worse" with longer numbers that include several digits like 5,6 or 8.
Being pretty bad at mathematics, I have not yet been able to come up with a decent solution that will allow me to feed the program a bunch of numbers and have it spit out all the possible letter combinations. So I'd love some help with it, 'cause I can't seem to figure it out. Dug up some information about permutations and combinations, but no luck.
Thanks for any suggestions/clues. The language I need to write the code in is PHP, but any general hints would be highly appreciated.
Update:
Some more background: (and thanks a lot for the quick responses!)
The idea behind my question is to build a script that will help people to easily convert numbers they want to remember to words that are far more easily remembered. This is sometimes referred to as "pseudo-numerology".
I want the script to give me all the possible combinations that are then held against a database of stripped words. These stripped words just come from a dictionary and have all the letters I mentioned in my question stripped out of them. That way, the number to be encoded can usually easily be related to a one or more database records. And when that happens, you end up with a list of words that you can use to remember the number you wanted to remember.
It can be done easily recursively.
The idea is that to handle the whole code of size n, you must handle first the n - 1 digits.
Once you have all answers for n-1 digits, the answers for the whole are deduced by appending to them the correct(s) char(s) for the last one.
There's actually a much better solution than enumerating all the possible translations of a number and looking them up: Simply do the reverse computation on every word in your dictionary, and store the string of digits in another field. So if your mapping is:
0 = N,
1 = L,
2 = T,
3 = D,
4 = R,
5 = V or F,
6 = B or P,
7 = Z,
8 = H or CH or J,
9 = G
your reverse mapping is:
N = 0,
L = 1,
T = 2,
D = 3,
R = 4,
V = 5,
F = 5,
B = 6,
P = 6,
Z = 7,
H = 8,
J = 8,
G = 9
Note there's no mapping for 'ch', because the 'c' will be dropped, and the 'h' will be converted to 8 anyway.
Then, all you have to do is iterate through each letter in the dictionary word, output the appropriate digit if there's a match, and do nothing if there isn't.
Store all the generated digit strings as another field in the database. When you want to look something up, just perform a simple query for the number entered, instead of having to do tens (or hundreds, or thousands) of lookups of potential words.
The general structure you want to hold your number -> letter assignments is an array or arrays, similar to:
// 0 = N, 1 = L, 2 = T, 3 = D, 4 = R, 5 = V or F, 6 = B or P, 7 = Z,
// 8 = H or CH or J, 9 = G
$numberMap = new Array (
0 => new Array("N"),
1 => new Array("L"),
2 => new Array("T"),
3 => new Array("D"),
4 => new Array("R"),
5 => new Array("V", "F"),
6 => new Array("B", "P"),
7 => new Array("Z"),
8 => new Array("H", "CH", "J"),
9 => new Array("G"),
);
Then, a bit of recursive logic gives us a function similar to:
function GetEncoding($number) {
$ret = new Array();
for ($i = 0; $i < strlen($number); $i++) {
// We're just translating here, nothing special.
// $var + 0 is a cheap way of forcing a variable to be numeric
$ret[] = $numberMap[$number[$i]+0];
}
}
function PrintEncoding($enc, $string = "") {
// If we're at the end of the line, then print!
if (count($enc) === 0) {
print $string."\n";
return;
}
// Otherwise, soldier on through the possible values.
// Grab the next 'letter' and cycle through the possibilities for it.
foreach ($enc[0] as $letter) {
// And call this function again with it!
PrintEncoding(array_slice($enc, 1), $string.$letter);
}
}
Three cheers for recursion! This would be used via:
PrintEncoding(GetEncoding("052384"));
And if you really want it as an array, play with output buffering and explode using "\n" as your split string.
This kind of problem are usually resolved with recursion. In ruby, one (quick and dirty) solution would be
#values = Hash.new([])
#values["0"] = ["N"]
#values["1"] = ["L"]
#values["2"] = ["T"]
#values["3"] = ["D"]
#values["4"] = ["R"]
#values["5"] = ["V","F"]
#values["6"] = ["B","P"]
#values["7"] = ["Z"]
#values["8"] = ["H","CH","J"]
#values["9"] = ["G"]
def find_valid_combinations(buffer,number)
first_char = number.shift
#values[first_char].each do |key|
if(number.length == 0) then
puts buffer + key
else
find_valid_combinations(buffer + key,number.dup)
end
end
end
find_valid_combinations("",ARGV[0].split(""))
And if you run this from the command line you will get:
$ ruby r.rb 051
NVL
NFL
This is related to brute-force search and backtracking
Here is a recursive solution in Python.
#!/usr/bin/env/python
import sys
ENCODING = {'0':['N'],
'1':['L'],
'2':['T'],
'3':['D'],
'4':['R'],
'5':['V', 'F'],
'6':['B', 'P'],
'7':['Z'],
'8':['H', 'CH', 'J'],
'9':['G']
}
def decode(str):
if len(str) == 0:
return ''
elif len(str) == 1:
return ENCODING[str]
else:
result = []
for prefix in ENCODING[str[0]]:
result.extend([prefix + suffix for suffix in decode(str[1:])])
return result
if __name__ == '__main__':
print decode(sys.argv[1])
Example output:
$ ./demo 1
['L']
$ ./demo 051
['NVL', 'NFL']
$ ./demo 0518
['NVLH', 'NVLCH', 'NVLJ', 'NFLH', 'NFLCH', 'NFLJ']
Could you do the following:
Create a results array.
Create an item in the array with value ""
Loop through the numbers, say 051 analyzing each one individually.
Each time a 1 to 1 match between a number is found add the correct value to all items in the results array.
So "" becomes N.
Each time a 1 to many match is found, add new rows to the results array with one option, and update the existing results with the other option.
So N becomes NV and a new item is created NF
Then the last number is a 1 to 1 match so the items in the results array become
NVL and NFL
To produce the results loop through the results array, printing them, or whatever.
Let pn be a list of all possible letter combinations of a given number string s up to the nth digit.
Then, the following algorithm will generate pn+1:
digit = s[n+1];
foreach(letter l that digit maps to)
{
foreach(entry e in p(n))
{
newEntry = append l to e;
add newEntry to p(n+1);
}
}
The first iteration is somewhat of a special case, since p-1 is undefined. You can simply initialize p0 as the list of all possible characters for the first character.
So, your 051 example:
Iteration 0:
p(0) = {N}
Iteration 1:
digit = 5
foreach({V, F})
{
foreach(p(0) = {N})
{
newEntry = N + V or N + F
p(1) = {NV, NF}
}
}
Iteration 2:
digit = 1
foreach({L})
{
foreach(p(1) = {NV, NF})
{
newEntry = NV + L or NF + L
p(2) = {NVL, NFL}
}
}
The form you want is probably something like:
function combinations( $str ){
$l = len( $str );
$results = array( );
if ($l == 0) { return $results; }
if ($l == 1)
{
foreach( $codes[ $str[0] ] as $code )
{
$results[] = $code;
}
return $results;
}
$cur = $str[0];
$combs = combinations( substr( $str, 1, $l ) );
foreach ($codes[ $cur ] as $code)
{
foreach ($combs as $comb)
{
$results[] = $code.$comb;
}
}
return $results;}
This is ugly, pidgin-php so please verify it first. The basic idea is to generate every combination of the string from [1..n] and then prepend to the front of all those combinations each possible code for str[0]. Bear in mind that in the worst case this will have performance exponential in the length of your string, because that much ambiguity is actually present in your coding scheme.
The trick is not only to generate all possible letter combinations that match a given number, but to select the letter sequence that is most easy to remember. A suggestion would be to run the soundex algorithm on each of the sequence and try to match against an English language dictionary such as Wordnet to find the most 'real-word-sounding' sequences.

Resources