Is there any kind of matrix filter in TCL? - sorting

Given the following matrix:
1 2 3 4
1 2 3 0
1 2 0 0
1 0 0 0
1 0 0 5
Return the rows that contain searched information.
For Example:
matrixname filter {{1} {} {3} {}}
The return would be:
1 2 3 4
1 2 3 0
and
matrixname filter {{1} {} {} {4}}
The return would be:
1 2 3 4
Does something like this already exist? I am thinking of an almost SQL-esque function, along the lines of WHERE COL = VALUE combined with ORDER BY.
I have looked around and I am not finding anything.
---------------------------------EDIT
I have come up with the following to search the given fields.
::struct::list dbJoin -inner\
-keys FoundKeyList\
1 [::struct::list dbJoin -inner\
1 [MatrixName search -nocase column 2 $ITEM1]\
1 [MatrixName search -nocase column 1 $ITEM2]]\
1 [MatrixName search -nocase column 0 $ITEM3];
This will provide a list of row numbers that match the search criteria.
Then you can use MatrixName get row row or MatrixName get rect column_tl row_tl column_br row_br to get the results.
Anyone have any feedback on this?

Two options come to mind.
In tcllib, there is the struct::matrix package. It has a search command. However, that command searches for patterns on individual cells (which can be constrained to particular columns) and you would need to write a procedure to perform the multiple searches required to achieve the conjunctive match you describe.
The other option is TclRAL. This will give you a relation value (aka a table) and you can perform a restrict command to obtain the subset matching an arbitrary expression, e.g.
set m [ral::relation table {C1 int C2 int C3 int C4 int}\
{1 2 3 4} {1 2 3 0} {1 2 0 0} {1 0 0 0} {1 0 0 5}]
set filt [ral::relation restrictwith $m {$C1 == 1 && $C3 == 3}]
However, both of these options are somewhat "heavyweight" and might be justified only if there are more operations you need to perform on your tabular data than you indicate in your question. If the scope of your problem is as small as your question indicates, then simply dashing off a procedure, as the other commenters have suggested, may be your best bet.
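As a rough illustration of the "dash off a procedure" route, here is a minimal sketch of the conjunctive row filter from the question. It is written in Python purely as language-neutral pseudocode; the function name filter_rows and the use of None as the "match anything" marker (standing in for the empty {} of the question) are assumptions made for this example, not part of struct::matrix or TclRAL.

```python
def filter_rows(rows, pattern):
    """Return the rows whose cells equal every non-wildcard entry of pattern.

    None in pattern is a wildcard (hypothetical convention for this sketch).
    """
    return [
        row for row in rows
        if all(want is None or cell == want for cell, want in zip(row, pattern))
    ]

matrix = [
    [1, 2, 3, 4],
    [1, 2, 3, 0],
    [1, 2, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 5],
]

print(filter_rows(matrix, [1, None, 3, None]))  # [[1, 2, 3, 4], [1, 2, 3, 0]]
print(filter_rows(matrix, [1, None, None, 4]))  # [[1, 2, 3, 4]]
```

The same loop translates directly into a Tcl proc that iterates over the rows of the matrix and compares only the non-empty filter entries.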

package require struct::matrix
struct::matrix xdata
xdata add columns 4
xdata add rows 5
xdata set rect 0 0 {
{1 2 3 4}
{1 2 3 0}
{1 2 0 0}
{1 0 0 0}
{1 0 0 5}
}
foreach {c r} [join [xdata search -regexp all {1*3}]] { puts [xdata get row $r] }
# 1 2 3 4
# 1 2 3 0
foreach {c r} [join [xdata search -regexp all {1*4}]] { puts [xdata get row $r] }
# 1 2 3 4
xdata destroy
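This gives the expected results for this data. Note, however, that search matches individual cells: the regular expression {1*3} matches any cell containing a 3, rather than rows with 1 in the first column and 3 in the third.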

Related

Example of compress column format for rank-deficient matrices

This is the first time I have dealt with the compressed column storage (CCS) format for storing matrices. After googling a bit, if I am right, for a matrix having n nonzero elements the CCS is as follows:
- we define a vector A_v of dimensions n x 1 storing the n non-zero elements
of the matrix
- we define a second vector A_ir of dimensions n x 1 storing the row indices of the
non-zero elements of the matrix
- we finally define a third vector A_jc whose elements are the indices of the
elements of A_v which correspond to the beginning of a new column, plus a
final value which is by convention equal to n+1, and identifies the end of
the matrix (pointing theoretically to a virtual extra column).
So for instance,
if
M = [1 0 4 0 0;
0 3 5 2 0;
2 0 0 4 6;
0 0 7 0 8]
we get
A_v = [1 2 3 4 5 7 2 4 6 8];
A_ir = [1 3 2 1 2 4 2 3 3 4];
A_jc = [1 3 4 7 9 11];
my questions are
I) is what I wrote correct, or did I misunderstand anything?
II) what if I want to represent a matrix with some columns which are all zeroes, e.g.,
M2 = [0 1 0 0 4 0 0;
0 0 3 0 5 2 0;
0 2 0 0 0 4 6;
0 0 0 0 7 0 8]
wouldn't the representation of M2 in CCS be identical to the one of M?
Thanks for the help!
I) is what I wrote correct, or I misunderstood anything?
You are perfectly correct. However, take care that if you use a C or C++ library, offsets and indices should start at 0. I guess you read some Fortran documentation, where indices start at 1. To be clear, here is the C version, which simply translates the indices of your correct Fortran-style answer:
A_v = unmodified
A_ir = [0 2 1 0 1 3 1 2 2 3] (in short [1 3 2 1 2 4 2 3 3 4] - 1)
A_jc = [0 2 3 6 8 10] (in short [1 3 4 7 9 11] - 1)
II) what if I want to represent a matrix with some columns which are
zeroes, e.g., M2 = [0 1 0 0 4 0 0;
0 0 3 0 5 2 0;
0 2 0 0 0 4 6;
0 0 0 0 7 0 8]
wouldn't the representation of M2 in CCS be identical to the one of M?
If you have an empty column, simply add a new entry to the offset table A_jc. As this column contains no elements, the new entry simply repeats the value of the previous entry. For instance, for M2 (with indices starting at 0), whose first and fourth columns are empty, you have:
A_v = unmodified
A_ir = unmodified
A_jc = [0 0 2 3 3 6 8 10] (to be compared to [0 2 3 6 8 10])
Hence the two representations are different.
If you are just starting to learn about sparse matrices, there is an excellent free book here: http://www-users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf
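To make the role of the column offsets concrete, here is a small sketch that builds the three arrays from a dense matrix using 0-based indices. It is plain Python, and the function name to_ccs is an assumption for this example rather than any library's API. An empty column simply repeats the previous offset, so the offset array always has one entry per column plus one.

```python
def to_ccs(dense):
    """Convert a dense matrix (list of rows) to 0-based compressed-column arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_idx.append(i)
        # Offset of the next column; unchanged when column j is empty.
        col_ptr.append(len(values))
    return values, row_idx, col_ptr

M = [[1, 0, 4, 0, 0],
     [0, 3, 5, 2, 0],
     [2, 0, 0, 4, 6],
     [0, 0, 7, 0, 8]]

M2 = [[0, 1, 0, 0, 4, 0, 0],
      [0, 0, 3, 0, 5, 2, 0],
      [0, 2, 0, 0, 0, 4, 6],
      [0, 0, 0, 0, 7, 0, 8]]

print(to_ccs(M))       # ([1, 2, 3, 4, 5, 7, 2, 4, 6, 8], [0, 2, 1, 0, 1, 3, 1, 2, 2, 3], [0, 2, 3, 6, 8, 10])
print(to_ccs(M2)[2])   # [0, 0, 2, 3, 3, 6, 8, 10]
```

The values and row indices of M and M2 are identical; only the offset array distinguishes them.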

Julia: find row in matrix

Using Julia, I'd like to determine if a row is located in a matrix and (if applicable) where in the matrix the row is located. For example, in Matlab this can be done with ismember:
a = [1 2 3];
B = [3 1 2; 2 1 3; 1 2 3; 2 3 1]
B =
3 1 2
2 1 3
1 2 3
2 3 1
ismember(B, a, 'rows')
ans =
0
0
1
0
From this, we can see a is located in row 3 of B. Is there a similar function to accomplish this in Julia?
You can also make use of array broadcasting by simply testing for equality (.==) without the use of comprehensions:
all(B .== a, dims=2)
Which gives you:
4x1 BitMatrix:
0
0
1
0
You can then use findall on this array:
findall(all(B .== a, dims=2))
However, this gives you a vector of CartesianIndex objects:
1-element Vector{CartesianIndex{2}}:
CartesianIndex(3, 1)
So if you expect to find multiple rows with the value defined in a you can either:
simplify this Vector by taking only the row index from each CartesianIndex:
[cart_idx[1] for cart_idx in findall(all(B .== a, dims=2))]
or pass a one-dimensional slice of the BitMatrix to findall (as suggested by Shep Bryan in the comment):
findall(all(B .== a, dims=2)[:, 1])
Either way you get an integer vector of row indices:
1-element Vector{Int64}:
3
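Another pattern is using an array comprehension (vec(a) is needed in Julia 0.5 and later, where B[i,:] is a vector while a is a 1x3 matrix):
julia> Bool[ vec(a) == B[i,:] for i=1:size(B,1) ]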
4-element Array{Bool,1}:
false
false
true
false
julia> Int[ vec(a) == B[i,:] for i=1:size(B,1) ]
4-element Array{Int64,1}:
0
0
1
0
How about:
matchrow(a,B) = findfirst(i->all(j->a[j] == B[i,j],1:size(B,2)),1:size(B,1))
This returns the first matching row number, or nothing when no row matches (Julia versions before 0.7 returned 0 instead).
matchrow(a,B)
3
This should be as "fast as possible" and pretty simple too.
Though Julia doesn't have a built-in function, it's easy enough as a one-liner.
a = [1 2 3];
B = [3 1 2; 2 1 3; 1 2 3; 2 3 1]
ismember(mat, x, dims) = mapslices(elem -> elem == vec(x), mat, dims=dims)
ismember(B, a, 2) # Returns booleans instead of ints

Fast index mapping in matlab

I have the following problem:
Given a matrix A
A = [ 1 2 2 3 3 ;
2 2 2 7 9 ]
where the sequence of unique numbers within the matrix is not continuous. In this example
unique(A) = [ 1 2 3 7 9 ]. % [ 4 5 6 8 ] are missing
I want to compute the same matrix, but using instead a continuous sequence, such that
unique(A_new) = [ 1 2 3 4 5 ];
I came up with the following solution
T = [ unique(A), [ 1:numel(unique(A)) ]' ];
A_new = zeros(size(A));
for i = 1:size(T,1)
A_new( A == T(i,1) ) = T(i,2);
end
This is incredibly slow: the size of the matrix A I have to work with is 200x400x300 and the number of unique elements within this matrix is 33406.
Any idea on how to speed up the procedure?
If I understand correctly, in your example you want:
A_new = [ 1 2 2 3 3 ;
2 2 2 4 5 ]
So just compute a lookup table (lookup) such that you can then do:
A_new = lookup(A);
So in your case, lookup would be:
[ 1 2 3 0 0 0 4 0 5 ]
I'll leave the process for generating that as an exercise for the reader.
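For readers who want the exercise spelled out, here is one possible way to generate such a lookup table. It is sketched in plain Python rather than MATLAB, and it assumes the matrix holds positive integers. The names build_lookup and relabel are made up for this illustration.

```python
def build_lookup(matrix):
    """Build a dense table mapping each value to its rank among the unique values.

    Assumes positive integer entries; index 0 of the table is unused.
    """
    uniq = sorted({v for row in matrix for v in row})
    lookup = [0] * (uniq[-1] + 1)
    for rank, value in enumerate(uniq, start=1):
        lookup[value] = rank
    return lookup

def relabel(matrix):
    """Replace every entry by its rank, giving a continuous sequence of labels."""
    lookup = build_lookup(matrix)
    return [[lookup[v] for v in row] for row in matrix]

A = [[1, 2, 2, 3, 3],
     [2, 2, 2, 7, 9]]

print(build_lookup(A)[1:])  # [1, 2, 3, 0, 0, 0, 4, 0, 5]
print(relabel(A))           # [[1, 2, 2, 3, 3], [2, 2, 2, 4, 5]]
```

The table is built once in time proportional to the number of unique values, after which each entry of A costs a single indexing operation, instead of one full pass over A per unique value as in the original loop.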
Approach 1 (not recommended)
This should be pretty fast, but it uses more memory:
[~, A_new] = max(bsxfun(@eq, A(:).', unique(A(:))));
A_new = reshape(A_new, size(A));
How does this work?
First A is linearized into a vector (A(:)). Also, a vector containing the unique values of A is computed (unique(A(:))). From those two vectors a matrix is generated (with bsxfun) in which each entry of A is compared to each of the unique values. That way, for each entry of A we know if it equals the first unique value, or the second unique value, etc. For the A given in your question, that matrix is
1 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0
0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
So for example, the value 1 in entry (2,3) indicates that the third value of A(:) equals the second unique value of A (namely 2). The 1 in the lower-right entry (5,10) indicates that the tenth value of A(:) is the fifth unique value of A (which is 9).
Now the second output of max is used to extract the row position of the 1 value in each column (i.e. to obtain the numbers indicating "second", "fifth" etc. in the above example). These are the desired results. It only remains to reshape them into the shape of A.
Approach 2 (recommended)
The third output of unique does what you want:
[~, ~, labels] = unique(A);
A_new = reshape(labels, size(A));

Counting subrows in each row of a matrix in Matlab?

I need an algorithm in Matlab which counts how many adjacent and non-overlapping (1,1) I have in each row of a matrix A mx(n*2) without using loops. E.g.
A=[1 1 1 0 1 1 0 0 0 1; 1 0 1 1 1 1 0 0 1 1] %m=2, n=5
Then I want
B=[2;3] %mx1
Specific case
Assuming A to have ones and zeros only, this could be one way -
B = sum(reshape(sum(reshape(A',2,[]))==2,size(A,2)/2,[]))
General case
If you are looking for a general approach that must work for all integers and a case where you can specify the pattern of numbers, you may use this -
patt = [0 1] %%// pattern to be found out
B = sum(reshape(ismember(reshape(A',2,[])',patt,'rows'),[],2))
Output
With patt = [1 1], B = [2 3]
With patt = [0 1], B = [1 0]
You can use transpose and then reshape so that each pair of consecutive values ends up in one column, then compare the top and bottom rows (boolean compare, or compare the sum of each column to 2), then sum the result of the comparison and reshape the result to your liking.
In code, it would look like:
A=[1 1 1 0 1 1 0 0 0 1; 1 0 1 1 1 1 0 0 1 1] ;
m = size(A,1) ;
n = size(A,2)/2 ;
Atemp = reshape(A.' , 2 , [] , m ) ;
B = squeeze(sum(sum(Atemp)==2))
You could pack everything in one line of code if you want, but several lines is usually easier for comprehension. For clarity, the Atemp matrix looks like that:
Atemp(:,:,1) =
1 1 1 0 0
1 0 1 0 1
Atemp(:,:,2) =
1 1 1 0 1
0 1 1 0 1
You'll notice that each row of the original A matrix has been broken down into 2 rows element-wise. The last line of code simply compares the sum of each column with 2, then sums the results of the comparisons.
The squeeze command is only to remove the singleton dimensions not necessary anymore.
You can use imresize, for example:
imresize(A,[size(A,1),size(A,2)/2])>0.8
ans =
1 0 1 0 0
0 1 1 0 1
This places 1 where you have [1 1] pairs; then you can just use sum.
For any pair type [x y] you can use:
x=0; y=1;
R(size(A,1),size(A,2)/2)=0; % preallocating memory
for n=1:size(A,1)
b=[A(n,1:2:end)' A(n,2:2:end)']
try
R(n,find(b(:,1)==x & b(:,2)==y))=1;
end
end
R =
0 0 0 0 1
0 0 0 0 0
With diff (to detect start and end of each run of ones) and accumarray (to group runs of the same row; each run contributes half its length rounded down):
B = diff([zeros(1,size(A,1)); A.'; zeros(1,size(A,1))]); %'// columnwise is easier
[is js] = find(B==1); %// rows and columns of starts of runs of ones
[ie je] = find(B==-1); %// rows and columns of ends of runs of ones
result = accumarray(js, floor((ie-is)/2)); %// sum values for each row of A
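The run-based idea can be cross-checked with a short reference implementation. This is a sketch in plain Python rather than MATLAB, with a hypothetical helper name count_pairs: every run of k consecutive ones contributes k // 2 non-overlapping pairs.

```python
from itertools import groupby

def count_pairs(row):
    """Count non-overlapping adjacent (1, 1) pairs: a run of k ones gives k // 2."""
    return sum(len(list(run)) // 2 for value, run in groupby(row) if value == 1)

A = [[1, 1, 1, 0, 1, 1, 0, 0, 0, 1],
     [1, 0, 1, 1, 1, 1, 0, 0, 1, 1]]

B = [count_pairs(row) for row in A]
print(B)  # [2, 3]
```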

Best data structure for search?

I have a list of items, each with a number of properties (A, B, C, D), which I would like to filter using a template containing the same attributes (A, B, C, D). When I use a template, I would like to select all items matching it. An item matches if it is equal to the template or is a smaller subsequence of it (0 matches any item).
Example data
A B C D
1 0 1 0
2 0 0 0
0 0 2 3
2 0 2 1
2 0 2 0
0 0 0 0
Example templates
[2 0 0 0] will filter {[0 0 0 0], [2 0 0 0]}
[2 0 2 0] will filter {[0 0 0 0], [2 0 0 0], [2 0 2 0]}
[2 0 2 1] will filter {[0 0 0 0], [2 0 2 1]}
[3 4 5 6] will filter {[0 0 0 0]}
[0 0 2 0] will filter {[0 0 0 0], [0 0 2 3], [2 0 2 1], [2 0 2 0]}
The problem is that the number of comparisons can easily reach 300k and can get slow sometimes. What tricks or structures could I use to make things quicker? Any ideas?
Assuming 4 properties, let's place all the items into 16 buckets.
First bucket is where there are no zero-values for the properties. Selecting from here - simple lookup based on key ABCD.
Second bucket is where the property A == 0. Selecting from here is a lookup on the template with value of BCD.
Third bucket is where B == 0. Selecting from here is a lookup on the template with value of ACD.
Fourth is where A == 0 and B == 0. Selecting from here is a lookup on the template with value of CD.
....
Fifteenth is where A,B,C == 0. the lookup is on D.
Sixteenth is where A,B,C,D == 0. This can be a boolean variable ;-)
Since all of the 16 buckets are 'exact match' - you can use methods like hash tables for the search inside them.
(This proposal is based on the assumption, from the example, that it is a 0 in the item's property value that counts as 'match any', not a 0 in the template, because the template 2000 selected only one value in your example. It will obviously be incorrect if the semantics is 'any' in both places.)
--
update: corollary: you can have no more than 2^Nproperties matches.
Example:
Let's suppose we have 3 properties A,B,C and the following three items:
itemX[A=1, B=0, C=1] ---> B is a wildcard, so bucketAC[11] = itemX
itemY[A=2, B=0, C=0] ---> B and C are wildcards, so bucketA[2] = itemY
itemZ[A=2, B=1, C=0] ---> C is a wildcard, so bucketAB[21] = itemZ
now, the lookup for a key 'abc' would be as follows (I also include to the right the
contents of the buckets for ease of reading, and '<<' means 'accumulate' in this context)
1.results << bucketA[a] | '2' => itemY[A=2, B=0, C=0]
2.results << bucketB[b]
3.results << bucketAB[ab] | '21' => itemZ[A=2, B=1, C=0]
4.results << bucketC[c]
5.results << bucketAC[ac] | '11' => itemX[A=1, B=0, C=1]
6.results << bucketBC[bc]
7.results << bucketABC[abc]
8.results << bucket_item_all_wildcards
So if we use template [2 0 0], we get the results from key being A=2 in bucketA only.
If we use template [2 1 0], then we get the results from key being A=2 in bucketA,
and from key being AB=21 in bucketAB - two results.
NB: Of course, the above notation for keys is rather frivolous, it merely assumes "hashtable-like access with the concatenation of the said properties being the key".
If you are allowed to have items with the same properties multiple times, then you will need to have multiple elements in some slots - and then, obviously, you can have more
than 2^Nproperties search results, nonetheless you can track the maximum number of duplicates and hence always predict the worst-case maximum number of items.
Notably, if the number of properties grows, the total possible number of buckets will quickly blow up (e.g. 32 properties would mean maximum more than 4 billion buckets),
so this idea will no longer be applicable directly, and would need further
optimizations around the bucket traversal/allocation.
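A minimal sketch of the bucket idea follows, assuming (as above) that a 0 in an item is the wildcard and a 0 in the template is an ordinary value. It is plain Python; the names build_buckets and find_matches are made up for this illustration. Each bucket is identified by the positions of the item's non-zero properties and is a hash table keyed by the values at those positions. Only buckets that actually contain items are created, which avoids allocating all 2^N of them.

```python
def build_buckets(items):
    """Group items by which properties are non-zero, keyed by those values."""
    buckets = {}
    for item in items:
        mask = tuple(i for i, v in enumerate(item) if v != 0)
        key = tuple(item[i] for i in mask)
        buckets.setdefault(mask, {}).setdefault(key, []).append(item)
    return buckets

def find_matches(buckets, template):
    """One exact hash lookup per non-empty bucket (at most 2**N lookups)."""
    results = []
    for mask, table in buckets.items():
        key = tuple(template[i] for i in mask)
        results.extend(table.get(key, []))
    return results

items = [(1, 0, 1), (2, 0, 0), (2, 1, 0)]   # itemX, itemY, itemZ
buckets = build_buckets(items)

print(find_matches(buckets, (2, 0, 0)))  # [(2, 0, 0)]
print(find_matches(buckets, (2, 1, 0)))  # [(2, 0, 0), (2, 1, 0)]
```

An item whose properties are all zero lands in the bucket with an empty mask and an empty key, so it is returned for every template, matching the "sixteenth bucket" described above.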
What about nested hash maps? For example, an item "it" will be stored as:
map(it.A)(it.B)(it.C).(it.D) = it
So [2 0 2 0] could be searched as:
map(2).keys.(2).keys
