Column-wise comparison of common elements - Matlab - matrix

How can I compare two matrices column-wise, find whether corresponding columns share any common element, and return the column numbers? (Note: the elements need not be in corresponding positions.)
The function bsxfun(@eq,A,B) is NOT useful here, as it compares only corresponding elements in each column.
Requirement: A=[1 2 3;4 5 6;7 8 9], B=[0 0 0;8 7 9;4 1 6]. Here the value 4 is common to column 1 of A and B; similarly, the values 6 and 9 are common to column 3 of A and B. So return column 1 and column 3.
Can you please suggest a method? I would be grateful.

You can use ismember to compare columns (or rows) as you describe. It returns a logical index of A indicating matches in B. Use any to reduce column wise and find to get the column indices.
You can use a for loop over columns or use arrayfun:
find(arrayfun(@(c) any(ismember(A(:,c), B(:,c))), 1:size(A,2)))
I would be interested to see if you find a neater, more succinct solution!
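For comparison, the same column-by-column membership test can be sketched in Python. This is only an illustration, not a Matlab replacement: the function name columns_with_common_elements is made up here, the matrices are assumed to be lists of rows, and column numbers are reported 1-based to match the question.

```python
def columns_with_common_elements(a, b):
    """Return 1-based numbers of the columns in which a and b share a value."""
    result = []
    # zip(*m) turns a list of rows into a sequence of columns.
    for number, (col_a, col_b) in enumerate(zip(zip(*a), zip(*b)), start=1):
        if set(col_a) & set(col_b):  # any value present in both columns
            result.append(number)
    return result


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[0, 0, 0], [8, 7, 9], [4, 1, 6]]
print(columns_with_common_elements(A, B))  # [1, 3]
```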

Related

Sort tuples by averages of second element

I have been able to convert a csv file to list format using a function. In doing so I was able to assign a name to a class number and thereafter 3 additional numbers e.g:
In the csv file:
Hussain 1 7 8 0
Alexandra 1 0 0 2
Became :
['Alexandra', 2],['Hussain', 8]
That sorting method asked for the names in alphabetical order together with each person's highest score. I used tuples to complete the code above and would like to carry on using tuples.
Now I wish to sort this from highest average to lowest, e.g. the sorting method for averages would result in:
[Hussain, 5.0],[Alexandra, 0.6666666666]
These numbers are what I expect, as they are the averages of the last three numbers in the csv file; the 2nd column is ignored here. As Hussain has the highest average, he is placed first. I would appreciate any possible help.
What I would like to be done is the following:
I would like to print out all the students in order of highest average to lowest. As Hussain has the higher average of 5.0, he is printed first; then Alexandra is printed, as she has the lower average. These two students are from the same class (shown in the second column of the csv file) and they are to be printed when the user chooses class 1 to be sorted.
TIA
Suppose you have a list for class 1 like:
class1 = [['Alexandra', 1, 0, 0, 2], ['Hussain', 1, 7, 8, 0]]
Then you can build a list of [name, average] pairs and sort it by the second element of each pair:
# The class number is at index 1 of each row, so i[1] == 1 selects the class 1 people
avg_list = [[i[0], float(sum(i[2:]))/len(i[2:])] for i in class1 if i[1] == 1]
dd = sorted(avg_list, key=lambda x: x[1])
dd.reverse()
print(dd)
Output:
[['Hussain', 5.0], ['Alexandra', 0.6666666666666666]]
Use the key function in Python's sort function. Read about it here:
https://docs.python.org/3/howto/sorting.html
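Since the question asks to keep using tuples, here is a minimal sketch of the key-function approach with tuples. The function name sort_by_average and the row layout (name, class number, then the scores) are assumptions based on the csv example in the question.

```python
def sort_by_average(rows, class_number):
    """Return (name, average) tuples for one class, highest average first."""
    averages = [
        (name, sum(scores) / len(scores))
        for name, cls, *scores in rows  # the class number is the second field
        if cls == class_number
    ]
    # key picks the average out of each tuple; reverse=True sorts descending
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


rows = [("Alexandra", 1, 0, 0, 2), ("Hussain", 1, 7, 8, 0)]
for name, average in sort_by_average(rows, 1):
    print(name, average)
```

Passing reverse=True avoids the separate reverse() call and keeps the original order of students whose averages are tied.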

Is there a search algorithm for huge two-dimensional arrays?

This is not a real-life question, it is just theory-crafting.
I have a big array which consists of elements like [1,140,245,123443], all
integer or floats with low selectivity, and the number of unique values is ten
times less than the size of the array. B*tree indexing is not good in this case.
I also tried to implement bitmap indexing, but in Ruby, binary operations are not so fast.
Are there any good algorithms for searching two-dimensional arrays of fixed size vectors?
And the main question is: how do I convert the vector into a single value, where the conversion function has to be monotonic, so that I can apply range queries such as:
(v[0]<10, v[2]>100, v[3]=32, 0.67*10^-8<v[4]<1.2154241410*10^-6)
The only idea I have is to create a separate sorted index for each component of the vector, binary search each one, and then merge the results. But that is a bad idea, because in the worst case it will require O(N*N) operations.
Assuming that each "column" is vaguely evenly distributed in a known range, you could keep track of a series of buckets for each column, and a list of rows that satisfy the bucket. The number of buckets for each column can be the same, or different, it's totally arbitrary. More buckets is faster, but takes slightly more memory.
my table:
range: {1to10} {1to4m} {-2mto2m}
row1: {7 3427438335 420645075}
row2: {5 3862506151 -1555396554}
row3: {1 2793453667 -1743457796}
buckets for column 1:
bucket{1-3} : row3
bucket{4-6} : row2
bucket{7-10} : row1
buckets for column 2:
bucket{1-2m} :
bucket{2m-4m} : row1, row2, row3
buckets for column 3:
bucket{-2m--1m} : row2, row3
bucket{-1m-0} :
bucket{0-1m} : row1
bucket{1m-2m} :
Then, given a series of criteria: {v[0]<=5, v[1]>3*10^9}, we pull out the buckets that match those criteria:
column 1:
v[0]<=5 matches buckets {1-3} and {4-6}, which is rows 2 and 3.
column 2:
v[1]>3*10^9 matches bucket {2m-4m}, which is rows 1, 2 and 3.
column 3:
"" matches all, which is rows 1, 2 and 3.
Now we know that the row(s) we're looking for meet all three criteria, so we list all the rows that are in the buckets that matched all the criteria, in this case, rows 2 and 3. At this point, the number of rows remaining will be small even for massive amounts of data, depending on the granularity of your buckets. You simply check each of the rows that is left at this point to see if they match. In this sample we see that row 2 matches, but row 3 doesn't.
This algorithm is technically O(n), but in practice, if you have large numbers of small buckets, this algorithm can be very fast.
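The bucket scheme above can be sketched in Python as follows. This is a rough illustration under assumptions: the names build_index and query are hypothetical, the bucket boundaries (edges) are chosen by hand per column, and each criterion is given as an inclusive (low, high) range.

```python
from bisect import bisect_right


def build_index(rows, edges):
    """edges[c] is a sorted list of bucket boundaries for column c.
    Bucket i of a column holds the row numbers whose value v satisfies
    edges[c][i-1] <= v < edges[c][i]."""
    index = []
    for c, col_edges in enumerate(edges):
        buckets = [set() for _ in range(len(col_edges) + 1)]
        for r, row in enumerate(rows):
            buckets[bisect_right(col_edges, row[c])].add(r)
        index.append(buckets)
    return index


def query(rows, edges, index, ranges):
    """ranges maps a column number to an inclusive (low, high) pair."""
    candidates = set(range(len(rows)))
    for c, (low, high) in ranges.items():
        first = bisect_right(edges[c], low)
        last = bisect_right(edges[c], high)
        # Union of every bucket that can contain a value in [low, high].
        candidates &= set().union(*index[c][first:last + 1])
    # Buckets are coarse, so check the surviving rows exactly.
    return sorted(
        r for r in candidates
        if all(low <= rows[r][c] <= high for c, (low, high) in ranges.items())
    )


rows = [
    (7, 3427438335, 420645075),
    (5, 3862506151, -1555396554),
    (1, 2793453667, -1743457796),
]
edges = [[4, 7], [2_000_000_000], [-1_000_000_000, 0, 1_000_000_000]]
index = build_index(rows, edges)
inf = float("inf")
print(query(rows, edges, index, {0: (-inf, 5), 1: (3_000_000_001, inf)}))
```

Row numbers here are 0-based, so the printed result refers to the row called row2 in the table above.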
Using an index :)
The basic idea is to turn the 2-dimensional array into a 1-dimensional sorted array (while keeping the original position) and apply binary search on the latter.
This method works for any n-dimensional array and is widely used by databases, which can be seen as n-dimensional arrays with variable lengths.
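A minimal sketch of that idea for a single column, assuming one sorted index is built per queried column. The names build_sorted_index and range_lookup are hypothetical, and the sample rows are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(rows, column):
    """Sort one column's values while remembering each original row position."""
    pairs = sorted((row[column], pos) for pos, row in enumerate(rows))
    keys = [value for value, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def range_lookup(keys, positions, low, high):
    """Return original positions of rows with low <= value <= high."""
    start = bisect_left(keys, low)    # first key >= low
    stop = bisect_right(keys, high)   # one past the last key <= high
    return sorted(positions[start:stop])


rows = [(7, 30), (5, 10), (1, 20)]
keys, positions = build_sorted_index(rows, 0)
print(range_lookup(keys, positions, 1, 5))  # [1, 2]
```

A query over several columns would still have to intersect the position lists returned for each column, which is the merge cost the question is worried about.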

octave matrix for loop performance

I am new to Octave. I have two matrices, and I have to compare a particular column of one matrix with the other (matrix A contains more than 5 variables, and matrix B contains the same). If the elements in column one of matrix A are equal to the elements in matrix B, then I have to use the third column of B to compute certain values. I am doing this in Octave with a for loop, but the computation for a single day takes a lot of time, and I have to do it for a whole year, because the matrices are very large. Please suggest an alternative way to reduce the computation time.
Thank you in advance.
Thanks for your quick response -hfs
continuation of the same problem,
Thank you, but this will work only if the elements in both rows are equal. For example, my matrices are like this:
A=[1 2 3;4 5 6;7 8 9;6 9 1]
B=[1 2 4; 4 2 6; 7 5 8;3 8 4]
Here, in the first row, column 1 of A is equal to column 1 of B, and so is column 2, so I can take the third element of B. In the second row, column 1 is equal in A and B but column 2 is different; in that case it should search for the matching element and print the element in the third column. I am doing this with a for loop, which is very slow because of the large dimensions. In my actual problem the for loop is written as below:
for k=1:37651
  for j=1:26018
    if (s(k,1:2)==l(j,1:2))
      z=sin((90-s(k,3))*pi/180)
      break
    end
  end
end
I want an alternative way to do this which should be faster than this.
You should work with complete matrices or vectors whenever possible. You should try commands and inspect intermediate results in the interactive shell to see how they fit together.
A(:,1)
selects the first column of a matrix. You can compare matrices/vectors and the result is a matrix/vector of 0/1 again:
> A(:,1) == B(:,1)
ans =
1
1
0
If you assign the result you can use it again to index into matrices:
I = A(:,1) == B(:,1)
B(I, 3)
This selects the third column of B of those rows where the first column of A and B is equal.
I hope this gets you started.
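For readers more at home in Python, here is a rough sketch of the same two ideas. The first function mirrors the mask selection B(I, 3) above. The second replaces the nested loop from the question with a set lookup on the first two columns, which is a different technique from the answer's and is offered only as an assumption about what the loop is meant to do. Both function names are hypothetical, and matrices are assumed to be lists of rows.

```python
import math


def third_column_where_first_matches(a, b):
    """Third column of b for the rows where a and b agree in column 1."""
    return [row_b[2] for row_a, row_b in zip(a, b) if row_a[0] == row_b[0]]


def sines_for_matches(s, l):
    """For every row of s whose first two columns occur in some row of l,
    compute sin((90 - s[k][2]) * pi / 180), as the loop in the question does."""
    keys = {(row[0], row[1]) for row in l}  # built once, O(1) lookups afterwards
    return [
        math.sin(math.radians(90 - row[2]))
        for row in s
        if (row[0], row[1]) in keys
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [6, 9, 1]]
B = [[1, 2, 4], [4, 2, 6], [7, 5, 8], [3, 8, 4]]
print(third_column_where_first_matches(A, B))  # [4, 6, 8]
```

The set lookup touches each row of both matrices once instead of comparing every pair of rows.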

How to balance the number of items across multiple columns

I need to find out a method to determine how many items should appear per column in a multiple column list to achieve the most visual balance. Here are my criteria:
The list should only be split into multiple columns if the item count is greater than 10.
If multiple columns are required, they should contain no less than 5 (except for the last column in case of a remainder) and no more than 10 items.
If all columns cannot contain an equal number of items:
All but the last column should be equal in number.
The number of items in each column should be optimized to achieve the smallest difference between the last column and the other column(s).
Well, your requirements and your examples appear a bit contradictory. For instance, your second example could be divided into two columns with 11 items in each, and satisfy your criteria. Let's assume that for rule #2 you meant that there should be <= 10 items / column.
In addition, I think you need to add another rule to make the requirements sensible:
The number of columns must not be greater than what is required to accommodate overflow.
Otherwise, you will often end up with degenerate solutions where you have far more columns than you need. For example, in the case of 26 items you probably don't want 13 columns of 2 items each.
If that's the case, here's a simple calculation that should work well and is easy to understand:
int numberOfColumns = CEILING(numberOfItems / 10);
int numberOfItemsPerColumn = CEILING(numberOfItems / numberOfColumns);
Now you'll create N-1 columns of items (having numberOfItemsPerColumn each) and the overflow will go in the last column. By this definition, the overflow should be minimized in the last column.
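The two-line calculation above can be sketched in Python like this. The function name split_into_columns and the max_per_column parameter are assumptions for illustration; note the true division before math.ceil, since integer division would silently drop the remainder.

```python
import math


def split_into_columns(number_of_items, max_per_column=10):
    """Return the number of items in each column, left to right."""
    if number_of_items <= max_per_column:
        # Short lists are not split at all.
        return [number_of_items] if number_of_items else []
    number_of_columns = math.ceil(number_of_items / max_per_column)
    per_column = math.ceil(number_of_items / number_of_columns)
    full, remainder = divmod(number_of_items, per_column)
    # Full columns first, then the overflow (if any) in the last column.
    return [per_column] * full + ([remainder] if remainder else [])


print(split_into_columns(26))  # [9, 9, 8]
print(split_into_columns(22))  # [8, 8, 6]
```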
If you want to automatically determine the appropriate number of columns, and have no restrictions on its limits, I would suggest the following:
Calculate the square root of the total number of items. That would make a square layout.
Divide that number by 1.618, and assign that to the total number of rows.
Multiply that same number by 1.618, and assign that to the total number of columns.
All columns but the rightmost one will have the same number of items.
By the way, the constant 1.618 is the Golden Ratio. That will achieve a more pleasant layout than a square one.
Divide and multiply the other way round for vertical displays.
Hope this algorithm helps anyone with a similar problem.
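A rough Python sketch of the golden-ratio steps above. The answer does not say how to round, so the rounding here is an assumption: the column count is rounded to the nearest integer, and the row count is then taken as the smallest number of rows that fits every item, rather than the raw square root divided by 1.618. The function name golden_layout is hypothetical.

```python
import math

GOLDEN_RATIO = 1.618


def golden_layout(number_of_items):
    """Return (rows, columns) for a layout that is wider than it is tall."""
    side = math.sqrt(number_of_items)                # side of a square layout
    columns = max(1, round(side * GOLDEN_RATIO))     # stretch the width
    rows = math.ceil(number_of_items / columns)      # enough rows to fit all items
    return rows, columns


print(golden_layout(100))  # (7, 16)
```

For vertical displays, swapping the two returned values gives the taller variant.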
Here's what you're trying to solve:
minimize y - z where n = xy + z and 5 <= y <= 10 and 0 <= z <= y
where you have n items split into x full columns of y items and one remainder column of z items.
There is almost certainly a smart way of doing this, but given these constraints a brute force implementation exploring all 6 + 7 + 8 + 9 + 10 = 40 possible combinations for y and z would take no time at all (only assignments where (n - z) mod y = 0 are solutions).
I think a brute force solution is easy, given the constraint on the number of items per column: let v be the number of items per column (except the last one), then v belongs to [5,10] and can thus take a whopping 6 different values.
Evaluating 6 values is easy enough. A Python one-liner (or not far from it) to prove it:
# compute the difference between the number of items for the normal columns
# and for the last column, lesser is better
def helper(n,v):
    modulo = n % v
    if modulo == 0: return 0
    else: return v - modulo

# values can only be in [5,10]
# we compute the difference with the last column for each
# build a list of tuples (difference, - number of items)
# (because the greater the value the better, it means less columns)
# extract the min automatically (in case of equality, less is privileged)
# and then pick the number of items from the tuple and re-inverse it
def compute(n): return - min([(helper(n,v), -v) for v in [5,6,7,8,9,10]])[1]
For 77 this yields: 7, meaning 7 items per column
For 22 this yields: 8, meaning 8 items per column

help writing an algorithm

I need to write an algorithm for this problem. I have never written an algorithm before, so please correct me.
There is a list which contains four columns, each with numbers of up to 5 digits, and about 10 rows in total. We have to remove the rows containing any number with fewer than 3 digits.
Here is what I have tried:
read list into multi-dimensional array
for each number in the array
    if numdigits < 3
        delete all numbers of that row
I know this is not the correct algorithm. Can you help me correct it?
When creating your original list, rather check the individual values at that point, and do not add a row to the list if any of its numbers has fewer than 3 digits. That way you reduce the original list size.
EDIT:
foreach row in original_document
{
    bool allMoreThan3Digits = true
    foreach cell in row
        allMoreThan3Digits = allMoreThan3Digits && (ABS(cell.Value) >= 100)
    if (allMoreThan3Digits)
        add row to new list
}
Something like that.
With up to 5 digits in total in each column? If so here is what I would do.
For each row in list
    For each column in row
        if column number < 100 then
            row delete
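Both pseudocode answers can be sketched in Python by building a new list rather than deleting rows while iterating. This assumes the values are integers, so "at least 3 digits" is taken to mean an absolute value of 100 or more; the function name and the sample rows are made up for illustration.

```python
def keep_rows_with_three_or_more_digits(rows):
    """Keep only the rows in which every number has at least 3 digits."""
    return [
        row for row in rows
        if all(abs(value) >= 100 for value in row)  # abs() handles negatives
    ]


rows = [
    [123, 4567, 89012, 345],
    [12, 3456, 78901, 234],   # 12 has only two digits, so this row is dropped
    [100, 999, 10000, 101],
]
print(keep_rows_with_three_or_more_digits(rows))
```

Building a filtered copy sidesteps the problem in the original attempt, where deleting a row in the middle of the loop changes the positions of the rows still to be checked.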
