Related
It is a straightforward question: Is there a faster alternative to all(a(:,i)==a,1) in MATLAB?
I'm thinking of a implementation that benefits from short-circuit evaluations in the whole process. I mean, all() definitely benefits from short-circuit evaluations but a(:,i)==a doesn't.
I tried the following code,
% example for the input matrix
m = 3; % m and n aren't necessarily equal to those values.
n = 5000; % It's only possible to know in advance that 'm' << 'n'.
a = randi([0,5],m,n); % the maximum value of 'a' isn't necessarily equal to
% 5 but it's possible to state that every element in
% 'a' is a positive integer.
% all, equal solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%% all and equal solution %%%%%%%%%
ax_boo = all(a(:,i)==a,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
% alternative solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%%% alternative solution %%%%%%%%%%%
ax_boo = a(1,i) == a(1,:);
for k = 2:m
ax_boo(ax_boo) = a(k,i) == a(k,ax_boo);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
but it's intuitive that any "for-loop-solution" within the MATLAB environment will be naturally slower. I'm wondering if there is a MATLAB built-in function written in a faster language.
EDIT:
After running more tests I found out that the implicit expansion does have a performance impact in evaluating a(:,i)==a. If the matrix a has more than one row, all(repmat(a(:,i),[1,n])==a,1) may be faster than all(a(:,i)==a,1) depending on the number of columns (n). For n=5000 repmat explicit expansion has proved to be faster.
But I think that a generalization of Kenneth Boyd's answer is the "ultimate solution" if all elements of a are positive integers. Instead of dealing with a (m x n matrix) in its original form, I will store and deal with adec (1 x n matrix):
exps = ((0):(m-1)).';
base = max(a,[],[1,2]) + 1;
adec = sum( a .* base.^exps , 1 );
In other words, each column will be encoded to one integer. And of course adec(i)==adec is faster than all(a(:,i)==a,1).
EDIT 2:
I forgot to mention that adec approach has a functional limitation. At best, storing adec as uint64, the following inequality must hold base^m < 2^64 + 1.
Since your goal is to count the number of columns that match, my example converts the binary encoding to integer decimals, then you just loop over the possible values (with 3 rows that are 8 possible values) and count the number of matches.
a_dec = 2.^(0:(m-1)) * a;
num_poss_values = 2 ^ m;
num_matches = zeros(num_poss_values, 1);
for i = 1:num_poss_values
num_matches(i) = sum(a_dec == (i - 1));
end
On my computer, using 2020a, Here are the execution times for your first 2 options and the code above:
Elapsed time is 0.246623 seconds.
Elapsed time is 0.553173 seconds.
Elapsed time is 0.000289 seconds.
So my code is 853 times faster!
I wrote my code so it will work with m being an arbitrary integer.
The num_matches variable contains the number of columns that add up to 0, 1, 2, ...7 when converted to a decimal.
As an alternative you can use the third output of unique:
[~, ~, iu] = unique(a.', 'rows');
for i = 1:n
ax_boo = iu(i) == iu;
end
As indicated in a comment:
ax_boo isolates the indices of the columns I have to sum in a row vector b. So, basically the next line would be something like c = sum(b(ax_boo),2);
It is a typical usage of accumarray:
[~, ~, iu] = unique(a.', 'rows');
C = accumarray(iu,b);
for i = 1:n
c = C(i);
end
Integer :: NBE,ierr,RN,i,j
Real(kind=8), allocatable :: AA1(:,:),AA2(:,:)
NBE=40
RN=3*NBE-2
Allocate(AA1(3*NBE,3*NBE),AA2(3*NBE,RN),stat=ierr)
If (ierr .ne. 0) Then
print *, 'allocate steps failed 1'
pause
End If
Do i=1,3*NBE
Do j=1,3*NBE
AA1(i,j)=1
End Do
End Do
I want to remove columns 97 and 113 from the matrix AA1 and then this matrix becomes AA2. I just want to know if any command of Fortran can realize this operation?
Alexander Vogt's answer gives the concept of using a vector subscript to select the elements of the array for inclusion. That answer constructs the vector subscript array using
[(i,i=1,96),(i=98,112),(i=114,3*NBE)]
Some may consider
AA2 = AA1(:,[(i,i=1,96),(i=98,112),(i=114,3*NBE)])
to be less than clear in reading. One could use a "temporary" index vector
integer selected_columns(RN)
selected_columns = [(i,i=1,96),(i=98,112),(i=114,3*NBE)]
AA2 = AA1(:,selected_columns)
but that doesn't address the array constructor being not nice, especially in more complicated cases. Instead, we can create a mask and use our common techniques:
logical column_wanted(3*NBE)
integer, allocatable :: selected_columns(:)
! Create a mask of whether a column is wanted
column_wanted = .TRUE.
column_wanted([97,113]) = .FALSE.
! Create a list of indexes of wanted columns
selected_columns = PACK([(i,i=1,3*NBE)],column_wanted)
AA2 = AA1(:,selected_columns)
Here's a simple one liner:
AA2 = AA1(:,[(i,i=1,96),(i=98,112),(i=114,3*NBE)])
Explanation:
(Inner part) Construct a temporary array for the indices [1,...,96,98,...,112,114,...,3*NBE]
(Outer part) Copy the matrix and only consider the columns in the index array
OK, I yield to #IanBush... Even simpler would be to do three dedicated assignments:
AA2(:,1:96) = AA1(:,1:96)
AA2(:,97:111) = AA1(:,98:112)
AA2(:,112:) = AA1(:,114:)
I don't have a fortran compiler here at home, so I cant test it. But I'd do something line this:
i = 0
DO j = 1, 3*NBE
IF (j == 97 .OR. j == 113) CYCLE
i = i + 1
AA2(:, i) = AA1(:, j)
END DO
The CYCLE command means that the rest of the loop should not be executed any more and the next iteration should start. Thereby, i will not get incremented, so when j=96, then i=96, when j=98 then i=97, and when j=114 then i=112.
A few more words: Due to Fortran's memory layout, you want to cycle over the first index the fastest, and so forth. So your code would run faster if you changed it to:
Do j=1,3*NBE ! Outer loop over second index
Do i=1,3*NBE ! Inner loop over first index
AA1(i,j)=1
End Do
End Do
(Of course, such an easy initialisation can be done even easier with just AA1(:,:) = 1 of just AA1 = 1.
I have a matrix that contains both character and reals and I want a program that reads this matrix (finds the dimensions by itself). Here is my code:
! A fortran95 program for G95
Program Project2nd
implicit none
character(len=40), allocatable :: a(:,:)
integer i,j,k,n,m,l,st
character(len=40) d
n=0; m=1; j=1;
open(10,file=&
'/Users/dariakowsari/Documents/Physics/Programming/Fortran95-Projects/Project2nd/input.txt', &
IOstat=st)
do while (st == 0)
read(10,*,IOstat=st) d
n=n+1
end do
st=0
do j=1,m
do while (st == 0)
allocate(a(1,m))
read(10,*,IOstat=st) (a(1,j),j=1,m)
m=m+1
deallocate(a)
end do
print*, n,m
end
Here is my Matrix:
a b 13 15.5 13.2
c d 16 16.75 19
e f 19.2 12.2 18.2
With this code I got (3,2) for the dimensions of my matrix.
There are a few errors in your example code which means it doesn't compile for me but after a few changes I managed to get a similar result to you.
*Update: As noted by #francescalus in the comments to my other (now deleted) answer, that approach involved undefined behaviour and as such is not an appropriate solution. This arose from trying to read more elements from the file than were present.)
Here's an alternative approach, which should avoid this undefined behaviour, but is probably pretty inefficient.
Program Project2nd
implicit none
character(len=40), allocatable :: a(:)
integer, allocatable :: ind(:)
integer, parameter :: maxElements = 100
integer i,j,n,m,st
character(len=40) d
n=0;
open(10,file='mat.txt',IOstat=st)
!Find number of lines
do while (st == 0)
read(10,*,IOstat=st) d
if(st ==0) n=n+1
end do
!Move back to the start of the file
rewind(10)
!Read all of the data
do m=n,maxElements,n
allocate(a(m))
read(10,*,IOstat=st) a
deallocate(a)
rewind(10)
if(st.ne.0) exit
enddo
m = m -n !Need to roll back m by one iteration to get the last which worked.
if(mod(m,n).ne.0) then
print*,"Error: Number of elements not divisible by number of rows."
stop
endif
!Number of columns = n_elements/nrow
m=m/n
print*, n,m
end Program Project2nd
Essentially this uses the same code as you had for counting the number of lines, however note that you only want to increment n when the read was successful (i.e. st==0). Note we do not exit the whilst block as soon as st becomes non-zero, it is only once we reach the end of the whilst block. After that we need to rewind the file so that the next read starts at the start of the file.
In a previous comment you mentioned that you'd rather not have to specify maxElement if you really want to avoid this then replace the second do loop with something like
st = 0 ; m = n
do while (st==0)
allocate(a(m))
read(10,*,IOstat=st) a
deallocate(a)
rewind(10)
if(st.ne.0) then
m = m - n !Go back to value of m that worked
exit
endif
m=m+n
enddo
here is how to do w/o rewinding.
implicit none
character(len=100) wholeline
character(len=20), allocatable :: c(:)
integer iline,io,ni,nums
open(20,file='testin.dat')
iline=0
do while(.true.)
read(20,'(a)',iostat=io)wholeline
if(io.ne.0)exit
iline=iline+1
ni=lineitems(wholeline)
allocate(c(ni))
read(wholeline,*)c
nums=ctnums(c)
write(*,*)'line',iline,' contains ',ni,'items',nums,
$ 'are numbers'
deallocate(c)
enddo
write(*,*)'total lines is ',iline
contains
integer function ctnums(c)
! count the number of items in a character array that are numbers
! this is a template,
! obviously you could assign the numbers to a real array here
character(len=*), allocatable :: c(:)
real f
integer i,io
ctnums=0
do i = 1,size(c)
read(c(i),*,iostat=io)f
if(io.eq.0)ctnums=ctnums+1
enddo
end function
integer function lineitems(line)
! count the number of items in a space delimited string
integer,parameter ::maxitems=100
character(len=*) line
character(len=80) :: c(maxitems)
integer iline,io
lineitems=0
do iline=1,maxitems
read(line,*,iostat=io)c(:iline)
if(io.ne.0)return
lineitems=iline
enddo
if(lineitems.eq.maxitems)write(*,*)'warning maxitems reached'
end function
end
output
line 1 contains 5 items 3 are numbers
line 2 contains 5 items 3 are numbers
total lines is 2
I have two matrices in Matlab A and B, which have equal number of columns but different number of rows. The number of rows in B is also less than the number of rows in A. B is actually a subset of A.
How can I remove those rows efficiently from A, where the values in columns 1 and 2 of A are equal to the values in columns 1 and 2 of matrix B?
At the moment I'm doing this:
for k = 1:size(B, 1)
A(find((A(:,1) == B(k,1) & A(:,2) == B(k,2))), :) = [];
end
and Matlab complains that this is inefficient and that I should try to use any, but I'm not sure how to do it with any. Can someone help me out with this? =)
I tried this, but it doesn't work:
A(any(A(:,1) == B(:,1) & A(:,2) == B(:,2), 2), :) = [];
It complains the following:
Error using ==
Matrix dimensions must agree.
Example of what I want:
A-B in the results means that the rows of B are removed from A. The same goes with A-C.
try using setdiff. for example:
c=setdiff(a,b,'rows')
Note, if order is important use:
c = setdiff(a,b,'rows','stable')
Edit: reading the edited question and the comments to this answer, the specific usage of setdiff you look for is (as noticed by Shai):
[temp c] = setdiff(a(:,1:2),b(:,1:2),'rows','stable')
c = a(c,:)
Alternative solution:
you can just use ismember:
a(~ismember(a(:,1:2),b(:,1:2),'rows'),:)
Use bsxfun:
compare = bsxfun( #eq, permute( A(:,1:2), [1 3 2]), permute( B(:,1:2), [3 1 2] ) );
twoEq = all( compare, 3 );
toRemove = any( twoEq, 2 );
A( toRemove, : ) = [];
Explaining the code:
First we use bsxfun to compare all pairs of first to column of A and B, resulting with compare of size numRowsA-by-numRowsB-by-2 with true where compare( ii, jj, kk ) = A(ii,kk) == B(jj,kk).
Then we use all to create twoEq of size numRowsA-by-numRowsB where each entry indicates if both corresponding entries of A and B are equal.
Finally, we use any to select rows of A that matches at least one row of B.
What's wrong with original code:
By removing rows of A inside a loop (i.e., A( ... ) = []) you actually resizing A at almost each iteration. See this post on why exactly this is a bad practice.
Using setdiff
In order to use setdiff (as suggested by natan) on only the first two columns you'll need use it's second output argument:
[ignore, ia] = setdiff( A(:,1:2), B(:,1:2), 'rows', 'stable' );
A = A( ia, : ); % keeping only relevant rows, beyond first two columns.
Here's another bsxfun implementation -
A(~any(squeeze(all(bsxfun(#eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2)),2),:)
One more that is dangerously close to Shai's solution, but still avoids two permute to one permute -
A(~any(all(bsxfun(#eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2),3),:)
Given a non-negative integer n and an arbitrary set of inequalities that are user-defined (in say an external text file), I want to determine whether n satisfies any inequality, and if so, which one(s).
Here is a points list.
n = 0: 1
n < 5: 5
n = 5: 10
If you draw a number n that's equal to 5, you get 10 points.
If n less than 5, you get 5 points.
If n is 0, you get 1 point.
The stuff left of the colon is the "condition", while the stuff on the right is the "value".
All entries will be of the form:
n1 op n2: val
In this system, equality takes precedence over inequality, so the order that they appear in will not matter in the end. The inputs are non-negative integers, though intermediary and results may not be non-negative. The results may not even be numbers (eg: could be strings). I have designed it so that will only accept the most basic inequalities, to make it easier for writing a parser (and to see whether this idea is feasible)
My program has two components:
a parser that will read structured input and build a data structure to store the conditions and their associated results.
a function that will take an argument (a non-negative integer) and return the result (or, as in the example, the number of points I receive)
If the list was hardcoded, that is an easy task: just use a case-when or if-else block and I'm done. But the problem isn't as easy as that.
Recall the list at the top. It can contain an arbitrary number of (in)equalities. Perhaps there's only 3 like above. Maybe there are none, or maybe there are 10, 20, 50, or even 1000000. Essentially, you can have m inequalities, for m >= 0
Given a number n and a data structure containing an arbitrary number of conditions and results, I want to be able to determine whether it satisfies any of the conditions and return the associated value. So as with the example above, if I pass in 5, the function will return 10.
They condition/value pairs are not unique in their raw form. You may have multiple instances of the same (in)equality but with different values. eg:
n = 0: 10
n = 0: 1000
n > 0: n
Notice the last entry: if n is greater than 0, then it is just whatever you got.
If multiple inequalities are satisfied (eg: n > 5, n > 6, n > 7), all of them should be returned. If that is not possible to do efficiently, I can return just the first one that satisfied it and ignore the rest. But I would like to be able to retrieve the entire list.
I've been thinking about this for a while and I'm thinking I should use two hash tables: the first one will store the equalities, while the second will store the inequalities.
Equality is easy enough to handle: Just grab the condition as a key and have a list of values. Then I can quickly check whether n is in the hash and grab the appropriate value.
However, for inequality, I am not sure how it will work. Does anyone have any ideas how I can solve this problem in as little computational steps as possible? It's clear that I can easily accomplish this in O(n) time: just run it through each (in)equality one by one. But what happens if this checking is done in real-time? (eg: updated constantly)
For example, it is pretty clear that if I have 100 inequalities and 99 of them check for values > 100 while the other one checks for value <= 100, I shouldn't have to bother checking those 99 inequalities when I pass in 47.
You may use any data structure to store the data. The parser itself is not included in the calculation because that will be pre-processed and only needs to be done once, but if it may be problematic if it takes too long to parse the data.
Since I am using Ruby, I likely have more flexible options when it comes to "messing around" with the data and how it will be interpreted.
class RuleSet
Rule = Struct.new(:op1,:op,:op2,:result) do
def <=>(r2)
# Op of "=" sorts before others
[op=="=" ? 0 : 1, op2.to_i] <=> [r2.op=="=" ? 0 : 1, r2.op2.to_i]
end
def matches(n)
#op2i ||= op2.to_i
case op
when "=" then n == #op2i
when "<" then n < #op2i
when ">" then n > #op2i
end
end
end
def initialize(text)
#rules = text.each_line.map do |line|
Rule.new *line.split(/[\s:]+/)
end.sort
end
def value_for( n )
if rule = #rules.find{ |r| r.matches(n) }
rule.result=="n" ? n : rule.result.to_i
end
end
end
set = RuleSet.new( DATA.read )
-1.upto(8) do |n|
puts "%2i => %s" % [ n, set.value_for(n).inspect ]
end
#=> -1 => 5
#=> 0 => 1
#=> 1 => 5
#=> 2 => 5
#=> 3 => 5
#=> 4 => 5
#=> 5 => 10
#=> 6 => nil
#=> 7 => 7
#=> 8 => nil
__END__
n = 0: 1
n < 5: 5
n = 5: 10
n = 7: n
I would parse the input lines and separate them into predicate/result pairs and build a hash of callable procedures (using eval - oh noes!). The "check" function can iterate through each predicate and return the associated result when one is true:
class PointChecker
def initialize(input)
#predicates = Hash[input.split(/\r?\n/).map do |line|
parts = line.split(/\s*:\s*/)
[Proc.new {|n| eval(parts[0].sub(/=/,'=='))}, parts[1].to_i]
end]
end
def check(n)
#predicates.map { |p,r| [p.call(n) ? r : nil] }.compact
end
end
Here is sample usage:
p = PointChecker.new <<__HERE__
n = 0: 1
n = 1: 2
n < 5: 5
n = 5: 10
__HERE__
p.check(0) # => [1, 5]
p.check(1) # => [2, 5]
p.check(2) # => [5]
p.check(5) # => [10]
p.check(6) # => []
Of course, there are many issues with this implementation. I'm just offering a proof-of-concept. Depending on the scope of your application you might want to build a proper parser and runtime (instead of using eval), handle input more generally/gracefully, etc.
I'm not spending a lot of time on your problem, but here's my quick thought:
Since the points list is always in the format n1 op n2: val, I'd just model the points as an array of hashes.
So first step is to parse the input point list into the data structure, an array of hashes.
Each hash would have values n1, op, n2, value
Then, for each data input you run through all of the hashes (all of the points) and handle each (determining if it matches to the input data or not).
Some tricks of the trade
Spend time in your parser handling bad input. Eg
n < = 1000 # no colon
n < : 1000 # missing n2
x < 2 : 10 # n1, n2 and val are either number or "n"
n # too short, missing :, n2, val
n < 1 : 10x # val is not a number and is not "n"
etc
Also politely handle non-numeric input data
Added
Re: n1 doesn't matter. Be careful, this could be a trick. Why wouldn't
5 < n : 30
be a valid points list item?
Re: multiple arrays of hashes, one array per operator, one hash per point list item -- sure that's fine. Since each op is handled in a specific way, handling the operators one by one is fine. But....ordering then becomes an issue:
Since you want multiple results returned from multiple matching point list items, you need to maintain the overall order of them. Thus I think one array of all the point lists would be the easiest way to do this.