sort vs sortrows in Matlab - sorting

Is it possible to get the same result with the sort function as with sortrows? My matrix has over 4 million rows, and sortrows adds a lot of latency because of its iterations. (A vectorized approach would be appreciated.)
%Col1 -> date, Col2 -> id, Col3 -> ranking within each date-group (to help you debug)
data = [ ...
734614 5 3; 734615 6 5; 734622 1 1; 734615 1 1; 734615 4 3;
734622 2 2; 734622 4 3; 734615 3 2; 734615 5 4; 734614 3 2;
734614 1 1; 734622 8 4; 734622 9 5;] ;
sortedanswer =
734614 1 1
734614 3 2
734614 5 3
734615 1 1
734615 3 2
734615 4 3
734615 5 4
734615 6 5
734622 1 1
734622 2 2
734622 4 3
734622 8 4
734622 9 5
Thanks!

You could do it like this:
[~,indx]=sort(data(:,1));
sortedanswer=data(indx,:)
sortedanswer =
734614 5 3
734614 3 2
734614 1 1
734615 6 5
734615 1 1
734615 4 3
734615 3 2
734615 5 4
734622 1 1
734622 2 2
734622 4 3
734622 8 4
734622 9 5
Note that this sorts by the first column only. Rows with the same date keep the relative order they had in the original data, which is why the first row of my output has 5 3 in the second and third columns rather than 1 1.
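To get the full sortrows ordering out of plain stable sorts, one option is to sort by the secondary key first and then stable-sort by the primary key. The sketch below illustrates the idea in Python rather than MATLAB, using the question's data; the helper name sort_two_keys is made up for this illustration.

```python
# Two-pass stable sort: secondary key first, then primary key.
# Python's sorted() is stable, so ties in the second pass keep
# the order produced by the first pass.
data = [
    (734614, 5, 3), (734615, 6, 5), (734622, 1, 1), (734615, 1, 1), (734615, 4, 3),
    (734622, 2, 2), (734622, 4, 3), (734615, 3, 2), (734615, 5, 4), (734614, 3, 2),
    (734614, 1, 1), (734622, 8, 4), (734622, 9, 5),
]

def sort_two_keys(rows):
    """Hypothetical helper: order rows by column 1, then column 2."""
    by_id = sorted(rows, key=lambda r: r[1])   # pass 1: column 2 (id)
    return sorted(by_id, key=lambda r: r[0])   # pass 2: column 1 (date), stable

sorted_rows = sort_two_keys(data)
```

The same two passes could be expressed in MATLAB with two sort calls on an index vector. Whether that beats sortrows on 4 million rows is something to measure rather than assume.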

Related

How to make SORTKEY for irregular observations

I'd like to create a "SORTKEY" variable like the one below. The groups do not all have the same number of observations.
Basically, each group has 3 obs, but if flg=1 then that observation is also included in the current SORTKEY.
In this example, that means SORTKEY = 2 has 4 obs and every SORTKEY ^= 2 has 3 obs.
Is there a way to create SORTKEY manually? If you have a good idea, please give me some advice.
I want the following dataset, built from the "test" dataset.
/*
SORTKEY NO FLG
1    1  0
1    2  0
1    3  0
2    4  0
2    5  0
2    6  0
2    7  1
3    8  0
3    9  0
3    10 0
*/
data test;
input no flg;
cards;
1 0
2 0
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
;
run;
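Use a sequence counter to track the 3-rows-per-sortkey requirement.
Example:
data want;
set test; /* the input dataset from the question */
retain sortkey 1;
seq+1; /* sum statement: seq is retained and starts at 0 */
if seq > 3 and flg ne 1 then do; /* group is full and this row is not flagged */
seq = 1;
sortkey+1;
end;
run;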
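The counter logic can be traced outside SAS as well. Below is a minimal Python sketch of the same idea, run on the question's ten rows; the function name assign_sortkey and the group_size parameter are assumptions made for this illustration.

```python
# Rows are (no, flg) pairs, as in the question's "test" dataset.
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (6, 0), (7, 1), (8, 0), (9, 0), (10, 0)]

def assign_sortkey(rows, group_size=3):
    """Hypothetical helper: return (sortkey, no, flg) for each row."""
    sortkey, seq = 1, 0
    out = []
    for no, flg in rows:
        seq += 1
        # Start a new group once the current one is full, unless this
        # row is flagged and therefore stays with the current group.
        if seq > group_size and flg != 1:
            seq = 1
            sortkey += 1
        out.append((sortkey, no, flg))
    return out

result = assign_sortkey(rows)
```

Row 7 (flg=1) arrives when the counter is already past 3, so it is kept in group 2; the reset then happens on row 8.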

How to iterate n nested for loops each from 0 to n?

for example if n = 2
// Nested loop for all possible pairs
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
// here i have to use, i, j
}
}
for example if n = 3
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < n; k++)
// here i have to use, i, j, k
But for a general n, how do I iterate n nested for loops, each from 0 to n?
I tried a lot but could not come up with a solution.
Is there any way to do it? Please help.
You could try a recursive function like this:
void foo(int n, int level)
{
if(level == 0)
{
// All n loops are active at this point.
// Do something
return;
}
for(int i=0; i<n; i++){
// Pass level - 1; writing --level here would change level on every iteration.
foo(n, level - 1);
}
}
//Start it with level = n like this
foo(n,n);
You could use an array for the indexes.
std::vector<int> indexes(n, 0); // needs #include <vector>; one index per loop, all starting at zero
while (true) {
// use indexes[0..n-1] as you would i, j, k, ...
indexes[n-1]++;
// propagate carry
for (int i = n-1; i > 0; i--) {
if (indexes[i] == n) {
indexes[i-1]++;
indexes[i] = 0;
}
else break; // early exit from propagation, in case it's not necessary
}
if (indexes[0] == n) break;
}
We can use a recursive solution with only a single loop; recursion does the rest. You just need to specify depth and n, where:
depth is the number of nested for loops you want
n is the size of each loop
Below is a C++ implementation
#include <iostream>
#include <vector>
using namespace std;
void loop(int n, int depth, vector<int> &iteration){
if(!depth){
for(auto x:iteration)
cout<<x<<" ";
cout<<endl;
return;
}
for(int i=0;i<n;i++){
iteration[iteration.size()-depth] = i;
loop(n, depth-1, iteration);
}
}
int main(int argc, char const *argv[])
{
int depth = 3, n = 5;
vector<int> iteration(depth);
loop(n, depth, iteration);
return 0;
}
Output
0 0 0
0 0 1
0 0 2
0 0 3
0 0 4
0 1 0
0 1 1
0 1 2
0 1 3
0 1 4
0 2 0
0 2 1
0 2 2
0 2 3
0 2 4
0 3 0
0 3 1
0 3 2
0 3 3
0 3 4
0 4 0
0 4 1
0 4 2
0 4 3
0 4 4
1 0 0
1 0 1
1 0 2
1 0 3
1 0 4
1 1 0
1 1 1
1 1 2
1 1 3
1 1 4
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 3 0
1 3 1
1 3 2
1 3 3
1 3 4
1 4 0
1 4 1
1 4 2
1 4 3
1 4 4
2 0 0
2 0 1
2 0 2
2 0 3
2 0 4
2 1 0
2 1 1
2 1 2
2 1 3
2 1 4
2 2 0
2 2 1
2 2 2
2 2 3
2 2 4
2 3 0
2 3 1
2 3 2
2 3 3
2 3 4
2 4 0
2 4 1
2 4 2
2 4 3
2 4 4
3 0 0
3 0 1
3 0 2
3 0 3
3 0 4
3 1 0
3 1 1
3 1 2
3 1 3
3 1 4
3 2 0
3 2 1
3 2 2
3 2 3
3 2 4
3 3 0
3 3 1
3 3 2
3 3 3
3 3 4
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
4 0 0
4 0 1
4 0 2
4 0 3
4 0 4
4 1 0
4 1 1
4 1 2
4 1 3
4 1 4
4 2 0
4 2 1
4 2 2
4 2 3
4 2 4
4 3 0
4 3 1
4 3 2
4 3 3
4 3 4
4 4 0
4 4 1
4 4 2
4 4 3
4 4 4
Symbolically, you are generating all n-digit numbers in base n. You do this by starting from all zeroes and incrementing until all n^n values have been produced. Every time a digit reaches n, you reset it to zero and carry into the next digit.
E.g. with n=3,
000 001 002 010 011 012 020 021 022 100 101 102 110 111 112 120 121 122 200 201 202 210 211 212 220 221 222
A possible implementation is with n counters.
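A minimal sketch of the n-counters idea, written in Python for brevity; the generator name base_n_counter is an assumption made for this illustration. Each yielded tuple plays the role of (i, j, k, ...) in the nested loops.

```python
def base_n_counter(n):
    """Yield every tuple of n digits in base n, in counting order."""
    digits = [0] * n
    while True:
        yield tuple(digits)
        # Increment the last digit and propagate carries to the left.
        pos = n - 1
        while pos >= 0:
            digits[pos] += 1
            if digits[pos] < n:
                break           # no carry needed, stop propagating
            digits[pos] = 0     # digit overflowed: reset and carry
            pos -= 1
        if pos < 0:
            return              # carried out of the leftmost digit: done

pairs = list(base_n_counter(2))
triples = list(base_n_counter(3))
```

With n = 3 this produces the 27 values listed above, from (0, 0, 0) to (2, 2, 2).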

How to increment by group

There is a table to which I have now added a new column: sort_num int default 0
id level sort_num
1 1 0
2 1 0
3 2 0
4 2 0
5 2 0
6 3 0
7 3 0
8 3 0
9 3 0
Now I want to set sort_num values like below
id level sort_num
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2
8 3 3
9 3 4
The Java code that implements the above requirement is
int sortNum = 0;
int currentLevel = fooList.get(0).getLevel();
for (RuleConf foo : fooList) {
if(currentLevel != foo.getLevel()){
sortNum = 0;
currentLevel = foo.getLevel();
}
foo.setSortNum(++sortNum);
}
I want to know whether Java 8 could simplify the above code.
PS. The same requirement implemented in MySQL:
set @index:=0; update t set sort_num = (@index:=@index+1) where level = 1 order by id;
set @index:=0; update t set sort_num = (@index:=@index+1) where level = 2 order by id;
set @index:=0; update t set sort_num = (@index:=@index+1) where level = 3 order by id;
The best approach is to stick with your plain enhanced for loop. I don't think it is possible to come up with a single Stream solution, since you need intermediate values. With two steps it would look like this:
Map<Integer, List<RuleConf>> levels = fooList.stream()
.collect(Collectors.groupingBy(RuleConf::getLevel));
levels.values().forEach(v ->
IntStream.range(0, v.size()).forEach(i -> v.get(i).setSortNum(i + 1))
);
If you keep track of the next order numbers yourself, you can do it with one stream. This solution is thread safe as well, and because it uses forEachOrdered the numbering follows the list order even on a parallel stream:
Map<Integer, AtomicInteger> orders = new ConcurrentHashMap<>();
fooList.stream().forEachOrdered(foo -> {
orders.putIfAbsent(foo.getLevel(), new AtomicInteger());
foo.setSortNum(orders.get(foo.getLevel()).incrementAndGet());
});
It should outperform the other stream solutions, because it iterates over the list only once.
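The one-pass, counter-per-level idea does not depend on Java. Here is a minimal Python sketch of it, assuming the input is just the list of level values in id order; the name number_within_groups is made up for this illustration.

```python
from collections import defaultdict

def number_within_groups(levels):
    """Return a 1-based running count per level, in input order."""
    next_num = defaultdict(int)   # level -> last number handed out
    sort_nums = []
    for level in levels:
        next_num[level] += 1
        sort_nums.append(next_num[level])
    return sort_nums

levels = [1, 1, 2, 2, 2, 3, 3, 3, 3]
sort_nums = number_within_groups(levels)
```

Unlike the original for loop, this version does not require the rows to be grouped by level beforehand, because each level keeps its own counter.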

Count the frequency of matrix values including 0

I have a vector
A = [ 1 1 1 2 2 3 6 8 9 9 ]
I would like to write a loop that counts the frequencies of the values in my vector within a range I choose. The result should include values that have a frequency of 0.
For example, if I chose the range of 1:9 my results would be
3 2 1 0 0 1 0 1 2
If I picked 1:11 the result would be
3 2 1 0 0 1 0 1 2 0 0
Is this possible? Ideally I would also have to do this for giant matrices and vectors, so the fastest way to calculate this would be appreciated.
Here's an alternative suggestion to histcounts, which appears to be ~8x faster on Matlab 2015b:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11;
N = accumarray(A(:), 1, [maxRange,1])';
N =
3 2 1 0 0 1 0 1 2 0 0
Comparing the speed:
K>> tic; for i = 1:100000, N1 = accumarray(A(:), 1, [maxRange,1])'; end; toc;
Elapsed time is 0.537597 seconds.
K>> tic; for i = 1:100000, N2 = histcounts(A,1:maxRange+1); end; toc;
Elapsed time is 4.333394 seconds.
K>> isequal(N1, N2)
ans =
1
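For readers without MATLAB, the counting that the accumarray call performs can be sketched in plain Python. This is an illustration of the idea only, with one assumption that differs from accumarray: values outside 1..max_value are silently ignored here. The name count_in_range is hypothetical.

```python
def count_in_range(values, max_value):
    """Count occurrences of each integer 1..max_value, zeros included."""
    counts = [0] * max_value
    for v in values:
        # Assumption of this sketch: out-of-range values are skipped.
        if 1 <= v <= max_value:
            counts[v - 1] += 1
    return counts

A = [1, 1, 1, 2, 2, 3, 6, 8, 9, 9]
counts_9 = count_in_range(A, 9)
counts_11 = count_in_range(A, 11)
```

Each value is used directly as an index into the output, which is why a single pass over the data is enough.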
As per the loop request, here's a looped version, which should not be too slow since the latest engine overhaul:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11; %// your range
output = zeros(1,maxRange); %// initialise output
for ii = 1:maxRange
tmp = A==ii; %// temporary storage
output(ii) = sum(tmp(:)); %// find the number of occurrences
end
which would result in
output =
3 2 1 0 0 1 0 1 2 0 0
Faster and not-looping would be @beaker's suggestion to use histcounts:
[N,edges] = histcounts(A,1:maxRange+1);
N =
3 2 1 0 0 1 0 1 2 0 0
where the +1 adds the closing bin edge, so that the last value in the range (maxRange) gets its own bin as well.
Assuming the input A to be a sorted array and the range starts from 1 and goes until some value greater than or equal to the largest element in A, here's an approach using diff and find -
%// Inputs
A = [2 4 4 4 8 9 11 11 11 12]; %// Modified for variety
maxN = 13;
idx = [0 find(diff(A)>0) numel(A)]+1;
out = zeros(1,maxN); %// OR for better performance : out(maxN) = 0;
out(A(idx(1:end-1))) = diff(idx);
Output -
out =
0 1 0 3 0 0 0 1 1 0 3 1 0
This can be done very easily with bsxfun.
Let the data be
A = [ 1 1 1 2 2 3 6 8 9 9 ]; %// data
B = 1:9; %// possible values
Then
result = sum(bsxfun(@eq, A(:), B(:).'), 1);
gives
result =
3 2 1 0 0 1 0 1 2

R - Making loops faster

This little code snippet is supposed to loop through a sorted data frame. It keeps a count of how many successive rows have the same information in columns aIndex and cIndex and also bIndex and dIndex. If these are the same, it deposits the count and increments it for the next time around, and if they differ, it deposits the count and resets it to 1 for the next time around.
for (i in 1:nrow(myFrame)) {
if (myFrame[i, aIndex] == myFrame[i, cIndex] &
myFrame[i, bIndex] == myFrame[i, dIndex]) {
myFrame[i, eIndex] <- count
count <- (count + 1)
} else {
myFrame[i, eIndex] <- count
count <- 1
}
}
It's been running for a long time now. I understand that I'm supposed to vectorize whenever possible, but I'm not really seeing it here. What am I supposed to do to make this faster?
Here's what an example few rows should look like after running:
aIndex bIndex cIndex dIndex eIndex
1 2 1 2 1
1 2 1 2 2
1 2 4 8 3
4 8 1 4 1
1 4 1 4 1
I think this will do what you want; the tricky part is that the count resets after the difference, which effectively puts a shift on the eIndex.
There (hopefully) is an easier way to do this, but this is what I came up with.
tmprle <- rle(((myFrame$aIndex == myFrame$cIndex) &
(myFrame$bIndex == myFrame$dIndex)))
myFrame$eIndex <- c(1,
unlist(ifelse(tmprle$values,
Vectorize(seq.default)(from = 2,
length = tmprle$lengths),
lapply(tmprle$lengths,
function(x) {rep(1, each = x)})))
)[-(nrow(myFrame)+1)]
which gives
> myFrame
aIndex bIndex cIndex dIndex eIndex
1 1 2 1 2 1
2 1 2 1 2 2
3 1 2 4 8 3
4 4 8 1 4 1
5 1 4 1 4 1
Maybe this will work. I have reworked the rle and sequence bits.
dat <- read.table(text="aIndex bIndex cIndex dIndex
1 2 1 2
1 2 1 2
1 2 4 8
4 8 1 4
1 4 1 4", header=TRUE, as.is=TRUE,sep = " ")
dat$eIndex <-NA
#identify rows where a=c and b=d, multiply by 1 to get a numeric vector
dat$id<-(dat$aIndex==dat$cIndex & dat$bIndex==dat$dIndex)*1
#identify sequence
runs <- rle(dat$id)
#create sequence, multiply by id to keep only identicals, +1 at the end
count <-sequence(runs$lengths)*dat$id+1
#shift sequence down one notch, start with 1
dat$eIndex <-c(1,count[-length(count)])
dat
aIndex bIndex cIndex dIndex eIndex id
1 1 2 1 2 1 1
2 1 2 1 2 2 1
3 1 2 4 8 3 0
4 4 8 1 4 1 0
5 1 4 1 4 1 1
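When checking a vectorized rewrite like the ones above, it can help to have a small reference version of the original loop to compare against. The Python sketch below restates that loop on the five example rows. It assumes count starts at 1, which the question does not show, and the name running_count is made up for this illustration.

```python
def running_count(match_flags, start=1):
    """Store the current count for each row, then update it for the next row."""
    count = start            # assumption: the question's loop starts at 1
    out = []
    for matched in match_flags:
        out.append(count)    # deposit the count for this row
        count = count + 1 if matched else 1
    return out

# (aIndex, bIndex, cIndex, dIndex) for the example rows
rows = [(1, 2, 1, 2), (1, 2, 1, 2), (1, 2, 4, 8), (4, 8, 1, 4), (1, 4, 1, 4)]
flags = [a == c and b == d for a, b, c, d in rows]
e_index = running_count(flags)
```

The value deposited on a row depends on the previous row's comparison, which is the one-row shift that both R answers reproduce.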
