Amazon QuickSight - Running Difference - amazon-quicksight

I have the following table and want to add a column with running difference.
time_gap  amounts
0         150
0.5       19
1.5       2
6         10
7         4
My desired output is:
time_gap  amounts  diff
0         150      150
0.5       19       131
1.5       2        129
6         10       119
7         4        115
What I've tried:
I duplicated the amounts column and applied the Difference table calculation, but that gave me the difference between two consecutive rows instead:
time_gap  amounts  diff
0         150
0.5       19       -131
1.5       2        -17
6         10       8
7         4        -6
I also tried some calculated field formulas, but those didn't work either.
Thank you!
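The desired diff column starts at the first amount and then subtracts each later amount in turn, which is the same as twice the first amount minus the running sum. A minimal Python sketch of that arithmetic follows; the function name `running_difference` is made up for illustration, and this is not a QuickSight formula. In QuickSight the same idea would presumably need a running-sum style table calculation (such as runningSum) rather than Difference, but treat that mapping as an untested assumption.

```python
from itertools import accumulate


def running_difference(amounts):
    """Return first amount minus the running sum of all later amounts.

    Equivalent to 2 * first - running_sum, because the running sum
    already includes the first amount once.
    """
    if not amounts:
        return []
    first = amounts[0]
    return [2 * first - total for total in accumulate(amounts)]


# Amounts from the question, ordered by time_gap.
amounts = [150, 19, 2, 10, 4]
print(running_difference(amounts))  # [150, 131, 129, 119, 115]
```

The rows must already be sorted by time_gap, since the result depends on the order.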

Related

Make a matrix B of the first, fourth and fifth row and the first and fifth column from matrix A in OCTAVE

I have matrix A
A =
5 10 15 20 25
10 9 8 7 6
-5 -15 -25 -35 -45
1 2 3 4 5
28 91 154 217 280
I need to build a matrix B from the first, fourth and fifth rows and the first and fifth columns of matrix A.
How can I do it?
>> B = A([1,4,5],[1,5])
B =
5 25
1 5
28 280
You should look up how to use index expressions in the Matlab and Octave language to extract and work with submatrices.
See the Octave help on Index expressions: https://octave.org/doc/latest/Index-Expressions.html
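For comparison, the same row and column selection can be sketched in plain Python with nested lists. The helper name `submatrix` is hypothetical, and it takes 1-based indices only to mirror the Octave call above.

```python
A = [
    [5, 10, 15, 20, 25],
    [10, 9, 8, 7, 6],
    [-5, -15, -25, -35, -45],
    [1, 2, 3, 4, 5],
    [28, 91, 154, 217, 280],
]


def submatrix(matrix, rows, cols):
    """Pick the given rows and columns using 1-based indices, as Octave does."""
    return [[matrix[r - 1][c - 1] for c in cols] for r in rows]


# Rows 1, 4, 5 and columns 1, 5, matching A([1,4,5],[1,5]).
B = submatrix(A, rows=[1, 4, 5], cols=[1, 5])
print(B)  # [[5, 25], [1, 5], [28, 280]]
```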

Scilab sort by second column

I have some data:
P = [3 10 25 32 43 1 3
6 12 35 39 49 4 9
2 9 23 36 47 2 9
...
7 20 35 42 44 3 7
15 18 19 41 42 4 6
10 18 32 35 46 3 10];
Data is always between 1 and 50.
I am selecting left 5 columns and 2 right columns:
L=P(:,1:5);
R=P(:,6:7);
I am counting occurrences:
a=tabul(L);
b=tabul(R);
At this point, a contains:
50. 3.
49. 4.
48. 3.
which tells me that the value 50 occurs 3 times, 49 occurs 4 times, and so on.
What I need now is to sort matrix a by its second column in descending order, keeping each first-column value together with its count. So it would look like this:
49. 4.
50. 3.
48. 3.
How can I sort matrix a this way (later I will sort b the same way)?
I was trying something like:
[a,idx]=gsort(a(:,2),"g","d");
a=a(idx,:);
but this does not do what I need.
It does not work because the gsort call overwrites a, although you only need the index here. The following does what you want:
[dummy,idx]=gsort(a(:,2),"g","d");
a=a(idx,:);
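The same "sort whole rows by the second column" idea, sketched in Python for illustration. The function name `sort_by_count` is an assumption, and tie-breaking here follows Python's stable sort, which may not match how gsort orders equal counts.

```python
def sort_by_count(table):
    """Sort (value, count) rows by count, largest first.

    Rows are moved as a unit, so each value stays next to its count.
    Python's sort is stable, so rows with equal counts keep their
    original relative order.
    """
    return sorted(table, key=lambda row: row[1], reverse=True)


# The three rows shown in the question.
a = [[50, 3], [49, 4], [48, 3]]
print(sort_by_count(a))  # [[49, 4], [50, 3], [48, 3]]
```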

How can you improve computation time when predicting KNN Imputation?

I feel like my run time is extremely slow for my data set, this is the code:
library(caret)
library(data.table)
knnImputeValues <- preProcess(mainData[trainingRows, imputeColumns], method = c("zv", "knnImpute"))
knnTransformed <- predict(knnImputeValues, mainData[ 1:1000, imputeColumns])
The preProcess call that builds knnImputeValues runs fairly quickly; however, the predict call takes a tremendous amount of time. When I timed it on a subset of the data, this was the result:
testtime <- system.time(knnTransformed <- predict(knnImputeValues, mainData[ 1:15000, imputeColumns]))
testtime
user 969.78
system 38.70
elapsed 1010.72
Additionally, it should be noted that caret preprocess uses "RANN".
Now my full dataset is:
str(mainData[ , imputeColumns])
'data.frame': 1809032 obs. of 16 variables:
$ V1: int 3 5 5 4 4 4 3 4 3 3 ...
$ V2: Factor w/ 3 levels "1000000","1500000",..: 1 1 3 1 1 1 1 3 1 1 ...
$ V3: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ V4: int 2 5 5 12 4 5 11 8 7 8 ...
$ V5: int 2 0 0 2 0 0 1 3 2 8 ...
$ V6: int 648 489 489 472 472 472 497 642 696 696 ...
$ V7: Factor w/ 4 levels "","N","U","Y": 4 1 1 1 1 1 1 1 1 1 ...
$ V8: int 0 0 0 0 0 0 0 1 1 1 ...
$ V9: num 0 0 0 0 0 ...
$ V10: Factor w/ 56 levels "1","2","3","4",..: 45 19 19 19 19 19 19 46 46 46 ...
$ V11: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ V12: num 2 5 5 12 4 5 11 8 7 8 ...
$ V13: num 2 0 0 2 0 0 1 3 2 8 ...
$ V14: Factor w/ 4 levels "1","2","3","4": 2 2 2 2 2 2 2 2 3 3 ...
$ V15: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
$ V16: num 657 756 756 756 756 ...
So is there something I'm doing wrong, or is this typical for how long it will take to run? If you do a back-of-the-envelope extrapolation (which I know isn't entirely accurate), you'd get what, 33 days?
Also it looks like system time is very low and user time is very high, is that normal?
My computer is a laptop with an Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz processor.
Additionally, would this improve the runtime of the predict function?
cl <- makeCluster(4)
registerDoParallel()
I tried it, and it didn't seem to make a difference other than all the processors looked more active in my task manager.
FOCUSED QUESTION: I'm using the caret package to do KNN imputation on 1.8 million rows. The way I'm currently doing it will take over a month to run. How can I write this so that it runs much faster (if possible)?
Thank you for any help provided. The answer might very well be "that's how long it takes, don't bother"; I just want to rule out any possible mistakes.
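As a sanity check on the extrapolation, here is a minimal Python sketch that scales the 15,000-row timing to the full data set. It assumes predict time grows linearly with the number of rows being imputed, which is only an assumption; the function name `extrapolate_seconds` is made up for illustration.

```python
# Figures taken from the timing and str() output in the question.
SAMPLE_ROWS = 15_000
SAMPLE_SECONDS = 1010.72  # elapsed time for the 15,000-row predict
TOTAL_ROWS = 1_809_032


def extrapolate_seconds(sample_rows, sample_seconds, total_rows):
    """Scale a sample timing to the full row count, assuming linear cost."""
    return sample_seconds / sample_rows * total_rows


total_seconds = extrapolate_seconds(SAMPLE_ROWS, SAMPLE_SECONDS, TOTAL_ROWS)
hours = total_seconds / 3600
days = total_seconds / 86400
print(f"{hours:.1f} hours, {days:.2f} days")
```

Under that linear assumption the full run comes to roughly 34 hours (about 1.4 days), which suggests the estimate is closer to 33 hours than 33 days.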
You can speed this up with the imputation package and its use of canopies. The package can be installed from GitHub:
Sys.setenv("PKG_CXXFLAGS"="-std=c++0x")
devtools::install_github("alexwhitworth/imputation")
Canopies use a cheap distance metric--in this case distance from the data mean vector--to get approximate neighbors. In general, we wish to keep the canopies each sized < 100k so for 1.8M rows, we'll use 20 canopies:
library("imputation")
to_impute <- mainData[trainingRows, imputeColumns] ## OP undefined
imputed <- kNN_impute(to_impute, k= 10, q= 2, verbose= TRUE,
parallel= TRUE, n_canopies= 20)
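To illustrate the canopy idea described above, here is a rough Python sketch that groups rows by their distance from the data mean vector, so that a neighbor search only has to look inside one group. This is not the imputation package's actual implementation: the function name `assign_canopies` and the equal-size split are assumptions, and it expects complete numeric rows with no missing values.

```python
import math


def assign_canopies(rows, n_canopies):
    """Split row indices into canopies by distance from the mean vector.

    Rows are ranked by Euclidean distance to the column means (the cheap
    metric), then cut into n_canopies groups of roughly equal size.
    """
    if not rows or n_canopies < 1:
        return []
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    dist = [math.dist(r, means) for r in rows]
    # Indices of rows, nearest to the mean first.
    order = sorted(range(len(rows)), key=dist.__getitem__)
    size = math.ceil(len(rows) / n_canopies)
    return [order[i:i + size] for i in range(0, len(order), size)]


# Tiny one-column example: the mean is 4.75.
rows = [[0.0], [10.0], [4.0], [5.0]]
print(assign_canopies(rows, n_canopies=2))  # [[3, 2], [0, 1]]
```

With 1.8M rows and 20 canopies, each group would hold about 90k rows, in line with the < 100k target mentioned above.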
NOTE:
The imputation package requires numeric data inputs. You have several factor variables in your str output. They will cause this to fail.
You'll also get some mean vector imputation if you have fully missing rows.
# note this example data is too small for canopies to be useful
# meant solely to illustrate
set.seed(2143L)
x1 <- matrix(rnorm(1000), 100, 10)
x1[sample(1:1000, size= 50, replace= FALSE)] <- NA
x_imp <- kNN_impute(x1, k=5, q=2, n_canopies= 10)
sum(is.na(x_imp[[1]])) # 0
# with fully missing rows
x2 <- x1; x2[5,] <- NA
x_imp <- kNN_impute(x2, k=5, q=2, n_canopies= 10)
[1] "Computing canopies kNN solution provided within canopies"
[1] "Canopies complete... calculating kNN."
row(s) 1 are entirely missing.
These row(s)' values will be imputed to column means.
Warning message:
In FUN(X[[i]], ...) :
Rows with entirely missing values imputed to column means.

TIBCO Spotfire - How to Calculate only the last 3 columns in a Data - see descr

Week Sales
1 100
2 250
3 350
4 145
5 987
6 26
7 32
8 156
I wanted to calculate the sales only for the last 3 weeks so the total will be 156+32+26.
If new weeks are added it should automatically calculate only the data from the last 3 rows.
Tried this formula but it is returning an incorrect sum
sum(sales) over (lastperiod(3(week))
https://i.stack.imgur.com/6Y7h7.jpg
If you want only the sum of the last 3 weeks in a calculated column, you can use a simple If expression:
If([week]>(Max([week]) - 3),Sum([sales]),0)
If you need a 3-week calculation for every row of the table, use this one instead:
sum([sales]) OVER (LastPeriods(3,[week]))
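The filtering logic behind the first expression (keep only weeks greater than the latest week minus 3, then sum) can be sketched in Python as follows. The function name `last_n_weeks_total` is hypothetical, and this only illustrates the arithmetic, not Spotfire's evaluation rules.

```python
def last_n_weeks_total(rows, n=3):
    """Sum sales for the n most recent week numbers.

    rows is a list of (week, sales) pairs; a week counts as recent
    when it is greater than the latest week minus n.
    """
    if not rows:
        return 0
    latest = max(week for week, _ in rows)
    return sum(sales for week, sales in rows if week > latest - n)


# Data from the question.
data = [(1, 100), (2, 250), (3, 350), (4, 145),
        (5, 987), (6, 26), (7, 32), (8, 156)]
print(last_n_weeks_total(data))  # 214, i.e. 156 + 32 + 26
```

Because the cutoff is derived from the latest week, adding a new week shifts the window automatically.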

Vertical white lines when plotting heatmap in TIFF

When I plot a matrix with the image function to a TIFF file, I often get vertical or horizontal lines.
My matrix has 150000 rows x 2000 columns. The lines also appear when plotting matrices of 150000 rows x 100 columns; the results are the same.
Where do the lines come from? Is this some sort of pixelation artifact? I get them almost all the time.
The matrix looks like this:
V999 V1000 V1001 V1002 V1003 V1004 V1005 V1006 V1007 V1008 V1009 V1010
[1,] 1 4 0 0 15 15 15 15 8 0 1 0
[2,] 0 3 12 5 15 15 15 1 15 4 0 2
[3,] 0 0 0 3 6 15 15 15 15 15 0 3
[4,] 3 6 15 15 15 15 15 0 3 15 15 2
[5,] 15 15 15 0 3 15 15 2 1 5 8 11
[6,] 2 1 5 8 11 15 15 15 0 0 4 3
tiff("test.tiff", width=450, height=1100)
image(t(mc), col = col1, main="950-1500")
dev.off()
Any hints/comments will be much appreciated.
You're seeing an aliasing artifact from the x11() display. You can try dragging the window to make it bigger or smaller and eventually you'll find a window size height and width that is compatible with your desired resolution.
