openblas sgemv CblasRowMajor implementation returns wrong results (cblas_sgemv) - openblas

I did some test using cblas_sgemv in openblas and found that it returned a wrong result in my test case.
A is
1 2
3 4
5 6
B is
1 2
The output C should be 5 11 17
But, it outputs 5 14 0
Here is the sample code.
https://docs.google.com/document/d/15mCkfcQuruQxi4CjvVkoK2jfgnG2w3izd0wMFMW6UOk/edit?usp=sharing

the lda parameter seems to be wrong. since the order is CblasRowMajor it should be 2 (number of columns) instead of 3 (number of rows).
cf. https://stackoverflow.com/a/30208420/6058571

Related

KDB: set difference

How can one find set (data structure) difference in KDB?
Example:
a: 1 2 3 4
b: 2 3
expected result: 1 4. Simple guesses like a-b or a _ b do not work.
Thank you very much for your help!
You're after the keyword except
q)a:1 2 3 4;b:2 3
q)a except b
1 4
Keyword you are looking for is except
except[b;a],except[a;b]
https://code.kx.com/q/ref/except/
Edit: fyi 1 except will not cover all differences if b has values not in a:
q)b,:10
q)except[a;b]
1 4
q)except[a;b],except[b;a]
1 4 10

Ranking over each matrix column's sort in julia

I have a matrix (m) of scores for 4 students on 3 different exams.
4 3 1
3 2 5
8 4 6
1 5 2
I want to know, for each student, the exams they did best to worse on. Desired output:
1 2 3
2 3 1
1 3 2
3 1 2
Now, I'm new to the language (and coding in general), so I read GeeksforGeeks' page on sorting in Julia and tried
mapslices(sortperm, -m; dims = 2)
However, this gives something subtly different: a matrix of each row being the index of the sorting.
1 2 3
3 1 2
1 3 2
2 3 1
Perhaps it was obvious, but I now realize this is not actually what I want, but I cannot find a built-in function/fast way to complete this operation. Any ideas? Preferably something which doesn't iterate through items in the matrix/row, as in reality my matrix is very, very large. Thanks!
Such functionality is provided by StatsBase.jl. Here is an example:
julia> using StatsBase
julia> m = [4 3 1
3 2 5
8 4 6
1 5 2]
4×3 Array{Int64,2}:
4 3 1
3 2 5
8 4 6
1 5 2
julia> mapslices(x -> ordinalrank(x, rev=true), m, dims = 2)
4×3 Array{Int64,2}:
1 2 3
2 3 1
1 3 2
3 1 2
You might want to use other rank, depending on how you want to split ties, see here for details.
Figured out something which works!
Run m_index_rank = mapslices(sortperm, -m; dims = 2) on the matrix and get a ranking for each row through index. Then, realizing this is, in each row, an inverse permutation away from the desired output, run mapslices(invperm, m_index_rank; dims = 2) for the desired result.
In one line, this is mapslices(r -> invperm(sortperm(r, rev=true)), m; dims=2) over the desired matrix m. dims = 2 is to carry out the operation row-wise.
I'm marking this resolved for now, but please let me know if there are cleaner/faster ways to do this.
Edit: Replaced my syntactically clunky mapslices(invperm, mapslices(sortperm, -m; dims = 2); dims = 2) with a more natural one, thanks to #phipsgabler

Subsetting multiple variables by rank within data-frame

Hopefully the below makes sense.
I have a data set a large number of variables (row). Within each variable are sections that are scored 1-15. I need to subset the dataframe based on the three highest scoring sections for each variable. Each section has additional data associated with it that would be needed, but is not required as part of the selection.
Having trouble with this. Any help is appreciated.
Dummy layout below
Variable Aux_score
1 1
1 6
1 3
1 8
1 10
2 3
2 2
2 12
2 10
2 11
3 7
3 2
3 9
3 8
3 12
You can do it like this with base r:
do.call(rbind, lapply(split(df, df$Variable), function(df) df[ tail(order(df$Aux_score), 3), ]))
Or like this with tidyverse:
df %>% group_by(Variable) %>% top_n(3, Aux_score) %>% ungroup()

How can you improve computation time when predicting KNN Imputation?

I feel like my run time is extremely slow for my data set, this is the code:
library(caret)
library(data.table)
knnImputeValues <- preProcess(mainData[trainingRows, imputeColumns], method = c("zv", "knnImpute"))
knnTransformed <- predict(knnImputeValues, mainData[ 1:1000, imputeColumns])
the PreProcess into knnImputeValues run's fairly quickly, however the predict function takes a tremendous amount of time. When I calculated it on a subset of the data this was the result:
testtime <- system.time(knnTransformed <- predict(knnImputeValues, mainData[ 1:15000, imputeColumns
testtime
user 969.78
system 38.70
elapsed 1010.72
Additionally, it should be noted that caret preprocess uses "RANN".
Now my full dataset is:
str(mainData[ , imputeColumns])
'data.frame': 1809032 obs. of 16 variables:
$ V1: int 3 5 5 4 4 4 3 4 3 3 ...
$ V2: Factor w/ 3 levels "1000000","1500000",..: 1 1 3 1 1 1 1 3 1 1 ...
$ V3: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ V4: int 2 5 5 12 4 5 11 8 7 8 ...
$ V5: int 2 0 0 2 0 0 1 3 2 8 ...
$ V6: int 648 489 489 472 472 472 497 642 696 696 ...
$ V7: Factor w/ 4 levels "","N","U","Y": 4 1 1 1 1 1 1 1 1 1 ...
$ V8: int 0 0 0 0 0 0 0 1 1 1 ...
$ V9: num 0 0 0 0 0 ...
$ V10: Factor w/ 56 levels "1","2","3","4",..: 45 19 19 19 19 19 19 46 46 46 ...
$ V11: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ V12: num 2 5 5 12 4 5 11 8 7 8 ...
$ V13: num 2 0 0 2 0 0 1 3 2 8 ...
$ V14: Factor w/ 4 levels "1","2","3","4": 2 2 2 2 2 2 2 2 3 3 ...
$ V15: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
$ V16: num 657 756 756 756 756 ...
So is there something I'm doing wrong, or is this typical for how long it will take to run this? If you back of the envelop extrapolate (which I know isn't entire accurate) you'd get what 33 days?
Also it looks like system time is very low and user time is very high, is that normal?
My computer is a laptop, with a Intel(R) Core(TM) i5-6300U CPU # 2.40Ghz processor.
Additionally would this improve the runtime of the predict function?
cl <- makeCluster(4)
registerDoParallel()
I tried it, and it didn't seem to make a difference other than all the processors looked more active in my task manager.
FOCUSED QUESTION: I'm using Caret package to do KNN Imputation on 1.8 Million Rows, the way I'm currently doing it will take over a month to run, how do I write this in such a way that I could do it in a much faster amount of time(if possible)?
Thank you for any help provided. And the answer might very well be "that's how long it takes don't bother" I just want to rule out any possible mistakes.
You can speed this up via the imputation package and use of canopies which can be installed from Github:
Sys.setenv("PKG_CXXFLAGS"="-std=c++0x")
devtools::install_github("alexwhitworth/imputation")
Canopies use a cheap distance metric--in this case distance from the data mean vector--to get approximate neighbors. In general, we wish to keep the canopies each sized < 100k so for 1.8M rows, we'll use 20 canopies:
library("imputation")
to_impute <- mainData[trainingRows, imputeColumns] ## OP undefined
imputed <- kNN_impute(to_impute, k= 10, q= 2, verbose= TRUE,
parallel= TRUE, n_canopies= 20)
NOTE:
The imputation package requires numeric data inputs. You have several factor variables in your str output. They will cause this to fail.
You'll also get some mean vector imputation if you have fulling missing rows.
# note this example data is too small for canopies to be useful
# meant solely to illustrate
set.seed(2143L)
x1 <- matrix(rnorm(1000), 100, 10)
x1[sample(1:1000, size= 50, replace= FALSE)] <- NA
x_imp <- kNN_impute(x1, k=5, q=2, n_canopies= 10)
sum(is.na(x_imp[[1]])) # 0
# with fully missing rows
x2 <- x1; x2[5,] <- NA
x_imp <- kNN_impute(x2, k=5, q=2, n_canopies= 10)
[1] "Computing canopies kNN solution provided within canopies"
[1] "Canopies complete... calculating kNN."
row(s) 1 are entirely missing.
These row(s)' values will be imputed to column means.
Warning message:
In FUN(X[[i]], ...) :
Rows with entirely missing values imputed to column means.

How can I subtract columns from a 2D array of JULIA?

I'm new to julia and I have a problem.
I am working with Julia (Jupyter notebook) and I do not know how can I do column 3 - column 2 and write the result as a new column at the end of the matrix/array2D.
I have tried this:
newCol = array[(1:end),3] - array[(1:end),2]
Any suggestion?
You can subtract the two columns and then concatenate it with the original array using the normal build-an-array syntax:
julia> arr
2x3 Array{Int32,2}:
1 2 3
5 6 7
julia> [arr [arr[:,3] - arr[:,2]]]
2x4 Array{Int32,2}:
1 2 3 1
5 6 7 1
Or use hcat:
julia> hcat(arr,arr[:,3] - arr[:,2])
2x4 Array{Int32,2}:
1 2 3 1
5 6 7 1
(Note that neither of these act in place, so you'd need to assign the result somewhere if you want to use it later.)

Resources