Clustering of Items in a UI-Matrix with package skmeans - matrix

i've got a specific question on the package "skmeans".
I considered the movielens100k dataset which involves "u.data" which is a dataset of four columns in the following order "User","Item","Rating" and "Timestamp". I've implemented the following code:
UI_ratings_raw <- scan(file="u1.base",what=list(user=0,movie=0,rating=0),flush=TRUE)
UI_ratings_sparse <- sparseMatrix(UI_ratings_raw$user,UI_ratings_raw$movie,x=UI_ratings_raw$rating,dims =c(943,1682)) #Eintrag aus Forum github siehe R Datei Matrixreduzierung
UI_ratings_sparse_dgT <- as(UI_ratings_sparse,"dgTMatrix")
install.packages("skmeans")
library(skmeans)
install.packages("cluster")
library(cluster)
UI_ratings_sparse_clust_sk <- skmeans(UI_ratings_sparse_dgT,20,control = list(verbose=TRUE))
summary(silhouette(UI_ratings_sparse_clust_sk))
Clustering performed very well but only on the users side. Is there any possibility to change the code in that way that i'm able to compute Cluster for the Items?

Related

PyTorch Lightning - How can I output a summary of training to the console

PyTorch Lighting can log to TensorBoard. How can I make it log to the console a table summarizing the training runs (similar to Huggingface's Transformers, shown below):
Epoch Training Loss Validation Loss Runtime Samples Per Second
1 1.220600 1.160322 39.574900 272.496000
2 0.945200 1.121690 39.706000 271.596000
3 0.773000 1.157358 39.734000 271.405000
You can write your own callback function and add it into the trainer.
from pytorch_lightning.utilities import rank_zero_info
class LoggingCallback(pl.Callback):
def on_validation_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule):
rank_zero_info("***** Test results *****")
metrics = trainer.callback_metrics
# Log results
for key in sorted(metrics):
if key not in ["log", "progress_bar"]:
rank_zero_info("{} = {}\n".format(key, str(metrics[key])))

Error message: Cube::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD using sommer package for GWAS

Output from sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sommer_4.1.4 crayon_1.4.1 lattice_0.20-41 MASS_7.3-53.1 Matrix_1.3-2 data.table_1.14.0
loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5 rstudioapi_0.13 Rcpp_1.0.6 grid_4.0.5
I have been trying to carry out a gwas using the sommer package with the following code:
var_cov <- A.mat(m_matrix) ## aditive relationship matrix
model <- GWAS(cbind(DW20, PLA07, PLA08, PLA09, PLA10, PLA11, PLA12, PLA13, PLA14, PLA15, PLA16, PLA17, PLA18, RGR07_09, RGR08_10, RGR09_11, RGR10_12, RGR11_13, RGR12_14, RGR13_15, RGR14_16, RGR15_17, RGR16_18, SA, SL, SW) ~ 1, random = ~ vs(accession, Gu = var_cov), data = pheno2, M = m_matrix, gTerm = "u:accession", n.PC = 5)
As described in the code, I have 26 traits and I would like to use the K+P model. My SNPs matrix has
211 260 markers and 309 accessions.
When I run this code for one and two traits, it works fine. But, when I try to run with all the 26 traits I get the error message:
Error in GWAS(cbind(DW20, PLA07, PLA08, PLA09, PLA10, PLA11, PLA12, PLA13, :
Cube::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
I searched online and found that this error is related to the package RcppArmadillo.
Following the suggestions here (http://arma.sourceforge.net/docs.html#config_hpp_arma_64bit_word) and
here (Large Matrices in RcppArmadillo via the ARMA_64BIT_WORD define), I tried to enable the ARMA_64BIT_WORD by uncommenting the line #define ARMA_64BIT_WORD (bellow) in the file RcppArmadillo\include\armadillo_bits\config.hpp:
#if !defined(ARMA_64BIT_WORD)
//#define ARMA_64BIT_WORD
//// Uncomment the above line if you require matrices/vectors capable of holding more than 4 billion elements.
//// Note that ARMA_64BIT_WORD is automatically enabled when std::size_t has 64 bits and ARMA_32BIT_WORD is not defined.
#endiff
and also including the following line in the file Makevars.win in RcppArmadillo\skeleton.
PKG_CPPFLAGS = -DARMA_64BIT_WORD=1
None of the suggestions worked and I continue getting the same error message. My questions are: is there another option to enable the ARMA_64BIT_WORD that I am missing? Is it possible to run the GWAS function in sommer package with as many traits as 26 or this number is too much? Would the error message result from a mistake in the GWAS code?
Thank you very much in advance.
My first take Ana is that you're trying to fit an unstructured multivariate model with 26 traits when you use cbind(), that means that if you have 1000 records, this will be a model of 309 x 26 = 8,034 records which would be a bit too big for the direct inversion algorithm that sommer uses, plus the number of parameters to estimate are a lot (think all the covariance parameters (26*25)/2 = 325. I would suggest fitting a GWAS per trait in a for loop to solve your issue. Unless you have a good justification to run a multivariate GWAS this is the issue with your analysis more than the C++ code behind. For example:
var_cov <- A.mat(m_matrix) ## aditive relationship matrix
traits <- c(DW20, PLA07, PLA08, PLA09, PLA10, PLA11, PLA12, PLA13, PLA14, PLA15, PLA16, PLA17, PLA18, RGR07_09, RGR08_10, RGR09_11, RGR10_12, RGR11_13, RGR12_14, RGR13_15, RGR14_16, RGR15_17, RGR16_18, SA, SL, SW)
for(itrait in traits){
model <- GWAS(as.formula(paste(itrait,"~1")), random = ~ vs(accession, Gu = var_cov), data = pheno2, M = m_matrix, gTerm = "u:accession", n.PC = 5)
}
If it turns out that even with a single trait the arma::cube function presents memory issues then definitely we need to look at why the armadillo library cannot deal with those dimensions.
Cheers,
Eduardo

I met a error warning when run regression model for a panel using plm package

I have a panel for 27 years, but met this warning when I run a regression.
panel data of global suicide rate with temperature
I use the following codes:
library(plm)
install.packages("dummies")
library(dummies)
data2 <- cbind(mydata, dummy(mydata$year, sep ="_"))
suicide_fe <- plm(suiciderate ~ dmt, data2, index = c("country", "year"),
model= "within")
summary(suicide_fe)
But I got this error:
Error in pdim.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
duplicate couples (id-time) in resulting pdata.frame to find out which, use
e.g. table(index(your_pdataframe), useNA = "ifany") 2: In
is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)

VIF values in R

I got a question: Someone have run the corvif function with the code HighstatLibV10.R available in the page http://www.highstat.com/index.php/mixed-effects-models-and-extensions-in-ecology-with-r? I can't get the VIF values because the output gives me this error:
Error in myvif(lm_mod) : object 'tmp_cor' not found!
I have 6 physical variables and I'm looking for collinearity among variables. Any help more than welcome!
If working with the corvif() is not of utmost importance you can use the vif() in the R package 'car' to get VIF values for your linear models.
So tmp_cor is an object that is supposed to be created in corvif
tmp_cor is created using the cor function (in the base stats package that comes with R install) via: tmp_cor <- cor(dataz,use="complete.obs").
However, I noticed that with both v1 and v10 of Zurr et al's HighstatLib.R code this error occurs:
Error in myvif(lm_mod) : object 'tmp_cor' not found!
First I checked V10:
It seems that the "final" version of corvif created when sourcing HighstatLibV10.R actually neglects to create tmp_cor at all!
> print(corvif)
function(dataz) {
dataz <- as.data.frame(dataz)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1 + rnorm(nrow(dataz)) ,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
But, I noticed that the error in the OP's post also occurred when using V1 (i.e., HighstatLib.R associated with Zuur et al 2010). Although the code file creates 2 versions of corvif, they (and especially the latter of the two which would supercede the first) include a line to create tmp_cor:
corvif <- function(dataz) {
dataz <- as.data.frame(dataz)
#correlation part
cat("Correlations of the variables\n\n")
tmp_cor <- cor(dataz,use="complete.obs")
print(tmp_cor)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
So even though the code for corvif creates tmp_cor in the V1 code file, it appears that the helper function myvif (which actually uses the tmp_cor object) is not accessing it.
This suggests that we have a scoping problem...
Sure enough, if I just quickly change the tmp_cor line to create a global object, the code works fine:
tmp_cor <<- cor(dataz,use="complete.obs")
Specifically:
corvif <- function(dataz) {
dataz <- as.data.frame(dataz)
#correlation part
cat("Correlations of the variables\n\n")
tmp_cor <<- cor(dataz,use="complete.obs")
print(tmp_cor)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
A more complete "fix" could be done by manipulating environments.

error in running auto.arima function

I am downloading some stock's daily close data using quantmod package:
library(quantmod)
library(dygraphs)
library(forecast)
date <- as.Date("2014-11-01")
getSymbols("SBIN.BO",from = date )
close <- SBIN.BO[, 4]
dygraph(close)
dat <- data.frame(date = index(SBIN.BO),SBIN.BO)
acf1 <- acf(close)
When I tried to execute auto arima function from forecast package:
fit <- auto.arima(close, seasonal=FALSE, xreg=fourier(close, K=4))
I encountered the following error:
Error in ...fourier(x, K, 1:length(x)) :
K must be not be greater than period/2
So I want to know why there is this error? Did I do any mistake in writing code, which based upon tutorials available on Rob's website/blogs...

Resources