I am downloading a stock's daily close data using the quantmod package:
library(quantmod)
library(dygraphs)
library(forecast)
date <- as.Date("2014-11-01")
getSymbols("SBIN.BO",from = date )
close <- SBIN.BO[, 4]
dygraph(close)
dat <- data.frame(date = index(SBIN.BO),SBIN.BO)
acf1 <- acf(close)
When I tried to execute the auto.arima function from the forecast package:
fit <- auto.arima(close, seasonal=FALSE, xreg=fourier(close, K=4))
I encountered the following error:
Error in ...fourier(x, K, 1:length(x)) :
K must be not be greater than period/2
So I want to know why this error occurs. Did I make a mistake in writing the code, which is based on the tutorials available on Rob's website/blogs?
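For reference, fourier() derives the seasonal period from frequency(x), and a daily xts series like close typically has no seasonal frequency set (effectively 1), so any K fails the K <= period/2 check. A minimal sketch of a call that satisfies the check, assuming a purely illustrative 5-observation (trading-week) seasonality:
library(quantmod)
library(forecast)
getSymbols("SBIN.BO", from = as.Date("2014-11-01"))
# convert the xts closes to a ts with an assumed period of 5 trading days
close_ts <- ts(as.numeric(Cl(SBIN.BO)), frequency = 5)
# with frequency 5, K can be at most floor(5/2) = 2
fit <- auto.arima(close_ts, seasonal = FALSE, xreg = fourier(close_ts, K = 2))
fc <- forecast(fit, xreg = fourier(close_ts, K = 2, h = 20))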
I am running the code below in RStudio:
datas <- BNDataset("C:/Users/.../2.csv", "C:/Users/.../3.csv")
datase <- datas()
nets <- learn.dynamic.network(datase,num.time.steps = 2)
plot(nets)
but it gives the error: Error in plot(net) : object 'net' not found. Why does this happen?
s <- cbind(c(0.017979678, 0.011345375, 0.014793026, 0.010626496, 0.01597338, 0.012467991, 0.01597338, 0.012725869, 0.011443908, 0.011985384),
           c(0.018303076, 0.011264264, 0.015559947, 0.01080083, 0.016515615, 0.012609419, 0.016515615, 0.013153442, 0.011887617, 0.012681979))
It's difficult to say whether this is the problem, but with
datas <- BNDataset("C:/Users/.../2.csv", "C:/Users/.../3.csv")
you are creating a BNDataset object called datas, but with
datase <- datas()
you are calling a function named datas (which does not exist in bnstruct).
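A minimal sketch of what was probably intended, keeping the OP's placeholder paths: pass the BNDataset object itself to learn.dynamic.network() and plot the object that was actually created (nets, not net).
library(bnstruct)
# BNDataset(data file, header file) returns a BNDataset object
datas <- BNDataset("C:/Users/.../2.csv", "C:/Users/.../3.csv")
# use the object directly; do not call it as if it were a function
nets <- learn.dynamic.network(datas, num.time.steps = 2)
plot(nets)  # plot the object that exists, 'nets'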
I have a panel covering 27 years, but I ran into this error when running a regression.
The panel data are global suicide rates with temperature.
I use the following code:
library(plm)
install.packages("dummies")
library(dummies)
data2 <- cbind(mydata, dummy(mydata$year, sep ="_"))
suicide_fe <- plm(suiciderate ~ dmt, data2, index = c("country", "year"),
model= "within")
summary(suicide_fe)
But I got this error:
Error in pdim.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
  to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)
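The warning itself points at the diagnosis: some country-year pairs appear more than once, so plm cannot build a valid panel index. A minimal diagnostic sketch, assuming data2 contains the country and year columns used above:
# count observations per (country, year) pair; anything above 1 is a duplicate
dup_tab <- table(data2$country, data2$year)
which(dup_tab > 1, arr.ind = TRUE)
# or flag the duplicated rows directly
data2[duplicated(data2[, c("country", "year")]), c("country", "year")]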
I have a question: has anyone run the corvif function with the code HighstatLibV10.R available on the page http://www.highstat.com/index.php/mixed-effects-models-and-extensions-in-ecology-with-r? I can't get the VIF values because the output gives me this error:
Error in myvif(lm_mod) : object 'tmp_cor' not found!
I have 6 physical variables and I'm looking for collinearity among the variables. Any help is more than welcome!
If working with corvif() is not of utmost importance, you can use vif() from the R package 'car' to get VIF values for your linear models.
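For example, a minimal sketch using car::vif() on an ordinary lm fit; y, x1-x3 and mydata are hypothetical stand-ins for the response and the six physical variables:
install.packages("car")
library(car)
# fit a linear model on the candidate predictors (hypothetical names)
mod <- lm(y ~ x1 + x2 + x3, data = mydata)
vif(mod)  # one VIF per predictor; values well above ~5-10 usually indicate collinearity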
So tmp_cor is an object that is supposed to be created inside corvif.
tmp_cor is created using the cor function (in the base stats package that comes with the R installation) via: tmp_cor <- cor(dataz, use = "complete.obs").
However, I noticed that with both V1 and V10 of Zuur et al.'s HighstatLib.R code this error occurs:
Error in myvif(lm_mod) : object 'tmp_cor' not found!
First I checked V10:
It seems that the "final" version of corvif created when sourcing HighstatLibV10.R actually neglects to create tmp_cor at all!
> print(corvif)
function(dataz) {
dataz <- as.data.frame(dataz)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1 + rnorm(nrow(dataz)) ,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
But I noticed that the error in the OP's post also occurred when using V1 (i.e., HighstatLib.R associated with Zuur et al. 2010). Although the code file creates two versions of corvif, both (and especially the latter of the two, which would supersede the first) include a line to create tmp_cor:
corvif <- function(dataz) {
dataz <- as.data.frame(dataz)
#correlation part
cat("Correlations of the variables\n\n")
tmp_cor <- cor(dataz,use="complete.obs")
print(tmp_cor)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
So even though the code for corvif creates tmp_cor in the V1 code file, it appears that the helper function myvif (which actually uses the tmp_cor object) is not accessing it.
This suggests that we have a scoping problem...
Sure enough, if I just quickly change the tmp_cor line to create a global object, the code works fine:
tmp_cor <<- cor(dataz,use="complete.obs")
Specifically:
corvif <- function(dataz) {
dataz <- as.data.frame(dataz)
#correlation part
cat("Correlations of the variables\n\n")
tmp_cor <<- cor(dataz,use="complete.obs")
print(tmp_cor)
#vif part
form <- formula(paste("fooy ~ ",paste(strsplit(names(dataz)," "),collapse=" + ")))
dataz <- data.frame(fooy=1,dataz)
lm_mod <- lm(form,dataz)
cat("\n\nVariance inflation factors\n\n")
print(myvif(lm_mod))
}
A more complete "fix" could be done by manipulating environments.
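For instance, one hedged way to avoid the global assignment is to run a local copy of myvif() with corvif's own environment (where tmp_cor lives) as its enclosing environment; this assumes myvif() has already been sourced from HighstatLib:
corvif <- function(dataz) {
  dataz <- as.data.frame(dataz)
  #correlation part
  cat("Correlations of the variables\n\n")
  tmp_cor <- cor(dataz, use = "complete.obs")
  print(tmp_cor)
  #vif part
  form <- formula(paste("fooy ~ ", paste(strsplit(names(dataz), " "), collapse = " + ")))
  dataz <- data.frame(fooy = 1, dataz)
  lm_mod <- lm(form, dataz)
  cat("\n\nVariance inflation factors\n\n")
  # give a local copy of myvif access to this function's environment, where
  # tmp_cor is defined; other objects still resolve via the parent environment
  myvif_local <- myvif
  environment(myvif_local) <- environment()
  print(myvif_local(lm_mod))
}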
I'm getting an error while writing a Spark DataFrame to CSV and Parquet. I have already tried installing winutils, but that still does not solve the error.
My code:
INVALID_IMEI <- c("012345678901230","000000000000000")
setwd("D:/Revas/Jatim Old")
fileList <- list.files()
cdrSchema <- structType(structField("date","string"),
structField("time","string"),
structField("a_number","string"),
structField("b_number", "string"),
structField("duration","integer"),
structField("lac_cid","string"),
structField("imei","string"))
file <- fileList[1]
filePath <- paste0("D:/Revas/Jatim Old/",file)
dataset <- read.df(filePath, header="false",source="csv",delimiter="|",schema=cdrSchema)
dataset <- filter(dataset, ifelse(dataset$imei %in% INVALID_IMEI,FALSE,TRUE))
dataset <- filter(dataset, ifelse(isnan(dataset$imei),FALSE,TRUE))
dataset <- filter(dataset, ifelse(isNull(dataset$imei),FALSE,TRUE))
To export the DataFrame, I tried the following code:
write.df(dataset, "D:/spark/dataset",mode="overwrite")
write.parquet(dataset, "D:/spark/dataset",mode="overwrite")
And I get the following error:
Error: Error in save : org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.comma
I have already found the likely cause. The issue seems to lie in the winutils version; previously I was using 2.6, and changing it to 2.8 seems to solve the issue.
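In case it helps others, a hedged sketch of how winutils is usually wired up before starting SparkR on Windows; the paths below are placeholders, and the key point is that winutils.exe under HADOOP_HOME/bin must match the Hadoop build Spark expects:
# hypothetical paths: winutils.exe for Hadoop 2.8.x placed in C:/hadoop/bin
Sys.setenv(HADOOP_HOME = "C:/hadoop")
Sys.setenv(PATH = paste0(Sys.getenv("PATH"), ";C:/hadoop/bin"))
library(SparkR)
sparkR.session()
write.df(dataset, "D:/spark/dataset", source = "csv", mode = "overwrite")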
I've got a specific question about the package "skmeans".
I considered the MovieLens 100k dataset, which includes "u.data", a dataset of four columns in the following order: "User", "Item", "Rating" and "Timestamp". I've implemented the following code:
library(Matrix)  # provides sparseMatrix() and the dgTMatrix class
UI_ratings_raw <- scan(file = "u1.base", what = list(user = 0, movie = 0, rating = 0), flush = TRUE)
UI_ratings_sparse <- sparseMatrix(UI_ratings_raw$user, UI_ratings_raw$movie, x = UI_ratings_raw$rating, dims = c(943, 1682)) # entry from a GitHub forum post, see R file Matrixreduzierung
UI_ratings_sparse_dgT <- as(UI_ratings_sparse, "dgTMatrix")
install.packages("skmeans")
library(skmeans)
install.packages("cluster")
library(cluster)
UI_ratings_sparse_clust_sk <- skmeans(UI_ratings_sparse_dgT,20,control = list(verbose=TRUE))
summary(silhouette(UI_ratings_sparse_clust_sk))
Clustering performed very well, but only on the user side. Is there any way to change the code so that I can compute clusters for the items as well?
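A hedged sketch of one possibility: skmeans() clusters the rows of its input, so clustering items instead of users can be done by transposing the user-item matrix first (keeping the same 20-cluster setup; note that items with no ratings, i.e. all-zero rows after transposing, may need to be dropped first):
# transpose so that items (movies) become the rows to be clustered
UI_items_sparse_dgT <- as(t(UI_ratings_sparse_dgT), "dgTMatrix")
UI_items_clust_sk <- skmeans(UI_items_sparse_dgT, 20, control = list(verbose = TRUE))
summary(silhouette(UI_items_clust_sk))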