Performing a calculation on several data frames with a for loop

I have a group of data frames, and I want to write a for loop that performs a calculation on all of them without my having to manually enter the name of each data frame.
Example:
df1
df2
df3
#first I try to create a list of the dataframe names to iterate through
dflist <- list(c(df1, df2, df3))
Then I attempt to iterate through it and perform the calculation. A simplified version:
for (i in 1:length(dflist)) {
x <- dflist[i]$columnone[1] %>%
y <- dflist[i]$columntwo[1] %>%
z <- mean(dflist[i]$columnthree) %>%
paste0("result_",i) <- x-y/z
}
I keep being told that z cannot be found.
What am I doing wrong?
(The paste0 line at the end is meant to store the result for each data frame as its own new variable, but it is not the focus of the question.)
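The most likely cause of the "z not found" error is the trailing %>% on each line: it pipes each line into the next, so the four lines are parsed as one expression and x, y and z are never assigned as separate statements. Two further problems are likely: list(c(df1, df2, df3)) flattens the data frames' columns into a single list element instead of holding three data frames, and dflist[i] returns a sub-list, so $columnone on it is NULL, whereas dflist[[i]] extracts the data frame itself. A minimal sketch of one way to write the loop, assuming made-up values for df1, df2 and df3 (only the column names come from the question) and collecting the results in one named vector instead of separate result_ variables:

```r
# Hypothetical stand-ins for the asker's df1, df2, df3 (values made up)
df1 <- data.frame(columnone = c(10, 1), columntwo = c(4, 2), columnthree = c(1, 3))
df2 <- data.frame(columnone = c(8, 1),  columntwo = c(2, 2), columnthree = c(2, 4))
df3 <- data.frame(columnone = c(5, 1),  columntwo = c(1, 2), columnthree = c(4, 4))

# list() of the data frames themselves; c() would flatten their columns
dflist <- list(df1 = df1, df2 = df2, df3 = df3)

# one slot per data frame, named result_1, result_2, ...
results <- setNames(numeric(length(dflist)), paste0("result_", seq_along(dflist)))

for (i in seq_along(dflist)) {
  d <- dflist[[i]]               # [[ ]] returns the data frame; [ ] returns a sub-list
  x <- d$columnone[1]            # plain assignments, no trailing %>%
  y <- d$columntwo[1]
  z <- mean(d$columnthree)
  results[i] <- x - y / z        # / binds tighter than -: this is x - (y / z)
}

results
```

If (x - y) / z was the intended formula, add the parentheses explicitly. The same calculation also fits in one call, sapply(dflist, function(d) d$columnone[1] - d$columntwo[1] / mean(d$columnthree)), which returns a vector named df1, df2, df3.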

Related

Tidyeval create new variable from a paste statement

I want a custom function to take a number, paste a prefix in front of it to build a variable name, and then do operations with the variable of that name, which already exists in the data (it is not a newly created variable).
This is a contrived hypothetical example, but in the data I'm working with I have to do some recoding where variable names include numbers that follow sequential patterns.
mtcars_revid <- mtcars %>% mutate(blah1234=drat)
test_func <- function(data,initial_var_num,var_name) {
main_var <- paste0("blah",initial_var_num)
data %>%
mutate({{var_name}}:=main_var,
{{var_name}}:=ifelse({{var_name}}<999998,{{var_name}},NA))
}
mtcars_revid %>%
test_func(1234,new_variable_name) %>%
summarize(test_var_mean=mean(new_variable_name),
correct_mean=mean(blah1234))
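In the function above, main_var holds only the character string "blah1234", so {{var_name}} := main_var fills the new column with that string rather than with the values of the blah1234 column; the existing column has to be looked up by its name. A base-R sketch of that lookup, where test_func_base is a hypothetical name and var_name is passed as a string rather than a bare name (inside dplyr's mutate(), the usual counterpart of this lookup is .data[[main_var]]):

```r
# Hypothetical base-R version of test_func: var_name is a string here
test_func_base <- function(data, initial_var_num, var_name) {
  main_var <- paste0("blah", initial_var_num)  # e.g. "blah1234"
  values <- data[[main_var]]                   # look the existing column up by name
  data[[var_name]] <- ifelse(values < 999998, values, NA)
  data
}

mtcars_revid <- mtcars
mtcars_revid$blah1234 <- mtcars_revid$drat

out <- test_func_base(mtcars_revid, 1234, "new_variable_name")
c(test_var_mean = mean(out$new_variable_name),
  correct_mean  = mean(out$blah1234))
```

Both means should agree, since every drat value is below the 999998 cutoff.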

Convert each multilinestring to one linestring only

In this shapefile, the geometry column is LINESTRING except for 4 stream reaches (8168547, 8171738, 8170616, 8169920), which are MULTILINESTRING.
I need to convert each multilinestring to exactly one linestring.
I have tried many things, but none worked. For example, I tried st_cast() from the sf package in R. However, it increased the number of rows (it converts each multilinestring into several linestrings).
How can I convert each multilinestring to exactly one linestring?
In geopandas, this can be done with explode:
import geopandas as gpd
gdf = gpd.read_file(filepath)
exploded = gdf.explode()
The {sf} way of converting multilinestrings to linestrings would be, as you mention, via sf::st_cast().
But there is a problem with your data: some of the streams cannot be made into simple linestrings. A linestring must have a single start point and a single end point, which is simply not possible for some of your rchids. As a result, some of your objects end up duplicated.
As this is a general limitation, not an R-specific one, I would expect the same to hold for geopandas, although I have not run the code to verify.
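The point about a single start and end can be phrased as a graph condition: treat part endpoints as nodes and parts as edges; the parts can be joined end to end into one linestring (each part used once, possibly reversed) only if they are connected and at most two nodes have odd degree. A toy base-R sketch of that check, assuming each part is a plain two-column coordinate matrix and that touching parts share bit-identical endpoints; can_merge_to_one_line and seg are hypothetical helpers, not sf functions. In sf itself, sf::st_line_merge() is the function that merges connected parts of a MULTILINESTRING; reaches for which no such single trail exists cannot come out of it as one LINESTRING.

```r
# Hypothetical helper: a straight segment as a 2-row coordinate matrix (x, y)
seg <- function(x0, y0, x1, y1) matrix(c(x0, x1, y0, y1), ncol = 2)

# Can these parts be chained into exactly one line, using each part once?
can_merge_to_one_line <- function(parts) {
  key <- function(xy) paste(xy[1], xy[2])  # endpoint coordinates -> node id
  starts <- vapply(parts, function(p) key(p[1, ]), character(1))
  ends   <- vapply(parts, function(p) key(p[nrow(p), ]), character(1))

  # a single trail needs 0 (closed ring) or 2 (open line) odd-degree nodes
  nodes  <- unique(c(starts, ends))
  degree <- table(factor(c(starts, ends), levels = nodes))
  n_odd  <- sum(degree %% 2 == 1)

  # connectivity: grow the set of parts reachable from part 1 via shared endpoints
  reached <- 1L
  repeat {
    touching <- c(starts[reached], ends[reached])
    grown <- which(starts %in% touching | ends %in% touching)
    if (all(grown %in% reached)) break
    reached <- union(reached, grown)
  }

  length(reached) == length(parts) && n_odd %in% c(0, 2)
}

chain  <- list(seg(0, 0, 1, 0), seg(1, 0, 2, 0))                   # end to end
branch <- list(seg(0, 0, 1, 0), seg(1, 0, 2, 0), seg(1, 0, 1, 1))  # Y-shaped
gap    <- list(seg(0, 0, 1, 0), seg(5, 5, 6, 5))                   # disconnected

can_merge_to_one_line(chain)
can_merge_to_one_line(branch)
can_merge_to_one_line(gap)
```

The branching and disconnected cases are exactly the kind of reach that st_cast("LINESTRING") splits into several rows.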
I suggest first casting your object to linestrings, then identifying the duplicates and filtering them out.
library(sf)
library(dplyr)
streams <- st_read("tukituki_rivStrah3.shp") %>%
select(-length) %>% # dropping length, as it is a derived metric
st_cast("LINESTRING")
duplicities <- streams %>%
st_drop_geometry() %>%
group_by(rchid) %>%
tally %>%
filter(n > 1) %>%
pull(rchid)
# inspect one of the duplicated reaches; it cannot be represented as a single linestring
mapview::mapview(streams[streams$rchid == duplicities[2],])
clean_streams <- streams %>%
filter(!rchid %in% duplicities)
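The tally-and-filter step itself does not depend on sf or dplyr. A base-R sketch on a plain attribute table, where streams_tbl and its values are hypothetical stand-ins for the cast streams object with its geometry dropped:

```r
# Hypothetical attribute table after casting: one row per linestring
streams_tbl <- data.frame(rchid = c(1, 2, 2, 3, 4, 4, 4),
                          part  = c(1, 1, 2, 1, 1, 2, 3))

# rchids occurring more than once, i.e. reaches split into several linestrings
counts <- table(streams_tbl$rchid)
duplicities <- as.numeric(names(counts)[counts > 1])

# keep only the reaches that became exactly one linestring
clean_tbl <- streams_tbl[!streams_tbl$rchid %in% duplicities, ]
clean_tbl
```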

Error: requires numeric/complex matrix/vector arguments for %*%; cross validating glmmTMB model

I am adapting some k-fold cross-validation code written for glmer/merMod models to a glmmTMB model framework. All seems well until I try to use the output from the model(s) fit with training data to predict and exponentiate values into a matrix (to then break into quantiles/number of bins to assess predictive performance). I can get this line to work using glmer models, but when I run the same model using glmmTMB I get Error in model.matrix: requires numeric/complex matrix/vector arguments. There are many other posts discussing this error, and I have tried converting the data frame into matrix form and changing the class of the covariates, with no luck. Running the parts before and after the %*% separately works, but when combined I get the error. For context, this code is intended to be run with use/availability data, so the example variables may not make sense, but the example shows the problem well enough. Any suggestions as to what is going on?
library(lme4)
library(glmmTMB)
# Example with mtcars dataset
data(mtcars)
# Model both with glmmTMB and lme4
m1 <- glmmTMB(am ~ mpg + wt + (1|carb), family = poisson, data=mtcars)
m2 <- glmer(am ~ mpg + wt + (1|carb), family = poisson, data=mtcars)
#--- K-fold code (hashed out sections are original glmer version of code where different)---
# define variables
k <- 5
mod <- m1 #m2
dt <- model.frame(mod) #data used
reg.list <- list() # initialize object to store all models used for cross validation
# finds the name of the response variable in the model dataframe
resp <- as.character(attr(terms(mod), "variables"))[attr(terms(mod), "response") + 1]
# define column called sets and populates it with character "train"
dt$sets <- "train"
# randomly selects a proportion of the "used"/am records (i.e. am = 1) for testing data
dt$sets[sample(which(dt[, resp] == 1), sum(dt[, resp] == 1)/k)] <- "test"
# updates the original model using only the subset of "trained" data
reg <- glmmTMB(formula(mod), data = subset(dt, sets == "train"), family=poisson,
control = glmmTMBControl(optimizer = optim, optArgs=list(method="BFGS")))
#reg <- glmer(formula(mod), data = subset(dt, sets == "train"), family=poisson,
# control = glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))
i <- 1 # fold index; in the full k-fold code this runs inside a loop over the folds
reg.list[[i]] <- reg # store models
# uses new model created with training data (i.e. reg) to predict and exponentiate values
predall <- exp(as.numeric(model.matrix(terms(reg), dt) %*% glmmTMB::fixef(reg)))
#predall <- exp(as.numeric(model.matrix(terms(reg), dt) %*% lme4::fixef(reg)))
Without looking at the code too carefully: glmmTMB::fixef(reg) returns a list with elements cond (conditional model parameters), zi (zero-inflation parameters) and disp (dispersion parameters), rather than a vector.
If you replace this bit with glmmTMB::fixef(reg)[["cond"]] it will probably work.
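To see why extracting [["cond"]] matters, here is a self-contained sketch that does not call glmmTMB at all: fe is a hand-made list shaped like the cond/zi/disp structure described above, with made-up coefficient values. Multiplying the design matrix by the list fails, while multiplying by the extracted numeric vector works:

```r
# Hypothetical stand-in for glmmTMB::fixef(reg); coefficient values are made up
fe <- list(cond = c("(Intercept)" = -2, mpg = 0.1, wt = -0.5),
           zi = numeric(0), disp = numeric(0))

X <- model.matrix(~ mpg + wt, data = mtcars)  # 32 x 3 design matrix

# %*% on the list itself errors, as in the question
failed <- tryCatch({ X %*% fe; FALSE }, error = function(e) TRUE)

# extract the numeric vector (ordered to match the matrix columns) first
eta <- as.numeric(X %*% fe[["cond"]][colnames(X)])
predall <- exp(eta)
head(predall)
```

Indexing by colnames(X) is a defensive choice: it guarantees the coefficients line up with the design-matrix columns even if their order differs.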

Quantstrat applyStrategy incorrect dimensions trying to work with manual mktdata OHLCV data vs getSymbols

I apologize for not having a working example at the moment.
All I really need is a sample format for loading multiple symbols from a CSV.
The documentation for applyStrategy says:
https://www.rdocumentation.org/packages/quantstrat/versions/0.16.7/topics/applyStrategy
mktdata
"an xts object containing market data. depending on indicators, may need to be in OHLCV or BBO formats, default NULL"
The reason I don't wish to use getSymbols is that I do some preprocessing and load the data from CSVs, because my internet connection is unreliable. I do download data, but only about once a week. My preprocessing produces different symbols from a subset of 400 symbols, based on the time periods I scan. I'm trying to frontload all my download processing, and no matter what I try, I can't get applyStrategy to load the data from either a data frame or an xts object. Right now I'm converting from CSV to data frame to xts and attempting to load that.
I have noticed that my xts objects differ from those returned by getSymbols (see the error about incorrect dimensions below). Specifically, if I call colnames(), mine return none, whereas the getSymbols objects list 6 columns.
What I would like to see is a minimal example of loading custom OHLCV data from a CSV into an xts object that can be supplied as mktdata = in the applyStrategy call, so that I can format my code to match.
This is my code to load the data and create the xts objects from a data frame:
#loads from a dataframe which includes Symbol, Date, Open, High, Low, Close, Volume, Adjusted
tempData <- symbol_data_set[symbol_data_set$Symbol %in% symbolstring & symbol_data_set$Date >= startDate & symbol_data_set$Date<=endDate,]
#creates a list of xts
vectorXTS <- mclapply(symbolstring,function(x)
{
df <- symbol_data_set[symbol_data_set$Symbol==x & symbol_data_set$Date >= startDate & symbol_data_set$Date<=endDate,]
#temp <- as.xts(
temp <- cbind(as.data.frame(df[,2]),as.data.frame(df[,-1:-2]))
rownames(df) <- df$Date
#,order.by=as.POSIXct(df$Date),)
z <- read.zoo(temp, index = 1, col.names=TRUE, header = TRUE)
#sets names to Symbol.Open ...
colnames(z) <- c(paste0(symbolstring[x],".Open"),paste0(symbolstring[x],".High"),paste0(symbolstring[x],".Low"),paste0(symbolstring[x],".Close"),paste0(symbolstring[x],".Volume"),paste0(symbolstring[x],".Adjusted"))
return(as.xts(z, match.to=AAPL))
#colnames(as.xts(z))
})
names(symbolstring) <- symbolstring
names(vectorXTS) <- symbolstring
for(i in symbolstring) assign(symbolstring[i],vectorXTS[i])
colnames(tempData) <- c(paste0(x,".Symbol"),paste0(x,".Date"),paste0(x,".Open"),paste0(x,".High"),paste0(x,".Low"),paste0(x,".Close"),paste0(x,".Volume"),paste0(x,".Adjusted"))
head(tempData)
rownames(tempData) <- tempData$Date
#attempts to use this xts object I created
results <- applyStrategy(strategy= strategyName, portfolios = portfolioName,symbols=symbolstring,mktdata)
The error:
Error in mktdata[, keep] : incorrect number of dimensions
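This is how you store an xts object from getSymbols in a file and reload it for use with quantstrat's applyStrategy (two methods are shown; the write.zoo/read.zoo method is ideal, as you can see how the CSVs are stored):
library(quantmod) # getSymbols(); also attaches xts and zoo
# startDate and endDate are assumed to be defined already
getSymbols("AAPL",from=startDate,to=endDate,adjust=TRUE,src='yahoo',auto.assign = TRUE)
# method 1: serialize the xts object as-is
saveRDS(AAPL, file= 'stuff.rds')
AAPL <- readRDS(file= 'stuff.rds')
# method 2: write a comma-separated file and read it back
write.zoo(AAPL,file="zoo.csv", index.name = "Date", row.names=FALSE, sep=",")
rm(AAPL)
AAPL <- as.xts(read.zoo(file="zoo.csv",header = TRUE, sep=","))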
If you want to work with multiple symbols, this worked for me.
Note: initially I referenced only the first element, i.e. vectorXTS[[1]], and that worked.
Setting it up like this at least got it to run:
library(parallel) # mclapply()
library(xts)      # also attaches zoo (read.zoo, write.zoo)
vectorXTS <- mclapply(symbolstring,function(x)
{
# rows for this symbol; columns are Symbol, Date, Open, High, Low, Close, Volume, Adjusted
df <- symbol_data_set[symbol_data_set$Symbol==x & symbol_data_set$Date >= startDate & symbol_data_set$Date<=endDate,]
temp <- df[, -1] # drop Symbol so that Date is the first column (the index)
z <- read.zoo(temp, index = 1)
colnames(z) <- paste0(x, c(".Open",".High",".Low",".Close",".Volume",".Adjusted"))
write.zoo(z,file=paste0(x,"zoo.csv"), index.name = "Date", row.names=FALSE, sep=",")
return(as.xts(read.zoo(file=paste0(x,"zoo.csv"),header = TRUE, sep=",")))
})
names(vectorXTS) <- symbolstring
# assign each xts object to its symbol name so applyStrategy can look it up by symbol
for(i in symbolstring) assign(i,vectorXTS[[i]])
results <- applyStrategy(strategy= strategyName, portfolios = portfolioName,symbols=symbolstring)
# with a single symbol, an xts object can also be passed directly, e.g. mktdata = vectorXTS[[1]]
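For a long table like the asker's, the column layout these answers rely on (a Date index plus Symbol.Open, Symbol.High, ... columns) can be produced and written to a true comma-separated file with base R alone. A sketch assuming symbol_data_set has the columns listed in the question; the two symbols and all prices below are made up:

```r
# Hypothetical long-format input mirroring the question's symbol_data_set
symbol_data_set <- data.frame(
  Symbol   = rep(c("AAA", "BBB"), each = 3),
  Date     = rep(as.Date("2020-01-01") + 0:2, times = 2),
  Open     = c(10, 11, 12, 20, 21, 22),
  High     = c(11, 12, 13, 21, 22, 23),
  Low      = c(9, 10, 11, 19, 20, 21),
  Close    = c(10.5, 11.5, 12.5, 20.5, 21.5, 22.5),
  Volume   = c(100, 110, 120, 200, 210, 220),
  Adjusted = c(10.5, 11.5, 12.5, 20.5, 21.5, 22.5)
)

# one table per symbol: drop Symbol, sort by date, prefix the price columns
per_symbol <- lapply(split(symbol_data_set, symbol_data_set$Symbol), function(df) {
  sym <- df$Symbol[1]
  out <- df[order(df$Date), -1]                       # Date becomes the first column
  names(out)[-1] <- paste0(sym, ".", names(out)[-1])  # e.g. AAA.Open
  rownames(out) <- NULL
  out
})

# round-trip one symbol through a comma-separated file
path <- tempfile(fileext = ".csv")
write.csv(per_symbol[["AAA"]], path, row.names = FALSE)
back <- read.csv(path)
back$Date <- as.Date(back$Date)
str(back)
```

With xts loaded, xts::xts(back[, -1], order.by = back$Date) should then give an xts object with the same column names.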

How to save Julia for loop returns in an array or dataframe?

I am trying to apply a function over each row of a DataFrame as the code shows.
using RDatasets, DataFrames, Statistics
iris = dataset("datasets", "iris")
function mean_n_var(x)
mean1=mean([x[1], x[2], x[3], x[4]])
var1=var([x[1], x[2], x[3], x[4]])
rst=[mean1, var1]
return rst
end
mean_n_var([2,4,5,6])
for row in eachrow(iris[:, 1:4])
println(mean_n_var(collect(row)))
end
However, instead of printing results, I'd like to save them in an array or another DataFrame.
Thanks in advance.
I thought it was worth mentioning some more options in addition to what has already been mentioned.
I assume you want a Matrix or a DataFrame. There are several possible approaches.
The first is the most direct way to get a Matrix:
using DataFrames, Statistics
mean_n_var(a) = [mean(a), var(a)]
hcat((mean_n_var(collect(x)) for x in eachrow(iris[:, 1:4]))...)               # 2×150: statistics in rows
vcat((permutedims(mean_n_var(collect(x))) for x in eachrow(iris[:, 1:4]))...)  # 150×2: statistics in columns
Another possible approach is vectorized, e.g.:
mat_iris = Matrix(iris[:, 1:4])
mat = hcat(mean(mat_iris, dims=2), var(mat_iris, dims=2))  # 150×2 Matrix
df = DataFrame([vec(f(mat_iris, dims=2)) for f in [mean, var]], [:mean, :var])
DataFrame(mat, [:mean, :var])  # the Matrix constructor also accepts column names
