I'm trying to create 5 new columns in my data frame by calculating the row-wise mean of specific existing columns.
So far I have tried the following code:
```r
data_new2 <- data1 %>%
data1$chronisch <- apply(data1[,c(22,28,31,25)],1,mean) %>%
data1$Sozial <- apply(data1[,33],1,mean) %>%
data1$überforderung <- apply(data1[,c(30,24,27)],1,mean) %>%
data1$Anerkennung <- apply(data1[,c(26,29)],1,mean) %>%
data1$Arbeit <- apply(data1[,c(23,32)],1,mean)
```
But it gives the error message:

```
Error in apply(data1[, c(26, 29)], 1, mean) %>% data1$Arbeit <- apply(data1[, : could not find function "%>%<-"
```
I cannot figure out what the problem is. And yes, dplyr is installed and loaded.
Would appreciate any help!
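For what it's worth, the error comes from mixing the pipe with assignments: each `%>%` feeds its left-hand side into the next expression, so R ends up parsing `%>% ... <-` as a call to a function literally named `"%>%<-"`, which does not exist. A minimal sketch of one way to write this with `dplyr::mutate()` and `rowMeans()` instead, assuming the column positions from the question are the intended ones:

```r
library(dplyr)

# sketch, not a tested drop-in: one mutate() call adds all five columns;
# the numeric indices are taken verbatim from the question
data_new2 <- data1 %>%
  mutate(
    chronisch     = rowMeans(across(c(22, 28, 31, 25))),
    Sozial        = rowMeans(across(33)),  # single column: its "mean" is the column itself
    überforderung = rowMeans(across(c(30, 24, 27))),
    Anerkennung   = rowMeans(across(c(26, 29))),
    Arbeit        = rowMeans(across(c(23, 32)))
  )
```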
I have a strange error and actually don't know how to solve it, even after checking other posts. Everything runs until the kriging, and then I receive the error:

```
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘krige’ for signature ‘"formula", "tbl_df"’
```

The strange thing is that everything worked a few days ago; I did not change anything in the code, and now it doesn't run anymore. Some other posts linked the problem to the raster package, but I could not find any discrepancies. Is it something caused by recent updates? I use, for example, the sp package.
Unfortunately I cannot provide the data I use; hopefully it can be solved without it.
How can I solve the issue? Thank you in advance for the help.
homeDir = "D:/Folder/DataXYyear/"
y = 1992
Source = paste("Year", y, ".csv")
File = file.path(homeDir,Source)
GWMeas <- read_csv(File)
GWMeasX <- na.omit(GWMeas)
ggplot(
data = GWMeasX,
mapping = aes(x = X, y = Y, color = level)
) +
geom_point(size = 3) +
scale_color_viridis(option = "B") +
theme_classic()
GWMX_sf <- st_as_sf(GWMeasX, coords = c("X", "Y"), crs = 25832) %>%
cbind(st_coordinates(.))
v_emp_OK <- gstat::variogram(
level~1,
as(GWMX_sf, "Spatial") # switch from {sf} to {sp}
)
v_mod_OK <- automap::autofitVariogram(level~1, as(GWMX_sf, "Spatial"), model = "Sph")$var_model
GWMeasX %>% as.data.frame %>% glimpse
GW.vgm <- variogram(level~1, locations = ~X+Y, data = GWMeasX) # calculates sample variogram values
GW.fit <- fit.variogram(GW.vgm, model=vgm(model = "Gau")) # fit model
sf_GWlevel <- st_as_sf(GWMeasX, coords = c("X", "Y"), crs = 25833)
grd_sf <- sf_GWlevel %>%
st_bbox() %>%
st_as_sfc() %>%
st_make_grid(
cellsize = c(5000, 5000), # 5000m pixel size
what = "centers"
) %>%
st_as_sf() %>%
cbind(., st_coordinates(.))
grid <- as(grd_sf, "Spatial")
gridded(grid) <- TRUE
grid <- as(grid, "SpatialPixels")
createGrid <- function(XY.Spacing)
crs(grid) <- crs(GWMX_sf)
OK3 <- krige(formula = level~1, # variable to interpolate
data = GWMX_sf, # gauge data
newdata = grid, # grid to interpolate on
model = v_mod_OK, # variogram model to use
nmin = 4, # minimum number of points to use for the interpolation
nmax = 20, # maximum number of points to use for the interpolation
maxdist = 120e3 # maximum distance of points to use for the interpolation
)
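Not a confirmed diagnosis, but the signature in the error (`"formula", "tbl_df"`) says the object `krige()` dispatches on is a plain tibble, and gstat has no `krige` method for tibbles, only for sf/Spatial objects. Note also that `krige()`'s second argument is named `locations`, not `data`. A minimal sketch of a call that sidesteps both points, reusing `GWMX_sf`, `grid`, and `v_mod_OK` from above:

```r
library(gstat)
library(sf)

# sanity check: a plain tibble here triggers exactly the
# "unable to find an inherited method" error
stopifnot(inherits(GWMX_sf, "sf"))

OK3 <- krige(
  level ~ 1,
  locations = as(GWMX_sf, "Spatial"), # or the sf object itself in recent gstat
  newdata   = grid,
  model     = v_mod_OK,
  nmin = 4, nmax = 20, maxdist = 120e3
)
```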
I tried to add significance levels (package ggpubr) to my t_test plot (package rstatix) and got a plot in which the significance brackets are pulled to the right of the plot.
I copied the code from this link [https://www.datanovia.com/en/blog/how-to-perform-multiple-t-test-in-r-for-different-variables/][1] but still got the same plot.
Here is the code:
```r
library(tidyverse)
library(rstatix)
library(ggpubr)

# Prepare the data and inspect a random sample of the data
mydata <- iris %>%
  filter(Species != "setosa") %>%
  as_tibble()
mydata %>% sample_n(6)

mydata.long <- mydata %>%
  pivot_longer(-Species, names_to = "variables", values_to = "value")
mydata.long %>% sample_n(6)

stat.test <- mydata.long %>%
  group_by(variables) %>%
  t_test(value ~ Species) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()
stat.test

myplot <- ggboxplot(
  mydata.long, x = "Species", y = "value",
  fill = "Species", palette = "npg", legend = "none",
  ggtheme = theme_pubr(border = TRUE)
) +
  facet_wrap(~variables)

# Add statistical test p-values
stat.test <- stat.test %>% add_xy_position(x = "Species")
myplot + stat_pvalue_manual(stat.test, label = "p.adj.signif")
```
[This is the result from the site:][2]
[And this is what I got:][4]
Any idea what I did wrong?
My RStudio version is 1.4.1103.
[1]: https://www.datanovia.com/en/blog/how-to-perform-multiple-t-test-in-r-for-different-variables/
[2]: https://i.stack.imgur.com/tzPo6.png
[3]: https://i.stack.imgur.com/1rtAO.jpg
[4]: https://i.stack.imgur.com/MJolk.png
I found it:
I changed the "xmin" and "xmax" values of stat.test.
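Sketching what that fix looks like; the values `1` and `2` below are an assumption matching the two Species levels per facet, not taken from the original post:

```r
# hypothetical values: put each bracket's ends over the two boxes in its facet
stat.test <- stat.test %>% add_xy_position(x = "Species")
stat.test$xmin <- 1
stat.test$xmax <- 2
myplot + stat_pvalue_manual(stat.test, label = "p.adj.signif")
```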
Code that was still working yesterday no longer works. Below are the code and the error message. Can someone help me?
```r
tbl(connexion, "donnees1") %>%
  select(date_heure_debut) %>%
  sdf_schema()
```

```
$date_heure_debut
$date_heure_debut$name
[1] "date_heure_debut"

$date_heure_debut$type
[1] "StringType"
```

```r
tbl(connexion, "donnees1") %>%
  dplyr::mutate(
    annee_debut = lubridate::year(date_heure_debut)
  ) %>%
  sdf_register("donnees1")
```

```
Error in lubridate::year(date_heure_debut) :
  object 'date_heure_debut' not found
```
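One plausible cause, offered as an assumption rather than a confirmed diagnosis: on a `tbl_spark`, dplyr verbs are translated to Spark SQL, and a namespace-qualified call like `lubridate::year()` may be evaluated locally in R instead of being translated, at which point the column name is not visible as an R object. The usual workaround is to call the bare function name so it reaches Spark:

```r
# sketch: bare year() is passed through to Spark SQL's year();
# the column is StringType, so Spark casts it when extracting the year
tbl(connexion, "donnees1") %>%
  dplyr::mutate(annee_debut = year(date_heure_debut)) %>%
  sdf_register("donnees1")
```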
I'm trying to test whether I can run prophet with sparklyr to make forecasts for data in a cluster, but when I use spark_apply the program gets stuck.
I am running sparklyr on an edge node connected to a yarn-client with Spark 2.2.0.
The data is sales by location spanning the last 4 years.
The plan is to create a data frame with all the data, partition it by location, then call prophet on each location and get predictions for the next 7 days.
Here I tried to pull the data for one location and apply prophet, but sparklyr got stuck.
library("sparklyr")
library("prophet")
sc <- spark_connect(master = "yarn-client",version = "2.2.0"))
query = "select * from saletable"
df <- sdf_sql(sc,query) %>%
filter(locationid=="1111") %>%
select(date,sales) %>%
sdf_repartition(partitions=1) %>%
select(ds=date,y=sales)
## try to predict sales the next 7 days and get the predictions
sparkly_prophet <- function(df){
m <- prophet::prophet(df)
future <- prophet::make_future_dataframe(m,periods=7,freq='day')
forecast <- predict(m,future)
return (dplyr::select(forecast,yhat) %>% tail(7))
}
Then I run it, but it gets stuck:

```r
spark_apply(df, sparkly_prophet)
```
When I've used spark_apply(), I've had better success including the function definition within the call to spark_apply(). I'm not sure why this is, but it may be worth a shot to restructure your code as:
```r
spark_apply(
  df,
  function(df) {
    m <- prophet::prophet(df)
    future <- prophet::make_future_dataframe(m, periods = 7, freq = "day")
    forecast <- predict(m, future)
    yhat <- dplyr::select(forecast, yhat)
    return(tail(yhat, 7))
  }
)
```
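One more thing worth checking, as an assumption rather than a confirmed cause of the hang: spark_apply() executes the function on the worker nodes, so prophet and its dependencies must be available there. sparklyr can ship the locally installed packages itself via the `packages` argument:

```r
# sketch: explicitly distribute the local package bundle to the workers
spark_apply(
  df,
  function(df) {
    m <- prophet::prophet(df)
    future <- prophet::make_future_dataframe(m, periods = 7, freq = "day")
    tail(dplyr::select(predict(m, future), yhat), 7)
  },
  packages = TRUE # copy .libPaths() packages to the cluster (non-local mode)
)
```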
I am building a simple random forest model on the iris data in Spark, and I was hoping for some method of measuring accuracy.
I thought of a simple column-matching option too; however, this did not work.
Code:
library("SparkR")
sc = sparkR.session("local[*]")
iris_data <- as.DataFrame(iris)
train <- sample(iris_data, withReplacement=FALSE, fraction=0.5, seed=42)
test <- except(iris_data, train)
model_rf <- spark.randomForest(train, Species ~., "classification", numTrees = 10)
summary(model_rf)
Problem:

```r
predictions <- predict(model_rf, test)
total_rows <- NROW(test)
predictions$correct <- (test$Species == test$prediction)
accuracy <- correct/total_rows
print(accuracy)
```

Error:

```
Error in column(callJMethod(x@sdf, "col", c)) :
```
P.S.: I am using Databricks to run Spark, but I don't mind running locally either.
So this is how I did it:
```r
total_rows <- NROW(test)

predictions$result <- ifelse((predictions$Species == predictions$prediction),
                             "TRUE", "FALSE")
correct <- NROW(predictions[predictions$result == "TRUE", ])

accuracy <- correct / total_rows
cat(accuracy * 100, "%") # accuracy is a fraction, so scale it to a percentage
```
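A slightly more direct variant, sketched under the assumption of a recent SparkR (cast/agg/avg below are standard SparkR column operations): cast the comparison to 0/1 and average it on the cluster instead of counting filtered rows.

```r
# sketch: the mean of a 0/1 "correct" column is the classification accuracy
predictions$correct <- cast(predictions$Species == predictions$prediction, "double")
accuracy <- head(agg(predictions, accuracy = avg(predictions$correct)))$accuracy
cat("accuracy:", accuracy * 100, "%\n")
```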