Shiny app - aggregate data set by selection_filter and create new variables - filter

I am quite new to R and have been trying to find a solution for my problem for weeks. I hope someone can help me.
1. I want to develop a Shiny app in a dashboard where the user can select values via selection filters (e.g. the value "40-49 years" from the variable "age group" and the value "female" from "sex"). Based on these selections, columns (e.g. columns x, y, and z) from the original dataset will be aggregated. I already wrote a function using aggregate().
2. Based on the aggregated columns, new values shall be calculated (e.g. d = (x - y) / (z / 2)).
3. The aggregated columns and the newly calculated values shall be displayed in a table to the user.
The function from step 1:
aggreg.function <- function(a, b, c) {
  agg.data <- aggregate(cbind(x, y, z), shared_Cervix, sum,
                        subset = c(!AgeGroup %in% a & !Sex %in% b & !Edition %in% c))
  # Calculate new values
  agg.data$d <- agg.data$x + agg.data$y
  agg.data$f <- (agg.data$x + agg.data$y) / (agg.data$z / 2)
  agg.data  # return the result (the original View(m.agg.data) referenced an undefined object)
}
user_data <- reactive({
  aggreg.function(input$AgeGroup, input$Sex, input$Edition)
})
EDIT
Thanks for the recommendations. I changed my code, but now I struggle a bit with adding new columns. In total, I want to insert 17 new columns based on the filtered table (data_step2()). Is there a way to insert multiple columns at the same time? In my example: is it possible to combine data_step3 and data_step4?
ui <- fluidPage(
  selectInput("Age", "Age:", sort(unique(Complete$Age))),
  selectInput("Race", "Race:", sort(unique(Complete$Race))),
  selectInput("Stage", "Stage:", sort(unique(Complete$Stage))),
  selectInput("Grade", "Grade:", sort(unique(Complete$Grade))),
  selectInput("Edition", "Edition:", sort(unique(Complete$Edition))),
  DT::dataTableOutput("filtered.result")
)
server <- function(input, output) {
  data_step1 <- reactive({
    Complete %>%
      filter(Age %in% input$Age & Stage %in% input$Stage & Grade %in% input$Grade &
             Race %in% input$Race & Edition %in% input$Edition)
  })
  data_step2 <- reactive({
    data_step1() %>%
      group_by(Age, Stage, Grade, Race, Edition, Year) %>%
      summarise(across(everything(), sum))
  })
  # Is it possible to combine data_step3 and data_step4? (see the sketch below)
  data_step3 <- reactive({
    data_step2() %>% mutate(xy = x + y)
  })
  data_step4 <- reactive({
    data_step3() %>% mutate(w = xy / x2)
  })
  output$filtered.result <- DT::renderDataTable({
    data_step4()
  })
}
shinyApp(ui, server)
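Yes, data_step3 and data_step4 can be combined: a single mutate() call can create several columns at once, and each new column can refer to columns created earlier in the same call. A minimal sketch (keeping the x2 expression exactly as written above, since its intended definition isn't shown):
data_step3 <- reactive({
  data_step2() %>%
    mutate(xy = x + y,    # first new column
           w  = xy / x2)  # can already use xy from the same mutate() call
})
The same pattern extends to all 17 columns inside one reactive.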

Here's something you might be able to do for step 1 of your question, although it's hard to tell what your data looks like and what your end result should look like. I'm assuming that you want to allow a user of your dashboard to select the AgeGroup and Sex variables to view data, so in the UI of the application, two selectInput functions are used to provide that functionality. On the server side, two reactive statements filter the data as the user changes the inputs: each one requires an input and then filters the dataset by it. Notice that the second reactive statement uses "data_step1()" instead of "data_step1"; calling the reactive this way keeps the chain updating as the user changes inputs.
For steps 2 and 3, you'll want to use the "data_step2()" dataset to add the new columns, and then you can use a function such as renderDataTable to display the output on your dashboard.
library(shiny)
library(dplyr)
ui <- fluidPage(
  selectInput("AgeGroup", "AgeGroup:", sort(unique(data$AgeGroup))),
  selectInput("Sex", "Sex:", sort(unique(data$Sex)))
)
server <- function(input, output, session) {
  data_step1 <- reactive({
    req(input$AgeGroup)                            # Require input
    data %>% filter(AgeGroup %in% input$AgeGroup)  # Filter full dataset by AgeGroup input
  })
  data_step2 <- reactive({
    req(input$Sex)                                 # Require input
    data_step1() %>% filter(Sex %in% input$Sex)    # Filter step-1 result by Sex input
  })
  # Do next steps with the filtered data_step2() dataset...
}
# Run the application
shinyApp(ui = ui, server = server)

Related

Google Sheets add a Permanent timestamp

I am setting up a sheet where a person will be able to check a checkbox, at different times, depending on the progress of a task. So, there are 5 checkboxes per row, for a number of different tasks.
Now, the idea is that when you check one of those checkboxes, a message builds up in the next few cells. The message is built across 3 cells: the first cell is just text, the second one is the date, and the third one is the time. Each of those cells has 5 paragraphs (one per checkbox).
The problem comes when I try to make that timestamp stay as it was when it was entered. As it is right now, the time changes every time I update any part of the Google Sheet.
I set up my formulas as follows:
For the text message:
=IF($C4=TRUE,"Insert text 1 here","")&CHAR(10)&IF($E4=TRUE,"Insert text 2 here","")&CHAR(10)&IF($G4=TRUE,"Insert text 3 here","")&CHAR(10)&IF($I4=TRUE,"Insert text 4 here","")&CHAR(10)&IF($K4=TRUE,"Insert text 5 here","")
For the date:
=IF($C4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")
And for the time:
=IF($C4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"HH:mm")),"")
I would appreciate it greatly if anyone could help me get this to work so that the date and time are inserted after checking those boxes and don't change again.
I notice your struggle with the continuously changing date and time. I had the same struggle, and I found a solution that works nicely for my case, but it needs a little more "dirty work" with Apps Script.
Some background for my case:
- I have multiple sheets in the spreadsheet to run and generate the timestamp
- I want to skip my first sheet, without generating a timestamp in it
- I want every edit, even each value pasted in from Excel, to generate a timestamp
- I want the timestamps to be individual: each row has its own timestamp, precise to the second
- I don't want a total refresh of the entire sheet's timestamps when I am editing any other row
- I have a column with a MUST FILL value that decides whether the timestamp needs to be generated for that particular row
- I want the timestamp on one dedicated column only
function timestamp() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const totalSheet = ss.getSheets();
  // Start at index 1 to skip the first sheet
  for (let a = 1; a < totalSheet.length; a++) {
    let sheet = ss.getSheets()[a];
    let range = sheet.getDataRange();
    let values = range.getValues();
    // Count the rows that have a value in the first (must-fill) column
    function autoCount() {
      let rowCount;
      for (let i = 0; i < values.length; i++) {
        rowCount = i;
        if (values[i][0] === '') {
          break;
        }
      }
      return rowCount;
    }
    const rowNum = autoCount();
    for (let j = 1; j < rowNum + 1; j++) {
      // Only stamp rows whose timestamp column (column 7) is still empty
      if (sheet.getRange(j + 1, 7).getValue() === '') {
        sheet.getRange(j + 1, 7).setValue(new Date()).setNumberFormat("yyyy-MM-dd hh:mm:ss");
      }
    }
  }
}
Explanation
- First, I made a const totalSheet with getSheets() and ran it with a for loop, to identify the total number of sheets inside the spreadsheet. Take note that I used let a = 1; as in all JavaScript, indexing starts at 0, so starting at 1 skips the first sheet and runs from the second sheet onwards.
- Then you will notice let sheet = ss.getSheets()[a] inside the loop. Take note that you should not use const when the value inside the variable keeps changing; let works fine instead.
- Then you will see the function autoCount(). It is a for loop that counts the number of rows that have values edited in them. The check if (values[i][0] === '') makes the script search through the sheet's values, looking at row i and column 0. Here, 0 indicates the first column of the sheet and i the row, so it works like indexing a 2-D array.
- Having found the number of edited rows by running autoCount(), I store the result in the rowNum variable.
- Then I pass that rowNum into a new for loop and use if (sheet.getRange(j+1,7).getValue() === '') to determine which rows have not yet been given a timestamp. Take note that the 7 here indicates the 7th column of the sheet, which is where I want the timestamp.
- Inside that for loop, setValue writes the date in the specified format ("yyyy-MM-dd hh:mm:ss"). You are free to edit it into any style you like.
- Oh, and do remember to deploy and activate the trigger with event type On Change (see the sketch after this list). That is not limited to edits, but covers all kinds of changes, including paste.
Lastly, please take note of my background points above before deciding whether this solution works for your case. Cheers, and happy coding~!
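For what it's worth, the On Change trigger can also be installed from code rather than through the deploy UI; a minimal sketch, assuming the timestamp() function above:
function createTrigger() {
  // Installable trigger: run timestamp() on every change to this spreadsheet,
  // including pastes, not just manual edits
  ScriptApp.newTrigger('timestamp')
    .forSpreadsheet(SpreadsheetApp.getActiveSpreadsheet())
    .onChange()
    .create();
}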

How do I add specific values of columns to create new columns?

I have a dataset which I want to format in order to perform repeated measures ANOVA. My dataset is of the form:
set.seed(32)
library(tibble)
id <- rep(1:2, each = 3)
y_0 <- rep(rnorm(2, mean = 50, sd = 10), each = 3)
time <- rep(c(1, 2, 3), times = 2)
c <- rep(rnorm(2, mean = 10, sd = 12), each = 3)
data <- tibble(id, y = y_0, time, c)
I want to bring the dataset into the form needed for repeated measures ANOVA, meaning only one row per id, with 3 new columns: y_1 for y + c at time 1, y_2 for y + c at time 2, and y_3 for y + c at time 3. Can anyone provide some assistance?
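One possible sketch with tidyr, assuming the data frame built above (where the y column holds the y_0 values): compute y + c per row, then pivot to one row per id.
library(dplyr)
library(tidyr)
data %>%
  mutate(y_t = y + c) %>%            # y + c at each time point
  select(id, time, y_t) %>%
  pivot_wider(names_from = time,     # one column per time point
              values_from = y_t,
              names_prefix = "y_")   # yields columns y_1, y_2, y_3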

Shiny DT::renderDataTable

Suppose the columns of a data table are: unique-ID, name, salary, position.
I display the table in a Shiny application using DT::renderDataTable (DT::dataTableOutput).
I would like to click on a row of the output to display other data of the person belonging to that ID in another output.
What is the solution?
In short, how do I extract the unique-ID from a clicked row?
You can use the _rows_selected input that comes with DT tables; the DT documentation lists the possible arguments and has several live examples.
This is a simple example of a plot that updates with the rows selected in the table:
library(shiny)
library(DT)
ui <- fluidPage(
  DT::dataTableOutput("test_table"),
  plotOutput("test_plot")
)
server <- function(input, output, session) {
  output$test_table <- DT::renderDataTable({
    mtcars
  })
  output$test_plot <- renderPlot({
    s <- input$test_table_rows_selected
    if (!is.null(s)) {
      plot(mtcars[s, "disp"])
    }
  })
}
shinyApp(ui, server)
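To get the unique-ID itself, index your own data frame with the selected row number, since _rows_selected returns row indices into the displayed data. A minimal sketch, assuming a data frame df with a unique_ID column (both names illustrative) and single-row selection:
output$test_table <- DT::renderDataTable({
  DT::datatable(df, selection = "single")  # allow one selected row at a time
})
observeEvent(input$test_table_rows_selected, {
  s <- input$test_table_rows_selected      # row number of the clicked row
  selected_id <- df$unique_ID[s]           # look up the ID for that row
  # ...use selected_id to fetch and render that person's other data
})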

Spark: lazy action?

I am working on a complex application. From source data, we compute many statistics, e.g.:
val df1 = sourceData.filter($"col1" === "val" and ...)
.select(...)
.groupBy(...)
.min()
val df2 = sourceData.filter($"col2" === "val" and ...)
.select(...)
.groupBy(...)
.count()
As the dataframes are grouped on the same columns, the result dataframes are then joined together:
df1.join(df2, Seq("groupCol"), "full_outer")
.join(df3....)
.write.save(...)
(in my code this is done in a loop)
This is not performant. The problem is that each dataframe (I have about 30) ends with an action, so in my understanding each dataframe is computed and returned to the driver, which then sends data back to the executors to perform the join.
This gives me memory errors. I can increase the driver memory, but I am looking for a better way of doing it. For example, if all dataframes were computed only at the end (with the saving of the joined dataframe), I guess that everything would be managed by the cluster.
Is there a way to do a kind of lazy action? Or should I join the dataframes in another way?
Thx
First of all, the code you've shown contains only one action-like operation - DataFrameWriter.save. All other components are lazy.
But laziness doesn't really help you here. The biggest problem (assuming no ugly data skew or misconfigured broadcasting) is that the individual aggregations require separate shuffles and an expensive subsequent merge.
A naive solution would be to leverage that:
the dataframe are grouped on the same columns
to shuffle first:
val groupColumns: Seq[Column] = ???
val sourceDataPartitioned = sourceData.groupBy(groupColumns: _*)
and use the result to compute individual aggregates
val df1 = sourceDataPartitioned
...
val df2 = sourceDataPartitioned
...
However, this approach is rather brittle and is unlikely to scale in the presence of large / skewed groups.
Therefore it would be much better to rewrite your code to perform only aggregation. Luckily for you, standard SQL behavior is all you need.
Let's start by structuring your code into three-element tuples, with:
_1 being a predicate (the condition you use with filter).
_2 being a list of Columns for which you want to compute aggregates.
_3 being an aggregate function.
An example structure can look like this:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{count, min}
val ops: Seq[(Column, Seq[Column], Column => Column)] = Seq(
  ($"col1" === "a" and $"col2" === "b", Seq($"col3", $"col4"), count),
  ($"col2" === "b" and $"col3" === "c", Seq($"col4", $"col5"), min)
)
Now you compose the aggregate expressions using the agg_function(when(predicate, column)) pattern:
import org.apache.spark.sql.functions.when
val exprs: Seq[Column] = ops.flatMap {
  case (p, cols, f) => cols.map {
    c => f(when(p, c))
  }
}
and use them on the sourceData:
sourceData.groupBy(groupColumns: _*).agg(exprs.head, exprs.tail: _*)
Add aliases when necessary.
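For instance, a hedged sketch of adding such aliases, numbering the ops and columns (the naming scheme here is illustrative, not from the original code):
val exprs: Seq[Column] = ops.zipWithIndex.flatMap {
  case ((p, cols, f), i) =>
    cols.zipWithIndex.map { case (c, j) =>
      f(when(p, c)).alias(s"agg_${i}_$j")  // readable, unique output column names
    }
}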

Writing arbitrary R objects to SQLite database

I'm trying to store large list objects created in R in an SQLite database via RSQLite. Since these list objects contain several 2d and 3d matrices, I'd like to store them as individual entries. I read that serializing them and storing them as blobs does the trick.
The problem, however, is that my code does not appear to store the blobs as individual rows, instead storing each separate byte as a row. Here is my code:
library(RSQLite)
out1 <- serialize(model1, NULL)
out2 <- serialize(model2, NULL)
out3 <- serialize(model3, NULL)
model4 <- serialize(rnorm(10), NULL)
model5 <- serialize(rnorm(20), NULL)
model6 <- serialize(rnorm(30), NULL)
db <- dbConnect(SQLite(), dbname = "Test.sqlite")
dbGetQuery(conn = db,
           "CREATE TABLE IF NOT EXISTS models
            (_id INTEGER PRIMARY KEY AUTOINCREMENT,
            model BLOB)")
test4 <- data.frame(g = I(model4))
test5 <- data.frame(g = I(model5))
test6 <- data.frame(g = I(model6))
dbGetPreparedQuery(db, "INSERT INTO models (model) values (:g)", bind.data = test4)
dbGetPreparedQuery(db, "INSERT INTO models (model) values (:g)", bind.data = test5)
dbGetPreparedQuery(db, "INSERT INTO models (model) values (:g)", bind.data = test6)
dbListTables(db)
p1 <- dbGetQuery(db, 'select * from models')
Also, while the writing process works fine in this case, it is incredibly slow with files larger than 1000 kB...
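A hedged sketch of one likely fix, under the setup above: data.frame(g = I(model4)) turns the raw vector into one row per byte, whereas wrapping the blob in a list keeps it as a single value; the modern DBI interface (dbExecute with params) replaces the deprecated dbGetPreparedQuery.
library(DBI)
library(RSQLite)

db <- dbConnect(SQLite(), dbname = "Test.sqlite")
dbExecute(db, "CREATE TABLE IF NOT EXISTS models
               (_id INTEGER PRIMARY KEY AUTOINCREMENT, model BLOB)")

blob <- serialize(rnorm(10), NULL)        # one serialized object (raw vector)
dbExecute(db, "INSERT INTO models (model) VALUES (:g)",
          params = list(g = list(blob)))  # list() keeps the blob as ONE row

p1 <- dbGetQuery(db, "SELECT * FROM models")
stored <- unserialize(p1$model[[1]])      # round-trip back to the R object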
