How to update bokeh active interaction with GeoJSON as data source? - drop-down-menu

I have made an interactive choropleth map with bokeh, and I'm trying to add active interactions using the dropdown widget (Select). However, most tutorials and SO questions about active interactions use ColumnDataSource, and not GeoJSONDataSource.
The issue is that GeoJSONDataSource doesn't have a .data method like ColumnDataSource does, so idk exactly how the syntax works when updating it.
My dataset is a dictionary in the form of city_dict = {'Amsterdam': <some data frame>, 'Antwerp': <some data frame>, ...}, where the dataframe is in geojson format. I have already confirmed that this format works when making glyphs.
def update(attr, old, new):
s_value = dropdown.value
p.title.text = '%s', s_value
new_src1 = make_dataset(s_value)
val1 = GeoJSONDataSource(new_src1)
r1.data_source = val1
where make_dataset is a function that transforms my original dataset into a dataset that can feed into the GeoJSONDataSource function. make_dataset requires a string (name of the city) to work eg. 'Amsterdam'. It works on passive interactions.
The main plot code (removed unnecessary stuff) is:
dropdown = Select(value='Amsterdam', options = cities)
controls = WidgetBox(dropdown)
initial_city = 'Amsterdam'
a = make_dataset(initial_city)
src1 = GeoJSONDataSource(a)
p = figure(title = 'Amsterdam', plot_height = 750 , plot_width = 900, toolbar_location = 'right')
r1 = p.patches('xs','ys', source = src1, fill_color = {'field' :'norm', 'transform' : color_mapper})
dropdown.on_change('value', update)
layout = row(controls, p)
curdoc().add_root(layout)
I've added the error I get. error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1147'}, 'attr': 'value', 'new': 'Antwerp'}], 'references': []}: ValueError("expected a value of type str, got ('%s', 'Antwerp') of type tuple",)

Related

Fine-tune a pre-trained model

I am new to transformer based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code:
enter image description here
and I got the following error:
enter image description here
I will be thankful if anyone could help.
The preprocessing steps I followed:
input_ids_t = []
attention_masks_t = []
for sent in df_train['text_a']:
encoded_dict = tokenizer.encode_plus(
sent,
add_special_tokens = True,
max_length = 128,
pad_to_max_length = True,
return_attention_mask = True,
return_tensors = 'tf',
)
input_ids_t.append(encoded_dict['input_ids'])
attention_masks_t.append(encoded_dict['attention_mask'])
# Convert the lists into tensors.
input_ids_t = tf.concat(input_ids_t, axis=0)
attention_masks_t = tf.concat(attention_masks_t, axis=0)
labels_t = np.asarray(df_train['label'])
and i did the same for testing data. Then:
train_data = tf.data.Dataset.from_tensor_slices((input_ids_t,attention_masks_t,labels_t))
and the same for testing data
It sounds like you are feeding the transformer_model 1 input instead of 3. Try removing the square brackets around transformer_model([input_ids, input_mask, segment_ids])[0] so that it reads transformer_model(input_ids, input_mask, segment_ids)[0]. That way, the function will have 3 arguments and not just 1.

Properly evaluate a test dataset

I trained a machine translation model using huggingface library:
def compute_metrics(eval_preds):
preds, labels = eval_preds
if isinstance(preds, tuple):
preds = preds[0]
decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
# Replace -100 in the labels as we can't decode them.
labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
# Some simple post-processing
decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
result = metric.compute(predictions=decoded_preds, references=decoded_labels)
result = {"bleu": result["score"]}
prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
result["gen_len"] = np.mean(prediction_lens)
result = {k: round(v, 4) for k, v in result.items()}
return result
trainer = Seq2SeqTrainer(
model,
args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['test'],
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics
)
trainer.train()
model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After the training, I can see the trained model is saved to the folder models and the metric is calculated. Now I want to load the trained model and do the prediction on a new dataset, here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the function eval only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
Turned out that the prediction can be produced using the following code:
inputs = tokenizer(
questions,
max_length=max_input_length,
truncation=True,
return_tensors='pt',
padding=True).to('cuda')
translation = model.generate(**inputs)

How to adjust transparency (alpha) in seaborn swarmplot?

I have a swarmplot:
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True)
and I would like to adjust the transparency of the dots:
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True,
scatter_kws = {'alpha': 0.1})
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True,
plot_kws={'scatter_kws': {'alpha': 0.1}})
but neither of the above methods works.
any help is appreciated.
You can simply input the alpha argument directly in the swarmplot function:
import seaborn as sns
df = sns.load_dataset('diamonds').sample(1000)
sns.swarmplot(data=df, x='cut', y='carat', hue='color', alpha=0.5)
The documentation for swarmplot states
kwargs : key, value mappings
Other keyword arguments are passed through to matplotlib.axes.Axes.scatter().
Thus, you don't need to use scatter_kws={...}.
Compare this to, e.g., sns.lmplot, which states
{scatter,line}_kws : dictionaries
Additional keyword arguments to pass to plt.scatter and plt.plot.

How to connect leaflet map clicks (events) with plot creation in a shiny app

Hello I am creating an environmental shiny app in which I want to use a leaflet map to create some simple plots based on openair package(https://rpubs.com/NateByers/Openair).
Aq_measurements() general form
AQ<- (aq_measurements(country = “country”, city = “city”, location = “location”, parameter = “pollutant choice”, date_from = “YYYdateY-MM-DD”, date_to = “YYYY-MM-DD”).
All parameters available in locations dataframe.
worldmet() general form
met <- importNOAA(code = "12345-12345", year = YYYYY:YYYY)
NOAA Code available in locations dataframe
Below I create a sample of my initial data frame:
location = c("100 ail","16th and Whitmore","40AB01 - ANTWERPEN")
lastUpdated = c("2018-02-01 09:30:00", "2018-02-01 03:00:00", "2017-03-07 10:00:00")
firstUpdated = c("2015-09-01 00:00:00","2016-03-06 19:00:00","2016-11-22 15:00:00")
pm25=c("FALSE","FALSE","FALSE")
pm10=c("TRUE","FALSE","FALSE")
no2=c("TRUE","FALSE","FALSE")
latitude=c(47.932907,41.322470,36.809700)
longitude=c(106.92139000,-95.93799000
,-107.65170000)
df = data.frame(location, lastUpdated, firstUpdated,latitude,longitude,pm25,pm10,no2)
As a general idea I want to be able to click on a certain location in the map based on this dataframe. Then I have one selectInput() and 2 dateInput(). The 2 dateInput() should take as inputs the df$firstUpdated and df$lastUpdated respectively. Then the selectInput() should take as inputs the pollutants that exist in the df based on "TRUE"/"FALSE" value. And then the plots should be created. All of these should be triggered by clicking on the map.
Up to now I was not able to achieve this so in order to help you understand I connected the selectInput() and the dateInput() with input$loc which is a selectIpnut() with locations in the first tab as I will not need this when I find the solution.
library(shiny)
library(leaflet)
library(plotly)
library(shinythemes)
library(htmltools)
library(DT)
library(utilr)
library(openair)
library(plotly)
library(dplyr)
library(ggplot2)
library(gissr)
library(ropenaq)
library(worldmet)
# Define UI for application that draws a histogram
ui = navbarPage("ROPENAQ",
tabPanel("CREATE DATAFRAME",
sidebarLayout(
# Sidebar panel for inputs ----
sidebarPanel(
wellPanel(
uiOutput("loc"),
helpText("Choose a Location to create the dataframe.")
)
),
mainPanel(
)
)
),
tabPanel("LEAFLET MAP",
leafletOutput("map"),
wellPanel(
uiOutput("dt"),
uiOutput("dt2"),
helpText("Choose a start and end date for the dataframe creation. Select up to 2 dates")
),
"Select your Pollutant",
uiOutput("pollutant"),
helpText("While all pollutants are listed here, not all pollutants are measured at all locations and all times.
Results may not be available; this will be corrected in further revisions of the app. Please refer to the measurement availability
in the 'popup' on the map."),
hr(),
fluidRow(column(8, plotOutput("tim")),
column(4,plotOutput("polv"))),
hr(),
fluidRow(column(4, plotOutput("win")),
column(8,plotOutput("cal"))),
hr(),
fluidRow(column(12, plotOutput("ser"))
)
)
)
#server.r
# load data
# veh_data_full <- readRDS("veh_data_full.RDS")
# veh_data_time_var_type <- readRDS("veh_data_time_var_type.RDS")
df$location <- gsub( " " , "+" , df$location)
server = function(input, output, session) {
output$pollutant<-renderUI({
selectInput("pollutant", label = h4("Choose Pollutant"),
choices = colnames(df[,6:8]),
selected = 1)
})
#Stores the value of the pollutant selection to pass to openAQ request
###################################
#output$OALpollutant <- renderUI({OALpollutant})
##################################
# create the map, using dataframe 'locations' which is polled daily (using ropenaq)
#MOD TO CONSIDER: addd all available measurements to the popup - true/false for each pollutant, and dates of operation.
output$map <- renderLeaflet({
leaflet(subset(df,(df[,input$pollutant]=="TRUE")))%>% addTiles() %>%
addMarkers(lng = subset(df,(df[,input$pollutant]=="TRUE"))$longitude, lat = subset(df,(df[,input$pollutant]=="TRUE"))$latitude,
popup = paste("Location:", subset(df,(df[,input$pollutant]=="TRUE"))$location, "<br>",
"Pollutant:", input$pollutant, "<br>",
"First Update:", subset(df,(df[,input$pollutant]=="TRUE"))$firstUpdated, "<br>",
"Last Update:", subset(df,(df[,input$pollutant]=="TRUE"))$lastUpdated
))
})
#Process Tab
OAL_site <- reactive({
req(input$map_marker_click)
location %>%
filter(latitude == input$map_marker_click$lat,
longitude == input$map_marker_click$lng)
###########
#call Functions for data retrieval and processing. Might be best to put all data request
#functions into a seperate single function. Need to:
# call importNOAA() to retrieve meteorology data into temporary data frame
# call aq_measurements() to retrieve air quality into a temporary data frame
# merge meteorology and air quality datasets into one working dataset for computations; temporary
# meteorology and air quality datasets to be removed.
# call openAir() functions to create plots from merged file. Pass output to a dashboard to assemble
# into appealing output.
# produce output, either as direct download, or as an emailable PDF.
# delete all temporary files and reset for next run.
})
#fun
output$loc<-renderUI({
selectInput("loc", label = h4("Choose location"),
choices = df$location ,selected = 1
)
})
output$dt<-renderUI({
dateInput('date',
label = 'First Available Date',
value = subset(df$firstUpdated,(df[,1]==input$loc))
)
})
output$dt2<-renderUI({
dateInput('date2',
label = 'Last available Date',
value = subset(df$lastUpdated,(df[,1]==input$loc))
)
})
rt<-reactive({
AQ<- aq_measurements(location = input$loc, date_from = input$dt,date_to = input$dt2,parameter = input$pollutant)
met <- importNOAA(year = 2014:2018)
colnames(AQ)[9] <- "date"
merged<-merge(AQ, met, by="date")
# date output -- reports user-selected state & stop dates in UI
merged$location <- gsub( " " , "+" , merged$location)
merged
})
#DT
output$tim = renderPlot({
timeVariation(rt(), pollutant = "value")
})
}
shinyApp(ui = ui, server = server)
The part of my code that I believe input$MAPID_click should be applied is:
output$map <- renderLeaflet({
leaflet(subset(locations,(locations[,input$pollutant]=="TRUE")))%>% addTiles() %>%
addMarkers(lng = subset(locations,(locations[,input$pollutant]=="TRUE"))$longitude, lat = subset(locations,(locations[,input$pollutant]=="TRUE"))$latitude,
popup = paste("Location:", subset(locations,(locations[,input$pollutant]=="TRUE"))$location, "<br>",
"Pollutant:", input$pollutant, "<br>",
"First Update:", subset(locations,(locations[,input$pollutant]=="TRUE"))$firstUpdated, "<br>",
"Last Update:", subset(locations,(locations[,input$pollutant]=="TRUE"))$lastUpdated
))
})
output$dt<-renderUI({
dateInput('date',
label = 'First Available Date',
value = subset(locations$firstUpdated,(locations[,1]==input$loc))
)
})
output$dt2<-renderUI({
dateInput('date2',
label = 'Last available Date',
value = subset(locations$lastUpdated,(locations[,1]==input$loc))
)
})
rt<-reactive({
AQ<- aq_measurements(location = input$loc, date_from = input$dt,date_to = input$dt2)
met <- importNOAA(year = 2014:2018)
colnames(AQ)[9] <- "date"
merged<-merge(AQ, met, by="date")
# date output -- reports user-selected state & stop dates in UI
merged$location <- gsub( " " , "+" , merged$location)
merged
})
#DT
output$tim = renderPlot({
timeVariation(rt(), pollutant = "value")
})
Here is a minimal example. You click on your marker and you get a plot.
ui = fluidPage(
leafletOutput("map"),
textOutput("temp"),
plotOutput('tim')
)
#server.r
#df$location <- gsub( " " , "+" , df$location)
server = function(input, output, session) {
output$map <- renderLeaflet({
leaflet(df)%>% addTiles() %>% addMarkers(lng = longitude, lat = latitude)
})
output$temp <- renderPrint({
input$map_marker_click$lng
})
output$tim <- renderPlot({
temp <- df %>% filter(longitude == input$map_marker_click$lng)
# timeVariation(temp, pollutant = "value")
print(ggplot(data = temp, aes(longitude, latitude)) + geom_point())
})
}
shinyApp(ui = ui, server = server)

Slickgrid filtering is crippled when CheckboxSelectColumn plugin is used

I have a slickgrid with 25,000+ rows. I have setup column filtering (see example) which works fine and is very fast.
I have now added the CheckboxSelectColumn plugin (see example) and while this worked it has crippled the speed of the filtering. Everything still works, just very much slower.
I have tried optimising the filtering by supplying RefreshHints (see example) but no joy.
Is it just the combination of filtering plus checkboxes plus large row count, or am I doing something wrong?
Here are the relevant bits of code (CoffeeScript).
Setup the Column Filters
setupColumnFilters:()->
$(grid.getHeaderRow()).delegate(':input', 'change keyup', (e) ->
columnId = $(this).data('columnId')
if columnId?
newVal = $.trim($(this).val())
columnFilters[columnId] = newVal
# Trying to optimise using RefreshHints
newLen = newVal?.length
oldlen = columnFilters[columnId]?.length ? 0
isNarrowing = newLen > oldlen
isExpanding = newLen < oldlen
renderedRange = grid.getRenderedRange()
dataView.setRefreshHints({
ignoreDiffsBefore: renderedRange.top,
ignoreDiffsAfter: renderedRange.bottom + 1,
isFilterNarrowing: isNarrowing,
isFilterExpanding: isExpanding
})
dataView.refresh()
)
grid.onHeaderRowCellRendered.subscribe((e, args) ->
node = $(args.node)
node.empty()
id = args.column.id
if id == '_checkbox_selector'
node.hide()
return
placeholder = 'filter by ' + id
html = '<input type="text" placeholder="' + placeholder + '">'
$(html)
.data('columnId', id)
.val(columnFilters[id])
.appendTo(node)
.focus(()->$(this).attr('placeholder', ''))
.blur(()-> $(this).attr('placeholder', placeholder) if $(this).val()?)
)
Setup the CheckboxSelect Plugin
setupCheckboxSelect:() ->
checkboxPlugin = new Slick.CheckboxSelectColumn({ cssClass: "slick-cell-checkboxsel" });
columns.unshift(checkboxPlugin.getColumnDefinition());
grid.setColumns(columns);
grid.registerPlugin(checkboxPlugin);
The Filter Function
filter: (item) =>
grid.setSelectedRows([])
columns = grid.getColumns()
for columnId, filter of columnFilters
if filter?
column = columns[grid.getColumnIndex(columnId)]
field = item[column.field]
return false unless (field? && field.toLowerCase().indexOf(filter.toLowerCase()) > -1)
return true
Whoa, why are you calling grid.setSelectedRows([]) in your filter?!?
It gets called 25'000 times whenever you refresh the data.
Besides being completely pointless, it does slow things down even more when you use the checkbox select column since it needs to synchronize the state (based on selection).

Resources