Not getting right value from state store - apache-kafka-streams

I am trying to use a state store to merge multiple Kafka streams. As part of this, I consume messages from multiple topics and put them into the state store under keys, for example:
a message from topic1 is saved in the state store as key_p1 with value1
a message from topic2 is saved in the state store as key_p2 with value2
a message from topic3 is saved in the state store as key_p3 with value3.
To meet the SLA, I tried to query the state store to verify whether I had received the mandatory transactions (for example on topic2 and topic3, with keys key_p2 and key_p3).
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2,key_p3)
Though I have the messages in the state store, most of the time (not always) I get only one message.
Is there a way to refresh the store before querying?
Code in my transform method:
override def transform(kafkaKey: String, value: String): KeyValue[String, String] = {
  var key = ""
  var taggedMsg = "" // populated from the incoming value in the full code (elided here)
  if ((kafkaKey != null) && (kafkaKey != "")) {
    key = kafkaKey + txnKeys.get(msgTag).get // this will be like key_p1, key_p2, key_p3 etc. (msgTag is set elsewhere)
    kvStore.put(key, ValueAndTimestamp.make(taggedMsg, context.timestamp))
    log.info("saving value in state store with key " + key + " and value " + kvStore.get(key))
  }
  null // nothing is forwarded downstream in this snippet
}
and code in init(context: ProcessorContext):
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2, key_p3)
val tempP3 = kvStore.get(rangeKey + "_p3")
val tempP2 = kvStore.get(rangeKey + "_p2")
while (priorityTxns.hasNext) {
  log.info("available priority keys " + priorityTxns.peekNextKey())
  val e = priorityTxns.next()
}
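For what it's worth, here is a minimal sketch of the per-key check the code above is aiming at, assuming the same kvStore, rangeKey and log are in scope; it looks up each mandatory key directly with kvStore.get instead of relying on the range iterator:
// Sketch only: verify each mandatory transaction key individually.
val mandatoryKeys = Seq(rangeKey + "_p2", rangeKey + "_p3")
val missing = mandatoryKeys.filter(k => kvStore.get(k) == null)
if (missing.isEmpty) {
  log.info("all mandatory transactions present: " + mandatoryKeys.mkString(", "))
} else {
  log.info("still waiting for: " + missing.mkString(", "))
}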

Related

Can I use variables across all the threads in the thread groups in jmeter?

I'm trying to create a test plan for rate-limiting behavior.
I set a rule that blocks after X requests per minute, and I want to check that I get response code 200 until I reach X requests, and from then on 429. I created a counter that is shared between all the threads, but it seems to be a mess because it's not thread-safe.
This is my Beanshell script in the "Once Only Controller":
String props_pre_fix = ${section_id} + "-" + ${START.HMS};
props.remove("props_pre_fix" + ${section_id}, props_pre_fix);
props.put("props_pre_fix" + ${section_id}, props_pre_fix);
props.put(props_pre_fix + "_last_response_code", "200");
props.put(props_pre_fix + "_my_counter", "0");
and this is the Beanshell Assertion:
String props_pre_fix = props.get("props_pre_fix" + ${section_id});
//log.info("props_pre_fix " + props_pre_fix);
//extract my counter from props
int my_counter = Integer.parseInt(props.get(props_pre_fix + "_my_counter"));
//extract last response code
String last_response_code = props.get(props_pre_fix + "_last_response_code");
log.info("last_response_code " + last_response_code);
//if the last response was 429 and the current one is 200, we moved to a new minute - reset the counter
if(last_response_code.equals("429") && ResponseCode.equals("200")){
log.info("we moved to a new minute - my_counter should be zero");
my_counter = 0;
}
//increase counter
my_counter++;
log.info("set counter with value: " + my_counter);
//save counter
props.put(props_pre_fix + "_my_counter", my_counter + "");
log.info("counter has set with value: " + my_counter);
if (ResponseCode.equals("200")) {
props.put(props_pre_fix + "_last_response_code", "200");
if(my_counter <= ${current_limit}){
Failure = false;
}
else {
Failure = true;
FailureMessage = "leakage of " + (my_counter - ${current_limit}) + " requests";
}
}
else if (ResponseCode.equals("429")) {
props.put(props_pre_fix + "_last_response_code", "429");
if(my_counter > ${current_limit}){
Failure = false;
}
}
I'm using props to share the counter, but I feel that this is obviously not the right way to do it.
Can you suggest how to do this properly?
I don't think it is possible to automatically test this requirement using JMeter Assertions, because you don't have access to the current throughput. I would rather recommend cross-checking the Response Codes per Second and Transactions per Second charts (they can be installed using the JMeter Plugins Manager).
All the 200 and 429 responses can be marked as successful using a Response Assertion configured accordingly.
If for some reason you still want to do this programmatically, you might want to take a look at the Summariser class source, which is used for displaying the current throughput in STDOUT.
Also be aware that starting from JMeter 3.1 you should be using JSR223 Test Elements and the Groovy language for scripting.

How to connect leaflet map clicks (events) with plot creation in a shiny app

Hello, I am creating an environmental Shiny app in which I want to use a leaflet map to create some simple plots based on the openair package (https://rpubs.com/NateByers/Openair).
aq_measurements() general form:
AQ <- aq_measurements(country = "country", city = "city", location = "location", parameter = "pollutant choice", date_from = "YYYY-MM-DD", date_to = "YYYY-MM-DD")
All parameters are available in the locations dataframe.
worldmet's importNOAA() general form:
met <- importNOAA(code = "12345-12345", year = YYYY:YYYY)
The NOAA code is available in the locations dataframe.
Below I create a sample of my initial data frame:
location = c("100 ail","16th and Whitmore","40AB01 - ANTWERPEN")
lastUpdated = c("2018-02-01 09:30:00", "2018-02-01 03:00:00", "2017-03-07 10:00:00")
firstUpdated = c("2015-09-01 00:00:00","2016-03-06 19:00:00","2016-11-22 15:00:00")
pm25=c("FALSE","FALSE","FALSE")
pm10=c("TRUE","FALSE","FALSE")
no2=c("TRUE","FALSE","FALSE")
latitude=c(47.932907,41.322470,36.809700)
longitude=c(106.92139000, -95.93799000, -107.65170000)
df = data.frame(location, lastUpdated, firstUpdated,latitude,longitude,pm25,pm10,no2)
As a general idea, I want to be able to click on a certain location on the map based on this dataframe. Then I have one selectInput() and two dateInput()s. The two dateInput()s should take as inputs df$firstUpdated and df$lastUpdated respectively, the selectInput() should offer the pollutants that exist in the df based on their "TRUE"/"FALSE" values, and then the plots should be created. All of this should be triggered by clicking on the map.
Up to now I have not been able to achieve this, so to help you understand, I connected the selectInput() and the dateInput()s to input$loc, which is a selectInput() with locations in the first tab; I will not need this once I find the solution.
library(shiny)
library(leaflet)
library(plotly)
library(shinythemes)
library(htmltools)
library(DT)
library(utilr)
library(openair)
library(plotly)
library(dplyr)
library(ggplot2)
library(gissr)
library(ropenaq)
library(worldmet)
# Define UI for application that draws a histogram
ui = navbarPage("ROPENAQ",
tabPanel("CREATE DATAFRAME",
sidebarLayout(
# Sidebar panel for inputs ----
sidebarPanel(
wellPanel(
uiOutput("loc"),
helpText("Choose a Location to create the dataframe.")
)
),
mainPanel(
)
)
),
tabPanel("LEAFLET MAP",
leafletOutput("map"),
wellPanel(
uiOutput("dt"),
uiOutput("dt2"),
helpText("Choose a start and end date for the dataframe creation. Select up to 2 dates")
),
"Select your Pollutant",
uiOutput("pollutant"),
helpText("While all pollutants are listed here, not all pollutants are measured at all locations and all times.
Results may not be available; this will be corrected in further revisions of the app. Please refer to the measurement availability
in the 'popup' on the map."),
hr(),
fluidRow(column(8, plotOutput("tim")),
column(4,plotOutput("polv"))),
hr(),
fluidRow(column(4, plotOutput("win")),
column(8,plotOutput("cal"))),
hr(),
fluidRow(column(12, plotOutput("ser"))
)
)
)
#server.r
# load data
# veh_data_full <- readRDS("veh_data_full.RDS")
# veh_data_time_var_type <- readRDS("veh_data_time_var_type.RDS")
df$location <- gsub( " " , "+" , df$location)
server = function(input, output, session) {
output$pollutant<-renderUI({
selectInput("pollutant", label = h4("Choose Pollutant"),
choices = colnames(df[,6:8]),
selected = 1)
})
#Stores the value of the pollutant selection to pass to openAQ request
###################################
#output$OALpollutant <- renderUI({OALpollutant})
##################################
# create the map, using dataframe 'locations' which is polled daily (using ropenaq)
#MOD TO CONSIDER: addd all available measurements to the popup - true/false for each pollutant, and dates of operation.
output$map <- renderLeaflet({
leaflet(subset(df,(df[,input$pollutant]=="TRUE")))%>% addTiles() %>%
addMarkers(lng = subset(df,(df[,input$pollutant]=="TRUE"))$longitude, lat = subset(df,(df[,input$pollutant]=="TRUE"))$latitude,
popup = paste("Location:", subset(df,(df[,input$pollutant]=="TRUE"))$location, "<br>",
"Pollutant:", input$pollutant, "<br>",
"First Update:", subset(df,(df[,input$pollutant]=="TRUE"))$firstUpdated, "<br>",
"Last Update:", subset(df,(df[,input$pollutant]=="TRUE"))$lastUpdated
))
})
#Process Tab
OAL_site <- reactive({
req(input$map_marker_click)
location %>%
filter(latitude == input$map_marker_click$lat,
longitude == input$map_marker_click$lng)
###########
#call Functions for data retrieval and processing. Might be best to put all data request
#functions into a separate single function. Need to:
# call importNOAA() to retrieve meteorology data into temporary data frame
# call aq_measurements() to retrieve air quality into a temporary data frame
# merge meteorology and air quality datasets into one working dataset for computations; temporary
# meteorology and air quality datasets to be removed.
# call openAir() functions to create plots from merged file. Pass output to a dashboard to assemble
# into appealing output.
# produce output, either as direct download, or as an emailable PDF.
# delete all temporary files and reset for next run.
})
#fun
output$loc<-renderUI({
selectInput("loc", label = h4("Choose location"),
choices = df$location ,selected = 1
)
})
output$dt<-renderUI({
dateInput('date',
label = 'First Available Date',
value = subset(df$firstUpdated,(df[,1]==input$loc))
)
})
output$dt2<-renderUI({
dateInput('date2',
label = 'Last available Date',
value = subset(df$lastUpdated,(df[,1]==input$loc))
)
})
rt<-reactive({
AQ<- aq_measurements(location = input$loc, date_from = input$dt,date_to = input$dt2,parameter = input$pollutant)
met <- importNOAA(year = 2014:2018)
colnames(AQ)[9] <- "date"
merged<-merge(AQ, met, by="date")
# date output -- reports user-selected state & stop dates in UI
merged$location <- gsub( " " , "+" , merged$location)
merged
})
#DT
output$tim = renderPlot({
timeVariation(rt(), pollutant = "value")
})
}
shinyApp(ui = ui, server = server)
The part of my code where I believe input$MAPID_click should be applied is:
output$map <- renderLeaflet({
leaflet(subset(locations,(locations[,input$pollutant]=="TRUE")))%>% addTiles() %>%
addMarkers(lng = subset(locations,(locations[,input$pollutant]=="TRUE"))$longitude, lat = subset(locations,(locations[,input$pollutant]=="TRUE"))$latitude,
popup = paste("Location:", subset(locations,(locations[,input$pollutant]=="TRUE"))$location, "<br>",
"Pollutant:", input$pollutant, "<br>",
"First Update:", subset(locations,(locations[,input$pollutant]=="TRUE"))$firstUpdated, "<br>",
"Last Update:", subset(locations,(locations[,input$pollutant]=="TRUE"))$lastUpdated
))
})
output$dt<-renderUI({
dateInput('date',
label = 'First Available Date',
value = subset(locations$firstUpdated,(locations[,1]==input$loc))
)
})
output$dt2<-renderUI({
dateInput('date2',
label = 'Last available Date',
value = subset(locations$lastUpdated,(locations[,1]==input$loc))
)
})
rt<-reactive({
AQ<- aq_measurements(location = input$loc, date_from = input$dt,date_to = input$dt2)
met <- importNOAA(year = 2014:2018)
colnames(AQ)[9] <- "date"
merged<-merge(AQ, met, by="date")
# date output -- reports user-selected state & stop dates in UI
merged$location <- gsub( " " , "+" , merged$location)
merged
})
#DT
output$tim = renderPlot({
timeVariation(rt(), pollutant = "value")
})
Here is a minimal example. You click on your marker and you get a plot.
ui = fluidPage(
leafletOutput("map"),
textOutput("temp"),
plotOutput('tim')
)
#server.r
#df$location <- gsub( " " , "+" , df$location)
server = function(input, output, session) {
output$map <- renderLeaflet({
leaflet(df)%>% addTiles() %>% addMarkers(lng = longitude, lat = latitude)
})
output$temp <- renderPrint({
input$map_marker_click$lng
})
output$tim <- renderPlot({
temp <- df %>% filter(longitude == input$map_marker_click$lng)
# timeVariation(temp, pollutant = "value")
print(ggplot(data = temp, aes(longitude, latitude)) + geom_point())
})
}
shinyApp(ui = ui, server = server)

SnappyData - SQL put into on jobserver doesn't aggregate values

I'm trying to create a jar to run on the snappy-job shell with streaming.
I have an aggregation function and it works perfectly within the stream windows, but I need to have a table with one value for each key. Based on an example from GitHub I created a jar file, and now I have a problem with the put into SQL command.
My code for aggregation:
val resultStream: SchemaDStream = snsc.registerCQ("select publisher, cast(sum(bid)as int) as bidCount from " +
"AggrStream window (duration 1 seconds, slide 1 seconds) group by publisher")
val conf = new ConnectionConfBuilder(snsc.snappySession).build()
resultStream.foreachDataFrame(df => {
df.write.insertInto("windowsAgg")
println("Data received in streaming window")
df.show()
println("Updating table updateTable")
val conn = ConnectionUtil.getConnection(conf)
val result = df.collect()
val stmt = conn.prepareStatement("put into updateTable (publisher, bidCount) values " +
"(?,?+(nvl((select bidCount from updateTable where publisher = ?),0)))")
result.foreach(row => {
println("row" + row)
val publisher = row.getString(0)
println("publisher " + publisher)
val bidCount = row.getInt(1)
println("bidcount : " + bidCount)
stmt.setString(1, publisher)
stmt.setInt(2, bidCount)
stmt.setString(3, publisher)
println("Prepared Statement after bind variables set: " + stmt.toString())
stmt.addBatch()
}
)
stmt.executeBatch()
conn.close()
})
snsc.start()
snsc.awaitTermination()
}
I have to update or insert into the table updateTable, but during the update the current value has to be added to the one coming from the stream.
And now:
What I see when I execute the code:
select * from updateTable;
PUBLISHER |BIDCOUNT
--------------------------------------------
publisher333 |10
Then I sent message to kafka:
1488487984048,publisher333,adv1,web1,geo1,11,c1
and again select from updateTable:
select * from updateTable;
PUBLISHER |BIDCOUNT
--------------------------------------------
publisher333 |11
The bidCount value is overwritten instead of added.
But when I execute the put into command from snappy-sql shell it works perfectly:
put into updateTable (publisher, bidcount) values ('publisher333',4+
(nvl((select bidCount from updateTable where publisher =
'publisher333'),0)));
1 row inserted/updated/deleted
snappy> select * from updateTable;
PUBLISHER |BIDCOUNT
--------------------------------------------
publisher333 |15
Could you help me with this case? Maybe someone has another solution for inserting or updating values using SnappyData?
Thank you in advance.
The bidCount value is read from the tomi_update table in the streaming case but from updateTable in the snappy-sql case. Is this intentional? Maybe you wanted to use updateTable in both cases?
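For illustration, here is a minimal sketch of the upsert with updateTable used consistently as both the put into target and the table in the nvl sub-select, assuming the same conn and result as in the question (the actual streaming job may read from a different table, as noted above):
// Sketch only: read and write the same table so the nvl lookup sees the previously accumulated value.
val stmt = conn.prepareStatement(
  "put into updateTable (publisher, bidCount) values " +
  "(?, ? + (nvl((select bidCount from updateTable where publisher = ?), 0)))")
result.foreach { row =>
  stmt.setString(1, row.getString(0)) // publisher
  stmt.setInt(2, row.getInt(1))       // bidCount from the current window
  stmt.setString(3, row.getString(0)) // publisher again, for the nvl lookup
  stmt.addBatch()
}
stmt.executeBatch()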

Elasticsearch to Spark Streaming

I'm analyzing logs and I have this architecture:
kafka->spark streaming -> elastic search
My main goal is to create machine learning models in streaming. I think that I can do two things:
1) Kafka->spark Streaming (ML) -> elastic search
2) Kafka->spark Streaming-> elasticsearch -> spark streaming(ML)
- I think the second architecture is the best since Spark Streaming will use indexed data directly. What do you think? Is that correct?
- Can we easily connect Spark Streaming to Elasticsearch in real time?
- If we create a model in Spark Streaming (after Elasticsearch), must we use this model in that place (after Elasticsearch), or can we use it in Spark Streaming directly after Kafka? (By "use" I mean predict in real time.)
- Does creating models after Elasticsearch make our models static (i.e., not a real-time approach)?
Thank you.
Do you mean something like this?
kafka -> spark Streaming -> elasticsearch db
val sqlContext = new SQLContext(sc)
//kafka group
val group_id = "receiveScanner"
// kafka topic
val topic = Map("testStreaming"-> 1)
// zk connect
val zkParams = Map(
"zookeeper.connect" ->"localhost",
"zookeeper.connection.timeout.ms" -> "10000",
"group.id" -> group_id)
// Kafka
val kafkaConsumer = KafkaUtils.createStream[String,String,StringDecoder,StringDecoder](ssc,zkParams,topic,StorageLevel.MEMORY_ONLY_SER)
val receiveData = kafkaConsumer.map(_._2 )
// printer kafka data
receiveData.print()
receiveData.foreachRDD{ rdd=>
val transform = rdd.map{ line =>
val data = Json.parse(line)
// play json parse
val id = (data \ "id").asOpt[Int] match { case Some(x) => x; case None => 0}
val name = ( data \ "name" ).asOpt[String] match { case Some(x)=> x ; case None => "" }
val age = (data \ "age").asOpt[Int] match { case Some(x) => x; case None => 0}
val address = ( data \ "address" ).asOpt[String] match { case Some(x)=> x ; case None => "" }
Row(id,name,age,address)
}
val transfromrecive = sqlContext.createDataFrame(transform,schameType)
import org.apache.spark.sql.functions._
import org.elasticsearch.spark.sql._
//filter age < 20 , to ES database
transfromrecive.where(col("age").<(20)).orderBy(col("age").asc)
.saveToEs("member/user",Map("es.mapping.id" -> "id"))
}
}
/**
* dataframe schame
* */
def schameType = StructType(
StructField("id",IntegerType,false)::
StructField("name",StringType,false)::
StructField("age",IntegerType,false)::
StructField("address",StringType,false)::
Nil
)
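For the second architecture (reading indexed data back out of Elasticsearch into Spark for ML), a minimal sketch might look like the following. It assumes the same sqlContext and the "member/user" index written above, and that the elasticsearch-spark connector is on the classpath with "es.nodes" set in the SparkConf; treat it as an illustration rather than a complete pipeline:
import org.elasticsearch.spark.sql._
// Sketch only: load the documents written above back into a DataFrame for model training.
val members = sqlContext.esDF("member/user")
val training = members.select("id", "age").na.drop()
// ... feed `training` into an MLlib pipeline here ...
training.show()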

jdbcTemplate.queryForList returns list of Map where all column values are NULL

Any call using jdbcTemplate.queryForList returns a list of Maps which have NULL values for all columns. The columns should have had string values.
I do get the correct number of rows when compared to the result I get when I run the same query in a native SQL client.
I am using the JDBC-ODBC bridge and the database is MS SQL Server 2008.
I have the following code in my DAO:
public List internalCodeDescriptions(String listID) {
List rows = jdbcTemplate.queryForList("select CODE, DESCRIPTION from CODE_DESCRIPTIONS where LIST_ID=? order by sort_order asc", new Object[] {listID});
//debugcode start
try {
Connection conn1 = jdbcTemplate.getDataSource().getConnection();
Statement stat = conn1.createStatement();
boolean sok = stat.execute("select code, description from code_descriptions where list_id='TRIGGER' order by sort_order asc");
if(sok) {
ResultSet rs = stat.getResultSet();
ResultSetMetaData rsmd = rs.getMetaData();
String columnname1=rsmd.getColumnName(1);
String columnname2=rsmd.getColumnName(2);
int type1 = rsmd.getColumnType(1);
int type2 = rsmd.getColumnType(2);
String tn1 = rsmd.getColumnTypeName(1);
String tn2 = rsmd.getColumnTypeName(2);
log.debug("Testquery gave resultset with:");
log.debug("Column 1 -name:" + columnname1 + " -typeID:"+type1 + " -typeName:"+tn1);
log.debug("Column 2 -name:" + columnname2 + " -typeID:"+type2 + " -typeName:"+tn2);
int i=1;
while(rs.next()) {
String cd=rs.getString(1);
String desc=rs.getString(2);
log.debug("Row #"+i+": CODE='"+cd+"' DESCRIPTION='"+desc+"'");
i++;
}
} else {
log.debug("Query execution returned false");
}
} catch(SQLException se) {
log.debug("Something went haywire in the debug code:" + se.toString());
}
log.debug("Original jdbcTemplate list result gave:");
Iterator<Map<String, Object>> it1= rows.iterator();
while(it1.hasNext()) {
Map mm = (Map)it1.next();
log.debug("Map:"+mm);
String code=(String)mm.get("CODE");
String desc=(String)mm.get("description");
log.debug("CODE:"+code+" : "+desc);
}
//debugcode end
return rows;
}
As you can see, I've added some debugging code to list the results from queryForList, and I also obtain the connection from the jdbcTemplate object and use it to send the same query using basic JDBC methods (listID='TRIGGER').
What is puzzling me is that the log outputs something like this:
Testquery gave resultset with:
Column 1 -name:code -typeID:-9 -typeName:nvarchar
Column 2 -name:decription -typeID:-9 -typeName:nvarchar
Row #1: CODE='C1' DESCRIPTION='BlodoverxF8rin eller bruk av blodprodukter'
Row #2: CODE='C2' DESCRIPTION='Kodetilfelle, hjertestans/respirasjonstans'
Row #3: CODE='C3' DESCRIPTION='Akutt dialyse'
...
Row #58: CODE='S14' DESCRIPTION='Forekomst av hvilken som helst komplikasjon'
...
Original jdbcTemplate list result gave:
Map:(CODE=null, DESCRIPTION=null)
CODE:null : null
Map:(CODE=null, DESCRIPTION=null)
CODE:null : null
...
58 repetitions total.
Why does the result from the queryForList method return NULL in all columns for every row? How can I get the result I want using jdbcTemplate.queryForList?
The xF8 should be the letter ø, so I have some encoding issues, but I can't see how that would cause all values - including strings that don't contain any strange letters (see row #2) - to turn into NULL values in the list of maps returned from the jdbcTemplate.queryForList method.
The same code ran fine on another server against a MySQL Server 5.5 database using the jdbc driver for MySQL.
The issue was resolved by using the MS SQL Server JDBC driver rather than the JDBC-ODBC bridge. I don't know why it didn't work with the bridge, though.
