KStream left join doesn't produce record on left side null - spring

I have the following left-join-to-sink code, which should create a record whenever healthStream gets a message, but it only produces a record when both streams have a message.
val cncStream = streamsBuilder.stream(config.cncManagerTopic,
    Consumed.with<String, AgentMessage>(Serdes.String(), AgentMessageSerde()))
val healthStream = streamsBuilder.stream(config.healthTopic,
    Consumed.with<String, AgentMessage>(Serdes.String(), AgentMessageSerde()))

healthStream.leftJoin(cncStream,
    { healthMsg, logsMessage ->
        Agent(
            healthMsg.generateKey(), healthMsg.context.customerId, healthMsg.context.agentId,
            healthMsg.healthMsg.status.toString(), healthMsg.healthMsg.message,
            LocalDateTime.now(), healthMsg.healthMsg.metrics,
            LogStatus(
                logsMessage?.logsUploadedMsg?.ok == true,
                logsMessage?.logsUploadedMsg?.error.orEmpty(),
                logsMessage?.logsUploadedMsg?.path.orEmpty()))
    },
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
    StreamJoined.`as`(KafkaAgentsConfiguration.GetStoreName(config.agentsTopic)))
    .to(config.agentsTopic, Produced.with(Serdes.String(), AgentSerde()))
Is it a bug? Am I doing something wrong?
Changing the JoinWindows duration doesn't change the results.

Related

Not getting right value from state store

I am trying to use a state store to merge multiple Kafka streams. As part of this, I consume messages from multiple topics and put them in the state store with keys, for example:
a message from topic1 is saved in the state store as key_p1 and value1
a message from topic2 is saved in the state store as key_p2 and value2
a message from topic3 is saved in the state store as key_p3 and value3.
To meet the SLA, I tried to query the state store to verify whether I received the mandatory transactions (for example on topic2 and topic3, with keys key_p2 and key_p3).
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2,key_p3)
Though I have the messages in the state store, most of the time (not always) I get only one message.
Is there a way to refresh the store before querying?
Code in my transform method:
override def transform(kafkaKey: String, value: String): KeyValue[String, String] = {
  var key = ""
  var taggedMsg = ""
  if ((kafkaKey != null) && (kafkaKey != "")) {
    key = kafkaKey + txnKeys.get(msgTag).get // this will be like key_p1, key_p2, key_p3 etc.
    kvStore.put(key, ValueAndTimestamp.make(taggedMsg, context.timestamp))
    log.info("saving value in state store with key " + key + " and value " + kvStore.get(key))
  }
  null // nothing is forwarded downstream in this snippet
}
and code in init(context: ProcessorContext):
val priorityTxns: KeyValueIterator[String, ValueAndTimestamp[String]] = kvStore.range(key_p2, key_p3)
val tempP3 = kvStore.get(rangeKey + "_p3")
val tempP2 = kvStore.get(rangeKey + "_p2")
while (priorityTxns.hasNext) {
  log.info("available priority keys " + priorityTxns.peekNextKey())
  val e = priorityTxns.next()
}

SQLRPGLE & JSON_OBJECT CTE Statements -101 Error

This program compiles correctly (we are on V7R3), but when running it receives an SQLCOD of -101 and an SQLSTATE of 54011, which states: "Too many columns were specified for a table, view, or table function." This is a very small JSON document being created, so I do not think that is the issue.
The RPGLE code:
dcl-s OutFile sqltype(dbclob_file);
xfil_tofile = '/ServiceID-REFCODJ.json';
Clear OutFile;
OutFile_Name = %TrimR(XFil_ToFile);
OutFile_NL = %Len(%TrimR(OutFile_Name));
OutFile_FO = IFSFileCreate;
OutFile_FO = IFSFileOverWrite;
exec sql
With elm (erpRef) as (select json_object
('ServiceID' VALUE trim(s.ServiceID),
'ERPReferenceID' VALUE trim(i.RefCod) )
FROM PADIMH I
INNER JOIN PADGUIDS G ON G.REFCOD = I.REFCOD
INNER JOIN PADSERV S ON S.GUID = G.GUID
WHERE G.XMLTYPE = 'Service')
, arr (arrDta) as (values json_array (
select erpRef from elm format json))
, erpReferences (refs) as ( select json_object ('erpReferences' :
arrDta Format json) from arr)
, headerData (hdrData) as (select json_object(
'InstanceName' : trim(Cntry) )
from padxmlhdr
where cntry = 'US')
VALUES (
select json_object('header' : hdrData format json,
'erpReferenceData' value refs format json)
from headerData, erpReferences )
INTO :OutFile;
Any help with this would be very much appreciated; this is our first attempt at creating JSON for sending, and we have not experienced this issue before.
Thanks,
John
I am sorry for the delay in getting back to this issue. It has been corrected; the issue was with the VALUES statement.
This is the corrected code needed to make it work:
Select json_object('header' : hdrData format json,
                   'erpReferenceData' value refs format json)
INTO :OutFile
From headerData, erpReferences
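For reference, the full statement with that correction applied would look roughly like this (the CTEs are unchanged from the original question; only the VALUES (...) wrapper around the final query is replaced by a plain SELECT ... INTO):
exec sql
  With elm (erpRef) as (
    select json_object('ServiceID' VALUE trim(s.ServiceID),
                       'ERPReferenceID' VALUE trim(i.RefCod))
    FROM PADIMH I
    INNER JOIN PADGUIDS G ON G.REFCOD = I.REFCOD
    INNER JOIN PADSERV S ON S.GUID = G.GUID
    WHERE G.XMLTYPE = 'Service'),
  arr (arrDta) as (values json_array(select erpRef from elm format json)),
  erpReferences (refs) as (
    select json_object('erpReferences' : arrDta Format json) from arr),
  headerData (hdrData) as (
    select json_object('InstanceName' : trim(Cntry))
    from padxmlhdr
    where cntry = 'US')
  Select json_object('header' : hdrData format json,
                     'erpReferenceData' value refs format json)
  INTO :OutFile
  From headerData, erpReferences;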

The name 'devicetypes' is not in scope on the right side of 'equals'. Consider swapping the expressions on either side of 'equals'

I am trying to get data out of my db, but I am getting the above-mentioned error on this line. Please help!
join specvalue in db.Types on devicespecifications.DeviceTypeFKID equals devicetypes.DeviceTypeID
I have tried swapping the expressions on either side of equals, but it doesn't work.
List<DeviceDetails> devicedetails = (
from devices in db.Device
join devicespecifications in db.DeviceSpecifications on devices.DeviceID equals devicespecifications.DeviceFKID
join devicetypes in db.Types on devices.DeviceTypeFKID equals devicetypes.DeviceTypeID
join specvalue in db.Types on devicespecifications.DeviceTypeFKID equals devicetypes.DeviceTypeID // This Line is giving me the above mentioned error
join devicehistories in db.DeviceHistory on devices.DeviceID equals devicehistories.DeviceFKID
join locations in db.Locations on devices.LocationFKID equals locations.LocationID
join ips in db.IP on devices.DeviceID equals ips.DeviceFKID
where devices.DeviceID == id
select new DeviceDetails()
{
DeviceID = devices.DeviceID,
DeviceName = devices.DeviceName,
EntryDate = devices.EntryDate,
AssignDate = devices.AssignDate,
DeviceStatus = devices.DeviceStatus.ToString(),
MACAddress = devices.MACAddress,
DateRepaired= devicehistories.DateRepaired,
Remarks= devicehistories.Remarks,
SpecificationType = devicespecifications.DeviceTypeFKID,
DeviceTypeID = devicetypes.DeviceTypeID,
SpecificationValue = devicespecifications.SpecificationValue,
FamilyIP = ips.FamilyIP,
ChildIP = ips.ChildIP,
LocationTypeValue = locations.LocationTypeValue,
DeviceTypeValue = devicetypes.DeviceTypeValue
}).ToList<DeviceDetails>();
return devicedetails;
}
In the mentioned row:
join specvalue in db.Types on devicespecifications.DeviceTypeFKID equals devicetypes.DeviceTypeID
you use the devicetypes name again, but you should use specvalue in this line.
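A sketch of the corrected line, assuming specvalue should be matched on its own DeviceTypeID as described above:
join specvalue in db.Types on devicespecifications.DeviceTypeFKID equals specvalue.DeviceTypeID
In LINQ query syntax, the expression on the right side of equals may only refer to the range variable introduced by that join clause (here specvalue), which is why the compiler reports that devicetypes is not in scope there.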

read json key-values with hive/sql and spark

I am trying to read this JSON file into a Hive table; the top-level keys (i.e. 1, 2, ...) here are not consistent.
{
"1":"{\"time\":1421169633384,\"reading1\":130.875969,\"reading2\":227.138275}",
"2":"{\"time\":1421169646476,\"reading1\":131.240628,\"reading2\":226.810211}",
"position": 0
}
I only need the time and readings 1 and 2 as columns in my Hive table, and can ignore position.
I can also do a combination of a Hive query and Spark map-reduce code.
Thank you for the help.
Update: here is what I am trying:
val hqlContext = new HiveContext(sc)
val rdd = sc.textFile(data_loc)
val json_rdd = hqlContext.jsonRDD(rdd)
json_rdd.registerTempTable("table123")
println(json_rdd.printSchema())
hqlContext.sql("SELECT json_val from table123 lateral view explode_map( json_map(*, 'int,string')) x as json_key, json_val ").foreach(println)
It throws the following error:
Exception in thread "main" org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: SELECT json_val from temp_hum_table lateral view explode_map( json_map(*, 'int,string')) x as json_key, json_val
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:239)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
This would work if you rename "1" and "2" (the key names) to "x1" and "x2" (inside the JSON file or in the RDD):
val resultrdd = sqlContext.sql("SELECT x1.time, x1.reading1, x1.reading2, x2.time, x2.reading1, x2.reading2 from table123 ")
resultrdd.flatMap(row => (Array( (row(0),row(1),row(2)), (row(3),row(4),row(5)) )))
This would give you an RDD of tuples with time, reading1 and reading2. If you need a SchemaRDD, you would map it to a case class inside the flatMap transformation, like this:
case class Record(time: Long, reading1: Double, reading2: Double)
val recordRdd = resultrdd.flatMap(row => Array(
  Record(row.getLong(0), row.getDouble(1), row.getDouble(2)),
  Record(row.getLong(3), row.getDouble(4), row.getDouble(5))))
val schrdd = sqlContext.createSchemaRDD(recordRdd)
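If you go the rename-in-the-RDD route mentioned above, a minimal sketch could be a plain string replace on the raw text RDD before calling jsonRDD (naive, and it assumes "1" and "2" only appear as top-level field names):
// Rename the numeric top-level keys so they become valid identifiers in SQL,
// then build the JSON SchemaRDD from the renamed text RDD as before.
val renamedRdd = rdd.map(_.replace("\"1\":", "\"x1\":").replace("\"2\":", "\"x2\":"))
val json_rdd = hqlContext.jsonRDD(renamedRdd)
json_rdd.registerTempTable("table123")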
Update:
In the case of many nested keys, you can parse the row like this:
val allrdd = sqlContext.sql("SELECT * from table123")
allrdd.flatMap(row => {
  var recs = Array[Record]()
  for (col <- 0 to row.length - 1) {
    row(col) match {
      case r: Row => recs = recs :+ Record(r.getLong(2), r.getDouble(0), r.getDouble(1))
      case _ =>
    }
  }
  recs
})

How can I make this big nasty query more efficient?

This is for a PostgreSQL database. I have a view where I join a table to itself 15 times to create what we call levels; from there I do a COALESCE over the 'levels' and then a little manipulation on that field as well. I am also pulling a description field for each of the 15 levels, and this is where my query became sluggishly slow. I am joining the SETHEADERT table to the multiple levels to get the description field for each level. As you can see I only have 3 description fields so far and it is already taking very long to run; with 2 it took a little while but wasn't bad. I hope this makes sense. My code is below. Any help on how to make this more efficient is greatly appreciated.
SELECT
subset_cls,
prctr1,
CASE
WHEN prctr1 LIKE 'PC%' THEN split_part( overlay(prctr1 placing '00000' from 1 for 2 ),'.',1)
ELSE prctr1 end as pctrl2,
LVL01,
desc01,
LVL02,
desc02
FROM
( SELECT
SRC.SAP_SETNODE.SUBSET_CLS AS SUBSET_CLS,
SRC.SAP_SETHEADERT.DESCRIPTION AS desc01,
DESC_02.DESCRIPTION AS desc02,
DESC_03.DESCRIPTION AS desc03,
SRC.SAP_SETNODE.SET_NAME AS LVL01,
SRC.SAP_SETNODE.SUBSET_NAME AS LVL02,
SETNODE_1.SUBSET_NAME AS LVL03,
SETNODE_2.SUBSET_NAME AS LVL04,
SETNODE_3.SUBSET_NAME AS LVL05,
SETNODE_4.SUBSET_NAME AS LVL06,
SETNODE_5.SUBSET_NAME AS LVL07,
SETNODE_6.SUBSET_NAME AS LVL08,
SETNODE_7.SUBSET_NAME AS LVL09,
SETNODE_8.SUBSET_NAME AS LVL10,
SETNODE_9.SUBSET_NAME AS LVL11,
SETNODE_10.SUBSET_NAME AS LVL12,
SETNODE_11.SUBSET_NAME AS LVL13,
SETNODE_12.SUBSET_NAME AS LVL14,
SETNODE_13.SUBSET_NAME AS LVL15,
COALESCE(
SETNODE_13.SUBSET_NAME,
SETNODE_12.SUBSET_NAME,
SETNODE_11.SUBSET_NAME,
SETNODE_10.SUBSET_NAME,
SETNODE_9.SUBSET_NAME,
SETNODE_8.SUBSET_NAME,
SETNODE_7.SUBSET_NAME,
SETNODE_6.SUBSET_NAME,
SETNODE_5.SUBSET_NAME,
SETNODE_4.SUBSET_NAME,
SETNODE_3.SUBSET_NAME,
SETNODE_2.SUBSET_NAME,
SETNODE_1.SUBSET_NAME,
SRC.SAP_SETNODE.SUBSET_NAME,
SRC.SAP_SETNODE.SET_NAME)
AS prctr1
FROM SRC.SAP_SETNODE
LEFT JOIN SRC.SAP_SETHEADERT ON SRC.SAP_SETHEADERT.SET_NAME = SRC.SAP_SETNODE.SET_NAME
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_1 ON SRC.SAP_SETNODE.SUBSET_NAME = SETNODE_1.SET_NAME
AND SRC.SAP_SETNODE.SUBSET_CLS = SETNODE_1.SUBSET_CLS
LEFT JOIN SRC.SAP_SETHEADERT as DESC_02 ON DESC_02.SET_NAME = SETNODE_1.SET_NAME
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_2 ON SETNODE_1.SUBSET_NAME = SETNODE_2.SET_NAME
AND SETNODE_1.SUBSET_CLS = SETNODE_2.SUBSET_CLS
LEFT JOIN SRC.SAP_SETHEADERT as DESC_03 ON DESC_03.SET_NAME = SETNODE_2.SET_NAME
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_3 ON SETNODE_2.SUBSET_NAME = SETNODE_3.SET_NAME
AND SETNODE_2.SUBSET_CLS = SETNODE_3.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_4 ON SETNODE_3.SUBSET_NAME = SETNODE_4.SET_NAME
AND SETNODE_3.SUBSET_CLS = SETNODE_4.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_5 ON SETNODE_4.SUBSET_NAME = SETNODE_5.SET_NAME
AND SETNODE_4.SUBSET_CLS = SETNODE_5.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_6 ON SETNODE_5.SUBSET_NAME = SETNODE_6.SET_NAME
AND SETNODE_5.SUBSET_CLS = SETNODE_6.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_7 ON SETNODE_6.SUBSET_NAME = SETNODE_7.SET_NAME
AND SETNODE_6.SUBSET_CLS = SETNODE_7.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_8 ON SETNODE_7.SUBSET_NAME = SETNODE_8.SET_NAME
AND SETNODE_7.SUBSET_CLS = SETNODE_8.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_9 ON SETNODE_8.SUBSET_NAME = SETNODE_9.SET_NAME
AND SETNODE_8.SUBSET_CLS = SETNODE_9.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_10 ON SETNODE_9.SUBSET_NAME = SETNODE_10.SET_NAME
AND SETNODE_9.SUBSET_CLS = SETNODE_10.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_11 ON SETNODE_10.SUBSET_NAME = SETNODE_11.SET_NAME
AND SETNODE_10.SUBSET_CLS = SETNODE_11.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_12 ON SETNODE_11.SUBSET_NAME = SETNODE_12.SET_NAME
AND SETNODE_11.SUBSET_CLS = SETNODE_12.SUBSET_CLS
LEFT JOIN SRC.SAP_SETNODE AS SETNODE_13 ON SETNODE_12.SUBSET_NAME = SETNODE_13.SET_NAME
AND SETNODE_12.SUBSET_CLS = SETNODE_13.SUBSET_CLS
GROUP BY SRC.SAP_SETNODE.SUBSET_CLS, SRC.SAP_SETHEADERT.DESCRIPTION, DESC_02.DESCRIPTION,
DESC_03.DESCRIPTION, SRC.SAP_SETNODE.SET_NAME,
SRC.SAP_SETNODE.SUBSET_NAME, SETNODE_1.SUBSET_NAME, SETNODE_2.SUBSET_NAME,
SETNODE_3.SUBSET_NAME, SETNODE_4.SUBSET_NAME, SETNODE_5.SUBSET_NAME,
SETNODE_6.SUBSET_NAME, SETNODE_7.SUBSET_NAME, SETNODE_8.SUBSET_NAME,
SETNODE_9.SUBSET_NAME, SETNODE_10.SUBSET_NAME, SETNODE_11.SUBSET_NAME,
SETNODE_12.SUBSET_NAME, SETNODE_13.SUBSET_NAME
HAVING SRC.SAP_SETNODE.SUBSET_CLS='0101' AND SRC.SAP_SETNODE.SET_NAME='SISW.'
||get_fy_part('YEAR', clock_timestamp())
ORDER BY SRC.SAP_SETNODE.SET_NAME, SRC.SAP_SETNODE.SUBSET_NAME, SETNODE_1.SUBSET_NAME,
SETNODE_2.SUBSET_NAME, SETNODE_3.SUBSET_NAME, SETNODE_4.SUBSET_NAME, SETNODE_5.SUBSET_NAME,
SETNODE_6.SUBSET_NAME, SETNODE_7.SUBSET_NAME
) foo
If you move the filtering to the WHERE clause, there will be fewer rows to join. Change this:
group by ...
HAVING
SRC.SAP_SETNODE.SUBSET_CLS = '0101' AND
SRC.SAP_SETNODE.SET_NAME = 'SISW.' || get_fy_part('YEAR', clock_timestamp())
to
where
SRC.SAP_SETNODE.SUBSET_CLS = '0101' AND
SRC.SAP_SETNODE.SET_NAME = 'SISW.' || get_fy_part('YEAR', clock_timestamp())
group by ...
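The same point, reduced to a toy example (the table names here are hypothetical stand-ins for SRC.SAP_SETNODE and SRC.SAP_SETHEADERT; only the placement of the filter changes):
-- Hypothetical toy tables, just to illustrate filter placement
CREATE TABLE setnode   (subset_cls text, set_name text, subset_name text);
CREATE TABLE setheadert(set_name text, description text);

-- Before: the filter sits in HAVING, so every row is joined and grouped first
SELECT n.subset_cls, n.set_name, n.subset_name, h.description
FROM setnode n
LEFT JOIN setheadert h ON h.set_name = n.set_name
GROUP BY n.subset_cls, n.set_name, n.subset_name, h.description
HAVING n.subset_cls = '0101';

-- After: the same filter in WHERE is evaluated before grouping, so far fewer
-- rows have to flow through the joins and the GROUP BY
SELECT n.subset_cls, n.set_name, n.subset_name, h.description
FROM setnode n
LEFT JOIN setheadert h ON h.set_name = n.set_name
WHERE n.subset_cls = '0101'
GROUP BY n.subset_cls, n.set_name, n.subset_name, h.description;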

Resources