Is there a way to extract window start time and window end time in Spark Streaming windowing? - spark-streaming

I have a DStream on which I use the window method. Subsequently I do other operations like reduceByKey. Is it possible to add the window start time and end time to the DStream data and use them as a key?
Consider a DStream with the following input schema:
(call_id, call_duration, call_count)
After a window operation on the DStream, is it possible to produce the following output?
(window_start_time, window_end_time, average_call_duration, total_call_counts)

dstream.foreachRDD((rdd, time) => {
  // `time` is the scheduler time for the batch job; its interval is your window/slide length.
})
Use the time as the parameter of your ETL function.
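For concreteness, here is a minimal sketch using the Java API, under the assumption of a hypothetical 5-minute window sliding every minute and the (call_id, (call_duration, call_count)) pairing from the question. For a windowed DStream, the batch time passed to foreachRDD marks the end of the window, so subtracting the window length gives the start:

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// `calls` is a hypothetical JavaPairDStream<String, Tuple2<Long, Long>>
// keyed by call_id, with (call_duration, call_count) as the value.
JavaPairDStream<String, Tuple2<Long, Long>> windowed =
    calls.reduceByKeyAndWindow(
        (a, b) -> new Tuple2<>(a._1() + b._1(), a._2() + b._2()),
        Durations.minutes(5),   // window length (assumed)
        Durations.minutes(1));  // slide interval (assumed)

windowed.foreachRDD((rdd, time) -> {
    // `time` is the batch time for this job, i.e. the end of the current window.
    long windowEnd = time.milliseconds();
    long windowStart = windowEnd - Durations.minutes(5).milliseconds();
    rdd.foreach(record -> {
        double avgDuration = (double) record._2()._1() / record._2()._2();
        System.out.println(windowStart + "," + windowEnd + ","
            + avgDuration + "," + record._2()._2());
    });
});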

Related

SAP Script Recording and Playback - Cumulative data

If I use the Script Recording and Playback feature multiple times on the same transaction, for instance ME25, I get cumulative data in the later scripts rather than incremental data.
Explanation:
If I open ME25 details and enter "100-310" as Material and "Ball Bearing" as Short Text and stop the recording, I get the following script, which is expected behavior.
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/ctxtEBAN-MATNR[3,0]").text = "100-310"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/txtEBAN-TXZ01[4,0]").text = "Ball Bearing"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/txtEBAN-TXZ01[4,0]").setFocus
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/txtEBAN-TXZ01[4,0]").caretPosition = 12
After this, I restart the recording, type the Qty Requested as "100" and the delivery date as "21.04.2021", and stop the recording. I get the following script:
session.findById("wnd[0]").maximize
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/ctxtEBAN-MATNR[3,0]").text = "100-310"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/txtEBAN-TXZ01[4,0]").text = "Ball Bearing"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/txtEBAN-MENGE[5,0]").text = "100"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/ctxtRM06B-EEIND[8,0]").text = "21.04.2021"
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/ctxtEBAN-EKGRP[9,0]").setFocus
session.findById("wnd[0]/usr/tblSAPMM06BTC_0106/ctxtEBAN-EKGRP[9,0]").caretPosition = 0
Instead of getting just the incremental part that I typed during the second recording, I get the complete script. Is there a way to achieve incremental scripts?
I can reproduce this in my SAP GUI 7.60 (whatever the screen, whatever the kind of field; I can reproduce it even with very simple fields, such as those in a selection screen).
It seems to happen all the time, even if you write your own recorder (a VBS script which mainly uses session.record = True plus event handlers). It is due to the fact that SAP GUI sends all the screen events (i.e. the user actions) accumulated since the screen was entered whenever the user presses a button or a function key, closes a window, or stops the SAP GUI Recorder.
If you write your own recorder, I guess you could save the screen contents when you start the recorder, and when "change" events occur, ignore those where the field value has not changed since the recorder started. But that's a lot of work.
I think it is far easier to manually append the last lines of the second script to the initial script.

How can write performance be improved for RecordWriter?

Can anyone help me find the correct API to improve write performance?
We use the MultipleOutputs<ImmutableBytesWritable, Result> class to write data we read from a table, and we use the newly created file as a backup. We face a performance issue when writing through MultipleOutputs: it takes nearly 5 seconds for every 10000 records we write.
This is the code we use:
Result[] results = // results from another table
// MultipleOutputs requires the map/reduce task context
MultipleOutputs<ImmutableBytesWritable, Result> mos = new MultipleOutputs<>(context);
for (Result res : results) {
    mos.write(new ImmutableBytesWritable(res.getRow()), res, baseoutputpath);
}
We get a batch of 10000 rows and write them in a loop, with baseoutputpath changing depending on the Result content.
We see the performance dip when writing through MultipleOutputs, and we suspect it might be due to writing in a loop.
Is there any other API in MapR-DB or HBase that pushes data to the database with fewer RPC calls by buffering up to a certain limit?
We write data as records, so no plain file-system write class would work for us.
Please note that we use a MapReduce job to do all of the above.
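If writing into an HBase (or MapR-DB binary) table is acceptable as the backup target, one API matching this description is HBase's client-side BufferedMutator, which buffers mutations and flushes them in batches so that many rows share one RPC. A minimal sketch, in which the table name and the surrounding method are assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

// Hypothetical backup method; "backup_table" is an assumed table name.
void backupBatch(Result[] results) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("backup_table"))) {
        for (Result res : results) {
            Put put = new Put(res.getRow());
            for (Cell cell : res.listCells()) {
                put.add(cell);       // copy each cell of the source row into the Put
            }
            mutator.mutate(put);     // buffered client-side, not one RPC per row
        }
        mutator.flush();             // send any remaining buffered mutations
    }
}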

How to do aggregation on fixed-size count-based sliding window?

How do I implement a sliding window aggregation (or transformation) with a fixed-size count-based window?
For example, if I have stream data like the following:
input stream = 1,2,3,4,5,6,7,8...
Assume that time is not relevant here, my aggregate function is AVERAGE, and the window size is fixed at 3 records (not 3 ms, 3 seconds, 3 hours, etc.). I would like my output stream to be
output stream = avg(1,2,3), avg(2,3,4), avg(3,4,5), avg(4,5,6), avg(5,6,7)... = 2,3,4,5,6...
The windows documented in Kafka Streams are time-based. Even the constructor of the base class Window has the following signature:
Window(long startMs, long endMs)
So I was not sure if it is the right tool for non-time-based windowed aggregation.
Apache Flink supports count-based sliding and tumbling windows. That's exactly what I need, but I'm looking for a similar feature in Kafka Streams.
If time-ordering is no concern for you, you can implement a custom Transformer with attached state.
StreamsBuilder builder = new StreamsBuilder();
builder.addStateStore(...); // add a KeyValueStore here
KStream result = builder.stream("topic").transform(...); // pass in the name of your KeyValueStore, too
In your custom Transformer you can maintain a List per key, the list being your window: as long as the list is smaller than your window size, you append the new record to it; if it is exactly the size, you trigger the computation; if it exceeds the size, you trim it and trigger the computation afterwards. A sketch of such a Transformer follows the link below.
See the docs for more details: https://kafka.apache.org/10/documentation/streams/developer-guide/processor-api.html (Note that a Processor and a Transformer are basically the same thing.)
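A hypothetical sketch of such a Transformer, assuming a recent Kafka Streams API, a store named "window-store", a window size of 3, and a store serde able to handle lists (on the 1.0-era API you would additionally override the deprecated punctuate method):

import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class SlidingAverageTransformer
        implements Transformer<String, Double, KeyValue<String, Double>> {

    private static final int WINDOW_SIZE = 3;          // assumed window size
    private KeyValueStore<String, List<Double>> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // "window-store" is an assumed store name, registered via addStateStore()
        store = (KeyValueStore<String, List<Double>>) context.getStateStore("window-store");
    }

    @Override
    public KeyValue<String, Double> transform(String key, Double value) {
        List<Double> window = store.get(key);
        if (window == null) window = new ArrayList<>();
        window.add(value);
        if (window.size() > WINDOW_SIZE) window.remove(0); // trim the oldest record
        store.put(key, window);
        if (window.size() < WINDOW_SIZE) return null;      // window not full yet: emit nothing
        double avg = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        return KeyValue.pair(key, avg);                    // emit one average per input record
    }

    @Override
    public void close() {}
}

You would then wire it in with builder.stream("topic").transform(SlidingAverageTransformer::new, "window-store").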
If you wish to use Apache Storm, which is also a streaming engine, Kafka can be connected to it as a data source. Newer Storm versions provide a concept called tumbling windows, which deliver an exact number of tuples to your topology, and Storm likewise supports count-based sliding windows. This can easily be used to solve your problem; a sketch is shown below.
For more, have a look at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_storm-component-guide/content/storm-windowing-concepts.html
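For illustration, a minimal Storm sketch under assumed names (the tuple field "value" and the spout id "numbers-spout" are assumptions); withWindow(Count.of(3), Count.of(1)) gives a 3-tuple window sliding by one tuple, matching the example in the question:

import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.topology.base.BaseWindowedBolt.Count;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.windowing.TupleWindow;

// Averages the numbers in each count-based window.
public class AverageBolt extends BaseWindowedBolt {
    @Override
    public void execute(TupleWindow window) {
        if (window.get().isEmpty()) return;   // guard against a partial/empty window
        double sum = 0;
        for (Tuple t : window.get()) {
            sum += t.getDoubleByField("value"); // "value" is an assumed field name
        }
        System.out.println("window avg = " + sum / window.get().size());
    }
}

// Wiring: window of 3 tuples, sliding by 1 tuple:
// builder.setBolt("avg", new AverageBolt().withWindow(Count.of(3), Count.of(1)))
//        .shuffleGrouping("numbers-spout");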

Visual Basic: How to use a timer properly

I'm trying to write a simple program that performs some tasks at specified times.
Here's what I have:
If (TimeOfDay = "06:12:50") Then
    MsgBox(TimeOfDay)
End If
If (TimeOfDay = "06:13:58") Then
    MsgBox(TimeOfDay)
End If
This code is placed inside Timer1_Tick. I set the timer interval to 1000 and it works OK: I get the TimeOfDay value in a MsgBox when the current time equals my specified time.
But what should I do to make it work dynamically? For example, I want to type a time value into a TextBox and have Timer1_Tick compare against it, as many times as I want, so that every time the current time matches my specified hour, minute, and second it fires. I don't know where to put my code, because if I place it in a while loop inside Timer1_Tick, the loop runs every second and the UI crashes immediately.
Thank you in advance for your help!
Have you considered scheduling a Windows MSG to yourself using the AT command line? The operating system's timer/scheduler, dialog, storage, and queue are already there, and the MSG can optionally be dismissed if no one receives it within a set amount of time. For example, to display the time at 06:12:15, run the following in a command shell:
AT 06:12:15 msg %USERNAME% It is 06:12:15 am

ns3 execute function during simulation at regular time interval

How can I execute a function every 20 µs, for example?
I aim to log the busy/free state of a CSMA channel using ns3::CsmaChannel::IsBusy (is that the best way to do it?). I have to call it at a regular time interval to log each returned value and its time.
You can use the Simulator::Schedule function to schedule an event after a given delay. Schedule fires once, so to run at a regular interval, have the scheduled function re-schedule itself. The syntax is:
Simulator::Schedule (MicroSeconds (20), &FunctionName, parameter1, parameter2);
This runs FunctionName 20 microseconds from now with the given parameters; calling Simulator::Schedule again inside FunctionName with the same delay makes it fire every 20 microseconds.
