Spark UI Output Op Duration vs Job Duration: What's the difference? - spark-streaming

On the Spark UI page, what is the difference between the columns "Output Op Duration" and "Job Duration"?

From the Spark mailing list:
"It means the total time to run a batch, including the Spark job duration plus the time spent on the driver. E.g.,
foreachRDD { rdd =>
  rdd.count() // say this takes 1 second
  Thread.sleep(10000) // sleep 10 seconds
}
In the above example, the Spark job duration is 1 second and the output op duration is 11 seconds."
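
For context, a minimal self-contained sketch showing where such a foreachRDD sits (the socket source, the 30-second batch interval, and the local master are illustrative placeholders, not part of the original answer):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OutputOpVsJobDuration {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OutputOpVsJobDuration").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(30))
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      val n = rdd.count()  // a Spark job: reported as "Job Duration"
      Thread.sleep(10000)  // driver-side work: counted only in "Output Op Duration"
      println(s"count = $n")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

On the Streaming tab, the count() job shows up as the Job Duration, while the driver-side sleep inflates only the Output Op Duration.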

Related

NiFi List File Processor - Interrupt the run schedule and restart it when a new file arrives

Scenario:
I have a List File Processor watching for an incoming file ("file 1"). I scheduled it to start picking up "file 1" 20 seconds after "file 1" is downloaded.
Let's assume that the List File Processor noticed the incoming "file 1" and started the 20-second delay before picking up "file 1".
In the middle of the 20-second delay, a new incoming file ("file 2") is noticed by the List File Processor.
Desired behavior: the List File Processor is interrupted and the 20-second schedule is reset when "file 2" appears within the previous 20-second delay.
Could the List File Processor, in the middle of the 20-second delay, be interrupted so that the initial delay restarts? I would appreciate help with an explanation and an example.
Thanks.

In a performance test, what happens if the test takes 10 seconds but the duration is set to 5 minutes?

My scenario is: I recorded a test that takes 10 seconds to finish. After that, I set a duration of 5 minutes. So my question is: will my test take as long as the duration, or will the test finish in 10 seconds with the result displayed after 5 minutes?
The test will be finished:
when the last Sampler is executed,
or when the time set in "Duration" passes,
whichever comes first.
If you have only 1 loop at the Thread Group level, all the samplers will be executed once; if you have 2 loops, they will be executed twice, etc. The "Duration" constraint applies in either case.
More information: Getting Started with JMeter - A Basic Tutorial

How can I add a dependency to an every (repeating) job in TWS?

How can I add a dependency to an "every" job? For example:
There are 2 jobs in one job stream; both are "every" jobs, meaning they run every 30 minutes, but I want to implement one condition between them.
Condition: the 2nd job will run only after completion of the 1st, every 30 minutes; that is, each instance of the 2nd job will run only after the corresponding instance of the 1st job.
Please give me a solution.
JOB1
  every 30 min
  at 10.30
JOB2
  every 30 min
  at 10.30
  follows JOB1
For this scenario you cannot use "every" on the jobs: that lets each job repeat on its own and, as you have seen, only makes the 2nd job run after the first completion of the 1st job.
In order to have the dependency honored at each run, you have to include the 2 jobs in a job stream and repeat the whole job stream.
There are two possible solutions, depending on your scenario:
Use "every" on the job stream:
SCHEDULE JS1
ON RUNCYCLE RC1 "FREQ=DAILY;INTERVAL=1"
( SCHEDTIME 1030 EVERY 0030 EVERYENDTIME 1800 )
ONOVERLAP ENQUEUE
:
JOB1
JOB2
FOLLOWS JOB1
END
Add a 3rd job after JOB2 that resubmits the job stream using conman sbs. In this case you can use datecalc to calculate the AT time of the new instance.

Spark Streaming: how to sum up all the results for several DStream batches?

I am now using Spark Streaming + Kafka to build my message processing system, but I have a little technical problem, which I will describe below:
For example, I want to do a word count every 10 minutes, so in my earliest code I set the batch interval to 10 minutes. The code is like below:
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(10))
But I don't think this is a very good solution, because 10 minutes is such a long time that my memory cannot hold the large amount of data accumulated. So I want to reduce the batch interval to 1 minute, like:
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(1))
Then the problem comes: how can I sum up the results of ten 1-minute batches into one 10-minute result? I think this work can only be done in the driver instead of in the workers. What can I do?
I am a new learner of Spark Streaming. Can anyone give me a hand?
Maybe I have an idea. In this case I should use a stateful function like updateStateByKey(), because what I want is a global 10-minute result, but what I can get is just the intermediate result of each 1-minute batch. So before each 10 minutes ends, I have to record the state of each 1-minute batch, such as its word count result, and add them up minute by minute.
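
For what it's worth, a minimal sketch of that stateful idea (ssc and counts are assumed to come from the code above, with counts being the per-batch (word, count) DStream; the checkpoint path is a placeholder):

// updateStateByKey keeps a running total per word across batches;
// it requires a checkpoint directory to be set.
ssc.checkpoint("/tmp/streaming-checkpoint") // placeholder path

val updateRunningCount = (newCounts: Seq[Int], state: Option[Int]) =>
  Option(newCounts.sum + state.getOrElse(0))

val runningCounts = counts.updateStateByKey[Int](updateRunningCount)
runningCounts.print()

Note that this keeps a running total since the start of the stream rather than resetting every 10 minutes, so extra bookkeeping would be needed; the window operations described in the answer below are a more direct fit.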
Posting here as I had a similar issue and came across the Window Operations section of the Spark Streaming guide. In the poster's original case, they want a count for the past 10 minutes, computed every 10 minutes, although their program calculates counts every 1 minute. Assuming we have counts defined and calculated as the standard word count (i.e. at a 1-minute batch duration, with tuples (word, count)), we could follow the linked guide and define something along the lines of
// Reduce/count the last 10 minutes' worth of data, every 10 minutes
val windowedWordCounts = counts.reduceByKeyAndWindow(_ + _, Minutes(10), Minutes(10))
where _ + _ is a sum function.
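
For completeness, a minimal runnable sketch under those assumptions; the socket source and checkpoint path are placeholders standing in for the poster's Kafka input:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("WindowedWordCount")
    // 1-minute batches keep per-batch memory pressure low
    val ssc = new StreamingContext(sparkConf, Minutes(1))
    // Checkpointing is recommended for windowed/stateful operations
    ssc.checkpoint("/tmp/streaming-checkpoint") // placeholder path

    val lines = ssc.socketTextStream("localhost", 9999) // stand-in for the Kafka stream
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Sum the 1-minute partial counts over the last 10 minutes, every 10 minutes
    val windowedWordCounts =
      counts.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Minutes(10), Minutes(10))
    windowedWordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

reduceByKeyAndWindow also has an overload taking an inverse reduce function (here subtraction), which computes each window incrementally from the previous one instead of re-reducing all batches; that variant requires checkpointing.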

Total time taken by JMeter to execute the given load

I am performing a load test with these parameters:
threads=4
ramp_up_period=90
loop_count=60
So according to the above numbers, my assumption is that each of the four threads will be created 22.5 seconds (90/4) after the previous one, and this 4-thread cycle will be repeated 60 times.
Below is the summarized load test report:
According to the JMeter manual, the ramp-up period is:
The ramp-up period tells JMeter how long to take to "ramp-up" to the full number of threads chosen. If 10 threads are used, and the ramp-up period is 100 seconds, then JMeter will take 100 seconds to get all 10 threads up and running. Each thread will start 10 (100/10) seconds after the previous thread was begun. If there are 30 threads and a ramp-up period of 120 seconds, then each successive thread will be delayed by 4 seconds.
So according to the above, the approximate total time for executing the load test with the mentioned Thread Group parameters is:
TotalTime = ramp_up_period * loop_count
which in my case evaluates to 90 * 60 = 5400 seconds, but according to the summariser the total time comes out to 74 seconds.
JMeter version is 2.11.
Is there a problem in my understanding, or is there some issue with JMeter?
Initially JMeter will start 1 thread, which will be doing whatever is under your Loop Controller. 22.5 seconds later (90/4) the second thread will join, after 22.5 more seconds the 3rd thread will start, and finally at about the 67.5-second mark the 4th thread will start.
From that point on, all 4 threads will be doing "what is under your loop controller".
There is no way to determine in advance how long that will take, especially under load. If you need a load test to last approximately N seconds, you can use the Duration input under the Scheduler section of the Thread Group.
If you want to forcefully stop the test when certain conditions are met, there are 2 more options:
Use Test Action Sampler
Use Beanshell Sampler
Example Beanshell code (assumed to run in a separate Thread Group, in an endless loop, with a reasonable delay between iterations):
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import org.apache.jmeter.threads.JMeterContextService;

long currenttime = System.currentTimeMillis();
long teststart = JMeterContextService.getTestStartTime();
// "test_run_time" (in milliseconds) is expected to be defined as a JMeter property
if (currenttime - teststart > Long.parseLong(props.get("test_run_time").toString())) {
    try {
        // Send "StopTestNow" to JMeter's shutdown listener (default port 4445)
        DatagramSocket socket = new DatagramSocket();
        byte[] buf = "StopTestNow".getBytes("ASCII");
        InetAddress address = InetAddress.getByName("localhost");
        DatagramPacket packet = new DatagramPacket(buf, buf.length, address, 4445);
        socket.send(packet);
        socket.close();
    } catch (Throwable ex) {
        // ignore; nothing sensible to do if the stop signal cannot be sent
    }
}
TotalTime would only come out that way if you were working without concurrency. When working in a multi-threaded environment, it can happen that thread 1 is already performing its second call while thread 3 is still firing up.
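For illustration, a rough model (the ~0.1-second response time is an assumption, not a measured value): the total time is roughly the start time of the last thread plus the time that thread needs for its loops. With 4 threads and a 90-second ramp-up, the last thread starts at 3 * 22.5 = 67.5 seconds; if each of its 60 samples takes about 0.1 seconds, it finishes roughly 6 seconds later, around 73.5 seconds, which is in line with the reported 74 seconds.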
