questions on time usage reported by SLURM - time

I have problems understanding the time usage report below:
1) Why do the times for job steps 1 & 2 not add up to the batch line?
2) What is the relationship between the columns, especially TotalCPU and CPUTime?
3) For the time usage of the job, which one is best to report?
$ sacct -o JOBID,AllocCPUs,AveCPU,reqcpus,systemcpu,usercpu,totalcpu,cputime,cputimeraw -j 649176
       JobID  AllocCPUS     AveCPU  ReqCPUS  SystemCPU    UserCPU   TotalCPU    CPUTime CPUTimeRAW
------------ ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
649176               24                  24  00:02.047  01:06.896  01:08.943   00:23:36       1416
649176.batch         24   00:00:00       24  00:00.027  00:00.014  00:00.041   00:23:36       1416
649176.0             24   00:00:00       24  00:00.813  00:24.886  00:25.699   00:08:48        528
649176.1             24   00:00:18       24  00:01.207  00:41.996  00:43.203   00:14:24        864

1) Why do the times for job steps 1 & 2 not add up to the batch line?
The times reported on the .batch line for SystemCPU, UserCPU and TotalCPU are the time spent running the commands in the batch script itself, not counting the spawned processes [1]. CPUTime and CPUTimeRAW do count the spawned processes, and thus they add up to the values on the lines corresponding to the job steps.
2) What is the relationship between the columns, especially TotalCPU and CPUTime?
TotalCPU is the sum of UserCPU and SystemCPU, while CPUTime is the elapsed wall-clock time multiplied by the number of requested CPUs. The difference between the two is the time the CPUs spent doing nothing (neither in user mode nor in kernel mode), most of the time waiting for I/O [2].
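As a quick sanity check on the numbers above, here is a rough sketch in Python (Elapsed is not among the selected fields, so it is back-computed from CPUTimeRAW):

# Sanity-check the column relationships for job 649176 (values copied from the table above).
alloc_cpus = 24

system_cpu = 2.047    # SystemCPU  = 00:02.047, in seconds
user_cpu = 66.896     # UserCPU    = 01:06.896, in seconds
total_cpu = 68.943    # TotalCPU   = 01:08.943, in seconds
cputime_raw = 1416    # CPUTimeRAW, already in seconds (== CPUTime of 00:23:36)

# TotalCPU = SystemCPU + UserCPU
assert abs((system_cpu + user_cpu) - total_cpu) < 1e-6

# CPUTime = Elapsed * AllocCPUS, so Elapsed can be recovered from CPUTimeRAW
elapsed = cputime_raw / alloc_cpus    # 59 seconds of wall-clock time
idle = cputime_raw - total_cpu        # allocated CPU time that was never used
print(f"Elapsed ~ {elapsed:.0f} s, unused CPU time ~ {idle:.0f} s")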
3) For the time usage of the job, which one is best to report?
It depends on what you want to show. Elapsed (which you did not select here) gives the "time to solution". CPUTimeRAW is what is often accounted and paid for. The difference between CPUTime and TotalCPU gives information about the I/O overhead.
[1] From the man page:
SystemCPU
    The amount of system CPU time used by the job or job step. The format of the output is identical to that of the Elapsed field.
    NOTE: SystemCPU provides a measure of the task's parent process and does not include CPU time of child processes.
[2] https://en.wikipedia.org/wiki/CPU_time

Related

JMeter ramp-up vs duration.

Let's say that I have the current configuration :
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
If I add a duration of 2 minutes, so:
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
Duration (minutes) : 2
How is JMeter going to react, if each thread takes about 10 seconds to complete?
Thanks in advance
Both Loop Count and Duration (if both are present) are taken into account, and whichever is reached first ends the test. So in the first configuration you are limiting neither loop count nor duration, so the script will run "forever". In the second case, loop count is still unlimited, but duration is not. So the test will stop 2 minutes after the very first user starts, and that time includes the ramp-up. Stopping means no new samplers are started and all running samplers get a hard stop.
In your case, the 150 users will finish starting after 30 sec. That means the first thread to run will have completed about 3 iterations (of ~10 sec each) by the time the last thread has just started its first.
Within the remaining 90 sec, all threads will complete roughly 8-9 iterations.
So for the first thread you should expect 11-12 iterations, and for the very last thread to start, 8-9 iterations; the remaining threads fall anywhere between those numbers. Assuming groups of ~30 threads execute the same number of iterations, between 8 and 12, you get roughly 1500 iterations in total (could be a little over or under). The last iteration of each thread may be incomplete (e.g. some samplers did not get to run before the test ran out of time).
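A rough back-of-the-envelope estimate of that total in Python (assuming a linear ramp-up, a hard stop at 120 s and exactly 10 s per iteration; the real numbers will vary):

# Estimate total iterations for 150 threads, 30 s ramp-up, 2 min duration,
# assuming each iteration takes about 10 seconds.
threads = 150
ramp_up = 30.0         # seconds
duration = 120.0       # seconds, measured from the start of the first thread
iteration_time = 10.0  # seconds per iteration (assumption)

total = 0
for i in range(threads):
    start = ramp_up * i / threads              # linear ramp-up: when thread i starts
    available = duration - start               # time this thread has before the hard stop
    total += int(available // iteration_time)  # count completed iterations only

print(f"~{total} completed iterations in total")  # roughly 1500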
Generally, since duration may leave unfinished iterations, I think it's only good as a fallback or threshold in pipeline automation. For example: a run is configured to complete 1000 iterations (which should take about 16 min if an iteration takes 10 sec), so duration is set to 24 min (about 50% slack). It won't be needed if performance is decent, but if execution takes extremely long, we can hard-stop it at 24 min, since there's no point in continuing: we already know something is wrong.

scheduling order (timeline) for FCFS, SJN, SRT, Round Robin

I got this question from my friend, new to the operating system area.
We got:
Job number 1
Arrival Time = 0, CPU cycles = 80,
Job number 2
Arrival Time = 22, CPU cycles = 60,
Job number 3
Arrival Time = 44, CPU cycles = 24,
Job Number 4
Arrival Time = 55, CPU cycles = 40.
How can I work out the scheduling order for FCFS, SJN, SRT and Round Robin (using a time quantum of 20)?
Thank you. If you can give me any ideas...
FCFS stands for first come, first served. It means the job that arrives first is executed first, so the order here is:
Job number 1, then Job number 2, then Job number 3, then Job number 4.
It does not stop executing a job when a new process comes in. Its disadvantage is that a short process has to wait for longer processes to finish: if a short process arrives last, it has to wait for all the long processes already in the queue to end.
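A minimal Python sketch of the FCFS timeline for these four jobs (arrival times and bursts taken from the question; this is just the hand arithmetic in code form):

# FCFS: run jobs strictly in arrival order, each one to completion.
jobs = [            # (name, arrival, burst) from the question
    ("J1", 0, 80),
    ("J2", 22, 60),
    ("J3", 44, 24),
    ("J4", 55, 40),
]

t = 0
for name, arrival, burst in sorted(jobs, key=lambda j: j[1]):
    start = max(t, arrival)   # the CPU may sit idle until the job arrives
    t = start + burst
    print(f"{name}: runs {start}-{t}")
# J1: 0-80, J2: 80-140, J3: 140-164, J4: 164-204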
Shortest job first
This is hard to implement as a short-term scheduler; I will explain why at the end.
Your next algorithm, SRT, is the preemptive version of this one. With plain SJF the CPU is not taken away when a shorter job arrives: job 1 is alone at time 0, so it runs to completion at time 80, even though job 3 is shorter. Switching the CPU at job 3's arrival would require preemption, and then the algorithm would be SRT, not SJF. At each completion, however, the scheduler does pick the shortest of the jobs that are waiting: at time 80, jobs 2 (60), 3 (24) and 4 (40) have all arrived, so the order is job 1, then job 3, then job 4, then job 2.
If instead you had two processes at time 0 with lengths/bursts 80 and 40, it would pick the process whose burst is 40.
Only for the long-term scheduler: this algorithm is not suited to the short-term scheduler because you need to know the length/burst of the next process in advance. You can try to approximate it (there is a formula based on averaging previous bursts), but that does not always work.
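A sketch of non-preemptive SJF for the same jobs, to illustrate the order above (at every completion the shortest job that has already arrived is dispatched):

# Non-preemptive SJF: at each completion, pick the shortest job that has arrived.
jobs = {"J1": (0, 80), "J2": (22, 60), "J3": (44, 24), "J4": (55, 40)}  # name: (arrival, burst)

t, pending = 0, dict(jobs)
while pending:
    ready = {n: ab for n, ab in pending.items() if ab[0] <= t}
    if not ready:                                 # CPU idle until the next arrival
        t = min(arr for arr, _ in pending.values())
        continue
    name = min(ready, key=lambda n: ready[n][1])  # shortest burst among ready jobs
    start, t = t, t + ready[name][1]
    print(f"{name}: runs {start}-{t}")
    del pending[name]
# J1: 0-80, J3: 80-104, J4: 104-144, J2: 144-204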
SRT stands for shortest remaining time first: a preemptive version of shortest job first.
First job: arrival time 0 and CPU cycles 80
Second job: arrival time 22 and CPU cycles 60
Third job: arrival time 44 and CPU cycles 24
Fourth job: arrival time 55 and CPU cycles 40
At time 0 the only process you have is job 1. After 22 time units a new process is placed in the queue, so the scheduler compares the lengths: job 1 has executed for 22 units, so its remaining time is 80 - 22 = 58, and the length of the new process is 60, which is greater than 58, so the scheduler does not switch to the new process. After 44 time units a third job comes in, so job 1 has 80 - 44 = 36 remaining, while the time required by the third job is only 24, so the scheduler saves the state (context) of the running process and allocates the CPU to the third job.
At time 55 another process is placed in the queue with length 40, but the third process (with 24 - 11 = 13 remaining) is still the shortest, so job 3 keeps the CPU and will be the first to finish.
Job 1 runs for 44 units: remaining time = 36; at this point job 3 gets the processor.
Job 3 runs for 24 units: finished (at time 68).
At this point we have job 1 with 36 units remaining, job 2 with 60 and job 4 with 40. So the scheduler allocates the CPU to job 1 again because it is the shortest, and once it is finished it allocates the CPU to job 4 and then to job 2.
Disadvantage: too many preemptions also cost time (context switches).
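A minimal SRT simulation in Python confirming that timeline (scheduling decisions are only re-evaluated at arrivals and completions, which is sufficient here; a sketch, not production scheduler code):

# SRT (preemptive SJF): always run the job with the least remaining time.
jobs = [("J1", 0, 80), ("J2", 22, 60), ("J3", 44, 24), ("J4", 55, 40)]
remaining = {name: burst for name, _, burst in jobs}

t, timeline = 0, []
while any(r > 0 for r in remaining.values()):
    ready = [n for n, arr, _ in jobs if arr <= t and remaining[n] > 0]
    current = min(ready, key=lambda n: remaining[n])     # least remaining time wins
    future = [arr for _, arr, _ in jobs if arr > t]      # upcoming preemption points
    next_event = min(future + [t + remaining[current]])  # next arrival or completion
    timeline.append((current, t, next_event))
    remaining[current] -= next_event - t
    t = next_event

for name, start, end in timeline:
    print(f"{name}: {start}-{end}")
# J1: 0-22, 22-44; J3: 44-55, 55-68; J1: 68-104; J4: 104-144; J2: 144-204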
Round Robin: similar to first come, first served; the only difference is that it allocates equal time slices to each process (i.e. it allows preemption, where FCFS does not). Here you use a time quantum of a few time units.
There are two possibilities: either the process completes its burst, in which case it releases the CPU voluntarily, or the time quantum expires, in which case the scheduler takes the CPU away from the process and allocates it to the next one.
If you have n processes in the queue, then every process gets 1/n of the CPU time in chunks of at most q time units.
Job 1 gets the CPU for 20 time units, and because there is no other process in the queue yet, job 1 gets the CPU again for the next 20 units.
By the time job 1 has executed 2 × 20 units, a new process (job 2) has entered the queue, so job 2 gets the CPU for 20 units. While job 2 is executing, two more processes are placed in the queue (job 3 and job 4).
Now job 3 gets the CPU for 20 units and then job 4 gets the CPU for 20 units.
After the first cycle:
Job 1 has executed 2 × 20; remaining time = 40
Job 2 has executed 20; remaining time = 40
Job 3 has executed 20; remaining time = 4
Job 4 has executed 20; remaining time = 20
After the second cycle:
Job 1 has executed another 20; remaining time = 20
Job 2 has executed another 20; remaining time = 20
Job 3 has executed its last 4 units; remaining time = 0 (finished, it only needed 4 more)
Job 4 has executed another 20; remaining time = 0 (finished)
After the third cycle:
Job 1 has executed its last 20; remaining time = 0 (finished)
Job 2 has executed its last 20; remaining time = 0 (finished)
The performance of this algorithm depends on the size of the time quantum. If the quantum is too small you get too many preemptions/context switches (saving the process state and allocating the CPU to a new process also takes time), and if the quantum is too large that is not good either. Try to think why :D
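A sketch of the cycle-by-cycle bookkeeping in Python (each arrived, unfinished job gets at most one quantum per pass, in job order; the first two passes here together make up the "first cycle" above, since job 1 is alone in the queue for its first quantum, and the exact order within a cycle depends on the ready-queue convention you use):

# Round Robin with quantum 20, tracked pass by pass as in the walkthrough above.
quantum = 20
jobs = [("J1", 0, 80), ("J2", 22, 60), ("J3", 44, 24), ("J4", 55, 40)]
remaining = {name: burst for name, _, burst in jobs}

t, cycle = 0, 0
while any(remaining.values()):
    cycle += 1
    for name, arrival, _ in jobs:
        if arrival <= t and remaining[name] > 0:
            run = min(quantum, remaining[name])   # a job that finishes releases the CPU early
            remaining[name] -= run
            t += run
    print(f"after pass {cycle}: remaining = {remaining}, time = {t}")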

In a Slurm cluster job, how can AveCPU time be greater than CPUTime?

       JobID    JobName     MaxRSS    Elapsed     AveCPU    CPUTime  SystemCPU
------------ ---------- ---------- ---------- ---------- ---------- ----------
16260894             GP              06:29:33            2-16:55:30   05:49:13
16260894.ba+      batch      3336K   06:29:33   00:00:00 2-16:55:30  00:00.008
16260894.0   gp_wrappe+   5566876K   06:29:33 3-11:40:54 2-16:55:30   05:49:13
Above is the sacct output for a job I ran on a Slurm cluster. I used 10 CPUs, 1 task and 1 node. I'm still not too familiar with Slurm clusters, but here is what I think I understand: CPUTime represents the maximum amount of CPU time the job could have taken, and in this case it is just equal to n_cpus * Elapsed. However, my thought was that AveCPU is the CPU time actually used by the program.
Not only would I expect it to be less than CPUTime in principle, because I can't use more resources than I have, but I would also expect my script to be using the maximum processing power (all 10 cores) only a fraction of the time, since only some parts of the processing are fully multithreaded. So how can it be that AveCPU is 3.5 days, while CPUTime is 2.7 days?
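For reference, the CPUTime figure is at least consistent with that reading of the columns (a quick check in Python; the 10-CPU allocation is stated in the question, not visible in the output):

# Check that CPUTime == AllocCPUs * Elapsed for the job above.
from datetime import timedelta

alloc_cpus = 10                                        # stated in the question
elapsed = timedelta(hours=6, minutes=29, seconds=33)   # Elapsed = 06:29:33
cputime = timedelta(days=2, hours=16, minutes=55, seconds=30)  # CPUTime = 2-16:55:30

assert alloc_cpus * elapsed == cputime                 # 10 * 06:29:33 == 2-16:55:30
print(alloc_cpus * elapsed)                            # 2 days, 16:55:30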

Stanford CoreNLP Server disable logging

I have the feeling that the logging of the server is quite verbose. Is there a way to disable or reduce the logging output? It seems that if I send a document to the server, it will write the content to stdout, which might be a performance killer.
Can I do that somehow?
Update
I found a way to suppress the output from the server. My question remains whether and how I can do this with a command line argument for the actual server. As a dirty workaround, however, the following seems to ease the overhead.
Running the server with
java -mx6g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -prettyPrint false 2&>1 >/dev/null
where >/dev/null should send the output into nothing. Unfortunately this alone did not help; 2&>1 seems to do the trick here. I confess that I do not know what it is actually doing. However, I compared two runs.
Running with 2&>1 >/dev/null
Processed 100 sentences
Overall time: 2.1797 sec
Time per sentence: 0.0218 sec
Processed 200 sentences
Overall time: 6.5694 sec
Time per sentence: 0.0328 sec
...
Processed 1300 sentences
Overall time: 30.482 sec
Time per sentence: 0.0234 sec
Processed 1400 sentences
Overall time: 32.848 sec
Time per sentence: 0.0235 sec
Processed 1500 sentences
Overall time: 35.0417 sec
Time per sentence: 0.0234 sec
Running without additional arguments
ParagraphVectorTrainer - Epoch 1 of 6
Processed 100 sentences
Overall time: 2.9826 sec
Time per sentence: 0.0298 sec
Processed 200 sentences
Overall time: 5.5169 sec
Time per sentence: 0.0276 sec
...
Processed 1300 sentences
Overall time: 54.256 sec
Time per sentence: 0.0417 sec
Processed 1400 sentences
Overall time: 59.4675 sec
Time per sentence: 0.0425 sec
Processed 1500 sentences
Overall time: 64.0688 sec
Time per sentence: 0.0427 sec
This was a very shallow test, but it appears that this can have quite an impact. The difference here is a factor of 1.828, which is quite a difference over time.
However, this was just a quick test and I cannot guarantee that my results are completely sane!
Further update:
I assume that this has to do with how the JVM optimizes the code over time, but the time per sentence becomes comparable with the one I get on my local machine. Keep in mind that I got the results below using 2&>1 >/dev/null to eliminate the stdout logging.
Processed 68500 sentences
Overall time: 806.644 sec
Time per sentence: 0.0118 sec
Processed 68600 sentences
Overall time: 808.2679 sec
Time per sentence: 0.0118 sec
Processed 68700 sentences
Overall time: 809.9669 sec
Time per sentence: 0.0118 sec
You're now the third person that's asked for this :) -- see "Preventing Stanford Core NLP Server from outputting the text it receives". In the HEAD of the GitHub repo, and in versions 3.6.1 onwards, there's a -quiet flag that prevents the server from outputting the text it receives. Other logging can then be configured with SLF4J, if it's in your classpath.

Average waiting time in Round Robin scheduling

Waiting time is defined as how long each process has to wait before it gets its time slice.
In scheduling algorithms such as Shortest Job First and First Come First Served, we can find that waiting time easily: we just queue up the jobs and see how long each one had to wait before it got serviced.
When it comes to Round Robin or any other preemptive algorithm, a long-running job spends a little time on the CPU, is preempted, then waits for some time for its next turn, and at some point during one of its turns it executes until completion. I wanted to find out the best way to understand the 'waiting time' of jobs in such a scheduling algorithm.
I found a formula which gives waiting time as:
Waiting Time = (Final Start Time - Previous Time in CPU - Arrival Time)
But I fail to understand the reasoning behind this formula. For example, consider a job A which has a burst time of 30 units, with round-robin scheduling every 5 units. There are two more jobs, B (10) and C (15).
The order in which these will be serviced would be:
0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55
Waiting time for A = 40 - 5 - 0
I chose 40 because after 40 A never waits: it just gets its time slices and goes on and on.
I chose 5 because A previously spent time on the CPU between 30 and 35.
0 is the arrival time (A arrives at 0).
Well, I have a doubt about this formula: why is the slice 15 A 20 not accounted for?
Intuitively, I am unable to see how this gives us the waiting time for A when we only account for the penultimate execution and then subtract the arrival time.
According to me, the waiting time for A should be:
Final start time - (sum of all the time it spent in processing).
If this formula is wrong, why is it?
Please help clarify my understanding of this concept.
You've misunderstood what the formula means by "previous time in CPU". This actually means the same thing as what you call "sum of all times it spend in the processing". (I guess "previous time in CPU" is supposed to be short for "total time previously spent running on the CPU", where "previously" means "before the final start".)
You still need to subtract the arrival time because the process obviously wasn't waiting before it arrived. (Just in case this is unclear: The "arrival time" is the time when the job was submitted to the scheduler.) In your example, the arrival time for all processes is 0, so this doesn't make a difference there, but in the general case, the arrival time needs to be taken into account.
Edit: If you look at the example on the webpage you linked to, process P1 takes two time slices of four time units each before its final start, and its "previous time in CPU" is calculated as 8, consistent with the interpretation above.
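Under that interpretation, here is a quick check against the timeline in the question (a Python sketch simulating the 5-unit round robin for A(30), B(10), C(15), all arriving at time 0):

# Waiting time = final start - CPU time used before the final start - arrival time.
from collections import deque

quantum = 5
burst = {"A": 30, "B": 10, "C": 15}          # all three jobs arrive at time 0
remaining = dict(burst)
queue = deque(["A", "B", "C"])

t, slices = 0, {name: [] for name in burst}  # (start, end) slices per job
while queue:
    name = queue.popleft()
    run = min(quantum, remaining[name])
    slices[name].append((t, t + run))
    t += run
    remaining[name] -= run
    if remaining[name] > 0:
        queue.append(name)                   # back to the tail of the ready queue

def merge(spans):                            # join contiguous slices into single runs
    merged = [list(spans[0])]
    for s, e in spans[1:]:
        if s == merged[-1][1]:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    return merged

for name in burst:
    runs = merge(slices[name])               # for A: (0,5), (15,20), (30,35), (40,55)
    final_start = runs[-1][0]                # 40 for A: start of its last uninterrupted run
    prev_cpu = sum(e - s for s, e in runs[:-1])
    print(name, "waiting time =", final_start - prev_cpu - 0)   # A: 25, B: 15, C: 25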
An alternative rule of thumb:
waiting time = (start of the process's last slot in the Gantt chart) - (time quantum × (n - 1))
where n denotes the number of times the process appears in the Gantt chart (this assumes the process arrives at time 0 and uses the full quantum in every slot before the last one).
