sched_wakeup_granularity_ns, sched_min_granularity_ns and SCHED_RR - linux-kernel

The following values are from my box:
sysctl -A | grep "sched" | grep -v "domain"
kernel.sched_autogroup_enabled = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 18000000
kernel.sched_migration_cost_ns = 5000000
kernel.sched_min_granularity_ns = 10000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 3000000
This means that in every second, 0.95 s is available to SCHED_FIFO or SCHED_RR tasks and only 0.05 s is reserved for SCHED_OTHER. What I am curious about is sched_wakeup_granularity_ns. I have googled it and got this explanation:
Ability of tasks being woken to preempt the current task.
The smaller the value, the easier it is for the task to force the preemption.
I think sched_wakeup_granularity_ns only affects SCHED_OTHER tasks; SCHED_FIFO and SCHED_RR tasks should not be in sleep mode, so there is no need to "wake up".
Am I correct?!
And for sched_min_granularity_ns, the explanation is:
Minimum preemption granularity for processor-bound tasks.
Tasks are guaranteed to run for this minimum time before they are preempted.
I would like to know: although SCHED_RR tasks can have 95% of the CPU time, since sched_min_granularity_ns = 10000000, i.e. 0.01 second, does that mean every SCHED_OTHER task gets a 0.01-second timeslice to run before being preempted, unless it blocks (on a socket recv, a sleep, or the like)? That would imply that if I have 3 tasks on core 1, for example, 2 tasks with SCHED_RR and the third with SCHED_OTHER, and the third task just runs an endless loop without a blocking socket recv and without yielding, then once the third task gets the CPU it will run for 0.01 second and then be context-switched out, even if the next task is a SCHED_RR one.
Is that the right understanding of sched_min_granularity_ns usage?!
Edit:
http://lists.pdxlinux.org/pipermail/plug/2006-February/045495.html describes:
No SCHED_OTHER process may be preempted by another SCHED_OTHER process.
However a SCHED_RR or SCHED_FIFO process will preempt a SCHED_OTHER
process before its time slice is done. So a SCHED_RR process
should wake up from a sleep with fairly good accuracy.
Does that mean a SCHED_RR task can preempt the endless non-blocking while loop even when its time slice is not done?!

Tasks with a higher scheduling class "priority" will preempt all tasks with a lower-priority scheduling class, regardless of any timeouts. Take a look at the snippet below from kernel/sched/core.c:
void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
        const struct sched_class *class;

        if (p->sched_class == rq->curr->sched_class) {
                rq->curr->sched_class->check_preempt_curr(rq, p, flags);
        } else {
                for_each_class(class) {
                        if (class == rq->curr->sched_class)
                                break;
                        if (class == p->sched_class) {
                                resched_curr(rq);
                                break;
                        }
                }
        }

        /*
         * A queue event has occurred, and we're going to schedule. In
         * this case, we can save a useless back to back clock update.
         */
        if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
                rq_clock_skip_update(rq, true);
}
for_each_class iterates over the scheduling classes in this order: stop, deadline, rt, fair, idle. The loop breaks as soon as it reaches either the current task's class (curr's class ranks at least as high, so nothing happens) or the waking task's class (the waking task's class ranks higher, so resched_curr is called).
So for your question, the answer is yes: an "rt" task will preempt a "fair" task.
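To see this behaviour from user space, here is a minimal sketch (my addition, not from the kernel sources or the original answer): it pins a SCHED_OTHER busy loop and a SCHED_RR task to the same CPU, and the SCHED_RR task keeps waking on time even though the spinner never blocks or yields. It assumes Linux with root (or CAP_SYS_NICE).

#include <sched.h>      // sched_setscheduler, sched_setaffinity, CPU_SET
#include <csignal>      // kill
#include <cstdio>       // printf, perror
#include <ctime>        // nanosleep, timespec
#include <unistd.h>     // fork, pid_t

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                         // force both tasks onto CPU 0
    sched_setaffinity(0, sizeof(set), &set);

    pid_t spinner = fork();
    if (spinner == 0)                         // child: SCHED_OTHER ("fair")
        for (;;) {}                           // endless loop, never blocks

    sched_param sp;
    sp.sched_priority = 10;
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0) {
        perror("sched_setscheduler");         // needs root or CAP_SYS_NICE
        return 1;
    }

    for (int i = 0; i < 5; ++i) {             // parent: SCHED_RR ("rt")
        timespec ts = {0, 100000000};         // 100 ms
        nanosleep(&ts, nullptr);              // "rt" preempts "fair" on wakeup
        std::printf("RR task woke on time, iteration %d\n", i);
    }
    kill(spinner, SIGKILL);                   // clean up the spinner
    return 0;
}

Compile with g++ and run as root; the five messages arrive roughly 100 ms apart despite the spinner hogging the CPU.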

Related

What part does priority play in round robin scheduling?

I am trying to solve the following homework problem for an operating systems class:
The following processes are being scheduled using a preemptive, round robin scheduling algorithm. Each process is assigned a numerical priority, with a higher number indicating a higher relative priority.
In addition to the processes listed below, the system also has an idle task (which consumes no CPU resources and is identified as Pidle ). This task has priority 0 and is scheduled whenever the system has no other available processes to run.
The length of a time quantum is 10 units.
If a process is preempted by a higher-priority process, the preempted process is placed at the end of the queue.
+--------+----------+-------+---------+
| Thread | Priority | Burst | Arrival |
+--------+----------+-------+---------+
| P1     | 40       | 15    | 0       |
| P2     | 30       | 25    | 25      |
| P3     | 30       | 20    | 30      |
| P4     | 35       | 15    | 50      |
| P5     | 5        | 15    | 100     |
| P6     | 10       | 10    | 105     |
+--------+----------+-------+---------+
a. Show the scheduling order of the processes using a Gantt chart.
b. What is the turnaround time for each process?
c. What is the waiting time for each process?
d. What is the CPU utilization rate?
My question is --- what role does priority play when we're considering that this uses the round-robin algorithm? I have been thinking about it a lot, and what I have come up with is that priority only makes sense at the time a process arrives, to decide whether it should preempt the running process or not. The reason I have concluded this is that if priority were checked at every context switch, the process with the highest priority would always run indefinitely and the other processes would starve. This is against the idea of round robin: making sure that no process executes longer than one time quantum, and that after a process executes it goes to the end of the queue.
Using this logic I have worked out the problem as such:
Could you please advise me whether I'm on the right track about the role priority plays in this situation, and whether I'm approaching it the right way?
I think you are on the wrong track. Round robin controls the run order within a priority. It is as if each priority has its own queue, with a corresponding round-robin scheduler. When a given priority's queue is empty, the subsequent lower-priority queues are considered. Eventually, it will hit idle.
If you didn’t process it this way, how would you prevent idle from eventually being scheduled, despite having actual work ready to go?
Most high-priority processes are reactive: they execute for a short burst in response to an event, so for the most part they are not on a run/ready queue.
In code:
void Next() {
    // Run the head of the highest-priority non-empty queue.
    for (int i = PRIO_HI; i >= PRIO_LO; i--) {
        Proc *p;
        if ((p = prioq[i].head) != NULL) {
            Resume(p);
            /*NOTREACHED*/
        }
    }
    panic("Idle not on runq!");  // the idle task must always be queued
}

void Stop() {
    // Current process blocks: remove it from its queue, pick another.
    unlink(prioq + curp->prio, curp);
    Next();
}

void Start(Proc *p) {
    // (Re)start a process: reset its quantum and append it to the
    // tail of its own priority's queue.
    p->countdown = p->reload;
    append(prioq + p->prio, p);
    Next();
}

void Tick() {
    // Timer tick: when the quantum expires, rotate the current
    // process to the back of its own priority's queue.
    if (--(curp->countdown) == 0) {
        unlink(prioq + curp->prio, curp);
        Start(curp);
    }
}
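To make the model concrete, here is one trace of the table above under these rules (my own working, not from the original answer; it assumes FIFO ordering within a priority and that a preempted process goes to the back of its own priority's queue, so a different tie-breaking convention would shift the numbers):

P1 0-15 (alone, runs through two quanta, finishes), idle 15-25,
P2 25-35 (quantum expires), P3 35-45 (quantum expires), P2 45-50
(preempted by P4), P4 50-65 (finishes), P3 65-75 (finishes),
P2 75-85 (finishes), idle 85-100, P5 100-105 (preempted by P6),
P6 105-115 (finishes), P5 115-125 (finishes).

Turnaround: P1=15, P2=60, P3=45, P4=15, P5=25, P6=10.
Waiting (turnaround minus burst): P1=0, P2=35, P3=25, P4=0, P5=10, P6=0.
CPU utilization: 100 busy units out of 125 elapsed = 80%.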

Laravel task scheduling command frequency delay or offset

Is there a way to delay or offset a scheduled command from the proposed frequency options?
e.g:
$schedule->command('GetX')->everyTenMinutes(); --> run at 9:10, 9:20, 9:30
$schedule->command('GetY')->everyTenMinutes(); --> run at 9:15, 9:25, 9:35
There is no delay function when scheduling tasks.
But the when method can be used to schedule a task every 10 minutes, delayed by 5 minutes:
// this command is scheduled to run when the minute is 05, 15, 25, 35, 45, 55
// the truth test is checked every minute
$schedule->command('foo:bar')->everyMinute()->when(function () {
    // date('i') is the current minute (date('m') would be the month);
    // the parentheses matter, since % binds tighter than - in PHP
    return ((int) date('i') - 5) % 10 === 0;
});
Following this rule, you can schedule a task every $x minutes, delayed by $y minutes:
$schedule->command('foo:bar')->everyMinute()->when(function () use ($x, $y) {
    return ((int) date('i') - $y) % $x === 0;
});
If that gets hard to follow, the direct way is to simply write a custom cron schedule. It is the easier way to understand, without getting a headache when you read the code later.
$schedule->command('foo:bar')->cron('05,15,25,35,45,55 * * * *');
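As an aside (my addition, not from the original answer), standard cron step syntax on a range expresses the same offset more compactly, assuming the underlying cron-expression library supports steps on ranges (the one Laravel ships with does):
$schedule->command('foo:bar')->cron('5-55/10 * * * *');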

Analyzing readdir() performance

It's bothering me that Linux takes so long to list all files of huge directories, so I created a little test program that recursively counts all files under a directory:
#include <stdio.h>
#include <string.h>   /* strcpy, strcat -- missing in the original */
#include <dirent.h>

int list(char *path) {
    int i = 0;
    DIR *dir = opendir(path);
    struct dirent *entry;
    char new_path[1024];

    if (dir == NULL)  /* e.g. permission denied */
        return 0;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type == DT_DIR) {
            if (entry->d_name[0] == '.')  /* skip ".", ".." and hidden dirs */
                continue;
            strcpy(new_path, path);
            strcat(new_path, "/");
            strcat(new_path, entry->d_name);
            i += list(new_path);
        }
        else
            i++;
    }
    closedir(dir);
    return i;
}

int main() {
    char *path = "/home";
    printf("%i\n", list(path));
    return 0;
}
When compiling it with gcc -O3, the program runs for about 15 sec (I ran the program a few times and it's approximately constant, so the fs cache should not play a role here):
$ /usr/bin/time -f "%CC %DD %EE %FF %II %KK %MM %OO %PP %RR %SS %UU %WW %XX %ZZ %cc %ee %kk %pp %rr %ss %tt %ww %xx" ./a.out
./a.outC 0D 0:14.39E 0F 0I 0K 548M 0O 2%P 178R 0.30S 0.01U 0W 0X 4096Z 7c 14.39e 0k 0p 0r 0s 0t 1692w 0x
So it spends about S=0.3sec in kernelspace and U=0.01sec in userspace and has 7+1692 context switches.
A context switch takes about 2000nsec * (7+1692) = 3.398msec [1]
However, there are more than 10sec left and I would like to find out what the program is doing in this time.
Are there any other tools to investigate what the program is doing all this time?
gprof just tells me the time for the (userspace) call graph, and gcov does not list time spent in each line but only how often a line is executed...
[1] http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
oprofile is a decent sampling profiler which can profile both user and kernel-mode code.
According to your numbers, however, approximately 14.5 seconds of the time is spent asleep, which is not really registered well by oprofile. Perhaps what may be more useful would be ftrace combined with a reading of the kernel code. ftrace provides trace points in the kernel which can log a message and stack trace when they occur. The event that would seem most useful for determining why your process is sleeping would be the sched_switch event.

I would recommend that you enable kernel-mode stacks and the sched_switch event, set a buffer large enough to capture the entire lifetime of your process, then run your process and stop tracing immediately after. By reviewing the trace, you will be able to see every time your process went to sleep, whether it was runnable or non-runnable, a high resolution time stamp, and a call stack indicating what put it to sleep.
ftrace is controlled through debugfs. On my system, this is mounted in /sys/kernel/debug, but yours may be different. Here is an example of what I would do to capture this information:
# Enable stack traces
echo "1" > /sys/kernel/debug/tracing/options/stacktrace
# Enable the sched_switch event
echo "1" > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
# Make sure tracing is enabled
echo "1" > /sys/kernel/debug/tracing/tracing_on
# Run the program and disable tracing as quickly as possible
./your_program; echo "0" > /sys/kernel/debug/tracing/tracing_on
# Examine the trace
vi /sys/kernel/debug/tracing/trace
The resulting output will have lines which look like this:
# tracer: nop
#
# entries-in-buffer/entries-written: 22248/3703779 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] d..3 2113.437500: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:0 next_pid=878 next_prio=120
<idle>-0 [000] d..3 2113.437531: <stack trace>
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> rest_init
=> start_kernel
kworker/0:0-878 [000] d..3 2113.437836: sched_switch: prev_comm=kworker/0:0 prev_pid=878 prev_prio=120 prev_state=S ==> next_comm=your_program next_pid=898 next_prio=120
kworker/0:0-878 [000] d..3 2113.437866: <stack trace>
=> __schedule
=> schedule
=> worker_thread
=> kthread
=> ret_from_fork
The lines you will care about will be when your program appears as the prev_comm task, meaning the scheduler is switching away from your program to run something else. prev_state will indicate that your program was still runnable (R) or was blocked (S, U or some other letter, see the ftrace source). If blocked, you can examine the stack trace and the kernel source to figure out why.
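As an aside (my addition, not from the original answer): if trace-cmd is installed, it drives the same ftrace machinery without poking debugfs by hand, which is less error-prone:
# record sched_switch events while the program runs, then inspect them
trace-cmd record -e sched:sched_switch ./a.out
trace-cmd report | less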

Getting surprising elapsed time on Windows and Linux

I have written a function which is platform independent and works nicely on Windows as well as Linux. I wanted to check the execution time of that function. I am using QueryPerformanceCounter to calculate the execution time on Windows and gettimeofday on Linux.
The problem is that on Windows the execution time is 60 milliseconds while on Linux it shows 4 ms. That's a huge difference between them. Can anybody suggest what might have gone wrong... or if anybody knows some other APIs better than these to calculate elapsed time, please let me know...
Here is the code I have written using gettimeofday:
#include <sys/time.h>   // gettimeofday, timersub -- missing in the original
#include <iostream>
using namespace std;

int main()              // void main() is not valid C++
{
    timeval start_time;
    timeval end_time;
    gettimeofday(&start_time, NULL);
    function_invoke(........);
    gettimeofday(&end_time, NULL);

    timeval res;
    // timersub(a, b, res) computes a - b, so the later time goes first;
    // the original had the arguments swapped
    timersub(&end_time, &start_time, &res);
    cout << "function_invoke took seconds = " << res.tv_sec << endl;
    cout << "function_invoke took microsec = " << res.tv_usec << endl;
    return 0;
}
OUTPUT:
function_invoke took seconds = 0
function_invoke took microsec = 4673 (4.673 milliseconds)
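Since you asked about other APIs: a portable alternative (my suggestion, not from the original post) is std::chrono::steady_clock, which behaves the same on Windows and Linux, is monotonic (immune to wall-clock adjustments), and avoids the swapped-argument pitfall of timersub noted above:

#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();
    function_invoke(........);   // the function being measured, as in the post
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "function_invoke took microsec = " << us.count() << std::endl;
    return 0;
}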

How to judge the trade-off between a Lua closure and a Lua coroutine? (when both of them can perform the same task)

PS: set aside the code complexity of implementing the same task with a closure.
The memory overhead for a closure will be less than for a coroutine (unless you've got a lot of "upvalues" in the closure, and none in the coroutine). Also the time overhead for invoking the closure is negligible, whereas there is some small overhead for invoking the coroutine. From what I've seen, Lua does a pretty good job with coroutine switches, but if performance matters and you have the option not to use a coroutine, you should explore that option.
If you want to do benchmarks yourself, for this or anything else in Lua:
Use collectgarbage("collect"); collectgarbage("count") to report the size of all memory still held after a collection. (You may want to run "collect" a few times, not just once.) Do that before and after creating something (a closure, a coroutine) to learn how much memory it consumes.
Use os.clock() to time things.
See also Programming in Lua on profiling.
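To make the comparison concrete, here is a minimal sketch (my addition, not from the original answer) of the same counter written both ways:

-- the same generator as a closure and as a coroutine
local function counter_closure()
    local n = 0
    return function() n = n + 1; return n end
end

local function counter_coroutine()
    return coroutine.wrap(function()
        local n = 0
        while true do
            n = n + 1
            coroutine.yield(n)
        end
    end)
end

local c1, c2 = counter_closure(), counter_coroutine()
for i = 1, 3 do
    print(c1(), c2())   --> 1 1, then 2 2, then 3 3
end

Both behave identically from the caller's side; the closure only allocates a function object plus one upvalue, while the coroutine also allocates its own stack, which is where the memory difference in the benchmark below comes from.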
see also:
https://gist.github.com/LiXizhi/911069b7e7f98db76d295dc7d1c5e34a
-- Testing coroutine overhead in LuaJIT 2.1 with NPL runtime
--[[
Starting function test...
memory(KB): 0.35546875
Functions: 500000
Elapsed time: 0 s
Starting coroutine test...
memory(KB): 13781.81640625
Coroutines: 500000
Elapsed time: 0.191 s
Starting single coroutine test...
memory(KB): 0.4453125
Coroutines: 500000
Elapsed time: 0.02800000000002
conclusions:
1. memory overhead: 0.26KB per coroutine
2. yield/resume pair overhead: 0.0004 ms
if you have 1000 objects each is calling yield/resume at 60FPS, then the time overhead is 0.2*1000/500000*60*1000 = 24ms
and if you do not reuse coroutine, then memory overhead is 1000*60*0.26 = 15.6MB/sec
]]
local total = 500000
local start, stop

function loopy(n)
    n = n + 1
    return n
end

print "Starting function test..."
collectgarbage("collect"); collectgarbage("collect"); collectgarbage("collect");
local beforeCount = collectgarbage("count")
start = os.clock()
for i = 1, total do
    loopy(i)
end
stop = os.clock()
print("memory(KB):", collectgarbage("count") - beforeCount)
print("Functions:", total)
print("Elapsed time:", stop - start, " s")

print "Starting coroutine test..."
collectgarbage("collect"); collectgarbage("collect"); collectgarbage("collect");
local beforeCount = collectgarbage("count")
start = os.clock()
for i = 1, total do
    co = coroutine.create(loopy)
    coroutine.resume(co, i)
end
stop = os.clock()
print("memory(KB):", collectgarbage("count") - beforeCount)
print("Coroutines:", total)
print("Elapsed time:", stop - start, " s")

print "Starting single coroutine test..."
collectgarbage("collect"); collectgarbage("collect"); collectgarbage("collect");
local beforeCount = collectgarbage("count")
start = os.clock()
co = coroutine.create(function()
    for i = 1, total do
        loopy(i)
        coroutine.yield()
    end
end)
for i = 1, total do
    coroutine.resume(co, i)
end
stop = os.clock()
print("memory(KB):", collectgarbage("count") - beforeCount)
print("Coroutines:", total)
print("Elapsed time:", stop - start, " s")
