CPU scheduling, removing a thread from the queue

I'm currently implementing the CPU scheduling algorithms FCFS, SJF, and Round Robin. Could somebody tell me when a process is removed from the queue (FCFS, SJF, RR)? I mean, does the CPU execute the thread first and remove it from the queue after execution, or is it the other way around?

A process (thread) should be removed from the queue immediately prior to execution, then placed back on the scheduling queue once execution is suspended.
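As a rough illustration of that ordering (a sketch in Go with a made-up Process type, not any particular OS's scheduler), a Round Robin loop might look like the following; for FCFS and SJF the only difference is that the selected process runs to completion and is never re-queued:

// Minimal Round Robin sketch: the process is removed from the ready queue
// before it runs, and appended back only if it still needs CPU time.
package main

import "fmt"

type Process struct {
    Name      string
    Remaining int // remaining CPU burst
}

func roundRobin(ready []Process, quantum int) {
    for len(ready) > 0 {
        p := ready[0]     // pick the process at the head of the ready queue
        ready = ready[1:] // remove it from the queue BEFORE executing it
        run := quantum
        if p.Remaining < run {
            run = p.Remaining
        }
        p.Remaining -= run
        fmt.Printf("ran %s for %d units, %d left\n", p.Name, run, p.Remaining)
        if p.Remaining > 0 {
            ready = append(ready, p) // re-queue it once its time slice expires
        }
    }
}

func main() {
    roundRobin([]Process{{"A", 5}, {"B", 3}}, 2)
}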

Related

Why does the Golang scheduler use two queues (global run queue and local run queue) to manage goroutines?

I was reading about how Golang internally manages newly created goroutines in an application, and I came to know that the runtime scheduler uses two queues to manage the created goroutines.
Global run queue: all newly created goroutines are placed in this queue.
Local run queue: goroutines that are about to run are allocated to a local run queue, and from there the scheduler assigns them to an OS thread.
So my question is: why does the scheduler use two queues to manage goroutines? Why can't it just use the global run queue and map goroutines to OS threads from there?
First, please note that this blog is an unofficial and old source, so the information in it shouldn't be taken as totally accurate with respect to the current version of Go (or any version, for that matter). You can still learn from it, but the Go scheduler improves over time, which can make information go out of date. For example, the blog says "Go scheduler is not a preemptive scheduler but a cooperating scheduler". As of Go 1.14, this is no longer true, as preemption was added to the runtime. As for the other information, I won't vouch for its accuracy, but here's an explanation of what they say.
Reading the blog post:
There are two different run queues in the Go scheduler: the Global Run Queue (GRQ) and the Local Run Queue (LRQ). Each P is given a LRQ that manages the Goroutines assigned to be executed within the context of a P. These Goroutines take turns being context-switched on and off the M assigned to that P. The GRQ is for Goroutines that have not been assigned to a P yet. There is a process to move Goroutines from the GRQ to a LRQ that we will discuss later.
This means the GRQ is for Goroutines that haven't been assigned to run yet, the LRQ is for Goroutines that have been assigned to a P to run or have already begun executing. Each Goroutine will start on the GRQ, and join a LRQ later to begin executing.
Here is the process that the previous quote was referencing, where Goroutines are moved from the GRQ to LRQ:
In figure 10, P1 has no more Goroutines to execute. But there are Goroutines in a runnable state, both in the LRQ for P2 and in the GRQ. This is a moment where P1 needs to steal work. The rules for stealing work are as follows.
runtime.schedule() {
    // only 1/61 of the time, check the global runnable queue for a G.
    // if not found, check the local queue.
    // if not found,
    //     try to steal from other Ps.
    //     if not, check the global runnable queue.
    //     if not found, poll network.
}
This means a P will prioritize running Goroutines in its own LRQ, then from other Ps' LRQs, then from the GRQ, then from network polling. There is also a small chance (1/61 of the time) that a Goroutine is run straight from the GRQ. Having multiple queues is what allows this priority system to be constructed.
Why do we want a priority order for which Goroutines get run? It may have various performance benefits. For example, it could make better use of the CPU cache. If you run a Goroutine that was already running recently, it's more likely that the data it's working with is still in the CPU cache, making it fast to access. When you start up a new Goroutine, it may use or create data that isn't in the cache yet. That data will then enter the cache and could evict the data being used by another Goroutine, which in turn causes that Goroutine to be slower when it resumes again. In the pathological case, this is called cache thrashing, and it greatly reduces the effective speed of the processor.
Allowing the CPU cache to work effectively can be one of the most important factors in achieving high performance on modern processors, but it's not the only reason to have such a queue system. In general, the more logical processes that are running at the same time (such as Goroutines in a Go program), the more resource contention will occur. This is because the resources used by a process tend to be fairly stable over its runtime. In other words, starting a new process tends to increase the overall resource load, continuing an already started process tends to maintain that load, and finishing a process tends to reduce it. Therefore, prioritizing already running processes over new processes tends to help keep the resource load in a manageable range.
It's analogous to the practical advice of "finish what you started". If you have a lot of tasks to accomplish, it's more effective to complete them one at a time, or to multitask just a handful of things if you can. If you just keep starting new tasks and never finish the previous ones, eventually you have so many things going on at the same time that you feel overwhelmed.

Does the Windows scheduler sometimes fail to preempt a running thread immediately to let a higher-priority thread run?

My application operates on pairs of long vectors - say it adds them together to produce a vector result. Its rules state that it must completely finish with one pair before it can be given another. I would like to use multiple threads to speed things up. I am running Windows 10.
I created an OpenMP parallel for construct and divided the vector among all the threads of the team. All threads start, all threads run pretty fast, so the multithreading is effective.
But the speedup is slight, and the reason is that some of the time, one of the worker threads takes way longer than usual. I have instrumented the operation, and I see that sometimes the worker threads take a long time to start - delay varies from 20 microseconds on average to dozens of milliseconds depending on system load. The master thread does not show this delay.
That makes me think that the scheduler is taking some time to start the worker threads. The master thread is already running, so it doesn't have to wait to be started.
But here is the nub of the question: raising the priority of the process doesn't make any difference. I can raise it to high priority or even realtime priority, and I still see that startup of the worker threads is often delayed. It looks like the Windows scheduler is not fully preemptive, and sometimes lets a lower-priority thread run when a higher-priority one is eligible. Can anyone confirm this?
I have verified that the worker threads are created with the default OS priority, namely the base priority of the class of the master process. This should be higher than the priority of any running thread, I think. Or is it normal for there to be some thread with realtime priority that might be blocking my workers? I don't see one with Task Manager.
I guess one last possibility is that the task switch might take 20 usec. Is that plausible?
I have a 4-core system without hyperthreading.

Difference between ThreadCount and StepCount in TIBCO BW Engine

Can anyone explain to me the difference between the StepCount and ThreadCount properties of the TIBCO BW engine? I have tried to understand them through the TIBCO docs but was unable to.
If anyone can explain this, that would be great.
Thanks in advance.
The ThreadCount property defines the number of threads (Java threads) which execute all your processes. So with the default value of 8 threads you can run 8 jobs simultaneously.
The StepCount, on the other hand, defines the number of activities executed before a thread can context-switch to another job.
Sample scenario:
a process with 5 activities
ThreadCount is 2
StepCount is 4
If there are 3 incoming requests, the first two requests spawn 1 job each. The third job is spawned, but gets paused due to insufficient threads.
After the first job completes the fourth activity, the thread is freed and can be assigned to another paused job.
So the first job will be paused and the third job starts to execute.
When the second job reaches the fourth activity, its thread will be freed and is available for re-assignment. So the second job pauses and the first resumes.
After the third job reaches its fourth activity, the thread is freed again and resumes job number two (and completes it); job number one has meanwhile finished its last activity on the other thread. Afterwards job number 3 gets completed.
All of this is a theoretical scenario. What you usually need is to set the number of concurrent jobs (so ThreadCount). The StepCount is close to irrelevant, because the engine will take care of the pooling and mapping of physical threads to virtual BW jobs.
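If it helps to see the yielding behaviour concretely, here is a toy, serialized model of the scenario above (this is not TIBCO code; the job type and the loop are purely illustrative, and a real engine would run ThreadCount jobs truly concurrently):

// Toy model of StepCount-style cooperative yielding: each job executes at
// most stepCount activities, then goes back to the queue to wait for a thread.
package main

import "fmt"

type job struct {
    id   int
    done int // activities completed so far
}

func main() {
    const (
        stepCount  = 4 // activities before a job yields its thread
        activities = 5 // activities per job
    )
    // Three incoming requests -> three jobs waiting for a thread.
    queue := []*job{{id: 1}, {id: 2}, {id: 3}}

    for len(queue) > 0 {
        j := queue[0]
        queue = queue[1:] // the job gets a thread

        for i := 0; i < stepCount && j.done < activities; i++ {
            j.done++
            fmt.Printf("job %d executed activity %d\n", j.id, j.done)
        }
        if j.done < activities {
            queue = append(queue, j) // yields the thread and waits again
        } else {
            fmt.Printf("job %d completed\n", j.id)
        }
    }
}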
ThreadCount
The ThreadCount property specifies the number of threads a TIBCO BW engine can allocate. The default number of threads is eight.
The number of threads means the number of jobs that can be executed simultaneously in the engine. So the maximum number of jobs that can run concurrently in the engine is limited to the number of threads, that is, eight. This property specifies the size of the job thread pool, and is applied to all the AppNodes in the AppSpace if set at the AppSpace level.
A thread carries out a limited number of tasks or activities uninterrupted and then yields to the next job that is ready. Starting with the default value of eight, the thread count can be tuned to an optimum value, for example by doubling it until the maximum CPU utilization is reached.
StepCount
The StepCount property specifies the number of activities that are carried out by an engine thread, without interruption, before the engine thread yields to another job that is ready in the job pool. The default value of the step count is -1. When the value is set to -1, the engine decides the required StepCount value. A low StepCount value may degrade engine performance due to frequent thread switching, depending on the situation.

Suggestion for Oracle AQ dequeue approach

I have a need to dequeue messages coming from an Oracle Queue on a continuous basis.
As far as I can see, we can dequeue the messages in two ways: either through the asynchronous auto-notification approach, or by a manual polling process where one can dequeue one message at a time.
I can't go for the asynchronous notification feature, as the number of messages received could go up to 1,000 within 5 minutes during peak hours, and I do not want to overload the database by spawning multiple callback procedures in the background.
With the manual polling process, I can create a one-time scheduler job that runs 24*7 and calls a stored procedure that dequeues the messages in a loop in WAIT mode (kind of listening for a message).
The problem with this approach is that
1) the scheduler job runs continuously and occupies one permanent job slot
2) the stored procedure does not EXIT as it runs in a loop waiting for messages.
Are there any alternative/better solutions where I do not need to have a job/procedure running continuously looking for messages?
Can I use the auto-notification approach to get a notification for the very first message, unsubscribe the subscriber, dequeue further messages, and subscribe to the queue again when there are no more messages? Is this a safe approach, and will I lose any messages between subscription and unsubscription?
BTW, we use an Oracle 10gR2 database, so I can't use the PURGE ON NOTIFICATION option.
Appreciate your expert solution!!
You're right, it's not a good idea to use auto-notification for a high-volume queue.
At one client I've seen a one-time scheduler job that runs 24*7; it seems to work reasonably well, and they can enqueue a special "STOP" message (which goes to the top of the queue) that tells it to stop processing messages.
However, generally I'd lean towards a job that runs regularly (e.g. once per minute, or whatever granularity is suitable for you) which would dequeue all the messages. I'd put the dequeue in a loop with a loop counter and a "maximum messages" limiter based on the maximum number of messages you'd expect in a 1-minute period. The job would keep processing messages until (a) there are no more messages in the queue, or (b) the maximum limit has been reached.
You can then set the schedule for the job based on the maximum delay you want to see between an enqueue and a dequeue. E.g. if it doesn't matter if a message isn't processed within 5 minutes, you could set the job to run once every 5 minutes.
The maximum limit needs to be quite a high figure - e.g. 10x or 100x the expected maximum number - otherwise a spike could flood your queue and it might not keep up. The idea of the maximum limit is to ensure that the job never runs forever. This should give ops enough time to detect a problem with the queue (e.g. if some rogue process is flooding the queue with bogus messages).
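As a sketch of that bounded loop (written in Go purely to show the control flow; a real implementation would be a PL/SQL stored procedure, and dequeueOne/process below are hypothetical placeholders, not actual Oracle AQ calls):

// Bounded polling loop: dequeue until the queue is empty or a hard
// per-run limit is reached, so the job can never run forever.
package main

import "fmt"

// dequeueOne is a hypothetical stand-in for a dequeue call with a short or
// zero wait; ok is false when no message is available.
func dequeueOne() (msg string, ok bool) { return "", false }

func process(msg string) { fmt.Println("processing", msg) }

func drainQueue(maxMessages int) {
    for i := 0; i < maxMessages; i++ {
        msg, ok := dequeueOne()
        if !ok {
            break // queue is empty: stop until the next scheduled run
        }
        process(msg)
    }
}

func main() {
    // e.g. scheduled once per minute; maxMessages set to 10x-100x the
    // expected per-minute volume, as suggested above
    drainQueue(10000)
}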

Google App Engine Task Queue

I want to run 50 tasks. All these tasks execute the same piece of code; only the data differs. Which will complete faster?
a. Queuing up 50 tasks in one queue
b. Queuing up 5 tasks each in 10 different queues
Is there an ideal number of tasks that can be queued up in one queue before using another queue?
The rate at which tasks are executed depends on two factors: the number of instances your app is running on, and the execution rate of the queue the tasks are on.
The maximum task queue execution rate is now 100 per queue per second, so that's not likely to be a limiting factor, and there's no harm in adding them all to the same queue. In any case, sharding between queues for more execution rate is at best a hack. Queues are designed for functional separation, not as a performance measure.
The bursting rate of task queues is controlled by the bucket size. If there is a token in the queue's bucket the task should run immediately. So if you have:
queue:
- name: big_queue
  rate: 50/s
  bucket_size: 50
and haven't queued any tasks in the last second, all the tasks should start right away.
See http://code.google.com/appengine/docs/python/config/queue.html#Queue_Definitions for more information.
Splitting the tasks into different queues will not improve the response time unless the bucket hasn't had enough time to completely fill with tokens.
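For intuition, here is a minimal token-bucket model of the rate/bucket_size behaviour described above (illustrative only; this is not App Engine code and the types are made up):

// Minimal token-bucket model: up to size tokens accumulate, rate tokens are
// added back per second, and a task starts immediately if a token is free.
package main

import "fmt"

type bucket struct {
    tokens int
    size   int
    rate   int // tokens refilled per second
}

func (b *bucket) refill() { // called once per simulated second
    b.tokens += b.rate
    if b.tokens > b.size {
        b.tokens = b.size
    }
}

func (b *bucket) tryStart() bool {
    if b.tokens > 0 {
        b.tokens--
        return true
    }
    return false
}

func main() {
    b := &bucket{tokens: 50, size: 50, rate: 50} // full bucket, as after an idle second
    started := 0
    for i := 0; i < 50; i++ { // 50 tasks queued at once
        if b.tryStart() {
            started++
        }
    }
    fmt.Printf("%d of 50 tasks started immediately\n", started) // prints 50
    b.refill() // one second later the bucket is full again
}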
I'd add another factor into the mix: concurrency. If you have slow-running tasks (more than 30 seconds or so), then App Engine seems to struggle to scale up the correct number of instances to deal with the requests (it seems to max out at about 7-8 for me).
As of SDK 1.4.3, there are settings in your queue.xml and your appengine-web.xml you can use to tell App Engine that each instance can handle more than one task at a time:
<threadsafe>true</threadsafe> (in appengine-web.xml)
<max-concurrent-requests>10</max-concurrent-requests> (in queue.xml)
This solved all my problems with tasks executing too slowly (despite setting all the other queue params to the maximum).
More Details (http://blog.crispyfriedsoftware.com)
Queue up 50 tasks and set your queue to process 10 at a time, or whatever you would like, if they can run independently of each other. I saw a similar problem, and I just run 10 tasks at a time to process the 3,300 or so that I need to run. It takes 45 minutes or so to process all of them, but surprisingly the CPU time used is negligible.
