Difference between ThreadCount and StepCount in TIBCO BW Engine

Can anyone explain the difference between the StepCount and ThreadCount properties of the TIBCO BW engine? I tried to understand them from the TIBCO docs but could not.
So, if anyone can explain this, that would be great.
Thanks in advance.

The ThreadCount property defines the number of (Java) threads that execute all your processes. So with the default value of 8 threads you can run 8 jobs simultaneously.
The StepCount, on the other hand, defines the number of activities executed before a thread can context-switch to another job.
Sample scenario:
a process with 5 activities
ThreadCount is 2
StepCount is 4
If there are 3 incoming requests, the first two requests spawn one job each. The third job is spawned, but is paused because no thread is available.
After the first job completes its fourth activity, its thread is freed and can be assigned to another paused job.
So the first job is paused and the third job starts executing.
When the second job reaches its fourth activity, that thread is freed and becomes available for reassignment. So the second job pauses and the first resumes.
After the third job reaches its fourth activity, the thread is freed again and resumes job number one (and completes it). Afterwards job number 3 is completed.
All of this is a theoretical scenario. What you usually need to set is the number of concurrent jobs (so ThreadCount). The StepCount is close to irrelevant, because the engine takes care of the pooling and the mapping of physical threads to virtual BW jobs.
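If it helps to see the mechanics, here is a toy round-robin model of the scenario above (plain Java I wrote for illustration, not TIBCO code; a real engine's interleaving will differ in detail):

import java.util.ArrayDeque;
import java.util.Queue;

// Toy model: 3 jobs of 5 activities each, ThreadCount = 2, StepCount = 4.
public class StepCountDemo {
    public static void main(String[] args) {
        final int threadCount = 2, stepCount = 4, activitiesPerJob = 5;
        int[] done = new int[3];                  // activities completed per job
        Queue<Integer> ready = new ArrayDeque<>();
        for (int job = 0; job < 3; job++) ready.add(job);

        while (!ready.isEmpty()) {
            // At most threadCount jobs progress per round; each runs at most
            // stepCount activities before yielding its thread back.
            int running = Math.min(threadCount, ready.size());
            for (int t = 0; t < running; t++) {
                int job = ready.poll();
                int steps = Math.min(stepCount, activitiesPerJob - done[job]);
                done[job] += steps;
                System.out.printf("job %d ran %d activities (%d/%d done)%n",
                        job + 1, steps, done[job], activitiesPerJob);
                if (done[job] < activitiesPerJob) ready.add(job); // paused, requeued
            }
        }
    }
}

The output shows each job advancing in bursts of at most StepCount activities, with no more than ThreadCount jobs making progress at any one time.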

ThreadCount
The ThreadCount property states the number of threads a TIBCO BW engine can allocate. The default number of threads is eight.
The number of threads is the number of jobs that can be executed simultaneously in the engine. So the maximum number of jobs that can run concurrently in the engine is limited to the number of threads, that is, eight. This property specifies the size of the job thread pool and, if set at the AppSpace level, is applied to all the AppNodes in the AppSpace.
A thread carries out a limited number of tasks or activities uninterrupted and then yields to the next job that is ready. Starting from the default value of eight, the thread count can be tuned to an optimum value, for example by doubling it until the CPU limit is reached.
StepCount
The StepCount property states the number of activities that are executed by an engine thread, without interruption, before the engine thread is yielded to another job that is ready in the job pool. The default value of StepCount is -1. When the value is set to -1, the engine decides the required StepCount value. A low StepCount value may degrade engine performance due to frequent thread switching, depending on the situation.

Related

Does the Windows scheduler sometimes fail to preempt a running thread immediately to let a higher-priority thread run?

My application operates on pairs of long vectors - say it adds them together to produce a vector result. Its rules state that it must completely finish with one pair before it can be given another. I would like to use multiple threads to speed things up. I am running Windows 10.
I created an OpenMP parallel for construct and divided the vector among all the threads of the team. All threads start, all threads run pretty fast, so the multithreading is effective.
But the speedup is slight, and the reason is that some of the time, one of the worker threads takes way longer than usual. I have instrumented the operation, and I see that sometimes the worker threads take a long time to start - delay varies from 20 microseconds on average to dozens of milliseconds depending on system load. The master thread does not show this delay.
That makes me think that the scheduler is taking some time to start the worker threads. The master thread is already running, so it doesn't have to wait to be started.
But here is the nub of the question: raising the priority of the process doesn't make any difference. I can raise it to high priority or even realtime priority, and I still see that startup of the worker threads is often delayed. It looks like the Windows scheduler is not fully preemptive, and sometimes lets a lower-priority thread run when a higher-priority one is eligible. Can anyone confirm this?
I have verified that the worker threads are created with the default OS priority, namely the base priority of the class of the master process. This should be higher than the priority of any running thread, I think. Or is it normal for there to be some thread with realtime priority that might be blocking my workers? I don't see one with Task Manager.
I guess one last possibility is that the task switch might take 20 usec. Is that plausible?
I have a 4-core system without hyperthreading.
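The instrumentation described above amounts to something like the following (sketched in Java rather than OpenMP, so the absolute numbers will differ from my setup): timestamp just before dispatching work, timestamp as the first statement in each worker, and report the difference.

import java.util.concurrent.CountDownLatch;

// Minimal sketch of measuring worker start-up latency.
public class StartLatency {
    public static void main(String[] args) throws InterruptedException {
        final int workers = 3;                    // e.g. 4 cores minus the master
        final long[] started = new long[workers];
        CountDownLatch done = new CountDownLatch(workers);

        long dispatch = System.nanoTime();        // just before starting workers
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                started[id] = System.nanoTime();  // first statement in the worker
                done.countDown();
            }).start();
        }
        done.await();
        for (int i = 0; i < workers; i++)
            System.out.printf("worker %d start delay: %.1f us%n",
                    i, (started[i] - dispatch) / 1000.0);
    }
}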

Spring batch multithreading: throttle-limit impact

I have a multi-threaded Step configured with a thread pool with a corePoolSize of 48 threads (it's a big machine), but I did not configure the throttle-limit.
I am wondering if I have been under-utilizing the machine because of this.
The Spring Batch documentation says that throttle-limit is the maximum number of concurrent tasks that can run at one time, and that the default is 4.
I can see in jconsole that in fact there are 48 threads created and they seem to be executing (I can also see that in my logs).
But even though I can see the 48 threads created, does the throttle-limit of 4 mean that only 4 of those 48 threads are actually executing work concurrently?
Thank you in advance.
Yes, your understanding is correct, i.e. only as many threads as the throttle limit will be doing work concurrently.
In your case, since it's a thread pool, any four threads could be chosen to do the work and the rest of the threads will remain idle, but since threads get rotated across those four tasks, it gives the impression that all threads are working concurrently.
corePoolSize simply indicates the number of threads to be started and maintained during the job run, but that doesn't mean all of them are running concurrently; it means you avoid thread-creation overhead during the job run.
You have not shared any code or job structure, so it's hard to point out more specifics, but a typical configuration looks something like the sketch below.
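This is a minimal illustration only (hypothetical bean and step names; Spring Batch 4-style builder API), pairing the 48-thread pool with an explicit throttle limit:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Hypothetical names throughout; the point is pairing corePoolSize with
// an explicit throttle-limit, which otherwise defaults to 4.
@Configuration
public class MultiThreadedStepConfig {

    @Bean
    public Step multiThreadedStep(StepBuilderFactory steps,
                                  ItemReader<String> reader,
                                  ItemWriter<String> writer) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(48);   // threads created and kept alive
        executor.setMaxPoolSize(48);
        executor.afterPropertiesSet();  // initialize the underlying pool

        return steps.get("multiThreadedStep")
                .<String, String>chunk(100)
                .reader(reader)
                .writer(writer)
                .taskExecutor(executor)
                .throttleLimit(48)      // raise from the default of 4
                .build();
    }
}

With the XML namespace, the equivalent knob is the throttle-limit attribute on the tasklet element.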
Hope it helps !!

How to calculate the number of threads and loop count in JMeter

I am not able to find a specific answer on how to calculate the number of threads for running a load test in JMeter.
How do I determine the loop count?
Is there any formula?
What are the parameters to consider for the calculation?
Say you want to fire 100 requests at the server at 2 TPS. Then your thread properties should be as below:
Number of threads (users): 2
Ramp-up period: 100
Loop count: 50
Based on the above example, please find the explanation below, followed by a short worked calculation.
• Number of Threads (N): sets the number of threads JMeter will use to execute our test plan. Each thread executes the whole test plan, so this effectively represents the number of users that could be using the tested service simultaneously at any given time.
• Ramp-Up Period (R): specifies how much time (in seconds) it will take JMeter to start all the threads (simultaneous user connections). If the number of users is 5 and the ramp-up time is 10 seconds, each thread is started at a 2-second interval. We need to be careful when setting this value: if it is too high, the first thread will already have finished the whole test plan before the second thread even begins, which effectively reduces the number of concurrent users hitting the tested application at any given time. But the ramp-up period also needs to be high enough to avoid starting all of the threads at once, which could overload the target application.
• Loop Count (L): how many times each thread will loop through all configured elements belonging to its thread group.
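The worked calculation behind the example numbers (plain arithmetic, not JMeter API):

public class LoadPlanMath {
    public static void main(String[] args) {
        int threads = 2;          // Number of Threads (users)
        int loopCount = 50;       // Loop Count
        int rampUpSeconds = 100;  // Ramp-Up Period

        int totalRequests = threads * loopCount;                    // 2 * 50 = 100
        double startIntervalSec = (double) rampUpSeconds / threads; // 100 / 2 = 50

        System.out.println("total requests fired:  " + totalRequests);
        System.out.println("thread start interval: " + startIntervalSec + "s");
    }
}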
Hope it helps!

Hadoop FairScheduler doesn't utilize all map slots

I'm running a 12-node Hadoop cluster with 48 map slots available in total. I submit a bunch of jobs, but I never see all map slots being utilized. The maximum number of busy slots floats around 30-35, but never comes close to 48. Why?
Here's the configuration of the fair scheduler.
<?xml version="1.0"?>
<allocations>
  <pool name="big">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <pool name="medium">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
    <weight>3.0</weight>
  </pool>
  <pool name="small">
    <minMaps>20</minMaps>
    <minReduces>20</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>100.0</weight>
  </pool>
</allocations>
The idea is that jobs in the small queue should always have priority; the next most important queue is 'medium' and the least important is 'big'. Sometimes I see jobs in the medium or big queue starve even though there are unused map slots available.
I think the issue may be caused by the maxRunningJobs option not being taken into account while computing shares for jobs. I think that parameter is handled after slots (from the exceeding job) have already been assigned to a TaskTracker. That happens every n seconds from the UpdateThread.update() -> updateRunnability() method of the FairScheduler class. I suppose that in your case, after some time, jobs from the "medium" and "big" pools get a bigger deficit than jobs from the "small" pool, which means that the next task will be scheduled from a job in the medium or big pool. When the task is scheduled, the maxRunningJobs restriction takes effect and puts the exceeding jobs into a non-runnable state. The same thing happens on the following update.
This is just my guess after looking at some of the fair scheduler's source. If you can, I would try to remove maxRunningJobs from the config and see how the scheduler behaves without that limitation, and whether it takes all of your slots.
The weights for the pools seem too high in my opinion. A weight of 100 would mean that this pool should get 100x more slots than the default pool. I would lower this number by a few factors if you want fair sharing between your pools. Otherwise jobs from the other pools will be launched only when they meet their deficit (which is calculated from the running tasks and minShare).
Another reason why jobs are starving might be the delay scheduling that is included in the fair scheduler with the aim of improving computation locality. This can probably be improved by increasing the replication factor, but I do not think that is your case.
Some docs on the fair scheduler:
The starvation probably occurs because the weight of the small pool is vastly higher than the others (100, versus 3 for medium and the default of 1 for big), so when jobs are ordered by their share, waiting jobs in the small pool come first. The next job in that pool needs 20 slots, and it has higher priority than anything else, so the open slots just wait until a currently running job frees them; there are no "unneeded slots" left to divide among the other pools.
See these highlights from the implementation notes of the fair scheduler:
"The fair shares are calculated by dividing the capacity of the
cluster among runnable jobs according to a "weight" for each job. By
default the weight is based on priority, with each level of priority
having 2x higher weight than the next (for example, VERY_HIGH has 4x
the weight of NORMAL). However, weights can also be based on job sizes
and ages, as described in the Configuring section. For jobs that are
in a pool, fair shares also take into account the minimum guarantee
for that pool. This capacity is divided among the jobs in that pool
according again to their weights."
"Finally, when limits on a user's running jobs or a pool's running jobs are in place, we choose which jobs get to run by sorting all jobs in order of priority and then submit time, as in the standard Hadoop scheduler. Any jobs that fall after the user/pool's limit in this ordering are queued up and wait idle until they can be run. During this time, they are ignored from the fair sharing calculations and do not gain or lose deficit (their fair share is set to zero)."
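To make the weight arithmetic concrete, here is a toy division of the 48 map slots among the three pools by weight alone (plain arithmetic of my own; the real scheduler also factors in minShare, deficits, and running jobs):

// Toy fair-share calculation for the pools above (weights 1, 3, 100).
public class FairShareMath {
    public static void main(String[] args) {
        String[] pools  = {"big", "medium", "small"};
        double[] weight = {1.0, 3.0, 100.0};
        int capacity = 48;

        double total = 0;
        for (double w : weight) total += w;   // 104
        for (int i = 0; i < pools.length; i++)
            System.out.printf("%-6s -> %.1f slots%n",
                    pools[i], capacity * weight[i] / total);
        // big -> 0.5, medium -> 1.4, small -> 46.2: the small pool dominates,
        // so its waiting jobs sort ahead of everything else.
    }
}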

Google App Engine Task Queue

I want to run 50 tasks. All these tasks execute the same piece of code; only the data will differ. Which will be completed faster?
a. Queuing up 50 tasks in one queue
b. Queuing up 5 tasks each in 10 different queues
Is there an ideal number of tasks that can be queued up in one queue before using another queue?
The rate at which tasks are executed depends on two factors: the number of instances your app is running on, and the execution rate of the queue the tasks are on.
The maximum task queue execution rate is now 100 per queue per second, so that's not likely to be a limiting factor, and there's no harm in adding them all to the same queue. In any case, sharding between queues for more execution rate is at best a hack: queues are designed for functional separation, not as a performance measure.
The burst rate of task queues is controlled by the bucket size. If there is a token in the queue's bucket, the task should run immediately. So if you have:
queue:
- name: big_queue
  rate: 50/s
  bucket_size: 50
and haven't queued any tasks in the last second, all the tasks should start right away.
see http://code.google.com/appengine/docs/python/config/queue.html#Queue_Definitions for more information.
Splitting the tasks into different queues will not improve the response time unless a bucket hasn't had enough time to fill completely with tokens.
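The refill behaviour described above is a standard token bucket. A toy model (my own sketch, not App Engine's implementation) looks like this:

// Toy token bucket: rate 50/s, burst capacity 50, bucket starts full.
class TokenBucket {
    private final double ratePerSec;
    private final int bucketSize;
    private double tokens;
    private long lastRefill;

    TokenBucket(double ratePerSec, int bucketSize) {
        this.ratePerSec = ratePerSec;
        this.bucketSize = bucketSize;
        this.tokens = bucketSize;        // a full bucket allows an initial burst
        this.lastRefill = System.nanoTime();
    }

    // Returns true if a task may start now, consuming one token.
    synchronized boolean tryRun() {
        long now = System.nanoTime();
        tokens = Math.min(bucketSize,
                tokens + (now - lastRefill) / 1e9 * ratePerSec);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

public class BucketDemo {
    public static void main(String[] args) {
        TokenBucket q = new TokenBucket(50.0, 50);
        int started = 0;
        for (int i = 0; i < 50; i++) if (q.tryRun()) started++;
        System.out.println(started + " of 50 tasks started immediately");
    }
}

Splitting tasks across queues only helps while a bucket is empty; once tokens refill at 50/s, a single queue dispatches just as fast.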
I'd add another factor into the mix: concurrency. If you have slow-running tasks (more than 30 seconds or so), AppEngine seems to struggle to scale up the correct number of instances to deal with the requests (it seems to max out at about 7-8 for me).
As of SDK 1.4.3, there are settings in your queue.xml and your appengine-web.xml you can use to tell AppEngine that each instance can handle more than one task at a time:
<threadsafe>true</threadsafe> (in appengine-web.xml)
<max-concurrent-requests>10</max-concurrent-requests> (in queue.xml)
This solved all my problems with tasks executing too slowly (despite setting all the other queue parameters to the maximum).
More Details (http://blog.crispyfriedsoftware.com)
Queue up 50 tasks and set your queue to process 10 at a time, or whatever you would like, if they can run independently of each other. I saw a similar problem, and I just run 10 tasks at a time to process the 3300 or so that I need to run. It takes 45 minutes or so to process all of them, but the CPU time used is surprisingly negligible.
