Using 2 different scheduling policies in the same project - linux-kernel

In an embedded project, we are facing difficulties deciding which scheduling policy to use. For certain test cases to pass, we need to use SCHED_OTHER, and for some other test cases we need SCHED_RR. But if we set SCHED_RR for some tasks and the rest as SCHED_OTHER, all the test cases pass. Is this legal, and are there any additional side effects to using two policies in the same project?

I assume you are talking about Linux? Then yes, it is perfectly acceptable to have some tasks running with SCHED_RR and others running with SCHED_OTHER.
Note that SCHED_RR tasks will always get to run ahead of SCHED_OTHER tasks. So it is not surprising that your tests run better if you set your tasks to SCHED_RR. The thing to watch out for is that your SCHED_RR tasks might use 100% of the CPU and starve the SCHED_OTHER tasks. Maybe this is happening when you say some input is getting dropped.
Michael
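The mixed-policy setup described in the answer above can be inspected and changed from Python's stdlib on Linux (a minimal sketch; the priority value 10 is arbitrary, and raising a task to SCHED_RR needs root or CAP_SYS_NICE, so it is hedged):

```python
import os

# Most processes start out as SCHED_OTHER, the normal time-sharing policy.
policy = os.sched_getscheduler(0)   # 0 = the calling process
print(policy == os.SCHED_OTHER)

# Promoting this task to SCHED_RR is privileged, so we hedge the attempt:
try:
    os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(10))
    print("now running as SCHED_RR, priority 10")
except PermissionError:
    print("need root/CAP_SYS_NICE to set SCHED_RR; still SCHED_OTHER")
```

A real-time task set this way will preempt every SCHED_OTHER task in the system, which is exactly the starvation risk mentioned above.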

Related

How to control how many tasks to run per executor in PySpark [duplicate]

I don't quite understand the spark.task.cpus parameter. It seems to me that a "task" corresponds to a "thread" or a "process", if you will, within the executor. Suppose that I set spark.task.cpus to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" here. So where/how does Spark eventually allocate more than one cpu to a task in the standalone mode?
To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1 then you will have #spark.cores.max number of concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2, Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally, the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
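The arithmetic behind that contract can be sketched in plain Python (pure illustration; the function name is made up and this is not a Spark API):

```python
def concurrent_tasks(cores_max: int, task_cpus: int) -> int:
    """How many Spark tasks can run at once, given the two settings."""
    return cores_max // task_cpus

# spark.cores.max=10, spark.task.cpus=2 -> 5 concurrent tasks,
# and 5 tasks * 2 internal threads = 10 threads, never above the contract.
print(concurrent_tasks(10, 2))  # -> 5
print(concurrent_tasks(10, 1))  # -> 10 (the default, one core per task)
```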

Possible to parallellize SonarQube background tasks?

In SonarQube (5.6.4 LTS) there is a view where background (project analysis) tasks are visualized: (Administration / Projects / Background Tasks). It seems like the tasks are run in sequence (one at a time). Some tasks could take 40 minutes which means other projects are queued up waiting for this task to finish before they could be started.
Is it possible to configure the SonarQube Compute Engine so that these tasks are run in parallel instead?
As per documentation on Background Tasks:
You can control the number of Analysis Reports that can be processed at a time in $SQ_HOME/conf/sonar.properties (see sonar.ce.workerCount - Default is 1).
Careful though: blindly increasing sonar.ce.workerCount without proper monitoring is just like shooting in the dark. The underlying resources available (CPU/RAM) are fixed (all workers run in the Compute Engine JVM), and you don't want to end up with very limited memory for each task and/or heavy CPU context switching. That would kill performance for every task, whereas running only a few in parallel is much more efficient.
In short: better to have maximum 2 tasks in parallel that can complete under a minute (i.e. max 10 minutes to run 20 tasks), rather than 20 sluggish tasks in parallel that will overall take 15 minutes to complete because they struggle to share common CPU/RAM.
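Concretely, the knob described above is a single line in sonar.properties (the value here is illustrative; monitor CPU/RAM before raising it):

```properties
# $SQ_HOME/conf/sonar.properties
# Number of Compute Engine workers processing analysis reports in parallel.
# Default is 1; increase cautiously, all workers share the CE JVM's resources.
sonar.ce.workerCount=2
```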
Update: with SonarQube 6.7+ and the new licence plans, "parallel processing of reports" has become a commercial feature and is only available in the Enterprise Edition.

Testing CPU Scheduling

How can I test a CPU scheduling algorithm (for example, RR)?
As you know, an operating system includes its own processes which run on the CPU. However, I want to do it in a pure environment without any other processes and just with the P1, P2, and P3 processes that I have made.
Is there any simulation environment for testing CPU scheduling algorithms?
Edited:
PART 1: For example, how does a company like Microsoft, or a university, test CPU scheduling algorithms and see the results? I want to see those results.
PART 2: Is there any simulation environment for doing this?
When we have an OS (Windows, Linux), there are processes that belong to the OS, but I want to do it in a pure environment.
I don't know whether my idea is right or not; please tell me if I'm making a mistake in how I test the CPU scheduling algorithm.
How can I implement it? So far I have only done it on paper.
The CPU scheduler, a.k.a. the task/process scheduler, is inside the kernel on Linux systems. So one way to compare two different task schedulers is to build the same kernel with each scheduler and compare runs of the same workload or application. The default scheduler in Linux is CFS (the Completely Fair Scheduler). There are several other schedulers, for example real-time schedulers, BFS, and others. RR (Round Robin) is just the method for choosing the next task to schedule after one task is preempted.
Here is more info about Tuning the Task Scheduler
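If all you need is to check paper exercises rather than a real kernel, a user-space simulation is often enough. A minimal Round-Robin sketch (process names and burst times are made up to match the P1, P2, P3 of the question):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate RR scheduling.

    bursts maps process name -> CPU burst (time units).
    Returns a dict of completion times, in order of completion.
    """
    queue = deque(bursts.items())
    time = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)   # run for one quantum, or less if done
        time += run
        remaining -= run
        if remaining:
            queue.append((name, remaining))  # preempted: back of the queue
        else:
            finish[name] = time
    return finish

print(round_robin({"P1": 5, "P2": 3, "P3": 2}, quantum=2))
# -> {'P3': 6, 'P2': 9, 'P1': 10}
```

Swapping the body of the loop for a different pick-next rule (shortest remaining time, priorities, ...) lets you compare algorithms on identical workloads, which is essentially what the kernel-rebuild approach above does at full scale.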

Performance test - approach

I have User Registration, Flight Search, Book Tickets modules in my application. I have created my JMeter test & I have different thread groups for each module in my test. I verified & it works well.
Thread Group 1: XX number of users - access the site - click on registration, enter the details & register (these steps loop again and again)
Thread Group 2: XX number of users - access the site - login - search for flights (these steps loop again and again)
Thread Group 3: XX number of users - access the site - login - book a ticket (these steps loop again and again)
Issue:
My manager says we need to run all modules (all thread groups) together with the appropriate number of users, as that is how it is going to be in production. Even though I can run them all together, in case of issues I would not know which feature of the application caused the problem.
My aim is to run each module separately & find its performance. I think that doing it module-wise would be the correct approach to get response times, resource utilization, etc.
Clarify:
I do not have much experience in performance testing. What is the correct approach? How do you do your tests for your application?
If I have to find the server's optimal load (at which it performs best), what should my approach be?
Intentionally tagging loadrunner as this question is not specific to JMeter & it is generic.
If your goal is to represent human behavior to assess the risk of deployment then testing each business process atomically will not accomplish your goal.
You appear to be engaging in a process that is more appropriately termed performance unit testing. This is very common with developers (as differentiated from performance testers) who seek to qualify the performance of an individual business process across some number of users. These tests are also typically characterized by non-normal think times (often eliminated altogether), small data sets, smaller-than-useful test environments, and extremely short test durations, such as 5-15 minutes.
You can mark these business scenarios as transactions; that is, the HTTP requests for each module get grouped, e.g. login requests in one group or transaction, flight search as another, and similarly for booking tickets. By following this you will be testing in an integrated manner, and it will be a production-like scenario too. After your run, thanks to the grouping, you can easily find out which group of requests is taking more time (search, book tickets, etc.). This way you get accurate performance statistics while still achieving the production-like scenario.
The approach really depends on what your goal for the testing exercise is. If you're looking to optimize or profile a particular module, it makes sense to test it in isolation.
However, if you're trying to check if your server scales, or if you have enough capacity, you should test all your modules at once, at or above your expected load levels.
A counter example to your isolated approach:
Say you have two modules, A and B. They are both CPU intensive and each takes up 80% CPU when you run it. You first test A: it uses 80% CPU, you have 20% to spare, and it performs fine. Now you test B alone; same result.
Now you go to production, users try to use both module A and module B, both try to use 80% CPU, and suddenly you don't have sufficient CPU and your performance suffers.
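The counter example boils down to a capacity check that holds per module but fails in aggregate (illustrative numbers from the example above):

```python
# CPU demand measured for each module in isolation, as fractions of one CPU.
demand = {"A": 0.80, "B": 0.80}
capacity = 1.00

fits_alone = all(d <= capacity for d in demand.values())
fits_together = sum(demand.values()) <= capacity
print(fits_alone, fits_together)  # -> True False
```

Each isolated test passes (0.80 <= 1.00), but the combined demand of 1.60 exceeds capacity, which is exactly why the isolated results did not predict production behavior.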
I know this is kind of late but still.
"I do not have much experience in performance testing. What is the correct approach / How do you do your test for your application?"
As James mentioned, the approach to conducting a performance test in a normal scenario would be to run all the critical business flows at the same time and not in an isolated fashion.
In order to identify issues, we would group the requests under transactions and name the business flows appropriately. This will help in identifying which requests have failed and which feature/portion of the application is at fault.
Running them individually will not provide you more insights simply because, a load testing tool will only be able to confirm the presence of a bottleneck but not the root cause irrespective of the number of business flows involved.
"If I have to find the server's optimal load (at which it performs better) - what should my approach be?"
In order to identify the optimal load for the server, it is mandatory to run all the scripts together as the end users are going to access the application (all critical scenarios) concurrently and not in a modularized manner.

Prevent execution of non-SGE programs

From the point of view of the system administration of an SGE node, is it possible to force users to run long-running programs through qsub instead of running them stand-alone?
The problem is that the same machine is acting as the control node and the computation node. So, I can't distinguish a long-running program from a user who is compiling with "gcc". Ideally, I would like to force users to submit long-running jobs (i.e., more than an hour) through qsub. I don't even mind being a bit mean and killing jobs that have run longer than an hour but weren't submitted through qsub.
Until now, all that I can do is send e-mails out asking users to "Please use qsub!"...
I've looked through the SGE configuration and nothing seems relevant. But maybe I've just missed something...any help would be appreciated! Thanks!
I'm a little confused about your setup, but I'm assuming users are submitting jobs by logging into what is also a computation node. Here are some ideas, best to worst:
Obviously, the best thing is to have a separate control node for users.
Barring that, run a resource-limited VM as the control node.
Configure user-level resource limits (e.g. ulimit) on the nodes. You can restrict CPU time, memory, and process usage, which are probably what you care about rather than wall-clock time.
It sounds like the last one may be best for you. It's not hard, either.
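The last idea maps directly onto POSIX resource limits, which you can also exercise per-process from Python's stdlib (a sketch; the one-hour value mirrors the question and is otherwise arbitrary):

```python
import resource

# Current CPU-time limit for this process (CPU seconds, not wall-clock time).
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)

# Lower the soft limit: the kernel sends SIGXCPU once the process has burned
# that much CPU time. A process can only lower its own limits; only root can
# raise the hard limit, which is what makes this useful for enforcement.
one_hour = 60 * 60
new_soft = one_hour if hard == resource.RLIM_INFINITY else min(one_hour, hard)
resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))
print(resource.getrlimit(resource.RLIMIT_CPU)[0])
```

For persistent per-user limits enforced at login on the node itself, the same limits are usually configured in /etc/security/limits.conf via pam_limits, which keeps users from simply resetting them.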
