Limiting the CPU utilization of a user in Windows

I want to know if there is any way to limit CPU usage per user name in Windows. For example, with 8 cores, I want to limit the global CPU usage of a user to 6 cores, so he cannot run more than 6 serial jobs (each using one core).
In Linux this can be done via scripting, but I haven't seen anything similar even with PowerShell scripts. Does that mean it cannot be done?

The keyword for this is affinity.
The affinity mask is a bitmap in which bit 0 (the least significant bit) is the first core:
00000001 = first core
00000010 = second core
00000011 = first and second core
00000100 = third core
00000101 = first and third core
00000111 = first, second and third core
function Set-Affinity([string]$Username, [int[]]$Core) {
    # Build the bitmask: each selected core contributes 2^core.
    [int]$affinity = 0
    $Core | ForEach-Object { $affinity += [math]::Pow(2, $_) }

    # -IncludeUserName requires an elevated (administrator) session.
    Get-Process -IncludeUserName |
        Where-Object { $_.UserName -eq $Username } |
        ForEach-Object { $_.ProcessorAffinity = $affinity }
}
Set-Affinity -Username "TESTDOMAIN\TESTUSER" -Core 0,1,2,3
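For the 8-core scenario in the question, a sketch using the function above: restricting the user to the first 6 cores gives the mask 2^0 + 2^1 + ... + 2^5 = 63 (binary 00111111). Note that ProcessorAffinity only affects processes that already exist, so a scheduled re-run would be needed to catch newly started processes:
# Hypothetical usage: confine the user's current processes to cores 0-5 (mask 63).
Set-Affinity -Username "TESTDOMAIN\TESTUSER" -Core 0,1,2,3,4,5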

Related

Ignite Off heap Tiered doesn't work

I am using Ignite's Data Grid and wanted to test the off-heap tiered mode. I have 1 server and 1 client in the grid, on different machines. Here are the steps that I follow to create the cache:
Start the server on one node.
Start the client on another node (using the discovery SPI to connect to the server), create a cache along with a near cache, and load 10,000 entries into the cache.
The cache memory mode is OFFHEAP_TIERED and the off-heap memory limit is set to zero using CacheConfiguration#setOffHeapMaxMemory.
Open the Ignite CLI (visor) and check the number of entries stored off heap and the ones stored on heap.
The strange thing I encounter is that not even a single entry is stored off heap. The visor shows all the entries on the client and on the server being stored on heap. But if I do not use a near cache, all the entries are stored off heap.
I want to know whether this is a problem with the statistics shown by the visor, or whether Ignite stores entries differently when a near cache is enabled.
This is my client-side code:
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class IgniteClient {
    public static void main(String[] args) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        // IP has not been shown intentionally
        ipFinder.setAddresses(Arrays.asList("*.*.*.*"));
        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        spi.setIpFinder(ipFinder);

        IgniteConfiguration icfg = new IgniteConfiguration();
        icfg.setDiscoverySpi(spi);
        icfg.setMetricsUpdateFrequency(-1);
        icfg.setClientMode(true);
        Ignite grid = Ignition.start(icfg);

        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<Integer, String>();
        ccfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);
        ccfg.setOffHeapMaxMemory(0);
        ccfg.setName("data");
        NearCacheConfiguration<Integer, String> ncfg = new NearCacheConfiguration<>();
        ncfg.setNearStartSize(1000);
        IgniteCache<Integer, String> dataCache = grid.getOrCreateCache(ccfg, ncfg);

        for (int i = 1; i <= 10000; i++) {
            dataCache.put(i, Integer.toString(i));
        }
        System.out.println("The entries in data cache are " + dataCache.size(CachePeekMode.ALL));
    }
}
This is my server-side code:
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class IgniteMain {
    public static void main(String[] args) {
        IgniteConfiguration icfg = new IgniteConfiguration();
        icfg.setMetricsUpdateFrequency(-1);
        Ignite grid = Ignition.start(icfg);
    }
}
This is the output of the command 'cache' in the Ignite visor, which is running on the client machine:
Time of the snapshot: 01/28/17, 18:23:41
+===================================================================================================================+
| Name(#) | Mode | Nodes | Entries (Heap / Off heap) | Hits | Misses | Reads | Writes |
+===================================================================================================================+
| data(#c0) | PARTITIONED | 2 | min: 10000 (10000 / 0) | min: 0 | min: 0 | min: 0 | min: 0 |
| | | | avg: 10000.00 (10000.00 / 0.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
| | | | max: 10000 (10000 / 0) | max: 0 | max: 0 | max: 0 | max: 0 |
+-------------------------------------------------------------------------------------------------------------------+
As you can see, the visor shows that all the entries are on the heap and none of them are stored off heap.
Also, if I create and load the cache from the server and then start the client (which does nothing), all the entries are stored off heap.
To add to this, there is other behavior which might shed more light:
If you start another server node after the steps above, the new server node stores the cache entries in off-heap memory (assuming a backup is configured).
When you run the client again to clear the existing cache and add the data again, then on the servers part of the data is on heap and part off heap.
I investigated, and Ignite does work this way, as you have seen.
You can track this issue for a fix: https://issues.apache.org/jira/browse/IGNITE-4662
Or do not use a near cache, as sketched below.
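A minimal sketch of that workaround (my illustration, reusing the ccfg from the client code above): create the cache without the near-cache configuration, and the OFFHEAP_TIERED entries should then land off heap, matching the behavior described in the question:
// Workaround sketch: omit the NearCacheConfiguration argument.
// With the same ccfg (OFFHEAP_TIERED, off-heap limit 0), the entries
// should then be stored off heap, as in the no-near-cache case.
IgniteCache<Integer, String> dataCache = grid.getOrCreateCache(ccfg);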

sched_wakeup_granularity_ns, sched_min_granularity_ns and SCHED_RR

The following values are from my box:
sysctl -A | grep "sched" | grep -v "domain"
kernel.sched_autogroup_enabled = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 18000000
kernel.sched_migration_cost_ns = 5000000
kernel.sched_min_granularity_ns = 10000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 3000000
This means that in each one-second period, 0.95 seconds are for SCHED_FIFO or SCHED_RR, and only 0.05 seconds are reserved for SCHED_OTHER. What I am curious about is sched_wakeup_granularity_ns. I googled it and got this explanation:
Ability of tasks being woken to preempt the current task.
The smaller the value, the easier it is for the task to force the preemption.
I think sched_wakeup_granularity_ns only affects SCHED_OTHER tasks; SCHED_FIFO and SCHED_RR tasks should not be in sleep mode, so they have no need to "wake up". Am I correct?
And for sched_min_granularity_ns, the explanation is:
Minimum preemption granularity for processor-bound tasks.
Tasks are guaranteed to run for this minimum time before they are preempted
I would like to know: although SCHED_RR tasks can have 95% of the CPU time, since sched_min_granularity_ns = 10000000, i.e. 0.01 second, does every SCHED_OTHER task get a 0.01-second timeslice before it can be preempted, unless it blocks (on a socket recv, a sleep, or the like)? That would imply that if I have 3 tasks on core 1, for example, 2 tasks with SCHED_RR and the third with SCHED_OTHER, and the third task runs an endless loop without a blocking socket recv and without yielding, then once the third task gets the CPU it will run for 0.01 second and only then be context-switched out, even if the next task is a SCHED_RR task.
Is that the right understanding of sched_min_granularity_ns?
Edit:
http://lists.pdxlinux.org/pipermail/plug/2006-February/045495.html describes:
No SCHED_OTHER process may be preempted by another SCHED_OTHER process.
However a SCHED_RR or SCHED_FIFO process will preempt SCHED_OTHER
process before their time slice is done. So a SCHED_RR process
should wake up from a sleep with fairly good accuracy.
Does this mean a SCHED_RR task can preempt the endless, never-blocking loop even before its time slice is done?
Tasks with a higher scheduling class "priority" will preempt all tasks with a lower priority scheduling class, regardless of any timeouts. Take a look at the below snippet from kernel/sched/core.c:
void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
        const struct sched_class *class;

        if (p->sched_class == rq->curr->sched_class) {
                rq->curr->sched_class->check_preempt_curr(rq, p, flags);
        } else {
                for_each_class(class) {
                        if (class == rq->curr->sched_class)
                                break;
                        if (class == p->sched_class) {
                                resched_curr(rq);
                                break;
                        }
                }
        }

        /*
         * A queue event has occurred, and we're going to schedule. In
         * this case, we can save a useless back to back clock update.
         */
        if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
                rq_clock_skip_update(rq, true);
}
for_each_class will return the classes in this order: stop, deadline, rt, fair, idle. The loop will stop when trying to preempt a task with the same scheduling class as the preempting task.
So for your question, the answer is yes, an "rt" task will preempt a "fair" task.
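To see this in practice, here is a minimal sketch of my own (not from the kernel source; it assumes Linux, glibc, and root privileges): it pins a SCHED_OTHER busy loop and a SCHED_RR thread onto the same core and measures how quickly the RR thread resumes after a 100 ms sleep. If the answer above is right, the wakeup overshoot should stay in the microsecond range even though the spinner never blocks or yields:
/* demo.c - build with: gcc -O2 demo.c -pthread ; run as root.
 * A SCHED_OTHER spinner and a SCHED_RR waker share CPU 0; the RR task
 * should preempt the fair-class spinner as soon as its sleep expires. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

static void pin_to_cpu0(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *spinner(void *arg) {
    pin_to_cpu0();
    for (;;)                    /* endless SCHED_OTHER loop, never blocks */
        ;
    return NULL;
}

static void *rr_waker(void *arg) {
    struct sched_param sp = { .sched_priority = 10 };
    pin_to_cpu0();
    pthread_setschedparam(pthread_self(), SCHED_RR, &sp);

    for (int i = 0; i < 5; i++) {
        struct timespec before, after, ts = { 0, 100 * 1000 * 1000 };
        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&ts, NULL);   /* sleep 100 ms, then need the CPU back */
        clock_gettime(CLOCK_MONOTONIC, &after);

        long overshoot_us = (after.tv_sec - before.tv_sec) * 1000000L
                          + (after.tv_nsec - before.tv_nsec) / 1000L
                          - 100000L;
        printf("RR wakeup overshoot: %ld us\n", overshoot_us);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, spinner, NULL);
    pthread_create(&t2, NULL, rr_waker, NULL);
    pthread_join(t2, NULL);     /* exiting main also ends the spinner */
    return 0;
}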

Analyzing readdir() performance

It's bothering me that Linux takes so long to list all files in huge directories, so I created a little test program that recursively lists all files of a directory:
#include <stdio.h>
#include <string.h>
#include <dirent.h>

int list(char *path) {
    int i = 0;
    DIR *dir = opendir(path);
    struct dirent *entry;
    char new_path[1024];

    if (!dir)   /* unreadable directory, e.g. permission denied */
        return 0;

    while ((entry = readdir(dir))) {
        if (entry->d_type == DT_DIR) {
            if (entry->d_name[0] == '.')   /* skips "." and ".." (and hidden dirs) */
                continue;
            strcpy(new_path, path);
            strcat(new_path, "/");
            strcat(new_path, entry->d_name);
            i += list(new_path);
        }
        else
            i++;
    }
    closedir(dir);
    return i;
}

int main() {
    char *path = "/home";
    printf("%i\n", list(path));
    return 0;
}
When compiled with gcc -O3, the program runs for about 15 seconds (I ran the program a few times and it's approximately constant, so the fs cache should not play a role here):
$ /usr/bin/time -f "%CC %DD %EE %FF %II %KK %MM %OO %PP %RR %SS %UU %WW %XX %ZZ %cc %ee %kk %pp %rr %ss %tt %ww %xx" ./a.out
./a.outC 0D 0:14.39E 0F 0I 0K 548M 0O 2%P 178R 0.30S 0.01U 0W 0X 4096Z 7c 14.39e 0k 0p 0r 0s 0t 1692w 0x
So it spends about S = 0.3 sec in kernel space and U = 0.01 sec in user space, and has 7 + 1692 context switches.
At about 2000 nsec per context switch, that accounts for 2000 nsec * (7 + 1692) ≈ 3.4 msec [1].
However, there are more than 10 seconds left, and I would like to find out what the program is doing in this time.
Are there any other tools to investigate what the program is doing during all that time?
gprof just tells me the time for the (userspace) call graph, and gcov does not list the time spent in each line, only how often a line is executed...
[1] http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
oprofile is a decent sampling profiler which can profile both user- and kernel-mode code.
According to your numbers, however, approximately 14.5 seconds of the time is spent asleep, which is not really registered well by oprofile. What may be more useful is ftrace combined with a reading of the kernel code. ftrace provides trace points in the kernel which can log a message and stack trace whenever they are hit. The event that seems most useful for determining why your process is sleeping is the sched_switch event.
I would recommend that you enable kernel-mode stacks and the sched_switch event, set a buffer large enough to capture the entire lifetime of your process, then run your process and stop tracing immediately afterwards. By reviewing the trace, you will be able to see every time your process went to sleep, whether it was runnable or non-runnable, a high-resolution timestamp, and a call stack indicating what put it to sleep.
ftrace is controlled through debugfs. On my system, this is mounted in /sys/kernel/debug, but yours may be different. Here is an example of what I would do to capture this information:
# Enable stack traces
echo "1" > /sys/kernel/debug/tracing/options/stacktrace
# Enable the sched_switch event
echo "1" > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
# Make sure tracing is enabled
echo "1" > /sys/kernel/debug/tracing/tracing_on
# Run the program and disable tracing as quickly as possible
./your_program; echo "0" > /sys/kernel/debug/tracing/tracing_on
# Examine the trace
vi /sys/kernel/debug/tracing/trace
The resulting output will have lines which look like this:
# tracer: nop
#
# entries-in-buffer/entries-written: 22248/3703779 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] d..3 2113.437500: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:0 next_pid=878 next_prio=120
<idle>-0 [000] d..3 2113.437531: <stack trace>
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> rest_init
=> start_kernel
kworker/0:0-878 [000] d..3 2113.437836: sched_switch: prev_comm=kworker/0:0 prev_pid=878 prev_prio=120 prev_state=S ==> next_comm=your_program next_pid=898 next_prio=120
kworker/0:0-878 [000] d..3 2113.437866: <stack trace>
=> __schedule
=> schedule
=> worker_thread
=> kthread
=> ret_from_fork
The lines you will care about are those where your program appears as the prev_comm task, meaning the scheduler is switching away from your program to run something else. prev_state will indicate whether your program was still runnable (R) or was blocked (S, U or some other letter; see the ftrace source). If it was blocked, you can examine the stack trace and the kernel source to figure out why.
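The buffer will contain far more than just your process, so here is a quick sketch for pulling out only the relevant switches and their stack traces (assuming the binary is still called a.out, so its comm is "a.out"):
grep -A 8 "prev_comm=a.out" /sys/kernel/debug/tracing/trace   # adjust prev_comm to your program's name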

What are the contents of the cache after the loop?

A computer uses a small direct-mapped cache between the main memory and the
processor. The cache has four 16-bit words, and each word has an associated 13-bit tag,
as shown in Figure (a). When a miss occurs during a read operation, the requested
word is read from the main memory and sent to the processor. At the same time, it is
copied into the cache, and its block number is stored in the associated tag. Consider the
following loop in a program where all instructions and operands are 16 bits long:
LOOP: Add (R1)+,R0
Decrement R2
BNE LOOP
    <-13 bits-> <--16 bit-->
  0 |   TAG    |    DATA    |
  2 |          |            |
  4 |          |            |
  6 |__________|____________|
        (a) Cache

        .
        .
    | A03C | <--- address 054E
    | 05D9 |
    | 10D7 |
        .
        .
    (b) Main memory
Assume that, before this loop is entered, registers R0, R1, and R2 contain 0, 054E,
and 3, respectively. Also assume that the main memory contains the data shown in
Figure (b), where all entries are given in hexadecimal notation. The loop starts at
location LOOP = 02EC.
(a) Show the contents of the cache at the end of each pass through the loop.
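No full solution is given here, but a sketch of my own can help start the trace. It assumes byte-addressed 16-bit addresses and one-word (two-byte) cache blocks, so an address splits into a 13-bit tag, a 2-bit cache-word index and a 1-bit byte offset. Since all instructions and operands are one word, the loop occupies 02EC, 02EE and 02F0, and the three passes fetch operands 054E, 0550 and 0552 through (R1)+:
/* Hedged helper (my own, not part of the problem): decode where each
 * address involved in the loop lands in the 4-word direct-mapped cache.
 * Assumed split: tag(13) | cache word index(2) | byte offset(1). */
#include <stdio.h>

int main(void) {
    /* Instruction words from LOOP = 0x02EC and operands from 0x054E. */
    unsigned addrs[] = { 0x02EC, 0x02EE, 0x02F0, 0x054E, 0x0550, 0x0552 };

    for (int i = 0; i < 6; i++) {
        unsigned a = addrs[i];
        printf("addr %04X -> cache word %u, tag %04X\n",
               a, (a >> 1) & 0x3u, a >> 3);
    }
    return 0;
}
Under these assumptions, operand 054E maps to the same cache word as the instruction at 02EE, and 0550 to the same word as 02F0, so the instruction and operand fetches evict each other; tracing those evictions pass by pass gives the requested cache contents.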

Change affinity of process with windows script

In Windows, with
START /node 1 /affinity ff cmd /C "app.exe"
I can set the affinity of app.exe (the set of cores used by app.exe).
With a Windows script, how can I change the affinity of a running process?
PowerShell can do this task for you
Get Affinity:
PowerShell "Get-Process app | Select-Object ProcessorAffinity"
Set Affinity:
PowerShell "$Process = Get-Process app; $Process.ProcessorAffinity=255"
Example: (8 Core Processor)
Core # = Value = BitMask
Core 1 = 1 = 00000001
Core 2 = 2 = 00000010
Core 3 = 4 = 00000100
Core 4 = 8 = 00001000
Core 5 = 16 = 00010000
Core 6 = 32 = 00100000
Core 7 = 64 = 01000000
Core 8 = 128 = 10000000
Just add the decimal values together for the cores you want to use; for example, cores 1, 3 and 4 give 1 + 4 + 8 = 13, and all 8 cores give 255:
All Cores = 255 = 11111111
Example Output:
C:\>PowerShell "Get-Process notepad++ | Select-Object ProcessorAffinity"
ProcessorAffinity
-----------------
255
C:\>PowerShell "$Process = Get-Process notepad++; $Process.ProcessorAffinity=13"
C:\>PowerShell "Get-Process notepad++ | Select-Object ProcessorAffinity"
ProcessorAffinity
-----------------
13
C:\>PowerShell "$Process = Get-Process notepad++; $Process.ProcessorAffinity=255"
C:\>
Source:
Here is a nicely detailed post on how to change a process's affinity:
http://www.energizedtech.com/2010/07/powershell-setting-processor-a.html
The accepted answer works, but only for the first process in the list. The solution to that in the comments does not work for me.
To change affinity of all processes with the same name use this:
Powershell "ForEach($PROCESS in GET-PROCESS processname) { $PROCESS.ProcessorAffinity=255}"
Where 255 is the mask as given in the accepted answer.
For anyone else looking for answers to this and not finding any, the solution I found was to use an app called WinAFC (or AffinityChanger). This is a partial GUI, partial command line app that allows you to specify profiles for certain executables, and will poll the process list for them. If it finds matching processes, it will change the affinity of those processes according to the settings in the loaded profile.
There is some documentation here: http://affinitychanger.sourceforge.net/
For my purposes, I created a profile that looked like this:
TestMode = 0
TimeInterval = 1
*\convert.exe := PAIR0+PAIR1
This profile sets any convert.exe process to use the first two CPU core pairs (CPU0, CPU1, CPU2, and CPU3), polling every second. TestMode is a toggle that allows you to see if your profile is working without actually setting affinities.
Hope someone finds this useful!
If you really like enums, you can do it this way. ProcessorAffinity is an IntPtr, so it takes a little extra type casting.
[Flags()] enum Cores {
    Core1 = 0x0001
    Core2 = 0x0002
    Core3 = 0x0004
    Core4 = 0x0008
    Core5 = 0x0010
    Core6 = 0x0020
    Core7 = 0x0040
    Core8 = 0x0080
}

$a = Get-Process notepad
[Cores][int]$a.ProcessorAffinity
Core1, Core2, Core3, Core4
$a.ProcessorAffinity = [int][Cores]'Core1,Core2,Core3,Core4'
wmic process where name="some.exe" call setpriority ProcessIDLevel
I think these are the priority levels: 64 (idle), 16384 (below normal), 32 (normal), 32768 (above normal), 128 (high priority) and 256 (realtime); substitute one of them for ProcessIDLevel. You can also use the PID instead of the process name. Note that this changes the process priority, not its affinity.
