Can an idle server (Unix or Windows), with at least a Chef client and a Zabbix agent running on it, have absolutely zero CPU utilization, or is it always using at least something, for example 0.001%?
To achieve zero CPU utilization you need to shut the machine down.
These are the ways:
Manually shut it down (on Windows, from the Start menu).
Create a batch file that invokes the shutdown command to do the same.
Pull the power cable out (not recommended).
Most CPUs have a Halt instruction, or its moral equivalent. When the Halt instruction is executed, the CPU stops running its normal fetch-decode-execute loop.
The only thing that can bring the CPU out of this state is the delivery of an external interrupt (from some other piece of hardware), which will cause the CPU to perform its normal interrupt processing. What it does after that is up to the operating system.
As an example, a system may arrange for its network cards to cause interrupts and then Halt. Most modern, everyday OSes will not do this, since they have plenty of background tasks that they can schedule when the CPU would otherwise be idle.
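Not part of the original answers, but here is a minimal Go sketch of the difference this makes in practice: a process that busy-waits never lets the kernel halt the CPU, while one that blocks is descheduled so the CPU can sit in its halted state until the next interrupt. The busy flag is just a toggle for experimenting.

```go
// A minimal sketch (not from the answers above): a process that busy-waits
// keeps a core pegged, while a process that blocks is descheduled so the
// kernel can halt the CPU until the next interrupt (e.g. the sleep timer).
package main

import (
	"fmt"
	"time"
)

func main() {
	busy := false // flip to true to watch one core sit at ~100%

	if busy {
		for {
			// Busy-waiting: the scheduler always has runnable work,
			// so the CPU never gets a chance to halt.
		}
	}

	// Blocking: the process is descheduled; with nothing else runnable the
	// kernel can halt the CPU until the timer interrupt wakes it up.
	time.Sleep(10 * time.Second)
	fmt.Println("slept for 10s without burning CPU")
}
```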
Depends on the definition of CPU utilization.
At the CPU level, utilization can't be zero unless the computer is off, which is intuitive of course: as long as the machine is on, some process is always running, even if it is only the idle task; that is simply how the architecture works.
Normally though, at the OS level, CPU utilization is calculated by the OS, and it's probable that each OS has its own implementation of the calculation algorithm.
There is a certain set of tasks known as idle tasks, which run when there are no non-idle processes to run. On Linux, the idle task is process 0, created at boot.
Utilization over a period of time is the percentage of that time during which non-idle processes ran. Over 10 ms, if the CPU was idle for 5 ms, utilization is 50%.
Simply put, 0% utilization means the CPU is running but just waiting for other tasks to be assigned, executing the default idle tasks in the meantime. It is also possible for some minor non-idle tasks to be running at something like 0.0001% utilization, which gets rounded down to zero.
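To make the "percentage of time non-idle processes were run" definition concrete, here is a rough Linux-only sketch (not from the original answer) that derives utilization from two samples of /proc/stat, assuming the usual field layout:

```go
// Rough sketch of how "CPU utilization over an interval" can be computed on
// Linux from /proc/stat: the fraction of jiffies that were *not* spent in the
// idle (and iowait) states between two samples. Errors are ignored for brevity.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCPUTimes returns (idle, total) jiffies from the aggregate "cpu" line.
func readCPUTimes() (idle, total uint64, err error) {
	data, err := os.ReadFile("/proc/stat")
	if err != nil {
		return 0, 0, err
	}
	fields := strings.Fields(strings.SplitN(string(data), "\n", 2)[0])
	// fields[0] is "cpu"; among the numbers, index 3 is idle and 4 is iowait.
	for i, f := range fields[1:] {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return 0, 0, err
		}
		total += v
		if i == 3 || i == 4 { // idle + iowait
			idle += v
		}
	}
	return idle, total, nil
}

func main() {
	idle1, total1, _ := readCPUTimes()
	time.Sleep(time.Second)
	idle2, total2, _ := readCPUTimes()

	busy := float64((total2-total1)-(idle2-idle1)) / float64(total2-total1)
	fmt.Printf("CPU utilization over 1s: %.2f%%\n", busy*100)
}
```

On an otherwise idle box that still runs agents like Chef or Zabbix, this will usually print a small but non-zero number.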
I understand that goroutines are very lightweight and we can spawn thousands of them, but I want to know whether there is some scenario in which we should spawn a process instead of a goroutine (like hitting some kind of process boundary in terms of resources or something else). Can spawning a new process be beneficial in some scenario in terms of resource utilization or some other dimension?
To get things started, here are three reasons. I'm sure there are more.
Reason #1
In a perfect world, CPUs would be busy doing the most important work they can (and not wasted doing the less important work while more important work waits).
To do this, whatever controls what work a CPU does (the scheduler) has to know how important each piece of work is. This is normally done with (e.g.) thread priorities. When there are 2 or more processes that are isolated from each other, whatever controls what work a CPU does can't be part of either process. Otherwise you get a situation where one process is consuming CPU time doing unimportant work because it can't know that there's a different process that wants the CPU for more important work.
This is why things like "goroutines" are broken (inferior to plain old threads). They simply can't do the right thing (unless there's never more than one process that wants CPU time).
Processes (combined with "process priorities") can fix that problem (while adding multiple other problems).
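As a purely illustrative Go sketch of this trade-off (not from the original answer): work run in a goroutine is scheduled by the Go runtime, which has no notion of priority, whereas work launched as a separate OS process can be given an OS-level priority that the kernel scheduler will honour. The ./worker binary and the use of nice are assumptions for the example (Unix-like systems only).

```go
// Minimal sketch: the same work item either runs as a goroutine (scheduled by
// the Go runtime, no priorities) or as a separate OS process that the kernel
// scheduler can prioritise independently. "./worker" is a hypothetical binary.
package main

import (
	"log"
	"os/exec"
)

func runAsGoroutine(done chan<- struct{}) {
	go func() {
		// ... do the work in-process; the Go scheduler decides when ...
		done <- struct{}{}
	}()
}

func runAsProcess() error {
	// Launch the work as its own process at a lower OS priority, so the
	// kernel can favour more important processes (Unix-like systems only).
	cmd := exec.Command("nice", "-n", "10", "./worker")
	if err := cmd.Start(); err != nil {
		return err
	}
	return cmd.Wait() // the OS schedules ./worker independently of us
}

func main() {
	done := make(chan struct{})
	runAsGoroutine(done)
	<-done

	if err := runAsProcess(); err != nil {
		log.Printf("worker process failed: %v", err)
	}
}
```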
Reason #2
In a perfect world, software would never crash. The reality is that sometimes processes do crash (and sometimes the reason has nothing to do with software, e.g. a hardware flaw). Specifically, when one process crashes, there is often no sane way to tell how much damage was done within that process, so the entire process typically gets terminated. To deal with this problem, people use some form of redundancy (multiple redundant processes).
Reason #3
In a perfect world, all CPUs and all memory would be equal. In reality things don't scale up like that, so you get things like ccNUMA, where a CPU can access memory in its own NUMA domain quickly but can't access memory in a different NUMA domain as quickly.
To cope with that, ideally (when allocating memory) you'd want to tell the OS "this memory needs low latency more than bandwidth" (and the OS would allocate memory from the fastest/closest NUMA domain only), or you'd tell the OS "this memory needs high bandwidth more than low latency" (and the OS would allocate memory from all NUMA domains). Sadly, every language I've ever seen has "retro joke memory management" (without any kind of "bandwidth vs. latency vs. security" hints), which means that the only control you get is the choice between "one process spread across all NUMA domains" vs. "one process for each NUMA domain".
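Here is a sketch of the "one process for each NUMA domain" option (again not from the original answer): each worker is pinned to one node's CPUs and memory using the Linux numactl tool. The node count and the ./worker binary are assumptions for illustration.

```go
// Sketch of the "one process per NUMA domain" option: each worker is pinned
// to one node's CPUs and memory with numactl (Linux; numactl must be
// installed). The node count and the ./worker binary are assumptions here.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	const numaNodes = 2 // assumed; in practice query it (e.g. via "numactl --hardware")

	var workers []*exec.Cmd
	for node := 0; node < numaNodes; node++ {
		cmd := exec.Command("numactl",
			fmt.Sprintf("--cpunodebind=%d", node),
			fmt.Sprintf("--membind=%d", node),
			"./worker")
		if err := cmd.Start(); err != nil {
			log.Fatalf("starting worker on node %d: %v", node, err)
		}
		workers = append(workers, cmd)
	}

	for _, cmd := range workers {
		if err := cmd.Wait(); err != nil {
			log.Printf("worker exited with error: %v", err)
		}
	}
}
```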
A PC has a microprocessor which processes 16 instructions per microsecond. Each instruction is 64 bits long. Its memory can retrieve or store data/instructions at a rate of 32 bits per microsecond.
Mention 3 options to upgrade system performance. Which option gives the most improved performance?
And the answer provided is
a) upgrade processor to one with twice the speed
b) upgrade memory with one twice the speed
c) double clock speed
(b) gives the most improved performance.
Overcoming the bottleneck of a PC can improve its overall performance.
However, my problem is that I am not sure why (b) gives the most improved performance. Additionally, would (a) and (c) give the same performance? Can that be calculated? I am not sure how these different parts affect overall performance.
Your question's leading paragraph contains the necessary numbers to see why it's b):
The CPU's processing rate is fixed at 16 instructions per microsecond, so an instruction takes only 1/16 of a microsecond to execute.
Each instruction is 64 bits long, but the memory system retrieves data at 32 bits per microsecond. So it takes two microseconds to retrieve a single instruction (i.e. 64 bits).
The bottleneck is clear: it takes far longer to retrieve an instruction (2 μs) than it does to execute it (1/16 μs).
If you increase the CPU speed (answer a)), the CPU will execute an individual instruction faster, but it will still be waiting idle at least 2μs for the next instruction to arrive, so the improvement is wasted.
To eliminate the bottleneck you need to increase the memory system's speed to match the CPU's execution speed, so the memory needs to read 64 bits in 1/16 μs (or, equivalently, 32 bits in 1/32 μs).
I assume answer c) refers to increasing the speed of some systemwide master clock which would also increase the CPU and Memory data-rates. This would improve performance, but the CPU would still be slaved to the memory speed.
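The arithmetic can be made explicit. Here is a small Go sketch (not part of the original answer) that computes effective instruction throughput under the simple model above, assuming fetch and execute happen strictly one after the other with no caching or overlap:

```go
// Worked numbers for the bottleneck argument, assuming fetch and execute are
// strictly sequential (no caching or overlap, as in the simple model above).
package main

import "fmt"

// throughput returns effective instructions per microsecond for a CPU that
// executes cpuInstrPerUs instructions per μs and memory that delivers
// memBitsPerUs bits per μs, fetching 64-bit instructions one at a time.
func throughput(cpuInstrPerUs, memBitsPerUs float64) float64 {
	const instrBits = 64.0
	execTime := 1.0 / cpuInstrPerUs       // μs to execute one instruction
	fetchTime := instrBits / memBitsPerUs // μs to fetch one instruction
	return 1.0 / (fetchTime + execTime)
}

func main() {
	fmt.Printf("baseline:            %.3f instructions/μs\n", throughput(16, 32))
	fmt.Printf("(a) 2x CPU speed:    %.3f instructions/μs\n", throughput(32, 32))
	fmt.Printf("(b) 2x memory speed: %.3f instructions/μs\n", throughput(16, 64))
	// (c) depends on what the shared clock actually drives, as noted above.
}
```

Doubling only the CPU speed barely moves the number because the 2 μs fetch dominates, while doubling the memory speed nearly doubles throughput.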
Note that your question describes a simplistic computer. Computers were like this originally, with the CPU accessing memory directly, instruction by instruction. However, as CPUs got faster, memory did not, so computer engineers added cache levels: much faster (but much smaller) memory from which instructions and data can be read as fast as the CPU can execute them, solving the bottleneck without needing to make all system memory match the CPU's speed.
We are trying to understand how the Windows CPU scheduler works in order to optimize our applications to achieve the maximum possible infrastructure/real-work ratio. There are some things in xperf that we don't understand, and we would like to ask the community to shed some light on what's really happening.
We initially started to investigate these issues when we got reports that some servers were "slow" or "unresponsive".
Background information
We have a Windows 2012 R2 Server that runs our middleware infrastructure with the following specs.
We found it concerning that 30% of CPU time is being spent in the kernel, so we started to dig deeper.
The server above runs ~500 "host" processes (as Windows services); each of these "host" processes has an inner while loop with a ~250 ms delay (yuck!), and each of them may have ~1..2 "child" processes that execute the actual work.
Although the infinite loop runs with a 250 ms delay between iterations, actual useful work for the "host" application may arrive only every 10..15 seconds, so a lot of cycles are wasted on unnecessary looping.
We are aware that design of the "host" application is sub-optimal, to say the least, as applied to our scenario. The application is getting changed to an event-based model which will not require the loop and therefore we expect a significant reduction of "kernel" time in CPU utilization graph.
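Purely to illustrate the difference being described (the real "host" services are not written in Go), here is a sketch contrasting a 250 ms polling loop with an event-driven wait that consumes no CPU while idle:

```go
// Purely illustrative (the real "host" services are not written in Go):
// a 250 ms polling loop versus an event-driven wait.
package main

import (
	"fmt"
	"time"
)

type workItem struct{ id int }

// pollingHost wakes up every 250 ms whether or not there is work — the
// pattern that wastes cycles and kernel time on timers and context switches.
func pollingHost(hasWork func() bool, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case <-time.After(250 * time.Millisecond):
			if hasWork() {
				fmt.Println("polled: doing work")
			}
			// most iterations find nothing to do
		}
	}
}

// eventDrivenHost blocks until an item is actually delivered, so it consumes
// no CPU at all while idle.
func eventDrivenHost(work <-chan workItem, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case item := <-work:
			fmt.Println("event-driven: processing item", item.id)
		}
	}
}

func main() {
	work := make(chan workItem)
	stop := make(chan struct{})

	go pollingHost(func() bool { return false }, stop)
	go eventDrivenHost(work, stop)

	work <- workItem{id: 1}
	time.Sleep(300 * time.Millisecond) // let the poller spin once for contrast
	close(stop)
}
```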
However, while we were investigating this problem, we've done some xperf analysis which raised several general questions about Windows CPU Scheduler for which we were unable to find any clear/concise explanation.
What we don't understand
Below is a screenshot from one of the xperf sessions.
You can see from the "CPU Usage (Precise)" graph that:
There are 15 ms time slices, the majority of which are under-utilized. The utilization of those slices is ~35-40%, so I assume this in turn means the CPU is utilized about 35-40% of the time, yet the system's performance (as observed through casual tinkering around the system) is really sluggish.
On top of this, there is the "mysterious" 30% kernel time cost, judging by the Task Manager CPU utilization graph.
Some CPUs are obviously utilized for the whole 15 ms slice and beyond.
Questions
As far as Windows CPU Scheduling on multiprocessor systems is concerned:
What causes the 30% kernel cost? Context switching? Something else? What considerations should be made when writing applications to reduce this cost, or even to achieve perfect utilization with minimal infrastructure cost (on multiprocessor systems where the number of processes is higher than the number of cores)?
What are these 15 ms slices?
Why does CPU utilization have gaps in these slices?
To diagnose the CPU usage issues, you should use Event Tracing for Windows (ETW) to capture CPU sampling data (not the Precise data; that one is useful for detecting hangs).
To capture the data, install the Windows Performance Toolkit, which is part of the Windows SDK.
Now run WPRUI.exe, select First Level, under Resource select CPU usage, and click Start.
Now capture 1 minute of CPU usage; after 1 minute, click Save.
Now analyze the generated ETL file with the Windows Performance Analyzer by dragging & dropping the CPU Usage (sampled) graph to the analysis pane and ordering the columns like you see in the picture:
Inside WPA, load the debug symbols and expand Stack of the SYSTEM process. In this demo, the CPU usage comes from the nVIDIA driver.
I have a mix of Dell machines, all bought at the same time, with only slightly different specs. The Windows Experience index is almost identical.
Yet my program runs at a speed difference of 25-40% (i.e. really noticeable) on the machines.
They are all out-of-the-box business Dells, no extra programs running (not that take up significant resources anyway).
My program is graphics-based, loading in a lot of data and then processing it on the CPU. CPU usage is identical on all machines; I only use a single thread.
I'd expect maybe 5-10% variation based on the processors (according to benchmarks).
My programmer spidey-sense tells me that something is wrong here.
Is this going to be something like cache misses? Is there anything else I should be looking at?
I have used debugging programs such as WinDbg in these situations. There are a lot of tools in those programs to nail down where the exact bottleneck is, e.g. run the application side by side on two machines and identify at which point the slower machine is lagging.
If the physical specs of the machines are identical, the most likely cause is some difference in network configuration, assuming the bottleneck appears while the graphics data is being downloaded; in that case a tool such as WireShark will show you what hops the application is taking over the network to retrieve the data. If the network configuration is identical, I wouldn't rule out a physical problem with the machine, such as faulty RAM or a dodgy network cable. Also, check the running processes side by side and see if there is any difference; kill unnecessary tasks that may be taking up memory on the slower computer and remove them if necessary.
Typically in a working environment I have many windows open: Outlook, 2-3 Word documents, a few browser windows, Notepad++, some VPN client, Excel, etc.
That said, chances are that about 40% of these apps are not frequently used and are referred to only sparingly. They occupy memory nonetheless.
Now, how does a typical OS deal with that kind of memory consumption? Does it suspend those apps to the hard disk (the pagefile, the Linux swap area, etc.), thereby freeing up that memory for use, or do they keep occupying the memory as-is?
Can this suspension be a practical, doable solution? Are there any downsides, e.g. response time?
Is there some study material I can refer to for reading in this topic/direction?
I would appreciate the help here.
The detailed answer depends on your OS and how it implements its memory management, but here is a generality:
The OS doesn't look at memory in terms of how many processes are in RAM; it looks at it in terms of discrete units called pages. Most processes have several pages of RAM. The least-referenced pages can be swapped out of RAM onto the hard disk when physical RAM becomes scarce. Rarely, therefore, is an entire process swapped out of RAM; usually only certain parts of it are. It could be, for example, that some aspect of your currently running program is idle (i.e. its pages are rarely accessed). In that case those pages could be swapped out even though the process is in the foreground.
Try the wiki article for starters on how this process works and the many methods to implement it.
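As a small Linux-only illustration (not from the original answer): a process's virtual size is usually far larger than what is actually resident in RAM, precisely because memory is managed page by page.

```go
// Linux-only sketch: a process's virtual size (VmSize) is usually much larger
// than what is actually resident in RAM (VmRSS); the rest is paged out or
// simply never touched, page by page, as the answer above describes.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("not on Linux or /proc unavailable:", err)
		return
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmSize:") || strings.HasPrefix(line, "VmRSS:") {
			fmt.Println(line) // e.g. "VmSize:  1234567 kB" vs "VmRSS:  4321 kB"
		}
	}
}
```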