Besides just running an infinite loop, are there any tricks (like maybe cache misses?) for making a CPU run as hot as possible?
This could be architecture-specific or not.
I have produced several reports on stress testing PCs via my free reliability/burn-in tests for Windows and Linux. You can find the reports by Googling for "Roy Longbottom burn-in".
What you need is a variety of programs that run at high speed to test CPUs, caches and RAM. They should log and display speeds at reasonably short intervals, with temperatures noted or, preferably, also logged. On running them, you can find which are the most effective. You can run multiple copies concurrently via BAT files on Windows or shell scripts on Linux, rather than relying on more complicated multithreaded programs. You also really need programs that check for correct calculation results. For system testing, one of the programs can use graphics; here, for nVidia, CUDA programs are useful for producing high temperatures.
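As a rough illustration (just a sketch, not one of my actual benchmark programs), a worker along these lines times a floating-point loop, reports MFLOPS at short intervals, and checks that the results stay correct:

    // Illustrative sketch only, not one of the benchmarks referred to above.
    // Repeatedly transforms two identical buffers, reports MFLOPS at short
    // intervals, and checks that both buffers still agree afterwards
    // (a wrong result under load is exactly what a burn-in test looks for).
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const int words = 5000;            // working set small enough to sit in L1 cache
        const int passes = 20000;          // passes per timed interval
        std::vector<double> a(words), b(words);
        for (int i = 0; i < words; ++i) a[i] = b[i] = 0.001 * i + 1.0;

        for (int interval = 0; interval < 8; ++interval) {
            auto start = std::chrono::steady_clock::now();
            for (int p = 0; p < passes; ++p)
                for (int i = 0; i < words; ++i) {
                    a[i] = a[i] * 0.999999 + 0.000001;   // 2 FP operations per word
                    b[i] = b[i] * 0.999999 + 0.000001;   // same work on a second buffer
                }
            double secs = std::chrono::duration<double>(
                              std::chrono::steady_clock::now() - start).count();
            double mflops = 4.0 * words * passes / secs / 1e6;

            bool ok = true;                              // burn-in check: both buffers
            for (int i = 0; i < words; ++i)              // must still be identical
                if (a[i] != b[i]) ok = false;
            std::printf("interval %d: %.0f MFLOPS, check %s\n",
                        interval, mflops, ok ? "OK" : "FAILED");
        }
    }

Running several copies of something like this at once (one per core, or more) is the easiest way to load the whole CPU while still getting a result check.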
[Chart: CPU core and case temperatures before and after blowing the dust out of the heatsink.]
The following results are from a laptop, testing with data in the L1 cache. They show speed varying with temperature. Other CPUs might be affected more by different data or instructions, or by which cache is used.
Overheating Core 2 Duo Laptop 1.83 GHz

                Words 5K, Ops/wd 2     Words 5K, Ops/wd 32
 Minute         Core °C   MFLOPS x2    Core °C   MFLOPS x2
   0.0            65          -          65          -
   0.5            96        4716         91       10168
   1.0            98        3362         94        4756
   1.5            91        2076         87        4443
   2.0            87        2054         86        4452
   2.5            85        2054         85        4235
   3.0            84        2036         84        4237
   3.5            82        3098         83        4376
   4.0            89        4773         83        4420
You might also be interested in my Raspberry Pi tests (Cooking The Pi), where the RPi is overheated with a 60 W lamp so that it crashes when overclocked, and where speed variations with temperature are shown. In that case, the CPU's integrated graphics is the most demanding hardware.
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Stress%20Tests.htm#anchor7
Modern CPUs are usually power-aware, so they have many built-in mechanisms for powering down whole parts of the chip (cores on a multicore system, or even individual units within a core).
In order to really stress the CPU, you need to make sure you activate as many units as possible.
This is most easily achieved by heavily loading the vector execution units if you have them, or the floating-point units otherwise. On the memory side, try to constantly consume as much bandwidth as possible in order to stress the memory unit, caches and memory buses. Naturally, run your code on all available cores (and if hyper-threading is available, it might be a good idea to use that too).
This type of workload is commonly known as a power virus. There are several examples on the web; look up cpuburn, for example (it has several variants).
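As a very rough sketch of the idea (not cpuburn itself), the following starts one worker per logical CPU and mixes floating-point work with streaming through a buffer larger than typical caches:

    // Rough sketch of a "power virus" style load, not cpuburn itself.
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    std::atomic<bool> stop{false};

    // Each worker keeps the floating-point units busy while streaming through a
    // buffer chosen to be larger than typical last-level caches (64 MB here).
    void worker() {
        std::vector<double> buf(8 * 1024 * 1024, 1.0);   // 8M doubles = 64 MB per worker
        double x = 0.0;
        while (!stop.load(std::memory_order_relaxed)) {
            for (std::size_t i = 0; i < buf.size(); ++i) {
                x = 0.5 * x + 0.5 * buf[i];              // floating-point work
                buf[i] = x;                              // plus memory traffic
            }
        }
        std::printf("%f\n", x);                          // keep the work from being optimized away
    }

    int main() {
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 4;                               // fallback if the count is unknown
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
        std::this_thread::sleep_for(std::chrono::minutes(5));   // run the load for 5 minutes
        stop = true;
        for (auto& t : pool) t.join();
    }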
There are applications and other pieces of software that run so-called stress tests.
They run tests that stress the CPU and any other hardware to see whether anything produces errors. As you can imagine, they are used for stability purposes, to make sure the CPU/device/system will be OK under heavy load.
Just search for stress test software; you'll find plenty of it.
I'm using multi-threaded software (PFC3D, developed by Itasca Consulting) to run simulations. After moving to a more powerful computer, with an Intel Xeon Gold 5120T CPU at 2.2 GHz (2 processors, 28 physical cores, 56 logical cores, Windows 10), to get faster calculations, the software seems to use only a limited number of cores. The software normally detects 56 cores and automatically uses the maximum number.
I'm quite sure the problem is in the system, not in my software, because I run the same code on an Intel Core i9-9880H (16 logical cores) and it uses all the cores, with even better efficiency than the Xeon Gold.
On the Xeon machine the software only uses 22 to 30 of them; 28 cores / 56 logical processors are shown on Task Manager's CPU page. I have Windows 10 Pro.
I very much appreciate your help. Thank you,
Youssef
It's hard to say, because I don't have the code and you provide very little information.
You seem to have no I/O, since you said you use 100% of the CPU on the i9. That should simplify things a little, but...
There could be many reasons.
My feeling is that you have thread synchronisation (such as a critical section) around one or more shared resources. That resource seems to be needed only briefly by each thread, which lets 16 threads access it with few (or very few) collisions; threads rarely have to wait because the shared resource is mostly available (not locked). But adding more threads increases the number of collisions significantly (the shared resource is locked by another thread), so threads end up waiting for it. It really sounds like something along those lines, but it is only a guess.
A quick experiment that could potentially improve performance (because I have the feeling that the shared resource only needs very quick access) is to use a spin lock instead of a regular critical section. That is entirely a guess based on very little information; also, SpinLock is available in C#, but perhaps not in your language.
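For illustration only, a minimal spin lock in C++ looks roughly like this (a sketch of the idea, not something taken from PFC3D or its scripting environment):

    // Minimal spin-lock sketch: threads busy-wait instead of sleeping, which can
    // win when the protected section is very short and contention is moderate.
    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    class SpinLock {
        std::atomic_flag flag = ATOMIC_FLAG_INIT;
    public:
        void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
        void unlock() { flag.clear(std::memory_order_release); }
    };

    SpinLock lock_;            // guards the shared resource
    long shared_counter = 0;   // stand-in for whatever the threads actually share

    void do_work() {
        lock_.lock();
        ++shared_counter;      // keep the critical section as short as possible
        lock_.unlock();
    }

    int main() {
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t)
            threads.emplace_back([] { for (int i = 0; i < 100000; ++i) do_work(); });
        for (auto& t : threads) t.join();
        std::printf("%ld\n", shared_counter);   // expect 400000
    }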
Regarding the number of CPUs used, it can be normal for only half of them to be used, depending on how the program is written. Sometimes it is better not to use hyper-threaded logical cores, and perhaps your program decides this itself. There could also be a bug, either in the program itself, in C#, or in the BIOS, that tells the application there are only 28 CPUs instead of 56 (usually due to hyper-threading). This is still a guess.
There could be some additional information that could potentially help you in this Stack Overflow question.
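As a quick diagnostic (a sketch, assuming you can run a small native C++ program on the Xeon machine), you could compare what the standard library reports with the total processor count across all Windows processor groups; if the two disagree, the application is probably only seeing part of the machine:

    // Quick diagnostic (Windows): compare the standard library's view of the
    // machine with the total active processor count across all processor groups.
    #include <cstdio>
    #include <thread>
    #include <windows.h>

    int main() {
        unsigned from_runtime = std::thread::hardware_concurrency();
        DWORD all_groups = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
        std::printf("hardware_concurrency: %u\n", from_runtime);
        std::printf("active processors, all groups: %lu\n", all_groups);
    }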
Good luck.
I looked at the different configurations of Macs available: MacBook Pro, iMac and iMac Pro.
Do the huge configurations, e.g. the iMac Pro (Xeon, 18 cores, etc.), noticeably speed up Xcode compile times? Or are those specs tailored to video editing?
Also, if I compare
3.2 GHz 8-Core Intel Xeon W Processor
4.2 GHz Quad-Core Intel Core i7 Processor
more cores at a lower clock, or the other way round? What matters most for Xcode compilation performance: cores, processor type, or GHz?
It's quite straightforward.
Xcode uses processor power for compiling tasks.
CPU specification formula:
3.2 GHz * 8 cores = 25.6 GHz
4.2 GHz * 4 cores = 16.8 GHz
So, to answer your question: what matters most for Xcode compilation performance is total processor power.
The first processor, the Xeon-based one, will be much more productive for Xcode work. Use that formula.
P.S. My answer assumes both processors are from the same, or nearly the same, production year. It's also important to keep the age of the CPU in mind.
To be completely sure, check your processors on Geekbench.
A higher clock speed allows more work to be executed in a given time frame, whereas multiple cores allow for parallel processing. However, the benefit is not simply doubled, because not everything can run in parallel the whole time.
Four cores sounds like plenty. You could maybe justify six, but eight would be overkill and a waste of money. A higher clock speed will be much more useful, both for compiling and when using the computer for other tasks. Also, regarding the type of processor, it doesn't matter too much: as long as you are getting the performance, the implementation matters little compared to the other metrics.
Edit
It is also important to take the Turbo Boost speeds into account. The processor runs at a lower clock speed when only non-intensive tasks are running, in order to save energy; for intensive tasks, the Turbo Boost speed is what you actually get. This is managed automatically by macOS, but it can be controlled manually using an app such as Turbo Boost Switcher.
The quad-core i7 has a Turbo Boost speed of 4.5 GHz, whereas the 8-core Xeon has a Turbo Boost speed of 4.2 GHz. That makes them much closer in terms of clock speed. However, the i7 still beats the Xeon in outright clock speed. It also beats it in base speed, which will benefit other tasks performed on the computer and will help with any 'turbo lag' if boosting is managed by the system. Finally, it also beats the Xeon on price. For compiling and other Xcode tasks, the i7 is a clear winner.
Look at your current machine. Open Activity Monitor while you are building. If everything is perfect, you would have 100% CPU usage. On a good day you come to 70%, because nothing is perfect.
I have some third party build-scripts that are highly inefficient and use only one core. So your 18 core Mac won't benefit from that at all.
The first and cheapest approach is to make sure you use pre-compiled headers, especially for C++ code, and that your build scripts use all available processors. I have one C++ library that I could build four times faster after doing that.
Note that "GHz" numbers don't tell you what really happens. As your Mac uses more cores, it heats up, and has to reduce the clock speed. So that 3.2 GHz eight core model can run four threads at a much higher speed, probably the same speed as the 4.2 GHz quad core model.
Right now I would recommend you get an M1 Mac for highest single core performance and good multi-core performance, or wait a little bit for their second generation with 8 performance cores. That's what I will be doing.
I suggest you take the i7. (If two processors are otherwise comparable, always take the one with the newer release date.)
When comparing processor performance, you need to know what each processor was built for: the Intel Xeon is a server processor, while the Intel Core i7 is a high-end PC processor.
When comparing a 4.2 GHz quad-core Intel Core i7 with a 3.2 GHz 8-core Intel Xeon W for a single app, the answer is simply the i7. The Xcode build process may only fully occupy one core, parallelising some of its work across the other cores.
The 8-core Xeon is better suited to running computational workloads the way a server does.
We are trying to understand how the Windows CPU scheduler works in order to optimize our applications and achieve the best possible ratio of real work to infrastructure overhead. There are some things in xperf that we don't understand, and we would like to ask the community to shed some light on what's really happening.
We initially started to investigate these issues when we got reports that some servers were "slow" or "unresponsive".
Background information
We have a Windows Server 2012 R2 machine that runs our middleware infrastructure.
We found it concerning that 30% of the CPU is being spent in the kernel, so we started to dig deeper.
The server above runs ~500 "host" processes (as Windows services). Each "host" process has an inner while loop with a ~250 ms delay (yuck!), and each of those "host" processes may have ~1-2 "child" processes that execute the actual work.
While the loop runs with a 250 ms delay between iterations, actual useful work for the "host" application may appear only every 10-15 seconds, so a lot of cycles are wasted on unnecessary looping.
We are aware that the design of the "host" application is sub-optimal, to say the least, for our scenario. The application is being changed to an event-based model which will not require the loop, and we therefore expect a significant reduction of "kernel" time in the CPU utilization graph.
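To make the contrast concrete, here is a sketch (not our actual service code) of the polling pattern versus an event-driven wait:

    #include <chrono>
    #include <condition_variable>
    #include <initializer_list>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    std::queue<int> work;

    // Current design (shown only for contrast, not called below): wake up every
    // 250 ms whether or not there is work, paying a kernel transition each time.
    void polling_host() {
        for (;;) {
            std::this_thread::sleep_for(std::chrono::milliseconds(250));
            std::lock_guard<std::mutex> lk(m);
            while (!work.empty()) { /* process */ work.pop(); }
        }
    }

    // Event-based design: the thread sleeps until work actually arrives,
    // so no cycles are spent on empty wake-ups.
    void event_driven_host() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return !work.empty(); });
            while (!work.empty()) {
                int item = work.front();
                work.pop();
                if (item == 0) return;   // sentinel value: shut down
                /* process item */
            }
        }
    }

    int main() {
        std::thread host(event_driven_host);
        for (int item : {1, 2, 3, 0}) {                  // 0 is the shutdown sentinel
            { std::lock_guard<std::mutex> lk(m); work.push(item); }
            cv.notify_one();
        }
        host.join();
    }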
However, while investigating this problem, we did some xperf analysis which raised several general questions about the Windows CPU scheduler for which we were unable to find any clear or concise explanation.
What we don't understand
[Screenshot from one of the xperf sessions omitted.]
From the "CPU Usage (Precise)" graph you can see that:
There are 15 ms time slices, most of which are under-utilized. The utilization within those slices is ~35-40%, so I assume the CPU is utilized only ~35-40% of the time, yet the system's performance (judged by casually poking around the system) is really sluggish.
On top of this we have the "mysterious" 30% kernel-time cost, judging by the Task Manager CPU utilization graph.
Some CPUs are obviously utilized for the whole 15 ms slice and beyond.
Questions
As far as Windows CPU Scheduling on multiprocessor systems is concerned:
What causes the 30% kernel cost? Context switching? Something else? What should be considered when writing applications to reduce this cost, or even to achieve full utilization with minimal infrastructure cost (on multiprocessor systems where the number of processes is higher than the number of cores)?
What are these 15 ms slices?
Why does CPU utilization have gaps within these slices?
To diagnose CPU usage issues, you should use Event Tracing for Windows (ETW) to capture CPU sampling data (not the precise data, which is mainly useful for detecting hangs and waits).
To capture the data, install the Windows Performance Toolkit, which is part of the Windows SDK.
Now run WPRUI.exe, select First Level, under Resource select CPU usage and click on start.
Now capture 1 minute of the CPU usage. After 1 minute click on Save.
Now analyze the generated ETL file with the Windows Performance Analyzer by dragging the CPU Usage (Sampled) graph into the analysis pane and ordering the columns as you see in the picture:
Inside WPA, load the debug symbols and expand the stack of the SYSTEM process. In this example, the CPU usage comes from the nVidia driver.
When using the desktop PCs at my university (which have 4 GB of RAM), calculations in MATLAB are fairly speedy, but on my laptop (which also has 4 GB of RAM), the exact same calculations take ages. My laptop is much more modern, so I assume it has a similar clock speed to the desktops.
For example, I have written a program that calculates the solid angle subtended by 50 disks at 500 points. On the desktop PCs this calculation takes about 15 seconds; on my laptop it takes about 5 minutes.
Is there a way to reduce the time taken for these calculations? For example, can I allocate more RAM to MATLAB, or can I boot my PC in a way that optimises it for running MATLAB? I'm thinking that if the processor on my laptop is also doing calculations for other programs, this will slow down the MATLAB calculations. I've closed all other applications, but I know there's probably a lot going on that I can't see. Can I boot my laptop in a way that has fewer of these things running in the background?
I can't modify the code to make it more efficient.
Thanks!
You might run some of my benchmarks which, along with example results, can be found via:
http://www.roylongbottom.org.uk/
The CPU core design used at a particular point in time is essentially the same across Pentiums, Celerons, Core 2s, Xeons and others; the main differences are L2/L3 cache sizes and external memory bus speeds. So you can compare most results with similar-vintage 2 GHz CPUs. Things to try, besides simple number-crunching tests:
1 - Try a memory test, such as my BusSpeed, to show that the caches are being used and that RAM is not dead slow (a rough sketch of the idea follows this list).
2 - Assuming Windows, check in Task Manager that the offending program is the one using most of the CPU time, and that with the program not running, CPU utilisation is around zero.
3 - Check that the CPU temperature is not too high, for example with SpeedFan (free download).
4 - If the disk light is flashing, too much RAM might be in use, with some of it being swapped in and out; the Task Manager Performance tab would show this. Increasing RAM demands can be checked with some of my reliability tests.
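The sketch mentioned in point 1 (illustrative only; not BusSpeed itself, just the idea of timing the same access pattern at growing working-set sizes):

    // Rough sketch of a memory/cache speed check: sum an array at several sizes
    // and watch the MB/s drop as the working set spills out of L1, then L2/L3,
    // then into RAM.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <initializer_list>
    #include <vector>

    int main() {
        for (std::size_t kb : {16, 256, 4096, 65536}) {             // working-set sizes in KB
            std::size_t n = kb * 1024 / sizeof(double);
            std::vector<double> data(n, 1.0);
            const int passes = 2 * 1024 * 1024 / static_cast<int>(kb);  // ~2 GB touched per size
            double sum = 0.0;
            auto start = std::chrono::steady_clock::now();
            for (int p = 0; p < passes; ++p)
                for (std::size_t i = 0; i < n; ++i) sum += data[i];
            double secs = std::chrono::duration<double>(
                              std::chrono::steady_clock::now() - start).count();
            double mb_per_s = kb / 1024.0 * passes / secs;          // MB read per second
            std::printf("%6zu KB: %8.0f MB/s (checksum %.0f)\n", kb, mb_per_s, sum);
        }
    }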
There are many things that go into computing power besides RAM. You mention processor speed, but there is also number of cores, GPU capability and more. Programs like MATLAB are designed to take advantage of features like parallelism.
Summary: You can't compare only RAM between two machines and expect to know how they will perform with respect to one another.
Side note: 4 GB is not very much RAM for a modern laptop.
First, you should run a CPU performance benchmark on both computers.
Modern operating systems usually apply the most aggressive power-management schemes when running on a laptop. This usually means turning off one or more cores, or setting them to a very low frequency. For example, on battery, a quad-core CPU that normally runs at 2.0 GHz could be throttled down to 700 MHz on one core while the other three are essentially put to sleep. (The numbers are not taken from a real example.)
The OS manages the CPU frequency dynamically, tweaking it on the order of seconds. You will need a monitoring tool that actually asks for the CPU frequency every second (without doing busy work itself) in order to know whether this is the case.
Plugging in the laptop will make the OS use a less aggressive power management scheme.
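For example, on Linux a minimal monitor can simply read the frequency reported by the kernel once a second (a sketch; the path assumes the cpufreq sysfs interface is present; on Windows, any utility that displays the live clock speed serves the same purpose):

    // Minimal frequency monitor sketch (Linux, cpufreq sysfs interface assumed).
    // Prints the current frequency of CPU 0 once a second without generating
    // any significant load itself.
    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <thread>

    int main() {
        for (int i = 0; i < 60; ++i) {
            std::ifstream f("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
            long khz = 0;
            f >> khz;                                   // value is reported in kHz
            std::cout << khz / 1000 << " MHz" << std::endl;
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }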
(If this is found to be unrelated to MATLAB, please "flag" this post and ask moderator to move this question to the SuperUser site.)
When designing a desktop application, are there any general rules about how much memory the application should use?
For heavy-weight applications such as Firefox or Google Chrome, this can be easily understood, or at least profiled. But for smaller utilities or line-of-business applications, what is an acceptable amount of memory usage?
I ask because I've recently come across a trade-off between memory usage and performance, and I wonder whether there is any general consensus about it.
EDIT: The platform is Windows XP, for users whose machines are just capable of running rich internet applications.
My specific trade-off problem is about caching a lot of images in memory. If possible, I'd love my app to cache as much as the user's memory will allow. I have made the application cache up to a certain maximum limit, taking the current memory pressure into account.
But what would be a good number? How do you come up with one? That's what I'm asking.
There is no absolute answer for this. It depends on too many variables.
Here are some trade-offs for consideration:
What device/platform are you developing for?
Do you expect the user to run this software as the main purpose of their computer (for example, maybe you are developing some kind of server software)?
Who is your target audience: home users? Power users?
Are you being realistic about the amount of RAM a user will have?
Are you taking into consideration that the user will be using a lot of other software on that computer as well?
Sometimes it's possible to have your cake and eat it too. For example, if you were reading a file and writing it back out, you could read it chunk by chunk instead of reading the whole file into memory and then writing it out. In this case you get better memory usage with no decrease in speed.
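A sketch of the chunked version (the file names and the 64 KB buffer size are just illustrative):

    // Sketch: copy a file chunk by chunk with a fixed-size buffer instead of
    // reading the whole file into memory first. Memory use stays constant
    // regardless of file size, and throughput is essentially the same.
    #include <fstream>
    #include <string>
    #include <vector>

    void copy_file_chunked(const std::string& src, const std::string& dst) {
        std::ifstream in(src, std::ios::binary);
        std::ofstream out(dst, std::ios::binary);
        std::vector<char> buf(64 * 1024);              // 64 KB buffer, chosen arbitrarily
        while (in) {
            in.read(buf.data(), buf.size());
            out.write(buf.data(), in.gcount());        // write only what was actually read
        }
    }

    int main() {
        copy_file_chunked("in.dat", "out.dat");        // placeholder file names
    }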
I would generally recommend using more RAM to get better speed if you must, but only if the RAM requirement is realistic for your target audience. For example, if you expect a home user with 1 GB of RAM to use your program, don't use 600 MB of RAM yourself.
In that case, consider using more RAM to get better speed, and optimizing some other part of your code to use less RAM.
Edit:
Regarding your specific situation of caching images: I think it would be best to let the user set the amount of caching as an option. That way, people with a lot of RAM can set it higher for better performance, and people with little RAM can set it low.
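For what it's worth, such a cache usually ends up as an LRU structure with a byte budget driven by that user setting. A rough sketch (all names are illustrative):

    // Byte-budgeted LRU cache for decoded images; the budget would come from the
    // user-visible setting suggested above. All names here are made up.
    #include <cstddef>
    #include <list>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    class ImageCache {
        using Bytes = std::vector<unsigned char>;
        using Entry = std::pair<std::string, Bytes>;

        std::size_t budget_;                    // maximum bytes to keep cached
        std::size_t used_ = 0;
        std::list<Entry> lru_;                  // front = most recently used
        std::unordered_map<std::string, std::list<Entry>::iterator> index_;

    public:
        explicit ImageCache(std::size_t budget_bytes) : budget_(budget_bytes) {}

        // Returns the cached image, or nullptr if it is not cached.
        const Bytes* get(const std::string& key) {
            auto it = index_.find(key);
            if (it == index_.end()) return nullptr;
            lru_.splice(lru_.begin(), lru_, it->second);   // mark as most recently used
            return &it->second->second;
        }

        void put(const std::string& key, Bytes image) {
            auto it = index_.find(key);
            if (it != index_.end()) {                      // replace an existing entry
                used_ -= it->second->second.size();
                lru_.erase(it->second);
                index_.erase(it);
            }
            used_ += image.size();
            lru_.emplace_front(key, std::move(image));
            index_[key] = lru_.begin();
            while (used_ > budget_ && !lru_.empty()) {     // evict least recently used
                used_ -= lru_.back().second.size();
                index_.erase(lru_.back().first);
                lru_.pop_back();
            }
        }
    };

    int main() {
        ImageCache cache(64 * 1024 * 1024);                // e.g. a 64 MB budget from settings
        cache.put("thumbnail_1", std::vector<unsigned char>(200 * 1024));
        return cache.get("thumbnail_1") ? 0 : 1;
    }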
This depends entirely on your target platform, which is more or less a business decision. The more memory you need, the fewer customers will be able to use your software. Some questions to ask: How much memory do your customers (or potential customers) have installed in their computers? What other applications will they run simultaneously with yours? Is your application assumed to run exclusively (like a full-screen computer game), or is it a utility that mostly runs in the background, or something users switch into from other applications often?
Here is one example of a survey showing the distribution of installed RAM on the systems of people playing games via Steam (source: Valve - Survey Summary Data):
Less than 96 MB        0.01 %
96 MB to 127 MB        0.01 %
128 MB to 255 MB       0.21 %
256 MB to 511 MB       5.33 %
512 MB to 999 MB      19.81 %
1 GB to 1.49 GB       30.16 %
1.5 GB to 1.99 GB      6.10 %
2.0 GB                38.37 %
A conclusion I would draw from a survey like this in my domain (computer games) is that I can reasonably expect almost all our users to have 512 MB or more, and the vast majority to have 1 GB or more. For a computer game that is supposed to run exclusively, this means a working set of around 400 MB is fairly safe and will exclude almost no one, and if it provides significant added value for the product, it may make sense to have a working set of around 800 MB.
This depends on your target PC hardware. If your application uses too much memory, it will be slow while Windows pages. TEST! Try both options in your compromise, and some in between if it makes sense. Run the tests on a typical machine that your users would use, with a sensible number of other applications open. For most people that means Outlook and probably an instance or two of Internet Explorer (or the mail client/browser of your choice). I work in an organisation where users of my application are also likely to be running some other custom applications, so we test with those running as well. We found that our application used too much memory and made switching applications painfully slow, so we slowed our application slightly to reduce its memory usage.
If you are interested, our target hardware was originally 512 MB machines, because that was our common standard-spec workstation. Several PCs had to be upgraded to 1 GB because of this application, though. We have now trimmed its RAM usage a bit, but it is written in VB.NET and most of the memory used seems to be the framework. PerfMon says the process is using around 200 MB (peak), but the managed heap is only around 2 MB!