"intel xeon silver 4216" whether the instruction set of this CPU supports SHA256? [duplicate] - cpu

This question already has answers here:
Are there in x86 any instructions to accelerate SHA (SHA1/2/256/512) encoding?
(4 answers)
Closed 1 year ago.
It seems that AMD supports SHA-256 acceleration and Intel does not. I checked but could not find any relevant information. I'm asking here in the hope of a definitive answer.

According to Wikipedia, the Intel CPUs supporting the SHA opcodes include (among others) Ice Lake and later processors.
The Xeon Silver 4216 is a Cascade Lake architecture processor which is the predecessor of the Ice Lake architecture.
Therefore the Intel Xeon Silver 4216 does not support the SHA opcodes.
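If you want to verify this on the machine itself rather than relying on spec tables, a minimal CPUID sketch (assuming GCC or Clang on x86) can check the SHA feature bit, which is CPUID leaf 7, subleaf 0, EBX bit 29:

    #include <stdio.h>
    #include <cpuid.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        /* Leaf 7, subleaf 0: structured extended feature flags.
           EBX bit 29 reports the SHA extensions (SHA-1/SHA-256). */
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
            printf("SHA extensions: %s\n", (ebx & (1u << 29)) ? "yes" : "no");
        return 0;
    }

On a Cascade Lake part such as the Xeon Silver 4216 this should print "no"; on Ice Lake and later (and on AMD Zen) it prints "yes".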

Related

Where to find OPcodes documentation of my CPU and also some other CPUs?

I could find datasheets documenting the opcodes and their meanings for some popular (and old) microprocessors on the internet; for example, here are links for the 4004 and 8085:
intel 4004 : https://web.archive.org/web/20110601032753/http://www.intel.com/Assets/PDF/DataSheet/4004_datasheet.pdf
intel 8085 : https://ia801807.us.archive.org/3/items/intel-8085-datasheet/8085_datasheet.pdf
I really want to learn about the new technologies implemented in recent CPUs (those released by Intel and AMD) and their opcodes, but I could not find any documentation, especially for AMD CPUs.
Intel does have some documentation for its latest CPUs at
https://www.intel.com/content/www/us/en/products/docs/processors/core/core-technical-resources.html , but I couldn't find documentation of the opcodes of their CPUs. And I couldn't find any documentation (beyond standard specifications of their CPUs) from AMD.
I really want documentation of the opcodes of my CPU: Ryzen 5 3500U

How to measure the ACTUAL number of clock cycles elapsed on modern x86?

On recent x86, RDTSC returns some pseudo-counter that measures time instead of clock cycles.
Given this, how do I measure actual clock cycles for the current thread/program?
Platform-wise, I prefer Windows, but a Linux answer works too.
This is not simple. The relevant behaviour is described in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3B:
For Pentium M processors; for Pentium 4 processors, Intel Xeon processors; and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel® SpeedStep® technology transitions may also impact the processor clock.
For Pentium 4 processors, Intel Xeon processors; for Intel Core Solo and Intel Core Duo processors; for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors; for Intel Core 2 and Intel Xeon processors; for Intel Atom processors: the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the processor base frequency. On certain processors, the TSC frequency may not be the same as the frequency in the brand string.
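You can observe this constant-rate behaviour yourself with the compiler's __rdtsc() intrinsic; on a modern CPU the delta below tracks elapsed reference time, not actual core cycles. A quick sketch, assuming GCC or Clang on x86:

    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc(); use <intrin.h> on MSVC */

    int main(void) {
        unsigned long long t0 = __rdtsc();
        volatile double x = 0.0;
        for (int i = 0; i < 10000000; i++)
            x += i * 0.5;                 /* some work to time */
        unsigned long long t1 = __rdtsc();
        /* With an invariant TSC this counts at a constant reference
           rate, so it measures time, not actual core clock cycles. */
        printf("TSC delta: %llu reference cycles\n", t1 - t0);
        return 0;
    }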
Here is the advice for your use-case:
To determine average processor clock frequency, Intel recommends the use of performance monitoring logic to count processor core clocks over the period of time for which the average is required. See Section 18.17, "Counting Clocks on systems with Intel Hyper-Threading Technology in Processors Based on Intel NetBurst® Microarchitecture," and Chapter 19, "Performance-Monitoring Events," for more information.
The bad news is that, AFAIK, performance counters are often not portable between AMD and Intel processors, so you will certainly need to check the AMD documentation to find which performance counters to use. There are also complications: you cannot easily measure the number of cycles taken by arbitrary code. For example, the processor can be halted or enter a sleep mode for a short period of time (see C-states), or the OS can execute some protected code that cannot be profiled without high privileges (for the sake of security). This method is fine as long as you need to measure the cycle count of numerically intensive code that runs for a relatively long time (at least several dozen cycles). On top of all that, the documentation and usage of MSRs is pretty complex and comes with some restrictions.
Performance counters like CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF_TSC seem like a good start for what you want to measure. Using a library to read such performance counters is generally a very good idea (unless you like having a headache for at least a few days). PAPI might be enough to do the job.
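For example, here is a minimal sketch using PAPI's low-level API on Linux; PAPI_TOT_CYC is PAPI's preset for unhalted core cycles, which typically maps to an event like CPU_CLK_UNHALTED.THREAD on Intel hardware (error checking omitted for brevity):

    #include <stdio.h>
    #include <papi.h>      /* link with -lpapi */

    int main(void) {
        int evset = PAPI_NULL;
        long long cycles;

        /* Initialize the library and build an event set containing
           the preset "total cycles" event (unhalted core cycles). */
        PAPI_library_init(PAPI_VER_CURRENT);
        PAPI_create_eventset(&evset);
        PAPI_add_event(evset, PAPI_TOT_CYC);

        PAPI_start(evset);
        volatile double x = 0.0;
        for (int i = 0; i < 10000000; i++)
            x += i * 0.5;                  /* code under measurement */
        PAPI_stop(evset, &cycles);

        printf("core clock cycles: %lld\n", cycles);
        return 0;
    }

Real code should check each PAPI return value against PAPI_OK; on Linux, access to the underlying counters may also depend on the perf_event paranoia settings.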
Here are some interesting related posts:
Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC
How to read performance counters by rdpmc instruction?

Which mobile Windows devices don't support AVX2?

I understand that Intel's AVX2 extension has been on the market since 2011, and therefore it is pretty much standard in modern devices.
However, for some decision making we need to find out, roughly, the share of existing mobile Windows devices which don't support AVX2 (nor its successor, AVX-512).
It is rather well documented which CPUs, Intel and AMD, actually support the extension, so that is not what I am asking for.
How do I find out which mobile Windows devices exist on the market, including from recent years, that have processors which don't yet support the AVX2 instruction set?
You're incorrect about the dates, and about being "pretty much standard", unfortunately. It could have been by now if Intel hadn't disabled it for market-segmentation reasons in their low-end CPUs. (To be slightly fair, that may have let them sell chips with defects in one half of a 256-bit execution unit, improving yields).
All AMD CPUs aimed at mobile/laptop use (not Geode), including low-power CPUs since Jaguar, have had AVX since Bulldozer. Their low-power CPUs decode 256-bit instructions to two 128-bit uops, the same as Bulldozer-family and Zen 1 did. (That meant AVX wasn't always worth using on Bulldozer-family, but it wasn't much slower than carefully tuned SSE, sometimes still faster, and it gave software a useful baseline. And 128-bit AVX instructions are great everywhere, often saving instructions by being 3-operand.) Intel used the same decode-into-two-halves strategy in Gracemont, the E-cores for Alder Lake, as they did for SSE in P6 CPUs before Core 2, like Pentium III and Pentium M.
AVX was new in Sandy Bridge (2011) and Bulldozer (2011), AVX2 was new in Haswell (2013) and Excavator (2015).
Pentium/Celeron versions of Skylake / Coffee Lake etc. (lower end than i3) have AVX disabled, along with AVX2/FMA/BMI1/2. BMI1 and 2 include some instructions that use VEX encodings on general-purpose integer registers, which seems to indicate that Intel disables decoding of VEX prefixes entirely as part of binning a silicon chip for use in low-end SKUs.
The first Pentium/Celeron CPUs with AVX1/2/FMA are Ice Lake / Tiger Lake based. There are currently Alder Lake based Pentiums with AVX2, like 2c4t (2 P cores) Pentium Gold G7400 and Pentium Gold 8505 (mobile 1 P core, 4 E cores). So 7xxx and 8xxx and higher Pentiums should have AVX1 / AVX2 / FMA, but earlier ones mostly not. One of the latest without AVX is desktop Pentium Gold G6405, 2c4t Comet Lake launched in Q1 2021. (The mobile version, 6405U, launched in Q4'19). There's also an "Amber Lake Y" Pentium Gold 6500Y with AVX2, launched Q1'21.
Low-power CPUs in the Silvermont family (up to 2019's Tremont) don't support AVX at all.
These are common in "netbook" and low budget laptops, as well as low-power servers / NAS. (The successor, Gracemont, has AVX1/2/FMA, so it can work as the E-cores in Alder Lake.)
These CPUs are badged as Pentium J-series and N-series. For example, the Intel Pentium Processor N6415, launched in 2021 with 4 cores, is aimed at "PC/Client/Tablet" use cases. These are Elkhart Lake (Tremont cores), with only SSE4.2.
The "Atom" brand name is still used on server versions of these, including chips with 20 cores.

Xcode compile times: which Mac configuration delivers noticeably better performance? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 1 year ago.
I looked at the different configurations of Macs available: MacBook Pro, iMac and iMac Pro.
Do the huge configurations of e.g. the iMac Pro (Xeon, 18 cores, etc.) noticeably speed up Xcode compilation times? Or are those specs tailored to video editing?
Also if I compare
3.2 GHz 8-Core Intel Xeon W Processor
4.2 GHz Quad-Core Intel Core i7 Processor
more cores with less GHz, or the other way round? What matters most for Xcode compilation performance: cores? Processor? GHz?
It's super easy.
Xcode uses processor power for compiling tasks.
CPU specification formula:
3.2 GHz * 8 cores = 25.6 GHz
4.2 GHz * 4 cores = 16.8 GHz
So, answering your question: what matters most for Xcode compilation performance is total processor power.
The first, Xeon-based processor will be much more productive for the Xcode routine. Use that formula.
P.S. My answer is based on the assumption that both processors are from the same, or nearly the same, production year. It's also important to keep the CPU's age in mind.
To be 100% sure, check your processors on Geekbench.
A higher clock speed allows more work to be executed in a given time frame, whereas multiple cores allow for parallel processing. However, the benefits don't simply multiply, because not everything will be able to run in parallel the whole time.
4 cores sounds like plenty. You could maybe go to 6 and be able to justify it, but 8 would be overkill and a waste of money. A higher clock speed will be much more useful, both for compiling and when using the computer for other tasks. Also, as regards the type of processor, it doesn't matter too much. As long as you are getting the performance, the implementation matters little compared to the other metrics.
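A rough way to quantify this is Amdahl's law; assuming, purely for illustration, that 80% of a build parallelizes (p = 0.8):

    speedup(n) = 1 / ((1 - p) + p / n)

    4 cores: 1 / (0.2 + 0.8/4) = 1 / 0.40 = 2.5x
    8 cores: 1 / (0.2 + 0.8/8) = 1 / 0.30 ≈ 3.3x

Under that assumption, doubling from 4 to 8 cores buys only about a third more throughput, which is why the extra cores are hard to justify.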
Edit
It is also important to take Turbo Boost speeds into account. Turbo Boost lets a processor run at a lower clock speed when non-intensive tasks are running, in order to save energy, and ramp up for intensive tasks; for those, it is the Turbo Boost speed that you are getting. This is managed automatically by macOS, but can be manually controlled using an app such as Turbo Boost Switcher.
The Quad-Core i7 has a Turbo Boost of 4.5 GHz, whereas the 8-core Xeon has a Turbo Boost of 4.2 GHz. This makes them much closer in terms of clock speed. However, the i7 still beats the Xeon in outright clock speed. It also beats it in base speed, which will benefit other tasks performed on the computer and will help with any 'turbo lag', if that is managed by the system. Finally, it has the additional benefit of beating the Xeon on price. This means that for compiling and other Xcode tasks, the i7 is a clear winner.
Look at your current machine. Open Activity Monitor while you are building. If everything were perfect, you would see 100% CPU usage. On a good day you get to 70%, because nothing is perfect.
I have some third-party build scripts that are highly inefficient and use only one core. Your 18-core Mac won't benefit from those at all.
The first and cheapest approach is to make sure you use pre-compiled headers, especially for C++ code, and that your build scripts use all available processors. I have one C++ library that I could build four times faster after doing that.
Note that "GHz" numbers don't tell you what really happens. As your Mac uses more cores, it heats up, and has to reduce the clock speed. So that 3.2 GHz eight core model can run four threads at a much higher speed, probably the same speed as the 4.2 GHz quad core model.
Right now I would recommend you get an M1 Mac for highest single core performance and good multi-core performance, or wait a little bit for their second generation with 8 performance cores. That's what I will be doing.
I suggest you take the i7. (If the processors are otherwise comparable, always take the one with the newer release date.)
If you are comparing processor performance, you need to know what each processor is built for. The Intel Xeon is a server processor, and the Intel i7 is a high-end PC processor.
When comparing the 4.2 GHz Quad-Core Intel Core i7 against the 3.2 GHz 8-Core Intel Xeon W for a single app, the answer is simply the i7. The Xcode build process may fully load only one core, parallelizing some of its computation across the other cores.
The 8-core Xeon is better used for running computing processes the way a server does.

Intel Xeon Phi programming with gcc

I kind of want to get the Intel Xeon Phi coprocessor, since there is a model which seems to be going for $230. I have two questions: can I fully utilize its capabilities using just gcc along with OpenMP, or will I need the Intel compiler? Also, what is it about this model that makes it so cheap?
http://www.amazon.com/Intel-BC31S1P-Xeon-31S1P-Coprocessor/dp/B00OMCB4JI/ref=sr_1_2?ie=UTF8&qid=1444411560&sr=8-2&keywords=intel+xeon+phi
The 3100 series is the first generation of Xeon Phi (codenamed Knights Corner, abbreviated KNC).
Using GCC for Xeon Phi KNC programming is definitely not a perfect idea. See, for example: Xeon Phi Knights Corner intrinsics with GCC
So it is strongly recommended to use the Intel compiler for KNC. And yes, for non-commercial use you can apply for a free Intel compiler license here: https://software.intel.com/en-us/qualify-for-free-software (this is a fairly new program that was unavailable in the past).
The given KNC price tag is low, although I periodically observe KNC sales at similar prices (so at least it's not an "incomplete" Phi, and it's not a scam, although Gilles' point about passive cooling is valid). I don't know which problems you work on, but you should be aware that KNC is above all suited to highly parallel workloads. There is a good reference on the types of applications which could benefit from using Xeon Phi KNC: https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-applications-and-solutions-catalog
As I mentioned at the beginning, you are asking about the first generation of Xeon Phi. Many things (including the GCC answer) will likely change with the introduction of the second generation of Xeon Phi (codenamed Knights Landing, KNL), to be publicly released roughly next year.
GCC lets you compile code for the Xeon Phi and run it, and I believe it does quite a good job at that. Indeed, AFAIK, gcc is the compiler used for compiling the Linux environment available on the Xeon Phi. However, to take full advantage of the Phi's potential performance, I would strongly encourage you to use the Intel compiler. As a matter of fact, and unless I'm greatly mistaken, you can download and install the Intel compiler suite for free for personal use.
Regarding the Xeon Phi card: it comes cheap not because it lacks anything one would want from a Xeon Phi card, but rather because it is a passively cooled card. That means that, unless you tinker together some cooling device with cardboard and fans, you won't be able to slot the card into a standard PC and use it. You'll need a rackable server, which doesn't come cheap and is usually very noisy. So if you've got a server to put the card in, this is a bargain. But if you don't, you'd better think it through.
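For a feel of what suits the card: KNC pays off mainly on highly parallel, vectorizable loops. A plain OpenMP sketch of that pattern (buildable with gcc -fopenmp or the Intel compiler; with the Intel toolchain such kernels can also be offloaded to the card, while a gcc build would run natively in the Phi's Linux environment):

    #include <stdio.h>
    #include <omp.h>

    #define N 10000000

    static float a[N], b[N];

    int main(void) {
        double sum = 0.0;

        /* Highly parallel, vectorizable work: the pattern that
           KNC's many cores and 512-bit vector units are built for. */
        #pragma omp parallel for simd reduction(+:sum)
        for (long i = 0; i < N; i++) {
            a[i] = i * 0.5f;
            b[i] = i * 0.25f;
            sum += (double)a[i] * b[i];
        }

        printf("threads=%d sum=%g\n", omp_get_max_threads(), sum);
        return 0;
    }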
