How to find the number of physical cores on a Windows system with C++ code [duplicate] - windows

This question already has answers here:
How to get the number of actual cores on the cpu on windows? [duplicate]
(2 answers)
How to Detect the Number of Physical Processors / Cores on Windows, Mac and Linux
(14 answers)
Closed 5 years ago.
I tried this, but it only shows the number of logical processors:
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
int numCPU = sysinfo.dwNumberOfProcessors;

From https://msdn.microsoft.com/en-us/library/windows/desktop/ms724958(v=vs.85).aspx:
Note For information about the physical processors shared by logical processors, call GetLogicalProcessorInformationEx with the RelationshipType parameter set to RelationProcessorPackage (3).
You can query the hardware relationships of the logical processors and infer how many physical processors (or cores) there are.
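Following that note, a minimal sketch of the approach (assuming a Windows toolchain; error handling is abbreviated). Counting `RelationProcessorCore` records gives the number of physical cores; using `RelationProcessorPackage` instead, as the documentation note suggests, counts physical processor packages:

```cpp
#include <windows.h>
#include <vector>
#include <cstdio>

int main() {
    // First call with a null buffer fails with ERROR_INSUFFICIENT_BUFFER
    // and reports the required length.
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);

    std::vector<char> buf(len);
    if (!GetLogicalProcessorInformationEx(
            RelationProcessorCore,
            reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buf.data()),
            &len)) {
        return 1;
    }

    // Records are variable-length; walk them by each record's Size field.
    int cores = 0;
    for (DWORD off = 0; off < len; ) {
        auto* info = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(
            buf.data() + off);
        if (info->Relationship == RelationProcessorCore)
            ++cores;
        off += info->Size;
    }
    std::printf("physical cores: %d\n", cores);
}
```

Each `RelationProcessorCore` record also carries a `GroupMask` of the logical processors belonging to that core, so the same walk can distinguish hyperthreaded cores if needed.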

Related

L1 cache - how many simultaneous loads from the same cache line can x86/x64 do? [duplicate]

This question already has answers here:
Load/stores per cycle for recent CPU architecture generations
(1 answer)
How many CPU cycles are needed for each assembly instruction?
(5 answers)
VIPT Cache: Connection between TLB & Cache?
(1 answer)
Closed 2 years ago.
I have some code which reads from an array. The array is largish. I'd expect it to live substantially in L2 cache. Call this TOld.
I wrote an alternative that reads from an array that fits mainly in a single cache line (that I don't expect to be evicted). Call this TNew.
They should produce the same results, and they do. TOld does a single read of its array to get its result. TNew does 6 reads (and a few simple arithmetic ops which are negligible). In both cases I'd expect the reads to dominate.
Cost of L2 cache read by TOld ~15 cycles. Cost of L1 cache reads by TNew ~5 cycles, but I do 6 of them, so expect total ~30 cycles. So I'd expected TNew should be about half the speed of TOld. Instead it's just a few percent difference.
This suggests that the L1 cache is capable of doing 2 reads simultaneously, and from the same cache line. Is that possible in x86/x64?
The other alternative is that I haven't correctly aligned TNew's array to land in a single cache line and it's straddling 2 cache lines; maybe that allows 2 simultaneous reads, one per line. Is that possible?
Frankly, neither seems credible, but opinions welcome.

Phy memory set + offset bigger than page offset [duplicate]

This question already has answers here:
Why does Intel use a VIPT cache and not VIVT or PIPT?
(1 answer)
VIPT Cache: Connection between TLB & Cache?
(1 answer)
Closed 2 years ago.
I understood that there is a way, even when the cache's set index + offset bits are bigger than the page offset, for the cache not to wait for the physical page number: it uses the 2-3 LSBs of the virtual page number as the MSBs of the set index, so it takes 2 clocks instead of 3 to reach the data. What is the name of this technique? Any relevant links are welcome.

Why does level 1 use split cache? [duplicate]

This question already has answers here:
What does a 'Split' cache means. And how is it useful(if it is)?
(1 answer)
Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?
(7 answers)
Why implement data cache and instruction cache to reduce miss? [duplicate]
(2 answers)
Closed 2 years ago.
Generally, processors use a split level 1 cache while level 2 is a unified cache. Why is it so?
I'm not aware of any processor designed in the last 15 years that has a unified (L1) cache.

How can a 32 bit cpu transfer 64 or even 128 bits in parallel on a data bus? [duplicate]

This question already has answers here:
Data bus width and word size
(2 answers)
How do we determine if a processor is 8-bit; 16-bit or 32-bit
(8 answers)
word size and data bus
(2 answers)
Closed 2 years ago.
Well, I just recently started reading the book Structured Computer Organization by Andrew Tanenbaum, and everything was clear to me until I reached this sentence in ch. 2: "Finally, many computers can transfer 64 or 128 bits in parallel on a single bus cycle, even on 32-bit machines". The problem with this is that I cannot picture how something like this would work and, as far as I know, a CPU has a single data bus.
If there were, for example, a 32-bit CPU in a 64-bit system (64-bit data bus), how would the CPU transfer the 64 bits "in parallel" in the same bus cycle?

Blocked from accessing more memory in R despite having available memory in my system [duplicate]

This question already has answers here:
Increasing (or decreasing) the memory available to R processes
(7 answers)
Closed 8 years ago.
I am trying to access more memory using code I found on Stack Overflow (Increasing (or decreasing) the memory available to R processes). However, I get the following error, which I haven't been able to resolve:
memory.limit(10000)
Error in memory.limit(10000) :
don't be silly!: your machine has a 4Gb address limit
R is telling me that I have a 4 GB address limit (despite the fact that I'm on a 64-bit OS with 16 GB of RAM). Anyone know how to get around this?
Windows OS: Windows 7 Enterprise, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Installed Memory (RAM): 16.0GB
System type: 64 bit OS
R Version: 3.0.0
RStudio Version: 0.97.551
I never used R, but with a quick search I came across the memory.limit() documentation (here)
I quote :
memory.limit(size = NA)
size : numeric. If NA report the memory size, otherwise request a new limit, in Mb.
10,000 MB = 10 GB, which exceeds the 4 Gb address limit, hence the error.
About the 4 GB limit itself: it suggests you are running a 32-bit build of R, which is capped at a 4 GB address space even on a 64-bit OS. Switching to the 64-bit build of R should lift that limit.
