I am looking for a scsi command that I can use to get device max link capability.
Can I use SCSI MODE SENSE (page 19h) to read the phy descriptors and get the link speed? But how is that different from the max link speed and the negotiated link speed?
According to SPL-4, you may use the SMP DISCOVER function or SCSI MODE SENSE page 19h (Protocol Specific Port), subpage 01h (Phy Control And Discover).
For the current link speed, use the NEGOTIATED LOGICAL LINK RATE field. The maximum link speeds are HARDWARE MAXIMUM PHYSICAL LINK RATE (the physical limitation of the phy) and PROGRAMMED MAXIMUM PHYSICAL LINK RATE (the limit configured via the PHY CONTROL function).
Here is an example using the SMP DISCOVER approach:
[fge@el7 ~]$ sudo smp_discover /dev/bsg/expander-0\:0
phy 0:T:attached:[500015535990a300:00 t(SATA)] 3 Gbps
phy 1:T:attached:[500015535990a301:00 t(SATA)] 3 Gbps
phy 2:T:attached:[500015535990a302:00 t(SATA)] 1.5 Gbps
phy 3:T:attached:[500015535990a303:00 t(SATA)] 3 Gbps
phy 4:T:attached:[500015535990a304:00 t(SATA)] 3 Gbps
phy 5:T:attached:[500015535990a305:00 t(SATA)] 3 Gbps
phy 6:T:attached:[500015535990a306:00 t(SATA)] 3 Gbps
phy 7:T:attached:[500015535990a307:00 t(SATA)] 3 Gbps
phy 8:T:attached:[500015535990a308:00 t(SATA)] 3 Gbps
phy 9:T:attached:[500015535990a309:00 t(SATA)] 3 Gbps
phy 10:T:attached:[500015535990a30a:00 t(SATA)] 3 Gbps
phy 11:T:attached:[500015535990a30b:00 t(SATA)] 3 Gbps
phy 12:T:attached:[500015535990a30c:00 t(SATA)] 3 Gbps
phy 13:T:attached:[500015535990a30d:00 t(SATA)] 3 Gbps
phy 14:T:attached:[50014ee5aaab876f:01 t(SSP)] 3 Gbps
phy 15:T:attached:[500015535990a30f:00 t(SATA)] 3 Gbps
phy 16:S:attached:[5d4ae520995f8200:03 i(SSP+STP+SMP)] 3 Gbps
phy 20:T:attached:[500015535956633f:16 exp i(SMP) t(SMP)] 3 Gbps
phy 21:T:attached:[500015535956633f:17 exp i(SMP) t(SMP)] 3 Gbps
phy 22:T:attached:[500015535956633f:18 exp i(SMP) t(SMP)] 3 Gbps
phy 23:T:attached:[500015535956633f:19 exp i(SMP) t(SMP)] 3 Gbps
phy 24:D:attached:[500015535990a33e:24 V i(SSP) t(SSP)] 3 Gbps
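If you issue DISCOVER yourself, or read the mode page directly, the three rate fields (negotiated, hardware maximum, programmed maximum) come back as 4-bit codes rather than ready-made strings. Here is a minimal decoding sketch in Python; the code values are the ones defined in SPL, but double-check the exact phy descriptor byte offsets against the SPL revision you target:

LINK_RATE_CODES = {
    0x0: "unknown",
    0x1: "phy disabled",
    0x8: "1.5 Gbps",
    0x9: "3 Gbps",
    0xA: "6 Gbps",
    0xB: "12 Gbps",
    0xC: "22.5 Gbps",  # added in SPL-4
}

def decode_rate(nibble: int) -> str:
    """Map a 4-bit link rate code to a readable string."""
    return LINK_RATE_CODES.get(nibble & 0xF, f"reserved (0x{nibble:X})")

# Example: a NEGOTIATED LOGICAL LINK RATE field of 0x9 is the 3 Gbps seen above.
print(decode_rate(0x9))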
For example, I have a 16-lane CPU with a PCIe x16 graphics card and a PCIe x1 WiFi card. Does this make my graphics card run at PCIe x8 or PCIe x15?
Edit: My CPU is an Intel Core i5 7600K, and my motherboard is an MSI Mortar Z270.
Your card would run at x8, because your CPU allows the following combinations:
1x16 or 2x8 or (1x8 and 2x4)
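To make the "x8, not x15" arithmetic concrete, here is a purely illustrative Python sketch. It assumes both slots draw from the CPU's 16 lanes (on many boards the x1 slot is actually wired to the chipset instead, in which case the graphics card keeps all 16 lanes):

# Lane splits supported by the CPU, per the answer above.
SUPPORTED_SPLITS = [(16,), (8, 8), (8, 4, 4)]

def gpu_width(requested=(16, 1)):
    """Widest link the first (GPU) slot can get when every requested
    device has to fit into one of the supported splits."""
    best = 0
    for split in SUPPORTED_SPLITS:
        if len(split) < len(requested):
            continue  # not enough links for all devices
        best = max(best, min(requested[0], split[0]))
    return best

print(gpu_width())  # -> 8: there is simply no x15 mode to fall back to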
I am a beginner in cluster configuration. I know that in our cluster we have two types of worker nodes:
16 x 4 TB disks
128 GB RAM
2 x 8-core CPUs

12 x 1.2 TB disks
256 GB RAM
2 x 10-core CPUs
I am confused about the configuration. What does 2 x 8 cores mean? Does it mean 2 processors with 8 physical cores each? So if my processors support hyperthreading, will I have 2 x 8 x 2 = 32 virtual cores?
And does 12 x 1.2 TB mean 12 disks with 1.2 TB each?
Usually, 2 x 8 Core CPUs means that you have 2 physical chips on your motherboard, each with 8 cores. If you enable hyperthreading, you then have 32 virtual cores.
The disk count can be read either the way you stated it, or as the number of nodes: then you would have 16 nodes with 4 TB disks... and 12 nodes with 1.2 TB disks...
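As a quick sanity check of that arithmetic, here is a small Python sketch that reads the specs the first way (per-node disk counts):

node_types = {
    "type 1": {"sockets": 2, "cores_per_socket": 8,  "disks": 16, "disk_tb": 4.0, "ram_gb": 128},
    "type 2": {"sockets": 2, "cores_per_socket": 10, "disks": 12, "disk_tb": 1.2, "ram_gb": 256},
}

for name, n in node_types.items():
    physical = n["sockets"] * n["cores_per_socket"]
    logical = physical * 2  # the x2 only applies if hyperthreading is enabled
    storage = n["disks"] * n["disk_tb"]
    print(f"{name}: {physical} physical cores, {logical} virtual cores, "
          f"{storage:.1f} TB raw disk, {n['ram_gb']} GB RAM")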
I am just wondering how someone gets access to this hardware without knowing what it means. Can you send me some nodes? :)
With the intention of comparing the speed of GPU vs CPU computing, I ran the example code available here (a Mandelbrot set on the GPU) from MATLAB Central. Below are the results I obtained:
Case 1 (without GPU): 6.2 secs
Case 2 (using parallel.gpu.GPUArray): 6.518 secs (1.39 secs in the example)
Case 3 (Using Element-wise Operation): 1.259 secs (0.14 secs in the example)
As can be seen, there is no improvement in Case 2 and only a slight improvement of around four times in Case 3. Since the example did not state the details of the GPU used, may I know whether this is simply due to the "incompetency" of my graphics card, or am I missing something important?
The graphics card is also responsible for driving my display (HP Z Display Z23i 23-inch IPS LED Backlit Monitor).
CPU: Intel i7-4790, 3.6 GHz (4 cores / 8 threads)
GPU:
Name: 'NVS 510'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.6934e+09
MultiprocessorCount: 1
ClockRateKHz: 797000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Thank you!
Edit
The GPU used in the example here is a Tesla C2050. (Credit to @Sam Roberts)
The times at that link were most likely measured on a different GPU than yours. They don't specify what kind of graphics card they were using, but my guess is that it was a higher-end card.
Googling the NVS 510, its specs look similar to those of the card in my machine. However, your card is geared towards business use while mine is geared towards gaming. I have a GTX 660, which is one of the higher-end GPUs available on the market.
These are the attributes of my graphics card:
CUDADevice with properties:
Name: 'GeForce GTX 660'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6.5000
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.5357e+09
MultiprocessorCount: 5
ClockRateKHz: 1084500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The differences between my card and yours are that I have 5 multiprocessors to your 1, and my clock rate is about 300 MHz faster than yours. For a side-by-side comparison of the two cards, see:
NVS 510: http://www.nvidia.ca/object/nvs-510-graphics-card.html#pdpContent=2
GTX 660: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-660/specifications
Upon further inspection, I have a much higher memory bandwidth than your card. I also have 960 GPU cores in comparison to your 192.
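As a rough back-of-the-envelope comparison based on the two gpuDevice listings (CUDA core counts taken from the spec pages linked above), here is a small Python sketch:

cards = {
    "NVS 510": {"cuda_cores": 192, "clock_ghz": 0.797},
    "GTX 660": {"cuda_cores": 960, "clock_ghz": 1.0845},
}

# cores * clock is only a crude proxy for throughput; it ignores
# memory bandwidth, which also differs a lot between these two cards
proxy = {name: c["cuda_cores"] * c["clock_ghz"] for name, c in cards.items()}
print(f'GTX 660 / NVS 510 raw compute ratio: {proxy["GTX 660"] / proxy["NVS 510"]:.1f}x')  # roughly 6.8x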
I decided to run these tests to compare my performance against your timings. My CPU is an Intel i7-4770 at 3.6 GHz, and I have 16 GB of RAM in my machine.
The times that I get by running those examples are the following:
Case #1 - Without GPU: 6.46 seconds
Case #2 - Naive GPU: 0.82 seconds - 7.9x faster
Case #3 - Through CUDA: 0.09 seconds - 71.7x faster
With this, my guess is that your graphics card is simply lower-end than the one MathWorks used for those tests. Maybe try updating your graphics drivers and see if that helps. In any case, my guess is that my performance is much better due to the higher multiprocessor count, the faster clock, the greater number of cores, and the higher memory bandwidth.
I have Linux running as a VM with 2 vCPUs and one interface. For the interface's rx interrupt I have the IRQ affinity set to both the vCPUs (in /proc/../smp_affinity).
How is the interrupt assignment to the CPUs done in this case?
With iperf traffic, the combined CPU usage of the 2 vCPUs is 100%, with most of it coming from soft-interrupt handling. At any given instant the split between the 2 vCPUs looks random: 30-70, 60-40, 50-50, etc.
If I change the IRQ affinity to a single vCPU, that CPU goes to 100%.
If the kernel were doing plain round-robin between the 2 vCPUs, shouldn't the load on each vCPU be close to 100%, rather than the combined load across the 2 vCPUs being 100%?
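For what it's worth, the per-vCPU split can be quantified by sampling /proc/interrupts twice and diffing the per-CPU counters for the rx IRQ. A small Python sketch (the IRQ number "27" is just a placeholder; look up the real one for your interface in /proc/interrupts first):

import time

def irq_counts(irq="27"):  # placeholder IRQ number
    with open("/proc/interrupts") as f:
        cpus = f.readline().split()  # header row: CPU0 CPU1 ...
        for line in f:
            fields = line.split()
            if fields and fields[0].rstrip(":") == irq:
                return dict(zip(cpus, map(int, fields[1:len(cpus) + 1])))
    raise ValueError(f"IRQ {irq} not found")

before = irq_counts()
time.sleep(5)  # generate iperf traffic during this window
after = irq_counts()
delta = {cpu: after[cpu] - before[cpu] for cpu in before}
total = sum(delta.values()) or 1
for cpu, d in delta.items():
    print(f"{cpu}: {d} interrupts ({100 * d / total:.0f}%)")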
Here is my cat /proc/cpuinfo output:
...
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic ...
bogomips : 4533.56
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management :
This machine has two CPUs, each with 4 hyperthreaded cores, so the total logical processor count is 16 (2 CPUs * 4 cores * 2 hyperthreads). All processors show the same output, so to keep things clean I only show the last one's info and omit part of the flags line.
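As a cross-check of that topology (2 sockets x 4 cores x 2 hyperthreads = 16), here is a small Python sketch that counts unique physical id / core id pairs in /proc/cpuinfo:

from collections import defaultdict

sockets = defaultdict(set)  # physical id -> set of core ids
logical = 0
cur = {}
with open("/proc/cpuinfo") as f:
    for line in f:
        if ":" in line:
            key, _, val = line.partition(":")
            cur[key.strip()] = val.strip()
        elif cur:  # a blank line ends one processor block
            logical += 1
            sockets[cur["physical id"]].add(cur["core id"])
            cur = {}
if cur:  # in case the file does not end with a blank line
    logical += 1
    sockets[cur["physical id"]].add(cur["core id"])

physical = sum(len(cores) for cores in sockets.values())
print(f"{len(sockets)} sockets, {physical} physical cores, {logical} logical processors")
# expected here: 2 sockets, 8 physical cores, 16 logical processors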
So how do I calculate the peak performance of this machine in terms of GFlops?
Let me know if more info should be supplied.
Thanks.
You can check the Intel export compliance spec.
The GFLOPS figure in that chart usually refers to the peak of a single chip.
It shows 36.256 GFLOP/s for the E5520.
This single chip has 4 physical cores with SSE.
So this figure can also be calculated as:
2.26 GHz * 2 (mul + add per cycle) * 2 (SIMD double-precision width) * 4 (physical cores) ≈ 36.2 GFLOP/s
Your system has two CPUs, so your peak is 36.2 * 2 = 72.4 GFLOP/s.
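Spelling the same arithmetic out as a quick Python check:

clock_ghz = 2.26         # E5520 base clock used in the figure above
flops_per_cycle = 2 * 2  # one mul + one add per cycle, each on 2 doubles (SSE)
cores_per_socket = 4
sockets = 2

per_chip = clock_ghz * flops_per_cycle * cores_per_socket
print(f"per chip: {per_chip:.1f} GFLOP/s")            # ~36.2
print(f"per node: {per_chip * sockets:.1f} GFLOP/s")  # ~72.3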
You can find a formula on this website:
http://www.novatte.com/our-blog/197-how-to-calculate-peak-theoretical-performance-of-a-cpu-based-hpc-system
Here is the formula:
performance in GFLOPS = (CPU speed in GHz) x (number of CPU cores) x (floating-point operations per cycle) x (number of CPUs per node)
So in your case: 2.27 x 4 x 4 x 2 = 72.64 GFLOP/s.
See here for the configuration of your CPU: http://ark.intel.com/products/40200