I have a server with four MIC cards (mic0-mic3), and it works well. I want to disable one of them, for example mic3, so that only mic0-mic2 are available.
What should I do?
OFFLOAD_DEVICES="0,1,2" # run with devices 0, 1 and 2 visible
The environment variable OFFLOAD_DEVICES restricts the process to use only the MIC cards specified as the value of the variable. The value is a comma-separated list of physical device numbers in the range 0 to (number_of_devices_in_the_system-1).
Devices available for offloading are numbered logically. That is, _Offload_number_of_devices() returns the number of allowed devices, and device indexes specified in the target specifiers of offload pragmas are in the range 0 to (number_of_allowed_devices-1).
Example
export OFFLOAD_DEVICES="1,2"
Allows the program to use only physical MIC cards 1 and 2 (for instance, in a system with four installed cards). Offloads to devices numbered 0 or 1 will be performed on physical devices 1 and 2. Offloads to target numbers higher than 1 will wrap-around so that all offloads remain within logical devices 0 and 1 (which map to physical cards 1 and 2). The function _Offload_get_device_number() executed on a MIC device will return 0 or 1, when the offload is running on physical devices 1 or 2.
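The logical-to-physical mapping described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not the offload runtime itself: the function names `parse_offload_devices` and `physical_device` are made up for illustration, and the wrap-around is assumed to be a plain modulo over the number of allowed devices.

```python
def parse_offload_devices(value):
    """Parse an OFFLOAD_DEVICES-style string such as "1,2" into physical card numbers."""
    return [int(item) for item in value.split(",") if item.strip()]


def physical_device(target, allowed):
    """Map a logical offload target to a physical card.

    Assumption: targets beyond the last logical device wrap around (modulo).
    """
    logical = target % len(allowed)
    return allowed[logical]


# Hypothetical usage: two allowed cards, four offload targets.
allowed = parse_offload_devices("1,2")
for target in range(4):
    print("target", target, "-> physical card", physical_device(target, allowed))
```

With OFFLOAD_DEVICES="1,2", targets 0 and 2 land on physical card 1, and targets 1 and 3 land on physical card 2.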
It's known that a single PF can map to multiple VFs.
About the number of VFs associated with a single PF:
In PCIe 5.0 spec:
IMPLEMENTATION NOTE
VFs Spanning Multiple Bus Numbers
As an example, consider an SR-IOV Device that supports a single PF. Initially, only PF 0 is visible. Software Sets ARI Capable Hierarchy. From the SR-IOV Extended Capability it determines: InitialVFs is 600, First VF Offset is 1 and VF Stride is 1.
If software sets NumVFs in the range [0 … 255], then the Device uses a single Bus Number.
If software sets NumVFs in the range [256 … 511], then the Device uses two Bus Numbers.
If software sets NumVFs in the range [512 … 600], then the Device uses three Bus Numbers.
PF 0 and VF 0,1 through VF 0,255 are always on the first (captured) Bus Number. VF 0,256 through VF 0,511 are always on the second Bus Number (captured Bus Number plus 1). VF 0,512 through VF 0,600 are always on the third Bus Number (captured Bus Number plus 2).
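The bus-number ranges in the note follow from how VF Routing IDs are derived: VF n sits at the PF's Routing ID plus First VF Offset plus (n - 1) times VF Stride, and the upper 8 bits of the 16-bit Routing ID are the bus number. A small Python sketch of that arithmetic, assuming the PF is function 0 on the captured bus (the function name `bus_numbers_used` is invented for this example):

```python
def bus_numbers_used(num_vfs, first_vf_offset=1, vf_stride=1, pf_routing_id=0):
    """Return how many bus numbers a PF and its VFs occupy.

    pf_routing_id is the PF's 16-bit Routing ID relative to the captured bus
    (0 means function 0 on the captured bus, as in the example above).
    """
    if num_vfs == 0:
        return 1  # only the PF itself is present
    # Routing ID of the last VF (VFs are numbered from 1).
    last_vf_rid = pf_routing_id + first_vf_offset + (num_vfs - 1) * vf_stride
    # The bus number is the upper 8 bits of the Routing ID.
    return (last_vf_rid >> 8) - (pf_routing_id >> 8) + 1


for num_vfs in (0, 255, 256, 511, 512, 600):
    print(num_vfs, "VFs ->", bus_numbers_used(num_vfs), "bus number(s)")
```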
From Oracle:
Each SR-IOV device can have a physical function and each physical function can have up to 64,000 virtual functions associated with it.
From the "sharing PCIe I/O bandwidth" point of view, it might be understandable to have hundreds or thousands of VFs associated with a single PF, with each VF assigned to a VM, on the assumption that most of the VFs are idle at any given point in time.
However, from the "chip manufacturing" point of view, for a non-trivial PCIe function, duplicating hundreds or thousands of instances of the VF part of the IP within a single die would make the die area too large to be practical.
So my question is, as stated in the subject line: are there practical use cases for having so many VFs associated with a single PF?
I know that with the function SetProcessAffinityMask(1) I can set a mask that defines the available processors (e.g. 1 - only the first processor is available). I can count the number of 1's in the binary version of AffinityMask and get the result, but I wonder if there exists a special function that can show it.
GetProcessAffinityMask returns masks for the process and the system, but you have to count the bits yourself. The maximum is 32 or 64, depending on your process's bitness.
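Counting the bits yourself is a one-liner. The sketch below is in Python and takes the mask as a plain integer; on Windows the value would come from GetProcessAffinityMask, which this snippet does not call. The helper name `count_processors` is made up for illustration.

```python
def count_processors(affinity_mask):
    """Count the set bits in an affinity mask (one bit per logical processor)."""
    return bin(affinity_mask).count("1")


# Hypothetical masks: only CPU 0, then CPUs 0, 1 and 3.
print(count_processors(0b0001))
print(count_processors(0b1011))
```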
GetNativeSystemInfo is documented to return the number of processors for the group your process is in. This means the maximum number is probably 64.
There can be more than 64 logical processors on Windows 7 and later but then you have to deal with NUMA and processor groups. GetActiveProcessorGroupCount+GetActiveProcessorCount should be able to tell you the number for the system. You can call some extra functions to spread your threads over multiple groups but according to MSDN:
Starting with Windows 11 and Windows Server 2022, on a system with more than 64 processors, process and thread affinities span all processors in the system, across all processor groups, by default.
There are several frequency bands within the radio spectrum that are used for Wi-Fi, and within these there are many channels, each designated by a number so that it can be identified.
The table given below provides the frequencies for the total of fourteen 802.11 Wi-Fi channels that are available around the globe.
How can I allocate different channels, for example 1, 3 and 8, to the three radio interfaces of a wireless node, as drawn in Figure 2 for node 1?
The Showcases > Wireless > Multiple Wireless Interfaces example does not answer my question. I want to work without an access point and set the channel number on each interface directly (not based on SSID), so that the channel can be tuned for periods of simulation time. See the second figure.
You can set in omnetpp.ini:
*.host[*].wlan[*].radio.bandwidth = 20MHz
*.host[0].wlan[0].radio.centerFrequency = 2.412GHz # channel no 1
*.host[0].wlan[1].radio.centerFrequency = 2.422GHz # channel no 3
*.host[0].wlan[2].radio.centerFrequency = 2.447GHz # channel no 8
# and so on ...
Reference: Ieee80211Radio.
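The center frequencies used above follow the usual 2.4 GHz channel plan: channels 1 to 13 are spaced 5 MHz apart starting at 2412 MHz, and channel 14 sits at 2484 MHz. A small Python helper (the name `channel_center_frequency_mhz` is invented for this sketch) can generate the values for other channels:

```python
def channel_center_frequency_mhz(channel):
    """Return the center frequency in MHz of a 2.4 GHz 802.11 channel (1-14)."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484  # channel 14 does not follow the 5 MHz spacing
    raise ValueError("2.4 GHz channels are numbered 1 to 14")


# Print ini-style values for the three channels used in the answer.
for index, channel in enumerate((1, 3, 8)):
    ghz = channel_center_frequency_mhz(channel) / 1000
    print(f"*.host[0].wlan[{index}].radio.centerFrequency = {ghz:.3f}GHz")
```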
I'm not even sure if this is possible but I think it's worth asking anyway.
Say we have 100 devices in a network. Each device has a unique ID.
I want to tell a group of these devices to do something by broadcasting only one packet (A packet that all the devices receive).
For example, if I wanted to tell devices 2,5,75,116 and 530 to do something, I have to broadcast this : 2-5-75-116-530
But this packet can get pretty long if I wanted (for example) 95 of the devices to do something!!!
So I need a method to reduce the length of this packet.
After thinking for a while, I came up with an idea:
what if I used only prime numbers as device IDs? Then I could send the product of device IDs of the group I need, as the packet and every device will check if the remainder of the received number and its device ID is 0.
For example if I wanted devices 2,3,5 and 7 to do something, I would broadcast 2*3*5*7 = 210 and then each device will calculate "210 mod self ID" and only devices with IDs 2,3,5 and 7 will get 0 so they know that they should do something.
But this method is not efficient, because the 100th prime number is 541, so the broadcast number may get really big and the "mod" calculation may get really hard (the devices have 8-bit processors).
So I just need a method for the devices to determine if they should do something or ignore the received packet. And I need the packet to be as short as possible.
I tried my best to explain the question. If it's still vague, please ask me to explain more.
You can just use a bit string in which every bit represents a device. Then, you just need a bitwise AND to tell if a given machine should react.
You'd need one bit per device, which would be, for example, 32 bytes for 256 devices. Admittedly, that's a little wasteful if you only need one machine to react, but it's pretty compact if you need, say, 95 devices to respond.
You mentioned that you need the device id to be <= 4 bytes, but that's no problem: 4 bytes = 32 bits = enough space to store 2^32 device ids. For example, the device id for the 101st machine (if you start at 0) could just be 100 (0b01100100) = 1 byte. You would just need to use that to figure out which byte of the packet to use (floor(100 / 8) = 12, i.e. the 13th byte) and bitwise AND that byte against a mask with bit 100 % 8 = 4 set, i.e. 1 << 4 = 0b00010000.
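Here is a minimal Python sketch of that bit-string scheme, assuming zero-based device ids and that bit k of byte j stands for device 8*j + k (least significant bit first). The function names are invented for illustration; on an 8-bit device the check is the same shift, index and AND.

```python
def build_packet(target_ids, device_count):
    """Build a bit string with one bit per device; set bits mark the targets."""
    packet = bytearray((device_count + 7) // 8)
    for device_id in target_ids:
        packet[device_id // 8] |= 1 << (device_id % 8)
    return bytes(packet)


def is_addressed(packet, device_id):
    """Check on the device side whether this device's bit is set."""
    return (packet[device_id // 8] & (1 << (device_id % 8))) != 0


packet = build_packet([2, 5, 75, 100], device_count=256)
print(len(packet), "bytes")
print(is_addressed(packet, 100), is_addressed(packet, 4))
```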
As cobarzan said, you also can use a hybrid scheme allowing for individual addressing. In that scenario, you could use the first bit as a signal to indicate multiple- or single-machine addressing. As cobarzan said, that requires more processing, and it means the first byte can only store 7 machine signals, rather than 8.
Like Ed Cottrell suggested, a bit string would do the job. If the machines are labeled {1,..,n}, there are 2^n - 1 possible subsets (assuming you do not send requests with no intended target). So you need a data structure able to hold every possible signature of such a subset, whatever you decide the signature to be. And n bits (one for each machine) is the best one can do regarding the size of such a data structure. The evaluation performed on the machines takes constant time (on the machine with label l, just look at the lth bit).
But one could go for some hybrid scheme. Say you have a task for one device only; then it would be a pity to send n bits (all 0s, except one). So you can take one additional bit T which indicates the type of packet. The value of T is set to 0 if you are sending a bit string of length n as described above, or set to 1 if you are using a more appropriate scheme (i.e. fewer bits). In the case of just one machine that needs to perform the task, you could directly send the label of the machine (which is O(log n) bits long). This approach reduces the size of the packet if fewer than O(n/log n) machines need to perform the task. Evaluation on the machines is more expensive, though.
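A rough Python sketch of such a hybrid scheme follows. It makes several simplifying assumptions that are not in the answer above: the type flag T occupies a whole leading byte rather than a single bit, device ids are zero-based and fit in one byte (at most 256 devices), and the sender simply picks whichever encoding is shorter. The names `hybrid_encode` and `hybrid_is_addressed` are hypothetical.

```python
def hybrid_encode(target_ids, device_count):
    """Encode targets as an id list (T=1) or a bit string (T=0), whichever is shorter."""
    bitmap_len = (device_count + 7) // 8
    if len(target_ids) < bitmap_len:
        # T=1: explicit list of one-byte device ids.
        return bytes([1]) + bytes(sorted(target_ids))
    # T=0: one bit per device, least significant bit first.
    bitmap = bytearray(bitmap_len)
    for device_id in target_ids:
        bitmap[device_id // 8] |= 1 << (device_id % 8)
    return bytes([0]) + bytes(bitmap)


def hybrid_is_addressed(packet, device_id):
    """Device-side check: branch on the type byte, then test membership."""
    kind, payload = packet[0], packet[1:]
    if kind == 1:
        return device_id in payload  # linear scan over the id list
    return (payload[device_id // 8] & (1 << (device_id % 8))) != 0


single = hybrid_encode([7], device_count=100)
many = hybrid_encode(list(range(95)), device_count=100)
print(len(single), len(many))
```

The extra branch and the linear scan in the list case are the additional per-device cost mentioned above.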
Symptoms:
First of all, I'm trying to make sure there are actually two NVIDIA cards in this box, so
in VS2010 -> NSight -> Windows -> SystemInfo -> Display Devices
I can see that there seem to be two devices.
NVIDIA GeForce GTX 560 Ti
Name \\.\DISPLAY1
ID PCI\VEN_10DE&DEV_1200&SUBSYS_35151458&REV_A1
State Flags AttachedToDesktop, PrimaryDevice
Monitor
Name \\.\DISPLAY1\Monitor0
String Generic PnP Monitor
State Flags AttachedToDesktop, MultiDriver
NVIDIA GeForce GTX 560 Ti
Name \\.\DISPLAY2
ID PCI\VEN_10DE&DEV_1200&SUBSYS_35151458&REV_A1
State Flags None
BUT
in VS2010 -> NSight -> Windows -> SystemInfo -> GPU Devices or CUDA Devices
I can only see one column of values (not counting the 'Attribute' column)
I can only see one card under NVIDIA Control Panel -> 3D settings -> set PhysX Configuration
In code, when I do
int devCount;
cudaGetDeviceCount(&devCount);
devCount will be just '1'
As a result, I cannot set to use a specific GPU as I wanted.
QUESTIONS:
I wonder:
Is this because the first GeForce card is used by the monitor, so that all CUDA computations are carried out on the second card and CUDA is only aware of that card?
Even if this assumption is correct, is there a way to circumvent this on Windows so that I can still do computation on two GPU devices?
My recommendation, if you are unsure of the number of GPUs in a Windows system, is to check Device Manager. Alternatively, if you have physical access to the system, look at the I/O area of the case and count the cards, or open the box and count the cards.
Also note that in Device Manager, GPUs like the Tesla K10 and GeForce GTX 690 (and there are some others) will show up as 2 GPU adapters, even though there is only one physical card. For logical and programming purposes, these devices behave as 2 separate adapters. Likewise, CUDA will enumerate them as 2 separate GPUs, so that you can, for example, use cudaSetDevice() to select one or the other. Cards like these are effectively two GPUs in one.