How do multi-core processors handle interrupts?
I know how single-core processors handle interrupts, and I know the different types of interrupts.
I want to know how multi-core processors handle hardware, program, CPU timer, and input/output interrupts.
This should be considered a continuation of, and an expansion on, the other answer.
Most multiprocessor systems support programmable interrupt controllers such as Intel's APIC. These are complicated chips that consist of a number of components, some of which may be part of the chipset. At boot time, all I/O interrupts are delivered to core 0 (the bootstrap processor). Then, on an APIC system, the OS can specify for each interrupt which core(s) should handle it. If more than one core is specified, the APIC system decides which of those cores should handle an incoming interrupt request. This is called interrupt affinity. Many scheduling algorithms have been proposed for both the OS and the hardware. One obvious technique is to load-balance the system by scheduling interrupts in a round-robin fashion. Another is this technique from Intel, which attempts to balance performance and power.
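As a toy illustration of the round-robin idea only (not Intel's actual algorithm), here is a small C++ sketch that assigns a list of IRQ numbers to logical cores in turn and prints a one-bit affinity mask for each; the IRQ numbers and core count are made-up examples:
#include <cstdio>
#include <vector>

int main() {
    const int num_cores = 8;                                      // logical cores CPU0..CPU7 (example)
    const std::vector<int> irqs = {16, 23, 27, 28, 29, 31, 32};   // example IRQ numbers

    // Round-robin: the i-th IRQ goes to core (i mod num_cores).
    // Each mask has a single bit set, selecting exactly one core.
    for (std::size_t i = 0; i < irqs.size(); ++i) {
        const int core = static_cast<int>(i) % num_cores;
        const unsigned mask = 1u << core;
        std::printf("IRQ %2d -> CPU%d (affinity mask 0x%02X)\n", irqs[i], core, mask);
    }
    return 0;
}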
On a Linux system, you can open /proc/interrupts to see how many interrupts of each type were handled by each core. The contents of that file may look something like this on a system with 8 logical cores:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 19 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 1 1 0 0 0 0 0 0 IR-IO-APIC 1-edge i8042
8: 0 0 1 0 0 0 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 1 0 0 2 IR-IO-APIC 9-fasteoi acpi
12: 3 0 0 0 0 0 1 0 IR-IO-APIC 12-edge i8042
16: 84 4187879 7 3 3 14044994 6 5 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
19: 1 0 0 0 6 8 7 0 IR-IO-APIC 19-fasteoi
23: 50 2 0 3 273272 8 1 4 IR-IO-APIC 23-fasteoi ehci_hcd:usb2
24: 0 0 0 0 0 0 0 0 DMAR-MSI 0-edge dmar0
25: 0 0 0 0 0 0 0 0 DMAR-MSI 1-edge dmar1
26: 0 0 0 0 0 0 0 0 IR-PCI-MSI 327680-edge xhci_hcd
27: 11656 381 178 47851679 1170 481 593 104 IR-PCI-MSI 512000-edge 0000:00:1f.2
28: 5 59208205 0 1 3 3 0 1 IR-PCI-MSI 409600-edge eth0
29: 274 8 29 4 15 18 40 64478962 IR-PCI-MSI 32768-edge i915
30: 19 0 0 0 2 2 0 0 IR-PCI-MSI 360448-edge mei_me
31: 96 18 23 11 386 18 40 27 IR-PCI-MSI 442368-edge snd_hda_intel
32: 8 88 17 275 208 301 43 76 IR-PCI-MSI 49152-edge snd_hda_intel
NMI: 4 17 30 17 4 5 17 24 Non-maskable interrupts
LOC: 357688026 372212163 431750501 360923729 188688672 203021824 257050174 203510941 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 4 17 30 17 4 5 17 24 Performance monitoring interrupts
IWI: 2 0 0 0 0 0 0 140 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 15122413 11566598 15149982 12360156 8538232 12428238 9265882 8192655 Rescheduling interrupts
CAL: 4086842476 4028729722 3961591824 3996615267 4065446828 4033019445 3994553904 4040202886 Function call interrupts
TLB: 2649827127 3201645276 3725606250 3581094963 3028395194 2952606298 3092015503 3024230859 TLB shootdowns
TRM: 169827 169827 169827 169827 169827 169827 169827 169827 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 7194 7194 7194 7194 7194 7194 7194 7194 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
The first column specifies the interrupt request (IRQ) number; the IRQ numbers currently in use are the ones listed in the file. The file /proc/irq/N/smp_affinity contains a single value (a bitmask) that specifies the affinity of IRQ N. How that value is interpreted depends on the current mode of operation of the APIC.
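As a rough sketch (assuming a Linux system and root privileges; IRQ 28, which is eth0 in the listing above, is used purely as an example), this C++ snippet reads the current affinity bitmask of an IRQ and then pins it to CPU1 by writing the mask 2:
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Hypothetical example: IRQ 28 (eth0 in the listing above).
    const std::string path = "/proc/irq/28/smp_affinity";

    // The file holds a hexadecimal bitmask with one bit per logical core.
    std::ifstream in(path);
    std::string mask;
    if (!(in >> mask)) {
        std::cerr << "could not read " << path << " (root required)\n";
        return 1;
    }
    std::cout << "current mask: " << mask << '\n';

    // Write a new bitmask: 0x2 selects CPU1 only.  Whether and how the
    // request is honoured depends on the APIC mode, as noted above.
    std::ofstream out(path);
    out << "2" << std::endl;
    if (!out) {
        std::cerr << "could not write new mask\n";
        return 1;
    }
    std::cout << "IRQ 28 now routed to CPU1 only\n";
    return 0;
}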
A logical core can receive multiple I/O and IPI interrupts at once. When that happens, local interrupt scheduling takes place, which is also configurable by assigning priorities to interrupts.
Other programmable interrupt controllers are similar.
In general, this depends on the particular system you have under test.
The general approach is to have a dedicated chip in each processor1 that is assigned, either statically or dynamically2, a unique ID and that can send and receive interrupts over a shared or dedicated bus.
The IDs allow specific processors to be targeted by interrupts.
Code running on processor A can ask its interrupt chip to raise an interrupt on processor B. When this happens, a message is sent along the above-mentioned bus and routed to processor B, where the local interrupt chip picks it up, decodes it, and raises the corresponding interrupt.
At the system level, one or more general interrupt controllers are present to route interrupt requests from the I/O devices (on any bus) to the processors.
These controllers are programmable, so the OS can balance the interrupt load across all the processors (or implement any other convenient policy).
This is the most flexible approach; a hard-wired approach is also possible.
In that case, processor A's signals are wired directly to processor B's inputs and vice versa; asserting these signals gives rise to an interrupt on the target processor.
The general concept is called Inter-processor Interrupt (IPI).
The x86 architecture follows the first approach closely3 (beware of the nomenclature, though: "processor" has a different meaning there).
Other architectures may not, like the IBM OS/360 M65MP, which used a wired approach4.
Software-generated interrupts are just instructions in a program: each processor executes its own instruction stream, so if program X generates an exception while running on processor A, it is processor A that handles it.
Task scheduling is usually distributed across all the processors (that's what Linux does).
Time-keeping is usually done by a designated processor that services a hardware timer interrupt.
This is not always the case, though; I haven't looked at the precise implementation details of modern OSes.
1 Usually an integrated chip, so we can think of it as a functional unit of the processor.
2 By a power-on protocol.
3 Actually, this is reverse causality.
4 I'm following the Wikipedia examples.
In the output from BenchmarkDotNet I have the following lines:
For the first benchmark
WorkloadResult 100: 1 op, 614219700.00 ns, 614.2197 ms/op
GC: 123 1 0 518085976 1
Threading: 2 0 1
For the second benchmark
WorkloadResult 73: 1 op, 464890400.00 ns, 464.8904 ms/op
GC: 14 1 0 59217312 1
Threading: 7469 0 1
What do the values in GC and Threading mean?
You can find the explanation at https://benchmarkdotnet.org/articles/guides/how-it-works.html and at https://adamsitnik.com/the-new-Memory-Diagnoser/#how-to-read-the-results.
To make a long story short, you should care only about the results printed in the table, which are explained by the legend printed below it.
Hello, I'm trying to create a C++ program that calculates a function's values over a given range and then creates a DXF file so the result can be graphed.
The issue I'm having is with the DXF part. This is the file that my C++ program generates, but AutoCAD seems unable to read it. Any insight into the issue would be much appreciated.
0
SECTION
2
ENTITIES
0
POLYLINE
8
0
62
1
66
1
70
8
0
VERTEX
8
0
70
32
10
1
20
2
30
0
0
VERTEX
8
0
70
32
10
1.2
20
2.13688
30
0
0
VERTEX
8
0
70
32
10
1.4
20
2.28024
30
0
0
VERTEX
8
0
70
32
10
1.6
20
2.42929
30
0
0
VERTEX
8
0
70
32
10
1.8
20
2.58329
30
0
0
VERTEX
8
0
70
32
10
2
20
2.74166
30
0
0
91
0
0
SEQEND
0
ENDSEC
0
EOF
There is an error in the last VERTEX:
0
VERTEX
8
0
70
32
10
2
20
2.74166
30
0
0 <---- This 0 is one too many; it starts a structural group tag (0, 91)
91
0
0
SEQEND
0
ENDSEC
0
EOF
If you have any information on what group code 91 (vertex identifier) is for, let me know; I am very interested.
The issue I was having is that I was using the DXF group codes for an LWPOLYLINE when I should have been using the ones for POLYLINE. The difference is subtle, but if you are having this issue, backtrack through the group codes one by one and make sure all of them belong to the same entity. I will share the file that was finally able to produce output in AutoCAD 2018 (keep in mind the changes to the DXF format across AutoCAD versions, depending on your case):
0
SECTION
2
ENTITIES
0
POLYLINE
8
0
62
1
66
1
70
8
0
VERTEX
8
0
70
32
10
0
20
0
30
0
0
SEQEND
0
ENDSEC
0
EOF
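For reference, here is a minimal C++ sketch (my own illustration, not the original program) that emits the same group-code sequence as the working file above for a list of sample points; the file name and point values are arbitrary:
#include <fstream>
#include <string>
#include <vector>

struct Point { double x, y; };

// Writes a minimal DXF ENTITIES section containing one POLYLINE built
// from the given 2D points (z is always 0), using the group codes from
// the listing above: layer 0, colour 1, vertices-follow flag 66=1,
// polyline flag 70=8 and vertex flag 70=32.
void write_polyline_dxf(const std::string& filename, const std::vector<Point>& pts) {
    std::ofstream out(filename);
    out << "0\nSECTION\n2\nENTITIES\n";
    out << "0\nPOLYLINE\n8\n0\n62\n1\n66\n1\n70\n8\n";
    for (const Point& p : pts) {
        out << "0\nVERTEX\n8\n0\n70\n32\n";
        out << "10\n" << p.x << "\n20\n" << p.y << "\n30\n0\n";
    }
    out << "0\nSEQEND\n0\nENDSEC\n0\nEOF\n";
}

int main() {
    // Sample values, e.g. f(x) evaluated over a range.
    const std::vector<Point> pts = { {1, 2}, {1.2, 2.13688}, {1.4, 2.28024},
                                     {1.6, 2.42929}, {1.8, 2.58329}, {2, 2.74166} };
    write_polyline_dxf("plot.dxf", pts);
    return 0;
}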
The MQTT 3.1.1 documentation is very clear and helpful; however, I am having trouble understanding the meaning of one section regarding the Keep Alive byte structure in the CONNECT message.
The documentation states:
The Keep Alive is a time interval measured in seconds. Expressed as a 16-bit word, it is the maximum time interval that is permitted to elapse between the point at which the Client finishes transmitting one Control Packet and the point it starts sending the next.
And gives an example of a keep alive payload:
Keep Alive MSB (0) 0 0 0 0 0 0 0 0
Keep Alive LSB (10) 0 0 0 0 1 0 1 0
I have interpreted this to represent a keep-alive interval of 10 seconds, as the interval is given in seconds and that makes the most sense. However, I'm not sure how you would represent longer intervals of, for example, 10 minutes.
Finally, would the maximum keep-alive interval of 65535 seconds (~18 hours) be represented by these bytes?
Keep Alive MSB (255) 1 1 1 1 1 1 1 1
Keep Alive LSB (255) 1 1 1 1 1 1 1 1
Thank you for your help
2^16 = 65536, so the largest value the 16-bit field can hold is 65535 seconds.
65535 seconds / 60 = 1092.25 minutes; 1092.25 minutes / 60 ≈ 18.2 hours.
More precisely: 65535 seconds = 18 hours, 12 minutes, 15 seconds.
10 minutes = 600 seconds
600 in binary -> 0000 0010 0101 1000
And yes, 65535 is the largest number that can be represented by a 16-bit binary field, but there are very few situations where an 18-hour keep-alive interval would make sense.
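If it helps, here is a small C++ sketch (my own illustration, not taken from the specification) of how a client could split a keep-alive value in seconds into the MSB/LSB pair that goes into the CONNECT variable header:
#include <cstdint>
#include <cstdio>

// Splits a keep-alive interval (0..65535 seconds) into the two bytes of
// the MQTT CONNECT variable header: most significant byte first.
void keep_alive_bytes(std::uint16_t seconds, std::uint8_t& msb, std::uint8_t& lsb) {
    msb = static_cast<std::uint8_t>(seconds >> 8);
    lsb = static_cast<std::uint8_t>(seconds & 0xFF);
}

int main() {
    std::uint8_t msb = 0, lsb = 0;
    keep_alive_bytes(10, msb, lsb);       // 10 seconds  -> 0x00 0x0A
    std::printf("10 s    -> MSB 0x%02X LSB 0x%02X\n", unsigned(msb), unsigned(lsb));
    keep_alive_bytes(600, msb, lsb);      // 10 minutes  -> 0x02 0x58
    std::printf("600 s   -> MSB 0x%02X LSB 0x%02X\n", unsigned(msb), unsigned(lsb));
    keep_alive_bytes(65535, msb, lsb);    // maximum     -> 0xFF 0xFF
    std::printf("65535 s -> MSB 0x%02X LSB 0x%02X\n", unsigned(msb), unsigned(lsb));
    return 0;
}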
I'm running a DSE 4.6.5 Cluster (Cassandra 2.0.14.352).
Following DataStax's guidelines, on every machine I separated the data directory from the commit log / saved caches directories:
data is on blazing-fast drives
commit log and saved caches are on the system drives: 2 HDDs in RAID 1
Monitoring disks with OpsCenter while performing intensive writes, I see no issue with the former; however, I see the queue size on the latter (commit log) averaging around 300 to 400 requests, with spikes up to 700. Of course, the latency is also fairly high on these drives...
Is this affecting the performance of my cluster?
Would you recommend putting the commit log and saved caches on an SSD, separate from the system disks?
Thanks.
Edit - Adding tpstats from one of the nodes:
[root@dbc4 ~]# nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 15938 0 0
RequestResponseStage 0 0 154745533 0 0
MutationStage 1 0 306973172 0 0
ReadRepairStage 0 0 253 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 340298 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemoryMeter 1 1 36284 0 0
FlushWriter 0 0 23419 0 996
ValidationExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MemtablePostFlusher 0 0 27007 0 0
MiscStage 0 0 0 0 0
PendingRangeCalculator 0 0 7 0 0
CompactionExecutor 8 10 7400 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 1 222 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 49547
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Edit 2 - sar output:
04:10:02 AM CPU %user %nice %system %iowait %steal %idle
04:10:02 PM all 22.25 26.33 1.93 0.48 0.00 49.02
04:20:01 PM all 23.23 26.19 1.90 0.49 0.00 48.19
04:30:01 PM all 23.71 26.44 1.90 0.49 0.00 47.45
04:40:01 PM all 23.89 26.22 1.86 0.47 0.00 47.55
04:50:01 PM all 23.58 26.13 1.88 0.53 0.00 47.88
Average: all 21.60 26.12 1.71 0.56 0.00 50.01
Monitoring disks with OpsCenter while performing intensive writes, I see no issue with the former,
Cassandra persists writes in memory (memtable) and on the commitlog (disk).
When the memtable size grows to a threshold, or when you manually trigger it, Cassandra will write everything to disk (flush the memtables).
To make sure your setup is capable of handling your workload, try to manually flush all the memtables on a node:
nodetool flush
Or flush just a specific keyspace and table with:
nodetool flush [keyspace] [columnfamily]
At the same time, monitor your disks' I/O.
If you have high I/O wait, you can either share the workload by adding more nodes or switch the data drives to better ones with higher throughput.
Keep an eye on dropped mutations (they can come from other nodes sending writes/hints) and on blocked flush writers.
I see the queue size on the latter (commit log) averaging around 300 to 400 requests, with spikes up to 700.
This will probably be your writes to the commitlog.
Is your hardware serving anything else? Is it software RAID? Do you have swap disabled?
Cassandra works best alone :) So yes, put at least the commit log on a separate (it can be smaller) disk.
I've been working on this for about a week. There are no errors in my code; I just need to get the algorithm and concept right. I've implemented a neural network with one hidden layer, and I use the backpropagation algorithm to correct the weights.
My problem is that the network can only learn one pattern. If I train it with the same training data over and over again, it produces the desired outputs when given input that is numerically close to the training data.
training_input:1, 2, 3
training_output: 0.6, 0.25
after 300 epochs....
input: 1, 2, 3
output: 0.6, 0.25
input 1, 1, 2
output: 0.5853, 0.213245
But if I use multiple, varying training sets, it only learns the last pattern. Aren't neural networks supposed to learn multiple patterns? Is this a common beginner mistake? If so, please point me in the right direction. I've looked at many online guides, but I've never seen one that goes into detail about dealing with multiple inputs. I'm using sigmoid for the hidden layer and tanh for the output layer.
Example training arrays:
13 tcp telnet SF 118 2425 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 26 10 0.38 0.12 0.04 0 0 0 0.12 0.3 anomaly
0 udp private SF 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 3 0 0 0 0 0.75 0.5 0 255 254 1 0.01 0.01 0 0 0 0 0 anomaly
0 tcp telnet S3 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 255 79 0.31 0.61 0 0 0.21 0.68 0.6 0 anomaly
The last column (anomaly/normal) is the expected output. I turn everything into numbers, so each word is represented by a unique integer.
I give the network one array at a time, then I use the last column as the expected output to adjust the weights. I have around 300 arrays like these.
As for the hidden neurons, I tried 3, 6, and 20, but nothing changed.
To update the weights, I calculate the gradient for the output and hidden layers. Then I calculate the deltas and add them to their associated weights. I don't understand how that is ever going to learn to map multiple inputs to multiple outputs. It looks linear.
If you train a neural network too much on one data set (too many iterations of the back-propagation algorithm), the weights will eventually converge to a state that gives the best outcome for that specific training set (overtraining, in machine-learning terms). It will then have learned only the relationship between input and target data for that specific training set, not the broader, more general relationship you might be looking for. It's better to merge some distinctive sets and train your network on the full set, as sketched below.
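Here is a rough C++ sketch of what I mean by training on the full merged set; Sample and the update_weights callback are placeholders for your own data structures and back-propagation step, since your code isn't shown:
#include <algorithm>
#include <functional>
#include <random>
#include <vector>

struct Sample {
    std::vector<double> input;   // e.g. the numeric fields of one record
    std::vector<double> target;  // e.g. {1.0} for anomaly, {0.0} for normal
};

// One epoch = one shuffled pass over the WHOLE merged training set, with
// one small weight update per sample, instead of many passes over a
// single pattern (which makes the network forget the other patterns).
void train(std::vector<Sample> data, int epochs,
           const std::function<void(const Sample&)>& update_weights) {
    std::mt19937 rng(42);        // fixed seed, just for reproducibility
    for (int e = 0; e < epochs; ++e) {
        std::shuffle(data.begin(), data.end(), rng);
        for (const Sample& s : data)
            update_weights(s);   // your existing forward pass + backprop step
    }
}

int main() {
    std::vector<Sample> data;    // load all ~300 records here, not just one
    train(data, 300, [](const Sample& /*s*/) {
        // call your back-propagation for this one sample
    });
    return 0;
}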
Without seeing the code for your back-propagation algorithm, I can't give you any advice on whether it's working correctly. One problem I had when implementing back-propagation was not properly calculating the derivative of the activation function around the input value. This website was very helpful for me.
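Since you mentioned sigmoid for the hidden layer and tanh for the output layer, here is a small C++ sketch of those derivatives, written in terms of the activation's own output (the form the weight-update rule normally uses); the printed values are only a sanity check:
#include <cmath>
#include <cstdio>

// sigmoid: f'(x) = y * (1 - y)   where y = sigmoid(x)
// tanh:    f'(x) = 1 - y * y     where y = tanh(x)
double sigmoid(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoid_deriv(double y) { return y * (1.0 - y); }
double tanh_deriv(double y)    { return 1.0 - y * y; }

int main() {
    const double x = 0.5;
    const double s = sigmoid(x);
    const double t = std::tanh(x);
    std::printf("sigmoid(%.1f) = %.4f, derivative = %.4f\n", x, s, sigmoid_deriv(s));
    std::printf("tanh(%.1f)    = %.4f, derivative = %.4f\n", x, t, tanh_deriv(t));
    return 0;
}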
No, neural networks are not supposed to know multiple tricks.
You train them for a specific task.
Yes, they can be trained for other tasks as well,
but then they get optimized for that other task.
That's why you should create load and save functions for your network, so that you can easily switch "brains" and perform other tasks if required.
If you're not sure which task it is currently trained for, train a neural network to find the difference between the tasks.