How to distinguish between packets received in a node according to the type of packet and its source (creator) - omnet++

In the result files (*.sca, *.anf), the "packetReceived:count" metric for a particular node shows the number of packets that node received. I want to break down the received packets per node by packet type and source, like this:
Source  Destination  Type         Count
n1      n5           UdpBasicApp  3
n1      n5           PingApp      75
n2      n5           VoIPApp      2
n2      n6           VoIPApp      32
n1      n7           UdpBasicApp  8
and so on. What is a practical way to get this level of detail?
Thanks a lot

For example, you can use multiple sink apps in the destination, one for each category you want to distinguish between.

Related

How do I speculatively receive requests from multiple MPI sending processes?

Say I have 4 MPI processes labelled: P0, P1, P2, P3. Each process potentially has packets to send to other processes, but may not.
I.e. P0 needs to send packets to P1 and P2, or
P0 -> [P1, P2]
Similarly,
P1 -> [P3]
P2 -> []
P3 -> [P1]
So P1 has to receive potential packets from both P0 and P3, and P3 has to receive packets from P1, and P2 from P0.
How do I do this in MPI? It is a kind of 'sparse' all-to-all communication. To set up the receives, however, each process needs to know how many packets it will receive, and I'm not sure how to find that out. Using MPI_Mprobe in a loop breaks as soon as the receiver detects a single packet; how do I ensure that it only stops once all packets have been received?
Each process needs to tell every other process how many messages there will be, including zero. You can do that with an all-to-all.
However, a reduce-scatter is more efficient. Each process makes a send buffer of length P holding 0 or 1 depending on whether a message is sent to the corresponding process. View the buffers together as a matrix in which element (i,j) is 1 if process i sends to process j. A reduce-scatter with a sum then gives each process j the sum of the elements in column j, which is the number of messages it will receive. You then call MPI_Probe that many times.
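The counting step can be checked without MPI. The following is a minimal pure-Python sketch that simulates what a sum reduce-scatter would deliver, using the send pattern from the question; the names `sends`, `send_flags` and `incoming` are made up for this illustration. In a real program the column sums would come from a collective such as MPI_Reduce_scatter_block with MPI_SUM instead of the list comprehension.

```python
# Simulation only: no MPI calls, just the arithmetic a sum reduce-scatter performs.
# Send pattern from the question: P0 -> [P1, P2], P1 -> [P3], P2 -> [], P3 -> [P1]
sends = {0: [1, 2], 1: [3], 2: [], 3: [1]}
size = len(sends)


def send_flags(rank):
    """Length-P buffer of 0/1: entry j is 1 if `rank` sends a message to j."""
    return [1 if dest in sends[rank] else 0 for dest in range(size)]


# Row i of the matrix is the send buffer of process i.
matrix = [send_flags(rank) for rank in range(size)]

# Process j ends up with the sum of column j: its number of incoming messages.
incoming = [sum(matrix[i][j] for i in range(size)) for j in range(size)]

for rank, count in enumerate(incoming):
    print(f"P{rank} should probe/receive {count} time(s)")
```

Here P1 learns that it must receive twice (from P0 and P3), while P0 learns that it has nothing to receive.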
I've solved it with a method similar to yours, @Victor Eijkhout.
Code snippet in Rust:
    // One slot per rank; after the call, slot j holds the number of processes sending to rank j.
    let mut to_receive = vec![0i32; size as usize];
    world.all_reduce_into(
        &packet_destinations,
        &mut to_receive,
        SystemOperation::sum(),
    );
Where packet_destinations is a vector containing 1 if the process corresponding to the index is being sent data from the current process, and zero otherwise. Each process then reads the entry at its own rank to learn how many packets to expect.
Thank you for your response, I loved your HPC textbook by the way.

How to understand the kth receive on a channel with capacity C happens before the k+Cth send from that channel completes?

It comes from Channel communication.
What I really can't understand is why kth receive happens before the k+Cth send? Why not kth send or k+1 th send?
The capacity of a buffered channel is the number of sends that can complete without blocking. Once all of the capacity is used up, the next send blocks until a receive takes a value from the channel, and only then does it complete. That is what the k + C in the sentence expresses.
Rephrasing the sentence from the link with concrete numbers, say C = 3 and k = 1 (for clarity), gives:
"The fourth send to a channel with a capacity of 3 will complete after the first receive from the channel".
The kth receive on a channel with capacity C happens before the k+Cth send from that channel completes.
It simply means that a channel with capacity c can hold only c messages at a time. So to send the 11th (k+c) message on a channel with a capacity of 4 (c), 7 (k) messages must already have been received; otherwise the 11th send blocks until that receive happens.
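The rule can be tried out with a bounded queue. The sketch below uses Python's `queue.Queue` as a stand-in for a buffered channel with C = 3; it is only an analogy for the blocking behaviour, not a model of Go's memory-model guarantees. A non-blocking put that raises `queue.Full` plays the role of a send that would block.

```python
import queue

C = 3                            # channel capacity
ch = queue.Queue(maxsize=C)      # stand-in for a buffered channel

# Sends 1..C complete immediately because the buffer has room.
for value in range(1, C + 1):
    ch.put_nowait(value)

# Send C+1 cannot complete yet: no receive has happened.
try:
    ch.put_nowait(C + 1)
    fourth_send_blocked = False
except queue.Full:
    fourth_send_blocked = True

first_received = ch.get_nowait()  # the 1st receive (k = 1) frees one slot
ch.put_nowait(C + 1)              # now the (k + C)th = 4th send completes

remaining = [ch.get_nowait() for _ in range(C)]
print(fourth_send_blocked, first_received, remaining)
```

The fourth send only goes through after the first receive, which matches the k = 1, C = 3 reading of the sentence.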

How to understand slot sharing and parallelism in Apache Flink

I'm trying to figure out slot sharing and parallelism in Flink with the example WordCount.
Say I need to run the word count job with Flink, and there is only one data source and only one sink.
In this case, can I design it as in the image above? That is, I set two subtasks on Source + map() and two subtasks on keyBy()/window()/apply(). In other words, I have two lines, A --- B --- Sink and C --- D --- Sink, so that I get better performance.
For example, suppose a data stream arrives: aaa, bbb, aaa. With the design above, I might get this situation: aaa and bbb go into A --- B and the other aaa goes into C --- D. Finally, I get the result aaa: 2, bbb: 1 at the Sink. Am I right so far?
If I'm right: I know that subtasks of the same task cannot share a slot, so does that mean A and C can't share a slot, and B and D can't share a slot? How do I assign the slots? Should I put A + B + Sink into one slot and C + D into another?
Slot sharing is enabled by default. With slot sharing enabled, the number of slots required is the same as the parallelism of the task with the highest parallelism (which is two in this case).
In this example the scheduler will put A + B + Sink into one slot, and C + D into another. This isn't something you normally need to configure or even give much thought to, as the defaults work well in most cases.
If you were to completely disable slot sharing, then this job would require 5 slots, one for each of A, B, C, D, and the sink. But disabling slot sharing is almost never a good idea. Just make sure each slot has sufficient resources to run all of the subtasks concurrently.
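The two slot counts above follow from simple arithmetic. Here is a small sketch of that calculation under a simplified model: every task is assumed to be in the same (default) slot sharing group, and the dictionary and function names are invented for the illustration rather than taken from any Flink API.

```python
# Parallelism of each task in the example job (A/C, B/D and the sink).
task_parallelism = {
    "Source + map()": 2,            # subtasks A and C
    "keyBy()/window()/apply()": 2,  # subtasks B and D
    "Sink": 1,
}


def slots_required(parallelism, slot_sharing=True):
    """Slots needed under the simplified model described above."""
    if slot_sharing:
        # One slot can hold one subtask of every task, so the task with the
        # highest parallelism determines the slot count.
        return max(parallelism.values())
    # Without sharing, every subtask occupies its own slot.
    return sum(parallelism.values())


slots_with_sharing = slots_required(task_parallelism)
slots_without_sharing = slots_required(task_parallelism, slot_sharing=False)
print(slots_with_sharing, slots_without_sharing)
```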

WebSphere MQ Cluster QMGR, Mechanism of dispatch messages to nodes

I am using WebSphere MQ 7.0, and here is my scenario:
I have a cluster queue manager named 'CLUSD' and two clustered nodes named 'N1' and 'N2'.
N1 and N2 are configured identically, which means no priority is set for either queue.
When I send messages to CLUSD, the queue manager distributes them to its nodes (N1, N2), but I can't see any understandable mechanism that explains why N1 sometimes gets more messages than N2 and vice versa.
I have a message producer that sends messages in a while loop for a couple of minutes. After each minute I read the enqueue count for each node's queue, and there is always a difference between the counts of N1 and N2.
I know that with WebSphere MQ I always have bigger fish to fry ;) but I want to get the same result when the configuration (software, hardware, etc.) is the same.
What can I do to achieve this?
As documented here http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/index.jsp?topic=/com.ibm.mq.csqzah.doc/qc10940_.htm:
The distribution of user messages is not always exact, because administration and maintenance of the cluster causes messages to flow across channels. The result is an uneven distribution of user messages which can take some time to stabilize. Because of the admixture of administration and user messages, place no reliance on the exact distribution of messages during workload balancing.
This blog describes more:
https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/websphere_mq_clustering_workload_balancing_dick_hamilton14?lang=en

algorithm to replicate data between computers

Suppose I have n computers. Each of them has a set of integers, and the sets differ from computer to computer.
For example, computer 1 has {1,2,3,4}, computer 2 has {4,5,10,20,21}, computer 3 has {-10,3,5} and so on.
I want to replicate this data so that all computers have all the integers, i.e. all of them will have {-10,1,2,3,4,5,10,20,21}.
I want to minimize the number of messages that each computer sends and also minimize the time (i.e. avoid a serial approach where computer 1 first communicates with everyone and gets the data it is missing, then computer 2 does the same, and so on).
What is an efficient way of doing this?
Thanks.
The minimal approach would be: all computers send their info to just one (master) computer and get the result back.
For reliability, you could use at least two computers as masters.
Assumptions:
There are n computers in total.
One of the computers is designated as the master.
Algorithm:
All computers send their input to the master (n-1 messages in total).
The master processes the info.
The master sends the result to all computers (n-1 messages in total).
Reliability:
Total failure of a system based on this algorithm can occur only if all the masters fail.
Efficiency:
With 1 master, total messages: 2 * (n-1)
With 2 masters, total messages: 2 * 2 * (n-1)
With 3 masters, total messages: 3 * 2 * (n-1)
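As a rough illustration of the single-master case, the sketch below simulates the gather and broadcast phases in memory and counts messages. Nothing is sent over a network, and the function name `replicate_via_master` is made up for this example.

```python
def replicate_via_master(sets, master=0):
    """Simulate gather-to-master then broadcast; return (final sets, message count)."""
    data = [set(s) for s in sets]
    messages = 0

    # Phase 1: every non-master computer sends its set to the master.
    for i in range(len(data)):
        if i != master:
            data[master] |= data[i]
            messages += 1

    # Phase 2: the master sends the merged set back to every other computer.
    for i in range(len(data)):
        if i != master:
            data[i] = set(data[master])
            messages += 1

    return data, messages


# The three computers from the question.
master_result, master_messages = replicate_via_master(
    [{1, 2, 3, 4}, {4, 5, 10, 20, 21}, {-10, 3, 5}]
)
print(master_messages, sorted(master_result[0]))
```

With n = 3 this uses 2 * (n-1) = 4 messages, matching the count given for one master.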
If all the computers are on the same network, you could use UDP sockets with SO_BROADCAST option.
This way when one computer does a message 'send', all the other computers would 'recv' the message and update as necessary.
Here's one way of doing it in 2*n - 2 messages. Model the machines as nodes in a linked list, numbered 1..n.
Let node 1 send all of its data in one message to node 2.
Node 2 remembers the message node 1 sent, takes the union of its own content with node 1's content, and sends the unified message to node 3. Then node 2 waits for a response from node 3.
Node 3 does the same, and so on until we get to node n. Node n now has the complete set.
Node n already knows what message node n - 1 sent it, so it sends the diff back to node n - 1.
Node n - 1 performs the union as above. Since it has remembered the message of node n - 2 (in step 2 above), it can send the diff back to node n - 2,
and so on.
I think it is not hard to show that the above leads to 2 * (n - 1) messages being sent in the network.
I think it can be proven that 2n - 2 is necessary by considering each node to have a unique element. It should be a short exercise in mathematical induction.
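The forward and backward passes can be sketched as an in-memory simulation. This is only an illustration under the assumption that nodes are list indices and a "message" is a set handed from one index to its neighbour; `replicate_chain` and `received` are names invented for the example.

```python
def replicate_chain(sets):
    """Simulate the linked-list scheme; return (final sets, message count)."""
    n = len(sets)
    data = [set(s) for s in sets]
    received = [set() for _ in range(n)]  # what node i got from node i - 1
    messages = 0

    # Forward pass: each node sends everything it knows to its successor,
    # which remembers the message and merges it into its own set.
    for i in range(n - 1):
        received[i + 1] = set(data[i])
        data[i + 1] |= data[i]
        messages += 1

    # Backward pass: each node sends its predecessor only the elements the
    # predecessor did not already send it (the diff).
    for i in range(n - 1, 0, -1):
        diff = data[i] - received[i]
        data[i - 1] |= diff
        messages += 1

    return data, messages


# The three computers from the question.
chain_result, chain_messages = replicate_chain(
    [{1, 2, 3, 4}, {4, 5, 10, 20, 21}, {-10, 3, 5}]
)
print(chain_messages, sorted(chain_result[0]))
```

The message count is the same as in the master approach, but the return messages carry only the missing elements rather than the full set.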
Well, there's already a system that does this, is widely adopted, well documented, widely available and, while it's perhaps not perfect (for assorted definitions of perfect), it's practical.
It's called RSync.
You can start here: http://www.samba.org/~tridge/phd_thesis.pdf

Resources