Atomic Broadcast Exercise - parallel-processing

I'm trying to solve exercise 5.10 of the book
"Foundations of Multithreaded, Parallel, and Distributed Programming".
The exercise is
"Assume one producer process and N consumer processes share a bounded buffer having B slots. The producer deposits messages in the buffer; consumers fetch them. Every message deposited by the producer is to be received by all N consumers. Furthermore, each consumer is to receive the messages in the order they were deposited. However, consumers can receive messages at different times. For example, one consumer could receive up to B more messages than another if the second consumer is slow.
Develop a monitor that implements this kind of communication. Use Signal and Continue discipline."
Can someone help me, please?
Thank you very much!
--
EDIT:
I'm now adding what I have done so far (I left it out at first because I thought the question would get too long if I wrote everything).
/* creating a buffer of B positions. */
global buffer[B];
Monitor {
    cond ok_write;
    cond ok_read;
    int stamp_buffer[B] = [0, 0, .., 0];
    request_write (int pos){
        if (stamp_buffer[pos] > 0)
            wait(ok_write);
        write_message (buffer[pos]);
        stamp_buffer[pos] = N;
        signalAll (ok_read);
    }
    request_read (int pos){
        if (stamp_buffer[pos] == 0)
            wait (ok_read);
        stamp_buffer[pos] --;
    }
    release_read (int pos){
        if (stamp_buffer[pos]==0)
            signal(ok_write);
    }
}
So, I think that I still have this problem: "A reader can read the same message twice."
The basic idea of my algorithm is:
The writer writes to a position "pos" and sets the value of stamp[pos] to N.
Then, each reader that reads position pos decrements stamp[pos] by 1.
So, if stamp[pos] is zero, the message buffer[pos] has already been read N times and the writer can write to this position again.
But if some reader reads a message twice (or more), the counter reaches zero too early, the writer can write a new message to position pos, and some other reader will never read the old message.
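One possible way around the double-read problem, sketched in C++ rather than in the book's monitor notation: let the monitor track, for each consumer, how many messages that consumer has already fetched. The class name `BroadcastBuffer`, the `int` message type, and the members `next_read_` and `next_write_` are assumptions made for this illustration, not part of the exercise. C++ condition variables behave like Signal and Continue (a notified thread has to re-acquire the lock and re-check its condition), which is why every wait uses a predicate instead of a plain `if`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical monitor: one producer, N consumers, B slots.
class BroadcastBuffer {
public:
    BroadcastBuffer(std::size_t slots, std::size_t consumers)
        : buf_(slots), pending_(slots, 0), next_read_(consumers, 0), n_(consumers) {
        assert(slots > 0 && consumers > 0);
    }

    // Producer: wait until all N consumers have read the slot about to be reused.
    void deposit(int msg) {
        std::unique_lock<std::mutex> lock(m_);
        std::size_t pos = next_write_ % buf_.size();
        ok_write_.wait(lock, [&] { return pending_[pos] == 0; });
        buf_[pos] = msg;
        pending_[pos] = n_;       // N consumers still have to read this slot
        ++next_write_;
        ok_read_.notify_all();    // corresponds to signalAll(ok_read)
    }

    // Consumer `id`: wait until a message it has not seen yet is available.
    int fetch(std::size_t id) {
        std::unique_lock<std::mutex> lock(m_);
        ok_read_.wait(lock, [&] { return next_read_[id] < next_write_; });
        std::size_t pos = next_read_[id] % buf_.size();
        int msg = buf_[pos];
        ++next_read_[id];         // this consumer can never read the slot twice
        if (--pending_[pos] == 0) ok_write_.notify_one();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable ok_write_;
    std::condition_variable ok_read_;
    std::vector<int> buf_;                // the B slots
    std::vector<std::size_t> pending_;    // readers still missing, per slot
    std::vector<std::size_t> next_read_;  // messages fetched so far, per consumer
    std::size_t n_;                       // number of consumers
    std::size_t next_write_ = 0;          // messages deposited so far
};
```

Because each consumer advances only its own counter, the per-slot count drops to zero exactly when all N consumers have read that message, and messages are delivered to every consumer in deposit order.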

Related

MPI_Send does not work with higher buffer size?

When the MPI_Send buffer size is 100 the program works, but it hangs when it is 1000 or greater. Why?
if(id == 0){
    rgb_image = stbi_load(argv[1], &width, &height, &bpp, CHANNEL_NUM);
    for(int i = 0; i < size -1; i++)
        MPI_Send(rgb_image,1000,MPI_UINT8_T,i,0,MPI_COMM_WORLD);
}
uint8_t *part = (uint8_t*) malloc(sizeof(uint8_t)*(1000));
if(id != size-1 && size > 1)
    MPI_Recv(part,1000,MPI_UINT8_T,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
This program is not valid w.r.t. MPI Standard since there is no matching receive (on rank 0) for
MPI_Send(..., dest=0, ...)
MPI_Send() is allowed to block until a matching receive is posted (and that generally happens when the message is "large") ... and the required matching receive never gets posted.
A typical fix would be to issue an MPI_Irecv(...,src = 0,...) on rank 0 before the MPI_Send() (and MPI_Wait() after), or to handle 0 -> 0 communication with MPI_Sendrecv().
That being said, it would likely be more efficient to create a communicator with all the ranks minus the last one, and MPI_Bcast() in this communicator.
If a program works for small buffers but not for large, you are probably running into "eager sends". Normally, a send & receive transaction involves the sender & receiver talking back and forth, confirming that the data went across. This is overhead, so for small messages, many MPIs will just send the data, without confirmation. The data then goes into some secret buffer on the receiver.
But this means that your program can "succeed" even if it is not a correct program, as is the case here. See Giles' answer.

MPI: Best way to coordinate many sends and receives

I've been away from parallel programming for a long period of time and I am trying to figure out the best method for coordinating sending large amounts of data between many processors with a complicated dependency structure. For example, I might need to send data to/from the following processes:
int process_1_dependencies[] = {2,3,5,6}
int process_2_dependencies[] = {1}
int process_3_dependencies[] = {1,4,5}
int process_4_dependencies[] = {3,5,6}
int process_5_dependencies[] = {1,3,4,6}
int process_6_dependencies[] = {1,4,5,7}
int process_7_dependencies[] = {6,8}
int process_8_dependencies[] = {7}
The obvious, and stupid, way of doing this would be to do something like:
for(int i = 0; i < world_size; i++)
{
    for(int j = 0; j < dependency_length; j++)
    {
        if (i == my_rank)
        {
            mpi_irecv(...,source=dependency[j],)
        }
        else
        {
            if (i == dependency[j])
            {
                mpi_isend(...,dest=dependency[j])
            }
        }
    }
    // blocking stuff?
}
I'm not actually sure if this would work once you have hundreds of communications going, and in any case it seems super inefficient. It's at least O(N) and only allows a single process to be receiving at once. A better way would be to use blocking and ensure that independent processes are simultaneously exchanging information. But that becomes quite complicated and requires optimizing which processes are simultaneously sending and receiving.
Am I just completely overthinking this? Is it safe to do something like this (provided that every sending process has a receiving pair):
for(int i = 0; i < dependency_length; i++)
{
    mpi_isend(..., dest=dependency[i], ...)
    mpi_irecv(..., source=dependency[i], ...)
}
//blocking stuff
Sorry for the lack of focus in the question. I'm away from my computer so I can't really test it out, and even if it did work, I'm not confident that it is scalable and that the buffers would keep working for arbitrary numbers of processes.
To avoid queueing a large number of messages and to avoid opaque deadlock problems, you can also employ a single call to MPI_Alltoallv, where all sends and receives are done for you automatically, and (with crossed fingers) even hope that your MPI implementation is able to optimize all communication on its own. The prototype is
MPI_Alltoallv
(
    sendbuf,    // buffer containing all data needed by other ranks in comm
    sendcounts, // number of elements to send to each rank in comm
    sdispls,    // offsets in sendbuf per rank in comm
    sendtype,   // MPI datatype of the sent data
    recvbuf,    // buffer to contain all data needed by this rank
    recvcounts, // number of elements to receive per rank in comm
    rdispls,    // offsets in recvbuf per rank in comm
    recvtype,   // MPI datatype of the received data
    comm        // the communicator
);
where sendcounts would be directly related to your process_X_dependencies; it would contain non-zero values at positions listed by process_X_dependencies.
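As a rough illustration of that last point, the count and displacement arrays could be derived from a dependency list along these lines. The helper `makeLayout`, the struct `AlltoallvLayout`, and the fixed `elemsPerPeer` are assumptions made for this sketch, not part of MPI; it also assumes 0-based ranks (the lists in the question are 1-based) and the same amount of data for every peer. The MPI_Alltoallv call itself is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of MPI: per-rank counts and offsets.
struct AlltoallvLayout {
    std::vector<int> counts;  // would be passed as sendcounts (or recvcounts)
    std::vector<int> displs;  // would be passed as sdispls (or rdispls)
};

// deps holds the 0-based ranks this process exchanges data with.
AlltoallvLayout makeLayout(int worldSize, const std::vector<int>& deps, int elemsPerPeer) {
    AlltoallvLayout layout{std::vector<int>(worldSize, 0), std::vector<int>(worldSize, 0)};
    for (int peer : deps) {
        assert(peer >= 0 && peer < worldSize);
        layout.counts[peer] = elemsPerPeer;  // non-zero only for dependencies
    }
    int offset = 0;
    for (int r = 0; r < worldSize; ++r) {
        layout.displs[r] = offset;           // where rank r's block starts in the buffer
        offset += layout.counts[r];
    }
    return layout;
}
```

Ranks that are not in the dependency list keep a count of zero, so nothing is exchanged with them even though every rank takes part in the collective call.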

How to simulate limited RSU capacity in veins?

I have to simulate a scenario with an RSU that has limited processing capacity; it can only process a limited number of messages in a time unit (say 1 second).
I tried to set a counter in the RSU application. The counter is incremented each time the RSU receives a message and decremented after processing it. Here is what I have done:
void RSUApp::onBSM(BasicSafetyMessage* bsm)
{
    if(msgCount >= capacity)
    {
        //drop msg
        this->getParentModule()->bubble("capacity limit");
        return;
    }
    msgCount++;
    //process message here
    msgCount--;
}
It seems useless: I tested it using capacity limit=1 with 2 vehicles sending messages at the same time. The RSU processes both, although it should process one and drop the other.
Can anyone help me with this?
In the beginning of the onBSM method the counter is incremented, your logic gets executed and finally the counter gets decremented. All those steps happen at once, meaning in one step of the simulation.
This is the reason why you don't see an effect.
What you probably want is a certain amount of "messages" to be processed in a certain time interval (e.g. 500 ms). It could somehow look like this (untested):
if (simTime() <= intervalEnd && msgCount >= capacity)
{
    this->getParentModule()->bubble("capacity limit");
    return;
} else if (simTime() > intervalEnd) {
    intervalEnd = simTime() + YOURINTERVAL;
    msgCount = 0;
}
......
The variable YOURINTERVAL would be the amount of time you want to use as the interval for your capacity.
You can use self-messaging with scheduleAt(simTime()+delay, yourmessage);
the delay will simulate the required processing time.
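The interval idea from the first answer can be isolated into a small helper that does not depend on OMNeT++ or Veins at all. This is only a sketch under assumptions: the class name `CapacityWindow` and its members are made up for illustration, time is passed in as plain seconds, and a new window starts as soon as the current time reaches the end of the old one (a slight variation on the snippet above).

```cpp
#include <cassert>

// Hypothetical helper, independent of OMNeT++/Veins: fixed-window message budget.
class CapacityWindow {
public:
    CapacityWindow(int capacity, double interval)
        : capacity_(capacity), interval_(interval) {
        assert(capacity > 0 && interval > 0.0);
    }

    // Returns true if a message arriving at time `now` (in seconds) may be processed.
    bool accept(double now) {
        if (now >= windowEnd_) {                // window expired: start a new one
            windowEnd_ = now + interval_;
            count_ = 0;
        }
        if (count_ >= capacity_) return false;  // budget used up: drop the message
        ++count_;                               // count the message against the budget
        return true;
    }

private:
    int capacity_;
    double interval_;
    double windowEnd_ = 0.0;
    int count_ = 0;
};
```

Inside onBSM one would then presumably call something like accept(simTime().dbl()) and bubble "capacity limit" whenever it returns false. Unlike the counter in the question, the count is never decremented within a window, so two messages arriving in the same simulation step are counted separately.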

Algorithm to time-sort N data streams

So I've got N asynchronous, timestamped data streams. Each stream has a fixed-ish rate. I want to process all of the data, but the catch is that I must process the data in order as close to the time that the data arrived as possible (it is a real-time streaming application).
So far, my implementation has been to create a fixed window of K messages which I sort by timestamp using a priority queue. I then process the entirety of this queue in order before moving on to the next window. This is okay, but it's less than ideal because it creates lag proportional to the size of the buffer, and it will also sometimes lead to dropped messages if a message arrives just after the end of the buffer has been processed. It looks something like this:
// Priority queue keeping track of the data in timestamp order.
ThreadSafePriorityQueue<Data> q;
// Fixed buffer size
int K = 10;
// The last successfully processed data timestamp
time_t lastTimestamp = -1;
// Called for each of the N data streams asynchronously
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
    if (q.size() > K) {
        processQueue();
    }
}
// Process all the data in the queue.
void processQueue() {
    while (!q.empty()) {
        const auto& data = q.top();
        // If the data is too old, drop it.
        if (data.timestamp < lastTimestamp) {
            LOG("Dropping message. Too old.");
            q.pop();
            continue;
        }
        // Otherwise, process it.
        processData(data);
        lastTimestamp = data.timestamp;
        q.pop();
    }
}
Information about the data: they're guaranteed to be sorted within their own stream. Their rates are between 5 and 30 Hz. They consist of images and other bits of data.
Some examples of why this is harder than it appears. Suppose I have two streams, A and B both running at 1 Hz and I get the data in the following order:
(stream, time)
(A, 2)
(B, 1.5)
(A, 3)
(B, 2.5)
(A, 4)
(B, 3.5)
(A, 5)
See how, if I processed the data in the order in which I received them, B would always get dropped? That's what I wanted to avoid. Now in my algorithm, B would get dropped every 10th frame, and I would process the data with a lag of 10 frames into the past.
I would suggest a producer/consumer structure. Have each stream put data into the queue, and a separate thread reading the queue. That is:
// your asynchronous update:
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
}
// separate thread that processes the queue
void processQueue()
{
    while (!stopRequested)
    {
        data = q.pop();
        if (data.timestamp >= lastTimestamp)
        {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This prevents the "lag" that you see in your current implementation when you're processing a batch.
The processQueue function is running in a separate, persistent thread. stopRequested is a flag that the program sets when it wants to shut down--forcing the thread to exit. Some people would use a volatile flag for this. I prefer to use something like a manual reset event.
To make this work, you'll need a priority queue implementation that allows concurrent updates, or you'll need to wrap your queue with a synchronization lock. In particular, you want to make sure that q.pop() waits for the next item when the queue is empty. Or that you never call q.pop() when the queue is empty. I don't know the specifics of your ThreadSafePriorityQueue, so I can't really say exactly how you'd write that.
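For illustration only, a queue whose pop() waits while the queue is empty could look roughly like the following. `BlockingPriorityQueue` is a hypothetical stand-in, not the asker's ThreadSafePriorityQueue; it assumes timestamps are plain doubles and that the smallest timestamp should come out first.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread-safe priority queue; smallest timestamp first.
template <typename T>
class BlockingPriorityQueue {
public:
    void push(double timestamp, T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            heap_.push(Entry{timestamp, std::move(value)});
        }
        nonEmpty_.notify_one();  // wake one consumer waiting in pop()
    }

    // Blocks while the queue is empty, then removes and returns the oldest item.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        nonEmpty_.wait(lock, [&] { return !heap_.empty(); });
        assert(!heap_.empty());       // guaranteed by the wait predicate
        T value = heap_.top().value;  // top() is const, so this is a copy
        heap_.pop();
        return value;
    }

private:
    struct Entry {
        double timestamp;
        T value;
    };
    // Orders the heap so that the entry with the smallest timestamp is on top.
    struct Later {
        bool operator()(const Entry& a, const Entry& b) const {
            return a.timestamp > b.timestamp;
        }
    };

    std::mutex m_;
    std::condition_variable nonEmpty_;
    std::priority_queue<Entry, std::vector<Entry>, Later> heap_;
};
```

A real shutdown path would also need a way to wake a thread that is blocked in pop(), which this sketch leaves out.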
The timestamp check is still necessary because it's possible for a later item to be processed before an earlier item. For example:
1. Event received from data stream 1, but thread is swapped out before it can be added to the queue.
2. Event received from data stream 2, and is added to the queue.
3. Event from data stream 2 is removed from the queue by the processQueue function.
4. Thread from step 1 above gets another time slice and item is added to the queue.
This isn't unusual, just infrequent. And the time difference will typically be on the order of microseconds.
If you regularly get updates out of order, then you can introduce an artificial delay. For example, in your updated question you show messages coming in out of order by 500 milliseconds. Let's assume that 500 milliseconds is the maximum tolerance you want to support. That is, if a message comes in more than 500 ms late, then it will get dropped.
What you do is add 500 ms to the timestamp when you add the thing to the priority queue. That is:
q.push(AddMs(dat.timestamp, 500), dat);
And in the loop that processes things, you don't dequeue something before its timestamp. Something like:
while (true)
{
    if (q.peek().timestamp <= currentTime)
    {
        data = q.pop();
        if (data.timestamp >= lastTimestamp)
        {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This introduces a 500 ms delay in the processing of all items, but it prevents dropping "late" updates that fall within the 500 ms threshold. You have to balance your desire for "real time" updates with your desire to prevent dropping updates.
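A single-threaded sketch of this delayed release, with locking left out for brevity. The names `Sample`, `DelayedReleaseQueue` and `popReady` are assumptions for illustration; instead of adding the delay to the key at push time, this variant keeps the original timestamp as the key and adds the delay when checking whether an item is due, which gives the same ordering.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical sample type: a timestamp in seconds plus some payload.
struct Sample {
    double timestamp;
    int payload;
};

class DelayedReleaseQueue {
public:
    explicit DelayedReleaseQueue(double delay) : delay_(delay) {
        assert(delay >= 0.0);
    }

    void push(const Sample& s) { heap_.push(s); }

    // Writes the oldest sample to `out` and returns true once `delay` has elapsed
    // since its timestamp. Samples older than the last released one are dropped.
    // Returns false if nothing is due yet.
    bool popReady(double now, Sample& out) {
        while (!heap_.empty() && heap_.top().timestamp + delay_ <= now) {
            Sample s = heap_.top();
            heap_.pop();
            if (s.timestamp >= last_) {
                last_ = s.timestamp;
                out = s;
                return true;
            }
            // Too late: a sample with a newer timestamp was already released.
        }
        return false;
    }

private:
    // Orders the heap so that the smallest timestamp is on top.
    struct Later {
        bool operator()(const Sample& a, const Sample& b) const {
            return a.timestamp > b.timestamp;
        }
    };

    double delay_;
    double last_ = -1.0;
    std::priority_queue<Sample, std::vector<Sample>, Later> heap_;
};
```

With a delay of 0.5 and the arrival order from the question, (A, 2) is held back until time 2.5, so (B, 1.5) arriving in between is still released first instead of being dropped.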
There will always be a lag, and that lag will be determined by how long you are willing to wait for your slowest "fixed-ish rate" stream.
Suggestion:
1. keep the buffer
2. keep an array of bool flags with the meaning: "if position ix is true, the buffer contains at least one sample originating from stream ix"
3. sort/process as soon as all flags are true
Not foolproof (each buffer will be sorted, but from one buffer to another you may have timestamp inversion), but perhaps good enough?
Playing around with the count of "satisfied" flags to trigger the processing (at step 3) may be used to make the lag smaller, but with the risk of more inter-buffer timestamp inversions. In the extreme, accepting the processing with only one satisfied flag means "push a frame as soon as you receive it, timestamp sorting be damned".
I mentioned this to support my feeling that the balance between lag and timestamp inversions is inherent to your problem: except for absolutely equal framerates, there will be no perfect solution in which neither side is sacrificed.
Since a "solution" will be an act of balancing, any solution will require gathering/using extra information to help decisions (e.g. that "array of flags"). If what I suggested sounds silly for your case (may well be, the details you chose to share aren't too many), start thinking what metrics will be relevant for your targeted level of "quality of experience" and use additional data structures to help gathering/processing/using those metrics.
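The flag-based suggestion above might be sketched as follows. `Frame` and `FlaggedBuffer` are hypothetical names, streams are assumed to be numbered from 0, and the trigger is the strict variant in which every stream must have contributed at least one frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame type: originating stream index plus timestamp.
struct Frame {
    int stream;
    double timestamp;
};

class FlaggedBuffer {
public:
    explicit FlaggedBuffer(std::size_t streams) : seen_(streams, false) {
        assert(streams > 0);
    }

    // Adds a frame. Once every stream has contributed at least one frame, returns
    // the buffered frames sorted by timestamp and resets the buffer and flags.
    // Otherwise returns an empty vector.
    std::vector<Frame> add(const Frame& f) {
        assert(f.stream >= 0 && static_cast<std::size_t>(f.stream) < seen_.size());
        buffer_.push_back(f);
        seen_[f.stream] = true;
        if (!std::all_of(seen_.begin(), seen_.end(), [](bool b) { return b; }))
            return {};
        std::sort(buffer_.begin(), buffer_.end(),
                  [](const Frame& a, const Frame& b) { return a.timestamp < b.timestamp; });
        std::vector<Frame> batch;
        batch.swap(buffer_);                           // hand out the sorted frames
        std::fill(seen_.begin(), seen_.end(), false);  // start a fresh round
        return batch;
    }

private:
    std::vector<bool> seen_;     // seen_[ix]: buffer holds a frame from stream ix
    std::vector<Frame> buffer_;
};
```

Relaxing the all_of condition to "at least k flags set" would be the knob described above for trading lag against inter-buffer timestamp inversions.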

Algorithm for solving producer/consumer issue

I want to know how to solve the producer/consumer problem using 2 different consumers, and I also need to know how to solve it using strict alternation.
I made the following algorithm for 1 producer and 1 consumer
producer()
{
    while(true)
    {
        if i == N //full buffer
            turn = 1
        while turn <> 0
        {
            // nothing
        }
        produceitem(&item)//produce the item
        insertitem(item, buffer)//insert the item in the buffer
        turn = 1
        //process zone
    }
}
consumer()
{
    while(true)
    {
        if i == 0 //empty buffer
            turn = 0
        while turn <> 1
        {
            // nothing
        }
        consumeitem(&item)
        deleteitem(buffer)//erase the item from buffer
        turn = 0
        //process zone
    }
}
Using that kind of "pseudo-code", I want to know how to solve the same problem (if the code above is OK) with 2 consumers.
You can use the router pattern on small scale in both cases:
(source: enterpriseintegrationpatterns.com)
Basically, after the queue you place a special artificial consumer (a router, just one). If you have two competing consumers, just put each received message randomly in either outQueue1 or outQueue2.
In the case of strict alternation, the router remembers which queue was used last time and sends to the other one.
If you don't want to introduce an extra step, you need some sort of synchronization. In the first case both consumers compete for the same lock, which they obtain in random order. The latter case is more complicated and will require more advanced synchronization so that the consumers are woken up alternately.
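A minimal, single-threaded sketch of the strict-alternation router. `AlternatingRouter` and `queueFor` are hypothetical names, items are plain ints, and the synchronization that each output queue would need once real consumer threads are attached is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical router: hands each item to the next consumer queue in turn.
class AlternatingRouter {
public:
    explicit AlternatingRouter(std::size_t consumers) : out_(consumers) {
        assert(consumers > 0);
    }

    // Puts the item into the queue of the consumer whose turn it is.
    void route(int item) {
        out_[next_].push(item);
        next_ = (next_ + 1) % out_.size();  // remember whose turn comes next
    }

    // Output queue of the given consumer (0-based index).
    std::queue<int>& queueFor(std::size_t consumer) { return out_.at(consumer); }

private:
    std::vector<std::queue<int>> out_;
    std::size_t next_ = 0;
};
```

With two consumers, items 1, 2, 3, 4 end up as 1 and 3 in the first queue and 2 and 4 in the second. For the competing-consumers case, the modulo step could be replaced by a random choice of queue.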

Resources