Selective Repeat number of acknowledgements - algorithm

I've been trying to solve a question that uses the selective repeat protocol, but I can't seem to figure it out.
What is the maximum number of different acknowledgement messages that can be propagating?
Or, in the specific example: a window size of 13 with sequence numbers from 1 to 211. At time t, the last packet in the sender's window has sequence number 171. Assume that the medium does not reorder messages in either direction. What is the maximum number of ACK messages with different sequence numbers that are currently propagating back from the receiver to the sender at time t?
Thanks in advance :)

Related

Count number of points for a Librato metric

I'm trying to build a composite metric to know how many points are sent over a period for a specific metric.
The closest Stack Overflow answer to this is about counting the number of sources, and I failed to adapt it to do what I want (How can I count the total number of sources my metric has with Librato?).
The metric in question is a timing on a function execution that receives around 20k values at peak hour.
At first, I summed the series with a count aggregation, and the pattern I got was close to what I expected, but it always differed from our logs.
The composite I made looked like this:
sum(s("timing", "%", {function:"count"}))
Any ideas ?
Thanks
Well, Librato support told me the composite does what I want.
The differences from the logs were due to errors while sending metrics.

Algorithm to cluster / assign observed values

I am looking for an algorithm to solve the following problem:
I have n clients communicating values to a server. The values are basically an array of x probabilities (values between 0 and 1), e.g. 0, 0.5, 1, 0.7, 0.1. Every client communicates around 1000 such arrays to the server. Over time, these values change uniquely for each client.
I want to take on the role of an observer, watching the traffic between the clients and the server and capturing each communicated message, but without knowing which client sent it. However, I know that these messages change uniquely over time, so if I capture enough messages, I should be able to distinguish between the clients.
My question is: what class of algorithms/approaches is suitable for performing such a "categorization" or "identification", i.e. assigning a captured message to a certain client on the basis of previously captured messages?
I guess I will have to employ statistics. Do you know of algorithms or approaches that could deal with this problem?
Thanks
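One concrete family of approaches here is online clustering / nearest-neighbour tracking. Below is a minimal, illustrative Python sketch, not a definitive answer: each captured probability array is greedily assigned to the closest tracked profile, profiles are updated as running means, and the distance metric, threshold, and update rule are all assumptions made for illustration.

import math

def assign_messages(messages, n_clients, new_profile_threshold=0.5):
    # Greedily assign each captured probability array to the closest
    # tracked profile; start a new profile while fewer than n_clients
    # profiles exist and the best match is too far away.
    profiles, counts, labels = [], [], []
    for msg in messages:                      # msg: sequence of probabilities
        best, best_dist = None, float("inf")
        for idx, prof in enumerate(profiles):
            dist = math.dist(prof, msg)       # Euclidean distance
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is None or (best_dist > new_profile_threshold
                            and len(profiles) < n_clients):
            profiles.append(list(msg))        # presume a new client
            counts.append(1)
            labels.append(len(profiles) - 1)
        else:
            c = counts[best] + 1              # update the running mean profile
            profiles[best] = [(p * (c - 1) + v) / c
                              for p, v in zip(profiles[best], msg)]
            counts[best] = c
            labels.append(best)
    return labels                             # presumed client id per message

Since the values drift over time, a plain running mean is only a crude stand-in; tracking each profile with its own drift model (state-space / particle-filter style) would be the more principled version of the same idea.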

Find the count of a particular number in an infinite stream of numbers at a particular moment

I faced this problem in a recent interview:
You have a stream of incoming numbers in range 0 to 60000 and you have a function which will take a number from that range and return the count of occurrence of that number till that moment. Give a suitable Data structure/algorithm to implement this system.
My solution is:
Make an array of size 60001 whose entries point to bit vectors. Each bit vector holds the count for one number, and the incoming number is also used as the index into the array. The bit vectors grow dynamically as the counts get too big to hold.
So, if the numbers are coming at a rate of 100 numbers/sec, then in 1 million years the total number of values will be (100*3600*24)*365*1000000 ≈ 3.2*10^15. In the worst case, where all numbers in the stream are the same, the count takes ceil(log2(3.2*10^15)) = 52 bits; if the numbers are uniformly distributed, we will have (3.2*10^15) / 60001 ≈ 5.33*10^10 occurrences of each number, which requires 36 bits per number.
So, assuming 4-byte pointers, we need (60001 * 4)/1024 ≈ 234 KB of memory for the array; for the all-same-number case the single bit vector needs 52/8 = 6.5 bytes, so the total is still around 234 KB, and for the uniform case we need (60001 * 36 / 8)/1024 ≈ 263.7 KB for the bit vectors, totaling about 500 KB. So it is very much feasible to do this with an ordinary PC and its memory.
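For scale, here is a minimal Python sketch of that structure; Python's arbitrary-precision integers stand in for the growable bit vectors, and the function names are just illustrative:

# One counter per possible value in [0, 60000].
# Python ints grow as needed, playing the role of the growable bit vectors.
counts = [0] * 60001

def record(n):
    # Called for every number arriving on the stream.
    counts[n] += 1

def occurrences(n):
    # Return how many times n has been seen so far.
    return counts[n]

# Example usage:
for x in (5, 5, 17, 60000, 5):
    record(x)
print(occurrences(5))   # prints 3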
But the interviewer said that, as it is an infinite stream, it will eventually overflow, and gave me hints like: how could we do this if there were many PCs and we could pass messages between them, or think about the file system, etc. But I kept thinking that if this solution did not work, then the others would not either. Needless to say, I did not get the job.
How to do this problem with less memory? Can you think of an alternative approach (using a network of PCs, maybe)?
A formal model for the problem could be the following.
We want to know whether there exists a constant-space-bounded Turing machine such that, at any given time, it recognizes the language L of all pairs (number, number of occurrences so far). This means that all correct pairs will be accepted and all incorrect pairs will be rejected.
As a corollary of Theorem 3.13 in Hopcroft–Ullman, we know that every language recognized by a constant-space-bounded machine is regular.
It can be proven, using the pumping lemma for regular languages, that the language described above is not regular. So you cannot recognize it with a constant-space-bounded machine.
You can simply use index-based lookup with an array such as int arr[60001]. Whenever you get a number, say 5000, access arr[5000] directly and increment it; whenever you want to know how many times a particular number has occurred, read arr[num] and you have the count for that number. It is the simplest possible implementation, with constant time per operation.
Isn't this external sorting? Store the infinite stream in a file. Do a seek() (RandomAccessFile.seek() in Java) in the file to get to the appropriate timestamp; this is similar to binary search, since the data is sorted by timestamps. Once you get to the appropriate timestamp, the problem turns into counting a particular number within a very large set of numbers. Here, instead of doing a quicksort in memory, a counting sort can be done, since the range of numbers is limited.
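To make that concrete, here is a rough Python sketch under assumptions the answer does not spell out: the stream is logged as fixed-size little-endian (timestamp, value) records sorted by timestamp, and the record layout and function name are hypothetical. The binary search plays the role of the seek(); note that the final count still has to scan the whole prefix of the file.

import os
import struct

RECORD = struct.Struct("<QI")   # (timestamp, value) -> 12 bytes per record

def count_until(path, number, timestamp):
    # Count occurrences of 'number' among records with ts <= timestamp.
    with open(path, "rb") as f:
        n_records = os.fstat(f.fileno()).st_size // RECORD.size
        # Binary search for the first record with ts > timestamp.
        lo, hi = 0, n_records
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD.size)
            ts, _ = RECORD.unpack(f.read(RECORD.size))
            if ts <= timestamp:
                lo = mid + 1
            else:
                hi = mid
        # Count the number within the prefix (the part a counting-sort
        # style pass over the limited value range would handle).
        f.seek(0)
        count = 0
        for _ in range(lo):
            _, value = RECORD.unpack(f.read(RECORD.size))
            if value == number:
                count += 1
        return count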

Algorithm/Heuristic for grouping chat message histories by 'conversation'/implicit sessions from time stamps?

The problem: I have a series of chat messages between two users, with timestamps. I could present, say, an entire day's worth of chat messages at once. During the day, however, there were multiple discrete conversations/sessions, and it would be more useful to the user to see these divided up rather than the whole day as one continuous stream.
Is there an algorithm or heuristic that can deduce implicit session/conversation starts and breaks from the timestamps, other than an arbitrary rule like 'if the gap is more than x minutes, it's a separate session'? And if that really is the only option, how is the interval determined? In any case, I'd like to avoid it.
For example, say fifty messages are sent between 2:00 and 3:00, then there is a break, and then twenty messages are sent between 4:00 and 5:00. A break would be inserted between them, but how would it be determined?
I'm sure that there is already literature on this subject, but I just don't know what to search for.
I was playing around with things like edge detection algorithms and gradient-based approaches for a while.
EDIT (Better idea):
You can view each message as being of two types:
A continuation of a previous conversation
A brand new conversation
You can model these two types of messages as independent Poisson processes, where the time difference between adjacent messages follows an exponential distribution.
You can then empirically determine the exponential parameters for these two types of messages by hand (wouldn't be too hard to do given some initial data). Now you have a model for these two events.
Finally when a new message comes along, you can calculate the probability of the message being of type 1 or type 2. If type 2, then you have a new conversation.
Clarification:
The probability of the message being a new conversation, given that the delay is some time T.
P(new conversation | delay=T) = P(new conversation AND delay=T)/P(delay=T)
Using Bayes' Rule:
= P(delay=T | new conversation)*P(new conversation)/P(delay=T)
The same calculation goes for P(old conversation | delay=T).
P(delay=T | new conversation) comes from the model. P(new conversation) is easily calculable from the data used to generate your model. P(delay=T) you don't need to calculate at all since all you want to do is compare the two probabilities.
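As a concrete illustration of that comparison, here is a small Python sketch. The exponential rates and the prior are placeholders that you would fit from your own data, as described above:

import math

# Illustrative parameters -- fit these from your own chat history.
RATE_CONTINUATION = 1 / 60.0     # mean gap of 60 s within a conversation
RATE_NEW = 1 / 7200.0            # mean gap of 2 h between conversations
P_NEW = 0.05                     # prior probability that a message starts a new conversation

def exp_density(rate, t):
    return rate * math.exp(-rate * t)

def is_new_conversation(delay_seconds):
    # Compare P(delay | new) * P(new) with P(delay | continuation) * P(continuation);
    # the common denominator P(delay) cancels, as noted above.
    score_new = exp_density(RATE_NEW, delay_seconds) * P_NEW
    score_cont = exp_density(RATE_CONTINUATION, delay_seconds) * (1 - P_NEW)
    return score_new > score_cont

print(is_new_conversation(30))      # short gap -> likely a continuation
print(is_new_conversation(5400))    # 1.5 h gap -> likely a new conversation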
The difference in timestamps between adjacent messages depends on the type of conversation and the people participating. Thus you'll want an algorithm that takes into account local characteristics, as opposed to a global threshold parameter.
My proposition would be as follows:
Get the time difference between the last 10 adjacent messages.
Compute the mean (or median)
If the delay until the next message is more than 30 times the mean, it's a new conversation.
Of course, I came up with these numbers on the spot. They would have to be tuned to fit your purpose.
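A quick Python sketch of that heuristic (the window size of 10 and the factor of 30 are the placeholder numbers from the steps above):

from collections import deque

WINDOW = 10   # how many recent gaps to average over
FACTOR = 30   # a gap this many times the local mean starts a new conversation

def split_into_conversations(timestamps):
    # timestamps: message times in seconds, sorted ascending.
    conversations = [[timestamps[0]]] if timestamps else []
    recent_gaps = deque(maxlen=WINDOW)
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if recent_gaps and gap > FACTOR * (sum(recent_gaps) / len(recent_gaps)):
            conversations.append([cur])     # local gap spike -> new conversation
        else:
            conversations[-1].append(cur)
        recent_gaps.append(gap)
    return conversations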

Sorting Items by a Cyclical Sequence Number

I am developing an algorithm to reorder packets in a transmission. Each packet has an associated sequence number in [0, 256). The first packet's sequence number can take on any one of those values, after which the next packet takes the next value, and the next packet the value after that, and so forth (rolling over after 255).
The sequence numbers of the packets, in the correct order, would appear as follows, where "n" is the first packet's sequence number:
n, n+1, n+2, ..., 254, 255, 0, 1, 2, ..., 254, 255, 0, 1, 2, ..., 254, 255, 0, 1, ...
Each packet is given a timestamp when it arrives at its destination, and they all arrive approximately in order. (I don't have an exact figure, but given a list of packets sorted by arrival timestamp, it is safe to say that a packet will never be more than five spots away from its position in the list indicated by its sequence number.)
I feel that I cannot have been the first person to deal with a problem like this, given the prevalence of telecommunications and its historical importance to the development of computer science. My question, then:
Is there a well-known algorithm to reorder an approximately-ordered sequence, such as the one described above, given a cyclically-changing key?
Is there a variation of this algorithm that is tolerant of large chunks of missing items? Let us assume that these chunks can be of any length. I am specifically worried about chunks of 256 or more missing items.
I have a few ideas for algorithms for the first, but not for the second. Before I invest the man-hours to verify that my algorithms are correct, however, I wanted to make sure that somebody at Bell Labs (or anywhere else) hadn't already done this better thirty years ago.
I don't know if this solution is actually used anywhere, but here is what I would try (assuming no missing packets, a maximum "shuffling" of five positions, and a maximum sequence number of 255), sketched here in Python:
import heapq

SEQ_MOD = 256    # sequence numbers are in [0, 256)

def reorder(packets, first_seq):
    # 'packets' yields (seq, payload) pairs in arrival order; 'first_seq'
    # is the sequence number of the first packet of the transmission.
    n = 0        # how many packets have been handled so far
    heap = []    # min-heap of (absolute position in the stream, payload)
    for seq, payload in packets:
        expected_seq = (first_seq + n) % SEQ_MOD
        # Unwrap the cyclic sequence number into an absolute position >= n.
        pos = n + ((seq - expected_seq) % SEQ_MOD)
        heapq.heappush(heap, (pos, payload))
        # Hand over every packet that is now next in line.
        while heap and heap[0][0] == n:
            _, ready = heapq.heappop(heap)
            yield ready          # do something with it
            n += 1
Basically, the priority queue stores, for each packet still waiting to be handled, how far ahead of the last handled packet it sits. The queue stays very small, because packets are handled as soon as they can be, as they come in. Keeping the keys up to date is also cheap: whenever a packet is handled, every key would shift by the same amount, which never changes the heap order, so the sketch simply keeps the keys absolute and compares them against the running counter n.
I am not sure how you could adapt this to missing packets. Most likely by using some timeout, or a maximum offset, after which the waiting packets are declared to be the "next" ones and the heap is updated accordingly.
I do not think this problem is solvable at all, however, if you can miss more than 256 packets. Take the subsequence
127, 130, 128, 129
There could be several causes for this:
1) Packets 128 and 129 arrived out of order and should be reordered
2) Packets 128 and 129 were lost, then 253 more packets were lost, so the received order is actually correct
3) A mixture of 1 and 2
Interesting problem!
My solution would be to sort the packets by time of arrival and then locally sort a window of elements (say 10) circularly according to their sequence numbers. You can refine this in many ways: if the difference between two consecutive sequence numbers (taken in arrival order) is greater than a certain threshold, you might put a barrier between them (i.e. you cannot sort across barriers). Also, if the time difference between packets (again in arrival order) is greater than some threshold, you might want to put a barrier there too (this should probably take care of problem 2).
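A rough Python sketch of that idea, simplifying the sliding 10-element window to sorting whole segments between barriers; the gap thresholds are invented for illustration:

SEQ_MOD = 256

def signed_gap(a, b):
    # Circular difference b - a, mapped into [-128, 127].
    return ((b - a + SEQ_MOD // 2) % SEQ_MOD) - SEQ_MOD // 2

def resequence(packets, seq_gap_limit=10, time_gap_limit=1.0):
    # packets: list of (arrival_time, seq, payload) sorted by arrival_time.
    segments, current = [], []
    for pkt in packets:
        if current:
            prev = current[-1]
            if (abs(signed_gap(prev[1], pkt[1])) > seq_gap_limit
                    or pkt[0] - prev[0] > time_gap_limit):
                segments.append(current)     # barrier: do not sort across it
                current = []
        current.append(pkt)
    if current:
        segments.append(current)
    ordered = []
    for seg in segments:
        base = seg[0][1]
        # Sort within the segment by sequence number, unwrapped around 'base'.
        seg.sort(key=lambda p: signed_gap(base, p[1]))
        ordered.extend(seg)
    return ordered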
Use a priority queue.
After receiving each packet:
put it in the queue.
repeatedly remove the top element of the queue as long as it is the one you're waiting for.
For the second question:
In general, no, there's no way to solve it.
If the packet arrival has some periodicity (e.g. a packet is expected every 20 ms), then you can easily detect such a gap, clean up the queue, and after receiving 5 packets you'll know how to continue processing...
