Queue with memory-limited size - algorithm

I need a queue with the following properties and supported operations:
Push an element at the beginning, automatically popping elements from the end until the size of the queue is no greater than a predefined limit.
Take N elements from the beginning lazily, without traversing the whole queue.
Limit by the total size of all elements, e.g. 2 MB.
I know I can implement this myself as a wrapper around Data.Sequence or something else (along the lines of the implementations mentioned). I also found this blog post from Well Typed. I just wonder whether this is already implemented somewhere.
If there is no standard implementation with the desired behaviour, it would be nice to hear recommendations on which standard data structure to use to implement such a queue.
There is the lrucache library, which has almost everything I want, except that it limits its size by the number of elements in the queue.
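Since the question is about Haskell, a real implementation would presumably wrap Data.Sequence as suggested; purely for illustration, here is a language-neutral sketch (in Java) of the structure being described. The class name and the sizeOf parameter are hypothetical, not from any existing library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical name; a sketch of the byte-bounded queue described above.
class SizeLimitedQueue<E> {
    private final ArrayDeque<E> deque = new ArrayDeque<>();
    private final long maxTotalBytes;          // e.g. 2 * 1024 * 1024 for 2 MB
    private final ToLongFunction<E> sizeOf;    // how many bytes an element occupies
    private long totalBytes = 0;

    SizeLimitedQueue(long maxTotalBytes, ToLongFunction<E> sizeOf) {
        this.maxTotalBytes = maxTotalBytes;
        this.sizeOf = sizeOf;
    }

    /** Push at the beginning, evicting from the end until under the byte limit. */
    void push(E e) {
        deque.addFirst(e);
        totalBytes += sizeOf.applyAsLong(e);
        // Keep at least the newest element, even if it alone exceeds the limit.
        while (totalBytes > maxTotalBytes && deque.size() > 1) {
            totalBytes -= sizeOf.applyAsLong(deque.removeLast());
        }
    }

    /** Take up to n elements from the beginning without touching the rest. */
    List<E> take(int n) {
        List<E> out = new ArrayList<>(Math.min(n, deque.size()));
        Iterator<E> it = deque.iterator();     // iterates from the head
        while (it.hasNext() && out.size() < n) {
            out.add(it.next());
        }
        return out;
    }
}
```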

Related

What is the name of this kind of cache/data structure?

I need a fixed-size cache of objects that keeps track of how many times each object was requested. When it is full and a new object is added, the object with the lowest usage score gets removed.
So this is different from an LRU cache of size N in that if some object is heavily requested, then even adding N new objects won't push it out of the cache.
Some kind of mix of a cache and a priority queue. Is there a name for that?
Thanks!
Without a time element, this kind of cache clogs up with things that were used a lot in the past, but aren't used currently. Replacement becomes impossible, because everything in the cache has been used more than once, so you won't evict anything in favor of a new item.
You could write some code that degrades the value of the count over time (i.e. take into account the time since last used), but doing so is just a really complicated way of simulating an LRU cache. I experimented with it at one point, but found that it didn't perform any better than the simple LRU cache. At least not in my application.
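For the record, the structure described in the question is usually called an LFU (least-frequently-used) cache. Here is a minimal sketch of count-based eviction, exhibiting exactly the clogging behaviour discussed above; the O(n) eviction scan is chosen for clarity, not speed.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal LFU cache sketch: evicts the entry with the lowest request count.
 *  Note: without some form of count decay, long-lived heavily-used entries
 *  can never be displaced, as the answer above points out. */
class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> counts = new HashMap<>();

    LfuCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        V v = values.get(key);
        if (v != null) counts.merge(key, 1L, Long::sum);  // bump usage score
        return v;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            // Evict the least-frequently-used entry (O(n) scan, for clarity).
            K victim = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<K, Long> e : counts.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
            }
            values.remove(victim);
            counts.remove(victim);
        }
        values.put(key, value);
        counts.merge(key, 1L, Long::sum);
    }
}
```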

What is the best buffer management drop policy?

I am working on a project that contains a fixed-size FIFO (First In, First Out) buffer: clients send their requests to the buffer and the system handles them. When the buffer is full, I have to apply one of the following overload (drop) policies: DRPH: drop one request from the head of the buffer. DRPT: drop one request from the tail of the buffer. DRPR: drop 25% of the elements in the buffer randomly. BLCK: block new connections until space is available in the buffer.
I built a simulation to measure performance using httperf, sending many requests per second and measuring the response time, but I got unstable response-time values, especially when the number of requests is large, so I cannot determine the best drop policy by simulation. I repeated the simulation many times and got different values each time.
The question is: theoretically, what is the best buffer management drop policy among those mentioned?
It definitely depends on your data and the order in which it is needed. But usually, with a FIFO, the data at the end of the buffer is the oldest and so the least likely to be required again. So DRPT is probably the best solution, but only if you can afford to lose data (e.g. because it can be re-inserted later). If that is not the case, you have to block connections until buffer space is available again (BLCK).
Another thing: I would strive for a dynamic buffer. Start with a reasonable default size and see how quickly it fills up. Above a certain rate, increase the buffer size (and below a certain threshold you can lower it again), up to a certain maximum.
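To make the four policies concrete, here is a minimal sketch of a bounded FIFO applying each of them. The class and enum names are hypothetical, and the DRPR pass is just one of many ways to drop roughly 25% at random.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Random;

// Sketch only; not a production buffer.
class BoundedBuffer<T> {
    enum Policy { DRPH, DRPT, DRPR, BLCK }

    private final ArrayDeque<T> buf = new ArrayDeque<>();
    private final int capacity;
    private final Policy policy;
    private final Random rng = new Random();

    BoundedBuffer(int capacity, Policy policy) {
        this.capacity = capacity;
        this.policy = policy;
    }

    /** Called by producers; the head of the deque is the oldest request. */
    synchronized void offer(T request) throws InterruptedException {
        while (buf.size() >= capacity) {
            switch (policy) {
                case DRPH: buf.pollFirst(); break;    // drop the oldest request
                case DRPT: buf.pollLast();  break;    // drop the newest request
                case DRPR: {                          // drop ~25% at random
                    int target = Math.max(1, buf.size() / 4);
                    int dropped = 0;
                    Iterator<T> it = buf.iterator();
                    while (it.hasNext() && dropped < target) {
                        it.next();
                        if (rng.nextInt(4) == 0) { it.remove(); dropped++; }
                    }
                    if (dropped == 0) buf.pollFirst(); // guarantee progress
                    break;
                }
                case BLCK: wait(); break;             // block until space frees up
            }
        }
        buf.addLast(request);
    }

    /** Called by the system handling requests. */
    synchronized T poll() {
        T t = buf.pollFirst();
        notifyAll();                                  // wake producers blocked in BLCK
        return t;
    }
}
```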

Efficiently implementing the Birman-Schiper-Stephenson (BSS) protocol's delay queue

I am using the Birman-Schiper-Stephenson protocol in a distributed system, under the current assumption that the peer set of any node doesn't change. As the protocol dictates, messages that arrive at a node out of causal order have to be put in a 'delay queue'. My problem is with the organisation of the delay queue, where we must impose some kind of order on the messages. After deciding the order, we will have to build a 'wake-up' protocol that efficiently searches the queue whenever the node's current timestamp changes, to find out whether one of the delayed messages can be 'woken up' and accepted.
I was thinking of segregating the delayed messages into bins based on where their vector timestamps differ from this node's timestamp, but the number of bins can be very large and maintaining them won't be efficient.
Please suggest some designs for such a queue.
Sorry about the delay -- didn't see your question until now. Anyhow, if you look at Isis2.codeplex.com you'll see that in Isis2, I have a causalsend implementation that employs the same vector timestamp scheme we described in the BSS paper. What I do is to keep my messages in a partial order, sorted by VT, and then when a delivery occurs I can look at the delayed queue and deliver off the front of the queue until I find something that isn't deliverable. Everything behind it will be undeliverable too.
But in fact there is a deeper insight here: you actually never want to allow the queue of delayed messages to get very long. If the queue gets longer than a few messages (say, 50 or 100) you run into the problem that the guy with the queue could be holding quite a few bytes of data and may start paging or otherwise running slowly. So it becomes a self-perpetuating cycle in which because he has a queue, he is very likely to be dropping messages and hence enqueuing more and more. Plus in any case from his point of view, the urgent thing is to recover that missed message that caused the others to be out of order.
What this adds up to is that you need a flow control scheme in which the amount of pending asynchronous stuff is kept small. But once you know the queue is small, searching every single element won't be very costly! So this deeper perspective says flow control is needed no matter what, and then because of flow control (if you have a flow control scheme that works) the queue is small, and because the queue is small, the search won't be costly!
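To make the answer concrete, here is a minimal sketch of the delayed queue with the BSS delivery rule. The types are my own simplified rendering, not code from Isis2, and it assumes flow control keeps the queue short enough that a full rescan on every delivery is cheap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;

// Sketch of a BSS delay queue; names and types are illustrative.
class BssDelayQueue {
    static class Message {
        final int sender;   // index of the sending node
        final int[] vt;     // vector timestamp carried by the message
        Message(int sender, int[] vt) { this.sender = sender; this.vt = vt; }
    }

    private final int[] localVt;                         // this node's vector clock
    private final Deque<Message> delayed = new ArrayDeque<>();

    BssDelayQueue(int nPeers) { localVt = new int[nPeers]; }

    /** BSS rule: deliverable iff it is the next message from its sender and
     *  carries no knowledge about anyone else that we lack. */
    private boolean deliverable(Message m) {
        for (int k = 0; k < localVt.length; k++) {
            int bound = (k == m.sender) ? localVt[k] + 1 : localVt[k];
            if (m.vt[k] > bound) return false;
        }
        return true;
    }

    void onReceive(Message m, Consumer<Message> deliver) {
        delayed.addLast(m);
        boolean progress = true;
        while (progress) {          // "wake-up": each delivery may unblock more
            progress = false;
            for (Iterator<Message> it = delayed.iterator(); it.hasNext(); ) {
                Message d = it.next();
                if (deliverable(d)) {
                    it.remove();
                    localVt[d.sender]++;     // advance the local clock
                    deliver.accept(d);
                    progress = true;
                    break;                   // clock changed; rescan from the start
                }
            }
        }
    }
}
```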

How to store a few million cache entries and then track the 20 oldest

I got an interview question saying I need to store a few million cache entries and keep track of the 20 oldest, and as soon as the cache collection grows past its threshold, replace the 20 oldest with the next set of oldest entries.
I answered that I would keep a hashmap for it. The question then continued: what if we want fast access to any element in the hashmap, how would we do that? I said it's a map, so access won't be time-consuming, but the interviewer was not satisfied. So what would be the ideal approach for such scenarios?
A queue is well-suited to finding and removing the oldest members.
A queue implemented as a doubly linked list has O(1) insertion and deletion at both ends.
A priority queue lends itself to giving different weights to different items in the queue (e.g. some queue elements may be more expensive to re-create than others).
You can use a hash map to hold the actual elements and find them quickly based on the hash key, and a queue of the hash keys to track the age of cache elements.
By using a doubly linked list for the queue and also maintaining a hash map of the elements, you should be able to make a cache that supports a maximum size (or even an LRU cache). This should result in references to objects being stored multiple times, not the objects themselves being stored twice; be sure to check for this if you implement it (a simple way to avoid duplication is to queue just the hash key).
When checking for overflow, you just pop the last item off the queue and then remove it from the hash map.
When accessing an item, you can use the hash map to find the cached item. Then, if you are implementing an LRU cache, you just remove it from the queue and add it back to the beginning.
By using this structure, insert, update, read, and delete are all O(1).
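In Java specifically, this exact pairing of a hash map and a doubly linked list already exists as java.util.LinkedHashMap, so a minimal LRU sketch of the structure described above is only a few lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** The hash-map-plus-doubly-linked-list structure described above, using
 *  java.util.LinkedHashMap, which maintains exactly that pairing internally. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);       // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict the eldest when over capacity
    }
}
```

With accessOrder set to true, every get() moves the entry to the tail, so the eldest entry is the least recently used one; evicting the "20 oldest" in a batch just means iterating the head of the map twenty times.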
The follow-on question to expect is for the interviewer to ask for items to have a time-to-live (TTL) that varies per cached item. For this you need another queue that maintains items ordered by expiry time. The problem is that inserts into it become O(n), since you have to scan the TTL queue to find the right spot, so you have to decide whether the memory cost of storing the TTL queue as a tree is worthwhile (yielding O(log n) insert time). Alternatively, you can implement the TTL queue as buckets, one per ~1-minute interval or similar: inserts stay ~O(1), and you only slightly degrade the performance of the expiration background process (and it is a background process).
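A minimal sketch of the bucket variant, assuming ~1-minute buckets and a periodic background expirer (all names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** Bucketed TTL queue: expiry times are rounded to ~1-minute buckets, so
 *  inserts stay O(1) and the background expirer only pays a small cost
 *  draining elapsed buckets. Entries may live up to one bucket too long. */
class TtlBuckets<K> {
    private static final long BUCKET_MILLIS = 60_000;   // ~1-minute buckets
    private final Map<Long, Queue<K>> buckets = new HashMap<>();

    /** O(1): append the key to the bucket covering its expiry time. */
    void schedule(K key, long expiresAtMillis) {
        long bucket = expiresAtMillis / BUCKET_MILLIS;
        buckets.computeIfAbsent(bucket, b -> new ArrayDeque<>()).add(key);
    }

    /** Called periodically by the background process. */
    void expire(long nowMillis, Consumer<K> evict) {
        long current = nowMillis / BUCKET_MILLIS;
        buckets.keySet().removeIf(b -> {
            if (b < current) {
                buckets.get(b).forEach(evict);   // evict everything in the bucket
                return true;                      // and drop the bucket itself
            }
            return false;
        });
    }
}
```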

Is it possible to declare a maximum queue size with AMQP?

As the title says — is it possible to declare a maximum queue size and broker behaviour when this maximum size is reached? Or is this a broker-specific option?
I ask because I'm trying to learn about AMQP, not because I have this specific problem with any specific broker… But broker-specific answers would still be insightful.
AFAIK you can't declare a maximum queue size with RabbitMQ.
Also, there's no such setting in the AMQP spec:
http://www.rabbitmq.com/amqp-0-9-1-quickref.html#queue.declare
Depending on why you're asking, you might not actually need a maximum queue size. Since version 2.0, RabbitMQ will seamlessly persist large queues to disk instead of storing all the messages in RAM. So if your concern is the broker crashing because it exhausts its resources, this actually isn't much of a problem in most circumstances - assuming you aren't strapped for hard disk space.
In general this persistence actually has very little performance impact, because by definition the only "hot" parts of the queue are the head and tail, which stay in RAM; the majority of the backlog is "cold" so it makes little difference that it's sitting on disk instead.
We've recently discovered that at high throughput it isn't quite that simple - under some circumstances the throughput can deteriorate as the queue grows, which can lead to unbounded queue growth. But when that happens is a function of CPU, and we went for quite some time without hitting it.
You can read about RabbitMQ's maximum queue length implementation here: http://www.rabbitmq.com/maxlength.html
It does not block the addition of incoming messages; instead, it drops messages from the head of the queue.
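For illustration, declaring a length-bounded queue with the RabbitMQ Java client uses the x-max-length argument described on that page (x-max-length-bytes caps total message-body size instead); the queue name and limit here are made up:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.HashMap;
import java.util.Map;

public class BoundedQueueDeclare {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Cap the queue at 10,000 messages; when full, RabbitMQ drops
        // messages from the head (the oldest), as described above.
        Map<String, Object> queueArgs = new HashMap<>();
        queueArgs.put("x-max-length", 10_000);
        // x-max-length-bytes caps the total body size instead:
        // queueArgs.put("x-max-length-bytes", 2 * 1024 * 1024);

        channel.queueDeclare("bounded-queue", true, false, false, queueArgs);

        channel.close();
        conn.close();
    }
}
```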
You should definitely read about Flow control here:
http://www.rabbitmq.com/memory.html
With Qpid, yes.
You can configure a maximum queue size and a policy for when the maximum is reached: ring (overwrite the oldest messages), ignore new messages, or break the connection.
You also have LVQ (last-value) queues, which are very configurable.
There are some things that you can't do with brokers, but you can do in your app. For instance, there are two AMQP methods, basic.get and queue.declare, which return the number of messages in the queue. You can use this to periodically get a count of outstanding messages and take action (like start new consumer processes) if the message count gets too high.
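A small sketch of that approach with the RabbitMQ Java client, using the passive form of queue.declare so the check doesn't modify the queue; the threshold and the reaction are placeholders:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;

/** Poll the queue depth via the queue.declare-ok reply and react above a
 *  chosen threshold. The threshold and the action are application-specific. */
class QueueDepthMonitor {
    private static final int HIGH_WATER_MARK = 10_000;  // illustrative threshold

    static void checkDepth(Channel channel, String queueName) throws Exception {
        // Passive declare: fails if the queue doesn't exist, otherwise
        // returns its stats without changing anything.
        AMQP.Queue.DeclareOk ok = channel.queueDeclarePassive(queueName);
        if (ok.getMessageCount() > HIGH_WATER_MARK) {
            // e.g. start new consumer processes, shed load, or alert
            System.out.println("Backlog high: " + ok.getMessageCount() + " messages");
        }
    }
}
```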
