I'd like to know what data structure is suitable for implementing:
1. storing the recently visited web addresses on a web browser?
2. the processes to be scheduled on the CPU of a computer?
3. the undo mechanism in a text editor like Notepad?
storing the recently visited web addresses on a web browser?
If you want to store the k most recent addresses, you can use a queue.
If the queue holds fewer than k entries, just add the new address.
If it is already of size k, remove the element that was inserted first (the "oldest" entry), then insert the new one.
You might want to combine it with a set (or a map that maps to the queue's entries) to make sure an address doesn't occupy multiple slots in your queue.
If you never need to delete entries (and the number of "visited" elements is unbounded), you can use a set on its own.
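As a sketch of the queue-plus-set approach described above (the class and method names are my own, not from the question):

```python
from collections import deque

class RecentAddresses:
    """Keep the k most recently visited addresses, newest first, no duplicates."""

    def __init__(self, k):
        self.k = k
        self.queue = deque()   # newest at the left, oldest at the right
        self.seen = set()      # mirrors the queue for O(1) membership tests

    def visit(self, url):
        if url in self.seen:               # already stored: move it to the front
            self.queue.remove(url)         # O(k), acceptable for small k
        elif len(self.queue) == self.k:    # full: evict the oldest entry
            self.seen.discard(self.queue.pop())
        self.queue.appendleft(url)
        self.seen.add(url)

    def recent(self):
        return list(self.queue)
```

The set is what guarantees an address never fills two slots; without it, each duplicate check would be an O(k) scan of the queue.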
the processes to be scheduled on the CPU of a computer?
There are many options for that, but some simple ones are using a queue, or a priority queue (heap).
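A minimal sketch of the priority-queue option, using Python's binary-heap module; the process names and priorities are invented for illustration:

```python
import heapq

# (priority, arrival_order, name): lower priority number runs first;
# arrival_order breaks ties FIFO among equal priorities
ready = []
counter = 0
for name, prio in [("editor", 2), ("kernel_log", 0), ("compiler", 1)]:
    heapq.heappush(ready, (prio, counter, name))
    counter += 1

# the scheduler repeatedly pops the highest-priority ready process
run_order = [heapq.heappop(ready)[2] for _ in range(len(ready))]
# run_order == ["kernel_log", "compiler", "editor"]
```

A plain queue gives round-robin-style FIFO scheduling instead; the heap is what lets some processes jump ahead of others.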
the undo mechanism in a text editor like Notepad?
A stack. Each "do" is a push, and to "Undo", you pop the last element and revert its action.
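A rough sketch of the stack-based undo, using plain snapshots of the document state (snapshotting is just one of several ways to "revert an action"):

```python
undo_stack = []

def do_action(doc, action):
    """Apply an action and push the previous state for later undo."""
    old = doc["text"]
    doc["text"] = action(doc["text"])
    undo_stack.append(old)   # each "do" is a push

def undo(doc):
    """Pop the last snapshot and restore it; no-op when nothing to undo."""
    if undo_stack:
        doc["text"] = undo_stack.pop()

doc = {"text": "hello"}
do_action(doc, lambda t: t + " world")
do_action(doc, str.upper)
undo(doc)   # doc["text"] is back to "hello world"
```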
I need a queue with the following properties and supported operations:
Push an element at the beginning, automatically popping elements from the end until the size of the queue is no greater than a predefined limit.
Take the first N elements lazily, without traversing the whole queue.
Limit by the total size of all elements, e.g. 2 MB.
I know I can implement this myself as a wrapper around Data.Sequence, or something else (per the implementations mentioned). I also found this blog post from Well-Typed. I just wonder: is this already implemented somewhere?
If there is no standard implementation with the desired behaviour, it would be nice to hear recommendations on which standard data structure to use to implement such a queue.
There is the lrucache library, which has almost everything I want, except that it limits its size by the number of elements in the queue.
In my OS X app, I'm using an NSTreeController to keep track of any changes to a document. The tree controller enables versioning by acting as source control, which means that documents can create their own branches, etc.
It works fine so far. The problem is that every change to the document adds an NSTreeNode to the tree. Which means that after a few hours of use, the tree has accumulated many nodes, which means tons of objects in memory.
Is there a way I can create an NSTreeController with a capacity (like you'd give to an NSArray) which will automatically trim child nodes? If not, what's the best way to manually flush nodes at an appropriate interval so memory usage doesn't bloat?
I am using the Birman-Schiper-Stephenson protocol for a distributed system, under the current assumption that the peer set of any node doesn't change. As the protocol dictates, messages that arrive at a node out of causal order have to be put in a 'delay queue'. My problem is with the organisation of the delay queue, where we must impose some kind of order on the messages. Having decided on the order, we then have to implement a 'wake-up' protocol that efficiently searches the queue, after the node's timestamp is modified, to find out whether one of the delayed messages can be 'woken up' and accepted.
I was thinking of segregating the delayed messages into bins based on the points of difference of their vector-timestamps with the timestamp of this node. But the number of bins can be very large and maintaining them won't be efficient.
Please suggest some designs for such a queue(s).
Sorry about the delay -- didn't see your question until now. Anyhow, if you look at Isis2.codeplex.com you'll see that in Isis2, I have a causalsend implementation that employs the same vector timestamp scheme we described in the BSS paper. What I do is to keep my messages in a partial order, sorted by VT, and then when a delivery occurs I can look at the delayed queue and deliver off the front of the queue until I find something that isn't deliverable. Everything behind it will be undeliverable too.
But in fact there is a deeper insight here: you actually never want to allow the queue of delayed messages to get very long. If the queue gets longer than a few messages (say, 50 or 100) you run into the problem that the guy with the queue could be holding quite a few bytes of data and may start paging or otherwise running slowly. So it becomes a self-perpetuating cycle in which because he has a queue, he is very likely to be dropping messages and hence enqueuing more and more. Plus in any case from his point of view, the urgent thing is to recover that missed message that caused the others to be out of order.
What this adds up to is that you need a flow control scheme in which the amount of pending asynchronous stuff is kept small. But once you know the queue is small, searching every single element won't be very costly! So this deeper perspective says flow control is needed no matter what, and then because of flow control (if you have a flow control scheme that works) the queue is small, and because the queue is small, the search won't be costly!
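The "deliver off the front of the queue until something isn't deliverable" loop can be sketched like this, assuming the standard BSS deliverability condition on vector timestamps (the function names are mine, not from Isis2):

```python
def deliverable(msg_vt, sender, local_vt):
    """BSS condition: the message is the next one expected from its sender
    (msg_vt[sender] == local_vt[sender] + 1) and we have already seen
    everything it causally depends on (msg_vt[k] <= local_vt[k] elsewhere)."""
    return all(
        msg_vt[k] == local_vt[k] + 1 if k == sender else msg_vt[k] <= local_vt[k]
        for k in range(len(local_vt))
    )

def drain(delayed, local_vt):
    """delayed is kept sorted by vector timestamp; deliver from the front
    until the head is no longer deliverable. Everything behind an
    undeliverable head is undeliverable too, so we can stop there."""
    delivered = []
    while delayed and deliverable(delayed[0][1], delayed[0][0], local_vt):
        sender, vt = delayed.pop(0)
        local_vt[sender] += 1          # advance our clock for that sender
        delivered.append((sender, vt))
    return delivered
```

With flow control keeping the queue to a few dozen entries, even the pop-from-the-front list here is cheap enough.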
I got an interview question saying I need to store a few million cache entries, keep track of the 20 oldest, and, as soon as the cache collection exceeds its threshold, replace the 20 oldest entries with the next set of oldest ones.
I answered that I'd keep a hashmap for it. The question then escalated: what if we want fast access to any element of the hashmap, how would we do that? I said that since it's a map, access won't be time-consuming, but the interviewer was not satisfied. So what would be the ideal approach for such scenarios?
A queue is well-suited to finding and removing the oldest members.
A queue implemented as a doubly linked list has O(1) insertion and deletion at both ends.
A priority queue lends itself to giving different weights to different items in the queue (e.g. some queue elements may be more expensive to re-create than others).
You can use a hash map to hold the actual elements and find them quickly based on the hash key, and a queue of the hash keys to track age of cache elements.
By using a doubly linked list for the queue and also maintaining a hash map of the elements, you should be able to build a cache that supports a maximum size (or even an LRU cache). Note that this stores references to each object in multiple places rather than copies of the object itself; be sure to check that you aren't duplicating the objects if you implement this (a simple way to avoid it is to queue only the hash key).
When checking for overflow, you just pop the oldest item off the queue and then remove it from the hash map.
When accessing an item, you use the hash map to find the cached item. If you are implementing an LRU cache, you then remove it from the queue and add it back to the front.
By using this structure Insert, Update, Read, Delete are all going to be O(1).
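A compact sketch of this design in Python; `OrderedDict` internally combines a hash map with a doubly linked list, which is exactly the pairing described above:

```python
from collections import OrderedDict

class LRUCache:
    """Hash map + doubly linked list (OrderedDict bundles both):
    O(1) get/put, evicting the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # O(1): unlink and relink at the tail
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the oldest (head of list)
```

In a language without such a built-in, you would wire the map and the linked list together by hand, with each map value holding a pointer to its list node.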
The follow-on question to expect is for the interviewer to ask for items to have a time-to-live (TTL) that varies per cached item. For this you need another queue that keeps items ordered by expiry time; the problem is that inserts now become O(n), since you have to scan the TTL queue to find the right spot. You have to decide whether the memory overhead of storing the TTL queue as a balanced tree is worthwhile (yielding O(log n) insert time). Alternatively, you can implement your TTL queue as buckets covering ~1-minute intervals or similar: inserts stay ~O(1), and you only slightly degrade the performance of your expiration background process, not greatly (and it's a background process).
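A sketch of the bucketed TTL idea, assuming one bucket per minute and a periodic background sweep (all names here are illustrative, not from any library):

```python
import time
from collections import defaultdict

BUCKET_SECONDS = 60   # one bucket per ~1-minute interval, as suggested above

expiry_buckets = defaultdict(set)   # bucket index -> keys expiring in it

def schedule_expiry(key, ttl_seconds, now=None):
    """O(1) insert: round the expiry time down to its minute bucket."""
    now = time.time() if now is None else now
    bucket = int((now + ttl_seconds) // BUCKET_SECONDS)
    expiry_buckets[bucket].add(key)

def expired_keys(now=None):
    """Background sweep: drain every bucket at or before the current minute."""
    now = time.time() if now is None else now
    current = int(now // BUCKET_SECONDS)
    dead = []
    for bucket in [b for b in expiry_buckets if b <= current]:
        dead.extend(expiry_buckets.pop(bucket))
    return dead
```

The trade-off is that an item may live up to one bucket interval past its exact TTL, which is usually acceptable for a cache.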
My company currently services their clients using a Windows-based fat-client app that has workflow processing embedded in it. Basically, the customer inserts a set of documents to the beginning of the workflow, the documents are processed through a number of workflow steps and then after a period of time the output is presented to the customer. We currently scale up for larger customers by installing the application on other machines and let the cluster of machines work on different document subsets. Not ideal but with minimal changes to the application it did allow us to easily scale up to our current level.
The problem we now face is that as our customers have provided us with larger document sets, we find ourselves spending more than expected on machines, IT support, etc. So we have started to think about re-architecting the platform to make it scalable. A feature of our solution is that each document can be processed independently of the others. We also have 10 workflow steps, of which two take up about 90% of the processing time.
One idea we are mulling over is to add a workflow step field to the document schema to track which workflow step has been completed for the document. We can then throw the entire cluster of machines to work on a single document set. A single machine would not be responsible for sequentially processing a document through all workflow steps but queries the db for the next document/workflow step pair and perform that processing. Does this sound like a reasonable approach? Any suggestions?
Thanks in advance.
While I'm not sure what specific development environment you are working with, I have had to deal with some similar workflows where we have a varied number of source documents, various steps, etc. all with different performance characteristics.
Assuming you have a series of discrete steps - i.e. Step A's work product is the input for Step B, and Step B's product is the input for Step C, etc. - I would look at message queueing as a potential solution.
For example, all new documents are tossed into a queue. One or more listener apps hit the queue and grab the next available document to perform Step A. As Step A completes, a link to the output product and/or relevant data is tossed into another queue. A separate listener app pulls from this second queue into Step B, and so on until the final output product is created.
In this way, you use one queue as the holding area between each discrete step, and can scale any individual process between the queues up or down.
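A toy version of the queue-per-step pattern, with in-process queues and threads standing in for a real message broker; the stage names, transforms, and worker counts are invented:

```python
from queue import Queue
from threading import Thread

# hypothetical two-stage pipeline: each step has its own queue, and any
# number of worker threads can pull from the same queue to scale that step
step_a_in, step_b_in, done = Queue(), Queue(), Queue()

def worker(inbox, outbox, transform):
    while True:
        doc = inbox.get()
        outbox.put(transform(doc))
        inbox.task_done()

# one worker is enough for the fast step A
Thread(target=worker, args=(step_a_in, step_b_in, str.strip), daemon=True).start()
# step B is the bottleneck, so give it several workers on its queue
for _ in range(4):
    Thread(target=worker, args=(step_b_in, done, str.upper), daemon=True).start()

for doc in ["  alpha ", " beta  "]:
    step_a_in.put(doc)
step_a_in.join(); step_b_in.join()          # wait for both stages to drain
results = sorted(done.get() for _ in range(2))   # ["ALPHA", "BETA"]
```

In production the queues would be durable (a message broker or service bus) and the workers separate processes or machines, but the shape is the same.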
For example, we use this to go from some data transformations, through a rendering process, and out to a spooler. The data is fast, the renderings are CPU bound, and the printing is I/O bound, but each individual step can be scaled out based on need.
You could (technically) use a DB for this - but a message queue and/or a service bus would likely serve you better.
Hopefully that points you in the right direction!