I have a task to implement an LRU cache in which the longest operation must take O(log n). As my cache I use std::map. I also need a second container that stores key + creation time, sorted by time. When I need to record an access to the cache, it should take roughly:
Find by key: O(log n).
Remove by iterator: O(1).
Insert a new element: O(log n).
The oldest member must naturally reside at container.begin().
I can use only the STL.
std::list does not suit me: searching it (std::find) is O(n).
std::priority_queue does not support deleting arbitrary items.
I think it could ideally be stored in a std::set:
std::set<pair<KEY, TIME>>;
Sorted with:
struct compare
{
    bool operator()(const pair<KEY, TIME> &a, const pair<KEY, TIME> &b) const
    {
        return a.second < b.second;
    }
};
And to find a key in the std::set<pair<KEY, TIME>>, I would write a function that looks for the first element of the pair.
What do you think? Does this meet my complexity requirements?
Yes, you can use a map plus a set to get O(log n) deletion, update, and insertion.
The map stores (key, value) pairs; it should also record each key's current timestamp, so that the matching set element can be found in O(log n).
The set should store (time, key) pairs, in that order (you have done the opposite). When the cache is full and you want to evict a key, it is the key in the element that myset.begin() points to.
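A minimal C++ sketch of the map-plus-set scheme, assuming a logical counter (clock_) stands in for the time and that get/put are the operations you need; the class name MapSetLru and its members are hypothetical. The map's value carries the key's current timestamp, so the matching (time, key) element of the set can be erased in O(log n).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <utility>

// Hypothetical sketch: map holds key -> (value, last-use time); set holds
// (time, key), so the oldest entry is always *order_.begin(). All ops O(log n).
template <typename Key, typename Value>
class MapSetLru {
public:
    explicit MapSetLru(std::size_t capacity) : capacity_(capacity) {}

    std::optional<Value> get(const Key& key) {
        auto it = entries_.find(key);                     // O(log n)
        if (it == entries_.end()) return std::nullopt;
        touch(it);
        return it->second.first;
    }

    void put(const Key& key, const Value& value) {
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            it->second.first = value;
            touch(it);
            return;
        }
        if (capacity_ == 0) return;
        if (entries_.size() == capacity_) {
            auto oldest = order_.begin();                 // least recently used
            entries_.erase(oldest->second);               // O(log n), by key
            order_.erase(oldest);                         // amortized O(1)
        }
        std::uint64_t t = clock_++;
        entries_.emplace(key, std::make_pair(value, t));
        order_.emplace(t, key);
    }

private:
    using Entries = std::map<Key, std::pair<Value, std::uint64_t>>;

    // Mark an entry as most recent: drop its old (time, key), insert a new one.
    void touch(typename Entries::iterator it) {
        order_.erase(std::make_pair(it->second.second, it->first));  // O(log n)
        it->second.second = clock_++;
        order_.emplace(it->second.second, it->first);                // O(log n)
    }

    std::size_t capacity_;
    std::uint64_t clock_ = 0;   // monotonically increasing logical time
    Entries entries_;
    std::set<std::pair<std::uint64_t, Key>> order_;
};
```

Because the set is ordered by (time, key), *order_.begin() is always the least recently used entry, which gives the begin() property the question asks for.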
That said, you can improve performance by using a hash table plus a doubly linked list.
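You can achieve O(1) complexity if you choose the right data structures:
#include <list>
#include <unordered_map>
#include <utility>
template<typename Key_t, typename Value_t>
class LruCache {
    // ....
    using Order_t = std::list<Key_t>;
    Order_t m_order;  // keys in order of use
    // key -> (position of the key in m_order, cached value)
    std::unordered_map<Key_t, std::pair<typename Order_t::iterator, Value_t>> m_container;
};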
m_order is a list, so adding elements at its beginning or end is O(1).
Removing an item from the list when you have an iterator to it: m_order.erase(it), O(1).
Removing the least recently used key from the list: pop_front/pop_back, O(1).
To find a key, use the hash map's find (O(1) on average).
Once you have found a key, you have both the real value and an iterator to the corresponding item in the list.
So the whole complexity can be O(1) (on average, because of the hash map).
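One way to fill in the elided members, as a sketch: the capacity, get and put are assumptions, and "front of the list = most recently used" is just one of the two orientations the answer allows. It uses std::list::splice to move a touched key to the front, which relinks the node in O(1) without invalidating the iterator stored in the map (an alternative to erase plus push_front).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: front of m_order = most recently used, back = LRU.
template <typename Key_t, typename Value_t>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : m_capacity(capacity) {}

    std::optional<Value_t> get(const Key_t& key) {
        auto it = m_container.find(key);                  // O(1) on average
        if (it == m_container.end()) return std::nullopt;
        // splice relinks the node to the front in O(1); the iterator stays valid.
        m_order.splice(m_order.begin(), m_order, it->second.first);
        return it->second.second;
    }

    void put(const Key_t& key, const Value_t& value) {
        auto it = m_container.find(key);
        if (it != m_container.end()) {
            it->second.second = value;
            m_order.splice(m_order.begin(), m_order, it->second.first);
            return;
        }
        if (m_capacity == 0) return;
        if (m_container.size() == m_capacity) {
            m_container.erase(m_order.back());            // evict the LRU key
            m_order.pop_back();                           // O(1)
        }
        m_order.push_front(key);
        m_container.emplace(key, std::make_pair(m_order.begin(), value));
    }

private:
    using Order_t = std::list<Key_t>;
    std::size_t m_capacity;
    Order_t m_order;
    std::unordered_map<Key_t, std::pair<typename Order_t::iterator, Value_t>> m_container;
};
```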
I have a linked list containing numbers between 0 and 1, and my task is to remove the numbers in a given range (x, y) from this list. Do you have any idea how to solve this with reasonable complexity?
Let's first think about how a LinkedList is structured. Let's take a look at the following image:
Each element in a (doubly) linked list has a pointer to the next (and the previous) element. Java's LinkedList class, for example, is a doubly linked list.
In such a list there is no direct way to say "give me element B". We only have a head reference (pointing at the start of the list) and a tail reference (pointing at the end). To find element B, we need to start at the head (or tail) and walk through the list, following the next (or prev) pointers, until we find element B.
So, back to your question: there is no efficient way to remove the elements in range (x, y) from a LinkedList. This can only be done efficiently in sorted structures, such as a TreeSet or a sorted ArrayList (where binary search finds the range boundaries in O(log n)), or, for a small range of integer values, in a structure with direct access to elements, such as a HashSet.
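For comparison, in C++ the sorted structure could be a std::set. A sketch, assuming the same inclusive-lower, exclusive-upper bounds as the Java snippet (the helper name removeRange is hypothetical): lower_bound finds both ends in O(log n), and erasing the k elements between them costs O(k). If the values can repeat, std::multiset works the same way.

```cpp
#include <cassert>
#include <set>

// Removes every v with lowerBound <= v < upperBound in O(log n + k).
template <typename T>
void removeRange(std::set<T>& values, const T& lowerBound, const T& upperBound) {
    if (!(lowerBound < upperBound)) return;   // empty or inverted range: nothing to do
    values.erase(values.lower_bound(lowerBound), values.lower_bound(upperBound));
}
```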
Here is a Java code snippet that solves your task for a LinkedList. However, as stated, it is not efficient: its running time is O(n), because we need to look at each element to find out which ones must be deleted:
LinkedList<Integer> list = ...
// Inclusive lower bound
int lowerBound = ...
// Exclusive upper bound
int upperBound = ...
ListIterator<Integer> listIter = list.listIterator();
while (listIter.hasNext()) {
    int value = listIter.next();
    // Check if the value is inside the bounds (both conditions must hold)
    if (value >= lowerBound && value < upperBound) {
        // Remove the element from the list using the iterator,
        // which prevents a ConcurrentModificationException
        listIter.remove();
    }
}
If you think about it, a linked list has no constant-time getAtIndex method: you can only start from the head and work your way to the tail, or vice versa. The complexity of this is O(n).
I'm trying to think of a way to design a data structure that I can efficiently insert into, remove from, and search.
The catch is that the search function gets a similar object as input, with 2 attributes, and I need to find an object in my dataset such that both its 1st and 2nd attributes are equal to or bigger than those of the search function's input.
For example, if I send the following object as input:
object[a] = 9; object[b] = 14
Then a valid found object could be:
object[a] = 9; object[b] = 79
but not:
object[a] = 8; object[b] = 28
Is there any way to store the data such that the search complexity is better than linear?
EDIT:
I forgot to include this in my original question: the search has to return the smallest possible object in the dataset, measured by the product of the 2 attributes.
That is, among the objects that fit the original condition, it must return the one whose object[a]*object[b] is smaller than that of any other fitting object.
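You may want to use a k-d tree, a data structure typically used to index k-dimensional points. A search like yours can skip whole subtrees, so it usually runs well below linear time; for a 2-D range query on a balanced k-d tree, the worst case is O(√n + m) for m reported points.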
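A minimal 2-d tree sketch for this query, assuming a static dataset (inserts and removals would need a rebuild or a dynamic k-d tree variant); Obj and KdTree2 are hypothetical names. Pruning uses only the "both attributes >= query" condition, which is exact; the minimum-product requirement from the edit is handled by keeping the best candidate seen so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct Obj { long long a, b; };  // hypothetical 2-attribute record

class KdTree2 {
public:
    explicit KdTree2(std::vector<Obj> pts) : pts_(std::move(pts)) {
        build(0, pts_.size(), 0);
    }
    // Object with a >= q.a and b >= q.b minimizing a*b, or nullopt if none.
    std::optional<Obj> query(const Obj& q) const {
        std::optional<Obj> best;
        search(0, pts_.size(), 0, q, best);
        return best;
    }

private:
    std::vector<Obj> pts_;  // stored in implicit k-d tree order

    static long long key(const Obj& o, int axis) { return axis == 0 ? o.a : o.b; }

    // Median split on alternating axes: left of mid has key <= median, right has >=.
    void build(std::size_t lo, std::size_t hi, int axis) {
        if (hi - lo <= 1) return;
        std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(pts_.begin() + lo, pts_.begin() + mid, pts_.begin() + hi,
            [axis](const Obj& x, const Obj& y) { return key(x, axis) < key(y, axis); });
        build(lo, mid, 1 - axis);
        build(mid + 1, hi, 1 - axis);
    }

    void search(std::size_t lo, std::size_t hi, int axis, const Obj& q,
                std::optional<Obj>& best) const {
        if (lo >= hi) return;
        std::size_t mid = lo + (hi - lo) / 2;
        const Obj& p = pts_[mid];
        if (p.a >= q.a && p.b >= q.b && (!best || p.a * p.b < best->a * best->b))
            best = p;
        // Everything left of mid has key <= key(p); if key(p) is already below
        // the query on this axis, none of those points can qualify.
        if (key(p, axis) >= key(q, axis)) search(lo, mid, 1 - axis, q, best);
        search(mid + 1, hi, 1 - axis, q, best);
    }
};
```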
This answer may help when the attributes are hierarchically linked, like name and forename. For points in a 2-D space, a k-d tree is better suited, as fajarkoe explained.
struct Person {
    std::string name;
    std::string forename;
    // ... other non-key attributes
};
You have to write a comparator function that takes two objects of class X as input and returns -1, 0 or +1 for the <, = and > cases.
Libraries such as glibc, with qsort() and bsearch(), and higher-level languages such as Java, with its java.util.Comparator interface and java.util.SortedMap (implemented by java.util.TreeMap), use comparators.
Other languages use an equivalent concept.
Following your spec, the comparator may be written like this:
int compare(const Person& left, const Person& right) {
    if (left.name < right.name) {
        return -1;
    }
    if (left.name > right.name) {
        return +1;
    }
    if (left.forename < right.forename) {
        return -1;
    }
    if (left.forename > right.forename) {
        return +1;
    }
    return 0;
}
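In C++ the same three-way comparator can drive a sort followed by binary searches. This sketch uses std::sort and std::binary_search in place of qsort()/bsearch(), because std::qsort is only defined for trivially copyable element types, which rules out std::string members. The helpers personLess, sortPeople and containsPerson are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::string forename;
};

// Three-way comparator from the answer: -1, 0 or +1.
int compare(const Person& left, const Person& right) {
    if (left.name < right.name) return -1;
    if (left.name > right.name) return +1;
    if (left.forename < right.forename) return -1;
    if (left.forename > right.forename) return +1;
    return 0;
}

// Adapts the three-way result to the "less than" predicate std algorithms expect.
bool personLess(const Person& l, const Person& r) { return compare(l, r) < 0; }

// O(n log n) sort, done once.
void sortPeople(std::vector<Person>& people) {
    std::sort(people.begin(), people.end(), personLess);
}

// O(log n) lookup; requires the vector to have been sorted with personLess.
bool containsPerson(const std::vector<Person>& sorted, const Person& p) {
    return std::binary_search(sorted.begin(), sorted.end(), p, personLess);
}
```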
Complexity of qsort()
Quicksort, or partition-exchange sort, is a sorting algorithm developed by Tony Hoare that, on average, makes O(n log n) comparisons to sort n items. In the worst case, it makes O(n²) comparisons, though this behavior is rare. Quicksort is often faster in practice than other O(n log n) algorithms. Additionally, quicksort's sequential and localized memory references work well with a cache. Quicksort is a comparison sort and, in efficient implementations, is not a stable sort. Quicksort can be implemented with an in-place partitioning algorithm, so the entire sort can be done with only O(log n) additional space used by the stack during the recursion.
Complexity of bsearch()
If the list to be searched contains more than a few items (a dozen, say) a binary search will require far fewer comparisons than a linear search, but it imposes the requirement that the list be sorted. Similarly, a hash search can be faster than a binary search but imposes still greater requirements. If the contents of the array are modified between searches, maintaining these requirements may even take more time than the searches. And if it is known that some items will be searched for much more often than others, and it can be arranged so that these items are at the start of the list, then a linear search may be the best.
I posted a rather confusing question, so I rewrote it from scratch.
This is actually a purely theoretical question.
Say we have a binary heap. Let the heap be a max-heap, so the root node has the biggest value and every node's value is at least as big as its children's. We can do some common low-level operations on this heap: "swap two nodes" and "compare two nodes".
Using those low-level operations, we can implement the usual higher-level recursive operations: "sift-up" and "sift-down".
Using sift-up and sift-down, we can implement "insert", "repair" and "update". I am interested in the "update" function. Let's assume that I already have the position of the node to be changed. The update function is then very simple:
function update (node_position, new_value){
heap[node_position] = new_value;
sift_up(node_position);
sift_down(node_position);
}
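A runnable C++ version of that single-node update, as a sketch assuming a max-heap of ints stored in a std::vector with the usual layout (children of i at 2i+1 and 2i+2); the function names mirror the pseudocode. When the element moves up, the parent that comes down into pos is already >= that subtree, so at most one of the two sifts actually moves anything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Max-heap in a vector: children of i are 2i+1 and 2i+2.
void sift_up(std::vector<int>& h, std::size_t i) {
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (!(h[parent] < h[i])) break;   // parent already >= child
        std::swap(h[parent], h[i]);
        i = parent;
    }
}

void sift_down(std::vector<int>& h, std::size_t i) {
    const std::size_t n = h.size();
    for (;;) {
        std::size_t largest = i;
        std::size_t l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[largest] < h[l]) largest = l;
        if (r < n && h[largest] < h[r]) largest = r;
        if (largest == i) break;          // both children are <= h[i]
        std::swap(h[i], h[largest]);
        i = largest;
    }
}

// Overwrite one slot, then restore the heap property in O(log n).
void update(std::vector<int>& h, std::size_t pos, int new_value) {
    h[pos] = new_value;
    sift_up(h, pos);
    sift_down(h, pos);
}
```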
My question is: is it (mathematically) possible to make a more advanced "update" function that updates several nodes at once, so that all nodes change to their new values first and only after that are their positions corrected? Something like this:
function double_update (node1_pos, node2_pos, node1_newVal, node2_newVal){
heap[node1_pos] = node1_newVal;
heap[node2_pos] = node2_newVal;
sift_up(node1_pos);
sift_down(node1_pos);
sift_up(node2_pos);
sift_down(node2_pos);
}
I tested this "double_update" and it worked, although that doesn't prove anything.
What about "triple updates", and so on?
I did some other tests with "multi updates", where I changed the values of all nodes and then called { sift-up(); sift-down(); } once for each of them in random order. This didn't work, but the result wasn't far from correct.
I know this doesn't sound useful, but I am interested in the theory behind it. And if I make it work, I actually do have one use for it.
It's definitely possible to do this, but if you're planning on changing a large number of keys in a binary heap, you might want to look at other heap structures, like the Fibonacci heap or the pairing heap, which can do this much faster than the binary heap. Changing k keys in a binary heap with n nodes takes O(k log n) time, while in a Fibonacci heap, if every change is a decrease-key (an increase-key, for a max-heap), it takes O(k) amortized time. This is asymptotically optimal, since you can't even touch k nodes without doing at least Ω(k) work.
Another thing to consider is that if you change more than Ω(n / log n) keys at once, you are going to do at least Ω(n) work. In that case, it's probably faster to implement updates by just rebuilding the heap from scratch in Θ(n) time using the standard heapify algorithm.
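The rebuild-from-scratch option can be sketched in C++ with std::make_heap, which performs the standard O(n) heapify (at most 3n comparisons). This assumes a max-heap of ints in a std::vector; bulk_update is a hypothetical helper name, and the changes are given as (position, new value) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Overwrite many slots, then restore the max-heap property with one O(n)
// heapify instead of k separate O(log n) fixes.
void bulk_update(std::vector<int>& heap,
                 const std::vector<std::pair<std::size_t, int>>& changes) {
    for (const auto& [pos, value] : changes)
        heap[pos] = value;                      // heap property may now be broken
    std::make_heap(heap.begin(), heap.end());   // O(n) rebuild
}
```

Per the estimate above, this is the better choice once the number of changed keys exceeds roughly n / log n.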
Hope this helps!
Here's a tricky and possibly funky algorithm, for some definition of funky:
(Lots of stuff left out, just to give the idea):
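#include <algorithm>
#include <vector>
template<typename T> class pseudoHeap {
private:
    using iterator = typename std::vector<T>::iterator;
    std::vector<T> heap;
    iterator max_node;       // always points at a maximum element
    bool heapified = false;  // true only while heap is a valid max-heap
    void find_max() {
        max_node = std::max_element(heap.begin(), heap.end());
    }
public:
    // ... (constructor and other members left out)
    void update(iterator node, T new_val) {
        if (node == max_node) {
            if (new_val < *max_node) {
                // The max shrank: another element may be the max now, O(n) rescan
                heapified = false;
                *max_node = new_val;
                find_max();
            } else {
                *max_node = new_val;  // still the max; a valid heap stays valid
            }
        } else {
            *node = new_val;
            if (*max_node < new_val) max_node = node;
            heapified = false;
        }
    }
    T& front() { return *max_node; }
    void pop_front() {
        if (!heapified) {
            // Move the max to the back and heapify the rest once: O(n)
            std::iter_swap(heap.end() - 1, max_node);
            std::make_heap(heap.begin(), heap.end() - 1);
            heapified = true;
        } else {
            std::pop_heap(heap.begin(), heap.end());
        }
        heap.pop_back();          // actually remove the old maximum
        max_node = heap.begin();  // in a heap, the max is at the front
    }
};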
Keeping a heap is expensive. If you do n updates before you start popping the heap, you've done the same amount of work as just sorting the vector when you need it to be sorted (O(n log n)). If it's useful to know the maximum value at all times, then there is some reason to keep a heap, but if the maximum value is no more likely to be modified than any other value, then you can keep the maximum value always handy at amortized cost O(1) (that is, 1/n of the time it costs O(n), and the rest of the time it's O(1)). That's what the above code does, but it might be even better to be lazy about computing the max as well, making front() amortized O(1) instead of constant O(1). It depends on your requirements.
As yet another alternative, if the modifications normally don't move the values very far, just keep the vector sorted and do a simple "find the new home and rotate the subvector" loop. Although it's O(n) instead of O(log n), it is still faster on short moves because the constant is smaller.
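A sketch of the "find the new home and rotate" idea, assuming the values live in a std::vector<int> kept sorted ascending; update_sorted is a hypothetical name. std::upper_bound finds the new position in O(log n), and std::rotate shifts only the elements between the old and new positions, so the cost tracks how far the value moves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// v is sorted ascending. Change v[pos] to new_val and slide it to its new home.
void update_sorted(std::vector<int>& v, std::size_t pos, int new_val) {
    auto it = v.begin() + pos;
    if (new_val > *it) {
        // New home: first element after `it` that is greater than new_val.
        auto home = std::upper_bound(it + 1, v.end(), new_val);
        *it = new_val;
        std::rotate(it, it + 1, home);   // shift [it+1, home) left by one
    } else {
        // New home: first element before `it` that is greater than new_val.
        auto home = std::upper_bound(v.begin(), it, new_val);
        *it = new_val;
        std::rotate(home, it, it + 1);   // shift [home, it) right by one
    }
}
```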
In other words, don't use priority heaps unless you're constantly required to find the top k values. When there are lots of modifications between reads, there is usually a better approach.
Today I had a discussion with someone about Kruskal's minimum spanning tree algorithm, prompted by page 13 of these slides.
The author of the presentation said that if we implement disjoint sets using a (doubly) linked list, Make and Find both take O(1). The time for the operation Union(u, v) is O(min(n_u, n_v)), where n_u and n_v are the sizes of the sets containing u and v.
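The scheme on the slide can be sketched as follows: every element stores its representative directly, so Find is O(1), and Union relabels every member of the smaller set, which is where the O(min(n_u, n_v)) cost comes from. This sketch uses std::vector for the member lists instead of a linked list (an assumption; the bound is unchanged, since relabeling already touches each member of the smaller set). ListDisjointSets is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class ListDisjointSets {
public:
    // Make: each element starts as its own singleton set.
    explicit ListDisjointSets(std::size_t n) : rep_(n), members_(n) {
        for (std::size_t i = 0; i < n; ++i) { rep_[i] = i; members_[i] = {i}; }
    }
    std::size_t find(std::size_t x) const { return rep_[x]; }   // O(1)
    void unite(std::size_t u, std::size_t v) {
        std::size_t a = rep_[u], b = rep_[v];
        if (a == b) return;
        if (members_[a].size() < members_[b].size()) std::swap(a, b);  // b is smaller
        for (std::size_t m : members_[b]) rep_[m] = a;   // relabel the smaller set
        members_[a].insert(members_[a].end(), members_[b].begin(), members_[b].end());
        members_[b].clear();
    }
private:
    std::vector<std::size_t> rep_;                   // element -> representative
    std::vector<std::vector<std::size_t>> members_;  // representative -> members
};
```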
I said that we can improve the time for Union(u, v) to O(1) by making the representation pointer of each member point to a locator, which in turn contains the pointer to the real representation of the set.
In Java, the data structure would look like this:
class DisjointSet {
LinkedList<Member> list = new LinkedList<Member>(); // for holding the members, we might need it for print
static Member makeSet(Vertex v) {
Member m = new Member();
DisjointSet set = new DisjointSet();
m.set = set;
set.list.add(m);
m.vertex = v;
Locator loc = new Locator();
loc.representation = m;
m.locator = loc;
return m;
}
}
class Member {
DisjointSet set;
Locator locator;
Vertex vertex;
Member find() {
return locator.representation;
}
void union(Member u, Member v) { // assume nv is less than nu
u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
v.set = u.set;
v.locator.representation = u.locator.representation;
}
}
class Locator {
Member representation;
}
Sorry for the minimalistic code. If it can be made to work this way, then the running time of every disjoint-set operation (Make, Find, Union) will be O(1). But the person I discussed this with can't see the improvement. I would like to know your opinion on this.
Also, what is the fastest performance of Find/Union across the various implementations? I'm not an expert in data structures, but from quickly browsing the internet I found that there is no constant-time data structure or algorithm for this.
My intuition agrees with your colleague. You say:
u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
It looks like your intent is that the union is done via the append. But, to implement Union, you would have to remove duplicates for the result to be a set. So I can see an O(1) algorithm for a fixed set size, for example...
Int32 set1;
Int32 set2;
Int32 unionSets1And2 = set1 | set2;
But that strikes me as cheating. If you're doing this for general cases of N, I don't see how you avoid some form of iterating (or hash lookup). And that would make it O(n) (or at best O(log n)).
FYI: I had a hard time following your code. In makeSet, you construct a local Locator that never escapes the function. It doesn't look like it does anything. And it's not clear what your intent is in the append. Might want to edit and elaborate on your approach.
Using Tarjan's version of the Union-Find structure (with path compression and union by rank), a sequence of m Finds and (n-1) intermixed Unions runs in O(m·α(m, n)) time, where α(m, n) is the inverse Ackermann function, which is at most 4 for all practical values of m and n. So this basically means that Union-Find operations take amortized constant time in the worst case, but not quite.
To my knowledge, it is impossible to obtain a better theoretical complexity, though improvements have led to better practical efficiency.
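A compact C++ sketch of that structure (path compression plus union by rank) over elements 0..n-1; the class name UnionFind is hypothetical. Unlike the locator idea in the question, Find here walks parent pointers and then flattens the path, which is what yields the O(m·α(m, n)) bound.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class UnionFind {
public:
    explicit UnionFind(std::size_t n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});  // Make-Set for each element
    }
    std::size_t find(std::size_t x) {
        std::size_t root = x;
        while (parent_[root] != root) root = parent_[root];   // locate the root
        while (parent_[x] != root) {                          // path compression
            std::size_t next = parent_[x];
            parent_[x] = root;
            x = next;
        }
        return root;
    }
    // Returns false if a and b were already in the same set.
    bool unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);   // union by rank: hang the shorter tree
        parent_[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
        return true;
    }
private:
    std::vector<std::size_t> parent_;
    std::vector<unsigned> rank_;
};
```

In Kruskal's algorithm, unite returning false is exactly the "this edge would close a cycle" test.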
For special cases of disjoint sets, such as those used in language theory, it has been shown that linear (i.e., everything in O(1)) adaptations are possible, essentially by grouping nodes together, but these improvements cannot be translated to the general problem. At the other end of the spectrum, a somewhat similar core idea has been used with great success and ingenuity to make a near-linear, O(m·α(m, n)), algorithm for minimum spanning trees (Chazelle's algorithm).
So your code cannot be correct. The error is what Moron pointed out: when you make the union of two sets, you only update the "representation" of the lead of each list, not of all the other elements, while the find function assumes that every element directly knows its representation.
Given a number of lists of items, find the lists with matching items.
The brute force pseudo-code for this problem looks like:
foreach list L
foreach item I in list L
foreach list L2 such that L2 != L
foreach item I2 in L2
if I == I2
return new 3-tuple(L, L2, I) //not important for the algorithm
I can think of a number of different ways of going about this (creating a list of lists and removing each candidate list after searching the others, for example), but I'm wondering whether there is a better algorithm for this.
I'm using Java, if that makes a difference to your implementation.
Thanks
Create a Map<Item,List<List>>.
Iterate through every item in every list.
Each time you touch an item, add the current list to that item's entry in the Map.
You now have a Map entry for each item that tells you what lists that item appears in.
This algorithm is about O(N), where N is the total number of items across all lists (the exact complexity will be affected by how good your Map implementation is). I believe your algorithm was at least O(N^2).
Caveat: I am comparing the number of comparisons, not memory use. If your lists are super huge and full of mostly non-duplicated items, the map that my method creates might become too big.
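A C++ sketch of this map-of-lists idea (an inverted index), assuming string items and identifying lists by their index; List and buildIndex are hypothetical names. Each list is recorded at most once per item, so any entry with two or more list indices is a match between lists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using List = std::vector<std::string>;

// item -> indices of the lists it appears in, in increasing order, no repeats.
std::unordered_map<std::string, std::vector<std::size_t>>
buildIndex(const std::vector<List>& lists) {
    std::unordered_map<std::string, std::vector<std::size_t>> index;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        for (const std::string& item : lists[i]) {
            auto& where = index[item];
            if (where.empty() || where.back() != i)   // skip repeats within list i
                where.push_back(i);
        }
    }
    return index;
}
```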
As per your comment, you want a MultiMap implementation. A multimap is like a Map, but it can map each key to multiple values. Store each value and a reference to all the lists that contain that value.
Map<Object, List>
Of course, you should use a type-safe key instead of Object, and a type-safe List as the value. What you are trying to do is called an inverted index.
I'll start with the assumption that the datasets can fit in memory. If not, then you will need something fancier.
I refer below to a "set", by which I mean something like a C++ std::set. I don't know the Java equivalent, but any storage scheme that permits rapid lookup (tree, hash table, whatever) will do.
Comparing three lists: L0, L1 and L2.
Read L0, placing each element in a set: S0.
Read L1, placing items that match an element of S0 into a new set: S1, and discarding others.
Discard S0.
Read L2, keeping items that match an element of S1 and discarding others.
Update
Just realised that the question was for "n" lists, not three. However, the extension should be obvious (I hope).
Update 2
Some untested C++ code to illustrate the algorithm
#include <string>
#include <vector>
#include <set>
#include <cassert>
#include <cstddef>
#include <utility>
typedef std::vector<std::string> strlist_t;
strlist_t GetMatches(const std::vector<strlist_t>& vLists)
{
assert(vLists.size() > 1);
std::set<std::string> s0, s1;
std::set<std::string> *pOld = &s1;
std::set<std::string> *pNew = &s0;
// unconditionally load first list as "new"
s0.insert(vLists[0].begin(), vLists[0].end());
for (size_t i=1; i<vLists.size(); ++i)
{
//swap recently read "new" to "old" now for comparison with new list
std::swap(pOld, pNew);
pNew->clear();
// only keep new elements if they are matched in old list
for (size_t j=0; j<vLists[i].size(); ++j)
{
if (pOld->end() != pOld->find(vLists[i][j]))
{
// found match
pNew->insert(vLists[i][j]);
}
}
}
return strlist_t(pNew->begin(), pNew->end());
}
You can use a trie, modified to record which lists each node belongs to.
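A sketch of such a trie, assuming string items and lists identified by index; ListTrie and its methods are hypothetical names. This version records list indices only at the node where each item ends, so items sharing a prefix don't get mixed up; a node whose set holds two or more indices marks an item shared between lists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

class ListTrie {
public:
    // Record that `item` appears in list number `listIndex`.
    void insert(const std::string& item, std::size_t listIndex) {
        Node* node = &root_;
        for (char c : item) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->lists.insert(listIndex);
    }
    // Indices of the lists containing exactly `item` (empty if none).
    std::set<std::size_t> listsContaining(const std::string& item) const {
        const Node* node = &root_;
        for (char c : item) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        return node->lists;
    }
private:
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        std::set<std::size_t> lists;  // lists in which the item ending here appears
    };
    Node root_;
};
```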