Suppose I have a function f and an array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
A few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
On each turn, if you make a swap you advance both a and b; if not, you advance only a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
It may be possible to do it with full stability, but I don't see how.
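A minimal Python sketch of the semi-stable pass described above, assuming the elements live in a list and that a predicate `is_a` plays the role of f (both names are mine, not from the answer). It keeps the As in their original order and makes no promise about the Bs:

```python
def semi_stable_partition(items, is_a):
    """Move every A in front of every B, in place, in one left-to-right pass.

    The As keep their relative order; the Bs may be reordered.
    """
    n = len(items)
    # b is the index of the leftmost B seen so far.
    b = 0
    while b < n and is_a(items[b]):
        b += 1
    # Invariant: items[b:a] contains only Bs.
    for a in range(b + 1, n):
        if is_a(items[a]):
            # Swap the A into the leftmost B slot; the next B is then at b + 1.
            items[a], items[b] = items[b], items[a]
            b += 1
    return items
```

Tagging each element with its original index, as in `[(c, i) for i, c in enumerate("ABBAABABAA")]`, makes it easy to see which class kept its order.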
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort.
1. Have 2 iterators, one starting on the left, one starting on the right.
2. While there's a B at the right iterator, decrement the iterator.
3. While there's an A at the left iterator, increment the iterator.
4. If the iterators haven't crossed each other, swap their elements and repeat from 2.
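The two-iterator procedure above could look roughly like this in Python. This is only a sketch: the function name and the `is_a` predicate (standing in for f) are assumptions, and the result is not stable.

```python
def partition_unstable(items, is_a):
    """Quick-sort style partition: As to the left, Bs to the right, in place."""
    left, right = 0, len(items) - 1
    while True:
        # Skip Bs that are already at the right end.
        while right >= 0 and not is_a(items[right]):
            right -= 1
        # Skip As that are already at the left end.
        while left < len(items) and is_a(items[left]):
            left += 1
        # Stop once the iterators have met or crossed.
        if left >= right:
            break
        # items[left] is a B and items[right] is an A: exchange them.
        items[left], items[right] = items[right], items[left]
    return items
```

Each element is examined a constant number of times, so the pass is O(n) with O(1) extra space.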
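Let's say,
Object_Array[1...N]
Type_A objs are A1,A2,...Ai
Type_B objs are B1,B2,...Bj
i+j = N
obj_A_count = 0, obj_B_count = 0
FOR k = 1:N
    if Object_Array[k] is of Type_A
        obj_A_count = obj_A_count + 1
    else
        obj_B_count = obj_B_count + 1
LOOP
Fill the resultant array with obj_A_count objects of Type_A and obj_B_count objects of Type_B, in whichever order is required (A before B, or B before A). Note that this only works when objects of the same type are interchangeable.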
The following should work in linear time for a doubly-linked list. Because up to N insertions/deletions are involved, it may take quadratic time for arrays, though.
1. Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
2. Start with 3 iterators: iterA starts from the beginning of the container, and iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
3. With iterA skip over As, find the 1st B, and move the object from iterA to iterB->previous position. Now iterA points to the next element after where the moved element used to be, and the moved element is now just before iterB.
4. Continue with step 3 until you reach iterMiddle. After that all elements between first() and iterB-1 are As.
5. Now set iterA to iterB-1.
6. Skip over Bs with iterB. When A is found move it to just after iterA and increment iterA.
7. Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertions/deletions, which is linear time for containers with O(1) insertion/deletion but, alas, O(N^2) for arrays. Applicability in your case depends on whether the container is an array or a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list, and accumulating and moving the "B"s comes down to trivial pointer changes. Pseudo code below:
sort(list) {
    node = list.head, blast = null, bhead = null
    while(node != null) {
        nextnode = node.next
        if(node.val == "a") {
            if(blast != null){
                //move the 'a' to the front of the 'B' list
                bhead.prev.next = node, node.prev = bhead.prev
                blast.next = node.next, node.next.prev = blast
                node.next = bhead, bhead.prev = node
            }
        }
        else if(node.val == "b") {
            if(blast == null)
                bhead = blast = node
            else //accumulate the "b"s..
                blast = node
        }
        node = nextnode
    }
}
So, you can do this in an array, but the memcopies that emulate the list moves will make it quite slow for large arrays.
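As a rough illustration of why a linked list makes all three constraints reachable, here is a Python sketch. It is a variation on the idea rather than a transcription of the pseudo code above: it relinks the nodes into an A chain and a B chain and then joins the two. The `Node` class, the function name and the `is_a` predicate are assumptions made for the example.

```python
class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def stable_partition_list(head, is_a):
    """Stable A-before-B partition of a singly linked list.

    Only pointers are changed: O(n) time, O(1) extra space.
    Returns the new head of the list.
    """
    a_head = a_tail = b_head = b_tail = None
    node = head
    while node is not None:
        nxt = node.next
        node.next = None
        if is_a(node.val):
            # Append to the A chain, preserving encounter order.
            if a_tail is None:
                a_head = node
            else:
                a_tail.next = node
            a_tail = node
        else:
            # Append to the B chain, preserving encounter order.
            if b_tail is None:
                b_head = node
            else:
                b_tail.next = node
            b_tail = node
        node = nxt
    if a_tail is None:
        return b_head
    a_tail.next = b_head
    return a_head
```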
Firstly, assuming the array of A's and B's is either generated or read-in, I wonder why not avoid this question entirely by simply applying f as the list is being accumulated into memory into two lists that would subsequently be merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be - one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(nlogn) but n here is only 4000 (0.00005 of the one-billion list count). Total time O(2n) = O(n)
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make two arrays of size n, count_a[n] and count_b[n]. Fill these arrays with how many As or Bs there have been before index i.
After just one loop, we can use these arrays to look up the correct index for any element in O(1). Like this:
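int final_index(char id, int pos){
    if(id == 'A')
        return count_a[pos];
    else
        // total_a is the number of As in the whole array; count_a[n-1] alone
        // would miss an A stored at the last index, since the counts are exclusive
        return total_a + count_b[pos];
}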
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is to have recursive swapping procedure that doesn't actually perform any swapping until both elements would be placed in correct final positions. EDIT: This is actually not true. Even naive swapping will have O(n) swaps. But doing this recursive strategy will give you absolute minimum required swaps.
Note that in the general case this would be a very bad sorting algorithm, since it has a memory requirement of O(n * element value range).
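To make the counting idea concrete, here is a Python sketch under a few assumptions of mine: a single `dest` array stores each element's final index (instead of separate count_a/count_b arrays), and the elements are then moved by following the cycles of that permutation. It is stable and linear, and only the elements themselves are moved in place; the `dest` array still costs O(n) extra memory.

```python
def stable_partition_with_counts(items, is_a):
    """Stable A-before-B partition using precomputed final indices."""
    n = len(items)
    total_a = sum(1 for x in items if is_a(x))
    dest = [0] * n
    seen_a = seen_b = 0
    for i, x in enumerate(items):
        if is_a(x):
            dest[i] = seen_a            # As seen before index i
            seen_a += 1
        else:
            dest[i] = total_a + seen_b  # all As, then Bs seen before index i
            seen_b += 1
    # Apply the permutation in place. Every swap puts at least one element
    # into its final slot, so there are at most n swaps in total.
    for i in range(n):
        while dest[i] != i:
            j = dest[i]
            items[i], items[j] = items[j], items[i]
            dest[i], dest[j] = dest[j], dest[i]
    return items
```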
Question: I have a singly linked list (i.e. a list with only a pointer to the next node). Additionally, this is a circular linked list (in this example, the last node has a pointer to the first node). Every node in the list contains a char.
An example of such a list can be: a->b->c->b->a
Now how can I verify if this list is a palindrome?
I have thought of the following solution:
1. Start from the head of the list. Find the length of the list and then the mid node. Now start again from the head and keep pushing elements onto a stack until the mid. Then traverse the list from the mid, popping an element at each step: if the value of the popped element is not equal to the value of the current node, return false; otherwise, continue until the stack is empty and we've verified all chars. CONS: uses extra stack space :(
2. Start from the head of the list. Find the length of the list and then the mid node. Now reverse the 2nd half of the list, and then, using 2 pointers (one pointing to the start and the other to the mid+1'th element), check whether the values are the same. If not, return false; else continue until we reach the start node again. CONS: changes the original data structure.
Is there a more elegant way to approach this problem (which hopefully does not use O(n) extra space or changes original list)? I'm interested in the algorithm rather than any specific implementation.
Thanks
Since you're dealing with a single linked list, you must use a little extra space or a lot more extra time.
Your first approach sounds reasonable, but you can determine the length of the list and palindrome-ness in a single run.
We modify the so-called Floyd's Cycle-Finding Algorithm:
- two pointers, "slow" and "fast", both start at the list head; the slow pointer advances one list element per iteration, the fast pointer two elements
- in each step, the slow pointer pushes the current element on the stack
- if the fast pointer reaches the end of the list, the slow pointer points to the middle of the list, so now:
  - the slow pointer advances to the end of the list, and in each step:
    - it pops one element from the stack and compares it to the current list element (if they are not equal, return false)
  - if the slow pointer reaches the end of the list, it is a palindrome
A little extra work is required for lists with an odd number of elements.
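A possible Python rendering of this slow/fast approach for the circular case, where "the end of the list" means "the next node is the head again". The `Node` class and the `make_circular` helper are assumptions added to make the sketch self-contained; the stack still costs about n/2 extra space.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_circular(values):
    """Build a circular singly linked list and return its head (None if empty)."""
    head = prev = None
    for v in values:
        node = Node(v)
        if head is None:
            head = node
        else:
            prev.next = node
        prev = node
    if prev is not None:
        prev.next = head
    return head


def is_palindrome(head):
    if head is None:
        return True
    stack = []
    slow = fast = head
    while True:
        if fast.next is head:
            # Odd length: slow sits on the middle node, which needs no partner.
            slow = slow.next
            break
        if fast.next.next is head:
            # Even length: slow sits on the last node of the first half.
            stack.append(slow.value)
            slow = slow.next
            break
        stack.append(slow.value)
        slow = slow.next
        fast = fast.next.next
    # The stack now holds the first half in reverse order; walk the second half.
    while stack:
        if stack.pop() != slow.value:
            return False
        slow = slow.next
    return True
```

The two `break` branches are the "little extra work" for odd and even lengths.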
This is in pseudo-Haskell (I can't remember the exact syntax these days) and I've written for the non-circular case -- to fix that, just replace the clause matching against [] with whatever condition you use to identify you've come full circle.
p(xs) = q(xs, Just(xs)) /= Nothing
q([], maybeYs) = maybeYs
q(x : xs, Nothing) = Nothing
q(x : xs, maybeYs) =
    let maybeZs = q(xs, maybeYs) in
    case maybeZs of
        Nothing -> Nothing
        Just (z : zs) | z == x -> Just(zs)
        otherwise -> Nothing
Since you know the Linked List does make a cycle, and you are only looking for palindromes starting at head, you can make this easier on yourself.
A -> B -> C -> B -> A
In this case, start with a pointer at head (call it H), and a pointer at head.Left() (call it T).
Now keep moving the head pointer H to the right, and the tail pointer T to the left.
As you walk the list, verify that the values of those elements are equal (i.e. a palindrome).
Your stopping condition, however, takes a bit more care. There are two cases:
Both pointers end up pointing at the same element (i.e. odd number of elements)
The H pointer is pointing at the element just to the right of T (i.e. even number of elements).
So, you stop if H==T or if H==(T.Right()).
Using this approach (or similar) you visit each element just once.
Use the Tortoise and Hare approach as in the other solutions if you don't know if the linked list is cyclic.
I'll just paste my implementation so we can compare with each other's:
/**
 * Given a circular single linked list and the start pointer, check if it is a palindrome
 * use a slow/fast pointer + stack is an elegant way
 * tip: whenever there is a circular linked list, think about using slow/fast pointer
 * note: pStart itself is never pushed or compared below, so the logic appears to
 *       assume that pStart is a sentinel node placed before the first character
 */
#include <iostream>
#include <stack>
using namespace std;

struct Node
{
    char c;
    Node* next;
    Node(char c) : c(c), next(NULL) {}
    Node* chainNode(char c)
    {
        Node* p = new Node(c);
        p->next = NULL;
        this->next = p;
        return p;
    }
};

bool isPalindrome(Node* pStart)
{
    Node* pSlow = pStart;
    Node* pFast = pStart;
    stack<Node*> s;
    bool bEven = false;
    while(true)
    {
        // BUG1: check fast pointer first
        pFast = pFast->next;
        if(pFast == pStart)
        {
            bEven = false;
            break;
        }
        else
        {
            pFast = pFast->next;
            if(pFast == pStart)
            {
                bEven = true;
                break;
            }
        }
        pSlow = pSlow->next;
        s.push(pSlow);
    }
    if(s.empty()) return true; // BUG2: a, a->b->a
    if(bEven) pSlow = pSlow->next; // BUG3: a->b->c->b->a, a->b->c->d->c->b->a: jump over the center pointer
    while(!s.empty())
    {
        // pop stack and advance linked list
        Node* topNode = s.top();
        s.pop();
        pSlow = pSlow->next;
        // check
        if(topNode->c != pSlow->c)
        {
            return false;
        }
        else
        {
            if(s.empty()) return true;
        }
    }
    return false;
}
I think we don't need any extra space for this, and it can be done with O(n) complexity.
Modifying Philip's solution:
We modify the so-called Floyd's Cycle-Finding Algorithm:
Two pointers, "slow" and "fast", both start at the list head; the slow pointer advances one list element per iteration, the fast pointer two elements
in each step, the slow pointer pushes the current element on the stack
if the fast pointer reaches the end of the list, the slow pointer points to the middle of the list, so now:
Have another pointer at the start of the linked list (the start pointer) and now -
move the start pointer and the slow pointer one by one and compare them - if they are not equal, return false
- if the slow pointer reaches the end of the list, it is a palindrome
This is O(n) time complexity and no extra space is required.
In this earlier question, the OP asked for a data structure similar to a stack supporting the following operations in O(1) time each:
Push, which adds a new element atop the stack,
Pop, which removes the top element from the stack,
Find-Max, which returns (but does not remove) the largest element of the stack, and
Find-Min, which returns (but does not remove) the smallest element of the stack.
A few minutes ago I found this related question asking for a clarification on a similar data structure that instead of allowing for the max and min to be queried, allows for the median element of the stack to be queried. These two data structures seem to be a special case of a more general data structure supporting the following operations:
Push, which pushes an element atop the stack,
Pop, which pops the top of the stack, and
Find-Kth, which for a fixed k determined when the structure is created, returns the kth largest element of the stack.
It is possible to support all of these operations by storing a stack and a balanced binary search tree holding the top k elements, which would enable all these operations to run in O(log k) time. My question is this: is it possible to implement the above data structure faster than this? That is, could we get O(1) for all three operations? Or perhaps O(1) for push and pop and O(log k) for the order statistic lookup?
Since the structure can be used to sort k elements with O(k) push and find-kth operations, every comparison-based implementation must have at least one of these operations cost Ω(log k), even in an amortized sense, with randomization.
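One way to read that reduction (this is my interpretation, the answer does not spell it out): push the k values, then alternate find-kth with pushing a sentinel larger than everything, so that the kth largest element climbs through the original values in ascending order. The sketch below uses a deliberately naive reference structure, `NaiveKthStack`, and `float("inf")` as the sentinel; both are assumptions for illustration and only work for numeric values.

```python
import bisect


class NaiveKthStack:
    """Reference implementation: correct, but push and pop cost O(n)."""

    def __init__(self, k):
        self.k = k
        self._stack = []
        self._sorted = []

    def push(self, x):
        self._stack.append(x)
        bisect.insort(self._sorted, x)

    def pop(self):
        x = self._stack.pop()
        del self._sorted[bisect.bisect_left(self._sorted, x)]
        return x

    def find_kth(self):
        # k-th largest element currently on the stack
        return self._sorted[-self.k]


def sort_with_kth_stack(values):
    """Sort using only push and find_kth: 2k pushes and k find_kth calls."""
    k = len(values)
    if k == 0:
        return []
    s = NaiveKthStack(k)
    for v in values:
        s.push(v)
    out = []
    for _ in range(k):
        out.append(s.find_kth())  # smallest value not yet reported
        s.push(float("inf"))      # shifts the k-th largest up by one rank
    return out
```

Since any comparison sort needs Ω(k log k) comparisons, the O(k) operations used here cannot all be cheaper than Ω(log k) on average.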
Push can be O(log k) and pop/find-kth can be O(1) (use persistent data structures; push should precompute the order statistic). My gut feeling based on working with lower bounds for comparison-based algorithms is that O(1) push/pop and O(log k) find-kth is doable but requires amortization.
I think what tophat was saying is, implement a purely functional data structure that supports only O(log k) insert and O(1) find-kth (cached by insert), and then make a stack of these structures. Push inserts into the top version and pushes the update, pop pops the top version, and find-kth operates on the top version. This is O(log k)/O(1)/O(1) but super-linear space.
EDIT: I was working on O(1) push/O(1) pop/O(log k) find-kth, and I think it can't be done. The sorting algorithm that tophat referred to can be adapted to get √k evenly spaced order statistics of a length-k array in time O(k + (√k) log k). Problem is, the algorithm must know how each order statistic compares with all other elements (otherwise it might be wrong), which means that it has bucketed everything into one of √k + 1 buckets, which takes Ω(k log (√k + 1)) = Ω(k log k) comparisons on information theoretic grounds. Oops.
Replacing √k by k^eps for any eps > 0, with O(1) push/O(1) pop, I don't think find-kth can be O(k^(1 - eps)), even with randomization and amortization.
Whether this is actually faster than your log k implementation depends on which operations are used most frequently, but I propose an implementation with O(1) Find-kth and Pop and O(n) Push, where n is the stack size. I also want to share this with SO because it is just a hilarious data structure at first sight, but might even be reasonable.
It's best described as a doubly doubly linked stack, or perhaps more easily described as a hybrid of a linked stack and a doubly linked sorted list. Basically each node maintains 4 references to other nodes: the next and previous in stack order and the next and previous in sorted order on the element size. These two linked lists can be implemented using the same nodes, but they work completely separately, i.e. the sorted linked list doesn't have to know about the stack order and vice versa.
Like a normal linked stack, the collection itself will need to maintain a reference to the top node (and to the bottom?). To accommodate the O(1) nature of the Find-kth method, the collection will also keep a reference to the kth largest element.
The pop method works as follows:
The popped node gets removed from the sorted doubly linked list, just like a removal from a normal sorted linked list. It takes O(1) as the collection has a reference to the top. Depending on whether the popped element was larger or smaller than the kth element, the reference to the kth largest element is set to either the previous or the next. So the method still has O(1) complexity.
The push method works just like a normal addition to a sorted linked list, which is an O(n) operation. It starts with the smallest element, and inserts the new node when a larger element is encountered. To maintain the correct reference to the kth largest element, again either the previous or next element to the current kth largest element is selected, depending on whether the pushed node was larger or smaller than the kth largest element.
Of course next to this, the reference to the 'top' of the stack has to be set in both methods. Also there's the problem of k > n, for which you haven't specified what the data structure should do. I hope it is clear how it works, otherwise I could add an example.
But ok, not entirely the complexity you had hoped for, but I find this an interesting 'solution'.
Edit: An implementation of the described structure
A bounty was issued on this question, which indicates my original answer wasn’t good enough:P Perhaps the OP would like to see an implementation?
I have implemented both the median problem and the fixed-k problem, in C#. The implementation of the tracker of the median is just a wrapper around the tracker of the kth element, where k can mutate.
To recap the complexities:
Push takes O(n)
Pop takes O(1)
FindKth takes O(1)
Change k takes O(delta k)
I have already described the algorithm in reasonable detail in my original post. The implementation is then fairly straightforward(but not so trivial to get right, as there are a lot of inequality signs and if statements to consider). I have commented only to indicate what is done, but not the details of how, as it would otherwise become too large. The code is already quite lengthy for a SO post.
I do want to provide the contracts of all non-trivial public members:
K is the index of the element in the sorted linked list to keep a reference to. It is mutable and, when set, the structure is immediately corrected for that.
KthValue is the value at that index, unless the structure doesn’t have k elements yet, in which case it returns a default value.
HasKthValue exists to easily distinguish these default values from elements which happened to be the default value of its type.
Constructors: a null enumerable is interpreted as an empty enumerable, and a null comparer is interpreted as the default. This comparer defines the order used when determining the kth value.
So this is the code:
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class KthTrackingStack<T>
{
    private readonly Stack<Node> stack;
    private readonly IComparer<T> comparer;
    private int k;
    private Node smallestNode;
    private Node kthNode;

    public int K
    {
        get { return this.k; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException();
            //move forward in sorted order until k == value
            for (; k < value; k++)
            {
                if (kthNode.NextInOrder == null)
                    return;
                kthNode = kthNode.NextInOrder;
            }
            //move backward in sorted order until k == value
            for (; k > value; k--)
            {
                if (kthNode.PreviousInOrder == null)
                    return;
                kthNode = kthNode.PreviousInOrder;
            }
        }
    }
    public T KthValue
    {
        get { return HasKthValue ? kthNode.Value : default(T); }
    }
    public bool HasKthValue
    {
        get { return k < Count; }
    }
    public int Count
    {
        get { return this.stack.Count; }
    }

    public KthTrackingStack(int k, IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        if (k < 0) throw new ArgumentOutOfRangeException("k");
        this.k = k;
        this.comparer = comparer ?? Comparer<T>.Default;
        this.stack = new Stack<Node>();
        if (initialElements != null)
            foreach (T initialElement in initialElements)
                this.Push(initialElement);
    }

    public void Push(T value)
    {
        //just like in a normal sorted linked list, the node before the inserted node has to be found.
        Node nodeBeforeNewNode;
        if (smallestNode == null || comparer.Compare(value, smallestNode.Value) < 0)
            nodeBeforeNewNode = null;
        else
        {
            nodeBeforeNewNode = smallestNode;//untested optimization: nodeBeforeNewNode = comparer.Compare(value, kthNode.Value) < 0 ? smallestNode : kthNode;
            while (nodeBeforeNewNode.NextInOrder != null && comparer.Compare(value, nodeBeforeNewNode.NextInOrder.Value) > 0)
                nodeBeforeNewNode = nodeBeforeNewNode.NextInOrder;
        }
        //the following code includes the new node in the ordered linked list
        Node newNode = new Node
        {
            Value = value,
            PreviousInOrder = nodeBeforeNewNode,
            NextInOrder = nodeBeforeNewNode == null ? smallestNode : nodeBeforeNewNode.NextInOrder
        };
        if (newNode.NextInOrder != null)
            newNode.NextInOrder.PreviousInOrder = newNode;
        if (newNode.PreviousInOrder != null)
            newNode.PreviousInOrder.NextInOrder = newNode;
        else
            smallestNode = newNode;
        //the following code deals with changes to the kth node due to adding the new node
        if (kthNode != null && comparer.Compare(value, kthNode.Value) < 0)
        {
            if (HasKthValue)
                kthNode = kthNode.PreviousInOrder;
        }
        else if (!HasKthValue)
        {
            kthNode = newNode;
        }
        stack.Push(newNode);
    }

    public T Pop()
    {
        Node result = stack.Pop();
        //the following code deals with changes to the kth node
        if (HasKthValue)
        {
            if (comparer.Compare(result.Value, kthNode.Value) <= 0)
                kthNode = kthNode.NextInOrder;
        }
        else if (kthNode.PreviousInOrder != null || Count == 0)
        {
            kthNode = kthNode.PreviousInOrder;
        }
        //the following code maintains the order in the linked list
        if (result.NextInOrder != null)
            result.NextInOrder.PreviousInOrder = result.PreviousInOrder;
        if (result.PreviousInOrder != null)
            result.PreviousInOrder.NextInOrder = result.NextInOrder;
        else
            smallestNode = result.NextInOrder;
        return result.Value;
    }

    public T Peek()
    {
        return this.stack.Peek().Value;
    }

    private sealed class Node
    {
        public T Value { get; set; }
        public Node NextInOrder { get; internal set; }
        public Node PreviousInOrder { get; internal set; }
    }
}
public class MedianTrackingStack<T>
{
    private readonly KthTrackingStack<T> stack;

    public void Push(T value)
    {
        stack.Push(value);
        stack.K = stack.Count / 2;
    }
    public T Pop()
    {
        T result = stack.Pop();
        stack.K = stack.Count / 2;
        return result;
    }
    public T Median
    {
        get { return stack.KthValue; }
    }

    public MedianTrackingStack(IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        stack = new KthTrackingStack<T>(initialElements == null ? 0 : initialElements.Count()/2, initialElements, comparer);
    }
}
Of course you're always free to ask any question about this code, as I realize some things may not be obvious from the description and sporadic comments
The only actual working implementation I can wrap my head around is Push/Pop O(log k) and Kth O(1).
Stack (single linked)
Min Heap (size k)
Stack2 (doubly linked)
The value nodes will be shared between the Stack, Heap and Stack2
PUSH:
    Push to the stack
    If value >= heap root
        If heap size < k
            Insert value in heap
        Else
            Remove heap root
            Push removed heap root to stack2
            Insert value in heap
POP:
    Pop from the stack
    If popped node has stack2 references
        Remove from stack2 (doubly linked list remove)
    If popped node has heap references
        Remove from the heap (swap with last element, perform heap-up-down)
        Pop from stack2
        If element popped from stack2 is not null
            Insert element popped from stack2 into heap
KTH:
    If heap is size k
        Return heap root value
You could use a skip list. (I first thought of a linked list, but insertion is O(n), and amit corrected me with a skip list. I think this data structure could be pretty interesting in your case.)
With this data structure, inserting/deleting would take O(ln(k))
and finding the maximum O(1)
I would use:
a stack containing your elements,
a stack containing the history of the skip list (the skip list holds the k smallest elements).
(I realised afterwards that the question asks for the kth largest element, but it's pretty much the same problem.)
When pushing (O(ln(k))):
if the element is less than the kth element, delete the kth element (O(ln(k))), put it on the LIFO pile (O(1)), then insert the new element in the skip list (O(ln(k)));
otherwise it doesn't belong in the skip list, so just put it on the pile (O(1)).
When pushing you add a new skip list to the history; since this is similar to copy-on-write, it wouldn't take more than O(ln(k)).
When popping (O(1)):
you just pop from both stacks.
Getting the kth element (O(1)):
always take the maximum element in the skip list (O(1)).
All the ln(k) costs are amortised.
Example:
I will take the same example as yours (on Stack with find-min/find-max more efficient than O(n)) :
Suppose that we have a stack and add the values 2, 7, 1, 8, 3, and 9, in that order. and k = 3
I will represent it this way :
[number in the stack] [ skip list linked with that number]
First I push 2, 7 and 1 (it doesn't make sense to look for the kth element in a list of fewer than k elements)
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
If I want the kth element I just need to take the max in the linked list: 7
now I push 8,3, 9
on the top of the stack I have :
8 [7,2,1] since 8 > kth element therefore skip list doesn't change
then :
3 [3,2,1] since 3 < kth element, the kth element has changed. I first delete 7, which was the previous kth element (O(ln(k))), then insert 3 (O(ln(k))) => total O(ln(k))
then :
9 [3,2,1] since 9 > kth element
Here is the stack I get :
9 [3,2,1]
3 [3,2,1]
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
find k th element :
I get 3 in O(1)
now I can pop 9 and 3 (takes O(1)):
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
find kth element :
I get 7 in O(1)
and push 0 (takes O(ln(k) - insertion)
0 [2,1,0]
8 [7,2,1]
1 [7,2,1]
7 [7,2,null]
2 [2,null,null]
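The history idea in the walkthrough above can be imitated in a few lines of Python. This is only a sketch under simplifying assumptions: each stack entry carries a plain sorted tuple of the k smallest values seen so far instead of a persistent skip list, so a push costs O(k) for the copy rather than O(ln(k)). The class name is hypothetical.

```python
import bisect


class KthSmallestHistoryStack:
    """Stack whose entries remember the k smallest values below and including them."""

    def __init__(self, k):
        self.k = k
        self._entries = []  # list of (value, snapshot) pairs

    def push(self, value):
        prev = self._entries[-1][1] if self._entries else ()
        if len(prev) < self.k or value < prev[-1]:
            # The new value enters the set of k smallest: build a new snapshot.
            merged = list(prev)
            bisect.insort(merged, value)
            snapshot = tuple(merged[: self.k])
        else:
            # Unchanged: share the previous snapshot.
            snapshot = prev
        self._entries.append((value, snapshot))

    def pop(self):
        # Dropping the entry also drops its snapshot, restoring the older one.
        return self._entries.pop()[0]

    def find_kth(self):
        snapshot = self._entries[-1][1]
        if len(snapshot) < self.k:
            raise IndexError("fewer than k elements on the stack")
        return snapshot[-1]  # maximum of the k smallest
```

Replaying the example (push 2, 7, 1, 8, 3, 9 with k = 3, pop twice, push 0) goes through the same kth values as the listing: 7, then 3, then 7 again, then 2.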
@tophat is right - since this structure could be used to implement a sort, it can't have less complexity than an equivalent sort algorithm. So how do you do a sort in less than O(lg N) per element? Use Radix Sort.
Here is an implementation which makes use of a Binary Trie. Inserting items into a binary Trie is essentially the same operation as performing a radix sort. The cost for inserting and deleting is O(m), where m is a constant: the number of bits in the key. Finding the next largest or smallest key is also O(m), accomplished by taking the next step in an in-order depth-first traversal.
So the general idea is to use the values pushed onto the stack as keys in the trie. The data to store is the occurrence count of that item in the stack. For each pushed item: if it exists in the trie, increment its count, else store it with a count of 1. When you pop an item, find it, decrement the count, and remove it if the count is now 0. Both those operations are O(m).
To get O(1) FindKth, keep track of 2 values: the value of the Kth item, and how many instances of that value are in the first K items. (For example, for K=4 and a stack of [1,2,3,2,0,2], the Kth value is 2 and the "iCount" is 2.) Then when you push values < the KthValue, you simply decrement the instance count, and if it is 0, do a FindPrev on the trie to get the next smaller value.
When you pop values greater than the KthValue, increment the instance count if more instances of that value exist, else do a FindNext to get the next larger value.
(The rules are different if there are fewer than K items. In that case, you can simply track the max inserted value. When there are K items, the max will be the Kth.)
Here is a C implementation. It relies on a BinaryTrie (built using the example at PineWiki as a base) with this interface:
BTrie* BTrieInsert(BTrie* t, Item key, int data);
BTrie* BTrieFind(BTrie* t, Item key);
BTrie* BTrieDelete(BTrie* t, Item key);
BTrie* BTrieNextKey(BTrie* t, Item key);
BTrie* BTriePrevKey(BTrie* t, Item key);
Here is the Push function.
void KSStackPush(KStack* ks, Item val)
{
    BTrie* node;
    //resize if needed
    if (ks->ct == ks->sz) ks->stack = realloc(ks->stack,sizeof(Item)*(ks->sz*=2));
    //push val
    ks->stack[ks->ct++]=val;
    //record count of value instances in trie
    node = BTrieFind(ks->trie, val);
    if (node) node->data++;
    else ks->trie = BTrieInsert(ks->trie, val, 1);
    //adjust kth if needed
    ksCheckDecreaseKth(ks,val);
}
Here is the helper to track the KthValue
//check if inserted val is in set of K
void ksCheckDecreaseKth(KStack* ks, Item val)
{
    //if less than K items, track the max.
    if (ks->ct <= ks->K) {
        if (ks->ct==1) { ks->kthValue = val; ks->iCount = 1;} //1st item
        else if (val == ks->kthValue) { ks->iCount++; }
        else if (val > ks->kthValue) { ks->kthValue = val; ks->iCount = 1;}
    }
    //else if value is one of the K, decrement instance count
    else if (val < ks->kthValue && (--ks->iCount<=0)) {
        //if that was only instance in set,
        //find the previous value, include all its instances
        BTrie* node = BTriePrevKey(ks->trie, ks->kthValue);
        ks->kthValue = node->key;
        ks->iCount = node->data;
    }
}
Here is the Pop function
Item KSStackPop(KStack* ks)
{
    //pop val
    Item val = ks->stack[--ks->ct];
    //find in trie
    BTrie* node = BTrieFind(ks->trie, val);
    //decrement count, remove if no more instances
    if (--node->data == 0)
        ks->trie = BTrieDelete(ks->trie, val);
    //adjust kth if needed
    ksCheckIncreaseKth(ks,val);
    return val;
}
And the helper to increase the KthValue
//check if removing val causes Kth to increase
void ksCheckIncreaseKth(KStack* ks, Item val)
{
    //if less than K items, track max
    if (ks->ct < ks->K)
    {   //if removing the max,
        if (val==ks->kthValue) {
            //find the previous node, and set the instance count.
            BTrie* node = BTriePrevKey(ks->trie, ks->kthValue);
            ks->kthValue = node->key;
            ks->iCount = node->data;
        }
    }
    //if removed val was among the set of K, add a new item
    else if (val <= ks->kthValue)
    {
        BTrie* node = BTrieFind(ks->trie, ks->kthValue);
        //if more instances of kthValue exist, add 1 to set.
        if (node && ks->iCount < node->data) ks->iCount++;
        //else include 1 instance of next value
        else {
            BTrie* node = BTrieNextKey(ks->trie, ks->kthValue);
            ks->kthValue = node->key;
            ks->iCount = 1;
        }
    }
}
So this algorithm is O(1) for all 3 operations. It can also support the Median operation: start with KthValue = the first value, and whenever the stack size changes by 2, do an IncreaseKth or DecreaseKth operation. The downside is that the constant is large. It is only a win when m < lgK. However, for small keys and large K, this may be a good choice.
What if you paired the stack with a pair of Fibonacci Heaps? That could give amortized O(1) Push and FindKth, and O(lgN) delete.
The stack stores [value, heapPointer] pairs. The heaps store stack pointers.
Create one MaxHeap, one MinHeap.
On Push:
if the MaxHeap has fewer than K items, insert the stack top into the MaxHeap;
else if the new value is less than the top of the MaxHeap, first insert the result of DeleteMax into the MinHeap, then insert the new item into the MaxHeap;
else insert it into the MinHeap. Push is O(1), or O(lgK) if DeleteMax is needed.
On FindKth, return the top of the MaxHeap. O(1)
On Pop, also do a Delete(node) from the popped item's heap.
If it was in the MinHeap, you are done. O(lgN)
If it was in the MaxHeap, also perform a DeleteMin from the MinHeap and Insert the result in the MaxHeap. O(lgK)+O(lgN)+O(1)
Update:
I realized I wrote it up as K'th smallest, not K'th largest.
I also forgot a step when a new value is less than the current K'th smallest. That step pushes the worst-case insert back to O(lg K). This may still be OK for uniformly distributed input and small K, as it will only hit that case on about K/N of the insertions.
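A minimal sketch of the two-container idea, corrected as in the update (K'th smallest, including the displacement step on push). This is not the Fibonacci-heap version: it swaps in two `std::multiset`s, so every operation is O(lg N) rather than amortized O(1), and it locates the popped value by lookup instead of storing heap pointers in the stack (equal values are interchangeable, so either container's copy can be removed). The class and member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical name: a stack that also reports its K'th smallest value.
class KthSmallestStack {
public:
    explicit KthSmallestStack(std::size_t k) : k_(k) { assert(k > 0); }

    void push(int v) {
        stack_.push_back(v);
        if (low_.size() < k_) {
            low_.insert(v);                      // still filling the K smallest
        } else if (v < *low_.rbegin()) {
            // v displaces the current K'th smallest, which moves to high_.
            std::multiset<int>::iterator maxIt = std::prev(low_.end());
            high_.insert(*maxIt);
            low_.erase(maxIt);
            low_.insert(v);
        } else {
            high_.insert(v);
        }
    }

    int pop() {
        assert(!stack_.empty());
        int v = stack_.back();
        stack_.pop_back();
        std::multiset<int>::iterator it = high_.find(v);
        if (it != high_.end()) {
            high_.erase(it);                     // low_ is unaffected
        } else {
            low_.erase(low_.find(v));            // v must be among the K smallest
            if (!high_.empty()) {                // refill from the remaining values
                low_.insert(*high_.begin());
                high_.erase(high_.begin());
            }
        }
        return v;
    }

    bool hasKth() const { return low_.size() == k_; }
    int kth() const { assert(hasKth()); return *low_.rbegin(); }

private:
    std::size_t k_;
    std::vector<int> stack_;
    std::multiset<int> low_;   // the K smallest values (plays the MaxHeap role)
    std::multiset<int> high_;  // everything else (plays the MinHeap role)
};
```

With K = 3, pushing 5, 1, 4 gives a K'th smallest of 5; pushing 2 afterwards moves 5 to the second container and the K'th smallest becomes 4.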
*moved New Idea to different answer - it got too large.
Use a Trie to store your values. Tries already have O(1) insert complexity with respect to the number of stored elements (the cost depends only on the key length). You only need to worry about two things, popping and searching, but if you tweak your program a little, it would be easy.
When inserting (pushing), have a counter for each path that stores the number of elements inserted there. This will allow each node to keep track of how many elements have been inserted using that path, i.e. the number represents the number of elements that are stored beneath that path. That way, when you try to look for the kth element, it would be a simple comparison at each path.
For popping, you can have a static object that has a link to the last stored object. That object can be accessed from the root object, hence O(1). Of course, you would need to add functions to retrieve the last object inserted, which means the newly pushed node must have a pointer to the previously pushed element (implemented in the push procedure; very simple, also O(1)). You also need to decrement the counter, which means each node must have a pointer to the parent node (also simple).
To find the kth element (this is for the kth smallest element, but finding the kth largest is very similar): when you enter each node you pass in k and the minimum index for the branch (for the root it would be 0). Then you do a simple comparison for each path: if k lies between the minimum index and minimum index + pathCounter, you enter that path, passing in k and the new minimum index (the old minimum index plus the sum of the pathCounters of the sibling paths you skipped). I think this is O(1), since adding more data within a fixed key range doesn't increase the cost of finding k.
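The counting idea can be sketched roughly as follows, assuming 8-bit unsigned keys stored in a binary trie (the key width, class name and method names are assumptions for illustration). Instead of passing a minimum index down, this version subtracts the counts of the skipped branches from k, which is equivalent. Each operation touches one node per key bit, so the cost depends on key width and not on how many elements are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical counted binary trie over 8-bit keys.
class CountedTrie {
    struct Node {
        std::size_t count = 0;            // number of keys stored beneath this node
        std::unique_ptr<Node> child[2];   // child[0]: next bit is 0, child[1]: next bit is 1
    };
    static constexpr int kBits = 8;
    Node root_;

public:
    void insert(std::uint8_t key) {
        Node* n = &root_;
        ++n->count;
        for (int bit = kBits - 1; bit >= 0; --bit) {
            int b = (key >> bit) & 1;
            if (!n->child[b]) n->child[b].reset(new Node());
            n = n->child[b].get();
            ++n->count;                   // one more key passes through this path
        }
    }

    // Removes one instance of key; the key must be present.
    void remove(std::uint8_t key) {
        Node* n = &root_;
        assert(n->count > 0);
        --n->count;
        for (int bit = kBits - 1; bit >= 0; --bit) {
            n = n->child[(key >> bit) & 1].get();
            assert(n && n->count > 0);
            --n->count;
        }
    }

    std::size_t size() const { return root_.count; }

    // k is zero-based: k == 0 returns the smallest key.
    std::uint8_t kthSmallest(std::size_t k) const {
        assert(k < root_.count);
        const Node* n = &root_;
        unsigned key = 0;
        for (int bit = kBits - 1; bit >= 0; --bit) {
            const Node* left = n->child[0].get();
            std::size_t leftCount = left ? left->count : 0;
            if (k < leftCount) {          // answer lies in the 0-branch
                n = left;
                key = key << 1;
            } else {                      // skip the 0-branch and its keys
                k -= leftCount;
                n = n->child[1].get();
                key = (key << 1) | 1;
            }
        }
        return static_cast<std::uint8_t>(key);
    }
};
```

Pairing this with an ordinary stack of the pushed values gives Pop: pop the value from the stack, then call `remove` on it.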
I hope this helps, and if anything is not very clear, just let me know.
Given a number of lists of items, find the lists with matching items.
The brute force pseudo-code for this problem looks like:
foreach list L
    foreach item I in list L
        foreach list L2 such that L2 != L
            foreach item I2 in L2
                if I == I2
                    return new 3-tuple(L, L2, I) //not important for the algorithm
I can think of a number of different ways of going about this - creating a list of lists and removing each candidate list after searching the others for example - but I'm wondering if there is a better algorithm for this?
I'm using Java, if that makes a difference to your implementation.
Thanks
Create a Map<Item,List<List>>.
Iterate through every item in every list.
Each time you touch an item, add the current list to that item's entry in the Map.
You now have a Map entry for each item that tells you what lists that item appears in.
This algorithm is about O(N), where N is the total number of items across all lists (the exact complexity will be affected by how good your Map implementation is). I believe your algorithm was at least O(N^2).
Caveat: I am comparing number of comparisons, not memory use. If your lists are super huge and full of mostly non duplicated items, the map that my method creates might become too big.
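A rough sketch of that map-building pass, written in C++ rather than Java and assuming the items are strings and each list is identified by its index in the outer vector. The function name `BuildInvertedIndex` and the `strlist_t` typedef are assumptions for illustration. A hash map keeps the expected cost per item constant, and a `std::set` of list indices avoids recording the same list twice when an item repeats within it.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::vector<std::string> strlist_t;
typedef std::unordered_map<std::string, std::set<std::size_t> > index_t;

// Maps each item to the indices of the lists in which it appears.
index_t BuildInvertedIndex(const std::vector<strlist_t>& lists)
{
    index_t index;
    for (std::size_t i = 0; i < lists.size(); ++i)
    {
        for (std::size_t j = 0; j < lists[i].size(); ++j)
        {
            // operator[] creates an empty entry the first time an item is seen
            index[lists[i][j]].insert(i);
        }
    }
    return index;
}
```

Any entry whose set holds more than one index names an item shared by those lists.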
As per your comment, you want a MultiMap implementation. A multimap is like a Map, but it can map each key to multiple values. Store the value as the key and, for each key, references to all the lists that contain that value.
Map<Object, List>
Of course, you should use a type-safe key type instead of Object and a type-safe List as the value. What you are trying to build is called an inverted index.
I'll start with the assumption that the datasets can fit in memory. If not, then you will need something fancier.
I refer below to a "set", where I am thinking of something like a C++ std::set. I don't know the Java equivalent, but any storage scheme that permits rapid lookup (tree, hash table, whatever) will do.
Comparing three lists: L0, L1 and L2.
Read L0, placing each element in a set: S0.
Read L1, placing items that match an element of S0 into a new set: S1, and discarding others.
Discard S0.
Read L2, keeping items that match an element of S1 and discarding others.
Update
Just realised that the question was for "n" lists, not three. However the extension should be obvious. (I hope)
Update 2
Some untested C++ code to illustrate the algorithm
#include <string>
#include <vector>
#include <set>
#include <utility>
#include <cassert>
typedef std::vector<std::string> strlist_t;
strlist_t GetMatches(const std::vector<strlist_t>& vLists)
{
    assert(vLists.size() > 1);
    std::set<std::string> s0, s1;
    std::set<std::string> *pOld = &s1;
    std::set<std::string> *pNew = &s0;
    // unconditionally load first list as "new"
    s0.insert(vLists[0].begin(), vLists[0].end());
    for (size_t i=1; i<vLists.size(); ++i)
    {
        // swap recently read "new" to "old" now for comparison with new list
        std::swap(pOld, pNew);
        pNew->clear();
        // only keep new elements if they are matched in old list
        for (size_t j=0; j<vLists[i].size(); ++j)
        {
            if (pOld->end() != pOld->find(vLists[i][j]))
            {
                // found match
                pNew->insert(vLists[i][j]);
            }
        }
    }
    // pNew now holds the items common to every list
    return strlist_t(pNew->begin(), pNew->end());
}
You can use a trie, modified to record what lists each node belongs to.
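One way this could look, as a sketch only: a character trie over string items, where the node at which an item ends keeps the set of list indices that contain it. The class name `ListTrie` and its `add` method are hypothetical, and string items are an assumption.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical trie whose terminal nodes remember which lists contain the item.
class ListTrie {
    struct Node {
        std::map<char, std::unique_ptr<Node> > next;
        std::set<std::size_t> lists;   // lists containing the item that ends here
    };
    Node root_;

public:
    // Records that item occurs in list listId and returns the other lists
    // that were already known to contain it.
    std::set<std::size_t> add(const std::string& item, std::size_t listId) {
        Node* n = &root_;
        for (std::size_t i = 0; i < item.size(); ++i) {
            std::unique_ptr<Node>& child = n->next[item[i]];
            if (!child) child.reset(new Node());
            n = child.get();
        }
        std::set<std::size_t> others = n->lists;
        others.erase(listId);          // a repeat within the same list is not a match
        n->lists.insert(listId);
        return others;
    }
};
```

Feeding every item of every list through `add` reports a match whenever the returned set is non-empty.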