Keeping track of the median of an expanding array - algorithm

Interview Question:
Edited Below
You are given an array. You make two heaps out of it, one a min-heap and the other a max-heap. Now find the median of the array using these two provided heaps in O(n log n) time.
Corrected Question
Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?
Solution
This problem can be solved using 2 heaps and the median can always be accessed in O(1) time.

Here's how you use both heaps. Note that I'm assuming you don't know the number of elements; that is why we must pop until we pop something from the min-heap that is larger than or equal to what we pop from the max-heap. Note that we return the average because in the case of a set like {1, 2, 3, 4} the median is actually 2.5 (the average of the two "middle" values). I'm assuming double as the value type, but this can obviously be anything. Here:
double min = minheap.pop();
double max = maxheap.pop();
while (min < max) {
    min = minheap.pop();
    max = maxheap.pop();
}
return (min + max) / 2;
Since popping is O(log n) and we have to pop O(n / 2) values, this is O(n log n).

A working implementation in Java, using two heaps, O(n log n) overall. At any time I keep the two heaps balanced in size (i.e. they differ by at most 1, which happens when we have entered n elements with n % 2 == 1). Getting the median is O(1). Adding a new element is O(log n).
import java.util.Comparator;
import java.util.PriorityQueue;

public class MedianOfStream {

    private int count;
    // "highs" is a min-heap holding the upper half of the values,
    // "lows" is a max-heap holding the lower half.
    private PriorityQueue<Integer> highs, lows;

    public MedianOfStream() {
        highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg0.compareTo(arg1); // natural order: min-heap
            }
        });
        lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg1.compareTo(arg0); // reversed order: max-heap
            }
        });
    }

    private int getMedian() {
        if (count == 0)
            return 0;
        if (lows.size() == highs.size()) {
            return (lows.peek() + highs.peek()) / 2; // note: integer division
        } else if (lows.size() < highs.size()) {
            return highs.peek();
        }
        return lows.peek();
    }

    private void swap() {
        int h = highs.poll();
        int l = lows.poll();
        highs.add(l);
        lows.add(h);
    }

    public int updateMedian(int n) {
        count++;
        if (count == 1)
            lows.add(n);
        else if (count == 2) {
            highs.add(n);
            if (highs.peek() < lows.peek()) {
                swap(); // O(log n)
            }
        } else {
            if (n > highs.peek()) {
                lows.add(highs.poll()); // O(log n)
                highs.add(n); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
                lows.add(n); // O(log n)
            }
            if (highs.peek() < lows.peek()) {
                swap(); // O(log n)
            }
        }
        // If we added an even number of items,
        // the heaps must be exactly the same size;
        // otherwise we tolerate a 1-off difference.
        if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
            if (lows.size() < highs.size()) {
                lows.add(highs.poll()); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
            }
        }
        return getMedian(); // O(1)
    }
}
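A quick driver sketch (the values are illustrative, not from the original post) showing the expected behaviour; note the integer division when the heaps are equal in size:

MedianOfStream m = new MedianOfStream();
System.out.println(m.updateMedian(5)); // 5
System.out.println(m.updateMedian(1)); // (1 + 5) / 2 = 3 by integer division
System.out.println(m.updateMedian(3)); // 3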

Popping from a heap is an O(log N) operation, so you can achieve O(N log N) by popping half the elements from one of the heaps and taking the last popped value (you'd have to handle edge cases). This doesn't take advantage of the other heap though.
You can achieve O(N) using the selection algorithm, but the constant factor is very high. The former suggestion is probably better if you already have a heap.
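For reference, a minimal quickselect sketch (one selection algorithm; random pivots give expected O(N), while the guaranteed-O(N) variant needs median-of-medians pivot selection). This is an illustration, not part of the original answer:

import java.util.Random;

class QuickSelect {
    private static final Random RNG = new Random();

    // Returns the k-th smallest element (0-based); expected O(N) with random pivots.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) return a[p];
            if (p < k) lo = p + 1;
            else hi = p - 1;
        }
        return a[lo];
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + RNG.nextInt(hi - lo + 1), hi);
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}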

JavaScript solution using two heaps:
// maxHeap holds the lower half of the numbers, minHeap the upper half;
// maxHeap is allowed to grow one element larger than minHeap.
function addNewNumber(minHeap, maxHeap, randomNumber) {
    if (maxHeap.size() === minHeap.size()) {
        // note: relies on peek() being falsy for an empty heap
        if (minHeap.peek() && randomNumber > minHeap.peek()) {
            maxHeap.insert(minHeap.remove());
            minHeap.insert(randomNumber);
        } else {
            maxHeap.insert(randomNumber);
        }
    } else {
        if (randomNumber < maxHeap.peek()) {
            minHeap.insert(maxHeap.remove());
            maxHeap.insert(randomNumber);
        } else {
            minHeap.insert(randomNumber);
        }
    }
}

function getMedian(minHeap, maxHeap) {
    if (!maxHeap.size()) {
        return 0;
    }
    if (minHeap.size() === maxHeap.size()) {
        return (minHeap.peek() + maxHeap.peek()) / 2;
    } else {
        return maxHeap.peek(); // maxHeap holds the extra element
    }
}
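The snippet above assumes heap objects exposing insert / remove / peek / size. A minimal min-heap with that interface might look like this (sketched in Java to match the other examples here; a max-heap flips the comparisons):

import java.util.ArrayList;

class MinHeap {
    private final ArrayList<Double> a = new ArrayList<Double>();

    int size() { return a.size(); }
    Double peek() { return a.isEmpty() ? null : a.get(0); }

    void insert(double x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) { // sift up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    double remove() {
        double top = a.get(0);
        double last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;                                   // sift down
            while (true) {
                int l = 2 * i + 1, r = l + 1, m = i;
                if (l < a.size() && a.get(l) < a.get(m)) m = l;
                if (r < a.size() && a.get(r) < a.get(m)) m = r;
                if (m == i) break;
                swap(i, m);
                i = m;
            }
        }
        return top;
    }

    private void swap(int i, int j) {
        double t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }
}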

Related

Trie Autocomplete with word weight (frequency)

I was asked this during a recent phone interview -
Given a Dictionary with a word and the weight of the word (frequency, higher is better), like so -
var words = new Dictionary<string,int>();
words.Add("am",7);
words.Add("ant", 5);
words.Add("amazon", 10);
words.Add("amazing", 8);
words.Add("an", 4);
words.Add("as", 11);
words.Add("be", 8);
words.Add("bee", 2);
words.Add("bed", 4);
words.Add("best", 12);
words.Add("amuck", 1);
words.Add("amock", 2);
words.Add("bestest", 1);
Design an API method that, given a prefix and a number k, returns the top k words that match the prefix.
The words should be sorted based on their weight, the higher the better.
So, prefix = "am", k = 5, returns amazon, amazing, am, amock, amuck - in that specific order.
Performance on the prefix lookup is paramount; you can pre-process and use as much space as you like, as long as the prefix lookup is fast.
This is a Trie implementation, but my question is how best to handle the word weight and optimise the lookup. In my mind the options are -
a. For each node in the Trie, also store a sorted list of words (SortedDictionary<int, List<string>>) that start with this prefix - more space, but faster lookup.
b. For each node, store the child nodes in some kind of sorted list, so you would still need to do a DFS from each child node to get the k words needed - less space compared to a., but slower.
I decided to go with option a.
using System;
using System.Collections.Generic;
using System.Linq;

public class TrieWithSuggestions
{
    TrieWithSuggestions _trieRoot;

    public TrieWithSuggestions()
    {
    }

    public char Character { get; set; }
    public int WordCount { get; set; } = 1;
    public TrieWithSuggestions[] ChildNodes { get; set; } = new TrieWithSuggestions[26];

    // Stores all words with this prefix, keyed by negated weight so that
    // enumeration yields the heaviest words first.
    public SortedDictionary<int, HashSet<string>> PrefixWordsDictionary = new SortedDictionary<int, HashSet<string>>();

    public TrieWithSuggestions ConstructTrie(Dictionary<string, int> words)
    {
        if (words.Count > 0)
        {
            _trieRoot = new TrieWithSuggestions() { Character = default(char) };
            foreach (var word in words)
            {
                var node = _trieRoot;
                for (int i = 0; i < word.Key.Length; i++)
                {
                    var c = word.Key[i];
                    if (node.ChildNodes[c - 'a'] != null)
                    {
                        node = node.ChildNodes[c - 'a'];
                        UpdateParentNodeInformation(node, word.Key, words[word.Key]);
                        node.WordCount++;
                    }
                    else
                    {
                        InsertIntoTrie(node, word.Key, i, words);
                        break;
                    }
                }
            }
        }
        return _trieRoot;
    }

    public List<string> GetMathchingWords(string prefix, int k)
    {
        if (_trieRoot != null)
        {
            var node = _trieRoot;
            foreach (var ch in prefix)
            {
                if (node.ChildNodes[ch - 'a'] != null)
                {
                    node = node.ChildNodes[ch - 'a'];
                }
                else
                    return null;
            }
            if (node != null)
                return GetWords(node, k);
            else
                return null;
        }
        return null;
    }

    List<string> GetWords(TrieWithSuggestions node, int k)
    {
        List<string> output = new List<string>();
        foreach (var dictEntry in node.PrefixWordsDictionary)
        {
            var entries = node.PrefixWordsDictionary[dictEntry.Key];
            var take = Math.Min(entries.Count, k);
            output.AddRange(entries.Take(take).ToList());
            k -= take;
            if (k <= 0) // stop once k words have been collected
                break;
        }
        return output;
    }

    void InsertIntoTrie(TrieWithSuggestions parentNode, string word, int startIndex, Dictionary<string, int> words)
    {
        for (int i = startIndex; i < word.Length; i++)
        {
            var c = word[i];
            var childNode = new TrieWithSuggestions() { Character = c };
            parentNode.ChildNodes[c - 'a'] = childNode;
            UpdateParentNodeInformation(parentNode, word, words[word]);
            parentNode = childNode;
            if (i == word.Length - 1)
                UpdateParentNodeInformation(parentNode, word, words[word]);
        }
    }

    void UpdateParentNodeInformation(TrieWithSuggestions parentNode, string word, int wordWeight)
    {
        wordWeight *= -1; // negate so the SortedDictionary orders heaviest first
        if (parentNode.PrefixWordsDictionary.ContainsKey(wordWeight))
        {
            if (!parentNode.PrefixWordsDictionary[wordWeight].Contains(word))
                parentNode.PrefixWordsDictionary[wordWeight].Add(word);
        }
        else
            parentNode.PrefixWordsDictionary.Add(wordWeight, new HashSet<string>() { word });
    }
}
Construct Trie - Runtime O(N * M * logN), Space O(N * M * N), where N = number of words and M = average word length.
Justification -
If there were no SortedDictionary, this would be O(N * M); insertion into a SortedDictionary is O(logN), so the worst-case runtime must be O(N * M * logN).
Space seems trickier, but as before, without the SortedDictionary space would be O(N * M), and in the worst case a node's dictionary could hold all N words, so the space complexity looks like O(N * M * N).
GetMatchingWords - RunTime O(len(prefix) + k)
Function call -
var trie = new TrieWithSuggestions();
trie.ConstructTrie(words);
var list = trie.GetMathchingWords("am", 10); //amazon, amazing, am, amock, amuck
QUESTION:
Given the conditions on space and pre-processing, is there a better way to do this?
EDIT 1 -
a. Given this setup, it is best to sort the words by weight and then insert them into the Trie. In this case a simple List<string> would suffice, since higher-frequency words would have been inserted first automatically.
b. Now let's say that in addition to being initialized with a Dictionary<string, int>, we are also going to get additional (word, frequency) pairs. We would still want a lookup that is as fast as possible; given this requirement, what is now the best data structure to store the sorted list of words within a TrieNode? Is a SortedDictionary<int, HashSet<string>> still the best option?
You could first sort the input with respect to the weights. Then you could use Lists instead of Dictionaries on the nodes of the trie. Since the words come in increasing (or decreasing) order of weight, checking the last element of the list is enough to decide where to put the new word. This gets rid of the O(logN) factor contributed by the SortedDictionary.
The input can be sorted in O(N * logN) with a comparison sort, or in O(N + W) with a counting sort, where W is the maximum weight.
The time complexity of setting up the trie then becomes O(N * logN + N * M), which is better than O(N * M * logN). Query time does not change.
(The last paragraph assumes HashSet operations execute in O(1), as in the question. It is wrong to make this assumption for arbitrary inputs and hash functions.)
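A minimal sketch of that sorted-insertion idea (illustrative names; sketched in Java rather than the question's C#): if insert() is called in decreasing order of weight, each node's list stays sorted for free, and the top-k query is a prefix walk plus a sublist.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WeightedTrie {
    static class Node {
        Node[] children = new Node[26];
        List<String> wordsByWeight = new ArrayList<String>(); // filled in decreasing weight
    }

    private final Node root = new Node();

    // Precondition: words are inserted in decreasing order of weight,
    // so appending keeps every node's list sorted.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (node.children[i] == null)
                node.children[i] = new Node();
            node = node.children[i];
            node.wordsByWeight.add(word);
        }
    }

    List<String> topK(String prefix, int k) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null)
                return Collections.emptyList();
        }
        List<String> all = node.wordsByWeight;
        return all.subList(0, Math.min(k, all.size()));
    }
}

With the question's data inserted by descending weight, topK("am", 5) would yield amazon, amazing, am, amock, amuck, matching the expected order.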

Last remaining number

I was asked this question in an interview.
Given an array 'arr' of positive integers and a starting index 'k' of the array. Delete the element at k and jump arr[k] steps in the array in circular fashion. Do this repeatedly until only one element remains, then find the last remaining element.
I thought of an O(n log n) solution using an ordered map. Is an O(n) solution possible?
My guess is that there is not an O(n) solution to this problem based on the fact that it seems to involve doing something that is impossible. The obvious thing you would need to solve this problem in linear time is a data structure like an array that exposes two operations on an ordered collection of values:
O(1) order-preserving deletes from the data structure.
O(1) lookups of the nth undeleted item in the data structure.
However, such a data structure has been formally proven not to exist; see "Optimal Algorithms for List Indexing and Subset Rank" and its citations. Saying that the natural way to solve a problem requires an impossible data structure is not a proof that the problem itself is impossible, but such an intuition is often correct.
Anyway there are lots of ways to do this in O(n log n). Below is an implementation of maintaining a tree of undeleted ranges in the array. GetIndex() below returns an index into the original array given a zero-based index into the array if items had been deleted from it. Such a tree is not self-balancing so will have O(n) operations in the worst case but in the average case Delete and GetIndex will be O(log n).
using System;

namespace CircleGame
{
    class Program
    {
        class ArrayDeletes
        {
            private class UndeletedRange
            {
                private int _size;
                private int _index;
                private UndeletedRange _left;
                private UndeletedRange _right;

                public UndeletedRange(int i, int sz)
                {
                    _index = i;
                    _size = sz;
                }

                public bool IsLeaf()
                {
                    return _left == null && _right == null;
                }

                public int Size()
                {
                    return _size;
                }

                public void Delete(int i)
                {
                    if (i >= _size)
                        throw new IndexOutOfRangeException();
                    if (!IsLeaf())
                    {
                        int left_range = _left._size;
                        if (i < left_range)
                            _left.Delete(i);
                        else
                            _right.Delete(i - left_range);
                        _size--;
                        return;
                    }
                    if (i == _size - 1)
                    {
                        _size--; // Can delete the last item in a range by decrementing its size
                        return;
                    }
                    if (i == 0) // Can delete the first item in a range by incrementing the index
                    {
                        _index++;
                        _size--;
                        return;
                    }
                    // Deleting from the middle of a leaf splits it in two.
                    _left = new UndeletedRange(_index, i);
                    int right_index = i + 1;
                    _right = new UndeletedRange(_index + right_index, _size - right_index);
                    _size--;
                    _index = -1; // the index field of a non-leaf is no longer necessarily valid.
                }

                public int GetIndex(int i)
                {
                    if (i >= _size)
                        throw new IndexOutOfRangeException();
                    if (IsLeaf())
                        return _index + i;
                    int left_range = _left._size;
                    if (i < left_range)
                        return _left.GetIndex(i);
                    else
                        return _right.GetIndex(i - left_range);
                }
            }

            private UndeletedRange _root;

            public ArrayDeletes(int n)
            {
                _root = new UndeletedRange(0, n);
            }

            public void Delete(int i)
            {
                _root.Delete(i);
            }

            public int GetIndex(int indexRelativeToDeletes)
            {
                return _root.GetIndex(indexRelativeToDeletes);
            }

            public int Size()
            {
                return _root.Size();
            }
        }

        static int CircleGame(int[] array, int k)
        {
            var ary_deletes = new ArrayDeletes(array.Length);
            while (ary_deletes.Size() > 1)
            {
                int next_step = array[ary_deletes.GetIndex(k)];
                ary_deletes.Delete(k);
                k = (k + next_step - 1) % ary_deletes.Size();
            }
            return array[ary_deletes.GetIndex(0)];
        }

        static void Main(string[] args)
        {
            var array = new int[] { 5, 4, 3, 2, 1 };
            int last_remaining = CircleGame(array, 2); // third element; this call is zero-based...
        }
    }
}
Also note that if the values in the array are known to be bounded such that they are always less than some m less than n, there are lots of O(nm) algorithms -- for example, just using a circular linked list.
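As an illustration of that linked-list approach (a hedged sketch in Java, not from the original answer; assumes positive step values as the question states):

class CircularListGame {
    static class Node {
        final int index; // position in the original array
        Node next;
        Node(int index) { this.index = index; }
    }

    // O(n*m) simulation, where m bounds the values in the array:
    // each deletion walks at most m nodes.
    static int lastRemaining(int[] arr, int k) {
        // Build the circle and position prev so that prev.next is element k.
        Node head = new Node(0), cur = head;
        for (int i = 1; i < arr.length; i++) { cur.next = new Node(i); cur = cur.next; }
        cur.next = head; // close the circle; cur is now the last node
        Node prev = cur;
        for (int i = 0; i < k; i++) prev = prev.next;

        int size = arr.length;
        while (size > 1) {
            Node victim = prev.next;
            int step = arr[victim.index];
            prev.next = victim.next; // delete the element at the current position
            size--;
            // Mirror k = (k + next_step - 1) % size from the tree version above.
            for (int i = 0; i < (step - 1) % size; i++) prev = prev.next;
        }
        return arr[prev.next.index];
    }
}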
I couldn't think of an O(n) solution. However, we could have O(n log n) average time by using a treap or an augmented BST with a value in each node for the size of its subtree. The treap enables us to find and remove the kth entry in O(log n) average time.
For example, A = [1, 2, 3, 4] and k = 3 (as Sumit reminded me in the comments, use the array indexes as values in the tree since those are ordered):
        2(0.9)
       /      \
  1(0.81)    4(0.82)
             /
        3(0.76)
Find and remove 3rd element. Start at 2 with size = 2 (including the left subtree). Go right. Left subtree is size 1, which together makes 3, so we found the 3rd element. Remove:
        2(0.9)
       /      \
  1(0.81)    4(0.82)
Now we're starting on the third element in an array with n - 1 = 3 elements and looking for the 3rd element from there. We'll use zero-indexing to correlate with our modular arithmetic, so the third element in modulus 3 would be 2 and 2 + 3 = 5 mod 3 = 2, the second element. We find it immediately since the root with its left subtree is size 2. Remove:
    4(0.82)
    /
  1(0.81)
Now we're starting on the second element in modulus 2, so 1, and we're adding 2. 3 mod 2 is 1. Removing the first element we are left with 4 as the last element.
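For concreteness, the rank query those steps rely on looks like this in a size-augmented BST (a sketch only; the balancing and deletion that the treap provides are omitted):

class OrderStatistics {
    static class Node {
        int key, size = 1; // size = number of nodes in this subtree
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int size(Node n) { return n == null ? 0 : n.size; }

    // Returns the k-th smallest key (0-based); assumes 0 <= k < size(root).
    static int kth(Node root, int k) {
        int leftSize = size(root.left);
        if (k < leftSize) return kth(root.left, k);
        if (k == leftSize) return root.key;
        return kth(root.right, k - leftSize - 1);
    }
}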

Implementing queue using stack

This is a question from homework:
Implement a FIFO queue using two stacks.
The total running time of Enqueue and Dequeue functions should be O(n) in the worst case scenario. Also, analyze the running time of the algorithm.
What I did:
void Enqueue(T *value)
{
    s1.Push(value);
}

T *Dequeue()
{
    if (s2.size > 0)
        return s2.Pop();
    else if (s1.size > 0)
    {
        // Drain s1 completely; s1.size shrinks as items are popped,
        // so loop until empty rather than counting up to the initial size.
        while (s1.size > 0)
            s2.Push(s1.Pop());
        return s2.Pop();
    }
    else return NULL;
}
Analysis of the algorithm:
Running time of one Enqueue is Theta(1). Total running time of all the Enqueue calls is n * Theta(1) = Theta(n).
Running time of Dequeue in the worst-case scenario is Theta(n) (when you call it right after the last Enqueue, i.e. when all the inserted items are still on the first stack). In all other cases the running time is Theta(1).
So, the total running time is:
O(n) + O(n) + n * O(1) = 3 * O(n) = O(n)
Is this correct?
So, the total running time is: O(n) + O(n) + n * O(1) = 3 * O(n) = O(n)
You're headed in the right direction, but you usually don't analyze "total running time"; you split it into amortized, worst case, and best case - and analyze each operation separately.
In your algorithm, it is easy to see that:
enqueue() runs in Theta(1) for all cases.
dequeue() runs in Theta(n) worst case and Theta(1) best case.
Now, for the tricky part - we need to do the amortized analysis of dequeue().
First, note that before each Theta(n) (worst-case) dequeue(), you must have emptied the second stack and inserted n elements.
This means that in order for the worst case to happen, you must have done at least n enqueue() operations, each Theta(1).
This gives us an amortized time of:
T(n) = (n*CONST1 + CONST2*n) / (n+1)
           ^           ^          ^
      n enqueues  1 "expensive"  #operations
                     dequeue
It is easy to see that T(n) is in Theta(1), giving you Theta(1) amortized time complexity.
tl;dr:
enqueue: Theta(1) all cases
dequeue: Theta(1) amortized, Theta(n) worst case
import java.util.Stack;

public class Q5_ImplementQueueUsingStack {
    /**
     * Implement a MyQueue class which implements a queue using two stacks.
     */
    public static Stack<Integer> s1 = new Stack<Integer>();
    public static Stack<Integer> s2 = new Stack<Integer>();

    public static void main(String[] args) {
        int[] array = {2,5,10,3,11,7,13,8,9,4,1,6};
        for (int i = 0; i < 5; i++) {
            enQueue(array[i]);
        }
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        for (int i = 0; i < 4; i++) {
            enQueue(array[i + 5]);
        }
        System.out.println(deQueue());
        for (int i = 0; i < 3; i++) {
            enQueue(array[i + 9]);
        }
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
        System.out.println(deQueue());
    }

    public static void enQueue(int data) {
        s1.push(data);
    }

    public static int deQueue() {
        // Refill s2 from s1 only when s2 is empty; this reverses the
        // LIFO order of s1 back into FIFO order.
        if (s2.isEmpty())
            while (!s1.isEmpty())
                s2.push(s1.pop());
        return s2.pop();
    }
}
Implement the following operations of a queue using stacks.
push(x) -- Push element x to the back of queue.
pop() -- Removes the element from in front of queue.
peek() -- Get the front element.
empty() -- Return whether the queue is empty.
class MyQueue {
    Stack<Integer> input;
    Stack<Integer> output;

    /** Initialize your data structure here. */
    public MyQueue() {
        input = new Stack<Integer>();
        output = new Stack<Integer>();
    }

    /** Push element x to the back of queue. */
    public void push(int x) {
        input.push(x);
    }

    /** Removes the element from in front of queue and returns that element. */
    public int pop() {
        peek(); // ensures output holds the front element
        return output.pop();
    }

    /** Get the front element. */
    public int peek() {
        if (output.isEmpty()) {
            while (!input.isEmpty()) {
                output.push(input.pop());
            }
        }
        return output.peek();
    }

    /** Returns whether the queue is empty. */
    public boolean empty() {
        return input.isEmpty() && output.isEmpty();
    }
}
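A short usage sketch:

MyQueue q = new MyQueue();
q.push(1);
q.push(2);
q.peek();  // returns 1
q.pop();   // returns 1
q.empty(); // returns false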

find number of 1s in a 0-1 array with all 1s “on the left”?

An array consists of N many 1's and 0's, and all 1's come before any 0's. Find the number of 1's in the array. It is clear that with binary search this is O(log N). Is there an algorithm that does it in O(log(number of 1's)) time?
You can actually do it in O(lg m) time, where m is the number of 1s. I won't give the entire algorithm since this looks like homework, but here's a hint: try to "reverse" a binary search so that it expands the search area rather than contracting it.
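One possible shape of that expanding search (often called exponential or galloping search), sketched under the same 0/1 array assumptions as the snippet further below:

class CountOnes {
    // O(lg m), where m is the number of 1s: double the probe index until a 0
    // (or the end of the array) is seen, then binary-search the last interval.
    static int countOnes(byte[] a) {
        if (a.length == 0 || a[0] == 0) return 0;
        int hi = 1;
        while (hi < a.length && a[hi] == 1) hi *= 2;
        int lo = hi / 2;                       // a[lo] is known to be 1
        hi = Math.min(hi, a.length - 1);
        while (lo < hi) {                      // find the last 1 in a[lo..hi]
            int mid = (lo + hi + 1) / 2;
            if (a[mid] == 1) lo = mid;
            else hi = mid - 1;
        }
        return lo + 1;
    }
}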
If you just iterate over the array counting 1's until you finally find a 0, you have made N+1 steps, so that is an O(N) algorithm in my opinion.
class OneZero
{
    // Recursively finds the index of the last occurrence of `value`
    // (here, the last 1) in an array where all 1's precede all 0's;
    // returns -1 if it is absent.
    public static int binary_search(byte[] array, int start, int end, int value)
    {
        if (end <= start) return -1;
        int mid = (start + end) / 2;
        if (array[mid] == value)
        {
            // mid holds a 1: the last 1 is at mid or to its right.
            int right = binary_search(array, mid + 1, end, value);
            return (right == -1) ? mid : right;
        }
        // mid holds a 0: the last 1, if any, is to the left.
        return binary_search(array, start, mid, value);
    }

    public static int nbOfOnes(byte[] array, int value)
    {
        return binary_search(array, 0, array.length, value) + 1;
    }

    public static void main(String[] args)
    {
        byte[] arr = { 1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0 };
        System.out.println(" number of 1s is: " + nbOfOnes(arr, 1));
    }
}

Calculate Space and Time complexity and Improve efficiency of this program

Problem
Find the list of non-repeating numbers in an array of repeating numbers.
My Solution
public static int[] FindNonRepeatedNumber(int[] input)
{
    List<int> nonRepeated = new List<int>();
    bool repeated = false;
    for (int i = 0; i < input.Length; i++)
    {
        repeated = false;
        for (int j = 0; j < input.Length; j++)
        {
            if ((input[i] == input[j]) && (i != j))
            {
                // this means the element is repeated.
                repeated = true;
                break;
            }
        }
        if (!repeated)
        {
            nonRepeated.Add(input[i]);
        }
    }
    return nonRepeated.ToArray();
}
Time and space complexity
Time complexity = O(n^2)
Space complexity = O(n)
I am not sure about the above calculated time complexity; also, how can I make this program more efficient and faster?
The complexity of the algorithm you provided is O(n^2).
Use a hash map to improve the algorithm. The pseudocode is as follows:
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public static void FindNonRepeatedNumbers(int[] A)
{
    Map<Integer, Integer> testMap = new HashMap<Integer, Integer>();
    // Count occurrences: key = array element, value = how often it occurs.
    for (int i = 0; i < A.length; i++) {
        Integer tmp = testMap.get(A[i]);
        testMap.put(A[i], tmp == null ? 1 : tmp + 1);
    }
    // Elements that are not repeated are those whose count is exactly 1.
    Iterator<Map.Entry<Integer, Integer>> it = testMap.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Integer> me = it.next();
        if (me.getValue() == 1) {
            System.out.println(me.getKey());
        }
    }
}
Operation:
What I did here is use a hash map whose keys are the elements of the input array. The values act as counters for each element: if an element occurs once, the value for that key is 1, and the value is incremented on every recurrence of the element in the input array.
Finally you just scan the hash map and display the elements whose count is 1; those are the non-repeated elements. The time complexity of this algorithm is O(k) for building the hash map and O(k) for scanning it, where k is the length of the input array. This is much faster than O(n^2). The worst case is when there are no repeated elements at all. The pseudocode might be messy, but this approach is the best way I could think of.
A time complexity of O(n) means you can't have an inner loop over the input; a full inner loop makes it O(n^2).
Two pointers, beginning and end: increment beginning when equal elements are reached, storing the start position, end position, and length for reference; otherwise increment end. Keep doing this until the end of the list, then compare all the stored outputs and you should have the longest continuous run of unique numbers. I hope this is what the question required. A linear algorithm.
int longestcontinuousunique(int arr[])
{
    int start = 0;
    int end = 0;
    while (end != arr.length)
    {
        if (arr[start] == arr[end])
        {
            start++;
            savetolist(start, end, end - start); // pseudocode: record the run
        }
        else
            end++;
    }
    return maxelementof(savedlist); // pseudocode: longest recorded run
}
