How to validate if a B-tree is sorted - algorithm

I just had this as an interview question and was wondering if anyone knows the answer?
Write a method that validates whether a B-tree is correctly sorted. You do NOT need to validate whether
the tree is balanced. It was to be done in Java, using the following model for a node in the B-tree:
class Node {
List<Integer> keys;
List<Node> children;
}

One (space-inefficient but simple) way to do this is to do a generalized inorder traversal of the B-tree to get back the keys in what should be sorted order, then to check whether that sequence actually is in sorted order. Here's some quick code for this:
public static boolean isSorted(Node root) {
ArrayList<Integer> values = new ArrayList<Integer>();
performInorderTraversal(root, values);
return isArraySorted(values);
}
private static void performInorderTraversal(Node root, ArrayList<Integer> result) {
/* An empty tree has no values. */
if (root == null) return;
/* Process the first tree here, then loop, processing the interleaved
* keys and trees.
*/
performInorderTraversal(root.children.get(0), result);
for (int i = 1; i < root.children.size(); i++) {
result.add(root.keys.get(i - 1));
performInorderTraversal(root.children.get(i), result);
}
}
private static boolean isArraySorted(ArrayList<Integer> array) {
for (int i = 0; i < array.size() - 1; i++) {
if (array.get(i) >= array.get(i + 1)) return false;
}
return true;
}
This takes time O(n) and uses space O(n), where n is the number of elements in the B-tree. You can cut the space usage down to O(h), where h is the height of the B-tree, by not storing all the elements during the traversal: just track the very last value seen and stop the search early if the next value encountered is not larger than the previous one. I didn't do that here because it takes more code, but conceptually it's not too hard.
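Here's a rough, untested sketch of that O(h)-space variant (it assumes, like the code above, that a node with k keys has k + 1 children and that absent children are null):
public static boolean isSortedStreaming(Node root) {
    /* Box the last-seen key so recursive calls can update it; null = none seen yet. */
    return isSortedStreaming(root, new Integer[1]);
}
private static boolean isSortedStreaming(Node root, Integer[] lastKey) {
    if (root == null) return true; /* An empty subtree is trivially sorted. */
    /* Visit child 0, then key 0, child 1, key 1, ..., key k-1, child k. */
    if (!isSortedStreaming(root.children.get(0), lastKey)) return false;
    for (int i = 0; i < root.keys.size(); i++) {
        int key = root.keys.get(i);
        if (lastKey[0] != null && key <= lastKey[0]) return false; /* Out of order: stop early. */
        lastKey[0] = key;
        if (!isSortedStreaming(root.children.get(i + 1), lastKey)) return false;
    }
    return true;
}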
Hope this helps!

Related

Trie Autocomplete with word weight (frequency)

I was asked this during a recent phone interview -
Given a Dictionary of words and their weights (frequency, higher is better), like so -
var words = new Dictionary<string,int>();
words.Add("am",7);
words.Add("ant", 5);
words.Add("amazon", 10);
words.Add("amazing", 8);
words.Add("an", 4);
words.Add("as", 11);
words.Add("be", 8);
words.Add("bee", 2);
words.Add("bed", 4);
words.Add("best", 12);
words.Add("amuck", 1);
words.Add("amock", 2);
words.Add("bestest", 1);
Design an API method that, given a prefix and a number k, returns the top k words that match the prefix.
The words should be sorted by weight, the higher the better.
So, prefix = "am", k = 5, returns amazon, amazing, am, amock, amuck - in that specific order.
Performance on the prefix lookup is paramount; you can pre-process and use as much space as you like, as long as the prefix lookup is fast.
This is a Trie implementation, but my question is how best to handle the word weight and optimise the lookup. In my mind the options are -
a. For each node in the Trie, also store a sorted list of words (SortedDictionary<int,List<string>>) that start with this prefix - more space, but faster lookup.
b. For each node, store the Child nodes in some kind of sorted list, so you would still need to do a DFS for each child node to get the K words needed - less space compared to a., but slower.
I decided to go with option a.
public class TrieWithSuggestions
{
TrieWithSuggestions _trieRoot;
public TrieWithSuggestions()
{
}
public char Character { get; set; }
public int WordCount { get; set; } = 1;
public TrieWithSuggestions[] ChildNodes { get; set; } = new TrieWithSuggestions[26];
//Stores all words with this prefix.
public SortedDictionary<int, HashSet<string>> PrefixWordsDictionary = new SortedDictionary<int, HashSet<string>>();
public TrieWithSuggestions ConstructTrie(Dictionary<string, int> words)
{
if (words.Count > 0)
{
_trieRoot = new TrieWithSuggestions() { Character = default(char) };
foreach (var word in words)
{
var node = _trieRoot;
for (int i = 0; i < word.Key.Length; i++)
{
var c = word.Key[i];
if (node.ChildNodes[c - 'a'] != null)
{
node = node.ChildNodes[c - 'a'];
UpdateParentNodeInformation(node, word.Key, words[word.Key]);
node.WordCount++;
}
else
{
InsertIntoTrie(node, word.Key, i, words);
break;
}
}
}
}
return _trieRoot;
}
public List<string> GetMatchingWords(string prefix, int k)
{
if (_trieRoot != null)
{
var node = _trieRoot;
foreach (var ch in prefix)
{
if (node.ChildNodes[ch - 'a'] != null)
{
node = node.ChildNodes[ch - 'a'];
}
else
return null;
}
if (node != null)
return GetWords(node, k);
else
return null;
}
return null;
}
List<string> GetWords(TrieWithSuggestions node, int k)
{
List<string> output = new List<string>();
foreach (var dictEntry in node.PrefixWordsDictionary)
{
var entries = dictEntry.Value;
var take = Math.Min(entries.Count, k);
output.AddRange(entries.Take(take));
k -= take;
if (k <= 0)
break;
}
return output;
}
void InsertIntoTrie(TrieWithSuggestions parentNode, string word, int startIndex, Dictionary<string, int> words)
{
for (int i = startIndex; i < word.Length; i++)
{
var c = word[i];
var childNode = new TrieWithSuggestions() { Character = c };
parentNode.ChildNodes[c - 'a'] = childNode;
UpdateParentNodeInformation(parentNode, word, words[word]);
parentNode = childNode;
if (i == word.Length - 1)
UpdateParentNodeInformation(parentNode, word, words[word]);
}
}
void UpdateParentNodeInformation(TrieWithSuggestions parentNode, string word, int wordWeight)
{
wordWeight *= -1;
if (parentNode.PrefixWordsDictionary.ContainsKey(wordWeight))
{
if (!parentNode.PrefixWordsDictionary[wordWeight].Contains(word))
parentNode.PrefixWordsDictionary[wordWeight].Add(word);
}
else
parentNode.PrefixWordsDictionary.Add(wordWeight, new HashSet<string>() { word });
}
}
Construct Trie - Runtime O(N * M * log N), Space - O(N * M * N), where N = # of words, M = average word length.
Justification -
If there were no SortedDictionary, this would be O(N * M); insertion into a SortedDictionary is O(log N), so the worst-case runtime must be O(N * M * log N).
Space seems trickier, but as before, if there were no SortedDictionary, space would be O(N * M); in the worst case a node's dictionary could hold all N words, so the space complexity looks like O(N * M * N).
GetMatchingWords - RunTime O(len(prefix) + k)
Function call -
var trie = new TrieWithSuggestions();
trie.ConstructTrie(words);
var list = trie.GetMatchingWords("am", 10); //amazon, amazing, am, amock, amuck
QUESTION:
Given the conditions on space and pre-processing, is there a better way to do this?
EDIT 1 -
a. Given this setup, it is best to sort the words by weight and then insert into the Trie. In this case a simple List<string> would suffice, since higher frequency words would have been inserted first automatically.
b. Now let's say that, in addition to being initialized with a Dictionary<string,int>, we are also going to receive additional (word, frequency) pairs. We would still want a lookup that is as fast as possible; given this requirement, what is now the best data structure to store the sorted list of words within a TrieNode - is a SortedDictionary<int,HashSet<string>> the best option?
You could first sort the input with respect to the weights. Then, you could use Lists instead of Dictionaries on the nodes of the trie. Since the words come in increasing (or decreasing) order of weight, checking the last element of the list is enough to decide where to put each new word. This gets rid of the O(log N) time taken by the SortedDictionary.
The input can be sorted in O(N * logN) with a comparison sort, or in O(N + W) with a counting sort where W is the maximum weight.
Time complexity of setting up the trie becomes O(N * log N + N * M). This is better than O(N * M * log N). Query time does not change.
(Last paragraph assumes HashSet operations execute in O(1) as in the question. It is wrong to make this assumption for arbitrary inputs and hash functions.)
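A minimal Java sketch of that sort-once idea (hypothetical names throughout; it assumes lowercase a-z words and a one-time build, as in EDIT 1a):
import java.util.*;

class WeightedTrie {
    static class TrieNode {
        TrieNode[] children = new TrieNode[26];
        List<String> wordsByWeight = new ArrayList<>(); // stays sorted by construction
    }

    TrieNode root = new TrieNode();

    WeightedTrie(Map<String, Integer> words) {
        // Sort once, heaviest first; afterwards every per-node insert is a plain append.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(words.entrySet());
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Integer> e : sorted) {
            TrieNode node = root;
            for (char c : e.getKey().toCharArray()) {
                int i = c - 'a';
                if (node.children[i] == null) node.children[i] = new TrieNode();
                node = node.children[i];
                node.wordsByWeight.add(e.getKey()); // O(1), already in weight order
            }
        }
    }

    List<String> topK(String prefix, int k) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) return Collections.emptyList();
        }
        return new ArrayList<>(node.wordsByWeight.subList(0, Math.min(k, node.wordsByWeight.size())));
    }
}
With the question's example input, new WeightedTrie(words).topK("am", 5) would return amazon, amazing, am, amock, amuck.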

Last remaining number

I was asked this question in an interview.
Given an array 'arr' of positive integers and a starting index 'k' into the array: delete the element at k and jump arr[k] steps ahead in the array in circular fashion. Do this repeatedly until only one element remains. Find that last remaining element.
I thought of an O(n log n) solution using an ordered map. Is an O(n) solution possible?
My guess is that there is no O(n) solution to this problem, because solving it seems to require doing something impossible. The obvious thing you would need to solve this problem in linear time is an array-like data structure that exposes two operations on an ordered collection of values:
O(1) order-preserving deletes from the data structure.
O(1) lookups of the nth undeleted item in the data structure.
However, such a data structure has been formally proven not to exist; see "Optimal Algorithms for List Indexing and Subset Rank" and its citations. Saying that a problem is probably impossible because the natural way to solve it requires an impossible data structure is not a proof, but such an intuition is often correct.
Anyway, there are lots of ways to do this in O(n log n). Below is an implementation that maintains a tree of undeleted ranges in the array. GetIndex() below returns an index into the original array, given a zero-based index into the array as it would look after the deletions. Such a tree is not self-balancing, so operations will be O(n) in the worst case, but in the average case Delete and GetIndex will be O(log n).
namespace CircleGame
{
class Program
{
class ArrayDeletes
{
private class UndeletedRange
{
private int _size;
private int _index;
private UndeletedRange _left;
private UndeletedRange _right;
public UndeletedRange(int i, int sz)
{
_index = i;
_size = sz;
}
public bool IsLeaf()
{
return _left == null && _right == null;
}
public int Size()
{
return _size;
}
public void Delete(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (! IsLeaf())
{
int left_range = _left._size;
if (i < left_range)
_left.Delete(i);
else
_right.Delete(i - left_range);
_size--;
return;
}
if (i == _size - 1)
{
_size--; // Can delete the last item in a range by decrementing its size
return;
}
if (i == 0) // Can delete the first item in a range by incrementing the index
{
_index++;
_size--;
return;
}
_left = new UndeletedRange(_index, i);
int right_index = i + 1;
_right = new UndeletedRange(_index + right_index, _size - right_index);
_size--;
_index = -1; // the index field of a non-leaf is no longer necessarily valid.
}
public int GetIndex(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (IsLeaf())
return _index + i;
int left_range = _left._size;
if (i < left_range)
return _left.GetIndex(i);
else
return _right.GetIndex(i - left_range);
}
}
private UndeletedRange _root;
public ArrayDeletes(int n)
{
_root = new UndeletedRange(0, n);
}
public void Delete(int i)
{
_root.Delete(i);
}
public int GetIndex(int indexRelativeToDeletes )
{
return _root.GetIndex(indexRelativeToDeletes);
}
public int Size()
{
return _root.Size();
}
}
static int CircleGame( int[] array, int k )
{
var ary_deletes = new ArrayDeletes(array.Length);
while (ary_deletes.Size() > 1)
{
int next_step = array[ary_deletes.GetIndex(k)];
ary_deletes.Delete(k);
k = (k + next_step - 1) % ary_deletes.Size();
}
return array[ary_deletes.GetIndex(0)];
}
static void Main(string[] args)
{
var array = new int[] { 5,4,3,2,1 };
int last_remaining = CircleGame(array, 2); // third element, this call is zero-based...
}
}
}
Also note that if the values in the array are known to be bounded by some m less than n, there are lots of O(nm) algorithms -- for example, just using a circular linked list, as sketched below.
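A rough sketch of that circular-list approach (hypothetical names; it mirrors the stepping convention of the code above, where the element right after a deleted one is one step away):
static int lastRemainingCircular(int[] arr, int k) {
    // Local node type for a circular singly linked list.
    class ListNode { int val; ListNode next; ListNode(int v) { val = v; } }
    ListNode head = new ListNode(arr[0]), tail = head;
    for (int i = 1; i < arr.length; i++) {
        tail.next = new ListNode(arr[i]);
        tail = tail.next;
    }
    tail.next = head;                        // close the circle
    ListNode prev = tail;                    // prev.next is the current element
    for (int i = 0; i < k; i++) prev = prev.next;
    int remaining = arr.length;
    while (remaining > 1) {
        ListNode doomed = prev.next;
        prev.next = doomed.next;             // unlink the current element
        remaining--;
        // Jump doomed.val steps; with values bounded by m, each jump walks at most m links.
        for (int i = (doomed.val - 1) % remaining; i > 0; i--) prev = prev.next;
    }
    return prev.next.val;
}
Each deletion walks at most m links, so the total work is O(nm).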
I couldn't think of an O(n) solution. However, we could have O(n log n) average time by using a treap or an augmented BST with a value in each node for the size of its subtree. The treap enables us to find and remove the kth entry in O(log n) average time.
For example, A = [1, 2, 3, 4] and k = 3 (as Sumit reminded me in the comments, use the array indexes as values in the tree since those are ordered):
2(0.9)
/ \
1(0.81) 4(0.82)
/
3(0.76)
Find and remove the 3rd element. Start at the root, 2, which covers 2 elements (itself plus its left subtree), so go right. There, the left subtree has size 1, which together makes 3, so we found the 3rd element. Remove:
2(0.9)
/ \
1(0.81) 4(0.82)
Now we're starting at the third element of an array that has n - 1 = 3 elements left, and looking for the 3rd element from there. We'll use zero-indexing to correlate with our modular arithmetic, so the third element in modulus 3 would be 2, and 2 + 3 = 5 mod 3 = 2, the second element. We find it immediately, since the root with its left subtree has size 2. Remove:
4(0.82)
/
1(0.81)
Now we're starting at the second element in modulus 2, so 1, and we're adding 2: 3 mod 2 is 1. Removing the first element, we are left with 4 as the last element.
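The same O(n log n) bookkeeping can also be done with a Fenwick (binary indexed) tree of 0/1 "alive" marks instead of a treap -- a sketch with hypothetical names, following the stepping convention of the C# answer above:
static int lastRemaining(int[] arr, int k) {
    int n = arr.length;
    int[] bit = new int[n + 1];                   // Fenwick tree over alive flags
    for (int i = 1; i <= n; i++) add(bit, i, 1);  // every element starts alive
    int remaining = n;
    int pos = k;                                  // zero-based rank among survivors
    while (remaining > 1) {
        int slot = findKth(bit, pos + 1);         // 1-based array slot of that rank
        add(bit, slot, -1);                       // delete it
        remaining--;
        pos = (pos + arr[slot - 1] - 1) % remaining;
    }
    return arr[findKth(bit, 1) - 1];
}

static void add(int[] bit, int i, int delta) {
    for (; i < bit.length; i += i & -i) bit[i] += delta;
}

static int findKth(int[] bit, int k) {            // smallest i with prefixSum(i) >= k
    int pos = 0;
    for (int pw = Integer.highestOneBit(bit.length - 1); pw > 0; pw >>= 1) {
        if (pos + pw < bit.length && bit[pos + pw] < k) {
            pos += pw;
            k -= bit[pos];
        }
    }
    return pos + 1;
}
Here both the find-kth and the delete are O(log n) worst case, with no balancing or randomization needed.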

Algorithm to list unique permutations of string with duplicate letters

For example, string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc
What's a good algorithm for generating the permutations? (And what's its time complexity?)
For a multiset, you can solve recursively by position (JavaScript code):
function f(multiset,counters,result){
if (counters.every(x => x === 0)){
console.log(result);
return;
}
for (var i=0; i<counters.length; i++){
if (counters[i] > 0){
var _counters = counters.slice();
_counters[i]--;
f(multiset,_counters,result + multiset[i]);
}
}
}
f(['A','B'],[3,3],'');
This is not a full answer, just an idea.
If your strings have a fixed alphabet of only two letters, I'd go with a binary tree and a good recursive function.
Each node is an object whose name is its parent's name plus the suffix A or B; it also stores the counts of A and B letters in that name.
The node constructor receives the parent's name and its A and B counts, so it only needs to increment one count and append one letter to the name.
It doesn't construct a child node if that child would have more than three A's (for an A child) or B's respectively, or if the counts already sum to the length of the starting string.
Now you can collect the leaves of the two trees (their names) and you have all the permutations you need.
Scala or some functional language (with object-like features) would be perfect for implementing this algorithm. Hope this helps or just sparks some ideas.
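A minimal Java sketch of that idea (hypothetical names; here the tree nodes are simply the recursive calls, each carrying the prefix built so far and the remaining letter counts):
static void permutations(String prefix, int aLeft, int bLeft, List<String> out) {
    if (aLeft == 0 && bLeft == 0) {   // leaf: one complete permutation
        out.add(prefix);
        return;
    }
    if (aLeft > 0) permutations(prefix + 'A', aLeft - 1, bLeft, out);  // A child
    if (bLeft > 0) permutations(prefix + 'B', aLeft, bLeft - 1, out);  // B child
}
Calling permutations("", 3, 3, result) with a java.util.List fills result with all 20 unique arrangements of "AAABBB".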
Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;
public class UniquePermutations
{
static void generateUniquePermutations(String s, Consumer<String> consumer)
{
char[] array = s.toCharArray();
Arrays.sort(array);
for (;;)
{
consumer.accept(String.valueOf(array));
int changePos=array.length-2;
while (changePos>=0 && array[changePos]>=array[changePos+1])
--changePos;
if (changePos<0)
break; //all done
int swapPos=changePos+1;
while(swapPos+1 < array.length && array[swapPos+1]>array[changePos])
++swapPos;
char t = array[changePos];
array[changePos] = array[swapPos];
array[swapPos] = t;
for (int i=changePos+1, j = array.length-1; i < j; ++i,--j)
{
t = array[i];
array[i] = array[j];
array[j] = t;
}
}
}
public static void main (String[] args) throws java.lang.Exception
{
StringBuilder line = new StringBuilder();
generateUniquePermutations("banana", s->{
if (line.length() > 0)
{
if (line.length() + s.length() >= 75)
{
System.out.println(line.toString());
line.setLength(0);
}
else
line.append(" ");
}
line.append(s);
});
System.out.println(line);
}
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa

Finding the Space Complexity for these methods

I'm familiar with the running time of both of the following methods, which is O(N). However, I'm not familiar with space complexity. Since these methods consist simply of assignment statements, comparison statements, and loops, I assume that the space complexity is just O(1), but I want to make sure. Thank you.
//first method
public static <E> Node<E> buildList(E[] items) {
Node<E> head = null;
if (items!=null && items.length>0) {
head = new Node<E> (items[0], null);
Node<E> tail = head;
for (int i=1; i<items.length; i++) {
tail.next = new Node<E>(items[i], null);
tail = tail.next;
}
}
return head;
}
//second method
public static <E> int getLength(Node<E> head) {
int length = 0;
Node<E> node = head;
while (node!=null) {
length++;
node = node.next;
}
return length;
}
As described in this post, space complexity is related to the amount of memory used by the algorithm, depending on the size of the input.
For instance an algorithm with O(1) space complexity uses a fixed amount of memory, independently of the input size. This is the case of your second algorithm that basically only uses a few extra variables.
An algorithm with O(n) space complexity uses an amount of memory proportional to its input size. This is the case of your first algorithm that performs one allocation (one new) for each element of the input.

Calculate Space and Time complexity and Improve efficiency of this program

Problem
Find the non-repeating numbers in an array of repeating numbers.
My Solution
public static int[] FindNonRepeatedNumber(int[] input)
{
List<int> nonRepeated = new List<int>();
bool repeated = false;
for (int i = 0; i < input.Length; i++)
{
repeated = false;
for (int j = 0; j < input.Length; j++)
{
if ((input[i] == input[j]) && (i != j))
{
//this means the element is repeated.
repeated = true;
break;
}
}
if (!repeated)
{
nonRepeated.Add(input[i]);
}
}
return nonRepeated.ToArray();
}
Time and space complexity
Time complexity = O(n^2)
Space complexity = O(n)
I am not sure about the time complexity calculated above; also, how can I make this program more efficient and faster?
The complexity of the algorithm you provided is O(n^2).
Use a hash map to improve it. In Java:
import java.util.*;

public static int[] findNonRepeatedNumbers(int[] a) {
    // Count how many times each element occurs.
    Map<Integer, Integer> counts = new HashMap<>();
    for (int value : a) {
        counts.merge(value, 1, Integer::sum);
    }
    // Collect the elements that occur exactly once, preserving input order.
    List<Integer> nonRepeated = new ArrayList<>();
    for (int value : a) {
        if (counts.get(value) == 1) {
            nonRepeated.add(value);
        }
    }
    return nonRepeated.stream().mapToInt(Integer::intValue).toArray();
}
Operation:
The keys of the hash map are the elements of the input array, and the values act as counters: an element's count is 1 on its first occurrence and is incremented on every recurrence. Finally you scan the map (or the array) and report the elements whose count is 1, which are the non-repeated elements.
The time complexity is O(k) for building the map and O(k) for the scan, where k is the length of the input array. This is much faster than O(n^2). The worst case is when there are no repeated elements at all. This approach is the best way I could think of.
Time complexity O(n) means you can't afford a full inner loop over the input; that would make it O(n^2).
Two pointers, beginning and end: advance the beginning when a repeated number is reached (recording the start position, end position, and length for reference); otherwise advance the end. Keep doing this until the end of the list, then compare all the recorded windows and you should have the longest continuous run of unique numbers. I hope this is what the question required. Linear algorithm; a cleaned-up sketch:
// Sliding window: a set tracks the numbers currently in the window, so
// duplicates anywhere in the window (not just at the start) are detected.
// Each element enters and leaves the window at most once, hence O(n) average.
// Uses java.util.{Arrays, HashSet, Set}.
static int[] longestContinuousUnique(int[] arr) {
    Set<Integer> window = new HashSet<>();
    int bestStart = 0, bestLen = 0, start = 0;
    for (int end = 0; end < arr.length; end++) {
        while (!window.add(arr[end])) {
            window.remove(arr[start++]);   // shrink from the left until unique
        }
        if (end - start + 1 > bestLen) {
            bestLen = end - start + 1;
            bestStart = start;
        }
    }
    return Arrays.copyOfRange(arr, bestStart, bestStart + bestLen);
}
