What is the complexity of creating a lexicographic tree?

If you create a prefix tree (trie) out of your input, you can perform this query in time that does not depend on the number of words.
Edit: The query is linear in the length of the search string. I meant that it was constant with regard to the size of the word list.
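A minimal sketch of the prefix-tree idea, assuming the words are plain strings and that only the "is this a prefix of some word?" query is needed. The class name `Trie` and the method names are illustrative, not taken from the answer. Building the tree touches every character once, so construction costs time proportional to the total length of all words, and a query walks at most one node per character of the search string.

```python
class Trie:
    """Prefix tree stored as nested dicts: one dict per node, keyed by character."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        # Walk down from the root, creating child nodes as needed.
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})

    def has_prefix(self, prefix):
        # Follow one edge per character; fail as soon as an edge is missing.
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return False
        return True


trie = Trie(["apple", "apply", "banana"])
print(trie.has_prefix("app"), trie.has_prefix("bat"))
```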

The appropriate data structure for this is probably a sorted list. In that case this becomes a bisection search problem, so O(log n).

As Gabe mentioned above, a trie is a good solution, but it's a little hard to implement for dictionaries with a large number of words. If an O(n log n) algorithm is OK for you (for example, to sort the dictionary first, which the binary search requires), you can solve this problem with binary search. Here is code written in C:
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

char dict[n][m]; // where n is the number of words in the (sorted) dictionary and
                 // m is the maximum possible length of a word
char word[m];    // it's your word
int l = -1, r = n;
while (l + 1 < r) {
    int k = (l + r) / 2;
    if (strcmp(dict[k], word) < 0) l = k;
    else r = k;
}
size_t len = strlen(word);
l++; // l is now the index of the first word that is >= word (or n if there is none)
bool matches = (l < n) && (strlen(dict[l]) >= len);
for (size_t i = 0; i < len && matches; i++) {
    if (word[i] != dict[l][i]) {
        matches = false;
    }
}
if (matches) {
    printf("given word is a prefix of the word at index %d.", l);
} else {
    printf("given word isn't in the dictionary.");
}

Just run a simple loop and check whether each word starts with the given prefix.
Almost every language has a built-in function to check whether one string starts with another.
The complexity is O(n) prefix checks, with n being the number of words in the dictionary.

Related

How to find which position have prefix sum M in BIT?

Suppose I have built a Binary Indexed Tree over an array of length N. The main array contains only 0s and 1s. Now I want to find the index whose prefix sum is M (that is, the prefix up to that index contains exactly M 1s).
For example, my array is a[]={1,0,0,1,1};
the prefix sums would look like {1,1,1,2,3};
so index 3 (0-based) has a prefix sum of 2.
How can I find this index with a BIT?
Thanks in advance.
Why can't you do a binary search for that index? It will take O(log n * log n) time. Here is a simple implementation:
int findIndex(int sum) {
    int l = 1, r = n;
    while (l <= r) {
        int mid = (l + r) >> 1;
        int This = read(mid); // prefix sum of the interval [1, mid]
        if (This == sum) return mid; // some index with this prefix sum, not necessarily the first
        else if (This < sum) l = mid + 1;
        else r = mid - 1;
    }
    return -1;
}
I used the read(x) function. It should return the sum of the interval [1,x] in O(log n) time. The overall complexity will be O(log^2 n).
Hope it helps.
If the elements of array a[n] are non-negative (so the prefix sum array p[n] is non-decreasing), you can locate an index by its prefix sum in much the same way as you query a prefix sum by index in a BIT, which takes O(log n) time. The only difference is that at each level you compare the sum stored at the candidate node with the amount you are still looking for to decide which subtree to search next: if the stored sum is smaller than the remaining target, the whole range covered by that node lies before the answer, so subtract it and continue among the higher indices; otherwise continue among the lower indices. Repeat this process until you reach the node at which the desired prefix sum is reached, and return its index. The idea is analogous to binary search because the prefix sums are naturally sorted. If there are negative values in a[n], this method won't work, since the prefix sums won't be sorted in that case.
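A minimal sketch of the O(log n) descent, assuming a 1-based Fenwick tree over non-negative values. The class name `Fenwick` and the method `first_index_with_prefix_sum` are illustrative names, not part of the answer. The descent tries power-of-two jumps from largest to smallest and takes a jump only when the block it covers sums to less than what is still needed.

```python
class Fenwick:
    """Binary Indexed Tree over non-negative values, 1-based internally."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def first_index_with_prefix_sum(self, target):
        """Smallest 1-based index whose prefix sum is >= target, or None."""
        pos, remaining = 0, target
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            # Jump only if the block ending at nxt still falls short.
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step >>= 1
        return pos + 1 if pos + 1 <= self.n else None


fw = Fenwick([1, 0, 0, 1, 1])
print(fw.first_index_with_prefix_sum(2))  # 1-based index; subtract 1 for 0-based
```

For a 0/1 array the prefix sums grow by at most one per step, so the first index whose prefix sum is at least M (for M >= 1) has a prefix sum of exactly M.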

Rabin Karp Algorithm - How is the worst case O(m*n) for the given input?

In TopCoder's code for the RK algorithm:
// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
    return (a % b + b) % b;
}
function Rabin_Karp(text[], pattern[])
{
    // let n be the size of the text, m the size of the
    // pattern, B - the base of the numeral system,
    // and M - a big enough prime number
    if(n < m) return; // no match is possible
    // calculate the hash value of the pattern
    hp = 0;
    for(i = 0; i < m; i++)
        hp = int_mod(hp * B + pattern[i], M);
    // calculate the hash value of the first segment
    // of the text of length m
    ht = 0;
    for(i = 0; i < m; i++)
        ht = int_mod(ht * B + text[i], M);
    if(ht == hp) check character by character if the first
                 segment of the text matches the pattern;
    // start the "rolling hash" - for every next character in
    // the text calculate the hash value of the new segment
    // of length m; E = B^(m-1) modulo M
    for(i = m; i < n; i++) {
        ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
        ht = int_mod(ht * B, M);
        ht = int_mod(ht + text[i], M);
        if(ht == hp) check character by character if the
                     current segment of the text matches
                     the pattern;
    }
}
It is written that
Unfortunately, there are still cases when we will have to run the entire inner loop of the “naive” method for every starting position in the text – for example, when searching for the pattern “aaa” in the string “aaaaaaaaaaaaaaaaaaaaaaaaa” — so in the worst case we will still need (n * m) iterations.
But won't the algorithm stop at the very first iteration, since it will see that the first three characters are 'a', which matches the needle?
The Rabin-Karp algorithm computes the hash value of every substring of the text of length m and compares it with the hash value of the pattern. Several different substrings can have the same hash value.
So when the hash values of the pattern and some substring of the text match, we need to compare them character by character to make sure they are actually the same.
In the case of pattern = "AAA" and text = "AAAAAAAAAAAAA", there are O(n) substrings whose hash matches that of the pattern. Each of those matches takes O(m) time to confirm, hence the worst-case complexity O(n*m).
Suppose the string we are searching for is not "aaa" but rather some other string whose hash is the same as the hash of "aaa". Then the comparison will be needed at every position.
Of course, we would expect the comparison to fail before reaching m characters, but it could require O(m) characters.
Having said that, a common use of RK is to find all (overlapping) instances, in which case the example cited would clearly be O(mn).
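To make the worst case concrete, here is a small sketch that reports every occurrence and counts the character comparisons spent on verification. The function name `rabin_karp_all` and the default `base` and `mod` values are assumptions chosen for illustration, not taken from the TopCoder article.

```python
def rabin_karp_all(text, pattern, base=256, mod=1_000_000_007):
    """Return (match_positions, character_comparisons) for all occurrences."""
    n, m = len(text), len(pattern)
    matches, comparisons = [], 0
    if m == 0 or n < m:
        return matches, comparisons
    high = pow(base, m - 1, mod)  # weight of the leading character
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        if ht == hp:
            # Hashes agree, so verify character by character.
            k = 0
            while k < m:
                comparisons += 1
                if text[start + k] != pattern[k]:
                    break
                k += 1
            if k == m:
                matches.append(start)
        if start + m < n:
            # Roll the hash: drop text[start], append text[start + m].
            ht = (ht - ord(text[start]) * high) % mod
            ht = (ht * base + ord(text[start + m])) % mod
    return matches, comparisons


positions, work = rabin_karp_all("a" * 25, "aaa")
print(len(positions), work)
```

With 25 copies of 'a' and the pattern "aaa", every one of the 23 windows passes the hash test and needs 3 comparisons, so the verification work is (n - m + 1) * m.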

Divide and Conquer Algorithms- Binary search variant

This is a practice question for understanding divide and conquer algorithms.
You are given an array of N sorted integers. All the elements are distinct except one element, which is repeated twice. Design an O(log N) algorithm to find that element.
I get that the array needs to be divided to see whether an equal counterpart is found at the next index, some variant of binary search, I believe. But I can't find any solution or guidance regarding that.
You cannot do it in O(log n) time, because at any step, even if you divide the array into 2 parts, you cannot decide which part to process further and which to discard.
On the other hand, if the array consists of consecutive numbers, then by comparing an index with the value stored at that index we can decide whether the duplicate is in the left or the right part of the array.
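A minimal sketch of the consecutive-numbers case, assuming the array is sorted, its values form a run of consecutive integers starting at a[0], and exactly one value appears twice. The function name `find_duplicate_consecutive` is hypothetical. Before the second copy, a[i] - a[0] equals i; from the second copy onward it is i - 1, so a binary search for the first index where the difference falls behind finds the duplicate in O(log n).

```python
def find_duplicate_consecutive(a):
    """a: sorted consecutive integers with exactly one value repeated twice."""
    base = a[0]
    lo, hi = 0, len(a) - 1  # the last index always lags behind, so hi is safe
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] - base < mid:
            hi = mid        # the second copy is at mid or to its left
        else:
            lo = mid + 1    # everything up to mid is still in step
    return a[lo]


print(find_duplicate_consecutive([3, 4, 5, 5, 6, 7]))
```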
D&C should look something like this:
int Twice(int a[], int i, int j) {
    // Searches a[i..j] (inclusive) for an adjacent equal pair.
    if (i >= j)
        return -1;
    int k = (i + j) / 2;
    if (a[k] == a[k + 1])
        return k;
    if (k > i && a[k] == a[k - 1]) // guard: k - 1 must stay inside [i, j]
        return k - 1;
    int m = Twice(a, i, k - 1);
    int n = Twice(a, k + 1, j);
    return m != -1 ? m : n;
}
int Twice(int a[], int n) {
    return Twice(a, 0, n - 1); // j is an inclusive upper bound
}
But it has complexity O(n). As said above, it is not possible to find an O(lg n) algorithm for this problem.

Find if a string can be obtained from a matrix of characters

Given a matrix of characters and a string, find whether the string can be obtained from the matrix. From each character in the matrix, we can move up/down/right/left. For example, if the matrix[3][4] is:
o f a s
l l q w
z o w k
and the string is follow, then the function should return true.
The only approach I can think of is a backtracking algorithm that searches whether the word is possible or not. Is there any other faster algorithm to approach this problem?
And suppose I have a lot of queries (on finding whether a word exists or not). Then can there be some preprocessing done to answer the queries faster?
You can solve this using DFS. Let's define a graph for the problem. Each vertex of the graph is a combination of a cell of the matrix and the length of a prefix of the string we are searching for. Being at a given vertex means that all the characters of that prefix have been matched so far and that we are currently at the given cell.
We define edges as connecting cells adjacent by a side and making a "valid" transition. That is, the cell we are moving to must contain the next character of the string we are searching for.
To solve the problem we do a DFS from all cells that contain the first letter of the string, with prefix length 1 (meaning we've matched this first letter). From there we continue the search, and at each step we compute the edges going out of the current position (cell/prefix length combination). We terminate the first time we reach a prefix of length L, the length of the string.
Note that DFS may be considered backtracking, but what matters more is keeping track of the vertices of the graph we've already visited. Thus the overall complexity is bounded by N * M * L, where N and M are the dimensions of the matrix and L is the length of the string.
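A minimal sketch of that state-graph search, assuming the grid is given as a list of equal-length strings. The function name `can_spell` is hypothetical. Note one consequence of keying the visited set on (cell, prefix length): a cell may be reused at a different position in the word, which is what keeps the bound at N * M * L. If every cell may be used at most once, a path-based backtracking search is needed instead.

```python
def can_spell(grid, word):
    """DFS over (row, col, matched_length) states; each state is visited once."""
    if not word or not grid or not grid[0]:
        return False
    rows, cols = len(grid), len(grid[0])
    # Start from every cell holding the first letter, with prefix length 1.
    stack = [(r, c, 1) for r in range(rows) for c in range(cols)
             if grid[r][c] == word[0]]
    visited = set(stack)
    while stack:
        r, c, matched = stack.pop()
        if matched == len(word):
            return True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == word[matched]:
                state = (nr, nc, matched + 1)
                if state not in visited:
                    visited.add(state)
                    stack.append(state)
    return False


grid = ["ofas", "llqw", "zowk"]
print(can_spell(grid, "follow"))
```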
You could of course find all possible strings (start with a character and go as far as you can). This can be done with a recursive function.
grid:
abc
def
ghi
strings:
abcfedghi
abcfehgd
abcfehi
abedghif
abefc
abefighd
abehgd
abehifc
ad...
...
Then sort these strings and when looking for a word use a binary search on the list. (When looking for an n letter word you would of course only consider the first n letters of the strings in the list.) A lot of preparation and much memory needed, but searching will be fast. So if you use the same grid again and again, the preparation may finally pay :-)
Below is pseudocode for finding whether the given string is present in a given matrix. Here visited marks the cells that are on the current path through the matrix, and backtracking clears those marks again. I hope this is helpful.
bool isSafe(char matrix[n][m], int visited[n][m], int i, int j, int n, int m){
    // (i, j) must lie inside the n x m matrix and must not be on the current path
    if(i>=0 && i<n && j>=0 && j<m && visited[i][j] == 0)
        return true;
    return false;
}
bool dfs(char matrix[n][m], int i, int j, int visited[n][m], char str[], int index){
    if(index == strlen(str))
        return true;
    // row moves
    int x[] = {-1, 0, 1, 0};
    // col moves
    int y[] = {0, -1, 0, 1};
    if(str[index] == matrix[i][j]){
        if(index + 1 == strlen(str))
            return true; // last character matched; no further move needed
        // mark given position visited
        visited[i][j] = 1;
        // for all the neighbours
        for(int k = 0; k<4; k++){
            int next_x = i + x[k];
            int next_y = j + y[k];
            if(isSafe(matrix, visited, next_x, next_y, n, m)){
                if(dfs(matrix, next_x, next_y, visited, str, index+1) == true)
                    return true;
            }
        }
        // backtrack
        visited[i][j] = 0;
    }
    return false;
}
bool isPresent(char matrix[n][m], char str[]){
    // visited initialized to 0
    int visited[n][m] = {0};
    for(int i=0;i<n;i++)
        for(int j=0;j<m;j++){
            if(dfs(matrix, i, j, visited, str, 0) == true)
                return true;
        }
    return false;
}

Algorithm to find the smallest non negative integer that is not in a list

Given a list of integers, how can I best find an integer that is not in the list?
The list can potentially be very large, and the integers might be large (i.e. BigIntegers, not just 32-bit ints).
If it makes any difference, the list is "probably" sorted, i.e. 99% of the time it will be sorted, but I cannot rely on always being sorted.
Edit -
To clarify, given the list {0, 1, 3, 4, 7}, examples of acceptable solutions would be -2, 2, 8 and 10012, but I would prefer to find the smallest, non-negative solution (i.e. 2) if there is an algorithm that can find it without needing to sort the entire list.
One easy way would be to iterate the list to get the highest value n, then you know that n+1 is not in the list.
Edit:
A method to find the smallest non-negative unused number would be to start from zero and scan the list for that number, starting over with the next number each time you find it. To make it more efficient, and to make use of the high probability of the list being sorted, you can move numbers that are smaller than the current one to an unused part of the list.
This method uses the beginning of the list as storage space for lower numbers; the startIndex variable keeps track of where the relevant numbers start:
public static int GetSmallest(int[] items) {
    int startIndex = 0;
    int result = 0;
    int i = 0;
    while (i < items.Length) {
        if (items[i] == result) {
            result++;
            i = startIndex;
        } else {
            if (items[i] < result) {
                if (i != startIndex) {
                    int temp = items[startIndex];
                    items[startIndex] = items[i];
                    items[i] = temp;
                }
                startIndex++;
            }
            i++;
        }
    }
    return result;
}
I ran a performance test in which I created lists with 100000 random numbers from 0 to 19999, which makes the average lowest number around 150. On test runs (with 1000 test lists each), the method found the smallest number in unsorted lists in 8.2 ms on average, and in sorted lists in 0.32 ms on average.
(I haven't checked in what state the method leaves the list, as it may swap some items in it. It leaves the list containing the same items, at least, and as it moves smaller values down the list I think that it should actually become more sorted for each search.)
If the number doesn't have any restrictions, then you can do a linear search to find the maximum value in the list and return the number that is one larger.
If the number does have restrictions (e.g. max+1 and min-1 could overflow), then you can use a sorting algorithm that works well on partially sorted data. Then go through the list and find the first pair of numbers v_i and v_{i+1} that are not consecutive. Return v_i + 1.
To get the smallest non-negative integer (based on the edit in the question), you can either:
Sort the list using a partial sort as above. Binary search the list for 0. Iterate through the list from this value until you find a "gap" between two numbers. If you get to the end of the list, return the last value + 1.
Insert the values into a hash table. Then iterate from 0 upwards until you find an integer not in the list.
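A minimal sketch of the hash-table option, using Python's built-in `set` as the hash table. The function name `smallest_missing_nonnegative` is hypothetical. It needs O(n) extra memory but no sorting, and because Python integers are arbitrary precision it also covers the BigInteger case from the question.

```python
def smallest_missing_nonnegative(values):
    """Smallest integer >= 0 that does not occur in values."""
    seen = set(values)       # expected O(n) to build
    candidate = 0
    while candidate in seen:  # at most len(values) + 1 probes
        candidate += 1
    return candidate


print(smallest_missing_nonnegative([0, 1, 3, 4, 7]))
```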
Unless it is sorted you will have to do a linear search going item by item until you find a match or you reach the end of the list. If you can guarantee it is sorted you could always use the array method of BinarySearch or just roll your own binary search.
Or like Jason mentioned there is always the option of using a Hashtable.
"probably sorted" means you have to treat it as being completely unsorted. If of course you could guarantee it was sorted this is simple. Just look at the first or last element and add or subtract 1.
I got 100% in both correctness & performance.
You should use quicksort, which has N log(N) complexity on average.
Here you go...
public int solution(int[] A) {
    // Returns the smallest positive integer that does not occur in A.
    if (A == null || A.length == 0) {
        return 1;
    }
    quickSort(A, 0, A.length - 1);
    int result = 1;
    if (A.length == 1 && A[0] < 0) {
        return result;
    }
    for (int i = 0; i < A.length; i++) {
        if (A[i] <= 0) {
            continue;
        }
        if (A[i] == result) {
            result++;
        } else if (A[i] < result) {
            continue;
        } else if (A[i] > result) {
            return result;
        }
    }
    return result;
}
private void quickSort(int[] numbers, int low, int high) {
    int i = low, j = high;
    int pivot = numbers[low + (high - low) / 2];
    while (i <= j) {
        while (numbers[i] < pivot) {
            i++;
        }
        while (numbers[j] > pivot) {
            j--;
        }
        if (i <= j) {
            exchange(numbers, i, j);
            i++;
            j--;
        }
    }
    // Recursion
    if (low < j)
        quickSort(numbers, low, j);
    if (i < high)
        quickSort(numbers, i, high);
}
private void exchange(int[] numbers, int i, int j) {
    int temp = numbers[i];
    numbers[i] = numbers[j];
    numbers[j] = temp;
}
Theoretically, find the max and add 1. Assuming you're constrained by the max value of the BigInteger type, sort the list if unsorted, and look for gaps.
Are you looking for an on-line algorithm (since you say the input is arbitrarily large)? If so, take a look at Odds algorithm.
Otherwise, as already suggested, hash the input, search and turn on/off elements of boolean set (the hash indexes into the set).
There are several approaches:
Find the biggest int in the list and store it in x. x+1 will not be in the list. The same applies with using min() and x-1.
When N is the size of the list, allocate an int array with the size (N+31)/32. For each element in the list, set the bit v&31 (where v is the value of the element) of the integer at array index v/32. Ignore negative values and values where v/32 >= array.length. Now search for the first array item which is '!= 0xFFFFFFFF' (for 32bit integers); its lowest unset bit gives the smallest missing non-negative value.
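A minimal sketch of the bit-array approach, with a plain Python list standing in for the 32-bit int array. The function name `smallest_missing_bitset` is hypothetical. It relies on the fact that the smallest missing non-negative value can never exceed N, so any value too large to fit in the array can be ignored safely.

```python
def smallest_missing_bitset(values):
    """Smallest non-negative integer not in values, using one bit per candidate."""
    words = [0] * ((len(values) + 31) // 32)
    for v in values:
        # Negative values and values beyond the array cannot be the answer.
        if v >= 0 and v // 32 < len(words):
            words[v // 32] |= 1 << (v & 31)
    for index, word in enumerate(words):
        if word != 0xFFFFFFFF:
            bit = 0
            while word & (1 << bit):
                bit += 1
            return index * 32 + bit
    # Every tracked bit is set: the values cover 0 .. 32 * len(words) - 1.
    return len(words) * 32


print(smallest_missing_bitset([0, 1, 3, 4, 7]))
```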
If you can't guarantee it is sorted, then you have a best possible time efficiency of O(N) as you have to look at every element to make sure your final choice is not there. So the question is then:
Can it be done in O(N)?
What is the best space efficiency?
Chris Doggett's solution of find the max and add 1 is both O(N) and space efficient (O(1) memory usage)
If you want only probably the best answer then it is a different question.
Unless you are 100% sure it is sorted, the quickest algorithm still has to look at each number in the list at least once to at least verify that a number is not in the list.
Assuming this is the problem I'm thinking of:
You have a set of all ints in the range 1 to n, but one of those ints is missing. Tell me which of int is missing.
This is a pretty easy problem to solve with some simple math knowledge. It's known that the sum of the range 1 .. n is equal to n(n+1) / 2. So, let W = n(n+1) / 2 and let Y = the sum of the numbers in your set. The integer that is missing from your set, X, would then be X = W - Y.
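As a quick illustration of the formula X = W - Y, here is a small sketch. It assumes the set really is the range 1 to n with exactly one value removed, which is this answer's reading of the problem, and the function name is hypothetical.

```python
def missing_from_1_to_n(values, n):
    """values holds every integer in 1..n except one; return the missing one."""
    expected_total = n * (n + 1) // 2  # W
    return expected_total - sum(values)  # X = W - Y


print(missing_from_1_to_n([1, 2, 4, 5], 5))
```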
Note: SO needs to support MathML
If this isn't that problem, or if it's more general, then one of the other solutions is probably right. I just can't really tell from the question since it's kind of vague.
Edit: Well, since the edit, I can see that my answer is absolutely wrong. Fun math, none-the-less.
I've solved this using LINQ and a binary search. I got 100% across the board. Here's my code:
using System.Collections.Generic;
using System.Linq;
class Solution {
    public int solution(int[] A) {
        if (A == null) {
            return 1;
        } else {
            if (A.Length == 0) {
                return 1;
            }
        }
        List<int> list_test = new List<int>(A);
        list_test = list_test.Distinct().ToList();
        list_test = list_test.Where(i => i > 0).ToList();
        list_test.Sort();
        if (list_test.Count == 0) {
            return 1;
        }
        int lastValue = list_test[list_test.Count - 1];
        if (lastValue <= 0) {
            return 1;
        }
        int firstValue = list_test[0];
        if (firstValue > 1) {
            return 1;
        }
        return BinarySearchList(list_test);
    }
    int BinarySearchList(List<int> list) {
        int returnable = 0;
        int tempIndex;
        int[] boundaries = new int[2] { 0, list.Count - 1 };
        int testCounter = 0;
        while (returnable == 0 && testCounter < 2000) {
            tempIndex = (boundaries[0] + boundaries[1]) / 2;
            if (tempIndex != boundaries[0]) {
                if (list[tempIndex] > tempIndex + 1) {
                    boundaries[1] = tempIndex;
                } else {
                    boundaries[0] = tempIndex;
                }
            } else {
                if (list[tempIndex] > tempIndex + 1) {
                    returnable = tempIndex + 1;
                } else {
                    returnable = tempIndex + 2;
                }
            }
            testCounter++;
        }
        if (returnable == list[list.Count - 1]) {
            returnable++;
        }
        return returnable;
    }
}
The longest execution time was 0.08s on the Large_2 test
You need the list to be sorted. That means either knowing it is sorted, or sorting it.
1. Sort the list. Skip this step if the list is known to be sorted. O(n lg n)
2. Remove any duplicate elements. Skip this step if elements are already guaranteed distinct. O(n)
3. Let B be the position of 1 in the list, found using a binary search. O(lg n)
4. If 1 isn't in the list, return 1. Note that if all elements from 1 to n are in the list, then the element at B+n must be n+1. O(1)
5. Now perform a sort of binary search starting with min = B, max = end of the list. Call the position of the pivot P. If the element at P is greater than (P-B+1), recurse on the range [min, pivot], otherwise recurse on the range (pivot, max]. Continue until min=pivot=max. O(lg n)
6. Your answer is (the element at pivot-1)+1, unless you are at the end of the list and (P-B+1) = B in which case it is the last element + 1. O(1)
This is very efficient if the list is already sorted and has distinct elements. You can do optimistic checks to make it faster when the list has only non-negative elements or when the list doesn't include the value 1.
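A minimal sketch of the binary-search steps, assuming the list is already sorted and free of duplicates. It follows this answer's convention of looking for the smallest missing integer starting from 1. The function name is hypothetical, and `bisect_left` from the standard library plays the role of the search for the position B of the value 1.

```python
from bisect import bisect_left


def smallest_missing_positive_sorted(sorted_distinct):
    """Smallest integer >= 1 missing from a sorted list of distinct integers."""
    b = bisect_left(sorted_distinct, 1)  # position where 1 is, or would be
    if b == len(sorted_distinct) or sorted_distinct[b] != 1:
        return 1
    lo, hi = b, len(sorted_distinct)
    # Find the first position whose element is larger than (position - b + 1).
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_distinct[mid] > mid - b + 1:
            hi = mid
        else:
            lo = mid + 1
    return lo - b + 1


print(smallest_missing_positive_sorted([-3, 0, 1, 2, 4]))
```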
I was just in an interview where they asked me this question. The answer to this problem can be found using worst-case analysis. The upper bound for the smallest natural number not present in the list is length(list). This is because the worst case for the smallest missing number, given the length of the list, is the list 0,1,2,3,4,5....length(list)-1.
Therefore, for all lists, the smallest number not present in the list is less than or equal to the length of the list. So initialize a list t with n=length(list)+1 zeros. For every number i in the list that is less than or equal to the length of the list, assign the value 1 to t[i]. The index of the first zero in t is the smallest number not present in the list. And since, the lower bound on this list n-1, for at least one index j
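The marking scheme described here can be sketched as follows, with a list of booleans standing in for the list t of zeros and ones. The function name `smallest_missing_marker` is hypothetical. It runs in O(n) time and O(n) extra space and needs no sorting.

```python
def smallest_missing_marker(values):
    """Smallest non-negative integer not in values, using a marker list."""
    n = len(values)
    present = [False] * (n + 1)  # slots for 0 .. n
    for v in values:
        if 0 <= v <= n:          # larger or negative values cannot be the answer
            present[v] = True
    # At most n of the n + 1 slots can be marked, so a False always exists.
    return present.index(False)


print(smallest_missing_marker([0, 1, 3, 4, 7]))
```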
