When dealing with the Word Break problem, I found this solution, which is really concise, but I'm not sure about the time complexity. Can anyone help?
My understanding is that the worst case is O(n*k), where n is the size of the wordDict and k is the length of the String.
class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreak(s, wordDict, new HashMap<String, Boolean>());
    }

    private boolean wordBreak(String s, List<String> wordDict, Map<String, Boolean> memo) {
        if (s == null) return false;
        if (s.isEmpty()) return true;
        if (memo.containsKey(s)) return memo.get(s);
        for (String dict : wordDict) { // number of words: O(n)
            // startsWith is bounded by the length of the dict word, avg is O(m), can be ignored
            // substring is bounded by the length of the dict word, avg is O(k), k is the length of s
            // wordBreak will be executed k/m times, k is the length of s, worst case k times... when a single letter is in the dict
            if (s.startsWith(dict) && wordBreak(s.substring(dict.length()), wordDict, memo)) {
                memo.put(s, true);
                return true;
            }
        }
        memo.put(s, false);
        return false;
    }
}
It's worse than O(nk), for several reasons (sticking with the question's notation: n is the number of dictionary words, k is the length of the string, and m is the length of a dictionary word):
1. You ignore m, but m is Omega(log n). (Because n < |A|^(m+1), where |A| is the size of your alphabet.)
2. s.substring is probably O(k). Your code looks like Java, and it's O(k) in Java.
3. Even if s.substring were constant time, your Map requires the string to be hashed, so your map operations are O(k) (where, note carefully, k is the length of the string being hashed rather than the size of the hashtable, as it would normally be).
Probably this means you have complexity O(n * k^2 * log n).
You can fix 3 easily -- you can use s.length rather than s as the key to your hashtable.
Problem 2 is easy but slightly annoying to fix -- rather than slicing your string, you can use a variable that indexes into the string. In Java, String.startsWith(prefix, offset) already accepts such an offset; in other languages you may have to rewrite startsWith yourself to use the index (or use a trie -- see below). If your programming language has an O(1) slice operation (for example, string_view in C++) then you could use that instead.
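For instance, a rough sketch of fixes 2 and 3 combined (my own code, not the original poster's): recurse on a start index and memoize on that index, so no substrings are created and nothing longer than a dictionary word is ever compared or hashed:

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        // memo[i] caches the answer for the suffix starting at index i (null = not computed yet)
        return wordBreak(s, 0, wordDict, new Boolean[s.length()]);
    }

    private boolean wordBreak(String s, int start, List<String> wordDict, Boolean[] memo) {
        if (start == s.length()) return true;
        if (memo[start] != null) return memo[start];
        for (String word : wordDict) {
            // compares word against s at offset start in O(m), without creating a substring
            if (s.startsWith(word, start) && wordBreak(s, start + word.length(), wordDict, memo)) {
                memo[start] = true;
                return true;
            }
        }
        memo[start] = false;
        return false;
    }
}

Each of the k suffix positions is then solved once, doing O(n) word checks of O(m) each, so the work is roughly O(k*n*m) plus constant-time memo lookups.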
Problem 1 is only theoretical, since for real word lists, m is really small compared to either the length of the dictionary or the potential length of input strings.
Note that using a trie for the dictionary rather than a word list is likely to result in a huge time improvement, with realistic examples being linear excluding dictionary construction (although worst-case examples where the dictionary and input strings are chosen maliciously will be O(nk)).
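Here is a rough sketch of that trie idea (my own code; the names are made up): build a trie over the dictionary once, then for each start position walk the trie along the string, so every dictionary word beginning at that position is discovered in a single pass.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrieWordBreak {
    // Minimal trie node: children indexed by character, plus an end-of-word flag.
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    public boolean wordBreak(String s, List<String> wordDict) {
        // Build the trie once: O(total length of the dictionary).
        Node root = new Node();
        for (String word : wordDict) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, unused -> new Node());
            }
            node.isWord = true;
        }

        int n = s.length();
        boolean[] canBreak = new boolean[n + 1];
        canBreak[n] = true; // the empty suffix is always breakable
        for (int start = n - 1; start >= 0; start--) {
            Node node = root;
            // Walk the trie along s[start..]; every isWord node reached is a dictionary word.
            for (int i = start; i < n && node != null; i++) {
                node = node.children.get(s.charAt(i));
                if (node != null && node.isWord && canBreak[i + 1]) {
                    canBreak[start] = true;
                    break;
                }
            }
        }
        return canBreak[0];
    }
}

Excluding the trie construction, each start position walks at most as many trie edges as the longest dictionary word, which is where the near-linear behaviour on realistic inputs comes from.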
Design an algorithm that sorts n integers where there are duplicates. The total number of different numbers is k. Your algorithm should have time complexity O(n + k*log(k)). Expected (average-case) time is enough. For which values of k does the algorithm become linear?
I am not able to come up with a sorting algorithm for integers that satisfies the O(n + k*log(k)) condition. I am not a very advanced programmer, but in the problem before this one I was supposed to come up with an algorithm for a list of numbers xi, 0 ≤ xi ≤ m, that runs in O(n + m), where n is the number of elements in the list and m is the value of the biggest integer in the list. I solved that problem easily with counting sort, but I struggle with this one. The condition that makes it the most difficult for me is the k*log(k) term in the big-O notation; if it were n*log(n) instead, I could just use merge sort, right? But that's not possible now, so any ideas would be very helpful.
Thanks in advance!
Here is a possible solution:
1. Using a hash table, count the number of unique values and the number of duplicates of each value. This should have a complexity of O(n).
2. Enumerate the hash table, storing the unique values into a temporary array. Complexity is O(k).
3. Sort this array with a standard algorithm such as mergesort: complexity is O(k*log(k)).
4. Create the resulting array by replicating each element of the sorted array of unique values the number of times stored in the hash table. Complexity is O(n) + O(k).
Combined complexity is O(n + k*log(k)).
This becomes linear whenever k*log(k) is O(n) -- for example when k is a small constant, or more generally when k = O(n / log(n)): sorting an array of n values then converges toward linear time as n becomes larger and larger.
If during the first phase, where k is computed incrementally, it appears that k is not significantly smaller than n, drop the hash table and just sort the original array with a standard algorithm.
The runtime O(n + k*log(k)) indicates (as addition in runtimes often does) that you have two subroutines: one that runs in O(n) and another that runs in O(k*log(k)).
You can first count the frequency of the elements in O(n), for example with a HashMap (look this up if you're not familiar with it, it's very useful).
Then you sort only the unique elements, of which there are k. This sorting runs in O(k*log(k)); use any O(k*log(k)) sorting algorithm you want.
At the end, replace each unique element by as many copies as it actually appeared, by looking this up in the map you created in step 1.
A possible Java solution can look like this:
public List<Integer> sortArrayWithDuplicates(List<Integer> arr) {
    // O(n): collect the unique values and count the frequency of each value
    Set<Integer> set = new HashSet<>(arr);
    Map<Integer, Integer> freqMap = new HashMap<>();
    for (Integer i : arr) {
        freqMap.put(i, freqMap.getOrDefault(i, 0) + 1);
    }
    List<Integer> withoutDups = new ArrayList<>(set);
    // Sorting => O(k*log(k)), as there are k different elements
    Collections.sort(withoutDups);
    // O(n + k): expand each unique value according to its frequency
    List<Integer> result = new ArrayList<>();
    for (Integer i : withoutDups) {
        int c = freqMap.get(i);
        for (int j = 0; j < c; j++) {
            result.add(i);
        }
    }
    // return the result
    return result;
}
The time complexity of the above code is O(n + k*log(k)), and the solution is along the same lines as the answer above.
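For illustration, a quick usage sketch (the enclosing class name Sorter is just a placeholder, not part of the answer):

// Hypothetical usage of the method above.
List<Integer> input = Arrays.asList(5, 3, 5, 1, 3, 5);
List<Integer> sorted = new Sorter().sortArrayWithDuplicates(input);
System.out.println(sorted); // prints [1, 3, 3, 5, 5, 5]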
The time complexity of this algorithm for computing permutations recursively should be O(n!*n), but I am not 100% sure about the space complexity.
There are n levels of recursion, and the biggest space required for one level is n (the length of each permutation) * n! (the number of permutations). Is the space complexity of the algorithm O(n!*n^2)?
static List<String> permutations(String word) {
    if (word.length() == 1)
        return Arrays.asList(word);
    String firstCharacter = word.substring(0, 1);
    String rest = word.substring(1);
    List<String> permutationsOfRest = permutations(rest);
    List<String> permutations = new ArrayList<String>(); // or a HashSet if I don't want duplicates
    for (String permutationOfRest : permutationsOfRest) {
        for (int i = 0; i <= permutationOfRest.length(); i++) {
            permutations.add(permutationOfRest.substring(0, i) + firstCharacter + permutationOfRest.substring(i));
        }
    }
    return permutations;
}
No, the space complexity is "just" O(n! × n), since you don't simultaneously hold onto all recursive calls' permutationsOfRest / permutations. (You do have two at a time, but that's just a constant factor, so isn't relevant to the asymptotic complexity.)
Note that if you don't actually need a List<String>, it might be better to wrap things up as a custom Iterator<String> implementation, so that you don't need to keep all permutations in memory at once, and don't need to pre-calculate all permutations before you start doing anything with any of them. (Of course, that's a bit trickier to implement, so it's not worth it if the major use of the Iterator<String> will just be to pre-populate a List<String> anyway.)
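For what it's worth, here is a rough sketch of such a lazy iterator using the same insert-the-first-character-everywhere scheme (my own code, not from the answer); only one permutation of the rest is materialised at a time:

import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

class PermutationIterator implements Iterator<String> {
    private final String firstCharacter;          // character this level inserts
    private final Iterator<String> restIterator;  // lazily yields permutations of the rest
    private String currentRest;                   // current permutation of the rest
    private int insertPosition = 0;               // next insertion index into currentRest

    PermutationIterator(String word) {
        firstCharacter = word.isEmpty() ? "" : word.substring(0, 1);
        restIterator = word.length() <= 1
                ? Collections.singletonList("").iterator() // base case: only the empty rest
                : new PermutationIterator(word.substring(1));
        currentRest = restIterator.next();
    }

    @Override
    public boolean hasNext() {
        return insertPosition <= currentRest.length() || restIterator.hasNext();
    }

    @Override
    public String next() {
        if (insertPosition > currentRest.length()) {
            if (!restIterator.hasNext()) throw new NoSuchElementException();
            currentRest = restIterator.next(); // move on to the next permutation of the rest
            insertPosition = 0;
        }
        String result = currentRest.substring(0, insertPosition)
                + firstCharacter
                + currentRest.substring(insertPosition);
        insertPosition++;
        return result;
    }
}

Iterating over new PermutationIterator("abc") yields the same six permutations as the list version, but the memory held at any moment is only the O(n) recursion levels with an O(n) string each, i.e. O(n^2), rather than O(n!*n).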
Say I want to store a dictionary of strings and I want to know if some string exists or not. I can use a Trie or a HashMap. The HashMap has a time complexity of O(1) with a high probability while the Trie in that case would have a time complexity of O(k) where k is the length of the string.
Now my question is: Doesn't calculating the hash value of the string have a time complexity of O(k) thus making the complexity of the HashMap the same? If not, why?
The way I see it is that a Trie here would have lower time complexity than a HashMap for looking up a string since the HashMap -in addition to calculating the hash value- might hit collisions. Am I missing something?
Update:
Which data structure would you use to optimize for speed when constructing a dictionary?
Apart from the complexity of implementation of a trie, certain optimizations are done in the implementation of the hashCode method that determines the buckets in a hash table. For java.lang.String, an immutable class, here is what JDK-8 does:
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
Thus, it is cached (and is thread-safe). Once calculated, the hash code of a string need not be recalculated. This saves you from having to spend the O(k) time in the case of hash table (or hash set, hash map).
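To make the caching concrete, here is a small illustration (my own example); note that the cache lives on the String instance, so the saving only applies when the same object is looked up again:

Set<String> dict = new HashSet<>();   // java.util.Set / java.util.HashSet
String key = "pneumonoultramicroscopicsilicovolcanoconiosis";
dict.add(key);                        // hashCode() walks all k characters once and caches the result
dict.contains(key);                   // same String instance: the cached hash is reused, no O(k) rescan
dict.contains(new String(key));       // equal but distinct instance: its hash is computed from scratch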
While implementing dictionaries, I think tries shine where you are more interested in possible partial matches rather than exact matches. Generally speaking hash based solutions work best in case of exact matches.
The time complexity of performing operations on a hash table is typically measured in the number of hashes and compares that have to be performed. On expectation, the cost, when measured this way, is O(1), because on expectation only a constant number of hashes and compares must be used.
To determine the cost of using a hash table for strings, you do indeed need to factor in the cost of these operations, which will be O(k) each for a string of length k. Therefore, the cost of a hash table operation on a string is O(1) · O(k) = O(k), matching the trie cost, though only on expectation and with a different constant factor.
During a recent job interview, I was asked to give a solution to the following problem:
Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string.
For example, s= peachpie, dic= {peach, pie}, result={peach, pie}.
I will ask the decision variant of this problem:
if s can be composed of words in the dictionary, return yes; otherwise return no.
My solution to this was backtracking (written in Java):
public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;
    for (int i = 0; i <= s.length(); i++)
    {
        String pre = prefix(s, i); // returns s[0..i-1]
        String suf = suffix(s, i); // returns s[i..s.len]
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");
    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}
What is the time complexity of this solution?
I'm calling recursively in the for loop, but only for the prefixes that are in the dictionary.
Any ideas?
You can easily create a case where the program takes at least exponential time to complete. Just take the word aaa...aaab, where a is repeated n times. The dictionary will contain only two words, a and aa.
The b at the end ensures that the function never finds a match and thus never exits prematurely.
On each execution of words, two recursive calls will be spawned: with suffix(s, 1) and suffix(s, 2). Execution time, therefore, grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify it by inserting a counter.) So the complexity is certainly not polynomial (and this is not even the worst possible input).
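If you want to see the blow-up yourself, here is one possible way to insert that counter (a hypothetical harness, assuming the words, prefix and suffix methods from the question live in the same class and that a static calls field is incremented at the top of words):

static long calls = 0; // increment this at the top of words(...)

public static void main(String[] args) {
    Set<String> dic = new HashSet<>(Arrays.asList("a", "aa"));
    for (int n = 10; n <= 30; n += 5) {
        calls = 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        sb.append('b'); // the trailing b guarantees no decomposition exists
        words(sb.toString(), dic);
        System.out.println(n + " a's -> " + calls + " calls");
    }
}

The printed call counts should grow roughly geometrically, matching the t(n) = t(n - 1) + t(n - 2) recurrence.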
But you can easily improve your solution with memoization. Notice that the output of the function words depends on one thing only: the position in the original string at which we start. E.g., if we have a string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de or a+b+c+d+e or something else). Thus, we don't have to recalculate words("fg") each time.
In the primitive version, this can be done like this:
// 'processed' is a Set<String> field that records inputs already shown to fail
public static boolean words(String s, Set<String> dictionary) {
    if (processed.contains(s)) {
        // we've already processed string 's' with no luck
        return false;
    }
    // your normal computations
    // ...
    // if no match found, add 's' to the list of checked inputs
    processed.add(s);
    return false;
}
PS: Still, I do encourage you to change words(String) to words(int). This way you'll be able to store results in an array and even transform the whole algorithm to DP (which would make it much simpler).
edit 2
Since I have not much to do besides work, here's the DP (dynamic programming) solution. Same idea as above.
String s = "peachpie1";
int n = s.length();
boolean[] a = new boolean[n + 1];
// a[i] tells whether s[i..n-1] can be composed from words in the dictionary
a[n] = true; // always can compose empty string
for (int start = n - 1; start >= 0; --start) {
    for (String word : dictionary) {
        if (start + word.length() <= n && a[start + word.length()]) {
            // check if 'word' is a prefix of s[start..n-1]
            String test = s.substring(start, start + word.length());
            if (test.equals(word)) {
                a[start] = true;
                break;
            }
        }
    }
}
System.out.println(a[0]);
Here's a dynamic programming solution that counts the total number of ways to decompose the string into words. It solves your original problem, since the string is decomposable if the number of decompositions is positive.
def count_decompositions(dictionary, word):
    n = len(word)
    results = [1] + [0] * n
    for i in xrange(1, n + 1):
        for j in xrange(i):
            if word[n - i:n - j] in dictionary:
                results[i] += results[j]
    return results[n]
Storage O(n), and running time O(n^2).
The loop over the whole string will take n. Finding all the suffixes and prefixes will take n + (n - 1) + (n - 2) + ... + 1 (n for the first call of words, n - 1 for the second, and so on), which is
SUM(1..n) = (n^2 + n) / 2 = n^2/2 + n/2,
which in complexity terms is equivalent to n^2.
Checking for existence in a HashSet is Theta(1) in the normal case, but O(n) in the worst case.
So, the normal-case complexity of your algorithm is Theta(n^2), and the worst case is O(n^3).
EDIT: I confused order of recursion and iteration, so this answer is wrong. Actually time depends on n exponentially (compare with computation of Fibonacci numbers, for example).
A more interesting question is how to improve your algorithm. Traditionally, a suffix tree is used for string operations. You can build a suffix tree over your string and mark all the nodes as "untracked" at the start of the algorithm. Then go through the strings in the set and, each time some node is used, mark it as "tracked". If all strings in the set are found in the tree, it means that the original string contains all the substrings from the set. And if all the nodes are marked as tracked, it means that the string consists only of substrings from the set.
The actual complexity of this approach depends on many factors, such as the tree-building algorithm, but at least it allows you to divide the problem into several independent subtasks and so measure the final complexity by the complexity of the most expensive subtask.
In the book "The Algorithm Design Manual" by Skiena, computing the mode (most frequent element) of a set is said to have an Ω(n log n) lower bound (this puzzles me), and also (correctly, I guess) that no faster worst-case algorithm exists for computing the mode. I'm only puzzled by the lower bound being Ω(n log n).
See the page of the book on Google Books
But surely this could in some cases be computed in linear time (best case), e.g. by Java code like the below (which finds the most frequent character in a string), the "trick" being to count occurrences using a hashtable. This seems obvious.
So, what am I missing in my understanding of the problem?
EDIT: (Mystery solved) As StriplingWarrior points out, the lower bound holds if only comparisons are used, i.e. no indexing of memory, see also: http://en.wikipedia.org/wiki/Element_distinctness_problem
// Linear time
char computeMode(String input) {
    // initialize currentMode to first char
    char[] chars = input.toCharArray();
    char currentMode = chars[0];
    int currentModeCount = 0;
    HashMap<Character, Integer> counts = new HashMap<Character, Integer>();
    for (char character : chars) {
        int count = putget(counts, character); // occurrences so far
        // test whether character should be the new currentMode
        if (count > currentModeCount) {
            currentMode = character;
            currentModeCount = count; // also save the count
        }
    }
    return currentMode;
}

// Constant time
int putget(HashMap<Character, Integer> map, char character) {
    if (!map.containsKey(character)) {
        // if character not seen before, initialize to zero
        map.put(character, 0);
    }
    // increment
    int newValue = map.get(character) + 1;
    map.put(character, newValue);
    return newValue;
}
The author seems to be basing his logic on the assumption that comparison is the only operation available to you. Using a hash-based data structure sort of gets around this by reducing the likelihood of needing to do comparisons in most cases, to the point where you can basically do this in constant time.
However, if the numbers were hand-picked to always produce hash collisions, you would end up effectively turning your hash set into a list, which would make your algorithm into O(n²). As the author points out, simply sorting the values into a list first provides the best guaranteed algorithm, even though in most cases a hash set would be preferable.
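For reference, a rough sketch of that guaranteed comparison-based route (my own code, assuming an int[] input): sort first, then scan for the longest run of equal values.

import java.util.Arrays;

// Comparison-only mode computation: O(n log n) for the sort plus an O(n) scan,
// with no dependence on hashing behaviour. Assumes a non-empty array.
static int modeBySorting(int[] values) {
    int[] sorted = values.clone();
    Arrays.sort(sorted);
    int mode = sorted[0];
    int bestCount = 1;
    int currentCount = 1;
    for (int i = 1; i < sorted.length; i++) {
        currentCount = (sorted[i] == sorted[i - 1]) ? currentCount + 1 : 1;
        if (currentCount > bestCount) {
            bestCount = currentCount;
            mode = sorted[i];
        }
    }
    return mode;
}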
So, what am I missing in my understanding of the problem?
In many particular cases, an array or hash table suffices. In "the general case" it does not, because hash table access is not always constant time.
In order to guarantee constant time access, you must be able to guarantee that the number of keys that can possibly end up in each bin is bounded by some constant. For characters this is fairly easy, but if the set elements were, say, doubles or strings, it would not be (except in the purely academic sense that there are, e.g., a finite number of double values).
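For instance, for the character case above, a plain array indexed by the char value gives genuinely constant-time access with no hashing at all (a sketch, not the poster's code):

// Direct-indexed counting: the key space (char values) is bounded, so each
// lookup is a genuine O(1) array access. Assumes a non-empty input string.
static char modeOfChars(String input) {
    int[] counts = new int[Character.MAX_VALUE + 1];
    char mode = input.charAt(0);
    for (char c : input.toCharArray()) {
        counts[c]++;
        if (counts[c] > counts[mode]) {
            mode = c;
        }
    }
    return mode;
}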
Hash table lookups are amortized constant time, i.e., in general, the overall cost of looking up n random keys is O(n). In the worst case, they can be linear. Therefore, while in general they could reduce the order of mode calculation to O(n), in the worst case it would increase the order of mode calculation to O(n^2).