Understanding this Time Complexity issue (Cracking the coding interview) - algorithm

Question:
String joinWords(String[] words) {
    String sentence = "";
    for (String w : words) {
        sentence = sentence + w;
    }
    return sentence;
}
So the book gives the solution as O(xn^2).
From what I understand, on every iteration the number of letters in the variable 'sentence' increases by one. Which, I think, makes it O(n^2)?
Is the 'x' just the number of letters in the 'words' array?
Answer: On each concatenation, a new copy of the string is created, and the two strings are copied over, character by character. The first iteration requires us to copy x characters. The second iteration requires copying 2x characters. The third requires 3x, and so on. The total time is therefore O(x + 2x + ... + nx), which reduces to O(xn^2).
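To see where that series comes from, here is a small illustrative sketch (Python, names are mine) that counts the characters copied by repeated immutable-string concatenation, assuming n words of x characters each:

```python
# Each `sentence + w` copies everything already in `sentence` plus the
# new word, so the copy counts form the series x + 2x + ... + nx,
# which sums to x*n*(n+1)/2.
def copies_for_join(n, x):
    total = 0
    length = 0
    for _ in range(n):
        length += x      # the new word's characters
        total += length  # the whole new string is copied
    return total

# n = 4 words of length x = 3: 3 + 6 + 9 + 12 = 30 = 3*4*5/2
assert copies_for_join(4, 3) == 30
```
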

The answer is wrong in two ways. O(xn^2) doesn't exist: in big O you drop all constants, so it would be O(n^2) if that reasoning were right (it isn't).
The next part depends on the language, the implementation of + on strings, and the string class itself. If the string class is mutable, then + should be O(n), assuming a decent implementation (one that doesn't cause a reallocation and copy on each use of +). If the string class is immutable, it depends on how the String is implemented: does it use a single character buffer for all data, or can it hold multiple character pointers in an ordered list? Even the worst such implementation wouldn't be O(n^2); more like O(2n), which is O(n) (2 copies per iteration). Anyone giving me an answer of O(n^2) would be marked wrong. Really, any modern String class implementation would be O(n) in time and roughly O(n*l) in space (where n is the number of words and l is the average word length).
class String {
    String base;       //used for appended strings
    String additional; //used for appended strings
    char baseData[];   //used for pure strings

    String(String base, String additional) {
        this.base = base;
        this.additional = additional;
    }

    operator + (String newString) {
        return new String(this, newString);
    }

    //As an example of how this works
    int length() {
        if (base != null) {
            return base.length() + additional.length(); //This can be cached, and should be for efficiency.
        }
        else {
            return baseData.length;
        }
    }
}
Notice that + is O(1). Yes, I know Java doesn't have operator overloading; the function is there to show how it's implemented.
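As a sketch of the idea behind this class, here is a hypothetical rope-style string in Python (my own illustration, mirroring the Java above): '+' just allocates a node, so each concatenation is O(1), and the cost is deferred to operations that walk the string:

```python
# A minimal rope sketch: appended strings are stored as a pair of
# child nodes; "pure" strings store their data directly.
class Rope:
    def __init__(self, data=None, left=None, right=None):
        self.data = data    # used for pure strings
        self.left = left    # used for appended strings
        self.right = right  # used for appended strings

    def __add__(self, other):
        # O(1): no characters are copied, only a node is allocated.
        return Rope(left=self, right=other)

    def __len__(self):
        if self.left is not None:
            # Could (and should) be cached for efficiency.
            return len(self.left) + len(self.right)
        return len(self.data)

s = Rope("abc") + Rope("de")
assert len(s) == 5
```
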

Related

Time and Space Complexity of this Palindrome Algorithm?

Can someone tell me the time and space complexity for this algorithm? Basically the function takes in a string and the function must return true if it's a palindrome (same backwards as it is forwards) or false if it is not.
I am thinking it is O(n) for both but please correct me if I am wrong.
function isPalindrome(string) {
    var reversing = string.split("").reverse().join("");
    return string === reversing;
}
Your function has a time and space complexity of O(string.length) because it constructs an array of characters and then a new string with the characters in reverse order, with the same length as the original string. Comparing these strings has the same time complexity.
Note however that this works for single words but not for complete phrases: a phrase that reads the same in both directions with the same letters, but not necessarily the same spacing, is also a palindrome.
Here is an alternative version:
function isPalindrome(string) {
    string = string.replace(/ /g, "");
    var reverse = string.split("").reverse().join("");
    return string === reverse;
}
This function has the same time and space complexity, O(string.length).
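A further variant (a sketch of my own, not from the original answer) does the comparison with two pointers, which keeps O(n) time but needs only O(1) extra space beyond the de-spaced string, since no reversed copy is built:

```python
# Two-pointer palindrome check: compare the ends and walk inward.
def is_palindrome(s):
    s = s.replace(" ", "")  # ignore spacing, as in the answer above
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True

assert is_palindrome("never odd or even")
assert not is_palindrome("hello")
```
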

What is the Time and Space complexity of following solution?

Problem statement:
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.
Note:
The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.
Sample test case:
Input:
s = "catsanddog"
wordDict = ["cat", "cats", "and", "sand", "dog"]
Output:
[
"cats and dog",
"cat sand dog"
]
My Solution:
class Solution {
    unordered_set<string> words;
    unordered_map<string, vector<string>> memo;
public:
    vector<string> getAllSentences(string s) {
        if (s.size() == 0) {
            return {""};
        }
        if (memo.count(s)) {
            return memo[s];
        }
        string curWord = "";
        vector<string> result;
        for (int i = 0; i < s.size(); i++) {
            curWord += s[i];
            if (words.count(curWord)) {
                auto sentences = getAllSentences(s.substr(i + 1));
                for (string rest : sentences) {
                    string sentence = curWord + (rest.size() > 0 ? (" " + rest) : "");
                    result.push_back(sentence);
                }
            }
        }
        return memo[s] = result;
    }

    vector<string> wordBreak(string s, vector<string>& wordDict) {
        for (auto word : wordDict) {
            words.insert(word);
        }
        return getAllSentences(s);
    }
};
I am not sure about the time and space complexity. I think it should be 2^n, where n is the length of the given string s. Can anyone please help me prove the time and space complexity?
I also have the following questions:
If I don't use memo in the getAllSentences function, what will be the time complexity in that case?
Is there any better solution than this?
Let's try to go through the algorithm step by step, but for a specific wordDict to simplify things.
So let wordDict be all the characters from a to z,
wordDict = ["a",..., "z"]
In this case if(words.count(curWord)) would be true every time when i = 0 and false otherwise.
Also, let's skip using memo cache (we'll add it later).
In the case above, we just go through string s recursively until we reach the end, without any additional memory except the result vector, which gives the following:
time complexity is O(n!)
space complexity is O(1) - just 1 solution exists
where n is the length of s
Now let's examine how using the memo cache changes the situation in our case. The cache would contain n items - the size of our string s - which changes space complexity to O(n). Our time is the same, since there will be no hits on the memo cache.
This is the basis for us to move forward.
Now let's try to find how things change if wordDict contains all the pairs of letters (and the length of s is even, so we can reach the end).
So, wordDict = ['aa','ab',...,'zz']
In this case we move forward by 2 letters instead of 1 and everything else is the same, which gives us the following complexity without using the memo cache:
time complexity is O((n/2)!)
space complexity is O(1) - just 1 solution exists
The memo cache would contain (n/2) items, giving a complexity of O(n), which also changes space complexity to O(n), but all the entries there are of different lengths.
Let's now imagine that wordDict contains both dictionaries we mentioned before ('a'...'z','aa'...'zz').
In this case we have the following complexity without using memo cache
time complexity is O((n)!) as we need to check the cases for i=0 and i=1, which roughly doubles the number of checks we need to do at each step, but on the other side it reduces the number of checks we have to do later, since we move forward by 2 letters instead of one (this is the trickiest part for me).
Space complexity is ~O(2^n), since every additional char doubles the number of results.
Now let's think of the memo cache we have. It would be useful for every 3 letters, because for example '...ab c...' gives the same as '...a bc...', so it reduces the number of calculations by 2 at every step, so our complexity would be the following:
time complexity is roughly O((n/2)!), and we need O(2*n)=O(n) memory to store the memo. Let's also remember that in the n/2 expression, the 2 reflects the cache effectiveness.
space complexity is O(2^n) - the 2 here is a characteristic of the wordDict we've constructed
These were 3 cases for us to understand how the complexity changes depending on the circumstances. Now let's try to generalize it to the generic case:
time complexity is O((n/(l*e))!) where l = min length of words in wordDict, e = cache effectiveness (I would assume it is 1 in the general case, but there might be situations where it's different, as we saw in the case above)
space complexity is O(a^n) where a is a measure of the similarity of words in our wordDict; it could be very roughly estimated as P(h/l)=(h/l)!, where h is the max word length in the dictionary and l is the min word length (for example, if wordDict contains all combinations of up to 3 letters, this gives us 3! combinations for every 6 letters)
This is how I see your approach and it's complexity.
As for improving the solution itself, I don't see any simple way to improve it. There might be an alternative way to divide the string into 3 parts and then process each part separately, but it would definitely work if we could get rid of producing the results and just count the number of results without displaying them.
I hope it helps.
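For reference, the memoized approach from the question can be sketched in Python (function names are mine, not from the question); it recurses on the suffix and caches all sentences per suffix, just like the C++ memo map:

```python
from functools import lru_cache

def word_break(s, word_dict):
    words = set(word_dict)

    @lru_cache(maxsize=None)
    def sentences(suffix):
        # All ways to split `suffix` into dictionary words.
        if not suffix:
            return [""]
        result = []
        for i in range(1, len(suffix) + 1):
            word = suffix[:i]
            if word in words:
                for rest in sentences(suffix[i:]):
                    result.append(word + (" " + rest if rest else ""))
        return result

    return sentences(s)

out = word_break("catsanddog", ["cat", "cats", "and", "sand", "dog"])
assert sorted(out) == ["cat sand dog", "cats and dog"]
```
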

Finding the longest sub-string with no repetition in a string. Time Complexity?

I recently interviewed with a company for a software engineering position. I was asked the question of the longest unique substring in a string. My algorithm was as follows:
Start from the left-most character, and keep storing the characters in a hash table with the key as the character and the value as the index_where_it_last_occurred. Add the character to the answer string as long as it's not present in the hash table. If we encounter a stored character again, I stop and note down the length. I empty the hash table and then start again from the right index of the repeated character. The right index is retrieved from the (index_where_it_last_occurred) flag. If I ever reach the end of the string, I stop and return the longest length.
For example, say the string was, abcdecfg.
I start with a, and store it in the hash table. I store b, and so on till e; their indexes are stored as well. When I encounter c again, I stop, since it's already hashed, and note down the length, which is 5. I empty the hash table and start again from the right index of the repeated character. The repeated character being c, I start again from position 3, i.e., the character d. I keep doing this until I reach the end of the string.
I am interested in knowing what the time complexity of this algorithm will be. IMO, it'll be O(n^2).
This is the code.
import java.util.*;

public class longest
{
    static int longest_length = -1;

    public static void main(String[] args)
    {
        Scanner in = new Scanner(System.in);
        String str = in.nextLine();
        calc(str, 0);
        System.out.println(longest_length);
    }

    public static void calc(String str, int index)
    {
        if (index >= str.length()) return;
        int temp_length = 0;
        LinkedHashMap<Character, Integer> map = new LinkedHashMap<Character, Integer>();
        for (int i = index; i < str.length(); i++)
        {
            if (!map.containsKey(str.charAt(i)))
            {
                map.put(str.charAt(i), i);
                ++temp_length;
            }
            else
            {
                if (longest_length < temp_length)
                {
                    longest_length = temp_length;
                }
                int last_index = map.get(str.charAt(i));
                // System.out.println(last_index);
                calc(str, last_index + 1);
                break;
            }
        }
        if (longest_length < temp_length)
            longest_length = temp_length;
    }
}
If the alphabet is of size K, then when you restart counting you jump back at most K-1 places, so you read each character of the string at most K times. So the algorithm is O(nK).
An input string containing n/K copies of the alphabet exhibits this worst-case behavior. For example, if the alphabet is {a, b, c}, strings of the form "abcabcabc...abc" have the property that nearly every character is read 3 times by your algorithm.
You can solve the original problem in O(K+n) time, using O(K) storage space by using dynamic programming.
Let the string be s. We'll keep a number M, which will be the length of the longest unique-character substring ending at the current index i; P, which stores the index where each character was previously seen; and best, the length of the longest unique-character substring found so far.
Start:
Set P[c] = -1 for each c in the alphabet.
M = 0
best = 0
Then, for each i:
M = min(M+1, i-P[s[i]])
best = max(best, M)
P[s[i]] = i
This is trivially O(K) in storage, and O(K+n) in running time.
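A sketch of the algorithm above in Python (my own rendering of the pseudocode), using a dict for P so that unseen characters behave as if last seen at index -1:

```python
def longest_unique_substring_length(s):
    P = {}    # last index at which each character was seen
    M = 0     # length of the longest unique-char run ending here
    best = 0
    for i, c in enumerate(s):
        # Either extend the previous run, or cut it just after the
        # previous occurrence of c, whichever is shorter.
        M = min(M + 1, i - P.get(c, -1))
        best = max(best, M)
        P[c] = i
    return best

assert longest_unique_substring_length("abcdecfg") == 5  # "decfg"
assert longest_unique_substring_length("abcabcbb") == 3  # "abc"
```
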

Valid Permutation of Parenthesis [duplicate]

Possible Duplicate:
Solution to a recursive problem (code kata)
Give an algorithm to find all valid permutations of parentheses for a given n.
For example, for n=3 the output should be:
{}{}{}
{{{}}}
{{}}{}
{}{{}}
{{}{}}
Overview of the problem
This is a classic combinatorial problem that manifests itself in many different ways. These problems are essentially identical:
Generating all possible ways to balance N pairs of parentheses (i.e. this problem)
Generating all possible ways to apply a binary operator to N+1 factors
Generating all full binary trees with N+1 leaves
Many others...
See also
Wikipedia/Catalan number
On-Line Encyclopedia of Integer Sequences/A000108
A straightforward recursive solution
Here's a simple recursive algorithm to solve this problem in Java:
public class Parenthesis {
    static void brackets(int openStock, int closeStock, String s) {
        if (openStock == 0 && closeStock == 0) {
            System.out.println(s);
        }
        if (openStock > 0) {
            brackets(openStock - 1, closeStock + 1, s + "<");
        }
        if (closeStock > 0) {
            brackets(openStock, closeStock - 1, s + ">");
        }
    }

    public static void main(String[] args) {
        brackets(3, 0, "");
    }
}
The above prints (as seen on ideone.com):
<<<>>>
<<><>>
<<>><>
<><<>>
<><><>
Essentially we keep track of how many open and close parentheses are "on stock" for us to use as we're building the string recursively.
If there's nothing on stock, the string is fully built and you can just print it out
If there's an open parenthesis available on stock, try and add it on.
Now you have one less open parenthesis, but one more close parenthesis to balance it out
If there's a close parenthesis available on stock, try and add it on.
Now you have one less close parenthesis
Note that if you swap the order of the recursion such that you try to add a close parenthesis before you try to add an open parenthesis, you simply get the same list of balanced parenthesis but in reverse order! (see on ideone.com).
An "optimized" variant
The above solution is very straightforward and instructive, but can be optimized further.
The most important optimization is in the string building aspect. Although it looks like a simple string concatenation on the surface, the above solution actually has a "hidden" O(N^2) string building component (because concatenating one character to an immutable String of length N is an O(N) operation). Generally we optimize this by using a mutable StringBuilder instead, but for this particular case we can also simply use a fixed-size char[] and an index variable.
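The same pitfall exists in any language with immutable strings. A Python sketch of the contrast (my own illustration; the buffered version is the analogue of StringBuilder):

```python
def build_naive(parts):
    s = ""
    for p in parts:
        s = s + p      # copies all of s each time: O(N^2) total work
    return s

def build_buffered(parts):
    buf = []
    for p in parts:
        buf.append(p)  # amortised O(1) per piece
    return "".join(buf)  # one final O(N) pass

parts = ["<"] * 3 + [">"] * 3
assert build_naive(parts) == build_buffered(parts) == "<<<>>>"
```
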
We can also optimize by simplifying the recursion tree. Instead of recursing "both ways" as in the original solution, we can just recurse "one way", and do the "other way" iteratively.
In the following, we've done both optimizations, using char[] and index instead of String, and recursing only to add open parentheses, adding close parentheses iteratively: (see also on ideone.com)
public class Parenthesis2 {
    public static void main(String[] args) {
        brackets(4);
    }

    static void brackets(final int N) {
        brackets(N, 0, 0, new char[N * 2]);
    }

    static void brackets(int openStock, int closeStock, int index, char[] arr) {
        while (closeStock >= 0) {
            if (openStock > 0) {
                arr[index] = '<';
                brackets(openStock - 1, closeStock + 1, index + 1, arr);
            }
            if (closeStock-- > 0) {
                arr[index++] = '>';
                if (index == arr.length) {
                    System.out.println(arr);
                }
            }
        }
    }
}
The recursion logic is less obvious now, but the two optimization techniques are instructive.
Related questions
Checking string has balanced parentheses
Basic Recursion, Check Balanced Parenthesis
The possible number of binary search trees that can be created with N keys is given by the Nth catalan number. Why?
While not an actual algorithm, a good starting point is Catalan numbers:
Reference
http://en.wikipedia.org/wiki/Catalan_number
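As a quick illustration of why Catalan numbers are the right count, they satisfy the recurrence C(0)=1, C(n) = sum over i of C(i)*C(n-1-i) (split a balanced string at the close of its first parenthesis pair); a small sketch:

```python
def catalan(n):
    # C[k] = number of balanced strings of k parenthesis pairs.
    C = [1] * (n + 1)
    for k in range(1, n + 1):
        C[k] = sum(C[i] * C[k - 1 - i] for i in range(k))
    return C[n]

# n = 3 gives 5, matching the five strings listed in the question.
assert [catalan(n) for n in range(6)] == [1, 1, 2, 5, 14, 42]
```
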
Eric Lippert recently blogged about this in his article Every Tree There Is. The article refers to code written in the previous article Every Binary Tree There Is.
If you can enumerate all the binary trees then it turns out you can enumerate all the solutions to dozens of different equivalent problems.
A non-recursive solution in Python:
#!/usr/bin/python3
def valid(state, N):
    cnt = 0
    for i in range(N):
        if state & (1 << i):
            cnt += 1
        else:
            cnt -= 1
        if cnt < 0:  # a close brace with no matching open brace before it
            return False
    return cnt == 0

def make_string(state, N):
    ret = ""
    for i in range(N):
        if state & (1 << i):
            ret += '{'
        else:
            ret += '}'
    return ret

def all_permuts(N):
    N *= 2
    return [make_string(state, N) for state in range(1 << N) if valid(state, N)]

if __name__ == '__main__':
    print("\n".join(all_permuts(3)))
This basically examines the binary representation of each number in [0, 2^(2n)), treating a '1' as a '{' and a '0' as a '}', and then keeps only those that are properly balanced.

String Tiling Algorithm

I'm looking for an efficient algorithm to do string tiling. Basically, you are given a list of strings, say BCD, CDE, ABC, A, and the resulting tiled string should be ABCDE, because BCD aligns with CDE yielding BCDE, which is then aligned with ABC yielding the final ABCDE.
Currently, I'm using a slightly naïve algorithm, that works as follows. Starting with a random pair of strings, say BCD and CDE, I use the following (in Java):
public static String tile(String first, String second) {
    for (int i = 0; i < first.length() || i < second.length(); i++) {
        // "right" tile (e.g., "BCD" and "CDE")
        String firstTile = first.substring(i);
        // "left" tile (e.g., "CDE" and "BCD")
        String secondTile = second.substring(i);
        if (second.contains(firstTile)) {
            return first.substring(0, i) + second;
        } else if (first.contains(secondTile)) {
            return second.substring(0, i) + first;
        }
    }
    return EMPTY;
}
System.out.println(tile("CDE", "ABCDEF")); // ABCDEF
System.out.println(tile("BCD", "CDE")); // BCDE
System.out.println(tile("CDE", "ABC")); // ABCDE
System.out.println(tile("ABC", tile("BCX", "XYZ"))); // ABCXYZ
Although this works, it's not very efficient, as it iterates over the same characters over and over again.
So, does anybody know a better (more efficient) algorithm to do this ? This problem is similar to a DNA sequence alignment problem, so any advice from someone in this field (and others, of course) are very much welcome. Also note that I'm not looking for an alignment, but a tiling, because I require a full overlap of one of the strings over the other.
I'm currently looking for an adaptation of the Rabin-Karp algorithm, in order to improve the asymptotic complexity of the algorithm, but I'd like to hear some advice before delving any further into this matter.
Thanks in advance.
For situations where there is ambiguity -- e.g., {ABC, CBA}, which could result in ABCBA or CBABC -- any tiling can be returned. However, this situation seldom occurs, because I'm tiling words, e.g. {This is, is me} => {This is me}, which are manipulated so that the aforementioned algorithm works.
Similar question: Efficient Algorithm for String Concatenation with Overlap
Order the strings by the first character, then length (smallest to largest), and then apply the adaptation to KMP found in this question about concatenating overlapping strings.
I think this should work for the tiling of two strings, and be more efficient than your current implementation using substring and contains. Conceptually I loop across the characters in the 'left' string and compare them to a character in the 'right' string. If the two characters match, I move to the next character in the right string. Depending on which string the end is first reached of, and if the last compared characters match or not, one of the possible tiling cases is identified.
I haven't thought of anything to improve the time complexity of tiling more than two strings. As a small note for multiple strings: the algorithm below is easily extended to check the tiling of a single 'left' string against multiple 'right' strings at once, which might save a bit of extra looping over the strings if you're trying to find out whether to do ("ABC", "BCX", "XYZ") or ("ABC", "XYZ", "BCX") by just trying all the possibilities.
string Tile(string a, string b)
{
    // Try both orderings of a and b,
    // since TileLeftToRight is not commutative.
    string ab = TileLeftToRight(a, b);
    if (ab != "")
        return ab;
    return TileLeftToRight(b, a);
    // Alternatively you could return whichever
    // of the two results is longest, for cases
    // like ("ABC" "BCABC").
}

string TileLeftToRight(string left, string right)
{
    int i = 0;
    int j = 0;
    while (true)
    {
        if (left[i] != right[j])
        {
            // On a mismatch, restart the match one position after
            // where the current attempt began; without resetting j
            // here, overlaps such as ("AXA", "AB") would be missed.
            i = i - j + 1;
            j = 0;
            if (i >= left.Length)
                return "";
        }
        else
        {
            i++;
            j++;
            if (i >= left.Length)
                return left + right.Substring(j);
            if (j >= right.Length)
                return left;
        }
    }
}
If Open Source code is acceptable, then you should check the genome benchmarks in Stanford's STAMP benchmark suite: it does pretty much exactly what you're looking for. Starting with a bunch of strings ("genes"), it looks for the shortest string that incorporates all the genes. So for example if you have ATGC and GCAA, it'll find ATGCAA. There's nothing about the algorithm that limits it to a 4-character alphabet, so this should be able to help you.
The first thing to ask is what you expect for the tiling of {CDB, CDA}: there is no single tiling.
Interesting problem. You need some kind of backtracking. For example if you have:
ABC, BCD, DBC
Combining DBC with BCD results in:
ABC, DBCD
Which is not solvable. But combining ABC with BCD results in:
ABCD, DBC
Which can be combined to:
ABCDBC.
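The backtracking idea above can be sketched as a brute force that tries every ordering of the strings and keeps any order in which all merges succeed (the merge helper here is my own naive overlap merge, not an efficient algorithm):

```python
from itertools import permutations

def merge(a, b):
    # Tile b onto the end of a: containment, or the longest suffix of
    # a that is a prefix of b. Returns None if there is no overlap.
    if b in a:
        return a
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None

def tile_all(strings):
    # Backtracking by brute force: try every order of the strings.
    for order in permutations(strings):
        s = order[0]
        ok = True
        for t in order[1:]:
            m = merge(s, t) or merge(t, s)
            if m is None:
                ok = False
                break
            s = m
        if ok:
            return s
    return None

# The example from the answer: combining ABC with BCD first succeeds.
assert tile_all(["ABC", "BCD", "DBC"]) == "ABCDBC"
```
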