How to find loop/repetition in a data stream? - algorithm

I came across an interesting question in an interview, but I couldn't answer it, nor could I find it on Google.
The question is as follows:
You are given a data stream. Using variable declarations, how can you find whether there is any repetition or loop in the data?
Examples of the data stream are:
100100100100
0001000100010001
100100010001
10...0010....010....01 (where 0....0 is 0^10^10^10)
How can this problem be solved? Is there any algorithm for such kind of problem?

I think there are two approaches to this problem:
1. Longest repeated substring problem
This is a well-known problem with a linear-time solution: construct a suffix tree for your string, then analyze it.
Please check this article for details.
2. Repeated substring problem (any)
You can modify the longest-repeated-substring approach to find any repeated substring.
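As a rough illustration of approach 1, here is a minimal Python sketch. It swaps the suffix tree for a naive suffix array (plain sorting of all suffixes), which is much simpler but not linear-time; the key fact it relies on is that the longest repeated substring is the longest common prefix of some pair of adjacent suffixes in sorted order. The function name `longest_repeated_substring` is hypothetical, and overlapping occurrences are allowed.

```python
def longest_repeated_substring(s: str) -> str:
    """Longest substring occurring at least twice in s (overlaps allowed).

    Naive suffix-array sketch: sorting full suffixes is O(n^2 log n) in the
    worst case, unlike the linear-time suffix-tree construction.
    """
    # Start indices of all suffixes, sorted lexicographically by suffix.
    suffixes = sorted(range(len(s)), key=lambda i: s[i:])
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        # Length of the common prefix of two neighbouring suffixes.
        k = 0
        while a + k < len(s) and b + k < len(s) and s[a + k] == s[b + k]:
            k += 1
        if k > len(best):
            best = s[a:a + k]
    return best
```

For the first example stream, `longest_repeated_substring("100100100100")` returns `"100100100"`, which occurs at positions 0 and 3; a long repeat whose occurrences overlap like this is one sign that the stream is periodic.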

The brute-force solution would be to use a map or dictionary for that; e.g., for the stream 100100100100 it would be:
dict["1"]++
dict["10"]++
dict["100"]++
dict["1001"]++
and so on, up to the maximum repetition length you want to find. Then drop the first symbol and repeat; i.e., the leading 1 is dropped and 00100100100 is left to analyze:
dict["0"]++
dict["00"]++
dict["001"]++
dict["0010"]++
etc.
At the end, iterate over the map and print all keys whose count is greater than one.
There are more efficient algorithms, but this is the easiest, I guess.
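The dictionary-counting procedure above can be sketched in Python roughly as follows. This is a minimal version under the assumption that the whole stream fits in memory as a string; `repeated_substrings` and its `max_len` parameter (the "max length of the repetition to find") are hypothetical names.

```python
from collections import Counter
from typing import Dict


def repeated_substrings(stream: str, max_len: int) -> Dict[str, int]:
    """Count every substring of length 1..max_len; keep those seen more than once."""
    counts: Counter = Counter()
    for start in range(len(stream)):  # each pass drops one more leading symbol
        for length in range(1, max_len + 1):
            if start + length > len(stream):
                break
            # Same as dict["1"]++, dict["10"]++, dict["100"]++, ...
            counts[stream[start:start + length]] += 1
    return {sub: n for sub, n in counts.items() if n > 1}
```

For example, `repeated_substrings("100100100100", 4)` reports `"100"` with a count of 4 and `"1001"` with a count of 3. The cost is O(n * max_len) substring operations, which is why the suffix-tree approaches scale better.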

Related

Reordering the alphabet so a name comes first in lexicographical order, in the fastest way

Suppose we have a list of people's names, no two of which are the same. The maximum size of the list is:
Now the goal is to find out how many names (and which ones!) can come first in lexicographical order if we change the order of the English alphabet.
For instance, if the list is:
ha haa st
then by changing the alphabet we can bring ha or st to first place, but no matter how we change it, haa will always come after ha (a word always sorts after its own prefix), so two names can come first.
Of course, there is a brute-force way to find the answer, but it requires checking all 26! possible alphabet orders for each word! Since the time limit on this problem is 1 second, I think an algorithm with O(n log n) or lower would do fine. However, I don't know how to approach the problem. I think using a trie would be helpful (since I encountered the problem while learning data structures!), but maybe graph algorithms could also help.
How can I find the right algorithm and approach to this problem and how to implement it in code?
Let w be that first word.
If we change the alphabet letters, the first word keeps its length. Any name that can come in first place must be of length ...length(w).
Let L be the candidate words above. All the names are different, as per the initial formulation, so L is also made of unique names.
Only names in L are solutions, and any name in L is a solution. The answer to your problem is the size of L.
tl;dr: count all the words of length length(w).

How to solve the recurrence relation question below

Consider a recursive algorithm that breaks a given problem into five parts. Of these five parts, the algorithm uses three and discards two. The chosen parts are broken into five again, and the same process is repeated recursively until the problem size is 1. Once the problem size is 1, the individual parts are recombined.
Write a recurrence relation for the above algorithm. Please state your assumptions.
Solve the recurrence relation developed in part 1 above using the Substitution Method. Specify the guess and the method you used in deciding that guess.
Please let me know the answer even if you only know part 1.
Thank you!
This depends on what you mean by the "parts" of the problem. If the initial problem is some sort of data structure, particularly an array of numbers, then in the first step you would divide the structure into five categories based on certain properties of the numbers, then discard two of those categories and repeat the process on the remaining three. Just to be clear, is this the exact wording of a homework problem you were given? It would be good to have more information.
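To make the dependence on assumptions concrete, here is one worked possibility, not the asker's official answer. Assume (hypothetically) that the five parts are equal, so each kept part has size n/5 (n a power of 5), and that splitting plus recombining costs linear time c·n at each call. The guess comes from the recursion tree: level i has 3^i subproblems of size n/5^i, so its total cost shrinks geometrically and the root dominates.

```latex
\begin{align*}
T(n) &= 3\,T(n/5) + c\,n, \qquad T(1) = c \\
\text{cost of level } i &= 3^i \cdot c\,\frac{n}{5^i} = c\,n\left(\tfrac{3}{5}\right)^i
  \quad\Rightarrow\quad \text{guess } T(n) \le d\,n \\
T(n) &\le 3\,d\,\frac{n}{5} + c\,n = \left(\tfrac{3d}{5} + c\right) n \le d\,n
  \quad\text{whenever } d \ge \tfrac{5c}{2} \\
T(1) &= c \le d, \qquad T(n) \ge c\,n \quad\Rightarrow\quad T(n) = \Theta(n)
\end{align*}
```

If instead the split and recombination cost only a constant, T(n) = 3T(n/5) + c, the same method (with the strengthened guess T(n) ≤ d·n^{log_5 3} - e, using 3·5^{-log_5 3} = 1) gives T(n) = Θ(n^{log_5 3}), roughly n^{0.68}.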

Closure Number Method for the Generate Parentheses Problem

The standard Generate Parentheses question on LeetCode is as follows:
Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example, given n = 3, a solution set is:
[
"((()))",
"(()())",
"(())()",
"()(())",
"()()()"
]
In the Solution tab they explain the Closure Number method, which I find difficult to understand.
I did a dry run of the code and even got the correct answer, but I can't see why it works. What is the intuition behind this method?
Any help would be greatly appreciated!
The basic idea of this algorithm is dynamic programming: you divide your problem into smaller problems that are easy to solve. In this example you make the sub-problems so small that the solution is either an empty string (for size 0) or "()" (for size 1).
You start from the knowledge that, if you want the parentheses of a given length, the first character needs to be "(" and somewhere later in the string there needs to be the matching ")". Otherwise the output is not valid.
You don't know the position of that closing parenthesis, so you just try every position (the first for loop).
The second thing you know is that between the opening and the closing parenthesis, and after the closing parenthesis, there has to be something whose exact shape you don't know (because there are many possibilities), but which must again be a valid parenthesis sequence.
That is just the problem you are already solving, with a smaller input size. So you plug in every valid parenthesis sequence of that smaller size, and since that is exactly what your algorithm does, you can get them with a recursive call.
To summarize: you know one part of the problem, and the rest is the same problem at a smaller size. So you solve the small part you know, recursively call the same method on the rest, and then put it all together to get your solution.
Dynamic programming is usually not that easy to understand, but it is very powerful. So don't worry if you don't understand it right away. Solving puzzles like these is the best way to learn dynamic programming.
The closure number of a sequence describes the smallest prefix of the sequence that is a valid sequence on its own: a closure number of c means that prefix is S[0..2c+1].
So if a sequence has a closure number of c, you know that index 0 holds '(' and index 2c+1 holds its matching ')'.
The method solves the problem by trying every possible size of such a prefix. For each one, it splits the sequence into the prefix (without its outer '(' and ')') and the rest of the sequence, and solves the two sub-problems recursively.
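A minimal Python sketch of the closure-number recursion described above (not LeetCode's exact code; the names `generate_parenthesis` and `_generate` are hypothetical). For each closure number c, the first "(" wraps a valid sequence of c pairs, and a valid sequence of the remaining n - 1 - c pairs follows the matching ")". Memoization makes the reuse of smaller answers explicit.

```python
from functools import lru_cache
from typing import List, Tuple


@lru_cache(maxsize=None)
def _generate(n: int) -> Tuple[str, ...]:
    """All valid sequences of n pairs, cached so smaller sizes are reused."""
    if n == 0:
        return ("",)
    result = []
    for c in range(n):  # c = closure number: pairs inside the first "(...)"
        for inside in _generate(c):  # fills indices 1 .. 2c
            for rest in _generate(n - 1 - c):  # everything after index 2c+1
                result.append("(" + inside + ")" + rest)
    return tuple(result)


def generate_parenthesis(n: int) -> List[str]:
    return list(_generate(n))
```

`generate_parenthesis(3)` yields the same five strings as the example above, in a different order, and each string is produced exactly once because every valid sequence has exactly one closure number.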

Matching a set of strings against a string to maximize the number of possible matches

I have a very interesting problem.
I have a set of strings, and I would like to know how to best match a combination of these strings in another string against a maximization function.
For example, say I have the set:
['aabbcaa', 'bbc']
and I have the string
'fgabbcdaabbcaaef'
and the possible matches for this are:
fga[bbc]daa[bbc]aaef
or
fga[bbc]d[aabbcaa]ef
Now, given a simple maximization function, I would say that fga[bbc]d[aabbcaa]ef is the winner because it matches more characters in total (10 vs. 6). A different maximization function could give more weight to replacing longer words instead of counting total characters.
I would love to know if someone could point me to some algorithms for this. What stumps me is this: after I find a set of potential matches, I'm not sure how to efficiently choose the set of words that maximizes the score.
The dictionary, the words of the dictionary, and the word that’s being matched against, could be of any size.
Would appreciate any help I could get with this. Thank you!
I found the answer, and it works nicely. The pseudocode is:
Loop over the set and find every place where a set string matches in the target string. Store the start_index and end_index, and give that match a score. I currently use the length of the string.
Then run all the matches found through the "Weighted Interval Scheduling" algorithm to find the optimal set of non-overlapping matches.
https://courses.cs.washington.edu/courses/cse521/13wi/slides/06dp-sched.pdf
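The two steps above can be sketched in Python as follows. This is a minimal version assuming matches may not overlap and that the default score is the string length, as in the answer; `find_matches`, `best_matches` and the `score` parameter are hypothetical names. The second function is the textbook weighted interval scheduling DP: sort matches by end index, and for each one take the better of skipping it or taking it plus the best solution among matches that end before it starts.

```python
import bisect
from typing import Callable, List, Sequence, Tuple

Match = Tuple[int, int, str]  # (start_index, end_index exclusive, matched string)


def find_matches(words: Sequence[str], target: str) -> List[Match]:
    """Every (possibly overlapping) occurrence of every word in target."""
    matches: List[Match] = []
    for w in words:
        start = target.find(w)
        while start != -1:
            matches.append((start, start + len(w), w))
            start = target.find(w, start + 1)
    return matches


def best_matches(words: Sequence[str], target: str,
                 score: Callable[[str], int] = len) -> Tuple[int, List[Match]]:
    """Weighted interval scheduling over the matches: max total score, no overlaps."""
    matches = sorted(find_matches(words, target), key=lambda m: m[1])  # by end
    ends = [m[1] for m in matches]

    def compatible(i: int) -> int:
        # Number of earlier matches that end at or before match i starts.
        return bisect.bisect_right(ends, matches[i][0], 0, i)

    # best[i] = best total score using only the first i matches (by end index).
    best = [0] * (len(matches) + 1)
    for i, (_, _, w) in enumerate(matches):
        best[i + 1] = max(best[i], score(w) + best[compatible(i)])

    # Walk back through the table to recover which matches were taken.
    chosen: List[Match] = []
    i = len(matches)
    while i > 0:
        if score(matches[i - 1][2]) + best[compatible(i - 1)] > best[i - 1]:
            chosen.append(matches[i - 1])
            i = compatible(i - 1)
        else:
            i -= 1
    return best[-1], chosen[::-1]
```

On the example, `best_matches(['aabbcaa', 'bbc'], 'fgabbcdaabbcaaef')` returns a score of 10 with the matches `(3, 6, 'bbc')` and `(7, 14, 'aabbcaa')`. Passing a different `score` function (for example one that favours longer words) changes the weighting without touching the DP.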

Binary search implementation with actors in scala?

So I have the following problem in Scala: I need to implement binary search with the help of actors, with no loops and no recursion, preferably with concurrency between actors. Of course it doesn't make much sense, but that is the problem. I think it would be nice to have one coordinator actor, which coordinates the work of the others. The input data is a sorted array and the key to search for; the output is the index of the key. Do you have any ideas how to implement this?
Thanks in advance.
I'm not sure how you could have concurrency for binary search, as every step of the algorithm needs the result of the last one.
You could do an "n-ary" search: split the array into n parts and let every actor compare the value at the boundaries of the sub-arrays. You don't even have to wait for all the answers: as soon as two adjacent actors report different comparison results, you can start the next round recursively on the sub-array between them.
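Here is a rough sketch of the n-ary idea, written in Python rather than Scala so that it stays self-contained: a thread pool stands in for the actors and the calling function plays the coordinator. It uses an ordinary loop and waits for all boundary results in each round, so it illustrates the search pattern, not the no-loop actor design the question asks for. `n_ary_search` and its parameter `n` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Sequence


def n_ary_search(arr: Sequence[int], key: int, n: int = 4) -> Optional[int]:
    """Return an index of key in the sorted sequence arr, or None if absent.

    Each round splits arr[lo:hi] into n parts and compares the n - 1 interior
    boundary values with key concurrently (threads stand in for actors here).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    lo, hi = 0, len(arr)  # invariant: if key is present, one occurrence is in arr[lo:hi]
    with ThreadPoolExecutor(max_workers=n - 1) as pool:
        while hi - lo > n:
            step = (hi - lo) // n
            bounds = [lo + k * step for k in range(1, n)]
            # One task per boundary; because arr is sorted, the results are
            # monotone: True ... True, False ... False.
            is_le = list(pool.map(lambda i: arr[i] <= key, bounds))
            new_lo, new_hi = lo, hi
            for b, le in zip(bounds, is_le):
                if le:
                    new_lo = b  # key (if present) is at index b or later
                else:
                    new_hi = b  # first boundary whose value exceeds key
                    break
            lo, hi = new_lo, new_hi
    # Only a few elements remain, so finish with a plain scan.
    for i in range(lo, hi):
        if arr[i] == key:
            return i
    return None
```

Each round shrinks the range by roughly a factor of n, so the number of rounds is about log_n of the array size. That is the only gain over binary search, since each round still depends on the previous one.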
