This was asked in an interview:
"What is the most efficient way to implement a shuffle function in a music player to play random songs without repetition?"
I suggested a linked-list approach: keep the songs in a linked list, generate a random number, and remove that item/song from the list (this way we ensure that no song is repeated).
Then I suggested a bit-vector approach, but he wasn't satisfied at all.
So what, according to you, is the best approach to implement such a function?
Below are some implementations. I also had difficulties during the interview, but after the interview I saw that the solution is simple.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MusicTrackProgram {
    // O(n) in-place swapping (Fisher–Yates)
    public static List<MusicTrack> shuffle3(List<MusicTrack> input) {
        Random random = new Random();
        int last = input.size() - 1;
        while (last > 0) {
            // nextInt(bound) picks uniformly from [0, last] and avoids the
            // modulo bias of Math.abs(random.nextInt() % size)
            int randomInt = random.nextInt(last + 1);
            // O(1)
            MusicTrack randomTrack = input.get(randomInt);
            MusicTrack temp = input.get(last);
            // O(1)
            input.set(last, randomTrack);
            input.set(randomInt, temp);
            --last;
        }
        return input;
    }
    // expected O(n log n) because of retries (coupon collector), and needs an extra field
    public static List<MusicTrack> shuffle(List<MusicTrack> input) {
        List<MusicTrack> result = new ArrayList<>();
        Random random = new Random();
        while (result.size() != input.size()) {
            int randomInt = random.nextInt(input.size());
            // O(1)
            MusicTrack randomTrack = input.get(randomInt);
            if (randomTrack.isUsed) {
                continue;
            }
            // O(1)
            result.add(randomTrack);
            randomTrack.isUsed = true;
        }
        return result;
    }
    // very inefficient: O(n^2) because of the remove
    public static List<MusicTrack> shuffle2(List<MusicTrack> input) {
        List<MusicTrack> result = new ArrayList<>();
        Random random = new Random();
        while (!input.isEmpty()) {
            int randomInt = random.nextInt(input.size());
            // O(1)
            MusicTrack randomTrack = input.get(randomInt);
            // O(1)
            result.add(randomTrack);
            // O(n)
            input.remove(randomTrack);
        }
        return result;
    }
    public static void main(String[] args) {
        List<MusicTrack> musicTracks = MusicTrackFactory.generate(1000000);
        List<MusicTrack> result = shuffle3(musicTracks);
        result.stream().forEach(x -> System.out.println(x.getName()));
    }
}
There is no perfect answer; I guess this sort of question is aimed at starting a discussion. Most likely your interviewer wanted to hear about the Fisher–Yates shuffle (aka the Knuth shuffle).
Here is a brief outline from the wiki:
Write down the numbers from 1 through N.
Pick a random number k between one and the number of unstruck numbers remaining (inclusive).
Counting from the low end, strike out the kth number not yet struck out, and write it down elsewhere.
Repeat from step 2 until all the numbers have been struck out.
The sequence of numbers written down in step 3 is now a random permutation of the original numbers.
You should mention its inefficiencies and benefits, how you could improve it, throw in a few lines of code, and discuss what and how you would test this code.
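For reference, here is a minimal sketch of the modern in-place variant (a generic method; the class and method names are mine, not from the interview):

import java.util.List;
import java.util.Random;

public final class FisherYates {
    private static final Random RANDOM = new Random();

    // Modern Fisher–Yates: walk from the end, swapping each position with a
    // uniformly chosen position at or before it. O(n) time, O(1) extra space.
    public static <T> void shuffle(List<T> items) {
        for (int i = items.size() - 1; i > 0; i--) {
            int j = RANDOM.nextInt(i + 1); // uniform in [0, i]
            T tmp = items.get(i);
            items.set(i, items.get(j));
            items.set(j, tmp);
        }
    }
}

In production Java you would normally just call Collections.shuffle(list), which implements exactly this algorithm.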
We can use a linked list and a queue to implement song shuffling in an MP3 player.
We can extend this to the following functionality:
Add a new song
Delete a song
Randomly play a song
Add a song to the play queue
Suppose initially we have 6 songs stored as a linked list.
The linked list has 2 pointers: start and end.
totalSongCount = 6
Randomly play a song:
We generate a random number between 1 and totalSongCount. Let this be 4.
We remove the node representing song 4 and keep it after the end pointer.
We decrement totalSongCount (totalSongCount--).
Next time the random number will be generated between 1 and 5, since we decremented totalSongCount; we can repeat the process (see the sketch after this list).
To add a new song, just add it to the linked list at the beginning and make it the head, then increment totalSongCount (totalSongCount++).
To delete a song, first find it and delete it. Also keep track of whether it is after the end pointer; if it is not, decrement totalSongCount (totalSongCount--).
The selected song has two options:
either play it at that moment, or
add it to a playlist (a separate queue).
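Here is a minimal sketch of this move-past-end idea, assuming an array-backed list for brevity (a linked list is the same logic with pointer splicing; the class and field names are mine):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Songs at indices [0, liveCount) are unplayed; played songs sit past the "end pointer".
class ShufflingPlayer {
    private final List<String> songs = new ArrayList<>();
    private final Random random = new Random();
    private int liveCount = 0;

    void addSong(String song) {
        songs.add(liveCount, song); // insert into the unplayed region
        liveCount++;
    }

    String playRandom() {
        if (liveCount == 0) {
            liveCount = songs.size(); // every song played once: start a new round
        }
        int pick = random.nextInt(liveCount);
        String song = songs.get(pick);
        // move the picked song past the end pointer so it cannot repeat this round
        songs.set(pick, songs.get(liveCount - 1));
        songs.set(liveCount - 1, song);
        liveCount--;
        return song;
    }
}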
I think the solution below should work:

import java.util.LinkedList;
import java.util.Queue;
import java.util.Random;

class InvalidInput extends Exception {
    public InvalidInput(String str) {
        super(str);
    }
}

class SongShuffler {
    String[] songName;
    int cooldownPeriod;
    Queue<String> queue;
    int lastIndex;
    Random random;

    public SongShuffler(String[] arr, int k) throws InvalidInput {
        if (arr.length < k)
            throw new InvalidInput("Arr length should be greater than k");
        songName = arr;
        cooldownPeriod = k;
        queue = new LinkedList<String>();
        lastIndex = arr.length - 1;
        random = new Random();
    }

    public String getSong() {
        if (queue.size() == cooldownPeriod) {
            // this song has served its cooldown: return it to the pickable region
            String s = queue.poll();
            songName[lastIndex + 1] = s;
            lastIndex++;
        }
        int ind = random.nextInt(lastIndex + 1); // +1 so the song at lastIndex can also be picked
        String ans = songName[ind];
        queue.add(ans);
        songName[ind] = songName[lastIndex];
        lastIndex--;
        return ans;
    }
}
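A quick usage sketch of the class above (the song names are placeholders of mine):

SongShuffler shuffler = new SongShuffler(
        new String[] {"songA", "songB", "songC", "songD"}, 2);
for (int i = 0; i < 6; i++) {
    System.out.println(shuffler.getSong()); // no song repeats within any window of 2 picks
}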
For example, the string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc.
What's a good algorithm for generating the permutations? (And what's its time complexity?)
For a multiset, you can solve recursively by position (JavaScript code):
function f(multiset, counters, result) {
    if (counters.every(x => x === 0)) {
        console.log(result);
        return;
    }
    for (var i = 0; i < counters.length; i++) {
        if (counters[i] > 0) {
            var _counters = counters.slice(); // 'var' keeps this local rather than an implicit global
            _counters[i]--;
            f(multiset, _counters, result + multiset[i]);
        }
    }
}

f(['A','B'], [3,3], '');
This is not a full answer, just an idea.
If your strings have a fixed count of only two letters, I'd go with a binary tree and a good recursive function.
Each node is an object that contains a name (the parent's name plus the suffix A or B) and, furthermore, the counts of A and B letters in that name.
The node constructor gets the parent's name and the parent's A and B counts, so it only needs to add 1 to the count of A or B and one letter to the name.
It doesn't construct the next node if there are already more than three A's (in the case of an A node) or B's respectively, or if their sum equals the length of the starting string.
Now you can collect the leaves of the 2 trees (their names) and you have all the permutations that you need.
Scala or some functional language (with object-like features) would be perfect for implementing this algorithm. Hope this helps or just sparks some ideas.
Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly, while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;

public class UniquePermutations
{
    static void generateUniquePermutations(String s, Consumer<String> consumer)
    {
        char[] array = s.toCharArray();
        Arrays.sort(array);
        for (;;)
        {
            consumer.accept(String.valueOf(array));
            // find the rightmost position whose value can still be increased
            int changePos = array.length - 2;
            while (changePos >= 0 && array[changePos] >= array[changePos + 1])
                --changePos;
            if (changePos < 0)
                break; // all done
            // find the rightmost value bigger than the one at changePos
            int swapPos = changePos + 1;
            while (swapPos + 1 < array.length && array[swapPos + 1] > array[changePos])
                ++swapPos;
            char t = array[changePos];
            array[changePos] = array[swapPos];
            array[swapPos] = t;
            // reverse the descending suffix to get the next permutation in order
            for (int i = changePos + 1, j = array.length - 1; i < j; ++i, --j)
            {
                t = array[i];
                array[i] = array[j];
                array[j] = t;
            }
        }
    }

    public static void main(String[] args) throws java.lang.Exception
    {
        StringBuilder line = new StringBuilder();
        generateUniquePermutations("banana", s -> {
            if (line.length() > 0)
            {
                if (line.length() + s.length() >= 75)
                {
                    System.out.println(line.toString());
                    line.setLength(0);
                }
                else
                    line.append(" ");
            }
            line.append(s);
        });
        System.out.println(line);
    }
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa
This question already has answers here: Unique (non-repeating) random numbers in O(1)?
I want to know an algorithm that produces unique, non-repeating random numbers. Every time I call it, it should give a unique random number that it has not given before. I want to know because this kind of requirement comes up in games and apps.
For example: in a game I have created some objects and saved them all in an array, and I want to retrieve them randomly and uniquely without deleting them from the array. This is just a scenario.
I have tried some alternatives, but they were not good performance-wise, and I never got an answer to this question.
How is it possible programmatically?
Thanks in advance.
The code below generates unique random numbers from 1 to 15. Modify as per your requirement:

public class Main
{
    int[] i = new int[15];
    int x = 0;
    int counter;

    public int getNumber()
    {
        return (int) ((Math.random() * 15) + 1);
    }

    public int getU()
    {
        x = getNumber();
        while (check(x))
        {
            x = getNumber();
        }
        i[counter] = x;
        counter++;
        return x;
    }

    public boolean check(int x)
    {
        // scan only the values stored so far
        for (int n = 0; n < counter; n++)
        {
            if (i[n] == x)
            {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        Main obj = new Main();
        for (int i = 0; i != 15; i++)
        {
            System.out.println(obj.getU());
        }
    }
}
For more info, see the links below:
https://community.oracle.com/message/4860317
Expand a random range from 1–5 to 1–7
The best option, it seems to me, is to remove the returned number from the input list.
Let me explain:
Start with the whole range, for example: range = [0, 1, 2, 3, 4]
Toss a random index, let's say 3.
Now remove range[3] from range, and you get range = [0, 1, 2, 4]
And so on.
Here is some example code in Python:

import random

rangeStart = 0
rangeEnd = 10
rangeForExample = list(range(rangeStart, rangeEnd))  # list() so items can be removed

randomIndex = random.randrange(len(rangeForExample))
randomResult = rangeForExample[randomIndex]
rangeForExample.remove(randomResult)
# repeat the last three lines until the list is empty
This can be achieved in many ways. Here are two of them (off the top of my head):
Persisting the previously generated values (for range-based random number generation). In this method you generate a random number and store it (either in a file or a DB), so that when you generate the next number you can match it against the previous numbers and discard it if it has already been generated (see the sketch after this list).
Generating a unique number every time (for non-range-based random number generation). In this method you use a series or something similar that can give you a unique number, current time in milliseconds for instance.
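Here is a minimal sketch of the first method, assuming in-memory persistence with a HashSet (a file or DB would take its place in a real app; the class name is mine):

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class UniqueRandom {
    private final Set<Integer> seen = new HashSet<>(); // history of generated values
    private final Random random = new Random();
    private final int bound;

    UniqueRandom(int bound) { this.bound = bound; }

    // Retries until an unseen value turns up; this gets slow as the range fills up,
    // which is exactly the performance problem mentioned in the question.
    int next() {
        if (seen.size() == bound) throw new IllegalStateException("range exhausted");
        int candidate;
        do {
            candidate = random.nextInt(bound);
        } while (!seen.add(candidate)); // add() returns false if the value was already present
        return candidate;
    }
}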
Get the count of your array.
Pick a random index in [0, count).
Retrieve the item at that index in the array.
Remove the item at that index.
As I see that you do iOS, I will give an example in Objective-C.
NSMutableArray *array = <creation of your array>;
int count = array.count;
while (1) {
    int randomIndex = arc4random() % count;
    id object = [array objectAtIndex:randomIndex];
    NSLog(@"Random object: %@", object);
    [array removeObject:object];
    count--; // This is important
    if (array.count == 0) {
        return;
    }
}
Here are two options I could think of:
Using a history list
1. Keep past picked random numbers in a list
2. Pick a new random number
3. If the number exists in the history list, go to 2
4. [optional] If the number lowers the history list's randomness, go to 2
5. Add the number to the history list
Using jumps
At time 0: i = 0; seed(time); R(0) = random() % jump_limit
1. i++
2. J(i) = random() % jump_limit
3. R(i) = R(i-1) + J(i)
Given a review paragraph and keywords, find the minimum-length snippet from the paragraph which contains all the keywords in any order. If there are millions of reviews, what preprocessing step would you do?
The first part is simple, just the minimum window problem. Now, for preprocessing, I use an inverted index: for each review I build a table storing the list of occurrences of each word. When a query comes, I retrieve the list of indices for each word. Now, is there some way to find the minimum window length from this set of lists in O(n) time? I tried building a min-heap and a max-heap to store the current index from each list, keeping track of the minimum window length (using the roots of both heaps). Then I perform an extractMin operation and remove the same element from the max-heap as well. To keep track of the location of each element in the max-heap (for removal), I maintain a hash table. Then, from the list to which the extracted element belonged, I insert the next element into both heaps and update the window length if needed. This takes O(n log n) time. Is it possible to do this in O(n) time?
Assuming this combination is sorted, here is how I would do it:
1. Create a list of objects that describe the word and its index, something like Obj(String name, int index).
2. Init a set containing all keywords of the query.
3. Init the lower bound of the window as the index of the first element in the list.
4. Go through the list, updating the upper bound of the window to the current object's index, updating the lower bound of the window to the index of the first occurrence of any of the words in your query (i.e. once min_window is set to the index of an actual word occurrence it is no longer updated), and removing the corresponding word from the set of keywords.
5. When the set is empty, save the resulting lower and upper bounds along with the length of the snippet.
6. Repeat steps 2 to 5, but this time start the list at the element that comes right after the one defined by the previous min_window, and only keep the new min_window and max_window if the length of the snippet is shorter than the previous one (repeat until you can no longer find all occurrences in the given sublist).
#include <bits/stdc++.h>
using namespace std;

map<string, int> word;

void functionlower(string& str) {
    transform(str.begin(), str.end(), str.begin(), ::tolower);
}

string compareWord(string& str) {
    string temp;
    temp.resize(str.size());
    transform(str.begin(), str.end(), temp.begin(), ::tolower);
    return temp;
}

int main() {
    int total_word;
    cin >> total_word;
    for (int i = 0; i < total_word; i++) {
        string str;
        cin >> str;
        functionlower(str);
        word.insert({str, 0});
    }
    cin.ignore();

    // split the paragraph into words
    string str;
    vector<string> para;
    getline(cin, str);
    int index = 0;
    for (int i = 0; i <= (int)str.size(); i++) {
        if (i == (int)str.size() || str[i] == ' ') {
            para.push_back(str.substr(index, i - index));
            index = i + 1;
        }
    }

    // sliding window over para
    int currlen = 0;  // number of distinct keywords inside the window
    int currpos = 0;  // left edge of the window
    int olen = -1;    // best window length found so far
    int opos = -1;    // start of the best window found so far
    for (int i = 0; i < (int)para.size(); i++) {
        string search = compareWord(para[i]);
        if (word.find(search) != word.end()) {
            if (word[search] == 0) currlen++;
            word[search]++;
        }
        // shrink from the left while the window still covers all keywords
        while (currlen >= (int)word.size()) {
            search = compareWord(para[currpos]);
            if ((i - currpos) < olen || olen == -1) {
                olen = i - currpos;
                opos = currpos;
            }
            if (word.find(search) != word.end()) {
                if (word[search] == 1) break; // cannot drop the last copy of a keyword
                word[search]--;
                currpos++;
            } else {
                currpos++;
            }
        }
    }

    for (int i = 0; i <= olen; i++) {
        cout << para[opos + i] << " ";
    }
    cout << endl;
    return 0;
}
This is O(n log k), where k is the number of words to search for.
Assuming a constant word length, this can be achieved in O(n) time complexity, where n is the number of words in the paragraph; here is an implementation in Java:
package Basic.MinSnippetWithAllKeywords;

import java.util.*;

/**
 * Given a review paragraph and keywords,
 * find minimum length snippet from paragraph which contains all keywords in any order.
 */
public class Solution {
    public String minSnippet(String para, Set<String> keywords) {
        LinkedList<Integer> deque = new LinkedList<>();
        String[] words = para.split("\\s");
        for (int i = 0; i < words.length; ++i) {
            if (keywords.contains(words[i]))
                deque.offer(i);
        }
        // trim duplicate keywords from the front
        while (deque.size() > 1) {
            int first = deque.pollFirst();
            int second = deque.peekFirst();
            if (!words[first].equals(words[second])) { // '==' would compare references, not content
                deque.offerFirst(first);
                break;
            }
        }
        // trim duplicate keywords from the back
        while (deque.size() > 1) {
            int first = deque.pollLast();
            int second = deque.peekLast();
            if (!words[first].equals(words[second])) {
                deque.offerLast(first);
                break;
            }
        }
        if (deque.isEmpty())
            return "";
        return String.join(" ",
                Arrays.copyOfRange(words, deque.peekFirst(), deque.peekLast() + 1));
    }

    /*
    Example:
    my name is shubham mishra
    is name
    */
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String para = sc.nextLine();
        String keyLine = sc.nextLine();
        Set<String> keywords = new HashSet<>();
        keywords.addAll(Arrays.asList(keyLine.split("\\s")));
        System.out.println(new Solution().minSnippet(para, keywords));
    }
}
A machine is taking measurements and giving me discrete numbers continuously, like so:
1 2 5 7 8 10 11 12 13 14 18
Let us say these measurements can be off by 2 points, and a measurement is generated every 5 seconds. I want to ignore the measurements that may potentially be the same:
consecutive readings of 2 and 3 could be the same because the margin of error is 2. So how do I partition the data such that I get only distinct measurements? But I would also want to handle the situation in which the measurements are continuously increasing, like so:
1 2 3 4 5 6 7 8 9 10
In this case, if we keep ignoring consecutive numbers with a difference of less than 2, then we might lose actual measurements.
Is there a class of algorithms for this? How would you solve this?
Just drop any number that comes 'in range of' the previous (kept) one. It should simply work.
For your increasing example:
1 is kept; 2 is dropped because it is in range of 1; 3 is dropped because it is in range of 1; then 4 is kept; 5 and 6 are dropped as in range of 4; then 7 is kept; etc. So you still keep the increasing trend if it's big enough (which is what you want, right?).
For the original example, you'd get 1, 5, 8, 11, 14, 18 as a result.
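A minimal sketch of this keep-only-if-out-of-range rule (the deadband of 2 comes from the question; the class and method names are mine):

import java.util.ArrayList;
import java.util.List;

class DeadbandFilter {
    // Keep a reading only if it differs from the last kept reading by more than the deadband.
    static List<Integer> filter(int[] readings, int deadband) {
        List<Integer> kept = new ArrayList<>();
        for (int r : readings) {
            if (kept.isEmpty() || Math.abs(r - kept.get(kept.size() - 1)) > deadband) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] readings = {1, 2, 5, 7, 8, 10, 11, 12, 13, 14, 18};
        System.out.println(filter(readings, 2)); // prints [1, 5, 8, 11, 14, 18]
    }
}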
In some lines of work, the standard way to deal with problems of this nature is by using the Kalman filter.
To quote Wikipedia:
Its [the Kalman filter's] purpose is to use measurements observed over time, containing noise (random variations) and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated values.
The filter itself is very easy to implement, but does require calibration.
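For illustration, here is a minimal one-dimensional Kalman filter sketch, assuming a slowly drifting true value with process noise q and measurement noise r (both constants are assumptions and would need calibration for a real sensor; the class name is mine):

// Scalar Kalman filter: estimates a slowly varying value from noisy readings.
class ScalarKalman {
    private double x;        // current state estimate
    private double p = 1.0;  // estimate uncertainty
    private final double q;  // process noise: how fast the true value can drift
    private final double r;  // measurement noise: how noisy the sensor is
    private boolean initialized = false;

    ScalarKalman(double processNoise, double measurementNoise) {
        this.q = processNoise;
        this.r = measurementNoise;
    }

    double update(double measurement) {
        if (!initialized) {
            x = measurement; // seed the estimate with the first reading
            initialized = true;
            return x;
        }
        p += q;                      // predict: uncertainty grows between readings
        double k = p / (p + r);      // Kalman gain: how much to trust the new reading
        x += k * (measurement - x);  // correct the estimate toward the measurement
        p *= (1 - k);                // uncertainty shrinks after the correction
        return x;
    }
}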
I would have two queues:
Temporary queue
Final queue/list
Your first value goes into the temporary queue and into the final list. As new values come in, check whether the new value is within the deadband of the last value in the list. If it is, add it to the temporary queue. If not, add it to the final list. If your temporary queue starts to grow before you get a new value outside of the deadband, then once you are outside of the deadband, check whether the queued values were monotonically increasing or decreasing the whole time. If they were, add the contents of the queue to the final list; otherwise just add the single new value to the final list. This is the general gist of it.
Here is some code I whipped up quickly that implements a class doing what I described above:
public class MeasurementsFilter
{
    private Queue<int> tempQueue = new Queue<int>();
    private List<int> finalList = new List<int>();
    private int deadband;

    public MeasurementsFilter(int deadband)
    {
        this.deadband = deadband;
    }

    public void Reset()
    {
        finalList.Clear();
        tempQueue.Clear();
    }

    public int[] FinalValues()
    {
        return finalList.ToArray();
    }

    public void AddNewValue(int value)
    {
        // if we are just starting then the first value always goes in the list and queue
        if (tempQueue.Count == 0)
        {
            tempQueue.Enqueue(value);
            finalList.Add(value);
        }
        else
        {
            // if the new value is within the deadband of the last value added to the final list
            // then enqueue the value and wait
            if ((tempQueue.Peek() - deadband <= value) && (value <= tempQueue.Peek() + deadband))
            {
                tempQueue.Enqueue(value);
            }
            // else the new value is outside of the deadband of the last value added to the final list
            else
            {
                tempQueue.Enqueue(value);
                if (QueueIsAlwaysIncreasingOrAlwaysDecreasing())
                {
                    // dequeue first item (we already added it to the list before, but we need it for comparison purposes)
                    int currentItem = tempQueue.Dequeue();
                    while (tempQueue.Count > 0)
                    {
                        // if we are not seeing two in a row of the same (i.e. they are not duplicates of each other)
                        // then add the newest value to the final list
                        if (currentItem != tempQueue.Peek())
                        {
                            currentItem = tempQueue.Dequeue();
                            finalList.Add(currentItem);
                        }
                        // otherwise if we are seeing two in a row (i.e. duplicates)
                        // then discard the value and loop to the next value
                        else
                        {
                            currentItem = tempQueue.Dequeue();
                        }
                    }
                    // add the last item from the final list back into the queue for future deadband comparisons
                    tempQueue.Enqueue(finalList[finalList.Count - 1]);
                }
                else
                {
                    // clear the queue and add the new value to the list and as the starting point of the queue
                    // for future deadband comparisons
                    tempQueue.Clear();
                    tempQueue.Enqueue(value);
                    finalList.Add(value);
                }
            }
        }
    }

    private bool QueueIsAlwaysIncreasingOrAlwaysDecreasing()
    {
        List<int> queueList = new List<int>(tempQueue);
        bool alwaysIncreasing = true;
        bool alwaysDecreasing = true;
        int tempIncreasing = int.MinValue;
        int tempDecreasing = int.MaxValue;
        int i = 0;
        while ((alwaysIncreasing || alwaysDecreasing) && (i < queueList.Count))
        {
            if (queueList[i] >= tempIncreasing)
                tempIncreasing = queueList[i];
            else
                alwaysIncreasing = false;

            if (queueList[i] <= tempDecreasing)
                tempDecreasing = queueList[i];
            else
                alwaysDecreasing = false;

            i++;
        }
        return (alwaysIncreasing || alwaysDecreasing);
    }
}
Here is some test code that you can throw into a WinForms Load event or button click:
int[] values = new int[] { 1, 2, 2, 1, 4, 8, 3, 2, 1, 0, 6 };
MeasurementsFilter filter = new MeasurementsFilter(2);
for (int i = 0; i < values.Length; i++)
{
    filter.AddNewValue(values[i]);
}
int[] finalValues = filter.FinalValues();
StringBuilder printValues = new StringBuilder();
for (int i = 0; i < finalValues.Length; i++)
{
    printValues.Append(finalValues[i]);
    printValues.Append(" ");
}
MessageBox.Show("The final values are: " + printValues);
What is the complexity of the algorithm that is used to find the smallest snippet that contains all the search keywords?
As stated, the problem is solved by a rather simple algorithm:
Just look through the input text sequentially from the very beginning and check each word: whether it is in the search key or not. If the word is in the key, add it to the end of the structure that we will call The Current Block. The Current Block is just a linear sequence of words, each word accompanied by the position at which it was found in the text.

The Current Block must maintain the following Property: the very first word in The Current Block must be present in The Current Block once and only once. If you add a new word to the end of The Current Block and the above Property becomes violated, you have to remove the very first word from the block. This process is called normalization of The Current Block. Normalization is a potentially iterative process, since once you remove the very first word from the block, the new first word might also violate The Property, so you'll have to remove it as well. And so on.
So, basically The Current Block is a FIFO sequence: the new words arrive at the right end, and get removed by normalization process from the left end.
All you have to do to solve the problem is look through the text, maintain The Current Block, normalizing it when necessary so that it satisfies The Property. The shortest block with all the keywords in it you ever build is the answer to the problem.
For example, consider the text
CxxxAxxxBxxAxxCxBAxxxC
with keywords A, B and C. Looking through the text you'll build the following sequence of blocks
C
CA
CAB - all words, length 9 (CxxxAxxxB...)
CABA - all words, length 12 (CxxxAxxxBxxA...)
CABAC - violates The Property, remove first C
ABAC - violates The Property, remove first A
BAC - all words, length 7 (...BxxAxxC...)
BACB - violates The Property, remove first B
ACB - all words, length 6 (...AxxCxB...)
ACBA - violates The Property, remove first A
CBA - all words, length 4 (...CxBA...)
CBAC - violates The Property, remove first C
BAC - all words, length 6 (...BAxxxC)
The best block we built has length 4, which is the answer in this case
CxxxAxxxBxxAxx CxBA xxxC
The exact complexity of this algorithm depends on the input, since it dictates how many iterations the normalization process will make, but ignoring the normalization the complexity would trivially be O(N * log M), where N is the number of words in the text and M is the number of keywords, and O(log M) is the complexity of checking whether the current word belongs to the keyword set.
Now, having said that, I have to admit that I suspect that this might not be what you need. Since you mentioned Google in the caption, it might be that the statement of the problem you gave in your post is not complete. Maybe in your case the text is indexed? (With indexing the above algorithm is still applicable, just becomes more efficient). Maybe there's some tricky database that describes the text and allows for a more efficient solution (like without looking through the entire text)? I can only guess and you are not saying...
I think the solution proposed by AndreyT assumes no duplicates exist in the keywords/search terms. Also, the Current Block can get as big as the text itself if the text contains a lot of duplicated keywords.
For example:
Text: 'ABBBBBBBBBB'
Keyword text: 'AB'
Current Block: 'ABBBBBBBBBB'
Anyway, I have implemented it in C# and did some basic testing; it would be nice to get some feedback on whether it works or not :)
// Requires: using System; using System.Collections.Generic; using System.Linq;
static string FindMinWindow(string text, string searchTerms)
{
    Dictionary<char, bool> searchIndex = new Dictionary<char, bool>();
    foreach (var item in searchTerms)
    {
        searchIndex.Add(item, false);
    }
    Queue<Tuple<char, int>> currentBlock = new Queue<Tuple<char, int>>();
    int noOfMatches = 0;
    int minLength = Int32.MaxValue;
    int startIndex = 0;
    for (int i = 0; i < text.Length; i++)
    {
        char item = text[i];
        if (searchIndex.ContainsKey(item))
        {
            if (!searchIndex[item])
            {
                noOfMatches++;
            }
            searchIndex[item] = true;
            var newEntry = new Tuple<char, int>(item, i);
            currentBlock.Enqueue(newEntry);
            // Normalization step.
            while (currentBlock.Count(o => o.Item1.Equals(currentBlock.First().Item1)) > 1)
            {
                currentBlock.Dequeue();
            }
            // Figuring out minimum length.
            if (noOfMatches == searchTerms.Length)
            {
                var length = currentBlock.Last().Item2 - currentBlock.First().Item2 + 1;
                if (length < minLength)
                {
                    startIndex = currentBlock.First().Item2;
                    minLength = length;
                }
            }
        }
    }
    return noOfMatches == searchTerms.Length ? text.Substring(startIndex, minLength) : String.Empty;
}
This is an interesting question.
To restate it more formally:
Given a list L (the web page) of length n and a set S (the query) of size k, find the smallest sublist of L that contains all the elements of S.
I'll start with a brute-force solution in hopes of inspiring others to beat it.
Note that set membership can be done in constant time, after one pass through the set. See this question.
Also note that this assumes all the elements of S are in fact in L; otherwise it will just return the sublist from 1 to n.
best = (1, n)
For i from 1 to n-k:
    Create/reset a hash found[] mapping each element of S to False, and let counter = 0.
    For j from i to n or until counter == k:
        If L[j] is in S and not found[L[j]] then counter++ and let found[L[j]] = True.
    If counter == k and j-i < best[2]-best[1] then let best = (i, j).
Time complexity is O((n+k)(n-k)), i.e., n²-ish. A direct Java translation is sketched below.
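Here is that translation as a sketch (class and method names are mine; per the assumption above, it returns the whole range when some element of S never occurs in L):

import java.util.HashSet;
import java.util.Set;

class BruteForceSnippet {
    // Returns {begin, end} indices of the smallest window of doc containing all of query.
    static int[] smallestWindow(String[] doc, Set<String> query) {
        int n = doc.length;
        int[] best = {0, n - 1};
        for (int i = 0; i < n; i++) {
            Set<String> missing = new HashSet<>(query); // reset the "found" bookkeeping
            for (int j = i; j < n; j++) {
                missing.remove(doc[j]);
                if (missing.isEmpty()) {          // window [i, j] covers the whole query
                    if (j - i < best[1] - best[0]) {
                        best[0] = i;
                        best[1] = j;
                    }
                    break;                        // extending j further only grows the window
                }
            }
        }
        return best;
    }
}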
Here's a solution using Java 8.
// Requires: import java.util.*; import java.util.concurrent.atomic.AtomicInteger;
static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
    Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
    HashSet<String> words = new HashSet<>();
    query.stream()
         .forEach(words::add);
    AtomicInteger idx = new AtomicInteger();
    IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
    AtomicInteger size = new AtomicInteger();
    document.stream()
            .map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
            .filter(pair -> words.contains(pair.word)) // Queue.contains is O(n) so we trade space for efficiency
            .forEach(pair -> {
                // only the first and last elements are useful to the algorithm, so we don't bother removing
                // an element from any other index. note that removing an element using equality
                // from an ArrayDeque is O(n)
                KeywordIndexPair first = queue.peek();
                if (pair.equals(first)) {
                    queue.remove();
                }
                queue.add(pair);
                first = queue.peek();
                int diff = pair.index - first.index;
                if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
                    interval.begin = first.index;
                    interval.end = pair.index;
                    size.set(0);
                }
            });
    return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are 2 static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from the names. Using a smarter programming language that supports tuples, those classes wouldn't be necessary.
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
For all the words, maintain the min and max index in case there is more than one entry; if not, the min and max index will be the same.
import edu.princeton.cs.algs4.ST;

public class DicMN {
    ST<String, Words> st = new ST<>();

    public class Words {
        int min;
        int max;

        public Words(int index) {
            min = index;
            max = index;
        }
    }

    public int findMinInterval(String[] sw) {
        int begin = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (int i = 0; i < sw.length; i++) {
            if (st.contains(sw[i])) {
                Words w = st.get(sw[i]);
                begin = Math.min(begin, w.min);
                end = Math.max(end, w.max);
            }
        }
        if (begin != Integer.MAX_VALUE) {
            return (end - begin) + 1;
        }
        return 0;
    }

    public void put(String[] dw) {
        for (int i = 0; i < dw.length; i++) {
            if (!st.contains(dw[i])) {
                st.put(dw[i], new Words(i));
            } else {
                Words w = st.get(dw[i]);
                w.min = Math.min(w.min, i);
                w.max = Math.max(w.max, i);
            }
        }
    }

    public static void main(String[] args) {
        DicMN dic = new DicMN();
        String[] arr1 = { "one", "two", "three", "four", "five", "six", "seven", "eight" };
        dic.put(arr1);
        String[] arr2 = { "two", "five" };
        System.out.print("Interval: " + dic.findMinInterval(arr2));
    }
}