Remove duplicates in O(n) by hand - algorithm

I need to remove all duplicates from a list, but only where both the item in list a and the corresponding item in list b are the same. This is my current code, but with 100k items it takes literally days. Is there a faster way to do this?
Any help appreciated.
List<int> ind = new List<int>();
List<int> used = new List<int>();
for (int i = 0; i < a.Count; i++)
{
for (int j = 0; j < a.Count; j++)
{
if (i != j&&!used.Contains(i))
{
if (a[j] == a[i] && b[i] == b[j])
{
ind.Add(j);
used.Add(j);
}
}
}
}
List<string> s2 = new List<string>();
List<string> a2 = new List<string>();
for (int i = 0; i < a.Count; i++)
{
if (!ind.Contains(i))
{
s2.Add(a[i]);
a2.Add(b[i]);
}
}

The key to many such problems is the correct data structure. To avoid duplicates, you need to use Sets, as they remove duplicates automatically.
Here is the code in Java; it should translate fairly directly to C#:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
class Duplicates
{
static List<Integer> list1 = new ArrayList<>();
static List<Integer> list2 = new ArrayList<>();
static final int SIZE = 100_000;
static final int MAX_VALUE = 1_000_000;
public static void main(String[] args)
{
// populate the lists with random values for testing
Random r = new Random();
for(int i=0; i<SIZE; i++)
{
list1.add(r.nextInt(MAX_VALUE));
list2.add(r.nextInt(MAX_VALUE));
}
Set<Integer> set1 = new HashSet<>(list1);
Set<Integer> set2 = new HashSet<>(list2);
// items that are in both lists
Set<Integer> intersection = new HashSet<>(set1);
intersection.retainAll(set2);
Set<Integer> notSeenYet = new HashSet<>(intersection);
List<Integer> list1Unique = new ArrayList<Integer>();
for(int n: list1)
{
if(intersection.contains(n)) // we may have to skip this one
{
if(notSeenYet.contains(n)) // no, don't skip, it's the first occurrence
{
notSeenYet.remove(n);
}
else
{
continue;
}
}
list1Unique.add(n);
}
System.out.println("list 1 contains "+list1Unique.size()+" values after removing all duplicates that are also in list 2");
}
}
It takes less than a second for 100k values.
Output
list 1 contains 99591 values after removing all duplicates that are also in list 2

Create a HashSet.
First, iterate through the list b and add all elements into the HashSet.
Then, iterate through each element of list a. For each element, ask the HashSet whether it already contains that element. If it doesn't, the element is new, so keep it. If it does, it is a duplicate and you can remove it from a.
A HashSet answers the "do you have this element?" question in O(1) on average, so the whole pass is O(n).
For more information, see the HashSet<T> documentation.
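A minimal Java sketch of the HashSet approach just described, assuming the lists hold Integers. It builds a new list instead of removing from a in place, and the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetFilter {
    // Keeps the elements of a that do not occur anywhere in b.
    // One pass over b builds the set, one pass over a filters: expected O(n + m).
    static List<Integer> keepNotInB(List<Integer> a, List<Integer> b) {
        Set<Integer> inB = new HashSet<>(b);
        List<Integer> result = new ArrayList<>(a.size());
        for (Integer x : a) {
            if (!inB.contains(x)) {
                result.add(x); // not seen in b: keep it
            }
        }
        return result;
    }
}
```

Note that this reading drops every copy of a value that also occurs in b, not just the repeats.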

Here is a general algorithm to consider. Start by sorting both lists in ascending order. With a good sorting algorithm such as merge sort, this takes O(N log N) time, where N is the length of the list. Once we have paid this penalty, we only need to maintain one pointer into each list. The algorithm walks up both lists together, looking for duplicates in list a whenever the current value matches the value under the pointer into list b. If there is a match, the duplicates are removed from list a; otherwise we keep walking until we reach the end of list a. This pass is only O(N), so the biggest cost is the initial sort, which is O(N log N). Note that sorting gives up the original order of the items.
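One possible reading of the two-pointer walk above, sketched in Java under the assumption that "removing duplicates" means keeping a single copy of each value of a that also occurs in b. The result comes out in sorted order, and the names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedDedup {
    // Returns a sorted copy of a in which every value that also occurs in b
    // appears only once; values not in b keep all their copies.
    static List<Integer> dedupSharedValues(int[] a, int[] b) {
        int[] sa = a.clone();
        int[] sb = b.clone();
        Arrays.sort(sa); // O(N log N)
        Arrays.sort(sb); // O(M log M)
        List<Integer> result = new ArrayList<>();
        int j = 0; // pointer into sb, only ever moves forward
        for (int i = 0; i < sa.length; i++) {
            // advance the b pointer past values smaller than the current a value
            while (j < sb.length && sb[j] < sa[i]) {
                j++;
            }
            boolean inB = j < sb.length && sb[j] == sa[i];
            boolean repeat = i > 0 && sa[i] == sa[i - 1];
            if (inB && repeat) {
                continue; // a later copy of a value shared with b: drop it
            }
            result.add(sa[i]);
        }
        return result;
    }
}
```

After sorting, the walk itself touches each element of both arrays once, which matches the O(N) claim for the second phase.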

To "remove duplicates" I understand to mean "from n identical items, leave the first and remove the remaining n - 1". If so then this is the algorithm:
Convert list b to set B. Also introduce set A_dup. Run through list a and for each item:
if item is found in A_dup then remove it from a,
else if item is found in set B then add it to A_dup.
Repeat.
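Checking for membership in a hash set (both A_dup and B) is an O(1) operation on average, and so is adding a new item. That leaves a single pass over list a, which gives O(n) in total, provided you copy the kept items into a new list rather than removing items from the middle of a, since each such removal from an array-backed list is itself O(n).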
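A minimal Java sketch of the A_dup algorithm above, assuming Integer lists and hypothetical names. It copies the kept items into a new list, because removing from the middle of an ArrayList costs O(n) per removal:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FirstOccurrenceDedup {
    // Keeps the first copy of every value of a that also occurs in b and drops
    // later copies; values absent from b are kept every time.
    static List<Integer> dedup(List<Integer> a, List<Integer> b) {
        Set<Integer> setB = new HashSet<>(b); // B
        Set<Integer> aDup = new HashSet<>();  // A_dup: shared values already kept once
        List<Integer> result = new ArrayList<>(a.size());
        for (Integer item : a) {
            if (aDup.contains(item)) {
                continue;                     // found in A_dup: remove (skip) it
            }
            if (setB.contains(item)) {
                aDup.add(item);               // found in B: remember it, keep this first copy
            }
            result.add(item);
        }
        return result;
    }
}
```

For example, with a = [5, 1, 5, 2, 1, 5] and b = [1, 9] only the second 1 is dropped, since 1 is the only value of a that also occurs in b.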

I think what you are trying to do is find distinct pairs, right?
If so, you can do that in one line using Zip and Distinct and a C# Tuple (or use an anonymous type).
var result = a.Zip(b, (x,y) => (x, y)).Distinct();
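For comparison, a Java sketch of the same distinct-pairs idea, assuming a and b are random-access lists of strings with equal length (as the s2/a2 lists in the question suggest). A LinkedHashSet keeps the first occurrence of each pair in its original order; AbstractMap.SimpleImmutableEntry is used only as a ready-made pair type with value-based equals and hashCode, and the class name is hypothetical:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DistinctPairs {
    // Keeps the first occurrence of each (a[i], b[i]) pair, in original order.
    static List<Map.Entry<String, String>> distinctPairs(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("a and b must have the same length");
        }
        // expected O(1) add per pair; iteration follows insertion order
        Set<Map.Entry<String, String>> seen = new LinkedHashSet<>();
        for (int i = 0; i < a.size(); i++) {
            // adding a pair that is already present leaves the set unchanged
            seen.add(new AbstractMap.SimpleImmutableEntry<>(a.get(i), b.get(i)));
        }
        return new ArrayList<>(seen);
    }
}
```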

import java.util.*;
import java.util.stream.Collectors;
public class Test {
public static void main(String args[]) {
List<String> dupliKhaneList = new ArrayList<>();
dupliKhaneList.add("Vaquar");
dupliKhaneList.add("Khan");
dupliKhaneList.add("Vaquar");
dupliKhaneList.add("Vaquar");
dupliKhaneList.add("Khan");
dupliKhaneList.add("Vaquar");
dupliKhaneList.add("Zidan");
// Solution 1: remove duplicates within a single list
List<String> uniqueList = dupliKhaneList.stream().distinct().collect(Collectors.toList());
System.out.println("DupliKhane => " + dupliKhaneList);
System.out.println("Unique 1 => " + uniqueList);
// Solution 2: remove every element of list1 that also appears in list2
List<String> list1 = new ArrayList<>();
list1.add("Vaquar");
list1.add("Khan");
list1.add("Vaquar");
list1.add("Vaquar");
list1.add("Khan");
list1.add("Vaquar");
list1.add("Zidan");
List<String> list2 = new ArrayList<>();
list2.add("Zidan");
System.out.println("list1 => " + list1);
System.out.println("list2 => " + list2);
list1.removeAll(list2);
System.out.println("removeAll duplicate => " + list1);
}
}
Results:
DupliKhane => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
Unique 1 => [Vaquar, Khan, Zidan]
list1 => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
list2 => [Zidan]
removeAll duplicate => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar]

Related

Sort two lists the same way

I need to sort a list of DateTime from earliest to latest.
List<DateTime> datetimeList = [DateTime.parse("2021-01-15 12:26:40.709246"), DateTime.parse("2021-02-25 13:26:40.709246"), DateTime.parse("2021-02-20 19:26:40.709246")];
datetimeList.sort();
I have another list of Strings.
List<String> stringList = ["one", "two", "three"];
The indexes of stringList have to match the indexes of datetimeList. So the index of "one" always has to be the same as the index of 2021-01-15 12:26:40.709246 and so on.
If I sort the lists individually, the DateTime is sorted by DateTime and the Strings are sorted alphabetically. This way, the String does not go with its initial date anymore.
How can I sort one list (datetimeList) with the other list (stringList) sorting exactly the same way?
The easiest solution is to create a struct/class that combines both values, so you don't have to worry about keeping the objects in the two arrays aligned. Then sort the list of new objects by date. I can't help with the exact code for that, since I don't know Dart.
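The same idea sketched in Java rather than Dart, since the principle carries over: wrap each date and its string in one object and sort the objects by date. The class name DatedLabel is hypothetical, and the date strings are assumed to use the "yyyy-MM-dd HH:mm:ss.ffffff" shape from the question:

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

class DatedLabel {
    final LocalDateTime date;
    final String label;

    DatedLabel(LocalDateTime date, String label) {
        this.date = date;
        this.label = label;
    }

    // Accepts "2021-01-15 12:26:40.709246" by swapping the space for the ISO 'T'.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.replace(' ', 'T'));
    }

    // Zips the two lists into objects, then sorts the objects by date only,
    // so each label stays attached to its own date.
    static List<DatedLabel> sortTogether(List<String> dates, List<String> labels) {
        if (dates.size() != labels.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        List<DatedLabel> items = new ArrayList<>();
        for (int i = 0; i < dates.size(); i++) {
            items.add(new DatedLabel(parse(dates.get(i)), labels.get(i)));
        }
        items.sort((x, y) -> x.date.compareTo(y.date));
        return items;
    }
}
```

With the dates and strings from the question, the labels come out as "one", "three", "two", each still paired with its original date.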
You could use a SplayTreeMap as well: https://api.dart.dev/stable/2.8.4/dart-collection/SplayTreeMap-class.html
SplayTreeMap keeps its keys in sorted order, so you can use each DateTime as a key and the matching entry of the other list as its value. (Equal dates would overwrite each other, since map keys are unique.)
import 'dart:collection';
void main() {
final SplayTreeMap<DateTime, String> map =
new SplayTreeMap<DateTime, String>();
map[DateTime.parse("2021-01-15 12:26:40.709246")] = "one";
map[DateTime.parse("2021-02-25 13:26:40.709246")] = "two";
map[DateTime.parse("2021-02-20 19:26:40.709246")] = "three";
for (final DateTime key in map.keys) {
print("$key : ${map[key]}");
}
}
I recommend the simpler suggestions given here.
For completeness, I'll provide one more approach: Compute the permutation by sorting a list of indices:
List<int> sortedPermutation<T>(List<T> elements, int compare(T a, T b)) =>
[for (var i = 0; i < elements.length; i++) i]
..sort((i, j) => compare(elements[i], elements[j]));
Then you can reorder the existing lists to match:
List<T> reorder<T>(List<T> elements, List<int> permutation) =>
[for (var i = 0; i < permutation.length; i++) elements[permutation[i]]];
If you do:
var sorted = reorder(original, sortedPermutation(original, compare));
it should give you a sorted list.
It's less efficient than sorting in-place because you create a new list,
but you can apply the same reordering to multiple lists afterwards.
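A Java transcription of the same permutation idea, sketched with hypothetical names: sort a list of indices by the values they point at, then apply that index order to every list that must stay aligned. Java's List.sort is stable, so equal values keep their original relative order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IndexPermutation {
    // Indices 0..n-1, sorted by the element each index points at.
    static <T> List<Integer> sortedPermutation(List<T> elements, Comparator<? super T> compare) {
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) {
            perm.add(i);
        }
        perm.sort((x, y) -> compare.compare(elements.get(x), elements.get(y)));
        return perm;
    }

    // New list whose k-th element is elements.get(perm.get(k)).
    static <T> List<T> reorder(List<T> elements, List<Integer> perm) {
        List<T> out = new ArrayList<>(perm.size());
        for (int index : perm) {
            out.add(elements.get(index));
        }
        return out;
    }
}
```

Computing the permutation once from the date list and calling reorder on both lists keeps them aligned.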
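A faster approach that avoids building new lists: record the result of every comparison made while sorting the first list, then replay those results while sorting the other lists. This relies on the sort making exactly the same sequence of comparisons when it receives the same results, which holds as long as the sort algorithm is deterministic and the lists have equal length.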
void main() {
final l1 = [3, 1, 2];
final l2 = ['three', 'one', 'two'];
final l3 = ['drei', 'ein', 'zwei'];
print(l1);
print(l2);
print(l3);
myCompare(int x, int y) => x.compareTo(y);
l1.sortLists([l2, l3], myCompare);
print('============');
print(l1);
print(l2);
print(l3);
}
extension SortListByList<E> on List<E> {
sortLists(Iterable<List> lists, int Function(E, E) compare) {
for (final list in lists) {
if (list.length != length) {
throw StateError('The length of lists must be equal');
}
}
final rules = <int>[];
sort((x, y) {
final rule = compare(x, y);
rules.add(rule);
return rule;
});
for (final list in lists) {
var rule = 0;
list.sort((x, y) => rules[rule++]);
}
}
}
Output:
[3, 1, 2]
[three, one, two]
[drei, ein, zwei]
============
[1, 2, 3]
[one, two, three]
[ein, zwei, drei]

Generate ordered list of sum between elements in large lists

I'm not sure whether this question belongs on Math Stack Exchange or on Stack Overflow, but here we go.
I have an arbitrary number of sorted lists (say 3, for example) of numerical values. These lists can be long enough that trying all combinations of values becomes too computationally expensive.
What I need is to get an ordered list of possible sums when picking one value from each of the lists. Since the lists can be large, I only want the N smallest sums.
What I've considered is to step down one of the lists at each iteration. However, this misses many cases that would have been reached if a different list had been chosen for that step.
An alternative would be a recursive solution, but that would generate many duplicate cases instead.
Are there any known methods that could solve such a problem?
Suppose we have K lists.
Make a min-heap.
a) Push a structure containing the sum of the first elements of all the lists, together with the list of indexes: key = Sum(L[i][0]), [ix0=0, ix1=0, ix2=0]
b) Pop the smallest element from the heap and output its key (the sum)
c) Construct K new elements from the popped one: for every list, increment the corresponding index and update the sum
key - L[0][ix0] + L[0][ix0 + 1], [ix0 + 1, ix1, ix2]
key - L[1][ix1] + L[1][ix1 + 1], [ix0, ix1 + 1, ix2]
the same for ix2 (skip any index that would run past the end of its list)
d) Push them into the heap, skipping index combinations that were already pushed, since the same combination can be reached from several popped elements
e) Repeat from b) until the N smallest sums have been extracted
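A compact Java sketch of steps a) to e), assuming every list is sorted in ascending order; the names are hypothetical. It records each index combination the first time it is pushed, which keeps the heap free of repeated combinations, and it reports repeated sum values as often as they occur:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class KSmallestSums {
    // One heap entry: the chosen index in every list and the resulting sum.
    static final class Node {
        final int[] idx;
        final long sum;

        Node(int[] idx, long sum) {
            this.idx = idx;
            this.sum = sum;
        }
    }

    // Returns the n smallest sums obtainable by taking one value from each list.
    static List<Long> smallestSums(List<int[]> lists, int n) {
        List<Long> out = new ArrayList<>();
        if (lists.isEmpty()) return out;
        for (int[] list : lists) {
            if (list.length == 0) return out; // no combination exists
        }
        int k = lists.size();
        PriorityQueue<Node> heap = new PriorityQueue<>((x, y) -> Long.compare(x.sum, y.sum));
        Set<String> pushed = new HashSet<>(); // index combinations already pushed, as text keys

        // a) start with index 0 in every list
        int[] start = new int[k];
        long startSum = 0;
        for (int[] list : lists) startSum += list[0];
        heap.add(new Node(start, startSum));
        pushed.add(Arrays.toString(start));

        while (out.size() < n && !heap.isEmpty()) {
            Node cur = heap.poll(); // b) smallest remaining sum
            out.add(cur.sum);
            for (int i = 0; i < k; i++) { // c) advance one list at a time
                int[] list = lists.get(i);
                int next = cur.idx[i] + 1;
                if (next >= list.length) continue;
                int[] idx = cur.idx.clone();
                idx[i] = next;
                // d) push, unless this combination was already reached from another parent
                if (pushed.add(Arrays.toString(idx))) {
                    heap.add(new Node(idx, cur.sum - list[cur.idx[i]] + list[next]));
                }
            }
        }
        return out; // e) stops after n sums or when every combination has been used
    }
}
```

With the lists {2, 4, 7, 8} and {1, 3, 5, 8} and n = 5, this returns 3, 5, 5, 7, 7.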
A Java implementation of the min heap algorithm with a simple test case:
The algorithm itself is just as described by MBo above; note that this version reports each distinct sum value only once.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
class MinHeapElement {
int sum;
List<Integer> idx;
}
public class SumFromKLists {
public static List<Integer> sumFromKLists(List<List<Integer>> lists, int N) {
List<Integer> ans = new ArrayList<>();
if(N == 0) {
return ans;
}
PriorityQueue<MinHeapElement> minPq = new PriorityQueue<>(new Comparator<MinHeapElement>() {
@Override
public int compare(MinHeapElement e1, MinHeapElement e2) {
return Integer.compare(e1.sum, e2.sum); // avoids overflow of e1.sum - e2.sum
}
});
MinHeapElement smallest = new MinHeapElement();
smallest.idx = new ArrayList<>();
for(int i = 0; i < lists.size(); i++) {
smallest.sum += lists.get(i).get(0);
smallest.idx.add(0);
}
minPq.add(smallest);
ans.add(smallest.sum);
while(ans.size() < N) {
MinHeapElement curr = minPq.poll();
if(ans.get(ans.size() - 1) != curr.sum) {
ans.add(curr.sum);
}
List<MinHeapElement> candidates = nextPossibleCandidates(lists, curr);
if(candidates.size() == 0) {
break;
}
minPq.addAll(candidates);
}
return ans;
}
private static List<MinHeapElement> nextPossibleCandidates(List<List<Integer>> lists, MinHeapElement minHeapElement) {
List<MinHeapElement> candidates = new ArrayList<>();
for(int i = 0; i < lists.size(); i++) {
List<Integer> currList = lists.get(i);
int newIdx = minHeapElement.idx.get(i) + 1;
while(newIdx < currList.size() && currList.get(newIdx) == currList.get(newIdx - 1)) {
newIdx++;
}
if(newIdx < currList.size()) {
MinHeapElement nextElement = new MinHeapElement();
nextElement.sum = minHeapElement.sum + currList.get(newIdx) - currList.get(minHeapElement.idx.get(i));
nextElement.idx = new ArrayList<>(minHeapElement.idx);
nextElement.idx.set(i, newIdx);
candidates.add(nextElement);
}
}
return candidates;
}
public static void main(String[] args) {
List<Integer> list1 = new ArrayList<>();
list1.add(2); list1.add(4); list1.add(7); list1.add(8);
List<Integer> list2 = new ArrayList<>();
list2.add(1); list2.add(3); list2.add(5); list2.add(8);
List<List<Integer>> lists = new ArrayList<>();
lists.add(list1); lists.add(list2);
System.out.println(sumFromKLists(lists, 11));
}
}

Algorithm to list unique permutations of string with duplicate letters

For example, string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc
What's a good algorithm for generating the permutations? (And what's its time complexity?)
For a multiset, you can solve recursively by position (JavaScript code):
function f(multiset,counters,result){
if (counters.every(x => x === 0)){
console.log(result);
return;
}
for (var i=0; i<counters.length; i++){
if (counters[i] > 0){
var _counters = counters.slice(); // declared locally instead of as an implicit global
_counters[i]--;
f(multiset,_counters,result + multiset[i]);
}
}
}
f(['A','B'],[3,3],'');
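The same position-by-position recursion sketched in Java with hypothetical names. Instead of copying the counters at every step like the JavaScript version, it decrements a count, recurses, and then restores it. Each completed branch is one distinct permutation, so no duplicates are generated and then filtered out:

```java
import java.util.List;

class MultisetPermutations {
    // Appends to out every distinct arrangement of the multiset in which
    // letters[i] occurs counts[i] times, choosing one position at a time.
    static void generate(char[] letters, int[] counts, StringBuilder prefix, List<String> out) {
        boolean done = true;
        for (int i = 0; i < letters.length; i++) {
            if (counts[i] == 0) continue;
            done = false;
            counts[i]--;                           // place letters[i] at the next position
            prefix.append(letters[i]);
            generate(letters, counts, prefix, out);
            prefix.setLength(prefix.length() - 1); // undo the choice
            counts[i]++;
        }
        if (done) {
            out.add(prefix.toString());            // every count is zero: a full permutation
        }
    }
}
```

Called with letters {'A', 'B'} and counts {3, 3}, it produces the 20 arrangements of "AAABBB", starting with "AAABBB" and ending with "BBBAAA".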
This is not a full answer, just an idea.
If your strings have a fixed number of only two letters, I'd go with a binary tree and a good recursive function.
Each node is an object containing a name (its parent's name plus the suffix A or B) and the number of A and B letters in that name.
The node constructor gets the parent's name and its A and B counts, so it only needs to add 1 to the count of A or B and append one letter to the name.
It doesn't construct the next node if that would give more than three A's (for an A node) or more than three B's (for a B node), or once the name is as long as the starting string.
Now you can collect the leaves of the two trees (their names) and have all the permutations you need.
Scala or another functional language (with object-like features) would be perfect for implementing this algorithm. Hope this helps or at least sparks some ideas.
Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly, while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;
public class UniquePermutations
{
static void generateUniquePermutations(String s, Consumer<String> consumer)
{
char[] array = s.toCharArray();
Arrays.sort(array);
for (;;)
{
consumer.accept(String.valueOf(array));
int changePos=array.length-2;
while (changePos>=0 && array[changePos]>=array[changePos+1])
--changePos;
if (changePos<0)
break; //all done
int swapPos=changePos+1;
while(swapPos+1 < array.length && array[swapPos+1]>array[changePos])
++swapPos;
char t = array[changePos];
array[changePos] = array[swapPos];
array[swapPos] = t;
for (int i=changePos+1, j = array.length-1; i < j; ++i,--j)
{
t = array[i];
array[i] = array[j];
array[j] = t;
}
}
}
public static void main (String[] args) throws java.lang.Exception
{
StringBuilder line = new StringBuilder();
generateUniquePermutations("banana", s->{
if (line.length() > 0)
{
if (line.length() + s.length() >= 75)
{
System.out.println(line.toString());
line.setLength(0);
}
else
line.append(" ");
}
line.append(s);
});
System.out.println(line);
}
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa

Is there an efficient algorithm that could do this?

I have two lists of integers of equal length, each with no duplicates, and I need to map them to each other based on the absolute values of their differences, such that no two pairs in the output could be swapped to make the total of all the pair differences smaller. The naive approach I came up with is this (in condensed C#, but I think it's pretty easy to follow):
Dictionary<int, int> output = new Dictionary<int, int>();
List<int> list1, list2;
while (list1.Count > 0) //While we haven't arranged all the pairs
{
int bestDistance = Int32.MaxValue; //best distance between numbers so far
int bestFirst = 0, bestSecond = 0; //best numbers so far
foreach(int i in list1)
{
foreach(int j in list2)
{
int distance = Math.Abs(i - j);
//if the distance is better than the best so far, make it the new best
if(distance < bestDistance)
{
bestDistance = distance;
bestFirst = i;
bestSecond = j;
}
}
}
output[bestFirst] = bestSecond; //add the best to dictionary
list1.Remove(bestFirst); //remove it from the lists
list2.Remove(bestSecond);
}
Essentially, it just finds the best pair, removes it, and repeats until it's done. But if I see it correctly, this runs in cubic time and would take incredibly long for large lists. Is there any faster way to do this?
This is less trivial than my initial hunch suggested. The key to keeping this O(N log(N)) is to work with sorted lists, and search for the "pivot" element in the second sorted list with the smallest difference to the first element in the first sorted list.
Thus the steps to take become:
Sort both input lists
Find the pivot element in the second sorted list
Return this pivot element together with the first element of the first sorted list
Keep track of the indexes of the elements to the left and to the right of the pivot
Iterate the first list in sorted order, returning either the left or right element, depending on which difference is smallest and adjusting the left and right indexes.
As in (c# example):
public static IEnumerable<KeyValuePair<int, int>> FindSmallestDistances(List<int> first, List<int> second)
{
Debug.Assert(first.Count == second.Count); // precondition.
// sort the input: O(N log(N)).
first.Sort();
second.Sort();
// determine pivot: O(N).
var min_first = first[0];
var smallest_abs_dif = Math.Abs(second[0] - min_first);
var pivot_ndx = 0;
for (int i = 1; i < second.Count; i++)
{
var abs_dif = Math.Abs(second[i] - min_first);
if (abs_dif < smallest_abs_dif)
{
smallest_abs_dif = abs_dif;
pivot_ndx = i;
}
}
// return the first one.
yield return new KeyValuePair<int, int>(min_first, second[pivot_ndx]);
// Iterate the rest: O(N)
var left = pivot_ndx - 1;
var right = pivot_ndx + 1;
for (var i = 1; i < first.Count; i++)
{
if (left >= 0)
{
if (right < first.Count && Math.Abs(first[i] - second[left]) > Math.Abs(first[i] - second[right]))
yield return new KeyValuePair<int, int>(first[i], second[right++]);
else
yield return new KeyValuePair<int, int>(first[i], second[left--]);
}
else
yield return new KeyValuePair<int, int>(first[i], second[right++]);
}
}
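A different technique, worth comparing against the pivot approach above: for numbers on a line, pairing the i-th smallest value of one list with the i-th smallest of the other minimizes the total of the absolute differences (an exchange argument shows that two crossing pairs can always be uncrossed without increasing the total), so no swap of partners can make the total smaller. A Java sketch with hypothetical names:

```java
import java.util.Arrays;

class SortedPairing {
    // Pairs the i-th smallest value of first with the i-th smallest of second.
    // Each pair is {valueFromFirst, valueFromSecond}, ordered by the first value.
    static int[][] pairBySortedOrder(int[] first, int[] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        int[] a = first.clone();
        int[] b = second.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int[][] pairs = new int[a.length][];
        for (int i = 0; i < a.length; i++) {
            pairs[i] = new int[] {a[i], b[i]};
        }
        return pairs;
    }

    // Sum of |x - y| over all pairs, computed in long to avoid int overflow.
    static long totalDistance(int[][] pairs) {
        long total = 0;
        for (int[] p : pairs) {
            total += Math.abs((long) p[0] - p[1]);
        }
        return total;
    }
}
```

The cost is the two sorts, O(N log N), plus one linear pass.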

Google search results: How to find the minimum window that contains all the search keywords?

What is the complexity of the algorithm used to find the smallest snippet that contains all the search keywords?
As stated, the problem is solved by a rather simple algorithm:
Just look through the input text sequentially from the very beginning and check each word: whether it is in the search key or not. If the word is in the key, add it to the end of the structure that we will call The Current Block. The Current Block is just a linear sequence of words, each word accompanied by a position at which it was found in the text. The Current Block must maintain the following Property: the very first word in The Current Block must be present in The Current Block once and only once. If you add the new word to the end of The Current Block, and the above property becomes violated, you have to remove the very first word from the block. This process is called normalization of The Current Block. Normalization is a potentially iterative process, since once you remove the very first word from the block, the new first word might also violate The Property, so you'll have to remove it as well. And so on.
So, basically The Current Block is a FIFO sequence: the new words arrive at the right end, and get removed by normalization process from the left end.
All you have to do to solve the problem is look through the text, maintain The Current Block, normalizing it when necessary so that it satisfies The Property. The shortest block with all the keywords in it you ever build is the answer to the problem.
For example, consider the text
CxxxAxxxBxxAxxCxBAxxxC
with keywords A, B and C. Looking through the text you'll build the following sequence of blocks
C
CA
CAB - all words, length 9 (CxxxAxxxB...)
CABA - all words, length 12 (CxxxAxxxBxxA...)
CABAC - violates The Property, remove first C
ABAC - violates The Property, remove first A
BAC - all words, length 7 (...BxxAxxC...)
BACB - violates The Property, remove first B
ACB - all words, length 6 (...AxxCxB...)
ACBA - violates The Property, remove first A
CBA - all words, length 4 (...CxBA...)
CBAC - violates The Property, remove first C
BAC - all words, length 6 (...BAxxxC)
The best block we built has length 4, which is the answer in this case
CxxxAxxxBxxAxx CxBA xxxC
The number of normalization steps at each word depends on the input, but their total is bounded: every word enters The Current Block once and is removed at most once, so normalization adds only O(N) work overall, provided you keep a count of each keyword inside the block so that The Property can be checked with a single lookup. The overall complexity is therefore O(N * log M), where N is the number of words in the text, M is the number of keywords, and O(log M) is the cost of checking whether the current word belongs to the keyword set (and of updating its count).
Now, having said that, I have to admit that I suspect that this might not be what you need. Since you mentioned Google in the caption, it might be that the statement of the problem you gave in your post is not complete. Maybe in your case the text is indexed? (With indexing the above algorithm is still applicable, just becomes more efficient). Maybe there's some tricky database that describes the text and allows for a more efficient solution (like without looking through the entire text)? I can only guess and you are not saying...
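A Java sketch of The Current Block, assuming the text is already split into words and the keywords are given as a set; the names are hypothetical. It keeps a count of each keyword inside the block, so the normalization check is a single hash lookup:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class CurrentBlock {
    // Returns {start, end} word positions (inclusive) of the shortest span of
    // text containing every keyword, or null if some keyword never occurs.
    static int[] minWindow(String[] text, Set<String> keywords) {
        Deque<Integer> block = new ArrayDeque<>();     // positions of keyword hits, oldest first
        Map<String, Integer> counts = new HashMap<>(); // occurrences of each keyword in the block
        int[] best = null;
        for (int pos = 0; pos < text.length; pos++) {
            String word = text[pos];
            if (!keywords.contains(word)) continue;   // not a keyword: ignore it
            block.addLast(pos);
            counts.merge(word, 1, Integer::sum);
            // normalization: drop the first word while it occurs again later in the block
            while (counts.get(text[block.peekFirst()]) > 1) {
                counts.merge(text[block.pollFirst()], -1, Integer::sum);
            }
            // counts never drop to zero, so its size is the number of keywords seen
            if (counts.size() == keywords.size()) {
                int start = block.peekFirst();
                if (best == null || pos - start < best[1] - best[0]) {
                    best = new int[] {start, pos};
                }
            }
        }
        return best;
    }
}
```

On the example text above, split into single-character words, this should return {14, 17}: the CxBA block of length 4.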
I think the solution proposed by AndreyT assumes that no duplicates exist in the keywords/search terms. Also, the current block can grow as large as the text itself if the text contains a lot of duplicate keywords.
For example:
Text: 'ABBBBBBBBBB'
Keyword text: 'AB'
Current Block: 'ABBBBBBBBBB'
Anyway, I have implemented it in C# and done some basic testing; it would be nice to get some feedback on whether it works or not :)
static string FindMinWindow(string text, string searchTerms)
{
Dictionary<char, bool> searchIndex = new Dictionary<char, bool>();
foreach (var item in searchTerms)
{
searchIndex.Add(item, false);
}
Queue<Tuple<char, int>> currentBlock = new Queue<Tuple<char, int>>();
int noOfMatches = 0;
int minLength = Int32.MaxValue;
int startIndex = 0;
for(int i = 0; i < text.Length; i++)
{
char item = text[i];
if (searchIndex.ContainsKey(item))
{
if (!searchIndex[item])
{
noOfMatches++;
}
searchIndex[item] = true;
var newEntry = new Tuple<char, int> ( item, i );
currentBlock.Enqueue(newEntry);
// Normalization step.
while (currentBlock.Count(o => o.Item1.Equals(currentBlock.First().Item1)) > 1)
{
currentBlock.Dequeue();
}
// Figuring out minimum length.
if (noOfMatches == searchTerms.Length)
{
var length = currentBlock.Last().Item2 - currentBlock.First().Item2 + 1;
if (length < minLength)
{
startIndex = currentBlock.First().Item2;
minLength = length;
}
}
}
}
return noOfMatches == searchTerms.Length ? text.Substring(startIndex, minLength) : String.Empty;
}
This is an interesting question.
To restate it more formally:
Given a list L (the web page) of length n and a set S (the query) of size k, find the smallest sublist of L that contains all the elements of S.
I'll start with a brute-force solution in hopes of inspiring others to beat it.
Note that set membership can be done in constant time, after one pass through the set. See this question.
Also note that this assumes all the elements of S are in fact in L, otherwise it will just return the sublist from 1 to n.
best = (1, n)
For i from 1 to n-k+1:
Create/reset a hash found[] mapping each element of S to False, and set counter = 0.
For j from i to n, until counter == k:
If L[j] is in S and not found[L[j]], then counter++ and let found[L[j]] = True.
If counter == k and j-i < best[2]-best[1], then let best = (i, j).
Time complexity is O((n+k)(n-k)), i.e. roughly n^2.
Here's a solution using Java 8.
static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
HashSet<String> words = new HashSet<>();
query.stream()
.forEach(words::add);
AtomicInteger idx = new AtomicInteger();
IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
AtomicInteger size = new AtomicInteger();
document.stream()
.map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
.filter(pair -> words.contains(pair.word)) // Queue.contains is O(n) so we trade space for efficiency
.forEach(pair -> {
// only the first and last elements are useful to the algorithm, so we don't bother removing
// an element from any other index. note that removing an element using equality
// from an ArrayDeque is O(n)
KeywordIndexPair first = queue.peek();
if (pair.equals(first)) {
queue.remove();
}
queue.add(pair);
first = queue.peek();
int diff = pair.index - first.index;
if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
interval.begin = first.index;
interval.end = pair.index;
size.set(0);
}
});
return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are two static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from their names (KeywordIndexPair's equals has to compare the word only, so that pair.equals(first) detects a repeated keyword). In a language with built-in tuples, those classes wouldn't be necessary.
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
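For every word, maintain its minimum and maximum index, in case it occurs more than once; if it occurs only once, both indexes are the same. Note that this returns the span from the earliest to the latest occurrence of the query words, which is the minimum window only when each query word occurs once in the document.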
import edu.princeton.cs.algs4.ST;
public class DicMN {
ST<String, Words> st = new ST<>();
public class Words {
int min;
int max;
public Words(int index) {
min = index;
max = index;
}
}
public int findMinInterval(String[] sw) {
int begin = Integer.MAX_VALUE;
int end = Integer.MIN_VALUE;
for (int i = 0; i < sw.length; i++) {
if (st.contains(sw[i])) {
Words w = st.get(sw[i]);
begin = Math.min(begin, w.min);
end = Math.max(end, w.max);
}
}
if (begin != Integer.MAX_VALUE) {
return (end - begin) + 1;
}
return 0;
}
public void put(String[] dw) {
for (int i = 0; i < dw.length; i++) {
if (!st.contains(dw[i])) {
st.put(dw[i], new Words(i));
}
else {
Words w = st.get(dw[i]);
w.min = Math.min(w.min, i);
w.max = Math.max(w.max, i);
}
}
}
public static void main(String[] args) {
DicMN dic = new DicMN();
String[] arr1 = { "one", "two", "three", "four", "five", "six", "seven", "eight" };
dic.put(arr1);
String[] arr2 = { "two", "five" };
System.out.print("Interval:" + dic.findMinInterval(arr2));
}
}
