Sort two lists the same way

I need to sort a list of DateTime from earliest to latest.
List<DateTime> datetimeList = [2021-01-15 12:26:40.709246, 2021-02-25 13:26:40.709246, 2021-02-20 19:26:40.709246];
datetimeList.sort();
I have another list of Strings.
List<String> stringList = ["one", "two", "three"];
The indexes of stringList have to match the indexes of datetimeList. So the index of "one" always has to be the same as the index of 2021-01-15 12:26:40.709246, and so on.
If I sort the lists individually, the DateTimes are sorted chronologically and the Strings are sorted alphabetically. This way, each String no longer goes with its initial date.
How can I sort one list (datetimeList) so that the other list (stringList) is sorted exactly the same way?

The easiest solution would be to create a struct/class that combines both values, so you don't have to worry about keeping the two lists aligned. The last thing you need to do is sort the list of new objects by the date. For the Dart specifics, I cannot help you due to missing knowledge about Dart.
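If it helps, here is a sketch of that combine-and-sort idea in Java rather than Dart (the `Entry` class name and the trimmed timestamps are illustrative, not code from the question):

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pair class: one object holds both the date and its label,
// so the two values can never get out of sync.
class Entry {
    final LocalDateTime when;
    final String label;

    Entry(LocalDateTime when, String label) {
        this.when = when;
        this.label = label;
    }
}

public class CombineAndSort {
    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(LocalDateTime.parse("2021-01-15T12:26:40"), "one"));
        entries.add(new Entry(LocalDateTime.parse("2021-02-25T13:26:40"), "two"));
        entries.add(new Entry(LocalDateTime.parse("2021-02-20T19:26:40"), "three"));
        // Sort once by date; each label stays attached to its date.
        entries.sort(Comparator.comparing(e -> e.when));
        for (Entry e : entries) {
            System.out.println(e.when + " : " + e.label);
        }
    }
}
```

The same pattern translates directly to Dart: a small class plus a single `sort` with a comparator on the date field.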

You could use a SplayTreeMap as well: https://api.dart.dev/stable/2.8.4/dart-collection/SplayTreeMap-class.html
A SplayTreeMap ensures that its keys are in sorted order. You could use your DateTime as the key and the corresponding element of the other list as the value.
import 'dart:collection';

main() {
  final SplayTreeMap<DateTime, String> map =
      new SplayTreeMap<DateTime, String>();
  map[DateTime.parse("2021-01-15 12:26:40.709246")] = "one";
  map[DateTime.parse("2021-02-25 13:26:40.709246")] = "two";
  map[DateTime.parse("2021-02-20 19:26:40.709246")] = "three";
  for (final DateTime key in map.keys) {
    print("$key : ${map[key]}");
  }
}

I recommend the simpler suggestions given here.
For completeness, I'll provide one more approach: Compute the permutation by sorting a list of indices:
List<int> sortedPermutation<T>(List<T> elements, int compare(T a, T b)) =>
    [for (var i = 0; i < elements.length; i++) i]
      ..sort((i, j) => compare(elements[i], elements[j]));
Then you can reorder the existing lists to match:
List<T> reorder<T>(List<T> elements, List<int> permutation) =>
    [for (var i = 0; i < permutation.length; i++) elements[permutation[i]]];
If you do:
var sorted = reorder(original, sortedPermutation(original, compare));
it should give you a sorted list.
It's less efficient than sorting in-place because you create a new list,
but you can apply the same reordering to multiple lists afterwards.

A fast and very effective way:
void main() {
  final l1 = [3, 1, 2];
  final l2 = ['three', 'one', 'two'];
  final l3 = ['drei', 'ein', 'zwei'];
  print(l1);
  print(l2);
  print(l3);
  myCompare(int x, int y) => x.compareTo(y);
  l1.sortLists([l2, l3], myCompare);
  print('============');
  print(l1);
  print(l2);
  print(l3);
}

extension SortListByList<E> on List<E> {
  void sortLists(Iterable<List> lists, int Function(E, E) compare) {
    for (final list in lists) {
      if (list.length != length) {
        throw StateError('The length of lists must be equal');
      }
    }
    final rules = <int>[];
    sort((x, y) {
      final rule = compare(x, y);
      rules.add(rule);
      return rule;
    });
    for (final list in lists) {
      var rule = 0;
      list.sort((x, y) => rules[rule++]);
    }
  }
}
Output:
[3, 1, 2]
[three, one, two]
[drei, ein, zwei]
============
[1, 2, 3]
[one, two, three]
[ein, zwei, drei]


How to efficiently add a sorted List into another sorted List?

I'm having trouble determining the most efficient way of doing this in Dart.
I have two lists that are in sorted descending order:
List<int> messages = [10, 5, 4, 1];
List<int> newMessages = [5, 3, 2];
How can I add newMessages to messages so that messages now looks like
messages = [10, 5, 5, 4, 3, 2, 1];
If both lists are long and use the default list implementation, it might be more efficient to create a new list based on the two other lists. The reason is that inserting an element into an existing list requires all elements after the insertion index to be moved forward. Also, when the list grows, it needs to allocate a bigger list and move all elements into it.
If we instead create a new list, we can tell Dart exactly what the size of this list is going to be, and we avoid moving elements:
void main() {
  List<int> messages = [10, 5, 4, 1];
  List<int> newMessages = [5, 3, 2];
  // The compare argument is given since both lists are sorted in reverse order
  print(newSortedListBasedOnTwoAlreadySortedLists<int>(
      messages, newMessages, (a, b) => b.compareTo(a)));
  // [10, 5, 5, 4, 3, 2, 1]
}

List<E> newSortedListBasedOnTwoAlreadySortedLists<E>(
  List<E> l1,
  List<E> l2, [
  int Function(E a, E b)? compare,
]) {
  Iterator<E> i1 = l1.iterator;
  Iterator<E> i2 = l2.iterator;
  if (!i1.moveNext()) {
    if (!i2.moveNext()) {
      return [];
    } else {
      return l2.toList();
    }
  }
  if (!i2.moveNext()) {
    return l1.toList();
  }
  bool i1alive = true;
  bool i2alive = true;
  return List.generate(l1.length + l2.length, (_) {
    if (i1alive && i2alive) {
      E v1 = i1.current;
      E v2 = i2.current;
      int compareResult = (compare == null)
          ? Comparable.compare(v1 as Comparable, v2 as Comparable)
          : compare(v1, v2);
      if (compareResult > 0) {
        i2alive = i2.moveNext();
        return v2;
      } else {
        i1alive = i1.moveNext();
        return v1;
      }
    } else if (i1alive) {
      E v1 = i1.current;
      i1alive = i1.moveNext();
      return v1;
    } else {
      E v2 = i2.current;
      i2alive = i2.moveNext();
      return v2;
    }
  });
}
Note: The method could in theory take two Iterables as arguments, as long as we are sure that a call to .length has no negative consequences, such as needing to iterate over the full structure (as with e.g. mapped iterables). To avoid this issue, I ended up declaring the method to take Lists as arguments, since we know for sure that .length is not problematic for them.
This sounds like you need to merge the two lists.
As stated elsewhere, it's more efficient to create a new list than to move elements around inside the existing lists.
The merge can be written fairly simply:
/// Merges two sorted lists.
///
/// The lists must be ordered in increasing order according to [compare].
///
/// Returns a new list containing the elements of both [first] and [second]
/// in increasing order according to [compare].
List<T> merge<T>(List<T> first, List<T> second, int Function(T, T) compare) {
  var result = <T>[];
  var i = 0;
  var j = 0;
  while (i < first.length && j < second.length) {
    var a = first[i];
    var b = second[j];
    if (compare(a, b) <= 0) {
      result.add(a);
      i++;
    } else {
      result.add(b);
      j++;
    }
  }
  while (i < first.length) {
    result.add(first[i++]);
  }
  while (j < second.length) {
    result.add(second[j++]);
  }
  return result;
}
(In this case, the lists are descending, so they'll need a compare function which reverses the order, like (a, b) => b.compareTo(a))
You can use binary search to insert all new messages one by one in a sorted manner while maintaining efficiency.
void main() {
  List<int> messages = [10, 5, 4, 1];
  List<int> newMessages = [5, 3, 2];
  for (final newMessage in newMessages) {
    final index = binarySearchIndex(messages, newMessage);
    messages.insert(index, newMessage);
  }
  print(messages); // [10, 5, 5, 4, 3, 2, 1]
}

int binarySearchIndex(
  List<int> numList,
  int value, [
  int? preferredMinIndex,
  int? preferredMaxIndex,
]) {
  final minIndex = preferredMinIndex ?? 0;
  final maxIndex = preferredMaxIndex ?? numList.length - 1;
  final middleIndex = ((maxIndex - minIndex) / 2).floor() + minIndex;
  final comparator = numList[middleIndex];
  if (middleIndex == minIndex) {
    return comparator > value ? maxIndex : minIndex;
  }
  return comparator > value
      ? binarySearchIndex(numList, value, middleIndex, maxIndex)
      : binarySearchIndex(numList, value, minIndex, middleIndex);
}

Remove duplicates in O(n) by hand

I need to remove all duplicates in a list, but only if the item in list a is the same in list b as well. This is my current code, but at 100k items it's taking literally days. Is there a faster way to do this?
Any help appreciated.
List<int> ind = new List<int>();
List<int> used = new List<int>();
for (int i = 0; i < a.Count; i++)
{
    for (int j = 0; j < a.Count; j++)
    {
        if (i != j && !used.Contains(i))
        {
            if (a[j] == a[i] && b[i] == b[j])
            {
                ind.Add(j);
                used.Add(j);
            }
        }
    }
}
List<string> s2 = new List<string>();
List<string> a2 = new List<string>();
for (int i = 0; i < a.Count; i++)
{
    if (!ind.Contains(i))
    {
        s2.Add(a[i]);
        a2.Add(b[i]);
    }
}
The key to many such problems is the correct data structure. To avoid duplicates, you need to use Sets, as they remove duplicates automatically.
Here is the code in Java, I hope it is similar in C#:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class Duplicates
{
    static List<Integer> list1 = new ArrayList<>();
    static List<Integer> list2 = new ArrayList<>();
    static final int SIZE = 100_000;
    static final int MAX_VALUE = 1_000_000;

    public static void main(String[] args)
    {
        // populate the lists with random values for testing
        Random r = new Random();
        for (int i = 0; i < SIZE; i++)
        {
            list1.add(r.nextInt(MAX_VALUE));
            list2.add(r.nextInt(MAX_VALUE));
        }
        Set<Integer> set1 = new HashSet<>(list1);
        Set<Integer> set2 = new HashSet<>(list2);
        // items that are in both lists
        Set<Integer> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);
        Set<Integer> notSeenYet = new HashSet<>(intersection);
        List<Integer> list1Unique = new ArrayList<Integer>();
        for (int n : list1)
        {
            if (intersection.contains(n)) // we may have to skip this one
            {
                if (notSeenYet.contains(n)) // no, don't skip, it's the first occurrence
                {
                    notSeenYet.remove(n);
                }
                else
                {
                    continue;
                }
            }
            list1Unique.add(n);
        }
        System.out.println("list 1 contains " + list1Unique.size()
                + " values after removing all duplicates that are also in list 2");
    }
}
It takes less than a second for 100k values.
Output
list 1 contains 99591 values after removing all duplicates that are also in list 2
Create a HashSet.
First, iterate through list b and add all of its elements to the HashSet.
Then, iterate through each element of list a. When you visit an element, ask the HashSet whether it already contains that element. If it doesn't, it's a new element, so just keep it. If it does, it is a duplicate and you can remove it from a.
HashSets can answer the "Do you have this element?" question in O(1), so for the whole list you get O(n).
For more information, check the documentation.
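A rough Java sketch of the steps just described (the method name `filter` is mine, and, like this answer, it ignores the paired-list condition from the question and returns a new list instead of removing in place):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetFilter {
    // Keeps only the elements of a that do not occur in b,
    // following the HashSet recipe described above.
    static List<Integer> filter(List<Integer> a, List<Integer> b) {
        Set<Integer> seen = new HashSet<>(b);   // one pass over b
        List<Integer> result = new ArrayList<>();
        for (int value : a) {
            if (seen.contains(value)) {         // O(1) membership test
                continue;                       // duplicate: drop it
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of(1, 2, 3, 2, 4), List.of(2, 4)));
        // prints [1, 3]
    }
}
```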
Here is a general algorithm to consider. We can start by sorting both lists in ascending order. Using a good sorting algorithm such as merge sort, this takes O(N lg N) time, where N is the length of the list. Once we have paid this penalty, we only need to maintain one pointer into each list. The general algorithm walks up both lists, looking for duplicates in the a list whenever the value in question matches the value under the pointer into the b list. If there is a match, the duplicates are removed from the a list; otherwise we keep walking until we reach the end of the a list. This walk is only O(N), making the biggest penalty the initial sort, which is O(N lg N).
To "remove duplicates" I understand to mean "from n identical items, leave the first and remove the remaining n - 1". If so then this is the algorithm:
Convert list b to set B. Also introduce set A_dup. Run through list a and for each item:
if item is found in A_dup then remove it from a,
else if item is found in set B then add it to A_dup.
Repeat.
Checking for existence in a set (both A_dup and B) is an O(1) operation, as is adding a new item to a set. So you're left with iterating through list a, which in total gives us O(n).
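A sketch of this algorithm in Java (the method name `dedup` is mine; it builds a new list rather than removing from a in place):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstOccurrenceDedup {
    // For items that also occur in b: keep the first occurrence in a and
    // drop the remaining copies. Items not in b are left untouched.
    static List<Integer> dedup(List<Integer> a, List<Integer> b) {
        Set<Integer> setB = new HashSet<>(b);
        Set<Integer> aDup = new HashSet<>();
        List<Integer> result = new ArrayList<>();
        for (int item : a) {
            if (aDup.contains(item)) {
                continue;               // already kept once, remove this copy
            }
            if (setB.contains(item)) {
                aDup.add(item);         // first occurrence: remember it, keep it
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of(5, 3, 5, 7, 3, 5), List.of(3, 5)));
        // prints [5, 3, 7]
    }
}
```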
I think what you are trying to do is find distinct pairs, right?
If so, you can do that in one line using Zip and Distinct and a C# Tuple (or use an anonymous type).
var result = a.Zip(b, (x,y) => (x, y)).Distinct();
import java.util.*;
import java.util.stream.Collectors;

public class Test {
    public static void main(String args[]) {
        List<String> dupliKhaneList = new ArrayList<>();
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Khan");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Khan");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Zidan");
        // Solution 1 if you want to remove duplicates within the list
        List<String> uniqueList = dupliKhaneList.stream().distinct().collect(Collectors.toList());
        System.out.println("DupliKhane => " + dupliKhaneList);
        System.out.println("Unique 1 => " + uniqueList);
        // Solution 2 if you want to remove using 2 lists
        List<String> list1 = new ArrayList<>();
        list1.add("Vaquar");
        list1.add("Khan");
        list1.add("Vaquar");
        list1.add("Vaquar");
        list1.add("Khan");
        list1.add("Vaquar");
        list1.add("Zidan");
        List<String> list2 = new ArrayList<>();
        list2.add("Zidan");
        System.out.println("list1 => " + list1);
        System.out.println("list2 => " + list2);
        list1.removeAll(list2);
        System.out.println("removeAll duplicate => " + list1);
    }
}
Results :
DupliKhane => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
Unique 1 => [Vaquar, Khan, Zidan]
list1 => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
list2 => [Zidan]
removeAll duplicate => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar]

Sorting Algorithm for expensive swapping?

I came across the following problem in the application I'm developing:
I'm given two lists:
list1 = { Z, K, A, B, A, C }
list2 = { A, A, B, C, K, Z }
list2 is guaranteed to be the sorted version of list1.
My objective is to sort list1 only by swapping elements within list1. So for example, I cannot iterate through list2 and simply assign every element i in list1 to every element j in list2.
Using list2 as a resource, I need to sort list1 in the absolute minimum number of swaps possible.
Is there a set of algorithms specifically for this purpose? I've not heard of such a thing.
I wrote this code in Java in order to do the minimal swaps.
Since the second list is guaranteed to be sorted, we can look up each of its elements, find that element's index in the first list, and then swap the element at the current index with the one we found.
Update: I modified findElementToSwapIndex so that it checks whether the swapped element will be at the right index after swapping, based on list2.
public class Testing {
    private static String[] unorderedList = {"Z", "C", "A", "B", "A", "K"};
    private static String[] orderedList = {"A", "A", "B", "C", "K", "Z"};
    private static int numberOfSwaps;

    public static void main(String[] args) {
        for (int i = 0; i < unorderedList.length; i++) {
            if (!unorderedList[i].equals(orderedList[i])) {
                int index = findElementToSwapIndex(i, orderedList[i]);
                swapElements(unorderedList, i, index);
            }
        }
        System.out.println(numberOfSwaps);
    }

    private static void swapElements(String[] list, int indexOfFirstElement, int indexOfSecElement) {
        String temp = list[indexOfFirstElement];
        list[indexOfFirstElement] = list[indexOfSecElement];
        list[indexOfSecElement] = temp;
        numberOfSwaps++;
    }

    private static int findElementToSwapIndex(int currentIndexOfUnorderedList, String letter) {
        int lastElementToSwapIndex = 0;
        for (int i = 0; i < unorderedList.length; i++) {
            if (unorderedList[i].equals(letter)) {
                lastElementToSwapIndex = i;
                // check if the swapped element will be in the right place in regard to list 2
                if (unorderedList[currentIndexOfUnorderedList].equals(orderedList[lastElementToSwapIndex])) {
                    return lastElementToSwapIndex;
                }
            }
        }
        return lastElementToSwapIndex;
    }
}
The minimum number of swaps for this code was the same as in https://stackoverflow.com/a/40507589/6726632
Hopefully this can help you.
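For comparison, when the elements are distinct, the minimum number of swaps can be computed directly from the cycle decomposition of the permutation mapping the unordered list onto the sorted one: each cycle of length k costs k - 1 swaps. This is a sketch of that standard technique (not the code from the linked answer), and it assumes distinct elements, so the duplicate "A" values from the question would need extra tie-breaking:

```java
import java.util.HashMap;
import java.util.Map;

public class MinSwaps {
    // Minimum swaps to turn `current` into `target`, assuming distinct elements:
    // each cycle of length k in the permutation needs k - 1 swaps.
    static int minSwaps(int[] current, int[] target) {
        Map<Integer, Integer> targetIndex = new HashMap<>();
        for (int i = 0; i < target.length; i++) {
            targetIndex.put(target[i], i);
        }
        boolean[] visited = new boolean[current.length];
        int swaps = 0;
        for (int i = 0; i < current.length; i++) {
            int j = i;
            int cycleLength = 0;
            while (!visited[j]) {
                visited[j] = true;
                j = targetIndex.get(current[j]);  // follow the cycle
                cycleLength++;
            }
            if (cycleLength > 1) {
                swaps += cycleLength - 1;
            }
        }
        return swaps;
    }

    public static void main(String[] args) {
        System.out.println(minSwaps(new int[]{3, 1, 2}, new int[]{1, 2, 3}));
        // prints 2
    }
}
```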

LINQ implementation of Cartesian Product with pruning

I hope someone is able to help me with what is, at least to me, quite a tricky algorithm.
The Problem
I have a List (1 <= size <= 5, but size unknown until run-time) of Lists (1 <= size <= 2) that I need to combine. Here is an example of what I am looking at:-
ListOfLists = { {1}, {2,3}, {2,3}, {4}, {2,3} }
So, there are 2 stages to what I need to do:-
(1). I need to combine the inner lists in such a way that any combination has exactly ONE item from each list, that is, the possible combinations in the result set here would be:-
1,2,2,4,2
1,2,2,4,3
1,2,3,4,2
1,2,3,4,3
1,3,2,4,2
1,3,2,4,3
1,3,3,4,2
1,3,3,4,3
The Cartesian Product takes care of this, so stage 1 is done.....now, here comes the twist which I can't figure out - at least I can't figure out a LINQ way of doing it (I am still a LINQ noob).
(2). I now need to filter out any duplicate results from this Cartesian Product. A duplicate in this case constitutes any line in the result set with the same quantity of each distinct list element as another line, that is,
1,2,2,4,3 is the "same" as 1,3,2,4,2
because each distinct item within the first list occurs the same number of times in both lists (1 occurs once in each list, 2 appears twice in each list, ....
The final result set should therefore look like this...
1,2,2,4,2
1,2,2,4,3
--
1,2,3,4,3
--
--
--
1,3,3,4,3
Another example is the worst-case scenario (from a combination point of view) where the ListOfLists is {{2,3}, {2,3}, {2,3}, {2,3}, {2,3}}, i.e. a list containing inner lists of the maximum size - in this case there would obviously be 32 results in the Cartesian Product result-set, but the pruned result-set that I am trying to get at would just be:-
2,2,2,2,2
2,2,2,2,3 <-- all other results with four 2's and one 3 (in any order) are suppressed
2,2,2,3,3 <-- all other results with three 2's and two 3's are suppressed, etc
2,2,3,3,3
2,3,3,3,3
3,3,3,3,3
To any mathematically-minded folks out there - I hope you can help. I have actually got a working solution to part 2, but it is a total hack and is computationally-intensive, and I am looking for guidance in finding a more elegant, and efficient LINQ solution to the issue of pruning.
Thanks for reading.
pip
Some resources used so far (to get the Cartesian Product)
computing-a-cartesian-product-with-linq
c-permutation-of-an-array-of-arraylists
msdn
UPDATE - The Solution
Apologies for not posting this sooner...see below
You should implement your own IEqualityComparer<IEnumerable<int>> and then use that in Distinct().
The choice of hash code in the IEqualityComparer depends on your actual data, but I think something like this should be adequate if your actual data resemble those in your examples:
class UnorderedSequenceComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    public int GetHashCode(IEnumerable<int> obj)
    {
        return obj.Sum(i => i * i);
    }
}
The important part is that GetHashCode() should be O(N), sorting would be too slow.
void Main()
{
    var query = from a in new int[] { 1 }
                from b in new int[] { 2, 3 }
                from c in new int[] { 2, 3 }
                from d in new int[] { 4 }
                from e in new int[] { 2, 3 }
                select new int[] { a, b, c, d, e };
    query.Distinct(new ArrayComparer());
    //.Dump();
}

public class ArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (x == null || y == null)
            return false;
        return x.OrderBy(i => i).SequenceEqual<int>(y.OrderBy(i => i));
    }

    public int GetHashCode(int[] obj)
    {
        if (obj == null || obj.Length == 0)
            return 0;
        var hashcode = obj[0];
        for (int i = 1; i < obj.Length; i++)
        {
            hashcode ^= obj[i];
        }
        return hashcode;
    }
}
The finalised solution to the whole problem of combining multisets and then pruning the result-sets to remove duplicates ended up in a helper class as a static method. It takes svick's much appreciated answer and injects the IEqualityComparer dependency into the existing CartesianProduct answer I found at Eric Lippert's blog here (I'd recommend reading his post, as it explains the iterations in his thinking and why the LINQ implementation is the best).
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
    IEnumerable<IEnumerable<T>> sequences,
    IEqualityComparer<IEnumerable<T>> sequenceComparer)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    var resultsSet = sequences.Aggregate(
        emptyProduct,
        (accumulator, sequence) => from accseq in accumulator
                                   from item in sequence
                                   select accseq.Concat(new[] { item }));
    if (sequenceComparer != null)
        return resultsSet.Distinct(sequenceComparer);
    else
        return resultsSet;
}

Google search results: How to find the minimum window that contains all the search keywords?

What is the complexity of the algorithm that is used to find the smallest snippet that contains all the search keywords?
As stated, the problem is solved by a rather simple algorithm:
Just look through the input text sequentially from the very beginning and check each word: whether it is in the search key or not. If the word is in the key, add it to the end of the structure that we will call The Current Block. The Current Block is just a linear sequence of words, each word accompanied by the position at which it was found in the text.
The Current Block must maintain the following Property: the very first word in The Current Block must be present in The Current Block once and only once. If you add a new word to the end of The Current Block and the above Property becomes violated, you have to remove the very first word from the block. This process is called normalization of The Current Block. Normalization is a potentially iterative process: once you remove the very first word from the block, the new first word might also violate The Property, so you'll have to remove it as well. And so on.
So, basically The Current Block is a FIFO sequence: the new words arrive at the right end, and get removed by normalization process from the left end.
All you have to do to solve the problem is look through the text, maintain The Current Block, normalizing it when necessary so that it satisfies The Property. The shortest block with all the keywords in it you ever build is the answer to the problem.
For example, consider the text
CxxxAxxxBxxAxxCxBAxxxC
with keywords A, B and C. Looking through the text you'll build the following sequence of blocks
C
CA
CAB - all words, length 9 (CxxxAxxxB...)
CABA - all words, length 12 (CxxxAxxxBxxA...)
CABAC - violates The Property, remove first C
ABAC - violates The Property, remove first A
BAC - all words, length 7 (...BxxAxxC...)
BACB - violates The Property, remove first B
ACB - all words, length 6 (...AxxCxB...)
ACBA - violates The Property, remove first A
CBA - all words, length 4 (...CxBA...)
CBAC - violates The Property, remove first C
BAC - all words, length 6 (...BAxxxC)
The best block we built has length 4, which is the answer in this case
CxxxAxxxBxxAxx CxBA xxxC
The exact complexity of this algorithm depends on the input, since it dictates how many iterations the normalization process will make, but ignoring the normalization the complexity would trivially be O(N * log M), where N is the number of words in the text and M is the number of keywords, and O(log M) is the complexity of checking whether the current word belongs to the keyword set.
Now, having said that, I have to admit that I suspect that this might not be what you need. Since you mentioned Google in the caption, it might be that the statement of the problem you gave in your post is not complete. Maybe in your case the text is indexed? (With indexing the above algorithm is still applicable, just becomes more efficient). Maybe there's some tricky database that describes the text and allows for a more efficient solution (like without looking through the entire text)? I can only guess and you are not saying...
I think the solution proposed by AndreyT assumes no duplicates exist in the keywords/search terms. Also, the current block can get as big as the text itself if the text contains a lot of duplicate keywords.
For example:
Text: 'ABBBBBBBBBB'
Keyword text: 'AB'
Current Block: 'ABBBBBBBBBB'
Anyway, I have implemented in C#, did some basic testing, would be nice to get some feedback on whether it works or not :)
static string FindMinWindow(string text, string searchTerms)
{
    Dictionary<char, bool> searchIndex = new Dictionary<char, bool>();
    foreach (var item in searchTerms)
    {
        searchIndex.Add(item, false);
    }
    Queue<Tuple<char, int>> currentBlock = new Queue<Tuple<char, int>>();
    int noOfMatches = 0;
    int minLength = Int32.MaxValue;
    int startIndex = 0;
    for (int i = 0; i < text.Length; i++)
    {
        char item = text[i];
        if (searchIndex.ContainsKey(item))
        {
            if (!searchIndex[item])
            {
                noOfMatches++;
            }
            searchIndex[item] = true;
            var newEntry = new Tuple<char, int>(item, i);
            currentBlock.Enqueue(newEntry);
            // Normalization step.
            while (currentBlock.Count(o => o.Item1.Equals(currentBlock.First().Item1)) > 1)
            {
                currentBlock.Dequeue();
            }
            // Figuring out minimum length.
            if (noOfMatches == searchTerms.Length)
            {
                var length = currentBlock.Last().Item2 - currentBlock.First().Item2 + 1;
                if (length < minLength)
                {
                    startIndex = currentBlock.First().Item2;
                    minLength = length;
                }
            }
        }
    }
    return noOfMatches == searchTerms.Length ? text.Substring(startIndex, minLength) : String.Empty;
}
This is an interesting question.
To restate it more formally:
Given a list L (the web page) of length n and a set S (the query) of size k, find the smallest sublist of L that contains all the elements of S.
I'll start with a brute-force solution in hopes of inspiring others to beat it.
Note that set membership can be done in constant time, after one pass through the set. See this question.
Also note that this assumes all the elements of S are in fact in L, otherwise it will just return the sublist from 1 to n.
best = (1, n)
For i from 1 to n-k+1:
    Create/reset a hash found[] mapping each element of S to False, and set counter = 0.
    For j from i to n, stopping early once counter == k:
        If L[j] is in S and not found[L[j]], then counter++ and let found[L[j]] = True.
    If counter == k and j-i < best[2]-best[1], then let best = (i, j).
Time complexity is O((n+k)(n-k)). Ie, n^2-ish.
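The brute force above can be turned into runnable Java roughly like this (the helper name `smallestWindow` is mine; it returns the inclusive window bounds, or null when some keyword never appears):

```java
import java.util.HashSet;
import java.util.Set;

public class BruteForceWindow {
    // Returns {start, end} (inclusive) of the smallest window of `doc`
    // containing every word in `query`, or null if there is none.
    static int[] smallestWindow(String[] doc, Set<String> query) {
        int[] best = null;
        for (int i = 0; i < doc.length; i++) {
            Set<String> missing = new HashSet<>(query);
            for (int j = i; j < doc.length; j++) {
                missing.remove(doc[j]);
                if (missing.isEmpty()) {          // window [i, j] covers the query
                    if (best == null || j - i < best[1] - best[0]) {
                        best = new int[]{i, j};
                    }
                    break;                        // extending j only makes it longer
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] doc = {"B", "A", "A", "C", "B", "A"};
        int[] best = smallestWindow(doc, Set.of("A", "B", "C"));
        System.out.println(best[0] + ".." + best[1]);
        // prints 2..4  (the window A C B)
    }
}
```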
Here's a solution using Java 8.
static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
    Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
    HashSet<String> words = new HashSet<>();
    query.stream()
        .forEach(words::add);
    AtomicInteger idx = new AtomicInteger();
    IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
    AtomicInteger size = new AtomicInteger();
    document.stream()
        .map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
        .filter(pair -> words.contains(pair.word)) // Queue.contains is O(n) so we trade space for efficiency
        .forEach(pair -> {
            // only the first and last elements are useful to the algorithm, so we don't bother removing
            // an element from any other index. note that removing an element using equality
            // from an ArrayDeque is O(n)
            KeywordIndexPair first = queue.peek();
            if (pair.equals(first)) {
                queue.remove();
            }
            queue.add(pair);
            first = queue.peek();
            int diff = pair.index - first.index;
            if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
                interval.begin = first.index;
                interval.end = pair.index;
                size.set(0);
            }
        });
    return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are two static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from the names. In a language that supports tuples, those classes wouldn't be necessary.
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
For all the words, maintain the min and max index in case there is more than one entry; if not, the min and max index will be the same.
import edu.princeton.cs.algs4.ST;

public class DicMN {
    ST<String, Words> st = new ST<>();

    public class Words {
        int min;
        int max;

        public Words(int index) {
            min = index;
            max = index;
        }
    }

    public int findMinInterval(String[] sw) {
        int begin = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (int i = 0; i < sw.length; i++) {
            if (st.contains(sw[i])) {
                Words w = st.get(sw[i]);
                begin = Math.min(begin, w.min);
                end = Math.max(end, w.max);
            }
        }
        if (begin != Integer.MAX_VALUE) {
            return (end - begin) + 1;
        }
        return 0;
    }

    public void put(String[] dw) {
        for (int i = 0; i < dw.length; i++) {
            if (!st.contains(dw[i])) {
                st.put(dw[i], new Words(i));
            } else {
                Words w = st.get(dw[i]);
                w.min = Math.min(w.min, i);
                w.max = Math.max(w.max, i);
            }
        }
    }

    public static void main(String[] args) {
        DicMN dic = new DicMN();
        String[] arr1 = { "one", "two", "three", "four", "five", "six", "seven", "eight" };
        dic.put(arr1);
        String[] arr2 = { "two", "five" };
        System.out.print("Interval: " + dic.findMinInterval(arr2));
    }
}
