Algorithm to find string which is longest and contains lowest possible numbers? - algorithm

I have to find string out of these which is longest and contains the lowest possible numbers.
10,20
10,20,30
10,30,40
30,40,50
Ans 10,20,30
I am trying to think of algorithm which can be used to find such record.
Record will not contain duplicate number for example e.g 10,10,20 won't be case.

Based on clarification of requirements:
split strings by delimiters and convert to lists of (unique) numbers
determine the max # of elements, M, in any list
get the set of lists with M elements
for each list, find the min element
pick the list with the lowest min

There is a general solution for problems where you need to find a string out of a collection of strings.
In case your data can be sorted then use a PriorityQueue.
From the docs:
An unbounded priority queue based on a priority heap. The elements of
the priority queue are ordered according to their natural ordering, or
by a Comparator
PriorityQueue<String> a = new PriorityQueue<>((o1, o2) -> {
int compare = Integer.compare(o1.length(), o2.length());
if(compare != 0){
return compare;
}
int result = compare numbers...
return result;
});
a.add("10,20");
a.add("10,20,30");
a.add("10,30,40");
a.add("30,40,50");
System.out.println(a.remove()); // 10,20,30
In case it's not, use any List and do the filtering one by one.

Related

Grouping numbers in a list

I came across the following question,
You are given an array A of n elements. These elements are now added to a new list L which is initially empty , in a certain order based on the given q queries.
In each query you are given an integer i that corresponds to A[i] in the array A. This means that you have to add the element A[i] to the list L.
After each element is added to the list L, make groups among the elements in the list L. Two elements will be in same group if their indexes in the array A are consecutive.
For each group we define the group’s value as axb where a is the largest value in that group and b is the size of that group.
Print the maximum group value among all the groups that are formed after each element is added to the list L.
My approach was to use a map<int,vector<int>> where key is the group number and value is a vector containing group size, max. of group. I also had an array g and g[i] indicated group number of a[i], -1 if it is not in any group. The code below is a part of my implementation, but I'm sure there are better ways to solve this question as this solution of mine gave TLE and WA in some cases,and I can't seem to figure out the correct approach. Pls suggest optimal way to solve this.
int g[a.size()+2]; //+2 because queries start with index 1, and g[i] corresponds to a[i-1]
for(int i=0;i<a.size()+2;i++)
g[i]=-1;
int gno=1;
map<int,vector<int> > m;
vector<int> ans;
int mx=0;
for(unsigned int i=0;i<queries.size();i++){
int q = queries[i];
if(g[q-1]==-1 && g[q+1]==-1){
//create new group with current eleent as first element
g[q] = gno; //gno is the group number.
vector<int> v;
v.push_back(1);
v.push_back(a[q-1]);
m[gno]=v;
mx = max(mx,m[gno][0]*m[gno][1]);
gno++;
}
else if(g[q-1]!=-1 && g[q+1]==-1){
//join current element to left group
g[q] = g[q-1];
m[g[q]][0]++;
m[g[q]][1] = max(m[g[q]][1],a[q-1]);
mx = max(mx,m[g[q]][0]*m[g[q]][1]);
}
else if(g[q-1]==-1 && g[q+1]!=-1){
//join current element to right group
g[q] = g[q+1];
m[g[q]][0]++;
m[g[q]][1] = max(m[g[q]][1],a[q-1]);
mx = max(mx,m[g[q]][0]*m[g[q]][1]);
}
else{
//join both groups to left and right
g[q]=g[q-1];
int g1 = g[q];
int i;
m[g[q]][0] += 1 + m[g[q+1]][0];
m[g[q]][1] = max(m[g[q]][1],max(a[q-1],m[g[q+1]][1]));
for(i=q+1;g[i]==g[i+1];i++){
g[i]=g1;
}
g[i]=g1;
mx = max(mx,m[g[q]][0]*m[g[q]][1]);
}
ans.push_back(mx);
}
.
I would not actually build list L. It may be too costly in time to find what to do with a new value: is it a new group on itself, does it extend an existing group, do two groups need to merge into one? If the first values are all far apart, you'll have many groups, and you need to iterate them with each new incoming value: this is not efficient.
I would just collect all the values first and only then see how they fit in groups.
There are two ways to collect the values:
Store them in a list, and when all values have been collected, sort the list in ascending order
Flag the entry in an array of booleans of size n. This way you do not have to sort it, but afterwards you do need to iterate the whole array to find the values in ascending order.
Method 1 will be the best when q is a lot less than n. Method 2 will be better for greater q.
With both methods you'll be able to iterate over the found values in ascending order, and while doing so you can identify the groups, their value, and also keep track of the largest group-value. Only one sweep is needed to find the answer.
Let's start with two simplifying assumptions:
no duplicates. Once a given index i has been "queried", it will never be queried again.
no negative numbers. All elements are positive or zero, so the largest value in a group is always positive or zero, so expanding a group (or merging two groups) will never cause the overall "maximum group value" to decrease.
(Further below I'll show how to not require those assumptions, but for now this will simplify the picture.)
So, whenever we "query" an index i, there are four cases:
i-1 is currently the right-endpoint of a group (by which I mean its greatest index) and i+1 is currently the left-endpoint of another group.
In this case, we need to merge the two groups into a single group, with i bridging the gap between them.
i-1 is currently the right-endpoint of a group, but i+1 is not currently in any group.
In this case we need to extend the group to cover i.
i-1 is not currently in any group, but i+1 is currently the left-endpoint of a group.
In this case, as in the previous case, we need to extend the group to cover i.
Neither i-1 nor i+1 is in a group.
In this case, we have a new group with just one element.
In all cases, the key thing to note is that we're only interested in the endpoints of groups. So we don't need a general mapping from indices to their groups . . . which is good, because when we merge two groups, it would be expensive to then go and update every single index from one group to point to the other.
So we just need three mappings:
std::unordered_map<int, int> map_from_left_endpoint_to_right_endpoint;
std::unordered_map<int, int> map_from_right_endpoint_to_left_endpoint;
std::unordered_map<int, int> map_from_left_endpoint_to_largest_value;
To distinguish the four cases, we use e.g. map_from_right_endpoint_to_left_endpoint.find(i - 1) (which returns an iterator pointing to the left-endpoint of the group that i-1 is the right-endpoint of, if applicable; otherwise it returns map_from_right_endpoint_to_left_endpoint.end()). We then delete entries as they become no-longer-applicable (due to groups being extended or merged in a given direction), in addition to (obviously) inserting new entries, and updating the values of existing entries.
In addition to those values, we also need an
int maximum_group_value = 0;
and whenever we extend a group or merge two groups, we check whether the value of the resulting group (meaning its largest_value * (right_endpoint - left_endpoint + 1) is greater than maximum_group_value. If so, we update maximum_group_value and return it; if not, we return maximum_group_value as-is.
Now, what if duplicates are allowed, such that a given index i might be "queried" after it already belongs to a group?
The simplest approach is to simply keep track of which i-s have already been queried; but a more elegant approach, if desired, might be to change map_from_left_endpoint_to_right_endpoint from a std::unordered_map to a std::map, and then use something like this:
bool is_already_in_a_group(
std::map<int, int> const & map_from_left_endpoint_to_right_endpoint,
int const i) {
// get iterator to first element *after* index (or to 'end()' if no such):
auto iter = map_from_left_endpoint_to_right_endpoint.upper_bound(index);
// if that pointer points to 'begin()', then there are no elements
// at or before index:
if (iter == map_from_left_endpoint_to_right_endpoint.begin()) {
return false;
}
// otherwise, move iterator to point to the last element whose key is
// less than or equal to index:
--iter;
// . . . and check whether the value of that element is greater than
// or equal to index (meaning that [key, value] spans index):
return iter->second >= index;
}
to check if the greatest key in map_from_left_endpoint_to_right_endpoint that is less than or equal to i is mapped to a value that is greater than or equal to i.
This adds a fifth case to our case analysis above — "if i is already inside a group, just do nothing and return maximum_group_value" — but other than that, has no effect.
Note that this same approach also lets us eliminate map_from_right_endpoint_to_left_endpoint, if we want: the above function could easily be tweaked to int get_left_endpoint_for_right_endpoint by changing its return statement to return iter->second == index ? iter->first : -1;.
At this point it becomes sensible to define a Group class with three fields (left_endpoint, right_endpoint, and largest_value), and just keep a single map_from_left_endpoint_to_group.
Lastly — what if negative values are allowed, such that the "maximum group value" can actually decrease as the result of a query? (For example, if the array elements are [-1, -10] and the queries are i=0, i=1, then the results are maximum_group_value=-1, maximum_group_value=-2.) In such a case, we need to keep track of the values of all current groups, because any one of them might suddenly become the maximum.
For that, instead of storing a single int maximum_group_value, we can maintain a heap of groups, ordered by value, that we push into every time we create/extend/merge groups. (We can just use a std::vector<Group> for this, plus std::push_heap with an appropriate comparator, or with an appropriate definition for operator<(Group const &, Group const &).) After each query, we check if the top group on the heap (the first element in the vector) is still a group that actually exists; if so, we return its value, otherwise we pop it (using std::pop_heap) and repeat.
As an optimization, we can also store int maximum_group_value, and eliminate the heap once we've encountered a nonnegative array-element (since as soon as a given group contains a nonnegative array-element, its value can never decrease again, and obviously the maximum group value will be the value of one of those groups).

Valid Permutations of a String

This question was asked to me in a recent amazon technical interview. It goes as follows:-
Given a string ex: "where am i" and a dictionary of valid words, you have to list all valid distinct permutations of the string. A valid string comprises of words which exists in the dictionary. For ex: "we are him","whim aree" are valid strings considering the words(whim, aree) are part of the dictionary. Also the condition is that a mere rearrangement of words is not a valid string, i.e "i am where" is not a valid combination.
The task is to find all possible such strings in the optimum way.
As you have said, space doesn't count, so input can be just viewed as a list of chars. The output is the permutation of words, so an obvious way to do it is find all valid words then permutate them.
Now problem becomes to divide a list of chars into subsets which each forms a word, which you can find some answers here and following is my version to solve this sub-problem.
If the dictionary is not large, we can iterate dictionary to
find min_len/max_len of words, to estimate how many words we may have, i.e. how deep we recur
convert word into map to accelerate search;
filter the words which have impossible char (i.e. the char our input doesn't have) out;
if this word is subset of our input, we can find word recursively.
The following is pseudocode:
int maxDepth = input.length / min_len;
void findWord(List<Map<Character, Integer>> filteredDict, Map<Character, Integer> input, List<String> subsets, int level) {
if (level < maxDepth) {
for (Map<Character, Integer> word : filteredDict) {
if (subset(input, word)) {
subsets.add(word);
findWord(filteredDict, removeSubset(input, word), subsets, level + 1);
}
}
}
}
And then you can permutate words in a recursive functions easily.
Technically speaking, this solution can be O(n**d) -- where n is dictionary size and d is max depth. But if the input is not large and complex, we can still solve it in feasible time.

Binary search with gaps

Let's imagine two arrays like this:
[8,2,3,4,9,5,7]
[0,1,1,0,0,1,1]
How can I perform a binary search only in numbers with an 1 below it, ignoring the rest?
I know this can be in O(log n) comparisons, but my current method is slower because it has to go through all the 0s until it hits an 1.
If you hit a number with a 0 below, you need to scan in both directions for a number with a 1 below until you find it -- or the local search space is exhausted. As the scan for a 1 is linear, the ratio of 0s to 1s determines whether the resulting algorithm can still be faster than linear.
This question is very old, but I've just discovered a wonderful little trick to solve this problem in most cases where it comes up. I'm writing this answer so that I can refer to it elsewhere:
Fast Append, Delete, and Binary Search in a Sorted Array
The need to dynamically insert or delete items from a sorted collection, while preserving the ability to search, typically forces us to switch from a simple array representation using binary search to some kind of search tree -- a far more complicated data structure.
If you only need to insert at the end, however (i.e., you always insert a largest or smallest item), or you don't need to insert at all, then it's possible to use a much simpler data structure. It consists of:
A dynamic (resizable) array of items, the item array; and
A dynamic array of integers, the set array. The set array is used as a disjoint set data structure, using the single-array representation described here: How to properly implement disjoint set data structure for finding spanning forests in Python?
The two arrays are always the same size. As long as there have been no deletions, the item array just contains the items in sorted order, and the set array is full of singleton sets corresponding to those items.
If items have been deleted, though, items in the item array are only valid if the there is a root set at the corresponding position in the set array. All sets that have been merged into a single root will be contiguous in the set array.
This data structure supports the required operations as follows:
Append (O(1))
To append a new largest item, just append the item to the item array, and append a new singleton set to the set array.
Delete (amortized effectively O(log N))
To delete a valid item, first call search to find the adjacent larger valid item. If there is no larger valid item, then just truncate both arrays to remove the item and all adjacent deleted items. Since merged sets are contiguous in the set array, this will leave both arrays in a consistent state.
Otherwise, merge the sets for the deleted item and adjacent item in the set array. If the deleted item's set is chosen as the new root, then move the adjacent item into the deleted item's position in the item array. Whichever position isn't chosen will be unused from now on, and can be nulled-out to release a reference if necessary.
If less than half of the item array is valid after a delete, then deleted items should be removed from the item array and the set array should be reset to an all-singleton state.
Search (amortized effectively O(log N))
Binary search proceeds normally, except that we need to find the representative item for every test position:
int find(item_array, set_array, itemToFind) {
int pos = 0;
int limit = item_array.length;
while (pos < limit) {
int testPos = pos + floor((limit-pos)/2);
if (item_array[find_set(set_array, testPos)] < itemToFind) {
pos = testPos + 1; //testPos is too low
} else {
limit = testPos; //testPos is not too low
}
}
if (pos >= item_array.length) {
return -1; //not found
}
pos = find_set(set_array, pos);
return (item_array[pos] == itemToFind) ? pos : -1;
}

How to find the number of occurences of a number in a sorted Array in O(logn)

I have a sorted Array of Integers. Given a number, how to find the number of occurences of that number in O(logn) even the worst case.
Do one binary search for the point where all elements to the left are smaller than your number and all to the right are equal or greater and one for the point where all elements are smaller or equal to the left and greater to the right.
In other words: in one search your "test" is <, while in the other search the test is <=.
Both searches are log n, so that's your total.
For example, in C++ the two functions you'd need are std::lower_bound and std::upper_bound - it seems like the existing Java binary search functions (on Collections) always try to look for a specific value, so you probably have to implement the search yourself if you're using that.
It's important that you use a binary search variant that uses only a binary predicate! If you use a variant that checks whether it hit a specific key "by accident" will sometimes terminate too soon for this specific task.
Binary search for the number, then binary search for the start and the end of the run.
Search for the insertion points of number + 0.5 and number - 0.5 using two binary searches. The number of elements with value number in the list is the difference of the positions (indices) of these two insertion points.
Run the function below once as it is and once with the if condition changed to if(a[mid]<=val) and the corresponding changes in the recursive calls. The two values returned are the starting and ending occurence of the particular value.
int binmin(int a[], int start, int end,int val )
{
if(start<end)
{
int mid = (start+end)/2;
if(a[mid]>=val)
{
binmin(a,start,mid-1,val);
}
else if(a[mid]<val)
{
binmin(a,mid+1,end,val);
}
}
else if(start>end)
return start;
}

Efficient algorithm to remove any map that is contained in another map from a collection of maps

I have set (s) of unique maps (Java HashMaps currently) and wish to remove from it any maps that are completely contained by some other map in the set (i.e. remove m from s if m.entrySet() is a subset of n.entrySet() for some other n in s.)
I have an n^2 algorithm, but it's too slow. Is there a more efficient way to do this?
Edit:
the set of possible keys is small, if that helps.
Here is an inefficient reference implementation:
public void removeSubmaps(Set<Map> s) {
Set<Map> toRemove = new HashSet<Map>();
for (Map a: s) {
for (Map b : s) {
if (a.entrySet().containsAll(b.entrySet()))
toRemove.add(b);
}
}
s.removeAll(toRemove);
}
Not sure I can make this anything other than an n^2 algorithm, but I have a shortcut that might make it faster. Make a list of your maps with the length of the each map and sort it. A proper subset of a map must be shorter or equal to the map you're comparing - there's never any need to compare to a map higher on the list.
Here's another stab at it.
Decompose all your maps into a list of key,value,map number. Sort the list by key and value. Go through the list, and for each group of key/value matches, create a permutation of all the map number pairs - these are all potential subsets. When you have the final list of pairs, sort by map numbers. Go through this second list, and count the number of occurrences of each pair - if the number matches the size of one of the maps, you've found a subset.
Edit: My original interpretation of the problem was incorrect, here is new answer based on my re-read of the question.
You can create a custom hash function for HashMap which returns the product of all hash value of its entries. Sort the list of hash value and start loop from biggest value and find all divisor from smaller hash values, these are possible subsets of this hashmap, use set.containsAll() to confirm before marking them for removal.
This effectively transforms the problem into a mathematical problem of finding possible divisor from a collection. And you can apply all the common divisor-search optimizations.
Complexity is O(n^2), but if many hashmaps are subsets of others, the actual time spent can be a lot better, approaching O(n) in best-case scenario (if all hashmaps are subset of one). But even in worst case scenario, division calculation would be a lot faster than set.containsAll() which itself is O(n^2) where n is number of items in a hashmap.
You might also want to create a simple hash function for hashmap entry objects to return smaller numbers to increase multiply/division performance.
Here's a subquadratic (O(N**2 / log N)) algorithm for finding maximal sets from a set of sets: An Old Sub-Quadratic Algorithm for Finding Extremal Sets.
But if you know your data distribution, you can do much better in average case.
This what I ended up doing. It works well in my situation as there is usually some value that is only shared by a small number of maps. Kudos to Mark Ransom for pushing me in this direction.
In prose: Index the maps by key/value pair, so that each key/value pair is associated with a set of maps. Then, for each map: Find the smallest set associated with one of it's key/value pairs; this set is typically small for my data. Each of the maps in this set is a potential 'supermap'; no other map could be a 'supermap' as it would not contain this key/value pair. Search this set for a supermap. Finally remove all the identified submaps from the original set.
private <K, V> void removeSubmaps(Set<Map<K, V>> maps) {
// index the maps by key/value
List<Map<K, V>> mapList = toList(maps);
Map<K, Map<V, List<Integer>>> values = LazyMap.create(HashMap.class, ArrayList.class);
for (int i = 0, uniqueRowsSize = mapList.size(); i < uniqueRowsSize; i++) {
Map<K, V> row = mapList.get(i);
Integer idx = i;
for (Map.Entry<K, V> entry : row.entrySet())
values.get(entry.getKey()).get(entry.getValue()).add(idx);
}
// find submaps
Set<Map<K, V>> toRemove = Sets.newHashSet();
for (Map<K, V> submap : mapList) {
// find the smallest set of maps with a matching key/value
List<Integer> smallestList = null;
for (Map.Entry<K, V> entry : submap.entrySet()) {
List<Integer> list = values.get(entry.getKey()).get(entry.getValue());
if (smallestList == null || list.size() < smallestList.size())
smallestList = list;
}
// compare with each of the maps in that set
for (int i : smallestList) {
Map<K, V> map = mapList.get(i);
if (isSubmap(submap, map))
toRemove.add(submap);
}
}
maps.removeAll(toRemove);
}
private <K,V> boolean isSubmap(Map<K, V> submap, Map<K,V> map){
if (submap.size() >= map.size())
return false;
for (Map.Entry<K,V> entry : submap.entrySet()) {
V other = map.get(entry.getKey());
if (other == null)
return false;
if (!other.equals(entry.getValue()))
return false;
}
return true;
}

Resources