Algorithm - managing order of an array

I'm looking for a solution for this problem.
I have an array which defines the rule of the order of elements like below.
let rule = [A,B,C,D,E,F,G,H,I,J,K]
I then have another array whose element can be removed or added back.
So for example, I have a list like this:
var list = [A,D,E,I,J,K]
Now if I want to add element 'B' to 'list', the list should be
var list = [A,B,D,E,I,J,K]
because 'B' comes after 'A' and before 'D' in the rule array. So the insertion index would be 1 in this case.
The items in the array are not comparable with each other (let's say a developer can change the order of the rule list at any time, if that makes sense), and the list must not contain duplicates.
I'm not sure if I explained the problem clearly, but I'd like to know a good approach for finding the insertion index.

The Python code below is explained in its comments. Basically, it finds the right place to insert the new element using binary search, where the order of elements is decided by their rank (their index in the rule). The code assumes that if elements is non-empty, its items already follow the rule.
rule = ['A','B','C','D','E','F','G','H','I','J','K']
rank = dict()
for i in range(len(rule)):
    rank[rule[i]] = i
elements = ['A','D','E','I','J','K'] #list in which we wish to add elements
target = 'B' #element to be inserted
#Binary search to find the right place to insert the target in elements
left, right = 0, len(elements)
while left < right:
    mid = left + (right - left) // 2
    if rank[elements[mid]] >= rank[target]:
        right = mid
    else:
        left = mid + 1
elements.insert(left, target) #left is the insertion index
print(elements)
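Time complexity of finding the insertion index: O(log(len(elements))). Note that list.insert itself still shifts the following elements, which is O(len(elements)) in the worst case.
Space complexity: O(len(rule)) for the rank dictionary, O(1) beyond that.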

If the items are unique (each can occur only once) and are not comparable to each other (you can't tell from the items themselves that B comes after A), then:
1. Iterate through the rules and find the item's position in the rule array.
2. Check if it is the first item in rules; if so, insert at the first position and skip the other steps.
3. Check if it is the last item in rules; if so, insert at the end and skip the other steps.
4. Select the value of the item 1 before into a variable A.
5. Select the value of the item 1 after into a variable B.
6. Iterate through the list: if you encounter the value A, insert the item after it; if you encounter the value B, insert the item before it.
7. If you come to the end without finding either A or B, repeat with the values 2 before and 2 after the item in the rules (again checking whether you hit the start or end of the rules list).
You will probably want to make steps 6 & 7 a function that calls itself recursively.
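A minimal Python sketch of the idea in the steps above, under a simplifying assumption: instead of alternating between the items before and after, it only walks backwards through the rule until it reaches an item that is already in the list. The function name insert_by_neighbour is made up for illustration.

```python
def insert_by_neighbour(rule, items, new_item):
    """Insert new_item into items so that items keeps the order given by rule."""
    if new_item in items:
        # The list must not contain duplicates, so there is nothing to do.
        return items
    pos = rule.index(new_item)  # raises ValueError if the rule does not know the item
    # Walk backwards through the rule, looking for the nearest predecessor
    # that is already present in items.
    for k in range(pos - 1, -1, -1):
        if rule[k] in items:
            items.insert(items.index(rule[k]) + 1, new_item)
            return items
    # No predecessor is present, so the new item belongs at the front.
    items.insert(0, new_item)
    return items


rule = list("ABCDEFGHIJK")
print(insert_by_neighbour(rule, list("ADEIJK"), "B"))
```

Each membership test scans the list, so this sketch favours simplicity over speed; it only relies on the rule array itself, which keeps it valid if the rule order is changed between calls.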

A simple approach is to use one iteration of insertion sort.
Start from the right end of the list and compare the input x with the list elements, moving from right to left. When you reach an index i such that list[i] <= x (in rule order), then i+1 is the correct position at which to insert x.
This approach has time complexity O(n); its correctness follows from the correctness of insertion sort.
Note that the lower bound for your problem is O(n) anyway, because your data structure is an array, so each insertion may need to shift all the following elements.
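As a rough illustration of that single insertion-sort pass, here is a short Python sketch. It assumes the comparison is done through each item's index in the rule array (the rank dictionary and the function name insert_from_right are assumptions, not part of the answer above).

```python
def insert_from_right(rule, items, x):
    """One insertion-sort pass: scan items from the right to find where x belongs."""
    rank = {value: i for i, value in enumerate(rule)}  # position of each item in the rule
    i = len(items) - 1
    # Skip every element that must come after x according to the rule.
    while i >= 0 and rank[items[i]] > rank[x]:
        i -= 1
    if i >= 0 and items[i] == x:
        return items  # x is already present; keep the list free of duplicates
    items.insert(i + 1, x)
    return items


rule = list("ABCDEFGHIJK")
print(insert_from_right(rule, list("ADEIJK"), "B"))
```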

Related

Why does the Find-Minimum operation of a priority queue implemented with an unsorted array take only O(1)? (Steven Skiena's The Algorithm Design Manual)

In Steven Skiena's The Algorithm Design Manual (page 85),
the author shows in a table that a priority queue implemented with an unsorted array takes only O(1) for both the insertion and find-minimum operations.
As I understand it, an unsorted array can't return the minimum item in O(1), because it has to search through the whole array to find the minimum.
Are there any details about the priority queue that I missed?
It's (mostly) written there under the table:
The trick is using an extra variable to store a pointer/index to the minimum ...
Presumably, the next word is "value", meaning it's a simple O(1) dereference to get the minimum.
When inserting an item, you just append it to the end and, if it's less than the current minimum, update that pointer/index. That means O(1) for the insert.
The only "expensive" operation is then delete-minimum. You know where it is due to the pointer/index but it will take O(n) operations to shuffle the array elements beyond it down one.
And, since the cost is already O(n), you may as well take the opportunity to search the array for the new minimum and store its position in the pointer/index.
The pseudo-code for those operations would be something along the lines of the following (first up, initialisation and insertion, assuming zero-based indexes):
class prioQ:
    array = [] # Empty queue.
    lowIndex = 0 # Index of lowest value (for non-empty queue).
    def insert(item):
        # Add to end, quick calc if array empty beforehand.
        array.append(item)
        if len(array) == 1:
            lowIndex = 0
            return
        # Adjust low-index only if inserted value smaller than current.
        if array[lowIndex] > item:
            lowIndex = len(array) - 1
Then a function to find the actual minimum value:
    def findMin():
        # Empty array means no minimum. Otherwise, return minimum.
        if len(array) == 0: return None
        return array[lowIndex]
And, finally, to extract the minimum value (remove it from the queue and return it):
    def extractMin():
        # Empty array means no minimum. Otherwise save lowest value.
        if len(array) == 0: return None
        retVal = array[lowIndex]
        # Shuffle down all following elements to delete lowest one
        for index = lowIndex to len(array) - 2 inclusive:
            array[index] = array[index + 1]
        # Remove final element (it's already been shuffled).
        delete array[len(array) - 1]
        # Find lowest element and store.
        if len(array) > 0:
            lowIndex = len(array) - 1
            for index = len(array) - 2 to 0 inclusive:
                if array[index] <= array[lowIndex]:
                    lowIndex = index
        # Return saved value.
        return retVal
As an aside, the two loops in the extractMin function could be combined into one for efficiency. I've left them as two separate loops for readability.
One thing you should keep in mind, there are actually variations of the priority queue that preserve insertion order (within a priority level) and variations that do not care about that order.
For the latter case, you don't have to shuffle all the elements to remove an extracted one, you can simply move the last one in the array over the extracted one. This may result in some time savings if you don't actually need to preserve insertion order - you still have to scan the entire array looking for the new highest-priority item but at least the number of shuffle assignments will be reduced.
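For reference, here is a runnable Python sketch of the variation that does not preserve insertion order: the extracted slot is overwritten with the last element instead of shuffling everything down. The class and method names are illustrative only, and it assumes a smaller value means higher priority.

```python
class UnsortedArrayPQ:
    """Min-priority queue on an unsorted list, tracking the index of the minimum."""

    def __init__(self):
        self.array = []
        self.low_index = 0  # index of the lowest value (meaningful when non-empty)

    def insert(self, item):
        # O(1): append, then update the tracked index only if item is a new minimum.
        self.array.append(item)
        if len(self.array) == 1 or item < self.array[self.low_index]:
            self.low_index = len(self.array) - 1

    def find_min(self):
        # O(1): a single lookup through the tracked index.
        return self.array[self.low_index] if self.array else None

    def extract_min(self):
        # O(n): remove the minimum, then rescan for the new one.
        if not self.array:
            return None
        result = self.array[self.low_index]
        self.array[self.low_index] = self.array[-1]  # move last element into the gap
        self.array.pop()
        self.low_index = 0
        for i in range(1, len(self.array)):
            if self.array[i] < self.array[self.low_index]:
                self.low_index = i
        return result


q = UnsortedArrayPQ()
for value in [5, 3, 8, 1]:
    q.insert(value)
print(q.find_min())
```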
@paxdiablo's answer gives the scheme referred to in the book. Another way to achieve the same complexity is to always store the minimum at the first index in the array:
To insert x in O(1) time, either insert it at the end (if it is bigger than the current minimum), or copy the current minimum to the end and then store x at index 0.
To query the minimum in O(1) time, return the value at index 0.
To delete the minimum in O(n) time, search for the new minimum from index 1 onwards, write it at index 0, then "fill in the gap" by swapping the element at the last index to where the new minimum used to be.
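The three operations just described could look roughly like this in Python. This is a sketch under the assumption that the queue holds mutually comparable values; the class name MinFirstPQ is invented for the example.

```python
class MinFirstPQ:
    """Unsorted-array priority queue that always keeps the minimum at index 0."""

    def __init__(self):
        self.array = []

    def insert(self, x):
        # O(1): a new minimum takes index 0 and the old minimum moves to the end.
        if self.array and x < self.array[0]:
            self.array.append(self.array[0])
            self.array[0] = x
        else:
            self.array.append(x)

    def find_min(self):
        # O(1): the minimum is always at index 0.
        return self.array[0] if self.array else None

    def delete_min(self):
        # O(n): find the new minimum among indexes 1.., move it to index 0,
        # then fill the gap it leaves with the last element.
        if not self.array:
            return None
        result = self.array[0]
        if len(self.array) == 1:
            self.array.pop()
            return result
        j = min(range(1, len(self.array)), key=self.array.__getitem__)
        self.array[0] = self.array[j]
        self.array[j] = self.array[-1]
        self.array.pop()
        return result
```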

Grouping numbers in a list

I came across the following question,
You are given an array A of n elements. These elements are now added to a new list L which is initially empty , in a certain order based on the given q queries.
In each query you are given an integer i that corresponds to A[i] in the array A. This means that you have to add the element A[i] to the list L.
After each element is added to the list L, make groups among the elements in the list L. Two elements will be in same group if their indexes in the array A are consecutive.
For each group we define the group’s value as axb where a is the largest value in that group and b is the size of that group.
Print the maximum group value among all the groups that are formed after each element is added to the list L.
My approach was to use a map<int,vector<int>> where the key is the group number and the value is a vector containing the group size and the maximum of the group. I also had an array g, where g[i] indicates the group number of a[i], or -1 if it is not in any group. The code below is part of my implementation, but I'm sure there are better ways to solve this question, as my solution gave TLE and WA in some cases, and I can't figure out the correct approach. Please suggest an optimal way to solve this.
int g[a.size()+2]; //+2 because queries start with index 1, and g[i] corresponds to a[i-1]
for(int i=0;i<a.size()+2;i++)
    g[i]=-1;
int gno=1;
map<int,vector<int> > m;
vector<int> ans;
int mx=0;
for(unsigned int i=0;i<queries.size();i++){
    int q = queries[i];
    if(g[q-1]==-1 && g[q+1]==-1){
        //create new group with current element as first element
        g[q] = gno; //gno is the group number.
        vector<int> v;
        v.push_back(1);
        v.push_back(a[q-1]);
        m[gno]=v;
        mx = max(mx,m[gno][0]*m[gno][1]);
        gno++;
    }
    else if(g[q-1]!=-1 && g[q+1]==-1){
        //join current element to left group
        g[q] = g[q-1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else if(g[q-1]==-1 && g[q+1]!=-1){
        //join current element to right group
        g[q] = g[q+1];
        m[g[q]][0]++;
        m[g[q]][1] = max(m[g[q]][1],a[q-1]);
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    else{
        //join both groups to left and right
        g[q]=g[q-1];
        int g1 = g[q];
        int i;
        m[g[q]][0] += 1 + m[g[q+1]][0];
        m[g[q]][1] = max(m[g[q]][1],max(a[q-1],m[g[q+1]][1]));
        for(i=q+1;g[i]==g[i+1];i++){
            g[i]=g1;
        }
        g[i]=g1;
        mx = max(mx,m[g[q]][0]*m[g[q]][1]);
    }
    ans.push_back(mx);
}
I would not actually build list L. It may be too costly in time to find what to do with a new value: is it a new group on itself, does it extend an existing group, do two groups need to merge into one? If the first values are all far apart, you'll have many groups, and you need to iterate them with each new incoming value: this is not efficient.
I would just collect all the values first and only then see how they fit in groups.
There are two ways to collect the values:
1. Store them in a list, and when all values have been collected, sort the list in ascending order.
2. Flag the entry in an array of booleans of size n. This way you do not have to sort it, but afterwards you do need to iterate the whole array to find the values in ascending order.
Method 1 will be the best when q is a lot less than n. Method 2 will be better for greater q.
With both methods you'll be able to iterate over the found values in ascending order, and while doing so you can identify the groups, their value, and also keep track of the largest group-value. Only one sweep is needed to find the answer.
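A small Python sketch of method 2 (the boolean flag array) followed by the single sweep. It assumes zero-based query indexes and, as described above, only reports the maximum group value once all queries have been collected; the function name is hypothetical.

```python
def max_group_value_after_all_queries(a, queries):
    """Flag every queried index, then sweep once to find the best group value."""
    present = [False] * len(a)
    for i in queries:
        present[i] = True

    best = None      # no group seen yet
    size = 0         # size of the group currently being swept
    largest = None   # largest value in that group
    # Sweep one position past the end so the final group is also closed.
    for i in range(len(a) + 1):
        if i < len(a) and present[i]:
            largest = a[i] if size == 0 else max(largest, a[i])
            size += 1
        elif size > 0:
            value = largest * size
            best = value if best is None else max(best, value)
            size = 0
    return best


print(max_group_value_after_all_queries([3, 1, 4, 1, 5], [0, 1, 3]))
```

If an answer is needed after every single query, this sweep would have to be repeated per query, which costs O(n) each time.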
Let's start with two simplifying assumptions:
1. No duplicates. Once a given index i has been "queried", it will never be queried again.
2. No negative numbers. All elements are positive or zero, so the largest value in a group is always positive or zero, so expanding a group (or merging two groups) will never cause the overall "maximum group value" to decrease.
(Further below I'll show how to not require those assumptions, but for now this will simplify the picture.)
So, whenever we "query" an index i, there are four cases:
1. i-1 is currently the right-endpoint of a group (by which I mean its greatest index) and i+1 is currently the left-endpoint of another group.
   In this case, we need to merge the two groups into a single group, with i bridging the gap between them.
2. i-1 is currently the right-endpoint of a group, but i+1 is not currently in any group.
   In this case we need to extend the group to cover i.
3. i-1 is not currently in any group, but i+1 is currently the left-endpoint of a group.
   In this case, as in the previous case, we need to extend the group to cover i.
4. Neither i-1 nor i+1 is in a group.
   In this case, we have a new group with just one element.
In all cases, the key thing to note is that we're only interested in the endpoints of groups. So we don't need a general mapping from indices to their groups . . . which is good, because when we merge two groups, it would be expensive to then go and update every single index from one group to point to the other.
So we just need three mappings:
std::unordered_map<int, int> map_from_left_endpoint_to_right_endpoint;
std::unordered_map<int, int> map_from_right_endpoint_to_left_endpoint;
std::unordered_map<int, int> map_from_left_endpoint_to_largest_value;
To distinguish the four cases, we use e.g. map_from_right_endpoint_to_left_endpoint.find(i - 1) (which returns an iterator pointing to the left-endpoint of the group that i-1 is the right-endpoint of, if applicable; otherwise it returns map_from_right_endpoint_to_left_endpoint.end()). We then delete entries as they become no-longer-applicable (due to groups being extended or merged in a given direction), in addition to (obviously) inserting new entries, and updating the values of existing entries.
In addition to those values, we also need an
int maximum_group_value = 0;
and whenever we create a group, extend a group, or merge two groups, we check whether the value of the resulting group (meaning its largest_value * (right_endpoint - left_endpoint + 1)) is greater than maximum_group_value. If so, we update maximum_group_value and return it; if not, we return maximum_group_value as-is.
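To make the bookkeeping concrete, here is a Python sketch of the three-mapping approach using plain dictionaries. It keeps the two simplifying assumptions (no repeated queries, no negative values) and additionally assumes zero-based query indexes; the function and variable names are illustrative.

```python
def group_values(a, queries):
    """Return the maximum group value after each query, tracking only group endpoints."""
    left_to_right = {}  # left endpoint  -> right endpoint
    right_to_left = {}  # right endpoint -> left endpoint
    left_to_max = {}    # left endpoint  -> largest value in the group
    best = 0
    answers = []
    for i in queries:
        left, right, largest = i, i, a[i]
        if i - 1 in right_to_left:
            # A group ends at i-1: absorb it and drop its stale entries.
            left = right_to_left.pop(i - 1)
            del left_to_right[left]
            largest = max(largest, left_to_max.pop(left))
        if i + 1 in left_to_right:
            # A group starts at i+1: absorb it as well.
            right = left_to_right.pop(i + 1)
            del right_to_left[right]
            largest = max(largest, left_to_max.pop(i + 1))
        left_to_right[left] = right
        right_to_left[right] = left
        left_to_max[left] = largest
        best = max(best, largest * (right - left + 1))
        answers.append(best)
    return answers


print(group_values([3, 1, 4, 1, 5], [0, 2, 1, 4, 3]))
```

Each query touches a constant number of dictionary entries, so the expected cost is O(1) per query regardless of how large the merged groups are.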
Now, what if duplicates are allowed, such that a given index i might be "queried" after it already belongs to a group?
The simplest approach is to simply keep track of which i-s have already been queried; but a more elegant approach, if desired, might be to change map_from_left_endpoint_to_right_endpoint from a std::unordered_map to a std::map, and then use something like this:
bool is_already_in_a_group(
    std::map<int, int> const & map_from_left_endpoint_to_right_endpoint,
    int const index) {
  // get iterator to first element *after* index (or to 'end()' if no such):
  auto iter = map_from_left_endpoint_to_right_endpoint.upper_bound(index);
  // if that iterator equals 'begin()', then there are no elements
  // at or before index:
  if (iter == map_from_left_endpoint_to_right_endpoint.begin()) {
    return false;
  }
  // otherwise, move iterator to point to the last element whose key is
  // less than or equal to index:
  --iter;
  // . . . and check whether the value of that element is greater than
  // or equal to index (meaning that [key, value] spans index):
  return iter->second >= index;
}
to check if the greatest key in map_from_left_endpoint_to_right_endpoint that is less than or equal to i is mapped to a value that is greater than or equal to i.
This adds a fifth case to our case analysis above — "if i is already inside a group, just do nothing and return maximum_group_value" — but other than that, has no effect.
Note that this same approach also lets us eliminate map_from_right_endpoint_to_left_endpoint, if we want: the above function could easily be tweaked to int get_left_endpoint_for_right_endpoint by changing its return statement to return iter->second == index ? iter->first : -1;.
At this point it becomes sensible to define a Group class with three fields (left_endpoint, right_endpoint, and largest_value), and just keep a single map_from_left_endpoint_to_group.
Lastly — what if negative values are allowed, such that the "maximum group value" can actually decrease as the result of a query? (For example, if the array elements are [-1, -10] and the queries are i=0, i=1, then the results are maximum_group_value=-1, maximum_group_value=-2.) In such a case, we need to keep track of the values of all current groups, because any one of them might suddenly become the maximum.
For that, instead of storing a single int maximum_group_value, we can maintain a heap of groups, ordered by value, that we push into every time we create/extend/merge groups. (We can just use a std::vector<Group> for this, plus std::push_heap with an appropriate comparator, or with an appropriate definition for operator<(Group const &, Group const &).) After each query, we check if the top group on the heap (the first element in the vector) is still a group that actually exists; if so, we return its value, otherwise we pop it (using std::pop_heap) and repeat.
As an optimization, we can also store int maximum_group_value, and eliminate the heap once we've encountered a nonnegative array-element (since as soon as a given group contains a nonnegative array-element, its value can never decrease again, and obviously the maximum group value will be the value of one of those groups).

Binary search with gaps

Let's imagine two arrays like this:
[8,2,3,4,9,5,7]
[0,1,1,0,0,1,1]
How can I perform a binary search only on the numbers with a 1 below them, ignoring the rest?
I know this can be done in O(log n) comparisons, but my current method is slower because it has to go through all the 0s until it hits a 1.
If you hit a number with a 0 below, you need to scan in both directions for a number with a 1 below until you find it -- or the local search space is exhausted. As the scan for a 1 is linear, the ratio of 0s to 1s determines whether the resulting algorithm can still be faster than linear.
This question is very old, but I've just discovered a wonderful little trick to solve this problem in most cases where it comes up. I'm writing this answer so that I can refer to it elsewhere:
Fast Append, Delete, and Binary Search in a Sorted Array
The need to dynamically insert or delete items from a sorted collection, while preserving the ability to search, typically forces us to switch from a simple array representation using binary search to some kind of search tree -- a far more complicated data structure.
If you only need to insert at the end, however (i.e., you always insert a largest or smallest item), or you don't need to insert at all, then it's possible to use a much simpler data structure. It consists of:
A dynamic (resizable) array of items, the item array; and
A dynamic array of integers, the set array. The set array is used as a disjoint set data structure, using the single-array representation described here: How to properly implement disjoint set data structure for finding spanning forests in Python?
The two arrays are always the same size. As long as there have been no deletions, the item array just contains the items in sorted order, and the set array is full of singleton sets corresponding to those items.
If items have been deleted, though, items in the item array are only valid if there is a root set at the corresponding position in the set array. All sets that have been merged into a single root will be contiguous in the set array.
This data structure supports the required operations as follows:
Append (O(1))
To append a new largest item, just append the item to the item array, and append a new singleton set to the set array.
Delete (amortized effectively O(log N))
To delete a valid item, first call search to find the adjacent larger valid item. If there is no larger valid item, then just truncate both arrays to remove the item and all adjacent deleted items. Since merged sets are contiguous in the set array, this will leave both arrays in a consistent state.
Otherwise, merge the sets for the deleted item and adjacent item in the set array. If the deleted item's set is chosen as the new root, then move the adjacent item into the deleted item's position in the item array. Whichever position isn't chosen will be unused from now on, and can be nulled-out to release a reference if necessary.
If less than half of the item array is valid after a delete, then deleted items should be removed from the item array and the set array should be reset to an all-singleton state.
Search (amortized effectively O(log N))
Binary search proceeds normally, except that we need to find the representative item for every test position:
int find(item_array, set_array, itemToFind) {
    int pos = 0;
    int limit = item_array.length;
    while (pos < limit) {
        int testPos = pos + floor((limit-pos)/2);
        if (item_array[find_set(set_array, testPos)] < itemToFind) {
            pos = testPos + 1; //testPos is too low
        } else {
            limit = testPos; //testPos is not too low
        }
    }
    if (pos >= item_array.length) {
        return -1; //not found
    }
    pos = find_set(set_array, pos);
    return (item_array[pos] == itemToFind) ? pos : -1;
}
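The search above relies on a find_set helper that is not shown. The following Python sketch fills in one possible version together with append and delete, under simplifying assumptions that differ from the full description: the root of a merged set is always the position of the larger (surviving) item, so no item ever has to be moved, union by rank is omitted, and the compaction step for a half-empty array is left out. All names are illustrative.

```python
def find_set(parent, pos):
    """Return the root of pos's set, compressing the path on the way."""
    root = pos
    while parent[root] != root:
        root = parent[root]
    while parent[pos] != root:
        nxt = parent[pos]
        parent[pos] = root
        pos = nxt
    return root


def append(items, parent, item):
    """Append a new largest item as a singleton set."""
    items.append(item)
    parent.append(len(parent))


def search(items, parent, target):
    """Binary search over representative items; returns a position or -1."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if items[find_set(parent, mid)] < target:
            lo = mid + 1   # mid is too low
        else:
            hi = mid       # mid is not too low
    if lo >= len(items):
        return -1
    pos = find_set(parent, lo)
    return pos if items[pos] == target else -1


def delete(items, parent, target):
    """Delete target if present; returns True when something was removed."""
    pos = search(items, parent, target)
    if pos < 0:
        return False
    if pos + 1 == len(items):
        # Largest valid item: truncate it together with the deleted
        # items that were merged into its set.
        start = pos
        while start > 0 and find_set(parent, start - 1) == pos:
            start -= 1
        del items[start:]
        del parent[start:]
    else:
        # Merge into the set of the adjacent larger valid item.
        parent[pos] = find_set(parent, pos + 1)
    return True
```

Because every set is a contiguous range whose root holds the largest item in that range, the representative item is non-decreasing in the test position, which is what keeps the binary search valid.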

algorithm to find the shortest subarray with distinct values

Given an array A, find a shortest subarray A[i : j ] such that each distinct value present in A is also present in the subarray.
The question is not for a homework. It's a practice problem from a chapter on Hash tables. I am not looking for the code. Just looking for the algorithm or hints.
1- Maintain a hash table element->count
2- Traverse the array from beginning to end, incrementing the element count. Whenever an element count changes from 0 to 1, record its index in a variable, say index_0_1. At the end, index_0_1 will hold the end index of a potential answer.
3- Traverse the array from the beginning to index_0_1, decrementing the element count. Stop when an element count changes from 1 to 0, and record its index in a variable, say index_1_0. The subarray A[index_1_0 : index_0_1] is a potential answer; record it.
4- Traverse from index_0_1 towards the end, incrementing the element count, and stop when you find the element A[index_1_0]. Update index_0_1 with the current index.
5- Traverse from index_1_0+1 to index_0_1, decrementing the element count. Stop when an element count changes from 1 to 0. This is the new index_1_0. If the subarray A[index_1_0 : index_0_1] is shorter than the previous answer, update it, and continue with steps 4 and 5 until the whole array has been traversed.
Use a hash table to maintain a count of each type of element in the string.
When you find a new type of element, discard all previous answers and start trimming the start of the substring.
When you can trim it no more without having zero of one type of element, remember the substring if it's the shortest yet found, and then start looking for another element to replace the one you are about to lose, or for a new element not previously seen, as above.
When you hit the end of the string you are done.
If your hash is any good, this should be O(n).
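For readers who do want to see code, here is a compact Python sketch of a common sliding-window formulation of the same hash-table idea. It is not a line-by-line transcription of either answer: it first counts the distinct values, then grows the window on the right and trims it on the left. The function name and the inclusive (start, end) return convention are assumptions.

```python
from collections import Counter


def shortest_cover(a):
    """Return inclusive (start, end) of a shortest subarray containing every distinct value."""
    need = len(set(a))       # number of distinct values that must be covered
    if need == 0:
        return None
    counts = Counter()       # value -> occurrences inside the current window
    have = 0                 # distinct values currently inside the window
    best = None
    left = 0
    for right, x in enumerate(a):
        counts[x] += 1
        if counts[x] == 1:
            have += 1
        # While the window covers everything, record it and trim from the left.
        while have == need:
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            counts[a[left]] -= 1
            if counts[a[left]] == 0:
                have -= 1
            left += 1
    return best


print(shortest_cover([1, 2, 2, 3, 1]))
```

Each index enters and leaves the window at most once, so the running time is O(n) with expected O(1) hash operations.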

What is the most efficient way to sort a number list into alternating low-high-low sequences?

Suppose you are given an unsorted list of positive integers, and you wish to order them in a manner such that the elements alternate as: (less than preceding element), (greater than preceding element), (less than preceding element), etc... The very first element in the output list may ignore the rule. So for example, suppose your list was: 1,4,9,2,7,5,3,8,6.
One correct output would be...
1,9,2,8,3,7,4,6,5
Another would be...
3,4,2,7,5,6,1,9,8
Assume that the list contains no duplicates, is arbitrarily large, and is not already sorted.
What is the most processing efficient algorithm to achieve this?
Now, the standard approach would be to simply sort the list in ascending order first, and then peel elements from the ends of the list in alternation. However, I'd like to know: Is there a more time-efficient way to do this without first sorting the list?
My reason for asking: (read this only if you care)
Apparently this is a question my sister's boyfriend poses to people at job interviews out in San Francisco. My sister asked me the question, and I immediately came up with the standard response. That's what everyone answers. However, apparently one girl came up with a completely different solution that does not require sorting the list, and it appears to work. My sister couldn't explain to me this solution, but the idea has been confounding me since last night. I'd appreciate any help! Thanks!
You can do this in O(n) by placing each element in turn at the end, or at the penultimate position based on a comparison with the current last element.
For example,
1,4,9,2,7,5,3,8,6
Place 1 at end, current list [1]
4>1 true so place 4 at end, current list [1,4]
9<4 false so place 9 at penultimate position [1,9,4]
2>4 false so place 2 at penultimate [1,9,2,4]
7<4 false so place 7 at penultimate [1,9,2,7,4]
5>4 true so place 5 at end [1,9,2,7,4,5]
3<5 true so place 3 at end [1,9,2,7,4,5,3]
8>3 true so place 8 at end [1,9,2,7,4,5,3,8]
6<8 true so place 6 at end [1,9,2,7,4,5,3,8,6]
Note that the inequality tests alternate, and that we place the element at the end if the test is true, or at the penultimate position if it is not.
Example Python Code
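A=[1,4,9,2,7,5,3,8,6]
B=[]
for i,a in enumerate(A):
    # Odd positions expect a rise, even positions expect a fall.
    if i==0 or (i&1 and a>B[-1]) or (i&1==0 and a<B[-1]):
        B.insert(i,a)    # test holds: place at the end
    else:
        B.insert(i-1,a)  # test fails: place at the penultimate position
print(B)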
One solution is this, given in pseudocode.
It assumes nums has at least two elements and all elements in nums are distinct.
nums = [list of numbers]
if nums[0] < nums[1]: last_state = INCREASING else: last_state = DECREASING
for i = 2 to len(nums) - 1:
    if last_state == INCREASING:
        if nums[i] > nums[i-1]:
            swap (nums[i], nums[i-1])
        last_state = DECREASING
    else:
        if nums[i] < nums[i-1]:
            swap (nums[i], nums[i-1])
        last_state = INCREASING
Proof of correctness:
After each loop iteration, the elements up to index i in nums remain alternating, and last_state represents the order of the (i-1)th and ith elements.
Note that a swap happens only if the last 3 items considered are in order (increasing or decreasing). Therefore, if we swap the ith element with the (i-1)th element, the order of the (i-2)th and (i-1)th elements does not change.
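The pseudocode translates fairly directly into Python. The sketch below is one reading of it, in which last_state flips on every iteration whether or not a swap happened; the function name alternate_in_place and the boolean flag replacing INCREASING/DECREASING are choices made for this example.

```python
def alternate_in_place(nums):
    """Rearrange distinct numbers so that rises and falls alternate, in O(n)."""
    if len(nums) < 2:
        return nums
    increasing = nums[0] < nums[1]  # direction of the most recent adjacent pair
    for i in range(2, len(nums)):
        if increasing:
            # The next step must go down; swap if it would go up instead.
            if nums[i] > nums[i - 1]:
                nums[i], nums[i - 1] = nums[i - 1], nums[i]
        else:
            # The next step must go up; swap if it would go down instead.
            if nums[i] < nums[i - 1]:
                nums[i], nums[i - 1] = nums[i - 1], nums[i]
        increasing = not increasing
    return nums


print(alternate_in_place([1, 4, 9, 2, 7, 5, 3, 8, 6]))
```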
