Sorting second column relative to first column - algorithm

I have got the following sequence (representing a tree):
4 2
1 4
3 4
5 4
2 7
0 7
6 0
Now, I am trying to sort this sequence, such that when a value appears on the left (column 1), it has already appeared on the right (column 2). More concretely, the result of the sorting algorithm should be:
1 4
3 4
5 4
4 2
2 7
6 0
0 7
Obviously, this works in O(n^2) with an algorithm that iterates over each entry of column 1 and then looks for corresponding entries in column 2. But as n can be quite big (> 100000) in my scenario, I'm looking for an O(n log n) way to do it. Is this even possible?

Assumption:
I'm assuming this is also a valid sort sequence:
1 4
4 2
3 4
5 4
2 7
6 0
0 7
i.e. once a value has appeared once on the right, it can appear on the left.
If this is not the case (i.e. all occurrences on the right have to come before any occurrence on the left), ignore the "remove all edges pointing to that element" part and only remove the intermediate element if it has no incoming edges left.
Algorithm:
Construct a graph where each element A points to another element B if the right element of A is equal to the left element of B. This can be done using a hash multi-map:
Go through the elements, inserting each element A into the hash map as A.left -> A.
Go through the elements again, connecting each element B with all elements appearing under B.right.
Perform a topological sort of the graph, giving you your result. It should be modified such that, instead of removing a single edge pointing to an element, we remove all edges pointing to that element (i.e. once we have found one element containing some value on the right, we don't need to find another before that value can appear on the left).
Currently this is O(n^2) running time, because there are too many edges - if we have:
(1,2),(1,2),...,(1,2),(2,3),(2,3),...,(2,3)
There are O(n^2) edges.
This can be avoided by, instead of having elements point directly to each other, creating an intermediate element. In the above case, half of the elements will point to that element and that element will point to the other half. Then, when doing the topological sort, where we would have removed an edge to that element, we instead remove that element and all edges pointing from / to it.
Now there will be a maximum of O(n) edges, and, since topological sort can be done in linear time with respect to the elements and edges, the overall running time is O(n).
Note that it's not always possible to get a result: (1,2), (2,1).
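To make this concrete, here is a minimal Python sketch of the optimized version under the relaxed assumption above (a value is unlocked as soon as it has appeared once on the right). The per-value waiting lists play the role of both the hash multi-map and the intermediate elements, so the whole thing runs in roughly linear time:

from collections import defaultdict, deque

def order_pairs(pairs):
    # pairs: list of (left, right) tuples
    rights = set(r for _, r in pairs)      # values that ever occur in column 2
    waiting = defaultdict(list)            # value v -> pairs blocked until v has appeared in column 2
    ready = deque()
    for pair in pairs:
        left, right = pair
        if left in rights:                 # this pair must wait for its left value
            waiting[left].append(pair)
        else:                              # left never occurs in column 2: no prerequisite
            ready.append(pair)
    result = []
    unlocked = set()                       # values already emitted in column 2
    while ready:
        pair = ready.popleft()
        result.append(pair)
        left, right = pair
        if right not in unlocked:          # first appearance of this value in column 2:
            unlocked.add(right)            # drop the intermediate element for it and
            ready.extend(waiting.pop(right, []))   # release everything waiting on it
    if len(result) != len(pairs):
        raise ValueError("no valid ordering exists (cycle in the dependencies)")
    return result

# order_pairs([(4,2), (1,4), (3,4), (5,4), (2,7), (0,7), (6,0)])
# returns one valid ordering; several orderings satisfy the constraint.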
Illustrations: the original answer included graph diagrams of this construction for the asker's example (pre-optimization) and for the (1,2),(2,3) example above; the images are omitted here.

Related

Iterating through every combination of elements with repetitions without generating whole set

I need to iterate over every possible combination of elements (with repetitions) up to n elements long.
I've found multiple solutions for this problem, but all of them recursively generate the collection of every possible combination and then iterate over it. While this works, for large element collections and combination sizes it results in heavy memory use, so I'm looking for a solution that would let me calculate the next combination from the previous one, knowing the number of elements and the maximum size of a combination.
Is this even possible, and is there any particular algorithm that would work here?
Generate the combinations so that each combination is sorted. (This assumes the elements themselves can easily be placed in order. The precise ordering relationship is not important as long as it is a total order.)
Start with the combination consisting of n repetitions of the smallest element. To produce the next combination from any given combination:
Scan backwards until you find an element which is not the largest element. If you can't find one, you are done.
Replace that element and all following elements with the next larger element of that element.
If you want combinations of all lengths up to n, run that algorithm for each length up to n. Or start with a vector which contains empty slots and use the above algorithm with the understanding that the "next larger element" after an empty slot is the smallest element.
Example: length 3 of 3 values:
1 1 1
1 1 2
1 1 3
1 2 2
1 2 3
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
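A minimal Python sketch of this procedure, assuming the elements are given in a total order where a hypothetical successor function returns the next larger element; the combination is advanced in place:

def next_combination(comb, largest, successor):
    # comb: current combination as a sorted list, modified in place.
    # largest: the largest element; successor(x): the next larger element after x.
    i = len(comb) - 1
    while i >= 0 and comb[i] == largest:      # scan backwards past maximal elements
        i -= 1
    if i < 0:
        return None                           # every slot holds the largest element: done
    replacement = successor(comb[i])
    for j in range(i, len(comb)):             # replace that element and all following ones
        comb[j] = replacement
    return comb

# Length 3 over the values 1..3 reproduces the listing above:
comb = [1, 1, 1]
while comb is not None:
    print(*comb)
    comb = next_combination(comb, 3, lambda x: x + 1)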

Minimum amount of swaps needed so no number has two neighbours that are both greater..?

The problem statement goes like this: Given a list of N < 500 000 distinct numbers, find the minimum number of swaps of adjacent elements required such that no number has two neighbours that are both greater. A number can only be swapped with a neighbour.
Hint given: Use a segment tree or a fenwick tree.
I don't really get the idea of how I should use a sum-tree to solve this problem.
Example inputs:
Input 1:
5 (amount of elements in the list)
3 1 4 2 0
Output 1: 1
Input 2:
6
4 5 2 0 1 3
Output 2: 4
I can do it in O(n log n) time and O(n) extra space. But first let's look at the quadratic solution I've hinted at earlier:
initialize the result accumulator to 0
while the input list has more than two elements
find the lowest element in the list
add its distance from the closer end of the list to the accumulator
remove the element from the list.
output the accumulator.
Why does this work? First, let's look at what a sequence that requires zero swaps looks like. Since there are no duplicates, if the lowest element is anywhere but at either end, it is surrounded by two elements that are both greater, violating the requirement; thus the lowest element must be at one of the ends. Recurse into the subsequence that excludes this element. To bring a sequence into this state, at least as many swaps involving the lowest element as in the greedy algorithm are required to move the lowest element to one end, and since swaps involving the lowest element do not change the relative ordering of the rest, there is no penalty to reordering them to the front either.
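For reference, here is a direct Python transcription of this greedy procedure; it reproduces both example outputs from the question:

def min_swaps_greedy(a):
    a = list(a)
    swaps = 0
    while len(a) > 2:
        i = a.index(min(a))               # find the lowest element
        swaps += min(i, len(a) - 1 - i)   # add its distance from the closer end
        a.pop(i)                          # remove it from the list
    return swaps

# min_swaps_greedy([3, 1, 4, 2, 0])    == 1
# min_swaps_greedy([4, 5, 2, 0, 1, 3]) == 4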
Unfortunately, just implementing this with a list is quadratic. How do you make it faster? Have a finger tree that tracks the subtree weight and minimum value of each subtree and update these as you remove individual minima:
To initialize the tree: First, think of each element in the list as a one-element sublist with its minimum equal to its value. Then, while you have more than one sublist, group the subsequences in pairs, building a tree of subsequences. The length of a sequence is the sum of lengths of both its halves, and its minimum is equal to whichever minimum from both halves is lower.
To remove the minimum from a subsequence while tracking its index in the sequence:
Decrease the length of the subsequence
Remove the minimum from whichever half's minimum is equal to this subsequence minimum
The new minimum is the lower of the two halves' new minima.
The index of the minimum is equal to its index in its respective half, plus the length of the left half if the minimum was in the right half.
The distance from one end is then equal to either the index or (length before removal - index - 1), whichever is lower.
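Here is a minimal Python sketch of the fast version. It uses a Fenwick (binary indexed) tree over the original positions rather than the finger tree described above; processing the elements in increasing value order, the rank of the current minimum among the still-present positions is a prefix sum, which is all the greedy needs:

def min_swaps_fast(a):
    n = len(a)
    fen = [0] * (n + 1)                    # Fenwick tree over positions, 1-based

    def add(i, delta):
        while i <= n:
            fen[i] += delta
            i += i & -i

    def prefix(i):                         # number of present positions <= i
        s = 0
        while i > 0:
            s += fen[i]
            i -= i & -i
        return s

    for i in range(1, n + 1):              # every position starts out present
        add(i, 1)

    swaps = 0
    remaining = n
    for pos in sorted(range(n), key=lambda p: a[p]):   # positions in increasing value order
        if remaining <= 2:
            break
        rank = prefix(pos + 1)             # 1-based rank of this minimum among the remaining
        swaps += min(rank - 1, remaining - rank)       # distance to the closer end
        add(pos + 1, -1)                   # remove it
        remaining -= 1
    return swaps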

Sequence of Some Algorithms on Sorting

I have seen the following kind of question repeated over and over in the same manner on MIT midterm and final exams.
We are shown an array at successive steps of one sorting algorithm:
5,3,1,9,8,2,4,7
2,3,1,4,5,8,9,7
1,2,3,4,5,8,9,7
1,2,3,4,5,8,7,9
1,2,3,4,5,7,8,9
Which of Insertion Sort / Quick Sort / Merge Sort / Exchange Sort is used?
How do I find the solution to questions like this?
Edit: I think this is quick sort, because at each level some elements are lower than the pivot and some elements are greater than the pivot...
In such cases you can either a) find some pattern if you think there is one or b) go with simple elimination. Let's try elimination:
1) It cannot be insertion sort, as insertion sort starts from the beginning and treats the range [0,k] as a sorted subarray of already checked values. It then continues one element at a time, so we would first insert 3 before 5, etc., as we would at first treat [5] as a sorted subarray of size 1 and insert 3 into it, since it's the next value in the whole array.
2) Merge sort would sort neighbours first, as it would first recursively treat the whole array as single-element arrays and then go back up the recursion tree and merge neighbours, so it would look more like this:
[3,5],[1,9],[2,8],[4,7]
[1,3,5,9],[2,4,7,8]
[1,2,3,4,5,7,8,9]
[] shows which parts were sorted at each step.
This means that after one pass neighbors will be sorted.
3) Exchange sort would also have a different ordering - the second line should start with 3, as you would swap 5 and 3, then 5 and 1, etc. in the first pass. So after one pass we would go from 5,3,1,9,8,2,4,7 to 3,1,5,8,2,4,7,9, if my bubble sort serves me right (a short sketch of this single pass appears right after this answer). We compare each adjacent pair and swap if the element at i is greater than the element at i+1. This way the last element will be the largest.
4) As you fairly pointed out, this is quick sort: in each step we can clearly see that the array is getting pivoted around a certain value (4), then the left half is pivoted around 2 and the right half around 5, etc.
Those are the patterns I was talking about; now that you know them, you can easily check which one it is :-)
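As a quick check of the bubble-sort claim in point 3, a minimal Python sketch of one left-to-right pass that swaps adjacent out-of-order pairs:

def one_bubble_pass(a):
    a = list(a)
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:                 # out of order: swap the neighbours
            a[i], a[i + 1] = a[i + 1], a[i]
    return a

# one_bubble_pass([5, 3, 1, 9, 8, 2, 4, 7]) == [3, 1, 5, 8, 2, 4, 7, 9]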
It should be quick sort, not only because of the evidence of partitioning, but also because of this interesting fact: at some levels, only one part of the array changes.
Now let's discuss each algorithm:
Insertion sort would give you a pattern in which the first few elements are already sorted, but obviously we don't have this pattern;
Bubble sort (exchange sort) keeps exchanging neighbours when the former element is bigger than the latter, and thus the last k elements will be sorted after k iterations. Based on these two facts, we shouldn't see an adjacent pair (a, b) with b < a persist after each iteration. However, the sequence doesn't follow this; for example, the pair (3, 1) from the first sequence still exists in the second sequence.
Merge sort first merges the array into 2 + 2 + 2 + 2 subarrays, then into 4 + 4, and finally into a sorted array of 8 elements, so in total it should take 3 steps, but we have 4 steps here, so it won't be merge sort.

Hungarian algorithm matching one set to itself

I'm looking for a variation on the Hungarian algorithm (I think) that will pair N people to themselves, excluding self-pairs and reverse-pairs, where N is even.
E.g. given N0 - N5 and a matrix C of costs for each pair, how can I obtain the set of 3 lowest-cost pairs?
C = [ [ - 5 6 1 9 4 ]
[ 5 - 4 8 6 2 ]
[ 6 4 - 3 7 6 ]
[ 1 8 3 - 8 9 ]
[ 9 6 7 8 - 5 ]
[ 4 2 6 9 5 - ] ]
In this example, the resulting pairs would be:
N0, N3
N1, N4
N2, N5
Having typed this out I'm now wondering if I can just increase the cost values in the "bottom half" of the matrix... or even better, remove them.
Is there a variation of Hungarian that works on a non-square matrix?
Or, is there another algorithm that solves this variation of the problem?
Increasing the values of the bottom half can result in an incorrect solution. You can see this because the corner coordinates of the upper half (in your example, coordinates 0,1 and 5,6) will always be considered to be in the minimum X pairs, where X is the size of the matrix.
My Solution for finding the minimum X pairs
Take the standard Hungarian algorithm
You can set the diagonal to a value greater than the sum of the elements in the unaltered matrix — this step may allow you to speed up your implementation, depending on how your implementation handles nulls.
1) The first step of the standard algorithm is to go through each row, and then each column, reducing each row and column individually such that the minimum of each row and column is zero. This is unchanged.
The general principle of this solution, is to mirror every subsequent step of the original algorithm around the diagonal.
2) The next step of the algorithm is to select rows and columns so that every zero is included within the selection, using the minimum number of rows and columns.
My alteration to the algorithm means that when selecting a row/column, also select the column/row mirrored around that diagonal, but count it as one row or column selection for all purposes, including counting the diagonal (which will be the intersection of these mirrored row/column selection pairs) as only being selected once.
3) The next step is to check if you have the right solution — which in the standard algorithm means checking if the number of rows and columns selected is equal to the size of the matrix — in your example if six rows and columns have been selected.
For this variation however, when calculating when to end the algorithm treat each row/column mirrored pair of selections as a single row or column selection. If you have the right solution then end the algorithm here.
4) If the number of rows and columns is less than the size of the matrix, then find the smallest unselected element, and call it k. Subtract k from all uncovered elements, and add it to all elements that are covered twice (again, counting the mirrored row/column selection as a single selection).
My alteration of the algorithm means that when altering values, you will alter their mirrored values identically (this should happen naturally as a result of the mirrored selection process).
Then go back to step 2 and repeat steps 2-4 until step 3 indicates the algorithm is finished.
This will result in pairs of mirrored answers (which are the coordinates — to get the value of these coordinates refer back to the original matrix) — you can safely delete half of each pair arbitrarily.
To alter this algorithm to find the minimum R pairs, where R is less than the size of the matrix, reduce the stopping point in step 3 to R. This alteration is essential to answering your question.
As @Niklas B stated, you are solving the weighted perfect matching problem.
Take a look at this; here is part of a document describing the primal-dual algorithm for weighted perfect matching.
Please read it all and let me know if it is useful to you.
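Not the Hungarian variation described above, but a useful cross-check for small N: a brute-force bitmask DP computes the minimum-weight perfect matching directly. A minimal Python sketch, assuming a symmetric cost matrix C with the diagonal '-' entries replaced by something large such as float('inf'):

from functools import lru_cache

def min_weight_perfect_matching(C):
    # Exhaustive DP over subsets of already-matched people: O(2^N * N^2).
    n = len(C)

    @lru_cache(maxsize=None)
    def best(mask):
        if mask == (1 << n) - 1:
            return 0, ()                     # everyone matched
        i = next(k for k in range(n) if not mask & (1 << k))   # first unmatched person
        best_cost, best_pairs = float('inf'), ()
        for j in range(i + 1, n):
            if not mask & (1 << j):
                cost, pairs = best(mask | (1 << i) | (1 << j))
                if C[i][j] + cost < best_cost:
                    best_cost, best_pairs = C[i][j] + cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return best(0)

# cost, pairs = min_weight_perfect_matching(C)   # C as in the question, '-' -> float('inf')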

Algorithm: efficient way to search an integer in a two dimensional integer array? [duplicate]

Possible Duplicates:
Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?
Search a sorted 2D matrix
A time efficient program to find an element in a two dimensional matrix, the rows and columns of which are increasing monotonically. (Rows and columns are increasing from top to bottom and from left to right).
I can only think of binary search, if the 2D array was sorted.
I posed this problem as homework last semester, and two students, whom I had considered to be average, surprised me by coming up with a very elegant, straightforward, and (probably) optimal algorithm:
Find(k, tab, x, y)
  let m = tab[x][y]
  if k = m then return "Found"
  else if k > m then
    return Find(k, tab, x, y + 1)
  else
    return Find(k, tab, x - 1, y)
This algorithm eliminates either one line or one column at every call (note that it is tail recursive, and could be transformed into a loop, thereby avoiding the recursive calls). Thus, if your matrix is n*m, the algorithm performs in O(n+m). This solution is better than the dichotomic search spin-off (which is the solution I expected when handing out this problem).
EDIT: I fixed a typo (k became x in the recursive calls) and also, as Chris pointed out, this should initially be called with the "upper right" corner, that is Find(k, tab, n, 1), where n is the number of lines.
Since the rows and columns are increasing monotonically, you can do a neat little search like this:
Start at the bottom left. If the element you are looking for is greater than the element at that location, go right. If it is less go up. Repeat until you find the element or you hit an edge. Example (in hex to make formatting easier):
1 2 5 6 7
3 4 6 7 8
5 7 8 9 A
7 A C D E
Let's search for 8. Start at position (0, 3): 7. 8 > 7 so we go right. We are now at (1, 3): A. 8 < A so we go up. At (1, 2): 7, 8 > 7 so we go right. (2, 2): 8 -> 8 == 8 so we are done.
You'll notice, however, that this has only found one of the elements whose value is 8.
Edit: in case it wasn't obvious, this runs in O(n + m) average and worst-case time.
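A minimal Python version of this walk, assuming a list-of-lists matrix with rows and columns both increasing; it starts at the bottom-left corner exactly as described:

def staircase_search(matrix, target):
    row, col = len(matrix) - 1, 0            # bottom-left corner
    while row >= 0 and col < len(matrix[0]):
        value = matrix[row][col]
        if value == target:
            return row, col                  # one occurrence of the target
        elif target > value:
            col += 1                         # value is the largest candidate left in this column: go right
        else:
            row -= 1                         # everything to the right in this row is at least value: go up
    return None

# Searching for 8 in the hex example above returns (2, 2), i.e. row 2, column 2.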
Assuming I read that right, you are saying that the bottom of row n is always less than the top of row n+1. If that is the case, then I'd say the simplest way is to search the first row using a binary search for either the number or the next smallest number. Then you will have identified the column it is in. Then do a binary search of that column until you find it.
Start at (0,0)
while the value is too low, continue to the right (0,1), then (0,2) etc.
when reaching a value too high, go down one and left one (1,1)
Repeating those steps should bring you to the target.

Resources