Sorting an array in minimum cost - algorithm

I have an array A[] with 4 elements, A = {8 1 2 4}. How can I sort it with minimum cost? The criteria are defined as follows:
a. It is possible to swap any 2 elements.
b. The cost of any swap is the sum of the two element values. For example, if I swap 8 and 4 the cost is 12 and the resulting array is A = {4 1 2 8}, which is still unsorted, so more swaps are needed.
c. I need to find a way to sort the array with minimum cost.
From my observation, a greedy approach will not work (for example, at each step placing some element into its sorted position with minimum cost), so a DP solution is needed.
Can anyone help?

Swap 2 and 1, and then 1 and 4, and then 1 and 8? Or is it a general question?
For a more general approach you could try:
1. Swapping every pair of 2 elements (with the highest sum) if they are perfect swaps (i.e. swapping them will put them both at their right spot).
2. Use the lowest element as a pivot for swaps (by swapping the element whose spot it occupies), until it reaches its final spot.
3. Then, you have two possibilities:
3.1. Repeat step 2: use the lowest element not in its final spot as a pivot until it reaches its final spot, then go back to step 3.
3.2. Or swap the lowest element not in its final spot (l2) with the lowest element (l1), and repeat step 2 until l1 reaches the final spot of l2. Then:
3.2.1. Either swap l1 and l2 again and go to step 3.1,
3.2.2. Or go to step 3.2 again, with the next lowest element not in its final spot being used.
4. When all this is done, if some opposite swaps are performed one right after another (for example, this could happen when going from step 2 to step 3.2), remove them.
There are still some things to watch out for, but this is already a pretty good approximation. Step one and two should always work though, step three would be the one to improve in some borderline cases.
Example of the algorithm being used:
With {8 4 5 3 2 7}: (target array {2 3 4 5 7 8})
Step 2: 2 <> 7, 2 <> 8
Array is now {2, 4, 5, 3, 7, 8}
Choice between 3.1 and 3.2:
3.1 gives 3 <> 5, 3 <> 4
3.2 gives 2 <> 3, 2 <> 5, 2 <> 4, 2 <> 3
3 <> 5, 3 <> 4 is the better result
Conclusion: 2 <> 7, 2 <> 8, 3 <> 5, 3 <> 4 is the best answer.
With {1 8 9 7 6} (resulting array {1 6 7 8 9})
You're beginning at step three already
Choice between 3.1 and 3.2:
3.1 gives 6 <> 9, 6 <> 7, 6 <> 8 (total: 42)
3.2 gives 1 <> 6, 1 <> 9, 1 <> 7, 1 <> 8, 1 <> 6 (total: 41)
So 1 <> 6, 1 <> 9, 1 <> 7, 1 <> 8, 1 <> 6 is the best result
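The choice between 3.1 and 3.2 in the examples above can be made per cycle of misplaced elements. The sketch below is not taken from the answer: it uses the cycle-decomposition formulation commonly used for this problem and assumes all values are distinct. The function name `min_cost_sort` is made up for illustration. It returns only the minimum cost, not the swap sequence.

```python
def min_cost_sort(values):
    """Minimum total cost to sort distinct values, where a swap costs the sum
    of the two swapped values (cycle-decomposition sketch)."""
    target = sorted(values)
    position = {v: i for i, v in enumerate(target)}  # final index of each value
    global_min = target[0] if target else 0
    visited = [False] * len(values)
    total = 0
    for start in range(len(values)):
        if visited[start]:
            continue
        # Walk the cycle of positions that starts here.
        cycle_sum, cycle_min, length = 0, None, 0
        i = start
        while not visited[i]:
            visited[i] = True
            v = values[i]
            cycle_sum += v
            cycle_min = v if cycle_min is None else min(cycle_min, v)
            length += 1
            i = position[v]
        if length < 2:
            continue  # element already in its final spot
        # Option 3.1: use the cycle's own smallest element as the pivot.
        within = cycle_sum + (length - 2) * cycle_min
        # Option 3.2: borrow the overall smallest element, then swap it back.
        borrow = cycle_sum + cycle_min + (length + 1) * global_min
        total += min(within, borrow)
    return total
```

For the arrays discussed in this thread, this gives 17 for {8 1 2 4}, 34 for {8 4 5 3 2 7} and 41 for {1 8 9 7 6}, matching the swap sequences listed above.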

This smells like homework. What you need to do is sort the array while minimizing the cost of the swaps. So it's an optimization problem rather than a sorting problem.
A greedy algorithm would still work in the sense that it sorts the array: you fix the solution by swapping the cheapest element first (figuring out where in the list it belongs). This is, however, not necessarily optimal.
As long as you never swap the same element twice a greedy algorithm should be optimal though.
Anyway, back to the dynamic programming stuff: just build your solution tree using recursion and then prune the tree as you find better solutions. This is pretty basic recursion.
If you use a more complicated sorting algorithm you'll have a lot more difficulty fitting it together with the dynamic programming, so I suggest you start out with a simple, slow O(n^2) sort and build on top of this.
Rather than to provide you with a solution, I'd like to explain how dynamic programming works in my own words.
The first thing you need to do, is to figure out an algorithm that will explore all possible solutions (this can be a really stupid brute force algorithm).
You then implement this using recursion because dynamic programming is based around being able to figure out overlapping sub problems quickly, ergo recursion.
At each recursive call you look up where you are in your solution and check whether you've computed this part of the solution tree before. If you have, you can test whether the current solution is better; if it is, you continue, otherwise you're done with this branch of the problem.
When you arrive at the final solution you will have solved the problem.
Think of each recursive call as a snapshot of a partial solution. It's your job to figure how each recursive call fits together in the final optimal solution.
This is what I recommend you do:
Write a recursive sort algorithm
Add a parameter to your recursive function that maintains the cost of this execution path, as you sort the array, add to this cost. For every possible swap at any given point do another recursive call (this will branch your solution tree)
Whenever you realize that the cost of the solution you are currently exploring exceeds what you already have somewhere else, abort (just return).
To be able to answer the last question you need to maintain a shared memory area which you can index depending on where you are in your recursive algorithm. If there's a precomputed cost there you just return that value and don't continue processing (this is the pruning, which makes it fast).
Using this method you can even base your solution on a brute-force permutation algorithm. It will probably be very slow or memory intensive, because it is stupid about when to branch or prune, but you don't really need a specific sort algorithm to make this work; it will just be more efficient to go about it that way.
Good luck!

If you do a high-low selection sort, you can guarantee that the Nth greatest element isn't swapped more than N times. This is a simple algorithm with a pretty easy and enticing guarantee... Maybe check this on a few examples and see how it could be tweaked. Note: this may not lead to an optimal answer...

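To find the absolute minimal cost you'll have to try all ways to swap and then pick the cheapest one. This is only practical for small arrays.
import math

def recsort(l):
    target = sorted(l)
    best = {"cost": math.inf, "sort": []}   # cheapest swap sequence found so far
    seen = {}                               # cheapest known cost to reach each arrangement

    def search(cur, cost, sort):
        # prune: this branch cannot beat what has already been found
        if cost >= best["cost"] or cost >= seen.get(tuple(cur), math.inf):
            return
        seen[tuple(cur)] = cost
        if cur == target:
            best["cost"], best["sort"] = cost, sort
            return
        for p1 in range(len(cur)):
            for p2 in range(p1 + 1, len(cur)):
                l2 = cur[:]
                l2[p1], l2[p2] = l2[p2], l2[p1]   # swap positions p1 and p2
                search(l2, cost + cur[p1] + cur[p2], sort + [(p1, p2)])

    search(list(l), 0, [])
    return best["cost"], best["sort"]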
An approach that will be pretty good is to recursively place the biggest value at the top.

Related

Difficulty in thinking a divide and conquer approach

I am self-learning algorithms. As we know Divide and Conquer is one of the algorithm design paradigms. I have studied mergeSort, QuickSort, Karatsuba Multiplication, counting inversions of an array as examples of this particular design pattern. Although it sounds very simple, divides the problems into subproblems, solves each subproblem recursively, and merges the result of each of them, I found it very difficult to develop an idea of how to apply that logic to a new problem. To my understanding, all those above-mentioned canonical examples come up with a very clever trick to solve the problem. For example, I am trying to solve the following problem:
Given a sequence of n numbers such that the difference between two consecutive numbers is constant, find the missing term in logarithmic time.
Example: [5, 7, 9, 11, 15]
Answer: 13
First, I came up with the idea that it can be solved using the divide and conquer approach as the naive approach will take O(n) time. From my understanding of divide and conquer, this is how I approached:
The original problem can be divided into two independent subproblems. I can search for the missing term in the two subproblems recursively. So, I first divide the problem.
leftArray = [5,7,9]
rightArray = [11, 15]
Now it says, I need to solve the subproblems recursively until it becomes trivial to solve. In this case, the subproblem becomes of size 1. If there is only one element, there are 0 missing elements. Now to combine the result. But I am not sure how to do it or how it will solve my original problem.
Definitely, I am missing something crucial here. My question is how to approach when solving this type of divide and conquer problem. Should I come up with a trick like a mergeSort or QuickSort? The more I see the solution to this kind of problem, it feels I am memorizing the approach to solve, not understanding and each problem solves it differently. Any help or suggestion regarding the mindset when solving divide and conquer would be greatly appreciated. I have been trying for a long time to develop my algorithmic skill but I improved very little. Thanks in advance.
You have the right approach. The only missing part is an O(1) way to decide which side you are discarding.
First, note that the numbers in your problem must be ordered, otherwise you can't do better than O(n). There also needs to be at least three numbers, otherwise you wouldn't figure out the "step".
With this understanding in place, you can determine the "step" in O(1) time by examining the initial three terms, and see what's the difference between the consecutive ones. Two outcomes are possible:
1. Both differences are the same, and
2. One difference is twice as big as the other.
Case 2 hands you a solution by luck, so we will consider only the first case from now on. With the step in hand, you can determine if the range has a gap in it by subtracting the endpoints, and comparing the result to the number of gaps times the step. If you arrive at the same result, the range does not have a missing term, and can be discarded. When both halves can be discarded, the gap is between them.
As @Sergey Kalinichenko points out, this assumes the incoming set is ordered.
However, if you're certain the input is ordered (which is likely in this case) observe the nth position's value to be start + jumpsize * index; this allows you to bisect to find where it shifts
Example: [5, 7, 9, 11, 15]
Answer: 13
start = 5
jumpsize = 2
check midpoint (index 2): 5 + 2 * 2 -> 9
this is valid, so the shift must be after the midpoint
recurse
You can find the jumpsize by checking the first 3 values
a, b, c = (language-dependent retrieval)
gap1 = b - a
gap2 = c - b
if gap1 != gap2:
    if (value at 4th index) - c == gap1:
        missing value is b + gap1 # 2nd gap doesn't match
    else:
        missing value is a + gap2 # 1st gap doesn't match
bisect remaining values
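The bisection described above can be sketched as follows. This is a minimal version under stated assumptions: the input is ordered, exactly one interior term is missing, and the values are integers. Unlike the answers, it derives the step from the two endpoints rather than from the first three values, which avoids the special case. The name `find_missing_term` is made up for illustration.

```python
def find_missing_term(seq):
    """Return the single missing interior term of an ordered arithmetic
    progression, using O(log n) comparisons."""
    n = len(seq)
    # The complete progression has n + 1 terms, hence n equal gaps.
    step = (seq[-1] - seq[0]) // n
    lo, hi = 0, n - 1  # invariant: seq[lo] is on track, seq[hi] is shifted
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if seq[mid] == seq[0] + mid * step:
            lo = mid   # everything up to mid is intact; the gap is to the right
        else:
            hi = mid   # mid is already shifted; the gap is to the left
    return seq[lo] + step

print(find_missing_term([5, 7, 9, 11, 15]))  # 13
```

The invariant is that the gap always lies between indices `lo` and `hi`, so each iteration discards half of the range.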

Understanding the difference between these two scaling properties

I need help understanding the following paragraph from a book on algorithms -
Search spaces for natural combinatorial problems tend to grow
exponentially in the size N of the input; if the input size increases
by one, the number of possibilities increases multiplicatively. We’d
like a good algorithm for such a problem to have a better scaling
property: when the input size increases by a constant factor—say, a
factor of 2—the algorithm should only slow down by some constant
factor C.
I don't really get why one is better than the other. If anyone can formulate any examples to aid my understanding, its greatly appreciated.
Let's consider the following problem: you're given a list of numbers, and you want to find the longest subsequence of that list where the numbers are in ascending order. For example, given the sequence
2 7 1 8 3 9 4 5 0 6
you could form the subsequence [2, 7, 8, 9] as follows:
2 7 1 8 3 9 4 5 0 6
^ ^   ^   ^
but there's an even longer one, [1, 3, 4, 5, 6] available here:
2 7 1 8 3 9 4 5 0 6
    ^   ^   ^ ^   ^
That one happens to be the longest subsequence that's in increasing order, I believe, though please let me know if I'm mistaken.
Now that we have this problem, how would we go about solving it in the general case where you have a list of n numbers? Let's start with a not so great option. One possibility would be to list off all the subsequences of the original list of numbers, then filter out everything that isn't in increasing order, and then to take the longest one out of all the ones we find. For example, given this short list:
2 7 1 8
we'd form all the possible subsequences, which are shown here:
[]
[8]
[1]
[1, 8]
[7]
[7, 8]
[7, 1]
[7, 1, 8]
[2]
[2, 8]
[2, 1]
[2, 1, 8]
[2, 7]
[2, 7, 8]
[2, 7, 1]
[2, 7, 1, 8]
Yikes, that list is pretty long. But by looking at it, we can see that the longest increasing subsequence is [2, 7, 8], which has length three.
Now, how well is this going to scale as our input list gets longer and longer? Here's something to think about - how many subsequences are there of this new list, which I made by adding 3 to the end of the existing list?
2 7 1 8 3
Well, every existing subsequence is still a perfectly valid subsequence here. But on top of that, we can form a bunch of new subsequences. In fact, we could take any existing subsequence and then tack a 3 onto the end of it. That means that if we had S subsequences for our length-four list, we'll have 2S subsequences for our length-five list.
More generally, you can see that if you take a list and add one more element onto the end of it, you'll double the number of subsequences available. That's a mathematical fact, and it's neither good nor bad by itself, but if we're in the business of listing all those subsequences and checking each one of them to see whether it has some property, we're going to be in trouble because that means there's going to be a ton of subsequences. We already see that there are 16 subsequences of a four-element list. That means there's 32 subsequences of a five-element list, 64 subsequences of a six-element list, and, more generally, 2^n subsequences of an n-element list.
With that insight, let's make a quick calculation. How many subsequences are we going to have to check if we have, say, a 300-element list? We'd have to potentially check 2^300 of them - a number that's bigger than the number of atoms in the observable universe! Oops. That's going to take way more time than we have.
On the other hand, there's a beautiful algorithm called patience sorting that will always find the longest increasing subsequence, and which does so quite easily. You can do this by playing a little game. You'll place each of the items in the list into one of many piles. To determine what pile to pick, look for the first pile whose top number is bigger than the number in question and place it on top. If you can't find a pile this way, put the number into its own pile on the far right.
For example, given this original list:
2 7 1 8 3 9 4 5 0 6
after playing the game we'd end up with these piles:
0
1 3 4 5
2 7 8 9 6
And here's an amazing fact: the number of piles used equals the length of the longest increasing subsequence. Moreover, you can find that subsequence in the following way: every time you place a number on top of a pile, make a note of the number that was on top of the pile to its left. If we do this with the above numbers, here's what we'll find; the parenthesized number tells us what was on top of the stack to the left at the time we put the number down:
0
1   3 (1)   4 (3)   5 (4)
2   7 (2)   8 (7)   9 (8)   6 (5)
To find the subsequence we want, start with the top of the rightmost pile. Write that number down, then find the number in parentheses and repeat this process. Doing that here gives us 6, 5, 4, 3, 1, which, if reversed, is 1, 3, 4, 5, 6, the longest increasing subsequence! (Wow!) You can prove that this works in all cases, and it's a really beautiful exercise to actually go and do this.
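Here is a minimal sketch of the pile game with the "number in parentheses" bookkeeping. It scans the piles from left to right, as in the description, and only tracks the top of each pile. The function name and variable names are made up for illustration. It uses `>=` when comparing against a pile top so that the result is strictly increasing even if the input contains duplicates.

```python
def longest_increasing_subsequence(nums):
    """Patience-sorting sketch: returns one longest strictly increasing
    subsequence of nums."""
    tops = []                  # tops[p] = index in nums of the top card of pile p
    back = [None] * len(nums)  # back[j] = top of the pile to the left when nums[j] was placed
    for j, x in enumerate(nums):
        # Find the first pile whose top is not smaller than x.
        p = 0
        while p < len(tops) and nums[tops[p]] < x:
            p += 1
        if p > 0:
            back[j] = tops[p - 1]
        if p == len(tops):
            tops.append(j)     # no suitable pile: start a new one on the far right
        else:
            tops[p] = j        # place x on top of pile p
    # Walk back from the top of the rightmost pile.
    result = []
    j = tops[-1] if tops else None
    while j is not None:
        result.append(nums[j])
        j = back[j]
    return result[::-1]

print(longest_increasing_subsequence([2, 7, 1, 8, 3, 9, 4, 5, 0, 6]))  # [1, 3, 4, 5, 6]
```

Because the pile tops are always in increasing order from left to right, the linear scan could be replaced by a binary search, but the linear version matches the description more closely.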
So now the question is how fast this process is. Placing the first number down takes one unit of work - just place it in its own pile. Placing the second number down takes at most two units of work - we have to look at the top of the first pile, and optionally put the number into a second pile. Placing the third number takes at most three units of work - we have to look at up to two piles, and possibly place the number into its own third pile. More generally, placing the kth number down takes k units of work. Overall, this means that the work we're doing is roughly
1 + 2 + 3 + ... + n
if we have n total elements. That's a famous sum called Gauss's sum, and it simplifies to approximately n^2 / 2. So we can say that we'll need to do roughly n^2 / 2 units of work to solve things this way.
How does that compare to our 2^n solution from before? Well, unlike 2^n, which grows stupidly fast as a function of n, n^2 / 2 is actually a pretty nice function. If we plug in n = 300, which previously in 2^n land gave back "the number of atoms in the universe," we get back a more modest 45,000. If that's a number of nanoseconds, that's nothing; that'll take a computer under a second to do. In fact, you have to plug in a pretty big value of n before you're looking at something that's going to take the computer quite a while to complete.
The function n^2 / 2 has an interesting property compared with 2^n. With 2^n, if you increase n by one, as we saw earlier, 2^n will double. On the other hand, if you take n^2 / 2 and increase n by one, then n^2 / 2 will get bigger, but not by much (specifically, by n + 1/2).
By contrast, if you take 2^n and then double n, then 2^n squares in size - yikes! But if you take n^2 / 2 and double n, then n^2 / 2 goes up only by a factor of four - not that bad, actually, given that we doubled our input size!
This gets at the heart of what the quote you mentioned is talking about. Algorithms with runtimes like 2^n, n!, etc. scale terribly as a function of n, since increasing n by one causes a huge jump in the runtime. On the other hand, functions like n, n log n, n^2, etc. have the property that if you double n, the runtime only goes up by some constant factor. They therefore scale much more nicely as a function of input.
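A few lines of arithmetic make the two scaling properties concrete. This is only a numeric illustration; the function names are made up.

```python
def exponential(n):
    return 2 ** n

def quadratic(n):
    return n * n / 2

def slowdown_when_doubling(cost, n):
    """Factor by which the cost grows when the input size goes from n to 2n."""
    return cost(2 * n) / cost(n)

print(exponential(11) / exponential(10))        # 2.0: one more element doubles the work
print(slowdown_when_doubling(quadratic, 300))   # 4.0: a constant, whatever n is
print(slowdown_when_doubling(exponential, 10))  # 1024.0: equals 2^n, so it grows with n
```

For the quadratic cost the doubling factor is 4 no matter which n is plugged in, which is the "constant factor C" from the quoted paragraph. For the exponential cost the factor itself is 2^n.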

Sorting with limited stack operations

I am working on a sorting machine, and to minimize complexity, I would like to keep the moving parts to a minimum. I've come to the following design:
1 Input Stack
2+ Output Stacks
When starting, the machine already knows all the items, their current order, and their desired order.
The machine can move one item from the bottom of the input stack to the bottom of an output stack of its choice.
The machine can move all items from an output stack to the top of the input stack. This is called a "return". (In my machine, I plan for this to be done by the user.)
The machine only accesses the bottom of a stack, except by a return. When a stack is returned to the input, the "new" items will be the last items out of the input. This also means that if the machine moves a set of items from the input to one output, the order of those items is reversed.
The goal of the machine is to take all the items from the input stack, and eventually move them all to an output stack in sorted order. A secondary goal is to reduce the number of "stack returns" to a minimum, because in my machine, this is the part that requires user intervention. Ideally, the machine should do as much sorting as it can without the user's help.
The issue I'm encountering is that I can't seem to find an appropriate algorithm for doing the actual sorting. Pretty much all algorithms I can find rely on being able to swap arbitrary elements. Distribution/external sorting seems promising, but all the algorithms I can find seem to rely on accessing multiple inputs at once.
Since the machine already knows all the items, I can take advantage of this and sort all the items "in-memory". I experimented with "path-finding" from the unsorted state to the sorted state, but I'm unable to get it to actually converge on a solution. (It commonly just gets stuck in a loop moving stacks back and forth.)
Preferably, I would like a solution that works with a minimum of 2 output stacks, but is able to use more if available.
Interestingly, this is a "game" you can play with standard playing cards:
Get as many cards as you would like to sort. (I usually get 13 of a suit.)
Shuffle them and put them in your hand. Decide how many output stacks you get.
You have two valid moves:
You may move the front-most card in your hand and put it on top of any output stack.
You may pick up all the cards in an output stack and put them at the back of the cards you have in hand.
You win when the cards are in order in an output stack. Your score is the number of times you picked up a stack. Lower scores are better.
This can be done in O(log(n)) returns of an output to an input. More precisely in no more than 2 ceil(log_2(n)) - 1 returns if 1 < n.
Let's call the output stacks A and B.
First consider the simplest algorithm that works. We run through them, putting the smallest card on B and the rest on A. Then put A on input and repeat. After n passes you've got them in sorted order. Not very efficient, but it works.
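The simplest algorithm above can be simulated in a few lines. This sketch makes modelling assumptions that are not in the answer: the hand is a list with the front-most card first, cards are distinct, and picking up stack A puts its cards at the back of the hand top card first. The order of the returned cards does not affect this particular algorithm. The function name is made up.

```python
def naive_stack_sort(cards):
    """Simulate the one-card-per-pass strategy.
    Returns (stack B from bottom to top, number of returns used)."""
    hand = list(cards)   # hand[0] is the front-most card
    stack_b = []         # sorted output, smallest card at the bottom
    returns = 0
    while hand:
        smallest = min(hand)
        stack_a = []
        for card in hand:            # one pass through the hand
            if card == smallest:
                stack_b.append(card)
            else:
                stack_a.append(card)
        hand = []
        if stack_a:
            hand = stack_a[::-1]     # pick up stack A (assumed: top card first)
            returns += 1
    return stack_b, returns

print(naive_stack_sort([3, 1, 2]))  # ([1, 2, 3], 2)
```

Under this model the strategy uses n - 1 returns for n cards, which is the baseline the rest of the answer improves on.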
Now can we make it so that we pull out 2 cards per pass? Well if we had cards 1, 4, 5, 8, 9, 12, ... in the top half and the rest in the bottom half, then the first pass will find card 1 before card 2, reverse them, the second finds card 3 before card 4, reverses them, and so on. 2 cards per pass. But with 1 pass with 2 returns we can put all the cards we want in the top half on stack A, and the rest on stack B, return stack A, return stack B, and then start extracting. This takes 2 + n/2 returns.
How about 4 cards per pass? Well we want it divided into quarters. With the top quarter having cards 1, 8, 9, 16, .... The second quarter having 2, 7, 10, 15, .... The third having 3, 6, 11, 14, .... And the last having 4, 5, 12, 13, .... Basically if you were dealing them you deal the first 4 in order, the second 4 in reverse, the next four in order.
We can divide them into quarters in 2 passes. Can we figure out how to get there? Well working backwards, after the second pass we want A to have quarters 2,1. And B to have quarters 4,3. Then we return A, return B, and we're golden. So after the first pass we wanted A to have quarters 2,4 and B to have quarters 1,3, return A return B.
Turning that around to work forwards, in pass 1 we put groups 2,4 on A, 1,3 on B. Return A, return B. Then in pass 2 we put groups 1,2 on A, 3,4 on B, return A, return B. Then we start dealing and we get 4 cards out per pass. So now we're using 4 + n/4 returns.
If you continue the logic forward, in 3 passes (6 returns) you can figure out how to get 8 cards per pass on the extract phase. In 4 passes (8 returns) you can get 16 cards per pass. And so on. The logic is complex, but all you need to do is remember that you want them to wind up in order ... 5, 4, 3, 2, 1. Work backwards from the last pass to the first figuring out how you must have done it. And then you have your forward algorithm.
If you play with the numbers, if n is a power of 2 you do equally well to take log_2(n) - 2 passes with 2 log_2(n) - 4 returns and then 4 extraction passes with 3 returns between them, for 2 log_2(n) - 1 returns, or to take log_2(n) - 1 passes with 2 log_2(n) - 2 returns and then 2 extraction passes with 1 return between them, also for 2 log_2(n) - 1 returns. (This is assuming, of course, that n is sufficiently large that it can be so divided. Which means "not 1" for the second version of the algorithm.) We'll see shortly a small reason to prefer the former version of the algorithm if 2 < n.
OK, this is great if the number of cards is a power of 2. But what if you have, say, 10 cards? Well, insert imaginary cards until we've reached the nearest power of 2, rounded up. We follow the algorithm for that, and simply don't actually do the operations that we would have done on the imaginary cards, and we get the exact results we would have gotten, except with the imaginary cards not there.
So we have a general solution which takes no more than 2 ceil(log_2(n)) - 1 returns.
And now we see why to prefer breaking that into 4 groups instead of 2. If we break into 4 groups, it is possible that the 4th group is only imaginary cards and we get to skip one more return. If we break into 2 groups, there always are real cards in each group and we don't get to save a return.
This speeds us up by 1 if n is 3, 5, 6, 9, 10, 11, 12, 17, 18, ....
Calculating the exact rules is going to be complicated, and I won't try to write code to do it. But you should be able to figure it out from here.
I can't prove it, but there is a chance that this algorithm is optimal in the sense that there are permutations of cards which you can't do better than this on. (There are permutations that you can beat this algorithm with, of course. For example if I hand you everything in reverse, just extracting them all is better than this algorithm.) However I expect that finding the optimal strategy for a given permutation is an NP-complete problem.

List value reduction algorithm

Forgive me, but I am very confused and I cannot find any sources that are pointing me in the right direction.
Given list of n elements:
[3, 6, 5, 1]
Reduce the values to be no larger than the size of the list while keeping prioritization values relative to one another (In their original order).
Constraints:
Order must be maintained
Elements are >= 0
Distinct values
I am trying to stay away from sorting and creating a new list, but modifying the list in-place.
What my expected outcome should be:
[1, 3, 2, 0]
Is there an algorithm that exists for this problem?
You could do this in O(n^2).
Just go through the list n times, each time setting the minimum element among those that are >= i to i, where i starts at 0 and increments to n-1.
I suspect you're looking for something better than that, but I'm not sure how much better you can do in-place.
Example:
Input: 3 6 5 1
3 6 5 0*
1* 6 5 0
1 6 2* 0
1 3* 2 0
Note: this assumes elements are >= 0 and distinct
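The O(n^2) in-place pass described above can be sketched as follows, assuming distinct non-negative integers as noted. The function name `reduce_in_place` is made up for illustration.

```python
def reduce_in_place(values):
    """Replace each element by its rank (0 .. n-1) without building a new list.
    Assumes distinct non-negative integers; runs in O(n^2)."""
    n = len(values)
    for i in range(n):
        # Elements already rewritten hold ranks smaller than i, so the test
        # `v >= i` skips them; the smallest remaining element gets rank i.
        min_idx = None
        for j, v in enumerate(values):
            if v >= i and (min_idx is None or v < values[min_idx]):
                min_idx = j
        values[min_idx] = i
    return values

print(reduce_in_place([3, 6, 5, 1]))  # [1, 3, 2, 0]
```

The `v >= i` test is safe because, after i rounds, every untouched element is larger than i distinct non-negative integers and is therefore at least i.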
There may be one, but you don't need it if you think about the steps needed to solve this problem.
First, you know that no value in the array can be greater than 4, since that is the size in this particular example.
You need to go through each number in the array and, with an if condition, check whether the number is greater; if it is, then you'll need to decrement it until it meets the correct condition (in this case, that it is less than 4).
Perform these steps for each index of the array. As far as order goes, don't swap any indices, since you must retain the original order. Hope that helps!

which one will be faster?

Let's say I have an array of size 40, and the element I'm looking for is in position 38.
With a simple loop, it will take 38 steps, right?
But with 2 loops running in parallel, and a variable "found"
set to false that changes to true when the element is found:
the first loop will start from index 0,
the second loop will start from index 40.
So basically, it will take only 4 steps to find the element, right? The worst case will be if the element is in the middle of the array, right?
It depends how much work it takes to synchronize the state between the two threads.
If it takes 0 work, then this will be, on average, 50% faster than a straight through algorithm.
On the other hand, if it takes more work than X, it will start to get slower (which is very likely the case).
From an algorithm standpoint, I don't think this is how you want to go. Even with 2 threads it is still going to be O(n) runtime. You would want to sort the data (O(n log n)) and then do a binary search to get the data, especially since you can sort it once and reuse it for many searches.
If you're talking about algorithmic complexity, this is still a linear search. Just because you're searching two elements per iteration doesn't change the fact that the algorithm is O(n).
In terms of actual performance you would see, this algorithm is likely to be slower than a linear search with a single processor. Since very little work is done per-element in a search, the algorithm would be memory bound, so there would be no benefit to using multiple processors. Also, since you're searching in two locations, this algorithm would not be as cache efficient. And then, as bwawok points out, there would be a lot of time lost in synchronization.
When you are running in parallel you are dividing your CPU power in two and creating some overhead. If you mean you are running the search on, say, a multicore machine with your proposed algorithm, then the worst case is 20 steps. You are not making any change in the complexity class. So where are those 4 steps that you mentioned coming from?
On average there is no difference in runtime.
Take for example if you are searching for an item out of 10.
The original algorithm will process in the following search order:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
The worst case is the last item (taking 10 steps).
While the second algorithm will process in the following search order:
1, 3, 5, 7, 9, 10, 8, 6, 4, 2
The worst case in this scenario is item 6 (taking 10 steps).
There are some cases where algorithm 1 is faster.
There are some cases where algorithm 2 is faster.
Both take the same time on average - O(n).
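The two orders above can be reproduced with a short sketch. It models the two-ended search on a single processor, alternating one check from the front with one check from the back. The function names are made up. Entry k of each list is the step at which item k+1 gets checked.

```python
def linear_steps(n):
    """Step at which each of n items is checked by a front-to-back scan."""
    return list(range(1, n + 1))

def two_ended_steps(n):
    """Step at which each item is checked when alternating front and back."""
    steps = [0] * n
    lo, hi, step = 0, n - 1, 0
    while lo <= hi:
        step += 1
        steps[lo] = step      # check from the front
        lo += 1
        if lo <= hi:
            step += 1
            steps[hi] = step  # check from the back
            hi -= 1
    return steps

print(linear_steps(10))     # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(two_ended_steps(10))  # [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]
print(sum(linear_steps(10)) / 10, sum(two_ended_steps(10)) / 10)  # 5.5 5.5
```

Both lists are permutations of 1..n, so their averages are identical; only which item is the worst case changes.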
On a side note, it is interesting to compare this to a binary search order (on a sorted array).
4, 3, 2, 3, 1, 4, 3, 2, 4, 3
Taking at most 4 steps to complete.

Resources