Get permutations in lexicographic order using Haskell - sorting

I'm working on problem 24 from Project Euler which is as follows:
A permutation is an ordered arrangement of objects. For example, 3124
is one possible permutation of the digits 1, 2, 3 and 4. If all of the
permutations are listed numerically or alphabetically, we call it
lexicographic order. The lexicographic permutations of 0, 1 and 2 are:
012 021 102 120 201 210
What is the millionth lexicographic permutation of the digits 0, 1, 2,
3, 4, 5, 6, 7, 8 and 9?
I am trying to solve this using Haskell, and started with a brute force approach:
import Data.List (permutations, sort)

main :: IO ()
main = print ((sort . permutations) [0..9] !! 999999)
But this takes way too long, and I'm wondering if it's because the program first gets all the permutations, then sorts them all, then finally takes the millionth element, which is a lot more work than it needs to do.
So I was thinking I could speed things up if I were to write a function that enumerates the permutations already in lexicographic order, so that we could just stop at the millionth element and have the answer.
The algorithm I have in mind is to first sort the input list x, then take the first (smallest) element and prepend it to the permutations of the remaining elements in lexicographic order. These orderings would be found by recursively calling the original function on the now already sorted tail of x (which means our original function should have a way of flagging whether or not the input list is sorted). Then we continue with the next largest element of x and so on until we have the full ordered list. Unfortunately I am still a Haskell beginner and my attempts to write this function have failed. Any hints as to how this might be done?

I have a thought that's too long for a comment, but isn't a working solution in its entirety. Still, it's a plan that should work nearly instantly.
Start with generating permutations in lexicographic order. This is easy to do with a recursive algorithm. First, select the least element available, and recursively generate permutations of the remaining elements, prepending the selected element to each permutation. Then select the second element lexicographically and continue on up.
For what it's worth, this is the standard-ish nondeterministic-select based permutation algorithm you often find in Haskell instructional materials, if the input list is sorted into increasing order. It's not the algorithm used by Data.List.permutations, which is designed to be faster and productive with infinite input.
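As a rough illustration of the recursion described above, here is a sketch written in Python rather than Haskell, purely to show the shape of the algorithm. The name `lex_permutations` is made up for this example, and the input is assumed to contain distinct, comparable elements.

```python
from itertools import islice


def lex_permutations(items):
    """Yield the permutations of `items` in lexicographic order.

    Assumes the elements are distinct and comparable.
    """
    items = sorted(items)
    if not items:
        yield []
        return
    for i, head in enumerate(items):
        # Select the i-th smallest element, then recurse on the rest,
        # which is still in increasing order.
        rest = items[:i] + items[i + 1:]
        for tail in lex_permutations(rest):
            yield [head] + tail


# Being a generator, it can be stopped as soon as the wanted element appears.
third = next(islice(lex_permutations([0, 1, 2]), 2, None))
print(third)  # [1, 0, 2]
```

Because nothing is sorted after generation, taking the element at a given index only costs as many steps as that index, which is the same benefit a lazy Haskell list would give.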
But you can do better than this. You don't need to generate all the permutations before the target one. You can skip ahead, and it turns out to be really easy.
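All you need to do is look at the zero-based index of the permutation you are targeting, let's call it k (so k = 999999 for the millionth permutation), and use it to index into the permutations. If the n inputs are sorted in increasing order, the first element of the result is the input at index q, followed by the permutation at index r of the remaining n - 1 elements, where (q, r) = divMod k (fact (n - 1)). This works because each choice of first element accounts for exactly (n - 1)! consecutive permutations.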
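The skip-ahead idea can be sketched as follows. This is a Python illustration, not Haskell; Python's `divmod` plays the role of `divMod`, the name `kth_permutation` is invented for this example, and the elements are assumed to be distinct with k in range.

```python
from math import factorial


def kth_permutation(items, k):
    """Return the k-th (zero-based) lexicographic permutation of `items`.

    Assumes distinct elements and 0 <= k < len(items)!.
    """
    pool = sorted(items)
    result = []
    while pool:
        # Each choice of first element accounts for (n - 1)! permutations,
        # so the quotient picks the element and the remainder indexes the rest.
        q, k = divmod(k, factorial(len(pool) - 1))
        result.append(pool.pop(q))
    return result


print(kth_permutation([0, 1, 2], 3))  # [1, 2, 0]
```

For the problem in the question, the call would be `kth_permutation(range(10), 999999)`, which needs only ten divisions instead of a million generated permutations.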
I'm sure there are ways to get it faster than this, but that should make it basically instant for small numbers like a million anyway.


Min number of Elements To generate all other elements using xor

I have n integers a_1, ..., a_n. I want to pick the minimum number of them such that all the others can be formed by xoring the picked ones.
For example, consider [1,2,3]: 1^3=2, so you don't need 2 in the array and can remove it, ending up with [1,3]. So the minimum number of elements is 2, and they can form all the original elements in the array by xoring any 2 of them. Would a greedy approach work here? Or DP?
Edit: To explain what I am thinking: the greedy approach I thought of relies on the fact that if a^b=c then a^c=b and b^c=a. First I delete all duplicates. Then I list, for each element, all the pairs it can form with another element to produce a third element of the array; this preprocessing takes O(n^3). Then I pick the element with the least contribution, delete it, and subtract 1 from the pair counts of the other elements. I repeat this until all elements have <=2 pairs, and then I stop. This also takes O(n^3), for a total of O(n^3). Does this greedy approach work? Is there a DP way to do it?

If n is bounded by 50, I think backtracking should work.
Suppose at some step we have already selected a subset S of numbers (that should produce all the others) and want to include a new number to that subset.
Then we can do the following:
1. Consider all remaining numbers R and include in S all numbers that can't be produced by others (in S and R).
2. Include in S a random (or "best" in some way) number from R.
3. Remove from R all numbers that can be produced by those in the updated S.
Also you should keep track of the current best solution and cut off all the branches that won't allow to get a better result.
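For very small inputs, the search can be sketched without any pruning by simply trying subsets in increasing size. Note that this is plain brute force rather than the backtracking with cut-offs described above, and it assumes the reading of the question in which every removed element must be the xor of exactly two kept elements. The name `min_xor_basis` is invented for this example.

```python
from itertools import combinations


def min_xor_basis(numbers):
    """Smallest subset whose pairwise xors produce every other element.

    Assumption: a dropped element must equal the xor of exactly two kept ones.
    Exponential time, so only suitable for tiny inputs.
    """
    values = sorted(set(numbers))  # duplicates are dropped first
    for size in range(len(values) + 1):
        for kept in combinations(values, size):
            pair_xors = {a ^ b for a, b in combinations(kept, 2)}
            if all(v in kept or v in pair_xors for v in values):
                return list(kept)
    return values  # not reached: keeping everything always works


print(min_xor_basis([1, 2, 3]))  # [1, 2]
```

The result [1, 2] has the same size as the [1, 3] from the question; several minimal subsets can exist and this sketch returns the first one it meets.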

Sorting algorithm based on subset inversion

I'm looking for a sorting algorithm based on subset inversion. It's like pancake sort, only instead of taking all the pancakes on top of the spatula, you can just invert any subset you want. Length of the subset doesn't matter.
Like this:
http://www.yourgenome.org/sites/default/files/illustrations/diagram/dna_mutations_inversion_yourgenome.png
So we can't simply swap numbers without inverting everything in between.
We're doing this to determine how one subspecies of fruitfly can mutate into the other. Both have the same genes but in a different order. The second subspecies' genome is 'sorted', i.e. the gene numbers are 1-25. The first subspecies genome is unsorted. Hence, we're looking for a sorting algorithm.
This is the "genome" we're looking at (though we should be able to have this work on all lists of numbers):
[23, 1, 2, 11, 24, 22, 19, 6, 10, 7, 25, 20, 5, 8, 18, 12, 13, 14, 15, 16, 17, 21, 3, 4, 9];
We're looking at two separate problems:
1) To sort a list of 25 numbers with the fewest inversions
2) To sort a list of 25 numbers with the fewest numbers moved
We also want to establish both upper and lower bounds for both.
We've already found a way to sort like this by just going from left to right, searching for the next lowest value and inverting everything in between, but we're absolutely certain we should be able to do this faster. However, we still haven't found any other methods so I'm asking for your help!
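A minimal sketch of that left-to-right method might look like the following. The function name and the bookkeeping of "numbers moved" (counted here as the total length of all reversed segments) are assumptions made for illustration, so the counts it prints are not guaranteed to match any figures quoted in this question.

```python
def greedy_reversal_sort(genome):
    """Sort by reversals, fixing one position at a time from the left.

    Returns the sorted list and the (start, end) index pairs that were reversed.
    """
    a = list(genome)
    reversals = []
    for i in range(len(a)):
        j = a.index(min(a[i:]), i)  # position of the next lowest value
        if j != i:
            a[i:j + 1] = a[i:j + 1][::-1]  # invert everything in between
            reversals.append((i, j))
    return a, reversals


genome = [23, 1, 2, 11, 24, 22, 19, 6, 10, 7, 25, 20, 5,
          8, 18, 12, 13, 14, 15, 16, 17, 21, 3, 4, 9]
result, reversals = greedy_reversal_sort(genome)
steps = len(reversals)
genes_moved = sum(end - start + 1 for start, end in reversals)
print(steps, genes_moved)
```

Since each step fixes at least one position and the last position falls into place by itself, this method never needs more than n - 1 reversals, which gives a simple upper bound for the first problem.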
UPDATE: the method we currently use is based on the above method
but instead works both ways. It looks at the next elements needed
for both ends (e.g. 1 and 25 at the beginning) and then calculates
which inversion would be cheapest. All values at the ends can be
ignored for the rest of the algorithm because they get put into the
correct place immediately. Our first method took 18/19 steps and 148
genes, and this one does it in 17 steps and 101 genes. For both
optimisation criteria (the two mentioned above), this is a better
method. It is, however, not cheaper in terms of code and processing.
Right now, we're working in Python because we have most experience with that, but I'd be happy with any pseudocode ideas on how we can more efficiently tackle this. If you think another language might be better suited, please let me know. Pseudocode, ideas, thoughts and actual code are all welcome!
Thanks in advance!

Regarding the first question: Do you know (and care about) which of the two strands the genes are on?
If so, you're in luck: This is called the inversion distance between signed permutations problem, and there is a linear-time algorithm for it: http://www.ncbi.nlm.nih.gov/pubmed/11694179. I haven't looked at the details.
If not, then unfortunately (as described on p. 2 of that paper) the problem is NP-hard, so it's very unlikely that any algorithm exists that is efficient (polynomial-time) in the worst case.
Regarding the second question: Assuming you mean that you want to find the minimum number of swaps needed to sort a list of numbers, you should be able to find solutions to this by searching here on SO and elsewhere. I think this is a clear and concise explanation. You can also use the optimal solution to this problem to get an upper bound for your first question: Any swap of positions i and j can be simulated using the two interval reversals (i, j) and (i+1, j-1). (This upper bound might be very bad, though, and in particular could be worse than your existing greedy algorithm.)
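The claim that a swap can be simulated by two interval reversals can be checked with a small sketch. The helper names here are hypothetical, and the intervals are taken to be inclusive at both ends.

```python
def reverse_interval(a, i, j):
    """Reverse a[i..j] in place (both ends inclusive)."""
    a[i:j + 1] = a[i:j + 1][::-1]


def swap_by_reversals(a, i, j):
    """Swap a[i] and a[j] (with i < j) using at most two interval reversals."""
    reverse_interval(a, i, j)  # puts a[i] and a[j] in each other's place
    if i + 1 < j - 1:
        # The first reversal also flipped the middle; flip it back.
        reverse_interval(a, i + 1, j - 1)


values = [10, 20, 30, 40, 50]
swap_by_reversals(values, 1, 4)
print(values)  # [10, 50, 30, 40, 20]
```

So a solution with s swaps translates into at most 2s reversals, which is where the upper bound comes from.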

I think what you're looking for in the second question is the minimum number of swaps of adjacent elements needed to sort a sequence, which is equal to the number of inversions in the sequence (pairs where a[i] > a[j] and i < j).
The first question seems quite a bit more complicated to me. One potential heuristic might be to think of the subset inversion as similar to the adjacent swap of more than one element. For example, if you've managed to get a sequence to this position,
5,6,1,2,3,4,7,8
we can "adjacent swap" indexes [0,1] with [2,3] (so inverting [0,1,2,3]),
2,1,6,5,3,4,7,8
and then [2,3] with [4,5] (inverting [2,3,4,5]),
2,1,4,3,5,6,7,8
and arrive at a sequence that now has significantly fewer element inversions, meaning fewer single adjacent swaps are needed to complete the sort.
So maybe attempting to quantify inversions (in the sense of a[i] > a[j] and i < j) of sections rather than single elements could help move in the direction of estimating or building a method for the first question.
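To make the example above concrete, the element inversions can be counted before and after each interval reversal. This is only a sketch with a simple quadratic count, and both function names are made up for the illustration.

```python
def count_inversions(a):
    """Number of pairs i < j with a[i] > a[j] (simple O(n^2) count)."""
    return sum(1
               for i in range(len(a))
               for j in range(i + 1, len(a))
               if a[i] > a[j])


def with_interval_reversed(a, i, j):
    """Return a copy of `a` with a[i..j] (inclusive) reversed."""
    return a[:i] + a[i:j + 1][::-1] + a[j + 1:]


seq = [5, 6, 1, 2, 3, 4, 7, 8]
print(count_inversions(seq))             # 8
seq = with_interval_reversed(seq, 0, 3)  # [2, 1, 6, 5, 3, 4, 7, 8]
print(count_inversions(seq))             # 6
seq = with_interval_reversed(seq, 2, 5)  # [2, 1, 4, 3, 5, 6, 7, 8]
print(count_inversions(seq))             # 2
```

A heuristic along these lines could, for instance, score candidate reversals by how much they reduce this count, though whether that leads to good reversal sequences in general is not established here.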

Algorithm to find first sequence of integers that sum to certain value

I have a list of numbers and I have a sum value. For instance,
list = [1, 2, 3, 5, 7, 11, 10, 23, 24, 54, 79 ]
sum = 20
I would like to generate a sequence of numbers taken from that list, such that the sequence sums up to that target. In order to help achieve this, the sequence can be of any length and repetition is allowed.
result = [2, 3, 5, 10], or result = [1, 1, 2, 3, 3, 5, 5], or result = [10, 10]
I've been doing a lot of research into this problem and have found the subset sum problem to be of interest. My problem is, in a few ways, similar to the subset sum problem in that I would like to find a subset of numbers that produces the targeted sum.
However, unlike the subset sum problem which finds all sets of numbers that sum up to the target (and so runs in exponential time if brute forcing), I only want to find one set of numbers. I want to find the first set that gives me the sum. So, in a certain sense, speed is a factor.
Additionally, I would like there to be some degree of randomness (or pseudo-randomness) to the algorithm. That is, should I run the algorithm using the same list and sum multiple times, I should get a different set of numbers each time.
What would be the best algorithm to achieve this?
Additional Notes:
What I've achieved so far is using a naive method where I cycle through the list adding it to every combination of values. This obviously takes a long time and I'm currently not feeling too happy about it. I'm hoping there is a better way to do this!
If there is no sequence that gives me the exact sum, I'm satisfied with a sequence that gives me a sum that is as close as possible to the targeted sum.

As others said, this is an NP problem.
However, this doesn't mean small improvements aren't possible:
Is 1 in the list? [1,1,1,1...] is the solution. O(1) in a sorted list
Remove list elements bigger than the target sum. O(n)
Is there any list element x with (sum%x)==0 ? Again, easy solution. O(n)
Are there any list elements x,y with (x%y)==0 ? Remove x. O(n^2)
(maybe even: Are there any list elements x,y,z with (x%y)==z ? Remove x. Or with (x+y)==z ? Remove z. O(n^3))
Before using the full recursion, check whether you can get the sum
just with the smallest even and smallest odd number.
...
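Two of the cheap checks above can be sketched as follows, assuming positive integers and repetition allowed as in the question. The function names are hypothetical, and the O(n^3) rule and the even/odd shortcut are left out.

```python
def quick_solution(values, target):
    """Return a trivial answer if some element divides the target, else None."""
    for x in values:
        if 0 < x <= target and target % x == 0:
            return [x] * (target // x)
    return None


def prune(values, target):
    """Drop elements that are too large or are multiples of a smaller element."""
    candidates = sorted({x for x in values if 0 < x <= target})
    kept = []
    for x in candidates:
        # A multiple of a kept value can always be replaced by repeats of it.
        if not any(x % y == 0 for y in kept):
            kept.append(x)
    return kept


print(quick_solution([4, 6, 9, 12, 25], 20))  # [4, 4, 4, 4, 4]
print(prune([4, 6, 9, 12, 25], 20))           # [4, 6, 9]
```

Pruning does not change which sums are reachable, it only shrinks the list handed to the expensive search.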

The Subset Sum problem isn't about finding all subsets, but rather about determining whether there is some subset. It is a decision problem. All problems in NP are like this. And even this simpler problem is NP-complete.
This means that if you want an exact answer (the subset must sum to exactly some value) you won't be able to do much better than any subset sum algorithm (it is exponential unless P=NP).

I would attempt to reduce the problem to a brute-force search of a smaller set.
Sort the list smallest to largest.
Keep a sum and result list.
Repeat {
Draw randomly from the subset of list no greater than target - sum.
Increment sum by drawn value, add drawn value to result list.
} until list[0] > target - sum or sum == target
If sum != target, brute-force search for small combinations from list that can replace small combinations of result so as to close the gap target - sum.
This approach may fail to find valid solutions, even if they exist. It can, however, quickly find a solution or quickly fail before having to resort to a slower brute force approach using the entire set at a greater depth.
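The random drawing phase of this approach can be sketched like this. It assumes positive integers, tracks the remaining gap rather than the running sum, and leaves out the brute-force repair step; the name `random_fill` and the `rng` parameter are made up for the example.

```python
import random


def random_fill(values, target, rng=random):
    """Randomly draw values that still fit until nothing fits any more.

    Returns the drawn values and the gap left to the target (0 on success).
    """
    values = sorted(v for v in values if v > 0)
    result, remaining = [], target
    # Stop when even the smallest value no longer fits (or the gap is closed).
    while values and values[0] <= remaining:
        candidates = [v for v in values if v <= remaining]
        pick = rng.choice(candidates)
        result.append(pick)
        remaining -= pick
    return result, remaining


numbers = [1, 2, 3, 5, 7, 11, 10, 23, 24, 54, 79]
drawn, gap = random_fill(numbers, 20)
print(drawn, gap)
```

With 1 in the list the gap always closes, so repeated runs give different exact answers. Without it, a nonzero gap is where the slower brute-force step would take over, or where the result could be accepted as a close-enough sum.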

This is a greedy approach to the problem:
Without 'randomness':
Obtain the single largest number in the set that is smaller than your desired sum; we'll name it X. Given that the list is ordered, this is O(1) at best, and O(N) at worst if the sum is 2.
As you can repeat the value, say c times, do so as many times as gets you closest to the sum, but be careful! Create a range of values: essentially, you'll now be finding another sum, namely numbers that add up to R = (sum - X * c). So find the largest number smaller than R. Check if R - (number you just found) = 0 or if any [R - (number you just found)] % (smaller #s) == 0.
If R > 0 remains, make partial sums of the smaller numbers less than R (this will not be more than 5 ~ 10 computations because of the nature of this algorithm). See if these would then satisfy it.
If that step makes R < 0, remove one X and start the process again.
With 'randomness':
Just get X randomly! :-)
Note: This would work best if you have a few single digit numbers.

From the given array of numbers find all groups of 3 numbers with sum value N

Given is an array of numbers:
1, 2, 8, 6, 9, 0, 4
We need to find all groups of three numbers which sum to a value N (say 11 in this example). Here, the possible groups of three are:
{1,2,8}, {1,4,6}, {0,2,9}
The first solution I could think of was O(n^3). Later I could improve it a little, to O(n^2 log n), with this approach:
1. Sort the array.
2. Select any two numbers and perform a binary search for the third element.
Can it be improved further with some other approaches?

You can certainly do it in O(n^2): for each i in the array, test whether two other values sum to N-i.
You can test in O(n) whether two values in a sorted array sum to k by sweeping from both ends at once. If the sum of the two elements you're on is too big, decrement the "right-to-left" index to make it smaller. If the sum is too small, increment the "left-to-right" index to make it bigger. If there's a pair that works, you'll find them, and you perform at most 2*n iterations before you run out of road at one end or the other. You might need code to ignore the value you're using as i, depends what the rules are.
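The two-ended sweep can be sketched as follows. This version assumes the numbers are distinct and avoids reusing the value chosen as i by only sweeping over the elements to its right; the function name is invented for the example.

```python
def three_sum(numbers, target):
    """Return all triples (a, b, c) with a < b < c and a + b + c == target.

    Assumes the numbers are distinct; duplicates would need extra handling.
    """
    a = sorted(numbers)
    triples = []
    for i in range(len(a) - 2):
        lo, hi = i + 1, len(a) - 1  # sweep the part to the right of i
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == target:
                triples.append((a[i], a[lo], a[hi]))
                lo += 1
                hi -= 1
            elif s < target:
                lo += 1  # sum too small: move the left index up
            else:
                hi -= 1  # sum too big: move the right index down
    return triples


print(three_sum([1, 2, 8, 6, 9, 0, 4], 11))
# [(0, 2, 9), (1, 2, 8), (1, 4, 6)]
```

Sorting costs O(n log n) and each of the n sweeps is O(n), so the total is O(n^2).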
You could instead use some kind of dynamic programming, working down from N, and you probably end up with time something like O(n*N) or so. Realistically I don't think that's any better: it looks like all your numbers are non-negative, so if n is much bigger than N then before you start you can quickly throw out any large values from the array, and also any duplicates beyond 3 copies of each value (or 2 copies, as long as you check whether 3*i == N before discarding the 3rd copy of i). After that step, n is O(N).

Divide list into two equal parts algorithm

Related questions:
Algorithm to Divide a list of numbers into 2 equal sum lists
divide list in two parts that their sum closest to each other
Let's assume I have a list which contains exactly 2k elements. Now, I want to split it into two parts, where each part has a length of k, while trying to make the sums of the parts as equal as possible.
Quick example:
[3, 4, 4, 1, 2, 1] might be split into [1, 4, 3] and [1, 2, 4], and the sum difference will be 1
Now, if the parts can have arbitrary lengths, this is a variation of the Partition problem and we know that it's weakly NP-complete.
But does the restriction about splitting the list into equal parts (let's say it's always k and 2k) make this problem solvable in polynomial time? Any proofs to that (or a proof scheme for the fact that it's still NP)?

It is still NP-complete. Proof by reduction of PP (your full variation of the Partition problem) to QPP (the equal parts partition problem):
Take an arbitrary list of length k plus k additional elements all valued zero.
We need to find the best performing partition in terms of PP. Any unrestricted partition of the original k elements can be turned into an equal-size partition of the padded list by handing out the zeros so that both parts have k elements, and this leaves both sums unchanged. So let us find a best equal-size partition using an algorithm for QPP and forget about the k additional zero elements. Shifting zeroes around cannot affect this or any competing partition, so this is still one of the best performing unrestricted partitions of the arbitrary list of length k.
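The reduction can be illustrated with a brute-force stand-in for the QPP algorithm. Both function names are hypothetical, and the exhaustive search is there only to make the sketch runnable; the point is the zero padding in the second function.

```python
from itertools import combinations


def best_equal_split(values):
    """Brute-force QPP: indices of one half of the most balanced equal-size split."""
    n = len(values)
    total = sum(values)
    return min(combinations(range(n), n // 2),
               key=lambda idx: abs(total - 2 * sum(values[i] for i in idx)))


def best_split_via_padding(values):
    """Solve the unrestricted problem by padding with zeros and calling QPP."""
    n = len(values)
    padded = list(values) + [0] * n  # k original elements plus k zeros
    chosen = set(best_equal_split(padded))
    # Forget the padding: keep only indices that refer to original elements.
    left = [values[i] for i in range(n) if i in chosen]
    right = [values[i] for i in range(n) if i not in chosen]
    return left, right


left, right = best_split_via_padding([5, 1, 1, 1, 1, 1])
print(abs(sum(left) - sum(right)))  # 0
```

For this input an equal-size split of the original six numbers cannot do better than a difference of 4, while the padded instance recovers the unrestricted optimum of 0, for example [5] against [1, 1, 1, 1, 1].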
