Help me with this recursive combinatorial algorithm

Folks,
I have N bounded sets:
S1 = {s11, s12, ... s1a }
S2 = {s21, s22, ... s2b }
...
SN = {sN1, sN2, ... sNx }
I have a function f() that takes one argument A from each set:
f( A1, A2, ... AN ) such that Ax belongs to Sx
I need to invoke f() for all possible combinations of arguments:
f( s11, s21, ... sN1 )
f( s11, s21, ... sN2 )
f( s11, s21, ... sN3 )
...
f( s11, s21, ... sNx )
...
f( s1a, s2b, ... sNx )
Can someone help me figure out a recursive (or iterative) algorithm that will hit all combinations?
Thanks in advance.
-Raj

So basically you want to generate the Cartesian product S1 x S2 x ... x SN.
This is a classic application of backtracking / recursion. Here's what the pseudocode would look like:
function CartesianProduct(current, k)
    if (k == N + 1)
        current is one possibility, so call f(current[1], current[2], ..., current[N])
        and return
    for each element e in Sk
        call CartesianProduct(current + {e}, k + 1)
Initial call is CartesianProduct({}, 1)
You should write it on paper and see how it works. For example, consider the sets:
s1 = {1, 2}
s2 = {3, 4}
s3 = {5, 6}
The first call will be CartesianProduct({}, 1), which will then start iterating over the elements in the first set. The first recursive call will thus be CartesianProduct({1}, 2). This will go on in the same manner, eventually reaching CartesianProduct({1, 3, 5}, 4), for which the termination condition will be true (k == N + 1, at which point current holds N elements). Then it will backtrack and call CartesianProduct({1, 3, 6}, 4) and so on, until all possibilities are generated. Run it on paper all the way to see exactly how it works.
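The pseudocode above could be sketched in Python roughly as follows. This is only an illustration: the names cartesian_product, sets and f are assumptions, the sets are assumed to arrive as a list of sequences, and k is 0-based here, so the termination test is k == len(sets) rather than k == N + 1.

```python
def cartesian_product(sets, f, current=None, k=0):
    """Call f once for every way of picking one element from each set."""
    if current is None:
        current = []
    if k == len(sets):
        # current holds one element from every set: one complete combination
        f(*current)
        return
    for e in sets[k]:
        current.append(e)                        # choose e from set k
        cartesian_product(sets, f, current, k + 1)
        current.pop()                            # backtrack, try the next element
```

For the three example sets, cartesian_product([[1, 2], [3, 4], [5, 6]], print) calls print with 1 3 5, then 1 3 6, then 1 4 5, and so on, eight calls in total. Python's standard itertools.product yields the same tuples without explicit recursion.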
Extra credit: can you figure out how to get rid of the k parameter?

Related

Finding all subsets of specified size

I've been scratching my head about this for two days now and I cannot come up with a solution. What I'm looking for is a function f(s, n) such that it returns a set containing all subsets of s where the length of each subset is n.
Demo:
s={a, b, c, d}
f(s, 4)
{{a, b, c, d}}
f(s, 3)
{{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}}
f(s, 2)
{{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}}
f(s, 1)
{{a}, {b}, {c}, {d}}
I have a feeling that recursion is the way to go here. I've been fiddling with something like
f(S, n):
    for s in S:
        t = f( S-{s}, n-1 )
        ...
But this does not seem to do the trick. I did notice that len(f(s,n)) seems to be the binomial coefficient bin(len(s), n). I guess this could be utilized somehow.
Can you help me please?
Let us call n the size of the array and k the number of elements to be put in a subset.
Let us consider the first element A[0] of the array A.
If this element is put in the subset, the problem becomes a (n-1, k-1) similar problem.
If not, it becomes a (n-1, k) problem.
This can be simply implemented in a recursive function.
We just have to pay attention to deal with the extreme cases k == 0 or k > n.
During the process, we also have to keep track of:
n: the number of remaining elements of A to consider
k: the number of elements that remain to be put in the current subset
index: the index of the next element of A to consider
The current_subset array that memorizes the elements already selected.
Here is some simple C++ code to illustrate the algorithm
Output
For 5 elements and subsets of size 3:
3 4 5
2 4 5
2 3 5
2 3 4
1 4 5
1 3 5
1 3 4
1 2 5
1 2 4
1 2 3
#include <iostream>
#include <vector>

void print (const std::vector<std::vector<int>>& subsets) {
    for (auto &v: subsets) {
        for (auto &x: v) {
            std::cout << x << " ";
        }
        std::cout << "\n";
    }
}

// n: number of remaining elements of A to consider
// k: number of elements that remain to be put in the current subset
// index: index of next element of A to consider
void Get_subset_rec (std::vector<std::vector<int>>& subsets, int n, int k, int index, std::vector<int>& A, std::vector<int>& current_subset) {
    if (n < k) return;
    if (k == 0) {
        subsets.push_back (current_subset);
        return;
    }
    Get_subset_rec (subsets, n-1, k, index+1, A, current_subset);
    current_subset.push_back(A[index]);
    Get_subset_rec (subsets, n-1, k-1, index+1, A, current_subset);
    current_subset.pop_back(); // remove last element
    return;
}

void Get_subset (std::vector<std::vector<int>>& subsets, int subset_length, std::vector<int>& A) {
    std::vector<int> current_subset;
    Get_subset_rec (subsets, A.size(), subset_length, 0, A, current_subset);
}

int main () {
    int subset_length = 3; // subset size
    std::vector A = {1, 2, 3, 4, 5};
    int size = A.size();
    std::vector<std::vector<int>> subsets;
    Get_subset (subsets, subset_length, A);
    std::cout << subsets.size() << "\n";
    print (subsets);
}
One way to solve this is by backtracking. Here's a possible algorithm in Python-like pseudocode:
def backtrack(input_set, idx, partial_res, res, n):
    if len(partial_res) == n:
        res.append(partial_res[:])
        return
    if idx == len(input_set):  # no elements left to consider
        return
    partial_res.append(input_set[idx])
    backtrack(input_set, idx+1, partial_res, res, n) # path with the current element
    partial_res.pop()
    backtrack(input_set, idx+1, partial_res, res, n) # path without the current element
Time complexity of this approach is O(2^len(input_set)) since we make 2 branches at each element of input_set, regardless of whether the path leads to a valid result or not. The space complexity is O(len(input_set) choose n) since this is the number of valid subsets you get, as you correctly pointed out in your question.
Now, there is a way to optimize the above algorithm to reduce the time complexity to O(len(input_set) choose n) by pruning the recursive tree to paths that can lead to valid results only.
If n - len(partial_res) > len(input_set) - idx, we are sure that even if we took every remaining element in input_set[idx:] we would still be at least one short of n. So we can use this as a base case, return early, and prune that path.
Also, if n - len(partial_res) == len(input_set) - idx, we need each and every element in input_set[idx:] to reach the required length n. Thus we can't skip any element, and the second branch of our recursive call becomes redundant:
    backtrack(input_set, idx+1, partial_res, res, n) # path without the current element
We can skip this branch with a conditional check.
Implementing these base cases correctly reduces the time complexity of the algorithm to O(len(input_set) choose n), which is a hard limit because that's the number of subsets that there are.
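The two pruning rules described above might be put together as in the sketch below. It is an illustration under assumptions rather than the answerer's own code: the wrapper name subsets_of_size and the local names need and remaining are made up here, idx is 0-based, and input_set is assumed to be a list.

```python
def subsets_of_size(input_set, n):
    """Return all size-n subsets of input_set (a list), in lexicographic index order."""
    res = []
    if n < 0:
        return res

    def backtrack(idx, partial_res):
        need = n - len(partial_res)          # elements still missing
        remaining = len(input_set) - idx     # elements still available
        if need == 0:
            res.append(partial_res[:])
            return
        if need > remaining:                 # rule 1: can never reach n, prune
            return
        partial_res.append(input_set[idx])
        backtrack(idx + 1, partial_res)      # path with the current element
        partial_res.pop()
        if need < remaining:                 # rule 2: skipping is only allowed
            backtrack(idx + 1, partial_res)  # when there is an element to spare

    backtrack(0, [])
    return res
```

With these checks every explored path ends in a valid subset, so no branch is abandoned halfway. The standard library's itertools.combinations returns the same subsets as tuples.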
subseqs 0 _ = [[]]
subseqs k [] = []
subseqs k (x:xs) = map (x:) (subseqs (k-1) xs) ++ subseqs k xs
The function looks for subsequences of (non-negative) length k in a given sequence. There are three cases:
If the length is 0: there is a single empty subsequence in any sequence.
Otherwise, if the sequence is empty: there are no subsequences of any (positive) length k.
Otherwise, there is a non-empty sequence that starts with x and continues with xs, and a positive length k. All our subsequences are of two kinds: those that contain x (they are subsequences of xs of length k-1, with x stuck at the front of each one), and those that do not contain x (they are just subsequences of xs of length k).
The algorithm is a more or less literal translation of these notes to Haskell. Notation cheat sheet:
[] an empty list
[w] a list with a single element w
x:xs a list with a head of x and a tail of xs
(x:) a function that sticks an x in front of any list
++ list concatenation
f a b c a function f applied to arguments a b and c
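For readers who are not used to Haskell notation, the same three cases can be transcribed almost line for line into Python. This is a sketch that assumes xs is a list and k is non-negative; the name subseqs is borrowed from the Haskell version above.

```python
def subseqs(k, xs):
    """All length-k subsequences of the list xs, mirroring the three Haskell cases."""
    if k == 0:
        return [[]]                  # exactly one empty subsequence
    if not xs:
        return []                    # nothing of positive length in an empty list
    x, rest = xs[0], xs[1:]
    with_x = [[x] + s for s in subseqs(k - 1, rest)]   # map (x:) (subseqs (k-1) xs)
    without_x = subseqs(k, rest)                       # subseqs k xs
    return with_x + without_x
```

Unlike the Haskell list, the Python result is built eagerly, so the whole answer is held in memory at once.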
Here is a non-recursive python function that takes a list superset and returns a generator that produces all subsets of size k.
def subsets_k(superset, k):
    if k > len(superset):
        return
    if k == 0:
        yield []
        return
    indices = list(range(k))
    while True:
        yield [superset[i] for i in indices]
        i = k - 1
        while indices[i] == len(superset) - k + i:
            i -= 1
            if i == -1:
                return
        indices[i] += 1
        for j in range(i + 1, k):
            indices[j] = indices[i] + j - i
Testing it:
for s in subsets_k(['a', 'b', 'c', 'd', 'e'], 3):
    print(s)
Output:
['a', 'b', 'c']
['a', 'b', 'd']
['a', 'b', 'e']
['a', 'c', 'd']
['a', 'c', 'e']
['a', 'd', 'e']
['b', 'c', 'd']
['b', 'c', 'e']
['b', 'd', 'e']
['c', 'd', 'e']

Prolog subgroup of list of size n

I'm trying to create a rule to determine if a list is a sublist of size n of another list.
isSubgroup/3
isSubgroup(+Subgroup, +Group, +N)
For example, isSubgroup([1, 2, 4], [1, 2, 3, 4, 5], 3) would return True
However, isSubgroup([4, 2, 1], [1, 2, 3, 4, 5], 3) would return False (because of the different order)
I thought of checking for each member of the subgroup whether or not it's a member of the large group, but that would ignore the order.
Is the idea feasible?
Really, try to write an inductive relation. Meanwhile, library(yall) coupled with library(apply) can make a one-liner:
isSubgroup(S,G,N) :- length(S,N),
    foldl({G}/[E,P,X]>>(nth1(X,G,E),X>=P),S,1,_F).
As @WillemVanOnsem suggested, an inductive solution:
subGroups([], []).
subGroups([X|Xs], [X|Ys]):-
    subGroups(Xs, Ys).
subGroups(Xs, [_|Ys]):-
    subGroups(Xs, Ys).
subGroupsN(Options, N, Solution) :-
    length(Solution, N),
    subGroups(Solution, Options).
We can define this predicate by an inductive definition. A Subgroup is a subgroup of Group if:
the Subgroup is an empty list;
the first element of the Subgroup is the same as the first element of Group, and the rest of the Subgroup is a subgroup of the rest of the Group;
the Subgroup is a subgroup of the rest of the Group.
We need to update N accordingly such that, if the Subgroup is empty, then the length is 0:
isSubgroup([], _, 0).              %% (1)
isSubgroup([H|TS], [H|TG], N) :-   %% (2)
    N1 is N-1,
    isSubgroup(TS, TG, N1).
isSubgroup(S, [_|TG], N) :-        %% (3)
    isSubgroup(S, TG, N).
The above however results in duplicate trues for the same subgroup. This is due to the fact that we can satisfy the predicate in multiple ways. For example if we call:
isSubgroup([], [1,2], 0).
then it is satisfied through the fact (1), but the last clause (3) also calls isSubgroup([], [1], 0), which is in turn satisfied through the fact (1), and so on.
We can avoid this by making the last clause more restrictive:
isSubgroup([], _, 0).                %% (1)
isSubgroup([H|TS], [H|TG], N) :-     %% (2)
    N1 is N-1,
    isSubgroup(TS, TG, N1).
isSubgroup([HS|TS], [_|TG], N) :-    %% (3)
    isSubgroup([HS|TS], TG, N).
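For comparison, the fully grounded case (all three arguments given) can be checked with a single left-to-right pass. The Python sketch below is an assumption-laden illustration, not a translation of the Prolog search: the name is_subgroup and the variable pos are made up, and it matches greedily instead of backtracking, which gives the same yes/no answer for this order-preserving check.

```python
def is_subgroup(sub, group, n):
    """True if sub has length n and its items appear in group in the same order."""
    if len(sub) != n:
        return False
    pos = 0                              # index of the next item of sub to match
    for item in group:
        if pos < len(sub) and item == sub[pos]:
            pos += 1                     # like clause (2): heads match, advance both
        # otherwise like clause (3): skip this element of group
    return pos == len(sub)               # like fact (1): nothing left to match
```

Because it never backtracks, this version reports each answer once, but it cannot enumerate subgroups or groups the way the relational Prolog predicate can.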
The above works for the given "directions", where all arguments are ground and act as "input". But typically one wants to use a predicate in other directions as well. We can implement a version that also works when some arguments are used as "output", and that still makes use of tail-call optimization (TCO):
isSubgroup(S, G, N) :-
    isSubgroup(S, G, 0, N).
isSubgroup([], _, L, L).                %% (1)
isSubgroup([H|TS], [H|TG], L, N) :-     %% (2)
    L1 is L+1,
    isSubgroup(TS, TG, L1, N).
isSubgroup([HS|TS], [_|TG], L, N) :-    %% (3)
    isSubgroup([HS|TS], TG, L, N).
For example:
?- isSubgroup([1,4,2], G, N).
G = [1, 4, 2|_2974],
N = 3 ;
G = [1, 4, _2972, 2|_2986],
N = 3 ;
G = [1, 4, _2972, _2984, 2|_2998],
N = 3 ;
G = [1, 4, _2972, _2984, _2996, 2|_3010],
N = 3 .
Here Prolog is thus able to propose groups for which [1,4,2] is a subgroup, and it is capable of determining the length N of the subgroup.
We can query in the opposite direction as well:
?- isSubgroup(S, [1,4,2], N).
S = [],
N = 0 ;
S = [1],
N = 1 ;
S = [1, 4],
N = 2 ;
S = [1, 4, 2],
N = 3 ;
S = [1, 2],
N = 2 ;
S = [4],
N = 1 ;
S = [4, 2],
N = 2 ;
S = [2],
N = 1 ;
false.
Prolog can, for a given group [1,4,2] enumerate exhaustively all possible subgroups, together with N the length of that subgroup.

Prolog lists with lengths of constrained length [duplicate]

I'm using the clpfd library
?- use_module(library(clpfd)).
true.
Then I attempt to generate all 3 lists of length K with 1 <= K <= 3.
?- K in 1 .. 3, length(C, K).
K = 1,
C = [_1302] ;
K = 2,
C = [_1302, _1308] ;
K = 3,
C = [_1302, _1308, _1314] ;
ERROR: Out of global stack
I would expect the query to terminate after K = 3. For example, the following does terminate.
?- between(1, 3, K), length(X, K).
K = 1,
X = [_3618] ;
K = 2,
X = [_3618, _3624] ;
K = 3,
X = [_3618, _3624, _3630].
Why does one terminate and the other does not?
K in 1..3 simply asserts that K is somewhere between 1 and 3, without binding it to a particular value. What you need is the indomain(K) predicate, which backtracks over all values in K's domain:
K in 1..3, indomain(K), length(C, K).
The out-of-stack error in your example happens for the following reason: length(C, K) with neither of its arguments bound generates lists of different lengths, starting with 0, then 1, 2, 3, ...
Each time it generates a solution, it tries to bind a particular value to K, that is 0, 1, 2, ...
Now, because there are constraints applied to K, any attempt to bind a value greater than 3 will fail, meaning that length(C, K) will continue trying to find alternative solutions, that is, it will keep generating lists of length 4, 5, 6, ... and so on, all of which will be discarded. This process will continue until you exhaust your stack.
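A loose Python analogy may make the difference easier to see. It is only an analogy and does not model how clpfd or length/2 work internally; the function names below are invented for illustration. Filtering an endless stream of lengths never learns that no further value can pass, whereas enumerating the finite domain first stops by itself.

```python
from itertools import count, islice

def unbounded_lengths():
    """Stands in for length(C, K) with both arguments unbound: 0, 1, 2, ... forever."""
    return count(0)

def constrained(domain):
    # Like `K in 1..3, length(C, K)`: every generated length is tested against
    # the domain, but the generator is never told to stop producing candidates.
    return (k for k in unbounded_lengths() if k in domain)

def labelled(domain):
    # Like `indomain(K)` or `between(1, 3, K)`: the finite domain drives the search.
    return (k for k in sorted(domain))

first_three = list(islice(constrained({1, 2, 3}), 3))  # fine: stops after three hits
all_labelled = list(labelled({1, 2, 3}))               # fine: the domain is finite
# list(constrained({1, 2, 3})) would loop forever looking for a fourth value.
```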

Explain and Clarify Haskell Counting Sort

I am working through Cormen et al., Introduction to Algorithms, 3rd ed., but I also have an interest in Haskell. Section 8.2 (p. 194) covers counting sort. I was interested in how it and many other algorithms might be implemented in Haskell, as they often use array access and destructive update. I took a look at the implementation on RosettaCode (copied below) and I find it very difficult to follow.
import Data.Array
countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concatMap (uncurry $ flip replicate) count
  where count = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l
One of the things I like about Haskell is how algorithms can be very clear (e.g. Haskell quicksort examples), at least as a specification that hasn't been optimized. This seems very unclear and I wonder if it's necessarily so or just overdone.
Can someone
clarify what's going on here,
perhaps provide a more instructive and clear implementation, and
tackle whether this is actually implementing counting sort or if non-strictness (laziness) and immutability mean that this is actually some other sort disguised as counting sort?
It is indeed doing a counting sort. Here's a slightly rewritten version that I find easier to understand:
import Data.Array
countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concat [replicate times n | (n, times) <- counts]
  where counts = assocs (accumArray (+) 0 (lo, hi) [(i, 1) | i <- l])
Let's break it down step by step. We'll use the list [5, 3, 1, 2, 3, 4, 5].
*Main> [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(5,1),(3,1),(1,1),(2,1),(3,1),(4,1),(5,1)]
We're just taking every element of the list and turning it into a tuple with 1. This is the basis for our counts. Now we need a way to sum up those counts per element. This is where accumArray comes into play.
*Main> accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
array (1,5) [(1,1),(2,1),(3,2),(4,1),(5,2)]
The first parameter to accumArray is the operation to apply during accumulation (just simple addition for us). The second parameter is the starting value, and the third parameter is the bounds. So we end up with an array mapping numbers to their counts in the input.
Next we use assocs to get index/value tuples from the array:
*Main> assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(1,1),(2,1),(3,2),(4,1),(5,2)]
And then replicate to repeat each number based on its count:
*Main> [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[[1],[2],[3,3],[4],[5,5]]
Finally, we use concat to turn this list of lists into a single list:
*Main> concat [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[1,2,3,3,4,5,5]
In the actual function I wrote above, I used where to break up this one-liner.
I find list comprehension to be easier to deal with than map and concatMap, so those are the main changes I made in my version of the function.
(uncurry $ flip replicate) is a nice trick... flip replicate gives you a version of replicate that takes its arguments in the opposite order. uncurry takes that curried function and turns it into a function that takes a tuple as an argument instead. Together, those produce the same result as my list comprehension, which destructured the tuple and then passed its parameters in reverse order. I'm not familiar enough with Haskell to know if this is a common idiom, but for me, the list comprehension was easier to follow.
count, built on accumArray, works as a histogram builder. That is, for each number from lo to hi it returns how many times the number occurs in the argument list.
count :: (Ix i, Num e) => [i] -> i -> i -> [(i, e)]
count l lo hi = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l
count [6,2,1,6] 0 10 ==
[(0,0),(1,1),(2,1),(3,0),(4,0),(5,0),(6,2),(7,0),(8,0),(9,0),(10,0)]
The results of count are used to regenerate the original elements from this specification. That is done by replicating the fst element of each tuple snd times. This yields a list of lists, which are then concatenated together.
f l lo hi = map (uncurry $ flip replicate ) $ count l lo hi
f [6,2,1,6] 0 10 == [[],[1],[2],[],[],[],[6,6],[],[],[],[]]
Full solution is equivalent to
countingSort l lo hi = concat $ f l lo hi
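The same two stages, building a histogram and then expanding it, can be written imperatively, which is closer to the textbook presentation. The Python sketch below is an illustration under assumptions: the keys are taken to be integers in the inclusive range lo..hi, and the name counting_sort is made up here.

```python
def counting_sort(values, lo, hi):
    """Sort integers known to lie in lo..hi by counting occurrences."""
    counts = [0] * (hi - lo + 1)             # plays the role of accumArray (+) 0 (lo, hi)
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside the range {lo}..{hi}")
        counts[v - lo] += 1                  # accumulate one (v, 1) pair
    result = []
    for offset, times in enumerate(counts):  # plays the role of assocs
        result.extend([lo + offset] * times) # replicate times n, then concat
    return result
```

For example, counting_sort([5, 3, 1, 2, 3, 4, 5], 1, 5) returns [1, 2, 3, 3, 4, 5, 5], matching the GHCi session above.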

Prolog - List of sequence from f0 to fN

The question requires me to write a predicate seqList(N, L), which is satisfied when L is the list [f0, ..., fN],
where fN = fN-1 + fN-2 + fN-3.
My code compares the head of the given list against the sequence value and returns true or false.
seqList(_,[]).
seqList(N,[H|T]) :-
    N1 is N - 1,
    seq(N,H),
    seqList(N1,T).
However, it is only valid when the list is reversed,
e.g. seqList(3,[1,1,0,0]) returns true, but it should return true for
seqList(3,[0,0,1,1]). Is there any way for me to reverse the list and verify it correctly?
It seems that you want to generate N elements of a sequence f such that f(N) = f(N-1) + f(N-2) + f(N-3) where f(X) is the X-th element of the sequence list, 0-based. The three starting elements must be pre-set as part of the specification as well. You seem to be starting with [0,0,1, ...].
Using the approach from Lazy lists in Prolog?:
seqList(N,L):- N >= 3, !,
    L=[0,0,1|X], N3 is N-3, take(N3, seq(0,0,1), X-[], _).
next( seq(A,B,C), D, seq(B,C,D) ):- D is A+B+C.
Now all these functions can be fused and inlined, to arrive at one recursive definition.
But you can do it directly. You just need to write down the question, to get the solution back.
question(N,L):-
Since you start with 0,0,1, ... write it down:
L = [0, 0, 1 | X],
since the three elements are given, we only need to find out N-3 more. Write it down:
N3 is N-3,
you've now reduced the problem somewhat. You now need to find N-3 elements and put them into the X list. Use a worker predicate for that. It also must know the three preceding numbers at each step:
worker( N3, 0, 0, 1, X).
So just write down what the worker must know:
worker(N, A, B, C, X):-
if N is 0, we must stop. X then is an empty list. Write it down.
N = 0, X = [] .
Add another clause, for when N is greater than 0.
worker(N, A, B, C, X):-
N > 0,
We know that the next element is the sum of the three preceding numbers. Write that down.
D is A + B + C,
the next element in the list is the top element of our argument list (the last parameter). Write it down:
X = [D | X2 ],
now there is one fewer element to add. Write it down:
N2 is N - 1,
To find the rest of the list, the three last numbers are B, C, and D. Then the rest is found by worker in exactly the same way:
worker( N2, B, C, D, X2).
That's it. The question predicate is your solution. Rename it to your liking.
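The worker's bookkeeping (a countdown plus the three most recent numbers) can also be seen in a short loop. The Python sketch below mirrors the question predicate as written in this answer, so it returns N elements in total, starting from the assumed seed 0, 0, 1; the name seq_list is made up here.

```python
def seq_list(n):
    """First n elements of the sequence seeded with 0, 0, 1 (requires n >= 3)."""
    if n < 3:
        raise ValueError("this sketch only handles n >= 3, like the Prolog version")
    result = [0, 0, 1]                  # L = [0, 0, 1 | X]
    a, b, c = 0, 0, 1                   # the three preceding numbers
    for _ in range(n - 3):              # N3 is N-3: that many elements remain
        d = a + b + c                   # D is A + B + C
        result.append(d)                # X = [D | X2]
        a, b, c = b, c, d               # recurse with B, C, D
    return result
```

For example, seq_list(7) gives [0, 0, 1, 1, 2, 4, 7].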
