Most common subset of size k - algorithm

Suppose you have a list of subsets S1,...,Sn of the integer range R={1,2,...,N}, and an integer k. Is there an efficient way to find a subset C of R of size k such that C is a subset of a maximal number of the Si?
As an example, let R={1,2,3,4} and k=2
S1={1,2,3}
S2={1,2,3}
S3={1,2,4}
S4={1,3,4}
Then I want to return either C={1,2} or C={1,3} (doesn't matter which).

I think your problem is NP-Hard. Consider the bipartite graph with the left nodes being your sets and the right nodes being the integers {1, ..., N}, with an edge between two nodes if the set contains the integer. Then, finding a common subset of size k, which is a subset of a maximal number of the Si, is equivalent to finding a complete bipartite subgraph K(i, k) with maximal number of edges i*k. If you could do this in polynomial time, then, you could find the complete bipartite subgraph K(i, j) with maximal number of edges i*j in polynomial time, by trying for each fixed k. But this problem in NP-Complete (Complete bipartite graph).
So, unless P=NP, your problem does not have a polynomial time algorithm.

Assuming I understand your question I believe this is straightforward for fairly small sets.
I will use Mathematica code for illustration, but the concept is universal.
I generate 10 random subsets of length 4, from the set {1 .. 8}:
ss = Subsets[Range#8, {4}] ~RandomSample~ 10
{{1, 3, 4, 6}, {2, 6, 7, 8}, {3, 5, 6, 7}, {2, 4, 6, 7}, {1, 4, 5, 8},
{2, 4, 6, 8}, {1, 2, 3, 8}, {1, 6, 7, 8}, {1, 2, 4, 7}, {1, 2, 5, 7}}
I convert these to a binary array of the presence of each number in each subset:
a = Normal#SparseArray[Join ## MapIndexed[Tuples[{##}] &, ss] -> 1];
Grid[a]
That is ten columns for ten subsets, and eight rows for elements {1 .. 8}.
Now generate all possible target subsets (size 3):
keys = Subsets[Union ## ss, {3}];
Take a "key" and extract those rows from the array and do a BitAnd operation (return 1 iff all columns equal 1), then count the number of ones. For example, for key {1, 6, 8} we have:
a[[{1, 6, 8}]]
After BitAnd:
Do this for each key:
counts = Tr[BitAnd ## a[[#]]] & /# keys;
Then find the position(s) of the maximum element of that list, and extract the corresponding parts of keys:
keys ~Extract~ Position[counts, Max#counts]
{{1, 2, 7}, {2, 4, 6}, {2, 4, 7}, {2, 6, 7}, {2, 6, 8}, {6, 7, 8}}
With adequate memory this process works quickly for a larger set. Starting with 50,000 randomly selected subsets of length 7 from {1 .. 30}:
ss = Subsets[Range#30, {7}] ~RandomSample~ 50000;
The maximum sub-subsets of length 4 are calculated in about nine seconds:
AbsoluteTiming[
a = Normal#SparseArray[Join ## MapIndexed[Tuples[{##}] &, ss] -> 1];
keys = Subsets[Union ## ss, {4}];
counts = Tr[BitAnd ## a[[#]]] & /# keys;
keys~Extract~Position[counts, Max#counts]
]
{8.8205045, {{2, 3, 4, 20},
{7, 10, 15, 18},
{7, 13, 16, 26},
{11, 21, 26, 28}}}
I should add that Mathematica is a high level language and these operations are on generic objects, therefore if this is done truly at the binary level this should be much faster, and more memory efficient.

I hope I don't misunderstand the problem... Here a solution in SWI-Prolog
:- module(subsets, [solve/0]).
:- [library(pairs),
library(aggregate)].
solve :-
problem(R, K, Subsets),
once(subset_of_maximal_number(R, K, Subsets, Subset)),
writeln(Subset).
problem(4, 2,
[[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
[[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
[2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).
subset_of_maximal_number(R, K, Subsets, Subset) :-
flatten(Subsets, Numbers),
findall(Num-Count,
( between(1, R, Num),
aggregate_all(count, member(Num, Numbers), Count)
), NumToCount),
transpose_pairs(NumToCount, CountToNumSortedR),
reverse(CountToNumSortedR, CountToNumSorted),
length(Subset, K), % list of free vars
prefix(SolutionsK, CountToNumSorted),
pairs_values(SolutionsK, Subset).
test output:
?- solve.
[1,3]
true ;
[7,6,2]
true.
edit: I think that the above solution is wrong, in the sense that what's returned could not be a subset of any of the input: here (a commented) solution without this problem:
:- module(subsets, [solve/0]).
:- [library(pairs),
library(aggregate),
library(ordsets)].
solve :-
problem(R, K, Subsets),
once(subset_of_maximal_number(R, K, Subsets, Subset)),
writeln(Subset).
problem(4, 2,
[[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
[[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
[2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).
subset_of_maximal_number(R, K, Subsets, Subset) :-
flatten(Subsets, Numbers),
findall(Num-Count,
( between(1, R, Num),
aggregate_all(count, member(Num, Numbers), Count)
), NumToCount),
% actually sort by ascending # of occurrences
transpose_pairs(NumToCount, CountToNumSorted),
pairs_values(CountToNumSorted, PreferredRev),
% we need higher values first
reverse(PreferredRev, Preferred),
% empty slots to fill, preferred first
length(SubsetP, K),
select_k(Preferred, SubsetP),
% verify our selection it's an actual subset of any of subsets
sort(SubsetP, Subset),
once((member(S, Subsets), ord_subtract(Subset, S, []))).
select_k(_Subset, []).
select_k(Subset, [E|R]) :-
select(E, Subset, WithoutE),
select_k(WithoutE, R).
test:
?- solve.
[1,3]
true ;
[2,6,7]
true.

Related

Writing a union operation in prolog for multisets

In the prolog there is a standard way to define union of two sets
But I want to write union function for multisets. That would mean if first set has [1,2] and
second has [2,3] then the output should be [1,2,2,3].
How do I go about writing such a function.
There is no type Set in Prolog. What you are referencing are just lists. The union predicate combines two lists with assumed unique elements to a new list where the elements are again unique. The order does not matter in a set. You see that when you pass a non-ordered/non-set list.
% both ordered, but result is not
?- union([1, 2, 5, 6], [3, 4, 7, 8], S).
S = [1, 2, 5, 6, 3, 4, 7, 8].
% sets, not ordered
?- union([1, 2, 3], [3, 2, 4, 5], S).
S = [1, 3, 2, 4, 5].
% multisets, not ordered
?- union([1, 2, 3], [3, 2, 4, 5, 5], S).
S = [1, 3, 2, 4, 5, 5].
To create a multiset, i.e. keep all elements in the set union, you can just combine the lists, while ordering them as you please. Formally, a multiset is not ordered as well, so if sorting does not matter, you can just append the second list to the first, which also works with your example:
?- append([1, 2, 4], [2, 3, 4, 5], S).
S = [1, 2, 4, 2, 3, 4, 5].
?- append([1, 2], [2, 3], S).
S = [1, 2, 2, 3].
If you can assume the lists to be ordered and you want that to be the case in the result, you can just merge them, which keeps the order:
?- merge([1, 2, 4], [2, 3, 4, 5], S).
S = [1, 2, 2, 3, 4, 4, 5].
If you can not assume the lists to be sorted, but you want to have the result sorted, you can also take care of the sorting yourself, e.g. with msort:
?- append([1, 2, 3], [3, 2, 4, 5, 5], S), msort(S, S1).
S = [1, 2, 3, 3, 2, 4, 5, 5],
S1 = [1, 2, 2, 3, 3, 4, 5, 5].
There are also more sorting predicates around, and if it's more complicated you can write one yourself, too.

Avoid findall overflow with n-fractions problem

I am trying to print all solutions of the n-fractions problem for n=4:
:- lib(ic).
fractions(Digits) :-
Digits = [A,B,C,D,E,F,G,H,I,J,K,L],
Digits #:: 1..9,
ic:alldifferent(Digits),
X #= 10*B+C,
Y #= 10*E+F,
Z #= 10*H+I,
V #= 10*K+L,
A*Y*Z*V + D*X*Z*V + G*X*Y*V + J*X*Y*Z #= X*Y*Z*V,
A*Y #=< D*X,
D*Z #=< G*Y,
G*V #=< J*Z,
search(Digits,0,input_order,indomain,complete,[]).
When I run the query:
?- findall(Digits,fractions(Digits),List).
I get the following exception:
*** Overflow of the local/control stack!
You can use the "-l kBytes" (LOCALSIZE) option to have a larger stack.
Peak sizes were: local stack 105728 kbytes, control stack 25344 kbytes
I am thinking if there is a way to loop inside the program and print one solution each time, or I can't do that because the problem has too many solutions?
As has been pointed out, your code fails because the alldifferent(Digits) constraint is too restrictive. The digits must be allowed to occur between 1 and 2 times. In eclipse-clp, you can use constraints such as atleast/3, atmost/3, occurrences/3 or gcc/2 to express this.
Slightly off-topic: as you are using ECLiPSe's ic-solver (which can handle continuous domains), you can actually use a model much closer to the original specification, without introducing lots of multiplications:
:- lib(ic).
:- lib(ic_global).
fractions4(Digits) :-
Digits = [A,B,C,D,E,F,G,H,I,J,K,L],
Digits #:: 1..9,
A/(10*B+C) + D/(10*E+F) + G/(10*H+I) + J/(10*K+L) $= 1,
( for(I,1,9), param(Digits) do
occurrences(I, Digits, NOcc), NOcc #:: 1..2
),
lex_le([A,B,C], [D,E,F]), % lex-ordering to eliminate symmetry
lex_le([D,E,F], [G,H,I]),
lex_le([G,H,I], [J,K,L]),
labeling(Digits).
Apart from the main equality constraint (using $= instead of #= because we don't want to require integrality here), I've used occurrences/3 for the occurrence restrictions, and lexicographic ordering constraints as a more standard way of eliminating symmetry. Result:
?- findall(Ds, fractions4(Ds), Dss), length(Dss, NSol).
Dss = [[1, 2, 4, 3, 5, 6, 8, 1, 4, 9, 2, 7], [1, 2, 6, 5, 3, 9, 7, 1, 4, 8, 2, 4], [1, 2, 6, 5, 3, 9, 7, 8, 4, 9, 1, 2], [1, 2, 6, 7, 3, 9, 8, 1, 3, 9, 5, 4], [1, 2, 6, 8, 7, 8, 9, 1, 3, 9, 5, 4], [1, 3, 4, 5, 4, 6, 8, 1, 7, 9, 2, 3], [1, 3, 4, 7, 5, 6, 8, 1, 7, 9, 2, 4], [1, 3, 4, 8, 1, 7, 8, 5, 2, 9, 2, ...], [1, 3, 5, 6, 2, 8, 7, 1, 4, 9, ...], [1, 3, 6, 5, 2, 4, 7, 1, 8, ...], [1, 3, 6, 5, 3, 6, 7, 8, ...], [1, 3, 6, 5, 4, 5, 8, ...], [1, 3, 6, 5, 6, 3, ...], [1, 3, 6, 6, 5, ...], [1, 3, 6, 7, ...], [1, 3, 9, ...], [1, 3, ...], [1, ...], [...], ...]
NSol = 1384
Yes (82.66s cpu)
An added advantage of this model is that it can be quite easily turned into a generic model for arbitrary N.
Simply your predicate fails. If you remove all the constraints except alldifferent/1 and search/6 (just to understand the problem) and call ?- fractions(Digits). you get false because it's impossible to have a list with 12 elements (Digits = [A,B,C,D,E,F,G,H,I,J,K,L]) with domain for each element Digits #:: 1..9 and constraint those elements to be all different (ic:alldifferent(Digits)). 9 options for 12 elements: unsolvable. If you expand the domain up to 12 (Digits #:: 1..12), you get a solution:
?- fractions(Digits).
Digits = [2, 3, 4, 9, 7, 10, 12, 8, 5, 11, 1, 6]
Yes (94.00s cpu, solution 1, maybe more)
Then you can apply findall/3 and see other solutions...
Many clpfd implementations offer global_cardinality constraints which I use in this example. In the following I use SICStus Prolog 4.5.0:
:- use_module(library(clpfd)).
fractions(Digits) :-
Digits = [A,B,C,D,E,F,G,H,I,J,K,L],
domain(Digits, 1, 9),
global_cardinality(Digits, [1-N1,2-N2,3-N3,4-N4,5-N5,6-N6,7-N7,8-N8,9-N9]),
domain([N1,N2,N3,N4,N5,N6,N7,N8,N9], 1, 2),
X #= 10*B+C,
Y #= 10*E+F,
Z #= 10*H+I,
V #= 10*K+L,
Z*V #= ZV,
X*Y #= XY,
A*Y*ZV + D*X*ZV + G*XY*V + J*XY*Z #= XY*ZV,
X #=< Y, X #= Y #=> A #=< D, % break some symmetries
Y #=< Z, Y #= Z #=> D #=< G,
Z #=< V, Z #= V #=> G #=< J.
Sample use:
| ?- n_fractions(4,Zs), labeling([enum],Zs).
Zs = [2,1,2,9,1,8,7,3,5,6,4,5] ? ;
Zs = [2,1,3,7,1,8,9,2,6,5,4,5] ? ;
Zs = [2,1,3,7,1,8,9,2,6,6,5,4] ? ;
...
no
Using prolog-findall for collecting all solutions works out all right, too:
?- findall(Zs,(n _fractions(4,Zs), labeling([enum],Zs)), Zss),
length(Zss, N_sols).
Zss = [[2,1,2,9,1,8,7,3,5|...],
[2,1,3,7,1,8,9,2,6|...],
[2,1,3,7,1,8,9,2|...],
[2,1,3,8,1,5,7|...],
[2,1,3,8,1,6|...],
[2,1,3,9,1|...],
[2,1,3,9|...],
[2,1,4|...],
[2,1|...],
[...|...]|...],
N_sols = 1384 ? ;
no

Counting number of paths between two nodes in Prolog program

I need some help for counting the number of combinations from which a destination node can be reached.
I found the program for finding the different paths. But in the end I need to have some query
%Edge List (Knowledge Base)
edge(1,2).
edge(1,4).
edge(2,4).
edge(3,6).
edge(3,7).
edge(4,3).
edge(4,5).
edge(5,6).
edge(5,7).
edge(6,5).
edge(7,5).
edge(8,6).
edge(8,7).
%Program
path(X,Y,[X,Y]):- edge(X,Y).
path(X,Y,[X|Xs]):- edge(X,W), path(W,Y,Xs).
-------------------------------------------------
%Query
path(1, 7, P).
%Results
Z = [1, 2, 4, 3, 6, 5, 7];
Z = [1, 2, 4, 3, 6, 5, 6, 5, 7];
.........................
But what if I want to run a query that gives me the number of these paths.
?-path(1, 7, count).
should return 2
First of all you're answer fall into cycles and does not terminate, you could keep a list of what you've visited in order to avoid visit same nodes twice:
path(X,Y,L):-path(X,Y,L,[X]).
path(X,Y,[X,Y],L):- \+member(Y,L),edge(X,Y).
path(X,Y,[X|Xs],L):- edge(X,W),\+ member(W,L) ,path(W,Y,Xs,[W|L]).
Now if you query:
?- path(1, 7, P).
P = [1, 2, 4, 3, 7] ;
P = [1, 2, 4, 3, 6, 5, 7] ;
P = [1, 2, 4, 5, 7] ;
P = [1, 4, 3, 7] ;
P = [1, 4, 3, 6, 5, 7] ;
P = [1, 4, 5, 7] ;
false.
So the valid paths are not 2, since the above six paths are valid.
Now to count the paths you could try:
findall(P, path(1,7,P), Paths), length(Paths, N).
as suggested in comments but this is not very efficient since you need first to build a list of all paths and count the length.
If you're using Swipl you could try a fail-driven loop to calculate all possible paths and use nb_getval/2 and nb_setval/2 in order to count:
count(X,Y):-
nb_setval(counter, 0),
path(X,Y,_),
nb_getval(counter, Value),
New_value is Value+1,
nb_setval(counter, New_value),
fail;
nb_getval(counter, Value),
write(Value).
Example:
?- count(1,7).
6
true.

How to calculate a matrix formed by vector in Mathematica

I need to obtain a matrix vvT formed by a column vector v. i.e. the column vector v matrix times its transpose.
I found Mathematica doesn't support column vector. Please help.
Does this do what you want?
v = List /# Range#5;
vT = Transpose[v];
vvT = v.vT;
v // MatrixForm
vT // MatrixForm
vvT // MatrixForm
To get {1, 2, 3, 4, 5} into {{1}, {2}, {3}, {4}, {5}} you can use any of:
List /# {1, 2, 3, 4, 5}
{ {1, 2, 3, 4, 5} }\[Transpose]
Partition[{1, 2, 3, 4, 5}, 1]
You may find one of these more convenient than the others. Usually on long lists you will find Partition to be the fastest.
Also, your specific operation can be done in different ways:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x]
Syntactically shortest:

Maximize function in Mathematica which counts elements

I reduced a debugging problem in Mathematica 8 to something similar to the following code:
f = Function[x,
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Count[list, x]
];
f[4]
Maximize{f[x], x, Integers]
Output:
4
{0, {x->0}}
So, while the maximum o function f is obtained when x equals 4 (as confirmed in the first output line), why does Maximize return x->0 (output line 2)?
The reason for this behavior can be easily found using Trace. What happens is that your function is evaluated inside Maximize with still symbolic x, and since your list does not contain symbol x, results in zero. Effectively, you call Maximize[0,x,Integers], hence the result. One thing you can do is to protect the function from immediate evaluation by using pattern-defined function with a restrictive pattern, like this for example:
Clear[ff];
ff[x_?IntegerQ] :=
With[{list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5}}, Count[list, x]]
It appears that Maximize can not easily deal with it however, but NMaximize can:
In[73]:= NMaximize[{ff[x], Element[x, Integers]}, x]
Out[73]= {4., {x -> 4}}
But, generally, either of the Maximize family functions seem not quite appropriate for the job. You may be better off by explicitly computing the maximum, for example like this:
In[78]:= list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Extract[#, Position[#, Max[#], 1, 1] &[#[[All, 2]]]] &[Tally[list]]
Out[79]= {{4, 4}}
HTH
Try this:
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
First#Sort[Tally[list], #1[[2]] > #2[[2]] &]
Output:
{4, 4}

Resources