Capture matching substrings from a DCG - prolog

Using regular expressions makes it quite easy to capture sub strings, e.g. the string "Jaco was an American bassist" matches this regular expression (PCRE2 syntax):
(?sm)^([Jj]aco).+(was|is).+?(American|famous).+(dancer|bassist|singer|programmer|dueller)
and captures these strings
Jaco
was
American
bassist.
Here is a DCG that matches the string as well as generating all the possible strings. But it doesn't capture the specific sub strings.
jaco_bassist --> ("J" ; "j"), "aco", space, ("was" ; "is"), space, ("a" ; "an"), space,
("American" ; "famous"), space,
("dancer" ; "bassist" ; "singer" ; "programmer" ; "dueller").
space --> " ".
What would be the best - or at last a good - way of getting the same captures using Prolog's DCGs. Preferably an approach that also generates the possible strings.
For simple problems like this one can use member/2 to enumerate all the alternatives:
jaco_bassist2([Name,WasIs,Adj,Noun]) --> who(Name), space, was_is(WasIs), space,
("a" ; "an"), space, adj(Adj), space,
noun(Noun).
who(Who) --> [Who], {member(Who,["Jaco","jaco"])}.
was_is(WasIs) --> [WasIs], {member(WasIs,["was","is"])}.
adj(Adj) --> [Adj], {member(Adj,["American","famous"])}.
noun(Noun) --> [Noun], {member(Noun,["dancer","bassist","singer","programmer","dueller"])}.
To get the captures:
% ...
phrase(jaco_bassist2,[Who,WasIs,Adj,Noun], String)
A major drawback of this approach is that for more complex structures the enumeration can be a little tricky, for example if the name in the subject string instead of "[Jj]aco" would be one of the 48 spellings of my last name (kjellerstrand):
kjellerstrand --> "k", ("je" ; "ä"), "ll", ("" ; "er" ; "ar"),
("st" ; "b"), ("" ; "r"), "a", (""; "n"), "d".
Please note that I'm looking for "basic" DCG, for example those supported by e.g. B-Prolog (i.e. not requiring SWI-Prolog's fancy DCG stuff).

Let me re-phrase that: Given a goal phrase(NT__0, Cs0,Cs), capture the sequence described by NT__0.
First of all we need to restrict ourselves to DCGs without semicontext. For a (non-empty) semicontext may be represented with two variables (which in that context do not form a difference) but cannot be captured with a single list.
append(Capture, Cs, Cs0) should be it. At least declaratively when considering only ground terms.
as --> "" | "a", as.
?- Cs = [], phrase(as, Cs0,Cs), append(Capture, Cs, Cs0).
Cs = [], Cs0 = [], Capture = []
; Cs = [], Cs0 = "a", Capture = "a"
; Cs = [], Cs0 = "aa", Capture = "aa"
; Cs = [], Cs0 = "aaa", Capture = "aaa"
; Cs = [], Cs0 = "aaaa", Capture = "aaaa"
; ... .
?- phrase(as, Cs0,Cs), append(Capture, Cs, Cs0).
Cs0 = Cs, Capture = []
; Cs0 = [_A|Cs0], Cs = [_A|Cs0], Capture = [_A], unexpected
; Cs0 = [_A,_B|Cs0], Cs = [_A,_B|Cs0], Capture = [_A,_B], unexpected
; ... .
?- set_prolog_flag(occurs_check,true).
true.
?- phrase(as, Cs0,Cs), append(Capture, Cs, Cs0).
Cs0 = Cs, Capture = []
; loops, unexpected.
So far, the procedural reality of Prolog is a bit different. append/3 only works for lists but not for partial lists. There infinite, rational trees show up. And the occurs-check does not help that much, it just prevents the display of such answers, but keeps non-termination.
Time for a new version of append/3, append2u/3
?- set_prolog_flag(occurs_check,false).
true.
?- phrase(as, Cs0,Cs), append2u(Capture, Cs, Cs0).
Cs0 = Cs, Capture = []
; Cs0 = [a|Cs0], Cs = [a|Cs0], Capture = [], unexpected
; Cs0 = [a|Cs], Capture = "a", dif:dif(Cs,[a|Cs])
; Cs0 = [a,a|Cs0], Cs = [a,a|Cs0], Capture = [], unexpected
; Cs0 = [a,a|Cs], Capture = "aa", dif:dif(Cs,[a,a|Cs])
; Cs0 = [a,a,a|Cs0], Cs = [a,a,a|Cs0], Capture = [], unexpected
; ... .
?- set_prolog_flag(occurs_check,true).
true.
?- phrase(as, Cs0,Cs), append2u(Capture, Cs, Cs0).
Cs0 = Cs, Capture = []
; Cs0 = [a|Cs], Capture = "a"
; Cs0 = [a,a|Cs], Capture = "aa"
; Cs0 = [a,a,a|Cs], Capture = "aaa"
; ... .
So with the help of the occurs-check it is possible to get this right, also for the more general case. A new non-terminal phrase_capture//2 now uses the following internal definition:
phrase_capture(NT__0, Capture, S0,S) :-
phrase(NT__0, S0,S1),
append2u(Capture, S1, S0),
S1 = S.
For systems without a built-in occurs-check like B, rewrite append2u/3 using unify_with_occurs_check/2 explicitly. That is, also for (\=)/2.
Some further optimizations may be done to avoid costs that depend on the size of Cs0+Cs instead of the length of Capture. Like special casing for var(Cs), Cs == [], and partial strings. If Cs is a list constructor, an internal implementation may also just skip through Cs0 to find that very address of Cs first, and only resort to more costly means otherwise. Care must be given to ensure that this is always terminating, thus using mechanisms similar to '$skip_max_list'/4.
Also, what to do if Cs0 and Cs do not fit, that is, if they are not the result of a valid grammar. Such a case may happen with generalizations to explain unexpected failure.
Usage:
jaco_bassist([Name,WasIs,Adj,Noun]) -->
phrase_capture( (("J" ; "j"), "aco"), Name),
space,
phrase_capture( ("was" ; "is"), WasIs),
space,
("a" ; "an"),
space,
phrase_capture( ("American" ; "famous"), Adj),
space,
phrase_capture( ("dancer" ; "bassist" ; "singer" ; "programmer" ; "dueller"), Noun).
?- phrase(jaco_bassist(D), Ys).
D = ["Jaco","was","American","dancer"], Ys = "Jaco was a American ..."
; D = ["Jaco","was","American","bassist"], Ys = "Jaco was a American ..."
; D = ["Jaco","was","American","singer"], Ys = "Jaco was a American ..."
; ...
; D = ["jaco","is","famous","dueller"], Ys = "jaco is an famous d ...".
So this version terminates also when generating strings. And it has the potential to incur costs that are in many cases only depending on the length of the captured string. The original version using append/3 will always visit the entire string.
Lest I forget, there will always be some oddities should you be into infinite lists. Think of:
?- phrase("abc",L0,L0).
L0 = [a,b,c|L0].
?- phrase("abc",L0,L0), phrase(phrase_capture("abc",Capture),L0,L).
L0 = [a,b,c|L0], Capture = [], L = [a,b,c|L0], unexpected.
L0 = [a,b,c|L0], Capture = "abc", L = [a,b,c|L0]. % expected
These are all typical paradoxa that infinite lists ensue. First luring people into them only to smother them.
The following version of phrase_capture//2 does not rely on internal details. It uses the ^s of library(lambda) which are responsible for parameter passing only. (The other lambda-related construct \ is for renaming.)
phrase_capture(NT__0, Capture) -->
call(S0^S0^true),
NT__0,
call(S1^S1^true),
{append2u(Capture, S1, S0)}.

Isn't this as simple as:
% Show lists of codes as text (if 3 chars or longer)
:- portray_text(true).
sentence([P, T]) --> person(P), space, tense(T).
person(N, DL, T) :-
member(N, [`Jaco`, `jaco`]),
list_to_dl(N, DL, T).
tense(N, DL, T) :-
member(N, [`was`, `is`]),
list_to_dl(N, DL, T).
space --> " ".
list_to_dl([], T, T).
list_to_dl([H|T], [H|T2], Tail) :-
list_to_dl(T, T2, Tail).
Results in swi-prolog (so you'll have to tweak the quoting to suit your Prolog system):
?- time(phrase(sentence(S), `Jaco is`)).
% 25 inferences, 0.000 CPU in 0.000 seconds (85% CPU, 797728 Lips)
S = [`Jaco`,[105,115]] ;
% 7 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 311360 Lips)
false.
... and it can generate:
?- time(phrase(sentence(S), L)).
% 24 inferences, 0.000 CPU in 0.000 seconds (85% CPU, 767043 Lips)
S = [`Jaco`,`was`],
L = `Jaco was` ;
% 7 inferences, 0.000 CPU in 0.000 seconds (75% CPU, 392971 Lips)
S = [`Jaco`,[105,115]],
L = `Jaco is` ;
% 17 inferences, 0.000 CPU in 0.000 seconds (82% CPU, 667504 Lips)
S = [`jaco`,`was`],
L = `jaco was` ;
% 8 inferences, 0.000 CPU in 0.000 seconds (62% CPU, 460750 Lips)
S = [`jaco`,[105,115]],
L = `jaco is`.
To handle the surname - can use term expansion to automate the string duplication:
sentence([P, T, SN]) -->
dcg(person, P), space, dcg(tense, T), space, surname(SN).
space --> " ".
surname(SN) -->
dcg(sn1, SN1), dcg(sn2, SN2), dcg(sn3, SN3),
dcg(sn4, SN4), dcg(sn5, SN5), dcg(sn6, SN6),
dcg(sn7, SN7), dcg(sn8, SN8), dcg(sn9, SN9),
{ append([SN1, SN2, SN3, SN4, SN5, SN6, SN7, SN8, SN9], SN) }.
term_expansion(expand(Name, Codes), [dcg(Name, Codes) --> Codes]).
expand(person, `Jaco`).
expand(person, `jaco`).
expand(tense, `was`).
expand(tense, `is`).
expand(sn1, `k`).
expand(sn2, `je`).
expand(sn2, `ä`).
expand(sn3, `ll`).
expand(sn4, ``).
expand(sn4, `er`).
expand(sn4, `ar`).
expand(sn5, `st`).
expand(sn5, `b`).
expand(sn6, ``).
expand(sn6, `r`).
expand(sn7, `a`).
expand(sn8, ``).
expand(sn8, `n`).
expand(sn9, `d`).
... which can both parse and generate - results in swi-prolog:
?- time(phrase(sentence(S), `jaco is kjellerstrand`)).
% 61 inferences, 0.000 CPU in 0.000 seconds (89% CPU, 1618037 Lips)
S = [`jaco`,[105,115],`kjellerstrand`] ;
% 5 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 295299 Lips)
false.
?- time(phrase(sentence(S), L)).
% 54 inferences, 0.000 CPU in 0.000 seconds (90% CPU, 1390570 Lips)
S = [`Jaco`,`was`,`kjellstad`],
L = `Jaco was kjellstad` ;
% 37 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 1141236 Lips)
S = [`Jaco`,`was`,`kjellstand`],
L = `Jaco was kjellstand` ;
% 39 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 1291519 Lips)
S = [`Jaco`,`was`,`kjellstrad`],
L = `Jaco was kjellstrad` ;
% 38 inferences, 0.000 CPU in 0.000 seconds (85% CPU, 1573173 Lips)
S = [`Jaco`,`was`,`kjellstrand`],
L = `Jaco was kjellstrand` ;
% 38 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 1382774 Lips)
S = [`Jaco`,`was`,`kjellbad`],
L = `Jaco was kjellbad`
etc.

Related

Match anything except if a negative rule matches

I have a rule that matches bc. When I encounter that in a string, I don't want to parse that string, otherwise parse anything else.
% Prolog
bc(B, C) --> [B, C], {
B = "b",
C = "c"
}.
not_bc(O) --> [O], % ?! bc(O, C).
% ?- phrase(not_bc(O), "bcdefg").
% false.
% ?- phrase(not_bc(O), "abcdefg").
% O = "a".
% ?- phrase(not_bc(O), "wxcybgz")
% O = "w".
% ?- phrase(not_bc(O), "wxybgz")
% O = "w".
Simplified version of my problem, hopefully solutions are isomorphic.
Similar to this question:
Translation to DCG Semicontext not working - follow on
An alternative:
process_bc(_) --> "bc", !, { fail }.
process_bc(C) --> [C].
This differs from my other solution in accepting:
?- time(phrase(process_bc(C), `b`, _)).
% 8 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 387053 Lips)
C = 98.
In swi-prolog:
process_text(C1) --> [C1, C2], { dif([C1, C2], `bc`) }.
Results:
?- time(phrase(process_text(C), `bca`, _)).
% 11 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 376790 Lips)
false.
?- time(phrase(process_text(C), `bd`, _)).
% 10 inferences, 0.000 CPU in 0.000 seconds (80% CPU, 353819 Lips)
C = 98.
?- time(phrase(process_text(C), `zbcagri4gj40w9tu4tu34ty3ty3478t348t`, _)).
% 10 inferences, 0.000 CPU in 0.000 seconds (80% CPU, 372717 Lips)
C = 122.
A single character, or no characters, are both presumably meant to be failures.
This is nicely efficient, only having to check the first 2 characters.

Prolog program that swaps the two halves of a list

I am new to this language and am having trouble coming up with a solution to this problem. The program must implement the following cases.
Both variables are instantiated:
pivot( [1,2,3,4,5,6,7], [5,6,7,4,1,2,3] ).`
yields a true/yes result.
Only Before is instantiated:
pivot( [1,2,3,4,5,6], R ).
unifies R = [4,5,6,1,2,3] as its one result.
Only After is instantiated:
pivot(L, [1,2]).
unifies L = [2,1] as its one result.
Neither variable is instantiated:
pivot(L, R).
is undefined (since results are generated arbitrarily).
If by pivot, you mean to split the list in 2 and swap the halves, then something like this would work.
First, consider the normal case: If you have an instantiated list, pivoting it is trivial. You just need to
figure out half the length of the list
break it up into
a prefix, consisting of that many items, and
a suffix, consisting of whatever is left over
concatenate those two lists in reverse order
Once you have that, everything else is just a matter of deciding which variable is bound and using that as the source list.
It is a common Prolog idiom to have a single "public" predicate that invokes a "private" worker predicate that does the actual work.
Given that the problem statement requires that at least one of the two variable in your pivot/2 must be instantiated, we can define our public predicate along these lines:
pivot( Ls , Rs ) :- nonvar(Ls), !, pivot0(Ls,Rs) .
pivot( Ls , Rs ) :- nonvar(Rs), !, pivot0(Rs,Ls) .
If Ls is bound, we invoke the worker, pivot0/2 with the arguments as-is. But if Ls is unbound, and Rs is bound, we invoke it with the arguments reversed. The cuts (!) are there to prevent the predicate from succeeding twice if invoked with both arguments bound (pivot([a,b,c],[a,b,c]).).
Our private helper, pivot0/2 is simple, because it knows that the 1st argument will always be bound:
pivot0( Ls , Rs ) :- % to divide a list in half and exchange the halves...
length(Ls,N0) , % get the length of the source list
N is N0 // 2 , % divide it by 2 using integer division
length(Pfx,N) , % construct a unbound list of the desired length
append(Pfx,Sfx,Ls) , % break the source list up into its two halves
append(Sfx,Pfx,Rs) % put the two halves back together in the desired order
. % Easy!
In swi-prolog:
:- use_module(library(dcg/basics)).
pivot_using_dcg3(Lst, LstPivot) :-
list_first(Lst, LstPivot, L1, L2, IsList),
phrase(piv3_up(L1), L1, L2),
% Improve determinism
(IsList = true -> ! ; true).
piv3_up(L), string(Ri), string(M), string(Le) --> piv3(L, Le, M, Ri).
piv3([], [], [], Ri) --> [], remainder(Ri).
piv3([_], [], [H], Ri) --> [H], remainder(Ri).
piv3([_, _|Lst], [H|T], M, Ri) --> [H], piv3(Lst, T, M, Ri).
% From 2 potential lists, rearrange them in order of usefulness
list_first(V1, V2, L1, L2, IsList) :-
( is_list(V1) ->
L1 = V1, L2 = V2,
IsList = true
; L1 = V2, L2 = V1,
(is_list(L1) -> IsList = true ; IsList = false)
).
Is general and deterministic, with good performance:
?- time(pivot_using_dcg3(L, P)).
% 18 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 402441 Lips)
L = P, P = [] ;
% 8 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 238251 Lips)
L = P, P = [_] ;
% 10 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 275073 Lips)
L = [_A,_B],
P = [_B,_A] ;
% 10 inferences, 0.000 CPU in 0.000 seconds (94% CPU, 313391 Lips)
L = [_A,_B,_C],
P = [_C,_B,_A] ;
% 12 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 321940 Lips)
L = [_A,_B,_C,_D],
P = [_C,_D,_A,_B] ;
% 12 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 345752 Lips)
L = [_A,_B,_C,_D,_E],
P = [_D,_E,_C,_A,_B] ;
% 14 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 371589 Lips)
L = [_A,_B,_C,_D,_E,_F],
P = [_D,_E,_F,_A,_B,_C] ;
?- numlist(1, 5000000, P), time(pivot_using_dcg3(L, P)).
% 7,500,018 inferences, 1.109 CPU in 1.098 seconds (101% CPU, 6759831 Lips)
The performance could be improved further, using difference lists for the final left-middle-right append, and cuts (sacrificing generality).

Prolog returning only the max value from all iterations

I have a method that returns me a number on all iterations, but now I need to returns only the max value from all of the iterations that were done.
find_max(X, Y):-
find_number(X, Y).
So the find_number() returns only 1 number and some text alongside it. So for example if I were to ran it I would get this output:
X = 1, Y = me;
X = 5, Y = you;
X = 6, Y = he;
And the only output I need to return is the X = 6, Y = he;.
I am using SWI-Prolog.
A more portable alternative to the library(aggregate) posted by Willem, as the library is only available in a few Prolog systems, is:
find_max_alt(Xm, Ym) :-
setof(max(X, Y), find_number(X, Y), Solutions),
reverse(Solutions, [max(Xm, Ym)| _]).
This solution also appears to required a smaller number of inferences. Using the data in the question, we get:
?- time(find_max(Xm, Ym)).
% 40 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 800000 Lips)
Xm = 6,
Ym = he.
Versus:
?- time(find_max_alt(Xm, Ym)).
% 25 inferences, 0.000 CPU in 0.000 seconds (76% CPU, 675676 Lips)
Xm = 6,
Ym = he.
The setof/3 predicate is a standard predicate. The reverse/2 predicate is a common list predicate (and much simpler to define than the predicates in the aggregate library.
You can use the aggregate library for that:
:- use_module(library(aggregate)).
find_max(Xm, Ym):-
aggregate(max(X, Y), find_number(X, Y), max(Xm, Ym)).

partition a set into n subsets using prolog

I'm struggling with the following problem, partition a set into n subsets using prolog.
So for example, I give as input to program: X = [1,2,3,4], N=3 and I get
Res = [[1,2], [3], [4]]
Res = [[1,3], [2], [4]]
Res = [[1,4], [2], [3]]
Res = [[2,3], [1], [4]]
Res = [[2,4], [1], [3]]
Res = [[3,4], [1], [2]]
or I give as input: X = [1,2,3,4], N=2 and I get
Res = [[1,2], [3,4]]
Res = [[1,3], [2,4]]
Res = [[1,4], [2,3]]
Res = [[1,2,3], [4]]
Res = [[1,2,4], [3]]
Res = [[1,3,4], [2]]
Res = [[2,3,4], [1]]
This answer extends
#lurker's previous answer with additional (redundant) constraints.
Using dcg we define the following auxiliary non-terminals:
same_length([]) --> []. % DCG-style same_length/2
same_length([_|Es]) --> [_], same_length(Es).
same_length1([_|Es]) --> [_], same_length(Es).
same_lengths1([]) --> [].
same_lengths1([Es|Ess]) --> same_length1(Es), same_lengths1(Ess).
We utilize these DCGs by adding a phrase/2 goal upfront:
list_partitionedNU(Es, Xss) :-
phrase(same_lengths1(Xss), Es),
list_partitioned(Es, Xss).
Do we still get reasonable answers for some vanilla test case?
?- list_partitionedNU([a,b,c], Xss).
Xss = [[a],[b],[c]]
; Xss = [[a],[b,c]]
; Xss = [[a,b],[c]]
; Xss = [[a,c],[b]]
; Xss = [[a,b,c]]
; false.
Sure looks okay to me.
Next, let's talk about universal termination. Goals like list_partitioned(Es, [[a,b,c]]) do not terminate universally—even though they are trivial. list_partitionedNU/2 fixes this:
?- list_partitioned(Es, [[a,b,c]]).
Es = [a,b,c]
; NONTERMINATION
?- list_partitionedNU(Es, [[a,b,c]]).
Es = [a,b,c]
; false. % terminates universally
These additional constraints can speedup some queries considerably.
Using SICStus Prolog 4.4.0:
| ?- use_module(library(between), [numlist/3]).
yes
| ?- numlist(1, 14, _Es),
length(_Xss, 10),
member(P_2, [list_partitioned,list_partitionedNU]),
call_time((call(P_2,_Es,_Xss), false ; true), T_msec).
P_2 = list_partitioned , T_msec = 29632 ? ;
P_2 = list_partitionedNU, T_msec = 600 ? ; % 40x faster
no
Alright! Of course, the speedup depends on the actual lengths of the lists used... YMMV:)
The problem is already mostly solved in this question: All Partitions of a List In Prolog. This was easy to find just doing a Google search on "Prolog partition set".
Then you can just constrain it with length/2:
partitions_of_length(List, N, Partition) :-
length(Partition, N), list_partitioned(List, Partition).
| ?- partitions_of_length([a,b,c,d], 2, L).
L = [[a,b,c],[d]] ? ;
L = [[a,b,d],[c]] ? ;
L = [[a,b],[c,d]] ? ;
L = [[a,c,d],[b]] ? ;
L = [[a,c],[b,d]] ? ;
L = [[a,d],[b,c]] ? ;
L = [[a],[b,c,d]] ? ;
no
| ?-
We optimize performance in this case by constraining the length first. Below illustrates, in SWI Prolog, the difference between constraining the length after versus before:
:- use_module(library(statistics)).
6 ?- time((list_partitioned([a,b,c,d], P), length(P, 2))).
% 18 inferences, 0.000 CPU in 0.000 seconds (85% CPU, 1580195 Lips)
P = [[a, b, c], [d]] ;
% 12 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 1059696 Lips)
P = [[a, b, d], [c]] ;
% 10 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 900414 Lips)
P = [[a, b], [c, d]] ;
% 19 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 1624070 Lips)
P = [[a, c, d], [b]] ;
% 10 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 1021555 Lips)
P = [[a, c], [b, d]] ;
% 19 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 1665060 Lips)
P = [[a, d], [b, c]] ;
% 19 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 1661420 Lips)
P = [[a], [b, c, d]] ;
% 37 inferences, 0.000 CPU in 0.000 seconds (90% CPU, 2382639 Lips)
false.
7 ?- time((length(P, 2), list_partitioned([a,b,c,d], P))).
% 13 inferences, 0.000 CPU in 0.000 seconds (89% CPU, 1175832 Lips)
P = [[a, b, c], [d]] ;
% 6 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 742023 Lips)
P = [[a, b, d], [c]] ;
% 6 inferences, 0.000 CPU in 0.000 seconds (85% CPU, 848896 Lips)
P = [[a, b], [c, d]] ;
% 9 inferences, 0.000 CPU in 0.000 seconds (84% CPU, 1210328 Lips)
P = [[a, c, d], [b]] ;
% 6 inferences, 0.000 CPU in 0.000 seconds (82% CPU, 828386 Lips)
P = [[a, c], [b, d]] ;
% 9 inferences, 0.000 CPU in 0.000 seconds (84% CPU, 1215723 Lips)
P = [[a, d], [b, c]] ;
% 9 inferences, 0.000 CPU in 0.000 seconds (90% CPU, 697999 Lips)
P = [[a], [b, c, d]] ;
% 10 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 991277 Lips)
false.
If you were to modify the code in the link above to constrain the length of the list, the best way is probably to put the length/2 call inside the predicate before doing anything else, but the behavior is identical then to the above.

Which list item is the most common

I'm trying to find the most common list item common([b,a,a,a,c,d,b,f,s,f,s,f,s,f,s,f,f],R) so the result should be R=f,
I was thinking if we take the list , go to the end of the list take el=b ,num1=1 then go back to the beginning and compare if b=b ,num1=num1+1 else a!=b then if num2=num2+1 , num1>num2 recursion else el=a or something like this, but i had some difficulty transforming it into Prolog.
insert_sort sorts the list , but for some interesting reason if i use las(X,Y) (I override the original last/2 ) I get 4-a if I use last(X,Y) i get just a...
most_common([X|Y],J):-
insert_sort([X|Y],[R|Rs]),
count_runs([R|Rs],G),
las(G,J).
las([N-Y],Y).
las([_|T],Y):- las(T,Y).
las([_|Tail], Y) :- las(Tail, Y).
insert_sort(List,Sorted):-
i_sort(List,[],Sorted).
i_sort([],Acc,Acc).
i_sort([H|T],Acc,Sorted):-
insert(H,Acc,NAcc),
i_sort(T,NAcc,Sorted).
insert(X,[],[X]).
insert(X,[Y|T],[Y|NT]):- X #> Y, insert(X,T,NT).
insert(X,[Y|T],[X,Y|T]):- X #=< Y.
This looks like homework, so I'm not going to give you a full answer, but will suggest how you could solve it in one particular way, which isn't necessarily the best way:
Sort the list into sorted order (by standard order of terms if this is good enough): look at sort/2 routines. e.g., [b,a,a,a,c,d,b] becomes [a,a,a,b,b,c,d].
Take the sorted list and count the size of 'runs', perhaps to convert [a,a,a,b,b,c,d] into [3-a,2-b,1-c,1-d] (where -/2 is simply another term). e.g., consider the following code:
count_runs([E|Es], C) :-
% defer to count_runs/3 with an initial count of element E
count_runs(Es, 1-E, C).
% return the final count for Y elements if none remain (base case)
count_runs([], N-Y, [N-Y]).
count_runs([X|Es], N-Y, [N-Y|Rest]) :-
% if X is not equal to Y, record the count and continue next run
X \== Y, !,
count_runs([X|Es], Rest).
count_runs([_X|Es], N-Y, Rest) :-
% else X equals Y; increment the counter and continue
NPlusOne is N + 1,
count_runs(Es, NPlusOne-Y, Rest).
Perform something like keysort/2 to order the terms by the value of their keys (i.e., the numbers which are the counts, turning [3-a,2-b,1-c,1-d] into [1-c,1-d,2-b,3-a]). Then, the most-occurring elements of the list are the values at the end of the list with the same key value (i.e., here, this is the a in the last term 3-a). In general, they may be more than one element that occurs the most (equally with another).
Good luck.
Based on Prolog lambdas, we use the meta-predicates tcount/3 and reduce/3, as well as the reified term equality predicate (=)/3:
:- use_module(library(lambda)).
mostcommon_in(E,Xs) :-
tcount(=(E),Xs,M),
maplist(Xs+\X^N^(tcount(=(X),Xs,N)),Xs,Counts),
reduce(\C0^C1^C^(C is max(C0,C1)),Counts,M).
Sample query:
?- mostcommon_in(X,[a,b,c,d,a,b,c,a,b]).
X = a ;
X = b ;
false.
Note that this is monotone (unlike it's earlier quick-hack version). Look!
?- mostcommon_in(X,[A,B,C,D,A,B,C,A,B]), A=a,B=b,C=c,D=d.
X = a, A = a, B = b, C = c, D = d ;
X = b, A = a, B = b, C = c, D = d ;
false.
Preserve logical-purity by
using list_counts/2 for defining mostcommonitem_in/2 as follows:
mostcommonitem_in(E,Xs) :-
list_counts(Xs,Cs), % tag items with multiplicity
maplist(\ (X-N)^(M-X)^(M is -N),Cs,Ps), % prepare keysorting
keysort(Ps,[Max-_|_]), % sort ascending by negated count
member(Max-E,Ps). % pick most common ones
Let's run a query!
?- mostcommonitem_in(X,[a,b,c,d,a,b,c,a,b]).
X = a ;
X = b ;
false. % OK
But, is it still monotone?
?- mostcommonitem_in(X,[A,B,C,D,A,B,C,A,B]), A=a,B=b,C=c,D=d.
X = A, A = a, B = b, C = c, D = d ;
X = B, B = b, A = a, C = c, D = d ;
false. % OK: monotone
Got speed? (compared to the pure answer I showed in my previous answer to this question)
% OLD
?- length(Xs,5), time(findall(t,mostcommon_in(E,Xs),Ts)), length(Ts,N_sols).
% 854,636 inferences, 0.115 CPU in 0.115 seconds (100% CPU, 7447635 Lips)
N_sols = 71, Xs = [_,_,_,_,_], Ts = [t,t,t|...].
?- length(Xs,6), time(findall(t,mostcommon_in(E,Xs),Ts)), length(Ts,N_sols).
% 4,407,975 inferences, 0.449 CPU in 0.449 seconds (100% CPU, 9813808 Lips)
N_sols = 293, Xs = [_,_,_,_,_,_], Ts = [t,t,t|...].
?- length(Xs,7), time(findall(t,mostcommon_in(E,Xs),Ts)), length(Ts,N_sols).
% 24,240,240 inferences, 2.385 CPU in 2.384 seconds (100% CPU, 10162591 Lips)
N_sols = 1268, Xs = [_,_,_,_,_,_,_], Ts = [t,t,t|...].
% NEW
?- length(Xs,5), time(findall(t,mostcommonitem_in(E,Xs),Ts)), length(Ts,N_sols).
% 4,031 inferences, 0.001 CPU in 0.002 seconds (93% CPU, 2785423 Lips)
N_sols = 71, Xs = [_,_,_,_,_], Ts = [t,t,t|...].
?- length(Xs,6), time(findall(t,mostcommonitem_in(E,Xs),Ts)), length(Ts,N_sols).
% 17,632 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 9194323 Lips)
N_sols = 293, Xs = [_,_,_,_,_,_], Ts = [t,t,t|...].
?- length(Xs,7), time(findall(t,mostcommonitem_in(E,Xs),Ts)), length(Ts,N_sols).
% 82,263 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 3540609 Lips)
N_sols = 1268, Xs = [_,_,_,_,_,_,_], Ts = [t,t,t|...].
I could give you a high-level answer: You could sort the list and then it's relatively easy to count the items, one after another, and update what so far is the most common item.

Resources