I need a predicate that counts character matches between two large strings, ignoring '_' matches. I have this code:
fit(GEN1, GEN2, N, N) :-
    length(GEN1, L1),
    length(GEN2, L2),
    0 is L1*L2.
fit([P1|R1], [P2|R2], N, TOTAL) :-
    member(P1, ['_',a,c,t,g]),
    member(P2, ['_',a,c,t,g]),
    append([P1],[P2],T),
    (   member(T,[[a,a],[c,c],[t,t],[g,g]])
    ->  X is N+1
    ;   X is N
    ),
    fit(R1,R2,X,TOTAL).
where GEN1 and GEN2 are lists containing all the characters of two large strings.
I've tried increasing the stack limit to avoid the "Out of local stack" exception, with little success.
The issue is that fit/4 is called often, and from deeply recursive clauses. Is there a better way to do this?
EDIT
The clause needs to stop when one or both lists are empty.
EDIT 2
It is worth saying that all answers below were tested using 64-bit Prolog with the --stack-limit=32g option, as my code isn't well optimized and the fit clause is a small part of a larger process, but it was the main problem with my code.
EDIT 3
CapelliC's code used the fewest resources.
false's code, using library(reif) v2, ran the fastest.
See Complexity of counting matching elements in two sequences using library(aggregate) for more proposed solutions.
It seems that there is no point to insist that you have letters out of "_actg" all the time. A generalized definition seems to be sufficient. Using library(reif):
fit([], _, N,N).
fit([_|_], [], N,N).
fit([P1|R1], [P2|R2], N,TOTAL) :-
    if_( ( P1 = P2, dif(P1, '_') ), X is N+1, X = N ),
    fit(R1, R2, X,TOTAL).
Update: please make sure to use v2 of library(reif). The original version did not compile dif/3.
And here a version for systems that can only index on one argument simultaneously:
fit([], _, N,N).
fit([P1|R1], L2, N,TOTAL) :-
    ifit(L2, [P1|R1], N,TOTAL).
ifit([], _, N,N).
ifit([P2|R2], [P1|R1], N,TOTAL) :-
    if_( ( P1 = P2, dif(P1, '_') ), X is N+1, X = N ),
    fit(R1, R2, X,TOTAL).
If your Prolog has library(aggregate), you can do
fit(GEN1, GEN2, N) :-
    aggregate_all(count, (nth1(P,GEN1,S),nth1(P,GEN2,S),memberchk(S,[a,c,g,t])), N).
edit
Depending on the statistics of the data, a noticeable improvement can be obtained just by swapping the last two calls, i.e. ...(nth1(P,GEN1,S),memberchk(S,[a,c,g,t]),nth1(P,GEN2,S))...
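Written out in full, the swapped variant might look like the sketch below. The name fit_swapped/3 is made up for this illustration, and the explicit use_module lines assume SWI-Prolog.

```prolog
:- use_module(library(aggregate)).
:- use_module(library(lists)).

% fit_swapped(+GEN1, +GEN2, -N): hypothetical name for the variant with
% the last two goals swapped. Filtering S before the second nth1/3 call
% skips the lookup in GEN2 for every position of GEN1 that holds '_'.
fit_swapped(GEN1, GEN2, N) :-
    aggregate_all(count,
                  ( nth1(P, GEN1, S),
                    memberchk(S, [a,c,g,t]),
                    nth1(P, GEN2, S)
                  ),
                  N).
```

With this loaded, ?- fit_swapped([a,c,'_',g], [a,t,'_',g], N). gives N = 2: positions 1 and 4 match, and the '_' at position 3 is not counted.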
edit
Of course, a tight loop is better than a doubly indexed scan. For performance, I would write it like
fit_cc(GEN1, GEN2, N) :-
    fit_cc(GEN1, GEN2, 0, N).
fit_cc([X|GEN1], [Y|GEN2], C, N) :-
    (   X\='_' /*memberchk(X, [a,c,g,t])*/, X=Y
    ->  D is C+1 ; D=C
    ),
    fit_cc(GEN1, GEN2, D, N).
fit_cc(_, _, N, N).
but the generality and correctness allowed by library(reif) v2, as seen in @false's answer and comments, seem well worth the (pretty small) overhead.
In case you always call your predicate with two first arguments already fully instantiated, so you use it as a function, not as a relation -- which it seems like you do indeed -- I suspect that just adding !, at the start of your very last line of code should be enough to remove the stack overflow.
To do a little bit better, we'd use memberchk instead of member and notice that append([A],[B],C) is exactly the same thing as C = [A,B]; so after a little bit of reshuffling we end up with something like
fit( [],      [],      N, N).
fit( [P1|R1], [P2|R2], N, TOTAL) :-
    (   memberchk( P1, [a,c,t,g]),   % tested inside the condition, so '_' does not make the clause fail
        P2 == P1
    ->  X is N+1
    ;   X is N
    ),
    %% !,   %% might need the cut
    fit( R1, R2, X, TOTAL).
and we might not even need that cut since memberchk is already deterministic.
(not tested, though)
This is my code:
div2(L,N,L1,L2) :-
    length(L1,N),   % N = 4
    append(L1,L2, L),
    L=L2,L1=[],L2=[].
I want it to return a 4-element list in L1 each time, but it returns false.
Example:
L=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
L1=[1,2,3,4]
L1=[5,6,7,8]
L1=[9,10,11,12]
L1=[13,14,15,16]
How can I make it work? Thanks for the help.
Assuming you want div2(L,N,L1,L2) to mean that L1 is the first N elements of L, and L2 is what remains:
div2(L,N,L1,L2) :-
    length(L1,N),
    append(L1,L2,L).
There are no other conditions that need to be placed on L, L1 or L2.
If you want L1 to be the N elements that follow some multiple of N elements, and L2 to be the rest, you need to say so:
div2(L,N,L1,L2) :-
    append(L01,L2,L),   % break up L
    length(L01,Nx),     % make sure prefix's length is a multiple of N
    0 is mod(Nx,N),
    append(L0,L1,L01),  % set L1 to be the last N elements of prefix
    length(L1,N).
A slightly different solution...
div2([X|Xs], N, P) :-
    (   length(P1, N),
        append(P1, R, [X|Xs])
    ->  (   P = P1
        ;   div2(R, N, P)
        )
    ;   P = [X|Xs]
    ).
This solution defines a list P1 of length N and attempts to unify it (with append) as a prefix of [X|Xs]. ([X|Xs] is used instead of L to ensure that the predicate succeeds only if the first argument is a list with at least one element.)
If the append is successful, then the solution is either P1 (P is unified with P1) or a recursive call to div2 with the remainder of the first argument list with the prefix P1 absent.
If the append fails (which will happen if the number of partition elements N is larger than the first argument list length), then P is unified with the first argument list.
I want to write a predicate list_less(L1, L2) for two lists that is true if the list L1 is less than the list L2 with respect to the ordering below:
list_less(L1, L2) iff
    Let L1' := complement(L1,L2), L2' := complement(L2,L1)
        (complement(L1,L2) contains those elements of L1
         that are not in L2)
    Let m1 := max(L1'), m2 := max(L2')
        (max(L) gives the maximal element of L with respect to
         the standard order @<)
    m1 @< m2.
The output should be something like this:
?- list_less([3,3,3,3,2,2],[3,3,4,0]).
true.
?- list_less([a,b,X,Y,[X|Y],2], [[X,X|Y]]).
true.
?- list_less([a,b,X,Y,[X|Y],2], [X,b,b]).
false.
I started with this :
list_less([],[]).
list_less([H|T],[X|Y]):-
    complement([H|T],[X|Y],L),
    max_list([H|T],M1).
complement([],[],[]).
complement([H|T],[X|Y],L):-
    member(H,[X|Y]),
    !,
    complement(T,Y,[X|_]).
max_list(L, Max):-
    select(Max, L, Rest),
    \+ (member(E, Rest), E > Max).
Here is a more compact solution (than one that computes the difference lists first).
list_less(As, Bs):-
    sort(As, SortedAs),
    reverse(SortedAs, RevAs),
    sort(Bs, SortedBs),
    reverse(SortedBs, RevBs),
    RevAs @< RevBs.
The two lists are first sorted and reversed and then compared w.r.t. the standard order of terms. That means that elements are being compared starting from the left. Hence, the first two different elements (that correspond to the maximum element of each list that doesn't occur in the other) will make the comparison succeed or fail. This works because sort/2 removes the duplicates also.
| ?- list_less([3,3,3,3,2,2],[3,3,4,0]).
(4 ms) yes
| ?- list_less([a,b,X,Y,[X|Y],2], [[X,X|Y]]).
yes
| ?- list_less([a,b,X,Y,[X|Y],2], [X,b,b]).
no
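To see why the comparison works, it may help to look at the intermediate values for the first query. A small top-level session, assuming SWI-Prolog's answer formatting (the variable names are chosen here for illustration):

```prolog
?- sort([3,3,3,3,2,2], SAs), reverse(SAs, RAs),
   sort([3,3,4,0], SBs), reverse(SBs, RBs).
SAs = [2, 3],
RAs = [3, 2],
SBs = [0, 3, 4],
RBs = [4, 3, 0].

?- [3,2] @< [4,3,0].
true.
```

The two reversed lists already differ at the first position, and 3 @< 4, so list_less/2 succeeds for this pair.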
Given the following facts in a database:
foo(a, 3).
foo(b, 2).
foo(c, 4).
foo(d, 3).
foo(e, 2).
foo(f, 6).
foo(g, 3).
foo(h, 2).
I want to collect all first arguments that have the smallest second argument, plus the value of the second argument. First try:
find_min_1(Min, As) :-
    setof(B-A, foo(A, B), [Min-_|_]),
    findall(A, foo(A, Min), As).
?- find_min_1(Min, As).
Min = 2,
As = [b, e, h].
Instead of setof/3, I could use aggregate/3:
find_min_2(Min, As) :-
    aggregate(min(B), A^foo(A, B), Min),
    findall(A, foo(A, Min), As).
?- find_min_2(Min, As).
Min = 2,
As = [b, e, h].
NB
This only gives the same results if I am looking for the minimum of a number. If an arithmetic expression is involved, the results might be different. If a non-number is involved, aggregate(min(...), ...) will throw an error!
Or, instead, I can use the full key-sorted list:
find_min_3(Min, As) :-
    setof(B-A, foo(A, B), [Min-First|Rest]),
    min_prefix([Min-First|Rest], Min, As).
min_prefix([Min-First|Rest], Min, [First|As]) :-
    !,
    min_prefix(Rest, Min, As).
min_prefix(_, _, []).
?- find_min_3(Min, As).
Min = 2,
As = [b, e, h].
Finally, to the question(s):
Can I do this directly with library(aggregate)? It feels like it should be possible....
Or is there a predicate like std::partition_point from the C++ standard library?
Or is there some easier way to do this?
EDIT:
To be more descriptive. Say there was a (library) predicate partition_point/4:
partition_point(Pred_1, List, Before, After) :-
    partition_point_1(List, Pred_1, Before, After).
partition_point_1([], _, [], []).
partition_point_1([H|T], Pred_1, Before, After) :-
    (   call(Pred_1, H)
    ->  Before = [H|B],
        partition_point_1(T, Pred_1, B, After)
    ;   Before = [],
        After = [H|T]
    ).
(I don't like the name but we can live with it for now)
Then:
find_min_4(Min, As) :-
    setof(B-A, foo(A, B), [Min-X|Rest]),
    partition_point(is_min(Min), [Min-X|Rest], Min_pairs, _),
    pairs_values(Min_pairs, As).
is_min(Min, Min-_).
?- find_min_4(Min, As).
Min = 2,
As = [b, e, h].
What is the idiomatic approach to this class of problems?
Is there a way to simplify the problem?
Many of the following remarks could be added to many programs here on SO.
Imperative names
Every time you write an imperative name for something that is a relation, you reduce your understanding of relations. Not much, just a little bit. Many common Prolog idioms like append/3 do not set a good example. Think of append(As,As,AsAs). The first argument of find_min(Min, As) is the minimum. So minimum_with_nodes/2 might be a better name.
findall/3
Do not use findall/3 unless the uses are rigorously checked; essentially, everything must be ground. In your case it happens to work. But once you generalize foo/2 a bit, you will lose. And that is frequently a problem: you write a tiny program, and it seems to work.
Once you move to bigger ones, the same approach no longer works. findall/3 is (compared to setof/3) like a bull in a china shop, smashing the fine fabric of shared variables and quantification. Another problem is that accidental failure does not lead to failure of findall/3, which often leads to bizarre, hard-to-imagine corner cases.
Untestable, too specific program
Another problem is somewhat related to findall/3, too. Your program is so specific that it is quite improbable that you will ever test it. And marginal changes will invalidate your tests. So you will soon give up testing. Let's see what is specific: primarily the foo/2 relation. Yes, it is only an example. Think of how to set up a test configuration where foo/2 may change. After each change (writing a new file) you will have to reload the program. This is so complex that chances are you will never do it. I presume you do not have a test harness for that. PlUnit, for one, does not cover such testing.
As a rule of thumb: If you cannot test a predicate on the top level you never will. Consider instead
minimum_with(Rel_2, Min, Els)
With such a relation, you can now have a generalized xfoo/3 with an additional parameter, say:
xfoo(o, A,B) :-
    foo(A,B).
xfoo(n, A,B) :-
    newfoo(A,B).
and you most naturally get two answers for minimum_with(xfoo(X), Min, Els). Had you used findall/3 instead of setof/3, you would already have serious problems. Or just in general: minimum_with(\A^B^member(A-B, [x-10,y-20]), Min, Els). So you can play around on the top level and produce lots of interesting test cases.
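The answer does not define minimum_with/3. One possible definition, a sketch only, follows the setof/3 approach of version 3; the helper name same_key_prefix/3 is invented here, and Rel_2 is assumed to be called as call(Rel_2, Element, Value), matching foo/2 and xfoo/3 above.

```prolog
% minimum_with(:Rel_2, -Min, -Els): Els are the elements E of
% call(Rel_2, E, V) whose V is smallest in the standard order of terms.
% Free variables of Rel_2 are handled by setof/3, so a goal such as
% xfoo(X) yields one answer per X.
minimum_with(Rel_2, Min, [E|Es]) :-
    setof(V-X, call(Rel_2, X, V), [Min-E|Pairs]),
    same_key_prefix(Pairs, Min, Es).

% same_key_prefix(+Pairs, +Key, -Values): values of the leading pairs
% whose key is identical (==) to Key. Hypothetical helper.
same_key_prefix([K-E|Pairs], Key, Es0) :-
    K == Key,
    !,
    Es0 = [E|Es],
    same_key_prefix(Pairs, Key, Es).
same_key_prefix(_, _, []).
```

With the foo/2 facts from the question loaded, ?- minimum_with(foo, Min, Els). gives Min = 2, Els = [b, e, h], and the query can be tried on the top level with any other binary relation.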
Unchecked border cases
Your version 3 is clearly my preferred approach; however, there are still some parts that can be improved. In particular, answers that contain variables as a minimum should be checked.
And certainly, setof/3 also has its limits, and ideally you would test them. Answers should not contain constraints, in particular not in the relevant variables. This shows how setof/3 itself has certain limits. After the pioneering phase, SICStus produced many errors for constraints in such cases (mid 1990s), and later changed to consistently ignoring constraints in built-ins that cannot handle them. SWI, on the other hand, does entirely undefined things here. Sometimes things are copied, sometimes not. As an example take:
setof(A, ( A in 1..3 ; A in 3..5 ), _) and setof(t, ( A in 1..3 ; A in 3..5 ), _).
By wrapping the goal this can be avoided.
call_unconstrained(Goal_0) :-
    call_residue_vars(Goal_0, Vs),
    (   Vs = [] -> true ; throw(error(representation_error(constraint),_)) ).
Beware, however, that SWI has spurious constraints:
?- call_residue_vars(all_different([]), Xs).
Xs = [_A].
Not clear if this is a feature in the meantime. It has been there since the introduction of call_residue_vars/2 about 5 years ago.
I don't think that library(aggregate) covers your use case. aggregate(min) allows for one witness:
min(Expr, Witness)
A term min(Min, Witness), where Min is the minimal version of Expr over all solutions, and Witness is any other template applied to solutions that produced Min. If multiple solutions provide the same minimum, Witness corresponds to the first solution.
Some time ago, I wrote a small 'library', lag.pl, with predicates to aggregate with low overhead - hence the name (LAG = Linear AGgregate). I've added a snippet, that handles your use case:
integrate(min_list_associated, Goal, Min-Ws) :-
    State = term(_, [], _),
    forall(call(Goal, V, W),        % W stands for witness
           (   arg(1, State, C),    % C is current min
               arg(2, State, CW),   % CW are current min witnesses
               (   ( var(C) ; V @< C )
               ->  U = V, Ws = [W]
               ;   U = C,
                   (   C == V
                   ->  Ws = [W|CW]
                   ;   Ws = CW
                   )
               ),
               nb_setarg(1, State, U),
               nb_setarg(2, State, Ws)
           )),
    arg(1, State, Min), arg(2, State, Ws).
It's a simple minded extension of integrate(min)...
The comparison method is surely questionable (it uses a less general operator for equality); it could be worthwhile to adopt a conventional call instead, like the one used for predsort/3. Efficiency-wise, it would be better still to encode the comparison method as an option in the 'function selector' (min_list_associated in this case).
edit: thanks to @false and @Boris for correcting the bug in the state representation. Calling nb_setarg(2, State, Ws) actually changes the term's shape when State = (_,[],_) was used. Will update the github repo accordingly...
Using library(pairs) and sort/4, this can be simply written as:
?- bagof(B-A, foo(A, B), Ps),
   sort(1, @=<, Ps, Ss), % or keysort(Ps, Ss)
   group_pairs_by_key(Ss, [Min-As|_]).
Min = 2,
As = [b, e, h].
This call to sort/4 can be replaced with keysort/2, but with sort/4 one can also find, for example, the first arguments associated with the largest second argument: just use @>= as the second argument.
This solution is probably not as time and space efficient as the other ones, but may be easier to grok.
But there is another way to do it altogether:
?- bagof(A, ( foo(A, Min), \+ ( foo(_, Y), Y @< Min ) ), As).
Min = 2,
As = [b, e, h].
I wrote an algorithm and tried to implement it in Prolog, but I found that the parentheses do not work as expected: what is written inside them is not all done before exiting the parentheses. Here is the code:
% 1. If the first member of L1 is smaller than the first member of L2
%    A. If the first member of L1 is not equal to Last
%       Then: A.1 Add the first member of L1 as the first member of NL.
%             Begin recursion on the rest of L1, L2, the rest of NL
%             and Last from L1.
%       Else: A.2 Begin the recursion on the rest of L1, L2 with the
%             first member and L3 with the first member.
% 2. If the first member of L1 is equal to the first member of L2
%    Then: Start recursion on the rest of L1 and L2 (with its first
%          member) and Last = *.
% 3. If the first member of L1 is bigger than the first member of L2
%    Then: Begin recursion on L1 with the first member, the rest of
%          L2 and Last = x. %(x != * only for debugging)
make_list([X1|L1], [X2|L2], [X3|NewL], Last) :-
    (
        X1 < X2,
        (
            X1 \= Last, %A
            X3=X1;
            make_list(L1, [X2|L2], NewL, X1) %B
        ), %Why those parenthesis do not work as expected?
        ! %green cut
    );
    (
        X1=X2,
        make_list(L1, [X2|L2], [X3|NewL], *)
    ),
    !
    ;
    make_list([X1|L1], L2, [X3|NewL], *).
My question is how to make it work as expected and why does B not work once A is done? After all it is also in the same parenthesis, e.g.:
?- make_list([6,6,10,20],[10,25,30],L, -).
L = [6|_G849] % (should be [6,20]).
EDIT1: make_list should find all members in L1 that are not in L2 and put them in NewL, while Last stores the last member in L1 that was parsed.
EDIT2: no -> is allowed (this is homework). If someone could show me how to express if-then-else in Prolog, that would be great.
In short: !/0 applies at the predicate level: here you'd want it to apply to some block level where blocks would be delimited by ( and ). This notion of cut applied to blocks does not exist in Prolog. A cut will erase all encountered choice points up until its location in a predicate.
And instead of writing your code with a lot of (;)/2 usage, consider introducing new clauses instead, for readability (precisely because we do not like parentheses and figuring out (;)/2 and (,)/2 priorities):
c :- A; B.
can be written
c :- A.
c :- B.
It will quite often be better this way. Better to produce the code easily, better to maintain it, to read it and to expand it.
I didn't look at your predicate in depth or anything, just wanted to mention those two things.
I suspect that "green cut" of yours is not green at all; you have
( A ; B ), !
so on exit from (A ; B) the first time, if A succeeded, B won't be tried anymore - that is what the cut ! is saying here: don't try any more.
If you want B to be tried too, remove the cut !.
If-then-else is:
ifte(A,B,C):- A, B.
ifte(A,B,C):- \+A, C.
We can spare us a not by using a cut,
ifte(A,B,C):- A, !, B.
ifte(A,B,C):- C.
About your code: we express "and then" with a comma: A,B. To output the Last, it is easiest to use a worker predicate with an additional argument, "seen-last"; in the base case, the last seen and the output would finally be unified.
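As a rough sketch of a worker predicate with an extra "seen-last" argument, written with separate clauses and cuts instead of ->: it assumes both input lists hold numbers sorted in ascending order, and it is one possible reading of the task rather than the intended homework solution. The names list_diff/3, list_diff_/4 and the seen/1 wrapper are made up for this illustration, and here the extra argument is only used to suppress repeats.

```prolog
% list_diff(+L1, +L2, -NewL): NewL holds the members of L1 that are not
% in L2, each listed once. Both inputs are assumed sorted ascending.
list_diff(L1, L2, NewL) :-
    list_diff_(L1, L2, none, NewL).

% list_diff_(+L1, +L2, +Last, -NewL): Last is none or seen(X), where X
% is the member of L1 most recently added to the output.
list_diff_([], _, _, []).
list_diff_([X|L1], L2, seen(Y), NewL) :-      % X was already output
    X =:= Y, !,
    list_diff_(L1, L2, seen(Y), NewL).
list_diff_([X1|L1], [X2|L2], Last, NewL) :-   % X2 is too small: drop it
    X1 > X2, !,
    list_diff_([X1|L1], L2, Last, NewL).
list_diff_([X1|L1], [X2|L2], Last, NewL) :-   % X1 occurs in L2: skip it
    X1 =:= X2, !,
    list_diff_(L1, [X2|L2], Last, NewL).
list_diff_([X1|L1], L2, _, [X1|NewL]) :-      % X1 < X2, or L2 is empty
    list_diff_(L1, L2, seen(X1), NewL).
```

With this loaded, ?- list_diff([6,6,10,20], [10,25,30], L). gives L = [6, 20]. Each case of the algorithm is its own clause, and each cut commits to that case only, which avoids the nested parentheses of the original.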