DNA Matching in Prolog


I am attempting to learn basic Prolog. I have read some tutorials on the basic structures of lists, variables, and if/and logic. To help learn some of this, I am attempting a project that matches DNA sequences.
Essentially I want it to match reverse complements of DNA sequences.
Example outputs can be seen below:
?- dnamatch([t, t, a, c],[g, t, a, a]).
true
While it's most likely relatively simple, being newer to Prolog I am still figuring it out.
I started by defining basic matching rules for the DNA pairs:
pair(a,t).
pair(g,c).
etc...
I was then going to try to implement this into lists somehow, but am unsure how to make this logic apply to longer lists of sequences. I am unsure if my attempted start is even the correct approach. Any help would be appreciated.

Since your relation is describing lists, you could opt to use DCGs. You can describe the complementary nucleobases like so:
complementary(t) --> % thymine is complementary to
[a]. % adenine
complementary(a) --> % adenine is complementary to
[t]. % thymine
complementary(g) --> % guanine is complementary to
[c]. % cytosine
complementary(c) --> % cytosine is complementary to
[g]. % guanine
This corresponds to your predicate pair/2. To describe a bonding sequence in reverse order you can proceed like so:
bond([]) --> % the empty sequence
[]. % doesn't bond
bond([A|As]) --> % the sequence [A|As] bonds with
bond(As), % a bonding sequence to As (in reverse order)
complementary(A). % followed by the complementary nucleobase of A
The reverse order is achieved by writing the recursive goal first and then the goal that describes the complementary nucleobase to the one in the head of the list. You can query this using phrase/2 like so:
?- phrase(bond([t,t,a,c]),S).
S = [g,t,a,a]
Or you can use a wrapper predicate with a single goal containing phrase/2:
seq_complseq(D,M) :-
phrase(bond(D),M).
And then query it:
?- seq_complseq([t,t,a,c],C).
C = [g,t,a,a]
I find the description of lists with DCGs easier to read than the corresponding predicate version. Of course, describing a complementary sequence in reverse order is a relatively easy task. But once you want to describe more complex structures, say the cloverleaf structure of tRNA, DCGs come in really handy.

A solution with maplist/3 and reverse/2, where pairmatch/2 stands for the base-pair relation (pair/2 in the question):
dnamatch(A,B) :- reverse(B,C), maplist(pairmatch,A,C).
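For reference, here is a minimal self-contained sketch of this approach. It assumes the four base-pair facts are written out in full as pair/2 (the question only lists two of them followed by "etc.") and uses pair/2 in place of pairmatch/2. The imports are for SWI-Prolog, where both libraries are also autoloaded.

```prolog
:- use_module(library(lists)).   % reverse/2
:- use_module(library(apply)).   % maplist/3

% Assumed complete set of base pairs (the question shows only two).
pair(a, t).
pair(t, a).
pair(g, c).
pair(c, g).

% B is the reverse complement of A: reverse B, then check that
% the elements pair up position by position.
dnamatch(A, B) :-
    reverse(B, C),
    maplist(pair, A, C).

% ?- dnamatch([t,t,a,c], [g,t,a,a]).
% true.
```

This sketch is meant for checking two given sequences, as in the question's example query.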

If you want to avoid traversing twice you can also maybe do it like this?
rev_comp(DNA, RC) :-
rev_comp(DNA, [], RC).
rev_comp([], RC, RC).
rev_comp([X|Xs], RC0, RC) :-
pair(X, Y),
rev_comp(Xs, [Y|RC0], RC).
Then:
?- rev_comp([t,c,g,a], RC).
RC = [t, c, g, a].
This is only a hand-coded amalgamation of reverse and maplist. Is it worth it? Maybe, maybe not. Probably not.
Now that I thought about it a little bit, you could also do it with foldl which reverses, but now you really want to reverse so it is more useful than annoying.
rev_comp([], []).
rev_comp([X|Xs], Ys) :-
pair(X, Y),
foldl(rc, Xs, [Y], Ys).
rc(X, Ys, [Y|Ys]) :- pair(X, Y).
But this is even less obvious than the solution above, and the solution above is still less obvious than the solution by @CapelliC. So maybe you can look at the code I wrote, but please don't write such code unless of course you are answering questions on Stack Overflow and want to look clever or impress a girl that asks your help for an exercise in university.
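As a possible simplification of the foldl idea, the fold can start from the empty list, so no separate clause for the first element is needed. This is only a sketch: the name rev_comp_fold/2 is made up here, and the four pair/2 facts are assumed as in the question.

```prolog
:- use_module(library(apply)).   % foldl/4

% Assumed complete set of base pairs.
pair(a, t).
pair(t, a).
pair(g, c).
pair(c, g).

% One fold step: push the complement of X onto the accumulator.
rc(X, Ys, [Y|Ys]) :-
    pair(X, Y).

% Folding from [] reverses and complements in a single pass.
rev_comp_fold(Xs, Ys) :-
    foldl(rc, Xs, [], Ys).

% ?- rev_comp_fold([t,t,a,c], RC).
% RC = [g, t, a, a].
```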

Related

Prolog - remove the non unique elements

I have a predicate to check if the element is member of list and looks the following:
member(X,[X|_]).
member(X,[_|T]) :- member(X,T).
When I called: ?- member(1,[2,3,1,4])
I get: true.
And now I have to use it to write predicate which will remove all non unique elements from list of lists like the following:
remove([[a,m,t,a],[k,a,w],[i,k,b,b],[z,m,m,c]],X).
X = [[t],[w],[i,b,b],[z,c]]
How can I do that?

Using library(reif) for SICStus|SWI:
lists_uniques(Xss, Yss) :-
maplist(tfilter(in_unique_t(Xss)), Xss, Yss).
in_unique_t(Xss, E, T) :-
tfilter(memberd_t(E), Xss, [_|Rs]),
=(Rs, [], T).
Remark that while there is no restriction how to name a predicate, a non-relational, imperative name often hides the pure relation behind. remove is a real imperative, but we only want a relation. A relation between a list of lists and a list of lists with only unique elements.
An example usage:
?- lists_uniques([[X,b],[b]], [[X],[]]).
dif(X, b).
So in this case we have left X an uninstantiated variable. Therefore, Prolog computes the most general answer possible, figuring out what X has to look like.
(Note that the answer you have accepted incorrectly fails in this case)

Going by your example and @false's comment, the actual problem seems to be something like removing elements from each sublist that occur in any other sublist. My difficulty conceptualizing this into words has led me to build what I consider a pretty messy and gross piece of code.
So first I want a little helper predicate to sort of move member/2 up to lists of sublists.
in_sublist(X, [Sublist|_]) :- member(X, Sublist).
in_sublist(X, [_|Sublists]) :- in_sublist(X, Sublists).
This is no great piece of work, and in truth I feel like it should be inlined somehow because I just can't see myself ever wanting to use this on its own.
Now, my initial solution wasn't correct and looked like this:
remove([Sub1|Subs], [Res1|Result]) :-
findall(X, (member(X, Sub1), \+ in_sublist(X, Subs)), Res1),
remove(Subs, Result).
remove([], []).
You can see the sort of theme I'm going for here though: let's use findall/3 to enumerate the elements of the sublist in here and then we can filter out the ones that occur in the other lists. This doesn't quite do the trick, the output looks like this.
?- remove([[a,m,t,a],[k,a,w],[i,k,b,b],[z,m,m,c]], R).
R = [[t], [a, w], [i, k, b, b], [z, m, m, c]].
So, it starts off looking OK with [t] but then loses the plot with [a,w] because there is no visibility into the input [a,m,t,a] when we get to the first recursive call. There are several ways we could deal with it; a clever one would probably be to form a sort of zipper, where we have the preceding elements of the list and the succeeding ones together. Another approach would be to remove the elements in this list from all the succeeding lists before the recursive call. I went for a "simpler" solution which is messier and harder to read but took less time. I would strongly recommend you investigate the other options for readability.
remove(In, Out) :- remove(In, Out, []).
remove([Sub1|Subs], [Res1|Result], Seen) :-
findall(X, (member(X, Sub1),
\+ member(X, Seen),
\+ in_sublist(X, Subs)), Res1),
append(Sub1, Seen, Seen1),
remove(Subs, Result, Seen1).
remove([], [], _).
So basically now I'm keeping a "seen" list. Right before the recursive call, I stitch together the stuff I've seen so far and the elements of this list. This is not particularly efficient, but it seems to get the job done:
?- remove([[a,m,t,a],[k,a,w],[i,k,b,b],[z,m,m,c]], R).
R = [[t], [w], [i, b, b], [z, c]].
This strikes me as a pretty nasty problem. I'm surprised how nasty it is, honestly. I'm hoping someone else can come along and find a better solution that reads better.
Another thing to investigate would be DCGs, which can be helpful for doing these kinds of list processing tasks.
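One possibly more readable formulation, offered only as a sketch: keep an element exactly when it occurs in one sublist of the whole input. The predicate names below are made up for illustration, and the code assumes ground input lists because it relies on memberchk/2 and include/3 (SWI-Prolog's library(apply)); it does not have the relational properties of the library(reif) version.

```prolog
:- use_module(library(apply)).   % include/3, maplist/3
:- use_module(library(lists)).

% Keep, in every sublist, only the elements that occur in
% exactly one sublist of the whole input.
lists_unique_elems(Xss, Yss) :-
    maplist(keep_unique(Xss), Xss, Yss).

keep_unique(Xss, Xs, Ys) :-
    include(in_one_list(Xss), Xs, Ys).

% E occurs in exactly one of the sublists: filtering the sublists
% that contain E leaves a one-element list.
in_one_list(Xss, E) :-
    include(memberchk(E), Xss, [_]).

% ?- lists_unique_elems([[a,m,t,a],[k,a,w],[i,k,b,b],[z,m,m,c]], R).
% R = [[t], [w], [i, b, b], [z, c]].
```

Because every element is checked against the full input, no "seen" list has to be threaded through the recursion.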

Prolog and limitations of backtracking

This is probably the most trivial implementation of a function that returns the length of a list in Prolog
count([], 0).
count([_|B], T) :- count(B, U), T is U + 1.
One thing about Prolog that I still cannot wrap my head around is the flexibility of using variables as parameters.
So for example I can run count([a, b, c], 3). and get true. I can also run count([a, b], X). and get an answer X = 2.. Oddly (at least for me) is that I can also run count(X, 3). and get at least one result, which looks something like X = [_G4337877, _G4337880, _G4337883] ; before the interpreter disappears into an infinite loop. I can even run something truly "flexible" like count(X, A). and get X = [], A = 0 ; X = [_G4369400], A = 1., which is obviously incomplete but somehow really nice.
Therefore my multifaceted question. Can I somehow explain to Prolog not to look beyond first result when executing count(X, 3).? Can I somehow make Prolog generate any number of solutions for count(X, A).? Is there a limitation of what kind of solutions I can generate? What is it about this specific predicate, that prevents me from generating all solutions for all possible kinds of queries?

This is probably the most trivial implementation
That depends on your viewpoint. Consider:
count(L,C) :- length(L,C).
Shorter and functional. And this one also works for your use case.
edit
library CLP(FD) allows for
:- use_module(library(clpfd)).
count([], 0).
count([_|B], T) :- U #>= 0, T #= U + 1, count(B, U).
?- count(X,3).
X = [_G2327, _G2498, _G2669] ;
false.
(further) answering to comments
It was clearly sarcasm
No, sorry for giving this impression. It was an attempt to give you a synthetic answer to your question. Every detail of the implementation of length/2, which is indeed much longer than your code, has been carefully weighed to give us a general and efficient building block.
There must be some general concept
I would call (full) Prolog such general concept. From the very start, Prolog requires us to solve computational tasks describing relations among predicate arguments. Once we have described our relations, we can query our 'knowledge database', and Prolog attempts to enumerate all answers, in a specific order.
High level concepts like unification and depth first search (backtracking) are keys in this model.
Now, I think you're looking for second order constructs like var/1, which allow us to reason about our predicates. Such constructs cannot be written in (pure) Prolog, and a growing school of thought recommends avoiding them, because they are rather difficult to use. So I posted an alternative using CLP(FD), which effectively shields us in some situations. In the specific context of this question, it actually gives us a simple and elegant solution.
I am not trying to re-implement length
Well, I'm aware of this, but since count/2 aliases length/2, why not study the reference model ? ( see source on SWI-Prolog site )

The answer you get for the query count(X,3) is actually not odd at all. You are asking which lists have a length of 3. And you get a list with 3 elements. The infinite loop appears because the variables B and U in the first goal of your recursive rule are unbound. You don't have anything before that goal that could fail. So it is always possible to follow the recursion. In the version of CapelliC you have 2 goals in the second rule before the recursion that fail if the second argument is smaller than 1. Maybe it becomes clearer if you consider this slightly altered version:
:- use_module(library(clpfd)).
count([], 0).
count([_|B], T) :-
T #> 0,
U #= T - 1,
count(B, U).
Your query
?- count(X,3).
will not match the first rule but the second one and continue recursively until the second argument is 0. At that point the first rule will match and yield the result:
X = [_A,_B,_C] ?
The head of the second rule will also match but its first goal will fail because T=0:
X = [_A,_B,_C] ? ;
no
In your above version however Prolog will try the recursive goal of the second rule because of the unbound variables B and U and hence loop infinitely.
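On the first part of the question (not looking beyond the first result), one option is to commit to the first answer with once/1. The sketch below keeps the original count/2 unchanged and wraps it; the wrapper name first_list_of_length/2 is made up for illustration.

```prolog
% Original definition from the question.
count([], 0).
count([_|B], T) :-
    count(B, U),
    T is U + 1.

% Commit to the first answer: once/1 discards the remaining
% choice points, so the non-terminating search for further
% lists of length N is never started.
first_list_of_length(N, L) :-
    once(count(L, N)).

% ?- first_list_of_length(3, L).
% L is bound to a list of three fresh variables, with no further answers.
```

This only hides the non-termination for this query pattern; the CLP(FD) versions above address the cause instead.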

Prolog - Palindrome Functor

I am trying to write a predicate palindrome/1 in Prolog that is true if and only if its list input consists of a palindromic list.
for example:
?- palindrome([1,2,3,4,5,4,3,2,1]).
is true.
Any ideas or solutions?

A palindrome list is a list which reads the same backwards, so you can reverse the list to check whether it yields the same list:
palindrome(L):-
reverse(L, L).

It looks like everybody is voting for a reverse/2 based solution. I guess you guys have a reverse/2 solution in mind that is O(n) in the length of the given list. Something with an accumulator:
reverse(X,Y) :- reverse(X,[],Y).
reverse([],X,X).
reverse([X|Y],Z,T) :- reverse(Y,[X|Z],T).
But there are also other ways to check for a palindrome. I came up with a solution that makes use of DCG. One can use the following rules:
palin --> [].
palin --> [_].
palin --> [Border], palin, [Border].
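These rules can be run through phrase/2. A minimal sketch, where the wrapper name is_palindrome/1 is made up for illustration:

```prolog
palin --> [].
palin --> [_].
palin --> [Border], palin, [Border].

% L is a palindrome if the grammar palin describes all of L.
is_palindrome(L) :-
    phrase(palin, L).

% ?- is_palindrome([r,a,d,a,r]).
% true (a choice point may remain, so the toplevel can offer more answers)
% ?- is_palindrome([a,b]).
% false.
```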
Which solution is better? Well, let's gather some statistics via the profile command of the Prolog system. Here are the results:
[Figure not preserved: port statistics for the reverse/2 and DCG solutions]
So the DCG solution may often be faster in the positive case ("radar"): it does not have to build the whole reversed list, but moves directly to the middle and then checks the rest while leaving its own recursion. The disadvantage of the DCG solution is that it is non-deterministic. Some time measurements would tell more...
Bye
P.S.: Port statistics done with new plugable debugger of Jekejeke Prolog:
http://www.jekejeke.ch/idatab/doclet/prod/en/docs/10_dev/10_docu/02_reference/04_examples/02_count.html
But other Prolog systems have similar facilities. For more info see, "Code Profiler" column:
http://en.wikipedia.org/wiki/Comparison_of_Prolog_implementations

This sure sounds like a homework question, but I just can't help myself:
palindrome(X) :- reverse(X,X).
Technically, Prolog predicates don't "return" anything.

Another way, doing it with DCG's:
palindrome --> [_].
palindrome --> [C,C].
palindrome --> [C],palindrome,[C].
You can check for a palindrome like this:
?- phrase(palindrome,[a,b,a,b,a]).
true.
?- phrase(palindrome,[a,b,a,b,b]).
false.

You can use :
palindrome([]).
palindrome([_]).
palindrome([X|Xs]):-append(Xs1,[X],Xs), palindrome(Xs1).

binary predicate to square list and sublists in Prolog

I am new to prolog and was trying to create a binary predicate which will give
a list in which all numbers are squared, including those in sublists.
e.g.
?-dcountSublists([a,[[3]],b,4,c(5),4],C).
C=[a,[[9]],b,16,c(5),16]
Can anyone guide me how i can do this.
Thank You. Answer with a snippet is appreciated

This is easily achieved using recursion in Prolog. Remember that everything in Prolog is either a variable, or a term (atoms are just 0-arity terms), so a term like the following:
[a,[[3]],b,4,c(5),4]
...is easily deconstructed (also note that the list syntax [..] is sugar for the binary functor ./2, which is '[|]'/2 in SWI-Prolog 7 and later). Prolog offers a range of predicates to test for particular types of terms as well, such as numbers, strings, or compound terms (for example number/1 and compound/1).
To build the predicate you're after, I recommend writing it using several predicates like this:
dcountSublists(In, Out) :-
% analyze type of In
% based on type, either:
% 1. split term into subterms for recursive processing
% 2. term cannot be split; either replace it, or pass it through
Here's an example to get you started which does the hard bit. The following recognizes compound terms and breaks them apart with the term de/constructor =../2:
dcountSublists(In, Out) :-
% test if In has type compound term
compound(In),
% cut to exclude backtracking to other cases below this predicate
!,
% deconstruct In into functor and an argument list
In =.. [Func|Args],
% apply dcountSublists/2 to every argument, building new args
maplist(dcountSublists, Args, NewArgs),
% re-construct In using the new arguments
Out =.. [Func|NewArgs].
dcountSublists(In, Out) :-
% test if In has type atom
atom(In), !,
% pass it through
Out = In.
Testing:
?- dcountSublists([a,[[e]],b,a,c(s),a], L).
L = [a, [[e]], b, a, c(s), a].
Note that this fails if the input term has numbers, because it doesn't have a predicate to recognize and deal with them. I'll leave this up to you.
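One way the missing number case could be filled in is sketched below, under a few assumptions. The name square_all/2 is made up so as not to clash with the clauses above. The last clause is a catch-all instead of an atom/1 test, because in SWI-Prolog 7 the empty list [] is not an atom and the atom/1 clause would fail at the end of every list. Since this version descends into every compound term, it also squares the 5 inside c(5), which differs from the output requested in the question.

```prolog
:- use_module(library(apply)).   % maplist/3

% Numbers are squared.
square_all(In, Out) :-
    number(In),
    !,
    Out is In * In.
% Compound terms (including lists) are taken apart with =../2,
% their arguments are processed recursively, and the term is rebuilt.
square_all(In, Out) :-
    compound(In),
    !,
    In =.. [Func|Args],
    maplist(square_all, Args, NewArgs),
    Out =.. [Func|NewArgs].
% Everything else (atoms, [], strings, variables) passes through.
square_all(In, In).

% ?- square_all([a,[[3]],b,4,c(5),4], L).
% L = [a, [[9]], b, 16, c(25), 16].
```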
Good luck!

SWI-Prolog has the predicate maplist/[2-5] which allows you to map a predicate over some lists.
Using that, you only have to make a predicate that will square a number or the numbers in a list and leave everything else the same. The predicates number/1, is_list/1 are true if their argument is a number or a list.
Therefore:
% a number is squared
square(N,NN):-
    number(N),
    NN is N*N.
% a list is processed element by element
square(L,LL):-
    is_list(L),
    dcountSublists(L,LL).
% anything else is left unchanged
square(Other,Other):-
    \+ number(Other),
    \+ is_list(Other).
dcountSublists(L,LSquared):-
    maplist(square,L,LSquared).
with the negation in the final predicate we avoid multiple (wrong) solutions:
for example dcountSublists([2],X) would return X=[4] and X=[2] otherwise.
This could be avoided if we used an if-then-else structure for square or once/1 to call square/2.
If this is homework maybe you should not use maplist since (probably) the aim of the exercise is to learn how to build a recursive function; in any case, I would suggest to try and write an equivalent predicate without maplist.
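As a rough sketch of what such a version without maplist might look like, the following uses explicit recursion over the list and an if-then-else for the per-element case, as suggested above. The names square_list/2 and square_elem/2 are made up for illustration.

```prolog
% Walk the list by hand instead of using maplist/3.
square_list([], []).
square_list([X|Xs], [Y|Ys]) :-
    square_elem(X, Y),
    square_list(Xs, Ys).

% If-then-else picks exactly one case per element,
% so no negated guard clause is needed.
square_elem(X, Y) :-
    (   number(X)  -> Y is X * X
    ;   is_list(X) -> square_list(X, Y)
    ;   Y = X
    ).

% ?- square_list([a,[[3]],b,4,c(5),4], L).
% L = [a, [[9]], b, 16, c(5), 16].
```

Unlike a version that decomposes arbitrary compound terms, this one leaves c(5) untouched, because only numbers and proper lists are treated specially.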

Max out of values defined by prolog clauses

I know how to iterate over lists in Prolog to find the maximum, but what if each thing is a separate clause? For example if I had a bunch of felines and their ages, how would I find the oldest kitty?
cat(sassy, 5).
cat(misty, 3).
cat(princess, 2).
My first thought was "hmm, the oldest cat is the one for which no older exists". But I couldn't really translate that well to prolog.
oldest(X) :- cat(X, AgeX), cat(Y, AgeY), X \= Y, \+ AgeX < AgeY, print(Y).
This still erroneously matches "misty". What's the proper way to do this? Is there some way to more directly just iterate over the ages to choose max?

One way is
oldest(X) :- cat(X, AgeX), \+ (cat(Y, AgeY), Y \= X, AgeX < AgeY).
You can also use setof/3 to get a list of all cats and get the maximum from that.
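A minimal sketch of the setof/3 idea, using the facts from the question. The predicate name oldest_by_setof/1 is made up for illustration. Collecting Age-Name pairs makes setof/3 sort by age first, so the oldest cat ends up last.

```prolog
:- use_module(library(lists)).   % last/2

cat(sassy, 5).
cat(misty, 3).
cat(princess, 2).

% setof/3 returns the Age-Name pairs in ascending standard order,
% so the last pair belongs to the oldest cat.
oldest_by_setof(Name) :-
    setof(Age-N, cat(N, Age), Pairs),
    last(Pairs, _-Name).

% ?- oldest_by_setof(X).
% X = sassy.
```

If several cats share the maximum age, this sketch reports only one of them.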

A cat is the oldest if it's a cat and there is not a cat older than it. Let's write that in Prolog:
oldest(X):- cat(X, _), not( thereAreOlders(X)), !.
thereAreOlders(X):- cat(X, N), cat(C, M), C\=X, M > N.
If you query:
?- oldest(X).
X = sassy.

Here is a solution that loops through all the solutions, always recording the solution that is better than the previous best. In the end, the best solution is returned.
The recording is done using assert/1, you could also use a non-backtrackable global variable if your Prolog provides that (SWI-Prolog does).
The benefit of this approach is that it considers each solution only once, i.e. complexity O(n). So, even though it looks uglier than starblue's solution, it should run better.
% Data
cat(sassy, 5).
cat(misty, 3).
cat(miisu, 10).
cat(princess, 2).
% Interface
oldest_cat(Name) :-
loop_through_cats,
fetch_oldest_cat(Name).
loop_through_cats :-
cat(Name, Age),
record_cat_age(Name, Age),
fail ; true.
:- dynamic current_oldest_cat/2.
record_cat_age(Name, Age) :-
current_oldest_cat(_, CAge),
!,
Age > CAge,
retract(current_oldest_cat(_, _)),
assert(current_oldest_cat(Name, Age)).
record_cat_age(Name, Age) :-
assert(current_oldest_cat(Name, Age)).
fetch_oldest_cat(Name) :-
retract(current_oldest_cat(Name, _Age)).
Usage example:
?- oldest_cat(Name).
Name = miisu
Miisu is a typical Estonian cat name. ;)
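In SWI-Prolog, a similar single pass over the solutions can also be sketched without assert/1 by using aggregate_all/3 from library(aggregate). The predicate name oldest_by_aggregate/1 is made up for illustration, and other Prolog systems may not provide this library.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3

cat(sassy, 5).
cat(misty, 3).
cat(miisu, 10).
cat(princess, 2).

% max(Age, Name) tracks the largest Age seen so far together with
% the Name that came with it (the "witness").
oldest_by_aggregate(Name) :-
    aggregate_all(max(Age, N), cat(N, Age), max(_, Name)).

% ?- oldest_by_aggregate(Name).
% Name = miisu.
```

The call fails if there are no cat/2 facts at all.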

On a stylistic point: there are a few different approaches here (some are very elegant, others more 'readable'). If you're a beginner, choose your own preferred way of doing things, however inefficient.
You can learn techniques for efficiency later. Enjoy Prolog, it's a beautiful language.

I don't remember much Prolog, but I do know that you shouldn't think about solving problems as you would with an imperative programming language.
