Change comparator in prolog rbtree - prolog

I am using the swi-prolog rb_trees. The standard implementation uses "==" to compare values, I need to use "=#=", is there some way to do it?
If it is not possible I guess I would need to find some other representation for the clauses I store in the tree. The clauses have this format for example:
cl(daughter(X,Y), [female(X), parent(Y,X)])
I need the above clause to be equal to for example this clause:
cl(daughter(A,B), [female(A), parent(B,A)])
One function that gives the same output only for =#= clauses is portray_clause i guess. However it doesn't have an output argument, so I am not able to store the output of portray_clause into rb_tree.

Maybe you can use numbervars/3 on the clauses for storing them in the red-black tree and varnumbers/3 to get the original clause back? The predicate numbervars/3 will ground the clauses and maybe that will make the use of (==)/2 work for your case?

I don't think there's a way to change the predicate used for comparisons.
The best solution is probably to use ground terms as keys, since then the difference between == and =#= disappears. (In fact, using ground terms is a good idea anyway, since you don't want a variable binding that occurs in between tree operations to invalidate the tree's order property.)

Related

Retrieve internal/canonical representation of terms in Prolog

I am writing a program for which I need terms in their prefix notation.
The point is to being able to parse mathematical expressions to prefix notation, while preserving the correct order of Operations. I then want to save the result in the database for later use (using assert), which includes translating to another language, which uses prefix notation. Prolog Operators do all have a fixed priority which is a feature I want to use, as I will be using all sorts of operators (including clp operators).
As among others I need to include complete mathematical expressions, such as the equality operator. Thus I cannot recursively use the Univ operator (=..), because it won't accept equality operators etc. Or can I somehow use =.. ?
Essentially I want to work with the internal representation of
N is 3*4+5 % just a random example
which would be
is(N,+(*(3,4),5))
Now, I do know that I can use, write_canonical(N is 3*4+5) to get the internal representation as seen above.
So is there a way to somehow get the internal representation as a term or a list, or something.
Would it be possible to bind the output of write_canonical to a variable?
I hope my question is clear enough.
Prolog terms can be despicted as trees. But, when writing a term, the way a term is displayed depends on the defined operators and write options. Consider:
?- (N is 3*4+5) = is(N,+(*(3,4),5)).
true.
?- (N is 3*4+5) = is(Variable, Expression).
N = Variable,
Expression = 3*4+5.
?- 3*4+5 = +(*(3,4),5).
true.
I.e. operators are syntactic sugar. They don't change how terms are represented, only how terms are displayed.

Prolog: Looping through elements of list A and comparing to members of list B

I'm trying to write Prolog logic for the first time, but I'm having trouble. I am to write logic that takes two lists and checks for like elements between the two. For example, consider the predicate similarity/2 :
?- similarity([2,4,5,6,8], [1,3,5,6,9]).
true.
?- similarity([1,2,3], [5,6,8]).
false.
The first query will return true as those two lists have 5 and 6 in common. The second returns false as there are no common elements between the two lists in that query.
I CANNOT use built in logic, such as member, disjoint, intersection, etc. I am thinking of iterating through the first list provided, and checking to see if it matches each element in the second list. Is this an efficient approach to this problem? I will appreciate any advice and help. Thank you so much.
Writing Prolog for the first time can be really daunting, since it is unlike many traditional programming languages that you will most likely encounter; however it is a very rewarding experience once you've got a grasp on this new style of programming! Since you mention that you are writing Prolog for the first time I'll give some general tips and tricks about writing Prolog, and then move onto some hints to your problem, and then provide what I believe to be a solution.
Think Recursively
You can think of every Prolog program that you write to be intrinsically recursive in nature. i.e. you can provide it with a series of "base-cases" which take the following form:
human(John). or wildling(Ygritte) In my opinion, these rules should always be the first ones that you write. Try to break down the problem into its simplest case and then work from there.
On the other hand, you can also provide it with more complex rules which will look something like this: contains(X, [H|T]):- contains(X, T) The key bit is that writing a rule like this is very much equivalent to writing a recursive function in say, Python. This rule does a lot of the heavy lifting in looking to see whether a value is contained in a list, but it isn't complete without a "base-case". A complete contains rule would actually be two rules put together: contains(X, [X|_]).
contains(X, [H|T]):-contains(X, T).
The big takeaway from this is to try and identify the simple cases of your problem, which can act like base cases in a recursive function, and then try to identify how you want to "recurse" and actually do work on the problem at hand.
Pattern Matching
Part of the great thing about Prolog is the pattern matching system that it has in place. You should 100% use this to your advantage whenever you can -- it is especially helpful when trying to do anything with lists. For example:
head(X, [X|T]).
Will evaluate to true when called thusly: head(1, [1, 2, 3]) because intrinsic in the rule is the matching of X. This sort of pattern matching on the first element of a list is incredibly important and really the key way that you will do any work on lists in Prolog. In my experience, pattern matching on the head of a list will often be one of the "base-cases" that I mentioned beforehand.
Understand The Flow of the Program
Another key component of how Prolog works is that it takes a "top-down" approach to reading code. What I mean by that is that every time a rule is called (except for definitions of the form king(James).), Prolog starts at line 1 and continues until it reaches a rule that is true or the end of the file. Therefore, the ordering of your rules is incredibly important. I'm assuming that you know that you can combine rules together via a comma to indicate logical AND, but what is maybe more subtle is that if you order one rule above another, it can act as a logical OR, simply because it will be evaluated before another rule, and can potentially cause the program to recurse.
Specific Example
Now that I've gotten all of my general advice out of the way, I'll actually reference the given problem. First, I'd write my "base-case". What would happen if you are given two lists whose first elements are the same? If the first element in each list is not the same, then they have to be different. So, you have to look through the second list to see if this element is contained anywhere in the rest of the list. What kind of rule would this produce? OR it could be the case that the first element of the first list is not contained within the second at all, in which case you have to advance once in the first list, and start again with the second list. What kind of rule would this produce?
In the end, I would say that your approach is the correct one to take, and I have provided my own solution below:
similarity([H|_], [H|_]).
similarity(H1|T1], [_|T2]):- similarity([H1|T1], T2).
similarity([_|T1], [H2|T2]):- similarity(T1, [H2|T2]).
Hope all of this helps in some way!

Prolog unknowns in the knowledge base

I am trying to learn Prolog and it seems the completeness of the knowledge is very important because obviously if the knowledge base does not have the fact, or the fact is incorrect, it will affect the query results. I am wondering how best to handle unknown details of a fact. For example,
%life(<name>,<birth year>,<death year>)
%ruler(<name>,<precededBy>,<succeededBy>)
Some people I add to the knowledge base would still be alive, therefore their year of death is not known. In the example of rulers, the first ruler did not have a predecessor and the current ruler does not have a successor. In the event that there are these unknowns should I put some kind of unknown flag value or can the detail be left out. In the case of the ruler, not knowing the predecessor would the fact look like this?
ruler(great_ruler,,second_ruler).
Well, you have a few options.
In this particular case, I would question your design. Rather than putting both previous and next on the ruler, you could just put next and use a rule to find the previous:
ruler(great_ruler, second_ruler).
ruler(second_ruler, third_ruler).
previous(Ruler, Previous) :- ruler(Previous, Ruler).
This predicate will simply fail for great_ruler, which is probably appropriate—there wasn't anyone before them, after all.
In other cases, it may not be straightforward. So you have to decide if you want to make an explicit value for unknown or use a variable. Basically, do you want to do this:
ruler(great_ruler, unknown, second_ruler).
or do you want to do this:
ruler(great_ruler, _, second_ruler).
In the first case, you might get spurious answers featuring unknown, unless you write some custom logic to catch it. But I actually think the second case is worse, because that empty variable will unify with anything, so lots of queries will produce weird results:
ruler(_, SucceededHimself, SucceededHimself)
will succeed, for instance, unifying SucceededHimself = second_ruler, which probably isn't what you want. You can check for variables using var/1 and ground/1, but at that point you're tampering with Prolog's search and it's going to get more complex. So a blank variable is not as much like NULL in SQL as you might want it to be.
In summary:
prefer representations that do not lead to this problem
if forced, use a special value

most readable way in XPath to write "is value X a member of sequence S"?

XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of theset don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct -- and keeping it that way. For example,
empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. empty((0)) != not((0)). This applies to exists(s) vs. s in a boolean context as well. However, there are domains of s where empty(s) is equivalent to not(s), so the two could be used interchangeably within those domains. But this goes to show that the use of empty() can make a non-trivial difference in making code easier to understand.
Similarly, exists(s) is equivalent to boolean(s) that already existed in XPath 1.0 (or just s in a boolean context), but again is much clearer about the intent.
Quantified expressions; e.g. "some $x in expression satisfies test($x)" would be equivalent to boolean(expression[test(.)]) (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
Similarly, "every $x in expression satisfies test($x)" would be equivalent to not(expression[not(test(.))]) but is more readable.
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used X and S notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
X = S: This would be one of the most unreadable, since it requires the reader to
think about which of X and S are sequences vs. single values
understand general comparisons, which are not obvious from the syntax
However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps in readability.
See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
index-of(S, X): This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to index-of()). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since index-of() is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate X = S, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for index-of(S, X), correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using index-of() is that it's limited to using eq for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
Correction: This form, used as a conditional test, can result in a runtime error: Effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get wrong boolean values, since index-of() can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
exists(index-of(X, S)): makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
some $m in S satisfies $m eq X: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves -- they're not necessarily just variable references. An advantage is that since the eq operator is explicit, you can replace it with is or any other comparison operator.
S[. eq X]: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value), if X is falsy. E.g. (0, 1)[. eq 0] returns 0 which is falsy, even though 0 occurs in (0, 1).
exists(S[. eq X]): Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor smarts).
I'm kind of leaning toward the last one, at this point: exists(S[. eq X])
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed = to eq wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
For what it's worth, if names or context make clear that X is a singleton, I'm happy to use your first form, X = S -- for example when I want to check an attribute value against a set of possible values:
<xsl:when test="#type = ('A', 'A+', 'A-', 'B+')" />
or
<xsl:when test="#type = $magic-types"/>
If I think there is a risk of confusion, then I like your sixth formulation. The less frequently I have to remember the rules for calculating an effective boolean value, the less frequently I make a mistake with them.
I prefer this one:
count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
When $x is itself a sequence, this expression implements the (value-based) subset of relation between two sets of values, that are represented as sequences. This implementation of subset of has just linear time complexity -- vs many other ways of expressing this, that have O(N^2)) time complexity.
To summarize, the question whether a single value belongs to a set of values is a special case of the question whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it for answering the former.
The functx library has a nice implementation of this function, so you can use
functx:is-node-in-sequence($X, $Y)
(this particular function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html)
The whole functx library is available for both XQuery (http://www.xqueryfunctions.com/) and XSLT (http://www.xsltfunctions.com/)
Marklogic ships the functx library with their core product; other vendors may also.
Another possibility, when you want to know whether node X occurs in sequence S, is
exists((X) intersect S)
I think that's pretty readable, and concise. But it only works when X and the values in S are nodes; if you try to ask
exists(('bob') intersect ('alice', 'bob'))
you'll get a runtime error.
In the program I'm working on now, I need to compare strings, so this isn't an option.
As Dimitri notes, the occurrence of a node in a sequence is a question of identity, not of value comparison.

insert element in a list and return the same list updated

Hi i'm trying to insert an element in a list but it is very important from my program that the result is stored in the original list and not in a new one.
Any code that i have written or found on the internet only succeeds if you create a new list in which the end result is kept.
So my question is can anyone tell me how to define a function: insert(X,L) where X is an element and L is a list?
No, Prolog just doesn't work that way. There is no such thing as "modifying" a value. A variable can be unified with a specific value, but if it was already [1,3], it won't ever be [1,2,3] later.
As aschepler says, you cannot add or make any change to a proper list, i.e. a list in which every element is already bound. The only "modifying" we can do is unifying one expression with another.
However there is a concept of a partial list to which additional elements can be "added" at the end. This is typically known as a difference list, although that nomenclature may not be immediately understandable.
Suppose we start, not with an empty list, but with a free variable X. One might however think of subtracting X from X and getting "nothing". That is, an empty difference list is represented by X - X. The minus "-" here is a purely formal operator; no evaluation of the difference is intended. It's just a convenient syntax as you see from how difference lists can be used to accomplish what you (probably) want to do.
We can add an element to a difference list as follows:
insertDL(M,X-Y,X-Z) :- Y = [M|Z].
Here M is the new element we want to add, X-Y is the "old" difference list, and X-Z is the "new" difference (to which M has been added, by unifying the previously free variable Y with the partial list [M|Z], so that Z becomes the "open" tail of partial list X).
When we are finally done inserting things into our difference list, we can turn X into a proper list by setting the "free tail" at that point to the empty list [ ]. In this sense X is the "same" variable as when we first began, just unified by incremental steps from free variable to proper list.
This is a very powerful technique in Prolog programming, and it takes some practice to feel comfortable using it. Some links to further discussion on the Web:
[From Prolog lists to difference lists]
http://www.irisa.fr/prive/ridoux/ICLP91/node8.html
[Implementing difference lists in Prolog]
http://www.cl.cam.ac.uk/~jpw48/difflists.pdf
[Lecture Notes: Difference Lists]
http://www.cs.cmu.edu/~fp/courses/lp/lectures/11-diff.pdf
Some prologs provide the setarg/3 predicate in order to modify terms in place.
In order to use it over lists, you only need to consider that they are just a nice representation of chains of compound terms with functor '.'/2
In any case, when you need to use setarg/3 in Prolog, it probably means you are doing something wrong.

Resources