Difference between "open-ended lists" and "difference lists" - prolog

What is the difference between "open-ended lists" and "difference lists"?

As explained at http://homepages.inf.ed.ac.uk/pbrna/prologbook/node180.html, an open list is a tool used to implement a difference list.
An open list is any list that has an unbound variable at some point in its tail, e.g.: [a,b,c|X]. You can use an open list to implement a data structure called a difference list, which formally pairs two terms pointing to the first element and to the open end, traditionally written as [a,b,c|X]-X, to make operating on such lists easier.
For example, if all you have is an open list, adding an element to the end is possible, but you need to iterate over all items to reach the open end. With a difference list you can just use the end-of-list variable (called a Hole on the page above) to skip the iteration and perform the operation in constant time.
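A minimal sketch of that constant-time concatenation, using the List-Hole notation from above (the predicate name append_dl/3 is just illustrative):
% Concatenating two difference lists only unifies the first hole with
% the start of the second list; neither list is traversed.
append_dl(A-HA, HA-HB, A-HB).
?- append_dl([a,b|X]-X, [c|Y]-Y, L-H), H = [].
L = [a, b, c].
(Only the relevant binding is shown.)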

Both notions seem to be lists, but in fact they are not. One is a concrete term, the other rather a convention.
Open-ended lists, partial lists
Open-ended lists are terms that are not lists but can be instantiated such that they become lists. In standard lingo, they are called partial lists: X, [a|X], and [X|X] are all partial lists.
The notion of open-ended lists suggests a certain usage of such lists to simulate some open-ended state. Think of a dictionary that might be represented by an open-ended list. Every time you add a new item, the variable "at the end of the partial list" is instantiated to a new partial list holding that item. While this programming technique is quite possible in Prolog, it has one big downside: the programs will heavily depend on a procedural interpretation. And in many situations there is no way to have a declarative interpretation at all.
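A minimal sketch of that dictionary technique (lookup/3 and the Key-Value pairs are made up for illustration):
% If Key is already present, return its Value; if the walk reaches the
% unbound tail, head unification instantiates it with a new pair, so
% looking a key up "adds" it as a side effect. This only makes sense
% procedurally, which is exactly the downside mentioned above.
lookup(Key, Value, [Key-Value|_]) :- !.
lookup(Key, Value, [_|Rest]) :-
    lookup(Key, Value, Rest).
?- Dict = [a-1|_], lookup(b, 2, Dict), lookup(a, X, Dict).
Dict = [a-1, b-2|_], X = 1.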
Difference lists
Difference lists are effectively not lists as such, but a certain way of using lists such that the intended list is represented by two variables: one for the start and one for the end of the list. For this reason it would help a lot to rather talk of list differences instead of difference lists.
Consider:
el(E, [E|L],L).
Here, the last two arguments can be seen as forming a difference: the list [E], containing the single element E. You can now construct more complex lists out of simpler ones, provided you respect certain conventions, which are essentially that the second argument is only passed further on. The differences as such are never compared to each other!
el2(E, F, L0,L) :-
    el(E, L0,L1),
    el(F, L1,L).
Note that this is merely a convention. The lists are not enforced. Think of:
?- el2(E, F, L, nonlist).
L = [E,F|nonlist].
This technique is also used to encode DCGs (definite clause grammars).
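For instance, a grammar rule such as
greeting --> [hello], [world].
is expanded by the DCG translation into a predicate that threads exactly such a list difference through two extra arguments (the expansion below is only approximate; implementations optimize it differently):
greeting(S0, S) :-
    S0 = [hello|S1],
    S1 = [world|S].
?- greeting([hello, world, '!'], Rest).
Rest = ['!'].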

For example
Open-ended : [a,b,c | _]
Difference-list : [a,b,c|U]-U.

Related

Prolog: Looping through elements of list A and comparing to members of list B

I'm trying to write Prolog logic for the first time, but I'm having trouble. I am to write logic that takes two lists and checks for elements that the two have in common. For example, consider the predicate similarity/2:
?- similarity([2,4,5,6,8], [1,3,5,6,9]).
true.
?- similarity([1,2,3], [5,6,8]).
false.
The first query will return true as those two lists have 5 and 6 in common. The second returns false as there are no common elements between the two lists in that query.
I CANNOT use built-in predicates such as member, disjoint, intersection, etc. I am thinking of iterating through the first list provided and checking whether each of its elements matches an element in the second list. Is this an efficient approach to this problem? I would appreciate any advice and help. Thank you so much.
Writing Prolog for the first time can be really daunting, since it is unlike many traditional programming languages that you will most likely encounter; however it is a very rewarding experience once you've got a grasp on this new style of programming! Since you mention that you are writing Prolog for the first time I'll give some general tips and tricks about writing Prolog, and then move onto some hints to your problem, and then provide what I believe to be a solution.
Think Recursively
You can think of every Prolog program that you write as intrinsically recursive in nature, i.e. you can provide it with a series of "base cases" which take the following form:
human(john). or wildling(ygritte) (note the lowercase initial letters: capitalized names would be variables). In my opinion, these rules should always be the first ones that you write. Try to break the problem down into its simplest case and then work from there.
On the other hand, you can also provide it with more complex rules, which will look something like this: contains(X, [_|T]) :- contains(X, T). The key bit is that writing a rule like this is very much equivalent to writing a recursive function in, say, Python. This rule does a lot of the heavy lifting in looking to see whether a value is contained in a list, but it isn't complete without a "base case". A complete contains/2 would actually be two rules put together: contains(X, [X|_]).
contains(X, [_|T]) :- contains(X, T).
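For example, with the two clauses above:
?- contains(2, [1, 2, 3]).
true.
?- contains(2, [4, 5, 6]).
false.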
The big takeaway from this is to try and identify the simple cases of your problem, which can act like base cases in a recursive function, and then try to identify how you want to "recurse" and actually do work on the problem at hand.
Pattern Matching
Part of the great thing about Prolog is the pattern matching system that it has in place. You should 100% use this to your advantage whenever you can -- it is especially helpful when trying to do anything with lists. For example:
head(X, [X|_]).
This will evaluate to true when called like this: head(1, [1, 2, 3]), because the matching of X is built into the rule. This sort of pattern matching on the first element of a list is incredibly important and really the key way that you will do any work on lists in Prolog. In my experience, pattern matching on the head of a list will often be one of the "base cases" that I mentioned beforehand.
Understand The Flow of the Program
Another key component of how Prolog works is that it takes a "top-down" approach to reading code. What I mean by that is that every time a goal is called, Prolog tries the clauses of that predicate in the order in which they appear in the file, until one succeeds or none are left. Therefore, the ordering of your rules is incredibly important. I'm assuming that you know that you can combine goals with a comma to indicate logical AND, but what is maybe more subtle is that placing one rule above another can act as a logical OR, simply because the first rule is tried before the second, and backtracking can still fall through to the later ones.
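A tiny illustration of both points, with made-up predicates:
% Logical AND: both goals on the right-hand side must succeed.
cosy(X) :- warm(X), dry(X).
% "Logical OR" through clause ordering: Prolog tries the first clause
% first, and falls back to the second on failure or backtracking.
pet(X) :- cat(X).
pet(X) :- dog(X).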
Specific Example
Now that I've gotten all of my general advice out of the way, I'll actually address the given problem. First, I'd write my "base case": what happens if you are given two lists whose first elements are the same? If the first elements are not the same, you have to look through the rest of the second list to see whether the head of the first list is contained anywhere in it. What kind of rule would this produce? OR it could be the case that the first element of the first list is not contained in the second list at all, in which case you have to advance once in the first list and start again with the full second list. What kind of rule would this produce?
In the end, I would say that your approach is the correct one to take, and I have provided my own solution below:
similarity([H|_], [H|_]).
similarity([H1|T1], [_|T2]):- similarity([H1|T1], T2).
similarity([_|T1], [H2|T2]):- similarity(T1, [H2|T2]).
Hope all of this helps in some way!

Prolog predicate arguments: readability vs. efficiency

I want to ask about the pros and cons of different representations in the arguments of Prolog predicates.
For example, Exercise 4.3 asks: write a predicate second(X,List) which checks whether X is the second element of List. The solution can be:
second(X,List):- [_,X|_]=List.
Or,
second(X,[_,X|_]).
Both predicates behave similarly. The first one is more readable than the second, at least to me. But the second one uses more stack during execution (I checked this with trace).
A more complicated example is Exercise 3.5: Binary trees are trees where all internal nodes have exactly two children. The smallest binary trees consist of only one leaf node. We will represent leaf nodes as leaf(Label) . For instance, leaf(3) and leaf(7) are leaf nodes, and therefore small binary trees. Given two binary trees B1 and B2 we can combine them into one binary tree using the functor tree/2 as follows: tree(B1,B2) . So, from the leaves leaf(1) and leaf(2) we can build the binary tree tree(leaf(1),leaf(2)) . And from the binary trees tree(leaf(1),leaf(2)) and leaf(4) we can build the binary tree tree(tree(leaf(1), leaf(2)),leaf(4)). Now, define a predicate swap/2 , which produces the mirror image of the binary tree that is its first argument. The solution would be:
A2.1:
swap(T1,T2):- T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)).
swap(T1,T2):- T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3).
swap(T1,T2):- T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3).
swap(T1,T2):- T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
Alternatively,
A2.2:
swap(tree(leaf(L1),leaf(L2)), tree(leaf(L2),leaf(L1))).
swap(tree(tree(B1,B2),leaf(L3)), tree(leaf(L3),T3)):- swap(tree(B1,B2),T3).
swap(tree(leaf(L1),tree(B2,B3)), tree(T3,leaf(L1))):- swap(tree(B2,B3),T3).
swap(tree(tree(B1,B2),tree(B3,B4)), tree(T4,T3)):- swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
The number of steps of the second solution was much smaller than that of the first one (again, I checked with trace). But regarding readability, the first one would be easier to understand, I think.
Probably the readability depends on the level of one's Prolog skill. I am at a beginner level in Prolog, and am used to programming in C++, Python, etc. So I wonder whether skillful Prolog programmers agree with the above assessment of readability.
Also, I wonder if the number of steps can be a good measurement of the computational efficiency.
Could you give me your opinions or guidelines to design predicate arguments?
EDITED.
According to the advice from #coder, I made a third version that consists of a single rule:
A2.3:
swap(T1,T2):-
( T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)) );
( T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3) );
( T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3) );
( T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3),swap(tree(B3,B4),T4) ).
I compared the number of steps in trace of each solution:
A2.1: 36 steps
A2.2: 8 steps
A2.3: 32 steps
A2.3 (readable single-rule version) seems to be better than A2.1 (readable four-rule version), but A2.2 (non-readable four-rule version) still outperforms.
I'm not sure whether the number of steps in trace reflects the actual computational efficiency.
There are fewer steps in A2.2, but it may spend more computation on the pattern matching of the arguments.
So, I compared the execution time for 40000 queries (each query is a complicated one: swap(tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5))),tree(tree(leaf(3),tree(tree(leaf(3),leaf(4)),leaf(5))),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5)))), _).). The results were almost the same (0.954 sec, 0.944 sec, 0.960 sec respectively). This suggests that the three representations A2.1, A2.2, and A2.3 have similar computational efficiency.
Do you agree with this result? (Probably this is case-specific; I need to vary the experimental setup.)
This question is a very good example of a bad question for a forum like Stack Overflow. I am writing an answer because I feel you might use some advice, which, again, is very subjective. I wouldn't be surprised if the question gets closed as "opinion based". But first, an opinion on the exercises and the solutions:
Second element of list
Definitely, second(X, [_,X|_]). is to be preferred. It just looks more familiar. But you should be using the standard library anyway: nth1(2, List, Element).
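For example (nth1/3 is available in the list libraries of SWI-Prolog and SICStus, among others):
?- nth1(2, [a, b, c], X).
X = b.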
Mirroring a binary tree
The tree representation that the textbook suggests is a bit... unorthodox? A binary tree is almost invariably represented as a nested term, using two functors, for example:
t/3 which is a non-empty tree, with t(Value_at_node, Left_subtree, Right_subtree)
nil/0 which is an empty tree
Here are some binary trees:
The empty tree: nil
A binary search tree holding {1,2,3}: t(2, t(1, nil, nil), t(3, nil, nil))
A degenerate left-leaning binary tree holding the list [1,2,3] (if you traversed it pre-order): t(1, t(2, t(3, nil, nil), nil), nil)
So, to "mirror" a tree, you would write:
mirror(nil, nil).
mirror(t(X, L, R), t(X, MR, ML)) :-
    mirror(L, ML),
    mirror(R, MR).
The empty tree, mirrored, is the empty tree.
A non-empty tree, mirrored, has its left and right sub-trees swapped, and mirrored.
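For example, mirroring the search tree from above:
?- mirror(t(2, t(1, nil, nil), t(3, nil, nil)), M).
M = t(2, t(3, nil, nil), t(1, nil, nil)).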
That's all. No need for swapping, really, or anything else. It is also efficient: for any argument, only one of the two clauses will be evaluated, because the first arguments are different functors, nil/0 and t/3 (look up "first argument indexing" for more information on this). If you had instead written:
mirror_x(T, MT) :-
( T = nil
-> MT = nil
; T = t(X, L, R),
MT = t(X, MR, ML),
mirror_x(L, ML),
mirror_x(R, MR)
).
Then not only is this less readable (well...) but probably less efficient, too.
On readability and efficiency
Code is read by people and evaluated by machines. If you want to write readable code, you still might want to address it to other programmers and not to the machines that are going to evaluate it. Prolog implementations have gotten better and better at being efficient at evaluating code that is also more readable to people who have read and written a lot of Prolog code (do you recognize the feedback loop?). You might want to take a look at Coding Guidelines for Prolog if you are really interested in readability.
A first step towards getting used to Prolog is trying to solve the 99 Prolog Problems (there are other sites with the same content). Follow the suggestion to avoid using built-ins. Then, look at the solutions and study them. Then, study the documentation of a Prolog implementation to see how many of these problems have been solved with built-in predicates or standard libraries. Then, study the implementations. You might find some real gems there: one of my favorite examples is the library definition of nth0/3. Just look at this beauty ;-).
There is also a whole book written on the subject of good Prolog code: "The Craft of Prolog" by Richard O'Keefe. The efficiency measurements are quite outdated though. Basically, if you want to know how efficient your code is, you end up with a matrix with at least three dimensions:
Prolog implementation (SWI-Prolog, SICStus, YAP, GNU Prolog...)
Data structure and algorithm used
Facilities provided by the implementation
You will end up having some holes in the matrix. Example: what is the best way to read line-based input, do something with each line, and output it? Read line by line, do the thing, output? Read everything at once, do everything in memory, output at once? Use a DCG? In SWI-Prolog, since version 7, you can do:
read_string(In_stream, _, Input),
split_string(Input, "\n", "", Lines),
maplist(do_x, Lines, Xs),
atomics_to_string(Xs, "\n", Output),
format(Out_stream, "~s\n", Output)
This is concise and very efficient. Caveats:
The available memory might be a bottleneck
Strings are not standard Prolog, so you are stuck with implementations that have them
This is a very basic example, but it demonstrates at least the following difficulties in answering your question:
Differences between implementations
Opinions on what is readable or idiomatic Prolog
Opinions on the importance of standards
The example above doesn't even go into details about your problem, as for example what you do with each line. Is it just text? Do you need to parse the lines? Why are you not using a stream of Prolog terms instead? and so on.
On efficiency measurements
Don't use the number of steps in the tracer, or even the reported number of inferences. You really need to measure time, with a realistic input. Sorting with sort/2, for example, always counts as exactly one inference, no matter what is the length of the list being sorted. On the other hand, sort/2 in any Prolog is about as efficient as a sort on your machine would ever get, so is that an issue? You can't know until you have measured the performance.
And of course, as long as you make an informed choice of an algorithm and a data structure, you can at the very least know the complexity of your solution. Doing an efficiency measurement is interesting only if you notice a discrepancy between what you expect and what you measure: obviously, there is a mistake. Either your complexity analysis is wrong, or your implementation is wrong, or even the Prolog implementation you are using is doing something unexpected.
On top of this, there is the inherent problem of high-level libraries. With some of the more complex approaches, you might not be able to easily judge what the complexity of a given solution might be (constraint logic programming, as in CHR and CLPFD, is a prime example). Most real problems that fit nicely to the approach will be much easier to write, and more efficient than you could ever do without considerable effort and very specific code. But get fancy enough, and your CHR program might not even want to compile any more.
Unification in the head of the predicate
This is not opinion-based any more. Just do the unifications in the head if you can. It is more readable to a Prolog programmer, and it is more efficient.
PS
"Learn Prolog Now!" is a good starting point, but nothing more. Just work your way through it and move on.
In the first way, for example for Exercise 3.5, you write four rules with the head swap(T1,T2), which means that Prolog will examine all four rules and return true or fail for each of these four calls. Because these rules cannot all be true together (each time only one of them will succeed), for every input you waste three calls that will not succeed (that is why it demands more steps and more time). The only advantage in the above case is that the first way of writing it is more readable. In general, when you have such cases of pattern matching, it is better to write the rules so that they are well defined and no two (or more) rules match the same input, if of course you require only one answer, as in the second way of writing the above example.
Finally, one example where it is required that more than one rule matches an input is the predicate member/2, which is written:
member(H,[H|_]).
member(H,[_|T]):- member(H,T).
because in this case you require more than one answer.
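For example, with the definition above, backtracking enumerates every element (the final false comes from the recursion reaching the empty list):
?- member(X, [a, b, c]).
X = a ;
X = b ;
X = c ;
false.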
In the third way you just write the first way without pattern matching in the head. It has the form (condition1);...;(condition4), and if condition1 does not succeed it examines the next condition. Most of the time the fourth condition is the one that succeeds, but conditions 1-3 have already been called, tested and failed. So it is almost the same as the first way of writing the solution, except that in the third solution, if condition1 succeeds it will not test the other conditions, so you save some wasted calls (compared to solution 1).
As for the running time, it was expected to be almost the same because, in the worst case, solutions 1 and 3 do four times the tests/calls that solution 2 does. So if solution 2 has O(g) complexity (for some function g), then solutions 1 and 3 are O(4g), which is still O(g), so the running times will be very close.

Why does list ++ require scanning all elements of the list on its left?

The Haskell tutorial says to be cautious when we use "Hello" ++ " World": the construction of the new list has to visit every single element (here, every character of "Hello"), so if the list on the left of ++ is long, using ++ will bring down performance.
Perhaps I am not understanding this correctly: did Haskell's developers never tune the performance of list operations? Why does this operation remain slow? Is it to keep some kind of syntactic consistency with lambda functions or currying?
Any hints? Thanks.
In some languages, a "list" is a general-purpose sequence type intended to offer good performance for concatenation, splitting, etc. In Haskell, and most traditional functional languages, a list is a very specific data structure, namely a singly-linked list. If you want a general-purpose sequence type, you should use Data.Sequence from the containers package (which is already installed on your system and offers very good big-O asymptotics for a wide variety of operations), or perhaps some other one more heavily optimized for common usage patterns.
If you have an immutable list which has a head and a reference to its tail, you cannot change that tail. If you want to add something to the 'end' of the list, you have to reach the end and then rebuild the left-hand list by putting its items, one by one, onto the head of the right-hand list. It is a fundamental property of immutable lists: concatenation is expensive.
Haskell lists are like singly-linked lists: they are either empty or they consist of a head and a (possibly empty) tail. Hence, when appending something to a list, you'll first have to walk the entire list to get to the end. So you end up traversing the entire list (the list to which you append, that is), which needs O(n) runtime.

Constant-time list concatenation in OCaml

Is it possible to implement constant-time list concatenation in OCaml?
I imagine an approach where we deal directly with memory and concatenate lists by pointing the end of the first list to the beginning of the second list. Essentially, we're creating some type of linked-list like object.
With the normal list type, no, you can't. The algorithm you gave is exactly the one implemented ... but you still have to actually find the end of the first list...
There are various methods to implement constant-time concatenation (see Okasaki for the fancy details). I will just give you the names of OCaml libraries that implement it: BatSeq, BatLazyList (both in Batteries), sequence, gen, Core.Sequence.
Pretty sure there is a diff-list implementation somewhere too.
Lists are already (singly) linked lists. But list nodes are immutable, so you cannot change any node's pointer to point to anything different. In order to concatenate two lists you must therefore copy all the nodes of the first list.

insert element in a list and return the same list updated

Hi, I'm trying to insert an element into a list, but it is very important for my program that the result is stored in the original list and not in a new one.
Any code that I have written or found on the internet only succeeds if you create a new list in which the end result is kept.
So my question is: can anyone tell me how to define a predicate insert(X,L), where X is an element and L is a list?
No, Prolog just doesn't work that way. There is no such thing as "modifying" a value. A variable can be unified with a specific value, but if it was already [1,3], it won't ever be [1,2,3] later.
As aschepler says, you cannot add or make any change to a proper list, i.e. a list in which every element is already bound. The only "modifying" we can do is unifying one expression with another.
However there is a concept of a partial list to which additional elements can be "added" at the end. This is typically known as a difference list, although that nomenclature may not be immediately understandable.
Suppose we start, not with an empty list, but with a free variable X. One might however think of subtracting X from X and getting "nothing". That is, an empty difference list is represented by X - X. The minus "-" here is a purely formal operator; no evaluation of the difference is intended. It's just a convenient syntax as you see from how difference lists can be used to accomplish what you (probably) want to do.
We can add an element to a difference list as follows:
insertDL(M,X-Y,X-Z) :- Y = [M|Z].
Here M is the new element we want to add, X-Y is the "old" difference list, and X-Z is the "new" difference (to which M has been added, by unifying the previously free variable Y with the partial list [M|Z], so that Z becomes the "open" tail of partial list X).
When we are finally done inserting things into our difference list, we can turn X into a proper list by setting the "free tail" at that point to the empty list []. In this sense X is the "same" variable as when we first began, just unified by incremental steps from a free variable into a proper list.
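A short sketch of that whole cycle, with insertDL/3 as defined above (only the relevant binding is shown):
% Start from the empty difference list X-X, insert two elements,
% then close the open tail with [] to obtain a proper list.
?- D0 = X-X, insertDL(a, D0, D1), insertDL(b, D1, D2), D2 = List-[].
List = [a, b].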
This is a very powerful technique in Prolog programming, and it takes some practice to feel comfortable using it. Some links to further discussion on the Web:
From Prolog lists to difference lists: http://www.irisa.fr/prive/ridoux/ICLP91/node8.html
Implementing difference lists in Prolog: http://www.cl.cam.ac.uk/~jpw48/difflists.pdf
Lecture Notes: Difference Lists: http://www.cs.cmu.edu/~fp/courses/lp/lectures/11-diff.pdf
Some Prologs provide the setarg/3 predicate in order to modify terms in place.
In order to use it on lists, you only need to consider that they are just a nice notation for chains of compound terms with the functor '.'/2.
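For example, in a Prolog where the list constructor is still written '.'/2 (SWI-Prolog 7 and later renames it to '[|]'/2, so there you would need the --traditional flag for this exact query):
?- [1, 2, 3] = '.'(1, '.'(2, '.'(3, []))).
true.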
In any case, when you need to use setarg/3 in Prolog, it probably means you are doing something wrong.

Resources