During my exploration of different ways to write down lists, I am intrigued by the following list [[a,b]|c] which appears in the book 'Prolog and Natural Language Analysis' by Pereira and Shieber (page 42 of the digital edition).
At first I thought that such a notation was syntactically incorrect, as it would have had to say [[a,b]|[c]], but after using write_canonical/1 Prolog returned '.'('.'(a,'.'(b,[])),c).
As far as I can see, this corresponds to the following tree structure (although it seems odd to me that structure would simply end with c, without the empty list at the end):
I cannot seem to find the corresponding notation using commas and brackets though. I thought it would correspond to [[a,b],c] (but this obviously returns a different result with write_canonical/1).
Is there no corresponding notation for [[a,b]|c] or am I looking at it the wrong way?
As others have already indicated, the term [[a,b]|c] is not a list.
You can test this yourself, using the | syntax to write it down:
?- is_list([[a,b]|c]).
false.
You can see from write_canonical/1 that this term is identical to what you have drawn:
| ?- write_canonical([[a,b]|c]).
'.'('.'(a,'.'(b,[])),c)
In addition to what others have said, I am posting an additional answer because I want to explain how you can go about finding the reason for unexpected failures. When starting with Prolog, you will often ask yourself "Why does this query fail?"
One way to find explanations for such issues is to generalize the query, by using logical variables instead of concrete terms.
For example, in the above case, we could write:
?- is_list([[A,b]|c]).
false.
In this case, I have used the logical variable A instead of the atom a, thus significantly generalizing the query. Since the generalized query still fails, some constraint in the remaining part must be responsible for the unexpected failure. We thus generalize it further to narrow down the cause. For example:
?- is_list([[A,B]|c]).
false.
Or even further:
?- is_list([[A,B|_]|c]).
false.
And even further:
?- is_list([_|c]).
false.
So here we have it: No term that has the general form '.'(_, c) is a list!
As you rightly observe, this is because such a term is not of the form [_|Ls] where Ls is a list.
NOTE: The declarative debugging approach I apply above works for the monotonic subset of Prolog. Actually, is_list/1 does not belong to that subset, because we have:
?- is_list(Ls).
false.
with the declarative reading "There is no list." So, it turns out, it worked only by coincidence in the case above. However, we could define the intended declarative meaning of is_list/1 in a pure and monotonic way like this, by simply applying the inductive definition of lists:
list([]).
list([_|Ls]) :- list(Ls).
This definition only uses pure and monotonic building blocks and hence is monotonic. For example, the most general query now yields actual lists instead of failing (incorrectly):
?- list(Ls).
Ls = [] ;
Ls = [_6656] ;
Ls = [_6656, _6662] ;
Ls = [_6656, _6662, _6668] .
From pure relations, we expect that queries work in all directions!
"I cannot seem to find the corresponding notation using commas and brackets though."
There is no corresponding notation, since this is technically speaking not a real list.
Prolog has syntactic sugar for lists. A list in Prolog is, like a Lisp list, actually a linked list: every list is either the empty list [], or a node .(H,T) with H the head and T the tail. Lists are not "special" in Prolog in the sense that the interpreter handles them differently than any other term. Of course a lot of Prolog libraries do list processing, and use the convention defined above.
To make complex lists more convenient, syntactic sugar was invented. You can write a node .(H,T) as [H|T] as well. So in your [[a,b]|c] we have an outer list with one node .(H,c), and H is another list with two nodes and an empty list: H = .(a,.(b,[])).
Technically speaking I would not consider this a "real" list, since the tail of a list should have either another node ./2, or an empty list.
You can however use this with variables like: [[a,b]|C] in order to unify the tail C further. So here we have some sort of list with [a,b] as first element (so a list containing a list) and with an open tail C. If we later for instance ground C to C = [], then the list is [[a,b]].
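To make the open-tail idea concrete, here is a small sketch. The predicate names closed_example/1, same_notation/0 and not_a_list/0 are made up for illustration; only is_list/1 comes from the system.

```prolog
% Binding the open tail C to [] turns [[a,b]|C] into a proper
% one-element list whose only element is the list [a,b].
closed_example(L) :-
    L = [[a,b]|C],
    C = [].

% The comma notation [[a,b],c] is sugar for [[a,b]|[c]]:
% its tail is the list [c], not the atom c.
same_notation :-
    [[a,b],c] == [[a,b]|[c]].

% With the atom c as tail, the term is not a list at all.
not_a_list :-
    \+ is_list([[a,b]|c]).
```

Calling closed_example(L) gives L = [[a,b]], and the other two goals succeed.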
Given a list (A) I want to be able to create a new list (B) that contains only the elements of A that are the smallest or the biggest compared to their next and previous element. My problem is that I don't know how to do the comparisons of each element with its previous one.
(This question may be silly but I'm new to prolog and any help would be appreciated.)
You could start with something like this:
compareElem([]).
compareElem([H,H1,H2|B]) :-
    compareElem(B),
    compare(Order, H1, H2),
    compare(Order, H1, H).
Here Order is unified with the result of the first comparison (<, > or =), and the second call succeeds only if H1 compares the same way with its other neighbour H. See compare/3.
Some queries:
?- compareElem([1,3,2,4,6,5]).
true.
?- compareElem([1,2,3,4,5,3]).
false.
Of course, to apply this example you must ensure that the list has 3n elements; this is just a basic example. Building on this comparison, you can generate the other list.
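As one possible way to go from the comparison to the list the question asks for, here is a hedged sketch that slides a three-element window over the input. The names local_extrema/2 and is_extremum/3 are hypothetical, the elements are assumed to be numbers, and the first and last elements are skipped because they lack one neighbour (an assumption about the intended behaviour).

```prolog
% local_extrema(+List, -Extrema): Extrema holds every element of List
% that is strictly smaller or strictly bigger than both neighbours.
local_extrema([P,X,N|T], Es) :-
    !,
    (   is_extremum(P, X, N)
    ->  Es = [X|Es0]
    ;   Es = Es0
    ),
    % Slide the window by one: X becomes the previous element.
    local_extrema([X,N|T], Es0).
local_extrema(_, []).            % fewer than three elements left

is_extremum(P, X, N) :- X < P, X < N.
is_extremum(P, X, N) :- X > P, X > N.
```

For example, local_extrema([1,3,2,4,6,5], Es) yields Es = [3,2,6]. Unlike the 3n version, the window moves one element at a time, so any list length works.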
When I was writing down this question on an empty list as a difference list I wanted to test what I knew about those structures. However, when I tried something as simple as comparing different notations it seemed that I was wrong and that I did not understand what is actually going on with difference lists.
?- L = [a,b,c|[d,e]]-[d,e], L = [a,b,c].
false % expected true
I tested this on SWI-Prolog as well as SICStus. I verified the notation as this is how it is written in Bratko's Prolog Programming for AI, page 210, but apparently unification is not possible. Why is that? Don't these notations have the same declarative meaning?
I think you have the idea that the Prolog interpreter treats difference lists as something special. That is not the case: Prolog is not aware of the concept of a difference list (nor of nearly any other concept, apart from some syntactic sugar). It only sees:
L = -('.'(a,'.'(b,'.'(c,'.'(d,'.'(e,[]))))), '.'(d,'.'(e,[])))
where -/2 and '.'/2 are functors, and a, b, c, d, e and [] are constants. (SWI-Prolog 7 and later name the list constructor '[|]'/2 instead of '.'/2.)
Difference lists are simply a programming technique (just as dynamic programming is a technique: the compiler can neither detect dynamic programming programs nor treat them differently). The technique is used to efficiently bind a variable that sits deep inside a term, typically the still-unbound tail of a list.
Say you want to append/3 two lists. You can do this as follows:
%append(A,B,C).
append([],L,L).
append([H|T],L,[H|B]) :-
    append(T,L,B).
But this runs in O(n): you first need to iterate through the entire first list. If that list contains thousands of elements, it will take a lot of time.
Now you can define a contract for yourself: you feed append_diff/3 not just the list, but a pair -(List,Tail), where List is a reference to the beginning of the list and Tail is a reference to its still-unbound tail. Examples of structures that fulfill this requirement are Tail-Tail, [a|Tail]-Tail, [1,4,2,5|Tail]-Tail.
Now you can effectively append_diff/3 in O(1) with:
append_diff(H1-T1,T1-T2,H1-T2).
Why? Because you unify the unbound tail of the first list with the second list. The unbound tail of the second list then becomes the tail of the final list. Take for instance:
append_diff([a|T1]-T1,[1,4,2,5|T2]-T2,L).
When you call the predicate as shown above, T1 unifies with [1,4,2,5|T2], so the first list collapses to [a|[1,4,2,5|T2]], or shorter [a,1,4,2,5|T2]. Since we also have a reference to T2, we can "return" (in Prolog nothing is really returned) [a,1,4,2,5|T2]-T2: a new difference list with an open tail T2. But this works only because you give - a special meaning yourself: for Prolog, - is simply -; it is not minus, it does not calculate a difference, and so on. Prolog does not attach semantics to functors. If you had used + instead of -, it would not have made the slightest difference.
So, to return to your question: you simply state to Prolog that L = -([a,b,c,d,e],[d,e]) and later state that L = [a,b,c]. Now it is clear that those two expressions cannot be unified. So Prolog says false.
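A small sketch of the technique in action may help. append_diff/3 is the predicate from this answer; dl_close/2 and demo/1 are hypothetical helpers added only for illustration.

```prolog
% Constant-time append of two difference lists (from the answer).
append_diff(H1-T1, T1-T2, H1-T2).

% dl_close(+DiffList, -List): binding the open tail to [] leaves
% an ordinary proper list.
dl_close(List-[], List).

% Append two difference lists, then close the result.
demo(Plain) :-
    append_diff([a|T1]-T1, [1,4,2,5|T2]-T2, L),
    dl_close(L, Plain).
```

Here demo(Plain) gives Plain = [a,1,4,2,5]. Note that the plain list is obtained by an explicit step that we wrote ourselves; unifying the -/2 term directly with a list still fails, exactly as in the question.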
I'm trying to re-familiarize myself with Prolog and I thought this could be the type of problem with an elegant solution in Prolog.
I'm following along this example:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
I've tried a variety of data formats:
dist('BA','FI',662).
dist(0,'BA','FI',662).
dist(['BA'],['FI'],662).
but I haven't found any particular one most suitable.
Here's all the data in the first format:
%% Graph distances
dist('BA','FI',662).
dist('BA','MI',877).
dist('BA','NA',255).
dist('BA','RM',412).
dist('BA','TO',996).
dist('FI','MI',295).
dist('FI','NA',468).
dist('FI','RM',268).
dist('FI','TO',400).
dist('MI','NA',754).
dist('MI','RM',564).
dist('MI','TO',138).
dist('NA','RM',219).
dist('NA','TO',869).
dist('RM','TO',669).
Now, there seems to be some awesome structure to this problem to exploit, but I'm really struggling to get a grasp of it. I think I've got the first cluster here (though it may not be the most elegant way of doing it ;)):
minDist(A,B,D) :- dist(A,B,D), dist(X,Y,Z), A \= X, A \= Y, B \= X, B \= Y, D < Z.
min(A,B,B) :- B < A.
min(A,B,A) :- A < B.
dist([A,B],C,D) :- minDist(A,B,D), dist(A,C,Q), dist(B,C,W), min(Q,W,D).
The problem I have here is the concept of "replacing" the dist statements involving A and B with the cluster.
This just quickly became a brainteaser for me and I'm stuck. Any ideas on how to formulate this? Or is this perhaps just not the kind of problem elegantly solved with Prolog?
Your table is actually perfect! The problem is that you don't have an intermediate data structure. I'm guessing you'll find the following code pretty surprising. In Prolog, you can simply use whatever structures you want, and it will actually work. First let's get the preliminary we need for calculating distance without regard for argument order:
distance(X, Y, Dist) :- dist(X, Y, Dist) ; dist(Y, X, Dist).
This just swaps the order if it doesn't get a distance on the first try.
Another utility we'll need: the list of cities:
all_cities(['BA','FI','MI','NA','RM','TO']).
This is just helpful; we could compute it, but it would be tedious and weird looking.
OK, so the end of the linked article makes it clear that what is actually being created is a tree structure. The article doesn't show you the tree at all until you get to the end, so it isn't obvious that's what's going on in the merges. In Prolog, we can simply use the structure we want and there it is, and it will work. To demonstrate, let's enumerate the items in a tree with something like member/2 for lists:
% Our clustering forms a tree. So we need to be able to do some basic
% operations on the tree, like get all of the cities in the tree. This
% predicate shows how that is done, and shows what the structure of
% the cluster is going to look like.
cluster_member(X, leaf(X)).
cluster_member(X, cluster(Left, Right)) :-
    cluster_member(X, Left) ; cluster_member(X, Right).
So you can see we're going to be making use of trees using leaf('FI') for instance, to represent a leaf-node, a cluster of N=1, and cluster(X,Y) to represent a cluster tree with two branches. The code above lets you enumerate all the cities within a cluster, which we'll need to compute the minimum distance between them.
% To calculate the minimum distance between two cluster positions we
% need to basically pair up each city from each side of the cluster
% and find the minimum.
cluster_distance(X, Y, Distance) :-
    setof(D,
          XCity^YCity^(
              cluster_member(XCity, X),
              cluster_member(YCity, Y),
              distance(XCity, YCity, D)),
          [Distance|_]).
This probably looks pretty weird. I'm cheating here. The setof/3 metapredicate finds solutions for a particular goal. The calling pattern is something like setof(Template, Goal, Result) where the Result will become a list of Template for each Goal success. This is just like bagof/3 except that setof/3 gives you unique results. How does it do that? By sorting! My third argument is [Distance|_], saying just give me the first item in the result list. Because the result is sorted, the first item in the list will be the smallest. It's a big cheat!
The XCity^YCity^ notation says to setof/3: I don't care what these variables actually are. It marks them as "existential variables." This means Prolog will not provide multiple solutions for each city combination; they will all be thrown together and sorted once.
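The effect of the ^ notation is easiest to see side by side. The sketch below uses three of the dist/3 facts from the question; shortest/1 and per_pair/3 are hypothetical names chosen for this illustration.

```prolog
dist('MI','TO',138).
dist('NA','RM',219).
dist('BA','NA',255).

% A and B are existentially quantified: all distances end up in one
% sorted list, and its head is the overall minimum.
shortest(Min) :-
    setof(D, A^B^dist(A, B, D), [Min|_]).

% Without ^, A and B stay free: setof/3 backtracks over each (A,B)
% binding and returns a separate list of distances for every pair.
per_pair(A, B, Ds) :-
    setof(D, dist(A, B, D), Ds).
```

With these facts, shortest(Min) gives Min = 138, while per_pair(A, B, Ds) produces three separate solutions on backtracking, one per city pair.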
This is all we need to perform the clustering!
From the article, the base case is when you have two clusters left: just combine them:
% OK, the base case for clustering is that we have two items left, so
% we cluster them together.
cluster([Left,Right], cluster(Left,Right)).
The inductive case takes the list of results and finds the two which are nearest and combines them. Hold on!
% The inductive case is: pair up each cluster and find the minimum distance.
cluster(CityClusters, FinalCityClusters) :-
    CityClusters = [_,_,_|_], % ensure we have >2 clusters
    setof(result(D, cluster(N1,N2), CC2),
          CC1^(select(N1, CityClusters, CC1),
               select(N2, CC1, CC2),
               cluster_distance(N1, N2, D)),
          [result(_, NewCluster, Remainder)|_]),
    cluster([NewCluster|Remainder], FinalCityClusters).
In the standard order of terms, compound terms with the same name and arity are compared argument by argument from left to right, so Prolog's built-in sorting orders result/3 terms by their first argument. We cheat again here by creating a new structure, result/3, which contains the distance, the cluster with that distance, and the remaining items to be considered. select/3 is extremely handy: it pulls an item out of the list and gives you back the list without that item. We use it twice here to select two items from the list (so I don't have to worry about comparing a place to itself!). CC1 is existentially quantified with ^, so setof/3 does not backtrack over its bindings. A result structure is created for each possible pairing of the items we were given. Again, setof/3 sorts the list to make it unique, so the first item in the list is the one with the shortest distance. It's a lot of work for one setof/3 call, but I like to cheat!
The last line says: take the new cluster, put it in front of the remaining items, and pass the lot on recursively to ourselves. The recursion eventually reaches the base case.
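To see what the double use of select/3 enumerates, here is a minimal sketch in isolation. ordered_pairs/2 is a hypothetical helper; select/3 is the library(lists) predicate used in the clustering code.

```prolog
:- use_module(library(lists)).

% ordered_pairs(+Items, -Pairs): every way of picking one item and
% then a second, different item from what remains.
ordered_pairs(Items, Pairs) :-
    findall(A-B,
            ( select(A, Items, Rest),   % take A out of Items
              select(B, Rest, _) ),     % take B out of the remainder
            Pairs).
```

For instance, ordered_pairs([a,b,c], Ps) gives Ps = [a-b,a-c,b-a,b-c,c-a,c-b]. No item is ever paired with itself, but each unordered pair shows up in both orders, which is harmless for the clustering because both orders have the same distance.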
Now does it work? Let's make a quick-n-dirty main procedure to test it:
main :-
    setof(leaf(X), (all_cities(Cities), member(X, Cities)), Basis),
    cluster(Basis, Result),
    write(Result), nl.
Line one is a cheesy way to construct the initial conditions (all cities in their own cluster of one). Line two calls our predicate to cluster things. Then we write it out. What do we get? (Output manually indented for readability.)
cluster(
  cluster(
    leaf(FI),
    cluster(
      leaf(BA),
      cluster(
        leaf(NA),
        leaf(RM)))),
  cluster(
    leaf(MI),
    leaf(TO)))
The order is slightly different, but the result is the same!
If you're perplexed by my use of setof/3 (I would be!) then consider rewriting those predicates using the aggregate library or with simple recursive procedures that aggregate and find the minimum by hand.
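As a hedged sketch of the aggregate-library route, the minimum distance could be computed with aggregate_all/3 from SWI-Prolog's library(aggregate). The name cluster_distance_min/3 is made up here to avoid clashing with the setof/3 version, and only three dist/3 facts are repeated to keep the example self-contained.

```prolog
:- use_module(library(aggregate)).

dist('NA','RM',219).
dist('BA','NA',255).
dist('BA','RM',412).

distance(X, Y, D) :- dist(X, Y, D) ; dist(Y, X, D).

cluster_member(X, leaf(X)).
cluster_member(X, cluster(L, R)) :-
    cluster_member(X, L) ; cluster_member(X, R).

% Minimum distance over all city pairs drawn from the two clusters.
% aggregate_all(min(D), Goal, Min) fails if Goal has no solutions.
cluster_distance_min(X, Y, Distance) :-
    aggregate_all(min(D),
                  ( cluster_member(XC, X),
                    cluster_member(YC, Y),
                    distance(XC, YC, D) ),
                  Distance).
```

For example, cluster_distance_min(leaf('BA'), cluster(leaf('NA'),leaf('RM')), D) gives D = 255. This states the intent (take the minimum) directly instead of relying on the sorted head of a setof/3 result.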
I'm in a bit of a pickle in Prolog.
I have a collection of objects. These objects have a certain dimension, hence weight.
I want to split up these objects in 2 sets (which form the entire set together) in such a way that their difference in total weight is minimal.
The first thing I tried was the following (pseudo-code):
-> findall with predicate createSets(List, set(A, B))
-> iterate over results while
---> calculate weight of both
---> calculate difference
---> loop with current difference and compare to current difference
till end of list of sets
This is pretty straightforward. The issue here is that I have a list of roughly 30 objects. Creating all possible sets causes a stack overflow.
Helper predicates:
sublist([],[]).
sublist(X, [_ | RestY]) :-
    sublist(X,RestY).
sublist([Item|RestX], [Item|RestY]) :-
    sublist(RestX,RestY).
subtract([], _, []) :-
    !.
subtract([Head|Tail],ToSubstractList,Result) :-
    memberchk(Head,ToSubstractList),
    !,
    subtract(Tail, ToSubstractList, Result).
subtract([Head|Tail], ToSubstractList, [Head|ResultTail]) :-
    !,
    subtract(Tail,ToSubstractList,ResultTail).
generateAllPossibleSubsets(ListToSplit,sets(Sublist,SecondPart)) :-
    sublist(Sublist,ListToSplit),
    subtract(ListToSplit, Sublist, SecondPart).
These can then be used as follows:
:- findall(Set, generateAllPossibleSubsets(ObjectList,Set), ListOfSets ),
findMinimalDifference(ListOfSets,Set).
So because I think this is a wrong way to do it, I figured I'd try it in an iterative way. This is what I have so far:
totalWeightOfSet([],0).
totalWeightOfSet([Head|RestOfSet],Weight) :-
    objectWeight(Head,HeadWeight),
    totalWeightOfSet(RestOfSet, RestWeight),
    Weight is HeadWeight + RestWeight.
findBestBalancedSet(ListOfObjects,Sets) :-
    generateAllPossibleSubsets(ListOfObjects,sets(A,B)),
    totalWeightOfSet(A,WeightA),
    totalWeightOfSet(B,WeightB),
    Temp is WeightA - WeightB,
    abs(Temp, Difference),
    betterSets(ListOfObjects, Difference, Sets).
betterSets(ListOfObjects,OriginalDifference,sets(A,B)) :-
    generateAllPossibleSubsets(ListOfObjects,sets(A,B)),
    totalWeightOfSet(A,WeightA),
    totalWeightOfSet(B,WeightB),
    Temp is WeightA - WeightB,
    abs(Temp, Difference),
    OriginalDifference > Difference,
    !,
    betterSets(ListOfObjects, Difference, sets(A, B)).
betterSets(_,Difference,sets(A,B)) :-
    write_ln(Difference).
The issue here is that it returns a better result, but it hasn't traversed the entire solution tree. I have a feeling this is a default Prolog scheme I'm missing here.
So basically I want it to tell me "these two sets have the minimal difference".
Edit:
What are the pros and cons of using manual list iteration vs recursion through fail
This is a possible solution (the recursion through fail) except that it can not fail, since that won't return the best set.
I would generate the 30 objects list, sort it descending on weight, then pop objects off the sorted list one by one and put each into one or the other of the two sets, so that I get the minimal difference between the two sets on each step. Each time we add an element to a set, just add together their weights, to keep track of the set's weight. Start with two empty sets, each with a total weight of 0.
It probably won't be the best partition, but it might come close.
A very straightforward implementation:
pair(A,B,A-B).
near_balanced_partition(L,S1,S2):-
    maplist(weight,L,W),        % user-supplied predicate weight(+E,?W).
    maplist(pair,W,L,WL),
    keysort(WL,SL),
    reverse(SL,SLR),
    partition(SLR,0,[],0,[],S1,S2).
partition([],_,A,_,B,A,B).
partition([N-E|R],N1,L1,N2,L2,S1,S2):-
    (   abs(N2-N1-N) < abs(N1-N2-N)
    ->  N3 is N1+N,
        partition(R,N3,[E|L1],N2,L2,S1,S2)
    ;   N3 is N2+N,
        partition(R,N1,L1,N3,[E|L2],S1,S2)
    ).
If you insist on finding the precise answer, you will have to generate all the partitions of your list into two sets. Then while generating, you'd keep the current best.
The most important thing left is to find the way to generate them iteratively.
A given object is either included in the first subset, or the second (you don't mention whether they're all different; let's assume they are). We thus have a 30-bit number that represents the partition. This allows us to enumerate them independently, so our state is minimal. For 30 objects there will be 2^30 ~= 10^9 generated partitions.
exact_partition(L,S1,S2):-
    maplist(weight,L,W),        % user-supplied predicate weight(+E,?W).
    maplist(pair,W,L,WL),
    keysort(WL,SL),             % not necessary here except for the aesthetics
    length(L,Len), length(Num,Len), maplist(=(0),Num),
    .....
You will have to implement the binary arithmetic to add 1 to Num on each step, and generate the two subsets from SL according to the new Num, possibly in one fused operation. For each freshly generated subset, it's easy to calculate its weight (this calculation too can be fused into the same generating operation):
maplist(pair,Ws,_,Subset1),
sumlist(Ws,Weight1),
.....
This binary number, Num, is all that represents our current position in the search space, together with the unchanging list SL. Thus the search will be iterative, i.e. running in constant space.
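The following is one possible way to fill in the counter-driven search, offered as a sketch rather than as the code elided above. It assumes the input is already a list of Weight-Element pairs (like SL), stores Num with the least significant bit first, and uses the hypothetical helpers search/5, inc/2 and split_by_bits/7.

```prolog
% exact_partition(+Pairs, -S1, -S2): Pairs is a list of Weight-Element
% pairs (assumed input format).
exact_partition(Pairs, S1, S2) :-
    length(Pairs, Len),
    length(Num, Len),
    maplist(=(0), Num),                        % counter starts at 00...0
    once(split_by_bits(Num, Pairs, 0, 0, A0, B0, D0)),
    search(Num, Pairs, D0, A0-B0, S1-S2).

% search/5 steps through every bit pattern, keeping the best split so far.
search(Num, Pairs, D0, Best0, Best) :-
    (   inc(Num, Num1)
    ->  once(split_by_bits(Num1, Pairs, 0, 0, A, B, D)),
        (   D < D0
        ->  search(Num1, Pairs, D, A-B, Best)
        ;   search(Num1, Pairs, D0, Best0, Best)
        )
    ;   Best = Best0                           % counter overflowed: done
    ).

% inc(+Bits, -Bits1): add 1 to a little-endian bit list; fails on overflow.
inc([0|Bs], [1|Bs]).
inc([1|Bs], [0|Bs1]) :- inc(Bs, Bs1).

% Bit 0 sends the element to S1, bit 1 to S2; the weights are summed in
% the same pass, and Diff is the absolute difference of the two totals.
split_by_bits([], [], W1, W2, [], [], Diff) :-
    Diff is abs(W1 - W2).
split_by_bits([0|Bs], [W-E|Ps], W1, W2, [E|S1], S2, Diff) :-
    W3 is W1 + W,
    split_by_bits(Bs, Ps, W3, W2, S1, S2, Diff).
split_by_bits([1|Bs], [W-E|Ps], W1, W2, S1, [E|S2], Diff) :-
    W3 is W2 + W,
    split_by_bits(Bs, Ps, W1, W3, S1, S2, Diff).
```

For example, exact_partition([3-a,1-b,2-c], S1, S2) gives S1 = [b,c] and S2 = [a]. The once/1 calls discard leftover choice points so that the recursive calls to search/5 can be last calls. Every partition is still visited twice (once per mirror image), so fixing the first bit would halve the work.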