List Recursive Predicates, advanced - prolog

Just going through some tutorials and I found some advanced recursive procedurs like flatten. I have tried to google to find similar examples that involve multiple recursions (head tail) but could not get the result I required.
Could you infer some predicates or tutorials that cover advanced list recusions (on both head, tail)?

Just to expand a bit on what #hardmath is saying, let's look at the definition of lists:
Base case: []
Inductive case: [Head|Tail]
What makes this a recursive data structure is that Tail is also a list. So when you see [1,2,3], you're also seeing [1|[2|[3|[]]]]. Let's prove it:
?- X = [1|[2|[3|[]]]].
X = [1, 2, 3].
So more "advanced" forms of recursion are forms that either involve more complex recursive data types or more complex computations. The next recursive data type most people are exposed to are binary trees, and binary trees have the nice property that they have two branches per node, so let's look at trees for a second.
First we need a nice definition like the definition from lists. I propose the following:
Base case: empty
Inductive case: tree(LeftBranch, Value, RightBranch)
Now let's create some example trees just to get a feel for how they look:
% this is like the empty list: no data
empty
% this is your basic tree of one node
tree(empty, 1, empty)
% this is a tree with two nodes
tree(tree(empty, 1, empty), 2, empty).
Structurally, the last example there would probably look something like this:
2
/
1
Now let's make a fuller example with several levels. Let's build this tree:
10
/ \
5 9
/ \ / \
4 6 7 14
In our Prolog syntax it's going to look like this:
tree(tree(tree(empty, 4, empty), 5, tree(empty, 6, empty)),
10,
tree(tree(empty, 7, empty), 9, tree(empty, 14, empty)))
The first thing we're going to want is a way to add up the size of the tree. Like with lists, we need to consider our base case and then our inductive cases.
% base case
tree_size(empty, 0).
% inductive case
tree_size(tree(Left, _, Right), Size) :-
tree_size(Left, LeftSize),
tree_size(Right, RightSize),
Size is LeftSize + RightSize + 1.
For comparison, let's look at list length:
% base case
length([], 0).
% inductive case
length([_|Rest], Length) :-
length(Rest, LengthOfRest),
Length is LengthOfRest + 1.
Edit: #false points out that though the above is intuitive, a version with better logical properties can be produced by changing the inductive case to:
length([_|Rest], Length) :-
length(Rest, LengthOfRest),
succ(LengthOfRest, Length).
So you can see the hallmarks of recursively processing data structures clearly by comparing these two:
You are given a recursive data structure, defined in terms of base cases and inductive cases.
You write the base of your rule to handle the base case.
This step is usually obvious; in the case of length or size, your data structure will have a base case that is empty so you just have to associate zero with that case.
You write the inductive step of your rule.
The inductive step takes the recursive case of the data structure and handles whatever that case adds, and combining that with the result of recursively calling your rule to process "the rest" of the data structure.
Because lists are only recursive in one direction there's only one recursive call in most list processing rules. Because trees have two branches there can be one or two depending on whether you need to process the whole tree or just go down one path. Both lists and trees effectively have two "constructors," so most rules will have two bodies, one to handle the empty case and one to handle the inductive case. More complex structures, such as language grammars, can have more than two basic patterns, and usually you'll either process all of them separately or you'll just be seeking out one pattern in particular.
As an exercise, you may want to try writing search, insert, height, balance or is_balanced and various other tree queries to get more familiar with the process.

Related

Prolog recursion doesn't work the other way

I am learning Prolog and I got stuck in the code where the knowledge base and base rules are:
rectangle(a1, 3, 5). %Name,Width,Height
rectangle(a2, 1, 2).
rectangle(a3, 4, 3).
calcWH(T,C) :-
rectangle(T,W,_), C is W. % finds the width of the rect.
The recursive part is:
calcWTH([],0). % Base case
calcWTH([H | T ] ,K) :-
length([H|T],L),
L =< 3,
calcWTH(T,M),
calcWH(H,O),
K is O+M.
What I want to do with this code is, with calcWTH I want to calculate total width of the list of rectangles. For example
calcWTH([a1,a2,a3],A)
returns 8, as expected. The thing I'm puzzled is that this code only works for one way, not the other. For example when I make a query like
calcWTH(A,3)
It first finds A= [a1], then A=[a2,a2,a2], and afterwards I expect the program to stop because of the part length([H|T],L), L =< 3 But It gets into a loop indefinitely. What am I missing here?
Think about what this is doing:
length([H|T],L), L =< 3
Make a list
Check its length is less than 3, if it is - then success, other wise back track to length/2 and try making another list and then check if that list is length is less than 3.
But we know that list/2 is creating lists in order of larger size,
so we know that if list of length 3 fails, then the next list
created on back tracking will be bigger and all other lists.. but
prolog does not know that. You could imagine writing a length
predicate that searchs for lists in a differnt order, maybe
randomly, or from a predefined big size to ever smaller lists.
Then we would want the backtracking to work as is.
Anyway that is why you get infinite recursion, it will keep trying new lists, of ever increasing size.
You might find adding constraints is a good way to solve your problem:
Using a constrained variable with `length/2`

Prolog get the number of nested list + 1 where an element is into a list

I'd like to return the depth level or number of nested list where certain element is. Also as condition the list doesn't have repeated elements. I am trying to understand this solution where I have two main doubts:
profundidad([],_,0):-!.
profundidad([A],A,1):-!.
profundidad([A|_],A,1):-!.
profundidad([H|_],A,N):-
profundidad(H,A,R),N is R+1.
profundidad([_|X],A,N):-
profundidad(X,A,N),!.
The correct output would be:
profundidad([2,3,4,5,[[6]],10],6,X).
X = 3
First, why we do put the cut operator ! from 1-3 statements? I know it prevents compiler from considering later statements when a solution is found.
Second, how we could read 4th and 5th cases in natural language?
The depth of an element A when the list is splitted by the head H and the rest _, is equal to the number R of steps plus 1.
profundidad([H|_],A,N):-
profundidad(H,A,R),N is R+1.
And those two sentences I think they are the same as previous ones but to go forward into the list:
profundidad([_|X],A,N):-
profundidad(X,A,N),!.
Plus, I am doubting now about why to not put [] into recursive called to:
profundidad(X,A,N),!.
I think it is to go deep into the nested lists but I am not sure.
Thank you.
It's better to avoid cuts when possible, in the following rewrite and simplification the test H\=A make the three clauses disjunctive.
profundidad([A|_],A,1).
profundidad([H|_],A,N):-
% H\=A,
profundidad(H,A,R),
N is R+1.
profundidad([H|T],A,N):-
H\=A,
profundidad(T,A,N).
The second clause doesn't need the test, since it goes down a level in the list, meaning it will succeed when it's a list, and then cannot unify with the target element anyway. For clarity it could stay there - it doesn't harm.
If your Prolog has dif/2, you could use that instead of (\=)/2, depending on your requirements about the generality (WRT variables instantiation) of the solution.

extracting algorithm for building prolog rules

I have to create some "list of predicates" in prolog.
But I don't fully understand the way of thinking
that I have to learn if I want to create some working predicates.
I've seen some popular tutorials (maybe I've not been searching precisely enough), but I can not find any tutorial that teaches how to plan an algorithm using REALLY elementary steps.
For example...
Task:
Write a concat(X,Y,Z). predicate that is taking elements from the lists X and Y and concatenates them in the list Z.
My analysing algorithm:
Firstly I define a domain of the number of elements I'll be concatenating (the lengths of the lists X and Y) as non-negative integers (XCount >= 0 and YCount >= 0). Then I create a predicate for the first case, which is XCount = 0 and YCount = 0:
concat([],[],[]).
... then test it and find that it is working for the first case.
Then, I create a predicate for the second case, where XCount = 1 and YCount = 0, as:
concat(X,[],X).
... and again test it and find that it is working with some unexpected positive result.
Results:
I can see that this algorithm is working not only for XCount = 1 but also for XCount = 0. So I can delete concat([],[],[]). and have only concat(X,[],X)., because X = [] inside predicate concat(X,[],X). is the same as concat([],[],[])..
The second unexpected result is that the algorithm is working not only for XCount in 0,1 but for all XCount >= 0.
Then I analyse the domain and search for elements that weren't handled yet, and find that the simplest way is to create a second predicate for YCount > 0.
Remembering that using just X as the first argument can cover all XCount >= 0, I create a case for YCount = 1 and all Xes, which is:
concat(X,[Y|_],[Y|X]).
And this is the place where my algorithm gets a brain-buffer overflow.
Respecting stackoverflow rules, I'm asking precisely.
Questions:
Is there any way to find the answer by myself? By which I mean - not an answer for the problem, but for the algorithm I've shown to solve it.
In the other words, the algorithm of my algorithm.
If you can answer question 1, how can I find this type of hint in future? Is there a specific name for my problem?
How precise do I have to be - how many cases and in what language can I try to implement my algorithm that is not just "doing" things, but is "thinking" how to plan and create other algorithms.
Lists are not defined as counts of elements in them. lists are defined recursively, as empty, or a pair of an element and the rest of elements:
list([]).
list([_A|B]) :- list(B).
Lists can be the same:
same_lists([], []).
same_lists([A|B], [A|C]) :- same_lists(B, C).
Or one can be shorter than the other, i.e. its prefix:
list_prefix([], L):- list(L).
list_prefix([A|B], [A|C]):- list_prefix(B, C).
Where the prefix ends, the suffix begins:
list_split([], L, L):- list(L).
list_split([A|B], Sfx, [A|C]):- list_split(B, Sfx, C).
So, general advice is: follow the types, how they are constructed, and analyze the situation according to all possible cases. With lists, it is either empty, or non-empty lists.

Hierarchical Agglomerative Clustering in Prolog

I'm trying to re-familiarize myself with Prolog and I thought this could be the type of problem with an elegant solution in Prolog.
I'm following along this example:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
I've tried a variety of data formats:
dist('BA','FI',662).
dist(0,'BA','FI',662).
dist(['BA'],['FI'],662).
but I haven't found any particular one most suitable.
Here's all the data in the first format:
%% Graph distances
dist('BA','FI',662).
dist('BA','MI',877).
dist('BA','NA',255).
dist('BA','RM',412).
dist('BA','TO',996).
dist('FI','MI',295).
dist('FI','NA',468).
dist('FI','RM',268).
dist('FI','TO',400).
dist('MI','NA',754).
dist('MI','RM',564).
dist('MI','TO',138).
dist('NA','RM',219).
dist('NA','TO',869).
dist('RM','TO',669).
Now, there seems to be some awesome structure to this problem to exploit, but I'm really struggling to get a grasp of it. I think I've got the first cluster here (thought it may not be the most elegant way of doing it ;)
minDist(A,B,D) :- dist(A,B,D), dist(X,Y,Z), A \= X, A \= Y, B \= X, B \= Y, D < Z.
min(A,B,B) :- B < A
min(A,B,A) :- A < B
dist([A,B],C, D) :- minDist(A,B,D), dist(A,C,Q), dist(B,C,W), min(Q,W,D)
The problem I have here is the concept of "replacing" the dist statements involving A and B with the cluster.
This just quickly become a brainteaser for me and I'm stuck. Any ideas on how to formulate this? Or is this perhaps just not the kind of problem elegantly solved with Prolog?
Your table is actually perfect! The problem is that you don't have an intermediate data structure. I'm guessing you'll find the following code pretty surprising. In Prolog, you can simply use whatever structures you want, and it will actually work. First let's get the preliminary we need for calculating distance without regard for argument order:
distance(X, Y, Dist) :- dist(X, Y, Dist) ; dist(Y, X, Dist).
This just swaps the order if it doesn't get a distance on the first try.
Another utility we'll need: the list of cities:
all_cities(['BA','FI','MI','NA','RM','TO']).
This is just helpful; we could compute it, but it would be tedious and weird looking.
OK, so the end of the linked article makes it clear that what is actually being created is a tree structure. The article doesn't show you the tree at all until you get to the end, so it isn't obvious that's what's going on in the merges. In Prolog, we can simply use the structure we want and there it is, and it will work. To demonstrate, let's enumerate the items in a tree with something like member/2 for lists:
% Our clustering forms a tree. So we need to be able to do some basic
% operations on the tree, like get all of the cities in the tree. This
% predicate shows how that is done, and shows what the structure of
% the cluster is going to look like.
cluster_member(X, leaf(X)).
cluster_member(X, cluster(Left, Right)) :-
cluster_member(X, Left) ; cluster_member(X, Right).
So you can see we're going to be making use of trees using leaf('FI') for instance, to represent a leaf-node, a cluster of N=1, and cluster(X,Y) to represent a cluster tree with two branches. The code above lets you enumerate all the cities within a cluster, which we'll need to compute the minimum distance between them.
% To calculate the minimum distance between two cluster positions we
% need to basically pair up each city from each side of the cluster
% and find the minimum.
cluster_distance(X, Y, Distance) :-
setof(D,
XCity^YCity^(
cluster_member(XCity, X),
cluster_member(YCity, Y),
distance(XCity, YCity, D)),
[Distance|_]).
This probably looks pretty weird. I'm cheating here. The setof/3 metapredicate finds solutions for a particular goal. The calling pattern is something like setof(Template, Goal, Result) where the Result will become a list of Template for each Goal success. This is just like bagof/3 except that setof/3 gives you unique results. How does it do that? By sorting! My third argument is [Distance|_], saying just give me the first item in the result list. Because the result is sorted, the first item in the list will be the smallest. It's a big cheat!
The XCity^YCity^ notation says to setof/3: I don't care what these variables actually are. It marks them as "existential variables." This means Prolog will not provide multiple solutions for each city combination; they will all be thrown together and sorted once.
This is all we need to perform the clustering!
From the article, the base case is when you have two clusters left: just combine them:
% OK, the base case for clustering is that we have two items left, so
% we cluster them together.
cluster([Left,Right], cluster(Left,Right)).
The inductive case takes the list of results and finds the two which are nearest and combines them. Hold on!
% The inductive case is: pair up each cluster and find the minimum distance.
cluster(CityClusters, FinalCityClusters) :-
CityClusters = [_,_,_|_], % ensure we have >2 clusters
setof(result(D, cluster(N1,N2), CC2),
CC1^(select(N1, CityClusters, CC1),
select(N2, CC1, CC2),
cluster_distance(N1, N2, D)),
[result(_, NewCluster, Remainder)|_]),
cluster([NewCluster|Remainder], FinalCityClusters).
Prolog's built-in sorting is to sort a structure on the first argument. We cheat again here by creating a new structure, result/3, which will contain the distance, the cluster with that distance, and the remaining items to be considered. select/3 is extremely handy. It works by pulling an item out of the list and then giving you back the list without that item. We use it twice here to select two items from the list (I don't have to worry about comparing a place to itself as a result!). CC1 is marked as a free variable. The result structures will be created for considering each possible cluster with the items we were given. Again, setof/3 will sort the list to make it unique, so the first item in the list will happen to be the one with the shortest distance. It's a lot of work for one setof/3 call, but I like to cheat!
The last line says, take the new cluster and append it to the remaining items, and forward it on recursively to ourself. The result of that invocation will eventually be the base case.
Now does it work? Let's make a quick-n-dirty main procedure to test it:
main :-
setof(leaf(X), (all_cities(Cities), member(X, Cities)), Basis),
cluster(Basis, Result),
write(Result), nl.
Line one is a cheesy way to construct the initial conditions (all cities in their own cluster of one). Line two calls our predicate to cluster things. Then we write it out. What do we get? (Output manually indented for readability.)
cluster(
cluster(
leaf(FI),
cluster(
leaf(BA),
cluster(
leaf(NA),
leaf(RM)))),
cluster(
leaf(MI),
leaf(TO)))
The order is slightly different, but the result is the same!
If you're perplexed by my use of setof/3 (I would be!) then consider rewriting those predicates using the aggregate library or with simple recursive procedures that aggregate and find the minimum by hand.

Find best result without findall and a filter

I'm in a bit of pickle in Prolog.
I have a collection of objects. These objects have a certain dimension, hence weight.
I want to split up these objects in 2 sets (which form the entire set together) in such a way that their difference in total weight is minimal.
The first thing I tried was the following (pseudo-code):
-> findall with predicate createSets(List, set(A, B))
-> iterate over results while
---> calculate weight of both
---> calculate difference
---> loop with current difference and compare to current difference
till end of list of sets
This is pretty straightforward. The issue here is that I have a list of +/- 30 objects. Creating all possible sets causes a stack overflow.
Helper predicates:
sublist([],[]).
sublist(X, [_ | RestY]) :-
sublist(X,RestY).
sublist([Item|RestX], [Item|RestY]) :-
sublist(RestX,RestY).
subtract([], _, []) :-
!.
subtract([Head|Tail],ToSubstractList,Result) :-
memberchk(Head,ToSubstractList),
!,
subtract(Tail, ToSubstractList, Result).
subtract([Head|Tail], ToSubstractList, [Head|ResultTail]) :-
!,
subtract(Tail,ToSubstractList,ResultTail).
generateAllPossibleSubsets(ListToSplit,sets(Sublist,SecondPart)) :-
sublist(Sublist,ListToSplit),
subtract(ListToSplit, Sublist, SecondPart).
These can then be used as follows:
:- findall(Set, generateAllPossibleSubsets(ObjectList,Set), ListOfSets ),
findMinimalDifference(ListOfSets,Set).
So because I think this is a wrong way to do it, I figured I'd try it in an iterative way. This is what I have so far:
totalWeightOfSet([],0).
totalWeightOfSet([Head|RestOfSet],Weight) :-
objectWeight(Head,HeadWeight),
totalWeightOfSet(RestOfSet, RestWeight),
Weight is HeadWeight + RestWeight.
findBestBalancedSet(ListOfObjects,Sets) :-
generateAllPossibleSubsets(ListOfObjects,sets(A,B)),
totalWeightOfSet(A,WeightA),
totalWeightOfSet(B,WeightB),
Temp is WeightA - WeightB,
abs(Temp, Difference),
betterSets(ListOfObjects, Difference, Sets).
betterSets(ListOfObjects,OriginalDifference,sets(A,B)) :-
generateAllPossibleSubsets(ListOfObjects,sets(A,B)),
totalWeightOfSet(A,WeightA),
totalWeightOfSet(B,WeightB),
Temp is WeightA - WeightB,
abs(Temp, Difference),
OriginalDifference > Difference,
!,
betterSets(ListOfObjects, Difference, sets(A, B)).
betterSets(_,Difference,sets(A,B)) :-
write_ln(Difference).
The issue here is that it returns a better result, but it hasn't traversed the entire solution tree. I have a feeling this is a default Prolog scheme I'm missing here.
So basically I want it to tell me "these two sets have the minimal difference".
Edit:
What are the pros and cons of using manual list iteration vs recursion through fail
This is a possible solution (the recursion through fail) except that it can not fail, since that won't return the best set.
I would generate the 30 objects list, sort it descending on weight, then pop objects off the sorted list one by one and put each into one or the other of the two sets, so that I get the minimal difference between the two sets on each step. Each time we add an element to a set, just add together their weights, to keep track of the set's weight. Start with two empty sets, each with a total weight of 0.
It won't be the best partition probably, but might come close to it.
A very straightforward implementation:
pair(A,B,A-B).
near_balanced_partition(L,S1,S2):-
maplist(weight,L,W), %// user-supplied predicate weight(+E,?W).
maplist(pair,W,L,WL),
keysort(WL,SL),
reverse(SL,SLR),
partition(SLR,0,[],0,[],S1,S2).
partition([],_,A,_,B,A,B).
partition([N-E|R],N1,L1,N2,L2,S1,S2):-
( abs(N2-N1-N) < abs(N1-N2-N)
-> N3 is N1+N,
partition(R,N3,[E|L1],N2,L2,S1,S2)
; N3 is N2+N,
partition(R,N1,L1,N3,[E|L2],S1,S2)
).
If you insist on finding the precise answer, you will have to generate all the partitions of your list into two sets. Then while generating, you'd keep the current best.
The most important thing left is to find the way to generate them iteratively.
A given object is either included in the first subset, or the second (you don't mention whether they're all different; let's assume they are). We thus have a 30-bit number that represents the partition. This allows us to enumerate them independently, so our state is minimal. For 30 objects there will be 2^30 ~= 10^9 generated partitions.
exact_partition(L,S1,S2):-
maplist(weight,L,W), %// user-supplied predicate weight(+E,?W).
maplist(pair,W,L,WL),
keysort(WL,SL), %// not necessary here except for the aesthetics
length(L,Len), length(Num,Len), maplist(=(0),Num),
.....
You will have to implement the binary arithmetics to add 1 to Num on each step, and generate the two subsets from SL according to the new Num, possibly in one fused operation. For each freshly generated subset, it's easy to calculate its weight (this calculation too can be fused into the same generating operation):
maplist(pair,Ws,_,Subset1),
sumlist(Ws,Weight1),
.....
This binary number, Num, is all that represents our current position in the search space, together with the unchanging list SL. Thus the search will be iterative, i.e. running in constant space.

Resources