Functionally comparing data sets to each other once with Haskell - performance

After over a year of mental wrangling, I finally understand Haskell well enough to consider it my primary language for the majority of my general programming needs. I absolutely love it.
But I still struggle with doing very specific operations in a functional way.
A simplified example:
Set = [("Bob", 10), ("Megan", 7), ("Frank", 2), ("Jane", 11)]
I'd like to compare these entries to each other. With a language like C or Python, I'd probably create some complicated loop, but I'm not sure which approach (map, fold, list comprehension?) would be best or most efficient with a functional language.
Here's a sample of the code I started working on:
run xs = [ someAlgorithm (snd x) (snd y) | x <- xs, y <- xs, x /= y ]
The predicate keeps the list comprehension from comparing entries with themselves, but the function isn't very efficient because it compares entries that have already been compared. For example, it'll compare Bob with Megan, and then compare Megan with Bob.
Any advice on how to solve this issue would be greatly appreciated.

If you have an ordering on your data type, you can just use x < y instead of x /= y.
Another approach is to use tails (from Data.List) to avoid comparing elements in the same position:
[ ... | (x:ys) <- tails xs, y <- ys]
This has the effect of only picking items y that occur after x in the original list. If your list contains duplicates, you'll want to combine this with the explicit filtering from before.


Join two list in prolog?

I am learning Prolog and am writing a predicate to join two lists. For example, if I query:
joinL([22,33,44],[1,2,3],L)
It will show L = [22,33,44,1,2,3].
To do this, I tried to write the predicate as follows:
joinL([],L2,L2).
joinL([H|T],L2,L):-joinL(T,L2,L),L = [H|L].
But when I query
joinL([22,33,44],[1,2,3],L)
it does not show the desired result I have just described. In fact, it returns false.
My question is: "What is wrong with my code?" I am NOT asking "How do I write a predicate that joins two lists in Prolog?", because I can google that easily. Having compared those solutions with my code, I am curious to know why my code is wrong. Can anyone help me? Thank you all for reading and answering my question!
The problem is that you are using the = in the same way as one would use assignment:
L = [H|L]
In a state-changing language this means that whatever is stored in L (which is supposed to be a list) becomes a new list, made by tacking H to the front: [H|L]
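In Prolog this states that what we know about L is that it is equal to [H|L], that is, equal to itself with H tacked onto the front. No finite list satisfies this. Only an infinite list consisting solely of H would, and many Prolog systems will in fact build such a cyclic term when L is still unbound, because unification skips the occurs check by default. In your query, however, L has already been bound by the recursive call by the time this goal is reached: the base case sets L = [1,2,3], and [1,2,3] cannot be unified with [44,1,2,3]. Prolog's proof search fails at that hurdle and returns "false": there are no solutions to the logic program you have entered.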
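The difference between assignment and unification can be tried out directly. The following is only a sketch: bound_then_extend/2 and no_finite_solution/0 are hypothetical helpers that are not part of the question, and unify_with_occurs_check/2 is the standard built-in for sound unification.

```prolog
% Hypothetical helper: replays the last step of the failing clause
% with L already bound, as it is after the recursive call returns.
bound_then_extend(H, L) :-
    L = [1,2,3],      % what the recursive call has already established
    L = [H|L].        % a claim about L, not an update of L

% With the occurs check switched on, the equation L = [h|L] has no
% solution even for a fresh, unbound L.
no_finite_solution :-
    \+ unify_with_occurs_check(L, [h|L]).

% ?- bound_then_extend(44, L).   % false
% ?- no_finite_solution.         % true
```

The first query fails for the same reason the original joinL/3 does: [1,2,3] and [44,1,2,3] are simply different lists.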
(More after a coffee)
Here is how to think about this:
Ok, so I would like to state some logic facts about the problem of "list concatenation" so that, based on those logic facts, and given two completely-specified lists L1, L2, Prolog's proof search can determine enough about what the concatenated list LJ should look like to actually output it completely!
We decide to specify a predicate joinL(list1,list2,joinedlist) to express this.
First, we cover a special edge case:
joinL([],L2,LJ) :- LJ = L2.
So, it is stated that the 'joinL' relationship between the empty list '[]', a list 'L2' and the joined list 'LJ' is such that 'LJ' is necessarily equal to 'L2'.
The logical reading is:
(LJ = L2) → joinL([],L2,LJ)
The operational reading is:
In order to prove joinL([],L2,LJ) you must prove LJ = L2 (which can either be verified if LJ and L2 are already known or can be added to the solution's known constraints if not).
There is also the reading of the SLD resolution, where you add the negation of joinL([],L2,LJ) to your set of logic facts, then try to prove ⊥ (the contradiction, also known as the empty clause) using resolution, but I have not found that view to be particularly helpful.
Anyway, let's state more things about the edge cases:
joinL([],L2,LJ) :- LJ = L2.
joinL(L1,[],LJ) :- LJ = L1.
joinL([],[],LJ) :- LJ = [].
This will already enable the Prolog proof engine to determine LJ completely whenever either L1 or L2 is the empty list.
One commonly abbreviates to:
joinL([],L,L).
joinL(L,[],L).
joinL([],[],[]).
(The above abbreviation would not be possible in Picat for example)
And the third statement can be dropped because the other two "subsume it" - they cover that case more generally. Thus:
joinL([],L,L).
joinL(L,[],L).
Now for the case of non-empty lists. A large part of logic programming is about inductive (or recursive) definitions of predicates, so let's go:
joinL([H|T],L2,LJ) :- LJ = [H|LX], joinL(T,L2,LX).
Again, this is just a specification, where we say that the concatenation of a nonempty list [H|T] and any list L2 is a list LJ such that LJ is composed of H and a list LX and LX is the concatenation of T and L2.
This is useful to the Prolog proof engine because it gives more information about LJ (in fact, it specifies what the first element of LJ is) and reduces the problem to finding out more using the same predicate but a problem that is a little nearer to the base case with the empty list: joinL(T,L2,LX). If the proof goes down that route it will eventually hit joinL([],L2,LX), find out that L2 = LX and be able to successfully return from its descent.
joinL([H|T],L2,LJ) :- LJ = [H|LX], joinL(T,L2,LX).
is commonly abbreviated to
joinL([H|T],L2,[H|LX]) :- joinL(T,L2,LX).
Looks like we have covered everything with:
joinL([],L,L).
joinL(L,[],L).
joinL([H|T],L2,[H|LX]) :- joinL(T,L2,LX).
We can even drop the second statement, as it is covered by the recursive descent with L2 always equal to '[]'. It gives us a shorter program which burns cycles needlessly when L2 is '[]':
joinL([],L,L).
joinL([H|T],L2,[H|LX]) :- joinL(T,L2,LX).
Let's test this. One should use unit tests but I can't be bothered now and will just run these in SWISH. Let's see what Prolog can find out about X:
joinL([],[],X). % X = []
joinL([1,2],[],X). % X = [1,2]
joinL([],[1,2],X). % X = [1,2]
joinL([3,4],[1,2],X). % X = [3,4,1,2]
joinL([1,2],[3,4],X). % X = [1,2,3,4]
One can constrain the result completely, transforming Prolog into a checker:
joinL([3,4],[1,2],[3,4,1,2]). % true
joinL([3,4],[1,2],[1,1,1,1]). % false
Sometimes the predicate works backwards too, but often more careful design is needed. Not here:
joinL([3,4],L2,[3,4,1,2]). % L2 = [1, 2]
For this one, Prolog suggests a second solution might exist but there is none of course:
joinL(L1,[3,4],[1,2,3,4]). % L1 = [1, 2]
Find me something impossible:
joinL(L1,[3,4],[1,2,100,100]). % false
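Because joinL/3 is a pure relation, it can also be queried with both input lists unknown, in which case Prolog enumerates every way of splitting the joined list. A minimal sketch, assuming a system that provides findall/3; the helper name splits/2 is made up for this illustration:

```prolog
% The two-clause definition arrived at above.
joinL([], L, L).
joinL([H|T], L2, [H|LX]) :- joinL(T, L2, LX).

% splits(+List, -Pairs): hypothetical helper collecting every
% Front-Back pair whose concatenation is List.
splits(List, Pairs) :-
    findall(Front-Back, joinL(Front, Back, List), Pairs).

% ?- splits([1,2,3], Pairs).
% Pairs = [[]-[1,2,3], [1]-[2,3], [1,2]-[3], [1,2,3]-[]].
```

The solutions appear in this order because the base clause is tried before the recursive one.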

Prolog: Check if X is in range of 0 to K - 1

I'm new to prolog and every single bit of code I write turns into an infinite loop.
I'm specifically trying to see if X is in the range from 0 to K - 1.
range(X,X).
range(X,K) :- K0 is K - 1, range(X,K0).
My idea behind the code is that I decrement K until K0 equals X, then the base case will kick in. I'm getting an infinite loop though, so something with the way I'm thinking is wrong.
Welcome to the wondrous world of Prolog! It seems you tried to leapfrog several steps when learning Prolog, and (not very surprisingly) failed.
Ideally, you take a book like Art of Prolog and start with family relations. Then extend towards natural numbers using successor-arithmetics, and only then go to (is)/2. Today, (that is, since about 1996) there is even a better way than using (is)/2 which is library(clpfd) as found in SICStus or SWI.
So let's see how your program would have been, using successor-arithmetics. Maybe less_than_or_equal/2 would be a better name:
less_than_or_equal(N,N).
less_than_or_equal(N,s(M)) :-
   less_than_or_equal(N,M).
?- less_than_or_equal(N,s(s(0))).
N = s(s(0))
; N = s(0)
; N = 0.
It works right out of the box! No looping whatsoever. So what went wrong?
Successor arithmetics relies on the natural numbers. But you used integers which contain also these negative numbers. With negative numbers, numbers are no longer well ordered (well founded, or Noetherian), and you directly experienced that consequence. So stick with the natural numbers! They are all natural, and do not contain any artificial negative ingredients. Whoever said "God made the integers, all else is the work of man." must have been wrong.
But now back to your program. Why does it not terminate? After all, you found an answer, so it is not completely wrong. Is it not? You tried to reapply the notions of control flow you learned in command oriented languages to Prolog. Well, Prolog has two relatively independent control flows, and many more surprising things like real variables (TM) that appear at runtime that have no direct counterpart in Java or C#. So this mapping did not work. I got a little bit suspicious when you called the fact a "base case". You probably meant that it is a "termination condition". But it is not.
So how can we easily understand termination in Prolog? The best is to use a failure-slice. The idea is that we will try to make your program as small as possible by inserting false goals into your program. At any place. Certain of the resulting programs will still not terminate. And those are most interesting, since they are a reason for non-termination of the original program! They are immediately, causally connected to your problem. And they are much better for they are shorter. Which means less time to read. Here are some attempts; I will mark the parts that are no longer relevant with a comment.
range(X,X).
range(X,K) :-
   K0 is K - 1, false,
   range(X,K0).          % no longer relevant
Nah, above doesn't loop, so it cannot tell us anything. Let's try again:
range(X,X) :- false.     % no longer relevant
range(X,K) :-
   K0 is K - 1,
   range(X,K0), false.
This one loops for range(X,1) already. In fact, it is the minimal failure slice. With a bit of experience you will learn to see those with no effort.
We have to change something in the visible part to make this terminate. For example, you might add K > 0 or do what @Shevliaskovic suggested.
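The K > 0 suggestion could look like the sketch below. The predicate is renamed range_terminating/2 (a hypothetical name) to keep it apart from the original. Note that this only repairs termination: like the original, it still yields K itself as a solution, so it does not yet describe the range 0 to K - 1.

```prolog
range_terminating(X, X).
range_terminating(X, K) :-
    K > 0,                      % visible guard: stop descending below 0
    K0 is K - 1,
    range_terminating(X, K0).

% ?- range_terminating(X, 2).
% X = 2 ;
% X = 1 ;
% X = 0 ;
% false.
```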
I believe the simplest way to do this is:
range(X,X).
range(X,K) :- X>0, X<K-1.
and here are my results:
6 ?- range(4,4).
true .
7 ?- range(5,8).
true.
8 ?- range(5,4).
false.
The simple way, as has been pointed out, if you just want to validate that X lies within a specified domain, is to just check the condition:
range(X,K) :- X >= 0 , X < K .
Otherwise, if you want your range/2 to be generative, use the built-in between/3:
range(X,K) :- integer(K) , K1 is K-1 , between(0,K1,X).
If your Prolog doesn't have a between/3, it's a pretty simple implementation:
%
% the classic `between/3` wants the inclusive lower and upper bounds
% to be bound. So we'll make the test once and use a helper predicate.
% The helper is called between_/3: a name with a leading underscore
% would be read as a variable.
%
between(Lo,Hi,N) :-
  integer(Lo),
  integer(Hi),
  between_(Lo,Hi,N)
  .
between_(Lo,Hi,Lo) :-   % unify the lower bound with the result
  Lo =< Hi              % - if we haven't yet exceeded the inclusive upper bound.
  .                     %
between_(Lo,Hi,N) :-    % otherwise...
  Lo < Hi ,             % - if the lower bound is less than the inclusive upper bound
  L1 is Lo+1 ,          % - increment the lower bound
  between_(L1,Hi,N)     % - and recurse down.
  .                     %
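For comparison, here is how the library(clpfd) route mentioned in an earlier answer might look. This is a sketch assuming SWI-Prolog's library(clpfd); range_fd/2 is a made-up name. The same definition checks a known X and, combined with label/1, enumerates the candidates.

```prolog
:- use_module(library(clpfd)).

% range_fd(?X, +K): X is an integer with 0 =< X =< K-1.
range_fd(X, K) :-
    X #>= 0,
    X #< K.

% ?- range_fd(2, 3).              % true
% ?- range_fd(3, 3).              % false
% ?- range_fd(X, 3), label([X]).  % X = 0 ; X = 1 ; X = 2.
```

Without label/1, the query range_fd(X, 3) succeeds once and leaves X constrained to the domain 0..2 instead of enumerating values.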

Trying to write a tree-height predicate - do I need Peano-style natural numbers?

As a basic Prolog exercise, I set myself the task of writing a binary tree height predicate that would work forwards and "backwards" - that is, as well as determining the height of a known binary tree, it should be able to find all binary trees (including unbalanced ones) of a known height. This is the best solution I've come up with so far...
tree_eq1([],s). % Previously had a cut here - removed (see comments for reason)
tree_eq1([_|T],n(L,R)) :- tree_eq1(T,L), tree_eq1(T,R).
tree_eq1([_|T],n(L,R)) :- tree_eq1(T,L), tree_lt1(T,R).
tree_eq1([_|T],n(L,R)) :- tree_lt1(T,L), tree_eq1(T,R).
tree_lt1([_|_],s).
tree_lt1([_,X|T],n(L,R)) :- XX=[X|T], tree_lt1(XX,L), tree_lt1(XX,R).
The first argument is the height, expressed as a list - the elements are irrelevant, the length of the list expresses the height of the tree. So I'm basically abusing lists as Peano-style natural numbers. The reasons this is convenient are...
No concerns about negative numbers.
I can check for > or >= without knowing the exact number - for example, by matching two items on the head of the list, I ensure the list length is >=2 without caring about the length of the tail.
Neither of these properties seem to apply to Prolog numbers, and I can't think of a way so far to adapt the same basic approach to use actual numbers in place of these lists.
I've seen a few examples in Prolog using Peano-style numbers, so my question is - is this normal practice? Or is there some way to avoid the issue that I haven't spotted yet?
Also, is there a way to convert to/from a Peano-style representation that won't break the bidirectionality? The following don't work for fairly obvious reasons...
length(L,N), tree_eq1(L,X).
% infinite set of lists to explore if N is unknown
tree_eq1(L,X), length(L,N)
% infinite set of trees to explore if X is unknown
The best I can think of so far is an is-this-variable-instantiated test to choose between implementations, which seems like cheating to me.
BTW - I have some ideas for other methods which I don't want spoilers for - particularly a kind of dynamic programming approach. I'm really focused on fully understanding the lessons from this particular attempt.
First: +1 for using list lengths for counting, which sometimes really is quite convenient and a nice alternative to successor notation.
Second: For reversible arithmetic, you typically use constraints instead of successor notation, because constraints allow you to work with actual numbers and come with built-in definitions of the usual mathematical relations among numbers.
For example, with SICStus Prolog or SWI:
:- use_module(library(clpfd)).
tree_height(s, 0).
tree_height(n(Left,Right), Height) :-
        Height #>= 0,
        Height #= max(HLeft,HRight) + 1,
        tree_height(Left, HLeft),
        tree_height(Right, HRight).
Example query:
?- tree_height(Tree, 2).
Tree = n(s, n(s, s)) ;
Tree = n(n(s, s), s) ;
Tree = n(n(s, s), n(s, s)) ;
false.
Third, notice that the most general query, ?- tree_eq1(X, Y)., does not work satisfactorily with your version. With the snippet above, at least it gives an infinite number of solutions (as it should):
?- tree_height(T, H).
T = s,
H = 0 ;
T = n(s, s),
H = 1 ;
T = n(s, n(s, s)),
H = 2 .
I leave their fair enumeration as an exercise.
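To illustrate the point about reversibility, the same clauses can be queried in both directions. This is a sketch assuming SWI-Prolog with library(clpfd); the definition is the one given above, and count_trees/2 is a hypothetical helper added only to keep the output short.

```prolog
:- use_module(library(clpfd)).

tree_height(s, 0).
tree_height(n(Left,Right), Height) :-
    Height #>= 0,
    Height #= max(HLeft,HRight) + 1,
    tree_height(Left, HLeft),
    tree_height(Right, HRight).

% count_trees(+Height, -Count): number of trees of the given height.
count_trees(Height, Count) :-
    findall(T, tree_height(T, Height), Ts),
    length(Ts, Count).

% Forward:  ?- tree_height(n(s, n(s, s)), H).   % H = 2
% Backward: ?- count_trees(2, Count).           % Count = 3
```

The count of 3 matches the three trees listed in the example query for height 2.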

Prolog programming - path way to a solution

I am studying Prolog at university and facing some problems. So far, all I have found is the solution to a problem. However, I'm more interested in the way of thinking, i.e. how to arrive at such a solution.
Can somebody give me some advice on this? I would really appreciate your help.
Here is an example I am struggling with. I found a solution on Stack Overflow here, but what I am looking for is how the author did that, how he found the answer :)
Write a predicate flatten(List,Flat) that flattens a list, e.g. flatten([a,b,[c,d],[[1,2]],foo],X) will give X=[a,b,c,d,1,2,foo].
This is the answer I found on stackoverflow:
flatten(List, Flattened):-
  flatten(List, [], Flattened).
flatten([], Flattened, Flattened).
flatten([Item|Tail], L, Flattened):-
  flatten(Item, L1, Flattened),
  flatten(Tail, L, L1).
flatten(Item, Flattened, [Item|Flattened]):-
  \+ is_list(Item).
This answer belongs to user gusbro and the question was asked by user Parhs. I have tried to find a way to contact user gusbro to ask him how he derived such an answer, but I could not.
Thank you very much.
Well, all I can say is that the way to solve a problem depends largely on the problem itself. There is a set of problems which are amenable to solving by recursion, and Prolog is well suited to them.
In this kind of problem, one can derive a solution to a larger problem by dividing it into two or more classes of cases.
In one class we have the "base cases", where we provide a solution to the problem when the input cannot be further divided into smaller cases.
The other class is the "recursive cases", where we split the input into parts, solve them separately, and then "join" the results to give a solution to this larger input.
In the example for flatten/2 we want to take as input a list of items where each item may also be a list, and the result shall be a list containing all the items from the input. Therefore we split the problem in its cases.
We will use an auxiliary argument to hold the intermediate flattened list, and that's the reason why we implement flatten/3.
Our flatten/2 predicate will therefore just call flatten/3 using an empty list as a starting intermediate flattened list:
flatten(List, Flattened):-
  flatten(List, [], Flattened).
Now for the flatten/3 predicate, we have two base cases. The first one deals with an empty list. Note that we cannot further divide the problem when the input is an empty list. In this case we just take the intermediate flattened list as our result.
flatten([], Flattened, Flattened).
We now take the recursive step. This involves taking the input list and dividing the problem in two steps. The first step is to flatten the first item of this input list. The second step will be to recursively flatten the rest of it:
flatten([Item|Tail], L, Flattened):-
  flatten(Item, L1, Flattened),
  flatten(Tail, L, L1).
Ok, so the call to flatten(Item, L1, Flattened) flattens the first item but passes an unbound variable L1 as the intermediate list. This is just a trick so that, when the predicate returns, the variable L1 still remains unbound and Flattened will be of the form [...|L1], where ... are the flattened items of Item.
The next step, which calls flatten(Tail, L, L1), flattens the rest of the input list, and the result is bound to L1.
Our last clause is really another base case, the one that deals with single items (which are not lists). Therefore we have:
flatten(Item, Flattened, [Item|Flattened]):-
  \+ is_list(Item).
which checks whether Item is a list; when it is not, it binds the result to a list with Item as its head and the intermediate flattened list as its tail.
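Putting the clauses of this answer together gives a small program that can be loaded and queried. This is a sketch assuming a system with is_list/1, such as SWI-Prolog; the predicates are renamed my_flatten here (an arbitrary choice) so that they do not clash with a flatten/2 the system may already provide.

```prolog
% my_flatten(+List, -Flat): entry point, seeds the intermediate list.
my_flatten(List, Flat) :-
    my_flatten(List, [], Flat).

% Base case: nothing left to flatten, the intermediate list is the result.
my_flatten([], Acc, Acc).
% Recursive case: the flattened head ends in Acc1, which the second
% call then binds to the flattened tail.
my_flatten([Item|Tail], Acc, Flat) :-
    my_flatten(Item, Acc1, Flat),
    my_flatten(Tail, Acc, Acc1).
% Base case for single items that are not lists.
my_flatten(Item, Acc, [Item|Acc]) :-
    \+ is_list(Item).

% ?- my_flatten([a,b,[c,d],[[1,2]],foo], X).
% X = [a,b,c,d,1,2,foo].
```

Note that the result is built front to back without any call to append/3, which is what the unbound L1 trick buys.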
First, I'll show you my approach to the problem, then I've got some resources for learning to think recursively.
Here's my solution to the problem "flatten a list of lists (of lists ...)". I've annotated it to show how I got there:
First, let's define the public interface to our solution. We define flatten/2. Its body consists of a call to the internal implementation flatten/3, which takes an accumulator, seeded as an empty list.
flatten( X , R ) :-
  flatten( X , [] , R )
  .
That was easy.
The internal predicate flatten/3 is a little more complex, but not very.
First, we have the boundary condition: the empty list. That marks the end of what we need to do, so we unify the accumulator with the result:
flatten( [] , X , X ).
The next (and only) other case is a non-empty list. For this, we examine the head of the list. Our rule here is that it needs to be flattened and appended to the result. A good rule of programming is to write descriptive code, and Prolog is itself a descriptive, rather than procedural, language: one describes the solution to the problem and lets the inference engine sort things out.
So...let's describe what needs to happen now, and punt on the mechanics of flattening the head of the list:
flatten( [X|Xs] , T , Y ) :-
  flatten_head(X,X1) ,
  append( T,X1,T1) ,
  flatten( Xs , T1 , Y )
  .
That, too, was easy.
That's the essence of the entire solution, right there. We've broken our problem into 3 pieces:
a special case (the empty list)
the normal case (a non-empty list)
what to do with each element in the list (not yet defined).
Let's move on to the implementation of how to flatten a single list element. That's easy, too. We've got two cases, here: the list item might be a list, or it might be something else.
First, the list element might be an unbound variable. We don't want untoward behaviour, like unbounded recursion, so let's take care of that straightaway by disallowing unbound terms (for now). If the element is bound, we try to flatten it by invoking our public interface, flatten/2, again (oooooooh...more recursion!)
This accomplishes two things:
First, it tells us whether we've got a list or not: flatten/2 fails if handed something other than a list.
Second, when it succeeds, the job of flatten_head/2 is done.
Here's the code:
flatten_head( X , Y ) :-
  nonvar(X) ,
  flatten( X , Y )
  .
Finally, the last case we have to consider is the case of list elements that aren't lists (unbound vars, atoms or some other Prolog term). These are already "flat"...all we need to do is wrap them as a single element list so that the caller (flatten/3) gets consistent semantics for its "return value":
flatten_head( X , [X] ).
Here's the complete code:
flatten( X , R ) :-
  flatten( X , [] , R )
  .
flatten( [] , X , X ) .
flatten( [X|Xs] , T , Y ) :-
  flatten_head(X,X1) ,
  append( T,X1,T1) ,
  flatten( Xs , T1 , Y )
  .
flatten_head( X , Y ) :-
  nonvar(X) ,
  flatten( X , Y )
  .
flatten_head( X , [X] ) .
Each individual step is simple. It's identifying the pieces and weaving them together that's difficult (though sometimes, figuring out how to stop the recursion can be less than obvious).
Some Learning Resources
To understand recursion, you must first understand recursion—anonymous
Eric Roberts' Thinking Recursively (1986) is probably the best (only?) book specifically on developing a recursive point-of-view WRT developing software. There is an updated version Thinking Recursively With Java, 20th Anniversary Edition (2006), though I've not seen it.
Both books, of course, are available from the Usual Places: Powell's, Amazon, etc.
http://www.amazon.com/Thinking-Recursively-Eric-S-Roberts/dp/0471816523
http://www.amazon.com/Thinking-Recursively-Java-Eric-Roberts/dp/0471701467
http://www.powells.com/biblio/61-9780471816522-2
http://www.powells.com/biblio/72-9780471701460-0
You might also want to read Douglas Hofstadter's classic Gödel, Escher, Bach: An Eternal Golden Braid. Some consider it to be the best book ever written. YMMV.
Also available from the Usual Suspects:
http://www.powells.com/biblio/62-9780140289206-1
http://www.amazon.com/Godel-Escher-Bach-Eternal-Golden/dp/0465026567
A new book, though not directly about recursive theory, that might be useful, though I've not seen it (it's gotten good reviews) is Michael Corballis' The Recursive Mind: The Origins of Human Language, Thought, and Civilization

Generating lists of satisfying values for a set of constraints

Given a set of constraints, I would like to efficiently generate the set of values.
Suppose I have a few constraints on my Thungus[1]:
goodThungus(X) :-
  X > 100,
  X < 1000,
  sin(X) =:= 0.
Now, I can check a Thungus by asking:
goodThungus(500).
I would like to generate all good Thungi. I'm not sure how to do that; I'm really not sure about how to do it efficiently.
Note: this of course has to be a computable generation.
[1] Arbitrary object selected for this example.
What you are asking for can't be done in the full general case: imagine doing f(X) = 0 where f is a function for which the roots cannot be analytically determined, for example. Or suppose f(X) is the function "does the program X halt?". No computer is going to solve that for you.
Your options are basically to either:
Limit the set of constraints to things that you can reason about. e.g. inequalities are good because you can identify ranges, then do intersections and unions on ranges efficiently etc.
Limit the set of values to a small enough number that you can test them individually against each of the constraints
UPDATE: For the kind of constraints stated in the question (ranges of real values and real-valued functions that can be analytically solved and have a finite number of solutions within any range) I would suggest the following approach:
Write a generating function that can iteratively return solutions for your function within a given range... this will need to be done analytically, e.g. exploiting the fact that sin(X)=0 implies X=n*pi where n is any integer.
Do interval arithmetic and bounding on your range constraints to work out the range(s) that need to be scanned (in the example you would want the range 100 < X < 1000)
Apply your generating function to each of the target ranges in order to create all of the possible solutions.
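The three steps above can be sketched as a generator for the example in the question. This assumes a Prolog with between/3 and the evaluable constant pi (SWI-Prolog, for instance); sin_zero_in/3 and good_thungus/1 are hypothetical names. The zeros come from the analytic fact that sin(X) = 0 exactly when X = n*pi, so each result is only as precise as floating-point pi.

```prolog
% sin_zero_in(+Lo, +Hi, -X): X = N*pi with Lo < X < Hi, smallest first.
sin_zero_in(Lo, Hi, X) :-
    NMin is floor(Lo / pi) + 1,     % smallest N with N*pi > Lo
    NMax is ceiling(Hi / pi) - 1,   % largest N with N*pi < Hi
    between(NMin, NMax, N),
    X is N * pi.

% The range constraints from the question, fixed to 100 < X < 1000.
good_thungus(X) :-
    sin_zero_in(100, 1000, X).

% ?- good_thungus(X).
% enumerates 32*pi, 33*pi, ..., 318*pi on backtracking
```

Evaluating sin/1 on these floats will give values close to, but not exactly, zero, which is why the generator does not test the results with sin(X) =:= 0.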
I'll preface my suggestion by stating that I'm no expert in using numerical constraint logic programming systems, but here goes...
On the surface, I'd think that this kind of problem is best suited to a numerical constraint logic programming system, such as CLP(R) (for reals) in SWI-PROLOG. Unfortunately, the specific problem you've asked about includes a non-linear constraint, which does not seem to be well or widely supported among PROLOG implementations: they deal mainly with linear constraints and often have only limited support for non-linear constraints such as X = sin(Y).
Take SWI-PROLOG's CLP(R) library, and the following example program:
:- use_module(library(clpr)).
report_xsq_zeros :-
  findall(X, {0 = (X * X) - 10}, Results),
  write_ln(Results).
report_sin_zeros :-
  findall(X, {0 = sin(X)}, Results),
  write_ln(Results).
Now, executing report_xsq_zeros gives us:
?- report_xsq_zeros.
[3.16228, -3.16228]
true.
Here, the system correctly computed the zeros of the quadratic x^2 - 10, which are indeed approximately 3.16228 and -3.16228, where the range of X was unbounded. However, when we execute report_sin_zeros, we get:
?- report_sin_zeros.
[0.0]
true.
We see that the system only computed a single zero of the function sin(X), even though the range of X was indeed also unbounded. Perhaps this is because it is recognized that there are an infinite number of solutions here (though I'm only guessing...). If we were to program what you've asked for:
report_sin_zeros :-
  findall(X, {X > 100, X < 1000, 0 = sin(X)}, Results),
  write_ln(Results).
We get no results, as the underlying system only computed a single zero for sin(X) as shown earlier (i.e., binding X to 0.0 which lies outside the stated range):
?- report_sin_zeros.
[]
true.
I conclude that I've either not demonstrated proper usage of SWI-PL CLP(R) (I suggest you look into it yourself), or it won't solve your specific (non-linear) problem. Other CLP(R) implementations may behave differently to SWI-PROLOG CLP(R), but I don't have them installed so I can't check, but you could try SICSTUS CLP(R) or others; the syntax looks similar.
He is searching for any X in [100..1000] for which sin(X) = 0. But this is a purely mathematical problem, not one meant for relational logical deduction or backtracking. Is plain Prolog perhaps simply not suited to this?
