Retrieve internal/canonical representation of terms in Prolog - prolog

I am writing a program for which I need terms in their prefix notation.
The point is to being able to parse mathematical expressions to prefix notation, while preserving the correct order of Operations. I then want to save the result in the database for later use (using assert), which includes translating to another language, which uses prefix notation. Prolog Operators do all have a fixed priority which is a feature I want to use, as I will be using all sorts of operators (including clp operators).
As among others I need to include complete mathematical expressions, such as the equality operator. Thus I cannot recursively use the Univ operator (=..), because it won't accept equality operators etc. Or can I somehow use =.. ?
Essentially I want to work with the internal representation of
N is 3*4+5 % just a random example
which would be
is(N,+(*(3,4),5))
Now, I do know that I can use, write_canonical(N is 3*4+5) to get the internal representation as seen above.
So is there a way to somehow get the internal representation as a term or a list, or something.
Would it be possible to bind the output of write_canonical to a variable?
I hope my question is clear enough.

Prolog terms can be despicted as trees. But, when writing a term, the way a term is displayed depends on the defined operators and write options. Consider:
?- (N is 3*4+5) = is(N,+(*(3,4),5)).
true.
?- (N is 3*4+5) = is(Variable, Expression).
N = Variable,
Expression = 3*4+5.
?- 3*4+5 = +(*(3,4),5).
true.
I.e. operators are syntactic sugar. They don't change how terms are represented, only how terms are displayed.

Related

How does = operator works in Prolog

Sorry if this is a newbie question, but recently I was trying to compare an string and I used this (not exactly :P):
some_fact('Yes').
some_fact('No').
some_rule(X):- some_fact(X), (X =:= 'Yes' -> writeln("ISS YES") ; writeln("No")).
Error: Arithmetic: `'Yes'' is not a function
After that, I Googled and saw that Strings are compared with = and \=
But if I write: X = 5 I'm assigning the value 5 to X, well I don't know if the word is assign, cause the assigment operator is is. Right?
Just in case, I don't need to fix the code, I want understand what's happening.
Thanks
I think there is a lot of confusion here and most of it would be remedied by going through a book, but let me try and clear a few things up for you right now.
'Yes' is an atom, not a string. SWI-Prolog has actual strings, but most Prolog implementations do not, they use atoms instead. The nice thing about atoms is that if they are lower case and do not contain spaces there is no need for quotes. The quotes are needed to tell Prolog "this is not a variable" and resolve the syntactic ambiguity of this and that.
Lacking strings, there is no operator for string comparison.
= is the unification operator. Unification is a big topic—not something that is easily summarized in a question, but as an approximation you can think of it as a bi-directional pattern matching. So, it will do the job for what you probably need string comparisons for, but it is the real engine of computation in Prolog and is happening behind the scenes in lots of ways.
Prolog does not have assignment. True, you can give a variable a value. But you cannot change that value later; X = X + 1 is meaningless in mathematics and it's meaningless in Prolog too. In general, you will be working recursively, so you will simply create a new variable when something like this needs to occur. It will make more sense as you get further in reading about Prolog and writing your first programs. Please check out a tutorial!
is/2 resolves arithmetic expressions. If you have X = 2+3, Prolog will reply with X = 2+3. Only X is 2+3 will cause Prolog to report X=5. Arithmetic just isn't a huge part of classic Prolog usage; these days, people will quickly suggest you check out CLPFD, which enables you to do more interesting things like 15 #= X + Y and producing bindings that add up to 15. Standard Prolog cannot "work backwards" like this. But for a complete beginner it probably suffices to say that arithmetic works differently than you expect it to, and differently than the rest of Prolog unless you use CLPFD.
=:= is an arithmetic equality operator. You use this to answer questions like 6 + 1 =:= 5 + 2. This is a really special-purpose tool that I personally have never really needed to use.

How to identify wasteful representations of Prolog terms

What is the Prolog predicate that helps to show wasteful representations of Prolog terms?
Supplement
In a aside of an earlier Prolog SO answer, IIRC by mat, it used a Prolog predicate to analyze a Prolog term and show how it was overly complicated.
Specifically for a term like
[op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]]
it revealed that this has to many [].
I have searched my Prolog questions and looked at the answers twice and still can't find it. I also recall that it was not in SWI-Prolog but in another Prolog so instead of installing the other Prolog I was able to use the predicate with an online version of Prolog.
If you read along in the comments you will see that mat identified the post I was seeking.
What I was seeking
I have one final note on the choice of representation. Please try out the following, using for example GNU Prolog or any other conforming Prolog system:
| ?- write_canonical([op(add),[Left,Right]]).
'.'(op(add),'.'('.'(_18,'.'(_19,[])),[]))
This shows that this is a rather wasteful representation, and at the same time prevents uniform treatment of all expressions you generate, combining several disadvantages.
You can make this more compact for example using Left+Right, or make all terms uniformly available using for example op_arguments(add, [Left,Right]), op_arguments(number, [1]) etc.
Evolution of a Prolog data structure
If you don't know it already the question is related to writing a term rewriting system in Prolog that does symbolic math and I am mostly concentrating on simplification rewrites at present.
Most people only see math expressions in a natural representation
x + 0 + sin(y)
and computer programmers realize that most programming languages have to parse the math expression and convert it into an AST before using
add(add(X,0),sin(Y))
but most programming languages can not work with the AST as written above and have to create data structures See: Compiler/lexical analyzer, Compiler/syntax analyzer, Compiler/AST interpreter
Now if you have ever done more than dipped your toe in the water when learning about Prolog you will have come across Program 3.30 Derivative rules, which is included in this, but the person did not give attribution.
If you try and roll your own code to do symbolic math with Prolog you might try using is/2 but quickly find that doesn't work and then find that Prolog can read the following as compound terms
add(add(X,0),sin(Y))
This starts to work until you need to access the name of the functor and find functor/3 but then we are getting back to having to parse the input, however as noted by mat and in "The Art of Prolog" if one were to make the name of the structure accessible
op(add,(op(add,X,0),op(sin,Y)))
now one can access not only the terms of the expression but also the operator in a Prolog friendly way.
If it were not for the aside mat made the code would still be using the nested list data structure and now is being converted to use the compound terms that expose the name of the structure. I wonder if there is a common phrase to describe that, if not there should be one.
Anyway the new simpler data structure worked on the first set of test, now to see if it holds up as the project is further developed.
Try it for yourself online
Using GNU Prolog at tutorialspoint.com enter
:- initialization(main).
main :- write_canonical([op(add),[Left,Right]]).
then click Execute and look at the output
sh-4.3$ gprolog --consult file main.pg
GNU Prolog 1.4.4 (64 bits)
Compiled Aug 16 2014, 23:07:54 with gcc
By Daniel Diaz
Copyright (C) 1999-2013 Daniel Diaz
compiling /home/cg/root/main.pg for byte code...
/home/cg/root/main.pg:2: warning: singleton variables [Left,Right] for main/0
/home/cg/root/main.pg compiled, 2 lines read - 524 bytes written, 9 ms
'.'(op(add),'.'('.'(_39,'.'(_41,[])),[]))| ?-  
Clean vs. defaulty representations
From The Power of Prolog by Markus Triska
When representing data with Prolog terms, ask yourself the following question:
Can I distinguish the kind of each component from its outermost functor and arity?
If this holds, your representation is called clean. If you cannot distinguish the elements by their outermost functor and arity, your representation is called defaulty, a wordplay combining "default" and "faulty". This is because reasoning about your data will need a "default case", which is applied if everything else fails. In addition, such a representation prevents argument indexing, and is considered faulty due to this shortcoming. Always aim to avoid defaulty representations! Aim for cleaner representations instead.
Please see the last part of:
https://stackoverflow.com/a/42722823/1613573
It uses write_canonical/1 to display the canonical representation of a term.
This predicate is very useful when learning Prolog and helps to clear several misconceptions that are typical for beginners. See for example the recent question about hyphens, where it would have helped too.
Note that in SWI, the output deviates from canonical Prolog syntax in general, so I am not using SWI when explaining Prolog syntax.
You could also programmatially count how many subterms are a single-element list using something like this (not optimized);
single_element_list_subterms(Term, Index) :-
Term =.. [Functor|Args],
( Args = []
-> Index = 0
; maplist(single_element_list_subterms, Args, Indices),
sum_list(Indices, SubIndex),
( Functor = '.', Args = [_, []]
-> Index is SubIndex + 1
; Index = SubIndex
)
).
Trying it on the example compound term:
| ?- single_element_list_subterms([op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]], Count).
Count = 7
yes
| ?-
Indicating that there are 7 subterms consisting of a single-element list. Here is the result of write_canonical:
| ?- write_canonical([op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]]).
'.'(op(add),'.'('.'('.'(number(0),[]),'.'('.'(op(add),'.'('.'('.'(number(1),[]),'.'('.'(number(1),[]),[])),[])),[])),[]))
yes
| ?-

Predicate for removing certain terms from compound term in Prolog

I would like to write a Prolog predicate that takes in a compound term as its argument and outputs this compound term with some of the nested terms removed. For example, let's say that I have a compound term:
outer_term(level_one(level_two_a(X), level_two_b(Y)), level_one(level_two_b(Z))).
And I would like to write a predicate extract_terms/2 which would take this term and returned it without occurences of level_two_a/1.
extract_terms(Term, ExtractedTerm) :-
*** Prolog Magic ***.
Is there a built-in (or semi-built-in) way to do this in Prolog? If not, how would I go about doing this? One way that occurs to me would be to use =../2 operator to convert the Term into a list and then somehow use some built-in predicate like subtract/3 to get rid of the predicates that I want. The trouble I have is making this work with list that has nested terms as its items.
I would appreciate any ideas, thank you.
First, a general guideline:
Everything that can be expressed by pattern matching should be expressed by pattern matching.
You have asked a similar question previously, although it was a bit simpler. Still, let us consider the simpler case first: You said a possible instance of a verb phrase would be:
VP = vp(vp(verb(making), adj(quick), np2(noun(improvements))))
and you want to extract the verb. Well, the simplest approach is to use pattern matching, or more generally, unification, like this:
?- VP = vp(vp(verb(making), adj(quick), np2(noun(improvements)))),
VP = vp(vp(Verb, _, _)).
This yields:
Verb = verb(making).
Thus, we have successfully "extracted" verb(making) from such a phrase by virtue of unification.
Now to the slightly more complex task you are considering in this question: At this point, you may wonder whether you have chosen a good representation of your data. Frequent use or even the very necessity of (=..)/2 typically indicates a problem with your representation, since it may mean that you have lost track or control of the possible shapes of your data.
In this concrete case, you state as an example:
outer_term(level_one(level_two_a(X), level_two_b(Y)), level_one(level_two_b(Z))).
and you want to remove occurrences of level_two_a. You can now of course begin to mess with (=..)/2, which requires a conversion of such terms to lists, then some reasoning on these lists, and a second conversion from lists back to such structures. That's not how we want to work with our data. In addition to other drawbacks, it would preclude more general usage patterns that we expect from relations.
Instead, let us fix the data representation so that we can cleanly distinguish the different cases. For example, instead of "hardcoding" the very parameter we need to distinguish the cases inside of a functor, let us make the distinction explicit: We want to be able to distinguish, by pattern matching, level 1 from level 2.
So, the following representation suggests itself:
outer_term([level(1, [level(2, X),
level(2, Y)]),
level(1, [level(2, Z)])]).
This may need some additional attributes, such as a and b, and I leave extending this representation to represent such attributes as an easy exercise. The general idea should be clear though: We have thus achieved a uniform representation about which we can easily reason symbolically.
It is now easy to describe the relation between a (potentially nested) list of such levels and the levels without the "level 2" elements:
without_level_2(Ls0, Ls) :-
phrase(no_level_2(Ls0), Ls).
no_level_2([]) --> [].
no_level_2([L|Ls]) -->
no_level_2_(L),
no_level_2(Ls).
no_level_2_(level(2,_)) --> [].
no_level_2_(level(L,Ls0)) --> [level(L,Ls)],
{ dif(L, 2),
without_level_2(Ls0, Ls) }.
See dcg for more information about this formalism.
Sample query:
?- outer_term(Ts0),
without_level_2(Ts0, Ts).
Yielding:
Ts = [level(1, []), level(1, [])] .
Note that to truly benefit from this representation, you need to obtain it in the first place by using or generating terms of such shapes. Once you have this ensured, you can conveniently stick to pattern matching to distinguish the cases. Among the main benefits of this approach we find convenience, performance and generality. For example, we can use the DCG shown above not only to extract but also to generate terms of this form:
?- length(Ls0, _), without_level_2(Ls0, Ls).
Ls0 = Ls, Ls = [] ;
Ls0 = [level(2, _56)],
Ls = [] ;
Ls0 = Ls, Ls = [level(_130, [])],
dif(_130, 2) ;
Ls0 = [level(_150, [level(2, _164)])],
Ls = [level(_150, [])],
dif(_150, 2) .
This is truly a relation, usable in all directions. For this reason, I have avoided imperative names like "extract", "remove" etc. in the predicate name, because these always suggest a particular direction of use, not doing justice to the generality of the predicate.

Identify similar expressions?

How can I validate two expressions that are logically equivalent ?
For e.g :
(a+b) <=> (b+a) or (a+(b+c)) <=> ((a+b)+c) or (a && b) <=> (b && a) ,etc
I want an optimize solution for it, that will identify the duplicate one from 1000s of expressions.
Contrary to a comment by #japreiss, there is no open research question here. In the context of propositional expressions like the examples given, logical equivalence is very well understood. (I'm assuming OP is using + to mean logical OR rather than numeric addition)
A question for the OP: are you interested in doing this by hand or automating it via programming code?
There are multiple approaches but the one taught in most intro to logic courses and text books would be to construct separate truth tables for the left-hand side and right-hand side, and see if the final values on each row are identical.
If you have a computer system that can already evaluate the truth of an expression like (a+(b+c)) and ((a+b)+c), then that same system can probably be given the whole expression (a+(b+c)) <=> ((a+b)+c) and can tell you whether or not it is a tautology.
With regard to comparing 1,000's of expressions to find duplicates, one technique is to convert each expression to conjunctive normal form and then just use string comparison to see which ones are identical.

Efficiently store and evaluate a large number of boolean expressions

I have a huge set (20000) of boolean expressions. They consist of AND, OR and NOT operators and a large number of boolean variables A1, A2, A3 ... (about 1000). Most expression contain only 5, maybe 20 of these variables.
Given an assignment of the variables (A1 = true, A2 = false, A3 = false ...) I have to find those expressions that evaluate to false.
The same set of expressions will be evaluated for multiple (10-100) assignments
For this purpose:
How should I store the expressions on disk so I can load and parse them fast (I currently have them either as some specialized DSL or as a more or less normalized (and dead slow) relational data structure, but I can change that)
Is there a fast algorithm / data structure for evaluating such expressions that I can use?
Do implementations on the JVM exist?
You may want to look at converting your expressions into Conjunctive Normal Form and combining like terms. You then can have a two-way mapping of an expression to a set of terms, any of which evaluating to false implies that the whole expression evaluates false. For each assignment of variables, start with a set of expressions, evaluate CNF terms until one evaluates to false. If that term is false, then all expressions involving that term will also be false, so those expressions can also be removed from the set.
Whether such an approach fits your case can't be said without looking at the expressions - with 1000 variables and 20000 expressions, it might not be that they have many CNF terms in common.
Outside of Java, and for much larger numbers of expressions, DNF is possibly more useful, since its implementation on the GPU is obvious.
The SOP answer to this is to store the expressions as strings in RPN (Reverse Polish Notation) and then write a simple Stack Machine parser to evaluate them.
Generally, an RPN string can be evaluated almost as fast as an already in-memory AST (Abstract Symbol Tree). And the stack machine parser is dead easy to write.
You seem attached to Java, but have you considered feeding these things to a language that has an eval() function? It would probably reduce the problem to saving an expression in a file and evaluating it. Note that if you don't trust the (source of the) expressions, this has security implications!
Jython comes to mind, but there are probably several that would make very short work of this.
If you're married to java, you could probably implement a recursive descent parser for boolean algebra. But that's quite a bit more involved.
UPDATE: The following site has code that might help.
Convert your list of expressions into source code for a function that when called with the value of the variables will evaluate all the functions and return an indication of which expressions evaluate to false. compile the function then call it for your different variable values.
I have done similar and used Python. The only parsing and interpretation I had to write was to translate the input boolean operators, '&', '|', '~' into their Python equivalents.
Your problem size seems quite OK for a Python solution.
You could build an index where for each variable you record two sets of expressions, those where the variable occurs positively and those where it occurs negatively. Depending on the values of the variables you collect those expressions which could become false due to this variable (positive occurrences if the variables is set to false and vice versa). Edit: These are just candidates, you still need to evaluate them to find out if they really become false.
Whether this helps compared to just evaluating all your expressions depends on the structure of your expressions and how many evaluate to false.
Try to convert them into CNF and use MiniSat to check whether the expression evaluates to true or false

Resources