How to deal with unary minus and exponentiation in an expression parser - expression

I know that exponentiation has higher precedence than unary minus. However, if I build an expression parser based on that, I still can't parse expressions like 2---3. To deal with these, I've found I also need to add unary minus handling to the factor production rule, which is one precedence level higher than exponentiation. Is this how unary minus and exponentiation are usually dealt with? I've not found anything online or in books that talks about this particular situation. I was wondering whether giving exponentiation and the unary operators equal precedence would help.
I'm hand-crafting a recursive descent parser. I tried merging the power and unary production rules together, but it didn't seem to work. What does work is the following EBNF:
factor = '(' expression ')' | variable | number | '-' factor
power = factor { '^' factor }
unaryTerm = ['-' | '+'] power
term = unaryTerm { factorOp unaryTerm }
expression = term { termOp term }
termOp = '+' | '-'
factorOp = '*' | '/'

Unless you have unusual requirements, putting both unary minus and exponentiation in the same non-terminal will work fine, because exponentiation is right-associative: (Yacc/bison syntax)
atom  : ID
      | '(' expr ')'
factor: atom
      | '-' factor
      | atom '^' factor
term  : factor
      | term '*' factor
expr  : term
      | expr '+' term
      | expr '-' term
Indeed, exponentiation being right-associative is virtually required for this syntax to be meaningful. Consider the alternative, with a left-associative operator.
Let's say we have two operators, ⊕ and ≀, with ⊕ being left associative and binding more tightly than ≀, so that ≀ a ⊕ b is ≀(a ⊕ b).
Since ⊕ is left associative, we would expect a ⊕ b ⊕ c to be parsed as (a ⊕ b) ⊕ c. But then we get an oddity. Is a ⊕ ≀ b ⊕ c the same as (a ⊕ ≀b) ⊕ c or the same as a ⊕ ≀(b ⊕ c)? Both options seem to violate the simple patterns. [Note 1]
Certainly, an unambiguous grammar could be written for each case, but which one would be less surprising to a programmer who was just going by the precedence chart? The most likely result would be a style requirement that ≀ expressions always be fully parenthesized, even if the parentheses are redundant. (C style guides are full of such recommendations, and many compilers will chide you for using correct but "unintuitive" expressions.)
Notes:
1. If you use precedence declarations, you'll get a ⊕ ≀(b ⊕ c), which might or might not be intuitive, depending on your intuitions.
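Since the question is about a hand-written recursive descent parser, here is a minimal Python sketch of the Yacc grammar from this answer. It is only an illustration under some assumptions: numeric literals stand in for ID, the expression is evaluated directly instead of building a tree, and the names evaluate, atom, factor, term and expr are chosen for this example.

```python
import re

def evaluate(text):
    """Evaluate an expression using the factor/term/expr grammar above."""
    # Assumption: atoms are numbers; characters the pattern does not match are skipped.
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        # atom: NUMBER | '(' expr ')'
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or tok in {"+", "-", "*", "^", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)

    def factor():
        # factor: atom | '-' factor | atom '^' factor
        if peek() == "-":
            advance()
            return -factor()
        base = atom()
        if peek() == "^":
            advance()
            return base ** factor()   # right-recursive, so '^' is right-associative
        return base

    def term():
        # term: factor | term '*' factor  (left recursion turned into a loop)
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def expr():
        # expr: term | expr '+' term | expr '-' term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

With this sketch, evaluate("-2^2") gives -4.0, evaluate("2^-3") gives 0.125 and evaluate("2---3") gives -1.0, because the unary minus and the exponent operand both recurse into factor.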

Related

SWI-Prolog: Write predicate union(A,B,C) in form C = A ∪ B

Is there some way in SWI-Prolog to write predicates with three variables, for example union(A,B,C), in the following form: C = A ∪ B? For predicates with two variables I know there are operators to do that, but I am not sure if there is something similar in that case.
No.
Not directly. Prolog only supports defining unary operators (prefix/suffix operators such as -- 32 or 32 ++, which correspond to '--'/1 and '++'/1 respectively) and infix operators (e.g. X is Y, which corresponds to is/2).
If you look at the operator definitions and precedences, you would need to define your union operator as an infix operator with a precedence of less than 700.
Then, reading a term like x = y ∪ z would yield '='( x , '∪'(y,z) ).
Another way to do it would be to write a DCG (definite clause grammar) to parse the text as desired. See this tutorial: https://www.metalevel.at/prolog/dcg

Is it possible to represent a context-free grammar with first-order logic?

Briefly, I have an EBNF grammar and so a parse tree, but I do not know if there is a procedure to translate it into first-order logic.
For example:
DR ::= E and P
P ::= B | (and P)* | (or P)*
B ::= L | P (and L P)
L ::= a
Yes, there is. The general pattern for translating a production of the form
A ::= B C ... D
is to paraphrase it declaratively as saying
A sequence of terminals s is an A (or: A generates the sequence s, if you prefer that formulation) if:
s is the concatenation of s_1, s_2, ... s_n, and
s_1 is a B / B generates the sequence s_1, and
s_2 is a C / C generates the sequence s_2, and
...
s_n is a D / D generates the sequence s_n.
Assuming we write these in the obvious way using a generates predicate, and that we can write concatenation using a || operator, your first rule becomes (if I am right to guess that E and P are non-terminals and "and" is a terminal symbol) something like
generates(E,s1)
∧ generates(and,s2)
∧ generates(P,s3)
∧ s = s1 || s2 || s3
⊃ generates(DR,s)
To establish the consequent (i.e. prove that s is an A), prove the antecedents. As long as the grammar does actually generate some sentences, and as long as you have some premises defining the "generates" relation for terminal symbols, the proof will be straightforward.
Prolog definite-clause grammars are a beautiful instantiation of this pattern. It takes some of us a while to understand and appreciate the use of difference lists in DCGs, but they handle the partitioning of s into subsequences and the association of the subsequences with the different parts of the right hand side much more elegantly than the simple translation into logic given above.
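The simple translation (the one with explicit concatenation, not the difference-list version) can be sketched in Python as a brute-force check that tries every way of splitting s. This is a toy under stated assumptions: the productions for E and P below are hypothetical stand-ins, since the question's grammar is not fully specified, and the search only terminates for grammars without empty productions or unit cycles. The names GRAMMAR, generates and splits are made up for this example.

```python
# Non-terminals map to lists of right-hand sides; anything else is a terminal.
GRAMMAR = {
    "DR": [["E", "and", "P"]],          # DR ::= E and P (from the question)
    "E": [["a"]],                       # hypothetical definition of E
    "P": [["a"], ["a", "or", "P"]],     # hypothetical simplification of P
}

def generates(symbol, s):
    """True if `symbol` generates the token tuple `s`."""
    if symbol not in GRAMMAR:
        # Premise for terminals: a terminal generates exactly itself.
        return s == (symbol,)
    # A non-terminal generates s if some production's right-hand side does.
    return any(splits(rhs, s) for rhs in GRAMMAR[symbol])

def splits(rhs, s):
    """True if s = s_1 || ... || s_n with rhs[i] generating s_(i+1)."""
    if not rhs:
        return not s
    head, rest = rhs[0], rhs[1:]
    # Every symbol must consume at least one token (no empty productions),
    # so leave at least len(rest) tokens for the remaining symbols.
    for i in range(1, len(s) - len(rest) + 1):
        if generates(head, s[:i]) and splits(rest, s[i:]):
            return True
    return False
```

For example, generates("DR", ("a", "and", "a", "or", "a")) is True, while generates("DR", ("a", "and")) is False. A DCG does the same partitioning without enumerating split points.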

Prolog 'is/2' predicate implementation

How is the 'is/2' Prolog predicate implemented?
I know that
X is 3*4
is equivalent to
is(X, 3*4)
But is the predicate implemented using imperative programming?
In other words, is the implementation equivalent to the following C code?
if (uninstantiated(X))
{
    X = 3*4;
}
else
{
    // signal an error
}
Or is it implemented using declarative programming and other predicates?
Depends on your Prolog, obviously, but any practical implementation will do its dirty work in C or another imperative language. Part of is/2 can be simulated in pure Prolog:
is(X, Expr) :-
    evaluate(Expr, Value),
    (   var(X)
    ->  X = Value
    ;   X =:= Value
    ).
Where evaluate is a huge predicate that knows about arithmetic expressions. There are ways to implement large parts of it in pure Prolog too, but that will be both slow and painful. E.g. if you have a predicate that adds integers, then you can multiply them as well using the following (stupid) algorithm:
evaluate(X + Y, Value) :-
    % even this can be done in Prolog using an increment predicate,
    % but it would take O(n) time to do n/2 + n/2.
    add(X, Y, Value).
evaluate(X * Y, Value) :-
    (   X == 0
    ->  Value = 0
    ;   evaluate(X + -1, X1),
        evaluate(X1 * Y, Value1),    % recurse on (X-1) * Y
        evaluate(Y + Value1, Value)
    ).
None of this is guaranteed to be either practical or correct; I'm just showing how arithmetic could be implemented in Prolog.
It would depend on the version of Prolog; for example, CProlog is (unsurprisingly) written in C, so all built-in predicates are implemented in an imperative language.
Prolog was developed for language parsing. So, an arithmetic expression like
3 + - ( 4 * 12 ) / 2 + 7
after parsing is just a Prolog term (representing the parse tree), with the op/3 declarations providing the priorities and associativity that guide the parser's operation. For basic arithmetic expressions, the terms are
'-'/1. Negation
'*'/2, '/'/2. Multiplication, division
'+'/2, '-'/2. Addition, subtraction
The sample expression above is parsed as
'+'( '+'( 3 , '/'( '-'( '*'(4,12) ) , 2 ) ) , 7 )
'is'/2 simply does a recursive walk of the parse tree representing the right-hand side, evaluating each term in pretty much the same way an RPN (Reverse Polish notation) calculator does. Once that expression is evaluated, the result is unified with the left-hand side.
Each basic operation (add, subtract, multiply, divide, etc.) has to be done in machine code, so at the end of the day, some machine code routine is being invoked to compute the result of each elemental operation.
Whether is/2 is written entirely in native code or written mostly in Prolog, with just the leaf operations written in native code, is pretty much an implementation choice.
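To make the recursive walk concrete, here is a small Python sketch of the idea. It is not how any particular Prolog system implements is/2. Terms are modelled as nested tuples (functor, arg, ...), the class Var and the function names evaluate and prolog_is are made up for this example, and Python's == stands in for the final unification, which makes it behave more like =:= when integers and floats are mixed.

```python
import operator

# Leaf operations: in a real system these would be native-code routines.
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}
UNARY = {"-": operator.neg, "+": operator.pos}

class Var:
    """A hypothetical logic variable: unbound until `value` is set."""
    def __init__(self):
        self.value = None

def evaluate(term):
    """Recursively evaluate a parsed arithmetic term."""
    if isinstance(term, (int, float)):
        return term
    functor, *args = term
    values = [evaluate(arg) for arg in args]
    table = UNARY if len(values) == 1 else BINARY
    return table[functor](*values)

def prolog_is(left, expr):
    """Evaluate expr, then bind or compare the left-hand side."""
    value = evaluate(expr)
    if isinstance(left, Var):
        if left.value is None:
            left.value = value      # unbound variable: bind it
            return True
        left = left.value
    return left == value            # already bound: succeed only if equal

# 3 + - ( 4 * 12 ) / 2 + 7, written as the parse tree shown above
tree = ("+", ("+", 3, ("/", ("-", ("*", 4, 12)), 2)), 7)
x = Var()
prolog_is(x, tree)                  # binds x.value to -14.0
```

The evaluation order matches the tree: 4 * 12 first, then negation, then the division by 2, and finally the two additions.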

About Prolog syntax

Sometimes I see terms like:
X = a:b
or
X = a-b
I can do requests like
X = Y:Z
and the compiler unifies Y with a and Z with b, as expected.
Now my question:
Which characters (or sequence of characters) am I allowed to use to combine two Prolog atoms?!
Maybe you can give me some links with further information about this issue.
Thanks for your help and kind regards from Germany
Which characters (or sequence of characters) am I allowed to use to combine two Prolog atoms?!
What you are asking for here is the entire operator syntax definition of Prolog. To get the full answer to this, please refer to the tag iso-prolog for information on how to obtain the Prolog standard ISO/IEC 13211-1.
But as a short answer to start with:
Prolog syntax consists of
functional notation, like +(a,b), plus
a dynamically redefinable operator syntax, plus
some extra.
It seems you want to know which "characters" can be used as operators.
The short answer is that you can use all atoms Op that succeed for current_op(Pri,Fix,Op). So you can ask dynamically, which operators are present:
?- current_op(Pri, Fix, Op).
Pri = 1, Fix = fx, Op = ($)
; Pri = 1150, Fix = fx, Op = (module_transparent)
; Pri = 700, Fix = xfx, Op = (=@=)
; Pri = 700, Fix = xfx, Op = (@>=)
; Pri = 700, Fix = xfx, Op = (>=)
; ... .
All those operators can be used in the specified manner, as pre-, in-, or postfix with the indicated priorities. Some of these operators are specific to SWI, and some are defined by the standard. Above, only @>= and >= are standard operators.
Most of the operators consist only of the graphic characters #$&*+-./:<=>?@^~, or of letters, digits and underscores starting with a lowercase letter. There are two solo characters, ! and ;, and then there are , and |, which are even more special. Operator names that differ from the above need quoting; you will rarely encounter them.
To see how operators nest, use write_canonical(Term).
The long answer is that you are also able to define such operators yourself. However, be aware that changing the operator syntax often has many implications that are very difficult to fathom. Even more so, since many systems differ in some rarely used configurations. For example, the system you mentioned, SWI, differs in several ways.
I'd suggest to avoid defining new operators until you have learned more about the Prolog language.
Let's see what's inside X = Y:Z
?- display( X = Y:Z ).
=(_G3,:(_G1,_G2))
true.
So we have a nested structure, where the functors are operators.
An operator is an atom, and the rule for atom syntax says that there are three kinds to consider:
a sequence of any printable characters enclosed in single quotes
a sequence of special characters only, where a special character is one of `.=:-+*/><#@~? (I hope I have found all of them; from this page you can check whether I missed any!)
a sequence of lowercase/uppercase characters or the underscore, starting with a lowercase character
edit
A functor (shorthand for function constructor, I think, but function is misleading in the Prolog context) is the symbol that 'ties' several arguments together. The number of arguments is called the arity. In Prolog a term is an atomic literal (like a number or an atom), or a recursive structure composed of a functor and a number of arguments (at least 1), each being a term itself.
Given the appropriate declaration, i.e. op/3, unary and binary terms can be represented as expressions, like that one you show.
An example of operator, using the : special char, is ':-'
member(X,[X|_]).
member(X,[_|T]) :- member(X, T).
The O.P. said (and I quote):
Sometimes I see terms like: X = a:b or X = a-b
I can do requests like X = Y:Z and the compiler unifies Y with a and Z with b, as expected.
Now my question: Which characters (or sequence of characters) am I allowed
to use to combine two Prolog atoms?!
The short answer is Pretty much whatever you want (provided it is an atom).
The longer answer is this:
What you are seeing are infix (a infix_op b), prefix (pfx_op b) and suffix (b sfx_op) operators. Any structure with an arity of 2 can be an infix operator. Any structure with an arity of 1 can be a prefix or suffix operator. As a result, any atom may be an operator.
Prolog is parsed via a precedence-driven, recursive descent parser (written in Prolog, naturally). Operators are defined with op/3 and enumerated with current_op/3, along with their precedence and associativity. Associativity has to do with how the parse tree is constructed. An expression like a - b - c could be parsed as ( a - ( b - c ) ) (right-associative), or ( ( a - b ) - c ) (left-associative).
Precedence has to do with how tightly operators bind. An expression like a + b * c binds as ( a + ( b * c ) ) not because of associativity, but because '*'/2 (multiplication) has higher precedence than '+'/2 (addition).
You can add, remove and change operators to your heart's content. Note that this gives you a lot of room to shoot yourself in the foot by breaking Prolog's syntax.
It should be noted, however, that any operator expression can also be written via ordinary notation:
a + b * c
is exactly identical to
'+'( a , '*'(b,c) )
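To illustrate how priority and associativity shape the parse tree, here is a minimal precedence-driven parser sketched in Python rather than Prolog. It is not how any Prolog system is implemented. The OPS table copies the standard priorities and types for =, +, -, *, / and ^ (lower numbers bind more tightly, and an x argument must have a strictly lower priority than the operator). Only infix operators are handled, and the names OPS, parse and parse_term are chosen for this example.

```python
import re

# Prolog-style operator table: name -> (priority, type).
OPS = {
    "=": (700, "xfx"),   # non-associative
    "+": (500, "yfx"),   # left-associative
    "-": (500, "yfx"),
    "*": (400, "yfx"),
    "/": (400, "yfx"),
    "^": (200, "xfy"),   # right-associative
}

def parse(text):
    """Parse infix text into nested (op, left, right) tuples."""
    tokens = re.findall(r"\w+|[-+*/^=()]", text)
    pos = 0

    def parse_term(max_prio):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            left = parse_term(1200)          # parentheses reset the priority
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
        elif tok in OPS or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        else:
            left = tok
        left_prio = 0                        # atoms and bracketed terms have priority 0
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            prio, kind = OPS[op]
            left_max = prio - 1 if kind[0] == "x" else prio
            right_max = prio - 1 if kind[2] == "x" else prio
            if prio > max_prio or left_prio > left_max:
                break
            pos += 1
            left = (op, left, parse_term(right_max))
            left_prio = prio
        return left

    tree = parse_term(1200)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree
```

Here parse("a - b - c") yields ('-', ('-', 'a', 'b'), 'c'), parse("a + b * c") yields ('+', 'a', ('*', 'b', 'c')), and parse("a = b = c") raises SyntaxError because = is declared xfx.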

symbolic computation

My problem: symbolic expression manipulation.
A symbolic expression is built starting from integer constants and variables with the help of operators like +, -, *, /, min, max. More exactly, I would represent an expression in the following way (Caml code):
type sym_expr_t =
| PlusInf
| MinusInf
| Const of int
| Var of var_t
| Add of sym_expr_t * sym_expr_t
| Sub of sym_expr_t * sym_expr_t
| Mul of sym_expr_t * sym_expr_t
| Div of sym_expr_t * sym_expr_t
| Min of sym_expr_t * sym_expr_t
| Max of sym_expr_t * sym_expr_t
I imagine that in order to perform useful and efficient computation (e.g. a + b - a = b or a + 1 > a) I need to have some sort of normal form and operate on it. The above representation will probably not work too well.
Can someone point out how I should approach this? I don't necessarily need code. That can be written easily if I know how. Links to papers that present representations for normal forms and/or algorithms for construction/simplification/comparison would also help.
Also, if you know of an OCaml library that does this, let me know.
If you drop Min and Max, normal forms are easy: they're elements of the field of fractions on your variables, I mean P[Vars]/Q[Vars] where P, Q are polynomials. For Min and Max, I don't know; I suppose the simplest way is to consider them as if/then/else tests, and make them float to the top of your expressions (duplicating stuff in the process); for example, P(Max(Q,R)) would be rewritten into P(if Q>R then Q else R), and then into if Q>R then P(Q) else P(R).
I know of two different ways to find normal forms for your expressions expr:
Define rewrite rules expr -> expr that correspond to your intuition, and show that they are normalizing. That can be done by directing the equations that you know are true: from Add(a,Add(b,c)) = Add(Add(a,b),c) you will derive either Add(a,Add(b,c)) -> Add(Add(a,b),c) or the other way around. But then you have an equation system for which you need to show Church-Rosser and normalization; dirty business indeed.
Take a more semantic approach by giving a "semantics" to your values: an element in expr is really a notation for a mathematical object that lives in the type sem. Find a suitable (unique) representation for objects of sem, then an evaluation function expr -> sem, then finally (if you wish to, but you don't need to for equality checking for example) a reification sem -> expr. The composition of both transformations will naturally give you a normalization procedure, without having to worry for example about the direction of the Add rewriting (some arbitrary choice will arise naturally from your reification function). For example, for polynomial fractions, the semantic space would be something like:
type sem = poly * poly
and poly = (multiplicity * var * degree) list
and multiplicity = int
and degree = int
Of course, this is not always so easy. I don't see right now what representation to give to a semantic space with Min and Max functions.
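As a rough illustration of the semantic approach, the sketch below normalizes expressions built from constants, variables, Add, Sub and Mul into a canonical polynomial, so that equality checking becomes a plain comparison. It is written in Python instead of OCaml and makes several assumptions: expressions are nested tuples such as ("add", e1, e2), a polynomial is a dict from monomials (sorted tuples of (variable, degree) pairs) to integer coefficients rather than the list type above, and Div, Min, Max and the infinities are left out. All function names are made up for this example.

```python
def poly_add(p, q, sign=1):
    """Return p + sign*q, dropping terms whose coefficient becomes 0."""
    out = dict(p)
    for mono, coeff in q.items():
        total = out.get(mono, 0) + sign * coeff
        if total:
            out[mono] = total
        else:
            out.pop(mono, None)
    return out

def poly_mul(p, q):
    """Multiply two polynomials term by term."""
    out = {}
    for mono1, coeff1 in p.items():
        for mono2, coeff2 in q.items():
            degrees = dict(mono1)
            for var, deg in mono2:
                degrees[var] = degrees.get(var, 0) + deg
            # Sorting makes the monomial representation unique.
            mono = tuple(sorted(degrees.items()))
            out = poly_add(out, {mono: coeff1 * coeff2})
    return out

def normalize(expr):
    """Evaluation function expr -> sem for the polynomial fragment."""
    tag = expr[0]
    if tag == "const":
        return {(): expr[1]} if expr[1] else {}
    if tag == "var":
        return {((expr[1], 1),): 1}
    left, right = normalize(expr[1]), normalize(expr[2])
    if tag == "add":
        return poly_add(left, right)
    if tag == "sub":
        return poly_add(left, right, -1)
    if tag == "mul":
        return poly_mul(left, right)
    raise ValueError(f"unsupported constructor: {tag}")

a, b = ("var", "a"), ("var", "b")
# a + b - a normalizes to the same polynomial as b
same = normalize(("sub", ("add", a, b), a)) == normalize(b)
```

Two expressions are equal as polynomials exactly when their normal forms compare equal, so no rewrite direction for Add ever has to be chosen.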
Edit: Regarding external libraries, I don't know any and I'm not sure there are. You should maybe look for bindings to other symbolic algebra software, but I haven't heard of it (there was a Jane Street Summer Project about that a few years ago, but I'm not sure there was any deliverable produced).
If you need that for a production application, maybe you should directly consider writing the binding yourself, e.g. to Sage or Maxima. I don't know what it would be like.
The usual approach to such a problem is:
Start with a string, such as "a + 1 > a"
Go through a lexer, and separate your input into distinct tokens: [Variable('a'); Plus; Number(1); GreaterThan; Variable('a')]
Parse the tokens into a syntax tree (what you have now). This is where you use the operator precedence rules: GreaterThan( Add( Var('a'), Const(1)), Var('a'))
Make a function that can interpret the syntax tree to obtain your final result
let rec eval_expr expr = match expr with
| Const n -> n
| Add (a, b) -> eval_expr a + eval_expr b
...
Pardon the syntax, I haven't used OCaml in a while.
About libraries, I don't remember any off the top of my head, but there certainly are good ones easily available. This is the kind of task that the FP community loves doing.
