How to develope prolog parser for mySMPL program? - prolog

I am new in prolog i have a code that i want to read from file then parse this is the code here
var x;
x <- (5 * 2);
return (x + 1).
now in prolog i want to tokenize this like this first
[’var’, ’x’, ’;’, ’x’,’<-’, ’(’, 5, ’*’, 2, ’)’, ’;’, ’return’, ’(’, ’x’, ’+’, 1, ’)’, ’.’]
the i want to implement predicate
parse(+TokenList, -AST)
then again
evaluate(+AST, -Number)
using SWIProlog
The parser should not allow the keywords of the language (e.g., the arithmetic operators, <-,
var, return) as variable identifiers

You can try tokenize_atom, to see if it fits your requirements.
Otherwise, bite the bullet and try to write your own DCG...

Related

Pattern matching using list of characters

I am having difficulty pattern matching words which are converted to lists of characters:
wordworm(H1,H2,H3,V1,V2) :-
word(H1), string_length(H1,7),
word(H2), string_length(H2,5),
word(H3), string_length(H3,4),
word(V1), string_length(V1,4),
word(H3) \= word(V1),
atom_chars(H2, [_,_,Y,_,_]) = atom_chars(V1, [_,_,_,Y]),
word(V2), string_length(V2,5),
word(H2) \= word(V2),
atom_chars(H3, [_,_,_,Y]) = atom_chars(V2, [_,_,_,_,Y]).
Above this section, I have a series of 600 words in the format, word("prolog"). The code runs fine, without the atom_chars, but with it, I get a time-out error. Can anyone suggest a better way for me to structure my code?
Prolog predicate calls are not like function calls in other languages. They do not have "return values".
When you write X = atom_chars(foo, Chars) this does not execute atom_chars. It builds a data structure atom_chars(foo, Chars). It does not "call" this data structure.
If you want to evaluate atom_chars on some atom H2 and then say something about the resulting list, call it like:
atom_chars(H2, H2Chars),
H2Chars = [_,_,Y,_,_]
So overall maybe your code should look more like this:
...,
atom_chars(H2, H2Chars),
H2Chars = [_,_,Y,_,_],
atom_chars(V1, V1Chars),
V1Chars = [_,_,_,Y],
...
Note that you don't need to assert some kind of "equality" between these atom_chars goals. The fact that their char lists share the same variable Y means that there will be a connection: The third character of H2 must be equal to the fourth character of V1.

Parse To Prolog Variables Using DCG

I want to parse a logical expression using DCG in Prolog.
The logical terms are represented as lists e.g. ['x','&&','y'] for x ∧ y the result should be the parse tree and(X,Y) (were X and Y are unassigned Prolog variables).
I implemented it and everything works as expected but I have one problem:
I can't figure out how to parse the variable 'x' and 'y' to get real Prolog variables X and Y for the later assignment of truth values.
I tried the following rule variations:
v(X) --> [X].:
This doesn't work of course, it only returns and('x','y').
But can I maybe uniformly replace the logical variables in this term with Prolog variables? I know of the predicate term_to_atom (which is proposed as a solution for a similar problem) but I don't think it can be used here to achieve the desired result.
v(Y) --> [X], {nonvar(Y)}.:
This does return an unbound variable but of course a new one every time even if the logical variable ('x','y',...) was already in the term so
['X','&&','X'] gets evaluated to and(X,Y) which is not the desired result, either.
Is there any elegant or idiomatic solution to this problem?
Many thanks in advance!
EDIT:
The background to this question is that I'm trying to implement the DPLL-algorithm in Prolog. I thought it would by clever to directly parse the logical term to a Prolog-term to make easy use of the Prolog backtracking facility:
Input: some logical term, e.g T = [x,'&&',y]
Term after parsing: [G_123,'&&',G_456] (now featuring "real" Prolog variables)
Assign a value from { boolean(t), boolean(f) } to the first unbound variable in T.
simplify the term.
... repeat or backtrack until a assignment v is found so that v(T) = t or the search space is depleted.
I'm pretty new to Prolog and honestly couldn't figure out a better approach. I'm very interested in better alternatives! (So I'm kinda half-shure that this is what I want ;-) and thank you very much for your support so far ...)
You want to associate ground terms like x (no need to write 'x') with uninstantiated variables. Certainly that does not constitute a pure relation. So it is not that clear to me that you actually want this.
And where do you get the list [x, &&, x] in the first place? You probably have some kind of tokenizer. If possible, try to associate variable names to variables prior to the actual parsing. If you insist to perform that association during parsing you will have to thread a pair of variables throughout your entire grammar. That is, instead of a clean grammar like
power(P) --> factor(F), power_r(F, P).
you will now have to write
power(P, D0,D) --> factor(F, D0,D1), power_r(F, P, D1,D).
% ^^^^ ^^^^^ ^^^^
since you are introducing context into an otherwise context free grammar.
When parsing Prolog text, the same problem occurs. The association between a variable name and a concrete variable is already established during tokenizing. The actual parser does not have to deal with it.
There are essentially two ways to perform this during tokenization:
1mo collect all occurrences Name=Variable in a list and unify them later:
v(N-V, [N-V|D],D) --> [N], {maybesometest(N)}.
unify_nvs(NVs) :-
keysort(NVs, NVs2),
uniq(NVs2).
uniq([]).
uniq([NV|NVs]) :-
head_eq(NVs, NV).
uniq(NVs).
head_eq([], _).
head_eq([N-V|_],N-V).
head_eq([N1-_|_],N2-_) :-
dif(N1,N2).
2do use some explicit dictionary to merge them early on.
Somewhat related is this question.
Not sure if you really want to do what you asked. You might do it by keeping a list of variable associations so that you would know when to reuse a variable and when to use a fresh one.
This is an example of a greedy descent parser which would parse expressions with && and ||:
parse(Exp, Bindings, NBindings)-->
parseLeaf(LExp, Bindings, MBindings),
parse_cont(Exp, LExp, MBindings, NBindings).
parse_cont(Exp, LExp, Bindings, NBindings)-->
parse_op(Op, LExp, RExp),
{!},
parseLeaf(RExp, Bindings, MBindings),
parse_cont(Exp, Op, MBindings, NBindings).
parse_cont(Exp, Exp, Bindings, Bindings)-->[].
parse_op(and(LExp, RExp), LExp, RExp)--> ['&&'].
parse_op(or(LExp, RExp), LExp, RExp)--> ['||'].
parseLeaf(Y, Bindings, NBindings)-->
[X],
{
(member(bind(X, Var), Bindings)-> Y-NBindings=Var-Bindings ; Y-NBindings=Var-[bind(X, Var)|Bindings])
}.
It parses the expression and returns also the variable bindings.
Sample outputs:
?- phrase(parse(Exp, [], Bindings), ['x', '&&', 'y']).
Exp = and(_G683, _G696),
Bindings = [bind(y, _G696), bind(x, _G683)].
?- phrase(parse(Exp, [], Bindings), ['x', '&&', 'x']).
Exp = and(_G683, _G683),
Bindings = [bind(x, _G683)].
?- phrase(parse(Exp, [], Bindings), ['x', '&&', 'y', '&&', 'x', '||', 'z']).
Exp = or(and(and(_G839, _G852), _G839), _G879),
Bindings = [bind(z, _G879), bind(y, _G852), bind(x, _G839)].

How to check that certain strA is the substring of strB using Prolog?

So basically I use this code to check the substring:
substring(X,S) :- append(_,T,S), append(X,_,T), X \= [].
and my input is this:
substring("cmp", Ins) % Ins is "cmp(eax, 4)"
But when I use swi-prolog to trace this code, I find this:
substring([99, 109, 112], cmp(eax, 4))
and obviously it failed...
So could anyone give me some help?
SWI-Prolog has recently changed the traditional string literals as 'list of codes' to a more memory efficient representation (starting from version 7).
As a consequence (among others more difficult to explain), append/3 doesn't work anymore for your task, unless you convert explicitly to list of codes.
Contextually, many builtins have been introduced, like sub_string/5: for instance, try
?- sub_string("cmp(eax, 4)", Start,Len,Stop, "eax").
Start = Stop, Stop = 4,
Len = 3
Make this string a term of the form cmp(eax, 4). Here, in Prolog lingo, you have:
the term cmp(eax, 4)
with a functor cmp/2
with a first argument the atom eax
and a second argument the integer 4
Now that you have a term, you can use pattern matching in the head of your predicate (unification) to write predicates like:
apply_instruction(cmp(Reg, Operand) /*, other arguments as needed */) :-
/* do the comparison of the contents of _Reg_ and the values in _Operand_ */
apply_instruction(add(Reg, Addend) /*, other arguments */) :-
/* add _Addend_ to _Reg_ */
% and so on
How to make a term out of your input: there are many ways, the easiest would be to read one full line (depends on the Prolog implementation you are using, in SWI-Prolog, assuming you have your input stream in In):
read_line_to_codes(In, Line).
and then use a DCG to parse it. A DCG would look maybe something like:
instruction(cmp(Op1, Op2)) -->
"cmp",
ops(Op1, Op2).
instruction(add(Op1, Op2) -->
"add",
ops(Op1, Op2).
ops(Op1, Op2) -->
space,
op1(Op1), optional_space,
",", optional_space,
op2(Op2),
space_to_eol.
% and so on
You can then use phrase/2 to apply the DCG to the line you have read:
phrase(instruction(Instr), Line).

General-purpose language to specify value constraints

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.
You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows encoding of value constraints in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is that of matrix multiplication, where the dimensions must match. But this is so to speak the "hello world" of dependently typed languages, the type system can do much more for you.
If you end up starting your own language I'd try to stay implementation-independent as long as possible. Look for the formal expression grammars of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.
You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
6 <= value && value <= 10
end
constrain 'something_else' do
check (value % 2 == 1), MustBeOdd
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes a two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.
there are a number of ways to verify a list of values across multiple languages. My preferred method is to make a list of the permitted values and load them into a dictionary/hashmap/list/vector (dependant on the language and your preference) and write a simple isIn() or isValid() function, that will check that the value supplied is valid based on its presence in the data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily. for odd-only or even-only numeric validity again, a small library of different language isOdd() functions will suffice: if it isn't odd it must by definition be even (apart from 0 but then a simple exception can be set up to handle that, or you can simply specify in your code documentation that for logical purposes your code evaluates 0 as odd/even (your choice)).
I normally cart around a set of c++ and c# functions to evaluate isOdd() for similar reasons to what you have alluded to, and the code is as follows:
C++
bool isOdd( int integer ){ return (integer%2==0)?false:true; }
you can also add inline and/or fastcall to the function depending on need or preference; I tend to use it as an inline and fastcall unless there is a need to do otherwise (huge performance boost on xeon processors).
C#
Beautifully the same line works in C# just add static to the front if it is not going to be part of another class:
static bool isOdd( int integer ){ return (integer%2==0)?false:true; }
Hope this helps, in any event let me know if you need any further info:)
Not sure if it's what you looking for, but judging from your starting conditions (Win C#/Linux C++) you may not need it to be totally language agnostic. You can implement such a parser yourself in C++ with all the desired features and then just use it in both C++ and C# projects - thus also bypassing the need to add external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such interface will be usable both in Linux C++ (by static or dynamic linking) and in Windows C# (using P/Invoke). You can have same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple - a tree holding elements of types Variable and Expression which can be Evaluated with a given Variable value.
Example class definitions:
class Entity {public: virtual VARIANT Evaluate() = 0;} // boost::variant may be used typedef'd as VARIANT
class BinaryOperation: public Entity {
private:
Entity& left;
Entity& right;
enum Operation {PLUS,MINUS,EQUALS,AND,OR,GREATER_OR_EQUALS,LESS_OR_EQUALS};
public:
virtual VARIANT Evaluate() override; // Evaluates left and right operands and combines them
}
class Variable: public Entity {
private:
VARIANT value;
public:
virtual VARIANT Evaluate() override {return value;};
}
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)
My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after making somewhat generic code that a non programmer can inject rules for allowable input it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own because you'll likely find people wanting more features that an already made DSL will have.
If you are using Java then you can use the Object Graph Navigation Library.
It enables you to write java applications that can parse,compile and evaluate OGNL expressions.
OGNL expressions include basic java,C,C++,C# expressions.
You can compile an expression that uses some variables, and then evaluate that expression
for some given variables.
An easy way to achieve validation of expressions is to use Python's eval method. It can be used to evaluate expressions just like the one you wrote. Python's syntax is easy enough to learn for simple expressions and english-like. Your expression example is translated to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
Code evaluation provided by users might pose a security risk though as certain functions could be used to be executed on the host machine (such as the open function, to open a file). But the eval function takes extra arguments to restrict the allowed functions. Hence you can create a safe evaluation environment.
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden method, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example..
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time, let's ask the user a function:
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
# add x in the safe dict
safe_dict['x']=x
print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
So you can control the allowed functions that should be used by the eval method, and have a sandbox environment that can evaluate expressions.
This is what we used in a previous project I worked in. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples I used in part inspired / copied from the Lybniz project explanation on how to run a safe Python eval environment. Read it for more details!
You might want to look at Regular-Expressions or RegEx. It's proven and been around for a long time. There's a regex library all the major programming/script languages out there.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy

How do I make a function use the altered version of a list in Mathematica?

I want to make a list with its elements representing the logic map given by
x_{n+1} = a*x_n(1-x_n)
I tried the following code (which adds stuff manually instead of a For loop):
x0 = Input["Enter x0"]
a = Input["a"]
M = {x0}
L[n_] := If[n < 1, x0, a*M[[n]]*(1 - M[[n]])]
Print[L[1]]
Append[M, L[1]]
Print[M]
Append[M, L[2]]
Print[M]
The output is as follows:
0.3
2
{0.3}
0.42
{0.3,0.42}
{0.3}
Part::partw: Part 2 of {0.3`} does not exist. >>
Part::partw: Part 2 of {0.3`} does not exist. >>
{0.3, 2 (1 - {0.3}[[2]]) {0.3}[[2]]}
{0.3}
It seems that, when the function definition is being called in Append[M,L[2]], L[2] is calling M[[2]] in the older definition of M, which clearly does not exist.
How can I make L use the newer, bigger version of M?
After doing this I could use a For loop to generate the entire list up to a certain index.
P.S. I apologise for the poor formatting but I could find out how to make Latex code work here.
Other minor question: What are the allowed names for functions and lists? Are underscores allowed in names?
It looks to me as if you are trying to compute the result of
FixedPointList[a*#*(1-#)&, x0]
Note:
Building lists element-by-element, whether you use a loop or some other construct, is almost always a bad idea in Mathematica. To use the system productively you need to learn some of the basic functional constructs, of which FixedPointList is one.
I'm not providing any explanation of the function I've used, nor of the interpretation of symbols such as # and &. This is all covered in the documentation which explains matters better than I can and with which you ought to become familiar.
Mathematica allows alphanumeric (only) names and they must start with a letter. Of course, Mathematic recognises many Unicode characters other than the 26 letters in the English alphabet as alphabetic. By convention (only) intrinsic names start with an upper-case letter and your own with a lower-case.
The underscore is most definitely not allowed in Mathematica names, it has a specific and widely-used interpretation as a short form of the Blank symbol.
Oh, LaTeX formatting doesn't work hereabouts, but Mathematica code is plenty readable enough.
It seems that, when the function definition is being called in
Append[M,L2], L2 is calling M[2] in the older definition of M,
which clearly does not exist.
How can I make L use the newer, bigger version of M?
M is never getting updated here. Append does not modify the parameters you pass to it; it returns the concatenated value of the arrays.
So, the following code:
A={1,2,3}
B=Append[A,5]
Will end up with B={1,2,3,5} and A={1,2,3}. A is not modfied.
To analyse your output,
0.3 // Output of x0 = Input["Enter x0"]. Note that the assignment operator returns the the assignment value.
2 // Output of a= Input["a"]
{0.3} // Output of M = {x0}
0.42 // Output of Print[L[1]]
{0.3,0.42} // Output of Append[M, L[1]]. This is the *return value*, not the new value of M
{0.3} // Output of Print[M]
Part::partw: Part 2 of {0.3`} does not exist. >> // M has only one element, so M[[2]] doesn't make sense
Part::partw: Part 2 of {0.3`} does not exist. >> // ditto
{0.3, 2 (1 - {0.3}[[2]]) {0.3}[[2]]} (* Output of Append[M, L[2]]. Again, *not* the new value of M *)
{0.3} // Output of Print[M]
The simple fix here is to use M=Append[M, L[1]].
To do it in a single for loop:
xn=x0;
For[i = 0, i < n, i++,
M = Append[M, xn];
xn = A*xn (1 - xn)
];
A faster method would be to use NestList[a*#*(1-#)&, x0,n] as a variation of the method mentioned by Mark above.
Here, the expression a*#*(1-#)& is basically an anonymous function (# is its parameter, the & is a shorthand for enclosing it in Function[]). The NestList method takes a function as one argument and recursively applies it starting with x0, for n iterations.
Other minor question: What are the allowed names for functions and lists? Are underscores allowed in names?
No underscores, they're used for pattern matching. Otherwise a variable can contain alphabets and special characters (like theta and all), but no characters that have a meaning in mathematica (parentheses/braces/brackets, the at symbol, the hash symbol, an ampersand, a period, arithmetic symbols, underscores, etc). They may contain a dollar sign but preferably not start with one (these are usually reserved for system variables and all, though you can define a variable starting with a dollar sign without breaking anything).

Resources