General-purpose language to specify value constraints - validation

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.

You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows encoding of value constraints in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is that of matrix multiplication, where the dimensions must match. But this is so to speak the "hello world" of dependently typed languages, the type system can do much more for you.

If you end up starting your own language I'd try to stay implementation-independent as long as possible. Look for the formal expression grammars of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.

You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
6 <= value && value <= 10
end
constrain 'something_else' do
check (value % 2 == 1), MustBeOdd
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes a two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.

there are a number of ways to verify a list of values across multiple languages. My preferred method is to make a list of the permitted values and load them into a dictionary/hashmap/list/vector (dependant on the language and your preference) and write a simple isIn() or isValid() function, that will check that the value supplied is valid based on its presence in the data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily. for odd-only or even-only numeric validity again, a small library of different language isOdd() functions will suffice: if it isn't odd it must by definition be even (apart from 0 but then a simple exception can be set up to handle that, or you can simply specify in your code documentation that for logical purposes your code evaluates 0 as odd/even (your choice)).
I normally cart around a set of c++ and c# functions to evaluate isOdd() for similar reasons to what you have alluded to, and the code is as follows:
C++
bool isOdd( int integer ){ return (integer%2==0)?false:true; }
you can also add inline and/or fastcall to the function depending on need or preference; I tend to use it as an inline and fastcall unless there is a need to do otherwise (huge performance boost on xeon processors).
C#
Beautifully the same line works in C# just add static to the front if it is not going to be part of another class:
static bool isOdd( int integer ){ return (integer%2==0)?false:true; }
Hope this helps, in any event let me know if you need any further info:)

Not sure if it's what you looking for, but judging from your starting conditions (Win C#/Linux C++) you may not need it to be totally language agnostic. You can implement such a parser yourself in C++ with all the desired features and then just use it in both C++ and C# projects - thus also bypassing the need to add external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such interface will be usable both in Linux C++ (by static or dynamic linking) and in Windows C# (using P/Invoke). You can have same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple - a tree holding elements of types Variable and Expression which can be Evaluated with a given Variable value.
Example class definitions:
class Entity {public: virtual VARIANT Evaluate() = 0;} // boost::variant may be used typedef'd as VARIANT
class BinaryOperation: public Entity {
private:
Entity& left;
Entity& right;
enum Operation {PLUS,MINUS,EQUALS,AND,OR,GREATER_OR_EQUALS,LESS_OR_EQUALS};
public:
virtual VARIANT Evaluate() override; // Evaluates left and right operands and combines them
}
class Variable: public Entity {
private:
VARIANT value;
public:
virtual VARIANT Evaluate() override {return value;};
}
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)

My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after making somewhat generic code that a non programmer can inject rules for allowable input it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own because you'll likely find people wanting more features that an already made DSL will have.

If you are using Java then you can use the Object Graph Navigation Library.
It enables you to write java applications that can parse,compile and evaluate OGNL expressions.
OGNL expressions include basic java,C,C++,C# expressions.
You can compile an expression that uses some variables, and then evaluate that expression
for some given variables.

An easy way to achieve validation of expressions is to use Python's eval method. It can be used to evaluate expressions just like the one you wrote. Python's syntax is easy enough to learn for simple expressions and english-like. Your expression example is translated to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
Code evaluation provided by users might pose a security risk though as certain functions could be used to be executed on the host machine (such as the open function, to open a file). But the eval function takes extra arguments to restrict the allowed functions. Hence you can create a safe evaluation environment.
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden method, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example..
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time, let's ask the user a function:
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
# add x in the safe dict
safe_dict['x']=x
print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
So you can control the allowed functions that should be used by the eval method, and have a sandbox environment that can evaluate expressions.
This is what we used in a previous project I worked in. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples I used in part inspired / copied from the Lybniz project explanation on how to run a safe Python eval environment. Read it for more details!

You might want to look at Regular-Expressions or RegEx. It's proven and been around for a long time. There's a regex library all the major programming/script languages out there.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy

Related

Filter an collection of tuples

I'm playing with iterables and comprehension in Julia and tried to code simple problem: find all pairs of numbers less then 10 whose product is less then 10. This was my first try:
solution = filter((a,b)->a*b<10, product(1:10, 1:10))
collect(solution)
but I got error "wrong number of arguments". This is kind of expected because anonymous function inside filter expects two arguments but it gets one tuple.
I know I can do
solution = filter(p->p[1]*p[2]<10, product(1:10, 1:10))
but it doesn't look nice as the one above. Is there a way I can tell that (a,b) is argument of type tuple and use something similar to syntax in first example?
I don't think there's a way to do exactly as you'd like, but here are some alternatives you could consider for the anonymous function:
x->let (a,b)=x; a*b<10 end
x->((a,b)=x; a*b<10)
These can of course be made into macros if you like:
macro tup(ex)
#assert ex.head == :(->)
#assert ex.args[1].head == :tuple
arg = gensym()
quote
$arg -> ( $(ex.args[1]) = $arg; $(ex.args[2]) )
end
end
Then #tup (a, b) -> a * b < 10 will do as you like.
Metaprogramming in Julia is pretty useful and common for situations where you are doing something over and over and would like specialized syntax for it. But I would avoid this kind of metaprogramming if this were a one-off thing, because adding new syntax means learning new syntax and makes code harder to read.

Haskell debugging an arbitrary lambda expression

I have a set of lambda expressions which I'm passing to other lambdas. All lambdas rely only on their arguments, they don't call any outside functions. Of course, sometimes it gets quite confusing and I'll pass an function with the incorrect number of arguments to another, creating a GHCi exception.
I want to make a debug function which will take an arbitrary lambda expression (with an unknown number of arguments) and return a string based on the structure and function of the lambda.
For example, say I have the following lambda expressions:
i = \x -> x
k = \x y -> x
s = \x y z -> x z (y z)
debug (s k) should return "\a b -> b"
debug (s s k) should return "\a b -> a b a" (if I simplified that correctly)
debug s should return "\a b c -> a c (b c)"
What would be a good way of doing this?
I think the way to do this would be to define a small lambda calculus DSL in Haskell (or use an existing implementation). This way, instead of using the native Haskell formulation, you would write something like
k = Lam "x" (Lam "y" (App (Var "x") (Var "y")))
s = Lam "x" (Lam "y" (Lam "z" (App (App (Var "x") (Var "z")
(App (Var "y") (Var "z"))))
and similarly for s and i. You would then write/use an evaluation function so that you could write
debug e = eval e
debug (App s k)
which would give you the final form in your own syntax. Additionally you would need a sort of interpreter to convert your DSL syntax to Haskell, so that you can actually use the functions in your code.
Implementing this does seem like quite a lot of (tricky) work, and it's probably not exactly what you had in mind (especially if you need the evaluation for typed syntax), but I'm sure it would be a great learning experience. A good reference would be chapter 6 of "Write you a Haskell". Using an existing implementation would be a lot easier (but less fun :)).
If this is merely for debugging purposes you might benefit from looking at the core syntax ghc compiles to. See chapter 25 of Real world Haskell, the ghc flag to use is -ddump-simpl. But this would mean looking at generated code rather than generating a representation inside your program. I'm also not sure to what extent you would be able to identify specific functions in the Core code easily (I have no experience with this so YMMV).
It would of course be pretty cool if using show on functions would give the kind of output you describe but there are probably very good reasons functions are not an instance of Show (I wouldn't be able to tell you).
You can actually achieve that by utilising pretty-printing from Template Haskell, which comes with GHC out of the box.
First, the formatting function should be defined in separate module (that's a TH restriction):
module LambdaPrint where
import Control.Monad
import Language.Haskell.TH.Ppr
import Language.Haskell.TH.Syntax
showDef :: Name -> Q Exp
showDef = liftM (LitE . StringL . pprint) . reify
Then use it:
{-# LANGUAGE TemplateHaskell #-}
import LambdaPrint
y :: a -> a
y = \a -> a
$(return []) --workaround for GHC 7.8+
test = $(showDef 'y)
The result is more or less readable, not counting fully qualified names:
*Main> test
"Main.y :: forall a_0 . a_0 -> a_0"
Few words about what's going on. showDef is a macro function which reifies the definition of some name from the environment and pretty-prints it in a string literal expression. To use it, you need to quote the name of the lambda (using ') and splice the result (which is a quoted string expression) into some expression (using $(...)).

Conventions on creating constants in Python

I am writing an application which needs to find out the schema of a database, across engines. To that end, I am writing a small database adapter using Python. I decided to first write a base class that outlines the functionality I need, and then implement it using classes that inherit from this base. Along the way, I need to implement some constants which need to be accessible across all these classes. Some of these constants need to be combined using C-style bitwise OR.
My question is,
what is the standard way of sharing such constants?
what is the right way to create constants that can be combined? I am referring to MAP_FIXED | MAP_FILE | MAP_SHARED style code that C allows.
For the former, I came across threads where all the constants were put into a module first. For the latter, I briefly thought of using a dict of booleans. Both of these seemed too unwieldly. I imagine that this is a fairly common requirement, and think some good way must indeed exist!
what is the standard way of sharing such constants?
Throughout the standard library, the most common way is to define constants as module-level variables using UPPER_CASE_WITH_UNDERSCORES names.
what is the right way to create constants that can be combined? I am referring to MAP_FIXED | MAP_FILE | MAP_SHARED style code that C allows.
The same rules as in C apply. You have to make sure that each constant value corresponds to a single, unique bit, i.e. powers of 2 (2, 4, 8, 16, ...).
Most of the time, people use hex numbers for this:
OPTION_A = 0x01
OPTION_B = 0x02
OPTION_C = 0x04
OPTION_D = 0x08
OPTION_E = 0x10
# ...
Some prefer a more human-readable style, computing the constant values dynamically using shift operators:
OPTION_A = 1 << 0
OPTION_B = 1 << 1
OPTION_C = 1 << 2
# ...
In Python, you could also use binary notation to make this even more obvious:
OPTION_A = 0b00000001
OPTION_B = 0b00000010
OPTION_C = 0b00000100
OPTION_D = 0b00001000
But since this notation is lengthy and hard to read, using hex or binary shift notation is probably preferable.
Constants generally go at the module level. From PEP 8:
Constants
Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.
If you want constants at class level, define them as class properties.
Stdlib is a great source of knowledge example of what you want can be found in doctest code:
OPTIONS = {}
# A function to add (register) an option.
def register_option(name):
return OPTIONS.setdefault(name, 1 << len(OPTIONS))
# A function to test if an option exist.
def has_option(options, name):
return bool(options & name)
# All my option defined here.
FOO = register_option('FOO')
BAR = register_option('BAR')
FOOBAR = register_option('FOOBAR')
# Test if an option figure out in `ARG`.
ARG = FOO | BAR
print has_option(ARG, FOO)
# True
print has_option(ARG, BAR)
# True
print has_option(ARG, FOOBAR)
# False
N.B: The re module also use bit-wise argument style too, if you want another example.
You often find constants at global level, and they are one of the few variables that exist up there. There are also people who write Constant namespaces using dicts or objects like this
class Const:
x = 33
Const.x
There are some people who put them in modules and others that attach them as class variables that instances access. Most of the time its personal taste, but just a few global variables can't really hurt that much.
Naming is usually UPPERCASE_WITH_UNDERSCORE, and they are usually module level but occasionally they live in their own class. One good reason to be in a class is when the values are special -- such as needing to be powers of two:
class PowTwoConstants(object):
def __init__(self, items):
self.names = items
enum = 1
for name in items:
setattr(self, name, enum)
enum <<= 1
constants = PowTwoConstants('ignore_case multiline newline'.split())
print constants.newline # prints 4
If you want to be able to export those constants to module level (or any other namespace) you can add the following to the class:
def export(self, namespace):
for name in self.names:
setattr(namespace, name, getattr(self, name))
and then
import sys
constants.export(sys.modules[__name__])

Is Odersky serious with "bills !*&^%~ code!"?

In his book programming in scala (Chapter 5 Section 5.9 Pg 93)
Odersky mentioned this expression "bills !*&^%~ code!"
In the footnote on same page:
"By now you should be able to figure out that given this code,the Scala compiler would
invoke (bills.!*&^%~(code)).!()."
That's a bit to cryptic for me, could someone explain what's going on here?
What Odersky means to say is that it would be possible to have valid code looking like that. For instance, the code below:
class BadCode(whose: String, source: String) {
def ! = println(whose+", what the hell do you mean by '"+source+"'???")
}
class Programmer(who: String) {
def !*&^%~(source: String) = new BadCode(who, source)
}
val bills = new Programmer("Bill")
val code = "def !*&^%~(source: String) = new BadCode(who, source)"
bills !*&^%~ code!
Just copy&paste it on the REPL.
The period is optional for calling a method that takes a single parameter, or has an empty parameter list.
When this feature is utilized, the next chunk after the space following the method name is assumed to be the single parameter.
Therefore,
(bills.!*&^%~(code)).!().
is identical to
bills !*&^%~ code!
The second exclamation mark calls a method on the returned value from the first method call.
I'm not sure if the book provides method signatures but I assume it's just a comment on Scala's syntactic sugar so it assumes if you type:
bill add monkey
where there is an object bill which has a method add which takes a parameter then it automatically interprets it as:
bill.add(monkey)
Being a little Scala rusty, I'm not entirely sure how it splits code! into (code).!() except for a vague tickling of the grey cells that the ! operator is used to fire off an actor which in compiler terms might be interpretted as an implicit .!() method on the object.
The combination of the '.()' being optional with method calls (as Wysawyg explained above) and the ability to use (almost) whatever characters you like for naming methods, makes it possible to write methods in Scala that look like operator overloading. You can even invent your own operators.
For example, I have a program that deals with 3D computer graphics. I have my own class Vector for representing a 3D vector:
class Vector(val x: Double, val y: Double, val z: Double) {
def +(v: Vector) = new Vector(x + v.x, y + v.y, z + v.z)
// ...etc.
}
I've also defined a method ** (not shown above) to compute the cross product of two vectors. It's very convenient that you can create your own operators like that in Scala, not many other programming languages have this flexibility.

List of Scala's "magic" functions

Where can I find a list of Scala's "magic" functions, such as apply, unapply, update, +=, etc.?
By magic-functions I mean functions which are used by some syntactic sugar of the compiler, for example
o.update(x,y) <=> o(x) = y
I googled for some combination of scala magic and synonyms of functions, but I didn't find anything.
I'm not interested with the usage of magic functions in the standard library, but in which magic functions exists.
As far as I know:
Getters/setters related:
apply
update
identifier_=
Pattern matching:
unapply
unapplySeq
For-comprehensions:
map
flatMap
filter
withFilter
foreach
Prefixed operators:
unary_+
unary_-
unary_!
unary_~
Beyond that, any implicit from A to B. Scala will also convert A <op>= B into A = A <op> B, if the former operator isn't defined, "op" is not alphanumeric, and <op>= isn't !=, ==, <= or >=.
And I don't believe there's any single place where all of Scala's syntactic sugars are listed.
In addition to update and apply, there are also a number of unary operators which (I believe) qualify as magical:
unary_+
unary_-
unary_!
unary_~
Add to that the regular infix/suffix operators (which can be almost anything) and you've got yourself the complete package.
You really should take a look at the Scala Language Specification. It is the only authoritative source on this stuff. It's not that hard to read (as long as you're comfortable with context-free grammars), and very easily searchable. The only thing it doesn't specify well is the XML support.
Sorry if it's not exactly answering your question, but my favorite WTF moment so far is # as assignment operator inside pattern match. Thanks to soft copy of "Programming in Scala" I found out what it was pretty quickly.
Using # we can bind any part of a pattern to a variable, and if the pattern match succeeds, the variable will capture the value of the sub-pattern. Here's the example from Programming in Scala (Section 15.2 - Variable Binding):
expr match {
case UnOp("abs", e # UnOp("abs", _)) => e
case _ =>
}
If the entire pattern match succeeds,
then the portion that matched the
UnOp("abs", _) part is made available
as variable e.
And here's what Programming Scala says about it.
That link no longer works. Here is one that does.
I'll also add _* for pattern matching on an arbitrary number of parameters like
case x: A(_*)
And operator associativity rule, from Odersky-Spoon-Venners book:
The associativity of an operator in Scala is determined by its last
character. As mentioned on <...>, any method that ends
in a ‘:’ character is invoked on its right operand, passing in the
left operand. Methods that end in any other character are the other
way around. They are invoked on their left operand, passing in the
right operand. So a * b yields a.*(b), but a ::: b yields b.:::(a).
Maybe we should also mention syntactic desugaring of for expressions which can be found here
And (of course!), alternative syntax for pairs
a -> b //converted to (a, b), where a and b are instances
(as correctly pointed out, this one is just an implicit conversion done through a library, so it's probably not eligible, but I find it's a common puzzler for newcomers)
I'd like to add that there is also a "magic" trait - scala.Dynamic:
A marker trait that enables dynamic invocations. Instances x of this trait allow method invocations x.meth(args) for arbitrary method names meth and argument lists args as well as field accesses x.field for arbitrary field names field.
If a call is not natively supported by x (i.e. if type checking fails), it is rewritten according to the following rules:
foo.method("blah") ~~> foo.applyDynamic("method")("blah")
foo.method(x = "blah") ~~> foo.applyDynamicNamed("method")(("x", "blah"))
foo.method(x = 1, 2) ~~> foo.applyDynamicNamed("method")(("x", 1), ("", 2))
foo.field ~~> foo.selectDynamic("field")
foo.varia = 10 ~~> foo.updateDynamic("varia")(10)
foo.arr(10) = 13 ~~> foo.selectDynamic("arr").update(10, 13)
foo.arr(10) ~~> foo.applyDynamic("arr")(10)
As of Scala 2.10, defining direct or indirect subclasses of this trait is only possible if the language feature dynamics is enabled.
So you can do stuff like
import scala.language.dynamics
object Dyn extends Dynamic {
def applyDynamic(name: String)(a1: Int, a2: String) {
println("Invoked " + name + " on (" + a1 + "," + a2 + ")");
}
}
Dyn.foo(3, "x");
Dyn.bar(3, "y");
They are defined in the Scala Language Specification.
As far as I know, there are just three "magic" functions as you mentioned.
Scalas Getter and Setter may also relate to your "magic":
scala> class Magic {
| private var x :Int = _
| override def toString = "Magic(%d)".format(x)
| def member = x
| def member_=(m :Int){ x = m }
| }
defined class Magic
scala> val m = new Magic
m: Magic = Magic(0)
scala> m.member
res14: Int = 0
scala> m.member = 100
scala> m
res15: Magic = Magic(100)
scala> m.member += 99
scala> m
res17: Magic = Magic(199)

Resources