I've developed a program which generates insurance quotes using different types of coverages based on state criteria. Now I want to add the ability to specify 'rules'. For example we may have 3 types of coverage (we'll call them UM, BI, and PD). Well some states don't allow PD to be greater than BI and other states don't allow UM to exist without BI. So I've added the ability for the user to create these rules so that when the quote is generated the rule will be followed and thus no state regulations will be violated when the program generates the quote.
The Problem
I don't want the user to be able to select conflicting rules. The user can select any of the VB mathematical operators (>, <, >=, <=, =, <>) and set a coverage on either side. They can do this multiple times (but only one at a time) so they might end up with a list of rules like this:
A > B
B > C
C > A
As you can see, the last rule conflicts with the previously set rules. My solution to this was to validate the list each time the user clicks 'Add rule to list'.
Pretend the 3rd list item is not yet in the list but the user has clicked 'add rule' to put it in the list. The validation process first checks to see if both incoming variables have already been used on the same line. If not, it just searches for the left side incoming variable (in this case 'C') in the already created list. if it finds it, it then sets tmp1 equal to the variable across from the match (tmp1 = 'B'). It then does the same for the incoming variable on the right side (in this case 'A'). Then tmp2 is set equal to the variable across from A (tmp2 = 'B'). If tmp1 and tmp2 are equal then the incoming rule is either conflicting OR is irrelevant regardless of the operators used. I'm pretty sure this is solid logic given 3 variables. However, I found that adding any additional variables could easily bypass my validation. There could be upwards of 10 coverage types in any given state so it is important to be able to validate more than just 3.
Is there any uniform way to do a sound validation given any number of variables? Any ideas or thoughts are appreciated. I hope my explanation makes sense.
Thanks
My best bet is some sort of hierarchical tree of rules. When the user adds the first rule (say A > B), the application could create a data structure like this (lowerValues is a Map which the key leads to a list of values):
lowerValues['A'] = ['B']
Now when the user adds the next rule (B > C), the application could check if B is already in a any lowerValues list (in this case, A). If that happens, C is added to lowerValues['A'], and lowerValues['B'] is also created:
lowerValues['A'] = ['B', 'C']
lowerValues['B'] = ['C']
Finally, when the last rule is provided by the user (C > A), the application checks if C is in any lowerValues list. Since it's in B and A, the rule is invalid.
Hope that helps. I don't remember if there's some sort of mapping in VB. I think you should try the Dictionary object.
In order to this idea works out, all the operations must be internally translated to a simple type. So, for example:
A > B
could be translated as
B <= A
Good luck
In general this is a pretty hard problem. What you in fact want to know is if a set of propositional equations over (apparantly) some set of arithmetic is true. To do this you need what amounts to constraint solvers that "know" arithmetic. Not likely to find that in VB6, but you might be able to invoke one as a subprocess.
If the rules are propositional equations only over inequalities (AA", write them only one way).
Second, try solving the propositions for tautology (see for Wang's algorithm which you can likely implment awkwardly in VB6).
If the propositions are not a tautology, now you want build chains of inequalities (e.g, A > B > C) as a graph and look for cycles. The place this fails is when your propositions have disjunctions, e.g., ("A>B or B>Q"); you'll have to generate an inequality chain for each combination of disjunctions, and discard the inconsistent ones. If you discard all of them, the set is inconsistent. Watch out for expressions like "A and B"; by DeMorgans theorem, they're equivalent to "not A or not B", e.g., "A>B and B>Q" is the same as "A<=B or B<=Q". You might want to reduce the conditions to disjunctive normal form to avoid getting suprised.
There are apparantly decision procedures for such inequalities. They're likely hard to implement.
Related
I'm trying to write a Prolog program which does the following:
I have some relations defined in the Relations list. (For example: [f1,s1] means f1 needs s1) Depending on what features(f1,f2,f3) are selected in the TargetFeat list, I would like to create Result list using constraint programming.
Here is a sample code:
Relations =[[f1, s1], [f2, s2], [f3, s3], [f3, s4]],
TargetFeat = [f3, f1],
Result = [],
member(f3,TargetFeat) #= member(s3,Result), %One of the constraints
labeling(Result).
This doesn't work because #= works only with arithmetic expressions as operands. What are the alternatives to achieve something like this ?
There are many possible ways to model such dependencies with constraints. I consider in this post CLP(FD) and CLP(B) constraints, because they are most commonly used for solving combinatorial tasks.
Consider first CLP(FD), which is more frequently used and more convenient in many ways. When using CLP(FD) constraints, you again have several options to represent your task. However, no matter which model you eventually choose, you must first switch all items in your representation to suitable entitites that the constraint solver can actually reason about. In the case of CLP(FD), this means switching your entities to integers.
Translating your entities to corresponding integers is very straight-forward, and it is one of the reasons why CLP(FD) constraints also suffice to model tasks over domains that actually do not contain integers, but can be mapped to integers. So, let us suppose you are not reasoning about features f1, f2 and f3, but about integers 0, 1, and 2, or any other set of integers that suits you.
You can directly translate your requirements to this new domain. For example, instead of:
[f1,s1] means: f1 needs s1
we can say for example:
0 -> 3 means: 0 needs 3
And this brings us already very close to CLP(FD) constraints that let us model the whole problem. We only need to make one more mental leap to obtain a representation that lets us model all requirements. Instead of concrete integers, we now use CLP(FD) variables to indicate whether or not a specific requirement must be met to obtain the desired features. We shall use the variables R1, R2, R3, ... to denote which requirements are needed, by using either 0 (not needed) or 1 (needed) for each of the possible requirements.
At this point, you must develop a clear mental model of what you actually want to describe. I explain what I have in mind: I want to describe a relation between three things:
a list Fs of features
a list Ds of dependencies between features and requirements
a list Rs of requirements
We have already considered how to represent all these entitites: (1) is a list of integers that represent the features we want to obtain. (2) is a list of F -> R pairs that mean "feature F needs requirement R", and (3) is a list of Boolean variables that indicate whether or not each requirement is eventually needed.
Now let us try to relate all these entitites to one another.
First things first: If no features are desired, it all is trivial:
features_dependencies_requirements([], _, _).
But what if a feature is actually desired? Well, it's simple: We only need to take into account the dependencies of that feature:
features_dependencies_requirements([F|Fs], Ds, Rs) :-
member(F->R, Ds),
so we have in R the requirement of feature F. Now we only need to find the suitable variable in Rs that denotes requirement R. But how do we find the right variable? After all, a Prolog variable "does not have a bow tie", or—to foreigners—lacks a mark by which we could distinguish it from others. So, at this point, we would actually find it convenient to be able to nicely pick a variable out of Rs given the name of its requirement. Let us hence suppose that we represent Rs as a list of pairs of the form I=R, where I is the integer that defines the requirement, and R is the Boolean indicator that denotes whether that requirement is needed. Given this representation, we can define the clause above in its entirety as follows:
features_dependencies_requirements([F|Fs], Ds, Rs) :-
member(F->I, Ds),
member(I=1, Rs),
features_dependencies_requirements(Fs, Ds, Rs).
That's it. This fully relates a list of features, dependencies and requirements in such a way that the third argument indicates which requirements are necessary to obtain the features.
At this point, the attentive reader will see that no CLP(FD) constraints whatsoever were actually used in the code above, and in fact the translation of features to integers was completely unnecessary. We can as well use atoms to denote features and requirements, using the exact same code shown above.
Sample query and answers:
?- features_dependencies_requirements([f3,f1],
[f1->s1,f2->s2,f3->s3,f3->s4],
[s1=S1,s2=S2,s3=S3,s4=S4]).
S1 = S3, S3 = 1 ;
S1 = S4, S4 = 1 ;
false.
Obviously, I have made the following assumption: The dependencies are disjunctive, which means that the feature can be implemented if at least one of the requirements is satisifed. If you want to turn this into a conjunction, you will obviously have to change this. You can start by representing dependencies as F -> [R1,R2,...R_n].
Other than that, can it still be useful to translate your entitites do integers? Yes, because many of your constraints can likely be formulated also with CLP(FD) constraints, and you need integers for this to work.
To get you started, here are two ways that may be usable in your case:
use constraint reification to express what implies what. For example: F #==> R.
use global constraints like table/2 that express relations.
Particularly in the first case, CLP(B) constraints may also be useful. You can always use Boolean variables to express whether a requirement must be met.
Not a solution but some observations that would not fit a comment.
Don't use lists to represent relations. For example, instead of [f1, s1], write requires(f1, s1). If these requirement are fixed, then define requires/2 as a predicate. If you need to identify or enumerate features, consider a feature/1 predicate. For example:
feature(f1).
feature(f2).
...
Same for s1, s2, ... E.g.
support(s1).
support(s2).
...
Often we would like to refactor a context-free grammar to remove left-recursion. There are numerous algorithms to implement such a transformation; for example here or here.
Such algorithms will restructure a grammar regardless of the presence of left-recursion. This has negative side-effects, such as producing different parse trees from the original grammar, possibly with different associativity. Ideally a grammar would only be transformed if it was absolutely necessary.
Is there an algorithm or tool to identify the presence of left recursion within a grammar? Ideally this might also classify subsets of production rules which contain left recursion.
There is a standard algorithm for identifying nullable non-terminals, which runs in time linear in the size of the grammar (see below). Once you've done that, you can construct the relation A potentially-starts-with B over all non-terminals A, B. (In fact, it's more normal to construct that relationship over all grammatical symbols, since it is also used to construct FIRST sets, but in this case we only need the projection onto non-terminals.)
Having done that, left-recursive non-terminals are all A such that A potentially-starts-with+ A, where potentially-starts-with+ is:
potentially-starts-with ∘ potentially-starts-with*
You can use any transitive closure algorithm to compute that relation.
For reference, to detect nullable non-terminals.
Remove all useless symbols.
Attach a pointer to every production, initially at the first position.
Put all the productions into a workqueue.
While possible, find a production to which one of the following applies:
If the left-hand-side of the production has been marked as an ε-non-terminal, discard the production.
If the token immediately to the right of the pointer is a terminal, discard the production.
If there is no token immediately to the right of the pointer (i.e., the pointer is at the end) mark the left-hand-side of the production as an ε-non-terminal and discard the production.
If the token immediately to the right of the pointer is a non-terminal which has been marked as an ε-non-terminal, advance the pointer one token to the right and return the production to the workqueue.
Once it is no longer possible to select a production from the work queue, all ε-non-terminals have been identified.
Just for fun, a trivial modification of the above algorithm can be used to do step 1. I'll leave it as an exercise (it's also an exercise in the dragon book). Also left as an exercise is the way to make sure the above algorithm executes in linear time.
XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of theset don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct -- and keeping it that way. For example,
empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. empty((0)) != not((0)). This applies to exists(s) vs. s in a boolean context as well. However, there are domains of s where empty(s) is equivalent to not(s), so the two could be used interchangeably within those domains. But this goes to show that the use of empty() can make a non-trivial difference in making code easier to understand.
Similarly, exists(s) is equivalent to boolean(s) that already existed in XPath 1.0 (or just s in a boolean context), but again is much clearer about the intent.
Quantified expressions; e.g. "some $x in expression satisfies test($x)" would be equivalent to boolean(expression[test(.)]) (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
Similarly, "every $x in expression satisfies test($x)" would be equivalent to not(expression[not(test(.))]) but is more readable.
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used X and S notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
X = S: This would be one of the most unreadable, since it requires the reader to
think about which of X and S are sequences vs. single values
understand general comparisons, which are not obvious from the syntax
However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps in readability.
See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
index-of(S, X): This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to index-of()). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since index-of() is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate X = S, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for index-of(S, X), correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using index-of() is that it's limited to using eq for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
Correction: This form, used as a conditional test, can result in a runtime error: Effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get wrong boolean values, since index-of() can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
exists(index-of(X, S)): makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
some $m in S satisfies $m eq X: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves -- they're not necessarily just variable references. An advantage is that since the eq operator is explicit, you can replace it with is or any other comparison operator.
S[. eq X]: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value), if X is falsy. E.g. (0, 1)[. eq 0] returns 0 which is falsy, even though 0 occurs in (0, 1).
exists(S[. eq X]): Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor smarts).
I'm kind of leaning toward the last one, at this point: exists(S[. eq X])
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed = to eq wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
For what it's worth, if names or context make clear that X is a singleton, I'm happy to use your first form, X = S -- for example when I want to check an attribute value against a set of possible values:
<xsl:when test="#type = ('A', 'A+', 'A-', 'B+')" />
or
<xsl:when test="#type = $magic-types"/>
If I think there is a risk of confusion, then I like your sixth formulation. The less frequently I have to remember the rules for calculating an effective boolean value, the less frequently I make a mistake with them.
I prefer this one:
count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
When $x is itself a sequence, this expression implements the (value-based) subset of relation between two sets of values, that are represented as sequences. This implementation of subset of has just linear time complexity -- vs many other ways of expressing this, that have O(N^2)) time complexity.
To summarize, the question whether a single value belongs to a set of values is a special case of the question whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it for answering the former.
The functx library has a nice implementation of this function, so you can use
functx:is-node-in-sequence($X, $Y)
(this particular function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html)
The whole functx library is available for both XQuery (http://www.xqueryfunctions.com/) and XSLT (http://www.xsltfunctions.com/)
Marklogic ships the functx library with their core product; other vendors may also.
Another possibility, when you want to know whether node X occurs in sequence S, is
exists((X) intersect S)
I think that's pretty readable, and concise. But it only works when X and the values in S are nodes; if you try to ask
exists(('bob') intersect ('alice', 'bob'))
you'll get a runtime error.
In the program I'm working on now, I need to compare strings, so this isn't an option.
As Dimitri notes, the occurrence of a node in a sequence is a question of identity, not of value comparison.
The application I'm working on is a "configurator" of sorts. It's written in C# and I even wrote a rules engine to go with it. The idea is that there are a bunch of propositional logic statements, and the user can make selections. Based on what they've selected, some other items become required or completely unavailable.
The propositional logic statements generally take the following forms:
A => ~X
ABC => ~(X+Y)
A+B => Q
A(~(B+C)) => ~Q A <=> B
The symbols:
=> -- Implication
<=> -- Material Equivalence
~ -- Not
+ -- Or
Two letters side-by-side -- And
I'm very new to Prolog, but it seems like it might be able to handle all of the "rules processing" for me, allowing me to get out of my current rules engine (it works, but it's not as fast or easy to maintain as I would like).
In addition, all of the available options fall in a hierarchy. For instance:
Outside
Color
Red
Blue
Green
Material
Wood
Metal
If an item at the second level (feature, such as Color) is implied, then an item at the third level (option, such as Red) must be selected. Similarly if we know that a feature is false, then all of the options under it are also false.
The catch is that every product has it's own set of rules. Is it a reasonable approach to set up a knowledge base containing these operators as predicates, then at runtime start buliding all of the rules for the product?
The way I imagine it might work would be to set up the idea of components, features, and options. Then set up the relationships between then (for instance, if the feature is false, then all of its options are false). At runtime, add the product's specific rules. Then pass all of the user's selections to a function, retrieving as output which items are true and which items are false.
I don't know all the implications of what I'm asking about, as I'm just getting into Prolog, but I'm trying to avoid going down a bad path and wasting lots of time in the process.
Some questions that might help target what I'm trying to find out:
Does this sound do-able?
Am I barking up the wrong tree?
Are there any drawbacks or concerns to trying to create all of these rules at runtime?
Is there a better system for this kind of thing out there that I might be able to squeeze into a C# app (Silverlight, to be exact)?
Are there other competing systems that I should examine?
Do you have any general advice about this sort of thing?
Thanks in advance for your advice!
Sure, but Prolog has a learning curve.
Rule-based inference is Prolog's game, though you may have to rewrite many rules into Horn clauses. A+B => Q is doable (it becomes q :- a. q :- b. or q :- (a;b).) but your other examples must be rewritten, including A => ~X.
Depends on your Prolog compiler, specifically whether it supports indexing for dynamic predicates.
Search around for terms like "forward checking", "inference engine" and "business rules". Various communities keep inventing different terminologies for this problem.
Constraint Handling Rules (CHR) is a logic programming language, implemented as a Prolog extension, that is much closer to rule-based inference/forward chaining/business rules engines. If you want to use it, you'll still have to learn basic Prolog, though.
Keep in mind that Prolog is a programming language, not a silver bullet for logical inference. It cuts some corners of first-order logic to keep things efficiently computable. This is why it only handles Horn clauses: they can be mapped one-to-one with procedures/subroutines.
You can also throw in DCGs to generate bill of materials. The idea is
roughly that terminals can be used to indicate subproducts, and
non-terminals to define more and more complex combinations of a subproducts
until you arrive at your final configurable products.
Take for example the two attribute value pairs Color in {red, blue, green}
and Material in {wood, metal}. These could specify a door knob, whereby
not all combinations are possible:
knob(red,wood) --> ['100101'].
knob(red,metal) --> ['100102'].
knob(blue,metal) --> ['100202'].
You could then define a door as:
door ... --> knob ..., panel ...
Interestingly you will not see any logic formula in such a product specification,
only facts and rules, and a lot of parameters passed around. You can use the
parameters in a knowledge acquisition component. By just running uninstantiated
goals you can derive possible values for the attribute value pairs. The predicate
setof/3 will sort and removen duplicates for you:
?- setof(Color,Material^Bill^knob(Color,Material,Bill,[]),Values).
Value = [blue, red]
?- setof(Material,Color^Bill^knob(Color,Material,Bill,[]),Values).
Material = [metal, wood]
Now you know the range of the attributes and you can let the end-user successively
pick an attribute and a value. Assume he takes the attribute Color and its value blue.
The range of the attribute Material then shrinks accordingly:
?- setof(Material,Bill^knob(blue,Material,Bill,[]),Values).
Material = [metal]
In the end when all attributes have been specified you can read off the article
numbers of the subproducts. You can use this for price calculation, by adding some
facts that give you additional information on the article numbers, or to generate
ordering lists etc..:
?- knob(blue,metal,Bill,[]).
Bill = ['100202']
Best Regards
P.S.:
Oh it seems that the bill of materials idea used in the product configurator
goes back to Clocksin & Mellish. At least I find a corresponding
comment here:
http://www.amzi.com/manuals/amzi/pro/ref_dcg.htm#DCGBillMaterials
I'm looking for an algorithm to help me build 2D patterns based on rules. The idea is that I could write a script using a given site of parameters, and it would return a random, 2-dimensional sequence up to a given length.
My plan is to use this to generate image patterns based on rules. Things like image fractals or sprites for game levels could possibly use this.
For example, lets say that you can use A, B, C, & D to create the pattern. The rule is that C and A can never be next to each other, and that D always follows C. Next, lets say I want a pattern of size 4x4. The result might be the following which respects all the rules.
A B C D
B B B B
C D B B
C D C D
Are there any existing libraries that can do calculations like this? Are there any mathematical formulas I can read-up on?
While pretty inefficient concering runtime, backtracking is an often used algorithm for such a problem.
It follows a simple pattern, and if written correctly, you can easily replace a rule set into it.
Define your rule data structures; i.e., define the set of operations that the rules can encapsulate, and define the available cross-referencing that can be done. Once you've done this, you should have a clearer view of what type of algorithms to use to apply these rules to a potential result set.
Supposing that your rules are restricted to "type X is allowed to have type Y immediately to its left/right/top/bottom" you potentially have situations where generating possible patterns is computationally difficult. Take a look at Wang Tiles (a good source is the book Tilings and Patterns by Grunbaum and Shephard) and you'll see that with the states sets of rules you might define sets of Wang Tiles. Appropriate sets of these are Turing Complete.
For small rectangles, or your sets of rules, this may only be of academic interest. As mentioned elsewhere a backtracking approach might be appropriate for your ruleset - in which case you may want to consider appropriate heuristics for the order in which new components are added to your grid. Again, depending on your rulesets, other approaches might work. E.g. if your ruleset admits many solutions you might get a long way by randomly allocating many items to the grid before attempting to fill in remaining gaps.