Maybe someone can help me to solve a problem with Prolog or any constraint programming language. Imagine a table of projects (school projects where pupils do something with their mothers). Each project has one or more children participating. For each child we store its name and the name of its mother. But for each project there is only one cell that contains all mothers and one cell that contains all children. Both cells are not necessarily ordered in the same way.
Example:
+-----------+-----------+------------+
| | | |
| Project | Parents | Children |
| | | |
+-----------+-----------+------------+
| | | |
| 1 | Jane; | Brian; |
| | Claire | Stephen |
| | | |
+-----------+-----------+------------+
| | | |
| 2 | Claire; | Emma; |
| | Jane | William |
| | | |
+-----------+-----------+------------+
| | | |
| 3 | Jane; | William; |
| | Claire | James |
| | | |
+-----------+-----------+------------+
| | | |
| 4 | Jane; | Brian; |
| | Sophia; | James; |
| | Claire | Isabella |
| | | |
+-----------+-----------+------------+
| | | |
| 4 | Claire | Brian |
| | | |
+-----------+-----------+------------+
| | | |
| 5 | Jane | Emma |
| | | |
+-----------+-----------+------------+
I hope this example visualizes the problem. As I said both cells only contain the names separated by a delimiter, but are not necessarily ordered in a similar way. So for technical applications you would transform the data into this:
+-------------+-----------+----------+
| Project | Name | Role |
+-------------+-----------+----------+
| 1 | Jane | Mother |
+-------------+-----------+----------+
| 1 | Claire | Mother |
+-------------+-----------+----------+
| 1 | Brian | Child |
+-------------+-----------+----------+
| 1 | Stephen | Child |
+-------------+-----------+----------+
| 2 | Jane | Mother |
+-------------+-----------+----------+
| 2 | Claire | Mother |
+-------------+-----------+----------+
| 2 | Emma | Child |
+-------------+-----------+----------+
| 2 | William | Child |
+-------------+-----------+----------+
| | | |
| |
| And so on |
The number of parents and children is equal for each project. So for each deal we have n mothers and n children and each mother belongs to exactly one child. With these constraints it is possible to assign each mother to all of her children by logical inference starting with the projects that involve only one child (i.e. 4 and 5).
Results:
Jane has Emma, Stephen and James;
Claire has Brian and William;
Sophia has Isabella
I am wondering how this can be solved using constraint programming. Additionally, the data set might be underdetermined and I am wondering if it is possible to isolate records that, when solved manually (i.e. when the mother-child assignments are done manually), would break the underdetermination.
I'm not sure if I understand all the requirements of the problem, but here is a constraint programming model in MiniZinc (http://www.minizinc.org/). The full model is here: http://hakank.org/minizinc/one_to_many.mzn .
LATER NOTE: The first version of the project constraints where not correct. I have removed the incorrect code . See the edit history for the original answer.
enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};
% decision variables
% who is the mother of this child?
array[children] of var mothers: x;
solve satisfy;
constraint
% All mothers has at least one child
forall(m in mothers) (
exists(c in children) (
x[c] = m
)
)
;
constraint
% NOTE: This is a more correct version of the project constraints.
% project 1
(
( x[brian] = jane /\ x[stephen] = claire) \/
( x[stephen] = jane /\ x[brian] = claire)
)
/\
% project 2
(
( x[emma] = claire /\ x[william] = jane) \/
( x[william] = claire /\ x[emma] = jane)
)
/\
% project 3
(
( x[william] = claire /\ x[james] = jane) \/
( x[james] = claire /\ x[william] = jane)
)
/\
% project 4
(
( x[brian] = jane /\ x[james] = sophia /\ x[isabella] = claire) \/
( x[james] = jane /\ x[brian] = sophia /\ x[isabella] = claire) \/
( x[james] = jane /\ x[isabella] = sophia /\ x[brian] = claire) \/
( x[brian] = jane /\ x[isabella] = sophia /\ x[james] = claire) \/
( x[isabella] = jane /\ x[brian] = sophia /\ x[james] = claire) \/
( x[isabella] = jane /\ x[james] = sophia /\ x[brian] = claire)
)
/\
% project 4(sic!)
( x[brian] = claire) /\
% project 5
( x[emma] = jane)
;
output [
"\(c): \(x[c])\n"
| c in children
];
The unique solution is
brian: claire
stephen: jane
emma: jane
william: claire
james: jane
isabella: sophia
Edit2: Here is a more general solution. See http://hakank.org/minizinc/one_to_many.mzn for the complete model.
include "globals.mzn";
enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};
% decision variables
% who is the mother of this child?
array[children] of var mothers: x;
% combine all the combinations of mothers and children in a project
predicate check(array[int] of mothers: mm, array[int] of children: cc) =
let {
int: n = length(mm);
array[1..n] of var 1..n: y;
} in
all_different(y) /\
forall(i in 1..n) (
x[cc[i]] = mm[y[i]]
)
;
solve satisfy;
constraint
% All mothers has at least one child.
forall(m in mothers) (
exists(c in children) (
x[c] = m
)
)
;
constraint
% project 1
check([jane,claire], [brian,stephen]) /\
% project 2
check([claire,jane],[emma,william]) /\
% project 3
check([claire,jane],[william,james]) /\
% project 4
check([claire,sophia,jane],[brian,james,isabella]) /\
% project 4(sic!)
check([claire],[brian]) /\
% project 5
check([jane],[emma])
;
output [
"\(c): \(x[c])\n"
| c in children
];
This model use the following predicate to ensure that all the combinations of mothers vs children are considered:
predicate check(array[int] of mothers: mm, array[int] of children: cc) =
let {
int: n = length(mm);
array[1..n] of var 1..n: y;
} in
all_different(y) /\
forall(i in 1..n) (
x[cc[i]] = mm[y[i]]
)
;
It use the global constraint all_different(y) to ensure that mm[y[i]] is one of the mothers in mm, and then assign the `i'th child to that specific mother.
A bit off topic, but since from SWI-Prolog manual:
Plain Prolog can be regarded as CLP(H), where H stands for Herbrand
terms. Over this domain, =/2 and dif/2 are the most important
constraints that express, respectively, equality and disequality of
terms.
I feel authorized to suggest a Prolog solution, more general than the algorithm you suggested (progressively reduce relations based on single to single relations):
solve2(Projects,ParentsChildren) :-
foldl([_-Ps-Cs,L,L1]>>try_links(Ps,Cs,L,L1),Projects,[],ChildrenParent),
transpose_pairs(ChildrenParent,ParentsChildrenFlat),
group_pairs_by_key(ParentsChildrenFlat,ParentsChildren).
try_links([],[],Linked,Linked).
try_links(Ps,Cs,Linked,Linked2) :-
select(P,Ps,Ps1),
select(C,Cs,Cs1),
link(C,P,Linked,Linked1),
try_links(Ps1,Cs1,Linked1,Linked2).
link(C,P,Assigned,Assigned1) :-
( memberchk(C-Q,Assigned)
-> P==Q,
Assigned1=Assigned
; Assigned1=[C-P|Assigned]
).
This accepts data in a natural format, like
data(1,
[1-[jane,claire]-[brian,stephen]
,2-[claire,jane]-[emma,william]
,3-[jane,claire]-[william,james]
,4-[jane,sophia,claire]-[brian,james,isabella]
,5-[claire]-[brian]
,6-[jane]-[emma]
]).
data(2,
[1-[jane,claire]-[brian,stephen]
,2-[claire,jane]-[emma,william]
,3-[jane,claire]-[william,james]
,4-[jane,sophia,claire]-[brian,james,isabella]
,5-[claire]-[brian]
,6-[jane]-[emma]
,7-[sally,sandy]-[grace,miriam]
]).
?- data(2,Ps),solve2(Ps,S).
Ps = [1-[jane, claire]-[brian, stephen], 2-[claire, jane]-[emma, william], 3-[jane, claire]-[william, james], 4-[jane, sophia, claire]-[brian, james, isabella], 5-[claire]-[brian], 6-[jane]-[emma], 7-[...|...]-[grace|...]],
S = [claire-[william, brian], jane-[james, emma, stephen], sally-[grace], sandy-[miriam], sophia-[isabella]].
This is my first CHR program, so I hope that someone will come and give me some advice on how to improve it.
My thinking is that you need to expand all the lists into facts. From there, if you know that a project has just one parent and one child, you can establish the parent relationship from that. Also, once you have a parent-child relationship, you can remove that set from the other facts in the other projects and reduce the cardinality of the problem by one. Eventually you will have figured out everything you can. The only difference between a completely determined dataset and an incompletely determined one is in how far that reduction can go. If it doesn't quite get there, it will leave around some facts so you can see which projects/parents/children are still creating ambiguity.
:- use_module(library(chr)).
:- chr_constraint project/3, project_parent/2, project_child/2,
project_parents/2, project_children/2, project_size/2, parent/2.
%% turn a project into a fact about its size plus
%% facts for each parent and child in this project
project(N, Parents, Children) <=>
length(Parents, Len),
project_size(N, Len),
project_parents(N, Parents),
project_children(N, Children).
%% expand the list of parents for this project into a fact per parent per project
project_parents(_, []) <=> true.
project_parents(N, [Parent|Parents]) <=>
project_parent(N, Parent),
project_parents(N, Parents).
%% same for the children
project_children(_, []) <=> true.
project_children(N, [Child|Children]) <=>
project_child(N, Child),
project_children(N, Children).
%% a single parent-child combo on a project is exactly what we need
one_parent # project_size(Project, 1),
project_parent(Project, Parent),
project_child(Project, Child) <=>
parent(Parent, Child).
%% if I have a parent relationship for project of size N,
%% remove this parent and child from the project and decrease
%% the number of parents and children by one
parent_det # parent(Parent, Child) \ project_size(Project, N),
project_parent(Project, Parent),
project_child(Project, Child) <=>
succ(N0, N),
project_size(Project, N0).
I ran this with your example by making a main/0 predicate to do it:
main :-
project(1, [jane, claire], [brian, stephen]),
project(2, [claire, jane], [emma, william]),
project(3, [jane, claire], [william, james]),
project(4, [jane, sophia, claire], [brian, james, isabella]),
project(5, [claire], [brian]),
project(6, [jane], [emma]).
This outputs:
parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).
To demonstrate incomplete determination, I added a seventh project:
project(7, [sally,sandy], [grace,miriam]).
The program then outputs this:
project_parent(7, sandy),
project_parent(7, sally),
project_child(7, miriam),
project_child(7, grace),
project_size(7, 2),
parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).
As you can see, any project_size/2 that remains tells you the cardinality of what remains to be solved (project seven has two parent/children relationships still remaining to be determined) and you get back exactly the parents/children that remain to be handled, as well as all of the parent/2 relations which could be determined.
I'm pretty happy with this outcome but hopefully others can come and improve my code!
Edit: my code has a shortcoming which was identified on the mailing list, that certain inputs will fail to converge even though the solution can be computed, for instance:
project(1,[jane,claire],[brian, stephan]),
project(2,[jane,emma],[stephan, jones]).
For more information, see Ian's solution, which uses set intersection to determine the mapping.
Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?
UPDATE: This is what I could come up with so far:
^[_a-z][a-zA-Z0-9_]+$
Does this seem right?
Identifiers are pretty straightforward. They begin with letters or an underscore, and contain letters, underscore and numbers. Local variables can't (or shouldn't?) begin with an uppercase letter, so you could just use a regex like this.
/^[a-z_][a-zA-Z_0-9]*$/
It's possible for variable names to be unicode letters, in which case most of the existing regexes don't match.
varname = "\u2211" # => "∑"
eval(varname + '= "Tony the Pony"') => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil
See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at Fun with Unicode.
According to http://rubylearning.com/satishtalim/ruby_names.html a Ruby variable consists of:
A name is an uppercase letter,
lowercase letter, or an underscore
("_"), followed by Name characters
(this is any combination of upper- and
lowercase letters, underscore and
digits).
In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.
A regular expression to match all that would be:
%r{
(\$|#{1,2})? # optional leading punctuation
[A-Za-z_] # at least one upper case, lower case, or underscore
[A-Za-z0-9_]* # optional characters (including digits)
}x
Hope that helps.
I like #aboutruby's answer, but just to complete it, here's the equivalent using POSIX bracket expressions.
/^[_[:lower:]][_[:alnum:]]*$/
Or, since a-z is actually shorter than [:lower:]:
/^[_a-z][_[:alnum:]]*$/
I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...
It depends on whether you want to match method names as well.
If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.
i was trying to figure one out for a rails patch, and Matthew Draper wrote this one, using the ruby parser as a reference:
/\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/
And here it is, straight from the horse's mouth. (The horse in this case is the Draft ISO Ruby Specification):
local-variable-identifier → ( lowercase-character | _ ) identifier-character *
identifier-character → lowercase-character | uppercase-character | decimal-digit | _
uppercase-character → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
lowercase-character → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
decimal-digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In Ruby 1.9, using named groups, you can translate this literally:
local_variable_identifier = %r{
(?<uppercase_character> A | B | C | D | E | F | G | H | I | J | K | L | M
| N | O | P | Q | R | S | T | U | V | W | X | Y | Z
){0}
(?<lowercase_character> a | b | c | d | e | f | g | h | i | j | k | l | m
| n | o | p | q | r | s | t | u | v | w | x | y | z
){0}
(?<decimal_digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9){0}
(?<identifier_character> \g<lowercase_character>
| \g<uppercase_character>
| \g<decimal_digit>
| _
){0}
( \g<lowercase_character> | _ ) \g<identifier_character>*
}x
Of course, this is not how you would really write it.