What does it mean to express this puzzle as a CSP? (Prolog)

What is meant by the following, for the attached image?
By labelling each cell with a variable, express the puzzle as a CSP. Hint:
recall that a CSP is composed of three parts.
I initially thought of just adding a variable like A, B, C, etc. to each cell and then constraining those cells, but I do not believe that is correct. I do not want the answer, just an explanation of what is required in terms of a CSP.

In my opinion, a CSP is best divided into two parts:
1. State the constraints. This is called the modeling part, or model.
2. Search for solutions using enumeration predicates like labeling/2.
These parts are best kept separate by using a predicate which we call core relation and which has the following properties:
It posts the constraints, i.e., it expresses part (1) above.
Its last argument is the list of variables that still need to be labeled.
By convention, its name ends with an underscore _.
Having this distinction in place allows you to:
try different search strategies without the need to recompile your code
reason about termination properties of the core relation in isolation from any concrete (and often very costly) search.
I can see how some instructors may decompose part (1) into:
1a. stating the domains of the variables, using for example in/2 constraints
1b. stating the other constraints that hold among the variables.
In my view, this distinction is artificial, because in/2 constraints are constraints like all other constraints in the modeling part. Some instructors may teach this separately for historical reasons, dating back to a time when CSP systems were not as dynamic as they are now.
Nowadays, you can typically post additional domain restrictions any time you like and freely mix in/2 constraints with other constraints in any order.
So, the parts that are expected from you are likely: (a) state in/2 constraints, (b) state further constraints and (c) use enumeration predicates to search for concrete solutions. It also appears that you already have the right idea about how to solve this concrete CSP with this method.
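As a minimal sketch of this structure (assuming SWI-Prolog with library(clpfd); the predicate name puzzle_/1, the three-cell layout, and the 1..9 domain are placeholders, since the actual puzzle is not shown):

```prolog
:- use_module(library(clpfd)).

% Core relation: posts domains and constraints only, no search.
% Replace the domain and constraints with those of your actual puzzle.
puzzle_(Vs) :-
        Vs = [A,B,C],          % one variable per cell
        Vs ins 1..9,           % (a) state the domains with in/2 (ins/2 for lists)
        all_distinct(Vs),      % (b) state the other constraints
        A + B #= C.            % illustrative further constraint

% (c) search is kept separate:
% ?- puzzle_(Vs), labeling([], Vs).
```

Because labeling is outside the core relation, you can swap labeling options (e.g. `labeling([ff], Vs)`) without touching the model.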

Related

Sudoku as CSP (arc consistency)

For a study assignment I've recreated Norvig's algorithm in C# to solve Sudokus as a Constraint Satisfaction Problem (CSP), combined with local search using the number of possible values for a square as a heuristic. Now I need to create an extension or variant of it, and I'm kind of confused about to what degree the algorithm ensures arc consistency. What the current algorithm basically does for this is:
Initialize the possible values (domains) of each square as [1,...,n*n].
Each assignment of a value to a square is done by eliminating every other value from its domain and updating every peer (square in the same subgrid/row/column) by removing the assigned value from its domain. (Doesn't this fully ensure arc consistency, since the only constraint between peers is that they may not have the same value?)
When eliminating a value from a square's domain it also checks whether there's only 1 square left for this value in its unit. If so, it assigns it to that square (also by eliminating possible values, reducing it to just that value).
Now my question is: does this algorithm ensure complete arc consistency? And if not, how could I improve my CSP algorithm in this respect?
If anyone could help me out on this it'd be much appreciated!
Thanks in advance.
Best regards.
I am surprised you added local search, as Sudoku is trivially solved in CP (usually without any branching). Anyway, "arc consistency" may have three different meanings:
Establishing arc consistency over a constraint network: roughly means you call the filtering algorithm of your constraints until reaching a fix point. This is done by all solvers by default. People using this term usually assume that each constraint has its own arc-consistency algorithm (see next point), which is quite true for binary constraints but usually wrong in the general case (and real life problems).
Establishing arc consistency for a constraint: roughly means removing from each variable all values that belong to no solution OF THAT CONSTRAINT (regardless of the rest of the model). It depends on the filtering algorithm you use for the constraint (you can have many, with different tradeoffs between filtering power and runtime).
Establishing arc consistency on a problem: imagine you model your entire problem using one custom global constraint, then apply previous definition.
So, do you establish AC on the entire problem? That is, do all unfiltered variable/value assignments belong to a solution? From what you describe, the answer is no.
Do you establish AC on each of your constraints? Well, this depends on your model. If you only use binary constraints to state your problem, then I would say yes. If you want to improve filtering, you should use global constraints, such as AllDifferent. The arc-consistent filtering algorithm of this constraint is more complex than what you describe, but it is also more powerful!
You can have a look at this example that uses the Choco Solver.
You can also use different consistency levels (such as bound consistency).
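To illustrate the filtering gap the answer describes, here is a hedged sketch in Prolog with library(clpfd) (the answer's example uses Choco, but the idea carries over): pairwise disequalities only prune once a variable becomes fixed, whereas a global all_distinct/1 constraint can detect the conflict at posting time.

```prolog
:- use_module(library(clpfd)).

% Three variables that must be pairwise different, each in 1..2.
% The pairwise #\= version does not detect the conflict until search,
% but all_distinct/1 fails immediately: three pairwise-distinct
% variables cannot fit into only two values.

weak([X,Y,Z])   :- [X,Y,Z] ins 1..2, X #\= Y, X #\= Z, Y #\= Z.
strong([X,Y,Z]) :- [X,Y,Z] ins 1..2, all_distinct([X,Y,Z]).

% ?- weak(Vs).     % succeeds, leaving the conflict undetected
% ?- strong(Vs).   % fails at posting time
```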

School Scheduling (Constrained Logic)

I'm trying to build a school scheduling program in Prolog. I want to check if a teacher is available at a given time to teach a given class; check allowable time slots; etc.
Here's what I've been able to write so far:
teacher(ali, bio).
teacher(sara, math).
teacher(john, lit).
teacher(milton, arabic).
% a, b, c, d, e, f, g
timeslot(a).
timeslot(b).
% class has a name and a grade
class(bio, 1).
class(math, 1).
class(lit, 2).
class(arabic, 2).
How do I establish that a class cannot have two timeslots?
I have used a little bit of Prolog, but I'm not sure how to go about this. Any further tips and indications, like papers or similar problems that are solved more frequently, would be appreciated.
The wording of the Question suggests that a program is to be written that will produce (or at least check) a proposed class schedule.
Inputs to the program appear to be a list of teachers (and their subjects), a list of time slots, and a list of classes (and their subjects/grades).
Presumably there are several "cardinality" restrictions (sometimes called "business rules") that a proper class schedule must meet. That a class can only be given once (not in two time slots) is the point of the Question, but also a teacher can only teach one class per time slot, etc.
How can these restrictions be indicated? Prolog predicates do not have inherent restrictions of this kind, but they can be implemented either structurally or logically (i.e. in the program's logical checking).
An example of doing things in a structural way would be adding a field to the class predicate to represent the assigned timeslot. Some logic would be involved in how this field is assigned, to ensure that its value is a valid time slot.
An example of doing the relationship between classes and time slots in a logical fashion would be to define an additional predicate that models the assignment of time slots to classes (presumably something similar applies to assigning classes to teachers). You would have, as illustration, predicate class_timeslot(Class,Timeslot). The rules of your program would enforce the uniqueness of one instance of these (dynamically asserted) facts per Class instance, and the validity of the Timeslot value.
Alternatively, instead of dynamic facts, the class schedule could be constructed as a list of structures similarly pairing classes and time slots. But the point is that program logic needs to implement that this pairing is a functional relationship.
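A minimal sketch of that "list of pairings" idea (the predicate names valid_schedule/1 and schedule_classes/2 are ours, and the timeslot facts are copied from the Question):

```prolog
% A schedule is modeled as a list of Class-Timeslot pairs.
% valid_schedule/1 enforces that each class appears at most once
% (no class gets two timeslots) and every slot is a known timeslot.

timeslot(a).
timeslot(b).

valid_schedule(Schedule) :-
        schedule_classes(Schedule, Classes),
        sort(Classes, Sorted),      % sort/2 removes duplicates
        length(Classes, N),
        length(Sorted, N),          % same length => no class twice
        forall(member(_-Slot, Schedule), timeslot(Slot)).

schedule_classes([], []).
schedule_classes([Class-_|Pairs], [Class|Classes]) :-
        schedule_classes(Pairs, Classes).

% ?- valid_schedule([bio-a, math-b]).   % true
% ?- valid_schedule([bio-a, bio-b]).    % false: bio has two slots
```

The same pattern extends to the teacher rule: collect each teacher's assigned slots and require them to be pairwise different.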
I wrote a scheduling program for an assessment center two years ago and used clp(fd) for it, because in plain SWI-Prolog it would be much more complicated. The problem scales exponentially with complexity, so for a real school with lots of teachers, lessons, etc., this will not really be efficient without constraint programming.
Please have a look at clp(fd) on the SWI-Prolog website.
Kind regards
solick

Method for runtime comparison of two programs' objects

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?
In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.
The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):
For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
Let Na = the number of unique object names encountered in A, and similarly Nb for the number of objects in B.
Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
Find discrepancies between A and B based on where the sequences of hash values diverge, for any objects with divergent sequences.
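As a small illustration of steps 5-6, a hedged Prolog sketch (the predicate names are made up): walk two traces of Name-Hash entries in parallel and report the first position where they diverge.

```prolog
% first_divergence(+TraceA, +TraceB, -Index, -EntryA, -EntryB)
% Traces are lists of Name-Hash pairs, one entry per trigger.
% Fails if the traces agree at every position.
first_divergence(TraceA, TraceB, Index, EntryA, EntryB) :-
        first_divergence_(TraceA, TraceB, 1, Index, EntryA, EntryB).

first_divergence_([A|As], [B|Bs], I, Index, EntryA, EntryB) :-
        (   A == B
        ->  I1 is I + 1,
            first_divergence_(As, Bs, I1, Index, EntryA, EntryB)
        ;   Index = I, EntryA = A, EntryB = B
        ).

% ?- first_divergence([x-17,y-4,z-9], [x-17,y-4,z-8], I, A, B).
% I = 3, A = z-9, B = z-8.
```

To tolerate numerical-precision noise, the `A == B` test could be replaced by a comparison that treats near-equal stored values (sizes, rounded numbers) as matching.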
Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge, though the impact is insignificant if the discrepancies are approximately at the machine tolerance level.
First: What is a name for such types of testing methods and concepts? An answer need not necessarily be the method above, but reflects the class of methods for comparing objects from two (or more) different programs.
Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects; after all, two objects cannot be the same if they are massively different in size.
In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.
Edit 1: This paper is related in terms of comparing execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.
Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.
Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.
(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)
The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.
[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]
Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.

Efficient Mutable Graph Representation in Prolog?

I would like to represent a mutable graph in Prolog in an efficient manner. I will be searching for subsets of the graph and replacing them with other subsets.
I've managed to get something working using the database as my 'graph storage'. For instance, I have:
:- dynamic step/2.
% step(Type, Name).
:- dynamic sequence/2.
% sequence(Step, NextStep).
I then use a few rules to retract subsets I've matched and replace them with new steps using assert. I'm really liking this method... it's easy to read and deal with, and I let Prolog do a lot of the heavy pattern-matching work.
The other way I know to represent graphs is using lists of nodes and adjacency connections. I've seen plenty of websites using this method, but I'm a bit hesitant because it's more overhead.
Execution time is important to me, as is ease-of-development for myself.
What are the pros/cons for either approach?
As usual: using the dynamic database gives you indexing, which may speed things up (on look-up) and slow you down (on asserting). In general, the dynamic database is not so good when you assert more often than you look up. The main drawback, though, is that it also significantly complicates testing and debugging, because you cannot test your predicates in isolation and need to keep the current implicit state of the database in mind.
Lists of nodes and adjacency connections are a good representation in many cases.
A different representation I like a lot, especially if you need to store further attributes for nodes and edges, is to use one variable for each node, and use variable attributes (get_attr/3 and put_attr/3 in SWI-Prolog) to store edges on them, for example [edge_to(E1,N_1),edge_to(E2,N_2),...], where N_i are the variables representing other nodes (with their own attributes), and E_j are also variables onto which you can attach further attributes to store additional information (weight, capacity, etc.) about each edge if needed.
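A minimal sketch of that attributed-variable representation (SWI-Prolog; the module name graph, the attribute layout, and the predicate names are illustrative choices, not a fixed API):

```prolog
:- module(graph, []).

% The module that owns an attribute must define attr_unify_hook/2;
% here we simply forbid unifying node variables with anything.
attr_unify_hook(_, _) :- fail.

% Connect node variables N1 and N2 with an edge variable E,
% prepending to N1's existing edge list if there is one.
add_edge(N1, N2, E) :-
        (   get_attr(N1, graph, Es) -> true ; Es = [] ),
        put_attr(N1, graph, [edge_to(E,N2)|Es]).

% Attach further information (e.g. a weight) to an edge variable.
edge_weight(E, W) :-
        put_attr(E, graph, weight(W)).

% ?- add_edge(N1, N2, E), edge_weight(E, 3),
%    get_attr(N1, graph, Edges).
```

Mutation is then just put_attr/3 on the affected node variables, with no assert/retract involved.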
Have you considered using SWI-Prolog's RDF database? http://www.swi-prolog.org/pldoc/package/semweb.html
As mat said, dynamic predicates have an extra cost.
If, however, you can construct the graph once and then don't need to change it, you can compile the predicate and it will be as fast as a normal predicate.
Usually, in SWI-Prolog, predicate lookup is done using hash tables on the first argument (these are resized in the case of dynamic predicates).
Another solution is association lists, where the cost of lookup etc. is O(log N).
After you understand how they work, you could easily write an interface if needed.
In the end, you can always use an SQL database and use the ODBC interface to submit queries (although that sounds like overkill for the application you mentioned).
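For the association-list option, a sketch using SWI-Prolog's library(assoc), mapping each node to its adjacency list (edges_graph/2 and the a-b edge notation are our own conventions):

```prolog
:- use_module(library(assoc)).

% Build an adjacency map from a list of From-To edges.
edges_graph(Edges, Graph) :-
        empty_assoc(Empty),
        foldl(add_edge_, Edges, Empty, Graph).

add_edge_(From-To, G0, G) :-
        (   get_assoc(From, G0, Ns) -> true ; Ns = [] ),
        put_assoc(From, G0, [To|Ns], G).

% O(log N) neighbour lookup:
% ?- edges_graph([a-b, a-c, b-c], G), get_assoc(a, G, Ns).
% Ns = [c, b].
```

Because assocs are persistent (put_assoc/4 returns a new assoc), "mutation" means threading the updated graph through your predicates, which keeps them easy to test in isolation.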

prolog problem; displaying same rank universities

Sorry for the blunt title, but I couldn't really generalize the question. I hope someone with Prolog experience can help me out here. I've got a database which basically lists universities and their rank, e.g. oxford(1), warwick(2), etc. The question requires me to write a rule that returns all the names of the universities that have the same rank. Thanks in advance.
I believe this is going to require a bit of meta-programming, but only a little bit. You are probably going to have to provide some feedback about my assumptions in this answer in order to get a robust solution. But I think jumping in will get you there faster (with both of us learning something along the way) than asking a sequence of clarifying comments.
Our immediate goal will be to find these "university" facts through what SWI-Prolog calls "Examining the program" (links below, but you could search for it as a section of the Manual). If we can do this, we can query those facts to get a particular rank, thus obtaining all universities of the same rank.
From what you've said, there are a number of "facts" of the form "UNIVERSITY(RANK)." Typically if you consult a file containing these from SWI-Prolog, they will be dynamic predicates and (unless you've done something explicit to avoid it) added to the [user] module. Such a database is often called a "factbase". Facts here mean clauses with only a head (no body); dynamic predicates can in general have clauses with or without bodies.
SWI-Prolog has three different database mechanisms. The one we are discussing is the clause database that is manipulated through not only consulting but also by the assert/retract meta-predicates. We will refer to these as the "dynamic" predicates.
Here's a modification of a snippet of code that Jan Wielemaker provides for generating (through backtracking) all the built-in predicates, now repurposed to generate the dynamic predicates:
generate_dynamic(Name/Arity) :-
predicate_property(user:Head, dynamic),
functor(Head, Name, Arity). % get arity: number of args
In your case you are only interested in certain dynamic predicates, so this may return too much in the way of results. One way to narrow things down is by setting Arity = 1, since your university facts only consist of predicates with a single argument.
Another way to narrow things down is by the absence of a body. If this check is needed, we can incorporate a call to clause/2 (documented on the same page linked above). If we have a "fact" (clause without a body), then the resulting call to clause/2 returns the second argument (Body) set to the atom true.
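Putting those two refinements together, a sketch (SWI-Prolog; the predicate name university_fact/1 is ours):

```prolog
% Enumerate dynamic, unary facts (clauses whose body is just 'true')
% in the user module; Name/1 then identifies a candidate
% university fact such as oxford/1.
university_fact(Name/1) :-
        predicate_property(user:Head, dynamic),
        functor(Head, Name, 1),       % arity 1 only
        clause(user:Head, true).      % a fact: body is 'true'

% ?- university_fact(F).
% backtracks through oxford/1, warwick/1, ... (and any other
% dynamic unary facts that happen to be loaded).
```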
As a final note, Jan's website uses SWI-Prolog to deliver its pages, but the resulting links don't always cut-and-paste well. If the link I gave above doesn't work for you, then you can either navigate to Sec. 4.14 of the Manual yourself, or try this link to a mirrored copy of the documentation that appears not-quite current (cf. difference in section numbering and absence of Jan's code snippet).
And feel free to ask questions if I've said something that needs clarification or assumed something that doesn't apply to your setup.
Added: Let's finish the answer by showing how to query a list of universities, whether given as such or derived from the "factbase" as outlined above. Then we have a few comments about design and learning at the end.
Suppose LU = [oxford,warwick,...] is in hand, a list of all possible universities. Apart from efficiency, we may not even care if a few things that are not universities or are not ranked are on the list, depending on the nature of the query you want to do.
listUniversityRank(LU,V,R) :- % LU: list of universities
    member(V,LU),
    call(V,R). % call/2 invokes e.g. oxford(R)
The above snippet defines a predicate listUniversityRank/3 that we would provide with a list of universities, and which would in turn call a dynamically constructed goal on each member of the list to find its rank. Such a predicate can be used in several ways to accomplish your objective of finding "all the names of the universities that have the same rank."
For instance, we might want to ask for a specific rank R=1 what universities share that rank. Calling listUniversityRank(LU,V,R) with R bound to 1 would accomplish that, at least in the sense that it would backtrack through all such university names. If you wanted to gather these names into a list, then you could use findall/3.
For that matter you might want to begin listing "all the names of the universities that have the same rank" by making a list of all possible ranks, using setof/3 to collect the solutions for R in listUniversityRank(LU,_,R). setof is similar to findall but sorts the results and eliminates duplicates.
Now let's look back and think about how hard we are working to accomplish the stated aim, and what might be a design that makes life easier for that purpose. We want a list of university names with a certain property (all have the same rank). It would have been easier if we had the list of university names to start with. As Little Bobby Tables points out in one of the Comments on the Question, we have a tough time telling what is and isn't a university if there are facts like foo(3) in our program.
But something more is going on here. Using the university names to create the "facts", a different predicate for each university, obscures the relationship between university and rank that we would like to query. If we had it to do over again, surely we'd rather represent this relationship with a single two-argument predicate, say universityRank/2, that directly connects each university name with the corresponding rank. Fewer predicates, better design (because it is more easily queried, without fancy meta-programming).
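With that two-argument design, the original task becomes a one-liner; a hedged sketch (the universityRank/2 facts and same_rank/2 name are illustrative):

```prolog
% Hypothetical two-argument factbase:
universityRank(oxford,  1).
universityRank(harvard, 1).
universityRank(warwick, 2).

% All universities sharing a given rank, collected and sorted:
same_rank(Rank, Universities) :-
        setof(U, universityRank(U, Rank), Universities).

% ?- same_rank(1, Us).
% Us = [harvard, oxford].
```

Backtracking over `same_rank(Rank, Us)` then enumerates each rank with its group of universities, with no meta-programming required.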
