Understanding universal restriction of a concept in Description Logic (DL) - first-order-logic

I am trying to understand the following Passage from a DL tutorial in terms of First Order Logic (FOL).
Passage
To represent the set of individuals all of whose children are female, we use the universal restriction
∀parentOf.Female (16)
It is a common error to forget that (16) also includes those individuals that have no children at all.
I take (16) to mean "if an individual has children, then those children are all female". My FOL representation of (16) is:
∀x∀y(parentOf(x,y) → Female(y)) (1)
My rational for this translation is that the implicit variable x represents the set of individuals being defined by the role parentOf. I assume x is universally quantified. The variable y represents female children, which I believe is called the successor of x in DL terminology, it is explicitly universally quantified in DL.
My FOL representation of "individuals that have no children at all" in FOL is:
∀x∀y ¬(parentOf(x,y)) (2)
My interpretation of the Passage, in FOL terms, is that if (2) holds then (1) holds. This is because the antecedent of (1) is false in this case.
Is my interpretation of the Passage correct?
Are my translations correct?

About your formula (1)
If you say
∀x∀y(parentOf(x,y) → Female(y))
or, equivalently
∀y((∃x parentOf(x,y)) → Female(y))
you mean that every x can only have female children. But for stating this in DL you need concept inclusion, that is:
⊤ ⊑ ∀parentOf.Female
that means "the range of role parentOf is included in concept Female".
Concept and role inclusions are intensional assertions, i.e., axioms specifying general properties of DL constructs.
Instead, DL's restrictions are not assertions, but constructs like concepts. So they are not used to state properties that are valid for every individual of the ontology. Like, when you say C⊓D, you are not stating that all the individuals of your ontology are instances of C and D, but you are simply "collecting" only the individuals that are instances of C and D at the same time.
Therefore, the formula ∀parentOf.Female just wants to "catch" all the x such that, if x is parent of y, then y is Female. More formally, its semantics is the following:
{x|∀y (parentOf(x,y) → Female(y))}
About your formula (2)
Analogously, the semantics of "individuals that have no children at all" is a set of individuals too:
{x|∀y ¬parentOf(x,y)}
or equivalently
{x|¬∃y parentOf(x,y)}
Indeed, you are collecting all the individuals that have no children, and not stating that all the individuals have no children.
Conclusion
You say right: "if (2) holds then (1) holds". The point is that neither (2) nor (1) (necessarily) hold.
Since you are working with sets, you should not reason in terms of logical inference but of set inclusion.
Therefore, the correct interpretation of the Passage is not
if ∀x∀y ¬(parentOf(x,y)) then ∀x∀y(parentOf(x,y) → Female(y))
but
{x|∀y ¬parentOf(x,y)} is a subset of {x|∀y (parentOf(x,y) → Female(y))}

Related

Translate sentence into FOL expression, confused about constants and quantifiers

Translate the following statements into FOL sentances
1) Alex likes John
Likes(alex, john) - I know this is correct
2) Each person is either a man or woman
AxAy( man(x) v woman(y) )
EDIT: Is this better??: Az(Person(z) -> man(x) v woman(y))
OR EDIT: Is this better??: Ax(Person(x) -> man(x) v woman(x))
3) No one is both man and woman
Ex( (man(x) ^ ¬woman(x)) v (¬man(x) ^ woman(x)) )
4) Alex likes a man who likes a woman
AxEy(Likes( man(x), woman(y) ) -> Likes(alex, man(x) ))
Thanks
Here is a screenshot of the background info
EDIT: For number 3, I have found this online
"The exclusive disjunction of p and q asserts that either p is true or q is true but not both. The natural, but long-winded, way to express exclusive disjunction, then, is (p | q) & ~(p & q)."
If this can apply, then I assume the correct answer is Ax( (man(x) v woman(x)) ^ ¬(man(x) & woman(x)) )
But now I am getting confused as to how 2 and 3 are different...
Hey I just wanted to know if these were correct
1. Testing the Correctness of Translation
One of the ways to test if a first order sentence agrees with an informal specification is to use a model.
To perform a test you need:
To determine how many relations are there in your first order sentence and list them. This list would play the role of the logical signature of your model.
Now take a set of individuals large enough to have all interesting combinations of properties assigned to at least one individual.
Assign the properties to individuals according to your understanding of the meaning of relations in your first order sentence.
Finally for each universally quantified variable try different assignments of individuals to variables and check if the property holds.
Consider one of the examples from the post.
Informal specification: Alex likes a man who likes a woman
We have
one constant symbol: Alex
two unary relations: Man(x) and Woman(x)
a binary relation: Likes(x,y)
1.1 Structure No. 1
Now consider a structure where we have an individual for Alex, an individual for a man who is not Alex, and an individual for a woman who is not Alex.
Let's start with three: p1, p2, p3.
Alex is p1
Man(p2)
Woman(p3)
Woman(p1)
Likes (p2,p3)
Likes (p1,p2)
The informal specification says that Alex likes someone who is a Man and Likes someone else who is a Woman. Our model satisfies this specification.
Now consider the following statement taken from the OP:
AxEy(Likes( man(x), woman(y) ) -> Likes(alex, man(x) ))
This statement is from a different language. Here man(x) and woman(y) are unary functions instead of unary relations, so we cannot check this sentence.
I would not speculate any further on the subject of woman(y) or man(x) being a function (Pun intended). Instead I would consider a different sentence.
AxEy(Man(x) & Woman(y) & Likes( x, y ) -> Likes(alex, x ))
Let's see what happens if x=p2 and y=p1. In our model the premise
Man(p2) & Woman(p3) & Likes( p2, p3 )
holds, and the conclusion
Likes(p1,p2)
also holds, therefore the formula is true under this variable assignment.
1.2 Structure No. 2
Now consider a different model. This time with four individuals: p1,p2,p3,p4.
Alex is p1
Man(p2)
Man(p4)
Woman(p3)
Woman(p1)
Likes (p2,p3)
Likes (p1,p2)
Likes (p4,p3)
This model also satisfies our informal specification as Alex likes p2 who is a Man and likes p3 who is a Woman.
Consider the assignment x=p4 and y=p3.
Once again the premise holds in our model
Man(p4) & Woman(p3) & Likes( p4, p3 )
however the conclusion
Likes(p1,p4)
does not hold in our model. Since x is universally quantified this might be a problem as the formula does not hold under this assignment, however y is under an existential quantifier. Let's see if we can find an assignment for y that would turn our formula into a true statement with x=p4.
This is indeed possible. Let y=p2.
Man(p4) & Woman(p2) & Likes( p4, p2 )
does not hold, therefore the formula
Man(p4) & Woman(p2) & Likes( p4, p2 ) -> Likes(p1,p4)
is true. This sounds better yet to satisfy the formula we took an assignment that is far from the intended meaning of the informal specification. The original specification clearly was not speaking about a man who likes another man. So we are not done yet.
1.3 Structure No. 3
Consider a model with just two individuals: p1 and p2.
Alex is p1
Man(p2)
Woman(p1)
The informal specification does not hold in this model, so our sentence also should be false. There are four possible assignments of variables
x=p1,y=p2
x=p1,y=p1
x=p2,y=p1
x=p2,y=p2
Since Like(x,y) is always false in our model, the premise fails therefore according to the rules of implication, the formula is true. So our sentence is also true in a model where it should not hold. Once again, our first-order formalization does not hold against the informal specification.
This seems to be a very complex process that we have taken to test the translation. It assumes a certain skill of finding counterexamples and always keeping in mind the intended meaning of the informal specification.
2. How to Come up with a Correct Translation
Let's look back at our informal specification
Informal specification: Alex likes a man who likes a woman
and reformulate it in a way that is easier to translate
There exists a man whom Alex likes, and this man likes some woman
There are two parts in this sentence connected with an and
Ex (Man(x) & Likes(Alex,x))
Ey (Woman(y) & Likes(x,y))
So we have
Ex (Man(x) & Likes(Alex,x) & Ey (Woman(y) & Likes(x,y))).
You like prenex normal forms where all quantifiers are collected in one quantifier prefix. We can apply logical equivalence to get
ExEy (Man(x) & Likes(Alex,x) & Woman(y) & Likes(x,y)).
Now let's check if this statement agrees with the specification in each of the structures from the previous section.
2.1 Structure 1
Since the variables are existentially quantified and the specification holds, we need to find only one satisfying assignment for the formula.
Consider
Alex is p1
x = p2
y = p3
The conjunction holds under this assignment.
2.2 Structure 2
The same assignment as in the previous subsection can be used. In fact
Structure 1 is an substructure of Structure 2. For an existentially quantified statement we know that if it is true in a substructure, it is also true in the whole structure.
2.3 Structure 3
Since there is no pair of elements (x,y) such that Likes(x,y) in our structure, the conjunction
Man(x) & Likes(Alex,x) & Woman(y) & Likes(x,y)
cannot be true, so the statement is false. We also know that Structure 3 does not satisfy our informal specification, so our formula has passed our tests.
Our testing procedure is by no means complete. However it gives us some assurance that the translation is indeed correct.

Is there an error in this textbook about Peano Arithmetic?

I encountered this doubt in an online intro-logic open course offered by Stanford Uni.
Under the section 9.4 of this textbook here: http://logic.stanford.edu/intrologic/secondary/notes/chapter_09.html
It says:
The axioms shown here define the same relation in terms of 0 and s.(where the functional constant letter s below represents the successor function, e.g. s(0)=1, s(1)=2, s(2)=3 )
∀x.same(x,x)
∀x.(¬same(0,s(x)) ∧ ¬same(s(x),0))
∀x.∀y.(¬same(x,y) ⇒ ¬same(s(x),s(y)))
As my understanding, :
The first sentence says two identical numbers are same. The second and third sentences are used to define what is not same.
The second says no successor of any number is same to 0.
The third says if two numbers are not the same, then their successors are not same. For example, if 1≠3, then 2≠4.
However, I think the third sentence should be bi-conditional because, if I'm not wrong, the definition didn't cover the instance where the number being testified are smaller than the given number,otherwise it is possible to say if 2≠4, then 1=3.
So I wondered is this an error in text book or there's something wrong of my reasoning.
There is no error in this text book. While the statement does hold in both directions, there is no need to state it as an axiom since the other direction follows from the functional property of the successor function and the three axioms listed in the textbook.
A formal proof would involve the axioms that define the successor function. Someone more accustomed to the use of automated provers or just a good student of logic might be able to complete such a formal proof.
Here is just a sketch of a proof. It uses the symbol "=" to denote term equality, i.e. u=v means u and v are syntactically identical terms written using the symbols 0 and s(). Also "u<v" means that u and v are both ground terms and u has strictly less applications of s() than v.
Suppose
∀x.∀y.(¬same(s(x),s(y)) ⇒ ¬same(x,y))
does not hold, then there exist some terms x0 and y0 such that
same(x0,y0) and ¬same(s(x0),s(y0)).
Since s(x0) is a function, it follows from ¬same(s(x0),s(y0)) and ∀x.same(x,x) that x0 and y0 are two different terms. First let us consider the case when x0 < y0, then y0 = s(...s(x0)) where there are n applications of s() and n > 0. The other case when y0 < x0 can be handled similarly.
Substituting s(...s(x0)) for y0 in same(x0,y0) we get same(x0,s(...s(x0))).
Also x0 = s(...s(0)) where there are m applications of s() for some nonnegative integer m. Using the third axiom in the direction provided we can say that if same(s(u),s(v)) then same(u,v). Thus from same(x0,s(...s(x0))) we can "strip" m applications of s() to obtain
same(0,s(...s(0))) where there are n applications of s() in the second argument. This contradicts the second axiom. Q.E.D.

Example channelling constraints ECLiPSe

Can someone provide a simple example of channelling constraints?
Channelling constraints are used to combine viewpoints of a constraint problem. Handbook of Constraint Programming gives a good explanation of how it works and why it can be useful:
The search variables can be the variables of one of the viewpoints, say X1 (this is discussed further below). As
search proceeds, propagating the constraints C1 removes values from the domains of the
variables in X1. The channelling constraints may then allow values to be removed from
the domains of the variables in X2. Propagating these value deletions using the constraints
of the second model, C2, may remove further values from these variables, and again these
removals can be translated back into the first viewpoint by the channelling constraints. The
net result can be that more values are removed within viewpoint V1 than by the constraints
C1 alone, leading to reduced search.
I do not understand how this is implemented. What are these constraints exactly, how do they look like in a real problem? A simple example would be very helpful.
As stated in Dual Viewpoint Heuristics for Binary Constraint Satisfaction Problems (P.A. Geelen):
Channelling constraints of two different models allows for the expression of a relationship between two sets of variables, one of each model.
This implies assignments in one of the viewpoints can be translated into assignments in the other and vice versa, as well as, when search initiates,
excluded values from one model can be excluded from the other as well.
Let me throw in an example I implemented a while ago while writing a Sudoku solver.
Classic viewpoint
Here we interpret the problem in the same way a human would: using the
integers between 1 and 9 and a definition that all rows, columns and blocks must contain every integer exactly once.
We can easily state this in ECLiPSe using something like:
% Domain
dim(Sudoku,[N,N]),
Sudoku[1..N,1..N] :: 1..N
% For X = rows, cols, blocks
alldifferent(X)
And this is yet sufficient to solve the Sudoku puzzle.
Binary boolean viewpoint
One could however choose to represent integers by their binary boolean arrays (shown in the answer by #jschimpf). In case it's not clear what this does, consider the small example below (this is built-in functionality!):
?­ ic_global:bool_channeling(Digit, [0,0,0,1,0], 1).
Digit = 4
Yes (0.00s cpu)
?­ ic_global:bool_channeling(Digit, [A,B,C,D], 1), C = 1.
Digit = 3
A = 0
B = 0
C = 1
D = 0
Yes (0.00s cpu)
If we use this model to represent a Sudoku, every number will be replaced by its binary boolean array and corresponding constraints can be written. Being trivial for this answer, I will not include all the code for the constraints, but a single sum constraint is yet enough to solve a Sudoku puzzle in its binary boolean representation.
Channelling
Having these two viewpoints with corresponding constrained models now gives the opportunity to channel between them and see if any improvements were made.
Since both models are still just an NxN board, no difference in dimension of representation exists and channelling becomes real easy.
Let me first give you a last example of what a block filled with integers 1..9 would look like in both of our models:
% Classic viewpoint
1 2 3
4 5 6
7 8 9
% Binary Boolean Viewpoint
[](1,0,0,0,0,0,0,0,0) [](0,1,0,0,0,0,0,0,0) [](0,0,1,0,0,0,0,0,0)
[](0,0,0,1,0,0,0,0,0) [](0,0,0,0,1,0,0,0,0) [](0,0,0,0,0,1,0,0,0)
[](0,0,0,0,0,0,1,0,0) [](0,0,0,0,0,0,0,1,0) [](0,0,0,0,0,0,0,0,1)
We now clearly see the link between the models and simply write the code to channel our decision variables. Using Sudoku and BinBools as our boards, the code would look something like:
( multifor([Row,Col],1,N), param(Sudoku,BinBools,N)
do
Value is Sudoku[Row,Col],
ValueBools is BinBools[Row,Col,1..N],
ic_global:bool_channeling(Value,ValueBools,1)
).
At this point, we have a channelled model where, during search, if values are pruned in one of the models, its impact will also occur in the other model. This can then of course lead to further overall constraint propagation.
Reasoning
To explain the usefulness of the binary boolean model for the Sudoku puzzle, we must first differentiate between some provided alldifferent/1 implementations by ECLiPSe:
What this means in practice can be shown as following:
?­ [A, B, C] :: [0..1], ic:alldifferent([A, B, C]).
A = A{[0, 1]}
B = B{[0, 1]}
C = C{[0, 1]}
There are 3 delayed goals.
Yes (0.00s cpu)
?­ [A, B, C] :: [0..1], ic_global:alldifferent([A, B, C]).
No (0.00s cpu)
As there has not yet occurred any assignment using the Forward Checking (ic library), the invalidity of the query is not yet detected, whereas the Bounds Consistent version immediately notices this. This behaviour can lead to considerable differences in constraint propagation while searching and backtracking through highly constrained models.
On top of these two libraries there is the ic_global_gac library intended for global constraints for which generalized arc consistency (also called hyper arc consistency or domain consistency) is maintained. This alldifferent/1 constraint provides even more pruning opportunities than the bounds consistent one, but preserving full domain consistency has its cost and using this library in highly constrained models generally leads to a loss in running performance.
Because of this, I found it interesting for the Sudoku puzzle to try and work with the bounds consistent (ic_global) implementation of alldifferent to maximise performance and subsequently try to approach domain consistency myself by channelling the binary boolean model.
Experiment results
Below are the backtrack results for the 'platinumblonde' Sudoku puzzle (referenced as being the hardest, most chaotic Sudoku puzzle to solve in The Chaos Within Sudoku, M. Ercsey­Ravasz and Z. Toroczkai) using respectively forward checking, bounds consistency, domain consistency, standalone binary boolean model and finally, the channelled model:
(ic) (ic_global) (ic_global_gac) (bin_bools) (channelled)
BT 6 582 43 29 143 30
As we can see, our channelled model (using bounds consistency (ic_global)) still needs one backtrack more than the domain consistent implementation, but it definitely performs better than the standalone bounds consistent version.
When we now take a look at the running times (results are the product of calculating an average over multiple executions, this to avoid extremes) excluding the forward checking implementation as it's proven to no longer be interesting for solving Sudoku puzzles:
(ic_global) (ic_global_gac) (bin_bools) (channelled)
Time(ms) 180ms 510ms 100ms 220ms
Looking at these results, I think we can successfully conclude the experiment (these results were confirmed by 20+ other Sudoku puzzle instances):
Channelling the binary boolean viewpoint to the bounds consistent standalone implementation produces a slightly less strong constraint propagation behaviour than that of the domain consistent standalone implementation, but with running times ranging from just as long to notably faster.
EDIT: attempt to clarify
Imagine some domain variable representing a cell on a Sudoku board has a remaining domain of 4..9. Using bounds consistency, it is guaranteed that for both value 4 and 9 other domain values can be found which satisfy all constraints and thus provides consistency. However, no consistency is explicitly guaranteed for other values in the domain (this is what 'domain consistency' is).
Using a binary boolean model, we define the following two sum constraints:
The sum of every binary boolean array is always equal to 1
The sum of every N'th element of every array in every row/col/block is always equal to 1
The extra constraint strength is enforced by the second constraint which, apart from constraining row, columns and blocks, also implicitly says: "every cell can only contain every digit once". This behaviour is not actively expressed when using just the bounds consistent alldifferent/1 constraint!
Conclusion
It is clear that channelling can be very useful to improve a standalone constrained model, however if the new model's constraint strengthness is weaker than that of the current model, obviously, no improvements will be made. Also note that having a more constrained model doesn't necesarilly also mean an overall better performance! Adding more constraints will in fact decrease amounts of backtracks required to solve a problem, but it might also increase the running times of your program if more constraint checks have to occur.
Channeling constraints are used when, in a model, aspects of a problem are represented in more than one way. They are then necessary to synchronize these multiple representations, even though they do not themselves model an aspect of the problem.
Typically, when modelling a problem with constraints, you have several ways of choosing your variables. For example, in a scheduling problem, you could choose to have
an integer variable for each job (indicating which machine does the job)
an integer variable for each machine (indicating which job it performs)
a matrix of Booleans (indicating which job runs on which machine)
or something more exotic
In a simple enough problem, you choose the representation that makes it easiest to formulate the constraints of the problem. However, in real life problems with many heterogeneous constraints it is often impossible to find such a single best representation: some constraints are best represented with one type of variable, others with another.
In such cases, you can use multiple sets of variables, and formulate each individual problem constraint over the most convenient variable set. Of course, you then end up with multiple independent subproblems, and solving these in isolation will not give you a solution for the whole problem. But by adding channeling constraints, the variable sets can be synchronized, and the subproblems thus re-connected. The result is then a valid model for the whole problem.
As hinted in the quote from the handbook, in such a formulation is is sufficient to perform search on only one of the variable sets ("viewpoints"), because the values of the others are implied by the channeling constraints.
Some common examples for channeling between two representations are:
Integer variable and Array of Booleans:
Consider an integer variable T indicating the time slot 1..N when an event takes place, and an array of Booleans Bs[N] such that Bs[T] = 1 iff an event takes place in time slot T. In ECLiPSe:
T #:: 1..N,
dim(Bs, [N]), Bs #:: 0..1,
Channeling between the two representations can then be set up with
( for(I,1,N), param(T,Bs) do Bs[I] #= (T#=I) )
which will propagate information both ways between T and Bs. Another way of implementing this channeling is the special purpose bool_channeling/3 constraint.
Start/End integer variables and Array of Booleans (timetable):
We have integer variables S,E indicating the start and end time of an activity. On the other side an array of Booleans Bs[N] such that Bs[T] = 1 iff the activity takes place at time T. In ECLiPSe:
[S,E] #:: 1..N,
dim(Bs, [N]), Bs #:: 0..1,
Channeling can be achieved via
( for(I,1,N), param(S,E,Bs) do Bs[I] #= (S#=<I and I#=<E) ).
Dual representation Job/Machine integer variables:
Here, Js[J] = M means that job J is executed on machine M, while the dual formulation Ms[M] = J means that machine M executes job J
dim(Js, [NJobs]), Js #:: 0..NMach,
dim(Ms, [NMach]), Ms #:: 1..NJobs,
And channeling is achieved via
( multifor([J,M],1,[NJobs,NMach]), param(Js,Ms) do
(Js[J] #= M) #= (Ms[M] #= J)
).
Set variable and Array of Booleans:
If you use a solver (such as library(ic_sets)) that can directly handle set-variables, these can be reflected into an array of booleans indicating membership of elements in the set. The library provides a dedicated constraint membership_booleans/2 for this purpose.
Here is a simple example, works in SWI-Prolog, but should
also work in ECLiPSe Prolog (in the later you have to use (::)/2 instead of (in)/2):
Constraint C1:
?- Y in 0..100.
Y in 0..100.
Constraint C2:
?- X in 0..100.
X in 0..100.
Channelling Constraint:
?- 2*X #= 3*Y+5.
2*X#=3*Y+5.
All together:
?- Y in 0..100, X in 0..100, 2*X #= 3*Y+5.
Y in 1..65,
2*X#=3*Y+5,
X in 4..100.
So the channel constraint works in both directions, it
reduces the domain of C1 as well as the domain of C2.
Some systems use iterative methods, with the result that this channelling
can take quite some time, here is an example which needs around
1 minute to compute in SWI-Prolog:
?- time(([U,V] ins 0..1_000_000_000, 36_641*U-24 #= 394_479_375*V)).
% 9,883,559 inferences, 53.616 CPU in 53.721 seconds
(100% CPU, 184341 Lips)
U in 346688814..741168189,
36641*U#=394479375*V+24,
V in 32202..68843.
On the other hand ECLiPSe Prolog does it in a blink:
[eclipse]: U::0..1000000000, V::0..1000000000,
36641*U-24 #= 394479375*V.
U = U{346688814 .. 741168189}
V = V{32202 .. 68843}
Delayed goals:
-394479375 * V{32202 .. 68843} +
36641 * U{346688814 .. 741168189} #= 24
Yes (0.11s cpu)
Bye

How to define set in coq without defining set as a list of elements

I am trying to define (1,2,3) as a set of elements in coq. I can define it using list as (1 :: (2 :: (3 :: nil))). Is there any way to define set in coq without using list.
The are basically four possible choices to be made when defining sets in Coq depending on your constraints on the base type of the set and computation needs:
If the base type doesn't have decidable equality, it is common to use:
Definition Set A := A -> Prop
Definition cup A B := fun x => A x /\ B x.
...
basically, Coq's Ensembles. This representation cannot "compute", as we can't even decide if two elements are equal.
If the base data type has decidable equality, then there are two choices depending if extensionality is wanted:
Extensionality means that two sets are equal in Coq's logic iff they have the same elements, formally:
forall (A B : set T), (A = B) <-> (forall x, x \in A <-> x \in B).
If extensionality is wanted, sets should be represented by a canonically-sorted duplicate-free structure, usually a list. A good example is Cyril's Cohen finmap library. This representation however is very inefficient for computing as re-sorting is needed every time a set is modified.
If extensionality is not needed (usually a bad idea if proofs are wanted), you can use representations based on binary trees such as Coq's MSet, which are similar to standard Functional Programming set implementations and can work efficiently.
Finally, when the base type is finite, the set of all sets is a finite type too. The best example of this approach is IMO math-comp's finset, which encodes finite sets as the space of finitely supported membership functions, which is extensional, and forms a complete lattice.
The standard library of coq provides the following finite set modules:
Coq.MSets abstracts away the implementation details of the set. For instance, there is an implementation that uses AVL trees and another based on lists.
Coq.FSets abstracts away the implementation details of the set; it is a previous version of MSets.
Coq.Lists.ListSet is an encoding of lists as sets, which I am including for the sake of completeness
Here is an example on how to define a set with FSets:
Require Import Coq.Structures.OrderedTypeEx.
Require Import Coq.FSets.FSetAVL.
Module NSet := FSetAVL.Make Nat_as_OT.
(* Creates a set with only element 3 inside: *)
Check (NSet.add 3 NSet.empty).
There are many encodings of sets in Coq (lists, function, trees, ...) which can be finite or not. You should have a look at Coq's standard library. For example the 'simplest' set definition I know is this one

why the order of quantifiers are important? how the order is determined?

I want to know why the order of quantifiers are important in a logic formula?
When I read books about logic programming, such points are mentioned, but did not say why.
Is there any one could explain with some examples?
Also, how can we determine order of quantifiers from a given logic formula?
Thanks in advance!
You would be well advised to read a book about first-order logic before the books about
logic programming.
Consider the true statement:
1. Everybody has a mother
Let's formalize it in FOL. To keep it simple, we'll say
that the universe of discourse is the set of people, i.e.
our individual variables x, y, z... range over people. Then
1 becomes:
1F. (x)(Ey)Mother(y,x)
which we can read as: For every person x there exists
some person y such that y is the mother of x.
Now let's swap the order of the universal quantifier (x) and existential
quantifier (Ey):
2F. (Ey)(x)Mother(y,x)
That reads: There is some person y such that for every person x,
y is the mother of x. Or in plain English:
2. There is somebody who is the mother of everybody
You see that swapping the quantifiers changes the meaning of the statement,
taking us from the true statement 1 to the false statement 2. Indeed, to the absurdly false statement
2, which entails that somebody is their own mother.
That's why the order of quantifiers matters.
how can we determine order of quantifiers from a given logic formula?
Well, in 1F and 2F, for example, all the variables are already bound by quantifiers,
so there's nothing to determine. The order of the quantifiers is what you see,
left to right.
Suppose one of the variables was free (not bound), e.g.
3F. (Ey)Mother(y,x)
You might read that as: There is someone who is the mother of x, for variable person x.
But this formula really doesn't express any statement. It expresses a unary predicate of persons, the predicate Someone is the mother of x. If you free up the remaining variable:
4F. Mother(x,y)
then you have the binary predicate, or relation: x is the mother of y.
A formula with 1,2,...,n free variables expresses a unary, binary,...,n-ary predicate.
Given a predicate, you can make a statement by binding free variables with quantifiers and/or substituting individual constants for the free variables. From 4F you can make:
(x)(y)Mother(x,y) (Everybody is everybody's mother)
(Ex)(y)Mother(x,y) (Somebody is everybody's mother)
(Ex)(Ey)Mother(x,y) (Somebody is somebody's mother)
(x)Mother(x,Arnold) (Everybody is the mother of Arnold)
(x)Mother(Bernice,x) (Bernice is the mother of everybody)
Mother(Arnold,Bernice) (Arnold is the mother of Bernice)
...
...
and so on ad nauseam.
What this should make clear is that if a formula has free variables, and therefore expresses
a predicate, the formula as such does not imply any particular way of quantifying
the free variables, or that they should be quantified at all.

Resources