Seeking extended Divide operator explanation - relational-algebra

I am reading about Codd’s Eight Original Operators in Inside Microsoft SQL Server 2008: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, and Steve Kass and do not understand the Divide operator.
Quotes defining the Divide operator:
“A divisor relation is used to partition a dividend relation and
produce a quotient relation. The quotient relation is made up of those
values of one column from the dividend table for which the second
column contains all of the values in the divisor.”
This statement is in agreement with Wikipedia's definition and example.
“The formula for the Divide operator includes three relations: a
divide by b per c, where a is the dividend, b is the divisor, and c is
the mediator relation. Let relation a have attributes A and relation b
attributes B. The Divide operator returns a relation that includes of
all tuples from divisor such that a tuple {A, B} appears in the
mediator relation for all tuples from divisor relation.”
The diagram below is used to demonstrate this statement. I believe the relations are presented in the following order: dividend, divisor, mediator, and end result.
The second relation (the divisor) has {a, x}, {a, z}, {b, x} and {b, z} for tuples. My thought process is that since there are tuples {b, x} and {b, z}, b should be included in the end result. I've checked the book corrections on the book's website (linked at the beginning of this post) and am certain that I am wrong.
Why is the result of the diagram example a and not a and b?

Relational Division has always been a mess, and is likely to stay that way. It was originally invented as the means for relational query systems to be able to formulate and answer questions such as "what is the list of customers who have subscribed to ALL possible insurance policy types". That is, it was intended as the vehicle for formulating queries that involved some kind of universal quantification as a predicate determining the result set.
Elaborating further on my customers/policies example, let's assume that the set of "ALL possible insurance policy types" is itself time-varying, i.e. over time, new policy types can emerge, while others can be discontinued. Let's further assume that "ALL possible insurance policy types" in a certain query means, specifically, "ALL policy types that are currently open for subscription by customers" (that is, the discontinued policy types are not part of this set of "ALL" types).
Let's assume that this set of "ALL possible policy types" at a certain moment is {TYPE1, TYPE3}. TYPE2 has been discontinued. Let's also assume that customer ES still has a policy of type TYPE2, obviously dating from before when it was discontinued. Thus customer ES has policies of type {TYPE1, TYPE2, TYPE3}.
Now answer the question whether this customer has "ALL policy types that are currently open for subscription". Your answer should be a firm 'yes'. You might get a sense of where this is going: relational division centers around comparison of two sets. One is a "comparand" (the customer's set of subscribed policy types), the other one is a "reference" (the set of policy types that are currently open for subscription).
Now there are at least two useful comparisons that can be made between sets: one is for equality, the other is for set containment (subsetness). In some cases of querying you will want the equality test, while in others you will want the containment test. And the relational operator called 'division' (in whatever of its umpty flavours) does not allow you to make this distinction. I gather it is this phenomenon that you are asking about, and the answer is simply that the choice is made by design, so to speak, and hardwired into the operator's definition. It makes the operator "useful" in those cases where its definition matches your needs, and useless in other cases.
The sort-of good news is that when you have to spell out the SQL for a relational division, there won't be much difference between division-with-equality and division-with-containment (despite the fact that the algebra operator is, by definition, only one of the two, and the other one does not even have an algebraic operator). The main problem is that set equality itself is very messy to express in SQL, and that's not any less so "inside a relational division query" ...
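To make the distinction concrete, here is a small illustration of the two comparisons (my own Python sketch, not taken from any of the posts), using the customers/policies example:

# division-with-containment vs. division-with-equality, as plain set tests
open_types = {"TYPE1", "TYPE3"}                # all policy types currently open
customer_types = {"TYPE1", "TYPE2", "TYPE3"}   # what customer ES has subscribed to

print(open_types <= customer_types)   # containment test: True, ES has all open types
print(open_types == customer_types)   # equality test: False, ES also has TYPE2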
And then there are still all the valid points already made by philip. Read them, but very carefully.

The result in the diagram should indeed be rows {a} and {b}. But the book and your message are mixed up.
There are a number of relational DIVIDE variants. There is Codd's divide, Date's abridged version of Codd's divide, Todd's divide, Date and Darwen's Small (Codd-like) divide-per and Great (Todd-like) divide-per and their Generalized Small and Great divide-pers, and Darwen's per-divide-per.
The quotes and diagram are from the book. First, they are trying to talk about Codd's divide, which has two inputs: "a divisor relation is used to partition a dividend relation". (The book even says: "This is the way the Divide operator was defined originally.") But its second sentence is wrong. First, it talks about columns instead of disjoint column sets. Second, it limits its second operand to only having a subset of the first operand's columns. (It's Date's abridged version of Codd's divide.) Third, it is unintelligible. Let A, B and C be disjoint attribute sets and AB and BC be relations with attribute sets A U B and B U C. Then Codd's AB DIVIDEBY BC returns the A subrows that appear in AB with every B subrow that appears in BC.
The wiki actually describes Codd's divide. So only if you ignore the problems above is the first quote "in agreement with" it.
Then the second quote is talking about a different "extended" divide operator. The book explains that Codd's divide doesn't reflect a natural phrasing using "for all" in common situations. They then unclearly introduce the other operator. They are trying to define the Small divide-per. The input "includes three relations: a divide by b per c". The diagram shown is even labelled as an "extended divide".
Unfortunately the definition is not only ungrammatical, it is unintelligible. A DIVIDEBY B PER AB returns the rows of A that appear as subrows in AB with every row of B. They should say that it returns the tuples {A} in the dividend such that a tuple {A, B} appears in the mediator with all tuples {B} in the divisor.
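Reading that corrected definition operationally, the small divide-per could be sketched as follows (my own Python, with illustrative values that are not copied from the book's diagram):

# a DIVIDEBY b PER ab: keep the dividend tuples that appear in the mediator
# paired with every divisor tuple; relations are modelled as sets of tuples
def divide_per(dividend, divisor, mediator):
    return {a for a in dividend
            if all((a + b) in mediator for b in divisor)}

dividend = {("a",), ("b",), ("c",)}              # relation a
divisor = {("x",), ("y",)}                       # relation b
mediator = {("a", "x"), ("a", "y"),              # relation c
            ("b", "x"), ("b", "y"), ("c", "x")}
print(divide_per(dividend, divisor, mediator))   # a and b qualify; c lacks ('c', 'y')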
That book is a mess. The writing is terrible. The proofreading is terrible. Given the text in this section there is no reason to expect the answers to be right. But also, you are not reading carefully.
Then you misinterpret the arguments in the diagram. But you don't notice, because you are not following the definition, and thus you also don't notice that the definition is unintelligible. The diagram shows relations A-AB-B-result, which is dividend-mediator-divisor-quotient. So again you are not reading carefully.
And it is true that both {a} and {b} rows should be in the result. But I can't really say you're "right" about that because your "thought process" seems to be "get a vague impression". Some of the vagueness is surely due to the poor writing in the book. But you don't seem to be noticing that the book is unintelligible. Read carefully!

Related

How to refer to "equivalent" algorithms

This is a bit of a "soft question", so if this is not the appropriate place to post, please let me know.
Essentially I'm wondering how to talk about algorithms which are "equivalent" in some sense but "different" in others.
Here is a toy example. Suppose we are given a list of numbers, list, of length n. Two simple ways to add up the numbers in the list are given below. Obviously these methods are exactly the same in exact arithmetic, but in floating-point arithmetic they might give different results.
add_list_1(list, n):
    sum = 0
    for i = 1, 2, ..., n:
        sum += list[i]
    return sum

add_list_2(list, n):
    sum = 0
    for i = n, ..., 2, 1:
        sum += list[i]
    return sum
This is a very common thing to happen with numerical algorithms, with Gram-Schmidt vs. Modified Gram-Schmidt being perhaps the best-known example.
The wikipedia page for algorithms mentions "high level description", "implementation description", and "formal description".
Obviously, the implementation and formal descriptions vary, but a high level description such as "add up the list" is the same for both.
Are these different algorithms, different implementations of the same algorithm, or something else entirely? How would you describe algorithms where the high-level description is the same but the implementation is different when talking about them?
The following definition can be found on the Info for the algorithm tag.
An algorithm is a set of ordered instructions based on a formal language with the following conditions:
Finite. The number of instructions must be finite.
Executable. All instructions must be executable in some language-dependent way, in a finite amount of time.
Considering especially
set of ordered instructions based on a formal language
What this tells us is that the order of the instructions matters. While the outcome of two different algorithms might be the same, it does not imply that the algorithms are the same.
Your example of Gram-Schmidt vs. Modified Gram-Schmidt is an interesting one. Looking at the structure of each algorithm as defined here, these are indeed different algorithms, even on a high level description. The steps are in different orders.
One important distinction you need to make is between a set of instructions and the output set. Here you can find a description of three shortest path algorithms. The set of possible results based on input is the same, but they are three very distinct algorithms, and they also have three completely different high-level descriptions. To someone who does not care about that, though, these "do the same" (it almost hurts me to write this) and are equivalent.
Another important distinction is the similarity of steps between two algorithms. Let's take your example and write it in a bit more formal notation:
procedure 1 (list, n):
    let sum = 0
    for i = 1 : n
        sum = sum + list[i]
    end for
    sum //using implicit return

procedure 2 (list, n):
    let sum = 0
    for i = n : 1
        sum = sum + list[i]
    end for
    sum //using implicit return
These two pieces of code have the same set of results, but the instructions seem differently ordered. Still, this is not true on a high level. It depends on how you formalise the procedures. Loops are one of those things that, if we reduce them to indices, change our procedure. In this particular case though (as already pointed out in the comments), we can essentially substitute the loop for a more formalised for each loop.
procedure 3 (list):
    let sum = 0
    for each element in list
        sum = sum + element
    end for
    sum
procedure 3 now does the same thing as procedure 1 and procedure 2; their result is the same, but the instructions again seem different. So the procedures are equivalent algorithms but not the same on the implementation level. They are not the same, since the order in which the instructions for summing are executed is different for procedure 1 and procedure 2, and completely ignored in procedure 3 (it depends on your implementation of for each!).
This is where the concept of a high-level description comes in. It is the same for all three algorithms, as you already pointed out. The following is from the Wikipedia article you are referring to.
1 High-level description
"...prose to describe an algorithm, ignoring the implementation details. At this level, we do not need to mention how the machine manages its tape or head."
2 Implementation description
"...prose used to define the way the Turing machine uses its head and the way that it stores data on its tape. At this level, we do not give details of states or transition function."
3 Formal description
Most detailed, "lowest level", gives the Turing machine's "state table".
Keeping this in mind your question really depends on the context it is posed in. All three procedures on a high level are the same:
1. Let sum = 0
2. For every element in list add the element to sum
3. Return sum
We do not care how we go through the list or how we sum, just that we do.
On the implementation level we already see a divergence. The procedures move differently over the "tape" but store the information in the same way. While procedure 1 moves "right" on the tape from a starting position, procedure 2 moves "left" on the tape from the "end" (careful with this, because there is no such thing in a TM; it has to be defined with a different state, which we do not use at this level).
procedure 3, well it is not defined well enough to make that distinction.
On the low level we need to be very precise. I am not going down to the level of a TM state table thus please accept this rather informal procedure description.
procedure 1:
1. Move right until you hit an unmarked integer or the "end"
//In an actual TM this would not work, just for simplification I am using ints
1.e. If you hit the end terminate //(i = n)
2. Record value //(sum += list[i]) (of course this is a lot longer in an actual TM)
3. Go back until you find the first marked number
4. Go to 1.
procedure 2 would be the reverse on instructions 1. and 3., thus they are not the same.
But on these different levels, are these procedures equivalent? Going by the Merriam-Webster definition of "equivalent", I'd say they are on all levels: their "value", or better their "output", is the same for the same input (but see the note on floating point below). The issue with the communication is that these algorithms, as you already stated in your question, return the same results, making them equivalent but not the same.
Your referring to floating-point inaccuracy implies the implementation level, on which the two algorithms are already different. As a mathematical model we do not have to worry about floating-point inaccuracy, because there is no such thing in mathematics (mathematicians live in a "perfect" world).
These algorithms are different implementation-level descriptions of the same high-level description. Thus, I would refer to them as different implementations of the same high-level algorithm, since the idea is the same.
The last important distinction is the further formalisation of an algorithm by assigning it to a set for its complexity (as pointed out perfectly in the comments by @jdehesa). If you just use big omicron, well... your sets are going to be huge and make more algorithms "equivalent". This is because merge sort and bubble sort are both members of the set O(n^2) for their time complexity (imprecise, but n^2 is an upper bound for both). Obviously bubble sort is not in O(n*log2(n)), but the O(n^2) description does not capture that. If we use big theta, then bubble sort and merge sort are not in the same set anymore; context matters. There is more to describing an algorithm than just its steps, and that is one more way you can keep in mind to distinguish algorithms.
To sum up: it depends on context, especially who you are talking to. If you are comparing algorithms, make sure that you specify the level you are doing it on. To an amateur, saying "add up the list" will be good enough; for your docs, use a high-level description; when explaining your code, explain your implementation of the above high level; and when you really need to formalise your idea before putting it in code, use a formal description. The latter will also allow you to prove that your program executes correctly. Of course, nowadays you do not have to write out all the states of the underlying TM anymore. When you describe your algorithms, do it in the appropriate form for the setting. And if you have two different implementations of the same high-level algorithm, just point out the differences on the implementation level (direction of traversal, implementation of summing, format of return values, etc.).
I guess you could call it an ambiguous algorithm. Although this term may not be well defined in the literature, consider your example of adding the list of elements.
It could be defined as
1. Initialize sum to zero
2. Add elements in the list to sum one by one.
3. Return the sum
The second part is ambiguous: you can add the elements in any order, as it's not defined in the algorithm statement, and the sum may change in floating-point arithmetic.
One good example I came across: a Cornell lecture slide. That messy sandwich example is golden.
You could read what the term Ambiguity generally refers to here: wiki. It's applied in various contexts, including computer science algorithms.
You may be referring to algorithms that, at least at the surface, perform the same underlying task, but have different levels of numerical stability ("robustness"). Two examples of this may be—
calculating mean and variance (where the so-called "Welford algorithm" is more numerically stable than the naive approach; a small sketch follows this list), and
solving a quadratic equation (with many formulas with different "robustness" to choose from).
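As a concrete illustration of the first bullet, here is a minimal sketch of Welford's online mean/variance update (my own code, not taken from a particular reference implementation):

def welford(values):
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)   # uses the updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return mean, variance

# large offset, small spread: a naive running sum-of-squares loses precision here
print(welford([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))   # ~(1e9 + 10, 30.0)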
"Equivalent" algorithms may also include algorithms that are not deterministic, or not consistent between computer systems, or both; for example, due to differences in implementation of floating-point numbers and/or floating-point math, or in the order in which parallel operations finish. This is especially problematic for applications that care about repeatable "random" number generation.

sort/2, keysort/2 vs. samsort/3, predsort/3

ISO-Prolog provides sort/2 and keysort/2, which rely on term order (7.2), often called "standard term order".
The common way to sort a list with a different order is to map each element El of that list somehow to a pair XKey-El, sort that list of pairs, and finally project the keys away. As an example, consider how keysort/2 can be expressed in terms of sort/2 (see the note for an implementation).
In many situations this approach is much faster than using a generic, implementation-specific sorting predicate that relies on a user-defined order, such as SWI's predsort(C_3, List, SortedList)
or SICStus' samsort(O_2, List, SortedList).
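As a language-neutral illustration of that map/sort/project idea, here is a small sketch of the same decorate-sort-undecorate pattern in Python (only an analogue of the Prolog technique, not Prolog code):

def sort_by_key(items, key):
    pairs = [(key(el), el) for el in items]   # map each El to a Key-El pair
    pairs.sort(key=lambda p: p[0])            # sort the pairs on the key only
    return [el for _, el in pairs]            # project the keys away

# e.g. sorting atoms (strings) by length first, then alphabetically
print(sort_by_key(["prolog", "iso", "sort"], key=lambda a: (len(a), a)))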
My question boils down to:
Are there cases where a sorting using predsort/3 resp. samsort/3 cannot be replaced by some mapping, sort/2-ing and projecting?1
And for the sake of clarity, better stick to finite, ground terms. For, infinite ground terms do not possess a total, lexicographic order as would be needed as an extension of the finite case; and further, it is not clear how the comparison of variables, with the case of two different variables being implementation dependent, will turn out given 7.2.1 of ISO/IEC 13211-1:1995:
7.2.1 Variable
If X and Y are variables which are not identical then
X term_precedes Y shall be implementation dependent
except that during the creation of a sorted list (7.1.6.5,
8.10.3.1 j) the ordering shall remain constant.
So it is not clear whether predsort/3 would still qualify as a
creation of a sorted list. What is clear is that the ordering remains constant during sort/2 and keysort/2.
1 Thanks to @WillNess, this projection should at least also include reverse/2 — or any linear transformation. It also means that results with both duplicates and unique ones can be implemented (similarly to the way keysort/2 is implemented).
First off, you can "negate" Prolog atoms. Let's call it atom_neg/2 (it is a silly name, but it does something silly anyway):
atom_neg(A, NK) :-
    atom_codes(A, Cs),
    maplist(negate, Cs, NCs),
    append(NCs, [0], NK).

negate(X, N) :- N is -X.
I am not claiming it is practical to do this, but apparently, it is possible.
A total ordering is a weak ordering, and a key function f on a set T, together with a total ordering r on the codomain of f, defines a weak ordering w: w(x, y) <==> r(f(x), f(y)).
(Codomain of a function in that context is the domain of the values that the function returns.)
I might be totally wrong, but the existence of a relation requires the existence of a key: you can define a relation in terms of another relation, but eventually you must compare something that can exist in isolation as well: the key.
The point here is that the key does not need to be in the same domain as the thing we want to sort, and that a weak ordering (a relation) is defined for objects of the same domain. Prolog does something weird here: it defines a standard order of terms for all possible terms. Prolog also does not have a proper notion of "types", or "domains". My gut feeling tells me that sorting things that do not belong to the same domain is simply not very useful, but Prolog obviously disagrees.
You cannot define a key function in two cases:
The comparison predicate keeps its own state;
You have "opaque" objects (defined in C, for example) that provide the comparison function but not a key function.
Either way, predsort can be useful: no one would prefer atom_neg/2 over the solution by Will Ness. However, it has one serious deficiency at the moment: it does not allow for a stable sort. SWI-Prolog's implementation can already be used in this way; all it would take is to add the current behaviour to the specification and documentation of predsort/3.
Edit: the answer by @Boris shows how to "negate" atoms for the purposes of comparison, so it invalidates my counter-argument in that case. And the new stipulation in the question invalidates it entirely.
In the case of complex sorting criteria, multiple sub-keys will have to be arranged. If retention of duplicates is desired, incrementing indices should be prefixed to the original term, following the sorting sub-keys, in the terms constructed for sort/2 to sort.
The complexity and number of the constructed sub-keys can get out of hand though. Imagine sorting points by X first, then by Y, in ascending or descending orders in some regions, and by Y first and by X second in others.
Then the sought-for advantage of replacing a loglinear number of (presumably computationally heavy) comparisons with only a linear number of key constructions and a loglinear number of (presumably light) comparisons in standard order of terms can disappear.
Trivially, predsort/3ing e.g. a list of atoms in reverse, with custom comparison predicate
comp(<,A,B):- B @< A.
etc., can't be done by sort/2ing which works in "standard order of terms" (quoting SWI documentation). With numbers, we could flip the sign, but not with names.
Perhaps you'd want to add reverse to the allowed actions.
With sort/4 allowed, I don't see anything that wouldn't work. And since it's stable, secondary criteria can be accommodated as well, by successive passes (first by a minor, then by a major criterion).
I think I might have a proper answer to your question.
If you have a partial ordering, you can still try and sort using predsort/3, and you might get a better result than just saying "a total ordering does not exist."
Here is an example: say you have a game played by two teams. Each play gives a point to only one of the teams, and you play until one team reaches a certain number of points.
Now, you organize a tournament, and it has a group stage, in groups of 4 teams, which is a round-robin. Only the two top teams make it out of the groups.
For each game played, teams get a score of own_points - other_teams_points. In other words, if you play to 7, and the final score is:
Team A - 5:7 - Team B
then Team A scores -2 and Team B scores 2.
At the end of the group stage, you order the teams by:
Total score
If the total score is the same, the team that won the direct fight is ordered higher.
Most notably, using this scoring scheme, you cannot resolve a three-way draw if Team A beat Team B, Team B beat Team C, and Team C beat Team A. A "stable" sort makes no sense in this context.
However, using predsort/3, you can attempt to find the two top teams, and you will get a definitive answer in most cases. A three-way draw as above is usually resolved using a coin toss.

genetic algorithm crossover operation

I am trying to implement a basic genetic algorithm in MATLAB. I have some questions regarding the crossover operation. I was reading materials on it and I found that two parents are always selected for the crossover operation.
What happens if I happen to have an odd number of parents?
Suppose I have parent A, parent B and parent C, and I cross parent A with B and again parent B with C to produce offspring; even then I get 4 offspring. What is the criterion for rejecting one of them, as my population pool should always remain the same size? Should I just reject the offspring with the lowest fitness value?
Can an arithmetic or logical operation between parents, like an OR or AND operation, be deemed a good crossover operation? I found some sites listing them as crossover operations but I am not sure.
How can I do crossover between multiple parents ?
"Crossover" isn't so much a well-defined operator as the generic idea of taking aspects of parents and using them to produce offspring similar to each parent in some ways. As such, there's no real right answer to the question of how one should do crossover.
In practice, you should do whatever makes sense for your problem domain and encoding. With things like two parent recombination of binary encoded individuals, there are some obvious choices -- things like n-point and uniform crossover, for instance. For real-valued encodings, there are things like SBX that aren't really sensible if viewed from a strict biological perspective. Rather, they are simply engineered to have some predetermined properties. Similarly, permutation encodings offer numerous well-known operators (Order crossover, Cycle crossover, Edge-assembly crossover, etc.) that, again, are the result of analysis of what features in parents make sense to make heritable for particular problem domains.
You're free to do the same thing. If you have three parents (with some discrete encoding like binary), you could do something like the following:
child = new chromosome(L)
for i = 1 to L
    switch(rand(3))
        case 0:
            child[i] = parentA[i]
        case 1:
            child[i] = parentB[i]
        case 2:
            child[i] = parentC[i]
Whether that is a good operator or not will depend on several factors (problem domain, the interpretation of the encoding, etc.), but it's a perfectly legal way of producing offspring. You could also invent your own more complex method, e.g., taking a weighted average of each allele value over multiple parents, doing boolean operations like AND and OR, etc. You can also build a more "structured" operator if you like, in which different parents have specific roles. The basic Differential Evolution algorithm selects three parents, a, b, and c, and computes an update like a + F*(b - c) (with some scaling factor F) roughly corresponding to an offspring.
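To make a couple of those ideas concrete, here is a short sketch (my own, not taken from the cited papers or the answer above) of a three-parent uniform crossover for a discrete encoding and a DE-style arithmetic update for a real-valued encoding:

import random

def three_parent_uniform(parent_a, parent_b, parent_c):
    parents = (parent_a, parent_b, parent_c)
    # each gene is copied from one of the three parents, chosen at random
    return [random.choice(parents)[i] for i in range(len(parent_a))]

def de_style_update(a, b, c, f=0.5):
    # differential-evolution-style offspring: a + F * (b - c), with F a scaling factor
    return [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]

print(three_parent_uniform([0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]))
print(de_style_update([1.0, 2.0], [1.5, 2.5], [0.5, 1.0]))   # [1.5, 2.75]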
Consider reading the following academic articles:
DEB, Kalyanmoy et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, v. 6, n. 2, p. 182-197, 2002.
DEB, Kalyanmoy; AGRAWAL, Ram Bhushan. Simulated binary crossover for continuous search space. Complex systems, v. 9, n. 2, p. 115-148, 1995.
For SBX, the method of crossing and mutating children mentioned by @deong, see the answer simulated-binary-crossover-sbx-crossover-operator-example.
A genetic algorithm does not have a single definite form. Many variants have been proposed, but generally the following steps apply to all of them:
Generate a random initial population (by lot or any other method)
Cross parents to produce children
Mutate
Evaluate the children and parents
Generate the new population based only on the children, or on children and parents (different approaches exist)
Return to step 2
NSGA-II, from the Deb et al. article cited above, is one of the most widely used and well-known genetic algorithms; its flow follows the steps above (the article includes a flow diagram).

Tricky algorithm for sorting symbols in an array while preserving relationships via order

The problem
I have multiple groups which specify the relationships of symbols, for example:
[A B C]
[A D E]
[X Y Z]
What these groups mean is that (for the first group) the symbols, A, B, and C are related to each other. (The second group) The symbols A, D, E are related to each other.. and so forth.
Given all these data, I would need to put all the unique symbols into a one-dimensional array wherein the symbols which are somehow related to each other are placed closer to each other. Given the example above, the result should be something like:
[B C A D E X Y Z]
or
[X Y Z D E A B C]
In this resulting array, since the symbol A has multiple relationships (namely with B and C in one group and with D and E in another) it's now located between those symbols, somewhat preserving the relationship.
Note that the order is not important. In the result, X Y Z can be placed first or last since those symbols are not related to any other symbols. However, the closeness of the related symbols is what's important.
What I need help in
I need help in determining an algorithm that takes groups of symbol relationships, then outputs the one-dimensional array using the logic above. I'm pulling my hair out on how to do this since, with real data, the number of symbols in a relationship group can vary, there is no limit to the number of relationship groups, and a symbol can have relationships with any other symbol.
Further example
To further illustrate the trickiness of my dilemma, suppose you add another relationship group to the example above, say:
[C Z]
The result now should be something like:
[X Y Z C B A D E]
Notice that the symbols Z and C are now closer together since their relationship was reinforced by the additional data. All previous relationships are still retained in the result also.
The first thing you need to do is to precisely define the result you want.
You do this by defining how good a result is, so that you know which is the best one. Mathematically you do this by a cost function. In this case one would typically choose the sum of the distances between related elements, the sum of the squares of these distances, or the maximal distance. Then a list with a small value of the cost function is the desired result.
It is not clear whether in this case it is feasible to compute the best solution by some special method (maybe if you choose the maximal distance or the sum of the distances as the cost function).
In any case it should be easy to find a good approximation by standard methods.
A simple greedy approach would be to insert each element in the position where the resulting cost function for the whole list is minimal.
Once you have a good starting point you can try to improve it further by modifying the list towards better solutions, for example by swapping elements or rotating parts of the list (local search, hill climbing, simulated annealing, other).
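As a rough sketch of the greedy idea above (my own Python, assuming the cost function is the sum of distances between related symbols):

from itertools import combinations

def cost(order, groups):
    pos = {s: i for i, s in enumerate(order)}
    # sum of distances between already-placed related symbols
    return sum(abs(pos[a] - pos[b])
               for g in groups
               for a, b in combinations(set(g), 2)
               if a in pos and b in pos)

def greedy_order(groups):
    symbols = list(dict.fromkeys(s for g in groups for s in g))   # unique symbols
    order = []
    for s in symbols:
        # insert each symbol at the position where the partial cost stays smallest
        order = min((order[:i] + [s] + order[i:] for i in range(len(order) + 1)),
                    key=lambda cand: cost(cand, groups))
    return order   # a swap-based hill climb could then refine this further

groups = [["A", "B", "C"], ["A", "D", "E"], ["X", "Y", "Z"], ["C", "Z"]]
order = greedy_order(groups)
print(order, cost(order, groups))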
I think that with large amounts of data and a lack of additional criteria, it's going to be very difficult to make something that finds the best option. Have you considered a greedy algorithm (construct your solution incrementally in a way that gives you something close to the ideal solution)? Here's my idea:
Sort your sets of related symbols by size, and start with the largest one. Keep those all together, because without any other criteria, we might as well say their proximity is the most important since it's the biggest set. Consider every symbol in that first set an "endpoint", an endpoint being a symbol you can rearrange and put at either end of your array without damaging your proximity rule (everything in the first set is an endpoint initially because they can be rearranged in any way). Then go through your list and as soon as one set has one or more symbols in common with the first set, connect them appropriately. The symbols that you connected to each other are no longer considered endpoints, but everything else still is. Even if a bigger set only has one symbol in common, I'm going to guess that's better than smaller sets with more symbols in common, because this way, at least the bigger set stays together as opposed to possibly being split up if it was put in the array later than smaller sets.
I would go on like this, updating the list of endpoints so that you could continue making matches as you went through your sets. I would keep track of whether I stopped making matches, and in that case, I'd just go to the top of the list and tack on the next biggest unmatched set (it doesn't matter if there are no more matches to be made, so go with the most valuable/biggest association). Ditch the old endpoints, since they have no matches, and then all the symbols of the set you just tacked on are the new endpoints.
This may not have a good enough runtime, I'm not sure. But hopefully it gives you some ideas.
Edit: Obviously, as part of the algorithm, ditch duplicates (trivial).
The problem as described is essentially the problem of drawing a graph in one dimension.
Using the relationships, construct a graph. Treat the unique symbols as the vertices of the graph. Place an edge between any two vertices that co-occur in a relationship; more sophisticated would be to construct a weight based on the number of relationships in which the pair of symbols co-occur.
Algorithms for drawing graphs place well-connected vertices closer to one another, which is equivalent to placing related symbols near one another. Since only an ordering is needed, the symbols can just be ranked based on their positions in the drawing.
There are a lot of algorithms for drawing graphs. In this case, I'd go with Fiedler ordering, which orders the vertices using a particular eigenvector (the Fiedler vector) of the graph Laplacian. Fiedler ordering is straightforward, effective, and optimal in a well-defined mathematical sense.
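Since only the ordering of the vertices is needed, a Fiedler ordering can be sketched in a few lines (my own Python/NumPy sketch, not production code; a disconnected graph would need to be handled one connected component at a time):

from itertools import combinations
import numpy as np

def fiedler_order(groups):
    symbols = sorted({s for g in groups for s in g})
    index = {s: i for i, s in enumerate(symbols)}
    n = len(symbols)
    W = np.zeros((n, n))
    for g in groups:                          # edge weight = number of co-occurrences
        for a, b in combinations(set(g), 2):
            W[index[a], index[b]] += 1
            W[index[b], index[a]] += 1
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    _, eigvecs = np.linalg.eigh(L)            # symmetric matrix, so eigh
    fiedler = eigvecs[:, 1]                   # eigenvector of the 2nd-smallest eigenvalue
    return [symbols[i] for i in np.argsort(fiedler)]

print(fiedler_order([["A", "B", "C"], ["A", "D", "E"], ["X", "Y", "Z"], ["C", "Z"]]))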
It sounds like you want to do topological sorting: http://en.wikipedia.org/wiki/Topological_sorting
Regarding the initial ordering, it seems like you are trying to enforce some kind of stability condition, but it is not really clear to me what this should be from your question. Could you try to be a bit more precise in your description?

How to represent implicit relationships?

I am developing an application where I have to deal with an Entity named 'Skill'. Now the thing is that a 'Skill A' can have a certain relevancy with a 'Skill B' (the relevancy is used for search purposes). Similarly 'Skill B' can also be relevant to 'Skill C'. We currently have the following data model to represent this scenario
Skill {SkillId, SkillName}
RelevantSkill {SkillId, RelevantSkillId, RelevanceLevel}
Now given the above scenario we have an implicit relation between 'Skill A' and 'Skill C'. What would the optimal data model for this scenario be? We'd also have to traverse this hierarchy when performing a search.
What you're asking for seems to be basically a graph distance algorithm (slash data structure) computed from a set of pairwise distances. A reasonable (and nicely computable) metric is commute time.
It can be thought of thus: construct a graph where each node is a Skill, and each edge represents the relevancy of the nodes it connects to each other. Now imagine that you're starting at some node in the graph (some Skill) and randomly jumping to other nodes along defined edges. Let's say that the probability of jumping from Skill A to Skill B is proportional to the relevancy of those skills to each other (normalized by the relevancy of those to other skills ...). Now the commute time represents the average number of steps it takes to make it from Skill A to Skill C.
This has a very nice property that adding more paths between two nodes makes the commute time shorter: if Skill A and B, B and C, C and D, and D and A are related, then the commute time between A and C will get shorter yet. Moreover, commute time can be computed quite easily using an eigenvalue decomposition of your sparsely connected Skill graph (I think the reference I gave you shows this, but if not there are many available).
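A rough sketch of that computation (my own Python/NumPy, using the pseudoinverse of the graph Laplacian rather than an explicit eigendecomposition), assuming W is a symmetric matrix of relevance weights between skills:

import numpy as np

def commute_times(W):
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W          # graph Laplacian
    L_pinv = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse
    d = np.diag(L_pinv)
    volume = degrees.sum()            # vol(G): total degree of the graph
    # C[i, j] = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j])
    return volume * (d[:, None] + d[None, :] - 2 * L_pinv)

# toy example: skills A-B and B-C are explicitly relevant, A-C only indirectly
W = np.array([[0.0, 3.0, 0.0],
              [3.0, 0.0, 5.0],
              [0.0, 5.0, 0.0]])
print(commute_times(W))   # smaller commute time = more closely related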
If you want to actually store the commute time between any pair of Skills you'll need a fully-connected graph, or NxN matrix (N is the number of Skills). A far nicer variant, however, is to, as stated above, drop all connections weaker than some threshold, then store a sparsely connected graph as rows in a database.
Good luck, and I hope this helped!
Something left open by your explanation is how the relevance levels are combined in the case of the indirect ("implicit") relationships. E.g. if skill A is relevant to B with level 3 and skill B is relevant to skill C with level 5, what is the level (as a number) of the indirect relevance of skill A to skill C?
The proper data model depends on two things: how many skills you have, and how dense the relationship structure is (dense = lots of skills are relevant to others). If the relationship structure is dense and you have few skills (< 1000), you may be best off representing the whole thing as a matrix.
But if you have many skills but a sparse relationship structure you can represent it as three tables:
Skill {SkillId, SkillName}
RelevantSkill {SkillId, RelevantSkillId, RelevanceLevel}
IndirectRelevance { SkillId, RelevantSkillId, RelevanceLevel}
The third table (IndirectRelevance) is calculated based on the two primary tables; whenever you change Skill or RelevantSkill tables, you need to update the IndirectRelevance table.
I think it is better to have three tables than two; this makes the implementation cleaner and more straightforward. RelevantSkill contains the explicitly stated relationships; IndirectRelevance all derived facts.
Your best bet is to:
augment RelevantSkill with an ImplicitRelevance boolean column:
RelevantSkill {SkillId, RelevantSkillId, RelevanceLevel, ImplicitRelevance}
insert (into the RelevantSkill table) rows corresponding to all implicit (indirect) relevance relationships (e.g. "Skill A" -> "Skill C") with their corresponding computed RelevanceLevels (one way to compute them is sketched after this list), if and only if the computed RelevanceLevel is above a set threshold. These rows should have ImplicitRelevance set to true
skill_a_id, skill_b_id, computed_level, 'T'
If any changes are made to the explicit relevance levels (metrics), remove all rows with ImplicitRelevance=true and recompute (re-insert) them.
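Since the question leaves open how relevance levels combine along a path, here is one hedged sketch (my own Python, assuming levels are values in (0, 1] combined by multiplication, keeping the best path, Floyd-Warshall style):

def implicit_relevance(skills, explicit, threshold=0.2):
    # explicit: dict mapping (skill_a, skill_b) -> relevance level in (0, 1]
    best = dict(explicit)
    best.update({(b, a): lvl for (a, b), lvl in explicit.items()})   # treat as symmetric
    for k in skills:
        for i in skills:
            for j in skills:
                if i == j:
                    continue
                via = best.get((i, k), 0) * best.get((k, j), 0)
                if via > best.get((i, j), 0):
                    best[(i, j)] = via
    # keep only derived pairs whose computed level clears the threshold
    return {p: lvl for p, lvl in best.items()
            if p not in explicit and (p[1], p[0]) not in explicit and lvl >= threshold}

print(implicit_relevance(["A", "B", "C"], {("A", "B"): 0.6, ("B", "C"): 0.5}))
# {('A', 'C'): 0.3, ('C', 'A'): 0.3}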
There are some factors to consider before you can choose the best option:
how many skills are there?
are relations sparse or dense (i.e. are skills related with a lot of other skills)?
how often do they change?
is there a relevancy threshold (minimal relevancy that is of interest to you)?
how is multi-path relevancy calculated?
The structure obviously will be like antti.huima proposes. The difference is in how IndirectRelevance will be implemented. If there are a lot of changes, a lot of relations, and the relations are dense, then the best way might be a stored procedure (perhaps accessed through a view). If the relations are sparse and there is a threshold, the best option might be a materialized view or a table updated via triggers.
