Representation of parent-child relationship in java - algorithm

I have a set of tables that are related (parent child relationships).
I need a solution wherein I can quickly find whether two tables are related.
If they are related, I also need to find out whether the relationship is parent-child or child-parent.
My solution:
Store the relationship details in the form of a matrix.
Say there are three tables T1, T2 and T3. T1 has two children T2 and T3.
Then I can represent the relationship as
{{0,1,1},
{-1,0,0},
{-1,0,0}}
The first row and first column represent T1.
The second row and second column represent T2.
The third row and third column represent T3.
To find the relationship between T1 and T2 you go to the first row and second column. The value is 1. This shows that T1 is the parent and T2 is the child.
A -1 would indicate that the first table is the child and the second table is the parent.
A 0 would indicate that the two tables are not related.
Is there a better solution to this problem?
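For illustration, here is the matrix lookup as a minimal sketch (Python for brevity; the question's Java version would be a plain int[][] plus a Map from table name to index). The names TABLES, REL and relationship are placeholders, not from the question:

```python
# Layout from the question: T1 is the parent of T2 and T3.
TABLES = {"T1": 0, "T2": 1, "T3": 2}  # table name -> matrix index

# REL[i][j] ==  1 -> table i is the parent of table j
# REL[i][j] == -1 -> table i is the child of table j
# REL[i][j] ==  0 -> the two tables are not related
REL = [
    [0, 1, 1],
    [-1, 0, 0],
    [-1, 0, 0],
]

def relationship(a: str, b: str) -> str:
    v = REL[TABLES[a]][TABLES[b]]
    return {1: "parent-child", -1: "child-parent", 0: "unrelated"}[v]
```

As for a better solution: if most table pairs are unrelated, a map from each parent to its set of children stores only the actual edges and still answers lookups in O(1); and if "related" should also cover grandparent chains, you would need the transitive closure of this matrix rather than the matrix itself.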

Related

How to concatenate two B-Trees ((d,2d) trees) with another node x

Assume two B-trees T1 and T2 are given, along with a key x. I know that the leaves of each tree have between 2d-1 and 4d-1 keys, that all the keys in T1 are greater than x, and that all the keys in T2 are smaller than x.
What I am trying to do is create a single B-tree out of these trees. After trying the naive approach of simply inserting all the keys of the smaller tree, I tried the following:
Check if height(T1) > height(T2) (let's assume this is true; if it's false, the procedure is symmetrical).
On T1, I go all the way down-left to height height(T2)+1 and insert x as the leftmost key in the leftmost node at that height.
I insert T2 as the left child of x.
I know that inserting x isn't a problem, as I might just need a sequence of splits back up to the root. The problem with my solution is that T2's root might have only 1 key (or any number of keys smaller than d-1), which breaks the B-tree rule for internal nodes once that root is no longer a root.
Is there any other solution for this?
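To make the splice point concrete, here is a minimal sketch with rebalancing deliberately omitted: it descends T1's left spine to the node at height height(T2)+1 and prepends x and T2 there. Node, height, inorder and join are hypothetical stand-ins, not a full B-tree implementation:

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # no children -> leaf

def height(node):
    # Height along the leftmost spine (all leaves sit at the same depth).
    h = 0
    while node.children:
        node = node.children[0]
        h += 1
    return h

def inorder(node):
    # Sorted key sequence, used to check the join preserved order.
    if not node.children:
        return list(node.keys)
    out = []
    for i, k in enumerate(node.keys):
        out += inorder(node.children[i]) + [k]
    return out + inorder(node.children[-1])

def join(t1, t2, x):
    """All keys of t2 < x < all keys of t1, height(t1) >= height(t2).
    Splices x and t2 into t1; splitting an overfull node is omitted."""
    h1, h2 = height(t1), height(t2)
    if h1 == h2:
        return Node([x], [t2, t1])   # new root, height grows by one
    node = t1
    for _ in range(h1 - h2 - 1):     # stop at height h2 + 1
        node = node.children[0]
    node.keys.insert(0, x)           # x becomes the leftmost key ...
    node.children.insert(0, t2)      # ... with t2 as its left child
    return t1
```

After the splice, the usual machinery applies: if the spliced node overflows, split it and propagate upward. For the case you describe, where T2's root has fewer than d-1 keys, one standard fix (as I understand it) is to treat it like a deletion underflow: let T2's root borrow from, or merge with, its new right sibling through the separator x.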

Difference between natural join and simple join on common attribute in algebra

I have a confusion.
Suppose there are two relations with a common attribute A.
Now, is
(R natural join S) = (R join S with join condition R.A = S.A)?
A natural join returns a single common column A.
Does a simple join return two columns with the same name (A, A), or one common column A, given that relational algebra is defined in set theory?
There's an example of a Natural Join here. As @Renzo says, there are many variants. And SQL is different again. So I'll keep to what Wikipedia shows.
Most important: the join condition applies to all attributes in common between the two arguments. So you need to say "two relations with A being their only common attribute". The only common attribute is DeptName in that wikipedia example. There can be many common attributes, in general.
Yes, joining means forming tuples in the result by pairing tuples from the arguments that have the same values in the corresponding common attributes. So you have the same value under the same attribute name. It would be pointless to repeat both attributes in the result, because you'd be repeating the values. The example shows there's a single attribute DeptName in the result.
Beware that different dialects of Relational Algebra use different symbols and notations. So the bare bowtie (⋈) for Natural Join can be suffixed with a boolean condition, making a theta-join (θ-join) or equi-join -- see that example. The boolean condition is between differently-named attributes, and might use any comparison operator. So both attribute names and their values appear in the result.
Set theory operations apply because each tuple is a set of name-value pairs. The result tuples are the union of a tuple from each argument -- providing that union is a valid tuple. That is, providing same-named n-v pairs have same value.
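That last rule, result tuples as the union of name-value pairs that agree on every common attribute, can be sketched directly with relations as lists of dicts. The emp/dept rows are made-up stand-ins loosely following the wikipedia example:

```python
def natural_join(r, s):
    """Pair tuples that carry the same value for every attribute the
    two relations share; each result tuple is the union of the pair,
    so a common attribute appears exactly once."""
    out = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})  # union of name-value pairs
    return out

emp = [{"Name": "Smith", "DeptName": "Sales"},
       {"Name": "Jones", "DeptName": "IT"}]
dept = [{"DeptName": "Sales", "Manager": "Adams"}]
```

natural_join(emp, dept) yields one tuple with a single DeptName column; with no common attributes the condition is vacuously true and the join degenerates to the cross product, matching the set-theoretic definition above.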

An efficient way to assign user_ids to huge dataset under certain conditions

I have a dataset containing pairs of tx_ids and node_ids, where every tx_id is associated with one or more node_ids.
The node_ids that are connected to the same tx_id belong to the same user.
And if the same node_id is connected to different tx_ids then all nodes associated with these tx_ids belong to the same user as well.
Take a look at the following small sample of the dataset:
tx_id node_id user_id
1 a 1
1 d 1
2 d 1
2 g 1
3 g 1
3 e 1
4 c 2
4 f 2
For example, nodes {a,d} belong to the same user as they appear with the same tx_id. Additionally, {d} is connected to tx_ids = {1, 2} then {a, d, g} all belong to the same user. But {g} appears in tx_ids = {2, 3}, that means all nodes in tx_ids = {1,2,3} belong to the same user (as illustrated above).
Let's put it this way: tx_id = transaction_id, and node_id = bank account.
A user may have multiple bank accounts, and a bank account belongs to one and only one user. A user can also originate a single transaction from different accounts (in my situation).
So in the above example, for tx_id = 1, User_1 used the two accounts {a,d}, which means that any transaction using account a or d belongs to User_1; consequently tx_id = 2 belongs to User_1, since it contains account d, which appeared in tx_id = 1. I want to create a new table which has tx_id, node_id, user_id (a new integer value, not continuous and not unique).
The problem is that in my dataset the user_ids are not assigned to nodes and I have a huge dataset of 400M records. I am looking for an efficient way to solve this, given that my dataset is stored in PostgreSQL database. If it is possible to solve this by SQL queries, that would be great, otherwise, any suggestion in any programming language is appreciated.
Thanks in advance.
Use a Python dictionary as a lookup table storing node_ids and their corresponding user_ids. Retrieve the (tx_id, node_id) list ordered by tx_id; if a node_id appears with two tx_ids, the later tx will find the node_id already stored in the dictionary and reuse its user_id.
This is a union-find partitioning problem; the question is how to unite the sets (transactions, in your case) when they share a node_id.
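A sketch of that union-find suggestion on the sample rows above: merging each tx with its nodes puts transactions that share a node into one component, and each component then gets one integer user_id. (For 400M rows you would add union by rank and process in batches, or push the grouping into the database, but the structure is the same.)

```python
class DSU:
    """Minimal union-find with path halving."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

# Sample (tx_id, node_id) pairs from the question:
pairs = [(1, "a"), (1, "d"), (2, "d"), (2, "g"),
         (3, "g"), (3, "e"), (4, "c"), (4, "f")]

dsu = DSU()
for tx, node in pairs:
    # A tx and its nodes end up in one set; a shared node
    # transitively merges the tx sets, as described above.
    dsu.union(("tx", tx), ("node", node))

# Assign one integer user_id per connected component:
user_of = {}
result = []
for tx, node in pairs:
    root = dsu.find(("tx", tx))
    uid = user_of.setdefault(root, len(user_of) + 1)
    result.append((tx, node, uid))
```

Running this reproduces the user_id column of the sample table: tx_ids 1-3 collapse into one user and tx_id 4 into another.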

Oracle selectivity / cardinality

According to this:
Selectivity is a value between 0 and 1, and it is the fraction of rows returned after applying a filter on the table. For example, if a table has 10,000 rows and the query returns 2,601 rows, the selectivity would be 2601/10000, or .26, or 26 percent. Selectivity enables you (or the optimizer, for that matter) to decide which data access method is optimal in the execution plan.
I need some clarification: OK, that table has 10,000 rows and only 2,601 are returned by the query. But what if the query block contains three joined tables, or a subquery in the WHERE clause? So there are three tables in the FROM clause, and a fourth table in a WHERE-clause subquery; how is the selectivity calculated then?
Selectivity = number of rows satisfying a condition (from which table?) / total number of rows (from all four tables?)
Same question for cardinality (cardinality = selectivity * total number of rows).
I found many articles about this, but each of them exemplifies these concepts with simple select statements, based on a single table or a single where clause condition.
Can someone give me an example of how are these measures calculated in case of a bit more complex query (on "hr" schema, or other training purpose schema), meaning subqueries in the FROM clause, or WHERE clause, and so on?
Thank you.
EDIT:
I need some clarification about the selectivity measure, computed by the Estimator (Cost-Based Optimizer).
http://gerardnico.com/wiki/database/oracle/selectivity
For example, for an equality predicate (last_name = 'Smith'), selectivity is set to the reciprocal of the number n of distinct values of last_name, because the query selects rows that all contain one out of n distinct values.
I don't know how to understand that "reciprocal of the number n of distinct values".
Assuming the employees table has 107 rows, and the query
select * from employees where last_name = 'Smith'
returns 2 rows, is the selectivity 2/107 ≈ 0.019? So it's the number of rows satisfying the predicate divided by the total number of rows, with no "distinct" involved in this equation.
Apart from this statement selectivity, there is also a column selectivity, represented by the NDV (the number of distinct values in that column, which can be queried from dba_tab_col_statistics) divided by the total number of rows (http://www.runningoracle.com/product_info.php?products_id=233). So if the NDV is 103, the selectivity of the last_name column is 103/107 ≈ 0.96.
This is what I understood.. is this correct? If I'm wrong, please correct me.
Thank you.
Selectivity is always based on whatever criteria are being applied at that time.
Exactly what this means for a particular table depends on the join order.
Oracle will always start executing a query by selecting rows from one particular table on its own. In this case the selectivity is straightforward, as per the examples you have read; there are no join conditions to take into account at this point.
Next, it joins in a second table. Oracle estimates how many rows will satisfy both the constant conditions directly on that table and any join conditions from the first table. The latter is called "join selectivity".
Then, when joining the third table it estimates based on joining to the first two tables as well as any constant conditions.
This is one of the reasons that join order is so important to a plan.
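Putting numbers to the two notions of selectivity discussed in the question (values taken from the quoted examples; the 1/NDV figure is the optimizer's uniform-distribution estimate when no histogram is available):

```python
# Statement selectivity: fraction of rows a filter actually keeps.
rows, returned = 10_000, 2_601
selectivity = returned / rows            # 0.2601, i.e. about 26 percent

# Optimizer estimate for an equality predicate (last_name = 'Smith'):
# "reciprocal of the number of distinct values" means 1/NDV.
ndv_last_name = 103                      # distinct last_names among 107 rows
est_selectivity = 1 / ndv_last_name      # ~0.0097
est_cardinality = est_selectivity * 107  # ~1.04 rows expected

# Actual selectivity of a predicate returning 2 of 107 rows:
actual = 2 / 107                         # ~0.019 (not 0.01)
```

For a join predicate such as t1.a = t2.b, a commonly cited estimate is 1/max(NDV(t1.a), NDV(t2.b)), applied on top of each table's single-table selectivities; the exact formulas vary with the Oracle version and with the presence of histograms, which is part of why the articles you found stop at the single-table case.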

Does a greedy approach work here?

Suppose there are N groups of people and M tables. We know the size of each group and the capacity of each table. How do we match the people to the tables such that no two persons of the same group sit at the same table?
Does a greedy approach work for this problem? (The greedy approach works as follows: for each table, try to "fill" it with people from different groups.)
Assuming the groups and tables can be of unequal size, I don't think the greedy approach as described works (at least not without additional specifications). Suppose you have a table T1 of capacity 2 and a table T2 of capacity 3, and 3 groups {A1}, {B1,B2} and {C1,C2}. If I follow your algorithm, T1 will receive {A1,B1}, and you are left with T2 and {B2,C1,C2}, which doesn't work (C1 and C2 cannot share a table). Yet there is a solution: T1 = {B1,C1}, T2 = {A1,B2,C2}.
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Mathias:
I suspect the following greedy approach works: starting with the largest group, take each group and allocate one person of that group per table, picking first tables with the most free seats.
Indeed. And a small variation of tkleczek's argument proves it.
Suppose there is a solution. We have to prove that the algorithm finds a solution in this case.
This is vacuously true if the number of groups is 0.
For the induction step, we have to show that if there is any solution, there is one where one member of the largest group sits at each of the (size of largest group) largest tables.
Condition L: for all pairs (T1,T2) of tables, if T1 is smaller than T2 and a member of the largest group sits at T1, then a member of the largest group also sits at T2.
Let S1 be a solution. If S1 fulfills L we're done. Otherwise there is a pair (T1,T2) of tables with T1 < T2 such that a member of the largest group sits at T1 but no member of the largest group sits at T2.
Since T2 > T1, there is a group which has a member sitting at T2, but none at T1 (or there is a free place at T2). So these two can swap seats (or the member of the largest group can move to the free place at T2) and we obtain a solution S2 with fewer pairs of tables violating L. Since there's only a finite number of tables, after finitely many steps we have found a solution Sk satisfying L.
Induction hypothesis: For all constellations of N groups and all numbers M of tables, if there is a solution, the algorithm will find a solution.
Now consider a constellation of (N+1) groups and M tables where a solution exists. By the above, there is also a solution where the members of the largest group are placed according to the algorithm. Place them so. This reduces the problem to a solvable constellation of N groups and M' tables, which is solved by the algorithm per the induction hypothesis.
The following greedy approach works:
Repeat the following steps until there is no seat left:
Pick the largest group and the largest table
Match one person from the chosen group to the chosen table
Reduce group size and table size by 1.
Proof:
We just have to prove that after performing one step we can still reach an optimal solution.
Let's call any member of the largest group a cool guy.
Suppose there is a different optimal solution in which no cool guy sits at the largest table. Pick any person sitting at the largest table in that solution and call him the lame guy.
The lame guy belongs to a group no larger than the cool group, so there is another table at which a cool guy sits but no member of the lame guy's group. We can then safely swap the seats of the lame guy and that cool guy, which again yields an optimal solution.
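The greedy rule proved above can be sketched as follows, taking the largest remaining group first and seating its members at the tables with the most free seats (equivalent to repeatedly pairing the largest group with the largest table); seat is a hypothetical helper name:

```python
import heapq

def seat(groups, tables):
    """groups: name -> size; tables: name -> capacity.
    Returns table name -> list of group names (one member each),
    or None when some group has more members than usable tables."""
    seating = {t: [] for t in tables}
    free = [(-cap, t) for t, cap in tables.items() if cap > 0]
    heapq.heapify(free)                      # max-heap on free seats
    for g, size in sorted(groups.items(), key=lambda kv: -kv[1]):
        if size > len(free):
            return None                      # pigeonhole: impossible
        taken = [heapq.heappop(free) for _ in range(size)]
        for neg, t in taken:
            seating[t].append(g)             # one member of g per table
            if neg + 1 < 0:                  # seats left at this table?
                heapq.heappush(free, (neg + 1, t))
    return seating
```

On Mathias's counterexample (tables of capacity 2 and 3, groups of sizes 1, 2 and 2) this yields T1 = {B, C} and T2 = {A, B, C}, the solution the naive per-table greedy missed.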
