[BASE]
/ \ \
C1 C2 C3
/\ \
C4 C5 C6
I have a tree like the above. This is a N child tree which is not balanced. The problem is, I need to select one of the node based on some condition. Like
Select C1 when k1 = a
Select C4 when K1 = a and K2=b and K3=C
Select C5 when k1 = a and k'=z
Select C2 when K'' = b
Select C3 when k5 = 9
Select C6 when k5=9 and k6 = 10
The input to the program would be an arbitraty length of key value pairs like if input is -k1=a,k2=b,k3=c,k8=10 - I should select C4 as that is the best match.
Ideally I was thinking of traversing the tree and for each node, there is a selection criteria which I can match against the input set. But soon I figured out, this tree can be very huge and Base node can have tens of thousands of child nodes under it. So it might not be a good idea to go node by node. If there is a way to select the nodes more efficiently, I would love to know that.
Looks like your k's are pointing to directory structure and the leaf of this structure (exactly one leaf for each directory) is the node you are looking for. You can keep this string in node as another value. What is not clear in question is how are the k's related to the tree
for e.g.
a->c1
a/b/c->c4
I have found a workable solution like this one
----------------------------------------
|rowId|param1|param2|param3|param4|node|
----------------------------------------
|10 | a | | | | C1 |
----------------------------------------
|14 | a | b | c | | C4 |
----------------------------------------
|18 | a | b | | | C5 |
----------------------------------------
Lets call it a condition table. Each column represent the input series (k) and for different combinations of the value, there is a node to be selected. This table can be think of an in memory data structure or a real table in RDBMS.
Related
I have a table like this:
| A | B |
|---|---|
| a | 5 | <- max, should be red
| b | 1 | <- min, should be green
| c | 0 | <- zero, should not count
| d | 1 | <- min, should be green
| e | 3 |
| f | 5 | <- max, should be red
| g | 4 |
| h | 0 | <- zero, should not count
The objective is to get the maximum values formatted red and the minimum values green. The cells with the value 0 should not count (as minimum value).
I tried conditional formatting with following rules:
Condition 1
Formula is: MAX(E2:E40)
Cell Style: max
Condition 2
Formula is: MINIFS(E2:E40;E2:E40;">0")
Cell Style: max
But the result is, that all cells with value > 0 get marked red.
How to mark the greatest and the lowest values in a column and ignore the cells with a defined value?
The trick with conditional formatting is that the current cell is referenced by the first cell, not by a range of cells. That is, E2 refers to the current cell, applying to E3, E4 and so on throughout the conditionally formatted range.
References in formulas change for each cell unless they are fixed with $, so in the formula below, $E2 is used to fix the reference to column E (because the value is in column E even when we're formatting column D) but lets the reference to row 2 change for each row that needs to be formatted. In contrast, the range to check for min and max values should not change no matter what the current cell, so that is $E$2:$E$40.
Anyway, whether you followed that explanation or not, here are the two formulas.
$E2 = MAX($E$2:$E$40)
$E2 = MINIFS($E$2:$E$40;$E$2:$E$40;">0")
I have a timetable like this:
+-----------+-------------+------------+------------+------------+------------+-------+----+
| transport | trainnumber | departcity | arrivecity | departtime | arrivetime | price | id |
+-----------+-------------+------------+------------+------------+------------+-------+----+
| Q | Q00 | BJ | TJ | 13:00:00 | 15:00:00 | 10 | 1 |
| Q | Q01 | BJ | TJ | 18:00:00 | 20:00:00 | 10 | 2 |
| Q | Q02 | TJ | BJ | 16:00:00 | 18:00:00 | 10 | 3 |
| Q | Q03 | TJ | BJ | 21:00:00 | 23:00:00 | 10 | 4 |
| Q | Q04 | HA | DL | 06:00:00 | 11:00:00 | 50 | 5 |
| Q | Q05 | HA | DL | 14:00:00 | 19:00:00 | 50 | 6 |
| Q | Q06 | HA | DL | 18:00:00 | 23:00:00 | 50 | 7 |
| Q | Q07 | DL | HA | 07:00:00 | 12:00:00 | 50 | 8 |
| Q | Q08 | DL | HA | 15:00:00 | 20:00:00 | 50 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ...|
+-----------+-------------+------------+------------+------------+------------+-------+----+
In this table, there 13 cities and 116 routes altogether and the smallest unit of time is half an hour.
There are difference transports, which doesn't matter. As you can see, there can be multiple edges with same departcity and arrivecity but difference time and difference price. The time is constant everyday.
Now, here arises a problem.
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends on whether the user wants it to be, that is, there are two problems), within X hours and also least costs under above conditions.
Before this problem, I have solved another simpler problem.
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends), with least costs under above conditions.
Here is how I solve it (just take not in order as an example):
Sort the must-pass cities:C1, C2, C3...Cn. Let C0 = A, C(n+1) = B, minCost.cost = INFINITE;
i = 0, j = 1, W = {};
Find a least cost way S from Ci to Cj using Dijkstra Algorithm with price as the weight of edges. W=W∪S;
i = i + 1, j = j + 1;
If j <= n + 1, goto 3;
if W.cost < minCost.cost, minCost = W;
If next permutation for C1...Cn exists, rearrange list C1...Cn in order of the next permutation for C1...Cn and goto 2;
Return minCost;
However, I cannot come up with a efficient solution to the first problem, Please help me, thanks.
I'll be appreciated if anyone can solve another problem:
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends), within least time under above conditions.
It's quite a big problem, so I will just sketch a solution.
First, remodel your graph as follows. Instead of each vertex representing a city, let a vertex represent a tuple of (city, time). This is feasible as there are only 13 cities and only (time_limit - current_time) * 2 possible time slots as the smallest unit of time is half an hour. Now connect vertices according to the given timetable with prices as their weights as before. Don't forget that the user can stay at any city for any amount of time for free. All nodes with city A are start nodes, all nodes with city B are target nodes. Take the minimum value of all (B, time) vertices to get the solution with least cost. If there are multiple, take the one with the smallest time.
Now on towards forcing the user to pass through certain cities in order. If there are n cities to pass through (plus start and target city), you need n+2 copies of the same graph which act as different levels. The level represents how many cities of your list you have already passed. So you start in level 0 on vertex A. Once you get to C1 in level 0 you move to the vertex C1 in level 1 of the graph (connect the vertices by 0-weight edges). This means that when you are in level k, you have already passed cities C1 to Ck and you can get to the next level only by going through C(k+1). The vertices of city B in the last level are your target nodes.
Note: I said copies of the same graph, but that is not exactly true. You can't allow the user to reach C(k+2), ..., B in level k, that would violate the required order.
To enforce passing cities in any order, a different scheme of connecting the levels (and modifying them during runtime) is required. I'll leave this to you.
I have a case, where I need to do multiple joins(lookups) like below query.Sample scenario was given.
I have around 200 CAT_CODE. I thought few solutions and I listed it down as cases.Is there is any different way to write a sql query to have better performance? or any better approach in ETL tool?
Primary Table(PRIM):
NUM CAT1_CODE CAT2_CODE CAT3_CODE
A 1 y q
B 2 e a
C 3 s z
Secondary Table(LOV):
CATEGORY COLUMN_LKP EXT_CODE
CAT1_CODE 1 AB
CAT1_CODE 2 CD
CAT1_CODE 3 HI
CAT2_CODE y JL
CAT2_CODE e QD
CAT2_CODE s AH
CAT3_CODE q CD
CAT3_CODE a MS
CAT3_CODE z EJ
CASE-1: Through SQL:
I have written a simple query to accomplish this task. Do you think, this would be right approach? Any other ways, to improve this query? Right now, I'm using both Oracle and Postgres.
SELECT
NUM,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT1_CODE' AND COLUMN_LKP=A.CAT1_CODE) CAT1,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT2_CODE' AND COLUMN_LKP=A.CAT2_CODE) CAT2,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT3_CODE' AND COLUMN_LKP=A.CAT3_CODE) CAT3
FROM
TEST_PRIM A
REQUIRED OUTPUT:
NUM CAT1 CAT2 CAT3
A AB JL CD
B CD QD MS
C HI AH EJ
CASE-2: ETL:
Same case can be accomplished through ETL. We need to use lookups to get that done.
Scenario-1:
LOV(CAT1_CODE) LOV(CAT2_CODE) LOV(CAT3_CODE)
| | |
| | |
PRIM---->LOOKUP---------->LOOKUP------------>LOOKUP-------->TARGET
I don't think, that would be right approach. We have 200 codes, we cannot use 200 lookup. Is there is any better approach to handle that in ETL(Datastage, Talend, BODS)with better performance?
Scenario-2:
Pivoting PRIM(converting CAT1_CODE,CAT2_CODE,CAT3_CODE columns in to rows) like below and doing one lookup.But pivoting will take much time, because we have data around 600 million and 200 columns.
NUM CATGRY CODE
A CAT1_CODE 1
A CAT1_CODE y
A CAT1_CODE q
B CAT2_CODE 2
B CAT2_CODE e
B CAT2_CODE a
C CAT3_CODE 3
C CAT3_CODE s
C CAT3_CODE z
Kindly suggest me some best way to handle this approach.It can be through ETL or through sql. Thanks in advance.
You can use the LATERAL keyword to do the magic that you are looking for.
The following code could help:
SELECT
NUM,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT1_CODE') AS CAT1,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT2_CODE') AS CAT2,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT3_CODE') AS CAT3
FROM TEST_PRIM a
CROSS JOIN LATERAL (
SELECT *
FROM TEST_LOV b
WHERE
(a.CAT1_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT1_CODE')
OR (a.CAT2_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT2_CODE')
OR (a.CAT3_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT3_CODE')
) c
GROUP BY NUM
ORDER BY NUM;
Output
num | cat1 | cat2 | cat3
-----+------+------+------
A | AB | JL | CD
B | CD | QD | MS
C | HI | AH | EJ
This question is about algorithms and thus language-independent.
Given the following rows:
A1, B1, C1, D1 (1)
A1, B2, C1, D1 (2)
A2, B1, C1, D1 (3)
A2, B2, C1, D1 (4)
A3, B1, C1, D1 (5)
A3, B2, C1, D1 (6)
A1, B1, C2, D1 (7)
They can be factored as follow:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| A2 | B2 | | |
| A3 | | | |
+----+----+----+----+
| A1 | B1 | C2 | D1 |
+----+----+----+----+
The following objects can store those data:
class ExpandedRow {
String a;
String b;
String c;
String d;
}
class FactoredRow {
List<String> as;
List<String> bs;
List<String> cs;
List<String> ds;
}
Concerning the transformations algorithms, the factored --> expanded one is quite easy:
List<FactoredRow> factoredRows = fill();
List<ExpandedRow> expandedRows = empty();
for each factoredRow in factoredRows {
for each a in factoredRow.as {
for each b in factoredRow.bs {
for each c in factoredRow.cs {
for each d in factoredRow.ds {
expandedRows.add(new ExpandedRow(a, b, c, d));
}
}
}
}
}
But I'm lost concerning the expanded --> factored one. How can I factorize a List<ExpandedRow> into a List<FactoredRow>?
In other words, I have the factored table as input. I expand it using the provided algorithm and store it in its expanded state. The question is: how to retrieve the initial factored state after having expanding it?
I thought that if two expanded rows have only one attribute that differs, they can be factored, for example A1, B1, C1, D1 (1) and A1, B1, C2, D1 (2). But if we factorize those two rows together, we will end with:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| | | C2 | |
+----+----+----+----+
| A1 | B2 | C1 | D1 |
| A2 | | | |
| A3 | | | |
+----+----+----+----+
| A2 | B1 | C1 | D1 |
| A3 | | | |
+----+----+----+----+
Which is less factored than the initial table.
It's seems that there are many factored solutions, and the main issue is to define and to find the most factored one.
This problem seems something like a graph partitioning problem. I suspect it's NP-hard but I haven't been able to prove it yet.
Let's take a simpler example to see what's going on. Consider the pairs (A1,B1), (A2,B1), (A3,B1), (A2,B2). We represent the points as points in 2D-space, and connect points if it is possible to move from one to the other by a translation parallel to the x- or y-axis:
(A2,B2)
|
(A1,B1) -- (A2,B1) -- (A3,B1)
The idea is to partition the graph by lines parallel to the axes, and repartition each partition, and so on, until we get pieces that are complete rectangles, line segments, or points.
There are two esssentially different ways of partitioning the graph above. We can draw a vertical line at position x=1.5:
(A2,B2)
|
(A1,B1) (A2,B1) -- (A3,B1)
after which the right-side piece needs to be further partitioned (by a vertical or horizontal line, let's take horizontal):
(A2,B2)
(A1,B1) (A2,B1) -- (A3,B1)
We have now factored the original list into
A1 B1
-----
A2 B2
-----
A2 B1
A3
On the other hand, if we had made our initial partition with a horizontal line at position y=1.5, we would have
(A2,B2)
(A1,B1) -- (A2,B1) -- (A3,B1)
which is already nicely factored into a point and a line segment:
A2 B2
-----
A1 B1
A2
A3
In higher dimensions (4D for letters A, B, C, D) we have a similar problem, except that there are correspondingly more choices for initial cuts, and the allowed final pieces are higher-dimesional (not just points, line segments, and rectangles but also 3D and 4D boxes).
The problem feels NP-hard to me, just like many other graph partitioning problems, but there are probably reasonably fast approximation algorithms.
i am trying to understand how shortest job first algorithm works, am i doing this in the right way please help
Proc Burst1 Burst2
+------+---------+--------+
| A | 10 | 5 |
| B | 3 | 9 |
| C | 8 | 11 |
+------+---------+--------+
B1->3->C1->11->B2->20->A1->30->A2->35->C2->46
"Shortest job first" is not really an algorithm, but a strategy: among the jobs ready to execute always choose the job with the shortest execution time. Your sequence looks ok. In the beginning the following jobs are ready for execution (with execution time in parenthesis):
A1(10), B1(3), C1(8)
So B1 is chosen, after which also job B2 is ready to execute, so here is the updated list of ready jobs:
A1(10), B2(9), C1(8)
Now C1 is chosen, and so on.
There are variants of the strategy "shortest job first", where the total time over all bursts, i.e. A1 + A2, B1 + B2, ..., is taken into account. Then the chosen sequence would be:
B1, B2, A1, A2, C1, C2