I have a case, where I need to do multiple joins(lookups) like below query.Sample scenario was given.
I have around 200 CAT_CODE. I thought few solutions and I listed it down as cases.Is there is any different way to write a sql query to have better performance? or any better approach in ETL tool?
Primary Table(PRIM):
NUM CAT1_CODE CAT2_CODE CAT3_CODE
A 1 y q
B 2 e a
C 3 s z
Secondary Table(LOV):
CATEGORY COLUMN_LKP EXT_CODE
CAT1_CODE 1 AB
CAT1_CODE 2 CD
CAT1_CODE 3 HI
CAT2_CODE y JL
CAT2_CODE e QD
CAT2_CODE s AH
CAT3_CODE q CD
CAT3_CODE a MS
CAT3_CODE z EJ
CASE-1: Through SQL:
I have written a simple query to accomplish this task. Do you think, this would be right approach? Any other ways, to improve this query? Right now, I'm using both Oracle and Postgres.
SELECT
NUM,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT1_CODE' AND COLUMN_LKP=A.CAT1_CODE) CAT1,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT2_CODE' AND COLUMN_LKP=A.CAT2_CODE) CAT2,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT3_CODE' AND COLUMN_LKP=A.CAT3_CODE) CAT3
FROM
TEST_PRIM A
REQUIRED OUTPUT:
NUM CAT1 CAT2 CAT3
A AB JL CD
B CD QD MS
C HI AH EJ
CASE-2: ETL:
Same case can be accomplished through ETL. We need to use lookups to get that done.
Scenario-1:
LOV(CAT1_CODE) LOV(CAT2_CODE) LOV(CAT3_CODE)
| | |
| | |
PRIM---->LOOKUP---------->LOOKUP------------>LOOKUP-------->TARGET
I don't think, that would be right approach. We have 200 codes, we cannot use 200 lookup. Is there is any better approach to handle that in ETL(Datastage, Talend, BODS)with better performance?
Scenario-2:
Pivoting PRIM(converting CAT1_CODE,CAT2_CODE,CAT3_CODE columns in to rows) like below and doing one lookup.But pivoting will take much time, because we have data around 600 million and 200 columns.
NUM CATGRY CODE
A CAT1_CODE 1
A CAT1_CODE y
A CAT1_CODE q
B CAT2_CODE 2
B CAT2_CODE e
B CAT2_CODE a
C CAT3_CODE 3
C CAT3_CODE s
C CAT3_CODE z
Kindly suggest me some best way to handle this approach.It can be through ETL or through sql. Thanks in advance.
You can use the LATERAL keyword to do the magic that you are looking for.
The following code could help:
SELECT
NUM,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT1_CODE') AS CAT1,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT2_CODE') AS CAT2,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT3_CODE') AS CAT3
FROM TEST_PRIM a
CROSS JOIN LATERAL (
SELECT *
FROM TEST_LOV b
WHERE
(a.CAT1_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT1_CODE')
OR (a.CAT2_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT2_CODE')
OR (a.CAT3_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT3_CODE')
) c
GROUP BY NUM
ORDER BY NUM;
Output
num | cat1 | cat2 | cat3
-----+------+------+------
A | AB | JL | CD
B | CD | QD | MS
C | HI | AH | EJ
Related
i'm trying to do two calculations in one procedure in pl/sql. The simple version would look like
select a1+a2
into a
from table c;
select b1+b2
into b
from table c;
...
in the end i would like to further calculate the results of a and b, so let's say
select (a*b)
into c
from x
insert
into x (A,B,C)
values (a,b,c)
would that be possible? In case I split the two into two procedures (to first calculate a, b and then a second procedure to calculate c) i got results like
A | B | C
a | b | null
null | null | c
what i want to get is
A | B | C
a | b | c
I'm using Oracle SQL and i need help with a query. Hope it's not too much easy one. I did't find an answer for it in Google.
I have a table that need to be aggregated by ID column and then to select only the records that two values are included in a certain table (and both of them).
Table for example
ID | Value
1 | Y
1 | N
2 | N
2 | N
2 | Y
3 | Y
3 | Y
4 | Y
5 | Y
5 | N
5 | Y
5 | N
The output table need to include only the IDs that both Y and N are included in Value table. Output:
ID
1
2
5
Another solution that groups by the ID and uses HAVING to return only those with > 1 DISTINCT values:
with v_data(id, value) as (
select 1, 'Y' from dual union all
select 1, 'N' from dual union all
select 2, 'Y' from dual)
select id
from v_data
group by id
having count(distinct value) > 1
select distinct a.id
from your_table a inner join your_table b on a.id = b.id and a.value != b.value;
Lets say we have a car rental company and that we have multiple cars of the same type. When a user books a car he doesn’t book a specific car but one of its type.
Which algorithm is the best to display/arrange/sort the bookings on a timeline table, looking like this:
/*
* | 09 - 10 - 11 - 12 - 13 - 14 - 15 - 16 - 17 |
* | A A A A A |
* | D D D B B B |
* | D D D B B B |
* | C C C C |
* | C C C C |
* | C C C C |
* */
Where each line represents a car of the same type, and each letter represents 1 booking (1 user can book multiple cars). The system has built in detection so, you cannot book a car if there are no cars left (like from 11. to 14. there are no cars available).
There are multiple ways to sort the bookings, but an algorithm (or some pointers on how to) that always returns the same order and works for all (or close) different cases would be great.
[BASE]
/ \ \
C1 C2 C3
/\ \
C4 C5 C6
I have a tree like the above. This is a N child tree which is not balanced. The problem is, I need to select one of the node based on some condition. Like
Select C1 when k1 = a
Select C4 when K1 = a and K2=b and K3=C
Select C5 when k1 = a and k'=z
Select C2 when K'' = b
Select C3 when k5 = 9
Select C6 when k5=9 and k6 = 10
The input to the program would be an arbitraty length of key value pairs like if input is -k1=a,k2=b,k3=c,k8=10 - I should select C4 as that is the best match.
Ideally I was thinking of traversing the tree and for each node, there is a selection criteria which I can match against the input set. But soon I figured out, this tree can be very huge and Base node can have tens of thousands of child nodes under it. So it might not be a good idea to go node by node. If there is a way to select the nodes more efficiently, I would love to know that.
Looks like your k's are pointing to directory structure and the leaf of this structure (exactly one leaf for each directory) is the node you are looking for. You can keep this string in node as another value. What is not clear in question is how are the k's related to the tree
for e.g.
a->c1
a/b/c->c4
I have found a workable solution like this one
----------------------------------------
|rowId|param1|param2|param3|param4|node|
----------------------------------------
|10 | a | | | | C1 |
----------------------------------------
|14 | a | b | c | | C4 |
----------------------------------------
|18 | a | b | | | C5 |
----------------------------------------
Lets call it a condition table. Each column represent the input series (k) and for different combinations of the value, there is a node to be selected. This table can be think of an in memory data structure or a real table in RDBMS.
I have 3 tables:
two with hierarchical structures
(like "dimensions" of recursive type of hierarchy);
one with summing data (like "facts" with X column).
They are here:
DIM1 (ID1, PARENT2, NAME1)
DIM2 (ID2, PARENT2, NAME2)
FACTS (ID1, ID2, X)
Example of DIM1 table:
-- 1 0 DIM1
---- 2 1 DIM1-A
------ 3 2 DIM1-A-A
-------- 4 3 DIM1-A-A-A
-------- 5 3 DIM1-A-A-B
------ 6 2 DIM1-A-B
-------- 7 6 DIM1-A-B-A
-------- 8 6 DIM1-A-B-B
------ 9 2 DIM1-A-C
---- 10 1 DIM1-B
------ 11 10 DIM1-B-C
------ 12 10 DIM1-B-D
---- 13 1 DIM1-C
Example of DIM2 table:
-- 1 0 DIM2
---- 2 1 DIM2-A
------ 3 2 DIM2-A-A
-------- 4 3 DIM2-A-A-A
-------- 5 3 DIM2-A-A-B
-------- 6 3 DIM2-A-B-C
------ 7 2 DIM2-A-B
---- 8 1 DIM2-B
---- 9 1 DIM2-C
Example of FACTS table:
1 1 100
1 2 30
1 3 500
-- ................
13 9 200
And I would like to create the only SELECT where I will specify the parent for DIM1 (for example ID1=2 for DIM1-A) and parent for DIM2 (for example ID2=2 for DIM2-A) and SELECT will generate a report like this:
Name_of_1 Name_of_2 Sum_of_X
--------- --------- ----------
DIM1-A-A DIM2-A-A (some sum)
DIM1-A-A DIM2-A-B (some sum)
DIM1-A-B DIM2-A-A (some sum)
DIM1-A-B DIM2-A-B (some sum)
DIM1-A-C DIM2-A-A (some sum)
DIM1-A-C DIM2-A-B (some sum)
I would like to use CONNECT BY phrase, START WITH phrase, SUM phrase, GROUP BY phrase, and OUTER or INNER (?) JOIN. I need no other extensions of Oracle 10.2.
In other words: only with "classic" SQL and
only Oracle extensions for hierarchy queries.
Is it possible?
I tried some experiments with question in
Mixing together Connect by, inner join and sum with Oracle
(where is a very nice solution but only for one
dimension table ("Tasks"), but I need to JOIN two dimension tables to one facts table), but I was not successful.
"Some sum" is not very descriptive, so I don't see why do you need CONNECT BY at all.
SELECT dim1.name, dim2.name, x
FROM (
SELECT id1, id2, SUM(x) AS x
FROM facts
GROUP BY
id1, id2
) f
JOIN dim1
ON dim1.id = f.id1
JOIN dim2
ON dim2.id = f.id2
I think what you're trying to do is get the sum of the value in the facts table for all of the children of the specified rows grouped by the topmost children. This would mean that in your example above, the results for the first row would be the sum any intersections of (DIM1-A-A, DIM1-A-A-A, DIM1-A-A-B) and (DIM2-A-A, DIM2-A-A-A, DIM2-A-A-B, DIM3-A-A-C) found in the FACTS table. With that assumption, I have come to the following solution:
SELECT root_name1, root_name2, SUM(X)
FROM ( SELECT CONNECT_BY_ROOT(name1) AS root_name,
id1
FROM dim1
CONNECT BY parent1 = PRIOR id1
START WITH parent1 = 2) d1
CROSS JOIN
( SELECT CONNECT_BY_ROOT(name2) AS root_name,
id2
FROM dim2
CONNECT BY parent2 = PRIOR id2
START WITH parent2 = 2) d2
LEFT OUTER JOIN
facts
ON d1.id1 = facts.id1
AND d2.id2 = facts.id2
GROUP BY root_name1, root_name2
(This also assumes that the columns of FACTS are named ID1, ID2, and X.)