This is an old problem; I'm looking for the best solution in Vertica. Imagine a table with columns:
A, B, C, D, E
Columns A-D are ints or varchars, and column E is a timestamptz column with a default value of GETUTCDATE().
Sample content of the table:
1, 2, "AAA", 4, 1404305559
1, 2, "BBB", 23, 1404305633
1, 2, "CCC", 62, 1404305705 <-- the max entry for (1,2,"CCC")
1, 2, "AAA", 123, 1404305740 <-- the max entry for (1,2,"AAA")
1, 2, "BBB", 91, 1404305778 <-- the max entry for (1,2,"BBB")
So potentially there are repeating rows for the composite (A,B,C) value (with column D being a value and column E the timestamp).
I'd like a result set that shows, for each unique (A,B,C) combination, the latest row and its value. Hence the result set for the above would look like:
1, 2, "CCC", 62, 1404305705
1, 2, "AAA", 123, 1404305740
1, 2, "BBB", 91, 1404305778
Let's set up the sample data:
CREATE TABLE public.test (
A int,
B int,
C varchar,
D int,
E int
);
INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 4, 1404305559);
INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 23, 1404305633);
INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'CCC', 62, 1404305705);
INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 123, 1404305740);
INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 91, 1404305778);
COMMIT;
We'll use the RANK window function: partition the rows by A, B, C, order each partition by E descending, and return only the rows at the top of each partition (rank 1).
SELECT a.a,
a.b,
a.c,
a.d,
a.e
FROM (SELECT a,
b,
c,
d,
e,
RANK()
OVER (
PARTITION BY a, b, c
ORDER BY e DESC) AS rank
FROM public.test) a
WHERE a.rank = 1;
This returns:
A | B | C | D | E
---+---+-----+-----+------------
1 | 2 | CCC | 62 | 1404305705
1 | 2 | AAA | 123 | 1404305740
1 | 2 | BBB | 91 | 1404305778
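A minimal, runnable sketch of the same window-function technique, using Python's built-in sqlite3 module as a stand-in for Vertica (an assumption: SQLite 3.25 or later is needed for window functions, and only standard syntax is used). It swaps in ROW_NUMBER() for the main query and then adds a hypothetical tied row to show why the choice matters: RANK() returns every row tied for the latest timestamp, while ROW_NUMBER() returns exactly one per key.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Vertica (assumption: the
# query only uses standard window-function syntax; needs SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER, c TEXT, d INTEGER, e INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, "AAA", 4, 1404305559),
        (1, 2, "BBB", 23, 1404305633),
        (1, 2, "CCC", 62, 1404305705),
        (1, 2, "AAA", 123, 1404305740),
        (1, 2, "BBB", 91, 1404305778),
    ],
)

# Number the rows of each (a, b, c) partition from newest to oldest and keep
# the first one: exactly one "latest" row per key.
LATEST_PER_KEY = """
SELECT a, b, c, d, e
FROM (SELECT a, b, c, d, e,
             ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY e DESC) AS rn
      FROM test) t
WHERE rn = 1
ORDER BY e
"""
latest = conn.execute(LATEST_PER_KEY).fetchall()
print(latest)

# Add a hypothetical row that ties with the latest 'CCC' timestamp: RANK() now
# returns two top rows for that key, while ROW_NUMBER() still returns one.
conn.execute("INSERT INTO test VALUES (1, 2, 'CCC', 99, 1404305705)")
COUNT_TOP = """
SELECT COUNT(*)
FROM (SELECT {fn}() OVER (PARTITION BY a, b, c ORDER BY e DESC) AS r FROM test) t
WHERE r = 1
"""
rank_top = conn.execute(COUNT_TOP.format(fn="RANK")).fetchone()[0]
row_number_top = conn.execute(COUNT_TOP.format(fn="ROW_NUMBER")).fetchone()[0]
print(rank_top, row_number_top)
```

With ROW_NUMBER() the tied row that wins is arbitrary; adding a second sort column (for example d) inside OVER (...) makes the choice deterministic.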
Related
This question is about Oracle Database.
I have a table (T1) with several columns:
group | id | price
------+----+------
A | 1 | 50
A | 5 | 40
B | 4 | 54
C | 1 | 33
C | 6 | 33
D | 5 | 13
D | 3 | 4
And another table (T2) with 2 columns:
id | description
---+------------
1 | aaa
2 | bbb
3 | ccc
4 | ddd
5 | eee
6 | fff
7 | ggg
The id in this table (T2) is unique.
The two tables are connected by the "id" column.
For each (!) group in T1 (A, B, C, D), I need to find which ids from T2 are not present for that group in T1.
The result should be group + id (the ids that do not exist in that group).
For the example tables above, I expect this result:
group | id
------+---
A | 2
A | 3
A | 4
A | 6
A | 7
B | 1
B | 2
B | 3
B | 5
B | 6
B | 7
C | 2
C | 3
C | 4
C | 5
C | 7
D | 1
D | 2
D | 4
D | 6
D | 7
Thank You!
In Oracle you can use a partitioned outer join:
with a(grp, id, price) as (
select 'A', 1, 50 from dual union all
select 'A', 5, 40 from dual union all
select 'B', 4, 54 from dual union all
select 'C', 1, 33 from dual union all
select 'C', 6, 33 from dual union all
select 'D', 5, 13 from dual union all
select 'D', 3, 4 from dual
)
, b (id, descr) as (
select 1, 'aaa' from dual union all
select 2, 'bbb' from dual union all
select 3, 'ccc' from dual union all
select 4, 'ddd' from dual union all
select 5, 'eee' from dual union all
select 6, 'fff' from dual union all
select 7, 'ggg' from dual
)
select
grp
, id
from a
partition by (grp)
right join b
using (id)
where a.price is null
order by grp, id
GRP | ID
:-- | -:
A | 2
A | 3
A | 4
A | 6
A | 7
B | 1
B | 2
B | 3
B | 5
B | 6
B | 7
C | 2
C | 3
C | 4
C | 5
C | 7
D | 1
D | 2
D | 4
D | 6
D | 7
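Partitioned outer joins are specific to Oracle, so here is a plain-Python sketch (an illustration of the semantics, not Oracle's implementation) of what `a partition by (grp) right join b using (id)` produces: each group of a is outer-joined to the whole of b separately, and the `price is null` filter then keeps the missing pairs. The data is copied from the answer's WITH clauses; the function name partitioned_right_join is hypothetical.

```python
# Sample rows copied from the answer's WITH clauses: a(grp, id, price), b(id, descr).
A_ROWS = [("A", 1, 50), ("A", 5, 40), ("B", 4, 54), ("C", 1, 33),
          ("C", 6, 33), ("D", 5, 13), ("D", 3, 4)]
B_ROWS = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "ddd"),
          (5, "eee"), (6, "fff"), (7, "ggg")]


def partitioned_right_join(a_rows, b_rows):
    """Emulate `a PARTITION BY (grp) RIGHT JOIN b USING (id)` in plain Python.

    Every group of `a` is right-joined to all of `b` on its own, so each
    (grp, id) pair is produced; price is None where that group has no row.
    (Duplicate ids inside one group are collapsed, which is harmless here.)
    """
    prices_by_group = {}
    for grp, id_, price in a_rows:
        prices_by_group.setdefault(grp, {})[id_] = price
    for grp in sorted(prices_by_group):
        for id_, _descr in b_rows:
            yield grp, id_, prices_by_group[grp].get(id_)


# Same filter as `where a.price is null`: keep the pairs with no match.
missing = [(grp, id_)
           for grp, id_, price in partitioned_right_join(A_ROWS, B_ROWS)
           if price is None]
print(missing)
```

Like the SQL, this treats a row whose price is genuinely NULL as missing; testing a key column instead of price would avoid that edge case.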
The idea here is to create all possible (group, id) combinations with a Cartesian join, then left-join them to the pairs that actually exist in T1 and keep only the combinations where the T1 side is null (that is, the ones that don't exist).
create table T1 (c1 varchar2(10), c2 number(10), c3 number(10));
insert into T1 values('A', 1, 50);
insert into T1 values('A', 5, 40);
insert into T1 values('B', 4 , 54);
insert into T1 values('C', 1, 33);
insert into T1 values('C', 6, 33);
insert into T1 values('D', 5, 13);
insert into T1 values('D', 3, 4);
create table T2 (c6 number(10), c7 varchar2(10));
insert into T2 values( 1, 'aaa');
insert into T2 values( 2, 'bbb');
insert into T2 values( 3, 'ccc');
insert into T2 values( 4, 'ddd');
insert into T2 values( 5, 'eee');
insert into T2 values( 6, 'fff');
insert into T2 values( 7, 'ggg');
SELECT tx.c1, tx.c6
FROM
(select c1, c6
from
(select distinct c1 from T1) ta,
(select distinct c6 from T2) tb) tx -- tx is the Cartesian product
left join
(select c1, c2 from T1 group by c1, c2) ty
on tx.c1 = ty.c1 and tx.c6 = ty.c2
WHERE
ty.c1 is null
ORDER BY 1, 2
This gets you exactly what you're looking for.
Tested and verified.
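To check the Cartesian-product approach end to end, the sketch below runs it against an in-memory SQLite database with the answer's sample data (assumption: SQLite as a stand-in for Oracle, with NUMBER/VARCHAR2 mapped to INTEGER/TEXT). It selects only tx.c1 and tx.c6, since ty.c1 is always NULL in the rows that survive the filter.

```python
import sqlite3

# SQLite stand-in for the Oracle tables; data copied from the answer's INSERTs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (c1 TEXT, c2 INTEGER, c3 INTEGER)")
conn.execute("CREATE TABLE T2 (c6 INTEGER, c7 TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", [
    ("A", 1, 50), ("A", 5, 40), ("B", 4, 54), ("C", 1, 33),
    ("C", 6, 33), ("D", 5, 13), ("D", 3, 4),
])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [
    (1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "ddd"),
    (5, "eee"), (6, "fff"), (7, "ggg"),
])

# Every (group, id) pair, left-joined to the pairs that really exist in T1;
# rows where the T1 side is NULL are the missing combinations.
QUERY = """
SELECT tx.c1, tx.c6
FROM (SELECT ta.c1, tb.c6
      FROM (SELECT DISTINCT c1 FROM T1) ta
      CROSS JOIN (SELECT DISTINCT c6 FROM T2) tb) tx
LEFT JOIN (SELECT DISTINCT c1, c2 FROM T1) ty
       ON tx.c1 = ty.c1 AND tx.c6 = ty.c2
WHERE ty.c1 IS NULL
ORDER BY 1, 2
"""
missing = conn.execute(QUERY).fetchall()
print(len(missing), missing[:5])
```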
First, find all the possible combinations, then determine which of them don't exist:
WITH cteCombinations
AS (SELECT DISTINCT T1."group", T2.ID
FROM T1
CROSS JOIN T2)
SELECT c."group", c.ID
FROM cteCombinations c
LEFT OUTER JOIN T1
ON T1.ID = c.ID AND
T1."group" = c."group"
WHERE T1.ID IS NULL
ORDER BY c."group", c.ID
The CTE (Common Table Expression) uses a CROSS JOIN to find all of the unique combinations of group and ID; then the LEFT OUTER JOIN is used to determine which of the combinations don't exist in T1.
Another way to do it is:
WITH cteCombinations
AS (SELECT DISTINCT T1."group", T2.ID
FROM T1
CROSS JOIN T2)
SELECT c."group", c.ID
FROM cteCombinations c
WHERE (c."group", c.ID) NOT IN (SELECT "group", ID
FROM T1)
ORDER BY c."group", c.ID
Here we use the same CTE to generate the possible combinations, but instead of a LEFT OUTER JOIN we use a NOT IN comparison to determine which of the combinations are not present in table T1.
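One caveat with the NOT IN variant: if the subquery can return a NULL ID, NOT IN evaluates to UNKNOWN rather than TRUE for the non-matching rows, and they silently disappear; NOT EXISTS does not have this problem. The sketch below shows the difference for group A in SQLite, using a hypothetical extra T1 row with a NULL ID (the question's data has none) and a single-column NOT IN for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is quoted, as in the answer's queries.
conn.execute('CREATE TABLE T1 ("group" TEXT, ID INTEGER, price INTEGER)')
conn.execute("CREATE TABLE T2 (ID INTEGER, description TEXT)")
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [
    (1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "ddd"),
    (5, "eee"), (6, "fff"), (7, "ggg"),
])
# Group A from the question plus one hypothetical row whose ID is NULL.
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [("A", 1, 50), ("A", 5, 40), ("A", None, 0)])

NOT_IN = """
SELECT ID FROM T2
WHERE ID NOT IN (SELECT ID FROM T1 WHERE "group" = 'A')
ORDER BY ID
"""
NOT_EXISTS = """
SELECT t2.ID FROM T2 t2
WHERE NOT EXISTS (SELECT 1 FROM T1 t1
                  WHERE t1."group" = 'A' AND t1.ID = t2.ID)
ORDER BY t2.ID
"""
not_in_ids = [r[0] for r in conn.execute(NOT_IN)]          # NULL poisons NOT IN
not_exists_ids = [r[0] for r in conn.execute(NOT_EXISTS)]  # unaffected by NULL
print(not_in_ids, not_exists_ids)
```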
I have two tables, A and B, shown below:
Table A: id, idB, name, faculty
Table B: id, name
Table B has 2 records as below.
SELECT *
FROM B;
1, 1, 'First'
2, 2, 'Second'
Table A has 16 records, as below.
SELECT *
FROM A;
1, 1, A, IT
2, 1, B, IT
3, 1, C, IT
4, 1, D, Medicine
5, 1, E, Medicine
6, 1, F, Business
7, 1, G, Business
8, 1, H, IT
9, 2, A, Medicine
10, 2, B, Medicine
11, 2, C, Medicine
12, 2, D, Medicine
13, 2, E, Medicine
14, 2, F, Medicine
15, 2, G, Business
16, 2, H, Medicine
My question is:
How can I select, for each record in table B, the matching row from table A whose faculty is IT (taking the one with the maximum id if there are several), and, if there is no IT row, the Business row instead?
The result should combine the A and B records, like this:
8, 1, H, IT, First
15, 2, G, Business, Second
Could you advise how we can retrieve this data?
This query will help you get the desired result:
SELECT id
,name
,faculty
FROM A
WHERE faculty IN ('IT', 'Business')
This will get the single row with the maximum ID among the IT rows or, if there are no IT rows, among the Business rows:
SELECT *
FROM (
SELECT A.id,
A.idB,
A.name,
A.faculty,
B.name AS bname
FROM A
INNER JOIN B
ON ( A.idB = B.id )
WHERE A.faculty IN ( 'IT', 'Business' )
ORDER BY
DECODE( A.faculty, 'IT', 1, 'Business', 2 ),
A.id DESC
)
WHERE ROWNUM = 1;
If you want the top row for each record in B (IT first, then Business, highest id within each) then:
SELECT id,
idB,
name,
faculty,
bname
FROM (
SELECT A.id,
A.idB,
A.name,
A.faculty,
B.name AS bname,
ROW_NUMBER() OVER (
PARTITION BY A.idB
ORDER BY DECODE( A.faculty, 'IT', 1, 'Business', 2 ), A.id DESC
) AS rn
FROM A
INNER JOIN B
ON ( A.idB = B.id )
WHERE A.faculty IN ( 'IT', 'Business' )
)
WHERE rn = 1;
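A runnable sketch of the per-B selection (IT preferred over Business, then the highest id), using SQLite through Python's sqlite3 module as a stand-in for Oracle. SQLite has no DECODE, so a CASE expression plays that role (an assumption of this port), and B is modeled with just the id and name columns from the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, idB INTEGER, name TEXT, faculty TEXT)")
conn.execute("CREATE TABLE B (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, "First"), (2, "Second")])
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "IT"), (2, 1, "B", "IT"), (3, 1, "C", "IT"),
    (4, 1, "D", "Medicine"), (5, 1, "E", "Medicine"), (6, 1, "F", "Business"),
    (7, 1, "G", "Business"), (8, 1, "H", "IT"), (9, 2, "A", "Medicine"),
    (10, 2, "B", "Medicine"), (11, 2, "C", "Medicine"), (12, 2, "D", "Medicine"),
    (13, 2, "E", "Medicine"), (14, 2, "F", "Medicine"), (15, 2, "G", "Business"),
    (16, 2, "H", "Medicine"),
])

# One ranking per B record: IT sorts before Business (CASE replaces DECODE),
# then the highest A.id wins; rn = 1 keeps that single row.
QUERY = """
SELECT id, idB, name, faculty, bname
FROM (SELECT A.id, A.idB, A.name, A.faculty, B.name AS bname,
             ROW_NUMBER() OVER (
                 PARTITION BY A.idB
                 ORDER BY CASE A.faculty WHEN 'IT' THEN 1 ELSE 2 END, A.id DESC
             ) AS rn
      FROM A JOIN B ON A.idB = B.id
      WHERE A.faculty IN ('IT', 'Business')) t
WHERE rn = 1
ORDER BY idB
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```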
SELECT A.*,T.Name
FROM
TableA A
INNER JOIN
(
SELECT MAX(A.id) AS id,A.FACULTY,B.NAME,
ROW_NUMBER() OVER (PARTITION BY B.NAME ORDER BY MAX(A.id) DESC) AS RN FROM TableA A
INNER JOIN TableB B
ON A.idB=B.id
WHERE faculty IN ('IT', 'Business')
GROUP BY A.FACULTY,B.NAME
) T
ON A.id = T.id
AND A.FACULTY = T.FACULTY
WHERE T.RN=1
Output
ID IDB NAME FACULTY NAME
8 1 H IT First
15 2 G Business Second
Demo
http://sqlfiddle.com/#!4/98b84/24
I have 3 columns, a, b and c, in a table. I need to find the duplicates on columns a and b that have distinct values in column c.
Maybe you need something like this:
with test(a, b, c) as (
select 1, 2, 10 from dual union all
select 1, 2, 20 from dual union all
select 4, 5, 30 from dual union all
select 4, 5, 30 from dual union all
select 3, 2, 3 from dual union all
select 6, 2, 2 from dual
)
select a, b
from test
group by a,b
having count(distinct c) > 1
That is, you need to aggregate by a, b, keeping only the pairs that have more than one DISTINCT value in column c.
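The same GROUP BY / HAVING COUNT(DISTINCT c) > 1 idea, runnable in SQLite via Python's sqlite3 (SQLite standing in for Oracle, so the dual-based WITH clause becomes a real table). The second query is an addition that joins the qualifying (a, b) pairs back to the table to list the underlying rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)",
                 [(1, 2, 10), (1, 2, 20), (4, 5, 30), (4, 5, 30), (3, 2, 3), (6, 2, 2)])

# (a, b) pairs that occur with more than one distinct c.
pairs = conn.execute("""
    SELECT a, b FROM test
    GROUP BY a, b
    HAVING COUNT(DISTINCT c) > 1
""").fetchall()

# Join the qualifying pairs back to the table to see the individual rows.
rows = conn.execute("""
    SELECT t.a, t.b, t.c
    FROM test t
    JOIN (SELECT a, b FROM test GROUP BY a, b HAVING COUNT(DISTINCT c) > 1) d
      ON t.a = d.a AND t.b = d.b
    ORDER BY t.a, t.b, t.c
""").fetchall()
print(pairs, rows)
```

The pair (4, 5) is excluded because both of its rows have c = 30.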
A has a 1:M relationship with B.
A = LOAD ... AS (
a_id:chararray
,...
);
B = LOAD ... AS (
a_id:chararray
,b_id:chararray
,...
);
JOINED = JOIN A BY a_id, B BY a_id;
GROUPED = GROUP JOINED BY A::a_id;
This would create a DataBag with the following schema:
{group: chararray, JOINED: {(A::a_id, ..., B::a_id, B::b_id, ...)}}
For example:
(1, {(1, ..., 1, 1, ...)})
(2, {(2, ..., 2, 2, ...), (2, ..., 2,3, ...), (2, ...,2,4, ...)})
(3, {(3, ..., 3, 5, ...)})
For these three rows, this is what the corresponding HBase rows would look like:
rowkey = 1, A:a_id=1, ..., B:b1|a_id=1, B:b1|b_id=1
rowkey = 2, A:a_id=2, ..., B:b2|a_id=2, B:b2|b_id=2, ..., B:b3|a_id=2, B:b3|b_id=3, ..., B:b4|a_id=2, B:b4|b_id=4, ...
rowkey = 3, A:a_id=3, ..., B:b5|a_id=3, B:b5|b_id=5
How can I import this DataBag into HBase using the above logic?
In order to do this I need to generate dynamic column qualifier names, the number of which would be dependent on the number of subtuples in the DataBag.
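The key step, deriving one set of column qualifiers per sub-tuple in the bag, can be sketched in plain Python. This only illustrates the naming logic under stated assumptions: the simplified input shape, the function name to_hbase_cells, and the B:b<b_id>|... qualifier format (read off the example rows above) are hypothetical, and no Pig or HBase API is used, so how it is wired into the load step is left open.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified input: one grouped record per A key, holding the
# (a_id, b_id) pairs from the B side of the join. Real records carry more fields.
Grouped = Tuple[str, List[Tuple[str, str]]]


def to_hbase_cells(record: Grouped) -> Tuple[str, Dict[str, str]]:
    """Flatten one grouped record into a row key and 'family:qualifier' -> value cells.

    The qualifier prefix b<b_id> is derived from each sub-tuple, so the number
    of B columns grows with the size of the bag, as in the desired layout.
    """
    a_id, b_tuples = record
    cells = {"A:a_id": a_id}
    for b_a_id, b_id in b_tuples:
        prefix = f"B:b{b_id}"
        cells[f"{prefix}|a_id"] = b_a_id
        cells[f"{prefix}|b_id"] = b_id
    return a_id, cells


# The second example row: key 2 with three B sub-tuples.
row_key, cells = to_hbase_cells(("2", [("2", "2"), ("2", "3"), ("2", "4")]))
print(row_key, cells)
```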
I'm new to relational algebra. I found the * operator in the following expression
What's the difference between this and an expression that uses a join?
The * should more correctly be written × as it represents a Cartesian product. This operation returns the set of all tuples that are the concatenation of tuples from each operand. A join filters the Cartesian product down to only those tuples with matching values on specified attributes. If the join is a natural join, as in your example, the attributes matched on are those with identical names, and each matched attribute appears only once in the result.
For example, given the following two relations R and S as shown:
R ( a, b, c ) S ( b, c, d )
( 1, 2, 3 ) ( 2, 7, 9 )
( 2, 4, 6 ) ( 5, 3, 4 )
( 3, 6, 9 ) ( 2, 3, 6 )
The Cartesian product R × S is:
( R.a, R.b, R.c, S.b, S.c, S.d )
( 1, 2, 3, 2, 7, 9 )
( 1, 2, 3, 5, 3, 4 )
( 1, 2, 3, 2, 3, 6 )
( 2, 4, 6, 2, 7, 9 )
( 2, 4, 6, 5, 3, 4 )
( 2, 4, 6, 2, 3, 6 )
( 3, 6, 9, 2, 7, 9 )
( 3, 6, 9, 5, 3, 4 )
( 3, 6, 9, 2, 3, 6 )
The natural join R ⨝ S is the product filtered to only tuples where the b and c values match:
( a, b, c, d )
( 1, 2, 3, 6 )
The join R ⨝b S is the product filtered to only tuples where the b values match:
( R.a, b, R.c, S.c, S.d )
( 1, 2, 3, 7, 9 )
( 1, 2, 3, 3, 6 )
In a few books, natural join is denoted by an asterisk (*).
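The product and the two joins worked through above can be reproduced with a small plain-Python sketch. Relations are modeled here as (attribute names, list of tuples), an assumption made for illustration; join_on is a hypothetical helper that compares the listed attributes and keeps each of them once, so passing every shared attribute gives the natural join.

```python
from itertools import product

# Relations as (attribute names, list of tuples), matching R and S above.
R = (("a", "b", "c"), [(1, 2, 3), (2, 4, 6), (3, 6, 9)])
S = (("b", "c", "d"), [(2, 7, 9), (5, 3, 4), (2, 3, 6)])


def cartesian(r, s):
    """R × S: every tuple of r concatenated with every tuple of s."""
    return [tr + ts for tr, ts in product(r[1], s[1])]


def join_on(r, s, attrs):
    """Keep product tuples that agree on attrs; each matched attribute is kept once (from r).

    With attrs = all shared attribute names this is the natural join R ⨝ S.
    """
    r_names, s_names = r[0], s[0]
    keep_s = [i for i, n in enumerate(s_names) if n not in attrs]
    result = []
    for tr, ts in product(r[1], s[1]):
        if all(tr[r_names.index(n)] == ts[s_names.index(n)] for n in attrs):
            result.append(tr + tuple(ts[i] for i in keep_s))
    return result


print(len(cartesian(R, S)))        # size of R × S
print(join_on(R, S, ["b", "c"]))   # natural join
print(join_on(R, S, ["b"]))        # join on b only
```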