Hive inner joins wrong result - hadoop

I have two tables, table1 and table2:
hive> select * from table1 where dt=20171020;
OK
a 1 1 p 10 20171020
b 2 2 q 10 20171020
c 3 3 r 10 20171020
d 4 4 r 10 20171020
hive> select * from table2 where dt=20171020;
OK
a 1 1 p 10 20171020
b 2 2 t 10 20171020
c 3 3 r 10 20171020
hive> select * from table1 t1
> join table2 t2
> on t1.c1=t2.c1
> where
> t1.dt=20171020 and t2.dt=20171020 and
> t1.c2 <> t2.c2 or t1.c3 <> t2.c3 or t1.c4 <> t2.c4 or t1.c5 <> t2.c5;
Result:
a 1 1 p 20 20171016 a 1 1 p 10 20171015
a 1 1 p 20 20171016 a 1 1 p 10 20171020
b 2 2 q 20 20171016 b 2 2 t 10 20171015
b 2 2 q 20 20171016 b 2 2 t 10 20171020
c 3 3 r 20 20171016 c 3 3 r 10 20171015
c 3 3 r 20 20171016 c 3 3 r 10 20171020
b 2 2 q 10 20171020 b 2 2 t 10 20171015
b 2 2 q 10 20171020 b 2 2 t 10 20171020
a 19 19 p 20 20171019 a 1 1 p 10 20171015
a 19 19 p 20 20171019 a 1 1 p 10 20171020
I want only the following row, because it is the one that changed. How does Hive evaluate the join in the query above?
b 2 2 q 10 20171020

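AND binds more tightly than OR, so without parentheses the two dt filters apply only to the first comparison (t1.c2 <> t2.c2); any pair of rows that differs in c3, c4 or c5 passes regardless of date. Wrap the OR conditions in parentheses, and join on the date as well: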
SELECT *
FROM table1 t1
JOIN table2 t2
ON t1.c1 = t2.c1
AND t1.dt = t2.dt
WHERE t1.dt = 20171020
AND ( t1.c2 <> t2.c2
OR t1.c3 <> t2.c3
OR t1.c4 <> t2.c4
OR t1.c5 <> t2.c5 );
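The effect of the missing parentheses can be reproduced outside Hive. The following is a minimal sketch using Python's built-in sqlite3 module, on the assumption that SQLite applies the same AND/OR precedence as Hive (both follow standard SQL here). The rows are a small subset taken from the question's output; the column names c1..c5 and dt are as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (c1, c2, c3, c4, c5, dt);
CREATE TABLE table2 (c1, c2, c3, c4, c5, dt);
INSERT INTO table1 VALUES
  ('a', 1, 1, 'p', 20, 20171016),
  ('a', 1, 1, 'p', 10, 20171020),
  ('b', 2, 2, 'q', 10, 20171020),
  ('c', 3, 3, 'r', 10, 20171020),
  ('d', 4, 4, 'r', 10, 20171020);
INSERT INTO table2 VALUES
  ('a', 1, 1, 'p', 10, 20171015),
  ('a', 1, 1, 'p', 10, 20171020),
  ('b', 2, 2, 't', 10, 20171020),
  ('c', 3, 3, 'r', 10, 20171020);
""")

# Original WHERE clause: the dt filters only guard the c2 comparison.
unparenthesized = conn.execute("""
SELECT t1.c1, t1.dt, t2.dt
FROM table1 t1 JOIN table2 t2 ON t1.c1 = t2.c1
WHERE t1.dt = 20171020 AND t2.dt = 20171020 AND
      t1.c2 <> t2.c2 OR t1.c3 <> t2.c3 OR t1.c4 <> t2.c4 OR t1.c5 <> t2.c5
""").fetchall()

# Corrected query: join on the date and parenthesize the OR group.
parenthesized = conn.execute("""
SELECT t1.c1, t1.dt, t2.dt
FROM table1 t1 JOIN table2 t2 ON t1.c1 = t2.c1 AND t1.dt = t2.dt
WHERE t1.dt = 20171020
  AND (t1.c2 <> t2.c2 OR t1.c3 <> t2.c3 OR t1.c4 <> t2.c4 OR t1.c5 <> t2.c5)
""").fetchall()

print(len(unparenthesized), "rows without parentheses (other dates leak in)")
print(parenthesized)  # only the changed row for 20171020
```

With this sample, the unparenthesized form lets rows dated 20171016 and 20171015 through because their c5 values differ, while the parenthesized form returns only row b.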

Related

How can I extract columns name from row value of another table(oracle sql)?

I have 2 tables:
table1
no  a   b   c
x1  2   3   4
x2  10  11  12
x3  20  21  22
table2
from_val  in_out  cf_pv  term
a         out     cf     b
b         out     pv     b
c         in      cf     e
Define sum_out as the sum of those columns a, b, c of table1 whose row in table2 has in_out='out', and sum_cf as the sum of those columns whose row in table2 has cf_pv='cf'.
In short, the values of from_val in table2 are the column names (a, b, c) of table1.
How can I extract them and calculate sum_out and sum_cf for every no in Oracle?
sum_out of x1 = 2 + 3
sum_out of x2 = 10 + 11
sum_out of x3 = 20 + 21
sum_cf of x1 = 2 + 4
sum_cf of x2 = 10 + 12
sum_cf of x3 = 20 + 22
Thanks!
In addition, I want to calculate the sum over the columns that are both out and cf (here only a):
sum_out and cf of x1 = 2
sum_out and cf of x2 = 10
sum_out and cf of x3 = 20
Sample data
WITH
tbl_1 AS
(
Select 'x1' "COL_NO", 2 "A", 3 "B", 4 "C" From Dual Union All
Select 'x2' "COL_NO", 10 "A", 11 "B", 12 "C" From Dual Union All
Select 'x3' "COL_NO", 20 "A", 21 "B", 22 "C" From Dual
),
tbl_2 AS
(
Select 'A' "FROM_VAL", 'out' "IN_OUT", 'cf' "CF_PV", 'begin' "TERM" From Dual Union All
Select 'B' "FROM_VAL", 'out' "IN_OUT", 'pv' "CF_PV", 'begin' "TERM" From Dual Union All
Select 'C' "FROM_VAL", 'in' "IN_OUT", 'cf' "CF_PV", 'end' "TERM" From Dual
),
Create a CTE (formulas) that builds the formulas for IN_OUT = 'out' and for CF_PV = 'cf':
formulas AS
(
Select
CASE WHEN IN_OUT = 'out' THEN IN_OUT END "IN_OUT",
LISTAGG(FROM_VAL, ' + ') WITHIN GROUP (ORDER BY FROM_VAL) OVER(PARTITION BY IN_OUT) "IN_OUT_FORMULA",
CASE WHEN CF_PV = 'cf' THEN CF_PV END "CF_PV",
LISTAGG(FROM_VAL, ' + ') WITHIN GROUP (ORDER BY FROM_VAL) OVER(PARTITION BY CF_PV) "CF_PV_FORMULA"
From
tbl_2
),
IN_OUT  IN_OUT_FORMULA  CF_PV   CF_PV_FORMULA
(null)  C               cf      A + C
out     A + B           cf      A + C
out     A + B           (null)  B
Another CTE (grid) connects each COL_NO to the formulas:
grid AS
(
Select
t1.COL_NO,
CASE WHEN f1.IN_OUT = 'out' THEN f1.IN_OUT END "IN_OUT", CASE WHEN f1.IN_OUT = 'out' THEN f1.IN_OUT_FORMULA END "IN_OUT_FORMULA",
CASE WHEN f1.CF_PV = 'cf' THEN f1.CF_PV END "CF_PV", CASE WHEN f1.CF_PV = 'cf' THEN f1.CF_PV_FORMULA END "CF_PV_FORMULA"
From
tbl_1 t1
Left Join
formulas f1 ON(f1.IN_OUT Is Not Null AND f1.CF_PV Is Not Null)
)
COL_NO  IN_OUT  IN_OUT_FORMULA  CF_PV  CF_PV_FORMULA
x1      out     A + B           cf     A + C
x2      out     A + B           cf     A + C
x3      out     A + B           cf     A + C
Main SQL to get the final result
SELECT
g.COL_NO,
g.IN_OUT,
g.IN_OUT_FORMULA,
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'A') > 0 THEN A ELSE 0 END +
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'B') > 0 THEN B ELSE 0 END +
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'C') > 0 THEN C ELSE 0 END "CALC_OUT",
--
g.CF_PV,
g.CF_PV_FORMULA,
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'A') > 0 THEN A ELSE 0 END +
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'B') > 0 THEN B ELSE 0 END +
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'C') > 0 THEN C ELSE 0 END "CALC_CF"
FROM
grid g
INNER JOIN
tbl_1 t1 ON(g.COL_NO = t1.COL_NO)
Result:
COL_NO  IN_OUT  IN_OUT_FORMULA  CALC_OUT  CF_PV  CF_PV_FORMULA  CALC_CF
x1      out     A + B           5         cf     A + C          6
x2      out     A + B           21        cf     A + C          22
x3      out     A + B           41        cf     A + C          42
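A different technique from the answer above, shown only as a sketch: unpivot table1 into (row, column name, value) triples, join the column name to from_val, and sum conditionally. It is run here through Python's sqlite3 so it can be checked end to end. The unpivot uses UNION ALL because that works in both SQLite and Oracle; the key column is called col_no as in the answer, and the names unpivoted, col, val and sum_out_cf are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col_no TEXT, a INT, b INT, c INT);
CREATE TABLE table2 (from_val TEXT, in_out TEXT, cf_pv TEXT, term TEXT);
INSERT INTO table1 VALUES ('x1', 2, 3, 4), ('x2', 10, 11, 12), ('x3', 20, 21, 22);
INSERT INTO table2 VALUES
  ('a', 'out', 'cf', 'b'),
  ('b', 'out', 'pv', 'b'),
  ('c', 'in',  'cf', 'e');
""")

query = """
WITH unpivoted AS (
    -- one row per (col_no, column name, value)
    SELECT col_no, 'a' AS col, a AS val FROM table1
    UNION ALL SELECT col_no, 'b', b FROM table1
    UNION ALL SELECT col_no, 'c', c FROM table1
)
SELECT u.col_no,
       SUM(CASE WHEN t2.in_out = 'out' THEN u.val ELSE 0 END) AS sum_out,
       SUM(CASE WHEN t2.cf_pv = 'cf' THEN u.val ELSE 0 END) AS sum_cf,
       SUM(CASE WHEN t2.in_out = 'out' AND t2.cf_pv = 'cf'
                THEN u.val ELSE 0 END) AS sum_out_cf
FROM unpivoted u
JOIN table2 t2 ON t2.from_val = u.col
GROUP BY u.col_no
ORDER BY u.col_no
"""

totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Because the flags are read from table2 at query time, changing a row in table2 changes the sums without editing the query; only the list of unpivoted columns is fixed in the SQL.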

How can I return null when dividing values?

I would like to get a null value when I sum up and divide multiple values and any of the values being summed is null. In the example below, the result should be null if any of the values is null or zero.
(((CAST (NVL(XYY.SCR,NULL)AS NUMBER) - 57.81114) / 24.79211) + ((CAST(NVL(WPM_SCR,NULL)AS NUMBER) - 40.7836082505127) / 17.5946375921401) + ((CAST (NVL(SLOT3,NULL) AS NUMBER) - 50.204190919674 ) / 25.5100093808846) ) / 3 BASE
A simplified example:
anything + null or anything / null is null anyway, so you don't have to do anything about those cases
for + 0 or / 0, use CASE (see lines #7 and #11)
SQL> with test (a, b) as
2 (select 6, 3 from dual union all
3 select 5, 0 from dual union all
4 select 2, null from dual
5 )
6 select a, b,
7 case when b = 0 then null
8 else a/b
9 end result_div,
10 --
11 case when a = 0 or b = 0 then null
12 else a + b
13 end result_sum
14 from test;
         A          B RESULT_DIV RESULT_SUM
---------- ---------- ---------- ----------
         6          3          2          9
         5          0
         2
SQL>
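The same zero-to-null mapping can also be written with NULLIF, which turns a chosen value into null so that ordinary null propagation does the rest. This is a sketch rather than the answer's method, run through Python's sqlite3; NULLIF is available in Oracle as well, but the DUAL table is dropped here because SQLite does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
WITH test (a, b) AS (
    SELECT 6, 3 UNION ALL
    SELECT 5, 0 UNION ALL
    SELECT 2, NULL
)
SELECT a, b,
       a / NULLIF(b, 0)            AS result_div,  -- null when b is 0 or null
       NULLIF(a, 0) + NULLIF(b, 0) AS result_sum   -- null when a or b is 0 or null
FROM test
""").fetchall()

# Key the results by a so they can be inspected regardless of row order.
by_a = {a: (result_div, result_sum) for a, b, result_div, result_sum in rows}
print(by_a)
```

Applied to the original expression, each operand would be wrapped as NULLIF(operand, 0) before the subtraction and division.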

Oracle 11g - Adding a Total Column to a Pivot Table

I've created a pivot table with data from multiple tables (using JOINS). How can I add another column to the table which adds up each column from each row?
Example:
Category | A | B | C |
ABC 1 1 1
A 1 0 0
B 0 1 0
C 0 0 1
Category | A | B | C | TOTAL
ABC 1 1 1 3
A 1 0 0 1
B 0 1 0 1
C 0 0 1 1
SCOTT#research 15-APR-15> select * from testing ;
CATEG A B C
----- ---------- ---------- ----------
ABC 1 1 1
A 1 0 0
B 0 1 0
C 0 0 1
SCOTT#research 15-APR-15> select category,a,b,c, sum(a+b+c) as "total" from testing group by category,a,b,c order by category;
CATEG A B C total
----- ---------- ---------- ---------- ----------
A 1 0 0 1
ABC 1 1 1 3
B 0 1 0 1
C 0 0 1 1
If you instead want to store the total in the table itself, add a column:
alter table testing add total int;
Then use this procedure to populate it:
create or replace procedure add_Test
is
sqlis varchar2(10);
total1 int;
begin
for i in (select * from testing) loop
select sum(a+b+c) into total1 from testing where category=i.category;
update testing set total=total1 where category=i.category;
end loop;
commit;
end;
exec add_test;
SCOTT#research 15-APR-15> select * from testing;
CATEG A B C TOTAL
----- ---------- ---------- ---------- ----------
ABC 1 1 1 3
A 1 0 0 1
B 0 1 0 1
C 0 0 1 1
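For the query-only variant, the aggregate and GROUP BY are not strictly needed, since the total is computed within a single row. The sketch below checks that with Python's sqlite3 on the testing table from the answer; it assumes a, b and c are never null (otherwise each would need COALESCE(x, 0)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testing (category TEXT, a INT, b INT, c INT);
INSERT INTO testing VALUES
  ('ABC', 1, 1, 1),
  ('A',   1, 0, 0),
  ('B',   0, 1, 0),
  ('C',   0, 0, 1);
""")

# Row-wise total: a plain expression over the pivoted columns.
with_total = conn.execute("""
SELECT category, a, b, c, a + b + c AS total
FROM testing
ORDER BY category
""").fetchall()

for row in with_total:
    print(row)
```

The same expression can be placed in the outer SELECT that wraps a PIVOT query, so no extra column has to be stored.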

Create file with matched and non-matched records using Pig script

Can you please suggest Pig logic for the duplicate removal and file matching described below?
1) Removing duplicate entries based on the key RoleId:
InputFile1
--------------
RoleId Name
1 A
2 B
3 C
2 D
5 E
5 F
7 G
OutputFile1 (only unique records)
RoleId Name
1 A
3 C
7 G
OutputFile2 (duplicate records)
RoleId Name
2 B
2 D
5 E
5 F
2) File matching, key is RoleId:
InputFile1        InputFile2
-----------       ----------
RoleId Name       RoleId Age
1      A          1      20
2      B          2      21
3      C          1      22
4      D          2      23
5      E          3      24
                  7      25
OutputFile1 (matching records)    OutputFile2 (unmatched records from InputFile1)
------------------------------    -----------
RoleId Name Age                   RoleId Name
1      A    20, 22                4      D
2      B    21, 23                5      E
3      C    24
Thanks,
Can you try the below approach?
Problem1 Solution:
input
1 A
2 B
3 C
2 D
5 E
5 F
7 G
PigScript:
A = LOAD 'in.txt' USING PigStorage() AS(RoleId:int,Name:chararray);
B = GROUP A BY RoleId;
C = FOREACH B GENERATE FLATTEN($1) AS(RoleId,Name),COUNT(A) AS cnt;
SPLIT C INTO Distval IF (cnt==1), NonDistVal IF (cnt>=2);
D = FOREACH Distval GENERATE RoleId,Name;
STORE D INTO 'DistFile' USING PigStorage();
E = FOREACH NonDistVal GENERATE RoleId,Name;
STORE E INTO 'NonDistFile' USING PigStorage();
Output:
cat DistFile/part-r-00000
1 A
3 C
7 G
cat NonDistFile/part-r-00000
2 B
2 D
5 E
5 F
Problem2 Solution:
InputFile1
1 A
2 B
3 C
4 D
5 E
InputFile2
1 20
2 21
1 22
2 23
3 24
7 25
PigScript:
A = LOAD 'InputFile1' USING PigStorage() AS(RoleId:long, Name:chararray);
B = LOAD 'InputFile2' USING PigStorage() AS(RoleId:long, Age:int);
C = COGROUP A BY RoleId ,B BY RoleId;
D = FILTER C BY NOT IsEmpty(A);
SPLIT D INTO RoleMatch IF NOT IsEmpty(B),NoRoleMatch IF IsEmpty(B);
E = FOREACH RoleMatch GENERATE FLATTEN($1),BagToTuple(B.Age);
STORE E INTO 'RoleMatchFile' USING PigStorage();
F = FOREACH NoRoleMatch GENERATE FLATTEN($1);
STORE F INTO 'NoRoleMatchFile' USING PigStorage();
Output:
cat RoleMatchFile/part-r-00000
1 A (20,22)
2 B (21,23)
3 C (24)
cat NoRoleMatchFile/part-r-00000
4 D
5 E
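To make the grouping logic of the two scripts easier to follow, here is a plain Python sketch of the same two ideas: split records by how often their key occurs (GROUP plus COUNT), and collect the matching values from a second input per key (COGROUP). The function names are invented for this illustration and nothing here is part of Pig.

```python
from collections import defaultdict


def split_by_key_count(records):
    """Return (unique, duplicates): rows whose RoleId occurs once vs. more than once."""
    groups = defaultdict(list)
    for role_id, name in records:
        groups[role_id].append((role_id, name))
    unique, duplicates = [], []
    for rows in groups.values():
        (unique if len(rows) == 1 else duplicates).extend(rows)
    return unique, duplicates


def cogroup_match(left, right):
    """Return (matched, unmatched) rows of `left`, attaching all ages found in `right`."""
    ages = defaultdict(list)
    for role_id, age in right:
        ages[role_id].append(age)
    matched, unmatched = [], []
    for role_id, name in left:
        if role_id in ages:
            matched.append((role_id, name, tuple(ages[role_id])))
        else:
            unmatched.append((role_id, name))
    return matched, unmatched


input1 = [(1, "A"), (2, "B"), (3, "C"), (2, "D"), (5, "E"), (5, "F"), (7, "G")]
unique, duplicates = split_by_key_count(input1)

file1 = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")]
file2 = [(1, 20), (2, 21), (1, 22), (2, 23), (3, 24), (7, 25)]
matched, unmatched = cogroup_match(file1, file2)

print(unique, duplicates)
print(matched, unmatched)
```

As in the Pig script, RoleId 7 from the second input is dropped because it has no partner in the first input.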

simple join of three tables in LINQ

I don't understand LINQ properly.
I have three tables.
1)TillTable
tillId, tillName
1 w1
2 w2
3 w3
4 w4
2)TillDepartment
tillDeptId, tillId, deptId, isPart
1 1 5 1
2 1 7 0
3 1 8 0
4 1 9 0
5 2 5 0
6 2 7 0
7 2 8 0
8 2 9 0
9 3 5 0
10 3 7 1
11 3 8 0
12 3 9 0
13 4 5 0
14 4 7 0
15 4 9 0 so on....
3) departmentTable
deptId, deptName
5 Science
7 Commerce
8 history
9 English so on....
Now, using LINQ or a lambda expression, I want to display the following result:
tillId, tillName, deptName
1 w1 science
2 w2 no dept
3 w3 commerce
4 w4 no dept so on...
If the isPart column is 1, the result set should show the deptName; otherwise it should show 'no dept'.
The associative table has multiple entries per tillId. That is a requirement, so please stick to this scenario.
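Try this:
var result = (from tt in db.tillTables
              join td in db.tillDepts
                  on tt.tillId equals td.tillId
              join dt in db.departmentTable
                  on td.deptId equals dt.deptId
              select new
              {
                  tillId = tt.tillId,
                  tillName = tt.tillName,
                  // keep the name only for the department flagged isPart = 1
                  deptName = td.isPart == 1 ? dt.deptName : null
              })
             // if db is an IQueryable source, the statement-bodied lambda below
             // cannot become an expression tree, so continue in LINQ to Objects
             .AsEnumerable()
             .GroupBy(x => x.tillId)
             .Select(x =>
             {
                 // descending order puts a non-null deptName ahead of the nulls
                 var orderedDeptRecord = x.OrderByDescending(z => z.deptName).FirstOrDefault();
                 return new
                 {
                     tillId = x.Key,
                     tillName = orderedDeptRecord.tillName,
                     deptName = orderedDeptRecord.deptName ?? "no dept"
                 };
             });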
Alternatively, outer-join each till to at most one TillDepartment record with isPart = 1, then resolve the department name in the select clause with a ternary operator (which gets translated to a SQL CASE):
from t in TillTable
join td in TillDepartment
    on new { t.tillId, isPart = 1 }
    equals new { td.tillId, td.isPart } into tdOuter
from td in tdOuter.DefaultIfEmpty().Take(1)
select new
{
    t.tillId,
    t.tillName,
    deptName = (td == null)
        ? "no dept"
        : (from dt in departmentTable
           where dt.deptId == td.deptId
           select dt.deptName).FirstOrDefault()
}
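Stripped of LINQ syntax, both answers do the same thing: for each till, find the one department row with isPart = 1 and fall back to 'no dept' if there is none. The Python sketch below spells that logic out on the question's sample rows; the function and variable names are hypothetical and exist only for this illustration.

```python
def dept_name_per_till(tills, till_departments, departments):
    """Return (tillId, tillName, deptName) with 'no dept' when no row has isPart == 1."""
    part_dept = {}
    for _till_dept_id, till_id, dept_id, is_part in till_departments:
        if is_part == 1:
            # keep the first flagged department per till
            part_dept.setdefault(till_id, dept_id)
    return [
        (till_id, name, departments.get(part_dept.get(till_id), "no dept"))
        for till_id, name in tills
    ]


tills = [(1, "w1"), (2, "w2"), (3, "w3"), (4, "w4")]
till_departments = [
    (1, 1, 5, 1), (2, 1, 7, 0), (3, 1, 8, 0), (4, 1, 9, 0),
    (5, 2, 5, 0), (6, 2, 7, 0), (7, 2, 8, 0), (8, 2, 9, 0),
    (9, 3, 5, 0), (10, 3, 7, 1), (11, 3, 8, 0), (12, 3, 9, 0),
    (13, 4, 5, 0), (14, 4, 7, 0), (15, 4, 9, 0),
]
departments = {5: "Science", 7: "Commerce", 8: "history", 9: "English"}

result = dept_name_per_till(tills, till_departments, departments)
for row in result:
    print(row)
```

Filtering on isPart before joining to the department table is what keeps the result at one row per till, which is the same role the composite join key plays in the second LINQ query.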
