How to join hive tables based on condition of the joining column - hadoop

We have a hive table like below:
num value
123 A
456 B
789 C
101 D
The joining table is:
num Symbols
123 ASC
456001 JEN
456002 JEN
456003 JEN
789001 CON
101 URB
Our expected result:
num value symbols
123 A ASC
456 B JEN
789 C CON
101 D URB
Currently we are joining the tables twice in order to get the results.
Like first time insert into some tmp table using the below query:
select
a.num,
a.value,
b.symbols
from mytable a
join mytable b on a.num = b.num;
This query is producing the results for keys 123,101.
Next, we are running another query like below:
select
a.num,
a.value,
b.symbols
from mytable a
join mytable b on CONCAT(a.num,'001') = b.num;
This query is producing the results for keys 456, 789.
These two queries results are inserted into some tmp hive table and we select the final results from the tmp table.
This looks a bad design overall. but I would like to know if there is a better way to achieve this. Thanks.
Query Result
for
Select
a.num
,a.value
,b.symbols
from
(select substr(num,3) as num, value from table)a
join
(select substr(num,3) as num, symbols from table) b
on a.num = b.num
a.num a.value b.symbols
3 A ASC
1 D URB

OK, just one sql can implement your requirement.see below, table a is the table with value column and table b is the table with the symbols column, the SQL:
select
distinct a.num,
a.value,
b.symbols
from
mytable1 a
join
mytable2 b on substr(cast(b.num as string),0,3) = cast(a.num as string)

If datatype of num is String then you can try with Substr
Select
a.num
,a.value
,b.symbols
from a join b on
substr(a.num,3) = substr(b.num,3)

Can you pls try this
Select
a.num
,a.value
,b.symbols
from
(select substr(num,3) as num, value from table)a
join
(select substr(num,3) as num, symbols from table) b
on a.num = b.num

Can you try with left semi join with above query as shown below.
Select
a.num,
a.value,
b.symbols
from
mytable1 a
Left semi join
mytable2 b on substr(cast(b.num as string),0,3) = cast(a.num as string)

Related

how to select specific columns from three different tables in Oracle SQL

I am trying to select values from three different tables.
When I select all columns it works well, but if I select specific column, the SQL Error [42000]: JDBC-8027:Column name is ambiguous. appear.
this is the query that selected all that works well
SELECT
*
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
and this is the error query
SELECT DISTINCT
x.POLICY_NO,
x.POLICY_TITLE,
policy_no_count ,
B.SMALL_CATEGORY_SID,
C.SMALL_CATEGORY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
I am trying to select if A.POLICY_NO values duplicate rows more than 18, want to change C.SMALL_CATEGORY_TITLE values to "ZZ" and also want to cahge B.SMALL_CATEGORY_SID values to null.
that is why make 2 select in query like this
SELECT DISTINCT
x.POLICY_NO,
CASE WHEN (policy_no_count > 17) THEN 'ZZ' ELSE C.SMALL_CATEGORY_TITLE END AS C.SMALL_CATEGORY_TITLE,
CASE WHEN (policy_no_count > 17) THEN NULL ELSE B.SMALL_CATEGORY_SID END AS B.SMALL_CATEGORY_SID,
x.POLICY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
If i use that query, I got SQL Error [42000]: JDBC-8006:Missing FROM keyword. ¶at line 3, column 80 of null error..
I know I should solve it step by step. Is there any way to select specific columns?
That's most probably because of SELECT x.*, B.*,C.* - avoid asterisks - explicitly name all columns you need, and then pay attention to possible duplicate column names; if you have them, use column aliases.
For example, if that select (which is in a subquery) evaluates to
select x.id, x.name, b.id, b.name
then outer query doesn't know which id you want as two columns are named id (and also two names), so you'd have to
select x.id as x_id,
x.name as x_name,
b.id as b_id,
b.name as b_name
from ...
and - in outer query - select not just id, but e.g. x_id.

ORACLE Query to find value in other table based on dates

I have two tables, Table A has an ID and an Event Date and Table B has an ID, a Description and an Event Date.
Not all IDs in Table A appear in Table B and some IDs appear multiple times in Table B with different Descriptions for each event.
The Description in Table B is an attribute that can change over time, the Event date in Table B is the date that a given ID's Description changes from its default value (kept in another table) to the new value.
I want to find the Description in Table B that matches the Event Date in Table A so, for example
Table Sample Data
A1234 would return Green and A4567 would return Null
I can't create tables here so I need to be able to this with a query.
This query will select last description from before the event:
SELECT * FROM (
SELECT tabA.id, tabA.event_date, tabB.description,
ROW_NUMBER() OVER(PARTITION BY tabB.id ORDER BY tabB.event_date DESC) rn
FROM Table_A tabA
LEFT JOIN Table_B tabB ON tabA.id = tabB.id AND tabB.event_date <= tabA.event_date
) WHERE rn = 1
If I understand well your need, this could be a way:
select a.id, description
from tableA A
left join
(select id,
description,
event_date from_date,
lead(event_date) over (partition by id order by event_date) -1 as to_date
from tableB
) B
on (A.id = B.id and a.event_date between b.from_date and b.to_date)
The idea here is to evaluate, for each row in tableB the range of dates for which that row, and its description, is valid; given this, a simple join should do the job.
You can left join tables like:
select a.ID , b1.DESCRIPTION
from TABLE_A a
left join TABLE_B b1 on a.ID = b1.id and a.EVENT_DATE > b1.EVENT_DATE
left join TABLE_B b2 on a.ID = b2.id and b1.EVENT_DATE < b2.EVENT_DATE and a.EVENT_DATE > b2.EVENT_DATE
where b1.id is null or b2.EVENT_DATE is null;

OUTER JOIN from different column names to same column

I am trying to select columns from different tables (the columns have different name) and use an outer join to get them in a single table. How do I do this?
(I am using sqlplus)
Here is an example:
Table a:
a.NAME1 a.NAME2 a.RATING
Jack Sparrow 4
Table b:
b.FIRSTNAME b.LASTNAME b.RATING
Jack Sparrow 7
Table 3:
c.F_NAME c.L_NAME c.RATING
Jack Sparrow 6
I would like a table like this:
NAME RATING
Jack 4
7
6
I tried this code
SELECT
a.NAME1 AS NAME,
b.FIRSTNAME AS NAME,
c.F_NAME AS NAME,
a.RATING AS RATING,
b.RATING AS RATING,
c.RATING AS RATING
FROM a
FULL OUTER JOIN (b
CROSS JOIN c)
ON (a.NAME1 = b.FIRSTNAME
AND a.NAME1 = c.F_NAME);
But that didn't work. How do I go about achieving this?
It does not sound like you want to join the tables at all. If you joined three tables each with 1 row, you would end up with a result set that had a single row and many columns. Since your goal is to end up with three rows of data, you would want to use a union all
SELECT a.name1, a.rating
FROM a
UNION ALL
SELECT b.firstname, b.rating
FROM b
UNION ALL
SELECT c.f_name, c.rating
FROM c
If you want to eliminate duplicate rows, use a union rather than a union all.
select a.NAME1, a.NAME2, a.RATING, b.RATING, c.RATING
from a
left outer join b on b.FIRSTNAME = a.NAME1 and b.LASTNAME = a.NAME2
left outer join c on c.F_NAME = a.NAME1 and c.L_NAME = a.NAME2

Oracle join query

I have two tables
Table A has columns id|name|age.
Table B has columns id|name|age.
Sample Records from table A
1|xavi |23
2|christine|24
3|faisal |25
5|jude |27
Sample Records from table B
1|xavi |23
2|christine|22
3|faisal |23
4|ram |25
If id values from table A matches in table B than take records from table A only.
Also take records which are present in table A only
Also take records which are present in table B only
So my result should be
1|xavi |23
2|christine|24
3|faisal |25
4|ram |25
5|jude |27
You can simply use union operator to get unique values from both tables. Operator UNION will remove repeated values.
SELECT * FROM tableA AS t1
UNION
SELECT * FROM tableB AS t2
You have a precedence problem here. Take all the records from table A and then the extra records from table B:
select *
from A
union all
select *
from B
where B.id not in (select A.id from A);
You can also express this with a full outer join (assuming id is not duplicated in either table):
select coalesce(A.id, B.id) as id,
coalesce(A.name, B.name) as name,
coalesce(A.age, B.age) as age
from A full outer join
B
on A.id = B.id;
In this case, the coalesce() gives priority to the values in A.
select distinct * FROM
(
select ID, NAME, AGE from TableA
UNION ALL
select ID, NAME, AGE from TableB
) TableAB
Some things to consider --> Unless you're updating specific tables and the records are the same, it will not matter which table you're viewing the records from (because they're the same...).
If you want to see which table the records are deriving from, let me know and i'll show you how to do that as well... but the query is more complex and i don't really think it's required for the purpose described above. let me know if this helps... thanks, Brian
If the tables has relation you need:
Select DISTINCT *
from tableA a
Inner Join tableB b
On a.id = b.id
If not:
You have to use UNION and after using DISTINCT.
DISTINCT will not permit repeat rows.

Oracle get first preceding and following rows ordered by time in another table

I have 2 tables each of which have a timestamp column. How do I query for each row in A, the first preceding and following timestamps in B ?
I want:
A.id A.timestamp first_preceding(B.timestamp) first_following(B.timestamp)
I'd try this:
SELECT DISTINCT a.id, a.timestamp, b0.timestamp, b1.timestamp
FROM a, b b0, b b1
WHERE
b0.timestamp = (SELECT MAX(timestamp) FROM b WHERE timestamp < a.timestamp)
AND b1.timestamp = (SELECT MIN(timestamp) FROM b WHERE timestamp > a.timestamp);

Resources