Distinct in hive - hadoop

I want to use DISTINCT in my Hive query. The query looks like this:
Insert overwrite table tablename select distinct a.id AS ID , a.id AS SID from a left join b on a.id = b.id;
In the above query I want to insert the same value into two different columns. With DISTINCT the query doesn't work; without it, it works.

Related

how to select specific columns from three different tables in Oracle SQL

I am trying to select values from three different tables.
When I select all columns it works well, but when I select specific columns, I get SQL Error [42000]: JDBC-8027:Column name is ambiguous.
This is the query that selects all columns and works well:
SELECT
*
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
And this is the query that raises the error:
SELECT DISTINCT
x.POLICY_NO,
x.POLICY_TITLE,
policy_no_count ,
B.SMALL_CATEGORY_SID,
C.SMALL_CATEGORY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
If an A.POLICY_NO value is duplicated in more than 18 rows, I want to change the C.SMALL_CATEGORY_TITLE values to "ZZ" and the B.SMALL_CATEGORY_SID values to null.
That is why I wrote a query with two SELECTs, like this:
SELECT DISTINCT
x.POLICY_NO,
CASE WHEN (policy_no_count > 17) THEN 'ZZ' ELSE C.SMALL_CATEGORY_TITLE END AS C.SMALL_CATEGORY_TITLE,
CASE WHEN (policy_no_count > 17) THEN NULL ELSE B.SMALL_CATEGORY_SID END AS B.SMALL_CATEGORY_SID,
x.POLICY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
If I use that query, I get SQL Error [42000]: JDBC-8006:Missing FROM keyword. at line 3, column 80 of null.
I know I should solve it step by step. Is there any way to select specific columns?
That's most probably because of SELECT x.*, B.*,C.* - avoid asterisks - explicitly name all columns you need, and then pay attention to possible duplicate column names; if you have them, use column aliases.
For example, if that select (which is in a subquery) evaluates to
select x.id, x.name, b.id, b.name
then the outer query doesn't know which id you want, as two columns are named id (and two are named name), so you'd have to write
select x.id as x_id,
x.name as x_name,
b.id as b_id,
b.name as b_name
from ...
and, in the outer query, select not just id but e.g. x_id.
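To make the aliasing advice concrete, here is a minimal sketch that runs the x/b example through Python's built-in sqlite3 module. SQLite is only a stand-in for the database in the question, and the tables and rows are made up for illustration. The same rule applies to the third query in the question: the outer query can see only the subquery's output column names (not the x., B. or C. prefixes), and an alias cannot carry a table prefix, so AS C.SMALL_CATEGORY_TITLE is most likely what triggers the "Missing FROM keyword" error.

```python
import sqlite3

# Hypothetical tables x and b that both have columns named id and name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (id INTEGER, name TEXT);
CREATE TABLE b (id INTEGER, name TEXT);
INSERT INTO x VALUES (1, 'policy one'), (2, 'policy two');
INSERT INTO b VALUES (1, 'area one');
""")

# The subquery gives every column a unique alias, so the outer query
# can pick specific columns without any ambiguity.
rows = conn.execute("""
SELECT DISTINCT x_id, x_name, b_name
FROM (SELECT x.id   AS x_id,
             x.name AS x_name,
             b.id   AS b_id,
             b.name AS b_name
      FROM x
      LEFT JOIN b ON x.id = b.id)
ORDER BY x_id
""").fetchall()

print(rows)  # [(1, 'policy one', 'area one'), (2, 'policy two', None)]
```

The outer SELECT uses x_id, x_name and b_name directly; there is no x. or b. prefix left to resolve at that level.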

ORACLE Query to find value in other table based on dates

I have two tables, Table A has an ID and an Event Date and Table B has an ID, a Description and an Event Date.
Not all IDs in Table A appear in Table B and some IDs appear multiple times in Table B with different Descriptions for each event.
The Description in Table B is an attribute that can change over time, the Event date in Table B is the date that a given ID's Description changes from its default value (kept in another table) to the new value.
I want to find the Description in Table B that matches the Event Date in Table A so, for example
Table Sample Data
A1234 would return Green and A4567 would return Null
I can't create tables here, so I need to be able to do this with a query.
This query selects the latest description on or before each event date:
SELECT * FROM (
SELECT tabA.id, tabA.event_date, tabB.description,
-- partition by Table_A's columns: tabB.id is NULL for rows without a match
ROW_NUMBER() OVER(PARTITION BY tabA.id, tabA.event_date ORDER BY tabB.event_date DESC) rn
FROM Table_A tabA
LEFT JOIN Table_B tabB ON tabA.id = tabB.id AND tabB.event_date <= tabA.event_date
) WHERE rn = 1
If I understand your need correctly, this could be one way:
select a.id, description
from tableA A
left join
(select id,
description,
event_date from_date,
nvl(lead(event_date) over (partition by id order by event_date) - 1, date '9999-12-31') as to_date -- the latest row has no successor, so keep its range open-ended
from tableB
) B
on (A.id = B.id and a.event_date between b.from_date and b.to_date)
The idea here is to compute, for each row in tableB, the range of dates for which that row and its description are valid; given this, a simple join does the job.
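The validity-range idea can be tried out with a small sketch in Python's sqlite3 module. This is an adaptation rather than the Oracle query itself: the sample rows are invented (the original sample data is not shown), dates are stored as ISO text, and the range is checked as half-open (from_date <= event date < to_date) instead of subtracting one day. LEAD needs a SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id TEXT, event_date TEXT);
CREATE TABLE table_b (id TEXT, description TEXT, event_date TEXT);
-- Hypothetical data: A1234 changes description three times, A4567 never.
INSERT INTO table_a VALUES ('A1234', '2020-04-15'), ('A4567', '2020-04-15');
INSERT INTO table_b VALUES
  ('A1234', 'Blue',  '2020-01-01'),
  ('A1234', 'Green', '2020-03-01'),
  ('A1234', 'Red',   '2020-06-01');
""")

# For each table_b row, to_date is the next change for the same id
# (NULL for the latest row, which is treated as still valid).
rows = conn.execute("""
SELECT a.id, b.description
FROM table_a a
LEFT JOIN (
    SELECT id,
           description,
           event_date AS from_date,
           LEAD(event_date) OVER (PARTITION BY id ORDER BY event_date) AS to_date
    FROM table_b
) b
  ON a.id = b.id
 AND a.event_date >= b.from_date
 AND (b.to_date IS NULL OR a.event_date < b.to_date)
ORDER BY a.id
""").fetchall()

print(rows)  # [('A1234', 'Green'), ('A4567', None)]
```

The explicit to_date IS NULL branch matters: without it, the most recent description would never match, because a comparison against NULL is never true.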
You can left join tables like:
select a.ID , b1.DESCRIPTION
from TABLE_A a
left join TABLE_B b1 on a.ID = b1.id and a.EVENT_DATE > b1.EVENT_DATE
left join TABLE_B b2 on a.ID = b2.id and b1.EVENT_DATE < b2.EVENT_DATE and a.EVENT_DATE > b2.EVENT_DATE
where b1.id is null or b2.EVENT_DATE is null;

Workaround for delete in hive with join conditions

I'm trying to convert a SQL delete query to a Hive one. I'm using Hive 0.12, which doesn't support DELETE.
Below is the SQL query:
Delete from t1 c where exists(select 1 from t2 a where
a.emplid=c.employee_id and a.project_status='test')
I tried using NOT IN for the above query, but for certain reasons we cannot use NOT IN in our queries.
Below is the Hive query I have written, but I'm not sure about it, as it's not giving correct results. I'm pretty new to Hive. Can anyone help with this?
INSERT Overwrite table t1
select * from t1 c left outer join t2 a on (c.employee_id=a.employee_id)
where a.project_status= 'test'
and a.employee_id is null
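Move the project_status='test' condition into a subquery or into the ON clause. In the WHERE clause it can never be true together with a.employee_id is null, so the query returns no rows. Also, you should select columns only from table c.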
Example with filter in the subquery:
insert overwrite table t1
select c.*
from t1 c
left join (select employee_id
from t2
where project_status='test'
) a on (c.employee_id=a.employee_id)
where a.employee_id is null;
Example with additional condition in the ON:
insert overwrite table t1
select c.*
from t1 c
left join t2 a on (c.employee_id=a.employee_id and a.project_status='test')
where a.employee_id is null;
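As a quick check of why the filter placement matters, here is a small sketch using Python's sqlite3 module. SQLite stands in for Hive here, so it only runs the SELECT part (there is no INSERT OVERWRITE), and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (employee_id INTEGER);
CREATE TABLE t2 (employee_id INTEGER, project_status TEXT);
-- Hypothetical data: only employee 1 has a 'test' project.
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (1, 'test'), (2, 'prod');
""")

# Filter in WHERE: project_status = 'test' and employee_id IS NULL
# contradict each other, so nothing survives.
broken = conn.execute("""
SELECT c.employee_id
FROM t1 c
LEFT JOIN t2 a ON c.employee_id = a.employee_id
WHERE a.project_status = 'test'
  AND a.employee_id IS NULL
""").fetchall()

# Filter in ON: rows of t1 without a 'test' match get NULLs from t2
# and are kept, which is the "delete where exists" result.
kept = conn.execute("""
SELECT c.employee_id
FROM t1 c
LEFT JOIN t2 a
  ON c.employee_id = a.employee_id AND a.project_status = 'test'
WHERE a.employee_id IS NULL
ORDER BY c.employee_id
""").fetchall()

print(broken)  # []
print(kept)    # [(2,), (3,)]
```

Employee 1 is the only row that the original DELETE would remove, and it is the only row missing from the second result.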

Left Join with where optional

I have two tables and a WHERE clause, but I want to always return the left side, regardless of the WHERE clause.
Example
table a
-----------------
id
nome
table B
-----------------
id
nome
date
id_a
I wrote a query, and it works when no matching value exists in table B or when the WHERE clause matches.
select ta.* from table_a ta
left join table_b tb on ta.id = tb.id_a
where
tb.date = '2015-07-05' or tb.id is null
My table has a record with tb.date = '2015-07-05', and the query works for that date. But when I try the query with tb.date = '2015-07-04', it still returns the left-side rows that have no match in table_b, but it doesn't return the row whose table_b record has '2015-07-05'.
I want to fetch the left side of the join regardless of the WHERE clause.
Move the date condition from the WHERE clause into the ON clause:
select ta.* from table_a ta
left join table_b tb on (ta.id = tb.id_a and tb.date = '2015-07-05');
Also, '2015-07-05' is considered a string in Oracle. Always use to_date to compare date values.
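The difference between the two placements can be seen in a small sketch with Python's sqlite3 module. SQLite is just a stand-in for Oracle here, the rows are invented, and the dates are stored as plain text, so the to_date advice does not apply to this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, nome TEXT);
CREATE TABLE table_b (id INTEGER, nome TEXT, date TEXT, id_a INTEGER);
-- Hypothetical data: only row 1 of table_a has a table_b record.
INSERT INTO table_a VALUES (1, 'first'), (2, 'second');
INSERT INTO table_b VALUES (10, 'b-row', '2015-07-05', 1);
""")

# Date filter in WHERE: row 1 joins to the 2015-07-05 record, fails the
# date test, and is not NULL on the right side either, so it is dropped.
where_rows = conn.execute("""
SELECT ta.id
FROM table_a ta
LEFT JOIN table_b tb ON ta.id = tb.id_a
WHERE tb.date = '2015-07-04' OR tb.id IS NULL
ORDER BY ta.id
""").fetchall()

# Date filter in ON: the left side is always returned.
on_rows = conn.execute("""
SELECT ta.id
FROM table_a ta
LEFT JOIN table_b tb ON ta.id = tb.id_a AND tb.date = '2015-07-04'
ORDER BY ta.id
""").fetchall()

print(where_rows)  # [(2,)]
print(on_rows)     # [(1,), (2,)]
```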

Oracle join query

I have two tables
Table A has columns id|name|age.
Table B has columns id|name|age.
Sample Records from table A
1|xavi |23
2|christine|24
3|faisal |25
5|jude |27
Sample Records from table B
1|xavi |23
2|christine|22
3|faisal |23
4|ram |25
If an id value from table A matches one in table B, then take the record from table A only.
Also take records which are present in table A only
Also take records which are present in table B only
So my result should be
1|xavi |23
2|christine|24
3|faisal |25
4|ram |25
5|jude |27
You can simply use the UNION operator to get unique rows from both tables. UNION removes duplicate rows, but rows that differ in any column (such as age) are both kept.
SELECT * FROM tableA t1
UNION
SELECT * FROM tableB t2
You have a precedence problem here. Take all the records from table A and then the extra records from table B:
select *
from A
union all
select *
from B
where B.id not in (select A.id from A);
You can also express this with a full outer join (assuming id is not duplicated in either table):
select coalesce(A.id, B.id) as id,
coalesce(A.name, B.name) as name,
coalesce(A.age, B.age) as age
from A full outer join
B
on A.id = B.id;
In this case, the coalesce() gives priority to the values in A.
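Here is a small sketch of the "A first, then the extras from B" logic, run on the question's sample rows with Python's sqlite3 module. It uses the union all / not in form rather than the full outer join, because older SQLite builds do not support FULL OUTER JOIN. The table names table_a and table_b are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, name TEXT, age INTEGER);
CREATE TABLE table_b (id INTEGER, name TEXT, age INTEGER);
INSERT INTO table_a VALUES
  (1, 'xavi', 23), (2, 'christine', 24), (3, 'faisal', 25), (5, 'jude', 27);
INSERT INTO table_b VALUES
  (1, 'xavi', 23), (2, 'christine', 22), (3, 'faisal', 23), (4, 'ram', 25);
""")

# All rows from A, plus only those rows of B whose id is not in A.
merged = conn.execute("""
SELECT id, name, age FROM table_a
UNION ALL
SELECT id, name, age FROM table_b
WHERE id NOT IN (SELECT id FROM table_a)
ORDER BY id
""").fetchall()

# A plain UNION only removes rows that are identical in every column,
# so ids 2 and 3 show up twice with different ages.
plain_union = conn.execute("""
SELECT id, name, age FROM table_a
UNION
SELECT id, name, age FROM table_b
ORDER BY id, age
""").fetchall()

print(merged)
# [(1, 'xavi', 23), (2, 'christine', 24), (3, 'faisal', 25), (4, 'ram', 25), (5, 'jude', 27)]
print(len(plain_union))  # 7
```

The first result matches the expected output in the question; the second shows why UNION or DISTINCT alone does not give table A priority.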
select distinct * FROM
(
select ID, NAME, AGE from TableA
UNION ALL
select ID, NAME, AGE from TableB
) TableAB
Some things to consider: unless you're updating specific tables and the records are the same, it will not matter which table you're viewing the records from (because they're the same).
If you want to see which table the records come from, let me know and I'll show you how to do that as well, but the query is more complex and I don't really think it's required for the purpose described above. Let me know if this helps. Thanks, Brian
If the tables are related, you need:
Select DISTINCT *
from tableA a
Inner Join tableB b
On a.id = b.id
If not, you have to use UNION and then DISTINCT.
DISTINCT removes duplicate rows.
