What does the following statement equate to when table2.level = 3 for some rows?
SELECT *
FROM table1, table2
WHERE table1.value1 =
CASE table2.id
WHEN 1 THEN table2.value1(+)
WHEN 2 THEN table2.value2(+)
END
The Oracle docs state that CASE returns NULL if none of the WHEN conditions is satisfied and the ELSE clause is omitted, but replacing the CASE expression with NULL yields 0 rows instead.
The result, by the way, is that the query results omit the optional table2 columns for rows in which table2.level = 3, as if the entire comparison evaluated to TRUE (for those particular rows).
Can someone elaborate on how/why this works?
Edit: Replacing the (+) operator with the LEFT OUTER JOIN syntax helped me gain insight into what it means when the CASE expression returns NULL.
The following statements are equivalent if table2.id IN (1,2,3):
SELECT *
FROM table1
LEFT OUTER JOIN table2
ON table1.value1 =
CASE table2.id
WHEN 1 THEN table2.value1
WHEN 2 THEN table2.value2
END
SELECT *
FROM table1
LEFT OUTER JOIN table2
ON table1.value1 =
CASE table2.id
WHEN 1 THEN table2.value1
WHEN 2 THEN table2.value2
WHEN 3 THEN NULL
END
SELECT *
FROM table1
LEFT OUTER JOIN table2
ON table1.value1 =
CASE table2.id
WHEN 1 THEN table2.value1
WHEN 2 THEN table2.value2
ELSE NULL
END
I think that answers the question: the CASE expression does evaluate to NULL, but only for table2 rows whose id is not covered by a WHEN branch. Those rows simply fail the join condition, while table2 rows with table2.id = 1 OR table2.id = 2 are still outer-joined.
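To see that conclusion in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle. It uses the ANSI LEFT OUTER JOIN form rather than (+), and the table contents are made up purely for illustration. It contrasts the CASE-based outer join with a plain comparison against a literal NULL.

```python
import sqlite3


def demo_case_null_outer_join():
    """Return (left_join_rows, where_null_count) for a tiny hypothetical dataset."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (value1 INTEGER);
        CREATE TABLE table2 (id INTEGER, value1 INTEGER, value2 INTEGER);
        INSERT INTO table1 VALUES (10), (20), (30);
        -- id 3 is not covered by any WHEN branch, so CASE yields NULL for it,
        -- even though its value1/value2 (30) would otherwise match table1.value1 = 30
        INSERT INTO table2 VALUES (1, 10, 99), (2, 99, 20), (3, 30, 30);
    """)
    # Outer join: table1 rows survive; id-3 rows of table2 never match (NULL = 30 is UNKNOWN).
    left_join_rows = con.execute("""
        SELECT t1.value1, t2.id
        FROM table1 t1
        LEFT OUTER JOIN table2 t2
          ON t1.value1 = CASE t2.id
                           WHEN 1 THEN t2.value1
                           WHEN 2 THEN t2.value2
                         END
        ORDER BY t1.value1
    """).fetchall()
    # A comparison with a literal NULL is never TRUE, so this filtered cross join keeps nothing.
    where_null_count = con.execute(
        "SELECT COUNT(*) FROM table1, table2 WHERE table1.value1 = NULL"
    ).fetchone()[0]
    con.close()
    return left_join_rows, where_null_count
```

Here `demo_case_null_outer_join()` returns `[(10, 1), (20, 2), (30, None)]` and `0`: the row with value1 = 30 is kept with NULL table2 columns, while the literal-NULL version (which is no longer an outer join) returns no rows at all.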
I am trying to select values from three different tables.
When I select all columns it works well, but if I select specific columns, SQL Error [42000]: JDBC-8027:Column name is ambiguous. appears.
This is the query that selects all columns and works well:
SELECT
*
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
And this is the query that fails:
SELECT DISTINCT
x.POLICY_NO,
x.POLICY_TITLE,
policy_no_count ,
B.SMALL_CATEGORY_SID,
C.SMALL_CATEGORY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
What I am trying to do: if an x.POLICY_NO value is duplicated in 18 or more rows (policy_no_count > 17), I want to change the C.SMALL_CATEGORY_TITLE values to 'ZZ' and also change the B.SMALL_CATEGORY_SID values to NULL.
That is why I use two SELECTs (an outer query over a subquery), like this:
SELECT DISTINCT
x.POLICY_NO,
CASE WHEN (policy_no_count > 17) THEN 'ZZ' ELSE C.SMALL_CATEGORY_TITLE END AS C.SMALL_CATEGORY_TITLE,
CASE WHEN (policy_no_count > 17) THEN NULL ELSE B.SMALL_CATEGORY_SID END AS B.SMALL_CATEGORY_SID,
x.POLICY_TITLE
FROM (SELECT x.*, B.*,C.* , COUNT(*) OVER (PARTITION BY x.POLICY_NO) policy_no_count
FROM YIP.YOUTH_POLICY x
LEFT JOIN
YIP.YOUTH_POLICY_AREA B
ON x.POLICY_NO = B.POLICY_NO
LEFT JOIN
YIP.YOUTH_SMALL_CATEGORY C
ON B.SMALL_CATEGORY_SID = C.SMALL_CATEGORY_SID
ORDER BY x.POLICY_NO);
If I use that query, I get SQL Error [42000]: JDBC-8006:Missing FROM keyword. at line 3, column 80 of null.
I know I should solve it step by step. Is there any way to select specific columns?
That's most probably because of SELECT x.*, B.*, C.*. Avoid asterisks: explicitly name all the columns you need, and then pay attention to possible duplicate column names; if you have any, use column aliases.
For example, if that select (which is in a subquery) evaluates to
select x.id, x.name, b.id, b.name
then the outer query doesn't know which id you want, as two columns are named id (and two are named name), so you'd have to
select x.id as x_id,
x.name as x_name,
b.id as b_id,
b.name as b_name
from ...
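and, in the outer query, select not just id, but e.g. x_id. The same goes for the table aliases: x, B and C are not visible outside the subquery, so the outer query has to use the subquery's column names (or aliases). Also, a column alias cannot be qualified, so AS C.SMALL_CATEGORY_TITLE is invalid; that dot is most likely what triggers the Missing FROM keyword error at line 3, column 80.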
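Putting the aliasing advice together with the goal from the question, a minimal sketch could look like this. It runs on Python's sqlite3 as a stand-in for the original database, drops the YIP schema prefix, uses made-up rows, and uses a hypothetical threshold of 2 instead of 17 so the tiny dataset triggers it. COUNT(*) OVER needs SQLite 3.25 or later.

```python
import sqlite3

THRESHOLD = 2  # hypothetical stand-in for the question's 17


def policy_rows():
    """Subquery exposes uniquely aliased columns; outer query applies the CASE logic."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE youth_policy (policy_no TEXT, policy_title TEXT);
        CREATE TABLE youth_policy_area (policy_no TEXT, small_category_sid INTEGER);
        CREATE TABLE youth_small_category (small_category_sid INTEGER, small_category_title TEXT);
        INSERT INTO youth_policy VALUES ('P1', 'Policy one'), ('P2', 'Policy two');
        INSERT INTO youth_policy_area VALUES ('P1', 1), ('P1', 2), ('P1', 3), ('P2', 1);
        INSERT INTO youth_small_category VALUES (1, 'Jobs'), (2, 'Housing'), (3, 'Education');
    """)
    rows = con.execute("""
        SELECT DISTINCT
               s.x_policy_no,
               s.x_policy_title,
               -- plain (unqualified) output aliases; no dots allowed here
               CASE WHEN s.policy_no_count > :threshold THEN 'ZZ'
                    ELSE s.c_small_category_title END AS small_category_title,
               CASE WHEN s.policy_no_count > :threshold THEN NULL
                    ELSE s.b_small_category_sid END AS small_category_sid
        FROM (SELECT x.policy_no            AS x_policy_no,
                     x.policy_title         AS x_policy_title,
                     b.small_category_sid   AS b_small_category_sid,
                     c.small_category_title AS c_small_category_title,
                     COUNT(*) OVER (PARTITION BY x.policy_no) AS policy_no_count
              FROM youth_policy x
              LEFT JOIN youth_policy_area b ON x.policy_no = b.policy_no
              LEFT JOIN youth_small_category c
                     ON b.small_category_sid = c.small_category_sid) s
        ORDER BY 1
    """, {"threshold": THRESHOLD}).fetchall()
    con.close()
    return rows
```

P1 has three joined rows, so its values collapse to a single ('P1', 'Policy one', 'ZZ', None) row under DISTINCT, while P2 keeps its real category.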
I don't understand why the two queries below return different counts. Case 1 returns more rows, while Case 2 returns fewer: when the WHERE clause is moved outside, fewer records are fetched.
Case 1
SELECT COUNT(1)
FROM (
SELECT *
FROM (SELECT * FROM TABLE1 WHERE COL1 = 123) A
LEFT JOIN TABLE2 B ON B.COL2=A.COL4
LEFT JOIN TABLE3 C ON C.COL3=B.COL2
)
Case 2
SELECT COUNT(1)
FROM (
SELECT *
FROM (SELECT * FROM TABLE1 ) A
LEFT JOIN TABLE2 B ON B.COL2=A.COL4
LEFT JOIN TABLE3 C ON C.COL3=B.COL2
)
WHERE COL1 = 123
Theoretical explanation:
Consider a left outer join of tables A and B. A condition (filter) on table B has different effects if it is in the join condition (ON clause) vs. in the WHERE clause. EDIT: The filter on B being in the ON condition is equivalent to replacing B with a subquery where the filter is applied first (similar to the OP's example).
If it's in the ON clause, then the rows in table B are filtered for that condition, and then the left join is performed. Then the result of the query will include rows from A (with NULL for the B side) whenever there are no rows in B that satisfy the filter and match the row in A on the join condition.
On the other hand, if the filter on B comes later in the execution, in a WHERE clause, then the left join is performed first. Only then is the WHERE clause applied. The WHERE clause is very likely (depending on the conditions on B) to reject all the rows from A that didn't have a matching row in B - because for such rows, all the values from B are NULL.
In your case, assuming COL1 only exists in table B, then the condition COL1=123 in a WHERE clause will effectively cause the left join to produce the same result as an inner join: any rows from A that didn't have a match in B will come from the left join with COL1 as NULL, so they will fail the filter condition. When you put COL1=123 in the ON clause, that check is done BEFORE the "outer join" operation.
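The difference can be reproduced with a minimal sketch on Python's sqlite3 (hypothetical tables a and b with made-up rows). It counts the same left join three ways: filter in the ON clause, filter in the WHERE clause, and filter applied in a subquery first, as in the EDIT above.

```python
import sqlite3


def count_filter_placement():
    """Return (in_on, in_where, in_subquery) row counts for a LEFT JOIN filtered on b.col1."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (a_id INTEGER, col1 INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        -- a.id 1 matches a b row with col1 = 123, a.id 2 one with col1 = 456, a.id 3 none
        INSERT INTO b VALUES (1, 123), (2, 456);
    """)
    # Filter in ON: b is filtered before the outer join, so every row of a survives.
    in_on = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON b.a_id = a.id AND b.col1 = 123"
    ).fetchone()[0]
    # Filter in WHERE: rows with NULL (or non-matching) b.col1 are rejected after the join.
    in_where = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON b.a_id = a.id WHERE b.col1 = 123"
    ).fetchone()[0]
    # Pre-filtered subquery on b: equivalent to the ON-clause version.
    in_subquery = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN (SELECT * FROM b WHERE col1 = 123) fb "
        "ON fb.a_id = a.id"
    ).fetchone()[0]
    con.close()
    return in_on, in_where, in_subquery
```

`count_filter_placement()` returns `(3, 1, 3)`: the WHERE version behaves like an inner join, while the other two keep all three rows of a.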
I have a question about Hive. Here is the scenario:
I am using a Hive action in Oozie;
I have a query that does successive LEFT JOINs on different tables;
The total number of rows to be inserted is about 35 million;
At first, the job crashed due to lack of memory, so I set hive.auto.convert.join=false; the query then ran successfully, but it took 4 hours to finish;
I tried reordering the LEFT JOINs, putting the large tables at the end, but got the same result: about 4 hours;
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, knowing the structure of the query, is there a way to rewrite it so that I can avoid so many JOINs?
Thanks in advance
PS: Even with vectorization enabled, I got the same timing.
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of the tables is not likely to have any impact with the cost-based optimizer.
(4) Basically, I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one table.
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;
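The "product" column in that diagnostic query estimates how many rows the chain of LEFT JOINs produces for one id: the per-table match counts multiply, and a table with no match still contributes one (NULL-extended) row. A minimal sketch on Python's sqlite3, with hypothetical rows for a single id, checks that estimate against the actual join.

```python
import sqlite3
from math import prod


def joined_vs_product():
    """Compare the real LEFT JOIN row count for id 7 with the product of per-table counts."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER);
        CREATE TABLE table2 (id INTEGER);
        CREATE TABLE table3 (id INTEGER);
        CREATE TABLE table4 (id INTEGER);
        INSERT INTO table1 VALUES (7), (7);
        INSERT INTO table2 VALUES (7), (7), (7);
        -- table3 has no row for id 7, so the LEFT JOIN keeps one NULL row (factor 1)
        INSERT INTO table4 VALUES (7), (7), (7), (7), (7);
    """)
    joined = con.execute("""
        SELECT COUNT(*)
        FROM table1 t1
        LEFT JOIN table2 t2 ON t2.id = t1.id
        LEFT JOIN table3 t3 ON t3.id = t1.id
        LEFT JOIN table4 t4 ON t4.id = t1.id
        WHERE t1.id = 7
    """).fetchone()[0]
    # Table names come from a fixed tuple, so the f-string is safe here.
    counts = [con.execute(f"SELECT COUNT(*) FROM {t} WHERE id = 7").fetchone()[0]
              for t in ("table1", "table2", "table3", "table4")]
    con.close()
    # Same rule as the diagnostic query: a zero count contributes a factor of 1.
    product = prod(c if c > 0 else 1 for c in counts)
    return joined, counts, product
```

With counts of 2, 3, 0 and 5, both the join and the product come out at 30 rows, which is how a few non-unique ids can inflate a 35-million-row insert.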
I need help with my SQL query. I have two tables that I need to join using a LEFT OUTER JOIN, and then I need to create a database view over that join. If I query the view for name A, I need to get A's latest brand, "AP".
Table 1
ID  name  address
-----------------
1   A     ATL
2   B     ATL

Table 2
ID  PER_ID  brand  DATEE
---------------------------------
1   1       MS     5/19/17:1:00pm
2   1       XB     5/19/17:1:05pm
3   1       AP     5/19/17:2:00pm
4   2       RO     5/19/17:3:00pm
5   2       WE     5/19/17:4:00pm
I tried query a, which returns the correct result, but I get Problem 1 when I try to build the database view on top of the join. I tried query b, but when I query my view in Oracle SQL Developer I still get all the rows, not just the latest ones.
query a:
SELECT * from table_1
left outer join table_2 on table_1.ID = Table_2.PER_ID
AND table_2.DATE = (SELECT MAX(DATE) from table_2 z where z.PER_ID = table_2.PER_ID)
Problem 1
Error report -
ORA-01799: a column may not be outer-joined to a subquery
01799. 00000 - "a column may not be outer-joined to a subquery"
*Cause: <expression>(+) <relop> (<subquery>) is not allowed.
*Action: Either remove the (+) or make a view out of the subquery.
In V6 and before, the (+) was just ignored in this case.
Query b:
SELECT * from table_1
left outer join(SELECT PER_ID,brand, max(DATEE) from table_2 group by brand,PER_ID) t2 on table_1.ID = t2.PER_ID
Use row_number():
select t1.id, t1.name, t1.address, t2.id as t2_id, t2.brand, t2.datee
from table_1 t1 left outer join
(select t2.*,
row_number() over (partition by per_id order by datee desc) as seqnum
from table_2 t2
) t2
on t1.ID = t2.PER_ID and t2.seqnum = 1;
When defining a view, you should be in the habit of listing the columns explicitly.
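A minimal sketch of the row_number() approach wrapped in a view with an explicit column list, run on Python's sqlite3 as a stand-in for Oracle (window functions need SQLite 3.25 or later). The DATEE values are rewritten as ISO-8601 text, an assumption made so that text ordering matches time ordering; in Oracle the column would be a DATE.

```python
import sqlite3


def latest_brand_view():
    """Build the view over the LEFT JOIN and return (name, brand) pairs from it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_1 (id INTEGER, name TEXT, address TEXT);
        CREATE TABLE table_2 (id INTEGER, per_id INTEGER, brand TEXT, datee TEXT);
        INSERT INTO table_1 VALUES (1, 'A', 'ATL'), (2, 'B', 'ATL');
        -- ISO-8601 text (assumption) so that string order equals time order
        INSERT INTO table_2 VALUES
          (1, 1, 'MS', '2017-05-19 13:00'),
          (2, 1, 'XB', '2017-05-19 13:05'),
          (3, 1, 'AP', '2017-05-19 14:00'),
          (4, 2, 'RO', '2017-05-19 15:00'),
          (5, 2, 'WE', '2017-05-19 16:00');
        -- explicit column list, as recommended above
        CREATE VIEW latest_brand (id, name, address, t2_id, brand, datee) AS
        SELECT t1.id, t1.name, t1.address, t2.id, t2.brand, t2.datee
        FROM table_1 t1
        LEFT OUTER JOIN (SELECT t2.*,
                                ROW_NUMBER() OVER (PARTITION BY per_id
                                                   ORDER BY datee DESC) AS seqnum
                         FROM table_2 t2) t2
          ON t1.id = t2.per_id AND t2.seqnum = 1;
    """)
    rows = con.execute("SELECT name, brand FROM latest_brand ORDER BY name").fetchall()
    con.close()
    return rows
```

Querying the view gives one row per person, `[('A', 'AP'), ('B', 'WE')]`, because the seqnum = 1 filter sits in the ON clause of a join against an inline view rather than in a correlated subquery, so ORA-01799 does not come up.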
I have a basic query (rewritten with vague names). I do not understand why Hive asks for the t2.description column in the CASE expression to be added to the GROUP BY. I appeased it and added it, but of course I get a NULL value for that column in every row... If I take out the CASE expression and query the raw data, I get all the lovely descriptions; it only fails when I add some logic with the CASE expression. I am new to Hive and understand it is not ANSI SQL, but I did not imagine it to be this finicky.
select
t1.columnid as column_id,
(case when t2.description in ('description1','description2','description3') then t2.description else null end) as label_description
from table1 t1
left outer join table2 t2 on (t1.inresult = t2.inresult)
group by
t1.columnid
It's often difficult to understand the actual problem from the error messages shown by Hive's SQL parser. The problem here is that you are selecting 2 columns but only applying the GROUP BY to one column. To make this query executable, you must do one of the following:
Group by both column 1 and column 2
select t1.columnid as column_id,
(case when t2.description in ('description1','description2','description3') then t2.description else null end) as label_description
from table1 t1
left outer join table2 t2 on (t1.inresult = t2.inresult)
group by t1.columnid,
(case when t2.description in ('description1','description2','description3') then t2.description else null end);
Do not use a GROUP BY statement
select t1.columnid as column_id,
(case when t2.description in ('description1','description2','description3') then t2.description else null end) as label_description
from table1 t1
left outer join table2 t2 on (t1.inresult = t2.inresult)
Apply an aggregate function to column 2
select t1.columnid as column_id,
MIN(case when t2.description in ('description1','description2','description3') then t2.description else null end) as label_description
from table1 t1
left outer join table2 t2 on (t1.inresult = t2.inresult)
group by t1.columnid
In Hive, if you use GROUP BY, every column you select must either appear in the GROUP BY clause or be wrapped in an aggregate function such as MAX, MIN or SUM.
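To show what the aggregate option returns, here is a minimal sketch on Python's sqlite3 with made-up rows. Note that SQLite is more permissive than Hive and does not enforce the GROUP BY rule, so this only illustrates the result shape of the MIN(CASE ...) version, not Hive's error.

```python
import sqlite3


def labels_per_column():
    """One row per columnid; MIN ignores NULLs, so a matching description wins if present."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (columnid INTEGER, inresult INTEGER);
        CREATE TABLE table2 (inresult INTEGER, description TEXT);
        INSERT INTO table1 VALUES (1, 100), (2, 200), (3, 300);
        INSERT INTO table2 VALUES (100, 'description1'), (100, 'other'),
                                  (200, 'description3'), (300, 'other');
    """)
    rows = con.execute("""
        SELECT t1.columnid AS column_id,
               MIN(CASE WHEN t2.description IN ('description1', 'description2', 'description3')
                        THEN t2.description ELSE NULL END) AS label_description
        FROM table1 t1
        LEFT OUTER JOIN table2 t2 ON (t1.inresult = t2.inresult)
        GROUP BY t1.columnid
        ORDER BY column_id
    """).fetchall()
    con.close()
    return rows
```

The result is `[(1, 'description1'), (2, 'description3'), (3, None)]`: columnid 1 keeps its listed description even though it also joins to an unlisted one, and columnid 3 gets NULL because none of its descriptions are in the list.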