Hive: Using select inside select - hadoop

First I used the query:
select name
from tab1
where id in (select id
from (select id,count(id) as a
from tab2
group by id
order by a desc limit 1) ;
and I learned that a select inside a select is not possible in Hive.
So I modified it to use a variable:
set var1= select count(id) as a from tab2 group by id order by a desc limit 1;
select name from tab1 group by name having count(id)='${hiveconf:var1}';
But the query text was simply substituted in place of '${hiveconf:var1}', and I got the same error again.
Is there any way to do this?

select t1.name
from tab1 t1
join (select id
            ,count(*) as cnt
      from tab2
      group by id
      order by cnt desc
      limit 1
     ) t2
on t2.id = t1.id
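
To see what the join against a derived table returns, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Hive here, and the sample rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; tab1/tab2 mirror the names used in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER, name TEXT);
    CREATE TABLE tab2 (id INTEGER);
    INSERT INTO tab1 VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO tab2 VALUES (1), (2), (2), (2), (3), (3);
""")

# The derived table t2 holds the single most frequent id in tab2;
# joining on it takes the place of the IN (subquery) filter.
rows = conn.execute("""
    SELECT t1.name
    FROM tab1 t1
    JOIN (SELECT id, COUNT(*) AS cnt
          FROM tab2
          GROUP BY id
          ORDER BY cnt DESC
          LIMIT 1) t2
      ON t2.id = t1.id
""").fetchall()

print(rows)
```

With this data, id 2 occurs most often in tab2, so the only name returned is the one stored for id 2.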

Related

Oracle SQL -- Finding count of rows that match date maximum in table

I am trying to use a query to return the count from rows such that the date of the rows matches the maximum date for that column in the table.
Oracle SQL: version 11.2:
The following syntax would seem to be correct (to me), and it compiles and runs. However, instead of returning JUST the count for the maximum, it returns several counts, more or less as if the "HAVING" clause wasn't there.
Select ourDate, Count(1) as OUR_COUNT
from schema1.table1
group by ourDate
HAVING ourDate = max(ourDate) ;
How can this be fixed, please?
You can use:
SELECT MAX(ourDate) AS ourDate,
COUNT(*) KEEP (DENSE_RANK LAST ORDER BY ourDate) AS ourCount
FROM schema1.table1
or:
SELECT ourDate,
COUNT(*) AS our_count
FROM (
SELECT ourDate,
RANK() OVER (ORDER BY ourDate DESC) AS rnk
FROM schema1.table1
)
WHERE rnk = 1
GROUP BY ourDate
Which, for the sample data:
CREATE TABLE table1 (ourDate) AS
SELECT SYSDATE FROM DUAL CONNECT BY LEVEL <= 5 UNION ALL
SELECT SYSDATE - 1 FROM DUAL;
Both output:
OURDATE             | OUR_COUNT
2022-06-28 13:35:01 | 5
I don't know if I understand what you want. Try this:
Select x.ourDate, Count(1) as OUR_COUNT
from schema1.table1 x
where x.ourDate = (select max(y.ourDate) from schema1.table1 y)
group by x.ourDate
One option is to use a subquery which fetches maximum date:
select ourdate, count(*)
from table1
where ourdate = (select max(ourdate)
from table1)
group by ourdate;
Or, a more modern approach (if your database version supports it; 11g doesn't, though):
select ourdate, count(*)
from table1
group by ourdate
order by ourdate desc
fetch first 1 rows only;
You can use this SQL query:
select MAX(ourDate),COUNT(1) as OUR_COUNT
from schema1.table1
where ourDate = (select MAX(ourDate) from schema1.table1)
group by ourDate;
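
Why the original HAVING clause filters nothing: after GROUP BY ourDate, every group contains a single date value, so MAX(ourDate) inside a group always equals that group's own ourDate. The sketch below shows this in SQLite via Python's sqlite3, used as a stand-in for Oracle; the rows are hypothetical and the dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ourDate TEXT);
    INSERT INTO table1 VALUES
        ('2022-06-28'), ('2022-06-28'), ('2022-06-28'),
        ('2022-06-27'), ('2022-06-26'), ('2022-06-26');
""")

# MAX(ourDate) is evaluated per group, so the condition holds for every group.
having_rows = conn.execute("""
    SELECT ourDate, COUNT(1) AS our_count
    FROM table1
    GROUP BY ourDate
    HAVING ourDate = MAX(ourDate)
    ORDER BY ourDate
""").fetchall()

# The scalar subquery computes the maximum over the whole table instead.
subquery_rows = conn.execute("""
    SELECT ourDate, COUNT(*) AS our_count
    FROM table1
    WHERE ourDate = (SELECT MAX(ourDate) FROM table1)
    GROUP BY ourDate
""").fetchall()

print(having_rows)
print(subquery_rows)
```

The first query returns one row per date, while the second returns only the row for the latest date.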

ORACLE SQL DEVELOPER-select count(*) from multiple databases

How can I select count(*) from two different databases(call them ZEOTA and SP) having as result:
Zeota SP
88 3
I have tried this:
SELECT COUNT(CONSTRAINT_TYPE) NumberOfPrimaryKeys_Zeota
FROM ALL_CONSTRAINTS
WHERE CONSTRAINT_TYPE = 'P'
AND
OWNER = 'ZEOTA';
SELECT COUNT(*) AS NumberOfAttributes_SP
FROM ALL_COL_COMMENTS
WHERE OWNER = 'SP';
But the output shows two separate query results:
Query result 1:
Zeota
88
Query Result 2:
SP
3
I am also trying to do it this way, but I am having issues:
SELECT
COUNT(DISTINCT TABLE_NAME) AS ZNumOfTables,
COUNT(DISTINCT TABLE_NAME) AS SNumOfTables,
COUNT(DISTINCT TABLE_NAME) AS PNumOfTables
FROM ALL_TAB_COLUMNS
WHERE OWNER = 'SP';
WHERE OWNER = 'ZEOTA';
One option is to use your current queries as CTEs and then just cross join them:
with
t1 as
(select count(constraint_type) num_pk_zeota
from all_constraints
where owner = 'ZEOTA'
and constraint_type = 'P'
),
t2 as
(select count(*) num_attr_sp
from all_col_comments
where owner = 'SP'
)
select a.num_pk_zeota,
b.num_attr_sp
from t1 a cross join t2 b;
P.S. What you call "databases" are users (schemas) in Oracle.
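
As a rough illustration of the CTE-plus-cross-join pattern, the sketch below runs it in SQLite through Python's sqlite3. The tables constraints and col_comments are hypothetical stand-ins for ALL_CONSTRAINTS and ALL_COL_COMMENTS, and the rows are made up; each CTE yields exactly one row, so the cross join yields exactly one combined row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the Oracle dictionary views.
    CREATE TABLE constraints (owner TEXT, constraint_type TEXT);
    CREATE TABLE col_comments (owner TEXT, column_name TEXT);
    INSERT INTO constraints VALUES
        ('ZEOTA', 'P'), ('ZEOTA', 'P'), ('ZEOTA', 'C'), ('SP', 'P');
    INSERT INTO col_comments VALUES
        ('SP', 'a'), ('SP', 'b'), ('SP', 'c'), ('ZEOTA', 'x');
""")

# Each CTE is a one-row aggregate; CROSS JOIN places them side by side.
row = conn.execute("""
    WITH t1 AS (SELECT COUNT(*) AS num_pk_zeota
                FROM constraints
                WHERE owner = 'ZEOTA' AND constraint_type = 'P'),
         t2 AS (SELECT COUNT(*) AS num_attr_sp
                FROM col_comments
                WHERE owner = 'SP')
    SELECT a.num_pk_zeota, b.num_attr_sp
    FROM t1 a CROSS JOIN t2 b
""").fetchone()

print(row)
```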

Create Oracle database query

I have the following table (tb1):
I need to create a query that:
Selects the oldest Date_created having Status 001.
Does not select a PCR if the same PCR also has status 002.
For the table above, this query should return the following table:
Can anyone help me create it?
Final query:
select q2.id, q2.PCR, q2.status, q2.date_created
from (select pcr, min(date_created) date_created
      from table1 t1
      where not exists (select *
                        from table1 t2
                        where t1.pcr = t2.pcr
                        and t2.status = '002')
      group by pcr) q1
inner join (select * from table1) q2
on q1.PCR = q2.PCR
and q1.date_created = q2.date_created
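
Since the original table images are not available, here is a sketch of how the final query behaves on hypothetical rows, run in SQLite through Python's sqlite3 as a stand-in for Oracle. The column names follow the query; the data is an assumption made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, pcr TEXT, status TEXT, date_created TEXT);
    -- Hypothetical rows: PCR 'B' also has a status 002 row, so it is excluded.
    INSERT INTO table1 VALUES
        (1, 'A', '001', '2020-01-01'),
        (2, 'A', '001', '2020-02-01'),
        (3, 'B', '001', '2020-01-15'),
        (4, 'B', '002', '2020-03-01'),
        (5, 'C', '001', '2020-05-01');
""")

# q1 finds the oldest date per PCR that has no status 002 row;
# joining back to table1 recovers the full row for that date.
rows = conn.execute("""
    SELECT q2.id, q2.pcr, q2.status, q2.date_created
    FROM (SELECT pcr, MIN(date_created) AS date_created
          FROM table1 t1
          WHERE NOT EXISTS (SELECT *
                            FROM table1 t2
                            WHERE t1.pcr = t2.pcr
                              AND t2.status = '002')
          GROUP BY pcr) q1
    INNER JOIN table1 q2
       ON q1.pcr = q2.pcr
      AND q1.date_created = q2.date_created
    ORDER BY q2.id
""").fetchall()

print(rows)
```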

Get unmatched records without using oracle minus except not in

I have two tables, each with a Name column, and I want only the rows from Table1 that are not in Table2.
Table1
------
|Name|
------
|A|
|B|
|C|
|D|
Table2
------
|Name|
------
|A|
|B|
Answer
------
|C|
|D|
I am able to do it using MINUS:
select name from table1
minus
select name from table2
or using NOT IN:
select name from table1 where name
not in (
select name from table2)
But my manager asked me to do it with an alternative solution that does not use MINUS, EXCEPT, or NOT IN.
Is there a way to do that? It would be great if someone could help me with it.
I need to do it in Oracle PL/SQL.
The one option left to you is NOT EXISTS:
SELECT t1.name
FROM table1 t1
WHERE NOT EXISTS (SELECT 'X'
FROM table2 t2
WHERE t2.name = t1.name);
Update: Using Join
with table_ as
(
select t1.name t1_name, t2.name t2_name
from table1 t1
left join table2 t2
on t1.name = t2.name)
select t1_name
from table_
where t2_name is null;
Or just
select t1.name
from table1 t1
left join table2 t2
on t1.name = t2.name
where t2.name is null;
Another alternative is to use an outer join and then filter rows that don't have a value in the 2nd table:
with t1 as (select 'A' name from dual union all
select 'B' name from dual union all
select 'C' name from dual union all
select 'D' name from dual),
t2 as (select 'A' name from dual union all
select 'B' name from dual)
select t1.name
from t1
left outer join t2 on (t1.name = t2.name)
where t2.name is null;
NAME
----
D
C
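
For a quick check that the two approaches agree, the sketch below runs both the NOT EXISTS form and the LEFT JOIN ... IS NULL form against the sample data from the question, using SQLite through Python's sqlite3 as a stand-in for Oracle. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO table2 VALUES ('A'), ('B');
""")

# Anti-join written with a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT t1.name
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 'X' FROM table2 t2 WHERE t2.name = t1.name)
    ORDER BY t1.name
""").fetchall()

# Anti-join written as an outer join that keeps only unmatched rows.
left_join_rows = conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.name = t2.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
""").fetchall()

print(not_exists_rows, left_join_rows)
```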

How to optimize this SELECT with sub query Oracle

Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes on TABLE2: on ID; on ID & STARTDTE; on STARTDTE; and on ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in the inner queries interferes with the optimizer's ability to use indexes. Also, I wouldn't recommend a solution that scans all of TABLE2, given its size.
This is why, in this case, I would suggest a function that efficiently retrieves the information you are looking for (two index scans per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
   RETURN table2.vid%TYPE IS
   l_result table2.vid%TYPE;
BEGIN
   SELECT vid
     INTO l_result
     FROM (SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1
           UNION ALL
           SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id2 = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1
            ORDER BY startdte DESC)
    WHERE rownum = 1;
   RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The column order within each index is very important.
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
  SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
    CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
    DESC NULLS LAST) AS rn
  FROM table1 t1
  JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what it's doing, and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
But this is largely speculation without being able to see where your existing query is actually slow.
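
To make the rank-per-id idea concrete, here is a minimal sketch run in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). It is a simplified variant, not the Oracle query itself: future dates are filtered in a WHERE clause rather than with a CASE expression, a fixed cutoff parameter stands in for SYSDATE, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, id2 INTEGER, vid TEXT, startdte TEXT);
    INSERT INTO table1 VALUES (1), (2);
    -- Hypothetical rows: id 1 matches through both id and id2.
    INSERT INTO table2 VALUES
        (1,    NULL, 'v1', '2020-01-01'),
        (NULL, 1,    'v2', '2021-01-01'),
        (1,    NULL, 'v3', '2999-01-01'),
        (2,    NULL, 'v4', '2019-06-01');
""")

cutoff = "2022-01-01"  # fixed stand-in for SYSDATE so the result is repeatable

# Rank each matching table2 row per table1 id, newest first, keep rank 1.
rows = conn.execute("""
    SELECT id AS col1, vid
    FROM (SELECT t1.id, t2.vid,
                 RANK() OVER (PARTITION BY t1.id
                              ORDER BY t2.startdte DESC) AS rn
          FROM table1 t1
          JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
          WHERE t2.startdte < ?)
    WHERE rn = 1
    ORDER BY col1
""", (cutoff,)).fetchall()

print(rows)
```

Because the join here is an inner join, a table1 id with no qualifying table2 row drops out of the result, whereas the original correlated subquery would return it with a null; a LEFT JOIN would be needed to preserve that behaviour.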
