data not retrieved in the same order - oracle

I have a query which returns a list of SIMs. Each SIM is linked to a Customer. The SIMs are in the T_SIM table and the Customers are in the T_CUSTOMER table. There can be more than one SIM linked to a single Customer. When returning the SIMs, the query also returns the Customer details.
The T_SIM table has a foreign key to the T_CUSTOMER table.
The issue is:
1. First run the query requesting the top 100 records, ordered by CUSTOMER_CODE ascending.
2. Now run the same query requesting the top 1000 records, ordered by CUSTOMER_CODE ascending.
In point #2, the first 100 of the 1000 records are not the same as the result of point #1. The records are shuffled; the order is not consistent.
To resolve this I added ROWID to the ORDER BY, after CUSTOMER_CODE.
But the client did not accept that solution.
Could you please suggest an alternative way to resolve the issue? The data type of CUSTOMER_CODE is VARCHAR2.
Below is the query:
SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
WHERE ROWNUM <= 1000
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
In both cases the result is ordered by CUSTOMER_CODE, but the SIMs belonging to each customer do not come back in the same order.

The problem is that you first limit the number of rows in the inner query on T_SIM (so which rows are picked among equal CUSTOMER_CODE values is arbitrary), and only then order your output.
So what you should do is remove the ROWNUM <= 1000 filter from the inner query and
put it at the very top level, like this:
SELECT * FROM
(SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
) WHERE ROWNUM <= 1000
This is because you first want to build the complete ordered result set, and only then return the top 1000 SIM records ordered by CUSTOMER_CODE.
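If several SIMs share the same CUSTOMER_CODE, Oracle is free to return those tied rows in any order, and that alone can make the first 100 rows differ between runs. A minimal sketch of a deterministic variant, assuming SIM_ID is unique in T_SIM (an assumption; the question does not say so), adds it as a tie-breaker instead of ROWID:

```sql
-- Sketch only. Assumption: SIM_ID is unique in T_SIM (e.g. its primary key).
SELECT *
FROM  (SELECT TT.SIM_ID,
              TT.IMSI,
              TT.MSISDN,
              TT.SECONDARY_MSISDN,
              TT.CUSTOMER_ID,
              TT.SIM_STATE,
              TCU.CUSTOMER_CODE
       FROM   T_SIM TT
       LEFT OUTER JOIN T_CUSTOMER TCU
              ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
       -- CUSTOMER_CODE alone is not unique per SIM, so SIM_ID breaks the ties
       ORDER  BY TCU.CUSTOMER_CODE ASC, TT.SIM_ID ASC)
WHERE  ROWNUM <= 1000;
```

With a unique column at the end of the ORDER BY, the first 100 rows of the top-1000 result should match the top-100 result, as long as the data does not change between runs.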

Related

select statement should return count as zero if no row return using group by clause

I have a table student_info with a column "status". The status can be P (present), A (absent), S (ill), T (transfer) or L (left).
The expected output is as below.
status count(*)
P 12
S 1
A 2
T 0
L 0
But the actual output is:
Status Count(*)
P 12
S 1
A 2
We need rows for statuses T and L as well, with a count of zero, even though no records with those statuses exist in the DB.
@mkuligowski's approach is close, but you need an outer join from the CTE providing all of the possible status values to your real table, and then you need to count only the entries that actually match:
-- CTE to generate all possible status values
with stored_statuses (status) as (
select 'A' from dual
union all select 'L' from dual
union all select 'P' from dual
union all select 'S' from dual
union all select 'T' from dual
)
select ss.status, count(si.status)
from stored_statuses ss
left join student_info si on si.status = ss.status
group by ss.status;
STATUS COUNT(SI.STATUS)
------ ----------------
P 12
A 2
T 0
S 1
L 0
The CTE acts as a dummy table holding the five statuses you want to count. That is then outer joined to your real table - the outer join means the rows from the CTE are still included even if there is no match - and then the rows that are matched in your table are counted. That allows the zero counts to be included.
You could also do this with a collection:
select ss.status, count(si.status)
from (
select column_value as status from table(sys.odcivarchar2list('A','L','P','S','T'))
) ss
left join student_info si on si.status = ss.status
group by ss.status;
It would be preferable to have a physical table which holds those values (and their descriptions); you could also then have a primary/foreign key relationship to enforce the allowed values in your existing table.
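A rough sketch of such a lookup table follows. The names student_status and fk_student_info_status are made up for illustration, and it assumes student_info.status is a single-character VARCHAR2 column:

```sql
-- Hypothetical lookup table holding the allowed statuses and their descriptions
CREATE TABLE student_status (
  status      VARCHAR2(1)  PRIMARY KEY,
  description VARCHAR2(20) NOT NULL
);

INSERT INTO student_status (status, description) VALUES ('P', 'present');
INSERT INTO student_status (status, description) VALUES ('A', 'absent');
INSERT INTO student_status (status, description) VALUES ('S', 'ill');
INSERT INTO student_status (status, description) VALUES ('T', 'transfer');
INSERT INTO student_status (status, description) VALUES ('L', 'left');

-- Enforce the allowed values in the existing table (hypothetical constraint name)
ALTER TABLE student_info
  ADD CONSTRAINT fk_student_info_status
  FOREIGN KEY (status) REFERENCES student_status (status);

-- The count query then outer joins from the lookup table instead of a CTE
SELECT ss.status, COUNT(si.status)
FROM   student_status ss
LEFT JOIN student_info si ON si.status = ss.status
GROUP  BY ss.status;
```

Adding the foreign key will likely fail if student_info already contains status values that are missing from the lookup table, so those would need to be cleaned up or added first.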
If all the status values actually appear in your table, but you have a filter which happens to exclude all rows for some of them, then you could get the list of all (used) values from the table itself instead of hard-coding it.
If your initial query was something like this, with a completely made-up filter:
select si.status, count(*)
from student_info si
where si.some_condition = 'true'
group by si.status;
then you could use a subquery to get all the distinct values from the unfiltered table, outer join from that to the same table, and apply the filter as part of the outer join condition:
select ss.status, count(si.status)
from (
select distinct status from student_info
) ss
left join student_info si on si.status = ss.status
and si.some_condition = 'true'
group by ss.status;
It can't stay as a where clause (at least here, where it's applying to the right-hand-side of the outer join) because that would override the outer join and effectively turn it back into an inner join.
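For contrast, this is roughly what the version with the filter left in the where clause would look like (some_condition is still the made-up filter from above):

```sql
-- Sketch of the variant to avoid: the filter is applied after the outer join.
-- For statuses with no matching row, si.some_condition is NULL, so the
-- where clause discards exactly the rows the outer join was meant to keep.
select ss.status, count(si.status)
from (
  select distinct status from student_info
) ss
left join student_info si on si.status = ss.status
where si.some_condition = 'true'
group by ss.status;
```

Any status whose rows are all excluded by the filter disappears from this result instead of showing a count of zero.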
You should store your statuses somewhere (perhaps in another table). Otherwise, you can list them using a subquery:
with stored_statuses as (
select 'P' code, 'present' description from dual
union all
select 'A' code, 'absent' description from dual
union all
select 'S' code, 'ill' description from dual
union all
select 'T' code, 'transfer' description from dual
union all
select 'L' code, 'left' description from dual
)
select ss.code, count(*) from student_info si
left join stored_statuses ss on ss.code = si.status
group by ss.code

Re-writing a join query

I have a question concerning Hive. Let me explain the scenario:
I am using a Hive action on Oozie; I have a query which does successive LEFT JOINs on different tables.
The total number of rows to be inserted is about 35 million.
At first the job crashed due to lack of memory, so I set "set hive.auto.convert.join=false". The query then executed successfully, but it took 4 hours.
I tried reordering the LEFT JOINs, putting the large tables at the end, but the result was the same: about 4 hours.
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, knowing the structure of the query is there a way to rewrite it so that I can avoid too many JOINs ?
Thanks in advance
PS: Even vectorization gave me the same timing
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of the tables is not likely to have any impact with the cost-based optimizer.
(4) Basically I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one of the tables.
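As a rough sketch, collecting the statistics mentioned in (4) could look like this in Hive. It assumes the tables are not partitioned (partitioned tables may need a PARTITION clause) and that id is the join column in each of them:

```sql
-- Table-level statistics (row counts, sizes)
ANALYZE TABLE table1 COMPUTE STATISTICS;
ANALYZE TABLE table2 COMPUTE STATISTICS;
ANALYZE TABLE table3 COMPUTE STATISTICS;
ANALYZE TABLE table4 COMPUTE STATISTICS;

-- Column-level statistics on the join key
ANALYZE TABLE table1 COMPUTE STATISTICS FOR COLUMNS id;
ANALYZE TABLE table2 COMPUTE STATISTICS FOR COLUMNS id;
ANALYZE TABLE table3 COMPUTE STATISTICS FOR COLUMNS id;
ANALYZE TABLE table4 COMPUTE STATISTICS FOR COLUMNS id;
```

Statistics only help the optimizer choose a plan; if id is duplicated in several tables, the joins still multiply rows and the runtime may not improve much.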
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;

How to left join with conditions in Toad Data Point Query Builder?

I'm trying to build a query in Toad Data Point. I have a subquery that has a row number to identify the records I'm interested in. This subquery needs to be left joined onto the main table only when the row number is 1. Here's the query I'm trying to visualize:
SELECT distinct E.EMPLID, E.ACAD_CAREER
FROM PS_STDNT_ENRL E
LEFT JOIN (
SELECT ACAD_CAREER, ROW_NUMBER() OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC) as RN
FROM PS_ACAD_CAR_TBL
) T on T.ACAD_CAREER = E.ACAD_CAREER and RN = 1
When I try to replicate this in the query builder, the row number condition is placed in the global WHERE clause. This is not the intended behavior, because it removes any records that don't have a match in the subquery, effectively making it an inner join.
Here is the query it's generating:
SELECT DISTINCT E.EMPLID, E.ACAD_CAREER, T.RN
FROM SYSADM.PS_STDNT_ENRL E
LEFT OUTER JOIN
(SELECT PS_ACAD_CAR_TBL.ACAD_CAREER,
ROW_NUMBER ()
OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC)
AS RN
FROM SYSADM.PS_ACAD_CAR_TBL PS_ACAD_CAR_TBL) T
ON (E.ACAD_CAREER = T.ACAD_CAREER)
WHERE (T.RN = 1)
Is there a way to get the query builder to place that row number condition on the left join instead of the global WHERE clause?
I found a way to get this to work.
Add a calculated field to the main table with a value of 1.
Join the row number to this new calculated field.
Now the query has the filter in the join condition instead of the WHERE clause so that it joins as intended. Here is the query it made:
SELECT DISTINCT E.EMPLID, E.ACAD_CAREER, T.RN
FROM SYSADM.PS_STDNT_ENRL E
LEFT OUTER JOIN
(SELECT PS_ACAD_CAR_TBL.ACAD_CAREER,
ROW_NUMBER ()
OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC)
AS RN
FROM SYSADM.PS_ACAD_CAR_TBL PS_ACAD_CAR_TBL) T
ON (E.ACAD_CAREER = T.ACAD_CAREER) AND (1 = T.RN)
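Another option, shown here only as a sketch, is to apply the RN = 1 filter inside the subquery, so that the join itself needs no extra condition. Whether the query builder can produce this shape is an assumption I have not verified; the inner alias R is made up for illustration:

```sql
SELECT DISTINCT E.EMPLID, E.ACAD_CAREER, T.RN
FROM SYSADM.PS_STDNT_ENRL E
LEFT OUTER JOIN
  -- Filter to the latest row per ACAD_CAREER before joining
  (SELECT R.ACAD_CAREER, R.RN
   FROM (SELECT PS_ACAD_CAR_TBL.ACAD_CAREER,
                ROW_NUMBER ()
                  OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC)
                  AS RN
         FROM SYSADM.PS_ACAD_CAR_TBL PS_ACAD_CAR_TBL) R
   WHERE R.RN = 1) T
ON (E.ACAD_CAREER = T.ACAD_CAREER)
```

Because the filter runs before the outer join, enrollment rows without a match are still kept, with T.RN returned as NULL.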

Order by position

Let's say we have two tables:
TableA (A1,A2) , TableB(B1,B2)
Is there any difference (in terms of performance or memory usage) between the two queries below in Oracle? Only the position of the ORDER BY clause differs.
-- Query 1: ORDER BY inside the inline view
Select Y.*, ROWNUM rNum FROM (
select * from
TableA a join TableB b on a.A1 = b.B1
Where a.A2 = 'SomeVal'
Order by b.B2
) Y;
-- Query 2: ORDER BY outside the inline view
Select Y.*, ROWNUM rNum FROM (
select * from
TableA a join TableB b on a.A1 = b.B1
Where a.A2 = 'SomeVal'
) Y
Order by B2;
Yes -- in the latter the rownum is assigned prior to the rows being ordered, and in the former the rownum is assigned after the rows are ordered.
So the first query's rownums might read as, "1,2,3,4,5 ...", whereas the second query's rownums might read, "33,3,5,45,1 ..."
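If the goal is a row number that always follows the sort order, one alternative (a sketch, not something from the question) is the analytic ROW_NUMBER() function, which states the ordering explicitly instead of depending on where ROWNUM is evaluated:

```sql
-- rNum is assigned in B2 order regardless of how the rows are fetched
SELECT a.A1, a.A2, b.B1, b.B2,
       ROW_NUMBER() OVER (ORDER BY b.B2) AS rNum
FROM   TableA a
JOIN   TableB b ON a.A1 = b.B1
WHERE  a.A2 = 'SomeVal'
ORDER  BY rNum;
```

If B2 contains duplicates, the numbering among the tied rows is still arbitrary unless a unique column is added to the ORDER BY inside OVER().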

RANK OVER function in Hive

I'm trying to run this query in Hive to return only the 10 URLs that appear most often in the adimpression table.
select
ranked_mytable.url,
ranked_mytable.cnt
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
Unfortunately I get an error message stating:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference 'rnk'
I've tried to debug it, and everything up to the ranked_mytable subquery runs fine. I've also tried commenting out the where ranked_mytable.rnk <= 10 clause, but the error message keeps appearing.
Hive is unable to order by a column that is not in the "output" of a select statement. To fix it, just include that column in the selected columns:
select
ranked_mytable.url,
ranked_mytable.cnt,
ranked_mytable.rnk
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
If you don't want that 'rnk' column in your final output, I expect you could wrap that whole thing in another inner-query and just select out the 'url' and 'cnt' fields.
RANK OVER is not the best function to achieve this goal.
A better solution would be to use a combination of SORT BY and LIMIT. It is true that LIMIT on its own picks rows arbitrarily, but this can be avoided by combining it with SORT BY (with a single reducer, as in the example below, SORT BY gives a total order). From the Apache Wiki:
-- Top k queries. The following query returns the top 5 sales records wrt amount.
SET mapred.reduce.tasks = 1;
SELECT * FROM sales SORT BY amount DESC LIMIT 5;
The query can be re-written in this way:
select
iq.url,
iq.cnt
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url ) iq
sort by
iq.cnt desc
limit
10
;
Remove the partition by iq.url clause from rank over() and re-run query.
Thanks & Regards,
Kamleshkumar Gujarathi
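To illustrate the suggestion of dropping partition by: the inner query already groups by url, so each url forms its own partition and every row gets rank 1. A sketch of the query with the rank computed over the whole result instead (rnk is kept in the select list so that it can be used in order by):

```sql
select
ranked_mytable.url,
ranked_mytable.cnt,
ranked_mytable.rnk
from
( -- no partition by: rows are ranked against each other by cnt
  select iq.url, iq.cnt, rank() over (order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.rnk
;
```

Note that rank() gives tied counts the same rank, so this can return more than 10 rows when several urls tie at the cutoff.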
Put the keyword as before the rnk alias, i.e. rank() over (...) as rnk. It should then work fine.

Resources