Hi I have two table like that:
First: Tcustcounselm
Second: Tcustcounseldt
Tcustcounseldt include these columns : Counsel_Seq, Proc_Note, Proc_Date
Tcustcounselm include these columns : Counsel_Seq, Cust_No, Proc_Date
I have customers and I want to retrieve the customer proc_note (refund detail). Some customers have more than one refund detail, I only want the latest one.
This is my plsql code but when I run it it gives the same cust_no twice when I only want to see the latest one.
Select
A.Cust_No,
Max(To_Char(B.Proc_Date, 'yyyy/mm/dd hh24:mi:ss')) As Proc,
B.Proc_Note,
A.Counsel_Seq
FROM Tcustcounselm A,
Tcustcounseldt B
WHERE
A.Counsel_Seq = B.Counsel_Seq
--AND B.Do_Flag ='40'
AND A.Proc_Date BETWEEN TO_DATE('2013/07/08', 'YYYY/MM/DD') AND TO_DATE('2013/07/08', 'YYYY/MM/DD')+1
GROUP BY
A.Cust_No,
B.Proc_Note,
A.Counsel_Seq
ORDER BY 2 DESC;
I thınk MAX is problem so i tried different sample code but same problem
SELECT A.Cust_No,
B.Proc_Note
FROM Tcustcounselm A ,
(SELECT Counsel_Seq,
Proc_Note,
Rank () Over (Partition By Counsel_Seq Order By Proc_Date Desc) As Priority
From Tcustcounseldt
) B
Where A.Counsel_Seq = B.Counsel_Seq
--And B.Priority = 1
AND A.Proc_Date BETWEEN TO_DATE('2013/07/08', 'YYYY/MM/DD') AND TO_DATE('2013/07/08', 'YYYY/MM/DD')+1;
This (or somehing similar) should work
SELECT A.Cust_No,
To_Char(B.Proc_Date, 'yyyy/mm/dd hh24:mi:ss') As Proc,
B.Proc_Note,
A.Counsel_Seq
FROM Tcustcounselm B
INNER JOIN Tcustcounseldt A
ON A.Counsel_Seq = B.Counsel_Seq
WHERE (A.Cust_No,B.Proc_Date) IN ( SELECT A.Cust_No,
max(B.Proc_Date) PD
FROM Tcustcounselm B
INNER JOIN Tcustcounseldt A
ON A.Counsel_Seq = B.Counsel_Seq
GROUP BY A.Cust_No)
I had time to play with sqlfiddle and I fixed the query, try to check it here.
Related
My objective is simple, I have to create a temporary table with some random values from a employee table whenever the department is in some particular department (say 2). For the rest of departments I don't care the value, it can be NULL.
Currently I have the following :
create table test
as
select s.DEPTNAME,
cast (
(case when s.DEPTID in (2) then
(SELECT a.ENAME FROM
(SELECT b.ENAME, b.DEPTID FROM EMPLOYEE b
WHERE b.DEPTID IS NOT NULL
ORDER BY DBMS_RANDOM.VALUE) a
WHERE a.DEPTID = s.DEPTID AND ROWNUM = 1
)
END)
AS VARCHAR2(30)) "ENAME" from DEPARTMENT s;
But the main issue here is related to performance. For every department value in 2 we do a sort of EMPLOYEE table to get a single random ENAME.
Is there a better way to do this ? I know sample might work but I want to achieve more randomness.
First idea - join randomly numbered enames:
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn from department, c)
select d.deptname, e.ename from d left join e using (rn, deptid)
SQLFiddle demo
Second possible solution, which worked for me, is to create function returning random ename from table employee
and use it in your query, but it would be probably slower.
Edit - according to comment:
If, for some reason, the first part of your statement is "fixed", then you could use this syntax:
create table test as
select deptname, ename from (
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn
from department cross join c)
select d.deptname, e.ename from d left join e using (rn, deptid));
I looked at the other questions related to this but it did not work. I even played around with Dense_Rank and still couldn't get the right answer.
Using the Oracle Student Scheme Data. I am trying to find the highest 3 grades for each section based on two tables ( NOT the top 3 rows for each sections. )
This is based off the Student and the Grade table from the Student Scheme
I am using the Oracle Developer 10 and it does not look like it takes the LIMIT function. Cause it gives me the error
missing right parenthesis
select
g.SECTION_ID,g.NUMERIC_GRADE, s.First_Name,s.Last_Name
from GRADE g Join Student s
On g.student_ID = s.student_ID
and Grade_Type_Code='MT'
Where numeric_grade in(Select distinct numeric_grade
from grade
order by numeric_grade desc limit 3);
You can use ROWNUM in Oracle:
SELECT g.SECTION_ID,
g.NUMERIC_GRADE,
s.First_Name,
s.Last_Name
FROM grade g
JOIN Student s
ON g.student_ID = s.student_ID
AND Grade_Type_Code='MT'
WHERE numeric_grade in (SELECT numeric_grade
FROM (SELECT DISTINCT numeric_grade
FROM grade
ORDER BY numeric_grade desc)
WHERE ROWNUM <=3);
Solution with dense_rank:
select section_id, numeric_grade, first_name, last_name
from (
select g.section_id, g.numeric_grade, s.First_Name,s.Last_Name,
dense_rank() over (partition by g.section_id
order by g.numeric_grade desc) rnk
from GRADE g
join Student s on g.student_ID = s.student_ID
and Grade_Type_Code='MT' )
where rnk < 4
order by section_id, numeric_grade desc
SQLFiddle demo
I'm trying to run this query in Hive to return only the top 10 url which appear more often in the adimpression table.
select
ranked_mytable.url,
ranked_mytable.cnt
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
Unfortunately I get an error message stating:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference 'rnk'
I've tried to debug it and until the ranked_mytable sub-queries everything goes smooth. I've tried to comment the where ranked_mytable.rnk <= 10 clause but the error message keep appearing.
Hive is unable to order by a column that is not in the "output" of a select statement. To fix it, just include that column in the selected columns:
select
ranked_mytable.url,
ranked_mytable.cnt,
ranked_mytable.rnk
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
If you don't want that 'rnk' column in your final output, I expect you could wrap that whole thing in another inner-query and just select out the 'url' and 'cnt' fields.
RANK OVER is not the best function to achieve this goal.
A better solution would be to use a combination of SORT BY and LIMIT. It's true in fact LIMIT picks randomly the rows in a table, but this might be avoided if used with the SORT BY function. From the Apache Wiki:
-- Top k queries. The following query returns the top 5 sales records wrt amount.
SET mapred.reduce.tasks = 1 SELECT * FROM sales SORT BY amount
DESC LIMIT 5
The query can be re-written in this way:
select
iq.url,
iq.cnt
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url ) iq
sort by
iq.cnt desc
limit
10
;
Remove the partition by iq.url clause from rank over() and re-run query.
Thanks & Regards,
Kamleshkumar Gujarathi
Put as before the rnk variable. It should work fine.
Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes in TABLE2 on ID,ID & STARTDTE, STARTDTE, ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in inner queries is messing up with the optimizer's ability to use indexes. Also I wouldn't recommend a solution that would scan all of TABLE2 given its size.
This is why in this case I would suggest using a function that will efficiently retrieve the information you are looking for (2 index scan per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
RETURN table2.vid%TYPE IS
l_result table2.vid%TYPE;
BEGIN
SELECT vid
INTO l_result
FROM (SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
UNION ALL
SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id2 = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
ORDER BY startdte DESC)
WHERE rownum = 1;
RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the index is very important.
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t2.id IN (t1.ID, t1.ID2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what its doing, and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sens to you.
But this is largely speculation without being able to see where your existing query is actually slow.
Let's assume I extract some set of data.
i.e.
SELECT A, date
FROM table
I want just the record with the max date (for each value of A). I could write
SELECT A, col_date
FROM TABLENAME t_ext
WHERE col_date = (SELECT MAX (col_date)
FROM TABLENAME t_in
WHERE t_in.A = t_ext.A)
But my query is really long... is there a more compact way using ANALYTIC FUNCTION to do the same?
The analytic function approach would look something like
SELECT a, some_date_column
FROM (SELECT a,
some_date_column,
rank() over (partition by a order by some_date_column desc) rnk
FROM tablename)
WHERE rnk = 1
Note that depending on how you want to handle ties (or whether ties are possible in your data model), you may want to use either the ROW_NUMBER or the DENSE_RANK analytic function rather than RANK.
If date and col_date are the same columns you should simply do:
SELECT A, MAX(date) FROM t GROUP BY A
Why not use:
WITH x AS ( SELECT A, MAX(col_date) m FROM TABLENAME GROUP BY A )
SELECT t.A, t.date FROM TABLENAME t JOIN x ON x.A = t.A AND x.m = t.col_date
Otherwise:
SELECT A, FIRST_VALUE(date) KEEP(dense_rank FIRST ORDER BY col_date DESC)
FROM TABLENAME
GROUP BY A
You could also use:
SELECT t.*
FROM
TABLENAME t
JOIN
( SELECT A, MAX(col_date) AS col_date
FROM TABLENAME
GROUP BY A
) m
ON m.A = t.A
AND m.col_date = t.col_date
A is the key, max(date) is the value, we might simplify the query as below:
SELECT distinct A, max(date) over (partition by A)
FROM TABLENAME
Justin Cave answer is the best, but if you want antoher option, try this:
select A,col_date
from (select A,col_date
from tablename
order by col_date desc)
where rownum<2
Since Oracle 12C, you can fetch a specific number of rows with FETCH FIRST ROW ONLY.
In your case this implies an ORDER BY, so the performance should be considered.
SELECT A, col_date
FROM TABLENAME t_ext
ORDER BY col_date DESC NULLS LAST
FETCH FIRST 1 ROW ONLY;
The NULLS LAST is just in case you may have null values in your field.
SELECT mu_file, mudate
FROM flightdata t_ext
WHERE mudate = (SELECT MAX (mudate)
FROM flightdata where mudate < sysdate)