I am writing a select query with distinct, but I am still getting duplicates in my result. The Oracle view does have duplicates, and I am trying to get back only one occurrence of each value.
Here is my query:
select
person.person_id,
person.last_name,
person.first_name,
person.first_name,
person.middle_name,
skill.skills_id,
(case
when trim(skills.skill_description) = 'typing fast' then 'TP1'
when trim(skills.skill_description) = 'courier district 9' then 'CD9'
when trim(skills.skill_description) = 'helpdesk shift 3' then 'HD3'
when trim(skills.skill_description) = 'helpdesk shift 5' then 'HD5'
....
else ''
end) AS skill_description
from person_view person
left join (select distinct person_id, skill_id, skill_description, updated_date
from skill_view) skills
on skills.person_id = person.person_id and
((trunc(sysdate) - PHONE.UPDATED_DT <= 1)) and
trim(skills.skill_description) in ('skill1', 'skill2', 'skill3' ...)
There are a lot of values for skill_description, so I added the IN clause to filter for 15 - 20 specific skill_description values.
My CASE takes the value and sets the code for it.
I thought that using the 'distinct' keyword would filter out the duplicates, but it is not working.
Here is my output so far:
105 John E Doe SKILL1
105 John E Doe SKILL1
105 John E Doe SKILL2
105 John E Doe SKILL2
105 John E Doe SKILL3
105 John E Doe SKILL3
Any help is appreciated. Thanks
The problem is that the DISTINCT is at an inner level, and the duplicates may be produced outside it, by the LEFT JOIN. You must put the DISTINCT clause in the outermost SELECT.
This should give you the desired output:
SELECT DISTINCT
person.person_id,
person.last_name,
person.first_name,
person.first_name,
person.middle_name,
skill.skills_id,
-- (remaining code omitted) ...
You are selecting updated_date in the subquery, but you don't use it in the outer query. QUESTION: Why? Did you mean to use it for something, perhaps to only select the most current info from the table?
In any case, if you have the same person_id, skill_id and skill_description values but different updated_date, DISTINCT won't help; the rows may very well be distinct in the inner query, but cease to be distinct in the outer query (if you don't include updated_date in the projection).
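To make the point about updated_date concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for Oracle, and the two skill rows are made-up sample data, so treat it as an illustration rather than your actual views. The inner DISTINCT keeps both rows because updated_date differs; once that column is dropped from the outer projection the rows look identical, and only a DISTINCT on the outer SELECT collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_view (person_id INTEGER, last_name TEXT);
    CREATE TABLE skill_view (person_id INTEGER, skill_id INTEGER,
                             skill_description TEXT, updated_date TEXT);
    INSERT INTO person_view VALUES (105, 'Doe');
    -- Hypothetical data: the same skill twice, only updated_date differs.
    INSERT INTO skill_view VALUES (105, 1, 'skill1', '2017-01-01');
    INSERT INTO skill_view VALUES (105, 1, 'skill1', '2017-01-02');
""")

# DISTINCT only inside the subquery: both rows survive, because they
# differ in updated_date, which the outer query then throws away.
inner_distinct = conn.execute("""
    SELECT p.person_id, s.skill_description
    FROM person_view p
    LEFT JOIN (SELECT DISTINCT person_id, skill_id, skill_description, updated_date
               FROM skill_view) s
      ON s.person_id = p.person_id
""").fetchall()

# DISTINCT on the outer SELECT: applied to the final projection.
outer_distinct = conn.execute("""
    SELECT DISTINCT p.person_id, s.skill_description
    FROM person_view p
    LEFT JOIN skill_view s ON s.person_id = p.person_id
""").fetchall()

print(inner_distinct)  # two identical-looking rows
print(outer_distinct)  # a single row
```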
I have the table T_LOCATION_DATA on Oracle DB as follows:
Person_ID | Location | Role
----------------------------
101 Delhi Manager
102 Mumbai Employee
103 Noida Manager
104 Mumbai Employee
105 Noida Employee
106 Delhi Manager
107 Mumbai Manager
108 Delhi Employee
109 Mumbai Employee
Another table is T_STATUS with following data:
Person_ID | Status
-------------------
101 Active
102 Active
103 Inactive
104 Active
105 Active
106 Inactive
107 Active
108 Active
109 Inactive
I am trying to get the counts of Employees and Managers who are Active, grouped by location, in a single query, so that the result comes out as follows:
Location | MANAGER COUNT | EMPLOYEE COUNT
Delhi 1 1
Mumbai 1 2
Noida 0 1
I am trying with following query but with no result:
select location, count (a.person_id) as MANAGER COUNT,
count (b.person_id) as EMPLOYEE COUNT
from T_LOCATION_DATA a,T_LOCATION_DATA b
where a.person_id in (select person_id from t_status where status='Active')
... and I get lost here
Can someone guide me on this please?
From your data, I would query like this:
SELECT
Location,
COUNT(CASE WHEN Role='Manager' THEN 1 END) as count_managers,
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(*) count_everyone
FROM
t_location_data l
INNER JOIN
t_status s
ON
l.person_id = s.person_id AND
s.status = 'Active'
GROUP BY location
Differences to your SQL:
We dump the awful old join syntax (SELECT * FROM a,b WHERE a.id=b.id) - please always use a JOIN b ON a.id = b.id
We join in the status table, but only for the active rows, which is why I stated the status test as another clause in the ON. I could have put it in a WHERE; with an INNER JOIN it makes no difference. With an OUTER JOIN it can make a big difference: writing a LEFT JOIN b ON a.id = b.id WHERE b.status = 'active' converts that LEFT JOIN back to INNER JOIN behaviour, unless you write the where clause as WHERE b.status = 'active' OR b.status IS NULL, and that's just ugly. If the comparison to a constant is put in the ON clause instead, you can skip the OR ... IS NULL ugliness
We group by location, but we don't necessarily count everything. If we count the result of CASE WHEN role = 'Manager' THEN ..., the CASE WHEN produces a 1 for a manager and NULL for a non-manager (I didn't specify an ELSE; this is the designed behaviour of CASE WHEN in such a scenario). The value didn't have to be a 1 either; it could be 'a', or Role, or anything that is non-null. COUNT counts anything non-null as 1 and null as 0. The following are thus equivalent; pick whichever makes more sense to you:
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN 'a' END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role ELSE null END) as count_employees,
SUM(CASE WHEN Role='Employee' THEN 1 ELSE 0 END) as count_employees,
They all work as counts, but in the SUM case you really do have to use 1 and 0 if you want the output number to be a count. Actually, the 0 is optional, as SUM doesn't sum nulls (but, as mathguy points out below, if you didn't put ELSE 0, then the SUM method would produce a NULL when there were 0 items, rather than a 0. Whether this is helpful or hindering to you is a decision for you alone to make).
I wasn't clear whether managers are employees also. To me, they are; maybe not to you. I added a COUNT(*) that literally counts everyone at the location. Any difference, meaning count_employees + count_managers != count_everyone, means there was another role, neither manager nor employee, in the table. Pick your poison
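The ON-versus-WHERE point for outer joins is easier to see with two rows. This is only a sketch: it runs on SQLite through Python's sqlite3 instead of Oracle, and it uses just two people from the sample tables (101 is Active, 103 is Inactive).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT);
    CREATE TABLE t_status (person_id INTEGER, status TEXT);
    INSERT INTO t_location_data VALUES (101, 'Delhi', 'Manager'), (103, 'Noida', 'Manager');
    INSERT INTO t_status VALUES (101, 'Active'), (103, 'Inactive');
""")

# Constant comparison in the ON clause: person 103 is kept, with a NULL status.
filter_in_on = conn.execute("""
    SELECT l.person_id, s.status
    FROM t_location_data l
    LEFT JOIN t_status s
      ON l.person_id = s.person_id AND s.status = 'Active'
    ORDER BY l.person_id
""").fetchall()

# Same comparison in the WHERE clause: person 103 is filtered out,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT l.person_id, s.status
    FROM t_location_data l
    LEFT JOIN t_status s ON l.person_id = s.person_id
    WHERE s.status = 'Active'
    ORDER BY l.person_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```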
This COUNT/SUM(CASE WHEN...) pattern is really useful for turning data around - a PIVOT operation. It takes a column of data:
Manager
Employee
Manager
And turns it into two columns, for the count values:
Manager Employee
2 1
You can extend it as many times as you like. If you have 10 roles, make 10 case whens, and the results will have 10 columns with a grouped-up count. The data is pivoted from a row-wise representation to a column-wise representation
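Here is a small runnable sketch of the COUNT/SUM(CASE WHEN ...) pattern, including the NULL-versus-0 difference for SUM without an ELSE. It runs on SQLite via Python's sqlite3 as a stand-in for Oracle, and the four rows are a trimmed-down, hypothetical subset of the sample tables, so the counts are for this subset only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT);
    CREATE TABLE t_status (person_id INTEGER, status TEXT);
    -- Trimmed sample: Noida has no active manager.
    INSERT INTO t_location_data VALUES
        (101, 'Delhi', 'Manager'), (108, 'Delhi', 'Employee'),
        (103, 'Noida', 'Manager'), (105, 'Noida', 'Employee');
    INSERT INTO t_status VALUES
        (101, 'Active'), (108, 'Active'), (103, 'Inactive'), (105, 'Active');
""")

rows = conn.execute("""
    SELECT l.location,
           COUNT(CASE WHEN l.role = 'Manager' THEN 1 END)        AS count_managers,
           SUM(CASE WHEN l.role = 'Manager' THEN 1 END)          AS sum_without_else,
           SUM(CASE WHEN l.role = 'Manager' THEN 1 ELSE 0 END)   AS sum_with_else
    FROM t_location_data l
    INNER JOIN t_status s
        ON l.person_id = s.person_id AND s.status = 'Active'
    GROUP BY l.location
    ORDER BY l.location
""").fetchall()

for row in rows:
    print(row)
```

For Noida, COUNT and SUM ... ELSE 0 give 0, while SUM without an ELSE gives NULL (None in Python), which is the difference mentioned above.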
If I execute the two queries individually, each takes less than 4 seconds to return data, but when I combine them the query is dead slow. Any help is much appreciated.
Query1:
Select Med_Number,Med_Code,Member_Name,DOB FROM Med
WHERE Med.Med_Code=:Med_Code
Query2:
Select Red_Number,Red_Name,Red_Code FROM Red
WHERE Red.Red_Code =:Red_Code
Final one (I'm passing one value at a time):
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where (Med.Med_Code=:Med_Code) OR (Red.Red_Code=:Red_Code)
If you look at the execution plans for all 3 statements, you'll figure it out. If you're not interested in figuring it out and you must execute only one query, then you can execute this, using src to determine which rows belong to which row source (assuming you have to know the difference, and assuming the numbers and names are suitably equivalent data types):
Select 1 src, Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select 2 src, Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number
Of course, if the data types are equivalent and DOB is not allowed to be null, then this would suffice
Select Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number
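As a rough illustration of the UNION ALL with a src marker, here is a sketch on SQLite through Python's sqlite3 (not Oracle). The table contents and the single bind name :num are made up for the example; the column types are assumed to be compatible, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Med (Med_Number INTEGER, Med_Code TEXT, Member_Name TEXT, DOB TEXT);
    CREATE TABLE Red (Red_Number INTEGER, Red_Name TEXT, Red_Code TEXT);
    -- Hypothetical rows, only for illustration.
    INSERT INTO Med VALUES (1, 'M1', 'Alice', '1980-01-01'), (2, 'M2', 'Bob', '1975-05-05');
    INSERT INTO Red VALUES (1, 'Red One', 'R1'), (3, 'Red Three', 'R3');
""")

# Each branch is a simple single-table lookup; src records which
# table a row came from (1 = Med, 2 = Red).
rows = conn.execute("""
    SELECT 1 AS src, Med_Number, Member_Name, DOB
    FROM Med
    WHERE Med_Number = :num
    UNION ALL
    SELECT 2 AS src, Red_Number, Red_Name, NULL
    FROM Red
    WHERE Red_Number = :num
    ORDER BY src
""", {"num": 1}).fetchall()

for row in rows:
    print(row)
```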
As you join your two tables on m.Med_number = R.Red_number, you don't need 2 parameters.
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where M.Med_Number=:Number;
I have this query, written by someone else, and I am trying to figure out how it works. I have a general idea of all the pieces, such as row_number(), partition by and pivot, but I am unable to understand them all together.
For this query:
select
d, p, s, a
from
(
select name,occupation, (ROW_NUMBER() OVER (partition by occupation order by name)) as rownumber from occupations
)
pivot
(
max(name)
for occupation
in ('Doctor' as d, 'Professor' as p, 'Singer' as s, 'Actor' as a)
)
order by rownumber;
This is the input table on which the above query works:
This is the output generated by the query, which is correct as per the question:
Jenny Ashley Meera Jane
Samantha Christeen Priya Julia
NULL Ketty NULL Maria
Now, I want to know how the output is generated by the query i.e. step by step with flow of execution. Explanation with easy examples matching the above situation would be much appreciated. Thanks in advance.
After the from clause you have the following:
select name,occupation, (ROW_NUMBER() OVER (partition by occupation order by name))
This virtually restacks your table data into three columns: name, occupation and rownumber. The rownumber restarts at 1 whenever the occupation changes. The output data will look like:
NAME       OCCUPATION  ROWNUMBER
---------- ----------- ----------
Jane       Actor       1
Julia      Actor       2
Maria      Actor       3
Jenny      Doctor      1   <-- rownumber resets to 1
Samantha   Doctor      2
The PIVOT clause lets you aggregate results and rotate rows into columns.
The general PIVOT syntax is:
PIVOT
(
aggregate_function(value_column)
FOR pivot_column
IN ( expr1, expr2, ... expr_n) | subquery
)
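So your PIVOT stacks the NAME values into one column per OCCUPATION. Because rownumber is the only column left over after the pivot, it acts as the implicit grouping key: each output row holds the names that share the same rownumber, and max(name) simply picks the single name in each cell. The final order by rownumber puts those rows in order.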
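If it helps to see the two steps run, here is a sketch on SQLite via Python's sqlite3. SQLite has no PIVOT clause, so the pivot is written out by hand as MAX(CASE ...) with GROUP BY rownumber, which is my stand-in for what Oracle's PIVOT does here, not the original query. The input rows are reconstructed from the output shown in the question, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occupations (name TEXT, occupation TEXT)")
conn.executemany(
    "INSERT INTO occupations VALUES (?, ?)",
    [
        ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
        ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
        ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
        ("Priya", "Singer"),
    ],
)

# Step 1: number the names inside each occupation, alphabetically.
numbered = conn.execute("""
    SELECT name, occupation,
           ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) AS rownumber
    FROM occupations
    ORDER BY occupation, rownumber
""").fetchall()

# Step 2: hand-written pivot. One output row per rownumber,
# one output column per occupation.
pivoted = conn.execute("""
    SELECT MAX(CASE WHEN occupation = 'Doctor'    THEN name END) AS d,
           MAX(CASE WHEN occupation = 'Professor' THEN name END) AS p,
           MAX(CASE WHEN occupation = 'Singer'    THEN name END) AS s,
           MAX(CASE WHEN occupation = 'Actor'     THEN name END) AS a
    FROM (SELECT name, occupation,
                 ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) AS rownumber
          FROM occupations)
    GROUP BY rownumber
    ORDER BY rownumber
""").fetchall()

for row in pivoted:
    print(row)
```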
Can you please help me write a query for this scenario? In the case below it should return a single row, A=13, because 13 and 14 have the most occurrences in column A and the value of B (30) is greater for 13. We are interested in the maximum number of occurrences of A, and in case of a tie B should be used as the tie-breaker.
A B
13 30
13 12
14 10
14 25
15 5
In the case below, where each value of A occurs once (all tied), it should return 14, which has the maximum value of B (40).
A B
13 30
14 40
15 5
Use case - we get calls from corporate customers. We are interested in knowing during which hours of the day most calls come in and, in case of a tie, which of the busiest hours has the longest call.
Further question
There is a further question on this. I want to use either of the two solutions, '11g or lower' from #GurV or 'dense_rank' from #mathguy, in the bigger query below. How can I do it?
SELECT dv.id , u.email , dv.email_subject AS headline , dv.start_date , dv.closing_date, b.name AS business_name, ls.call_cost, dv.currency,
SUM(lsc.duration) AS duration, COUNT(lsc.id) AS call_count, ROUND(AVG(lsc.duration), 2) AS avg_duration
-- max(extract(HOUR from started )) keep (dense_rank last order by count(duration), max(duration)) as most_popular_hour
FROM deal_voucher dv
JOIN lead_source ls ON dv.id = ls.deal_id
JOIN lead_source_call lsc ON ls.PHONE_SID = lsc.phone_number_id
JOIN business b ON dv.business_id = b.id
JOIN users u ON b.id = u.business_id
AND TRUNC(dv.closing_date) = to_date('13-01-2017', 'dd-mm-yyyy')
AND lsc.status = 'completed' and lsc.duration >= 30
GROUP BY dv.id , u.email , dv.email_subject , dv.start_date , dv.closing_date, b.name, ls.call_cost, dv.currency
--, extract(HOUR from started )
Try this if 12c+
select a
from t
group by a
order by count(*) desc, max(b) desc
fetch first 1 row only;
If 11g or lower:
select * from (
select a
from t
group by a
order by count(*) desc, max(b) desc
) where rownum = 1;
Note that if there is equal count and equal max value for two or more values of A, then any one of them will be fetched.
Here is a query that will work in older versions (no fetch clause) and does not require a subquery. It uses the first/last function. In case of ties by both "count by A" and "value of max(B)" it selects only the row with the largest value of A. You can change that to min(A), or even to sum(A) (although that probably doesn't make sense in your problem) or LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) to get a comma-delimited list of the A's that are tied for first place, but that requires 11.2 (I believe).
select max(a) keep (dense_rank last order by count(b), max(b)) as a
, max(max(b)) keep (dense_rank last order by count(b)) as b
from inputs
group by a
;
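To check the ordering logic against both sample tables from the question, here is a sketch on SQLite via Python's sqlite3. LIMIT 1 stands in for Oracle's FETCH FIRST 1 ROW ONLY (or the rownum subquery); the helper name busiest is hypothetical.

```python
import sqlite3


def busiest(rows):
    """Return the A value with the most rows, using max(B) as tie-breaker."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT a
        FROM t
        GROUP BY a
        ORDER BY COUNT(*) DESC, MAX(b) DESC
        LIMIT 1
    """).fetchone()
    conn.close()
    return result[0]


# First sample: 13 and 14 both occur twice, 13 has the larger B.
first = busiest([(13, 30), (13, 12), (14, 10), (14, 25), (15, 5)])
# Second sample: every A occurs once, 14 has the largest B.
second = busiest([(13, 30), (14, 40), (15, 5)])
print(first, second)
```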
Let's say I have table data similar to the following:
123456 John Doe 1 Green 2001
234567 Jane Doe 1 Yellow 2001
234567 Jane Doe 2 Red 2001
345678 Jim Doe 1 Red 2001
What I am attempting to do is isolate only the records for Jane Doe, based upon the fact that she has more than one row in this table. (More than one sequence number)
I cannot isolate based upon ID, names, colors, years, etc...
The number 1 in the sequence tells me that is the first record and I need to be able to display that record, as well as the number 2 record -- The change record.
If the table is called users, and the fields called ID, fname, lname, seq_no, color, date. How would I write the code to select only records that have more than one row in this table? For Example:
I want the query to display this only based upon the existence of the multiple rows:
234567 Jane Doe 1 Yellow 2001
234567 Jane Doe 2 Red 2001
In PL/SQL
First, to find the IDs for records with multiple rows you would use:
SELECT ID FROM users GROUP BY ID HAVING COUNT(*) > 1
So you could get all the records for all those people with
SELECT * FROM users WHERE ID IN (SELECT ID FROM users GROUP BY ID HAVING COUNT(*) > 1)
If you know that the second sequence number will always be 2 and that the "2" record will never be deleted, you might find something like:
SELECT * FROM users WHERE ID IN (SELECT ID FROM users WHERE seq_no = 2)
to be faster, but you had better be sure those requirements are guaranteed to be met in your database (and you would want a compound index on (seq_no, ID)).
Try something like the following. It's a single table scan, as opposed to 2 like the others.
SELECT * FROM (
SELECT t1.*, COUNT(*) OVER (PARTITION BY ID) mycount FROM users t1
)
WHERE mycount > 1;
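A quick way to convince yourself that the analytic count and the GROUP BY ... HAVING subquery return the same rows is to run both on the sample data. This sketch uses SQLite via Python's sqlite3 rather than Oracle (window functions need SQLite 3.25 or later), partitions by ID, and leaves out the date column for brevity; those choices are mine, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (ID INTEGER, fname TEXT, lname TEXT, seq_no INTEGER, color TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        (123456, "John", "Doe", 1, "Green"),
        (234567, "Jane", "Doe", 1, "Yellow"),
        (234567, "Jane", "Doe", 2, "Red"),
        (345678, "Jim", "Doe", 1, "Red"),
    ],
)

# Analytic count: every row carries the number of rows sharing its ID.
window_rows = conn.execute("""
    SELECT ID, fname, lname, seq_no, color
    FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY ID) AS mycount
          FROM users u)
    WHERE mycount > 1
    ORDER BY seq_no
""").fetchall()

# Subquery with GROUP BY ... HAVING: find the IDs first, then fetch their rows.
having_rows = conn.execute("""
    SELECT ID, fname, lname, seq_no, color
    FROM users
    WHERE ID IN (SELECT ID FROM users GROUP BY ID HAVING COUNT(*) > 1)
    ORDER BY seq_no
""").fetchall()

print(window_rows)
print(having_rows)
```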
INNER JOIN
JOIN:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1 JOIN users u2 ON (u1.ID = u2.ID and u2.seq_no = 2)
WHERE:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1, users u2
WHERE
u1.ID = u2.ID AND
u2.seq_no = 2
Check out the HAVING clause for a summary query. You can specify stuff like
HAVING COUNT(*) >= 2
and so forth.