Oracle count distinct record within subquery - oracle

I have 3 tables
SUBJECTS
CODE, SUBJECT_NAME , SESSION
100, MATHS , AM
101, MATHS - INTRO , AM
102, MATHS - ADVANCED , AM
200, ENGLISH , AM
201, ENGLISH - INTRO , AM
202, ENGLISH - BEGINNER, AM
203, ENGLISH - ADVANCED, AM
STUDENTS_SUBJECTS
ID, SUBJECT_CODE
2, 101
2, 102
1, 201
1, 203
3, 101
3, 102
STUDENTS
ID,PARENT_ID, STUDENT_NAME, CLASS_LEADER, INACTIVE, EXPERT
1 , 2 , ELSA , no , N , N
2 , 4 , STEVE , no , N , N
3 , 5 , MIKE , no , N , N
My query is:
SELECT t1.CODE,
       t1.SUBJECT_NAME,
       SUM (CASE WHEN ( (t2.CLASS_LEADER = 'no'
                         OR t2.CLASS_LEADER IS NULL)
                       AND t2.EXPERT IS NULL)
                 THEN 1 ELSE 0 END) AS "Average Student"
FROM subjects t1
LEFT OUTER JOIN (
    select a.STUDENT_ID, a.PARENT_ID, a.STUDENT_NAME,
           a.CLASS_LEADER, c.SUBJECT_CODE, a.INACTIVE, a.EXPERT
    FROM students a
    INNER JOIN students_subjects c
        ON (a.STUDENT_ID = c.ID )
    where (INACTIVE is null)
    GROUP BY a.STUDENT_ID, a.PARENT_ID, a.STUDENT_NAME, a.CLASS_LEADER, c.SUBJECT_CODE, a.INACTIVE, a.EXPERT
) t2
    ON substr(trim(t2.SUBJECT_CODE),1,2) = substr(trim(t1.CODE),1,2)
WHERE (t1.SESSION = 'AM')
GROUP BY t1.CODE, T1.SUBJECT_NAME
ORDER BY T1.CODE
What I would like to get is the number of students who signed up for a morning-session class under each major subject, without duplicates. For example, each student who signed up for both MATHS - INTRO and MATHS - ADVANCED should be counted only once under the MATHS subject.
If I run the subquery on its own, without SUBJECT_CODE in the SELECT and GROUP BY lists, I get the correct value. However, I'm not sure how to get the correct value once it is joined in the main query.
REPORT
CODE, SUBJECT_NAME, AVERAGE_STUDENT
100 MATHS 2
200 ENGLISH 1
Thank you.

First, some recommendations:
1) Add a column MAIN_SUBJECT_CODE to the table SUBJECTS (as already suggested in the comments).
2) The column ID in the table STUDENTS_SUBJECTS is a foreign key pointing to the table STUDENTS, so a better name would be STUDENT_ID.
3) Use a single convention to store Boolean values; do not mix 'no' and 'N'.
First, the query listing all student subscriptions.
Note that I derive the missing column main_subject_code from CODE. I also adjusted the "Average Student" definition (EXPERT = 'N' instead of EXPERT IS NULL) so that the sample data produces a result.
SELECT su.CODE,
       substr(trim(su.CODE),1,2)||'0' main_subject_code,
       su.SUBJECT_NAME,
       st.STUDENT_NAME,
       CASE WHEN ( (st.CLASS_LEADER = 'no'
                    OR st.CLASS_LEADER IS NULL)
                  AND st.EXPERT = 'N' /*IS NULL*/)
            THEN 1 ELSE 0 END AS "Average Student"
FROM subjects su
INNER JOIN students_subjects ss
  ON su.code = ss.SUBJECT_CODE
INNER JOIN STUDENTS st
  ON ss.ID /* STUDENT_ID */ = st.ID
;
CODE MAIN_SUBJECT_CODE SUBJECT_NAME STUDENT_NAME Average Student
101 100 MATHS - INTRO MIKE 1
101 100 MATHS - INTRO STEVE 1
102 100 MATHS - ADVANCED MIKE 1
102 100 MATHS - ADVANCED STEVE 1
201 200 ENGLISH - INTRO ELSA 1
203 200 ENGLISH - ADVANCED ELSA 1
The rest is simple: group on the main subject and add its name. A student can be enrolled in several sub-subjects of the same main subject, so count distinct student IDs instead of summing the flag. That way each student is counted only once:
with subsr as (
  SELECT su.CODE,
         substr(trim(su.CODE),1,2)||'0' main_subject_code,
         su.SUBJECT_NAME,
         st.ID student_id,
         st.STUDENT_NAME,
         CASE WHEN ( (st.CLASS_LEADER = 'no'
                      OR st.CLASS_LEADER IS NULL)
                    AND st.EXPERT = 'N' /*IS NULL*/)
              THEN 1 ELSE 0 END AS "Average Student"
  FROM subjects su
  INNER JOIN students_subjects ss
    ON su.code = ss.SUBJECT_CODE
  INNER JOIN STUDENTS st
    ON ss.ID /* STUDENT_ID */ = st.ID
  WHERE su.SESSION = 'AM'
)
select
  main_subject_code,
  (select SUBJECT_NAME from SUBJECTS where CODE = main_subject_code) main_subject_name,
  count(distinct case when "Average Student" = 1 then student_id end) "Average Student"
from subsr
group by main_subject_code
order by main_subject_code;
MAIN_SUBJECT_CODE MAIN_SUBJECT_NAME         Average Student
----------------- ------------------------- ---------------
100               MATHS                                   2
200               ENGLISH                                 1

Your posted query contains a lot of extraneous logic that doesn't seem relevant to your apparent task. I'm ignoring it and focusing simply on getting "the number of students who signed up for the class for morning session under each major subject without the duplicates".
select major
     , count(*)
from (
       select distinct subj.major
            , ss.id as student_id
       from ( select code,
                     regexp_replace(subject_name, '^([A-Z]+)(.*)', '\1') major
              from subjects
              where session = 'AM'
            ) subj
       join students_subjects ss
         on ss.subject_code = subj.code
     )
group by major
order by major
/
The subquery on SUBJECTS uses a regex function to extract the leading word of the subject name as the major. It works for the posted sample data but might fail for more complicated names. Regex shouldn't be necessary: a proper data model would separate the MAJOR subject from its subsidiaries.
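To check the expected report (MATHS 2, ENGLISH 1) outside Oracle, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. It assumes that sub-subject codes share the hundreds digit of their main subject (101 and 102 belong to 100). That holds for the sample data; with a real MAIN_SUBJECT_CODE column you would join on that instead. COUNT(DISTINCT ss.id) is what removes the duplicates.

```python
import sqlite3

SUBJECTS = [
    (100, "MATHS", "AM"), (101, "MATHS - INTRO", "AM"), (102, "MATHS - ADVANCED", "AM"),
    (200, "ENGLISH", "AM"), (201, "ENGLISH - INTRO", "AM"),
    (202, "ENGLISH - BEGINNER", "AM"), (203, "ENGLISH - ADVANCED", "AM"),
]
STUDENTS_SUBJECTS = [(2, 101), (2, 102), (1, 201), (1, 203), (3, 101), (3, 102)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (code INTEGER, subject_name TEXT, session TEXT)")
conn.execute("CREATE TABLE students_subjects (id INTEGER, subject_code INTEGER)")
conn.executemany("INSERT INTO subjects VALUES (?, ?, ?)", SUBJECTS)
conn.executemany("INSERT INTO students_subjects VALUES (?, ?)", STUDENTS_SUBJECTS)

# Assumption: main subject code = code rounded down to the hundred
# (integer division in SQLite), standing in for a MAIN_SUBJECT_CODE column.
report = conn.execute("""
    SELECT m.code, m.subject_name, COUNT(DISTINCT ss.id) AS students
    FROM students_subjects ss
    JOIN subjects s ON s.code = ss.subject_code
    JOIN subjects m ON m.code = (s.code / 100) * 100
    WHERE s.session = 'AM'
    GROUP BY m.code, m.subject_name
    ORDER BY m.code
""").fetchall()
print(report)
```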


Oracle Select Query on Same Table (self join)

It seems too simple, but I'm not getting the desired results.
I have a table with this data:
Team_id, Player_id, Player_name, Game_cd
1 100 abc 24
1 1000 xyz 24
1 588 ert 24
1 500 you 24
2 600 ops 24
2 700 dps 24
2 900 lmv 24
2 200 hmv 24
I have to write a query to get a result like this
Home_team home_plr_id home_player away_team away_plr_id away_player
1 100 abc 2 600 ops
1 1000 xyz 2 900 lmv
The query I wrote:
select f1.Team_id as home_team,
       f1.player_id as home_plr_id,
       f1.player_Name as home_player,
       f2.Team_id as away_team,
       f2.player_id as away_plr_id,
       f2.player_Name as away_player
from game f1, game f2
where f1.team_id <> f2.team_id
  and f1.game_cd = f2.game_cd
An alternative to @Radagast81's self-join is PIVOT, which is available in your Oracle version:
select home_plr_id, home_plr_name, away_plr_id, away_plr_name
from (select game.*,
row_number() over (partition by team_id order by player_id) rn
from game)
pivot (max(player_id) plr_id, max(player_name) plr_name
for team_id in (1 home, 2 away))
Players have to be numbered somehow (here by ID). It could equally be done by name, by null, or even randomly. The numbering is needed only to put them in the same rows. PIVOT also works if the teams have different numbers of players.
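SQLite has no PIVOT clause, but the same result can be written by hand with one MAX(CASE ...) per output column. The columns not mentioned in the pivot (game_cd and rn) become the GROUP BY. The sketch below runs that on the sample data through Python's sqlite3 module. It needs SQLite 3.25 or later for ROW_NUMBER. It is an illustration of what the Oracle PIVOT computes, not the Oracle query itself.

```python
import sqlite3

GAME = [
    (1, 100, "abc", 24), (1, 1000, "xyz", 24), (1, 588, "ert", 24), (1, 500, "you", 24),
    (2, 600, "ops", 24), (2, 700, "dps", 24), (2, 900, "lmv", 24), (2, 200, "hmv", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (team_id INTEGER, player_id INTEGER, player_name TEXT, game_cd INTEGER)")
conn.executemany("INSERT INTO game VALUES (?, ?, ?, ?)", GAME)

# Number players within each team, then pivot by hand: team 1 -> home columns,
# team 2 -> away columns; rows with the same (game_cd, rn) collapse into one.
pairs = conn.execute("""
    SELECT MAX(CASE WHEN team_id = 1 THEN player_id END)   AS home_plr_id,
           MAX(CASE WHEN team_id = 1 THEN player_name END) AS home_plr_name,
           MAX(CASE WHEN team_id = 2 THEN player_id END)   AS away_plr_id,
           MAX(CASE WHEN team_id = 2 THEN player_name END) AS away_plr_name
    FROM (SELECT game.*,
                 ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY player_id) AS rn
          FROM game)
    GROUP BY game_cd, rn
    ORDER BY game_cd, rn
""").fetchall()
for row in pairs:
    print(row)
```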
It is not clear how you want to pair a home player with an away player. But provided that you don't care about that, the following might be what you are looking for:
WITH game_p AS (SELECT team_id, player_id, player_name, game_cd
, ROW_NUMBER() over (PARTITION BY team_id, game_cd ORDER BY player_id) pos
, dense_rank() over (PARTITION BY game_cd ORDER BY team_id) team_pos
FROM game)
SELECT NVL(f1.game_cd, f2.game_cd) AS game_cd
, f1.Team_id as home_team
, f1.player_id as home_plr_id
, f1.player_Name as home_player
, f2.Team_id as away_team
, f2.player_id as away_plr_id
, f2.player_Name as away_player
FROM (SELECT * FROM game_p WHERE team_pos = 1) f1
FULL JOIN (SELECT * FROM game_p WHERE team_pos = 2) f2
ON f1.game_cd = f2.game_cd
AND f1.pos = f2.pos
The new column POS gives each player of each team a position, which is used to pair them with a player of the other team.
The new column TEAM_POS maps the team_id values to 1 and 2, because the team IDs can differ per game.
Finally, a FULL JOIN produces the final list. If the number of players is always the same for both teams, you can use a normal join instead.
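A sketch of the POS/TEAM_POS idea in Python's sqlite3 module (SQLite 3.25+ for window functions). It adds a hypothetical second game whose teams have different IDs (3 and 4) to show why TEAM_POS is needed. It uses a LEFT JOIN rather than a FULL JOIN, which older SQLite versions lack. That substitution assumes both teams have the same number of players, as the answer notes.

```python
import sqlite3

ROWS = [
    (1, 100, "abc", 24), (1, 1000, "xyz", 24), (1, 588, "ert", 24), (1, 500, "you", 24),
    (2, 600, "ops", 24), (2, 700, "dps", 24), (2, 900, "lmv", 24), (2, 200, "hmv", 24),
    # Hypothetical second game whose teams have different ids
    (3, 300, "aaa", 25), (3, 310, "bbb", 25), (4, 400, "ccc", 25), (4, 410, "ddd", 25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (team_id INTEGER, player_id INTEGER, player_name TEXT, game_cd INTEGER)")
conn.executemany("INSERT INTO game VALUES (?, ?, ?, ?)", ROWS)

pairs = conn.execute("""
    WITH game_p AS (
        SELECT team_id, player_id, player_name, game_cd,
               -- position of the player inside his team for this game
               ROW_NUMBER() OVER (PARTITION BY team_id, game_cd ORDER BY player_id) AS pos,
               -- first team of the game -> 1, second team -> 2
               DENSE_RANK() OVER (PARTITION BY game_cd ORDER BY team_id) AS team_pos
        FROM game)
    SELECT f1.game_cd, f1.team_id, f1.player_name, f2.team_id, f2.player_name
    FROM (SELECT * FROM game_p WHERE team_pos = 1) f1
    LEFT JOIN (SELECT * FROM game_p WHERE team_pos = 2) f2
           ON f1.game_cd = f2.game_cd AND f1.pos = f2.pos
    ORDER BY f1.game_cd, f1.pos
""").fetchall()
for row in pairs:
    print(row)
```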

SQL sample groups

I have an SQLite database that I read as follows:
In [42]: df = pd.read_sql("SELECT * FROM all_vs_all", engine)
In [43]:
In [43]: df.head()
Out[43]:
user_data user_model \
0 037d05edbbf8ebaf0eca#172.16.199.165 037d05edbbf8ebaf0eca#172.16.199.165
1 037d05edbbf8ebaf0eca#172.16.199.165 060210bf327a3e3b4621#172.16.199.33
2 037d05edbbf8ebaf0eca#172.16.199.165 1141259bd36ba65bef02#172.21.44.180
3 037d05edbbf8ebaf0eca#172.16.199.165 209627747e2af1f6389e#172.16.199.181
4 037d05edbbf8ebaf0eca#172.16.199.165 303a1aff4ab6e3be82ab#172.21.112.182
score Class time_secs model_name bin_id
0 0.283141 0 1514764800 Flow 0
1 0.999300 1 1514764800 Flow 0
2 1.000000 1 1514764800 Flow 0
3 0.206360 1 1514764800 Flow 0
4 1.000000 1 1514764800 Flow 0
As the table is too big, rather than reading the full table I select a random subset of rows.
This can be done very quickly with:
random_query = "SELECT * FROM all_vs_all WHERE abs(CAST(random() AS REAL))/9223372036854775808 < %f AND %s" % (ratio, time_condition)
df = pd.read_sql(random_query, engine)
The problem is that for each triplet [user_data, user_model, time_secs] I want to get all the rows containing that triplet. Each triplet appears 1 or 2 times.
One possible way is to first sample a random set of triplets and then get all the rows that have one of the selected triplets, but this seems to be too slow.
Is there an efficient way to do it?
EDIT: If I could load all the data in pandas I would have done something like:
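import numpy as np
import pandas as pd

selected_groups = []
# groupby yields (key, sub-DataFrame) pairs; keep each group with probability `ratio`
for _, group in df.groupby(['user_data', 'user_model', 'time_secs']):
    if np.random.uniform(0, 1) < ratio:
        selected_groups.append(group)
res = pd.concat(selected_groups)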
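One possible alternative, sketched here under assumptions, is to sample by a hash of the triplet instead of by random(). The value then depends only on (user_data, user_model, time_secs), so all rows of a triplet are kept or dropped together in a single table scan, with no second query or join. The sketch registers a Python function with the standard sqlite3 `Connection.create_function`. The function name keep_group, the CRC32 hash and the toy table contents are illustrative assumptions. The result is a pseudo-random sample of triplets that is deterministic for a given ratio; salt the key to draw a different sample. pd.read_sql also accepts a plain sqlite3 connection, so the same query can be run through pandas.

```python
import sqlite3
import zlib

def keep_group(user_data, user_model, time_secs, ratio):
    # Map the triplet to a pseudo-random number in [0, 1); every row of the
    # same triplet gets the same number, so whole groups are kept or dropped.
    key = f"{user_data}|{user_model}|{time_secs}".encode()
    return int((zlib.crc32(key) % 10_000) / 10_000 < ratio)

conn = sqlite3.connect(":memory:")
conn.create_function("keep_group", 4, keep_group)

# Toy stand-in for all_vs_all: each triplet appears once or twice.
conn.execute("CREATE TABLE all_vs_all (user_data TEXT, user_model TEXT, time_secs INTEGER, score REAL)")
rows = []
for i in range(60):
    rows.append((f"user{i}", f"model{i % 7}", 1514764800, 0.5))
    if i % 3 == 0:
        rows.append((f"user{i}", f"model{i % 7}", 1514764800, 0.9))
conn.executemany("INSERT INTO all_vs_all VALUES (?, ?, ?, ?)", rows)

def sample_groups(ratio):
    # One table scan; the filter runs per row but depends only on the triplet.
    return conn.execute(
        "SELECT * FROM all_vs_all WHERE keep_group(user_data, user_model, time_secs, ?)",
        (ratio,),
    ).fetchall()

sample = sample_groups(0.3)
print(len(sample), "of", len(rows), "rows kept")
```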
A few sample join queries:
Currently admitted patients:
SELECT p.patient_no, p.pat_name, p.date_admitted, r.room_extension, p.date_discharged FROM Patient p JOIN room r ON p.room_location = r.room_location WHERE p.date_discharged IS NULL ORDER BY p.patient_no, p.pat_name, p.date_admitted, r.room_extension, p.date_discharged;
Vacant rooms:
SELECT r.room_location, r.room_accomadation, r.room_extension FROM room r WHERE r.room_location NOT IN (SELECT p.room_location FROM patient p WHERE p.date_discharged IS NULL) ORDER BY r.room_location, r.room_accomadation, r.room_extension;
Patients with no charges yet:
SELECT p.patient_no, p.pat_name, COALESCE(b.charge, 0.00) charge FROM patient p LEFT JOIN billed b ON p.patient_no = b.patient_no WHERE p.patient_no NOT IN (SELECT patient_no FROM billed) ORDER BY p.patient_no, p.pat_name, charge;
Physicians with the highest and the lowest salary:
SELECT phy_id, phy_name, salary FROM physician WHERE salary IN (SELECT MAX(salary) FROM physician) UNION
SELECT phy_id, phy_name, salary FROM physician WHERE salary IN (SELECT MIN(salary) FROM physician) ORDER BY phy_id, phy_name, salary;
Items consumed by each patient:
SELECT p.pat_name, i.discription, COUNT(i.item_code) AS item_count FROM patient p JOIN billed b ON p.patient_no = b.patient_no JOIN item i ON b.item_code = i.item_code GROUP BY p.patient_no, p.pat_name, i.item_code, i.discription ORDER BY ..
Patients who have not received treatment:
SELECT p.patient_no, p.pat_name FROM patient p WHERE p.patient_no NOT IN (SELECT t.patient_no FROM treats t)
ORDER BY p.patient_no, p.pat_name;
Two highest-paid physicians:
SELECT phy_id, phy_name, date_of_joining, salary FROM physician
ORDER BY salary DESC LIMIT 2;
Patients with total charges over 200:
SELECT patient_no, SUM(charge) AS total_charge FROM billed GROUP BY patient_no HAVING SUM(charge) > 200 ORDER BY patient_no, total_charge;

Select different count from same table

I have the table T_LOCATION_DATA on Oracle DB as follows:
Person_ID | Location | Role
----------------------------
101 Delhi Manager
102 Mumbai Employee
103 Noida Manager
104 Mumbai Employee
105 Noida Employee
106 Delhi Manager
107 Mumbai Manager
108 Delhi Employee
109 Mumbai Employee
Another table is T_STATUS with following data:
Person_ID | Status
-------------------
101 Active
102 Active
103 Inactive
104 Active
105 Active
106 Inactive
107 Active
108 Active
109 Inactive
I am trying to get the counts of both employees and managers who are Active, grouped by location, in a single query, so that the result looks like this:
Location | MANAGER COUNT | EMPLOYEE COUNT
Delhi 1 1
Mumbai 1 2
Noida 0 1
I am trying the following query, but with no result:
select location, count (a.person_id) as MANAGER COUNT,
count (b.person_id) as EMPLOYEE COUNT
from T_LOCATION_DATA a,T_LOCATION_DATA b
where a.person_id in (select person_id from t_status where status='Active')
... and I get lost here
Can someone guide me on this please?
From your data, I would query like this:
SELECT
Location,
COUNT(CASE WHEN Role='Manager' THEN 1 END) as count_managers,
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(*) count_everyone
FROM
t_location_data l
INNER JOIN
t_status s
ON
l.person_id = s.person_id AND
s.status = 'Active'
GROUP BY location
Differences from your SQL:
We drop the awful old join syntax (SELECT * FROM a, b WHERE a.id = b.id); please always use a JOIN b ON a.id = b.id.
We join in the status table, but we only really want the active rows, which is why I stated that as another condition in the ON clause. I could have put it in a WHERE clause; with an INNER JOIN it makes no difference. With an OUTER JOIN it can make a big difference. If you write a LEFT JOIN b ON a.id = b.id WHERE b.status = 'Active', the WHERE clause converts the LEFT JOIN back into INNER JOIN behaviour. The only way around that is a WHERE clause like WHERE b.status = 'Active' OR b.id IS NULL, and that's just ugly. If the comparison to a constant is put in the ON clause, you can skip the OR ... IS NULL ugliness.
We group by location, but we don't necessarily count everything. If we count the result of CASE WHEN role = 'Manager' THEN ..., the CASE WHEN produces a 1 for a manager and NULL for a non-manager. I didn't specify an ELSE, and returning NULL is the designed behaviour of CASE WHEN in that situation. The value doesn't have to be 1 either; it could be 'a', Role, or anything else that is non-null. COUNT counts every non-null value once and ignores NULLs. The following are thus equivalent; pick whichever makes more sense to you:
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN 'a' END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role ELSE null END) as count_employees,
SUM(CASE WHEN Role='Employee' THEN 1 ELSE 0 END) as count_employees,
They all work as counts, but in the SUM case you really do have to use 1 and 0 if you want the output number to be a count. Strictly, the ELSE 0 is optional, as SUM ignores NULLs. However, as mathguy points out below, without ELSE 0 the SUM method produces NULL rather than 0 when there are no matching rows. Whether that helps or hinders you is a decision for you alone to make.
I wasn't sure whether managers are also employees. To me they are; maybe not to you. I added a COUNT(*) that counts everyone at the location. If count_employees + count_managers != count_everyone, there is another role in the table that is neither manager nor employee. Pick your poison.
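The points above can be checked with a small sketch that uses Python's sqlite3 module as a stand-in for Oracle. The CASE/COUNT/SUM behaviour shown is standard SQL. The sketch loads the sample data and runs the conditional aggregation. It also shows the NULL that SUM produces without ELSE 0, and compares the row counts of a LEFT JOIN with the status filter in ON versus in WHERE. On this data the query returns 2 active employees for Mumbai (persons 102 and 104).

```python
import sqlite3

LOCATIONS = [
    (101, "Delhi", "Manager"), (102, "Mumbai", "Employee"), (103, "Noida", "Manager"),
    (104, "Mumbai", "Employee"), (105, "Noida", "Employee"), (106, "Delhi", "Manager"),
    (107, "Mumbai", "Manager"), (108, "Delhi", "Employee"), (109, "Mumbai", "Employee"),
]
STATUSES = [
    (101, "Active"), (102, "Active"), (103, "Inactive"), (104, "Active"), (105, "Active"),
    (106, "Inactive"), (107, "Active"), (108, "Active"), (109, "Inactive"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT)")
conn.execute("CREATE TABLE t_status (person_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t_location_data VALUES (?, ?, ?)", LOCATIONS)
conn.executemany("INSERT INTO t_status VALUES (?, ?)", STATUSES)

# Conditional aggregation; the SUM column deliberately has no ELSE 0.
counts = conn.execute("""
    SELECT l.location,
           COUNT(CASE WHEN l.role = 'Manager'  THEN 1 END) AS count_managers,
           COUNT(CASE WHEN l.role = 'Employee' THEN 1 END) AS count_employees,
           SUM(CASE WHEN l.role = 'Manager' THEN 1 END)    AS sum_managers_no_else,
           COUNT(*)                                        AS count_everyone
    FROM t_location_data l
    INNER JOIN t_status s
            ON l.person_id = s.person_id AND s.status = 'Active'
    GROUP BY l.location
    ORDER BY l.location
""").fetchall()
for row in counts:
    print(row)  # Noida's sum_managers_no_else is None (SQL NULL), not 0

# Status filter in ON: the LEFT JOIN keeps every location row.
in_on = conn.execute("""
    SELECT COUNT(*) FROM t_location_data l
    LEFT JOIN t_status s ON l.person_id = s.person_id AND s.status = 'Active'
""").fetchone()[0]

# Same filter in WHERE: NULL-extended rows are discarded, like an INNER JOIN.
in_where = conn.execute("""
    SELECT COUNT(*) FROM t_location_data l
    LEFT JOIN t_status s ON l.person_id = s.person_id
    WHERE s.status = 'Active'
""").fetchone()[0]
print(in_on, in_where)
```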
This COUNT/SUM(CASE WHEN...) pattern is really useful for turning data around - a PIVOT operation. It takes a column of data:
Manager
Employee
Manager
And turns it into two columns, for the count values:
Manager Employee
2 1
You can extend it as many times as you like: if you have 10 roles, write 10 CASE WHENs and the result will have 10 columns of grouped counts. The data is pivoted from a row-wise representation to a column-wise one.
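The "one CASE WHEN per role" idea can also be generated rather than typed by hand. Here is a sketch with a hypothetical helper, pivot_counts, written in Python against SQLite. It reads the distinct roles and builds one COUNT(CASE WHEN ...) column for each. Role values are passed as bind parameters and the column aliases are generated (role_0, role_1, ...), so no data is concatenated into the SQL text. It counts all rows regardless of status, purely to show the pivot shape.

```python
import sqlite3

ROWS = [
    (101, "Delhi", "Manager"), (102, "Mumbai", "Employee"), (103, "Noida", "Manager"),
    (104, "Mumbai", "Employee"), (105, "Noida", "Employee"), (106, "Delhi", "Manager"),
    (107, "Mumbai", "Manager"), (108, "Delhi", "Employee"), (109, "Mumbai", "Employee"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT)")
conn.executemany("INSERT INTO t_location_data VALUES (?, ?, ?)", ROWS)

def pivot_counts(conn):
    # Discover the roles, then emit one conditional COUNT column per role.
    roles = [r for (r,) in conn.execute("SELECT DISTINCT role FROM t_location_data ORDER BY role")]
    cols = ", ".join(
        f"COUNT(CASE WHEN role = ? THEN 1 END) AS role_{i}" for i in range(len(roles))
    )
    sql = f"SELECT location, {cols} FROM t_location_data GROUP BY location ORDER BY location"
    # The ? placeholders appear in the same order as `roles`.
    return roles, conn.execute(sql, roles).fetchall()

roles, table = pivot_counts(conn)
print(roles)
for row in table:
    print(row)
```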

Oracle 'Partition By' and 'Row_Number' keyword along with pivot

I have this query written by someone else and I am trying to figure out how it works. I have a general idea about each of these things (ROW_NUMBER(), PARTITION BY, PIVOT), but I am unable to understand them all together.
For this query:
select
d, p, s, a
from
(
select name,occupation, (ROW_NUMBER() OVER (partition by occupation order by name)) as rownumber from occupations
)
pivot
(
max(name)
for occupation
in ('Doctor' as d, 'Professor' as p, 'Singer' as s, 'Actor' as a)
)
order by rownumber;
This is the input table on which the above query works:
This is the output generated by the query, which is correct as per the question:
Jenny Ashley Meera Jane
Samantha Christeen Priya Julia
NULL Ketty NULL Maria
Now, I want to know how the query generates the output, i.e. step by step, following the flow of execution. An explanation with easy examples matching the above situation would be much appreciated. Thanks in advance.
After the FROM clause you have the following:
select name,occupation, (ROW_NUMBER() OVER (partition by occupation order by name))
This virtually restacks your table data into three columns: NAME, OCCUPATION and ROWNUMBER. ROWNUMBER restarts at 1 whenever the OCCUPATION value changes. The output will look like:
NAME                 OCCUPATION           ROWNUMBER
-------------------- -------------------- ----------
Jane                 Actor                         1
Julia                Actor                         2
Maria                Actor                         3
Jenny                Doctor                        1 <-- rownumber resets to 1
Samantha             Doctor                        2
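The PIVOT clause lets you aggregate results and rotate rows into columns.
The PIVOT syntax is:
PIVOT
(
aggregate_function(column1)
FOR column2
IN ( expr1, expr2, ... expr_n) | subquery
)
(The subquery form of the IN list is only allowed with PIVOT XML.)
So your PIVOT takes MAX(NAME) for each OCCUPATION value listed in the IN clause, producing one output column per occupation (D, P, S, A). ROWNUMBER is the only column of the inline view not mentioned in the PIVOT clause, so it becomes an implicit GROUP BY column. Each output row therefore corresponds to one ROWNUMBER value, and ORDER BY rownumber sorts the rows. Within a row, MAX(name) simply picks the single name with that occupation and row number. The result is NULL when an occupation has fewer names, hence the NULLs for Doctor and Singer in the third row.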
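To see the two steps separately, here is a sketch in Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER). SQLite has no PIVOT clause, so step 2 writes out what PIVOT does by hand with MAX(CASE ...) and GROUP BY rownumber. The input rows are reconstructed from the output shown in the question, because the original input table is not reproduced here.

```python
import sqlite3

# Input rows reconstructed from the query output above (assumption).
OCCUPATIONS = [
    ("Jenny", "Doctor"), ("Samantha", "Doctor"),
    ("Ashley", "Professor"), ("Christeen", "Professor"), ("Ketty", "Professor"),
    ("Meera", "Singer"), ("Priya", "Singer"),
    ("Jane", "Actor"), ("Julia", "Actor"), ("Maria", "Actor"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occupations (name TEXT, occupation TEXT)")
conn.executemany("INSERT INTO occupations VALUES (?, ?)", OCCUPATIONS)

NUMBERED = """
    SELECT name, occupation,
           ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) AS rownumber
    FROM occupations
"""

# Step 1: the inline view, one row per name with its position in its occupation.
step1 = conn.execute(NUMBERED + " ORDER BY occupation, rownumber").fetchall()

# Step 2: what PIVOT does. rownumber is the implicit GROUP BY column, and each
# MAX(CASE ...) picks the single name (or NULL) for one occupation.
step2 = conn.execute(f"""
    SELECT MAX(CASE WHEN occupation = 'Doctor'    THEN name END) AS d,
           MAX(CASE WHEN occupation = 'Professor' THEN name END) AS p,
           MAX(CASE WHEN occupation = 'Singer'    THEN name END) AS s,
           MAX(CASE WHEN occupation = 'Actor'     THEN name END) AS a
    FROM ({NUMBERED})
    GROUP BY rownumber
    ORDER BY rownumber
""").fetchall()
for row in step2:
    print(row)
```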

11g Oracle aggregate SQL query

Can you please help me with a query for this scenario? In the case below it should return a single row, A = 13. Both 13 and 14 have the most occurrences in column A, and the value of B (30) is greater for 13. We are interested in the value of A with the most occurrences, and in case of a tie B should be used as the tie-breaker.
A B
13 30
13 12
14 10
14 25
15 5
In the case below, where each value of A occurs only once (all tied), it should return 14, which has the maximum B value of 40.
A B
13 30
14 40
15 5
Use case: we get calls from corporate customers. We want to know during which hours of the day most calls come in and, in case of a tie, which of the busiest hours has the longest call.
Further question
I have a further question on this. I want to use either of the two solutions, '11g or lower' from @GurV or 'dense_rank' from @mathguy, in the bigger query below. How can I do it?
SELECT dv.id , u.email , dv.email_subject AS headline , dv.start_date , dv.closing_date, b.name AS business_name, ls.call_cost, dv.currency,
SUM(lsc.duration) AS duration, COUNT(lsc.id) AS call_count, ROUND(AVG(lsc.duration), 2) AS avg_duration
-- max(extract(HOUR from started )) keep (dense_rank last order by count(duration), max(duration)) as most_popular_hour
FROM deal_voucher dv
JOIN lead_source ls ON dv.id = ls.deal_id
JOIN lead_source_call lsc ON ls.PHONE_SID = lsc.phone_number_id
JOIN business b ON dv.business_id = b.id
JOIN users u ON b.id = u.business_id
AND TRUNC(dv.closing_date) = to_date('13-01-2017', 'dd-mm-yyyy')
AND lsc.status = 'completed' and lsc.duration >= 30
GROUP BY dv.id , u.email , dv.email_subject , dv.start_date , dv.closing_date, b.name, ls.call_cost, dv.currency
--, extract(HOUR from started )
Try this if you are on 12c or later:
select a
from t
group by a
order by count(*) desc, max(b) desc
fetch first 1 row only;
If 11g or lower:
select * from (
select a
from t
group by a
order by count(*) desc, max(b) desc
) where rownum = 1;
Note that if two or more values of A tie on both the count and the max value of B, any one of them will be fetched.
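A quick way to check the ordering logic on both sample data sets is the sketch below. It runs the same GROUP BY / ORDER BY in SQLite through Python's sqlite3 module. SQLite's LIMIT 1 stands in for Oracle's FETCH FIRST 1 ROW ONLY, and the helper name busiest is hypothetical.

```python
import sqlite3

def busiest(rows):
    # Most frequent A, ties broken by the larger MAX(B).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (a,) = conn.execute("""
        SELECT a FROM t
        GROUP BY a
        ORDER BY COUNT(*) DESC, MAX(b) DESC
        LIMIT 1
    """).fetchone()
    conn.close()
    return a

print(busiest([(13, 30), (13, 12), (14, 10), (14, 25), (15, 5)]))  # 13
print(busiest([(13, 30), (14, 40), (15, 5)]))                       # 14
```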
Here is a query that will work in older versions (no fetch clause) and does not require a subquery. It uses the first/last function. In case of ties by both "count by A" and "value of max(B)" it selects only the row with the largest value of A. You can change that to min(A), or even to sum(A) (although that probably doesn't make sense in your problem) or LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) to get a comma-delimited list of the A's that are tied for first place, but that requires 11.2 (I believe).
select max(a) keep (dense_rank last order by count(b), max(b)) as a
, max(max(b)) keep (dense_rank last order by count(b)) as b
from inputs
group by a
;
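To make the FIRST/LAST semantics concrete, here is a plain-Python emulation (not Oracle code) of what the two KEEP (DENSE_RANK LAST ...) columns compute, following the description above. Groups are ranked by (count(b), max(b)), every group sharing the top rank is kept, and MAX(a) collapses a tie to one value. The helper name keep_dense_rank_last is hypothetical, and NULLs in B (which COUNT(b) would skip) are not handled.

```python
from collections import defaultdict

def keep_dense_rank_last(rows):
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    # Per-group sort key, like ORDER BY count(b), max(b)
    keys = {a: (len(bs), max(bs)) for a, bs in groups.items()}
    top_key = max(keys.values())
    # DENSE_RANK LAST keeps every group sharing the top key; MAX(a) picks one of them
    best_a = max(a for a, k in keys.items() if k == top_key)
    # Second column ranks by count(b) only, then takes the largest max(b) among those groups
    top_count = max(count for count, _ in keys.values())
    best_b = max(mx for count, mx in keys.values() if count == top_count)
    return best_a, best_b

print(keep_dense_rank_last([(13, 30), (13, 12), (14, 10), (14, 25), (15, 5)]))  # (13, 30)
print(keep_dense_rank_last([(13, 30), (14, 40), (15, 5)]))                       # (14, 40)
```

With a full tie such as [(13, 30), (14, 30)], both groups share the top rank and MAX(a) returns 14, matching the tie rule described above.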
