aggregate function with join by most recent date Oracle - oracle

I'm trying to select a single test score for students by test type, test element, semester and ID. One score per ID. If a student has taken the test more than once, I want to only return the highest (or most recent) score for that test and element.
My problem is that there are a very small number of instances (less than 10 records out of 2000) where a student has two test scores recorded on different dates because they've re-taken the test to improve their score and we record both scores. My output therefore has a small number of records with multiple scores for unique name_ids (where test_id = 'act' and element_id = 'comp').
Examples of the two tables are:
students
----------------------------------------------------------
name_id term_id
100 Fall
100 Spring
100 Summer
105 Fall
105 Spring
110 Fall
110 Spring
110 Summer
test_score
----------------------------------------------------------
name_id test_id element_id score test_date
100 act comp 25 02/01/2019
100 sat comp 1250 01/20/2019
105 act comp 19 01/15/2019
105 act comp 21 02/28/2019
110 act comp 27 01/31/2019
I've tried using MAX(score), but perhaps I could use MAX(test_date)? Either would work, because students don't report additional test scores from later dates unless the scores are higher than what was originally reported.
This is a small part of a larger routine joining several tables, so I don't know that I can replace my JOIN(s). I'm just trying to get this small subset of the routine to produce the correct number of unique records.
SELECT
a.name_id NameID,
a.term_id TermID,
MAX(b.score) Score
FROM students a
LEFT JOIN test_score b ON a.name_id = b.name_id AND b.test_id = 'act' AND b.element_id = 'comp'
WHERE a.term_id = 'Spring'
group by b.score,a.name_id,a.term_id
order by a.name_id
No error messages but results from above will yield two records for NameID 105:
NameID TermID Score
100 Spring 25
105 Spring 19
105 Spring 21
110 Spring 27
I'm not certain how to write this to only select the highest score (or only the score from the most recent date)
Thanks for your guidance.

To select the highest score the GROUP BY cannot include the score...
SELECT
a.name_id NameID,
a.term_id TermID,
MAX(b.score) Score
FROM students a
LEFT JOIN test_score b ON a.name_id = b.name_id AND b.test_id = 'act' AND b.element_id = 'comp'
WHERE a.term_id = 'Spring'
group by a.name_id,a.term_id
order by a.name_id
To get the score associated with the highest date...
SELECT x.NameID,
       x.TermID,
       y.score
FROM (
    SELECT a.name_id NameID,
           a.term_id TermID,
           MAX(b.test_date) test_date  -- most recent test date instead of MAX(b.score)
    FROM students a
    LEFT JOIN test_score b ON a.name_id = b.name_id AND b.test_id = 'act' AND b.element_id = 'comp'
    WHERE a.term_id = 'Spring'
    GROUP BY a.name_id, a.term_id ) x
-- repeat the test filters so a different test taken on the same date cannot match
LEFT JOIN test_score y ON x.nameid = y.name_id AND x.test_date = y.test_date
    AND y.test_id = 'act' AND y.element_id = 'comp'
ORDER BY x.nameid
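To see how the two approaches differ, here is a minimal sketch that runs both queries against the sample data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, stores the dates as ISO strings so that MAX() orders them correctly, and adds one hypothetical student (120) whose later score is lower, which is not in the original data.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are ISO strings so MAX() sorts them.
score_db = sqlite3.connect(":memory:")
score_db.executescript("""
    CREATE TABLE students (name_id INTEGER, term_id TEXT);
    CREATE TABLE test_score (name_id INTEGER, test_id TEXT, element_id TEXT,
                             score INTEGER, test_date TEXT);
    INSERT INTO students VALUES (100,'Spring'), (105,'Spring'), (110,'Spring'), (120,'Spring');
    INSERT INTO test_score VALUES
        (100,'act','comp',25,'2019-02-01'),
        (100,'sat','comp',1250,'2019-01-20'),
        (105,'act','comp',19,'2019-01-15'),
        (105,'act','comp',21,'2019-02-28'),
        (110,'act','comp',27,'2019-01-31'),
        -- hypothetical student: the later score is the lower one
        (120,'act','comp',30,'2019-01-10'),
        (120,'act','comp',28,'2019-03-05');
""")

# Highest score per student: the score must not appear in GROUP BY.
HIGHEST = """
    SELECT a.name_id, a.term_id, MAX(b.score)
    FROM students a
    LEFT JOIN test_score b
      ON a.name_id = b.name_id AND b.test_id = 'act' AND b.element_id = 'comp'
    WHERE a.term_id = 'Spring'
    GROUP BY a.name_id, a.term_id
    ORDER BY a.name_id
"""

# Score from the most recent date: find the latest date, then join back for its score.
LATEST = """
    SELECT x.name_id, x.term_id, y.score
    FROM (SELECT a.name_id, a.term_id, MAX(b.test_date) AS test_date
          FROM students a
          LEFT JOIN test_score b
            ON a.name_id = b.name_id AND b.test_id = 'act' AND b.element_id = 'comp'
          WHERE a.term_id = 'Spring'
          GROUP BY a.name_id, a.term_id) x
    LEFT JOIN test_score y
      ON x.name_id = y.name_id AND x.test_date = y.test_date
     AND y.test_id = 'act' AND y.element_id = 'comp'
    ORDER BY x.name_id
"""

highest_rows = score_db.execute(HIGHEST).fetchall()
latest_rows = score_db.execute(LATEST).fetchall()
print(highest_rows)
print(latest_rows)
```

Both queries return one row per student. They only disagree for the hypothetical student 120, where the highest score and the most recent score are different rows.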

Related

Oracle Select Query on Same Table (self join)

It seems too simple, but I'm not getting the desired results.
I have a table with this data:
Team_id, Player_id, Player_name, Game_cd
1 100 abc 24
1 1000 xyz 24
1 588 ert 24
1 500 you 24
2 600 ops 24
2 700 dps 24
2 900 lmv 24
2 200 hmv 24
I have to write a query to get a result like this
Home_team home_plr_id home_player away_team away_plr_id away_player
1 100 abc 2 600 ops
1 1000 xyz 2 900 lmv
The query I wrote
select f1.Team_id as home_team,
f1.player_id as home_plr_id,
f1.player_Name as home_player,
f2.Team_id as away_team,
f2.player_id as away_plr_id,
f2.player_Name as home_player
from game f1, game f2
where
f1.team_id<> f2.team_id and
f1.game_cd = f2.game_cd
An alternative to @Radagast81's self-join is PIVOT, which is available in your Oracle version:
select home_plr_id, home_plr_name, away_plr_id, away_plr_name
from (select game.*,
row_number() over (partition by team_id order by player_id) rn
from game)
pivot (max(player_id) plr_id, max(player_name) plr_name
for team_id in (1 home, 2 away))
Players have to be numbered somehow (here by ID); it can be done by name, null or even random. This numbering is needed only to put them in the same rows. Pivot also works if the number of players differs between teams.
It is not clear how you want to pair a home player with an away player. But provided that you don't care about that, the following might be what you are looking for:
WITH game_p AS (SELECT team_id, player_id, player_name, game_cd
, ROW_NUMBER() over (PARTITION BY team_id, game_cd ORDER BY player_id) pos
, dense_rank() over (PARTITION BY game_cd ORDER BY team_id) team_pos
FROM game)
SELECT NVL(f1.game_cd, f2.game_cd) AS game_cd
, f1.Team_id as home_team
, f1.player_id as home_plr_id
, f1.player_Name as home_player
, f2.Team_id as away_team
, f2.player_id as away_plr_id
, f2.player_Name as away_player
FROM (SELECT * FROM game_p WHERE team_pos = 1) f1
FULL JOIN (SELECT * FROM game_p WHERE team_pos = 2) f2
ON f1.game_cd = f2.game_cd
AND f1.pos = f2.pos
The new column POS gives each player of a team a position, which is used to pair them with a player of the other team.
The new column TEAM_POS maps the team_id to the values 1 and 2, as the team_ids can differ per game.
Finally, do a FULL JOIN to get the final list. If the number of players is always the same for both teams, you can do a normal join instead...
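The pairing idea (number each team's players, then match equal positions and keep unmatched players) can be sketched outside the database as well. This is only an illustration in plain Python, not the Oracle query itself; the function name pair_players is made up for the example, and itertools.zip_longest plays the role of the FULL JOIN on position.

```python
from itertools import zip_longest

def pair_players(game_rows, home_team, away_team):
    """Pair players of two teams by their position when sorted by player_id.

    game_rows holds (team_id, player_id, player_name) tuples.
    """
    home = sorted((r for r in game_rows if r[0] == home_team), key=lambda r: r[1])
    away = sorted((r for r in game_rows if r[0] == away_team), key=lambda r: r[1])
    # zip_longest pads the shorter team with None, like a FULL JOIN on position.
    return [(h[1] if h else None, h[2] if h else None,
             a[1] if a else None, a[2] if a else None)
            for h, a in zip_longest(home, away)]

game = [(1, 100, "abc"), (1, 1000, "xyz"), (1, 588, "ert"), (1, 500, "you"),
        (2, 600, "ops"), (2, 700, "dps"), (2, 900, "lmv"), (2, 200, "hmv")]

pairs = pair_players(game, home_team=1, away_team=2)
for row in pairs:
    print(row)
```

Because the pairing is by sorted player ID, the result differs from the pairs shown in the question, which do not follow any stated rule.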

DAX AVG of Group Applied but returned for one specific Employee?

I have data as follows
EmployeeID Cycle Val Group
1 1 6 A
2 1 5 A
My desired result is as follows:
EmployeeID Cycle GroupVal
1 1 5.5
2 1 5.5
I have written 2 Measures as follows:
Emp_AVG: CALCULATE(AVERAGE(EmployeeFeedback, EmployeeFeedback[Val] > 0)
Group_AVG: CALCULATE(AVERAGEX(EmployeeFeedback[Emp_AVG],EmployeeFeedback[Emp_AVG] >0)
My thought process is that Group_AVG averages the averages of all employees per group. However, since I need the results for a specific employee, as soon as I introduce that column the measure is sliced by employee and the group average becomes inaccurate. I guess I need to generate the group averages before I do any employee filtering. How?
I am running a DAX query as follows:
EVALUATE SUMMARIZECOLUMNS(
EmployeeFeedback[EmployeeID],
EmployeeFeedback[Cycle],
"Group Val", [Group_AVG]
)
I need the EmployeeID to filter it down to an employee but because of EmployeeID, the Group AVG gets screwed. Without EmployeeID, Group AVG is correct but then there is no way to filter it for a specific Employee!
Thanks!
You could try to provide the ALLEXCEPT() argument to the CALCULATE function.
Btw, I believe there's an error in your Emp_AVG measure.
Try this:
Employee Average
Emp_AVG =
CALCULATE(
AVERAGE(EmployeeFeedback[Val]),
EmployeeFeedback[Val] > 0)
Group Average
Grp_Avg =
CALCULATE(EmployeeFeedback[Emp_AVG],
ALLEXCEPT(EmployeeFeedback,EmployeeFeedback[Group]))
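As a rough illustration of what the ALLEXCEPT version is meant to return, here is a plain Python sketch (not DAX, and not how the engine evaluates it): the average of the positive Val values is computed once per Group and then attached to every employee row of that group. The function name group_averages is made up for the example.

```python
from collections import defaultdict

def group_averages(rows):
    """rows holds (employee_id, cycle, val, group) tuples.

    Returns (employee_id, cycle, group_avg), where group_avg is the average
    of all positive val values in the employee's group.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for _, _, val, group in rows:
        if val > 0:
            totals[group][0] += val
            totals[group][1] += 1
    avg = {g: s / n for g, (s, n) in totals.items()}
    # Every row keeps its employee but receives the average of its whole group.
    return [(emp, cycle, avg.get(group)) for emp, cycle, _, group in rows]

feedback = [(1, 1, 6, "A"), (2, 1, 5, "A")]
result = group_averages(feedback)
print(result)
```

The point is the order of operations: the group average is fixed before the rows are split by employee, which is what removing the EmployeeID filter achieves.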

11g Oracle aggregate SQL query

Can you please help me in getting a query for this scenario. In below case it should return me single row of A=13 because 13,14 in column A has most occurrences and value of B (30) is greater for 13. We are interested in maximum occurrences of A and in case of tie B should be considered as tie breaker.
A B
13 30
13 12
14 10
14 25
15 5
In below case where there are single occurrence of A (all tied) it should return 14 having maximum value of 40 for B.
A B
13 30
14 40
15 5
Use case - we get calls from corporate customers. We are interested in knowing during what hours of day when most calls come and in case of tie - which of the busiest hours has longest call.
Further question
There is a further question on this. I want to use either of the two solutions ('11g or lower' from @GurV or 'dense_rank' from @mathguy) in the bigger query below. How can I do it?
SELECT dv.id , u.email , dv.email_subject AS headline , dv.start_date , dv.closing_date, b.name AS business_name, ls.call_cost, dv.currency,
SUM(lsc.duration) AS duration, COUNT(lsc.id) AS call_count, ROUND(AVG(lsc.duration), 2) AS avg_duration
-- max(extract(HOUR from started )) keep (dense_rank last order by count(duration), max(duration)) as most_popular_hour
FROM deal_voucher dv
JOIN lead_source ls ON dv.id = ls.deal_id
JOIN lead_source_call lsc ON ls.PHONE_SID = lsc.phone_number_id
JOIN business b ON dv.business_id = b.id
JOIN users u ON b.id = u.business_id
AND TRUNC(dv.closing_date) = to_date('13-01-2017', 'dd-mm-yyyy')
AND lsc.status = 'completed' and lsc.duration >= 30
GROUP BY dv.id , u.email , dv.email_subject , dv.start_date , dv.closing_date, b.name, ls.call_cost, dv.currency
--, extract(HOUR from started )
Try this if 12c+
select a
from t
group by a
order by count(*) desc, max(b) desc
fetch first 1 row only;
If 11g or lower:
select * from (
select a
from t
group by a
order by count(*) desc, max(b) desc
) where rownum = 1;
Note that if there is equal count and equal max value for two or more values of A, then any one of them will be fetched.
Here is a query that will work in older versions (no fetch clause) and does not require a subquery. It uses the first/last function. In case of ties by both "count by A" and "value of max(B)" it selects only the row with the largest value of A. You can change that to min(A), or even to sum(A) (although that probably doesn't make sense in your problem) or LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) to get a comma-delimited list of the A's that are tied for first place, but that requires 11.2 (I believe).
select max(a) keep (dense_rank last order by count(b), max(b)) as a
, max(max(b)) keep (dense_rank last order by count(b)) as b
from inputs
group by a
;
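A quick way to check the ordering logic (count first, then max(B) as the tie breaker) against both sample tables is the sketch below. It assumes SQLite via Python's sqlite3 module in place of Oracle, so LIMIT 1 takes the place of FETCH FIRST 1 ROW ONLY or the ROWNUM wrapper; the function name most_frequent_a is made up for the example.

```python
import sqlite3

def most_frequent_a(rows):
    """Return the value of A with the most rows; ties are broken by the largest B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # LIMIT 1 is SQLite's counterpart to Oracle's FETCH FIRST 1 ROW ONLY.
    row = conn.execute("""
        SELECT a
        FROM t
        GROUP BY a
        ORDER BY COUNT(*) DESC, MAX(b) DESC
        LIMIT 1
    """).fetchone()
    conn.close()
    return row[0] if row else None

first_case = most_frequent_a([(13, 30), (13, 12), (14, 10), (14, 25), (15, 5)])
second_case = most_frequent_a([(13, 30), (14, 40), (15, 5)])
print(first_case, second_case)
```

In the first table 13 and 14 both occur twice and 13 wins on B = 30; in the second table every A occurs once and 14 wins on B = 40.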

Oracle count distinct record within subquery

I have 3 tables
SUBJECTS
CODE, SUBJECT_NAME , SESSION
100, MATHS , AM
101, MATHS - INTRO , AM
102, MATHS - ADVANCED , AM
200, ENGLISH , AM
201, ENGLISH - INTRO , AM
202, ENGLISH - BEGINNER, AM
203, ENGLISH - ADVANCED, AM
STUDENTS_SUBJECTS
ID, SUBJECT_CODE
2, 101
2, 102
1, 201
1, 203
3, 101
3, 102
STUDENTS
ID,PARENT_ID, STUDENT_NAME, CLASS_LEADER, INACTIVE, EXPERT
1 , 2 , ELSA , no , N , N
2 , 4 , STEVE , no , N , N
3 , 5 , MIKE , no , N , N
My query goes like
SELECT t1.CODE,
t1.SUBJECT_NAME,
SUM (CASE WHEN ( (t2.CLASS_LEADER = 'no'
OR t2.CLASS_LEADER IS NULL)
AND t2.EXPERT IS NULL)
THEN 1 ELSE 0 END) AS "Average Student"
FROM subjects t1
LEFT OUTER JOIN (
select a.STUDENT_ID, a.PARENT_ID, a.STUDENT_NAME,
a.CLASS_LEADER, c.SUBJECT_CODE, a.INACTIVE, a.EXPERT
FROM students a
INNER JOIN students_subjects c
ON (a.STUDENT_ID = c.ID )
where (INACTIVE is null)
GROUP BY a.STUDENT_ID, a.PARENT_ID, a.STUDENT_NAME, a.CLASS_LEADER, c.SUBJECT_CODE, a.INACTIVE, a.EXPERT
) t2
ON substr(trim(t2.SUBJECT_CODE),1,2)= substr(trim(t1.CODE),1,2)
WHERE (t1.SESSION='AM')
GROUP BY t1.CODE, T1.SUBJECT_NAME
ORDER BY T1.CODE
What I would like to get is the number of students who signed up for the class for morning session under each major subject without the duplicates. For example, each students who signed up for Maths - Intro & Maths Advanced should only be counted once under the Maths subject.
if I run the subquery separately minus the subject_code in select statement and group by statement, I managed to get the correct value however I'm not sure how to return the correct value when it's joined in the query.
REPORT
CODE, SUBJECT_NAME, AVERAGE_STUDENT
100 MATHS 2
200 ENGLISH 1
Thank you.
First, some recommendations:
1) add a column MAIN_SUBJECT_CODE to the table SUBJECTS (as already commented)
2) the column ID in the table STUDENTS_SUBJECTS is a foreign key pointing to the table STUDENTS, so a better name would be STUDENT_ID
3) use a single convention to store Boolean values; do not mix 'no' and 'N'
First the query of all student subscriptions
Note that I added the missing column main_subject_code and adjusted the average student definition to get some result.
SELECT su.CODE,
substr(trim(su.CODE),1,2)||'0' main_subject_code,
su.SUBJECT_NAME,
st.STUDENT_NAME,
CASE WHEN ( (st.CLASS_LEADER = 'no'
OR st.CLASS_LEADER IS NULL)
AND st.EXPERT = 'N' /*IS NULL*/)
THEN 1 ELSE 0 END AS "Average Student"
FROM subjects su
INNER JOIN students_subjects ss
ON su.code = ss.SUBJECT_CODE
INNER JOIN STUDENTS st
ON ss.ID /* STUDENT_ID */ = st.ID
;
CODE MAIN_SUBJECT_CODE SUBJECT_NAME STUDENT_NAME Average Student
101 100 MATHS - INTRO MIKE 1
101 100 MATHS - INTRO STEVE 1
102 100 MATHS - ADVANCED MIKE 1
102 100 MATHS - ADVANCED STEVE 1
201 200 ENGLISH - INTRO ELSA 1
203 200 ENGLISH - ADVANCED ELSA 1
The rest is simple - group on main subject and add the title of it
with subsr as (
SELECT su.CODE,
substr(trim(su.CODE),1,2)||'0' main_subject_code,
su.SUBJECT_NAME,
st.STUDENT_NAME,
CASE WHEN ( (st.CLASS_LEADER = 'no'
OR st.CLASS_LEADER IS NULL)
AND st.EXPERT = 'N' /*IS NULL*/)
THEN 1 ELSE 0 END AS "Average Student"
FROM subjects su
INNER JOIN students_subjects ss
ON su.code = ss.SUBJECT_CODE
INNER JOIN STUDENTS st
ON ss.ID /* STUDENT_ID */ = st.ID
)
select
main_subject_code,
(select SUBJECT_NAME from SUBJECTS where CODE = main_subject_code) main_subject_name,
sum("Average Student") "Average Student"
from subsr
group by main_subject_code
order by main_subject_code;
MAIN_SUBJECT_CODE MAIN_SUBJECT_NAME Average Student
----------------- ------------------------- ---------------
100 MATHS 4
200 ENGLISH 2
Your posted query contains a lot of extraneous logic which doesn't seem relevant to your apparent task. So I'm ignoring it and focusing on simply getting "the number of students who signed up for the class for morning session under each major subject without the duplicates".
select major
     , count(*)
from (
    select distinct subj.major
         , ss.id as student_id
    from
        ( select code,
                 regexp_replace(subject_name, '^([A-Z]+)(.*)', '\1') major
          from subjects
          where session = 'AM'
        ) subj
    join students_subjects ss
      on ss.subject_code = subj.code
)
group by major
order by major
/
The subquery on SUBJECTS uses a regex function to extract the leading element of the subject name as the major. It works for the posted sample data but might fail for more complicated names. Regex shouldn't be necessary: a proper data model would separate the MAJOR subject from its subsidiaries.
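For comparison, counting each student once per major subject can also be sketched with COUNT(DISTINCT ...). The example below assumes SQLite via Python's sqlite3 module as a stand-in for Oracle, derives the main subject code from the first two digits of CODE (the same substr trick used earlier in this thread), and leaves out the CLASS_LEADER / EXPERT / INACTIVE conditions from the question.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only the two tables needed for counting are created.
subject_db = sqlite3.connect(":memory:")
subject_db.executescript("""
    CREATE TABLE subjects (code INTEGER, subject_name TEXT, session TEXT);
    CREATE TABLE students_subjects (id INTEGER, subject_code INTEGER);
    INSERT INTO subjects VALUES
        (100,'MATHS','AM'), (101,'MATHS - INTRO','AM'), (102,'MATHS - ADVANCED','AM'),
        (200,'ENGLISH','AM'), (201,'ENGLISH - INTRO','AM'),
        (202,'ENGLISH - BEGINNER','AM'), (203,'ENGLISH - ADVANCED','AM');
    INSERT INTO students_subjects VALUES (2,101), (2,102), (1,201), (1,203), (3,101), (3,102);
""")

# Map every subject to its main subject (101 -> 100), then count distinct students.
report = subject_db.execute("""
    SELECT m.code, m.subject_name, COUNT(DISTINCT ss.id) AS student_count
    FROM subjects s
    JOIN students_subjects ss ON ss.subject_code = s.code
    JOIN subjects m ON m.code = CAST(substr(s.code, 1, 2) || '0' AS INTEGER)
    WHERE s.session = 'AM'
    GROUP BY m.code, m.subject_name
    ORDER BY m.code
""").fetchall()
print(report)
```

A student enrolled in both MATHS - INTRO and MATHS - ADVANCED contributes two joined rows but only one distinct ID, which gives the counts in the desired REPORT.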

Trying to figure out top 5 land areas of the 50 states in the U.S

I have created a table with one column named states and another column called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not provide the top 5 land areas as far as number wise.
I have also tried this one but no go either:
SELECT * FROM Map order by State desc)
where rownum < 5;
Anyone have any suggestions to get me on the right track??
Here is a sample of the table:
states land areas
michagan 15000
florida 25000
tennessee 10000
alabama 80000
new york 150000
california 20000
oregon 5000
texas 6000
utah 3000
nebraska 1000
Desired output from query:
States land area
new york 150000
alabama 80000
florida 25000
california 20000
Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6
Use a HAVING clause and count the number of states that are larger:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
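The self-join idea can be tried on the sample rows from the question with the sketch below. It assumes SQLite via Python's sqlite3 module in place of Oracle and uses made-up column names (state, land_area). One deliberate difference: it counts COUNT(m2.land_area) instead of COUNT(*), so the largest state, which only has the NULL row from the left join, is counted as having zero larger states.

```python
import sqlite3

SAMPLE = [("michagan", 15000), ("florida", 25000), ("tennessee", 10000),
          ("alabama", 80000), ("new york", 150000), ("california", 20000),
          ("oregon", 5000), ("texas", 6000), ("utah", 3000), ("nebraska", 1000)]

def top_n_by_self_join(rows, n=5):
    """Return the states that have fewer than n states with a larger land area."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE map (state TEXT, land_area INTEGER)")
    conn.executemany("INSERT INTO map VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT m.state, m.land_area
        FROM map m
        LEFT JOIN map m2 ON m2.land_area > m.land_area
        GROUP BY m.state, m.land_area
        HAVING COUNT(m2.land_area) < ?   -- number of strictly larger states
        ORDER BY m.land_area DESC
    """, (n,)).fetchall()
    conn.close()
    return result

top_five = top_n_by_self_join(SAMPLE)
for state, area in top_five:
    print(state, area)
```

On a table of 50 rows this is fine, but the join compares every state with every other state, so the analytic-function answers scale better on large tables.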
Try something like this:
select m.states, m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc
There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandAreas, State
FROM ( SELECT LandAreas
            , State
            , DENSE_RANK() OVER (ORDER BY LandAreas DESC) as area_dr
       FROM Map )
WHERE area_dr <= 5
order by area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
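To make the difference between the three numbering schemes concrete, here is a small plain Python sketch that reproduces them on an already sorted list. The land areas are hypothetical values with deliberate ties (they are not the question's data), and the helper names are made up to mirror the SQL functions.

```python
def row_number(values):
    """1, 2, 3, ... regardless of ties (values must already be sorted)."""
    return list(range(1, len(values) + 1))

def rank(values):
    """Tied values share a rank; the next rank skips ahead (sparse series)."""
    ranks = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks

def dense_rank(values):
    """Tied values share a rank; the next rank is always one higher."""
    ranks = []
    for i, v in enumerate(values):
        if i == 0:
            ranks.append(1)
        elif v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks

def top_n(values, ranks, n=5):
    """Keep the values whose rank is at most n, like WHERE rnk <= n."""
    return [v for v, r in zip(values, ranks) if r <= n]

# Hypothetical land areas, sorted descending, with ties.
areas = [150000, 80000, 80000, 25000, 25000, 25000, 20000, 15000]

print(row_number(areas), len(top_n(areas, row_number(areas))))
print(rank(areas), len(top_n(areas, rank(areas))))
print(dense_rank(areas), len(top_n(areas, dense_rank(areas))))
```

With a filter of <= 5, the row-number scheme keeps 5 rows, the rank scheme keeps 6 and the dense-rank scheme keeps all 8, which is why the choice depends on the question being asked.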

Resources