Creating dataset outputs..where to start with this one? - sorting

The variables in the data set are: Event, EventType, FName, LName, Age, Gender, Score.
I am trying to create a report that gives me the lowest 5 scores per Event/EventType for each Age (18-65) and each Gender.
For example, I want the 5 lowest scores for everyone who participated in EventA, EventType B, who were 18-year-old females, then the same for the 19-year-old females, and so on, for each gender. A side note: not all ages have 5 participants. For example, there may be no 20-year-olds who participated, and there may be only two 21-year-olds.
I originally tried to tackle this by making a bunch of separate data sets for each age, but I know there must be a better way to do it. I would appreciate any help. I am pretty new to SAS, but I have introductory experience in all aspects.
Here is some sample input:
Mile Sprint John Smith 19 Male 15.31
Mile Sprint Alex Doe 19 Male 13.21
Mile Sprint Ian Sore 19 Male 23.51
Mile Sprint Sean Lae 19 Male 12.34
Mile Sprint Mike Rai 19 Male 17.27
Mile Sprint Connor Te 19 Male 11.23
Mile Sprint Simon Doe 19 Male 15.21
Mile Long Jane Joy 37 Female 35.12
Mile Long Victoria K 37 Female 27.31
Mile Long Chris Li 25 Male 23.43
For the Mile Sprint 19 Males I would want it to return:
Mile Sprint Connor Te 19 Male 11.23
Mile Sprint Sean Lae 19 Male 12.34
Mile Sprint Alex Doe 19 Male 13.21
Mile Sprint Simon Doe 19 Male 15.21
Mile Sprint John Smith 19 Male 15.31
For the Mile Long 37 Female I would want it to just return this due to there not being 5 participants:
Mile Long Victoria K 37 Female 27.31
Mile Long Jane Joy 37 Female 35.12
With the sample input shown, I am trying to get the 5 lowest scores for Mile Sprint for males aged 19, then the same for ages 20-65, then the same for Mile Long for all males, and likewise for females. This assumes there may be fewer than 5 participants in a race, or more than 5. Is there any way to do all of this in one or two dataset outputs?

/* Creating Sample dataset */
data input_dataset;
    infile datalines dlm=",";
    input Event     : $10.
          EventType : $10.
          FName     : $10.
          LName     : $10.
          Age       : 8.
          Gender    : $10.
          Score     : 8.
    ;
    datalines;
Mile,Sprint,John,Smith,19,Male,15.31
Mile,Sprint,Alex,Doe,19,Male,13.21
Mile,Sprint,Ian,Sore,19,Male,23.51
Mile,Sprint,Sean,Lae,19,Male,12.34
Mile,Sprint,Mike,Rai,19,Male,17.27
Mile,Sprint,Connor,Te,19,Male,11.23
Mile,Sprint,Simon,Doe,19,Male,15.21
Mile,Long,Jane,Joy,37,Female,35.12
Mile,Long,Victoria,K,37,Female,27.31
Mile,Long,Chris,Li,25,Male,23.43
;
run;
/* Sorting based on desired parameters - event EventType age Gender, then score ascending */
proc sort data = input_dataset;
    by event EventType age Gender score;
run;
/* Picking the lowest five scores based on above parameters */
data input_dataset_1(drop=num);
    set input_dataset;
    retain num;
    by event EventType age Gender score;
    /* restart the counter at the first row of each event/EventType/age/Gender group */
    if first.gender then num=1; else num=num+1;
    /* keep only the first five rows of each group */
    if num<=5;
run;
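The sort-then-count logic of the SAS steps above can also be sketched outside SAS to check the expected output. This is only a minimal illustration in Python, not part of the original answer: the function name `lowest_n_per_group` and the tuple layout are assumptions made for the example, and the rows are the sample input from the question.

```python
from collections import defaultdict

# Sample rows from the question:
# (Event, EventType, FName, LName, Age, Gender, Score)
rows = [
    ("Mile", "Sprint", "John", "Smith", 19, "Male", 15.31),
    ("Mile", "Sprint", "Alex", "Doe", 19, "Male", 13.21),
    ("Mile", "Sprint", "Ian", "Sore", 19, "Male", 23.51),
    ("Mile", "Sprint", "Sean", "Lae", 19, "Male", 12.34),
    ("Mile", "Sprint", "Mike", "Rai", 19, "Male", 17.27),
    ("Mile", "Sprint", "Connor", "Te", 19, "Male", 11.23),
    ("Mile", "Sprint", "Simon", "Doe", 19, "Male", 15.21),
    ("Mile", "Long", "Jane", "Joy", 37, "Female", 35.12),
    ("Mile", "Long", "Victoria", "K", 37, "Female", 27.31),
    ("Mile", "Long", "Chris", "Li", 25, "Male", 23.43),
]


def lowest_n_per_group(records, n=5):
    """Hypothetical helper: keep up to n lowest scores per
    (Event, EventType, Age, Gender) group."""
    groups = defaultdict(list)
    for rec in records:
        event, event_type, _fname, _lname, age, gender, _score = rec
        groups[(event, event_type, age, gender)].append(rec)
    # Sort each group by score ascending and keep the first n rows;
    # groups with fewer than n members are returned in full.
    return {
        key: sorted(members, key=lambda r: r[6])[:n]
        for key, members in groups.items()
    }


top5 = lowest_n_per_group(rows)
for key in sorted(top5):
    for rec in top5[key]:
        print(*rec)
```

Groups with fewer than five participants, such as the 37-year-old females in Mile Long, simply come back with however many rows they have, which matches the behaviour asked for in the question.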

Related

How to store multiple values in a variable to be used in a case statement

I am having this issue, and any help would be greatly appreciated.
I have an Oracle DB and am working with the following business case:
An employee can work in different job grades during regular hours or overtime.
I need to calculate an employee's hours with respect to different job grades and wage codes. The hours and the job grades are in different tables, and the table that has job grades doesn't store hours, only time in and time out. After querying the DB I get the following result.
Emp_ID  Wage Code  Job grade  Hours  Date
1       01         (empty)    8      2021/06/07
1       02         P          2      2021/06/07
1       08         (empty)    8      2021/06/08
1       01         (empty)    6      2021/06/09
1       01         E          8      2021/06/09
1       01         (empty)    8      2021/06/10
1       01         (empty)    8      2021/06/11
1       02         (empty)    9      2021/06/11
Now I get wrong hours when the employee works in different job grade(s).
To overcome this, I need to identify the dates on which the employee worked in a different job grade so I can use them in a CASE statement.
I used this logic:
Pick the dates on which the employee worked in a different job grade, and on those dates calculate the hours from table A.
Otherwise, calculate the hours from table B.
The problem is that I can't simply use variables, because there could be multiple dates.
How can I achieve this? Can I use any other logic?
Thanks,
Here are my tables
TABLE A
Emp_ID  Wage_code  time_in  time_out  Job_grade  Date
01      (empty)    8:00     16:00     (empty)    2021-06-7
01      (empty)    16:00    18:00     P          2021-06-7
01      (empty)    8:00     16:00     (empty)    2021-06-08
01      (empty)    8:00     14:00     (empty)    2021-06-09
01      (empty)    14:00    16:00     E          2021-06-09
01      (empty)    8:00     16:00     (empty)    2021-06-10
01      (empty)    8:00     16:00     (empty)    2021-06-11
01      (empty)    16:00    17:00     (empty)    2021-06-11
This table doesn't store wage_codes. An empty job_grade means the employee worked in the same (regular) job grade.
TABLE B
Emp_ID  Wage_code  Hours  Date
01      1          8      2021-06-7
01      2          2      2021-06-7
01      8          8      2021-06-08
01      1          8      2021-06-09
01      1          8      2021-06-10
01      1          8      2021-06-11
01      2          2      2021-06-11
This table stores wage_codes but no job grade changes, just the regular grade, and the hours for each wage_code (1=regular, 2=overtime, 8=vacation, etc.).
My query:
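select
    A.emp_id,
    A.job_grade,
    B.Wage_code,
    B.Date,
    case
        -- Oracle treats '' as NULL, so "= ''" never matches; test for NULL instead
        when A.job_grade is null then B.Hours
        else
            -- elapsed time is time_out minus time_in (assuming DATE columns, the difference is in days)
            to_char((A.time_out - A.time_in) * 24, 'fm99.90')
    end "Hours"
from A
left join B on A.emp_id = B.emp_id and A.Date = B.Date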
With this query I get wrong hours when the employee has worked in a different job grade, because the condition in the CASE statement only checks whether the job grade is empty and, if so, takes the hours from table B. For example, on 06/07 the employee worked in the normal grade as well as in a different job grade.
How can I identify the dates on which the employee worked in a different job grade, so I can combine that with the job_grade condition in the CASE statement and calculate the hours accurately?
Many thanks for your support!
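The logic described in the question (flag every date on which the employee has at least one row with a non-empty job grade, then take the hours for those dates from table A) can be sketched as follows. This is only an illustration in Python using the table A rows from the question; the helper names `hours_between` and `dates_with_grade_change` are made up for the example. In Oracle, a comparable per-date flag could probably be computed with an analytic COUNT(job_grade) OVER (PARTITION BY emp_id, date) or an EXISTS subquery, but that is a suggestion, not something taken from the question.

```python
# Table A rows from the question: (emp_id, time_in, time_out, job_grade, date).
# An empty job_grade means the employee worked in the regular grade.
table_a = [
    ("01", "8:00", "16:00", "", "2021-06-7"),
    ("01", "16:00", "18:00", "P", "2021-06-7"),
    ("01", "8:00", "16:00", "", "2021-06-08"),
    ("01", "8:00", "14:00", "", "2021-06-09"),
    ("01", "14:00", "16:00", "E", "2021-06-09"),
    ("01", "8:00", "16:00", "", "2021-06-10"),
    ("01", "8:00", "16:00", "", "2021-06-11"),
    ("01", "16:00", "17:00", "", "2021-06-11"),
]


def hours_between(time_in, time_out):
    """Hypothetical helper: hours between two 'H:MM' strings on the same day."""
    h_in, m_in = map(int, time_in.split(":"))
    h_out, m_out = map(int, time_out.split(":"))
    return ((h_out * 60 + m_out) - (h_in * 60 + m_in)) / 60


def dates_with_grade_change(rows):
    """Return the set of (emp_id, date) pairs that have at least one
    row with a non-empty job grade."""
    return {
        (emp_id, date)
        for emp_id, _time_in, _time_out, job_grade, date in rows
        if job_grade
    }


flagged = dates_with_grade_change(table_a)

# On flagged dates every row of that date is computed from table A,
# including the rows worked in the regular grade.
hours_from_a = [
    (emp_id, date, job_grade, hours_between(time_in, time_out))
    for emp_id, time_in, time_out, job_grade, date in table_a
    if (emp_id, date) in flagged
]

for row in hours_from_a:
    print(*row)
```

Because the flag is a set of (emp_id, date) pairs rather than a single variable, it handles any number of dates; all remaining dates would keep using the hours from table B.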

How to categorize data in a ORACLE table based on a primary key?

I have a table TRAVEL with column TRIP_ID as primary key.
SOURCE DATA
TRIP_ID PERSON_NAME DESTINATION SOURCE TRANSACTION_ID TRIP_COUNT
100 Mike London Zurich 1000B112 1
101 Mike Paris Capetown 1000B112 1
102 Mike Moscow Madrid 1000B112 1
103 John Delhi Moscow 1100A110 1
104 John Toronto Zurich 1100A110 1
105 Mary Chennai Madrid 1100A111 1
106 Mary Berlin Zurich 1100A111 1
EXPECTED RESULTS:
When I run select * from TRAVEL where TRANSACTION_ID = '1100A111', it returns the two rows shown below.
I want my data to be categorized by TRANSACTION_ID at run time. I don't want to hardcode the TRANSACTION_ID value each time as above; instead I want to group the data so that it fetches the expected results. That is, it should return all the rows corresponding to each TRANSACTION_ID in the table, and it should not sum up TRIP_COUNT. It should return the rows as below. I am OK with creating a view. Please suggest.
TRIP_ID PERSON_NAME DESTINATION SOURCE TRANSACTION_ID TRIP_COUNT
105 Mary Chennai Madrid 1100A111 1
106 Mary Berlin Zurich 1100A111 1
Can someone suggest a query in Oracle to handle this? I do not want to hardcode the TRANSACTION_ID.
Regards
Sameer
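To make the request concrete, here is a small sketch of what grouping rows by TRANSACTION_ID without aggregating them looks like. It is written in Python purely for illustration, with the rows copied from the TRAVEL sample; the function name `group_by_transaction` is an assumption. In Oracle, ordering the result by TRANSACTION_ID or filtering with a bind variable may be the closest equivalents, though the question does not confirm which is wanted.

```python
from collections import defaultdict

# Rows copied from the TRAVEL sample:
# (TRIP_ID, PERSON_NAME, DESTINATION, SOURCE, TRANSACTION_ID, TRIP_COUNT)
travel = [
    (100, "Mike", "London", "Zurich", "1000B112", 1),
    (101, "Mike", "Paris", "Capetown", "1000B112", 1),
    (102, "Mike", "Moscow", "Madrid", "1000B112", 1),
    (103, "John", "Delhi", "Moscow", "1100A110", 1),
    (104, "John", "Toronto", "Zurich", "1100A110", 1),
    (105, "Mary", "Chennai", "Madrid", "1100A111", 1),
    (106, "Mary", "Berlin", "Zurich", "1100A111", 1),
]


def group_by_transaction(rows):
    """Hypothetical helper: collect full rows under their TRANSACTION_ID.
    Nothing is summed; every original row is kept as-is."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[4]].append(row)
    return dict(groups)


groups = group_by_transaction(travel)
for transaction_id, rows in groups.items():
    print(transaction_id)
    for row in rows:
        print("   ", *row)
```

No TRANSACTION_ID value is hardcoded here: the groups are discovered from the data, and TRIP_COUNT stays 1 on every row.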

Using Powershell to get third table from a website

I am trying to scrape the Team Statistics table from https://www.hockey-reference.com/leagues/NHL_2020.html , but am not getting the full expected output.
My code:
#URL of the page to scrape (taken from the question text)
$url = "https://www.hockey-reference.com/leagues/NHL_2020.html"
#get the data (ParsedHtml is only available in Windows PowerShell, not PowerShell 6+)
$data = Invoke-WebRequest $url
#get the third table (skip the first two)
$table = $data.ParsedHtml.getElementsByTagName("table") | Select -skip 2 | Select -First 1
#get the rows
$rows = $table.rows
#get table headers
$headers = $rows.item(1).children | select -ExpandProperty InnerText
#count number of rows
$NumOfRows = $rows | Measure-Object
#Manually injecting TeamName
$headers = @($headers[0];'TeamName';$headers[1..($headers.Length-1)])
#enumerate the remaining rows (skipping the header row) and create a custom object
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
    #define an empty hashtable
    $objHash=[ordered]@{}
    #get the child rows
    $rowdata = $rows.item($i).children | select -ExpandProperty InnerText
    for ($j=0;$j -lt $headers.count;$j++) {
        #add each row of data to the hash table using the corresponding
        #table header value
        $objHash.Add($headers[$j],$rowdata[$j])
    } #for
    #turn the hashtable into a custom object
    [pscustomobject]$objHash
} #for
The output:
Special Teams Shot Data
------------- ---------
Rk AvAge
1 Washington Capitals
2 St. Louis Blues
3 Boston Bruins
4 Pittsburgh Penguins
5 Tampa Bay Lightning
6 New York Islanders
7 Colorado Avalanche
8 Columbus Blue Ja...
9 Carolina Hurricanes
10 Vancouver Canucks
11 Philadelphia Flyers
12 Dallas Stars
13 Edmonton Oilers
14 Toronto Maple Leafs
15 Florida Panthers
16 Vegas Golden Kni...
17 Arizona Coyotes
18 Calgary Flames
19 Winnipeg Jets
20 Chicago Blackhawks
21 Buffalo Sabres
22 Montreal Canadiens
23 Nashville Predators
24 New York Rangers
25 Minnesota Wild
26 San Jose Sharks
27 Anaheim Ducks
28 Ottawa Senators
29 New Jersey Devils
30 Los Angeles Kings
31 Detroit Red Wings
League Ave... 27.8
I believe my issue lies in what I am selecting, though I haven't been able to figure out how to select all the necessary parts for that specific table.
Ideally I'd like to get it looking near identical to what's on the website, but getting all the stats outputted is just as good.
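The output above has "Special Teams" and "Shot Data" as column names, which suggests the headers may have been read from a grouping row that has fewer cells than the data rows. The sketch below illustrates that idea on a tiny made-up HTML string: it picks the third table and uses the row holding the real column names as the header. It is a Python illustration using only the standard library, not a fix verified against the real page; the `TableCollector` class, the sample markup, and the row indexes are all assumptions.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts.
    Nested tables are not handled in this sketch."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Made-up markup: two unrelated tables, then a table whose first row is a
# grouping header and whose second row holds the real column names.
html = """
<table><tr><td>first</td></tr></table>
<table><tr><td>second</td></tr></table>
<table>
  <tr><th></th><th colspan="2">Group</th></tr>
  <tr><th>Rk</th><th>Name</th><th>Value</th></tr>
  <tr><td>1</td><td>a</td><td>x</td></tr>
  <tr><td>2</td><td>b</td><td>y</td></tr>
</table>
"""

parser = TableCollector()
parser.feed(html)

third = parser.tables[2]   # same idea as "Select -skip 2 | Select -First 1"
headers = third[1]         # row with one name per data column
records = [dict(zip(headers, row)) for row in third[2:]]
print(records)
```

The point of the sketch is the header choice: a header row must have as many cells as the data rows, otherwise only the first few columns receive a name and the rest are dropped.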

How to convert the following source into the below target [duplicate]

This question already has answers here:
"pivot" table Oracle - how to change row items into columns
(2 answers)
Closed 5 years ago.
Student Name  Subject Name      Marks
Sam           Maths             100
Tom           Maths             80
Sam           Physical Science  80
John          Maths             75
Sam           Life Science      70
John          Life Science      100
John          Physical Science  85
Tom           Life Science      100
Tom           Physical Science  85
We want to load our Target Table as:
Student Name  Maths  Life Science  Physical Science
Sam           100    70            80
John          75     100           85
Tom           80     100           85
Try This.
SELECT student_name,
       "Maths",
       "Life Science",
       "Physical Science"
FROM
    (SELECT s.*
     FROM Student s)
PIVOT (MAX(Marks)
       FOR subject_name IN
       ('Maths' AS "Maths", 'Life Science' AS "Life Science", 'Physical Science' AS "Physical Science"))
ORDER BY 3;

How to get multiple rows into single row data in oracle sql?

Suppose in our Source Table we have data as given below:
Student Name  Subject Name      Marks
Sam           Maths             100
Tom           Maths             80
Sam           Physical Science  80
John          Maths             75
Sam           Life Science      70
John          Life Science      100
John          Physical Science  85
Tom           Life Science      100
Tom           Physical Science  85
We want to load our Target Table as:
Student Name  Maths  Life Science  Physical Science
Sam           100    70            80
John          75     100           85
Tom           80     100           85
Use the PIVOT operator:
SELECT *
FROM source
PIVOT (
    MAX( marks ) FOR subject_name IN (
        'Maths' AS Maths,
        'Life Science' AS Life_Science,
        'Physical Science' AS Physical_Science
    )
);
Output:
STUDENT_NAME MATHS LIFE_SCIENCE PHYSICAL_SCIENCE
------------ ----- ------------ ----------------
Sam            100           70               80
John            75          100               85
Tom             80          100               85
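For readers who want to see what the PIVOT above does step by step, here is a rough equivalent in Python. It is only a sketch of the same rows-to-columns idea, using the sample data from the question; the function name `pivot_marks` is made up, and the MAX aggregate is mirrored with `max` in case a student has more than one row per subject.

```python
# Source rows from the question: (student_name, subject_name, marks)
source = [
    ("Sam", "Maths", 100),
    ("Tom", "Maths", 80),
    ("Sam", "Physical Science", 80),
    ("John", "Maths", 75),
    ("Sam", "Life Science", 70),
    ("John", "Life Science", 100),
    ("John", "Physical Science", 85),
    ("Tom", "Life Science", 100),
    ("Tom", "Physical Science", 85),
]

subjects = ["Maths", "Life Science", "Physical Science"]


def pivot_marks(rows, columns):
    """Hypothetical helper: one output row per student, one column per subject.
    Missing subjects stay None; duplicates keep the highest mark (like MAX)."""
    pivoted = {}
    for student, subject, marks in rows:
        if subject not in columns:
            continue  # PIVOT ignores values not listed in the IN clause
        row = pivoted.setdefault(student, {c: None for c in columns})
        current = row[subject]
        row[subject] = marks if current is None else max(current, marks)
    return pivoted


pivoted = pivot_marks(source, subjects)
for student, row in pivoted.items():
    print(student, *(row[c] for c in subjects))
```

As with the SQL version, the list of subjects has to be known up front; the column set is fixed before the data is read.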
