I am trying to scrape the Team Statistics table from https://www.hockey-reference.com/leagues/NHL_2020.html, but I am not getting the full expected output.
My code:
#get the data
$data = Invoke-WebRequest $url
#get the first table
$table = $data.ParsedHtml.getElementsByTagName("table") | Select -skip 2 | Select -First 1
#get the rows
$rows = $table.rows
#get table headers
$headers = $rows.item(1).children | select -ExpandProperty InnerText
#count number of rows
$NumOfRows = $rows | Measure-Object
#Manually injecting TeamName
$headers = @($headers[0];'TeamName';$headers[1..($headers.Length-1)])
#enumerate the remaining rows (skipping the header row) and create a custom object
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
#define an empty hashtable
$objHash=[ordered]@{}
#get the child rows
$rowdata = $rows.item($i).children | select -ExpandProperty InnerText
for ($j=0;$j -lt $headers.count;$j++) {
#add each row of data to the hash table using the corresponding
#table header value
$objHash.Add($headers[$j],$rowdata[$j])
} #for
#turn the hashtable into a custom object
[pscustomobject]$objHash
} #for
The output:
Special Teams Shot Data
------------- ---------
Rk AvAge
1 Washington Capitals
2 St. Louis Blues
3 Boston Bruins
4 Pittsburgh Penguins
5 Tampa Bay Lightning
6 New York Islanders
7 Colorado Avalanche
8 Columbus Blue Ja...
9 Carolina Hurricanes
10 Vancouver Canucks
11 Philadelphia Flyers
12 Dallas Stars
13 Edmonton Oilers
14 Toronto Maple Leafs
15 Florida Panthers
16 Vegas Golden Kni...
17 Arizona Coyotes
18 Calgary Flames
19 Winnipeg Jets
20 Chicago Blackhawks
21 Buffalo Sabres
22 Montreal Canadiens
23 Nashville Predators
24 New York Rangers
25 Minnesota Wild
26 San Jose Sharks
27 Anaheim Ducks
28 Ottawa Senators
29 New Jersey Devils
30 Los Angeles Kings
31 Detroit Red Wings
League Ave... 27.8
I believe my issue lies in what I am selecting, though I haven't been able to figure out how to select all the necessary parts for that specific table.
Ideally I'd like to get it looking near identical to what's on the website, but getting all the stats outputted is just as good.
I have a matrix visual in Power BI. The columns are departments and the rows years. The values are counts of people in each department each year. The departments obviously don't have a natural ordering, BUT I would like to reorder them using the total column count for each department in descending order.
For example, if Department C has 100 people total over the years (rows), and all the other departments have fewer, I want Department C to come first.
I have seen other solutions that add an index column, but this doesn't work very well for me because the "count of people" variable is what I want to index by and that doesn't already exist in my data. Rather it's a calculation based on individual people which each have a department and year.
If anyone can point me to an easy way of changing the column ordering/sorting that would be splendid!
      | DeptA | DeptB | DeptC
------|-------|-------|-------
1900  |   2   |   5   |  10
2000  |   6   |   7   |   2
2010  |  10   |   1   |  12
2020  |   0   |   3   |  30
------|-------|-------|-------
Total |  18   |  16   |  54
Order:   #2      #3      #1
I don't think there is a built-in way to do this like there is for sorting the rows (there should be though, so go vote for a similar idea here), but here's a possible workaround.
I will assume your source table is called Employees and looks something like this:
Department Year Value
A 1900 2
B 1900 5
C 1900 10
A 2000 6
B 2000 7
C 2000 2
A 2010 10
B 2010 1
C 2010 12
A 2020 0
B 2020 3
C 2020 30
First, create a new calculated table like this:
Depts = SUMMARIZE(Employees, Employees[Department], "Total", SUM(Employees[Value]))
This should give you a short table as follows:
Department Total
A 18
B 16
C 54
From this, you can easily rank the totals with a calculated column on this Depts table:
Rank = RANKX('Depts', 'Depts'[Total])
Make sure your new Depts table is related to the original Employees table on the Department column.
Under the Data tab, use Modeling > Sort by Column to sort Depts[Department] by Depts[Rank].
Finally, replace Employees[Department] with Depts[Department] on your matrix visual. With the sample data, the columns should then appear in the order C, A, B (descending by total).
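The summarize-then-rank logic can be checked outside Power BI. The following is only an illustrative Python sketch over the sample Employees rows; the names `employees`, `totals`, `ranked`, and `rank` are made up for the sketch and are not part of the DAX solution, and ties are not handled the way RANKX handles them.

```python
from collections import defaultdict

# Sample rows from the Employees table: (Department, Year, Value)
employees = [
    ("A", 1900, 2), ("B", 1900, 5), ("C", 1900, 10),
    ("A", 2000, 6), ("B", 2000, 7), ("C", 2000, 2),
    ("A", 2010, 10), ("B", 2010, 1), ("C", 2010, 12),
    ("A", 2020, 0), ("B", 2020, 3), ("C", 2020, 30),
]

# Rough equivalent of the SUMMARIZE step: total Value per Department
totals = defaultdict(int)
for dept, _year, value in employees:
    totals[dept] += value

# Rough equivalent of the RANKX step: rank 1 goes to the largest total
# (assumption: no ties, so a plain sort is enough for this sketch)
ranked = sorted(totals, key=totals.get, reverse=True)
rank = {dept: position for position, dept in enumerate(ranked, start=1)}

print(dict(totals))  # {'A': 18, 'B': 16, 'C': 54}
print(ranked)        # ['C', 'A', 'B']
```

Sorting the department column by this rank is what puts Department C first in the matrix.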
Let's say I have a table like this:
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
2 DARK Kindergarten 111 1
3 Knight NY University 3
4 Knight LA Senior HS 2
5 JOHN HARVARD 3
How can I display the data above like this:
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
3 Knight NY University 3
5 JOHN HARVARD 3
My goal is to display, for each name, the row with the maximum CODESCHOOL, but when I tried the query below:
SELECT NAME, SCHOOLNAME, MAX(CODESCHOOL) FROM TABLE GROUP BY NAME, SCHOOLNAME
the result is just this:
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
2 DARK Kindergarten 111 1
3 Knight NY University 3
4 Knight LA Senior HS 2
5 JOHN HARVARD 3
Maybe this is caused by grouping on SCHOOLNAME: when I leave SCHOOLNAME out of the select, the data is displayed as I expect, but I need the SCHOOLNAME field for a search condition in my query.
I hope you can help me with this problem.
Any help will be appreciated.
Thanks.
Using some slightly wacky joins you can get a working "max rows per category" query.
What you essentially need to do is join the table to itself and keep only the rows that have the top CODESCHOOL value for their NAME.
I've also added a :schoolname parameter because you wanted to search by school name.
Example:
SELECT
A.*
FROM
TABLE1 A
LEFT OUTER JOIN TABLE1 B ON B.NAME = A.NAME
AND B.CODESCHOOL > A.CODESCHOOL
WHERE
B.CODESCHOOL IS NULL AND
(
(A.SCHOOLNAME = :SCHOOLNAME AND :SCHOOLNAME IS NOT NULL) OR
(:SCHOOLNAME IS NULL)
);
This should produce the output below (with :SCHOOLNAME set to NULL). Note that DARK has two rows because both of its rows share the same CODESCHOOL, which is the maximum within the DARK "category"/name.
ID|NAME |SCHOOLNAME |CODESCHOOL
--|------|----------------|----------
3|Knight|NY University | 3
5|JOHN |HARVARD | 3
2|DARK |Kindergarten 111| 1
1|DARK |Kindergarten 123| 1
It's not the most efficient query, but it should be more than good enough as a starting point.
Sidenote: I've been blatantly stealing this logic for a while from https://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/
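To see the self-join idea in action, here is a small sketch using Python's built-in sqlite3 module rather than the asker's database. The table name `table1` and the lowercase column names are assumptions for the demo, and the :schoolname filter is left out to keep it short. A row survives only when no other row with the same name has a larger codeschool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, name TEXT, schoolname TEXT, codeschool INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, "DARK", "Kindergarten 123", 1),
    (2, "DARK", "Kindergarten 111", 1),
    (3, "Knight", "NY University", 3),
    (4, "Knight", "LA Senior HS", 2),
    (5, "JOHN", "HARVARD", 3),
])

# Anti-join: b matches only rows of the same name with a LARGER codeschool,
# so "b.codeschool IS NULL" means a is already the maximum for its name.
query = """
SELECT a.id, a.name, a.schoolname, a.codeschool
FROM table1 a
LEFT OUTER JOIN table1 b
  ON b.name = a.name AND b.codeschool > a.codeschool
WHERE b.codeschool IS NULL
ORDER BY a.id
"""
max_rows = conn.execute(query).fetchall()
for row in max_rows:
    print(row)
```

Both DARK rows come back because they tie on the maximum; the Knight row with codeschool 2 is dropped.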
I am using the analytic window function ROW_NUMBER().
This partitions the rows by NAME, orders each partition by CODESCHOOL descending, and keeps only the top row.
Select NAME,
SCHOOLNAME,
CODESCHOOL
From (
Select NAME,
SCHOOLNAME,
CODESCHOOL,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY CODESCHOOL DESC) as rn
from myTable)
Where rn = 1;
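The ROW_NUMBER() approach can be tried locally with Python's sqlite3 module (window functions need SQLite 3.25 or newer). This is a sketch, not the original Oracle setup: the `id` column and the extra `id` tiebreaker in the ORDER BY are assumptions added so that the tie between the two DARK rows resolves deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, name TEXT, schoolname TEXT, codeschool INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, "DARK", "Kindergarten 123", 1),
    (2, "DARK", "Kindergarten 111", 1),
    (3, "Knight", "NY University", 3),
    (4, "Knight", "LA Senior HS", 2),
    (5, "JOHN", "HARVARD", 3),
])

# Number the rows within each name, highest codeschool first;
# "id" is only a tiebreaker (an assumption, not part of the original answer).
query = """
SELECT id, name, schoolname, codeschool
FROM (
    SELECT id, name, schoolname, codeschool,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY codeschool DESC, id) AS rn
    FROM mytable
)
WHERE rn = 1
ORDER BY id
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

Unlike the self-join, this returns exactly one row per name, so ties are broken rather than all returned.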
I have an Oracle table with values for two accounts, and each record has a Date field. The first day of the week holds only the data for Day 1, but the data for Day 2 is cumulative. So we need to subtract the previous day's data from the Day 2 data to get the exact figures for Day 2. The same approach applies to Day 3 through Day 7.
Please suggest the best approach in a SQL query to handle this requirement. I am totally new to SQL and would really appreciate your input. As an example, the table has the 6 columns shown below:
Center Entity Bonus Year Period Incentive
MANUFACTURING NEW YORK 1200 FY18 31-12-2017 120
MANUFACTURING NEW YORK 1500 FY18 01-01-2018 250
MANUFACTURING NEW YORK 1800 FY18 01-01-2018 320
So assuming Dec 31, 2017 is the first day of the week, its record shows only the data for Day 1. When we move on to Day 2 of the week, i.e. Jan 01, 2018, the record holds accumulated data that includes Day 1 and Day 2. So we need to subtract the Day 1 data from the Day 2 data to get the exact data for Day 2: 1500 - 1200 = 300 is the exact value for Day 2. We need to follow the same approach for Day 3 through Day 7.
Expected output is given below
Center Entity Bonus Year Period Incentive
MANUFACTURING NEW YORK 1200 FY18 01-01-2018 120
MANUFACTURING NEW YORK 300 FY18 01-01-2018 130
MANUFACTURING NEW YORK 300 FY18 01-01-2018 70
You could use a simple LAG() function with NVL().
select Center, Entity,
Bonus - NVL( LAG(Bonus, 1) OVER ( ORDER BY Period ), 0) Bonus,
Year, Period,
Incentive - NVL( LAG(Incentive, 1) OVER ( ORDER BY Period ), 0) Incentive
FROM yourtable;
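A LAG()-based difference can be tried locally with Python's sqlite3 module (SQLite 3.25 or newer). This sketch makes a few assumptions: it uses COALESCE where Oracle would use NVL, stores Period as ISO text, assumes the three sample rows fall on consecutive days (2017-12-31 to 2018-01-02), and applies the difference to both Bonus and Incentive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(center TEXT, entity TEXT, bonus INTEGER, year TEXT, period TEXT, incentive INTEGER)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", [
    ("MANUFACTURING", "NEW YORK", 1200, "FY18", "2017-12-31", 120),
    ("MANUFACTURING", "NEW YORK", 1500, "FY18", "2018-01-01", 250),
    ("MANUFACTURING", "NEW YORK", 1800, "FY18", "2018-01-02", 320),
])

# LAG(x, 1) returns the previous row's x in Period order (NULL on the first
# row), so COALESCE(..., 0) leaves the Day 1 values unchanged.
query = """
SELECT period,
       bonus - COALESCE(LAG(bonus, 1) OVER (ORDER BY period), 0) AS daily_bonus,
       incentive - COALESCE(LAG(incentive, 1) OVER (ORDER BY period), 0) AS daily_incentive
FROM yourtable
ORDER BY period
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

With more than one Center/Entity in the table, a PARTITION BY on those columns inside the OVER clause would probably be needed so that each account is differenced separately.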
You can do a self join on your table on Period and subtract the previous date's data.
It looks like you have a typo in the test data and the result: the dates should be incremental, as stated in the description, but they contain duplicates.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE t
(Center varchar2(13), Entity varchar2(8), Bonus int, Year varchar2(4), Period DATE, Incentive int)
;
INSERT INTO t (Center, Entity, Bonus, Year, Period, Incentive)
VALUES ('MANUFACTURING', 'NEW YORK', 1200, 'FY18', DATE '2017-12-31', 120);
INSERT INTO t (Center, Entity, Bonus, Year, Period, Incentive)
VALUES ('MANUFACTURING', 'NEW YORK', 1500, 'FY18', DATE '2018-01-01', 250);
INSERT INTO t (Center, Entity, Bonus, Year, Period, Incentive)
VALUES ('MANUFACTURING', 'NEW YORK', 1800, 'FY18', DATE '2018-01-02', 320);
Query:
select t1.center,
t1.entity,
t1.bonus - nvl (t2.bonus,0) bonus,
t1.year,
t1.period,
t1.incentive - nvl(t2.incentive,0) incentive
from t t1
left outer join t t2
on t1.period = t2.period + 1
order by t1.period
Results:
| CENTER | ENTITY | BONUS | YEAR | PERIOD | INCENTIVE |
|---------------|----------|-------|------|----------------------|-----------|
| MANUFACTURING | NEW YORK | 1200 | FY18 | 2017-12-31T00:00:00Z | 120 |
| MANUFACTURING | NEW YORK | 300 | FY18 | 2018-01-01T00:00:00Z | 130 |
| MANUFACTURING | NEW YORK | 300 | FY18 | 2018-01-02T00:00:00Z | 70 |
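For comparison, the self-join variant can also be sketched with Python's sqlite3 module. The assumptions here: Period is stored as ISO text, SQLite's date(x, '+1 day') stands in for Oracle's period + 1, and COALESCE stands in for NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t "
    "(center TEXT, entity TEXT, bonus INTEGER, year TEXT, period TEXT, incentive INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", [
    ("MANUFACTURING", "NEW YORK", 1200, "FY18", "2017-12-31", 120),
    ("MANUFACTURING", "NEW YORK", 1500, "FY18", "2018-01-01", 250),
    ("MANUFACTURING", "NEW YORK", 1800, "FY18", "2018-01-02", 320),
])

# t2 is the row for the day before t1; the first day has no match,
# so its t2 columns are NULL and COALESCE turns them into 0.
query = """
SELECT t1.period,
       t1.bonus - COALESCE(t2.bonus, 0) AS daily_bonus,
       t1.incentive - COALESCE(t2.incentive, 0) AS daily_incentive
FROM t t1
LEFT OUTER JOIN t t2
  ON t1.period = date(t2.period, '+1 day')
ORDER BY t1.period
"""
joined = conn.execute(query).fetchall()
for row in joined:
    print(row)
```

Note that this relies on every day being present: a gap in the dates would leave the next day's cumulative value unadjusted, whereas LAG() would difference against the last available row.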
I have 3 tables: COMPANY, TRAINING TICKET and TEST.
COMPANY table:
COMPANY CODE | COMPANY NAME
192 ABC ENTERPRISE
299 XYZ ENTERPRISE
TRAINING TICKET table:
TICKET ID | COMPANY CODE | START DATE
2900 192 2015-02-02
3939 192 2015-03-03
4399 299 2015-03-02
TEST SESSION table:
TEST CODE | TICKET ID | COMPANY CODE | CERTIFIED
1221 2900 192 YES
2821 3939 192 NULL
3922 4399 299 YES
I need something like this:
C. CODE | COMPANY NAME | 1ST START DATE | TRAINING TICKET TOTAL | CERTIFIED TOTAL
192 ABC ENTERPRISE 2015-02-02 2 1
299 XYZ ENTERPRISE 2015-03-02 1 1
Is this possible?
My SQL statement is:
Select *, count(TICKET.CCODE) AS TICKET_TOTAL, count(TEST.CODE) AS CERT_TOTAL
from TICKET
Inner Join COMPANY on TICKET.CCODE = COMPANY.CCODE
Inner Join TEST on COMPANY.CCODE = TEST.CCODE
Group by (TICKET.CCODE),(TEST.CCODE)
Order by TICKET_TOTAL DESC
but both counts always come out equal (same result for TICKET_TOTAL and CERT_TOTAL) and the totals are wrong: for the top company the result should be TICKET_TOTAL = 21 and CERT_TOTAL = 28, but I got 523.
I found the answer:
Select COMPANY.CODE, COMPANY.NAME,
MIN(TICKET.STARTDATE), count(TICKET.TICKETID) AS TICKET_TOTAL,
count(TEST.CERTIFIED) AS CERT_TOTAL
from COMPANY
INNER JOIN TICKET ON COMPANY.CODE = TICKET.CCODE
LEFT JOIN TEST ON TICKET.TICKETID = TEST.TICKET
Group by (TICKET.CCODE)
ORDER BY TICKET_TOTAL DESC
1- Reorder and start the statement from the COMPANY table
2- Use MIN(TICKET.STARTDATE) to get the first start date (use MAX to get the last start date if necessary)
3- Change the Inner Join to a Left Join (because some companies have a ticket in the ticket table but no test in the test table)
Hope this can help someone in the future!
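The three changes can be verified against the sample rows with Python's sqlite3 module. This is a sketch under assumptions: the column names (`code`, `ccode`, `ticketid`, `ticket`, `certified`) are guessed from the query in the answer, and the GROUP BY lists both selected company columns so that it also runs on engines stricter than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (code INTEGER, name TEXT);
CREATE TABLE ticket (ticketid INTEGER, ccode INTEGER, startdate TEXT);
CREATE TABLE test (code INTEGER, ticket INTEGER, ccode INTEGER, certified TEXT);
INSERT INTO company VALUES (192, 'ABC ENTERPRISE'), (299, 'XYZ ENTERPRISE');
INSERT INTO ticket VALUES (2900, 192, '2015-02-02'), (3939, 192, '2015-03-03'),
                          (4399, 299, '2015-03-02');
INSERT INTO test VALUES (1221, 2900, 192, 'YES'), (2821, 3939, 192, NULL),
                        (3922, 4399, 299, 'YES');
""")

# Joining TEST through the ticket (not through the company) avoids the row
# multiplication that inflated both counts; COUNT(column) skips NULLs, so
# only certified tests are counted.
query = """
SELECT c.code, c.name,
       MIN(t.startdate) AS first_start_date,
       COUNT(t.ticketid) AS ticket_total,
       COUNT(s.certified) AS cert_total
FROM company c
INNER JOIN ticket t ON c.code = t.ccode
LEFT JOIN test s ON t.ticketid = s.ticket
GROUP BY c.code, c.name
ORDER BY ticket_total DESC
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

If a ticket could have more than one test session, COUNT(DISTINCT t.ticketid) would likely be the safer way to count tickets.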
I have a table with the following columns:
Table Name: IdentityTable
ID Dest_Name Dest_Reference_Id Format IG
31231231 India Delhi XKHYGUI 21
12313131 USA Washington XHKOWKG 1
34645542 India Mumbai XKLWOFH 1
31231314 USA California XLGJDJG 21
31234531 India Delhi XKHIHUI 21
12375671 USA Washington XHKLHKG 21
12574613 USA Washington XLKWMKG 1
and so on...
I want to query this table to retrieve information in this form:
Dest_Name Dest_Reference_Id Total_Format Format IG
India Delhi 2 XKHYGUI 21
India Delhi 2 XKHIHUI 21
USA Washington 3 XHKOWKG 1
USA Washington 3 XHKLHKG 21
USA Washington 3 XLKWMKG 1
India Mumbai 1 XKLWOFH 1
USA California 1 XLGJDJG 21
I did:
select dest_name, dest_reference_id, count (format)
from IdentityTable
Group By dest_name, dest_reference_id;
I could retrieve all the required information except the Format column. How should I modify my query to return the expected result?
Make two sub-queries and join them up.
SELECT counts.dest_name,
counts.dest_reference_id,
counts.total_format,
idents.format,
idents.IG
FROM
(
select dest_name, dest_reference_id, count (format) as total_format
from IdentityTable
Group By dest_name, dest_reference_id
) counts
join
(
select distinct dest_name, dest_reference_id, format, IG
from identitytable
) idents
on counts.dest_name = idents.dest_name
and counts.dest_reference_id = idents.dest_reference_id
Alternatively, add an analytic count to the select list:
count(*) over (partition by dest_name, dest_reference_id)
and drop the GROUP BY. Search for "Oracle analytic functions" for details.
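The analytic-count idea can be sketched with Python's sqlite3 module (SQLite 3.25 or newer supports the same COUNT(*) OVER syntax). The table and column names follow the question; the `totals_by_format` dictionary is just a helper for inspecting the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE identitytable "
    "(id INTEGER, dest_name TEXT, dest_reference_id TEXT, format TEXT, ig INTEGER)"
)
conn.executemany("INSERT INTO identitytable VALUES (?, ?, ?, ?, ?)", [
    (31231231, "India", "Delhi", "XKHYGUI", 21),
    (12313131, "USA", "Washington", "XHKOWKG", 1),
    (34645542, "India", "Mumbai", "XKLWOFH", 1),
    (31231314, "USA", "California", "XLGJDJG", 21),
    (31234531, "India", "Delhi", "XKHIHUI", 21),
    (12375671, "USA", "Washington", "XHKLHKG", 21),
    (12574613, "USA", "Washington", "XLKWMKG", 1),
])

# The window count is computed per (dest_name, dest_reference_id) group but
# attached to every row, so Format and IG stay available without a GROUP BY.
query = """
SELECT dest_name, dest_reference_id,
       COUNT(*) OVER (PARTITION BY dest_name, dest_reference_id) AS total_format,
       format, ig
FROM identitytable
ORDER BY total_format DESC, dest_name, dest_reference_id
"""
rows = conn.execute(query).fetchall()
totals_by_format = {fmt: total for _name, _ref, total, fmt, _ig in rows}
for row in rows:
    print(row)
```

Every Washington row carries a total of 3 and every Delhi row a total of 2, matching the Total_Format column in the expected result.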