How to change multiple headers in a table using Powershell - windows

I am trying to change multiple header names within my code that is pulling the Team Statistics table from this site
I am unsure where to manually change them in my code.
For example, I tried manually changing header 8, GF to GFPG in the line where I add the 'TEAM' header, but I get the error:
Exception calling "Add" with "2" argument(s): "Item has already been added. Key in dictionary: 'GF' Key being added: 'GF'"
At C:\NHLScraper.ps1:32 char:5
+ $objHash.Add($headers[$j],$rowdata[$j])
My code:
$url = "https://www.hockey-reference.com/leagues/NHL_2020.html"
#getting the data
$data = Invoke-WebRequest $url
#grab the third table
$table = $data.ParsedHtml.getElementsByTagName("table") | Select -skip 2 | Select -First 1
#get the rows of the Team Statistics table
$rows = $table.rows
#get table headers
$headers = $rows.item(1).children | select -ExpandProperty InnerText
#count the number of rows
$NumOfRows = $rows | Measure-Object
#Manually injecting TEAM header
$headers = @($headers[0];'TEAM';$headers[1..($headers.Length-1)])
#enumerate the remaining rows (we need to skip the header row) and create a custom object
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
#define an empty hashtable
$objHash=[ordered]@{}
#getting the child rows
$rowdata = $rows.item($i).children | select -ExpandProperty InnerText
for ($j=0;$j -lt $headers.count;$j++) {
#add each row of data to the hash table using the correlated table header value
$objHash.Add($headers[$j],$rowdata[$j])
}
#turn the hashtable into a custom object
[pscustomobject]$objHash
}
$out | Select TEAM,AvAge,GP,W,L,OL,PTS,PTS%,GF,GA,SOW,SOL,SRS,SOS,TG/G,EVGF,EVGA,PP,PPO,PP%,PPA,PPOA,PK%,SH,SHA,PIM/G,oPIM/G,S,S%,SA,SV%,SO -SkipLast 1 | Export-Csv -Path "C:\$((Get-Date).ToString("'NHL Stats' yyyy-MM-dd")).csv" -NoTypeInformation

You can add a condition that checks whether the key has already been added and, if so, updates it instead of calling Add again:
if (!$objHash.Contains($headers[$j])) {
$objHash.Add($headers[$j],$rowdata[$j])
} else {
$objHash[$headers[$j]] = $rowdata[$j] # Overwrite values
}
But after looking at your code a few times, I have some questions about this part:
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
#define an empty hashtable
$objHash=[ordered]@{} # Overwritten each loop???
#getting the child rows
$rowdata = $rows.item($i).children | select -ExpandProperty InnerText
for ($j=0;$j -lt $headers.count;$j++) {
#add each row of data to the hash table using the correlated table header value
$objHash.Add($headers[$j],$rowdata[$j]) # Dictionary cannot have duplicate keys
}
#turn the hashtable into a custom object
[pscustomobject]$objHash # what do you do with this?
}
You loop once per row and create a fresh $objHash each time. The [pscustomobject] emitted at the end of each iteration is what gets collected into $out; the hashtable itself is discarded unless you store it somewhere.
Suggested Solution
You can use another variable to keep track of all the hashtables you create, and assign by key instead of calling Add, so that a duplicate header overwrites the earlier value rather than throwing the exception.
# If you want to change the header value from GF to GFPG, you can do that in the place you have defined $headers
#get table headers
$headers = $rows.item(1).children | select -ExpandProperty InnerText
$headers = $headers | % { if ($_ -eq "GF") { "GFPG" } else { $_ }}
#count the number of rows
$NumOfRows = $rows | Measure-Object
#Manually injecting TEAM header
$headers = @($headers[0];'TEAM';$headers[1..($headers.Length-1)])
#enumerate the remaining rows (we need to skip the header row) and create a custom object
$allData = @{}
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
#define an empty hashtable
$objHash=[ordered]@{}
#getting the child rows
$rowdata = $rows.item($i).children | select -ExpandProperty InnerText
for ($j=0;$j -lt $headers.count;$j++) {
#add each row of data to the hash table using the correlated table header value
$objHash[$headers[$j]] = $rowdata[$j]
}
#turn the hashtable into a custom object
[pscustomobject]$objHash
$allData.Add($i, $objHash)
}
I used $allData with $i as the key to store each hashtable so that it can be accessed later.
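If several headers need new names, one possible approach is a rename map instead of one if per header. The sketch below is only an illustration: the sample headers, the made-up row values, and the $renameMap name (including the GA to GAPG entry) are assumptions, not part of the original script.

```powershell
# Made-up sample data standing in for the scraped header row and one data row.
$headers = @('Rk', 'TEAM', 'GP', 'GF', 'GA')
$rowdata = @('1', 'Team A', '10', '30', '25')

# Hypothetical rename map: original header -> new header.
$renameMap = @{
    'GF' = 'GFPG'
    'GA' = 'GAPG'
}

# Replace every header that has an entry in the map; leave the others untouched.
$headers = $headers | ForEach-Object {
    if ($renameMap.ContainsKey($_)) { $renameMap[$_] } else { $_ }
}

# Build one object from the renamed headers and the row values.
$objHash = [ordered]@{}
for ($j = 0; $j -lt $headers.Count; $j++) {
    $objHash[$headers[$j]] = $rowdata[$j]
}
$row = [pscustomobject]$objHash
$row | Select-Object TEAM, GFPG, GAPG
```

Any later Select or Export-Csv call has to use the new names (GFPG rather than GF), since the old property names no longer exist on the objects.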

Related

PowerShell | Optimization search : the matching between the elements of two arrays knowing in advance that only one unique pair exists

I would like to optimize the process when I match the elements between two arrays (each contains several thousand elements). If the match is found then we move on to the next element instead of continuing to search for another match (which does not exist because each element is unique).
$array1 = @(thousandItemsForExample)
$array2 = @(thousandItemsForExample)
foreach ($array1item in $array1) {
$object = [PSCustomObject]@{
property1 = $array1item.property1
property2 = ($array1 | Where-Object { $_.property1 -eq $array2.property1 } | Select-Object property2).property2
}
}
I tried to find out if any of the comparison operators had this kind of option but I couldn't find anything.
Thank you! :)
PS : Sorry for my English, it's not my native language...
You can do this with the help of a hash table, which allows for fast look-ups. Group-Object -AsHashTable also helps greatly with constructing the hash table:
$array1 = @(thousandItemsForExample)
$array2 = thousandItemsForExample | Group-Object property1 -AsHashTable -AsString
$result = foreach ($item in $array1) {
[PSCustomObject]@{
property1 = $item.property1
property2 = $array2[$item.property1].property2
}
}
Create a hashtable and load all the items from $array2 into it, using the value of property1 as the key:
$array1 = @(thousandItemsForExample)
$array2 = @(thousandItemsForExample)
$lookupTable = @{}
$array2 |ForEach-Object {
$lookupTable[$_.property1] = $_
}
Fetching the corresponding item from the hashtable by key is going to be significantly faster than filtering the whole array with Where-Object every time:
foreach ($array1item in $array1) {
$object = [PSCustomObject]@{
property1 = $array1item.property1
property2 = $lookupTable[$array1item.property1].property2
}
}
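A small self-contained sketch of the same lookup idea, using made-up sample objects in place of the thousand-item arrays. It assumes property1 is unique within $array2 and also shows what happens when an item has no match (the lookup returns $null).

```powershell
# Made-up sample data; property1 is assumed to be unique within each array.
$array1 = @(
    [pscustomobject]@{ property1 = 'a' }
    [pscustomobject]@{ property1 = 'b' }
    [pscustomobject]@{ property1 = 'c' }
)
$array2 = @(
    [pscustomobject]@{ property1 = 'c'; property2 = 3 }
    [pscustomobject]@{ property1 = 'a'; property2 = 1 }
)

# Index $array2 by property1 once, so each later lookup is a single key access.
$lookupTable = @{}
foreach ($item in $array2) { $lookupTable[$item.property1] = $item }

$result = foreach ($item in $array1) {
    $match = $lookupTable[$item.property1]   # $null when there is no match
    [pscustomobject]@{
        property1 = $item.property1
        property2 = if ($match) { $match.property2 } else { $null }
    }
}
$result
```

Building the table walks $array2 once, and each item of $array1 then needs only one key lookup instead of a scan over the whole second array.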

Powershell csv calculate total sum

I am currently working on PowerShell. PowerShell is new to me, so it's kind of hard to figure this one out.
I have three headers in my CSV files.
The headers are: Name, MessageCount and Direction.
The names are email addresses, and those addresses are all the same. Direction is either "Inbound" or "Outbound". MessageCount holds a bunch of different numbers:
Overview
I want to add up those numbers so that I get "Inbound" and "Outbound" totals along with the email addresses on those rows.
I am trying to loop over MessageCount with foreach and add the values together, but it only gives me output like this:
MessageCount
Try something like this:
$data = Import-Csv "path-to-your-csv-file"
$data | group Name |
select Name,
@{n = "Inbound"; e = {
(($_.Group | where Direction -eq "Inbound").MessageCount | Measure-Object -Sum).Sum }
},
@{n = "Outbound"; e = {
(($_.Group | where Direction -eq "Outbound").MessageCount | Measure-Object -Sum).Sum }
}
Code explanation
group Name groups the rows by the Name property, in this case the email address.
select picks existing properties from an object or creates calculated ones with @{n="";e={}}.
($_.Group | where Direction -eq "Outbound").MessageCount takes the rows in the group, keeps those whose Direction equals Outbound, and returns their MessageCount values.
Measure-Object -Sum takes those values and returns an object whose Sum property holds their total, which becomes the value of the calculated property.
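As an alternative sketch (not the approach from the answer above), the rows can be grouped on Name and Direction together, which yields one total per address and direction. The CSV content below is made up for illustration and stands in for the Import-Csv call; the $totals name is an assumption.

```powershell
# Made-up sample rows in place of Import-Csv; column names follow the question.
$data = @'
Name,MessageCount,Direction
user@example.com,5,Inbound
user@example.com,7,Outbound
user@example.com,3,Inbound
'@ | ConvertFrom-Csv

# Group on both columns, then sum MessageCount inside each group.
$totals = $data | Group-Object Name, Direction | ForEach-Object {
    [pscustomobject]@{
        Name      = $_.Group[0].Name
        Direction = $_.Group[0].Direction
        Total     = ($_.Group | Measure-Object -Property MessageCount -Sum).Sum
    }
}
$totals
```

The values read from a CSV are strings; Measure-Object -Sum converts them to numbers before adding them.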

Powershell question - Looking for fastest method to loop through 500k objects looking for a match in another 500k object array

I have two large .csv files that I've imported using the import-csv cmdlet. I've done a lot of searching and trying and am finally posting to ask for some help to make this easier.
I need to move through the first array that will have anywhere from 80k rows to 500k rows. Each object in these arrays has multiple properties, and I then need to find the corresponding entry in a second array of the same size matching on a property from there.
I'm importing them as [System.Collections.ArrayList] and I've tried to place them in hashtables too. I have even tried to muck around with LINQ, which was mentioned in several other posts.
Any chance anyone can offer advice or insight how to make this run faster? It feels like I'm looking in one haystack for matching hay in a different stack.
$ImportTime1 = Measure-Command {
[System.Collections.ArrayList]$fileList1 = Import-csv file1.csv
[System.Collections.ArrayList]$fileSorted1 = ($fileList1 | Sort-Object -property 'Property1' -Unique -Descending)
Remove-Variable fileList1
}
$ImportTime2 = Measure-Command {
[System.Collections.ArrayList]$fileList2 = Import-csv file2.csv
[System.Collections.ArrayList]$fileSorted2 = ($fileList2 | Sort-Object -property 'Property1' -Unique -Descending)
Remove-Variable fileList2
}
$fileSorted1.foreach({
$variable1 = $_
$target = $fileSorted2.where({$_ -eq $variable1})
###do some other stuff
})
This may be of use: https://powershell.org/forums/topic/comparing-two-multi-dimensional-arrays/
The updated solution in comment #27359 + add the suggested change by Max Kozlov in comment #27380.
Function RJ-CombinedCompare() {
[CmdletBinding()]
PARAM(
[Parameter(Mandatory=$True)]$List1,
[Parameter(Mandatory=$True)]$L1Match,
[Parameter(Mandatory=$True)]$List2,
[Parameter(Mandatory=$True)]$L2Match
)
$hash = @{}
foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner=1;Value=$($data)}}
foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner=2;Value=$($data)}}
foreach ($kv in $hash.GetEnumerator()) {
$m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')
[PSCustomObject]@{
MatchValue = $kv.Key
L1Matches = $m1.Count
L2Matches = $m2.Count
L1MatchObject = $L1Match
L2MatchObject = $L2Match
List1 = $m1.Value
List2 = $m2.Value
}
}
}
$fileList1 = Import-csv file1.csv
$fileList2 = Import-csv file2.csv
$newList = RJ-CombinedCompare -List1 $fileList1 -L1Match $(yourcolumnhere) -List2 $fileList2 -L2Match $(yourothercolumnhere)
foreach ($item in $newList) {
# your logic here
}
It should be fast to pass the lists into this hashtable and it's fast to iterate through as well.

Read multiple rows from Oracle table

I am trying to read a table in Oracle from PowerShell and save the rows in an ArrayList. The connection works, but reading any rows after the first doesn't.
Here's what I'm trying to do.
$rows = New-Object System.Collections.ArrayList
class Table {
[String] $name
[String] $type
}
try {
$oraConn.Open()
$sql = [string]::Format("select name, type from source_table where type = 'running'")
$oraCmd = New-Object System.Data.OracleClient.OracleCommand($sql, $oraConn)
$reader = $oraCmd.ExecuteReader()
#add tables to arraylist
while ($reader.Read()) {
$table = New-Object Table
$table.name = $reader["name"];
$table.type = $reader["type"];
[void]$rows.Add($table)
}
Write-Host "rows collected"
}
My problem is that I only read the first row of the table. How can I tell Oracle to read them all? Would I have to count them first and then query for each row?
I check the contents of $rows later in the code, it's not really relevant to the question since I know that this part works, so I left it out.
I know that my query returns something because I tried it in Oracle.
Do I need a foreach loop? It would make sense but how can I tell Oracle to do that? Would I have to query for each row of the table and set a counter to query only one row at a time?
I hope someone can help me and point me in the right direction, since I'm already trying a long time to get my script working. I got most of the logic for my script, but if I can't load the rows into my list, my logic doesn't help me at all.
Use the following code snippet as a starting point for your own solution:
$cs = 'data source=oradb;user id=/;dba privilege=sysdba'
$oc = new-object oracle.dataaccess.client.oracleconnection $cs
$oc.open()
$cm = new-object oracle.dataaccess.client.oraclecommand
$cm.connection = $oc
$cm.commandtext = "select name, type from source_table where type = 'running'"
$da = new-object oracle.dataaccess.client.oracledataadapter
$da.selectcommand = $cm
$tbl = new-object data.datatable
$da.fill($tbl)
$tbl | %{"$($_.name) = $($_.type)"}
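Once the DataTable has been filled, every row is available through its Rows collection, so no per-row query or counter is needed. The sketch below only illustrates that step: it builds an in-memory DataTable with made-up rows in place of the Oracle query result, and the column names are taken from the question.

```powershell
# In-memory DataTable with made-up rows, standing in for the table filled by the data adapter.
$tbl = New-Object System.Data.DataTable
[void]$tbl.Columns.Add('name', [string])
[void]$tbl.Columns.Add('type', [string])
[void]$tbl.Rows.Add('job_a', 'running')
[void]$tbl.Rows.Add('job_b', 'running')

# Copy every row into an ArrayList of plain objects.
$rows = New-Object System.Collections.ArrayList
foreach ($row in $tbl.Rows) {
    [void]$rows.Add([pscustomobject]@{
        name = $row['name']
        type = $row['type']
    })
}
"rows collected: $($rows.Count)"
```

The [void] casts suppress the values that Add returns, which would otherwise end up in the script's output.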

Put results in a table and then sorted output

I am writing a script which produces two outputs within a foreach loop: one string, $server, and one integer, $util (let's say I get 20 results).
What is the simplest approach to put my results in a table while running the loop, so that I can output them sorted (descending) after the loop has finished?
SERVER UTIL
------ ----
SERVER001 95
SERVER002 74
SERVER003 32
SERVER004 12
If you want to sort the results in descending order, you will have to put the results in an array and then sort outside the loop, like so:
$arr = @()
foreach ($item in $collection)
{
$arr += [pscustomobject]@{
Server = $item.server
util = $item.util
}
}
$arr | Sort-Object -Property Util -Descending
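A variation on the same idea, sketched with made-up input: assigning the foreach loop itself to a variable collects every object the loop emits, which avoids rebuilding the array with += on each iteration. The $collection contents and the $results and $sorted names are assumptions for illustration.

```powershell
# Made-up input standing in for whatever the loop iterates over.
$collection = @(
    [pscustomobject]@{ server = 'SERVER003'; util = 32 }
    [pscustomobject]@{ server = 'SERVER001'; util = 95 }
    [pscustomobject]@{ server = 'SERVER002'; util = 74 }
)

# Assigning the loop to a variable collects every object it emits.
$results = foreach ($item in $collection) {
    [pscustomobject]@{
        Server = $item.server
        Util   = $item.util
    }
}

# Sort once, after the loop has finished.
$sorted = $results | Sort-Object -Property Util -Descending
$sorted | Format-Table -AutoSize
```

Because Util holds integers here, Sort-Object compares the values numerically; values read from text would need a cast to [int] first.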

Resources