PowerShell using Foreach-ObjectFast and Where-ObjectFast - Windows

I have never really worked with PowerShell, so I am quite stuck with this. My goal is to merge multiple CSVs into one; more specifically, three at the moment.
Using Import-Csv and Foreach-Object I managed to achieve this, but it was incredibly slow. Then I discovered this article and gave it a try: incredibly fast iteration.
Unfortunately, I don't know PowerShell well enough to understand why I cannot use Where-ObjectFast properly; it won't match anything.
My code example:
Measure-Command { $CSV1 = Import-CSV -Path .\CSV1.csv }
Measure-Command { $CSV2 = Import-CSV -Path .\CSV2.csv }
Measure-Command {
    Import-Csv -Path .\CSV3.csv | Foreach-ObjectFast {
        $row = $_;
        $match = $CSV1 | Where-ObjectFast -FilterScript { $_.name -eq $row.'name' }
        $dbg1 = 'Matched: {0}' -f $match; Write-Host $dbg1 -foreground Cyan;
        # continued...
    }
}
What I basically need to do is match "name" from CSV3 with "name" from CSV1 (and other fields as needed), then do the same for CSV2, and write the result to a final file.
It seems that when using Where-ObjectFast, $_ is empty (?).
Please advise what I am doing wrong here; I would really appreciate it.

The problem you're having is not the overhead of ForEach-Object binding and processing the input, so replacing ForEach-Object with ForEach-ObjectFast is not going to have a significant impact. The real cost is that, for every row of CSV3, Where-Object scans all of $CSV1 again, so the work grows with (rows in CSV3) × (rows in CSV1).
If you want to pivot on the name column (or any other column), build index tables with a hashtable/dictionary:
$CSV1 = @{}
Import-Csv -Path .\CSV1.csv |ForEach-Object { $CSV1[$_.name] = $_ }
$CSV2 = @{}
Import-Csv -Path .\CSV2.csv |ForEach-Object { $CSV2[$_.id] = $_ }
Now you don't need to wait for Where-Object to search through each collection:
Import-Csv -Path .\CSV3.csv |ForEach-Object {
    $row = $_
    # This is going to be MUCH faster than ... |Where-Object { ... }
    $csv1match = $CSV1[$row.name]
    $csv2match = $CSV2[$row.id]
    # join $row,$csv1match,$csv2match here
}
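A self-contained sketch of the same index-then-probe idea, using ConvertFrom-Csv on small inline samples so it runs without the real files (the columns and values are hypothetical). The index is built once, then each CSV3 row costs a single hashtable lookup:

```powershell
# Hypothetical stand-ins for Import-Csv .\CSV1.csv and .\CSV3.csv
$csv1Rows = @'
name,email
alice,alice@example.com
bob,bob@example.com
'@ | ConvertFrom-Csv

$csv3Rows = @'
name,dept
bob,Sales
carol,IT
'@ | ConvertFrom-Csv

# Build the index once, keyed on the join column.
$CSV1 = @{}
foreach ($r in $csv1Rows) { $CSV1[$r.name] = $r }

# One hashtable lookup per CSV3 row instead of a full scan of CSV1.
$joined = foreach ($row in $csv3Rows) {
    $csv1match = $CSV1[$row.name]
    $email = $null
    if ($csv1match) { $email = $csv1match.email }
    [PSCustomObject]@{
        name  = $row.name
        dept  = $row.dept
        email = $email
    }
}

$joined | Format-Table
# In practice: $joined | Export-Csv -Path .\merged.csv -NoTypeInformation
```

Hashtables created with @{} compare string keys case-insensitively, so "Bob" and "bob" would match; use a case-sensitive dictionary if that matters for your data.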

Related

PowerShell script isn't returning anything (running on macOS)

I've been messing with this PowerShell script (I installed PowerShell on my Mac) and also modified the first line of the code a bit.
I am not getting any errors; nothing happens at all.
$folder = "/Users/mbp/Desktop/nier_unpacked_2_extracted"
$files = gci -recurse $folder | where { ! $_.PSIsContainer }
$fileContents = $files | foreach { gc -encoding utf8 $_.fullname }
$lines = $fileContents | foreach { if ($_ -match "^JP: (.*)$") { $matches[1] } }
$chars = $lines | foreach { $_.ToCharArray() }
$groups = $chars | group-object
$totals = $groups | sort-object -desc -property count
Basically, it's meant to output the Japanese text characters and how often they show up.
This is the original code (before modification):
$folder = "F:\nier_unpacked_2_extracted"
$files = gci -recurse $folder | where { ! $_.PSIsContainer }
$fileContents = $files | foreach { gc -encoding utf8 $_.fullname }
$lines = $fileContents | foreach { if ($_ -match "^JP: (.*)$") { $matches[1] } }
$chars = $lines | foreach { $_.ToCharArray() }
$groups = $chars | group-object
$totals = $groups | sort-object -desc -property count
Here is the link to the resource I got the code from, if that helps: https://dev.to/nyctef/extracting-game-text-from-nier-automata-1gm0
Unfortunately, I'm not sure why nothing is returned.
In PowerShell (as in most other programming languages), $totals = ... means that the result of the expression on the right side is assigned to the variable ($totals) on the left side; nothing is written to the console. Every line of the script is such an assignment, which is why it appears to do nothing.
To display the contents of the variable ($totals), you might use Write-Output $totals, Write-Host $totals, Out-Default $totals, or one of many other output cmdlets.
Anyway, in PowerShell it is generally not necessary to use a cmdlet, because output is displayed by default. For example, type just the variable name and press Enter:
$totals
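A minimal sketch of the same counting pipeline, assuming a few hypothetical in-memory lines instead of the extracted NieR files (ASCII is used only to keep the demo encoding-neutral). The final line has no assignment, so its result is printed:

```powershell
# Hypothetical sample lines; in the real script these come from gc -encoding utf8.
$fileContents = 'JP: abb', 'EN: zzz', 'JP: bc'

# Keep only the text after "JP: " on matching lines.
$lines = $fileContents | ForEach-Object { if ($_ -match '^JP: (.*)$') { $matches[1] } }

# Split into characters, count each one, most frequent first.
$chars = $lines | ForEach-Object { $_.ToCharArray() }
$totals = $chars | Group-Object | Sort-Object -Property Count -Descending

# No assignment here, so the result is written to the console.
$totals | Select-Object -First 10 Name, Count
```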

Trying to pull out the newest errors in a log file and then create a custom .txt output

I am new to PowerShell scripting and have been tasked with creating some alerts based on errors in certain log files. These are just logs from a bespoke application.
My current code is:
# Join-Path inserts the path separator; plain "+ '.\Results.txt'" produced C:\dir.\Results.txt
$OutputFile3 = Join-Path (Get-Location).Path 'Results.txt'
$Sourcefolder= "C:\Users\dewana\Documents\Test\"
$Targetfolder= "C:\Users\dewana\Documents\Test\Test3"
Get-ChildItem -Path $Sourcefolder -Recurse|
    Where-Object {
        $_.LastWriteTime -gt [datetime]::Now.AddMinutes(-5)
    }| Copy-Item -Destination $Targetfolder
$Testing5 = Get-Content -Tail -1 -Path "C:\Users\dewana\Documents\Test\Test3\*.txt" | Where-Object
{ $_.Contains("errors") }
Remove-Item $OutputFile3
New-Item $OutputFile3 -ItemType file
try
{
    $stream = [System.IO.StreamWriter] $OutputFile3
    $stream.WriteLine('clientID 1111')
    $stream.WriteLine('SEV 1')
    $stream.WriteLine('Issue with this process')
}
finally
{
    $stream.close()
}
What I am struggling with is this line:
$Testing5 = Get-Content -Tail -1 -Path "C:\Users\dewana\Documents\Test\Test3\*.txt" | Where-Object { $_.Contains("errors") }
I am trying to store the latest line in the log file that contains the word "errors". I then want to use the stored string in an if statement: if $Testing5 has been assigned a new error value, create a custom text file.
I can't figure out why Get-Content is not working with Where-Object.
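The only issue I can see is that your Where-Object script block is on the next line. PowerShell treats the line break after Where-Object as the end of the pipeline, so the { ... } block is not passed to Where-Object as its filter: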
Get-Content -Tail -1 -Path $tempfile | Where-Object
{ $_.Contains("errors") }
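If you break the line after the pipe instead, it's fine, because a trailing | tells PowerShell that the pipeline continues on the next line: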
Get-Content -Tail -1 -Path $tempfile |
Where-Object { $_.Contains("errors") }
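To keep only the newest matching line rather than all of them, one option is to filter first and then take the last result with Select-Object -Last 1. A minimal sketch, assuming a hypothetical sample log in the temp folder and a hypothetical $previousError standing in for whatever the script remembered from its last run:

```powershell
# Hypothetical sample log, written to the temp folder for the demo.
$logPath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-app.log'
Set-Content -Path $logPath -Value 'INFO started', 'ERROR: 2 errors in batch 7', 'INFO batch 8 ok', 'ERROR: 1 errors in batch 9'

# Line break after each pipe, so all stages belong to one pipeline.
$Testing5 = Get-Content -Path $logPath |
    Where-Object { $_.Contains('errors') } |
    Select-Object -Last 1

# Hypothetical: load this from wherever you stored the last error you alerted on.
$previousError = $null

# Only write the alert file when there is a matching line that differs from the last one.
$resultsPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Results.txt'
if ($Testing5 -and $Testing5 -ne $previousError) {
    Set-Content -Path $resultsPath -Value 'clientID 1111', 'SEV 1', "Issue with this process: $Testing5"
}
```

Note that String.Contains is case-sensitive; Where-Object { $_ -like '*errors*' } would match case-insensitively.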

Strip all lines from a file that match a pattern, except the first occurrence

I have a directory of .txt files that look like this:
[LINETYPE]S[STARTTIME]00:00:00
[LINETYPE]P[STARTTIME]00:00:00
[LINETYPE]B[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:30
[LINETYPE]S[STARTTIME]01:00:00
[LINETYPE]P[STARTTIME]01:00:00
[LINETYPE]B[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:30
[LINETYPE]S[STARTTIME]02:00:00
I'd like to remove all occurrences of [LINETYPE]S except the first, which happens to always be 00:00:00 and on the first line, and then re-save the file to a new location.
That is, [LINETYPE]S[STARTTIME]00:00:00 must always be present, but the other lines that start with [LINETYPE]S need to be removed.
This is what I came up with, which works except it removes all [LINETYPE]S lines, including the first. I can't seem to figure out how to do that part after Googling for a while, so I'm hoping someone can point me in the right direction. Thanks for your help!
Get-ChildItem "C:\Users\Me\Desktop\Samples" -Filter *.txt | ForEach-Object {
Get-Content $_.FullName | Where-Object {
$_ -notmatch "\[LINETYPE\]S"
} | Set-Content ('C:\Users\Me\Desktop\Samples\Final\' + $_.BaseName + '.txt')
}
I couldn't figure out how to do this via a pipeline [blush], so I went with a foreach loop and a compound test.
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
[LINETYPE]S[STARTTIME]00:00:00
[LINETYPE]P[STARTTIME]00:00:00
[LINETYPE]B[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:30
[LINETYPE]S[STARTTIME]01:00:00
[LINETYPE]P[STARTTIME]01:00:00
[LINETYPE]B[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:30
[LINETYPE]S[STARTTIME]02:00:00
'@ -split '\r?\n'   # split on LF or CRLF, whichever the script file uses
$KeepFirst = '[LINETYPE]S'
$FoundFirst = $False
$FilteredList = foreach ($IS_Item in $InStuff)
{
    if ($IS_Item.StartsWith($KeepFirst))
    {
        # keep only the first [LINETYPE]S line
        if (-not $FoundFirst)
        {
            $IS_Item
            $FoundFirst = $True
        }
    }
    else
    {
        $IS_Item
    }
}
$FilteredList
output ...
[LINETYPE]S[STARTTIME]00:00:00
[LINETYPE]P[STARTTIME]00:00:00
[LINETYPE]B[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:00
[LINETYPE]C[STARTTIME]00:59:30
[LINETYPE]P[STARTTIME]01:00:00
[LINETYPE]B[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:00
[LINETYPE]C[STARTTIME]01:59:30
At that point, you can send the new collection out to a file. [grin]
Try the following:
Get-ChildItem "C:\Users\Me\Desktop\Samples" -Filter *.txt |
Foreach-Object {
$count = 0
Get-Content $_.FullName |
Where-Object { $_ -notmatch '\[LINETYPE\]S' -or $count++ -eq 0 } |
Set-Content ('C:\Users\Me\Desktop\Samples\Final\' + $_.BaseName + '.txt')
}
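The script block passed to Where-Object runs in the same scope as the caller, so the variable $count can be updated directly. Because $count = 0 is reset inside ForEach-Object, this happens separately for each file.
Because -or short-circuits, $count++ -eq 0 is only evaluated for lines that do contain [LINETYPE]S. The first such line is included because $count is still 0 at that point (the postfix $count++ returns the old value, then increments it); subsequent [LINETYPE]S lines are excluded because $count is then already greater than 0.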
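The same filter can be checked on a few hypothetical in-memory lines, without touching any files:

```powershell
# Hypothetical in-memory sample; in the answer above, lines come from Get-Content.
$lines = @(
    '[LINETYPE]S[STARTTIME]00:00:00'
    '[LINETYPE]P[STARTTIME]00:00:00'
    '[LINETYPE]S[STARTTIME]01:00:00'
    '[LINETYPE]C[STARTTIME]01:59:30'
    '[LINETYPE]S[STARTTIME]02:00:00'
)

$count = 0
# Non-S lines pass on the left side of -or; only S lines reach $count++.
$kept = $lines | Where-Object { $_ -notmatch '\[LINETYPE\]S' -or $count++ -eq 0 }
$kept
```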

Improve the efficiency of my PowerShell script

The code below searches for 400+ numbers from a list.txt file to see whether any of them appear in the files within the specified folder path.
The script is very slow: after 25 minutes of running it still had not completed. The folder we are searching is 507 MB (532,369,408 bytes) and contains 1,119 files and 480 folders. Any help improving the speed and efficiency of the search is greatly appreciated.
$searchWords = (gc 'C:\temp\list.txt') -split ','
$results = @()
Foreach ($sw in $searchWords)
{
    $files = gci -path 'C:\Users\david.craven\Dropbox\Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*' -filter "*$sw*" -recurse
    foreach ($file in $files)
    {
        $object = New-Object System.Object
        $object | Add-Member -Type NoteProperty -Name SearchWord -Value $sw
        $object | Add-Member -Type NoteProperty -Name FoundFile -Value $file.FullName
        $results += $object
    }
}
$results | Export-Csv C:\temp\output.csv -NoTypeInformation
The following should speed up your task substantially:
If the intent is truly to look for the search words in the file names:
$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'
Get-ChildItem -File -Path $path -Recurse -PipelineVariable file |
Select-Object -ExpandProperty Name |
Select-String -SimpleMatch -Pattern $searchWords |
Select-Object @{n='SearchWord'; e='Pattern'},
@{n='FoundFile'; e={$file.FullName}} |
Export-Csv C:\temp\output.csv -NoTypeInformation
If the intent is to look for the search words in the files' contents:
$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'
Get-ChildItem -File -Path $path -Recurse |
Select-String -List -SimpleMatch -Pattern $searchWords |
Select-Object @{n='SearchWord'; e='Pattern'},
@{n='FoundFile'; e='Path'} |
Export-Csv C:\temp\output.csv -NoTypeInformation
The keys to performance improvement:
Perform the search with a single command, by passing all search words to Select-String at once. Note: -List limits matching to the first match per file (by any of the given patterns).
Instead of constructing custom objects in a script block with New-Object and Add-Member, let Select-Object construct the objects for you directly in the pipeline, using calculated properties.
Instead of building an intermediate array iteratively with += - which behind the scenes recreates the array every time - use a single pipeline to pipe the result objects directly to Export-Csv.
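To see the single-command search in isolation, here is a small sketch assuming a hypothetical temp folder with three text files and three search words. One Select-String call receives all patterns, -List stops at the first match per file, and the MatchInfo Pattern and Path properties feed the calculated properties:

```powershell
# Hypothetical demo folder with three small files.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ('sls-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -Path (Join-Path $demo 'a.txt') -Value 'asset 1001', 'asset 2002'
Set-Content -Path (Join-Path $demo 'b.txt') -Value 'nothing here'
Set-Content -Path (Join-Path $demo 'c.txt') -Value 'see 3003'

$searchWords = '1001', '2002', '3003'

# One command, all patterns; -List reports at most one match per file.
$results = Get-ChildItem -Path $demo -File |
    Select-String -List -SimpleMatch -Pattern $searchWords |
    Select-Object @{n='SearchWord'; e='Pattern'}, @{n='FoundFile'; e='Path'}

$results
Remove-Item -Path $demo -Recurse -Force
```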
So there are definitely some basic things in the PowerShell code you posted that can be improved, but it may still not be super fast. Based on the sample you gave us, I'll assume you're looking to match the file names against a list of words. You're looping through the list of words (400 iterations), and in each loop Get-ChildItem -Recurse walks the whole folder tree again, visiting all 1,119 files. That's a total of 447,600 iterations!
Assuming you can't reduce the number of iterations in the loop, let's start by making each iteration faster. Add-Member is comparatively slow, so instead cast a hashtable literal to the [PSCustomObject] type accelerator:
[PSCustomObject]@{
    SearchWord = $Word
    File = $File.FullName
}
Also, there is no reason to pre-create an array object and then add each file to it. You can simply capture the output of the foreach loop in a variable:
$Results = Foreach ($Word in $Words)
{
...
So a faster loop might look like this:
$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File
$Results = Foreach ($Word in $Words)
{
    foreach ($File in $Files)
    {
        if ($File.BaseName -match $Word)
        {
            [PSCustomObject]@{
                SearchWord = $Word
                File = $File.FullName
            }
        }
    }
}
A simpler approach might be to use Where-Object on the files array:
$Results = Foreach ($Word in $Words)
{
$Files | Where-Object BaseName -match $Word
}
Try both and test out the performance.
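One way to try both is to wrap each variant in Measure-Command. This sketch assumes hypothetical synthetic file objects (only BaseName and FullName are populated), so it times the matching alone and runs without the real folder:

```powershell
# Hypothetical stand-ins for Get-Content / Get-ChildItem results.
$Words = 1..20 | ForEach-Object { '{0:D4}' -f ($_ * 37) }
$Files = 1..1000 | ForEach-Object {
    [PSCustomObject]@{ BaseName = 'asset-{0:D4}' -f $_; FullName = 'C:\demo\asset-{0:D4}.txt' -f $_ }
}

# Variant 1: nested foreach statements.
$loopTime = Measure-Command {
    $loopResults = foreach ($Word in $Words) {
        foreach ($File in $Files) {
            if ($File.BaseName -match $Word) {
                [PSCustomObject]@{ SearchWord = $Word; File = $File.FullName }
            }
        }
    }
}

# Variant 2: Where-Object on the files array.
$whereTime = Measure-Command {
    $whereResults = foreach ($Word in $Words) {
        $Files | Where-Object BaseName -match $Word
    }
}

'nested foreach: {0:N0} ms' -f $loopTime.TotalMilliseconds
'Where-Object:   {0:N0} ms' -f $whereTime.TotalMilliseconds
```

Timings from synthetic objects are only indicative, since they leave out the time spent enumerating the real folder; measure against your own data before choosing.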
So if speeding up the loop doesn't meet your needs, try removing the loop entirely. You could use regex and join all the words together:
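$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File
# Escape each word so regex metacharacters in it are matched literally
$WordRegex = ($Words | ForEach-Object { [regex]::Escape($_) }) -join '|'
$Files | Where-Object BaseName -match $WordRegex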
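The -join approach tells you which files matched, but not which word matched. A sketch that recovers the word from a single regex, assuming hypothetical word and file-name lists in place of Get-Content and Get-ChildItem (only the first matching word per file is reported):

```powershell
# Hypothetical inputs; in practice $Words = Get-Content -Path $WordList
# and the names come from (Get-ChildItem -Path $Path -Recurse -File).
$Words = '1001', '2002', '3003'
$FileNames = 'A-1001-report.pdf', 'inventory.xlsx', 'A-3003.txt'

# Escape each word, join into one alternation, and build the regex object once.
$WordRegex = [regex](($Words | ForEach-Object { [regex]::Escape($_) }) -join '|')

$Results = foreach ($name in $FileNames) {
    $m = $WordRegex.Match($name)
    if ($m.Success) {
        # $m.Value is the literal word that matched
        [PSCustomObject]@{ SearchWord = $m.Value; File = $name }
    }
}
$Results
```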

PowerShell: Script won't count files older than 30 days from last modified date

Hi all.
I'm stuck. I have a PowerShell script that looks in a specific folder for files that are more than 30 days old, based on their last modified date (it also creates the folder if it doesn't exist). It creates the folder, it gives me the total number of files, and it lists all of the files in a test query, but it won't actually count the files that are 30+ days old. I've tried several methods to get this count (some derived from other answers on this site about deleting old files), but PowerShell just doesn't want to do it.
Here's my code so far...
$HomePath = $env:USERPROFILE
$CompanyFolder = "\Company"
$TimeSensativeFolder = "\TimeSensative"
$TimeSensativePath = $HomePath+$CompanyFolder+$TimeSensativeFolder
$OldFilesAmount = 0
$TotalFilesAmount = 0
$TimeLimit = (Get-Date).AddDays(-30)
$StatusOK = "No old files were found in the time-sensitive folder."
$StatusCreated = "The time-sensitive folder was created."
$StatusError1 = "There were old files found in the time-sensitive folder!"
$StatusError2 = "Unable to create the time-sensitive folder!"
function MakeTimeSensativeFolder ($TimeSensativePath) {
    try {
        md $TimeSensativePath -Force -ErrorAction Stop
        Write-Host $StatusCreated
    }
    catch {
        Write-Host $StatusError2
    }
}
function CountOldFiles () {
    $OldFilesAmount = $OldFilesAmount + 1
}
if(!(Test-Path $TimeSensativePath -PathType Container)) {
    MakePHIFolder $TimeSensativePath
}
else {
}
try {
    $TotalFilesAmount = (Get-ChildItem $PHIPath -Recurse -File | Measure-Object).Count
    # I've tried this...
    Get-Item $PHIPath | Foreach {$_.LastWriteTime} -ErrorAction Stop
    if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $TimeLimit}) {
        CountOldFiles
    }
    # And I've tried this...
    Get-ChildItem -Path $PHIPath -Recurse -File | Foreach-Object {
        if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $TimeLimit}) {
            CountOldFiles
        }
    }
    # I've even tried this...
    Get-ChildItem $PHIPath -Recurse -File | ? {
        -not $_.PSIsContainer -and $_.LastWriteTime -lt $TimeLimit
    } | CountOldFiles
    # And this, as well...
    Get-ChildItem -Path $PHIPath -Recurse -File | Where-Object {$_.LastWriteTime -gt $TimeLimit} | CountOldFiles
}
catch {
    MakeTimeSensativeFolder $TimeSensativePath
}
# Used for testing.
<#
Get-ChildItem $TimeSensativePath -Recurse -File
Write-Host "TimeSensative folder exists:" $TimeSensativePathExists
Write-Host "Home TimeSensative path:" $TimeSensativePath
Write-Host "Old files found:" $OldFilesAmount
Write-Host "Total files found:" $TotalFilesAmount
Exit
#>
# Determining proper grammar for status message based on old file count.
if ($OldFilesAmount -eq 1) {
    $StatusError1 = "There was "+$OldFilesAmount+" old file of "+$TotalFilesAmount+" total found in the PHI folder!"
}
if ($OldFilesAmount -gt 1) {
    $StatusError1 = "There were "+$OldFilesAmount+" old files of "+$TotalFilesAmount+" total found in the PHI folder!"
}
# Give statuses.
if ($OldFilesAmount -gt 0) {
    Write-Host $StatusError1
}
else {
    Write-Host $StatusOK
}
Depending on which one I tried, I would either get no result or something like this:
Get-Content : Cannot find drive. A drive with the name '12/22/2016 17' does not exist.
At C:\Users\nobody\Scripts\PS1\ts_file_age.ps1:54 char:14
+ if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $Tim ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (12/22/2016 17:String) [Get-Content], DriveNotFoundException
+ FullyQualifiedErrorId : DriveNotFound,Microsoft.PowerShell.Commands.GetContentCommand
In any case, I never get the old-file count I'm after.
It's been a bit of a head scratcher. Any advice?
Thanks so much in advance!
Filtering files by last write time is easy enough, like so:
$allFiles = gci
$d = (Get-Date).adddays(-30)
$newFiles = @()
$oldFiles = @()
$allFiles | % { if ($_.lastwritetime -ge $d) { $newFiles +=$_ } else { $oldFiles += $_ } }
First, all the files are stored in a collection. This isn't mandatory, but it lets you browse the collection to check that it has been populated properly, which is useful when you have complex paths or exclusion filters.
The second step is just to get a DateTime that is used later to divide files into old and new ones, just like the sample did, so nothing interesting here. Actually, there's one little thing: the date is -30 days, but the hours, minutes and seconds are based on the current time. So if you need a strict day boundary, consider using midnight instead: ([datetime]::Today).AddDays(-30)
The third step is to declare two empty collections for new and old files.
The last step is to iterate through $allFiles and check each file's last write time. If it's greater than or equal to the cutoff, add the file to $newFiles; otherwise add it to $oldFiles.
After the last step, further processing should be simple enough.
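Applied to the question's goal of counting old files, here is a sketch. One likely reason the question's CountOldFiles never changes the total is PowerShell scoping: assigning $OldFilesAmount inside a function creates a new local variable, so the script-level $OldFilesAmount stays 0 unless the function writes to $script:OldFilesAmount. Also, a function without a process block runs only once when used at the end of a pipeline, and "older than 30 days" means LastWriteTime is before the cutoff, so the comparison is -lt. The sketch builds a hypothetical demo folder in the temp directory; Add-OldFile is a hypothetical helper name:

```powershell
# Hypothetical demo folder in the temp directory with one old and one new file.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ('age-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
$oldFile = New-Item -ItemType File -Path (Join-Path $demo 'old.txt')
New-Item -ItemType File -Path (Join-Path $demo 'new.txt') | Out-Null
$oldFile.LastWriteTime = (Get-Date).AddDays(-45)   # backdate one file on disk

$TimeLimit = (Get-Date).AddDays(-30)
$files = Get-ChildItem -Path $demo -Recurse -File

# Count directly; @() guarantees a .Count even for zero or one result.
$TotalFilesAmount = @($files).Count
$OldFilesAmount = @($files | Where-Object { $_.LastWriteTime -lt $TimeLimit }).Count

# If you prefer a counter function, update the script-scope variable explicitly
# and call it once per file from ForEach-Object.
$script:CounterFromFunction = 0
function Add-OldFile { $script:CounterFromFunction++ }
$files | Where-Object { $_.LastWriteTime -lt $TimeLimit } | ForEach-Object { Add-OldFile }

"Old files found: $OldFilesAmount of $TotalFilesAmount"
Remove-Item -Path $demo -Recurse -Force
```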
This is what I do to get (in this case, delete) files older than X days; $Workdir is the folder to clean up:
$Days = 5
$limit = (Get-Date).AddDays(-$Days)
$CurrentDate = Get-Date
#This will delete all files older than 5 days
Get-ChildItem -Path $Workdir -Recurse -Force | Where-Object { !$_.PSIsContainer -and $_.LastWriteTime -lt $limit } | Remove-Item -Force
