Improve the efficiency of my PowerShell script - performance

The code below checks 400+ numbers from a list.txt file to see whether each one appears in any file within the specified folder path.
The script is very slow: it had still not completed after 25 minutes of running. The folder being searched is 507 MB (532,369,408 bytes) and contains 1,119 files and 480 folders. Any help improving the speed and efficiency of the search is greatly appreciated.
$searchWords = (gc 'C:\temp\list.txt') -split ','
$results = @()
Foreach ($sw in $searchWords)
{
    $files = gci -path 'C:\Users\david.craven\Dropbox\Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*' -filter "*$sw*" -recurse
    foreach ($file in $files)
    {
        $object = New-Object System.Object
        $object | Add-Member -Type NoteProperty -Name SearchWord -Value $sw
        $object | Add-Member -Type NoteProperty -Name FoundFile -Value $file.FullName
        $results += $object
    }
}
$results | Export-Csv C:\temp\output.csv -NoTypeInformation

The following should speed up your task substantially:
If the intent is truly to look for the search words in the file names:
$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'
Get-ChildItem -File -Path $path -Recurse -PipelineVariable file |
    Select-Object -ExpandProperty Name |
    Select-String -SimpleMatch -Pattern $searchWords |
    Select-Object @{n='SearchWord'; e='Pattern'},
                  @{n='FoundFile'; e={$file.FullName}} |
    Export-Csv C:\temp\output.csv -NoTypeInformation
If the intent is to look for the search words in the files' contents:
$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'
Get-ChildItem -File -Path $path -Recurse |
    Select-String -List -SimpleMatch -Pattern $searchWords |
    Select-Object @{n='SearchWord'; e='Pattern'},
                  @{n='FoundFile'; e='Path'} |
    Export-Csv C:\temp\output.csv -NoTypeInformation
The keys to performance improvement:
Perform the search with a single command, by passing all search words to Select-String. Note: when files are piped in, -List limits matching to 1 match per file (by any of the given patterns).
Instead of constructing custom objects in a script block with New-Object and Add-Member, let Select-Object construct the objects for you directly in the pipeline, using calculated properties.
Instead of building an intermediate array iteratively with += - which behind the scenes recreates the array every time - use a single pipeline to pipe the result objects directly to Export-Csv.
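As a minimal sketch of the first two points that does not touch the file system, the snippet below feeds a few made-up names to a single Select-String call with several patterns and lets Select-Object build the result objects. The sample names and search words are assumptions for illustration only; they are not taken from the question.

```powershell
# Hypothetical sample data standing in for file names and search words.
$names       = 'A100234_report.txt', 'notes.txt', 'A100777_scan.pdf'
$searchWords = '100234', '100777', '999999'

# One Select-String call tests every name against every search word.
# Calculated properties turn each match into a SearchWord/FoundName object.
$results = $names |
    Select-String -SimpleMatch -Pattern $searchWords |
    Select-Object @{n='SearchWord'; e='Pattern'},
                  @{n='FoundName'; e='Line'}

$results
```

With this input, two objects come back: one for 100234 and one for 100777. No intermediate array is grown with +=; the pipeline output is captured in one assignment.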

So there are definitely some basic things in the PowerShell code you posted that can be improved, but it may still not be super fast. Based on the sample you gave us I'll assume you're looking to match the file names against a list of words. You're looping through the list of words (400 iterations) and in each loop you're looping through all 1,119 files. That's a total of 447,600 iterations!
Assuming you can't reduce the number of iterations in the loop, let's start by making each iteration faster. The Add-Member cmdlet is going to be really slow, so switch that approach up by casting a hashtable to the [PSCustomObject] type accelerator:
[PSCustomObject]@{
    SearchWord = $Word
    File = $File.FullName
}
Also, there is no reason to pre-create an array object and then add each file to it. You can simply capture the output of the foreach loop in a variable:
$Results = Foreach ($Word in $Words)
{
...
So a faster loop might look like this:
$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File
$Results = Foreach ($Word in $Words)
{
    foreach ($File in $Files)
    {
        if ($File.BaseName -match $Word)
        {
            [PSCustomObject]@{
                SearchWord = $Word
                File = $File.FullName
            }
        }
    }
}
A simpler approach might be to use Where-Object on the files array:
$Results = Foreach ($Word in $Words)
{
$Files | Where-Object BaseName -match $Word
}
Try both and test out the performance.

So if speeding up the loop doesn't meet your needs, try removing the loop entirely. You could use regex and join all the words together:
$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File
$WordRegex = $Words -join '|'
$Files | Where basename -match $WordRegex
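Joining the words with '|' treats each word as a regular expression, which is fine for plain numbers. If a word could contain regex metacharacters, escaping it first is a reasonable safeguard. The following is only a sketch with made-up words and names (assumptions for illustration), using [regex]::Escape:

```powershell
# Hypothetical search words; the second one contains a regex metacharacter.
$Words = '100234', 'A.1'
$Names = 'A100234_report', 'AX1_scan', 'A.1_final'

# Escape each word so it is matched literally, then build one alternation.
$WordRegex = ($Words | ForEach-Object { [regex]::Escape($_) }) -join '|'

# A single -match per name replaces the inner loop over all words.
$Hits = $Names | Where-Object { $_ -match $WordRegex }
$Hits
```

Here AX1_scan is not returned, because the escaped dot in A.1 only matches a literal dot. Without escaping, the unescaped pattern would also match AX1_scan.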

Related

How to select the file with the maximum number of the specified file

I want to keep only the file with the largest version of the specified zip file in the folder using powershell. I wrote a shell script but it returns all the files. How can I modify the script to select only the file with the largest version?
$files = Get-ChildItem -Filter "*.zip"
$max = $files |Measure-Object -Maximum| ForEach-Object {[int]($_.Split("_")[-1].Split(".")[0])}
$largestFiles = $files | Where-Object {[int]($_.Split("_")[-1].Split(".")[0]) -eq $max}
Write-Output $largestFiles
Expectation:
A1_Fantasic_World_20.zip
A1_Fantasic_World_21.zip
B1_Mythical_Realms_11.zip
B1_Mythical_Realms_12.zip
C1_Eternal_Frame_Corporation_2.zip
C1_Eternal_Frame_Corporation_3.zip
↓
A1_Fantasic_World_21.zip
B1_Mythical_Realms_12.zip
C1_Eternal_Frame_Corporation_3.zip
The highest number for A1_Fantasic_World is 21, for B1_Mythical_Realms it is 12, and for C1_Eternal_Frame_Corporation it is 3. So I want to keep only the zip with the highest version in each group.
First, add calculated properties to the file system objects to use for filtering. Then a combination of Group-Object, Sort-Object and Select-Object picks out the desired files.
$FileList =
Get-ChildItem -Filter *.zip |
    Select-Object -Property *,
        @{
            Name = 'Title'
            Expression = {($_.BaseName -split '_')[0..$(($_.BaseName -split '_').count - 2)] -join '_' }
        },
        @{
            Name = 'Counter'
            Expression = {[INT]($_.BaseName -split '_')[-1]}
        }
$LastOnesList =
$FileList |
    Group-Object -Property Title |
    ForEach-Object {
        $_.Group | Sort-Object -Property Counter | Select-Object -Last 1
    }
$LastOnesList |
    Select-Object -Property Name

PowerShell: Sort-Object descending wrong order

We currently have the following folder structure:
C:\Packages\Adobe\DC\9.2.2
C:\Packages\Adobe\DC\10.0.3
C:\Packages\Adobe\DC\10.0.8
C:\Packages\Microsoft\Edge\6.1.10
C:\Packages\Microsoft\Edge\6.1.18
C:\Packages\Microsoft\Edge\6.1.20
With a PowerShell script I try to keep only the highest version of the respective application folder and delete all others. The result should be:
C:\Packages\Adobe\DC\10.0.8
C:\Packages\Microsoft\Edge\6.1.20
However, the sorting doesn't seem to work properly in my script.
$folders_root = Get-ChildItem -Path C:\Packages -Directory
foreach ($folder in $folders_root)
{
$folders_appilcation = Get-ChildItem $folder.FullName -Directory
foreach ($app_folder in $folders_appilcation)
{
$versionfolder = Get-ChildItem $app_folder.FullName -Directory | Sort-Object -Descending | Select-Object -Skip 1 | % {Write-host "Deleting $($_.FullName)"<#; Remove-Item $_.FullName#>}
}
}
For the path C:\Packages\Adobe\DC, 9.2.2 is considered the highest version (according to the script), but it should be 10.0.8.
Deleting C:\Packages\Adobe\DC\10.0.8
Deleting C:\Packages\Adobe\DC\10.0.3
Deleting C:\Packages\Microsoft\Edge\6.1.18
Deleting C:\Packages\Microsoft\Edge\6.1.10
Can someone tell me what I'm doing wrong?
Sort-Object is comparing the names as strings, not as numbers. Strings are compared character by character, so "9.2.2" ranks above "10.0.8" because the character 9 is greater than 1.
Force the sort to treat the name as a version; try something like this:
sort {[version]::Parse($_.FullName.Split("\")[-1])} -Descending
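The difference can be reproduced without any folders. This is a small sketch that sorts the three version names from the question, once as plain strings and once cast to [version]:

```powershell
$versions = '9.2.2', '10.0.3', '10.0.8'

# String sort: compared character by character, so '9...' ranks above '1...'.
$asStrings = $versions | Sort-Object -Descending

# Version sort: each name is cast to [version] before comparing.
$asVersions = $versions | Sort-Object { [version]$_ } -Descending

"String order:  $($asStrings -join ', ')"
"Version order: $($asVersions -join ', ')"
```

The string order starts with 9.2.2, which is the behaviour seen in the question; the version order starts with 10.0.8.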
Thank you. The following code works perfectly:
$folders_root = Get-ChildItem -Path \\hkt-cmgmt01a\d$\Empirum\Configurator\Packages -Directory
foreach ($folder in $folders_root)
{
$folders_appilcation = Get-ChildItem $folder.FullName -Directory
foreach ($app_folder in $folders_appilcation)
{
$versionfolder = Get-ChildItem $app_folder.FullName -Directory | Sort-Object { [version] $_.Name } -Descending -ErrorAction SilentlyContinue| Select-Object -Skip 1 | % {Write-host "Deleting $($_.FullName)"<#; Remove-Item $_.FullName#>}
}
}

Passing filename to variable in Powershell?

I have hundreds of files in a working directory that need to be processed. They look similar to this:
a.txt
b.txt
c.txt
d.txt
Each of these files can be processed manually like this:
$lines = Get-Content "a.txt"
foreach ($line in $lines){
Out-File -FilePath "a-done.txt" -Encoding UTF8 -Append -InputObject ($line.Split(","))[0]
}
How can I automate this with a loop that passes every file name to the code above?
I have tried a foreach loop, but it's not working:
$lines = Get-Content "path/*.txt"
foreach ($line in $lines){
Out-File -FilePath "$lines-processed.txt" -Encoding UTF8 -Append -InputObject ($line.Split(","))[0]
}
What am I missing here?
To discover the files themselves, you'll want to use Get-ChildItem instead of Get-Content! To reference the file name without the extension (i.e. a from a.txt), reference the BaseName property:
foreach($file in Get-ChildItem .\path\ -Filter *.txt){
foreach ($line in $file |Get-Content){
Out-File -FilePath "$($file.BaseName)-done.txt" -Encoding UTF8 -Append -InputObject ($line.Split(","))[0]
}
}
You're looping through the lines returned by Get-Content, which don't carry the file names. You probably need an extra step, e.g.:
$items = Get-ChildItem 'C:\Users\Alex\Desktop\oop'
foreach ($item in $items) {
    <# your processing with Get-Content here #>
    echo $item.Name
    echo "$($item.BaseName)-processed.txt"
}
I misunderstood the first time. I hope I have it right now:
You want to save one done-file per input file.
The problem with your code is that you are collecting the content of all files in your $lines variable, and there is no information about the underlying files (or their names) any more.
Instead, you have to loop over the files and handle them separately.
The solution suggested:
$files = dir *.txt -Exclude *done.txt
foreach ($f in $files) {
Get-Content $f | % {$_.split(',')[0]} |
Out-File ($f.DirectoryName + '\' + $f.Basename + '-done.txt') -Encoding UTF8}
Regards Martin
Here's an example of how I combine csv files into one big CSV.
$txtFilter = "D:\Temp\*.csv"
$fileOutputSummary = "D:\Temp\Summary.csv"
$list = Get-ChildItem -Path $txtFilter | select FullName
$iItems = $list.Count
$i = 0
ForEach($file in $list){
$i++
Write-Host "Combining ($i of $iItems) `r"
Write-Progress -Activity "Combining CSV files" -PercentComplete ($i / $iItems*100)
Import-Csv -Path $file.FullName | Export-Csv -Path $fileOutputSummary -Append -NoTypeInformation
Sleep 1
} # end ForEach file
I hope my example helps.

Extract timestamp from filename and sort

I'm trying to look through each item in a folder and add each item to an array sorted by the datestamp in the filename.
For example, I have three files:
myfile_20150813_040949.txt
myfile_20150812_030949.txt
myfile_20150812_010949.txt
I'm not sure how to parse out the time from each and add them to an array in ascending order. Any help would be appreciated.
I am assuming that you are looking to sort the files by the parsed timestamp that is pulled from the file name with this example. It may not be the best RegEx approach, but it works in testing.
#RegEx pattern to parse the timestamps
$Pattern = '.*_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.txt'
$List = New-Object System.Collections.ArrayList
$Temp = New-Object System.Collections.ArrayList
Get-ChildItem | ForEach {
    #Make sure the file matches the pattern
    If ($_.Name -match $Pattern) {
        Write-Verbose "Add $($_.Name)" -Verbose
        $Date = $Matches[2],$Matches[3],$Matches[1] -join '/'
        $Time = $Matches[4..6] -join ':'
        [void]$Temp.Add(
            (New-Object PSObject -Property @{
                Date =[datetime]"$($Date) $($Time)"
                File = $_
            }
        ))
    }
}
#Sort the files by the parsed timestamp and add to the main list
$List.AddRange(@($Temp | Sort Date | Select -Expand File))
#Clear out the temp collection
$Temp.Clear()
#Display the results
$List
One option is to use the string method .Split() together with the [datetime] method TryParseExact(). Go through each file, add a "FromFileDate" property, and then sort on that.
$path = "C:\temp"
Get-ChildItem -Filter "*.txt" -Path $path | ForEach-Object{
    $date = ($_.BaseName).Split("_",2)[1]
    $result = New-Object DateTime
    if([datetime]::TryParseExact($date,"yyyyMMdd_HHmmss",[System.Globalization.CultureInfo]::InvariantCulture,[System.Globalization.DateTimeStyles]::None,[ref]$result)){
        # This is a good date
        Add-Member -InputObject $_ -MemberType NoteProperty -Name "FromFileDate" -Value $result -PassThru
    } Else {
        # Could not parse date from filename
        Add-Member -InputObject $_ -MemberType NoteProperty -Name "FromFileDate" -Value "Could not Parse" -PassThru
    }
} | Select-Object Name,fromfiledate | Sort-Object fromfiledate
We take the base name of each text file and split it into 2 parts at the first underscore. Using TryParseExact we then attempt to parse the "date" string with the format "yyyyMMdd_HHmmss" (HH is the 24-hour specifier; hh would reject hours above 12). Since we use TryParseExact, the code continues even if a date cannot be parsed.
Sample Output
Name                       FromFileDate
----                       ------------
myfile_20150812_030949.txt 8/12/2015 3:09:49 AM
myfile_20150813_040949.txt 8/13/2015 4:09:49 AM
files.txt                  Could not Parse
If you didn't want the erroneous data in the output a simple Where-Object{$_.fromfiledate -is [datetime]} would remove those entries.
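One detail worth checking in any such format string is the hour specifier: in .NET custom date formats, hh is the 12-hour clock and HH the 24-hour clock. The sample files all have early-morning times, so either would parse them. The sketch below uses a made-up afternoon timestamp (an assumption, not one of the files from the question) to show the difference:

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$style   = [System.Globalization.DateTimeStyles]::None
$stamp   = '20150813_160949'   # hypothetical afternoon timestamp

$parsed = [datetime]::MinValue

# 12-hour specifier: hour 16 is out of range, so parsing fails.
$ok12 = [datetime]::TryParseExact($stamp, 'yyyyMMdd_hhmmss', $culture, $style, [ref]$parsed)

# 24-hour specifier: the same string parses and $parsed is filled in.
$ok24 = [datetime]::TryParseExact($stamp, 'yyyyMMdd_HHmmss', $culture, $style, [ref]$parsed)

"hh parsed: $ok12"
"HH parsed: $ok24 (hour = $($parsed.Hour))"
```

The first call returns False and the second returns True with an hour of 16.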

How do I filter directories with powershell on the amount of files contained

I am having trouble finding the correct syntax to filter my results so that only directories with a file count above a specified amount (600 in my case) are listed.
This is my code so far;
$server_dir= "D:\backup"
$export_dir= "C:\support\spcount.txt"
if($server_dir)
{
    $folders = Get-ChildItem $server_dir
    $output = @()
    foreach($folder in $folders)
    {
        $fname = $folder.Name
        $fpath = $folder.FullName
        $fcount = Get-ChildItem $fpath | Measure-Object | Select-Object -Expand Count
        $obj = New-Object psobject -Property @{FolderName = $fname; FileCount = $fcount} | Format-List;
        $output += $obj
    }
    #Output
    $output | Tee-Object -FilePath $export_dir | Format-list FileCount
}
I am getting results with this: it lists all child items within the backup dir. However, I need to filter it so that a directory is displayed and written to the text file only if it contains 600 or more files.
Can anybody help me please?
I am fairly new to PowerShell, so please pull me up if this code is not the greatest; I am always wanting to learn.
Thanks!
I think I found the issue. It's that Format-List statement at the end of your object creation statement. It pipes the newly created object through Format-List, and thus transforms it into something else.
$obj = New-Object psobject -Property @{FolderName = $fname; FileCount = $fcount} | Format-List
So if you remove that last bit, you'll get the object you expect
$obj = New-Object psobject -Property @{FolderName = $fname; FileCount = $fcount}
So when you use the where statement to filter, you'll actually have a FileCount property to filter on.
I detected it by running the $output through Get-Member which showed me it wasn't the object with the expected properties.
So basically, here's your code, including fixes:
if($server_dir)
{
    # *** Added the -directory flag, cause we don't need those pesky files ***
    $folders = Get-ChildItem $server_dir -directory
    $output = @()
    foreach($folder in $folders)
    {
        $fname = $folder.Name
        $fpath = $folder.FullName
        $fcount = Get-ChildItem $fpath | Measure-Object | Select-Object -Expand Count
        # *** Format-List was dropped here to avoid losing the objects ***
        $obj = New-Object psobject -Property @{FolderName = $fname; FileCount = $fcount}
        $output += $obj
    }
    # *** And now the filter and we're done ***
    $output | where -Property FileCount -ge 600 | Tee-Object -FilePath $export_dir | Format-list FileCount
}
Note also the -directory to get only folders with get-childitem, and the -ge 600 (greater than or equal) instead of -gt 599 which is just a bit more obvious.
Remember that the Format-* statements actually transform the data passed through them. So you should only use those at the end of the pipeline to show data on screen or dump it to a file.
Don't use it to transform the data you still want to work with later on.
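The effect is easy to see on a single in-memory object. This is only a sketch with a made-up folder name and count (assumptions for illustration); it filters the same object before and after Format-List:

```powershell
# Hypothetical object shaped like the ones built in the loop above.
$obj = New-Object psobject -Property @{ FolderName = 'demo'; FileCount = 650 }

# The raw object still has a FileCount property to filter on.
$kept = @($obj | Where-Object FileCount -ge 600)

# After Format-List the pipeline carries formatting records instead.
$formatted = @($obj | Format-List)
$lost = @($formatted | Where-Object FileCount -ge 600)

"Raw objects kept:       $($kept.Count)"
"Formatted objects kept: $($lost.Count)"

# List the types that Format-List actually emitted.
$formatted | ForEach-Object { $_.GetType().Name } | Select-Object -Unique
```

The raw object passes the filter, while none of the formatting records do, because they have no FileCount property.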
So in short you could do something like this to get that information.
Get-ChildItem C:\temp -Directory |
    Select Name,@{Label="Count";Expression={(Get-ChildItem $_.FullName -File -Recurse).Count}} |
    Where-Object{$_.Count -lt 10}
Let's see if we can incorporate that in your code. Your if statement is also kind of pointless: the variable contains a non-null, non-empty string, so the condition will always be True. I imagine you want the code to run only if the directory exists.
$server_dir= "D:\backup"
$export_dir= "C:\support\spcount.txt"
if(Test-Path $server_dir){
    Get-ChildItem $server_dir -Directory |
        Select Name,@{Label="Count";Expression={(Get-ChildItem $_.FullName -File -Recurse).Count}} |
        Where-Object{$_.Count -ge 600} |
        ConvertTo-Csv | Tee -File $export_dir | ConvertFrom-Csv
} Else {
    Write-Warning "$server_dir does not exist."
}
I'm still working on getting this to both file and screen with Tee; just a moment.
I see 2 ways to do this.
Filter it in your output like this:
$output | where -property FileCount -gt 599 | # ... your code to write to the output
Or not store it in the output array if it doesn't match the condition:
if ($fcount -gt 599) {
    $obj = New-Object psobject -Property @{FolderName = $fname; FileCount = $fcount} | Format-List;
    $output += $obj
}
