For PowerShell 2.0 on Windows Server 2008,
I need to find the newest file in a directory containing about 1.6 million files.
I know I can use Get-ChildItem like so:
$path="G:\Calls"
$filter='*.wav'
$lastFile = Get-ChildItem -Recurse -Path $path -Include $filter | Sort-Object -Property LastWriteTime | Select-Object -Last 1
$lastFile.Name
$lastFile.LastWriteTime
The issue is that it takes very long to find the newest file due to the sheer number of files.
Is there a faster way to find that?
Sort-Object is slow here because it has to collect every item before it can sort anything.
But you don't need a full sort: you can stream over each file and just keep track of the latest one seen so far:
Get-ChildItem -Recurse |ForEach-Object `
-Begin { $Newest = $Null } `
-Process { if ($_.LastWriteTime -gt $Newest.LastWriteTime) { $Newest = $_ } } `
-End { $Newest }
There are a couple of things that can be done to improve performance.
First, use -Filter rather than -Include, because the filter is passed to the underlying Win32 API, which is a bit faster.
Also, because the script gathers all the files and then sorts them, you may be creating a very large memory footprint during the sorting phase. I don't know if it's possible to query the MFT or use some other mechanism that avoids retrieving each file and inspecting its LastWriteTime, but an alternative approach could be:
gci -rec -file -filter *.wav | %{$v = $null}{if ($_.lastwritetime -gt $v.lastwritetime){$v=$_}}{$v}
(Note that the -File switch requires PowerShell 3.0; on 2.0, pipe through where {!$_.PSIsContainer} instead.)
I tried this with all files and saw the following:
measure-command{ ls -rec -file |sort lastwritetime|select -last 1}
. . .
TotalSeconds : 142.1333641
vs
measure-command { gci -rec -file | %{$v = $null}{if ($_.lastwritetime -gt $v.lastwritetime){$v=$_}}{$v} }
. . .
TotalSeconds : 87.7215093
which is a pretty good saving. There may be additional ways to improve performance.
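One further option, if the .NET 4 runtime is available to your PowerShell session (it is not by default under PowerShell 2.0, so treat this as an untested sketch): [System.IO.Directory]::EnumerateFiles streams path strings instead of building FileInfo objects, which keeps the memory footprint flat while you track the newest timestamp yourself:
$newestPath = $null
$newestTime = [DateTime]::MinValue
foreach ($f in [System.IO.Directory]::EnumerateFiles('G:\Calls', '*.wav', 'AllDirectories')) {
    # Only the timestamp is needed, so no FileInfo objects are created
    $t = [System.IO.File]::GetLastWriteTime($f)
    if ($t -gt $newestTime) { $newestTime = $t; $newestPath = $f }
}
"$newestPath  $newestTime"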
I have a file transfer/sync job that is copying files from the main network into a totally secure network using a custom protocol (ie no SMB). The problem is that because I can't look back to see what files exist, the destination is filling up, as the copy doesn't remove any files it hasn't touched (like robocopy MIR does).
Initially I wrote a script that:
1. Opens the log file and grabs the file paths out (this is quite quick and painless)
2. Does a Get-ChildItem on the destination folder (now using dir /s /b as it's way faster than gci)
3. Compares the two, and then removes the differences.
The problem is that there are more jobs that require this clean-up, but the log files are 100 MB and the folders contain 600,000 files, so it's taking ages and using tons of memory. I actually have yet to see one finish. I'd really like some ideas on how to make this faster (memory/CPU use doesn't bother me too much, but speed is essential).
$destinationMatch = "//server/fileshare/folder/"
The log file contains some headers and footers and then 600,000 lines like this one:
"//server/fileshare/folder/dummy/deep/tags/20140826/more_stuff/Deeper/2012-07-02_2_0.dat_v2" 33296B 0B completed
Here's the script:
[CmdletBinding(SupportsShouldProcess=$True)]
param(
[Parameter(Mandatory=$True)]
[String]$logName,
[Parameter(Mandatory=$True)]
[String]$destinationMatch
)
$logPath = [string]("C:\Logs\" + $logName)
$manifestFile = gci -Path $logPath | where {$_.name -match "manifest"} | sort creationtime -descending | select Name -first 1
$manifestFileName = [string]$manifestFile.name
$manifestFullPath = $logPath + "\" + $manifestFileName
$copiedList = @()
(gc $manifestFullPath -ReadCount 0) | where {$_.trim() -match $DestinationMatch} | % {
if ( $_ -cmatch '(?<=")[^"]*(?=")' ){
$copiedList += ($matches[0]).replace("/","\")
}
}
$dest = $destinationMatch.replace("/","\")
$actualPathString = (gci -Path $dest -Recurse | select fullname).fullname
Compare-Object -ReferenceObject $copiedList -DifferenceObject $actualPathString -PassThru | % {
$leaf = Split-Path $_ -leaf
if ($leaf.contains(".")){
$fsoData = gci -Path $_
if (!($fsoData.PSIsContainer)){
Remove-Item $_ -Force
}
}
}
$actualDirectory | where {$_.PSIsContainer -and @(gci -LiteralPath $_.FullName -Recurse -WarningAction SilentlyContinue -ErrorAction SilentlyContinue | where {!$_.PSIsContainer}).Length -eq 0} | remove-item -Recurse -Force
Ok, so let's assume that your file copy preserves the last modified date/time stamp. If you really need to pull a directory listing, and compare it against a log, I think you're doing a decent job of it. The biggest slow down is obviously going to be pulling your directory listing. I'll address that shortly. For right now I would propose the following modification of your code:
[CmdletBinding(SupportsShouldProcess=$True)]
param(
[Parameter(Mandatory=$True)]
[String]$logName,
[Parameter(Mandatory=$True)]
[String]$destinationMatch
)
$logPath = [string]("C:\Logs\" + $logName)
$manifestFile = gci -Path $logPath | where {$_.name -match "manifest"} | sort creationtime -descending | select -first 1
$RegExPattern = [regex]::escape($DestinationMatch)
$FilteredManifest = gc $manifestFile.FullName | where {$_ -match "`"($RegexPattern[^`"]*)`""} | %{$matches[1] -replace '/','\'}
$dest = $destinationMatch.replace("/","\")
$DestFileList = gci -Path $dest -Recurse | select Fullname,Attributes
$DestFileList | Where {$FilteredManifest -notcontains $_.FullName -and $_.Attributes -notmatch "Directory"} | ForEach-Object {Remove-Item -LiteralPath $_.FullName -Force}
$DestFileList | Where {$FilteredManifest -notcontains $_.FullName -and $_.Attributes -match "Directory" -and @(gci -LiteralPath $_.FullName -Recurse -WarningAction SilentlyContinue -ErrorAction SilentlyContinue).Length -eq 0} | ForEach-Object {Remove-Item -LiteralPath $_.FullName -Recurse -Force}
This stops you from duplicating effort. There's no need to get your manifest file and then assign different variables to different properties of the file object; just reference them directly. Then, later, when you pull your directory listing of the drive (the slow part here), keep the full name and attributes of the files/folders. That way you can easily filter against Attributes to see what's a directory and what's not, deal with the files first, and clean up the directories afterwards.
That script should be a somewhat more streamlined version of yours. Now, about pulling that directory listing... Here's the deal: using Get-ChildItem is going to be slower than some alternatives (such as dir /s /b), but it stops you from having to duplicate effort by later checking what's a file and what's a directory. I suppose if the actual files/folders you are concerned with are a small percentage of the total, then the double work may be worth the time and effort: pull the list with something like dir /s /b, parse it against the log, and only pull folder/file info for the specific items you need to address.
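For completeness, here is a rough, untested sketch of that dir /s /b route; it assumes $FilteredManifest and $dest have already been built exactly as in the script above:
# cmd.exe enumerates the destination much faster than gci on ~600k files,
# at the cost of getting plain path strings back instead of objects.
$DestPaths = cmd /c dir /s /b $dest
# Anything on disk that the manifest never mentioned is a candidate for removal.
Compare-Object -ReferenceObject $FilteredManifest -DifferenceObject $DestPaths |
    Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object {
        # dir /s /b lists directories too; only remove actual files here.
        if (Test-Path -LiteralPath $_.InputObject -PathType Leaf) {
            Remove-Item -LiteralPath $_.InputObject -Force
        }
    }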
I have multiple machines uploading files to one FTP directory. The first part of the filename is the machine, the rest is a timestamp, e.g. AAAAA_20130312_125113.
Now I want to get a sorted list of all Unique machines that have uploaded to this directory.
I managed to write the list of all filenames.Substring(0,5) to the host, but I still don't have the unique machine names.
$files=Get-ChildItem $strMOVETO -Name -Include TAS*.csv -Recurse
ForEach ($i in $files) { Write-Host $i.Substring(0,5) }
Any hints on how to do this? It does not necessarily have to be a one-liner, although that would be a nice challenge ;-).
Thanks!
What happens when you have an 8-character machine name? Your substring will break. Since the machine name, date & time are delimited by an _, split on that & get the first item.
Get-ChildItem $strMOVETO -recurse -name -include TAS*.csv|%{$_.split("_")[0]}|sort-object -unique
To filter on date as well:
Get-ChildItem $strMOVETO -recurse -include TAS*.csv|where-object{$_.lastwritetime -ge (get-date).adddays(-1)}|%{$_.basename.split("_")[0]}|sort-object -unique
Not tested but something like this:
Get-ChildItem $strMOVETO -Name -Include TAS*.csv -Recurse | % { $_.Substring(0,5) } | Sort -Unique
You don't need to do the Write-Host inside the loop and it's easier to use % instead of a foreach loop.
Pipe the results of your command into a | sort -unique:
$files=Get-ChildItem $strMOVETO -Name -Include TAS*.csv -Recurse
$(ForEach ($i in $files) { $i.Substring(0,5) }) | sort -unique
...but better still would be to simplify the script...
$filter = "TAS*.csv"
Get-ChildItem -Path $strMOVETO -Filter $filter -Recurse | % {$_.BaseName.Substring(0,5) } | sort -unique
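If you also want to see how many uploads each machine has made, the same split combines naturally with Group-Object (a quick sketch along the same lines, not tested against your share):
# Group files by the machine prefix (text before the first underscore)
# and report how many uploads each machine has made.
Get-ChildItem -Path $strMOVETO -Filter TAS*.csv -Recurse |
    Group-Object { $_.BaseName.Split('_')[0] } |
    Sort-Object Name |
    Select-Object @{Name='Machine';Expression={$_.Name}}, Count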
Does anybody know a powershell 2.0 command/script to count all folders and subfolders (recursive; no files) in a specific folder ( e.g. the number of all subfolders in C:\folder1\folder2)?
In addition, I also need the number of all "leaf" folders; in other words, I only want to count folders which don't have subfolders.
In PowerShell 3.0 you can use the Directory switch:
(Get-ChildItem -Path <path> -Directory -Recurse -Force).Count
You can use get-childitem -recurse to get all the files and folders in the current folder.
Pipe that into Where-Object to filter it to only those files that are containers.
$files = get-childitem -Path c:\temp -recurse
$folders = $files | where-object { $_.PSIsContainer }
Write-Host $folders.Count
As a one-liner:
(get-childitem -Path c:\temp -recurse | where-object { $_.PSIsContainer }).Count
To answer the second part of your question, getting the leaf folder count, just modify the Where-Object clause to add a non-recursive search of each directory, keeping only those that return a count of 0:
(dir -rec | where-object{$_.PSIsContainer -and ((dir $_.fullname | where-object{$_.PSIsContainer}).count -eq 0)}).Count
It looks a little cleaner if you can use PowerShell 3.0:
(dir -rec -directory | where-object{(dir $_.fullname -directory).count -eq 0}).count
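If walking the tree twice is too slow on a big share, a single recursive pass can get both numbers by remembering which directories show up as the parent of another directory. This is just a sketch of the idea ($path is whatever root you're counting under), not something from the answers above:
$path = 'C:\folder1\folder2'
$dirs = @(Get-ChildItem -Path $path -Recurse -Force | Where-Object { $_.PSIsContainer })
# Remember every directory that appears as the parent of another directory
$hasChildDir = @{}
foreach ($d in $dirs) { $hasChildDir[$d.Parent.FullName] = $true }
# A leaf folder is one that never appeared as anyone's parent
$leafCount = @($dirs | Where-Object { -not $hasChildDir.ContainsKey($_.FullName) }).Count
"Total folders: $($dirs.Count)"
"Leaf folders:  $leafCount"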
Another option:
(ls -force -rec | measure -inp {$_.psiscontainer} -Sum).sum
This is a pretty good starting point:
(gci -force -recurse | where-object { $_.PSIsContainer }).Count
However, I suspect that this will include .zip files in the count. I'll test that and try to post an update...
EDIT: Have confirmed that zip files are not counted as containers. The above should be fine!
Get the path's child items with the -Recurse option, pipe that to Where-Object to keep only containers, then pipe again to Measure-Object to count the items:
((get-childitem -Path $the_path -recurse | where-object { $_.PSIsContainer }) | measure).Count
So, I've got a set of directories 00-99 in a folder. Each of those directories has 100 subdirectories, 00-99. Each of those subdirectories has thousands of images.
What I'm attempting to do is basically get a progress report while it's computing the average file size, but I can't get that to work. Here's my current query:
get-childitem <MyPath> -recurse -filter *.jpeg | Where-Object { Write-Progress "Examining File $($_.Fullname)" true } | measure-object -Property length -Average
This shows me a bar that updates as each of the files gets processed, but at the end I get back no average file size data. Clearly, I'm doing something wrong, because I figure trying to hack the Where-Object to print a progress statement is probably a bad idea(tm).
Since there are millions and millions of images, this query obviously takes a very long time to run. Get-ChildItem is pretty much going to be the bulk of the query time, if I understand things correctly. Any pointers to get what I want? Ideally, my result would be:
Starting...
Examining File: \00\00\Sample.jpeg
Examining File: \00\00\Sample2.jpeg
Examining File: \00\00\Sample3.jpeg
Examining File: \00\00\Sample4.jpeg
...
Examining File: \99\99\Sample9999.jpg
Average File Size: 12345678.244567
Edit: I can do the simple option of:
get-childitem <MyPath> -recurse -filter *.jpeg | measure-object -Property length -Average
And then just walk away from my workstation for a day and a half or something, but that seems a bit inefficient =/
Something like this?
get-childitem -recurse -filter *.exe |
%{Write-Host Examining file: $_.fullname; $_} |
measure-object -Property length -Average
A little more detailed progress:
$images = get-childitem -recurse -filter *.jpeg
$images | % -begin { $i=0 } `
-process { write-progress -activity "Computing average..." -status "Examining File: $($_.FullName) ($i of $($images.count))" -percentcomplete ($i/$images.count*100); $i+=1 } `
-end { write-output "Average file size is: $($images | measure-object -Property length -Average)" }
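If holding every FileInfo in $images is itself a concern with millions of files, a running total avoids building the collection at all. The trade-off is that you can't show a percentage (the total isn't known up front); this is only a sketch, and it updates the progress bar every 1000 files so the display doesn't become the bottleneck:
$sum = 0; $count = 0
Get-ChildItem <MyPath> -Recurse -Filter *.jpeg | ForEach-Object {
    # Accumulate size and count as files stream past, instead of collecting them
    $sum += $_.Length
    $count++
    if ($count % 1000 -eq 0) {
        Write-Progress -Activity "Computing average..." -Status "Examined $count files so far"
    }
}
Write-Progress -Activity "Computing average..." -Completed
"Average file size: $($sum / $count)"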
I am interested in file searching by custom properties. For example, I want to find all JPEG-images with certain dimensions. Something looks like
Get-ChildItem -Path C:\ -Filter *.jpg -Recurse | Where-Object { $_.Dimension -eq '1024x768' }
I suspect it involves using System.Drawing. How can it be done?
Thanks in advance
That's actually pretty easy to do and your gut feeling about System.Drawing was in fact correct:
Add-Type -Assembly System.Drawing
$input | ForEach-Object { [Drawing.Image]::FromFile($_) }
Save that as Get-Image.ps1 somewhere in your path and then you can use it.
Another option would be to add the following to your $profile:
Add-Type -Assembly System.Drawing
function Get-Image {
$input | ForEach-Object { [Drawing.Image]::FromFile($_) }
}
which works pretty much the same. Of course, add fancy things like documentation or so as you see fit.
You can then use it like so:
gci -inc *.jpg -rec | Get-Image | ? { $_.Width -eq 1024 -and $_.Height -eq 768 }
Note that you should dispose the objects created this way after using them.
Of course, you can add a custom Dimension property so you could filter for that:
function Get-Image {
$input |
ForEach-Object { [Drawing.Image]::FromFile($_) } |
ForEach-Object {
$_ | Add-Member -PassThru NoteProperty Dimension ('{0}x{1}' -f $_.Width,$_.Height)
}
}
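With the Dimension property in place, the filter from the question then works almost verbatim (assuming the Get-Image function above is loaded):
gci -inc *.jpg -rec | Get-Image | ? { $_.Dimension -eq '1024x768' }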
Here's an alternative implementation as an (almost) one-liner:
Add-Type -Assembly System.Drawing
Get-ChildItem -Path C:\ -Filter *.jpg -Recurse | ForEach-Object { [System.Drawing.Image]::FromFile($_.FullName) } | Where-Object { $_.Width -eq 1024 -and $_.Height -eq 768 }
If you are going to need to run this command more than once, I would recommend Johannes' more complete solution instead.
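As a rough illustration of the disposal point Johannes makes, the dimension check can be done inside Where-Object so each image is released as soon as its size has been read, and what comes out the other end is the matching files rather than Image objects:
Add-Type -Assembly System.Drawing
Get-ChildItem -Path C:\ -Filter *.jpg -Recurse | Where-Object {
    $img = [System.Drawing.Image]::FromFile($_.FullName)
    try     { $img.Width -eq 1024 -and $img.Height -eq 768 }
    finally { $img.Dispose() }   # release the file handle/GDI+ resources right away
}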