I have a folder A with 75000 files which are to be processed. I have 4 folders (B, C, D, E) alongside it, each of which can process 3000 files at a time.
I want a script to take 3000 files from A and put them in B. It should then take another 3000 files and put them in C, then D, and finally E.
Below is the code I have so far. It takes 10 files and moves them into B, but then it just sits forever without putting any files into C, D or E.
Is there a way to quit out of the EnumerateFiles section of the code? I just want the first X files it finds to get moved; I don't care how many files are in A.
Any ideas?
$dirBase = "\\networkDir\A\"
$dirProc1 = "\\networkDir\B\"
$dirProc2 = "\\networkDir\C\"
$dirProc3 = "\\networkDir\D\"
$dirProc4 = "\\networkDir\E\"
cd $dirBase
$directoryInfo1 = Get-ChildItem $dirProc1 | Measure-Object
$directoryInfo2 = Get-ChildItem $dirProc2 | Measure-Object
$directoryInfo3 = Get-ChildItem $dirProc3 | Measure-Object
$directoryInfo4 = Get-ChildItem $dirProc4 | Measure-Object
if ($directoryInfo1.count -eq 0) {
    MoveFiles $dirBase $dirProc1
}
if ($directoryInfo2.count -eq 0) {
    MoveFiles $dirBase $dirProc2
}
if ($directoryInfo3.count -eq 0) {
    MoveFiles $dirBase $dirProc3
}
if ($directoryInfo4.count -eq 0) {
    MoveFiles $dirBase $dirProc4
}
function MoveFiles([string]$srcDir, [string]$dest)
{
    $FileLimit = 10
    $Counter = 0
    [IO.Directory]::EnumerateFiles($srcDir) | Where-Object {$Counter -lt $FileLimit} | %{
        #Get-ChildItem $srcDir | Select-Object -first $FileLimit | %{
        Move-Item $_ -destination $dest
        $Counter++
    }
}
Should I be doing something like Get-ChildItem $dirProc1 | select -first 3000 instead?
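One likely fix (a sketch, untested against this share, and the function name is made up): Select-Object -First stops the upstream pipeline once it has the requested number of items (PowerShell 3.0 and later), so the remaining files in A are never enumerated. Note that the function is defined before it is called:
function Move-FirstN([string]$srcDir, [string]$dest, [int]$limit = 3000) {
    # -First stops the upstream enumeration after $limit items,
    # so the other ~70,000 files in A are never touched.
    Get-ChildItem -LiteralPath $srcDir -File |
        Select-Object -First $limit |
        Move-Item -Destination $dest
}
Move-FirstN $dirBase $dirProc1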
This code returns unique and shared lines between two files. Unfortunately, it runs forever if the files have 1 million lines. Is there a faster way to do this (e.g., -eq, -match, wildcard, Compare-Object), or are containment operators the optimal approach?
$afile = Get-Content (Read-Host "Enter 'A' file")
$bfile = Get-Content (Read-Host "Enter 'B' file")
$afile |
? { $bfile -notcontains $_ } |
Set-Content lines_ONLY_in_A.txt
$bfile |
? { $afile -notcontains $_ } |
Set-Content lines_ONLY_in_B.txt
$afile |
? { $bfile -contains $_ } |
Set-Content lines_in_BOTH_A_and_B.txt
As mentioned in my answer to a previous question of yours, -contains is a slow operation, particularly with large arrays.
For exact matches you could use Compare-Object and discriminate the output by side indicator:
Compare-Object $afile $bfile -IncludeEqual | ForEach-Object {
    switch ($_.SideIndicator) {
        '<=' { $_.InputObject | Add-Content 'lines_ONLY_in_A.txt' }
        '=>' { $_.InputObject | Add-Content 'lines_ONLY_in_B.txt' }
        '==' { $_.InputObject | Add-Content 'lines_in_BOTH_A_and_B.txt' }
    }
}
If that's still too slow try reading each file into a hashtable:
$afile = Get-Content (Read-Host "Enter 'A' file")
$ahash = @{}
$afile | ForEach-Object {
    $ahash[$_] = $true
}
and process the files like this:
$afile | Where-Object {
-not $bhash.ContainsKey($_)
} | Set-Content 'lines_ONLY_in_A.txt'
If that still doesn't help you need to identify the bottleneck (reading the files, comparing the data, doing multiple comparisons, ...) and proceed from there.
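As a rough way to find that bottleneck, each phase can be timed separately with Measure-Command (the file names here are placeholders):
Measure-Command { $afile = Get-Content 'a.txt'; $bfile = Get-Content 'b.txt' }        # reading
Measure-Command { $ahash = @{}; $afile | ForEach-Object { $ahash[$_] = $true } }      # building the lookup
Measure-Command { $bfile | Where-Object { -not $ahash.ContainsKey($_) } | Out-Null }  # comparing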
try this:
$All = @()
$All += Get-Content "c:\temp\a.txt" | %{[pscustomobject]@{Row=$_;File="A"}}
$All += Get-Content "c:\temp\b.txt" | %{[pscustomobject]@{Row=$_;File="B"}}
$All | group row | %{
    $InA = $_.Group.File.Contains("A")
    $InB = $_.Group.File.Contains("B")
    if ($InA -and $InB)
    {
        $_.Group.Row | select -unique | Out-File c:\temp\lines_in_A_And_B.txt -Append
    }
    elseif ($InA)
    {
        $_.Group.Row | select -unique | Out-File c:\temp\lines_Only_A.txt -Append
    }
    else
    {
        $_.Group.Row | select -unique | Out-File c:\temp\lines_Only_B.txt -Append
    }
}
Full code for the best option (@Ansgar Wiechers's hashtable approach): A-only, B-only, and shared A/B lines:
$afile = Get-Content (Read-Host "Enter 'A' file")
$ahash = @{}
$afile | ForEach-Object {
    $ahash[$_] = $true
}
$bfile = Get-Content (Read-Host "Enter 'B' file")
$bhash = @{}
$bfile | ForEach-Object {
    $bhash[$_] = $true
}
$afile | Where-Object {
    -not $bhash.ContainsKey($_)
} | Set-Content 'lines_ONLY_in_A.txt'
$bfile | Where-Object {
    -not $ahash.ContainsKey($_)
} | Set-Content 'lines_ONLY_in_B.txt'
$afile | Where-Object {
    $bhash.ContainsKey($_)
} | Set-Content 'lines_in_BOTH_A_and_B.txt'
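A variation on the same hashtable idea (my sketch, not part of the answer above; the ::new() syntax needs PowerShell 5.0 or later): a generic HashSet gives the same constant-time lookups and can be filled straight from the file contents.
# Hypothetical variant using a .NET HashSet instead of a hashtable of booleans.
$bset = [System.Collections.Generic.HashSet[string]]::new([string[]]$bfile)
$afile | Where-Object { -not $bset.Contains($_) } | Set-Content 'lines_ONLY_in_A.txt'
$afile | Where-Object { $bset.Contains($_) } | Set-Content 'lines_in_BOTH_A_and_B.txt'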
Considering my suggestion to do a binary search, I have created a reusable Search-SortedArray function for this:
Description
The Search-SortedArray function (alias Search) performs a binary search for a string in a sorted array. If the string is found, the index of the string in the array is returned; otherwise, $Null is returned.
Function Search-SortedArray ([String[]]$SortedArray, [String]$Find, [Switch]$CaseSensitive) {
    $l = 0; $r = $SortedArray.Count - 1
    While ($l -le $r) {
        $m = [int](($l + $r) / 2)
        Switch ([String]::Compare($Find, $SortedArray[$m], !$CaseSensitive)) {
            -1      {$r = $m - 1}
             1      {$l = $m + 1}
            Default {Return $m}
        }
    }
}; Set-Alias Search Search-SortedArray
$afile |
? {(Search $bfile $_) -eq $Null} |
Set-Content lines_ONLY_in_A.txt
$bfile |
? {(Search $afile $_) -eq $Null} |
Set-Content lines_ONLY_in_B.txt
$afile |
? {(Search $bfile $_) -ne $Null} |
Set-Content lines_in_BOTH_A_and_B.txt
Note 1: Due to the overhead, a binary search will only give an advantage with (very) large arrays.
Note 2: The array has to be sorted, otherwise the result will be unpredictable.
Note 3: The search doesn't account for duplicates. In case of duplicate values, just one index will be returned (which isn't a concern for this specific question).
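As a side note (not part of the answer above), .NET also ships a built-in binary search that can stand in for the custom function, provided the array is sorted with the same comparison rules that BinarySearch uses by default; a negative result means the string was not found:
# Sketch: [Array]::BinarySearch instead of Search-SortedArray.
$bsorted = [string[]]($bfile | Sort-Object)
$afile |
    ? { [Array]::BinarySearch($bsorted, [string]$_) -lt 0 } |
    Set-Content lines_ONLY_in_A.txt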
Added 2017-11-07 based on the comment from @Ansgar Wiechers:
Quick benchmark with 2 files with a couple thousand lines each (including duplicate lines): binary search: 2400ms; compare-object: 1850ms; hashtable lookup: 250ms
The idea is that the binary search pays off in the long run: the larger the arrays, the more it gains proportionally in performance.
Taking $afile | ? { $bfile -notcontains $_ } as an example, using the performance measurements in the comment and assuming that "a couple thousand lines" means 3000 lines:
For a standard search, you will need an average of 1500 iterations in the $bfile:*1
(3000 + 1) / 2 = 3001 / 2 = 1500
For a binary search, you will need an average of 6.27 iterations in the $bfile:
(log2 3000 + 1) / 2 = (11.55 + 1) / 2 = 6.27
In both situations you do this 3000 times (for each item in $afile)
This means that each single iteration takes:
For a standard search: 250ms / 1500 / 3000 = 56 nanoseconds
For a binary search: 2400ms / 6.27 / 3000 = 127482 nanoseconds
The breakeven point will be at about:
56 * ((x + 1) / 2 * 3000) = 127482 * ((log2 x + 1) / 2 * 3000)
which is (according to my calculations) at about 40000 entries.
*1 presuming that a hashtable lookup doesn’t do a binary search itself as it is unaware that the array is sorted
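A quick numeric check of that estimate, plugging in the per-iteration costs derived above (it lands in the same ballpark):
# Find the array size where the modeled binary-search cost drops below the linear cost.
$linearNs = 56; $binaryNs = 127482
for ($x = 1000; $x -le 100000; $x += 1000) {
    $linearCost = $linearNs * (($x + 1) / 2)
    $binaryCost = $binaryNs * (([Math]::Log($x, 2) + 1) / 2)
    if ($binaryCost -lt $linearCost) { "Breakeven at roughly $x entries"; break }
}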
Added 2017-11-07
Conclusion from the comments: hash tables use similarly efficient associative-array algorithms that can't be outperformed with low-level programming commands.
Hi all. I'm stuck. I have a PowerShell script which looks in a specific folder for files that are older than 30 days based on the last modified date (additionally, it will create the folder if it doesn't exist). It creates the folder, it gives me the total file count, and it will list all of the files in a test query, but it won't actually count the number of 30+ day old files. I've tried several methods to get this count (some derived from other solutions on this site for deleting old files), but PowerShell just doesn't want to do it.
Here's my code so far...
$HomePath = $env:USERPROFILE
$CompanyFolder = "\Company"
$TimeSensativeFolder = "\TimeSensative"
$TimeSensativePath = $HomePath+$CompanyFolder+$TimeSensativeFolder
$OldFilesAmount = 0
$TotalFilesAmount = 0
$TimeLimit = (Get-Date).AddDays(-30)
$StatusOK = "No old files were found in the time sensative folder."
$StatusCreated = "The time sensative folder was created."
$StatusError1 = "There were old files found in the time sensative folder!"
$StatusError2 = "Unable to create the time sensative folder!"
function MakeTimeSensativeFolder ($TimeSensativePath) {
try {
md $TimeSensativePath -Force -ErrorAction Stop
Write-Host $StatusCreated
}
catch {
Write-Host $StatusError2
}
}
function CountOldFiles () {
$OldFilesAmount = $OldFilesAmount + 1
}
if(!(Test-Path $TimeSensativePath -PathType Container)) {
MakePHIFolder $TimeSensativePath
}
else {
}
try {
$TotalFilesAmount = (Get-ChildItem $PHIPath -Recurse -File | Measure-Object).Count
# I've tried this...
Get-Item $PHIPath | Foreach {$_.LastWriteTime} -ErrorAction Stop
if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $TimeLimit}) {
CountOldFiles
}
# And I've tried this...
Get-ChildItem -Path $PHIPath -Recurse -File | Foreach-Object {
if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $TimeLimit}) {
CountOldFiles
}
}
# I've even tried this...
Get-ChildItem $PHIPath -Recurse -File | ? {
-not $_.PSIsContainer -and $_.LastWriteTime -lt $TimeLimit
} | CountOldFiles
# And this, as well...
Get-ChildItem -Path $PHIPath -Recurse -File | Where-Object {$_.LastWriteTime -gt $TimeLimit} | CountOldFiles
}
catch {
MakeTimeSensativeFolder $TimeSensativePath
}
# Used for testing.
<#
Get-ChildItem $TimeSensativePath -Recurse -File
Write-Host "TimeSensative folder exists:" $TimeSensativePathExists
Write-Host "Home TimeSensative path:" $TimeSensativePath
Write-Host "Old files found:" $OldFilesAmount
Write-Host "Total files found:" $TotalFilesAmount
Exit
#>
# Determining proper grammar for status message based on old file count.
if ($OldFilesAmount -eq 1) {
$StatusError1 = "There was "+$OldFilesAmount+" old file of "+$TotalFilesAmount+" total found in the PHI folder!"
}
if ($OldFilesAmount -gt 1) {
$StatusError1 = "There were "+$OldFilesAmount+" old files of "+$TotalFilesAmount+" total found in the PHI folder!"
}
# Give statuses.
if ($OldFilesAmount -gt 0) {
Write-Host $StatusError1
}
else {
Write-Host $StatusOK
}
Depending on which I tried, I would get no result or I'd get something like this:
Get-Content : Cannot find drive. A drive with the name '12/22/2016 17' does not exist.
At C:\Users\nobody\Scripts\PS1\ts_file_age.ps1:54 char:14
+ if (Get-Content $_.LastWriteTime | Where-Object {$_ -gt $Tim ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (12/22/2016 17:String) [Get-Content], DriveNotFoundException
+ FullyQualifiedErrorId : DriveNotFound,Microsoft.PowerShell.Commands.GetContentCommand
In every case, there's no old-file count like I'm trying to get.
It's been a bit of a head scratcher. Any advice?
Thanks so much in advance!
Filtering files by last write time is easy enough, like so:
$allFiles = gci
$d = (Get-Date).adddays(-30)
$newFiles = @()
$oldFiles = @()
$allFiles | % { if ($_.lastwritetime -ge $d) { $newFiles +=$_ } else { $oldFiles += $_ } }
What's done here is that first all the files are read into a collection. This isn't mandatory, but one can browse the collection to check that it's been populated properly. This is useful in cases where one has complex paths or exclusion filters.
The second step is just to get a DateTime that is used later to divide the files into old and new ones. Just like the sample did, so nothing interesting here. Actually, there's one little thing: the date is -30 days, but hours, minutes and seconds are based on the current time. So if there's a really tight limit, consider using midnight as the base time: ([datetime]::Today).AddDays(-30)
The third step is to declare two empty collections for new and old files.
The last step is to iterate through $allFiles and check the last write time. If it's greater than or equal to the cutoff, add the file to $newFiles, otherwise to $oldFiles.
After the last step, further processing should be simple enough.
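If all that's needed is the old-file count from the question, the same filter condenses to one line (a sketch reusing $allFiles and $d from above):
# Number of files last written more than 30 days ago.
$OldFilesAmount = @($allFiles | Where-Object { $_.LastWriteTime -lt $d }).Count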
This is what I do to get (delete in this case) files older than X days:
$Days = 5
$limit = (Get-Date).AddDays(-$Days)
$CurrentDate = Get-Date
#This will delete all files older than 5 days
Get-ChildItem -Path $Workdir -Recurse -Force | Where-Object { !$_.PSIsContainer -and $_.LastWriteTime -lt $limit } | Remove-Item -Force
Either a PowerShell or a batch script will work. I want to distribute every N files from directory A to directories B1, B2, B3, etc.
Example:
C:\a (has 9 .jpg files)
file1.jpg
file2.jpg
...
file9.jpg
Then C:\b1, C:\b2, C:\b3 should have 3 files each. It should create the C:\b* directories as well.
So far I came up with this code; it runs fine but copies ALL the files from directory A to directory B:
$sourceFolder = "C:\a"
$destinationFolder = "C:\b"
$maxItems = 9
Get-Childitem $sourceFolder\*.jpg | ForEach-Object {Select-Object -First $maxItems | Robocopy $sourceFolder $destinationFolder /E /MOV}
This also works; it will calculate how many new folders should be created.
$excludealreadycopieditems = @()
$sourcefolder = "C:\a"
$destinationFolder = "C:\b"
$maxitemsinfolder = 3
#Calculate how many folders should be created:
$folderstocreate = [math]::Ceiling((get-childitem $sourcefolder\*.jpg).count / $maxitemsinfolder)
#For loop for the process
for ($i = 1; $i -lt $folderstocreate + 1; $i++)
{
    #Create the new folders:
    New-Item -ItemType directory $destinationFolder$i
    #Copy the items (if moving instead of copying, use Move-Item)
    get-childitem $sourcefolder\*.jpg -Exclude $excludealreadycopieditems | sort-object name | select -First $maxitemsinfolder | Copy-Item -Destination $destinationFolder$i ;
    #Exclude the already copied items:
    $excludealreadycopieditems = $excludealreadycopieditems + (get-childitem $destinationFolder$i\*.jpg | select -ExpandProperty name)
}
Something like this should do:
$cnt = 0
$i = 1
Get-ChildItem "$sourceFolder\*.jpg" | % {
    if ($script:cnt -ge $maxItems) {
        $script:i++
        $script:cnt = 0
    }
    $dst = "$destinationFolder$script:i"
    if (-not (Test-Path -LiteralPath $dst)) {
        New-Item $dst -Type Directory | Out-Null
    }
    Copy-Item $_.FullName $dst
    $script:cnt++
}
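Note that this snippet assumes $sourceFolder, $destinationFolder and $maxItems are already defined; with the values from the question, that would look like:
$sourceFolder      = 'C:\a'
$destinationFolder = 'C:\b'   # folders C:\b1, C:\b2, ... are created as needed
$maxItems          = 3        # files per destination folder
Swap Copy-Item for Move-Item if the files should be moved rather than copied.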
I have a PowerShell 2.0 script that I use to delete folders that have no files in them:
dir 'P:\path\to\wherever' -recurse | Where-Object { $_.PSIsContainer } | Where-Object { $_.GetFiles().Count -eq 0 } | foreach-object { remove-item $_.fullname -recurse}
However, I noticed that there were a ton of errors when running the script. Namely:
Remove-Item : Directory P:\path\to\wherever cannot be removed because it is not empty.
"WHAT?!" I panicked. They should all be empty! I filter for only empty folders! Apparently that's not quite how the script is working. In this scenario a folder that has only folders as children, but files as grandchildren is considered empty of files:
Folder1 (no files - 1 folder) \ Folder 2 (one file)
In that case, PowerShell sees Folder1 as being empty and tries to delete it. The reason this puzzles me is because if I right-click on Folder1 in Windows Explorer It says that Folder1 has 1 folder and 1 file within it. Whatever is used to calculate the child objects underneath Folder1 from within Explorer allows it to see grandchild objects ad infinitum.
Question:
How can I make my script not consider a folder empty if it has files as grandchildren or beyond?
Here's a recursive function I used in a recent script...
function DeleteEmptyDirectories {
    param([string] $root)
    [System.IO.Directory]::GetDirectories("$root") |
        % {
            DeleteEmptyDirectories "$_";
            if ([System.IO.Directory]::GetFileSystemEntries("$_").Length -eq 0) {
                Write-Output "Removing $_";
                Remove-Item -Force "$_";
            }
        };
}
DeleteEmptyDirectories "P:\Path\to\wherever";
Updating for recursive deletion:
You can use a nested pipeline like below:
dir -recurse | Where {$_.PSIsContainer -and `
    @(dir -Lit $_.Fullname -r | Where {!$_.PSIsContainer}).Length -eq 0} |
    Remove-Item -recurse -whatif
(from here - How to delete empty subfolders with PowerShell?)
Add a ($_.GetDirectories().Count -eq 0) condition too:
dir path -recurse | Where-Object { $_.PSIsContainer } | Where-Object { ($_.GetFiles().Count -eq 0) -and ($_.GetDirectories().Count -eq 0) } | Remove-Item
Here is a more succinct way of doing this though:
dir path -recurse | where {!@(dir -force $_.fullname)} | rm -whatif
Note that you do not need the Foreach-Object while doing remove item. Also add a -whatif to the Remove-Item to see if it is going to do what you expect it to.
There were some issues in making this script, one of them being using this to check if a folder is empty:
{!$_.PSIsContainer}).Length -eq 0
However, I discovered that empty folders are not sized at 0 but rather NULL. The following is the PowerShell script that I will be using. It is not my own; it is from PowerShell MVP Richard Siddaway. You can see the thread that this function comes from on PowerShell.com.
function remove-emptyfolder {
    param ($folder)
    foreach ($subfolder in $folder.SubFolders){
        $notempty = $false
        if (($subfolder.Files | Measure-Object).Count -gt 0){$notempty = $true}
        if (($subfolder.SubFolders | Measure-Object).Count -gt 0){$notempty = $true}
        if ($subfolder.Size -eq 0 -and !$notempty){
            Remove-Item -Path $($subfolder.Path) -Force -WhatIf
        }
        else {
            remove-emptyfolder $subfolder
        }
    }
}
$path = "c:\test"
$fso = New-Object -ComObject "Scripting.FileSystemObject"
$folder = $fso.GetFolder($path)
remove-emptyfolder $folder
You can use a recursive function for this. I actually have already written one:
cls
$dir = "C:\MyFolder"
Function RecurseDelete()
{
    param (
        [string]$MyDir
    )
    IF (!(Get-ChildItem -Recurse $mydir | Where-Object {$_.length -ne $null}))
    {
        Write-Host "Deleting $mydir"
        Remove-Item -Recurse $mydir
    }
    ELSEIF (Get-ChildItem $mydir | Where-Object {$_.length -eq $null})
    {
        ForEach ($sub in (Get-ChildItem $mydir | Where-Object {$_.length -eq $null}))
        {
            Write-Host "Checking $($sub.fullname)"
            RecurseDelete $sub.fullname
        }
    }
    ELSE
    {
        IF (!(Get-ChildItem $mydir))
        {
            Write-Host "Deleting $mydir"
            Remove-Item $mydir
        }
    }
}
IF (Test-Path $dir) {RecurseDelete $dir}
How can I get a du-ish analysis using PowerShell? I'd like to periodically check the size of directories on my disk.
The following gives me the size of each file in the current directory:
foreach ($o in gci)
{
Write-output $o.Length
}
But what I really want is the aggregate size of all files in the directory, including subdirectories. Also I'd like to be able to sort it by size, optionally.
There is an implementation available at the "Exploring Beautiful Languages" blog:
"An implementation of 'du -s *' in Powershell"
function directory-summary($dir=".") {
    get-childitem $dir |
        % { $f = $_ ;
            get-childitem -r $_.FullName |
                measure-object -property length -sum |
                select @{Name="Name";Expression={$f}},Sum }
}
(Code by the blog owner: Luis Diego Fallas)
Output:
PS C:\Python25> directory-summary
Name Sum
---- ---
DLLs 4794012
Doc 4160038
include 382592
Lib 13752327
libs 948600
tcl 3248808
Tools 547784
LICENSE.txt 13817
NEWS.txt 88573
python.exe 24064
pythonw.exe 24576
README.txt 56691
w9xpopen.exe 4608
I modified the command in the answer slightly to sort descending by size and include size in MB:
gci . |
    %{$f=$_; gci -r $_.FullName |
        measure-object -property length -sum |
        select @{Name="Name"; Expression={$f}},
               @{Name="Sum (MB)";
                 Expression={"{0:N3}" -f ($_.sum / 1MB) }}, Sum } |
    sort Sum -desc |
    format-table -Property Name,"Sum (MB)", Sum -autosize
Output:
PS C:\scripts> du
Name Sum (MB) Sum
---- -------- ---
results 101.297 106217913
SysinternalsSuite 56.081 58805079
ALUC 25.473 26710018
dir 11.812 12385690
dir2 3.168 3322298
Maybe it is not the most efficient method, but it works.
If you only need the total size of that path, one simplified version can be:
Get-ChildItem -Recurse ${HERE_YOUR_PATH} | Measure-Object -Sum Length
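As a small follow-up sketch (the path is a placeholder, and -File needs PowerShell 3.0 or later), the summed bytes can then be formatted as megabytes:
$total = (Get-ChildItem -Recurse -File 'C:\some\path' | Measure-Object -Sum Length).Sum
'{0:N2} MB' -f ($total / 1MB)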
function Get-DiskUsage ([string]$path=".") {
    $groupedList = Get-ChildItem -Recurse -File $path |
        Group-Object directoryName |
        select name,@{name='length'; expression={($_.group | Measure-Object -sum length).sum } }
    foreach ($dn in $groupedList) {
        New-Object psobject -Property @{ directoryName=$dn.name; length=($groupedList | where { $_.name -like "$($dn.name)*" } | Measure-Object -Sum length).sum }
    }
}
Mine is a bit different; I group all of the files on directoryname, then walk through that list building totals for each directory (to include the subdirectories).
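A usage sketch for that function, largest directories first (C:\scripts is just the sample path shown earlier):
Get-DiskUsage C:\scripts |
    Sort-Object length -Descending |
    Format-Table directoryName, length -AutoSize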
Building on previous answers, this will work for those that want to show sizes in KB, MB, GB, etc., and still be able to sort by size. To change units, just change "MB" to desired units in both "Name=" and "Expression=". You can also change the number of decimal places to show (rounding), by changing the "2".
function du($path=".") {
Get-ChildItem $path |
ForEach-Object {
$file = $_
Get-ChildItem -File -Recurse $_.FullName | Measure-Object -Property length -Sum |
Select-Object -Property #{Name="Name";Expression={$file}},
#{Name="Size(MB)";Expression={[math]::round(($_.Sum / 1MB),2)}} # round 2 decimal places
}
}
This gives the size as a number not a string (as seen in another answer), therefore one can sort by size. For example:
PS C:\Users\merce> du | Sort-Object -Property "Size(MB)" -Descending
Name Size(MB)
---- --------
OneDrive 30944.04
Downloads 401.7
Desktop 335.07
.vscode 301.02
Intel 6.62
Pictures 6.36
Music 0.06
Favorites 0.02
.ssh 0.01
Searches 0
Links 0
My own take using the previous answers:
function Format-FileSize([int64] $size) {
    if ($size -lt 1024)
    {
        return $size
    }
    if ($size -lt 1Mb)
    {
        return "{0:0.0} Kb" -f ($size/1Kb)
    }
    if ($size -lt 1Gb)
    {
        return "{0:0.0} Mb" -f ($size/1Mb)
    }
    return "{0:0.0} Gb" -f ($size/1Gb)
}
function du {
    param(
        [System.String]
        $Path=".",
        [switch]
        $SortBySize,
        [switch]
        $Summary
    )
    $Path = (Get-Item $Path).FullName
    $groupedList = Get-ChildItem -Recurse -File $Path |
        Group-Object directoryName |
        select name,@{name='length'; expression={($_.group | Measure-Object -sum length).sum } }
    $results = ($groupedList | % {
        $dn = $_
        if ($summary -and ($path -ne $dn.name)) {
            return
        }
        $size = ($groupedList | where { $_.name -like "$($dn.name)*" } | Measure-Object -Sum length).sum
        New-Object psobject -Property @{
            Directory = $dn.name;
            Size      = Format-FileSize($size);
            Bytes     = $size
        }
    })
    if ($SortBySize) {
        $results = $results | sort-object -property Bytes
    }
    $results | more
}
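A usage sketch for this version (the path is a placeholder): -SortBySize orders the output by the raw byte count, and -Summary reports only the total for the path itself.
du 'C:\scripts' -SortBySize
du 'C:\scripts' -Summary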