I was trying to measure some ways to write to files in PowerShell. That part is fine, but I don't understand why the first Measure-Command statement below takes longer to execute than the second one.
They are the same, except that in the second one I put the code in a script block and pass it to Invoke-Command, while in the first one I just run the command directly.
All the information I can find about Invoke-Command performance is about remoting.
This block takes about 4 seconds:
Measure-Command {
    $stream = [System.IO.StreamWriter] "$PSScriptRoot\t.txt"
    $i = 0
    while ($i -le 1000000) {
        $stream.WriteLine("This is the line number: $i")
        $i++
    }
    $stream.Close()
} # takes 4 sec
And this code below, which is exactly the same but written in a script block passed to Invoke-Command, takes about 1 second:
Measure-Command {
    $cmdtest = {
        $stream = [System.IO.StreamWriter] "$PSScriptRoot\t2.txt"
        $i = 0
        while ($i -le 1000000) {
            $stream.WriteLine("This is the line number: $i")
            $i++
        }
        $stream.Close()
    }
    Invoke-Command -ScriptBlock $cmdtest
} # Takes 1 second
How is that possible?
As it turns out, based on feedback from a PowerShell team member on GitHub issue #8911, the issue is more generally about (implicit) dot-sourcing (such as direct invocation of an expression) vs. running in a child scope, such as with &, the call operator, or, in the case at hand, with Invoke-Command -ScriptBlock.
Running in a child scope avoids variable lookups that are performed when (implicitly) dot-sourcing.
Therefore, as of Windows PowerShell v5.1 / PowerShell (Core) 7.2.x, you can speed up statements involving script blocks by invoking them via & { ... }, in a child scope (somewhat counter-intuitively, given that creating a new scope involves extra work).
Note that using & means that such blocks then cannot modify the caller's variables directly, but there are workarounds.
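For instance, the usual workaround (a minimal sketch) is to have the block output its result and assign it in the caller, rather than modifying a caller variable from inside the block:

$total = & {
    $sum = 0                               # local to the child scope
    foreach ($n in 1..100) { $sum += $n }
    $sum                                   # output it so the caller can capture it
}
# $total is now 5050, even though $sum never existed in the caller's scope.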
The following simplified code, which uses a foreach statement to loop 1 million times (1e6), demonstrates the performance advantage of running via & { ... }:
# REGULAR, direct invocation of an expression (a `foreach` statement in this case),
# which is implicitly DOT-SOURCED
(Measure-Command { $result = foreach ($n in 1..1e6) { $n } }).TotalSeconds
# OPTIMIZED invocation in CHILD SCOPE, using & { ... }
# up to 10+ TIMES FASTER, depending on OS and PowerShell edition
(Measure-Command { $result = & { foreach ($n in 1..1e6) { $n } } }).TotalSeconds
However, note that the performance advantage diminishes, and can even disappear, the more preexisting variables the script block references:
# Define a few sample variables to reference in the script blocks.
# Note that, due to PowerShell's dynamic scoping, even the child
# scope created by & { ... } sees these variables.
$i1=1; $i2=2; $i3=3; $i4=4; $i5=5
(Measure-Command { $result = foreach ($n in 1..1e6) { $n, $i1, $i2, $i3, $i4, $i5 } }).TotalSeconds
# MAY OR MAY NOT BE FASTER, depending on the OS and PowerShell edition.
(Measure-Command { $result = & { foreach ($n in 1..1e6) { $n, $i1, $i2, $i3, $i4, $i5 } } }).TotalSeconds
The reason is that variables that aren't created in the script block (by assigning to them inside it) require a variable lookup with & { ... } too, due to PowerShell's dynamic scoping (see this answer).
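Following that logic, one thing that may be worth measuring on your own system (a sketch, not a guaranteed win; $l1 through $l5 are hypothetical block-local names introduced for this illustration) is copying the preexisting variables into block-local ones once, up front, so that the repeated accesses inside the loop are local-scope lookups:

(Measure-Command {
    $result = & {
        # One-time lookups of the caller's variables, copied into locals
        $l1, $l2, $l3, $l4, $l5 = $i1, $i2, $i3, $i4, $i5
        foreach ($n in 1..1e6) { $n, $l1, $l2, $l3, $l4, $l5 }
    }
}).TotalSeconds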
Related
I want to create a function that takes an array of PCs and checks whether the PC name passed as a parameter matches a PC name in the array at index $i. But when I create this function it has a bug: $nomPeripherique is empty, even though I call the function with "DESKTOP-QVRFEN4".
$tableauPeripherique = @("DESKTOP-QVRFEN4","DESKTOP-QVRETTD")
function Redemarrer($tableauPeripherique, $nomPeripherique) {
    for ($i = 0; $i -lt $tableauPeripherique.Length; $i++)
    {
        if ($tableauPeripherique[$i] -eq $nomPeripherique)
        {
            Restart-Computer -ComputerName $nomPeripherique
        }
        else
        {
            echo "nom de peripherique n'est pas présent dans la liste"
        }
    }
}
Redemarrer($tableauPeripherique,"DESKTOP-QVRFEN4")
You are calling the function incorrectly. This is a common mistake for new PowerShell users, since in many programming languages a function call has the arguments stated parenthetically after the function name.
The call should look like:
Redemarrer $tableauPeripherique "DESKTOP-QVRFEN4"
No comma and no parentheses. The way you are calling it, you're passing a single two-element array ($tableauPeripherique plus the string) as the first argument, so there's no second argument to compare against inside the loop.
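Alternatively, naming the parameters makes the intent explicit and sidesteps the trap entirely:

Redemarrer -tableauPeripherique $tableauPeripherique -nomPeripherique "DESKTOP-QVRFEN4"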
Personally, I prefer advanced parameters. While there's a lot to know on the topic of advanced functions and their parameters, there's still a low barrier to entry.
function Redemarrer
{
    Param(
        [Parameter(Mandatory = $true, Position = 0)]
        [String[]]$tableauPeripherique,
        [Parameter(Mandatory = $true, Position = 1)]
        [String]$nomPeripherique
    ) # End Param block...
    Process{
        for ($i = 0; $i -lt $tableauPeripherique.Length; $i++)
        {
            if ($tableauPeripherique[$i] -eq $nomPeripherique) {
                Restart-Computer -ComputerName $nomPeripherique
            }
            else {
                Write-Output "nom de peripherique n'est pas présent dans la liste"
            }
        }
    } # End Process Block...
}
Note: You do not need Write-Output or its alias echo. More importantly, the function will return the string if and when the else block fires, which can be problematic if you are assigning the function's return value and/or piping it to another command. In short, if your only intent is to write something to the console, use Write-Host instead. However, as you get more advanced, there are reasons not to do that either.
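To see why the return value matters, a quick sketch (a hypothetical capture, using the question's variables):

$result = Redemarrer $tableauPeripherique "DESKTOP-QVRFEN4"
# $result now contains the "nom de peripherique..." string once per
# non-matching array element, because Write-Output sent it to the pipeline.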
Of course this answer is a poor substitute for studying the topic. One good, relatively entry-level book on the matter is "Learn PowerShell Scripting in a Month of Lunches" by Don Jones & Jeffrey Hicks. Incidentally, Don Jones has a lot to say about Write-Host.
I wrote a quick script to find the percentage of users in one user list (TEMP.txt) that are also in another user list (TEMP2.txt). It worked great until my user lists grew past a couple hundred thousand entries or so; now it's too slow. I want to convert it to use runspaces to speed it up, but I am failing miserably. The original script is:
$USERLIST1 = gc .\TEMP.txt
$i = 0
ForEach ($User in $USERLIST1) {
    If (gc .\TEMP2.txt | Select-String $User -Quiet) {
        $i = $i + 1
    }
}
$Count = gc .\TEMP2.txt | Measure-object -Line
$decimal = $i / $count.lines
$percent = $decimal * 100
Write-Host "$percent %"
Sorry I am still new at powershell.
I'm not sure how much this will help; I am new to runspaces as well. But here is some code I used to run a Windows Form asynchronously in a separate runspace; you might be able to adapt it to do what you need:
$Runspace = [Management.Automation.Runspaces.RunspaceFactory]::CreateRunspace($Host)
$Runspace.ApartmentState = 'STA'
$Runspace.ThreadOptions = 'ReuseThread'
$Runspace.Open()
#Add the Form object to the Runspace environment
$Runspace.SessionStateProxy.SetVariable('Form', $Form)
#Create a new PowerShell object (a Thread)
$PowerShellRunspace = [System.Management.Automation.PowerShell]::Create()
#Initializes the PowerShell object with the runspace
$PowerShellRunspace.Runspace = $Runspace
#Add the scriptblock which should run inside the runspace
$PowerShellRunspace.AddScript({
    [System.Windows.Forms.Application]::Run($Form)
})
#Open and run the runspace asynchronously
$AsyncResult = $PowerShellRunspace.BeginInvoke()
#End the pipeline of the PowerShell object
$PowerShellRunspace.EndInvoke($AsyncResult)
#Close the runspace
$Runspace.Close()
#Remove the PowerShell object and its resources
$PowerShellRunspace.Dispose()
Apart from the runspace approach, the following script should run quite a bit faster, because it reads TEMP2.txt once instead of re-reading it on every loop iteration:
$USERLIST1 = gc .\TEMP.txt
$USERLIST2 = gc .\TEMP2.txt
$i = 0
ForEach ($User in $USERLIST1) {
    if ($USERLIST2.Contains($User)) {
        $i += 1
    }
}
$Count = $USERLIST2.Count
$decimal = $i / $Count
$percent = $decimal * 100
Write-Host "$percent %"
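If the lists keep growing, a further speedup to consider is a .NET HashSet, which gives constant-time membership tests instead of the linear scan that array .Contains performs. A sketch (note that duplicate lines in TEMP2.txt collapse in the set, so its Count may differ from the file's line count):

# Load TEMP2.txt into a HashSet for O(1) lookups
$set = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($line in Get-Content .\TEMP2.txt) { [void]$set.Add($line) }

$i = 0
ForEach ($User in Get-Content .\TEMP.txt) {
    if ($set.Contains($User)) { $i++ }
}
$percent = $i / $set.Count * 100
Write-Host "$percent %"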
Below I have some code that gets the values of instances of performance counters (which are instantiated once a page is visited) and sends them to Graphite to display graphs, in the following format:
[Path in Graphite (e.g., metric.pages.Counter1)] [value of counter] [epoch time]
To do this I wrote the following code (the writer is already configured and working):
# Get all paths to MultipleInstance counters and averages that start with "BLABLA",
# put them into an array, and get the epoch time
$pathsWithInstances = (get-counter -ListSet BLABLA*) | select -ExpandProperty PathsWithInstances
$epochtime = [int][double]::Parse((Get-Date -UFormat %s))

# The loop below splits each path (e.g., \BLABLA Web(welcome)\Page Requests) into
# three parts: the part before the opening brace (the counter category, e.g.,
# "\BLABLA Web"), the part between the braces (the page or service, e.g.,
# "welcome"), and the part after the closing brace (the name of the test, e.g.,
# "\Page Requests"). We build the metric out of this information and send it
# to Graphite.
foreach ($pathWithInstance in $pathsWithInstances)
{
    $instanceProperties = $pathWithInstance.Split('()')
    $counterCategory = $instanceProperties[0]
    if ($counterCategory -eq ("\BLABLA Web"))
    {
        # Replace the * with nothing so that counters used to display the
        # average (e.g., \BLABLA Web(*)\Page Requests) are displayed on top
        # in the Graphite directory.
        $pagePath = $instanceProperties[1].Replace('*','')
        $nameOfTheTest = $instanceProperties[2]
        # The counter name used in the Graphite path has whitespace and
        # backslashes removed (naming conventions)
        $counterName = $nameOfTheTest.Replace(' ','').Replace('\','')
        $pathToPerfCounter = $pathWithInstance
        $pathInGraphite = "metrics.Pages." + $pagePath + $counterName
        # Invoked like this since otherwise Get-Counter [path] does not seem to work
        $metricValue = [int] ((Get-Counter "$pathToPerfCounter").CounterSamples | select -Property CookedValue).CookedValue
        $metric = ($pathInGraphite + " " + $metricValue + " " + $epochtime)
        $writer.WriteLine($metric)
        $writer.Flush()
    }
}
Unfortunately this code is very slow. It takes about one second for every counter to send a value. Does someone see why it is so slow and how it can be improved?
You're getting one counter at a time, and it takes a second for Get-Counter to get and "Cook" the values. Get-Counter will accept an array of counters, and will sample, "cook" and return them all in that same second. You can speed it up by sampling them all at once, and then parsing the values from the array of results:
$CounterPaths = (
    '\\Server1\Memory\Page Faults/sec',
    '\\Server1\Memory\Available Bytes'
)

(Measure-Command {
    foreach ($CounterPath in $CounterPaths)
    { Get-Counter -Counter $CounterPath }
}).TotalMilliseconds

(Measure-Command {
    Get-Counter $CounterPaths
}).TotalMilliseconds

2017.4693
1012.3012
Example:
foreach ($CounterSample in (Get-Counter $CounterPaths).CounterSamples)
{
    "Path = $($CounterSample.Path)"
    "Metric = $([int]$CounterSample.CookedValue)"
}
Path = \\Server1\memory\page faults/sec
Metric = 193
Path = \\Server1\memory\available bytes
Metric = 1603678208
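Applied to the question's loop, a hedged sketch (it assumes the $pathsWithInstances, $writer, and $epochtime variables from the question, and that the same path parsing still applies):

# One Get-Counter call samples every path at once
$counterData = Get-Counter -Counter $pathsWithInstances
foreach ($sample in $counterData.CounterSamples) {
    # Sample paths come back with the machine name prefixed
    # (\\machine\category(instance)\counter), so strip that before splitting.
    $localPath = $sample.Path -replace '^\\\\[^\\]+', ''
    $parts = $localPath.Split('()')
    if ($parts[0] -eq '\BLABLA Web') {   # -eq is case-insensitive in PowerShell
        $pagePath = $parts[1].Replace('*', '')
        $counterName = $parts[2].Replace(' ', '').Replace('\', '')
        $writer.WriteLine("metrics.Pages.$pagePath$counterName $([int]$sample.CookedValue) $epochtime")
    }
}
$writer.Flush()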
Use the Start-Job cmdlet to sample each counter in a separate background job.
Here is a simple example of how to take the Counter Paths and pass them into an asynchronous ScriptBlock:
$CounterPathList = (Get-Counter -ListSet Processor).PathsWithInstances.Where({ $PSItem -like '*% Processor Time' });

foreach ($CounterPath in $CounterPathList) {
    Start-Job -ScriptBlock { (Get-Counter -Counter $args[0]).CounterSamples.CookedValue; } -ArgumentList $CounterPath;
}
# Call Receive-Job down here, once all jobs are finished
# Call Receive-Job down here, once all jobs are finished
IMPORTANT: The above example uses PowerShell version 4.0's "method syntax" for filtering objects. Please make sure you're running PowerShell version 4.0, or change the Where method to use the traditional Where-Object instead.
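The collection step might then look like this (a sketch; it assumes the jobs started above are the only ones in the session):

# Block until every job finishes, gather the cooked values, then clean up
$samples = Get-Job | Wait-Job | Receive-Job
Get-Job | Remove-Job
$samples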
[Strawberry Perl v5.16.3, Windows 7 x64, executing via cmd, eg c:\strawberry> perl test.pl 100000]
SYMPTOM: The following code, foreach (1..$ARGV[0]) { foo($_); }, executes roughly 20% slower than if I had included this extra line before it: my $num = $ARGV[0];
QUESTION: Can anyone help me understand why?
Notice, in the second case, that after I initialize and set $num, I do not then use $num in the loop parameters. Were this the case, I could probably be convinced that repeatedly testing against $ARGV[0] in a for loop is somehow slower than testing against a variable I define myself... but this is not the case.
To track time, I use: use Time::HiRes; my $time = [Time::HiRes::gettimeofday()]; at the top of my script, and: print "\n1: ", Time::HiRes::tv_interval($time); at the bottom.
Confused!
Thanks,
Michael
EDIT
I am including the entire script, with a comment preceding the offending line... Interestingly, it looks like the time discrepancy is at least partially dependent on my redundant initialization of %h, as well as @chain... This is getting weird.
use Time::HiRes; my $time = [Time::HiRes::gettimeofday()];
#my $max=$ARGV[0];
my %h = (1=>1,89=>89);
$h{1}=1;
$h{89}=89;
my @chain=();
my $ans=0;
sub sum{my $o=0; foreach (@_){$o+=$_}; return $o;}
foreach (1..$ARGV[0]-1){
    my $x=$_;
    my @chain = ();
    while(!exists($h{$x})){
        push(@chain,$x);
        $x = sum(map {$_**2} split('',$x));
    }
    foreach (@chain){$h{$_}=$h{$x} if !exists($h{$_});}
}
print "\n1: ", Time::HiRes::tv_interval($time);
foreach (1..$ARGV[0]){$ans++ if ($h{$_}==89);}
print "\n2: ", Time::HiRes::tv_interval($time);
On my system (perl 5.16.3 on GNU/Linux) there is no measurable difference. The standard deviation of the timings is larger than the difference between measurements of different versions.
For each variant of the script, 10 executions were performed. The $ARGV[0] was 3.5E5 in all cases (350000).
Without my $num = $ARGV[0]:
$ perl measure.pl
2.369921 2.38991 2.380969 4.419895 2.398861 2.420928 2.388721 2.368144 2.387212 2.386347
mean: 2.5910908
sigma: 0.609763793801797
With my $num = $ARGV[0]:
$ perl measure.pl
4.435764 2.419485 2.403696 2.401771 2.411345 2.466776 4.408127 2.416889 2.389191 2.397409
mean: 2.8150453
sigma: 0.803721101668365
The measure.pl script:
use strict; use warnings; use 5.016;
use List::Util 'sum';

my @times = map qx/perl your-algorithm.pl 3.5E5/, 1..10;
chomp @times;

say "@times";
say "mean: ", mean(@times);
say "sigma: ", sigma(@times);

sub mean { sum(@_)/@_ }

sub sigma {
    my $mean = mean(@_);
    my $variance = sum(map { ($_-$mean)**2 } @_) / @_;
    sqrt $variance;
}
With your-algorithm.pl being reduced so that only one timing is printed:
foreach (1..$ARGV[0]){$ans++ if ($h{$_}==89);}
print Time::HiRes::tv_interval($time), "\n";
We have a folder on Windows that's ... huge. I ran "dir > list.txt"; the command stopped responding after 1.5 hours. The output file is about 200 MB and shows there are at least 2.8 million files. I know the situation is stupid, but let's focus on the problem itself: if I have such a folder, how can I split it into some "manageable" sub-folders? Surprisingly, every solution I have come up with involves getting all the files in the folder at some point, which is a no-no in my case. Any suggestions?
Thanks Keith Hill and Mehrdad. I accepted Keith's answer because that's exactly what I wanted to do, but I couldn't quite get PS to work quickly enough.
With Mehrdad's tip, I wrote this little program. It took 7+ hours to move 2.8 million files. So the initial dir command did finish; it just somehow never returned to the console.
using System;
using System.IO;

namespace SplitHugeFolder
{
    class Program
    {
        static void Main(string[] args)
        {
            var destination = args[1];
            if (!Directory.Exists(destination))
                Directory.CreateDirectory(destination);

            var di = new DirectoryInfo(args[0]);
            var batchCount = int.Parse(args[2]);
            int currentBatch = 0;
            string targetFolder = GetNewSubfolder(destination);

            foreach (var fileInfo in di.EnumerateFiles())
            {
                if (currentBatch == batchCount)
                {
                    Console.WriteLine("New Batch...");
                    currentBatch = 0;
                    targetFolder = GetNewSubfolder(destination);
                }

                var source = fileInfo.FullName;
                var target = Path.Combine(targetFolder, fileInfo.Name);
                File.Move(source, target);
                currentBatch++;
            }
        }

        private static string GetNewSubfolder(string parent)
        {
            string newFolder;
            do
            {
                newFolder = Path.Combine(parent, Path.GetRandomFileName());
            } while (Directory.Exists(newFolder));
            Directory.CreateDirectory(newFolder);
            return newFolder;
        }
    }
}
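For reference, a hypothetical invocation of the compiled program (the executable name and paths here are assumptions, not from the original post):

SplitHugeFolder.exe C:\hugedir C:\newdir 100000

The third argument is the batch size, i.e., how many files go into each randomly named subfolder.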
I use Get-ChildItem to index my whole C: drive every night into c:\filelist.txt. That's about 580,000 files and the resulting file size is ~60MB. Admittedly I'm on Win7 x64 with 8 GB of RAM. That said, you might try something like this:
md c:\newdir
Get-ChildItem C:\hugedir -r |
    Foreach -Begin {$i = $j = 0} -Process {
        if ($i++ % 100000 -eq 0) {
            $dest = "C:\newdir\dir$j"
            md $dest
            $j++
        }
        Move-Item $_ $dest
    }
The key is to do the move in a streaming manner. That is, don't collect all the Get-ChildItem results into a single variable and then proceed; that would require all 2.8 million FileInfo objects to be in memory at once. Also, if you use the Name parameter on Get-ChildItem, it will output a single string per file containing the file's path relative to the base dir. Even then, the size might just overwhelm the memory available to you. And no doubt it will take quite a while to execute; IIRC, my indexing script takes several hours.
If it does work, you should wind up with c:\newdir\dir0 through dir28, but then again, I haven't tested this script at all, so your mileage may vary. BTW, this approach assumes that your huge dir is a pretty flat dir.
Update: Using the Name parameter is almost twice as slow so don't use that parameter.
I found that Get-ChildItem is the slowest option when working with many items in a directory.
Look at the results:
Measure-Command { Get-ChildItem C:\Windows -rec | Out-Null }
TotalSeconds : 77,3730275
Measure-Command { listdir C:\Windows | Out-Null }
TotalSeconds : 20,4077132
measure-command { cmd /c dir c:\windows /s /b | out-null }
TotalSeconds : 13,8357157
(with listdir function defined like this:
function listdir($dir) {
    $dir
    [System.IO.Directory]::GetFiles($dir)
    foreach ($d in [System.IO.Directory]::GetDirectories($dir)) {
        listdir $d
    }
}
)
With this in mind, here is what I would do: I would stay in PowerShell but use a more low-level approach with .NET methods:
function DoForFirst($directory, $max, $action) {
    function go($dir, $options)
    {
        foreach ($f in [System.IO.Directory]::EnumerateFiles($dir))
        {
            if ($options.Remaining -le 0) { return }
            & $action $f
            $options.Remaining--
        }
        foreach ($d in [System.IO.Directory]::EnumerateDirectories($dir))
        {
            if ($options.Remaining -le 0) { return }
            go $d $options
        }
    }
    go $directory (New-Object PsObject -Property @{ Remaining = $max })
}
doForFirst c:\windows 100 { Write-Host File: $args }
# I use PsObject to avoid global variables and ref parameters.
To use the code you have to switch to the .NET 4.0 runtime -- the enumerating methods are new in .NET 4.0.
You can specify any scriptblock as -action parameter, so in your case it would be something like {Move-item -literalPath $args -dest c:\dir }.
Just try to list first 1000 items, I hope it will finish very quickly:
doForFirst c:\yourdirectory 1000 {write-host '.' -nonew }
And of course you can process all items at once, just use
doForFirst c:\yourdirectory ([long]::MaxValue) {move-item ... }
and each item should be processed immediately after it is returned. The whole list is not read first and then processed; it is processed during the reading.
How about starting with this:
cmd /c dir /b > list.txt
That should get you a list of all the file names.
If you're doing "dir > list.txt" from a powershell prompt, get-childitem is aliased as "dir". Get-childitem has known issues enumerating large directories, and the object collections it returns can get huge.
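If the dir /b listing does finish, a hedged sketch for the follow-up step: stream the names back out of list.txt and move them in batches, without ever holding the full set in memory (this assumes list.txt contains bare file names from C:\hugedir and was written somewhere outside that folder; [System.IO.File]::ReadLines also requires the .NET 4.0 runtime mentioned earlier):

$i = $j = 0
foreach ($name in [System.IO.File]::ReadLines('C:\list.txt')) {
    # Start a fresh subfolder every 100,000 files
    if ($i++ % 100000 -eq 0) {
        $dest = "C:\newdir\dir$j"
        md $dest | Out-Null
        $j++
    }
    Move-Item -LiteralPath (Join-Path 'C:\hugedir' $name) -Destination $dest
}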