Match across line breaks when using -match with a regular expression

The following results in a single match of "foo".
$multilineString = "foo
bar
baz";
$multilineString -match ".*";
$matches;
That is because the . character does not match line breaks.
These also output only "foo".
$multilineString -match "(.|\r)*" | Out-Null; $matches[0];
$multilineString -match "(.|\r\n)*" | Out-Null; $matches[0];
In PowerShell, how do we use match to include any character including line breaks, so that the output will include all three lines:
foo
bar
baz

I almost never use -match for this specific purpose; as noted in my comment, I usually go with [regex]. After reviewing the MS Docs:
It is important to note that the $Matches hashtable contains only the first occurrence of any matching pattern.
So, if you want to get the same result as [regex]::Matches($multilineString, '\w+').Value, you would first have to split the string and then loop over the resulting lines:
$multilineString = "foo
bar
baz"
$multilineString -split '\r?\n' | ForEach-Object {
if($_ -match '\w+')
{
$Matches
}
}
Name                           Value
----                           -----
0                              foo
0                              bar
0                              baz
An alternative that would also work and will not require a split or a loop is possible, but the regex pattern has to be more specific. In this case, we know that we're looking for 3 words.
$multilineString = "foo
bar
baz"
$multilineString -match '^(\w+)\s+(\w+)\s+(\w+)$'
$Matches
Name                           Value
----                           -----
3                              baz
2                              bar
1                              foo
0                              foo…
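For the original goal of getting all three lines in a single match, one option not shown above is the .NET inline option (?s) (single-line mode), which makes . match line feeds as well. A minimal sketch, assuming the string uses LF line endings; the variable names $all and $firstLine are only for illustration:

```powershell
$multilineString = "foo`nbar`nbaz"   # LF line endings assumed for this sketch

# (?s) switches on single-line mode, so '.' also matches "`n"
$null = $multilineString -match '(?s).*'
$all = $Matches[0]          # the whole string, all three lines

# Without the option, '.' stops at the first line feed
$null = $multilineString -match '.*'
$firstLine = $Matches[0]    # foo

$all
```

$Matches still holds only the first match, but with (?s) that first match spans the line breaks.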

Related

Using Select-Object in Powershell, how can I select only the part of a string I want on a per line basis?

Currently I have a script that will search a directory and find all instances of the word "dummy". It will then output the FileName, Path, LineNumber, and Line to a CSV file.
The Line column contains very standardized results like:
Hi I am a dummy, who are you?
Something dummy, blah blah?
Lastly dummy, how is your day?
I am trying to find a way to output an additional column in my CSV that contains only the characters after "dummy," and before the "?".
Resulting lines would be:
who are you
blah blah
how is your day
I tried to use split but it keeps removing additional characters. Is it possible to find the index of "dummy," and "?" and then substring out the middle portion?
Any help would be greatly appreciated.
Code as it stands:
Write-Host "Hello, World!"
# path
$path = 'C:\Users\Documents\4_Testing\fe\*.ts'
# pattern to find dummy
$pattern = "dummy,"
Get-ChildItem -Recurse -Path $path | Select-String -Pattern $pattern |
Select-Object FileName,Path,LineNumber,Line,
@{name='Function';expression={
$_.Line.Split("dummy,")
}} |
Export-Csv 'C:\Users\User\Documents\4_Testing\Output1.csv' -NoTypeInformation
Write-Host "Complete"
Use the -replace regex operator to replace the whole line with just the part between dummy, and ?:
PS ~> 'Hi I am a dummy, who are you?' -replace '^.*dummy,\s*(.*)\?\s*$', '$1'
who are you
So your calculated property definition should look like this:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*)\?\s*$', '$1' }}
The pattern used above describes:
^ # start of string
.* # 0 or more of any character
dummy, # the literal substring `dummy,`
\s* # 0 or more whitespace characters
( # start of capture group
.* # 0 or more of any character
) # end capture group
\? # a literal question mark
\s* # 0 or more whitespace characters
$ # end of line/string
If you also want to remove everything after the first ?, change the pattern slightly:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*?)\?.*$', '$1' }}
Adding the metacharacter ? to .* makes the subexpression lazy: the regex engine tries to match as few characters as possible, so we only capture up to the first ?.
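The difference between the two patterns only shows up when a line contains more than one question mark. A small sketch, using a made-up line that is not from the question:

```powershell
# Hypothetical line with two question marks
$line = 'Hi I am a dummy, who are you? And you?'

# Greedy (.*) runs up to the last '?' that still lets \?\s*$ match
$greedy = $line -replace '^.*dummy,\s*(.*)\?\s*$', '$1'

# Lazy (.*?) stops at the first '?'; the trailing .* consumes the rest
$lazy = $line -replace '^.*dummy,\s*(.*?)\?.*$', '$1'

$greedy   # who are you? And you
$lazy     # who are you
```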

PowerShell rename files

I have a database full of .pdf and .dwf files.
I need to rename these.
The files are named as follows:
123456 text text.pdf
And should look like this:
123456000_text_text.pdf
I can replace the spaces with the following command:
dir | rename-item -NewName {$_.name -replace " ","_"}
Now I need a command to insert "0" three times after the first 6 digits.
Can someone help me?
Thanks already
You need to restrict the rename to *.pdf and *.dwf files whose names start with 6 digits followed by a space character. Then you can use regex replacements like this:
Get-ChildItem -Path D:\Test -File | Where-Object { $_.Name -match '^\d{6} .*\.(dwf|pdf)$' } |
Rename-Item -NewName { $_.Name -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_'}
Before:
D:\TEST
123456 text text.dwf
123456 text text.pdf
123456 text text.txt
After:
D:\TEST
123456 text text.txt
123456000_text_text.dwf
123456000_text_text.pdf
Regex details of filename -match:
^ Assert position at the beginning of the string
\d Match a single digit 0..9
{6} Exactly 6 times
\ Match the character “ ” literally
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
( Match the regular expression below and capture its match into backreference number 1
Match either the regular expression below (attempting the next alternative only if this one fails)
dwf Match the characters “dwf” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
pdf Match the characters “pdf” literally
)
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
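If you want to check the filter and the replacement chain before touching any files, you can run them against plain strings first. A sketch using the three sample names from the Before listing; $names and $renamed are illustrative variable names:

```powershell
$names = '123456 text text.dwf', '123456 text text.pdf', '123456 text text.txt'

$renamed = $names |
    # keep only names that start with 6 digits + space and end in .dwf or .pdf
    Where-Object { $_ -match '^\d{6} .*\.(dwf|pdf)$' } |
    # insert 000_ after the digits, then turn remaining whitespace runs into _
    ForEach-Object { $_ -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_' }

$renamed
# 123456000_text_text.dwf
# 123456000_text_text.pdf
```

Rename-Item also accepts -WhatIf, which prints the renames it would perform without carrying them out.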
What you have is 123456 text text.pdf
Want it to look like 123456000_text_text.pdf
A systematic way to achieve this would be:
$const = "123456 text text.pdf"
$filename = $const -replace " ","_"
$temp = $filename.split("_")[0]
$rep1 = ([string]$temp).PadRight(9,'0')
$output = $filename -replace $temp,$rep1
Write-Host $output -ForegroundColor Green
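The great thing about this method is that PadRight always pads the leading number with trailing 0s up to 9 characters, even if it starts with fewer than 6 digits. A number that is already 9 characters or longer is left unchanged.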

Unwanted space in substring using powershell

I'm fairly new to PS: I'm extracting fields from multiple xml files ($ABB). The $net var is based on a pattern search and returns a non-static substring on line 2. Here's what I have so far:
$ABB = If ($aa -eq $null ) {"nothing to see here"} else {
$count = 0
$files = @($aa)
foreach ($f in $files)
{
$count += 1
$mo=(Get-Content -Path $f )[8].Substring(51,2)
(Get-Content -Path $f | Select-string -Pattern $lf -Context 0,1) | ForEach-Object {
$net = $_.Context.PostContext
$enet = $net -split "<comm:FieldValue>(\d*)</comm:FieldValue>"
$enet = $enet.trim()}
Write-Host "$mo-nti-$lf-$enet" "`r`n"
}}
The output looks like this: 03-nti-260- 8409.
Note the space prefacing the 8409 which corresponds to the $net variable. I haven't been able to solve this on my own, my approach could be all wrong. I'm open to any and all suggestions. Thanks for your help.
Because the first line of $net (after $net = $_.Context.PostContext) begins with text matched by the split pattern, -split emits an empty string as the first element of its output. When you then stringify the output, the elements are joined by a single space, which produces the leading space.
You need to select only the elements that aren't empty:
$enet = $net -split "<comm:FieldValue>(\d*)</comm:FieldValue>" -ne ''
Explanation:
With -split, the parts of the matched delimiter that are not surrounded by () are removed from the output, and the string is split into multiple elements at each match. When a match occurs at the very start or end of the string, an empty string is output for that position. Care must be taken to remove those elements if they are not required. Trim() will not help here because it operates on individual strings: it trims whitespace from each element but does not remove empty strings from the array.
Adding -ne '' to the end of the command removes the empty elements. It is just an inline comparison that, when applied to an array, outputs only the elements for which the condition is true.
You can see an example of the blank line condition below:
123 -split 1

23
123 -split 1 -ne ''
23
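The same effect can be reproduced with the pattern from the question. This is a sketch that assumes the PostContext line consists of exactly one FieldValue element; the sample value of $net is made up to match the question's output:

```powershell
# Hypothetical PostContext line, shaped like the one in the question
$net = '<comm:FieldValue>8409</comm:FieldValue>'

$parts = $net -split '<comm:FieldValue>(\d*)</comm:FieldValue>'
$parts.Count     # 3: empty string, '8409', empty string
"[$parts]"       # [ 8409 ]  (elements joined by single spaces)

# -ne '' filters the array and keeps only the non-empty elements
$enet = $net -split '<comm:FieldValue>(\d*)</comm:FieldValue>' -ne ''
"[$enet]"        # [8409]
```

The brackets are only there to make the surrounding spaces visible.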
Just use -replace to get rid of any whitespace.
For example:
'03-nti-260- 8409' -replace '\s'
<#
# Results
03-nti-260-8409
#>

Split pattern output by spaces in Powershell

I need to extract the third column of a string returned after matching a Pattern.
It also needs to be a one-liner.
File contains data like this:
f5834eab44ff bfd0bc8498d8 1557718920
dc8087c38a0d a72e89879030 1557691221
e6d7aaf6d76b caf6cd0ef68c 1557543565
Right now it matches the pattern and returns the line.
But I cannot get it to Split on the spaces so I can get the 3rd column (index 2).
select-string -Path $hashlistfile -Pattern 'dc8087c38a0d') | $_.Split(" ") | $_[2]
Output should be:
1557691221
You can grab the Line property from the output object produced by Select-String, split that and then index directly into the result of String.Split():
Select-String -Path $hashlistfile -Pattern dc8087c38a0d |ForEach-Object {
$_.Line.Split(" ")[2]
}
You can only use $_ inside a script block { }, such as the one passed to ForEach-Object. Select-String returns MatchInfo objects.
(select-string dc8087c38a0d $hashlistfile).gettype()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False MatchInfo System.Object
The -split operator seems to deal with it more easily. There's an extra parenthesis ')' after the pattern in your example.
select-string dc8087c38a0d $hashlistfile | foreach { -split $_ | select -index 2 }
1557691221
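One practical difference between the two approaches: the unary -split operator splits on runs of whitespace, while String.Split(" ") produces an empty element for every extra space. A sketch with a made-up line that contains a double space:

```powershell
# Hypothetical line: note the two spaces after the first column
$line = 'dc8087c38a0d  a72e89879030 1557691221'

$viaMethod   = $line.Split(' ')[2]   # a72e89879030 (index 1 is an empty string)
$viaOperator = (-split $line)[2]     # 1557691221

$viaMethod
$viaOperator
```

If the columns in the file are always separated by exactly one space, both forms return the same value.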

Prepend X Number of Lines in a File with PowerShell

I'm stuck on a problem I'm working on with a text file. The file is just a flat text file with some dates that have been added, in other words a simple log file. My problem is this: "I need to comment out a number of lines, with that number passed as a param". The part I'm hung up on is actually commenting out X lines (let's say a # is added). I can read and write files, and read and write lines with a search string, but what I can't seem to figure out is how to edit X lines and leave the other lines alone.
PS
In actuality it doesn't matter if the lines are at the end of the file or the beginning, though it would be nice to understand the method on how to add to the beginning or the end
gc .\foo.txt | select -First 3 | %{ "#{0}" -f $_ }
gc .\foo.txt | select -Skip 3
If I got it right then this pattern should work for you:
(Get-Content my.log) | .{
begin{
# add some lines to the start
"add this to the start"
"and this, too"
}
process{
# comment out lines that match some condition
# in this demo: a line contains 'foo'
# in your case: apply your logic: line counter, search string, etc.
if ($_ -match 'foo') {
# match: comment it out
"#$_"
}
else {
# no match: keep it as it is
$_
}
}
end {
# add some lines to the end
"add this to the end"
"and this, too"
}
} |
Set-Content my.log
Then the log file:
bar
foo
bar
foo
is transformed into:
add this to the start
and this, too
bar
#foo
bar
#foo
add this to the end
and this, too
Note: for very large files use a similar but slightly different pattern:
Get-Content my.log | .{
# same code
...
} |
Set-Content my-new.log
and then rename my-new.log to my.log. If you are going to write to a new file anyway, then use the second, more efficient pattern in the first place.
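For the question as asked (comment out a number of lines, with that number passed as a parameter), the same read-transform-write idea can be wrapped in a function with a line counter. This is only a sketch: the function name Add-LineComment and its parameters are made up for illustration, and it uses the first pattern, so it reads the whole file into memory before overwriting it.

```powershell
function Add-LineComment {
    param(
        [Parameter(Mandatory)] [string] $Path,   # file to edit in place
        [int] $Count = 3                         # number of leading lines to comment out
    )
    $index = 0
    # Parentheses force the file to be read completely before Set-Content reopens it
    (Get-Content -Path $Path) |
        ForEach-Object {
            # Prefix the first $Count lines with '#', pass the rest through unchanged
            if ($index -lt $Count) { "#$_" } else { $_ }
            $index++
        } |
        Set-Content -Path $Path
}
```

Called as Add-LineComment -Path .\foo.txt -Count 3, it should prefix the first three lines with # and leave the other lines alone.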

Resources