Unwanted space in substring using powershell - windows

I'm fairly new to PS. I'm extracting fields from multiple XML files ($ABB). The $net variable is based on a pattern search and returns a non-static substring on line 2. Here's what I have so far:
$ABB = If ($aa -eq $null ) {"nothing to see here"} else {
$count = 0
$files = @($aa)
foreach ($f in $files)
{
$count += 1
$mo=(Get-Content -Path $f )[8].Substring(51,2)
(Get-Content -Path $f | Select-string -Pattern $lf -Context 0,1) | ForEach-Object {
$net = $_.Context.PostContext
$enet = $net -split "<comm:FieldValue>(\d*)</comm:FieldValue>"
$enet = $enet.trim()}
Write-Host "$mo-nti-$lf-$enet" "`r`n"
}}
The output looks like this: 03-nti-260- 8409.
Note the space preceding the 8409, which corresponds to the $net variable. I haven't been able to solve this on my own, and my approach could be all wrong. I'm open to any and all suggestions. Thanks for your help.

Because the first line of $net (after $net = $_.Context.PostContext) begins with the text matched by the split pattern, -split outputs an empty string as the first element. When the resulting array is then embedded in a string, its elements are joined with a single space, which produces the leading space.
You need to select only the elements that aren't empty:
$enet = $net -split "<comm:FieldValue>(\d*)</comm:FieldValue>" -ne ''
Explanation:
Text matched by -split that is not enclosed in () is removed from the output, and the remaining string is split into elements at each match. Text matched inside () is kept as its own element. When a match occurs at the very start or end of the string, an empty string is output for that position. Care must be taken to remove those empty elements if they are not required. Trim() will not help here: called on an array, it trims each element but never removes an element, so the empty strings remain.
Adding -ne '' to the end of the command removes the empty elements. When a comparison operator is applied to an array, it acts as a filter and outputs only the elements for which the condition is true.
You can see an example of the blank line condition below:
123 -split 1
<empty line>
23
123 -split 1 -ne ''
23
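Applied to the question's data, the same effect can be sketched as follows. The sample line is a hypothetical stand-in for the PostContext line (the real XML is not shown in the question), so treat it as an illustration rather than the exact input.

```powershell
# Hypothetical sample of the line that follows the match (the PostContext).
$net = '<comm:FieldValue>8409</comm:FieldValue>'

# The whole line is one delimiter match, so -split returns an empty string
# before it, the captured digits, and an empty string after it.
$parts = $net -split '<comm:FieldValue>(\d*)</comm:FieldValue>'

# Embedding the array in a string joins the elements with spaces,
# which is where the unwanted space comes from.
$joinedRaw = "$parts"

# Filtering out the empty elements leaves only the captured value.
$enet = $parts -ne ''
$label = "03-nti-260-$enet"
$label
```

With this sample, $parts has three elements, $joinedRaw is ' 8409 ' and $label is '03-nti-260-8409'.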

Just use a -replace to get rid of any spaces
For example:
'03-nti-260- 8409' -replace '\s'
<#
# Results
03-nti-260-8409
#>

Related

Using Select-Object in Powershell, how can I select only the part of a string I want on a per line basis?

Currently I have a script that searches a directory and finds all instances of the word "dummy". It then outputs the FileName, Path, LineNumber, and Line to a CSV file.
The Line values follow a very standardized format, like:
Hi I am a dummy, who are you?
Something dummy, blah blah?
Lastly dummy, how is your day?
I am trying to find a way to output an additional column in my CSV that contains only the characters after "dummy," and before the "?".
Resulting lines would be:
who are you
blah blah
how is your day
I tried to use split but it keeps removing additional characters. Is it possible to find the index of "dummy," and "?" and then substring out the middle portion?
Any help would be greatly appreciated.
Code as it stands:
Write-Host "Hello, World!"
# path
$path = 'C:\Users\Documents\4_Testing\fe\*.ts'
# pattern to find dummy
$pattern = "dummy,"
Get-ChildItem -Recurse -Path $path | Select-String -Pattern $pattern |
Select-Object FileName,Path,LineNumber,Line,
@{name='Function';expression={
$_.Line.Split("dummy,")
}} |
Export-Csv 'C:\Users\User\Documents\4_Testing\Output1.csv' -NoTypeInformation
Write-Host "Complete"
Use the -replace regex operator to replace the whole line with just the part between dummy, and ?:
PS ~> 'Hi I am a dummy, who are you?' -replace '^.*dummy,\s*(.*)\?\s*$', '$1'
who are you
So your calculated property definition should look like this:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*)\?\s*$', '$1' }}
The pattern used above describes:
^ # start of string
.* # 0 or more of any character
dummy, # the literal substring `dummy,`
\s* # 0 or more whitespace characters
( # start of capture group
.* # 0 or more of any character
) # end capture group
\? # a literal question mark
\s* # 0 or more whitespace characters
$ # end of line/string
If you also want to remove everything after the first ?, change the pattern slightly:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*?)\?.*$', '$1' }}
Adding the metacharacter ? to .* makes the subexpression lazy, meaning the regex engine tries to match as few characters as possible - meaning we'll only capture up until the first ?.
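The difference between the two patterns only shows up when a line contains more than one question mark. A small sketch, using a made-up input line that is not part of the question's data:

```powershell
# Hypothetical line with two question marks.
$line = 'Hi I am a dummy, who are you? really?'

# Greedy capture: (.*) extends to the LAST question mark on the line.
$greedy = $line -replace '^.*dummy,\s*(.*)\?\s*$', '$1'

# Lazy capture: (.*?) stops at the FIRST question mark; .*$ consumes the rest.
$lazy = $line -replace '^.*dummy,\s*(.*?)\?.*$', '$1'

$greedy   # who are you? really
$lazy     # who are you
```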

PowerShell rename files

I have a database full of .pdf and .dwf files.
I need to rename these.
The files are named as follows:
123456 text text.pdf
And should look like this:
123456000_text_text.pdf
I can replace the spaces with the following command:
dir | rename-item -NewName {$_.name -replace " ","_"}
Now I need a command to insert "0" three times after the first 6 digits.
Can someone help me?
Thanks already
You need to filter on *.pdf and *.dwf files only and also if the filenames match the criterion of starting with 6 digits followed by a space character. Then you can use regex replacements like this:
Get-ChildItem -Path D:\Test -File | Where-Object { $_.Name -match '^\d{6} .*\.(dwf|pdf)$' } |
Rename-Item -NewName { $_.Name -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_'}
Before:
D:\TEST
123456 text text.dwf
123456 text text.pdf
123456 text text.txt
After:
D:\TEST
123456 text text.txt
123456000_text_text.dwf
123456000_text_text.pdf
Regex details of filename -match:
^ Assert position at the beginning of the string
\d Match a single digit 0..9
{6} Exactly 6 times
\ Match the character “ ” literally
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
( Match the regular expression below and capture its match into backreference number 1
Match either the regular expression below (attempting the next alternative only if this one fails)
dwf Match the characters “dwf” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
pdf Match the characters “pdf” literally
)
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
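Before renaming real files, the filter and the two replacements can be tried on plain strings. The sketch below uses the three file names from the example listing; nothing on disk is touched. The braces in ${1} keep the group number separate from the literal 000 that follows it, since $1000 would name a different group.

```powershell
# File names from the example listing, used as plain strings.
$names = '123456 text text.pdf', '123456 text text.dwf', '123456 text text.txt'

$renamed = $names |
    Where-Object { $_ -match '^\d{6} .*\.(dwf|pdf)$' } |   # keep only matching pdf/dwf names
    ForEach-Object {
        # First replacement: six digits + space -> six digits + '000_'.
        # Second replacement: any remaining whitespace runs -> '_'.
        $_ -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_'
    }

$renamed
# 123456000_text_text.pdf
# 123456000_text_text.dwf
```

Rename-Item also accepts the common -WhatIf switch, which previews the renames without performing them.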
What you have is 123456 text text.pdf
Want it to look like 123456000_text_text.pdf
A systematic way to achieve this would be:
$const = "123456 text text.pdf"
$filename = $const -replace " ","_"
$temp = $filename.split("_")[0]
$rep1 = ([string]$temp).PadRight(9,'0')
$output = $filename -replace $temp,$rep1
Write-Host $output -ForegroundColor Green
The great thing about this method is that it will always trail with 0s keeping your number string to 9 digits.
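One thing to be aware of in the snippet above is that -replace treats $temp as a regular expression and replaces every occurrence of it in the name. A variant that avoids this, sketched here with the same sample name, splits the name once at the first space and pads only the leading part:

```powershell
$name = '123456 text text.pdf'

# Split into at most two parts: the leading number and the rest of the name.
$digits, $rest = $name -split ' ', 2

# Pad the number to 9 characters with trailing zeros, then rebuild the name.
$output = $digits.PadRight(9, '0') + '_' + ($rest -replace ' ', '_')

$output   # 123456000_text_text.pdf
```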

Trim the blank line and then pick the nth array item

I have some PowerShell code that uses Get-Content to read a file, trims the first blank line, and then splits the content in order to get the nth item in the array.
The issue is that it's giving me the nth item of the second line of the file, while I need the nth item of the first line.
Here's my code.
$Ess_keys = "D:\Automation\Encryption\myKeys.txt"
Get-Content $Ess_keys | ? {$_.trim() -ne "" } |ForEach-Object{
$splitUp = $_ -split "\s+"
$PKey = $splitUp[5]}
$Pkey
Here's what the file looks like:
>
Public Key for Encryption: 27743,2195638463
Private Key for Decryption: 2073750047,2195638463
When I run it, this is the output it's giving:
PS C:\Users\wrtty> $pkey
2073750047,2195638463
As you can see, it's picking the array item from the second line, while I need it from the first line.
I also checked whether the first non-blank line was being trimmed away. When I run the two commands below, I can see that it is not:
PS C:\Users\wrtty> Get-Content $Ess_keys | ? {$_.trim() -ne "" }
Public Key for Encryption: 27743,2195638463
Private Key for Decryption: 2073750047,2195638463
PS C:\Users\wrtty> Get-Content $Ess_keys | where {$_ -ne ""}
output
Public Key for Encryption: 27743,2195638463
Private Key for Decryption: 2073750047,2195638463
Any suggestions?
In your attempt, you are overwriting $PKey with each loop iteration. Then you are only outputting $PKey at the end. So you only get the last matched line.
Since it appears you already know the data format within the file, you can use a simple Select-String pattern match to get the data you want.
$pkeys = Select-String -Path "D:\Automation\Encryption\myKeys.txt" -Pattern "Public Key for Encryption: (\S+)" -AllMatches |
Foreach-Object {
$_.Matches.Groups[1].Value
}
$pkeys
The above code stores ALL public key matched data in $pkeys. If you only want to access the first match, then $pkeys[0] will suffice. The regex (\S+) matches consecutive non-white space characters.
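One caveat when indexing: if only a single line matches, the variable holds a single string rather than an array, and indexing a string with [0] returns its first character. Wrapping the result in @() guards against that. The sketch below feeds the file's lines to Select-String from an in-memory array instead of a path, purely so it can run without the file:

```powershell
# The lines of the sample file, including the leading blank line.
$lines = '',
         'Public Key for Encryption: 27743,2195638463',
         'Private Key for Decryption: 2073750047,2195638463'

$pkeys = $lines |
    Select-String -Pattern 'Public Key for Encryption: (\S+)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

# @() forces an array even when there is exactly one match,
# so [0] returns the whole key rather than its first character.
$first = @($pkeys)[0]
$first   # 27743,2195638463
```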
Thanks for your comment AdminOfThings
Below solution worked for me.
Get-Content $Ess_keys | where {$_ -ne ""} |Select-Object -First 1| ForEach-Object{
$splitUp = $_ -split "\s+"
$PKey = $splitUp[5]}

Powershell - Having difficulty in ignoring the header row (first row) and footer row (last row) in file

I am looking to find extra delimiters in my file on a line by line basis.
However, I would like to ignore the header row (first row) and the footer row (last row) in the file and just focus on the file detail.
I am not sure on how to ignore the first and last row using the ReadLine() method. I DO NOT want to alter the file in any way, this script is used just to identify rows in the CSV file that have extra delimiters.
Please note: The file I am looking to search has millions of rows and in order to do that I have to rely on the ReadLine() method rather than the Get-Content approach.
I did try to use Select-Object -Skip 1 | Select-Object -SkipLast 1 in my Get-Content statement inputting the value into $measure but I didn't get the desired result.
For example:
H|Transaction|2017-10-03 12:00:00|Vendor --> This is the Header
D|918a39230a098134|2017-08-31 00:00:00.000|2017-08-15 00:00:00.000|SLICK-2340|...
D|918g39230b095134|2017-08-31 00:00:00.000|2017-08-15 00:00:00.000|EX|SRE-68|...
T|1268698 Records --> This is Footer
Basically, I want my script to ignore the header and footer, use the first data row (D|918...) as the example of a correct record, and compare the other detail records against it for errors. In this example the second detail row should be returned, because there is an invalid delimiter in the field (EX|SRE-68...).
When I tried using -skip 1 and -skiplast 1 in the get-content statement, the process is still using the header row as a comparison and returning all detail records as invalid records.
Here's what I have so far...
Editor's note: Despite the stated intent, this code does use the header line (the 1st line) to determine the reference column count.
$File = "test.csv"
$Delimiter = "|"
$measure = Get-Content -Path $File | Measure-Object
$lines = $measure.Count
Write-Host "$File has ${lines} rows."
$i = 1
$reader = [System.IO.File]::OpenText($File)
$line = $reader.ReadLine()
$reader.Close()
$header = $line.Split($Delimiter).Count
$reader = [System.IO.File]::OpenText($File)
try
{
for()
{
$line = $reader.ReadLine()
if($line -eq $null) { break }
$c = $line.Split($Delimiter).Count
if($c -ne $header -and $i -ne${lines})
{
Write-Host "$File - Line $i has $c fields, but it should be $header"
}
$i++
}
}
finally
{
$reader.Close()
}
Any reason you're using ReadLine? The Get-Content call you're doing already loads the entire CSV into memory, so I'd save that to a variable and then use a loop to go through it (starting at 1 to skip the first line).
So something like this:
$File = "test.csv"
$Delimiter = "|"
$contents = Get-Content -Path $File
$lines = $contents.Count
Write-Host "$File has ${lines} rows."
$header = $contents[0].Split($Delimiter).count
for ($i = 1; $i -lt ($lines - 1); $i++)
{
$c = $contents[$i].Split($Delimiter).Count
if($c -ne $header)
{
Write-Host "$File - Line $i has $c fields, but it should be $header"
}
}
Now that we know that performance matters, here's a solution that uses only the ReadLine() method of the System.IO.StreamReader returned by [System.IO.File]::OpenText() (as a faster alternative to Get-Content) to read the large input file, and does so only once:
No up-front counting of the number of lines via Get-Content ... | Measure-Object,
No separate instance of opening the file just to read the header line; keeping the file open after reading the header line has the added advantage that you can just keep reading (no logic needed to skip the header line).
$File = "test.csv"
$Delimiter = "|"
# Open the CSV file as a text file for line-based reading.
$reader = [System.IO.File]::OpenText($File)
# Read the lines.
try {
# Read the header line and discard it.
$null = $reader.ReadLine()
# Read the first data line - the reference line - and count its columns.
$refColCount = $reader.ReadLine().Split($Delimiter).Count
# Read the remaining lines in a loop, skipping the final line.
$i = 2 # initialize the line number to 2, given that we've already read the header and the first data line.
while ($null -ne ($line = $reader.ReadLine())) { # $null indicates EOF
++$i # increment line number
# If we're now at EOF, we've just read the last line - the footer -
# which we want to ignore, so we exit the loop here.
if ($reader.EndOfStream) { break }
# Count this line's columns and warn if the count differs from the
# reference line's.
if (($colCount = $line.Split($Delimiter).Count) -ne $refColCount) {
Write-Warning "$File - Line $i has $colCount fields rather than the expected $refColCount."
}
}
} finally {
$reader.Close()
}
Note: This answer was written before the OP clarified that performance was paramount and that a Get-Content-based solution was therefore not an option. My other answer now addresses that.
This answer may still be of interest for a slower, but more concise, PowerShell-idiomatic solution.
the_sw's helpful answer shows that you can use PowerShell's own Get-Content cmdlet to conveniently read a file, without needing to resort to direct use of the .NET Framework.
PSv5+ enables an idiomatic single-pipeline solution that is more concise and more memory-efficient, because it processes lines one by one, albeit at the expense of performance. With large files in particular, you may not want to read all lines into memory at once, so a pipeline solution is preferable.
PSv5+ is required due to the use of Select-Object's -SkipLast parameter.
$File = "test.csv"
$Delimiter = '|'
Get-Content $File | Select-Object -SkipLast 1 | ForEach-Object { $i = 0 } {
if (++$i -eq 1) {
return # ignore the actual header row
} elseif ($i -eq 2) { # reference row
$refColumnCount = $_.Split($Delimiter).Count
} else { # remaining rows, except the footer, thanks to -SkipLast 1
$columnCount = $_.Split($Delimiter).Count
if ($columnCount -ne $refColumnCount) {
"$File - Line $i has $columnCount fields rather than the expected $refColumnCount."
}
}
}
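To see the skip-header, skip-footer logic in isolation, here is a minimal sketch that works on an in-memory copy of the sample rows instead of a file. The rows are the ones from the question with the elided trailing fields left out, so the field counts (5 and 6) only hold for this trimmed sample; the property names on the output objects are made up for the illustration.

```powershell
# Trimmed copy of the question's sample rows (the elided fields are omitted).
$sample = @(
    'H|Transaction|2017-10-03 12:00:00|Vendor'
    'D|918a39230a098134|2017-08-31 00:00:00.000|2017-08-15 00:00:00.000|SLICK-2340'
    'D|918g39230b095134|2017-08-31 00:00:00.000|2017-08-15 00:00:00.000|EX|SRE-68'
    'T|1268698 Records'
)
$Delimiter = '|'

# Drop the header (first row) and the footer (last row).
$detail = @($sample | Select-Object -Skip 1 | Select-Object -SkipLast 1)

# The first detail row is the reference for the expected field count.
$refCount = $detail[0].Split($Delimiter).Count

$lineNo = 1   # the header was line 1
$bad = foreach ($row in $detail) {
    $lineNo++
    $count = $row.Split($Delimiter).Count
    if ($count -ne $refCount) {
        # Emit one object per offending row.
        [pscustomobject]@{ Line = $lineNo; Fields = $count; Expected = $refCount }
    }
}
$bad
```

For this sample, the only object emitted describes line 3, which has 6 fields where 5 are expected.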

PowerShell replace the first three spaces with commas

I have a file that looks like this. There are many lines in this format.
5/10 RED SYSID This is a long message
I would like to have these lines be in 4 comma-separated columns.
5/10,RED,SYSID,This is a long message
How can I replace only the first three spaces with commas?
You can do this with the PowerShell -split and -join operators.
$line -split ' ',4 -join ','
This example converts the first three spaces into commas. The number after the delimiter is the maximum number of substrings to return, so -split ' ',4 splits the string into an array of four elements at the first three spaces. Then -join ',' rejoins them into one string with a comma between each.
The above won't work if your input has multiple spaces between fields, since each space is considered separately, or if your fields are separated by other whitespace such as tabs. Instead, split on a whitespace regex.
$line -split '\s+',4,"RegexMatch" -join ','
This example treats the first three matches of \s+ as delimiters, so each run of consecutive whitespace becomes a single comma.
To run against every line in a file, use Get-Content and ForEach-Object:
Get-Content $filename | foreach {
$_ -split '\s+',4,"RegexMatch" -join ','
} | Out-File $newfilename
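The number passed to -split is the maximum number of substrings returned, not the number of delimiters consumed, so four columns require a limit of 4. A short sketch with the question's sample line, plus a made-up variant with irregular whitespace, makes the difference visible:

```powershell
$line = '5/10 RED SYSID This is a long message'

# A limit of 3 yields three substrings (only two spaces are consumed);
# a limit of 4 yields four substrings (three spaces are consumed).
$three = $line -split ' ', 3
$four  = $line -split ' ', 4

$csv = $four -join ','
$csv        # 5/10,RED,SYSID,This is a long message

# Hypothetical line with repeated spaces and a tab between fields.
$messy = "5/10   RED`tSYSID  This is a long message"
$csvMessy = $messy -split '\s+', 4 -join ','
$csvMessy   # 5/10,RED,SYSID,This is a long message
```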
The following regex should do what you want.
$line -replace '^(\S+?) (\S+?) (\S+?) (.*)','$1,$2,$3,$4'
This captures four groups of non-whitespace characters separated by spaces, with the last group containing the remainder of the string. Then it replaces them with those same four groups separated by commas.
To use this to modify every matching line in a file, pipe Get-Content through ForEach-Object and finally to Out-File:
$regex = [regex]'^(\S+?) (\S+?) (\S+?) (.*)','$1,$2,$3,$4'
Get-Content $filename | foreach {
$_ -replace $regex
} | Out-File $newfilename
Any lines the regex does not match will be sent to the output file unchanged. This includes any lines that contain tabs instead of spaces. If you need to test for this in your script, you can first test for $_ -match $regex[0] (the pattern is the first element of $regex; the second is the replacement string), and take appropriate action if that returns false.
This might be what you're looking for.
Replace the first occurence of a string in a file
The relevant code is this:
$re = [regex]' '
$re.Replace([string]::Join("`n", (gc C:\Path\To\test.txt)), ',', 3)
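Note that the snippet above joins the whole file into one string first, so the count of 3 applies once to the entire file rather than to each line. If every line should have its first three spaces replaced, one option is to call the same Regex.Replace overload per line. A sketch with two in-memory lines (the second is made up) standing in for the file content:

```powershell
$re = [regex]' '

# Sample lines standing in for the file content.
$lines = '5/10 RED SYSID This is a long message',
         '5/11 BLUE SYSID Another long message'

# Replace(input, replacement, count) replaces at most 'count' matches,
# here the first three spaces of each individual line.
$result = $lines | ForEach-Object { $re.Replace($_, ',', 3) }

$result
# 5/10,RED,SYSID,This is a long message
# 5/11,BLUE,SYSID,Another long message
```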
