I have a CSV file that uses a field delimiter and a quote character other than the defaults. I know there is an option for specifying a different delimiter, but I cannot find out how to get rid of the quote characters.
Import-Csv 'C:\test.txt' -Delimiter "(character U+0014 is used here, won't show here)"
But the quote character is U+00FE, and I need to remove it as well so that I get the text without any special characters. I do not want to write this out to a new file; I want to import the CSV into a variable so I can do some analysis on it, for example to see whether a field is empty.
Any ideas?
The delimiter is not actually a problem, as you can do that with
-Delimiter "$([char]0x14)"
As for the quotes, you can use a preprocessing step and then use ConvertFrom-Csv instead of Import-Csv:
Get-Content test.txt |
ForEach-Object { $_ -replace ([char]0xFE) } | # to remove the “quotes”
ConvertFrom-CSV -Delimiter "$([char]0x14)"
If your lines contain embedded quotes, it needs a bit more work, and it is probably easiest to force-quote every field:
$14 = "$([char]0x14)"
$_ -replace ([char]0xFE) -replace '"', '""' -replace "(?<=^|$14)|(?=`$|$14)", '"'
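A minimal sketch of the force-quoting idea, run against an in-memory sample instead of a file. The sample rows, the column names (Name, Note) and the variable names are made up for illustration, and it assumes no field contains an embedded line break. Rather than the lookaround pattern above, it splits each line on the delimiter and quotes the fields one by one, which is a little longer but easier to follow.

```powershell
# Hypothetical sample: U+0014 separates fields, U+00FE wraps each field.
$sep   = [string][char]0x14
$quote = [string][char]0xFE

$lines = @(
    "${quote}Name${quote}${sep}${quote}Note${quote}"
    "${quote}Alice${quote}${sep}${quote}said `"hi`"${quote}"
    "${quote}Bob${quote}${sep}${quote}${quote}"
)

$rows = $lines |
    ForEach-Object {
        # Drop the U+00FE wrappers, double any embedded ", then wrap every field in ".
        $fields = ($_ -replace $quote) -split $sep |
            ForEach-Object { '"' + ($_ -replace '"', '""') + '"' }
        $fields -join $sep
    } |
    ConvertFrom-Csv -Delimiter $sep

# Rows whose Note field is empty.
$rows | Where-Object { [string]::IsNullOrEmpty($_.Note) }
```

To read from a real file, replace $lines with Get-Content on that file; nothing is written back to disk.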
I have this script to trim leading spaces, remove " and , from a txt file.
But I couldn't get the result written to an output file.
$text = Get-Content output.txt
$text -replace '["]','' -replace '[,]','' | Foreach {write-host $_.TrimStart()}
This is the result I'm getting. I want it written to a file instead of displayed like this.
PS C:\Users\aa1\temp> $text -replace '["]','' -replace '[,]','' | Foreach {write-host $_.TrimStart()}
AutoScalingGroupName: asg1
MinSize: 1
AutoScalingGroupName: asg2
MinSize: 1
AutoScalingGroupName: asg3
MinSize: 1
AutoScalingGroupName: asg4
MinSize: 1
AutoScalingGroupName: asg5
MinSize: 3
PS C:\Users\aa1\temp>
If you read the contents of the file as one multiline string using parameter -Raw instead of a string array, you can do all replacements without using a loop like this:
(Get-Content -Path 'output.txt' -Raw) -replace '(?m)[",]|^\s+' | Set-Content -Path 'output.txt'
Regex details:
(?m) Use these options for the whole regular expression: ^ and $ match at line breaks
[",] Match a single character from the list “",”
| Or match the alternative below
^ Assert position at the beginning of a line (at the beginning of the string or after a line break character)
\s Match a single “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
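To see what the pattern does without touching a file, here is a small sketch on an in-memory string. The sample text is an assumption about what the input file looks like before cleanup.

```powershell
# Hypothetical two-line input with leading spaces, double quotes and trailing commas.
$raw = "    `"AutoScalingGroupName`": `"asg1`",`n    `"MinSize`": 1,`n"

# Same pattern as above: strip " and , anywhere, plus whitespace at the start of each line.
$clean = $raw -replace '(?m)[",]|^\s+'
$clean
```

Because \s also matches line breaks, ^\s+ removes blank lines as well as leading indentation, which may or may not be what you want.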
Currently I have a script that searches a directory and finds all instances of the word "dummy". It then outputs the FileName, Path, LineNumber and Line to a CSV file.
The Line column contains very standardized results like:
Hi I am a dummy, who are you?
Something dummy, blah blah?
Lastly dummy, how is your day?
I am trying to find a way to output an additional column in my CSV that contains all of the characters after "dummy," and before the "?".
Resulting lines would be:
who are you
blah blah
how is your day
I tried to use split but it keeps removing additional characters. Is it possible to find the index of "dummy," and "?" and then substring out the middle portion?
Any help would be greatly appreciated.
Code as it stands:
Write-Host "Hello, World!"
# path
$path = 'C:\Users\Documents\4_Testing\fe\*.ts'
# pattern to find dummy
$pattern = "dummy,"
Get-ChildItem -Recurse -Path $path | Select-String -Pattern $pattern |
Select-Object FileName,Path,LineNumber,Line,
@{name='Function';expression={
$_.Line.Split("dummy,")
}} |
Export-Csv 'C:\Users\User\Documents\4_Testing\Output1.csv' -NoTypeInformation
Write-Host "Complete"
Use the -replace regex operator to replace the whole line with just the part between dummy, and ?:
PS ~> 'Hi I am a dummy, who are you?' -replace '^.*dummy,\s*(.*)\?\s*$', '$1'
who are you
So your calculated property definition should look like this:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*)\?\s*$', '$1' }}
The pattern used above describes:
^ # start of string
.* # 0 or more of any character
dummy, # the literal substring `dummy,`
\s* # 0 or more whitespace characters
( # start of capture group
.* # 0 or more of any character
) # end capture group
\? # a literal question mark
\s* # 0 or more whitespace characters
$ # end of line/string
If you also want to remove everything after the first ?, change the pattern slightly:
@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*?)\?.*$', '$1' }}
Adding the metacharacter ? to .* makes the subexpression lazy: the regex engine tries to match as few characters as possible, so we only capture up to the first ?.
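The difference between the two patterns only shows on a line with more than one question mark. A quick sketch, using a made-up sample line:

```powershell
# Hypothetical line containing two question marks.
$line = 'Hi I am a dummy, who are you? Really?'

# Greedy capture: keeps everything up to the last ? on the line.
$greedy = $line -replace '^.*dummy,\s*(.*)\?\s*$', '$1'

# Lazy capture: stops at the first ? and discards the rest.
$lazy = $line -replace '^.*dummy,\s*(.*?)\?.*$', '$1'

$greedy
$lazy
```

Note that -replace returns the input unchanged when the pattern does not match, so lines without "dummy," or without a question mark pass through as they are.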
I want to remove the first and the last double quote of each line in a CSV input file and save the output to the same file, with UCS-2 LE BOM encoding, using PowerShell.
The sample csv dataset is (input.csv):
"1","2","3"
The output csv(input.csv):
1","2","3
I have used
$csv = 'input.csv'
(Get-Content $csv) -replace '(?m)"([^,]*?)"(?=,|$)', '$1' |
Set-Content $csv
but it removes the double quotes from the first and the last element.
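One possible approach, as a sketch rather than a definitive answer: anchor the pattern to the start and end of each line so that only the outermost quotes are removed. It assumes the requested encoding means UTF-16 LE with a byte order mark (what some editors label "UCS-2 LE BOM"), which is what -Encoding Unicode writes. The temporary file path is a placeholder for illustration.

```powershell
# Check the pattern on the sample row from the question.
$sample  = '"1","2","3"'
$trimmed = $sample -replace '^"|"$'

# Apply it to a file in place. The parentheses make Get-Content finish reading
# before Set-Content starts writing to the same path.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'input-demo.csv'   # placeholder path
Set-Content -Path $csv -Value $sample
(Get-Content -Path $csv) -replace '^"|"$' |
    Set-Content -Path $csv -Encoding Unicode   # UTF-16 LE with BOM
```

Get-Content passes an array of lines to -replace, so ^ and $ apply to each line separately and no (?m) option is needed.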
I have a database full of .pdf and .dwf files.
I need to rename these.
The files are named as follows:
123456 text text.pdf
And should look like this:
123456000_text_text.pdf
I can replace the spaces with the following command:
dir | rename-item -NewName {$_.name -replace " ","_"}
Now I need a command to insert "0" three times after the first 6 digits.
Can someone help me?
Thanks already
You need to restrict this to *.pdf and *.dwf files whose names start with 6 digits followed by a space character. Then you can use regex replacements like this:
Get-ChildItem -Path D:\Test -File | Where-Object { $_.Name -match '^\d{6} .*\.(dwf|pdf)$' } |
Rename-Item -NewName { $_.Name -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_'}
Before:
D:\TEST
123456 text text.dwf
123456 text text.pdf
123456 text text.txt
After:
D:\TEST
123456 text text.txt
123456000_text_text.dwf
123456000_text_text.pdf
Regex details of filename -match:
^ Assert position at the beginning of the string
\d Match a single digit 0..9
{6} Exactly 6 times
(space) Match the character “ ” literally
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
( Match the regular expression below and capture its match into backreference number 1
Match either the regular expression below (attempting the next alternative only if this one fails)
dwf Match the characters “dwf” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
pdf Match the characters “pdf” literally
)
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
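Before renaming real files, the filter and the two replacements can be tried on plain strings. A minimal sketch, using the three file names from the listing above:

```powershell
$names = '123456 text text.dwf', '123456 text text.pdf', '123456 text text.txt'

$renamed = $names |
    Where-Object { $_ -match '^\d{6} .*\.(dwf|pdf)$' } |
    ForEach-Object {
        # ${1} keeps the group number separate from the zeros that follow;
        # '$1000_' would be read as a reference to group 1000.
        $_ -replace '^(\d{6}) ', '${1}000_' -replace '\s+', '_'
    }

$renamed
```

Rename-Item also accepts -WhatIf, which prints the planned renames without performing them.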
What you have is 123456 text text.pdf
Want it to look like 123456000_text_text.pdf
A systematic way to achieve this would be:
$const = "123456 text text.pdf"
$filename = $const -replace " ","_"
$temp = $filename.split("_")[0]
$rep1 = ([string]$temp).PadRight(9,'0')
$output = $filename -replace $temp,$rep1
Write-Host $output -ForegroundColor Green
The great thing about this method is that PadRight always pads the leading number with trailing 0s to a length of 9 characters, however many digits it had to begin with.
I have a file that looks like this. There are many lines in this format.
5/10 RED SYSID This is a long message
I would like to have these lines be in 4 comma-separated columns.
5/10,RED,SYSID,This is a long message
How can I replace only the first three spaces with commas?
You can do this with the PowerShell -split and -join operators.
$line -split ' ',4 -join ','
This example converts the first three spaces into commas. The number is the maximum count of substrings to return, so -split ' ',4 splits the string into an array of four elements at the first three spaces and leaves the rest of the line in the last element. Then -join ',' rejoins them into one string with a comma between each.
The above won't work if your input has multiple spaces between fields, since each space is considered separately, or if your fields are separated by other whitespace such as tabs. Instead, use a regex split.
$line -split '\s+',4,"RegexMatch" -join ','
This example treats the first three matches of \s+ as delimiters, converting each run of consecutive whitespace into a single comma.
To run against every line in a file, use Get-Content and ForEach-Object:
Get-Content $filename | foreach {
$_ -split '\s+',4,"RegexMatch" -join ','
} | Out-File $newfilename
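A quick way to check the limit argument of -split is to run it on the sample line. The number is the maximum count of substrings returned, so four columns need a limit of 4. The second input below, with a tab and repeated spaces, is a made-up variation.

```powershell
$line = '5/10 RED SYSID This is a long message'

# Split on single spaces, returning at most 4 substrings.
$parts   = $line -split ' ', 4
$csvLine = $parts -join ','

# Hypothetical variant with a tab and several spaces between fields.
$messy    = "5/10`tRED   SYSID This is a long message"
$csvMessy = $messy -split '\s+', 4, 'RegexMatch' -join ','

$csvLine
$csvMessy
```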
The following regex should do what you want.
$line -replace '^(\S+?) (\S+?) (\S+?) (.*)','$1,$2,$3,$4'
This captures three groups of non-whitespace characters separated by single spaces, plus a fourth group containing the remainder of the string. Then it replaces the whole match with those same four groups separated by commas.
To use this to modify every matching line in a file, pipe Get-Content through ForEach-Object and finally to Out-File:
$pattern = '^(\S+?) (\S+?) (\S+?) (.*)'
Get-Content $filename | foreach {
$_ -replace $pattern, '$1,$2,$3,$4'
} | Out-File $newfilename
Any lines the regex does not match are sent to the output file unchanged, including lines that contain tabs instead of spaces. If you need to detect this in your script, first test $_ -match $pattern and take appropriate action if that returns false.
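The match test mentioned above could look roughly like this. It is a sketch on in-memory lines; the second sample line (tab-separated) and the choice to emit a warning are assumptions for illustration.

```powershell
$pattern = '^(\S+?) (\S+?) (\S+?) (.*)'
$lines = @(
    '5/10 RED SYSID This is a long message'
    "5/11`tBLUE`tSYSID`tTabs here"          # hypothetical tab-separated line
)

$result = foreach ($l in $lines) {
    if ($l -match $pattern) {
        $l -replace $pattern, '$1,$2,$3,$4'
    }
    else {
        # Warnings go to the warning stream, so they do not end up in $result.
        Write-Warning "Line not converted: $l"
        $l
    }
}

$result
```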
This might be what you're looking for.
Replace the first occurence of a string in a file
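The relevant code is this (note that it joins the file into one string, so it replaces the first three spaces in the whole file rather than in each line):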
$re = [regex]' '
$re.Replace([string]::Join("`n", (gc C:\Path\To\test.txt)), ',', 3)
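To get three replacements per line instead, the same Regex.Replace overload with a count can be called once for each line. A minimal sketch on in-memory lines; the second sample line is made up.

```powershell
$re = [regex]' '
$lines = @(
    '5/10 RED SYSID This is a long message'
    '5/11 BLUE SYSID Another long message'   # hypothetical second line
)

# Replace at most 3 matches in each line, counting from the start of that line.
$converted = $lines | ForEach-Object { $re.Replace($_, ',', 3) }
$converted
```

With a file, pipe Get-Content into the same ForEach-Object block.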