I am looking for a way to parse a text file and place the results into an array in PowerShell.
I know Select-String -Path -Pattern will get all strings that match a pattern. But what if I already have a structured text file, perhaps pipe-delimited, with a new entry on each line? Like so:
prodServ1a
prodServ1b
prodServ1c
C:\dir\serverFile.txt
How can I place each of those servers into an array in powershell that I can loop through?
You say 'pipe delimited, like so', but your example isn't pipe delimited. Assuming it is, you need the Import-Csv cmdlet. E.g., if the data file contains this:
prodServ1a|4|abc
prodServ1b|5|def
prodServ1c|6|ghi
then this code:
$data = Import-Csv -Path test.dat -Header "Product","Cost","SerialNo" -Delimiter "|"
will import and split it, and add headers:
$data
Product    Cost SerialNo
-------    ---- --------
prodServ1a 4    abc
prodServ1b 5    def
prodServ1c 6    ghi
Then you can use
foreach ($item in $data) {
    $item.SerialNo
}
If your data is flat or unstructured but follows a pattern, such as fields delimited by a space or comma with no carriage returns or line feeds, you can use the Split() method.
PS>$data = "prodServ1a prodServ1b prodServ1c"
PS>$data
prodServ1a prodServ1b prodServ1c
PS>$data.Split(" ")
prodServ1a
prodServ1b
prodServ1c
This works particularly well when someone sends you a list of IP addresses separated by commas.
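For instance, a minimal sketch of that case (the addresses and the ping check are invented for illustration):
# A hypothetical comma-separated list of IP addresses
$ipList = "10.0.0.1,10.0.0.2,10.0.0.3"

# Split on the comma to get an array, then loop over it
foreach ($ip in $ipList.Split(",")) {
    Test-Connection -ComputerName $ip -Count 1 -Quiet
}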
Example of what I am trying to do.
I have CSV_A.CSV, which contains a list of keywords (each on its own line) like: apple, orange, pear. Note that each keyword occurs exactly once in the TEXT_FILE.
I have a text file TEXT_FILE.TXT that has 1000s of lines. I need a script that will search TEXT_FILE for apple, then orange, then pear, and return the matching line as well as the next 5 lines.
So the end result would be a file that contains 18 lines: 6 for each of the 3 keywords.
Currently I have tried the following code and it gives me the first line for each keyword, but nothing more.
# path
$path = 'C:\Users\Documents\4_Testing\TEXT_FILE.TXT'
Import-Csv .\CSV_A.csv | ForEach-Object {
    Get-ChildItem -Path $path |
        Select-String -Pattern "$($_.KeywordColumn)\(" -Context 0, 5 |
        Select-Object Line |
        Add-Content -Path 'C:\Users\Documents\4_Testing\Output.csv'
}
Change this line:
Select-Object Line |
to:
ForEach-Object { @($_.Line; $_.Context.PostContext) } |
This way, each match will produce an array of six strings: the matching line and the five lines following it.
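Put together, a sketch of the corrected pipeline (the paths and the KeywordColumn header come from the question):
$path = 'C:\Users\Documents\4_Testing\TEXT_FILE.TXT'

Import-Csv .\CSV_A.csv | ForEach-Object {
    Get-ChildItem -Path $path |
        Select-String -Pattern "$($_.KeywordColumn)\(" -Context 0, 5 |
        ForEach-Object { @($_.Line; $_.Context.PostContext) } |
        Add-Content -Path 'C:\Users\Documents\4_Testing\Output.csv'
}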
Goal
Using PowerShell, find a string in a file, run a simple transformation script on the string, and replace the original string with the new string in the same file
Details
The file is a Markdown file with one or more HTML blocks inside.
The goal is to make the entire file Markdown with no HTML.
Pandoc is a command-line tool that easily transforms HTML to Markdown.
The transformation script is a Pandoc script.
Pandoc alone cannot transform a Markdown file that includes HTML to Markdown.
Each HTML block is one long string with no line breaks (see example below).
The HTML is a little rough and sometimes not valid; despite this, Pandoc handles much of the transformation successfully. This may not be relevant.
I cannot change the fact that the file is generated originally as part Markdown/part HTML, that the HTML is sometimes invalid, or that each HTML block is all on one line.
PowerShell is required because that's the scripting language my team supports.
Example file of mixed Markdown/HTML code; most HTML is invalid
# Heading 1
Text
# Heading 2
<h3>Heading 3</h3><p>I am all on one line</h><span><div>I am not always valid HTML</div></span><br><h4>Heading 4<h4><ul><li>Item<br></li><li>Item</li><ul><span></span><img src="url" style="width:85px;">
# Heading 3
Text
# Heading 4
<h2>Heading 1</h2><div>Text</div><h2>Heading 2</h2><div>Text</div>
# Heading 5
<div><ul><li>Item</li><li>Item</li><li>Item</li></ul></div><code><pre><code><div>Code line 1</div><div>Code line 2</div><div>Code line 3</div></code></pre></code>
Text
Code for transformation script
pandoc -f html -t 'markdown_strict-raw_html-native_divs-native_spans-bracketed_spans' --atx-headers
Attempts
I surrounded each HTML block with a <start> and <end> tag, with the goal of extracting the text between those tags with a regex, running the Pandoc script on it, and replacing the original text. My plan was to use a foreach loop to iterate through the blocks one by one.
This attempt transforms the HTML to Markdown, but does not return the original Markdown with it:
$file = 'file.md'
$regex = '<start>.*?<end>'
$a = Get-Content $file -Raw
$a | Select-String $regex -AllMatches | ForEach-Object {$_.Matches.Value} | pandoc -f html -t 'markdown_strict-raw_html-native_divs-native_spans-bracketed_spans' --atx-headers
This poor attempt seeks to perform the replace, but only returns the original file with no changes:
$file = 'file.md'
$regex = '<start>.*?<end>'
$content = Get-Content $file -Raw
$a = $content | Select-String $regex -AllMatches
$b = $a | ForEach-Object {$_.Matches } | Foreach-Object {$_.Value} | Select-Object | pandoc -f html -t 'markdown_strict-raw_html-native_divs-native_spans-bracketed_spans' --atx-headers
$content | ForEach-Object {
    $_ -replace $a, $b
}
I am struggling to move beyond these attempts. I am new to PowerShell. If this approach is entirely wrong, I would be grateful to know. Thank you for any advice.
Given the line-oriented nature of your input, you can process your input file line by line and decide for each line whether it needs transformation or not:
$file = 'file.md'
(Get-Content $file | ForEach-Object {
    if ($_ -match '^<') { # Is this an HTML line? - you could make this regex stricter
        $_ | pandoc -f html -t 'markdown_strict-raw_html-native_divs-native_spans-bracketed_spans' --atx-headers
    } else { # A non-HTML line, pass through as-is
        $_
    }
}) | Set-Content -Encoding Utf8 $file # be sure to choose the desired encoding
Note the (...) around the pipeline before Set-Content: it ensures that $file is read into memory in full, up front, which is what allows writing back to the same file. Do note that this convenient approach carries a slight risk of data loss if the command is interrupted before writing completes, so always create a backup of the input file first.
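For example, a one-line safeguard to run before the pipeline above (the .bak name is arbitrary):
Copy-Item $file "$file.bak"  # keep a backup copy before rewriting $file in place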
I have a file that looks like this. There are many lines in this format.
5/10 RED SYSID This is a long message
I would like these lines to be in 4 comma-separated columns.
5/10,RED,SYSID,This is a long message
How can I replace only the first three spaces with commas?
You can do this with the PowerShell -split and -join operators.
$line -split ' ',4 -join ','
This example converts the first three spaces into commas. -split ' ',4 splits the string into an array of four elements, separated at the first three spaces in the string; -join ',' then rejoins them into one string with a comma between each.
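Applied to the sample line from the question:
PS>$line = '5/10 RED SYSID This is a long message'
PS>$line -split ' ',4 -join ','
5/10,RED,SYSID,This is a long message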
The above won't work if your input has multiple spaces between fields, since each space is counted separately, or if your fields are separated by other whitespace such as tabs. Instead, use a regex split.
$line -split '\s+',4,"RegexMatch" -join ','
This example treats the first three matches of \s+ as delimiters, converting each run of consecutive whitespace into a single comma.
To run this against every line in a file, use Get-Content and ForEach-Object:
Get-Content $filename | foreach {
    $_ -split '\s+',4,"RegexMatch" -join ','
} | Out-File $newfilename
The following regex should do what you want.
$line -replace '^(\S+?) (\S+?) (\S+?) (.*)','$1,$2,$3,$4'
This captures three groups of non-whitespace characters separated by single spaces, plus a fourth group containing the remainder of the string, then replaces the line with those same four groups separated by commas.
To use this to modify every matching line in a file, pipe Get-Content through ForEach-Object and finally to Out-File:
$regex = '^(\S+?) (\S+?) (\S+?) (.*)'
Get-Content $filename | foreach {
    $_ -replace $regex, '$1,$2,$3,$4'
} | Out-File $newfilename
Any lines the regex does not match will be sent to the output file unchanged, including any lines that contain tabs instead of spaces. If you need to catch these in your script, first test $_ -match $regex and take appropriate action if it returns false.
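A minimal sketch of that check (the warning is just one way to handle a non-matching line):
Get-Content $filename | foreach {
    if ($_ -match $regex) {
        $_ -replace $regex, '$1,$2,$3,$4'
    } else {
        Write-Warning "Line did not match: $_"
        $_  # pass the line through unchanged
    }
} | Out-File $newfilename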
This might be what you're looking for.
Replace the first occurrence of a string in a file
The relevant code is this:
$re = [regex]' '
$re.Replace([string]::Join("`n", (gc C:\Path\To\test.txt)), ',', 3)
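Here Regex.Replace(input, replacement, count) caps the total number of replacements, so the 3 converts only the first three spaces in the joined text. To write the result back out, something along these lines (the output path is made up):
$re = [regex]' '
$joined = [string]::Join("`n", (gc C:\Path\To\test.txt))
$re.Replace($joined, ',', 3) | Set-Content C:\Path\To\test.csv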
I have a CSV file that uses quote and text delimiter characters other than the defaults. I know there is an option to specify a different delimiter, but I cannot find out how to get rid of the quote characters.
Import-Csv 'C:\test.txt' -Delimiter "(character U+0014 is used here, won't show here)"
But the quote character is U+00FE, and I need to remove it as well so I can get the text without any special characters. I do not want to write this out to a new file; I want to import the CSV into a variable so I can do some analytics on it, for example to see if a field is empty.
Any ideas?
The delimiter is not actually a problem, as you can do that with
-Delimiter "$([char]0x14)"
As for the quotes, you can use a preprocessing step and then use ConvertFrom-Csv instead of Import-Csv:
Get-Content test.txt |
    ForEach-Object { $_ -replace ([char]0xFE) } | # to remove the “quotes”
    ConvertFrom-Csv -Delimiter "$([char]0x14)"
If your lines contain embedded quotes, it needs a bit more work, and it's probably easier to just force-quote every field:
$14 = "$([char]0x14)"
$_ -replace ([char]0xFE) -replace '"', '""' -replace "(?<=^|$14)|(?=`$|$14)", '"'
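To illustrate, here is a sketch of what that does to a single line, using ';' in place of the real U+0014 delimiter so the example stays readable:
PS>$line = 'abc;de"f;ghi'
PS>$line -replace '"', '""' -replace "(?<=^|;)|(?=`$|;)", '"'
"abc";"de""f";"ghi"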
So let's say I have 5 files: f1, f2, f3, f4, f5. How can I remove the common strings (lines whose text is the same in all five files) from all 5 files and put them into a 6th file, f6? Please let me know.
Format of the files:
property.a.p1=some string
property.b.p2=some string2
.
.
.
property.zzz.p4=123455
So if the above is an excerpt from file 1, and files 2 to 5 also contain the line property.a.p1=some string, then I'd like to remove that line from files 1 to 5 and put it in file 6. Each entry is on its own line, so I would be comparing the files line by line. Each file is around 400 to 600 lines.
I found this on a forum for removing common strings from two files using Ruby:
$ ruby -ne 'BEGIN {a=File.read("file1").split(/\n+/)}; print $_ if a.include?($_.chomp)' file2
See if this does what you want. It's a "2-pass" solution: the first pass uses a hash table to find the common lines, and the second uses that to filter out any lines that match them.
$files = gci "file1.txt","file2.txt","file3.txt","file4.txt","file5.txt"
$hash = @{}
$common = new-object system.collections.arraylist

foreach ($file in $files) {
    get-content $file | foreach {
        $hash[$_]++
    }
}

# any line counted 5 times appeared in all 5 files
$hash.keys | % {
    if ($hash[$_] -eq 5) {[void]$common.add($_)}
}

$common | out-file common.txt

[regex]$common_regex = '^(' + (($common | foreach {[regex]::escape($_)}) -join '|') + ')$'

foreach ($file in $files) {
    $new_file = get-content $file | ? {$_ -notmatch $common_regex}
    $new_file | out-file "new_$($file.name)"
}
Create a table in an SQL database like this:
create table properties (
file_name varchar(100) not null, -- Or whatever sizes make sense
prop_name varchar(100) not null,
prop_value varchar(100) not null
)
Then parse your files with some simple regular expressions, or even just a split (here in Ruby, like the one-liner above):
prop_name, prop_value = line.strip.split('=')
dump the parsed data into your table, and do a bit of SQL to find the properties that are common to all files:
select prop_name, prop_value
from properties
group by prop_name, prop_value
having count(*) = $n
Where $n is replaced by the number of input files. Now you have a list of all the common properties and their values: write those to your new file, remove them from your properties table, then spin through the rows left in properties and write them to the appropriate files (i.e. the file named by the file_name column).
You say that the files are "huge", so you probably don't want to slurp all of them into memory at the same time. You could do multiple passes and use a hash-on-disk library to keep track of what has been seen and where, but that would be a waste of time if you have an SQL database around, and everyone should have at least SQLite kicking around. Managing large amounts of structured data is what SQL and databases are for.