I have the code below to convert an LDIF file (over 100,000 lines) to a CSV file (over 4,000 lines), but I'm not sure I'm happy with the time it takes. Then again, I don't really know how long it should take; maybe that's a normal time on my laptop (Core i5 7th Gen, 16GB RAM, SSD drive)?
Would there be any room for improvement, especially in the parsing, which takes about 30 seconds?
# Reducing & editing data to process:
# -----------------------------------
$original = Get-Content $IN_ldif_file
$reduced = (($original | select-string -pattern '^cust[A-Z]','^$' -CaseSensitive).Line) -replace ':: ', ': ' -replace '^cust',''
"Writing reduced LDIF file..." # < 1 sec
(Measure-Command { Set-Content $reducedLDIF -Value $reduced -Encoding UTF8 }).TotalSeconds
# Parsing the relevant data:
# --------------------------
$inData = New-Object -TypeName System.IO.StreamReader -ArgumentList $reducedLDIF
$a = @{} # initialize the temporary hash
$lineNum = $rcdNum = 0 # initialize the counters
"Parsing reduced LDIF file..." # 27-36 sec
(Measure-Command {
# Begin reading and processing the input file:
$results = while (-not $inData.EndOfStream)
{
$line = $inData.ReadLine()
Write-Verbose "$("{0:D4}" -f ++$lineNum)|$("{0:D4}|" -f $rcdNum)$line"
if (($line -match "^\s*$") -or $inData.EndOfStream )
{
# blank line or end of stream - dump the hash as an object and reinit the hash
[PSCustomObject]$a
$a = @{}
$rcdNum++
} else {
# build up hash table for the object
$key, $value = $line -split ": "
$a[$key] = $value
}
}
$inData.Close()
}).TotalSeconds
# Populating & writing the CSV file:
# ----------------------------------
"Populating the CSV data..." # 7-11 sec
(Measure-Command {
$out = $results |
select "Attribute01",
"Attribute02",
"Attribute03",
<# etc... #>
#{n="Attribute39"; E={$_."Attribute20"}}, # Attribute39 (not in LDIF) takes value of Attribute20
"Attribute40"
}).TotalSeconds
"Writing CSV file..." # < 1 sec
(Measure-Command { $out | Export-CSV $OUT_csv_file -NoTypeInformation }).TotalSeconds
Note: I actually don't need to export the "$reduced" data to a file (e.g. "$reducedLDIF"), but the piece of code I found for the parsing seems to require a file.
Thanks!
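As an aside on that file requirement: System.IO.StringReader exposes the same ReadLine() method as StreamReader, so here is a hedged sketch of a file-free variant that feeds the in-memory $reduced lines straight to the existing loop (note StringReader has no EndOfStream property, so the loop tests ReadLine() for $null instead):
# Minimal sketch, assuming $reduced holds the trimmed lines built above
$inData = New-Object System.IO.StringReader (($reduced -join [Environment]::NewLine))
while ($null -ne ($line = $inData.ReadLine())) {
    # ... same per-line parsing logic as in the loop above ...
}
$inData.Dispose()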
So I found a way to cut the parsing time by almost half, by re-using the data in the $reduced variable that's already in memory:
$a = @{} # initialize the temporary hash
$lineNum = $rcdNum = 0 # initialize the counters
"Parsing reduced LDIF file..."
(Measure-Command {
$results = ForEach ($line in $reduced) {
Write-Verbose "$("{0:D6}" -f ++$lineNum)|$("{0:D4}|" -f $rcdNum)$line"
if ($line -match "^\s*$")
{ # blank line or end of stream - dump the hash as an object and reinit the hash
[PSCustomObject]$a
$a = @{}
$rcdNum++
}
else {
# build up hash table for the object
$key, $value = $line -split ": "
$a[$key] = $value
}
}
}).TotalSeconds
This is already more acceptable (about 16 sec instead of 30).
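For what it's worth, one more hedged sketch: switch -Regex dispatches each line against the patterns in a single statement and often beats a foreach with per-line -match tests, though only Measure-Command on the real data will tell. It also caps -split at 2 pieces, so values that themselves contain ': ' survive intact.
$a = @{}
$results = switch -Regex ($reduced) {
    '^\s*$' {
        # blank line: emit the accumulated record and start a new one
        if ($a.Count) { [PSCustomObject]$a; $a = @{} }
    }
    default {
        $key, $value = $_ -split ': ', 2
        $a[$key] = $value
    }
}
# flush the last record in case the data doesn't end with a blank line
if ($a.Count) { $results += [PSCustomObject]$a }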
I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop through the data files and check whether the columns are the same in the metadata file (since incoming data files could have new columns added at any position without notice). If the columns in both files match, no action is to be taken. If the data file has more columns than the metadata file, those columns should be deleted from the incoming data file. Any help would be appreciated. Thanks!
Data file is ps_job.txt
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
Meta data file is ps_job_metadata.dat
“empid”|”name”|”zipcode”|”salary”
I would like my output to be
“empid”|”name”|”zipcode”|”salary”
“1”|”Tom”|”11111″|”1000″
“2”|”Ann”|”22222″|”2000″
That's a seemingly simple question with a very complicated answer. However, I've broken down the code for what you will need to do. Here are the steps that need to happen in order for powershell to do everything you're asking of it.
Read the .dat file
Save the .dat data into an object
Read the .txt file
Save the .txt header into an object
Check for the differences
Delete the old text file (that had too many columns)
Create a new text file with the new columns
I've made some assumptions in how this looks. However, with the way I've structured the code, it should be easy enough to make modifications as necessary if my assumptions are wrong. Here are my assumptions:
The text file will always have all of the columns that the DAT file has (even though it will sometimes have more)
The dat file is structured like a text file and can be directly imported into powershell.
And here is the code, with comments. I've done my best to explain the purpose of each section, but I've written this with the expectation that you have a basic knowledge of powershell, especially arrays. If you have questions I'll do my best to answer, though I'll ask that you refer to the section of code you have questions on.
###
### The paths. I'm sure you will have multiples of each file. However, I didn't want to attempt to pull in
### the files with this sample code as it can vary so much in your environment.
###
$dat = "C:\StackOverflow\thingy.dat"
$txt = "C:\stackoverflow\ps_job.txt"
###
### This is the section to process the DAT file
###
# This will read the file and put it in a variable
$dat_raw = get-content -Path $dat
# Now, let's separate out the punctuation and give us our object
$dat_array = $dat_raw.split("|")
$dat_object = @()
foreach ($thing in $dat_array)
{
$dat_object+=$thing.Replace("""","")
}
###
### This is the section to process the TXT file
###
# This will read the file and put it into a variable
$txt_raw = get-content -Path $txt
# Now, let's separate out the punctuation and give us our object
$txt_header_array = $txt_raw[0].split("|")
$txt_header_object = @()
foreach ($thing in $txt_header_array)
{
$txt_header_object += $thing.Replace("""","")
}
###
### Now, let's figure out which columns we're eliminating (if any)
###
$x = 0
$total = $txt_header_object.count
$to_keep = @()
While ($x -lt $total)
{
if ($dat_object -contains $txt_header_object[$x])
{
$to_keep += $x
}
$x++
}
### Now that we know which columns to keep, we can apply the changes to each line of the text file.
### We will save each line to a new variable. Then, once we have the new variable, we will delete
### the existing file and replace it with a new file that has only the data we want. Note: we will
### only run this code if there's a difference in the files.
if ($total -ne $to_keep.count)
{
### This first section will go line by line and 'fix' the number of columns
$new_text_file = @()
foreach ($line in $txt_raw)
{
if ($line.Length -gt 0)
{
# Blank out the array each time
$line_array = @()
foreach ($number in $to_keep)
{
$line_array += ($line.split("|"))[$number]
}
$new_text_file += $line_array -join "|"
}
else
{
$new_text_file +=""
}
}
### This second section will delete the original file and replace it with our good
### file that has been created.
Remove-item -Path $txt
$new_text_file | out-file -FilePath $txt
}
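Since the paths above point at single files, a hypothetical driver loop for the folder A / folder B layout in the question might look like this (the folder paths and the metadata naming pattern are assumptions to adjust):
Get-ChildItem -Path 'C:\FolderA' -Filter '*.txt' | ForEach-Object {
    $txt = $_.FullName
    $dat = Join-Path 'C:\FolderB' ($_.BaseName + '_metadata.dat')
    if (Test-Path $dat) {
        # ... run the processing code above against this $dat / $txt pair ...
    } else {
        Write-Warning "No metadata file found for $($_.Name)"
    }
}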
This small example can be a start for your solution:
$ps_job = Import-Csv D:\ps_job.txt -Delimiter '|'
$ps_job_metadata = (Get-Content D:\ps_job_metadata.txt) -split '\|' -replace '"'
# compare the job file's column names against the metadata columns
foreach( $d in (Compare-Object $ps_job[0].psobject.Properties.Name $ps_job_metadata))
{
if($d.SideIndicator -eq '<=')
{
$ps_job | %{ $_.psobject.Properties.Remove($d.InputObject) }
}
}
$ps_job | Export-Csv -Path D:\output.txt -Delimiter '|' -NoTypeInformation
I tried this and it works.
$outputFile = "C:\Script_test\ps_job_mod.dat"
$sample = Import-Csv -Path "C:\Script_test\ps_job.dat" -Delimiter '|'
$metadataLine = Get-Content -Path "C:\Script_test\ps_job_metadata.txt" -First 1
$desiredColumns = $metadataLine.Split("|").Replace("`"","")
$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Please note that the smart quotes are inconsistent across the rows and there are empty lines between the rows (I highly recommend reformatting/updating your question).
Anyways, as long as the quoting of the header is consistent between the two (ps_job.txt and ps_job_metadata.dat) files:
# $JobTxt = Get-Content .\ps_job.txt
$JobTxt = @'
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
'@
# $MetaDataTxt = Get-Content .\ps_job_metadata.dat
$MetaDataTxt = @'
“empid”|”name”|”zipcode”|”salary”
'@
$Job = ConvertFrom-Csv -Delimiter '|' $JobTxt
$MetaData = ConvertFrom-Csv -Delimiter '|' (@($MetaDataTxt) + 'x|') # append a dummy 'x|' data row so the header-only metadata still yields an object
$Job | Select-Object $MetaData.PSObject.Properties.Name
“empid” ”name” ”zipcode” ”salary”
------- ------ --------- --------
“1” ”Tom” ”11111″ ”1000″
“2” ”Ann” ”22222″ ”2000″
Here's the same answer I posted to your question on Powershell.org
$jobfile = "ps_job.dat"
$metafile = "ps_job_metadata.dat"
$outputfile = "some_file.csv"
$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|')
Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {
Select($line,$meta)
{
$meta | foreach{$this.add($_,(iex "`$line.$_"))}
}
ColumnSelector($line,$meta)
{
$this.select($line,$meta)
}
}
import-csv $jobfile -Delimiter '|' |
foreach{[pscustomobject]([columnselector]::new($_,$meta))} |
Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Output
PS C:\>Get-Content $outputfile
"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"
Provided you want to keep those curly quotes and your code page and console font support all the characters, you can do the following:
# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers
I want to write a PowerShell script which checks whether a network interface card that uses receive side scaling uses a processor with a NUMA (Non-Uniform Memory Access) distance > 0.
What I've done so far:
$name = "Ethernet"
$adapter = Get-NetAdapterRss -Name $name
This outputs the RSS-Adapter processor data (together with other information) like:
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0   0:2/0   0:4/0   0:6/0   0:8/0   0:10/0  0:12/0  0:14/0
                                                  0:16/0  0:18/0  0:20/0  0:22/0  0:24/32767  0:26/32767  0:28/32767  0:30/32767
                                                  0:32/32767  0:34/32767  0:36/32767  0:38/32767  0:40/32767  0:42/32767  0:44/32767  0:46/32767
As you see, the NUMA distance is the value behind the '/'.
Now I want to retrieve it like:
foreach($processor in $adapter.RssProcessorArray)
{
Write-Host $processor.ProcessorGroup
Write-Host $processor.ProcessorNumber
Write-Host $processor.??
}
Somehow there is no ".NumaDistance" property on the object I get. How can I get this value for each processor in the list?
Similar idea, but with regexp:
$str = (Get-NetAdapterRss -Name "Ethernet" | Out-String).Split("`n") | where {$_ -like 'RssProcessorArray*'}
$rss = $str | Select-String '\d+:\d+/\d+' -AllMatches
Write-Output $rss.Matches.Value
$rss.Matches.Value | foreach { ($_ -split "[:/]") -join "---" } # if you need each value separately
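And since the original goal was finding processors with a NUMA distance > 0, here is a hedged extension of the same idea, reusing $rss from above, that turns each Group:Number/Distance token into an object so the distance can be filtered numerically:
$rss.Matches.Value | ForEach-Object {
    $group, $number, $distance = $_ -split '[:/]'
    [PSCustomObject]@{
        ProcessorGroup  = [int]$group
        ProcessorNumber = [int]$number
        NumaDistance    = [int]$distance
    }
} | Where-Object { $_.NumaDistance -gt 0 }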
Using static data as an example, but hope this helps
$text = 'RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:2/0 0:4/0 0:6/0 0:8/0 0:10/0 0:12/0 0:14/0 0:16/0 0:18/0 0:20/0 0:22/0 0:24/32767 0:26/32767 0:28/32767 0:30/32767 0:32/32767 0:34/32767 0:36/32767 0:38/32767 0:40/32767 0:42/32767 0:44/32767 0:46/32767'
# split the text up on spaces
$firstSplit = $text.Split(' ')
# take all results starting at the first 0:0/0
# put into an array
[array]$processData = $firstSplit[4..($firstSplit.Count -1)]
# get just the data after the / for each item in the array
[array]$splitProcessData = $processData.split('/') | ? {$_ -notmatch ':'}
$i = 0
foreach($processor in $adapter.RssProcessorArray)
{
Write-Host $processor.ProcessorGroup
Write-Host $processor.ProcessorNumber
# the distances parsed above are in processor order, so pair them by index
Write-Host $splitProcessData[$i]
$i++
}
So I have a parser that goes through two different logs, both .csv files, and checks for certain lines based on the regex patterns I have chosen.
It grabs the IDNumber from the beginning of the filename (1234-randomfile.csv), adds the file's location to a variable ($Validate), then, based on the regex, adds files to certain variables ($Scriptdone, $Updatedone, $Failed) and starts the checks to see if they have them.
I am trying to make it so that the output is not line for line, as the files I parse through share IDNumbers. So for example:
Output Currently:
1234 Script Completed
1234 Update Completed
How I want output:
1234 Script Completed Update Completed
Anyways, Thanks for all the assistance!
function Get-MR4RES {
[CmdletBinding()]
param (
[Parameter(Position = 0,
Mandatory = $True)]
[ValidateNotNullorEmpty()]
[ValidateScript( {Test-Path -Path $_ -PathType 'Any'})]
[String]
$Files,
[Parameter(Position = 1,
Mandatory = $false)]
[String]
$CSVPath) # End Param
begin {
# Setting Global Variables
$Scriptcompletedsuccess = '.+Script\scompleted\ssuccessfully.+' # 3:44:15 End function called, Script completed successfully at 3:44:15 on Tue 07/03/2018
$Updatecomplete = '\w+\s+\:\s\[\d+\:\d+\:\d+\]\s+\w+\scomplete' # STATUS : [03:43:07] Update complete
$FailedValidaton = '.+check\sfail.+'
$Fail1 = 'Validation Failed'
$Fail2 = 'Failed'
$Good1 = 'Script completed'
$Good2 = 'Update completed'
$array = @('IDNumber, Results')
$counter = 0
$FileList = (Get-ChildItem -Path $Files -File -Filter "*.log").FullName
$Done = ''
} # End begin
process {
# Do the following code in all the files in the filelist
foreach ($File in $fileList) {
# Test files variables to ensure is directory to ensure progress bar will be operational and needed
if ((Get-Item $Files) -is [System.IO.DirectoryInfo]) {
# Counts once per each file variable in filelist variable
$counter++
# Progress bar indicates the name of the current file and calculates percent based on current count verses total files in $filelist
Write-Progress -Activity 'Analyzing Files' -CurrentOperation $File -PercentComplete (($counter / $FileList.count) * 100)
}
# Calculates ID number based on filename, file name is -filtered in beginning to only contain properly named files
$IDNumber = [System.IO.Path]::GetFileName("$File").split('-')[0]
# Puts file into Variable to be IF Else
$Validate = Get-Content -Path $File
$Scriptdone = $Validate | Where-Object {$_ -match $Scriptcompletedsuccess}
$Updatedone = $Validate | where-object {$_ -match $Updatecomplete}
$Failed = $Validate | Where-Object {$_ -match $FailedValidaton}
# Check if the file HAS a FAILED validation
if($Failed){
# Creates an array of the data from each file that failed
$array += -join ("$IDNumber",', ',"$Fail1")
}
Elseif($Scriptdone){
$Done = $Good1
# Creates an array of the data from each file that script completed
$array += -join ("$IDNumber",', ',"$Done")
} # if the parser found "Update complete"
Elseif($Updatedone){
$Done = $Good2
# Creates an array of the data from each file that update is done
$array += -join ("$IDNumber",', ',"$Done")
} # End of Successful
Else{
# Creates an array of the data from each file that failed
$array += -join ("$IDNumber",', ',"$Fail2")
}
} # End of foreach
} # End process section
End {
# If CSVPath is used in get-command
if ($PSBoundParameters.ContainsKey('CSVPath')) {
# Pipe the array data to a CSV
Add-Content -Path $CSVPath -Value $array -Encoding ascii
}
# If no CSVPath is used in get-command
else {
# Out-put to console
Write-Output $array
} # End of else
} # End of the End
} # End of function
If you want to append new message to existing output you have to tell PowerShell to which entry it should add new info. As manipulating strings is not very intuitive in my opinion I'd suggest to use an object for that.
First you have to define data structure:
# Before ForEach
$array = @()
$properties = @{'ID'="";
'Results'=""}
# In ForEach
$object = New-Object -TypeName PSObject -Prop $properties
$object.ID = $IDNumber
Next, in your if you can set the value (this can also be done using Switch as suggested by @LotPings, but let's leave it as it is for simplicity):
$object.Results = $Done # or $Fail1 or $Fail2
Then you should first check if the entry with such $ID already exists and if yes, add new result. If no, just add new element to the array. Something like this should work:
$line = $array | Where-Object ID -eq $object.id
if ($line) {
$line.Results += " $($object.Results)"
}
else {
$array += $object
}
Of course this will also require changing the way as you output you data (for example by using Export-Csv):
$array | Export-Csv $CSVPath -Append -NoTypeInformation
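Alternatively, here is a hedged sketch that keeps the loop itself simple and merges afterwards, assuming $array holds one ID/Results object per file as built above:
$array | Group-Object -Property ID | ForEach-Object {
    [PSCustomObject]@{
        ID      = $_.Name
        Results = ($_.Group.Results -join ' ')
    }
} | Export-Csv $CSVPath -NoTypeInformation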
I'm trying to make it so that each line from the first array, when read from a file, is replaced with the corresponding line from the second array, so sometimes with different lines. I made a script, but I do not understand why it does not work.
$OldStrings = @(
"desktopwidth:i:1440",
"desktopheight:i:900",
"winposstr:s:0,1,140,60,1596,999"
)
$NewStrings = @(
"desktopwidth:i:1734",
"desktopheight:i:990",
"winposstr:s:0,1,50,7,1800,1036"
)
$LinesArray = Get-Content -Path 'C:\temp\My Copy\Default.rdp'
$LinesCount = $LinesArray.Count
for ($i=0; $i -lt $LinesCount; $i++) {
foreach ($OldString in $OldStrings) {
foreach ($NewString in $NewStrings) {
if ($LinesArray[$i] -like $OldString) {
$LinesArray[$i] = $LinesArray[$i] -replace $OldString, $NewString
Write-Host "`nline" $i "takes on value:" $LinesArray[$i] "`n" -ForegroundColor Gray
}
}
}
}
Maybe the file is not being read correctly at all?
After executing the script, I see only
line 2 takes on value: desktopwidth:i:1734
line 3 takes on value: desktopwidth:i:1734
line 5 takes on value: desktopwidth:i:1734
You're looping through the string arrays twice: for every old string you try every new string, so mismatched pairs get applied. You want two loops: one over each line in the file AND one over the index of the strings you're replacing, so the old and new strings stay paired. I think this should work:
$OldStrings = @(
"desktopwidth:i:1440",
"desktopheight:i:900",
"winposstr:s:0,1,140,60,1596,999"
)
$NewStrings = @(
"desktopwidth:i:1734",
"desktopheight:i:990",
"winposstr:s:0,1,50,7,1800,1036"
)
$LinesArray = Get-Content -Path 'C:\temp\My Copy\Default.rdp'
# loop through each line
for ($i=0; $i -lt $LinesArray.Count; $i++)
{
for ($j=0;$j -lt $OldStrings.Count; $j++)
{
if ($LinesArray[$i] -match $OldStrings[$j])
{
$LinesArray[$i] = $LinesArray[$i] -replace $OldStrings[$j],$NewStrings[$j]
Write-Host "`nline" $i "takes on value:" $LinesArray[$i] "`n" -ForegroundColor Gray
}
}
}
$LinesArray | Set-Content -Path 'C:\temp\My Copy\Default.rdp'
You don't need to bother checking the lines for matches. Since you have the replacements ready, just do the replacements outright; it should be faster this way as well.
$stringReplacements = @{
"desktopwidth:i:1440" = "desktopwidth:i:1734"
"desktopheight:i:900" = "desktopheight:i:990"
"winposstr:s:0,1,140,60,1596,999" = "winposstr:s:0,1,50,7,1800,1036"
}
$path = 'C:\temp\My Copy\Default.rdp'
# Read the file in as a single string.
$fileContent = Get-Content $path | Out-String
# Iterate over each key value pair
$stringReplacements.Keys | ForEach-Object{
# Attempt the replacement for each key/pair search/replace pair
$fileContent = $fileContent.Replace($_,$stringReplacements[$_])
}
# Write changes back to file.
# $fileContent | Set-Content $path
$stringReplacements is a key value hash of search and replace strings. I don't see you writing the changes back to file so I left a line on the end for you to uncomment.
You could add in checks to do the replacements still if you value the write-host lines but I figured that was for debugging and you already know how to do that.
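If those feedback lines are worth keeping, here is a hedged variant of the loop above that only replaces and reports when the search string is actually present (String.Contains is a plain, non-regex check):
$stringReplacements.Keys | ForEach-Object {
    if ($fileContent.Contains($_)) {
        Write-Host "Replacing '$_' with '$($stringReplacements[$_])'" -ForegroundColor Gray
        $fileContent = $fileContent.Replace($_, $stringReplacements[$_])
    }
}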
I'm trying to format large text files (~300MB) with up to 3 columns:
12345|123 Main St, New York|91110
23456|234 Main St, New York
34567|345 Main St, New York|91110
And the output should be:
000000000012345,"123 Main St, New York",91110,,,,,,,,,,,,
000000000023456,"234 Main St, New York",,,,,,,,,,,,,
000000000034567,"345 Main St, New York",91110,,,,,,,,,,,,
I'm new to powershell, but I've read that I should avoid Get-Content so I am using StreamReader. It is still much too slow:
function append-comma{} # helper stub to append the correct number of commas to each line
$separator = '|'
$infile = "\large_data.csv"
$outfile = "new_file.csv"
$target_file_in = New-Object System.IO.StreamReader -Arg $infile
If ($header -eq 'TRUE') {
$firstline = $target_file_in.ReadLine() #skip header if exists
}
while (!$target_file_in.EndOfStream ) {
$line = $target_file_in.ReadLine()
$a = $line.split($separator)[0].trim()
$b = ""
$c = ""
if ($dataType -eq 'ECN'){$a = $a.padleft(15,'0')}
if ($line.split($separator)[1].length -gt 0){$b = $line.split($separator)[1].trim()}
if ($line.split($separator)[2].length -gt 0){$c = $line.split($separator)[2].trim()}
$line = $a +',"'+$b+'","'+$c +'"'
$line -replace '(?m)"([^,]*?)"(?=,|$)', '$1' |append-comma >> $outfile
}
$target_file_in.close()
I am building this for other people on my team and wanted to add a gui using this guide:
http://blogs.technet.com/b/heyscriptingguy/archive/2014/08/01/i-39-ve-got-a-powershell-secret-adding-a-gui-to-scripts.aspx
Is there a faster way to do this in Powershell?
I wrote a script using Linux bash(Cygwin64 on Windows) and a separate one in Python. Both ran much faster, but I am trying to script something that would be "approved" on a Windows Platform.
All that splitting and replacing costs you way more time than you gain from the StreamReader. The code below cut execution time to ~20% of the original for me:
$separator = '|'
$infile = "\large_data.csv"
$outfile = "new_file.csv"
if ($header -eq 'TRUE') {
$linesToSkip = 1
} else {
$linesToSkip = 0
}
Get-Content $infile | select -Skip $linesToSkip | % {
[int]$a, [string]$b, [string]$c = $_.split($separator)
'{0:d15},"{1}",{2},,,,,,,,,,,,,' -f $a, $b.Trim(), $c.Trim()
} | Set-Content $outfile
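A further hedged tweak, untested on a file this size: reading the whole file in one go with -Raw and splitting on line breaks usually beats streaming Get-Content line by line.
# One read instead of per-line streaming; drop empty lines before parsing
$lines = (Get-Content $infile -Raw) -split "`r?`n" |
    Select-Object -Skip $linesToSkip | Where-Object { $_ }
$lines | ForEach-Object {
    [int]$a, [string]$b, [string]$c = $_.Split($separator)
    '{0:d15},"{1}",{2},,,,,,,,,,,,,' -f $a, $b.Trim(), $c.Trim()
} | Set-Content $outfile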
How does this work for you? I was able to read and process a 35MB file in about 40 seconds on a cheap ole workstation.
File Size: 36,548,820 bytes
Processed In: 39.7259722 seconds
Function CheckPath {
[CmdletBinding()]
param(
[Parameter(Mandatory=$True,
ValueFromPipeline=$True)]
[string[]]$Path
)
BEGIN {}
PROCESS {
IF ((Test-Path -LiteralPath $Path) -EQ $False) {Write-host "Invalid File Path $Path"}
}
END {}
}
$infile = "infile.txt"
$outfile = "restult5.txt"
#Check File Path
CheckPath $InFile
#Initiate StreamReader
$Reader = New-Object -TypeName System.IO.StreamReader($InFile);
#Create New File Stream Object For StreamWriter
$WriterStream = New-Object -TypeName System.IO.FileStream(
$outfile,
[System.IO.FileMode]::Create,
[System.IO.FileAccess]::Write);
#Initiate StreamWriter
$Writer = New-Object -TypeName System.IO.StreamWriter(
$WriterStream,
[System.Text.Encoding]::ASCII);
If ($header -eq $True) {
$Reader.ReadLine() |Out-Null #Skip First Line In File
}
while ($Reader.Peek() -ge 0) {
$line = $Reader.ReadLine() #Read Line
$Line = $Line.split('|') #Split Line
$OutPut = "$($($line[0]).PadLeft(15,'0')),`"$($Line[1])`",$($Line[2]),,,,,,,,,,,,"
$Writer.WriteLine($OutPut)
}
$Reader.Close();
$Reader.Dispose();
$Writer.Flush();
$Writer.Close();
$Writer.Dispose();
$endDTM = (Get-Date) #Get Script End Time For Measurement
Write-Host "Elapsed Time: $(($endDTM-$startDTM).totalseconds) seconds" #Echo Time elapsed
Regex is fast:
$infile = ".\large_data.csv"
gc $infile|%{
$x=if($_.indexof('|')-ne$_.lastindexof('|')){
$_-replace'(.+)\|(.+)\|(.+)',('$1,"$2",$3'+','*12)
}else{
$_-replace'(.+)\|(.+)',('$1,"$2"'+','*14)
}
('0'*(15-($x-replace'([^,]),.+','$1').length))+$x
}
I have another approach. Let powershell read the input file as a csv file, with a pipe character as delimiter. Then format the output the way you want it. I have not tested this for speed with large files.
$infile = "\large-data.csv"
$outfile = "new-file.csv"
import-csv $infile -header id,addr,zip -delimiter "|" |
% {'{0},"{1}",{2},,,,,,,,,,,,,' -f $_.id.padleft(15,'0'), $_.addr.trim(), $_.zip} |
set-content $outfile
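And since the speed here is untested, here is a quick harness in the spirit of the earlier answers, reusing $infile and $outfile from above, to compare any of the variants on real data:
(Measure-Command {
    Import-Csv $infile -Header id,addr,zip -Delimiter '|' |
        ForEach-Object { '{0},"{1}",{2},,,,,,,,,,,,,' -f $_.id.PadLeft(15,'0'), $_.addr.Trim(), $_.zip } |
        Set-Content $outfile
}).TotalSeconds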