UNIX format files with Powershell - windows

How do you create a file in Unix format in PowerShell? I am using the following to create a file, but it always creates it in Windows format.
"hello world" | out-file -filepath test.txt -append
As I understand it, the CRLF newline characters are what make it a Windows-format file, whereas the Unix format needs only an LF at the end of each line. I tried replacing the CRLF with the following, but it didn't work:
"hello world" | %{ $_.Replace("`r`n","`n") } | out-file -filepath test.txt -append

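The replace above has no effect because the string "hello world" contains no CRLF at all; Out-File appends the Windows line terminator itself after each string it writes. A minimal sketch that appends LF-terminated lines instead, assuming PowerShell 5.0 or later (for Add-Content -NoNewline). Add-UnixLine is a hypothetical helper name, not a built-in cmdlet.

```powershell
function Add-UnixLine {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string[]]$Line
    )
    # Join the lines with LF and terminate the last one with LF as well;
    # -NoNewline stops Add-Content from appending its own CRLF
    Add-Content -Path $Path -Value (($Line -join "`n") + "`n") -NoNewline
}

# Usage: appends "hello world" followed by a single LF
# Add-UnixLine -Path test.txt -Line 'hello world'
```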
There is a cmdlet called ConvertTo-UnixLineEnding in the PowerShell Community Extensions (PSCX).

One ugly-looking answer is (taking input from dos.txt outputting to unix.txt):
[string]::Join( "`n", (gc dos.txt)) | sc unix.txt
But I would really like Set-Content to do this by itself. Also, this solution does not stream, so it does not work well on large files, and it ends the file with a DOS line ending, so it is not a complete solution.

I've found that the solution posted above,
sc unix.txt ([byte[]][char[]] "$contenttext") -Encoding Byte
fails on encoding conversions in some cases (and -Encoding Byte exists only in Windows PowerShell; PowerShell 6 and later use -AsByteStream instead).
So here is yet another solution (a bit more verbose, but it works directly on bytes):
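function ConvertTo-LinuxLineEndings {
    param([Parameter(Mandatory)][string]$Path)
    # .NET methods resolve relative paths against the process directory, not the PowerShell location
    $Path = Convert-Path -LiteralPath $Path
    $oldBytes = [IO.File]::ReadAllBytes($Path)
    if (-not $oldBytes.Length) { return }
    $cr = [byte][char]"`r"
    $lf = [byte][char]"`n"
    # The output can never be longer than the input
    $newBytes = New-Object byte[] $oldBytes.Length
    $newLength = 0
    for ($i = 0; $i -lt $oldBytes.Length - 1; $i++) {
        # Drop a CR only when it is immediately followed by an LF
        if ($oldBytes[$i] -eq $cr -and $oldBytes[$i + 1] -eq $lf) { continue }
        $newBytes[$newLength] = $oldBytes[$i]
        $newLength++
    }
    # The last byte can never be the CR of a CRLF pair, so always keep it
    $newBytes[$newLength] = $oldBytes[$oldBytes.Length - 1]
    $newLength++
    [Array]::Resize([ref]$newBytes, $newLength)
    [IO.File]::WriteAllBytes($Path, $newBytes)
}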

Create your file in the Windows CRLF format, then convert all lines to Unix format in a new file:
$streamWriter = New-Object System.IO.StreamWriter("\\wsl.localhost\Ubuntu\home\user1\.bashrc2")
$streamWriter.NewLine = "`n"
gc "\\wsl.localhost\Ubuntu\home\user1\.bashrc" | % {$streamWriter.WriteLine($_)}
$streamWriter.Flush()
$streamWriter.Close()
Not a one-liner, but it works for all lines, including the last one. The new file now shows as Unix (LF) in Notepad on Windows 11.
If you like, delete the original file and rename the new file to the original name:
ri "\\wsl.localhost\Ubuntu\home\user1\.bashrc" -Force
rni "\\wsl.localhost\Ubuntu\home\user1\.bashrc2" "\\wsl.localhost\Ubuntu\home\user1\.bashrc"
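The same StreamWriter idea can be wrapped in a reusable function that streams the input with [System.IO.File]::ReadLines and closes the writer even if an error occurs. This is a sketch under two assumptions: the source file is UTF-8 or ASCII (ReadLines decodes UTF-8 unless a byte order mark says otherwise), and the output should be UTF-8 without a BOM. ConvertTo-UnixFile is a hypothetical name.

```powershell
function ConvertTo-UnixFile {
    param(
        [Parameter(Mandatory)][string]$Source,
        [Parameter(Mandatory)][string]$Destination
    )
    # .NET APIs resolve relative paths against the process directory, so resolve them here
    $src = Convert-Path -LiteralPath $Source
    $dst = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)
    # UTF8Encoding($false) means UTF-8 without a byte order mark
    $encoding = New-Object System.Text.UTF8Encoding($false)
    $writer = New-Object System.IO.StreamWriter($dst, $false, $encoding)
    try {
        $writer.NewLine = "`n"
        # ReadLines yields one line at a time and accepts CRLF, LF or CR terminators
        foreach ($line in [System.IO.File]::ReadLines($src)) {
            $writer.WriteLine($line)
        }
    }
    finally {
        # Dispose flushes and closes the file even if the loop throws
        $writer.Dispose()
    }
}
```

Like the loop above, it ends every line, including the last one, with LF, whether or not the source file ended with a newline.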

Two more examples of how you can replace CRLF with LF:
Example (Set-Content -NoNewline requires PowerShell 5.0 or later):
(Get-Content -Raw test.txt) -replace "`r`n","`n" | Set-Content test.txt -NoNewline
Example:
[IO.File]::WriteAllText('C:\test.txt', ([IO.File]::ReadAllText('C:\test.txt') -replace "`r`n","`n"))
Be aware that this really just replaces CRLF with LF. You might need to add a trailing LF if your Windows file does not end with a CRLF.
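To cover the missing trailing newline as well, the replacement can be wrapped in a small helper that appends an LF when the text does not already end with one. This is a sketch; ConvertTo-LfText is a hypothetical name, not a built-in cmdlet.

```powershell
function ConvertTo-LfText {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Text)
    # Replace CRLF pairs only; a lone CR or LF is left as it is
    $converted = $Text -replace "`r`n", "`n"
    # Append a trailing LF if the (non-empty) text does not already end with one
    if ($converted.Length -gt 0 -and -not $converted.EndsWith("`n")) {
        $converted += "`n"
    }
    $converted
}

# Usage with the .NET example:
# [IO.File]::WriteAllText('C:\test.txt', (ConvertTo-LfText ([IO.File]::ReadAllText('C:\test.txt'))))
```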

Related

Powershell 7.x How to Select a Text Substring of Unknown Length Only Using Boundary Substrings

I am trying to store a substring of a text file that is delimited by a known beginning and a known end. I am new to PowerShell, so my methods are simple/crude. Basically, my approach has been:
Roughly get what I want from the start of the string
Worry about trimming off what I don't want later
My minimal reproducible example is as follows:
# selectStringTest.ps1
$inputFile = Get-Content -Path "C:\test\test3\Copy of 31832_226140__0001-00006.txt"
# selected text string needs to span from $refName up to $boundaryName
[string]$refName = "001 BARTLETT"
[string]$boundaryName = "001 BEECH"
# a rough estimate of the text file lines required
[int]$lines = 200
if (Select-String -InputObject $inputFile -pattern $refName) {
    Write-Host "Selected shortened string found!"
    # this selects the start of required string but with extra text
    [string]$newFileStart = $inputFile | Select-String $refName -CaseSensitive -SimpleMatch -Context 0, $lines
}
else {
    Write-Host "Selected string NOT FOUND."
}
# tidy up the start of the string by removing rubbish
$newFileStart = $newFileStart.TrimStart('> ')
# this is the kind of thing I want but it doesn't work
$newFileStart = $newFileStart - $newFileStart.StartsWith($boundaryName)
$newFileStart | Out-File tempOutputFile
As it stands, the output begins correctly, but I cannot remove the text from $boundaryName onward.
The original text file is generated by OCR (optical character recognition), so it is unevenly formatted, with newlines in odd places. That limits my options for delimiting.
I am not sure whether my if (Select-String -InputObject $inputFile -pattern $refName) test is valid, although it appears to work correctly. The general design seems crude, in that I am guessing how many lines I will need. Finally, I have tried various methods of trimming the string from $boundaryName onward, without success. For this:
string.Split() is not practical
replacing spaces with newlines in an array and looping through to the elements of $boundaryName is possible, but I don't know how to truncate the array at that point before converting it back to a string.
Any suggestions would be appreciated.
Abbreviated content of the single file Copy of 31832_226140__0001-00006.txt, which holds two lists of 200 listings, is:
Beginning of text file
________________
BARTLETT-BEDGGOOD
PENCARROW COMPOSITE ROLL
PAGE 6
PAGE 7
PENCARROW COMPOSITE ROLL
BEECH-BEST
www.
.......................
001 BARTLETT. Lois Elizabeth
Middle of text file
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
001 BEECH, Margaret ..........
End of text file
..............312 Munita Rood Eastbourne, Civil Eng 200 BEST, Dons Amy .........
..........50 Man Street, Wamuomata, Marned
SO NON
To use a regex across newlines, the file needs to be read as a single string, which Get-Content -Raw does. This assumes that you do not want the lines containing refName and boundaryName included in the output.
$c = Get-Content -Path '.\beech.txt' -Raw
$refName = "001 BARTLETT"
$boundaryName = "001 BEECH"
# Escape the names so characters such as '.' or '(' are matched literally;
# \r?\n accepts both CRLF and LF line endings
$pattern = '(?smi).*{0}.*?\r?\n(.*){1}.*?\r?\n.*' -f [regex]::Escape($refName), [regex]::Escape($boundaryName)
if ($c -match $pattern) {
    $result = $Matches[1]
}
$result
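If you would rather avoid regular expressions, the same cut can be made with plain string methods on the -Raw content: find the start marker, then the first boundary marker after it, and take the substring in between. This is a sketch; Get-TextBetween is a hypothetical helper name. Unlike the regex answer, it keeps the start line and drops everything from the boundary marker onward, which is what the question's own attempt was aiming for. Comparisons are ordinal (case-sensitive), like the question's Select-String -CaseSensitive.

```powershell
function Get-TextBetween {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][string]$Start,
        [Parameter(Mandatory)][string]$End
    )
    $i = $Text.IndexOf($Start, [StringComparison]::Ordinal)
    if ($i -lt 0) { return $null }
    # Look for the boundary only after the start marker
    $j = $Text.IndexOf($End, $i + $Start.Length, [StringComparison]::Ordinal)
    if ($j -lt 0) { return $null }
    # Keep the start marker, drop the boundary marker and everything after it
    $Text.Substring($i, $j - $i)
}

# Usage:
# $raw = Get-Content -Raw -LiteralPath 'C:\test\test3\Copy of 31832_226140__0001-00006.txt'
# Get-TextBetween -Text $raw -Start '001 BARTLETT' -End '001 BEECH' | Out-File tempOutputFile
```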
More information at https://stackoverflow.com/a/12573413/447901
How close does this come to what you want?
function Process-File {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$HeadText,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$TailText,
        [Parameter(ValueFromPipeline)]
        $File
    )
    Process {
        $Inside = $false;
        switch -Regex -File $File.FullName {
            #'^\s*$' { continue }
            "(?i)^\s*$TailText(?<Tail>.*)`$" { $Matches.Tail; $Inside = $false }
            '^(?<Line>.+)$' { if($Inside) { $Matches.Line } }
            "(?i)^\s*$HeadText(?<Head>.*)`$" { $Matches.Head; $Inside = $true }
            default { continue }
        }
    }
}
$File = 'Copy of 31832_226140__0001-00006.txt'
#$Path = $PSScriptRoot
$Path = 'C:\test\test3'
$Result = Get-ChildItem -Path "$Path\$File" | Process-File '001 BARTLETT' '001 BEECH'
$Result | Out-File -FilePath "$Path\SpanText.txt"
This is the output:
. Lois Elizabeth
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
, Margaret ..........

Append forward slash to the end of a certain line in a file

I have a file (Flags.txt) that looks like this:
...
C_INCLUDES = ... ... .../xxx
...
CXX_INCLUDES = ... ... .../yyy
where the line with C_INCLUDES can end with any string (here, e.g., xxx).
At the end, the file shall look like this:
...
C_INCLUDES = ... ... .../xxx/
...
CXX_INCLUDES = ... ... .../yyy
So I want to use a Windows batch file (sed and awk are not available) to search for C_INCLUDES and append a forward slash to the end of that line (although it could be any symbol, e.g. "xxxz" or "xxx!"). How can I do this?
I tried the solution from:
https://social.technet.microsoft.com/Forums/scriptcenter/en-US/fa09e27d-9f6b-4d4e-adda-f0663e0a9dde/append-string-to-text-file-at-end-of-line-starting-with-blah?forum=ITCG
$original = "flags.txt"
$tempfile = "tmp.txt"
get-content $original | foreach-object {
    if ($_ -match "^C_INCLUDES") {
        $_ + "/" >> $tempfile
    }
    else {
        $_ >> $tempfile
    }
}
copy-item $tempfile $original
remove-item $tempfile
But it doesn't work.
Thanks
You imply you cannot use 3rd party (non-native) exe files such as sed. But you can use a batch file.
So you should have no problem using JREPL.BAT - a regular expression find/replace text processing utility. JREPL is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward - no 3rd party exe file required.
Full documentation is available from the command line via jrepl /?, or jrepl /?? for paged help.
Once you have JREPL.BAT, then the following one liner is all that is needed. It looks for any line that begins with C_INCLUDES and doesn't already end with /, and appends / to any line that matches.
jrepl "^C_INCLUDES .*(?=.$)[^/]" "$&/" /f "Flags.txt" /o -
Since JREPL is a batch script, you must use call jrepl if you put the command within another batch script.
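If PowerShell is available after all, the attempt in the question can be made to work without a temporary file. One likely problem with it is that in Windows PowerShell 5.1 the >> operator writes UTF-16 text, which many build tools cannot read. A sketch that rewrites the file in ASCII and, like the JREPL command, skips lines that already end with the suffix. Add-SuffixToMatchingLine is a hypothetical name.

```powershell
function Add-SuffixToMatchingLine {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$Pattern = '^C_INCLUDES',
        [string]$Suffix = '/'
    )
    # The parentheses make Get-Content read the whole file before it is overwritten
    $updated = foreach ($line in (Get-Content -LiteralPath $Path)) {
        if ($line -match $Pattern -and -not $line.EndsWith($Suffix)) {
            $line + $Suffix
        }
        else {
            $line
        }
    }
    # Write plain ASCII instead of the UTF-16 that '>>' produces in Windows PowerShell
    Set-Content -LiteralPath $Path -Value $updated -Encoding ASCII
}

# Usage: Add-SuffixToMatchingLine -Path .\Flags.txt
```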

Powershell - Read a single text file and sort contents to multiple files based on text within the line

I'm looking for some direction on how to read a file line by line and copy each line, based on search criteria, to a newly created file. Since my description is probably poor, I've tried to illustrate below:
Single Text File Sample:
Name=N0060093G
Name=N0060093H
Name=N400205PW
Name=N400205PX
Name=N966O85Q0
Name=N966O85Q1
The script would read each line, take the three characters ("###") after "Name=N" as an identifier, create a file named after that identifier, and copy each matching line into it. So the lines "Name=N0060093G" and "Name=N0060093H" would go to "006.txt"; "Name=N400205PW" and "Name=N400205PX" would go to "400.txt", etc.
A RegEx style approach:
$File = 'test.txt'
Get-Content $File | ForEach {
    If ($_ -match '^Name\=N(?<filename>\d{3}).*') {
        # -WhatIf only reports what would happen; remove it to actually write the files
        $_ | Out-File -Append "$($Matches.Filename).txt" -WhatIf
    }
}
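Out-File -Append opens and closes the target file once per line. As an alternative sketch, the lines can be grouped by their three-character identifier with Group-Object and each group written in a single call. Split-NameFile is a hypothetical name; unlike -Append, Set-Content overwrites any existing ###.txt file, and the pattern assumes the identifier is always three digits, as in the sample.

```powershell
function Split-NameFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$OutDir
    )
    # "Name=N" is 6 characters long, so the identifier is Substring(6, 3)
    $groups = Get-Content -LiteralPath $Path |
        Where-Object { $_ -match '^Name=N\d{3}' } |
        Group-Object { $_.Substring(6, 3) }
    foreach ($g in $groups) {
        # One write per identifier, e.g. 006.txt, 400.txt, 966.txt
        $g.Group | Set-Content -LiteralPath (Join-Path $OutDir "$($g.Name).txt")
    }
}

# Usage: Split-NameFile -Path .\test.txt -OutDir .
```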

Awk command for powershell

Is there a command like awk in PowerShell?
I want to convert this awk command:
awk '
BEGIN {count=1}
/^Text/{text=$0}
/^Time/{time=$0}
/^Rerayzs/{retext=$0}
{
    if (NR % 3 == 0) {
        printf("%s\n%s\n%s\n", text, time, retext) > (count ".txt")
        count++
    }
}' file
into a PowerShell command.
Usually we like to see what you have tried. It at least shows that you are making an effort, and we aren't just doing your work for you. I think you're new to PowerShell, so I'm going to just spoon-feed you an answer, hoping that you use it to learn and expand your knowledge, and hopefully have better questions in the future.
I am pretty sure that this will accomplish the same thing as what you laid out. You have to give it input (the contents of a text file, an array of strings, something like that), and it will generate one file for each complete trio of "Text", "Time", and "Rerayzs" lines it finds. Each file lists them in that order: Text, then Time on a new line, then Rerayzs on a new line.
$Text,$Time,$Retext = $Null
$FileCounter = 1
gc c:\temp\test.txt|%{
    Switch($_){
        {$_ -match "^Text"} {$Text = $_}
        {$_ -match "^Time"} {$Time = $_}
        {$_ -match "^Rerayzs"} {$Retext = $_}
    }
    If($Text -and $Time -and $Retext){
        ("{0}`n{1}`n{2}") -f $Text,$Time,$Retext > "c:\temp\$FileCounter.txt"
        $FileCounter++
        $Text,$Time,$Retext = $Null
    }
}
That will get the text of a file C:\Temp\Test.txt and will output numbered files to the same location. The file I tested against is:
Text is good.
Rerayzs initiated.
Stuff to not include
Time is 18:36:12
Time is 20:21:22
Text is completed.
Rerayzs failed.
I was left with 2 files as output. The first reads:
Text is good.
Time is 18:36:12
Rerayzs initiated.
The second reads:
Text is completed.
Time is 20:21:22
Rerayzs failed.
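Note that awk's NR % 3 == 0 test fires on every third input line, whether or not all three patterns have been seen, and awk's /^Text/ match is case-sensitive, so the switch-based script is not an exact equivalent (with the sample file above, awk would write its first file with an empty Time line). A closer line-by-line translation, as a sketch assuming PowerShell 5.0 or later (for Set-Content -NoNewline); Export-TripletFile is a hypothetical name.

```powershell
function Export-TripletFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$OutDir
    )
    # awk treats uninitialized variables as empty strings
    $text = ''; $time = ''; $retext = ''
    $count = 1   # BEGIN {count=1}
    $nr = 0      # awk's NR, the current record number
    foreach ($line in (Get-Content -LiteralPath $Path)) {
        $nr++
        # -cmatch is case-sensitive, like awk's /^Text/
        if ($line -cmatch '^Text') { $text = $line }
        if ($line -cmatch '^Time') { $time = $line }
        if ($line -cmatch '^Rerayzs') { $retext = $line }
        if ($nr % 3 -eq 0) {
            # printf("%s\n%s\n%s\n", text, time, retext) > (count ".txt")
            $target = Join-Path $OutDir "$count.txt"
            Set-Content -LiteralPath $target -Value "$text`n$time`n$retext`n" -NoNewline
            $count++
        }
    }
}

# Usage: Export-TripletFile -Path c:\temp\test.txt -OutDir c:\temp
```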

Powershell: Count instances of strings in a file using a list

I am trying to count, efficiently, how many times each string in "file1" (40 to 400+ characters long) occurs in "file2". file1 has about 2k lines and file2 has about 130k lines. I currently have a Unix solution that does it in about 2 minutes in a VM and about 5 in Cygwin, but I am trying to do it with PowerShell/Python, since the files are on Windows and I use the output in Excel and with automation (AutoIt).
I have a PowerShell solution, but it takes WAY too long: in about the time Cygwin took to finish all 2k lines, PowerShell had processed only 40-50 lines!
Although I haven't prepared one yet, I am also open to a Python solution if it is fast and accurate.
Here is the Unix Code:
while read SEARCH_STRING;
do printf "%s$" "${SEARCH_STRING}";
grep -Fc "${SEARCH_STRING}" file2.csv;
done < file1.csv | tee -a output.txt;
And here is the Powershell code I currently have
$Target = Get-Content .\file1.csv
Foreach ($line in $Target){
    #Just to keep strings small, since I found that not all
    #strings were being compared correctly if they were 250+ chars
    $line = $line.Substring(0,180)
    $Coll = Get-Content .\file2.csv | Select-string -pattern "$line"
    $cnt = $Coll | measure
    $cnt.count
}
Any ideas of suggestions will help.
Thanks.
EDIT
I'm trying a modified solution suggested by C.B.
del .\output.txt
$Target = Get-Content .\file1.csv
$file= [System.IO.File]::ReadAllText( "C:\temp\file2.csv" )
Foreach ($line in $Target){
    $line = [string]$line.Substring(0, $line.length/2)
    $cnt = [regex]::matches( [string]$file, $line).count >> ".\output.txt"
}
But since the strings in file1 vary in length, I kept getting out-of-range exceptions from Substring, so I halved (/2) each input string to try to get a match. And when I halve them, if a string contains an opening parenthesis, I get this:
Exception calling "Matches" with "2" argument(s): "parsing "CVE-2013-0796,04/02/2013,MFSA2013-35 SeaMonkey: WebGL
crash with Mesa graphics driver on Linux (C" - Not enough )'s."
At C:\temp\script_test.ps1:6 char:5
+ $cnt = [regex]::matches( [string]$file, $line).count >> ".\output.txt ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentException
I don't know if there is a way to raise the input limit in PowerShell (my longest string at the moment is 406 characters, but it could be longer in the future) or whether I should just give up and try a Python solution.
Thoughts?
EDIT
Thanks to @C.B. I got the correct answer, and it matches the output of the Bash script perfectly. Here is the full code that writes the results to a text file:
$Target = Get-Content .\file1.csv
$file= [System.IO.File]::ReadAllText( "C:\temp\file2.csv" )
Foreach ($line in $Target){
    $cnt = [regex]::matches( $file, [regex]::escape($line)).count >> ".\output.txt"
}
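Give this a try:
$Target = Get-Content .\file1.csv
# Read file2 only once, as a single string
$file= [System.IO.File]::ReadAllText( "c:\test\file2.csv" )
Foreach ($line in $Target){
    # Escape() makes characters such as '(' match literally, so the
    # search strings no longer need to be shortened with Substring()
    [regex]::matches( $file, [regex]::escape($line)).count
}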
One issue with your script is that you read file2.csv over and over again, for each line from file1.csv. Reading the file just once and storing the content in a variable should significantly speed things up. Try this:
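$f2 = Get-Content .\file2.csv
foreach ($line in (gc .\file1.csv)) {
    # Escape the search string so it is matched literally;
    # -cmatch is case-sensitive, like grep -F
    $pattern = [regex]::Escape($line)
    @($f2 | ? { $_ -cmatch $pattern }).Count
}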
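Note that [regex]::Matches(...).Count counts every occurrence, whereas grep -Fc counts matching lines, so the two can differ when a search string appears more than once on the same line. A sketch that keeps the fast literal search over the whole file but counts distinct lines, and prints each result as string$count like the Unix loop. Get-GrepCount is a hypothetical name, and its speed on the real files is untested.

```powershell
function Get-GrepCount {
    param(
        [Parameter(Mandatory)][string[]]$SearchStrings,
        [Parameter(Mandatory)][string]$Text
    )
    $newline = [char]10
    foreach ($s in $SearchStrings) {
        $lines = 0
        $prevEnd = -1
        foreach ($m in [regex]::Matches($Text, [regex]::Escape($s))) {
            # Count the first match, and any match separated from the previous one by a newline
            if ($prevEnd -lt 0 -or $Text.IndexOf($newline, $prevEnd, $m.Index - $prevEnd) -ge 0) {
                $lines++
            }
            $prevEnd = $m.Index + $m.Length
        }
        # Same layout as printf "%s$" followed by the grep count
        '{0}${1}' -f $s, $lines
    }
}

# Usage:
# $file = [System.IO.File]::ReadAllText("C:\temp\file2.csv")
# Get-GrepCount -SearchStrings (Get-Content .\file1.csv) -Text $file | Set-Content .\output.txt
```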
