How to remove comments from text file - windows

My text file contains one-line comments that all begin with "// ": two forward slashes and a space. These may either take up the whole line or just the last part of a line. No comment extends beyond the line that it's on, so there are no /* */ type comments crossing multiple lines.
In simple terms, all comments start with "//space" anywhere on the line. Anything starting with "//space" should be removed and trailing spaces on that line should also be removed. Leading spaces should stay. Any blank lines should be removed.
Sample file:
// This is a comment
x = 1 // This is also a comment after the double slash
x = 2
x = 3 // The above is a blank line
// Comment on this record but nothing precedes it, so should be deleted.
y = 4 // A line with leading spaces that should be kept.
z = "//path"; // The first double slashes are not a comment since the space is missing after the "//"
// Last comment line.
Result file (no trailing spaces, but keep leading spaces):
x = 1
x = 2
x = 3
y = 4
z = "//path";
I can remove the blank lines using gc file.txt | Where-Object { $_ -ne ''} > result.txt. However I'm having trouble with reading just the beginning part of a line up to the "//" comment part.
I also tried findstr but haven't found how to read each line up to the "//" and then trim spaces out.
I could write a script to loop through the file and do this, but it seems like there should be a way to accomplish it using a simple one- or two-line PowerShell or bat file command.
What is the easiest way (shortest amount of code) to remove these comments while keeping the uncommented contents of the file?

Since you seem to equate "easy" with "short", here's a fairly simple solution:
gc .\samplefile.txt|%{$_-replace"(.*)(// .*)",'$1'}|?{$_}
if it's really that important to you :-)
A bit more verbose version (still using regex):
Get-Content .\samplefile.txt | Where-Object {
-not ([String]::IsNullOrEmpty($_.Trim()) -or $_-match"^\s*// ")
} |ForEach-Object { $_ -replace "(.*)(// .*)",'$1' }
That being said, I would (personally) go for a more verbose and easier-to-read/maintain solution:
To remove everything after //, the easiest way is to find the first occurrence of // with String.IndexOf() and then grab the first part with String.Substring():
PS C:\> $CommentedString = "Content // this is a comment"
PS C:\> $CommentIndex = $CommentedString.IndexOf('// ')
PS C:\> $CommentedString.Substring(0,$CommentIndex)
Content
For the indented comments you can also use String.Trim() to remove whitespace from the beginning and end of the string before testing it:
PS C:\> " // Indented comment".Trim() -match '^//'
True
You can use the ForEach-Object cmdlet to go through every line and apply the above:
function Remove-Comments {
param(
[string]$Path,
[string]$OutFile
)
# Read file, remove comments and blank lines
$CleanLines = Get-Content $Path |ForEach-Object {
$Line = $_
# Trim() removes whitespace from both ends of string
$TrimmedLine = $Line.Trim()
# Check if what's left is either nothing or a comment
if([string]::IsNullOrEmpty($TrimmedLine) -or $TrimmedLine -match "^// ") {
# if so, return nothing (inside ForEach-Object, "return" acts like "continue")
return
}
# See if non-empty line contains comment
$CommentIndex = $Line.IndexOf("// ")
if($CommentIndex -ge 0) {
# if so, remove the comment
$Line = $Line.Substring(0,$CommentIndex)
}
# return $Line (without trailing whitespace) to $CleanLines
return $Line.TrimEnd()
}
if($OutFile){
[System.IO.File]::WriteAllLines($OutFile, $CleanLines)
} else {
# No OutFile was specified, write lines to pipeline
Write-Output $CleanLines
}
}
Applied to your sample:
PS C:\> Remove-Comments D:\samplefile.txt
x = 1
x = 2
x = 3
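For comparison, the same rules (cut at the first "// ", strip trailing whitespace, keep leading whitespace, drop empty results) can be sketched in a few lines of Python. This is only an illustration under the assumption that Python is available; it is not part of the answers above, and the names COMMENT and strip_comments are made up for the example. The sample list restores the blank line and the leading spaces described in the question.

```python
import re

# Optional whitespace, then the first "// " and everything after it.
# "//" without a following space (as in "//path") does not match.
COMMENT = re.compile(r"\s*// .*$")

def strip_comments(lines):
    """Yield lines without '// ' comments or trailing whitespace; skip empty results."""
    for line in lines:
        cleaned = COMMENT.sub("", line.rstrip("\r\n")).rstrip()
        if cleaned:
            yield cleaned

sample = [
    "// This is a comment",
    "x = 1 // This is also a comment after the double slash",
    "x = 2",
    "",
    "x = 3 // The above is a blank line",
    "  y = 4 // A line with leading spaces that should be kept.",
    'z = "//path"; // The first double slashes are not a comment',
]
print("\n".join(strip_comments(sample)))
```

Because the regex search finds the leftmost "// ", a comment that itself contains "// " is still removed in full.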

Like a great many text processing problems, there is a simple solution using JREPL.BAT - a powerful regex text processing utility for the Windows command line. It is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.
jrepl "^(.*?)\s*// " "$1!=''?$1:false" /jmatch /f test.txt /o out.txt
You can overwrite the original file by specifying - as the output file:
jrepl "^(.*?)\s*// " "$1!=''?$1:false" /jmatch /f test.txt /o -
I've tested, and it gives the exact output you are looking for.
If you put the command within a batch script, then you must use call jrepl

The Batch file below does what you want. Sorry, but there is no "easy short code" way to do this...
@echo off
setlocal EnableDelayedExpansion
rem Set the maximum number of trailing spaces as a power_of_2-1 value. For example, for 15 spaces:
set spcPow2=4
set "spaces= "
for /L %%i in (1,1,%spcPow2%) do set "spaces=!spaces!!spaces!"
set /A spcPow2-=1
rem Process all lines, excepting empty ones and lines that start with "/"
setlocal DisableDelayedExpansion
for /F "eol=/ delims=" %%a in (test.txt) do (
set "line=%%a"
rem Split line at "// " and get the first part
setlocal EnableDelayedExpansion
for /F "delims=¡" %%b in ("!line:// =¡!") do (
endlocal
set "line=%%b"
)
rem Eliminate trailing spaces
setlocal EnableDelayedExpansion
set spc=0
for /L %%b in (%spcPow2%,-1,0) do (
set /A "newSpc=spc+(1<<%%b)"
for %%n in (!newSpc!) do if "!line:~-%%n!" equ "!spaces:~-%%n!" set "spc=%%n"
)
if !spc! gtr 0 for %%n in (!spc!) do set "line=!line:~0,-%%n!"
rem Show resulting line
if defined line echo !line!
endlocal
)
EDIT: New solution added
@set @x=1 // & CScript //nologo //E:JScript "%~F0" < samplefile.txt & goto :EOF
WScript.Stdout.Write(WScript.Stdin.ReadAll().replace(/(.*)\/\/ .*/g,"$1"))
Copy previous code into a file with .BAT extension, that is, it is a Batch file!

Related

If variable can't have spaces?

I am making an experimental program in batch for a simple chatting interface. In this one, I made a function where if there is the word r placed in chat, it ignores it and just redisplays the text file again. It works fine if I put r and it just refreshes, and if I put one word it works fine, but if I put a word and a space and another word, it breaks and shows the following error:
Chat(Put r for refresh):hey hi
hi was unexpected at this time.
Does anyone know how to fix this? Thanks.
Code:
@echo off
cls
cd %USERPROFILE%\Desktop\Chat
for /f "delims=" %%A in (chat.txt) do (
set %%A
)
echo %chatt%
echo %chatp%
echo %chatn%
cd %USERPROFILE%\Desktop\Chat\Servers\%chatt%
:1
cls
type %chatn%.chat
set /p in=Chat(Put r for refresh):
if %in% == r goto 1
echo %chatp%: %in%>>%chatn%.chat
goto 1
The usual way to deal with spaces in a string variable's contents is to wrap it in quotes. This is the case here. When you use the variable's contents with %in%, the contents are inserted verbatim, so the suspect line would look like this:
if hey hi == r goto 1
It starts off okay if hey but then instead of seeing a comparison operator like == it sees hi and chokes. So wrap it all in quotes:
if "%in%" == "r" goto 1
That way it will be interpreted like
if "hey hi" == "r" goto 1
and the bat engine will know that "hey hi" should be treated as one entity.

Batch file to read a txt with special characters and replace a word in it

I'm trying to make a batch file that reads a txt file "ayylmao.txt" and find a specific word "hello" and replaces it with "xello".
The thing is that the "ayylmao.txt" contains specific characters.
Ayylmao.txt looks something like this:
‹‹R‹Ę‹/M‹;Ču‹č˙˙˙‹‹#‰‹‹#CëC;Đu‹čq˙˙˙‹‹#C‹D$‰;7u®‹Ó‹Ćčúţ˙˙„Ŕu3Ŕ‰YZ]_^[ĂŤ# SVWUÄđ‰$‹ô‹‰D$‹
‹‹#;Č‚† ‹Ř‹>_‹ůz;ßrv;Ču!‹B‹A‹B‹)B‹x uV‹čđţ˙˙ëM‹Ř‹>_‹ůz;ßu
‹B‹)Bë3‹Z‰\$‹>‹‹.}+ű‰|$+Č‹‰HŤT$‹čMţ˙˙„Ŕu3 hello Ŕë°ë‹‹ ‰‹;D$…Y˙˙˙3ŔÄ]_^[ĂSVW‹Ú‹đţ }ľ ëĆ˙˙ ć ˙˙‰sjh Vj
You can see the "hello" word in the last line. I want the batch to go to the process and give me a ayylmao1.txt that looks like this:
‹‹R‹Ę‹/M‹;Ču‹č˙˙˙‹‹#‰‹‹#CëC;Đu‹čq˙˙˙‹‹#C‹D$‰;7u®‹Ó‹Ćčúţ˙˙„Ŕu3Ŕ‰YZ]_^[ĂŤ# SVWUÄđ‰$‹ô‹‰D$‹
‹‹#;Č‚† ‹Ř‹>_‹ůz;ßrv;Ču!‹B‹A‹B‹)B‹x uV‹čđţ˙˙ëM‹Ř‹>_‹ůz;ßu
‹B‹)Bë3‹Z‰\$‹>‹‹.}+ű‰|$+Č‹‰HŤT$‹čMţ˙˙„Ŕu3 xello Ŕë°ë‹‹ ‰‹;D$…Y˙˙˙3ŔÄ]_^[ĂSVW‹Ú‹đţ }ľ ëĆ˙˙ ć ˙˙‰sjh Vj
You can see that "hello" is now "xello".
I found this batch file that replaces a word from a text file:
@echo off
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL DISABLEDELAYEDEXPANSION
if "%~1"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %3|find /n /v """') do (
set "line=%%B"
if defined line (
call set "line=echo.%%line:%~1=%~2%%"
for /f "delims=" %%X in ('"echo."%%line%%""') do %%~X
) ELSE echo.
)
This code works very well for files that don't contain special characters if I use it like this:
code.bat "hello" "xello" "ayylmao.txt">"ayylmao1.txt"
With my file, though, it writes only a few of the special characters to ayylmao1.txt, although it does replace hello. I want all the special characters to be written there.
I made it like this:
chcp 1252
code.bat "hello" "xello" "ayylmao.txt">"ayylmao1.txt"
But it didn't work. It worked just like the first code.
If there is a way in PowerShell to do this I'd be glad to hear it.
What you have there looks like a binary file, not a text file, despite the extension. Batch is no good for editing binary files. In PowerShell it's doable, but you need to resort to working with the data bytes instead of simple text.
This is a basic example that will find the first occurrence of the string "hello" in your file and replace it with "xhello":
$f = 'C:\path\to\ayylmao.txt'
$stext = 'hello'
$rtext = [char[]]'xhello'
$len = $stext.Length
$offset = $len - 1
$data = [IO.File]::ReadAllBytes($f)
# find first occurrence of $stext in byte array
for ($i=0; $i -lt $data.Count - $offset; $i++) {
$slice = $data[$i..($i+$offset)]
if (-join [char[]]$slice -eq $stext) { break }
}
# Once you know the beginning ($i) and length ($len) of the array slice
# containing $stext you can "cut up" $data and concatenate the slices before
# and after $stext to the byte sequence you want to insert ($rtext):
#
# |<-- $stext -->|
# [...]['h','e','l','l','o'][...] <-- $data
# ^ ^ ^ ^
# | | | |
# | $i | $i+$len
# $i-1 $i+$offset (== $i+$len-1)
#
$rdata = $data[0..($i-1)] + [byte[]]$rtext + $data[($i+$len)..($data.Count-1)]
[IO.File]::WriteAllBytes($f, $rdata)
You'll need to adjust this code if you want the replacement to work differently (replace other occurrences as well, replace a different occurrence, …).
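The byte-level idea above (read the raw bytes, replace the byte sequence, write the bytes back) can also be sketched in Python, assuming Python is available; the thread itself only covers batch and PowerShell. The helper names replace_first and replace_in_file are hypothetical, and the sample bytes are invented for the demonstration.

```python
def replace_first(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old in data; every other byte is left untouched."""
    return data.replace(old, new, 1)

def replace_in_file(src_path: str, dst_path: str, old: bytes, new: bytes) -> None:
    """Read src_path as raw bytes, replace the first match, write the result to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(replace_first(data, old, new))

# Invented sample: non-text bytes around the word, as in the binary file above.
blob = b"\x8b\x8bR\x8b\xca u3 hello \xc0\xeb\xb0"
print(replace_first(blob, b"hello", b"xello"))
```

Working on bytes rather than decoded text avoids any code-page conversion, which is what loses the special characters in the batch approach. Dropping the count argument `1` would replace every occurrence instead of only the first.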
This batch code is coming from this site and there is a link to discussion why it doesn't work with special characters.
Yes, the PowerShell replace command can replace the string and keep the special characters. To call it from within your batch script, use the following line
powershell -command "(get-content Ayylmao.txt) -replace 'hello','xello' | set-content Ayylmao.txt"
If you want to enter your parameters from the command line, then the line would be
powershell -command "(get-content %3) -replace '%1','%2' | set-content %4"
And if you want to use variables defined in the batch script, it is the same as you would for any batch script
set file=Ayylmao.txt
set Search_criteria=hello
set Replace_criteria=xello
powershell -command "(get-content %file%) -replace '%Search_criteria%','%Replace_criteria%' | set-content %file%"

Get last n lines or bytes of a huge file in Windows (like Unix's tail). Avoid time consuming options

I need to retrieve the last n lines of huge files (1-4 Gb), in Windows 7.
Due to corporate restrictions, I cannot run any command that is not built-in.
The problem is that all solutions I found appear to read the whole file, so they are extremely slow.
Can this be accomplished, fast?
Notes:
I managed to get the first n lines, fast.
It is ok if I get the last n bytes. (I used this https://stackoverflow.com/a/18936628/2707864 for the first n bytes).
Solutions here Unix tail equivalent command in Windows Powershell did not work.
Using -wait does not make it fast. I do not have -tail (and I do not know if it will work fast).
PS: There are quite a few related questions for head and tail, but not focused on the issue of speed. Therefore, useful or accepted answers there may not be useful here. E.g.,
Windows equivalent of the 'tail' command
CMD.EXE batch script to display last 10 lines from a txt file
Extract N lines from file using single windows command
https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent
powershell to get the first x MB of a file
https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command
If you have PowerShell 3 or higher, you can use the -Tail parameter for Get-Content to get the last n lines.
Get-content -tail 5 PATH_TO_FILE;
On a 34MB text file on my local SSD, this returned in 1 millisecond vs. 8.5 seconds for get-content |select -last 5
How about this (reads last 8 bytes for demo):
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
$fs.ReadByte()
}
UPDATE. To interpret bytes as string (but be sure to select correct encoding - here UTF8 is used):
$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
UPDATE 2. To read last M lines, we'll be reading the file by portions until there are more than M newline char sequences in the result:
$M = 3
$fpath = "C:\10GBfile.dat"
$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size
$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
$fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
$fs.Read($buffer, 0, $buffer_size) | Out-Null
$result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()
($result -split $seq) | Select -Last $M
Try playing with bigger $buffer_size - this ideally is equal to expected average line length to make fewer disk operations. Also pay attention to $seq - this could be \r\n or just \n.
This is very dirty code without any error handling and optimizations.
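The read-backwards-in-chunks technique from UPDATE 2 can be written more compactly in Python. This is a sketch only: it assumes Python is available (which the asker's built-in-only restriction would rule out), and tail_lines and chunk_size are names chosen for the illustration. It works on any seekable binary stream, so the demonstration uses an in-memory buffer instead of a real file.

```python
import io
import os

def tail_lines(stream, m, chunk_size=4096):
    """Return the last m lines of a seekable binary stream as a list of bytes objects."""
    if m <= 0:
        return []
    stream.seek(0, os.SEEK_END)
    pos = stream.tell()
    data = b""
    # Read chunks from the end towards the start until more than m newlines
    # are buffered (so the last m lines are complete) or the start is reached.
    while pos > 0 and data.count(b"\n") <= m:
        step = min(chunk_size, pos)
        pos -= step
        stream.seek(pos)
        data = stream.read(step) + data
    # splitlines() handles both \r\n and \n line endings.
    return data.splitlines()[-m:]

demo = io.BytesIO(b"one\r\ntwo\r\nthree\r\nfour\r\n")
print(tail_lines(demo, 2, chunk_size=5))
```

For a real file, pass `open(path, "rb")` as the stream. Only the final chunks are read, so the cost depends on the length of the last lines rather than on the file size.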
When the file is already opened, it's better to use
Get-Content $fpath -tail 10
because of "exception calling "OpenRead" with "1" argument(s): "The process cannot access the file..."
This is not an answer, but a large comment as reply to sancho.s' answer.
When you want to use small PowerShell scripts from a Batch file, I suggest you use the method below, which is simpler and allows you to keep all the code in the same Batch file:
@PowerShell ^
$fpath = %2; ^
$fs = [IO.File]::OpenRead($fpath); ^
$fs.Seek(-%1, 'End') ^| Out-Null; ^
$mystr = ''; ^
for ($i = 0; $i -lt %1; $i++) ^
{ ^
$mystr = ($mystr) + ([char[]]($fs.ReadByte())); ^
} ^
Write-Host $mystr
%End PowerShell%
With the awesome answer by Aziz Kabyshev, which solves the issue of speed, and with some googling, I ended up using this script
$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
$mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr
which I call from a batch file containing
@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"
(thanks to How to run a PowerShell script from a batch file).
Get last n bytes of a file:
set file="C:\Covid.mp4"
set n=7
copy /b %file% tmp
for %i in (tmp) do set /a m=%~zi-%n%
FSUTIL file seteof tmp %m%
fsutil file createnew temp 1
FSUTIL file seteof temp %n%
type temp >> tmp
fc /b tmp %file% | more +1 > temp
REM problem parsing file with byte offsets in hex from fc, to be converted to decimal offsets before output
type nul > tmp
for /f "tokens=1-3 delims=: " %i in (temp) do set /a 0x%i >> tmp & set /p=": " <nul>> tmp & echo %j %k >> tmp
set /a n=%m%+%n%-1
REM output
type nul > temp
for /l %j in (%m%,1,%n%) do (find "%j: "< tmp || echo doh: la 00)>> temp
(for /f "tokens=3" %i in (temp) do set /p=%i <nul) & del tmp & del temp
Tested on Win 10 cmd Surface Laptop 1
Result: 1.43 GB file processed in 10 seconds

How to combine multiple lines in a single text file into one line, in Windows?

I have multiple standard text files that follow this format, with varying numbers of lines in each file:
Line1
Line2
Line3
Line4
I want to merge every line into one, with a space in between each set of characters, so the text file would look as such:
Line1 Line2 Line3 Line4
...and so on. This needs to work with any number of lines, because each text file contains a different number of lines. My intention is not to merge the text files with each other; I want each text file to remain separate. All the solutions I have found online either don't quite fit this or work exclusively with UNIX. I am running Windows 7. This can be done in PowerShell, VBS, Batch, or a particular program; it doesn't matter, it just needs to work with Windows.
Much appreciated!
@ECHO OFF
setlocal
(SET var=)
FOR /f "delims=" %%x IN (list.txt) DO (
CALL SET var=%%var%% %%x
)
SET var=%var:~1%
echo var=%var%=
Where list.txt is the file containing your lines and var is the variable into which you want the lines concatenated.
Using batch:
for /f "usebackqdelims=" %%i in ("infile.txt") do @<nul set /p"=%%i ">>"outfile.txt"
>>"outfile.txt" echo.
Using PowerShell give this a try and see if it's what you want:
$my_file = "C:\file.txt"
$out_file = "C:\out.txt"
(Get-Content -Path $my_file) -join " " | Set-Content -Path $out_file
For the sake of completeness here's another solution in vbscript:
Set fso = CreateObject("Scripting.FileSystemObject")
Set infile = fso.OpenTextFile("C:\infile.txt")
Set outfile = fso.OpenTextFile("C:\outfile.txt", 2, True)
If Not infile.AtEndOfStream Then outfile.Write infile.ReadLine
Do Until infile.AtEndOfStream
outfile.Write " " & infile.ReadLine
Loop
infile.Close
outfile.Close
Install git-scm, cygwin or something else that contains bash, then you can do
cat *.txt | tr "\n" " "
Something like this?
(gc C:\test.txt) -join " "
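Since the question allows any tool that runs on Windows, here is the same join expressed as a short Python sketch, assuming Python is installed. The function names join_lines and join_each_file are hypothetical; join_each_file shows one way to process each file separately, as the question asks, and rewrites files in place, so try it on copies first.

```python
from pathlib import Path

def join_lines(text, sep=" "):
    """Collapse all lines of text into one line, separated by sep."""
    return sep.join(text.splitlines())

def join_each_file(folder, pattern="*.txt"):
    """Rewrite every matching file in folder as a single line (each file stays separate)."""
    for path in Path(folder).glob(pattern):
        path.write_text(join_lines(path.read_text()) + "\n")

print(join_lines("Line1\nLine2\nLine3\nLine4\n"))
```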

Filtering file using reg exp and concatenate certain lines together (command-prompt)

I have to filter a text file filter.tmp containing two types of lines, this shows the difference:
findstr /r "^[0-9][0-9]*.*$" filter.tmp > filter-numbers.tmp
findstr /r "^[^0-9][^0-9]*.*$" filter.tmp > filter-text.tmp
What I need to do is to append lines containing text together like this and if line does contain number just put it to output file:
IF "current line" contains text THEN
previous line = concatenate "previous line" + "/" + "current line"
ELSE
echo "previous line" >> filter.out
echo "current line" >> filter.out
filter.tmp contains something like:
Hello
World
Foo
Bar
45: this is some line
Trouble
with code
66: another line
filter.out should look like:
Hello/World/Foo/Bar
45: this is some line
Trouble/with code
66: another line
I realize, this is very simple, but I just can not get it working. As I am thinking about it, it would be much easier to use C++....
This is a quite verbatim translation of your pseudocode and your regexes, based on the assumption that »contains numbers« really means »starts with two digits« (which is what your regexes show):
@echo off
setlocal enabledelayedexpansion
set Prev=
for /f "delims=" %%x in (filter.tmp) do (
set "Line=%%x"
if "!Line:~0,2!" GEQ "00" if "!Line:~0,2!" LEQ "99" (
if not "!Prev!"=="" (>>filter.out echo !Prev!)
>>filter.out echo !Line!
set Prev=
) else (
if "!Prev!"=="" (set "Prev=!Line!") else (set "Prev=!Prev!/!Line!")
)
)
if not "!Prev!"=="" (>>filter.out echo !Prev!)
This uses several things. First of all, we need delayed expansion which enables us to manipulate environment variables within the loop. Then we iterate over the lines in the file with for /f. Note that this will skip empty lines in the file, but you cannot avoid that. Inside the for /f loop the variable Line holds the current line and Prev the previous one (if there has been a previous one). I swapped the then and else branches of the condition since numbers at the start of the line are easier to check for than non-numbers.
With the echo you'll notice that I moved the redirection to the start of the line; this is to prevent trailing numbers in Prev or Line from having an effect on the redirection (and also to avoid trailing spaces).
If you're not averse to PowerShell, you can use the following:
$(switch -Regex -File filter.tmp {
'^\D' { if ($prev) { $prev += "/$_" } else { $prev = $_ } }
'^\d{2}' { if ($prev) {$prev}; $_; $prev = '' }
}
if ($prev) { $prev }
) | Set-Content filter.out
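The pseudocode from the question can also be checked with a small Python sketch, assuming Python is available; it is not one of the answers in this thread. It treats a line as a "number line" when it starts with at least one digit, which follows the asker's findstr patterns rather than the two-digit assumption used above. The name merge_text_lines is made up for the example, and empty lines are skipped here to mirror the for /f behaviour.

```python
import re

STARTS_WITH_DIGIT = re.compile(r"^\d")

def merge_text_lines(lines, sep="/"):
    """Join consecutive text lines with sep; pass number lines through unchanged."""
    out, pending = [], []
    for line in lines:
        if not line:
            continue  # skip empty lines, as for /f does
        if STARTS_WITH_DIGIT.match(line):
            if pending:
                out.append(sep.join(pending))  # flush the collected text lines first
                pending = []
            out.append(line)
        else:
            pending.append(line)
    if pending:
        out.append(sep.join(pending))  # flush text lines left at the end of the input
    return out

sample = ["Hello", "World", "Foo", "Bar", "45: this is some line",
          "Trouble", "with code", "66: another line"]
print("\n".join(merge_text_lines(sample)))
```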
