I want to save content of every text file in another output text using command prompt. I have used this command:
type ".\test\*.txt" > out.txt
But the contents of all files are simply appended one after another in that text file. How do I introduce a line break into the output file after every text file read?
Your question is a bit vague so I am guessing you just need to add as many echo's as you need / want.
This is a slight mod on one of the many examples given by forfiles /?
forfiles /p test /m "*.txt" /C "cmd /c type @file >>../out.txt &&echo. >>../out.txt &&echo. >>../out.txt"
ensure you rename or delete any existing out.txt before you start
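If one empty line after every file is all that is needed, a plain for loop may be enough. This is only a sketch that assumes the layout from the question (input files in .\test, output in out.txt); echo( is used here as one common way to print an empty line:

```batch
@echo off
rem Sketch: concatenate all text files and write one empty line after each.
rem Assumes the input files are in .\test and the output goes to out.txt.
> "out.txt" (
    for %%F in (".\test\*.txt") do (
        rem Copy the current file as is:
        type "%%~F"
        rem Print an empty line as separator:
        echo(
    )
)
```

Since the redirection wraps the whole loop, out.txt is created fresh on every run, so nothing has to be deleted beforehand.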
There are several possibilities, which I want to share with you:
Use type to return each file as is, then use findstr /V "$" to find out whether the last line is terminated by a line-break and explicitly output one by echo/ if not, using the conditional execution operator &&:
@echo off
rem // Write to output file:
> "out.txt" (
    rem // Loop through input files to resolve wildcards:
    for %%F in (".\test\*.txt") do (
        rem // First type out the currently iterated file:
        type "%%~F"
        rem /* Let `findstr` detect whether there is a terminating line-break,
        rem    and conditionally append one if not: */
        > nul findstr /V /M "$" "%%~F" && echo/
    )
)
Caveats:
files must be less than 2 GiB in size?
Use find /V "", which appends a line-break to the last line if not present. Since find usually outputs a header that we do not need, provide the files via input redirection < rather than as arguments. Since redirection does not support wildcards, let a standard for-loop resolve them:
@echo off
rem // Write to output file:
> "out.txt" (
    rem // Loop through input files since `<` does not support wildcards:
    for %%F in (".\test\*.txt") do (
        rem // Let `find` return all lines with a terminating line-break:
        < "%%~F" find /V ""
    )
)
Caveats:
files must be less than 2 GiB in size?
lines must be shorter than 4 KiB;
Use more, which appends a line-break to the last line if not present. To avoid a prompt after each file, let a for-loop resolve wildcards rather than more:
@echo off
rem // Write to output file:
> "out.txt" (
    rem // Loop through input files since `more` awaits prompt per file on wildcards:
    for %%F in (".\test\*.txt") do (
        rem // Let `more` return all lines with a terminating line-break:
        more "%%~F"
    )
)
Caveats:
files must be less than 2 GiB in size?
files must contain less than 64 Ki lines;
lines must be shorter than 64 KiB;
TABs become expanded to SPACEs;
Use sort, which appends a line-break to the last line if not present. To avoid resorting of the lines, specify the character start position for sorting beyond the supported line length. Since sort then reverses the entire file, simply apply the command twice:
@echo off
rem // Write to output file:
> "out.txt" (
    rem // Loop through input files to not unintentionally rearrange lines:
    for %%F in (".\test\*.txt") do (
        rem // Let `sort` return all lines with a terminating line-break:
        sort "%%~F" /+65535 /REC 65535 | sort /+65535 /REC 65535
    )
)
Caveats:
files must be less than 2 GiB in size?
lines must be shorter than 64 KiB;
Read each file by for /F and echo out each line, so each one, even the last one, will be terminated by a line-break; findstr is there to precede each line with its line number plus a colon in order for them not to appear empty to for /F, which would skip such lines; the prefix becomes then stripped off in the loop body:
@echo off
rem // Write to output file:
> "out.txt" (
    rem // Loop through input files since `for /F` does not support wildcards:
    for %%F in (".\test\*.txt") do (
        rem // Precede each line by line number plus `:`, then read each augmented line:
        for /F "delims=" %%L in ('findstr /N "^" "%%~F"') do (
            rem // Store currently read augmented line:
            set "LINE=%%L"
            rem // Toggle delayed expansion to avoid loss of `!`:
            setlocal EnableDelayedExpansion
            rem // Remove line number prefix and return current line with a line-break:
            echo(!LINE:*:=!
            endlocal
        )
    )
)
Caveats:
files must be less than 2 GiB in size;
lines must be shorter than about 8 KiB;
I can add a prefix to a series of text files using:
:: rename files
for %%a in (*.txt) do (
ren "%%a" "Seekret file %%a"
:: ECHO %%a Seekret file %%a
)
which will turn
a.txt
b.txt
c.txt
into
Seekret file a.txt
Seekret file b.txt
Seekret file c.txt
However, the above code seems to rename the first file twice with the prefix. I end up with
Seekret file Seekret file a.txt
and I have no idea why. Any ideas?
Use
for /f "delims=" %%a in ('dir /b /a-d *.txt') do (
What is happening is that the version you are using sees the renamed-file as a new file. The dir version builds a list of the filenames and then executes the for on each line, so the list is already built and static and cmd isn't trying to operate on a moving target.
Also - use rem, not :: within a code-block (parenthesised sequence of instructions) as this form of comment is in fact a broken label and labels are not allowed in a code block.
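Put together, the loop could look like the sketch below. The echo in front of ren is an added safety measure so that the commands are only displayed; remove it once the preview looks right:

```batch
@echo off
rem The list of names is produced by dir before the first ren is executed,
rem so files that were already renamed are not picked up a second time.
for /f "delims=" %%a in ('dir /b /a-d *.txt') do (
    rem Preview only: remove "echo" to actually rename the files.
    echo ren "%%a" "Seekret file %%a"
)
```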
Yes, this can happen, especially on FAT32 and exFAT drives, because these file systems do not return the list of directory entries matched by a wildcard pattern to the calling executable in alphabetical order. for processes the directory entries matching *.txt one after the other, and the command ren changes the directory entries, i.e. the list of file names is modified while iterating over it.
The solution is using:
for /F "eol=| delims=" %%I in ('dir *.txt /A-D /B 2^>nul') do ren "%%I" "Seekret file %%I"
FOR runs in this case in background %ComSpec% /c with the command line specified between ' which means with Windows installed into directory C:\Windows:
C:\Windows\System32\cmd.exe /C dir *.txt /A-D /B 2>nul
So one more command process is started in background which executes DIR which
searches in current directory
just for files because of option /A-D (attribute not directory)
including files with hidden attribute set (use /A-D-H to exclude hidden files)
matching the wildcard pattern *.txt
and outputs in bare format just the file names because of option /B.
An error message output by DIR to handle STDERR in case of not finding any directory entry matching these criteria is suppressed by redirecting it to device NUL.
Read the Microsoft article about Using Command Redirection Operators for an explanation of 2>nul. The redirection operator > must be escaped with caret character ^ on FOR command line to be interpreted as literal character when Windows command interpreter processes this command line before executing command FOR which executes the embedded dir command line with using a separate command process started in background.
The file names without path are output by DIR to handle STDOUT of background command process. This output is captured by FOR respectively the command process executing the batch file.
After started command process terminated itself, FOR processes the captured list of file names. All changes done on directory during the loop iterations do not matter anymore for that reason. The file names list does not change anymore.
The options eol=| delims= are needed to get the complete file names assigned one after the other to loop variable I even on starting with ; or containing a space character. eol=| redefines default end of line character ; to a vertical bar which no file name can contain. delims= defines an empty list of delimiters to disable default line splitting behavior on normal spaces and horizontal tabs.
Note: :: is an invalid label and not a comment. Labels inside a command block are not allowed and usually result in undefined behavior on execution of the command block. Use command REM (remark) for a comment.
Even better would be:
for /F "eol=| delims=" %%I in ('dir *.txt /A-D /B 2^>nul ^| %SystemRoot%\System32\findstr.exe /B /I /L /V /C:"Seekret file "') do ren "%%I" "Seekret file %%I"
FINDSTR is used here to output from list of file names output by DIR and redirected to STDIN of FINDSTR all file names which
do not because of /V (inverted result)
begin because of option /B
case-insensitive because of option /I
with the literally interpreted because of option /L (redundant to /C:)
string Seekret file .
Option /C: is needed to specify the search string containing two spaces as using just "Seekret file" would result in searching literally and case-insensitive for either Seekret OR file at begin of a line. In a search string specified with just "..." each space is interpreted by FINDSTR as an OR expression like | in a Perl regular expression string.
A search string specified with /C: is interpreted implicitly as literal string, but with using /R (instead of /L) it would be possible to get this string interpreted as regular expression string on which a space is interpreted as space and not as OR expression. It is possible to specify multiple search strings using multiple times /C:.
My recommendation on using FINDSTR: Use always either /L or /R to make it clear for FINDSTR and for every reader of the command line how FINDSTR should interpret the search string(s) specified with "..." or with /C:"...".
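The difference between a search string given with /C: and one given with just "..." can be tried out with a small test file. The following is only a sketch; the file name names_demo.txt and the two listed names are made up for the demonstration:

```batch
@echo off
rem Write two hypothetical file names to a temporary list file.
(
    echo Seekret file a.txt
    echo file b.txt
) > "%TEMP%\names_demo.txt"

echo With /C: the prefix including its spaces is one literal search string:
findstr /B /I /L /V /C:"Seekret file " "%TEMP%\names_demo.txt"

echo Without /C: the space separates two alternative search strings:
findstr /B /I /L /V "Seekret file" "%TEMP%\names_demo.txt"

del "%TEMP%\names_demo.txt"
```

The first FINDSTR call should output only file b.txt, while the second one should output nothing, because both lines begin with either Seekret or file.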
I guess I'll throw my hat in too, since I'm not really a fan of looping through dir output and no one else is currently accounting for this script already having been run:
@echo off
set "dir=C:\Your\Root\Directory"
set "pfx=Seekret file "
setlocal enabledelayedexpansion
for /r "%dir%" %%A in (*.txt) do (
    set "txt=%%~nA"
    if not "!txt:~0,13!"=="%pfx%" ren "%%A" "%pfx%%%~nxA"
)
pause
for /r will loop recursively through all .txt files, set each one as parameter %%A (per iteration), set a variable txt as parameter %%A reduced to just its name (%%~nA), and then it compares the first 13 characters of the text file to your example prefix (which is 13 characters long when you include the space: Seekret file) - if they match the loop does nothing; if they do not match, the loop will rename %%A to include the prefix at the beginning. If you don't want it to be recursive, you can use for %%A in ("%dir%"\*.txt) do ( instead. Other than that, you'll just change !txt:~0,13! depending on what your prefix is or how many letters into a filename you want to check. You also don't have to set your directory and prefix variables, I just prefer to do so because it makes the block look cleaner - and it's easier to go back and change one value as opposed to every place that value occurs in a script.
Reference: for /r, ren, variable substrings
I need a batch file that searches for a text (e.g., FOO) and replaces it with another text (e.g., BAR) in all the text files within a folder and its sub-folders.
I need to give this batch file to the user, so it is not possible to ask the user to install anything else, and I also don't want to add other files to my batch script. Is that even possible? I found many answers for this issue, but everyone advises installing another program or adding a file to the batch script. Can someone please help me with this?
Here is a simple and pure batch-file solution -- let us call it replac.bat:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_ROOT=%~3" & rem // (path to root directory; third command line argument)
set "_MASK=%~4" & rem // (file search pattern; fourth command line argument)
set "_SEARCH=%~1" & rem // (search string; first command line argument)
set "_REPLAC=%~2" & rem // (replace string; second command line argument)
set "_CASE=#" & rem // (clear for case-insensitive search)
set "_RECURS=#" & rem // (clear for non-recursive search)
set "_TMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (path to temporary file)
rem // Validate passed command line arguments, apply defaults:
if not defined _SEARCH exit /B 1
if not defined _ROOT set "_ROOT=."
if not defined _MASK set "_MASK=*.txt"
rem // Prepare `if` option (case-insensitivity) for later use:
if defined _CASE (set "IFSW=") else (set "IFSW=/I")
rem // Prepare `for` option (recursion) for later use:
if defined _RECURS (set "FOROPT=/R") else (set "FOROPT=")
rem // Change into root directory temporarily:
pushd "%_ROOT%" || exit /B 1
rem // Loop through all matching files in the directory tree:
for %FOROPT% %%F in ("%_MASK%") do (
rem // Write to temporary file:
> "%_TMPF%" (
set "FLAG="
rem /* Read current file line by line; use `findstr` to precede every line by
rem its line number and a colon `:`; this way empty lines appear non-empty
rem to `for /F`, which avoids them to be ignored; otherwise empty lines
rem became lost: */
for /F "delims=" %%L in ('findstr /N "^" "%%~fF"') do (
rem // Store current line text:
set "LINE=%%L"
setlocal EnableDelayedExpansion
rem // Remove line number prefix:
set "LINE=!LINE:*:=!"
rem // Skip replacement for empty line text:
if defined LINE (
rem /* Use `for /F` loop to avoid trouble in case search or replace
rem strings contain quotation marks `"`: */
for /F "tokens=* delims=*= eol=~" %%K in ("!_SEARCH!=!_REPLAC!") do (
rem // Split search and replace strings:
for /F "tokens=1 delims== eol==" %%I in ("%%K") do (
rem // Query to handle case-sensitivity:
if %IFSW% "!LINE!"=="!LINE:%%I=%%I!" (
rem // Detect whether replacement changes line:
if not "!LINE!"=="!LINE:%%K!" (
rem // Actually do the sub-string replacement:
set "LINE=!LINE:%%K!"
set "FLAG=#"
)
)
)
)
)
rem // Output the resulting line text:
echo(!LINE!
if defined FLAG (endlocal & set "FLAG=#") else (endlocal)
)
)
rem // Check whether file content would change upon replacement:
if defined FLAG (
rem // Move the temporary file onto the original one:
> nul move /Y "%_TMPF%" "%%~fF"
) else (
rem // Simply delete temporary file:
del "%_TMPF%"
)
)
popd
endlocal
exit /B
To use this script, provide the search string as the first and the replace string as the second command line argument, respectively; the third argument defines the root directory which defaults to the current working directory, and the fourth one defines the file pattern which defaults to *.txt:
replac.bat "Foo" "Bar"
The following restrictions apply:
all matching files must be plain ASCII/ANSI text files with Windows-style line-breaks;
neither the lines in the files nor the search and replace strings may be longer than approximately 8190 bytes/characters;
the search string must not be empty, it must not begin with * or ~, and it must not contain =;
the search and replace strings must not contain ! or ^;
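One detail worth knowing when reading the script: the sub-string replacement syntax it relies on matches the search string without regard to case, which appears to be why the script needs its separate comparison when a case-sensitive search is requested. A minimal sketch with a made-up line text:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical line text containing the search string in three spellings.
set "LINE=FOO, Foo and foo"
rem The search part of !VAR:search=replace! is matched case-insensitively.
set "LINE=!LINE:foo=BAR!"
echo !LINE!
endlocal
```

All three spellings are replaced here, so the output is BAR, BAR and BAR.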
for /r %i in (bar.txt) do echo ren "%i" foobar.txt
Remove the echo ONLY once you are sure the files are going to be correctly renamed.
To use it in a batch file, double the % of the variables, like:
@echo off
for /r %%i in (bar.txt) do echo ren "%%i" foobar.txt
Oops, sorry, I just saw that you want a solution without installing anything... This doesn't apply to OP then, but might be useful to someone else so I'll leave it up.
If you install git bash (or mingw or cygwin etc) you can use sed, as explained in this answer: https://stackoverflow.com/a/11660023. Use globbing to match multiple files. e.g.
sed -i 's/FOO/BAR/g' ./*.txt
sed uses regexp, so you have access to powerful features like matching only the start of lines (^) or the end of lines ($), any number ([0-9]) etc.
How to extract portion of file that starts with HDR followed by search keyword using a batch file and Windows command interpreter?
Only certain HDR should be copied to another file with name GoodHDR.txt.
HDRs not included in searches should be copied also to another file with name BadHDR.txt.
For example, I have HeaderList.txt below and need to get HEADER0001 and HEADER0003 portions.
HDRHEADER0001 X004010850P
BEG00SAD202659801032017021699CANE
HDRHEADER0002 X004010850P
BEG00SAD202611701012017021499CANW
DTM01020170214
N1ST 92 0642397236
N315829 RUE BELLERIVE
N4MONTREAL QCH1A5A6 CANADA
HDRHEADER0003 X004010850P
BEG00SAP521006901012017021399CANOUT B16885
DTM01020170213
N1STCEGEP SAINT LAURENT 92 0642385892
Expected outcome:
GoodHDR.txt only contains HEADER0001 and HEADER0003.
HDRHEADER0001 X004010850P
BEG00SAD202659801032017021699CANE
HDRHEADER0003 X004010850P
BEG00SAP521006901012017021399CANOUT B16885
DTM01020170213
N1STCEGEP SAINT LAURENT 92 0642385892
BadHDR.txt contains HEADER0002:
HDRHEADER0002 X004010850P
BEG00SAD202611701012017021499CANW
DTM01020170214
N1ST 92 0642397236
N315829 RUE BELLERIVE
N4MONTREAL
The batch code below expects to be started with the parameters 0001 0003 to produce the two output files from source file as posted in question.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "SourceFile=HeaderList.txt"
set "FoundFile=GoodHDR.txt"
set "IgnoreFile=BadHDR.txt"
if "%~1" == "" goto ShowHelp
if "%~1" == "/?" goto ShowHelp
if not exist "%SourceFile%" goto NoHeaderList
del "%IgnoreFile%" 2>nul
del "%FoundFile%" 2>nul
rem Assign the headers passed as arguments to environment variables with
rem name HDR%~1X, HDR%~2X, HDR%~3X, etc. used later for quickly searching
rem for number of current header within the list of specified numbers.
rem All parameter strings not existing of exactly 4 digits are ignored.
set HeadersCount=0
:SetHeaders
set "HeaderNumber=%~1"
if "%HeaderNumber:~3,1%" == "" goto NextArgument
if not "%HeaderNumber:~4,1%" == "" goto NextArgument
for /F "delims=0123456789" %%I in ("%HeaderNumber%") do goto NextArgument
set "HDR%HeaderNumber%X=%HeaderNumber%"
set /A HeadersCount+=1
:NextArgument
shift /1
if not "%~1" == "" goto SetHeaders
if %HeadersCount% == 0 goto ShowHelp
rem Process the header blocks in the source file.
set "OutputFile=%IgnoreFile%"
for /F "usebackq delims=" %%L in ("%SourceFile%") do call :ProcessLine "%%L"
rem Output a summary information of header block separation process.
if "%HeadersCount%" == "-1" set "HeadersCount="
if not defined HeadersCount (
echo All header blocks found and written to file "%FoundFile%".
goto EndBatch
)
set "SingularPlural= was"
if not %HeadersCount% == 1 set "SingularPlural=s were"
echo Following header block%SingularPlural% not found:
echo/
for /F "tokens=2 delims==" %%V in ('set HDR') do echo %%V
goto EndBatch
rem ProcessLine is a subroutine called from main FOR loop with
rem a line read from source file as first and only parameter.
rem It compares the beginning of the line with HDRHEADER. The line is
rem written to active output file if it does not start with that string.
rem Otherwise the string after HDRHEADER is extracted from the
rem line and searched in list of HDR environment variables.
rem Is the header in list of environment variables, this line and all
rem following lines up to next header line or end of source file are
rem written to file with found header blocks.
rem Otherwise the current header line and all following lines up to
rem next header line or end of source file are written to file with
rem header blocks to ignore.
rem Once all header blocks to find are indeed found and written completely
rem to the file for found header blocks, all remaining lines of source file
rem are written to the ignore file without further evaluation.
:ProcessLine
if not defined HeadersCount (
>>"%OutputFile%" echo %~1
goto :EOF
)
set "Line=%~1"
if not "%Line:~0,9%" == "HDRHEADER" (
>>"%OutputFile%" echo %~1
goto :EOF
)
set "HeaderLine=%Line:~9%"
for /F %%N in ("%HeaderLine%") do set "HeaderNumber=%%N"
set "OutputFile=%IgnoreFile%"
for /F %%N in ('set HDR%HeaderNumber%X 2^>nul') do (
set "HDR%HeaderNumber%X="
set /A HeadersCount-=1
set "OutputFile=%FoundFile%"
)
>>"%OutputFile%" echo %~1
if %HeadersCount% == 0 (
set "HeadersCount=-1"
) else if %HeadersCount% == -1 (
set "HeadersCount="
)
goto :EOF
:NoHeaderList
echo Error: The file "%SourceFile%" could not be found in directory:
echo/
echo %CD%
goto EndBatch
:ShowHelp
echo Searches for specified headers in "%SourceFile%" and writes the
echo found header blocks to file "%FoundFile%" and all other to file
echo "%IgnoreFile%" and outputs the header blocks not found in file.
echo/
echo %~n0 XXXX [YYYY] [ZZZZ] [...]
echo/
echo %~nx0 must be called with at least one header number.
echo Only numbers with 4 digits are accepted as parameters.
:EndBatch
echo/
endlocal
pause
The redirection operator >> and the current name of the output file is specified at beginning of all lines which print with command ECHO the current line to avoid appending a trailing space on each line written to an output file and get the line printing nevertheless working if a line ends with 1, 2, 3, ...
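The effect of the position of the redirection can be seen with a line ending in a digit. This is just a sketch; the file name demo_out.txt and the line text are placeholders:

```batch
@echo off
rem Hypothetical line text ending with a space and the digit 1.
set "Line=Segment 1"

rem Redirection first: the complete text is appended to the file.
>>"demo_out.txt" echo %Line%

rem Redirection last: after expansion the trailing 1 is read as handle
rem number, as in 1^>^>, so only the text before it is written to the file.
echo %Line%>>"demo_out.txt"

type "demo_out.txt"
del "demo_out.txt"
```

Inserting a space before >> in the second form would keep the digit, but would also write that space at the end of the line, which is the trailing space the code above avoids.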
Some additional notes about limitations on usage of this code:
The batch code is written without delayed expansion so that lines containing an exclamation mark can be processed easily. The disadvantage of not using delayed expansion is that lines containing characters with a special meaning on the command line, like &, >, <, |, etc., result in wrong output and can even produce additional, unwanted files in the current directory. It would of course be possible to extend the batch code to work also for lines in the source file containing any ANSI character, but this is not necessary according to the source file example, which does not contain any "poison" character.
FOR ignores empty lines on reading lines from a text file. So the code as is produces 1 or 2 output files with no empty lines copied from source file.
The main FOR loop reading the lines from the source file skips all lines starting with a semicolon. If this could be a problem, specify on the FOR command line, before delims=, the option eol= with a character that never occurs at the beginning of a line in the source file. Run for /? in a command prompt window for details on the for /F options eol=, delims= and tokens=.
The length of a string assigned to an environment variable plus equal sign plus name of environment variable is limited to 8192 characters. For that reason this batch code can't be used for a source file with lines longer than 8187 characters.
The length of a command line is also limited. The maximum length depends on version of Windows. So this batch file can't be used with a very large number of header numbers.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
call /?
del /?
echo /?
endlocal /?
for /?
goto /?
if /?
pause /?
rem /?
set /?
setlocal /?
shift /?
Read also the Microsoft article about Using Command Redirection Operators for details about >> and 2>nul and 2^>nul with redirection operator > being escaped with caret character ^ for being interpreted as literal character on parsing FOR command line, but as redirection operator later on execution of command SET by command FOR.
Problem with the file is that it sometimes contains a blank line and so the size is not zero. I tried this but as it has an empty line so it returns 1 instead of 0. Any suggestions how to tackle it?
set /a varTestPoints=0
for /f %%a in ('type "file.txt"^|find "" /v /c') do set /a varTestPoints=%%a
The size can be checked with
for %%a in ("file.txt") do echo %%~za
where %%~za is the size of the file referenced by %%a
To test if the file only contains blank lines any of these commands can be used
(for /f usebackq^ eol^= %%a in ("file.txt") do break) && echo has data || echo empty
(for /f "usebackq eol= " %%a in ("file.txt") do break) && echo has data || echo empty
If the for /f can not find lines with data, it raises an error that can be checked with the && and || conditional execution operators
Note: as aschipfl points out, the original code that checked for blank lines failed for files whose lines start with ;, because for /f by default skips lines that start with a semicolon. The code now handles this case in two different ways.
In the first command, the eol clause is disabled by assigning it an empty value.
The second command assigns a space to eol. While it seems that we have simply changed the problematic character, when the lines are parsed by the for /f tokenizer, the delims clause has precedence over eol (more information here), so spaces will be removed as delimiters before they can be seen as eol.
A file with a single blank line will be 2 bytes long (CR, LF). You can detect this by checking if the total file size is less than or equal to 2.
for %%a in (file.txt) do if %%~za LEQ 2 echo File has no more than 2 bytes
This may not work for other files that have more text, but still consist entirely of whitespace and thus appear "empty". For example, a file containing a single tab followed by a newline would have 3 bytes. You may be able to adjust your definition of a "blank" file and adapt the code accordingly.
The solution above won't work if your definition of an "empty" file is one that contains only whitespace, regardless of length. Instead, you can use for /F to parse the file. When reading a file, for /F only matches lines that contain non-whitespace characters. If it finds one, then the file is not "blank".
set "fileIsBlank=1"
for /F %%a in (file.txt) do set "fileIsBlank=0"
if %fileIsBlank% EQU 0 echo File has non-blank lines in it..
How could I trim all trailing spaces from a text file using the Windows command prompt?
The DosTips RTRIM function that Ben Hocking cites can be used to create a script that can right trim each line in a text file. However, the function is relatively slow.
DosTips user (and moderator) aGerman developed a very efficient right trim algorithm. He implemented the algorithm as a batch "macro" - an interesting concept of storing complex mini scripts in environment variables that can be executed from memory. Macros with arguments are a major discussion topic in and of themselves that is not relevant to this question.
I have extracted aGerman's algorithm and put it in the following batch script. The script expects the name of a text file as the only parameter and proceeds to right trim the spaces off each line in the file.
@echo off
setlocal enableDelayedExpansion
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
findstr /n "^" "%~1" >"%~1.tmp"
setlocal disableDelayedExpansion
(
    for /f "usebackq delims=" %%L in ("%~1.tmp") do (
        set "ln=%%L"
        setlocal enableDelayedExpansion
        set "ln=!ln:*:=!"
        set /a "n=4096"
        for /l %%i in (1 1 13) do (
            if defined ln for %%n in (!n!) do (
                if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
                set /a "n/=2"
            )
        )
        echo(!ln!
        endlocal
    )
) >"%~1"
del "%~1.tmp" 2>nul
Assuming the script is called rtrimFile.bat, then it can be called from the command line as follows:
rtrimFile "fileName.txt"
A note about performance
The original DosTips rtrim function performs a linear search and defaults to trimming a maximum of 32 spaces. It has to iterate once per space.
aGerman's algorithm uses a binary search and it is able to trim the maximum string size allowed by batch (up to ~8k spaces) in 13 iterations.
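To see how the halving works, the inner loop of the script can be run on a single variable. The sketch below is a cut-down variant for illustration only; the test value (the word text followed by 5 spaces) and the added echo are not part of the original script:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Build a string of 4096 spaces by doubling one space 12 times.
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
rem Hypothetical test value: the word text followed by 5 spaces.
set "ln=text!spcs:~0,5!"
set /a "n=4096"
for /l %%i in (1 1 13) do (
    if defined ln for %%n in (!n!) do (
        rem Strip %%n characters only if the last %%n characters are all spaces.
        if "!ln:~-%%n!"=="!spcs:~-%%n!" (
            set "ln=!ln:~0,-%%n!"
            echo removed %%n
        )
        set /a "n/=2"
    )
)
echo [!ln!]
endlocal
```

Run as is, it reports removed 4 and removed 1 before printing [text]: the 5 trailing spaces are removed in two steps, because every chunk size from 4096 down to 1 is tried exactly once.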
Unfortunately, batch is very SLOW when it comes to processing text. Even with the efficient rtrim function, it takes ~70 seconds to trim a 1MB file on my machine. The problem is, just reading and writing the file without any modification takes significant time. This answer uses a FOR loop to read the file, coupled with FINDSTR to prefix each line with the line number so that blank lines are preserved. It toggles delayed expansion to prevent ! from being corrupted, and uses a search and replace operation to remove the line number prefix from each line. All that before it even begins to do the rtrim.
Performance could be nearly doubled by using an alternate file read mechanism that uses set /p. However, the set /p method is limited to ~1k bytes per line, and it strips trailing control characters from each line.
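For reference, reading a file with set /p typically looks something like the sketch below. It is not taken from the script above, it only copies the file given as first argument to the console, and the variable names cnt and ln are arbitrary:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: print the file given as first argument line by line.
if not exist "%~1" exit /b 1
rem Count the lines first, because set /p gives no reliable end-of-file signal.
for /f %%C in ('find /c /v "" ^< "%~1"') do set "cnt=%%C"
rem Redirect the file to stdin once; every set /p then reads the next line.
< "%~1" (
    for /l %%i in (1 1 %cnt%) do (
        rem Clear the variable so that an empty line does not keep the old value.
        set "ln="
        set /p "ln="
        echo(!ln!
    )
)
endlocal
```

Empty lines are preserved without the FINDSTR line number prefix, which is where the time saving comes from; the limitations mentioned above still apply.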
If you need to regularly trim large files, then even a doubling of performance is probably not adequate. Time to download (if possible) any one of many utilities that could process the file in the blink of an eye.
If you can't use non-native software, then you can try VBScript or JScript executed via the CSCRIPT batch command. Either one would be MUCH faster.
UPDATE - Fast solution with JREPL.BAT
JREPL.BAT is a regular expression find/replace utility that can very efficiently solve the problem. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward. No 3rd party exe files are needed.
With JREPL.BAT somewhere within your PATH, you can strip trailing spaces from file "test.txt" with this simple command:
jrepl " +$" "" /f test.txt /o -
If you put the command within a batch script, then you must precede the command with CALL:
call jrepl " +$" "" /f test.txt /o -
Go get yourself a copy of CygWin or the sed package from GnuWin32.
Then use that with the command:
sed "s/ *$//" inputFile >outputFile
Dos Tips has an implementation of RTrim that works for batch files:
:rTrim string char max -- strips white spaces (or other characters) from the end of a string
:: -- string [in,out] - string variable to be trimmed
:: -- char [in,opt] - character to be trimmed, default is space
:: -- max [in,opt] - maximum number of characters to be trimmed from the end, default is 32
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
SETLOCAL ENABLEDELAYEDEXPANSION
call set string=%%%~1%%
set char=%~2
set max=%~3
if "%char%"=="" set char= &rem one space
if "%max%"=="" set max=32
for /l %%a in (1,1,%max%) do if "!string:~-1!"=="%char%" set string=!string:~0,-1!
( ENDLOCAL & REM RETURN VALUES
IF "%~1" NEQ "" SET %~1=%string%
)
EXIT /b
If you're not used to using functions in batch files, read this.
There is a nice trick to remove trailing spaces based on this answer of user Aacini; I modified it so that all other spaces occurring in the string are preserved. So here is the code:
@echo off
setlocal EnableDelayedExpansion
rem // This is the input string:
set "x= This is a text string containing many spaces. "
rem // Ensure there is at least one trailing space; then initialise auxiliary variables:
set "y=%x% " & set "wd=" & set "sp="
rem // Now here is the algorithm:
set "y=%y: =" & (if defined wd (set "y=!y!!sp!!wd!" & set "sp= ") else (set "sp=!sp! ")) & set "wd=%"
rem // Return messages:
echo input: "%x%"
echo output: "%y%"
endlocal
However, this approach fails when a character of the set ^, !, " occurs in the string.
Good tool for removing trailing spaces in files in windows:
http://mountwhite.net/en/spaces.html
I just found a very nice solution for trimming off white-spaces of a string:
Have you ever called a sub-routine using call and expanded all arguments using %*? You will notice that any leading and/or trailing white-spaces are removed. Any white-spaces occurring in between other characters are preserved; so are all the other command token separators ,, ;, = and also the non-break space (character code 0xFF). This effect I am going to utilise for my script:
@echo off
set "STR="
set /P STR="Enter string: "
rem /* Enable Delayed Expansion to avoid trouble with
rem special characters: `&`, `<`, `>`, `|`, `^` */
setlocal EnableDelayedExpansion
echo You entered: `!STR!`
call :TRIM !STR!
echo And trimmed: `!RES!`
endlocal
exit /B
:TRIM
set "RES=%*"
exit /B
This script expects a string entered by the user which is then trimmed. This can of course also be applied on lines of a file (which the original question is about, but reading such line by line using for /F is shown in other answers anyway, so I skip this herein). To trim the string on one side only, add a single character to the opposite side prior to trimming and remove it afterwards.
This approach has got some limitations though: it does not handle characters %, !, ^ and " properly. To overcome this, several intermediate string manipulation operations become required:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "STR="
set /P STR="Enter string: "
setlocal EnableDelayedExpansion
echo You entered: `!STR!`
set "STR=!STR:%%=%%%%!"
set "STR=!STR:"=""!^"
if not "%STR%"=="%STR:!=%" set "STR=!STR:^=^^^^!"
set "STR=%STR:!=^^^!%"
call :TRIM !STR!
set "RES=!RES:""="!^"
echo And trimmed: `!RES!`
endlocal
endlocal
exit /B
:TRIM
set "RES=%*"
exit /B
Update
Both of the above scripts cannot handle the characters &, <, > and |, because call seems to become aborted as soon as such a character appears in an unquoted and unescaped manner.
However, I finally found a way to fix that and come up with an approach that can successfully deal with all characters (except perhaps some control characters, which I did not test):
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // The last white-space in `STRING` is a tabulator:
set "RESULT=" & set "STRING= (<&>"^|)^^!^^^^;,= ^"
echo Input string: `!STRING!`
rem // Double quotes to avoid troubles with unbalanced ones:
if defined STRING set "STRING=!STRING:"=""!^"
rem // Particularly handle carets and exclamation marks as delayed expansion is enabled:
if defined STRING set "STRING=!STRING:^=^^^^!"
if defined STRING set "STRING=%STRING:!=^^^!%" !
if defined STRING (
rem // Escape all characters that `call` has got troubles with:
set "STRING=!STRING:^=^^!"
set "STRING=!STRING:&=^&!"
set "STRING=!STRING:<=^<!"
set "STRING=!STRING:>=^>!"
set "STRING=!STRING:|=^|!"
)
rem /* Call the sub-routine here; the strings `!=!` constitute undefined dummy variables
rem with an illegal name, which eventually become removed; the purpose of them is to
rem enable usage of that `call` inside of a `for` loop with the meta-variable `%%S`,
rem which would otherwise become unintentionally expanded rather than `%%STRING%%`,
rem which literally contained `%%S`; the `!=!` at the end is just there in case you
rem want to append another string that could also match another `for` meta-variable;
rem note that `!!` is not possible as this would be collapsed to a single `!`, so
rem a (most probably undefined) variable `!STRING%!` would then become expanded: */
call :TRIM %%!=!STRING%%!=!
rem /* The caret doubling done by `call` does not need to be reverted, because due to
rem doubling of the quotes carets appear unquoted, so implicit reversion occurs here;
rem of course the doubling of the quotes must eventually be undone: */
if defined RESULT set "RESULT=!RESULT:""="!^"
echo Now trimmed: `!RESULT!`
endlocal
exit /B
:TRIM
rem // This is the effective line that does the left- and right-trimming:
set "RESULT=%*" !
exit /B
I use this Python 2 script to print lines with trailing whitespace and remove them manually:
#!/usr/bin/env python2
import sys

if not sys.argv[1:]:
    sys.exit('usage: whitespace.py <filename>')

for no, line in enumerate(open(sys.argv[1], 'rb').read().splitlines()):
    if line.endswith(' '):
        print no+1, line