Is it possible to remove duplicate rows from a text file? If yes, how?
Sure can, but like most text file processing with batch, it is not pretty, and it is not particularly fast.
This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
set "prev="
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
if /i "!ln!" neq "!prev!" (
endlocal
(echo %%A)
set "prev=%%A"
) else endlocal
)
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
This solution is case sensitive and it leaves the lines in the original order (except for duplicates of course). Again the name of the file is passed in as the 1st and only argument.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
>"%deduped%" (
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
>"%line%" (echo !ln:\=\\!)
>nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
endlocal
)
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"
EDIT
Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.
I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless of what the 1st character is. The modified code sets the EOL option to a linefeed character.
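As a cross-check of what the two batch scripts above are meant to produce, here is a minimal Python sketch of both behaviors. The function names are hypothetical, and Python's casefold ordering only approximates the collation used by the Windows sort command, so the order produced by the first function may differ slightly from the batch result. Like the batch versions, both functions drop empty lines.

```python
def sort_and_dedupe_ignore_case(lines):
    """Sort lines ignoring case, then keep the first line of each
    run of case-insensitive duplicates (empty lines are dropped)."""
    result = []
    prev_key = None
    for line in sorted((l for l in lines if l), key=str.casefold):
        key = line.casefold()
        if key != prev_key:
            result.append(line)
            prev_key = key
    return result


def dedupe_keep_order(lines):
    """Case-sensitive dedupe that keeps the first occurrence of each
    line in its original position (empty lines are dropped)."""
    seen = set()
    result = []
    for line in lines:
        if line and line not in seen:
            seen.add(line)
            result.append(line)
    return result


if __name__ == "__main__":
    sample = ["pear", "Apple", "apple", "PEAR", "banana"]
    print(sort_and_dedupe_ignore_case(sample))
    print(dedupe_keep_order(sample))
```

Running both functions on a small copy of your data and comparing against the batch output is a quick way to confirm that a script did not lose lines containing special characters.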
New solution 2016-04-13: JSORT.BAT
You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.
@jsort file.txt /u >file.txt.new
@move /y file.txt.new file.txt >nul
You may use uniq (http://en.wikipedia.org/wiki/Uniq) from UnxUtils (http://sourceforge.net/projects/unxutils/).
Some time ago I found an unexpectedly simple solution, but this unfortunately only works on Windows 10: the sort command features some undocumented options that can be adopted:
/UNIQ[UE] to output only unique lines;
/C[ASE_SENSITIVE] to sort case-sensitively;
So use the following line of code to remove duplicate lines (remove /C to do that in a case-insensitive manner):
sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
This removes duplicate lines from the text in incoming.txt and provides the result in outgoing.txt. Note that the original order is of course not going to be preserved (because, well, this is the main purpose of sort).
However, you should use these options with care as there might be some (un)known issues with them, because there is possibly a good reason for them not to be documented (so far).
The batch file below does what you want, provided that duplicate lines are adjacent (for example, because the file is sorted):
@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
if "%%a" neq "!prevLine!" (
echo %%a
set "prevLine=%%a"
)
)
If you need a more efficient method, try this Batch-JScript hybrid script that is developed as a filter, that is, similar to Unix uniq program. Save it with .bat extension, like uniq.bat:
@if (@CodeSection == @Batch) @then
@CScript //nologo //E:JScript "%~F0" & goto :EOF
@end
var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
line = WScript.Stdin.ReadLine();
if ( line != prevLine ) {
WScript.Stdout.WriteLine(line);
prevLine = line;
}
}
Both programs were copied from this post.
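Both programs above compare each line only with the one before it, so they collapse runs of adjacent duplicates, as Unix uniq does. A minimal Python sketch of that behavior (the function name uniq_adjacent is hypothetical) shows why unsorted input can still contain repeats afterwards:

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Collapse each run of identical adjacent lines into one line."""
    return [line for line, _run in groupby(lines)]


if __name__ == "__main__":
    # The second "a" survives because it is not adjacent to the first run.
    print(uniq_adjacent(["a", "a", "b", "a"]))
    # Sorting first makes all duplicates adjacent.
    print(uniq_adjacent(sorted(["a", "a", "b", "a"])))
```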
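set "file=%CD%\%~1"
sort "%file%" >"%file%.sorted"
del /q "%file%"
rem Enable delayed expansion once, outside the loop (lines containing ! will still be altered)
setlocal EnableDelayedExpansion
set "ln="
for /F "usebackq tokens=*" %%A in ("%file%.sorted") do (
    if not "%%A"=="!ln!" (
        set "ln=%%A"
        >>"%file%" echo %%A
    )
)
endlocal
del /q "%file%.sorted"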
This should work exactly the same. The dbenham example seemed way too hardcore for me, so I tested my own solution. Usage example: filedup.cmd filename.ext
Pure batch - 3 effective lines.
@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "delims=" %%a IN (q34223624.txt) DO SET $%%a=Y
(FOR /F "delims=$=" %%a In ('set $ 2^>Nul') DO ECHO %%a)>u:\resultfile.txt
GOTO :EOF
Works happily if the data does not contain characters to which batch has a sensitivity.
"q34223624.txt" because question 34223624 contained this data
1.1.1.1
1.1.1.1
1.1.1.1
1.2.1.2
1.2.1.2
1.2.1.2
1.3.1.3
1.3.1.3
1.3.1.3
on which it works perfectly.
I came across this issue and had to resolve it myself because the use case was particular to my needs.
I needed to find duplicate URLs, and the order of lines was relevant, so it needed to be preserved. The lines of text should not contain any double quotes and should not be very long, and sorting cannot be used.
Thus I did this:
setlocal enabledelayedexpansion
type nul>unique.txt
for /F "tokens=*" %%i in (list.txt) do (
find "%%i" unique.txt 1>nul
if !errorlevel! NEQ 0 (
echo %%i>>unique.txt
)
)
Auxiliary: if the text does contain double quotes then the FIND needs to use a filtered set variable as described in this post: Escape double quotes in parameter
So instead of:
find "%%i" unique.txt 1>nul
it would be more like:
set test=%%i
set test=!test:"=""!
find "!test!" unique.txt 1>nul
Thus find will look like find """what""" file and %%i will be unchanged.
I have used a fake "array" to accomplish this:
@echo off
:: filter out all duplicate ip addresses
REM your file would take the place of %1
set file=%1
if [%1]==[] goto :EOF
setlocal EnableDelayedExpansion
set size=0
set cond=false
set max=0
for /F %%a IN ('type %file%') do (
if [!size!]==[0] (
set cond=true
set /a size="size+1"
set arr[!size!]=%%a
) ELSE (
call :inner
if [!cond!]==[true] (
set /a size="size+1"
set arr[!size!]=%%a&& ECHO > NUL
)
)
)
break> %file%
:: destroys old output
for /L %%b in (1,1,!size!) do echo !arr[%%b]!>> %file%
endlocal
goto :eof
:inner
for /L %%b in (1,1,!size!) do (
if "%%a" neq "!arr[%%b]!" (set cond=true) ELSE (set cond=false&&goto :break)
)
:break
Using a label for the inner loop is something specific to cmd.exe and is the only way I have managed to nest for loops within each other. Basically, this compares each new value read from the file (the first whitespace-delimited token of each line) against the values already stored, and if there is no match the program adds the value to the array in memory. When it is done, it overwrites the target file's contents with the unique strings.
I need help from the community; any hints would be welcome. I have the following Windows batch script, which is supposed to read more than 10 million records from several CSV files and merge them all together. I am running the script on a server, so it is not very slow. The problem is that the code does not handle duplicate records. I am not sure how to change the script so that it only passes unique records through. I would very much appreciate your help.
rem Set current working directory to Task folder
set FilePath=%~dp0
set FolderPath=%FilePath:~0,-1%
rem Set Space environment variables
call "%FolderPath%"\..\SpaceEnv.bat
rem Set Task specific environment variables
set TaskName=MergeCSVfiles
set fileName=result.csv
set LogFile=%TaskName%_%LogDateTime%.log
:begin
cd ..
cd "Source Files\DCM_Source\Inbox"
echo Starting merge %fileName% at: %time%
setlocal enabledelayedexpansion
set "first=1"
>%fileName% (
for %%F in (msource*.csv) do (
if not "%%F"=="%fileName%" (
set /p "header="<"%%F"
if defined first (
type "%%F"
set "first="
) else (
type "%%F" |find /V "!header!"
)
)
)
)
endlocal
echo Finish merging %fileName% at: %time%
******UPDATED******
Example of CSV file
Sites|Level 2 sites|Date-time (visit start)|Visit ID|Unique visitor ID|Date-time (event)|Sources|Visitor categories|Visitor ID|Visits
SE Romania|PRM|2018-01-01T00:30:04|1|-6427177464|2018-01-01T00:30:04|Portal sites|-|0|2
SE Romania|PRM|2018-01-01T00:30:04|1|-6427177464|2018-01-01T00:30:04|Portal sites|-|0|2
This code will dedupe a file. In order to do that it must be sorted. This means any header record at the top of the file will be sorted into the file. This is code I received from dbenham. I can't remember if he originally posted it on StackOverflow or DosTips.com. If the file is very large it will more than likely crash with an out of memory error.
@echo off
:: Call function to dedupe file
CALL :DEDUPE "filename.txt"
goto :eof
:DEDUPE
:: DEDUPE file
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
set "prev="
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
if /i "!ln!" neq "!prev!" (
endlocal
(echo %%A)
set "prev=%%A"
) else endlocal
)
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
GOTO :EOF
@ECHO OFF
SETLOCAL
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "filenamecommon=q49264647*.csv"
:: switch to required source directory
PUSHD "%sourcedir%"
:: get header line
FOR %%f IN (%filenamecommon%) DO FOR /f "delims=" %%h IN (%%f) DO SET "header=%%h"&goto gotheader
:gotheader
COPY %filenamecommon% atempfilename
SET "lastline="
>resultfilename (
ECHO %header%
SETLOCAL enabledelayedexpansion
FOR /f "delims=" %%d IN ('sort atempfilename' ) DO (
IF "%%d" neq "!lastline!" IF "%%d" neq "%header%" ECHO %%d
SET "lastline=%%d"
)
endlocal
)
DEL atempfilename
popd
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances.
I used file and directory names that suit my system for testing.
Note : datafiles containing the characters ! or ^ or unbalanced " will not be processed correctly.
First, find the header line by setting header from the first line of any matching file. Once header is set, forcibly abort the for loops.
Copy and concatenate all of the required files to a temp file.
Output the header line, then sort the temp file to group identical lines. Read the result and output only those lines that differ from the previous line and are not header lines.
Applying /i to the if statements will make the entire routine disregard character case.
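For comparison, the same merge can be sketched in Python. Note that this is a different technique from the batch routine above: it tracks already-seen records in a set instead of sorting, so first-seen order is preserved, but every unique record is held in memory, which may matter with 10 million records. The function name merge_unique is hypothetical, and each input file is assumed to be already read into a list of lines.

```python
def merge_unique(files):
    """Merge several line lists into one.

    The first line encountered is treated as the header and emitted once;
    later copies of the header and duplicate records are skipped.
    """
    header = None
    seen = set()
    merged = []
    for lines in files:
        for line in lines:
            if header is None:
                header = line
                merged.append(line)
            elif line != header and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


if __name__ == "__main__":
    head = "Sites|Level 2 sites|Visits"
    first = [head, "SE Romania|PRM|2", "SE Romania|PRM|2"]
    second = [head, "SE Romania|PRM|2", "SE Poland|PRM|1"]
    for row in merge_unique([first, second]):
        print(row)
```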
OK, give this code a try. I think it will generate the result file without duplicate records, no matter its size. The time the program takes depends on several factors, although IMHO it should not be excessive because the core of the process is based on the findstr.exe command.
@echo off
setlocal
del result.csv 2>NUL
rem Process all input files
for /F "delims=" %%f in ('dir /B /O:-S msource*.csv') do (
echo Merging file: %%f
if not exist result.csv (
rem Initialize output file with first input file
copy "%%f" result.csv > NUL
) else (
rem Get records in this file that are not in result file
findstr /V /G:result.csv "%%f" > newRecords.csv
rem and add them to the result file
type newRecords.csv >> result.csv
)
)
del newRecords.csv
You may also try removing the dash in the /O:-S switch of the dir command; perhaps this change will speed up the process a little...
In the code below I am trying to fetch the line number of the string "AXX0000XXXA" in data.txt, then read the file line by line and write each line to target.txt. When the loop reaches the found line number, I add the content of temp.txt. The code works fine with a small number of records (tested with 150 lines, file size 100 KB), but with 50K records (file size 25 MB) it takes more than 25 minutes. Could you please help me process the same file in less time?
@echo off
setlocal enabledelayedexpansion
for /f "delims=:" %%a in ('findstr /n "AXX0000XXXA" "C:\Users\23456\Desktop\data.txt"') do (set find_line=%%a)
set /a counter=0
for /f "usebackq delims=" %%b in (`"findstr /n ^^ C:\Users\23456\Desktop\data.txt"`) do (
set curr_line=%%b
set /a counter=!counter!+1
if !counter! equ !find_line! (
type temp.txt >> target.txt
)
call :print_line curr_line
)
endlocal
:print_line
setlocal enabledelayedexpansion
set line=!%1!
set line=!line:*:=!
echo !line!>>target.txt
endlocal
Your code uses three batch file constructs that are inherently slow: the call command, >> append redirection, and setlocal/endlocal, and these constructs are executed once for every line of the file! It would be faster to move the subroutine's code into the main loop to avoid the call and setlocal commands. Also, each echo !line!>>target.txt command has to open the file, seek to the end, append the data and close the file, so it is faster to use the construct (for ...) > target.txt, which opens the file just once. An example of code with such changes is in Compo's answer.
This is another method to solve this problem that may run faster when the search line is placed towards the beginning of the file:
@echo off
setlocal enabledelayedexpansion
for /f "delims=:" %%a in ('findstr /n "AXX0000XXXA" "C:\Users\23456\Desktop\data.txt"') do (set /A find_line=%%a-1)
call :processFile < "C:\Users\23456\Desktop\data.txt" > target.txt
goto :EOF
:processFile
rem Duplicate the first %find_line%-1 lines
for /L %%i in (1,1,%find_line%) do (
set /P "line="
echo !line!
)
rem Insert the additional line
type temp.txt
rem Copy the rest of lines
findstr ^^
exit /B
This should create target.txt with content matching data.txt except for an inserted line taken from temp.txt immediately above the line matching the search string, AXX0000XXXA.
@Echo Off
Set "fSrc=C:\Users\23456\Desktop\data.txt"
Set "iSrc=temp.txt"
Set "sStr=AXX0000XXXA"
Set "fDst=target.txt"
Set "iStr="
Set/P "iStr="<"%iSrc%" 2>Nul
If Not Defined iStr Exit/B
Set "nStr="
For /F "Delims=:" %%A In ('FindStr/N "%sStr%" "%fSrc%" 2^>Nul') Do Set "nStr=%%A"
If Not Defined nStr Exit/B
( For /F "Tokens=1*Delims=:" %%A In ('FindStr/N "^" "%fSrc%"') Do (
If "%%A"=="%nStr%" Echo %iStr%
Echo %%B))>"%fDst%"
I have made it easy for you to change your variable data; you only need to alter the four Set lines at the top (fSrc, iSrc, sStr and fDst).
I have assumed that this was your intention, your question was not clear, please accept my apologies if I have assumed incorrectly.
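To verify the result of any of the batch variants in this thread, the intended transformation can be sketched in Python. This is only an illustration under stated assumptions: the name insert_before_match is hypothetical, the search string is treated as a plain substring, and when several lines match the block is inserted before the last one, which is what the batch loops above end up doing because the last assignment wins. When nothing matches, the sketch returns the input unchanged.

```python
def insert_before_match(lines, needle, extra_lines):
    """Return a copy of lines with extra_lines inserted immediately
    before the last line that contains needle."""
    lines = list(lines)
    matches = [i for i, line in enumerate(lines) if needle in line]
    if not matches:
        return lines
    at = matches[-1]
    return lines[:at] + list(extra_lines) + lines[at:]


if __name__ == "__main__":
    data = ["first", "xx AXX0000XXXA xx", "last"]
    for row in insert_before_match(data, "AXX0000XXXA", ["inserted"]):
        print(row)
```

Because the output list is built in memory and can be written with a single open call, this also mirrors the advice above about opening the target file only once.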
I have a text file. I have to swap odd and even lines.
I made a batch script that writes even lines into testfile2.txt and odd lines into testfile3.txt.
@echo off
setlocal EnableDelayedExpansion
set "filepath1=C:\\Users\\andyb\\Desktop\\testfile.txt"
set "filepath2=C:\\Users\\andyb\\Desktop\\testfile2.txt"
set "filepath3=C:\\Users\\andyb\\Desktop\\testfile3.txt"
set counter=0
set B=0
for /F %%A in (%filepath1%) do (
set /a B=!counter!%%2
if !B! equ 0 (echo %%A>>%filepath2%) else (echo %%A>>%filepath3%)
set /A counter=counter+1
)
And I want to take 1 line from file that contains odd lines, then 1 line from the file with even lines and write it to my first file. But I don't understand how to do it in FOR loop because it reads a line from only one file and I can't work with another file in this loop.
Example of input file:
1a
2b
3c
4d
Example of output file:
2b
1a
4d
3c
Try the following:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Define constants here:
set "_FILE=textfile.txt"
rem // Count number of lines:
for /F %%C in ('^< "!_FILE!" find /C /V ""') do set "COUNT=%%C"
rem // Divide by two, round up:
set /A "COUNT=(COUNT+1)/2"
< "!_FILE!" > "!_FILE!.tmp" (
rem // Read files in blocks of two lines:
for /L %%I in (1,1,%COUNT%) do (
set "LINE1=" & set "LINE2="
set /P LINE1=""
set /P LINE2=""
echo(!LINE2!
echo(!LINE1!
)
)
rem // Overwrite original file:
> nul move /Y "!_FILE!.tmp" "!_FILE!"
endlocal
exit /B
There are several solutions for this task.
The first one uses delayed expansion while executing all lines of the batch file that exchanges odd and even lines. This means it does not work correctly for lines containing an exclamation mark, because ! is removed from the line.
@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "SourceFile=%USERPROFILE%\Desktop\TestFile.txt"
if not exist "%SourceFile%" goto EndBatch
set "TargetFile=%USERPROFILE%\Desktop\TestFile2.txt"
del "%TargetFile%" 2>nul
set "LineOdd="
for /F "usebackq delims=" %%I in ("%SourceFile%") do (
if not defined LineOdd (
set "LineOdd=%%I"
) else (
echo %%I>>"%TargetFile%"
echo !LineOdd!>>"%TargetFile%"
set "LineOdd="
)
)
if defined LineOdd echo !LineOdd!>>"%TargetFile%"
move /Y "%TargetFile%" "%SourceFile%"
:EndBatch
endlocal
Blank and empty lines are skipped by FOR and therefore missing in target file. And lines starting with a semicolon ; are ignored on reading each line by FOR and for that reason are missing also in output file. But those limitations should not matter here according to input example.
The limitations of first solution could be avoided using this batch code which is of course much slower:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "SourceFile=%USERPROFILE%\Desktop\TestFile.txt"
if not exist "%SourceFile%" goto EndBatch
set "TargetFile=%USERPROFILE%\Desktop\TestFile2.txt"
del "%TargetFile%" 2>nul
set "LineOdd="
for /F "tokens=1* delims=:" %%H in ('%SystemRoot%\System32\findstr.exe /N /R "^" "%SourceFile%"') do (
if not defined LineOdd (
set "LineOdd=_%%I"
) else (
if "%%I" == "" (
echo/>>"%TargetFile%"
) else (
echo %%I>>"%TargetFile%"
)
setlocal EnableDelayedExpansion
if "!LineOdd!" == "_" (
echo/>>"%TargetFile%"
) else (
echo !LineOdd:~1!>>"%TargetFile%"
)
endlocal
set "LineOdd="
)
)
if defined LineOdd (
setlocal EnableDelayedExpansion
if "!LineOdd!" == "_" (
echo/>>"%TargetFile%"
) else (
echo !LineOdd:~1!>>"%TargetFile%"
)
endlocal
)
move /Y "%TargetFile%" "%SourceFile%"
:EndBatch
endlocal
It would be also possible to use hybrid batch file JREPL.BAT written by Dave Benham:
call jrepl.bat "^(.*)\r\n(.*)\r\n" "$2\r\n$1\r\n" /M /X /F "%USERPROFILE%\Desktop\TestFile.txt" /O "%USERPROFILE%\Desktop\TestFile2.txt"
move /Y "%USERPROFILE%\Desktop\TestFile2.txt" "%USERPROFILE%\Desktop\TestFile.txt"
The last line of the file must have a DOS/Windows line termination (carriage return \r and line-feed \n) if being an even line on using this solution.
For understanding the used commands/executables/batch files and how they work, open a command prompt window, execute there the following command lines, and read entirely all help pages displayed for each command/executable/batch file very carefully.
del /?
echo /?
endlocal /?
findstr.exe /?
for /?
goto /?
if /?
jrepl.bat /?
move /?
set /?
setlocal /?
Read also the Microsoft article about Using Command Redirection Operators for an explanation of 2>nul and >>.
The following example will create two files from testfile.txt, file0.out containing the even lines, and file1.out containing the odd lines.
@Echo Off
SetLocal EnableDelayedExpansion
For /F "Tokens=1* Delims=:" %%A In ('FindStr/N "^" "testfile.txt"') Do (
Set/A "_=%%A%%2"
(Echo(%%B)>>file!_!.out)
Rename the output files according to your requirements.
I think to reinterleave the odd and even versions in reversed order isn't that difficult. Appending to Compo's try:
@Echo Off
SetLocal EnableDelayedExpansion
Set File=testfile
For /F "Tokens=1* Delims=:" %%A In ('FindStr/N "^" "%File%.txt"'
) Do Set/A "_=%%A%%2"&>>%File%_!_!.txt Echo(%%B
<%File%_0.txt (For /f "delims=" %%A in (%File%_1.txt) Do (
Set "B="&Set /P "B="
Echo(!B!
Echo(%%A
)) >%File%-new.txt
Del %File%_*
With an odd total number of lines, the second-to-last line will be empty. Sample output:
2b
1a
4d
3c

5e
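The expected result of the swap can be checked with a short Python sketch. The function name swap_pairs is hypothetical, and the handling of an unpaired last line is a choice: here it is kept as-is without a preceding empty line, which matches the LineOdd-based batch solutions rather than the set /P-based ones.

```python
def swap_pairs(lines):
    """Swap each odd line with the even line that follows it.
    An unpaired final line is kept unchanged."""
    lines = list(lines)
    out = []
    for i in range(0, len(lines) - 1, 2):
        out.append(lines[i + 1])
        out.append(lines[i])
    if len(lines) % 2:
        out.append(lines[-1])
    return out


if __name__ == "__main__":
    print(swap_pairs(["1a", "2b", "3c", "4d"]))
    print(swap_pairs(["1a", "2b", "3c", "4d", "5e"]))
```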
So I'm building a messaging program in batch (I know, it's newbish) and the program takes user input, puts it in my .txt file log.txt, and types it on the screen. I want the output to look like this...
Title
----------------------
contents
of
the
file
here
----------------------
User input here>>
This may seem simple, but the file will be constantly updated by users, and I want the program to display only a range of lines so that the message area stays the same size. I found a simple program to display specific lines, but I can't make them move down one line each time log.txt is changed. Here it is:
@setlocal enableextensions enabledelayedexpansion
@echo off
set lines=1
set curr=1
for /f "delims=" %%a in ('type bob.txt') do (
for %%b in (!lines!) do (
if !curr!==%%b echo %%a
)
set /a "curr = curr + 1"
)
endlocal
(By the way, this program is called lines.bat. I just call it in cmd to test it.)
To return a defined number of lines starting from a certain line number, you can do the following:
@echo off
setlocal EnableExtensions
rem define the (path to the) text file here:
set "TEXT_FILE=log.txt"
rem define the line number here:
set /A "LINE_NUMBER=1"
rem define the number of lines here:
set /A "LINE_COUNT=5"
set /A "LINE_LIMIT=LINE_NUMBER+LINE_COUNT-1"
for /F delims^=^ eol^= %%L in ('findstr /N /R "^" "%TEXT_FILE%"') do (
setlocal DisableDelayedExpansion
set "LINE=%%L"
setlocal EnableDelayedExpansion
for /F "tokens=1 delims=:" %%N in ("!LINE!") do set "LNUM=%%N"
set "LINE=!LINE:*:=!"
if !LNUM! GEQ !LINE_NUMBER! (
if !LNUM! LEQ !LINE_LIMIT! (
echo !LINE!
)
)
endlocal
endlocal
)
endlocal
The findstr command with the /R "^" search pattern returns all lines. The findstr switch /N prefixes every line with a line number (starting from 1) and a colon. The : is used to split the line into two parts: the first part, the line number, is checked against the range to be returned; the second part is the original line of text, which is output if the number is in range. Even empty lines are taken into account.
You might ask why not simply use the above-mentioned : as a delims delimiter option for for /F, but this would cause problems with lines of text starting with :.
The toggling of delayed expansion is necessary to avoid trouble with special characters like !, for instance.
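The window selection performed by the script above (start line plus line count, 1-based, empty lines included) can be expressed compactly in Python for comparison. The function name line_window is hypothetical; islice simply skips the lines before the window and stops after the requested count.

```python
from itertools import islice


def line_window(lines, first, count):
    """Return up to count lines starting at 1-based line number first."""
    start = first - 1
    return list(islice(lines, start, start + count))


if __name__ == "__main__":
    log = ["one", "", "three", "four", "five", "six"]
    # Lines 2..4, including the empty second line.
    print(line_window(log, 2, 3))
```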
To return the last defined number of lines, the following approach can be used:
@echo off
setlocal EnableExtensions
rem define the (path to the) text file here:
set "TEXT_FILE=log.txt"
rem define the number of lines here:
set /A "LINE_COUNT=5"
for /F "tokens=1 delims=:" %%L in ('findstr /N /R "^" "%TEXT_FILE%"') do (
set /A "LINE_SKIP=%%L"
)
set /A "LINE_SKIP-=LINE_COUNT"
if %LINE_SKIP% GTR 0 (
set "LINE_SKIP=skip^=%LINE_SKIP%^ "
) else (
set "LINE_SKIP="
)
for /F %LINE_SKIP%delims^=^ eol^= %%L in ('findstr /N /R "^" "%TEXT_FILE%"') do (
setlocal DisableDelayedExpansion
set "LINE=%%L"
setlocal EnableDelayedExpansion
set "LINE=!LINE:*:=!"
echo !LINE!
endlocal
endlocal
)
endlocal
Again, the findstr /N /R "^" command is used. But here, we have an additional for /F loop first, which merely counts the number of lines in the text file by extracting the line number prefixed by findstr. The second for /F loop is quite similar to the above approach, but a dynamic skip option is introduced, so that the loop starts iterating through the last lines only; the rest is almost the same as above, except that the conditions concerning the current line number have been removed.
I know I could also count the lines by using find /C /V "" rather than looping through the findstr /N /R "^" output, but if there are one or more empty lines at the end of the file, find returns a number one less than the findstr method, so I went for findstr consistently.
Also here, delayed expansion is toggled to avoid trouble with the ! character.
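The "last N lines" behavior, which is what a fixed-size message area needs, can be sketched in Python with a bounded deque: older lines fall out as newer ones arrive, so no separate counting pass is required. The function name last_lines is hypothetical.

```python
from collections import deque


def last_lines(lines, count):
    """Return the final count lines (fewer if the input is shorter)."""
    return list(deque(lines, maxlen=count))


if __name__ == "__main__":
    log = ["msg 1", "msg 2", "msg 3", "msg 4", "msg 5", "msg 6"]
    for row in last_lines(log, 5):
        print(row)
```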
OK, so I write batch files a lot. A while back I asked a question on how to copy one part of a running batch file into a new batch file.
Well, it works if you're going to use it one time in a batch file. My goal is to create multiple large batch files from within a single setup batch. If the user chooses to install, the batch file runs the following.
cls
setlocal EnableDelayedExpansion
color e
::Start of embedded code
set Begin=
for /F "delims=:" %%a in ('findstr /N "^:EMBEDDED_CODE" "%~F0"') do (
if not defined Begin (
set Begin=%%a
) else (
set End=%%a
)
)
::*****************************************************************************
(for /F "skip=%Begin% tokens=1* delims=[]" %%a in ('find /N /V "" "%~F0"') do (
if %%a equ %End% goto :Build-file2
echo(%%b
)) > file1.bat & goto :Build-file2
)
goto :Build-file2
:EMBEDDED_CODE Begin
CODE TO PUT INTO "file1.bat"
:EMBEDDED_CODE End
:Build-file2
cls
setlocal EnableDelayedExpansion
color e
::Start of embedded code
set Begin=
for /F "delims=:" %%a in ('findstr /N "^:EMBEDDED_CODE" "%~F0"') do (
if not defined Begin (
set Begin=%%a
) else (
set End=%%a
)
)
::*****************************************************************************
(for /F "skip=%Begin% tokens=1* delims=[]" %%a in ('find /N /V "" "%~F0"') do (
if %%a equ %End% goto :EOF
echo(%%b
)) > file2.bat & goto :EOF
)
goto :EOF
:EMBEDDED_CODE Begin
CODE TO PUT INTO "file2.bat"
:EMBEDDED_CODE End
The problem is that instead of copying just the code between the labels EMBEDDED_CODE Begin and EMBEDDED_CODE End, the first FOR loop copies everything from EMBEDDED_CODE Begin down to the very bottom of the script and puts it in the file I want. The script then goes to the next FOR loop, which repeats the process with different code between the two labels. So file1.bat and file2.bat both contain exactly the same code, just under the desired file names.
Why would you expect anything different than the results you are getting? The FINDSTR will search the entire file, so Begin is set to the first occurrence of :EMBEDDED_CODE in the first block of code, and End is set to the last occurrence in the last block of code (last value set wins). You replicate the code, so of course you get the same faulty result two times.
Simply change the labels in your second block of code, perhaps :EMBEDDED_CODE2, and adjust your 2nd FINDSTR accordingly. All should work then.
I often use a slightly different approach that minimizes the amount of file reading. Simply modify all lines from a given embedded block of code with the same unique prefix. Then FINDSTR can directly output the desired lines, and a FOR /F is used to strip off the prefix. You just need the prefix to end with a character that never matches the beginning of your code.
You should be careful about enabling delayed expansion when reading a file with FOR /F. Your embedded code will be corrupted if it contains ! and delayed expansion is enabled. (unless the ! is escaped, but that can be a pain)
@echo off
for %%C in (1 2) do (
for /f "tokens=1* delims=}" %%A in ('findstr /bl ":%%C}" "%~f0"') do echo(%%B
)>file%%C.bat
:1}Your first code block goes here
:1}
:1} Blank lines and indents are preserved
:1}And so are exclamation points!
:2}And here is your second code block
:2}...
echo file1.bat
echo ---------
type file1.bat
echo(
echo(
echo file2.bat
echo ---------
type file2.bat
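The prefix technique above boils down to "keep the lines that start with the block's tag, then strip the tag". A minimal Python sketch of that idea follows; the function name extract_tagged is hypothetical, and the tag format ":<id>}" is taken from the example script.

```python
def extract_tagged(lines, tag):
    """Return the lines that start with ':<tag>}' with that prefix removed.
    Blank tagged lines and indentation are preserved."""
    prefix = ":" + tag + "}"
    return [line[len(prefix):] for line in lines if line.startswith(prefix)]


if __name__ == "__main__":
    script = [
        "@echo off",
        ":1}Your first code block goes here",
        ":1}",
        ":1}  Blank lines and indents are preserved",
        ":2}And here is your second code block",
    ]
    print(extract_tagged(script, "1"))
    print(extract_tagged(script, "2"))
```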
This will almost do what you need.
This code needs to read the input file twice: first to locate the range of lines to process (findstr line numbering), and second to extract them. In the second loop, findstr numbering is used again to keep for /f from skipping blank lines and altering the line numbering.
On the other hand, the problem with special characters inside the extracted text is handled by enabling and disabling delayed expansion as needed.
Maybe not the best performance, but it seems to work. Adapt as needed.
@echo off
setlocal enableextensions enabledelayedexpansion
call :extractEmbedded "Section1" extracted.txt
if not errorlevel 1 (
cls
type extracted.txt
)
exit /b
:extractEmbedded id outputFile
rem prepare environment
setlocal enableextensions enabledelayedexpansion
rem assume failure on execution
set "_return=1"
rem find embedded zone in current file
set "_start="
set "_end="
for /f "tokens=1 delims=:" %%l in ('findstr /n /b /c:":EMBEDDED %~1" "%~f0"') do (
if not defined _start ( set "_start=%%l" ) else ( set "_end=%%l" )
)
rem adjust lines to process
set /a "_start+=0"
set /a "_end-=1"
rem if nothing found, task done
if %_start% GEQ %_end% goto endExtractEmbedded
rem prepare file extraction
if "%_start%"=="0" (set "_skip=" ) else ( set "_skip=skip^=%_start%" )
rem extract proper area of file to output file
(for /f tokens^=^*^ %_skip%^ eol^= %%l in ('findstr /n "^" "%~f0"') do if !_start! LSS !_end! (
setlocal disabledelayedexpansion
set "_line=%%l"
setlocal enabledelayedexpansion
echo(!_line:*:=!
endlocal & endlocal
set /a "_start+=1"
))>"%~2"
rem everything ok
set "_return=0"
:endExtractEmbedded
rem exit with errorlevel
endlocal & exit /b %_return%
:EMBEDDED Section1
This is a section; of embedded!!! code
that needs to be extracted to generate
a new file to be processed.
TEST: !""$%&/()=?¿^*[];,:-\|
:EMBEDDED Section1
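The marker-based extraction can be summarized in a few lines of Python. This is a sketch under assumptions, not a translation of the routine above: the name extract_between is hypothetical, and, like the batch code, it takes the first and the last line starting with the marker as the boundaries. That choice also illustrates the problem described earlier in this thread: if two different blocks share the same marker text, everything from the first marker to the last one is returned.

```python
def extract_between(lines, marker):
    """Return the lines strictly between the first and the last line
    that start with marker, or None when fewer than two markers exist."""
    positions = [i for i, line in enumerate(lines) if line.startswith(marker)]
    if len(positions) < 2:
        return None
    return lines[positions[0] + 1:positions[-1]]


if __name__ == "__main__":
    script = [
        "@echo off",
        ":EMBEDDED Section1",
        "This is a section; of embedded!!! code",
        "",
        "a new file to be processed.",
        ":EMBEDDED Section1",
    ]
    print(extract_between(script, ":EMBEDDED Section1"))
```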