How could I delete a range of lines at the beginning of a large (3MB+) text file in a timely fashion (a few seconds at most), preferably with a one-liner? I've seen solutions using for /f along with findstr, but the for loop made it extremely slow, and the more tool cannot handle larger files without hanging.
@echo off &setlocal
set "testing.txt=%~1"
(for /f "delims=" %%i in ('findstr /n "^" "testing.txt"') do (
set "line=%%i"
for /f "delims=:" %%a in ("%%i") do set "row=%%a"
setlocal enabledelayedexpansion
set "line=!line:*:=!"
if !row! gtr 100 echo(!line!
endlocal
))>output.txt
Here is an attempt. It is incredibly slow. Any recommendations would be appreciated.
This is the fastest way to eliminate the first lines of a large file. It is written as a one-liner, as you requested:
@echo off
< testing.txt ( (for /L %%i in (1,1,100) do set /P "=") & findstr "^" ) > output.txt
However, be aware that this method can only manage lines up to 1023 characters long, because it uses the set /P command to read and discard the unwanted lines.
For a description of this method, see this answer.
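The same one-liner can be parameterized. The following is only a sketch: the argument order (input file, output file, number of lines to skip) and the default of 100 are assumptions made for illustration, not part of the original answer. The 1023-character limit of set /P still applies to the skipped lines.

```batch
@echo off
setlocal
rem Hypothetical arguments: %1 = input file, %2 = output file, %3 = lines to skip
set "skip=%~3"
if not defined skip set "skip=100"
rem set /P consumes one line per iteration from the redirected input;
rem findstr "^" then copies whatever is left to the output file.
< "%~1" ( (for /L %%i in (1,1,%skip%) do set /P "=") & findstr "^" ) > "%~2"
```

Saved under a hypothetical name such as skiplines.bat, it could be called as: skiplines.bat testing.txt output.txt 100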
As far as I understand, you want to skip the first 100 lines of your huge text file and return the rest.
Looking at your code, the first thing I see is two nested for /F loops, which might slow things down.
The inner loop just splits off a preceding line number that is separated by a colon from the rest.
For this purpose you could (mis-)use set /A, which is capable of converting a string into a numeric value, when you use its implicit variable expansion (hence no % or !); this process stops when the first non-numeric character is encountered, which is the : in our situation. So just replace the inner for /F loop with:
set /A "row=line"
This will surely speed things up a bit. However, note that this limits the input text file to 2^31 - 1 lines. Note also that the number of characters/bytes per line is still limited to about 8190.
By the way, you do not have to do set "line=!line:*:=!" as a separate step, just remove this and replace echo(!line! with echo(!line:*:=!.
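Putting both suggestions together, the revised loop might look like the sketch below. It assumes the file names testing.txt and output.txt from the question and relies on the set /A behavior described above (conversion stops at the colon); it has not been timed against the original.

```batch
@echo off
setlocal DisableDelayedExpansion
> "output.txt" (
    for /F "delims=" %%i in ('findstr /N "^" "testing.txt"') do (
        rem %%i looks like  123:original text of the line
        set "line=%%i"
        rem set /A reads the variable "line" itself and stops at the colon
        set /A "row=line"
        setlocal EnableDelayedExpansion
        rem strip everything up to the first colon only when printing
        if !row! gtr 100 echo(!line:*:=!
        endlocal
    )
)
```

Delayed expansion is enabled only around the echo so that exclamation marks in the file content are not lost when %%i is expanded.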
If you do not need to preserve empty lines, the whole approach is as simple as this:
@echo off
for /F usebackq^ skip^=100^ delims^=^ eol^= %%i in ("testing.txt") do echo(%%i
This does not limit the file size, but the line lengths must not exceed 8191 characters/bytes.
Related
I am trying to extract the values from the third field of a file which has data records.
The fields are separated by vertical bar characters:
9001||10454145||60|60
9001|234467|10454145||60|60
9001|234457|10454145||60|60
Command is -
for /f "tokens=3 delims=|" %%A IN ('Findstr /i "9001" .\itemloc\%%~nf.dat') do (
echo %%A >> log.txt
)
But the output I am getting is
60
10454145
10454145
The empty fields are messing up my output. Any suggestions how to make the for token work with empty fields in the record?
#ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
rem The following settings for the directories and filenames are names
rem that I use for testing and deliberately includes spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q75199035.txt"
SET "outfile=%destdir%\outfile.txt"
(
FOR /f "usebackqtokens=1*delims=" %%e IN ("%filename1%") DO (
SET "line=%%e"
FOR /f "tokens=3 delims=|" %%y IN ("!line:||=|(missing)|!") DO ECHO %%y
)
)>"%outfile%"
TYPE "%outfile%"
GOTO :EOF
Always verify against a test directory before applying to real data.
Note that if the filename does not contain separators like spaces, then both usebackq and the quotes around %filename1% can be omitted.
The magic is that for each line, || is replaced by |(missing)|.
This simple solution has its faults - for instance if there is ||| in the source data, or the usual suspects (some punctuation symbols like !) but should be quite happy with alphameric source text.
Another way would be to use a third-party utility like sed to pre-process the source data.
The fundamental reason for this phenomenon is that for/f parses the line as [delimiters]token1[delimiters]token2..., where [delimiters] is any sequence of any of the delimiter characters.
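A small demonstration of that parsing rule, using the first record from the question as a literal string (the bracketed echo format is just for illustration):

```batch
@echo off
setlocal EnableDelayedExpansion
set "line=9001||10454145||60|60"

rem Runs of | count as a single separator, so the empty fields disappear.
rem prints [9001] [10454145] [60]
for /f "tokens=1-3 delims=|" %%A in ("!line!") do echo [%%A] [%%B] [%%C]

rem After replacing || with |(missing)| every field is non-empty.
rem prints [9001] [(missing)] [10454145]
for /f "tokens=1-3 delims=|" %%A in ("!line:||=|(missing)|!") do echo [%%A] [%%B] [%%C]
```

The second loop is the same replacement the script above applies to each line read from the file.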
I have a .bat file and I'm trying to parse through each character in a Folder Location String, in order to count the occurrences of a certain character.
How could I do that in a .bat file?
Is it possible using a FOR loop where you perform code similar to this?:
FOR /F %%i in (%cd%) DO (IF %%i =="w" counter+=1)
I know the above isn't correct, I'm just starting out.
Haven't managed to find the answer in my research so far.
Either work with substrings (looking at every character), which is slow with longer strings, or count the words, then count the words again after replacing each "w" with " w". The difference is the number of occurrences of w (the replacement ignores capitalization, so it counts w plus W). The example below counts the letter l:
@echo off
set "string=Hello beautiful World"
for /f %%A in ('(for %%a in (%string%^) do @echo %%a^)^|find /c /v ""') do set A=%%A
for /f %%A in ('(for %%a in (%string:l= l%^) do @echo %%a^)^|find /c /v ""') do set B=%%A
set /a count=B-A
echo --- %count%
The reason for not just removing the spaces is that the standard delimiters of for are not just the space, but space, tab, comma, ...
Note: this does not take care of poison characters.
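For comparison, the substring approach mentioned at the start (looking at every character) could be sketched as follows. The label names and the variable rest are made up for this example, and like the word-count version it does not protect poison characters; an exclamation mark in the string would be lost when delayed expansion is enabled.

```batch
@echo off
setlocal EnableDelayedExpansion
set "string=Hello beautiful World"
set /a count=0
set "rest=!string!"
:next
rem stop when every character has been consumed
if not defined rest goto done
rem compare the first remaining character, ignoring case
if /i "!rest:~0,1!"=="l" set /a count+=1
rem drop the first character and repeat
set "rest=!rest:~1!"
goto next
:done
rem prints --- 4
echo --- %count%
```

Each pass costs a GOTO, which is why this gets slow with longer strings.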
I'm trying to insert a line into a file using the following code (from Write batch variable into specific line in a text file)
@echo off
setlocal enableextensions enabledelayedexpansion
set inputfile=variables.txt
set tempfile=%random%-%random%.tmp
copy /y nul %tempfile%
set line=0
for /f "delims=" %%l in (%inputfile%) do (
set /a line+=1
if !line!==4 (
echo WORDS YOU REPLACE IT WITH>>%tempfile%
) else (
echo %%l>>%tempfile%
)
)
del %inputfile%
ren %tempfile% %inputfile%
endlocal
My problem is the file has comment lines (which start with semicolons) which need to be kept
; directory during network startup. This statement must indicate a local disc
; drive on your PC and not a network disc drive.
LOCALDRIVE=C:\TEMP;
; PANELISATION PART/NET NAMING CONVENTION
; When jobs are panelised, parts/nets are renamed for each panel step by
When I run the batch file, it ignores the semicolon lines, So I only get:
LOCALDRIVE=C:\TEMP;
What do I need to do to keep the semicolon lines?
The EOL option determines what lines are to be ignored. The default value is a semicolon. If you know a character that can never appear in the first position of a line, then you can simply set EOL to that character. For example, if you know a line can't start with |, then you could use
for /f "eol=| delims=" %%l in (%inputfile%) do ...
There is an awkward syntax that disables EOL completely, and also disables DELIMS:
for /f delims^=^ eol^= %%l in (%inputfile%) do ...
Note that FOR /F always discards empty lines, so either of the above would result in:
; directory during network startup. This statement must indicate a local disc
; drive on your PC and not a network disc drive.
LOCALDRIVE=C:\TEMP;
; PANELISATION PART/NET NAMING CONVENTION
; When jobs are panelised, parts/nets are renamed for each panel step by
A trick is used if you want to preserve empty lines. Use FIND or FINDSTR to insert the line number before each line, and then use expansion find/replace to remove the line number. Now you know the line never begins with ;, so you can ignore the EOL option.
for /f "delims=" %%L in ('findstr /n "^" "%inputfile%"') do (
set "ln=%%L"
set "ln=!ln:*:=!"
REM You now have the original line, do whatever needs to be done here
)
But all of the above have a potential problem in that you have delayed expansion enabled when you expand the FOR variable, which means that any content containing ! will be corrupted. To solve this you must toggle delayed expansion on and off within the loop:
setlocal disableDelayedExpansion
...
for /f "delims=" %%L in ('findstr /n "^" "%inputfile%"') do (
set "ln=%%L"
setlocal enableDelayedExpansion
set "ln=!ln:*:=!"
REM You now have the original line with ! preserved, do whatever needs done here
endlocal
)
Also, when ECHOing an empty line, it will print out ECHO is off unless you do something like
echo(!ln!
It takes time to open and position the write cursor to the end every time you use >> within the loop. It is faster to enclose the entire operation in one set of parentheses and redirect once. Also, you can replace the DEL and REN with a single MOVE command.
Here is a final robust script:
@echo off
setlocal disableDelayedExpansion
set "inputfile=variables.txt"
set line=0
>"%inputfile%.new" (
for /f "delims=" %%L in ('findstr /n "^" "%inputfile%"') do (
set "txt=%%L"
set /a line+=1
setlocal enableDelayedExpansion
set "txt=!txt:*:=!"
if !line! equ 4 (
echo New line content here
) else (
echo(!txt!
)
endlocal
)
)
move /y "%inputfile%.new" "%inputfile%" >nul
endlocal
That is an awful lot of work for such a simple task, and it requires a lot of arcane knowledge.
There is a much quicker hack that works as long as:
your first 4 lines do not exceed 1021 bytes
none of your first 3 lines have trailing control characters that need to be preserved
the remaining lines do not have <tab> characters that must be preserved (MORE converts <tab> into a string of spaces).
@echo off
setlocal enableDelayedExpansion
set "inputfile=variables.txt"
>"%inputfile%.new" (
<"%inputfile%" (
for /l %%N in (1 1 3) do (
set "ln="
set /p "ln="
echo(!ln!
)
)
echo New line content here
more +4 "%inputfile%"
)
move /y "%inputfile%.new" "%inputfile%"
That is still a lot of work and arcane knowledge.
I would use my JREPL.BAT utility
Batch is really a terrible tool for text processing. That is why I developed JREPL.BAT to manipulate text using regular expressions. It is a hybrid JScript/batch script that runs natively on any Windows machine from XP onward. It is extremely versatile, robust, and fast.
A minimal amount of code is required to solve your problem with JREPL. Your problem doesn't really require the regular expression capabilities.
jrepl "^" "" /jendln "if (ln==4) $txt='New content here'" /f "variables.txt" /o -
If used within a batch script, then you must use call jrepl ... because JREPL.BAT is also a batch script.
By default, the FOR command treats ; as the end-of-line character, so all those lines that start with ; are being ignored.
Add eol= to your FOR command, like this:
for /f "eol= delims=" %%l in (%inputfile%) do (
It looks like you're echoing just the line delimiter, not the whole line:
echo %%l>>%tempfile%
I'm rusty on ms-dos scripts, so I can't give you more than that.
I'm trying to use a batch file to convert a file containing sql code into a single environment variable for use with the MSSQL utility bcp.
For example, if InFile.sql contains
-- This is a simple statement
SELECT *
FROM table
The output of ECHO %query% should be
SELECT * FROM people
The code below works for me most of the time
SETLOCAL=ENABLEDELAYEDEXPANSION
:: Replace VarOld with VarNew
FOR /f "delims=" %%a IN ('TYPE InFile.sql') DO ( SET line=%%a & ECHO !line:table=people! >> TmpFile1 )
:: Remove comment lines starting with '-' and remove newline characters
(FOR /f "eol=- delims=" %%a in (TmpFile1) DO SET/p=%%a ) <nul >TmpFile2
:: Create variable 'Query'
FOR /f "delims=" %%a IN ('TYPE TmpFile2') DO SET query=%%a
however, the first FOR loop adds 3 space characters at the end of each line and the second FOR loop adds another space character so the result is
SELECT * FROM people
I could cope with the additional spaces (although the purist in me wasn't happy!) until I had to use it with a long SQL query and multiple replacement steps - every line in the file was having 12 space characters added. The additional spaces are enough to make the resulting query around 8300 characters long - too much for Windows' 8191 character limit for a batch file line.
Can anybody see how I can remove these spurious spaces?
Using tokens=* in a for loop should trim whitespace as you're capturing a line of infile.sql. Here's a proof of concept, echoing %query% contained within quotation marks to illustrate the trimming:
@echo off
setlocal enabledelayedexpansion
set query=
if "%~1"=="" goto usage
if not exist "%~1" goto usage
for /f "usebackq eol=- tokens=*" %%I in ("%~f1") do (
set "sub=%%I"
set query=!query! !sub:table=people!
)
:: strip the leading space from %query%
echo "%query:~1%"
goto :EOF
:usage
echo Usage: %~nx0 sqlfile
Example output:
C:\Users\me\Desktop>type infile.sql
-- This is a simple statement
SELECT *
FROM table
C:\Users\me\Desktop>test.bat infile.sql
"SELECT * FROM people"
The fundamental issue is that trailing spaces ARE significant in SET statements and ECHO statements before the redirectors.
In your code, you need to remove the spaces after %%a and people! in the first FOR. Thus:
FOR /f "delims=" %%a IN ('TYPE InFile.sql') DO (SET line=%%a&ECHO !line:table=people!>> TmpFile1)
The next problem is a little more subtle. In
(FOR /f "eol=- delims=" %%a in (TmpFile1) DO SET/p=%%a ) <nul >TmpFile2
the space following /p=%%a is REQUIRED because it provides the separator between the text taken from the lines when building TmpFile2 - and that leads to a superfluous trailing space. Try replacing that space with a Q for instance - just for testing.
Hence, you need to delete the final space from QUERY after it's been constructed in your final FOR
SET query=%query:~0,-1%
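The effect of a trailing space in SET is easy to see in isolation. In this sketch the variable names padded and clean are invented for the demonstration:

```batch
@echo off
setlocal
rem Unquoted: everything up to the & is stored, including the space before it
set padded=value & rem trailing space is kept
rem Quoted: the value ends at the closing quote
set "clean=value" & rem no trailing space
rem prints [value ] [value]
echo [%padded%] [%clean%]
```

Using the quoted form set "var=value" is a common way to avoid picking up such spaces by accident.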
I have a FOR /F statement which I want it to enumerate the delims the command is parsing. For example:
FOR /F "delims=," %%A IN ("This,is,a,comma,delimited,sentence") DO (
some command to enumerate delims
)
:OUT
I want it to account for each delimited item in that sentence. In this case it would output 6
EDIT: I know the long way would be to do a check on each one.. but I'm trying to avoid that method:
IF %%A == [] SET enum=0 && GOTO:OUT
IF %%B == [] SET enum=1 && GOTO:OUT
etc.
There is NO way to directly enumerate items in a FOR /F "delims=..." command. The usual way to do that is via a loop that counts one item, eliminates it from the sentence, and repeats as long as an item was counted.
However, depending on the specific delimiter and the rest of characters in the sentence, you may use a FOR command (with no /F option) that will REPEAT its code with each item separated BY THE STANDARD BATCH DELIMITERS, that are comma, semicolon and equal-sign, besides spaces. In your particular example:
SET ENUM=0
FOR %%A IN (This,is,a,comma,delimited,sentence) DO SET /A ENUM+=1
directly counts the number of comma-separated items. If the delimiter is a character other than comma, semicolon or equal-sign, a possible solution is a three-step method:
1- Replace spaces, comma, semicolon and equal-sign with other known character(s).
2- Replace the delimiter for any Batch standard delimiter (space, comma, etc).
3- Use a simple FOR to directly enumerate the items.
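A sketch of these three steps for a string delimited by |. The sample string and the underscore placeholder are assumptions chosen for the example. Equal-signs are not handled here, because plain variable substitution cannot search for =, and the wildcard characters * and ? would still upset the simple FOR.

```batch
@echo off
setlocal EnableDelayedExpansion
set "str=red apple|green pear|blue plum"

rem Step 1: protect the standard delimiters that are already in the text
set "str=!str: =_!"
set "str=!str:,=_!"
set "str=!str:;=_!"

rem Step 2: turn the real delimiter into a standard one (space)
set "str=!str:|= !"

rem Step 3: a simple FOR now iterates once per item
set count=0
for %%A in (!str!) do set /a count+=1

rem prints 3
echo %count%
```

If the items themselves are needed afterwards, the placeholder would have to be replaced back inside the loop.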
There IS a way to directly enumerate items in a FOR /F "delims=..." command.
You only need to insert some newlines into the string.
setlocal EnableDelayedExpansion
set LF=^


rem ** Two empty lines are required
set count=0
FOR /F "tokens=* delims=" %%a in ("item1!LF!item2!LF!item3") DO (
set /a count+=1
echo !count!: "%%a"
)
echo(
set "CSV=This,is,a,comma,delimited,sentence"
echo Or with comma separated text: !CSV!
for %%L in ("!LF!") do set "CSV=!CSV:,=%%~L!"
set count=0
FOR /F "tokens=* delims=" %%a in ("!CSV!") DO (
set /a count+=1
echo !count!: "%%a"
)
Aacini's suggestion to use a simple FOR instead of FOR /F is a good solution, unless the string might contain wildcard characters * or ?. The ? character could be protected by search and replace, but there is no efficient way to replace * in batch.
If you run into the wildcard problem then you can revert to using FOR /F in a loop to parse one word at a time. Most people use GOTO to accomplish the loop because you have no way of knowing how many words you will find. But GOTO is relatively slow. You can achieve a significant performance boost by using an outer FOR loop (not FOR /L) with an arbitrarily large number of items. Within the body you can exit the loop with a GOTO whenever there are no more words. If the loop falls through without exhausting the words, you can use a GOTO to restart the loop. An outer loop with 100 items will only perform 1 GOTO per 100 parsed words.
@echo off
setlocal enableDelayedExpansion
set "str=This,is,a,comma,delimited,sentence"
set "L10=1 2 3 4 5 6 7 8 9 0"
:loop - The outer FOR loop can handle 100 words before a single GOTO is needed
for %%. in (%L10% %L10% %L10% %L10% %L10% %L10% %L10% %L10% %L10% %L10%) do (
for /f "tokens=1* delims=," %%A in ("!str!") do (
echo %%A
if "%%B" == "" goto :break
set "str=%%B"
)
)
goto :loop
:break
As with all FOR loops, you must worry about corruption of ! and ^ if delayed expansion is enabled when the %%A variable is expanded.
FOR /L should not be used for the outer loop because FOR /L always finishes counting all iterations, even if you use GOTO within the body.