Getting a list of common strings with first occurance mark - windows

I've got bunch of text files with some content. First I wanted to number the lines globally. Then I extracted all lines that are duplicated somewhere (occur in any of given files at least twice). But now I need to mark all of these lines with the filename and line number of the first occurrence of this line. And now the funny part - it needs to be a windows batch file, using native windows tools. That's why I've got this problem to begin with.
So, to sum it up:
I have a file A with unique strings/lines, each of them is said to occur at least twice in given set of files.
I need to search these files and mark all occurrences of given line from A file with
-file name in which the line first occured
-line number in this file
This is my code with effort to number lines and format files.
#echo off
setlocal EnableDelayedExpansion
set /a lnum=0
if not [%1]==[] pushd %1
for /r %%F in (*.txt) do call :sub "%%F"
echo Total lines in %Files% files: %Total%
popd
exit /b 0
:Sub
set /a Cnt=0
for /f %%n in ('type %1') do (
set /a Cnt+=1
set /a lnum=!lnum!+1
echo ^<!lnum!^> %%n >> %1_ln.txt && echo ^<!lnum!^> >> %1_ln.txt && echo. >> %1_ln.txt
)
set /a Total+=Cnt
set /a Files+=1
echo %1: %Cnt% lines

#echo off
setlocal EnableDelayedExpansion
set lnum=0
if not "%~1" == "" pushd %1
rem "I've got bunch of text files..." (%%F is file name)
for /r %%F in (*.txt) do call :sub "%%F"
echo Total lines in %Files% files: %lnum%
popd
exit /b 0
:Sub "filename"
set Cnt=0
rem "... with some content." (%%n is line contents)
(for /f "usebackq delims=" %%n in (%1) do (
set /a Cnt+=1
rem "First I wanted to number the lines globally."
set /a lnum+=1
echo ^<!lnum!^> %%n
rem "Then I extracted all lines that are duplicated somewhere" (that were defined before)
if defined line[%%n] (
rem "I need to mark all of these lines with the filename and line number of the first occurrence of this line."
echo ^<!line[%%n]!^>
echo/
) else (
REM (Store the first occurrence of this line with *local* line number and filename)
set line[%%n]=!Cnt!: %1
)
)) > "%~PN1_ln.txt"
set /A Files+=1
echo %1: %Cnt% lines
exit /B
The above Batch program ignore empty lines in the input files and fail if they contain special Batch characters, like ! & < > |; this limitation may be fixed if required.

#ECHO OFF
SETLOCAL
FOR /f "delims=" %%s IN (A) DO (
SET searching=Y
FOR /f "delims=" %%f IN (
'dir /s /b /a-d *.txt') DO IF DEFINED searching (
FOR /f "tokens=1delims=:" %%L IN (
'findstr /b /e /n /l /c:"%%s" ^<"%%f"') DO IF DEFINED searching (
ECHO Line %%L IN "%%f" FOUND "%%s"
SET "searching="
)
)
)
Here's the meat of a routine that should do what you appear to be looking for - and that's as clear as mud.
It looks through the "A" file for each string in turn, assigns the string to %%s and sets the flag searching
Then it looks through the file list, assigning filenames to %%f
Then it executes a findstr to find the /c:"%%s" complete string %%s (including any spaces) in /l or literal mode (ie. not using regular expressions) for a line that both /b and /e begins and ends with the target (ie exactly matches) and /n numbers those lines.
The output of findstr will be in the format linenumber:linecontents so if this line is examined by the FORwith the option "delims=:" then the partion up to the first : is assigned to to %%L
So - %%L contains the line#, %%f the filename, %%s the string
Clearing searching having detected this line by setting its value to [nothing] means it's not NOT DEFINED hence no further lines will be reported from the current file, and no further filenames will be examined.
Now if you want to get a listing of ALL of the occurrences of the target lines, all you need to do is to REM-out the SET "searching=" line. Searching will then never be reset, so each line in each file is reported.
If you want some other combination, please clarify.
I have absolutely no idea whatever what you mean by "marking" a line.

#ECHO OFF & setlocal
for /f "tokens=1*delims==" %%i in ('set "$" 2^>nul') do set "%%i="
for %%a in (*.txt) do (
for /f %%b in ('find /v /c "" ^<"%%a"') do echo(%%b lines in %%a.
set /a counter=0, files+=1
for /f "usebackqdelims=" %%b in ("%%~a") do (
set /a counter+=1, total+=1
set "line=%%b"
setlocal enabledelayedexpansion
if not defined $!line! set "$!line!=%%a=!counter!=!line!"
for /f "delims=" %%i in ('set "$" 2^>nul') do (if "!"=="" endlocal)& set "%%i"
)
)
echo(%total% lines in %files% files.
for /f "delims=" %%a in (a) do set "#%%a=%%a"
for /f "tokens=2,3*delims==:" %%i in ('set "$" 2^>nul') do (
if defined #%%k echo("%%k" found in %%i at line %%j.
)
Script can handle !&<>|%, but not =.

Related

Creating Each line of text as variable and them constantly changing in a loop in batch

So what I'm trying to do is create a find for multiple people where it in the text file it will say names and numbers like
Example of text file:
Beth
1234567891
Jay
2134456544
This is the best way I can explain what I'm trying to do:
#echo off
set "file=Test1.txt"
setlocal EnableDelayedExpansion
<"!file!" (
for /f %%i in ('type "!file!" ^| find /c /v ""') do set /a n=%%i && for /l %%j in (1 1 %%i) do (
set /p "line_%%j="
)
)
set /a Name=1
set /a Number=2
Echo Line_%Name%> %Name%.txt (Im trying to get this to say line_2 to say 1st line in the text file)
Echo Line_%Number%> %Name%.txt (Im trying to get this to say line_2 to say 2nd line in the text file)
:Start
set /a Name=%Name%+2 (These are meant to take off after 1 so lines 3,5,7,9 so on)
set /a Number=%Number%+2 (These are meant to take off after 2 so lines 4,6,8,10 so on)
Echo Line_%Name%
Echo Line_%Number%
GOTO :Start
so the outcome would be
In Beth.txt:
Beth
1234567891
So every name will be a file name and the first line in a file. I will change it later so I can do a addition in each text file.
Name: Beth
Number: 1234567891
#ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q65417881.txt"
rem make sure arrays are empty
For %%b IN (name number) DO FOR /F "delims==" %%a In ('set %%b[ 2^>Nul') DO SET "%%a="
rem Initialise counter and entry array
SET /a count=0
SET "number[0]=dummy"
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO (
IF DEFINED number[!count!] (SET /a count+=1&SET "name[!count!]=%%a") ELSE (SET "number[!count!]=%%a")
)
rem clear out dummy entry
SET "number[0]=dummy"
FOR /L %%c IN (1,1,%count%) DO (
rem replace spaces with dashes
SET "name[%%c]=!name[%%c]: =-!"
rem report to console rem report to console
ECHO Name: !name[%%c]! Number: !number[%%c]!
rem generate name.txt file
(
ECHO !name[%%c]!
ECHO !number[%%c]!
)>"%destdir%\!name[%%c]!.txt"
)
GOTO :EOF
You would need to change the values assigned to sourcedir and destdir to suit your circumstances. The listing uses a setting that suits my system.
I deliberately include spaces in names to ensure that the spaces are processed correctly.
I used a file named q65417881.txt containing your data for my testing.
The line data read from the file is assigned to %%a is assigned to and number[!count!] alternately. The data is retained in these arrays for use by further processing.
[Edited to include conversion of spaces within names to dashes]
If I understand correctly, you want to precede every second line with Number: + SPACE and every other line with Name: + SPACE. For this you do not need to store each line in a variable first, you can use a single for /F loop lo read the file line by line and process every line individually. There are two possibilities:
Temporarily precede every line with a line number plus : using findstr /N:
#echo off
rem // Loop through lines and precede each with line number plus `:`:
for /F "tokens=1* delims=:" %%K in ('findstr /N "^" "Test1.txt"') do (
rem // Calculate remainder of division by two:
set /A "MOD=%%K%%2" 2> nul
rem // Toggle delayed expansion to avoid issues with `!`:
setlocal EnableDelayedExpansion
rem // Conditionally return line string with adequate prefix:
if !MOD! neq 0 (
endlocal & echo Name: %%L
) else (
endlocal & echo Number: %%L
)
)
This will fail when a line begins with the a :.
Check whether numeric representation of current line string is greater than 0:
#echo off
rem // Loop through (non-empty) lines:
for /F "usebackq delims=" %%L in ("Test1.txt") do (
rem // Determine numeric representation of current line string:
set /A "NUM=%%L" 2> nul
rem // Toggle delayed expansion to avoid issues with `!`:
setlocal EnableDelayedExpansion
rem // Conditionally return line string with adequate prefix:
if !NUM! equ 0 (
endlocal & echo Name: %%L
) else (
endlocal & echo Number: %%L
)
)
This fails when a name begins with numerals and/or when a numeric line is 0.
And just for the sake of posting something different:
#SetLocal EnableExtensions DisableDelayedExpansion & (Set LF=^
% 0x0A %
) & For /F %%G In ('Copy /Z "%~f0" NUL') Do #Set "CR=%%G"
#For /F "Tokens=1,2* Delims=:" %%G In ('%__AppDir__%cmd.exe /D/V/C ^
"%__AppDir__%findstr.exe /NR "^[a-Z]*!CR!!LF![0123456789]" "Test1?.txt" 2>NUL"
') Do #(SetLocal EnableDelayedExpansion
(Set /P "=Name: %%I!CR!!LF!Number: " 0<NUL & Set "_="
For /F Delims^=^ EOL^= %%J In ('%__AppDir__%more.com +%%H "%%G"') Do #(
If Not Defined _ Set "_=_" & Echo %%J)) 1>"%%I.txt" & EndLocal)
This file should be run with the Test1.txt file in the current working directory. It is important that along side Test1.txt, there are no other .txt files with the same basename followed by one other character, (for example Test1a.txt or Test12.txt). Should you wish to change your filename, just remember that you must suffix its basename in the above code with a ? character, (e.g. MyTextFile.log ⇒ MyTextFile?.log).
I had the rare opportunity to verify that this script worked against the following example Test1.txt file:
Beth
1234567891
Jay
2134456544
Bob
2137856514
Jimmy
4574459540
Mary
3734756547
Gemma
6938456114
Albert
0134056504

How to use Windows batch for bicirculation

two files 1.txt 2.txt
1.txt
a2441
b4321
p8763
2.txt
Apple
banana
peach
I want to generated content in 12.bat
ren a2441 Apple
ren b4321 banana
ren p8763 peach
I try to write like this but failed
#echo on
for /f %%a in (1.txt) do (
for /f %%b in (2.txt ) do (
ren %%a %%b>>12.bat
)
)
pause
How can I achieve my desired results? help me,thks
It is not so straightforward. You'll need to skip increasing amount of lines in the inner loop , but because of the FOR parsing you'll need additional subroutine call.And if there are empty lined they need to be stripped with findstr for more accurate skipping of the lines. Also skip argument in FOR does not accept 0 so also an if is needed.
#echo off
setlocal ENABLEDELAYEDEXPANSION
set /a s=0
for /f "tokens=* delims=" %%a in (1.txt) do (
call :inner !s!
echo ren %%a !second!
set /a s=s+1
)
endlocal
pause
exit /b
:inner
set "second="
if %1==0 (
for /f "tokens=* delims=" %%b in ('findstr /r "[a-zA-Z]" "2.txt"') do (
if not defined second set "second=%%b"
)
) else (
for /f "skip=%1 tokens=* delims=" %%b in ('findstr /r "[a-zA-Z]" "2.txt"' ) do (
if not defined second set "second=%%b"
)
)
The problem with anidated loops is that for each line of the outer file/loop all the lines of the inner file/loop are processed.
One simple solution is to number the lines of both files and only generate the output line when both numbers match. Quick and dirty:
#echo off
setlocal enableextensions disabledelayedexpansion
>"12.bat" (
for /f "tokens=1,* delims=:" %%a in ('findstr /n "^" 1.txt') do (
for /f "tokens=1,* delims=:" %%c in ('findstr /n "^" 2.txt') do (
if %%a==%%c (
echo ren %%b %%d
)
))
)
This code just uses findstr to generate line numbers (/n) for each of the input files, and using the delims and tokens clauses of the for /f command to separate them from the original line contents (note: if the lines could start with a colon, more code is needed). If the line number in first file matches the line number in second file the command is echoed (there is a active output redirection to the final file)
Another option is to assign each file to a input stream and to read them using set /p operations from each of the streams
#echo off
setlocal enableextensions disabledelayedexpansion
call :processFiles 9<"1.txt" 8<"2.txt" >"12.bat"
goto :eof
:processFiles
<&9 set /p f1= || goto :eof
<&8 set /p f2= || goto :eof
echo ren %f1% %f2%
goto :processFiles
In the call to the subroutine we assign each file to a stream. Inside the subroutine each stream is read with set /p (note: set /p can not read lines longer than 1021 characters)

Reading list of files in a directory and copying the contents using batch command file

I have a list of csv files in a directory which have name with format XX_YYYFile.csv, where XX is a name that can have any characters (including space), and YYY is random 3 digits. For example: "book_123File.csv", "best movie_234File.csv", etc. I want to read this list of files then create new CSV files by removing "_YYYFile". The content of the new files are the same with the original ones, except the first line needs to be added with value "number,name,date".
set inputFileFolder=C:\Input
set outputFileFolder=C:\Output
FOR /F "delims=" %%F IN ('DIR %inputFileFolder%\*File.csv /B /O:D') DO (
set reportInputFile=%inputFileFolder%\%%F
set reportInputFileName=%%F
set result=!reportInputFileName:~0,-12!
set reportOutputFileName=!result!.csv
set reportOutputFile=%outputFileFolder%\!result!.csv
echo number,name,date > !reportOutputFile!
for /f "tokens=* delims=" %%a in (!reportInputFile!) do (
echo %%a >> !reportOutputFile!
)
)
If I run this batch file, file "book.csv" is successfully created with the correct contents (first line: "number,name,date", the next lines are from file "book_123.csv"). But file "best movie_234.csv" and other files contain space in the filename are not created successfully. File "best movie.csv" is created with only 1 line "number,name,date". The contents of file "best movie_234.csv" are not copied to file "best movie.csv".
Please help.
You need to Escape Characters, Delimiters and Quotes properly. Note the usebackq parameter in inner for /F loop as well:
#ECHO OFF
SETLOCAL EnableExtensions EnableDelayedExpansion
set "inputFileFolder=C:\Input"
set "outputFileFolder=C:\Output"
FOR /F "delims=" %%F IN ('DIR "%inputFileFolder%\*File.csv" /B /O:D') DO (
set "reportInputFile=%inputFileFolder%\%%F"
set "reportInputFileName=%%F"
set "result=!reportInputFileName:~0,-12!"
set "reportOutputFileName=!result!.csv"
set "reportOutputFile=%outputFileFolder%\!result!.csv"
>"!reportOutputFile!" echo number,name,date
for /f "usebackq tokens=* delims=" %%a in ("!reportInputFile!") do (
>>"!reportOutputFile!" echo %%a
)
rem above `for /f ... %%a ...` loop might be replaced by FINDSTR
rem >>"!reportOutputFile!" findstr "^" "!reportInputFile!"
rem or by TYPE
rem >>"!reportOutputFile!" type "!reportInputFile!"
)
Hint: each > and >> redirector works as follows:
opens specified oputput file, then
writes something to oputput file, and finally
closes oputput file.
This procedure might be extremely slow if repeated in next for /f ... %%a ... loop for larger files:
>"!reportOutputFile!" echo number,name,date
for /f "usebackq tokens=* delims=" %%a in ("!reportInputFile!") do (
>>"!reportOutputFile!" echo %%a
)
Use block syntax rather:
>"!reportOutputFile!" (
echo number,name,date
for /f "usebackq tokens=* delims=" %%a in ("!reportInputFile!") do (
echo %%a
)
)
above for /f ... %%a ... loop might be replaced by FINDSTR command (it eliminates empty lines like for does) as follows:
>"!reportOutputFile!" (
echo number,name,date
findstr "^." "!reportInputFile!"
)
or by TYPE command (it will retain empty lines unlike for) as follows:
>"!reportOutputFile!" (
echo number,name,date
type "!reportInputFile!"
)

Windows - findstr in for loop (file content)

I have a text file which containts stuff like
M test123
S test
M abc
and so on...
I'm trying to write a batch script that will do the following:
Read this text file, search every line for "M " (with spaces!) and then save the found line in a variable, delete the "M " and store the output in a seperate output.txt
So the output.txt should containt then following:
test123
S test
abc
Here's what I have so far:
SETLOCAL ENABLEDELAYEDEXPANSION
SET count=1
FOR /F "tokens=* USEBACKQ" %%F IN (output_whole_check.txt) DO (
SET var!count!=%%F
findstr /lic:"M " > nul && (set var!count!=var!count!:~8%) || (echo not found)
SET /a count=!count!+1
)
ENDLOCAL
Or is there some easier way to solve that without any additional stuff installed on windows?
Try this one. It echoes all lines to output.txt with "M " replaced with nothing.
#echo off & setlocal
>output.txt (
FOR /F "usebackq delims=" %%I IN ("output_whole_check.txt") DO (
set "line=%%I"
setlocal enabledelayedexpansion
echo(!line:M =!
endlocal
)
)
Result:
test123
S test
abc
Or if your output_whole_check.txt is very large, it might be faster to loop over the lines using a for /L loop. for /L is more efficient than for /F. You just have to get a count of the lines to determine how many iterations to loop.
#echo off & setlocal
rem // get number of lines in the text file
for /f "tokens=2 delims=:" %%I in ('find /v /c "" "output_whole_check.txt"') do set /a "count=%%I"
<"output_whole_check.txt" >"output.txt" (
for /L %%I in (1,1,%count%) do (
set /P "line="
setlocal enabledelayedexpansion
echo(!line:M =!
endlocal
)
)
The result is the same output.

Batch split a text file

I have this batch file to split a txt file:
#echo off
for /f "tokens=1*delims=:" %%a in ('findstr /n "^" "PASSWORD.txt"') do for /f "delims=~" %%c in ("%%~b") do >"text%%a.txt" echo(%%c
pause
It works but it splits it line by line. How do i make it split it every 5000 lines. Thanks in advance.
Edit:
I have just tried this:
#echo off
setlocal ENABLEDELAYEDEXPANSION
REM Edit this value to change the name of the file that needs splitting. Include the extension.
SET BFN=passwordAll.txt
REM Edit this value to change the number of lines per file.
SET LPF=50000
REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list.
SET SFN=SplitFile
REM Do not change beyond this line.
SET SFX=%BFN:~-3%
SET /A LineNum=0
SET /A FileNum=1
For /F "delims==" %%l in (%BFN%) Do (
SET /A LineNum+=1
echo %%l >> %SFN%!FileNum!.%SFX%
if !LineNum! EQU !LPF! (
SET /A LineNum=0
SET /A FileNum+=1
)
)
endlocal
Pause
exit
But i get an error saying: Not enough storage is available to process this command
This will give you the a basic skeleton. Adapt as needed
#echo off
setlocal enableextensions disabledelayedexpansion
set "nLines=5000"
set "line=0"
for /f "usebackq delims=" %%a in ("passwords.txt") do (
set /a "file=line/%nLines%", "line+=1"
setlocal enabledelayedexpansion
for %%b in (!file!) do (
endlocal
>>"passwords_%%b.txt" echo(%%a
)
)
endlocal
EDITED
As the comments indicated, a 4.3GB file is hard to manage. for /f needs to load the full file into memory, and the buffer needed is twice this size as the file is converted to unicode in memory.
This is a fully ad hoc solution. I've not tested it over a file that high, but at least in theory it should work (unless 5000 lines needs a lot of memory, it depends of the line length)
AND, with such a file it will be SLOW
#echo off
setlocal enableextensions disabledelayedexpansion
set "line=0"
set "tempFile=%temp%\passwords.tmp"
findstr /n "^" passwords.txt > "%tempFile%"
for /f %%a in ('type passwords.txt ^| find /c /v "" ') do set /a "nFiles=%%a/5000"
for /l %%a in (0 1 %nFiles%) do (
set /a "e1=%%a*5", "e2=e1+1", "e3=e2+1", "e4=e3+1", "e5=e4+1"
setlocal enabledelayedexpansion
if %%a equ 0 (
set "e=/c:"[1-9]:" /c:"[1-9][0-9]:" /c:"[1-9][0-9][0-9]:" /c:"!e2![0-9][0-9][0-9]:" /c:"!e3![0-9][0-9][0-9]:" /c:"!e4![0-9][0-9][0-9]:" /c:"!e5![0-9][0-9][0-9]:" "
) else (
set "e=/c:"!e1![0-9][0-9][0-9]:" /c:"!e2![0-9][0-9][0-9]:" /c:"!e3![0-9][0-9][0-9]:" /c:"!e4![0-9][0-9][0-9]:" /c:"!e5![0-9][0-9][0-9]:" "
)
for /f "delims=" %%e in ("!e!") do (
endlocal & (for /f "tokens=1,* delims=:" %%b in ('findstr /r /b %%e "%tempFile%"') do #echo(%%c)>passwords_%%a.txt
)
)
del "%tempFile%" >nul 2>nul
endlocal
EDITED, again: Previous code will not correctly work for lines starting with a colon, as it has been used as a delimiter in the for command to separate line numbers from data.
For an alternative, still pure batch but still SLOW
#echo off
setlocal enableextensions disabledelayedexpansion
set "nLines=5000"
set "line=0"
for /f %%a in ('type passwords.txt^|find /c /v ""') do set "fileLines=%%a"
< "passwords.txt" (for /l %%a in (1 1 %fileLines%) do (
set /p "data="
set /a "file=line/%nLines%", "line+=1"
setlocal enabledelayedexpansion
>>"passwords_!file!.txt" echo(!data!
endlocal
))
endlocal
Test this: the input file is "file.txt" and output files are "splitfile-5000.txt" for example.
This uses a helper batch file called findrepl.bat - download from: https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
#echo off
:: splits file.txt into 5000 line chunks.
set chunks=5000
set /a s=1-chunks
:loop
set /a s=s+chunks
set /a e=s+chunks-1
echo %s% to %e%
call findrepl /o:%s%:%e% <"file.txt" >"splitfile-%e%.txt"
for %%b in ("splitfile-%e%.txt") do (if %%~zb EQU 0 del "splitfile-%e%.txt" & goto :done)
goto :loop
:done
pause
A limitation is the number of lines in the file and the real largest number is 2^31 - 1 where batch math tops out.
#echo off
setlocal EnableDelayedExpansion
findstr /N "^" PASSWORD.txt > temp.txt
set part=0
call :splitFile < temp.txt
del temp.txt
goto :EOF
:splitFile
set /A part+=1
(for /L %%i in (1,1,5000) do (
set "line="
set /P line=
if defined line echo(!line:*:=!
)) > text%part%.txt
if defined line goto splitFile
exit /B
If the input file has not empty lines, previous method may be modified in order to run faster.

Resources