Escaping FINDSTR Characters


I have been unsuccessful in escaping a combination of characters within my FINDSTR search query. My query contains a quote mark (") and a chevron (>).
My search query: findstr /N /C:"font-size:10px">" inputfile.txt > outputfile.txt
I have searched the web and this site for solutions. I have managed to escape the quote mark on its own by placing it within two other quotes, i.e. "font-size:10px""""
Can I get some help in determining how to escape BOTH the quote and the chevron so that my query works?
PS - Ultimately, I am trying to develop a search that finds all matches for the string font-size:10px">AAAA. where the letters AAAA represent any number between 1 and 9999 followed by a period (.) - if you can provide some solution/support with that I would be even more grateful (if that is possible :)

The escape character of the findstr command is the backslash, so " must be expressed as \". However, you need to consider that the command interpreter cmd.exe also recognises quotation marks and uses a different escape character, namely the caret symbol ^, which leads to this:
findstr /N /R /C:"font-size:10px\"^>[0-9][0-9]*\.^" "inputfile.txt" > "outputfile.txt"
In this command line, the portion font-size:10px\ appears quoted to cmd.exe. The subsequent characters appear unquoted, which is why the > character must be escaped as ^>. The . is a special character for findstr (it matches any character), so it must be escaped as \. to match a literal period. The closing quotation mark of the search expression must be escaped as ^" to hide it from cmd.exe: otherwise the number of quotation marks would be unbalanced, which would affect the rest of the command line.
Note that [0-9] may also match characters like ¹, ², ³, depending on the current code page. To avoid that, use the following expression instead:
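To see why the caret placement matters, here is a deliberately simplified Python model of how cmd.exe tracks quoting on a single command line: every unescaped " toggles the quoted state, and outside quotes a ^ escapes the next character and is removed. The function name cmd_view and the two_c command are hypothetical; the model ignores everything else cmd.exe does (variable expansion, &, |, parentheses), so treat it as an illustration under those assumptions, not as a parser.

```python
def cmd_view(line):
    """Simplified model of cmd.exe quote and caret handling (illustration only).

    Returns (text, redirects):
      text      -- the line with carets outside quotes consumed
      redirects -- indices in `line` of '>' characters that cmd.exe would
                   treat as redirection (outside quotes and not escaped)
    """
    text = []
    redirects = []
    quoted = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "^" and not quoted and i + 1 < len(line):
            # Outside quotes, ^ escapes the next character (even '"' or '>')
            # and is itself removed; an escaped quote does not toggle quoting.
            text.append(line[i + 1])
            i += 2
            continue
        if c == '"':
            quoted = not quoted  # every unescaped quote toggles the state
        elif c == ">" and not quoted:
            redirects.append(i)
        # Everything else is kept as is (inside quotes this includes ^ and >).
        text.append(c)
        i += 1
    return "".join(text), redirects


question = 'findstr /N /C:"font-size:10px">" inputfile.txt > outputfile.txt'
answer = r'findstr /N /R /C:"font-size:10px\"^>[0-9][0-9]*\.^" "inputfile.txt" > "outputfile.txt"'
# Hypothetical: two /C: strings, neither ending in ^"
two_c = r'findstr /R /C:"a\"^>1\." /C:"a\"^>2\."'

for cmd in (question, answer, two_c):
    text, redirects = cmd_view(cmd)
    print(redirects, text)
```

In the question's command the only redirection is the > directly after 10px", so the search string is cut off there. In the answer's command only the final > redirects. The two_c line shows the side effect of leaving the quotes unbalanced: the second ^> ends up inside quotes, so the caret would be passed through literally.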
findstr /N /R /C:"font-size:10px\"^>[0123456789][0123456789]*\.^" "inputfile.txt" > "outputfile.txt"
As you may have noticed, I have quoted all file names. This is not necessary in this particular situation, but it is best practice to avoid trouble with file paths/names containing spaces or other special characters.

As the question specifically stated:
where the letters AAAA represent any number between 1-9999
The more complete code would be:
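%SystemRoot%\System32\findstr.exe /R /N /C:"font-size:10px\"^>[23456789]\.^" /C:"font-size:10px\"^>[123456789][0123456789]\.^" /C:"font-size:10px\"^>[123456789][0123456789][0123456789]\.^" /C:"font-size:10px\"^>[12345678][0123456789][0123456789][0123456789]\.^" /C:"font-size:10px\"^>9[012345678][0123456789][0123456789]\.^" /C:"font-size:10px\"^>99[012345678][0123456789]\.^" /C:"font-size:10px\"^>999[012345678]\.^" "inputfile.txt" 1>"outputfile.txt"
As you can see, I've used the /C option several times to match the numbers 2 to 9, 10 to 99, 100 to 999, and 1000 to 9998, thus catering for all whole numbers 'between' those requested. The range 1000 to 9998 needs four patterns (1000-8999, 9000-9899, 9900-9989 and 9990-9998), because simply restricting the last digit to [012345678] would exclude every four-digit number ending in 9, not just 9999. As in the first answer, each search string ends with ^" so that the quotation marks stay balanced for cmd.exe; without it, every second /C: string would have its ^> inside quotes and the caret would be passed to findstr literally.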
If you really meant 1 to 9999 inclusive, then a quick modification is all you need:
%SystemRoot%\System32\findstr.exe /R /N /C:"font-size:10px\"^>[123456789]\.^" /C:"font-size:10px\"^>[123456789][0123456789]\.^" /C:"font-size:10px\"^>[123456789][0123456789][0123456789]\.^" /C:"font-size:10px\"^>[123456789][0123456789][0123456789][0123456789]\.^" "inputfile.txt" 1>"outputfile.txt"
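To check which numbers a set of digit patterns actually accepts, the sketch below enumerates candidate numbers and tests the text "n." against each pattern followed by \. using Python's re module. It assumes the font-size:10px"> prefix directly precedes the digits, as in the commands, so the digit run must match a pattern in full. For plain ASCII digits Python treats these explicit bracket classes the same way as findstr, but this models only the digit part, not findstr itself. The names accepted, last_digit_limited, split_range and inclusive are made up for the illustration.

```python
import re

D = "[0123456789]"  # explicit digit class, as used in the answer


def accepted(patterns, limit=20000):
    """Return every n in range(limit) for which the text f"{n}." is matched
    in full by at least one pattern followed by a literal period."""
    compiled = [re.compile(p + r"\.") for p in patterns]
    return {n for n in range(limit)
            if any(c.fullmatch(f"{n}.") for c in compiled)}


# Only the last digit restricted to [012345678]
last_digit_limited = ["[23456789]", "[123456789]" + D, "[123456789]" + D + D,
                      "[123456789]" + D + D + "[012345678]"]
# 2..9998, with 1000..9998 split into four patterns
split_range = ["[23456789]", "[123456789]" + D, "[123456789]" + D + D,
               "[12345678]" + D + D + D, "9[012345678]" + D + D,
               "99[012345678]" + D, "999[012345678]"]
# 1..9999 inclusive
inclusive = ["[123456789]", "[123456789]" + D, "[123456789]" + D + D,
             "[123456789]" + D + D + D]

print(accepted(split_range) == set(range(2, 9999)))
print(accepted(inclusive) == set(range(1, 10000)))
print(sorted(set(range(1000, 1100)) - accepted(last_digit_limited)))
```

The last line lists the numbers from 1000 to 1099 that the last-digit-limited variant misses, which are exactly those ending in 9.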

Related

How to insert comma before last but one and last word in all lines of a file?

I want a batch script that inserts a comma before the second-to-last word and another between the second-to-last word and the last word in every line of a file. The commas should replace the spaces between the words.
E.g. test file:
This is first line
This is the second line
Check Subsystem version 3.1.8-11P
I want the output to look like:
This is,first,line
This is the,second,line
Check Subsystem,version,3.1.8-11P
My script inserts a comma before the last word, but one line contains the & symbol, and the script deletes that line.
E.g. Create & Delete Version 1.1.1
This line is getting deleted.
@echo off
for /f usebackq^ delims^=^ eol^= %%A in ("Doc.txt") do (
set "s=%%A"
set "s1=%%A"
setlocal enableDelayedExpansion
set "s=!s:#=#a!"
set "s=!s:\=#b!"
set "s=!s:/=#f!"
set "s=!s:.=#d!"
set "s=!s: =.!"
for /f "delims=" %%A in (".!s!") do (
endlocal
set "s=%%~xA"
setlocal enableDelayedExpansion
if defined s (
set "s=!s:.= !"
set "s=!s:#d=.!"
set "s=!s:#f=/!"
set "s=!s:#b=\!"
set "s=!s:#a=#!"
set "s=!s:~1!"
call set s2=%%s1:!s!=%%
echo !s2!,!s! >> output.txt
)
endlocal
)
)
pause
I'm also not sure how to insert a comma before the second-to-last word.

This can be done easily with any text editor supporting regular expressions by searching for \W+(\w+)\W+(\w+)$ and using ,$1,$2 as the replace string.
The question How can you find and replace text in a file using the Windows command-line environment? contains, in one answer, the link to the latest version of JREPL.BAT written by Dave Benham, which makes it possible to apply this modification to the file with the same regular expression search and replace strings.
To use JREPL.BAT from within a batch file, with JREPL.BAT stored in the same directory as the executed batch file:
call "%~dp0jrepl.bat" "\W+(\w+)\W+(\w+)$" ",$1,$2" /F "%~1" /O -
In this generic command line, "%~1" references the name of the file to modify, passed as the first argument to the batch file.
Run jrepl.bat /? for help on the other options used here.
Run call /? for help on this command and explanation of %~dp0 (drive and path of current batch file ending with a backslash).
Explanation of the regular expression replace:
\W+ ... find 1 or more non-word characters. JREPL uses the JScript regular expression engine, in which a word character is one of A-Z, a-z, 0-9 and _.
(\w+) ... find 1 or more word characters and mark the found string for back-referencing in the replace string with $1.
\W+ ... find again 1 or more non-word characters.
(\w+) ... find again 1 or more word characters and mark the found string for back-referencing in the replace string with $2.
$ ... the search expression results in a positive match only if the two words are found at the end of a line; the newline characters themselves are not matched.
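A quick way to preview what this replacement does is to run the same expression with Python's re module, a minimal sketch assuming that re.ASCII reproduces JScript's ASCII-only \w and \W, and that ,$1,$2 corresponds to Python's ,\1,\2. The helper name add_commas is hypothetical; this is a preview of the regex, not a substitute for JREPL.BAT.

```python
import re

# JScript's \w is [A-Za-z0-9_]; re.ASCII gives Python's \w/\W the same meaning.
FIRST = re.compile(r"\W+(\w+)\W+(\w+)$", re.ASCII)


def add_commas(line, pattern=FIRST):
    # JREPL's replace string ",$1,$2" is written r",\1,\2" in Python.
    return pattern.sub(r",\1,\2", line)


for line in ["This is first line",
             "This is the second line",
             "Check Subsystem version 3.1.8-11P"]:
    print(add_commas(line))
```

The first two lines come out as wanted, but the version line becomes "Check Subsystem version 3.1,8,11P", because the expression stops at the dot and the hyphen.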
For a line like
Check Subsystem version 3.1.8-11P
which should be modified to
Check Subsystem,version,3.1.8-11P
in the file, a different regular expression search string is necessary because . and - are non-word characters.
The search string must be modified to: [^\w.\-]+([\w.\-]+)[^\w.\-]+([\w.\-]+)$
[...] is positive character class definition matching any character specified within the square brackets.
[^...] is a negative character class definition matching any character not specified within the square brackets.
\w is a special character class definition for all word characters (A-Z, a-z, 0-9 and _ in JScript regular expressions).
. inside the square brackets of a positive or negative character class definition is interpreted as a literal character. Outside a character class definition, the dot has a special meaning in regular expression search strings: it matches any character except a newline.
- inside the square brackets of a positive or negative character class definition means all characters from character X left of - to character Y right of -, according to the code values of X and Y. The hyphen is also interpreted as a literal character if it is the first or last character inside the brackets. However, it is advisable to escape - with a backslash inside a character class definition whenever it should be interpreted as a literal character, regardless of its position within the brackets. Outside a character class definition, the hyphen has no special meaning.
So the entire command line for a file containing strings with dots and hyphens that should be treated as part of a "word" is:
call "%~dp0jrepl.bat" "[^\w.\-]+([\w.\-]+)[^\w.\-]+([\w.\-]+)$" ",$1,$2" /F "%~1" /O -
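The same preview approach, again assuming re.ASCII mirrors JScript's \w, shows the effect of the extended character classes on all sample lines from the question, including the line with & that the original script deleted. The name SECOND is illustrative only.

```python
import re

# Dot and hyphen are treated as part of a "word"; re.ASCII mirrors JScript's \w.
SECOND = re.compile(r"[^\w.\-]+([\w.\-]+)[^\w.\-]+([\w.\-]+)$", re.ASCII)

lines = ["This is first line",
         "This is the second line",
         "Check Subsystem version 3.1.8-11P",
         "Create & Delete Version 1.1.1"]

for line in lines:
    print(SECOND.sub(r",\1,\2", line))
```

Because the regex only rewrites the end of each line, the & is left alone and the line is kept.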

How can I make a batch file that will tell me which lines of a text file are NOT in another file?

What I'm trying to do is take a text file with a bunch of strings to search for, each on its own line, and search for each one of these strings in a file (check.txt). I want the output to be a text file with a list of all the strings that COULDN'T be found.
I've tried a few things so far.
for /F "tokens=*" %%A in search.txt do (
@echo on
FINDSTR %%A check.txt
IF ERRORLEVEL 1 echo %%A FAIL > fail_match.txt
)
Another attempt I made (this one was just to tell me if the whole list was good or not) was
@echo on
FINDSTR /g:search.txt check.txt > a_match.txt
IF ERRORLEVEL 1 echo bad > a_match.txt
I realize that these are incredibly basic, and I'm sure there's some easy answer that I just don't understand. I'm not a programmer; I just want to make my job a lot easier (and faster).
To clarify, my list of things to search for is in search.txt, and my list of things to check them against is in check.txt. Check.txt is a JSON file, so it's all one enormous line. I don't know whether that makes a difference. I want a list of all lines in search.txt that are not in check.txt.

Your search scheme seems naive on two fronts:
1) JSON is not guaranteed to be a single line. A valid JSON file may contain any amount of whitespace, including newlines. This could cause problems if a search string logically matches across multiple lines.
2) What about substring matches? Suppose one search string is bat, and your JSON contains bath. I doubt you would want to consider that a match.
It is possible that neither of the above concerns is a problem in your case. Assuming they aren't, there may be a fairly simple solution using FINDSTR.
You were close on your first try, except
A) - Your FOR /F IN() clause is missing parentheses
B) - You want to force each search string to be interpreted as a string literal, possibly with spaces. That requires the /C: option.
C) - You assume leading spaces are not significant in your search string ("tokens=*" strips leading spaces)
D) - You assume no search lines begin with a semicolon. (The default EOL character is a semicolon, and FOR /F skips all lines that begin with the EOL character.)
E) - Quotes and backslashes must be escaped within a search string: \" -> \\\\\", \ -> \\, " -> \". See What are the undocumented features and limitations of the Windows FINDSTR command? for more information.
Points C) and D) may be fixed by disabling EOL and DELIMS using the following odd syntax:
for /f delims^=^ eol^= %%A in ...
Point E) can be addressed by defining a variable and adding escape sequences via search and replace. This requires delayed expansion, but delayed expansion will corrupt FOR /F variables upon expansion if they contain !. So delayed expansion must be strategically toggled on and off within the loop.
Instead of using IF ERRORLEVEL n, you can use the conditional command operator || to take action if the previous command failed.
You don't need to see the output of the FINDSTR command, so that can be redirected to NUL.
You can improve performance by redirecting just once, outside the loop.
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
for /f delims^=^ eol^= %%A in (search.txt) do (
set "search=%%A"
setlocal enableDelayedExpansion
set "search2=!search:\"=\\"!"
set "search2=!search2:\=\\!"
set "search2=!search2:"=\"!"
findstr /c:"!search2!" check.txt >nul || echo !search!
endlocal
)
)
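The three substitutions in the loop are order-dependent, so it can help to check their combined effect on a few sample strings. The sketch below repeats them with Python's str.replace, in the same order as the batch code; the function name escape_for_findstr_c is hypothetical, and it models only the string manipulation, not findstr's parsing.

```python
def escape_for_findstr_c(s):
    """Apply the same three replacements as the batch loop, in the same order."""
    s = s.replace('\\"', '\\\\"')  # \"  -> \\"  (must run before the next step)
    s = s.replace('\\', '\\\\')    # \   -> \\   (doubles every backslash)
    s = s.replace('"', '\\"')      # "   -> \"
    return s


for sample in ['a\\"b', 'C:\\dir', 'say "hi"']:
    print(sample, "->", escape_for_findstr_c(sample))
```

The net result matches point E): \" becomes \\\\\", a lone \ becomes \\, and a lone " becomes \".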
If none of your search strings begin with ;, and no search string contains " or \, then the solution can be as simple as:
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
for /f "delims=" %%A in (search.txt) do findstr /c:"%%A" check.txt >nul || echo %%A
)
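For comparison, here is the underlying logic as a small Python sketch. It treats check.txt as one string, assuming it is a single line as described, and approximates findstr /c: with a case-sensitive substring test. The optional whole_word rule is a hypothetical extension addressing concern 2) (bat versus bath); it is not something FINDSTR /C: does, and the sample JSON is made up.

```python
import re


def not_found(search_lines, check_text, whole_word=False):
    """Return the search strings that do not occur in check_text."""
    missing = []
    for s in search_lines:
        if whole_word:
            # Hypothetical stricter rule: no word character directly before/after.
            pattern = r"(?<!\w)" + re.escape(s) + r"(?!\w)"
            found = re.search(pattern, check_text) is not None
        else:
            # Case-sensitive literal substring test, like findstr /c:"..."
            found = s in check_text
        if not found:
            missing.append(s)
    return missing


check = '{"tools": ["bath", "hammer"]}'  # made-up single-line JSON
print(not_found(["bat", "hammer", "saw"], check))
print(not_found(["bat", "hammer", "saw"], check, whole_word=True))
```

With the plain substring test, bat counts as found because of bath; the whole-word variant reports it as missing.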

If I read your question right (output all lines of check.txt that are not in search.txt), this single line should do:
findstr /v /x /g:search.txt check.txt > nomatch.txt

How to use OR operator with command FINDSTR from a Windows command prompt?

Findstr is supposed to support regular expressions, and for my use case I need an OR to check whether a file name ends in .exe OR .dll. However, I cannot get the OR operation to work. When I use |, Windows thinks I am trying to pipe the output of the previous command, and OR is read as a literal OR.

findstr.exe in the Windows system32 directory supports only a very limited set of regular expression characters. Running findstr /? in a command prompt window displays the help for this console application, including the supported regular expression characters and their meanings.
But as Eryk Sun explained in his comment above, multiple search strings can be specified on the command line to build a simple OR expression.
Suppose a list file FileNames.lst contains, for example:
C:\Program Files\Internet Explorer\ieproxy.dll
C:\Program Files\Internet Explorer\iexplore.exe
C:\Program Files\Internet Explorer\iexplore.exe.mui
If only the file names ending with .dll OR .exe (case-insensitive) should be output by findstr, the command line could be:
%SystemRoot%\system32\findstr.exe /I /R "\.exe$ \.dll$" FileNames.lst
The output is for the example lines in FileNames.lst:
C:\Program Files\Internet Explorer\ieproxy.dll
C:\Program Files\Internet Explorer\iexplore.exe
The space in the search string is interpreted by findstr as a separator between two search strings. Therefore findstr searches with the regular expressions \.exe$ and \.dll$ and returns all lines where either of the two expressions matches.
Another method to OR two or more regular expressions is to use the parameter /C:"..." multiple times on the command line. This is necessary when a regular expression search string contains one or more spaces that should be matched as literal characters.
%SystemRoot%\system32\findstr.exe /I /R /C:"\.dll$" /C:"\.exe$" FileNames.lst
The result is the same as above with the other command line.
But for this specific task a regular expression search is not necessary at all, as findstr also offers the parameter /E for returning only lines where the searched strings are found at the end of a line.
%SystemRoot%\system32\findstr.exe /E /I /C:.exe /C:.dll FileNames.lst
A brief description of the differences between using "..." and /C:"...":
"regexp1 regexp2 regexp3" means searching for a line containing a string matched by one of the three space-separated regular expressions. The option /R can be used additionally to explicitly interpret the three space-separated strings as regular expressions. It is advisable to do so to make it 100% clear to findstr and every reader that the search strings are regular expressions.
/L "word1 word2 word3" means searching for a line containing one of the three space-separated, literally interpreted strings. The option /L explicitly forces the three strings to be interpreted as literal strings and not as regular expressions.
/C:"word 1" /C:"word 2" /C:"word 3" means searching for a line containing one of the three literally interpreted strings, in which the space character is interpreted as a space. The option /L can be used additionally to explicitly interpret the three search strings as literal strings. It is advisable to do so to make it 100% clear to findstr and every reader that the search strings are literal strings.
/R /C:"reg exp 1" /C:"reg exp 2" /C:"reg exp 3" means searching for a line containing a string matched by one of the three regular expressions, in which the space character is interpreted as a space. The option /R explicitly forces the three strings to be interpreted as regular expressions.
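The OR behaviour described above can be modelled in a few lines of Python: a line is output if any of the search strings matches it, and the only difference between "a b" and /C:"a b" is whether the argument is split at spaces. This is a rough sketch assuming simple patterns whose meaning is the same in Python's re module and findstr; findstr_like and its parameters are hypothetical names, not findstr options.

```python
import re


def findstr_like(lines, search_strings, regex=True, ignore_case=False, at_end=False):
    """Rough model of findstr's OR logic: keep a line if ANY search string matches."""
    flags = re.IGNORECASE if ignore_case else 0
    out = []
    for line in lines:
        for s in search_strings:
            pat = s if regex else re.escape(s)
            if at_end:
                pat += "$"  # similar to /E: match only at the end of the line
            if re.search(pat, line, flags):
                out.append(line)
                break
    return out


files = [r"C:\Program Files\Internet Explorer\ieproxy.dll",
         r"C:\Program Files\Internet Explorer\iexplore.exe",
         r"C:\Program Files\Internet Explorer\iexplore.exe.mui"]

# /I /R "\.exe$ \.dll$": one argument, split at the space into two regexes
print(findstr_like(files, r"\.exe$ \.dll$".split(" "), ignore_case=True))
# /E /I /C:.exe /C:.dll: two literal strings, matched at the line end
print(findstr_like(files, [".exe", ".dll"], regex=False, ignore_case=True, at_end=True))

# /L "word 1" (split at the space) versus /C:"word 1" (one string with a space)
print(findstr_like(["word 1", "1 word"], "word 1".split(" "), regex=False))
print(findstr_like(["word 1", "1 word"], ["word 1"], regex=False))
```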

Windows Findstr

I'm trying to find files in a folder with a specific pattern like:
abcd201 abcd001 abcd004
The folder contains files named
abcd(3 numbers)
I'm trying to use the pattern:
abcd[0,2][0][1,4] but it is currently not working.
DIR /b C:\Folder\abcd"[0,2][0][1,4]".txt
Thanks!

The dir command does not support regular expressions. You need to filter its output with findstr:
dir /b "c:\folder\abcd*.txt" | findstr /r /c:"^abcd[02]0[14]\.txt$"
That is, use the dir command to obtain a first approximation of what you are searching for, and then filter the list (pipe the dir output to findstr) to obtain only the required files.
The regular expression (/r) in findstr means: keep the lines that match, from the start of the line (initial ^), abcd, followed by any character in the set [02], followed by a 0, followed by any character in the set [14], followed by a dot (a single unescaped dot means any character, so it needs to be escaped as \.), followed by the string txt and the end of the line ($).
You may need to add the /i switch to findstr to make it ignore case when matching.
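The same pattern can be tried out in Python against a list of sample names, with re.IGNORECASE playing the role of /i. This is a sketch assuming the simple classes used here behave the same in Python and findstr; the file names are made up. It also shows that the question's class [0,2] contains a comma as well.

```python
import re

# Same pattern as the findstr /r filter; re.IGNORECASE plays the role of /i.
PATTERN = re.compile(r"^abcd[02]0[14]\.txt$", re.IGNORECASE)

names = ["abcd001.txt", "abcd004.txt", "abcd201.txt", "abcd204.txt",
         "abcd101.txt", "abcd,01.txt", "ABCD201.TXT", "abcd201xtxt"]
print([n for n in names if PATTERN.match(n)])

# The question's class [0,2] also accepts a comma:
print(bool(re.fullmatch(r"abcd[0,2][0][1,4]\.txt", "abcd,01.txt")))
```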

The regex of your example would also match the name abcd204. You may find these 4 files in a simpler way:
for %a in (0 2) do for %c in (1 4) do dir /B C:\Folder\abcd%a0%c.txt 2>NUL
This method is faster than the findstr one, especially if the number of files is large.

Batch: create fileC.txt from the result of (fileA.txt minus fileB.txt)

I'm trying to create a batch file that creates fileC.txt containing all lines in fileA.txt except those that contain any of the strings in the lines of fileB.txt:
Pseudo:
foreach(line L in fileA.txt)
    excluded = false
    foreach(string str in fileB.txt)
        if L contains str
            excluded = true
    if !excluded
        add L to fileC.txt
For example
fileA.txt: (all)
this\here\is\a\line.wav
and\this\is\another.wav
i\am\a\chocolate.wav
peanut\butter\jelly\time.wav
fileB.txt: (those to be excluded)
another.wav
time.wav
fileC.txt: (wanted result)
this\here\is\a\line.wav
i\am\a\chocolate.wav
I've been fiddling around with FINDSTR, but I just can't seem to puzzle it together. Any help or pointers greatly appreciated!
Cheers!
/ Fredde

The answer should be this simple:
findstr /lvg:"fileB.txt" "fileA.txt" >fileC.txt
And with your example, the above does give the correct results.
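Because of the FINDSTR bug described in the next paragraph, it can be useful to have an independent reference result to compare against. Here is a minimal Python version of the question's pseudocode (case-sensitive, literal containment). The name exclude_lines is hypothetical, and skipping empty exclude strings is an assumption that matches the "if defined ln" check in the SET /P solution.

```python
def exclude_lines(a_lines, b_strings):
    """Keep each line of fileA that contains none of the strings from fileB
    (case-sensitive, literal), following the question's pseudocode."""
    kept = []
    for line in a_lines:
        # Empty exclude strings are skipped (assumption, see lead-in).
        excluded = any(s and s in line for s in b_strings)
        if not excluded:
            kept.append(line)
    return kept


file_a = [r"this\here\is\a\line.wav", r"and\this\is\another.wav",
          r"i\am\a\chocolate.wav", r"peanut\butter\jelly\time.wav"]
file_b = ["another.wav", "time.wav"]
print(exclude_lines(file_a, file_b))
```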
But there is a nasty FINDSTR bug that makes it unreliable when using multiple case sensitive literal search strings. See Why doesn't this FINDSTR example with multiple literal search strings find a match?, as well as the answer that goes with it. For a "complete" list of undocumented FINDSTR features and bugs, see What are the undocumented features and limitations of the Windows FINDSTR command?.
So the simple code above can fail depending on the content of the files. If you can get away with using a case insensitive search, then the solution is simple.
findstr /livg:"fileB.txt" "fileA.txt" >fileC.txt
Edit: Both versions above will fail if fileB.txt contains \\ or \". In order to work properly, those strings must be escaped as \\\ and \\"
But if you must use a case sensitive search, then there is no simple solution. Your best bet for a pure batch solution might be to use the /R regular expression option. But then you will have to create a modified version of fileB.txt where all regex meta-characters are escaped so that the strings give the correct literal search. That is a mini project in and of itself.
Perhaps your best option for a case sensitive solution is to get a 3rd party tool like grep or sed for Windows.
Edit: Here is a reasonably fast pure batch solution that is nearly bulletproof.
I looked into doing something like the proposed logic in your question. But using batch to read all lines in a file is relatively slow. This solution only reads the exclude file line by line. It uses FINDSTR to read the lines in "fileA.txt" repeatedly, once per search string. This is a much faster algorithm for a batch file.
The traditional method to read a file is to use a FOR /F loop, but there is another technique using SET /P that is faster, and it is safe to use with delayed expansion. The only limitations to this method are:
It strips trailing control characters from the line
It is limited to 1021 bytes per line
Each line must be terminated by <CR><LF> as is the Windows standard. It will not work with unix style lines terminated by <LF>
The search strings must have each \ and " escaped as \\ and \" when they are used with the /C option.
@echo off
setlocal enableDelayedExpansion
copy fileA.txt fileC.txt >nul
for /f %%N in ('find /c /v "" ^<fileB.txt') do set len=%%N
<fileB.txt (
for /l %%N in (1 1 !len!) do (
set "ln="
set /p "ln="
if defined ln (
set "ln=!ln:\=\\!"
set ln=!ln:"=\"!
move /y fileC.txt temp.txt >nul
findstr /lv /c:"!ln!" temp.txt >fileC.txt
)
)
)
del temp.txt
type fileC.txt
