I'm doing a little string validation with findstr and its /r flag to allow for regular expressions. In particular I'd like to validate integers.
The regex
^[0-9][0-9]*$
worked fine for non-negative numbers but since I now support negative numbers as well I tried
^([1-9][0-9]*|0|-[1-9][0-9]*)$
for either positive or negative integers or zero.
The regex works fine theoretically. I tested it in PowerShell and it matches what I want. However, with
findstr /r /c:"^([1-9][0-9]*|0|-[1-9][0-9]*)$"
it doesn't.
While I know that findstr doesn't have the most advanced regex support (even below Notepad++ which is probably quite an achievement), I would have expected such simple expressions to work.
Any ideas what I'm doing wrong here?
This works for me:
findstr /r "^[1-9][0-9]*$ ^-[1-9][0-9]*$ ^0$"
If you don't use the /c option, the <Strings> argument is treated as a space-separated list of search strings, which makes the space a sort of crude replacement for the | construct. (As long as your regexes don't contain spaces, that is.)
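To check the logic outside cmd, the same "any one of several anchored patterns" idea can be modeled in Python. This is only a sketch: `is_integer` is a made-up helper name, and Python's `re` module stands in for findstr's regex engine, which it does not replicate exactly.

```python
import re

# One pattern per space-separated findstr search string.
PATTERNS = [r"^[1-9][0-9]*$", r"^-[1-9][0-9]*$", r"^0$"]

def is_integer(text):
    """Return True if any pattern matches, mirroring the OR over search strings."""
    return any(re.search(p, text) for p in PATTERNS)

for sample in ["0", "42", "-7", "007", "-0", "4x"]:
    print(sample, is_integer(sample))
```

Each pattern is anchored on its own, which is why the space-separated list behaves like the alternation in the original expression.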
Argh, I should have read the documentation better. findstr apparently doesn't support alternations (|).
So I'm probably back to multiple invocations or replacing the whole thing with a custom parser eventually.
This is what I do for now:
set ERROR=1
rem Test for zero
echo %1|findstr /r /c:"^0$">nul 2>&1
if not errorlevel 1 set ERROR=
rem Test for positive numbers
echo %1|findstr /r /c:"^[1-9][0-9]*$">nul 2>&1
if not errorlevel 1 set ERROR=
rem Test for negative numbers
echo %1|findstr /r /c:"^-[1-9][0-9]*$">nul 2>&1
if not errorlevel 1 set ERROR=
Or, if you can, download grep for Windows. It has many more features than findstr provides.
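A simpler regex is possible if you can accept leading zeros and -0: just add an optional minus to the start of your original expression. Note that findstr does not support the ? quantifier, so this form only helps with other regex engines, such as the one in PowerShell: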
^-?[0-9][0-9]*$
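A quick way to see how this pattern differs from the one in the question is to run both over a few inputs. The sketch below uses Python's `re` module purely as a stand-in engine that understands both `|` and `?`; `compare` is a hypothetical helper.

```python
import re

STRICT = re.compile(r"^([1-9][0-9]*|0|-[1-9][0-9]*)$")  # pattern from the question
LOOSE = re.compile(r"^-?[0-9][0-9]*$")                   # optional-minus pattern

def compare(text):
    """Return (strict_match, loose_match) for one input string."""
    return bool(STRICT.search(text)), bool(LOOSE.search(text))

for sample in ["12", "-12", "0", "-0", "007"]:
    print(sample, compare(sample))
```

Inputs such as -0 and 007 are where the two patterns disagree, so whether the shorter form is acceptable depends on whether those should count as valid integers.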
Support for regex in findstr is quite limited. I suggest using Notepad++. The find in files option supports Perl Compatible Regular Expressions; results showing filename, line number and matching text can be easily copied to a text file.
Related
What I'm trying to do is take a text file with a bunch of strings to search for, each on its own line, and search for each one of these strings in a file (check.txt). I want the output to be a text file with a list of all the strings that COULDN'T be found.
I've tried a few things so far.
for /F "tokens=*" %%A in search.txt do (
@echo on
FINDSTR %%A check.txt
IF ERRORLEVEL 1 echo %%A FAIL > fail_match.txt
)
Another attempt I made (this one was just to tell me if the whole list was good or not) was
@echo on
FINDSTR /g:search.txt check.txt > a_match.txt
IF ERRORLEVEL 1 echo bad > a_match.txt
I realize that these are incredibly basic, and I'm sure there's some easy answer that I just don't understand. I'm not a programmer; I just want to make my job a lot easier (and faster).
To clarify, my list of things to search for is in search.txt, my list of things to check them against is check.txt. Check.txt is a json file, so it's all one enormous line. I don't know if that will make a difference or not. I want a list of all lines in search.txt that are not in check.txt.
Your search scheme seems naive on two fronts:
1) JSON is not guaranteed to be a single line. Valid JSON may have any amount of whitespace, including newlines. This could cause problems if your search string logically matches across multiple lines.
2) What about substring matches? Suppose one search string is bat, and your JSON contains bath. I doubt you would want to consider that a match.
It is possible that neither of the above concerns is a problem in your case. Assuming they aren't, there may be a fairly simple solution using FINDSTR.
You were close on your first try, except
A) - Your FOR /F IN() clause is missing parentheses
B) - You want to force each search string to be interpreted as a string literal, possibly with spaces. That requires the /C: option.
C) - You assume leading spaces are not significant in your search string ("tokens=*" strips leading spaces)
D) - You assume no search lines begin with semicolon. (The default EOL character is semicolon, and FOR /F skips all lines that begin with the EOL character)
E) - Quotes and backslashes must be escaped within a search string: \" -> \\\\\", \ -> \\, " -> \". See What are the undocumented features and limitations of the Windows FINDSTR command? for more information.
Points C) and D) may be fixed by disabling EOL and DELIMS using the following odd syntax:
for /f delims^=^ eol^= %%A in ...
Point E) can be addressed by defining a variable and adding escape sequences via search and replace. That requires delayed expansion, but delayed expansion corrupts FOR /F variables upon expansion if they contain !. So delayed expansion must be strategically toggled on and off within the loop.
Instead of using IF ERRORLEVEL n, you can use the conditional command concatenation operator || to take action if the previous command failed.
You don't need to see the output of the FINDSTR command, so that can be redirected to NUL.
You can improve performance by redirecting just once, outside the loop.
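@echo off
setlocal disableDelayedExpansion
rem Redirect once for the whole loop instead of once per iteration
>fail_match.txt (
  for /f delims^=^ eol^= %%A in (search.txt) do (
    rem Save the line while delayed expansion is still off
    set "search=%%A"
    setlocal enableDelayedExpansion
    rem Escape backslashes and quotes for the FINDSTR /C: option
    set "search2=!search:\"=\\"!"
    set "search2=!search2:\=\\!"
    set "search2=!search2:"=\"!"
    rem Print the original string only when FINDSTR finds no match
    findstr /c:"!search2!" check.txt >nul || echo !search!
    endlocal
  )
)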
If none of your search strings begin with ;, and no search string contains " or \, then the solution can be as simple as:
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
  for /f "delims=" %%A in (search.txt) do findstr /c:"%%A" check.txt >nul || echo %%A
)
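For comparison, the same "list the search strings that are not found" logic can be sketched in Python. This is not a replacement for the batch file, only a way to sanity-check the expected output; `missing_strings` and the sample data are hypothetical stand-ins for search.txt and check.txt.

```python
def missing_strings(search_lines, haystack):
    """Return the search strings that do not occur literally in haystack.

    Empty lines are skipped, as FOR /F would skip them.
    """
    return [s for s in search_lines if s and s not in haystack]

# Hypothetical sample data standing in for search.txt and check.txt.
search_lines = ["bat", "cat", "dog"]
check_text = '{"animals": ["bath", "dog"]}'
print(missing_strings(search_lines, check_text))
```

Like the FINDSTR version, this is a plain substring test, so "bat" counts as found because the text contains "bath".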
If I read your question right (output all lines of check.txt that are not in search.txt), this single line should do:
findstr /v /x /g:search.txt check.txt > nomatch.txt
I'm trying to find files in a folder with specific pattern like:
abcd201 abcd001 abcd004
The folder contains files named
abcd(3 numbers)
I'm trying to use the pattern:
abcd[0,2][0][1,4] but currently not working.
DIR /b C:\Folder\abcd"[0,2][0][1,4]".txt
Thanks!
dir command does not support regular expressions. You need to filter the output with findstr
dir /b "c:\folder\abcd*.txt" | findstr /r /c:"^abcd[02]0[14]\.txt$"
That is, use dir command to obtain a first approximation of what you are searching and then filter the list (pipe the dir command to findstr) to obtain only the list of required files.
The regular expression (/r) in findstr means: filter the lines, starting at the start of the line (initial ^), followed by abcd, followed by any character in the set [02], followed by a 0, followed by any character in the set [14], followed by a dot (a single dot means any character, so, it needs to be escaped \.), followed by the string txt and the end of the line ($).
Maybe you will need to add a /i switch to findstr to indicate it must ignore case when matching.
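The pattern itself can be tried out in isolation. Below is a small Python sketch that applies the same expression to a list of made-up file names; `filter_names` is a hypothetical helper and Python's `re` module is only standing in for findstr here.

```python
import re

# abcd, then 0 or 2, then 0, then 1 or 4, then a literal ".txt" at the end.
PATTERN = re.compile(r"^abcd[02]0[14]\.txt$")

def filter_names(names):
    """Keep only the names that match the whole pattern."""
    return [n for n in names if PATTERN.search(n)]

names = ["abcd201.txt", "abcd001.txt", "abcd004.txt",
         "abcd204.txt", "abcd301.txt", "abcd001xtxt"]
print(filter_names(names))
```

The last sample name shows why the dot is escaped: without the backslash, "abcd001xtxt" would match as well.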
The regex in your example would also match the name abcd204. You may find these 4 files in a simpler way:
for %a in (0 2) do for %c in (1 4) do dir /B C:\Folder\abcd%a0%c.txt 2>NUL
This method is faster than the findstr approach, especially if the number of files is large.
I want to search for a string, e.g. "main", recursively in my project on Windows. I searched for that and found a solution in Windows recursive grep command-line
I tried it in two different ways, and the result is not what I expected.
My first approach:
findstr /S "main" *.cpp
but when I choose
findstr /S "int main" *.cpp
I don't get only my main function.
What is the difference between these two approaches? Is it wrong to provide strings with spaces?
This is because findstr takes a set of strings to search for. To actually match the string int main you have to use the /C option:
findstr /s /C:"int main" *.cpp
whereas your variant gives you every line with either int or main.
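The difference between the two calls can be modeled in a few lines of Python. This is a rough sketch of the matching rule only, with hypothetical helper names and sample lines; it ignores regex handling and the other findstr options.

```python
def matches_any_word(line, search):
    """Model of the plain form: the line matches if it contains any one word."""
    return any(word in line for word in search.split())

def matches_phrase(line, phrase):
    """Model of the /C: form: the whole phrase must appear in the line."""
    return phrase in line

lines = ["int main(int argc, char **argv)", "int count = 0;", "// see main loop"]
for line in lines:
    print(matches_any_word(line, "int main"), matches_phrase(line, "int main"), line)
```

All three sample lines pass the "any word" test, but only the first one contains the full phrase.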
Kind of late to the party here, but if you use
findstr /S "int.main" *.cpp
it will treat the dot as a wildcard, which matches a space, and as long as you don't mind some superfluous matches (which are unlikely in this case), it will work fine for you.
I use that, having not known about the /C: option before reading the answers above.
I have a lot of variables to place in this certain command, is there a way to add many variables in it without rewriting the command?
dir c:\ /s /b /a | find "my file"
For example, I want to search for "my file" and 50 other things.
thanks for the answers
In short: no. You have to rewrite the command, unless the specific command actually does accept multiple parameters (which may be variables). In your case it doesn't so you need to rewrite it.
One option would be to use findstr instead of find. You can pass multiple search patterns:
dir c:\ /s /b /a | findstr /c:"my file" /c:"other" /c:"other 2" ...
I don't know how well that scales to about "50 other things" though, but then no such solution may. Maybe you can condense the filenames to a few using regular expressions (check findstr /?).
You could also simply do:
for /R c:\ %i in ("my file" "other" "other2") do @echo %i
Both solutions can produce false positives, however. They essentially search with "contains" semantics. So, both "C:\foo\my file" and "C:\foo\bar\my file\something.txt" would match. But then your original solution had that issue as well.
If you can make your search patterns unique or can live with false positives, then that shouldn't be an issue. But be aware of it nevertheless.
Extending Christian's suggestion to use FINDSTR instead of FIND - You can simply put all 50 search terms in a text file and reference them using the /G:"filename" option.
But there is one important caveat - There is a nasty FINDSTR bug when searching for multiple literal strings. See Why doesn't this FINDSTR example with multiple literal search strings find a match?.
As explained in the link, the work-around is to either do a case insensitive search using the /I option, or else use regular expression search terms with the /R option.
For a "complete" listing of undocumented FINDSTR features and bugs, see What are the undocumented features and limitations of the Windows FINDSTR command?.
You could wrap it in a for loop:
for %i in ("my file" "second file") do dir c:\ /s /b /a | find %i
I'm trying to create a batch file that creates a fileC.txt containing all lines in fileA.txt except for those that contain any of the strings listed on the lines of fileB.txt:
Pseudo:
foreach (line L in fileA.txt)
    excluded = false
    foreach (string str in fileB.txt)
        if L contains str
            excluded = true
    if !excluded
        add L to fileC.txt
For example
fileA.txt: (all)
this\here\is\a\line.wav
and\this\is\another.wav
i\am\a\chocolate.wav
peanut\butter\jelly\time.wav
fileB.txt: (those to be excluded)
another.wav
time.wav
fileC.txt: (wanted result)
this\here\is\a\line.wav
i\am\a\chocolate.wav
I've been fiddling around with FINDSTR but I just can't seem to puzzle it together.. any help or pointers greatly appreciated!
Cheers!
/ Fredde
The answer should be this simple:
findstr /lvg:"fileB.txt" "fileA.txt" >fileC.txt
And with your example, the above does give the correct results.
But there is a nasty FINDSTR bug that makes it unreliable when using multiple case sensitive literal search strings. See Why doesn't this FINDSTR example with multiple literal search strings find a match?, as well as the answer that goes with it. For a "complete" list of undocumented FINDSTR features and bugs, see What are the undocumented features and limitations of the Windows FINDSTR command?.
So the simple code above can fail depending on the content of the files. If you can get away with using a case insensitive search, then the solution is simple.
findstr /livg:"fileB.txt" "fileA.txt" >fileC.txt
Edit: Both versions above will fail if fileB.txt contains \\ or \". In order to work properly, those strings must be escaped as \\\ and \\"
But if you must use a case sensitive search, then there is no simple solution. Your best bet for a pure batch solution might be to use the /R regular expression option. But then you will have to create a modified version of fileB.txt where all regex meta-characters are escaped so that the strings give the correct literal search. That is a mini project in and of itself.
Perhaps your best option for a case sensitive solution is to get a 3rd party tool like grep or sed for Windows.
Edit: Here is a reasonably performing pure batch solution that is nearly bullet proof
I looked into doing something like the proposed logic in your question. But using batch to read all lines in a file is relatively slow. This solution only reads the exclude file line by line. It uses FINDSTR to read the lines in "fileA.txt" repeatedly, once per search string. This is a much faster algorithm for a batch file.
The traditional method to read a file is to use a FOR /F loop, but there is another technique using SET /P that is faster, and it is safe to use with delayed expansion. The only limitations to this method are:
- It strips trailing control characters from the line
- It is limited to 1021 bytes per line
- Each line must be terminated by <CR><LF> as is the Windows standard. It will not work with unix style lines terminated by <LF>
The search strings must have each \ and " escaped as \\ and \" when they are used with the /C option.
@echo off
setlocal enableDelayedExpansion
copy fileA.txt fileC.txt >nul
rem Count the lines in fileB.txt
for /f %%N in ('find /c /v "" ^<fileB.txt') do set len=%%N
<fileB.txt (
  for /l %%N in (1 1 !len!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      rem Escape backslashes and quotes for the /C option
      set "ln=!ln:\=\\!"
      set ln=!ln:"=\"!
      rem Filter the current result through the next exclusion string
      move /y fileC.txt temp.txt >nul
      findstr /lv /c:"!ln!" temp.txt >fileC.txt
    )
  )
)
del temp.txt
type fileC.txt
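To double-check what the batch file should produce, the intended filter can be written as a short Python sketch and run on the example data from the question. `exclude_lines` is a hypothetical helper; the comparison is a case-sensitive literal substring test, which is what the /L and /C options are meant to give.

```python
def exclude_lines(all_lines, excluded_strings):
    """Keep each line that contains none of the excluded strings (case sensitive)."""
    return [line for line in all_lines
            if not any(s in line for s in excluded_strings if s)]

# Example data from the question (fileA.txt and fileB.txt).
file_a = [
    r"this\here\is\a\line.wav",
    r"and\this\is\another.wav",
    r"i\am\a\chocolate.wav",
    r"peanut\butter\jelly\time.wav",
]
file_b = ["another.wav", "time.wav"]

for line in exclude_lines(file_a, file_b):
    print(line)
```

Unlike the batch version, this makes a single pass over the lines of fileA.txt, which is cheap in Python but, as noted above, slow in batch.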