I am trying to find the last line in a text file using the regex ^.*\z. It works fine in Notepad++, but when I try it in cmd using findstr /R "^.*^Z" file.txt it does not work.
Open a command prompt window and run findstr /?. The help output explains what FINDSTR supports. Its regular expression support is limited: it lacks many of the features of the Boost Perl regular expression library used by many text editors.
This batch code could be used to get last non empty line from a file assigned to an environment variable:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "LastLine="
if exist "file.txt" for /F "usebackq eol= delims=" %%# in ("file.txt") do set "LastLine=%%#"
echo Last line is: "%LastLine%"
endlocal
Command FOR skips all empty lines and by default also all lines starting with a semicolon. For that reason eol= is used to define the form-feed control character as the end-of-line character. If the last line of the file never starts with ;, it is best to remove eol= from the FOR command line.
If the file to process always has more than X lines, it makes sense to add the option skip=X after usebackq in the FOR options to skip the first X lines of the file for faster processing.
For details on command FOR open a command prompt window and run for /?.
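As a rough sketch of the skip option described above: the variant below assumes (hypothetically) that file.txt always has more than 100 lines and that the last line never starts with a semicolon, so eol= is left out. The value 100 is only an illustration, not something taken from the question.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "LastLine="
rem Assumption: file.txt always has more than 100 lines, so the first 100
rem lines can be skipped. eol= is omitted, so lines starting with ; are ignored.
if exist "file.txt" for /F "usebackq skip=100 delims=" %%I in ("file.txt") do set "LastLine=%%I"
echo Last line is: "%LastLine%"
endlocal
```

If the file turns out to have 100 lines or fewer, LastLine stays undefined and the script prints an empty pair of quotes, so the skip value should be chosen conservatively.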
I'm trying to capture the first 4 characters of output from the following Windows command.
nltest /server:%COMPUTERNAME% /dsgetsite
What is normally returned would be:
SITEAdSiteName
This command completed successfully.
I've tried using the for /F command but can't seem to figure out how to strip everything else except the first 4 characters of what is returned.
I'm thinking using the for /F may not be the best way to accomplish this.
Are there other suggestions on how I might accomplish this?
I think my challenge is defining (or not) the delimiter to being any character, I've tried the *, but didn't seem to do it for me.
When I use this:
for /F "tokens=1-4 delims=*" %A in ('nltest /server:%COMPUTERNAME% /dsgetsite') DO echo %A
I get both output lines, sort of stumped here.
To store the first line of the output of nltest /server:%COMPUTERNAME% /DSGETSITE in variable LINE, use the following command line (use %%F instead of %F to use this in a batch file):
set "LINE=" & for /F %F in ('nltest /server:%COMPUTERNAME% /DSGETSITE') do if not defined LINE set "LINE=%F"
To return the first four characters, use sub-string expansion:
set "LINE=%LINE:~,4%"
echo %LINE%
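Put together as a batch file, the two steps might look like the sketch below. The variable name SITE and the empty-output check are additions for illustration; the sketch also assumes the site name is the first space-delimited token on the first output line, as in the sample output from the question.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "LINE="
rem Keep only the first line of output; later lines are ignored once LINE is set.
for /F %%F in ('nltest /server:%COMPUTERNAME% /DSGETSITE') do if not defined LINE set "LINE=%%F"
rem Guard against nltest printing nothing at all.
if not defined LINE (
    echo No output captured from nltest.
    exit /b 1
)
rem Sub-string expansion: offset 0, length 4.
set "SITE=%LINE:~0,4%"
echo %SITE%
endlocal
```

The sub-string expansion sits on its own line after the loop, so it is expanded only once LINE already holds its final value.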
What I'm trying to do is take a text file with a bunch of strings to search for, each on its own line, and search for each one of these strings in a file (check.txt). I want the output to be a text file with a list of all the strings that COULDN'T be found.
I've tried a few things so far.
for /F "tokens=*" %%A in search.txt do (
@echo on
FINDSTR %%A check.txt
IF ERRORLEVEL 1 echo %%A FAIL > fail_match.txt
)
Another attempt I made (this one was just to tell me if the whole list was good or not) was
@echo on
FINDSTR /g:search.txt check.txt > a_match.txt
IF ERRORLEVEL 1 echo bad > a_match.txt
I realize that these are incredibly basic, and I'm sure there's some easy answer that I just don't understand. I'm not a programmer; I just want to make my job a lot easier (and faster).
To clarify, my list of things to search for is in search.txt, my list of things to check them against is check.txt. Check.txt is a json file, so it's all one enormous line. I don't know if that will make a difference or not. I want a list of all lines in search.txt that are not in check.txt.
Your search scheme seems naive on two fronts:
1) JSON is not guaranteed to be a single line. Valid JSON may contain any amount of whitespace, including newlines. This could cause problems if your search string logically matches across multiple lines.
2) What about substring matches? Suppose one search string is bat, and your JSON contains bath. I doubt you would want to consider that a match.
It is possible that neither of the above concerns is a problem for your case. Assuming they aren't, there may be a fairly simple solution using FINDSTR.
You were close on your first try, except
A) - Your FOR /F IN() clause is missing parentheses
B) - You want to force each search string to be interpreted as a string literal, possibly with spaces. That requires the /C: option.
C) - You assume leading spaces are not significant in your search string ("tokens=*" strips leading spaces)
D) - You assume no search lines begin with semicolon. (The default EOL character is semicolon, and FOR /F skips all lines that begin with the EOL character)
E) - Quotes and backslashes must be escaped within a search string: \" -> \\\\\", \ -> \\, " -> \". See What are the undocumented features and limitations of the Windows FINDSTR command? for more information.
Points C) and D) may be fixed by disabling EOL and DELIMS using the following odd syntax:
for /f delims^=^ eol^= %%A in ...
Point E) can be addressed by defining a variable and adding escape sequences via search and replace. This requires delayed expansion, but delayed expansion corrupts FOR /F variables upon expansion if they contain !. So delayed expansion must be strategically toggled on and off within the loop.
Instead of using IF ERRORLEVEL n, you can use the conditional command concatenation operator || to take action if the previous command failed.
You don't need to see the output of the FINDSTR command, so that can be redirected to NUL.
You can improve performance by redirecting just once, outside the loop.
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
  for /f delims^=^ eol^= %%A in (search.txt) do (
    set "search=%%A"
    setlocal enableDelayedExpansion
    set "search2=!search:\"=\\"!"
    set "search2=!search2:\=\\!"
    set "search2=!search2:"=\"!"
    findstr /c:"!search2!" check.txt >nul || echo !search!
    endlocal
  )
)
If none of your search strings begin with ;, and no search string contains " or \, then the solution can be as simple as:
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
  for /f "delims=" %%A in (search.txt) do findstr /c:"%%A" check.txt >nul || echo %%A
)
If I read your question right (output all lines of check.txt that are not in search.txt), this single line should do:
findstr /v /x /g:search.txt check.txt > nomatch.txt
I'm trying to create a batch that creates a fileC.txt containing all lines in fileA.txt except for those that contain any of the strings listed in the lines of fileB.txt:
Pseudo:
foreach(line L in fileA.txt)
    excluded = false
    foreach(string str in fileB.txt)
        if L contains str
            excluded = true
    if !excluded
        add L to fileC.txt
For example
fileA.txt: (all)
this\here\is\a\line.wav
and\this\is\another.wav
i\am\a\chocolate.wav
peanut\butter\jelly\time.wav
fileB.txt: (those to be excluded)
another.wav
time.wav
fileC.txt: (wanted result)
this\here\is\a\line.wav
i\am\a\chocolate.wav
I've been fiddling around with FINDSTR but I just can't seem to puzzle it together.. any help or pointers greatly appreciated!
Cheers!
/ Fredde
The answer should be this simple:
findstr /lvg:"fileB.txt" "fileA.txt" >fileC.txt
And with your example, the above does give the correct results.
But there is a nasty FINDSTR bug that makes it unreliable when using multiple case sensitive literal search strings. See Why doesn't this FINDSTR example with multiple literal search strings find a match?, as well as the answer that goes with it. For a "complete" list of undocumented FINDSTR features and bugs, see What are the undocumented features and limitations of the Windows FINDSTR command?.
So the simple code above can fail depending on the content of the files. If you can get away with using a case insensitive search, then the solution is simple.
findstr /livg:"fileB.txt" "fileA.txt" >fileC.txt
Edit: Both versions above will fail if fileB.txt contains \\ or \". In order to work properly, those strings must be escaped as \\\ and \\"
But if you must use a case sensitive search, then there is no simple solution. Your best bet for a pure batch solution might be to use the /R regular expression option. But then you will have to create a modified version of fileB.txt where all regex meta-characters are escaped so that the strings give the correct literal search. That is a mini project in and of itself.
Perhaps your best option for a case sensitive solution is to get a 3rd party tool like grep or sed for Windows.
Edit: Here is a reasonably performing pure batch solution that is nearly bulletproof
I looked into doing something like the proposed logic in your question. But using batch to read all lines in a file is relatively slow. This solution only reads the exclude file line by line. It uses FINDSTR to read the lines in "fileA.txt" repeatedly, once per search string. This is a much faster algorithm for a batch file.
The traditional method to read a file is to use a FOR /F loop, but there is another technique using SET /P that is faster, and it is safe to use with delayed expansion. The only limitations to this method are:
It strips trailing control characters from the line
It is limited to 1021 bytes per line
Each line must be terminated by <CR><LF> as is the Windows standard. It will not work with unix style lines terminated by <LF>
The search strings must have each \ and " escaped as \\ and \" when they are used with the /C option.
@echo off
setlocal enableDelayedExpansion
copy fileA.txt fileC.txt >nul
for /f %%N in ('find /c /v "" ^<fileB.txt') do set len=%%N
<fileB.txt (
  for /l %%N in (1 1 !len!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      set "ln=!ln:\=\\!"
      set ln=!ln:"=\"!
      move /y fileC.txt temp.txt >nul
      findstr /lv /c:"!ln!" temp.txt >fileC.txt
    )
  )
)
del temp.txt
type fileC.txt
I have a huge file with e-mail addresses and I would like to count how many of them are in this file. How can I do that using the Windows command line?
I have tried this but it just prints the matching lines. (btw: all e-mails are contained in one line)
findstr /c:"@" mail.txt
Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.
findstr /c:"@" mail.txt | find /c /v "GarbageStringDefNotInYourResults"
So you are counting the lines resulting from your findstr command that do not have the garbage string in it. Kind of a hack, but it could work for you. Alternatively, just use the find /c on the string you do care about being there. Lastly, you mentioned one address per line, so in this case the above works, but multiple addresses per line and this breaks.
Why not simply use this? It determines the number of lines containing at least one @ character:
find /C "@" "mail.txt"
Example output:
---------- MAIL.TXT: 96
To avoid the file name in the output, change it to this:
find /C "@" < "mail.txt"
Example output:
96
To capture the resulting number and store it in a variable, use this (change %N to %%N in a batch file):
set "NUM=0"
for /F %N in ('find /C "@" ^< "mail.txt"') do set "NUM=%N"
echo %NUM%
Using grep for Windows
Very simple solution:
grep -o "@" mail.txt | grep -c .
Remember the dot at the end of the line!
Here is a slightly more understandable way:
grep -o "@" mail.txt | grep -c "@"
The first grep selects only the "@" strings and puts each one on a new line.
The second grep counts lines (or lines with @).
The grep utility can be easily installed from the grep-for-Windows page. It is a very small and safe text filter. grep is one of the most useful Unix/Linux commands and I use it daily on both Linux and Windows.
The Windows findstr is good, but it does not have all the features of grep.
Installing grep on Windows is one of the best decisions you can make if you like the CLI or batch scripts.
Download and Installation
Download latest version from the project page https://sourceforge.net/projects/grep-for-windows/. Direct link to file is https://sourceforge.net/projects/grep-for-windows/files/grep-3.5_win32.zip/download.
Unzip the ZIP archive. A file is inside.
Put the grep.exe file in the C:\Windows directory or in another directory on the system path, which you can list with the command echo %PATH%.
That is all.
Test if grep is working:
Open command line window (cmd)
Run the command grep --help
Uninstallation
Delete the grep.exe file from folder where you have placed it.
Maybe it's a little bit late, but the following script worked for me (the source file contained quote characters, which is why I used the 'usebackq' parameter).
The caret sign (^) acts as the escape character in the Windows batch scripting language.
@setlocal enableextensions enabledelayedexpansion
SET TOTAL=0
FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
    SET LN=%%I
    FOR %%J IN ("!LN!") do (
        FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
            @SET /A TOTAL=!TOTAL!+%%K
        )
    )
)
ECHO Number of occurrences is !TOTAL!
I found this on the net. See if it works:
findstr /R /N "^.*certainString.*$" file.txt | find /c "@"
I would install the unix tools on your system (handy in any case :-), then it's really simple - look e.g. here:
Count the number of occurrences of a string using sed?
(Using awk:
awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt
).
You can get the Windows unix tools here:
http://unxutils.sourceforge.net/
OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means that unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.
Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:
find "@" datafile.txt | find "@" | sed "s/@/@\n/g" | find /n "@" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/">CountChars.bat
Explanation: (assuming the file with the data is named "Datafile.txt")
1) The 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) find to strip off unwanted header info.
2) Pipe the above results to SED, which will search for each "@" character and replace it with itself + "\n" (which is a "new line" aka a CRLF), which gets each "@" on its own line in the output stream...
3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that (increasingly with each line) sets the variable equal to that line's number.
4) isolate each line's numeric part and preface the isolated number per the above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"
In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:
a) look for a "[" (which must be "escaped" by prefacing it with "\")
b) begin saving (or "tokenizing") what follows, up to the closing "]"
--> in other words it ignores the brackets but stores the number
--> the ".*" that follows the bracket wildcards whatever follows the "]"
c) the stuff between the \( and the \) is "tokenized", which means it can be referred-to later, in the "WhatToReplaceItWith" section. The first stuff that's tokenized is referred to via "\1" then second as "\2", etc.
So... we're ignoring the [ and the ] and we're saving the number that lies between the brackets and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string:
Set /a NumFound= + the saved, or "tokenized" number, i.e.
...the first line will read: Set /a NumFound=1
...& the next line reads: Set /a NumFound=2 etc. etc.
Thus, if you have 1,283 email addresses, your results will have 1,283 lines.
The last one executed = the one that matters.
If you use the ">" character to redirect all of the above output to a batch file, i.e.:
> CountChars.bat
...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
This is how I do it, using an AND condition with FINDSTR (to count number of errors in a log file):
SET COUNT=0
FOR /F "tokens=4*" %%a IN ('TYPE "soapui.log" ^| FINDSTR.exe /I /R^
    /C:"Assertion" ^| FINDSTR.exe /I /R /C:"has status VALID"') DO (
    REM counts number of lines containing both "Assertion" and "has status VALID"
    SET /A COUNT+=1
)
SET /A PASSNUM=%COUNT%
NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".
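A shorter variant of the same AND-style count is sketched below, assuming only the number is needed and not the matching text. It lets find /C do the counting instead of incrementing a variable in the loop body. The file name soapui.log and the two search strings come from the snippet above; everything else is illustrative.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "COUNT=0"
rem findstr passes on lines containing "Assertion"; find /C then counts how many
rem of those lines also contain "has status VALID" and prints just that number.
for /F %%N in ('findstr /I /C:"Assertion" "soapui.log" ^| find /I /C "has status VALID"') do set "COUNT=%%N"
echo %COUNT%
endlocal
```

Like the loop version, this counts matching lines rather than total occurrences.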
Use this:
type file.txt | find /i "@" /c