Windows Findstr - windows

I'm trying to find files in a folder with specific pattern like:
abcd201 abcd001 abcd004
The folder contains files named
abcd(3 numbers)
I'm trying to use the pattern:
abcd[0,2][0][1,4] but it is currently not working:
DIR /b C:\Folder\abcd"[0,2][0][1,4]".txt
Thanks!

The dir command does not support regular expressions. You need to filter its output with findstr:
dir /b "c:\folder\abcd*.txt" | findstr /r /c:"^abcd[02]0[14]\.txt$"
That is, use dir command to obtain a first approximation of what you are searching and then filter the list (pipe the dir command to findstr) to obtain only the list of required files.
The regular expression (/r) in findstr means: filter the lines, starting at the start of the line (initial ^), followed by abcd, followed by any character in the set [02], followed by a 0, followed by any character in the set [14], followed by a dot (a single dot means any character, so, it needs to be escaped \.), followed by the string txt and the end of the line ($).
You may need to add the /i switch to findstr so that it ignores case when matching.
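As a cross-check of the pattern itself, here is a minimal Python sketch of the same filter. The helper name filter_names and the sample file names are made up for illustration. Python's re syntax happens to agree with findstr for this particular pattern, but the two dialects differ in general.

```python
import re

# Same pattern as the findstr call: "abcd", one of 0/2, a 0, one of 1/4, then ".txt".
# re.IGNORECASE plays the role of findstr's /i switch.
PATTERN = re.compile(r"^abcd[02]0[14]\.txt$", re.IGNORECASE)


def filter_names(names):
    """Return only the names that match the whole pattern (hypothetical helper)."""
    return [name for name in names if PATTERN.match(name)]


if __name__ == "__main__":
    sample = ["abcd201.txt", "abcd001.txt", "abcd101.txt", "xabcd001.txt"]
    print(filter_names(sample))
```

Note that the escaped dot matters: without the backslash, a name such as abcd0012txt would also pass.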

The regex from your example would also match the name abcd204. You can find these 4 files in a simpler way:
for %a in (0 2) do for %c in (1 4) do dir /B C:\Folder\abcd%a0%c.txt 2>NUL
This method is faster than the findstr approach, especially if the number of files is large.

Related

Renaming Multiple Files

I have almost 2000 files which I need to rename.
The files are named in the following format: PART1#PART2#PART3.pdf
I would like to batch rename the files so that PART2 is moved before PART1 e.g. PART2#PART1#PART3.pdf
PART 1 = A random document reference e.g. 124244
PART 2 = A reference number e.g. 12-12434-A
PART 3 = A short description e.g. Part 1
The # symbol separates each of these parts.
Is there a simple utility which I can use to make this change?
Use a batch file
@echo off
setlocal enableextensions disabledelayedexpansion
cd /d "c:\where\thefiles\are"
for /f "tokens=1,2,* delims=#" %%a in ('
dir /b /a-d *.pdf ^| findstr /r /b /e /i /c:"[^#][^#-]*#[^#][^#]*#..*\.pdf"
') do echo ren "%%a#%%b#%%c" "%%b#%%a#%%c"
What this code does:
Gets the file list: a dir command asking for .pdf files in bare format, excluding folders.
Filters the list to keep only the appropriate files: a findstr command searching for a regular expression that must match from the beginning to the end of each line, ignoring case. The expression tested against the file names is: a non-# character, followed by a sequence of characters that are neither # nor - (to avoid renaming the files twice), followed by a #, followed by a non-# character and a sequence of non-# characters, followed by a # and any sequence of characters ending in .pdf.
The for command splits each name using # as the token delimiter and performs the rename for each one.
Rename operations are only echoed to the console. If the output is correct, remove the echo command.
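To make the filter-and-swap logic easier to check, here is a small Python sketch of the same idea. It only computes the new name and renames nothing; the function name swapped_name is an assumption for illustration, and the regular expression is a direct transcription of the findstr one.

```python
import re

# Transcription of the findstr expression: PART1 starts with any non-# character
# and continues with characters that are neither # nor -, so a name whose first
# part is already the dashed reference number is skipped.
NAME = re.compile(r"^([^#][^#-]*)#([^#][^#]*)#(.+\.pdf)$", re.IGNORECASE)


def swapped_name(filename):
    """Return PART2#PART1#PART3.pdf, or None if the name should be left alone."""
    match = NAME.match(filename)
    if match is None:
        return None
    part1, part2, rest = match.groups()
    return f"{part2}#{part1}#{rest}"


if __name__ == "__main__":
    print(swapped_name("124244#12-12434-A#Part 1.pdf"))
```

Running the function on an already swapped name returns None, which is the same protection against double renaming that the [^#-] class provides in the batch version.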

want to copy names of files to text in a directory based on their file extension

I want to copy the names of the files in a directory to a text file, based on their file extension.
As of now I am using dir /b >i67.txt, which works fine for me, but it does not solve the problem of selecting specific file extensions.
Can someone help me with a batch script for this?
You are looking for the following command, run it in the context of the directory which contains your files:
dir /b /s /-p *.txt /o:n | findstr /E .txt > i67.txt
This command finds all *.txt files in the directory and its subdirectories (because of /s) and writes the results to the i67.txt file in the current directory.
You can specify multiple file masks within one DIR /B command. Based on your comment to Yair Nevet's answer, it seems you want the following extensions: .ovr, .inc, and .dat. That can be done simply using:
dir /b /s *.ovr *.inc *.dat >i67.txt
If the files are on an NTFS volume that has short 8.3 names enabled, then you might get additional undesired file extensions if you have any file extensions longer than 4 characters that begin with your wanted extension. For example someName.data would show up in your output because it most likely would have a short name of SOMENA~1.DAT that matches your file mask.
You can prevent short name inclusion by piping the output to FINDSTR. The /L option forces a literal search as opposed to regular expressions, the /I option ignores case, and the /E option matches only the end of each line. Multiple search terms are delimited by spaces.
dir /b /s *.ovr *.inc *.dat | findstr /lie ".ovr .inc .dat"
Regarding your following comment:
Here is what I am using now: dir /b | findstr [a-z].*ovr>i67.txt &&
dir /b | findstr [a-z].*inc>>i67.txt && dir /b | findstr
[a-z].*dat>>i67.txt What it does?? --- It copies all
names(remember,only name except files itself which are ending with
extension .ovr .dat and .cpi ) present in a directory and copy it to a
text file(here name is i67.txt)
That will not actually do what you want for several reasons.
Windows file names are not case sensitive. Windows would treat NAME.OVR and name.ovr the same, so you should as well. That requires the /I option.
There is nothing in your search to anchor ovr to the extension. It will look for your pattern anywhere within the file name. And the dot is a meta character that represents any character - not a literal dot. The asterisk allows the dot to match any number of characters.
I can't be sure, but it looks like perhaps you only want to match files that begin with a letter. The following modification to my answer should do the trick:
dir /b /s *.ovr *.inc *.dat | findstr /ri "^[a-z].*\.ovr$ ^[a-z].*\.inc$ ^[a-z].*\.dat$"
The /R option forces a regular expression match instead of a literal. It is the default behavior for the given search, but it is a good idea to be explicit with regard to regex vs literal search.
^ anchors the search to the beginning of the name
[a-z] matches any letter (sort of). Remember it is not case sensitive because of the /I option. Without the /I option, it would not match upper case Z. See Why does findstr not handle case properly (in some circumstances)? for an explanation.
.* matches any number of characters, without restriction
\. matches a dot literal, marking the beginning of your extension
Then comes your extension
$ anchors the match to the end of the name
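The core requirement (match the extension exactly, at the end of the name, ignoring case) can be sketched in Python as follows. This is only an illustration of the matching rule, not a replacement for the dir/findstr pipeline; the names WANTED and has_wanted_extension are assumptions.

```python
import os

# The three extensions mentioned in the comment thread.
WANTED = {".ovr", ".inc", ".dat"}


def has_wanted_extension(name, wanted=WANTED):
    """True if the final extension of name is one of the wanted ones, ignoring case."""
    # splitext returns the whole final extension, so "x.data" gives ".data", not ".dat".
    extension = os.path.splitext(name)[1].lower()
    return extension in wanted


if __name__ == "__main__":
    for candidate in ["NAME.OVR", "someName.data", "ovr_notes.txt"]:
        print(candidate, has_wanted_extension(candidate))
```

Because the comparison is made against the whole extension, a name like someName.data is rejected, which is the same effect the end anchor has in the findstr version.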

Batch: create fileC.txt from the result of (fileA.txt minus fileB.txt)

I'm trying to create a batch file that creates a fileC.txt containing all lines in fileA.txt except for those that contain any of the strings listed on the lines of fileB.txt:
Pseudo:
foreach(line L in fileA.txt)
    excluded = false
    foreach(string str in fileB.txt)
        if L contains str
            excluded = true
    if !excluded
        add L to fileC.txt
For example
fileA.txt: (all)
this\here\is\a\line.wav
and\this\is\another.wav
i\am\a\chocolate.wav
peanut\butter\jelly\time.wav
fileB.txt: (those to be excluded)
another.wav
time.wav
fileC.txt: (wanted result)
this\here\is\a\line.wav
i\am\a\chocolate.wav
I've been fiddling around with FINDSTR but I just can't seem to puzzle it together.. any help or pointers greatly appreciated!
Cheers!
/ Fredde
The answer should be this simple:
findstr /lvg:"fileB.txt" "fileA.txt" >fileC.txt
And with your example, the above does give the correct results.
But there is a nasty FINDSTR bug that makes it unreliable when using multiple case sensitive literal search strings. See Why doesn't this FINDSTR example with multiple literal search strings find a match?, as well as the answer that goes with it. For a "complete" list of undocumented FINDSTR features and bugs, see What are the undocumented features and limitations of the Windows FINDSTR command?.
So the simple code above can fail depending on the content of the files. If you can get away with using a case insensitive search, then the solution is simple.
findstr /livg:"fileB.txt" "fileA.txt" >fileC.txt
Edit: Both versions above will fail if fileB.txt contains \\ or \". In order to work properly, those strings must be escaped as \\\ and \\"
But if you must use a case sensitive search, then there is no simple solution. Your best bet for a pure batch solution might be to use the /R regular expression option. But then you will have to create a modified version of fileB.txt where all regex meta-characters are escaped so that the strings give the correct literal search. That is a mini project in and of itself.
Perhaps your best option for a case sensitive solution is to get a 3rd party tool like grep or sed for Windows.
Edit: Here is a reasonably performing pure batch solution that is nearly bullet proof
I looked into doing something like the proposed logic in your question. But using batch to read all lines in a file is relatively slow. This solution only reads the exclude file line by line. It uses FINDSTR to read the lines in "fileA.txt" repeatedly, once per search string. This is a much faster algorithm for a batch file.
The traditional method to read a file is to use a FOR /F loop, but there is another technique using SET /P that is faster, and it is safe to use with delayed expansion. The only limitations to this method are:
It strips trailing control characters from the line
It is limited to 1021 bytes per line
Each line must be terminated by <CR><LF> as is the Windows standard. It will not work with unix style lines terminated by <LF>
The search strings must have each \ and " escaped as \\ and \" when they are used with the /C option.
@echo off
setlocal enableDelayedExpansion
copy fileA.txt fileC.txt >nul
for /f %%N in ('find /c /v "" ^<fileB.txt') do set len=%%N
<fileB.txt (
  for /l %%N in (1 1 !len!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      set "ln=!ln:\=\\!"
      set ln=!ln:"=\"!
      move /y fileC.txt temp.txt >nul
      findstr /lv /c:"!ln!" temp.txt >fileC.txt
    )
  )
)
del temp.txt
type fileC.txt
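For reference, the intended result (case-sensitive, literal "contains" matching) can be stated in a few lines of Python. This is a sketch for checking the expected output of the batch solutions, not part of them; the function name exclude_lines is made up.

```python
def exclude_lines(lines, needles):
    """Keep the lines that contain none of the search strings (case-sensitive, literal)."""
    # Skip empty search strings, like the "if defined ln" check in the batch file.
    needles = [needle for needle in needles if needle]
    return [line for line in lines if not any(needle in line for needle in needles)]


if __name__ == "__main__":
    file_a = [
        r"this\here\is\a\line.wav",
        r"and\this\is\another.wav",
        r"i\am\a\chocolate.wav",
        r"peanut\butter\jelly\time.wav",
    ]
    file_b = ["another.wav", "time.wav"]
    for line in exclude_lines(file_a, file_b):
        print(line)
```

Since the strings are compared literally, no escaping of backslashes or quotes is needed here, unlike with findstr /c.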

How to find the number of occurrences of a string in file using windows command line?

I have a huge file with e-mail addresses and I would like to count how many of them are in this file. How can I do that using the Windows command line?
I have tried this, but it just prints the matching lines (by the way, all the e-mails are contained in one line):
findstr /c:"#" mail.txt
Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.
findstr /c:"#" mail.txt | find /c /v "GarbageStringDefNotInYourResults"
So you are counting the lines resulting from your findstr command that do not have the garbage string in it. Kind of a hack, but it could work for you. Alternatively, just use the find /c on the string you do care about being there. Lastly, you mentioned one address per line, so in this case the above works, but multiple addresses per line and this breaks.
Why not simply use this? It determines the number of lines containing at least one # character:
find /C "#" "mail.txt"
Example output:
---------- MAIL.TXT: 96
To avoid the file name in the output, change it to this:
find /C "#" < "mail.txt"
Example output:
96
To capture the resulting number and store it in a variable, use this (change %N to %%N in a batch file):
set "NUM=0"
for /F %N in ('find /C "#" ^< "mail.txt"') do set "NUM=%N"
echo %NUM%
Using grep for Windows
Very simple solution:
grep -o "#" mail.txt | grep -c .
Remember the dot at the end of the line!
Here is a slightly more understandable way:
grep -o "#" mail.txt | grep -c "#"
The first grep selects only the "#" strings and puts each one on a new line.
The second grep counts the lines (or the lines containing #).
The grep utility can be easily installed from the grep-for-Windows page. It is a very small and safe text filter. grep is one of the most useful Unix/Linux commands, and I use it daily on both Linux and Windows.
The Windows findstr is good, but it does not have all the features of grep.
Installing grep on Windows is one of the best decisions you can make if you like the CLI or batch scripts.
Download and Installation
Download latest version from the project page https://sourceforge.net/projects/grep-for-windows/. Direct link to file is https://sourceforge.net/projects/grep-for-windows/files/grep-3.5_win32.zip/download.
Unzip the ZIP archive. It contains one file.
Put the grep.exe file in the C:\Windows directory or in another directory listed in the system path (shown by the command echo %PATH%).
That is all.
Test if grep is working:
Open command line window (cmd)
Run the command grep --help
Uninstallation
Delete the grep.exe file from the folder where you placed it.
Maybe it's a little late, but the following script worked for me (the source file contained quote characters, which is why I used the 'usebackq' parameter).
The caret sign (^) acts as the escape character in the Windows batch scripting language.
@setlocal enableextensions enabledelayedexpansion
SET TOTAL=0
FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
    SET LN=%%I
    FOR %%J IN ("!LN!") do (
        FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
            @SET /A TOTAL=!TOTAL!+%%K
        )
    )
)
ECHO Number of occurrences is !TOTAL!
I found this on the net. See if it works:
findstr /R /N "^.*certainString.*$" file.txt | find /c "#"
I would install the unix tools on your system (handy in any case :-), then it's really simple - look e.g. here:
Count the number of occurrences of a string using sed?
(Using awk:
awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt
).
You can get the Windows unix tools here:
http://unxutils.sourceforge.net/
OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the # symbol, your suggestions to use variants of FINDSTR /c will not help.
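The difference between the two kinds of count is easy to see in a short Python sketch. The function names and the sample text are invented for illustration; the first function mimics what find /c and findstr-based line counts report, and the second counts every occurrence.

```python
def count_matching_lines(text, needle):
    """Number of lines that contain needle at least once (what find /c reports)."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_occurrences(text, needle):
    """Total number of non-overlapping occurrences of needle in the text."""
    return text.count(needle)


if __name__ == "__main__":
    # Three addresses on the first line, none on the second, one on the third.
    sample = "a#x.org b#y.org c#z.org\nno address here\nd#w.org\n"
    print(count_matching_lines(sample, "#"), count_occurrences(sample, "#"))
```

With all addresses on a single line, the line count is 1 no matter how many addresses there are, which is why a per-occurrence count is needed.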
Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:
find "#" datafile.txt | find "#" | sed "s/#/#\n/g" | find /n "#" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/">CountChars.bat
Explanation: (assuming the file with the data is named "Datafile.txt")
1) The output of the 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) FIND to strip off the unwanted header info.
2) Pipe the above results to SED, which will search for each "#" character and replace it with itself+ "\n" (which is a "new line" aka a CRLF) which gets each "#" on its own line in the output stream...
3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that (increasingly with each line) sets the variable equal to that line's number.
4) isolate each line's numeric part and preface the isolated number per the above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"
In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:
a) look for a "[" (which must be "escaped" by prefacing it with "\")
b) begin saving (or "tokenizing") what follows, up to the closing "]"
--> in other words it ignores the brackets but stores the number
--> the ".*" that follows the bracket wildcards whatever follows the "]"
c) the stuff between the \( and the \) is "tokenized", which means it can be referred-to later, in the "WhatToReplaceItWith" section. The first stuff that's tokenized is referred to via "\1" then second as "\2", etc.
So... we're ignoring the [ and the ] and we're saving the number that lies between the brackets and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string:
Set /a NumFound= + the saved, or "tokenized" number, i.e.
...the first line will read: Set /a NumFound=1
...& the next line reads: Set /a NumFound=2 etc. etc.
Thus, if you have 1,283 email addresses, your results will have 1,283 lines.
The last one executed = the one that matters.
If you use the ">" character to redirect all of the above output to a batch file, i.e.:
> CountChars.bat
...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
This is how I do it, using an AND condition with FINDSTR (to count number of errors in a log file):
SET COUNT=0
FOR /F "tokens=4*" %%a IN ('TYPE "soapui.log" ^| FINDSTR.exe /I /R^
    /C:"Assertion" ^| FINDSTR.exe /I /R /C:"has status VALID"') DO (
    REM counts number of lines containing both "Assertion" and "has status VALID"
    SET /A COUNT+=1
)
SET /A PASSNUM=%COUNT%
NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".
Use this:
type file.txt | find /i "#" /c

Windows recursive grep command-line

I need to do a recursive grep in Windows, something like this in Unix/Linux:
grep -i 'string' `find . -print`
or the more-preferred method:
find . -print | xargs grep -i 'string'
I'm stuck with just cmd.exe, so I only have Windows built-in commands. I can't install Cygwin, or any 3rd party tools like UnxUtils on this server unfortunately. I'm not even sure I can install PowerShell. Any suggestions using only cmd.exe built-ins (Windows 2003 Server)?
findstr can do recursive searches (/S) and supports some variant of regex syntax (/R).
C:\>findstr /?
Searches for strings in files.
FINDSTR [/B] [/E] [/L] [/R] [/S] [/I] [/X] [/V] [/N] [/M] [/O] [/P] [/F:file]
        [/C:string] [/G:file] [/D:dir list] [/A:color attributes] [/OFF[LINE]]
        strings [[drive:][path]filename[ ...]]
  /B         Matches pattern if at the beginning of a line.
  /E         Matches pattern if at the end of a line.
  /L         Uses search strings literally.
  /R         Uses search strings as regular expressions.
  /S         Searches for matching files in the current directory and all
             subdirectories.
  /I         Specifies that the search is not to be case-sensitive.
  /X         Prints lines that match exactly.
  /V         Prints only lines that do not contain a match.
  /N         Prints the line number before each line that matches.
  /M         Prints only the filename if a file contains a match.
  /O         Prints character offset before each matching line.
  /P         Skip files with non-printable characters.
  /OFF[LINE] Do not skip files with offline attribute set.
  /A:attr    Specifies color attribute with two hex digits. See "color /?"
  /F:file    Reads file list from the specified file(/ stands for console).
  /C:string  Uses specified string as a literal search string.
  /G:file    Gets search strings from the specified file(/ stands for console).
  /D:dir     Search a semicolon delimited list of directories
  strings    Text to be searched for.
  [drive:][path]filename
             Specifies a file or files to search.
Use spaces to separate multiple search strings unless the argument is prefixed
with /C. For example, 'FINDSTR "hello there" x.y' searches for "hello" or
"there" in file x.y. 'FINDSTR /C:"hello there" x.y' searches for
"hello there" in file x.y.
Regular expression quick reference:
  .        Wildcard: any character
  *        Repeat: zero or more occurrences of previous character or class
  ^        Line position: beginning of line
  $        Line position: end of line
  [class]  Character class: any one character in set
  [^class] Inverse class: any one character not in set
  [x-y]    Range: any characters within the specified range
  \x       Escape: literal use of metacharacter x
  \<xyz    Word position: beginning of word
  xyz\>    Word position: end of word
For full information on FINDSTR regular expressions refer to the online Command
Reference.
findstr /spin /c:"string" [files]
The parameters have the following meanings:
s = recursive
p = skip non-printable characters
i = case insensitive
n = print line numbers
And the string to search for is the bit you put in quotes after /c:
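To spell out what those switches add up to, here is a rough Python sketch of a recursive, case-insensitive search that reports line numbers. It is not how findstr is implemented, it does not reproduce /p (skipping files with non-printable characters), and the function name recursive_search is an assumption.

```python
import os


def recursive_search(root, needle, ignore_case=True):
    """Yield (path, line_number, line) for each line under root that contains needle."""
    target = needle.lower() if ignore_case else needle
    # os.walk descends into every subdirectory, like the /s switch.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    # Line numbers start at 1, like the /n switch.
                    for number, line in enumerate(handle, start=1):
                        haystack = line.lower() if ignore_case else line
                        if target in haystack:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep going


if __name__ == "__main__":
    for path, number, line in recursive_search(".", "string"):
        print(f"{path}:{number}:{line}")
```

The search string is treated literally, which corresponds to passing it with /c: rather than as a regular expression.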
I just searched for a text with the following command, which listed the names of all files containing my specified 'search text'.
C:\Users\ak47\Desktop\trunk>findstr /S /I /M /C:"search text" *.*
Recursive search for import word inside src folder:
> findstr /s import .\src\*
I recommend a really great tool:
native unix utils:
http://unxutils.sourceforge.net/
http://en.wikipedia.org/wiki/UnxUtils
Just unpack them and put that folder into your PATH environment variable and voila! :)
Works like a charm, and there is much more than just grep ;)
for /f "delims=" %G in ('dir *.cpp *.h /s/b') do ( find /i "what you search" "%G") >> out_file.txt
Select-String worked best for me. All the other options listed here, such as findstr, didn't work with large files.
Here's an example:
select-string -pattern "<pattern>" -path "<path>"
note: This requires Powershell
If you have Perl installed, you could use ack, available at http://beyondgrep.com/.
"findstr /spin /c:"string" [[drive:][path]filename[...]]"
Similar to the 2nd highest answer above (by i_am_jorf on Mar 30, 2009 at 22:26) which shows the following example: "findstr /spin /c:"string" [files]"
However, running "findstr /?" shows there is no option or parameter defined as
"[files]". I believe what he is implying here is the parameter that defines which files to search for which "findstr /?" describes as:
"[[drive:][path]filename[ ...]]"
It later defines this with the following:
"[drive:][path]filename" - Specifies a file or files to search.
So, rather than use personal shorthand, I am providing it the way that findstr /? defines it when searching specific files:
"findstr /spin /c:"string" [[drive:][path]filename[...]]"

Resources