How do I search for multiple strings in a single file using findstr? - cmd

I'm trying to search a folder for all files that include two different strings. I'm using PowerShell and the findstr command.
For example, I want to find all files that include BOTH "String: A" and "String: B", but not files that only have "String: A" OR "String: B".
I've tried using findstr /c:"String: A" /c:"String: B" *.txt in the folder, but it ended up giving me all files that had either "String: A" or "String: B", not just the files with both strings in them. findstr /? didn't explain how to essentially do an AND search, so I was wondering if anyone knew how to do such a thing.
I also tried findstr /c:"String: A" *.txt | findstr /c:"String: B" *.txt from this answer, but this ends up with no results (as in, PowerShell sits there for a very long time and never returns).
This answer was closer (I used findstr /r /c:"String: A.*String: B" *.txt), but the command returned nothing (I know from my data that there should be at least one file with both strings in it).
I'm not sure if there are formatting issues with the strings (given that they include multiple words and symbols), which is why I've been using /c: in the string formatting.

The challenge is that you want to know whether all of the strings are present anywhere in the file, whereas findstr.exe matches patterns one line at a time.
PowerShell's more powerful findstr.exe analog, Select-String, can be combined with Group-Object to provide a solution:
$patterns = 'String: A', 'String: B'
Select-String -Path *.txt -Pattern $patterns -AllMatches |
  Group-Object Path | # Group matching lines by file of origin
  Where-Object {
    # Does the distinct set of patterns found comprise all input patterns?
    ($_.Group.Pattern | Sort-Object -Unique).Count -eq $patterns.Count
  } |
  ForEach-Object Name
Note that this only outputs the paths of the matching files.
To also output the individual lines that contained matches for any of the patterns inside a matching file, replace ForEach-Object Name with ForEach-Object Group.
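Independent of the tool, the underlying logic is a per-file check that every search string occurs somewhere in the file's text. A minimal Python sketch of that check, assuming plain-text files small enough to read whole; the function name `files_containing_all` and the UTF-8 decoding are assumptions made for illustration, not part of the answer above:

```python
from pathlib import Path


def files_containing_all(folder, needles, pattern="*.txt"):
    """Return paths in `folder` whose text contains every string in `needles`."""
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps the scan going on undecodable bytes (assumption).
        text = path.read_text(encoding="utf-8", errors="replace")
        # AND semantics: the file qualifies only if no needle is missing.
        if all(needle in text for needle in needles):
            hits.append(path)
    return hits
```

Calling `files_containing_all(".", ["String: A", "String: B"])` would list only the files that contain both strings, regardless of which lines they appear on.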

For the sake of completeness, a potential FindStr example:
%SystemRoot%\System32\findstr.exe /MIC:"String: A" *.txt | %SystemRoot%\System32\findstr.exe /F:/ /MIC:"String: B"

Related

How to get actual separate lines in PowerShell's Write-Output using a newline character

I tried to create multiline input to practice Select-String, expecting only the single matching line to be output, as I would normally see with an echo -e ... | grep combination. But the following command still gives me both lines. It seems the newline is only interpreted on final output, and Select-String still receives a single line of input.
Write-Output "Hi`nthere" | Select-String -Pattern "i"
#
# Hi
# there
#
#
while I would expect it to return just
Hi
I used this version of PowerShell:
Get-Host | Select-Object Version
# 5.1.19041.906
In bash, I would test commands on multiline input as follows: I usually generate multiple lines with echo -e, and grep then processes the individual lines.
echo -e "Hi\nthere" | grep "i"
# Hi
I hope someone can explain what I'm missing here in PowerShell. This seems like a basic misconception on my part, and I wasn't sure what to Google for.
Edits
[edit 1]: problem also for line ending with carriage return
Write-Output "Hi`r`nthere" | Select-String -Pattern "i"
I saw that separating with commas works as valid multiline input. So maybe the question is how to convert from newline to actual input line separation.
Write-Output "Hi","there" | Select-String -Pattern "i"
# Hi
[edit 2]: from edit 1 I found this stackoverflow-answer, where for me it now works with
Write-Output "Hi`nthere".Split([Environment]::NewLine) | Select-String -Pattern "i"
# or
Write-Output "Hi`nthere".Split("`n") | Select-String -Pattern "i"
Still, could someone please explain why this is relevant here, but not in bash?
All the information is in the comments, but let me summarize and complement it:
PowerShell's pipeline is object-based, and Select-String operates on each input object, even if that happens to be a single multi-line string object, such as the one output by Write-Output "Hi`nthere".
It is only the output from external programs that is streamed line by line.
Therefore, you must split your multi-line string into individual lines in order to match them as such.
The best idiom for that is -split '\r?\n', because it recognizes both Windows-format CRLF and Unix-format LF-only newlines:
"Hi`nthere" -split '\r?\n' | Select-String -Pattern "i"
Note:
I've omitted Write-Output in favor of PowerShell's implicit output behavior (see the bottom section of this answer for more information).
For more information on how -split '\r?\n' works, see this answer.
Select-String doesn't directly output the matching lines (strings); instead it wraps them in match-information objects that provide metadata about each match. To get just the matching line (string):
In PowerShell (Core) 7+, add the -Raw switch.
In Windows PowerShell, pipe to ForEach-Object Line or wrap the entire call in (...).Line.
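The object-versus-line distinction is not specific to PowerShell. As a hedged illustration in Python (an analogy only, not how PowerShell is implemented; the helper name `matching_lines` is made up for this sketch), a filter applied to one multi-line string sees a single item, and only an explicit split on `\r?\n` yields per-line matching:

```python
import re


def matching_lines(text, pattern):
    """Split `text` on CRLF or LF newlines, then keep the lines matching `pattern`."""
    lines = re.split(r"\r?\n", text)
    return [line for line in lines if re.search(pattern, line)]


single_object = ["Hi\nthere"]  # one item that happens to contain a newline
# Filtering the unsplit item returns the whole two-line string.
unsplit = [item for item in single_object if re.search("i", item)]
# Splitting first gives the grep-like result, for CRLF input as well.
split = matching_lines("Hi\r\nthere", "i")
```

Here `unsplit` still holds the complete two-line string, while `split` holds only the line `Hi`.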

Extract value from key value pair using powershell

I have a file having key-value data in it. I have to get the value of a specific key in that file. I have the Linux equivalent command:
File:
key1=val1
key2=val2
..
Command:
cat path/file | grep 'key1' | awk -F '=' '{print $2}'
Output:
val1
I want to achieve the same output on windows as well. I don't have any experience working in power shell but I tried with the following:
Get-Content "path/file" | Select-String -Pattern 'key1' -AllMatches
But I'm getting output like this:
key1=val1
What am i doing wrong here?
<# required powershell version 5.1 or later
@'
key1=val1
key2=val2
'@ | out-file d:\temp.txt
#>
(Get-Content d:\temp.txt | ConvertFrom-StringData).key1
Note:
With your specific input format (key=value lines), Алексей Семенов's helpful answer offers the simplest solution, using ConvertFrom-StringData; note that it ignores whitespace around = and trailing whitespace after the value.
The answer below focuses generally on how to implement grep and awk-like functionality in PowerShell.
It is not the direct equivalent of your approach, but a faster and PowerShell-idiomatic solution using a switch statement:
# Create a sample file
@'
key1=val1
key2=val2
'@ > sample.txt
# -> 'val1'
switch -Regex -File ./sample.txt { '^\s*key1=(.*)' { $Matches[1]; break } }
The -Regex option implicitly performs a -match operation on each line of the input file (thanks to -File), and the results are available in the automatic $Matches variable.
$Matches[1] therefore returns what the first (and only) capture group ((...)) in the regex matched; break stops processing instantly.
A more concise, but slower option is to combine the -match and -split operators, but note that this will only work as intended if only one line matches:
((Get-Content ./sample.txt) -match '^\s*key1=' -split '=')[1]
Also note that this invariably involves reading the entire file, by loading all lines into an array up front via Get-Content.
A comparatively slow version - due to using a cmdlet and thereby implicitly the pipeline - that fixes your attempt:
(Select-String -List '^\s*key1=(.*)' ./sample.txt).Matches[0].Groups[1].Value
Note:
Select-String outputs wrapper objects of type Microsoft.PowerShell.Commands.MatchInfo that wrap metadata around the matching strings rather than returning them directly (the way that grep does); .Matches is the property that contains the details of the match, which allows accessing what the capture group ((...)) in the regex captured, but it's not exactly obvious how to access that information.
The -List switch ensures that processing stops at the first match, but note that this only works with a direct file argument rather than with piping a file's lines individually via Get-Content.
Note that -AllMatches is for finding multiple matches in a single line (input object), and therefore not necessary here.
Another slow solution that uses ForEach-Object with a script block in which each line is -split into the key and value part, as suggested by Jeroen Mostert:
Get-Content ./sample.txt | ForEach-Object {
  $key, $val = $_ -split '='
  if ($key -eq 'key1') { $val }
}
Caveat: This invariably processes all lines, even after the key of interest was found.
To prevent that, you can append | Select-Object -First 1 to the command.
Unfortunately, as of PowerShell 7.1 there is no way to directly exit a pipeline on demand from a script block; see long-standing GitHub feature request #3821.
Note that break does not work as intended - except if you wrap your pipeline in a dummy loop statement (such as do { ... } while ($false)) to break out of.
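The stop-at-first-match behaviour discussed above can be made explicit in a short Python sketch. It assumes the `key=value` line format from the question; the function name `first_value` is hypothetical, and splitting on the first `=` only is a choice made for this sketch rather than something the answers specify:

```python
def first_value(lines, key):
    """Return the value of the first `key=value` line for `key`, or None."""
    for line in lines:
        name, sep, value = line.partition("=")  # split on the first '=' only
        if sep and name.strip() == key:
            # Returning here means the remaining lines are never examined.
            return value.rstrip("\r\n")
    return None


sample = ["key1=val1\n", "key2=val2\n"]
print(first_value(sample, "key1"))  # val1
```

Because the function only iterates, passing an open file object instead of a list would read the file lazily and stop at the first hit.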

How to disable findstr from sorting output?

I'm piping output from a command to findstr to extract certain lines. Here's my code:
example_command.exe | findstr /C:"string_D  " /C:"string_B  " /C:"string_C  " /C:"string_A  "
Yes, there are two spaces after the string text. I expected the output to be:
string_D
string_B
string_C
string_A
However, I'm getting:
string_A
string_B
string_C
string_D
findstr appears to be sorting the output alphabetically. Can that be disabled? I'd like it to output in the same order I entered it.
I want to do this with standard Windows 7 commands so I can easily distribute it in batch files.
I can separate the strings and run example_command.exe four times but that takes four times as long.
Is this another undocumented feature of findstr?
While it's pretty much running example_command.exe multiple times, this should give you the output you're looking for.
example_command.exe | findstr /C:"string_D  " && example_command.exe | findstr /C:"string_B  " && example_command.exe | findstr /C:"string_C  " && example_command.exe | findstr /C:"string_A  "
However, as you said, it will take four times as long.
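One way to avoid rerunning the command is to capture its output once and then scan the captured lines once per search string, in the desired order. A rough Python sketch of that idea follows; the function name `lines_in_pattern_order` and the sample data standing in for the output of example_command.exe are both hypothetical:

```python
def lines_in_pattern_order(lines, needles):
    """Return matching lines grouped by the order of `needles`, not by input order."""
    ordered = []
    for needle in needles:      # the outer loop fixes the output order
        for line in lines:      # the captured output is reused, not regenerated
            if needle in line:
                ordered.append(line)
    return ordered


# Stand-in for the captured output of example_command.exe (hypothetical data).
captured = ["string_A  1", "string_B  2", "string_C  3", "string_D  4"]
wanted = ["string_D  ", "string_B  ", "string_C  ", "string_A  "]
print("\n".join(lines_in_pattern_order(captured, wanted)))
```

The command runs once; only the in-memory scan is repeated per search string, which is usually cheap compared with rerunning the program.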

How do I prevent this infinite loop in PowerShell?

Say I have several text files that contain the word 'not' and I want to find them and create a file containing the matches. In Linux, I might do
grep -r not *.txt > found_nots.txt
This works fine. In PowerShell, the following echoes what I want to the screen:
get-childitem *.txt -recurse | select-string not
However, if I pipe this to a file:
get-childitem *.txt -recurse | select-string not > found_nots.txt
It runs for ages. I eventually CTRL-C to exit and take a look at the found_nots.txt file which is truly huge. It looks as though PowerShell includes the output file as one of the files to search. Every time it adds more content, it finds more to add.
How can I stop this behavior and make it behave more like the Unix version?
Use the -Exclude option.
get-childitem *.txt -Exclude 'found_nots.txt' -recurse | select-string not > found_nots.txt
An easy first solution is to give the output file a different extension, so that the *.txt filter no longer matches it.
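The feedback loop can also be avoided by skipping the output file during the scan and writing it only after the scan has finished. A minimal Python sketch of both safeguards, assuming UTF-8 text files; the function name `grep_tree` and the `name:line` output format are assumptions for illustration:

```python
from pathlib import Path


def grep_tree(root, word, output_name):
    """Collect 'name:line' hits for `word` in *.txt files under `root`, skipping the output file."""
    root = Path(root)
    output_path = (root / output_name).resolve()
    hits = []
    for path in sorted(root.rglob("*.txt")):
        if path.resolve() == output_path:
            continue  # never search the file we are about to write
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            if word in line:
                hits.append(f"{path.name}:{line}")
    # Writing only after the scan finishes also prevents the feedback loop.
    output_path.write_text("\n".join(hits) + "\n", encoding="utf-8")
    return hits
```

Running it twice in a row produces the same result, because earlier output is never treated as input.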

piping findstr's output

Windows command line, I want to search a file for all rows starting with:
# NNN "<file>.inc"
where NNN is a number and <file> any string.
I want to use findstr, because I cannot require that the users of the script install ack.
Here is the expression I came up with:
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*.inc" all_pre.txt
The file to search is all_pre.txt.
So far so good. Now I want to pipe that to another command, say for example more.
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9]*.inc" all_pre.txt | more
The result of this is the same output as the previous command, but with the file name as prefix for every row (all_pre.txt).
Then comes:
FINDSTR: cannot open |
FINDSTR: cannot open more
Why doesn't the pipe work?
snip of the content of all_pre.txt
# 1 "main.ss"
# 7 "main.ss"
# 11 "main.ss"
# 52 "main.ss"
# 1 "Build_flags.inc"
# 7 "Build_flags.inc"
# 11 "Build_flags.inc"
# 20 "Build_flags.inc"
# 45 "Build_flags.inc(function a called from b)"
EDIT: I also need to escape the dot in the regex. That is not the issue here, but worth mentioning.
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc" all_pre.txt
EDIT after Frank Bollack:
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc.*" all_pre.txt | more
is not working, although (I think) it should look for the same string as before then any character any number of times. That must include the ", right?
You are missing a trailing \" in your search pattern.
findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*.inc\"" all_pre.txt | more
The above works for me.
Edit:
findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc.*\"" all_pre.txt | more
This updated search string will now match these lines from your example:
# 1 "Build_flags.inc"
# 7 "Build_flags.inc"
# 11 "Build_flags.inc"
# 20 "Build_flags.inc"
# 45 "Build_flags.inc(function a called from b)"
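A pattern of this shape (here with `_` included in the character class and the dot escaped, as in the question's edit) can be sanity-checked against a few of the sample lines with Python's `re` module. This is only a sketch: findstr's regex dialect is more limited than Python's and has its own quoting rules, so agreement here does not guarantee identical findstr behaviour.

```python
import re

# Python translation of the findstr pattern; no shell quoting is needed here.
INC_LINE = re.compile(r'^# [0-9][0-9]* "[a-zA-Z0-9_]*\.inc')

sample = [
    '# 1 "main.ss"',
    '# 52 "main.ss"',
    '# 1 "Build_flags.inc"',
    '# 45 "Build_flags.inc(function a called from b)"',
]
inc_lines = [line for line in sample if INC_LINE.search(line)]
```

With this data, `inc_lines` contains the two `Build_flags.inc` lines and neither `main.ss` line. Without `_` in the character class, the `Build_flags` lines would not match either.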
Edit:
To circumvent this "bug" in findstr, you can put your search into a batch file like this:
@findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc" %1
Name it something like myfindstr.bat and call it like this:
myfindstr all_pre.txt | more
You can now use the pipe and redirection operators as usual.
Hope that helps.
I can't really explain the why, but from my experience although findstr behaviour with fixed strings (e.g. /c:"some string") is exactly as desired, regular expressions are a different beast. I routinely use the fixed string search function like so to extract lines from CSV files:
C:\> findstr /C:"literal string" filename.csv > output.csv
No issue there.
But using regular expressions (e.g. /R "^\"some string\"" ) appears to force the findstr output to console and can't be redirected via any means. I tried >, >>, 1> , 2> and all fail when using regular expressions.
My workaround for this is to use findstr as the secondary command. In my case I did this:
C:\> type filename.csv | findstr /R "^\"some string\"" > output.csv
That worked for me without issue directly from a command line, with a very complex regular expression string. In my case I only had to escape the " for it to work. Other characters such as , and . worked fine as literals in the expression without escaping.
I confirmed that the behaviour is the same on both Windows 2008 and Windows 7.
EDIT: Another variant also apparently works:
C:\> findstr /R "^\"some string\"" < filename.csv > output.csv
It's the same principle as using type, but it uses the command line's own input redirection to feed the file to findstr.
If you use a regex with an even number of double quotes, it works perfectly. But if the number of " characters is odd, redirection doesn't work. You can either complete your regex with a second quote (you can use a range for this purpose: [\"\"]), or replace your quote character with the dot metacharacter.
It looks like a cmd.exe issue, findstr is not guilty.
Here is what I found; it's related to redirection failing within a batch script when the number of double quotes is odd. Michael Yutsis had it right but didn't give an example, so I thought I would:
dataset:
"10/19/2022 20:02:06.057","99.526755039736002573"
"10/19/2022 20:02:07.061"," "
"10/19/2022 20:02:08.075","85.797437749585213851"
"10/19/2022 20:02:09.096","96.71306029796799919"
"10/19/2022 20:02:10.107","4.0273833029566628028"
I tried using the following to find just lines that had a fractional portion of a number at the end of each line.
findstr /r /c:"\.[0-9]*\"$" file1.txt > file2.txt
(a valid regex string surrounded by quotes that has one explicit double quote in it)
needed to become
findstr /r /c:"\"[0-9]*\.[0-9]*\"$" file1.txt > file2.txt
so it could identify the entire decimal (including the explicit quotes).
I tried just adding another double quote at the end of the string ($"" ) and the command worked and generated file2.txt, but it didn't match any lines in the file, so the extra trailing double quote becomes part of the regex string, I guess, and it doesn't match anything. Including the leading double quote around the full decimal was necessary, and fine for my needs.
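The intended pattern (a quoted decimal number at the end of the line) can be checked against the dataset above with Python's `re` module. This is a sketch of the regex logic only; it says nothing about cmd.exe's quote handling, which is the actual cause of the redirection problem, and the variable names are made up for illustration.

```python
import re

# Python form of the intended pattern: a quoted decimal number at the end of the line.
QUOTED_DECIMAL_AT_END = re.compile(r'"[0-9]*\.[0-9]*"$')

dataset = [
    '"10/19/2022 20:02:06.057","99.526755039736002573"',
    '"10/19/2022 20:02:07.061"," "',
    '"10/19/2022 20:02:08.075","85.797437749585213851"',
    '"10/19/2022 20:02:09.096","96.71306029796799919"',
    '"10/19/2022 20:02:10.107","4.0273833029566628028"',
]
kept = [row for row in dataset if QUOTED_DECIMAL_AT_END.search(row)]
```

Here `kept` holds four of the five rows; the row whose second field is `" "` is dropped because it has no decimal number before the closing quote.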

Resources