FINDSTR refuses to match EOL - windows

Given the following piped in text:
a master
a release
a release2
a some-release
Can someone please explain why
findstr /i /r /c:"a release$"
does not return line 2?
After several hours of reading everything imaginable about the Windows findstr command, it just doesn't seem possible to get the $ character to match the EOL. Note that using the /E switch instead of $ makes no difference. I am running Windows 7.
Can someone come up with any way to match just line 2 using standard Windows commands? I will resort to grep if necessary, but I can't believe there's no way to solve this natively.
Thanks!

You mention "piped" text. I just had this problem and was searching on Stack Overflow. The answer for me was that findstr /R and the echo command have some weird quirks with pipes in DOS (or cmd).
I was trying to match files ending in .jpg, so to test, my command was:
echo "myfile spaces in name.jpg" | findstr /i /r "\.jpg$"
But that wasn't working. Using GNU utilities, I found out that echo with a space before the pipe includes that SPACE in its output. Since I'm very used to UNIX-style echo and regular expressions, I did not expect the extra space after the filename in my echo test.
To fix, I added a " *" (space-star) in the regex after the jpg, (to match 0 or more spaces):
echo "myfile spaces in name.jpg" | findstr /i /r "\.jpg *$"
That worked great.**
Proof using GNU's octal dump command (space is 040 in octal):
c:\>echo "myfile.jpg" | od -cb
0000000 " m y f i l e . j p g " \r \n
042 155 171 146 151 154 145 056 152 160 147 042 040 015 012
Now if I remove the space before the "|" pipe, it goes away:
c:\>echo "myfile.jpg"| od -cb
0000000 " m y f i l e . j p g " \r \n
042 155 171 146 151 154 145 056 152 160 147 042 015 012
This shouldn't happen in text files in DOS, nor most commands filtered through a pipe, but it will happen if you do quick command line tests using echo like I did.
** Another weird quirk (at least on my Win10 version of findstr): it ignores double quotes near the end of the line.
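A minimal Python sketch of the same effect, using the re module as a stand-in for findstr's regex engine. That is an assumption: the two engines are not identical, so this only illustrates the trailing-space problem, not findstr itself. The quotes from the echo test are left out so that only the trailing space matters.

```python
import re

# What the pipe delivers for:  echo myfile.jpg | findstr ...
# The space typed before "|" ends up at the end of echo's output line.
piped_line = "myfile.jpg "

strict = re.compile(r"\.jpg$", re.IGNORECASE)      # analogous to /i /r "\.jpg$"
tolerant = re.compile(r"\.jpg *$", re.IGNORECASE)  # analogous to /i /r "\.jpg *$"

# "$" cannot match before the trailing space, so the strict pattern fails.
print(bool(strict.search(piped_line)))    # False
# " *" consumes zero or more trailing spaces before the end-of-line anchor.
print(bool(tolerant.search(piped_line)))  # True
```

The tolerant pattern still matches when there is no trailing space at all, which is why " *" is a safe addition for quick echo tests.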

If there are no invisible characters in the data, the most probable cause is the line termination. If the piped lines do not end with carriage return / line feed (0x0D 0x0A) characters, findstr will not match the end of the line where it should.
Try something like
sourceofdata | more | findstr /r /c:"a release$"
sourceofdata | find /v "" | findstr /r /c:"a release$"
Both find and more change the line endings. If it works, you have found the source of the problem.
If not, and if you have not read it yet, here you will find extensive documentation on how findstr can fail.
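If you would rather look at the raw bytes than guess, here is a small Python sketch that labels how each line ends. It is a diagnostic in the spirit of od, not part of findstr, and the function name describe_line_endings is made up for this example.

```python
def describe_line_endings(data: bytes) -> list[str]:
    """Label how each line in data ends: 'CRLF', 'LF', 'CR', or 'none' (no terminator)."""
    labels = []
    i, line_start = 0, 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            labels.append("CRLF")
            i += 2
        elif data[i:i + 1] == b"\r":
            labels.append("CR")
            i += 1
        elif data[i:i + 1] == b"\n":
            labels.append("LF")
            i += 1
        else:
            i += 1
            continue  # ordinary character, still inside the current line
        line_start = i  # a terminator was consumed; the next line starts here
    if line_start < len(data):
        labels.append("none")  # trailing text with no line terminator
    return labels

print(describe_line_endings(b"a master\r\na release\n"))  # ['CRLF', 'LF']
```

Called on sys.stdin.buffer.read() from a small script, it could sit at the end of the same pipe as sourceofdata; a result full of "LF" instead of "CRLF" would point at the line-ending problem.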

It won't work. /c says to treat the string literally, not as a regular expression. You can make it work with command line switches, though. Did you look up the reference before writing your command?

Related

Print a newline using awk on windows using unix tools for Windows

I installed unix tools for windows, but I cannot print a newline using awk. I've tried all the switches: \n, but nothing seems to work. Your help will be greatly appreciated.
My input (c:\temp\servers.txt):
123 0 1
234 1 1
This is my script:
for /F "tokens=1" %%A In (c:\temp\servers.txt) DO awk "$2 == "0" {print $6,\n }" input.txt
Windows is useless at accepting single quotes in command parameters - which means you need to put your awk commands inside double quotes on Windows, which means you can't use double quotes for print statements and strings - grrrr!
It is therefore easier to put your awk commands into a file called script.awk like this:
BEGIN{ORS="\r\n"}
$2==0{print $1}
That says... separate all output lines with Carriage Return + Linefeed (just how Windows likes things) and then, on any line where the second field is zero, print the first field.
Then do this:
awk -f script.awk c:\temp\servers.txt
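For anyone who wants to check the logic of that script without awk, here is a rough Python equivalent. It is a sketch: zero_second_field is a hypothetical name, the numeric test only approximates awk's comparison rules, and lines with fewer than two fields are simply skipped.

```python
def zero_second_field(lines, ors="\r\n"):
    r"""Rough Python equivalent of the awk script:
         BEGIN{ORS="\r\n"}
         $2==0{print $1}
    """
    out = []
    for line in lines:
        fields = line.split()  # awk's default FS splits on runs of blanks
        if len(fields) < 2:
            continue  # assumption: lines without a second field are skipped
        try:
            is_zero = float(fields[1]) == 0  # approximates awk's numeric comparison
        except ValueError:
            is_zero = False
        if is_zero:
            out.append(fields[0] + ors)  # print $1, then the output record separator
    return "".join(out)

print(repr(zero_second_field(["123 0 1", "234 1 1"])))  # '123\r\n'
```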
The easiest way to print a newline is just to use print "". The Output Record Separator ORS (which is a newline by default) will be appended to the empty string.
If you already have a print statement and want to add an additional newline to the output, you can use the ORS variable yourself, for example print $6 ORS.
This approach is more portable than hard-coding the characters "\r\n" or "\n" in your script, as the default value of the ORS should just work for the system that you are running on (i.e. it should already be "\r\n" on Windows and "\n" on most other systems).
An example of printing a string that contains embedded newlines (the '\n' is an escape sequence used to represent the newline character):
$ awk 'BEGIN { print "line one\nline two\nline three" }'
-| line one
-| line two
-| line three
Maybe this article will help you out a bit more:
https://www.gnu.org/software/gawk/manual/html_node/Print-Examples.html
There are multiple solutions to this problem; you can try them and see which one suits you best.
In Windows you have to use double quotes for the command line script, and you have to escape embedded double quotes with a backslash like this:
C:\Users\ReluctantBiosGuy>gawk "BEGIN{print \"line one\nline two\nline three\"}"
line one
line two
line three
C:\Users\ReluctantBiosGuy>
For your original script, I don't see why you are trying to use cmd.exe's FOR command, nor is it clear what you really want to print. However, the following should give you enough info to do what you really want to do:
C:\Users\ReluctantBiosGuy>type x
123 4 5
132 0 4
153 0 33
384 3 88
C:\Users\ReluctantBiosGuy>gawk "$2==0{printf(\"Token 1 is %s\n\n\",$1)}" x
Token 1 is 132
Token 1 is 153
C:\Users\ReluctantBiosGuy>

Unattended wget security implications

I've written a simple script utilizing wget to collect specific filetypes (.png) from a website. While this works pretty well I'm somewhat concerned about possible security risks.
As it stands, wget will just download everything with the .png extension; theoretically, the website could contain malicious or junk files that have been renamed.
Is there a way to do some filtering before wget downloads? The files I'm looking to download always share some characteristics that could be used to identify them (PNG image data, 200 x 300, 8-bit/color RGB, non-interlaced; size between 80 and 120 kB).
Can --spider be used to at least sort out the files by size before downloading anything? If so, I'd appreciate any help with that!
This could probably be done after downloading using file plus some other commands, but I'd like to avoid grabbing bad data in the first place - any way to do this? Or alternatives that can do something like that?
Thanks for your input!
PNG files have an 8-byte header (the PNG signature) which contains the following bytes:
137 A byte with its most significant bit set ("8-bit character")
80 P
78 N
71 G
13 Carriage-return (CR) character, a.k.a. CTRL-M or ^M
10 Line-feed (LF) character, a.k.a. CTRL-J or ^J
26 CTRL-Z or ^Z
10 Line-feed (LF) character, a.k.a. CTRL-J or ^J
So if you feed the first 8 bytes into od you should see something like this:
$ head -c 8 knox.png | od -c
0000000 211 P N G \r \n 032 \n
I think that gives you the basis of a pretty good test.
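The same test in Python, as a sketch for checking files once you have their first bytes. The helper names are made up for this example; the signature bytes are exactly the ones in the table above.

```python
# The 8 signature bytes from the table above: 137 P N G CR LF ^Z LF
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def looks_like_png(first_bytes: bytes) -> bool:
    """True if the data starts with the PNG signature."""
    return first_bytes[:8] == PNG_SIGNATURE

def file_has_png_signature(path: str) -> bool:
    """Read only the first 8 bytes of a local file and compare them."""
    with open(path, "rb") as f:
        return looks_like_png(f.read(8))

print(looks_like_png(b"\x89PNG\r\n\x1a\n...rest of file"))  # True
print(looks_like_png(b"<html>not a png</html>"))            # False
```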
I don't think there is any way to limit wget to partially download a file, but you can do it in curl e.g.:
curl -s -r 0-8 "http://www.fnordware.com/superpng/pnggrad8rgb.png" | od -c
0000000 211 P N G \r \n 032 \n \0
0000011
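Going one step further (my extension, not something tested in this answer): the PNG specification puts the IHDR chunk right after the signature, so the first 29 bytes (curl -r 0-28) already contain the width, height, bit depth, color type and interlace method. A Python sketch that checks them against the characteristics from the question; parse_ihdr and matches_expected are hypothetical names, and color type 2 is the specification's value for truecolor RGB.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def parse_ihdr(head: bytes):
    """Return (width, height, bit_depth, color_type, interlace), or None if head
    is too short or does not start with a PNG signature followed by IHDR."""
    if len(head) < 29 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return None
    # Bytes 16..28: width, height (4-byte big-endian each), then five 1-byte fields.
    width, height, bit_depth, color_type, _compression, _filter, interlace = struct.unpack(
        ">IIBBBBB", head[16:29]
    )
    return width, height, bit_depth, color_type, interlace

def matches_expected(head: bytes) -> bool:
    """Assumed criteria from the question: 200 x 300, 8 bits per channel,
    color type 2 (RGB), interlace method 0 (non-interlaced)."""
    return parse_ihdr(head) == (200, 300, 8, 2, 0)

# A synthetic 29-byte header, laid out the way a real IHDR is.
fake = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">IIBBBBB", 200, 300, 8, 2, 0, 0, 0)
print(parse_ihdr(fake), matches_expected(fake))  # (200, 300, 8, 2, 0) True
```

The size range from the question (80 to 120 kB) is not covered by this check.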

How to disable findstr from sorting output?

I'm piping output from a command to findstr to extract certain lines. Here's my code:
example_command.exe | findstr /C:"string_D  " /C:"string_B  " /C:"string_C  " /C:"string_A  "
Yes, there are two spaces after the string text. I expected the output to be:
string_D
string_B
string_C
string_A
However, I'm getting:
string_A
string_B
string_C
string_D
findstr appears to be sorting the output alphabetically. Can that be disabled? I'd like it to output in the same order I entered it.
I want to do this with standard Windows 7 commands so I can easily distribute it in batch files.
I can separate the strings and run example_command.exe four times but that takes four times as long.
Is this another undocumented feature of findstr?
While it's pretty much running example_command.exe multiple times, this should give you the output you're looking for.
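example_command.exe | findstr /C:"string_D  " & example_command.exe | findstr /C:"string_B  " & example_command.exe | findstr /C:"string_C  " & example_command.exe | findstr /C:"string_A  "
Using & instead of && makes every search run even if an earlier one finds nothing (findstr exits with a non-zero code when there is no match, which would stop a && chain). However, as you said, it will take four times as long.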
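findstr does not sort anything: like other line filters, it prints matching lines in the order they appear in its input, so the output order above presumably reflects the order in which example_command.exe writes them. If running the command four times is too slow, one alternative is to capture the output once and then pull lines out in the order you want. A Python sketch of that idea; grep_in_pattern_order is a hypothetical helper, and a line matching several patterns is returned once per pattern.

```python
def grep_in_pattern_order(lines, patterns):
    """Return matching lines grouped by pattern, in the order the patterns are given.

    Each pattern is a case-sensitive literal substring, like findstr /C:.
    A line that contains several patterns appears once per matching pattern.
    """
    result = []
    for pattern in patterns:
        result.extend(line for line in lines if pattern in line)
    return result

captured = ["string_A  1", "string_B  2", "noise", "string_C  3", "string_D  4"]
wanted = ["string_D  ", "string_B  ", "string_C  ", "string_A  "]
for line in grep_in_pattern_order(captured, wanted):
    print(line)  # string_D  4, then string_B  2, string_C  3, string_A  1
```

In a real script, captured could come from subprocess.run(["example_command.exe"], capture_output=True, text=True).stdout.splitlines(), which runs the command only once.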

How to fix inconsistent line endings for whole VS solution?

Visual Studio will detect inconsistent line endings when opening a file and there is an option to fix it for that specific file. However, if I want to fix line endings for all files in a solution, how do I do that?
Just for a more complete answer, this worked best for me:
Replace
(?<!\r)\n
with
\r\n
in entire solution with "regEx" option.
This will set the correct line ending in all files that didn't have it so far. It uses a negative lookbehind to check that there is no \r in front of the \n.
Be careful with the other solutions: They will either modify all lines in all files (ignoring the original line ending) or delete the last character of each line.
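The same lookbehind works in Python's re module, which is handy for checking what the pattern does on sample text before running it on a whole solution. A sketch; normalize_to_crlf is a hypothetical name, and when applying it to real files, open them with newline="" so Python does not translate line endings on its own.

```python
import re

# Same pattern as the Visual Studio search: a \n that is NOT preceded by \r.
BARE_LF = re.compile(r"(?<!\r)\n")

def normalize_to_crlf(text: str) -> str:
    """Turn every bare LF into CRLF and leave existing CRLFs alone."""
    return BARE_LF.sub("\r\n", text)

print(repr(normalize_to_crlf("a\nb\r\nc\n")))  # 'a\r\nb\r\nc\r\n'
```

Because existing CRLFs are never matched, running it twice gives the same result as running it once.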
You can use the Replace in Files command with regular expressions enabled. For example, to replace end-of-lines that have a single linefeed "\n" (for example, files from GitHub) with the standard Windows carriage-return linefeed "\r\n", search for:
([^\r]|^)\n
This says to create a group (that's why the parentheses are required), where the first character is either not a carriage-return or is the beginning of a line. The beginning of the line test is really only for the very beginning of the file, if it happens to start with a "\n". Following the group is a newline. So, you will match ";\n" which has the wrong end-of-line, but not "\r\n" which is the correct end-of-line.
And replace it with:
$1\r\n
This says to keep the group ($1) and then replace the "\n" with "\r\n".
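Python's re supports the same construct, with \1 in place of Visual Studio's $1, so the replacement can be tried out on sample strings. A sketch (fix_line_endings_grouped is a made-up name). One thing it shows: [^\r] also matches \n, so in a run of blank lines a \n can be captured as the group and left bare after a single pass. I'd expect the .NET engine used by Visual Studio to behave the same way, but that is an assumption I have not verified; the (?<!\r)\n form does not have this problem.

```python
import re

# ([^\r]|^) captures the character before the \n (anything except \r),
# or matches empty at the start of a line; MULTILINE makes ^ work per line.
GROUPED = re.compile(r"([^\r]|^)\n", re.MULTILINE)

def fix_line_endings_grouped(text: str) -> str:
    # \1 restores the captured character, then the bare \n becomes \r\n.
    return GROUPED.sub(r"\1\r\n", text)

print(repr(fix_line_endings_grouped("a;\nb\r\n")))  # 'a;\r\nb\r\n'
print(repr(fix_line_endings_grouped("a\n\n\nb")))   # 'a\r\n\n\r\nb' (one bare \n survives)
```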
Try doing
Edit > Advanced > Format Document
Then save the document. As long as the file doesn't get modified by another external editor, it should stay consistent. This fixes it for me.
If you have Cygwin with the cygutils package installed, you can use this chain of commands from the Cygwin shell:
unix2dos -idu *.cpp | sed -e 's/ 0 [1-9][0-9]* //' -e 's/ [1-9][0-9]* 0 //' | sed '/ [1-9][0-9]* / !d' | sed -e 's/ [1-9][0-9 ]* //' | xargs unix2dos
(Replace the *.cpp with whatever wildcard you need)
To understand how this works, the unix2dos command is used to convert the files, but only files that have inconsistent line endings (i.e., a mixture of UNIX and DOS) need to be converted. The -idu option displays the number of dos and unix line endings in the file. For example:
0 491 Acad2S5kDim.cpp
689 0 Acad2S5kPolyline.cpp
0 120 Acad2S5kRaster.cpp
433 12 Acad2S5kXhat.cpp
0 115 AppAuditInfo.cpp
Here, only the Acad2S5kXhat.cpp file needs to be converted. The sed commands filter the output to produce a list of just the files that need to be converted, and these are then processed via xargs.
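If Cygwin isn't available, the same selection logic (count CRLF and bare LF endings, keep only the files that have both) can be sketched in Python. The function names are hypothetical, and the actual conversion is left to unix2dos or an editor.

```python
from pathlib import Path

def count_line_endings(data: bytes) -> tuple[int, int]:
    """Return (dos, unix): how many CRLF and how many bare LF line endings data has."""
    dos = data.count(b"\r\n")
    unix = data.count(b"\n") - dos  # every CRLF contains exactly one LF
    return dos, unix

def files_with_mixed_endings(folder: str, pattern: str = "*.cpp") -> list[str]:
    """List files matching pattern whose line endings mix DOS and UNIX styles."""
    mixed = []
    for path in sorted(Path(folder).glob(pattern)):
        dos, unix = count_line_endings(path.read_bytes())
        if dos and unix:
            mixed.append(str(path))
    return mixed

print(count_line_endings(b"a\r\nb\nc\r\n"))  # (2, 1)
```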

piping findstr's output

On the Windows command line, I want to search a file for all rows starting with:
# NNN "<file>.inc"
where NNN is a number and <file> any string.
I want to use findstr, because I cannot require that the users of the script install ack.
Here is the expression I came up with:
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*.inc" all_pre.txt
The file to search is all_pre.txt.
So far so good. Now I want to pipe that to another command, say for example more.
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9]*.inc" all_pre.txt | more
The result of this is the same output as the previous command, but with the file name as prefix for every row (all_pre.txt).
Then comes:
FINDSTR: cannot open |
FINDSTR: cannot open more
Why doesn't the pipe work?
A snippet of the content of all_pre.txt:
# 1 "main.ss"
# 7 "main.ss"
# 11 "main.ss"
# 52 "main.ss"
# 1 "Build_flags.inc"
# 7 "Build_flags.inc"
# 11 "Build_flags.inc"
# 20 "Build_flags.inc"
# 45 "Build_flags.inc(function a called from b)"
EDIT: I also need to escape the dot in the regex. Not the issue, but worth mentioning.
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc" all_pre.txt
EDIT after Frank Bollack:
>findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc.*" all_pre.txt | more
is not working, although (I think) it should look for the same string as before followed by any character any number of times. That must include the ", right?
You are missing a trailing \" in your search pattern.
findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9]*.inc\"" all_pre.txt | more
The above works for me.
Edit:
findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9]*\.inc.*\"" all_pre.txt | more
This updated search string will now match these lines from your example:
# 1 "Build_flags.inc"
# 7 "Build_flags.inc"
# 11 "Build_flags.inc"
# 20 "Build_flags.inc"
# 45 "Build_flags.inc(function a called from b)"
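For comparison, here is the question's pattern in Python's re syntax, tried on lines from the sample. A sketch only: "+" stands in for findstr's "[0-9][0-9]*", the dot before inc is escaped, and "_" is allowed in the file name so that Build_flags matches.

```python
import re

INC_LINE = re.compile(r'^# [0-9]+ "[A-Za-z0-9_]*\.inc')

sample = [
    '# 1 "main.ss"',
    '# 52 "main.ss"',
    '# 1 "Build_flags.inc"',
    '# 45 "Build_flags.inc(function a called from b)"',
]

for line in sample:
    if INC_LINE.search(line):
        print(line)  # only the two Build_flags.inc lines
```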
Edit:
To circumvent this "bug" in findstr, you can put your search into a batch file like this:
@findstr /r /c:"^# [0-9][0-9]* \"[a-zA-Z0-9_]*\.inc" %1
Name it something like myfindstr.bat and call it like this:
myfindstr all_pre.txt | more
You can now use the pipe and redirection operators as usual.
Hope that helps.
I can't really explain why, but in my experience, although findstr's behaviour with fixed strings (e.g. /c:"some string") is exactly as desired, regular expressions are a different beast. I routinely use the fixed-string search like this to extract lines from CSV files:
C:\> findstr /C:"literal string" filename.csv > output.csv
No issue there.
But using regular expressions (e.g. /R "^\"some string\"") appears to force the findstr output to the console, where it can't be redirected by any means. I tried >, >>, 1> and 2>, and all of them fail when using regular expressions.
My workaround for this is to use findstr as the secondary command. In my case I did this:
C:\> type filename.csv | findstr /R "^\"some string\"" > output.csv
That worked for me without issue directly from a command line, with a very complex regular expression string. In my case I only had to escape the " for it to work. Other characters such as , and . worked without escaping (an unescaped . actually matches any character, which includes a literal period).
I confirmed that the behaviour is the same on both Windows 2008 and Windows 7.
EDIT: Another variant also apparently works:
C:\> findstr /R "^\"some string\"" < filename.csv > output.csv
It's the same principle as using type, but it uses cmd's input redirection instead of a pipe.
If you use a regex with an even number of double quotes, it works perfectly. But if the number of " characters is odd, redirection doesn't work. You can either complete your regex with a second quote (you can use a character class for this purpose: [\"\"]), or replace your quote character with the dot metacharacter.
It looks like a cmd.exe issue; findstr is not to blame.
Here is what I found; it's related to the odd number of double quotes not redirecting from within a batch script. Michael Yutsis had it right but didn't give an example, so I thought I would:
dataset:
"10/19/2022 20:02:06.057","99.526755039736002573"
"10/19/2022 20:02:07.061"," "
"10/19/2022 20:02:08.075","85.797437749585213851"
"10/19/2022 20:02:09.096","96.71306029796799919"
"10/19/2022 20:02:10.107","4.0273833029566628028"
I tried using the following to find just lines that had a fractional portion of a number at the end of each line.
findstr /r /c:"\.[0-9]*\"$" file1.txt > file2.txt
(a valid regex string surrounded by quotes that has one explicit double quote in it)
needed to become
findstr /r /c:"\"[0-9]*\.[0-9]*\"$"" file1.txt > file2.txt
so it could identify the entire decimal (including the explicit quotes).
I tried just adding another double quote at the end of the string ($""), and the command worked and generated file2.txt, but it didn't match any lines in the file. I guess the extra trailing double quote becomes part of the regex string, so nothing matches. Including the leading double quote around the full decimal was necessary, and fine for my needs.
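For comparison, the same filter in Python, as a sketch rather than a findstr replacement. Because the pattern lives in a Python string instead of on a cmd.exe command line, the number of double quotes in it has no effect on redirection.

```python
import re

# The last field must be a quoted decimal: quote, digits, dot, digits, quote, end of line.
TRAILING_DECIMAL = re.compile(r'"[0-9]*\.[0-9]*"$')

rows = [
    '"10/19/2022 20:02:06.057","99.526755039736002573"',
    '"10/19/2022 20:02:07.061"," "',
    '"10/19/2022 20:02:08.075","85.797437749585213851"',
    '"10/19/2022 20:02:09.096","96.71306029796799919"',
    '"10/19/2022 20:02:10.107","4.0273833029566628028"',
]

kept = [row for row in rows if TRAILING_DECIMAL.search(row)]
for row in kept:
    print(row)  # every row except the one ending in " "
```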
