How to check if stdout of a program is in a file? - bash

I've attempted numerous times and tried different methods but cannot seem to get this to work. I am trying to run a python script and grep the output to see if it is contained in a file and if it is not I want to append it to said file.
$./scan_network.py 22 192.168.1.1 192.168.1.20 | if ! grep -q - ./results.log; then - >> results.log; fi
I understand that macOS grep does not understand - as a stand-in for stdin, and that then - >> would not work because it would not pick up stdout either. I am not sure what to do.
As stated before the primary goal is to check the output of the script against a file and if the IP address is not found in the file, it needs to be appended.
Edit:
results.log is currently an empty file. The output of scan_network.py would be 192.168.1.6 for now. When I go to run it on another network, the output would be numerous addresses in a range, for example 10.234.x.y, where x and y can be any number between 0 and 255.

One simple solution is to merge the log file and the output of the program into a new log file:
sort -u <(./scan_network.py 22 192.168.1.1 192.168.1.20) results.log > newresults.log
The -u flag causes duplicate lines to be removed from the output, so you will get only one of each line.
That has the side effect of reordering the lines (so that they are sorted alphabetically). It is possible to preserve order if necessary, but it gets more complicated.
With a reasonably modern GNU sort, you can use a "version number" sort via the -V flag, which does a reasonable job of keeping IP addresses in logical order. Alternatively, you can sort the octets individually with sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n ..., or you can just live with lexicographic ordering. Do not use plain -n for standard numeric sorting: numeric sort only examines the leading numeric prefix (here, the first octet), and since -u considers two lines that compare equal to be duplicates, you would get many false duplicates.
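If you do want to preserve the existing order and simply add new addresses at the end, a common awk idiom can deduplicate without sorting. A minimal sketch using the same file names as above (the !seen[$0]++ condition is true only the first time a given line appears):
awk '!seen[$0]++' results.log <(./scan_network.py 22 192.168.1.1 192.168.1.20) > newresults.log
Existing log lines are emitted first, in their original order, followed by any scan results not already present.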

If you don't mind sorting and rewriting your log file, rici's helpful answer works well (note that simply using -V for true per-component numerical IP-address sorting is not an option on macOS, unfortunately[1]).
Here's an alternative that only appends to the existing log file on demand, in-place, without reordering existing lines:
grep -f results.log -xFv <(./scan_network.py 22 192.168.1.1 192.168.1.20) >> results.log
Note: This assumes that ./scan_network.py's output is line-based; pipe to tr to transform to line-based output, if necessary.
-f treats each line in the specified file as a separate search term, where a match of any term is considered an overall match.
-x matches lines in full
-F performs literal matching (doesn't interpret search terms as regular expressions)
-v only outputs lines that do not match
The net effect is that only lines output by ./scan_network.py ... that aren't already present in results.log are appended to results.log.
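To see the net effect concretely, here is a hypothetical demo with invented file contents (printf stands in for the scanner's output):
printf '%s\n' 192.168.1.6 > results.log
printf '%s\n' 192.168.1.6 192.168.1.9 | grep -f results.log -xFv >> results.log
cat results.log   # 192.168.1.6 and 192.168.1.9, each exactly once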
Note, however, that performance will likely suffer the larger results.log becomes, so rici's approach may be preferable in the long run, particularly, if the log file keeps growing and/or you want the log sorted by IP addresses anyway.
As for what you've tried:
Both GNU and BSD/macOS grep optionally accept - as a placeholder for stdin as the file to search, but note that this operand is never needed, because grep reads input from stdin by default.
By contrast, only GNU grep accepts - as the option-argument to -f, i.e., the file containing the search terms to apply.
BSD/macOS requires either an explicit filename, a process substitution (as above), or, in a pinch, /dev/stdin to refer to stdin.
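For instance, this contrived command feeds the search term to -f via stdin while naming the file to search explicitly (haystack.txt is a hypothetical file); it works with both GNU and BSD/macOS grep:
echo 192.168.1.6 | grep -xF -f /dev/stdin haystack.txt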
The logic of your search must be reversed: as in the command above, the existing log file contents must serve as the search terms (passed to -f), and the ./scan_network.py ... output must serve as the input in order to determine which lines are not (-v) already in the log file.
Using - to represent stdin or stdout, depending on context, is a mere convention that only works as a command argument, so your attempt to refer to stdout output with if ...; then - >> results.log cannot work, because - is invariably interpreted as a command name there.
If you use grep -q, stdout output is by definition suppressed, so there's nothing to pass on (even if you used a pipe).
[1] macOS's (OS X's) sort does not support -V for per-component version-number sorting (which can be applied to IP addresses too). Even though the macOS sort is a GNU sort, it is an ancient one - v5.93 as of macOS 10.12 - that predates support for -V.

Assuming that your script returns a single line of text, you can store the output in a variable and then grep for that string. For example:
logfile="results.log"
# save output to a shell variable
str=$(./scan_network.py 22 192.168.1.1 192.168.1.20)
# don't call grep twice for the same pattern; -x restricts matches to whole
# lines, so e.g. 192.168.1.6 does not falsely match 192.168.1.60
found=$(grep -Fx "$str" "$logfile")
# append the script's output if grep found nothing
if [[ -z "$found" ]]; then
    echo "$str" >> "$logfile"
fi

Related

Can you pipe into the bash strings command?

The strings command is a handy tool to extract printable strings from binary input files.
I've used it with files plenty.
But what if I wish to stream to it?
A use case is grepping a stream of data that may be binary for specific strings.
I tried
data-source | strings -- - | grep needle
to see if the - would have it treat stdin as a file, but this doesn't work; strings just waits.
If you look at the help for strings:
Usage: strings [option(s)] [file(s)]
Display printable strings in [file(s)] (stdin by default)
You see that stdin is the default behavior if there are no arguments. By adding - the behavior seems to change, which is strange, but I was able to reproduce that result too.
So it seems the correct way to do what you want is:
data-source | strings | grep needle
In a comment, you asked: why not strings datasource | grep -o needle?
If you could arrange that command so datasource is a stream, it might work, but it's usually easier to arrange that using |.
For example, the below are roughly equivalent in zsh. You'd have to figure out a way to do it in your shell of choice if that's not zsh.
strings <(tail -f syslog) | grep msec
tail -f syslog | strings | grep msec
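As a concrete (illustrative) instance of the binary-stream use case, any binary can stand in for data-source; the binary and search term here are arbitrary:
cat /bin/ls | strings | grep -i usage
Here cat merely simulates a streaming source; the point is that strings consumes the stream directly, without needing a seekable file.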

How do I get rid of “--” line separator when using grep

I'm using the commands given below for splitting my fastq file into two separate paired end reads files:
grep '#.*/1' -A 3 24538_7#2.fq >24538_7#2_1.fq
grep '#.*/2' -A 3 24538_7#2.fq >24538_7#2_2.fq
But it's automatically introducing a -- line separator between the entries, making my fastq file inappropriate for further processing (because it then becomes an invalid fastq format).
So, I want to get rid of the line separator (--).
PS: I've found answers for Linux, but I'm using macOS, and those didn't work in the Mac terminal.
You can use the --no-group-separator option to suppress it (in GNU grep).
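For example, with the file from the question (GNU grep only):
grep --no-group-separator '#.*/1' -A 3 24538_7#2.fq > 24538_7#2_1.fq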
Alternatively, you could use (GNU) sed:
sed '\|#.*/1|,+3!d'
This deletes all lines other than those matching #.*/1 and the three lines following each match.
For macOS sed, you could use
sed -n '\|#.*/1|{N;N;N;p;}'
but this gets unwieldy quickly for more context lines.
Another approach would be to chain grep with itself:
grep '#.*/1' -A 3 file.fq | grep -v "^--"
The second grep uses -v to output only the lines that do not match, i.e., to drop lines that start with --. A bare -- pattern can be mistaken for the end-of-options marker, which would require some weird escaping like "[-][-]"; anchoring with ^ avoids that, which is why I put the ^ there.

Unix Epoch to date with sed

I want to convert unix epoch timestamps to normal dates.
I'm trying:
sed < file.json -e 's/\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/`date -r \1`/g'
any hint?
Given the lack of information in your post, I cannot give you a better answer than this, but it is possible to execute commands using sed!
You have two different ways to do it:
use the sed e command directly, followed by the command to be executed; if you do not pass a command to e, it will treat the content of the pattern space as the external command to run.
use a simple substitute command with sed and pipe the output to sh.
Example 1:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/;e"
Example 2:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/" | sh
Tests 1 and 2, run with the Japanese locale (LC_TIME=ja_JP.UTF-8), print the converted date formatted per that locale.
Remarks:
I will let you adapt the date command according to your system's specifications.
Since modern epoch timestamps are longer than 8 digits, the sed command uses an open-ended length specifier of at least 8, rather than exactly 8.
Allan has a nice way to tackle dynamic arguments: write a script dynamically and pipe it to a shell! It works. It tends to be a bit less secure, because you could potentially pipe unintended shell commands to sh - for example, if rm -f some-important-file were in the file along with the numbers, the sed pipeline wouldn't change that line, and it would be passed to sh along with the date commands. Obviously, this is only a concern if you don't control the input. But mistakes can happen.
A similar method I much prefer is with xargs. It's a bit of a head trip for new users, but very powerful. The idea behind xargs is that it takes its input from its standard in, then adds it to the command comprised of its own non-option arguments and runs the command(s). For instance,
$ echo -e "/tmp\n/usr/lib" | xargs ls -d
/tmp /usr/lib
It's a trivial example of course, but you can see more exactly how this works by adding an echo:
echo -e "/tmp\n/usr/lib" | xargs echo ls -d
ls -d /tmp /usr/lib
The input to xargs becomes the additional arguments to the command specified in xargs's own arguments. Read that twice if necessary, or better yet, fiddle with this powerful tool, and the light bulb should come on.
Here's how I would approach what you're doing. Of course I'm not sure if this is actually a logical thing to do in your case, but given the detail you went into in your question, it's the best I can do.
$ cat dates.txt
Dates:
1517363346
I can run a command like this:
$ sed -ne '/^[0-9]\{8,\}$/ p' < dates.txt | xargs -I % -n 1 date -d @%
Tue Jan 30 19:49:06 CST 2018
Makes sense, because I used the command echo -e "Dates:\n$(date +%s)" > dates.txt to make the file a few minutes before I wrote this post! Let's go through it together and I'll break down what I'm doing here.
For one thing, I'm running sed with -n. This tells it not to print lines by default. That makes this script work even if not every line has an 8+ digit "date" in it. I also added anchors to the start (^) and end ($) of the regex so the line must contain only the appropriate digits (I realize this may not be perfect for you, but without understanding its input, I can't do better). These are important changes if your file is not entirely comprised of date strings. Additionally, I am matching at least 8 characters, as modern date strings are going to be more like 10 characters long. Finally, I added a p command to sed. This tells it to print the matching lines, which is necessary because -n suppresses printing otherwise.
The next bit is the xargs itself. The sed will write a date string out to xargs's standard input. I set only a few options for xargs. By default, it will add the standard input to the end of the command, separated by a space. I didn't want a space, so I used -I to specify a replacement string. % doesn't have a special meaning; it's just a placeholder that gets replaced with the input. I used % because it's not a special character but is rarely used in commands. Finally, I added -n 1 to make sure only one input is used per execution of date. (xargs can also combine many inputs in a single command, as in my ls example above.)
The end result? Sed matches lines that consist, exclusively, of 8 or more numeric values, outputting the matching lines. The pipe then sends this output to xargs, which takes each line separately (-n 1) and, replacing the placeholder (-I %) with each match, then executes the date command.
This is a shell pattern I really like, and use every day, and with some clever tweaks, it can be very powerful. I encourage anyone who uses a Linux shell to get to know xargs right away.
There is another option for GNU sed users. While the BSD folks stayed pretty true to their old BSD unix roots, the GNU folks, who wrote their userspace from scratch, added many wonderful enhancements to the standards. GNU sed can run a subshell command for you and then do the replacement, which is dramatically easier (see Example 1 above). Since you are using the BSD-style date invocation, I'm going to assume you don't have GNU sed at your disposal.
Using sed: tested with macOS only
There is a slight difference with the date command, which on macOS uses the flag -r instead of -d. Note also that a command substitution like $(date -r \1) is expanded by the shell before sed ever runs, so \1 would never reach date; as in Example 2 above, have sed generate the command and pipe it to sh:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -r \1/" | sh
Results:
Thu May 28 05:14:38 JST 1970

Using both GNU Utils with Mac Utils in bash

I am working with plotting extremely large files with N number of relevant data entries. (N varies between files).
In each of these files, comments are automatically generated at the start and end of the file and would like to filter these out before recombining them into one grand data set.
Unfortunately, I am using macOS, where I encounter some issues when trying to remove the last line of the file. I have read that the most efficient way is to use the head/tail commands to cut off sections of data. Since head -n -1 does not work on macOS, I installed coreutils through Homebrew, where the ghead command works wonderfully. However, the command,
tail -n+9 $COUNTER/test.csv | ghead -n -1 $COUNTER/test.csv >> gfinal.csv
does not work. A less than pleasing workaround was to separate the commands: use ghead > newfile, then use tail on newfile > gfinal. Unfortunately, this takes a while, as I have to write a new file with the first ghead.
Is there a workaround to incorporating both GNU Utils with the standard Mac Utils?
Thanks,
Keven
The problem with your command is that you specify the file operand again for the ghead command, instead of letting it take its input from stdin via the pipe; this causes ghead to ignore stdin, so the first pipe segment is effectively ignored. Simply omit the file operand for the ghead command:
tail -n+9 "$COUNTER/test.csv" | ghead -n -1 >> gfinal.csv
That said, if you only want to drop the last line, there's no need for GNU head - OS X's own BSD sed will do:
tail -n +9 "$COUNTER/test.csv" | sed '$d' >> gfinal.csv
$ matches the last line, and d deletes it (meaning it won't be output).
Finally, as #ghoti points out in a comment, you could do it all using sed:
sed -n '9,$ {$!p;}' file
Option -n tells sed to only produce output when explicitly requested; 9,$ matches everything from line 9 through (,) the end of the file (the last line, $), and {$!p;} prints (p) every line in that range, except (!) the last ($).
I realize that your question is about using head and tail, but I'll answer as if you're interested in solving the original problem rather than figuring out how to use those particular tools to solve the problem. :)
One method using sed:
sed -e '1,8d;$d' inputfile
At this level of simplicity, GNU sed and BSD sed both work the same way. Our sed script says:
1,8d - delete lines 1 through 8,
$d - delete the last line.
If you decide to generate a sed script like this on-the-fly, beware of your quoting; you will have to escape the dollar sign if you put it in double quotes.
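For instance, the same script placed in double quotes needs the dollar sign escaped, or the shell will try to expand $d as a (probably empty) variable:
sed -e "1,8d;\$d" inputfile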
Another method using awk:
awk 'NR>9{print last} NR>1{last=$0}' inputfile
This works a bit differently in order to "recognize" the last line: each line is buffered and printed one record late (starting after line 8), so the final line never gets printed.
This awk solution is a bit of a hack, and like the sed solution, relies on the fact that you only want to strip ONE final line of the file.
If you want to strip more lines than one off the bottom of the file, you'd probably want to maintain an array that would function sort of as a buffered FIFO or sliding window.
awk -v striptop=8 -v stripbottom=3 '
  { last[NR] = $0 }                            # buffer every line by line number
  NR > striptop*2 { print last[NR-striptop] }  # print once safely past the top strip
  { delete last[NR-striptop] }                 # drop entries as they are printed
  # print the buffered tail in input order, minus the bottom strip
  # (assumes stripbottom <= striptop; the r > striptop guard protects short files)
  END { for (r = NR-striptop+1; r <= NR-stripbottom; r++) if (r > striptop) print last[r] }
' inputfile
You specify how much to strip in variables. The last array keeps the most recent lines in memory, prints from the oldest end once enough lines have gone by, and deletes entries as they are printed. The END block then walks the remaining buffer in input order and prints everything except the final stripbottom lines. (As written, this assumes stripbottom is no larger than striptop, since the buffer only ever holds striptop lines.)
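A quick way to sanity-check the script is to feed it numbered lines. If you save the quoted awk program above to a file, say strip.awk (an illustrative name), then with striptop=8 and stripbottom=3, 15 lines of input should yield lines 9 through 12:
seq 15 | awk -v striptop=8 -v stripbottom=3 -f strip.awk
# prints 9, 10, 11, 12 (one per line)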

Shell: Get a list of latest files by filename

I have files like
update-1.0.1.patch
update-1.0.2.patch
update-1.0.3.patch
update-1.0.4.patch
update-1.0.5.patch
And I have a variable that contains the last applied patch (e.g. update-1.0.3.patch). So, now I have to get a list of files to apply (in the example, updates 1.0.4 and 1.0.5). How can I get such a list?
To clarify, I need a method to get the list of files that come alphabetically after a given file, and this list must also be in alphabetical order (obviously it is not always possible to apply patch 1.0.5 before 1.0.4).
sed is your go-to guy for printing ranges of lines. You give it a starting and ending address or pattern, and a command to run in between.
ls update-1.0.* | sort | sed -ne "/$ENVVAR/,// p"
The sort probably isn't necessary because ls already sorts by name, but it might be good to include it as a courtesy to maintainers, to make the ordering requirement explicit. The -n to sed means "don't print every line automatically" and the -e means "I'm giving you a script on the command line". I used " to enclose the script so that $ENVVAR would be expanded. The ending address is an empty pattern (//), which reuses the previous regex; since $ENVVAR matches only once, the range runs through the end of the file. The p means "print the line".
Oh, and I just noticed you only want the ones later. There's probably a way to tell sed to start on the line after your address, but instead I'd pipe it through tail -n +2 to start on the second line.
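Putting the pieces together, a minimal sketch (the variable name LAST is illustrative and holds the last applied patch's file name):
LAST=update-1.0.3.patch
ls update-1.0.*.patch | sed -ne "/$LAST/,// p" | tail -n +2
With the files from the question, this prints update-1.0.4.patch and update-1.0.5.patch, in that order.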
