The strings command is a handy tool to extract printable strings from binary input files.
I've used it with files plenty.
But what if I wish to stream to it?
A use case is grepping a stream of data that may be binary for specific strings.
I tried
data-source | strings -- - | grep needle
to see if the - would make it treat stdin as a file, but this doesn't work; strings just waits.
If you look at the help for strings:
Usage: strings [option(s)] [file(s)]
Display printable strings in [file(s)] (stdin by default)
You see that stdin is the default behavior if there are no arguments. By adding - the behavior seems to change, which is strange, but I was able to reproduce that result too.
So it seems the correct way to do what you want is:
data-source | strings | grep needle
In a comment, you asked why not strings datasource |grep -o needle ?
If you could arrange that command so datasource is a stream, it might work, but it's usually easier to arrange that using |.
For example, the below are roughly equivalent in zsh. You'd have to figure out a way to do it in your shell of choice if that's not zsh.
strings <(tail -f syslog) | grep msec
tail -f syslog | strings | grep msec
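A quick way to check the stdin behavior without a real data source is to feed strings a few bytes from printf. This is only a sketch: the byte sequence is made up for illustration, and it assumes a strings implementation (such as the one in GNU binutils) with its default minimum string length of 4.

```shell
# Hypothetical binary-ish stream: NUL and control bytes around one printable run.
# The two-character runs "ab" and "xy" are below the default minimum length.
found=$(printf 'ab\000\001needle-here\000\002xy' | strings | grep needle)
echo "$found"
```

With no file operand, strings reads the pipe directly, so grep only ever sees the printable runs.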
Related
There are some great answers that address removing null characters from a file- it seems like sed is probably the most effective way. However, all the other questions I have been able to find are concerned not with finding null characters but removing them.
There are certain questions that do provide valid solutions- however, I am having difficulty finding a POSIX-compliant solution that does not rely on GNUisms. The two solutions I've seen that work use cat with the -v option and grep with the -P option (neither of which shall be supported).
I make it a habit to delegate as much as possible to the shell, but the shell can't help me here because it is not possible to store a null character in a variable. External tools are the only option, but I can't even find a way with them when I adhere to POSIX-compliant options.
One possible way would be tr and wc:
[ "$(tr -cd '\0' < file | wc -c)" -gt 0 ]
Alternatively, od and grep will allow stopping on the first one without reading the rest of the file:
od -A n -t x1 file | grep -q 00
Use tr -d '\000' to strip nulls into a temporary file, and use wc -c to get the number of characters in each file. If the temporary file doesn't match the original (cmp -s), the original contained nulls, and the difference between the two wc counts gives the number of nulls -- the point of the question.
p.s.: grep -P isn't POSIX either. Nor is the -C option found in POSIX.
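The temporary-file approach can be sketched as a small function. The name count_nuls is made up for illustration, and mktemp is an assumption here: it is widely available but not specified by POSIX, so substitute your own temporary-file naming if that matters.

```shell
count_nuls() {
    file=$1
    tmp=$(mktemp) || return 1            # assumption: mktemp exists (not POSIX)
    tr -d '\000' < "$file" > "$tmp"      # copy of the file with NULs removed
    if cmp -s "$file" "$tmp"; then
        n=0                              # identical, so there were no NULs
    else
        # every removed byte was a NUL, so the size difference is the count
        n=$(( $(wc -c < "$file") - $(wc -c < "$tmp") ))
    fi
    rm -f "$tmp"
    echo "$n"
}
```

Usage would be something like count_nuls somefile, which prints 0 when the file contains no nulls.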
I wanna change unix epoch to normal date
i'm trying:
sed < file.json -e 's/\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/`date -r \1`/g'
any hint?
With the lack of information from your post, I can not give you a better answer than this but it is possible to execute commands using sed!
You have different ways to do it you can use
directly sed e instruction followed by the command to be
executed, if you do not pass a command to e then it will treat the content of the pattern buffer as external command.
use a simple substitute command with sed and pipe the output to sh
Example 1:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/;e"
Example 2:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/" |sh
Test 1 (with Japanese locale LC_TIME=ja_JP.UTF-8):
Test 2 (with Japanese locale LC_TIME=ja_JP.UTF-8):
Remarks:
I will let you adapt the date command accordingly to your system specifications
Since modern dates are longer than 8 characters, the sed command uses an
open ended length specifier of at least 8, rather than exactly 8.
Allan has a nice way to tackle dynamic arguments: write a script dynamically and pipe it to a shell! It works. It tends to be a bit more insecure because you could potentially pipe unintentional shell components to sh. For example, if rm -f some-important-file was in the file along with the numbers, the sed pipeline wouldn't change that line, and it would also be passed to sh along with the date commands. Obviously, this is only a concern if you don't control the input. But mistakes can happen.
A similar method I much prefer is with xargs. It's a bit of a head trip for new users, but very powerful. The idea behind xargs is that it takes its input from its standard in, then adds it to the command comprised of its own non-option arguments and runs the command(s). For instance,
$ echo -e "/tmp\n/usr/lib" | xargs ls -d
/tmp /usr/lib
It's a trivial example of course, but you can see more exactly how this works by adding an echo:
echo -e "/tmp\n/usr/lib" | xargs echo ls -d
ls -d /tmp /usr/lib
The input to xargs becomes the additional arguments to the command specified in xargs's own arguments. Read that twice if necessary, or better yet, fiddle with this powerful tool, and the light bulb should come on.
Here's how I would approach what you're doing. Of course I'm not sure if this is actually a logical thing to do in your case, but given the detail you went into in your question, it's the best I can do.
$ cat dates.txt
Dates:
1517363346
I can run a command like this:
$ sed -ne '/^[0-9]\{8,\}$/ p' < dates.txt | xargs -I % -n 1 date -d @%
Tue Jan 30 19:49:06 CST 2018
Makes sense, because I used the command echo -e "Dates:\n`date +%s`" > dates.txt to make the file a few minutes before I wrote this post! Let's go through it together and I'll break down what I'm doing here.
For one thing, I'm running sed with -n. This tells it not to print the lines by default. That makes this script work if not every line has an 8+ digit "date" in it. I also added anchors to the start (^) and end ($) of the regex so the line had only the appropriate digits (I realize this may not be perfect for you, but without understanding your input, I can't do better). These are important changes if your file is not entirely comprised of date strings. Additionally, I am matching at least 8 characters, as modern date strings are going to be more like 10 characters long. Finally, I added a command p to sed. This tells it to print the matching lines, which is necessary because I specifically said not to print the nonmatching lines.
The next bit is the xargs itself. The sed will write a date string out to xargs's standard input. I set only a few settings for xargs. By default it will add the standard input to the end of the command, separated by a space. I didn't want a space, so I used -I to specify a replacement string. % doesn't have a special meaning; it's just a placeholder that gets replaced with the input. I used % because it's not a special character but is rarely used in commands. Finally, I added -n 1 to make sure only 1 input was used per execution of date. (xargs can also add many inputs together, as in my ls example above.)
The end result? Sed matches lines that consist, exclusively, of 8 or more numeric values, outputting the matching lines. The pipe then sends this output to xargs, which takes each line separately (-n 1) and, replacing the placeholder (-I %) with each match, then executes the date command.
This is a shell pattern I really like, and use every day, and with some clever tweaks, can be very powerful. I encourage anyone who uses linux shell to get to know xargs right away.
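As a reproducible variant of the same pattern, here is a sketch that pins the time zone and output format so the result doesn't depend on the machine's locale. It assumes a date that accepts -d @EPOCH (GNU date does); the function name epochs_to_dates is made up for illustration. The placeholder is {} rather than %, because xargs would otherwise also replace the % characters inside the date format string.

```shell
epochs_to_dates() {
    # Keep only lines made up entirely of 8 or more digits,
    # then run date once per surviving line.
    sed -n '/^[0-9]\{8,\}$/p' |
        xargs -I {} date -u -d @{} '+%Y-%m-%dT%H:%M:%SZ'
}

printf 'Dates:\n1517363346\n' | epochs_to_dates
```

The non-numeric "Dates:" line never reaches xargs, so date is only ever invoked with digits.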
There is another option for GNU sed users. While the BSD land folks were pretty true to their old BSD unix roots, the GNU folks, who wrote their userspace from scratch, added many wonderful enhancements to the standards. GNU Sed can apparently run a subshell command for you and then do the replacement for you, which would be dramatically easier. Since you are using the bsd style date invocation, I'm going to assume you don't have gnu sed at your disposal.
Using sed: tested with macOs only
There is a slight difference in the date command: on macOS the epoch is passed with the -r flag instead of -d.
echo 12687278 | sed "s/\([0-9]\{8,\}\)/$(date -r \1)/g"
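Results:
Thu Jan 1 09:00:01 JST 1970
Note that the shell expands $(date -r \1) before sed ever runs, so date receives the literal argument 1. That is why the result is one second past the epoch rather than the date for 12687278; the substitution has to be run after sed has matched, as in the sed e and xargs approaches above.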
I've attempted numerous times and tried different methods but cannot seem to get this to work. I am trying to run a python script and grep the output to see if it is contained in a file and if it is not I want to append it to said file.
$./scan_network.py 22 192.168.1.1 192.168.1.20 | if ! grep -q - ./results.log; then - >> results.log; fi
I understand that macOS grep does not understand - as stdout and that then - >> would not work because it would not pick up stdout either. I am not sure what to do.
As stated before the primary goal is to check the output of the script against a file and if the IP address is not found in the file, it needs to be appended.
Edit:
results.log is currently an empty file. The output of scan_network.py would be 192.168.1.6 for now. When I run it on another network, the output would be numerous addresses in a range, for example 10.234.x.y, where x and y would be any number between 0 and 255.
One simple solution is to merge the log file and the output of the program into a new log file:
sort -u <(./scan_network.py 22 192.168.1.1 192.168.1.20) results.log > newresults.log
The -u flag causes duplicate lines to be removed from the output, so you will get only one of each line.
That has the side effect of reordering the lines (so that they are sorted alphabetically). It is possible to preserve order if necessary, but it gets more complicated.
With a reasonably modern gnu sort, you can use a "version number" sort, which will do a reasonable job of keeping IP numbers in logical order; you can use the -V flag to do that. Or you can sort the octets individually with sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n .... Or you can just live with lexicographic ordering. Do not just use -n for standard numeric sorting, because it will only examine the first octet, and that will have an unfortunate interaction with the -u option, since two lines which compare equal are considered duplicates. As numeric sort only considers the numeric prefix, there will be many false duplicates.
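To make the per-octet variant concrete, here is a small sketch with made-up addresses; sort_ips is just an illustrative name. Because every octet is its own numeric key, -u only discards lines whose four octets are all equal.

```shell
sort_ips() {
    # Split on '.', compare each of the four fields numerically, drop duplicates.
    sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n
}

printf '%s\n' 10.10.0.1 10.2.0.1 9.1.1.1 10.2.0.1 | sort_ips
```

Here 10.2.0.1 sorts before 10.10.0.1, which plain lexicographic ordering would get the other way round.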
If you don't mind sorting and rewriting your log file, rici's helpful answer works well (note that simply using -V for true per-component numerical IP-address sorting is not an option on macOS, unfortunately) [1].
Here's an alternative that only appends to the existing log file on demand, in-place, without reordering existing lines:
grep -f results.log -xFv <(./scan_network.py 22 192.168.1.1 192.168.1.20) >> results.log
Note: This assumes that ./scan_network.py's output is line-based; pipe to tr to transform to line-based output, if necessary.
-f treats each line in the specified file as a separate search term, where a match of any term is considered an overall match.
-x matches lines in full
-F performs literal matching (doesn't interpret search terms as regular expressions)
-v only outputs lines that do not match
The net effect is that only lines output by ./scan_network.py ... that aren't already present in results.log are appended to results.log.
Note, however, that performance will likely suffer the larger results.log becomes, so rici's approach may be preferable in the long run, particularly, if the log file keeps growing and/or you want the log sorted by IP addresses anyway.
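The same idea can be tried out without the scanner by wrapping the grep call in a function that reads candidate lines from stdin. This is a sketch only: append_new is a made-up name, the addresses are sample data, and mktemp is used just to get a scratch log file.

```shell
append_new() {
    log=$1
    # The existing log lines act as literal, whole-line patterns (-F -x -f);
    # -v keeps only the stdin lines that match none of them.
    # grep exits with status 1 when there is nothing new to append.
    grep -xFv -f "$log" >> "$log"
}

log=$(mktemp)
printf '192.168.1.6\n' > "$log"
printf '192.168.1.6\n192.168.1.7\n' | append_new "$log"
cat "$log"
```

After the call the log holds each address once, with the existing line left in place and the new one appended at the end.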
As for what you've tried:
Both GNU and BSD/macOS grep optionally accept - as a placeholder for stdin to accept the input from, but note that this operand is never needed, because grep reads input from stdin by default.
By contrast, only GNU grep accepts - as the option-argument to -f, i.e., the file containing the search terms to apply.
BSD/macOS requires either an explicit filename, a process substitution (as above), or, in a pinch, /dev/stdin to refer to stdin.
The logic of your search must be reversed: as in the command above, the existing log file contents must serve as the search terms (passed to -f), and the ./scan_network.py ... output must serve as the input in order to determine which lines are not (-v) already in the log file.
Using - to represent stdin or stdout, depending on context, is a mere convention that only works as a command argument, so your attempt to refer to stdout output with if ...; then - >> results.log cannot work, because - is invariably interpreted as a command name.
If you use grep -q, stdout output is by definition suppressed, so there's nothing to pass on (even if you used a pipe).
[1] macOS's (OS X's) sort does not support -V for per-component version-number sorting (which can be applied to IP addresses too). Even though the macOS sort is a GNU sort, it is an ancient one - v5.93 as of macOS 10.12 - that predates support for -V.
Assuming that your script returns a single line of text, you can store the output in a variable and then grep for that string. For example:
logfile="results.log"
# save output to a shell variable
str=$(./scan_network.py 22 192.168.1.1 192.168.1.20)
# don't call grep twice for the same pattern
# -x matches whole lines only, so 192.168.1.6 does not match 192.168.1.60
match=$(grep -xF -- "$str" "$logfile")
# append the script output if grep found no match
if [[ -z "$match" ]]; then
    echo "$str" >> "$logfile"
fi
Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
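Since the question mentions SunOS, where sed -E and the <<< here-string may not be available, here is a sketch of the same sed idea using only basic regular expressions and a pipe. The function name extract_field is made up, and it assumes the pattern passed in contains no / or | characters.

```shell
extract_field() {
    # $1 is a basic regular expression identifying the wanted field.
    # -n together with the p flag prints only lines where a field matched.
    sed -n "s/.*|\([^|]*$1[^|]*\)|.*/\1/p"
}

printf '%s\n' '|AAAAAA|BBBBBB|12MyField34|ZZZZZZZZZ|' | extract_field MyField
```

This runs a single sed process over the input and silently skips lines that have no matching field.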
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used something like:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.
How do I limit grep's output to just one line per file?
(Since this is part of a shellscript function, I can use everything, but I'm too nooby to figure out how to pipe the specific parts the right way.)
The function I'm trying to write is basically "Given a string, display every file (in this directory and all subdirectories), which contains it and display a list of those files as clickable links"
(btw. could you hint me to scripts/commands, which do something like this?)
If you are interested: The functions in .bashrc are these:
(And should be used like: "where foobar")
function where(){
grep -rHoiIm1 "$@" | cut -d":" -f1-1 | asURL
}
function asURL() {
PREFIX="file://$(pwd)/";
sed "s*^*$PREFIX*" |
sed 's/ /%20/g';
}
If you're only interested in the paths of matching files, use the -l / --files-with-matches option:
function where(){
grep -riIl "$@" | asURL
}
Note that I've omitted several options that don't apply anymore once you use -l.
As an aside: while your asURL() function will work in simple cases, it's not fully robust and can result in invalid URLs. Aside from that, there's no reason for two invocations of sed; simply string the two s calls together in a single script, separated with ;.
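A single-sed version might look like the sketch below. The name as_url and the directory argument are illustrative (the original function uses $(pwd)), and it is no more robust than the original: only spaces are encoded, and a directory name containing |, & or a backslash would break the first substitution.

```shell
as_url() {
    prefix="file://$1/"
    # Two s commands in one sed invocation; '|' is the delimiter of the first
    # so the slashes in the prefix need no escaping.
    sed -e "s|^|$prefix|" -e 's/ /%20/g'
}

printf '%s\n' 'docs/my notes.txt' | as_url /home/user
```

Inside where() it would be called as as_url "$(pwd)" at the end of the pipeline.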
Add the -l option to grep to tell it to output file names only.
From the grep man page:
-l
--files-with-matches
Suppress normal output; instead print the name of each input file from
which output would normally have been printed. The scanning of each file
stops on the first match. (-l is specified by POSIX.)