How to apply a command to each line in pipe? - bash

I want to apply a command to each line of piped stdin like so:
cat file.txt | grep ... | ./filter | wc -l
the problem is ./filter accepts only a single line of input and gives a single line of output. I've tried xargs but it spawns a subshell and I can't capture it's output to continue working with the result. Is there an easy way to do that?

If it accepts a single line, then you should put it in a loop if you want to process multiple lines,
cat file.txt |
grep ... |
while read line ; do
echo "$line" | ./filter
done |
wc -l

To call a command for each line you can read a line into a variable and use the variable as a standard input. (Also, let’s avoid UUOC.)
grep ... < file.txt |
while IFS= read -r line; do
./filter <<< "$line"
done |
wc -l
In this case it looks like things may get way easier if you instead write the whole filter in awk. Because it will give you wc -l for free (NR), plus line and record splitting and filtering better than what grep can do.

Related

Why doesn't this sed command put a newline

I have a file, ciao.py thas has only one line in it: print("ciao")
I want to do this: I want to do that via pipe stream, and als, if I do cat ciao.py | sed 's/.*/&\n&/' it would work, but I want to do this in two separated parts, simulating the case where I want to print it and then pass that to further commands.
If I do this:
cat ciao.py | sed 's/.*/&\n/' |tee >(xargs echo) | xargs echo
it does not work. It prints print("ciao") print("ciao") in the same line. I don't understand why, since I am putting \n with sed.
I'd guess print cia is appearing twice on the same line because xargs is calling echo with multiple strings since xargs calls the command you provide it with groups of input lines at a time by default.
Is this what you're trying to do?
$ cat ciao.py | sed 's/.*/&\n/' |tee >(xargs -n 1 echo) | xargs -n 1 echo
print(ciao)
print(ciao)
or:
$ cat ciao.py | sed 's/.*/&\n/' |tee >(cat) | xargs -n 1 echo
print(ciao)
print(ciao)
There are, of course, better ways to get that output from that input, e.g.:
$ sed 'p' ciao.py
print("ciao")
print("ciao")

Modify output from Stdout and redirect

I am using John the Ripper, an application that outputs a generated password line by line. I want to make a bash script that takes the output of each line and apply "md5sum" to it and print it out.
For example:
$ ./john --wordlist=password.lst -rules:Single
12346
fdgh
sdfdfj
test
password1234
...
and so on... (really fast)
I want to take each line and apply md5sum to each line.
$ md5sum <<< "12346"
f447b20a7fcbf53a5d5be013ea0b15af -
Use
command | while IFS= read -r l; do md5sum <<<"$l"; done
or simpler with xargs (or not simpler):
command | xargs -n1 sh -c 'md5sum <<<"$1"' --
where command is your ./john --wordlist=password.lst -rules:Single command.

how to cat command output to string in shell script

in my script i need to loop through lines in a file, once i find some specific line i need to save it to variable so later on i can use it outside the loop, i tried the following but it wont' work:
count=0
res=""
python my.py -p 12345 |
while IFS= read -r line
do
count=$((count+1))
if [ "$count" -eq 5 ]; then
res=`echo "$line" | xargs`
fi
done
echo "$res"
it output nothing, i also tried this,
res=""
... in the loop...
res=$res`echo "$line" | xargs`
still nothing. please help. thanks.
Update: Thanks for all the help. here is my final code:
res=python my.py -p 12345 | sed -n '5p' | xargs
for finding a specific line in a file, have you considered using grep?
grep "thing I'm looking for" /path/to/my.file
this will output the lines that match the thing you're looking for. Moreover this can be piped to xargs as in your question.
If you need to look at a particularly numbered line of a file, consider using the head and tail commands (which can also be piped to grep).
cat /path/to/my.file | head -n5 | tail -n1 | grep "thing I'm looking for"
These commands take the first lines specified (in this case, 5 and 1 respectively) and only prints those out. Hopefully this will help you accomplish your task.
Happy coding! Leave a comment if you have any questions.

Getting head to display all but the last line of a file: command substitution and standard I/O redirection

I have been trying to get the head utility to display all but the last line of standard input. The actual code that I needed is something along the lines of cat myfile.txt | head -n $(($(wc -l)-1)). But that didn't work. I'm doing this on Darwin/OS X which doesn't have the nice semantics of head -n -1 that would have gotten me similar output.
None of these variations work either.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
I tested out more variations and in particular found this to work:
cat <<EOF | echo $(($(wc -l)-1))
>Hola
>Raul
>Como Esta
>Bueno?
>EOF
3
Here's something simpler that also works.
echo "hello world" | echo $(($(wc -w)+10))
This one understandably gives me an illegal line count error. But it at least tells me that the head program is not consuming the standard input before passing stuff on to the subshell/command substitution, a remote possibility, but one that I wanted to rule out anyway.
echo "hello" | head -n $(cat && echo 1)
What explains the behavior of head and wc and their interaction through subshells here? Thanks for your help.
head -n -1 will give you all except the last line of its input.
head is the wrong tool. If you want to see all but the last line, use:
sed \$d
The reason that
# Sample of incorrect code:
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
fails is that wc consumes all of the input and there is nothing left for head to see. wc inherits its stdin from the subshell in which it is running, which is reading from the output of the echo. Once it consumes the input, it returns and then head tries to read the data...but it is all gone. If you want to read the input twice, the data will have to be saved somewhere.
Using sed:
sed '$d' filename
will delete the last line of the file.
$ seq 1 10 | sed '$d'
1
2
3
4
5
6
7
8
9
For Mac OS X specifically, I found an answer from a comment to this Q&A.
Assuming you are using Homebrew, run brew install coreutils then use the ghead command:
cat myfile.txt | ghead -n -1
Or, equivalently:
ghead -n -1 myfile.txt
Lastly, see brew info coreutils if you'd like to use the commands without the g prefix (e.g., head instead of ghead).
cat myfile.txt | echo $(($(wc -l)-1))
This works. It's overly complicated: you could just write echo $(($(wc -l)-1)) <myfile.txt or echo $(($(wc -l <myfile.txt)-1)). The problem is the way you're using it.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
wc consumes all the input as it's counting the lines. So there is no data left to read in the pipe by the time head is started.
If your input comes from a file, you can redirect both wc and head from that file.
head -n $(($(wc -l <myfile.txt) - 1)) <myfile.txt
If your data may come from a pipe, you need to duplicate it. The usual tool to duplicate a stream is tee, but that isn't enough here, because the two outputs from tee are produced at the same rate, whereas here wc needs to fully consume its output before head can start. So instead, you'll need to use a single tool that can detect the last line, which is a more efficient approach anyway.
Conveniently, sed offers a way of matching the last line. Either printing all lines but the last, or suppressing the last output line, will work:
sed -n '$! p'
sed '$ d'
Here is a one-liner that can get you the desired output, and it can be used more generally for getting all lines from a file except the last n lines.
grep -n "" myfile.txt \ # output the line number for each line
| sort -nr \ # reverse the file by using those line numbers
| sed '1,4d' \ # delete first 4 lines (last 4 of the original file)
| sort -n \ # reverse the reversed file (correct the line order)
| sed 's/^[0-9]*://' # remove the added line numbers
Here is the above command in an actual single line and runnable (can't execute the above due to the added comments):
grep -n "" myfile.txt | sort -nr | sed '1,4d' | sort -n | sed 's/^[0-9]*://'
It's a little cumbersome, and this problem can be solved with more comprehensive commands like ghead, but when you can't or don't want to download such tools, it's nice to be able to do this with the more basic options. I've been in situations where it's simply not an option to get better tools.
awk 'NR>1{print p}{p=$0}'
For this job, an awk one-liner is a bit longer than a sed one.

bash echo number of lines of file given in a bash variable without the file name

I have the following three constructs in a bash script:
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE)
echo $NUMOFLINES" lines"
echo $(wc -l $JAVA_TAGS_FILE)" lines"
echo "$(wc -l $JAVA_TAGS_FILE) lines"
And they both produce identical output when the script is run:
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
I.e. the name of the file is also echoed (which I don't want to). Why do these scriplets fail and how should I output a clean:
121711 lines
?
An Example Using Your Own Data
You can avoid having your filename embedded in the NUMOFLINES variable by using redirection from JAVA_TAGS_FILE, rather than passing the filename as an argument to wc. For example:
NUMOFLINES=$(wc -l < "$JAVA_TAGS_FILE")
Explanation: Use Pipes or Redirection to Avoid Filenames in Output
The wc utility will not print the name of the file in its output if input is taken from a pipe or redirection operator. Consider these various examples:
# wc shows filename when the file is an argument
$ wc -l /etc/passwd
41 /etc/passwd
# filename is ignored when piped in on standard input
$ cat /etc/passwd | wc -l
41
# unusual redirection, but wc still ignores the filename
$ < /etc/passwd wc -l
41
# typical redirection, taking standard input from a file
$ wc -l < /etc/passwd
41
As you can see, the only time wc will print the filename is when its passed as an argument, rather than as data on standard input. In some cases, you may want the filename to be printed, so it's useful to understand when it will be displayed.
wc can't get the filename if you don't give it one.
wc -l < "$JAVA_TAGS_FILE"
You can also use awk:
awk 'END {print NR,"lines"}' filename
Or
awk 'END {print NR}' filename
(apply on Mac, and probably other Unixes)
Actually there is a problem with the wc approach: it does not count the last line if it does not terminate with the end of line symbol.
Use this instead
nbLines=$(cat -n file.txt | tail -n 1 | cut -f1 | xargs)
or even better (thanks gniourf_gniourf):
nblines=$(grep -c '' file.txt)
Note: The awk approach by chilicuil also works.
It's a very simple:
NUMOFLINES=$(cat $JAVA_TAGS_FILE | wc -l )
or
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE | awk '{print $1}')
I normally use the 'back tick' feature of bash
export NUM_LINES=`wc -l filename`
Note the 'tick' is the 'back tick' e.g. ` not the normal single quote

Resources