Split pipe into two and paste the results together? - bash

I want to pipe the output of a command into two other commands and paste the results together. I found this answer and similar ones suggesting tee, but I'm not sure how to make it work the way I'd like.
My problem (simplified):
Say that I have a myfile.txt with keys and values, e.g.
key1 /path/to/file1
key2 /path/to/file2
What I am doing right now is
paste \
<( cat myfile.txt | cut -f1 ) \
<( cat myfile.txt | cut -f2 | xargs wc -l )
and it produces
key1 23
key2 42
The problem is that cat myfile.txt is repeated here (in the real problem it's a heavier operation). Instead, I'd like to do something like
cat myfile.txt | tee \
<( cut -f1 ) \
<( cut -f2 | xargs wc -l ) \
| paste
But it doesn't produce the expected output. Is it possible to do something similar to the above with pipes and standard command-line tools?

This doesn't answer your question about pipes, but you can use AWK to solve your problem:
$ printf %s\\n 1 2 3 > file1.txt
$ printf %s\\n 1 2 3 4 5 > file2.txt
$ cat > myfile.txt <<EOF
key1 file1.txt
key2 file2.txt
EOF
$ cat myfile.txt | awk '{ ("wc -l " $2) | getline size; sub(/ .+$/,"",size); print $1, size }'
key1 3
key2 5
On each line we first run wc -l $2 and save the result into a variable. Not sure about yours, but on my system wc -l includes the filename in the output, so we strip it with sub() to match your example output. And finally, we print the $1 field (the key) and the size we got from the wc -l command.
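A side note on the awk approach (my addition, not part of the original answer): each command | getline opens a pipe that stays open until it is closed explicitly, so on a file with many distinct paths you can eventually run out of file descriptors. Closing each command after reading avoids that:
$ awk '{ cmd = "wc -l " $2; cmd | getline size; close(cmd); sub(/ .+$/, "", size); print $1, size }' myfile.txt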
It can also be done in plain shell, now that I think about it:
cat myfile.txt | while read -r key value; do
  printf '%s %s\n' "$key" "$(wc -l "$value" | cut -d' ' -f1)"
done
Or, more generally, by piping to two commands and using paste, thereby answering the question:
cat myfile.txt | while read -r line; do
  printf %s "$line" | cut -f1
  printf %s "$line" | cut -f2 | xargs wc -l | cut -d' ' -f1
done | paste - -
P.S. The use of cat here is useless, I know. But it's just a placeholder for the real command.
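For completeness, here is a sketch of my own (not from the answers above, and only lightly tested on small inputs) that really does split the stream just once, using tee and a named pipe so that paste can join the two branches. It mirrors the commands from the question, and it can in principle stall on very large inputs if one branch fills its pipe buffer before paste drains it:
mkfifo keys
cat myfile.txt | tee >(cut -f1 > keys) | cut -f2 | xargs wc -l | paste keys -
rm keys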

Related

GNU parallel with custom script doing string comparison

The following script.sh compares part of a string (coming from stdin by cat-ing a CSV file) to a defined string and reports the differences in a certain format:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
  line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
  output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
  echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intended to be executed on a number of rows from a very large file, in the format
XYZ,ABMDEFG
and it works well when I use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, I get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from ABCDEFG and XYZ,ABMDEFG to XYZ,C3M, I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
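One thing to be aware of (my note, not from the original answer): invoked this way, a new bash process is forked for every single line of large_file, which can dominate the runtime; the next answer sidesteps that overhead by handing each job a whole block of lines.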
-k keeps the order.
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
#or
parallel --pipepart -a large_file --block -10 -k doit
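For context (my gloss, paraphrasing the GNU parallel documentation): --pipe splits stdin into blocks and feeds each block to a job's standard input, while --pipepart does the same reading directly from the file given with -a, which is much faster because each job reads its own part of the file itself; -k again keeps the output in input order.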

How to get nth line of a file in bash?

I want to extract the second line from a file named datax.txt, that line being:
0/0/0/0/0/0 | 0/0/0/0/0/0 | 0/0/0/0/0/0
And then I want to store the 3 sequences 0/0/0/0/0/0 in 3 variables.
How am I supposed to do that?
Read the 2nd line into variables a, b and c:
read a b c <<< $(awk -F'|' 'NR==2{print $1 $2 $3}' datax.txt)
The key is to split the problem in two:
you want to get the nth line of a file -> see here
you want to split a line into chunks according to a delimiter -> that's the job of many tools, and cut is one of them
For future questions, be sure to include a more complete dataset; here is one for now. I changed the second line a bit so that we can verify that we got the right columns:
f.txt
4/4/4/4/4/4 | 4/4/4/4/4/4 | 4/4/4/4/4/4
0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
Then a proper script building on the two key actions described above:
extract.bash
file=$1
target_line=2
# get the n-th line
# https://stackoverflow.com/questions/6022384/bash-tool-to-get-nth-line-from-a-file
line=$(cat "$file" | head -n "$target_line" | tail -n 1)
# get the n-th field on a line, using delimiter '|'
var1=$(echo "$line" | cut --delimiter='|' --fields=1)
echo "$var1"
var2=$(echo "$line" | cut --delimiter='|' --fields=2)
echo "$var2"
var3=$(echo "$line" | cut --delimiter='|' --fields=3)
echo "$var3"
aaand:
$ ./extract.bash f.txt
0/0/0/0/a/0
0/0/0/0/b/0
0/0/0/0/c/0
Please try the following:
IFS='|' read a b c < <(sed -n 2p < datax.txt | tr -d ' ')
Then the variables a, b and c are assigned the fields of the 2nd line.
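For example, with the file from the question (my own check; note that tr -d ' ' strips all spaces, which is fine for this data):
$ IFS='|' read a b c < <(sed -n 2p < datax.txt | tr -d ' ')
$ echo "$b"
0/0/0/0/0/0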
You can use sed to print a specific line of a file; for your example, the second line:
sed -n -e 2p ./datax.txt
Capture the output of sed in a variable:
Var=$(sed -n -e 2p ./datax.txt)
Then split the string into the 3 variables you need:
A="$(echo "$Var" | cut -d'|' -f1)"
B="$(echo "$Var" | cut -d'|' -f2)"
C="$(echo "$Var" | cut -d'|' -f3)"

Print Unique Values while using Do-While loop

I have a file named textfile.txt like below:
a 1 xxx
b 1 yyy
c 2 zzz
d 2 aaa
e 3 bbb
f 3 ccc
I am trying to extract the unique values in the second column. I have the code below:
while read LINE
do
  compname=`echo ${LINE} | cut -d' ' -f2 | uniq`
  echo -e "${compname}"
done < textfile.txt
It is displaying:
1
1
2
2
3
3
But I am looking for an output like:
1
2
3
I also tried another command: echo ${LINE} | cut -d' ' -f2 | sort -u | uniq
but still not the expected output.
Can anyone help me?
There's no need to loop; sort -u already processes the whole input.
cut -d' ' -f2 textfile.txt | sort -u
Maybe you wanted to get the output in the original order, showing the first occurrence only? You can use an associative array to remember which values have already been seen:
#! /bin/bash
declare -A seen
while read x ; do
  [[ ${seen[$x]} ]] || printf '%s\n' "$x"
  seen[$x]=1
done < <(cut -d' ' -f2 textfile.txt)
For the last occurrence only, change the last line to
done < <(cut -d' ' -f2 textfile.txt | tac) | tac
(i.e. the last occurrence is the first occurrence in the reversed order)
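The same first-occurrence filter is also a classic awk one-liner (my addition, not from this answer): seen[$2]++ is zero only the first time a value appears, so only first occurrences are printed, in the original order.
awk '!seen[$2]++ { print $2 }' textfile.txt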
Just pipe the output of the loop to sort -u. There's no need for cut; the read command can handle this type of splitting.
while read -r _ compname _; do
echo "$compname"
done < textfile.txt | sort -u
Try moving the sort -u or sort | uniq after the done statement like this:
while read LINE;
do
  compname=$(echo ${LINE} | cut -d' ' -f2)
  echo "${compname}"
done < textfile.txt | sort -u

Use each line of piped output as parameter for script

I have an application (myapp) that gives me a multiline result:
abc|myparam1|def
ghi|myparam2|jkl
mno|myparam3|pqr
stu|myparam4|vwx
With grep and sed I can get my parameters as below
myapp | grep '|' | sed -e 's/^[^|]*|//' | sed -e 's/|.*//'
But I then want these myparamx values to be passed as parameters to a script, executed once for each value:
myscript.sh myparam1
myscript.sh myparam2
etc.
Any help greatly appreciated
Please see xargs. For example:
myapp | grep '|' | sed -e 's/^[^|]*|//' | sed -e 's/|.*//' | xargs -n 1 myscript.sh
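If the parameter has to land somewhere other than the end of the command line, xargs -I can place it explicitly (standard xargs behavior; this variant is my addition):
myapp | grep '|' | sed -e 's/^[^|]*|//' -e 's/|.*//' | xargs -I{} myscript.sh {}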
Maybe this can help:
myapp | awk -F"|" '{ print $2 }' | while read -r line; do /path/to/script/ "$line"; done
I like the xargs -n 1 solution from Dark Falcon, and while read is the classic tool for this kind of thing, but just for completeness:
myapp | awk -F'|' '{print "myscript.sh", $2}' | bash
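A caveat of mine, not the answerer's: generating shell commands and piping them into bash misbehaves if a parameter contains spaces or shell metacharacters; of the variants above, the while read loop with a quoted "$line" is the most robust in that respect.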
As a side note, speaking of extracting the 2nd field, you could use cut:
myapp | cut -d'|' -f2 # -f2 => second field; cut counts fields from 1

hash each line in text file

I'm trying to write a little script which will open a text file and give me an md5 hash for each line of text. For example I have a file with:
123
213
312
I want output to be:
ba1f2511fc30423bdbb183fe33f3dd0f
6f36dfd82a1b64f668d9957ad81199ff
390d29f732f024a4ebd58645781dfa5a
I'm trying to do this part in bash which will read each line:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
  echo $line
done
later on I do:
$ more 123.txt | ./read.line.by.line.sh | md5sum | cut -d ' ' -f 1
but I'm missing something here; it does not work :(
Maybe there is an easier way...
Almost there. The problem with your pipeline is that md5sum reads the whole stream and prints a single hash; it has to be run once per line. Try this:
while read -r line; do printf %s "$line" | md5sum | cut -f1 -d' '; done < 123.txt
Unless you also want to hash the newline character at the end of every line, you should use printf or echo -n instead of echo.
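To see the difference the trailing newline makes (my demonstration; note that the expected output in the question actually matches the newline-included variant):
$ printf %s 123 | md5sum
202cb962ac59075b964b07152d234b70  -
$ echo 123 | md5sum
ba1f2511fc30423bdbb183fe33f3dd0f  -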
In a script:
#! /bin/bash
cat "$#" | while read -r line; do
printf %s "$line" | md5sum | cut -f1 -d' '
done
The script can be called with multiple files as parameters.
You can just call md5sum directly in the script:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
  echo "$line" | md5sum | awk '{print $1}'
done
That way the script spits out directly what you want: the md5 hash of each line.
This worked for me (md5 is the macOS counterpart of md5sum):
cat "$file" | while read line; do printf %s "$line" | tr -d '\r\n' | md5 >> hashes.csv; done
