Bash: set a shell variable in the middle of a pipeline

I have text coming from some command (in the example it's echo -e "10 ABC \n5 DEF \n87 GHI"). This text goes through a pipeline and I get the output I want (in the example it's GHI). That output is sent to the following pipeline step (in the example it's | xargs -I {} grep -w {} FILES |).
My question is:
I want to append the "inter pipe" output to a variable before it's sent to the following step. How can I do this?
Example:
echo -e "10 ABC \n5 DEF \n87 GHI" |
sort -nr -k1 |
head -n1 |
cut -d ' ' -f 2 | # The wanted output comes here. I want to append it to a variable before it goes to `grep`
xargs -I {} grep -w {} FILES |
# FOLLOWING ANALYSIS

You can't set a shell variable in the middle of the pipeline, but you can send the output to a file using the tee command, and then read that file later.
echo -e "10 ABC \n5 DEF \n87 GHI" |
sort -nr -k1 |
head -n1 |
cut -d ' ' -f 2 |
tee intermediate.txt |
xargs -I {} grep -w {} FILES |
# FOLLOWING ANALYSIS
# intermediate.txt now contains GHI
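To pick the value up once the pipeline has finished, read the file back into a variable:
MYVAR=$(< intermediate.txt)
echo "captured: $MYVAR"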

How about something like this? Note that the loop runs in a subshell, so $MYVAR is only visible inside it:
echo -e "10 ABC \n5 DEF \n87 GHI" |
sort -nr -k1 |
head -n1 |
cut -d ' ' -f 2 |
while read MYVAR; do
  echo "intermediate value: $MYVAR"
  echo "$MYVAR" | xargs -I {} grep -w {} FILES
done

Insert it into the stream. I think you're looking to add the contents of a variable to every line from the stream? This prepends the contents of $example, i.e.
example="A String"
echo -e "10 ABC \n5 DEF \n87 GHI" |
sort -nr -k1 |
head -n1 |
cut -d ' ' -f 2 |
sed "s/^/$example/" |
xargs -I {} grep -w {} FILES |
# FOLLOWING ANALYSIS
Use sed "s/$/$example/" to append instead.
NB I tend to do a lot of things this way in bash, but a long pipeline of cuts, seds and heads does suggest it may be time to break out awk or perl.
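For instance, the sort/head/cut part of the example collapses into a single awk program (a sketch, assuming whitespace-separated fields):
echo -e "10 ABC \n5 DEF \n87 GHI" |
awk 'NR == 1 || $1 > max { max = $1; val = $2 } END { print val }' |
xargs -I {} grep -w {} FILES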

Related

Convert a bash pipeline into a function with a parameter

I have a pipeline I use to preview csv files:
cat file_name.csv | sed -e 's/,,/, ,/g' | column -t -s ","| less -s
But I want to create an alias viewcsv that lets me just supply the filename.
I tried viewcsv="cat $1 | sed -e 's/,,/, ,/g' | column -t -s ","| less -s" but that didn't work. Googling turned up that I need to convert the pipeline to a function. How can I convert it so that viewcsv file_name.csv gives the same output as cat file_name.csv | sed -e 's/,,/, ,/g' | column -t -s ","| less -s?
Function syntax looks like this:
viewcsv() {
sed -e 's/,,/, ,/g' "$1" | column -t -s ","| less -s
}
Notice that cat "$1" | sed -e ... has been replaced with sed -e ... "$1": sed can read the file itself, so the cat is unnecessary.
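Put the function in your ~/.bashrc (or source it in the current shell) and call it the way you wanted to call the alias:
viewcsv file_name.csv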
csvkit has a CSV previewer, by the way:
$ csvlook <<< $'a,b,c\n10,20,30'
| a | b | c |
| -- | -- | -- |
| 10 | 20 | 30 |

GNU parallel with custom script doing string comparison

The following script.sh compares part of a string (coming from stdin by cat-ing a CSV file) to a defined string and reports the differences in a certain format:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intended to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when I use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, I get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to XYZ,C3M, I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
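A toy command makes the difference visible (purely illustrative; -k keeps the output in input order):
seq 3 | parallel -k echo 'got argument: {}'
# got argument: 1
# got argument: 2
# got argument: 3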
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
-k keeps the order. Alternatively, keep the while loop and let parallel split the input into blocks for it:
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
# or
parallel --pipepart -a large_file --block -10 -k doit
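Both variants feed blocks of input to doit on its stdin, which is why the while read loop stays in place. --pipepart reads directly from the file instead of through a pipe and is considerably faster on large files; a negative --block value such as -10 is interpreted as a number of blocks per job slot rather than a size in bytes.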

Unable to substitute redirection for redundant cat

cat joined.txt | xargs -t -a <(cut --fields=1 | sort -u | grep -E '\S') -I{} --max-args=1 --max-procs=4 echo "mkdir -p imdb/movies/{}; grep '^{}' joined.txt > imdb/movies/{}/movies.txt" | bash
The code above works, but replacing the redundant cat at the start with a redirection, as below, doesn't work and leads to a cut: input/output error.
< joined.txt xargs -t -a <(cut --fields=1 | sort -u | grep -E '\S') -I{} --max-args=1 --max-procs=4 echo "mkdir -p imdb/movies/{}; grep '^{}' joined.txt > imdb/movies/{}/movies.txt" | bash
In either case, it is the cut command inside the process substitution (and not xargs) that should be reading from joined.txt, so to be completely safe, you should put either the pipe or the input redirection inside the process substitution. Actually, neither is necessary; cut can simply take joined.txt as an argument.
xargs -t -a <( cat joined.txt | cut ... ) ... | bash
or
xargs -t -a <( cut -f1 joined.txt | ... ) ... | bash
However, it would be clearest to skip the process substitution altogether, and pipe the output of that pipeline to xargs:
cut -f1 joined.txt | sort -u | grep -E '\S' | xargs -t ...
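With the options from the original command put back in, that becomes:
cut -f1 joined.txt | sort -u | grep -E '\S' |
xargs -t -I{} --max-args=1 --max-procs=4 echo "mkdir -p imdb/movies/{}; grep '^{}' joined.txt > imdb/movies/{}/movies.txt" | bash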

Feed line number to sed

I have a piped unix script which finally yields a line number in the subject file.
Now, I need to print the file contents from that particular line to the end.
Is it possible to feed the line number to sed via xargs, so that sed prints out the desired range?
.....|tail -1 | cut -f 1 | xargs sed ...?
Is this possible?
..... | tail -1 | cut -f 1 | xargs -I{} sed -n '{},$p' your_file
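An alternative that avoids xargs is to capture the line number first and splice it into the sed address (same behaviour, assuming the number ends up in $n):
n=$(..... | tail -1 | cut -f 1)
sed -n "${n},\$p" your_file
# or, equivalently: tail -n +"$n" your_file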

Use each line of piped output as parameter for script

I have an application (myapp) that gives me a multiline output
result:
abc|myparam1|def
ghi|myparam2|jkl
mno|myparam3|pqr
stu|myparam4|vwx
With grep and sed I can get my parameters as below
myapp | grep '|' | sed -e 's/^[^|]*//' | sed -e 's/|.*//'
But I then want these myparamx values used as parameters of a script, executed once for each parameter:
myscript.sh myparam1
myscript.sh myparam2
etc.
Any help greatly appreciated
Please see xargs. For example:
myapp | grep '|' | sed -e 's/^[^|]*//' | sed -e 's/|.*//' | xargs -n 1 myscript.sh
Maybe this can help:
myapp | awk -F"|" '{ print $2 }' | while read -r line; do /path/to/script/ "$line"; done
I like the xargs -n 1 solution from Dark Falcon, and while read is the classical tool for this kind of thing, but just for completeness:
myapp | awk -F'|' '{print "myscript.sh", $2}' | bash
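Note that generating shell commands and piping them to bash will break if a parameter contains spaces or shell metacharacters; the xargs -n 1 and while read variants are safer in that respect.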
As a side note, speaking of extracting the 2nd field, you could use cut:
myapp | cut -d'|' -f2 # fields are numbered from 1, so -f2 is the second field
