Unable to substitute redirection for redundant cat - bash

cat joined.txt | xargs -t -a <(cut --fields=1 | sort -u | grep -E '\S') -I{} --max-args=1 --max-procs=4 echo "mkdir -p imdb/movies/{}; grep '^{}' joined.txt > imdb/movies/{}/movies.txt" | bash
The code above works but substituting the redundant cat at the start of the code with a redirection like below doesn't work and leads to a cut input output error.
< joined.txt xargs -t -a <(cut --fields=1 | sort -u | grep -E '\S') -I{} --max-args=1 --max-procs=4 echo "mkdir -p imdb/movies/{}; grep '^{}' joined.txt > imdb/movies/{}/movies.txt" | bash

In either case, it is the cut command inside the process substitution (and not xargs) that should be reading from joined.txt, so to be completely safe, you should put either the pipe or the input redirection inside the the process substitution. Actually, neither is necessary; cut can just take joined.txt as an argument.
xargs -t -a <( cat joined.txt | cut ... ) ... | bash
or
xargs -t -a <( cut -f1 joined.txt | ... ) ... | bash
However, it would be clearest to skip the process substitution altogether, and pipe the output of that pipeline to xargs:
cut -f joined.txt | sort -u | grep -E '\S' | xargs -t ...

Related

GNU parallel with custom script doing string comparison

The follwoing script.sh compares part of a string (coming from stdin by cating a csv file) to a defined string and reports the differences in a certain format
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intendet to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when i use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, i get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to obtain XYZ,C3M I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
-k keeps the order.
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
#or
parallel --pipepart -a large_file --block -10 -k doit

How to save the output of an sh to a Groovy variable?

I need to store the output of this command into a variable:
sh "curl -s 'http://nexus-cicd.stgcloud.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xml' | grep '<version>.*</version>' | sort --version-sort | uniq | tail -n1 | sed -e 's#\\(.*\\)\\(<version>\\)\\(.*\\)\\(</version>\\)\\(.*\\)#\\3#g'"
I tried the following, but echo output null
parentLast = sh ("curl -s 'http://nexus-cicd.stgcloud.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xml' | grep '<version>.*</version>' | sort --version-sort | uniq | tail -n1 | sed -e 's#\\(.*\\)\\(<version>\\)\\(.*\\)\\(</version>\\)\\(.*\\)#\\3#g'")
echo "$parentLast"

Grep Spellchecker

I am trying to write a simple shell script that takes a text file as input and checks all non-punctuated words against a dictionary (english.txt). It should return all non-matching (misspelled) words. I am using grep but it does not seem to successfully match all the lines in english.txt. I have included my code below.
#!/bin/bash
cat $1 |
tr ' \t' '\n\n' |
sed -e "/'/d" |
tr -d '[:punct:]' |
tr -cd '[:alpha:]\n' |
sed -e "/^$/d" |
grep -v -i -w -f english.txt

echo -e cat: argument line too long

I have bash script that would merge a huge list of text files and filter it. However I'll encounter 'argument line too long' error due to the huge list.
echo -e "`cat $dir/*.txt`" | sed '/^$/d' | grep -v "\-\-\-" | sed '/</d' | tr -d \' | tr -d '\\\/<>(){}!?~;.:+`*-_ͱ' | tr -s ' ' | sed 's/^[ \t]*//' | sort -us -o $output
I have seen some similar answers here and i know i could rectify it using find and cat the files 1st. However, i would i like to know what is the best way to run a one liner code using echo -e and cat without breaking the code and to avoid the argument line too long error. Thanks.
First, with respect to the most immediate problem: Using find ... -exec cat -- {} + or find ... -print0 | xargs -0 cat -- will prevent more arguments from being put on the command line to cat than it can handle.
The more portable (POSIX-specified) alternative to echo -e is printf '%b\n'; this is available even in configurations of bash where echo -e prints -e on output (as when the xpg_echo and posix flags are set).
However, if you use read without the -r argument, the backslashes in your input string are removed, so neither echo -e nor printf %b will be able to process them later.
Fixing this can look like:
while IFS= read -r line; do
printf '%b\n' "$line"
done \
< <(find "$dir" -name '*.txt' -exec cat -- '{}' +) \
| sed [...]
grep -v '^$' $dir/*.txt | grep -v "\-\-\-" | sed '/</d' | tr -d \' \
| tr -d '\\\/<>(){}!?~;.:+`*-_ͱ' | tr -s ' ' | sed 's/^[ \t]*//' \
| sort -us -o $output
If you think about it some more you can probably get rid of a lot more stuff and turn it into a single sed and sort, roughly:
sed -e '/^$/d' -e '/\-\-\-/d' -e '/</d' -e 's/\'\\\/<>(){}!?~;.:+`*-_ͱ//g' \
-e 's/ / /g' -e 's/^[ \t]*//' $dir/*.txt | sort -us -o $output

OpenStreetMap Bash / CGI Script

I am trying to build a Bash CGI Script that takes in coordinates as parameters from the url and uses osmosis to extract the map and the splitter and mkgmap to make the map so that it can be opened with Qlandkarte. My problem being is that when i type wget localhost/cgi-bin/script.pl?top=42&left=10&bottom=39&right=9&file=map.osm the linux terminal reads the file with the coordinates. How can I make wget just activate the script so it takes the coordinates and executes the commands. And also when the map is created at the end how can a return the file that was created by the script.
Thanks
#!/bin/bash
TOP=`echo "$QUERY_STRING" | grep -oE "(^|[?&])top=[0.0-9.0]+" | cut -f 2 -d "=" | head -n1`
LEFT=`echo "$QUERY_STRING" | grep -oE "(^|[?&])left=[0.0-9.0]+" | cut -f 2 -d "=" | head -n1`
BOTTOM=`echo "$QUERY_STRING" | grep -oE "(^|[?&])bottom=[0.0-9.0]+" | cut -f 2 -d "=" | head -n1`
RIGHT=`echo "$QUERY_STRING" | grep -oE "(^|[?&])right=[0.0-9.0]+" | cut -f 2 -d "=" | head -n1`
FILE=`echo "$QUERY_STRING" | grep -oE "(^|[?&])file=[^&]+" | sed "s/%20/ /g" | cut -f 2 -d "="`
$(sudo osmosis --read-xml file=bulgaria.osm --bounding-box top=$TOP left=$LEFT bottom=$BOTTOM right=$RIGHT --write-xml file=$FILE)
$(sudo java -Xmx900m -jar splitter.jar --max-nodes=110000 $FILE)
$(sudo java -ea -Xmx900m -jar mkgmap.jar --tdbfile --route -c template.args)
echo "Content-type: text/html"
echo ""

Resources