bash or zsh: how to pass multiple inputs to interactive piped parameters?

I have 3 different files that I want to compare
words_freq
words_freq_deduped
words_freq_alpha
For each file, I run a command like so, which I iterate on constantly to compare the results.
For example, I would do this:
$ cat words_freq | grep -v '[soe]'
$ cat words_freq_deduped | grep -v '[soe]'
$ cat words_freq_alpha | grep -v '[soe]'
and then review the results, and then do it again, with an additional filter
$ cat words_freq | grep -v '[soe]' | grep a | grep r | head -n20
a
$ cat words_freq_deduped | grep -v '[soe]' | grep a | grep r | head -n20
b
$ cat words_freq_alpha | grep -v '[soe]' | grep a | grep r | head -n20
c
This continues on until I've analyzed my data.
I would like to write a script that takes the piped portion and applies it to each of these files, so I can iterate on the grep/head portions of the command.
e.g. The following would dump the results of running the 3 commands above AND also compare the 3 results, and dump additional calculations on them
$ myScript | grep -v '[soe]' | grep a | grep r | head -n20
the letters were in all 3 runs, and it took 5 seconds
a
b
c
How can I do this using bash/python or zsh for the myScript part?
EDIT: After asking the question, it occurred to me that I could use eval to do it, like so:
The following approach allows me to process multiple files by using eval, which I know is frowned upon - any other suggestions are greatly appreciated!
$ myScript "grep -v '[soe]' | grep a | grep r | head -n20"
myScript
#!/usr/bin/env bash
function doIt(){
    FILE=$1
    CMD="cat $FILE | $2"
    echo processing file "$FILE"
    eval "$CMD"
    echo
}
doIt words_freq "$@"
doIt words_freq_deduped "$@"
doIt words_freq_alpha "$@"

You can't stop your shell from interpreting pipes itself, so passing a pipeline to a script isn't very practical: you'd either have to quote the whole pipeline and eval it (which makes it hard to pass arguments containing spaces), or quote every pipe character individually and then eval that. Either way, these solutions are hacky.
I'd suggest doing one of these two:
Keep your editor open, and put whatever you want to run inside the doIt function itself before you run it. Then run it in your shell without any arguments:
#!/usr/bin/env bash
doIt() {
    # grep -v '[soe]' < "$1"
    grep -v '[soe]' < "$1" | grep a | grep r | head -n20
}
doIt words_freq
doIt words_freq_deduped
doIt words_freq_alpha
Or use a for loop directly in your shell, which you can recall from your history with Ctrl+R whenever you want to reuse it:
$ for f in words_freq*; do grep -v '[soe]' < "$f" | grep a | grep r | head -n20; done
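Another possible route, as a sketch (it assumes the whole pipeline arrives as one quoted argument, as in the question's edit): hand the quoted pipeline to a fresh bash -c and redirect each file to its stdin. The subshell still interprets the pipeline, much like eval would, but the parent shell's state is untouched:
#!/usr/bin/env bash
# sketch: run the pipeline string given in $1 once per file, with the file on stdin
for f in words_freq words_freq_deduped words_freq_alpha; do
    echo "processing file $f"
    bash -c "$1" < "$f"
    echo
done
It would be invoked the same way as the eval version:
$ ./myScript "grep -v '[soe]' | grep a | grep r | head -n20"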
But if you really want your approach, I tried to make it accept spaces, but it ended up being even hackier:
#!/usr/bin/env bash
doIt() {
    local FILE=$1
    shift
    echo processing file "$FILE"
    local args=()
    for n in $(seq 1 $#); do
        arg=$1
        shift
        if [[ $arg == '|' ]]; then
            args+=('|')
        else
            args+=("\"$arg\"")
        fi
    done
    eval "cat '$FILE' | ${args[@]}"
}
doIt words_freq "$@"
doIt words_freq_deduped "$@"
doIt words_freq_alpha "$@"
With this version you can use it like this:
$ ./myScript grep "a a" "|" head -n1
Notice that it needs you to quote the |, but it now handles arguments containing spaces.
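One more caveat on the quoting: wrapping each argument in \"...\" breaks as soon as an argument itself contains a double quote. bash's printf %q escapes a string so it can safely be reused by eval, so the else branch above could instead be (a sketch):
args+=( "$(printf '%q' "$arg")" )   # %q escapes the argument for safe later reuse by eval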

I may not have understood the problem fully.
My understanding is that you want a script without pipes, with the filtering logic included in the script
and the filtering patterns fed in as arguments.
Here is a gawk script (gawk is the standard awk on Linux).
It makes one sweep over the 3 input files, without piping.
script.awk
BEGIN {
    # set the record separator to something unlikely to be matched, so each file is read entirely as a single record
    RS = "!#!#!#!#!#!#!#";
}
$0 !~ excludeRegEx && $0 ~ includeRegEx1 && $0 ~ includeRegEx2 { # file does not match excludeRegEx but matches both includeRegEx1 and includeRegEx2
    system("head -n20 " FILENAME); # run the shell command "head -n20" on the current file
}
Running script.awk
awk -v excludeRegEx='[soe]' \
-v includeRegEx1='a' \
-v includeRegEx2='r' \
-f script.awk words_freq words_freq_deduped words_freq_alpha
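One caveat worth spelling out: because each file is read as a single record, the three regex tests apply to each file as a whole, not line by line as the original grep chain did. The equivalent whole-file logic in plain bash would be something like this sketch:
for f in words_freq words_freq_deduped words_freq_alpha; do
    # print the first 20 lines only if the whole file avoids [soe] and contains both a and r
    if ! grep -q '[soe]' "$f" && grep -q a "$f" && grep -q r "$f"; then
        head -n20 "$f"
    fi
done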


Related

How to filter all the paths from the urls using "sed" or "grep"

I'm trying to filter the filenames out of these URLs and keep only their paths.
echo -e "http://sub.domain.tld/secured/database_connect.php\nhttp://sub.domain.tld/section/files/image.jpg\nhttp://sub.domain.tld/.git/audio-files/top-secret/audio.mp3" | grep -Eio "(http|https)://[^/\"]+" | sort -u
http://sub.domain.tld
But I want the result like this
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
Is there any way to do it with sed or grep?
Using grep (-o prints only the matched part, and since .* is greedy it matches everything up to the last /):
$ echo ... | grep -o '.*/'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
with grep
If your grep has the -o option:
... | grep -Eio 'https?://.*/'
If there could be multiple URLs per line:
... | grep -Eio 'https?://[^[:space:]]+/'
with sed
If the input is always precisely one URL per line and nothing else, you can just delete the filename part:
... | sed 's/[^/]*$//'
You could use the match function of awk, which works in any version of awk. Briefly: the echo command's output is passed to the awk program; match finds everything up to the last occurrence of /, and substr prints the matched part minus the final / (hence the -1 applied to RLENGTH).
your_echo_command | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}'
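Note that RLENGTH-1 drops the trailing slash, so the output differs slightly from the requested form; keeping the full match length preserves it:
your_echo_command | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH)}'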
GNU Awk
$ echo ... | awk 'match($0,/.*\//,a){print a[0]}'
$ echo ... | awk '{print gensub(/(.*\/).*/,"\\1",1)}'
$ echo ... | awk 'sub(/[^/]*$/,"")'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
xargs
$ echo ... | xargs -i sh -c 'echo $(dirname "{}")/'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
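For completeness, the same trimming can be done in pure bash with parameter expansion and no external tools; a minimal sketch, assuming one URL per line (urls.txt is a placeholder filename):
while IFS= read -r url; do
    printf '%s/\n' "${url%/*}"   # ${url%/*} strips the shortest trailing /filename part
done < urls.txt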

echo strings with environment variables from lines pulled from a file in bash

I have a file like so:
- ${VAR1}/blah/blah:/blah1
- ${VAR2}/blah/blah:/blah2
- $VAR3:/blah3
I ultimately need to create those three folders.
I am using sed to extract the folder part:
$ cat test.txt | grep -E '^ +- \$.*?:.*?$' | sed 's/.*- \(\$.*\):.*/\1/g'
${VAR1}/blah/blah
${VAR2}/blah/blah
$VAR3
I need to create those folders but I need those shell variables to expand. Right now they don't:
$ cat test.txt | grep -E '^ +- \$.*?:.*?$' | sed 's/.*- \(\$.*\):.*/\1/g' | while read line; do echo "$line"; done
${VAR1}/blah/blah
${VAR2}/blah/blah
$VAR3
Is there a way to get the expanded strings so I can run mkdir instead of echo to make the folders?
You may use this bash script with envsubst:
#!/usr/bin/env bash
export VAR1 VAR2 VAR3
while IFS=' -:' read -r _ d _; do
    mkdir -p "$d"
done < <(envsubst < test.txt)
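For example, with hypothetical values, assuming the script above is saved as mkdirs.sh and test.txt is the file from the question:
$ VAR1=/data VAR2=/srv VAR3=/tmp/x ./mkdirs.sh
which would create /data/blah/blah, /srv/blah/blah and /tmp/x.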
Alternatively use this envsubst + awk + xargs solution:
envsubst < text.txt |
awk -F '[-:[:blank:]]+' -v ORS='\0' '{print $2}' |
xargs -0 mkdir -p
First of all, those variables must be exported to be accessible from your script. Then you can use a combination of the cut and tr commands to extract the dir name in a loop like the following:
#!/bin/bash -eu
while read -r LINE; do
    echo "$LINE" | cut -d ':' -f 1 | tr -d ' ' | tr -d '-'
done < test.txt
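To actually finish the job from there, the extracted string still needs variable expansion and the mkdir call. A hedged sketch that pipes each extracted name through envsubst (and inherits this answer's caveat that tr -d '-' would also strip hyphens occurring inside real paths):
#!/bin/bash -eu
export VAR1 VAR2 VAR3
while read -r LINE; do
    # extract the part before ':', drop spaces and the leading dash, then expand variables
    dir=$(echo "$LINE" | cut -d ':' -f 1 | tr -d ' ' | tr -d '-' | envsubst)
    mkdir -p "$dir"
done < test.txt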

GNU parallel with custom script doing string comparison

The following script.sh compares part of a string (coming from stdin by cat-ing a CSV file) to a defined string and reports the differences in a certain format.
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
    line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
    output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
    echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intendet to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when I use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, I get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to obtain XYZ,C3M I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
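To also preserve the input order (per the edit), GNU parallel's -k (--keep-order) option can be added to this same invocation:
cat large_file | parallel -k ./find_something.sh > output_file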
-k keeps the order.
#!/usr/bin/env bash
doit() {
    reference="ABCDEFG"
    ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
    while read line; do
        line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
        output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
        echo "$(echo ${line:0:35}, $output)"
    done
}
export -f doit
cat large_file | parallel --pipe -k doit
#or
parallel --pipepart -a large_file --block -10 -k doit

bash scripting cat and echo

I'm new to bash scripting and I'm stuck with this:
tab=( "^[A-Z]\{4,\}[0-9]\{4,\}" )
for (( i=0; i<=$(( ${#tab[*]} - 1 )); i++ ))
do
    tmp+=" grep -v \"${tab[i]}\" |"
done
# remove the trailing |
chaine=`echo $tmp | rev | cut -c2- | rev`
# expected result: cat ${oldConfFile[0]} | grep -v "^[A-Z]\{4,\}[0-9]\{4,\}"
cat ${oldConfFile[0]} | echo $chaine
That's where my trouble is: how do I use cat and echo at the same time?
Thanks a lot.
You don't need a separate grep for each pattern; just join your patterns with pipes (|) and run grep once with -E, since | means alternation only in extended regular expressions. For example, to filter out lines that contain foo, bar or baz from file, use the following:
grep -Ev 'foo|bar|baz' file
And you can build up the pattern outside of the invocation for better readability like this:
my_pattern='foo'
my_pattern+='|bar'
my_pattern+='|baz'
grep -Ev "$my_pattern" file
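Applied to the question's array, you can also avoid building a pipeline string entirely by giving grep one -e option per pattern (multiple -e patterns act as alternation, and the BRE escapes like \{4,\} in the array stay valid). A sketch, assuming ${oldConfFile[0]} names the input file:
tab=( "^[A-Z]\{4,\}[0-9]\{4,\}" )
args=()
for p in "${tab[@]}"; do
    args+=( -e "$p" )   # one -e per pattern
done
grep -v "${args[@]}" "${oldConfFile[0]}"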

why shell for expression cannot parse xargs parameter correctly

I have a blacklist holding a list of tag IDs, e.g. 1-3,7-9, which actually represents 1,2,3,7,8,9. I can expand it with the shell loop below:
for i in {1..3,7..9}; do for j in {$i}; do echo -n "$j,"; done; done
1,2,3,7,8,9
but first I should convert - to ..
echo -n "1-3,7-9" | sed 's/-/../g'
1..3,7..9
then put it into the for expression as a parameter:
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # for i in {#}; do for j in {$i}; do echo -n "$j,"; done; done
zsh: parse error near `do'
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # echo #
1..3,7..9
but the for expression cannot parse it correctly. Why is that?
Because you didn't do anything to stop the outermost shell from picking up the special keywords and characters (do, for, $, etc.) that you meant to be run by xargs.
xargs isn't a shell built-in; it gets the command line it should run for each element on stdin from its arguments, just like any other program. If you want ; or any other sequence special to the shell inside an argument, you need to escape it somehow.
It seems that what you really want here is to invoke a command (your nested for loops) in a subshell for each input element.
I've come up with this; it seems to do the job:
echo -n "1-3,7-9" \
| sed 's/-/../g' \
| xargs -I # \
bash -c "for i in {#}; do for j in {\$i}; do echo -n \"\$j,\"; done; done;"
which gives:
{1..3},{7..9},
You could use the shell like this to achieve it:
# on macOS, sed needs special treatment to emit a newline
echo "1-3,7-9" | sed -e 's/-/../g' -e $'s/,/\\\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,%
#Linux
echo "1-3,7-9" | sed -e 's/-/../g' -e 's/,/\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,
but this way is a little complicated; maybe awk is more intuitive:
# awk
echo "1-3,7-9,11,13-17" | awk '{n=split($0,a,","); for(i=1;i<=n;i++){m=split(a[i],a2,"-");for(j=a2[1];j<=a2[m];j++){print j}}}' | tr '\n' ','
1,2,3,7,8,9,11,13,14,15,16,17,%
echo -n "1-3,7-9" | perl -ne 's/-/../g;$,=",";print eval $_'
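Since the root problem is that for is shell syntax that xargs cannot receive unescaped, another route is to skip brace expansion entirely and expand the ranges in the shell itself. A minimal bash sketch, with no eval and no xargs:
#!/usr/bin/env bash
spec="1-3,7-9"
IFS=, read -ra parts <<< "$spec"   # split on commas: 1-3 and 7-9
out=()
for p in "${parts[@]}"; do
    if [[ $p == *-* ]]; then
        out+=( $(seq "${p%-*}" "${p#*-}") )   # numeric range: let seq expand it
    else
        out+=( "$p" )                         # single number, keep as-is
    fi
done
(IFS=,; printf '%s\n' "${out[*]}")            # prints 1,2,3,7,8,9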
