How to pass part of an argument to a gnu parallel command - bash

I'm trying to run a GNU parallel command and pass it a bunch of dates, something like this, but with a more complex command:
parallel '/some/binary {}' ::: 20131017 20131018
This works, but then I need the dates to span two different months, and the command should look like this for argument 20131018:
'/some/binary 201310/20131018'
so the first part of the argument is split off. How can I achieve this effect? Thinking in terms of bash variables, I imagine something like:
'/some/binary {:4}/{}' ::: 20130910 20131018 etc...

The command for parallel is interpreted as a shell command, so you can just do
parallel --gnu 'var="{}"; /some/binary "${var:0:6}/$var"' ::: 20131017 20131018
This will execute
/some/binary 201310/20131017
/some/binary 201310/20131018
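Here ${var:0:6} is ordinary bash substring expansion (offset 0, length 6). A quick sanity check outside parallel:
var=20131017
echo "${var:0:6}/$var"   # prints 201310/20131017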

From version 20140722 of GNU parallel you can do:
parallel /some/binary '{=s/..$//=}'/{} ::: 20131017 20131018
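The {= perl expression =} replacement string runs the Perl code with the current argument in $_, so s/..$// strips the two trailing day digits. Assuming the dates are always 8 characters, an equivalent sketch using substr would be:
parallel /some/binary '{=$_=substr($_,0,6)=}'/{} ::: 20131017 20131018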

For pure ugly, don't forget about awk to munge data with the result piped to parallel:
$ echo 20131017 > foo
$ echo 20131018 >> foo
$ awk '{printf "%s/%s\n", substr($1,1,6), $1}' foo | parallel echo
Ugly aside, this is pipeline friendly. A plain print along with some OFS magic would work more cleanly than printf, I suspect; a sketch follows. You could alternatively use sed if that's your jam.
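That plain-print variant might look like this (a sketch against the same foo file):
awk -v OFS=/ '{print substr($1,1,6), $1}' foo | parallel echo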
That said, I'd personally modify /some/binary to not expect such wonky input.

Related

AWK script shebang to allow dash-prefixed arguments

I wanted to write a fairly complex AWK script, which would take a bunch of command-line arguments, parse them and then perform some work.
Unfortunately I ran into trouble trying to pass dash-prefixed (-arg) arguments to the script, as they are being interpreted by AWK instead.
$ ./script.awk -arg
awk: not an option: -arg
I noticed the -- option, but I am unsure how to use it in the shebang meaningfully. I was unable to find any way to get the name of the file and reference it in the script's shebang (something like #!/usr/bin/awk -f $FILE --).
Then I thought maybe the -W exec option could be used to solve the issue, but I keep getting the following error (even without attempting to use the -- option with it), which seems to suggest that the name of the file is not even really being appended to the end of the shebang command.
$ ./script.awk
awk: vacuous option: -W exec
awk: 1: unexpected character '.'
Is there a way to make a standalone (single file, no wrapper script) executable AWK script, which can accept dash-prefixed arguments?
Why am I trying to abuse AWK to this extent? Mostly out of curiosity, but also to get rid of the wrapper shell script, which I currently have to use just to execute the AWK script:
#!/bin/sh
awk -f script.awk -- "$@"
The solution should be POSIX-compliant (assuming AWK's path is /usr/bin/awk). Even if you have a non-POSIX-compliant solution, please share it as well.
Understanding the problem:
As far as I understand, the OP has a complex script called script.awk:
#!/usr/bin/awk -f
BEGIN{print "ARGC", ARGC; for(i=0;i<ARGC;++i) print "ARG"i,ARGV[i]}
which the OP would like to call using various traditional POSIX-style one-letter options, or GNU-style long options. POSIX options start with a single <hyphen> character (-) while long options start with two <hyphen> characters (--). This, however, fails, as awk interprets these arguments as options for awk itself rather than passing them on to the script's argument list. E.g.
$ ./script.awk
ARGC 1
ARG0 awk
$ ./script.awk -arg
awk: not an option: -arg
Question: Is there a way to write a POSIX-compliant script which can handle such hyphenated arguments? (Suggestions are made in the original question.)
Observation 1: While not immediately clear, it must be mentioned that the error message is generated by mawk and not the more common GNU version gawk. Where mawk fails, gawk does not:
$ mawk -f script.awk -arg
mawk: not an option -arg
$ gawk -f script.awk -arg
ARGC 2
ARG0 gawk
ARG1 -arg
Nonetheless, it must be mentioned that for both gawk and mawk, different behaviour can be observed when the arguments clash with awk's own optional arguments. Example:
$ mawk -f script.awk -var # this fails as mawk expects -v ar=foo
mawk: improper assignment: -v ar
$ gawk -f script.awk -var # this fails as gawk expects -v ar=foo
gawk: `oo' argument to `-v' not in `var=value' form
$ gawk -f script.awk -var=1 # this works and creates variable ar
$ mawk -f script.awk -var=1 # this works and creates variable ar
$ mawk -f script.awk -foo # this fails as it expects a file oo
mawk: cannot open oo (No such file or directory)
$ gawk -f script.awk -foo # this fails as it expects a file oo
gawk: fatal: can't open source file `oo' for reading (No such file or directory)
Observation 2: The OP suggests the usage of a double <hyphen> to indicate that the subsequent arguments are not options for awk itself. This, however, is an extension of both mawk and gawk and not part of the POSIX standard.
--: indicates the unambiguous end of options. source: man mawk
--: Signal the end of options. This is useful to allow further arguments to the AWK program itself to start with a -. This provides consistency with the argument parsing convention used by most other POSIX programs. source: man gawk
Furthermore, the usage of the double-hyphen assumes that all arguments after -- are files:
$ ./script.awk -- -arg1 file
ARGC 3
ARG0 mawk
ARG1 -arg1
ARG2 file
mawk: cannot open -arg1 (No such file or directory)
Suggestion 1: While flags are a nice-to-have, you might consider making use of standard POSIX-compliant variable assignments as arguments:
$ ./script.awk arg1=1 arg2=1 arg3=1 file
However, the downside of this is that these assignments are only processed after the BEGIN block is executed (cf. the POSIX standard).
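For example, with a hypothetical demo.awk, an assignment operand is processed only when awk reaches it in ARGV, just before the file that follows it:
$ cat demo.awk
#!/usr/bin/awk -f
BEGIN { print "BEGIN: arg1 =", arg1 }  # not yet assigned here
{ print "main: arg1 =", arg1 }
$ echo x > file
$ ./demo.awk arg1=1 file
BEGIN: arg1 =
main: arg1 = 1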
Suggestion 2: A simple improvement would be to make use of ARGV and ARGC with hyphen-less arguments. This is a bit more BSD-like (cf. ps aux), and could look like:
$ ./script.awk arg1 arg2 arg3
ARGC 4
ARG0 gawk
ARG1 arg1
ARG2 arg2
ARG3 arg3
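A sketch of that approach: read the hyphen-less words out of ARGV in the BEGIN block and delete them, so awk does not later try to open them as input files (the opts array name is illustrative):
BEGIN {
    for (i = 1; i < ARGC; i++) {
        opts[ARGV[i]] = 1  # remember the word as a flag
        delete ARGV[i]     # stop awk treating it as a filename
    }
}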
Suggestion 3: If none of the above options is to your liking, you could consider using a hybrid between sh and awk. The word hybrid implies writing syntax that is recognized by both sh and awk. An awk program is composed of pairs of the form:
pattern { action }
where the pattern can be omitted. This closely resembles the compound-command syntax of sh:
{ compound-list ; }
This allows us now to write the following shell script script.sh:
#!/bin/sh
{ "awk" "-f" "$0" "--" "${#}" ; "exit" ;}
# your awk script comes here
By writing it this way, awk will interpret the first action as nothing more than a concatenation of strings; sh, on the other hand, will execute it as an ordinary command.
Sadly, while it looks promising, this does NOT work due to the effect of the double hyphen.
$ ./script.sh file # this works
ARGC 2
ARG0 awk
ARG1 file
$ ./script.sh -arg1 file # this does not work
ARGC 3
ARG0 mawk
ARG1 -arg1
ARG2 file
mawk: cannot open -arg1 (No such file or directory)
An ugly workaround would be to have the script parse itself and strip off its own sh header before passing the remainder back to awk, but this only solves the problem for scripts consisting solely of a BEGIN block.
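A sketch of that workaround (untested, relying on /dev/stdin being available, and viable only because the program is BEGIN-only, so awk exits before opening any of the dash-prefixed ARGV entries as files):
#!/bin/sh
# strip this sh header (the first four lines) and feed the rest to awk
{ tail -n +5 "$0" | awk -f /dev/stdin -- "$@" ; exit ; }
# the pure awk script starts here
BEGIN { for (i = 0; i < ARGC; i++) print "ARG" i, ARGV[i] }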

Filtering GNU split with a custom shell function

Is there a way to use GNU split --filter with a custom shell function, like
my_func () {
echo $1
}
split -d 10 INPUT_FILE chunk_ --filter="my_func $FILE$"
which I would expect to output
chunk_00
chunk_01
...
Of course the echo in the custom func is just for expressing my question here, in my concrete case the custom function creates a script that uses the chunks from split as input.
It seems that GNU split only accepts standard shell commands within --filter.
Any smart way around this?
You can do this by exporting the function to the environment, which is available to the sub-shell run by split. For example with bash:
ex.sh
#!/bin/bash
my_func() {
    echo "$1"
}
export -f my_func
seq inf | split -d --filter='my_func $FILE' /dev/stdin chunk_  # $FILE is single-quoted so split's sub-shell expands it
If you run it like this:
bash ex.sh | head
The output is:
chunk_00
chunk_01
chunk_02
chunk_03
chunk_04
chunk_05
chunk_06
chunk_07
chunk_08
chunk_09
More details in this answer on UL.
Note that split uses whatever the SHELL variable is set to as the sub-shell to run the --filter command. If you are running a different shell, you may need to add export SHELL=/bin/bash before running split.
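For example (assuming bash is at /bin/bash):
export SHELL=/bin/bash  # split runs the --filter command via $SHELL
seq inf | split -d --filter='my_func $FILE' /dev/stdin chunk_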

Bash how to print out all the input arguments without using loop

I've heard that loops in bash are inefficient and complicated, so I'm trying to make a script that prints out all the arguments in a specific format without using any loop.
When i do
bash script_name arg1 arg2 arg3...
or
./script_name arg1 arg2 arg3...
it should output:
0: script_name
1: arg1
2: arg2
...
I tried to use $@, but the problem is that it never prints the script name, and I'm having trouble coming up with a way to print the index.
Can anyone please give me a hint? Thank you.
Firstly, $@ does not contain the script name; it only contains the actual command-line arguments. You have to print $0 explicitly to print the script name.
Here are two solutions:
exec paste -d: <(seq 0 $#;) <(printf ' %s\n' "$0" "$@";);
The above runs the paste utility to paste together (1) a sequence of numbers from 0 to the number of arguments given, generated by the seq utility, and (2) the script name and command-line arguments, printed by printf, one per line. Each of the seq and printf commands is run in a process substitution construct, with each generated device file representing the command output (e.g. /dev/fd/63 on my system) passed to paste.
exec printf '%s\n' "$0" "$@" | awk '{ print(i++": "$0); };';
The above prints the script name and command-line arguments, one per line, and pipes them to awk, which adds the desired numbering prefix.
It should be noted that each of these two solutions involves starting at least one additional system process, either for process substitution or for a pipeline. That is pretty much guaranteed to be much more inefficient than running the bash built-in for-loop, which requires no additional processes (assuming that you only call built-in stuff within the loop, e.g. printf). In general, built-in commands/keywords/constructs will almost always be faster than running external executables.
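For comparison, the loop being avoided needs nothing but builtins; a minimal sketch:
i=0
for arg in "$0" "$@"; do
    printf '%s: %s\n' "$((i++))" "$arg"
done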

printf, ignoring excess arguments?

I noticed today Bash printf has a -v option
-v var assign the output to shell variable VAR rather than
display it on the standard output
If I invoke like this it works
$ printf -v var "Hello world"
$ printf "$var"
Hello world
Coming from a pipe it does not work
$ grep "Hello world" test.txt | xargs printf -v var
-vprintf: warning: ignoring excess arguments, starting with `var'
$ grep "Hello world" test.txt | xargs printf -v var "%s"
-vprintf: warning: ignoring excess arguments, starting with `var'
xargs will invoke /usr/bin/printf (or wherever that binary is installed on your system). It will not invoke bash's builtin function. And only a builtin (or sourcing a script or similar) can modify the shell's environment.
Even if it could call bash's builtin, the xargs in your example runs in a subshell. The subshell cannot modify its parent's environment anyway. So what you're trying cannot work.
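A minimal illustration of the subshell point (bash with default settings, i.e. no lastpipe):
$ echo hello | read var  # read runs in a subshell at the end of the pipeline
$ echo "var=$var"
var=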
A few options I see if I understand your sample correctly; sample data:
$ cat input
abc other stuff
def ignored
cba more stuff
Simple variable (a bit tricky depending on what exactly you want):
$ var=$(grep a input)
$ echo $var
abc other stuff cba more stuff
$ echo "$var"
abc other stuff
cba more stuff
With an array if you want individual words in the arrays:
$ var=($(grep a input))
$ echo "${var[0]}"-"${var[1]}"
abc-other
Or if you want the whole lines in each array element:
$ IFS=$'\n' var=($(grep a input)) ; unset IFS
$ echo "${var[0]}"-"${var[1]}"
abc other stuff-cba more stuff
There are two printfs: one is a shell builtin, which is invoked if you just run printf, and the other is a regular binary, usually /usr/bin/printf. The latter doesn't take a -v argument, hence the error message. Since printf is an argument to xargs here, the binary is run, not the shell builtin. Additionally, since it's at the receiving end of a pipeline, it is run as a subprocess. Variables can only be inherited from parent to child process, not the other way around, so even if the printf binary could modify the environment, the change wouldn't be visible to the parent process. So there are two reasons why your command cannot work. But you can always do var=$(something | bash -c 'some operation using builtin printf').
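Concretely, that last suggestion could look like the following sketch, which captures the output in the parent shell and only then uses the builtin printf -v:
var=$(grep "Hello world" test.txt)  # command substitution assigns in the parent shell
printf -v formatted '%-25s' "$var"  # builtin printf, so -v works here
echo "$formatted"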
Mat gives an excellent explanation of what's going on and why.
If you want to iterate over the output of a command and set a variable to successive values using Bash's sprintf-style printf feature (-v), you can do it like this:
grep "Hello world" test.txt | xargs bash -c 'printf -v var "%-25s" "$#"; do_something_with_formatted "$var"' _ {} \;

Treat arguments as input stream in bash

Is there any bash trick that allows giving parameters on the command line to a program that reads its input from the input stream? Something like this:
program < 'a=1;b=a*2;'
but < needs a file to redirect from.
For very short here-documents, there are also here-strings:
program <<< "a=1;b=a*2"
I think
echo 'a=1;b=a*2;' | program
is what you need. This process is called "piping".
As a side note: doing the opposite (i.e. passing another program's output as arguments) can be done with xargs
echo works great. Another option is here-documents [1]:
program <<EOF
a=1;b=a*2;
EOF
I use echo when I have one very short thing on one line, and heredocs when I have something that requires newlines.
[1] http://tldp.org/LDP/abs/html/here-docs.html
shopt -s expand_aliases
alias 'xscript:'='<<:ends'
xscript: bc | anotherprog | yetanotherprog ...
a=1;b=a*2;
:ends
Took me a year to hack this one out. Premium bash script here fellas. Give respect where due please :)
I call this little 'ditty' xscript because you can expand bash variables and substitutions inside of the here document.
alias 'script:'='<<":ends"'
The above version does not expand substitutions.
xscript: cat
The files in our path are: `ls -A`
:ends
script: cat
The files in our path are: `ls -A`
:ends
I'm not finished!
source <(xscript: cat
echo \$BASH "hello world, I'mma script genius!"
echo You can thank me now $USER
:ends
)
