I wanted to write a fairly complex AWK script, which would take a bunch of command-line arguments, parse them and then perform some work.
Unfortunately I ran into trouble trying to pass dash-prefixed (-arg) arguments to the script, as they are being interpreted by AWK instead.
$ ./script.awk -arg
awk: not an option: -arg
I noticed the -- option, but I am unsure how to use it in the shebang meaningfully. I was unable to find any way to get the name of the file and reference it in the script's shebang (something like #!/usr/bin/awk -f $FILE --).
Then I thought maybe the -W exec option could be used to solve the issue, but I keep getting the following error (even without attempting to use the -- option with it), which seems to suggest that the name of the file is not even really being appended to the end of the shebang command.
$ ./script.awk
awk: vacuous option: -W exec
awk: 1: unexpected character '.'
Is there a way to make a standalone (single file, no wrapper script) executable AWK script, which can accept dash-prefixed arguments?
Why am I trying to abuse AWK to this extent? Mostly out of curiosity, but also to get rid of the wrapper shell script, which I currently have to use just to execute the AWK script:
#!/bin/sh
awk -f script.awk -- "$@"
The solution should be POSIX-compliant (assuming AWK's path is /usr/bin/awk). Even if you have a non-POSIX-compliant solution, please share it as well.
Understanding the problem:
As far as I understand, the OP has a complex script called script.awk:
#!/usr/bin/awk -f
BEGIN{print "ARGC", ARGC; for(i=0;i<ARGC;++i) print "ARG"i,ARGV[i]}
which the OP would like to call with various traditional POSIX-style one-letter options or GNU-style long options. POSIX options start with a single <hyphen> character (-) while long options start with two <hyphen> characters (--). This, however, fails, as awk interprets these arguments as directed at awk itself rather than appending them to the script's argument list. E.g.
$ ./script.awk
ARGC 1
ARG0 awk
$ ./script.awk -arg
awk: not an option: -arg
Question: Is there a way to write a POSIX compliant script which can handle such hyphenated arguments? (Suggestions are made in the original question.)
Observation 1: While not immediately clear, it must be mentioned that the error message is generated by mawk and not the more common GNU version gawk. Where mawk fails, gawk does not:
$ mawk -f script.awk -arg
mawk: not an option: -arg
$ gawk -f script.awk -arg
ARGC 2
ARG0 gawk
ARG1 -arg
Nonetheless, it must be mentioned that for both gawk and mawk, different behaviour can be observed when the arguments clash with awk's own options. Example:
$ mawk -f script.awk -var # this fails as mawk expects -v ar=foo
mawk: improper assignment: -v ar
$ gawk -f script.awk -var # this fails as gawk expects -v ar=foo
gawk: `oo' argument to `-v' not in `var=value' form
$ gawk -f script.awk -var=1 # this works and creates variable ar
$ mawk -f script.awk -var=1 # this works and creates variable ar
$ mawk -f script.awk -foo # this fails as it expects a file oo
mawk: cannot open oo (No such file or directory)
$ gawk -f script.awk -foo # this fails as it expects a file oo
gawk: fatal: can't open source file `oo' for reading (No such file or directory)
Observation 2: The OP suggests using a double-<hyphen> (--) to indicate that the options before it belong to awk and everything after it does not. This, however, is an extension of both mawk and gawk and not part of the POSIX standard.
--: indicates the unambiguous end of options. source: man mawk
--: Signal the end of options. This is useful to allow further arguments to the AWK program itself to start with a -. This provides consistency with the argument parsing convention used by most other POSIX programs. source: man gawk
Furthermore, the usage of the double-hyphen assumes that all arguments after -- are files:
$ ./script.awk -- -arg1 file
ARGC 3
ARG0 mawk
ARG1 -arg1
ARG2 file
mawk: cannot open -arg1 (No such file or directory)
Suggestion 1: While flags are a nice-to-have, you might consider making use of standard POSIX-compliant assignments as arguments:
$ ./script.awk arg1=1 arg2=1 arg3=1 file
However, the downside of this is that these assignments are only processed after the BEGIN block has been executed, as the sketch below illustrates. (cfr. POSIX standard)
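A minimal sketch illustrating that timing (the script name assign.awk and the trailing - for standard input are illustrative):
$ cat assign.awk
#!/usr/bin/awk -f
BEGIN { print "in BEGIN, arg is:", arg }   # not yet assigned here
{ print "per record, arg is:", arg }       # assigned once awk reaches the operand
$ echo hello | ./assign.awk arg=1 -
in BEGIN, arg is:
per record, arg is: 1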
Suggestion 2: a simple improvement would be to make use of ARGV and ARGC and use hyphen-less arguments. This is a bit more BSD-like (cfr ps aux), and could look like:
$ ./script.awk arg1 arg2 arg3
ARGC 4
ARG0 gawk
ARG1 arg1
ARG2 arg2
ARG3 arg3
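Such hyphen-less words can then be consumed in the BEGIN block and blanked out of ARGV, so awk does not later treat them as file operands (awk skips null ARGV elements). A minimal sketch; the flag names verbose and debug are made up for illustration:
#!/usr/bin/awk -f
BEGIN {
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] == "verbose")    { verbose = 1; ARGV[i] = "" }
        else if (ARGV[i] == "debug") { debug = 1;   ARGV[i] = "" }
        else break   # first non-flag word: leave the rest as file operands
    }
    if (verbose) print "verbose mode enabled"
}
{ print FILENAME ": " $0 }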
Suggestion 3: If none of the above options is to your liking, you might consider using a hybrid between sh and awk. The word hybrid implies we write syntax that is recognized by both sh and awk. An awk program is composed of pairs of the form:
pattern { action }
where pattern can be ignored. This resembles closely the compound command syntax of sh:
{ compound-list ; }
This allows us now to write the following shell script script.sh:
#!/bin/sh
{ "awk" "-f" "$0" "--" "${#}" ; "exit" ;}
# your awk script comes here
By writing it this way, awk interprets the first action as nothing more than a concatenation of strings, while sh executes it as a normal compound command.
Sadly, while it looks promising, this does NOT work due to the effect of the double hyphen.
$ ./script.sh file # this works
ARGC 2
ARG0 awk
ARG1 file
$ ./script.sh -arg1 file # this does not work
ARGC 3
ARG0 mawk
ARG1 -arg1
ARG2 file
mawk: cannot open -arg1 (No such file or directory)
An ugly workaround is to have the script strip off its own sh header and feed the remainder back to awk, so that awk never parses the header (which itself forms a main rule). But this only solves the problem for scripts consisting solely of a BEGIN block, since any main rule makes awk attempt to open the remaining operands as input files.
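For completeness, a minimal sketch of that idea (untested across implementations; the tail offset is tied to the header length, and the construct is only safe for BEGIN-only programs):
#!/bin/sh
# sh strips these first four lines and hands the remainder to awk as
# program text, so awk never parses the sh header itself.
exec awk "$(tail -n +5 "$0")" "$@"
BEGIN { print "ARGC", ARGC; for (i = 0; i < ARGC; ++i) print "ARG" i, ARGV[i] }
Because the user's arguments now come after the program-text operand, awk no longer parses them as options, and because the remaining program is BEGIN-only, awk never attempts to open them as input files.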
Related
I have a file with the below commands:
cat /some/dir/with/files/file1_name.tsv|awk -F "\\t" '{print $21$19$23$15}'
cat /some/dir/with/files/file2_name.tsv|awk -F "\\t" '{print $2$13$3$15}'
cat /some/dir/with/files/file3_name.tsv|awk -F "\\t" '{print $22$19$3$15}'
When I loop through the file to run the commands, I get the below error:
cat file | while read line; do $line; done
cat: invalid option -- 'F'
Try `cat --help' for more information.
You are not executing the command the way you intend. Since you are reading the file line by line (for reasons unknown), you could invoke the interpreter directly on each line, as below:
#!/bin/bash
# ^^^^ for running under 'bash' shell
while IFS= read -r line
do
printf "%s" "$line" | bash
done <file
But this has the overhead of forking a new process for each line of the file. If the commands in the file are harmless and safe to run in one shot, you can simply run
bash file
and be done with it.
Also, when using awk, pass the file name directly as below for each of the lines, to avoid the useless cat:
awk -F "\\t" '{print $21$19$23$15}' file1_name.tsv
You are expecting the pipe (|) symbol to act as you are accustomed to, but it doesn't. To help you understand, try this:
A="ls / | grep e" # Loads a variable with a command with pipe
$A # Does not work
eval "$A" # Works
When expanding a variable without eval, expansion and word splitting occur after the shell has interpreted redirections and pipes, so your pipe symbol is seen as just a literal character.
Some options you have:
A) Avoid piping, by passing the file name as an argument
awk -F "\\t" '{print $21$19$23$15}' /some/dir/with/files/file1_name.tsv
B) Use eval as shown above; I would suggest you research its potential security implications.
C) Put the arguments in a file and parse it, avoiding the use of eval; something like the following (a usage sketch follows after this list):
# Assumes arguments separated by spaces
IFS=" " read -r -a arguments;
awk "${arguments[@]}"
D) Implement the parsing of your data files in Bash instead of awk, and use your configuration file to specify output without the need for expanding anything (e.g. by specifying fields to print separated by spaces).
The first three approaches involve some form of interpreting outside data as code, which carries risk if the input file cannot be guaranteed safe. Approach C might be considered a bit better in that regard, but since the command being called is awk, an actual awk program is still passed to it, so an attacker (or careless user) with write access to your file can make your script do anything awk can do.
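As a usage sketch for approach C (the config file name and its contents are made up; note that the naive space-splitting means the awk program itself must not contain spaces):
$ cat config
-F : {print$1}
$ IFS=" " read -r -a arguments < config
$ printf 'root:x:0\n' | awk "${arguments[@]}"
root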
I want to pass command-line options that start with a dash (- or --) to a Perl program I am running with the -e flag:
$ perl -E 'say @ARGV' -foo
Unrecognized switch: -foo (-h will show valid options).
Passing arguments that don't start with a - obviously works:
$ perl -E 'say @ARGV' foo
foo
How do I properly escape those so the program reads them correctly?
I tried a bunch of variations like \-foo, \\-foo, '-foo', '\-foo', '\\-foo'. None of those work, though some produce different messages. \\-foo actually runs and outputs \-foo.
You can use the -s switch, like:
perl -se 'print "got $some\n"' -- -some=SOME
the above prints:
got SOME
From perlrun:
-s enables rudimentary switch parsing for switches on the command line after the program name but before any filename arguments (or before an argument of --). Any switch found there is removed from @ARGV and sets the corresponding variable in the Perl program. The following program prints "1" if the program is invoked with a -xyz switch, and "abc" if it is invoked with -xyz=abc.
#!/usr/bin/perl -s
if ($xyz) { print "$xyz\n" }
Do note that a switch like --help creates the variable ${-help}, which is not compliant with use strict "refs". Also, when using this option on a script with warnings enabled you may get a lot of spurious "used only once" warnings.
For the simple arg-passing use the --, like:
perl -E 'say "#ARGV"' -- -some -xxx -ddd
prints
-some -xxx -ddd
Just pass -- before the flags that are to go to the program, like so:
perl -e 'print join("/", #ARGV)' -- -foo bar
prints
-foo/bar
I am trying to create a file using the following script (see below). While the script runs without errors (at least according to shellcheck), I cannot get the resulting file to have the correct name.
#!/bin/bash
# Set some variables
export site_path=~/Documents/Blog
drafts_path=~/Documents/Blog/_drafts
title="$title"
# Create the filename
title=$("$title" | "awk {print tolower($0)}")
filename="$title.markdown"
file_path="$drafts_path/$filename"
echo "File path: $file_path"
# Create the file, Add metadata fields
cat >"$file_path" <<EOL
---
title: \"$title\"
layout:
tags:
---
EOL
# Open the file in BBEdit
bbedit "$file_path"
exit 0
Very new to bash, so I'm not quite sure what I'm doing wrong...
The most glaring error is this:
title=$("$title" | "awk {print tolower($0)}")
It's wrong for several reasons:
This pipeline runs "$title" as a command -- meaning that it looks for a command named with the title of your blog post to run -- and pipes the output of that command (a command that presumably won't exist) to awk.
Using double-quotes around the entire awk command means you're looking for a command named something like /usr/bin/awk {print tolower(-bash)} (if $0 evaluates to -bash, which it will in an interactive login shell; behavior will differ elsewhere).
Using double-quotes rather than single-quotes to protect your awk script means that the $0 gets evaluated by the shell rather than by awk.
A better alternative might look like:
title=$(awk '{print tolower($0)}' <<<"$title")
...or, to use simpler tools:
title=$(tr '[:upper:]' '[:lower:]' <<<"$title")
...or, to use bash 4.x built-in functionality:
title=${title,,}
Of course, all that assumes that title is set to start with. If you aren't passing it through your environment, you might want something like title=$1 rather than title="$title" earlier in your script.
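Putting those pieces together, the top of the script might look like this (a sketch assuming the title is passed as the first argument):
#!/bin/bash
drafts_path=~/Documents/Blog/_drafts
title=$1                                        # take the title from the command line
title=$(awk '{print tolower($0)}' <<<"$title")  # lowercase it
filename="$title.markdown"
file_path="$drafts_path/$filename"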
I have a shell script that I'm writing to search for a process by name and return output if that process is over a given value.
I'm working on finding the named process first. The script currently looks like this:
#!/bin/bash
findProcessName=$1
findCpuMax=$2
#echo "parameter 1: $findProcessName, parameter2: $findCpuMax"
tempFile=`mktemp /tmp/processsearch.XXXXXX`
#echo "tempDir: $tempFile"
processSnapshot=`ps aux > $tempFile`
findProcess=`awk -v pname="$findProcessName" '/pname/' $tempFile`
echo "process line: "$findProcess
`rm $tempFile`
The error occurs when I try to pass the variable into the awk command. I checked my version of awk and it definitely does support the -v flag.
If I replace the '/pname/' portion of the findProcess variable assignment, the script works.
I checked my syntax and it looks right. Could anyone point out where I'm going wrong?
The processSnapshot variable will always be empty: the ps output is going to the file.
When you pass the pattern as a variable, use the pattern-match operator; inside /pname/ the word pname is matched literally, not expanded as a variable:
findProcess=$( awk -v pname="$findProcessName" '$0 ~ pname' $tempFile )
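A quick illustration of the difference between the literal /pname/ and the $0 ~ pname form:
$ echo pname | awk -v pname=ssh '/pname/'    # matches the literal text "pname"
pname
$ echo sshd | awk -v pname=ssh '$0 ~ pname'  # matches the value of the variable
sshd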
Only use backticks when you need the output of a command. This
`rm $tempFile`
executes the rm command, returns its output back to the shell, and, if that output is non-empty, the shell attempts to execute it as a command.
$ `echo foo`
bash: foo: command not found
$ `echo whoami`
jackman
Remove the backticks.
Of course, you don't need the temp file at all:
pgrep -fl "$findProcessName"
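Putting the fixes together without the temp file, a sketch of the same script:
#!/bin/bash
findProcessName=$1
findCpuMax=$2
# No temp file needed: filter the ps output directly.
# (Note: the awk process itself may match the pattern.)
findProcess=$(ps aux | awk -v pname="$findProcessName" '$0 ~ pname')
echo "process line: $findProcess"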
I noticed today Bash printf has a -v option
-v var assign the output to shell variable VAR rather than
display it on the standard output
If I invoke it like this, it works:
$ printf -v var "Hello world"
$ printf "$var"
Hello world
Coming from a pipe, it does not work:
$ grep "Hello world" test.txt | xargs printf -v var
-vprintf: warning: ignoring excess arguments, starting with `var'
$ grep "Hello world" test.txt | xargs printf -v var "%s"
-vprintf: warning: ignoring excess arguments, starting with `var'
xargs will invoke /usr/bin/printf (or wherever that binary is installed on your system). It will not invoke bash's builtin function. And only a builtin (or sourcing a script or similar) can modify the shell's environment.
Even if it could call bash's builtin, the xargs in your example runs in a subshell. The subshell cannot modify its parent's environment anyway. So what you're trying cannot work.
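A quick demonstration of the subshell problem, assuming var is not already set: in bash, the receiving end of a pipeline runs in a subshell, so an assignment made there does not survive:
$ echo hello | { read -r line; printf -v var '%s' "$line"; echo "inside: $var"; }
inside: hello
$ echo "outside: $var"
outside: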
A few options I see if I understand your sample correctly; sample data:
$ cat input
abc other stuff
def ignored
cba more stuff
Simple variable (a bit tricky depending on what exactly you want):
$ var=$(grep a input)
$ echo $var
abc other stuff cba more stuff
$ echo "$var"
abc other stuff
cba more stuff
With an array, if you want the individual words in the array:
$ var=($(grep a input))
$ echo "${var[0]}"-"${var[1]}"
abc-other
Or if you want the whole lines in each array element:
$ IFS=$'\n' var=($(grep a input)) ; unset IFS
$ echo "${var[0]}"-"${var[1]}"
abc other stuff-cba more stuff
There are two printfs - one is a shell builtin, which is invoked if you just run printf, and the other is a regular binary, usually /usr/bin/printf. The latter doesn't take a -v argument, hence the error message. Since printf is an argument to xargs here, the binary is run, not the shell builtin. Additionally, since it's at the receiving end of a pipeline, it is run as a subprocess. Variables can only be inherited from parent to child process, not the other way around, so even if the printf binary could modify the environment, the change wouldn't be visible to the parent process. So there are two reasons why your command cannot work. But you can always do var=$(something | bash -c 'some operation using builtin printf').
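For example, a minimal sketch in that spirit: capture the pipeline's output with command substitution in the current shell, then apply the builtin printf -v (assuming test.txt contains a single 'Hello world' line):
$ var=$(grep "Hello world" test.txt)
$ printf -v formatted '%-25s' "$var"
$ echo "[$formatted]"
[Hello world              ]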
Mat gives an excellent explanation of what's going on and why.
If you want to iterate over the output of a command and set a variable to successive values using Bash's sprintf-style printf feature (-v), you can do it like this:
grep "Hello world" test.txt | xargs bash -c 'printf -v var "%-25s" "$#"; do_something_with_formatted "$var"' _ {} \;