Function with awk to print single or multiple columns - bash

I use awk a lot to select single columns and after learning what an alias was I started off with
alias a1='awk '\''{print $1}'\'
alias a2='awk '\''{print $2}'\'
...
After I learned a little more I thought those were cheesy and replaced them with
function a() {
    awk "{print \$$1}"
}
so now I can do a 3 or a 11 without needing to create explicit aliases.
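For example:
$ echo "one two three" | a 2
two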
So that's good, but sometimes I need to select more than one column, and when I do I have to resort to typing out the actual full awk '{print ...}' command (the horror!).
So I'm trying to come up with a way to do something similar to the a function but one that will accept different numbers of arguments, so I could do a 3 or a 5 7 or a 2 4 9.
I've tried different things with $# and $* but can't get it right, and everything I'm trying now feels like a cheesy workaround, so I'd rather just stop and ask how to do it the proper way.
Thanks all.

$ cat tst.sh
function a {
    awk -v args="$*" '
        BEGIN { n=split(args,f) }
        { for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS) }
    '
}
echo "a b c d e f" | a 1 3 5
echo "---"
echo "a b c d e f" | a 1 3 4 6
$ ./tst.sh
a c e
---
a c d f

You could get arbitrarily complicated with this sort of thing (what if you wanted to be able to say a 2-5 7 11-, as with cut? see the sketch at the end of this answer), but here's one that will work with a list of numbers:
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }
That requires a bit of explanation.
a() { ... }
defines a shell function, which differs from an alias in various ways, one of which is that you can give it parameters.
Inside the shell function, I want to change the value of IFS; to avoid having to remember the old value and change it back, I surround the command I actually want to execute with (...), which causes it to execute in a subshell. When the subshell finishes, all of its environment changes vanish with it, which effectively makes the change to IFS local.
IFS is the set of characters used for word splitting, but it also defines the character used to separate elements in the expansion of "$*" (that is, the list of function or script arguments) when it is surrounded by quotes. So setting it to , means the $* expansion will be a comma-separated list.
The awk program I want to create is actually something like {print $1,$4,$7}, so aside from putting commas between the list, I need to add a $ before each number. I do that with the bash parameter expansion substitution syntax: ${parameter/pattern/replacement}. By specifying * as the parameter, I get $* with the substitution applied to each argument. (Note that the expansion is quoted. If it weren't, it wouldn't work.)
In the replacement expression, the pattern is empty because the # character at the beginning of the pattern indicates that the match must be at the beginning of the string. Since the actual pattern is empty, the first match will always be at the beginning of the string, and the replacement ($) will therefore be inserted at the beginning of each argument. The # is needed because // is syntactically different: it means "change all occurrences of the pattern" instead of just the first one.
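You can watch this expansion happen in isolation (a quick sketch; the last line is the output):
$ set -- 1 4 7
$ (IFS=,; echo "${*/#/$}")
$1,$4,$7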
Unlike many languages, in bash search-and-replace expressions are not terminated with a /, but rather with the matching }. If you type ${p/foo/bar/}, it will replace the first instance of foo with bar/.
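As for the cut-style ranges mentioned at the top of this answer, here is a sketch of one way to get them, reusing the awk approach from the first answer (my own illustration, not thoroughly tested; open-ended ranges fall back to field 1 or NF):
a() {
    awk -v args="$*" '
        BEGIN { n = split(args, spec) }
        {
            out = ""
            for (i = 1; i <= n; i++) {
                # "2-5", "11-", and "-3" are ranges; a bare number is one field
                if (split(spec[i], r, "-") == 2) {
                    lo = (r[1] == "" ? 1 : r[1]+0); hi = (r[2] == "" ? NF : r[2]+0)
                } else {
                    lo = hi = spec[i]+0
                }
                for (j = lo; j <= hi; j++)
                    out = out (out == "" ? "" : OFS) $j
            }
            print out
        }'
}
$ echo "a b c d e f" | a 2-4 6
b c d f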

Related

Parameter expansion with replacement, avoid additional variable

I'm trying to join the input $*, which is one string consisting of all the parameters joined together.
This works.
#!/bin/bash
foo() {
    params="${*}"
    echo "${params//[[:space:]]/-}"
}
foo 1 2 3 4
1-2-3-4
However, is it possible to skip the assignment of variable?
"${"${*}"//[[:space:]]/-}"
I'm getting bad substitution error.
I can also do
: "${*}"
echo "${_//[[:space:]]/-}"
But it feels hacky.
One option could be to set bash's internal field separator, IFS, to - locally and just echo "$*":
foo() {
    local IFS=$'-'
    echo "$*"
}
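Usage, matching the output from the question:
$ foo 1 2 3 4
1-2-3-4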
To answer your question, you can do global pattern substitutions on the positional parameters like this:
${*//pat/sub}
${@//pat/sub}
And also arrays like this:
${arr[*]//pat/sub}
${arr[@]//pat/sub}
This won’t join the parameters, but it will substitute inside them.
Setting IFS to dash adds a dash in between each parameter for echo "$*", or p=$*, but won’t replace anything inside a parameter.
Eg:
$ set -- aa bb 'cc cc'
$ IFS=-
$ echo "$*"
aa-bb-cc cc
To remove all whitespace, including inside a parameter, you can combine them:
IFS=-
echo "${*//[[:space:]]/-}"
Or just assign to a name first, like you were doing:
no_spaces=$*
echo "${no_spaces//[[:space:]]/-}"

Looping through variable with spaces

This piece of code works as expected:
for var in a 'b c' d;
do
    echo $var;
done
The bash script loops through 3 arguments printing
a
b c
d
However, if this string is read in via jq, and then looped over like so:
JSON_FILE=path/to/jsonfile.json
ARGUMENTS=$(jq -r '.arguments' "${JSON_FILE}")
for var in ${ARGUMENTS};
do
    echo $var;
done
The result is 4 arguments as follows:
a
'b
c'
d
Example json file for reference:
{
"arguments" : "a 'b c' d"
}
What is the reason for this? I tried putting quotes around the variable as suggested in other SO answers, but that caused everything to be handled as just 1 argument.
What can I do to get the behavior of the first case (3 arguments)?
What is the reason for this?
The word splitting expansion is run over the unquoted results of other expansions. Because the ${ARGUMENTS} expansion in for var in ${ARGUMENTS}; is unquoted, word splitting is performed. Note that word splitting ignores quotes that result from variable expansion - it only cares about whitespace.
What can I do to get the behavior of the first case (3 arguments)?
The good way™ would be to write your own parser that understands the quotes inside the string and splits the arguments accordingly.
Instead, I advise using xargs, which (by default - usually a confusing behavior) parses quotes in its input strings:
$ arguments="a 'b c' d"
$ echo "${arguments}" | xargs -n1 echo
a
b c
d
# convert to array
$ readarray -d '' arr < <(<<<"${arguments}" xargs printf "%s\0")
As presented in the other answer, you may use eval, but please do not: eval is evil, and it will run expansions over the input string.
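To see why, here is a sketch of the failure mode (hypothetical input; the file path is illustrative):
arguments='a $(touch /tmp/pwned) d'   # command substitution hiding in the data
eval "set -- $arguments"              # eval expands and RUNS $(touch /tmp/pwned)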
Change IFS to a newline to make it work (it must be $'\n'; a plain '\n' is just the two characters backslash and n):
...
IFS=$'\n'; for var in $ARGUMENTS;
do
    echo $var;
done

return array from perl to bash

I'm trying to get back an array from perl to bash.
My perl script has an array and then I use return(@arr)
from my bash script I use
VAR = `perl....
when I echo VAR
I get the array as 1 long string with all the array vars connected with no spaces.
Thanks
In the shell (and in Perl), backticks (``) capture the output of a command. However, Perl's return is normally for returning values from subroutines - it does not produce output, so you probably want print instead. Also, in bash, array variables are declared with parentheses. So this works for me:
$ ARRAY=(`perl -wMstrict -le 'my @array = qw/foo bar baz/; print "@array"'`); \
echo "<${ARRAY[*]}> 0=${ARRAY[0]} 1=${ARRAY[1]} 2=${ARRAY[2]}"
<foo bar baz> 0=foo 1=bar 2=baz
In Perl, interpolating an array into a string (like "@array") will join the array with the special variable $" in between elements; that variable defaults to a single space. If you simply print @array, then the array elements will be joined by the variable $,, which is undef by default, meaning no space between the elements. This probably explains the behavior you mentioned ("the array vars connected with no spaces").
Note that the above will not work the way you expect if the elements of the array contain whitespace, because bash will split them into separate array elements. If your array does contain whitespace, then please provide an MCVE with sample data so we can perhaps make an alternative suggestion of how to return that back to bash. For example:
( # subshell so IFS is only affected locally
    IFS=$'\n'
    ARRAY=(`perl -wMstrict -e 'my @array = ("foo","bar","quz baz"); print join "\n", @array'`)
    echo "0=<${ARRAY[0]}> 1=<${ARRAY[1]}> 2=<${ARRAY[2]}>"
)
Outputs: 0=<foo> 1=<bar> 2=<quz baz>
Here is one way using Bash word splitting; it will split the string on whitespace into a new array named array:
array_str=$(perl -E '@a = 1..5; say "@a"')
array=( $array_str )
for item in ${array[@]} ; do
    echo ": $item"
done
Output:
: 1
: 2
: 3
: 4
: 5
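Note that the unquoted expansions above are subject to globbing as well as word splitting; here is a sketch of a safer bash 4+ variant using readarray (one array element per input line):
array_str=$(perl -E '@a = 1..5; say for @a')
readarray -t array <<< "$array_str"
for item in "${array[@]}"; do
    echo ": $item"
done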

How to parse a string into variables?

I know how to parse a string into variables in the manner of this SO question, e.g.
ABCDE-123456
becomes:
var1=ABCDE
var2=123456
via, say, cut. I can do that in one script, no problem.
But I have a few dozen scripts which parse strings/arguments all in the same fashion (same arguments & variables, i.e. same parsing strategy).
And sometimes I need to make a change or add a variable to the parsing mechanism.
Of course, I could go through every one of my dozens of scripts and change the parsing manually (even if just copy & paste), but that would be tedious and error-prone.
Is there a modular way to parse strings/arguments like this?
I thought of writing a script which parses the string/args into variables and then exports them, but the export command does not work from child to parent (only vice versa).
Something like this might work:
parse_it () {
    SEP=${SEP--}
    string=$1
    names=${@:2}
    IFS="$SEP" read $names <<< "$string"
}
$ parse_it ABCDE-123456 var1 var2
$ echo "$var1"
ABCDE
$ echo "$var2"
123456
$ SEP=: parse_it "foo:bar:baz" id1 id2 id3
$ echo $id2
bar
The first argument is the string to parse, the remaining arguments are names of variables that get passed to read as the variables to set. (Not quoting $names here is intentional, as we will let the shell split the string into multiple words, one per variable. Valid variable names consist of only _, letters, and numbers, so there are no worries about undesired word splitting or pathname generation by not quoting $names). The function assumes the string uses a single separator of "-", which can be overridden via the environment.
For more complex parsing, you may want to use a custom regular expression (bash 4.2 or later is required for the -g flag to declare):
parse_it () {
    reg_ex=$1
    string=$2
    shift 2
    [[ $string =~ $reg_ex ]] || return
    i=1
    for name; do
        declare -g "$name=${BASH_REMATCH[i++]}"
    done
}
$ parse_it '(.*)-(.*):(.*)' "abc-123:xyz" id1 id2 id3
$ echo "$id2"
123
I think what you really want is to write your function in one script and include it in all of your other scripts.
You can include other shell scripts with the source or . command.
For example, you can define your parse function in parseString.sh
function parseString {
    ...
}
And then in any of your other scripts, do
source parseString.sh
# now we can call parseString function
parseString abcde-12345
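For instance, parseString.sh might contain something like this (a sketch assuming the ABCDE-123456 format from the question; the variable names are illustrative):
# parseString.sh - shared parsing routine, sourced by every script that needs it
function parseString {
    IFS=- read -r var1 var2 <<< "$1"
}
Because source runs the file in the calling shell, parseString abcde-12345 then sets var1 and var2 directly in the caller, and a parsing change only ever has to be made in this one file.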

Reading java .properties file from bash

I am thinking of using sed for reading a .properties file, but was wondering if there is a smarter way to do that from a bash script?
This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
    # -f2- (rather than -f2) keeps values that themselves contain an "=" sign
    grep "^$2=" "$1" | cut -d'=' -f2-
}
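Usage might look like this (hypothetical file name and key):
db_url=$(get_property /etc/myapp.properties database.url)
Note that grep treats the key as a regular expression, so a dot in the key will also match any character.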
The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
    FS="=";
    print "# BEGIN";
    n="";
    v="";
    c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
    c=0;
    next;
}
/\\$/ && (c==0) && (NF>=2) { # Name-value pair with a line continuation...
    e=index($0,"=");
    n=substr($0,1,e-1);
    v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
    c=1; # Line continuation mode.
    next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
    v= "" v substr($0,1,length($0)-1);
    next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single-line name/value pair
    if (c==0) { # Single-line name/value pair
        e=index($0,"=");
        n=substr($0,1,e-1);
        v=substr($0,e+1,length($0) - e);
    } else { # Line continuation mode - last line of the value.
        c=0; # Turn off line continuation mode.
        v= "" v $0;
    }
    # Make sure the name is a legal shell variable name
    gsub(/[^A-Za-z0-9_]/,"_",n);
    # Remove newlines from the value.
    gsub(/[\n\r]/,"",v);
    print n "=\"" v "\"";
    n = "";
    v = "";
}
END {
    print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
Here is my wiki page on this very topic.
I wrote a script to solve the problem and put it on my github.
See properties-parser
One option is to write a simple Java program to do it for you - then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wised up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.
If you want to use sed to parse -any- .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc: http://en.wikipedia.org/wiki/.properties
One possible workaround would be using java itself to preprocess the .properties file into something bash-friendly, then source it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.
In Perl:
while(<STDIN>) {
    ($prop,$val)=split(/[=: ]/, $_, 2);
    # and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.
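A sketch of what that could look like from the shell (assuming the CPAN Config::Properties module is installed; the file name and key are illustrative):
perl -MConfig::Properties -e '
    open my $fh, "<", $ARGV[0] or die "open: $!";
    my $p = Config::Properties->new;
    $p->load($fh);
    print $p->getProperty($ARGV[1]), "\n";
' my.properties some.property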
I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.
If you want to use "shell", the best tool for parsing files with proper programming control is (g)awk. Use sed only for simple substitutions.
I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.
Hmm, I just ran into the same problem today. This is a poor man's solution, admittedly more straightforward than clever ;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots should be replaced only before '=', no comment handling), but could be a starting point.
@Daniel, I tried to source it, but Bash didn't like dots in variable names.
I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
    local name=$1
    eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'
This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
    local VAR="$1"
    shift
    awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
        function loc(text) { return index($0, text) }
        function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
        { sub(/^[ \t]+/, ""); eq = loc("=") }
        substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
        loc("=" QQ) == eq { unquote(QQ); exit }
        loc("=" Q) == eq { unquote( Q); exit }
        { print substr($1, eq + 1); exit }
    ' "$@"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to differ from your format only in whether there is white space around the assignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab; this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
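Usage sketch (hypothetical file contents and names):
# Suppose config.sh contains the line: GREETING='hello world'
$ _getvar GREETING config.sh
hello world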
In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendy (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by @david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release
