Reading java .properties file from bash - bash

I am thinking of using sed to read a .properties file, but was wondering if there is a smarter way to do that from a bash script?

This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
grep "^$2=" "$1" | cut -d'=' -f2-
}
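One thing to watch with the grep approach is that the key is used as a regular expression, so the dots in a key like db.url also match any other character. Here is a rough sketch of a variant that compares the key literally and returns everything after the first '='. The name get_property_exact is made up for illustration, and it assumes plain key=value lines with no spaces around the '=' and no line continuations:

```shell
#!/usr/bin/env bash
# Usage: get_property_exact FILE KEY
# Hypothetical helper, not part of the answer above.
get_property_exact() {
  awk -v key="$2" '
    { eq = index($0, "=") }                      # position of the first "="
    eq > 0 && substr($0, 1, eq - 1) == key {     # literal comparison, no regex
      print substr($0, eq + 1)                   # everything after the first "="
      exit
    }
  ' "$1"
}
```

Call it as get_property_exact my.properties db.url; it prints nothing if the key is absent.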

The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
FS="=";
print "# BEGIN";
n="";
v="";
c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
c=0;
next;
}
/\\$/ && (c==0) && (NF>=2) { # Name value pair with a line continuation...
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
c=1; # Line continuation mode.
next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
v= "" v substr($0,1,length($0)-1);
next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single line name/value pair
if (c==0) { # Single line name/value pair
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e);
} else { # Line continuation mode - last line of the value.
c=0; # Turn off line continuation mode.
v= "" v $0;
}
# Make sure the name is a legal shell variable name
gsub(/[^A-Za-z0-9_]/,"_",n);
# Remove newlines from the value.
gsub(/[\n\r]/,"",v);
print n "=\"" v "\"";
n = "";
v = "";
}
END {
print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
Here is my wiki page on this very topic.

I wrote a script to solve the problem and put it on my github.
See properties-parser

One option is to write a simple Java program to do it for you - then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wisened up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.

If you want to use sed to parse -any- .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc: http://en.wikipedia.org/wiki/.properties
One possible workaround would be to use Java itself to preprocess the .properties file into something bash-friendly, then source it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.

In Perl:
while(<STDIN>) {
($prop,$val)=split(/[=: ]/, $_, 2);
# and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.

I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.

If you want to use "shell", the best tool to parse files and have proper programming control is (g)awk. Use sed only for simple substitutions.

I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.

Hmm, I just ran into the same problem today. This is a poor man's solution, admittedly more straightforward than clever ;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots should be replaced only before '=',no comment handling), but could be a starting point.
@Daniel, I tried to source it, but Bash didn't like dots in variable names.
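For what it's worth, the two limitations mentioned (dots replaced everywhere, no comment handling) can also be worked around in plain bash. This is only a sketch: load_properties is a made-up name, and it assumes simple key=value lines without spaces around the '=', without line continuations, and with Unix line endings:

```shell
#!/usr/bin/env bash
# Usage: load_properties FILE
# Hypothetical helper: sets one shell variable per property, without eval.
load_properties() {
  local line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*|'!'*) continue ;;            # skip blank lines and comments
    esac
    [[ $line == *=* ]] || continue         # ignore lines without "="
    key=${line%%=*}                        # text before the first "="
    value=${line#*=}                       # text after the first "="
    key=${key//[^A-Za-z0-9_]/_}            # dots etc. become "_" in the key only
    [[ $key == [A-Za-z_]* ]] || continue   # must be a legal variable name
    printf -v "$key" '%s' "$value"         # assign without eval
  done < "$1"
}
```

After load_properties my.properties, a property my.java.prop is available as $my_java_prop, and dots inside the value are left alone. Properties that happen to be called line, key or value would clash with the function's local variables.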

I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
local name=$1
eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'

This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
local VAR="$1"
shift
awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
function loc(text) { return index($0, text) }
function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
{ sub(/^[ \t]+/, ""); eq = loc("=") }
substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
loc("=" QQ) == eq { unquote(QQ); exit }
loc("=" Q) == eq { unquote( Q); exit }
{ print substr($1, eq + 1); exit }
' "$@"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to only differ from your format by whether there is white space around the assignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab; this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
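To make the three cases concrete, here is a small usage sketch. The function is repeated so the snippet runs on its own; the demo file and the variable names in it are made up:

```shell
#!/usr/bin/env bash
# Same function as above, repeated so this snippet is self-contained.
_getvar() {
  local VAR="$1"
  shift
  awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
    function loc(text) { return index($0, text) }
    function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
    { sub(/^[ \t]+/, ""); eq = loc("=") }
    substr($0, 1, eq-1) != VAR { next }
    loc("=" QQ) == eq { unquote(QQ); exit }
    loc("=" Q) == eq { unquote( Q); exit }
    { print substr($1, eq + 1); exit }
  ' "$@"
}

# Hypothetical input file with one double-quoted, one single-quoted
# and one unquoted assignment.
demo=$(mktemp)
cat > "$demo" <<'EOF'
EDITOR="vim -u NONE"
PAGER='less -R'
LANG=en_US.UTF-8 trailing words are dropped
EOF

_getvar EDITOR "$demo"   # quoted value, spaces kept
_getvar PAGER "$demo"    # single-quoted value, spaces kept
_getvar LANG "$demo"     # unquoted value, cut at the first space
rm -f "$demo"
```

The quoted values come back with their spaces intact, while the unquoted one stops at the first space, as described above.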

In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendly (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by @david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release

Related

'sed' replace last pattern and delete other patterns

I want to replace only the last "delay" line in my file with "ens_delay" and delete the other ones before it:
Input file:
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
delay=''
delay=''
delay=''
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Output file: (expected value)
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Here is my first command, but it doesn't work: it would only work if delay were on the last line.
sed -e '$,/delay/ s/delay/ens_delay/'
My second command deletes all lines containing "delay", so even "ens_delay" would be deleted.
sed -i '/delay/d'
Thank you
This might work for you (GNU sed):
sed '/^delay=/,$!b;/^delay=/!H;//{x;s/^[^\n]*\n\?//;/./p;x;h};$!d;x;s/^/ens_/' file
Lines before the first line beginning delay= should be printed as normal. Otherwise, a line beginning delay= is stored in the hold space and subsequent lines that do not begin delay= are appended to it. Should the hold space already contain such lines, the first line is deleted and the remaining lines printed before the hold space is replaced by the current line. At the end of the file, the first line of the hold space is amended to prepend the string ens_ and then the whole of the hold space is printed.
You cannot do this kind of thing with sed. There is no way in sed to "look forward" and tell if there are more matches to the pattern. You can kind of look back, but that won't be sufficient to solve this problem.
This perl script will solve it:
#!/usr/bin/perl
use strict;
use warnings;
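my ($seek, $replacement, $last, @new) = (shift, shift, 0);
open(my $fh, '<', shift) or die $!;
my @l = <$fh>;
close($fh) or die $!;
# Walk the file backwards so the first match seen is the last one in the file.
foreach (reverse @l){
    if(/$seek/){
        if ($last++ == 0){
            s/$seek/$replacement/;   # last occurrence: rename it
        } else {
            next;                    # earlier occurrences: drop the line
        }
    }
    unshift(@new, $_);
}
print join "", @new;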
Call like:
./script delay= ens_delay= inputfile
I chose to entirely eliminate lines which you intended to delete rather than collapse them in to a single blank line. If that is really required then it's a bit more complicated: the first such line in any consecutive set (or rather the last such) must be pushed on to the output list and you have to track whether this has just been done so you know whether to push the next time, too.
You could also solve this problem with awk, python, or any number of other languages. Just not sed.
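As an illustration of the awk route mentioned here, this is a sketch that reads the file twice: the first pass remembers the line number of the last match, the second pass drops earlier matches and renames the last one. The function name replace_last and its arguments are made up, and it assumes the key sits at the start of the line followed by '=':

```shell
#!/usr/bin/env bash
# Usage: replace_last KEY NEW_KEY FILE
# Hypothetical helper: keeps only the last "KEY=..." line and renames its key.
replace_last() {
  local key=$1 new_key=$2 file=$3
  awk -v key="$key=" -v new_key="$new_key=" '
    # Pass 1 (NR == FNR only while reading the first copy of the file):
    # remember the line number of the last line starting with key.
    NR == FNR { if (index($0, key) == 1) last = FNR; next }
    # Pass 2: drop every matching line except the last, which is renamed.
    index($0, key) == 1 {
      if (FNR != last) next
      $0 = new_key substr($0, length(key) + 1)
    }
    { print }
  ' "$file" "$file"
}
```

Called like replace_last delay ens_delay inputfile, it prints the result to stdout and leaves the input file untouched.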
Have this monster:
sed -e "1,$(expr $(sed -n '/^delay=/=' your_file.txt | tail -1) - 1)"'s/^delay=.*$//' \
-e 's/^delay=/ens_delay=/' your_file.txt
Here:
sed -n '/^delay=/=' your_file.txt | tail -1 return the last line number of the encountered pattern (let's name it X)
expr is used to get the X-1 line
"1,X-1"'[command]' means "perform this command between the first and the (X-1)th line, inclusive" (I used double quotes so that the command substitution gets expanded)
's/^delay=.*$//' the said [command]
-e 's/^delay=/ens_delay=/' the next expression to perform (will occur only on the last line)
Output:
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_hsm_backup_notification='YES'
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_hsm_backup_notification='YES'
If you want to delete the lines instead of leaving them blank:
sed -e "1,$(expr $(sed -n '/^delay=/=' your_file.txt | tail -1) - 1)"'{/^delay=.*$/d}' \
-e 's/^delay=/ens_delay=/' your_file.txt
As was mentioned elsewhere, sed can't know which occurrence of a substring is the last one. But awk can keep track of things in arrays. For example, the following will delete all duplicate assignments, as well as making your substitution:
awk 'BEGIN{FS=OFS="="} $1=="delay"{$1="ens_delay"} !($1 in a){o[++i]=$1} {a[$1]=$0} END{for(x=1;x<=i;x++) printf "%s\n",a[o[x]]}' inputfile
Or, broken out for easier reading/comments:
BEGIN {
FS=OFS="=" # set the field separator, to help isolate the left hand side
}
$1=="delay" {
$1="ens_delay" # your field substitution
}
!($1 in a) {
o[++i]=$1 # if we haven't seen this variable, record its position
}
{
a[$1]=$0 # record the value of the last-seen occurrence of this variable
}
END {
for (x=1;x<=i;x++) # step through the array (o[] is filled starting at index 1),
printf "%s\n",a[o[x]] # printing the last-seen values, in the order
} # their variable was first seen in the input file.
You might not care about the order of the variables. If so, the following might be simpler:
awk 'BEGIN{FS=OFS="="} $1=="delay"{$1="ens_delay"} {o[$1]=$0} END{for(i in o) printf "%s\n", o[i]}' inputfile
This simply stores the last-seen line in an array whose key is the variable name, then prints out the content of the array in an unknown order.
Assuming I understand your specifications properly, this should do what you need. Given infile x,
$: last=$( grep -n delay x|tail -1|sed 's/:.*//' )
This grep's the file for all lines with delay and returns them with the line number prepended with a colon. The tail -1 grabs the last of those lines, ignoring all the others. sed 's/:.*//' strips the colon and the actual line content, leaving only the number (here it was 14.)
That all evaluates out to assign 14 as $last.
$: sed '/delay/ { '$last'!d; '$last' s/delay/ens_delay/; }' x
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Apologies for the ugly catenation. What this does is writes the script using the value of $last so that the result looks like this to sed:
$: sed '/delay/ { 14!d; 14 s/delay/ens_delay/; }' x
sed reads leading numbers as line selectors. Here is what this script of commands does:
First, sed automatically prints lines unless told not to, so by default it would just print every line. The script modifies that.
/delay/ {...} is a pattern-based record selector. It will apply the commands between the {} to all lines that match /delay/, which is why it doesn't need another grep - it handles that itself. Inside the curlies, the script does two things.
First, 14!d says (only if this line has delay, which it will) that if the line number is 14, do not (the !) delete the record. Since all the other lines with delay won't be line 14 (or whatever value of the last one the earlier command created), those will get deleted, which automatically restarts the cycle and reads the next record.
Second, if the line number is 14, then it won't delete, and so will progress to the s/delay/ens_delay/ which updates your value.
For all lines that don't match /delay/, sed just prints them as-is.

Regex inside sed inside for cycle in bash applied on file

I have file like this in variable (let's say ${var_file}):
ABC+123+456+789+12'\r
DEF+987+98790+12+00'\r
GHI+12+12?+39+123498345+21+1'\r
ABC+485+2?'\r
ABC+34?+8kj+3949+1+sdfkj+sdfkj'\r
GHC+++sdf'\r
ABC+123++235+5435'\r
I also have a variable $var1 containing number in INT format and $var2 containing start of line.
I need a sed command (or awk/cut?) or function that will be part of bash script, that will replace $var1 position between + delimiters on $var2 line start (for example ABC or ABC+123), globally on the file (for all possible lines).
There is also possibility of ?+, since ? is escape character, so in this case + does not work as delimiter, it's normal text. Lines should be always long enough to contain defined + count in $var1 for selected line start in $var2.
Example of output for position 3 ($var1 = "3", between 3rd and 4th +) on line starting with ABC+123 ($var2 = "ABC+123")
Therefore output should be:
ABC+123+456++12'\r
DEF+987+98790+12+00'\r
GHI+12+12?+39+123498345+21+1'\r
ABC+485+2?'\r
ABC+34?+8kj+3949+1+sdfkj+sdfkj'\r
GHC+++sdf'\r
ABC+123+++5435'\r
The change is on line 1 and line 7.
In case of just delete 3rd position ($var1 = "3") on line starting with ABC ($var2 = "ABC"):
ABC+123+456++12'\r
DEF+987+98790+12+00'\r
GHI+12+12?+39+123498345+21+1'\r
ABC+485+2?'\r
ABC+34?+8kj+3949++sdfkj+sdfkj'\r
GHC+++sdf'\r
ABC+123+++5435'\r
The change is on line 1, 5 and 7.
Can someone help me with this? I tried various sed commands and I just can't find a solution.
Thank you!
Your question isn't very clear but I THINK what you want to do is:
awk -v start="$var2" -v col="$var1" '
BEGIN { FS=OFS="+" }
{ gsub(/[?][+]/,RS) }
index($0,start)==1 { $(col+1)="" }
{ gsub(RS,"?+") } 1
' file
You can try this one with sed
sed 's/?+/\n/g;/'"$var2"'/s/\(+[^+]*\)/+/'"$var1"';s/\n/?+/g' file

Nested dollar signs inside quotes

Trying to write a bash script containing nested dollar variables and I can't get it to work :
#!/bin/bash
sed '4s/.*/$(grep "remote.*$1" /home/txtfile)/' /home/target
The error says :
sed: -e expression #1, char 30: unknown option to `s'
The problem seems to come from $1 which need to be replaced by the parameter passed from the bash call and then the whole $(...) needs to be replaced by the command call so we replace the target line 4 by the string output.
Variable expansion and Command substitution won't be done when put inside single quotes, use double quotes instead:
sed "4s/.*/$(grep "remote.*$1" /home/txtfile)/" /home/target
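One caveat with splicing grep output straight into the s command: if the matched line contains /, & or a backslash, sed will misread the replacement (that is the same class of error as the one shown in the question). Here is a hedged sketch of one way around it; replace_line4 and its arguments are made-up names standing in for the hard-coded paths above, and it only uses the first matching line:

```shell
#!/usr/bin/env bash
# Usage: replace_line4 PATTERN SOURCE_FILE TARGET_FILE
# Hypothetical helper: replaces line 4 of TARGET_FILE with the first line of
# SOURCE_FILE matching "remote.*PATTERN", printing the result to stdout.
replace_line4() {
  local pattern=$1 source_file=$2 target_file=$3 replacement
  replacement=$(grep -m1 "remote.*$pattern" "$source_file") || return 1
  # Escape the characters that are special in the replacement part of s///.
  replacement=$(printf '%s' "$replacement" | sed 's|[\\/&]|\\&|g')
  # Double quotes so that $replacement is expanded before sed sees the script.
  sed "4s/.*/$replacement/" "$target_file"
}
```

Inside a script this would be called as replace_line4 "$1" /home/txtfile /home/target.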
Your approach is wrong, the right way to do what you want is just one command, something like this (depending on your possible $1 values and input file contents which you haven't shown us):
awk -v tgt="remote.*$1" '
NR==FNR { if ($0 ~ tgt) str = str $0 ORS; next }
FNR==4 { printf "%s", str; next }
{ print }
' /home/txtfile /home/target

bash script to modify and extract information

I am creating a bash script to modify and summarize information with grep and sed. But it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
    gsub(/[AG]/,"R")
    if (/R/) {
        ++cnt
    }
}
END { print cnt+0 }
' "$@"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
    only = $0
    onlyR = only
    gsub(/[AG]/,"R",onlyR)
    print "only:", only
    print "onlyR:", onlyR
    if (onlyR ~ /R/) {
        ++cnt
    }
}
END { print cnt+0 }
' "$@"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, and as @fedorqui commented - you're not providing grep with a source of input, against which it will perform line matching.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environmental and internal variables.
Here's a better version of your script, considering these points, but with an open question regarding what you were trying to do in the last line : grep R $ONLYR | wc -l :
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[@]};i++)); do
    only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after replace:"
printf '%s\n' "${only[@]}"
# I'm not sure what you were trying to do here. Am I guessing right that you wanted
# to count the number of R's in ALL lines ?
# grep R $ONLYR | wc -l
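If the intent of that last line was to count how many of the lines contain an R after the replacement (which is what grep R ... | wc -l would report), a sketch in the same style could look like this. It is only a guess at the intent, and count_r_lines is a made-up name:

```shell
#!/usr/bin/env bash
# Usage: count_r_lines LINE...
# Hypothetical helper: prints how many of its arguments contain an "R".
count_r_lines() {
  local line n=0
  for line in "$@"; do
    if [[ $line == *R* ]]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}
```

In the script above it would be called as count_r_lines "${only[@]}".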

Function with awk to print single or multiple columns

I use awk a lot to select single columns and after learning what an alias was I started off with
alias a1='awk '\{print $1}'\'
alias a2='awk '\{print $2}'\'
...
After I learned a little more I thought those were cheesy and replaced them with
function a() {
awk "{print \$$1}"
}
so now I can do a 3 or a 11 without needing to create explicit aliases.
So that's good, but sometimes I need to select more than one column, and when I do I have to resort to typing out the actual full awk '{print ...}' command (the horror!).
So I'm trying to come up with a way to do something similar to the a function but one that will accept different numbers of arguments, so I could do a 3 or a 5 7 or a 2 4 9.
I've tried different things with $@ and $* but can't get it right, and everything I'm trying now I know to be cheesy workarounds, so I'd rather just stop and ask how to do it the proper way.
Thanks all.
$ cat tst.sh
function a {
awk -v args="$*" '
BEGIN { n=split(args,f) }
{ for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS) }
'
}
echo "a b c d e f" | a 1 3 5
echo "---"
echo "a b c d e f" | a 1 3 4 6
$ ./tst.sh
a c e
---
a c d f
You could get arbitrarily complicated with this sort of thing (what if you wanted to be able to say a 2-5 7 11-, as with cut?) but here's one that will work with a list of numbers:
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }
That requires a bit of explanation.
a() { ... }
defines a shell function, which differs from an alias in various ways, one of which being that you can give it parameters.
Inside the shell function, I want to change the value of IFS; to avoid having to remember the old value and change it back, I surround the command I actually want to execute with (...), which causes it to execute in a subshell. When the subshell finishes, all environmental changes finish with it, so it effectively makes the change to IFS local.
IFS is the set of characters used for word splitting, but it also defines the character used to separate elements in the expansion of "$*" (that is, the list of function or script arguments) when it is surrounded by quotes. So setting it to , means the $* expansion will be a comma-separated list.
The awk program I want to create is actually something like {print $1,$4,$7}, so aside from putting commas between the list, I need to add a $ before each number. I do that with the bash parameter expansion substitute syntax: ${parameter/pattern/replacement}. By specifying * as the parameter, I get $* with the substitution applied to each argument. (Note that the expansion is quoted. If it weren't, it wouldn't work.)
In the substitution expression, the pattern proper is empty: the # character at the beginning of the pattern only indicates that the match must be at the beginning of the string. Since the actual pattern is empty, the first match will always be at the beginning of the string and the replacement ($) will therefore be inserted at the beginning of each argument. The # is needed because // is syntactically different: it means "change all occurrences of the pattern", instead of just the first one.
Unlike many languages, in bash search-and-replace expressions are not terminated with a /, but rather with the matching }. If you type ${p/foo/bar/}, it will replace the first instance of foo with bar/.
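To see the pieces of that expansion in isolation, here is a small sketch that only prints the generated field list instead of running awk. field_list is a made-up helper name; the rest is the same syntax as in the one-liner:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the field list that the a() function above
# would paste into the awk program.
field_list() { (IFS=,; printf '%s\n' "${*/#/$}"); }

field_list 1 4 7          # each argument gets a leading $, joined with commas

p='foo foo'
printf '%s\n' "${p/foo/bar/}"   # only the first foo is replaced, by "bar/"
printf '%s\n' "${p//foo/bar}"   # // replaces every foo
```

Because IFS is changed inside ( ... ), the caller's IFS is untouched after field_list returns.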

Resources