Use sed to escape a pattern to another sed - bash

I would like a way to use sed to escape a "pattern" (string) so that it can be used in another sed command. This escape process must include handling for multi-line string patterns.
To illustrate I present the code below. It works perfectly (well tested so far), but fails when we have strings with multiple lines (see "STRING_TO_ESCAPE").
#!/bin/bash
# Escape TARGET_STRING.
read -r -d '' STRING_TO_ESCAPE <<'HEREDOC'
$N = "magic_quotes_gpc = <b>"._("On")."</b>";
$D = _("Increase your server security by setting magic_quotes_gpc to 'on'. PHP will escape all quotes in strings in this case.");
$S = _("Search for 'magic_quotes_gpc' in your php.ini and set it to 'On'.");
$R = ini_get('magic_quotes_gpc');
$M = TRUE;
$this->config_checks[] = array("NAME" => $N , "DESC" => $D , "RESULT" => $R , "SOLUTION" => $S , "MUST" => $M );
HEREDOC
ESCAPED_STRING=$(echo "'${STRING_TO_ESCAPE}'" | sed 's/[]\/$*.^|[]/\\&/g')
ESCAPED_STRING=${ESCAPED_STRING%?}
TARGET_STRING=${ESCAPED_STRING#?}
# NOTE: The single quotes in "'${STRING_TO_ESCAPE}'" serve to prevent spaces
# being "lost" at the beginning and end of the string! The manipulations with
# "ESCAPED_STRING" are used to remove them. When we use sed with the file being
# input (flag "-i") this problem does not occur.
# Escape REPLACE_STRING.
read -r -d '' STRING_TO_ESCAPE <<'HEREDOC'
/* NOTE: "Magic_quotes_gpc" is no longer required. We taught GOsa2 to deal with it (see /usr/share/gosa/html/main.php). By Questor */
/* Automatic quoting must be turned on */
/* $N = "magic_quotes_gpc = <b>"._("On")."</b>";
$D = _("Increase your server security by setting magic_quotes_gpc to 'on'. PHP will escape all quotes in strings in this case.");
$S = _("Search for 'magic_quotes_gpc' in your php.ini and set it to 'On'.");
$R = ini_get('magic_quotes_gpc');
$M = TRUE;
$this->config_checks[] = array("NAME" => $N , "DESC" => $D , "RESULT" => $R , "SOLUTION" => $S , "MUST" => $M ); */
HEREDOC
ESCAPED_STRING=$(echo "'${STRING_TO_ESCAPE}'" | sed 's/[]\/$*.^|[]/\\&/g')
ESCAPED_STRING=${ESCAPED_STRING%?}
REPLACE_STRING=${ESCAPED_STRING#?}
# Do the replace.
STRING_TO_MODIFY=$(cat file_name.txt)
MODIFIED_STRING=$(echo "'${STRING_TO_MODIFY}'" | sed "s/$TARGET_STRING/$REPLACE_STRING/g")
MODIFIED_STRING=${MODIFIED_STRING%?}
MODIFIED_STRING=${MODIFIED_STRING#?}
echo "$MODIFIED_STRING"
Thanks! =D

This escape process must include handling for multi-line string patterns.
I think you're barking up the wrong tree. If you're trying to match a multiline pattern then the most significant problem is not how to escape the pattern, but rather how to write a sed script that will successfully match anything to it.
The problem is that sed reads input one line at a time. There are various ways to collect multiple lines and to operate on such collections, but you need to do that explicitly in the program. sed is therefore a poor choice for attempting to match arbitrary multiline text. To make your task feasible, you would want to know how many lines the pattern will contain, so as to write your sed program to be specific for that. Even then, this might be a better job for Perl.
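If the goal is simply to swap one literal multi-line block for another, and the file is small enough to hold in a shell variable, bash's own pattern substitution can sidestep sed and its escaping entirely. This is a different technique from the sed approach in the question, shown only as a sketch; `replace_literal` and the sample strings are made-up names for illustration.

```shell
#!/bin/bash
# replace_literal HAYSTACK TARGET REPLACEMENT
# Hypothetical helper: replaces every literal occurrence of TARGET in HAYSTACK.
replace_literal() {
    local haystack=$1 target=$2 replacement=$3 result
    # Quoting "$target" makes bash match it literally instead of as a glob pattern;
    # quoting "$replacement" keeps '&' literal on bash versions that would expand it.
    result=${haystack//"$target"/"$replacement"}
    printf '%s\n' "$result"
}

# Illustrative data only: a two-line target inside a four-line text.
text=$'line 1\n$M = TRUE;\n$x[] = array("a" => $N);\nline 4'
target=$'$M = TRUE;\n$x[] = array("a" => $N);'
replacement=$'/* $M = TRUE;\n$x[] = array("a" => $N); */'

replace_literal "$text" "$target" "$replacement"
```

Because nothing is interpreted as a regex, characters such as `$`, `[`, `*` and `/` need no escaping. One caveat: reading a file with `$(cat file)` drops trailing newlines, so this suits whole-file edits only when that loss is acceptable.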
Update:
Because I like sed, however, here's an example of how you could write a sed program that matches multiline patterns:
#!/bin/sed -f
# Build up a three-line window in the pattern space
:a
/\(.*\
\)\{2\}/! { N; ba; }
# A(nother) multiline pattern. If the pattern fails to match then the
# first line of the pattern space is printed and deleted, then
# we loop back to reload.
/^The\
quick\
brown$/! { P; D; ba; }
# Do whatever we want to do in the event of a match
s/brown/red/
# If control reaches here then the whole pattern space is printed,
# and if any input lines remain then we start again from the beginning
# with an initially-empty pattern space.
Example input:
$ ./ml.sed <<EOF
The
quick
brown
fox
jumped over
the lazy dog.
EOF
Output:
The
quick
red
fox
jumped over
the lazy dog.
Note well that newlines are matched as ordinary characters, but that literal newlines in pattern or replacement text need to be escaped in the normal way, for syntactic reasons.
Update 2:
Here's a variation that replaces appearances of the three-line sequence
brown
fox
jumped over
with the three-line sequence
red
pig
is fat
Of course, there are many other ways to accomplish the same thing with sed, and one of the others might be preferable to this for your particular purposes.
#!/bin/sed -f
:a
/\(.*\
\)\{2\}/! { N; ba; }
/^brown\
fox\
jumped over$/! { P; D; ba; }
s/.*/red\
pig\
is fat/

Related

Convert multi-line csv to single line using Linux tools

I have a .csv file that contains double quoted multi-line fields. I need to convert the multi-line cell to a single line. It doesn't show in the sample data but I do not know which fields might be multi-line so any solution will need to check every field. I do know how many columns I'll have. The first line will also need to be skipped. I don't have much data, so performance isn't a consideration.
I need something that I can run from a bash script on Linux. Preferably using tools such as awk or sed and not actual programming languages.
The data will be processed further with Logstash but it doesn't handle double quoted multi-line fields hence the need to do some pre-processing.
I tried something like this and it kind of works on one row but fails on multiple rows.
sed -e :0 -e '/,.*,.*,.*,.*,/b' -e N -e '1n;N;N;N;s/\n/ /g' -e b0 file.csv
CSV example
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
The output I want is
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
Jane,Doe,Country City Street,67890
etc.
etc.
First my apologies for getting here 7 months late...
I came across a problem similar to yours today, with multiple fields with multi-line types. I was glad to find your question but at least for my case I have the complexity that, as more than one field is conflicting, quotes might open, close and open again on the same line... anyway, reading a lot and combining answers from different posts I came up with something like this:
First I count the quotes in a line, to do that, I take out everything but quotes and then use wc:
quotes=$(printf '%s' "$line" | tr -cd '"' | wc -c) # Counts the quotes
If you think of a single multi-line field, knowing if the quotes are 1 or 2 is enough. In a more generic scenario like mine I have to know if the number of quotes is odd or even to know if the line completes the record or expects more information.
To check for even or odd you can use the modulo operator (%), in general:
even % 2 = 0
odd % 2 = 1
For the first line:
Odd means that the line expects more information on the next line.
Even means the line is complete.
For the subsequent lines, I have to know the status of the previous one. For instance, in your sample text:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
You can say line 1 (John,Doe,"Country) has 1 quote (odd), which means the status of the record is incomplete or open.
When you go to line 2, there is no quote (even). Nevertheless, this does not mean the record is complete; you have to consider the previous status. So for the lines following the first one it will be:
Odd means that record status toggles (incomplete to complete).
Even means that record status remains as the previous line.
What I did was looping line by line while carrying the status of the last line to the next one:
incomplete=0
# IFS= and -r keep leading spaces and backslashes in each line intact
while IFS= read -r line; do
quotes=$(printf '%s' "$line" | tr -cd '"' | wc -c) # Counts the quotes
incomplete=$(( (quotes + incomplete) % 2 )) # Check if Odd or Even to decide status
if [ "$incomplete" -eq 1 ]; then
printf '%s ' "$line" >> new.csv # If line is incomplete join with next
else
printf '%s\n' "$line" >> new.csv # If line completes the record finish
fi
done < file.csv
Once this was executed, a file in your format generates a new.csv like this:
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345
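For what it's worth, the same parity idea can be sketched without spawning `tr` and `wc` for every line, by letting bash strip everything except the quotes and take the length of what is left. The function name `join_quoted_records` is made up for this example, and it reads stdin and writes stdout instead of using fixed file names.

```shell
#!/bin/bash
# join_quoted_records: hypothetical helper; reads CSV on stdin, writes joined records to stdout.
join_quoted_records() {
    local line only_quotes open=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        only_quotes=${line//[^\"]/}               # delete everything except double quotes
        open=$(( (open + ${#only_quotes}) % 2 ))  # odd running total: a quoted field is still open
        if [ "$open" -eq 1 ]; then
            printf '%s ' "$line"                  # record continues on the next line
        else
            printf '%s\n' "$line"                 # record is complete
        fi
    done
}

printf '%s\n' 'First name,Last name,Address,ZIP' 'John,Doe,"Country' 'City' 'Street",12345' |
    join_quoted_records
```

Run as `join_quoted_records < file.csv > new.csv` to get the same result as the loop above on the sample data.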
I like one-liners as much as anyone; I wrote the script above just for the sake of clarity. You can, arguably, write it in one line like this:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
I would appreciate it if you could go back to your example and see if this works for your case (which you most likely already solved). Hopefully this can still help someone else down the road...
Recovering the multi-line fields
Every need is different, in my case I wanted the records in one line to further process the csv to add some bash-extracted data, but I would like to keep the csv as it was. To accomplish that, instead of joining the lines with a space I used a code - likely unique - that I could then search and replace:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l ~newline~ " || echo "$l";done >new.csv
the code is ~newline~, this is totally arbitrary of course.
Then, after doing my processing, I took the csv text file and replaced the coded newlines with real newlines:
sed -i 's/ ~newline~ /\n/g' new.csv
References:
Ternary operator: https://stackoverflow.com/a/3953666/6316852
Count char occurrences: https://stackoverflow.com/a/41119233/6316852
Other peculiar cases: https://www.linuxquestions.org/questions/programming-9/complex-bash-string-substitution-of-csv-file-with-multiline-data-937179/
TL;DR
Run this:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
... and collect results in new.csv
I hope it helps!
If Perl is your option, please try the following:
perl -e '
while (<>) {
$str .= $_;
}
while ($str =~ /("(("")|[^"])*")|((^|(?<=,))[^,]*((?=,)|$))/g) {
if (($el = $&) =~ /^".*"$/s) {
$el =~ s/^"//s; $el =~ s/"$//s;
$el =~ s/""/"/g;
$el =~ s/\s+(?!$)/ /g;
}
push(@ary, $el);
}
foreach (@ary) {
print /\n$/ ? "$_" : "$_,";
}' sample.csv
sample.csv:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
John,Doe,"Country
City
Street",67890
Result:
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
John,Doe,Country City Street,67890
This might work for you (GNU sed):
sed ':a;s/[^,]\+/&/4;tb;N;ba;:b;s/\n\+/ /g;s/"//g' file
Test each line to see that it contains the correct number of fields (in the example that was 4). If there are not enough fields, append the next line and repeat the test. Otherwise, replace the newline(s) by spaces and finally remove the "'s.
N.B. This may be fraught with problems such as ,'s between "'s and quoted "'s.
Try cat -v file.csv. When the file was made with Excel, you might have some luck: When the newlines in a field are a simple \n and the newline at the end is a \r\n (which will look like ^M), parsing is simple.
# delete all newlines and replace the ^M with a new newline.
tr -d "\n" < file.csv| tr "\r" "\n"
# Above two steps with one command
tr "\n\r" " \n" < file.csv
When you want a space between the joined line, you need an additional step.
tr "\n\r" " \n" < file.csv | sed '2,$ s/^ //'
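To see what that pipeline does, here is a small sketch on made-up data where the line breaks inside a field are bare `\n` and each record ends in `\r\n`. The wrapper name `join_crlf_records` is only for illustration.

```shell
#!/bin/bash
# join_crlf_records: hypothetical wrapper around the tr + sed pipeline, reading stdin.
join_crlf_records() {
    # \n inside fields becomes a space, \r at the end of a record becomes the new line break;
    # sed then removes the space left over from the \n that followed each \r.
    tr "\n\r" " \n" | sed '2,$ s/^ //'
}

# Made-up sample: two records, each with a two-line quoted field.
printf 'id,"first\nsecond",1\r\nid2,"third\nfourth",2\r\n' | join_crlf_records
```

On this sample it prints `id,"first second",1` and `id2,"third fourth",2` on separate lines.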
EDIT: @sjaak commented that this didn't work in his case.
When your broken lines also have ^M you still can be a lucky (wo-)man.
When your broken field is always the first field in double quotes and you have GNU sed 4.2.2, you can join 2 lines when the first line has exactly one double quote.
sed -rz ':a;s/(\n|^)([^"]*)"([^"]*)\n/\1\2"\3 /;ta' file.csv
Explanation:
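-z separate "lines" by NUL instead of \n, so the whole file is read at once and \n can be matched
:a label for repeating the step after successful replacement
(\n|^) Search after a newline or at the very start of the input
([^"]*) Substring without a "
"([^"]*)\n The single opening quote, the text after it, and the line break that is replaced by a space
ta Go back to label a and repeat if the substitution succeeded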
awk pattern matching works for this.
A one-line answer:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile
if you'd like to drop quotes, you could use:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile | sed 's/"//gw NewFile'
but I prefer to keep it.
to explain the code:
/Pattern/ : matches lines containing Pattern.
ORS : the output record separator, printed after each record.
$0 : the whole of the current line.
's/OldPattern/NewPattern/': substitutes the first OldPattern with NewPattern
/g : applies the substitution to every OldPattern on the line
/w : writes the result to NewFile
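As a quick check of how the ORS switching behaves, here is the same awk program wrapped in a function and run on the question's sample. The name `join_by_ors` is invented for this sketch.

```shell
#!/bin/bash
# join_by_ors: hypothetical wrapper around the awk one-liner, reading stdin.
join_by_ors() {
    awk '/,"/ { ORS = " " }    # a quoted field opens: join following lines with a space
         /",/ { ORS = "\n" }   # a quoted field closes: end the record with a newline
         { print $0 }'
}

printf '%s\n' 'First name,Last name,Address,ZIP' 'John,Doe,"Country' 'City' 'Street",12345' |
    join_by_ors
```

The header line is printed unchanged and the three address lines come out as one record. Note that the patterns only detect a quote next to a comma, so a multi-line field that is the first or last column of a row would not be recognised by this approach.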

Adding zero to part of string using sed

I have SNMP outputs like:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70
As you can see, the MAC address output is incorrect, so I fix it with sed:
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 |
sed -e 's/\b\(\w\)\b/0\1/g'
Output:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.03.25 = STRING: 34:08:04:56:f4:70
It fixes address but changes IP as well from 192.19.3.25 to 192.19.03.25. How can I avoid it and force to perform sed only after STRING: or only after last space in the string ?
The MAC address is colon-separated. You can use that to limit the substitutions. This will perform the substitutions that you are interested in but only if the word character is next to a colon:
sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
For example:
$ echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 | sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:08:04:56:f4:70
How it works
s/\b\w:/0&/g
This performs the substitution if the word character is preceded by a word break, \b, and followed by a colon. Since we just need to put a zero in front of the entire matched text, not just some section of it, we can omit the parens and just use & to copy the matched text.
s/:\(\w\)\b/:0\1/g
If there are any remaining substitutions that need to be done where the word character is preceded by a colon and followed by a word break, this does them.
Note: We are using GNU extensions that may not be portable.
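If the GNU-only `\b` and `\w` are a concern, one alternative sketch is to do the padding in bash itself: take the text after the last space, split it on colons, and let printf pad each octet. This is a different technique from the sed answer; `pad_mac` is a made-up helper name, and the hex digits come out in lowercase.

```shell
#!/bin/bash
# pad_mac MAC: hypothetical helper that zero-pads each colon-separated octet.
pad_mac() {
    local IFS=: octet padded=()
    for octet in $1; do                          # unquoted on purpose: split on ':'
        padded+=("$(printf '%02x' "0x$octet")")  # interpret as hex, print two digits
    done
    printf '%s\n' "${padded[*]}"                 # "${array[*]}" joins with the first IFS character, ':'
}

line='IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70'
mac=${line##* }                                  # text after the last space
printf '%s %s\n' "${line% *}" "$(pad_mac "$mac")"
```

Since only the last space-separated word is touched, the IP address in the OID is left alone.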
Another way with sed if the MAC address is at end of line
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 4:8:d:56:f4:7 |
sed -E '
s/$/:/
:A
s/([^[:xdigit:]])([[:xdigit:]]:)/\10\2/
tA
s/:$//'

remove only *some* fullstops from a csv file

If I have lines like the following:
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
how can I replace all instances of ,., with ,?,
I want to preserve actual decimal places in the numbers so I can't just do
sed 's/./?/g' file
however when doing:
sed 's/,.,/,?,/g' file
this only appears to work in some cases. i.e. there are still instances of ,., hanging around.
anyone have any pointers?
Thanks
This should work :
sed ':a;s/,\.,/,?,/g;ta' file
With successive ,., strings, after a substitution succeeds, the next character to be processed is the following ., which doesn't match the pattern because the comma before it was already consumed, so you need a second pass.
:a is a label for the upcoming loop
,\., matches a dot between commas. Note that the dot must be escaped because an unescaped . matches any character (so ,., would also match ,a,).
g makes the substitution global (every non-overlapping match on the line)
ta tests the previous substitution and, if it succeeded, loops back to the :a label for the remaining substitutions.
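A tiny made-up line shows the effect of the overlap. The function name `dots_to_marks` is only for this sketch, and the script is split into separate `-e` parts so the label does not depend on how a given sed treats `;` after `:a`.

```shell
#!/bin/bash
# dots_to_marks: hypothetical wrapper around the looped substitution, reading stdin.
dots_to_marks() {
    sed -e ':a' -e 's/,\.,/,?,/g' -e 'ta'
}

printf '%s\n' 'a,.,.,.,1.5' | sed 's/,\.,/,?,/g'   # single pass: a,?,.,?,1.5
printf '%s\n' 'a,.,.,.,1.5' | dots_to_marks        # looped:      a,?,?,?,1.5
```

The single pass skips the middle dot because its leading comma was part of the previous match; the loop picks it up on the second iteration, and the decimal point in 1.5 is never touched.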
With sed this is possible by running a loop, as shown in the answer above; however, the problem is easily solved on the perl command line with lookarounds:
perl -pe 's/(?<=,)\.(?=,)/?/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This command doesn't need a loop because instead of matching surrounding commas we're just asserting their position using a lookbehind and lookahead.
All that's necessary is a single substitution
$ perl -pe 's/,\.(?=,)/,?/g' dots.csv
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
You have an example using sed style regular expressions. I'll offer an alternative - parse the CSV, and then treat each thing as a 'field':
#!/usr/bin/perl
use strict;
use warnings;
#iterate input row by row
while ( <DATA> ) {
#remove linefeeds
chomp;
#split this row on ,
my @row = split /,/;
#iterate each field
foreach my $field ( @row ) {
#replace this field with "?" if it's "."
$field = "?" if $field eq ".";
}
#stick this row together again.
print join( ",", @row ), "\n";
}
__DATA__
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This is more verbose than it needs to be, to illustrate the concept. This could be reduced down to:
perl -F, -lane 'print join ",", map { $_ eq "." ? "?" : $_ } @F'
If your CSV also has quoting, then you can break out the Text::CSV module, which handles that neatly.
You just need 2 passes since the trailing , found on a ,., match isn't available to match the leading , on the next ,.,:
$ sed 's/,\.,/,?,/g; s/,\.,/,?,/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
The above will work in any sed on any OS.

In Perl, best way to insert a char every N chars

I would like to find the best way in Perl to insert a char every N chars in a string.
Suppose I have the following :
my $str = 'ABCDEFGH';
I would like to insert a space every two chars, so that I get:
my $finalstr = 'AB CD EF GH';
The innocent way would be:
my $finalstr;
while ($str =~ s/(..)//) {
$finalstr .= $1.' ';
}
(But the last space does not make me happy.)
Can we do better? Is it possible using a single substitution pattern s///, especially to use that same string $str (and not using $finalstr)?
The next step: do the same but with text before and after patterns to be cut (and to be kept, for sure), say for example '<<' and '>>':
my $str = 'blah <<ABCDEFGH>> blah';
my $finalstr1 = 'blah <<AB CD EF GH>> blah';
my $finalstr2 = 'blah << AB CD EF GH >> blah'; # alternate
Using positive lookahead and lookbehind assertions to insert a space:
my $str = 'ABCDEFGH';
$str =~ s/..\K(?=.)/ /sg;
use Data::Dump;
dd $str;
Outputs:
"AB CD EF GH"
Enhancement for limiting the Translation
If you want to apply this modification to only part of the string, break it into steps:
my $str = 'blah <<ABCDEFGH>> blah';
$str =~ s{<<\K(.*?)(?=>>)}{$1 =~ s/..\K(?=.)/ /sgr}esg;
use Data::Dump;
dd $str;
Outputs:
"blah <<AB CD EF GH>> blah"
The best solution using substitutions would probably be s/\G..\K/ /sg. Why?
The \G anchors at the current “position” of the string. This position is where the last match ended (usually this is set to the beginning of the string; if in doubt, set pos($str) = 0). Because we use the /g modifier, this will be where the previous substitution ended.
The .. matches any two characters. Note that we also use the /s modifier which causes . to really match any character, and not just the [^\n] character class.
The \K treats the previous part of the regex as a look-behind, by not including the previously matched part of the string in the substring that will be substituted. So \G..\K matches the zero length string after two arbitrary characters.
We substitute that zero length string with a single space.
I'd let the regex engine handle the substitution, rather than manually appending $1 . " ". Also, my lookbehind solution avoids the cost of using captures like $1.
You want the //g modifier with its many capabilities. See e.g. here for an introduction to the intricacies of global matching.
Do you mean something like...
$str =~ s/(..)/$1 /sg;
update: For more complex substitutions as the one you are asking in the second part of your question, you can use the e modifier that allows you to evaluate arbitrary perl code:
sub insert_spcs {
my $str = shift;
join ' ', $str =~ /(..?)/sg
}
my $str = 'blah <<ABCDEFGH>> blah';
$str =~ s/<<(.*?)>>/'<< '.insert_spcs($1).' >>'/se;
Personally I'd split the text with m//g and use join:
use feature 'say';
my $input = "ABCDEFGH";
my $result = join " ", ( $input =~ m/(..)/g );
say "RESULT <$result>";
Yields
RESULT <AB CD EF GH>
The other answers are better, but just for giggles:
join ' ', grep length, split /(..)/, 'ABCDEFGH';

Reading java .properties file from bash

I am thinking of using sed for reading .properties file, but was wondering if there is a smarter way to do that from bash script?
This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
grep "^$2=" "$1" | cut -d'=' -f2-
}
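A variant of the same idea in pure bash is sketched here, assuming simple `key=value` lines with no spaces around the `=`, no comments that look like keys, and no line continuations. The name `get_prop` is made up. Because `read` puts the rest of the line into the last variable, values that themselves contain `=` stay intact, and the key is compared as a literal string, so dots in it are not treated as regex wildcards.

```shell
#!/bin/bash
# get_prop FILE KEY: hypothetical helper; prints the value of KEY and returns 0,
# or returns 1 if KEY is not present.
get_prop() {
    local file=$1 wanted=$2 key value
    # Split each line at the first '='; everything after it lands in "value".
    while IFS='=' read -r key value; do
        if [ "$key" = "$wanted" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$file"
    return 1
}
```

Usage would be something like `url=$(get_prop app.properties db.url)`, where the file and key names are placeholders.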
The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
FS="=";
print "# BEGIN";
n="";
v="";
c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
c=0;
next;
}
/\\$/ && (c==0) && (NF>=2) { # Name value pair with a line continuation...
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
c=1; # Line continuation mode.
next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
v= "" v substr($0,1,length($0)-1);
next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single line name/value pair
if (c==0) { # Single line name/value pair
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e);
} else { # Line continuation mode - last line of the value.
c=0; # Turn off line continuation mode.
v= "" v $0;
}
# Make sure the name is a legal shell variable name
gsub(/[^A-Za-z0-9_]/,"_",n);
# Remove newlines from the value.
gsub(/[\n\r]/,"",v);
print n "=\"" v "\"";
n = "";
v = "";
}
END {
print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
Here is my wiki page on this very topic.
I wrote a script to solve the problem and put it on my github.
See properties-parser
One option is to write a simple Java program to do it for you - then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wised up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.
If you want to use sed to parse -any- .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc: http://en.wikipedia.org/wiki/.properties
One possible workaround would be to use Java itself to preprocess the .properties file into something bash-friendly, then source it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.
In Perl:
while(<STDIN>) {
($prop,$val)=split(/[=: ]/, $_, 2);
# and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.
I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.
If you want to use "shell", the best tool to parse files and have proper programming control is (g)awk. Use sed only for simple substitutions.
I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.
Hmm, I just ran into the same problem today. This is a poor man's solution, admittedly more straightforward than clever ;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots should be replaced only before '=',no comment handling), but could be a starting point.
@Daniel, I tried to source it, but Bash didn't like dots in variable names.
I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
local name=$1
eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'
This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
local VAR="$1"
shift
awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
function loc(text) { return index($0, text) }
function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
{ sub(/^[ \t]+/, ""); eq = loc("=") }
substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
loc("=" QQ) == eq { unquote(QQ); exit }
loc("=" Q) == eq { unquote( Q); exit }
{ print substr($1, eq + 1); exit }
' "$@"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to only differ from your format by whether there is white space around the assignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab, this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendly (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by @david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release

Resources