zsh sed expanding a variable with special characters and keeping them - macos

I'm trying to store a string in a variable, then expand that variable in a sed command.
Several of the values I'm going to put in the variable before calling the command will have parentheses (with and without slashes before the left parentheses, but never before the right), new lines and other special characters. Also, the string will have double quotes around it in the file that's being searched, and I'd like to use those to limit only to the string I'm querying.
The command needs to be able to match those special characters in the file. I'm using zsh on macOS, although if the command were also compatible with bash 4.2 that would be a nice bonus. Echoing to xargs is fine too. Also, if awk would be better for this, I have no requirement to use sed.
Something like...
sed 's/"\"$(echo -E - ${val})\""/"${key}.localized"/g' "${files}"
Here $val is the variable I described above, $key has no spaces (only underscores), and $files is an array of file paths (preferably handling spaces, but not required).
Example Input values for $val...
... "something \(customStringConvertible) here" ...
... "something (notVar) here" ...
... "something %# here" ...
... "something # 100% here" ...
... "something for $100.00" ...
Example Output:
... "some_key".localized ...
I was using the sed command to replace the examples above; the text I'm overwriting them with is straightforward.
The key problem I'm having is getting the command to match the special characters literally instead of expanding them first and then trying to match.
Thanks in advance for any assistance.

awk is better since it provides functions that work with literal strings:
$ val='something \(customStringConvertible) here' awk 'index($0,ENVIRON["val"])' file
... "something \(customStringConvertible) here" ...
$ val='something for $100.00' awk 'index($0,ENVIRON["val"])' file
... "something for $100.00" ...
The above was run on this input file:
$ cat file
... "something \(customStringConvertible) here" ...
... "something (notVar) here" ...
... "something %# here" ...
... "something # 100% here" ...
... "something for $100.00" ...
With sed you'd have to follow the instructions at Is it possible to escape regex metacharacters reliably with sed to try to fake sed out.
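For reference, here's a rough sketch of that approach (mine, untested against your real data). It assumes $val has no embedded newlines (those would need extra handling) and that the replacement ${key}.localized needs no escaping, since $key is only letters and underscores:
esc_val=$(printf '%s\n' "$val" | sed 's/[]\/$*.^[]/\\&/g')   # escape BRE metacharacters and the / delimiter
sed "s/\"${esc_val}\"/\"${key}.localized\"/g" "${files[@]}"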
It's not clear what your real goal is so edit your question to provide concise, testable sample input and expected output if you need more help. Having said that, it looks like you're doing a substitution so maybe this is what you want:
$ old='"something for $100.00"' new='here & there' awk '
s=index($0,ENVIRON["old"]) { print substr($0,1,s-1) ENVIRON["new"] substr($0,s+length(ENVIRON["old"])) }
' file
... here & there ...
or if you prefer:
$ old='"something for $100.00"' new='here & there' awk '
BEGIN { old=ENVIRON["old"]; new=ENVIRON["new"]; lgth=length(old) }
s=index($0,old) { print substr($0,1,s-1) new substr($0,s+lgth) }
' file
or:
awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]=""; lgth=length(old) }
s=index($0,old) { print substr($0,1,s-1) new substr($0,s+lgth) }
' '"something for $100.00"' 'here & there' file
... here & there ...
See How do I use shell variables in an awk script? for info on how I'm using ENVIRON[] vs ARGV[] above.

Related

Bash next line (\) in command causing spacing issues

I have the following function I want to call:
function print_something {
    echo "test\
    something\
    yep"
}
when called it prints:
'test    something    yep'
I would like it to print:
'testsomethingyep'
I can get this to print if I do:
function print_something {
    echo "test\
something\
yep"
}
but I don't think that looks great..
(Root problem is a curl command not an echo)
Consider assembling your pieces in an array, and then combining them into a string later. Array definition syntax is far more forgiving: it doesn't require backslashes at all, and it also allows comments on each line and between lines.
#!/usr/bin/env bash
# ^^^^- arrays and printf -v require bash, not sh
pieces=(
test # this also lets you use comments
something # and you don't need any backslashes at all!
# one can also have a full-line comment midway through your array
"space here" # plus if you want to add a literal space you can do it
)
printf -v oneword '%s' "${pieces[@]}"
echo "$oneword"
...properly emits:
testsomethingspace here
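Since the root problem is a curl command rather than echo, the same pattern applies there; here's a rough sketch where the URL, header, and data are made-up placeholders:
curl_args=(
    --silent
    --header 'Content-Type: application/json'   # each array element is one argument
    --data '{"key": "value"}'
    'https://example.com/api'                    # hypothetical endpoint
)
curl "${curl_args[@]}"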
Here are three ideas:
#!/bin/bash
print_something() {
tr -d \\n <<- EOF
test
something
yep
EOF
echo
}
print_something2() {
echo "test"$(:
)"something"$(:
)"yep"
}
print_something3() {
tr -d \\t <<- EOF
test\
something\
yep
EOF
}
print_something
print_something2
print_something3
The first uses a <<- style heredoc to remove all of the leading indentation (that indentation must be hard tabs for this to work; coding styles that mandate spaces for indentation render this solution unusable, which is one reason I consider such styles useless for shell scripts), plus tr to remove the newlines. The additional echo is then needed to add back the trailing newline. The second uses the $(:) command substitution to discard all the intervening whitespace. The third manually deletes all the hard tabs.

Can the regex matching pattern for awk be placed above the opening brace of the action line, or must it be on the same line?

I'm studying awk pretty fiercely to write a git diffn implementation which will show line numbers for git diff, and I want confirmation on whether or not this Wikipedia page on awk is wrong [Update: I've now fixed this part of that Wikipedia page, but this is what it used to say]:
(pattern)
{
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
(pattern)
{
print "expression" > "file name"
}
or through a pipe:
(pattern)
{
print "expression" | "command"
}
Notice (pattern) is above the opening brace. I'm pretty sure this is wrong but need to know for certain before editing the page. What I think that page should look like is this:
/regex_pattern/ {
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
/regex_pattern/ {
print "expression" > "file name"
}
or through a pipe:
/regex_pattern/ {
print "expression" | "command"
}
Here's a test to "prove" it. I'm on Linux Ubuntu 18.04.
1. test_awk.sh
gawk \
'
BEGIN
{
print "START OF AWK PROGRAM"
}
'
Test and error output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
gawk: cmd. line:3: BEGIN blocks must have an action part
But with this:
2. test_awk.sh
gawk \
'
BEGIN {
print "START OF AWK PROGRAM"
}
'
It works fine!:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
START OF AWK PROGRAM
Another example (fails to provide expected output):
3. test_awk.sh
gawk \
'
/hey/
{
print $0
}
'
Erroneous output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey1
hello
hey2
hey2
But like this:
4. test_awk.sh
gawk \
'
/hey/ {
print $0
}
'
It works as expected:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey2
Updates: after solving this problem, I just added these sections below:
Learning material:
In the process of working on this problem, I just spent several hours and created these examples: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/tree/master/awk. These examples, comments, and links would prove useful to anyone getting started learning awk/gawk.
Related:
git diff with line numbers and proper code alignment/indentation
"BEGIN blocks must have an action part" error in awk script
The whole point of me learning awk at all in the first place was to write git diffn. I just got it done: Git diff with line numbers (Git log with line numbers)
I agree with you that the Wikipedia page is wrong. The correct form is stated right in the awk manual:
A pattern-action statement has the form
pattern { action }
A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.
...
Statements are terminated by semicolons, newlines or right braces.
This is from the man page for the default awk on my Mac. The same information is in the GNU awk manual; it's just buried a little deeper. And the POSIX specification of awk states:
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
You can see in your examples that statements can be separated by newlines instead of being terminated with semicolons. So when you have
/regex/
{ ...
}
it's equivalent to /regex/; {...}, which in turn is equal to /regex/{print $0} {...}, exactly the behavior you observed in your tests.
Note that BEGIN and END are special patterns and need an explicit action, since the default action of {print $0} is not possible for them (there is no current record). That's why their opening curly brace must be on the same line. It may look like mere convention, but it's all consistent.
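As a quick sketch of that equivalence (mine, not from the original post): a pattern on its own line gets the default print action, while the following brace-only block runs for every record.
$ printf 'hey1\nhello\nhey2\n' | awk '
/hey/
{ print "always: " $0 }
'
hey1
always: hey1
always: hello
hey2
always: hey2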

Referencing ANSI escape sequences in environment variables within awk

I'm sharing a source file that defines ANSI escape codes for different colors. The codes are sourced in a shell script (bash) which also starts an awk script, and the envvars are referenced in awk.
However, I'm not getting the output I want, i.e. the colors.
Examples in bash:
export Red='\033[0;31m'
export Color_Off='\033[0m'
# This works, Output is "Hello" in Red
echo $Red Hello $Color_Off
Examples in awk (the envvars are still exported/set):
# This does not
$ awk 'BEGIN { print "Output: " ENVIRON["Red"] "Hello" ENVIRON["Color_Off"] }'
Output: \033[0;31mHello\033[0m
# This works, Output is "Hello" in Red
awk 'BEGIN { R="\033[0;31m" ; O="\033[0m" ; print R "Hello" O }'
I'm assuming the answer is lying there right in front of me, but I fail to find it just now.
There is a way to achieve this, but you need to declare the color escape sequences slightly differently. Use ANSI C quoting ($'...') to put the actual escape characters into the variable directly, instead of storing a backslash-escaped string and letting the shell expand it later:
export Red=$'\e[0;31m'
export Color_Off=$'\e[0m'
awk 'BEGIN { print "Output: " ENVIRON["Red"] "Hello" ENVIRON["Color_Off"] " Bye" }'
This should work as expected. I also believe this is the superior way to declare colors (for instance, it's done this way in Zsh's colors contrib function).
Alternatively, try the following awk command (slightly modified from yours). It works because awk expands escape sequences such as \033 in values assigned with -v, so the literal backslash string gets converted into a real escape character:
$ awk -v R="$Red" -v O="$Color_Off" 'BEGIN { print R "Hello" O }'

Print a string with its special characters printed as literal escape sequences

I have a string in a shell/bash script. I want to print the string with all its "special characters" (eg. newlines, tabs, etc.) printed as literal escape sequences (eg. a newline is printed as \n, a tab is printed as \t, and so on).
(Not sure if I'm using the correct terminology; the example should hopefully clarify things.)
Example
The desired output of...
a="foo\t\tbar"
b="foo bar"
print_escape_seq "$a"
print_escape_seq "$b"
...is:
foo\t\tbar
foo\t\tbar
$a and $b are strings that were read in from a text file.
There are two tab characters between foo and bar in the $b variable.
An attempt
This is what I've tried:
#!/bin/sh
print_escape_seq() {
str=$(printf "%q\n" $1)
str=${str/\/\//\/}
echo $str
}
a="foo\t\tbar"
b="foo bar"
print_escape_seq "$a"
print_escape_seq "$b"
The output is:
foo\t\tbar
foo bar
So, it doesn't work for $b.
Is there an entirely straightforward way to accomplish this that I've missed completely?
Bash (4.4 and newer) has a string quoting parameter transformation, ${var@Q}.
Here is some example code
bash_encode () {
    esc=${1@Q}
    echo "${esc:2:-1}"
}
testval=$(printf "hello\t\tworld")
set | grep "^testval="
echo "The encoded value of testval is" $(bash_encode "$testval")
Here is the output
testval=$'hello\t\tworld'
The encoded value of testval is hello\t\tworld
You will need to create a search and replace pattern for each binary value you wish to replace. Something like this:
#!/bin/bash
esc() {
    # literal space character after the //
    v=${1// /\\s}
    # literal tab character after the // (written as $'\t' here so it is unambiguous)
    v=${v//$'\t'/\\t}
    echo "$v"
}
esc "hello world"
esc $'hello\tworld'
This outputs
hello\sworld
hello\tworld
I needed something similar for file paths, and realized that ls -1b does the job, but while researching I found this Stack Overflow solution, which is closer to what you're asking for:
Command to escape a string in bash
Just compile it with gcc -o escapify escapify.c

Reading java .properties file from bash

I am thinking of using sed for reading a .properties file, but was wondering if there is a smarter way to do that from a bash script?
This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
grep "^$2=" "$1" | cut -d'=' -f2
}
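For example (config.properties and db.url are just made-up names):
db_url=$(get_property config.properties db.url)
echo "db.url is: $db_url"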
The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
FS="=";
print "# BEGIN";
n="";
v="";
c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
c=0;
next;
}
/\\$/ && (c==0) && (NF>=2) { # Name value pair with a line continuation...
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
c=1; # Line continuation mode.
next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
v= "" v substr($0,1,length($0)-1);
next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single line name/value pair
if (c==0) { # Single line name/value pair
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e);
} else { # Line continuation mode - last line of the value.
c=0; # Turn off line continuation mode.
v= "" v $0;
}
# Make sure the name is a legal shell variable name
gsub(/[^A-Za-z0-9_]/,"_",n);
# Remove newlines from the value.
gsub(/[\n\r]/,"",v);
print n "=\"" v "\"";
n = "";
v = "";
}
END {
print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
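As a made-up illustration (the property names and values here are invented), an input with a line continuation would be converted like this:
$ cat myproperties.properties
app.name=Demo
app.desc=first line \
second line
$ awk -f readproperties.awk < myproperties.properties
# BEGIN
app_name="Demo"
app_desc="first line second line"
# END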
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
Here is my wiki page on this very topic.
I wrote a script to solve the problem and put it on my github.
See properties-parser
One option is to write a simple Java program to do it for you - then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wised up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.
If you want to use sed to parse -any- .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc: http://en.wikipedia.org/wiki/.properties
One possible workaround would be using Java itself to preprocess the .properties file into something bash-friendly, and then sourcing it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.
In Perl:
while(<STDIN>) {
($prop,$val)=split(/[=: ]/, $_, 2);
# and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.
I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.
If you want to stay in the shell, the best tool for parsing files with proper programming control is (g)awk. Use sed only for simple substitutions.
I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.
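A minimal sketch of that approach, assuming the file only has simple key=value lines whose keys are valid shell identifiers (no dots) and whose values need no quoting (app.properties and SOME_KEY are made-up names):
. ./app.properties        # or: source ./app.properties
echo "$SOME_KEY"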
Hmm, I just ran into the same problem today. This is a poor man's solution, admittedly more straightforward than clever ;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots are only replaced before '=', and there is no comment handling), but it could be a starting point.
@Daniel, I tried to source it, but Bash didn't like dots in variable names.
I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
local name=$1
eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'
This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
local VAR="$1"
shift
awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
function loc(text) { return index($0, text) }
function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
{ sub(/^[ \t]+/, ""); eq = loc("=") }
substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
loc("=" QQ) == eq { unquote(QQ); exit }
loc("=" Q) == eq { unquote( Q); exit }
{ print substr($1, eq + 1); exit }
' "$@"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to differ from your format only in whether there is white space around the assignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab; this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
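A usage sketch (the file and variable names below are invented for illustration):
$ cat settings.sh
GREETING="hello world"   # double-quoted
NAME='Alice'             # single-quoted
COUNT=42                 # unquoted
$ _getvar GREETING settings.sh
hello world
$ _getvar NAME settings.sh
Alice
$ _getvar COUNT settings.sh
42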
In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendy (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by @david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release