What is the function of this sed shell script?

I was reading this example about setting up a cluster with pgpool and Watchdog and decided to give it a try as an exercise.
I'm far from being a master of shell scripting, but I could follow the documentation and modify it according to the settings of my virtual machines. But I don't get the purpose of the following snippet:
if [ ${PGVERSION} -ge 12 ]; then
sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
-e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
fi
In my case PGVERSION will be 12 (so the script will execute the code after the condition), RECOVERYCONF is /usr/local/pgsql/data/myrecovery.conf and DEST_NODE_PGDATA is /usr/local/pgsql/data.
I get (please excuse and correct me if I'm wrong) that -e indicates that a script comes next, the $(some commands) part evaluates the expression and returns the result, and that the inner sed regular expression replaces each '/' with \/ (a backslash followed by a forward slash). What is puzzling me are the "\\\$ainclude_if_exists =" and "/^include_if_exists" parts: I don't know what they mean, what they are intended for, or how they interact. Also, the -e after the first sed regular expression is confusing me.
If you are interested in the context, those commands are near the end of the /var/lib/pgsql/11/data/recovery_1st_stage example script.
Thanks in advance for your time.

Here's a tiny representation of the same code:
sed -i -e '$amyvalue = foo' -e '/^myvalue = foo/d' myfile.txt
The first sed expression is:
$ # On the last line
a # Append the following text
myvalue = foo # (text to be appended)
The second is:
/ # On lines matching regex..
^myvalue = foo # (regex to match)
/ # (end of regex)
d # ..delete the line
So it deletes any myvalue = foo line that may already exist, and then adds one such line at the end. The point is just to ensure that you have exactly one such line, by A. adding the line if it's missing, and B. not duplicating the line if it already exists.
The rest of the expression is merely complicated by the fact that this snippet uses variables and is embedded in a double quoted string that's being passed to a different host via ssh, and therefore requires some additional escaping both of the variables and of the quotes.
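The append-then-delete idiom can be tried on a throwaway file. The sketch below assumes GNU sed (for -i and the one-line form of the a command); ensure_line is a hypothetical helper name, and the line text is assumed to contain no regex metacharacters or slashes. Unlike the tiny example above, the delete pattern is also anchored at the end of the line.

```shell
# ensure_line FILE LINE
# Hypothetical helper: leaves exactly one copy of LINE, at the end of FILE.
# Assumes GNU sed and a LINE without regex metacharacters or "/".
ensure_line() {
    file=$1
    line=$2
    # $a...    : on the last input line, queue LINE to be appended
    # /^...$/d : delete every existing line equal to LINE
    sed -i -e "\$a$line" -e "/^$line\$/d" "$file"
}

conf=$(mktemp)
printf '%s\n' 'a = 1' 'myvalue = foo' 'b = 2' > "$conf"
ensure_line "$conf" 'myvalue = foo'   # moves the existing line to the end
ensure_line "$conf" 'myvalue = foo'   # running it again changes nothing
cat "$conf"
```

After both calls the file holds a = 1, b = 2 and myvalue = foo, in that order. An empty file stays empty, because sed then has no last line to append after.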

Related

Inserting “local” before variable name in a shell script leads to an error

This code (it’s a part of a shell function) works perfectly:
output=$(\
cat "${vim_file}" | \
sed -rne "${EXTRACT_ENTITIES}" | \
sed -re "${CLEAR_LEADING_QUOTES}" | \
sed -re "${NORMALIZE_NAMES}" \
)
But when I’m trying to insert the word “local” before the assignment…
local output=$(\
cat "${vim_file}" | \
sed -rne "${EXTRACT_ENTITIES}" | \
sed -re "${CLEAR_LEADING_QUOTES}" | \
sed -re "${NORMALIZE_NAMES}" \
)
…I get a strange error:
local: commands.: bad variable name
There are no wrong invisible characters in the code: only tabs making indentations and spaces in the other places. The script begins with “#!/bin/sh”. Inserting the “local” before other variables in the function doesn’t lead to any problem. Replacing “output” (the name of the variable) with another arbitrary string changes nothing. The OS is Linux.

Really short answer: Use more quotes!
local output="$(\
cat "${vim_file}" | \
sed -rne "${EXTRACT_ENTITIES}" | \
sed -re "${CLEAR_LEADING_QUOTES}" | \
sed -re "${NORMALIZE_NAMES}" \
)"
Longer answer: It's almost always a good idea to double-quote variable references and command substitutions. Double-quoting prevents them from being subject to word splitting and filename wildcard expansion, which is rarely something you want, and can cause confusing problems.
There are situations where it's safe to leave the double-quotes off, but the rules are confusing and hard to remember, and easy to get wrong. This is one of those confusing cases. One of the situations where word splitting and wildcard expansion don't happen (and therefore it's safe to leave the double-quotes off) is on the right-hand side of an assignment:
var=$othervar # safe to omit double-quotes
var2=$(somecommand) # also safe
var="$othervar" # this also works fine
var2="$(somecommand)" # so does this
Some shells extend this to assignments that're part of a command, like local or export:
export var=$othervar # *Maybe* ok, depending on the shell
local var2=$(somecommand) # also *maybe* ok
bash treats these as a type of assignment, so it doesn't do the split-expand thing with the values. But dash treats this more like a regular command (where the arguments do get split-expanded), so if your script is running under dash it can have problems like this.
For example, suppose somecommand prints "export and local are shell commands." Then in dash, local var2=$(somecommand) would expand to:
local var2=export and local are shell commands.
...which would declare the local variable var2 (which gets set to "export"), along with the local variables and, local, are, and shell. It would also try to declare commands. as a local variable, but fail because it's not a legal variable name.
Therefore, use more quotes!
export var="$othervar" # Safe in all shells
local var2="$(somecommand)" # also safe
Or separate the declarations (or both!):
export var
var=$othervar # Safe in all shells, with or without quotes
local var2
var2=$(somecommand) # also safe, with or without quotes
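To check this without depending on which shell /bin/sh points to, the following sketch uses only the two forms described above as safe in all shells. somecommand and demo are stand-in names; local itself is not part of POSIX, though bash and dash both provide it.

```shell
# Stand-in for a command whose output contains spaces.
somecommand() {
    echo "export and local are shell commands."
}

demo() {
    # Quoted command substitution: the whole output stays one word.
    local quoted="$(somecommand)"
    # Declaration split from assignment: a plain assignment never word-splits.
    local separate
    separate=$(somecommand)
    printf '%s\n' "$quoted" "$separate"
}

demo
```

Both lines printed by demo contain the full sentence, whichever of the two shells runs it.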

The answer was found here: Advanced Bash-Scripting Guide. Chapter 24. Functions
This is a quotation from there:
As Evgeniy Ivanov points out, when declaring and setting a local variable in a single command, apparently the order of operations is to first set the variable, and only afterwards restrict it to local scope.
It means that if the value assigned to a local variable contains a space, then, when executing the local command, the shell takes only the first word for the assignment. The rest of the string is interpreted depending on its content.
The way the shell interprets the rest of the content is still a puzzle for me. In my case it tried to perform assignments using arbitrary parts of the files being read. For example, the “commands.” string in the error message was the end of a sentence in one of the files the cat command operated on.
So, there are two ways to solve the problem.
The first one is to split the assignment. I.e. instead of…
local output=$(cat ...
…it must be:
local output
output=$(cat ...
The second approach has been taken from the comments under the question — using surrounding quotes for the entire expression:
local output="$(cat...)"
Summarizing: when using the shell, always remember the insidious splitting at spaces.
P.S. Read the brilliant explanation from Gordon Davisson.

Look at the error message: you've provided an invalid variable name:
$ sh
$ foo () { local commands.; commands=5; echo "${commands}"; }
$ foo
sh: local: `commands.': not a valid identifier
5

What does this sed syntax mean? "s/MY_BASE_DIR=\(.*\)/MY_BASE_DIR=${MY_BASE_DIR-\1}/"

This is a simple question but I am unable to find the answer in tutorials. Could anybody please explain what this statement does when executed in a bash shell within a folder containing .sh scripts? I know -i does in-place editing, I understand that it will run sed on all scripts within the current directory, and I know that it does some sort of substitution. But what does this \(.*\) mean?
sed -i 's/MY_BASE_DIR=\(.*\)/MY_BASE_DIR=${MY_BASE_DIR-\1}/' *.sh
Thanks in advance.

You have an expression like:
sed -i 's/XXX=\(YYY\)/XXX=ZZZ/' file
This looks for the string XXX= in a file and captures what follows it (YYY) in a group, which the replacement can refer to as \1. The whole match is then replaced with XXX=ZZZ. Finally, the -i flag makes sed edit the file in place.
For the replacement, it uses the following syntax described in Shell parameter expansion:
${parameter-word}
If parameter is unset, the expansion of word is substituted.
Otherwise, the value of parameter is substituted. (With a colon, as in ${parameter:-word}, word is also used when parameter is set but null.)
Example:
$ d=5
$ echo ${d-3}
5
$ echo ${a-3}
3
So with ${MY_BASE_DIR-\1} you are saying: print $MY_BASE_DIR. And if this variable is unset, print what is stored in \1.
All together, this resets MY_BASE_DIR to the value of the variable $MY_BASE_DIR, unless that variable is not set; in that case, the value remains the same.
Note though that the variable won't be expanded unless you use double quotes.
Test:
$ d=5
$ cat a
d=23
blabla
$ sed "s/d=\(.*\)/d=${d-\1}/" a # double quotes -> value is replaced
d=5
blabla
$ sed 's/d=\(.*\)/d=${d-\1}/' a # single quotes -> variable is not expanded
d=${d-23}
blabla
And see how the value remains the same if $d is not set:
$ unset d
$ sed "s/d=\(.*\)/d=${d-\1}/" a
d=23
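One detail worth checking by hand: the sed command uses ${VAR-default} without a colon, which differs from ${VAR:-default} when the variable is set but empty. A minimal sketch, using /opt/app as a made-up override value:

```shell
unset MY_BASE_DIR
when_unset=${MY_BASE_DIR-/usr/local}         # unset: the default is used

MY_BASE_DIR=
when_empty=${MY_BASE_DIR-/usr/local}         # set but empty: "-" keeps the empty value
when_empty_colon=${MY_BASE_DIR:-/usr/local}  # ":-" falls back to the default here too

MY_BASE_DIR=/opt/app
when_set=${MY_BASE_DIR-/usr/local}           # set: the existing value wins

printf '[%s]\n' "$when_unset" "$when_empty" "$when_empty_colon" "$when_set"
```

This prints [/usr/local], [], [/usr/local] and [/opt/app], one per line.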

The scripts contain lines like this:
MY_BASE_DIR=/usr/local
The sed expression changes them to:
MY_BASE_DIR=${MY_BASE_DIR-/usr/local}
The effect is that /usr/local is not used as a fixed value, but only as the default value. You can override it by setting the environment variable MY_BASE_DIR.
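The before-and-after can be reproduced on a scratch script. This is a sketch assuming GNU sed (BSD sed expects an argument after -i); the file name demo.sh and the override value /opt/app are made up for the illustration.

```shell
workdir=$(mktemp -d)
script="$workdir/demo.sh"

# A tiny script with a hard-coded directory, like the ones being edited.
printf '%s\n' 'MY_BASE_DIR=/usr/local' 'echo "$MY_BASE_DIR"' > "$script"

# Single quotes keep ${MY_BASE_DIR-...} literal, so it lands in the script itself.
sed -i 's/MY_BASE_DIR=\(.*\)/MY_BASE_DIR=${MY_BASE_DIR-\1}/' "$script"

first_line=$(sed -n '1p' "$script")
default_run=$(unset MY_BASE_DIR; sh "$script")      # no override in the environment
override_run=$(MY_BASE_DIR=/opt/app sh "$script")   # override from the environment
printf '%s\n' "$first_line" "$default_run" "$override_run"
```

The three lines printed are the rewritten assignment, then /usr/local, then /opt/app.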

For future reference, I would take a look at the ExplainShell website:
http://explainshell
that will give you a breakdown of the command structure etc. In this instance, let's step through the details. Let's start with a simple example and assume that we want to make a simple change: commenting out all lines by adding a "#" before each line. We can do this for all files with the ".sh" extension in the current directory:
sed 's/^/\#/' *.sh
i.e. Substitute beginning of line ^, with a # ...
Caveat: You did not specify the OS you are using. You may get different results with different versions of sed and OS...
ok, now we can drill into the substitution in the script. An example is probably easier to explain:
File: t.sh
MY_BASE_DIR="/important data/data/bin"
the command 's/MY_BASE_DIR=\(.*\)/MY_BASE_DIR=${MY_BASE_DIR-\1}/' *.sh
will search for "MY_BASE_DIR=" in each .sh file in the directory.
When it encounters a line matching "MY_BASE_DIR=\(.*\)" in the file, the group \(.*\) captures the rest of the line, here "/important data/data/bin" (quotes included). That captured text is then substituted for \1 on the right side of the expression, /MY_BASE_DIR=${MY_BASE_DIR-\1}/, which becomes
MY_BASE_DIR=${MY_BASE_DIR-"/important data/data/bin"}
essentially what happens is that the substitute operation takes
MY_BASE_DIR="/important data/data/bin"
and inserts
MY_BASE_DIR=${MY_BASE_DIR-"/important data/data/bin"}
now if we run the script with the variable MY_BASE_DIR set
export MY_BASE_DIR="/new/import/dir"
the scripts modified by the sed command above will now use /new/import/dir instead of /important data/data/bin...

Cannot understand a sed pattern

My original issue was to be able to add a line at the end of a specific block in a configuration file.
############
# MY BLOCK #
############
VALUE1 = XXXXX
VALUE2 = YYYYY
MYNEWVALUE = XXXXX <<< I want to add this one
##############
# MY BLOCK 2 #
##############
To do this I used the following sed script, which I found in another post, and it worked flawlessly:
sed -i -e "/# MY BLOCK #/{:a;n;/^$/!ba;i\MYNEWVALUE = XXXXX" -e '}' myfile
This worked perfectly when executed inside a shell script but I can't manage to use it directly in an interactive shell (it gave me an error: "!ba event not found"). To solve this, I tried to add '\' before '!ba' but now it gave me another error which tells me that '\' is an unknown command.
Could anyone explain where my mistake is on the above issue and how this script works?
Here is my understanding:
-i : insert new line (I think the first one is useless, am I right?)
-e : execute this sed script (don't understand why there is a second one at the end to close the })
:a : begin a loop
n : read each line with the pattern ^$ (empty lines)
! : reverse the loop
ba : end of the loop
Thanks !

Use ' instead of " to avoid having bash try to do history substitution on the !
If XXXXX contains a shell parameter expansion or somesuch, you can do it like this:
sed -i -e"/# $BLOCK_NAME "'#/{:a;n;/^$/!ba;i\'"$NEW_VAR = $NEW_VALUE" -e"}" myfile
The second -e is required to effectively insert a newline to close off the i command. You could actually insert the newline directly, instead:
sed -i -e"/# $BLOCK_NAME "'#/{:a;n;/^$/!ba;i\'"$NEW_VAR = $NEW_VALUE"$'\n}' myfile

:a introduces a label, named a.
n writes the current pattern space to output, and replaces the pattern space with the next line of input.
/^$/! means to match lines that are NOT (!) blank lines in pattern space; the following ba is a "branch to label a" when that match (not blank line) occurs.
If the branch doesn't occur, the i insert then takes place.
Use single quotes (') instead of double quotes (") on command line to prevent shell from performing shell substitutions (including the "$" and "!" characters).
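The loop can be tried on a scratch copy of the configuration. The sketch assumes GNU sed (for -i and the one-line i\ form) and a blank line between the two blocks, which the /^$/ test relies on; the file contents are made up.

```shell
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
############
# MY BLOCK #
############
VALUE1 = XXXXX
VALUE2 = YYYYY

##############
# MY BLOCK 2 #
##############
VALUE3 = ZZZZZ
EOF

# From the "# MY BLOCK #" line on: print and fetch lines (n) until a blank
# one is in the pattern space, then insert the new line before that blank line.
sed -i -e '/# MY BLOCK #/{:a;n;/^$/!ba;i\MYNEWVALUE = XXXXX' -e '}' "$cfg"
cat "$cfg"
```

Because the whole expression is in single quotes, the same command line also works in an interactive bash session without triggering history expansion.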

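In interactive shells, ! is used for history substitution. Escaping it with a backslash inside double quotes does not help here: bash suppresses the history expansion but leaves the backslash in place, so sed sees \! and reports an unknown command. Put the expression in single quotes instead:
sed -i -e '/# MY BLOCK #/{:a;n;/^$/!ba;i\MYNEWVALUE = XXXXX' -e '}' myfile
If part of the expression must stay in double quotes (for example, to expand a variable), you should also escape $ there, since it has special meaning inside double-quoted strings (although in this case it's OK, because it's followed by /, not a variable name).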

How to make bash regex'es more readable?

Here is an example of nicely indented Python regex (taken from here):
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
Now, I would like to use the same technique for bash regexes (i.e. sed or grep), but can't find any reference to similar features so far. Is it even possible to indent (and comment) something like this?
echo "$MULTILINE" | sed -re 's/(expr1|expr2)|(expr3|expr4)/expr5/g'

You can use bash's line continuation, possibly:
echo "start of a line \
continues the previous line \
yet another continuation
oops. this is a brand new line"
Note the backslashes at the end of the first two lines. They essentially 'escape' the newline/linebreak that would otherwise tell bash you're starting a new line, which also implicitly terminates the statement being defined.
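A different technique that gets closer to the commented Python layout, sketched here as an alternative to line continuation: build the expression from small shell variables, one per commented line, and join them before handing the result to sed. The variable names are made up, and -E (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed.

```shell
# Each alternative gets its own commented line.
octal='0[0-7]+'          # Octal form
decimal='[0-9]+'         # Decimal form
hex='x[0-9a-fA-F]+'      # Hexadecimal form

# Start of a numeric entity reference, one of the forms, trailing semicolon.
charref="&#($octal|$decimal|$hex);"

result=$(printf '%s\n' 'a &#65; b &#x41; c' | sed -E "s/$charref/<ref>/g")
printf '%s\n' "$result"
```

This prints a <ref> b <ref> c. The pieces must not contain the / delimiter or unintended shell expansions, since they are pasted into a double-quoted sed expression.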

How do you escape a user-provided search term that you don't want evaluated for sed?

I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).
I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.
Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.
#!/bin/bash
# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P#$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!##$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches
# Now for each line in the file, do some search and replace
while read line
do
echo "------===[ BEGIN $line ]===------"
# Escape every character in $line (e.g., ab/c becomes \a\b\/\c). I got
# this solution from the accepted answer in the linked SO question.
ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')
# Search for the line we read from the file and replace it with
# the text "replaced"
sed 's/'"$ES"'/replaced/' < my_searches # Does not work
# Search for the text "Jane" and replace it with the line we read.
sed 's/Jane/'"$ES"'/' < my_searches # Works
# Search for the line we read and replace it with itself.
sed 's/'"$ES"'/'"$ES"'/' < my_searches # Does not work
echo "------===[ END ]===------"
echo
done < my_searches
When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.
------===[ BEGIN ~!##$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:
------===[ BEGIN ~!##$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------
So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?
I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.

This is a relatively famous problem—given a string, produce a pattern that matches only that string. It is easier in some languages than others, and sed is one of the annoying ones. My advice would be to avoid sed and to write a custom program in some other language.
You could write a custom C program, using the standard library function strstr. If this is not fast enough, you could use any of the Boyer-Moore string matchers you can find with Google—they will make search extremely fast (sublinear time).
You could write this easily enough in Lua:
local function quote(s) return (s:gsub('%W', '%%%1')) end
local function replace(first, second, s)
return (s:gsub(quote(first), second))
end
for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end
If not fast enough, speed things up by applying quote to arg[1] only once and inlining the function replace.

As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it escapes out non-special characters. What you really want to do is perhaps something like:
awk 'gsub(/[^[:alpha:]]/, "\\\\&")'
This will escape non-alpha characters. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to
\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.
It's quite odd because this works perfectly fine
$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced

This: echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong! Do an echo $ES after that and $ES appears to be \/\u\s\r\/\i\n\c\l\u\d\e. Then when you pass it to the next sed (below)
sed 's/'"$ES"'/replaced/' my_searches
it will not work, because no line matches the pattern \/\u\s\r\/\i\n\c\l\u\d\e. The correct way is something like:
$ sed 's|\([#$#^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\#\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\#\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|
You put all the characters you want escaped inside [], and choose a suitable delimiter for sed that is not in your character class, e.g. I chose "|". Then use the "g" (global) flag.
Tell us what you are actually trying to do, i.e. the actual problem you are trying to solve.
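One reason escaping every character fails is that in a basic regular expression a backslash can turn an ordinary character into an operator: \{ and \} delimit an interval and \( \) a group, which matches the "Invalid content of \{\}" error in the question. A common alternative is to escape only the characters that are special when unescaped. The sketch below assumes a single-line search term, basic (not extended) regex syntax, and / as the delimiter of the outer s command; escape_bre is a hypothetical helper name.

```shell
# escape_bre TERM
# Hypothetical helper: prints TERM with a backslash in front of the characters
# that are special in a basic regular expression ( ] [ \ . * ^ $ ) and of "/",
# which is assumed to be the delimiter of the s command that uses the result.
escape_bre() {
    printf '%s\n' "$1" | sed 's,[][\.*^$/],\\&,g'
}

search='/usr/include'
printf '%s\n' 'cc -I /usr/include main.c' |
    sed "s/$(escape_bre "$search")/replaced/"
```

This prints cc -I replaced main.c. The replacement side needs its own, different escaping (backslash, & and the delimiter), as covered by the related question mentioned in the post.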

This seems to work for FreeBSD sed:
# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches

The -E option of FreeBSD sed is used to turn on extended regular expressions.
The same is available for GNU sed via the -r or --regexp-extended options respectively.
For the differences between basic and extended regular expressions see, for example:
http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
Maybe you can use FreeBSD-compatible minised instead of GNU sed?
# example using FreeBSD-compatible minised,
# http://www.exactcode.de/site/open_source/minised/
# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'
# escapes in regular expression
ES="$(printf "%q" "${line}")" # escape some punctuation characters
ES="${ES//./\\.}" # . -> \.
ES="${ES//\\\\(/(}" # \( -> (
ES="${ES//\\\\)/)}" # \) -> )
# escapes in replacement string
lineEscaped="${line//&/\&}" # & -> \&
minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"

To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:
backSlash='\\'
ES="${ES//${backSlash}(/(}" # \( -> (
ES="${ES//${backSlash})/)}" # \) -> )
(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)

... or to complete the backslash confusion ...
backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}" # double backslashes
lineEscaped="${lineEscaped//&/\&}" # & -> \&

If you have bash, and you're just doing a pattern replacement, just do it natively in bash. The ${parameter/pattern/string} expansion in Bash will work very well for you, since you can just use a variable in place of the "pattern" and replacement "string" and the variable's contents will be safe from word expansion. And it's that word expansion which makes piping to sed such a hassle. :)
It'll be faster than forking a child process and piping to sed anyway. You already know how to do the whole while read line thing, so creatively applying the capabilities in Bash's existing parameter expansion documentation can help you reproduce pretty much anything you can do with sed. Check out the bash man page to start...
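One caveat when doing this natively: the pattern in ${parameter/pattern/string} is a glob pattern, so an unquoted variable containing *, ? or [ is still interpreted. Quoting the variable inside the expansion makes bash match it literally. A minimal sketch, assuming bash (this expansion is not POSIX sh); replace_literal is a hypothetical name, and the replacement is kept free of & because newer bash versions may treat an unquoted & there specially.

```shell
#!/usr/bin/env bash
# replace_literal LINE SEARCH REPLACEMENT
# Hypothetical helper: replaces every literal occurrence of SEARCH in LINE.
replace_literal() {
    local line=$1 search=$2 replacement=$3
    # The inner quotes around $search turn off glob matching for its contents.
    printf '%s\n' "${line//"$search"/$replacement}"
}

replace_literal 'see /usr/include[*] here' '/usr/include[*]' 'replaced'
```

This prints see replaced here. Without the inner quotes, [*] would be read as a bracket expression and the search text would no longer match itself.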
