Conditional sed substitution

Conditional sed substitution - bash

Currently, I have a fully functional sed command that adds an item to a list in a specific text file in the following way ...
ITEMS="$ITEM1 $ITEM2 $ITEM3"
becomes the following when we wish to insert $ITEM4 ...
ITEMS="$ITEM1 $ITEM2 $ITEM3 $ITEM4"
The number of items in the list is not known; it is dynamic. And the dollar signs and quotes in that text are to be taken literally.
I use the following command to accomplish the addition to the list (variable $itemNum is assigned elsewhere) ...
sed "/^\s*ITEMS=/s/\"$/ \$ITEM$itemNum&/" file.txt
All this does, of course, is find the line starting with ITEMS= (and possibly with leading spaces) and replace the last double quote in that line with a space plus the desired item plus a double quote (to put the replaced double quote back in place). However, I have an additional case where the ITEMS list could be empty like so ...
ITEMS=""
and in this case, my command would insert $ITEM4 at the end of the list like so ...
ITEMS=" $ITEM4"
However, I want my command to be able to account for an empty list and not put the leading space when the list is indeed empty, so it just looks like ...
ITEMS="$ITEM4"
How could I alter my existing command to best accomplish this?

Taking your subject as a guide, here is one way to do it conditionally with sed:
sed -e '/^\s*ITEMS=/{s/=""/="$ITEM'"$itemNum"'"/;t;s/"$/ $ITEM'"$itemNum"'&/;}' file.txt
The main change is we try to substitute empty quotes ("") first. If that succeeds, the conditional branch t sends us to the end of the script, bypassing the other substitution. The second substitution only occurs if the first fails. I've changed the quoting to get rid of some of the "leaning toothpicks", but it's about 66% your original command.
Edit: The "leaning toothpicks" thing by OP's request
The original put the entire sed script line in double quotes so all " and most $ had to be escaped with backslashes. Rewriting my version like that looks like this:
sed -e "/^\s*ITEMS=/{s/=\"\"/=\"\$ITEM$itemNum\"/;t;s/\"$/ \$ITEM$itemNum&/;}" file.txt
I just find all those backslashes distracting, so I put as much of it into single quotes as I could, leaving only $itemNum in double quotes. In a shell script, adjacent strings are concatenated, so var1=QRS; var2='ABC'$var1'XYZ' leaves var2 set to ABCQRSXYZ. If there's a chance the variable could contain a space, it is best to quote it: var1=QRS; var2='ABC'"$var1"'XYZ' (with no space between '" and "' in there).
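A minimal sketch of the two-step substitution, wrapped in a function so it can be tried without a file. The name append_item is hypothetical, and [[:space:]] is used here in place of \s as a portability assumption; neither is part of the original command.

```shell
# append_item N: hypothetical helper. Reads text on stdin and appends the
# literal string $ITEM<N> to the ITEMS="..." line.
append_item() (
  n=$1
  sed -e '/^[[:space:]]*ITEMS=/{' \
      -e 's/=""$/="$ITEM'"$n"'"/' \
      -e 't' \
      -e 's/"$/ $ITEM'"$n"'"/' \
      -e '}'
)

# Empty list: the first substitution fires and t skips the second.
printf '%s\n' 'ITEMS=""' | append_item 4
# Non-empty list: the first substitution fails, so the second one runs.
printf '%s\n' 'ITEMS="$ITEM1 $ITEM2 $ITEM3"' | append_item 4
```

Splitting the script over several -e options keeps each piece short; sed joins them with newlines, so the bare t still branches to the end of the script.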

In this situation, awk is more flexible:
awk -F'"' -vn="$itemNum" '/^\s*ITEMS=/{sub(/"$/,($2?" ":"")"$ITEM"n"&")}1' file
test:
kent$  itemNum=4 
kent$ echo 'ITEMS=""'|awk -F'"' -vn="$itemNum" '/^\s*ITEMS=/{sub(/"$/,($2?" ":"")"$ITEM"n"&")}1'
ITEMS="$ITEM4"
kent$ echo 'ITEMS="$ITEM1 $ITEM2 $ITEM3"'|awk -F'"' -vn="$itemNum" '/^\s*ITEMS=/{sub(/"$/,($2?" ":"")"$ITEM"n"&")}1'
ITEMS="$ITEM1 $ITEM2 $ITEM3 $ITEM4"

Related

sed substitution: substitute string is a variable needing expansion AND contains slashes

I am fighting with sed to do a substitution where the substitute string contains slashes. This general topic has been discussed on Stack Overflow before. But, AFAICT, I have a new wrinkle that hasn't been addressed in previous questions.
Let's say I have a file, ENVIRO.tmpl, which has several lines, one of which is
Loaded modules: SUPPLY_MODULES_HERE
I want to replace SUPPLY_MODULES_HERE in an automated fashion with a list of loaded modules. (At this point, if anyone has a better way to do this than sed, please let me know!) My first effort here is to define an environment variable and use sed to put it into the file:
> modules=$(module list 2>&1)
> sed "s/SUPPLY_MODULES_HERE/${modules}/" ENVIRO.tmpl > ENVIRO.txt
(The 2>&1 being needed because module list sends its output to STDERR, for reasons I can't begin to understand.) However, as is often the case, the modules have slashes in them. For example
> echo ${modules}
gcc/9.2.0 mpt/2.20
The slashes kill my command because sed can't understand the expression and thinks my substitution command is "unterminated".
So I do the usual thing and use some other character for the command delimiter:
> modules=$(module list 2>&1)
> sed "s|SUPPLY_MODULES_HERE|${modules}|" ENVIRO.tmpl > ENVIRO.txt
and I still get an "unterminated 's'" error.
So I replace double quotes with single quotes:
> sed 's|SUPPLY_MODULES_HERE|${modules}|' ENVIRO.tmpl > ENVIRO.txt
and now I get no error, but the line in ENVIRO.txt looks like
Loaded modules: ${modules}
Not what I was hoping for.
So, AFAICT, I need double quotes to expand the variable, but I need single quotes to make the alternative delimiters work. But I need both at the same time. How do I get this?
UPDATE: Gordon Davisson's comment below got to the root of the matter: "echo ${modules} can be highly misleading". Examining $modules with declare -p shows that it actually has a newline (or, more generally, some kind of line break) in it. What I did was add an extra step to extract newlines out of the variable. With that change, everything worked fine. An alternative would be to convince sed to expand the variable with line breaks and substitute it as such into the text, but I haven't been able to make that work. Any takers?
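The workaround described in the update could look roughly like the sketch below. The value assigned to modules is a stand-in for the real output of module list (an assumption made so the example runs anywhere), and tr is just one way to turn the line breaks into spaces. A newline inside the replacement ends the s command early, which is why sed reports it as unterminated.

```shell
# Stand-in for: modules=$(module list 2>&1)
# The embedded newline is the assumption that reproduces the failure.
modules=$(printf 'gcc/9.2.0\nmpt/2.20')

# od -c makes the hidden newline visible (declare -p does the same in bash).
printf '%s' "$modules" | od -c

# Replace each newline with a space before handing the value to sed.
flat=$(printf '%s' "$modules" | tr '\n' ' ')

result=$(printf '%s\n' 'Loaded modules: SUPPLY_MODULES_HERE' |
  sed "s|SUPPLY_MODULES_HERE|${flat}|")
printf '%s\n' "$result"
```

This still breaks if the value contains the delimiter |, an & or a backslash, so it is only suitable when the module names are known to be free of those characters.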

sed is not the best tool here because the replacement text has to be kept clear of regex metacharacters and of the delimiter.
It is better to use an awk command that doesn't require any regular expression.
awk -v kw='SUPPLY_MODULES_HERE' -v repl="$(module list 2>&1)" '
n = index($0, kw) {
$0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
} 1
' file
The index function does a plain string search in awk, so nothing in the keyword is treated as a regex.
The substr function is used to get the substrings before and after the search keyword.
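If the line breaks in the replacement should be preserved, one option is to pass the text through the environment rather than with -v, because a -v value undergoes escape-sequence processing and a literal newline in it may not be accepted by every awk. The sketch below assumes a stand-in value for the module list and a hypothetical environment variable named repl.

```shell
# Stand-in for $(module list 2>&1); the newline is kept on purpose.
modules=$(printf 'gcc/9.2.0\nmpt/2.20')

# repl is exported only for this awk invocation and read back via ENVIRON.
result=$(printf '%s\n' 'Loaded modules: SUPPLY_MODULES_HERE' |
  repl="$modules" awk -v kw='SUPPLY_MODULES_HERE' '
    n = index($0, kw) {
      $0 = substr($0, 1, n-1) ENVIRON["repl"] substr($0, n+length(kw))
    }
    1')
printf '%s\n' "$result"
```

ENVIRON hands the string to awk unchanged, so slashes, ampersands, backslashes and newlines all end up in the output as they are.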

Extract a section in a config file line using sed

I'm trying to continue to extract and isolate sections of text within my WordPress config file via bash script. Can someone help me figure out my syntax?
The line of code in the wp-config.php file is:
$table_prefix = 'xyz_';
This is what I'm trying to use to extract the xyz_ portion.
prefix=$(sed -n "s/$table_prefix = *'[^']*'/p" wp-config.php)
echo -n "$prefix"
There's something wrong with my characters obviously. Any help would be much appreciated!

Your sed command is malformed. The form s/regex/replacement/p prints each line on which the substitution succeeded. Yours, as written, has no replacement part or closing delimiter, so sed reports an unterminated 's' command. If you want to print the whole matched text, you can refer to it in the replacement as \0 (GNU sed) or &, as in s/<our_pattern>/\0/p
Bash interprets $table_prefix as a variable and, because it is in double quotes, tries to expand it. Unless you set this variable to something, it expands to nothing. This would cause your sed command to match much more liberally, and we can fix it by escaping the $ as \$table_prefix.
Next, this won't actually match. Your line has multiple spaces before the =, so we need another wildcard there, as in ...prefix *= *...
Lastly, to extract the xyz_ portion alone, we'll need to do some things. First, we have to make sure our pattern matches the whole line, so that when we substitute, the rest of the line won't be kept. We can do this by wrapping our pattern to match in ^.* ... .*\$. Next, we want to wrap the target section in a capture group. In sed, this is done with \(<stuff>\). The zeroth capture group is the whole match, and then capture groups are numbered in the order the parentheses appear. This means we can use \([^']*\) to grab that section, and \1 to output it.
All that gives us:
prefix=$(sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p" wp-config.php)
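A quick way to check the command without touching a real wp-config.php is to feed it a sample line on stdin. The sample text and its spacing are assumptions made for this sketch.

```shell
# Sample line standing in for wp-config.php (the spacing is an assumption).
line="\$table_prefix  = 'xyz_';"

# Same sed script as above, reading from a pipe instead of a file.
prefix=$(printf '%s\n' '<?php' "$line" |
  sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p")
printf '%s\n' "$prefix"
```

Only the line that matches is printed, because -n suppresses the default output and the p flag prints after a successful substitution.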

The only issue with the regex is that the '$' character tells bash you are using a variable, and since the pattern is wrapped in double quotes ("), bash will attempt to expand it. You can mitigate this by either escaping the $ or wrapping the pattern in single quotes and escaping the single quotes in the pattern.
Lastly, you are using the sed command s, which stands for substitute. It takes a pattern and replaces the matches with text, in the form s/<pattern>/<replace>/. You can omit the 's' and leave the 'p' or print command at the end. After all that, your command should look something like:
sed -n "/\$table_prefix = *'[^']*'/p" wp-config.php

sed partial replace or variable

I'd like to use sed to do a replace, but not by searching for what to replace.
Allow me to explain. I have a variable set to a default value initially.
VARIABLE="DEFAULT"
I can do a sed to replace DEFAULT with what I want, but then I would have to put DEFAULT back when I was all done. This is because what gets stored to VARIABLE is unique to the user. I'd like to use sed to search for something other than what to replace. For example, search for VARIABLE=" and " and replace what's between them. That way it just constantly updates and there is no need to reset VARIABLE.
This is how I do it currently:
I call the script and pass an argument
./script 123456789
Inside the script, this is what happens:
sed -i "s%DEFAULT%$1%" file_to_modify
This replaces
VARIABLE="DEFAULT"
with
VARIABLE="123456789"
It would be nice if I didn't have to search for "DEFAULT", because then I would not have to reset VARIABLE at end of script.

sed -r 's/VARIABLE="[^"]*"/VARIABLE="123456789"/' file_to_modify
Or, more generally:
sed -r 's/VARIABLE="[^"]*"/VARIABLE="'"$1"'"/' file_to_modify
Both of the above use a regular expression that looks for 'VARIABLE="anything-at-all"' and replaces it with, in the first example above 'VARIABLE="123456789"' or, in the second, 'VARIABLE="$1"' where "$1" is the first argument to your script. The key element is [^"]. It means any character other than double-quote. [^"]* means any number of characters other than double-quote. Thus, we replace whatever was in the double-quotes before, "[^"]*", with our new value "123456789" or, in the second case, "$1".
The second case is a bit tricky. We want to substitute $1 into the expression but the expression is itself in single quotes. Inside single-quotes, bash will not substitute for $1. So, the sed command is broken up into three parts:
# spaces added for exposition but don't try to use it this way
's/VARIABLE="[^"]*"/VARIABLE="' "$1" '"/'
The first part is in single quotes and bash passes it literally to sed. The second part is in double quotes, so bash will substitute in the value of $1. The third part is in single quotes and gets passed to sed literally.
MORE: Here is a simple way to test this approach on the command line without depending on any files:
$ new=1234 ; echo 'VARIABLE="DEFAULT"' | sed -r 's/VARIABLE="[^"]*"/VARIABLE="'"$new"'"/'
VARIABLE="1234"
The first line above is the command run at the prompt ($). The second is the output from running the command.
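One caveat with splicing "$1" into the script is that a value containing /, & or a backslash changes the meaning of the replacement. A possible guard, sketched here with a hypothetical helper named set_variable, is to backslash-escape those three characters first.

```shell
# set_variable VALUE: hypothetical helper. Reads text on stdin and replaces
# whatever is between the quotes of VARIABLE="..." with VALUE.
set_variable() (
  # Backslash-escape \, / and & so sed treats them literally in the replacement.
  escaped=$(printf '%s' "$1" | sed 's|[\\/&]|\\&|g')
  sed 's/VARIABLE="[^"]*"/VARIABLE="'"$escaped"'"/'
)

printf '%s\n' 'VARIABLE="DEFAULT"' | set_variable 'a/b&c'
```

This sketch does not handle values that contain a newline; those would still end the s command early.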

Delete all comments in a file using sed

How would you delete all comments using sed from a file(defined with #) with respect to '#' being in a string?
This helped out a lot except for the string portion.

If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.

This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
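To see the marker loop in action, the same script can be wrapped in a function and fed a few test lines. This sketch assumes GNU sed (the script relies on \n in the replacement and on \| alternation); the function name strip_comments and the sample lines are made up for the demonstration.

```shell
# strip_comments: hypothetical wrapper around the GNU sed script above.
strip_comments() {
  sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//'
}

# A # inside double quotes, a # inside single quotes, and a line with no #.
printf '%s\n' 'echo "# no" # yes' "x='a#b'" 'plain' | strip_comments
```

With GNU sed this should leave the quoted # characters alone and cut only the trailing comment; the first output line keeps the space that preceded the comment.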

Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because bash is used as the tag of the question.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others, see man isspace) followed by a #, and deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent confusion with quoted strings; however, it also means that comments containing a ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
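The combined command can be checked against the three kinds of line discussed above. In this sketch [[:space:]] replaces \s (an assumption made for portability) and the $ is escaped so the shell leaves it alone inside the double quotes.

```shell
# One full-line comment, one trailing comment, one # inside a string.
out=$(printf '%s\n' \
    '# full-line comment' \
    'if [[ $foo == "#bar" ]]; then # comment here' \
    'a="foobar in #comment"' |
  sed "/^[[:space:]]*#/d;s/[[:space:]]*#[^\"']*\$//")
printf '%s\n' "$out"
```

The first line is deleted, the trailing comment is cut from the second, and the third is left untouched because a quote follows its #.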

To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf '\nBEFORE removal:\n\n%s\n\n' "${S0}"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf '\nAFTER removal:\n\n%s\n\n' "${S1}"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment

Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because it needs to be quoted, and both single and double quotes are included in the string, we need one more complication. Recall that the shell lets you glue adjacent strings together: "foo"'bar' becomes foobar (foo in double quotes, bar in single quotes). Thus you can include single quotes by putting them in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted with 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.

As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan#carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.

sed 's:^#\(.*\)$:\1:g' filename
Supposing each line starts with a single # comment marker, the above command strips that leading # and keeps the rest of the line; it uncomments such lines rather than deleting the comments.

replace substring in lines using sed or grep

I have a file with a lot of lines, two of them are:
videoId: 'S2Rgr6yuuXQ'
var vid_seq=1;
in a shell script, I have two variables,
for id, the value is always 11 characters/numbers
id='fsafsferii2'
id_seq=80
I want to modify these two lines with id and id_seq
videoId: 'fsafsferii2'
var vid_seq=80;
I used
sed -i 's/\(videoId: \).*\\1'${id}'/\2' file
but there are errors, what is wrong with my script?
thanks

The grep command won't "replace" text, it is for "global regular expression print". But sed will.
sed -i'' '/^videoId: /s/: .*/: '"$id"'/;/^var vid_seq=/s/=.*/='"$id_seq"';/' file
I'm not a big fan of inserting variables into sed scripts this way, but sed is simple, and provides no mechanism for actually using actual variables on its own. If you're going to do this, include some format checking for the two variables to make sure they contain the data you want them to contain, before you run this sed script. An accidental / in a variable would cause the sed script to fail.
UPDATE per comments:
Here's a successful test:
$ id=fsafsferii2
$ id_seq=80
$ cat inp686
videoId: 'S2Rgr6yuuXQ'
var vid_seq=1;
$ sed '/^videoId: /s/: .*/: '"$id"'/;/^var vid_seq=/s/=.*/='"$id_seq"';/' < inp686
videoId: fsafsferii2
var vid_seq=80;
$
Of course, you'll need to do some quote magic to get the single quotes into your videoId, but I'm sure you can figure that out yourself.
UPDATE 2
According to sed's man page, the substitute command is in the form:
[2addr]s/regular expression/replacement/flags
The [2addr] means you can specify up to two "addresses", which can be line numbers or regular expressions to match. So the s (substitute) command can take a line, a range, a match, or a span between matches. In our case, we're just using a single match to identify what lines we want to execute the substitution on.
The script above is made up of two sed commands, separated by a semicolon.
/^videoId: / -- Match lines that start with the word videoId:...
s/: .*/: '"$id"'/; -- Substitute all text from the colon to the end of the line with whatever is in the $id environment variable.
/^var vid_seq=/ -- Match lines that ... meh, as above.
s/=.*/='"$id_seq"';/ -- Substitute all text from the equals sign on with $id_seq.
Note that the '"$id"' construct means that we are exiting the single quotes, then immediately entering double quotes for the expansion of the variable ... then exiting the double quotes and going back into a new set of single quotes. Sed scripts are safest inside single quotes because of the frequent use of characters that might be interpreted by a shell.
Note also that because sed's substitute command uses a forward slash as a delimiter, the $id and $id_seq variables may not contain a slash. If they might, you can switch to a different delimiter.
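The format checking suggested earlier could be sketched as below. The helper name update_video and the allowed character set for the id are assumptions; the question only says the id is always 11 characters. This version also keeps the single quotes around the id by writing the sed script in double quotes.

```shell
# update_video ID SEQ: hypothetical helper. Reads text on stdin and rewrites
# the videoId and vid_seq lines, refusing values that could break the script.
update_video() (
  id=$1 id_seq=$2
  # Assumption: ids are exactly 11 characters from [A-Za-z0-9_-].
  case $id in
    ''|*[!A-Za-z0-9_-]*) echo "bad id: $id" >&2; exit 1 ;;
  esac
  [ "${#id}" -eq 11 ] || { echo "bad id length: $id" >&2; exit 1; }
  # id_seq must be a non-empty string of digits.
  case $id_seq in
    ''|*[!0-9]*) echo "bad id_seq: $id_seq" >&2; exit 1 ;;
  esac
  sed -e "/^videoId: /s/: .*/: '$id'/" \
      -e "/^var vid_seq=/s/=.*/=$id_seq;/"
)

printf '%s\n' "videoId: 'S2Rgr6yuuXQ'" 'var vid_seq=1;' | update_video fsafsferii2 80
```

Because both values are validated before sed runs, neither can contain a slash, an ampersand or a backslash, so the double-quoted script is safe to build this way.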

What is wrong with:
sed -i 's/\(videoId: \).*\\1'${id}'/\2' file
1. Missing the third delimiter (/). Valid syntax is s/regex/replace/
2. Incorrect regex pattern (let's assume ${id} has been substituted):
\(videoId: \).*\\1fsafsferii2
is telling it to match a string that looks like this:
videoId: anything\1fsafsferii2
(\\ in regex matches a literal backslash, so \\1 would match a literal backslash followed by 1 instead of the 1st sub-expression)
3. Replace the matched string with \2
But since there is only one set of parentheses, \2 refers to a group that does not exist, and GNU sed rejects it as an invalid reference.
Also, since the regex pattern in 2. doesn't match anything, nothing would be replaced.
This should work (GNU sed)
sed -i 's/\(videoId: \).*/\1\x27'"${id}"'\x27/
s/\(var vid_seq=\).*/\1'"${id_seq}"';/' file
Note:
\x27 is the hexadecimal escape for a single quote (to prevent clashing with the shell's single quotes). Since \1 already ends with a space, no extra space is needed before it.
The ; needs no escaping here: inside the replacement part of the s command it is an ordinary character.
