Replacing a parameter in bash using sed

I'm trying to clean up several dozen redundant Nagios config files, but sed isn't working for me (yes, I'm fairly new to bash). Here's the string I want to replace:
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
with this:
use my-template-service
host_name myhost
Only the host_name line should stay unchanged, since it'll be different for each file. Any help will be greatly appreciated. I tried escaping the ' and !, but I get this error: -bash: !'-p: event not found
Thanks

Disclaimer: This question is somewhat light on info and rings a bit like "write my code for me". In good faith I'm assuming that it's not that, so I am answering in hopes that this can be used to learn more about text processing/regex substitutions in general, and not just to be copy-pasted somewhere and forgotten.
I suggest using Perl instead of sed. While sed is often the right tool for the job, in this case I think Perl is better, for the following reasons:
Perl lets you easily do multi-line matches on a regex. This is possible with sed, but difficult (see this question for more info).
With multiple lines and complex delimiters and quote characters, sed starts to display different behavior depending on what platform you're using it on. For example, trying to do this with sed in "sorta multiline" mode gave me different results on OSX versus Linux (really GNU sed vs BSD sed). When using semi-advanced functionality like that, I'd stick with a tool that behaves consistently across platforms, which Perl does in this case.
Perl lets you deal with ASCII values and other special characters without a ton of "toothpick tower" escaping or subshelling. Since it's convenient to use ASCII values to match the single quotes in your pattern (we could use mixed double and single quotes instead, but that makes it harder to copy/paste this command into, say, a subshell or an eval'd part of a script), it's better to use a tool that supports this without extra hassle. It's possible with sed, but tricky; see this article for more info.
In sed/BRE, doing something as simple as a "one or more" match usually requires escaped interval syntax, e.g. [[:space:]]\{1,\}, which gets tedious. Since it's convenient to use a lot of repetition/grouping characters in this pattern, I prefer Perl for conciseness in this case, since it improves clarity of the matching code.
Perl lets you write comments in regex statements in one-liner mode via the x modifier. For big, multiline patterns like this one, having the pattern broken up and commented for readability really helps if you ever need to go back and change it. sed has comments too, but using them in single-pasteable-command mode (as opposed to a file of sed script code) can be tricky, and can result in less readable commands.
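On the error from the question: -bash: !'-p: event not found comes from interactive bash's history expansion of !, which bash never performs inside single quotes. If you do want to stay with sed, here is a minimal sketch, assuming GNU sed (its \x27 escape stands for a single quote), that keeps the whole script inside one pair of single quotes. drop_check_line is a hypothetical name; it just deletes the check_command line to show the quoting:

```bash
#!/usr/bin/env bash
# Sketch, GNU sed only: \x27 is GNU sed's hex escape for a single quote, so
# the whole script fits inside one pair of shell single quotes. Bash never
# performs history expansion on "!" inside single quotes, which avoids the
# "event not found" error.
drop_check_line() {
  sed '/^check_command[[:space:]]\{1,\}check_http!\x27-p 8080 -N -u \/SymmetricDS\/app\x27$/d' "$@"
}

# Usage: printf turns \047 into a single quote; only "host_name myhost" is printed.
printf 'host_name myhost\ncheck_command check_http!\047-p 8080 -N -u /SymmetricDS/app\047\n' |
  drop_check_line
```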
Anyway, following is the matcher I came up with. It's commented inline as much as I can make it, but the non-commented parts are explained here:
The -0777 switch tells perl to consume input files whole before processing them, rather than operating line-by-line. See perlrun for more info on this and the other flags. Thanks to @glennjackman for pointing this out in the comments on the original question!
The -p switch tells Perl to read its input one record at a time (with -0777 the whole input is a single record), run the supplied program on each record, and then print the record ($_). Since our "program" is just a substitution applied to $_, what gets printed is the substituted text.
The -e switch tells perl to evaluate the next string argument for a program to run, rather than finding a script file or similar.
Input is redirected from mytext.txt, which could be a file containing your pattern. You could also pipe input to Perl, e.g. via cat mytext.txt | perl ..., or pass the file name as an argument, and it would work exactly the same way.
The regex modifiers work as follows: the extended x modifier lets us write comments and makes literal whitespace in the pattern insignificant, for clarity. I also use the multiline m modifier; strictly speaking it only changes what ^ and $ match, and since this pattern matches line breaks explicitly with \n it isn't required here, but it does no harm. You could get rid of comments and literal whitespace and splat it all into one line if you wanted, but good luck making any changes after you've forgotten what it does. See perlre for more info on these modifiers.
This command will replace the literal string you supplied in a file that contains it (the file can have more than just that block before or after it; only that block of text will be manipulated). It is less than literal in two minor ways: it allows any number (one or more) of whitespace characters between the first and second words in each line, and it accepts any host name. If I remember Nagios configs correctly, the number of spaces doesn't particularly matter anyway.
This command will not change the contents of a file it is supplied. If a file does not match the pattern, its contents will be printed out unchanged by this command. If it contains that pattern, the replaced contents will be printed out. You can write those contents to a new file, or do anything you like with them.
perl -0777pe '
# Use the pipe "|" character as an expression delimiter, since
# the pattern contains slashes.
s|
# "use", one or more space-equivalent characters, and then "store-service",
# on one line.
use \s+ store-service \n
# Open a capturing group.
(
# Capture the host name line in its entirety.
host_name \s+ \S+
# Close the group and end the line.
) \n
service_description \s+ HTTP_JVM_SYM_DS \n
# Look for check_command, spaces, and check_http!, but keep matching on the
# same line.
check_command \s+ check_http!
# Look for a single quote character by ASCII value, since shell
# escaping these can be ugly/tricky, and makes your code less copy-
# pasteable in/out of scripts/subcommands.
\047
# Look for the arguments to check_http, delimited by explicit \s
# spaces, since we are in "extended" mode in order to be able to write
# these comments and the expression on multiple lines.
-p \s 8080 \s -N \s -u \s /SymmetricDS/app
# Look for another single quote and the end of the line.
\047 \n
check_interval \s+ 1\n
# Replace all of the matched text with the "use my-template-service" line,
# followed by the contents of the first matching group (the host_name line).
# You could capture the "use" statement in another group, or use e.g.
# sprintf() to align fields here instead of a big literal space line, but
# this is the simplest, most obvious way to get the replacement done.
|use my-template-service\n$1\n|mx
' < mytext.txt

Assuming you can glob the files of interest, I would first filter them down to the files that are exactly five lines long, since those are the ones you want to replace.
You can do that with Bash and awk:
for fn in *; do # make that glob apply to your files...
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk 'END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
# at this point you only have files with 5 lines of text...
done
Once you have done that, you can add another awk to the loop to make the replacements:
for fn in *; do
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk 'END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
awk 'BEGIN{tgt["use"]="my-template-service"
tgt["host_name"]=""}
$1 in tgt { if (tgt[$1]=="") s=$2
else s=tgt[$1]
printf "%-33s%s\n", $1, s
}
' "$fn"
done
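The loop above only prints the new contents. If you want to write them back, one possible sketch (the function name is hypothetical) goes through a temporary file, since POSIX awk has no in-place editing; copying the temp file's contents back with cat, rather than moving it over the original, keeps the original file's permissions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite FILE in place if it has exactly five lines,
# keeping only the "use" line (new template) and the "host_name" line.
retemplate_five_line_file() {
  local fn=$1 tmp
  [[ -f $fn && -s $fn ]] || return 0                 # skip non-files and empty files
  (( $(awk 'END { print NR }' "$fn") == 5 )) || return 0
  tmp=$(mktemp) || return 1
  if awk 'BEGIN { tgt["use"] = "my-template-service"; tgt["host_name"] = "" }
          $1 in tgt { printf "%-33s%s\n", $1, (tgt[$1] == "" ? $2 : tgt[$1]) }' "$fn" > "$tmp"; then
    cat -- "$tmp" > "$fn"    # overwrite contents but keep the file's own permissions
  fi
  rm -f -- "$tmp"
}

# Usage: for fn in *; do retemplate_five_line_file "$fn"; done
```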

Here is a GNU sed solution; check it. Back up your files before testing.
#!/bin/bash
# You should escape with a backslash every special character in this string
# (like $, ^, /, {, }, etc.) that you need interpreted literally, not as regex.
# Your original string contains only slashes from this list, so instead of
# escaping them with backslashes, I changed sed's s/pattern/replace/
# command to s|pattern|replace|. You can pick any other suitable character.
needle="use\s{1,}store-service\n\
host_name\s{1,}myhost\n\
service_description\s{1,}HTTP_JVM_SYM_DS\n\
check_command\s{1,}check_http!'-p 8080 -N -u /SymmetricDS/app'\n\
check_interval\s{1,}1"
replacement="use my-template-service\n\
host_name myhost"
# This echo command displays the generated substitute command,
# which will be used by sed
# uncomment it for viewing
# echo "s|$needle|$replacement|"
# for changing the file in place add the -i option.
sed -r "
/use\s{1,}store-service/ {
N;N;N;N;
s|$needle|$replacement|
}" input.txt
Input
one
two
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
three
four
Output
one
two
use my-template-service
host_name myhost
three
four

Related

How to make sed avoid replacement after specific symbol

I am writing a script for formatting a Fortran source code.
Simple formatting, like having all keywords in capitals or in small letters, etc.
Here is the main command
sed -i -e "/^\!/! s/$small\s/$cap /gI" $filein
It replaces every keyword $small (followed by a space) with the keyword $cap, and the replacement happens only if the line does not start with "!".
It does what it should. Question:
How can I avoid the replacement when "!" is encountered in the middle of a line?
Or, more generally, how can I replace a pattern everywhere, but not after a specific symbol, which can be either at the beginning of the line or somewhere else?
Example:
Program test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has Program and not exclamation mark.
"program" is a keyword. After running the script the result is:
PROGRAM test ! It should not change the next PROGRAM to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
I want:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
So far I haven't found a nice solution that does the trick, ideally with the sed command.
The typical way in sed is to:
split the string into two parts - save one part in hold space.
do operations on pattern space
get hold space and shuffle for output.
It would be something along these lines:
sed '/!/!{s/program/PROGRAM/gI;b;};h;s/[^!]*!//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n/!/'
/!/!{s/program/PROGRAM/gI;b;}; - if the line has no !, do the substitution on the whole line, print it and start over.
h;s/[^!]*!//;x;s/!.*// - put the part after the first ! in hold space, and the part before it in pattern space
s/program/PROGRAM/gI; - do the substitution on that part of the string
G;s/\n/!/ - grab the part from hold space and shuffle output - it's easy here.
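To handle several keywords, the same split-at-the-first-! idea can be wrapped in a function that builds one substitution per keyword. This is a sketch assuming GNU sed (\< and \> word boundaries, \U to uppercase the match, the I flag) and keywords without regex metacharacters; caps_before_bang is a hypothetical name, and it filters standard input:

```bash
#!/usr/bin/env bash
# Hypothetical helper (GNU sed only): uppercase each keyword given as an
# argument, but only in the part of a line before the first "!".
# Reads standard input, writes standard output.
caps_before_bang() {
  local w subst=''
  for w in "$@"; do
    subst+='s/\<'"$w"'\>/\U&/gI;'      # one case-insensitive, whole-word substitution
  done
  # No "!" on the line: convert the whole line and stop (b).
  # Otherwise: save the line (h), keep the text after the first "!" in hold
  # space (x), convert the text before it, then glue the two parts back (G).
  sed '/!/!{'"$subst"'b;};h;s/[^!]*!//;x;s/!.*//;'"$subst"'G;s/\n/!/'
}

# Usage: caps_before_bang program end < source.f90
```

Only the sed script is built from pieces; the single-quoted parts keep the ! characters away from interactive history expansion.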
Assumptions:
OP needs to convert multiple keywords to uppercase
keywords to be capitalized do not include white space (eg, program name will need to be processed as two separate strings program and name)
input delimiter is white space
keywords with 'attached' non-alphanums will be ignored (eg, Program, will be ignored since , will be picked up as part of string) unless OP specifically includes the non-alphanum as part of the keyword definition (eg, keywords includes Program,)
all keywords to be converted to uppercase (ie, not going to worry about any flags to switch between lowercase, uppercase, camelcase, etc)
Sample input data:
$ cat source.txt
Program test ! It should not change the next program to caps # change first 'Program'
! Hi, is anything changing here? like program or MarK? # change nothing
This line does not have any key words ! except here - pRoGraM Mark # change nothing
This line has Program and not exclamation mARk plus MarKer. # change 'Program' and 'mARk' but not MarKer
Hi, hi, hI # change 'Hi,' and 'hi,' but not 'hI'
List of keywords provided in a separate file (whitespace delimited);
$ cat keywords.dat
program
mark hi, # 2 separate keywords: 'mark' and 'hi,' (comma included)
One awk idea:
awk -v comment="!" ' # define character after which conversions are to be ignored
FNR==NR { for ( i=1; i<=NF; i++) # first file contains keywords; process each field as a separate keyword
keywords[toupper($i)] # convert to uppercase and use as index in associative array keywords[]
next
}
{ for ( i=1; i<=NF; i++ ) # second file, process each field separately
{ if ( $i == comment ) # if field is our comment character then stop processing rest of line else ...
break
if ( toupper($i) in keywords ) # if current field is a keyword then convert to uppercase
$i=toupper($i)
}
print # print the current line
}
' keywords.dat source.txt
This generates:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program or MarK?
This line does not have any key words ! except here - pRoGraM Mark
This line has PROGRAM and not exclamation MARK plus MarKer.
HI, HI, hI
NOTES:
While GNU awk can be told to overwrite the input file (e.g., awk -i inplace, the equivalent of sed -i), this will require a different approach for processing the keywords.dat file (to keep it from being overwritten with nothing)
Quite a bit of additional logic could be added to support uppercase vs lowercase vs camelcase vs whatever ... ignore or include non-alphanums in comparisons ... using multiple/different 'comment' characters ... standardizing other portions of (Fortran) code (e.g., indentation) ... etc.
This might work for you (GNU sed):
small='Program ' caps='PROGRAM '
sed -E ':a;s/^([^!]*)('"$small"')/\1\n/;ta;s/\n/'"$caps"'/g' file
Replace any occurrence of the variable $small before the symbol ! with a newline, then replace all newlines by the variable $caps.
N.B. The newline is chosen because it cannot normally exist in any line presented by sed, as it is the delimiter sed uses to present lines in the pattern space. Secondly, the words matching $small are iteratively replaced by a newline, and only then are all newlines globally replaced by $caps. This allows the replacement to be a superset of the first. If this were not the order of operations, the iterative process could become an endless loop.
If $small is to represent a case insensitive match, add the i flag to the first substitution.
I've tried the suggested options, but none of them worked as expected for the whole file.
I have ended up with multiple sed commands; I am sure that it is not the best solution, but it works for me and does what I need.
My main problem was to avoid replacement after "!" if it appears somewhere in the middle of the line.
So I switched this problem to the one I could handle.
sed -i -e "/^\!/! s/!/!c7u!!c7u!/" $filein # 1. If a line does NOT start with !, search next "!" and replace it with "!c7u!!c7u!"
sed -i "s/!c7u!/\n/" $filein # 2. Move that comment to a new line
for ((i=0; i<$nwords; i++ )); do # Loop through all keywords
word=${words[$i]} # Take a keyword from the list
small=${word,,} # Write it in small letters
cap=${word^^} # Write it in capitals
sed -i -e "/^\!/! s/$small\b/$cap/gI" $filein # 3. Actual replacement in lines not starting with "!"
done
sed -i -e :a -e '$!N;s/\n!c7u//;ta' -e 'P;D' $filein # 4. Undo step 1-2, moving inline comments back

bash script on specific URL string manipulation

I need to manipulate a string (a URL) whose length I don't know.
The string is something like
https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring
I basically need a regular expression which returns this:
https://x.xx.xxx.xxx/keyword/restofstring
where x.xx.xxx.xxx is the current IP, which can vary every time, and I don't know the number of dontcares.
I actually have no idea how to do it; I've spent 2 hours on the problem but didn't find a solution.
Thanks!
You can use sed as follows:
sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2='
s stands for substitute and has the form s=search pattern=replacement pattern=.
The search pattern is a regex in which we grouped (...) the parts you want to extract.
The replacement pattern accesses these groups with \1 and \2.
You can feed a file or stdin to sed and it will process the input line by line.
If you have a string variable and use bash, zsh, or something similar you also can feed that variable directly into stdin using <<<.
Example usage for bash:
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
output="$(sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2=' <<< "$input")"
echo "$output" # prints https://x.xx.xxx.xxx/keyword/restofstring
echo "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring" | sed "s/dontcare[0-9]\+\///g"
sed is used to manipulate text. dontcare[0-9]\+\///g is an escaped form of the regular expression dontcare[0-9]+/, which matches the word "dontcare" followed by 1 or more digits, followed by the / character.
sed's substitute command works like this: s/find/replace/g, where g is a flag that makes sed replace every match on the line instead of only the first one.
Note that this assumes each dontcare component really ends in digits (dontcare1, dontcare2, ...) and that there are no such components in the rest of the string. If either assumption doesn't hold, Socowi's answer works better.
You could also use read with a / value for $IFS to parse out the trash.
$: IFS=/ read proto trash url trash trash trash keyword rest <<< "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring"
$: echo "$proto//$url/$keyword/$rest"
https://x.xx.xxx.xxx/keyword/restofstring
This works even when the dontcare... values aren't known, predictable strings, although it assumes there are exactly three of them.
This one is pure bash, though I like Socowi's answer better.
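If the number of dontcare components really is unknown, a pure-bash sketch can cut at the keyword instead of counting fields. It assumes the URL contains :// and that the keyword is a whole path component followed by more path; strip_before_keyword and its keyword argument are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep "scheme://host" plus everything from "/KEYWORD/"
# onwards, however many path components sit in between. Pure bash.
strip_before_keyword() {
  local url=$1 keyword=$2 scheme rest host
  scheme=${url%%://*}                 # e.g. "https"
  rest=${url#*://}                    # host and path, without the scheme
  host=${rest%%/*}                    # e.g. "x.xx.xxx.xxx"
  if [[ $rest == */"$keyword"/* ]]; then
    # ${rest#*/"$keyword"/} drops everything up to the first "/keyword/"
    printf '%s://%s/%s/%s\n' "$scheme" "$host" "$keyword" "${rest#*/"$keyword"/}"
  else
    printf '%s\n' "$url"              # keyword not found: leave the URL alone
  fi
}

strip_before_keyword 'https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring' keyword
# prints https://x.xx.xxx.xxx/keyword/restofstring
```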
Here's a sed variation which picks out the host part and the last two components from the path.
url='http://example.com:1234/ick/poo/bar/quux/fnord'
newurl=$(echo "$url" | sed 's%\(https*://[^/?]*[^?/]\)[^ <>'"'"'"]*\(/[^/ <>'"'"'"]*/[^/ <>'"'"'"]*\)%\1\2%')
The general form is sed 's%pattern%replacement%', where the pattern matches through the end of the host name part (captured into the first set of backslashed parentheses), then skips ahead to the penultimate slash, then captures the remainder of the URL (the last two path components, with their slashes) into the second set; the replacement simply recalls the two captured groups without the skipped part between them.

Extract parts of file path and concatenate in bash

I am new to bash scripting, and my directory structure looks like this:
"/ABC/DEF/GHI/JKL/2015/01/01"
I am trying to produce output like this: "JKL_2015-01-01".
I am trying with sed and cut, but it might take a while, and this is needed immediately, so any help is appreciated. Thanks.
i=/ABC/DEF/GHI/JKL/2015/01/01
o=`echo $i | sed -r 's|^.+/([^/]+)/([0-9]+)/([0-9]+)/([0-9]+)$|\1_\2-\3-\4|'`
i=xxx is a variable assignment; no whitespace is allowed around the =!
`command`, i.e. a command enclosed in backticks, is a command substitution, which captures the standard output of the command inside as a string.
And sed is the stream editor, applying mostly regex based operations to each line from standard input, and emitting the result on standard output.
sed's s||| operation is regex-based substitution. I capture 4 character groups with parens (): (non-slashes), slash, (numbers), slash, (numbers), slash, (numbers), end-of-string $. Then in the second part of the substitution I print the four captured groups, separated by an underscore and two dashes, respectively.
There's no need to use tools that aren't built into bash for this -- using builtins is far more efficient than external tools like sed.
s="/ABC/DEF/GHI/JKL/2015/01/01"
s_re='/([^/]+)/([^/]+)/([^/]+)/([^/]+)$'
if [[ $s =~ $s_re ]]; then
name="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}-${BASH_REMATCH[3]}-${BASH_REMATCH[4]}"
echo "$name"
fi
Alternately, and perhaps more readably (using string manipulation techniques documented in BashFAQ #100):
s="/ABC/DEF/GHI/JKL/2015/01/01"
s_prefix=${s%/*/*/*/*} # find the content we don't care about
s_suffix=${s#"$s_prefix"/} # strip that content
# read the rest into named variables
IFS=/ read -r category year month day <<<"$s_suffix"
# assemble those named variables into the string we care about
echo "${category}_${year}-${month}-${day}"
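Another builtin-only option, assuming bash 4.3 or newer for negative array indices: split on / into an array and take the last four elements, so the number of leading directories doesn't matter (path_to_name is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn ".../NAME/YYYY/MM/DD" into "NAME_YYYY-MM-DD".
# Negative array indices count from the end (bash 4.3+).
path_to_name() {
  local -a parts
  IFS=/ read -r -a parts <<< "$1"
  printf '%s_%s-%s-%s\n' "${parts[-4]}" "${parts[-3]}" "${parts[-2]}" "${parts[-1]}"
}

path_to_name "/ABC/DEF/GHI/JKL/2015/01/01"   # JKL_2015-01-01
```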

Delete all comments in a file using sed

How would you delete all comments (starting with #) from a file using sed, while respecting a '#' that appears inside a string?
This helped out a lot except for the string portion.
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
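If you'd rather not switch languages, awk can do the character-by-character logic that a single regex can't. This is a heuristic sketch for bash-style comments, not a real shell parser: it tracks single and double quotes (even across lines), treats # as a comment only when it starts a word (so $# and ${#x} survive), keeps a shebang on the first line, and drops lines that were nothing but a comment. Here-documents and a # right after ; are not handled, and strip_comments is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print FILE(s) with bash-style comments removed.
# Heuristic only: tracks quotes with a tiny awk state machine.
strip_comments() {
  awk -v sq="'" '
    BEGIN { q = "" }                    # open quote character; kept across lines
    FNR == 1 && /^#!/ { print; next }   # keep a shebang on line 1
    {
      out = ""; found = 0; n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        prev = (i > 1) ? substr($0, i - 1, 1) : " "
        if (q == "") {
          # unquoted # at the start of a word: the rest is a comment
          if (c == "#" && (prev == " " || prev == "\t")) { found = 1; break }
          # backslash escapes the next character
          if (c == "\\") { out = out c substr($0, i + 1, 1); i++; continue }
          if (c == "\"" || c == sq) q = c
        } else if (q == "\"") {
          if (c == "\\") { out = out c substr($0, i + 1, 1); i++; continue }
          if (c == "\"") q = ""
        } else if (c == sq) {
          q = ""                        # single quotes end at the next single quote
        }
        out = out c
      }
      if (found) {
        sub(/[ \t]+$/, "", out)         # drop the blanks left before the comment
        if (out == "") next             # the whole line was a comment
      }
      print out
    }' "$@"
}

# Usage: strip_comments script.sh
```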
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because the question is tagged bash.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading whitespace characters (space, tab, or a few others; see man isspace) followed by a #, then deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent confusion with quoted strings; however, it also means that comments containing ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /^[[:space:]]*#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines (a lone #, possibly indented); anchoring it keeps it from deleting code lines that merely end in #, such as echo $#. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /^[[:space:]]*#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf '\nBEFORE removal:\n\n%s\n\n' "${S0}"
S1=$(sed '/^[[:space:]]*#[^!]/d; /^[[:space:]]*#$/d' <<< "${S0}")
printf '\nAFTER removal:\n\n%s\n\n' "${S1}"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because it needs to be quoted, and both single and double quotes are included in the string, we need one additional complication. Recall that the shell allows you to glue strings together: "foo"'bar' gets replaced with foobar -- foo in double quotes, and bar in single quotes. Thus you can include single quotes by putting them in double quotes adjacent to your single-quoted string -- '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted with 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
sed 's:^#.*$::' filename
Supposing comments only ever appear as lines starting with a single #, the above command removes all comments from the file (the emptied lines stay in place).

How do you escape a user-provided search term that you don't want evaluated for sed?

I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).
I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.
Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.
#!/bin/bash
# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P@$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!@#$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches
# Now for each line in the file, do some search and replace
while read line
do
echo "------===[ BEGIN $line ]===------"
# Escape every character in $line (e.g., ab/c becomes \a\b\/\c). I got
# this solution from the accepted answer in the linked SO question.
ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')
# Search for the line we read from the file and replace it with
# the text "replaced"
sed 's/'"$ES"'/replaced/' < my_searches # Does not work
# Search for the text "Jane" and replace it with the line we read.
sed 's/Jane/'"$ES"'/' < my_searches # Works
# Search for the line we read and replace it with itself.
sed 's/'"$ES"'/'"$ES"'/' < my_searches # Does not work
echo "------===[ END ]===------"
echo
done < my_searches
When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------
So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?
I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.
This is a relatively famous problem: given a string, produce a pattern that matches only that string. It is easier in some languages than others, and sed is one of the annoying ones. My advice would be to avoid sed and to write a custom program in some other language.
You could write a custom C program, using the standard library function strstr. If this is not fast enough, you could use any of the Boyer-Moore string matchers you can find with Google; they will make search extremely fast (sublinear time).
You could write this easily enough in Lua:
local function quote(s) return (s:gsub('%W', '%%%1')) end
local function replace(first, second, s)
return (s:gsub(quote(first), second))
end
for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end
If it's not fast enough, speed things up by applying quote to arg[1] only once and inlining the function replace.
As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it escapes out non-special characters. What you really want to do is perhaps something like:
awk 'gsub(/[^[:alpha:]]/, "\\\\&")'
This will escape out non-alpha characters. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to
\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.
It's quite odd because this works perfectly fine
$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced
This: echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong! Do an echo $ES after that and $ES turns out to be \/\u\s\r\/\i\n\c\l\u\d\e. Then when you pass it to the next sed (below),
sed 's/'"$ES"'/replaced/' my_searches
it will not work, because there is no line that has the pattern \/\u\s\r\/\i\n\c\l\u\d\e. The correct way is something like:
$ sed 's|\([@$#^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\@\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\@\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|
You put all the characters you want escaped inside [], and choose a suitable delimiter for sed that is not in your character class, e.g. I chose "|". Then use the "g" (global) flag.
Also, tell us what you are actually trying to do, i.e. the actual problem you are trying to solve.
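For the question as asked (what to put in the ES= line), here is a sketch that assumes GNU sed, a single-line search term, and / as the s delimiter. In a basic regular expression only [ \ . * ^ $ are special, plus the delimiter, so escaping exactly those leaves everything else literal; escaping every character is what breaks, because sequences such as \{ or \+ are operators in GNU sed's BRE. In the replacement only \, & and the delimiter matter. escape_bre and escape_replacement are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers, assuming GNU sed, one-line input and "/" as delimiter.

# Search side: backslash-escape only the BRE metacharacters [ \ . * ^ $
# and the "/" delimiter. The helper itself uses "|" as its own delimiter.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[[\/.*^$]|\\&|g'
}

# Replacement side: only \, & and the delimiter are special.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

# Usage: replace the literal text in $search with the literal text in $replace.
search='"I didn'\''t", said Jane O'\''Brien.'
replace='P@$$W0RD$?'
printf '%s\n' "$search" |
  sed "s/$(escape_bre "$search")/$(escape_replacement "$replace")/"
```

The results of $(...) are not re-parsed by the shell, so backticks and dollar signs in the term stay inert.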
This seems to work for FreeBSD sed:
# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches
The -E option of FreeBSD sed is used to turn on extended regular expressions.
The same is available for GNU sed via the -r or --regexp-extended options respectively.
For the differences between basic and extended regular expressions see, for example:
http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
Maybe you can use FreeBSD-compatible minised instead of GNU sed?
# example using FreeBSD-compatible minised,
# http://www.exactcode.de/site/open_source/minised/
# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'
# escapes in regular expression
ES="$(printf "%q" "${line}")" # escape some punctuation characters
ES="${ES//./\\.}" # . -> \.
ES="${ES//\\\\(/(}" # \( -> (
ES="${ES//\\\\)/)}" # \) -> )
# escapes in replacement string
lineEscaped="${line//&/\&}" # & -> \&
minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"
To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:
backSlash='\\'
ES="${ES//${backSlash}(/(}" # \( -> (
ES="${ES//${backSlash})/)}" # \) -> )
(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)
... or to complete the backslash confusion ...
backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}" # double backslashes
lineEscaped="${lineEscaped//&/\&}" # & -> \&
If you have bash, and you're just doing a pattern replacement, just do it natively in bash. The ${parameter/pattern/string} expansion in Bash will work very well for you, since you can just use a variable in place of the "pattern" and replacement "string", and the variable's contents will be safe from further expansion, provided you double-quote the variable in the pattern position (e.g. ${line//"$search"/"$replace"}); otherwise bash treats characters like * and [ in it as glob syntax. And it's that expansion which makes piping to sed such a hassle. :)
It'll be faster than forking a child process and piping to sed anyway. You already know how to do the whole while read line thing, so creatively applying the capabilities in Bash's existing parameter expansion documentation can help you reproduce pretty much anything you can do with sed. Check out the bash man page to start...
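A minimal sketch of that idea, a line-by-line literal replace with no sed at all (replace_literal is a hypothetical name). Quoting "$search" inside the expansion makes bash match it as a plain string rather than a glob, and quoting "$replacement" keeps & literal in bash 5.2 and later, where the patsub_replacement option would otherwise substitute the matched text for it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every literal occurrence of SEARCH with
# REPLACEMENT on each line of standard input, using only bash builtins.
replace_literal() {
  local search=$1 replacement=$2 line result
  while IFS= read -r line || [[ -n $line ]]; do
    result=${line//"$search"/"$replacement"}   # quoted: no glob or & magic
    printf '%s\n' "$result"
  done
}

# Usage: prints zX&Yz X&Y
printf '%s\n' 'za*b[c]z a*b[c]' | replace_literal 'a*b[c]' 'X&Y'
```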
