Sed syntax difference MacOS vs GNU - bash

I have the following piece of code that works perfectly on MacOS, but doesn't work on GNU (I'm using MinGW):
filename=$1
iter_count=$((${#src_op_lst[@]} -1))
for i in $(eval echo "{0..$iter_count}");do
echo "Checking File :" $filename
sed -i "" "s/[[:<:]]"${src_op_lst[$i]}"[[:>:]]/"${tgt_op_lst[$i]}"/g" $filename
done
I guess the reason is that the sed syntax differs, but I have no idea how to make it work on GNU. I've tried sed -i -e ..., but sed gives me the following error: -e expression #1, char 50: Invalid character class name.
I'm very new to bash scripting, excuse me if this question is stupid. Any help would be greatly appreciated!

[[:<:]] and [[:>:]] are tokens that MacOS's sed interprets as directional word boundaries: the first checks that you're matching at the start of a word, the second at the end of a word.
They let you make sure you're replacing whole words rather than parts of words (e.g. so that you don't change professor into confessor when you're trying to replace pro with con).
The exact equivalent in GNU sed would be \< and \>.
However, as someone familiar with regex but not necessarily with the bazillion different implementations, I would suggest you use the more common \b instead of both \< and \>.
It's a direction-less word boundary which will match if you're either at the start or the end of a word. It will produce the exact same result in your case (and in most cases) and future maintainers will be more likely to be familiar with it and not go through the struggle you just experienced.
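As a minimal sketch of the GNU-side equivalents, assuming GNU sed (both \< \> and \b are GNU regex extensions) and a sample sentence made up for illustration:

```shell
# Sample input, invented for this illustration.
text='the pro asked the professor'

# GNU sed: \< and \> anchor the match to the start and end of a word.
directional=$(printf '%s\n' "$text" | sed 's/\<pro\>/con/g')

# GNU sed: \b matches at either edge of a word.
generic=$(printf '%s\n' "$text" | sed 's/\bpro\b/con/g')

printf '%s\n' "$directional" "$generic"
```

In both cases only the standalone word pro is replaced; professor is left alone.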

Related

sed substitution: substitute string is a variable needing expansion AND contains slashes

I am fighting with sed to do a substitution where the substitute string contains slashes. This general topic has been discussed on Stack Overflow before. But, AFAICT, I have a new wrinkle that hasn't been addressed in previous questions.
Let's say I have a file, ENVIRO.tmpl, which has several lines, one of which is
Loaded modules: SUPPLY_MODULES_HERE
I want to replace SUPPLY_MODULES_HERE in an automated fashion with a list of loaded modules. (At this point, if anyone has a better way to do this than sed, please let me know!) My first effort here is to define an environment variable and use sed to put it into the file:
> modules=$(module list 2>&1)
> sed "s/SUPPLY_MODULES_HERE/${modules}/" ENVIRO.tmpl > ENVIRO.txt
(The 2>&1 being needed because module list sends its output to STDERR, for reasons I can't begin to understand.) However, as is often the case, the modules have slashes in them. For example
> echo ${modules}
gcc/9.2.0 mpt/2.20
The slashes kill my command because sed can't understand the expression and thinks my substitution command is "unterminated".
So I do the usual thing and use some other character for the command delimiter:
> modules=$(module list 2>&1)
> sed "s|SUPPLY_MODULES_HERE|${modules}|" ENVIRO.tmpl > ENVIRO.txt
and I still get an "unterminated 's'" error.
So I replace double quotes with single quotes:
> sed 's|SUPPLY_MODULES_HERE|${modules}|' ENVIRO.tmpl > ENVIRO.txt
and now I get no error, but the line in ENVIRO.txt looks like
Loaded modules: ${modules}
Not what I was hoping for.
So, AFAICT, I need double quotes to expand the variable, but I need single quotes to make the alternative delimiters work. But I need both at the same time. How do I get this?
UPDATE: Gordon Davisson's comment below got to the root of the matter: "echo ${modules} can be highly misleading". Examining $modules with declare -p shows that it actually has a newline (or, more generally, some kind of line break) in it. What I did was add an extra step to extract newlines out of the variable. With that change, everything worked fine. An alternative would be to convince sed to expand the variable with line breaks and substitute it as such into the text, but I haven't been able to make that work. Any takers?
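A minimal sketch of the diagnosis and the newline-stripping step described in the update. The module names and the template line are stand-ins taken from the examples above; the real values would come from module list and ENVIRO.tmpl:

```shell
# Hypothetical stand-in for $(module list 2>&1): two modules on separate lines.
modules=$(printf 'gcc/9.2.0\nmpt/2.20\n')

echo ${modules}        # unquoted: word splitting hides the newline
declare -p modules     # shows the exact contents, newline included

# Turn newlines into spaces so the replacement fits on one line.
flat=${modules//$'\n'/ }

# Hypothetical template line, piped in instead of read from ENVIRO.tmpl.
result=$(printf 'Loaded modules: SUPPLY_MODULES_HERE\n' |
    sed "s|SUPPLY_MODULES_HERE|${flat}|")
printf '%s\n' "$result"
```

The raw newline inside the s command is what sed reports as "unterminated", which is why changing the delimiter alone did not help.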
sed is not the best tool here due to use of regex and delimiters.
Better to use awk command that doesn't require any regular expression.
awk -v kw='SUPPLY_MODULES_HERE' -v repl="$(module list 2>&1)" '
n = index($0, kw) {
$0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
} 1
' file
index function uses plain string search in awk.
substr function is used to get substring before and after the search keyword.
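A self-contained run of the same awk program, as a sketch: the replacement text and the one-line template are hypothetical stand-ins for the module list output and the template file.

```shell
# Hypothetical replacement text; in the answer it comes from $(module list 2>&1).
repl='gcc/9.2.0 mpt/2.20'

# Hypothetical one-line template, piped in instead of read from a file.
result=$(printf 'Loaded modules: SUPPLY_MODULES_HERE\n' |
    awk -v kw='SUPPLY_MODULES_HERE' -v repl="$repl" '
        n = index($0, kw) {
            $0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
        } 1')
printf '%s\n' "$result"
```

One caveat: awk processes backslash escape sequences in -v values, so a replacement containing backslashes would need extra care.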

Bash command stopped working after adding inline comment

The assignment of a new value generated in a subshell does work without the trailing comment:
newname=$(echo "$newname" | sed 's#TD.'"$oldnewTD"'#TD.r'"$ftd1"'#')
But the variable newname stays unchanged if a trailing comment is added:
newname=$(echo "$newname" | sed 's#TD.'"$oldnewTD"'#TD.r'"$ftd1"'#')# let us not change NonEqRead to NonEq
Why?
Bash version 5.0.3.
It turned out the space BEFORE the hash is extremely important in bash, something not frequently mentioned because it seems too obvious: omitting it impairs readability anyway. With syntax highlighting, however, it is easy to leave out that whitespace without noticing (an editor-dependent problem, of course; e.g., vim is affected). I spent a good while trying to figure out where the error was.
newname=$(echo "$newname" | sed 's#TD.'"$oldnewTD"'#TD.r'"$ftd1"'#') # let us not change NonEqRead to NonEq
Without the whitespace, the assignment silently fails (i.e., without any error message). As far as I understand, bash starts a comment only where # begins a word; glued to the closing parenthesis, the # becomes part of the assignment word, and the words after it are parsed as a command instead of being ignored. In any case, this is connected with how bash reads scripts word by word.
See the explanation for a related case: https://stackoverflow.com/a/60238806/2010413
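A small sketch of the effect, using made-up variable names and a shortened comment. It relies on two documented bash rules: a # starts a comment only at the beginning of a word, and an assignment placed in front of a command applies only to that command.

```shell
name=old

# '#' glued to ')' does not start a comment: the line below is parsed as the
# temporary assignment name='new#' followed by the command "let us see".
# The trailing "|| true" only absorbs that command's non-zero status.
name=$(printf '%s' new)# let us see || true
glued=$name            # still "old"

# With whitespace before '#', the rest of the line is a comment.
name=$(printf '%s' new) # let us see
spaced=$name           # now "new"

printf '%s\n' "$glued" "$spaced"
```

In the original line the first word of the would-be comment happens to be let, a bash builtin that evaluates the remaining words as arithmetic and prints nothing, which would explain why no error appeared.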

How to use sed regex pattern matching

I'm learning bash and I'm trying to parse a webpage (https://chromium-i18n.appspot.com/ssl-address) and extract the href of interest using sed. The pattern I'm using is:
/<a\shref=\'\/ssl-address\/data\/([^\"]*)\'>/siU
However, I can't get the expression to work with sed. When I run:
data=$(wget ${serviceUrl} -q -O -)
parsedData=$(sed '/<a\shref=\'\''\/ssl-address\/data\/([^\"]*)\'\''>/siU/' <<< ${data})
echo ${parsedData}
I get the following error:
sed: 1: "/<a\shref=\'\/ssl-addre ...": unterminated substitute pattern
What am I doing wrong?
Is this what you're trying to do?
$ wget 'https://chromium-i18n.appspot.com/ssl-address' -q -O - |
sed -n 's:.*/ssl-address/data/\([^'\'']*\).*:\1:p'
AC
AD
AD/Canillo
AD/Encamp
I see you're getting some answers using double quotes instead of single around your sed script so you can do "...'..." instead of '...'\''...' - though tempting and it'd function OK for this particular current example, don't do it. To avoid any surprises now or if/when your requirements change later, in all shell programming always enclose strings and scripts in single quotes unless you need to expose them to the shell for interpretation and then use double quotes unless you need the shell to do globbing and file name expansion on them and then use no quotes.
All right, you are trying to parse an entire webpage.
This situation requires deleting all the lines you don't need.
As @Ed Morton said, you can use something other than sed.
Your webpage is the one you gave us in a comment, so you first need to download it.
Note that changing how you download the page source can change some details (e.g. copy-pasting it from the Firefox console gives you href=", whereas wget gives you href=').
That said, let's use wget as you are currently doing in your question.
# This will create the ssl-address file
wget "https://chromium-i18n.appspot.com/ssl-address"
# This will give you a list of all of the links in a href.
sed -e "/<a href='.*/! d" -e "s/<a href='\/ssl-address\/data\/\(.*\)'.*/\1/" ssl-address
EDIT:
Reading your comments, I saw you would like to filter some of the output (e.g. deleting all the examples links).
This can be done by adding another sed expression that deletes the lines you don't need.
In your case you just need to add -e "/<a href='\/ssl-address\/examples.*/d", so the whole line of code should be as follows:
sed -e "/<a href='.*/! d" -e "/<a href='\/ssl-address\/examples.*/d" -e "s/<a href='\/ssl-address\/data\/\(.*\)'.*/\1/" ssl-address
You probably want something like this, based on that input data:
sed -e "s/.*href='\([^']*\)'.*/\1/"
It says, "match anything .* followed by the literal characters href=' followed by anything other than the ' character [^']* (we capture using the \( ... \) notation) followed by the ' character followed by anything".
Note I used the " to enclose the sed expression, to avoid you having to quote the '.
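To try the extraction without fetching the page, here is a sketch that runs the first answer's sed command on a made-up excerpt shaped like the wget output (single-quoted href values are an assumption based on the answers above):

```shell
# Made-up excerpt shaped like the page fetched with wget (single-quoted hrefs).
html="<html><body>
<a href='/ssl-address/data/AC'>AC</a>
<a href='/ssl-address/data/AD/Canillo'>AD/Canillo</a>
<a href='/ssl-address/examples'>Examples</a>
</body></html>"

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so lines without a data link are dropped.
links=$(printf '%s\n' "$html" |
    sed -n 's:.*/ssl-address/data/\([^'\'']*\).*:\1:p')
printf '%s\n' "$links"
```

Because only lines containing /ssl-address/data/ are printed, the examples link is filtered out without a separate delete expression.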

Extracting snmpdump values (with an exact MIB) from a shell script

I have an SNMP dump:
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.4.0|6|""
1.3.6.1.2.1.1.5.0|6|"hgfdhg-4365.gfhfg.dfg.com"
1.3.6.1.2.1.1.6.0|6|""
1.3.6.1.2.1.1.7.0|2|6
1.3.6.1.2.1.1.8.0|7|0
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129
1.3.6.1.2.1.1.9.1.2.2|5|1.3.6.1.4.1.9.7.115
And I need to grep all the data on the first line after 1.3.6.1.2.1.1.2.0|5|, without including that prefix in the output. So the result should be 1.3.6.1.4.1.9.1.1178. I've tried this regex:
\b1.3.6.1.2.1.1.2.0\|5\|\s*([^\n\r]*)
But without any success. If a regular expression, or grep, is in fact the right tool, can you help me find the right regex? Otherwise, what tools should I consider instead?
With GNU grep built with PCRE support, you can use Perl's \K escape to discard the part of the match that precedes it:
grep -Po "1\.3\.6\.1\.2\.1\.1\.2\.0\|5\|\K.*"
-P enables Perl's regex mode and -o switches output to matched parts rather than whole lines.
I had to escape the characters that have special meaning in Perl regexes, but this can be avoided, as 123 suggests, by enclosing the characters to be interpreted literally between \Q and \E:
grep -Po "\Q1.3.6.1.2.1.1.2.0|5|\E\K.*"
I would usually solve this with sed as follows :
sed -n 's/1\.3\.6\.1\.2\.1\.1\.2\.0|5|\(.*\)/\1/p'
The -n flag disables implicit output and the search and replace command will remove the searched prefix from the line, leaving the relevant part to be printed.
The characters that have special meaning in GNU Basic Regular Expressions (BRE) must be escaped, which in this case is only the . character. Also note that the grouping tokens are \( and \) rather than the usual ( and ).
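A sketch of the sed variant run against a few lines from the dump. The leading ^ anchor is an addition not present in the command above; it is there so the OID can only match at the start of a line.

```shell
# Three lines copied from the dump in the question.
dump='1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129'

# The leading ^ is an addition: it keeps the OID from matching mid-line.
value=$(printf '%s\n' "$dump" |
    sed -n 's/^1\.3\.6\.1\.2\.1\.1\.2\.0|5|\(.*\)/\1/p')
printf '%s\n' "$value"
```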
An alternate way to do this is in native shell, without any regexes at all. Consider:
prefix='1.3.6.1.2.1.1.2.0|5|'
while read -r line; do
[[ $line = "$prefix"* ]] && printf '%s\n' "${line#$prefix}"
done
If your original string is piped into the while read loop, the output is precisely 1.3.6.1.4.1.9.1.1178.
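For completeness, a sketch of the loop with input piped in. The function name extract is hypothetical, and the prefix is quoted inside the expansion so that it is removed as literal text rather than as a pattern:

```shell
prefix='1.3.6.1.2.1.1.2.0|5|'

# Hypothetical wrapper around the loop above.
extract() {
    while read -r line; do
        [[ $line = "$prefix"* ]] && printf '%s\n' "${line#"$prefix"}"
    done
    return 0   # do not leak the status of the last non-matching test
}

# Two lines copied from the dump in the question.
value=$(printf '%s\n' \
    '1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178' \
    '1.3.6.1.2.1.1.3.0|7|1881685367' | extract)
printf '%s\n' "$value"
```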

How do you escape a user-provided search term that you don't want evaluated for sed?

I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).
I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.
Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.
#!/bin/bash
# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P@$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!@#$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches
# Now for each line in the file, do some search and replace
while read line
do
echo "------===[ BEGIN $line ]===------"
# Escape every character in $line (e.g., ab/c becomes \a\b\/\c). I got
# this solution from the accepted answer in the linked SO question.
ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')
# Search for the line we read from the file and replace it with
# the text "replaced"
sed 's/'"$ES"'/replaced/' < my_searches # Does not work
# Search for the text "Jane" and replace it with the line we read.
sed 's/Jane/'"$ES"'/' < my_searches # Works
# Search for the line we read and replace it with itself.
sed 's/'"$ES"'/'"$ES"'/' < my_searches # Does not work
echo "------===[ END ]===------"
echo
done < my_searches
When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------
So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?
I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.
This is a relatively famous problem—given a string, produce a pattern that matches only that string. It is easier in some languages than others, and sed is one of the annoying ones. My advice would be to avoid sed and to write a custom program in some other language.
You could write a custom C program, using the standard library function strstr. If this is not fast enough, you could use any of the Boyer-Moore string matchers you can find with Google—they will make search extremely fast (sublinear time).
You could write this easily enough in Lua:
local function quote(s) return (s:gsub('%W', '%%%1')) end
local function replace(first, second, s)
return (s:gsub(quote(first), second))
end
for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end
If this is not fast enough, speed things up by applying quote to arg[1] only once and inlining the function replace.
As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it escapes out non-special characters. What you really want to do is perhaps something like:
awk 'gsub(/[^[:alpha:]]/, "\\\\&")'
This will escape out non-alpha characters. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to
\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.
It's quite odd because this works perfectly fine
$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced
This: echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong! Do an echo $ES after that and $ES appears to be \/\u\s\r\/\i\n\c\l\u\d\e. Then when you pass it to the next sed (below)
sed 's/'"$ES"'/replaced/' my_searches
it will not work because there is no line that has the pattern \/\u\s\r\/\i\n\c\l\u\d\e. The correct way is something like:
$ sed 's|\([@$#^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\@\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\@\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|
You put all the characters you want escaped inside [], and choose a suitable delimiter for sed that is not in your character class, e.g. I chose "|". Then use the "g" (global) flag.
Tell us what you are actually trying to do, i.e. the actual problem you are trying to solve.
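One way to apply that idea to the search side is to escape only the characters that a basic regular expression treats as special, plus the delimiter. A minimal sketch, assuming sed's default basic-regex mode, / as the delimiter of the final command, and single-line search terms; escape_bre is a hypothetical helper name:

```shell
# Hypothetical helper: backslash-escape [ \ . * ^ $ and the / delimiter,
# leaving every other character untouched.
escape_bre() {
    printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

line='/usr/include'
ES=$(escape_bre "$line")

# The escaped term now matches only the literal text.
result=$(printf '%s\n' '/usr/include' '/usrXinclude' |
    sed "s/$ES/replaced/")
printf '%s\n' "$result"
```

Characters such as {, }, ( and ) are deliberately left alone: in a basic regular expression they are literal when unescaped and become operators when preceded by a backslash, which is consistent with the "Invalid content of \{\}" error reported in the question.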
This seems to work for FreeBSD sed:
# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches
The -E option of FreeBSD sed is used to turn on extended regular expressions.
The same is available for GNU sed via the -r or --regexp-extended options respectively.
For the differences between basic and extended regular expressions see, for example:
http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
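A short sketch of the most visible difference, assuming a sed that accepts -E (current GNU and FreeBSD versions do); the input string is made up:

```shell
# Basic regex (default): grouping is \( \), and a bare + is a literal plus.
basic=$(printf 'aaa+b\n' | sed 's/\(a\)+/[\1]/')

# Extended regex (-E): grouping is ( ), and + means "one or more".
extended=$(printf 'aaa+b\n' | sed -E 's/(a)+/[\1]/')

printf '%s\n' "$basic" "$extended"
```

The same characters therefore need different escaping depending on the mode, which matters when a search term is escaped automatically.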
Maybe you can use FreeBSD-compatible minised instead of GNU sed?
# example using FreeBSD-compatible minised,
# http://www.exactcode.de/site/open_source/minised/
# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'
# escapes in regular expression
ES="$(printf "%q" "${line}")" # escape some punctuation characters
ES="${ES//./\\.}" # . -> \.
ES="${ES//\\\\(/(}" # \( -> (
ES="${ES//\\\\)/)}" # \) -> )
# escapes in replacement string
lineEscaped="${line//&/\&}" # & -> \&
minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"
To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:
backSlash='\\'
ES="${ES//${backSlash}(/(}" # \( -> (
ES="${ES//${backSlash})/)}" # \) -> )
(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)
... or to complete the backslash confusion ...
backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}" # double backslashes
lineEscaped="${lineEscaped//&/\&}" # & -> \&
If you have bash, and you're just doing a pattern replacement, just do it natively in bash. The ${parameter/pattern/string} expansion in Bash will work very well for you, since you can just use a variable in place of the "pattern" and replacement "string" and the variable's contents will be safe from word expansion. And it's that word expansion which makes piping to sed such a hassle. :)
It'll be faster than forking a child process and piping to sed anyway. You already know how to do the whole while read line thing, so creatively applying the capabilities in Bash's existing parameter expansion documentation can help you reproduce pretty much anything you can do with sed. Check out the bash man page to start...
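A minimal sketch of that expansion with made-up sample values. One detail worth spelling out: the pattern part is a glob pattern, so the variable should be quoted inside the expansion if it is meant to match literally.

```shell
line='a*c and abc'
search='a*c'

# Unquoted, the search term is a glob pattern: * matches any run of characters.
as_glob=${line/$search/replaced}

# Quoted, every character of the search term is matched literally.
as_literal=${line/"$search"/replaced}

printf '%s\n' "$as_glob" "$as_literal"
```

Use ${line//"$search"/replaced} with a doubled slash to replace every occurrence rather than only the first.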
