Writing a bash function with quotation marks - macos

I'm trying to write a bash function that uses perl to find and replace characters. I've written the following function:
find_replace() {
perl -p -i -e "s/$1/$2/g" "$3"
}
It is not working right now, I think because $1 and $2 are being escaped by the quotation marks that surround them (which as far as I know, are a part of the perl syntax that needs to be there).
Any tips on how to make this function work (or a better way to write it that avoids this problem)?
EDIT:
Following Barmar's suggestion, here is the output when I attempt to run the function:
dholtz$ find_replace \001 , revenue_by_offer_tid
+ find_replace 001 , revenue_by_offer_tid
+ perl -p -i -e ''\''s/001/,/g'\''' revenue_by_offer_tid
++ update_terminal_cwd
++ local 'SEARCH= '
++ local REPLACE=%20
++ local PWD_URL=file://Dave-Mac-2.local/Users/dholtz
++ printf '\e]7;%s\a' file://Dave-Mac-2.local/Users/dholtz
dholtz$ head revenue_by_offer_tid
+ head revenue_by_offer_tid
Friday00228686050.0
Friday00228690410.0
Friday017438366585.040000000000004
Friday017438366591.3200000000000003
Friday017438366600.12
Friday0174383666114.759999999999962
Friday017438371407.440000000000006
Friday0174383815118.599999999999977
Friday017438382221.5600000000000005
Friday017438383663.480000000000002
Expected output is:
Friday,0,0,22,86860,50.0
Friday,0,0,22,86904,10.0
Friday,0,1,7438,36658,5.040000000000004
Friday,0,1,7438,36659,1.3200000000000003
Friday,0,1,7438,36660,0.12
Friday,0,1,7438,36661,14.759999999999962
Friday,0,1,7438,37140,7.440000000000006
Friday,0,1,7438,38151,18.599999999999977
Friday,0,1,7438,38222,1.5600000000000005
Friday,0,1,7438,38366,3.480000000000002

The backslashes were preventing the quotes from being processed properly
find_replace() {
perl -p -i -e "s/$1/$2/g" "$3"
}

The bash manual tells:
A non-quoted backslash (\) is the escape character. It preserves the
literal value of the next character that follows, ...
So, as we see in the output above, your command
find_replace \001 , revenue_by_offer_tid
is treated as (\ just unnecessarily preserving the 0)
find_replace 001 , revenue_by_offer_tid
- not as you wanted. To preserve the backslash, you must quote it in the command line you enter, e. g.
find_replace \\001 , revenue_by_offer_tid

Related

Double quotes containing variable not working in sed [duplicate]

In my bash script I have an external (received from user) string, which I should use in sed pattern.
REPLACE="<funny characters here>"
sed "s/KEYWORD/$REPLACE/g"
How can I escape the $REPLACE string so it would be safely accepted by sed as a literal replacement?
NOTE: The KEYWORD is a dumb substring with no matches etc. It is not supplied by user.
Warning: This does not consider newlines. For a more in-depth answer, see this SO-question instead. (Thanks, Ed Morton & Niklas Peter)
Note that escaping everything is a bad idea. Sed needs many characters to be escaped to get their special meaning. For example, if you escape a digit in the replacement string, it will turn in to a backreference.
As Ben Blank said, there are only three characters that need to be escaped in the replacement string (escapes themselves, forward slash for end of statement and & for replace all):
ESCAPED_REPLACE=$(printf '%s\n' "$REPLACE" | sed -e 's/[\/&]/\\&/g')
# Now you can use ESCAPED_REPLACE in the original sed statement
sed "s/KEYWORD/$ESCAPED_REPLACE/g"
If you ever need to escape the KEYWORD string, the following is the one you need:
sed -e 's/[]\/$*.^[]/\\&/g'
And can be used by:
KEYWORD="The Keyword You Need";
ESCAPED_KEYWORD=$(printf '%s\n' "$KEYWORD" | sed -e 's/[]\/$*.^[]/\\&/g');
# Now you can use it inside the original sed statement to replace text
sed "s/$ESCAPED_KEYWORD/$ESCAPED_REPLACE/g"
Remember, if you use a character other than / as delimiter, you need replace the slash in the expressions above wih the character you are using. See PeterJCLaw's comment for explanation.
Edited: Due to some corner cases previously not accounted for, the commands above have changed several times. Check the edit history for details.
The sed command allows you to use other characters instead of / as separator:
sed 's#"http://www\.fubar\.com"#URL_FUBAR#g'
The double quotes are not a problem.
The only three literal characters which are treated specially in the replace clause are / (to close the clause), \ (to escape characters, backreference, &c.), and & (to include the match in the replacement). Therefore, all you need to do is escape those three characters:
sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
Example:
$ export REPLACE="'\"|\\/><&!"
$ echo fooKEYWORDbar | sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
foo'"|\/><&!bar
Based on Pianosaurus's regular expressions, I made a bash function that escapes both keyword and replacement.
function sedeasy {
sed -i "s/$(echo $1 | sed -e 's/\([[\/.*]\|\]\)/\\&/g')/$(echo $2 | sed -e 's/[\/&]/\\&/g')/g" $3
}
Here's how you use it:
sedeasy "include /etc/nginx/conf.d/*" "include /apps/*/conf/nginx.conf" /etc/nginx/nginx.conf
It's a bit late to respond... but there IS a much simpler way to do this. Just change the delimiter (i.e., the character that separates fields). So, instead of s/foo/bar/ you write s|bar|foo.
And, here's the easy way to do this:
sed 's|/\*!50017 DEFINER=`snafu`#`localhost`\*/||g'
The resulting output is devoid of that nasty DEFINER clause.
It turns out you're asking the wrong question. I also asked the wrong question. The reason it's wrong is the beginning of the first sentence: "In my bash script...".
I had the same question & made the same mistake. If you're using bash, you don't need to use sed to do string replacements (and it's much cleaner to use the replace feature built into bash).
Instead of something like, for example:
function escape-all-funny-characters() { UNKNOWN_CODE_THAT_ANSWERS_THE_QUESTION_YOU_ASKED; }
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A="$(escape-all-funny-characters 'KEYWORD')"
B="$(escape-all-funny-characters '<funny characters here>')"
OUTPUT="$(sed "s/$A/$B/g" <<<"$INPUT")"
you can use bash features exclusively:
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A='KEYWORD'
B='<funny characters here>'
OUTPUT="${INPUT//"$A"/"$B"}"
Use awk - it is cleaner:
$ awk -v R='//addr:\\file' '{ sub("THIS", R, $0); print $0 }' <<< "http://file:\_THIS_/path/to/a/file\\is\\\a\\ nightmare"
http://file:\_//addr:\file_/path/to/a/file\\is\\\a\\ nightmare
Here is an example of an AWK I used a while ago. It is an AWK that prints new AWKS. AWK and SED being similar it may be a good template.
ls | awk '{ print "awk " "'"'"'" " {print $1,$2,$3} " "'"'"'" " " $1 ".old_ext > " $1 ".new_ext" }' > for_the_birds
It looks excessive, but somehow that combination of quotes works to keep the ' printed as literals. Then if I remember correctly the vaiables are just surrounded with quotes like this: "$1". Try it, let me know how it works with SED.
These are the escape codes that I've found:
* = \x2a
( = \x28
) = \x29
" = \x22
/ = \x2f
\ = \x5c
' = \x27
? = \x3f
% = \x25
^ = \x5e
sed is typically a mess, especially the difference between gnu-sed and bsd-sed
might just be easier to place some sort of sentinel at the sed side, then a quick pipe over to awk, which is far more flexible in accepting any ERE regex, escaped hex, or escaped octals.
e.g. OFS in awk is the true replacement ::
date | sed -E 's/[0-9]+/\xC1\xC0/g' |
mawk NF=NF FS='\xC1\xC0' OFS='\360\237\244\241'
1 Tue Aug 🤡 🤡:🤡:🤡 EDT 🤡
(tested and confirmed working on both BSD-sed and GNU-sed - the emoji isn't a typo that's what those 4 bytes map to in UTF-8 )
There are dozens of answers out there... If you don't mind using a bash function schema, below is a good answer. The objective below was to allow using sed with practically any parameter as a KEYWORD (F_PS_TARGET) or as a REPLACE (F_PS_REPLACE). We tested it in many scenarios and it seems to be pretty safe. The implementation below supports tabs, line breaks and sigle quotes for both KEYWORD and replace REPLACE.
NOTES: The idea here is to use sed to escape entries for another sed command.
CODE
F_REVERSE_STRING_R=""
f_reverse_string() {
: 'Do a string reverse.
To undo just use a reversed string as STRING_INPUT.
Args:
STRING_INPUT (str): String input.
Returns:
F_REVERSE_STRING_R (str): The modified string.
'
local STRING_INPUT=$1
F_REVERSE_STRING_R=$(echo "x${STRING_INPUT}x" | tac | rev)
F_REVERSE_STRING_R=${F_REVERSE_STRING_R%?}
F_REVERSE_STRING_R=${F_REVERSE_STRING_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/2705678/3223785 ]
F_POWER_SED_ECP_R=""
f_power_sed_ecp() {
: 'Escape strings for the "sed" command.
Escaped characters will be processed as is (e.g. /n, /t ...).
Args:
F_PSE_VAL_TO_ECP (str): Value to be escaped.
F_PSE_ECP_TYPE (int): 0 - For the TARGET value; 1 - For the REPLACE value.
Returns:
F_POWER_SED_ECP_R (str): Escaped value.
'
local F_PSE_VAL_TO_ECP=$1
local F_PSE_ECP_TYPE=$2
# NOTE: Operational characters of "sed" will be escaped, as well as single quotes.
# By Questor
if [ ${F_PSE_ECP_TYPE} -eq 0 ] ; then
# NOTE: For the TARGET value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[]\/$*.^[]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
else
# NOTE: For the REPLACE value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[\/&]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
fi
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R%?}
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/24134488/3223785 ,
# https://stackoverflow.com/a/21740695/3223785 ,
# https://unix.stackexchange.com/a/655558/61742 ,
# https://stackoverflow.com/a/11461628/3223785 ,
# https://stackoverflow.com/a/45151986/3223785 ,
# https://linuxaria.com/pills/tac-and-rev-to-see-files-in-reverse-order ,
# https://unix.stackexchange.com/a/631355/61742 ]
F_POWER_SED_R=""
f_power_sed() {
: 'Facilitate the use of the "sed" command. Replaces in files and strings.
Args:
F_PS_TARGET (str): Value to be replaced by the value of F_PS_REPLACE.
F_PS_REPLACE (str): Value that will replace F_PS_TARGET.
F_PS_FILE (Optional[str]): File in which the replacement will be made.
F_PS_SOURCE (Optional[str]): String to be manipulated in case "F_PS_FILE" was
not informed.
F_PS_NTH_OCCUR (Optional[int]): [1~n] - Replace the nth match; [n~-1] - Replace
the last nth match; 0 - Replace every match; Default 1.
Returns:
F_POWER_SED_R (str): Return the result if "F_PS_FILE" is not informed.
'
local F_PS_TARGET=$1
local F_PS_REPLACE=$2
local F_PS_FILE=$3
local F_PS_SOURCE=$4
local F_PS_NTH_OCCUR=$5
if [ -z "$F_PS_NTH_OCCUR" ] ; then
F_PS_NTH_OCCUR=1
fi
local F_PS_REVERSE_MODE=0
if [ ${F_PS_NTH_OCCUR} -lt -1 ] ; then
F_PS_REVERSE_MODE=1
f_reverse_string "$F_PS_TARGET"
F_PS_TARGET="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_REPLACE"
F_PS_REPLACE="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_SOURCE"
F_PS_SOURCE="$F_REVERSE_STRING_R"
F_PS_NTH_OCCUR=$((-F_PS_NTH_OCCUR))
fi
f_power_sed_ecp "$F_PS_TARGET" 0
F_PS_TARGET=$F_POWER_SED_ECP_R
f_power_sed_ecp "$F_PS_REPLACE" 1
F_PS_REPLACE=$F_POWER_SED_ECP_R
local F_PS_SED_RPL=""
if [ ${F_PS_NTH_OCCUR} -eq -1 ] ; then
# NOTE: We kept this option because it performs better when we only need to replace
# the last occurrence. By Questor
# [Ref(s).: https://linuxhint.com/use-sed-replace-last-occurrence/ ,
# https://unix.stackexchange.com/a/713866/61742 ]
F_PS_SED_RPL="'s/\(.*\)$F_PS_TARGET/\1$F_PS_REPLACE/'"
elif [ ${F_PS_NTH_OCCUR} -gt 0 ] ; then
# [Ref(s).: https://unix.stackexchange.com/a/587924/61742 ]
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/$F_PS_NTH_OCCUR'"
elif [ ${F_PS_NTH_OCCUR} -eq 0 ] ; then
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/g'"
fi
# NOTE: As the "sed" commands below always process literal values for the "F_PS_TARGET"
# so we use the "-z" flag in case it has multiple lines. By Quaestor
# [Ref(s).: https://unix.stackexchange.com/a/525524/61742 ]
if [ -z "$F_PS_FILE" ] ; then
F_POWER_SED_R=$(echo "x${F_PS_SOURCE}x" | eval "sed -z $F_PS_SED_RPL")
F_POWER_SED_R=${F_POWER_SED_R%?}
F_POWER_SED_R=${F_POWER_SED_R#?}
if [ ${F_PS_REVERSE_MODE} -eq 1 ] ; then
f_reverse_string "$F_POWER_SED_R"
F_POWER_SED_R="$F_REVERSE_STRING_R"
fi
else
if [ ${F_PS_REVERSE_MODE} -eq 0 ] ; then
eval "sed -i -z $F_PS_SED_RPL \"$F_PS_FILE\""
else
tac "$F_PS_FILE" | rev | eval "sed -z $F_PS_SED_RPL" | tac | rev > "$F_PS_FILE"
fi
fi
}
MODEL
f_power_sed "F_PS_TARGET" "F_PS_REPLACE" "" "F_PS_SOURCE"
echo "$F_POWER_SED_R"
EXAMPLE
f_power_sed "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" "[ ]+|$/,\"\0\"" "" "Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate that concatenation of the final \", \" then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,\"\0\"); print; }' <<<\"$string\") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readar"
echo "$F_POWER_SED_R"
IF YOU JUST WANT TO ESCAPE THE PARAMETERS TO THE SED COMMAND
MODEL
# "TARGET" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 0
echo "$F_POWER_SED_ECP_R"
# "REPLACE" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 1
echo "$F_POWER_SED_ECP_R"
IMPORTANT: If the strings for KEYWORD and/or replace REPLACE contain tabs or line breaks you will need to use the "-z" flag in your "sed" command. More details here.
EXAMPLE
f_power_sed_ecp "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" 0
echo "$F_POWER_SED_ECP_R"
f_power_sed_ecp "[ ]+|$/,\"\0\"" 1
echo "$F_POWER_SED_ECP_R"
NOTE: The f_power_sed_ecp and f_power_sed functions above was made available completely free as part of this project ez_i - Create shell script installers easily!.
Standard recommendation here: use perl :)
echo KEYWORD > /tmp/test
REPLACE="<funny characters here>"
perl -pi.bck -e "s/KEYWORD/${REPLACE}/g" /tmp/test
cat /tmp/test
don't forget all the pleasure that occur with the shell limitation around " and '
so (in ksh)
Var=">New version of \"content' here <"
printf "%s" "${Var}" | sed "s/[&\/\\\\*\\"']/\\&/g' | read -r EscVar
echo "Here is your \"text\" to change" | sed "s/text/${EscVar}/g"
If the case happens to be that you are generating a random password to pass to sed replace pattern, then you choose to be careful about which set of characters in the random string. If you choose a password made by encoding a value as base64, then there is is only character that is both possible in base64 and is also a special character in sed replace pattern. That character is "/", and is easily removed from the password you are generating:
# password 32 characters log, minus any copies of the "/" character.
pass=`openssl rand -base64 32 | sed -e 's/\///g'`;
If you are just looking to replace Variable value in sed command then just remove
Example:
sed -i 's/dev-/dev-$ENV/g' test to sed -i s/dev-/dev-$ENV/g test
I have an improvement over the sedeasy function, which WILL break with special characters like tab.
function sedeasy_improved {
sed -i "s/$(
echo "$1" | sed -e 's/\([[\/.*]\|\]\)/\\&/g'
| sed -e 's:\t:\\t:g'
)/$(
echo "$2" | sed -e 's/[\/&]/\\&/g'
| sed -e 's:\t:\\t:g'
)/g" "$3"
}
So, whats different? $1 and $2 wrapped in quotes to avoid shell expansions and preserve tabs or double spaces.
Additional piping | sed -e 's:\t:\\t:g' (I like : as token) which transforms a tab in \t.
An easier way to do this is simply building the string before hand and using it as a parameter for sed
rpstring="s/KEYWORD/$REPLACE/g"
sed -i $rpstring test.txt

How to search & replace arbitrary literal strings in sed and awk (and perl)

Say we have some arbitrary literals in a file that we need to replace with some other literal.
Normally, we'd just reach for sed(1) or awk(1) and code something like:
sed "s/$target/$replacement/g" file.txt
But what if the $target and/or $replacement could contain characters that are sensitive to sed(1) such as regular expressions. You could escape them but suppose you don't know what they are - they are arbitrary, ok? You'd need to code up something to escape all possible sensitive characters - including the '/' separator. eg
t=$( echo "$target" | sed 's/\./\\./g; s/\*/\\*/g; s/\[/\\[/g; ...' ) # arghhh!
That's pretty awkward for such a simple problem.
perl(1) has \Q ... \E quotes but even that can't cope with the '/' separator in $target.
perl -pe "s/\Q$target\E/$replacement/g" file.txt
I just posted an answer!! So my real question is, "is there a better way to do literal replacements in sed/awk/perl?"
If not, I'll leave this here in case it comes in useful.
The quotemeta, which implements \Q, absolutely does what you ask for
all ASCII characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash
Since this is presumably in a shell script, the problem is really of how and when shell variables get interpolated and so what the Perl program ends up seeing.
The best way is to avoid working out that interpolation mess and instead properly pass those shell variables to the Perl one-liner. This can be done in several ways; see this post for details.
Either pass the shell variables simply as arguments
#!/bin/bash
# define $target
perl -pe"BEGIN { $patt = shift }; s{\Q$patt}{$replacement}g" "$target" file.txt
where the needed arguments are removed from #ARGV and utilized in a BEGIN block, so before the runtime; then file.txt gets processed. There is no need for \E in the regex here.
Or, use the -s switch, which enables command-line switches for the program
# define $target, etc
perl -s -pe"s{\Q$patt}{$replacement}g" -- -patt="$target" file.txt
The -- is needed to mark the start of arguments, and switches must come before filenames.
Finally, you can also export the shell variables, which can then be used in the Perl script via %ENV; but in general I'd rather recommend either of the above two approaches.
A full example
#!/bin/bash
# Last modified: 2019 Jan 06 (22:15)
target="/{"
replacement="&"
echo "Replace $target with $replacement"
perl -wE'
BEGIN { $p = shift; $r = shift };
$_=q(ah/{yes); s/\Q$p/$r/; say
' "$target" "$replacement"
This prints
Replace /{ with &
ah&yes
where I've used characters mentioned in a comment.
The other way
#!/bin/bash
# Last modified: 2019 Jan 06 (22:05)
target="/{"
replacement="&"
echo "Replace $target with $replacement"
perl -s -wE'$_ = q(ah/{yes); s/\Q$patt/$repl/; say' \
-- -patt="$target" -repl="$replacement"
where code is broken over lines for readability here (and thus needs the \). Same printout.
Me again!
Here's a simpler way using xxd(1):
t=$( echo -n "$target" | xxd -p | tr -d '\n')
r=$( echo -n "$replacement" | xxd -p | tr -d '\n')
xxd -p file.txt | sed "s/$t/$r/g" | xxd -p -r
... so we're hex-encoding the original text with xxd(1) and doing search-replacement using hex-encoded search strings. Finally we hex-decode the result.
EDIT: I forgot to remove \n from the xxd output (| tr -d '\n') so that patterns can span the 60-column output of xxd. Of course, this relies on GNU sed's ability to operate on very long lines (limited only by memory).
EDIT: this also works on multi-line targets eg
target=$'foo\nbar'
replacement=$'bar\nfoo'
With awk you could do it like this:
awk -v t="$target" -v r="$replacement" '{gsub(t,r)}' file
The above expects t to be a regular expression, to use it a string you can use
awk -v t="$target" -v r="$replacement" '{while(i=index($0,t)){$0 = substr($0,1,i-1) r substr($0,i+length(t))} print}' file
Inspired from this post
Note that this won't work properly if the replacement string contains the target. The above link has solutions for that too.
This is an enhancement
of wef’s answer.
We can remove the issue of the special meaning of various special characters
and strings (^, ., [, *, $, \(, \), \{, \}, \+, \?,
&, \1, …, whatever, and the / delimiter)
by removing the special characters. 
Specifically, we can convert everything to hex;
then we have only 0-9 and a-f to deal with. 
This example demonstrates the principle:
$ echo -n '3.14' | xxd
0000000: 332e 3134 3.14
$ echo -n 'pi' | xxd
0000000: 7069 pi
$ echo '3.14 is a transcendental number. 3614 is an integer.' | xxd
0000000: 332e 3134 2069 7320 6120 7472 616e 7363 3.14 is a transc
0000010: 656e 6465 6e74 616c 206e 756d 6265 722e endental number.
0000020: 2020 3336 3134 2069 7320 616e 2069 6e74 3614 is an int
0000030: 6567 6572 2e0a eger..
$ echo "3.14 is a transcendental number. 3614 is an integer." | xxd -p \
| sed 's/332e3134/7069/g' | xxd -p -r
pi is a transcendental number. 3614 is an integer.
whereas, of course, sed 's/3.14/pi/g' would also change 3614.
The above is a slight oversimplification; it doesn’t account for boundaries. 
Consider this (somewhat contrived) example:
$ echo -n 'E' | xxd
0000000: 45 E
$ echo -n 'g' | xxd
0000000: 67 g
$ echo '$Q Eak!' | xxd
0000000: 2451 2045 616b 210a $Q Eak!.
$ echo '$Q Eak!' | xxd -p | sed 's/45/67/g' | xxd -p -r
&q gak!
Because $ (24) and Q (51)
combine to form 2451,
the s/45/67/g command rips it apart from the inside. 
It changes 2451 to 2671, which is &q (26 + 71). 
We can prevent that by separating the bytes of data in the search text,
the replacement text and the file with spaces. 
Here’s a stylized solution:
encode() {
xxd -p -- "$#" | sed 's/../& /g' | tr -d '\n'
}
decode() {
xxd -p -r -- "$#"
}
left=$( printf '%s' "$search" | encode)
right=$(printf '%s' "$replacement" | encode)
encode file.txt | sed "s/$left/$right/g" | decode
I defined an encode function because I used that functionality three times,
and then I defined decode for symmetry. 
If you don’t want to define a decode function, just change the last line to
encode file.txt | sed "s/$left/$right/g" | xxd -p –r
Note that the encode function triples the size of the data (text)
in the file, and then sends it through sed as a single line
— without even having a newline at the end. 
GNU sed seems to be able to handle this;
other versions might not be able to.
As an added bonus, this solution handles multi-line search and replace
(in other words, search and replacement strings that contain newline(s)).
I can explain why this doesn't work:
perl(1) has \Q ... \E quotes but even that can't cope with the '/' separator in $target.
The reason is because the \Q and \E (quotemeta) escapes are processed after the regex is parsed, and a regex is not parsed unless there are valid pattern delimiters defining a regex.
As an example, here's an attempt to replace the string /etc/ in /etc/hosts by using a variable in a string passed to perl:
$target="/etc/";
perl -pe "s/\Q$target\E/XXX/" <<<"/etc/hosts";
After the shell expands the variable in the string, perl receives the command s/\Q/etc/\E/XXX/ which is not a valid regex because it doesn't contain three pattern delimiters (perl sees five delimiters, i.e., s/…/…/…/…/). Therefore, the \Q and \E are never even executed.
The solution, as #zdim suggested, is to pass the variables to perl in a way that they are included in the regex after the regex is parsed, such as like this:
perl -s -pe 's/\Q$target\E/XXX/ig' -- -target="/etc/" <<<"/etc/123"
awk escaping is not all that complex either :
on the searching regex, just these 2 suffices to escape any and all awk variants - simply "cage" all of them, with additional escaping performed for just the circumflex/caret, and backslash itself :
-- technically you don't need to escape space at all - sometimes i like using it for marking an unambiguous anchoring point for the character instead of letting awk be too agile about how it handles spaces and tabs. swap the space for "!" inside the regex if u like
jot -s '' -c - 32 126 |
mawk 'gsub("[[-\440{-~:-# -/]", "[&]") \
gsub(/\\|\^/, "\\\\&")^_' FS='^$' RS='^$'
\440 is (`) - i'm just not a fan of having those exposed in my code
|
[ ][!]["][#][$][%][&]['][(][)][*][+] [,] [-][.] [/] # re-aligned for
0123456789 [:][;] [<] [=][>] [?] # readability
[#]ABCDEFGHIJKLMNOPQRSTUVWXYZ [[][\\] []][\^][_]
[`]abcdefghijklmnopqrstuvwxyz [{] [|] [}][~]
as for replacement, only literal "&" needs to be escaped via
gsub(target_regex, "&") # nothing escaped
matched text
gsub(target_regex, "\\&") # 2 backslashes
literal "&"
gsub("[[:punct:]]", "\\\\&") # 4 backslashes
\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\#\[\\\]\^\_\`\{\|\}\~
—- (personally prefer using square-brackets i.e. char classes as an escaping mechanism than having backslash galore)
gsub("[[:punct:]]", "\\\\\\&") # 6 backslashes
\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&
Use 6-backslashes only if you're planning to feed this output further down to another gsub()/match() function call

How can I accept unquoted strings containing backslashes?

I want a command to convert from windows to unix filenames, simply to replace backslashes with frontslashes... but without quoting the argument with "" because that's a chore when copy-pasting.
It works in the other direction (u2w) with the input quoted and without, but not for w2u.
machine:~/glebbb> w2u "a\b\c"
a/b/c
machine:~/glebbb> w2u a\b\c
abc
How can I make it work? I tried every form of escaping, echo -E, printf etc, nothing seems to work!
function w2u {
if [ -z "$1" ] ; then
echo "w2u: must provide path to convert!"
return 1
else
printf "\n%s\n\n" "$1" | sed -e 's#\\#\/#g'
return 0
fi
}
If you're copy-pasting and the path is contained in the X clipboard, you can use xclip:
xclip -o | sed -e 's#\\#\/#g'
If you've got a ton of file paths to convert, you can process the whole file instead:
sed ... < file
will produce a new stream with the backslashes changed to slashes.
Otherwise I can't think of any way how to not-escape the parameters to w2u and yet have backslashes lose their meaning.

sed Capital_Case not working

I'm trying to convert a string that has either - (hyphen) or _ (underscore) to Capital_Case string.
#!/usr/bin/env sh
function cap_case() {
[ $# -eq 1 ] || return 1;
_str=$1;
_capitalize=${_str//[-_]/_} | sed -E 's/(^|_)([a-zA-Z])/\u\2/g'
echo "Capitalize:"
echo $_capitalize
return 0
}
read string
echo $(cap_case $string)
But I don't get anything out.
First I am replacing any occurrence of - and _ with _ ${_str//[-_]/_}, and then I pipe that string to sed which finds the first letter, or _ as the first group, and then the letter after the first group in the second group, and I want to uppercase the found letter with \u\2. I tried with \U\2 but that didn't work as well.
I want the string some_string to become
Some_String
And string some-string to become
Some_String
I'm on a mac, using zsh if that is helpful.
EDIT: More generic solution here to make each field's first letter Capital.
echo "some_string_other" | awk -F"_" '{for(i=1;i<=NF;i++){$i=toupper(substr($i,1,1)) substr($i,2)}} 1' OFS="_"
Following awk may help you.
echo "some_string" | awk -F"_" '{$1=toupper(substr($1,1,1)) substr($1,2);$2=toupper(substr($2,1,1)) substr($2,2)} 1' OFS="_"
Output will be as follows.
echo "some_string" | awk -F"_" '{$1=toupper(substr($1,1,1)) substr($1,2);$2=toupper(substr($2,1,1)) substr($2,2)} 1' OFS="_"
Some_String
This being zsh, you don't need sed (or even a function, really):
$ s=some-string-bar
$ print ${(C)s:gs/-/_}
Some_String_Bar
The (C) flag capitalizes words (where "words" are defined as sequences of alphanumeric characters separated by other characters); :gs/-/_ replaces hyphens with underscores.
If you really want a function, it's cap_case () { print ${(C)1:gs/-/_} }.
pure bash:
#!/bin/bash
camel_case(){
local d display string
declare -a strings # = scope local
[ "$2" ] && d="$2" || d=" " # optional output delimiter
ifs_ini="$IFS"
IFS+='_-' # we keep initial IFS
strings=( "$1" ) # array
for string in ${strings[#]} ; do
display+="${string^}$d"
done
echo "${display%$d}"
IFS="$ifs_ini"
}
camel_case "some-string_here" "_"
camel_case "some-string_here some strings here" "+"
camel_case "some-string_here some strings here"
echo "$BASH_VERSION"
exit
output:
Some_String_Here
Some+String+Here+Some+Strings+Here
Some String Here Some Strings Here
4.4.18(1) release
You can try this gnu sed
echo 'some_other-string' | sed -E 's/(^.)/\u&/;s/[_-](.)/_\u\1/g'
Explains :
s/(^.)/\u&/
(^.) match the first char and \u& put the match in capital letter.
s/[_-](.)/_\u\1/g
[_-](.) capture a char preceded by _ or - and replace it by _ and the matched char in capital letter.
The g at the end tell sed to make the replacement for each char which meet the criteria
You didn't assign to _capitalize - you set a _capitalize environment variable for the empty command that you piped into sed.
You probably meant
_capitalize=$(<<<"${_str//[-_]/_}" sed -E 's/(^|_)([a-zA-Z])/\1\u\2/g')
Note also that ${//} isn't standard shell, so you really ought to specify an interpreter other than sh.
A simpler approach would be simply:
#!/bin/sh
cap_case() {
printf "Capitalize: "
echo "$*" | sed -e 'y/-/_/' -e 's/\(^\|_\)[[:alpha:]]/\U&/g'
}
echo $(cap_case "snake_case")
Note that the \u / \U replacement is a GNU extension to sed - if you're using a non-GNU implementation, check whether it supports this feature.

Looking for a regex pattern, passing that pattern to a script, and replacing the pattern with the output of the script

For every time the pattern shows up (In this example the case of a 2 digit number) I want to pass that pattern to a script and replace that pattern with the output of a script.
I'm using sed an example of what it should look like would be
echo 'siedi87sik65owk55dkd' | sed 's/[0-9][0-9]/.\/script.sh/g'
Right now this returns
siedi./script.shsik./script.showk./script.shdkd
But I would like it to return
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
This is what is in ./script.sh
#!/bin/bash
echo "!!!$1!!!"
It has to be replaced with the output. In this example I know I could just use a normal sed substitution but I don't want that as an answer.
sed is for simple substitutions on individual lines, that is all. Anything else, even if it can be done, requires arcane language constructs that became obsolete in the mid-1970s when awk was invented and are used today purely for the mental exercise. Your problem is not a simple substitution so you shouldn't try to use sed to solve it.
You're going to want something like:
awk '{
head = ""
tail = $0
while ( match(tail,/[0-9]{2}/) ) {
tgt = substr(tail,RSTART,RLENGTH)
cmd = "./script.sh " tgt
if ( (cmd | getline line) > 0) {
tgt = line
}
close(cmd)
head = head substr(tail,1,RSTART-1) tgt
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
e.g. using an echo in place of your script.sh command:
$ echo 'siedi87sik65owk55dkd' |
awk '{
head = ""
tail = $0
while ( match(tail,/[0-9]{2}/) ) {
tgt = substr(tail,RSTART,RLENGTH)
cmd = "echo !!!" tgt "!!!"
if ( (cmd | getline line) > 0) {
tgt = line
}
close(cmd)
head = head substr(tail,1,RSTART-1) tgt
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
Ed's awk solution is obviously the way to go here.
For fun, I tried to come up with a sed solution, and here is (a convoluted GNU sed) one that takes the pattern and the script to be run as parameters; the input is either read from standard input (i.e., you can pipe to it) or from a file supplied as the third argument.
For your example, we'd have infile with contents
siedi87sik65owk55dkd
siedi11sik22owk33dkd
(two lines to demonstrate how this works for multiple lines), then script with contents
#!/bin/bash
echo "!!!${1}!!!"
and finally the solution script itself, so. Usage is
./so pattern script [input]
where pattern is an extended regular expression as understood by GNU sed (with the -r option), script is the name of the command you want to run for each match, and the optional input is the name of the input file if input is not standard input.
For your example, this would be
./so '[[:digit:]]{2}' script infile
or, as a filter,
cat infile | ./so '[[:digit:]]{2}' script
with output
siedi!!!87!!!sik!!!65!!!owk!!!55!!!dkd
siedi!!!11!!!sik!!!22!!!owk!!!33!!!dkd
This is what so looks like:
#!/bin/bash
pat=$1 # The pattern to match
script=$2 # The command to run for each pattern
infile=${3:-/dev/stdin} # Read from standard input if not supplied
# Use sed and have $pattern and $script expand to the supplied parameters
sed -r "
:build_loop # Label to loop back to
h # Copy pattern space to hold space
s/.*($pat).*/.\/\"$script\" \1/ # (1) Extract last match and prepare command
# Replace pattern space with output of command
e
G # (2) Append hold space to pattern space
s/(.*)$pat(.*)/\1~~~\2/ # (3) Replace last match of pattern with ~~~
/\n[^\n]*$pat[^\n]*$/b build_loop # Loop if string contains match
:fill_loop # Label for second loop
s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/ # (4) Replace last ~~~
t fill_loop # Loop if there was a replacement
s/(.*)\n(.*)~~~(.*)$/\2\1\3/ # (5) Final ~~~ replacement
" < "$infile"
The sed command works with two loops. The first one copies the pattern space to the hold space, then removes everything but the last match from the pattern space and prepares the command to be run. After the substitution with (1) in its comment, the pattern space looks like this:
./script 55
The e command (a GNU extension) then replaces the pattern space with the output of this command. After this, G appends the hold space to the pattern space (2). The pattern space now looks like this:
!!!55!!!
siedi87sik65owk55dkd
The substitution at (3) replaces the last match with a string hopefully not equal to the pattern and we get
!!!55!!!
siedi87sik65owk~~~dkd
The loop repeats if the last line of the pattern space still has a match for the pattern. After three loops, the pattern space looks like this:
!!!87!!!
!!!65!!!
!!!55!!!
siedi~~~sik~~~owk~~~dkd
The second loop now replaces the last ~~~ with the second to last line of the pattern space with substitution (4). The command uses lots of "not a newline" ([^\n]) to make sure we're not pulling the wrong replacement for ~~~.
Because of the way command (4) is written, the loop ends with one last substitution to go, so before command (5), we have this pattern space:
!!!87!!!
siedi~~~sik!!!65!!!owk!!!55!!!dkd
Command (5) is a simpler version of command (4), and after it, the output is as desired.
This seems to be fairly robust and can deal with spaces in the name of the script to be run as long as it's properly quoted when calling:
./so '[[:digit:]]{2}' 'my script' infile
This would fail if
The input file contains ~~~ (solvable by replacing all occurrences at the start, putting them back at the end)
The output of script contains ~~~
The pattern contains ~~~
i.e., the solution very much depends on ~~~ being unique.
Because nobody asked: so as a one-liner.
#!/bin/bash
sed -re ":b;h;s/.*($1).*/.\/\"$2\" \1/;e" -e "G;s/(.*)$1(.*)/\1~~~\2/;/\n[^\n]*$1[^\n]*$/bb;:f;s/(.*\n)(.*)\n([^\n]*)~~~([^\n]*)$/\1\3\2\4/;tf;s/(.*)\n(.*)~~~(.*)$/\2\1\3/" < "${3:-/dev/stdin}"
Still works!
A conceptually simpler multi-utility solution:
Using GNU utilities:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' |
xargs -d'\n' -I% sh -c 'echo '\"%\"
Using BSD utilities (also works with GNU utilities):
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' | tr '\n' '\0' |
xargs -0 -I% sh -c 'echo '\"%\"
The idea is to use sed to translate the tokens of interest lexically into a string containing shell command substitutions that invoke the target script with the token, and then pass the result to the shell for evaluation.
Note:
Any embedded " and $ characters in the input must be \-escaped.
xargs -d'\n' (GNU) and tr '\n' '\0' / xargs -0 (BSD) are only needed to correctly preserve whitespace in the input - if that is not needed, the following POSIX-compliant solution will do:
echo 'siedi87sik65owk55dkd' |
sed 's|[0-9]\{2\}|$(./script.sh &)|g' | tr '\n' '\0' |
xargs -I% sh -c 'printf "%s\n" '\"%\"

Resources