Copyright character in sed - utf-8

I'm trying to remove all lines containing the copyright character (among other things, in a bash script), but it's not working at all:
cat $srcdir/$txtfile |
sed "s/.*©.*/d" |
cat > $tgtdir/$txtfile
does nothing. However, running
echo blah © blah | sed "s/.*©.*//g"
in the terminal correctly yields
blah blah
I'm using SciTE set to UTF-8 encoding, so the first block of code above is exactly what I see in the editor. Any ideas on how I could represent it in an editor so sed will recognise it?

You might try the octal representation of ©, which is 251 in Latin-1 (in UTF-8 the character is the two bytes 302 251):
$ echo blah © blah | sed 's/\o251/X/'
blah blah
That is an "oh", not a zero. The \o escape is a GNU sed extension.
To delete lines containing that character, use
sed '/\o251/d'

The sed command doesn’t look right. Try
sed '/©/d'
Also check that the appropriate locale environment variable is set in the shell in which the script runs. For example, I use
LC_ALL=en_US.UTF-8

Try using grep instead:
grep -v '©'
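To make the byte-level view concrete, here is a minimal sketch that deletes lines containing © without depending on how the editor or terminal encodes the script. It assumes the input file is UTF-8, where © is the two bytes \302\251 (octal); the sample text and variable names are made up for illustration.

```shell
# Build the UTF-8 encoding of the copyright sign from octal escapes,
# so the script itself contains only ASCII.
copyright=$(printf '\302\251')

# Hypothetical sample input: three lines, the middle one contains the sign.
input=$(printf 'first line\nCopyright \302\251 Example Corp\nlast line')

# LC_ALL=C makes sed compare raw bytes, whatever the surrounding locale is.
filtered=$(printf '%s\n' "$input" | LC_ALL=C sed "/$copyright/d")

printf '%s\n' "$filtered"
```

If the file turns out to be Latin-1 instead, the same approach should work with the single byte \251.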

Related

sed replacement for no_proxy cuts number in localhost

I have a no_proxy string I'd like to modify to fit with Java Opts.
❯ export no_proxy_test=localhost,127.0.0.1,.myfancy.domain,.myotherfancy.domain
❯ echo $no_proxy_test
localhost,127.0.0.1,.myfancy.domain,.myotherfancy.domain
I need for each value to be delimited with pipes "|" instead and I need to remove the dots for the wildcard. So should be:
localhost|127.0.0.1|myfancy.domain|myotherfancy.domain
When I use:
❯ echo $no_proxy_test | sed 's/,./|/g'
localhost|27.0.0.1|myfancy.domain|myotherfancy.domain
For some reason it cuts the 1 in 127.0.0.1 and I don't understand why? I thought I may achieve it with a double sed:
❯ echo $no_proxy_test | sed 's/,/|/g' | sed 's/|./|/g'
localhost|27.0.0.1|myfancy.domain|myotherfancy.domain
But same problem there. Does anyone have an idea? I don't want to sed replace 27 with 127. Would be interesting if I ran into a bug or if anyone could explain why 1 is being cut from the string.
Thank you!
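The unescaped . in your pattern matches any character, so ,. consumed the 1 that follows the first comma. Escape the dot and make it optional instead: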
sed -E 's/,\.?/|/g' # POSIX ERE
sed 's/,\.\{0,1\}/|/g' # POSIX BRE
This replaces each comma, together with an optional literal . after it, with a pipe symbol.
See the online demo:
#!/bin/bash
s='localhost,127.0.0.1,.myfancy.domain,.myotherfancy.domain'
sed 's/,\.\{0,1\}/|/g' <<< "$s"
# => localhost|127.0.0.1|myfancy.domain|myotherfancy.domain
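A side-by-side run can make the difference visible. This is only a sketch using a shortened version of the string from the question; the variable names are illustrative.

```shell
s='localhost,127.0.0.1,.myfancy.domain'

# Unescaped dot: "," followed by ANY character, so ",1" is eaten as well.
unescaped=$(printf '%s\n' "$s" | sed 's/,./|/g')

# Escaped, optional dot: "," followed by zero or one literal ".".
escaped=$(printf '%s\n' "$s" | sed 's/,\.\{0,1\}/|/g')

printf '%s\n' "$unescaped"   # localhost|27.0.0.1|myfancy.domain
printf '%s\n' "$escaped"     # localhost|127.0.0.1|myfancy.domain
```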

cat -E analog for Mac OS

For a couple of years I have used scripts on Linux that call cat -E to display each end of line as $.
On macOS, however, cat does not support the -E option.
Is there an alternative that does the same thing?
The less code, the better.
Maybe just pipe through sed?
I mean:
cat | sed -e 's/$/\$/'
Use
cat -e ...
This will display a dollar sign ($) at the end of each line. Use cat -etv ... to also display tab characters as ^I and other non-printing characters, such as the carriage return, as ^M.
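As a quick check that the two approaches agree, the following sketch marks line ends with both cat -e and the sed idea from the question. The two-line input is made up; in the sed replacement the $ is a literal character, so no backslash is needed.

```shell
# Mark each end of line with "$" using cat -e.
with_cat=$(printf 'one\ntwo\n' | cat -e)

# Same effect with sed: "$" in the pattern is the end-of-line anchor,
# "$" in the replacement is a plain dollar sign.
with_sed=$(printf 'one\ntwo\n' | sed 's/$/$/')

printf '%s\n' "$with_cat"
printf '%s\n' "$with_sed"
```

For plain text without control characters both commands should print one$ and two$.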

How to replace '. ' with '.\n' using MacOS sed? [duplicate]

I am trying to scrub some lists into a properly formatted CSV file for database import.
My starting file looks something like this, with what is supposed to be each "line" spanning multiple lines, as below:
Mr. John Doe
Exclusively Stuff, 186
Caravelle Drive, Ponte Vedra
33487.
I created a sed script that cleans up the file (there's lots of "dirty" formatting like double spaces and spaces before/after commas). The problem is the Zip with the period. I would like to change that period for a new line, but I cannot get it to work.
The command that I use is:
sed -E -f scrub.sed test.txt
and the scrub.sed script is as follows:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\n |g
$!ba
What I get is
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487n
I figured that the Zip plus the period would be a great "delimiter" to use the substitution on, and while I can find it, I can't seem to tell sed to put a newline there.
Most of the things I found online are about replacing the newline with something else (usually deleting them), but not much on replacing with a newline. I did find this, but it didn't work: How to insert newline character after comma in `),(` with sed?
Is there something I am missing?
Update:
I edited my scrub.sed file, putting in the literal newline as instructed. It still doesn't work:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\
|g
$!ba
What I get is (everything on one line):
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487 Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
My expected output should be:
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487
Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
The sed on BSD does not support the \n representation of a new line (turning it into a literal n):
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\n next line/'
123n next line
GNU sed does support the \n representation:
$ echo "123." | gsed -E 's/([[:digit:]]*)\./\1\nnext line/'
123
next line
Alternatives are:
Use a single-character delimiter and then translate it into a newline with tr:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1|next line/' | tr '|' '\n'
123
next line
Or use an escaped literal new line in your sed script:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\
next line/'
123
next line
Or define a new line:
POSIX:
nl='
'
BASH / zsh / others that support ANSI C quoting:
nl=$'\n'
And then use sed with appropriate quoting and escapes to insert the literal newline:
echo "123." | sed 's/\./'"\\${nl}"'next line/'
123
next line
Or use awk:
$ echo "123." | awk '/^[[:digit:]]+\./{sub(/\./,"\nnext line")} 1'
123
next line
Or use GNU sed which supports \n
The portable way to get a newline in sed is a backslash followed by a literal newline:
$ echo 'foo' | sed 's/foo/foo\
bar/'
foo
bar
I guarantee there's a far simpler solution to your whole problem by using awk rather than sed though.
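One possible shape for such an awk approach is sketched below. It is not the asker's script: the function name scrub is hypothetical, the sample records are made up (with a comma after each name, which the question's expected output implies), and the state FL is hard-coded as in the original sed script.

```shell
# scrub: join multi-line address records into one CSV line each.
scrub() {
  awk '
    # Accumulate physical lines into one record, separated by a space.
    { rec = (rec == "" ? $0 : rec " " $0) }

    # A line ending in five digits plus a period closes the record.
    /[0-9][0-9][0-9][0-9][0-9]\.[ \t]*$/ {
      gsub(/[ \t]+/, " ", rec)      # squeeze runs of blanks
      gsub(/ ?, ?/, ",", rec)       # drop blanks around commas
      sub(/\. ?$/, "", rec)         # drop the trailing period
      if (match(rec, / [0-9]+$/))   # split off the ZIP, insert the state
        rec = substr(rec, 1, RSTART - 1) ",FL," substr(rec, RSTART + 1)
      print rec
      rec = ""
    }
  ' "$@"
}

# Made-up sample input with the same kinds of dirt as in the question.
result=$(printf '%s\n' \
  'Mr. John Doe,' \
  'Exclusively  Stuff, 186' \
  'Caravelle Drive , Ponte Vedra' \
  '33487.' \
  'Mrs. Jane Smith,' \
  'Props and Stuff, 123 Main Drive, Jacksonville' \
  '33690.' | scrub)

printf '%s\n' "$result"
```

Because print emits the newline, this avoids the question of how to write a newline in a sed replacement altogether. With a real file it could be called as scrub test.txt.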
The following works on Oracle Linux, x86_64:
$ echo 'foobar' | sed 's/foo/foo\n/'
foo
bar
If you need it to match more than once per line, you'll need to place a g at the end, as in:
$ echo 'foobarfoobaz' | sed 's/foo/foo\n/g'
foo
barfoo
baz
Add a line after a match.
The "a" command tells sed to append a line of text after every line that matches the pattern.
sed '/unix/ a "Add a new line"' file.txt
unix is great os. unix is opensource. unix is free os.
"Add a new line"
learn operating system.
unixlinux which one you choose.
"Add a new line"
Add a line before a match
The "i" command tells sed to insert a line of text before every line that matches the pattern.
sed '/unix/ i "Add a new line"' file.txt
"Add a new line"
unix is great os. unix is opensource. unix is free os.
learn operating system.
"Add a new line"
unixlinux which one you choose.
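The one-line forms of a and i shown above are a GNU sed extension. Since this thread is about macOS, here is a sketch of the POSIX spelling, a\ or i\ followed by a newline and then the text, which should work with both GNU and BSD sed. The two-line input is invented for the example.

```shell
input=$(printf 'unix is free os.\nlearn operating system.')

# Append a line after each match: "a\", newline, then the text.
appended=$(printf '%s\n' "$input" | sed '/unix/a\
Add a new line')

# Insert a line before each match: "i\", newline, then the text.
inserted=$(printf '%s\n' "$input" | sed '/unix/i\
Add a new line')

printf '%s\n' "$appended"
printf '%s\n' "$inserted"
```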

sed: remove all characters except for last n characters

I am trying to remove every character in a text string except for the last 11 characters. The string is Sample Text_that-would$normally~be,here--pe_-l4_mBY and what I want to end up with is just -pe_-l4_mBY.
Here's what I've tried:
$ cat food
Sample Text_that-would$normally~be,here--pe_-l4_mBY
$ cat food | sed 's/^.*(.{3})$/\1/'
sed: 1: "s/^.*(.{3})$/\1/": \1 not defined in the RE
Please note that the text string isn't really stored in a file, I just used cat food as an example.
OS is macOS High Sierra 10.13.6 and bash version is 3.2.57(1)-release
You can use this sed with a capture group:
sed -E 's/.*(.{11})$/\1/' file
-pe_-l4_mBY
Basic regular expressions (used by default by sed) require both the parentheses in the capture group and the braces in the brace expression to be escaped. ( and { are otherwise treated as literal characters to be matched.
$ cat food | sed 's/^.*\(.\{3\}\)$/\1/'
mBY
By contrast, explicitly requesting sed to use extended regular expressions with the -E option reverses the meaning, with \( and \{ being the literal characters.
$ cat food | sed -E 's/^.*(.{3})$/\1/'
mBY
Try this also:
grep -o -E '.{11}$' food
grep, like sed, accepts an arbitrary number of file name arguments, so there is no need for a separate cat. (See also useless use of cat.)
You can use tail or parameter expansion:
string='Sample Text_that-would$normally~be,here--pe_-l4_mBY'
echo "$string" | tail -c 12   # 11 characters plus the newline that echo adds
echo "${string#"${string%???????????}"}"   # 11 question marks
-pe_-l4_mBY
-pe_-l4_mBY
also with rev/cut/rev
$ echo abcdefghijklmnopqrstuvwxyz | rev | cut -c1-11 | rev
pqrstuvwxyz
man rev => rev - reverse lines characterwise
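One detail that is easy to trip over with tail -c: it counts bytes, and the newline that echo appends is one of them. The sketch below shows the effect on the string from the question; the variable names are illustrative, and for multi-byte characters the byte count would differ from the character count.

```shell
s='Sample Text_that-would$normally~be,here--pe_-l4_mBY'

# echo appends a newline, so only 10 characters of the string survive.
with_echo=$(echo "$s" | tail -c 11)

# printf '%s' adds no newline, so all 11 bytes come from the string.
with_printf=$(printf '%s' "$s" | tail -c 11)

printf '%s\n' "$with_echo"     # pe_-l4_mBY
printf '%s\n' "$with_printf"   # -pe_-l4_mBY
```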

Won't delete or replace £ (pound) sign

I'm trying to delete all the £ symbols in a text file (.csv) using sed, however it doesn't work and I've no clue as to why. Any suggestions? Basically using:
sed -i 's/£//g' file.csv
Output of locale:
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Update:
So, I don't know why sed wasn't working, but
tr -d '\243' < inputfile
Does work! Thanks to #devnull for that!
This might work for you (GNU sed):
sed 's/\o302\o243//g' file
N.B. if you are in doubt how sed sees a particular character use the l command and sed will display the character as it sees it (in this case in octal). e.g.
echo '£' | sed 'l'
\302\243$
The dollar sign following the octal codes marks the end of the line.
For long strings use l0, which prevents wrapping.
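Putting the inspection and the deletion together, a minimal sketch could look like this. It assumes the data is UTF-8, where £ is the two bytes \302\243; the sample line and variable names are made up. Running sed under LC_ALL=C keeps the matching byte-oriented, and only POSIX features are used, so it should not depend on GNU sed.

```shell
# Build the UTF-8 encoding of the pound sign from octal escapes.
pound=$(printf '\302\243')

# Ask sed how it sees the character: "l" prints non-printable bytes in octal.
seen=$(printf '%s\n' "$pound" | LC_ALL=C sed -n 'l')

# Delete every occurrence of the two-byte sequence from a sample line.
cleaned=$(printf 'price: %s10\n' "$pound" | LC_ALL=C sed "s/$pound//g")

printf '%s\n' "$seen"      # \302\243$
printf '%s\n' "$cleaned"   # price: 10
```

If the l output for the real file shows only \243, the file is probably Latin-1 rather than UTF-8, which would explain why a UTF-8 £ typed in the terminal never matched.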
Did you try:
sed "s/\xA3//" inputfile
Alternatively, you can use tr:
tr -d '\243' < inputfile
Use the hex code for the pound sign.
It seems that sed behaves very differently across versions and platforms. I just installed the GNU version and it works fine.
Before using sed, one should read the section "When should I NOT use sed"

Resources