sed branching not working on OSX: undefined label - macos

I'm trying to adapt the answer from https://stackoverflow.com/a/66365284/1236401 that adds control flow to provide match status code:
cat file.txt | sed 's/1/replaced-it/;tx;q1;:x'
It works as expected on Ubuntu and Alpine, but fails on Mac OSX (11.6), using any shell.
sed: 1: "s/1/replaced-it/;tx;q1;:x": undefined label 'x;q1;:x'
All references I could find to sed misbehaving on OSX were for in-place file edit, which is not the case here.

Commands in sed are separated primarily by newlines.
| sed 's/1/replaced-it/
tx
q1
:x
'
Alternatively:
sed -e 's/1/replaced-it/' -e 'tx' -e 'q1' -e ':x'
Additionally q1 is a GNU sed extension - it's not supported in every sed. It has to be removed, refactored, or you have to install GNU sed.
Overall, write it in awk, python or perl.

Here is an Awk refactoring.
awk '/1/ { sub("1", "replaced-it", $0); replaced=1 } 1
END { exit 1-replaced }' file.txt
Notice also how the cat is useless (with sed too, and generally any standard Unix command except annoyingly tr).

An AWK equivalent would be:
awk '!sub(/1/,"replaced-it") {print; exit 1} 1' file.txt
If a substitution is not made successfully: print and exit with a status of 1; otherwise print.
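A calling script might consume that exit status as sketched below. This is only an illustration: the helper name `replace_first` and the sample files are hypothetical, and unlike the `q1` version it keeps printing every line instead of stopping at the first non-matching one. It uses only POSIX awk, so it should behave the same on macOS and Linux.

```shell
#!/bin/sh
# Hypothetical helper: print every line with the first "1" replaced and
# return non-zero if no line contained a match.
replace_first() {
    awk '{ if (sub(/1/, "replaced-it")) found = 1; print }
         END { exit !found }' "$1"
}

tmpdir=$(mktemp -d)
printf 'a1\nb2\n' > "$tmpdir/hit.txt"    # contains a "1"
printf 'a\nb\n'   > "$tmpdir/miss.txt"   # contains no "1"

# Capture both the output and the status of each call.
hit_status=0
hit_out=$(replace_first "$tmpdir/hit.txt") || hit_status=$?
miss_status=0
miss_out=$(replace_first "$tmpdir/miss.txt") || miss_status=$?

echo "hit:  status=$hit_status"
echo "miss: status=$miss_status"
rm -rf "$tmpdir"
```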

Related

"sed" doesn't match pattern

I'm trying to format the output of cut and paste, but sed is not working.
file.txt
Apple
Banana
Apple
Banana
Orange
Apple
Orange
code.sh
cut -f2 file.txt | sort | uniq | sed 's/^\|$/#/g'| paste -sd,\& -
expected output / output on ubuntu
#Apple#,#Banana#&#Orange#
getting output / output on macos
Apple,Banana&Orange
Note: The code works on Ubuntu, but on MacOS it doesn't.
This can be done in a single gnu-awk:
awk '!seen[$1]++{} END {
PROCINFO["sorted_in"]="@ind_str_asc"
for (i in seen)
s = s (s == "" ? "" : (++j==1?",":"&")) "#" i "#"
print s
}' file
#Apple#,#Banana#&#Orange#
On OSX I have GNU awk installed via Homebrew.
As mentioned elsewhere, BSD sed doesn't support \|. Instead of replacing ^ and $, you can substitute # around the whole line.
sort -u file.txt | sed 's/.*/#&#/' | paste -sd,'&' -
As far as I know, BSD/Mac sed doesn't support \|. See sed not giving me correct substitute operation for newline with Mac - differences between GNU sed and BSD / OSX sed for details.
As an alternative, you can use ERE instead of BRE. I checked this on Linux; apparently it still doesn't work on Mac (see also: MacOS sed: match either beginning or end).
$ echo 'Apple' | sed -E 's/^|$/#/g'
#Apple#
# workaround for Mac
$ echo 'Apple' | sed -e 's/^/#/' -e 's/$/#/'
#Apple#
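Dropped into the question's pipeline, the two-expression workaround might look like the sketch below. The sample data is the fruit list from the question, fed through printf rather than a file, and only options common to GNU and BSD tools are used.

```shell
# Sample data taken from the question.
fruits=$(printf '%s\n' Apple Banana Apple Banana Orange Apple Orange)

# Two simple substitutions replace the alternation ^\|$,
# which BSD sed does not support.
joined=$(printf '%s\n' "$fruits" |
    sort -u |
    sed -e 's/^/#/' -e 's/$/#/' |
    paste -sd ',&' -)

echo "$joined"
```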
Instead of sort+uniq+sed, you can also use awk (but note that awk solution shown here removes duplicates while preserving original order, doesn't sort the input):
$ awk '!seen[$0]++{print "#" $0 "#"}' ip.txt
#Apple#
#Banana#
#Orange#
Change $0 to $2 if you want only the second field, based on your use of cut
A simple way to do it using the sed command:
sed -E 's/[[:alnum:]]+/#&#/'
-E enables POSIX ERE (extended regular expressions).
[[:alnum:]]+ matches one or more alphanumeric characters; in ASCII, [[:alnum:]] is equivalent to [A-Za-z0-9].
& refers to the text matched by the pattern, which is then surrounded with #.

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
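If neither form of -i is acceptable, one way to sidestep the difference is to write to a temporary file and copy it back. The sketch below assumes a hypothetical helper named `delete_lines` and a pattern that contains no `/`; it is not taken from any of the answers here.

```shell
# Hypothetical helper: delete lines matching a regex without relying on -i.
# Usage: delete_lines 'pattern' file
delete_lines() {
    tmp=$(mktemp) || return 1
    # Write the filtered copy first, then overwrite the original's contents.
    # Using cat > file (rather than mv) keeps the original's permissions.
    sed "/$1/d" "$2" > "$tmp" && cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return "$status"
}

demo=$(mktemp)
printf 'keep 1\ndrop me\nkeep 2\n' > "$demo"
delete_lines 'drop' "$demo"
result=$(cat "$demo")
rm -f "$demo"
echo "$result"
```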
There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes the given Ex command (see man ex), the same as -c, which here executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor. (BashFAQ)
Using it to edit files in place means unportable code, I/O overhead and some other bad side effects: some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" $file
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.
You can also use this:
grep -v 'pattern' filename
Here -v will print only other than your pattern (that means invert match).
To get an in-place-like result with grep, you can do this:
echo "$(grep -v "pattern" filename)" >filename
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete the matching lines from all files that contain the pattern:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many SQL files with this sed command.
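A small sketch of how that range address behaves; the SQL text is made-up sample data. Note that both boundary lines are deleted along with everything between them.

```shell
# Made-up sample input with one procedure between two statements.
sql=$(printf '%s\n' \
    'SELECT 1;' \
    'CREATE PROCEDURE demo()' \
    'BEGIN' \
    '  SELECT 2;' \
    'END ;' \
    'SELECT 3;')

# The range starts at the first address match and ends at the next
# line matching the second address, inclusive.
remaining=$(printf '%s\n' "$sql" | sed '/CREATE PROCEDURE.*/,/END ;/d')
echo "$remaining"
```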
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text to an existing file
cat filename | sed '/text to remove/d' >> newfile
to treat already treated text, in this case remove more lines of what has been removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to replace a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
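If the goal is only to drop lines containing a literal string, a different technique avoids the escaping step entirely: grep with -F, which treats the pattern as a fixed string. The sketch below uses grep rather than sed, and the sample string and data are assumptions for illustration.

```shell
# Assumed sample: the string is full of regex metacharacters.
string='a.b*[c]'
data=$(mktemp)
printf '%s\n' 'axb' 'a.b*[c] here' 'last' > "$data"

# -F: fixed string, -v: invert the match,
# --: guard against strings that begin with a dash.
filtered=$(grep -F -v -- "$string" "$data")
rm -f "$data"
echo "$filtered"
```

Only the line containing the literal string is removed; `axb` survives even though it would match `a.b` as a regex.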
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution is for doing the same operation on multiple files.
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done
I found most of the answers not useful for me, If you use vim I found this very easy and straightforward:
:g/<pattern>/d
Source

Delete all "\n" occurrences with sed

I would like to delete all "\n" (quotes, new line, quotes) in a text file.
I have tried:
sed 's/"\n"//g' < in > out
and also sed '/"\n"/d' < in > out but none of those sed commands worked.
What am I doing wrong?
This works with GNU sed on Linux: I don't have a Mac to test with.
sed '
# this reads the whole file into pattern space
:a; N; $ bb; ba; :b
# *now* make the replacement
s/"\n"//g
' <<END
one
two"
"three
four"
five
"six
END
one
twothree
four"
five
"six
This perl command accomplishes the same thing:
perl -0777 -pe 's/"\n"//g'
This awk-oneliner works here, you can give it a try:
awk -F'"\n"' -v RS='\0' -v ORS="" '{$1=$1;print}' file
a small test: tested with gawk
kent$ cat f
foo"
"bar"
"bla"
new line should be kept
this too
kent$ awk -F'"\n"' -v RS='\0' -v ORS="" '{$1=$1;print}' f
foo bar bla"
new line should be kept
this too
If you don't want to have the space between foo and bar blah .., add -v OFS="" to awk
Try this -- you need to escape the backslash to make it literal.
sed 's/"\\n"//g' < in > out
Verified on OSX.
The accepted answer was marked as such because of the Perl command it contains.
The sed command doesn't actually work on OSX, because it uses features specific to GNU sed, whereas OSX uses BSD sed.
An equivalent answer requires only a few tweaks - note that this will work with both BSD and GNU sed:
Using multiple -e options:
sed -e ':a' -e '$!{N;ba' -e '}; s/"\n"//g' < in > out
Or, using an ANSI C-quoted string in Bash:
sed $':a\n$!{N;ba\n}; s/"\\n"//g' < in > out
Or, using a multi-line string literal:
sed ':a
$!{N;ba
}; s/"\n"//g' < in > out
BSD sed requires labels (e.g., :a) and branching commands (e.g., b) to be terminated with an actual newline (whereas in GNU sed a ; suffices), or, alternatively, for the script to be broken into multiple -e options, with each part ending where a newline is required.
For a detailed discussion of the differences between GNU and BSD sed, see https://stackoverflow.com/a/24276470/45375
$':a\n$!{N;ba\n}' is a common sed idiom for reading all input lines into the so-called pattern space (buffer on which (subsequent) commands operate):
:a is a label that can be branched to
$! matches every line but the last
{N;ba\n} keeps building the buffer by adding the current line (N) to it, then branching back to label :a to repeat the cycle.
Once the last line is reached, no branching is performed, and the buffer at that point contains all input lines, at which point the desired substitution (s/"\n"//g) is performed on the entire buffer.
As for why the OP's approach didn't work:
sed reads files line by line by default, so by default it can only operate on one line at a time.
In order to be able to replace newline chars. - i.e., to operate across multiple lines - you must explicitly read multiple/all lines first, as above.
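As a concrete check, the sketch below runs the multi-line form of the idiom against the sample input used earlier on this page. It sticks to newline-terminated labels and branches so that it should work with both GNU and BSD sed.

```shell
# Read all lines into the pattern space, then remove each "\n" sequence
# (closing quote, newline, opening quote).
joined=$(printf '%s\n' 'one' 'two"' '"three' 'four"' 'five' '"six' |
    sed ':a
$!{N;ba
}
s/"\n"//g')

echo "$joined"
```

Only `two"` and `"three` are joined, because that is the only place where a closing quote, a newline, and an opening quote appear in sequence.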
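Instead of sed you could also use tr; I tested it and it worked for me. Be aware that tr deletes individual characters rather than a sequence, so this removes every double quote, backslash, and letter n in the input, not only the "\n" sequence.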
tr -d '"\\n"' < input.txt > output.txt

Bash - remove all lines beginning with 'P'

I have a text file that's about 300KB in size. I want to remove all lines from this file that begin with the letter "P". This is what I've been using:
> cat file.txt | egrep -v P*
That isn't outputting to console. I can use cat on the file without any other commands and it prints out fine. My final intention is:
> cat file.txt | egrep -v P* > new.txt
No error appears, it just doesn't print anything out and if I run the 2nd command, new.txt is empty.
I should say I'm running Windows 7 with Cygwin installed.
Explanation
use ^ to anchor your pattern to the beginning of the line;
delete lines matching the pattern using sed and the d command.
Solution #1
cat file.txt | sed '/^P/d'
Better solution
Use sed-only:
sed '/^P/d' file.txt > new.txt
With awk:
awk '!/^P/' file.txt
Explanation
The condition starts with an ! (negation), which negates the following pattern;
/^P/ means "match all lines starting with a capital P",
So, the pattern is negated to "ignore lines starting with a capital P".
Finally, it leverages awk's default behavior when the { … } action block is missing, which is to print the records that satisfy the condition.
So, to rephrase, it ignores lines starting with a capital P and prints everything else.
Note
sed is line oriented and awk column oriented. For your case you should use the first one, see Edouard Lopez's response.
Use sed with in-place substitution (for GNU sed; this will also work on your Cygwin)
sed -i '/^P/d' file.txt
BSD (Mac) sed
sed -i '' '/^P/d' file.txt
Use start of line mark and quotes:
cat file.txt | egrep -v '^P.*'
P* means P zero or more times so together with -v gives you no lines
^P.* means start of line, then P, and any char zero or more times
Quoting is needed to prevent shell expansion.
This can be shortened to
egrep -v ^P file.txt
because .* is not needed, therefore quoting is not needed and egrep can read data from file.
As we don't use extended regular expressions grep will also work fine
grep -v ^P file.txt
Finally
grep -v ^P file.txt > new.txt
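The difference between the unanchored and anchored patterns can be seen in this small sketch; the four sample lines are made up. Leaving `P*` unquoted adds a second hazard not shown here: the shell may expand it to matching file names before grep ever sees it.

```shell
sample=$(printf '%s\n' 'Pear' 'Apple' 'Plum' 'Fig')

# P* matches zero or more P's, i.e. every line, so -v leaves nothing.
# grep exits with status 1 when it selects no lines, hence the || true.
none=$(printf '%s\n' "$sample" | grep -v 'P*' || true)

# Anchoring with ^ removes only the lines that start with P.
kept=$(printf '%s\n' "$sample" | grep -v '^P')

echo "unanchored: [$none]"
echo "anchored:"
echo "$kept"
```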
This works:
cat file.txt | egrep -v -e '^P'
-e indicates expression.

awk and cat - How to ignore multiple lines?

I need to extract the VoIP log from a D-Link router, so I've set up a little Python script that executes a command on this router via telnet.
My script does a "cat /var/log/calls.log" and returns the result, however...
it also sends non-important stuff, like the BusyBox banner, etc...
How can I ignore lines from 1 to 6 and the last 2 ?
This is my current output:
yaba#foobar:/stuff$ python calls.py
BusyBox v1.00 (2009.04.09-11:17+0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.
DVA-G3170i/PT # cat /var/call.log
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
DVA-G3170i/PT # exit
And I just need:
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
(it can have multiple lines)
So that I can save it to a CSV and later to a sql db.
Thanks, and sorry for my bad English.
Why not use a pattern in AWK to match the text you want?
python calls.py | awk '/^[0-9]/{print}'
The whole POINT of AWK is matching lines based on patterns and manipulating/printing those matched lines.
Edited to add example run.
Here's a junk data file based on your sample above.
$ cat junk.dat
BusyBox v1.00 (2009.04.09-11:17+0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.
DVA-G3170i/PT # cat /var/call.log
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
DVA-G3170i/PT # exit
Here's running it through AWK with a filter.
$ cat junk.dat | awk '/^[0-9]/ {print}'
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
No need for SED, no need for counting lines, no need for anything but AWK. Why make things more complicated than they need to be?
In one call to sed:
sed -n '1,6d;7,${N;$q;P;D}'
or for picky versions of sed:
sed -ne '1,6d' -e '7,${N' -e '$q' -e 'P' -e 'D}'
You could also do it based on matches:
sed -n '/^[0-9]/p'
or something similar.
But why doesn't your Python script read the file and do the filtering (instead of calling an external utility)?
python calls.py | sed -e 1,6d -e '$d'
So that might work. It will filter out the first 6 and the last, which is what your example indicates you need. If you really want to clobber the last two lines then you could do:
python calls.py | sed -e 1,6d -e '$d' | sed -e '$d'
But wait ... you said awk, so...
python calls.py | awk '{ if(NR > 7) { print t }; t = $0 }'
This might work for you:
sed '1,6d;$!N;$d;P;D' file
I'm not sure this is the best way to do it (maybe D-Link router has FTP or SSH support) but you can do it with awk:
awk '/cat/, /exit/' | sed -e '1d' -e '$d'
awk will print everything between lines containing "cat" and "exit", unfortunately including these two lines. That's what the remaining commands are for, I couldn't figure out how to do it nicer than that...
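One possible way to drop the two boundary lines inside awk itself is a flag variable instead of a range pattern. The sketch below uses the sample output from the question; the variable name `inside` and the prompt patterns `# cat ` and `# exit` are assumptions based on that sample.

```shell
log=$(printf '%s\n' \
    'BusyBox v1.00 (2009.04.09-11:17+0000) Built-in shell (msh)' \
    "Enter 'help' for a list of built-in commands." \
    'DVA-G3170i/PT # cat /var/call.log' \
    '1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1' \
    'DVA-G3170i/PT # exit')

# Rule order matters: the flag is cleared before printing and set after,
# so neither prompt line is printed.
calls=$(printf '%s\n' "$log" | awk '
    /# exit/ { inside = 0 }   # stop before the exit prompt line
    inside                    # print lines while the flag is set
    /# cat /  { inside = 1 }  # start after the cat prompt line
')

echo "$calls"
```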

Resources