sed: c command, change only the first line, text deleted - macos

I'm on my second day ever of shell scripting and I've stumbled into this problem: I want to change an entire line of code, which I identify by one word only, and I would like to do that only for the first occurrence.
I'm using sed and the c command, something that looks like this:
Text in file called "prova":
Apple is red
Apple is green
Banana
Tangerine
sed bit of code:
sed -i.bak '1,/Apple/c\
Apricot
' prova
(I'm using Mac OSX)
Strangely enough, and in agreement with what is reported by these guys, when I do this I get the following output for the prova file:
Apricot
Banana
Tangerine
One "Apple" is gone! Is there a way around this? Please, be patient, I'm a beginner...
Thanks in advance!

Try
sed '1cApricot' prova
With 1,/Apple/, you define a range, starting from line 1 and ending at the first occurrence of Apple after line 1. What you want is not a range, though, just a single line. This can be achieved by only using 1 (instead of e.g. 1,2).
The above command works for me, but it depends on the sed version (the one-line form of c is a GNU extension). If it doesn't work, try
sed '1c\
Apricot' prova
With the 1 you tell sed to change the first line.
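To see which lines each address selects, one quick check is to print the addressed lines instead of changing them. This is only a sketch: it recreates the question's sample file in a scratch directory, and the variable names are made up for the illustration.

```shell
# Recreate the sample file from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Apple is red' 'Apple is green' 'Banana' 'Tangerine' > "$tmp/prova"

# -n suppresses automatic printing; p prints only the addressed lines.
range_lines=$(sed -n '1,/Apple/p' "$tmp/prova")   # line 1 through the next Apple line
single_line=$(sed -n '1p' "$tmp/prova")           # line 1 only

printf 'range 1,/Apple/:\n%s\n\nsingle address 1:\n%s\n' "$range_lines" "$single_line"
rm -r "$tmp"
```

The range selects both Apple lines, which is why c replaced the two of them with a single Apricot.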
If you don't necessarily want to change line 1, but the first occurrence of Apple, you can do
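sed '0,/Apple/s/.*Apple.*/Apricot/' prova
I used the substitute command s here (frankly, I never use c), and it's applied only to the range from line 0 to the first occurrence of Apple. The 0 address is a GNU sed extension; unlike 1,/Apple/, it lets the range end on line 1 if Apple is already found there. If a line in the range contains Apple, the whole line is replaced with Apricot.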
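On a sed that rejects the 0 address, a sketch that avoids it: on the first line containing Apple, replace the line, then loop over n so that the remaining lines pass through without being tested again. The sample file here is a variation of the question's (the first line does not match), and the variable names are made up for the illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Tangerine' 'Apple is red' 'Apple is green' 'Banana' > "$tmp/prova"

# /Apple/{ ... }  runs only when a line contains Apple.
# s/.*/Apricot/   replaces that whole line.
# :a; n; ba       prints the current line, reads the next one and repeats,
#                 so later Apple lines never reach the substitution.
result=$(sed -e '/Apple/{' -e 's/.*/Apricot/' -e ':a' -e 'n' -e 'ba' -e '}' "$tmp/prova")

printf '%s\n' "$result"
rm -r "$tmp"
```

Each command is passed as its own -e fragment because some sed versions are strict about what may follow a label or a closing brace on the same line.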

sed 's/PatternToFind/PatternToReplaceWith/option'
So if you know the word to find, use it in the first part after the s/ (PatternToFind). This is a basic regular expression, so be careful with characters like *, ., [ and \ (they should be escaped with a preceding \); alphanumeric characters match literally.
The whole matched text is replaced with PatternToReplaceWith (here only a few characters, such as \ and &, are special and should be escaped with \).
You can also chain several substitutions, separated by a newline or by ;
sed 's/Apple/Pie/;s/Banana/Split/;s/Ice/Cream/g' YourFile
Note the trailing g on the last substitution: it means every occurrence on the line.
To replace only the first occurrence in the file, you need to load the whole file into a buffer first: append each line to the hold buffer, then at the last line swap the hold buffer into the working buffer and make the substitutions.
sed -n '1h;1!H
$ {x
s/Apple/Pie/;s/Banana/Split/;s/Ice/Cream/
p
}' YourFile
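The difference between a per-line substitution and one applied to the whole file held in a single buffer can be seen in a small sketch. The sample input and variable names are made up; -n and p are used so that only the final buffer is printed.

```shell
input=$(printf '%s\n' 'Apple' 'Banana' 'Apple')

# Per line: the first Apple on every line is replaced.
per_line=$(printf '%s\n' "$input" | sed 's/Apple/Pie/')

# Whole file: collect all lines in the hold space, then on the last line
# swap it into the pattern space, substitute once and print.
whole_file=$(printf '%s\n' "$input" | sed -n '1h;1!H
${
x
s/Apple/Pie/
p
}')

printf 'per line:\n%s\n\nwhole file:\n%s\n' "$per_line" "$whole_file"
```

Only the whole-file variant leaves the second Apple untouched.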

Related

Using shell scripts to remove all commas except for the first on each line

I have a text file consisting of lines which all begin with a numerical code, followed by one or several words, a comma, and then a list of words separated by commas. I need to delete all commas in every line apart from the first comma. For example:
1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary
which should be changed to
1.2.3 Example question, a question that is hopefully not too rudimentary
I have tried using sed and shell scripts to solve this, and I can figure out how to delete the first comma on each line (1) and how to delete all commas (2), but not how to delete only the commas after the first comma on each line.
(1)
while read -r line
do
echo "${line/,/}"
done <"filename.txt" > newfile.txt
mv newfile.txt filename.txt
(2)
sed 's/,//g' filename.txt > newfile.txt
You need to capture the first comma, and then remove the others. One option is to change the first comma into some otherwise unused character (Control-A for example), then remove the remaining commas, and finally replace the replacement character with a comma:
sed -e $'s/,/\001/; s/,//g; s/\001/,/'
(using Bash ANSI C quoting — the \001 maps to Control-A).
An alternative mechanism uses sed's labels and branches, as illustrated by Wiktor Stribiżew's answer.
If using GNU sed, you can specify a number in the flags of sed's s/// command along with g to indicate which match to start replacing at:
$ sed 's/,//2g' <<<'1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary'
1.2.3 Example question, a question that is hopefully not too rudimentary
Its manual says:
Note: the POSIX standard does not specify what should happen when you mix the g and NUMBER modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the NUMBERth, and then match and replace all matches from the NUMBERth on.
so if you're using a different sed, your mileage may vary. (OpenBSD and NetBSD seds raise an error instead, for example).
You can use
sed ':a; s/^\([^,]*,[^,]*\),/\1/;ta' filename.txt > newfile.txt
Details
:a - sets an a label
s/^\([^,]*,[^,]*\),/\1/ - finds 0+ non-commas at the start of string, a comma and again 0+ non-commas, capturing this substring into Group 1, and then just matching a , and replacing the match with the contents of Group 1 (removes the non-first comma)
ta - upon a successful replacement, jumps back to the a label location.
See an online sed demo:
s='1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary'
sed ':a; s/^\([^,]*,[^,]*\),/\1/;ta' <<< "$s"
# => 1.2.3 Example question, a question that is hopefully not too rudimentary
awk 'NF>1 {$1=$1","} 1' FS=, OFS= filename.txt
sed ':a;s/,//2;t a' filename.txt
sed 's/,/\
/;s/,//g;y/\n/,/' filename.txt
This might work for you (GNU sed):
sed 's/,/&\n/;h;s/,//g;H;g;s/\n.*\n//' file
Append a newline to the first comma.
Copy the current line to the hold space.
Remove all commas in the current line.
Append the current line to the hold space.
Replace the current line with the contents of the hold space.
Remove everything between the introduced newlines.

Sed substitution places characters after back reference at beginning of line

I have a text file that I am trying to convert to a Latex file for printing. One of the first steps is to go through and change lines that look like:
Book 01 Introduction
To look like:
\chapter{Introduction}
To this end, I have devised a very simple sed script:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'
This does the job, except, the closing curly bracket is placed where the initial backslash should be in the substituted output. Like so:
}chapter{Introduction
Any ideas as to why this is the case?
Your call to sed is fine; the problem is that your file uses DOS line endings (CRLF). sed does not treat the CR as part of the line ending but as just another character on the line. The string Introduction\r is captured, and the result \chapter{Introduction\r} is displayed as follows. First, everything up to the carriage return is printed (the ^ represents the cursor position):
\chapter{Introduction
                     ^
then moving the cursor to the beginning of the line
\chapter{Introduction
^
then printing the rest of the result (}) over what has already been printed
}chapter{Introduction
 ^
The solution is to either fix the file to use standard POSIX line endings (linefeed only), or to modify your regular expression to not capture the carriage return at the end of the line.
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([^\r]*\)\r\?$/\\chapter{\1}/p'
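For the first option, a minimal sketch, assuming tr is available: strip the carriage returns before sed sees the text. The file name book.txt is made up for the example, and a plain space followed by * replaces \s so the expression does not rely on GNU extensions.

```shell
tmp=$(mktemp -d)
# Simulate a file saved with DOS (CRLF) line endings.
printf 'Book 01 Introduction\r\n' > "$tmp/book.txt"

# tr -d '\r' deletes every carriage return, so the capture group
# can no longer pick one up at the end of the line.
result=$(tr -d '\r' < "$tmp/book.txt" |
  sed -n 's/Book [[:digit:]]\{2\} *\(.*\)/\\chapter{\1}/p')

printf '%s\n' "$result"
rm -r "$tmp"
```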
As an alternative to sed, awk using gsub might work well in this situation:
awk '{gsub(/Book [0-9]+/,"\\chapter"); print $1"{"$2"}"}'
Result:
\chapter{Introduction}
A solution is to modify the capture group. In this case, since all book chapter names consist only of alphabetic characters I was able to use [[:alpha:]]*. This gave a revised sed script of:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([[:alpha:]]*\)/\\chapter{\1}/p'

use sed to merge lines and add comma

I found several related questions, but none of them fits what I need, and since I am a real beginner, I can't figure it out.
I have a text file with entries like this, separated by a blank line:
example entry &with/ special characters
next line (any characters)
next %*entry
more words
I would like to merge the lines, put a comma between them, and delete the empty lines. I.e., the output for the example should look like this:
example entry &with/ special characters, next line (any characters)
next %*entry, more words
I would prefer sed, because I know it a little bit, but am also happy about any other solution on the linux command line.
Improved per Kent's elegant suggestion:
awk 'BEGIN{RS="";FS="\n";OFS=","}{$1=$1}7' file
which allows any number of lines per block, rather than the 2 rigid lines per block I had. Thank you, Kent. Note: The 7 is Kent's trademark... any non-zero expression will cause awk to print the entire record, and he likes 7.
You can do this with awk:
awk 'BEGIN{RS="";FS="\n";OFS=","}{print $1,$2}' file
That sets the record separator to blank lines, the field separator to newlines and the output field separator to a comma.
Output:
example entry &with/ special characters,next line (any characters)
next %*entry,more words
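The output above has no space after the comma, while the question asks for one. A sketch of the same awk approach with OFS set to a comma followed by a space (the file name entries.txt and the variable names are made up for the example):

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'example entry &with/ special characters' \
  'next line (any characters)' \
  '' \
  'next %*entry' \
  'more words' > "$tmp/entries.txt"

# RS=""    : records are blocks separated by blank lines
# FS="\n"  : each line of a block is one field
# $1 = $1  : forces awk to rebuild the record using OFS
merged=$(awk 'BEGIN { RS = ""; FS = "\n"; OFS = ", " } { $1 = $1; print }' "$tmp/entries.txt")

printf '%s\n' "$merged"
rm -r "$tmp"
```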
Simple sed command,
sed ':a;N;$!ba;s/\n/, /g;s/, , /\n/g' file
:a;N;$!ba;s/\n/, /g -> According to this answer, this code replaces all the newlines with ", " (comma and space).
So after running only the first command, the output would be
example entry &with/ special characters, next line (any characters), , next %*entry, more words
s/, , /\n/g -> Replacing ", , " with a newline in the above output will give you the desired result.
example entry &with/ special characters, next line (any characters)
next %*entry, more words
This might work for you (GNU sed):
sed ':a;$!N;/.\n./s/\n/, /;ta;/^[^\n]/P;D' file
Append the next line to the current line and if there are characters either side of the newline substitute the newline with a comma and a space and then repeat. Eventually an empty line or the end-of-file will be reached, then only print the next line if it is not empty.
Another version, a little more sophisticated (allowing for white space in the empty line), would be:
sed ':a;$!N;/^\s*$/M!s/\n/, /;ta;/\`\s*$/M!P;D' file
sed -n '1h;1!H
$ {x
s/\([^[:cntrl:]]\)\n\([^[:cntrl:]]\)/\1, \2/g
s/\(\n\)\n\{1,\}/\1/g
p
}' YourFile
This makes all the changes after loading the whole file into the buffer. It could also be done "on the fly" while reading the file, depending on whether the current line is empty.
use -e on GNU sed

Empty regular expression in sed script

I found the following sed script to reverse the characters in each line, from the famous "sed one liners", and I am not able to follow the //D command in the script
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
Suppose the initial file had two lines to start with, say,
apple
banana
After the first command,
/\n/!G
pattern space would be,
apple
banana
[A newline is introduced after each line. The code formatting removes the trailing newline here, so it is not shown.]
After the second command,
s/\(.\)\(.*\n\)/&\2\1/
pattern space would be,
apple
pple
a
banana
anana
b
How does the third command work after this? Also, I understand that the empty regular expression (//) matches the previously matched regexp. But in this case, which one will that be: \n from the 1st command or the regexp used by the 2nd command? Any help would be much appreciated. Thanks.
Using the suggestion from my own comment above
this is what happens:
After /\n/!G pattern space would be
apple¶
banana¶
After s/\(.\)\(.*\n\)/&\2\1/ pattern space would be
apple¶pple¶a
banana¶anana¶b
then comes the D command. from man sed:
D Delete up to the first embedded newline in the pattern space.
Start next cycle, but skip reading from the input if there is
still data in the pattern space.
so the first word and the first ¶ are deleted. Then sed starts again from the
1st command, but since the pattern space contains a ¶, the pattern /\n/
matches, and because the address is negated with !, the G command is not executed.
The 2nd command leads to
pple¶ple¶pa
anana¶nana¶ab
can you continue from there?
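For a shorter input the remaining cycles can be written out in full. The sketch below runs the one-liner on the two-letter line ab (a made-up input) followed by apple; the comments show the pattern space for ab after each step, with ¶ standing for a newline as above. The variable name is only for illustration.

```shell
# Pattern space for the input line "ab":
#   /\n/!G    no newline yet, so G appends one        -> ab¶
#   s/.../    \1=a, \2=b¶                             -> ab¶b¶a
#   //D       the s regex matches: delete up to the
#             first newline and restart the script    -> b¶a
#   /\n/!G    a newline is present, so G is skipped
#   s/.../    \1=b, \2=¶                              -> b¶¶ba
#   //D       matches again: delete and restart       -> ¶ba
#   s/.../    no character is followed by a newline, so no match
#   //D       same regex, no match, so D is skipped
#   s/.//     removes the leading newline             -> ba
reversed=$(printf '%s\n' 'ab' 'apple' |
  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//')

printf '%s\n' "$reversed"
```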
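D means: delete the first line (up to the first \n) and restart the cycle, without reading new input, if there is still something in the buffer.
// is a shortcut for the previous pattern (it reuses the last regular expression that was applied); in the script above, that is the regex of the s command.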
$ echo "123" | sed -n 's/2/other/;// p'
$
Nothing is printed: // stands for /2/, and the substitution has already replaced the 2 in the pattern space.
$ echo "123" | sed -n 's/.2/&still/;// p'
12still3
$
Here .2 still matches when // p runs, because // is equivalent to /.2/ and the replacement keeps the matched text (&).

How to use sed to find and replace items, only if the match is not bound by letters or numbers?

I am using sed to find and replace items, e.g.:
sed -i 's/fish/bear/g' ./file.txt
I want to limit this to only change items which do not have a letter or number before or after, e.g.:
The fish ate the worm. would change, because only spaces are before and after.
The lionfish ate the worm. would not change, because there is a letter before fish.
How can I find and replace some items, but not if at least one letter or number appears immediately before or after?
Use word boundary escapes:
sed -i 's/\<fish\>/bear/g' inputfile
Some versions of sed may not support this.
Use a negative character class before and after fish, like so: \(^\|[^[:alnum:]]\)fish\($\|[^[:alnum:]]\). This says:
Start of line or anything that's not alphanumeric
Followed by fish
Followed by end of line or anything that's not alphanumeric
This guarantees that the characters immediately preceding and immediately following fish are not alphanumeric.
sed 's/\(^\|[^[:alnum:]]\)fish\($\|[^[:alnum:]]\)/\1bear\2/g'
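The \| alternation used above is a GNU extension in basic regular expressions. A sketch of the same idea with extended regular expressions (the -E option, accepted by current GNU and BSD sed), wrapped in a hypothetical helper function replace_word so that it can be tried on the two sentences from the question:

```shell
replace_word() {
  # -E selects extended regular expressions, so ( | ) need no backslashes.
  # \1 and \2 put back the delimiter (or nothing, at line start/end).
  sed -E 's/(^|[^[:alnum:]])fish($|[^[:alnum:]])/\1bear\2/g'
}

standalone=$(printf '%s\n' 'The fish ate the worm.' | replace_word)
embedded=$(printf '%s\n' 'The lionfish ate the worm.' | replace_word)

printf '%s\n%s\n' "$standalone" "$embedded"
```

One caveat with either form: a separator consumed by one match cannot also begin the next, so in a line like "fish fish" only the first word is likely to be replaced in a single pass.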
Check the character in front of and behind the string. If it's at the beginning or end, there won't be a character to check so check that too.
sed -i 's/\(^\|[^[:alnum:]]\)fish\($\|[^[:alnum:]]\)/\1bear\2/g' ./file.txt

Resources