BASH: Find newlines in between text and replace with two newlines - bash

I am looking to programmatically edit the newlines of .txt files. The desired behavior is that any single newline in between lines of text will become two newlines.
edit (clarification by #kaan): Lines separated by one newline should be separated by two newlines. Any lines that are already separated by two or more lines should be left as is
edit (context): I am working with the .fountain syntax and an npm module called afterwriting that exports text files into a script format as a pdf. lines of text separated by only one new line do not properly space when printed to pdf using the package. So i want to automatically convert single newlines into double, because i also don't want to have to add two new lines in all of the files i am converting
For instance an example of an input would look like:
File with text in it
A new line
Another new line
Line with three new lines above
One last new line
would become
File with text in it
A new line
Another new line
Line with three new lines above
One last new line
Any ideas of how this could be achieved in a bash script would be appreciated

This might work for you (GNU sed):
sed '/\S/b;N;//{P;b};:a;n;//!ba' file
This solution appends another line to the first empty line encountered. If the appended line is not empty it prints the first and bails out, thus doubling the empty line. Otherwise if the appended line is empty, it print them both and then prints any further empty lines until it encounters a non-empty line.

Here is a way to do it using sed:
read the whole file (since normal sed behavior will remove all newlines)
look for a word boundary (\b) followed by two newlines (\n\n – one for ending the current line, then one that's the single blank line), then one more word boundary (\b)
for any matches, add one extra newline in there.
With your sample text inside data.txt, it looks like this:
sed -n 'H; ${x; s/\b\n\n\b/\n\n\n/g; p}' < data.txt | tail -n +2
(Edit: added | tail -n +2 to remove the extra newline that's inserted at the beginning)

Related

How to refresh the line numbers in sed

How can you refresh the line numbers of a sed output inside the same sed command?
I have a sed script as follows -
#!/usr/bin/sed -f
/pattern/i #inserting a line
1~10i ####
What this does is that it inserts lines wherever the pattern is matched and then inserts #### every ten lines. The problem is that it inserts the hashes every 10 lines according to the line numbers of the original file before inserting the lines for the matching pattern. I want to refresh the line numbers after inserting the lines and use them for inserting the 4 hashes every 10 lines.
Anyway this can be done without piping the output into a new sed?
Interesting challenge. If your file is not too large, the following may work for you (tested with GNU sed):
#!/usr/bin/sed -nEf
:a; N; $!ba
{
s/([^\n]*pattern[^\n]*\n)/#inserting a line\n\1/g
s/\n/ \n/g
s/\`/####\n/
:b
s/(.*####\n([^\n]* \n){9}[^\n]*) \n/\1\n####\n/
tb
s/ \n/\n/g
p
}
Explanations, line by line:
No print, extended RE mode (-nE).
Loop around label a to concatenate the whole file in the pattern space (reason why its size matters).
Add #inserting a line\n before each line containing pattern.
Add a space before all endline characters.
Insert ####\n before the first line.
Label b.
Append ####\n' to anything followed by ####\n` and 10 space-terminated lines, removing the final space (to prevent subsequent matches).
Goto b if there was a substitution.
Remove all spaces at the end of a line.
print.
Note: if your file does not contain NUL characters the -z option of GNU sed saves a few commands:
#!/usr/bin/sed -Ezf
s/([^\n]*pattern[^\n]*\n)/#inserting a line\n\1/g
s/\n/ \n/g
s/\`/####\n/
:a
s/(.*####\n([^\n]* \n){9}[^\n]*) \n/\1\n####\n/
ta
s/ \n/\n/g
Note: with the hold space we could probably do the same on the fly, instead of storing the whole file in the pattern space.
This might work for you (GNU sed):
sed -zE 's/.*pattern/# insert line\n&/mg
s/([^\n]*\n){10}/&####\n/g
s/^/####\n/' file
Slurp the file into memory.
Insert desired text before lines containing pattern.
Insert #### every 10 lines and before the first line.

SED's Substituted string is considered as one-line string, whereas it contains newline character

I am testing the sed command to substitute one line with 3 lines and, then, to delete the last line. (I could have substituted it with only the 2 first lines, but this is deliberately stated like this to showcase the main issue).
Let's say that I have the following text :
// ##OPTION_NAME: xxxx
I want to replace the token ##OPTION_NAME by ##OP-NAME and surround it by 2 new lines; Like so :
// ##OP-START
// ##OP-NAME: xxxx
// ##OP-END
To illustrate this, I put this text in a code.c file, and the sed commands in a sed script named script.sed.
Then, I call the following shell command :
Shell command
sed -f script.sed code.c
script.sed
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
The problem
Now, I add another sed command in script.sed to delete the line containing ##OP-END. Surprise ! all 3 lines are removed !
# Begin by replacing patterns by their equivalents, surrounding them with ##OP-START and ##OP-END lines
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g
# Last parse; delete ##OP-END
/##OP-END/d
I tried \r\n instead of \n in the sustitution command
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g, but it does not work.
I also tested on ##OP-START to see if it makes some difference,
but alas ! All 3 lines were removed too.
It seems that sed is considering it as one line !
This is not a surprise, d operates on the pattern space, not on a per line basis. After the modification with the s command, your pattern space contains 3 lines. The content of it matches the expression and gets therefore deleted.
To delete this line from the pattern space, you need to use the s command again:
s/\(.*\)##OPTION_NAME:\(.*\)/\1##OP-START\n\1##OP-NAME:\2\n\1##OP-END/g$
s/\n\/\/ ##OP-END//
About pattern and hold space: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html#tag_20_116_13

sed error unterminated substitute pattern for new line text

I am writing a script to add new dependencies to the watch list. I am putting a placeholder to know where to add the text, for eg
assets = [
"../../new_app/assets"
# [[NEW_APP_ADD_ASSETS]]
]
It is simple to replace just the place holder but my problem is to add comma in the previous line.
that can be done if I search and replace
"
# [[NEW_APP_ADD_ASSETS]]
ie "\n # [[NEW_APP_ADD_ASSETS]]
I am not able to search for the new line.
One of the solutions I found for adding a new line was
sed -i '' 's/newline/line one\
line two/' filename.txt
But when same way done for the search string it returns :unterminated substitute pattern
sed -i '' s/'assets\"\
#'/'some new text'/ filename.txt
PS: I writing on macos
Sed works on a line-by-line base, hence it becomes tricky to add the coma to the previous line as that line has already been processed. It is possible, but the sed syntax quickly becomes messy.
To be a bit more specific:
In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. Reading from input shall be skipped if a <newline> was in the pattern space prior to a D command ending the previous cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits. If no commands explicitly started a new cycle, then at the end of the script the pattern space shall be copied to standard output (except when -n is specified) and the pattern space shall be deleted. Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.
In short, if you do not manipulate the pattern space, you cannot process <newline> characters as they just do not appear!
And even shorter, if you only use the substitute command, sed only processes one line at a time!
This is also why you suffer from : unterminated substitute pattern. You are searching for a newline character, but as sed just reads one line at a time, it just does not find it and it also does not expect it. The error will vanish if you replace your newline with the symbols \n.
sed -i '' s/'assets\"\n #'/'some new text'/ filename.txt
A better way to achieve your goals would be to make use of awk. It is a bit more readable:
awk '/# [[NEW_APP_ADD_ASSETS]]/{ print t","; t="line1\nline2"; next }
{ print t; t=$0 }
END{ print t }' <file>

Sed line to remove unnecessary spaces is deleting line ends too & other strange new line issue

I have to deal with an output from a command. Just plain text. What I want is to remove unnecessary spaces by using
sed 's/ */ /g'
But it's doing it wrong - it seems to remove new line characters as well...
The other problem is about the $ character representing the 'new line'.
When I'm writing sth like this:
sed 's/$/FOO/g'
It is really replacing all the new line chars with the word FOO.
So its basically like:
text
text
text
Is converted to
textFOOtextFOOtext
The problem starts when 2 new lines occur. The text:
text
text
Is converted to:
textFOOFOOtext
BUT -- absolutely no SED line is able to convert the 2 new lines into one. I tried everything I found on the web.
How do I remove that additional new line?
Check if this helps:
tr -s ' ' < File
This will change multiple spaces to single space, if thats what u want

use sed to merge lines and add comma

I found several related questions, but none of them fits what I need, and since I am a real beginner, I can't figure it out.
I have a text file with entries like this, separated by a blank line:
example entry &with/ special characters
next line (any characters)
next %*entry
more words
I would like the output merge the lines, put a comma between, and delete empty lines. I.e., the example should look like this:
example entry &with/ special characters, next line (any characters)
next %*entry, more words
I would prefer sed, because I know it a little bit, but am also happy about any other solution on the linux command line.
Improved per Kent's elegant suggestion:
awk 'BEGIN{RS="";FS="\n";OFS=","}{$1=$1}7' file
which allows any number of lines per block, rather than the 2 rigid lines per block I had. Thank you, Kent. Note: The 7 is Kent's trademark... any non-zero expression will cause awk to print the entire record, and he likes 7.
You can do this with awk:
awk 'BEGIN{RS="";FS="\n";OFS=","}{print $1,$2}' file
That sets the record separator to blank lines, the field separator to newlines and the output field separator to a comma.
Output:
example entry &with/ special characters,next line (any characters)
next %*entry,more words
Simple sed command,
sed ':a;N;$!ba;s/\n/, /g;s/, , /\n/g' file
:a;N;$!ba;s/\n/, /g -> According to this answer, this code replaces all the new lines with ,(comma and space).
So After running only the first command, the output would be
example entry &with/ special characters, next line (any characters), , next %*entry, more words
s/, , /\n/g - > Replacing , , with new line in the above output will give you the desired result.
example entry &with/ special characters, next line (any characters)
next %*entry, more words
This might work for you (GNU sed):
sed ':a;$!N;/.\n./s/\n/, /;ta;/^[^\n]/P;D' file
Append the next line to the current line and if there are characters either side of the newline substitute the newline with a comma and a space and then repeat. Eventually an empty line or the end-of-file will be reached, then only print the next line if it is not empty.
Another version but a little more sofisticated (allowing for white space in the empty line) would be:
sed ':a;$!N;/^\s*$/M!s/\n/, /;ta;/\`\s*$/M!P;D' file
sed -n '1h;1!H
$ {x
s/\([^[:cntrl:]]\)\n\([^[:cntrl:]]\)/\1, \2/g
s/\(\n\)\n\{1,\}/\1/g
p
}' YourFile
change all after loading file in buffer. Could be done "on the fly" while reading the file and based on empty line or not.
use -e on GNU sed

Resources