replacing word in shell script or sed - bash

I am a newbie, but would like to create a script which does the following.
Suppose I have a file of the form:
This is line1
This is line2
This is line3

This is line4
This is line5
This is line6
I would like to transform it into:
\textbf{This is line1}
This is line2
This is line3

\textbf{This is line4}
This is line5
This is line6
That is, I would like to add \textbf{ at the start of the first line of each paragraph and } at the end of that line. Is there a way to search for two consecutive newlines (i.e. a blank line)? I am having trouble writing such a script with sed. Thank you!

Using awk you can write something like
$ awk '!f{ $0 = "\\textbf{"$0"}"; f++} 1; /^$/{f=0}' input
\textbf{This is line1}
This is line2
This is line3

\textbf{This is line4}
This is line5
This is line6
What does it do?
!f{ $0 = "\\textbf{"$0"}"; f++}
!f is true if the value of f is 0. For the first line f has not been set yet, so it is 0 and !f evaluates to true. When it is true, awk performs the action part {}.
$0 = "\\textbf{"$0"}" adds \textbf{ and } around the line.
f++ increments f so that this action is not executed again until f is reset to zero.
1 is always true. Since its action part is missing, awk performs the default action, which is to print the entire line.
/^$/ matches an empty line.
{f=0} If the line is empty, set f=0 so that the next line is modified by the first action to include the changes.
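One caveat with this one-liner: /^$/ is tested after $0 may already have been rewritten, so when two blank lines occur in a row the second one becomes \textbf{}, f stays 1, and the first line of the next paragraph is not marked. A minimal sketch of a variant that handles runs of blank lines, assuming only completely empty lines separate paragraphs; the function name bold_paragraph_starts is hypothetical:

```shell
#!/usr/bin/env bash
# bold_paragraph_starts (hypothetical name): wrap the first line of every
# paragraph in \textbf{...}. Paragraphs are separated by one or more
# completely empty lines, which are passed through unchanged.
bold_paragraph_starts() {
  awk '
    /^$/ { f = 0; print; next }                # separator: reset the flag, print as-is
    !f   { $0 = "\\textbf{" $0 "}"; f = 1 }    # first line of a paragraph
    1                                          # print every non-empty line
  ' "$@"
}

# Usage example on inline data with two blank lines in a row.
printf 'This is line1\nThis is line2\n\n\nThis is line4\n' | bold_paragraph_starts
```

Handling the blank line first and jumping to the next line with next means the rewritten $0 is never re-tested against /^$/.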

An approach using sed
sed '/^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/};1{s/\(.*\)/\\textbf{\1}/}' my_file
/^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/} finds every empty line, appends the next line to the pattern space with N, and wraps the part after the newline (the line below the blank line) in \textbf{...}.
1{s/\(.*\)/\\textbf{\1}/} does the same for the first line of the file, which has no blank line before it.
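With two blank lines in a row, the one-liner above appends the second (empty) line, prints it as \textbf{}, and leaves the following paragraph unmarked. A minimal sketch of a GNU sed variant that uses n instead of N and loops over a run of blank lines before wrapping the first non-blank one; the function name bold_paragraph_starts_sed is hypothetical, and the label/semicolon syntax assumes GNU sed:

```shell
#!/usr/bin/env bash
# bold_paragraph_starts_sed (hypothetical name), GNU sed syntax:
#   1s/.*/\\textbf{&}/   wraps line 1, which has no blank line before it
#   /^$/{...}            on a blank line, n prints it and reads the next line;
#                        :a ... ba repeats while that line is blank as well;
#                        then the first non-blank line is wrapped
bold_paragraph_starts_sed() {
  sed '1s/.*/\\textbf{&}/; /^$/{:a;n;/^$/ba;s/.*/\\textbf{&}/}' "$@"
}

printf 'This is line1\nThis is line2\n\n\nThis is line4\n' | bold_paragraph_starts_sed
```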

Just use awk's paragraph mode:
$ awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"} {$1="\\textbf{"$1"}"} 1' file
\textbf{This is line1}
This is line2
This is line3

\textbf{This is line4}
This is line5
This is line6
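In paragraph mode (RS="") each run of one or more empty lines separates records, and with FS="\n" every line of a paragraph is a field, so $1 is the paragraph's first line. Because ORS="\n\n" is written after every record, the command above also prints one extra empty line at the very end, and runs of several blank lines in the input collapse to one. A minimal sketch that writes the blank line only between paragraphs instead; the function name bold_paragraphs is hypothetical:

```shell
#!/usr/bin/env bash
# bold_paragraphs (hypothetical name): paragraph-mode awk that prints the
# separating blank line only between paragraphs, so none trails the output.
bold_paragraphs() {
  awk 'BEGIN { RS = ""; FS = OFS = "\n" }
       {
         $1 = "\\textbf{" $1 "}"                      # rebuilds $0 joined by OFS
         printf "%s%s", (NR > 1 ? "\n\n" : ""), $0    # separator before all but the first
       }
       END { if (NR) print "" }                       # terminate the last line
  ' "$@"
}

printf 'This is line1\nThis is line2\nThis is line3\n\nThis is line4\nThis is line5\nThis is line6\n' | bold_paragraphs
```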

Related

Remove blank lines from the ends of a bunch of files

I have a bunch of files with many lines in them, and usually one or two blank lines at the end.
I want to remove the blank lines at the end, while keeping all of the blank lines that may exist within the file.
I want to restrict the operation to the use of GNU utilities or similar, i.e. bash, sed, awk, cut, grep etc.
I know that I can easily remove all blank lines, with something like:
sed '/^$/d'
But I want to keep blank lines which exist prior to further content in the file.
File input might be as follows:
line1
line2

line4
line5


I'd want the output to look like:
line1
line2

line4
line5
All files are <100K, and we can make temporary copies.
With Perl:
perl -0777 -pe 's/\n*$//; s/$/\n/' file
The second s command (s/$/\n/) appends a single newline back to the end of the file, so that it stays POSIX-compliant (the last line of a text file must end with a newline).
Or shorter:
perl -0777 -pe 's/\n*$/\n/' file
Following Fela Maslen's comment, to edit the files in place (-i) and process every file in the current directory (*):
perl -0777 -pe 's/\n*$/\n/' -i *
If lines containing just space chars are to be considered empty:
$ tac file | awk 'NF{f=1}f' | tac
line1
line2

line4
line5
otherwise:
$ tac file | awk '/./{f=1}f' | tac
line1
line2

line4
line5
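The question is about a bunch of files and allows temporary copies. A minimal sketch that applies the second (/./) tac pipeline to every file given as an argument, assuming GNU tac and mktemp; the function name strip_trailing_blank_lines is hypothetical. Copying the result back with cat instead of mv keeps each original file's permissions:

```shell
#!/usr/bin/env bash
# strip_trailing_blank_lines (hypothetical name): remove the blank lines at the
# end of each file named as an argument, keeping blank lines inside the file.
strip_trailing_blank_lines() {
  local f tmp
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Reverse, skip lines until the first non-empty one, reverse back.
    if tac -- "$f" | awk '/./{f=1}f' | tac > "$tmp"; then
      cat -- "$tmp" > "$f"   # overwrite in place, preserving permissions
    fi
    rm -f -- "$tmp"
  done
}
```

Usage: strip_trailing_blank_lines *.txt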
Here is an awk solution (standard Linux gawk). I enjoyed writing it.
Single line:
awk '/^\s*$/{s=s $0 ORS; next}{print s $0; s=""}' input.txt
Or as a readable script, script.awk (run it with awk -f script.awk input.txt):
/^\s*$/{skippedLines = skippedLines $0 ORS; next}
{print skippedLines $0; skippedLines= ""}
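Explanation:
/^\s*$/ {                                # for each empty (or whitespace-only) line
    skippedLines = skippedLines $0 ORS;  # buffer the line instead of printing it
    next;                                # skip to the next input line
}
{                                        # for each non-empty line
    print skippedLines $0;               # print any buffered blank lines, then the current line
    skippedLines = "";                   # reset the buffer
}
Blank lines still in the buffer when the input ends are never printed, which is what removes them from the end of the file.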
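\s in the regular expression is a GNU awk extension, so other awks may not recognize it. A minimal sketch of the same single-pass buffering logic using the POSIX character class [[:space:]] instead; the function name drop_trailing_blank_lines is hypothetical:

```shell
#!/usr/bin/env bash
# drop_trailing_blank_lines (hypothetical name): the buffering awk script
# above, with the POSIX class [[:space:]] instead of the GNU-only \s.
drop_trailing_blank_lines() {
  awk '
    /^[[:space:]]*$/ { buf = buf $0 ORS; next }   # hold blank lines back
    { print buf $0; buf = "" }                    # flush them before real content
  ' "$@"
}

printf 'line1\nline2\n\nline4\nline5\n  \n\n' | drop_trailing_blank_lines
```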
This might work for you (GNU sed):
sed ':a;/\S/{n;ba};$d;N;ba' file
If the current line contains a non-space character, print the pattern space, fetch the next line and repeat. Otherwise the pattern space holds only blank line(s): if this is the last line of the file, delete it; otherwise append the next line and repeat.

How to find duplicate lines in a file?

I have an input file with the following data:
line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3
I am trying to find all the duplicate lines. I tried
sort filename | uniq -c
but it does not seem to be working for me.
It gives me:
1 begin
1 end
1 line1
1 line1
1 line2
1 line3
1 line3
1 line5
1 line6
1 line7
1 line9
The question may look like a duplicate of "Find duplicate lines in a file and count how many time each line was duplicated?",
but the nature of my input data is different.
Please suggest.
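Use this:
sort filename | uniq -d
uniq -d prints one copy of each line that is repeated. uniq only compares adjacent lines, which is why the input is sorted first (see man uniq).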
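In the question, uniq -c reports line1 (and line3) twice with a count of 1 after sorting, so the two copies cannot be byte-for-byte identical even though they look the same. The question doesn't say why; one common cause, assumed here, is invisible trailing whitespace or the carriage return of a Windows (CRLF) file, which GNU cat -A would reveal (a CR shows up as ^M before the $ line-end marker). A minimal sketch that strips trailing whitespace before looking for duplicates; the function name find_duplicate_lines is hypothetical:

```shell
#!/usr/bin/env bash
# find_duplicate_lines (hypothetical name): print one copy of every line that
# occurs more than once, ignoring trailing whitespace (spaces, tabs, or the
# \r left by CRLF line endings).
find_duplicate_lines() {
  sed 's/[[:space:]]*$//' "$@" | sort | uniq -d
}

printf 'line1\nline2\nline3 \nline1\r\nline3\n' | find_duplicate_lines
```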
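Try
sort -u file
or
awk '!a[$0]++' file
Both print every distinct line once (sort -u in sorted order, the awk version in original order), so they remove duplicates rather than list them.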
You'll have to modify the standard de-dupe code (awk '!a[$0]++') just a tiny bit to account for this.
If you want a single copy of each duplicate, it's very much the same idea ({m,g}awk means "mawk or gawk"):
{m,g}awk 'NF~ __[$_]++' FS='^$'
{m,g}awk '__[$_]++==!_'
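In the second program, _ is an unset variable, so $_ is $0 and !_ is 1; __[$_]++ is the number of earlier occurrences of the line, so the line is printed exactly at its second occurrence. That yields one copy of each duplicated line, in input order and without sorting. A more readable equivalent, as a minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# print_duplicates_once (hypothetical name): print each duplicated line once,
# at its second occurrence, keeping input order (no sort needed).
print_duplicates_once() {
  awk 'seen[$0]++ == 1' "$@"
}

printf '%s\n' line1 line2 line3 begin line5 line6 line7 end line9 line1 line3 |
  print_duplicates_once
```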
If you want every copy of the duplicates printed, then the first time the condition is true for a line (its second occurrence), print 2 copies of it, and print any further matches along the way.
It's usually way faster to de-dupe first and then sort, instead of the other way around.
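The every-copy variant described above can be sketched as follows: at a line's second occurrence it is printed twice (making up for the earlier, unprinted copy), and every later occurrence is printed as it arrives. Following the de-dupe-then-sort advice, a sorted list of duplicated lines sorts only the de-duplicated output. Both function names are hypothetical:

```shell
#!/usr/bin/env bash
# print_all_duplicate_copies (hypothetical name): print every copy of each
# line that occurs more than once; lines that occur only once are dropped.
print_all_duplicate_copies() {
  awk '
    c[$0]++ == 1 { print; print; next }   # 2nd sighting: also print the 1st copy
    c[$0] > 2                             # 3rd and later sightings
  ' "$@"
}

# sorted_duplicates (hypothetical name): de-dupe first, then sort the result.
sorted_duplicates() {
  awk 'c[$0]++ == 1' "$@" | sort
}

printf '%s\n' b a b b a c | print_all_duplicate_copies
```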

Omit the last line with sed

I have the following file content:
2013-07-30 debug
line1
2013-07-30 info
line2
line3
2013-07-30 debug
line4
line5
I want to get the following output with sed.
2013-07-30 info
line2
line3
This command gives me nearly the output I want:
sed -n '/info/I,/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/{p}' myfile.txt
2013-07-30 info
line2
line3
2013-07-30 debug
How do I omit the last line here?
IMO, sed starts to become unwieldy as soon as you have to add conditions into it. I realize you did not tag the question with awk, but here is an awk program to print only "info" sections.
awk -v type="info" '
$1 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/ {p = ($2 == type)}
p
' myfile.txt
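Because p is recomputed only on date lines, consecutive "info" sections are all printed and every other section is skipped. A minimal sketch wrapping the program in a function that takes the section type as its first argument; the function name print_sections is hypothetical, and the date regex is spelled out without {4}-style interval expressions so it also works in awks that lack them:

```shell
#!/usr/bin/env bash
# print_sections (hypothetical name): print every section whose header line
# is "YYYY-MM-DD <type>", where <type> is the first argument.
print_sections() {
  local type=$1; shift
  awk -v type="$type" '
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { p = ($2 == type) }  # header line
    p                                                                    # print while on
  ' "$@"
}

printf '%s\n' '2013-07-30 debug' line1 '2013-07-30 info' line2 line3 \
  '2013-07-30 debug' line4 line5 | print_sections info
```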
2013-07-30 info
line2
line3
Try:
sed -n '/info/I p; //,/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/{ //! p}' myfile.txt
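It prints the line matching info first. The range //,/[0-9]\{4\}-.../ then covers the same lines, and //! p prints only the lines that do not match the most recently used regex, so both edges of the range are skipped. The first edge has already been printed by the first command, so in effect only the closing date line is omitted. It yields: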
2013-07-30 info
line2
line3
This might work for you (GNU sed):
sed -r '/info/I{:a;n;/^[0-9]{4}(-[0-9]{2}){2}/!ba;s/^/\n/;D};d' file
or if you prefer:
sed '/info/I{:a;n;/^....-..-.. /!ba;s/^/\n/;D};d' file
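N.B. This also handles consecutive info sections: D restarts the cycle on the next date line without printing it, so that line is tested against /info/I again.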

Separate by blank lines in bash

I have an input like this:
Block 1:
line1
line2
line3
line4

Block 2:
line1
line2

Block 3:
line1
line2
line3
This is just an example. Is there an elegant way to print only Block 2 and its lines, without relying on their names? Something like "split the input into blocks at the blank lines and print the second block".
Try this:
awk '!$0{i++;next;}i==1' yourFile
For better performance, you can also exit once the 2nd block has been processed:
awk '!$0{i++;next;}i==1;i>1{exit;}' yourFile
Test:
kent$ cat t
Block 1:
line1
line2
line3
line4

Block 2:
line1
line2

Block 3:
line1
line2
line3
kent$ awk '!$0{i++;next;}i==1' t
Block 2:
line1
line2
kent$ awk '!$0{i++;next;}i==1;i>1{exit;}' t
Block 2:
line1
line2
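Note that !$0 is also true for a line consisting of just 0, and every extra blank line between blocks increments i again. A minimal sketch that uses NF instead and counts a new block only when a non-empty line follows a blank one, with the block number as a parameter; the function name print_block is hypothetical, and unlike !$0 it also treats whitespace-only lines as separators:

```shell
#!/usr/bin/env bash
# print_block (hypothetical name): print the Nth blank-line-separated block.
print_block() {
  local n=$1; shift
  awk -v want="$n" '
    !NF         { inblk = 0; next }    # blank or whitespace-only line
    !inblk      { blk++; inblk = 1 }   # first line of a new block
    blk > want  { exit }               # past the wanted block: stop reading
    blk == want                        # print the lines of block N
  ' "$@"
}

printf '%s\n' 'Block 1:' line1 line2 line3 line4 '' 'Block 2:' line1 line2 '' \
  'Block 3:' line1 line2 line3 | print_block 2
```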
Set the record separator to the empty string to split the input on blank lines. To
print the second block:
$ awk -v RS= 'NR==2{ print }' file
(Note that this only splits on lines that contain no characters at all.
A line containing only whitespace is not considered a blank line.)
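If whitespace-only lines should also separate blocks, one option (a sketch, not part of the original answer) is to empty such lines with sed before awk reads them; exit stops reading once the wanted paragraph has been printed. The function name nth_paragraph is hypothetical:

```shell
#!/usr/bin/env bash
# nth_paragraph (hypothetical name): print paragraph N, treating lines that
# contain only whitespace as blank.
nth_paragraph() {
  local n=$1; shift
  sed 's/^[[:space:]]*$//' "$@" | awk -v RS= -v n="$n" 'NR == n { print; exit }'
}

printf 'Block 1:\nline1\n   \nBlock 2:\nline1\nline2\n\nBlock 3:\nline1\n' | nth_paragraph 2
```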

bash, sed, awk: extracting lines within a range

How can I get sed to extract the lines between two patterns, write that data to a file, and then extract the lines between the next range and write that text to another file? For example given the following input:
pattern_a
line1
line2
line3
pattern_b
pattern_a
line4
line5
line6
pattern_b
I want line1, line2 and line3 to appear in one file, and line4, line5 and line6 in another file. I can't see a way of doing this without a loop that maintains some state between iterations, where the state tells sed where to start searching for the start pattern (pattern_a) again.
For example, in bash-like pseudocode:
while not done
if [[ first ]]; then
sed -n -e '/pattern_a/,/pattern_b/p' > "$filename"
else
sed -n -e "$linenumber,/pattern_b/p" > "$filename"
fi
linenumber=last_matched_line
filename=new_filename
done
Is there a nifty way of doing this using sed? Or would awk be better?
How about this:
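awk '/pattern_a/{f=1;c+=1;next}/pattern_b/{f=0;next}f{print > ("outfile_"c)}' input_file
This creates one file per range: outfile_1, outfile_2, and so on. (The parentheses around "outfile_"c make the concatenated file name unambiguous in every awk.)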
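For comparison, the stateful loop sketched in the question can also be made to work. A minimal sketch, assuming GNU sed (its = command prints the current line number) and the hypothetical names split_ranges and outfile_1, outfile_2, ...; it rescans the input once per range, which is why the single-pass awk one-liner is preferable:

```shell
#!/usr/bin/env bash
# split_ranges (hypothetical name): write the lines strictly between each
# pattern_a ... pattern_b pair of FILE to outfile_1, outfile_2, ...
split_ranges() {
  local file=$1 from=1 start end n=1
  while :; do
    # Line number of the next pattern_a at or after line $from (empty if none).
    start=$(sed -n "${from},\${/pattern_a/{=;q;}}" "$file")
    [ -n "$start" ] || break
    # Line number of the first pattern_b after that.
    end=$(sed -n "$((start + 1)),\${/pattern_b/{=;q;}}" "$file")
    [ -n "$end" ] || break
    if [ "$end" -gt $((start + 1)) ]; then
      sed -n "$((start + 1)),$((end - 1))p" "$file" > "outfile_$n"
    else
      : > "outfile_$n"   # pattern_b directly after pattern_a: empty range
    fi
    from=$((end + 1)) n=$((n + 1))   # state carried into the next iteration
  done
}
```

Usage: split_ranges input_file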
