Remove blank lines from the ends of a bunch of files - bash

I have a bunch of files with many lines in them, and usually one or two blank lines at the end.
I want to remove the blank lines at the end, while keeping all of the blank lines that may exist within the file.
I want to restrict the operation to the use of GNU utilities or similar, i.e. bash, sed, awk, cut, grep etc.
I know that I can easily remove all blank lines, with something like:
sed '/^$/d'
But I want to keep blank lines which exist prior to further content in the file.
File input might be as follows:
line1
line2

line4
line5


I'd want the output to look like:
line1
line2

line4
line5
All files are <100K, and we can make temporary copies.

With Perl:
perl -0777 -pe 's/\n*$//; s/$/\n/' file
The second s command (s/$/\n/) appends a newline back to the end of the file, keeping it POSIX-compliant.
Or shorter:
perl -0777 -pe 's/\n*$/\n/' file
Following Fela Maslen's comment, to edit the files in place (-i) and glob all entries in the current directory (*):
perl -0777 -pe 's/\n*$/\n/' -i *
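Since the question restricts itself to GNU utilities, the same trailing-blank trim can be done in pure GNU sed; this is a sketch of the classic one-liner, run here against a hypothetical sample piped in with printf:

```shell
# Loop: while the pattern space holds only blank lines, append the next
# line; a blank run that reaches end-of-file is deleted outright, so
# internal blank lines survive but trailing ones do not.
printf 'line1\nline2\n\nline4\nline5\n\n\n' |
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}'
```

The internal blank line is kept because the loop only deletes a blank run once it has reached the end of the file.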

If lines containing just space chars are to be considered empty:
$ tac file | awk 'NF{f=1}f' | tac
line1
line2

line4
line5
otherwise:
$ tac file | awk '/./{f=1}f' | tac
line1
line2

line4
line5

Here is an awk solution (standard Linux gawk), which I enjoyed writing.
As a single line:
awk '/^\s*$/{s=s $0 ORS; next}{print s $0; s=""}' input.txt
Or, using a readable script, script.awk:
/^\s*$/{skippedLines = skippedLines $0 ORS; next}
{print skippedLines $0; skippedLines= ""}
explanation:
/^\s*$/ {                               # for each empty line
    skippedLines = skippedLines $0 ORS  # accumulate the blank lines
    next                                # skip to the next input line
}
{                                       # for each non-empty line
    print skippedLines $0               # print any skipped lines, then the current line
    skippedLines = ""                   # reset skippedLines
}

This might work for you (GNU sed):
sed ':a;/\S/{n;ba};$d;N;ba' file
If the current line contains a non-space character, print the current pattern space, fetch the next line and repeat. If the current line(s) is/are empty and it is the last line in the file, delete the pattern space, otherwise append the next line and repeat.
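A quick sanity check on a hypothetical sample with one internal and two trailing blank lines:

```shell
# the two trailing blank lines are dropped, the internal one is kept
printf 'line1\nline2\n\nline4\nline5\n\n\n' |
sed ':a;/\S/{n;ba};$d;N;ba'
```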

Related

How to copy lines one by one from a file and paste them into another file after every n lines using a shell script

Say I have file1 with content
line1
line2
line3
and another file2 with content
lineA
lineB
lineC
lineD
lineE
lineF
lineG
lineH
lineI
I want to make file2 as
lineA
lineB
lineC
line1
lineD
lineE
lineF
line2
lineG
lineH
lineI
line3
Here is a way to do it with paste
cat file2 | paste -d'\n' - - - file1
The dash argument tells paste to read from standard input, which here is the cat file2 output, while the fourth argument is file1. So, with three dashes, we paste every 3 lines of one file with 1 line from the other, using the newline character as the delimiter (-d'\n').
This also works when lines remain in either file, as paste continues after hitting EOF on one of its inputs. It may print a couple of empty lines in that case, so you can pipe to any command to remove them (assuming your files contain no actual empty lines), for example:
cat file2 | paste -d'\n' - - - file1 | sed '/^$/d'
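If the group size is not fixed at 3, the run of dashes can be generated dynamically; a bash sketch, where n is a hypothetical variable holding the group size:

```shell
n=3
# printf reuses the format "- " once per number produced by seq;
# %.0s consumes each argument without printing it
dashes=$(printf -- '- %.0s' $(seq "$n"))
# $dashes is deliberately unquoted so word splitting hands paste
# n separate stdin slots
paste -d'\n' $dashes file1 < file2
```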
This Python code will do it; the parameters in your case would be:
python interlace.py file1 file2 file3 3
I would suggest a mv file3 file2 afterwards if you want the change in place, because if you start writing to file2 before you have read everything, it can be overwritten.
import sys

if len(sys.argv[1:]) == 4:
    file1 = open(sys.argv[1], 'r')
    file2 = open(sys.argv[2], 'r')
    file3 = open(sys.argv[3], 'w')
    line_count = int(sys.argv[4])
    current_counter = 0
    for file2_line in file2.readlines():
        current_counter += 1
        file3.write(file2_line)
        if current_counter == line_count:
            file3.write(file1.readline())
            current_counter = 0
    for file1_line in file1.readlines():
        file3.write(file1_line)
    file3.close()
This also works in the cases where file1 runs out of lines early, in which case file2's lines continue as normal, and when file1 has extra lines they just get added to the end.
This might work for you (GNU sed):
n=3
sed "$n~$n"'R file1' file2
After the third line and subsequently every third line of file2, append a line from file1.
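The $n~$n address is GNU sed's first~step form, which matches every step-th line starting at line first; seen in isolation:

```shell
# print every 3rd line of 1..10
seq 10 | sed -n '3~3p'
# → 3 6 9, one per line
```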
Using awk and getline:
awk '1;NR%3==0{if((getline < "file1")>0)print}' file2
lineA
lineB
lineC
line1
lineD
...
You could probably obfuscate it to awk '1;NR%3==0&&(getline < "file1")' file2 (untested).

replacing word in shell script or sed

I am a newbie, but would like to create a script which does the following.
Suppose I have a file of the form
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
I would like to replace it in the form
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6
That is, at the start of each paragraph I would like to add the text \textbf{ and end that line with }. Is there a way to search for double newlines? I am having trouble creating such a script with sed. Thank you!
Using awk you can write something like
$ awk '!f{ $0 = "\\textbf{"$0"}"; f++} 1; /^$/{f=0}' input
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6
What does it do?
!f{ $0 = "\\textbf{"$0"}"; f++}
!f is true if the value of f is 0. For the first line, since f has not been set, this evaluates to true, and awk performs the action part {}
$0 = "\\textbf{"$0"}" adds \textbf{ and } to the line
f++ increments the value of f so that subsequent lines do not enter this action part, until f is reset to zero
1 is always true. Since the action part is missing, awk performs the default action of printing the entire line
/^$/ matches an empty line
{f=0} If the line is empty, f is reset to 0 so that the next line is modified by the first action part
An approach using sed
sed '/^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/};1{s/\(.*\)/\\textbf{\1}/}' my_file
/^$/{N;s/^\(\n\)\(.*\)/\1\\textbf{\2}/} finds each empty line, appends the next line to the pattern space, and wraps that following line (the start of a paragraph) in \textbf{...}
1{s/\(.*\)/\\textbf{\1}/} does the same for the first line of the file
Just use awk's paragraph mode:
$ awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"} {$1="\\textbf{"$1"}"} 1' file
\textbf{This is line1}
This is line2
This is line3
\textbf{This is line4}
This is line5
This is line6
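To see what RS="" (paragraph mode) does on its own, here is a small sketch on hypothetical two-paragraph input:

```shell
# RS="" makes records blank-line-separated paragraphs;
# FS="\n" makes each line of a paragraph a field, so $1 is its first line
printf 'a\nb\n\nc\nd\n' |
awk 'BEGIN{RS="";FS="\n"} {print NR": first line is "$1}'
```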

Edit data removing line breaks and putting everything in a row

Hi, I'm new to shell scripting and I have been unable to do this:
My data looks like this (much bigger actually):
>SampleName_ZN189A
01000001000000000000100011100000000111000000001000
00110000100000000000010000000000001100000010000000
00110000000000001110000010010011111000000100010000
00000110000001000000010100000000010000001000001110
0011
>SampleName_ZN189B
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
Note: after every 50 characters there is a line break, but sometimes fewer characters when the data for a sample ends and a new sample name begins.
I would like that after every 50 characters, the line break would be removed, so my data would look like this:
>SampleName_ZN189A
0100000100000000000010001110000000011100000000100000110000100000000000010000000000001100000010000000...
>SampleName_ZN189B
0011000000110100000101110000000000000000000001000100010000000000000010010000000000100100000001000000...
I tried using tr but I got an error:
tr '\n' '' < my_file
tr: empty string2
Thanks in advance
tr with -d deletes the specified character:
$ cat input.txt
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
$ cat input.txt | tr -d "\n"
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
You can use this awk:
awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' file
>SampleName_ZN189A
010000010000000000001000111000000001110000000010000011000010000000000001000000000000110000001000000000110000000000001110000010010011111000000100010000000001100000010000000101000000000100000010000011100011
>SampleName_ZN189B
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
Using awk
awk '/>/{print (NR==1)?$0:RS $0;next}{printf $0}' file
If you don't mind the result having an additional newline before the first line, here is a shorter one:
awk '{printf (/>/?RS $0 RS:$0)}' file
This might work for you (GNU sed):
sed '/^\s*>/!{H;$!d};x;s/\n\s*//2gp;x;h;d' file
Build up the record in the hold space and when encountering the start of the next record or the end-of-file remove the newlines and print out.
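The build-up-then-flush use of the hold space can be seen in isolation (a sketch on seq output):

```shell
# 1h seeds the hold space, 1!H appends every later line to it, and at
# end-of-file x swaps it back so the embedded newlines can be replaced
seq 3 | sed -n '1h;1!H;${x;s/\n/,/g;p}'
# → 1,2,3
```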
You can use this sed:
sed '/^>Sample/!{ :loop; N; /\n>Sample/{n}; s/\n//; b loop; }' file.txt
Try this
cat SampleName_ZN189A | tr -d '\r'
# tr -d deletes the given/specified character from the input
Using simple awk, the same is achievable:
awk 'BEGIN{ORS=""} {print}' SampleName_ZN189A # output doesn't contain a line break at the end
If you want a line break at the end, this works:
awk 'BEGIN{ORS=""} {print} END{print "\r"}' SampleName_ZN189A
# select the correct line-break character, i.e. \r or \n or \r\n, depending on the file format.
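To confirm which line-break character a file actually uses before picking \r or \n, od shows the raw bytes; a sketch with hypothetical CRLF input:

```shell
# tr -d '\r' strips carriage returns from CRLF input,
# leaving plain Unix newlines behind; od -c shows the result byte by byte
printf 'line1\r\nline2\r\n' | tr -d '\r' | od -c
```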

Omit the last line with sed

I'm having the following file content.
2013-07-30 debug
line1
2013-07-30 info
line2
line3
2013-07-30 debug
line4
line5
I want to get the following output with sed.
2013-07-30 info
line2
line3
This command gives me nearly the output I want
sed -n '/info/I,/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/{p}' myfile.txt
2013-07-30 info
line2
line3
2013-07-30 debug
How do I omit the last line here?
IMO, sed starts to become unwieldy as soon as you have to add conditions into it. I realize you did not tag the question with awk, but here is an awk program to print only "info" sections.
awk -v type="info" '
$1 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/ {p = ($2 == type)}
p
' myfile.txt
2013-07-30 info
line2
line3
Try:
sed -n '/info/I p; //,/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/{ //! p}' myfile.txt
It prints the first match, and in the range omits both edges; since the first edge has already been printed, only the second one is skipped. It yields:
2013-07-30 info
line2
line3
This might work for you (GNU sed):
sed -r '/info/I{:a;n;/^[0-9]{4}(-[0-9]{2}){2}/!ba;s/^/\n/;D};d' file
or if you prefer:
sed '/info/I{:a;n;/^....-..-.. /!ba;s/^/\n/;D};d' file
N.B. This caters for consecutive patterns

bash, sed, awk: extracting lines within a range

How can I get sed to extract the lines between two patterns, write that data to a file, and then extract the lines between the next range and write that text to another file? For example given the following input:
pattern_a
line1
line2
line3
pattern_b
pattern_a
line4
line5
line6
pattern_b
I want line1, line2 and line3 to appear in one file and line4, line5 and line6 to appear in another. I can't see a way of doing this without using a loop and maintaining some state between iterations, where the state tells sed where to start searching for the start pattern (pattern_a) again.
For example, in bash-like pseudocode:
while not done
if [[ first ]]; then
sed -n -e '/pattern_a/,/pattern_b/p' > $filename
else
sed -n -e '$linenumber,/pattern_b/p' > $filename
fi
linenumber = last_matched_line
filename = new_filename
Is there a nifty way of doing this using sed? Or would awk be better?
How about this:
awk '/pattern_a/{f=1;c+=1;next}/pattern_b/{f=0;next}f{print > "outfile_"c}' input_file
This will create an outfile_x for every range.
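Run against the sample input from the question in a scratch directory (the file names are from the answer; the parentheses around the output name avoid a redirection-parsing ambiguity in some awks):

```shell
cd "$(mktemp -d)"
printf 'pattern_a\nline1\nline2\nline3\npattern_b\n' >  input_file
printf 'pattern_a\nline4\nline5\nline6\npattern_b\n' >> input_file
# f gates printing inside a range, c numbers the output file per range
awk '/pattern_a/{f=1;c+=1;next} /pattern_b/{f=0;next} f{print > ("outfile_" c)}' input_file
cat outfile_1   # → line1 line2 line3, one per line
cat outfile_2   # → line4 line5 line6, one per line
```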
