awk or sed - delete specific lines - bash

I've got this
line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
line11
line12
line13
line14
line15
I want this
line1
line3
line5
line6
line8
line10
line11
line13
line15
As you can see, the lines to be deleted follow a pattern: after a deleted line x, the next deleted line is x+2, then x+3, and so on alternately.
I tried this with awk, but it is not the right way:
awk '(NR)%2||(NR)%3' file > filev1
Any ideas why?

If I decipher your requirements correctly, then
awk 'NR % 5 != 2 && NR % 5 != 4' file
should do.
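As for why the original attempt fails: NR%2||NR%3 is false only when both remainders are 0, so it drops just the multiples of 6 (lines 6 and 12). Below is a minimal sketch comparing the two conditions on the 15-line sample; the scratch file and the variable names are illustrative only.

```shell
# Recreate the 15-line sample (line1 .. line15) in a scratch file.
sample=$(mktemp)
seq 1 15 | sed 's/^/line/' > "$sample"

# Original attempt: false only when NR%2 and NR%3 are both 0 (multiples of 6).
attempt=$(awk 'NR % 2 || NR % 3' "$sample" | paste -sd ' ' -)

# Modulo-5 test: drop positions 2 and 4 of every group of five lines.
fixed=$(awk 'NR % 5 != 2 && NR % 5 != 4' "$sample" | paste -sd ' ' -)

echo "attempt: $attempt"
echo "fixed:   $fixed"
rm -f "$sample"
```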

Based on Wintermute's logic :)
awk 'NR%5!~/^[24]$/' file
line1
line3
line5
line6
line8
line10
line11
line13
line15
or
awk 'NR%5~/^[013]$/' file
How it works.
We can see from your lines that the ones marked with * should be removed and the others kept.
line1
line2*
line3
line4*
line5
line6
line7*
line8
line9*
line10
line11
line12*
line13
line14*
line15
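Grouping the data into blocks of five lines with NR%5,
we see that the lines to delete are numbers 2 and 4 in every group.
NR%5!~/^[24]$/ takes the line number modulo 5, which gives each line's position within its group of five (0 for the fifth line).
The /^[24]$/ part then rejects positions 2 and 4.
The ^ and $ anchors make the match exact. NR%5 is always a single digit from 0 to 4, so here they are only a safeguard,
but against a multi-digit value such as NR itself an unanchored /2/ would also match 12.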
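A small sketch of the anchoring point, using plain numbers from seq instead of the lineN sample (the variable names are illustrative). It compares the regex test on NR%5 with an unanchored and an anchored match against NR itself.

```shell
# NR % 5 is a single digit (0-4), so the anchored regex selects the same
# lines as the numeric comparison.
by_regex=$(seq 1 15 | awk 'NR % 5 !~ /^[24]$/' | paste -sd ' ' -)

# Against NR itself the anchors matter: /2/ also matches 12.
unanchored=$(seq 1 15 | awk 'NR !~ /2/' | paste -sd ' ' -)
anchored=$(seq 1 15 | awk 'NR !~ /^2$/' | paste -sd ' ' -)

echo "by_regex:   $by_regex"
echo "unanchored: $unanchored"
echo "anchored:   $anchored"
```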

Using GNU sed, you can do the following command:
sed '2~5d;4~5d' test.txt
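The first~step address is a GNU extension, so this form is not expected to work with every sed. A quick sketch that checks it against the awk modulo test on the numbers 1 to 15 (variable names are illustrative):

```shell
# GNU sed only: first~step matches every step-th line starting at line first.
gnu_sed=$(seq 1 15 | sed '2~5d;4~5d' | paste -sd ' ' -)

# The awk modulo test from the earlier answer, for comparison.
with_awk=$(seq 1 15 | awk 'NR % 5 != 2 && NR % 5 != 4' | paste -sd ' ' -)

echo "sed: $gnu_sed"
echo "awk: $with_awk"
```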

Related

Print all the lines between two patterns in shell

I have a file which is the log of a script running in a daily cronjob. The log file looks like-
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 20
Aug 21
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 21
The log is written by the script starting with the date and ending with the date and in between all the logs are written.
Now when I try to get the logs for a single day using the command below -
sed -n '/Aug 19/,/Aug 19/p' filename
it displays the output as -
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
But if I try to get the logs for multiple dates, the logs of the last day are always missing.
Example: if I run the command
sed -n '/Aug 19/,/Aug 20/p' filename
the output looks like -
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
I have gone through this site and found some valuable inputs to a similar problem but none of the solutions work for me. The links are Link 1
Link 2
The commands that I have tried are -
awk '/Aug 15/{a=1}/Aug 21/{print;a=0}a'
awk '/Aug 15/,/Aug 21/'
sed -n '/Aug 15/,/Aug 21/p'
grep -Pzo "(?s)(Aug 15(.*?)(Aug 21|\Z))"
but none of the commands gives the logs of the last date; they all print only up to its first timestamp, as shown above.
I think you can use the following awk command to print the lines between Aug 19 and Aug 20:
awk '/Aug 19/||/Aug 20/{a++}a; a==4{a=0}' file
Brief explanation:
/Aug 19/||/Aug 20/: match a record containing Aug 19 or Aug 20
if it matches, increment the counter a
the bare a before the semicolon prints the record whenever a is greater than 0
finally, when a==4, reset a=0. Note that this only works for the case in the example, where the two dates appear four times in total; if they appear more often, change the 4 accordingly.
To pass the search patterns in as variables, modify the command as follows:
$ b="Aug 19"
$ c="Aug 20"
$ awk -v b="$b" -v c="$c" '$0 ~ c||$0 ~ b{a++}a; a==4{a=0}' file
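The sed range in the question stops early because a range address ends at the first line matching the end pattern, which here is the opening Aug 20 marker. Below is a sketch that avoids the hard-coded 4, assuming every day is bracketed by exactly two marker lines that consist of the date alone. The shortened sample log and the names from, to, on and seen are made up for illustration.

```shell
# Shortened stand-in for the log: three days, three lines per day.
log=$(mktemp)
for day in 19 20 21; do
    echo "Aug $day"
    printf 'Line%s\n' 1 2 3
    echo "Aug $day"
done > "$log"

range=$(awk -v from="Aug 19" -v to="Aug 20" '
    $0 == from { on = 1 }                    # opening marker of the first day
    on                                       # print while inside the range
    on && $0 == to && ++seen == 2 { exit }   # closing marker of the last day
' "$log")

printf '%s\n' "$range"
rm -f "$log"
```

Setting from and to to the same date prints that single day, since the second sighting of the marker is then its closing line.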
You can give sed several address ranges, separated by semicolons:
sed -n '/Aug 19/,/Aug 19/p;/Aug 20/,/Aug 20/p' filename
Could you please try the following awk solution too and let me know if it helps.
awk '/Aug 19/||/Aug 20/{flag=1}; /Aug/ && (!/Aug 19/ && !/Aug 20/){flag=""} flag' Input_file
EDIT: Adding the output here as well for the OP.
awk '/Aug 19/||/Aug 20/{flag=1}; /Aug/ && (!/Aug 19/ && !/Aug 20/){flag=""} flag' Input_file
Aug 19
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 19
Aug 20
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Aug 20
The following approach is quite easy to understand conceptually...
Print all lines from Aug 19 onwards to end of file.
Reverse the order of the lines (with tac because tac is cat backwards).
Print all lines from Aug 21 onwards.
Reverse the order of the lines back to the original order.
sed -ne '/Aug 19/,$p' filename | tac | sed -ne '/Aug 21/,$p' | tac
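The same pipeline can be sketched with the two dates held in shell variables. This assumes tac is installed (it ships with GNU coreutils); the shortened sample log and the variable names are illustrative, and the end date is Aug 20 here to match the question.

```shell
# Shortened stand-in for the log: three days, three lines per day.
log=$(mktemp)
for day in 19 20 21; do
    echo "Aug $day"
    printf 'Line%s\n' 1 2 3
    echo "Aug $day"
done > "$log"

first="Aug 19"
last="Aug 20"

# Cut everything before the first marker, reverse, cut everything before
# the (now first) closing marker of the last day, then reverse back.
between=$(sed -n "/$first/,\$p" "$log" | tac | sed -n "/$last/,\$p" | tac)

printf '%s\n' "$between"
rm -f "$log"
```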

How to find duplicate lines in a file?

I have an input file with the following data:
line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3
I am trying to find all the duplicate lines. I tried
sort filename | uniq -c
but it does not seem to be working for me.
It gives me:
1 begin
1 end
1 line1
1 line1
1 line2
1 line3
1 line3
1 line5
1 line6
1 line7
1 line9
The question may look like a duplicate of "Find duplicate lines in a file and count how many time each line was duplicated?",
but the nature of the input data is different.
Please suggest.
use this:
sort filename | uniq -d
man uniq
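The uniq -c output in the question lists line1 and line3 twice with a count of 1 each, which suggests the copies differ in characters that are not visible, such as trailing spaces or carriage returns. That is a guess, since the raw bytes are not shown. A sketch under that assumption, where the trailing space and carriage return in the sample are invented to reproduce the symptom:

```shell
data=$(mktemp)
# Sample input from the question; the trailing space on the second line1 and
# the carriage return on the second line3 are assumptions, not from the post.
printf 'line1\nline2\nline3\nbegin\nline5\nline6\nline7\nend\nline9\nline1 \nline3\r\n' > "$data"

# Without clean-up no two lines are byte-identical, so uniq -d prints nothing.
raw=$(sort "$data" | uniq -d)

# Strip trailing whitespace (including \r) first, then look for duplicates.
clean=$(sed 's/[[:space:]]*$//' "$data" | sort | uniq -d | paste -sd ' ' -)

echo "raw:   [$raw]"
echo "clean: [$clean]"
rm -f "$data"
```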
try
sort -u file
or
awk '!a[$0]++' file
You'll have to modify the standard de-dupe code just a tiny bit to account for this.
If you want a single copy of each duplicated line, it's very much the same idea: print a line only on its second occurrence (works in both mawk and gawk):
awk 'seen[$0]++ == 1' file
If you want every copy printed for duplicates, then whenever the condition yields true for the first time, print 2 copies of it, plus print new matches along the way.
Usually it's waaaaaaaaay faster to first de-dupe, then sort, instead of the other way around.
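The "every copy" variant described above can be sketched as follows. This is one possible reading of the description, with a made-up six-line input and illustrative names (seen, n).

```shell
# One copy of each duplicated line: print on the second sighting only.
dups_once=$(printf '%s\n' a b a c a b | awk 'seen[$0]++ == 1' | paste -sd ' ' -)

# Every copy of each duplicated line.
dups_all=$(printf '%s\n' a b a c a b | awk '
    { n = seen[$0]++ }        # n = how often this line was seen before
    n == 1 { print; print }   # second sighting: emit the held-back first copy too
    n > 1                     # later sightings: emit as they arrive
' | paste -sd ' ' -)

echo "once: $dups_once"
echo "all:  $dups_all"
```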

Merging Two, Nearly Similar Text Files

Suppose we have ~/file1:
line1
line2
line3
...and ~/file2:
line1
lineNEW
line3
Notice that these two files are nearly identical, except that line2 differs from lineNEW.
Question: How can I merge these two files to produce one that reads as follows:
line1
line2
lineNEW
line3
That is, how can I merge the two files so that all unique lines are captured (without overlap) into a third file? Note that the order of the lines doesn't matter (as long as all unique lines are being captured).
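awk '{
    print    # always emit the current line of file1
    # read the matching line of file2; emit it only if one exists and it differs
    if ((getline line < second) > 0 && $0 != line) print line
}' second=file2 file1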
will do it
Consider the command below. It is more robust since it also works for files where a new line has been added instead of replaced (see f1 and f2 below).
First, I executed it using your files. I divided the command(s) into two lines so that it fits nicely in the "code block":
$ (awk '{ print NR, $0 }' file1; awk '{ print NR, $0 }' file2) |\
sort -k 2 | uniq -f 1 | sort | cut -d " " -f 2-
It produces your expected output:
line1
line2
lineNEW
line3
I also used these two extra files to test it:
f1:
line1 stuff after a tab
line2 line2
line3
line4
line5
line6
f2:
line1 stuff after a tab
lineNEW
line2 line2
line3
line4
line5
line6
Here is the command:
$ (awk '{ print NR, $0 }' f1; awk '{ print NR, $0 }' f2) |\
sort -k 2 | uniq -f 1 | sort | cut -d " " -f 2-
It produces this output:
line1 stuff after a tab
line2 line2
lineNEW
line3
line4
line5
line6
When you do not care about the order, just sort them:
cat ~/file1 ~/file2 | sort -u > ~/file3
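If order does not matter but you would rather keep lines in order of first appearance than sorted, the awk de-dupe idiom can read both files in one go. A sketch using the two sample files, written to a scratch directory here rather than to the home directory:

```shell
dir=$(mktemp -d)
printf '%s\n' line1 line2 line3   > "$dir/file1"
printf '%s\n' line1 lineNEW line3 > "$dir/file2"

# Keep the first occurrence of every line across both files.
merged=$(awk '!seen[$0]++' "$dir/file1" "$dir/file2" | paste -sd ' ' -)

# Sorted alternative; LC_ALL=C pins the collation so the order is predictable.
sorted=$(LC_ALL=C sort -u "$dir/file1" "$dir/file2" | paste -sd ' ' -)

echo "merged: $merged"
echo "sorted: $sorted"
rm -rf "$dir"
```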

can you print a record in awk [duplicate]

Probably a simple question but I haven't found an answer. I have a file with multiple records separated by a blank line. Each field in the file is separated by a newline. I simply want to print out the entire first record or the entire third record.
awk 'BEGIN{FS="";} {print $1}' output.txt
The above prints out the first letter of each line of the first record
awk 'BEGIN{FS="\n"; RS=""} {print $1}' output.txt
The above prints out the first field of the first record.
It seems a simple enough problem but I can't seem to solve it. Records have an indeterminate amount of fields (lines). They are simply separated by a blank line
ok here is a sample:
line1 record1
line2 record1
line3 record1
line4 record1

line1 record2
line2 record2
line3 record2
line4 record2
line5 record2

line1 record3

line1 record4
line2 record4
Now I want the entire first record and the entire 3rd record.
awk 'NR==1 || NR==3 {print $0}' output.txt
line1 record1
line3 record1
First and third lines of first record. no good
awk 'NR==1 || NR==3' output.txt
line1 record1
line3 record1
First and third lines of first record. no good
awk 'BEGIN{FS="\n"; RS=""} NR==1' output.txt
line1 record1
line2 record1
line3 record1
line4 record1
line1 record2
line2 record2
line3 record2
line4 record2
line5 record2
line1 record3
line1 record4
line2 record4
All printed out. no good
I simply want the first and third records.
The first being:
line1 record1
line2 record1
line3 record1
line4 record1
and the third being:
line1 record3
Ok so nothing spelled out seems to work for me and I'm well confused. Here is the shell output:
$ awk -v RS= -v ORS='\n\n' 'NR ~ /^(1|3)$/' output.txt
line1 record1
line2 record1
line3 record1
line4 record1
line1 record2
line2 record2
line3 record2
line4 record2
line5 record2
line1 record3
line1 record4
line2 record4
$ cat output.txt
line1 record1
line2 record1
line3 record1
line4 record1
line1 record2
line2 record2
line3 record2
line4 record2
line5 record2
line1 record3
line1 record4
line2 record4
$
I am very confused as to why this isn't working.
here is my system and the awk I'm using:
$ awk -V | head -1
GNU Awk 4.0.1
$ uname -a
Linux IEDUB2TJ5262 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$
Is there something I could be missing here?
This is THE idiomatic awk way to do what you want and it works in all awks, not just gawk:
$ awk -v RS= -v ORS='\n\n' 'NR ~ /^(1|3)$/' file
line1 record1
line2 record1
line3 record1
line4 record1

line1 record3
See http://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line and google awk paragraph mode.
If the above does not work for you then there is something wrong with your input file (or, far less likely, your awk is broken).
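One way an input file can go wrong (an assumption, since the question does not show the raw bytes) is separator lines that look blank but hold a space or a carriage return. Paragraph mode only splits on truly empty lines, so such a file collapses into fewer records. A sketch with an invented three-record file whose first separator contains a single space:

```shell
rec=$(mktemp)
# The separator after record1 deliberately holds one space (invented here
# to mimic the symptom); the separator after record2 is truly empty.
printf 'line1 record1\nline2 record1\n \nline1 record2\n\nline1 record3\n' > "$rec"

# As is: record1 and record2 run together, so awk sees two records.
count_raw=$(awk -v RS= 'END { print NR }' "$rec")

# After stripping trailing whitespace every separator is empty again.
count_clean=$(sed 's/[[:space:]]*$//' "$rec" | awk -v RS= 'END { print NR }')
third=$(sed 's/[[:space:]]*$//' "$rec" | awk -v RS= 'NR == 3')

echo "records before clean-up: $count_raw"
echo "records after clean-up:  $count_clean"
echo "third record: $third"
rm -f "$rec"
```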
awk 'NR==1 || NR==3 {print $0}' output.txt
Everything before the curly braces is called a pattern (a restriction).
It determines whether the action (everything within the curly braces) is executed. NR is the number of the current record, so output is restricted to the first and third records.
Every awk program is just a collection of patterns and actions.
EDIT:
Actually, I just realized that {print $0} is the default action if none is given, which means that:
awk 'NR==1 || NR==3' output.txt
is sufficient.
EDIT:
After you've explained yourself a bit more, I suggest this:
awk 'BEGIN {RS="\n\n"} NR==1 || NR==3' output.txt
It treats everything separated by two newlines as one record. A multi-character RS is not portable to every awk (POSIX leaves it unspecified); RS="" is the portable paragraph mode.
awk 'BEGIN{FS="\n"; RS=""} NR==1' output.txt
prints the first record.

Separate by blank lines in bash

I have an input like this:
Block 1:
line1
line2
line3
line4

Block 2:
line1
line2

Block 3:
line1
line2
line3
This is an example. Is there an elegant way to print only Block 2 and its lines without relying on the block names? It would be something like "separate the blocks by the blank lines and print the second block".
try this:
awk '!$0{i++;next;}i==1' yourFile
For performance, you can also exit once the 2nd block has been processed:
awk '!$0{i++;next;}i==1;i>1{exit;}' yourFile
test:
kent$ cat t
Block 1:
line1
line2
line3
line4

Block 2:
line1
line2

Block 3:
line1
line2
line3
kent$ awk '!$0{i++;next;}i==1' t
Block 2:
line1
line2
kent$ awk '!$0{i++;next;}i==1;i>1{exit;}' t
Block 2:
line1
line2
Set the record separator to the empty string to separate on blank lines. To
print the second block:
$ awk -v RS= 'NR==2{ print }'
(Note that this only separates on lines that do not contain any whitespace.
A line containing only white space is not considered a blank line.)
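Both approaches can be sketched side by side with the block number passed in as a variable. The shortened sample file and the variable n are illustrative. The counter version tests $0 == "" rather than !$0, because !$0 is also true for a line that contains only 0.

```shell
blocks=$(mktemp)
printf 'Block 1:\nline1\n\nBlock 2:\nline1\nline2\n\nBlock 3:\nline1\n' > "$blocks"

# Paragraph mode: each blank-line-separated block is one record.
by_rs=$(awk -v RS= -v n=2 'NR == n { print; exit }' "$blocks")

# Counter version: count the empty lines seen so far, print while on block 2.
by_counter=$(awk '$0 == "" { i++; next } i == 1' "$blocks")

printf '%s\n' "$by_rs"
rm -f "$blocks"
```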

Resources