How to remove lines that begin with the same character? - bash

I'm trying to clean up the output from someone else's script by removing the headers that have no content.
Output currently looks like this:
====== Header1 ======
====== Header2 ======
====== Header3 ======
information
I'm trying to remove the lines for Header1 and Header2, but not Header3. I found an awk command that removes all but the last of consecutive lines beginning with the same character, which helps here, but it causes a new problem when the 'information' part spans several lines that also begin with the same character (usually tabs).
Desired output post cleanup:
====== Header3 ======
information
Thanks

This awk might work for you:
$ awk '/^===/{h=$0;p=0;next}!p{print h};{p=1}1' file
====== Header3 ======
information
Or as Glenn pointed out, this also works:
awk '/^===/{h=$0;next}h{print h;h=0}1' file
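For readability, here is that second one-liner spelled out with comments (a sketch; the behavior is the same, using an empty string instead of 0 to reset h):
awk '
/^===/ { h = $0; next }      # remember the most recent header; print nothing yet
h      { print h; h = "" }   # first content line after a header: print that header once
1                            # print every content line
' file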

Related

Grabbing data from one file and sending it to another file using awk

I have a Jupyter notebook running in a directory full of output files.
The directory contains a number of .out files, and I want to run awk on them to extract some information.
This is the bash script that works, for the most part:
for file in *.out
do awk '/SCF TOTAL ENERGY/ {print $NF; exit}' "$file" >> data.txt
done
This grabs the SCF TOTAL ENERGY from each output file, prints them out, and throws them into data.txt.
However, that is not the only information I want from my output files.
Let's say I have another piece of information called "USEFUL".
I want to grab the number associated with "USEFUL" (also in the last field, $NF), create a new column in data.txt, and fill that column with the USEFUL data.
I know that I can create a new column in data.txt using
awk 'BEGIN{FS=OFS=" "}
{print $0 OFS }' data.txt
However, I don't know how to extract information from one file, send it to data.txt, and make a new column at the same time.
Input files look like this:
first.out
SCF TOTAL ENERGY ----> 1234
lorem
ipsum
text
here
more
text
USEFUL ---> 4567
second.out
SCF TOTAL ENERGY ----> 4321
lorem
ipsum
text
here
more
text
USEFUL ---> 7654
third.out
SCF TOTAL ENERGY ----> 5566
lorem
ipsum
text
here
more
text
USEFUL ---> 8877
I want my data.txt or final data file to look like:
1234 4567
4321 7654
5566 8877
Where the first column is SCF TOTAL ENERGY and the second column is USEFUL.
At the moment, I only have the first column. I want to write a script that keeps extracting information from my input files and keeps adding columns.
Any advice you have is appreciated!!
Could you please try the following, written and tested with the shown samples in GNU awk. There is no need for a for loop to go through all the .out files; awk can read all the .out files itself.
awk '/SCF TOTAL ENERGY/{scfVal=$NF;next} /USEFUL/{print scfVal,$NF;scfVal=""}' *.out
Explanation: a detailed explanation of the above command.
awk ' ##Starting the awk program from here.
/SCF TOTAL ENERGY/{ ##If the line contains SCF TOTAL ENERGY, then do the following.
scfVal=$NF ##Set scfVal to the last field of the current line.
next ##next skips the remaining statements for this line.
}
/USEFUL/{ ##If the line contains USEFUL, then do the following.
print scfVal,$NF ##Print scfVal and the last field of this line.
scfVal="" ##Reset scfVal to an empty string.
}
' *.out ##Pass all .out files to the awk program.
NOTE: If each file contains only one SCF TOTAL ENERGY / USEFUL pair, add nextfile after the scfVal="" line to process the files faster (this requires GNU awk), as sketched below.
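For reference, that nextfile variant would look like this (a sketch; it assumes one SCF TOTAL ENERGY / USEFUL pair per file and GNU awk for nextfile):
awk '
/SCF TOTAL ENERGY/ { scfVal = $NF; next }                       # grab the energy value from its line
/USEFUL/           { print scfVal, $NF; scfVal = ""; nextfile } # print the pair, then jump to the next file
' *.out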

Extract a set of lines using delimiters in bash [duplicate]

This question already has answers here:
How to select lines between two marker patterns which may occur multiple times with awk/sed
(10 answers)
Closed 5 years ago.
I am trying to extract set of lines between specific patterns in bash.
My input file:
=========
a
b
ven
c
d
=========
abc
def
venkata
sad
dada
=========
I am trying to extract only the lines between two ========= delimiters that contain the pattern venkata, i.e. the second section in the example above (abc ... dada).
I have tried sed, but it does not give what I need exactly.
I tried splitting this task into getting the lines above venkata and the lines below it separately.
Using sed -n -e '/=====/,/venkata/p' prints from the beginning of the input onwards, which is not what I need.
Any thoughts?
Edit: The number of lines between the ======= delimiters can vary, and venkata can be on any line, not necessarily the exact middle. Each line can contain multiple words, numbers, and symbols. This is just a sample.
Edit 2: The accepted answer to How to select lines between two marker patterns which may occur multiple times with awk/sed is close, but it gives the output from the first match, which is not what I am looking for. Based on the command in that answer, the flag would be set when the first ==== is found. I need the ==== just before venkata, which need not be the very first match, so that answer does not solve my problem.
Using grep you can accomplish the same:
grep -A 2 -B 2 "venkata" infile
The options -A and -B print a number of trailing and leading lines respectively.
As pointed out by @Jan Gassen, if you want the same number of lines below and above the matching pattern, you can make it even simpler:
grep -C 2 "venkata" infile
Using gnu-awk you can do this:
awk -v RS='={2,}\n' -v ORS= '/venkata/' file
abc
def
venkata
sad
dada
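For readability, here is that GNU awk command with its pieces commented (a sketch; the behavior is unchanged):
awk -v RS='={2,}\n' -v ORS= '   # RS: a record ends at a run of = signs followed by a newline
  /venkata/                     # no action given, so the whole matching record is printed
' file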
If you don't have gnu-awk then use:
awk '/={2,}/{if (s && data ~ /venkata/) printf "%s", data; s=1; data=""; next} s{data = data $0 RS}' file
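And the portable version, spelled out with comments (same logic as the one-liner above):
awk '
/={2,}/ {                            # at every ===== delimiter line
    if (s && data ~ /venkata/)       # if the block just collected contains venkata
        printf "%s", data            # print that whole block
    s = 1                            # note that we are past the first delimiter
    data = ""                        # start collecting the next block
    next
}
s { data = data $0 RS }              # accumulate the current block, line by line
' file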

Delete header/column from .txt file with bash

I'm automating a workflow with a bash script on Mac OSX. In this workflow, I'd like to add a command that deletes a header from my table (.txt) file that is tab delimited. It looks as follows:
header1 header2 header3
a 1
b 2
c 3
d 4
e 5
f 6
As you can see, the third column, named header3, is empty.
I've looked at this post and this one, but I don't understand the arguments.
Could you suggest a line of code that automatically deletes the third column, or (even better) deletes the header called 'header3'?
awk is designed to work with whitespace-separated text columns:
awk '{print $1 "\t" $2}' input.txt > output.txt
I found the answer here in Table 2C.
sed 's/header3//g' input.txt > output.txt
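If you would rather drop the whole column by its header name instead of by position, a small awk sketch along these lines could work (it assumes a tab-delimited file, as described in the question, and takes the header name to drop as a variable):
awk -F'\t' -v OFS='\t' -v drop='header3' '
NR == 1 {                                  # locate the column whose header matches
    for (i = 1; i <= NF; i++)
        if ($i == drop) col = i
}
{                                          # reprint every field except that column
    out = ""
    for (i = 1; i <= NF; i++)
        if (i != col)
            out = out (out == "" ? "" : OFS) $i
    print out
}' input.txt > output.txt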

How to add a string to the empty lines of a file in a script?

Example: file1 has data like this:
abc
cab

def
xxy
zay

sri

ram
In this file the 3rd, 7th, and 9th lines are empty. How can I fill these empty lines with a specific string?
For example, if I want to fill these lines with Hello, the output file should look like this:
abc
cab
Hello
def
xxy
zay
Hello
sri
Hello
ram
sed 's/^$/Hello/' file1
will output what you want.
You can redirect that to the output file as below.
sed 's/^$/Hello/' file1 > file2
If you want to change the original file itself, you can use the -i option.
sed -i 's/^$/Hello/' file1
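If the 'empty' lines might actually contain spaces or tabs (an assumption about your input, not something shown in the sample), a slightly wider pattern covers that case as well:
sed 's/^[[:space:]]*$/Hello/' file1 > file2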

How to remove a long line with special characters from a big file in bash

I have a file with 200 lines, and I want to remove 5 long lines (each of which contains special characters).
$ cat abc
............
comments[asci?_203] part of jobs where to delete
5 similar lines
.....
I tried sed to remove these 5 lines, using line numbers (from nl) on the file, but it did not work.
Thanks
Have you tried removing the lines with awk? This is untested with special characters, but it might work:
awk '{if (length($0)<55) print $0}' < abc
Replace 55 with the maximum line length you want to keep.
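If you want to overwrite the original file rather than print to the screen, one way (still assuming 55 as the length cutoff) is:
awk 'length($0) < 55' abc > abc.tmp && mv abc.tmp abc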
