bash grep for a string and also ignore the line above it

One of my scripts returns output like the following:
NameComponent=Apache
Fixed=False
NameComponent=MySQL
Fixed=True
From the above output, I am trying to exclude the following lines using grep -vB1 'False', but it does not seem to work:
NameComponent=Apache
Fixed=False
Is it possible to do this using grep, or is there a better way with awk?

<some-command> | tac | sed -e '/False/ { N; d; }' | tac
NameComponent=MySQL
Fixed=True
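For every line that matches "False", the code in the {} gets executed: N appends the next line to the pattern space, and d then deletes both lines before moving on. Because tac reverses the input first, that "next" line is actually the NameComponent line that came before Fixed=False in the original output; the second tac restores the original order. Note: chaining multiple pipes like this is not considered good practice.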
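A single-pass sketch that avoids the two tac calls: it drops every line containing False together with the line immediately before it, by holding each line back until the next one has been seen. The function name drop_false_pairs is hypothetical; like the sed version, it matches False anywhere in a line.

```shell
# drop_false_pairs: hypothetical helper; reads stdin, writes stdout.
# Drops each line containing "False" and the line just before it.
drop_false_pairs() {
  awk '
    /False/ { have = 0; next }                      # discard this line and the held one
    { if (have) print prev; prev = $0; have = 1 }   # release the held line, hold this one
    END { if (have) print prev }                    # flush the last held line
  '
}

# Usage with the sample output from the question:
printf 'NameComponent=Apache\nFixed=False\nNameComponent=MySQL\nFixed=True\n' | drop_false_pairs
# NameComponent=MySQL
# Fixed=True
```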

@Karthi1234: If your Input_file is the same as the provided sample, then try:
awk -F' |=' '($2 != "Apache" && $2 != "False")' Input_file
This sets the field separator to either a space or =, then checks that the 2nd field is neither Apache nor False. No action is given, so awk performs its default action and prints the line.
EDIT: As per the OP's request, here is the changed code; try:
awk '!/Apache/ && !/False/' Input_file
You can change the strings too if these are not the ones you want; the logic stays the same.
EDIT2: For example, you could change the values of string1 and string2, and add more conditions as your requirements need:
awk '!/string1/ && !/string2/' Input_file
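If the strings change often, one option is to pass them in with -v instead of editing the program text. This is a sketch; exclude_both is a hypothetical name, and the values are used as regular expressions, so characters such as . or * keep their regex meaning (awk also processes backslash escapes in -v values).

```shell
# exclude_both: hypothetical helper; prints stdin lines matching neither regex.
# $1 and $2 are used as awk dynamic regular expressions.
exclude_both() {
  awk -v s1="$1" -v s2="$2" '$0 !~ s1 && $0 !~ s2'
}

printf 'NameComponent=Apache\nFixed=False\nNameComponent=MySQL\nFixed=True\n' | exclude_both Apache False
# NameComponent=MySQL
# Fixed=True
```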

If I understand the question correctly, you will always have a line before "Fixed=...", and you want to print both lines if and only if it is "Fixed=True".
The following awk should do the trick:
<command> | awk 'BEGIN {prev="NA"} {if ($0=="Fixed=True") {print prev; print $0;} prev=$0;}'
Note that if the first line is "Fixed=True" it will print the string "NA" as the first line.
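The same logic can also be written with awk condition-action pairs instead of an explicit if. A minimal sketch, with keep_fixed_true as a hypothetical name; the rules run in order for each line, so prev still holds the previous line when the Fixed=True rule fires.

```shell
# keep_fixed_true: hypothetical helper; reads stdin, writes stdout.
keep_fixed_true() {
  awk '
    BEGIN { prev = "NA" }                      # placeholder if Fixed=True is the first line
    $0 == "Fixed=True" { print prev; print }   # emit the preceding line, then this one
    { prev = $0 }                              # remember this line for the next record
  '
}

printf 'NameComponent=Apache\nFixed=False\nNameComponent=MySQL\nFixed=True\n' | keep_fixed_true
# NameComponent=MySQL
# Fixed=True
```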

Related

Print part of a comma-separated field using AWK

I have a line containing this string:
$DLOAD , 123 , Loadcase name=SUBCASE_1
I am trying to only print SUBCASE_1. Here is my code, but I get a syntax error.
awk -F, '{n=split($3,a,"="); a[n]} {printf(a[1]}' myfile
How can I fix this?
1st solution: If you only want the last field (the part after =), then with your shown samples please try the following:
awk -F',[[:space:]]+|=' '{print $NF}' Input_file
2nd solution: Or, if you specifically want the value after = in the 3rd field, try the following awk code. It sets a comma followed by one or more spaces as the field separator, then in the main program splits the 3rd field on = into the array arr and prints its 2nd element.
awk -F',[[:space:]]+' '{split($3,arr,"=");print arr[2]}' Input_file
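For comparison, a minimal fix of the attempt in the question: the syntax error comes from the unclosed parenthesis in printf(a[1], and a[1] holds the text before = (" Loadcase name"), not SUBCASE_1. Printing a[n], the last element returned by split, gives the value. The wrapper name subcase_name is hypothetical.

```shell
# subcase_name: hypothetical wrapper around the corrected one-liner.
# With -F, the 3rd field is " Loadcase name=SUBCASE_1"; split it on "=".
subcase_name() {
  awk -F, '{ n = split($3, a, "="); print a[n] }' "$@"
}

subcase_name << 'eof'
$DLOAD , 123 , Loadcase name=SUBCASE_1
eof
# SUBCASE_1
```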
Possibly the shortest solution would be:
awk -F= '{print $NF}' file
Here you simply use '=' as the field separator and then print the last field.
Example Use/Output
Using your sample input in a heredoc, with the delimiter quoted to prevent expansion of $DLOAD, you would have:
$ awk -F= '{print $NF}' << 'eof'
> $DLOAD , 123 , Loadcase name=SUBCASE_1
> eof
SUBCASE_1
(Of course, in this case it probably doesn't matter whether $DLOAD is expanded or not; the quoting is there for completeness, in case $DLOAD included another '=' ...)

Using "awk" to find a string between two other specific strings:

I have a large output of text that'll include several lines like this:
sending:WHATIWANT:output
How would I use awk to make it so that this output would ONLY include WHATIWANT on each line?
Edit: the amount of text before and after WHATIWANT varies, so something like awk -F: '{print $2}' would not always work.
From what you mention in the comments, this should do it:
perl -lne '/sending:([^:]+):output/ && print $1' input_file
This runs a simple regex match line by line, capturing the interesting part and printing it (-l adds a newline after each printed match). It assumes that WHATIWANT does not contain the character :.
If for some reason you absolutely must use awk(1), then I think you don't have much choice but to do this:
awk -F: '{ for (i = 2; i < NF; i++) if ($(i-1) == "sending" && $(i+1) == "output") print $i }' input_file
It splits each line on : and iterates through the fields, comparing each field's left and right neighbours, and prints every field that sits between sending and output. Again, it assumes that WHATIWANT does not contain a :.
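Another option inside awk itself, sketched here with POSIX match(), which sets RSTART and RLENGTH to the position and length of the matched text. The fixed prefix "sending:" (8 characters) and suffix ":output" (7 characters) are then trimmed with substr. The helper name is hypothetical, and it shares the same assumption that WHATIWANT contains no :.

```shell
# between_sending_output: hypothetical helper; reads files given as arguments, or stdin.
# RSTART + 8 skips "sending:", and RLENGTH - 15 drops both "sending:" and ":output".
between_sending_output() {
  awk 'match($0, /sending:[^:]+:output/) { print substr($0, RSTART + 8, RLENGTH - 15) }' "$@"
}

echo "asfasfdsf__sending:WHATIWANT:output__asdfadas" | between_sending_output
# WHATIWANT
```

Unlike the field loop, this does not need sending and output to be whole :-separated fields, so surrounding text such as asfasfdsf__ does not get in the way.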
Can't you just use sed?
echo "asfasfdsf__sending:WHATIWANT:output__asdfadas" | sed -n 's/.*sending:\([a-zA-Z0-9]*\):output.*/\1/p'
This gives you "WHATIWANT".

retaining text after delimiter in fasta headers using awk

I have what should be a simple problem, but my lack of awk knowledge is holding me back.
I would like to clean up the headers of a fasta file that is in this format:
>HWGG454_Clocus2_Locus3443_allele1
ATTCTACTACTACTCT
>GHW757_clocus37_Locus555662_allele2
CTTCCCTACGATG
>TY45_clocus23_Locus800_allele0
TTCTACTTCATCT
I would like to clean up each header (line starting with ">") to retain only the informative part, which is the second "_Locus*" with or without the allele part.
I thought awk would be the easy way to do this, but I can't quite get it to work.
If I want to retain just the first column of text up to the "_" delimiter for the header, and the sequences below, I run this (assuming this toy example is in the file test.fasta):
cat test.fasta | awk -F '_' '{print $1}'
>HWGG454
ATTCTACTACTACTCT
>GHW757
CTTCCCTACGATG
>TY45
TTCTACTTCATCT
But what I want is to retain just the "Locus*" text, which is the 3rd field (after the 2nd delimiter); using this code, I get this:
cat test.fasta | awk -F '_' '{print $3}'
Locus3443
Locus555662
Locus800
What am I doing wrong here?
Thanks.
I understand this to mean that you want to pick the Locus field from the header lines and leave the others unchanged. Then:
awk -F _ '/^>/ { print $3; next } 1' filename
is perhaps the easiest way. This works as follows:
/^>/ {        # in lines that begin with >
    print $3  # print the third field
    next      # and go to the next line.
}
1             # print other lines unchanged. Here 1 means true, and the
              # default action (unchanged printing) is performed.
The thing to understand here is awk's control flow: awk code consists of conditions with associated actions, and the actions are performed if the condition evaluates to true.
/^>/ is a regex match over the whole record (line by default); it is true if the line begins with > (because ^ matches the beginning), so
/^>/ { print $3; next }
will make awk execute print $3; next in lines that begin with >. The less straightforward part is
1
which prints lines unchanged. We only get here if the first action was not executed (because of the next in it), and this 1 is to be read as a condition that is always true, since nonzero values are true in awk.
Now, if either the condition or the action in an awk statement is omitted, a default is used. The default action is printing the line unchanged, and this code takes advantage of it. It would be equally possible to write
1 { print }
or
{ print }
In the latter case, the condition is omitted and the default condition "true" is used. 1 is the shortest variant of this, which is why it is idiomatic.
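One detail of the one-liner: print $3 drops the leading >, so the result is no longer a FASTA header line. If you want to keep the marker, a small variation prefixes it back; the control flow (condition, action, next, and the trailing 1) is unchanged. locus_headers is a hypothetical name.

```shell
# locus_headers: hypothetical helper; keeps ">" on header lines so the output stays FASTA.
locus_headers() {
  awk -F _ '/^>/ { print ">" $3; next } 1' "$@"
}

printf '>HWGG454_Clocus2_Locus3443_allele1\nATTCTACTACTACTCT\n' | locus_headers
# >Locus3443
# ATTCTACTACTACTCT
```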
$ awk -F_ '{print (/^>/ ? $3 : $0)}' file
Locus3443
ATTCTACTACTACTCT
Locus555662
CTTCCCTACGATG
Locus800
TTCTACTTCATCT
You need a second awk pattern to handle the sequence row below each header, e.g.:
cat test.fasta | awk -F _ '/^>/ { print $3"_"$4 } /^[A-Z]/ {print $1}'
Output:
Locus3443_allele1
ATTCTACTACTACTCT
Locus555662_allele2
CTTCCCTACGATG
Locus800_allele0
TTCTACTTCATCT
If you don't want the _allele1 part, remove "_"$4 from the awk script.
You can just do a regex substitution on each line:
$ awk '{ sub(/^.*_L/,"L"); print $0}' /tmp/fasta.txt
Locus3443_allele1
ATTCTACTACTACTCT
Locus555662_allele2
CTTCCCTACGATG
Locus800_allele0
TTCTACTTCATCT

setting the NR to 1 does not work (awk)

I have the following script in bash.
awk -F ":" '{if($1 ~ "^fall")
{ NR==1
{{printf "\t<course id=\"%s\">\n",$1} } } }' file1.txt > container.xml
What I have is a small file. If ANY line starts with fall, then I want the first field of the VERY first line.
So I did that in the code and set NR==1. However, it does not do the job!!!
Try this:
awk -F: 'NR==1 {id=$1} $1~/^fall/ {printf "\t<course id=\"%s\">\n",id}' file1.txt > container.xml
Notes:
NR==1 {id=$1}
This saves the course ID from the first line
$1~/^fall/ {printf "\t<course id=\"%s\">\n",id}
If any line begins with fall, then the course ID is printed.
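The above code illustrates that awk commands can be preceded by conditions. Thus, id=$1 is executed only if we are on the first line: NR==1. In this way, it is often unnecessary to have explicit if statements.
In awk, assignment is done with =, while tests for equality are done with ==. So NR==1 inside an action, as in the question, is only a comparison whose result is discarded; it neither sets NR nor restricts anything to the first line.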
If this doesn't do what you want, then please add sample input and corresponding desired output to the question.
awk -F: 'NR==1{x=$1}/^fall/{printf "\t<course id=\"%s\">\n",x;exit}' file
Note:
If the file has any line beginning with fall, it prints the 1st field of the very first line in the given format (an XML tag).
No matter how many lines start with fall, it outputs the XML tag only once, because exit stops processing after the first match.
If no line starts with fall, it outputs nothing.
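To see the difference between the two answers, here is a sketch that runs both programs on a hypothetical three-line input with two lines starting with fall. The function names are hypothetical, and the second program uses fall, as in the question.

```shell
# tag_every_fall: prints the tag once for every line that starts with fall.
tag_every_fall() {
  awk -F: 'NR==1 {id=$1} $1~/^fall/ {printf "\t<course id=\"%s\">\n",id}' "$@"
}
# tag_first_fall: prints the tag at most once, because exit stops at the first match.
tag_first_fall() {
  awk -F: 'NR==1{x=$1}/^fall/{printf "\t<course id=\"%s\">\n",x;exit}' "$@"
}

# Hypothetical input: course id on line 1, two lines starting with fall.
printf 'CS101:Intro\nfall:2020\nfall:2021\n' | tag_every_fall   # two tags
printf 'CS101:Intro\nfall:2020\nfall:2021\n' | tag_first_fall   # one tag
```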
#!/usr/bin/awk -f
BEGIN {
    FS = ":"
}
NR==1 {
    foo = $1
}
/^fall/ {
    printf "\t<course id=\"%s\">\n", foo
}
Also note this from the BUGS section of the awk man page:
The -F option is not necessary given the command line variable assignment feature; it remains only for backwards compatibility.
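As an illustration of that command-line variable assignment feature, here is a sketch of the first answer's program with FS set by an FS=: operand placed before the input instead of -F. POSIX awk performs such an assignment before reading the file that follows it (or before standard input when no file is given), so line 1 is already split on :. The function name course_tag is hypothetical.

```shell
# course_tag: hypothetical helper; FS=: is an assignment operand, not a file name.
course_tag() {
  awk 'NR==1 {id=$1} $1~/^fall/ {printf "\t<course id=\"%s\">\n",id}' FS=: "$@"
}

printf 'CS101:Intro\nfall:2020\n' | course_tag
```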

searching multi-word patterns from one file in another using awk

patterns file:
wicked liquid
movie
guitar
balance transfer offer
drive car
bigfile file:
wickedliquidbrains
drivelicense
balanceofferings
Using awk on the command line:
awk '/balance/ && /offer/' bigfile
I get the result I want, which is:
balanceofferings
awk '/wicked/ && /liquid/' bigfile
gives me
wickedliquidbrains, which is also good.
awk '/drive/ && /car/' bigfile
does not give me drivelicense, which is also good, since I am using &&.
Now I am trying to pass a shell variable containing those '/regex1/ && /regex2/ ...' expressions to awk:
awk -v search="$out" '$0 ~ search' "$bigfile"
but awk does not do what I expect. What could be the problem?
Try this:
awk "$out" "$bigfile"
When you do $0 ~ search, the value of search is used as a single regular expression. You were setting it to a string containing several /regex/ patterns joined by &&, so awk treats the whole string, slashes and ampersands included, as one regexp that matches only that literal text; it is not evaluated as awk code.
To perform an action on the lines that match, do:
awk "$out"' {
    # do stuff here
}' "$bigfile"
I switched from double quotes to single quotes for the action so that the shell does not expand awk field references such as $1 inside it.
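A small sketch contrasting the two approaches on the sample data from the question; the helper names are hypothetical. Note that with awk "$out" the contents of $out are executed as awk code, so it should only be built from trusted input.

```shell
# run_program: hypothetical helper; runs $1 as an awk program over stdin.
run_program() { awk "$1"; }
# run_as_regex: hypothetical helper; uses $1 as one dynamic regex over stdin.
run_as_regex() { awk -v search="$1" '$0 ~ search'; }

out='/balance/ && /offer/'
printf 'wickedliquidbrains\ndrivelicense\nbalanceofferings\n' | run_program "$out"   # balanceofferings
printf 'wickedliquidbrains\ndrivelicense\nbalanceofferings\n' | run_as_regex "$out"  # no output
```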
UPDATED
An alternative to Barmar's solution, passing the pattern with -v (this works when search holds a single regex, not an awk expression containing &&):
awk -v search="$out" 'match($0,search)' "$bigfile"
Test:
$ echo -e "one\ntwo"|awk -v luk=one 'match($0,luk)'
one
Passing two (real) regexes (EREs) to awk:
echo -e "one\ntwo\nnone"|awk -v re1='^o' -v re2='e$' 'match($0,re1) && match($0,re2)'
Output:
one
If you want to read patterns_file and require every word of a pattern line to match a line of bigfile, you could try something like this:
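awk 'NR==FNR {                          # first file (patterns_file)
       N = NR
       re[N,0] = split($0, a)            # element 0: number of words on this line
       for (i in a) re[N,i] = a[i]       # each word becomes a separate regex
       next
     }
     {                                   # second file (bigfile)
       for (i = 1; i <= N; ++i) {
         #for (j = 1; j <= re[i,0] && match($0, re[i,j]); ++j);
         for (j = 1; j <= re[i,0] && $0 ~ re[i,j]; ++j);   # advance while words match
         if (j > re[i,0]) { print; break }                  # every word matched
       }
     }' patterns_file bigfile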
Output:
wickedliquidbrains
The first block reads patterns_file and stores it in the 2D array re: each row holds the words of one pattern line, and element 0 of each row holds the number of words in that row.
Then it reads bigfile and tests each of its lines against the rows of re. If every word in some row matches, the line is printed, and break ensures it is printed only once.
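A self-contained way to try the script on the sample data from the question, sketched as a hypothetical wrapper function plus temporary files created with mktemp (an assumption about your environment).

```shell
# match_all_words: hypothetical wrapper around the script above.
# $1 = patterns file, $2 = file to search.
match_all_words() {
  awk 'NR==FNR { N = NR; re[N,0] = split($0, a); for (i in a) re[N,i] = a[i]; next }
       {
         for (i = 1; i <= N; ++i) {
           for (j = 1; j <= re[i,0] && $0 ~ re[i,j]; ++j);
           if (j > re[i,0]) { print; break }
         }
       }' "$1" "$2"
}

# Sample data from the question, written to temporary files.
patterns=$(mktemp); big=$(mktemp)
printf 'wicked liquid\nmovie\nguitar\nbalance transfer offer\ndrive car\n' > "$patterns"
printf 'wickedliquidbrains\ndrivelicense\nbalanceofferings\n' > "$big"
match_all_words "$patterns" "$big"    # prints: wickedliquidbrains
rm -f "$patterns" "$big"
```

balanceofferings is not printed because its pattern line also requires transfer. One caveat: a blank line in the patterns file splits into zero words, so the inner loop never runs and every line of the searched file would be printed.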
