How to catch xth pattern1 to pattern2 - bash

Here is an example to explain my question:
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
So, I have a big file that contains multiple security vulnerabilities.
I want to obtain this:
2022-01-13;248975
2022-01-25;225489,225256,225236
2022-02-02;222599
I thought about doing something like
bugDayNb=$(grep "Bug Day" | wc -l)
for i in $bugDayNb; do
echo "myBugsFile" | grep -A10 -m$i "Bug Day"
done
The problem with this command is that it won't work if there are more than 10 Security-Fail lines, and if I use "-A50" it may pick up the Security-Fail lines of the next Bug Day.
So I would prefer a way to use sed or something similar to go from the xth "Bug Day" to the xth "Resolve".
Thank you !!

Here's one way to do it:
$ awk '/^Bug Day/{d=$NF; s=""}
/^Security-Fail/{d = d s $NF; s=","}
/^Resolve:/{print d}' ip.txt
2022-01-13:248975
2022-01-25:225489,225256,225236
2022-02-02:222599
/^Bug Day/{d=$NF; s=""} save the date to variable d if line starts with Bug Day and initialize s to empty string
use {d=$NF; sub(/:$/, ";", d); s=""} if you want ; instead of :
/^Security-Fail/{d = d s $NF; s=","} when line starts with Security-Fail append the number to d variable and set s so that further appends will be separated by ,
/^Resolve:/{print d} print the results when Resolve: is seen
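Putting the sub() tweak together with the rest, a minimal sketch that yields the ;-separated form asked for in the question might look like this. The helper name extract_bugs and the temporary sample file are only there for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print "date;id1,id2,..." for each Bug Day block of file $1.
extract_bugs() {
    awk '
        /^Bug Day/       { d = $NF; sub(/:$/, ";", d); s = "" }  # date, with ":" turned into ";"
        /^Security-Fail/ { d = d s $NF; s = "," }                # append id, comma-separated
        /^Resolve:/      { print d }                             # end of block: emit the line
    ' "$1"
}

# Illustration only: sample input taken from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
EOF

result=$(extract_bugs "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that a block without any Security-Fail line would still print its date followed by a bare ;, so add a guard if such blocks can occur in your file.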

With your shown samples, please try following awk program.
awk '
/Bug Day/{
sub(/:$/,"",$NF)
bugVal=$NF
next
}
/^Security-Fail/{
secVal=(secVal?secVal ",":"")$NF
next
}
/^Resolve:/ && bugVal && secVal{
print bugVal";"secVal
bugVal=secVal=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/Bug Day/{ ##Checking condition if line contains Bug day then do following.
sub(/:$/,"",$NF) ##Substituting : at last of $NF in current line.
bugVal=$NF ##Creating bugVal which has $NF value in it.
next ##next will skip all further statements from here.
}
/^Security-Fail/{ ##Checking if line starts from Security-Fail then do following.
secVal=(secVal?secVal ",":"")$NF ##Creating secVal which has $NF value in it and keep adding value to it with delimiter of comma here.
next ##next will skip all further statements from here.
}
/^Resolve:/ && bugVal && secVal{ ##Checking condition if line starts from Resolve: and bugVal is SET and secVal is SET then do following.
print bugVal";"secVal ##printing bugVal semi-colon secVal here.
bugVal=secVal="" ##Nullifying bugVal and secVal here.
}
' Input_file ##mentioning Input_file name here.

This might work for you (GNU sed):
sed -nE '/Bug Day/{:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,(.*),.*/;\1/p}' file
Gather up lines between Bug Day and Resolve and format accordingly.
If you want to be selective about a single day or range of days, use:
sed -nE '/Bug Day/{x;s/^/x/;/^x{1,3}$/!{x;d};x
:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,/;/;s/(.*),.*/\1/p}' file
The above command displays the first 3 days, i.e. days 1 to 3.
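If GNU sed is not available, the same "xth Bug Day to xth Resolve" selection can be sketched in awk by counting blocks. This is only a sketch under the assumption that every block starts with a "Bug Day" line and ends with "Resolve:"; bug_days_range, first and last are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the summary lines for blocks $2..$3 of file $1.
bug_days_range() {
    awk -v first="$2" -v last="$3" '
        /^Bug Day/       { n++; d = $NF; sub(/:$/, ";", d); s = "" }  # n = index of the current block
        /^Security-Fail/ { d = d s $NF; s = "," }                     # collect ids, comma-separated
        /^Resolve:/ && n >= first && n <= last { print d }            # print only the requested blocks
    ' "$1"
}

# Illustration only: sample input taken from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
EOF

result=$(bug_days_range "$sample" 2 3)
printf '%s\n' "$result"
rm -f "$sample"
```

Calling it with the same number twice, e.g. bug_days_range file 2 2, selects a single day.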

Would you please try an awk solution:
awk '/^Bug Day/ {f=1; line=$0; next} # start of block
f {line=line ORS $0} # append the line if "f" is set
/^Security-Fail/ {g=1} # the block contains "Security-Fail"
/^Resolve/ {if (g) print line; f=g=0; line=""} # end of block
' input_file
If you prefer a one-liner:
awk '/^Bug Day/{f=1; line=$0; next} f{line=line ORS $0} /^Security-Fail/{g=1} /^Resolve/{if (g) print line; f=g=0; line=""}' input_file

Related

sed on mac: how to print a new-line for every range-match

For range-matches, wondering how to print a new line along with the match.
For example,
if the content of a file called content.txt is like
one
begin
two
three
end
four
begin
five
six
end
seven
then, this is the output I get with the following sed command
$ sed -n -e '/begin/,/end/p' content.txt
begin
two
three
end
begin
five
six
end
Instead, how can I get the output like the following (with an empty line after each range):
begin
two
three
end

begin
five
six
end

This might work for you:
sed -n -e '/begin/,/end/{/end/G;p;}' file
Print the range begin to end and append the hold space when end matches.
See here for one liner explanations.
Pipe the output through sed again:
sed -n -e '/begin/,/end/p' content.txt | sed 's/^end$/end\n/'
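Since the question is about macOS, note that \n in the replacement part of s/// is not specified by POSIX and may not produce a newline with the sed shipped there. A sketch that sidesteps the issue by doing the whole job in awk could look like this; print_ranges is a hypothetical helper name, and the sample file is created only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each begin..end range of file $1, followed by an empty line.
print_ranges() {
    awk '/begin/,/end/ { print; if (/end/) print "" }' "$1"
}

# Illustration only: sample input taken from the question.
sample=$(mktemp)
printf '%s\n' one begin two three end four begin five six end seven > "$sample"

result=$(print_ranges "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The range pattern /begin/,/end/ behaves like the sed address range, and the extra print "" emits the empty line whenever the closing line is reached.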
With your shown samples, please try following awk code.
awk '
/^end$/{
if(found){
print val ORS $0 ORS
}
found=val=""
}
/^begin$/{
val=""
found=1
}
found{
val=(val?val ORS:"")$0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^end$/{ ##checking condition if line is equal to string end.
if(found){ ##Checking if found is NOT NULL.
print val ORS $0 ORS ##Then printing val ORS current line ORS here.
}
found=val="" ##Nullifying found and val here.
}
/^begin$/{ ##Checking condition if line is equal to string begin.
val="" ##Nullifying val here.
found=1 ##Setting found to 1 here.
}
found{ ##Checking if found is NOT NULL.
val=(val?val ORS:"")$0 ##Then keep adding current line to val variable here.
}
' Input_file ##Mentioning Input_file name here.

issue for condition on unique rows in bash

I want to print the rows of a table from a file. The issue is that when I read it line by line, the result gets printed several times. Here is my input file:
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
Here is the result I want to have:
printed in file1.txt
aa#DEC,CHAR
cc#CHAR,DEC
printed in file2.txt
dd#DEC
printed in file3.txt
bb#DEC,CHAR
Here is my attempt:
(cat input.txt|while read line
do
table=`echo $line|cut -d"," -f1
variable=`echo $line|cut -d"," -f2
file=`echo $line|cut -d"," -f3
echo ${table}#${variable},
done ) > ${file}
This can be done in a single pass gnu awk like this:
awk -F ' *, *' '{
map[$3][$1] = (map[$3][$1] == "" ? "" : map[$3][$1] ",") $2
}
END {
for (f in map)
for (d in map[f])
print d "#" map[f][d] > f
}' file
This will populate this data:
=== file1.txt ===
aa#DEC,CHAR
cc#CHAR,DEC
=== file2.txt ===
dd#DEC
=== file3.txt ===
bb#DEC,CHAR
With your shown samples, could you please try following, written and tested in shown samples in GNU awk.
awk '
{
sub(/^,/,"",$3)
}
FNR==NR{
sub(/^,/,"",$2)
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
next
}
(($1,$3) in arr){
close(outputFile)
outputFile=$3
print $1"#"arr[$1,$3] >> (outputFile)
delete arr[$1,$3]
}
' Input_file Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
sub(/^,/,"",$3) ##Substituting starting comma in 3rd field with NULL.
}
FNR==NR{ ##Checking condition FNR==NR will be true when first time Input_file is being read.
sub(/^,/,"",$2) ##Substituting starting comma with NULL in 2nd field.
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
##Creating arr with index of 1st and 3rd fields, which has 2nd field as value.
next ##next will skip all further statements from here.
}
(($1,$3) in arr){ ##Checking condition if 1st and 3rd fields are in arr then do following.
close(outputFile) ##Closing output file, to avoid "too many opened files" error.
outputFile=$3 ##Setting outputFile with value of 3rd field.
print $1"#"arr[$1,$3] >> (outputFile)
##printing 1st field # arr value and output it to outputFile here.
delete arr[$1,$3] ##Deleting array element with index of 1st and 3rd field here.
}
' Input_file Input_file ##Mentioning Input_file 2 times here.
You have several errors in your code. You can use the built-in read to split on a comma, and the parentheses are completely unnecessary.
while IFS=, read -r table variable file
do
echo "${table}#${variable}," >>"$file"
done< input.txt
Using $file in a redirect after done is an error; the shell wants to open the file handle to redirect to before file is defined. But as per your requirements, each line should go to a different file anyway.
Notice also quoting fixes and the omission of the useless cat.
Wrapping fields with the same value onto the same line would be comfortably easy with an Awk postprocessor, but then you might as well do all of this in Awk, as in the other answer you already received.
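For completeness, here is a sketch of doing the grouping in awk without GNU-only arrays of arrays, using a SUBSEP-joined key instead. It assumes the fields are separated by a comma with optional surrounding spaces and carry no trailing whitespace; group_rows and the temporary directory are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: group types per (file, table) and append "table#T1,T2" to each file.
group_rows() {
    awk -F ' *, *' '
        {
            key = $3 SUBSEP $1
            if (!(key in types)) { order[++n] = key; types[key] = $2 }  # remember first-seen order
            else                 { types[key] = types[key] "," $2 }     # same key: append the type
        }
        END {
            for (i = 1; i <= n; i++) {
                split(order[i], part, SUBSEP)           # part[1] = file name, part[2] = table
                print part[2] "#" types[order[i]] >> (part[1])
                close(part[1])                          # avoid keeping many files open
            }
        }
    ' "$1"
}

# Illustration only: run on the sample from the question inside a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/input.txt" <<'EOF'
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
EOF
( cd "$workdir" && group_rows input.txt )

out1=$(cat "$workdir/file1.txt")
out2=$(cat "$workdir/file2.txt")
out3=$(cat "$workdir/file3.txt")
printf '%s\n' "$out1"
rm -rf "$workdir"
```

Because the sketch appends with >>, remove old output files before running it a second time.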

How to trim every nth line?

I would like to cut off the first 9 characters of every 4th line. I could use cut -c 9, but I don't know how to select only every 4th line without losing the remaining lines.
Input:
#V300059044L3C001R0010004402
AAGTAGATATCATGGAGCCG
+
FFFGFGGFGFGFFGFFGFFGGGGGFFFGG
#V300059044L3C001R0010009240
AAAGGGAGGGAGAATAATGG
+
GFFGFEGFGFGEFDFGGEFFGGEDEGEGF
Output:
#V300059044L3C001R0010004402
AAGTAGATATCATGGAGCCG
+
FGFFGFFGFFGGGGGFFFGG
#V300059044L3C001R0010009240
AAAGGGAGGGAGAATAATGG
+
FGEFDFGGEFFGGEDEGEGF
Could you please try following, written and tested with shown samples in GNU awk.
awk 'FNR%4==0{print substr($0,10);next} 1' Input_file
OR as per @tripleee's suggestion (in comments) try:
awk '!(FNR%4) { $0 = substr($0, 10) }1' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR%4==0{ ##Checking condition if this line number is fully divided by 4(every 4th line).
print substr($0,10) ##Printing line from 10th character here.
next ##next will skip all further statements from here.
}
1 ##1 will print current Line.
' Input_file ##Mentioning Input_file name here.
GNU sed can choose every 4th line with 4~4, e.g.:
sed -E '4~4s/.{9}//'
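The 4~4 address is a GNU extension. Where only a POSIX sed is available, one possible sketch uses three n commands to step to every 4th line instead. The helper name trim_every_4th and the sample records below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip the first 9 characters of every 4th line of file $1.
# Each "n" prints the current line and loads the next one, so the
# substitution only ever sees lines 4, 8, 12, ...
trim_every_4th() {
    sed 'n;n;n;s/^.\{9\}//' "$1"
}

# Illustration only: two made-up 4-line records.
sample=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'XXXXXXXXXFGFF' \
              '@read2' 'TTGA' '+' 'YYYYYYYYYGEGF' > "$sample"

result=$(trim_every_4th "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```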

How to add single quote after specific word using sed?

I am trying to write a script to add single quotes around the value that follows the word "GOOD".
For example, I have file1 :
//WER GOOD=ONE
//WER1 GOOD=TWO2
//PR1 GOOD=THR45
...
Desired change is to add single quotes :
//WER GOOD='ONE'
//WER1 GOOD='TWO2'
//PR1 GOOD='THR45'
...
This is the script which I am trying to run:
#!/bin/bash
for item in `grep "GOOD" file1 | cut -f2 -d '='`
do
sed -i 's/$item/`\$item/`\/g' file1
done
Thank you for the help in advance !
Could you please try following.
sed "s/\(.*=\)\(.*\)/\1'\2'/" Input_file
OR as per OP's comment to remove empty line use:
sed "s/\(.*=\)\(.*\)/\1'\2'/;/^$/d" Input_file
Explanation: following is only for explanation purposes.
sed " ##Starting sed command from here.
s/ ##Using s to start substitution process from here.
\(.*=\)\(.*\) ##Using sed buffer capability to store matched regex into memory, saving everything till = in 1st buffer and rest of line in 2nd memory buffer.
/\1'\2' ##Now substituting with \1'\2', i.e. the 1st buffer followed by the 2nd buffer wrapped in single quotes, so the quotes land after = as per OP need.
/" Input_file ##Closing block for substitution, mentioning Input_file name here.
Please use -i option in above code in case you want to save output into Input_file itself.
2nd solution with awk:
awk 'match($0,/=.*/){$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047"} 1' Input_file
Explanation: Adding explanation for above code.
awk '
match($0,/=.*/){ ##Using match function to match everything from = till end of line.
$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047" ##Creating value of $0 with sub-strings till value of RSTART and adding ' then sub-strings till end of line adding ' then as per OP need.
} ##Where RSTART and RLENGTH are variables which will be SET once a TRUE matched regex is found.
1 ##1 will print edited/non-edited line.
' Input_file ##Mentioning Input_file name here.
3rd solution: In case you have only 2 fields (separated by =) in your Input_file, then try this simpler awk:
awk 'BEGIN{FS=OFS="="} {$2="\047" $2 "\047"} 1' Input_file
Explanation of 3rd code: Use only for explanation purposes, for running please use above code itself.
awk ' ##Starting awk program here.
BEGIN{FS=OFS="="} ##Setting FS and OFS values as = for all line for Input_file here.
{$2="\047" $2 "\047"} ##Setting $2 value with adding a ' $2 and then ' as per OP need.
1 ##Mentioning 1 will print edited/non-edited lines here.
' Input_file ##Mentioning Input_file name here.
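The solutions above quote whatever follows an = sign on any line. If only values after the literal key GOOD= should be touched, a variation of the first sed command could anchor on that key. This is a sketch under the assumption that the value runs to the end of the line; quote_good and the third sample line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: quote only the value that follows the literal key GOOD= in file $1.
quote_good() {
    sed "s/\(GOOD=\)\(.*\)/\1'\2'/" "$1"
}

# Illustration only: two lines from the question plus one made-up line without GOOD=.
sample=$(mktemp)
cat > "$sample" <<'EOF'
//WER GOOD=ONE
//WER1 GOOD=TWO2
//OTHER BAD=X=Y
EOF

result=$(quote_good "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Lines without GOOD= pass through unchanged because the substitution simply does not match them.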

replace names in fasta

I want to change the sequence names in a fasta file according to a text file containing the new names. I found several approaches, and seqkit (its "Replace key with value by key-value file" example) made a good impression, but I can't get it running.
The fasta file seq.fa looks like
>BC1
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
>BC2
TGCATGCATGCATGCATGCATGCATGCATGCATGCATGCG
GCATGCATGCATGCATGCATGCATGCATGCATGCG
>BC3
GCATGCATGCATGCATGCATGCATGCATGCATGCCCCCCC
TGCATGCATGCATG
and the ref.txt tab delimited text file like
BC1 1234
BC2 1235
BC3 1236
Using seqkit in Git Bash runs through the file but doesn't change the names.
seqkit replace -p' (.+)$' -r' {kv}' -k ref.txt seq.fa --keep-key
I'm used to R and new to bash and can't find the bug, but I guess I need to adjust for tab and _ ?
As in the example https://bioinf.shenwei.me/seqkit/usage/#replace part 7. Replace key with value by key-value file the sequence name is tab delimited and only the second part is replaced.
Any advice on how to adjust the code?
Desired outcome should look like: Replacing BC1 by the number in the text file 1234
>1234
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
>1235
TGCATGCATGCATGCATGCATGCATGCATGCATGCATGCG
GCATGCATGCATGCATGCATGCATGCATGCATGCG
>1236
GCATGCATGCATGCATGCATGCATGCATGCATGCCCCCCC
TGCATGCATGCATG
could you please try following.
awk '
FNR==NR{
a[$1]=$2
next
}
($2 in a) && /^>/{
print ">"a[$2]
next
}
1
' ref.txt FS="[> ]" seq.fa
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program here.
FNR==NR{ ##FNR==NR is condition which will be TRUE when 1st Input_file named ref.txt will be read.
a[$1]=$2 ##Creating an array named a whose index is $1 and value is $2 of current line.
next ##next will skip all further statements from here.
} ##Closing BLOCK for FNR==NR condition here.
($2 in a) && /^>/{ ##Checking condition if $2 of current line is present in array a and starts with > then do following.
print ">"a[$2] ##Printing > and value of array a whose index is $2.
next ##next will skip all further statements from here.
}
1 ##Mentioning 1 will print the lines(those which are NOT starting with > in Input_file seq.fa)
' ref.txt FS="[> ]" seq.fa ##Mentioning Input_file names here and setting FS= either space or > for Input_file seq.fa here.
EDIT: As per OP's comment need to add >1234_1 occurrence number too in output so adding following code now.
awk '
FNR==NR{
a[$1]=$2
b[$1]=++c[$2]
next
}
($2 in a) && /^>/{
print ">"a[$2]"_"b[$2]
next
}
1
' ref.txt FS="[> ]" seq.fa
awk solution that doesn't require GNU awk:
awk 'NR==FNR{a[$1]=$2;next}
NF==2{$2=a[$2]; print ">" $2;next}
1' FS='\t' ref.txt FS='>' seq.fa
The first statement is filling the array a with the content of the tab delimited file ref.txt.
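The second statement handles the header lines of the second file, seq.fa (the lines with 2 fields when > is the field delimiter): it looks up the name in a and prints > followed by the new name.
The last statement prints all remaining lines of that same file unchanged.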
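One caveat of the version above is that a header missing from ref.txt ends up with an empty name. A sketch that keeps such headers unchanged, by testing membership with in before replacing, might look like this; rename_fasta and the shortened sample sequences are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Hypothetical helper: rename FASTA headers in $2 using the tab-delimited "old<TAB>new" file $1.
rename_fasta() {
    awk '
        NR == FNR         { a[$1] = $2; next }       # first file: build the lookup table
        /^>/ && ($2 in a) { print ">" a[$2]; next }  # known header: print the new name
        1                                            # everything else is printed unchanged
    ' FS='\t' "$1" FS='>' "$2"
}

# Illustration only: small made-up inputs in the same layout as the question.
ref=$(mktemp)
seq=$(mktemp)
printf 'BC1\t1234\nBC2\t1235\n' > "$ref"
printf '%s\n' '>BC1' 'ATGC' '>BC2' 'TGCA' '>BC9' 'GGCC' > "$seq"

result=$(rename_fasta "$ref" "$seq")
printf '%s\n' "$result"
rm -f "$ref" "$seq"
```

Here >BC9 has no entry in the reference file and is therefore passed through as is.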
