awk: search for a string, but only inside a range


I want to search for a string in a file, and print that line plus the line preceding it, but only after the line where string XXX appears in the file. How can I achieve this?
Here is an example: search for lines containing the string "### records", but only after the line that says "start real work"
INPUT FILE
cat << EOF > x.x
start job
initialization
20 records
start real work
first step
30 records
# comments
second step
0 records
#comments
third step
22 records
end
EOF
AWK ONE-LINER - This searches through the whole file. I can't figure out how to start searching for the string "### records" only after the line that says "start real work".
awk '/records/ && !/^0 records/{for(i=1;i<=x;)print a[i++];print} \
{for(i=1;i<x;i++)a[i]=a[i+1];a[x]=$0;}' x=1 x.x
DESIRED OUTPUT
first step
30 records
third step
22 records

With your shown samples, please try following awk code.
awk '
/start real work/{
found=1
next
}
val && /records/{
if($1>0){
print val ORS $0
}
val=""
next
}
found && NF && !/#/{
val=$0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/start real work/{ ##Check if line contains start real work then do following.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
val && /records/{ ##Checking if val is set and line contains records then do following.
if($1>0){ ##Check if 1st field is greater than 0 then do following.
print val ORS $0 ##printing val ORS and current line here.
}
val="" ##Nullifying val here.
next ##next will skip all further statements from here.
}
found && NF && !/#/{ ##Checking if found is SET and NF is NOT NULL and lines is not having #
val=$0 ##Then set val to current line.
}
' Input_file ##Mentioning Input_file name here.

awk '
/start real work/ { inWork = 1 }
inWork && /^[1-9].* records/ { print prev ORS $0 }
{ prev = $0 }
' file
first step
30 records
third step
22 records

awk '/^start real work/{flag=1} flag && !/[0-9]{2}/{lastline=$0} flag && /^[0-9]{2} records/{print lastline;print}' x.x
first step
30 records
third step
22 records
note:
your problem description does not mention anything about the "step" lines shown in your output.
The idea is to set a flag when you see the signal to begin and check the flag along with any other test you may require.
If the flag is set and a line is not a valid "records" line,
then stash it (as lastline).
If the flag is set and the line is a valid two digit "records" line then output the stashed line and then the current line.
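The flag idea described above extends to a closing marker as well. The following is a minimal sketch, not one of the posted answers: the names start, stop, inRange and prev are illustrative, and it assumes that the line "end" closes the range and that a valid "records" line begins with a number greater than zero.

```shell
# Sample data from the question, fed to awk on stdin.
input='start job
initialization
20 records
start real work
first step
30 records
# comments
second step
0 records
#comments
third step
22 records
end'

result=$(printf '%s\n' "$input" | awk -v start='^start real work$' -v stop='^end$' '
  $0 ~ stop  { inRange = 0 }              # closing marker: stop searching
  inRange && /records/ && $1 + 0 > 0 {    # non-zero records line inside the range
    print prev                            # the line just before it
    print                                 # the records line itself
  }
  $0 ~ start { inRange = 1 }              # opening marker: search from the next line on
  { prev = $0 }                           # remember every line as the previous one
')
printf '%s\n' "$result"
```

Note that prev is simply the preceding input line, so a comment line sitting directly before a "records" line would be printed instead of a "step" line.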

With awk in paragraph mode:
awk -v RS= -v FS='\n' -v OFS='\n' '
/start real work/ {f=1;next}
f && (/records/ && !/^#/)
f && (/^#/ && $3 !~ /^0/) {print $2,$3}
' file
first step
30 records
third step
22 records

Related

awk from file using echo and output to file

A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to create the loop, for each string found in B.txt, then output the '/*'[the string] until right before the first '###' met to each own file (the string name is also used as file name).
So based on the sample above, the result should be :
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
FNR==NR{
if($0~/^\/\*/){
line=$0
gsub(/^\/\*|\*\/$/,"",line)
arr[++count]=$0
arr1[line]=count
next
}
arr[count]=(arr[count]?arr[count] ORS:"") $0
next
}
($0 in arr1){
outputFile=$0".txt"
print arr[arr1[$0]] >> (outputFile)
close(outputFile)
}
' file1 file2
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1 is being read.
if($0~/^\/\*/){ ##Checking condition if current line starts with /* then do following.
line=$0 ##Setting $0 to line variable here.
gsub(/^\/\*|\*\/$/,"",line) ##using gsub to globally substitute starting /* and ending */ with NULL in line here.
arr[++count]=$0 ##Creating arr with index of ++count and value is $0.
arr1[line]=count ##Creating arr1 with index of line and value of count.
next ##next will skip all further statements from here.
}
arr[count]=(arr[count]?arr[count] ORS:"") $0 ##Creating arr with index of count and keep appending values of same count values with current line value.
next ##next will skip all further statements from here.
}
($0 in arr1){ ##checking if current line is present in arr1 then do following.
outputFile=$0".txt" ##Creating outputFile with current line .txt value here.
print arr[arr1[$0]] >> (outputFile) ##Printing arr value with index of arr1[$0] to outputFile.
close(outputFile) ##Closing outputFile in backend to avoid too many opened files error.
}
' file1 file2 ##Mentioning Input_file names here.
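The FNR==NR test used above is a common two-file idiom: NR counts records across all inputs while FNR restarts for each file, so the two are equal only while the first file is being read. A minimal sketch with made-up file names (keys.txt, data.txt) and made-up data, only to isolate the idiom:

```shell
tmp=$(mktemp -d)
printf '%s\n' 555 777 > "$tmp/keys.txt"
printf '%s\n' '333 a' '555 b' '777 c' '999 d' > "$tmp/data.txt"

matched=$(awk '
  FNR == NR { want[$1]; next }   # first file: remember each key
  $1 in want                     # second file: print lines whose first field is a key
' "$tmp/keys.txt" "$tmp/data.txt")
printf '%s\n' "$matched"
rm -rf "$tmp"
```

One caveat: if the first file is empty, FNR==NR is also true while the second file is read, so the idiom assumes a non-empty first file.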

Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt
cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
Does this solve your problem?
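For context on why the original attempt printed nothing: text between slashes in awk is parsed as a regular expression rather than executed, and the value piped in by echo is not read once A.txt is given as the input file. The sketch below is a variation on the loop above, under the assumption that a block header is exactly "/*" + id + "*/". It compares the whole line with a string passed via -v, so no regex escaping of * is needed. The names header, id and tmp are illustrative, and the sample data is shortened.

```shell
tmp=$(mktemp -d)
printf '%s\n' '/*333*/' 'asdfasdfadfg' '###' \
              '/*555*/' 'hfawehfihohawe' 'aweihfiwahif' '###' \
              '/*777*/' 'jawejfiawjia' '###' > "$tmp/A.txt"
printf '%s\n' 555 777 > "$tmp/B.txt"

while IFS= read -r id; do
  awk -v header="/*$id*/" '
    $0 == header { flag = 1 }   # exact string match on the block header
    /^###$/      { flag = 0 }   # block terminator: stop printing
    flag                        # print while inside the wanted block
  ' "$tmp/A.txt" > "$tmp/$id.txt"
done < "$tmp/B.txt"

cat "$tmp/555.txt"
```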

Here is another awk solution for this:
awk '
FNR == NR {
map["/*" $0 "*/"] = $0
next
}
$0 in map {
fn = map[$0] ".txt"
}
/^###$/ {
close(fn)
fn = ""
}
fn {print > fn}' B.txt A.txt

How to catch xth pattern1 to pattern2

This is my example to explain my question:
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
So, I have a big file that contains multiple security vulnerabilities.
I want to obtain this:
2022-01-13;248975
2022-01-25;225489,225256,225236
2022-02-02;222599
I thought about doing something like
bugDayNb=$(grep "Bug Day" | wc -l)
for i in $bugDayNb; do
echo "myBugsFile" | grep -A10 -m$i "Bug Day"
done
The problem with this command is that if there are more than 10 Security-Fail lines it won't work, and if I use "-A50" it may pick up the Security-Fail lines of the next Bug Day.
So I would prefer a way to use sed or something similar to extract from the xth "Bug Day" to the xth "Resolve".
Thank you !!

Here's one way to do it:
$ awk '/^Bug Day/{d=$NF; s=""}
/^Security-Fail/{d = d s $NF; s=","}
/^Resolve:/{print d}' ip.txt
2022-01-13:248975
2022-01-25:225489,225256,225236
2022-02-02:222599
/^Bug Day/{d=$NF; s=""} save the date to variable d if line starts with Bug Day and initialize s to empty string
use {d=$NF; sub(/:$/, ";", d); s=""} if you want ; instead of :
/^Security-Fail/{d = d s $NF; s=","} when line starts with Security-Fail append the number to d variable and set s so that further appends will be separated by ,
/^Resolve:/{print d} print the results when Resolve: is seen

With your shown samples, please try following awk program.
awk '
/Bug Day/{
sub(/:$/,"",$NF)
bugVal=$NF
next
}
/^Security-Fail/{
secVal=(secVal?secVal ",":"")$NF
next
}
/^Resolve:/ && bugVal && secVal{
print bugVal";"secVal
bugVal=secVal=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/Bug Day/{ ##Checking condition if line contains Bug day then do following.
sub(/:$/,"",$NF) ##Substituting : at last of $NF in current line.
bugVal=$NF ##Creating bugVal which has $NF value in it.
next ##next will skip all further statements from here.
}
/^Security-Fail/{ ##Checking if line starts from Security-Fail then do following.
secVal=(secVal?secVal ",":"")$NF ##Creating secVal which has $NF value in it and keep adding value to it with delimiter of comma here.
next ##next will skip all further statements from here.
}
/^Resolve:/ && bugVal && secVal{ ##Checking condition if line starts from Resolve: and bugVal is SET and secVal is SET then do following.
print bugVal";"secVal ##printing bugVal semi-colon secVal here.
bugVal=secVal="" ##Nullifying bugVal and secVal here.
}
' Input_file ##mentioning Input_file name here.

This might work for you (GNU sed):
sed -nE '/Bug Day/{:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,(.*),.*/;\1/p}' file
Gather up lines between Bug Day and Resolve and format accordingly.
If you want to be selective about a single day or range of days, use:
sed -nE '/Bug Day/{x;s/^/x/;/^x{1,3}$/!{x;d};x
:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,/;/;s/(.*),.*/\1/p}' file
The above command displays the first 3 days i.e. 1 to 3
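The same kind of selection by block number can be sketched in awk by counting "Bug Day" lines. This is an illustrative variant rather than a translation of the sed command: bug_days is a hypothetical helper name, and first and last are assumed to be 1-based block numbers.

```shell
bugs='Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:'

# bug_days FIRST LAST: hypothetical helper, prints blocks FIRST..LAST as date;id,id,...
bug_days() {
  awk -v first="$1" -v last="$2" '
    /^Bug Day/       { n++; day = $NF; sub(/:$/, "", day); ids = ""; next }
    /^Security-Fail/ { ids = (ids == "" ? "" : ids ",") $NF; next }
    /^Resolve:/      { if (n >= first && n <= last && ids != "") print day ";" ids
                       ids = "" }
  '
}

selected=$(printf '%s\n' "$bugs" | bug_days 2 3)
printf '%s\n' "$selected"
```

Calling bug_days 2 2 would select only the second day.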

Would you please try an awk solution:
awk '/^Bug Day/ {f=1; line=$0; next} # start of block
f {line=line ORS $0} # append the line if "f" is set
/^Security-Fail/ {g=1} # the block contains "Security-Fail"
/^Resolve/ {if (g) print line; f=g=0; line=""} # end of block
' input_file
If you prefer a one-liner:
awk '/^Bug Day/{f=1; line=$0; next} f{line=line ORS $0} /^Security-Fail/{g=1} /^Resolve/{if (g) print line; f=g=0; line=""}' input_file

sed on mac: how to print a new-line for every range-match

For range-matches, wondering how to print a new line along with the match.
For example,
if the content of a file called content.txt is like
one
begin
two
three
end
four
begin
five
six
end
seven
then, this is the output I get with the following sed command
$ sed -n -e '/begin/,/end/p' content.txt
begin
two
three
end
begin
five
six
end
Instead, how can I get output like the following, with a blank line after each range:
begin
two
three
end

begin
five
six
end

This might work for you:
sed -n -e '/begin/,/end/{/end/G;p;}' file
Print the range begin to end. When end matches, G appends a newline plus the (empty) hold space to the pattern space, so a blank line follows each end.
See here for one liner explanations.

Pipe the output through sed again:
sed -n -e '/begin/,/end/p' content.txt | sed 's/^end$/end\n/'
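Since the question is about macOS, one caution: some sed implementations, reportedly including the BSD sed shipped with macOS, do not treat \n in the replacement text as a newline. A sketch in awk avoids that question altogether. The variable names inside and spaced are illustrative, and the patterns are left unanchored like /begin/ and /end/ in the sed command.

```shell
content='one
begin
two
three
end
four
begin
five
six
end
seven'

spaced=$(printf '%s\n' "$content" | awk '
  /begin/         { inside = 1 }             # range opens
  inside          { print }                  # print every line in the range
  inside && /end/ { print ""; inside = 0 }   # range closes: add a blank line
')
printf '%s\n' "$spaced"
```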

With your shown samples, please try following awk code.
awk '
/^end$/{
if(found){
print val ORS $0 ORS
}
found=val=""
}
/^begin$/{
val=""
found=1
}
found{
val=(val?val ORS:"")$0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^end$/{ ##checking condition if line is equal to string end.
if(found){ ##Checking if found is NOT NULL.
print val ORS $0 ORS ##Then printing val ORS current line ORS here.
}
found=val="" ##Nullifying found and val here.
}
/^begin$/{ ##Checking condition if line is equal to string begin.
val="" ##Nullifying val here.
found=1 ##Setting found to 1 here.
}
found{ ##Checking if found is NOT NULL.
val=(val?val ORS:"")$0 ##Then keep adding current line to val variable here.
}
' Input_file ##Mentioning Input_file name here.

How to run a bash script in a loop

I wrote a bash script to pull substrings out of two input files and save them to an output file. The input files look like this:
input file 1
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
input file 2
gene1 10 20
gene2 40 50
genen x y
my script
>output_file
cat input_file2 | while read row; do
echo $row > temp
geneName=`awk '{print $1}' temp`
startPos=`awk '{print $2}' temp`
endPos=`awk '{print $3}' temp`
length=$(expr $endPos - $startPos)
for i in temp; do
echo ">${geneName}" >> genes_fasta
awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' input_file1 >> output file
done
done
How can I make it work in a loop for more than one sequence in input file 1?
The new input file looks like this:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotypen...
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
I would like a different output file for every genotype, with the genotype name as the file name.
Thank you!

If I'm understanding correctly, would you try the following:
awk '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > genotype
print substr($0, start[i], len[i]) >> genotype
}
close(genotype)
}' input_file2 input_file1
input_file1:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotype3
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Input_file2:
gene1 10 20
gene2 40 50
gene3 20 25
[Results]
genotype1:
>gene1
aaaaaaaaaa
>gene2
aaaaaaaaaa
>gene3
aaaaa
genotype2:
>gene1
bbbbbbbbbb
>gene2
bbbbbbbbbb
>gene3
bbbbb
genotype3:
>gene1
nnnnnnnnnn
>gene2
nnnnnnnnnn
>gene3
nnnnn
[EDIT]
If you want to store the output files to a different directory,
please try the following instead:
dir="./outdir" # directory name to store the output files
# you can modify the name as you want
mkdir -p "$dir"
awk -v dir="$dir" '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > dir"/"genotype
print substr($0, start[i], len[i]) >> dir"/"genotype
}
close(dir"/"genotype)
}' input_file2 input_file1
The first two commands (dir=... and mkdir -p) are executed in bash to define and create the destination directory.
The directory name is then passed to awk via the -v option.
Hope this helps.

Could you please try the following. I am assuming that the name on each line of Input_file1 that starts with > should be matched against the first column of Input_file2 (the samples are ambiguous, so this is based on the OP's attempt).
awk '
FNR==NR{
start_point[$1]=$2
end_point[$1]=$3
next
}
/^>/{
sub(/^>/,"")
val=$0
next
}
{
print val ORS substr($0,start_point[val],end_point[val]-start_point[val])
val=""
}
' Input_file2 Input_file1
Explanation: Adding explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file named Input_file2 is being read.
start_point[$1]=$2 ##Creating an array named start_point with index $1 of current line and its value is $2.
end_point[$1]=$3 ##Creating an array named end_point with index $1 of current line and its value is $3.
next ##next will skip all further statements from here.
}
/^>/{ ##Checking condition if a line starts from > then do following.
sub(/^>/,"") ##Substituting starting > with NULL.
val=$0 ##Creating a variable val whose value is $0.
next ##next will skip all further statements from here.
}
{
print val ORS substr($0,start_point[val],end_point[val]-start_point[val]) ##Printing val, newline(ORS) and sub-string of current line which starts at start_point[val] and whose length is end_point[val]-start_point[val] (3rd argument of substr is a length, not an end position).
val="" ##Nullifying variable val here.
}
' Input_file2 Input_file1 ##Mentioning Input_file names here.
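A quick check of how substr treats its arguments may help here: the second argument is a 1-based start position and the third is a length, not an end position. The sketch below uses a made-up ten-character string, and the variable names S and E are illustrative. The end - start conversion mirrors the length computation in the question's script.

```shell
# Start position 3, length 4.
piece=$(printf '%s\n' 'abcdefghij' | awk '{ print substr($0, 3, 4) }')
printf '%s\n' "$piece"

# Same slice expressed as start S and end E, converted to a length of E - S.
slice=$(printf '%s\n' 'abcdefghij' | awk -v S=3 -v E=7 '{ print substr($0, S, E - S) }')
printf '%s\n' "$slice"
```

Both commands print cdef.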

awk get the nextline

I'm trying to use awk to format a file that contains multiple lines.
Contents of the file:
ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15
result desired:
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
I am using the code below :
awk -F ";" '{
ligne= ligne $0
ma_var = $1
{getline
if($1 != ma_var){
ligne= ligne "\n" $0
}
else {
ligne= ligne";"NF
}
}
}
END {
print ligne
} ' ${FILE_IN} > ${FILE_OUT}
The objective is to compare the first column of the next line to the first column of the current line; if they match, add the last column of the next line to the current line and delete the next line, else print the next line.
Kind regards,

As with life, it's a lot easier to make decisions based on what has happened (the previous line) than what will happen (the next line). Re-state your requirements as:
the objective is to compare the first column of the current line to the first column of the previous line; if it matches then add the last column of the current line to the previous line and delete the current line, else print the current line.
With that, the code to implement it becomes relatively straightforward:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 == p1 { prev = prev OFS $NF; next }
{ if (NR>1) print prev; prev=$0; p1=$1 }
END { print prev }
$ awk -f tst.awk file
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
If you're ever tempted to use getline again, be sure you fully understand everything discussed at http://awk.freeshell.org/AllAboutGetline before making a decision.

I would take a slightly different approach than Ed:
$ awk '$1 == p { printf ";%s", $NF; next } NR > 1 { print "" } {p=$1;
printf "%s" , $0} END{print ""}' FS=\; input
At each line, check if the first column matches the previous. If it does, just print the last field. If it doesn't, print the whole line with no trailing newline.
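To make that description concrete, here is the same approach laid out over several lines and run against the sample data from the question. It is a sketch of the one-liner above with a longer variable name (prevKey, illustrative) and a guard in END for empty input.

```shell
input='ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15'

merged=$(printf '%s\n' "$input" | awk -F';' '
  $1 == prevKey { printf ";%s", $NF; next }   # same key: append the last field only
  NR > 1        { print "" }                  # new key: end the previous output line
  { prevKey = $1; printf "%s", $0 }           # start a new output line, no newline yet
  END           { if (NR) print "" }          # terminate the final line
')
printf '%s\n' "$merged"
```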
