sed on mac: how to print a new-line for every range-match

For range matches, I am wondering how to print a blank line after each match.
For example,
if the content of a file called content.txt is like
one
begin
two
three
end
four
begin
five
six
end
seven
then, this is the output I get with the following sed command
$ sed -n -e '/begin/,/end/p' content.txt
begin
two
three
end
begin
five
six
end
Instead, how can I get output like the following, with a blank line after each block?
begin
two
three
end

begin
five
six
end

This might work for you:
sed -n -e '/begin/,/end/{/end/G;p;}' file
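This prints each range from begin to end. When a line matches end, G appends a newline followed by the hold space (which is empty here), so a blank line is printed after each block.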
See here for one liner explanations.
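A minimal way to check the command above against the sample from the question. The scratch file created with mktemp is an assumption for illustration; the sed command itself is unchanged.

```shell
#!/bin/sh
# Hypothetical scratch file holding the sample input from the question.
tmp=$(mktemp)
printf '%s\n' one begin two three end four begin five six end seven > "$tmp"

# Inside each begin..end range, G appends a newline plus the (empty) hold
# space to the "end" line, so p prints "end" followed by an empty line.
out=$(sed -n -e '/begin/,/end/{/end/G;p;}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the command uses only G, p and an address range, it should behave the same with the sed shipped on macOS and with GNU sed.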

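Pipe the output through sed again. The sed that ships with macOS does not interpret \n in the replacement as a newline (GNU sed does), so append the empty hold space with G instead:
sed -n -e '/begin/,/end/p' content.txt | sed '/^end$/G'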

With your shown samples, please try following awk code.
awk '
/^end$/{
if(found){
print val ORS $0 ORS
}
found=val=""
}
/^begin$/{
val=""
found=1
}
found{
val=(val?val ORS:"")$0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^end$/{ ##checking condition if line is equal to string end.
if(found){ ##Checking if found is NOT NULL.
print val ORS $0 ORS ##Then printing val ORS current line ORS here.
}
found=val="" ##Nullifying found and val here.
}
/^begin$/{ ##Checking condition if line is equal to string begin.
val="" ##Nullifying val here.
found=1 ##Setting found to 1 here.
}
found{ ##Checking if found is NOT NULL.
val=(val?val ORS:"")$0 ##Then keep adding current line to val variable here.
}
' Input_file ##Mentioning Input_file name here.
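One consequence of buffering the block in val is that a begin without a matching end is never printed, unlike the sed range, which would print it through to the end of the file. The sketch below illustrates this; the extra trailing "begin / eight" lines and the scratch file are assumptions added for the demonstration.

```shell
#!/bin/sh
# Sample from the question plus a hypothetical unterminated block at the end.
tmp=$(mktemp)
printf '%s\n' one begin two three end four begin five six end seven begin eight > "$tmp"

out=$(awk '
/^end$/{
  if(found){
    print val ORS $0 ORS   # buffered block, the end line, then a blank line
  }
  found=val=""
}
/^begin$/{
  val=""
  found=1
}
found{
  val=(val?val ORS:"")$0   # buffer lines while inside a block
}
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Only the two complete blocks appear in the output; the dangling "begin / eight" pair stays in the buffer and is dropped at end of input.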

Related

awk from file using echo and output to file

A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to create the loop, for each string found in B.txt, then output the '/*'[the string] until right before the first '###' met to each own file (the string name is also used as file name).
So based on the sample above, the result should be :
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
FNR==NR{
if($0~/^\/\*/){
line=$0
gsub(/^\/\*|\*\/$/,"",line)
arr[++count]=$0
arr1[line]=count
next
}
if($0!="###"){ arr[count]=(arr[count]?arr[count] ORS:"") $0 }
next
}
($0 in arr1){
outputFile=$0".txt"
print arr[arr1[$0]] >> (outputFile)
close(outputFile)
}
' file1 file2
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1 is being read.
if($0~/^\/\*/){ ##Checking condition if current line starts with /* then do following.
line=$0 ##Setting $0 to line variable here.
gsub(/^\/\*|\*\/$/,"",line) ##using gsub to globally substitute starting /* and ending */ with NULL in line here.
arr[++count]=$0 ##Creating arr with index of ++count and value is $0.
arr1[line]=count ##Creating arr1 with index of line and value of count.
next ##next will skip all further statements from here.
}
if($0!="###"){ arr[count]=(arr[count]?arr[count] ORS:"") $0 } ##Appending the current line to the block stored at index count, skipping the ### terminator line so it is not written to the output file.
next ##next will skip all further statements from here.
}
($0 in arr1){ ##checking if current line is present in arr1 then do following.
outputFile=$0".txt" ##Creating outputFile with current line .txt value here.
print arr[arr1[$0]] >> (outputFile) ##Printing arr value with index of arr1[$0] to outputFile.
close(outputFile) ##Closing outputFile in backend to avoid too many opened files error.
}
' file1 file2 ##Mentioning Input_file names here.
Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt
cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
Does this solve your problem?
Here is another awk solution for this:
awk '
FNR == NR {
map["/*" $0 "*/"] = $0
next
}
$0 in map {
fn = map[$0] ".txt"
}
/^###$/ {
close(fn)
fn = ""
}
fn {print > fn}' B.txt A.txt
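To try this last variant end to end, the sketch below recreates the sample A.txt and B.txt in a scratch directory (the directory and the shell variable names are assumptions for illustration) and then reads back the generated files.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/A.txt" <<'EOF'
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
EOF
printf '%s\n' 555 777 > "$dir/B.txt"

(
  cd "$dir" || exit 1
  awk '
  FNR == NR { map["/*" $0 "*/"] = $0; next }   # B.txt: remember wanted headers
  $0 in map { fn = map[$0] ".txt" }            # header line: choose output file
  /^###$/   { close(fn); fn = "" }             # terminator: stop writing
  fn        { print > fn }' B.txt A.txt
)

out555=$(cat "$dir/555.txt")
out777=$(cat "$dir/777.txt")
# Record whether a file was (wrongly) created for the unlisted 333 block.
if [ -e "$dir/333.txt" ]; then has333=yes; else has333=no; fi

printf '%s\n---\n%s\n' "$out555" "$out777"
rm -rf "$dir"
```

Since fn is cleared before the final pattern runs on a ### line, the terminator itself is never written, and blocks not listed in B.txt produce no file.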

How to catch xth pattern1 to pattern2

This is my example to explain my question:
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
So, I have a big file that contains multiple security vulnerabilities.
I want to obtain this:
2022-01-13;248975
2022-01-25;225489,225256,225236
2022-02-02;222599
I thought about doing something like
bugDayNb=$(grep "Bug Day" | wc -l)
for i in $bugDayNb; do
echo "myBugsFile" | grep -A10 -m$i "Bug Day"
done
The problem with this command is that if there are more than 10 Security-Fail lines it won't work, and if I use "-A50" it may pick up the Security-Fail lines of the next Bug Day.
So I would prefer a way to use sed or something similar to go from the xth "Bug Day" to the xth "Resolve".
Thank you !!
Here's one way to do it:
$ awk '/^Bug Day/{d=$NF; s=""}
/^Security-Fail/{d = d s $NF; s=","}
/^Resolve:/{print d}' ip.txt
2022-01-13:248975
2022-01-25:225489,225256,225236
2022-02-02:222599
/^Bug Day/{d=$NF; s=""} save the date to variable d if line starts with Bug Day and initialize s to empty string
use {d=$NF; sub(/:$/, ";", d); s=""} if you want ; instead of :
/^Security-Fail/{d = d s $NF; s=","} when line starts with Security-Fail append the number to d variable and set s so that further appends will be separated by ,
/^Resolve:/{print d} print the results when Resolve: is seen
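Putting the pieces above together with the ; variant gives the exact format requested in the question. This is a small sketch on the sample data; the scratch file stands in for ip.txt and is an assumption.

```shell
#!/bin/sh
tmp=$(mktemp)   # hypothetical scratch file standing in for ip.txt
cat > "$tmp" <<'EOF'
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
EOF

# Same program as above, with the trailing ":" of the date turned into ";".
out=$(awk '/^Bug Day/{d=$NF; sub(/:$/, ";", d); s=""}
     /^Security-Fail/{d = d s $NF; s=","}
     /^Resolve:/{print d}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```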
With your shown samples, please try following awk program.
awk '
/Bug Day/{
sub(/:$/,"",$NF)
bugVal=$NF
next
}
/^Security-Fail/{
secVal=(secVal?secVal ",":"")$NF
next
}
/^Resolve:/ && bugVal && secVal{
print bugVal";"secVal
bugVal=secVal=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/Bug Day/{ ##Checking condition if line contains Bug day then do following.
sub(/:$/,"",$NF) ##Substituting : at last of $NF in current line.
bugVal=$NF ##Creating bugVal which has $NF value in it.
next ##next will skip all further statements from here.
}
/^Security-Fail/{ ##Checking if line starts from Security-Fail then do following.
secVal=(secVal?secVal ",":"")$NF ##Creating secVal which has $NF value in it and keep adding value to it with delimiter of comma here.
next ##next will skip all further statements from here.
}
/^Resolve:/ && bugVal && secVal{ ##Checking condition if line starts from Resolve: and bugVal is SET and secVal is SET then do following.
print bugVal";"secVal ##printing bugVal semi-colon secVal here.
bugVal=secVal="" ##Nullifying bugVal and secVal here.
}
' Input_file ##mentioning Input_file name here.
This might work for you (GNU sed):
sed -nE '/Bug Day/{:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,(.*),.*/;\1/p}' file
Gather up lines between Bug Day and Resolve and format accordingly.
If you want to be selective about a single day or range of days, use:
sed -nE '/Bug Day/{x;s/^/x/;/^x{1,3}$/!{x;d};x
:a;N;/Resolve/!ba;s/.* //mg;y/\n/,/;s/:,/;/;s/(.*),.*/\1/p}' file
The above command displays the first 3 days i.e. 1 to 3
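For systems without GNU sed, the same "only the xth to yth day" selection can be sketched in awk by counting Bug Day lines. This is an alternative under stated assumptions, not the sed answer's method: the variable names first, last, n and sep are hypothetical, and the scratch file holds the sample from the question.

```shell
#!/bin/sh
tmp=$(mktemp)   # hypothetical scratch file with the sample from the question
cat > "$tmp" <<'EOF'
Bug Day 2022-01-13:
Security-Fail 248975
Resolve:
...
Bug Day 2022-01-25:
Security-Fail 225489
Security-Fail 225256
Security-Fail 225236
Resolve:
...
Bug Day 2022-02-02:
Security-Fail 222599
Resolve:
EOF

# n counts "Bug Day" headers; a block is printed only if first <= n <= last.
out=$(awk -v first=2 -v last=3 '
/^Bug Day/       { n++; d=$NF; sub(/:$/, "", d); sep=";" }
/^Security-Fail/ { d = d sep $NF; sep="," }
/^Resolve:/ && n>=first && n<=last { print d }' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Setting first and last to the same value picks out a single day, which matches the "xth Bug Day to xth Resolve" wording of the question.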
Would you please try an awk solution:
awk '/^Bug Day/ {f=1; line=$0; next} # start of block
f {line=line ORS $0} # append the line if "f" is set
/^Security-Fail/ {g=1} # the block contains "Security-Fail"
/^Resolve/ {if (g) print line; f=g=0; line=""} # end of block
' input_file
If you prefer a one-liner:
awk '/^Bug Day/{f=1; line=$0; next} f{line=line ORS $0} /^Security-Fail/{g=1} /^Resolve/{if (g) print line; f=g=0; line=""}' input_file

need help on unix Script to read a data from specific position and use the extracted in query

Input file:
ADSDWETTYT017775227ACG
ADSDWETTYT029635225HCG
ADSDWETTYC018525223JCG
ADSDWETTYC987415221ACG
ADSDWETTCC891235219ACG
ADSDWETTTT074565217ACG
ADSDWETTYT567895213ACG
ADSDWETTYH037535215ACG
ADSDWETTYC051595211ACG
ADSDWETTYT052465209ACG
ADSDWETTYT067595207ACG
ADSDWETTYT077515205ACG
I need to check whether the 10th character of each line is "T"; if it is, I need to take the 4 characters starting at position 16.
From the above file I am expecting the output below:
'5227','5225','5217','5213','5209','5207','5205'
This result should be assigned to a variable (result, below) and used in the WHERE clause of a query like this:
result=$(awk '
BEGIN{
conf="" };
{ if(substr($0,10,1)=="T"){
conf=substr($0,16,4);
{NT==1?s="'\''"conf:s=s"'\'','\''"conf}
}
}
END {print s"'\''"}
' $INPUT_FILE_PATH
db2 "EXPORT TO ${OUTPUT_FILE} OF DEL select STATUS FROM TRAN where TN_NR in (${result})"
I need some help to improve the awk command and to pass the variable into the query's WHERE clause. Kindly help.
With your shown samples, attempts; please try following awk code.
awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file
Adding non-one liner form of above code:
awk -v s1="'" '
BEGIN{ OFS=", " }
substr($0,10,1)=="T"{
val=(val?val OFS:"") (s1 substr($0,16,4) s1)
}
END{
print val
}
' Input_file
To save output of this code into a shell variable try following:
value=$(awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file)
Explanation: Adding detailed explanation for above code.
awk -v s1="'" ' ##Starting awk program from here setting s1 to ' here.
BEGIN{ OFS=", " } ##Setting OFS as comma space here.
substr($0,10,1)=="T"{ ##Checking condition if 10th character is T then do following.
val=(val?val OFS:"") (s1 substr($0,16,4) s1) ##Creating val which has values from current line as per OP requirement.
}
END{ ##Starting END block of this program from here.
print val ##Printing val here.
}
' Input_file ##Mentioning Input_file name here.
I'd use sed for this:
sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' file
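And then to get your exact output, pipe that into the following (the trailing - tells paste to read standard input, which BSD/macOS paste requires):
... | sed "s/.*/'&'/" | paste -sd, -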
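The whole sed and paste pipeline can be captured into a shell variable and substituted into the query text. The sketch below only builds and prints the statement rather than calling db2; the scratch file and the query variable are assumptions for illustration.

```shell
#!/bin/sh
tmp=$(mktemp)   # hypothetical scratch file with the sample input
printf '%s\n' \
  ADSDWETTYT017775227ACG ADSDWETTYT029635225HCG ADSDWETTYC018525223JCG \
  ADSDWETTYC987415221ACG ADSDWETTCC891235219ACG ADSDWETTTT074565217ACG \
  ADSDWETTYT567895213ACG ADSDWETTYH037535215ACG ADSDWETTYC051595211ACG \
  ADSDWETTYT052465209ACG ADSDWETTYT067595207ACG ADSDWETTYT077515205ACG > "$tmp"

# 1) keep lines whose 10th character is T and reduce them to characters 16-19
# 2) wrap each value in single quotes
# 3) join all lines with commas ("-" makes paste read standard input)
result=$(sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' "$tmp" | sed "s/.*/'&'/" | paste -sd, -)
printf '%s\n' "$result"

# Build the statement text only; running it against a database is left out.
query="select STATUS FROM TRAN where TN_NR in (${result})"
printf '%s\n' "$query"
rm -f "$tmp"
```

If no line has T in position 10, result is empty and the IN () list would be invalid SQL, so a real script would probably want to test for that before running the query.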
I'd use perl over awk here for its better arrays (in particular, joining one into a string). Something like:
perl -nE "push @n, substr(\$_, 15, 4) if /^.{9}T/;
END { say join(',', map { \"'\$_'\" } @n) }" "$INPUT_FILE_PATH"

Use sed (or similar) to remove anything between repeating patterns

I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".
Tried sed 's/".*"/""/' but it removes the commas if there's more than one section together.
I would like to get from this:
1,2,"a",4,"b","c",5
To this:
1,2,,4,,,5
Is there a sed wizard who can help? :)
You may use
sed 's/"[^"]*"//g' file > newfile
See online sed demo:
s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5
Details
The "[^"]*" pattern matches ", then 0 or more characters other than ", and then ". The matches are removed since RHS is empty. g flag makes it match all occurrences on each line.
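Because the pattern consumes everything between a pair of quotes, a comma inside a quoted field is removed together with the field and does not shift the columns. A small sketch of that behaviour; the helper name strip_quoted and the second input line are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper wrapping the sed command above.
strip_quoted() {
  sed 's/"[^"]*"//g'
}

out1=$(printf '%s\n' '1,2,"a",4,"b","c",5' | strip_quoted)
out2=$(printf '%s\n' '1,"x,y",3' | strip_quoted)   # quoted field containing a comma

printf '%s\n%s\n' "$out1" "$out2"
```

sed works line by line, so a quoted field that spans several lines would not be handled by this approach.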
Could you please try following.
awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file
Non-one liner form of solution is:
awk -v s1="\"" '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~s1){
$i=""
}
}
}
1
' Input_file
Detailed explanation:
awk -v s1="\"" ' ##Starting awk program from here and mentioning variable s1 whose value is "
BEGIN{ ##Starting BEGIN section of this code here.
FS=OFS="," ##Setting field separator and output field separator as comma(,) here.
}
{
for(i=1;i<=NF;i++){ ##Starting a for loop which traverse through all fields of current line.
if($i~s1){ ##Checking if current field has " in it if yes then do following.
$i="" ##Nullifying current field value here.
}
}
}
1 ##Mentioning 1 will print edited/non-edited line here.
' Input_file ##Mentioning Input_file name here.
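One caveat of splitting on every comma: a quoted field that itself contains a comma is seen as two fields, and both halves are blanked. The sketch below shows the difference; the helper name blank_quoted_fields and the second input line are assumptions, not part of the question's data.

```shell
#!/bin/sh
# Hypothetical helper wrapping the awk one-liner above.
blank_quoted_fields() {
  awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1'
}

plain=$(printf '%s\n' '1,2,"a",4,"b","c",5' | blank_quoted_fields)
embedded=$(printf '%s\n' '1,"x,y",3' | blank_quoted_fields)   # comma inside quotes

printf '%s\n%s\n' "$plain" "$embedded"
```

For the question's sample the result is as requested, but the second line comes out with one comma too many, so this approach fits best when quoted values are known not to contain commas.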
With Perl:
perl -p -e 's/".*?"//g' file
? forces * to be non-greedy.
Output:
1,2,,4,,,5

Retrieve data from a file using patterns and annotate it with its filename

I have a file called bin.001.fasta looking like this:
>contig_655
GGCGGTTATTTAGTATCTGCCACTCAGCCTCGCTATTATGCGAAATTTGAGGGCAGGAGGAAACCATGAC
AGTAGTCAAGTGCGACAAGC
>contig_866
CCCAGACCTTTCAGTTGTTGGGTGGGGTGGGTGCTGACCGCTGGTGAGGGCTCGACGGCGCCCATCCTGG
CTAGTTGAAC
...
What I wanna do is to get a new file, where the 1st column is retrieved contig IDs and the 2nd column is the filename without .fasta:
contig_655 bin.001
contig_866 bin.001
Any ideas how to do it?
Could you please try following.
awk -F'>' '
FNR==1{
split(FILENAME,array,".")
file=array[1]"."array[2]
}
/^>/{
print $2,file
}
' Input_file
Or, more generically, if your Input_file name has more than 2 dots, run the following.
awk -F'>' '
FNR==1{
match(FILENAME,/.*\./)
file=substr(FILENAME,RSTART,RLENGTH-1)
}
/^>/{
print $2,file
}
' Input_file
Explanation: Adding detailed explanation for above code.
awk -F'>' ' ##Starting awk program from here and setting field separator as > here for all lines.
FNR==1{ ##Checking condition if this is first line then do following.
split(FILENAME,array,".") ##Splitting filename which is passed to this awk program into an array named array with delimiter .
file=array[1]"."array[2] ##Creating variable file whose value is 1st and 2nd element of array with DOT in between as per OP shown sample.
}
/^>/{ ##Checking condition if a line starts with > then do following.
print $2,file ##Printing 2nd field and variable file value here.
}
' Input_file ##Mentioning Input_file name here.
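Since the FNR==1 block runs again at the first line of every input file, the same program can annotate several fasta files in one call. The sketch below uses the more generic match() variant; the second file, the shortened sequences and the scratch directory are assumptions for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical scratch directory
# Shortened, made-up sequence lines; only the ">" header lines matter here.
printf '%s\n' '>contig_655' GGCGGTTATT AGTAGTC '>contig_866' CCCAGACC > "$dir/bin.001.fasta"
printf '%s\n' '>contig_12' ACGT > "$dir/bin.002.fasta"   # hypothetical second file

# Run from inside the directory so FILENAME is just "bin.001.fasta";
# a directory component containing dots would end up in the second column.
out=$(cd "$dir" && awk -F'>' '
FNR==1{
  match(FILENAME,/.*\./)                    # longest prefix ending in a dot
  file=substr(FILENAME,RSTART,RLENGTH-1)    # drop that final dot
}
/^>/{
  print $2,file
}
' bin.001.fasta bin.002.fasta)

printf '%s\n' "$out"
rm -rf "$dir"
```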

Resources