Removing a delimited block of lines when one of them matches a regex pattern with awk - bash

Let's assume the following reprepro distributions file:
Origin: git.sdxlive.com/git/PPA
Label: Ubuntu focal
Suite: focal
Version: 20.04
Codename: focal
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu focal 20.04 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log
Origin: git.sdxlive.com/git/PPA
Label: Ubuntu groovy
Suite: groovy
Version: 20.10
Codename: groovy
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu groovy 20.10 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log
The goal is to remove the whole block of lines delimited by 'Origin: ' and an empty line when it contains the line "Codename: ${os_code_name}" where os_code_name is a bash variable.
So the expected output is:
Origin: git.sdxlive.com/git/PPA
Label: Ubuntu groovy
Suite: groovy
Version: 20.10
Codename: groovy
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu groovy 20.10 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log
Without a variable Codename, we could use for instance the following to remove the block matching the focal Codename:
awk '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: focal/) printf "%s",s}' distributions
I could not find a way to use a variable for the Codename; I tried:
--assign=var="${os_code_name}"
ENVIRON["os_code_name"]
In the first case, I don't know how awk can differentiate between the string 'Codename: ' and the variable var, since we cannot use "$var". The following obviously does not work:
awk --assign=var="${os_code_name}" '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: $var/) printf "%s",s}' distributions
In the second case, it is also unsuccessful:
awk '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: ENVIRON["os_code_name"]/) printf "%s",s}' distributions
I also checked this answer.
Any suggestion?

Could you please try the following; it was written and tested with the shown samples and should work in all kinds of awk.
os_code_name="focal" ##shell variable
awk -v co="$os_code_name" '
/Origin/{
if(!foundCo && FNR>1){ print val }
val=foundCo=""
}
/^Codename/ && $NF==co{
foundCo=1
}
{
val=(val?val ORS:"")$0
}
END{
if(!foundCo){ print val }
}
' Input_file
Explanation: a detailed explanation of the above.
os_code_name="focal" ##This is a shell variable.
awk -v co="$os_code_name" ' ##Starting awk program from here and setting co variable as value of os_code_name here.
/Origin/{ ##Checking condition if line has Origin string in it then do following.
if(!foundCo && FNR>1){ print val } ##Checking condition if foundCo is NULL and FNR>1 then print val here.
val=foundCo="" ##Nullifying variables here.
}
/^Codename/ && $NF==co{ ##Checking condition if line starts with Codename and last field is equal to variable.
foundCo=1 ##Setting value for foundCo here.
}
{
val=(val?val ORS:"")$0 ##Building val, which accumulates all lines from one Origin up to just before the next; it is either printed above or nullified.
}
END{ ##Starting END block of this awk program from here.
if(!foundCo){ print val } ##Checking condition if foundCo is NULL then print val here.
}
' Input_file ##Mentioning Input_file name here.
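To see this end to end, here is a minimal, self-contained run; the file name distributions is the question's, but the two three-line blocks are an abbreviation of the sample, not the real data.

```shell
# Create an abbreviated stand-in for the distributions file (illustrative).
cat > distributions <<'EOF'
Origin: git.sdxlive.com/git/PPA
Codename: focal
Log: packages.Ubuntu.log
Origin: git.sdxlive.com/git/PPA
Codename: groovy
Log: packages.Ubuntu.log
EOF

os_code_name="focal"
# Same program as above: buffer each Origin block, print it only when
# no "Codename: focal" line was seen inside it.
result=$(awk -v co="$os_code_name" '
/Origin/{ if(!foundCo && FNR>1){ print val }; val=foundCo="" }
/^Codename/ && $NF==co{ foundCo=1 }
{ val=(val?val ORS:"")$0 }
END{ if(!foundCo){ print val } }
' distributions)
echo "$result"
```

The groovy block survives; the focal block is dropped.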

You can use an empty RS (this is paragraph mode) and skip any record where that codename exists.
awk -v cn="$cn" -v RS="" '!($0 ~ "Codename: " cn){print $0 "\n"}' file
The variable has to be passed in the way your linked answer says. The pattern matching can be done either with ~ /.../ or ~ "..."; the double-quoted form is what you need here, since "Codename: " cn builds the matching string by concatenation.
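Paragraph mode is easy to verify in isolation. Note that RS="" only works when the blocks really are separated by a blank line, as the question's description of the delimiters implies (the pasted sample appears to have lost those blank separators). A sketch with an abbreviated file:

```shell
# Blocks must be separated by a blank line for RS="" (paragraph mode).
cat > distributions <<'EOF'
Origin: git.sdxlive.com/git/PPA
Codename: focal
Log: packages.Ubuntu.log

Origin: git.sdxlive.com/git/PPA
Codename: groovy
Log: packages.Ubuntu.log
EOF

cn="focal"
# Skip every paragraph (record) that contains "Codename: focal".
kept=$(awk -v cn="$cn" -v RS="" '!($0 ~ "Codename: " cn){print $0 "\n"}' distributions)
echo "$kept"
```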

Related

Split command output into separate variables

I'm trying to use a bash script with macOS Time Machine.
I'm trying to read the properties of the Time Machine backup destinations and then split the destinations into variables if there are multiple destinations.
From there I can use the ID to make a backup.
I'm having trouble splitting the output into separate variables.
rawdstinfo=$(tmutil destinationinfo)
echo "$rawdstinfo"
> ==================================================
Name : USB HDD
Kind : Local
Mount Point : /Volumes/USB HDD
ID : 317BD93D-7D90-494C-9D5F-9013B25D1345
====================================================
Name : TM TEST
Kind : Local
Mount Point : /Volumes/TM TEST
ID : 4648083B-2A11-42BC-A8E0-D95917053D27
I was thinking of counting the ================================================== separators and then trying to split the variable based on them, but I'm not having any luck.
Any help would be greatly appreciated.
Thanks
PS:
To make it clear what I would like to achieve: I would like to send each destination drive to an object. From there I can compare the mount point names (one of which has been selected earlier in the script) to get the "destination ID" within that object, so I can then use it with the other tmutil commands, such as
#start a TM backup
sudo tmutil startbackup --destination $DESTINATIONID
#remove Migration HDD as a destination
sudo tmutil removedestination $DESTINATIONID
I like to use awk for parsing delimited flat files. I copied the tmutil output from your question and pasted it into a file I named testdata.txt since I'm not doing this on a Mac. Make sure the number of equal signs in the record separators actually match what tmutil produces.
Here is the awk portion of the solution which goes into a file I named timemachine_variables.awk:
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
BEGIN {
RS="====================================================\n";
FS=":|\n"
}
{
i=FNR-1
}
(FNR>1 && $1 ~ /Name/) {print "Name["i"]="trim($2)}
(FNR>1 && $3 ~ /Kind/) {print "Kind["i"]="trim($4)}
(FNR>1 && $5 ~ /Mount Point/) {print "Mount_Point["i"]="trim($6)}
(FNR>1 && $7 ~ /ID/) {print "ID["i"]="trim($8)}
The functions at the beginning are to trim leading or trailing white spaces off any fields. I split the records based on the equals sign separators and the fields based on the colon ":" character. FNR is gawk's internal variable for the current record number that we're looking at. Since the output apparently begins with a bar of equal signs, the first record is empty so I am using FNR > 1 as a condition to exclude it. Then I have gawk print code which will become array assignments for bash. In your example, this should be gawk's output:
$ gawk -f timemachine_variables.awk testdata.txt
Name[1]=USB HDD
Kind[1]=Local
Mount_Point[1]=/Volumes/USB HDD
ID[1]=317BD93D-7D90-494C-9D5F-9013B25D1345
Name[2]=TM TEST
Kind[2]=Local
Mount_Point[2]=/Volumes/TM TEST
ID[2]=4648083B-2A11-42BC-A8E0-D95917053D27
In your BASH script, declare the arrays from the gawk script's output:
$ declare $(gawk -f timemachine_variables.awk testdata.txt)
You should now have BASH arrays for each drive:
$ echo ${ID[2]}
4648083B-2A11-42BC-A8E0-D95917053D27
UPDATE: The original awk script that I posted does not work on the Mac because BSD awk does not support multi-character separators. I'm leaving it here because it works for gawk, and comparing the two scripts may help others who are looking for a way to achieve multi-character separator behavior in BSD awk.
Instead of changing the default record separator (normally the end of the line), I initialize my own counter i to 0 and increment it every time a record consists entirely of one or more equal signs. Since awk now views each line as its own record and the field separator is still ":", the name we are trying to match is always in $1 and the value is always in $2.
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
BEGIN {
FS=":";
i=0;
}
($0 ~ /^=+$/) {i++;}
($1 ~ /Name/) {print "Name["i"]="trim($2)}
($1 ~ /Kind/) {print "Kind["i"]="trim($2)}
($1 ~ /Mount Point/) {print "Mount_Point["i"]="trim($2)}
($1 ~ /ID/) {print "ID["i"]="trim($2)}
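For a quick, self-contained check of this line-oriented variant, here is a sketch with a shortened stand-in for the tmutil output; it extracts only Name and ID, and the declare step is left out because values containing spaces (such as USB HDD) would need extra quoting before being handed to declare:

```shell
# Shortened stand-in for the tmutil output (illustrative data).
cat > testdata.txt <<'EOF'
==================================================
Name          : USB HDD
Kind          : Local
ID            : 317BD93D-7D90-494C-9D5F-9013B25D1345
==================================================
Name          : TM TEST
Kind          : Local
ID            : 4648083B-2A11-42BC-A8E0-D95917053D27
EOF

# Count separator bars to index records; split fields on ":".
assignments=$(awk -F':' '
function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
$0 ~ /^=+$/ { i++ }
$1 ~ /Name/ { print "Name[" i "]=" trim($2) }
$1 ~ /ID/   { print "ID[" i "]=" trim($2) }
' testdata.txt)
echo "$assignments"
```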

How to cut a range of lines defined by variables

I have this python crawler output
[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler
[+] Crawling
[-] http://www.example.com
[-] http://www.example.com/
[-] http://www.example.com/icons/ubuntu-logo.png
[-] http://www.example.com/manual
[i] 404 Not Found
[+] Total urls crawled: 4
[+] Directories found:
[-] http://www.example.com/icons/
[+] Total directories: 1
[+] Directory with indexing
I want to cut the lines between "Crawling" and "Total urls crawled" using awk or any other tool. Basically I want to use variables: one assigned the NR of the first keyword, "Crawling", and a second assigned the NR of the second delimiter, "Total urls crawled", and then cut the range between the two. I tried something like this:
awk 'NR>$(Crawling) && NR<$(urls)' file.txt
but nothing really worked; the best I got was a cut from the Crawling+1 line to the end of the file, which isn't really helpful. So how do I do it, and how do I cut a range of lines with awk using variables?
If I got your requirement correctly, you want to pass shell variables into the awk code and search for those strings; if so, try the following.
awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url{
found=""
next
}
$0 ~ crawl{
found=1
next
}
found
' Input_file
Explanation: a detailed explanation of the above.
awk -v crawl="Crawling" -v url="Total urls crawled" ' ##Starting awk program and setting crawl and url values of variables here.
$0 ~ url{ ##Checking if line is matched to url variable then do following.
found="" ##Nullify the variable found here.
next ##next will skip further statements from here.
}
$0 ~ crawl{ ##Checking if line is matched to crawl variable then do following.
found=1 ##Setting found value to 1 here.
next ##next will skip further statements from here.
}
found ##Checking condition if found is SET(NOT NULL) then print current line.
' Input_file ##Mentioning Input_file name here.
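The whole range extraction can be checked against a shortened copy of the crawler output; crawl.txt is an illustrative file name:

```shell
cat > crawl.txt <<'EOF'
[+] Crawling
[-] http://www.example.com
[-] http://www.example.com/
[i] 404 Not Found
[+] Total urls crawled: 4
EOF

# found is set after the "Crawling" line and cleared at the
# "Total urls crawled" line, so only the lines in between print.
between=$(awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url{ found=""; next }
$0 ~ crawl{ found=1; next }
found
' crawl.txt)
echo "$between"
```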
The clause "...or any other tool" prompts me to point out that a scripting language could be used in command-line mode for this. Here's how it could be done using Ruby, where 't' is the name of the file that contains the text from which the specified lines are to be extracted. The following would be entered in the shell.
ruby -W0 -e 'puts STDIN.readlines.select { |line| true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/) }[1..-2]' < t
displays the following:
[" [-] http://www.example.com",
" [-] http://www.example.com/",
" [-] http://www.example.com/icons/ubuntu-logo.png",
" [-] http://www.example.com/manual",
" [i] 404 Not Found"]
The following operations are performed.
STDIN.readlines and < t reads the lines of t into an array
select selects the lines for which its block calculation returns true
[1..-2] extracts all but the first and last of the selected lines
select's block calculation,
true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/)
employs the flip-flop operator. The block returns nil (treated as false by Ruby) until a line matching /\bCrawling\b/ is read, namely, "[+] Crawling". The block then returns true, and continues to return true until it encounters the line matching /\bTotal urls crawled\b/, namely, "[+] Total urls crawled: 4". The block returns true for that line as well, but returns false for each subsequent line unless and until it encounters another line matching /\bCrawling\b/, in which case the process repeats. Hence, "flip-flop".
"-W0" in the command line suppresses warning messages. Without it one may see the warning, "flip-flop is deprecated" (depending on the version of Ruby being used). After a decision was made to deprecate the (rarely-used) flip-flop operator, Rubyists took to the streets with pitchforks and torches in protest. The Ruby monks saw the error of their ways and reversed their decision.

Sed awk text formatting

I would like to filter and order text files with something like awk or sed. It does not need to be a single command, a small bash script should be fine too.
#
home: address01
name: name01
info: info01
number: number01
company: company01
#
name: name02
company: company02
info: info02
home: home02
#
company: company03
home: address03
name: name03
info: info03
info: info032
number: number03
company: company032
#
name: name04
info: info04
company: company04
number: number04
number: number042
info: info042
I only need name, number, and info. There is always exactly one name, but there can be 0, 1, or 2 number and info lines. The # is the only thing that is consistent and always in the same spot.
output should be:
name01,number01,,info01,
name02,,,info02,
name03,number03,,info03,info032
name04,number04,number042,info04,info042
What I tried so far:
awk -v OFS=',' '{split($0,a,": ")} /^name:/{name=a[2]} /^number:/{number=a[2]} /^info:/{info=a[2]; print name,number,info}' > dump.csv
Consider changing the logic to print on '#' or after the last line (assuming last block not terminated with #):
awk -v OFS=',' '
function flush() { if (name != "") print name, num[1], num[2], inf[1], inf[2] }
/^#/ { flush(); name=num[1]=num[2]=inf[1]=inf[2]=""; nn=ni=0; next }
{split($0,a,": ")}
/^name:/{name=a[2]}
/^number:/{num[++nn]=a[2]}
/^info:/{inf[++ni]=a[2]}
END { flush() }
' < w.txt > dump.csv
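Run end to end on a shortened copy of the input, a sketch of this print-on-'#' idea, carried through to the five-column CSV the question asks for (the num/inf array names and the records.txt file name are mine), gives:

```shell
cat > records.txt <<'EOF'
#
home: address01
name: name01
info: info01
number: number01
company: company01
#
name: name02
company: company02
info: info02
EOF

# Buffer each block's fields; emit one CSV row per '#' boundary and at EOF.
csv=$(awk -v OFS=',' '
function flush() { if (name != "") print name, num[1], num[2], inf[1], inf[2] }
/^#/ { flush(); name=num[1]=num[2]=inf[1]=inf[2]=""; nn=ni=0; next }
{ split($0,a,": ") }
/^name:/   { name=a[2] }
/^number:/ { num[++nn]=a[2] }
/^info:/   { inf[++ni]=a[2] }
END { flush() }
' records.txt)
echo "$csv"
```

Columns are name, number1, number2, info1, info2; absent values stay empty.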

Extract string from many brackets

I have a file with this content:
ok: [10.9.22.122] => {
"out.stdout_lines": [
"cgit-1.1-11.el7.x86_64",
"python-paramiko-2.1.1-0.9.el7.noarch",
"varnish-libs-4.0.5-1.el7.x86_64",
"kernel-3.10.0-862.el7.x86_64"
]
}
ok: [10.9.33.123] => {
"out.stdout_lines": [
"python-paramiko-2.1.1-0.9.el7.noarch"
]
}
ok: [10.9.44.124] => {
"out.stdout_lines": [
"python-paramiko-2.1.1-0.9.el7.noarch",
"kernel-3.10.0-862.el7.x86_64"
]
}
ok: [10.9.33.29] => {
"out.stdout_lines": []
}
ok: [10.9.22.28] => {
"out.stdout_lines": [
"NetworkManager-tui-1:1.12.0-8.el7_6.x86_64",
"java-1.8.0-openjdk-javadoc-zip-debug-1:1.8.0.171-8.b10.el7_5.noarch",
"java-1.8.0-openjdk-src-1:1.8.0.171-8.b10.el7_5.x86_64",
"kernel-3.10.0-862.el7.x86_64",
"kernel-tools-3.10.0-862.el7.x86_64",
]
}
ok: [10.2.2.2] => {
"out.stdout_lines": [
"monitorix-3.10.1-1.el6.noarch",
"singularity-runtime-2.6.1-1.1.el6.x86_64"
]
}
ok: [10.9.22.33] => {
"out.stdout_lines": [
"NetworkManager-1:1.12.0-8.el7_6.x86_64",
"gnupg2-2.0.22-5.el7_5.x86_64",
"kernel-3.10.0-862.el7.x86_64",
]
}
I need to extract the IP between [ ] when out.stdout_lines contains kernel*.
I want to "emulate" a substring: save a 'block' of content into a variable and work through the whole file.
How would I use sed, or another tool, to do this when I have many delimiters?
A GNU awk solution:
awk -F'\\]|\\[' 'tolower($3)~/"out.stdout_lines" *:/ && tolower($4)~/"kernel/{print "The IP " $2 " contains Kernel"}' RS='}' file
Output:
The IP 10.9.22.122 contains Kernel
The IP 10.9.44.124 contains Kernel
The IP 10.9.22.28 contains Kernel
The IP 10.9.22.33 contains Kernel
I used ] or [ as the FS (field separator) and } as the RS (record separator), so the IP simply becomes $2.
This solution depends on the structure: "out.stdout_lines" needs to be in the field right after [ip], as in your example.
Another GNU awk way, without the above limitation:
awk -F']' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " substr($1, index($1,"[")+1) " contains Kernel"}' RS='}' file
Same output. The tolower calls are for a case-insensitive match; if you want an exact match, you can remove them or just use the solutions from Revision 6.
Combining the merits of the two ways above, a third way:
awk -F'\\]|\\[' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " $2 " contains Kernel"}' RS='}' file
Change tolower($0) to $0 if you don't need the case-insensitive match.
$ gawk -v RS="ok: " -F " => " '$2 ~ /[Kk]ernel/ { printf "The IP %s contains Kernel\n", $1 }' file
The IP [10.9.22.122] contains Kernel
The IP [10.9.44.124] contains Kernel
Since your data is pretty much well-formatted, you can use awk (gawk):
awk '
# get the ip address
/ok:/ {ip = gensub(/[^0-9\.]/, "", "g", $2) }
# check the stdout_lines block and print Kernal and ip saved from the above line
/"out.stdout_lines":/,/\]/ { if (/\<[Kk]ernel\>/) print ip}
' file
#10.9.22.122
#10.9.44.124
#10.9.22.28
#10.9.22.28
#10.9.22.33
Note:
I adjusted the regexes to reflect to your updated data.
you might get more than one kernel entry for the same IP under the out.stdout_lines block, which will yield the same IP multiple times. If this happens, just pipe the result to uniq.
This might work for you (GNU sed):
sed -n '/ok:/{s/[^0-9.]//g;:a;N;/]/!ba;/stdout_line.*kernel/P}' file
Set the -n option to suppress implicit printing.
If a line contains the string ok: it holds an IP address; strip the line of everything but digits and periods.
Append further lines until a line containing ] is encountered, and if the pattern space contains both stdout_line and kernel, print the first line.
Fast solution:
#!/bin/bash
AWK='
/^ok:/ { gsub(/^.*\[/,""); gsub(/].*$/,""); ip=$0 }
/"[Kk]ernel/ { if (ip) print ip; ip="" }
'
awk "$AWK" INPUT
Could you please try the following; it should work in most awks, I believe. (I have added [kK] to the match so it looks for both kernel and Kernel, since OP's previous sample had a capital K and the current one a small k, so I thought to cover both here.)
awk '
/ok/{
gsub(/.*\[|\].*/,"")
ip=$0
}
/stdout_line/{
found=1
next
}
found && /[kK]ernel/{
print ip
}
/}/{
ip=found=""
}
' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk program here.
/ok/{ ##Checking condition if a line contains string ok in it then do following.
gsub(/.*\[|\].*/,"") ##Globally substituting everything up to [ and everything from ] onward with NULL in the current line.
ip=$0 ##Creating a variable named ip whose value is the current (edited) line.
} ##Closing BLOCK for ok string check condition.
/stdout_line/{ ##Checking condition if a line contains stdout_line then do following.
found=1 ##Set value of variable named found to 1 here.
next ##next will skip all further statements from here.
} ##Closing BLOCK for stdout_line string check condition here.
found && /[kK]ernel/{ ##Checking condition if variable found is NOT NULL and string Kernel found in current line then do following.
print ip ##Printing value of variable ip here.
} ##Closing BLOCK for above condition now.
/}/{ ##Checking condition if a line contains } then do following.
ip=found="" ##Nullify ip and found variable here.
} ##Closing BLOCK for } checking condition.
' Input_file ##Mentioning Input_file name here.
Output will be as follows.
10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.28
10.9.22.33
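That program can be exercised as-is on a trimmed-down sample; ansible.log is an illustrative file name:

```shell
cat > ansible.log <<'EOF'
ok: [10.9.22.122] => {
    "out.stdout_lines": [
        "kernel-3.10.0-862.el7.x86_64"
    ]
}
ok: [10.9.33.123] => {
    "out.stdout_lines": [
        "python-paramiko-2.1.1-0.9.el7.noarch"
    ]
}
EOF

# Capture the IP on each "ok:" line, flag the stdout_lines block,
# print the IP when a kernel entry appears, reset at the closing brace.
ips=$(awk '
/ok/{ gsub(/.*\[|\].*/,""); ip=$0 }
/stdout_line/{ found=1; next }
found && /[kK]ernel/{ print ip }
/}/{ ip=found="" }
' ansible.log)
echo "$ips"
```

Only the first host lists a kernel package, so only its IP prints.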
Using Perl
$ perl -0777 -ne 's!\[(\S+)\].+?\{(.+?)\}!$y=$1;$x=$2;$x=~/kernel/ ? print "$y\n":""!sge' brenn.log
10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.33
$

Awk doesn't work with the first line

Good day to everyone.
Could you, please, help me with some of my file preparation problem:
I have a file:
2:1 3:1 4:2 5:1 7:2 34:1 37:3 ...
4:2 6:1 8:1 23:1 25:2 30:1 ...
I would like to get:
20002:1 20003:1 20004:2 20005:1 20007:2 20034:1 20037:3 ...
20004:2 20006:1 20008:1 20023:1 20025:2 20030:1 ...
I tried:
awk '{FS=":"; RS=" "; OFS=":"; ORS=" "}{$1=$1+20000; print $0}'
But it works only partially: it doesn't work with the first line, giving 20002:1:3:1:4:2.., and doesn't work with the first element of each line, giving 4:2 20006:1 20008:1 ...
You can use this (GNU awk only, because of RT):
awk 'BEGIN{FS=OFS=":";RS="[[:space:]]"}{ORS=RT;$1=$1+20000; print $0}' file
20002:1 20003:1 20004:2 20005:1 20007:2 20034:1 20037:3
20004:2 20006:1 20008:1 20023:1 20025:2 20030:1
Explanation
BEGIN{
#Only run at start of script
FS=OFS=":"
#Set input and output field separator to :
RS="[[:space:]]"
#Set the record separator to any space character e.g `\n` `\t` or ` `
}
{ORS=RT
#Set the output record separator to whatever was captured by the input one, i.e keep newline space or tab in the right places
$1+=20000; print
#Do your math and print, note that `+=` is shorthand for adding to the current value,
#and also that print can be used on its own, as by default it prints $0 (you can also use 1
#at the end of the script, as this evaluates to true and the default action if no block
#is defined is to print the current line)
}'
In case of not having GNU awk as required by #123's more elegant solution:
$ awk -F"[: ]+" '{for(i=1;i<NF;i+=2){$i+=20000; printf "%s:%s ",$i,$(i+1)} print ""}' cs.txt
20002:1 20003:1 20004:2 20005:1 20007:2 20034:1 20037:3
20004:2 20006:1 20008:1 20023:1 20025:2 20030:1
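The portable loop version can likewise be verified end to end; cs.txt mirrors the answer's file name, with a shortened sample:

```shell
cat > cs.txt <<'EOF'
2:1 3:1 4:2
4:2 6:1 8:1
EOF

# Split on ":" or runs of spaces, add 20000 to every odd field
# (the keys), and rebuild each key:value pair.
shifted=$(awk -F"[: ]+" '{for(i=1;i<NF;i+=2){$i+=20000; printf "%s:%s ",$i,$(i+1)} print ""}' cs.txt)
echo "$shifted"
```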
