Extract string from many brackets - bash

I have a file with this content:
ok: [10.9.22.122] => {
"out.stdout_lines": [
"cgit-1.1-11.el7.x86_64",
"python-paramiko-2.1.1-0.9.el7.noarch",
"varnish-libs-4.0.5-1.el7.x86_64",
"kernel-3.10.0-862.el7.x86_64"
]
}
ok: [10.9.33.123] => {
"out.stdout_lines": [
"python-paramiko-2.1.1-0.9.el7.noarch"
]
}
ok: [10.9.44.124] => {
"out.stdout_lines": [
"python-paramiko-2.1.1-0.9.el7.noarch",
"kernel-3.10.0-862.el7.x86_64"
]
}
ok: [10.9.33.29] => {
"out.stdout_lines": []
}
ok: [10.9.22.28] => {
"out.stdout_lines": [
"NetworkManager-tui-1:1.12.0-8.el7_6.x86_64",
"java-1.8.0-openjdk-javadoc-zip-debug-1:1.8.0.171-8.b10.el7_5.noarch",
"java-1.8.0-openjdk-src-1:1.8.0.171-8.b10.el7_5.x86_64",
"kernel-3.10.0-862.el7.x86_64",
"kernel-tools-3.10.0-862.el7.x86_64",
]
}
ok: [10.2.2.2] => {
"out.stdout_lines": [
"monitorix-3.10.1-1.el6.noarch",
"singularity-runtime-2.6.1-1.1.el6.x86_64"
]
}
ok: [10.9.22.33] => {
"out.stdout_lines": [
"NetworkManager-1:1.12.0-8.el7_6.x86_64",
"gnupg2-2.0.22-5.el7_5.x86_64",
"kernel-3.10.0-862.el7.x86_64",
]
}
I need to extract the IP between [] if the stdout_lines block contains kernel*.
I want to "emulate" a substring: save a 'block' of content into a variable and walk through the whole file.
How would I use sed, or another tool, to do this when I have many delimiters?

A GNU awk solution:
awk -F'\\]|\\[' 'tolower($3)~/"out.stdout_lines" *:/ && tolower($4)~/"kernel/{print "The IP " $2 " contains Kernel"}' RS='}' file
Output:
The IP 10.9.22.122 contains Kernel
The IP 10.9.44.124 contains Kernel
The IP 10.9.22.28 contains Kernel
The IP 10.9.22.33 contains Kernel
I used ] or [ as the FS field separator, and } as the RS record separator.
So the IP simply becomes $2.
This solution depends on the structure: "out.stdout_lines" needs to be in the field after [ip], as in your example.
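As a minimal sketch of how that RS/FS choice carves up a record (the sample data and the /tmp path are invented for the demo):

```shell
# One shrunken record shaped like the question's file.
printf 'ok: [10.0.0.1] => {\n"out.stdout_lines": [\n"kernel-3.10.0"\n]\n}\n' > /tmp/demo.txt

# '}' ends the record; '[' or ']' split the fields, so the IP lands in $2.
out=$(awk -F'\\]|\\[' 'tolower($0) ~ /"kernel/ { print $2 }' RS='}' /tmp/demo.txt)
echo "$out"
```

This collapses the per-field checks into one whole-record match, which is enough to show where $2 comes from.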
Another GNU awk way, without the above limitation:
awk -F']' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " substr($1, index($1,"[")+1) " contains Kernel"}' RS='}' file
Same output. The tolower calls are for a case-insensitive match; if you want an exact match, you can remove them or just use the solutions from Revision 6.
Combining the merits of the above two ways, a third way:
awk -F'\\]|\\[' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " $2 " contains Kernel"}' RS='}' file
Change tolower($0) to $0 if you don't need a case-insensitive match.

$ gawk -v RS="ok: " -F " => " '$2 ~ /[Kk]ernel/ { printf "The IP %s contains Kernel\n", $1 }' file
The IP [10.9.22.122] contains Kernel
The IP [10.9.44.124] contains Kernel
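If you want the bare IP without the brackets, a small variation (mine, not the answerer's; it assumes an awk with regex RS support, such as gawk or mawk) is to gsub them out of $1:

```shell
# Two shrunken records in the question's shape, path invented for the demo.
printf 'ok: [10.9.22.122] => {\n"kernel-3.10.0"\n}\nok: [10.9.33.123] => {\n"python-2.1"\n}\n' > /tmp/demo2.txt

# Each "ok: " starts a record; " => " splits the [ip] part from the body.
out=$(awk -v RS="ok: " -F" => " '$2 ~ /[Kk]ernel/ { gsub(/[][]/, "", $1); print $1 }' /tmp/demo2.txt)
echo "$out"
```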

Since your data is pretty well formatted, you can use awk (gawk):
awk '
# get the ip address
/ok:/ {ip = gensub(/[^0-9\.]/, "", "g", $2) }
# check the stdout_lines block and, if it contains kernel, print the ip saved above
/"out.stdout_lines":/,/\]/ { if (/\<[Kk]ernel\>/) print ip}
' file
#10.9.22.122
#10.9.44.124
#10.9.22.28
#10.9.22.28
#10.9.22.33
Note:
I adjusted the regexes to reflect your updated data.
You might get more than one kernel file for the same IP under the out.stdout_lines block, which will yield the same IP multiple times. If this happens, just pipe the result to | uniq.

This might work for you (GNU sed):
sed -n '/ok:/{s/[^0-9.]//g;:a;N;/]/!ba;/stdout_line.*kernel/P}' file
The -n option suppresses implicit printing.
If a line contains the string ok: it holds an IP address, so strip the line of everything but digits and periods.
Append further lines until a line containing ] is encountered, and if the pattern space contains both stdout_line and kernel, print the first line.

Fast solution:
#!/bin/bash
AWK='
/^ok:/ { gsub(/^.*\[/,""); gsub(/].*$/,""); ip=$0 }
/"Kernel-default/ { if (ip) print ip; ip="" }
'
awk "$AWK" INPUT

Could you please try the following; it should work with most awks, I believe. (I have added [kK] to the match condition so it looks for both kernel and Kernel, since the OP's previous sample had a capital K and the current one is lowercase, so I thought to cover both here.)
awk '
/ok/{
gsub(/.*\[|\].*/,"")
ip=$0
}
/stdout_line/{
found=1
next
}
found && /[kK]ernel/{
print ip
}
/}/{
ip=found=""
}
' Input_file
Explanation: Adding explanation for above code.
awk ' ##Starting awk program here.
/ok/{ ##Checking condition if a line contains string ok in it then do following.
gsub(/.*\[|\].*/,"") ##Globally substituting everything till [ and everything till ] with NULL in current line.
ip=$0 ##Creating variable named ip whose value is the current (edited) line.
} ##Closing BLOCK for ok string check condition.
/stdout_line/{ ##Checking condition if a line contains stdout_line then do following.
found=1 ##Set value of variable named found to 1 here.
next ##next will skip all further statements from here.
} ##Closing BLOCK for stdout_line string check condition here.
found && /[kK]ernel/{ ##Checking condition if variable found is NOT NULL and string Kernel found in current line then do following.
print ip ##Printing value of variable ip here.
} ##Closing BLOCK for above condition now.
/}/{ ##Checking condition if a line contains } then do following.
ip=found="" ##Nullify ip and found variable here.
} ##Closing BLOCK for } checking condition.
' Input_file ##Mentioning Input_file name here.
Output will be as follows.
10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.28
10.9.22.33

Using Perl
$ perl -0777 -ne 's!\[(\S+)\].+?\{(.+?)\}!$y=$1;$x=$2;$x=~/kernel/ ? print "$y\n":""!sge' brenn.log
10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.33
$

Related

AWK print block that does NOT contain specific text

I have the following data file:
variable "ARM_CLIENT_ID" {
description = "Client ID for Service Principal"
}
variable "ARM_CLIENT_SECRET" {
description = "Client Secret for Service Principal"
}
# [.....loads of code]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
There's a 2 things I wish to do:
print the variable names without the inverted commas
ONLY print the variable names if the code block does NOT contain default
I know I can print the variable names like so: awk '{ gsub("\"", "") }; (/variable/ && $2 !~ /^ARM_/) { print $2}'
I know I can print the code blocks with: awk '/variable/,/^}/', which results:
# [.....loads of code output before this]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
However, I cannot find out how to print the code blocks "if" they don't contain default. I know I will need to use an if statement, and some variables perhaps, but I am unsure as of how.
This code block should NOT appear in the output for which I grab the variable name:
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
End output should NOT contain those that had default:
# [.....loads of code output before this]
expressroute_settings
firewall_settings
global_settings
peering_settings
vnet_transit_object
vnet_shared_services_object
route_tables
logging_settings
Preferably I would like to keep this to a single AWK command or file, with no piping. I have use cases for this that require no piping.
EDIT: update the ideal outputs (missed some examples of those with default)
Assumptions and collection of notes from OP's question and comments:
all variable definition blocks end with a right brace (}) in the first column of a new line
we only display variable names (sans the double quotes)
we do not display the variable names if the body of the variable definition contains the string default
we do not display the variable name if it starts with the string ARM_
One (somewhat verbose) awk solution:
NOTE: I've copied the sample input data into my local file variables.dat
awk -F'"' ' # use double quotes as the input field separator
/^variable / && $2 !~ "^ARM_" { varname = $2 # if line starts with "^variable ", and field #2 is not like "^ARM_", save field #2 for later display
printme = 1 # enable our print flag
}
/variable/,/^}/ { if ( $0 ~ "default" ) # within the range of a variable definition, if we find the string "default" ...
printme = 0 # disable the print flag
next # skip to next line
}
printme { print varname # if the print flag is enabled then print the variable name and then ...
printme = 0 # disable the print flag
}
' variables.dat
This generates:
logging_settings
$ awk -v RS= '!/default =/{gsub(/"/,"",$2); print $2}' file
ARM_CLIENT_ID
ARM_CLIENT_SECRET
[.....loads
logging_settings
Of course the output doesn't match yours, since your expected output is inconsistent with the input data.
Using GNU awk:
awk -v RS="}" '/variable/ && !/default/ && !/ARM/ { var=gensub(/(^.*variable ")(.*)(".*{.*)/,"\\2","g",$0); print var }' file
Set the record separator to "}" and then check for records that contain "variable", don't contain default and don't contain "ARM". Use gensub to split the string into three sections based on regular expressions and set the variable var to the second section. Print the var variable.
Output:
logging_settings
Another variation on awk using skip variable to control the array index holding the variable names:
awk '
/^[[:blank:]]*#/ { next }
$1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
$1=="default" { skip=1 }
END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
' code
The first rule just skips comment lines. If you want to skip "ARM_" variables, then you can add a test on $2.
Example Use/Output
With your example code in code, all variables without default are:
$ awk '
> /^[[:blank:]]*#/ { next }
> $1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
> $1=="default" { skip=1 }
> END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
> ' code
ARM_CLIENT_ID
ARM_CLIENT_SECRET
logging_settings
Here's another maybe shorter solution.
$ awk -F'"' '/^variable/&&$2!~/^ARM_/{v=$2} /default =/{v=0} /}/&&v{print v; v=0}' file
logging_settings
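As a quick sanity check, the one-liner can be run against a trimmed-down copy of the data (file path and contents invented for the demo):

```shell
cat > /tmp/vars.tf <<'EOF'
variable "ARM_CLIENT_ID" {
  description = "Client ID for Service Principal"
}
variable "logging_settings" {
  description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
  default = {}
}
EOF
# v holds a candidate name; "default =" cancels it; "}" flushes it.
out=$(awk -F'"' '/^variable/&&$2!~/^ARM_/{v=$2} /default =/{v=0} /}/&&v{print v; v=0}' /tmp/vars.tf)
echo "$out"
```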

Extract json value on regex on bash script

How can I get the values inside depends in a bash script?
manifest.py
# Commented lines
{
'category': 'Sales/Subscription',
'depends': [
'sale_subscription',
'sale_timesheet',
],
'auto_install': True,
}
Expected response:
sale_subscription sale_timesheet
The major problem is the line breaks; I have already tried | grep depends, but I cannot get the sale_timesheet value.
I'm trying to add these values, coming from files, into a var, like:
DOWNLOADED_DEPS=($(ls -A $DOWNLOADED_APPS | while read -r file; do cat $DOWNLOADED_APPS/$file/__manifest__.py | [get depends value])
Example updated.
If this is your JSON file:
{
"category": "Sales/Subscription",
"depends": [
"sale_subscription",
"sale_timesheet"
],
"auto_install": true
}
You can get the desired result using jq like this:
jq -r '.depends | join(" ")' YOURFILE.json
This uses .depends to extract the value from the depends field, pipes it to join(" ") to join the array with a single space in between, and uses -r for raw (unquoted) output.
If it is not a JSON file but just a string, then you can use the regex below to find the values. If it is a JSON file, you can use other methods like Thomas suggested.
^'depends':\s*(?:\[\s*)(.*?)(?:\])$
demo
You can use pcregrep (grep with Perl-compatible regexes and multiline matching; plain egrep does not support -M or this pattern syntax) as follows:
% pcregrep -M "^'depends':\s*(?:\[\s*)(.*?)(?:\])$" pathTo/jsonFile.txt
You can read more about grep.
As #Thomas has pointed out in a comment, the OPs input data is not in JSON format:
$ cat manifest.py
# Commented lines // comments not allowed in JSON
{
'category': 'Sales/Subscription', // single quotes should be replaced by double quotes
'depends': [
'sale_subscription',
'sale_timesheet', // trailing comma at end of section not allowed
],
'auto_install': True, // trailing comma issue; should be lower case "true"
}
And while the title of the question mentions regex, there is no sign of a regex in the question. I'll leave a regex based solution for someone else to come up with and instead ...
One (quite verbose) awk solution based on the input looking exactly like what's in the question:
$ awk -F"'" ' # use single quote as field separator
/depends/ { printme=1 ; next } # if we see the string "depends" then set printme=1
printme && /]/ { printme=0 ; next} # if printme=1 and line contains a right bracket then set printme=0
printme { printf pfx $2; pfx=" " } # if printme=1 then print a prefix + field #2;
# first time around pfx is undefined;
# subsequent passes will find pfx set to a space;
# since using "printf" with no "\n" in sight, all output will stay on a single line
END { print "" } # add a linefeed on the end of our output
' json.dat
This generates:
sale_subscription sale_timesheet
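The same flag logic also fits on a single line; here it is run end-to-end, with the (non-JSON) manifest recreated under an illustrative /tmp path:

```shell
cat > /tmp/__manifest__.py <<'EOF'
# Commented lines
{
    'category': 'Sales/Subscription',
    'depends': [
        'sale_subscription',
        'sale_timesheet',
    ],
    'auto_install': True,
}
EOF
# p turns printing on at "depends" and off at the closing bracket;
# s starts empty and becomes a space so the names land on one line.
out=$(awk -F"'" '/depends/{p=1;next} p&&/]/{p=0} p{printf s $2; s=" "} END{print ""}' /tmp/__manifest__.py)
echo "$out"
```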

Split command output into separate variables

I'm trying to use a bash script with macos time machine.
I'm trying to read the properties of the time machine backup destinations and then split the destinations into variables if there are multiple destinations.
From there I can use the ID to make a backup.
I'm having trouble splitting the output into their own variables.
rawdstinfo=$(tmutil destinationinfo)
echo "$rawdstinfo"
> ==================================================
Name : USB HDD
Kind : Local
Mount Point : /Volumes/USB HDD
ID : 317BD93D-7D90-494C-9D5F-9013B25D1345
====================================================
Name : TM TEST
Kind : Local
Mount Point : /Volumes/TM TEST
ID : 4648083B-2A11-42BC-A8E0-D95917053D27
I was thinking of counting the ================================================== separators and then trying to split the variable based on them, but I'm not having any luck.
Any help would be greatly appreciated.
Thanks
PS:
To make it clear what I would like to achieve: I would like to send each destination drive to an object. From there I can compare the mount point names (which were selected earlier in the script) and get the "destination ID" within that object, so I can then use it with the other tmutil commands such as
#start a TM backup
sudo tmutil startbackup --destination $DESTINATIONID
#remove Migration HDD as a destination
sudo tmutil removedestination $DESTINATIONID
I like to use awk for parsing delimited flat files. I copied the tmutil output from your question and pasted it into a file I named testdata.txt since I'm not doing this on a Mac. Make sure the number of equal signs in the record separators actually match what tmutil produces.
Here is the awk portion of the solution which goes into a file I named timemachine_variables.awk:
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
BEGIN {
RS="====================================================\n";
FS=":|\n"
}
{
i=FNR-1
}
(FNR>1 && $1 ~ /Name/) {print "Name["i"]="trim($2)}
(FNR>1 && $3 ~ /Kind/) {print "Kind["i"]="trim($4)}
(FNR>1 && $5 ~ /Mount Point/) {print "Mount_Point["i"]="trim($6)}
(FNR>1 && $7 ~ /ID/) {print "ID["i"]="trim($8)}
The functions at the beginning are to trim leading or trailing white spaces off any fields. I split the records based on the equals sign separators and the fields based on the colon ":" character. FNR is gawk's internal variable for the current record number that we're looking at. Since the output apparently begins with a bar of equal signs, the first record is empty so I am using FNR > 1 as a condition to exclude it. Then I have gawk print code which will become array assignments for bash. In your example, this should be gawk's output:
$ gawk -f timemachine_variables.awk testdata.txt
Name[1]=USB HDD
Kind[1]=Local
Mount_Point[1]=/Volumes/USB HDD
ID[1]=317BD93D-7D90-494C-9D5F-9013B25D1345
Name[2]=TM TEST
Kind[2]=Local
Mount_Point[2]=/Volumes/TM TEST
ID[2]=4648083B-2A11-42BC-A8E0-D95917053D27
In your BASH script, declare the arrays from the gawk script's output:
$ declare $(gawk -f timemachine_variables.awk testdata.txt)
You should now have BASH arrays for each drive:
$ echo ${ID[2]}
4648083B-2A11-42BC-A8E0-D95917053D27
UPDATE: The original awk script that I posted does not work on the Mac because BSD awk does not support multi-character separators. I'm leaving it here because it works for gawk, and comparing the two scripts may help others who are looking for a way to achieve multi-character separator behavior in BSD awk.
Instead of changing the default record separator which is the end of the line, I set my own counter of i to 0, and then increment it every time the whole record starts and ends with one or more equal signs. Since awk now views each line as its own record and the field separator is still ":", the name we are trying to match is always in $1 and the value is always in $2.
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
BEGIN {
FS=":";
i=0;
}
($0 ~ /^=+$/) {i++;}
($1 ~ /Name/) {print "Name["i"]="trim($2)}
($1 ~ /Kind/) {print "Kind["i"]="trim($2)}
($1 ~ /Mount Point/) {print "Mount_Point["i"]="trim($2)}
($1 ~ /ID/) {print "ID["i"]="trim($2)}
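If only one destination's ID is needed, the same counter idea reduces to a single POSIX-awk command substitution (sample data and /tmp path invented for the demo):

```shell
cat > /tmp/tm.txt <<'EOF'
==================================================
Name          : USB HDD
ID            : 317BD93D-7D90-494C-9D5F-9013B25D1345
==================================================
Name          : TM TEST
ID            : 4648083B-2A11-42BC-A8E0-D95917053D27
EOF
# i counts separator bars; pick the ID line from the second destination block.
DESTINATIONID=$(awk -F':' '/^=+$/{i++} i==2 && $1 ~ /ID/ { gsub(/[ \t]/, "", $2); print $2 }' /tmp/tm.txt)
echo "$DESTINATIONID"
```

The resulting variable can then be handed straight to tmutil startbackup --destination "$DESTINATIONID".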

How to select text in a file until a certain string using grep, sed or awk?

I have a huge file (this is just a sample) and I would like to select all lines with "Ph_gUFAC1083" and all lines after, until reaching one that doesn't have the code (in this example Ph_gUFAC1139)
>uce_353_Ph_gUFAC1083 |uce_353
TTTAGCCATAGAAATGCAGAAATAATTAGAAGTGCCATTGTGTACAGTGCCTTCTGGACT
GGGCTGAAGGTGAAGGAGAAAGTATCATACTATCCTTGTCAGCTGCAAGGGTAATTACTG
CTGGCTGAAATTACTCAACATTTGTTTATAAGCTCCCCAGAGCATGCTGTAAATAGATTG
TCTGTTATAGTCCAATCACATTAAAACGCTGCTCCTTGCAAACTGCTACCTCCTGTTTTC
TGTAAGCTAGACAGAGAAAGCCTGCTGCTCACTTACTGAGCACCAAGCACTGAAGAGCTA
TGTTTAATGTGATTGTTTTCATTAGCTCTTCTCTGTCTGATATTACATTTATAATTTGCT
GGGCTTGAAGACTGGCATGTTGCATTGCTTTCATTTACTGTAGTAAGAGTGAATAGCTCT
AT
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
CATGGAAAACGAGGAAAAGCCATATCTTCCAGGCCATTAATATTACTACGGAGACGTCTT
CATATCGCCGTAATTACAGCAGATCTCAAAGTGGCACAACCAAGACCAGCACCAAAGCTA
AAATAACTCGCAGGAGCAGGCGAGCTGCTTTTGCAGCCCTCAGTCCCAGAAATGCTCGGT
AGCTTTTCTTAAAATAGACAGCCTGTAAATAAGGTCTGTGAACTCAATTGAAGGTGGCTG
TTTCTGAATTAGTCAGCCCTCACAAGGCTCTCGGCCTACATGCTAGTACATAAATTGTCC
ACTTTACCACCAGACAAGAAAGATTAGAGTAATAAACACGGGGCATTAGCTCAGCTAGAG
AAACACACCAGCCGTTACGCACACGCGGGATTGCCAAGAACTGTTAACCCCACTCTCCAG
AAACGCACACAAAAAAACAAGTTAAAGCCATGACATCATGGGAA
>uce_4300_Ph_gUFAC1139 |uce_4300
ATTAAAAATACAATCCTCATGTTTGCATTTTGCAGTCGTCAACAAGAAATTGAAGAGAAA
CTCATAGAGGAAGAAACTGCTCGAAGGGTGGAAGAACTTGTAGCTAAACGCGTGGAAGAA
GAGCTGGAGAAAAGAAAGGATGAGATTGAGCGAGAGGTTCTCCGCAGGGTGGAGGAGGCT
AAGCGCATCATGGAAAAACAGTTGCTCGAAGAACTCGAGCGACAGCGACAAGCTGAACTT
GCAGCACAAAAAGCCAGAGAGGTAACGCTCGGTCGTTTGGAAAGTAGAGACAGTCCATGG
CAAAACTTTCAGTGTCGGTTTGTGCCTCCTGTTCGGTTCAGAAAGAGATGGAATACAGCA
AATCTAATTCCCTTCTCATATAAACTTGCATTGCTGCGAAACTTAATTTCTAGCCTATTC
AGAGGAGCTCACTGATATTTAAACAGTTACTCTCCTAAAACCTGAACAAGGATACTTGAT
TCTTAATGGAACTGACCTACATATTTCAGAATTGTTTGAAACTTTTGCCATGGCTGCAGG
ATTATTCAGCAGTCCTTTCATTTT
>uce_1039_Ph_gUFAC1139 |uce_1039
ATTAGTGGAATACAAATATGCAAAAACCAAACAGTTTGGTGCTATAATGTGAAAAGAAAT
TTACACCAATCTTATTTTTAATTTGTATGGGAACATTTTTACCACAAATTCCATATTTTA
ATAATACTATCCCAACTCTATTTTTTAGACTCATTTTGTCACTGTTTTGTAACAGAAACA
CTGTAAATATTATAGATGTGGTAAACTATTATACTTGTTTTCTTATAAATGAAATGATCT
GTGCCAACACTGACAAAATGAATTAATGTGTTACTAAGGCAACAGTCACATTATATGCTT
TCTCTTTCACAGTATGCGGTAGAGCATATGGTTTACTCTTAATGGAACACTAGCTTCTCA
TTAACATACCAGTAGCAATGTCAGAACTTACAAACCAGCATAACAGAGAAATGGAAAAAC
TTATAAATTAGACCCTTTCAGTATTATTGAGTAGAAAATGACTGATGTTCCAAGGTACAA
TATTTAGCTAATACAGTGCCCTTTTCTGCATCTTTCTTCTCAAAGGAAAAAAAAATCCTC
AAAAAAAACCAGAGCAAGAAACCTAACTTTTTCTTGT
I already tried several alternatives without success; the closest I got was
sed -n '/Ph_gUFAC1083/, />/p' file.txt
which gave me this:
>uce_2347_Ph_gUFAC1083 |uce_2347
GCTTTTCTATGCAGATTTTTTCTAATTCTCTCCCTCCCCTTGCTTCTGTCAGTGTGAAGC
CCACACTAAGCATTAACAGTATTAAAAAGAGTGTTATCTATTAGTTCAATTAGACATCAG
ACATTTACTTTCCAATGTATTTGAAGACTGATTTGATTTGGGTCCAATCATTTAAAAATA
AGAGAGCAGAACTGTGTACAGAGCTGTGTACAGATATCTGTAGCTCTGAAGTCTTAATTG
CAAATTCAGATAAGGATTAGAAGGGGCTGTATCTCTGTAGACCAAAGGTATTTGCTAATA
CCTGAGATATAAAAGTGGTTAAATTCAATATTTACTAATTTAGGATTTCCACTTTGGATT
TTGATTAAGCTTTTTGGTTGAAAACCCCACATTATTAAGCTGTGATGAGGGAAAAAGCAA
CTCTTTCATAAGCCTCACTTTAACGCTTTATTTCAAATAATTTATTTTGGACCTTCTAAA
G
>uce_353_Ph_gUFAC1083 |uce_353
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
Do you know how to do it using grep, sed or awk?
Thx
$ awk '/^>/{if(match($0,"Ph_gUFAC1083")){s=1} else s=0}s' file
I made a simple criterion for your request:
If the line starts with >, we check whether "Ph_gUFAC1083" is present: if yes, set s=1; otherwise set s=0.
For lines that don't start with >, the value of s is retained.
The final s in the awk command decides whether the line is printed (s=1) or not (s=0).
If what you want is every line with Ph_gUFAC1139 plus block of lines after that line until the next line starting with >, then the following awk snippet might do:
$ awk 'BEGIN {RS=ORS=">"} /Ph_gUFAC1139/' file.txt
This uses the > character as a record separator, then simply displays records that contain the text you're interested in.
If you wanted to be able to provide the search string using a variable, you'd do it something like this:
$ val="Ph_gUFAC1139"
$ awk -v s="$val" 'BEGIN {RS=ORS=">"} $0 ~ s' file.txt
UPDATE
A comment mentions that the solution above shows trailing record separators rather than leading ones. You can adapt your output to match your input by reversing this order manually:
awk 'BEGIN { RS=ORS=">" } /Ph_gUFAC1139/ { printf "%s%s",ORS,$0 }' file.txt
Note that in the initial examples, a "match" of the regex would invoke awk's default "action", which is to print the line. The default action is invoked if no action is specified within the script. The code (immediately) above includes an action .. which prints the record, preceded by the separator.
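A compressed, self-contained run of that record-separator approach (sequences shortened and file path invented for the demo):

```shell
cat > /tmp/seq1.fa <<'EOF'
>uce_353_Ph_gUFAC1083 |uce_353
TTTAGC
>uce_4300_Ph_gUFAC1139 |uce_4300
ATTAAA
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGC
EOF
# Each ">" starts a record; matching records are reprinted with a leading ">".
out=$(awk 'BEGIN{RS=ORS=">"} /Ph_gUFAC1083/ {printf "%s%s", ORS, $0}' /tmp/seq1.fa)
echo "$out"
```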
This might work for you (GNU sed):
sed '/^>/h;G;/Ph_gUFAC1083/P;d' file
Store each line beginning with > in the hold space (HS) and then append the HS to every line. If any line contains the string Ph_gUFAC1083, print the first line in the pattern space (PS) and discard everything else.
N.B. the regexp for the match may be amended to /\n.*Ph_gUFAC1083/ if the string match may occur in any line.
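A worked run of the sed version on shortened data (file path invented for the demo):

```shell
cat > /tmp/seq2.fa <<'EOF'
>uce_353_Ph_gUFAC1083 |uce_353
TTTAGC
>uce_4300_Ph_gUFAC1139 |uce_4300
ATTAAA
EOF
# h saves each header in the hold space; G appends it to every line,
# so sequence lines "inherit" their header for the match; P prints the
# original line only.
out=$(sed '/^>/h;G;/Ph_gUFAC1083/P;d' /tmp/seq2.fa)
echo "$out"
```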
This program prints the blocks that start with a Ph_gUFAC1083 line and end when a line containing Ph_gUFAC1139 is reached:
cat inp.txt |
awk '
BEGIN{begin=0}
{
# Ignore blank lines
if( $0 ~ /^$/ )
{
print $0
next
}
# mark the line that contains Ph_gUFAC1083 and print it
if( $0 ~ /Ph_gUFAC1083/ )
{
begin=1
print $0
}
else
{
# if Ph_gUFAC1083 was seen earlier and this line does not contain Ph_gUFAC1139, it is still part of the block: print it
if( begin == 1 && !( $0 ~ /Ph_gUFAC1139/ ) )
{
print $0
}
else
{
# found a line containing Ph_gUFAC1139, so mark the end of the block.
begin = 0
}
}
}'

awk: "not enough arguments to satisfy format string" error in script

I created a script to grab the data from our Unix server, however I am getting the below error:
awk: cmd. line:8: (FILENAME=- FNR=2) fatal: not enough arguments to satisfy format string
`|%-17s|%-16s|%-15s|'
^ ran out for this one
Below is the complete script:
#!/bin/sh
export TERM=xterm
ipath=/usr/local/nextone/bin
date=$(date +"%Y%m%d%H%M")
ifile="$(date '+/var/EndpointUsage_%I-%M-%p_%d-%m-%Y.csv')"
"$ipath"/cli iedge list | awk '
BEGIN { print "|-----------------|------------------|------------------|";
printf "|%-18s|%-17s|%-16s|\r\n","Registration ID", "Port", "Ongoing Calls"
}
/Registration ID/ { id = $3; next }
/Port/ { port = $3 ; next }
/Ongoing Calls/ {print "|-------------------|-----------------|------------- -----|";
printf "|%-18s|%-17s|%-16s|\r\n",id,port,$3 }
END{
print "|------------------|------------------|------------------|";
}'>> "$ifile"
Can anyone please help me on this, how can I resolve this error?
AFTER CHANGES the columns line up correctly, but the Port column does not have any data. It should show 0, or 1 or 2 for other endpoints.
|-----------------|------------------|------------------|
|Registration ID |Port |Ongoing Calls |
|-------------------|-----------------|------------------|
|-------------------|-----------------|------------------
|CC_XXXXXX_01_0 | |174 |
|-------------------|-----------------|------------------|
The offending printf is:
printf "|%-18s|%-17s|%-16s|\r\n",id,$3
^^^^ awk wants to see a third parameter here
You have three %s conversions in the format string, so awk expects another ,<something> after the $3. It is probably a copy-and-paste error. Since you are only supplying two values, try either removing the %-16s| at the end or adding the missing argument, and see if that gives you the output you expect.
Edit: Without seeing your input file, I don't know for sure. Try this, though:
/Registration ID/ { id = $3; next }
/Port/ { port = $3 ; next }
/Ongoing Calls/ {print "|-------------------|-----------------|------------------|";
printf "|%-18s|%-17s|%-16s|\r\n",id,port,$3 }
I added {port=$3;next} to save the port number, and then when you print them out, I changed id,$3 to id,port,$3 to print the saved id, saved port, and ongoing-calls value ($3) in order.
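The arity rule itself is easy to verify in isolation: three conversions need three arguments. A minimal check (values made up; this is not the full script):

```shell
# Three %s conversions, three arguments: each column padded to its width.
out=$(awk 'BEGIN { printf "|%-18s|%-17s|%-16s|\n", "CC_XXXXXX_01_0", "2", "174" }')
echo "$out"
# Total line width: 1 + 18 + 1 + 17 + 1 + 16 + 1 = 55 characters.
```

Dropping one of the arguments reproduces the "not enough arguments to satisfy format string" fatal error in gawk.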
