AWK print block that does NOT contain specific text - bash

I have the following data file:
variable "ARM_CLIENT_ID" {
description = "Client ID for Service Principal"
}
variable "ARM_CLIENT_SECRET" {
description = "Client Secret for Service Principal"
}
# [.....loads of code]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
There are 2 things I wish to do:
print the variable names without the inverted commas
ONLY print the variable names if the code block does NOT contain default
I know I can print the variable names like so: awk '{ gsub("\"", "") }; (/variable/ && $2 !~ /^ARM_/) { print $2}'
I know I can print the code blocks with: awk '/variable/,/^}/', which results in:
# [.....loads of code output before this]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
However, I cannot work out how to print the code blocks only if they don't contain default. I know I will need an if statement, and perhaps some variables, but I am unsure how.
This code block, for example, contains default, so its variable name should NOT appear in the output:
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
The end output should NOT contain those that had default:
# [.....loads of code output before this]
expressroute_settings
firewall_settings
global_settings
peering_settings
vnet_transit_object
vnet_shared_services_object
route_tables
logging_settings
Preferably I would like to keep this to a single AWK command or file, with no piping; I have uses for this where piping is not wanted.
EDIT: updated the ideal outputs (I had missed some examples of those with default)

Assumptions and collection of notes from OP's question and comments:
all variable definition blocks end with a right brace (}) in the first column of a new line
we only display variable names (sans the double quotes)
we do not display the variable names if the body of the variable definition contains the string default
we do not display the variable name if it starts with the string ARM_
One (somewhat verbose) awk solution:
NOTE: I've copied the sample input data into my local file variables.dat
awk -F'"' '                              # use double quotes as the input field separator
/^variable / && $2 !~ "^ARM_" {          # line starts with "variable " and field #2 is not like "^ARM_" ...
    varname = $2                         # ... so save field #2 for later display
    printme = 1                          # ... and enable our print flag
}
/variable/,/^}/ {                        # within the range of a variable definition ...
    if ( $0 ~ "default" )                # ... if we find the string "default" ...
        printme = 0                      # ... disable the print flag
    next                                 # either way, skip to the next line
}
printme {                                # if the print flag is still enabled ...
    print varname                        # ... print the saved variable name and then ...
    printme = 0                          # ... disable the print flag
}
' variables.dat
This generates:
logging_settings

$ awk -v RS= '!/default =/{gsub(/"/,"",$2); print $2}' file
ARM_CLIENT_ID
ARM_CLIENT_SECRET
[.....loads
logging_settings
Of course the output doesn't match yours, since your expected output is inconsistent with the posted input data.
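A quick way to sanity-check the paragraph-mode approach (the /tmp path and trimmed data are illustrative; note that RS= relies on the real file separating variable blocks with blank lines, which the sample as posted does not show):

```shell
# Demo of the RS= (paragraph mode) one-liner; blocks must be separated
# by blank lines for awk to treat each block as one record.
cat > /tmp/vars_demo.tf <<'EOF'
variable "logging_settings" {
description = "Logging settings from TFVARs"
}

variable "azure_firewall_nat_rule_collections" {
default = {}
}

variable "build_route_tables" {
description = "List of Route Table keys"
default = [
"shared_services"
]
}
EOF
out=$(awk -v RS= '!/default =/{gsub(/"/,"",$2); print $2}' /tmp/vars_demo.tf)
echo "$out"
```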

Using GNU awk:
awk -v RS="}" '/variable/ && !/default/ && !/ARM/ { var=gensub(/(^.*variable ")(.*)(".*{.*)/,"\\2",1,$0); print var }' file
Set the record separator to "}" and then check for records that contain "variable", don't contain default and don't contain "ARM". Use gensub to split the string into three sections based on regular expressions and set the variable var to the second section. Print the var variable.
Output:
logging_settings
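For awks without gensub, roughly the same extraction can be done with POSIX match() and substr(). A sketch (file path and trimmed data are illustrative); RS="}" is a single character, so this part does not require GNU awk:

```shell
cat > /tmp/vars_demo2.tf <<'EOF'
variable "ARM_CLIENT_ID" {
description = "Client ID for Service Principal"
}
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
EOF
out=$(awk -v RS='}' '/variable/ && !/default/ && !/ARM/ {
    if (match($0, /variable "[^"]*"/)) {
        # strip the leading variable-plus-quote prefix (10 chars) and the trailing quote
        print substr($0, RSTART + 10, RLENGTH - 11)
    }
}' /tmp/vars_demo2.tf)
echo "$out"
```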

Another variation on awk using skip variable to control the array index holding the variable names:
awk '
/^[[:blank:]]*#/ { next }
$1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
$1=="default" { skip=1 }
END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
' code
The first rule just skips comment lines. If you want to skip "ARM_" variables, then you can add a test on $2.
Example Use/Output
With your example code in code, all variables without default are:
$ awk '
> /^[[:blank:]]*#/ { next }
> $1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
> $1=="default" { skip=1 }
> END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
> ' code
ARM_CLIENT_ID
ARM_CLIENT_SECRET
logging_settings
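Adding the suggested test on $2 is a one-line change (a sketch; note that $2 still carries its double quotes when the test runs, hence the leading quote in the pattern, and the file contents here are a trimmed illustration):

```shell
cat > /tmp/code <<'EOF'
variable "ARM_CLIENT_ID" {
description = "Client ID for Service Principal"
}
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "build_route_tables" {
default = [
"shared_services"
]
}
EOF
out=$(awk '
/^[[:blank:]]*#/ { next }
$1=="variable" && $2 !~ /^"ARM_/ { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
$1=="default" { skip=1 }
END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
' /tmp/code)
echo "$out"
```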

Here's another, maybe shorter, solution.
$ awk -F'"' '/^variable/&&$2!~/^ARM_/{v=$2} /default =/{v=0} /}/&&v{print v; v=0}' file
logging_settings

awk: data missed while parsing file

I have written a script to parse hourly log files to extract "CustomerId, Marketplace, StartTime, and DealIdClicked" data. The log file structure is like so:
------------------------------------------------------------------------
Size=0 bytes
scheme=https
StatusCode=302
RequestId=request_Id_X07
CustomerId=XYZCustomerId
Marketplace=MarketPlace
StartTime=1592931599.986
Program=Unknown
Info=sub-page-type=desktop:Deals_Content_DealIdClicked_0002,sub-page-CSMTags=UTF-8
Counters=sub-page-type=desktop:Deals_Content_DealIdClicked_0002=3,sub-page-CSMTags=Encoding:UTF-8
EOE
------------------------------------------------------------------------
Here is the script I have written to parse the log.
function readServiceLog() {
local _logfile="$1"
local _csvFile="$2"
local _logFileName=$(getLogFileName "$_logfile")
parseLogFile "$_logfile" "$_csvFile"
echo "$_logFileName" >>"$SCRIPT_PATH/excludeFile.txt"
}
# Function to match regex and extract required data.
function parseLogFile() {
local _logfile=$1
local _csvFile=$2
zcat <"$_logfile" | awk -v csvFilePath="$_csvFile" '
BEGIN {
customerIdRegex="^CustomerId="
marketplaceIdRegex="^MarketplaceId="
startTimeRegex="^StartTime="
InfoRegex="^Info="
dealIdRegex = "Deals_Content_DealIdClicked_"
EOERegex="^EOE$"
delete RECORD
}
{
logLine=$0
if (match(logLine,InfoRegex)) {
after = substr(logLine,RSTART+RLENGTH);
if(match(after, dealIdRegex)) {
afterDeal = substr(after,RSTART+RLENGTH);
dealId = substr(afterDeal, 1, index(afterDeal,",")-1)
RECORD[0] = dealId
}
}
if (match(logLine,customerIdRegex)) {
after = substr(logLine,RSTART+RLENGTH);
customerid = substr(after, 1, length(after))
RECORD[1] = customerid
}
if (match(logLine,startTimeRegex)) {
after = substr(logLine,RSTART+RLENGTH);
startTime = substr(after, 1, length(after))
RECORD[2] = startTime
}
if (match(logLine,marketplaceIdRegex)) {
after = substr(logLine,RSTART+RLENGTH);
marketplaceId = substr(after, 1, length(after))
RECORD[3] = marketplaceId
}
if (match(logLine,EOERegex)) {
if(length(RECORD) == 4) {
printf("%s,%s,%s,%s\n", RECORD[0],RECORD[1],RECORD[2],RECORD[3]) >> csvFilePath
}
delete RECORD
}
}'
}
function processHourlyFile() {
local _currentProcessingFolder=$1
local _outputFolder=$(getOutputFolderName) # getOutputFolderName function is from util class.
mkdir -p "$_outputFolder"
local _csvFileName="$_outputFolder/${_currentProcessingFolder##*/}.csv"
for entry in "$_currentProcessingFolder"/*; do
if [[ "$entry" == *"$SERVICE_LOG"* ]]; then
readServiceLog "$entry" "$_csvFileName"
fi
done
}
# Main execution to spawn new processes for parallel parsing.
function main() {
local _processCount=1
for entry in $INPUT_LOG_PATH/*; do
processHourlyFile $entry &
pids[${_processCount}]=$!
done
printInfo
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
}
main
printf '\nFinished!\n'
Expected output:
A comma separated file.
0002,XYZCustomerId,1592931599.986,MarketPlace
Problem
The script spawns 24 processes to parse the 24 hourly logs for an entire day. After parsing the files, I verified the record count, and sometimes it doesn't match the original log file's record count.
I have been stuck on this for the last two days with no luck. Any help would be appreciated.
Thanks in advance.
Try:
awk -F= '
{
a[$1]=$2
}
/^Info/ {
sub(/.*DealIdClicked_/, "")
sub(/,.*/, "")
print $0, a["CustomerId"], a["StartTime"], a["Marketplace"]
delete a
}' OFS=, filename
When run on your input file, the above produces the desired output:
0002,XYZCustomerId,1592931599.986,MarketPlace
How it works
-F= tells awk to use = as the field separator on input.
{ a[$1]=$2 } tells awk to save the second field, $2, in associative array a under the key $1.
/^Info/ { ... } tells awk to perform the commands in curly braces whenever the line starts with Info. Those commands are:
sub(/.*DealIdClicked_/, "") removes all parts of the line up to and including DealIdClicked_.
sub(/,.*/, "") tells awk to remove from what's left of the line everything from the first comma to the end of the line.
The remainder of the line, still called $0, is the "DealId" that we want.
print $0, a["CustomerId"], a["StartTime"], a["Marketplace"] tells awk to print the output that we want.
delete a deletes array a, so we start over clean on the next record.
OFS=, tells awk to use a comma as the field separator on output.
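The one-liner can be checked end to end against the sample record from the question (the /tmp path is illustrative):

```shell
# Recreate the sample log record, then run the answer's awk program on it.
cat > /tmp/sample.log <<'EOF'
Size=0 bytes
scheme=https
StatusCode=302
RequestId=request_Id_X07
CustomerId=XYZCustomerId
Marketplace=MarketPlace
StartTime=1592931599.986
Program=Unknown
Info=sub-page-type=desktop:Deals_Content_DealIdClicked_0002,sub-page-CSMTags=UTF-8
Counters=sub-page-type=desktop:Deals_Content_DealIdClicked_0002=3,sub-page-CSMTags=Encoding:UTF-8
EOE
EOF
out=$(awk -F= '
{
    a[$1]=$2
}
/^Info/ {
    sub(/.*DealIdClicked_/, "")
    sub(/,.*/, "")
    print $0, a["CustomerId"], a["StartTime"], a["Marketplace"]
    delete a
}' OFS=, /tmp/sample.log)
echo "$out"
```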

Assign the value of awk-for loop variable to a bash variable

Content within the tempfile:
123 sam moore IT_Team
235 Rob Xavir Management
What I'm trying to do is get input from the user, search for it in the tempfile, and output the column number of the match.
Code I have for that
#!/bin/bash
set -x;
read -p "Enter :" sword6;
awk 'BEGIN{IGNORECASE = 1 }
{
for(i=1;i<=NF;i++) {
if( $i ~ "'$sword6'$" )
print i;
}
} ' /root/scripts/pscripts/tempprint.txt;
This prints exactly the column number:
Output
Enter : sam
2
What I need is for the value of the i variable to be assigned to a bash variable, so I can use it as needed in the script.
Any help with this is highly appreciated.
I searched for an existing answer but was not able to find one. If there is one, please let me know.
First of all, you should pass your shell variable to awk in this way (e.g. sword6):
awk -v word="$sword6" '{ ... if ($i ~ word) ... }' ...
To assign a shell variable from the output of another command:
shellVar=$(awk '......')
Then you can continue using $shellVar in your script.
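Putting both points together, a minimal sketch (file path and sample data are illustrative; tolower() is used instead of the GNU-only IGNORECASE):

```shell
cat > /tmp/tempprint.txt <<'EOF'
123 sam moore IT_Team
235 Rob Xavir Management
EOF
sword6="sam"
# Capture awk's output (the column number) in a shell variable.
col=$(awk -v word="$sword6" '{
    for (i = 1; i <= NF; i++)
        if (tolower($i) == tolower(word)) { print i; exit }
}' /tmp/tempprint.txt)
echo "$col"
```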
Regarding your awk code:
if the user input contains special characters (e.g. .*), your script may fail
if one column matches the user input multiple times, you may get duplicated output
if your file has multiple columns matching the user input, you may want to handle that
You just need to capture the output of awk. As an aside, I would pass sword6 as an awk variable, not inject it via string interpolation.
i=$(awk -v w="$sword6" '
BEGIN { IGNORECASE = 1 }
{ for (i=1;i<=NF;i++) {
if ($i ~ w"$") { print i; }
}
}' /root/scripts/pscripts/tempprint.txt)
The following script may help you with the same:
cat script.ksh
echo "Please enter the user name:"
read var
awk -v val="$var" '{for(i=1;i<=NF;i++){if(tolower($i)==tolower(val)){print i,$i}}}' Input_file
If tempprint.txt is big (note: the quoted pattern in the original post was a string constant, which is always true; a dynamic regex with GNU word boundaries is what was intended):
awk -v w="$sword6" '
BEGIN { IGNORECASE = 1 }
$0 ~ "\\<" w "\\>" {
    for(i=1;i<=NF;i++)
        if($i==w)print i
}' tempprint.txt

Extract values from command output to a JSON

I am extracting values from a cloud foundry command. It has to be done via the shell. Here is what the file looks like:
User-Provided:
end: 123.12.12.12
text_pass: 980
KEY: 000
Running Environment Variable Groups:
BLUEMIX_REGION: ibm:yp:us-north
Staging Environment Variable Groups:
BLUEMIX_REGION: ibm:yp:us-south
I want to extract everything from end to KEY. Please note that User-Provided will always be the start marker, but the keys and values can vary; there will always be a blank line after the block.
How do I extract everything between User-Provided and the blank line, and put it in a JSON file which I will later parse?
So far I'm able to do this:
cf env space | awk -F 'end:' '{print $2}'
this gives me the value of end but not the whole object.
Expected output:
{
"end": "123.12.12.12"
"text_pass": "980"
"KEY": "000"
}
cf env space | awk '/User-Provided/{a = 1; next}/^$/{a = 0} a'
end: 123.12.12.12
text_pass: 980
KEY: 000
When the pattern User-Provided is encountered, set a variable a; when a blank line is encountered, unset this variable a. Lines are printed only while a is set.
Edited answer:
cf env space | awk -F" *: *" '/User-Provided/{a=1;print"{";next}/^$/{a=0} END{print "\n}"} a{if(c)printf(","); printf("%s", "\n\""$1"\" : \""$NF"\""); c=1}'
This will give the output:
{
"end" : "123.12.12.12",
"text_pass" : "980",
"KEY" : "000"
}
Latest edit:
cf env space | awk '/User-Provided/{a=1;print"{";next}/^$/{a=0} END{print "\n}"} a{if(c)printf(","); sub(/:$/,"",$1); printf("%s", "\n\""$1"\" : \""$NF"\""); c=1}'
In awk:
$ awk '/^end:/,/^KEY:/' file
end: 123.12.12.12
text_pass: 980
KEY: 000
/.../,/.../ names the start and end markers; the lines in that range are printed.
However, the output requirements complicate the program a bit:
$ awk '
BEGIN { FS=": *";OFS=":" } # set appropriate delimiters
/^end:/ { print "{";f=1 } # print at start marker and raise flag
f { print "\"" $1"\"","\"" $2"\"" } # when flag up, print
/^KEY:/ { print "}";f="" } # at end-marker, print end marker and flag down
' file
{
"end":"123.12.12.12"
"text_pass":"980"
"KEY":"000"
}
If you want to use an empty line as the end marker, use /^$/ && f instead of /^KEY:/.
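For instance, the empty-line variant (a sketch; the cf env output is simulated with a local file, and the blank-line check comes before the print rule so the closing brace lands in the right place):

```shell
cat > /tmp/cfenv.txt <<'EOF'
User-Provided:
end: 123.12.12.12
text_pass: 980
KEY: 000

Running Environment Variable Groups:
BLUEMIX_REGION: ibm:yp:us-north
EOF
out=$(awk '
BEGIN { FS=": *";OFS=":" }              # set appropriate delimiters
/^end:/ { print "{"; f=1 }              # start marker raises the flag
/^$/ && f { print "}"; f="" }           # blank line closes the block
f { print "\"" $1"\"","\"" $2"\"" }     # while flag is up, print key/value
' /tmp/cfenv.txt)
echo "$out"
```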

How to get specific data from block of data based on condition

I have a file like this:
[group]
enable = 0
name = green
test = more
[group]
name = blue
test = home
[group]
value = 48
name = orange
test = out
There may be one or more spaces/tabs between the label, the =, and the value.
The number of lines may vary in every block.
I would like to get the name, but only if the block does not contain enable = 0.
So output should be:
blue
orange
Here is what I have managed to create:
awk -v RS="group" '!/enable = 0/ {sub(/.*name[[:blank:]]+=[[:blank:]]+/,x);print $1}'
blue
orange
There are several faults with this:
I am not able to set RS to [group]; both RS="[group]" and RS="\[group\]" fail. Splitting on group alone will also fail if name or other labels contain group.
I would prefer not to use an RS with multiple characters, since that is gnu awk only.
Does anyone have another suggestion? sed or awk, but not a long chain of commands.
If you know that groups are always separated by empty lines, set RS to the empty string:
$ awk -v RS="" '!/enable = 0/ {sub(/.*name[[:blank:]]+=[[:blank:]]+/,x);print $1}'
blue
orange
@devnull explained in his answer that GNU awk also accepts regular expressions in RS, so you could split only at [group] if it is on its own line:
gawk -v RS='(^|\n)[[]group]($|\n)' '!/enable = 0/ {sub(/.*name[[:blank:]]+=[[:blank:]]+/,x);print $1}'
This makes sure we're not splitting at evil names like
[group]
enable = 0
name = [group]
name = evil
test = more
Your problem seems to be:
I am not able to set RS to [group], both this fails RS="[group]" and
RS="\[group\]".
Saying:
RS="[[]group[]]"
should yield the desired result.
In these situations where there's clearly name = value statements within a record, I like to first populate an array with those mappings, e.g.:
map["<name>"] = <value>
and then just use the names to reference the values I want. In this case:
$ awk -v RS= -F'\n' '
{
delete map
for (i=1;i<=NF;i++) {
split($i,tmp,/ *= */)
map[tmp[1]] = tmp[2]
}
}
map["enable"] !~ /^0$/ {
print map["name"]
}
' file
blue
orange
If your version of awk doesn't support deleting a whole array then change delete map to split("",map).
Compared to using REs and/or sub()s., etc., it makes the solution much more robust and extensible in case you want to compare and/or print the values of other fields in future.
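For example, printing another field is then a one-word change (an illustrative sketch; like the RS= answers, it assumes blank lines between groups, and the file path is made up):

```shell
cat > /tmp/groups.txt <<'EOF'
[group]
enable = 0
name = green
test = more

[group]
name = blue
test = home

[group]
value = 48
name = orange
test = out
EOF
out=$(awk -v RS= -F'\n' '
{
    delete map
    for (i=1;i<=NF;i++) {
        split($i,tmp,/ *= */)
        map[tmp[1]] = tmp[2]
    }
}
map["enable"] !~ /^0$/ {
    print map["name"], map["test"]   # any saved field is now addressable by name
}
' /tmp/groups.txt)
echo "$out"
```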
Since you have blank-line-separated records, you should consider putting awk in paragraph mode. If you must test for the [group] identifier, simply add code to handle that. Here's some example code that should fulfill your requirements. Run it like:
awk -f script.awk file.txt
Contents of script.awk:
BEGIN {
RS=""
}
{
for (i=2; i<=NF; i+=3) {
if ($i == "enable" && $(i+2) == 0) {
f = 1
}
if ($i == "name") {
r = $(i+2)
}
}
}
!(f) && r {
print r
}
{
f = 0
r = ""
}
Results:
blue
orange
This might work for you (GNU sed):
sed -n '/\[group\]/{:a;$!{N;/\n$/!ba};/enable\s*=\s*0/!s/.*name\s*=\s*\(\S\+\).*/\1/p;d}' file
Read the [group] block into the pattern space then substitute out the colour if the enable variable is not set to 0.
sed -n '...' sets sed to run in silent mode: no output unless explicitly requested, i.e. by a p or P command
/\[group\]/{...} when we have a line which contains [group] do what is found inside the curly braces.
:a;$!{N;/\n$/!ba} to loop we need a place to loop to; :a is that place. $ is the end-of-file address and $! means not the end of file, so $!{...} means do what is found inside the curly braces when we are not at the end of file. N appends a newline and the next line to the pattern space, and /\n$/!ba means: if the pattern space does not end with an empty line, branch (b) back to a. So this collects all lines from a line containing [group] up to an empty line (or the end of file).
/enable\s*=\s*0/!s/.*name\s*=\s*\(\S\+\).*/\1/p if the lines collected contain enable = 0 then do not substitute out the colour. Or to put it another way, if the lines collected so far do not contain enable = 0 do substitute out the colour.
If you don't want to use the record separator, you could use a dummy variable like this:
#!/usr/bin/awk -f
function endgroup() {
if (e == 1) {
print n
}
}
$1 == "name" {
n = $3
}
$1 == "enable" && $3 == 0 {
e = 0;
}
$0 == "[group]" {
endgroup();
e = 1;
}
END {
endgroup();
}
You could actually use Bash for this.
while read line; do
if [[ $line == "enable = 0" ]]; then
n=1
else
n=0
fi
if [ $n -eq 0 ] && [[ $line =~ name[[:space:]]+=[[:space:]]([a-z]+) ]]; then
echo ${BASH_REMATCH[1]}
fi
done < file
This will only work however if enable = 0 is always only one line above the line with name.
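A stateful variant (an illustrative sketch) removes that ordering assumption: remember the name and an "ok" flag per block, and flush at each new [group] header and at EOF. It relies on the labels being whitespace-separated from = as the question states:

```shell
cat > /tmp/groups2.txt <<'EOF'
[group]
enable = 0
name = green
test = more
[group]
name = blue
test = home
[group]
value = 48
name = orange
test = out
EOF
emit() { [ "$ok" = 1 ] && [ -n "$name" ] && echo "$name"; }
collect() {
    ok=1 name=""
    while read -r label _ value; do
        case $label in
            "[group]") emit; ok=1; name="" ;;   # new block: flush the previous one
            enable)    [ "$value" = 0 ] && ok=0 ;;
            name)      name=$value ;;
        esac
    done < /tmp/groups2.txt
    emit   # flush the final block
}
out=$(collect)
echo "$out"
```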

Shell script to combine three files using AWK

I have three files: G_P_map.txt, G_S_map.txt and S_P_map.txt. I have to combine these three files using awk. Example contents are the following:
(G_P_map.txt contains)
test21g|A-CZ|1mos
test21g|A-CZ|2mos
...
(G_S_map.txt contains)
nwtestn5|A-CZ
nwtestn6|A-CZ
...
(S_P_map.txt contains)
3mos|nwtestn5
4mos|nwtestn6
Expected Output :
1mos, 3mos
2mos, 4mos
Here is the code which I tried. I was able to combine the first two files, but I couldn't combine the result with the third one.
awk -F"|" 'NR==FNR {file1[$1]=$1; next} {$2=file1[$1]; print}' G_S_map.txt S_P_map.txt
Any ideas/help is much appreciated. Thanks in advance!
I would look at a combination of join and cut.
GNU AWK (gawk) 4 has BEGINFILE and ENDFILE which would be perfect for this. However, the gawk manual includes a function that will provide this functionality for most versions of AWK.
#!/usr/bin/awk -f
BEGIN {
FS = "|"
}
function beginfile(ignoreme) {
files++
}
function endfile(ignoreme) {
# endfile() would be defined here if we were using it
}
FILENAME != _oldfilename \
{
if (_oldfilename != "")
endfile(_oldfilename)
_oldfilename = FILENAME
beginfile(FILENAME)
}
END { endfile(FILENAME) }
files == 1 { # save all the key, value pairs from file 1
file1[$2] = $3
next
}
files == 2 { # save all the key, value pairs from file 2
file2[$1] = $2
next
}
files == 3 { # perform the lookup and output
print file1[file2[$2]], $1
}
# Place the regular END block here, if needed. It would be in addition to the one above (there can be more than one)
Call the script like this:
./scriptname G_P_map.txt G_S_map.txt S_P_map.txt
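For comparison, a sketch of the same lookup chain in plain POSIX awk, using FNR/FILENAME tests instead of the portability shim. The data here is illustrative with unique group keys; with the sample's repeated A-CZ group, the single-valued arrays overwrite earlier entries, as in the answer above:

```shell
cat > /tmp/G_P_map.txt <<'EOF'
test21g|A-CZ|1mos
test22g|B-CZ|2mos
EOF
cat > /tmp/G_S_map.txt <<'EOF'
nwtestn5|A-CZ
nwtestn6|B-CZ
EOF
cat > /tmp/S_P_map.txt <<'EOF'
3mos|nwtestn5
4mos|nwtestn6
EOF
out=$(awk -F'|' '
FNR==NR           { g2p[$2]=$3; next }            # first file:  group  -> mos
FILENAME==ARGV[2] { s2g[$1]=$2; next }            # second file: server -> group
                  { print g2p[s2g[$2]] ", " $1 }  # third file: chain the lookups
' /tmp/G_P_map.txt /tmp/G_S_map.txt /tmp/S_P_map.txt)
echo "$out"
```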
