awk command and variable assignment - shell

I have a file MyFile.xml whose contents are as below:
<root>
<Main>
<someothertag>..</someothertag>
<Amt Ccy="EUR">13</Amt>
</Main>
.
.
.
some other tags
<Main>
<someothertag>..</someothertag>
<Amt Ccy="SGD">10</Amt>
</Main>
<another>
<Amt Ccy="EUR">10</Amt>
</another>
</root>
I have a script file whose contents are as below:
result = `awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END {print s }' MyFile.xml`
echo "The result is " $result
But I am getting this output:
result: 0653-690 Cannot open =.
result: 0653-690 Cannot open 23.
The result is
My expected output is:
The result is 23

When assigning a variable there must be no spaces on either side of the =. With the spaces, the shell parses result as a command name and passes = and the awk output (23) to it as arguments, which is where the "Cannot open" errors come from.
Change to:
result=`awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END {print s }' MyFile.xml`
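The backticks work, but the $(...) form of command substitution is easier to read and nests cleanly; an equivalent assignment (a minimal sketch, using the same file name as the question):
result=$(awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END { print s }' MyFile.xml)
echo "The result is $result"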

Retrieve specific values from file

I have a file test.cf containing:
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : sik {
file = "/path/to/sik-20.0.txt"
}
}
I would like to retrieve the value associated with each of teq, cad and sik
I was first thinking about something like
grep -E 'teq' test.cf
and get only the second row, and then strip the repeated part of the line
But it may be easier to do something like:
for a in test.cf
do
line=$(sed -n '{$a}p' test.cf)
if line=teq
#next line using sed -n?
do print nextline &> teq.txt
else if line=cad
do print nextline &> cad.txt
else if line=sik
do print nextline &> sik.txt
done
(obviously it doesn't work)
EDIT:
output wanted:
teq.txt containing teq-0.20.9, cad.txt containing cad-4.0 and sik.txt containing sik-20.0
Is there a good way to do that? Thank you for your comments
Based on your given sample:
awk '/withName/{close(f); f=$3 ".txt"}
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");
print > f}' ip.txt
/withName/{close(f); f=$3 ".txt"}: if the line contains withName, save the output filename in f using the third field; close() closes any previously opened file handle
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "")}: if the line contains file, remove everything except the required value
print > f: print the modified line, redirected to the filename held in f
if you can have multiple entries, use >> instead of >
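A quick sanity check (assuming the sample input is saved as ip.txt, as in the answer), running the command and inspecting one of the output files should give:
$ cat teq.txt
teq-0.20.9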
Here is a solution in awk:
awk '/withName/{name=$3} /file =/{print $3 > (name ".txt")}' test.cf
/withName/{name=$3}: when I see the line containing "withName", I save that name
When I see the line with "file =", I print the third field into the file named after the saved name. (The parentheses around name ".txt" keep the concatenated redirection target portable across awks.)
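If you want the bare version string from the EDIT (teq-0.20.9 rather than the full quoted path), a variant that also strips the quotes, directory and extension might look like this (a sketch, reusing the sample test.cf):
awk '/withName/ { name=$3 }
/file =/ { val=$3; gsub(/"/,"",val); sub(/.*\//,"",val); sub(/\.txt$/,"",val)
print val > (name ".txt") }' test.cf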

AWK print block that does NOT contain specific text

I have the following data file:
variable "ARM_CLIENT_ID" {
description = "Client ID for Service Principal"
}
variable "ARM_CLIENT_SECRET" {
description = "Client Secret for Service Principal"
}
# [.....loads of code]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
There are 2 things I wish to do:
print the variable names without the inverted commas
ONLY print the variables names if the code block does NOT contain default
I know I can print the variable names like so: awk '{ gsub("\"", "") }; (/variable/ && $2 !~ /^ARM_/) { print $2}'
I know I can print the code blocks with: awk '/variable/,/^}/', which results in:
# [.....loads of code output before this]
variable "logging_settings" {
description = "Logging settings from TFVARs"
}
variable "azure_firewall_nat_rule_collections" {
default = {}
}
variable "azure_firewall_network_rule_collections" {
default = {}
}
variable "azure_firewall_application_rule_collections" {
default = {}
}
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
However, I cannot figure out how to print the code blocks only if they don't contain default. I know I will need an if statement and perhaps some variables, but I am unsure how.
This code block should NOT appear in the output for which I grab the variable name:
variable "build_route_tables" {
description = "List of Route Table keys that need direct internet prior to Egress FW build"
default = [
"shared_services",
"sub_to_afw"
]
}
End output should NOT contain those that had default:
# [.....loads of code output before this]
expressroute_settings
firewall_settings
global_settings
peering_settings
vnet_transit_object
vnet_shared_services_object
route_tables
logging_settings
Preferably I would like to keep this a single AWK command or file, with no piping; I have uses for this that require no piping.
EDIT: updated the ideal outputs (I had missed some examples of those with default)
Assumptions and collection of notes from OP's question and comments:
all variable definition blocks end with a right brace (}) in the first column of a new line
we only display variable names (sans the double quotes)
we do not display the variable names if the body of the variable definition contains the string default
we do not display the variable name if it starts with the string ARM_
One (somewhat verbose) awk solution:
NOTE: I've copied the sample input data into my local file variables.dat
awk -F'"' ' # use double quotes as the input field separator
/^variable / && $2 !~ "^ARM_" { varname = $2 # if line starts with "^variable ", and field #2 is not like "^ARM_", save field #2 for later display
printme = 1 # enable our print flag
}
/variable/,/^}/ { if ( $0 ~ "default" ) # within the range of a variable definition, if we find the string "default" ...
printme = 0 # disable the print flag
next # skip to next line
}
printme { print varname # if the print flag is enabled then print the variable name and then ...
printme = 0 # disable the print flag
}
' variables.dat
This generates:
logging_settings
$ awk -v RS= '!/default =/{gsub(/"/,"",$2); print $2}' file
ARM_CLIENT_ID
ARM_CLIENT_SECRET
[.....loads
logging_settings
Of course the output doesn't match yours, since your expected output is inconsistent with the posted input data.
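For background: -v RS= puts awk in paragraph mode, where records are separated by blank lines, so this one-liner assumes a blank line between variable blocks. A tiny demonstration of paragraph mode:
$ printf 'a 1\nb 2\n\nc 3\n' | awk -v RS= '{print $1}'
a
c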
Using GNU awk:
awk -v RS="}" '/variable/ && !/default/ && !/ARN/ { var=gensub(/(^.*variable ")(.*)(".*{.*)/,"\\2",$0);print var }' file
Set the record separator to "}" and then check for records that contain "variable", don't contain default and don't contain "ARM". Use gensub to split the string into three sections based on regular expressions and set the variable var to the second section. Print the var variable.
Output:
logging_settings
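To see the gensub call in isolation (a quick sketch):
$ echo 'variable "foo" {' | gawk '{ print gensub(/(^.*variable ")(.*)(".*{.*)/, "\\2", 1) }'
foo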
Another variation on awk, using a skip variable to control the array index holding the variable names:
awk '
/^[[:blank:]]*#/ { next }
$1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
$1=="default" { skip=1 }
END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
' code
The first rule just skips comment lines. If you want to skip "ARM_" variables, then you can add a test on $2 (see the sketch after the example output below).
Example Use/Output
With your example code in code, all variables without default are:
$ awk '
> /^[[:blank:]]*#/ { next }
> $1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip=0 }
> $1=="default" { skip=1 }
> END { if (skip) n--; for(i=1; i<=n; i++) print vars[i] }
> ' code
ARM_CLIENT_ID
ARM_CLIENT_SECRET
logging_settings
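For instance, the ARM_ test on $2 could be folded into the first rule (an untested sketch): an ARM_ name is stored but immediately marked as skipped, so the next variable overwrites it and END drops it if it is last:
$1=="variable" { gsub(/["]/,"",$2); vars[skip?n:++n]=$2; skip = ($2 ~ /^ARM_/) }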
Here's another maybe shorter solution.
$ awk -F'"' '/^variable/&&$2!~/^ARM_/{v=$2} /default =/{v=0} /}/&&v{print v; v=0}' file
logging_settings
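The same logic spelled out with comments (a sketch, not a new approach):
awk -F'"' '
/^variable/ && $2 !~ /^ARM_/ { v = $2 }        # remember the candidate name
/default =/ { v = 0 }                          # the block has a default: forget it
/}/ && v { print v; v = 0 }                    # block closed with the name still set: print it
' file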

awk: data missed while parsing file

I have written a script to parse hourly log files to extract "CustomerId, Marketplace, StartTime, and DealIdClicked" data. The log file structure is like so:
------------------------------------------------------------------------
Size=0 bytes
scheme=https
StatusCode=302
RequestId=request_Id_X07
CustomerId=XYZCustomerId
Marketplace=MarketPlace
StartTime=1592931599.986
Program=Unknown
Info=sub-page-type=desktop:Deals_Content_DealIdClicked_0002,sub-page-CSMTags=UTF-8
Counters=sub-page-type=desktop:Deals_Content_DealIdClicked_0002=3,sub-page-CSMTags=Encoding:UTF-8
EOE
------------------------------------------------------------------------
Here is the script I have written to parse the log.
function readServiceLog() {
local _logfile="$1"
local _csvFile="$2"
local _logFileName=$(getLogFileName "$_logfile")
parseLogFile "$_logfile" "$_csvFile"
echo "$_logFileName" >>"$SCRIPT_PATH/excludeFile.txt"
}
# Function to match regex and extract required data.
function parseLogFile() {
local _logfile=$1
local _csvFile=$2
zcat <"$_logfile" | awk -v csvFilePath="$_csvFile" '
BEGIN {
customerIdRegex="^CustomerId="
marketplaceIdRegex="^MarketplaceId="
startTimeRegex="^StartTime="
InfoRegex="^Info="
dealIdRegex = "Deals_Content_DealIdClicked_"
EOERegex="^EOE$"
delete RECORD
}
{
logLine=$0
if (match(logLine,InfoRegex)) {
after = substr(logLine,RSTART+RLENGTH);
if(match(after, dealIdRegex)) {
afterDeal = substr(after,RSTART+RLENGTH);
dealId = substr(afterDeal, 1, index(afterDeal,",")-1)
RECORD[0] = dealId
}
}
if (match(logLine,customerIdRegex)) {
after = substr(logLine,RSTART+RLENGTH);
customerid = substr(after, 1, length(after))
RECORD[1] = customerid
}
if (match(logLine,startTimeRegex)) {
after = substr(logLine,RSTART+RLENGTH);
startTime = substr(after, 1, length(after))
RECORD[2] = startTime
}
if (match(logLine,marketplaceIdRegex)) {
after = substr(logLine,RSTART+RLENGTH);
marketplaceId = substr(after, 1, length(after))
RECORD[3] = marketplaceId
}
if (match(logLine,EOERegex)) {
if(length(RECORD) == 4) {
printf("%s,%s,%s,%s\n", RECORD[0],RECORD[1],RECORD[2],RECORD[3]) >> csvFilePath
}
delete RECORD
}
}'
}
function processHourlyFile() {
local _currentProcessingFolder=$1
local _outputFolder=$(getOutputFolderName) # getOutputFolderName is from a shared util script.
mkdir -p "$_outputFolder"
local _csvFileName="$_outputFolder/${_currentProcessingFolder##*/}.csv"
for entry in "$_currentProcessingFolder"/*; do
if [[ "$entry" == *"$SERVICE_LOG"* ]]; then
readServiceLog "$entry" "$_csvFileName"
fi
done
}
# Main execution to spawn new processes for parallel parsing.
function main() {
local _processCount=1
for entry in $INPUT_LOG_PATH/*; do
processHourlyFile $entry &
pids[${_processCount}]=$!
done
printInfo
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
}
main
printf '\nFinished!\n'
Expected output:
A comma separated file.
0002,XYZCustomerId,1592931599.986,MarketPlace
Problem
The script spawns 24 processes to parse the 24 hourly logs for an entire day. After parsing the files, I verified the record counts, and sometimes they don't match the record counts of the original log files.
I have been stuck on this for the last two days with no luck. Any help would be appreciated.
Thanks in advance.
Try:
awk -F= '
{
a[$1]=$2
}
/^Info/ {
sub(/.*DealIdClicked_/, "")
sub(/,.*/, "")
print $0, a["CustomerId"], a["StartTime"], a["Marketplace"]
delete a
}' OFS=, filename
When run on your input file, the above produces the desired output:
0002,XYZCustomerId,1592931599.986,MarketPlace
How it works
-F= tells awk to use = as the field separator on input.
{ a[$1]=$2 } tells awk to save the second field, $2, in associative array a under the key $1.
/^Info/ { ... } tells awk to perform the commands in curly braces whenever the line starts with Info. Those commands are:
sub(/.*DealIdClicked_/, "") removes all parts of the line up to and including DealIdClicked_.
sub(/,.*/, "") tells awk to remove from what's left of the line everything from the first comma to the end of the line.
The remainder of the line, still called $0, is the "DealId" that we want.
print $0, a["CustomerId"], a["StartTime"], a["Marketplace"] tells awk to print the output that we want.
delete a deletes array a, so we start over clean on the next record.
OFS=, tells awk to use a comma as the field separator on output.
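In the question's setting, the same program could be dropped into the zcat pipeline, e.g. (a sketch using the question's own variable names):
zcat <"$_logfile" | awk -F= '
{ a[$1]=$2 }
/^Info/ {
sub(/.*DealIdClicked_/, ""); sub(/,.*/, "")
print $0, a["CustomerId"], a["StartTime"], a["Marketplace"]
delete a
}' OFS=, >> "$_csvFile"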

How to delete everything between two :'s, but not if between {}'s? [duplicate]

I have a text file like this:
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
I need to delete any text found inside any :'s, including the :'s, but not if they fall inside a pair of { and }.
Anything between a { and } is safe, including :'s.
Anything not between a { and } but found between : and : is deleted.
The :'s found outside { and } are all deleted.
The output would look like this:
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .
There is only one set of braces per line.
The paired braces are never split across lines.
There could be any number of :'s on the line, inside or outside the braces.
:'s always come in pairs.
How can I delete everything between colons, including the colons themselves, but not when protected by braces?
My best attempt so far is to use awk -F"{" '{ print $1 }' > file1.txt, awk -F"{" '{ print $2 }' > file2.txt, etc. to split the lines around the braces into different files, run sed on the files holding the outside-brace parts to remove the colon sections, but not on the files containing the data inside the braces, and then reassemble it all with paste, but this solution is far too complicated.
This will do as you ask
use strict;
use warnings;
my $data = do {
local $/;
<DATA>;
};
my @parts = split m/ ( \{ [^{}]* \} ) /x, $data;
for (@parts) {
s/ : [^:]* : //gx unless /^\{/;
}
print @parts, "\n";
__DATA__
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
output
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .
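The key trick is that split with a capturing group in the pattern keeps the delimiters in the result list, so every element is either a {...} chunk (left untouched) or text outside braces, where the colon spans get stripped. A small illustration of that behaviour:
$ perl -e 'print join("|", split m/(\{[^{}]*\})/, "a {b:c} d"), "\n"'
a |{b:c}| d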
This is simple; try the following:
perl -pe 's/({[^{}]*})|:[^:]*:/$1/g' file
All text inside { } is captured in $1 and thus skipped. :)
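You can see it on the second sample line: the brace group matches first and is written back unchanged, so its inner colons are never treated as delimiters:
$ echo 'This is yet {another : example :} of some of the text.' | perl -pe 's/({[^{}]*})|:[^:]*:/$1/g'
This is yet {another : example :} of some of the text.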
In Perl:
#!/usr/bin/env perl
while (<>) {
my @chars = split //;
foreach my $c (@chars) {
if ($c eq "{" .. $c eq "}") {
print "$c";
} elsif ($c eq ":" ... $c eq ":") {
}
else {
print "$c";
}
}
}
or put more succinctly:
while (<>) {
print grep {/\{/ .. /\}/ or not /:/ ... /:/} split //;
}
Counting braces and colons:
perl -ne '
$b = $c = 0;
for $char (split //) {
$b++ if $char eq "{";
$b-- if $char eq "}";
if ($b > 0) {
print $char;
}
else {
if ($c == 0 and $char eq ":") {
$c++;
}
else {
print $char if $c == 0;
$c-- if $c == 1 and $char eq ":";
}
}
}
' <<END
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
END
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .

Script to migrate data from one source to another

I have a .h file containing, among other things, data in this format:
struct X[]{
{"Field", "value1 value2 value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd sdfdsf sdfs"};
/****************/
};
I have a text file containing the values that I want to replace in the .h file with new values:
value1 Valuesdfdsf1
value2 Value1dfsdf
value3 Value1_another
sfsd sfsd_ewew
sdfdsf sdfdsf_ew
sdfs sfsd_new
And the resulting .h file will contain the replacements from the text file above. Everything else remains the same.
struct X[]{
{"Field1", "value11 value12 value232"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd_ewew sdfdsf_ew sdfs_new"};
/****************/
};
Please help me come up with a solution to accomplish this using Unix tools: awk, perl, bash, sed, etc.
cat junk/n2.txt | perl -e '{use File::Slurp; my @r = File::Slurp::read_file("junk/n.txt"); my %r = map {chomp; (split(/\s+/,$_))[0,1]} @r; while (<>) { unless (/^\s*{"/) {print $_; next;}; my ($pre,$values,$post) = ($_ =~ /^(\s*{"[^"]+", ")([^"]+)(".*)$/); my @new_values = map { exists $r{$_} ? $r{$_}:$_ } split(/\s+/,$values); print $pre . join(" ",@new_values) . $post . "\n"; }}'
Result:
struct X[]{
{"Field", "value1 Value1dfsdf value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd_ewew sdfdsf_ew sfsd_new"};
/****************/
};
Code untangled:
use File::Slurp;
my @replacements = File::Slurp::read_file("junk/n.txt");
my %r = map {chomp; (split(/\s+/,$_))[0,1]} @replacements;
while (<>) {
unless (/^\s*{"/) {print $_; next;}
my ($pre,$values,$post) = ($_ =~ /^(\s*{"[^"]+", ")([^"]+)(".*)$/);
my @new_values = map { exists $r{$_} ? $r{$_} : $_ } split(/\s+/, $values);
print $pre . join(" ",@new_values) . $post . "\n";
}
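One caveat: File::Slurp is a CPAN module, not core Perl. If it is not installed, the same read can be done with a plain open (a sketch):
open my $fh, '<', 'junk/n.txt' or die "cannot read n.txt: $!";
my @r = <$fh>;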
#!/usr/bin/perl
use strict; use warnings;
# you need to populate %lookup from the text file
my %lookup = qw(
value1 Valuesdfdsf1
value2 Value1dfsdf
value3 Value1_another
sfsd sfsd_ewew
sdfdsf sdfdsf_ew
sdfs sfsd_new
);
while ( my $line = <DATA> ) {
if ( $line =~ /^struct \w+\Q[]/ ) {
print $line;
process_struct(\*DATA, \%lookup);
}
else {
print $line;
}
}
sub process_struct {
my ($fh, $lookup) = @_;
while (my $line = <$fh> ) {
unless ( $line =~ /^{"(\w+)", "([^"]+)"}([,;])\s+/ ) {
print $line;
return;
}
my ($f, $v, $p) = ($1, $2, $3);
$v =~ s/(\w+)/exists $lookup->{$1} ? $lookup->{$1} : $1/eg;
printf qq|{"%s", "%s"}%s\n|, $f, $v, $p;
}
return;
}
__DATA__
struct X[]{
{"Field", "value1 value2 value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd sdfdsf sdfs"};
/****************/
};
Here's a simple-looking program:
use strict;
use warnings;
use File::Copy;
use constant {
OLD_HEADER_FILE => "headerfile.h",
NEW_HEADER_FILE => "newheaderfile.h",
DATA_TEXT_FILE => "data.txt",
};
open (HEADER, "<", OLD_HEADER_FILE) or
die qq(Can't open old header file ") . OLD_HEADER_FILE . qq(" for reading);
open (NEWHEADER, ">", NEW_HEADER_FILE) or
die qq(Can't open new header file ") . NEW_HEADER_FILE . qq(" for writing);
open (DATA, "<", DATA_TEXT_FILE) or
die qq(Can't open data file ") . DATA_TEXT_FILE . qq(" for reading);
#
# Put Replacement Data in a Hash
#
my %dataHash;
while (my $line = <DATA>) {
chomp($line);
my ($key, $value) = split (/\s+/, $line);
$dataHash{$key} = $value if ($key and $value);
}
close (DATA);
#
# NOW PARSE THROUGH HEADER
#
while (my $line = <HEADER>) {
chomp($line);
if ($line =~ /^\s*\{"Field/) {
foreach my $key (keys(%dataHash)) {
$line =~ s/\b$key\b/$dataHash{$key}/g;
}
}
print NEWHEADER "$line\n";
}
close (HEADER);
close (NEWHEADER);
copy(NEW_HEADER_FILE, OLD_HEADER_FILE) or
die qq(Unable to replace ") . OLD_HEADER_FILE . qq(" with ") . NEW_HEADER_FILE . qq(");
I could make it more efficient by using map, but that makes it harder to understand.
Basically:
I open three files, the original Header, the new Header I'm building, and the data file
I first put my data into a hash where the replacement text is keyed by the original text. (Could have done it the other way around if I wanted.)
I then go through each line of the original header.
If I see a line that looks like a field line, I know that I might have to do a replacement.
For each entry in my %dataHash, I do a substitution of the $key with the $dataHash{$key} replacement value. I use \b to mark word boundaries, so that a key like field1 does not match inside field11.
Now I write the line back to my new header file. If I didn't replace anything, I just write back the original line.
Once I finish, I copy the new header over the old header file.
This script should work
keyval is the file containing key value pairs
filetoreplace is the file containing data to be modified
The file named changed will contain the changes
#!/bin/sh
echo
keylist=`cat keyval | awk '{ print $1}'`
while read line
do
for i in $keylist
do
if echo $line | grep -wq $i; then
value=`grep -w $i keyval | awk '{print $2}'`
line=`echo $line | sed -e "s/$i/$value/g"`
fi
done
echo $line >> changed
done < filetoreplace
This might be kind of slow if your files are big.
gawk -F '[ \t]*|"' 'FNR == NR {repl[$1]=$2;next}{for (f=1;f<=NF;++f) for (r in repl) if ($f == r) $f=repl[r]; print} ' keyfile file.h
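The same program unpacked with comments (reformatted only, no change in behaviour):
gawk -F '[ \t]*|"' '
FNR == NR { repl[$1] = $2; next }      # first file (keyfile): build the old->new map
{
    for (f = 1; f <= NF; ++f)          # in file.h, visit every field
        for (r in repl)
            if ($f == r) $f = repl[r]  # replace on exact match
    print
}' keyfile file.h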
