Get diff and replace string with awk

I am trying to get the differences between two files with the following awk script:
awk 'NR==FNR{
a[$0]
next
}
{
if($0 in a)
delete a[$0]
else
a[$0]
}
END {
for(i in a) {
$0=i
sub(/[^.]*$/,substr(tolower($1),1,length($1)-1),$3)
print
}
}' [ab].yaml
The a.yaml file:
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
and the b.yaml file:
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}
The script should merge the differences and fill in what is inside the curly braces.
Something like this:
ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
But it duplicates my output:
ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
The script should replace everything contained in the braces if there are new variables.

Assumptions:
the data component will always be one of .Data.data.<some_string> or .Data.data.
Taking a slightly different approach:
awk '
NR==FNR { a[$1]=$3
next
}
$1 in a { if ($3 == a[$1]) { # if $1 is an index of a[] and $3 is an exact match then ...
delete a[$1] # delete the a[] entry (ie, these 2 rows are identical so discard both)
}
else
if (length($3) > length(a[$1])) # if $1 is an index of a[] but $3 does not match, and $3 is longer then ...
a[$1]=$3 # update a[] with the new/longer entry
next # skip to next input line
}
{ a[$1]=$3 } # if we get here then $1 has not been seen before so add to a[]
END { for (i in a) { # loop through indices
val=a[i] # make copy of value
sub(/[^.]*$/,tolower(i),val) # strip off everything coming after the last period and add the lowercase of our index
sub(/:$/,"",val) # strip the ":" off the end of the index
print i,"{{",val,"}}" # print our new output
}
}
' [ab].yaml
This generates:
SOME_VALUE: {{ .Data.data.some_value }}
ADD_THIS: {{ .Data.data.add_this }}
ONE_MORE: {{ .Data.data.one_more }}
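To see what the two sub() calls in the END block do, here is a minimal sketch that runs them in isolation. The helper name fill is made up for this illustration, and the inputs are passed with -v instead of being read from the yaml files:

```shell
# fill KEY PATH: hypothetical helper that mimics the END block above
#   KEY  = first field, e.g. "ADD_THIS:"
#   PATH = third field, e.g. ".Data.data." or ".Data.data.some_value"
fill() {
  awk -v key="$1" -v val="$2" 'BEGIN {
    sub(/[^.]*$/, tolower(key), val)   # replace whatever follows the last "." with the lowercased key
    sub(/:$/, "", val)                 # drop the trailing ":" that came along with the key
    print key, "{{", val, "}}"
  }'
}

fill "ADD_THIS:" ".Data.data."
fill "SOME_VALUE:" ".Data.data.some_value"
```

Both an empty and an already filled-in path end up in the same form, which is why the earlier "keep the longer $3" step is enough to merge the two files.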
NOTE: if the output needs to be re-ordered, it is likely easier to pipe the results to the appropriate sort command.
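A small sketch of that, assuming a plain alphabetical order by key is what's wanted. The unsorted text is just the output shown above, stored in a variable for the demo:

```shell
# Output in the arbitrary order a "for (i in a)" loop may produce
unsorted='SOME_VALUE: {{ .Data.data.some_value }}
ADD_THIS: {{ .Data.data.add_this }}
ONE_MORE: {{ .Data.data.one_more }}'

# LC_ALL=C makes the byte-wise ordering predictable across locales
sorted=$(printf '%s\n' "$unsorted" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In real use the awk command would be piped straight into sort instead of going through a variable.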
As for why OP's current code prints duplicate lines ...
Modify the END{...} block as follows:
END { for (i in a)
print i,a[i]
}
This should show that the code saves the input lines from both files. Because the array is keyed on the whole line ($0) and no effort is made to match 'duplicates' via a matching $1, both versions of each differing row stay in the array; the END block then modifies them to look identical and prints both to stdout.
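The effect can be reproduced with two sample rows that differ only in the template part (a sketch, with the rows fed from a variable rather than from the yaml files):

```shell
# Two versions of the same key, as they appear in a.yaml and b.yaml
sample='SOME_VALUE: {{ .Data.data.some_value }}
SOME_VALUE: {{ .Data.data. }}'

# Keyed by the whole line: both rows survive as separate array entries
by_line=$(printf '%s\n' "$sample" | awk '{ a[$0] } END { n = 0; for (i in a) n++; print n }')

# Keyed by the first field: the two rows collapse into a single entry
by_key=$(printf '%s\n' "$sample" | awk '{ a[$1] } END { n = 0; for (i in a) n++; print n }')

echo "entries keyed by \$0: $by_line, keyed by \$1: $by_key"
```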

Related

How to extract a fragment of text from file?

I have a file with the following text:
<div>
<b>a:</b> <a class='a' href='/a/1'>a1</a><br>
<b>b:</b> <a class='b' href='/b/2'>b2</a><br>
<b>c:</b> <a class='c' href='/c/3/'>c3</a><br>
<b>d:</b> "ef"<br><br><div class='start'>123
<br>ghij.
<br>klmn
<br><br><b>end</b>
</div>
</div>
I want to do the following:
Whenever a line starts with <b>a:</b> <a class='a', I want to copy the text between the > symbol after <a class='a' and </a> — it must be stored in a[1];
Similarly, whenever a line starts with <b>b:</b> <a class='b', I want to copy the text between the > symbol after <a class='b' and </a> — it must be stored in b[1];
Whenever a line contains <div class='start'>, I want to create the variable t whose value starts with the text that occurs between <div class='start'> and the end of this line, then set flag to 1;
If the value of flag is already 1 and the current line does not start with <br><br><b>end</b>, I want to append the current line to the current value of the variable t (using the space symbol as separator);
If the value of flag is already 1 and the current line starts with <br><br><b>end</b>, I want to concatenate three current values of a[1], b[1] and t (using ; as separator) and print the result to the output file, then set flag to 0, then clear the variable t.
I used the following code (for gawk 4.0.1):
gawk 'BEGIN {flag = 0; t = ""; }
{
if ($0 ~ /^<b>a:<\/b> <a class=\x27a\x27/ ) {
match($0, /^<b>a:<\/b> <a class=\x27a\x27 href=\x27\/a\/[0-9]{1,}\x27>(.*)<\/a>/, a) };
if ($0 ~ /^<b>b:<\/b> <a class=\x27b\x27/ ) {
match($0, /^<b>b:<\/b> <a class=\x27b\x27 href=\x27\/b\/[0-9]{1,}\x27>(.*)<\/a>/, b) };
if ($0 ~ /<div class=\x27start\x27>/ ) {
match($0, /^.*<div class=\x27start\x27>(.*)$/, s);
t = s[1];
flag = 1 };
if (flag == 1) {
if ($0 ~ /^<br><br><b>end<\/b>/) {
str = a[1] ";" b[1] ";" t;
print(str) > "output.txt";
flag = 0; str = ""; t = "" }
else {
t = t " " $0 }
}
}' input.txt
I was expecting the following output:
a1;b2;123 <br>ghij. <br>klmn
But the output is:
;;123 <b>d:</b> "ef"<br><br><div class='start'>123 <br>ghij. <br>klmn
Why are a[1] and b[1] empty? Why does <b>d:</b> "ef"<br><br><div class='start'> occur in the output? How to fix the code to obtain the expected output?

Here are the answers to your specific questions:
Q) Why are a[1] and b[1] empty?
A) They aren't when I try your script with gawk 5.1.1, so most likely one of the following applies: there's a bug in your awk version; some of the white space in your input isn't the blank characters your script requires (maybe it's tabs); your input contains control chars; or your awk version doesn't like using \x27 instead of \047 for 's.
Q) Why does <b>d:</b> "ef"<br><br><div class='start'> occur in the output?
A) Because you forgot a next in the block that matches on div so the next block is also executing and saving $0 from the div line.
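A stripped-down sketch of that effect, using POSIX awk only. The sample input is shortened and the quotes around start are left out so the awk program can sit inside single quotes:

```shell
input='<b>d:</b> "ef"<br><br><div class=start>123
<br>ghij.
<br><br><b>end</b>'

# Variant 1: no "next", so the accumulating rule also runs on the div line
without_next=$(printf '%s\n' "$input" | awk '
  /<div class=start>/ { t = $0; sub(/^.*<div class=start>/, "", t); inDiv = 1 }
  inDiv { if (/^<br><br><b>end<\/b>/) { print t; inDiv = 0 } else t = t " " $0 }
')

# Variant 2: "next" skips the remaining rules for the div line
with_next=$(printf '%s\n' "$input" | awk '
  /<div class=start>/ { t = $0; sub(/^.*<div class=start>/, "", t); inDiv = 1; next }
  inDiv { if (/^<br><br><b>end<\/b>/) { print t; inDiv = 0 } else t = t " " $0 }
')

printf 'without next: %s\n' "$without_next"
printf 'with next:    %s\n' "$with_next"
```

Only the first variant carries the whole div line into t.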
Q) How to fix the code to obtain the expected output?
A) Here's how I'd approach your problem, using GNU awk for the 3rd arg to match() and the \s shorthand for [[:space:]]:
$ cat tst.sh
#!/usr/bin/env bash
gawk '
BEGIN { OFS=";" }
match($0, /^<b>(.):<\/b>\s+<a\s+class=\047.\047\s+href=\047\/.\/[0-9]+\/?\047>(.*)<\/a>/, arr) {
vals[arr[1]] = arr[2]
}
match($0, /^.*<div\s+class=\047start\047>(.*)/, arr) {
vals["div"] = arr[1]
inDiv = 1
next
}
inDiv {
if ( /^<br><br><b>end<\/b>/ ) {
print vals["a"], vals["b"], vals["div"]
delete vals
inDiv = 0
}
else {
vals["div"] = vals["div"] " " $0
}
}
' 'input.txt' > 'output.txt'
$ ./tst.sh
$ cat output.txt
a1;b2;123 <br>ghij. <br>klmn
So
I'm using a single match() to capture all values for lines that look like your a, b, c lines for consistency, conciseness, and maintainability.
I'm always saving the match results in an array named arr rather than different arrays per occurrence so I don't have to remember to keep deleting those arrays and the code that uses the matches can all be homogenized.
I'm using a single associative array vals[] to hold all values indexed by the letter after <b> so we don't need to test those letters and create separate variables, it's easy to clear the data by just deleting the array rather than having to set multiple variables to null, and it's easy to add the c or any other similar values to the output later if desired.
I'm using \s+ instead of a single blank char for every space in the input to be agnostic about the actual space char(s) and number of spaces used.
I'm using \047 instead of \x27 to match 's for portability and robustness, see http://awk.freeshell.org/PrintASingleQuote.
I'm letting the shell handle all input/output rather than including output redirection in the awk script for consistency and improved robustness in error scenarios like files that can't be opened.
I named my flag variable inDiv rather than flag so it tells us what it means, i.e. that we're in the div block of the input, for improved clarity and ease of future maintenance. Naming a flag variable flag is like naming a numeric variable number instead of sum, count, ave, tot, diff or something else meaningful that'd improve your script. When you see people use f for the name of a flag variable, that f is shorthand for found, not for flag.
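As a small illustration of the \047 point, here is a sketch that uses the octal escape as a split() separator, so no literal single quote has to appear inside the single-quoted awk program. The sample line is a shortened version of the input, and the array name parts is arbitrary:

```shell
line="<a class='a' href='/a/1'>a1</a>"

# Splitting on \047 (a single quote) puts the quoted attribute values
# into the even-numbered elements: parts[2] = class, parts[4] = href
out=$(printf '%s\n' "$line" | awk '{ n = split($0, parts, "\047"); print parts[2], parts[4] }')
echo "$out"
```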

Demonstrating that gawk's regexes don't match perl's
perl:
$ echo aaaab | perl -nE '/a*(a+b)/ && say $1'
ab
$ echo aaaab | perl -nE '/a*?(a+b)/ && say $1'
aaaab
a*? matched the shortest sequence of zero or more a's, and the greedy a+ consumed the rest.
gawk
$ echo aaaab | gawk 'match($0, /a*(a+b)/, m) {print m[1]}'
ab
$ echo aaaab | gawk 'match($0, /a*?(a+b)/, m) {print m[1]}'
ab
Not the same behaviour: a*? is still greedy.
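Since awk regexes have no lazy quantifiers, one way to get the same text the Perl a*?(a+b) version captured is to drop the lazy prefix and let match() find the leftmost-longest a+b. This is a sketch for this particular input, not a general replacement for lazy matching:

```shell
# match() sets RSTART and RLENGTH to the leftmost-longest match of a+b
out=$(echo aaaab | awk 'match($0, /a+b/) { print substr($0, RSTART, RLENGTH) }')
echo "$out"
```

This uses only POSIX awk features, so it does not depend on gawk's 3rd argument to match().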

Why(...)a[1](...)empty?
The match function returns 0 if no match was found, which makes it easy to check whether this is the case. I selected the part that fills the a array and altered it a bit:
{
if ($0 ~ /^<b>a:<\/b> <a class=\x27a\x27/ )
{
print NR, match($0, /^<b>a:<\/b> <a class=\x27a\x27 href=\x27\/a\/[0-9]{1,}>(.*)<\/a>/, a);
}
}
then ran it against
<div>
<b>a:</b> <a class='a' href='/a/1'>a1</a><br>
<b>b:</b> <a class='b' href='/b/2'>b2</a><br>
<b>c:</b> <a class='c' href='/c/3/'>c3</a><br>
<b>d:</b> "ef"<br><br><div class='start'>123
<br>ghij.
<br>klmn
<br><br><b>end</b>
</div>
</div>
and got output
2 0
So the condition in the if worked as expected, since the line with <b>a:</b>... is the 2nd line, but no match was found. This means your regular expression is wrong. After examining it, I found that it is missing one single quote; it should be
/^<b>a:<\/b> <a class=\x27a\x27 href=\x27\/a\/[0-9]{1,}\x27>(.*)<\/a>/
then
{
if ($0 ~ /^<b>a:<\/b> <a class=\x27a\x27/ )
{
print NR, match($0, /^<b>a:<\/b> <a class=\x27a\x27 href=\x27\/a\/[0-9]{1,}\x27>(.*)<\/a>/, a);
print a[1];
}
}
gives the output
2 1
a1
(tested in gawk 4.2.1)
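The same check can be sketched with POSIX awk only, by looking just at the return value of match(). The regexes are shortened to the href part, and [0-9]+ is used in place of [0-9]{1,} for portability:

```shell
line="<b>a:</b> <a class='a' href='/a/1'>a1</a><br>"

# Closing quote after the number is missing from the regex: match() returns 0
bad=$(printf '%s\n' "$line" | awk '{ print match($0, /href=\047\/a\/[0-9]+>/) }')

# Closing quote present: match() returns the (non-zero) start position
good=$(printf '%s\n' "$line" | awk '{ print match($0, /href=\047\/a\/[0-9]+\047>/) }')

echo "without closing quote: $bad, with closing quote: $good"
```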

Ansible : Use regex replace for items in list and assign it back

I tried using the regex_replace filter as shown below.
Here groups[group_names[0]] is a list of node names:
"groups[group_names[0]]": [
"node1.in.labs.corp.netin",
"node2.in.labs.corp.netin"
]
- set_fact:
groups[group_names[0]]={{ groups[group_names[0]] |
map('regex_replace', _regex, _replace)|list }}
vars:
_regex: '^(.*?)\.(.*)$'
_replace: '-n \1'
Hitting the following error:
{"changed": false, "msg": "The variable name 'groups[group_names[0]]' is not valid. Variables must start with a letter or underscore character, and contain only letters, numbers and underscores."}
Can I assign the result back to the same list after the regex replacement?
Also, I'm using the -n option, so my expected output is:
-n node1 -n node2
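To make the intended transformation concrete, here is a sketch of it in shell with awk. This is not Ansible and does not address the set_fact error above; it only shows the "keep everything before the first dot and prefix it with -n" step on the two node names from the question:

```shell
nodes='node1.in.labs.corp.netin
node2.in.labs.corp.netin'

# -F. splits on dots, so $1 is the short host name; sep adds a space
# between entries but not before the first one
args=$(printf '%s\n' "$nodes" | awk -F. '{ printf "%s-n %s", sep, $1; sep = " " } END { print "" }')
echo "$args"
```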

Is there an efficient way to concatenate strings

For example, there is a function like that:
func TestFunc(str string) string {
return strings.Trim(str," ")
}
It runs in the example below:
{{ $var := printf "%s%s" "x" "y" }}
{{ TestFunc $var }}
Is there any way to concatenate strings with operators in a template?
{{ $var := "y" }}
{{ TestFunc "x" + $var }}
or
{{ $var := "y" }}
{{ TestFunc "x" + {$var} }}
It gives an unexpected "+" in operand error.
I couldn't find it in the documentation (https://golang.org/pkg/text/template/).

There is not a way to concatenate strings with an operator because Go templates do not have operators.
Use the printf function as shown in the question or combine the calls in a single template expression:
{{ TestFunc (printf "%s%s" "x" "y") }}
If you always need to concatenate strings for the TestFunc argument, then write TestFunc to handle the concatenation:
func TestFunc(strs ...string) string {
return strings.Trim(strings.Join(strs, ""), " ")
}
{{ TestFunc "x" $var }}

How to avoid newlines caused by conditionals?

Given this Go text/template code:
Let's say:
{{ if eq .Foo "foo" }}
Hello, StackOverflow!
{{ else if eq .Foo "bar" }}
Hello, World!
{{ end }}
We get the following output in case Foo equals "foo":
Let's say:
Hello, StackOverflow!
(followed by a newline)
Is there a way to get rid of the extra newlines?
I would expect that this can be accomplished using the {{- and -}} syntax:
Let's say:
{{- if eq .Foo "foo" }}
Hello, StackOverflow!
{{- else if eq .Foo "bar" }}
Hello, World!
{{- end }}
However, that yields an illegal number syntax: "-" error.

In your first template, there is a newline after the static text "Let's say:". The 2nd line contains only the {{if}} action followed by another newline, and the body "Hello, StackOverflow!" starts on the 3rd line. When this is rendered, there are 2 newlines between the 2 static texts, so you see an empty line (as you posted).
You may use {{- if ... to trim the first newline, so that only 1 newline reaches the output when rendered. The result is 2 separate lines with no empty line between them:
Let's say:
{{- if eq .Foo "foo" }}
Hello, StackOverflow!
{{- else if eq .Foo "bar" }}
Hello, World!
{{- end }}
Output when Foo is "foo":
Let's say:
Hello, StackOverflow!
Output when Foo is "bar":
Let's say:
Hello, World!
Try it on the Go Playground.
Note that this was added in Go 1.6 and is documented in the text/template package documentation under "Text and spaces".
If you use the - sign at the closing of the actions -}}, you can even remove all the newlines:
Let's say:
{{- if eq .Foo "foo" -}}
Hello, StackOverflow!
{{- else if eq .Foo "bar" -}}
Hello, World!
{{- end -}}
Output when Foo is "foo" and when Foo is "bar", respectively:
Let's say:Hello, StackOverflow!
Let's say:Hello, World!
Try this one on the Go Playground.

There is a newline in the output because your template has a newline after the colon (:).
This works: https://play.golang.org/p/k4lazGhE-r
Note that I just start the first if right after the colon.

awk command and variable assignment

I have a MyFile.xml whose contents are as below
<root>
<Main>
<someothertag>..</someothertag>
<Amt Ccy="EUR">13</Amt>
</Main>
.
.
.
some other tags
<Main>
<someothertag>..</someothertag>
<Amt Ccy="SGD">10</Amt>
</Main>
<another>
<Amt Ccy="EUR">10</Amt>
</another>
</root>
I have a script file whose contents are as below
result = `awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END {print s }' MyFile.xml`
echo "The result is " $result
But I am getting the output
result: 0653-690 Cannot open =.
result: 0653-690 Cannot open 23.
The result is
My expected output is
The result is 23

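When assigning variables there should be no spaces on either side of the =. With the spaces, the shell treats result as a command name and passes = and the awk output to it as arguments.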
Change to:
result=`awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END {print s }' MyFile.xml`
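A self-contained sketch of the corrected assignment, using $(...) instead of backticks (the two are equivalent here) and a shortened copy of the XML held in a variable instead of MyFile.xml:

```shell
xml='<root>
<Main>
<Amt Ccy="EUR">13</Amt>
</Main>
<Main>
<Amt Ccy="SGD">10</Amt>
</Main>
<another>
<Amt Ccy="EUR">10</Amt>
</another>
</root>'

# No spaces around "=": this is an assignment, not a command invocation.
# Only Amt values inside <Main> blocks are added to s.
result=$(printf '%s\n' "$xml" | awk '/<Main>/ { f=1 } f && /Amt/ { split($0,a,/[<>]/); s+=a[3] } /<\/Main>/ { f=0 } END { print s }')
echo "The result is $result"
```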
