Loop pcregrep and get one match per index - bash

I am trying to loop over all pcregrep matches, but when a line contains two matches I get only the second one. I would like one match per loop index. This is what I have now, but it returns only the second match:
pcregrep -o1 -r --exclude-dir="_copy" "^.*LLL:EXT:(.*)['\"].*$" Resources/Private/Templates/ContentElements
Example file content:
="panels"
label="LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.sheets.panels">
<flux:field.select name="color"
label="LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color"
items="{ivs:bootstrap.colors()}"
/>
<flux:field.select name="color"
label="LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color"
items="{
0:{
0:'LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.key', 1:'LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.value'
}
}"
/>
<flux:grid>
<flux:grid.row>
<flux:grid.column colPos="0"
name="content"
label="LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:flux.content"/>
</flux:grid.row>
</flux:grid>
</flux:form.sheet>
Result:
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.sheets.panels
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color
xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.value
xxx/Resources/Private/Language/locallang.xlf:flux.content
Expected Result:
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.sheets.panels
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color
xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color
xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.key
xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.value
xxx/Resources/Private/Language/locallang.xlf:flux.content
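A likely reason only the last match on a line appears: the pattern is anchored with ^.* and .*$, so it can match each line only once, and the greedy .* in front pushes the capture to the last LLL:EXT: on that line. Below is a minimal sketch of one way around this, assuming a grep that supports -o and --exclude-dir (plain grep -E is swapped in for pcregrep here; the temporary directory, the file name demo.html, and the function name list_labels are hypothetical). The same unanchored pattern should also work with pcregrep -o1, though that is not tested here.

```shell
# Hypothetical sample file with one and two references per line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.html" <<'EOF'
label="LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:flux.m-alert.color"
0:'LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.key', 1:'LLL:EXT:xxx/Resources/Private/Language/locallang.xlf:bootstrap.color.primary.value'
EOF

# Without the ^.* and .*$ anchors, grep -o prints every match on its own line.
# -h drops the file-name prefix; sed strips the literal LLL:EXT: prefix.
list_labels() {
  grep -rhoE --exclude-dir="_copy" "LLL:EXT:[^'\"]*" "$1" | sed 's/^LLL:EXT://'
}

# One match per loop iteration.
i=0
list_labels "$tmpdir" | while IFS= read -r label; do
  printf '%d: %s\n' "$i" "$label"
  i=$((i + 1))
done
```

Because the loop reads from a pipe it runs in a subshell, so i is not visible afterwards. That is fine for printing, but would need a different construct if the matches must be kept in a bash array.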

Related

Awk to get the attribute value from XML file

I want to get the value of the code attribute from each c tag in the XML below.
random.xml
<a>
<b>
<c id="123" code="abc" date="12-12-2022"/>
<c id="123" code="efg" date="12-12-2022"/>
<c id="123" date="12-12-2022"/>
</b>
</a>
Currently the logic is:
cat random.xml | egrep "<c.*/>" | awk -F1 ' /code=/ {f=NR} f&&NR-1==f' RS='"'
How does the above logic work to get the values of code from tag c?
It gives the expected output:
abc
efg
First, observe that
cat random.xml | egrep "<c.*/>" | awk -F1 ' /code=/ {f=NR} f&&NR-1==f' RS='"'
is of dubious quality, because:
egrep does not need standard input, since it can read the file itself, so this is a useless use of cat
the pattern is simple enough to work equally well in plain grep, so there is no need for extended grep
1 is set as the field separator in awk, but the code makes no use of fields
After fixing these issues, the code looks like this:
grep "<c.*/>" random.xml | awk ' /code=/ {f=NR} f&&NR-1==f' RS='"'
How it works: grep selects the lines that contain <c followed by zero or more characters followed by />. awk is then told that records are separated by double quotes ("). When a record contains code=, the variable f is set to that record's number. A record is printed when f is non-zero and equal to the current record number minus one, that is, when it directly follows a record containing code=.
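To make the record splitting visible, here is a small sketch that runs the awk program on a single line from random.xml and also prints every record with its number. The variable line and the function names show_records and code_value are illustrative only.

```shell
# One line from random.xml; awk will split it into records at every double quote.
line='<c id="123" code="abc" date="12-12-2022"/>'

# Print each record with its number to see what the program operates on.
show_records() {
  printf '%s' "$line" | awk -v RS='"' '{ printf "%d: [%s]\n", NR, $0 }'
}

# The program from the pipeline: remember the record that contains code=,
# then print the record that immediately follows it.
code_value() {
  printf '%s' "$line" | awk '/code=/ {f=NR} f&&NR-1==f' RS='"'
}

show_records
code_value   # prints abc
```

Record 3 is the text ` code=` and record 4 is `abc`, which is why printing the record after the match yields the attribute value.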
Note that GNU AWK is poorly suited to working with XML, and applying regular expressions to XML is a poor idea, because XML is not a regular (Chomsky Type 3) language.
If possible, use proper tools for XML data. For example, hxselect can be used as follows. Let the content of file.xml be
<a>
<b>
<c id="123" code="abc" date="12-12-2022"/>
<c id="123" code="efg" date="12-12-2022"/>
<c id="123" date="12-12-2022"/>
</b>
</a>
then
hxselect -c -s '\n' 'c[code]::attr(code)' < file.xml
gives output
abc
efg
Explanation: -c outputs just the value rather than name and value; -s '\n' separates the results with a newline, so each value is on its own line; c[code] is a CSS3 selector meaning any c tag that has a code attribute; ::attr(code) is an hxselect feature meaning get the attribute named code. This solution is more robust than the cat-egrep-awk pipeline, because it is immune to, for example, different whitespace usage in the file (whitespace outside tags is optional in XML).
This might be an awk question but parsing XML should be done with XML tools.
Here's an example with Xidel (available for several operating systems) and a standard XPath expression:
xidel --xpath '//c[@code]/@code' random.xml
Note: //c[@code] selects the c nodes that have a code attribute, and .../@code outputs the value of the code attribute.
Output
abc
efg
If your input always looks like the sample XML, you can make the text up to and including code=" a field separator (together with the closing quote and space), and < the record separator, so that the value is simply the second field whenever the first field is the tag name c:
awk -F' .*code="|" ' -v RS='<' '$1=="c"{print $2}' random.xml
Demo: https://awk.js.org/?snippet=Lz6yx7

How to use sed to extract the specific substring?

<div class="panel-body" id="current-conditions-body">
<!-- Graphic and temperatures -->
<div id="current_conditions-summary" class="pull-left" >
<img src="newimages/large/sct.png" alt="" class="pull-left" />
<p class="myforecast-current">Partly Cloudy</p>
<p class="myforecast-current-lrg">64&deg;F</p>
<p class="myforecast-current-sm">18&deg;C</p>
I am trying to extract the "64" in line 6. I was thinking of using awk '/<p class="myforecast-current-lrg">/{print}', but this only gave me the full line. I think I need to use sed, but I don't know how.
Assumptions:
input is nicely formatted as per the sample provided by OP so we can use some 'simple' pattern matching
Modifying OP's current awk code:
# use split() function to break line using dual delimiters ">" and "&"; print 2nd array entry
awk '/<p class="myforecast-current-lrg">/{ n=split($0,arr,"[>&]");print arr[2]}'
# define dual input field delimiter as ">" and "&"; print 2nd field in line that matches search string
awk -F'[>&]' ' /<p class="myforecast-current-lrg">/{print $2}'
Both of these generate:
64
One sed idea:
sed -En 's/.*<p class="myforecast-current-lrg">([^&]+)&deg.*/\1/p'
This generates:
64

grep two strings but only if the 2nd string is in the next line

I can grep two strings 'ion' and 'spin' from a file but would also like to either:
grep only if both strings exist
or only grep if the 2nd string is in the next line
grep -w 'ion\|spin' filename
I want to grep from a file which contains random strings like this:
<set comment="ion 3">
<set comment="spin 1">
but only if the word "spin" occurs in the next line.
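A minimal sketch of the second variant (print the pair only when spin is on the line directly after ion), using awk rather than grep because grep looks at one line at a time. The function name pairs and the sample file are hypothetical. Note that this matches ion and spin as substrings, unlike grep -w, so a word such as "option" would also count.

```shell
# Hypothetical sample: only the first two lines form an ion/spin pair.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<set comment="ion 3">
<set comment="spin 1">
<set comment="ion 4">
<set comment="other">
<set comment="spin 2">
EOF

# Keep the previous line in prev; when the current line contains spin and the
# previous one contained ion, print both.
pairs() {
  awk 'prev ~ /ion/ && /spin/ { print prev; print } { prev = $0 }' "$@"
}

pairs "$tmpfile"
```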

SED AWK to strip data from log file

Hi, I have the following entries in a log file.
I need to produce a list of the names in the name field whenever Denied appears on the line above. So I need to get something like:
Sally
Matt
Linda
Can you help me with this and I would appreciate if you could explain the command so I can use it later on for other logs.
<!-- user 1 -- >
<ABC 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="sally" />
<userbody>
</Status>
<!-- user 2 -- >
<ABD 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="Matt" />
<userbody>
</Status>
<!-- user 3 -- >
<ABD 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="Linda" />
<userbody>
</Status>
This GNU sed command could work:
sed -n -r '/Denied:/{N; s/^.*name="([^"]*)".*$/\1/; p}' file
-n suppresses automatic printing of lines
-r enables extended regular expressions, used here so the grouping () characters need no escaping
N reads the next line and appends it to the pattern space
s/input/output/ is substitution
^ is the start of the pattern space, so ^.*name=" matches everything up to and including name="
$ is the end of the pattern space
[^"] is any character that is not " (set negation)
\1 keeps only the captured group, i.e. ([^"]*)
p prints the result (only when the Denied condition is fulfilled, on the two lines processed together)
Output:
sally
Matt
Linda
Try this:
sed -rn '/Denied/{n;s#(.+)(name="(\w+))"(.+)#\3#p}' < sample.txt
/Denied/ - search for the keyword
{n; - if found then read next line
s#(.+)(name="(\w+))"(.+)#\3#p - lookup regex groups and print out only the third one, which is equal to the name within quotes in your data sample.
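The two answers differ in using N versus n. A small sketch of that difference on a shortened, hypothetical two-line sample (the variable log and the function names with_N and with_n are illustrative):

```shell
# Two-line sample modelled on the log above (contents shortened).
log='<ABC text="*Denied: test" >
<key flags="tdst" name="sally" />'

# N appends the next line to the pattern space, so both lines are printed
# together, joined by an embedded newline.
with_N() { printf '%s\n' "$log" | sed -n '/Denied/{N;p;}'; }

# n replaces the pattern space with the next line, so only the second line
# is printed.
with_n() { printf '%s\n' "$log" | sed -n '/Denied/{n;p;}'; }

with_N
with_n
```

This is why the first answer needs ^.*name=" to swallow the Denied line as well, while the second answer's substitution only ever sees the key line.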

Sed backreference not working properly

Test Data:
<img src=\"images/docs/mydash_grooms.png\" alt=\"\" />
Sed:
sed 's/<img\ssrc=\\"images\/docs\/\([[:graph:]]\)/<a class=\\"popup-image\\" href=\\"images\/docs\/\1\\"><img src=\\"images\/docs\/tn.\1/g' test.txt
Output from Sed:
<a class=\"popup-image\" href=\"images/docs/m\"><img src=\"images/docs/tn.mydash_grooms.png\" alt=\"\" />
Why is my backreference not working properly both times used?
Trying to accomplish:
Changing:
<img src=\"images/docs/mydash_grooms.png\" alt=\"\" />
to
<a class=\"popup-image\" href=\"images/docs/mydash_grooms.png\"><img src=\"images/docs/tn.mydash_grooms.png\" alt=\"\" />
You have to escape each \ so that it becomes "\\". You would also have to escape every /, which makes the string very complex. I suggest changing the sed delimiter from / to another character to avoid that. For example, using #:
sed 's#<img src=\\"images/docs/\(.*\)\\" alt=\\"\\" />#<a class=\\"popup-image\\" href=\\"images/docs/\1\\"><img src=\\"images/docs/tn.\1\\" alt=\\"\\" />#g' test.txt
Furthermore, please replace the [[:graph:]]; it was not working for me.
This might work for you (GNU sed):
sed -r 'h;s|img src(.*) alt.*|a class=\\"popup-image\\" href\1>|;G;s/\n//;s|(.*/)([^>])|\1tn.\2|' file
Save the line in the hold space then alter the line to replicate the first attribute. Append the original line and insert the tn. into the file name.
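As to why the original backreference held only m: [[:graph:]] without a quantifier matches exactly one character, so \1 captured a single letter and the rest of the file name was simply left in place after the match. A minimal sketch, keeping the question's structure but capturing everything up to the next backslash instead (the variable line and the function name wrap_image are hypothetical, and # is used as the delimiter as suggested above):

```shell
# Test data from the question, with literal backslashes before the quotes.
line='<img src=\"images/docs/mydash_grooms.png\" alt=\"\" />'

# \([^\\]*\) captures the whole file name: every character up to the next
# backslash. In the pattern and the replacement, \\ stands for one literal
# backslash.
wrap_image() {
  printf '%s\n' "$line" |
    sed 's#<img src=\\"images/docs/\([^\\]*\)#<a class=\\"popup-image\\" href=\\"images/docs/\1\\"><img src=\\"images/docs/tn.\1#'
}

wrap_image
```

The text after the file name (\" alt=\"\" />) is not part of the match, so it is carried over unchanged.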
