How to get XML uncommented section using sed/awk - bash

I have an XML file on my Linux box and I want to read only the lines that are not commented out.
Example:
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is an example
don't read it while working -->
<ccb>
<ccc>
<aaa>true</aaa>
<bbb>name_1</bbb>
<Port>1534</Port>
<datPort>1532</datPort>
<!--
<e214>
<ImsiPrefixLen>5</ImsiPrefixLen>
<LocalPrefix>97252</LocalPrefix>
</e214>
-->
</ccc>
</ccb>
Output file:
<?xml version="1.0" encoding="UTF-8"?>
<ccb>
<ccc>
<aaa>true</aaa>
<bbb>name_1</bbb>
<Port>1534</Port>
<datPort>1532</datPort>
</ccc>
</ccb>

Note that in XML a comment starts with <!-- and ends with -->, and it cannot contain -- in between.
perl -pe 'BEGIN{undef$/}s/<!--.*?-->//gs' <<END
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is an example
don't read it while working -->
<ccb>
<ccc>
<aaa>true</aaa>
<bbb>name_1</bbb>
<Port>1534</Port>
<datPort>1532</datPort>
<!--
<e214>
<ImsiPrefixLen>5</ImsiPrefixLen>
<LocalPrefix>97252</LocalPrefix>
</e214>
-->
</ccc>
</ccb>
END
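Explanation (see perl -h for the switches):
-p : assume a loop like -n but also print each line, like sed
BEGIN{undef$/} : block executed once at the beginning; it unsets the input record separator ($/) so the whole input is read as one string, which the multiline match of <!-- ... --> needs
s/// : substitution operator (the / delimiter can be replaced by another character)
<!--.*?--> : .* matches any string; ? is the lazy modifier, so the shortest match is taken
g : modifier so that every match is replaced, not only the first
s : modifier so that . also matches the newline character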

Related

Sed - Conditionally match and add additional string after the find

Let's say I have a line like this in a file "config.xml"
<widget android-packageName="com.myproject" android-versionCode="12334" ios-CFBundleIdentifier="com.myproject" ios-CFBundleVersion="12334" version="1.5.2" versionCode="1.5.2" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0">
I want to use a single sed command to change it into this, adding ".1" after the current version numbers:
<widget android-packageName="com.myproject" android-versionCode="12334" ios-CFBundleIdentifier="com.myproject" ios-CFBundleVersion="12334" version="1.5.2.1" versionCode="1.5.2.1" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0">
Assuming the version number could change, I would likely need to match it as the string between version=" and the closing " first, then add something after it. How should I achieve that?
Attempted code that was (wrongly) shown in the form of an answer:
sed -i '' -e 's/\" versionCode=\"/\.1\" versionCode=\"/g' config.xml
sed -i '' -e 's/\" xmlns=\"/\.1\" xmlns=\"/g' config.xml
You may use this sed to append .1 to the version number of any attribute whose name starts with version:
sed -i.bak -E 's/( version[^=]*="[.0-9]+)/\1.1/g' file
Output:
<widget android-packageName="com.myproject" android-versionCode="12334" ios-CFBundleIdentifier="com.myproject" ios-CFBundleVersion="12334" version="1.5.2.1" versionCode="1.5.2.1" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0">
Breakup:
(: Start capture group
 version: match a space followed by the text version
[^=]*: match 0 or more of any character that is not =
=: match a =
": match a "
[.0-9]+: match 1 or more characters that are digits or dots
): End capture group
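To check the expression without touching config.xml, it can be run on a single line from stdin. This is a small sketch: the wrapper name bump_versions and the shortened sample line are only for illustration.

```shell
# Sketch: apply the substitution to stdin (or to files) without -i, so nothing is modified.
# bump_versions is a hypothetical wrapper name.
bump_versions() {
  sed -E 's/( version[^=]*="[.0-9]+)/\1.1/g' "$@"
}

line='<widget android-versionCode="12334" ios-CFBundleVersion="12334" version="1.5.2" versionCode="1.5.2">'
printf '%s\n' "$line" | bump_versions
# prints:
# <widget android-versionCode="12334" ios-CFBundleVersion="12334" version="1.5.2.1" versionCode="1.5.2.1">
```

Because the pattern requires a space directly before version, android-versionCode and ios-CFBundleVersion are left alone. The -i.bak form used in the answer is accepted by both GNU and BSD sed, whereas -i '' is BSD-only.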

Unable to associate or grouping each set of xml attributes in bash script

I have XML in the following format, which has multiple occurrences of the same elements (name, code and format).
<?xml version="1.0" encoding="UTF-8"?>
<config>
<input>
<pattern>
<name>ABC</name>
<code>1234</code>
<format>txt</format>
</pattern>
</input>
<input>
<pattern>
<name>XYZ</name>
<code>7799</code>
<format>csv</format>
</pattern>
</input>
</config>
I want to parse each of these patterns, construct a string like ABC-1234-txt, XYZ-7799-csv, etc., and add each string to an array. The idea is to group each pattern by constructing a string that will be used later.
I have tried the command below, but it does not maintain the grouping:
awk -F'</?name>|</?code>|</?format>' ' { print $2 } ' sample.xml
It simply prints the values of these elements in the XML. I am not an expert in bash, so can anyone suggest how to group each pattern into a string in the format mentioned above?
With bash and xmlstarlet:
mapfile -t array < <(
xmlstarlet select \
--text --template --match '//config/input/pattern' \
--value-of "concat(name,'-',code,'-',format)" -n file.xml
)
declare -p array
Output:
declare -a array=([0]="ABC-1234-txt" [1]="XYZ-7799-csv")
See: help mapfile and xmlstarlet select
With XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="pattern">
<xsl:value-of select="concat(name,'-',code,'-',format,'&#10;')"/>
</xsl:template>
</xsl:stylesheet>
Apply the transform via xsltproc:
$ xsltproc example.xslt sample.xml
ABC-1234-txt
XYZ-7799-csv
Populate array with xslt output:
$ declare -a my_array
$ my_array=($(xsltproc example.xslt sample.xml))
$ echo "${my_array[@]}"
ABC-1234-txt XYZ-7799-csv
$ echo "${my_array[1]}"
XYZ-7799-csv
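Where neither xmlstarlet nor xsltproc is available, the grouping can be approximated with awk alone. This is only a sketch, not a real XML parser: it assumes every element sits on its own line, as in the sample, and the function name group_patterns is made up for illustration.

```shell
# Sketch: collect name, code and format, and emit one joined string per </pattern>.
# Assumes one element per line; group_patterns is a hypothetical helper name.
group_patterns() {
  awk -F'[<>]' '
    # With < and > as separators, $2 is the tag name and $3 its text content.
    $2 == "name"     { name = $3 }
    $2 == "code"     { code = $3 }
    $2 == "format"   { fmt  = $3 }
    # The closing tag ends the group: print it and reset for the next one.
    $2 == "/pattern" { print name "-" code "-" fmt; name = code = fmt = "" }
  ' "$@"
}

# Demo on one inline pattern.
group_patterns <<'END'
<pattern>
<name>ABC</name>
<code>1234</code>
<format>txt</format>
</pattern>
END
```

In bash 4 or later, mapfile -t array < <(group_patterns sample.xml) would load the lines into an array without relying on word splitting.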

Search and delete matches of patterns array

I made an array of the names of the files that match a pattern:
lista=($(grep -El "<LastVisitedURL>.+</LastVisitedURL>.*<FavoriteTopic>0</FavoriteTopic>" *))
Now I would like to delete from the file index.xml every TopicKey element whose FileName is one of the filenames in the array.
for e in ${lista[*]}
do
sed '/\<TopicKey FileName=\"$e\"\>.*\<\/TopicKey\>/d' index.xml
done
The complete script is:
#! /bin/bash
#search xml files watched and no favorites.
lista=($(grep -El "<LastVisitedURL>.+</LastVisitedURL>.*<FavoriteTopic>0</FavoriteTopic>" *))
#declare -p lista
for e in ${lista[*]}
do
sed '/<TopicKey FileName=\"$e\">.*<\/TopicKey>/d' index.xml
done
Besides the regex pattern not working, using sed's -i option to edit index.xml in place would rewrite the index file once for every filename in the array, which is bad.
Any suggestions?
Here is an example using xmlstarlet in a shell:
% cat file.xml
<?xml version="1.0"?>
<root>
<foobar>aaa</foobar>
<LastVisitedURL>http://foo.bar/?a=1</LastVisitedURL>
<LastVisitedURL>http://foo.bar/?a=2</LastVisitedURL>
<LastVisitedURL>http://foo.bar/?a=3</LastVisitedURL>
</root>
Then, the command line:
% xmlstarlet edit --delete '//LastVisitedURL' file.xml
<?xml version="1.0"?>
<root>
<foobar>aaa</foobar>
</root>
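For the concern in the question about rewriting index.xml once per filename, a single pass is possible by handing all filenames to one awk process. This is a sketch under assumptions: the structure of index.xml is not shown in the question, so it assumes each TopicKey element sits on one line and opens exactly as in the sed attempt. The function name delete_topic_keys is made up for illustration.

```shell
# Sketch: print the index file without the TopicKey lines of the given filenames.
# usage: delete_topic_keys INDEX_FILE FILENAME...   (hypothetical helper)
delete_topic_keys() {
  index_file=$1
  shift
  printf '%s\n' "$@" | awk '
    # First input (stdin): one filename per line; remember the opening tag to drop.
    NR == FNR { skip["<TopicKey FileName=\"" $0 "\">"]; next }
    # Second input (the index): skip lines containing any remembered tag.
    {
      for (tag in skip) if (index($0, tag)) next
      print
    }
  ' - "$index_file"
}
```

With the array from the question, delete_topic_keys index.xml "${lista[@]}" > index.new reads the index only once. index() compares fixed strings, so dots in the filenames are not treated as regex wildcards.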

unix bash remove xml tag if value is X

I have an XML file which looks like this:
<b>
hTTp://test.com:7001
</b>
<b>
hTTp://anothertest:7001
</b>
I'd like to remove the whole tag if the value is test.
I tried to do something like:
sed -i -e "/<b> /, ${SERVER_ADDRESS}/ <\/b>/d"
After running the command, the XML should look like this:
<b>
hTTp://anothertest:7001
</b>
Use xmlstarlet.
$ cat in.txt
<root>
<b>foo</b>
<b>bar</b>
</root>
$ xmlstarlet ed -d '//b[contains(text(), "foo")]' < in.txt
<?xml version="1.0"?>
<root>
<b>bar</b>
</root>

Iterate through XML with xmlstarlet

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<test-report>
<testsuite>
<test name="RegisterConnection1Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
<test name="RegisterConnection2Tests">
<testcase name="testRregisterConnection001"></testcase>
<testcase name="testRegisterConnection002"></testcase>
</test>
</testsuite>
</test-report>
And I want the output:
RegisterConnection1Tests,testRregisterConnection001
RegisterConnection1Tests,testRegisterConnection002
RegisterConnection2Tests,testRregisterConnection001
RegisterConnection2Tests,testRegisterConnection002
I'm confused as to how to show the children. I expected
xmlstarlet sel -t -m 'test-report/testsuite/test' -v '@name' -v '//testcase/@name' -n $1
to work, but it only outputs:
RegisterConnection1TeststestRregisterConnection001
RegisterConnection2TeststestRregisterConnection001
To add the missing comma, you can add another -v "','".
In your second column you are selecting with an absolute XPath expression from the root element rather than from the element matched by the template; the double slashes are wrong. Since you want one line per testcase, I would iterate over the testcase elements and then add the name attribute of the parent element, like this:
xmlstarlet sel -t -m 'test-report/testsuite/test/testcase' -v '../@name' -v "','" -v '@name' -n "$1"
