Shellscript Read XML attribute value - bash

We want to read XML attributes from an XML file. Example of file content is as below:
<properties>
<property name="abc" value="15"/>
<property name="xyz" value="26"/>
</properties>
We want to read value (i.e. 15) for property "abc" using shell script.
Please suggest shell commands to achieve this.

You can use a proper XML parser like xmllint. If your version supports xpath, it will be very easy to grab specific values. If it doesn't support xpath, then you can use --shell option like so:
$ echo 'cat //properties/property[@name="abc"]/@value' | xmllint --shell myxml
/ > -------
value="15"
/ >
You can then use awk or sed to format and extract desired field from output.
$ echo 'cat //properties/property[@name="abc"]/@value' | xmllint --shell myxml | awk -F'[="]' '!/>/{print $(NF-1)}'
15
You can use command substitution to capture the output in a variable by saying:
$ myvar=$(echo 'cat //properties/property[@name="abc"]/@value' | xmllint --shell myxml | awk -F'[="]' '!/>/{print $(NF-1)}')
$ echo "$myvar"
15
Using anything other than an XML parser is error-prone and will break easily.
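If your xmllint does support --xpath, the value can be read without the awk post-processing. The following is a minimal sketch, assuming xmllint from libxml2 is installed; the helper name read_property and the temporary sample file are illustrative and not part of the answer above.

```shell
#!/usr/bin/env bash
# read_property NAME FILE: print the value attribute of the named property.
# Hypothetical helper; assumes an xmllint build that supports --xpath and a
# NAME that contains no double quotes.
read_property() {
    # string() yields the bare attribute value instead of value="15".
    xmllint --xpath "string(//properties/property[@name=\"$1\"]/@value)" "$2"
}

# Sample file from the question, written to a temporary location.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<properties>
<property name="abc" value="15"/>
<property name="xyz" value="26"/>
</properties>
EOF

if command -v xmllint >/dev/null 2>&1; then
    myvar=$(read_property abc "$xml_file")
    echo "$myvar"
else
    echo "xmllint not found; skipping" >&2
fi
rm -f "$xml_file"
```

Wrapping the attribute path in string() is a design choice: selecting the attribute node directly prints it together with its name, which would need the same trimming as the --shell output.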

Quick and dirty:
sed -n '/<properties>/,\|</properties>| {
s/ *<property name="xyz" value="\([^"]*\)"\/>/\1/p
}'
No XML validation; this is based on your sample, so it assumes the same structure (one property name per line, ...).
POSIX version (use --posix with GNU sed).

sed -n '/<property name="abc"/s/.*value="\(.*\)"[^\n]*/\1/p' file
This captures the value in a group, then [^\n]* consumes the rest of the line so that only the captured value is left; -n together with the p flag prints only the lines where the substitution succeeded. It expects the value to be double-quoted, as in your example data.
E.g.
<properties>
<property name="abc" value="15"/>
<property name="xyz" value="26"/>
</properties>
Output:
15
(Prior to edit: sed '/<property name="abc"/s/.*value="\(.*\)"[^\n]*/\1/' file)
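One caveat with the command above: the greedy .* inside the capture runs to the last double quote on the line, so an extra attribute after value would end up in the output. A possible variant is sketched below, using [^"]* to stop at the first closing quote. The helper name get_value and the extra type attribute in the sample are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# get_value NAME FILE: print the value attribute of <property name="NAME" ...>.
# Hypothetical helper; NAME must not contain "/" or regex metacharacters.
get_value() {
    # [^"]* stops at the first closing quote, so later attributes are ignored.
    sed -n "/<property name=\"$1\"/s/.*value=\"\([^\"]*\)\".*/\1/p" "$2"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
<properties>
<property name="abc" value="15"/>
<property name="xyz" value="26" type="int"/>
</properties>
EOF

get_value abc "$sample"
get_value xyz "$sample"
rm -f "$sample"
```

This is still line-based text matching, so it shares the limits of the other sed answers: one property per line and double-quoted values.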

Related

Extract a string in linux shell script

I have a string like this:
variable='<partyRoleId>12345</partyRoleId>'
What I want is to extract the value, so that the output is 12345.
Note that the tag can take either form:
<partyRoleId> or <ns1:partyRoleId>
Any idea how to get the tag value using only grep or sed?
Use an XML parser to extract the value:
echo "$variable" | xmllint -xpath '*/text()' -
You probably should use it for the whole XML document instead of extracting a single line from it into a variable, anyway.
To use only grep, you need a regexp that finds the first closing bracket and keeps the digits that follow it:
echo '<partyRoleId>12345</partyRoleId>'|grep -Po ">\K\d*"
-P enables PCRE,
-o tells grep to show only the matched part,
and the special \K tells grep to discard everything matched before it.
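Since grep -P is not available in every grep build, a sed-only variant may be useful. This is a rough sketch, assuming the element sits on one line; the function name extract_party_role_id is made up for the example. It accepts the tag with or without a namespace prefix.

```shell
#!/usr/bin/env bash
# extract_party_role_id: read XML fragments on stdin and print the text inside
# <partyRoleId> or <prefix:partyRoleId>. Hypothetical helper name.
extract_party_role_id() {
    # \(...\)\{0,1\} makes the "prefix:" part optional (POSIX BRE syntax);
    # the second group captures everything up to the next "<".
    sed -n 's/.*<\([A-Za-z0-9_]*:\)\{0,1\}partyRoleId>\([^<]*\)<.*/\2/p'
}

variable='<partyRoleId>12345</partyRoleId>'
printf '%s\n' "$variable" | extract_party_role_id

variable='<ns1:partyRoleId>12345</ns1:partyRoleId>'
printf '%s\n' "$variable" | extract_party_role_id
```

Unlike the grep pattern, this is tied to the element name, so it does not pick up digits from unrelated tags on the same line.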

Bash change number to another value on specific line

I'm new to bash scripting, and I'm looking for a way to change a number to another value on a specific line.
I have a file named foo.config, and this file contains about 100 lines of configuration.
For example, I have
<UpdateInterval>2</UpdateInterval>
and I need to find this line in foo.config and replace the number (it can be anything from 0 to 10; in my example it is 2) with 0, always.
Like this:
<UpdateInterval>0</UpdateInterval>
How can I do it with sed? Please suggest.
Part of the file:
<InstallUrl />
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>2</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName />
<SupportUrl />
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription />
<LoadBehavior>3</LoadBehavior>
sed and similar tools (grep, awk) are never good tools for parsing XML/HTML data. Use a proper XML/HTML parser, like xmlstarlet:
xmlstarlet ed -L -O -u "//UpdateInterval" -v 0 foo.config
ed - edit mode
-L - edit the file inplace
-O - omit xml declaration
-u - update action
"//UpdateInterval" - xpath expression
-v 0 - the new value of the element to be updated
The final (exemplary) foo.config contents:
<root>
<InstallUrl/>
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>0</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName/>
<SupportUrl/>
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription/>
<LoadBehavior>3</LoadBehavior>
</root>
The <root> tag was added for demonstration purposes; your XML/HTML structure should have its own root (top-level) element.
In a very simple way, you may try:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
This will search for <UpdateInterval> at the beginning of a line (note the ^) and then a number ([0-9] stands for a digit and + for a repetition of one or more). This bit will be replaced with <UpdateInterval>0. The / characters separate what you search and what will replace it. The s command is a search and replace.
It will take the file foo.config as input and you will get the output on standard output. If you want your output on the same file, you may do:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config >foo.temp
mv foo.temp foo.config
Or more simply:
sed -i -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
Note that this is not a good way to do the substitution if your config file contains general XML. It will only work in the simplest of cases (but will do for your example.) If your XML bit may be in the middle of a line, remove the ^ character. The search and replace expression assumes that there is no whitespace around the XML tags.
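As a rough sketch of the "remove the ^" variant described above, the function below matches the whole element, so indentation in front of the tag does not matter. The name set_update_interval is hypothetical, and it assumes the opening and closing tags are on the same line.

```shell
#!/usr/bin/env bash
# set_update_interval FILE: set the numeric <UpdateInterval> value to 0.
# Hypothetical helper; assumes opening and closing tag share one line.
set_update_interval() {
    local file=$1 tmp
    tmp=$(mktemp) || return 1
    # No ^ anchor, so leading whitespace is tolerated; the closing tag is
    # matched too, so only a purely numeric value is replaced.
    if sed -E 's|<UpdateInterval>[0-9]+</UpdateInterval>|<UpdateInterval>0</UpdateInterval>|' \
        "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}

# Demonstration on a throwaway copy of two lines from the question.
demo=$(mktemp)
printf '%s\n' '    <UpdateInterval>2</UpdateInterval>' \
              '    <UpdateIntervalUnits>hours</UpdateIntervalUnits>' > "$demo"
set_update_interval "$demo"
cat "$demo"
rm -f "$demo"
```

Writing to a temporary file first is the same idea as the foo.temp step above; it avoids relying on sed -i, whose syntax differs between GNU and BSD sed.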
A solution using an XML parsing tool:
{ echo '<root>'; cat foo.config; echo '</root>'; } |
xmlstarlet ed -O -P -u //UpdateInterval -v 0 |
sed '1d;$d' |
sponge foo.config
The first line is to make the config file into a proper XML file.
The second line updates the value.
The third line removes the root tags.
The last line rewrites the config file. sponge comes from the moreutils package, which needs to be installed.

Replacing a string on 2 lines

I would like to edit a file on the command line.
I need to modify a set of 2 lines in this file.
Here are the 2 lines:
<parameter name="mail.smtp.from">synapse.demo.0@gmail.com</parameter>
</transportSender>-->
Here is the result I would like to have:
<parameter name="mail.smtp.from">synapse.demo.0@gmail.com</parameter>
</transportSender>
You can use a command-line editor like vi, which comes with almost all Unix systems. Enter insert mode to edit the file. You can refer to the following link.
Basic vi commands
I suppose it's a sed-related question more than a WSO2IOT question, isn't it?
You already removed the start of the comment tag, and I suppose the line with the comment's closing tag is not unique…
Is the line number constant?
If not, you could look up the line number:
sed -i "$(grep -n -A1 'synapse.demo.0@gmail.com' file.xml | tail -n1 | cut -d- -f1)s/-->//" file.xml
Perl
perl -lpe 's|</transportSender>-->|</transportSender>|' your-file    # replace once per line
perl -lpe 's|</transportSender>-->|</transportSender>|g' your-file   # replace every occurrence
To save in-place
perl -i -lpe 's|</transportSender>-->|</transportSender>|' your-file   # find, edit, and save in one step
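If the closing tag is not unique in the file, the edit can be tied to the line before it instead of to a line number. Below is a minimal sed sketch under that assumption; the function name uncomment_sender is made up, and it prints the result rather than editing in place.

```shell
#!/usr/bin/env bash
# uncomment_sender FILE: print FILE with the trailing --> removed from the
# line that follows the mail.smtp.from parameter. Hypothetical helper name.
uncomment_sender() {
    # n prints the matched line and loads the next one; s then edits that line.
    sed '/name="mail\.smtp\.from"/{n;s/-->$//;}' "$1"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
<parameter name="mail.smtp.from">someone@example.com</parameter>
</transportSender>-->
<other/>-->
EOF
uncomment_sender "$demo"
rm -f "$demo"
```

Only the line directly after the parameter is touched, so other lines ending in --> are left alone.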

Bash script to find placeholders in file

I'm trying to write a bash script to find all placeholders in a file.
For example
I have following file:
<property name="sdfasdf" value="$ABC.D"></property>
<property name="sadf" value="$DFG.F.G"></property>
<property name="sadf" value="hello"></property>
<property name="ddd" value="$HJK"></property>
and I would like to get these:
$ABC.D
$DFG.F.G
$HJK
I tried many options but without success.
Could someone help me?
You can grep for the values that contain placeholders, then grep again to get the symbol names.
example
$ grep -o 'value="$.\+"' input.txt | grep -oE '\$(\w|\.)+'
$ABC.D
$DFG.F.G
$HJK
note: assumes there is only one placeholder value per line
details
The -o flag prints only the part of the line that matches the pattern.
The -E flag enables extended regex, used here to match either a word character or a dot.
You can use sed:
sed -n 's/.*value="\($[\.a-zA-Z]*\)".*/\1/p' ./input.txt
where input.txt file contains your text.
Here we use a capture group to print only the actual match (not the entire matching line).
sed -nr 's%(\$[A-Za-z][A-Za-z.]*)%\n\1\n%gp' test | grep '^\$[A-Za-z][A-Za-z.]*'
This is a more general method to find placeholders, independent of whether there is only one placeholder per line and whether there is a "value=" context near it. All placeholders are printed to standard output.
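The same placeholder pattern can also be handed to grep -o directly, which prints each match on its own line without the sed step. A small sketch, assuming placeholders consist of a $ followed by letters and dots as in the question; list_placeholders is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_placeholders FILE: print every $NAME or $NAME.PART placeholder found
# in FILE, one per line. Hypothetical helper name.
list_placeholders() {
    # -o prints only the matched text; -E enables extended regex syntax.
    grep -oE '\$[A-Za-z][A-Za-z.]*' "$1"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
<property name="sdfasdf" value="$ABC.D"></property>
<property name="sadf" value="$DFG.F.G"></property>
<property name="sadf" value="hello"></property>
<property name="ddd" value="$HJK"></property>
EOF
list_placeholders "$demo"
rm -f "$demo"
```

Because grep -o reports every match, several placeholders on one line are each printed separately.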

Extract text matching pattern X after having searched for pattern Y (bash)

In a bash script, how can I extract text from an XML file that begins with <abc> and ends with </abc>, and that comes after a pattern I need to look for?
Example of the input file:
<111>
<abc>
text
</abc>
<def>
text
</def>
</111>
<222>
<abc>
text to extract
</abc>
</222>
My goal would be to display "text to extract" indicating I'm looking for the pattern <222>.
Your XML example doesn't have a root element?
<111> and <222> are not valid XML tag names.
If you are not sure that your XML format is fixed, don't use regex to parse it;
XPath would be the way to go.
Assuming the 111 and 222 tags are named t111 and t222, and that you have a root element:
xmllint --xpath "//t222/abc/text()" your.xml
This is really ugly and you really should use @Kent's answer, but if you really, really insist:
grep -A 999 "<222>" file.xml | grep -A1 "<abc>" | tail -n 1
It takes up to 999 lines after finding your pattern <222>, and then, from that, it takes the single line following <abc> and from that it takes the last line.
Using GNU awk for multi-char RS and gensub():
$ awk -v RS='^$' '{print gensub(/.*<222>.*<abc>\n(.*)\n<\/abc>.*/,"\\1","")}' file
text to extract
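Without GNU awk, a similar result can be had with a small state machine in plain awk. This is only a sketch: extract_abc is a made-up name, and it assumes every tag sits alone on its line with no surrounding whitespace, as in the sample.

```shell
#!/usr/bin/env bash
# extract_abc SECTION FILE: print the lines between <abc> and </abc> that
# follow the line SECTION. Hypothetical helper; assumes each tag is alone
# on its line, exactly as written in the sample input.
extract_abc() {
    awk -v section="$1" '
        $0 == section               { in_section = 1; next }  # found e.g. <222>
        in_section && $0 == "<abc>" { in_abc = 1; next }      # entered the block
        in_abc && $0 == "</abc>"    { exit }                  # block finished
        in_abc                      { print }                 # text to extract
    ' "$2"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
<111>
<abc>
text
</abc>
</111>
<222>
<abc>
text to extract
</abc>
</222>
EOF
extract_abc '<222>' "$demo"
rm -f "$demo"
```

It stops at the first </abc> after the section marker, so later blocks are ignored.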

Resources