Batch to read partial xml file - shell

I'm trying to make a batch file to pull data out of a file and set it as a variable.
The tricky part is I need to read an XML file, and I only need the data between the quotes of the following line...
narrative="I only need this text here"
The text in that line can also contain spaces, brackets, slashes, dashes and colons.
SAMPLE XML FILE :
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<cadcall>
<call callnumber="123456" jurisdiction="abcd" department="dept 1" complaint="cost" priority="M" calltencode="" callername="Persons Name" phonenumber="Cell Number HERE" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Fairly long narrative here describing issue of dispatch, but sometimes these can be short." alarmtype="" ocanumber="0000000000" disposition1="TRAN" />
</cadcall>

The proper tool for this is xmllint, from libxml2. Please provide a more complete XML example and I will show you how to run an XPath query against it.
EDIT:
Here is a solution using XPath (with a little hack: contains):
xmllint --xpath "/cadcall/call/@narrative[contains(.,'.')]" file.xml

Without seeing the complete input, and going only by your example line, grep works for you:
kent$ echo 'narrative="I only need this text here"'|grep -Po '(?<=narrative=")[^"]*'
I only need this text here
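Since the question asks for the value in a variable, here is a minimal sketch of the same idea using sed instead of grep -P (which not every grep build supports). The temporary file, the trimmed sample and the variable name narrative are assumptions for illustration. It assumes the attribute sits on a single line, and it does not decode entities such as &quot;.

```shell
# Hypothetical sample file, trimmed from the XML in the question.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<cadcall>
<call callnumber="123456" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Fairly long narrative here." alarmtype="" />
</cadcall>
EOF

# Capture everything between the quotes that follow ' narrative='.
# -n plus the p flag prints only the line where the substitution succeeded.
narrative=$(sed -n 's/.* narrative="\([^"]*\)".*/\1/p' "$xml_file")

printf '%s\n' "$narrative"
rm -f "$xml_file"
```

The leading space in the pattern keeps it from matching a longer attribute name that merely ends in "narrative".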

Related

SED Ensure XML insert not inside comment

I am working with stock RHEL7/8 tools, and writing a script that will add a piece to a config file that is formatted as XML. I have run into a case where my sed statement can insert the added text inside a comment.
My current sed command gets the last existence of the tag <Program> and inserts the new tag after its closing tag </Program>.
How can I account for this possibly, but not always being inside a comment?
My script:
sed -i '0,/<Program id/s// <Program id=\"myProgram\"> <\/Program>' filepath
XML Example (displays the error inserting inside comment):
<Program id="myProgram"></Program>
<!--
<Program id="commentedOutProgram"></Program>
<Program id="newlyAddedProgram"><Program>
-->
EDIT:
This is happening at install time. I would like to add a way for some RHEL 7/8 built in tool to look in the XML file, make sure it's not in a comment, and add the new contents
Have a go with this. The usual caveats apply: It probably only works for exactly the sample you provided. Use a proper XML tool if you need a robust solution.
sed -e '/<!--/,/-->/b' \
-e '0,\%<Program id="[^"]*"></Program>%s%%<Program id="myProgram"> </Program>%' filepath
Your original script seemed to have several errors, so I couldn't copy it verbatim, but this should at least give you an idea of how to modify it: add a b to skip any lines between <!-- and -->.
The % separators are just to avoid having to backslash the slashes. sed lets you use any separator you like instead of a slash; in an address, you just have to backslash the first one.
The b command jumps to a label; if the label is not specified, it jumps to the end of the script, i.e. skips the substitution part and starts over with the next line. The address expression before b selects any comment region, i.e. any lines between a line matching <!-- and a line matching -->.
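To see the comment-skipping b in action, here is a small sketch that assumes GNU sed (as on RHEL) and a hypothetical sample file. It appends a new element after the uncommented Program line rather than replacing it, which is a variation on the command above, not the same script. It only handles comments whose <!-- and --> are on separate lines; a one-line comment would open a range that runs on to the next -->.

```shell
# Hypothetical sample: the first <Program> sits inside a comment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<!--
<Program id="commentedOutProgram"></Program>
-->
<Program id="oldProgram"></Program>
EOF

# Lines inside the comment range hit 'b' and skip the substitution.
# On the other lines, & is the matched text and \n (GNU sed) starts a new line.
result=$(sed -e '/<!--/,/-->/b' \
             -e 's%<Program id="[^"]*"></Program>%&\n<Program id="myProgram"></Program>%' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the uncommented line gains a sibling; the commented-out element is left untouched.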

how to replace character in html attribute value (shell / bash)?

Sorry for the stupid question, but I have been stuck all afternoon with this simple problem. So I have a sample text file containing:
<product productId="123456" description="good apple, very green" publicPriceTTC="5,07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12,47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5,07" brand-id="154" />
And I'd like to modify this file into:
<product productId="123456" description="good apple, very green" publicPriceTTC="5.07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12.47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5.07" brand-id="154" />
Basically, I need to replace the "," (comma) by a "." (point) in all values of "publicPriceTTC". The trick here is that other attributes might have commas in their values ("description" in this example). I guess sed or awk can do that but I was unable to achieve it.
Can someone help me? Thank you very much for any help.
If you search for a comma to replace with a point, you will be doing a very coarse search/replace. Try something more specific. With sed, assume your input file is called xml:
sed -E 's/(publicPriceTTC="[0-9]+),([0-9]+")/\1.\2/' xml
You probably know that sed has the command s/<what you search>/<replacement>. We use that.
The -E option triggers the use of extended regular expressions. With that, the s expression matches the whole attribute name + "=" + number within quotes, and uses parentheses to capture the parts that should be reused in the substitution. \1 stands for the first parenthesized block; \2 for the second.
You could of course make the search more robust to cope with whitespace between the attribute name and the equal sign and so on.
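As a sketch of that more tolerant version, the pattern below allows optional whitespace on either side of the equal sign. The sample line (with spaces around "=") is made up for illustration.

```shell
# Hypothetical input line with spaces around the equal sign.
sample='<product productId="123456" description="good apple, very green" publicPriceTTC = "5,07" brand-id="152" />'

# [[:space:]]* accepts zero or more blanks before and after '='.
fixed=$(printf '%s\n' "$sample" |
  sed -E 's/(publicPriceTTC[[:space:]]*=[[:space:]]*"[0-9]+),([0-9]+")/\1.\2/')

printf '%s\n' "$fixed"
```

The comma in the description is untouched because the pattern only matches a comma between digits inside the publicPriceTTC value.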
An awk solution to this might be:
awk '/<product/{for(i=1;i<=NF;i++){if($i~/^publicPriceTTC="/)sub(/,/,".",$i)}}1' file.xml
This steps through every whitespace-separated "field" on every <product> line, looking for "words" that begin with the attribute you're trying to modify. If one is found, the first comma in that field is replaced with a period (sub only replaces the first match, which is enough for a price).
A simpler awk solution to emulate what others are doing with sed would be nice, except that awk does not support parenthesized subexpressions (i.e. \1 in your replacement string). Gawk supports them in the gensub() function, so the following might suffice:
gawk '{print gensub(/(publicPriceTTC="[0-9]+),/,"\\1.","g")}' file.xml
But ... you are solving the wrong problem here. Tools like sed and awk, which process files based on regular expressions, are not XML parsers. Either Javier's sed solution or my awk solutions could garble things accidentally, or miss certain things that are in perfectly valid XML files. Regex cannot be used to parse XML safely.
I recommend that you look into using python or perl or ruby or php or some other language with native XML support.
For example, turning your input into actual XML like this:
<p>
<product productId="123456" description="good apple, very green" publicPriceTTC="5,07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12,47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5,07" brand-id="154" />
</p>
We could run a PHP one-liner:
php -r '$x=new SimpleXMLElement(file_get_contents("file.xml")); foreach($x->product as $p) { $p["publicPriceTTC"]=str_replace(",",".",$p["publicPriceTTC"]); } print $x->asXML();'
Or split out for easier reading (and commenting):
<?php
// Read an XML file into an object
$x=new SimpleXMLElement(file_get_contents("file.xml"));
// Step through the object, fixing attributes as we find them
foreach($x->product as $p) {
$p["publicPriceTTC"] = str_replace(",",".",$p["publicPriceTTC"]);
}
// Print the result
print $x->asXML();
This will work with GNU sed:
sed 's/\(publicPriceTTC="[0-9]*\),/\1./' fileName
Here using sub in awk is enough.
awk '{sub(/,/,".",$7)}1' file

replace multiple key value in one line with sed [duplicate]

Quick Summary: I need to create a Bash script to change the text within a node automatically every week. The script will match the node and replace the text inside them (if this is possible)? How would I do this?
Long Summary:
I host a Minecraft server which has shops, each of which has its own .xml file in the /ShowcaseStandalone/ffs-storage/ directory. Every Sunday my server restarts and runs several commands in the terminal to reset several things. One thing that I am trying to change is one of the shops. I want to change the text in the node <itemStack> and the text in the node <price>. I simply want to take text from a .txt file in a different folder and insert it into that node. The problem is that the text in the node will change every week. Is there any way to replace a specific line or text within two nodes using bash?
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<scs-shop usid="cac8480951254352116d5255e795006252d404d9" version="2" type="storage">
<enchantments type="string"/>
<owner type="string">Chadward27</owner>
<world type="string">Frisnuk</world>
<itemStack type="string">329:0</itemStack>
<activity type="string">BUY</activity>
<price type="double">55.0</price>
<locX type="double">487.5</locX>
<locY type="double">179.0</locY>
<locZ type="double">-1084.5</locZ>
<amount type="integer">0</amount>
<maxAmount type="integer">0</maxAmount>
<isUnlimited type="boolean">true</isUnlimited>
<nbt-storage usid="23dffac5fb2ea7cfdcf0740159e881026fde4fa4" version="2" type="storage"/>
</scs-shop>
Operating System: Linux Ubuntu 12.04
You can use xmlstarlet to edit an XML file in a shell like this:
xmlstarlet edit -L -u "/scs-shop/price[@type='double']" -v '99.66' file.xml
NOTE
"/scs-shop/price[@type='double']" is an XPath expression
see xmlstarlet ed --help
The XML way is cool, but if you need to use normal bash tools, you can modify a line using sed. For instance:
PRICE=123
sed -i "s/\(<price.*>\)[^<>]*\(<\/price.*\)/\1$PRICE\2/" "$XML_FILE_TO_MODIFY"
This will replace the price with 123.
That sed command seems daunting, so let me break it down:
\(<price.*>\)[^<>]*\(<\/price.*\) is the pattern to match. \( ... \) are parentheses for grouping. <price.*> matches the opening price tag. [^<>]* matches anything except angle brackets, and in this case will match the contents of the price tag. <\/price.* matches the end of the price tag. Forward slash is the delimiter in this sed command, so I escape it with a backslash.
\1$PRICE\2 is the text to replace the matched text with. \1 refers to the first parenthesized group, which is the opening price tag. $PRICE is the variable with the desired price in it. \2 refers to the second parenthesized group, in this case the closing tag.
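Because the question mentions taking the new value from a .txt file, here is a hedged sketch of that variant. The file names price.txt and shop.xml, the temporary directory and the value 77.5 are all made up for illustration. It assumes GNU sed (for -i) and a price that contains no |, & or backslash characters, since those are special in the sed expression.

```shell
# Hypothetical working directory with a price file and a trimmed shop file.
workdir=$(mktemp -d)
printf '%s\n' '77.5' > "$workdir/price.txt"
cat > "$workdir/shop.xml" <<'EOF'
<scs-shop>
<itemStack type="string">329:0</itemStack>
<price type="double">55.0</price>
</scs-shop>
EOF

# Read the new price from the text file.
new_price=$(cat "$workdir/price.txt")

# Using | as the delimiter means the slash in </price> needs no escaping.
sed -i "s|\(<price[^>]*>\)[^<]*\(</price>\)|\1${new_price}\2|" "$workdir/shop.xml"

# Show the edited line, then clean up.
price_line=$(grep '<price' "$workdir/shop.xml")
printf '%s\n' "$price_line"
rm -rf "$workdir"
```

The same command with itemStack in place of price would update the other node.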
I did not have the luxury of having xmlstarlet.
I found a solution though simply by doing an inline replacement;
template-parameter.xml
<ns:Parameter>
<ns:Name required="true">##-ParamName-##</ns:Name>
<ns:Value>
<ns:Text>##-ParamValue-##</ns:Text>
</ns:Value>
</ns:Parameter>
Snippet
tokenName="foo"
tokenValue="bar"
#Replace placeholders in parameter template element
myParamElement=$(cat template-parameter.xml)
myParamElement=${myParamElement//##-ParamName-##/$tokenName}
myParamElement=${myParamElement//##-ParamValue-##/$tokenValue}
Result
<ns:Parameter>
<ns:Name required="true">foo</ns:Name>
<ns:Value>
<ns:Text>bar</ns:Text>
</ns:Value>
</ns:Parameter>

Bash Using Sed on a Variable

I have an xml file which will always have the following block inside of it:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
<NameValue>
<FirstName>name2</FirstName>
<Surname>sur2</Surname>
</NameValue>
</Name>
There may be just one "NameValue" node inside this block or there may be many. My issue is that I only ever want to keep the first one and discard the rest. The example above would hence read:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
</Name>
I'd like to use sed for this purpose. I have written the above block to a variable and tried to manipulate that as follows:
var=$(sed -n '/<Name NameType="Also Known As">*/,/<\/Name>/p' $file)
sed -i 's/<\/NameValue>.*<\/Name>/<\/NameValue><\/Name>/g' "$var"
I get the following in return:
sed: can't read <Name NameType="Also Known As">...
I thought the above would have replaced everything between the first closing NameValue tag and the closing Name tag with nothing. I used the -i option as I want to save the changes to this file. Perhaps it's syntax related, or maybe my understanding of using the sed command on a variable is way off. A point to note is that these tags occur inside other nodes in the file and so doing one blanket update using sed would change blocks which i don't want to change. This is the reason I passed this block to a variable. Any pointer's in the right direction would be much appreciated.
As the previous comments point out, sed expects one or several input files (which may be something like stdin, of course). In your example code, you read the content of a file into a variable in a roundabout way, and then use that variable as the name of a file. That is bound to fail. While I'm sure you could achieve your objectives using sed, I do not advise it. Branching in sed is horrible. Use a more powerful tool, preferably xsltproc or something else that is designed to work with XML.
You could use a stylesheet similar to this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Name">
<Name>
<xsl:copy-of select="@*" />
<xsl:copy-of select="*[1]" />
</Name>
</xsl:template>
</xsl:stylesheet>
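To illustrate the point about sed reading files or standard input rather than the contents of a variable, here is a small sketch that pipes the variable into sed. It only works on the isolated block held in the variable, with the line layout shown in the question, so treat it as a demonstration of the mechanics rather than a replacement for an XML-aware tool.

```shell
# The block from the question, held in a shell variable.
var='<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
<NameValue>
<FirstName>name2</FirstName>
<Surname>sur2</Surname>
</NameValue>
</Name>'

# Feed the variable's content to sed on stdin instead of passing it as a file name.
# First expression: print from line 1 through the first </NameValue>.
# Second expression: print the closing </Name> line.
trimmed=$(printf '%s\n' "$var" | sed -n -e '1,/<\/NameValue>/p' -e '/<\/Name>/p')

printf '%s\n' "$trimmed"
```

Note that there is no -i here: in-place editing only makes sense when sed is given a real file.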

processing xml files with bash scripting

I have an XML file that contains numerous <Episode></Episode> elements, each structured like this:
<Episode>
<id>4195462</id>
<Combined_episodenumber>8</Combined_episodenumber>
<Combined_season>2</Combined_season>
<DVD_chapter></DVD_chapter>
<DVD_discid></DVD_discid>
<DVD_episodenumber></DVD_episodenumber>
<DVD_season></DVD_season>
<Director>Jay Karas</Director>
<EpImgFlag>2</EpImgFlag>
<EpisodeName>Karl's Wedding</EpisodeName>
<EpisodeNumber>8</EpisodeNumber>
<FirstAired>2011-11-08</FirstAired>
<GuestStars>Katee Sackhoff|Carla Gallo</GuestStars>
<IMDB_ID></IMDB_ID>
<Language>en</Language>
<Overview>Karl Hevacheck, aka the Human Genius, gets married.</Overview>
<ProductionCode>209</ProductionCode>
<Rating>7.6</Rating>
<RatingCount>20</RatingCount>
<SeasonNumber>2</SeasonNumber>
<Writer>Kevin Etten</Writer>
<absolute_number></absolute_number>
<filename>episodes/211751/4195462.jpg</filename>
<lastupdated>1362547148</lastupdated>
<seasonid>471254</seasonid>
<seriesid>211751</seriesid>
</Episode>
I've figured out how to pull the information between a single tag like so
value=$(grep -m 1 "<Rating>" path_to_file | sed 's/<.*>\(.*\)<\/.*>/\1/')
but I can't find a way to verify that I am looking at the correct episode, i.e. to check that this is the correct branch, the one with <Combined_season>2</Combined_season> and <EpisodeNumber>8</EpisodeNumber>, before saving the values of specific elements. I know this can somehow be done using a combination of sed and awk, but I can't seem to figure it out. Any help on how I can do this would be greatly appreciated.
Use a proper XML parser, not sed or awk. You can still call your XML parser from your bash script just like you would with sed or awk. It's a bad idea to use sed or awk because XML is a structured format, while sed and awk typically work with line-oriented files. You will just give yourself a headache by using the wrong tool for the job. I suggest using a dedicated tool or a language such as PHP, Python or Perl (or any other language not starting with p) that has libraries for parsing XML.
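As one possible illustration of calling a parser from bash, the sketch below uses xmllint with an XPath predicate to pick the episode by season and number. The <Data> root element, the trimmed sample and the first episode's values are assumptions made up for the example, and it presumes an xmllint build that supports --xpath.

```shell
# Hypothetical, trimmed sample; the <Data> root and the first episode are invented.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<Data>
<Episode>
<Combined_season>2</Combined_season>
<EpisodeNumber>7</EpisodeNumber>
<Rating>7.1</Rating>
</Episode>
<Episode>
<Combined_season>2</Combined_season>
<EpisodeNumber>8</EpisodeNumber>
<Rating>7.6</Rating>
</Episode>
</Data>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # The predicate selects the episode; string() returns the text of its Rating.
  rating=$(xmllint --xpath 'string(//Episode[Combined_season=2 and EpisodeNumber=8]/Rating)' "$xml_file")
  printf 'Rating: %s\n' "$rating"
else
  echo 'xmllint is not installed' >&2
fi
rm -f "$xml_file"
```

Swapping Rating for another element name in the XPath expression retrieves a different value from the same episode.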
