Bash Using Sed on a Variable - bash

I have an xml file which will always have the following block inside of it:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
<NameValue>
<FirstName>name2</FirstName>
<Surname>sur2</Surname>
</NameValue>
</Name>
There may be just one "NameValue" node inside this block or there may be many. My issue is that I only ever want to keep the first one and discard the rest. The example above would hence read:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
</Name>
I'd like to use sed for this purpose. I have written the above block to a variable and tried to manipulate that as follows:
var=$(sed -n '/<Name NameType="Also Known As">*/,/<\/Name>/p' $file)
sed -i 's/<\/NameValue>.*<\/Name>/<\/NameValue><\/Name>/g' "$var"
I get the following in return:
sed: can't read <Name NameType="Also Known As">...
I thought the above would have replaced everything between the first closing NameValue tag and the closing Name tag with nothing. I used the -i option as I want to save the changes to this file. Perhaps it's syntax related, or maybe my understanding of using the sed command on a variable is way off. A point to note is that these tags occur inside other nodes in the file, so doing one blanket update using sed would change blocks which I don't want to change. This is the reason I passed this block to a variable. Any pointers in the right direction would be much appreciated.

As the previous comments point out, sed expects one or several input files (which may be something like stdin, of course). In your example code, you read the content of a file into a variable in a roundabout way, and then use that variable as the name of a file. That is bound to fail. While I'm sure you could achieve your objectives using sed, I do not advise it. Branching in sed is horrible. Use a more powerful tool, preferably xsltproc or something else that is designed to work with XML.
You could use a stylesheet similar to this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Name">
<Name>
<xsl:copy-of select="@*" />
<xsl:copy-of select="*[1]" />
</Name>
</xsl:template>
</xsl:stylesheet>
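If sed is still preferred, the immediate error can be avoided by feeding the variable's contents to sed on standard input instead of passing the variable as a file name. The following is only a sketch: it assumes the block looks like the example in the question, with one tag per line, and it is not a general XML solution. The variable names block and trimmed are illustrative.

```shell
# Sample block, laid out one tag per line as in the question (assumption).
block='<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
<NameValue>
<FirstName>name2</FirstName>
<Surname>sur2</Surname>
</NameValue>
</Name>'

# Pipe the variable's contents into sed; do not use it as a file name.
# Print from line 1 through the first </NameValue>, then the closing </Name>.
trimmed=$(printf '%s\n' "$block" | sed -n '1,/<\/NameValue>/p; /<\/Name>/p')

printf '%s\n' "$trimmed"
```

This only produces the trimmed block in a variable. Writing it back into the original file is a separate step, which is one more reason an XML-aware tool is the safer choice.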

Related

how to replace character in html attribute value (shell / bash)?

Sorry for the stupid question, but I have been stuck all afternoon with this simple problem. So I have a sample text file containing:
<product productId="123456" description="good apple, very green" publicPriceTTC="5,07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12,47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5,07" brand-id="154" />
And I'd like to modify this file into:
<product productId="123456" description="good apple, very green" publicPriceTTC="5.07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12.47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5.07" brand-id="154" />
Basically, I need to replace the "," (comma) by a "." (point) in all values of "publicPriceTTC". The trick here is that other attributes might have commas in their values ("description" in this example). I guess sed or awk can do that but I was unable to achieve it.
Can someone help me? Thank you very much for any help.
If you search for a comma to replace with a point, you will be doing a very coarse search/replace. Try something more specific. With sed, assuming your input file is called xml:
sed -E 's/(publicPriceTTC="[0-9]+),([0-9]+")/\1.\2/' xml
You probably know that sed has the command s/<what you search>/<replacement>. We use that.
The -E option enables extended regular expressions. With that, the s expression matches the attribute name, the "=" and the number within quotes, and the parentheses capture the parts that are reused in the substitution. \1 stands for the first parenthesized group; \2 for the second.
You could of course make the search more robust to cope with whitespace between the attribute name and the equals sign and so on.
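A whitespace-tolerant variant might look like the sketch below. The sample line, with spaces around the equals sign, is made up for illustration, and fixed is just an illustrative variable name.

```shell
# Hypothetical input line with spaces around "=" in the price attribute.
line='<product productId="1" description="a, b" publicPriceTTC = "5,07" />'

# [[:space:]]* allows optional whitespace on either side of the equals sign.
fixed=$(printf '%s\n' "$line" |
  sed -E 's/(publicPriceTTC[[:space:]]*=[[:space:]]*"[0-9]+),([0-9]+")/\1.\2/')

printf '%s\n' "$fixed"
```

The comma inside the description attribute is left alone because the pattern only matches a comma between digits directly after publicPriceTTC.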
An awk solution to this might be:
awk '/<product/{for(i=1;i<=NF;i++){if($i~/^publicPriceTTC="/)sub(/,/,".",$i)}}1' file.xml
This steps through every whitespace-separated "field" on every <product>, looking for "words" that begin with the attribute you're trying to modify. If found, the entire attribute has its commas replaced with periods.
A simpler awk solution to emulate what others are doing with sed would be nice, except that awk's sub() and gsub() do not support backreferences to parenthesized subexpressions (i.e. \1 in your replacement string). Gawk supports them in the gensub() function, so the following might suffice:
gawk '{print gensub(/(publicPriceTTC="[0-9]+),/,"\\1.","g")}' file.xml
But ... you are solving the wrong problem here. Tools like sed and awk, which process files based on regular expressions, are not XML parsers. Either Javier's sed solution or my awk solutions could garble things accidentally, or miss certain things that are in perfectly valid XML files. Regex cannot be used to parse XML safely.
I recommend that you look into using python or perl or ruby or php or some other language with native XML support.
For example, turning your input into actual XML like this:
<p>
<product productId="123456" description="good apple, very green" publicPriceTTC="5,07" brand-id="152" />
<product productId="123457" description="fresh orange, very juicy" publicPriceTTC="12,47" brand-id="153" />
<product productId="123458" description="big banana, very yellow" publicPriceTTC="5,07" brand-id="154" />
</p>
We could run a PHP one-liner:
php -r '$x=new SimpleXMLElement(file_get_contents("file.xml")); foreach($x->product as $p) { $p["publicPriceTTC"]=str_replace(",",".",$p["publicPriceTTC"]); } print $x->asXML();'
Or split out for easier reading (and commenting):
<?php
// Read an XML file into an object
$x=new SimpleXMLElement(file_get_contents("file.xml"));
// Step through the object, fixing attributes as we find them
foreach($x->product as $p) {
$p["publicPriceTTC"] = str_replace(",",".",$p["publicPriceTTC"]);
}
// Print the result
print $x->asXML();
This will work with GNU sed:
sed 's/\(publicPriceTTC="[0-9]*\),/\1./' fileName
Here, using sub in awk is enough, provided the price is always the seventh whitespace-separated field, as it is in the sample:
awk '{sub(/,/,".",$7)}1' file

replace multiple key value in one line with sed [duplicate]

Quick Summary: I need to create a Bash script to change the text within a node automatically every week. The script will match the node and replace the text inside them (if this is possible)? How would I do this?
Long Summary:
I host a Minecraft server which has shops, each of which have their own .xml file in the /ShowcaseStandalone/ffs-storage/ directory. Every Sunday my server restarts and executes several commands into the terminal to reset several things. One thing that I am trying to make change is one of the shops. I am wanting to change the text in the node <itemstack> and the text in the node <price>. I am simply wanting to take text from a .txt file in a different folder, and insert it into that node. The problem is, that the text in the node will change every week. Is there any way to replace a specific line or text within two nodes using bash?
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<scs-shop usid="cac8480951254352116d5255e795006252d404d9" version="2" type="storage">
<enchantments type="string"/>
<owner type="string">Chadward27</owner>
<world type="string">Frisnuk</world>
<itemStack type="string">329:0</itemStack>
<activity type="string">BUY</activity>
<price type="double">55.0</price>
<locX type="double">487.5</locX>
<locY type="double">179.0</locY>
<locZ type="double">-1084.5</locZ>
<amount type="integer">0</amount>
<maxAmount type="integer">0</maxAmount>
<isUnlimited type="boolean">true</isUnlimited>
<nbt-storage usid="23dffac5fb2ea7cfdcf0740159e881026fde4fa4" version="2" type="storage"/>
</scs-shop>
Operating System: Linux Ubuntu 12.04
You can use xmlstarlet to edit an XML file from the shell like this:
xmlstarlet edit -L -u "/scs-shop/price[@type='double']" -v '99.66' file.xml
NOTE
"/scs-shop/price[@type='double']" is an XPath expression
see xmlstarlet ed --help
The XML way is cool, but if you need to use normal bash tools, you can modify a line using sed. For instance:
PRICE=123
sed -i "s/\(<price.*>\)[^<>]*\(<\/price.*\)/\1$PRICE\2/" $XML_FILE_TO_MODIFY
This will replace the price with 123.
That sed command seems daunting, so let me break it down:
\(<price.*>\)[^<>]*\(<\/price.*\) is the pattern to match. \( ... \) are parentheses for grouping. <price.*> matches the opening price tag. [^<>]* matches anything except angle brackets, and in this case will match the contents of the price tag. <\/price.* matches the end of the price tag. Forward slash is the delimiter in this sed command, so I escape it with a backslash.
\1$PRICE\2 is the text to replace the matched text with. \1 refers to the first matched parenthesis group, which is the opening price tag. $PRICE is the variable with the desired price in it. \2 refers to the second parenthesis group, in this case the closing tag.
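Since the question asks for the new value to come from a text file, the same idea can be sketched as follows. The file names shop.xml and price.txt and the temporary directory are assumptions made up for the example. The sketch also writes to a temporary file and moves it into place instead of using -i, and uses | as the delimiter so the closing tag needs no escaping.

```shell
# Set up throwaway sample files (hypothetical names and contents).
workdir=$(mktemp -d)
printf '%s\n' '<itemStack type="string">329:0</itemStack>' \
              '<price type="double">55.0</price>' > "$workdir/shop.xml"
printf '%s\n' '99.5' > "$workdir/price.txt"

# Read the new price from the text file.
new_price=$(cat "$workdir/price.txt")

# Replace only the text between <price ...> and </price>.
# Note: new_price must not contain |, & or backslashes, which sed would interpret.
sed "s|\(<price[^>]*>\)[^<]*\(</price>\)|\1${new_price}\2|" \
  "$workdir/shop.xml" > "$workdir/shop.xml.new" &&
  mv "$workdir/shop.xml.new" "$workdir/shop.xml"

cat "$workdir/shop.xml"
```

The same substitution with itemStack in place of price would handle the other node. It still relies on each element sitting on a single line.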
I did not have the luxury of having xmlstarlet.
I found a solution though simply by doing an inline replacement;
template-parameter.xml
<ns:Parameter>
<ns:Name required="true">##-ParamName-##</ns:Name>
<ns:Value>
<ns:Text>##-ParamValue-##</ns:Text>
</ns:Value>
</ns:Parameter>
Snippet
tokenName="foo"
tokenValue="bar"
#Replace placeholders in parameter template element
myParamElement=$(cat template-parameter.xml)
myParamElement=${myParamElement//##-ParamName-##/$tokenName}
myParamElement=${myParamElement//##-ParamValue-##/$tokenValue}
Result
<ns:Parameter>
<ns:Name required="true">foo</ns:Name>
<ns:Value>
<ns:Text>bar</ns:Text>
</ns:Value>
</ns:Parameter>

Find and replace in file with script

I want to find and replace the VALUE in an XML file:
<test name="NAME" value="VALUE"/>
I have to filter by name (because there are a lot of lines like that).
Is it possible?
Thanks for your help.
Since you tagged the question "bash", I assume that you're not trying to use an XML library (although I think an XML expert might be able to give you something like an XSLT processor command that solves this question very robustly), but that you're simply interested in doing search & replace from the commandline.
I am using perl for this:
perl -pi -e 's#VALUE#replacement#g' *.xml
See the perlrun man page. Very briefly put, -p switches perl into text processing mode, -i stands for "in-place", and -e lets you specify an expression to apply to all lines of input.
Also note (if you are not too familiar with that already) that you may use other characters than # (common ones are %, a comma, etc.) that don't clash with your search & replacement strings.
There is one small caveat: perl will read & write all files given on the commandline, even those that did not change. Thus, the files' modification times will be updated even if they did not change. (I usually work around that with some more shell magic, e.g. using grep -l or grin -l to select files for perl to work on.)
EDIT: If I understand your comments correctly, you also need help with the regular expression to apply. Let me briefly suggest something like this then:
perl -pi -e 's,(name="NAME" value=)"[^"]*",\1"NEWVALUE",g' *.xml
Related: bash XHTML parsing using xpath
You can use sed:
sed 's/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/' test.xml
where test.xml is the XML document containing the given node. This is very fragile, and you can work to make it more flexible if you need to do this substitution multiple times. For instance, the current statement is case sensitive, so it won't substitute the value on a node with name="name", but with GNU sed you can add the case-insensitivity flag I to the end of the statement, like so:
('s/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/I').
Another option would be to use XSLT, but it would require you to download an external library. It's pretty versatile, and could be a viable option for more complex modifications to an XML document.

Using sed to write xml tag in Rakefile

I am trying to pre-process an XML-like file using a Rakefile; specifically, I am trying to add a group of XML tags.
The following sed command is a short version of what I have done:
sed -ig '/TARGET_STRING/{N;G;s/$/<key>KEY_NAME<\/key>/g;}' whateverfile.xml
This works beautifully when run from the terminal.
Then I put it into my Rakefile, like this:
desc 'setup pods archs'
task :setup_podsarchs => :setup_submodules do
puts 'Altering xml...'.cyan
`sed -ig '/TARGET_STRING/{N;G;s/$/<key>KEY_NAME<\/key>/g;}' whateverfile.xml`
end
After executing rake, it prints an error and terminates the task:
sed: 1: "/TARGET_STRING/{N;G;s/$/ ...": bad flag in substitute command: 'k'
I have been searching for a long time and cannot find any information about escaping the < and > characters in Ruby.
My platform
OS: Mac OS X 10.9
Ruby: 2.0.0p247
rake: 0.9.6
sed: 7
Update
Hi, thank you guys for the extremely fast replies.
@the Tin Man, regarding your comment:
What I am trying to do is pre-process the Xcode project file (.pbxproj), which is structured as XML.
For simplicity, I just show an example of the XML structure here:
<dict>
<key>Key_ONE</key>
<string>1</string>
</dict>
What I am trying to do is find Key_ONE and add another key after it:
<dict>
<key>Key_ONE</key>
<string>1</string>
<key>Key_TWO</key>
<string>2</string>
</dict>
Using regular expressions for anything beyond the most trivial and controlled parsing leads to madness. Use Nokogiri, an excellent Ruby XML/HTML parser. For instance:
require 'nokogiri'
xml = <<EOT
<xml>
<foo>foo</foo>
<bar>bar</bar>
</xml>
EOT
doc = Nokogiri::XML(xml)
doc.at('foo').content = 'bar'
doc.at('bar')['class'] = 'cyan'
puts doc.to_xml
Which outputs:
<?xml version="1.0"?>
<xml>
<foo>bar</foo>
<bar class="cyan">bar</bar>
</xml>
Notice the content inside the <foo> tag changed, along with <bar> gaining an attribute.
What's important about using a parser is that the content can change, tag parameters can change, their order can move around inside the tag, tags can be split across multiple lines, and a parser will not care, whereas a regular expression will spout flames and stop working.
A parser is the better solution.
From a sed point of view, try:
`sed -ig '/TARGET_STRING/{
N
s/$/\
<key>KEY_NAME<\/key>/
}' whateverfile.xml`
G is normally not needed because nothing has been loaded into the hold space.
g is also not needed, since only one replacement is possible.
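The "bad flag in substitute command: 'k'" message is consistent with sed receiving </key> without its backslash, which would happen if the backslash were consumed before the command reached sed (Ruby backticks handle escapes like a double-quoted string, as far as I can tell). One way to sidestep the problem for the closing tag is to pick a delimiter other than /, so the tag needs no escaping at all. The sketch below runs in a plain shell on the sample structure from the question; result is an illustrative variable name.

```shell
# Feed the two sample lines from the question to sed on standard input.
result=$(printf '%s\n' '<key>Key_ONE</key>' '<string>1</string>' |
  sed '/Key_ONE/{
N
s|$|\
<key>Key_TWO</key>|
}')

printf '%s\n' "$result"
```

With | as the delimiter, </key> contains nothing sed treats specially. The backslash before the newline is still required, so inside Ruby backticks it would presumably have to be doubled.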

Batch to read partial xml file

I'm trying to make a batch file to pull data out of a file and set it as a variable.
The tricky part is I need to read a XML file, and I only need the data between the quotes of the following line...
narrative="I only need this text here"
The text in that line can also contain spaces, brackets, slashes, dashes and colons.
SAMPLE XML FILE :
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<cadcall>
<call callnumber="123456" jurisdiction="abcd" department="dept 1" complaint="cost" priority="M" calltencode="" callername="Persons Name" phonenumber="Cell Number HERE" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Fairly long narrative here describing issue of dispatch, but sometimes these can be short." alarmtype="" ocanumber="0000000000" disposition1="TRAN" />
</cadcall>
The proper tool for this is xmllint from libxml. Please provide a more complete XML example and I will show you how to use an XPath query on your XML.
EDIT:
Here is a solution using XPath (with a little hack: contains):
xmllint --xpath '/cadcall/call/@narrative[contains(.,'.')]' file.xml
Without seeing the complete input, and based only on your example line, grep works for you:
kent$ echo 'narrative="I only need this text here"'|grep -Po '(?<=narrative=")[^"]*'
I only need this text here
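Since the goal is to end up with the text in a variable, and grep -P is not available everywhere, a sed-based variant might look like this. It is a sketch under the same assumption as the grep answer: the attribute sits on a single line and its value contains no double quotes. The shortened sample line and the variable name narrative are illustrative.

```shell
# Shortened sample line modelled on the XML in the question.
line='<call callnumber="123456" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Some text." alarmtype="" />'

# Capture everything between narrative=" and the next double quote;
# -n with the p flag prints only lines where the substitution succeeded.
narrative=$(printf '%s\n' "$line" | sed -n 's/.*narrative="\([^"]*\)".*/\1/p')

printf '%s\n' "$narrative"
```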
