Ruby edit and save XML file - ruby

I have written some code to transform the names of the audio files (prepending 'XX' to each name in this example). The code runs, but I can't find a way to save the result as a new XML file. I tried all the solutions I found on the forums, but it is still not working.
This is part of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="4">
<sequence authoringapp="Storyboard Pro"
projectpath="/Users/rougnaux/Desktop/XML_TEST/man_142_test/man_142_test.sboard"
id="man_142_test">
<media>
<audio>
<track>
<clipitem id="MAN_142_RADIOPLAY - SCRATCH_ESCAPING THE MANOR_20200929">
<file id="MAN_142_RADIOPLAY - SCRATCH_ESCAPING THE MANOR_20200929 1">
<name>MAN_142_RADIOPLAY - SCRATCH_ESCAPING THE MANOR_20200929.wav</name>
</file>
</clipitem>
</track>
</audio>
</media>
<ismasterclip>FALSE</ismasterclip>
</sequence>
</xmeml>
This is my code:
require 'nokogiri'
file = File.read('man_142_test.xml')
document = Nokogiri::XML(file)
document.xpath(("//sequence/media/audio/track/clipitem/file/name")).map {|name|
name.children.text }.map!{ |name| 'XX'+name}
File.open("new_test.xml", "w") do |f|
f.write document.to_xml
end
When I check in irb with puts document, I have this output
["XXMAN_142_RADIOPLAY - SCRATCH_ESCAPING THE MANOR_20200929.wav", "XXchimes 02_1.mp3"]
In the 'new_test.xml' file, the names of the audio tracks are not changed. What is missing here?

Try the below:
require 'nokogiri'
file = File.read('man_142_test.xml')
document = Nokogiri::XML(file)
document.xpath("//sequence/media/audio/track/clipitem/file/name").each do |name|
  # name is a Nokogiri node, so read its text with content before concatenating
  name.content = 'XX' + name.content
end
File.open("new_test.xml", "w") do |f|
f.write document.to_xml
end

For those facing the same kind of problem, here is a way to do it:
require 'nokogiri'
file = File.read('man_142_test.xml')
document = Nokogiri::XML(file)
document.search("//sequence/media/audio/track/clipitem/file/name").each do |node|
node.content = "XX_#{node.content}"
end
File.open("new_test.xml", "w") do |f|
f.write document.to_xml
end
I think xpath and map were the wrong approaches to modify the XML, although I don't yet understand why I saw them in so many solutions about editing XML. If someone can give more details about that, I think it could help a lot of people like me!

Related

How to replace XML node contents using Nokogiri

I'm using Ruby to read an XML document and update a single node, if it exists, with a new value.
From http://www.nokogiri.org/tutorials/modifying_an_html_xml_document.html it is not obvious to me how to change the node data, let alone how to save it back to the file.
def ammend_parent_xml(folder, target_file, new_file)
# open parent XML file that contains file reference
get_xml_files = Dir.glob("#{@target_folder}/#{folder}/*.xml").sort.select {|f| !File.directory? f}
get_xml_files.each { |xml|
f = File.open(xml)
# Use Nokogiri to read the file into an XML object
doc = Nokogiri::XML(f)
filename = doc.xpath('//Route//To//Node//FileName')
filename.each_with_index {
|fl, i|
if target_file == fl.text
# we found the file, now rename it to new_file
# ???????
end
}
}
end
This is some example XML:
<?xml version="1.0" encoding="utf-8">
<my_id>123</my_id>
<Route>
<To>
<Node>
<Filename>file1.txt</Filename>
<Filename>file2.mp3</Filename>
<Filename>file3.doc</Filename>
<Filename>file4.php</Filename>
<Filename>file5.jpg</Filename>
</Node>
</To>
</Route>
</xml>
I want to change "file3.doc" to "file3_new.html".
I would call:
ammend_parent_xml("folder_location", "file3.doc", "file3_new.html")
To change an element in the XML:
@doc = Nokogiri::XML::DocumentFragment.parse <<-EOXML
<body>
<h1>OLD_CONTENT</h1>
<div>blah</div>
</body>
EOXML
h1 = @doc.at_xpath "body/h1"
h1.content = "NEW_CONTENT"
puts @doc.to_xml # h1 now contains NEW_CONTENT
To save the XML:
file = File.new("xml_file.xml", "wb")
file.write(@doc)
file.close
There are a few things wrong with your sample XML:
There are two root elements, my_id and Route.
There is a missing ? at the end of the first tag (the XML declaration).
Do you need the last line, </xml>?
After fixing the sample I was able to get the element by using the example by Phrogz:
element = @doc.xpath("Route//To//Node//Filename[.='#{target_file}']").first
Note the .first, since xpath returns a NodeSet.
Then I would update the content with:
element.content = "foobar"
def amend_parent_xml(folder, target_file, new_file)
Dir["#{@target_folder}/#{folder}/*.xml"]
.sort.select{|f| !File.directory? f }
.each do |xml_file|
doc = Nokogiri.XML( File.read(xml_file) )
if file = doc.at("//Route//To//Node//Filename[.='#{target_file}']")
file.content = new_file # set the text of the node
File.open(xml_file,'w'){ |f| f<<doc }
break
end
end
end
Improvements:
Use File.read instead of File.open so that you don't leave a file handle open.
Uses an XPath expression to find the SINGLE matching node by looking for a node with the correct text value.
Alternatively you could find all the files and then if file=files.find{ |f| f.text==target_file }
Shows how to serialize a Nokogiri::XML::Document back to disk.
Breaks out of processing the files as soon as it finds a matching XML file.

Building blank XML tags with Nokogiri?

I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like:
builder = Nokogiri::XML::Builder.new do |xml|
...
xml.Tag1(object.attribute_1)
xml.Tag2(object.attribute_2)
xml.Tag3(object.attribute_3)
xml.Tag4(nil)
end
builder.to_xml
However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be.
How do I tell Nokogiri to put full tags around a nil value?
SaveOptions::NO_EMPTY_TAGS will get you what you want.
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.blah(nil)
end
puts 'broken:'
puts builder.to_xml
puts 'fixed:'
puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)
output:
(511)-> ruby derp.rb
broken:
<?xml version="1.0"?>
<blah/>
fixed:
<?xml version="1.0"?>
<blah></blah>

Search node in xml by using Nokogiri xpath (with xml namespace)

I find Nokogiri quite powerful for dealing with XML, but I ran into a special case.
I am trying to search for a node in an XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
When I do a
xml = File.new(webconfig,"r")
doc = Nokogiri::XML(xml.read)
nodes = doc.search("//Configuration.Store")
xml.close
I get an empty node set. Am I missing something? I have tried
nodes = doc.search("//Configuration\.Store")
but still no luck.
Updated: I have attached the whole XML file.
Updated the XML again: my mistake, it does have a namespace.
EDIT #2: Solution now includes #parse_with_namespace
You can find a number of Nokogiri methods pertaining to namespaces in the Nokogiri::XML::Node documentation.
# encoding: UTF-8
require 'rspec'
require 'nokogiri'
XML = <<XML
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationSection>
<Configuration xmlns="clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server" >
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>
</ConfigurationSection>
XML
class ConfigParser
def parse(xml)
doc = Nokogiri::XML(xml).remove_namespaces!
configuration = doc.at('/ConfigurationSection/Configuration')
store = configuration.at("./Configuration.Store").text
engine = configuration.at("./Configuration.Engine").text
{store: store, engine: engine}
end
def parse_with_namespace(xml)
doc = Nokogiri::XML(xml)
configuration = doc.at('/ConfigurationSection/xmlns:Configuration', 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server')
store = configuration.at("./xmlns:Configuration.Store", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
engine = configuration.at("./xmlns:Configuration.Engine", 'xmlns' => 'clr-namespace:Newproject.Framework.Server.Store.Configuration;assembly=Newproject.Framework.Server').text
{store: store, engine: engine}
end
end
describe ConfigParser do
  before(:each) do
    @parsed = subject.parse XML
    @parsed_with_ns = subject.parse_with_namespace XML
  end
  it "should be able to parse the Configuration Store" do
    @parsed[:store].should eq "SqlServer"
  end
  it "should be able to parse the Configuration Engine" do
    @parsed[:engine].should eq "Staging"
  end
  it "should be able to parse the Configuration Store with namespace" do
    @parsed_with_ns[:store].should eq "SqlServer"
  end
  it "should be able to parse the Configuration Engine with namespace" do
    @parsed_with_ns[:engine].should eq "Staging"
  end
end
require 'nokogiri'
XML = "<Configuration>
<Configuration.Store>SqlServer</Configuration.Store>
<Configuration.Engine>Staging</Configuration.Engine>
</Configuration>"
p Nokogiri::VERSION, Nokogiri.XML(XML).search('//Configuration.Store')
#=> "1.5.0"
#=> [#<Nokogiri::XML::Element:0x8103f0f8 name="Configuration.Store" children=[#<Nokogiri::XML::Text:0x81037524 "SqlServer">]>]
p RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"

ruby malformed XML: missing tag start

I have a very weird problem: I run the same code on two XML files, the second of which is a copy of the first (I copied the contents; maybe that's the problem).
The code uses REXML to parse the XML file. On the first file everything works, but on the second I get this error:
Failed: malformed XML: missing tag start
Line: 2
Position: 102
Last 80 unconsumed characters:
<t>dede</t>
The contents of the xml file is:
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
Any ideas?
Thanks a lot
I do not have any such problem using this code:
require 'rexml/document'
doc = REXML::Document.new <<ENDXML
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
ENDXML
doc.each_element('//t'){ |e| puts e }
#=> <t>dede</t>
What version of Ruby are you using, and what does your code actually look like?
Edit: Based off the new information that you're using the stream parser, here's another piece of code that also works for me using Ruby 1.8.7:
class Listener
def method_missing( name, *args ); puts "I don't support '#{name}'"; end
def tag_start( name, attrs ); puts "<#{name} #{attrs.inspect}>"; end
def text( str ); p str; end
def tag_end( name ); puts "</#{name}>"; end
end
require 'stringio'
xml = StringIO.new <<ENDXML
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
ENDXML
require 'rexml/document'
doc = REXML::Document.parse_stream( xml, Listener.new )
#=> "\t"
#=> I don't support 'xmldecl'
#=> "\n\t"
#=> <t {}>
#=> "dede"
#=> </t>
#=> "\n"
It's because of the file encoding. I have the same problem and found out the file was UCS-2 encoded. Either UTF-8 or ANSI works, but UCS-2 doesn't, it seems. It probably needs specialized parsers for this format first. I just converted the xml file in Notepad++ to test the different encodings.
REXML seems a bit too eager to throw a ParseException. Encoding is definitely a major culprit. Check the encoding of your files.
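If the file does turn out to be UTF-16 (which is what editors usually write when asked for UCS-2), one option is to convert the bytes to UTF-8 in Ruby before handing them to REXML. This is only a sketch: the helper name xml_bytes_to_utf8 is made up, it only looks for a UTF-16 byte order mark, and it assumes the XML declaration does not itself name a different encoding.

```ruby
require 'rexml/document'

# Hypothetical helper: turn raw file bytes into a UTF-8 string, using a
# UTF-16 byte order mark (if present) to pick the source encoding.
def xml_bytes_to_utf8(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  if bytes.start_with?("\xFF\xFE".b)      # UTF-16 little-endian BOM
    bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  elsif bytes.start_with?("\xFE\xFF".b)   # UTF-16 big-endian BOM
    bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  else
    bytes.force_encoding(Encoding::UTF_8) # no UTF-16 BOM: assume UTF-8
  end
end

# Simulate a file saved as UTF-16LE with a BOM.
# With a real file this would be: raw = File.binread('your_file.xml')
raw = "\uFEFF<?xml version=\"1.0\" standalone=\"yes\"?>\n<t>dede</t>\n".encode(Encoding::UTF_16LE)

doc = REXML::Document.new(xml_bytes_to_utf8(raw))
puts doc.root.text # prints "dede"
```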

How do I validate specific attributes in XML using Ruby's REXML?

I'm trying to read some XML I've retrieved from a web service, and validate a specific attribute within the XML.
This is the XML up to the tag that I need to validate:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity"
xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true" />
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010
04:00:28</a:ExtractDateTimeFormatted>
Here's the code I have thus far using REXML in Ruby:
require 'xmlsimple'
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
include StreamListener
xmlfile = File.new("rbxml_CS_Query.xml")
xmldoc = Document.new(xmlfile)
# Now get the root element
root = xmldoc.root
puts root.attributes["a:EntityList"]
# This will output the date/time of the query response
xmldoc.elements.each("a:BaseEntity"){
|e| puts e.attributes["a:ExtractDateTimeFormatted"]
}
end
I need to validate that ExtractDateTimeFormatted is there and has a valid value for that attribute. Any help is greatly appreciated. :)
Reading from local xml file.
File.open('temp.xml', 'w') { |f|
f.puts request
f.close
}
xml = File.read('temp.xml')
doc = Nokogiri::XML::Reader(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
show = DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
puts show
When I run this code, I get an error on line 21: "undefined method 'at'" for the object returned by Nokogiri::XML::Reader.
Are you tied to REXML or can you switch to Nokogiri? I highly recommend Nokogiri over the other Ruby XML parsers.
I had to add enough XML tags to make the sample validate.
require 'date'
require 'nokogiri'
xml = %q{<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<QueryResponse xmlns="http://tempuri.org/">
<QueryResult xmlns:a="http://schemas.datacontract.org/2004/07/Entity" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Navigation i:nil="true"/>
<a:SearchResult>
<a:EntityList>
<a:BaseEntity i:type="a:Product">
<a:ExtractDateTime>1290398428</a:ExtractDateTime>
<a:ExtractDateTimeFormatted>11/22/2010</a:ExtractDateTimeFormatted>
</a:BaseEntity>
</a:EntityList>
</a:SearchResult>
</QueryResult>
</QueryResponse>
</s:Body>
</s:Envelope>
}
doc = Nokogiri::XML(xml)
extract_date_time_formatted = doc.at(
'//a:ExtractDateTimeFormatted',
'a' => 'http://schemas.datacontract.org/2004/07/Entity'
).inner_text
puts DateTime.strptime(extract_date_time_formatted, '%m/%d/%Y')
# >> 2010-11-22T00:00:00+00:00
There are a couple of things going on that could make this harder to handle than a simple XML file:
The XML is using namespaces. They are useful, but you have to tell the parser how to handle them. That is why I had to add the second parameter to the at() accessor.
The date value is in a format that is often ambiguous. For many days of the year it is hard to tell whether it is mm/dd/yyyy or dd/mm/yyyy. Here in the U.S. we assume it's the first, but Europe is the second. The DateTime parser tries to figure it out but often gets it wrong, especially when it thinks it's supposed to be dealing with a month 22. So, rather than let it guess, I told it to use mm/dd/yyyy format. If a date doesn't match that format, or the date's values are out of range Ruby will raise an exception, so you'll need to code for that.
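The "code for that" part could look something like the sketch below. The helper name parse_us_date is made up for illustration; it relies on Date.strptime raising ArgumentError (Date::Error, a subclass, on newer Rubies) when the text does not fit the format or the values are out of range.

```ruby
require 'date'

# Hypothetical helper: returns a Date for mm/dd/yyyy text, or nil when the
# text does not match the format or holds an out-of-range value.
def parse_us_date(text)
  Date.strptime(text.to_s.strip, '%m/%d/%Y')
rescue ArgumentError
  nil
end

p parse_us_date('11/22/2010') # a valid mm/dd/yyyy date
p parse_us_date('22/11/2010') # month 22 is out of range, so nil
```

Returning nil keeps the validation decision with the caller, who can then report a missing or invalid ExtractDateTimeFormatted value however the application requires.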
This is an example of how to retrieve and parse XML on the fly:
require 'nokogiri'
require 'open-uri'
# On Ruby 3.0 and later, Kernel#open no longer opens URLs; use URI.open from open-uri
doc = Nokogiri::XML(URI.open('http://java.sun.com/developer/earlyAccess/xml/examples/samples/book-order.xml'))
puts doc.class
puts doc.to_xml
And an example of how to read a local XML file and parse it:
require 'nokogiri'
doc = Nokogiri::XML(File.read('test.xml'))
puts doc.to_xml
# >> <?xml version="1.0"?>
# >> <root xmlns:foo="bar">
# >> <bar xmlns:hello="world"/>
# >> </root>
