Ruby: Insert new XML element into existing XML file - ruby

How can I insert another XML element into an XML file I'm creating with Builder::XmlMarkup? e.g., something like
xml = Builder::XmlMarkup.new( :indent => 4 )
xml.content do
  xml.common do
    xml.common_field1 do
      # common_field1 content
    end
    xml.common_field2 do
      # common_field2 content
    end
  end
  xml.custom do
    xml.insert!(<XML element>)
  end
end
Where <XML element> looks something like
<elements>
<element>
// element content
</element>
<element>
// element content
</element>
</elements>
and the final output looks like
<content>
<common>
<content1>
<!-- content1 -->
</content1>
<content2>
<!-- content2 -->
</content2>
</common>
<custom>
<elements>
<element>
<!-- element content -->
</element>
<element>
<!-- element content -->
</element>
</elements>
</custom>
</content>
I've tried using the << operator, but unfortunately that doesn't maintain formatting.

<< is exactly what you need:
xml.custom do |custom|
custom << '<XML element>'
end
Rubydocs doesn't seem to work, so here's the link to the source code: https://github.com/jimweirich/builder/blob/master/lib/builder/xmlbase.rb#L104
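If the formatting matters, one possible workaround is to indent the fragment yourself before appending it, since << writes its argument as given. The sketch below uses a hypothetical helper, indent_fragment (not part of Builder), and assumes the fragment is already split over several lines:

```ruby
# Hypothetical helper (not part of Builder): prefix every line of an XML
# fragment with `levels * width` spaces and terminate each line with "\n".
def indent_fragment(fragment, levels, width = 4)
  pad = " " * (levels * width)
  fragment.lines.map { |line| "#{pad}#{line.chomp}\n" }.join
end

fragment = "<elements>\n    <element/>\n</elements>"
indented = indent_fragment(fragment, 2)
puts indented
```

With the builder from the question this would presumably be called as custom << indent_fragment(fragment, 2) inside the xml.custom block; the level of 2 assumes the fragment sits two levels below the root element.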

Related

XPath: Get element with matching attribute value

I'm trying to get content from an element whose @id attribute matches the context node's @idref. For example, given the following xml (just a contrived sample)...
<doc>
<toc>
<entry idref="ch1"/>
<entry idref="ch2"/>
</toc>
<body>
<chapter id="ch1">
<title>Chapter 1</title>
<para/>
</chapter>
<chapter id="ch2">
<title>Chapter 2</title>
<para/>
</chapter>
<chapter id="ch3">
<title>Chapter 3</title>
<para/>
</chapter>
</body>
</doc>
From the [entry] element, how can I get the content of the [title] within the [chapter] whose @id matches the current @idref?
So, basically find chapter[where chapter @id = current entry @idref]/title
I've tried
string(//chapter[@id = @idref]/title)
string(//chapter[@id = ./@idref]/title)
string(//chapter[@id = current()/@idref]/title)
all with no luck.
Can you try this expression on your xml?
//chapter[@id=//toc/entry/@idref]/string-join((title,@id),' ')
Output:
Chapter 1 ch1
Chapter 2 ch2
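If only XPath 1.0 is available, current() cannot be used because it is an XSLT function, so one option is to resolve each entry from the host language. A minimal sketch in Ruby, assuming REXML as the parser (any XPath-capable library should work similarly) and a trimmed copy of the sample document:

```ruby
require "rexml/document"

xml = <<~XML
  <doc>
    <toc>
      <entry idref="ch1"/>
      <entry idref="ch2"/>
    </toc>
    <body>
      <chapter id="ch1"><title>Chapter 1</title></chapter>
      <chapter id="ch2"><title>Chapter 2</title></chapter>
      <chapter id="ch3"><title>Chapter 3</title></chapter>
    </body>
  </doc>
XML

doc = REXML::Document.new(xml)

# For each toc entry, look up the chapter whose id equals the entry's idref.
titles = REXML::XPath.match(doc, "//toc/entry").map do |entry|
  idref = entry.attributes["idref"]
  title = REXML::XPath.first(doc, "//chapter[@id='#{idref}']/title")
  title && title.text
end

p titles
```

Interpolating the idref into the expression is only reasonable here because the values come from the document itself and contain no quotes.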

Replacing xml tags in BASH

I have a large collection of xml documents with a wide array of different tags in them. I need to change all tags of the form <foo> and turn them into tags of the form <field name="foo"> in a way that will also ignore the attributes of a given tag. That is, a tag of the form <foo id="bar"> should also be changed to the tag <field name="foo">.
In order for this transformation to work, I also need to distinguish between <foo> and </foo>, as </foo> must go to </field>.
I have played around with sed in a bash script, but to no avail.
Although sed is not ideal for this task (see comments; further reading: regular, context-free grammar and xml), it can be pressed into service. Try this one-liner:
sed -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
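The first expression rewrites every tag to <field name="firstWord">, where firstWord is the tag name up to the first space. For end tags the captured name is empty, because the tag starts with /, so they become <field name="">, which the second expression then turns into </field>.
This solution prints everything to standard output. To edit the file in place instead, use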
sed -i -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
That turns
<html>
<person>
but <person name="bob"> and <person name="tom"> would both become
</person>
into
<field name="html">
<field name="person">
but <field name="person"> and <field name="person"> would both become
</field>
Sed is the wrong tool for the job - a simple XSL Transform can do this much more reliably:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="foo">
<field name="foo">
<xsl:apply-templates/>
</field>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that unlike sed, it can handle short empty elements, newlines within tags (e.g. as produced by some tools), and just about anything that's well-formed XML. Here's my test file:
<?xml version="1.0"?>
<doc>
<section>
<foo>Plain foo, simple content</foo>
</section>
<foo attr="0">Foo with attr, with content
<bar/>
<foo attr="shorttag"/>
</foo>
<foo
attr="1"
>multiline</foo
>
<![CDATA[We mustn't transform <foo> in here!]]>
</doc>
which is transformed by the above (using xsltproc 16970175.xslt 16970175.xml) to:
<?xml version="1.0"?>
<doc>
<section>
<field name="foo">Plain foo, simple content</field>
</section>
<field name="foo">Foo with attr, with content
<bar/>
<field name="foo"/>
</field>
<field name="foo">multiline</field>
We mustn't transform &lt;foo&gt; in here!
</doc>
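For comparison, the same parser-based idea can be sketched in Ruby. This is only an illustration, assuming REXML is available; rename_to_field is a hypothetical helper name, and like the stylesheet it handles one tag name at a time:

```ruby
require "rexml/document"

# Rename every <tag> element to <field name="tag">, dropping its attributes.
def rename_to_field(xml, tag)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{tag}").each do |el|
    # Remove the existing attributes (keys returns a copy, so this is safe).
    el.attributes.keys.each { |key| el.delete_attribute(key) }
    el.add_attribute("name", tag)
    el.name = "field"
  end
  doc.to_s
end

result = rename_to_field('<doc><foo id="bar">text<foo attr="x"/></foo><bar/></doc>', "foo")
puts result
```

Because the document is parsed rather than pattern-matched, start tags, end tags and empty elements are all handled by the same rename.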

xpath selection of a title node within a resultset

I have an XML document like the one below. I was trying to select a Title node with a particular value in it, say "![CDATA[ 1234 ]]". That Title node may be in any Type node. I was using this XPath query
/Results/ResultSet/Type[Title="![CDATA[ 1234 ]]"]
but nothing was selected. Can someone please help?
<Results>
<Info>...</Info>
<ResultSet num="4">
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
<Description>
<![CDATA[ 1234 ]]>
</Description>
<Domain>
<![CDATA[1234 ]]>
</Domain>
<Target>
<![CDATA[]]>
</Target>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
<Description>
<![CDATA[abcdef]]>
</Description>
<Domain>
<![CDATA[abcdef]]>
</Domain>
<Target>
<![CDATA[abcdef]]>
</Target>
</Type>
EDIT: included the ruby code that I am using
doc = Nokogiri::HTML(html)
Element = doc.xpath('/Results/ResultSet/Type/Title[text()=" 1234 "]')
if Element.empty?()
puts "not there "
else
Element.each do |node|
puts "Found Title: #{node.text}"
end
end
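The XPath is wrong: the CDATA markers are not part of the node's text value, so they must not appear in the comparison.
Use this: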
/Results/ResultSet/Type/Title[text()=" 1234 "]
Based on the link OP posted for the XML, here is the working XPath:
/QuigoResults/ResultSet/Listing/Title[text()=" location in DYNAMICREGION "]
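An exact text() comparison is sensitive to whitespace around the CDATA section. A more forgiving sketch, assuming REXML instead of Nokogiri and a cut-down version of the sample document, selects the Title elements with XPath and compares the trimmed text in Ruby:

```ruby
require "rexml/document"

xml = <<~XML
  <Results>
    <ResultSet num="2">
      <Type type="A"><Title><![CDATA[ 1234 ]]></Title></Type>
      <Type type="A"><Title><![CDATA[ abcdef ]]></Title></Type>
    </ResultSet>
  </Results>
XML

doc = REXML::Document.new(xml)

# Join each Title's text nodes (CDATA included) and compare after trimming.
matches = REXML::XPath.match(doc, "/Results/ResultSet/Type/Title").select do |title|
  title.texts.map(&:value).join.strip == "1234"
end

matches.each { |title| puts "Found Title: #{title.texts.map(&:value).join.strip}" }
```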

XPath in Nokogiri returning empty array [] whereas I am expecting to have results

I am trying to parse XML files using Nokogiri, Ruby and XPath. I usually don't encounter any problem but with the following I can't make any xpath request:
doc = Nokogiri::HTML(open("myfile.xml"))
doc.xpath("//Meta").count
# result ==> 0
doc.xpath("//Meta")
# result ==> []
doc.xpath(".").count
# result => 1
Here is a simplified version of my XML file
<Answer xmlns="test:com.test.search" context="hf%3D10%26target%3Dst0" last="0" estimated="false" nmatches="1" nslices="0" nhits="1" start="0">
<time>
...
</time>
<promoted>
...
</promoted>
<hits>
<Hit url="http://www.test.com/" source="test" collapsed="false" preferred="false" score="1254772" sort="0" mask="272" contentFp="4294967295" did="1287" slice="1">
<groups>
...
</groups>
<metas>
<Meta name="enligne">
<MetaString name="value">
</MetaString>
</Meta>
<Meta name="language">
<MetaString name="value">
fr
</MetaString>
</Meta>
<Meta name="text">
<MetaText name="value">
<TextSeg highlighted="false" highlightClass="0">
La
</TextSeg>
</MetaText>
</Meta>
</metas>
</Hit>
</hits>
<keywords>
...
</keywords>
<groups>
...
</groups>
How can I get all children of <Hit> from this XML?
Include the namespace information when calling xpath:
doc.xpath("//x:Meta", "x" => "test:com.test.search")
You can use the remove_namespaces! method and save the day.
This is one of the most frequently asked XPath questions; search for "XPath default namespace".
If there is no way to register a namespace for the default namespace and use the registered prefix (say "x" in //x:Meta) then use:
//*[name() = 'Meta' and namespace-uri()='test:com.test.search']
If it is known that Meta can only belong to the default namespace, then the above can be shortened to:
//*[name() = 'Meta']
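The prefix-registration approach can be tried on a reduced document. The sketch below assumes REXML rather than Nokogiri, so the call shape differs from the Nokogiri answer; the prefix "x" is arbitrary and only the namespace URI has to match the document:

```ruby
require "rexml/document"

xml = <<~XML
  <Answer xmlns="test:com.test.search">
    <hits>
      <Hit url="http://www.test.com/">
        <metas>
          <Meta name="enligne"/>
          <Meta name="language"/>
          <Meta name="text"/>
        </metas>
      </Hit>
    </hits>
  </Answer>
XML

doc = REXML::Document.new(xml)

# Map an arbitrary prefix to the document's default namespace URI.
ns = { "x" => "test:com.test.search" }

metas = REXML::XPath.match(doc, "//x:Meta", ns)
names = metas.map { |meta| meta.attributes["name"] }
p names
```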

How to reference an XML attribute using XPath?

My XML:
<root>
<cars>
<makes>
<honda year="1995">
<model />
<!-- ... -->
</honda>
<honda year="2000">
<!-- ... -->
</honda>
</makes>
</cars>
</root>
I need a XPath that will get me all models for <honda> with year 1995.
so:
/root/cars/makes/honda
But how to reference an attribute?
"I need a XPath that will get me all models for <honda> with year 1995."
That would be:
/root/cars/makes/honda[@year = '1995']/model
Try /root/cars/makes/honda/@year
UPDATE: reading your question again:
/root/cars/makes/honda[@year = '1995']
Bottom line is: use the @ character to reference XML attributes.
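Both forms, filtering on an attribute and selecting the attribute itself, can be checked quickly from Ruby. This is a small sketch assuming REXML and a shortened copy of the sample document:

```ruby
require "rexml/document"

xml = <<~XML
  <root><cars><makes>
    <honda year="1995"><model/></honda>
    <honda year="2000"/>
  </makes></cars></root>
XML

doc = REXML::Document.new(xml)

# Filter on the attribute: models of hondas from 1995.
models = REXML::XPath.match(doc, "/root/cars/makes/honda[@year='1995']/model")

# Select the attribute nodes themselves and read their values.
years = REXML::XPath.match(doc, "/root/cars/makes/honda/@year").map(&:value)

p models.size
p years
```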

Resources