How to extract all values of all attributes in certain nodes from an XML file in NiFi? - xpath

I have this kind of structure in my XML file:
<parenttag>
<childtag code="1"/>
<childtag code="2"/>
<childtag code="3"/>
<childtag code="4"/>
....
</parenttag>
What I want to get is a single string that concatenates all the values of the code attribute.
Is there a way to do that via XPath, or maybe XSLT?
P.S. All the childtag elements have the same name.
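One possible approach, sketched here in Ruby rather than in NiFi itself: select the elements with XPath and join the attribute values in the host language. The sketch assumes Ruby's bundled REXML library, a placeholder root element named parenttag, and a comma as the separator; none of these come from the question. In an XPath 2.0 processor the expression string-join(//childtag/@code, ',') would do the same in a single step, but whether the XPath engine in a given NiFi setup accepts it is not something the question settles.

```ruby
require 'rexml/document'

# Sample input modeled on the question; "parenttag" is a placeholder name.
xml = <<~XML
  <parenttag>
    <childtag code="1"/>
    <childtag code="2"/>
    <childtag code="3"/>
    <childtag code="4"/>
  </parenttag>
XML

doc = REXML::Document.new(xml)

# Select every childtag element, then read its code attribute.
codes = REXML::XPath.match(doc, '//childtag').map { |el| el.attributes['code'] }

# Join the values in the host language; the comma separator is an arbitrary choice.
joined = codes.join(',')
puts joined # prints 1,2,3,4
```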

Related

Parsing an XML file using nokogiri to create \index fields for LaTeX

I'm a professional indexer new to Ruby and nokogiri and I am in need of some assistance.
I'm working on a set of macros that will allow me to take an XML file, output from my indexing software, and parse it into valid \index{} commands for inclusion in a LaTeX source file. Each XML <record> contains at least two <field> tags, so I will have to iterate over the multiple <field> tags to build my \index{} entry.
The following is an example of an index record from the xml file.
<record time="2022-08-27T17:25:12" id="30">
<field><text style="i"/><hide>SS </hide>Titanic<text/></field>
<field>passengers</field>
<field class="locator"><text style="b"/>5<text/></field>
</record>
I will produce intermediate output of this record in the form of:
\index{Titanic#\textit{SS Titanic}!passengers|textbf} 5
(The numeric locator is used to place the \index{} entry at the correct spot in the LaTeX file and won't be included in the LaTeX source file.)
I am using nokogiri to manipulate the xml file and have been able to reach the point where I return a nodelist that contains just the <field> tags for each <record>, but I need to be able to retrieve all the text in the <field>, including the formatting information (if I use the text method on a <field>, it returns "SS Titanic" for example, with all formatting information stripped away).
I'm stuck on how to access the entire text string in the <field> tag. Once I can get that, I have a good idea of how to structure my parser.
Any help will be greatly appreciated.
Does this help?
require 'nokogiri'
# A heredoc avoids having to escape the double quotes inside the XML.
xml = <<~XML
  <record time="2022-08-27T17:25:12" id="30">
  <field><text style="i"/><hide>SS </hide>Titanic<text/></field>
  <field>passengers</field>
  <field class="locator"><text style="b"/>5<text/></field>
  </record>
XML
fields = Nokogiri::XML(xml).xpath(".//field")
puts fields.first.text #=> "SS Titanic"
p fields.map(&:text) #=> ["SS Titanic", "passengers", "5"]
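The text method returns only the character data, which is exactly what the question wanted to avoid. To keep the formatting tags, one option is to serialize the children of each field instead. The following is a minimal sketch that assumes Ruby's bundled REXML library in place of Nokogiri, so that it runs without extra gems; in Nokogiri, the inner_html method on a node serves a similar purpose.

```ruby
require 'rexml/document'

xml = <<~XML
  <record time="2022-08-27T17:25:12" id="30">
  <field><text style="i"/><hide>SS </hide>Titanic<text/></field>
  <field>passengers</field>
  <field class="locator"><text style="b"/>5<text/></field>
  </record>
XML

doc = REXML::Document.new(xml)
fields = REXML::XPath.match(doc, '//field')

# Serialize every child node (text or element) of each field and join the
# pieces, so the markup inside the field survives.
inner = fields.map { |field| field.children.map(&:to_s).join }

inner.each { |markup| puts markup }
```

Each string in inner holds the full content of one field, including the text and hide elements, and can then be fed to a parser that builds the \index{} entry. The exact serialization (for example, the quote style used for attributes) depends on the library.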

Do I need to specify namespaces in XPath?

I am reading the docs, and it seems that namespaces are needed mostly for XSD schemas and for generating other formats from XML. But I can't work out whether I need to use them in XPath. Nothing stops me from specifying a path to an element without a namespace.
The path without a namespace is a path to elements in the empty namespace. Nothing can stop you specifying a path without namespaces, but such a path only matches elements without namespaces.
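For example, given the document below, /root/a/text() returns 1, while /root/ns:a/text() returns 2 (provided the prefix ns is bound to some:namespace in the XPath processor):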
<root xmlns:ns="some:namespace">
<a>1</a>
<ns:a>2</ns:a>
</root>
Both of the texts can be selected by /root/*[local-name()='a']/text().

How can I use Nokogiri with Ruby to replace values in existing xml?

I am using Ruby 1.9.3 with the latest Nokogiri gem. I have worked out how to extract values from an XML file using XPath by specifying the path to the element. Here is the XML file I have:
<?xml version="1.0" encoding="utf-8"?>
<File xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Houses>
<Ranch>
<Roof>Black</Roof>
<Street>Markham</Street>
<Number>34</Number>
</Ranch>
</Houses>
</File>
I use this code to print a value:
doc = Nokogiri::XML(File.open("C:\\myfile.xml"))
puts doc.xpath("//Ranch//Street")
Which outputs:
<Street>Markham</Street>
This is all working fine but what I need is to write/replace the value. I want to use the same kind of path-style lookup to pass in a value to replace the one that is there. So I want to pass a street name to this path and overwrite the street name that is there. I've been all over the internet but can only find ways to create a new XML or insert a completely new node in the file. Is there a way to replace values by line like this? Thanks.
You want the content= method:
Set the Node’s content to a Text node containing string. The string gets XML escaped, not interpreted as markup.
Note that xpath returns a NodeSet not a single Node, so you need to use at_xpath or get the single node some other way:
require 'nokogiri'
doc = Nokogiri::XML(File.read("C:\\myfile.xml"))
node = doc.xpath("//Ranch//Street")[0] # [0] selects the first result; at_xpath would return it directly
node.content = "New value for this node"
puts doc # prints the XML document with the new value for the node

Using XPath, return the text that is positioned after the last comma

Using XPath, I want to return the values 000078 and 000077 from the XML below. The text of the Entity element can hold two, three, or more comma-separated values. I always want the last value.
<Parent ID="123">
<SubParent ID="1">
<Name>Modem</Name>
<Entity>000006,000069,000078</Entity>
</SubParent>
<SubParent ID="2">
<Name>Modem</Name>
<Entity>000006,000077</Entity>
</SubParent>
</Parent>
XPath is a selection language, not a string processing (or general purpose programming) language, and you can only select from the distinct nodes in your document.
The nodes that contain the values you are looking for are two text nodes, '000006,000069,000078' and '000006,000077', so //Entity/text() (or //Entity) is the closest you can get with XPath alone.
Any further string processing, like pulling out the substring after the last comma, must be done in the host language.
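As an illustration of that split between selection and string processing, here is a minimal sketch in Ruby. It assumes Ruby's bundled REXML library as the XPath host; any other host language and XML library would follow the same pattern.

```ruby
require 'rexml/document'

xml = <<~XML
  <Parent ID="123">
    <SubParent ID="1">
      <Name>Modem</Name>
      <Entity>000006,000069,000078</Entity>
    </SubParent>
    <SubParent ID="2">
      <Name>Modem</Name>
      <Entity>000006,000077</Entity>
    </SubParent>
  </Parent>
XML

doc = REXML::Document.new(xml)

# XPath selects the Entity elements; Ruby does the string processing.
last_values = REXML::XPath.match(doc, '//Entity').map do |entity|
  entity.text.split(',').last.strip
end

p last_values # prints ["000078", "000077"]
```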
This is one of the examples that show that storing opaque strings that contain multiple data points (like comma-separated values) in XML is a bad idea.
This is how your XML should look:
<Parent ID="123">
<SubParent ID="1">
<Name>Modem</Name>
<Entity>000006</Entity>
<Entity>000069</Entity>
<Entity>000078</Entity>
</SubParent>
<SubParent ID="2">
<Name>Modem</Name>
<Entity>000006</Entity>
<Entity>000077</Entity>
</SubParent>
</Parent>
Now you can easily select //Entity[last()]/text() and get exactly two nodes.

Parsing XML tags with small difference in names

I have an XML file to parse in which the element tags are of the form:
<mensa-1>
..
</mensa-1>
<mensa-2>
..
</mensa-2>
Is it possible to select such elements via XPath when the element names differ only by a number at the end?
The following XPath expression returns all the elements whose names start with "mensa-":
//*[starts-with(name(),'mensa-')]
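A short sketch of how that expression might be used from Ruby, assuming the bundled REXML library. The wrapper element canteens, the other element, and the text contents are made up for the example, since the question elides the real document.

```ruby
require 'rexml/document'

# The wrapper "canteens", the element "other", and the text contents are
# hypothetical; only the mensa-N element names come from the question.
xml = <<~XML
  <canteens>
    <mensa-1>first</mensa-1>
    <mensa-2>second</mensa-2>
    <other>ignored</other>
  </canteens>
XML

doc = REXML::Document.new(xml)

# Keep every element whose name begins with "mensa-".
matches = REXML::XPath.match(doc, "//*[starts-with(name(),'mensa-')]")
names = matches.map(&:name)

p names
```

Note that name() returns the qualified name, so in a document that uses namespace prefixes, local-name() may be the better choice.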
