How to remove duplicate nodes in linq to xml document

In the following XML, two sets of PurchaseOrderRet nodes share the same TxnID. Using LINQ to XML, how does one remove the duplicate PurchaseOrderRet nodes?
<?xml version="1.0"?>
<QBPOSXML>
<QBPOSXMLMsgsRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>abc</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>xyz</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>abc</TxnID>
</PurchaseOrderRet>
<PurchaseOrderRet>
<TxnID>def</TxnID>
</PurchaseOrderRet>
<PurchaseOrderRet>
<TxnID>xyz</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
</QBPOSXMLMsgsRs>
</QBPOSXML>

You can use this statement:
XDocument doc = XDocument.Load(@"mypath\MyFile.xml");
to load the XML into an XDocument object.
You can use GroupBy to identify duplicate <TxnID> elements. After applying the following operations to doc:
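// Requires: using System.Linq; using System.Xml.Linq;
// Group PurchaseOrderRet elements by TxnID, then remove all but the first of each group.
doc.Descendants("PurchaseOrderRet")
   .GroupBy(p => p.Element("TxnID").Value)
   .Where(g => g.Count() > 1)   // only groups that actually contain duplicates
   .ToList()                    // materialize before mutating the tree
   .ForEach(g => g.Skip(1).Remove());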
doc holds the following XML:
<QBPOSXML>
<QBPOSXMLMsgsRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>abc</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>xyz</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
<PurchaseOrderQueryRs>
<PurchaseOrderRet>
<TxnID>def</TxnID>
</PurchaseOrderRet>
</PurchaseOrderQueryRs>
</QBPOSXMLMsgsRs>
</QBPOSXML>
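For comparison, the same keep-the-first-of-each-group idea can be sketched in Ruby with REXML. This is only an illustration, not part of the LINQ answer: the method name remove_duplicate_orders is made up for this sketch, and it assumes every PurchaseOrderRet has a TxnID child.

```ruby
require "rexml/document"

# Hypothetical helper: keep the first PurchaseOrderRet per TxnID, remove the rest.
def remove_duplicate_orders(doc)
  # Materialize the node list first so removals do not disturb the traversal.
  orders = REXML::XPath.match(doc, "//PurchaseOrderRet")
  orders.group_by { |order| order.elements["TxnID"].text }
        .each_value { |group| group.drop(1).each(&:remove) }
  doc
end

xml = <<~XML
  <QBPOSXML>
    <QBPOSXMLMsgsRs>
      <PurchaseOrderQueryRs>
        <PurchaseOrderRet><TxnID>abc</TxnID></PurchaseOrderRet>
      </PurchaseOrderQueryRs>
      <PurchaseOrderQueryRs>
        <PurchaseOrderRet><TxnID>xyz</TxnID></PurchaseOrderRet>
      </PurchaseOrderQueryRs>
      <PurchaseOrderQueryRs>
        <PurchaseOrderRet><TxnID>abc</TxnID></PurchaseOrderRet>
        <PurchaseOrderRet><TxnID>def</TxnID></PurchaseOrderRet>
        <PurchaseOrderRet><TxnID>xyz</TxnID></PurchaseOrderRet>
      </PurchaseOrderQueryRs>
    </QBPOSXMLMsgsRs>
  </QBPOSXML>
XML

doc = remove_duplicate_orders(REXML::Document.new(xml))
remaining_ids = REXML::XPath.match(doc, "//PurchaseOrderRet/TxnID").map(&:text)
puts remaining_ids.inspect
```

As in the C# version, the list of matches is built before anything is removed, so the tree is not modified while it is being enumerated.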

Related

How select nodes combining preceding-sibling and following sibling?

I want to select all nodes that have both a preceding-sibling A and a following-sibling A, excluding those that are following siblings of C and D.
XML :
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
output:
<k>want</k>
<b>want</b>
<j>want</j>
I tried
//*[
preceding-sibling::*[self::A ]
and
following-sibling::*[self::A ]
]
[not(self::A)]
Thanks

This is how I would approach this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="trail" match="*[not(self::A)]" use="generate-id(preceding-sibling::A[1])" />
<xsl:template match="/XMLCODE">
<result>
<xsl:for-each select="ex/A[position() mod 2 = 1]">
<xsl:copy-of select="key('trail', generate-id())"/>
</xsl:for-each>
</result>
</xsl:template>
</xsl:stylesheet>
Applied to your input example, this will return:
Result
<?xml version="1.0" encoding="UTF-8"?>
<result>
<k>want</k>
<b>want</b>
<j>want</j>
</result>
This is actually an XSLT 1.0 method. In XSLT 2.0 you could ostensibly do something with:
<xsl:for-each-group select="ex/*" group-starting-with="A">
but I don't see an elegant method to distinguish between the "on" and "off" groups, since the first group could start with an A or not.
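The "on"/"off" distinction is easy to express procedurally. What follows is a minimal Ruby/REXML sketch rather than an XSLT solution. The helper name elements_between_a_pairs is hypothetical, and it assumes that A elements pair up in document order (1st with 2nd, 3rd with 4th) and that a run opened by an unpaired trailing A is discarded.

```ruby
require "rexml/document"

# Hypothetical helper: collect the child elements that sit between an
# opening A and its closing A, treating A elements as on/off toggles.
def elements_between_a_pairs(parent)
  wanted = []
  run = nil # nil while "off"; an Array collecting elements while "on"
  parent.elements.each do |child|
    if child.name == "A"
      if run
        wanted.concat(run) # closing A: the run is complete, keep it
        run = nil
      else
        run = []           # opening A: start collecting
      end
    elsif run
      run << child
    end
  end
  wanted # a run that was never closed by a second A is dropped
end

xml = "<XMLCODE><ex><z>bla</z><z>bla</z><A/><k>want</k><b>want</b><A/>" \
      "<b>bla</b><h>bla</h><C/><z>bla</z><D/><e>bla</e><A/><j>want</j><A/>" \
      "<i>bla</i><C/><y>bla</y><C/><y>bla</y></ex></XMLCODE>"
ex = REXML::Document.new(xml).root.elements["ex"]
elements_between_a_pairs(ex).each { |element| puts element.to_s }
```

Because the toggle state is explicit, it makes no difference whether the sequence starts with an A or with some other element.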

Use this XPath 1.0 expression:
/*/ex/*[not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/ex/*[
not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
The XPath expression is evaluated and its results are copied to the output. The correct, wanted result is produced:
<k>want</k>
<b>want</b>
<j>want</j>

I think you can do
let $A := //A
return for-each-pair(
$A[position() mod 2 = 1],
$A[position() mod 2 = 0],
function($A1, $A2) {
$A1/following-sibling::* intersect $A2/preceding-sibling::*
}
)
This needs XPath 3.1 with higher-order function support, which Saxon 10 and later (all editions), SaxonJS 2, and Saxon 9.8/9.9 PE/EE provide.

Using xslt-1.0 and exslt to extract element nodes between pairs of A elements:
xmlstarlet select -t \
-m '*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]' \
-c 'set:leading(following-sibling::*,following-sibling::A[1])' \
file.xml
-m iterates over A element pairs, changing the context to the first A
-c copies following sibling elements up to, and excluding, the second A
set:leading documentation on github.io,
implementation on github.com.
Output:
<k>want</k><b>want</b><j>want</j>
To have xmlstarlet select list the generated XSLT, add a -C option before -t:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" xmlns:set="http://exslt.org/sets" version="1.0" extension-element-prefixes="exslt set">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]">
<xsl:copy-of select="set:leading(following-sibling::*,following-sibling::A[1])"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

How to search for some XML data and replace it with a new value using Nokogiri Ruby gem

Based on the example XML file employees.xml below, and using the Ruby Nokogiri gem, I want to open this file, change the building number to 320 and the room number to 99 for Sandra Defoe, and save the changes. What is the recommended way to do it?
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>

I'd use this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
</employees>
EOT
first_name = 'Sandra'
last_name = 'Defoe'
node = doc.at("//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
node.at('building').content = '320'
node.at('room').content = '99'
Which results in:
doc.to_xml
# => "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?>\n" +
# "<employees>\n" +
# " <employee id=\"be130\">\n" +
# " <firstname>William</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>326</building>\n" +
# " <room>14a</room>\n" +
# " </employee>\n" +
# " <employee id=\"be132\">\n" +
# " <firstname>Sandra</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>320</building>\n" +
# " <room>99</room>\n" +
# " </employee>\n" +
# "</employees>\n"
Normally I recommend using CSS selectors because they tend to result in less visual noise, however CSS doesn't let us peek into the text of nodes, and working around that, while possible, results in even more noise. XPath, on the other hand, can be very noisy, but for this sort of task, it's more usable.
XPath is very well documented and figuring out what this is doing should be pretty easy.
The Ruby side of it is using a "format string":
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]
similar to
"%s %s" % [first_name, last_name] # => "Sandra Defoe"
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]
# => "//employee[firstname/text()='Sandra' and lastname/text()='Defoe']"
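One caveat with interpolating names straight into the quotes: a value containing an apostrophe (say, O'Brien) would end the XPath string literal early. XPath 1.0 has no escape character, so the usual workaround is to switch quote style or build the value with concat(). A minimal sketch in plain Ruby follows; the helper names xpath_literal and employee_xpath are made up for illustration and are not part of Nokogiri.

```ruby
# Hypothetical helper: render a Ruby string as an XPath 1.0 string literal.
def xpath_literal(value)
  return "'#{value}'" unless value.include?("'")
  return "\"#{value}\"" unless value.include?('"')

  # Both quote kinds occur: split on ' and glue the pieces back with concat().
  parts = value.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

# Same expression as above, but the quoting is delegated to xpath_literal.
def employee_xpath(first_name, last_name)
  "//employee[firstname/text()=%s and lastname/text()=%s]" %
    [xpath_literal(first_name), xpath_literal(last_name)]
end

puts employee_xpath("Sandra", "Defoe")
puts employee_xpath("Sandra", "O'Brien")
```

The resulting string can be passed to doc.at exactly like the hand-written expression.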
Just for thoroughness, here's what I'd do if I wanted to use CSS exclusively:
node = doc.search('employee').find { |node|
node.at('firstname').text == first_name && node.at('lastname').text == last_name
}
This gets ugly though, because search tells Nokogiri to retrieve all employee nodes from libXML, then Ruby has to walk through them all telling Nokogiri to tell libXML to look in the child firstname and lastname nodes and return their text. That's slow, especially if there are many employee nodes and the one you want is at the bottom of the file.
The XPath selector tells Nokogiri to pass the search to libXML which parses it, finds the employee node with the child nodes containing the first and last names and returns only that node. It's much faster.
Note that at('employee') is equivalent to search('employee').first.
# File 'lib/nokogiri/xml/searchable.rb', line 70
def at(*args)
search(*args).first
end
Finally, meditate on the difference between NodeSet#text and Node#text, as the first will lead to insanity.

Assume your content is a string:
xml=%q(
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>)
doc = Nokogiri.parse(xml)
This will work, but it assumes the combination of first and last name is unique; otherwise it modifies only the first matching employee.
target = doc.css('employee').find do |node|
node.search('firstname').text == 'Sandra' &&
node.search('lastname').text == 'Defoe'
end
target.at_css('building').content = '320'
target.at_css('room').content = '99'
doc # outputs the updated xml
=> <?xml version="1.0"?>
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>320</building>
<room>99</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>

XPath expressions?

<?xml version="1.0" encoding="UTF-8"?>
<Patients>
<patientRole>
<id extension="996-756-495" root="2.16.840.1.113883.19.5"/>
<id extension="775-756-495" root="2.16.840.1.113883.14.6"/>
<patient>
<name>
<given>Henry</given>
<family>Levin</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19320924"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="2012030111:32"/>
</patientRole>
<patientRole>
<id extension="65" root="2.16.840.1.113883.3.933"/>
<patient>
<name>
<given>Paul</given>
<family>Pappel</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19551217"/>
</patient>
<providerOrganization>
<id extension="84756-11241-283-OPTD-3322" root="1.2.3.4.5.6.1.8.9.0"/>
<name> Dr.med. Hans Topp-Glucklich</name>
</providerOrganization>
<admissionTime value="201201152200"/>
</patientRole>
<patientRole>
<id extension="800001" root="2.16.840.1.113883.19.5"/>
<patient>
<name>
<given>JEANNE</given>
<family>PETIT</family>
</name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19480105"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="20120101T22:00"/>
</patientRole>
</Patients>
I am having difficulty writing XPath expressions for the following two items:
Given and family name of the patient born on 17 Dec 1955
Number of patients admitted to "Good Health Clinic" in January 2012

This will get you the name element of the patients born on the particular date:
patientRole/patient[birthTime/@value="19551217"]/name
This will get you the count of patientRole elements with the organisation name and admission date specified:
count(patientRole[providerOrganization/name="Good Health Clinic"][starts-with(admissionTime/@value,"201201")])
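To try the two expressions out, here is a rough Ruby sketch using REXML. It makes several assumptions: the sample is trimmed to the fields the expressions touch, the paths are written as absolute paths from /Patients, the helper names names_born_on and admissions_for are invented for this example, and the count is taken from the size of the matched node list rather than from count().

```ruby
require "rexml/document"

# Trimmed copy of the sample document (only the fields used below).
PATIENTS_XML = <<~XML
  <Patients>
    <patientRole>
      <patient>
        <name><given>Henry</given><family>Levin</family></name>
        <birthTime value="19320924"/>
      </patient>
      <providerOrganization><name>Good Health Clinic</name></providerOrganization>
      <admissionTime value="2012030111:32"/>
    </patientRole>
    <patientRole>
      <patient>
        <name><given>Paul</given><family>Pappel</family></name>
        <birthTime value="19551217"/>
      </patient>
      <providerOrganization><name> Dr.med. Hans Topp-Glucklich</name></providerOrganization>
      <admissionTime value="201201152200"/>
    </patientRole>
    <patientRole>
      <patient>
        <name><given>JEANNE</given><family>PETIT</family></name>
        <birthTime value="19480105"/>
      </patient>
      <providerOrganization><name>Good Health Clinic</name></providerOrganization>
      <admissionTime value="20120101T22:00"/>
    </patientRole>
  </Patients>
XML

# Given and family name of every patient with the given birthTime value.
def names_born_on(doc, birth_value)
  path = "/Patients/patientRole/patient[birthTime/@value='#{birth_value}']/name"
  REXML::XPath.match(doc, path).map do |name|
    [name.elements["given"].text, name.elements["family"].text].join(" ")
  end
end

# Number of patientRole elements for a provider whose admissionTime starts with prefix.
def admissions_for(doc, provider, prefix)
  path = "/Patients/patientRole[providerOrganization/name='#{provider}']" \
         "[starts-with(admissionTime/@value,'#{prefix}')]"
  REXML::XPath.match(doc, path).size
end

doc = REXML::Document.new(PATIENTS_XML)
puts names_born_on(doc, "19551217").inspect
puts admissions_for(doc, "Good Health Clinic", "201201")
```

On the sample data, only Paul Pappel has the birthTime 19551217, and only one Good Health Clinic admission has a value starting with 201201, because the other one starts with 201203.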

How to write an XPath expression comparing two attributes or nodes

Given the following sample:
<?xml version="1.0" encoding="UTF-8"?>
<Patients>
<patientRole>
<id extension="996-756-495" root="2.16.840.1.113883.19.5"/>
<id extension="775-756-495" root="2.16.840.1.113883.14.6"/>
<patient>
<name>
<given>Henry</given>
<family>Levin</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19320924"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="2012030111:32"/>
</patientRole>
<patientRole>
<id extension="65" root="2.16.840.1.113883.3.933"/>
<patient>
<name>
<given>Paul</given>
<family>Pappel</family>
</name>
<administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19551217"/>
</patient>
<providerOrganization>
<id extension="84756-11241-283-OPTD-3322" root="1.2.3.4.5.6.1.8.9.0"/>
<name> Dr.med. Hans Topp-Glucklich</name>
</providerOrganization>
<admissionTime value="201201152200"/>
</patientRole>
<patientRole>
<id extension="800001" root="2.16.840.1.113883.19.5"/>
<patient>
<name>
<given>JEANNE</given>
<family>PETIT</family>
</name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1"/>
<birthTime value="19480105"/>
</patient>
<providerOrganization>
<id root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</providerOrganization>
<admissionTime value="20120101T22:00"/>
</patientRole>
</Patients>
How would I write an XPath expression for the following:
Family names for the male patients (gender code="M")
Any help is greatly appreciated. I am new to XML/XPath; I have tried multiple approaches and none of them generates what I need.

This should work:
/Patients/patientRole/patient[administrativeGenderCode/@code='M']/name/family

I want to reprint the modified xml after deleting entire child node

<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
I want to remove the "pencil" node and print the remaining XML using REXML (Ruby). Can anybody tell me how to do that?

Use one of the delete methods (see http://rubydoc.info/stdlib/rexml/):
require "rexml/document"
string = <<EOF
<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
EOF
doc = REXML::Document.new(string)
doc.delete_element('//pencil')
puts doc
There is also a nice tutorial to get you started: http://www.germane-software.com/software/rexml/docs/tutorial.html
