How to add the values for respective elements? - xpath

I have 3 XML structures as below:
a.xml
<Books>
<Book>
<Publisher>ABC Pvt Ltd</Publisher>
<Month>May</Month>
<Year>2016</Year>
<BooksReleased>4</BooksReleased>
</Book>
</Books>
b.xml
<Books>
<Book>
<Publisher>XYZ Pvt Ltd</Publisher>
<Month>April</Month>
<Year>2016</Year>
<BooksReleased>2</BooksReleased>
</Book>
</Books>
c.xml
<Books>
<Book>
<Publisher>ABC Pvt Ltd</Publisher>
<Month>June</Month>
<Year>2016</Year>
<BooksReleased>2</BooksReleased>
</Book>
</Books>
I would like to group these XML documents by publisher and calculate the total number of BooksReleased per publisher for a particular year.
Required output format:
<TotalCalc>
<PublishedBook>
<Publisher>ABC Pvt Ltd</Publisher>
<no.of books>6</no.of books>
</PublishedBook>
<PublishedBook>
<Publisher>XYZ Pvt Ltd</Publisher>
<no.of books>2</no.of books>
</PublishedBook>
</TotalCalc>
Kindly help me. I tried the following, but it's not working:
typeswitch($Publisher)
case element (ABC Pvt Ltd)
return sum($doc/BooksReleases[$doc/$Publisher = 'ABC Pvt Ltd'])
default return 'unknown'

It might be possible to use cts:value-tuples to pull up co-occurrences of Publisher and 'BooksReleased', which you can then iterate to aggregate by Publisher. That would scale much better. Something like:
let $aggregates := map:map()
let $_ :=
for $tuple in cts:value-tuples((
cts:element-reference(xs:QName("Publisher")),
cts:element-reference(xs:QName("BooksReleased"))
))
let $values := json:array-values($tuple)
let $pub := $values[1]
let $books as xs:int := $values[2]
return map:put($aggregates, $pub, (map:get($aggregates, $pub), 0)[1] + $books)
return $aggregates
Note, though, that this requires indexes on Publisher and BooksReleased, and that each document must contain only one Publisher value to prevent cross-products.
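For comparison, the same map-based aggregation can be sketched outside MarkLogic. This is only an illustration in Python using the standard library, with the three documents from the question inlined as strings. The helper names and the `NoOfBooks` element name are assumptions made for the sketch: `no.of books` contains a space, so it cannot be used as an XML element name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline copies of a.xml, b.xml and c.xml from the question.
DOCS = [
    "<Books><Book><Publisher>ABC Pvt Ltd</Publisher><Month>May</Month>"
    "<Year>2016</Year><BooksReleased>4</BooksReleased></Book></Books>",
    "<Books><Book><Publisher>XYZ Pvt Ltd</Publisher><Month>April</Month>"
    "<Year>2016</Year><BooksReleased>2</BooksReleased></Book></Books>",
    "<Books><Book><Publisher>ABC Pvt Ltd</Publisher><Month>June</Month>"
    "<Year>2016</Year><BooksReleased>2</BooksReleased></Book></Books>",
]


def totals_by_publisher(docs, year):
    """Sum BooksReleased per publisher for a single year."""
    totals = defaultdict(int)
    for text in docs:
        for book in ET.fromstring(text).iter("Book"):
            if book.findtext("Year") == year:
                totals[book.findtext("Publisher")] += int(book.findtext("BooksReleased"))
    return dict(totals)


def to_xml(totals):
    """Render the totals in the shape of the requested output."""
    root = ET.Element("TotalCalc")
    for publisher, count in sorted(totals.items()):
        item = ET.SubElement(root, "PublishedBook")
        ET.SubElement(item, "Publisher").text = publisher
        # NoOfBooks is a hypothetical stand-in for the invalid name "no.of books".
        ET.SubElement(item, "NoOfBooks").text = str(count)
    return ET.tostring(root, encoding="unicode")


totals = totals_by_publisher(DOCS, "2016")
print(to_xml(totals))
```

Unlike the index-based approach, this reads every document, so it only suits small collections.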
I would also consider simply dropping (or ignoring) BooksReleased, and just making sure you save each book as a separate document. You can then use cts:values on Publisher and use cts:frequency on each publisher value to get the number of books for the publishers.
HTH!

Related

How to combine two XML files with Nokogiri

I am trying to combine two separate, but related, files with Nokogiri. I want to combine the "product" and "product pricing" if "ItemNumber" is the same.
I loaded the documents, but I have no idea how to combine the two.
Product File:
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ProductTypeId>0</ProductTypeId>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ActiveFlag>Y</ActiveFlag>
<ImageFile>100024.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ProductVariants>
<ProductVariant>
<Sku>100024</Sku>
<ColorName></ColorName>
<SizeName></SizeName>
<SequenceNo>0</SequenceNo>
<BackOrderableFlag>N</BackOrderableFlag>
<InventoryLevel>0</InventoryLevel>
<ColorCode></ColorCode>
<SizeCode></SizeCode>
<TaxableFlag>Y</TaxableFlag>
<VariantPromoGroupCode></VariantPromoGroupCode>
<PricingGroupCode></PricingGroupCode>
<StartDate xsi:nil="true"></StartDate>
<EndDate xsi:nil="true"></EndDate>
<ActiveFlag>Y</ActiveFlag>
</ProductVariant>
</ProductVariants>
</Product>
</Products>
Product Pricing Fields:
<ProductPricing>
<ItemNumber>100024</ItemNumber>
<AcquisitionCost>8.52</AcquisitionCost>
<MemberCost>10.7</MemberCost>
<Price>14.99</Price>
<SalePrice xsi:nil="true"></SalePrice>
<SaleCode>0</SaleCode>
</ProductPricing>
I am looking to generate a file like this:
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ProductTypeId>0</ProductTypeId>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ActiveFlag>Y</ActiveFlag>
<ImageFile>100024.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ProductVariants>
<ProductVariant>
<Sku>100024</Sku>
<ColorName></ColorName>
<SizeName></SizeName>
<SequenceNo>0</SequenceNo>
<BackOrderableFlag>N</BackOrderableFlag>
<InventoryLevel>0</InventoryLevel>
<ColorCode></ColorCode>
<SizeCode></SizeCode>
<TaxableFlag>Y</TaxableFlag>
<VariantPromoGroupCode></VariantPromoGroupCode>
<PricingGroupCode></PricingGroupCode>
<StartDate xsi:nil="true"></StartDate>
<EndDate xsi:nil="true"></EndDate>
<ActiveFlag>Y</ActiveFlag>
</ProductVariant>
</ProductVariants>
</Product>
<ProductPricing>
<ItemNumber>100024</ItemNumber>
<AcquisitionCost>8.52</AcquisitionCost>
<MemberCost>10.7</MemberCost>
<Price>14.99</Price>
<SalePrice xsi:nil="true"></SalePrice>
<SaleCode>0</SaleCode>
</ProductPricing>
</Products>
Here is the code I have so far:
require 'csv'
require 'nokogiri'
xml = File.read('lateApril-product-pricing.xml')
xml2 = File.read('lateApril-master-date')
doc = Nokogiri::XML(xml)
doc2 = Nokogiri::XML(xml2)
pricing_data = []
item_number = []
doc.xpath('//ProductsPricing/ProductPricing').each do |file|
itemNumber = file.xpath('./ItemNumber').first.text
variant_Price = file.xpath('./Price').first.text
pricing_data << [ itemNumber, variant_Price ]
item_number << [ itemNumber ]
end
puts item_number ## This prints all the item number but i have no idea how to loop through them and combine them with Product XML
doc2.xpath('//Products/Product').each do |file|
itemNumber = file.xpath('./ItemNumber').first.text #not sure how to write the conditions here since i don't have pricing fields available in this method
end
Try this on:
require 'nokogiri'
doc1 = Nokogiri::XML(<<EOT)
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
</Product>
</Products>
EOT
doc2 = Nokogiri::XML(<<EOT)
<ProductPricing>
<ItemNumber>100024</ItemNumber>
</ProductPricing>
EOT
doc1.at('Product').add_next_sibling(doc2.at('ProductPricing'))
Which results in:
puts doc1.to_xml
# >> <?xml version="1.0"?>
# >> <Products>
# >> <Product>
# >> <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
# >> </Product><ProductPricing>
# >> <ItemNumber>100024</ItemNumber>
# >> </ProductPricing>
# >> </Products>
Please, when you ask, strip the example input and expected resulting output to the absolute, bare, minimum. Anything beyond that wastes space, eye-time and brain CPU.
This is untested code, but is where I'd start if I was going to merge two files containing multiple <ItemNumber> nodes:
require 'nokogiri'
doc1 = Nokogiri::XML(<<EOT)
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<ItemNumber>100024</ItemNumber>
</Product>
</Products>
EOT
doc2 = Nokogiri::XML(<<EOT)
<ProductPricing>
<ItemNumber>100024</ItemNumber>
</ProductPricing>
EOT
# build a hash mapping each item number in doc1 to its <Product> node
doc1_products_by_item_numbers = doc1.search('Product').map { |product|
[product.at('ItemNumber').text, product]
}.to_h
# build a hash mapping each item number in doc2 to its <ProductPricing> node
doc2_products_by_item_numbers = doc2.search('ProductPricing').map { |pricing|
[pricing.at('ItemNumber').text, pricing]
}.to_h
# append each matching doc2 entry to doc1 right after its product;
# products without a pricing entry are left untouched
doc1_products_by_item_numbers.each { |item_number, product|
pricing = doc2_products_by_item_numbers[item_number]
product.add_next_sibling(pricing) if pricing
}
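The lookup-by-item-number idea is not specific to Nokogiri. As a rough sketch, here it is in Python with the standard library. The sample data is shortened, the second product (item 999999) is invented to show the no-match case, and the `ProductsPricing` wrapper is taken from the XPath in the question's code rather than from its sample file.

```python
import xml.etree.ElementTree as ET

# Shortened sample data; item 999999 is a made-up product with no pricing entry.
PRODUCTS = """<Products>
  <Product><Name>Axe Handle</Name><ItemNumber>100024</ItemNumber></Product>
  <Product><Name>Unpriced Item</Name><ItemNumber>999999</ItemNumber></Product>
</Products>"""

PRICING = """<ProductsPricing>
  <ProductPricing><ItemNumber>100024</ItemNumber><Price>14.99</Price></ProductPricing>
</ProductsPricing>"""


def merge_pricing(products_xml, pricing_xml):
    """Return a new <Products> element with each pricing entry after its product."""
    products = ET.fromstring(products_xml)
    # Index the pricing entries by item number for constant-time lookup.
    pricing_by_item = {
        pricing.findtext("ItemNumber"): pricing
        for pricing in ET.fromstring(pricing_xml).iter("ProductPricing")
    }
    merged = ET.Element(products.tag)
    for product in products.findall("Product"):
        merged.append(product)
        pricing = pricing_by_item.get(product.findtext("ItemNumber"))
        if pricing is not None:
            merged.append(pricing)
    return merged


merged = merge_pricing(PRODUCTS, PRICING)
print(ET.tostring(merged, encoding="unicode"))
```

Building the result in a fresh element avoids inserting into a list while iterating over it.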

regular expressions to strip off all the XML

I am very new to Ruby and I need to write Ruby regular expressions to strip off all the XML and create a file with titles instead of XML tags:
for example the first book should be:
book: bk101
author: Matthew Gambardella (notice first name first!!!)
title: XML Developer's Guide
Genre: Computer
Price: 44.95
Publish Date: October 1,2000 (Notice this is different from the XML - you must convert the date to this form)
Description: An in-depth look at creating applications
with XML
Here is my XML file -
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
</catalog>
Any help is really appreciated.
This looks more like a homework assignment than a question. I'll let you figure out how to write files and format the date --- here's something simple that will make a hash out of your XML and loop through each book / field one at a time (I shortened your document to two books).
require 'active_support/core_ext/hash'
xml_books = <<-EOF
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
EOF
books = Hash.from_xml(xml_books)
books['catalog']['book'].each do |e|
e.keys.each do |k|
printf("%s -> %s\n", k, e[k])
end # do k
end # do e
Produces the following output:
id -> bk101
author -> Gambardella, Matthew
title -> XML Developer's Guide
genre -> Computer
price -> 44.95
publish_date -> 2000-10-01
description -> An in-depth look at creating applications
with XML.
id -> bk102
author -> Ralls, Kim
title -> Midnight Rain
genre -> Fantasy
price -> 5.95
publish_date -> 2000-12-16
description -> A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.
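The two pieces left as an exercise above, putting the first name first and reformatting the date, might look roughly like this. It is a sketch in Python with an XML parser rather than the Ruby regular expressions the question asks for. The `format_book` helper is a made-up name, and joining the description onto one line is a choice made here, not something the question specifies.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# First book from the question's catalog.
CATALOG = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications
      with XML.</description>
  </book>
</catalog>"""


def format_book(book):
    """Render one <book> element as labelled text lines."""
    # "Gambardella, Matthew" -> "Matthew Gambardella"
    last, first = (part.strip() for part in book.findtext("author").split(",", 1))
    published = datetime.strptime(book.findtext("publish_date"), "%Y-%m-%d")
    # Collapse the line breaks and indentation inside the description.
    description = " ".join(book.findtext("description").split())
    return "\n".join([
        f"book: {book.get('id')}",
        f"author: {first} {last}",
        f"title: {book.findtext('title')}",
        f"Genre: {book.findtext('genre')}",
        f"Price: {book.findtext('price')}",
        # %B is the full month name (English in the default C locale).
        f"Publish Date: {published:%B} {published.day}, {published.year}",
        f"Description: {description}",
    ])


report = "\n\n".join(format_book(b) for b in ET.fromstring(CATALOG).iter("book"))
print(report)
```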

Unable to findnodes() restricted just to current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has sort of a header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't then find a way back 'up' to then pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elements in Assembly_Info, which I do successfully. But I then want to read each of the Part records that are associated with the Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. Unfortunately, my code reads all of the Part records from the entire file for each Assembly, since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', if you will, to tell findnodes to begin at the current parent. The code looks like this:
my $parser = XML::LibXML -> new();
my $tree = $parser -> parse_file ('DEMO.XML');
for my $product ($tree->findnodes ('/Wrapper/Record/Product/prodid')) {
$prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly')){
$assemblies++;
$parts = 0;
for my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly/Assembly_Info')) {
$id = $assembly->findvalue('id');
$interface = $assembly->findvalue('interface');
}
foreach my $part ($tree->findnodes ('/Wrapper/Record/Assembly/Part')) {
$parts++;
$status = $part->findvalue('status');
$dev_name = $part->findvalue('dev_name');
}
print "Assembly No: ", $assemblies, " Parts: ",$parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down to the Assembly_Info depths? There is quite a bit I'm not getting, and I think a problem may be that I'm thinking of this as 'navigating' or moving a cursor, if you will. Examples of XPath path expressions have not helped me.
Instead of always using $tree as the starting point for findnodes, you can call it on any other node, including the nodes returned by an earlier findnodes call, and pass a relative XPath expression that is evaluated against that node. For example:
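for my $record ($tree->findnodes('/Wrapper/Record')) {
    my $prodid = $record->findvalue('./Product/prodid');
    for my $assembly ($record->findnodes('./Assembly')) {
        my $id        = $assembly->findvalue('./Assembly_Info/id');
        my $interface = $assembly->findvalue('./Assembly_Info/interface');
        for my $part ($assembly->findnodes('./Part')) {
            # relative paths are evaluated against $part, not the document root
            print join(' ', $prodid, $id, $interface,
                $part->findvalue('./status'), $part->findvalue('./dev_name')), "\n";
        }
    }
}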
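The same pattern of searching relative to the node you already hold carries over to other XML libraries. As an illustration only, here is a Python sketch using the standard library's ElementTree on a trimmed copy of the question's file; the `flatten` helper is a made-up name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file: two assemblies with two and one parts.
XML = """<Wrapper>
  <Record>
    <Product><prodid>4094</prodid></Product>
    <Assembly>
      <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
      <Part><status>N/A</status><dev_name>0000</dev_name></Part>
      <Part><status>Ready</status><dev_name>0455</dev_name></Part>
    </Assembly>
    <Assembly>
      <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
      <Part><status>N/A</status><dev_name>0002</dev_name></Part>
    </Assembly>
  </Record>
</Wrapper>"""


def flatten(xml_text):
    """Return one (prodid, id, interface, status, dev_name) tuple per Part."""
    rows = []
    root = ET.fromstring(xml_text)  # root is the <Wrapper> element
    for record in root.findall("Record"):
        prodid = record.findtext("Product/prodid")
        for assembly in record.findall("Assembly"):
            info = assembly.find("Assembly_Info")
            # findall on the assembly only sees that assembly's own Part children.
            for part in assembly.findall("Part"):
                rows.append((
                    prodid,
                    info.findtext("id"),
                    info.findtext("interface"),
                    part.findtext("status"),
                    part.findtext("dev_name"),
                ))
    return rows


rows = flatten(XML)
for row in rows:
    print(" ".join(row))
```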

libxml2 predicates in xpath expression are not always recognized

I appeal to you because I have problems in using the libxml2 library that does not take into account certain parameters in my xpath expressions.
Here is an example of xml file that I am trying to parse:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book title="Harry Potter" lang="eng" version="1">
<price>29.99</price>
</book>
<book title="Learning XML" lang="eng" version="2">
<price>38.95</price>
</book>
<book title="Learning C" lang="eng" version="2">
<price>39.95</price>
</book>
</bookstore>
Suppose I want to extract all the books whose native language is English and whose version is the first edition.
If I'm not mistaken, I should use the following XPath expression:
//book[@lang='eng' and @version='1']
and the following instructions in my code:
xmlChar * xpath_expression = "//book[@lang='eng' and @version='1']";
xmlXPathObjectPtr xpathRes = xmlXPathEvalExpression(xpath_expression, ctxt);
The problem is that I get as a result, the list of books as if I'd just do the following request:
//book
I wonder if my version is buggy knowing that I have the latest for my debian squeeze (2.7.8.dfsg-2 + squeeze7)...
This is most certainly not a bug in libxml2. You probably made an error elsewhere. The following code only prints "Harry Potter":
#include <stdio.h>
#include <libxml/parser.h> /* declares xmlParseMemory */
#include <libxml/xpath.h>
int main()
{
static const char xml[] =
"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
"<bookstore>\n"
" <book title=\"Harry Potter\" lang=\"eng\" version=\"1\">\n"
" <price>29.99</price>\n"
" </book>\n"
" <book title=\"Learning XML\" lang=\"eng\" version=\"2\">\n"
" <price>38.95</price>\n"
" </book>\n"
" <book title=\"Learning C\" lang=\"eng\" version=\"2\"> \n"
" <price>39.95</price>\n"
" </book>\n"
"</bookstore>\n";
/* sizeof(xml) counts the terminating NUL, so pass one byte less */
xmlDocPtr doc = xmlParseMemory(xml, sizeof(xml) - 1);
xmlXPathContextPtr ctxt = xmlXPathNewContext(doc);
xmlChar *expression = BAD_CAST "//book[@lang='eng' and @version='1']";
xmlXPathObjectPtr res = xmlXPathEvalExpression(expression, ctxt);
xmlNodeSetPtr nodeset = res->nodesetval;
for (int i = 0; i < nodeset->nodeNr; i++) {
xmlNodePtr node = nodeset->nodeTab[i];
xmlChar *title = xmlGetProp(node, BAD_CAST "title");
printf("%s\n", title);
}
xmlXPathFreeObject(res);
xmlXPathFreeContext(ctxt);
xmlFreeDoc(doc);
return 0;
}
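One quick way to cross-check that the predicate itself selects only one book, independent of libxml2, is to run an equivalent query in another XPath implementation. The sketch below uses Python's standard ElementTree, which supports only a subset of XPath and has no `and` operator, so the two attribute tests are written as chained predicates instead.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book title="Harry Potter" lang="eng" version="1"><price>29.99</price></book>
  <book title="Learning XML" lang="eng" version="2"><price>38.95</price></book>
  <book title="Learning C" lang="eng" version="2"><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Chained predicates: both attribute tests must hold for a book to match.
matches = root.findall(".//book[@lang='eng'][@version='1']")
titles = [book.get("title") for book in matches]

# Without any predicate, every book is returned.
all_titles = [book.get("title") for book in root.findall(".//book")]

print(titles)
print(all_titles)
```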

Distinct Result via xQuery

I'm trying to get reviewers who review one or more books published after 2010.
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return {$r/Reviewer}
The following are both XML files.
review.xml:
<Reviews>
<Review>
<ReviewID>R1</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R2</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>BBB</Reviewer>
</Review>
<Review>
<ReviewID>R3</ReviewID>
<BookTitle>B2</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R4</ReviewID>
<BookTitle>B3</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
</Reviews>
book.xml:
<Books>
<Book>
<Title>B1</Title>
<Year>2005</Year>
</Book>
<Book>
<Title>B2</Title>
<Year>2011</Year>
</Book>
<Book>
<Title>B3</Title>
<Year>2012</Year>
</Book>
</Books>
My XQuery code returns AAA twice. I was wondering if I can get a distinct result, meaning only one AAA. I've tried distinct-values() but don't know how to use it properly. Thanks for your reply!
----My Updated Solution with XML format for xQuery 1.0----
<root>
{
for $x in distinct-values
(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
)
return <reviewer>{$x}</reviewer>
}
</root>
To preserve nodes, you can use the "group by" clause and select the first item of a group sequence:
for $r in doc("review.xml")//Review,
$b in doc("book.xml")//Book
let $n := $r/Reviewer
where $b/Title = $r/BookTitle
and $b/Year > 2010
group by $n
return $r[1]/Reviewer
The following query will give you all distinct reviewer names (note that the values are atomized, which means the element nodes are removed):
distinct-values(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
)
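To make the join-then-deduplicate logic concrete, here is a rough equivalent in Python using the standard library, with review.xml and book.xml inlined as strings. Like distinct-values(), it returns plain strings rather than element nodes; the `reviewers_after` helper is a made-up name.

```python
import xml.etree.ElementTree as ET

REVIEWS = """<Reviews>
  <Review><ReviewID>R1</ReviewID><BookTitle>B1</BookTitle><Reviewer>AAA</Reviewer></Review>
  <Review><ReviewID>R2</ReviewID><BookTitle>B1</BookTitle><Reviewer>BBB</Reviewer></Review>
  <Review><ReviewID>R3</ReviewID><BookTitle>B2</BookTitle><Reviewer>AAA</Reviewer></Review>
  <Review><ReviewID>R4</ReviewID><BookTitle>B3</BookTitle><Reviewer>AAA</Reviewer></Review>
</Reviews>"""

BOOKS = """<Books>
  <Book><Title>B1</Title><Year>2005</Year></Book>
  <Book><Title>B2</Title><Year>2011</Year></Book>
  <Book><Title>B3</Title><Year>2012</Year></Book>
</Books>"""


def reviewers_after(reviews_xml, books_xml, year):
    """Distinct reviewer names, in document order, for books published after `year`."""
    recent_titles = {
        book.findtext("Title")
        for book in ET.fromstring(books_xml).iter("Book")
        if int(book.findtext("Year")) > year
    }
    names = []
    for review in ET.fromstring(reviews_xml).iter("Review"):
        name = review.findtext("Reviewer")
        # Keep a name only the first time it appears.
        if review.findtext("BookTitle") in recent_titles and name not in names:
            names.append(name)
    return names


reviewers = reviewers_after(REVIEWS, BOOKS, 2010)
print(reviewers)
```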

Resources