XPath: I have to do a complex query

I am learning XPath and I am having trouble writing some queries.
This is my DTD:
<!DOCTYPE database [
<!ELEMENT database (Customer*, Stock*, Zone*, Machine*, Seller*)>
<!ELEMENT Customer (social_id)>
<!ELEMENT Machine (name_machine)>
<!ELEMENT Seller (name_seller, cell-phone, email)>
<!ELEMENT Stock (howmany)>
<!ELEMENT Zone (name_zone)>
<!ATTLIST Customer
id_customer ID #REQUIRED
id_zone IDREF #REQUIRED
id_seller IDREF #REQUIRED>
<!ATTLIST Machine id_machine ID #REQUIRED>
<!ATTLIST Seller id_seller ID #REQUIRED>
<!ATTLIST Stock
id_customer IDREF #REQUIRED
id_machine IDREF #REQUIRED
howmany CDATA #REQUIRED>
<!ATTLIST Zone id_zone ID #REQUIRED>
<!ELEMENT name_machine (#PCDATA)>
<!ELEMENT name_seller (#PCDATA)>
<!ELEMENT name_zone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT cell-phone (#PCDATA)>
<!ELEMENT social_id (#PCDATA)>
]>
And the query that I must write is:
1) Get the customer's cell phone for a given zone (one zone in particular).
Thanks for your help. I am learning English too, so I am sorry if I wrote something really wrong.

First, I would create an XML file that obeys your DTD, then find an application that lets you experiment with XPath expressions. Start at the root with // to get all nodes, then work your way down to the nodes you need to select.
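
One way to follow this advice is with a short script. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath, on a made-up sample document that loosely follows the DTD in the question (all ids, names, and phone numbers are hypothetical). It assumes the phone number wanted is the cell-phone of the seller assigned to each customer in the zone, since the DTD declares no phone on Customer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element and attribute names come from the DTD.
SAMPLE = """<database>
  <Customer id_customer="c1" id_zone="z1" id_seller="s1"><social_id>111</social_id></Customer>
  <Customer id_customer="c2" id_zone="z2" id_seller="s2"><social_id>222</social_id></Customer>
  <Zone id_zone="z1"><name_zone>North</name_zone></Zone>
  <Zone id_zone="z2"><name_zone>South</name_zone></Zone>
  <Seller id_seller="s1">
    <name_seller>Ann</name_seller><cell-phone>555-0101</cell-phone><email>ann@example.com</email>
  </Seller>
  <Seller id_seller="s2">
    <name_seller>Bob</name_seller><cell-phone>555-0202</cell-phone><email>bob@example.com</email>
  </Seller>
</database>"""

root = ET.fromstring(SAMPLE)


def phones_for_zone(root, zone_id):
    """Return the cell-phone of the seller of every customer in the given zone."""
    phones = []
    # Step 1: customers whose id_zone attribute matches.
    for customer in root.findall(f"./Customer[@id_zone='{zone_id}']"):
        # Step 2: follow the IDREF to the seller with the same id.
        seller_id = customer.get("id_seller")
        seller = root.find(f"./Seller[@id_seller='{seller_id}']")
        if seller is not None:
            phones.append(seller.findtext("cell-phone"))
    return phones


phones = phones_for_zone(root, "z1")
print(phones)
```

In a full XPath 1.0 engine the same lookup can be written as one expression, for example //Seller[@id_seller = //Customer[@id_zone='z1']/@id_seller]/cell-phone, again under the assumption that the seller's phone is what is needed.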

Related

XQuery: get node with specific child element

I am using XQuery 1.0 and have the following problem.
My input message:
<Body>
<album>
<contents>
<content>correct</content>
<content>hardcore</content>
</contents>
</album>
<album>
<contents>
<content>incorrect</content>
<content>punk</content>
</contents>
</album>
<album>
<contents>
<content>incorrect</content>
<content>rock</content>
</contents>
</album>
</Body>
Desired result:
I would like to find the album node that contains the child element <content>correct</content> and, once that node has been found, pick the element <content>hardcore</content> from it. Note that the order of the album nodes is subject to change, so first() or [1] will not be sufficient.
What I tried:
if (body/album/contents/content[text()='correct']) then ???
If I understand you correctly, you probably don't need XQuery for that.
//contents/content[.="correct"]/following-sibling::content
should be enough.
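
To try the idea without an XPath engine that supports the following-sibling axis, here is a minimal sketch in Python. The standard xml.etree.ElementTree module does not implement that axis, so the sketch emulates it by walking the children of each matching contents element in document order. The function name following_content is made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<Body>
  <album><contents><content>correct</content><content>hardcore</content></contents></album>
  <album><contents><content>incorrect</content><content>punk</content></contents></album>
  <album><contents><content>incorrect</content><content>rock</content></contents></album>
</Body>"""

root = ET.fromstring(DOC)


def following_content(root, marker):
    """Emulate //contents/content[.=marker]/following-sibling::content."""
    results = []
    # [content='...'] keeps only contents elements with a matching child.
    for contents in root.findall(f".//contents[content='{marker}']"):
        seen_marker = False
        for child in contents:
            if child.tag != "content":
                continue
            if seen_marker:
                # Every content element after the marker is a following sibling.
                results.append(child.text)
            elif child.text == marker:
                seen_marker = True
    return results


result = following_content(root, "correct")
print(result)
```

Because the lookup is driven by the marker text and not by position, reordering the album elements does not change the result.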

How to stop at a specific tag?

How do I get all the text from one h1 tag up to the next h1 tag?
I have the class name of the starting h1 tag.
...
<h1 class="something">...</h1>
...
<h1 ...>...</h1>
...
I tried: //*[@class='something']//text()
I want to scrape the text from all children and siblings. I don't need the text of the h1 tags themselves. I don't know how to stop scraping at the next h1 tag.
With a proper example:
<root>
<h1 class="something">.1.</h1>
.2.
<p>.3.</p>
.4.
<h1 class="other">.5.</h1>
</root>
This XPath 1.0 expression:
/root//text()[not(ancestor::h1)][preceding::h1[1][@class='something']]
Meaning: "descendant text nodes of the root element whose first preceding h1 element has a @class attribute equal to 'something' and which have no h1 ancestor"
And it selects
.2.
.3.
.4.
Test in http://www.xpathtester.com/xpath/ecd4f379b13558572ffd62d0db3a3f98
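
The same selection can also be reproduced in code. The following is a rough sketch in Python using only the standard xml.etree.ElementTree module, which has no preceding or ancestor axis, so it walks the tree in document order and remembers the class of the last h1 that was closed. The helper name text_after_heading is hypothetical, and whitespace-only text is dropped for readability.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<h1 class="something">.1.</h1>
.2.
<p>.3.</p>
.4.
<h1 class="other">.5.</h1>
</root>"""

root = ET.fromstring(DOC)


def text_after_heading(root, wanted_class):
    """Collect text that follows an h1 with the wanted class, up to the next h1."""
    collected = []
    last_h1_class = None  # class of the nearest h1 that has already been closed

    def walk(elem, inside_h1):
        nonlocal last_h1_class
        inside = inside_h1 or elem.tag == "h1"
        # elem.text is the text node right after the start tag.
        if elem.text and not inside and last_h1_class == wanted_class:
            collected.append(elem.text)
        for child in elem:
            walk(child, inside)
            if child.tag == "h1":
                last_h1_class = child.get("class")
            # child.tail is the text node right after the child's end tag.
            if child.tail and not inside and last_h1_class == wanted_class:
                collected.append(child.tail)

    walk(root, False)
    return [t.strip() for t in collected if t.strip()]


texts = text_after_heading(root, "something")
print(texts)
```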

How to prevent XPath recursion

Given that I have this (unknown) document structure, how do I write an XPath that selects div1 and div3, i.e. all divs, but not recursively (no divs contained anywhere within other divs)?
I couldn't find any documentation that would point me in this direction. All I could manage is to select ALL divs, i.e. div1, div2 and div3 (with the //div expression), but I want to exclude div2 here as it is a descendant of another div.
(I need a generic solution to select tags non-recursively; the ids here are for explanatory purposes only.)
...some unknown structure with no divs...
<div id="1">
...some unknown structure with no divs...
<div id="2"></div>
...some unknown structure with no divs...
</div>
...some unknown structure with no divs...
<div id="3"></div>
...some unknown structure with no divs...
If you select //div[not(ancestor::div)] you select all div elements that don't have any ancestor also being a div.
If you have access to XPath 3.1 or 3.0 you can also use the outermost function https://www.w3.org/TR/xpath-functions/#func-outermost as it "returns every node within the sequence that does not have another node within the sequence as an ancestor" so "the expression outermost(//div) returns those div elements that are not contained within further div elements".
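
For environments without an XPath engine that supports the ancestor axis or outermost(), the same idea can be sketched by hand. The Python example below uses the standard xml.etree.ElementTree module and simply stops descending once it meets a div. The wrapper elements body, section, and span are hypothetical stand-ins for the "unknown structure" in the question.

```python
import xml.etree.ElementTree as ET

DOC = """<body>
  <section>
    <div id="1">
      <span><div id="2"></div></span>
    </div>
  </section>
  <div id="3"></div>
</body>"""

root = ET.fromstring(DOC)


def outermost(elem, tag):
    """Return descendants named `tag` that are not nested inside another `tag`."""
    found = []
    for child in elem:
        if child.tag == tag:
            # Keep this element and do not look inside it.
            found.append(child)
        else:
            found.extend(outermost(child, tag))
    return found


ids = [div.get("id") for div in outermost(root, "div")]
print(ids)
```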

XPath expression to match across two associated elements

I’ve got the following XML of associated elements:
<doc>
<!-- A block of style elements. -->
<styles>
<style id='style-1' class='bar'>…</style>
<style id='style-2' class='baz'>…</style>
…
</styles>
<!-- Document content. -->
<p style='style-1'>…</p>
<p style='style-2'>…</p>
…
</doc>
For an XSLT template, I'm looking for an XPath expression that matches "an element p whose style is of class bar".
Pure XPath 1.0 expression that will return all elements p whose style is of class bar :
//p[@style = //style[@class='bar']/@id]
Basically, the XPath looks for <p> elements whose style attribute equals the id of a <style class='bar'> element.
Presuming that is an accurate representation of your document's structure, I would advise using this, without double-slashes (//) since double-slashes can be very inefficient:
/doc/p[@style = /doc/styles/style[@class = 'bar']/@id]
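
The comparison in that predicate works like a join: collect the ids of the matching style elements, then keep the p elements that refer to one of them. Here is a minimal sketch of that two-step reading in Python with the standard xml.etree.ElementTree module, which cannot compare an attribute against a node set in one expression. The text content of the sample elements is made up.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <styles>
    <style id="style-1" class="bar">a</style>
    <style id="style-2" class="baz">b</style>
  </styles>
  <p style="style-1">first</p>
  <p style="style-2">second</p>
  <p style="style-1">third</p>
</doc>"""

root = ET.fromstring(DOC)


def paragraphs_with_class(root, style_class):
    """Return p elements whose style attribute names a style of the given class."""
    # Step 1: ids of all style elements with the wanted class.
    wanted_ids = {
        style.get("id")
        for style in root.findall(f"./styles/style[@class='{style_class}']")
    }
    # Step 2: p elements that reference one of those ids.
    return [p for p in root.findall("./p") if p.get("style") in wanted_ids]


matched = [p.text for p in paragraphs_with_class(root, "bar")]
print(matched)
```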

how to match no following sibling

Here's my xml,
<w:tc>
<w:p>
<w:pPr></w:pPr>
<w:r></w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:pPr></w:pPr>
</w:p>
</w:tc>
I want to match the w:p that is inside a w:tc and whose w:pPr has no following sibling w:r. Precisely, I want the second w:tc. This is what I have tried:
<xsl:template match="w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::w:r)]">
I need an XPath for a w:pPr that has no following sibling.
The problem arises when w:pPr is followed by w:hyperlink, so w:hyperlink has to be ruled out as well.
If you want to match a w:pPr that has no following sibling elements at all (regardless of name), then just use a match pattern of
w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::*)]
or equivalently (and slightly shorter)
w:tc//w:p//w:pPr[not(following-sibling::*)]
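
Outside XSLT, the "no following sibling" test amounts to checking that w:pPr is the last child element of its parent. The Python sketch below shows this with the standard xml.etree.ElementTree module, which lacks the following-sibling axis. It assumes that w:pPr is a direct child of w:p, that the w prefix is bound to the WordprocessingML namespace, and that the cells sit under a w:tr wrapper added here only to make the sample well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace binding for the w prefix (WordprocessingML main namespace).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOC = f"""<w:tr xmlns:w="{W}">
  <w:tc><w:p><w:pPr/><w:r/></w:p></w:tc>
  <w:tc><w:p><w:pPr/></w:p></w:tc>
</w:tr>"""

root = ET.fromstring(DOC)


def q(name):
    """Build the {namespace}local form that ElementTree uses for tags."""
    return f"{{{W}}}{name}"


def last_child_is_ppr(paragraph):
    """True if the paragraph's last child element is w:pPr."""
    children = list(paragraph)
    return bool(children) and children[-1].tag == q("pPr")


# Zero-based positions of the w:tc cells containing such a paragraph.
cells = list(root.iter(q("tc")))
matches = [
    index
    for index, tc in enumerate(cells)
    if any(last_child_is_ppr(p) for p in tc.iter(q("p")))
]
print(matches)
```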
Using XPath is simple and straightforward: you only have to filter elements. Your filtering can be based on the content of the element (using [] with a path inside the brackets). You can then work with the filtered elements just as you would with the XML tree (filter again or select the final elements).
In your case, first you have to choose the correct tc element (filter the element as you need):
Based on the count of elements: //tc[count(./p/*) = 1], or
Based on a non-existing r element: //tc[not(./p/r)], or
Based on non-existing r and hyperlink elements: //tc[not(./p/r) and not(./p/hyperlink)]
Based on an existing pPr and a non-existing r (this is not necessary because the pPr is filtered in the second step): //tc[./p/pPr and not(./p/r)]
It returns the following XML.
<tc>
<p>
<pPr>pPr</pPr>
</p>
</tc>
Then simply say what you want from the new XML:
Do you want the pPr element? Use: /p/pPr
All together:
//tc[count(./p/*) = 1]/p/pPr
or
//tc[not(./p/r)]/p/pPr
Note: // means find the element anywhere in the document.
Update 1: Hyperlink condition added.

Resources