Do I need to specify namespaces in XPath? - xpath

I am reading the docs, and it seems that namespaces are needed mostly for XSD schemas and for generating other formats from XML. But I can't tell whether I need to use them in XPath. Nothing stops me from specifying a path to an element without a namespace.

The path without a namespace is a path to elements in the empty namespace. Nothing can stop you specifying a path without namespaces, but such a path only matches elements without namespaces.
For example, /root/a/text() returns 1, but /root/ns:a/text() returns 2:
<root xmlns:ns="some:namespace">
<a>1</a>
<ns:a>2</ns:a>
</root>
Both of the texts can be selected by /root/*[local-name()='a']/text().
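The three cases above can be reproduced with a short sketch. It assumes Python's standard-library ElementTree, which implements only a subset of XPath (no text() or local-name()), so the element text is read through .text and the "any namespace" case uses the {*} wildcard available in Python 3.8 and later. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same document as above, written on one line.
doc = ET.fromstring(
    '<root xmlns:ns="some:namespace"><a>1</a><ns:a>2</ns:a></root>'
)

# Prefix-to-URI map for the query; only the URI has to match the document,
# the prefix itself could be any name.
ns = {"ns": "some:namespace"}

# Unprefixed name: matches only the <a> that is in no namespace.
no_ns = [e.text for e in doc.findall("a")]

# Prefixed name: matches only the <a> in "some:namespace".
with_ns = [e.text for e in doc.findall("ns:a", ns)]

# {*} wildcard: matches <a> in any namespace or in none (Python 3.8+).
any_ns = [e.text for e in doc.findall("{*}a")]

print(no_ns, with_ns, any_ns)
```

The unprefixed query never sees the namespaced element, which is the behaviour described above.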

Related

How to add a new node without prefix

I'm working with a SOAP API that requires some XML nodes without prefixes. Is it even possible to do with Nokogiri? Simply omitting the prefix from the node name makes Nokogiri use the default prefix "env".
node = Nokogiri::XML::Node.new('WageReportsToIR', envelope)
envelope.xpath('//env:Body').first.add_child(node)
results in
<env:Body>\n <env:WageReportsToIR/>\n </env:Body>
Do I have any other option but to write a regex to remove the prefixes after I'm done editing the XML with Nokogiri?

Parsing XPath using Nokogiri

I am writing some scripts to change some values in config (XML) files. The script will take XPath expressions and replacement values to be replaced in a source document.
If the node is found in the source document, then the value will be replaced, but if the node is not found, I need to create a new element and add required elements with attributes.
For example, in a web.config if appSetting exists, then change its value, if not then create a new one
/configuration/appSettings/add[@key='ClientValidationEnabled']/@value
I'm wondering if it's possible to read the XPath as an expression that lets me walk it and create a new element if needed.
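As a rough illustration of the "update it if present, create it otherwise" behaviour for this one path, here is a minimal sketch. It assumes Python's standard-library ElementTree rather than Nokogiri, and it does not parse arbitrary XPath: the steps configuration/appSettings/add are hard-coded. The helper name set_app_setting and the sample config are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample config that does not yet contain ClientValidationEnabled.
config = ET.fromstring(
    "<configuration><appSettings>"
    '<add key="Other" value="1"/>'
    "</appSettings></configuration>"
)


def set_app_setting(root, key, value):
    """Set appSettings/add[@key=key]/@value, creating missing elements.

    Hypothetical helper: the path steps are fixed, and `key` is assumed
    not to contain a single quote.
    """
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    node = settings.find(f"add[@key='{key}']")
    if node is None:
        # Not found: create the element with the attribute from the predicate.
        node = ET.SubElement(settings, "add", {"key": key})
    node.set("value", value)
    return node


set_app_setting(config, "ClientValidationEnabled", "true")  # created
set_app_setting(config, "Other", "2")                        # updated
print(ET.tostring(config, encoding="unicode"))
```

A general solution would need to split the expression into steps and predicates, which this sketch deliberately avoids.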

XPath 2.0: Retrieving nodes by attribute where value is case Insensitive

I am new to using XPath and I am trying to retrieve a node via its attribute but the problem is that the attribute is case insensitive meaning I won't exactly know how the string is cased in the document.
So for example:
Given the document:
<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com">
<...>
</Machine>
<Machine FQDN="Machine2.MyDomain.Com">
<...>
</Machine>
</Machines>
</Document>
If I want to retrieve machine1 I would use the XPath:
//my:Machines/my:Machine/*[@FQDN='machine1.mydomain.com']
But a similar XPath to get machine2 would fail because the case does not match:
//my:Machines/my:Machine/*[@FQDN='machine2.mydomain.com'] //Fails
I have seen various posts mention using something like (I am not sure how to apply Namespaces to this):
translate(@FQDN, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
But even if I got it to work it would be really cumbersome considering the number of times I would be using it.
Finally I have read that XPath 2.0 supports matches() and lower-case() but being new to XPath I don't understand how to apply them:
For example if I try the following I get an "Invalid Qualified name":
//my:Machines/my:Machine/[matches(@FQDN, '(?i)machine1.mydomain.com')]
//my:Machines/my:Machine/[lower-case(@FQDN, 'machine1.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Thanks
Your example XML and XPath statements don't match.
The sample XML elements are not bound to a namespace. The "my" namespace prefix is declared but not used for those elements, so they are in no namespace.
Your sample XPath applies the predicate filter to the children of Machine rather than to the Machine element that carries the @FQDN attribute.
You could use any of these methods to match the value case-insensitively:
The matches() function, with a flag for case-insensitive matching:
//Machines/Machine[matches(@FQDN,'machine2.mydomain.com','i')]
The upper-case() function, to compare the upper-cased strings:
//Machines/Machine[upper-case(@FQDN)=upper-case('machine2.mydomain.com')]
The lower-case() function, to compare the lower-cased strings:
//Machines/Machine[lower-case(@FQDN)=lower-case('machine2.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Not sure what you meant by the handling of namespaces, but if you wanted to match on those elements regardless of their namespace then you can use the wildcard operator for the namespace:
//*:Machines/*:Machine[matches(@FQDN,'machine2.mydomain.com','i')]
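Where only an XPath 1.0 engine or a limited subset is available, the same lower-case comparison can be done in the host language instead. The sketch below assumes Python's standard-library ElementTree, which does not implement matches() or lower-case(); the helper name find_machines and the trimmed sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample document; the elements are in no namespace,
# so the declared "my" prefix is not needed in the path.
doc = ET.fromstring(
    '<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance"><Machines>'
    '<Machine FQDN="machine1.mydomain.com"/>'
    '<Machine FQDN="Machine2.MyDomain.Com"/>'
    "</Machines></Document>"
)


def find_machines(root, fqdn):
    """Return Machine elements whose FQDN equals fqdn, ignoring case."""
    wanted = fqdn.lower()
    return [
        m for m in root.iterfind("Machines/Machine")
        if m.get("FQDN", "").lower() == wanted
    ]


hits = find_machines(doc, "machine2.mydomain.com")
print([m.get("FQDN") for m in hits])
```

Wrapping the comparison in one helper also avoids repeating a long translate() call at every use.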

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
Each message starts with the following format
<div id="post_message_somenumber">
and I only want to get the first one.
I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this. Does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching divs:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
How to specify the names of elements that are in a default namespace is the most frequently asked question about XPath. If you search for "XPath default namespace" on SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer how to do this in C#.
Read the documentation of your language/xpath-engine how something similar should be done in your specific environment.
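As one hedged illustration of binding a prefix to a default namespace, the sketch below assumes Python's standard-library ElementTree and a made-up XHTML fragment; the real page was not shown, so the default namespace here is an assumption. ElementTree has no starts-with(), so that part of the filter is done in Python.

```python
import xml.etree.ElementTree as ET

# Made-up XHTML fragment with a default namespace on the root element.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="post_message_1">first</div>'
    '<div id="other">skip</div>'
    '<div id="post_message_2">second</div>'
    "</body></html>"
)
root = ET.fromstring(xhtml)

# Bind the arbitrary prefix "x" to the document's default namespace.
ns = {"x": "http://www.w3.org/1999/xhtml"}

# Unprefixed "div" means "div in no namespace", so nothing matches.
unprefixed = root.findall(".//div")

# With the prefix bound, every div is found.
prefixed = root.findall(".//x:div", ns)

# starts-with(@id, "post_message_") expressed in Python, then the first hit.
posts = [d for d in prefixed if d.get("id", "").startswith("post_message_")]
first_post = posts[0] if posts else None

print(len(unprefixed), len(prefixed), first_post.text if first_post is not None else None)
```

The prefix chosen in the query does not have to appear in the document; only the namespace URI has to match.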
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one matches every div on the page whose id starts with expiredUserDetails and whose text contains Details.

Use XPath to parse element name containing a colon

So I'm trying to parse some of the WootAPI with XPath. A problem that I'm running into is that a couple of the elements have colons in their name, such as woot:price or woot:condition. Now, trying to use the XPath //rss/channel/item/woot:price won't grab the contents of the element woot:price, because of the colon, I think. What can I do to get it anyway?
The colons are because the elements have a namespace prefix and are bound to the Woot namespace.
You should read up on XML namespaces and how they affect XPath and XSLT.
If you want to reference the Woot elements in your XPath, you need to do one of the following:
Declare the Woot namespace http://www.woot.com/ and bind it to a prefix, so that the prefix is understood when you use it in your XPath.
Use a more generic XPath expression whose predicate filter matches the element with local-name() and namespace-uri():
//rss/channel/item/*[local-name()='price' and namespace-uri()='http://www.woot.com/']
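The first option, declaring the namespace to the XPath engine, could look like the sketch below. It assumes Python's standard-library ElementTree and a made-up feed fragment: only the namespace URI comes from the answer above, while the item content is invented for illustration. ElementTree also accepts the {uri}name form, which plays a role similar to the namespace-uri() predicate.

```python
import xml.etree.ElementTree as ET

# Made-up feed fragment; only the namespace URI is taken from the answer.
feed = (
    '<rss xmlns:woot="http://www.woot.com/"><channel><item>'
    "<title>Sample item</title><woot:price>$9.99</woot:price>"
    "</item></channel></rss>"
)
rss = ET.fromstring(feed)

# Option 1: declare the namespace and use the prefix in the path.
ns = {"woot": "http://www.woot.com/"}
by_prefix = rss.find("channel/item/woot:price", ns)

# Option 2: skip the prefix and name the namespace URI directly.
by_uri = rss.find("channel/item/{http://www.woot.com/}price")

# Without any namespace information the element is not found.
no_namespace = rss.find("channel/item/price")

print(by_prefix.text, by_uri.text, no_namespace)
```

The colon itself is not the problem; the query fails only when the engine has not been told which namespace the prefix stands for.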
