html 4.0 entities in XPATH queries


I don't know exactly why the xpath expression:
//h3[text()='Foo &rsaquo; Bar']
doesn't match:
<h3>Foo &rsaquo; Bar</h3>
Does that seem right? How do I query for that markup?

XPath does not define any special escape sequences. When XPath is used within XSLT (e.g. in attributes of elements of an XSLT document), the escape sequences are processed by the XML processor that reads the stylesheet. If you use XPath in non-XML context (e.g. from Java or C# or other language) via a library, and your XPath query is a string literal in that language, you won't get any escape processing aside from that which the language itself usually does.
If this is C# or Java, this should work:
String xpath = "//h3[text()='Foo \u203A Bar']";
...
As a side note, it wouldn't work in XSLT either, as XSLT uses XML, which doesn't define a character entity &rsaquo; - it only defines &lt;, &gt;, &quot;, &apos; and &amp;. You'd have to either use the numeric reference &#8250; (equivalently &#x203A;), or define the character entity yourself in the DOCTYPE declaration of the XSLT stylesheet.
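The same point can be sketched in Python, assuming the standard library's xml.etree.ElementTree and its limited XPath subset (which has no text() test, so the predicate uses . instead). The sample document is made up for illustration. The right angle quote is U+203A (decimal 8250): the host language resolves the \u203a escape, while the entity name inside the query is just ordinary characters.

```python
import xml.etree.ElementTree as ET

# The numeric reference &#8250; is resolved by the XML parser to U+203A.
doc = ET.fromstring("<div><h3>Foo &#8250; Bar</h3><h3>Other</h3></div>")

# Python turns \u203a into the real character before the query is parsed.
literal_query = ".//h3[.='Foo \u203a Bar']"
# Here the entity name is never decoded; it stays plain text in the query.
entity_query = ".//h3[.='Foo &rsaquo; Bar']"

literal_hits = doc.findall(literal_query)
entity_hits = doc.findall(entity_query)

print(len(literal_hits), len(entity_hits))
```

Only the query containing the literal character should find the heading.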

From the XPath specification:
XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax
… so unless you are using the query inside (as opposed to "to query") a language that resolves that entity, I wouldn't expect it to work. XSLT with a DTD that declares the entity might be such a case, if that is possible; I'm far from an XSLT expert.
Use a literal character or an escape sequence recognized by whatever language you are using XPath from.

Related

What characters are never used in xpath?

I'm trying to build a DSL which will contain a number of XPaths as parameters. I'm new to XPath, and I need a character which is never used in the XPath syntax so I can delimit n number of XPaths on a single line of a script. My question: what characters are NOT part of the XPath syntax?

The null character.
Seriously. Because an XPath is supposed to support any XML document, it must be capable of matching text nodes that contain any allowed Unicode character. However, XML disallows one character: the null character.
OK, that is not entirely true, but it is the simplest answer. XML 1.1 allows control characters (as character references), with the exception of the Unicode null. Under the XML 1.0 production of Char, however, there are a few other characters you can choose from: surrogate pairs (as characters, not as correctly encoded octets representing a non-BMP character), and anything below 0x20 except line feed, carriage return and tab.
Another good guess is any Private Use character, as it is unlikely to appear in your input documents. This is not guaranteed, however, and you asked for "never".
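As a rough sketch of that idea in Python: the helper below encodes the XML 1.0 Char production, and the delimiter (U+001F, the unit separator) is only a hypothetical choice for illustration. It falls outside XML 1.0's Char range, though XML 1.1 would allow it as a character reference.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Hypothetical delimiter: a control character that XML 1.0 text cannot contain.
DELIMITER = "\x1f"

xpaths = ["/a/b/c/d[e < 3]", "//h3[.='Foo, Bar; Baz']"]
line = DELIMITER.join(xpaths)   # one line of the DSL script
parts = line.split(DELIMITER)   # recover the individual XPaths

print(is_xml10_char(DELIMITER), parts == xpaths)
```

The drawback is practical rather than formal: such a delimiter is invisible and awkward to type in a hand-written script.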

I'm trying to build a DSL which will contain a number of XPaths as parameters.
Well, many people use XML for DSLs, and this is how you would do it in XML:
<paths>
<path>/a/b/c/d</path>
<path>/w/x/y/z</path>
</paths>
So how do we reconcile this with the fact that "<" can appear in an XPath expression? Answer: if it does appear, we escape it:
<paths>
<path>/a/b/c/d[e &lt; 3]</path>
<path>/w/x/y/z[v &lt; 2]</path>
</paths>
So: don't try to find a character that can't appear in an XPath expression. Use a character that can appear, and escape it if it does.
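A minimal Python sketch of this "escape it" approach, assuming the standard library's xml.etree.ElementTree does the escaping and unescaping. The element names follow the example above; the XPaths are the same illustrative ones.

```python
import xml.etree.ElementTree as ET

xpaths = ["/a/b/c/d[e < 3]", "/w/x/y/z[v < 2]"]

# Build the <paths> container; the serializer escapes "<" as "&lt;".
root = ET.Element("paths")
for xp in xpaths:
    ET.SubElement(root, "path").text = xp
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# Parsing reverses the escaping, so the original expressions come back intact.
parsed = [p.text for p in ET.fromstring(serialized).findall("path")]
print(parsed == xpaths)
```

The DSL author never has to reason about which characters are safe, because the XML library handles both directions.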

XPath - How to get image source from xml

Hello, I have this XML:
<item>
<title> Something for title»</title>
<link>some url</link>
<description><![CDATA[<div class="feed-description"><div class="feed-image"><img src="pictureUrl.jpg" /></div>text for desc</div>]]></description>
<pubDate>Thu, 11 Jun 2015 16:50:16 +0300</pubDate>
</item>
I try to get the img src with the path //description//div[@class='feed-description']//div[@class='feed-image']//img/@src but it doesn't work.
Is there any solution?

A CDATA section escapes its contents. In other words, CDATA prevents its contents from being parsed as markup when the rest of the document is parsed. So the <div>s in there are not seen as XML elements, only as flat text. The <description> element has no element children ... only a single text child. As such, XPath can't select any <div> descendant of <description> because none exists in the parsed XML tree.
What to do?
If your XPath environment supports XPath 3.0, you could use parse-xml() to turn the flat text into a tree, then use XPath to select //div[@class='feed-description']//div[@class='feed-image']//img/@src from the resulting tree.
Otherwise, your best workaround may be to use primitive string-processing functions like substring-before(), substring-after(), or matches(). (The latter uses regular expressions and requires XPath 2.0.) Of course, many people will tell you not to use regular expressions to analyze markup like XML and HTML. For good reason: in the general case, it's very difficult to do it right (with regexes or with plain string searches). But for very restricted cases where the input is highly predictable, and in the absence of better tools, it can be the best tool for a less-than-ideal job.
For example, for the data shown in your question, you could use
substring-before(substring-after(//description, 'img src="'), '"')
In this case, the inner call substring-after(//description, 'img src="') returns pictureUrl.jpg" /></div>text for desc</div>, of which the substring before " is pictureUrl.jpg.
This isn't really robust: for example, it'll fail if there's a space between src and =. But if the exact formatting is predictable, you'll be OK.
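If the XPath is driven from a host language, the "parse the flat text a second time" idea can also be done there. Below is a sketch in Python using only the standard library's xml.etree.ElementTree. It assumes the CDATA content happens to be well-formed XML, as it is in the question; real-world feed HTML often is not and would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

item_xml = """<item>
<title>Something for title</title>
<link>some url</link>
<description><![CDATA[<div class="feed-description"><div class="feed-image"><img src="pictureUrl.jpg" /></div>text for desc</div>]]></description>
</item>"""

item = ET.fromstring(item_xml)
description = item.find("description")

# The CDATA content is a single text node: <description> has no element children.
child_count = len(list(description))

# Second pass: parse that text as its own small XML document.
fragment = ET.fromstring(description.text)
img = fragment.find(".//div[@class='feed-image']/img")
src = img.get("src")

print(child_count, src)
```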

Altova XMLspy 2014: Multi-line xpath in XSD 1.1 assertions

In Altova XMLSpy 2014, in an XSD 1.1 document, if I add an assertion, I can insert an XPath 2.0 expression for the "test" attribute of the assertion, but only ONE line is shown. How can I enter a multi-line XPath in an assertion?
Of course, I can enter a multi-line XPath in text view. But I'm using a graphical tool to edit my XSD files easily, so I would like to edit complex XPath expressions graphically (in schema view).
In other components (for example, in annotations) I can press Ctrl+Enter to insert multiple lines. I can't do it in assertions.
Even worse, if I enter a multi-line xpath assertion in text view, and I change to schema view ("Schema Overview" or "Content Model View") and try to edit the xpath, then the multi-line xpath is shown as only one line.
Multi-line xpath in assertions is required for advanced (complex) node checking. For example, the following xpath:
every $symbol in symbols/symbol satisfies
  every $state in states/state satisfies
    some $tran in transition-function/transition satisfies
      $tran/@current-symbol eq $symbol
      and $tran/@current-state eq $state
can be easily understood only with a multi-line format.
XPath 2.0 is close to being a programming language, very useful for checking relations between node values. So, as in a programming language, the expressions can be long and complex, and the multi-line feature is absolutely required.
Am I perhaps missing some setup option to enable it?
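To make clear what the assertion above checks, here is a rough Python rendering of the same nested quantifiers: every symbol/state pair must have some matching transition. The sample document is hypothetical; only the element and attribute names are taken from the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the structure is assumed from the XPath.
automaton = ET.fromstring("""
<automaton>
  <symbols><symbol>0</symbol><symbol>1</symbol></symbols>
  <states><state>q0</state></states>
  <transition-function>
    <transition current-symbol="0" current-state="q0"/>
    <transition current-symbol="1" current-state="q0"/>
  </transition-function>
</automaton>
""")

def has_transition_for_every_pair(root: ET.Element) -> bool:
    """every symbol, every state: some transition matches both."""
    symbols = [s.text for s in root.findall("symbols/symbol")]
    states = [s.text for s in root.findall("states/state")]
    transitions = root.findall("transition-function/transition")
    return all(
        any(
            t.get("current-symbol") == symbol and t.get("current-state") == state
            for t in transitions
        )
        for symbol in symbols
        for state in states
    )

print(has_transition_for_every_pair(automaton))
```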

Using parentheses in strings of text make xpath fail

My question is this: is it possible to write an xpath in which parentheses are interpreted as part of a string?
My selenium script keeps failing as soon as I use parentheses in a contains function.
For example:
//li/div/span[contains(text(),"Komkommer (BONUS)")]

I have the same problem. It has to do with encoding. Your parenthesis string may have different encoding than the comparison base.
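One way to check that suspicion is to compare the code points of the text taken from the page with those of the string in the locator. The sketch below is Python with only the standard library; the page text is a hypothetical example in which a no-break space stands where the locator has an ordinary space.

```python
import unicodedata

def describe_non_alnum(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for every non-alphanumeric character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in text
        if not ch.isalnum()
    ]

expected = "Komkommer (BONUS)"           # string used in the XPath
page_text = "Komkommer\u00a0(BONUS)"     # hypothetical text copied from the page

print(expected == page_text)
print(describe_non_alnum(expected))
print(describe_non_alnum(page_text))
```

If the two listings differ, the parentheses are not the culprit: a look-alike character is.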

Using upper-case and lower-case xpath functions in selenium IDE

I am trying to write an XPath query using the XPath function lower-case or upper-case, but they don't seem to work in Selenium (where I test my XPath before I apply it).
Example that does NOT work:
//*[.=upper-case('some text')]
I have no problem locating the nodes I need in complex path and even using aggregated functions, as long as I don't use the upper and lower case.
Has anyone encountered this before? Does it make sense?
Thanks.

upper-case() and lower-case() are XPath 2.0 functions. Chances are your platform supports XPath 1.0 only.
Try:
translate('some text','abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')
which is the XPath 1.0 way to do it. Unfortunately, this requires knowledge of the alphabet the text uses. For plain English, the above probably works, but if you expect accented characters, make sure you add them to the list.
In most environments you are using XPath out of a host language of some sort, and can use the host language's capabilities to work around this XPath 1.0 limitation by externally providing upper- and lower-case variants of the search string to translate().
For example, in Python:
search = 'Some Text'
lc = search.lower()
uc = search.upper()
xpath = f"//p[contains(translate(., '{lc}', '{uc}'), '{uc}')]"
This would produce the following XPath expression:
//p[contains(translate(., 'some text', 'SOME TEXT'), 'SOME TEXT')]
which searches case-insensitively and works for any search text that does not contain an apostrophe (an apostrophe would end the XPath string literal early).
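The snippet above can be wrapped in a small helper. This is only a sketch: the function name and the two guards are additions for illustration. The guards reject an apostrophe in the search text, which would break the single-quoted XPath literal, and text whose upper- and lower-case forms differ in length (for example "ß" becomes "SS"), which would misalign the two translate() arguments.

```python
def case_insensitive_contains(tag: str, search: str) -> str:
    """Build an XPath 1.0 expression matching `search` regardless of case."""
    if "'" in search:
        raise ValueError("search text must not contain an apostrophe")
    lc, uc = search.lower(), search.upper()
    if len(lc) != len(uc):
        raise ValueError("upper/lower-case forms differ in length")
    # translate() maps each lower-case character to its upper-case partner.
    return f"//{tag}[contains(translate(., '{lc}', '{uc}'), '{uc}')]"

xpath = case_insensitive_contains("p", "Some Text")
print(xpath)
```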

If you are going to need upper case in multiple places in your xslt, you can define variables for the lower case and upper case and then use them in your translate function everywhere. It should make your xslt much cleaner.
Example at XSL/XPATH : No upper-case function in MSXML 4.0 ?
