Difference between text() and string() - xpath

Can someone explain the difference between the text() and string() functions? I often use one in place of the other and it does not seem to make any difference: both appear to return the string value of the XML node.

I. text() isn't a function but a node test.
It is used to select all text-node children of the context node.
So, if the context node is an element named x, then text() selects all text-node children of x.
Other examples:
/a/b/c/text()
selects all text-node children of any c element that is a child of any b element that is a child of the top element a.
II. The string() function
By definition string(exprSelectingASingleNode) returns the string value of the node.
The string value of an element is the concatenation of all of its text-node descendants, in document order.
Therefore, if in the following XML document:
<a>
<b>2</b>
<c>3
<d>4</d>
</c>
5
</a>
string(/a) returns (without the surrounding quotes):
"
2
3
4

5
"
As we see, the string value reflects three white-space-only text-nodes, which we typically fail to notice and account for.
Some XML parsers have the option of stripping-off white-space-only text nodes. If the above document was parsed with the white-space-only text nodes stripped off, then the same function:
string(/a)
now returns:
"23
4
5
"
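The effect of white-space-only text nodes can be checked with a minimal Python sketch. It assumes the document exactly as shown above (no indentation) and uses the standard library's xml.etree.ElementTree, which does not implement string() itself; joining the pieces from itertext() is used here as a stand-in for the XPath string value, and the "stripping" parser option is emulated by filtering.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer, written without indentation.
DOC = "<a>\n<b>2</b>\n<c>3\n<d>4</d>\n</c>\n5\n</a>"

root = ET.fromstring(DOC)

# itertext() yields every descendant text node in document order,
# so joining the pieces plays the role of XPath string(/a).
text_nodes = list(root.itertext())
string_value = "".join(text_nodes)

# Text nodes that contain nothing but white space.
whitespace_only = [t for t in text_nodes if not t.strip()]

# Emulate a parser that strips white-space-only text nodes.
stripped_value = "".join(t for t in text_nodes if t.strip())

print(repr(string_value))
print(len(whitespace_only))
print(repr(stripped_value))
```

Printing with repr() makes the newlines visible, which is usually the quickest way to see where the extra white space comes from.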

Most of the time, if you want the content of an element node X, you can refer to it as ".", if it's the context node, or as "X" if it's a child of the context node. For example:
<xsl:if test="X = 'abcd'">...
or
<xsl:value-of select="."/>
In both cases, because the context demands a string, the string() function is applied automatically. (That's a slight simplification: if you're running schema-aware XSLT 2.0, the rules are a little more complicated.)
Using "string()" here is unnecessary, because it's done automatically; and using text() is a mistake (one that seems to be increasingly common, encouraged by some bad tutorials on the web). Using ./text() or X/text() in this situation gives you all the text node children of the element. Often the element has one text node child whose string value happens to be the same as the string value of the element, but your code fails if someone adds a comment or processing instruction, because the value is then split into multiple text nodes. It also fails if the element is one (say "title") that allows mixed content: string(title) and title/text() are going to give the same answer until you hit an article with the title
<title>On the wetness of H<sub>2</sub>O</title>
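To see the two results side by side for this title, here is a small Python sketch using only xml.etree.ElementTree. ElementTree has no text() or string(), so the helpers child_text_nodes and string_value are hypothetical stand-ins written for this illustration: the first collects the text-node children (like title/text()), the second concatenates all descendant text (like string(title)).

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>On the wetness of H<sub>2</sub>O</title>")


def child_text_nodes(elem):
    """Text-node children of elem, roughly what elem/text() selects."""
    # ElementTree stores the text before the first child in .text and the
    # text following each child element in that child's .tail.
    pieces = [elem.text] + [child.tail for child in elem]
    return [p for p in pieces if p]


def string_value(elem):
    """Concatenation of all descendant text, like string(elem)."""
    return "".join(elem.itertext())


text_children = child_text_nodes(title)
full_value = string_value(title)

print(text_children)  # ['On the wetness of H', 'O']
print(full_value)     # On the wetness of H2O
```

The "2" lives inside the sub element, so it is not a text-node child of title and only shows up in the string value.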

Related

Is it possible in XPATH to find an element by attribute value, not by name?

For example I have an XML element:
<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code">
Suppose I know that somewhere there must be an attribute with otp value, but I don’t know in what attribute it can be, respectively, is it possible to have an XPath expression of type like this:
.//input[(contains(*, "otp")) or (contains(*, "ode"))]
Try it like this and see if it works:
one = '//input/@*[(contains(.,"otp") or contains(.,"ode"))]/..'
print(driver.find_elements_by_xpath(one))
Edit:
The contains() function accepts at most one item as its first argument (in XPath 2.0 and later, passing a longer sequence is an error; XPath 1.0 silently uses only the first node of a node-set). In plain(ish) English, it means you can check only one node at a time to see if it contains the target string.
So, the expression above goes through each attribute of input separately (/@*), checks if the value of that specific attribute contains the target string and, if the target is found, goes up to the parent of that attribute (/..) which, in the case of an attribute, is the element itself (input).
This XPath expression selects all <input> elements that have some attribute, whose string value contains "otp" or "ode". Notice that there is no need to "go up to the parent ..."
//input[@*[contains(., 'otp') or contains(., 'ode')]]
If we know that "otp" or "ode" must be the whole value of the attribute (not just a substring of the value), then this expression is stricter and more efficient to evaluate:
//input[@*[. = 'otp' or . = 'ode']]
In this latter case ("otp" or "ode" are the whole value of the attribute), if we have to compare against many values then an XPath expression of the above form will quickly become too long. There is a way to simplify such long expression and do just a single comparison:
//input[@*[contains('|s1|s2|s3|s4|s5|', concat('|', ., '|'))]]
The above expression selects all input elements in the document, that have at least one attribute whose value is one of the strings "s1", "s2", "s3", "s4" or "s5".
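For readers who want to try the logic without an XPath engine that supports attribute wildcards, the following is a rough Python sketch of the same two checks. It is an emulation, not the XPath itself: the sample form, the second input element, and the helper names are assumptions made for this illustration, and the input elements are self-closed so that the sample is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the first <input> is the one from the question.
DOC = """<form>
  <input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code"/>
  <input id="email" type="text" name="mail" placeholder="E-mail"/>
</form>"""

root = ET.fromstring(DOC)


def inputs_with_attr_containing(root, needles):
    """Inputs having some attribute whose value contains any needle."""
    return [el for el in root.iter("input")
            if any(n in v for v in el.attrib.values() for n in needles)]


def inputs_with_attr_equal_to(root, allowed):
    """Inputs having some attribute whose whole value is in `allowed`.

    Uses the delimiter trick from the last expression: wrap the value in
    '|' and look for it inside '|s1|s2|...|'.
    """
    haystack = "|" + "|".join(allowed) + "|"
    return [el for el in root.iter("input")
            if any(f"|{v}|" in haystack for v in el.attrib.values())]


substring_ids = [el.get("id")
                 for el in inputs_with_attr_containing(root, ["otp", "ode"])]
exact_ids = [el.get("id")
             for el in inputs_with_attr_equal_to(root, ["otp", "ode"])]

print(substring_ids)  # ['optSmsCode']
print(exact_ids)      # ['optSmsCode']
```

The delimiter trick only works if the delimiter character cannot occur inside an attribute value; otherwise a value such as "otp|ode" would match by accident.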

Finding a parent element (not direct parent) based on partial match on both parent id and child value

I have the following setup
<Ancestor_element_*****> Ancestor value
L
......
L
<Child_element> Child value *****
I have part of the child value and part of the ancestor node name. I need to get the Ancestor value (I do not know the exact level of nesting). Can this be done via an XPath query?
You are looking for a child element whose text contains "Child value", then you want its ancestor whose name contains "Ancestor_element", and you want its text value:
//Child_element[contains(text(),'Child value')]
/ancestor::*[contains(name(),'Ancestor_element')]/text()
Tested against
<Root>
<Ancestor_element_1>Ancestor value
<Something/>
<Something_in_between>
<Child_element> Child value 1</Child_element>
</Something_in_between>
</Ancestor_element_1>
</Root>
in xsh.
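The same lookup can be sketched in Python with the standard library. xml.etree.ElementTree has no ancestor axis, so this illustration builds a child-to-parent map and walks upward; the function name ancestor_values is hypothetical, and returning only the stripped leading text of each ancestor is a simplification of what /text() selects.

```python
import xml.etree.ElementTree as ET

DOC = """<Root>
<Ancestor_element_1>Ancestor value
<Something/>
<Something_in_between>
<Child_element> Child value 1</Child_element>
</Something_in_between>
</Ancestor_element_1>
</Root>"""


def ancestor_values(root, child_tag, child_part, ancestor_part):
    """Leading text of every ancestor whose name contains ancestor_part,
    for each child_tag element whose text contains child_part."""
    # ElementTree elements do not know their parent, so record it here.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    values = []
    for child in root.iter(child_tag):
        if child_part not in (child.text or ""):
            continue
        node = parent_of.get(child)
        while node is not None:  # walk up, like the ancestor:: axis
            if ancestor_part in node.tag and node.text:
                values.append(node.text.strip())
            node = parent_of.get(node)
    return values


root = ET.fromstring(DOC)
found = ancestor_values(root, "Child_element", "Child value", "Ancestor_element")
print(found)  # ['Ancestor value']
```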

XPath Syntax - XSL 1.0

I'm trying to select all elements using XSL and XPath syntax where there is more than one pickup. I'd like to return the counter_name for each. Can someone please help me with the syntax? In this example there is only one counter_name with pickup locations, but there could be multiple locations where there are pickup counters.
XPATH
<xsl:value-of select="results/unique_locations/partner_location_ids[count(pickup) > 0]/counter_name" /><br/>
XML
<results>
<unique_locations>
<counter_name>Lake Buena Vista, FL</counter_name>
<is_airport>N</is_airport>
<partner_location_ids>
<pickup>
</pickup>
<dropoff>
<container>ZR-ORLS001</container>
<container>ET-ORLR062</container>
<container>HZ-ORLS011</container>
<container>HZ-ORLW015</container>
<container>AV-ORLR004</container>
</dropoff>
</partner_location_ids>
<counter_name>Orlando, FL</counter_name>
<is_airport>N</is_airport>
<partner_location_ids>
<pickup>
<container>ET-ORLC037</container>
<container>AV-ORLC021</container>
<container>ET-ORLC033</container>
<container>ET-ORLC035</container>
<container>HZ-ORLS007</container>
<container>HZ-ORLC004</container>
<container>HZ-ORLC002</container>
<container>ZR-ORLS002</container>
<container>BU-ORLE002</container>
<container>AV-ORLC019</container>
<container>ET-ORLR064</container>
<container>ET-ORLC001</container>
<container>ET-ORLR063</container>
<container>ET-ORLR061</container>
<container>HZ-ORLC011</container>
<container>HZ-ORLC054</container>
<container>HZ-ORLN003</container>
<container>HZ-ORLC007</container>
<container>HZ-ORLC005</container>
<container>ZA-ORLC002</container>
<container>ZA-ORLC003</container>
<container>ZA-ORLC001</container>
<container>AV-ORLC002</container>
<container>AV-ORLC001</container>
<container>BU-ORLS001</container>
<container>ET-ORLC012</container>
<container>AL-ORLR071</container>
<container>HZ-ORLC022</container>
<container>ET-ORLC051</container>
<container>HZ-ORLC025</container>
<container>HZ-ORLN018</container>
<container>HZ-ORLC017</container>
<container>AV-ORLN003</container>
<container>BU-ORLC002</container>
<container>BU-ORLC003</container>
<container>BU-ORLS006</container>
<container>ET-ORLC027</container>
<container>ET-ORLC022</container>
<container>AL-ORLR081</container>
<container>BU-ORLC005</container>
<container>HZ-ORLR029</container>
<container>HZ-ORLC032</container>
<container>HZ-ORLC031</container>
<container>HZ-ORLC030</container>
<container>ET-ORLC021</container>
</pickup>
<dropoff>
<container>HZ-ORLC003</container>
<container>ZA-ORLC004</container>
<container>BU-ORLW002</container>
<container>HZ-ORLC026</container>
<container>ZR-ORLC010</container>
<container>AL-ORLR073</container>
</dropoff>
</partner_location_ids>
</unique_locations>
</results>
Your XML structure is non-ideal, in that it appears to contain elements that are associated with each other by sequence, rather than exclusively by containment within the same element. But XPath can deal with that.
Supposing that the context node for evaluation of the XPath is the parent node of the <results> whose contents you are examining, it appears you want something along these lines:
results/unique_locations/partner_location_ids[pickup/*]/preceding-sibling::counter_name[1]
Note in the first place the predicate: [pickup/*]. The expression within, interpreted in boolean context, evaluates to true if the expression matches any nodes. That's why we need pickup/*, not just pickup, to distinguish between <pickup> elements that contain child nodes and those that don't.
Additionally, observe the use of the preceding-sibling axis instead of the default child axis to step from each matching <partner_location_ids> to its corresponding (I think) <counter_name>. The trailing [1] keeps only the nearest preceding <counter_name>; without it, every earlier <counter_name> sibling would be selected as well.
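The pairing-by-sequence idea can also be sketched in plain Python, which may help when checking the result outside an XSLT processor. This is an emulation under stated assumptions: the XML is a shortened copy of the sample above, the function name counters_with_pickups is hypothetical, and each partner_location_ids is assumed to belong to the nearest counter_name that precedes it.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document (most containers omitted).
DOC = """<results>
  <unique_locations>
    <counter_name>Lake Buena Vista, FL</counter_name>
    <is_airport>N</is_airport>
    <partner_location_ids>
      <pickup>
      </pickup>
      <dropoff><container>ZR-ORLS001</container></dropoff>
    </partner_location_ids>
    <counter_name>Orlando, FL</counter_name>
    <is_airport>N</is_airport>
    <partner_location_ids>
      <pickup><container>ET-ORLC037</container></pickup>
      <dropoff><container>HZ-ORLC003</container></dropoff>
    </partner_location_ids>
  </unique_locations>
</results>"""


def counters_with_pickups(results):
    """Names of counters whose partner_location_ids has a non-empty pickup."""
    names = []
    for locations in results.findall("unique_locations"):
        current_name = None
        for child in locations:
            if child.tag == "counter_name":
                # Remember the most recent counter_name seen so far.
                current_name = child.text
            elif child.tag == "partner_location_ids":
                # Same test as the predicate [pickup/*]: some <pickup>
                # child has at least one element child.
                has_pickup = any(len(p) > 0 for p in child.findall("pickup"))
                if has_pickup and current_name is not None:
                    names.append(current_name)
    return names


results = ET.fromstring(DOC)
pickup_counters = counters_with_pickups(results)
print(pickup_counters)  # ['Orlando, FL']
```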

Does an XPath query have a LIMIT option like MySQL?

I want to limit the number of results I receive from an XPath query.
For example:-
$info = $xml->xpath("//*[firstname='Sheila'] **LIMIT 0,100**");
You see that LIMIT 0,100.
You should be able to use "//*[firstname='Sheila' and position() <= 100]"
Edit:
Given the following XML:
<root>
<country.php desc="country.php" language="fr|pt|en|in" editable="Yes">
<en/>
<in>
<cityList desc="cityList" language="in" editable="Yes" type="Array" index="No">
<element0>Abu</element0>
<element1>Agartala</element1>
<element2>Agra</element2>
<element3>Ahmedabad</element3>
<element4> Ahmednagar</element4>
<element5>Aizwal</element5>
<element150>abcd</element150>
</cityList>
</in>
</country.php>
</root>
You can use the following XPath to get the first three cities:
//cityList/*[position()<=3]
Results:
Node element0 Abu
Node element1 Agartala
Node element2 Agra
If you want to limit this to nodes that start with element:
//cityList/*[substring(name(), 1, 7) = 'element' and position()<=3]
Note that this latter example works because you're selecting all the child nodes of cityList, so in this case position() works to limit the results as expected. If there was a mix of other node names under the cityList node, you'd get undesirable results.
For example, changing the XML as follows:
<root>
<country.php desc="country.php" language="fr|pt|en|in" editable="Yes">
<en/>
<in>
<cityList desc="cityList" language="in" editable="Yes" type="Array" index="No">
<element0>Abu</element0>
<dog>Agartala</dog>
<cat>Agra</cat>
<element3>Ahmedabad</element3>
<element4> Ahmednagar</element4>
<element5>Aizwal</element5>
<element150>abcd</element150>
</cityList>
</in>
</country.php>
</root>
and using the above XPath expression, we now get
Node element0 Abu
Note that we're losing the second and third results. Both conditions sit in the same predicate, so position() refers to a node's position among all the children of cityList, not among the nodes whose names start with 'element'. The effect is the same as requesting "give me the first three nodes, now out of those give me all the nodes that start with 'element'".
I ran into the same issue myself and had some trouble with Geoff's answer because, as he clearly describes, it limits the number of elements returned before the other parts of the query are applied.
My solution is to add position() <= 10 as an additional predicate after my other conditions have been applied, e.g.:
//ElementsIWant[./ChildElementToFilterOn='ValueToSearchFor'][position() <= 10]/.
Notice that I'm using two separate conditional blocks.
This first keeps only the elements that meet my condition and then takes at most the first 10 of those.
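The difference between one combined predicate and two chained predicates can be illustrated with a short Python sketch. It does not run XPath; it emulates the position logic with enumerate over the children of the modified cityList from the earlier answer, using a limit of 3.

```python
import xml.etree.ElementTree as ET

DOC = """<cityList>
<element0>Abu</element0>
<dog>Agartala</dog>
<cat>Agra</cat>
<element3>Ahmedabad</element3>
<element4> Ahmednagar</element4>
<element5>Aizwal</element5>
<element150>abcd</element150>
</cityList>"""

children = list(ET.fromstring(DOC))

# One predicate, like *[starts-with-element and position() <= 3]:
# position counts ALL children, so <dog> and <cat> use up slots 2 and 3.
single = [c.tag for pos, c in enumerate(children, start=1)
          if c.tag.startswith("element") and pos <= 3]

# Two predicates, like *[starts-with-element][position() <= 3]:
# the second predicate counts positions within the filtered list.
filtered = [c for c in children if c.tag.startswith("element")]
chained = [c.tag for pos, c in enumerate(filtered, start=1) if pos <= 3]

print(single)   # ['element0']
print(chained)  # ['element0', 'element3', 'element4']
```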

How to get H1,H2,H3,... using a single xpath expression

How can I get H1,H2,H3 contents in one single xpath expression?
I know I could do this.
//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text()
and so on.
Use:
/html/body/*[self::h1 or self::h2 or self::h3]/text()
The following expression is incorrect:
//html/body/*[local-name() = "h1"
or local-name() = "h2"
or local-name() = "h3"]/text()
because it may also select text-node children of elements in other namespaces, such as unwanted:h1, different:h2, or someWeirdNamespace:h3.
Another recommendation: Always avoid using // when the structure of the XML document is statically known. Using // most often results in significant inefficiencies because it causes the complete document (sub)tree rooted in the context node to be traversed.
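The namespace pitfall can be reproduced with a small Python sketch. It emulates the two tests rather than running XPath: comparing the full tag corresponds to the self::h1 style check, and stripping the namespace part corresponds to local-name(). The sample document and the namespace URI urn:example:other are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one heading-like element in another namespace.
DOC = """<html xmlns:other="urn:example:other">
<body>
<h1>Main heading</h1>
<other:h2>Not an HTML heading</other:h2>
<h3>Sub heading</h3>
<p>Paragraph</p>
</body>
</html>"""

body = ET.fromstring(DOC).find("body")
WANTED = {"h1", "h2", "h3"}


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


# Like self::h1 or self::h2 or self::h3: the full name must match.
by_full_name = [c.text for c in body if c.tag in WANTED]

# Like local-name() = 'h1' ...: the namespace is ignored.
by_local_name = [c.text for c in body if local_name(c.tag) in WANTED]

print(by_full_name)   # ['Main heading', 'Sub heading']
print(by_local_name)  # ['Main heading', 'Not an HTML heading', 'Sub heading']
```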
