Given the XML structure
<Doc>
<Other />
<Q1 />
<Q2 />
</Doc>
How can I select only the nodes whose names begin with a "Q", e.g. /Doc/Q1 and /Doc/Q2?
It seems like this can be done with starts-with, but I have only found examples that apply starts-with to the value of the node, not to its name.
/Doc/*[starts-with(name(), 'Q')]
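As a quick way to check the idea outside an XPath tool, here is a minimal Python sketch. It assumes the standard library's xml.etree.ElementTree, whose limited XPath subset has no starts-with(), so the prefix test is applied to the tag name in Python rather than inside the XPath. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same structure as the document in the question.
doc = ET.fromstring("<Doc><Other /><Q1 /><Q2 /></Doc>")

# Iterating an element yields its direct children, like /Doc/*.
# The startswith() check plays the role of starts-with(name(), 'Q').
q_nodes = [child for child in doc if child.tag.startswith("Q")]
q_names = [node.tag for node in q_nodes]

print(q_names)
```

With a full XPath 1.0 engine the original expression can be used directly and no Python-side filtering is needed.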
Assume the following XML:
<data>
<node id="1" />
<node id="2" />
<node id="12" />
<node id="16" />
</data>
This XPath expression should be valid:
count(//node)
and should produce the number 4.
I'm new to Robot Framework. Is it possible to use this XPath in Robot Framework?
For example, something like:
${value}= Get something something source=${xml} xpath=count(//node)
The one below works, but I would like the XPath to produce the end value, not a list.
@{nodelist}=    Get Elements    ${xml}    xpath=node
Length Should Be    ${nodelist}    4
Edit
I know that I can count the nodes in a list of nodes. However, I would like to get the final value (integer or string) directly from the XPath. As it stands, I need to write different code depending on whether the XPath result is a node, a list or an attribute, when the XPath could theoretically produce the final value itself.
You can use the Get Element Count keyword. It returns the number of elements matching the locator.
You can do something as simple as this:
${count} =    Get Element Count    name:div_name
Should Be True    ${count} > 2
For more information on keywords, have a look at the keyword documentation page.
When working with XML it is generally best to use the XML library. The example below counts the elements using the XML library keyword Get Element Count.
data.xml
<data>
<node id="1" />
<node id="2" />
<node id="12" />
<node id="16" />
</data>
Testcase.robot
*** Settings ***
Library    XML
Library    OperatingSystem
*** Test Cases ***
TC
    ${xml}    Get File    ./data.xml
    ${count}    Get Element Count    ${xml}    xpath=node
    Should Be Equal As Integers    ${count}    ${4}
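For comparison, the same count can be sketched in plain Python with xml.etree.ElementTree. This is only an illustration, assuming the XML is available as a string; ElementTree's XPath subset cannot evaluate count(), so the length of the list of matches is taken instead. The names DATA and node_count are made up for the example.

```python
import xml.etree.ElementTree as ET

# Same content as data.xml above, inlined so the sketch is self-contained.
DATA = """
<data>
<node id="1" />
<node id="2" />
<node id="12" />
<node id="16" />
</data>
"""

root = ET.fromstring(DATA)

# "node" is relative to the root element, like xpath=node in the test case.
node_count = len(root.findall("node"))

print(node_count)
```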
Sample xml:
<Root>
<Customers>
<Customer>
<CompanyName>Great Lakes Food Market</CompanyName>
<ContactName>Howard Snyder</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(503) 555-7555</Phone>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
<Region>OR</Region>
<PostalCode>97403</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
</Customers>
</Root>
In the above XML, when I use "Customer" as the root node with the XPath query "/Root/Customers/Customer", I am unable to print the child nodes of "FullAddress". When I use "FullAddress" as the root node with the XPath query "/Root/Customers/Customer/FullAddress", I am unable to print all the fields.
Please help me find a way to print all the XML elements, including the nested ones, in a single report.
The correct XPath query is
<queryString language="XPath">
<![CDATA[/Root/Customers/Customer]]>
</queryString>
This query covers both sets of nodes. To access the values inside the FullAddress node, also use a relative XPath in the fieldDescription when you define each field; for example, Address is accessed through FullAddress/Address.
Example
If the field declaration of CompanyName is
<field name="CompanyName" class="java.lang.String">
<fieldDescription><![CDATA[CompanyName]]></fieldDescription>
</field>
then the field declaration of, for example, City is
<field name="City" class="java.lang.String">
<fieldDescription><![CDATA[FullAddress/City]]></fieldDescription>
</field>
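The way the field paths are resolved relative to each Customer record can be illustrated with a small Python sketch. This is not how the report engine is implemented; it only assumes that each fieldDescription is a path evaluated from the node selected by the query. The field_paths mapping and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Root>
<Customers>
<Customer>
<CompanyName>Great Lakes Food Market</CompanyName>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
</FullAddress>
</Customer>
</Customers>
</Root>
"""

root = ET.fromstring(SAMPLE)

# Hypothetical mapping: field name -> path relative to each Customer node.
field_paths = {
    "CompanyName": "CompanyName",
    "Address": "FullAddress/Address",
    "City": "FullAddress/City",
}

# "Customers/Customer" is relative to <Root>, like /Root/Customers/Customer.
records = [
    {name: customer.findtext(path) for name, path in field_paths.items()}
    for customer in root.findall("Customers/Customer")
]

print(records)
```

Because every path starts at the Customer node, top-level and nested values end up in the same record.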
Tl;dr: How can I get Solr 4 to ignore diacritics when sorting facet values?
I've added the following four documents to the "collection1" Solr core in the default Solr example:
<doc>
<field name="id">1</field>
<field name="cat">manuka</field>
<field name="cat">mystery</field>
</doc>
<doc>
<field name="id">2</field>
<field name="cat">mānuka</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">3</field>
<field name="cat">management</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">4</field>
<field name="cat">abc</field>
<field name="cat">stuff</field>
</doc>
The "cat" field is defined as:
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
and the "string" type is defined as:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
When I do a facet query on the "cat" field, sorted by value (http://localhost:8983/solr/collection1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=cat&facet.sort=index), I get:
....
"facet_fields":{
"cat":[
"abc",1,
"management",1,
"manuka",1,
"mystery",1,
"mānuka",1,
"stuff",3]},
....
Note that mānuka comes after mystery. I'd like to have mānuka come after manuka and before stuff, that is, I'd like the sort to ignore diacritics including the macron.
If this was a non-facet search, it looks like I could achieve what I want by setting up Collation for a separate copy field and sort by that (I can't set up collation for the field itself because the stored data will be a binary representation of the collation key). However, it looks like this approach isn't possible for facet queries since they can only be sorted by index or count.
Am I overlooking something? Is there some trick to get this working in an environment where I do need to display the value of the "cat" field?
The question is about customizing the index-order of a facet.
Your suggestion is to use collation. You can do this, and the order of your facets will be correct. The problem is that neither CollationField nor ICUCollationField overrides the indexedToReadable method.
The two classes cannot override indexedToReadable because, in general, the mapping from word to term is not invertible. For your case, though, you could possibly implement a subclass of ICUCollationField that overrides indexedToReadable in a sensible way.
Your starting point could be TestICUCollationField with
<fieldType name="sort_fr_t" class="solr.ICUCollationField" locale="fr" strength="primary"/>
...
<field name="sort_fr" type="sort_fr_t" indexed="true" stored="true" docValues="true" multiValued="true"/>
As you will see, in this case the names of the facet values are quite unreadable.
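To make the desired ordering concrete, here is a rough Python sketch that re-sorts the facet values from the question on the client side while ignoring diacritics. It does not change how Solr orders facets and is not a substitute for collation; it only assumes that the readable values have already been fetched. The helper fold_diacritics is a made-up name for this example.

```python
import unicodedata


def fold_diacritics(text):
    """Return text with combining marks (such as the macron) removed."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Facet values in the index order returned by the query in the question.
facets = ["abc", "management", "manuka", "mystery", "m\u0101nuka", "stuff"]

# Sort by the folded form first; the original value breaks ties so that
# the plain spelling comes before the accented one.
sorted_facets = sorted(facets, key=lambda value: (fold_diacritics(value), value))

print(sorted_facets)
```

Folding by decomposition is much cruder than locale-aware collation, so treat it only as an illustration of the target order.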
Does XMLStarlet let you use a less-than/greater-than operator to filter on an attribute value? For example, consider a document like this:
<xml>
<list>
<node name="a" val="x" />
<node name="b" val="y" />
<node name="c" val="z" />
etc.
</list>
</xml>
Is there a way to select nodes whose value is greater than "x"? This XPath does not seem to work with XMLStarlet 1.5.0:
//node[@val > 'x']
Nor does this:
//node[@value gt 'x']
Comparing characters as if they were numbers (ASCII values or Unicode code points) is unfortunately impossible in XPath 1.0, because the < and > operators convert both operands to numbers and a non-numeric string becomes NaN. Look at this SO question if you are interested in more details.
So if your @val attributes are sorted in the XML, you can achieve this with a simple XPath expression that selects all nodes after an exact match:
//node[@val='x']/following-sibling::node
If not, you would have to use an XSLT stylesheet. Luckily, XMLStarlet can apply XSLT stylesheets. To quote their overview:
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
So you can apply an xsl:stylesheet to achieve the desired result using xsl:sort, which is capable of sorting by characters.
<xsl:template match="/">
<xsl:for-each select="//node"> <!-- all nodes, sorted by their 'val' attribute -->
<xsl:sort select="@val" data-type="text" order="ascending" case-order="upper-first"/>
<xsl:value-of select="@name" /> <!-- or whatever output you desire -->
</xsl:for-each>
</xsl:template>
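If a scripting language is available, the string comparison that XPath 1.0 lacks can also be done in the host language. The following is a minimal Python sketch, not an XMLStarlet feature: it assumes the document from the question (without the "etc." placeholder) and compares the val attributes as plain strings, which in Python means by code point.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<xml>
<list>
<node name="a" val="x" />
<node name="b" val="y" />
<node name="c" val="z" />
</list>
</xml>
"""

root = ET.fromstring(XML_TEXT)

# Keep nodes whose val attribute sorts after "x"; this works even if the
# nodes are not already ordered by val in the document.
greater_than_x = [
    node.get("name")
    for node in root.iter("node")
    if node.get("val", "") > "x"
]

print(greater_than_x)
```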
I have an xpath-expression like this:
element[@attr="a"] | element[@attr="b"] | element[@attr="c"] | … which is an »or« statement. Can I create an expression that guarantees the results appear in the same order as in the query, even if the elements appear in a different order in the document?
For example, a document fragment in this order:
<doc>
<element attr="c" />
<element attr="b" />
<element attr="a" />
.
.
.
</doc>
and a result list ordered like this:
[0] <element attr="a" />
[1] <element attr="b" />
[2] <element attr="c" />
.
.
.
The | operator computes the union of its operands. With XPath 1.0 you simply get a set of nodes whose order is undefined, though most XPath APIs then return the result in document order, or let you say which order you want or whether order matters (see for instance http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#XPathResult).
With XPath 2.0 you get a sequence of nodes in document order. If you want the order of your subexpressions instead, you need to use the comma operator rather than the union operator, i.e. element[@attr="a"] , element[@attr="b"] , element[@attr="c"].
can I create an expression that guarantees the result to appear in the
order as in the query, even if the elements appear in a different
order in the document?
Not with any XPath 1.0 engine -- they return the resulting XmlNodeList in document order.
With XPath 2.0 one can specify that a sequence is to be returned, using the comma , operator, like this:
element[@attr="a"] , element[@attr="b"] , element[@attr="c"]
Finally, if you are limited to an XPath 1.0 implementation, one way of getting the results in the desired order is to evaluate these three XPath expressions separately:
element[@attr="a"]
element[@attr="b"]
element[@attr="c"]
Then you can access the first result first, the second result second, and the third result third.
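The separate-evaluation approach can be sketched in Python as follows. This is only an illustration using the standard library's xml.etree.ElementTree, which supports simple [@attr='value'] predicates; the list named wanted and the other variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Elements appear in the document in the order c, b, a.
doc = ET.fromstring(
    '<doc><element attr="c" /><element attr="b" /><element attr="a" /></doc>'
)

# The order in which results should appear, independent of document order.
wanted = ["a", "b", "c"]

ordered = []
for value in wanted:
    # One query per value; results are appended in query order.
    ordered.extend(doc.findall(f"element[@attr='{value}']"))

ordered_attrs = [element.get("attr") for element in ordered]

print(ordered_attrs)
```

Each query on its own still returns its matches in document order, so only the order between the queries is controlled this way.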