How can I use XPath to select only the dict elements that contain genre?
<dict>
<key>genre</key><string>News</string>
<key>genreId</key><integer>6009</integer>
</dict>
<dict>
<key>fieldID</key><integer>2</integer>
</dict>
Try this:
//dict[key = 'genre']
Before your edit:
You can use the function position()
//dict[ position() = 1 ]
or short version
//dict[ 1 ]
After your edit:
//dict[key = 'genre']
This returns all dict elements that have a key child element whose value is 'genre'.
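As a quick check of the predicate above, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports the [tag='text'] form of predicate. The wrapping root element and the variable names are assumptions added for illustration, since the snippet in the question has no single root.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is hypothetical; the question's snippet has no single root element.
xml_text = """<root>
<dict>
<key>genre</key><string>News</string>
<key>genreId</key><integer>6009</integer>
</dict>
<dict>
<key>fieldID</key><integer>2</integer>
</dict>
</root>"""

root = ET.fromstring(xml_text)

# ElementTree's XPath subset understands [child='text'] predicates.
genre_dicts = root.findall(".//dict[key='genre']")

# Read the first <string> child of each matching dict.
genre_values = [d.findtext("string") for d in genre_dicts]
print(len(genre_dicts), genre_values)
```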
sample_xml='<employees>\
<person id="p1">\
<name value="Alice">ALICE</name>\
</person>\
<person id="p2">\
<name value="Alice">BOB</name>\
</person>\
<person id="p3">\
<name value="Alice"/>\
</person>\
</employees>'
data = [
[f'{sample_xml}']
]
df = spark.createDataFrame(data, ['data'])
df=df.selectExpr(
'xpath(data,"/employees/person/name[@value=\'Alice\']/text()") test'
)
This gives the expected ["ALICE", "BOB"].
Problem:
I want my result to be ["ALICE", "BOB", "NA"],
i.e. for an empty element like the one below
<name value="Alice"/>
I want to return a default NA.
Is it possible to achieve this?
Regards
With XPath itself this is not possible. It can only return the actual values of the matching nodes, or nothing if there is no match.
In order to get NA or any other data that is not actually contained in the XML, you should wrap the basic XPath request with some additional, external code to return the customized output in case of no match.
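A minimal sketch of that wrapping idea, in plain Python with the standard xml.etree.ElementTree rather than Spark. The function name names_with_default and its default parameter are hypothetical; in a Spark job the same logic would presumably have to live in a user-defined function.

```python
import xml.etree.ElementTree as ET

sample_xml = """<employees>
<person id="p1"><name value="Alice">ALICE</name></person>
<person id="p2"><name value="Alice">BOB</name></person>
<person id="p3"><name value="Alice"/></person>
</employees>"""

def names_with_default(xml_text, default="NA"):
    """Return the text of each matching <name>, or `default` when it is empty."""
    root = ET.fromstring(xml_text)
    # Select the elements themselves, not their text nodes, so empty ones are kept.
    names = root.findall("./person/name[@value='Alice']")
    return [(n.text or "").strip() or default for n in names]

result = names_with_default(sample_xml)
print(result)
```

The key point is to select the name elements and substitute the default outside XPath, instead of selecting text() nodes, which simply do not exist for the empty element.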
In XPath 2.0, use /employees/person/name[@value='Alice']/(text()/string(), 'NA')[1].
It can't be done in XPath 1.0. In XPath 1.0 there's no such thing as a sequence of strings; you can only return a sequence of nodes, and you can only return nodes that are actually present in the input document.
I have the following XML
<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<group>
<p1></p1>
</group>
<group>
<p1>value1</p1>
</group>
<group>
<p1></p1>
</group>
</root>
Is it possible to get the last node with a value? In this case, that means the value of the second group/p1.
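This XPath should work as well. The predicate keeps every p1 with non-empty text, and the (...)[last()] wrapper keeps only the last of them:
(//group/p1[string-length(text()) > 0])[last()]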
How about something like /root/group/p1[text() and not(../following-sibling::group/p1/text())]
In other words: get the p1 elements that have text and whose group parents are not followed by group nodes that have non-empty p1 elements.
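The same selection can be sketched outside XPath. The example below uses Python's standard xml.etree.ElementTree, whose XPath subset has no text() tests or following-sibling axis, so the filtering is done in Python instead. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

GROUPS_XML = """<root>
<group><p1></p1></group>
<group><p1>value1</p1></group>
<group><p1></p1></group>
</root>"""

root = ET.fromstring(GROUPS_XML)

# Keep only the p1 elements that actually carry text, in document order.
non_empty = [p1 for p1 in root.findall("./group/p1") if (p1.text or "").strip()]

# The last element of that list is the last p1 with a value (None if there is none).
last_value = non_empty[-1].text if non_empty else None
print(last_value)
```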
Conversely, to select the empty p1 elements, you may use the [not(node())] predicate.
Example: //group/p1[not(node())]
It actually can be simplified as below:
//group/p1[string-length() > 0] => element text is non-empty
//group/p1[string-length() = 6] => element text has length 6
I'm working on a Query in XPath and somehow I just can't get it to work.
I've got more cars in my "garage" of course, but to solve the problem, the two Nodes will do it:
<garage>
<car>
<data>
<brand name="Mazda" model="MX5"></brand>
<country>Japan</country>
<ctype>Cabriolet</ctype>
<motor fueltype="Super">
<ps>146</ps>
<kw>107</kw>
<umin>5000</umin>
</motor>
<price>22000</price>
</data>
</car>
<car>
<data>
<brand name="Audi" model="RS6"></brand>
<country>Germany</country>
<ctype>Limousine</ctype>
<motor fueltype="Super">
<ps>580</ps>
<kw>426</kw>
<umin>6250</umin>
</motor>
<price>108000</price>
</data>
</car>
</garage>
I want to count all cars that are from Japan AND have at least 100 ps (PS means horsepower in German). In the example above the result should be 1, because only the MX5 matches both conditions. I tried "and", I tried "intersect", and now I'm out of ideas. Could someone help me out, please?
Here you go:
/garage/car[data/country = 'Japan' and data/motor/ps >= 100]
or:
/garage/car[data/country = 'Japan'][data/motor/ps >= 100]
or:
/garage/car[data[country = 'Japan'][motor/ps >= 100]]
The above are all equivalent. To get the count, wrap any of the above with count(...).
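For a host language without count() or numeric comparisons in its XPath subset, the same two conditions can be applied in code. This is only a sketch with Python's standard xml.etree.ElementTree; the trimmed-down XML and the names matching and count are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Reduced version of the garage document: only the fields used by the two conditions.
GARAGE_XML = """<garage>
<car><data>
<brand name="Mazda" model="MX5"></brand>
<country>Japan</country>
<motor fueltype="Super"><ps>146</ps></motor>
</data></car>
<car><data>
<brand name="Audi" model="RS6"></brand>
<country>Germany</country>
<motor fueltype="Super"><ps>580</ps></motor>
</data></car>
</garage>"""

root = ET.fromstring(GARAGE_XML)

# Both conditions must hold for the same car, mirroring the "and" inside one predicate.
matching = [
    car for car in root.findall("./car")
    if car.findtext("data/country") == "Japan"
    and float(car.findtext("data/motor/ps", default="nan")) >= 100
]
count = len(matching)
print(count)
```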
Using ruby 1.9.3 and Nokogiri (latest):
Given (no, I did not come up with this):
<root>
<subelement>
<key>
<var name="id">50</var>
<var name="secondaryid">0</var>
</key>
</subelement>
<subelement>
<key>
<var name="id">50</var>
<var name="secondaryid">1</var>
</key>
</subelement>
</root>
Return the parent element (<key>) which has a var element with name property equal to "id" and a value equal to 50 AND a var element with name property equal to "secondaryid" and a value equal to 0. Do not return the node with id=50 and secondaryid=1.
Obviously it's going to be built off something along the lines of:
@doc.xpath("//var[@name='id' and text()=50]")
but I can't figure out how to add another predicate that will match the name = "secondaryid" element too.
Not tested with Ruby, but this should do the trick.
//key[var[@name='id'] = '50'][var[@name='secondaryid'] = '0']
Another approach:
subelement/key
[var[@name="id" and . = "50"]]
[var[@name="secondaryid" and . = "0"]]
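To make the "two predicates on the same key" idea concrete, here is a small sketch in Python with the standard xml.etree.ElementTree (not Nokogiri). The helper has_var is hypothetical; it plays the role of one var[...] predicate, and both must be true for the same key element.

```python
import xml.etree.ElementTree as ET

DOC_XML = """<root>
<subelement><key>
<var name="id">50</var>
<var name="secondaryid">0</var>
</key></subelement>
<subelement><key>
<var name="id">50</var>
<var name="secondaryid">1</var>
</key></subelement>
</root>"""

def has_var(key, name, value):
    """True if `key` has a <var> child with the given name attribute and text value."""
    return any((v.text or "").strip() == value
               for v in key.findall(f"var[@name='{name}']"))

root = ET.fromstring(DOC_XML)

# Each has_var call corresponds to one predicate; chaining them means "and".
matches = [k for k in root.iter("key")
           if has_var(k, "id", "50") and has_var(k, "secondaryid", "0")]
print(len(matches))
```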
I posted a similar question and I got a very useful reply.
Now the question is a little different, so I post it.
I specify it is an XPath 1 related question.
This is the content of my XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<mainNode>
<subNode>
<h hm="08:45">
<store id="1563">Open</store>
</h>
<h hm="13:00">
<store id="1045">Open</store>
<store id="763">Open</store>
<store id="1047">Open</store>
</h>
<h hm="16:30">
<store id="1045">Open</store>
<store id="763">Open</store>
<store id="1047">Open</store>
</h>
<h hm="20:00">
<store id="1045">Open</store>
<store id="1043">Open</store>
<store id="1052">Open</store>
</h>
<h hm="22:00">
<store id="763">Open</store>
<store id="1052">Open</store>
</h>
</subNode>
</mainNode>
My program gets the current time: if I get 12.40, I must retrieve all the stores id of the next h hm (13.00): this issue has been solved.
After retrieving the data, with a second XPath query, I must get until when, during the current day (of which the XML file is a representation), a single store will be open.
So, imagine the store is the store identified with the id=1045 and now it's 12.40 in the morning. This store will close at 20.00 because it is found in the h hm=13.00 subnode, in the h hm=16.30 subnode and in the h hm=20.00 subnode. So, I must get that 20.00.
Case 2: it's 12.40 and I must know when 763 will close. It will close at 16.30, no matter it is included in the last node (h hm=22.00). So, I must get that 16.30.
Is this possible?
Thanks in advance.
Here is how such an XPath expression can be constructed:
The following XPath expression selects the wanted result
($vOpen[not(count(following-sibling::h[1] | $vOpen)
=
count($vOpen))
][1]/@hm
|
$vOpen[last()]/@hm
)
[1]
where $vOpen
is defined as:
$vge1240[store/@id=$vId]
and $vge1240 is defined as:
/*/*/h[translate(@hm,':','') >= 1240]
and $vId is defined as:
1045
or
763
The above variables may be defined and referenced within an XSLT stylesheet or, if XPath is embedded in another host, then each variable reference must be substituted with the right-hand-side of the variable definition. In this case the complete XPath expression will be:
(/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
[not(count(following-sibling::h[1]
|
/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
)
=
count(/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]))
]
[1]
/@hm
|
/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
[last()]
/@hm
)
[1]
Explanation:
($vOpen[not(count(following-sibling::h[1] | $vOpen)
=
count($vOpen))
][1]/@hm
|
$vOpen[last()]/@hm
)
[1]
means the following:
1. From all entries in the "Open" hours that contain the id (763)
2. Take those whose immediate following sibling does not belong to that set (closed or not containing 763)
3. From those take the first one.
4. Take the first one (in document order) from the node selected in step 3 above and the last element in $vOpen. This will select the last entry in the "Open" hours if all entries in it contain the given id.
Here we use essentially the Kayesian method for intersection of two nodesets $ns1 and $ns2:
$ns1[count(. | $ns2) = count($ns2)]
I'll just repeat the last part of my answer to you in that last question you refer to.
It would be more pragmatic to load the XML into some data structures that are more conducive to your requirements. This secondary question just reinforces the sense of that advice.
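As a rough sketch of that data-structure approach, the example below uses Python's standard xml.etree.ElementTree to load each h slot into a (time, set of store ids) pair and then walks the slots. The function name closing_time is hypothetical, and the sketch assumes the hm values are zero-padded HH:MM strings, so that plain string comparison orders them chronologically.

```python
import xml.etree.ElementTree as ET

SCHEDULE_XML = """<mainNode><subNode>
<h hm="08:45"><store id="1563">Open</store></h>
<h hm="13:00"><store id="1045">Open</store><store id="763">Open</store>
<store id="1047">Open</store></h>
<h hm="16:30"><store id="1045">Open</store><store id="763">Open</store>
<store id="1047">Open</store></h>
<h hm="20:00"><store id="1045">Open</store><store id="1043">Open</store>
<store id="1052">Open</store></h>
<h hm="22:00"><store id="763">Open</store><store id="1052">Open</store></h>
</subNode></mainNode>"""

def closing_time(xml_text, store_id, now):
    """Return the hm of the last slot in the first unbroken run of slots,
    at or after `now`, that contain `store_id` (None if there is no such slot)."""
    root = ET.fromstring(xml_text)
    # One (time, set-of-store-ids) pair per <h>, in document order.
    slots = [(h.get("hm"), {s.get("id") for s in h.findall("store")})
             for h in root.iter("h")]
    closes = None
    for hm, ids in slots:
        if hm < now:
            continue            # slot is already in the past
        if store_id in ids:
            closes = hm         # store is still open in this slot
        elif closes is not None:
            break               # first gap after the run: the store has closed
    return closes

print(closing_time(SCHEDULE_XML, "1045", "12:40"))
print(closing_time(SCHEDULE_XML, "763", "12:40"))
```

Stopping at the first gap is what makes store 763 close at 16:30 even though it appears again in the 22:00 slot.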
I think this works. It finds the last consecutive occurrence (after the start time) of store/@id where @id = $id, which is what I believe you were looking for.
<xsl:variable name="id" select="'1045'"/>
<xsl:variable name="st" select="translate('12:40',':','')"/>
...
((//h[translate(@hm,':','') >= $st])[position() = count(preceding-sibling::*[store/@id=$id])+1 and store/@id=$id])[last()]/@hm
Explanation:
First, select all elements whose start time is >= the provided start time. Then, within those results, select the @hm attribute of the last element whose position equals the number of preceding elements containing the requested store/@id, plus one, and which itself contains that store/@id.
The one limitation in this short version is that it'll fail if the store id does not occur within the first element after the start time. The one below fixes that limitation by starting with the first element after the start time that contains the proper store/@id, but it's a bit messier:
<xsl:variable name="id" select="'1045'"/>
<xsl:variable name="st" select="translate('12:40',':','')"/>
...
((//h[translate(@hm,':','') >= $st and position() >= count(//h[not(preceding-sibling::*[store/@id=$id])])])[position() = count(preceding-sibling::*[store/@id=$id])+1 and store/@id=$id])[last()]/@hm
If the store ID you're looking for is in $store, this will get the last h/@hm which contains that ID and is followed by an h which does not contain it, or the last h/@hm which contains that store ID (for stores which close in the last h/@hm).
//h[not(store/@id=$store)]/preceding-sibling::h[store/@id = $store][1]/@hm | //h[store/@id=$store][last()]/@hm
To test in XSLT with your example XML:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:variable name="store" select="763"/>
<xsl:template match="/">
<closes>
<xsl:value-of select="//h[not(store/@id=$store)]/preceding-sibling::h[store/@id = $store][1]/@hm | //h[store/@id=$store][last()]/@hm"/>
</closes>
</xsl:template>
</xsl:stylesheet>
Prints <closes>16:30</closes>. If you change $store to 1052, it prints <closes>22:00</closes> instead.