I have the following XML:
<mgns1:Champ_supplementaire>
<mgns1:CODE_CS>1</mgns1:CODE_CS>
<mgns1:VALEUR_CS>2</mgns1:VALEUR_CS>
</mgns1:Champ_supplementaire>
<mgns1:Champ_supplementaire>
<mgns1:CODE_CS>2</mgns1:CODE_CS>
<mgns1:VALEUR_CS>M</mgns1:VALEUR_CS>
</mgns1:Champ_supplementaire>
<mgns1:Champ_supplementaire>
<mgns1:CODE_CS>3</mgns1:CODE_CS>
<mgns1:VALEUR_CS>LOC</mgns1:VALEUR_CS>
</mgns1:Champ_supplementaire>
I want to get the mgns1:Champ_supplementaire node that has a child mgns1:CODE_CS whose text is 2. How can I do that?
I tried:
NodeList nodeliste_cs2 = (NodeList) xpath.evaluate( "//mgns1:Champ_supplementaire[//mgns1:CODE_CS=2]//mgns1:VALEUR_CS",doc, XPathConstants.NODESET);
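//node_foo[//node_bar=2]
means: select every node_foo, provided that some node_bar anywhere in the document has the value 2. A path that starts with // inside the predicate restarts at the document root, so the test gives the same answer for every node_foo.
//node_foo[node_bar=2]
means: select every node_foo that has its own child node_bar with the value 2.
So you need: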
"//mgns1:Champ_supplementaire[mgns1:CODE_CS=2]/mgns1:VALEUR_CS"
I have the following setup:
<Ancestor_element_*****> Ancestor value
 └ ......
    └ <Child_element> Child value *****
I have part of the child value and part of the ancestor node name. I need to get the Ancestor value (I do not know the exact level of nesting). Can this be done via an XPath query?
You are looking for a child element whose text contains "Child value", then you want its ancestor whose name contains "Ancestor_element", and you want its text value:
//Child_element[contains(text(),'Child value')]
/ancestor::*[contains(name(),'Ancestor_element')]/text()
Tested against
<Root>
<Ancestor_element_1>Ancestor value
<Something/>
<Something_in_between>
<Child_element> Child value 1</Child_element>
</Something_in_between>
</Ancestor_element_1>
</Root>
in xsh.
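A minimal sketch of the same query run from Python with lxml instead of xsh, against the test document above (assuming lxml is installed). Because text() returns every direct text node of the ancestor, including the whitespace between its child elements, the sketch strips and filters the results.

```python
from lxml import etree

# The sample document from the answer above.
xml = b"""<Root>
<Ancestor_element_1>Ancestor value
<Something/>
<Something_in_between>
<Child_element> Child value 1</Child_element>
</Something_in_between>
</Ancestor_element_1>
</Root>"""
root = etree.fromstring(xml)

# Find the child by partial text, then climb any number of levels with the
# ancestor axis and keep the ancestor whose element name matches partially.
query = ("//Child_element[contains(text(),'Child value')]"
         "/ancestor::*[contains(name(),'Ancestor_element')]/text()")
texts = root.xpath(query)

# Drop the whitespace-only text nodes that sit between the child elements.
values = [t.strip() for t in texts if t.strip()]
print(values)  # ['Ancestor value']
```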
While trying to help another user out with some question, I ran into the following problem myself:
The goal is to find the country of origin of each wine listed on the page. So we start with:
import requests
from lxml import etree
url = "https://www.winepeople.com.au/wines/Dry-Red/_/N-1z13zte"
res = requests.get(url)
content = res.content
# parse the raw HTML bytes with lxml's forgiving HTML parser
tree = etree.fromstring(content, parser=etree.HTMLParser())
# wrap the root element so that getpath() is available
tree_struct = etree.ElementTree(tree)
Next, for reasons I'll get into in a separate question, I'm trying to compare the xpath of two elements with certain attributes. So:
wine = tree.xpath("//div[contains(@class, 'row wine-attributes')]")
country = tree.xpath("//div/text()[contains(., 'Australia')]")
So far, so good. What are we dealing with here?
type(wine),type(country)
>> (list, list)
They are both lists. Let's check the type of the first element in each list:
type(wine[0]),type(country[0])
>> (lxml.etree._Element, lxml.etree._ElementUnicodeResult)
This is where the problem starts because, as mentioned, I need to find the xpath of the first element of each of the wine and country lists. When I run:
tree_struct.getpath(wine[0])
The output is, as expected:
'/html/body/div[13]/div/div/div[2]/div[6]/div[1]/div/div/div[2]/div[2]'
But with the other:
tree_struct.getpath(country[0])
The output is:
TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got lxml.etree._ElementUnicodeResult)
I couldn't find much information about _ElementUnicodeResult, so what is it? And, more importantly, how do I fix the code so that I get an xpath for that node?
You're selecting a text() node instead of an element node, which is why you end up with an lxml.etree._ElementUnicodeResult instead of an lxml.etree._Element.
Try changing your XPath to the following so that it selects the div element instead of its text() child node:
country = tree.xpath("//div[contains(., 'Australia')]")
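_ElementUnicodeResult is lxml's "smart string": a str subclass that XPath returns for text results and that still remembers where it came from. The sketch below uses a small hypothetical HTML snippet in place of the live page (its structure is an assumption) to show two things: getparent() on the text result gives back the element that getpath() needs, and contains(., 'Australia') also matches every enclosing div, because . is the string value of all descendant text. Testing the div's own text() nodes avoids that.

```python
from lxml import etree

# Hypothetical stand-in for the wine page; the real markup is not shown here.
html = """<html><body>
<div class="wrapper">
  <div class="row wine-attributes">
    <div>Country: Australia</div>
  </div>
</div>
</body></html>"""
tree = etree.fromstring(html, parser=etree.HTMLParser())
tree_struct = etree.ElementTree(tree)

# Selecting text() yields smart strings, which getpath() rejects,
# but each one can hand back the element it belongs to.
text_hit = tree.xpath("//div/text()[contains(., 'Australia')]")[0]
print(type(text_hit).__name__)                    # _ElementUnicodeResult
print(tree_struct.getpath(text_hit.getparent()))  # /html/body/div/div/div

# contains(., ...) tests the whole string value (all descendant text),
# so the two enclosing divs match as well.
broad = tree.xpath("//div[contains(., 'Australia')]")
print(len(broad))  # 3

# Testing the div's own text nodes selects only the innermost div.
own = tree.xpath("//div[text()[contains(., 'Australia')]]")
print([tree_struct.getpath(d) for d in own])  # ['/html/body/div/div/div']
```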
I have many elements like this in my test.xml file:
<House name="bla"><Room id="bla" name="black" ></Room></House>
How do I print all Room elements with name="black"? I am using a CSS selector, but the selector only seems to match on the element names House and Room.
I started by trying to print every name, whether it belongs to a House or a Room:
nodes = doc.css("name"), but the output is null, so I am not able to proceed.
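In CSS a bare word such as name is a type selector, so doc.css("name") looks for elements called <name>, and your file has none. To match elements by an attribute key-value pair, use the attribute selector syntax: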
nodes = doc.css("[name='black']")
For future reference, you can also chain attribute selectors:
nodes = doc.css(".my-class[name='black'][foo='bar']")
Or omit the val and match any element where the attribute is present:
nodes = doc.css("[name]")
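The question does not name the library behind doc.css(), so as a cross-check here are the same three attribute tests written as XPath in Python's standard xml.etree.ElementTree, which has no CSS engine of its own. The test.xml content is hypothetical: two House elements wrapped in a root, with a second House named "black" to show why prefixing the type (Room[name='black'] in CSS, .//Room[@name='black'] in XPath) restricts the match to Room elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical test.xml content, wrapped in one root element.
xml = """<Houses>
  <House name="bla"><Room id="bla" name="black"></Room></House>
  <House name="black"><Room id="r2" name="white"></Room></House>
</Houses>"""
root = ET.fromstring(xml)

# CSS [name='black']  ~  XPath .//*[@name='black'] (any element type)
any_black = [e.tag for e in root.findall(".//*[@name='black']")]

# CSS Room[name='black']  ~  XPath .//Room[@name='black'] (Rooms only)
black_rooms = [e.get("id") for e in root.findall(".//Room[@name='black']")]

# CSS [name]  ~  XPath .//*[@name] (attribute present, any value)
with_name = len(root.findall(".//*[@name]"))

print(any_black)    # ['Room', 'House']
print(black_rooms)  # ['bla']
print(with_name)    # 4
```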
I have several nodes (see below). I know how to select specific nodes which have a certain attribute. But in this case I would like to import the "file_url" value of the media objects that belong to the group "narrowImage".
<media_object>
<media_object>
<file_id>5175967</file_id>
<group>wideImage</group>
<file_url>http://www.mysite.com/image1.jpg</file_url>
</media_object>
<media_object>
<file_id>5175968</file_id>
<group>wideImage</group>
<file_url>http://www.mysite.com/image2.jpg</file_url>
</media_object>
<media_object>
<file_id>5175969</file_id>
<group>narrowImage</group>
<file_url>http://www.mysite.com/image3.jpg</file_url>
</media_object>
</media_object>
In the above case I would only need the value "http://www.mysite.com/image3.jpg".
Can any XPath expert point me in the right direction?
Use:
/*/*[group = 'narrowImage']/file_url
This selects any file_url element that is a "grand-child" of the top element in the XML document, and whose parent has a group child-element whose string value is 'narrowImage'.
I think you should be able to use:
//media_object[group='narrowImage']/file_url
This selects every media_object in your file, at any depth, keeps only those that have a group child equal to 'narrowImage', and returns their file_url children.
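Both answers can be checked with Python's standard xml.etree.ElementTree, whose limited XPath subset supports the [child='text'] predicate. This is a sketch: ElementTree paths are evaluated relative to the element they are called on, so the absolute forms are rewritten relative to the root element.

```python
import xml.etree.ElementTree as ET

xml = """<media_object>
<media_object>
<file_id>5175967</file_id>
<group>wideImage</group>
<file_url>http://www.mysite.com/image1.jpg</file_url>
</media_object>
<media_object>
<file_id>5175968</file_id>
<group>wideImage</group>
<file_url>http://www.mysite.com/image2.jpg</file_url>
</media_object>
<media_object>
<file_id>5175969</file_id>
<group>narrowImage</group>
<file_url>http://www.mysite.com/image3.jpg</file_url>
</media_object>
</media_object>"""
root = ET.fromstring(xml)  # root is the outer media_object

# /*/*[group = 'narrowImage']/file_url, written relative to the root element
grandchildren = [e.text for e in root.findall("*[group='narrowImage']/file_url")]

# //media_object[group='narrowImage']/file_url; ElementTree's ".//" skips the
# root itself, which is harmless here because the root has no group child
anywhere = [e.text for e in
            root.findall(".//media_object[group='narrowImage']/file_url")]

print(grandchildren)  # ['http://www.mysite.com/image3.jpg']
print(anywhere)       # ['http://www.mysite.com/image3.jpg']
```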
I'm pretty confused about this one. Given the following XML:
<sch:eventList>
<sch:event>
<sch:eventName>Event One</sch:eventName>
<sch:locationName>Location One</sch:locationName>
</sch:event>
<sch:event>
<sch:eventName>Event Two</sch:eventName>
<sch:locationName>Location Two</sch:locationName>
</sch:event>
</sch:eventList>
When I use JDOM with the following code:
XPath eventNameExpression = XPath.newInstance("//sch:eventName");
XPath eventLocationExpression = XPath.newInstance("//sch:locationName");
XPath eventExpression = XPath.newInstance("//sch:event");
List<Element> elements = eventExpression.selectNodes(requestElement);
for(Element e: elements) {
System.out.println(eventNameExpression.valueOf(e));
System.out.println(eventLocationExpression.valueOf(e));
}
The console shows this:
Event One
Location One
Event One
Location One
What am I missing?
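Don't use '//' here. An expression that starts with '//' always searches from the root node of the document, no matter which element you pass as the context, so every call finds the first sch:eventName in the whole document. Use a relative path such as './sch:eventName', which is evaluated from the current node.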
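A minimal Python/lxml sketch of the same mistake, rather than JDOM itself. It uses the XPath string() function, which returns the string value of the first selected node; the assumption is that this matches what JDOM's valueOf() returned in the question, as the console output suggests. The namespace URI urn:example:sch is hypothetical, since the fragment does not declare the sch prefix.

```python
from lxml import etree

# Hypothetical namespace URI; the question's fragment does not declare sch.
NS = {"sch": "urn:example:sch"}
xml = b"""<sch:eventList xmlns:sch="urn:example:sch">
<sch:event>
<sch:eventName>Event One</sch:eventName>
<sch:locationName>Location One</sch:locationName>
</sch:event>
<sch:event>
<sch:eventName>Event Two</sch:eventName>
<sch:locationName>Location Two</sch:locationName>
</sch:event>
</sch:eventList>"""
root = etree.fromstring(xml)

absolute, relative = [], []
for e in root.xpath("//sch:event", namespaces=NS):
    # '//' starts at the document root, so e is ignored as a starting point
    # and the first eventName in the whole document wins every time.
    absolute.append(e.xpath("string(//sch:eventName)", namespaces=NS))
    # './' starts from e, so each event reports its own name.
    relative.append(e.xpath("string(./sch:eventName)", namespaces=NS))

print(absolute)  # ['Event One', 'Event One']
print(relative)  # ['Event One', 'Event Two']
```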