LINQ to XML question - linq

My requirement here is to retrieve the node that matches the hostname (e.g. machine1), but I always get back no results. What is the problem?
Thanks for the help in advance!!!
XDocument configXML = XDocument.Load("the below xml");
var q = from s in configXML.Descendants("lcsetting")
where ((string)s.Element("host") == hostName)
select s;
The actual xml:
<lcsettings>
<lcsetting env="prod">
<hosts usagelogpath="">
<host>machine1</host>
<host>machine2</host>
<host>machine3</host>
</hosts>
</lcsetting>
<lcsetting env="qa">
<hosts usagelogpath="">
<host>machine4</host>
<host>machine5</host>
<host>machine6</host>
</hosts>
</lcsetting>
<lcsetting env="test">
<hosts usagelogpath="">
<host>machine7</host>
<host>machine8</host>
<host>machine9</host>
</hosts>
</lcsetting>
</lcsettings>

You're looking for a host element directly under an lcsetting - that doesn't occur because there's always a hosts element between the two in the hierarchy. You're also using Element instead of Elements, which means only the first element with the right name will be returned.
You could use Descendants again instead of Element... but you'll need to change the condition. Something like:
var q = from s in configXML.Descendants("lcsetting")
where s.Descendants("host").Any(host => host.Value == hostName)
select s;
Alternatively, you could make your query find host elements and then take the grandparent element in each case:
var q = from host in configXML.Descendants("host")
where host.Value == hostName
select host.Parent.Parent;
(This assumes a host element will only occur once per lcsetting; if that's not the case, you can add a call to Distinct.)
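For readers more at home in Python, the same distinction can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions, not the LINQ code: `find` stands in for `Element` (direct children only), `iter` stands in for `Descendants`, the XML is a shortened copy of the question's document, and the function name `find_settings_for_host` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question.
CONFIG = """
<lcsettings>
  <lcsetting env="prod">
    <hosts usagelogpath="">
      <host>machine1</host>
      <host>machine2</host>
    </hosts>
  </lcsetting>
  <lcsetting env="qa">
    <hosts usagelogpath="">
      <host>machine4</host>
    </hosts>
  </lcsetting>
</lcsettings>
"""


def find_settings_for_host(root, host_name):
    """Return every lcsetting that has a host descendant whose text equals host_name."""
    return [
        setting
        for setting in root.iter("lcsetting")
        if any(host.text == host_name for host in setting.iter("host"))
    ]


root = ET.fromstring(CONFIG)
first_setting = next(root.iter("lcsetting"))

# Direct children only: host sits under hosts, so nothing is found here.
direct_child = first_setting.find("host")

# Searching all descendants finds the setting that lists the machine.
matches = find_settings_for_host(root, "machine1")
```

Here `direct_child` is `None` for the same reason the original query returns nothing, while `matches` holds the single lcsetting whose env is "prod".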

"host" is not a child of "lcsetting".

You're selecting the lcsetting descendants but then checking for a host element, which is two levels below them. The Element() method only looks at direct children, one level deep. I'd recommend changing this to:
XDocument configXML = XDocument.Load("the below xml");
var q = from s in configXML.Descendants("lcsetting")
where s.Descendants("host").SingleOrDefault(e => e.Value == hostName) != null
select s;

That is because you have a <hosts> tag immediately below your lcsetting, and it contains your <host> tags. <host> is not an immediate child of <lcsetting>.
This query will work:
var q = from s in configXML.Descendants("lcsetting").SelectMany(lcSetting => lcSetting.Descendants("host"))
where s.Name == "host" && s.Value == hostName
select s;

Related

What is the syntax for a sorted XPath query on atomic values?

I am trying to execute the following select in my Java code:
// initialized earlier
private XdmNode xmlDocument;
private XPathCompiler xPath;
// ... the code that's a problem:
XPathExecutable exec = xPath.compile("sort(distinct-values(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')]/name), (), function($node) { $node/displaySeq })");
XPathSelector selector = exec.load();
selector.setContextItem(xmlDocument);
selector.evaluate();
The call to evaluate() throws the exception:
net.sf.saxon.trans.XPathException: The required item type of the first operand of '/' is node(); the supplied value u"Model Name" is an atomic value
What is wrong with the query? I know distinct-values() returns atomic values, but why is there a problem sorting those? Is it that $node makes no sense when sorting atomic values? But the select (without the sort) is:
/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')]/name
And at /root/data/hasTransaction/element/hasAssets/element/associatedAttributes there is:
<associatedAttributes>
<element>
<uom>N/A</uom>
<name>Model Name</name>
<dataParamName>modelName</dataParamName>
<seq />
<value>A17-230P1A</value>
<displayLevel>0</displayLevel>
<displaySeq>5</displaySeq>
<displayLevelTitle />
<displayName>Model Name</displayName>
<dataGroup />
</element>
So it seems logical (to me), since the path ends in "...[...]/name", that it can sort on displaySeq.
The "/" operator requires a node (not an atomic value) on the LHS (as the error message says).
You're trying, I think, to eliminate element elements as duplicates if they have the same name child, and then to sort them by the value of displaySeq. Unfortunately distinct-values() only retains the (atomic) values; it loses knowledge of the nodes from which these values were derived. (And in principle at least, two elements with the same name can have different values for displaySeq, so it's not clear which one you want to retain.)
Ideally you would use XSLT or XQuery grouping for this, rather than distinct-values. If you have to use XPath, you could consider creating a map to do the deduplication:
let $index := map:merge(/root/data/hasTransaction/element/hasAssets
    /element/associatedAttributes/element[
        (value != '') and (dataParamName != 'modelNomenclature')
    ] ! map{name : .})
return sort(map:for-each($index, function($k, $v){$v}),
    (), function($node) { $node/displaySeq })
Not tested.
Perhaps
distinct-values(sort(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')], (), function($node) { $node/displaySeq })!name)
gives you the right result: it sorts the element elements by displaySeq, then selects the name child elements and computes the distinct ones.
You could also write it as
(/root/data/hasTransaction/element/hasAssets/element/associatedAttributes/element[(value != '') and (dataParamName != 'modelNomenclature')] => sort((), function($node) { $node/displaySeq })) ! name => distinct-values()
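The underlying point, that deduplicating the atomic values throws away the nodes needed for the sort key, may be easier to see outside XPath. Below is a rough Python sketch using xml.etree.ElementTree rather than Saxon. The sample data and the function name `distinct_names_by_display_seq` are invented for illustration, and the sketch assumes that displaySeq always holds an integer and that the first element seen for a given name is the one to keep.

```python
import xml.etree.ElementTree as ET

# Invented sample data shaped like the associatedAttributes fragment.
SAMPLE = """
<associatedAttributes>
  <element><name>Serial</name><dataParamName>serial</dataParamName><value>X1</value><displaySeq>7</displaySeq></element>
  <element><name>Model Name</name><dataParamName>modelName</dataParamName><value>A17-230P1A</value><displaySeq>5</displaySeq></element>
  <element><name>Model Name</name><dataParamName>modelName</dataParamName><value>A17-230P1A</value><displaySeq>5</displaySeq></element>
  <element><name>Empty</name><dataParamName>empty</dataParamName><value/><displaySeq>1</displaySeq></element>
  <element><name>Nomenclature</name><dataParamName>modelNomenclature</dataParamName><value>N</value><displaySeq>2</displaySeq></element>
</associatedAttributes>
"""


def distinct_names_by_display_seq(attributes):
    """Deduplicate by name while keeping the element, then sort by displaySeq."""
    first_by_name = {}
    for el in attributes.findall("element"):
        # Same filter as the XPath predicate: non-empty value, not modelNomenclature.
        if not el.findtext("value"):
            continue
        if el.findtext("dataParamName") == "modelNomenclature":
            continue
        # Keep the whole element so displaySeq is still reachable afterwards.
        first_by_name.setdefault(el.findtext("name"), el)
    ordered = sorted(first_by_name.values(), key=lambda el: int(el.findtext("displaySeq")))
    return [el.findtext("name") for el in ordered]


names = distinct_names_by_display_seq(ET.fromstring(SAMPLE))
```

The dictionary plays the role of the map in the XPath version: it removes duplicates by name but keeps the element, so displaySeq is still available when sorting.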

Problems with '._ElementUnicodeResult'

While trying to help another user out with some question, I ran into the following problem myself:
The object is to find the country of origin of a list of wines on the page. So we start with:
import requests
from lxml import etree
url = "https://www.winepeople.com.au/wines/Dry-Red/_/N-1z13zte"
res = requests.get(url)
content = res.content
tree = etree.fromstring(content, parser=etree.HTMLParser())
tree_struct = etree.ElementTree(tree)
Next, for reasons I'll get into in a separate question, I'm trying to compare the xpath of two elements with certain attributes. So:
wine = tree.xpath("//div[contains(@class, 'row wine-attributes')]")
country = tree.xpath("//div/text()[contains(., 'Australia')]")
So far, so good. What are we dealing with here?
type(wine),type(country)
>> (list, list)
They are both lists. Let's check the type of the first element in each list:
type(wine[0]),type(country[0])
>> (lxml.etree._Element, lxml.etree._ElementUnicodeResult)
And this is where the problem starts. Because, as mentioned, I need to find the xpath of the first elements of the wine and country lists. And when I run:
tree_struct.getpath(wine[0])
The output is, as expected:
'/html/body/div[13]/div/div/div[2]/div[6]/div[1]/div/div/div[2]/div[2]'
But with the other:
tree_struct.getpath(country[0])
The output is:
TypeError: Argument 'element' has incorrect type (expected
lxml.etree._Element, got lxml.etree._ElementUnicodeResult)
I couldn't find much information about _ElementUnicodeResult, so what is it? And, more importantly, how do I fix the code so that I get an xpath for that node?
You're selecting a text() node instead of an element node. This is why you end up with a lxml.etree._ElementUnicodeResult type instead of a lxml.etree._Element type.
Try changing your xpath to the following in order to select the div element instead of the text() child node of div...
country = tree.xpath("//div[contains(., 'Australia')]")
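To make the element-versus-text distinction concrete without lxml, here is a small sketch using the standard library's xml.etree.ElementTree, where text is an attribute of an element rather than a separate node. The markup, the helper name `path_to`, and the path format are assumptions made for this example; `path_to` is only a rough stand-in for lxml's getpath and is not its implementation.

```python
import xml.etree.ElementTree as ET

# Invented markup, loosely shaped like the page in the question.
PAGE = """
<html><body>
  <div class="row wine-attributes">
    <div>Shiraz</div>
    <div>Australia</div>
  </div>
</body></html>
"""


def path_to(root, target):
    """Build a positional path such as /html/body/div/div[2] for target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # Add a 1-based index only when siblings share the same tag.
        index = f"[{same_tag.index(node) + 1}]" if len(same_tag) > 1 else ""
        steps.append(node.tag + index)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(PAGE)

# Select the elements whose own text mentions Australia, not the text itself.
country = [el for el in root.iter("div") if el.text and "Australia" in el.text]
country_path = path_to(root, country[0])
```

Because `country` holds elements, each one has a position in the tree and a path can be computed for it; a bare string has no such position.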

Node Selection - Two Tags Deep from Current Node

I am using HTML Agility Pack. I have an HTMLNode which has the following InnerHtml:
"Item: <b>Link Text</b>"
From this node, I want to select the "Link Text" from within the "a" tag. I have not been able to do this. I have tried this:
System.Diagnostics.Debug.WriteLine(node.InnerHtml);
//The above line prints "Item: <b>Link Text</b>"
HtmlNode boldTag = node.SelectSingleNode("b");
if (boldTag != null)
{
HtmlNode linkTag = boldTag.SelectSingleNode("a");
//This is always null!
if (linkTag != null)
{
return linkTag.InnerHtml;
}
}
Any help to get the selection correct would be appreciated.
SelectSingleNode expects an XPath expression, so you need:
var b = htmlDoc.DocumentNode.SelectSingleNode("//b");
var a = b.SelectSingleNode("./a");
var text = a.InnerText;
Or, in one line:
var text = htmlDoc.DocumentNode.SelectSingleNode("//b/a").InnerText;
Note that at the beginning of the XPath:
// will look anywhere in DocumentNode
.// will look for a descendant of the current node
/ will look for a child of the DocumentNode
./ will look for a child of the current node
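The difference between the child and descendant prefixes can be tried out quickly in Python. This is a sketch with xml.etree.ElementTree, not HTML Agility Pack; ElementTree implements only a limited XPath subset, so just the two relative forms are shown. The markup is hypothetical and assumes the a element sits inside the b element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <a> element is assumed to be nested inside <b>.
node = ET.fromstring('<p>Item: <b><a href="#">Link Text</a></b></p>')

# "./a" looks only at direct children of node, so it finds nothing.
child_a = node.find("./a")

# ".//a" looks at any depth below node.
descendant_a = node.find(".//a")

# "./b/a" walks child by child: first b, then its child a.
via_b = node.find("./b/a")
```

Both `descendant_a` and `via_b` refer to the same a element, whose text is "Link Text", while `child_a` is `None`.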

Syntax error about XPath in Nokogiri, when combining namespace and node()

I'm learning XPath with Nokogiri. The XPath is like this:
xml_doc = Nokogiri::XML(open("test.xml"))
result = xml_doc.xpath("//x:foo", 'x' => 'www.example.com')
I could get the results. But when I perform this call:
result = xml_doc.xpath("//x:node()", 'x' => 'www.example.com')
I get an error:
Nokogiri::XML::XPath::SyntaxError: Invalid expression: //x:node()
Am I doing something wrong?
Unlike with elements, you don't need a namespace prefix to match by node(). The following returns all nodes in any namespace just fine:
result = xml_doc.xpath("//node()")
There are several types of nodes in XPath: text nodes, comment nodes, element nodes, and so on. node() is a node test that simply returns true for any node type whatsoever. Compare that to text(), another node test, which returns true only for text nodes. (See "w3.org > XPath > Node Tests".)
In my understanding, the notions of local name and namespace only exist in the context of element nodes, so using a namespace prefix along with the node() test simply doesn't make sense.
If you meant to select all elements in a specific namespace, use * instead of node():
result = xml_doc.xpath("//x:*", 'x' => 'www.example.com')
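The "all elements in one namespace" selection can also be sketched in Python with xml.etree.ElementTree instead of Nokogiri. The document below is invented for the example, and the `{uri}*` wildcard form relies on Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Invented document: two elements in the namespace, one outside it.
DOC = """
<root xmlns:x="www.example.com">
  <x:foo>in namespace</x:foo>
  <bar>no namespace</bar>
  <x:baz>also in namespace</x:baz>
</root>
"""

root = ET.fromstring(DOC)

# "{uri}*" matches elements with any local name in that namespace (Python 3.8+).
in_namespace = root.findall(".//{www.example.com}*")

# ElementTree stores tags as "{uri}local", so strip the namespace part for display.
local_names = [el.tag.split("}", 1)[1] for el in in_namespace]
```

Only foo and baz are selected; bar is skipped because it is not in the namespace, which mirrors what `//x:*` does in the Nokogiri call.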

JDOM-XPath: Can't get the second value in a collection

I'm pretty confused about this one. Given the following xml:
<sch:eventList>
<sch:event>
<sch:eventName>Event One</sch:eventName>
<sch:locationName>Location One</sch:locationName>
</sch:event>
<sch:event>
<sch:eventName>Event Two</sch:eventName>
<sch:locationName>Location Two</sch:locationName>
</sch:event>
</sch:eventList>
With JDOM and the following code:
XPath eventNameExpression = XPath.newInstance("//sch:eventName");
XPath eventLocationExpression = XPath.newInstance("//sch:locationName");
XPath eventExpression = XPath.newInstance("//sch:event");
List<Element> elements = eventExpression.selectNodes(requestElement);
for(Element e: elements) {
System.out.println(eventNameExpression.valueOf(e));
System.out.println(eventLocationExpression.valueOf(e));
}
The console shows this:
Event One
Location One
Event One
Location One
What am I missing?
Don't use '//': it always starts searching at the root node, so every iteration finds the first match in the whole document. Use e.g. './sch:eventName' instead, which is relative to the current node.
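The effect of evaluating a relative path against each event can be sketched in Python with xml.etree.ElementTree. This is not JDOM code, and the namespace URI is a placeholder, since the question's XML does not show the declaration for the sch prefix.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the original XML does not show the real one.
NS = {"sch": "http://example.com/schedule"}

DOC = """
<sch:eventList xmlns:sch="http://example.com/schedule">
  <sch:event>
    <sch:eventName>Event One</sch:eventName>
    <sch:locationName>Location One</sch:locationName>
  </sch:event>
  <sch:event>
    <sch:eventName>Event Two</sch:eventName>
    <sch:locationName>Location Two</sch:locationName>
  </sch:event>
</sch:eventList>
"""

root = ET.fromstring(DOC)
pairs = []
for event in root.findall("./sch:event", NS):
    # Paths starting with "./" are resolved against this event only.
    name = event.findtext("./sch:eventName", namespaces=NS)
    location = event.findtext("./sch:locationName", namespaces=NS)
    pairs.append((name, location))
```

Each iteration now reads the name and location of its own event, so the second pair is Event Two and Location Two rather than a repeat of the first.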

Resources