Selecting attributes with a hyphen in Chrome Developer Tools using XPath

How can I find the nodes that have @is-visible = 'true' using XPath in the document below? Essentially, how can I escape the hyphen?
<root>
<year is-visible = 'true'>November 2020</year>
<year ishidden = 'true'>October 1998</year>
</root>
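A hyphen is a legal character in XML names, so the attribute test should need no escaping: //year[@is-visible='true']. Below is a minimal sketch using Python's standard xml.etree.ElementTree (an assumption, since the question targets the Chrome console, where the same expression string can be passed to $x(...)).

```python
import xml.etree.ElementTree as ET

# The document from the question, unchanged.
XML = """<root>
<year is-visible = 'true'>November 2020</year>
<year ishidden = 'true'>October 1998</year>
</root>"""

root = ET.fromstring(XML)

# The hyphen is an ordinary name character, so it is written as-is.
visible = root.findall(".//year[@is-visible='true']")
texts = [el.text for el in visible]
print(texts)
```

ElementTree only implements a subset of XPath, which is why the path starts with .// relative to the root element.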

Related

XPath to only select the text contained within an element

I am new to xpath so I apologize in advance for how basic this question is.
How do I extract just the text from a specific element? For example, how would I extract just "text"
<h1>text</h1>
I tried the following but it seems to select everything including the tags instead of just the text.
//h1/text()
Thanks for your help
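import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Parse the file into a DOM document
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new File("src/myFile.xml"));
// Evaluate the path and return the element's string value
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String sessionId = (String) xpath.evaluate(
    "/Envelope/Body/LoginProcessResponse/loginResponse/sessionId",
    doc, XPathConstants.STRING);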
Here Envelope is my root element, and I just traversed down to the required element (in my case, sessionId).
Hope it helps.
This answer is rather an XSLT answer than an XPath answer, but many of the concepts are nevertheless applicable.
The XPath expression
//h1/text()
seems to be correct. It does select all text() nodes that are direct children of <h1> elements.
But one problem may be that the XSL default template still copies all the other text() nodes, as described in the W3C specification:
In the absence of a select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes.
So to solve your problem, you have to define an explicit template that
ignores all other text() nodes like this:
<xsl:template match="text()" />
If you add this line to your XSL processing, the result will most likely be more pleasant to you.

Nested xpath: How do I use the result of an XPath expression as value?

I am having the following XML structure:
<xml>
<value>b</value>
<objects>
<object>
<value>a</value>
</object>
<object>
<value>b</value>
</object>
</objects>
</xml>
What I want is to select the second object, based on the value in the xml.
This XPath works:
//xml/objects/object[value = 'b']
This XPath does not return results:
//xml/objects/object[value = //xml/value/text()]
Are nested XPath expressions not supported?
They are, but the search within a predicate is always relative to the context you are currently in.
Currently you start looking for an <xml/> element which is a child of <object/>, and as there is none, it yields an empty result set.
Using ../ or parent::* you can go one axis step up to the parent and select the required value:
//xml/objects/object[value = ../../value]
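For comparison, here is a rough sketch of the same lookup done in two steps with Python's standard xml.etree.ElementTree. This is a swapped-in technique, not the single-expression answer above: ElementTree's XPath subset cannot compare against another path inside a predicate, so the value is read first and then interpolated into the predicate.

```python
import xml.etree.ElementTree as ET

XML = """<xml>
<value>b</value>
<objects>
<object><value>a</value></object>
<object><value>b</value></object>
</objects>
</xml>"""

root = ET.fromstring(XML)  # root is the <xml> element

# Step 1: read the top-level <value>.
wanted = root.findtext("value")

# Step 2: use it as a literal in the predicate.
# Assumes the value contains no single quote; otherwise the quoting breaks.
matches = root.findall(f"objects/object[value='{wanted}']")

# Zero-based position of the match among the <object> children.
index = list(root.find("objects")).index(matches[0])
print(wanted, len(matches), index)
```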

xpath: Picking tag after text

How would one, via xpath, select the strong tag after baz text for example?
<p>
<br>foo<strong>this foo</strong>
<br>bar<strong>this bar</strong>
<br>baz<strong>this baz</strong>
<br>qux<strong>this qux</strong></p>
Obviously the following does not work....
//p[text() = 'baz']/following-sibling::select[1]
Try this
//p/text()[. = 'baz']/following-sibling::strong[1]
Demo here - http://www.xpathtester.com/obj/b67bad4d-4d38-4e2d-a3df-b7e5a2e9f286
This solution relies on there being no whitespace around your text nodes. If you start using indentation or other whitespace characters, you will need to switch to the following:
//p/text()[normalize-space(.) = 'baz']/following-sibling::strong[1]

XPath expression for selecting all text in a given node, and the text of its children

Basically I need to scrape some text that has nested tags.
Something like this:
<div id='theNode'>
This is an <span style="color:red">example</span> <b>bolded</b> text
</div>
And I want an expression that will produce this:
This is an example bolded text
I have been struggling with this for an hour or more with no result.
Any help is appreciated
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
You want to call the XPath string() function on the div element.
string(//div[@id='theNode'])
You can also use the normalize-space function to reduce unwanted whitespace that might appear due to newlines and indenting in the source document. This will remove leading and trailing whitespace and replace sequences of whitespace characters with a single space. When you pass a node-set to normalize-space(), the node-set will first be converted to its string-value. If no arguments are passed to normalize-space(), it will use the context node.
normalize-space(//div[@id='theNode'])
// if theNode was the context node, you could use this instead
normalize-space()
You might want to use a more efficient way of selecting the context node than the example XPath I have been using. For example, the following JavaScript can be run against this page in some browsers.
var el = document.getElementById('question');
var result = document.evaluate('normalize-space()', el, null, XPathResult.STRING_TYPE, null).stringValue;
The whitespace only text node between the span and b elements might be a problem.
Use:
string(//div[@id='theNode'])
When this expression is evaluated, the result is the string value of the first (and hopefully only) div element in the document.
As the string value of an element is defined in the XPath Specification as the concatenation in document order of all of its text-node descendants, this is exactly the wanted string.
Because this can include a number of all-white-space text nodes, you may want to eliminate contiguous leading and trailing white-space and replace any such intermediate white-space by a single space character:
Use:
normalize-space(string(//div[@id='theNode']))
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
"<xsl:copy-of select="string(//div[@id='theNode'])"/>"
===========
"<xsl:copy-of select="normalize-space(string(//div[@id='theNode']))"/>"
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<div id='theNode'> This is an
<span style="color:red">example</span>
<b>bolded</b> text
</div>
the two XPath expressions are evaluated and the results of these evaluations are copied to the output:
" This is an
example
bolded text
"
===========
"This is an example bolded text"
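The same two results can be approximated without an XSLT processor. The sketch below uses Python's standard xml.etree.ElementTree, which has no string() or normalize-space() functions, so it stands in for them with itertext() and str.split(); treat it as an analogue rather than an exact equivalent (str.split() also collapses whitespace characters that XPath's normalize-space() leaves alone).

```python
import xml.etree.ElementTree as ET

# Same input as the XSLT verification above.
XML = """<div id='theNode'> This is an
<span style="color:red">example</span>
<b>bolded</b> text
</div>"""

div = ET.fromstring(XML)

# Analogue of string(//div[@id='theNode']):
# concatenate every descendant text node in document order.
string_value = "".join(div.itertext())

# Analogue of normalize-space(...):
# trim the ends and collapse each whitespace run to one space.
normalized = " ".join(string_value.split())

print(repr(string_value))
print(repr(normalized))
```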
If you are using Scrapy in Python, you can use descendant-or-self::*/text(). Full example:
import scrapy

txt = """<div id='theNode'>
This is an <span style="color:red">example</span> <b>bolded</b> text
</div>"""
selector = scrapy.Selector(text=txt, type="html") # Create HTML doc from HTML text
all_txt = selector.xpath('//div/descendant-or-self::*/text()').getall()
final_txt = ''.join(all_txt).strip() # Join the text nodes in document order
print(final_txt) # 'This is an example bolded text'
How about this :
/div/text()[1] | /div/span/text() | /div/b/text() | /div/text()[2]
Hmm, I am not sure about the last part, though. You might have to play with that.
Normal code:
//div[@id='theNode']
to get all the text, but if the text nodes become split, then:
//div[@id='theNode']/text()
Not sure, but if you provide me the link I will try.

How do I retrieve element text inside CDATA markup via XPath?

Consider the following xml fragment:
<Obj>
<Name><![CDATA[SomeText]]></Name>
</Obj>
How do I retrieve the "SomeText" value via XPath? I'm using Nauman Leghari's (excellent) Visual XPath tool.
/Obj/Name returns the element
/Obj/Name/text() returns blank
I don't think it's a problem with the tool (I may be wrong). I also read that XPath can't extract CDATA (see the last response in this thread), which sounds kind of weird to me.
/Obj/Name/text() is the XPath to return the content of the CDATA markup.
What threw me off was the behavior of the Value property. For an XMLNode (DOM world), the XmlNode.Value property of an Element (with CDATA or otherwise) returns Null. The InnerText property would give you the CDATA/Text content.
If you use Xml.Linq, XElement.Value returns the CDATA content.
string sXml = @"
<object>
<name><![CDATA[SomeText]]></name>
<name>OtherName</name>
</object>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml( sXml );
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
Console.WriteLine(@"XPath = /object/name" );
WriteNodesToConsole(xmlDoc.SelectNodes("/object/name", nsMgr));
Console.WriteLine(@"XPath = /object/name/text()" );
WriteNodesToConsole( xmlDoc.SelectNodes("/object/name/text()", nsMgr) );
Console.WriteLine(@"Xml.Linq = obRoot.Elements(""name"")");
XElement obRoot = XElement.Parse( sXml );
WriteNodesToConsole( obRoot.Elements("name") );
Output:
XPath = /object/name
NodeType = Element
Value = <null>
OuterXml = <name><![CDATA[SomeText]]></name>
InnerXml = <![CDATA[SomeText]]>
InnerText = SomeText
NodeType = Element
Value = <null>
OuterXml = <name>OtherName</name>
InnerXml = OtherName
InnerText = OtherName
XPath = /object/name/text()
NodeType = CDATA
Value = SomeText
OuterXml = <![CDATA[SomeText]]>
InnerXml =
InnerText = SomeText
NodeType = Text
Value = OtherName
OuterXml = OtherName
InnerXml =
InnerText = OtherName
Xml.Linq = obRoot.Elements("name")
Value = SomeText
Value = OtherName
Turned out the author of Visual XPath had a TODO for the CDATA type of XmlNodes. A little code snippet and I have CDATA support now.
MainForm.cs
private void Xml2Tree( TreeNode tNode, XmlNode xNode)
{
...
case XmlNodeType.CDATA:
//MessageBox.Show("TODO: XmlNodeType.CDATA");
// Gishu
TreeNode cdataNode = new TreeNode("![CDATA[" + xNode.Value + "]]");
cdataNode.ForeColor = Color.Blue;
cdataNode.NodeFont = new Font("Tahoma", 12);
tNode.Nodes.Add(cdataNode);
//Gishu
break;
CDATA sections are just part of what in XPath is known as a text node or in the XML Infoset as "chunks of character information items".
Obviously, your tool is wrong. Other tools, such as the XPath Visualizer, correctly highlight the text of the Name element when evaluating this XPath expression:
/*/Name/text()
One can also write a simple XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
"<xsl:value-of select="/*/Name"/>"
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<Obj>
<Name><![CDATA[SomeText]]></Name>
</Obj>
the correct result is produced:
"SomeText"
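A quick way to double-check this outside any particular XPath tool is a few lines of Python with the standard xml.etree.ElementTree (an assumption on my part; the question uses a .NET tool). The parser hands CDATA content over as ordinary element text, so no special handling is needed.

```python
import xml.etree.ElementTree as ET

# The fragment from the question.
XML = """<Obj>
<Name><![CDATA[SomeText]]></Name>
</Obj>"""

root = ET.fromstring(XML)

# CDATA is only a way of writing character data; it arrives as plain text.
name_text = root.findtext("Name")
print(name_text)
```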
I think the thread you referenced says that the CDATA markup itself is ignored by XPath, not the text contained in the CDATA markup.
My guess is that it's an issue with the tool. The source code is available for download, so maybe you can debug it...
See if this helps - http://www.zrinity.com/xml/xpath/
XPATH = /Obj/Name/text()
Just in case you run into a similar issue with jdom2, text() will be an array.
To recover CDATA, use /Obj/Name/text()
A suggestion would be to have another field of the md5 hash of the cdata. You can then use xpath to query based off the md5 with no issue
<sites>
<site>
<name>Google</name>
<url><![CDATA[http://www.google.com]]></url>
<urlMD5>ed646a3334ca891fd3467db131372140</urlMD5>
</site>
</sites>
Then you can search:
/sites/site[urlMD5='ed646a3334ca891fd3467db131372140']
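A minimal sketch of that idea in Python, assuming the hash field is filled in when the document is written. The digest is computed here with hashlib rather than copied from the example above, and the variable names are only illustrative.

```python
import hashlib
import xml.etree.ElementTree as ET

url = "http://www.google.com"

# Hash the URL once when the document is produced.
digest = hashlib.md5(url.encode("utf-8")).hexdigest()

XML = f"""<sites>
<site>
<name>Google</name>
<url><![CDATA[{url}]]></url>
<urlMD5>{digest}</urlMD5>
</site>
</sites>"""

root = ET.fromstring(XML)

# The hash is compared as a quoted string literal inside the predicate.
hits = root.findall(f"site[urlMD5='{digest}']")
names = [site.findtext("name") for site in hits]
print(names)
```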
