How to treat a node that has no text? - xpath

I have this XML:
<mgns1:Champ_supplementaire>
<mgns1:CODE_CS>3</mgns1:CODE_CS>
<mgns1:VALEUR_CS />
</mgns1:Champ_supplementaire>
When I try to read the value:
NodeList nodeliste_cs3 = (NodeList) xpath.evaluate( "//mgns1:Champ_supplementaire[mgns1:CODE_CS=3]/mgns1:VALEUR_CS",doc, XPathConstants.NODESET);
...
Node node_cs3 = nodeliste_cs3.item(i);
list.add(node_cs3.getTextContent() + ";");
I get a NullPointerException! How should I deal with a node that has no text?

You can explicitly add a predicate so that the node is selected only if it contains text:
...mgns1:VALEUR_CS[normalize-space()]
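The same idea can be sketched in Python with only the standard library. This is an illustration under assumptions, not the asker's Java code: the namespace URI and the wrapping root element are hypothetical (the question does not show them), and because xml.etree.ElementTree does not support normalize-space(), the "has text" filter is applied in Python after selecting the nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question does not show the real one.
NS = {"mgns1": "http://example.com/mgns1"}

# Hypothetical document: one empty VALEUR_CS and one with text.
XML = """
<root xmlns:mgns1="http://example.com/mgns1">
  <mgns1:Champ_supplementaire>
    <mgns1:CODE_CS>3</mgns1:CODE_CS>
    <mgns1:VALEUR_CS />
  </mgns1:Champ_supplementaire>
  <mgns1:Champ_supplementaire>
    <mgns1:CODE_CS>3</mgns1:CODE_CS>
    <mgns1:VALEUR_CS>abc</mgns1:VALEUR_CS>
  </mgns1:Champ_supplementaire>
</root>
"""


def non_empty_values(xml_text):
    """Return the text of every VALEUR_CS with CODE_CS 3, skipping empty nodes."""
    root = ET.fromstring(xml_text)
    path = ".//mgns1:Champ_supplementaire[mgns1:CODE_CS='3']/mgns1:VALEUR_CS"
    values = []
    for node in root.findall(path, NS):
        # node.text is None for an empty element, so guard before stripping.
        text = (node.text or "").strip()
        if text:
            values.append(text + ";")
    return values


print(non_empty_values(XML))
```

Guarding with `(node.text or "")` lets the empty element be skipped instead of raising an error.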

Related

extract Xpath for string in a div class

I have the HTML below:
<div class="sic_cell {symbol : 'GGRM.JK'}">
Gudang Garam Tbk.
</div>
I would like to extract "GGRM.JK" from the HTML. The expression
//div[contains(@class, "symbol")]
returns the element, not the text "GGRM.JK".
Since it seems you are using Python, try the following:
import lxml.html as lh
data = """[your html above]"""
doc = lh.fromstring(data)
# version 1: read the class attribute and split it on the single quotes
target = doc.xpath('//div[contains(@class, "symbol")]/@class')[0]
print(target.split("'")[1])
# version 2: read the href of a link inside the div, if the div has one
target2 = doc.xpath('//div[contains(@class, "symbol")]/a/@href')[0]
print(target2.split('=')[1])
In either case, the output should be
GGRM.JK
The shortest way to get the substring you want with XPath only, without postprocessing, is to use the functions substring-after and substring-before.
Here is an example of how to get 'GGRM.JK' from both the class and href attributes.
import lxml.html as lh
htmlText = """<div class="sic_cell {symbol : 'GGRM.JK'}">
Gudang Garam Tbk.
</div>"""
htmlDom = lh.fromstring(htmlText)
fromHref = htmlDom.xpath('substring-after(//div/a/@href, "=")')
print(fromHref)
fromClass = htmlDom.xpath('substring-before(substring-after(//div/@class, ": \'"), "\'")')
print(fromClass)

python+selenium xpath Unable to locate element

company_name = 'google'
browser.get('https://m.tianyancha.com/search?key=&checkFrom=searchBox')
ele = browser.find_element_by_xpath("//input[@id='live-search']")
ele.clear()
ele.send_keys(company_name, Keys.ENTER)
name = browser.find_element_by_xpath(
    "//div[@class='new-border-bottom pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em")
if name.text:
    if name.text == company_name:
        check = '1'
    else:
        check = '0'
else:
    check = '0'
The error is:
NoSuchElementException: Message: no such element: Unable to locate
element: {"method":"xpath","selector":"//div[@class='new-border-bottom
pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em"}
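Your XPath for name does not match anything on the page.
name = browser.find_element_by_xpath(
    "//div[@class='new-border-bottom pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em")
Using // more than once is allowed; each // selects descendants at any depth below the preceding step. What can fail here is the exact comparison @class='...', which matches only when the attribute is exactly that string. The element may also not be present yet when the search runs.
Check the XPath for name against the rendered page.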
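To see how such an expression is evaluated, here is a small sketch that uses Python's standard library instead of Selenium. The markup is hypothetical and only shaped to fit the expression from the question; the real page may look different. The expression contains two // steps, which is valid, and the second div shows that an exact @class comparison stops matching as soon as the attribute string differs. The positional [1] from the question is left out because ElementTree's limited XPath subset treats positions differently from a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, shaped to match the XPath from the question.
PAGE = """
<html><body>
  <div class="new-border-bottom pt5 pb5 ml15 mr15">
    <a class="query_name in-block"><span><em>google</em></span></a>
  </div>
  <div class="new-border-bottom pt5 pb5 ml15 mr15 extra">
    <a class="query_name in-block"><span><em>other</em></span></a>
  </div>
</body></html>
"""

# Two "//" steps in one expression: descendants of the root, then
# descendants of each matching div.
XPATH = (".//div[@class='new-border-bottom pt5 pb5 ml15 mr15']"
         "//a[@class='query_name in-block']/span/em")


def matched_texts(markup, path):
    """Return the text of every element matched by the path."""
    root = ET.fromstring(markup)
    return [em.text for em in root.findall(path)]


print(matched_texts(PAGE, XPATH))
```

Only the first div matches, because the class attribute of the second one is not exactly the string in the predicate.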

Changing node values in XML using VBS with similar names

I am having trouble changing the values of nodes that share the same name in an XML file, using VBScript. Here is the sample XML:
<MappingData>
  <Name>Name 1</Name>
  <ValueField FieldName="Name 2">
    <CharValue>Value 1</CharValue>
  </ValueField>
  <ValueField FieldName="Name 3">
    <CharValue>Value 1</CharValue>
  </ValueField>
I need to change the values of both occurrences of the <CharValue> tag.
I could have done this if these tags had any attribute, but in this case I am stuck.
I tried the following code, but couldn't get what I need.
Set NodeList = objXMLDoc.documentElement.selectNodes("//MappingData/ValueField/CharValue")
For i = 0 To NodeList.length - 1
node.Text = "Value 1"
Next
Any help is appreciated. Thanks.
Try the following code:
Set xDoc = CreateObject( "Msxml2.DOMDocument.6.0" )
xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.async = False
xDoc.load "test.xml"
' Select every CharValue node, then index into the list inside the loop
Set nNodes = xDoc.selectNodes("//MappingData/ValueField/CharValue")
For i = 0 To nNodes.Length - 1
    nNodes(i).text = "Changed value " & i
Next
xDoc.Save "test.xml"
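For comparison, the same select-then-loop pattern can be sketched in Python with the standard library. This is only an illustration of the technique, not a replacement for the VBScript. The XML string is the sample from the question with a closing </MappingData> tag added as an assumption, and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample from the question; the closing </MappingData> tag is assumed.
SAMPLE = """
<MappingData>
  <Name>Name 1</Name>
  <ValueField FieldName="Name 2">
    <CharValue>Value 1</CharValue>
  </ValueField>
  <ValueField FieldName="Name 3">
    <CharValue>Value 1</CharValue>
  </ValueField>
</MappingData>
"""


def rewrite_char_values(xml_text):
    """Give every ValueField/CharValue node a new, numbered text value."""
    root = ET.fromstring(xml_text)
    # findall returns all matching nodes; no attribute is needed to tell
    # them apart because each one is addressed through the loop variable.
    for i, node in enumerate(root.findall("./ValueField/CharValue")):
        node.text = "Changed value %d" % i
    return root


root = rewrite_char_values(SAMPLE)
print([n.text for n in root.iter("CharValue")])
```

As in the VBScript answer, the key point is to assign through the item taken from the node list, not through an unrelated variable.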

XmlUnit empty Elements

I am trying to compare two XML documents with XMLUnit and have the following problem: when I have two empty elements like the ones below, XMLUnit reports them as a difference. Can I configure XMLUnit to ignore this?
</name> and <name></name>
I am only interested in differences like the next two examples.
<name>test1</name> and <name>test2</name>
difference: test1 and test2
or
<name>test1</name> and <name></name>
difference
test1 and ...
My code:
Diff diff = new Diff(fr1, fr2);
DetailedDiff detailedDiff = new DetailedDiff(diff);
List differenceList = detailedDiff.getAllDifferences();
List differences = detailedDiff.getAllDifferences();
for (Object object : differences) {
    Difference difference = (Difference)object;
    String node1;
    String node2;
    node1 = difference.getControlNodeDetail().getNode().getNodeName() + " " + difference.getControlNodeDetail().getNode().getNodeValue();
    node2 = difference.getTestNodeDetail().getNode().getNodeName() + " " + difference.getTestNodeDetail().getNode().getNodeValue();
}
Assuming your </name> is a typo for <name/>, as per the comment, you could try the following:
XMLUnit.setIgnoreWhitespace(true);
That seems to work for me. For example, when I compare <Carp1></Carp1> with <Carp1/> without the above setting, I get
Expected text value '
' but was '
' - comparing <CfgDN ...>
</CfgDN> at /CfgDN[1]/text()[19] to <CfgDN ...>
</CfgDN> at /CfgDN[1]/text()[19]
With the above setting, the documents are reported as similar and identical.
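What "ignoring whitespace" means for such a comparison can be sketched in Python with the standard library. This is not XMLUnit and does not reproduce its algorithm; it is a minimal illustration in which missing text and whitespace-only text count as the same empty value, so <name/> and <name></name> compare equal while real text differences are still detected. All function names are made up for the example.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    # Treat missing text and whitespace-only text as the same empty value.
    # Leading and trailing whitespace around real text is trimmed as well.
    return (text or "").strip()


def same_ignoring_whitespace(a, b):
    """Recursively compare two elements, ignoring whitespace-only text."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_ignoring_whitespace(x, y) for x, y in zip(a, b))


def compare(xml_a, xml_b):
    """Parse two XML strings and compare their root elements."""
    return same_ignoring_whitespace(ET.fromstring(xml_a), ET.fromstring(xml_b))


print(compare("<name/>", "<name></name>"))
print(compare("<name>\n</name>", "<name/>"))
print(compare("<name>test1</name>", "<name></name>"))
```

The first two comparisons treat the elements as equal; the third reports a difference because one side carries real text.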

XElement null when attributes exist

Given the following xml:
<root xmlns="http://tempuri.org/myxsd.xsd">
<node name="mynode" value="myvalue" />
</root>
And given the following code:
string file = "myfile.xml"; // Contains the xml from above
XDocument document = XDocument.Load(file);
XElement root = document.Element("root");
if (root == null)
{
    throw new FormatException("Couldn't find root element 'parameters'.");
}
If the root element contains the xmlns attribute then the variable root is null. If I remove the xmlns attribute then root is NOT null.
Can anyone explain why this is?
When you declare your root element like <root xmlns="http://tempuri.org/myxsd.xsd">, your root element and all of its descendants are in the http://tempuri.org/myxsd.xsd namespace. By default, an element has an empty namespace, and XDocument.Element looks for elements without a namespace. If you want to access an element that is in a namespace, you must specify the namespace explicitly.
var xdoc = XDocument.Parse(
    "<root>" +
    "<child0><child01>Value0</child01></child0>" +
    "<child1 xmlns=\"http://www.namespace1.com\"><child11>Value1</child11></child1>" +
    "<ns2:child2 xmlns:ns2=\"http://www.namespace2.com\"><child21>Value2</child21></ns2:child2>" +
    "</root>");
var ns1 = XNamespace.Get("http://www.namespace1.com");
var ns2 = XNamespace.Get("http://www.namespace2.com");
Console.WriteLine(xdoc.Element("root")
    .Element("child0")
    .Element("child01").Value); // Value0
Console.WriteLine(xdoc.Element("root")
    .Element(ns1 + "child1")
    .Element(ns1 + "child11").Value); // Value1
Console.WriteLine(xdoc.Element("root")
    .Element(ns2 + "child2")
    .Element("child21").Value); // Value2
For your case:
var ns = XNamespace.Get("http://tempuri.org/myxsd.xsd");
var name = xdoc.Element(ns + "root").Element(ns + "node").Attribute("name");
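The same effect of a default namespace can be reproduced outside .NET. The sketch below uses Python's standard library on the XML from the question; it is only an illustration of the namespace rule, not LINQ to XML. An unqualified lookup finds nothing, a namespace-qualified lookup succeeds, and the attribute is read without a namespace because a default namespace declaration does not apply to attributes.

```python
import xml.etree.ElementTree as ET

# XML from the question: the default namespace applies to root and node.
XML = """
<root xmlns="http://tempuri.org/myxsd.xsd">
  <node name="mynode" value="myvalue" />
</root>
"""

NS = "http://tempuri.org/myxsd.xsd"

root = ET.fromstring(XML)

# The parsed tag carries the namespace in {uri}name form.
print(root.tag)

# Looking up the child without its namespace finds nothing.
unqualified = root.find("node")
print(unqualified is None)

# Qualifying the name with the namespace finds the element.
qualified = root.find("{%s}node" % NS)
print(qualified.get("name"))
```

This mirrors the ns + "root" form in the answer: the element name alone is not enough once a default namespace is declared.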
