Avoiding namespace prefixes with Saxon XPath against XHTML

Using Saxon HE 9.6 as a JAXP implementation
Have an HTML document with the XHTML namespace
//*:title returns the expected value, but //title doesn't
I'd really like to just use //title. How can this be done?
Alternatively, can I just remove a namespace from an already constructed Document?

As described at https://saxonica.plan.io/boards/3/topics/1649, you can cast the JAXP XPath object created by a Saxon XPathFactory implementation to net.sf.saxon.xpath.XPathEvaluator and then set the default element namespace for XPath evaluation, e.g.
((XPathEvaluator)xpath).getStaticContext().setDefaultElementNamespace("http://www.w3.org/1999/xhtml");
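Then a path //title will select title elements in the XHTML namespace. I tested this with the following sample:
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import net.sf.saxon.xpath.XPathEvaluator;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.xml.sax.InputSource;
// Saxon's JAXP factory; the cast below only works for XPath objects it creates
XPathFactory xpathFactory = new XPathFactoryImpl();
XPath xpath = xpathFactory.newXPath();
((XPathEvaluator)xpath).getStaticContext().setDefaultElementNamespace("http://www.w3.org/1999/xhtml");
String xhtmlSample = "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>This is a test</title></head><body><h1>Test</h1></body></html>";
InputSource source = new InputSource(new StringReader(xhtmlSample));
System.out.println("Found: " + xpath.evaluate("//title", source));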
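On the alternative the question raises (getting rid of the namespace rather than configuring the evaluator), one possible workaround is to parse the markup without namespace awareness. The sketch below is not the Saxon approach above: it assumes the JDK's built-in DOM parser and XPath 1.0 engine, and the class and method names are made up for illustration. It also discards namespace information for the whole document, so it only suits cases where that loss is acceptable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceUnawareTitle {

    // Parses the markup WITHOUT namespace awareness, so xmlns is kept as a
    // plain attribute and the elements carry no namespace URI.
    static String titleOf(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(false); // the JAXP default, stated explicitly
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // In XPath 1.0 an unprefixed name test matches no-namespace elements.
            return XPathFactory.newInstance().newXPath().evaluate("//title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate //title", e);
        }
    }

    public static void main(String[] args) {
        String xhtmlSample = "<html xmlns='http://www.w3.org/1999/xhtml'><head>"
                + "<title>This is a test</title></head><body><h1>Test</h1></body></html>";
        System.out.println("Found: " + titleOf(xhtmlSample));
    }
}
```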

Related

XmlDocument - xpath returns nothing

I'm trying to read this science.org feed: https://www.science.org/rss/news_current.xml
with this simple code:
using var httpClient = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Get, url);
var response = httpClient.Send(request);
var content = await response.Content.ReadAsStringAsync();
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(content);
var items = xmlDoc.DocumentElement?.SelectNodes("//item");
if (items != null)
{
Console.WriteLine($"{url}: items={items.Count}");
}
but I get 0 items...
(the 'content' variable is good and contains the right xml data)
It works for other RSS feeds.
any idea of what I'm doing wrong?

Note that the root element includes this default namespace declaration: xmlns="http://purl.org/rss/1.0/", which means that the names of elements within the document are qualified by that namespace URI, unless they have an explicit namespace prefix. Your item elements have no prefix, which means they do belong to that RSS namespace.
So instead of querying for elements named item, you will need to include a namespace prefix in your query, e.g. //rss:item, and of course to allow that prefix to make sense to the SelectNodes method, you'll need to bind the rss prefix to the namespace URI http://purl.org/rss/1.0/. See the documentation for SelectNodes for information about how to handle the namespace.
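The same prefix binding can be sketched with the JDK's XPath API, shown here in Java rather than C# so it runs without extra packages. The inline feed is a made-up stand-in for the real one, and the class name, the rss prefix, and the helper method are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RssItemCount {
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    // Hypothetical miniature feed using the same default namespace.
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
            + " xmlns='http://purl.org/rss/1.0/'>"
            + "<channel><title>Demo</title></channel>"
            + "<item><title>First</title></item>"
            + "<item><title>Second</title></item>"
            + "</rdf:RDF>";

    // Binds the single prefix "rss" to the RSS 1.0 namespace URI.
    static final NamespaceContext RSS_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "rss".equals(prefix) ? RSS_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return RSS_NS.equals(uri) ? "rss" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return RSS_NS.equals(uri)
                    ? Collections.singletonList("rss").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Counts the nodes selected by the expression in a namespace-aware DOM.
    static int countItems(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(RSS_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countItems(SAMPLE, "//item"));     // 0: unprefixed means "no namespace"
        System.out.println(countItems(SAMPLE, "//rss:item")); // 2
    }
}
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to matters.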

You can also use XPath 2 and do e.g.
using System.Xml;
using Wmhelp.XPath2;
var doc = new XmlDocument(new NameTable());
doc.Load(@"https://www.science.org/rss/news_current.xml");
var xmlNamespaceMgr = new XmlNamespaceManager(doc.NameTable);
xmlNamespaceMgr.AddNamespace("", "http://purl.org/rss/1.0/");
var items = doc.XPath2SelectNodes("//item", xmlNamespaceMgr);
Console.WriteLine(items.Count);
by using the NuGet package https://www.nuget.org/packages/XPath2.

OData V4 + WebAPI Filter by Int Value of Enum?

OData V4 has enum support but it appears you have to search by the namespace only. How does one now search by the value instead of the text representation?
In V3 of odata you could query for $filter=Status eq 35, where 35 is Complete in the enum. This method would work, even if that field was an enum field in the data model.
Now this method fails in V4, instead requiring the namespace with text representation of the enum.
I want to get the V3 support working again without having to lose the other features of odata V4. Searching by the int value for the enum item seems more reliable than searching for text. Older odata clients (such as kendo) don't support a by-text enum filtering method.

To do that in OData v4, enable EnumPrefixFree in the initial Web API configuration, so that you don't have to write the full enum namespace as a prefix:
public static void Register(HttpConfiguration config)
{
// ...
config.EnableEnumPrefixFree(enumPrefixFree: true);
config.MapODataServiceRoute("odata", "odata", YourEdmModel);
// ...
}
Then you can filter any enum by its string or integer value:
$filter=Status eq 'single'
or
$filter=Status eq 1
hope this helps.

With v4, you have to add the namespace as the prefix and surround the value with the single quote such as http://services.odata.org/V4/(S(m1bhpaebr1yvzx5vtz5v4ur1))/TripPinServiceRW/People?$filter=Gender%20eq%20Microsoft.OData.SampleService.Models.TripPin.PersonGender'1' , where 1 represents Female.
Here is a quotation from the ABNF of the protocol http://docs.oasis-open.org/odata/odata/v4.0/os/abnf/odata-abnf-construction-rules.txt:
enum = qualifiedEnumTypeName SQUOTE enumValue SQUOTE
enumValue = singleEnumValue *( COMMA singleEnumValue )
singleEnumValue = enumerationMember / enumMemberValue
enumMemberValue = int64Value
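Read literally, the first two rules say that an enum literal is the qualified type name immediately followed by a single-quoted, comma-separated list of member names or integer values. A minimal sketch of composing such a literal is below; the class and method names are hypothetical, and no validation of the type name or the values is attempted.

```java
final class ODataEnumLiteral {

    // enum      = qualifiedEnumTypeName SQUOTE enumValue SQUOTE
    // enumValue = singleEnumValue *( COMMA singleEnumValue )
    static String enumLiteral(String qualifiedTypeName, String... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one enum value is required");
        }
        return qualifiedTypeName + "'" + String.join(",", values) + "'";
    }

    // Builds an (unencoded) equality filter expression for a property.
    static String filterEq(String property, String literal) {
        return property + " eq " + literal;
    }

    public static void main(String[] args) {
        String type = "Microsoft.OData.SampleService.Models.TripPin.PersonGender";
        // The member's integer value is written inside the quotes.
        System.out.println("$filter=" + filterEq("Gender", enumLiteral(type, "1")));
    }
}
```

The resulting expression still has to be percent-encoded before it is placed in a URL, as in the TripPin request above.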

How to resolve entity when using saxon

I am using Saxon to evaluate my XPath expressions, but sometimes the XML file comes with a namespace declaration, which makes my class throw an exception.
Is there any way to ignore namespaces while using Saxon, as we do with DOM, i.e.
builder.setEntityResolver(new EntityResolver()
{
public InputSource resolveEntity(String publicId,
String systemId) throws SAXException,IOException
{
return null;
}
});

If you do not want to use namespaces in your XPath, you can use local-name(), for example:
/pref:root/pref:element1[@attr="value"]/pref:element2
If you have the above XPath (with namespaces), you can also write it like this:
/*[local-name() = "root"]/*[local-name() = "element1"][@attr="value"]/*[local-name() = "element2"]
This lets you avoid namespace prefixes, at the cost of matching elements with those local names in any namespace.
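As a rough check of the local-name() form, the sketch below runs it against a small made-up document using the JDK's XPath engine instead of Saxon. The urn:example namespace, the sample content, and the class name are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LocalNameLookup {
    // Hypothetical document whose elements all live in a namespace.
    static final String SAMPLE =
            "<p:root xmlns:p='urn:example'>"
            + "<p:element1 attr='other'><p:element2>miss</p:element2></p:element1>"
            + "<p:element1 attr='value'><p:element2>hit</p:element2></p:element1>"
            + "</p:root>";

    // Prefix-free path: every step tests only the local part of the name.
    static final String EXPR =
            "/*[local-name() = 'root']/*[local-name() = 'element1'][@attr='value']"
            + "/*[local-name() = 'element2']";

    // Evaluates the expression as a string against a namespace-aware DOM.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SAMPLE, EXPR)); // hit
        // A plain unprefixed path selects nothing, so the string value is empty.
        System.out.println(evaluate(SAMPLE, "/root/element1/element2").isEmpty()); // true
    }
}
```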

selenium2.0 webelement cannot get Attribute of the html

I used PageFactory mode, and in my bean file I declare the WebElement using an XPath:
@FindBy(xpath ='//div[5]/div/div/dl/dd[4]')
def public WebElement nextPage //nextpage
and in my factory file (this class extends the bean class), I used
nextPage.getAttribute("class")
but the result is null or empty, and I don't know why. I just want to get the class attribute of the following HTML, to decide whether it is a clickable link or plain text.
here is the html:
<a class="easyquery_paginglink" href='javascript:gotoPage("consumeRecord","consumeRecord",2)'>nextpage</a>

Your XPath could be "//a[text() = 'nextpage']" (XPath is case-sensitive, so the node test must be lowercase text()); then call GetAttribute("class") on the element.
So:
IWebElement element = _driver.FindElement(By.XPath("//a[text() = 'nextpage']"));
string className = element.GetAttribute("class");
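The expression can be tried without a browser by evaluating it against the anchor from the question. This is only a sketch using the JDK's XPath engine, not Selenium: it treats the snippet as well-formed XML, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LinkClassLookup {
    // The anchor from the question, which happens to be well-formed XML.
    static final String ANCHOR = "<a class=\"easyquery_paginglink\" "
            + "href='javascript:gotoPage(\"consumeRecord\",\"consumeRecord\",2)'>nextpage</a>";

    // Evaluates the expression as a string; an empty result means no match.
    static String evaluate(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select the link by its text, then read its class attribute.
        System.out.println(evaluate(ANCHOR, "//a[text() = 'nextpage']/@class")); // easyquery_paginglink
        // String comparison is case-sensitive too, so this selects nothing.
        System.out.println(evaluate(ANCHOR, "//a[text() = 'NextPage']/@class").isEmpty()); // true
    }
}
```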

A better solution than element.Elements("Whatever").First()?

I have an XML file like this:
<SiteConfig>
<Sites>
<Site Identifier="a" />
<Site Identifier="b" />
<Site Identifier="c" />
</Sites>
</SiteConfig>
The file is user-editable, so I want to provide reasonable error message in case I can't properly parse it. I could probably write a .xsd for it, but that seems kind of overkill for a simple file.
So anyway, when querying for the list of <Site> nodes, there's a couple of ways I could do it:
var doc = XDocument.Load(...);
var siteNodes = from siteNode in
doc.Element("SiteConfig").Element("Sites").Elements("Site")
select siteNode;
But the problem with this is that if the user has not included the <Sites> node (say), it'll just throw a NullReferenceException, which doesn't really tell the user what actually went wrong.
Another possibility is just to use Elements() everywhere instead of Element(), but that doesn't always work out when coupled with calls to Attribute(), for example, in the following situation:
var siteNodes = from siteNode in
doc.Elements("SiteConfig")
.Elements("Sites")
.Elements("Site")
where siteNode.Attribute("Identifier").Value == "a"
select siteNode;
(That is, there's no equivalent to Attributes("xxx").Value)
Is there something built-in to the framework to handle this situation a little better? What I would prefer is a version of Element() (and of Attribute() while we're at it) that throws a descriptive exception (e.g. "Looking for element <xyz> under <abc> but no such element was found") instead of returning null.
I could write my own version of Element() and Attribute() but it just seems to me like this is such a common scenario that I must be missing something...

You could implement your desired functionality as an extension method:
public static class XElementExtension
{
public static XElement ElementOrThrow(this XElement container, XName name)
{
XElement result = container.Element(name);
if (result == null)
{
throw new InvalidDataException(string.Format(
"{0} does not contain an element {1}",
container.Name,
name));
}
return result;
}
}
You would need something similar for XDocument. Then use it like this:
var siteNodes = from siteNode in
doc.ElementOrThrow("SiteConfig")
.ElementOrThrow("Sites")
.Elements("Site")
select siteNode;
If the <Sites> element is missing, you will get an exception like this:
SiteConfig does not contain an element Sites
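The same idea carries over to other XML APIs. As a rough illustration, here is a Java port using the JDK's DOM classes; it is not part of LINQ to XML, and the helper names and the sample configuration are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RequiredChild {

    // Returns the first direct child element with the given name, or throws
    // an exception that names both the parent and the missing child.
    static Element childOrThrow(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        throw new IllegalStateException(
                parent.getNodeName() + " does not contain an element " + name);
    }

    // Parses a string and returns its document element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // A user-edited file where the <Sites> wrapper was left out.
        Element root = parseRoot("<SiteConfig><Site Identifier='a'/></SiteConfig>");
        try {
            childOrThrow(root, "Sites");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // SiteConfig does not contain an element Sites
        }
    }
}
```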

You could use XPathSelectElements
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;
class Program
{
static void Main()
{
var ids = from site in XDocument.Load("test.xml")
.XPathSelectElements("//SiteConfig/Sites/Site")
let id = site.Attribute("Identifier")
where id != null
select id;
foreach (var item in ids)
{
Console.WriteLine(item.Value);
}
}
}
Another thing that comes to mind is to define an XSD schema and validate your XML file against this schema. This will generate meaningful error messages and if the file is valid you can parse it without problems.
