XDMP-REGEX: (err:FORX0002) - String transformation with Regular expressions - xpath

I am working on an XQuery requirement to identify XML tag names (name()) in an XML document using a regex; later I will transform the data. The code searches the entire document, and if it finds a match, I do a string replace using XQuery/XPath.
Here is some sample code showing what I am trying to do:
let $full-doc := fn:doc($uri)
return
  if (fn:matches($full-doc, "<Hyperlink\b[^\>]*?>([A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2})</Hyperlink>"))
  then $full-doc
  else "regex is not working"
I am getting the following error:
regex-match:
[1.0-ml] XDMP-REGEX: (err:FORX0002) fn:matches(fn:doc("44215.xml"), "<Hyperlink\b[^\>]*?>([A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2}...") -- Invalid regular expression
Could someone please explain why my regex is not working?

Looking at your requirement:
I am working on xquery requirement to identify the xml tag name() from the XML document using the regex.
You are going about this entirely the wrong way. XQuery doesn't see the lexical XML, it sees a tree of nodes. To find the name of an element, use an XPath expression to find the element, then use the name() function to get its name.
If you want to find an element whose name matches a regex, use //*[matches(name(), $regex)]
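To make the tree-versus-text point concrete, here is a minimal sketch in Java using the JDK's built-in javax.xml.xpath engine (a stand-in, not MarkLogic). That engine implements XPath 1.0, which has no matches(), so the sketch selects every element with //* and applies the regex to each element name in Java. The class name ElementNameFinder and the sample XML are made up for illustration, and Java's regex dialect is not identical to the XPath one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementNameFinder {
    // Returns the names of all elements whose name matches `regex`,
    // in document order; a rough stand-in for //*[matches(name(), $regex)].
    static List<String> namesMatching(String xml, String regex) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // XPath 1.0 has no matches(), so XPath selects every element node
        // and the regex is applied to each element's name in Java.
        NodeList elements = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*", doc, XPathConstants.NODESET);
        Pattern pattern = Pattern.compile(regex);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            String name = elements.item(i).getNodeName();
            // find() is unanchored, like fn:matches
            if (pattern.matcher(name).find()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

The regex only ever sees element names here, never the angle-bracket markup, which is exactly why a pattern like <Hyperlink\b... is the wrong tool.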

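The word boundary escape \b is not supported in XQuery regular expressions (see https://www.w3.org/TR/xpath-functions-31/#regex-syntax). The same syntax does not allow \> as an escape either; inside [^...] the > needs no escaping, so write [^>] instead.
But I guess you are looking for Hyperlink elements, not for a <Hyperlink> substring: fn:matches() on a document node tests the document's text content, not its markup, so a pattern containing <Hyperlink could never match anyway. Use a path expression instead: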
let $doc := fn:doc($uri)
where $doc//Hyperlink[matches(., '([A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2})')]
return $doc
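The same idea applied to the date pattern, again as a Java sketch with the JDK's XPath 1.0 engine standing in for MarkLogic: XPath selects the Hyperlink elements themselves, and java.util.regex (swapped in for fn:matches, which XPath 1.0 lacks) tests each element's text. The class name HyperlinkDates and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HyperlinkDates {
    // Date pattern from the answer (e.g. "Jan 12 1999"). No \b is needed,
    // because it is tested against element text, not against raw markup.
    private static final Pattern DATE =
            Pattern.compile("[A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2}");

    // Returns the text of every Hyperlink element whose content contains a date.
    static List<String> datedHyperlinks(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Path expression: select the elements, as $doc//Hyperlink does
        NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//Hyperlink", doc, XPathConstants.NODESET);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            String text = links.item(i).getTextContent();
            // find() is unanchored, matching the behaviour of fn:matches
            if (DATE.matcher(text).find()) {
                hits.add(text);
            }
        }
        return hits;
    }
}
```

In the question's terms, the document would be returned whenever this list is non-empty.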

Related

Using a regex to get a Nokogiri node

I'm parsing an XML file with Nokogiri.
Currently, I'm using the following to get the value I need (the document includes multiple Phase nodes):
xml.xpath("//Phase[@text=' = STER P=P(T) ']")
But now, the uploaded XML file can have a text attribute with a different value. Thus, I'm trying to update my code using a regular expression since the value always contains STER.
After looking at a few questions on SO, I tried
xml.xpath("//Phase[@text~=/STER/]")
However, when I run it, I get
ERROR: Invalid predicate: //Phase[@text~=/STER/] (Nokogiri::XML::XPath::SyntaxError)
What am I missing here?
Alternatively, is there an XPath function similar to starts-with() that looks for the substring anywhere in the value, not just at the beginning?
There are two problems with your code. First, there is no ~= (or =~) operator in XPath. The way to test whether text matches a regex is the matches() function:
//Phase[matches(@text, 'STER')]
Secondly, regex matching is a feature of XPath 2.0, but Nokogiri implements XPath 1.0.
Luckily, you are not actually using any regex features, you are simply checking for a fixed string, which can be done with XPath 1.0 using the contains function:
//Phase[contains(@text, 'STER')]
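As a quick check of the XPath 1.0 expression, this sketch evaluates it with Java's built-in javax.xml.xpath engine, which, like Nokogiri, implements XPath 1.0; it is a stand-in for Nokogiri, not Nokogiri itself. The class name PhaseFinder and the sample attribute values are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PhaseFinder {
    // contains() finds the substring anywhere in the value, which is all
    // the /STER/ "regex" was doing; it is available in XPath 1.0.
    static final String PATH = "//Phase[contains(@text, 'STER')]";

    // Counts the Phase elements whose text attribute contains "STER".
    static int countMatches(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Double count = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + PATH + ")", doc, XPathConstants.NUMBER);
        return count.intValue();
    }
}
```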

I am trying to use XPath function contains() that has a string in 2 parts but it is throwing an invalid xpath error

I am trying to use the XPath function contains() with a string split into two parts, but it throws an "invalid xpath expression" error on evaluation.
Here is what I am trying to achieve:
Normal, working XPath:
//*[contains(text(),'some_text')]
Now I want to break it into two parts, because some random text appears in between:
//*[contains(text(),'some'+ +'text')]
I used + to concatenate the strings in the expression, as we do in Java. How can I get this to work?
You can combine two contains() calls in one predicate to check whether a text node contains two specific substrings:
//*[text()[contains(.,'some') and contains(.,'text')]]
If you need to be more specific and make sure that 'text' comes somewhere after 'some' in the text node, you can combine substring-after() and contains():
//*[text()[contains(substring-after(.,'some'),'text')]]
If each target element always contains exactly one text node, or if only the first text node needs to be considered when an element has several, the XPath above can be simplified a bit as follows:
//*[contains(substring-after(text(),'some'),'text')]
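The three expressions can be compared side by side with a small Java sketch using the JDK's XPath 1.0 engine (javax.xml.xpath); the class name and sample XML are illustrative assumptions. Wrapping each path in count() just makes the differences easy to see.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TwoPartContains {
    // An element with a text node containing both substrings, in any order.
    static final String BOTH = "//*[text()[contains(.,'some') and contains(.,'text')]]";
    // An element with a text node in which 'text' appears after 'some'.
    static final String ORDERED = "//*[text()[contains(substring-after(.,'some'),'text')]]";
    // Simplified form: only the first text node of each element is tested.
    static final String FIRST_TEXT_ONLY = "//*[contains(substring-after(text(),'some'),'text')]";

    // Counts the nodes selected by `path` in the given XML.
    static int countMatches(String xml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Double count = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
        return count.intValue();
    }
}
```

For example, with <r><p>intro<b/>some random text</p></r> the BOTH expression selects the p element but FIRST_TEXT_ONLY does not, because in XPath 1.0 text() passed to a string function is reduced to the first text node, "intro".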

Finding an exact match of an inner text using XPath

What XPath syntax can I use to find an anchor tag whose inner text is "abc"? The closest I can get is:
SelectSingleNode(".//a[starts-with(., \"abc\")]");
I couldn't find any "equals" function to use.
Try the following:
SelectSingleNode("//a[.='abc']");
A leading // searches the whole document, while .// searches only below the node you call SelectSingleNode on; keep the . only if that narrower scope is what you want.
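The difference between the exact comparison and starts-with() can be checked with a Java sketch using the JDK's XPath 1.0 engine rather than .NET's SelectSingleNode; the class name and sample anchors are made up. The normalize-space() variant is an addition here, useful when the inner text may carry surrounding whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExactText {
    // Exact match on the anchor's string value (all of its descendant text).
    static final String EXACT = "//a[.='abc']";
    // Prefix match from the question, which also accepts "abcdef".
    static final String PREFIX = "//a[starts-with(., 'abc')]";
    // Variant that ignores leading, trailing and repeated whitespace.
    static final String NORMALIZED = "//a[normalize-space(.)='abc']";

    // Counts the nodes selected by `path` in the given XML.
    static int countMatches(String xml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Double count = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
        return count.intValue();
    }
}
```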

XPath 2.0: Retrieving nodes by attribute where value is case Insensitive

I am new to XPath and I am trying to retrieve a node by its attribute, but the attribute is case-insensitive, meaning I won't know exactly how the string is cased in the document.
For example, given the document:
<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com">
<...>
</Machine>
<Machine FQDN="Machine2.MyDomain.Com">
<...>
</Machine>
</Machines>
</Document>
To retrieve machine1 I would use the XPath:
//my:Machines/my:Machine/*[@FQDN='machine1.mydomain.com']
But a similar XPath to get machine2 fails because the case does not match:
//my:Machines/my:Machine/*[@FQDN='machine2.mydomain.com'] (fails)
I have seen various posts mention something like this (I am not sure how to apply namespaces to it):
translate(@FQDN, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
But even if I got it to work, it would be really cumbersome considering how many times I would be using it.
Finally, I have read that XPath 2.0 supports matches() and lower-case(), but being new to XPath I don't understand how to apply them.
For example, if I try the following I get an "Invalid Qualified name" error:
//my:Machines/my:Machine/[matches(@FQDN, '(?i)machine1.mydomain.com')]
//my:Machines/my:Machine/[lower-case(@FQDN, 'machine1.mydomain.com')]
Can someone provide a sample XPath that includes handling of namespaces that would work?
Thanks
Your example XML and XPath statements don't match.
The sample XML elements are not bound to a namespace. The "my" namespace-prefix is declared, but not used for those elements, so they are in the "no namespace".
Your sample XPath applies predicate filters to the children of Machine rather than to the Machine element that has the @FQDN attribute.
You could use any of these methods to compare the value case-insensitively:
matches() function, with the 'i' flag for case-insensitive matching (matches() looks for a substring and . matches any character, so anchor the pattern and escape the dots for an exact comparison):
//Machines/Machine[matches(@FQDN,'^machine2\.mydomain\.com$','i')]
upper-case() function to compare the upper-case strings:
//Machines/Machine[upper-case(@FQDN)=upper-case('machine2.mydomain.com')]
lower-case() function to compare the lower-case strings:
//Machines/Machine[lower-case(@FQDN)=lower-case('machine2.mydomain.com')]
Can someone provide a sample XPath that includes handling of namespaces that would work?
Not sure what you meant by handling of namespaces, but if you want to match those elements regardless of their namespace, you can use the namespace wildcard (XPath 2.0 and later):
//*:Machines/*:Machine[matches(@FQDN,'^machine2\.mydomain\.com$','i')]
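If you are limited to XPath 1.0 (for example the JDK's javax.xml.xpath engine), there is no lower-case(), matches(), or *: wildcard, so the translate() approach from the question and local-name() tests are the usual substitutes. The Java sketch below assumes ASCII-only host names (translate() only maps the letters it is given) and binds the search value as an XPath variable instead of concatenating it; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MachineLookup {
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // local-name() ignores the namespace (XPath 1.0 has no *:Machine wildcard);
    // translate() lower-cases ASCII letters only (XPath 1.0 has no lower-case()).
    private static final String PATH =
            "//*[local-name()='Machines']/*[local-name()='Machine']"
            + "[translate(@FQDN, '" + UPPER + "', '" + LOWER + "') = $target]/@FQDN";

    // Returns the FQDN as written in the document, or "" if no Machine matches.
    static String findFqdn(String xml, String fqdn) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so elements get proper local names
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Lower-case the search value in Java; Locale.ROOT avoids
        // locale-specific rules such as the Turkish dotless i.
        String target = fqdn.toLowerCase(Locale.ROOT);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $target as a variable rather than splicing it into the expression.
        xpath.setXPathVariableResolver(name ->
                "target".equals(name.getLocalPart()) ? target : null);
        return (String) xpath.evaluate(PATH, doc, XPathConstants.STRING);
    }
}
```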

Invalid Token when using XPath

I am modifying a web application that uses XPath, and when the code executes I get the error message "Invalid token!"
This is basically what I am doing:
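// Requires: using System.Xml;
public XmlNode GetSelection(SelectParams selectParams, XmlDocument document)
{
    // Build the lookup string from the parameter values...
    string xpathString = string.Format("Name ='{0}' Displaytag = '{1}' Manadatory='{2}'",
        selectParams.Name, selectParams.Displaytag, selectParams.Manadatory);
    // ...and pass it to SelectSingleNode, which is where the exception is thrown
    return document.SelectSingleNode(xpathString);
}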
As you can see, I build a string from the values of the nodes I am trying to find in my XML document, so that it returns the XML data matching my parameters.
I am getting an XPathException in Visual Studio that says "invalid token".
I know that in the XML document the attribute values I am looking for are in double quotes, for example Name="ABC", so I thought the problem could be solved by escaping with "\".
Can anyone help?
Update from comments
In the XML document, the tag has its attributes set as Name="ABC" Displaytag="ATag" Manadatory="true".
I guess you need:
//*[@Name="ABC"][@Displaytag="ATag"][@Manadatory="true"]
Or
//*[@Name="ABC" and @Displaytag="ATag" and @Manadatory="true"]
Meaning: any element in the whole document having a Name attribute with "ABC" value, a Displaytag attribute with "ATag" value and a Manadatory attribute with "true" value.
The string passed as the argument to SelectSingleNode() (BTW, the exact capitalization is important) is something like:
Name ='someName' Displaytag = 'someString' Manadatory='true'
This is very different from a syntactically legal XPath expression.
And the error message just reflects the fact that toxic food has been given to the XPath engine.
Solution: Do read at least a light XPath tutorial and then specify a correct XPath expression.
The string you are constructing is not a valid XPath expression. In fact, it is nothing like XPath at all.
Indeed, even if it were a valid XPath expression, constructing it this way by string concatenation is a very dangerous practice, because of the possibility of injection attacks. But I suspect that advice will fall on stony ground.
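Here is a minimal sketch of the parameterized alternative, written in Java with the JDK's javax.xml.xpath API rather than .NET: the values are bound as XPath variables ($name, $displaytag, $mandatory) through an XPathVariableResolver, so quotes or other XPath syntax inside a value cannot change the query. The class, method, and variable names are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SafeSelection {
    // Fixed expression; the values are supplied as XPath variables at evaluation time.
    private static final String QUERY =
            "//*[@Name=$name and @Displaytag=$displaytag and @Manadatory=$mandatory]";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first element whose three attributes equal the given values, or null.
    static Node select(Document doc, String name, String displaytag, String mandatory)
            throws Exception {
        // Map.of rejects null values, so callers must pass non-null strings.
        Map<String, String> vars =
                Map.of("name", name, "displaytag", displaytag, "mandatory", mandatory);
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(qname -> vars.get(qname.getLocalPart()));
        return (Node) xpath.evaluate(QUERY, doc, XPathConstants.NODE);
    }
}
```

A value such as O'Neil is matched literally, and an injection attempt like ABC' or '1'='1 simply finds no element.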
