What do "+", "#" and "*" mean in an XPath query? - xpath

On the iReport documentation I found these XPath queries:
/addressbook/category#name
/addressbook/category/person#id
/addressbook/category/person+LASTNAME
/addressbook/category/person+FIRSTNAME
/addressbook/category/person+hobbies*hobby
My questions:
Is category#name the same as category/#name?
What's the meaning of person+LASTNAME? (The + to be precise)
What's the meaning of person+hobbies*hobby (The * to be precise)
They are applied to this XML:
<addressbook>
<category name="home">
<person id="1">
<LASTNAME>Davolio</LASTNAME>
<FIRSTNAME>Nancy</FIRSTNAME>
<hobbies>
<hobby>Radio Control</hobby>
<hobby>R/C Cars</hobby>
<hobby>Micro R/C Cars</hobby>
<hobby>Die-Cast Models</hobby>
</hobbies>
<email>email1@my.domain.it</email>
<email>email2@my.domain2.it</email>
...
(Full XML here)

That's not XPath. It's just XPath-like. From the page that you linked:
<symbol> is used to add an extra path to the base path and to define what should be returned.
+ add the following path to the base_path (this happen when the base_path = record path);
# return the attribute value: it's followed by the attribute name;
* return all tags identified by the following path as a JRXMLDatasource
It's in section 7.3 of the link you have in your question.
So, going from that, these are the meanings of your XPath-like expressions:
/addressbook/category#name
The basepath is /addressbook/category, return the attribute "name"
/addressbook/category/person#id
The basepath is /addressbook/category/person, return the attribute "id"
/addressbook/category/person+LASTNAME
The basepath is /addressbook/category/person, return the element "LASTNAME"
/addressbook/category/person+FIRSTNAME
The basepath is /addressbook/category/person, return the element "FIRSTNAME"
/addressbook/category/person+hobbies*hobby
The basepath is /addressbook/category/person, look inside "hobbies"
and return all elements named "hobby"
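To make the three symbols concrete, here is a rough Python sketch that interprets these XPath-like expressions the way the quoted documentation describes them. It is not how iReport is implemented; the `evaluate` function and the trimmed `SAMPLE` document are made up for illustration, and it uses the standard library's ElementTree rather than a JRXMLDatasource.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the address book from the question (illustration only).
SAMPLE = """<addressbook>
  <category name="home">
    <person id="1">
      <LASTNAME>Davolio</LASTNAME>
      <FIRSTNAME>Nancy</FIRSTNAME>
      <hobbies>
        <hobby>Radio Control</hobby>
        <hobby>R/C Cars</hobby>
      </hobbies>
    </person>
  </category>
</addressbook>"""


def evaluate(root, expression):
    """Hypothetical helper: split at the first symbol and apply its meaning."""
    match = re.match(r"^([^#+*]+)([#+*])(.+)$", expression)
    if match is None:
        raise ValueError("no '#', '+' or '*' symbol in expression")
    base_path, symbol, rest = match.groups()
    # ElementTree paths are relative to the root element, so drop "addressbook".
    steps = base_path.strip("/").split("/")[1:]
    records = root.findall("/".join(steps)) if steps else [root]
    results = []
    for record in records:
        if symbol == "#":
            # "#": return the value of the named attribute.
            results.append(record.get(rest))
        elif symbol == "*":
            # "*": return all elements matched by the following path.
            results.append([e.text for e in record.findall(rest)])
        elif "*" in rest:
            # "+extra*leaf": extend the base path, then return all leaf elements.
            extra, leaf = rest.split("*", 1)
            results.append([e.text for e in record.findall(extra + "/" + leaf)])
        else:
            # "+name": extend the base path and return that element's text.
            results.append(record.findtext(rest))
    return results


root = ET.fromstring(SAMPLE)
for expr in ("/addressbook/category#name",
             "/addressbook/category/person+LASTNAME",
             "/addressbook/category/person+hobbies*hobby"):
    print(expr, "->", evaluate(root, expr))
```

Each result list has one entry per record matched by the base path, which mirrors the idea that the base path is the "record path".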

Related

I can't extract the node text with a Xpath

I have an XML file (test.xml) like this one:
<?xml version="1.0" encoding="ISO-8859-1"?>
<s2xResponse>
<s2xData>
<Name>This is the name</Name>
<InfocomData>
<DateOfUpdate day="07" month="02" year="2018">20180207</DateOfUpdate>
<CompanyName>MY COMPANY</CompanyName>
<TaxCode FlagCheck="0">XXXYYYWWWZZZ</TaxCode>
</InfocomData>
<AssessmentSummary>
<Rating Code="2">Rating Description for Code 2</Rating>
</AssessmentSummary>
<AssessmentData>
<SectorialDistribution>
<CompaniesNumber>11650</CompaniesNumber>
<ScoreDistribution />
<CervedScoreDistribution>
<DistributionData>
<Rating Code="1">SICUREZZA</Rating>
<Percentage>1.91</Percentage>
</DistributionData>
<DistributionData>
<Rating Code="2">SOLVIBILITA' ELEVATA</Rating>
<Percentage>35.56</Percentage>
</DistributionData>
</CervedScoreDistribution>
</SectorialDistribution>
</AssessmentData>
</s2xData>
</s2xResponse>
I'm trying to get the "Name" node text ("This is the name") with a U-SQL script using the XmlExtractor. The following is the code I'm using:
USE TestXML; // It contains the registered assembly
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@xml = EXTRACT xml_text string
FROM "textxpath/test.xml"
USING Extractors.Text(rowDelimiter: "^", quoting: false);
@xml_cleaned =
SELECT
xml_text.Replace("\r\n", "").Replace("\t", " ") AS xml_text
FROM @xml;
@values =
SELECT Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text, "s2xResponse/s2xData/Name")[1] AS value
FROM @xml_cleaned;
OUTPUT @values TO @"outputs/test_xpath.txt" USING Outputters.Text(quoting: false);
But I'm getting this runtime error:
Execution failed with error '1_SV1_Extract Error :
'{"diagnosticCode":195887116,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXPRESSIONEVALUATION","message":"Error
while evaluating expression
Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text.Replace(\"\r\n\",
\"\").Replace(\"\t\", \" \"),
\"s2xResponse/s2xData/Name\")[1]","description":"Inner exception from
user expression: Index was out of range. Must be non-negative and less
than the size of the collection.
I get the same error even if I use a zero index for the Evaluate result ([0]).
What's wrong with my query?
The problem here is that you are applying the subscript [1] to the result of XPath.Evaluate, which I believe returns the Name nodes. You are applying that subscript in code, not in XPath, so it is likely to be zero-based rather than 1-based as it is in XPath, hence the index out of range error.
Here's one solution: apply the subscript inside the XPath expression (where it is still 1-based), and select the text() there:
.Evaluate("s2xResponse/s2xData/Name[1]/text()")
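The difference between the two indexing conventions is easy to see outside U-SQL. The following is only a Python sketch using the standard library's ElementTree (not the Microsoft.Analytics.Samples.Formats assembly), on a cut-down version of the XML from the question:

```python
import xml.etree.ElementTree as ET

# Cut-down version of test.xml from the question (illustration only).
root = ET.fromstring(
    "<s2xResponse><s2xData><Name>This is the name</Name></s2xData></s2xResponse>"
)

# Index applied in code: the result is an ordinary list, so it is 0-based.
names = root.findall("s2xData/Name")
first_by_code = names[0].text

# Index applied inside the path: XPath positions are 1-based.
first_by_xpath = root.find("s2xData/Name[1]").text

print(first_by_code, "|", first_by_xpath)
```

With a single Name node, `names[1]` in code would raise an IndexError, while `Name[1]` inside the path selects that one node.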
Is there a particular reason you want to use the Evaluate method? I got this to work using the XmlDomExtractor, which would allow you to extract multiple values from the XML, e.g.
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @inputFile string = "/input/input100.xml";
@input =
EXTRACT Name string
FROM @inputFile
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath : "/s2xResponse",
columnPaths : new SQL.MAP<string, string>{
{ "s2xData/Name", "Name" },
}
);
@output =
SELECT *
FROM @input;

How to get element by attribute name which ends with defined word

In my XML I have elements
<driverConfig name="ADriver">
...
</driverConfig>
<driverConfig name="BDriver">
...
</driverConfig>
Is there a way to select all values of the sub-element? The problem is that I can modify only the first name in this expression, which I have already tried without success:
//driverConfig[@name="*Driver"]/fd:properties/fd:property[@name="path"]
With XPath 2.0 you can use //driverConfig[ends-with(@name, 'Driver')]/fd:properties/fd:property[@name="path"] or //driverConfig[matches(@name, 'Driver$')]/fd:properties/fd:property[@name="path"].
With XPath 1.0, which has no ends-with(), you can use //driverConfig[substring(@name, string-length(@name) - 5) = 'Driver']/fd:properties/fd:property[@name="path"].
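The "- 5" in the XPath 1.0 version comes from 1-based positions: 'Driver' has 6 characters, so the suffix starts at position string-length - 6 + 1. Here is a small Python sketch of that arithmetic; `xpath_substring_from` and `ends_with_driver` are made-up helpers that only imitate the two-argument substring() for whole-number start positions, they are not a real XPath engine.

```python
def xpath_substring_from(s, start):
    """Imitate XPath 1.0 substring(s, start) for integer start values.

    XPath positions are 1-based; a start below 1 simply yields the whole string.
    """
    return s[max(start, 1) - 1:]


def ends_with_driver(name):
    # Same test as substring(@name, string-length(@name) - 5) = 'Driver'.
    return xpath_substring_from(name, len(name) - 5) == "Driver"


for name in ("ADriver", "BDriver", "Driver", "DriverX", "Drive"):
    print(name, ends_with_driver(name))
```

For "ADriver" the length is 7, so the start position is 2 and the substring is "Driver".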

Jackrabbit XPath Query: UUID with leading number in path

I have what I think is an interesting problem executing queries in Jackrabbit when a node in the query path is a UUID that starts with a number.
For example, this query works fine, as the second node starts with a letter, 'f':
/*/JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
This query, however, does not, if the first 'f' is replaced with '2':
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
The exception:
Encountered "-" at line 1, column 26.
Was expecting one of:
<IntegerLiteral> ...
<DecimalLiteral> ...
<DoubleLiteral> ...
<StringLiteral> ...
... rest omitted for brevity ...
for statement: for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v
My code in general
def queryString = queryFor path
def queryManager = session.workspace.queryManager
def query = queryManager.createQuery queryString, Query.XPATH // fails here
query.execute().nodes
I'm aware my query, with the leading asterisk, may not be the best, but I'm just starting out with querying in general. Maybe using another language other than XPATH might work.
I tried the advice in this post, adding a save before creating the query, but no luck
Jackrabbit Running Queries against UUID
Thanks in advance for any input!
A solution that worked was to properly escape parts of the query path, namely the individual steps used to build up the path into the repository. The exception message was somewhat misleading, at least to me, as it made me think that the hyphens were part of the root cause. The root problem was that the leading number in the node name created an illegal XPath query, as suggested above.
The fix in this case is to encode each step of the path individually and then build the rest of the query, so that only the leading number ends up escaped:
/*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
Code that represents a list of steps or a path into the Jackrabbit repository:
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.ISO9075;
class Path {
List<String> steps; //...
public String asQuery() {
return steps.size() > 0 ? "/*" + asPathString(encodedSteps()) + "//*" : "//*";
}
private String asPathString(List<String> steps) {
return '/' + StringUtils.join(steps, '/');
}
private List<String> encodedSteps() {
List<String> encodedSteps = new ArrayList<>();
for (String step : steps) {
encodedSteps.add(ISO9075.encode(step));
}
return encodedSteps;
}
}
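For anyone wondering what the encoding does to a step, here is a deliberately simplified Python sketch of the idea. It only escapes a leading ASCII digit as _xHHHH_, which is all this particular case needs; the real ISO9075.encode covers more characters, and `encode_step` and `as_path` are names I made up for illustration.

```python
def encode_step(step):
    """Simplified assumption: escape only a leading ASCII digit as _xHHHH_."""
    if step and step[0] in "0123456789":
        return "_x{:04x}_{}".format(ord(step[0]), step[1:])
    return step


def as_path(steps):
    # Encode each step on its own, then join; the separators stay untouched.
    return "/" + "/".join(encode_step(step) for step in steps)


print(as_path(["JCP", "2eeadeaf-1dae-427f-bf4e-842b07965a93"]))
print(as_path(["JCP", "feeadeaf-1dae-427f-bf4e-842b07965a93"]))
```

Encoding step by step matters because the slashes and wildcards are query syntax, not node names, so they must not be escaped.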
Some more notes:
If we escape more of the query string as in:
/_x002a_/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//_x002a_[@sequence]
Or the original path encoded as a whole as in:
_x002f_a_x002f_fffe4dcf0-360c-11e4-ad80-14feb59d0ab5_x002f_2cbae0dc-35e2-11e4-b5d6-14feb59d0ab5_x002f_c
The queries do not produce the wanted results.
Thanks to @matthias_h and @LarsH
An XML element name cannot start with a digit. See the XML spec's rules for STag, Name, and NameStartChar. Therefore, the "XPath expression"
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
is illegal, because the name test 2eead... isn't a legal XML name.
As such, you can't just use any old UUID as an XML element name nor as a name test in XPath. However if you put a legal NameStartChar on the front (such as _), you can probably use any UUID.
I'm not clear on whether you think you already have XML data with an element named <2eead...> (and are trying to query that element's descendants); if so, whatever tool produced it is broken, as it emits illegal XML. On the other hand if the <2eead...> is something that you yourself are creating, then presumably you have the option of modifying the element name to be a legal XML name.
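The name rule is easy to check with any conforming XML parser. As a quick sketch, here is Python's standard ElementTree rejecting an element name with a leading digit and accepting the same name with an underscore in front (`is_well_formed` is just a helper written for this example):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A name starting with a digit is not a legal XML name.
print(is_well_formed("<2eeadeaf-1dae-427f-bf4e-842b07965a93/>"))
# Prefixing a legal NameStartChar such as "_" makes it acceptable.
print(is_well_formed("<_2eeadeaf-1dae-427f-bf4e-842b07965a93/>"))
```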

Nokogiri: find tag, get attributes and replace tag

I'm working with Nokogiri and I'm a newbie. I'm parsing an HTML document to match some placeholders, and after each match I must replace the widget placeholder with some generated HTML.
I created this method:
doc = Nokogiri::HTML.fragment(raw)
matches = doc.xpath(".//widget")
if matches.present?
  matches.each do |match|
    media_replace(..)
  end
else
  self.body = raw
end
I have some matches, and every match has these attributes:
matches.first.attributes
{"data_id"=>#(Attr:0x3fdd42e2cebc { name = "data_id", value = "5" }),
"data_type"=>#(Attr:0x3fdd42e2ce94 { name = "data_type", value = "gallery" })}
How can I extract these attribute values (gallery and 5) to pass them to my media_replace method?
The media_replace method returns HTML to me: how can I replace every match with the returned HTML?
To get attribute values from a node you can use the [] method. For example:
media_replace(match['data_id'], match['data_type'])
To replace the node, use the replace or swap methods (assuming media_replace returns a string or other compatible object):
new_html = media_replace(...)
match.replace(new_html)

How to get attribute names of element in xml by LINQ in C#

I have this XML element:
<.SECTIONS>
<.SECTION ID ="1" NAME="System Health" CONTROL-TYPE="Button" LINK="http://www.google.co.in/">
<.DATAITEMS>
<./DATAITEMS>
<./SECTION>
<./SECTIONS>
I want to get all the attribute names of the SECTION element (ID, NAME, CONTROL-TYPE, LINK) on the server side using LINQ to XML in C#. What query do I have to write?
As @Giu mentions, your XML is technically malformed with the . preceding each element name.
To get the names of the attributes available in SECTION:
string xmlData = "<SECTIONS> <SECTION ID =\"1\" NAME=\"System Health\" CONTROL-TYPE=\"Button\" LINK=\"http://www.google.co.in/\"> <DATAITEMS> </DATAITEMS> </SECTION> </SECTIONS>";
XDocument doc = XDocument.Parse( xmlData );
//The above line could also be XDocument.Load( fileName ) if you wanted a file
IEnumerable<string> strings = doc.Descendants("SECTIONS")
.Descendants("SECTION")
.Attributes()
.Select( a => a.Name.LocalName );
This will give you an enumerable containing ID, NAME, CONTROL-TYPE, and LINK.
However, if you want the values contained in them, I would use @Giu's answer.
Your XML looks a little bit malformed due to the . before each tag name; I therefore sanitized your XML code by removing the .s, and made a solution based on the following XML code:
<SECTIONS>
<SECTION ID ="1" NAME="System Health" CONTROL-TYPE="Button" LINK="http://www.google.co.in/">
<DATAITEMS> </DATAITEMS>
</SECTION>
</SECTIONS>
Thanks to the sanitized XML code, you now can use the following code snippet to achieve what you want (don't forget the using directive using System.Xml.Linq;):
XDocument doc = XDocument.Parse("<SECTIONS><SECTION ID =\"1\" NAME=\"System Health\" CONTROL-TYPE=\"Button\" LINK=\"http://www.google.co.in/\"><DATAITEMS></DATAITEMS></SECTION></SECTIONS>");
var query = from item in doc.Descendants("SECTIONS").Descendants("SECTION")
select new {
Name = (string)item.Attribute("NAME"),
Id = (string)item.Attribute("ID"),
ControlType = (string)item.Attribute("CONTROL-TYPE"),
Link = (string)item.Attribute("LINK")
};
(Sidenote: You can load your XML code directly from a file (e.g. file.xml), too, by defining the doc variable as follows:
XDocument doc = XDocument.Load(@"C:\Path\To\file.xml");
)
The following code will print the value of each attribute:
foreach (var elem in query)
System.Console.WriteLine(string.Format("{0}, {1}, {2}, {3}", elem.Id, elem.Name, elem.ControlType, elem.Link));
Console output:
1, System Health, Button, http://www.google.co.in/
