XPath fails because Namespace colon in Title - xpath

I'm generating an XML report, using the JDF standard for PDFs going into a printing workflow.
There are 3 "DPart" sections, and I can use an XPath query to recognize them, but I want to grab the "Separation" attribute of each "cip4:Part". I can also write a query that finds those, but it does not distinguish between the multiple "DPart"s.
<DPart End="0" ID="0003" ParentRef="0002" Start="0">
<DPM>
<cip4:Root>
<cip4:Intent cip4:ProductType="ProductPart"/>
<cip4:Production>
<cip4:Resource>
<cip4:Part Separation="K1"/>
<cip4:Color cip4:ActualColorName="Black" cip4:ColorType="Normal"/>
</cip4:Resource>
<cip4:Resource>
<cip4:Part Separation="S1"/>
<cip4:Color cip4:ActualColorName="Dieline" cip4:ColorType="Normal"/>
</cip4:Resource>
<cip4:Resource>
<cip4:ColorantControl ColorantOrder="K1 S1" ColorantParams="K1 S1"/>
</cip4:Resource>
<cip4:Resource>
<eg:InkCoverage>
<eg:InkCov eg:Mm2="0.000000" eg:Pct="0.000000" eg:Separation="K1"/>
<eg:InkCov eg:Mm2="182.337538" eg:Pct="0.721209" eg:Separation="S1"/>
</eg:InkCoverage>
</cip4:Resource>
</cip4:Production>
</cip4:Root>
</DPM>
</DPart>
I want to do something like:
/DPM[2]/*[name ()='cip4:Part'], but it's not working.
I'm in a low-code pre-press environment (Esko Automation Engine), but the system gives me tools to evaluate an XPath expression and throw some JavaScript at the result.

There are at least three reasons your XPath selects nothing:
DPM is not an immediate child of the root node
There is only one DPM, so DPM[2] won't select anything
There is no child of a DPM whose name is cip4:Part.
You also say in the narrative that there are three DParts, which implies that DPart is not actually the outermost element, as it appears to be in your sample. This makes it difficult to provide the correct XPath. However, you might be able to make a start with
(//DPM)[2]//*[name()='cip4:Part']
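To illustrate grouping the Separation values per DPart, here is a minimal Python sketch using the standard library's ElementTree. It is not Esko-specific. The wrapper element `DPartRoot`, the extra DPart with ID "0002", and the namespace URI bound to `cip4` are assumptions made for the example, since the sample in the question shows neither the real outer element nor any namespace declarations.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the question's sample does not show the declaration.
CIP4_NS = "http://www.CIP4.org/JDFSchema_1_1"

# Hypothetical document: a made-up wrapper element holding two DParts.
XML = f"""<DPartRoot xmlns:cip4="{CIP4_NS}">
  <DPart ID="0002"><DPM><cip4:Root><cip4:Production>
    <cip4:Resource><cip4:Part Separation="C1"/></cip4:Resource>
  </cip4:Production></cip4:Root></DPM></DPart>
  <DPart ID="0003"><DPM><cip4:Root><cip4:Production>
    <cip4:Resource><cip4:Part Separation="K1"/></cip4:Resource>
    <cip4:Resource><cip4:Part Separation="S1"/></cip4:Resource>
  </cip4:Production></cip4:Root></DPM></DPart>
</DPartRoot>"""


def separations_by_dpart(xml_text):
    """Map each DPart ID to the Separation values of its cip4:Part descendants."""
    root = ET.fromstring(xml_text)
    ns = {"cip4": CIP4_NS}
    result = {}
    for dpart in root.iter("DPart"):
        # Search only inside this DPart, so the parts stay grouped.
        parts = dpart.findall(".//cip4:Part", ns)
        result[dpart.get("ID")] = [p.get("Separation") for p in parts]
    return result


print(separations_by_dpart(XML))
```

Searching relative to each DPart, rather than across the whole document, is what keeps the separations of one DPart apart from those of the others.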

Can't select XML attributes with Oxygen XQuery implementation; Oxygen XPath emits result

I learned that every XPath expression is also a valid XQuery expression. I'm using Oxygen 16.1 with this sample XML:
<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>
My expression is:
//actor/@id
When I evaluate this expression in Oxygen with XPath 3.0, I get exactly what I expect:
15
38
However, when I evaluate this expression with XQuery 3.0 (also 1.0), I get the message: "Your query returned an empty sequence."
Can anyone provide any insight as to why this is, and how I can write the equivalent XQuery statement to get what the XPath statement did above?
Other XQuery implementations do support this query
If you want to validate that your query (as corrected per discussion in comments) does in fact work with other XQuery implementations when entered exactly as given in the question, you can run it as follows (tested in BaseX):
declare context item := document { <actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors> };
//actor/@id
Oxygen XQuery needs some extra help
Oxygen XML doesn't support serializing attributes, and consequently discards them from a result sequence when that sequence would otherwise be provided to the user.
Thus, you can work around this with a query such as the following:
//actor/@id/string(.)
data(//actor/@id)
Below applies to a historical version of the question.
Frankly, I would not expect //actors/@id to return anything against that data with any valid XPath or XQuery engine, ever.
The reason is that there's only one place you're recursing -- one // -- and that's looking for actors. The single / between the actors and the @id means that they need to be directly connected, but that's not the case in the data you give here -- there's an actor element between them.
Thus, you need to fix your query. There are numerous queries you could write that would find the data you wanted in this document -- knowing which one is appropriate would require more information than you've provided:
//actor/@id - Find actor elements anywhere, and take their id attribute values.
//actors/actor/@id - Find actors elements anywhere; look for actor elements directly under them, and take the id attribute of such actor elements.
//actors//@id - Find all id attributes in subtrees of actors elements.
//@id - Find id attributes anywhere in the document.
...etc.
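As a rough illustration of why the attribute must sit directly on the element named in the last step, here is a small Python sketch using the standard library's ElementTree against the sample data. ElementTree's XPath subset cannot select attributes directly, so the helper `ids_of` (a name invented for this example) reads them from the matched elements instead.

```python
import xml.etree.ElementTree as ET

XML = """<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>"""

root = ET.fromstring(XML)


def ids_of(tag):
    """Return the id values found directly on elements named `tag`."""
    return [e.get("id") for e in root.iter(tag) if "id" in e.attrib]


print(ids_of("actors"))  # analogous to //actors/@id: the actors element has no id
print(ids_of("actor"))   # analogous to //actor/@id
```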

Find HTML Tags in Properties

I need to find HTML tags inside property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by @jcr:score
It looks like there is a problem with the characters < and >, because this query finds everything that has strong in its property. It finds <strong>Some Text</strong> but also This is a strong man.
The Query Builder API didn't help me either.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
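To make the difference between string matching and parsing concrete, here is a minimal Python sketch using the standard library's `html.parser`. It is independent of JCR; the names `TagFinder` and `contains_tag` are invented for this example, and the property values are assumed to be available as plain strings.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Records whether a given start tag occurs in the parsed markup."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == self.tag:
            self.found = True


def contains_tag(value, tag="strong"):
    """Return True if `value` contains an actual <tag> element."""
    finder = TagFinder(tag)
    finder.feed(value)
    finder.close()
    return finder.found


print(contains_tag("<strong>Some Text</strong>"))
print(contains_tag("This is a strong man."))
```

Only the first value contains a real strong element; the second merely contains the word, which is the case a plain text search cannot tell apart.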
If you know the specific jcr:primaryType and the targeted properties, you can do something like this:
select * from nt:unstructured where text like '%<strong>%'
I tested it, but you need to know the properties you are interested in.
This is JCR-SQL syntax.
Start using predicates like a champ; that way all of this will make sense to you!
HTML Encode &lt;strong&gt;
HTML Decimal &#60;strong&#62;
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%&lt;strong&gt;%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
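For reference, the two debugger URLs above can be rebuilt from their predicate lists. The following Python sketch assumes, based only on those URLs, that the predicates are joined with CRLF and percent-encoded as a single `query` parameter; the function name `querydebug_url` is invented for this example.

```python
from urllib.parse import quote

BASE = "http://localhost:4502/libs/cq/search/content/querydebug.html"


def querydebug_url(value):
    """Build a query debugger URL for a 'like' search on the text property."""
    predicates = [
        "path=/content/geometrixx",
        "type=nt:unstructured",
        "property=text",
        "property.operation=like",
        f"property.value={value}",
    ]
    # One predicate per line, then percent-encode everything, including
    # '/', ':' and '=' (safe="" disables the default exemption for '/').
    query = quote("\r\n".join(predicates), safe="")
    return f"{BASE}?charset=UTF-8&query={query}"


print(querydebug_url("%<strong>%"))
print(querydebug_url("%&lt;strong&gt;%"))
```

The first call searches for the literal tag, the second for its HTML-encoded form, which matters if the stored property values are entity-encoded.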
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(@text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So you could read in your HTML with something like HTMLClient and then parse it with an OSGi service that accurately saves these properties for you. Every time the HTML is changed, the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

Can I get the information at which node or element or attribute the xpath failed while evaluating it against an xml

I have some xpath and I am evaluating against an XML.
//view/section/row
[(cell/data[@value='Other Roles']) and
(cell/data[contains(@value,'336')]) and
(cell/data[contains(@value,'0')]) and
(cell/data[contains(@value,'320')]) and
(cell/data[contains(@value,'16')]) and
(cell/data[contains(@value,'0')]) ]
Part of the XPath might not match, say because the row does not have a cell with data 336. Can I find out at which point the match failed?
Is there any code or utility that gives this information?
In general, no.
Even if the result set is empty, it does not mean it fails. It is just an empty result set, which is a valid result. So as a matter of fact, your assumption is wrong, because the XPath did not fail.
If you want to check whether your XPath yields an empty sequence, you can check using fn:empty(), e.g. empty(cell/data[contains(@value,'336')]).
Using XPath 2.0 you can also raise your own errors, using the fn:error() function. However, I do not see how you want to apply that in this specific example in a useful manner.
I've not seen any tools that automatically do this, but manually performing such sanity checks can be very useful:
First check that you're matching views:
//view
then sections:
//view/section
then rows:
//view/section/row
then specific rows:
//view/section/row[(cell/data[@value='Other Roles'])]
...until you get to a point where reality deviates from your expectations. You'll then know where an adjustment must be made.
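The manual procedure above can be automated in a simple form. The following Python sketch uses the standard library's ElementTree, extends a path one step at a time, and reports the first prefix that matches nothing. The function name `first_failing_step` and the sample document are invented for this example, and because ElementTree's XPath subset has no contains(), the final step uses an exact attribute match instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the question does not show its XML.
XML = """<doc><view><section><row>
  <cell><data value="Other Roles"/></cell>
  <cell><data value="320"/></cell>
</row></section></view></doc>"""


def first_failing_step(root, steps):
    """Return the shortest path prefix that matches nothing, or None."""
    path = ""
    for index, step in enumerate(steps):
        # The first step searches all descendants; later steps are children.
        path = f".//{step}" if index == 0 else f"{path}/{step}"
        if not root.findall(path):
            return path
    return None


root = ET.fromstring(XML)
print(first_failing_step(root, ["view", "section", "row", "cell", "data[@value='336']"]))
print(first_failing_step(root, ["view", "section", "row", "cell", "data[@value='320']"]))
```

This only locates where a path stops matching; it does not explain why, and a predicate combining several conditions with `and` would still need to be split up by hand.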

Map two element IDs to one page-object variable

I am using the page-object gem with Watir. During testing I found a field that has the same contents and appears in the same location, but has one of two different IDs depending on how you reach the page.
I tried using XPath:
select_list(:selectionSpecial, :xpath => "//select[@id='t_id9' OR @id='t_id7']")
But was met with a script error.
They are static IDs, but I want to force them into one variable since that would allow me to use the "populate_page_with" feature.
I have a long-winded way currently, but I am fishing for a more efficient way that works with the page-object features.
Does anyone know of a way to do this?
Your approach of using XPath can work. The problem is the syntax errors in the XPath selector. It should be:
"//select[@id='t_id9' or @id='t_id7']"
Note:
The start should be a // rather than a \
Using or is case-sensitive; it has to be lower case
There was also a missing closing ' for the first id attribute
Personally, I find css and xpath selectors harder to use. I would go with the id locator with a regex. The following gives the same results, but some will find it easier to read.
select_list(:selectionSpecial, :id => /^t_id(7|9)$/)
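To check what the regular expression accepts, here is a short Python sketch that applies the same pattern to a few candidate IDs. It is not Watir code; it only illustrates that the anchors and the alternation restrict the match to exactly the two IDs. The helper name `matches_either_id` is invented for this example.

```python
import re

# Same pattern as the :id locator above, written as a Python regex.
ID_PATTERN = re.compile(r"^t_id(7|9)$")


def matches_either_id(element_id):
    """Return True only for the IDs t_id7 and t_id9."""
    return ID_PATTERN.search(element_id) is not None


for candidate in ["t_id7", "t_id9", "t_id8", "t_id79", "xt_id7"]:
    print(candidate, matches_either_id(candidate))
```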

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table, and a section in this table contains a list of users (assignees) that are assigned to a particular unit. There are nearly always multiple users assigned to a unit, and I need to make sure a particular user is assigned to any of the units on the table. I've used XPath for nearly all of my selectors and I'm halfway there on this one. I just can't seem to figure out how to use contains with text() in this context.
Here's what I have so far:
//td[@id='unit']/span [text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456'; testuser]
The XPath Query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.
text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check if the text inside an element contains something you should use contains on the element rather than the result of text() like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
But if you want to do something like regular expressions (using exslt for example) you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]
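The difference between testing an element's full string value and testing a single text node can be seen in a small Python sketch. It uses the standard library's ElementTree, whose XPath subset has no contains(), so the substring test is done in Python on the concatenated text, which plays the role of `.` in the expressions above. The sample markup and the function name `spans_containing` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical table fragment; the question does not show its markup.
HTML = """<table><tr>
  <td id="unit"><span>alice (Primary); <b>testuser</b>; bob</span></td>
  <td id="unit"><span>carol; dave</span></td>
</tr></table>"""


def spans_containing(root, needle):
    """Return unit spans whose full text, descendants included, contains needle."""
    spans = root.findall(".//td[@id='unit']/span")
    return [s for s in spans if needle in "".join(s.itertext())]


root = ET.fromstring(HTML)
matches = spans_containing(root, "testuser")
print(len(matches))

# Looking only at the span's first text node misses the nested element.
first_span = root.find(".//td[@id='unit']/span")
print("testuser" in (first_span.text or ""))
```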
