Using XPath Functions to Clear Unnecessary Information

Using XPath Functions to Clear Unnecessary Information - xpath

We have a result from an XML sync that comes in like this:
<items>
</item>
<extension_attributes>
<coaf type="string">{"additional_fields":{"attribute_id":"425","attribute_code":"last_4_digits","admin_label":"Last 4 Digits","label":"Last 4 Digits","email":"1","pdf":"0","value":"4321"},"which_warehouse":{"attribute_id":"426","attribute_code":"which_warehouse","admin_label":"Warehouse Location","label":"Which Warehouse (Assuming Available Stock)","email":"1","pdf":"0","value":"Either or"}}</coaf>
</extension_attributes>
</item>
</items>
I believe (and have asked, but they only answered with an assumption as well) that we are using XPath 1.0
To get the Match and load the fields, I have set:
/root/items/item/extension_attributes/coaf
And within the value field, I have placed something like a simple translate to make sure all is being found correctly:
translate(/root/items/item/extension_attributes/coaf, 'attribute_code{}"attribute_id','')
Which obviously comes back with a pretty crazy result (to be expected) like:
nlfls::425,:ls4gs,mnll:Ls 4 Dgs,ll:Ls 4 Dgs,ml:1,pf:0,vl:4321,whhwhs::426,:whhwhs,mnll:Whs Ln,ll:Whh Whs (Assmng Avll Sk),ml:1,pf:0,vl:Eh
So we are sure we are in the right place, just unsure how to attack this.
We have tried replace, but unsure how to combine replace commands together to get rid of the extra information, especially because the value field within this program we are using is (currently) limited to 100 characters. So a lengthy replace line to get all the extra information would likely go over the character count.
What would be the best way to shed all the unnecessary information and essentially just retain the important fields like only keeping label:title1 & value:value1 and label:title2 & value:value2 ?

Related

XPath fails because Namespace colon in Title

I'm generating an XML report, using the JDF standard for PDFs going into a printing workflow.
There are 3 "DPart" sections, and I can use an xPath query to recognize them, but I want to grab the "Separation" attribute of each "cip4:Part". I can also get a query to find that, but it does not distinguish between the multiple "DPart"s.
<DPart End="0" ID="0003" ParentRef="0002" Start="0">
<DPM>
<cip4:Root>
<cip4:Intent cip4:ProductType="ProductPart"/>
<cip4:Production>
<cip4:Resource>
<cip4:Part Separation="K1"/>
<cip4:Color cip4:ActualColorName="Black" cip4:ColorType="Normal">
</cip4:Resource>
<cip4:Resource>
<cip4:Part Separation="S1"/>**
<cip4:Color cip4:ActualColorName="Dieline" cip4:ColorType="Normal">
</cip4:Resource>
<cip4:Resource>
<cip4:ColorantControl ColorantOrder="K1 S1" ColorantParams="K1 S1"/>
</cip4:Resource>
<cip4:Resource>
<eg:InkCoverage>
<eg:InkCov eg:Mm2="0.000000" eg:Pct="0.000000" eg:Separation="K1"/>
<eg:InkCov eg:Mm2="182.337538" eg:Pct="0.721209" eg:Separation="S1"/>
</eg:InkCoverage>
</cip4:Resource>
</cip4:Production>
</cip4:Root>
</DPM>
</DPart>
I want to do something like:
/DPM[2]/*[name ()='cip4:Part'], but it's not working.
I'm in a low-code pre-press environment (Esko Automation Engine), but the system gives me tools to parse an xPath, and throw some JavaScript at it.

There are at least three reasons your XPath selects nothing:
DPM is not an immediate child of the root node
There is only one DPM, so DPM[2] won't select anything
There is no child of a DPM whose name is cip4:Part.
You also say in the narrative that there are three DPart's, which implies that DPart is not actually the outermost element as it appears to be in your sample. This makes it difficult to provide the correct XPath. However, you might be able to make a start with
(//DPM)[2]//*[name()='cip4:Part']

Navigating XML via Xpath

I am new to Xpath and have been trying to get my head around some basic examples using xpath testing sites before I tackle a much more complex piece.
I am trying to understand exactly how to use the contains function in conjunction with other condition(s), but struggling a bit.
Here is my xml:
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="4">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
</root>
To replicate the type of xml (or html to be more precise) I am trying to parse, I have got one of the singer attributes repeated.
So I am trying to use contains to find the foo:singer id = 4 and contains "Tom Waits".
From what I have read and examples I have seen, you can use this type of expression:
.//*[#id =4 and //foo:singer[contains(text(),'Tom Waits')]]/text()
However, this returns both Tom Waits and BB King.
If I use these two expressions separately, they get the expected results, so not sure exactly if/how they can be combined.
Many thanks if you can assist me.
Andrew

Be sure to pay attention to context. There's no need for the nested predicate.
.//*[#id =4 and contains(.,'Tom Waits')]/text()
So I am trying to use contains to find the foo:singer id = 4 and contains "Tom Waits".
Since you're using //foo:singer for the contains test, the entire document is in context so it's always true.

Use
//foo:singer[contains(text(),'Tom Waits')]/text()

Find HTML Tags in Properties

My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?

I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.

If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax

Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'

Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table and a section in this table contains a list of users (assignees) that are assigned to a particular unit. There is nearly always multiple users assigned to a unit and I need to make sure a particular user is assigned to any of the units on the table. I've used XPath for nearly all of my selectors and I'm half way there on this one. I just can't seem to figure out how to use contains with text() in this context.
Here's what I have so far:
//td[#id='unit']/span [text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456'; testuser]
The XPath Query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.

text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check if the text inside an element contains something you should use contains on the element rather than the result of text() like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
But if you want to do something like regular expressions (using exslt for example) you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]

XPath concat multiple nodes

I'm not very familiar with xpath. But I was working with xpath expressions and setting them in a database. Actually it's just the BAM tool for biztalk.
Anyway, I have an xml which could look like:
<File>
<Element1>element1<Element1>
<Element2>element2<Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
<Element3>
</File>
I was wondering if there is a way to use an xpath expression of getting all the SubElements concatted? At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if it only has one index. But apparently my xml sometimes has more nodes, so it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][0]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need to have a one line code to get a result like: "subel2a subel2b";
I need the one line because I set this xpath expression as an xml attribute (not my choice, it's specified). I tried string-join but it's not really working.

string-join(/file/Element3/SubElement, ',')

/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.

You can get all SubElements by using:
//SubElement
But this won't keep them grouped together how you want. You will want to do a query for all elements that contain a SubElement (basically do a search for the parent of any SubElements).
//parent::SubElement
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Using XPath Functions to Clear Unnecessary Information - xpath

Related

XPath fails because Namespace colon in Title

Navigating XML via Xpath

Find HTML Tags in Properties

Retrieve an xpath text contains using text()

XPath concat multiple nodes

Categories

Resources