Use Xpath to find the appropriate element based on the element value - xpath

I have the following xml snippet
<ZMARA01 SEGMENT="1">
<CHARACTERISTICS_01>X,001,COLOR_ATTRIBUTE_FR,BRUN ÉCORCE,TMBR,French C</CHARACTERISTICS_01>
<CHARACTERISTICS_02>X,001,COLOR_ATTRIBUTE,Timber Brown,TMBR,Color Attr</CHARACTERISTICS_02>
</ZMARA01>
I am looking for an XPath expression that will match based on COLOR_ATTRIBUTE. It will not always be in CHARACTERISTICS_02; it could be in any CHARACTERISTICS_XX element. I also don't want to match COLOR_ATTRIBUTE_FR. I have been using this:
Transaction.Input_XML{/ZMAT/IDOC/E1MARAM/ZMARA01/*[starts-with(local-name(.), 'CHARACTERISTICS_')][contains(.,'COLOR_ATTRIBUTE')]}
This gets me mostly there but it matches both COLOR_ATTRIBUTE and COLOR_ATTRIBUTE_FR

Use:
contains(concat(',', ., ','), ',COLOR_ATTRIBUTE,')
This first surrounds the string value of the context node with commas, then simply tests whether the string so constructed contains ',COLOR_ATTRIBUTE,'.
Thus we treat all cases (pattern at the start of the string, pattern at the end of the string, and pattern neither at the start nor at the end) in the same single way.
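As a rough illustration of the same delimiter-wrapping idea outside XPath, here is a minimal Python sketch. It assumes the snippet from the question is parsed with the standard library's xml.etree.ElementTree, whose XPath subset has no contains() or concat(), so the test is written in plain Python. The helper name has_token is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<ZMARA01 SEGMENT="1">
<CHARACTERISTICS_01>X,001,COLOR_ATTRIBUTE_FR,BRUN ÉCORCE,TMBR,French C</CHARACTERISTICS_01>
<CHARACTERISTICS_02>X,001,COLOR_ATTRIBUTE,Timber Brown,TMBR,Color Attr</CHARACTERISTICS_02>
</ZMARA01>"""


def has_token(value, token, sep=","):
    # Wrap both the value and the token in separators so that a token at the
    # start, in the middle, or at the end is handled by one substring test.
    return f"{sep}{token}{sep}" in f"{sep}{value}{sep}"


root = ET.fromstring(XML)
matches = [
    child.tag
    for child in root
    if child.tag.startswith("CHARACTERISTICS_")
    and has_token(child.text or "", "COLOR_ATTRIBUTE")
]
print(matches)
```

Only the element holding COLOR_ATTRIBUTE as a whole comma-separated field should be reported; COLOR_ATTRIBUTE_FR is rejected because it is followed by `_`, not by a comma.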

If COLOR_ATTRIBUTE is guaranteed not to be in the first or last position, you could use [contains(.,',COLOR_ATTRIBUTE,')]; otherwise you could use something like [contains(.,'COLOR_ATTRIBUTE') and not(contains(.,'COLOR_ATTRIBUTE_FR'))].

Related

Is it possible in XPATH to find an element by attribute value, not by name?

For example I have an XML element:
<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code">
Suppose I know that some attribute must have the value otp, but I don't know which attribute it is. Is it possible to write an XPath expression along these lines:
.//input[(contains(*, "otp")) or (contains(*, "ode"))]
Try it like this and see if it works:
one = '//input/@*[(contains(.,"otp") or contains(.,"ode"))]/..'
print(driver.find_elements_by_xpath(one))
Edit:
The contains() function has a required cardinality of first argument of either one or zero. In plain(ish) English, it means you can check only one element at a time to see if it contains the target string.
So, the expression above goes through each attribute of input separately (/@*), checks whether the value of that specific attribute contains the target string and, if the target is found, goes up to the parent of that attribute (/..) which, in the case of an attribute, is the element itself (input).
This XPath expression selects all <input> elements that have some attribute, whose string value contains "otp" or "ode". Notice that there is no need to "go up to the parent ..."
//input[@*[contains(., 'otp') or contains(., 'ode')]]
If we know that "otp" or "ode" must be the whole value of the attribute (not just a substring of the value), then this expression is stricter and more efficient to evaluate:
//input[@*[. = 'otp' or . = 'ode']]
In this latter case ("otp" or "ode" is the whole value of the attribute), if we have to compare against many values, then an XPath expression of the above form will quickly become too long. There is a way to simplify such a long expression and do just a single comparison:
//input[@*[contains('|s1|s2|s3|s4|s5|', concat('|', ., '|'))]]
The above expression selects all input elements in the document, that have at least one attribute whose value is one of the strings "s1", "s2", "s3", "s4" or "s5".
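For comparison, a minimal Python sketch of both tests (whole value versus substring), assuming the markup can be parsed as XML with the standard library's xml.etree.ElementTree. The surrounding form element, the second input, and the helper names are hypothetical. A set lookup plays the role of the pipe-delimited string and cannot be fooled by an attribute value that itself contains the delimiter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the first <input> comes from the question.
XML = """<form>
<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code"/>
<input id="user" type="text" name="login"/>
</form>"""

WANTED = {"otp", "ode"}


def has_attr_value(elem, wanted):
    # True if at least one attribute value is exactly one of the wanted strings.
    return any(value in wanted for value in elem.attrib.values())


def has_attr_substring(elem, needles):
    # True if at least one attribute value contains one of the needles.
    return any(n in value for value in elem.attrib.values() for n in needles)


root = ET.fromstring(XML)
exact = [e.get("id") for e in root.iter("input") if has_attr_value(e, WANTED)]
partial = [e.get("id") for e in root.iter("input") if has_attr_substring(e, WANTED)]
print(exact, partial)
```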

Need XPath and XQuery query

I'm working on Xpath/Xquery to return values of multiple child nodes based on a sibling node value in a single query. My XML looks like this
<FilterResults>
<FilterResult>
<ID>535</ID>
<Analysis>
<Name>ZZZZ</Name>
<Identifier>asdfg</Identifier>
<Result>High</Result>
<Score>0</Score>
</Analysis>
<Analysis>
<Name>XXXX</Name>
<Identifier>qwerty</Identifier>
<Result>Medium</Result>
<Score>0</Score>
</Analysis>
</FilterResult>
<FilterResult>
<ID>745</ID>
<Analysis>
<Name>XXXX</Name>
<Identifier>xyz</Identifier>
<Result>Critical</Result>
<Score>0</Score>
</Analysis>
<Analysis>
<Name>YYYY</Name>
<Identifier>qwerty</Identifier>
<Result>Medium</Result>
<Score>0</Score>
</Analysis>
</FilterResult>
</FilterResults>
I need to get the values of Score and Identifier based on the Name value. I'm currently trying the query below, but it is not working as desired:
fn:string-join((
for $Identifier in fn:distinct-values(FilterResults/FilterResult/Analysis[Name="XXXX"])
return fn:string-join((//Identifier,//Score),'-')),',')
The output I'm looking for is this:
qwerty-0,xyz-0
Your question suggests some fundamental misunderstandings about XQuery, generally. It's hard to explain everything in a single answer, but 1) that is not how distinct-values works (it returns string values, not nodes), and 2) the double slash selections in your return statement are returning everything because they are not constrained by anything. The XPath you use inside the distinct-values call is very close, however.
Instead of calling distinct-values, you can assign the Analysis results of that XPath to a variable, iterate over them, and generate concatenated strings. Then use string-join to comma separate the full sequence. Note that in the return statement, the variable $a is used to concat only one pair of values at a time.
string-join(
let $analyses := FilterResults/FilterResult/Analysis[Name="XXXX"]
for $a in $analyses
return $a/concat(Identifier, '-', Score),
',')
=> qwerty-0,xyz-0
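The same select, pair, and join steps can be sketched in Python for readers who want to check the expected output without an XQuery processor. This is an equivalent under the assumption that the document is loaded with the standard library's xml.etree.ElementTree, not the XQuery itself. The XML is the sample from the question, compacted.

```python
import xml.etree.ElementTree as ET

XML = """<FilterResults>
<FilterResult>
<ID>535</ID>
<Analysis><Name>ZZZZ</Name><Identifier>asdfg</Identifier><Result>High</Result><Score>0</Score></Analysis>
<Analysis><Name>XXXX</Name><Identifier>qwerty</Identifier><Result>Medium</Result><Score>0</Score></Analysis>
</FilterResult>
<FilterResult>
<ID>745</ID>
<Analysis><Name>XXXX</Name><Identifier>xyz</Identifier><Result>Critical</Result><Score>0</Score></Analysis>
<Analysis><Name>YYYY</Name><Identifier>qwerty</Identifier><Result>Medium</Result><Score>0</Score></Analysis>
</FilterResult>
</FilterResults>"""

root = ET.fromstring(XML)

# Select only the Analysis elements whose Name child is XXXX.
analyses = root.findall("FilterResult/Analysis[Name='XXXX']")

# Build one "Identifier-Score" string per selected Analysis, relative to that node.
pairs = [f"{a.findtext('Identifier')}-{a.findtext('Score')}" for a in analyses]

result = ",".join(pairs)
print(result)
```

As in the XQuery, the key point is that Identifier and Score are read relative to each selected Analysis rather than from the whole document.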

Filtering by multiple values using XPath

I am trying to filter an XML document of Jobs by the Company name.
I am able to pull all items that match specific Company names using:
doc.xpath("/source/job[company[text() = 'BigCorp' or text() = 'MegaCorp']]")
I am unable to do the opposite and exclude by these values, using something like:
doc.xpath("/source/job[company[text() != 'Hodes' or text() != 'Scurri']]")
Where am I going wrong? Is there a way to provide a comma-separated list of values?
Try changing the or to and:
doc.xpath("/source/job[company[text() != 'Hodes' and text() != 'Scurri']]")
If you use or, it's always going to return the job.
For example, it would return the job with the company Hodes because text() != 'Scurri' is true (and vice versa).
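The difference between or and and in an exclusion test can be checked with a small Python sketch. The sample jobs and the title element are hypothetical; only the source, job, and company element names come from the question. The last variant shows one way to exclude against a list of values in the host language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration.
XML = """<source>
<job><title>A</title><company>Hodes</company></job>
<job><title>B</title><company>Scurri</company></job>
<job><title>C</title><company>BigCorp</company></job>
</source>"""

EXCLUDED = ("Hodes", "Scurri")
root = ET.fromstring(XML)


def titles(keep):
    # Return the titles of the jobs whose company name passes the keep() test.
    return [j.findtext("title") for j in root.findall("job") if keep(j.findtext("company"))]


# "or": every name differs from at least one of the two values, so nothing is excluded.
with_or = titles(lambda c: c != "Hodes" or c != "Scurri")
# "and": the name must differ from both values.
with_and = titles(lambda c: c != "Hodes" and c != "Scurri")
# Equivalent exclusion against a list of values.
not_in = titles(lambda c: c not in EXCLUDED)
print(with_or, with_and, not_in)
```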
Regarding the following comment:
so normalize-space() did it!
doc.xpath("/source/job[company[normalize-space() != 'Hodes' and normalize-space() != 'Scurri']]") not sure why?
The reason normalize-space() worked is because text() is also going to return whitespace.
For example, if you have an element like:
<company>
Hodes
</company>
or:
<company> Hodes </company>
the text() would equal "_Hodes_". (I replaced the spaces with _ to make them easier to see.)
Because of the whitespace, "_Hodes_" doesn't equal "Hodes".
Using normalize-space() will strip the leading/trailing whitespace and replace multiple spaces with a single space.
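A minimal Python sketch of what normalize-space() does to such a value. The function below is an approximation written for illustration, limited to the four whitespace characters XPath recognizes (space, tab, carriage return, line feed); it is not a library call.

```python
import re


def normalize_space(text):
    # Collapse each run of XPath whitespace (space, tab, CR, LF) to one space,
    # then drop the single leading and trailing space that may remain.
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


raw = "\n  Hodes\n"
# The raw text node differs from 'Hodes'; the normalized value does not.
print(raw == "Hodes", normalize_space(raw) == "Hodes")
```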

Select attribute and text() in the same query

I would like to select a attribute and the text() value of a node in one query, e.g. I have
<Tag1 myattr='test'>MyText</Tag1>
and I am interested in getting "test" and "MyText" with one query.
The obvious
//Tag1/@myattr | //Tag1/text()
fails because unions are only allowed over node-sets.
Any ideas?
I think, given XPath 2.0, you want a sequence of string values, which you get with //Tag1/(@myattr, .)/string(). If you want a single string, then use //Tag1/string-join((@myattr, .), ' ').
BTW, your path //Tag1/@myattr | //Tag1/text() would select a sequence containing an attribute node and a text node. I don't see how that would fail.
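If the query is run from a host language rather than as a single XPath expression, both values can also be read from the selected element directly. A minimal Python sketch, assuming the element is well-formed XML with an attribute named myattr and is parsed with the standard library's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<Tag1 myattr='test'>MyText</Tag1>")

# Attribute value and text content, in the same order as (@myattr, .).
values = [elem.get("myattr"), elem.text]

# Single-string form, comparable to string-join((@myattr, .), ' ').
joined = " ".join(values)
print(values, joined)
```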

How to check IDREFS count is bigger than 1 in xPath

Before this is marked as a duplicate, I need the xpath expression and not the xquery expression. So this didn't help me: How to check IDREFS length in xPath
Also, I tried using id function as suggested here: xpath: contains() for a group of answers
but this only returns empty results for me.
I'm using the XML plugin for Notepad++, if that matters.
I have the following DTD definition:
<!ELEMENT testNode EMPTY>
<!ATTLIST testNode
listOfNodes IDREFS #REQUIRED
bestNode IDREF #REQUIRED
>
When I get /testNode/@listOfNodes, I have to check whether there is more than one reference in listOfNodes. How can I do that?
Thanks!
One possibility with XPath 1.0:
Count how many separators (spaces) the attribute contains. This is the length of the original string minus the length of the string with the spaces removed.
string-length(/testNode/@listOfNodes) - string-length(translate(/testNode/@listOfNodes, ' ', ''))
Therefore your test would be:
string-length(/testNode/@listOfNodes) - string-length(translate(/testNode/@listOfNodes, ' ', '')) + 1 > 1
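The arithmetic can be checked with a short Python sketch. The sample attribute values (n1, n2, n3) and the function names are hypothetical. The second function splits on whitespace runs and is included only to show where the space-counting approach overcounts, namely when the value has repeated or trailing spaces.

```python
def count_by_spaces(value):
    # Mirrors the XPath arithmetic: number of space characters plus one.
    return len(value) - len(value.replace(" ", "")) + 1


def count_tokens(value):
    # Splits on whitespace runs, so repeated or surrounding spaces
    # do not inflate the count.
    return len(value.split())


for refs in ("n1", "n1 n2 n3", "n1  n2 "):
    print(repr(refs), count_by_spaces(refs), count_tokens(refs))
```

For a value whose whitespace has already been normalized, the two counts agree.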
To find occurrences of the attribute with only a single token, I'd use //testNode/@listOfNodes[not(contains(., ' '))]. If I were condemned to work in an environment without validation, that would change to //testNode/@listOfNodes[not(contains(normalize-space(.), ' '))]. To find occurrences with multiple IDREF tokens, remove the not().
