Shorten XPATH with wildcards - xpath

I'm currently trying to figure out how to shorten my extremely long xpath.
//div[@class='m_set_part'][1]/div/div[2]/div[@class='row']/div[@class='col details detail-head']/div[@class='detail-body']/div[2]/div/div[@class='size']/div/div[@class='m_product_finder_size']/ul/li[1]/span[@class='size-btn']/a
This is the one I have right now and it's way too long. The problem is that I need the first node to differentiate between products. Is there a way to shorten it to something like
//div[@class='m_set_part']/*/span[@class='size-btn']/a
Or do I have to go through all child nodes to reach the last nodes?
I want to find the size buttons for each product. The only way to differentiate them, I guess, is by adding a [1] or [2] to the m_set_part node.

You are basically correct. As said in the comments, you can use // to select descendant or self nodes. Hence, this will give you all the size links:
//span[@class='size-btn']/a
As you suggest, you can select the specific product using a positional predicate. However, if you prefer, you could also use another detail, e.g. the name. This would simply be
//div[@class="m_set_part"][.//label="Vælg"]
to give you the Vælg product.
Now combine both and you get the size link for this specific product using
//div[@class="m_set_part"][.//label="Vælg"]//span[@class='size-btn']/a
or, using the positional predicate,
//div[@class="m_set_part"][1]//span[@class='size-btn']/a
Also, please make sure you use a proper namespace, as this is an actual XHTML document. One other thing: you might prefer contains(@class, 'm_set_part') over @class="m_set_part" and the like, because the query will still work even if they add new CSS classes to this element.
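As a rough illustration of the two approaches above, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The markup is a made-up, much-simplified stand-in for the real page, and because ElementTree implements only a subset of XPath (no contains(), no nested path predicates), the product is picked in Python rather than inside the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the real product page.
PAGE = """
<body>
  <div class="m_set_part">
    <div><label>Vælg</label></div>
    <div class="size"><ul>
      <li><span class="size-btn"><a href="/size/s">S</a></span></li>
      <li><span class="size-btn"><a href="/size/m">M</a></span></li>
    </ul></div>
  </div>
  <div class="m_set_part">
    <div><label>Other</label></div>
    <div class="size"><ul>
      <li><span class="size-btn"><a href="/size/l">L</a></span></li>
    </ul></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# '//' skips the intermediate levels: every size link in the document.
all_links = root.findall(".//span[@class='size-btn']/a")

# Pick one product first, then search only below it.
products = root.findall(".//div[@class='m_set_part']")
first_product_links = products[0].findall(".//span[@class='size-btn']/a")

# Same idea, but choosing the product by its label text instead of position.
by_label = [p for p in products if p.findtext(".//label") == "Vælg"]
label_links = by_label[0].findall(".//span[@class='size-btn']/a")

print([a.text for a in all_links])            # ['S', 'M', 'L']
print([a.text for a in first_product_links])  # ['S', 'M']
print([a.text for a in label_links])          # ['S', 'M']
```

With a full XPath 1.0 engine, the expressions from the answer could be passed in unchanged and the Python-side filtering would not be needed.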

To answer your question: no, you don't have to go through all nodes.
You may use the // (descendant-or-self) abbreviation to skip zero or more nodes between the preceding and the next part of the expression. So //div[@class='m_set_part']//span[@class='size-btn']/a might give you exactly what you want. * on the other hand matches any element, but exactly one level. Therefore
//div[@class='m_set_part'][1]/*/*[2]/*[@class='row']/*[@class='col details detail-head']/*[@class='detail-body']/*[2]/*/*[@class='size']/*/*[@class='m_product_finder_size']/*/*[1]/*[@class='size-btn']/a
is another way to shorten your original expression. Whether it still returns only the node you are interested in, or more, depends solely on the document you apply the expression to.
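The difference between * and // can be seen on a tiny, invented document. This is only a sketch using Python's standard-library ElementTree, which supports both forms:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <c> sits two levels below the root, one sits three.
root = ET.fromstring("<a><b><c id='1'/></b><b><x><c id='2'/></x></b></a>")

# '*' stands for exactly one element level.
one_level = [c.get("id") for c in root.findall("./*/c")]

# '//' stands for any number of levels in between.
any_depth = [c.get("id") for c in root.findall(".//c")]

print(one_level)  # ['1']
print(any_depth)  # ['1', '2']
```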

Related

How to uniquely identify two objects on the same page having the same URL

I have two objects on the same page but in different locations (tabs). I want to verify each of those objects separately ...
I can't uniquely identify either of the objects because they have the same properties.
These objects are clearly distinguishable, because they have completely different text, so you will be able to create an object description that matches only one of them. My suggestion would be to look for the object using its text property: one of them will always have "Top Ranking"; for the other you will need to turn the text into a regular expression, something like "Participants \(\d+\)".
I am assuming that this next answer is unlikely to be possible so saved it for after the answer you are likely to use but the best solution would of course be to get someone with access to give these elements ids for you to search for. This will in the long term be much easier for you to maintain and not using text will allow this test to run in any language.
Manaysah, do these objects have different indexes? Use the object spy and determine which index they have, the ordinal identifier index may be a solution to your problem. You could also try adding an innertext object property if possible, using a wildcard for the number inside the () as it appears dynamic.
Try using XPath for the objects... the XPath will definitely be different.

XPath simple conditional statement? If node X exists, do Y?

I am using google docs for web scraping. More specifically, I am using the Google Sheets built in IMPORTXML function in which I use XPath to select nodes to scrape data from.
What I am trying to do is basically check if a particular node exists, if YES, select some other random node.
/*IF THIS NODE EXISTS*/
if(exists(//table/tr/td[2]/a/img[@class='special'])){
/*SELECT THIS NODE*/
//table/tr/td[2]/a
}
You don't have logic quite like that in XPath, but you might be able to do something like what you want.
If you want to select //table/tr/td[2]/a but only if it has an img[@class='special'] in it, then you can use //table/tr/td[2]/a[img[@class='special']].
If you want to select some other node in some other circumstance, you could union two paths (the | operator), and just make sure that each has a filter (within []) that is mutually exclusive, like having one be a path and the other be not() of that path. I'd give an example, but I'm not sure what "other random node" you'd want… Perhaps you could clarify?
The key thing is to think of XPath as a querying language, not a procedural one, so you need to be thinking of selectors and filters on them, which is a rather different way of thinking about problems than most programmers are used to. But the fact that the filters don't need to specifically be related to the selector (you can have a filter that starts looking at the root of the document, for instance) leads to some powerful (if hard-to-read) possibilities.
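To make the "selector plus filter" idea concrete, here is a small sketch in Python using the standard-library ElementTree on an invented table. ElementTree does not support nested predicates such as a[img[@class='special']], so the filter is applied in Python; a full XPath engine could evaluate the single expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: only the first row's link contains the "special" image.
root = ET.fromstring("""
<html><body><table>
  <tr><td>1</td><td><a href="/one"><img class="special"/></a></td></tr>
  <tr><td>2</td><td><a href="/two"><img class="plain"/></a></td></tr>
</table></body></html>
""")

# Selector: every link in the second cell of a row.
candidates = root.findall(".//table/tr/td[2]/a")

# Filter: keep a link only if it has an img child with class 'special'.
special = [a for a in candidates if a.find("img[@class='special']") is not None]

print([a.get("href") for a in special])  # ['/one']
```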
Use:
/self::node()[//table/tr/td[2]/a/img[@class='special']]
//table/tr/td[2]/a

Understanding X-Path Expression

I'm trying to get an understanding of XPath in order to parse a diffxml file. I skimmed over the w3schools site. Am I understanding these correctly?
Statement 1: /node()[1]/node()[3]
Selects the third child of the root node
Statement 2: /node()[1]/node()[1]/node()[1]
Selects the child of the first node of the root node
Statement 3: /node()[1]/node()[3]/node()[2]
Selects the second child of the third node under the root node.
Yes, you understand them correctly, but this is not how you'd use XPath. First, node() can be anything, not just elements. Second, a bare index is arguably the worst way of selecting things; you should really use names, and possibly predicates for filtering the node-sets.
You'll find a lot of criticism of w3schools on this site. Personally I find it a useful resource, but only when I'm trying to remind myself of something I once knew. It's not really designed for teaching yourself things from scratch, and I suggest you need a different learning strategy. Call me old-fashioned, but when I'm learning a new technology I find there's nothing better than a good book.
You've understood your examples correctly as far as I can tell. But have you understood what a "node" is? For example, do you know under what circumstances whitespace text counts as a node? The key to understanding XPath is to understand the data model, and the way in which the data model relates to the lexical (angle-bracket) form of the XML.
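The whitespace point is easy to check. The sketch below uses Python's standard-library xml.dom.minidom on an invented three-line document; whether whitespace-only text nodes are kept depends on the parser or XPath processor in use, so treat this as one possible behaviour rather than a rule.

```python
from xml.dom import minidom

# Hypothetical document with ordinary indentation between the elements.
doc = minidom.parseString("<root>\n  <a/>\n  <b/>\n</root>")

children = doc.documentElement.childNodes
names = [n.nodeName for n in children]
print(names)  # ['#text', 'a', '#text', 'b', '#text']

# Counting only elements gives a different "third child" than counting all nodes.
element_names = [n.nodeName for n in children if n.nodeType == n.ELEMENT_NODE]
print(element_names)  # ['a', 'b']
```

Under this data model the third node() child of the root element is the whitespace between the two elements, not an element, which is why selecting by element name is usually safer than selecting by node index.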

How to use not contains() in XPath?

I have some XML that is structured like this:
<whatson>
  <productions>
    <production>
      <category>Film</category>
    </production>
    <production>
      <category>Business</category>
    </production>
    <production>
      <category>Business training</category>
    </production>
  </productions>
</whatson>
And I need to select every production with a category that doesn't contain "Business" (so just the first production in this example).
Is this possible with XPath? I tried working along these lines but got nowhere:
//production[not(contains(category,'business'))]
XPath queries are case sensitive. Having looked at your example (which, by the way, is awesome, nobody seems to provide examples anymore!), I can get the result you want just by changing "business" to "Business":
//production[not(contains(category,'Business'))]
I have tested this by opening the XML file in Chrome and using the Developer Tools to execute that XPath query, and it gave me just the Film category back.
I need to select every production with a category that doesn't contain "Business"
Although I upvoted @Arran's answer as correct, I would also add this...
Strictly interpreted, the OP's specification would be implemented as
//production[category[not(contains(., 'Business'))]]
rather than
//production[not(contains(category, 'Business'))]
The latter selects every production whose first category child doesn't contain "Business". The two XPath expressions will behave differently when a production has no category children, or more than one.
It doesn't make any difference in practice as long as every <production> has exactly one <category> child, as in your short example XML. Whether you can always count on that being true or not, depends on various factors, such as whether you have a schema that enforces that constraint. Personally, I would go for the more robust option, since it doesn't "cost" much... assuming your requirement as stated in the question is really correct (as opposed to e.g. 'select every production that doesn't have a category that contains "Business"').
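The difference between the two expressions only shows up on edge cases, so here is a sketch that mimics both predicates in plain Python. The "mixed" and "empty" productions are invented for illustration, the id attributes are not in the original XML, and the helper functions are hypothetical stand-ins for the XPath predicates (standard-library ElementTree has no contains() or not()). It also assumes each category holds plain text only.

```python
import xml.etree.ElementTree as ET

# The question's data plus two hypothetical edge cases ("mixed" and "empty").
root = ET.fromstring("""
<productions>
  <production id="film"><category>Film</category></production>
  <production id="business"><category>Business</category></production>
  <production id="training"><category>Business training</category></production>
  <production id="mixed"><category>Business</category><category>Film</category></production>
  <production id="empty"/>
</productions>
""")

def has_category_without_business(production):
    # Mirrors //production[category[not(contains(., 'Business'))]]
    return any("Business" not in (c.text or "")
               for c in production.findall("category"))

def first_category_lacks_business(production):
    # Mirrors //production[not(contains(category, 'Business'))]:
    # a node-set argument is reduced to its first node, or "" if it is empty.
    return "Business" not in (production.findtext("category") or "")

productions = root.findall("production")
strict = [p.get("id") for p in productions if has_category_without_business(p)]
loose = [p.get("id") for p in productions if first_category_lacks_business(p)]

print(strict)  # ['film', 'mixed']
print(loose)   # ['film', 'empty']
```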
You can use the not(expression) function.
not() is a function in XPath (as opposed to an operator).
Example:
//a[not(contains(@id, 'xx'))]
OR
expression != true()
It should be an XPath using not(contains()): //production[not(contains(category,'Business'))]

XPath Query in JMeter

I'm currently working with JMeter in order to stress test one of our systems before release. For this, I need to simulate users clicking links on the webpage presented to them. I've decided to extract these links with an XPath Post-Processor.
Here's my problem:
I have an XPath expression that looks something like this:
//div[@data-attrib="foo"]//a//@href
However, I need to extract a specific child for each thread (user). I want to do something like this:
//div[@data-attrib="foo"]//a[position()=n]//@href
(n being the current index)
My question:
Is there a way to make this query work, so that I'm able to extract a new index of the expression for each thread?
Also, as I mentioned, I'm using JMeter. JMeter creates a variable for each of the resulting nodes of an XPath query. However, it names them "VarName_n" and doesn't store them as a traditional array. Does anyone know how I can dynamically pick one of these variables, if possible? This would also solve my problem.
Thanks in advance :)
EDIT:
Nested variables are apparently not supported, so in order to dynamically refer to variables that are named "VarName_1", "VarName_2" and so forth, this can be used:
${__BeanShell(vars.get("VarName_${n}"))}
Where "n" is an integer. So if n == 1, this will get the value of the variable named "VarName_1".
If the "n" integer changes during a single thread, the ForEach controller is designed specifically for this purpose.
For the first question -- use:
(//div[@data-attrib="foo"]//a)[position()=$n]/@href
where $n must be substituted with a specific integer.
Here we also assume that //div[@data-attrib="foo"] selects a single div element.
Do note that the XPath pseudo-operator // typically results in very slow evaluation (a complete subtree is searched) and also in other confusing problems (this is why the parentheses are needed in the above expression).
It is recommended to avoid using // whenever the structure of the document is known and a complete, concrete path can be specified.
As for the second question, it is not clear. Please, provide an example.
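Why the parentheses matter in the expression above can be sketched with Python's standard-library ElementTree. The page fragment and the value of n are invented; ElementTree cannot evaluate the parenthesised form itself, so the "collect everything, then index" step is done with a Python list.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the links sit under two different parents.
root = ET.fromstring("""
<html><body><div data-attrib="foo">
  <p><a href="/one">1</a><a href="/two">2</a></p>
  <p><a href="/three">3</a></p>
</div></body></html>
""")

n = 3  # hypothetical per-thread index, 1-based as in XPath

# Like //div[@data-attrib="foo"]//a[3]: the position is counted per parent,
# and no <p> here has a third <a>, so nothing is selected.
per_parent = root.findall(f".//div[@data-attrib='foo']//a[{n}]")

# Like (//div[@data-attrib="foo"]//a)[3]: collect all links first, then index.
links = root.findall(".//div[@data-attrib='foo']//a")
nth_href = links[n - 1].get("href")

print(per_parent)  # []
print(nth_href)    # /three
```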

Resources