Parsing XML tags with small difference in names - xpath

I have an XML file to parse in which the element tags are of the form:
<mensa-1>
..
</mensa-1>
<mensa-2>
..
</mensa-2>
Is it possible to parse such elements via XPath when the element names differ only by a number at the end?

The following XPath expression returns all the elements whose names start with "mensa-":
//*[starts-with(name(),'mensa-')]
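For anyone without a full XPath engine at hand, the same selection can be sketched in Python. The standard-library ElementTree supports only a subset of XPath (no starts-with() or name()), so this emulates the predicate with a plain string test. The sample document and the root element name are made up for illustration, and the sketch assumes the document uses no namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the mensa-N names come from the question.
xml_doc = """<root>
  <mensa-1>first</mensa-1>
  <mensa-2>second</mensa-2>
  <other>ignored</other>
</root>"""

root = ET.fromstring(xml_doc)

# root.iter() visits every element in document order (like //*),
# and the string test stands in for starts-with(name(), 'mensa-').
matches = [el for el in root.iter() if el.tag.startswith("mensa-")]

for el in matches:
    print(el.tag, el.text)
```

With a namespaced document, el.tag would carry the namespace URI in braces, so the string test would need adjusting.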


How to combine several XPaths into one in order to extract text from a web page?

I have a few Xpaths as below:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p
//*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="2555ab30-bb84-11ea-9e8b-277e7f6208b2"]/div/div/div[1]/div/p
//*[@id="7e100250-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
All of the above are used to extract text from a single web page, since the text is located in different viewports, but I wish to find a single XPath that extracts the text for all of them. Is it possible to use 'and' with multiple IDs to extract all of it through one XPath?
Any other suggestions would be appreciated.
You can use the or operator for the last four,
and the union operator | to add the first one.
So to select all 5 expressions in one, use the following expression:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p | //*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2" or @id="2555ab30-bb84-11ea-9e8b-277e7f6208b2" or @id="7e100250-a71d-11ea-b994-53a3e91a35c2" or @id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
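As a rough illustration of the same idea outside a pure XPath context, here is a Python sketch. The standard-library ElementTree has no | or "or" support, so it loops over one path per id and collects the results. The page markup is invented for the example, and only two of the five ids are used to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page.
page = ET.fromstring(
    '<html><body>'
    '<div id="904735f0-bb82-11ea-a473-6d0f51688222"><div><p>first</p></div></div>'
    '<div id="729c0860-a71d-11ea-b994-53a3e91a35c2">'
    '<div><div><div><div><p>second</p></div></div></div></div>'
    '</div>'
    '</body></html>'
)

# Each id maps to the relative path that follows it in the question.
PATHS = {
    "904735f0-bb82-11ea-a473-6d0f51688222": "div/p",
    "729c0860-a71d-11ea-b994-53a3e91a35c2": "div/div/div[1]/div/p",
}

texts = []
for elem_id, tail in PATHS.items():
    # One query per id; concatenating the results plays the role of the union.
    for p in page.findall(f".//*[@id='{elem_id}']/{tail}"):
        texts.append(p.text)

print(texts)
```

A real web page is rarely well-formed XML, so in practice an HTML-aware parser with full XPath support would be the more natural choice.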
A shorter and more generic solution could be:
(//div/div/div[1]/div/p|//div/p)[parent::*[string-length(@id)=36 and substring(@id,24,1)="-"]]
The first part, in (), specifies the end of the path. Since the @id attributes all have the same length, we test that length inside the predicate. We also verify the presence of a - at a specific position with substring.
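To make the predicate's arithmetic concrete, here is a small Python check of the same two conditions. The function name is made up for this sketch. Note that XPath's substring() counts from 1 while Python indexes from 0, so position 24 becomes index 23.

```python
def looks_like_target_id(value: str) -> bool:
    """Mirror string-length(@id)=36 and substring(@id,24,1)='-'."""
    # XPath position 24 is Python index 23.
    return len(value) == 36 and value[23] == "-"

ids = [
    "904735f0-bb82-11ea-a473-6d0f51688222",
    "729c0860-a71d-11ea-b994-53a3e91a35c2",
    "main-content",  # hypothetical non-matching id
]

for value in ids:
    print(value, looks_like_target_id(value))
```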

How to run for each loop on value extracted from regular expression extractor

Problem statement: in my API response, the same set of XML tags is repeated multiple times, but with different values in the underlying tags, which also repeat across the XML response. Those two underlying values map to each other. I need to extract all the mappings from the response and write them to SQL. I am having trouble extracting the mapping values from the API response.
I am trying to use a Regular Expression Extractor to fetch the repeated main tag that holds those two values. Then I am trying to use a ForEach loop on the output variable of the Regular Expression Extractor, and will then write the respective values to the target table on each iteration.
The following tag repeats multiple times, once per cycle. I need to fetch the two values under the <Value> tags for each <Object> tag, for example '{abc-def}' and 'D12345' for this particular instance, and so on.
<Object classId="QueryResultRow"><Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property><Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D12345</Value></Property></Object>
I am unable to get the required two values from each tag while retaining the mapping. Also, I am not sure how to use only one of the variables generated by the regular expression, as it creates 4 kinds of variables for each target XML tag.
objVal=<Object classId="QueryResultRow"><Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property><Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D112345</Value></Property></Object>
objVal_g=1
objVal_g0=<Object classId="QueryResultRow"><Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property><Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D12345</Value></Property></Object>
objVal_g1=<Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property><Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D12345</Value></Property>
I would need to use only objVal from here and I am trying to use
Regular expression extractor
Flow of my Test
ForEach loop to extract the object tags
Second ForEach loop to extract the two values out of the extracted object tags
Use of the variable created in the 4th step in my JDBC sampler
Once you have the XML in the objVal variable:
Use an XPath Extractor with "JMeter Variable Name to use" set to your objVal variable
Use /Object/Property/Value as the XPath query
If Match No. is -1, you will get all values
objVal_value={abc-def}
aa_value_1={abc-def}
aa_value_2=D112345
The XPath Extractor allows the user to extract value(s) from a structured response (XML or (X)HTML) using the XPath query language.
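Outside JMeter, the same per-Object extraction can be sketched in Python to show how the Id/DCN mapping is preserved. This is only an illustration under assumptions: the <Results> wrapper, the namespace URI bound to the i: prefix, and the second row's values are invented so that the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: wrapper element, xmlns:i binding and the second row are assumptions.
response = (
    '<Results xmlns:i="http://www.w3.org/2001/XMLSchema-instance">'
    '<Object classId="QueryResultRow">'
    '<Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property>'
    '<Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D12345</Value></Property>'
    '</Object>'
    '<Object classId="QueryResultRow">'
    '<Property i:type="fn40:SingletonId" propertyId="Id"><Value>{ghi-jkl}</Value></Property>'
    '<Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D67890</Value></Property>'
    '</Object>'
    '</Results>'
)

root = ET.fromstring(response)

mappings = []
for obj in root.findall(".//Object"):
    # Reading both values from the same Object keeps each pair together.
    row_id = obj.findtext("Property[@propertyId='Id']/Value")
    dcn = obj.findtext("Property[@propertyId='DCN']/Value")
    mappings.append((row_id, dcn))

print(mappings)
```

If the real response declares a default namespace, the element names in the paths would need to be namespace-qualified.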
You can use the below regular expression to fetch the required values.
propertyId="Id"><Value>(.*?)</Value>(.*)propertyId="DCN"><Value>(.*?)</Value>
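With Match No. set to -1, the values will be present in the variables below. For each match, _g1 holds the Id value and _g3 the DCN value; _g0 is the whole match and _g2 is the text in between.
objval_1_g1
objval_1_g3
objval_2_g1
objval_2_g3
objval_3_g1
objval_3_g3
objval_4_g1
objval_4_g3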
You can use a Debug Sampler to check the variables' values.
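To see which capture groups hold the two values, the regular expression can be tried out in Python. This is a sketch, not JMeter behaviour. One deliberate change: the middle group is written as (.*?) instead of (.*), because a greedy middle group could run from the first Id to the last DCN when several Object tags sit on one line. The second row of the sample response is made up.

```python
import re

# Same pattern as above, except the middle group is non-greedy (an assumption of this sketch).
PATTERN = re.compile(
    r'propertyId="Id"><Value>(.*?)</Value>(.*?)propertyId="DCN"><Value>(.*?)</Value>'
)

# Two Object tags on one line; the second one is hypothetical.
response = (
    '<Object classId="QueryResultRow">'
    '<Property i:type="fn40:SingletonId" propertyId="Id"><Value>{abc-def}</Value></Property>'
    '<Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D12345</Value></Property>'
    '</Object>'
    '<Object classId="QueryResultRow">'
    '<Property i:type="fn40:SingletonId" propertyId="Id"><Value>{ghi-jkl}</Value></Property>'
    '<Property i:type="fn40:SingletonString" propertyId="DCN"><Value>D67890</Value></Property>'
    '</Object>'
)

# Group 1 is the Id value and group 3 the DCN value; group 2 is the markup in between.
pairs = [(m.group(1), m.group(3)) for m in PATTERN.finditer(response)]

print(pairs)
```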

Evaluate xpath selector to get text in p- and li-tags

In order to automatically replace keywords with links based on a list of keyword-link pairs, I need to get text that is not already linked, not in a script, and not manually excluded, inside paragraphs (p) and list items (li), to be used in Drupal's Alinks module.
I modified the existing XPath selector as follows and would like feedback on whether it is efficient or could be improved:
//*[p or li]//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::*[@data-alink-ignore])]
The XPath is meant to work with any HTML5 content, including self-closing tags (not well-formed XML); that's the way the module was designed, and it works quite well.
To select text nodes that are descendants of p or li elements but not descendants of a or script elements, you can use this XPath 1.0 expression:
//*[self::p|self::li]
//text()[
not(ancestor::a|ancestor::script|ancestor::*[@data-alink-ignore])
]
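To make the selection logic explicit, here is a Python sketch that walks the tree by hand. The standard-library ElementTree has neither an ancestor axis nor text() nodes, so the two conditions are carried down as flags. The function name and the sample markup are invented, the input must be well-formed (real HTML5 would need an HTML parser), and unlike the XPath this sketch drops whitespace-only text.

```python
import xml.etree.ElementTree as ET

EXCLUDED_TAGS = {"a", "script"}

def linkable_text(elem, in_target=False, excluded=False):
    """Yield text under p/li that has no a, script or data-alink-ignore ancestor."""
    # elem itself is an ancestor of its own text, so update both flags first.
    excluded = excluded or elem.tag in EXCLUDED_TAGS or "data-alink-ignore" in elem.attrib
    in_target = in_target or elem.tag in ("p", "li")
    if in_target and not excluded and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from linkable_text(child, in_target, excluded)
        # child.tail is text that follows the child, so its parent is elem, not child.
        if in_target and not excluded and child.tail and child.tail.strip():
            yield child.tail.strip()

# Hypothetical well-formed sample.
doc = ET.fromstring(
    '<div>'
    '<p>Plain <a href="#">linked</a> tail <b>bold</b></p>'
    '<ul><li data-alink-ignore="1">skip</li><li>item</li></ul>'
    '<script>var x;</script>'
    '<span>outside</span>'
    '</div>'
)

result = list(linkable_text(doc))
print(result)
```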
Your XPath expression is invalid. You are missing a / before text(). So a valid expression would be
//*[p or li]/text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::*[@data-alink-ignore])]
But without an XML source file it is impossible to tell if this expression would match your desired node.

I am trying to use XPath function contains() that has a string in 2 parts but it is throwing an invalid xpath error

I am trying to use XPath function contains() that has a string in 2 parts but it is throwing an "invalid xpath expression" error upon evaluation.
Here is what I am trying to achieve:
Normal working xpath:
//*[contains(text(),'some_text')]
Now I want to split it into 2 parts, because some random text appears in between:
//*[contains(text(),'some'+ +'text')]
What I did was use '+' '+' to concatenate the strings in the expression, as we do in Java. Please suggest how I can get this to work.
You can combine 2 contains() calls in one predicate expression to check whether a text node contains 2 specific substrings:
//*[text()[contains(.,'some') and contains(.,'text')]]
If you need to be more specific and make sure that 'text' comes somewhere after 'some' in the text node, you can use a combination of substring-after() and contains(), as shown below:
//*[text()[contains(substring-after(.,'some'),'text')]]
If each target element always contains exactly one text node, or if only the first text node needs to be considered when an element has several, the above XPath can be simplified a bit as follows:
//*[contains(substring-after(text(),'some'),'text')]
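The difference between the unordered and the ordered test can be illustrated on plain strings in Python. The helper names below are made up for this sketch; substring_after() is written to mimic the XPath function of the same name, which returns the empty string when the marker is absent.

```python
def substring_after(value: str, marker: str) -> str:
    """Mimic XPath substring-after(): text after the first marker, '' if it is absent."""
    if marker == "":
        return value  # XPath returns the whole string for an empty marker
    _, found, rest = value.partition(marker)
    return rest if found else ""

def contains_both(text: str, first: str, second: str) -> bool:
    # contains(., first) and contains(., second): order does not matter
    return first in text and second in text

def contains_in_order(text: str, first: str, second: str) -> bool:
    # contains(substring-after(., first), second): second must follow first
    return second in substring_after(text, first)

samples = ["some_random_text", "text before some", "only some here"]
unordered = [contains_both(s, "some", "text") for s in samples]
ordered = [contains_in_order(s, "some", "text") for s in samples]

print(unordered)
print(ordered)
```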

How does one use the contains text syntax in xpath. I am using ruby-cucumber

I have xpath:
//div[@id='123']/li/a[4]
whose text in the HTML is '2',
so I wanted to make it specific, like:
//div[@id='123']/li/a[contains text='2'] ???
Basically I don't want to depend on a[4], i.e. on the position, but on the text, so that the element can still be located after the web page is updated or modified.
In the predicate part [], you can use text() to retrieve the text node and contains(,) to test whether it contains the specific text.
//div[@id='123']//li/a[contains(text(), '2')]
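One thing worth seeing in practice is that contains() is a substring test, so it also matches links whose text merely includes '2'. The Python sketch below shows this with invented markup. The standard-library ElementTree has no contains(), so the text test is done in Python after selecting the candidate links.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page.
page = ET.fromstring(
    '<body><div id="123">'
    '<li><a>1</a><a>2</a><a>12</a><a>next</a></li>'
    '</div></body>'
)

candidates = page.findall(".//div[@id='123']//li/a")

# Substring test, like contains(text(), '2'): also matches '12'.
containing = [a.text for a in candidates if "2" in (a.text or "")]

# Exact test, like text()='2': matches only the link whose text is exactly '2'.
exact = [a.text for a in candidates if a.text == "2"]

print(containing)
print(exact)
```

If only the link labelled exactly '2' is wanted, an equality test on the text is the safer choice.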
