starts with and ends with in xpath [duplicate] - xpath

This question already has answers here:
XPath testing that string ends with substring?
(5 answers)
Closed 2 years ago.
This is how the element looks on the page:
I want all the li elements that come right after General Engineering Courses. "General [COURSE_NAME] Courses" is a common pattern on other pages as well.
So basically I want all the li elements that come right after GENERAL [COURSE_NAME] Courses.
I wrote the following XPath, but unfortunately it throws a DOMException when I run it in Chrome.
//*[starts-with(text(), 'GENERAL') and ends-with(text(), 'COURSES')]
Page: http://catalog.fullerton.edu/content.php?catoid=16&navoid=1922

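You need to use an XPath axis, following-sibling. Chrome only implements XPath 1.0, which has no ends-with() function (hence the DOMException), so the ending is checked with substring() instead: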
//p[starts-with(strong,'GENERAL ') and substring(strong, string-length(strong)-7)=' COURSES']/following-sibling::ul/li
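The substring() arithmetic that stands in for ends-with() can be checked outside the browser. The following is a minimal Python sketch, not part of the original answer: the helper names xpath_substring and is_course_heading are hypothetical, and the two non-matching headings are made up for illustration.

```python
def xpath_substring(s: str, start: int) -> str:
    """Mimic XPath 1.0 substring(s, start) for an integer, 1-based start."""
    begin = max(start, 1) - 1  # positions below 1 clamp to the string start
    return s[begin:]


def is_course_heading(text: str) -> bool:
    """Same test as the predicate: starts with 'GENERAL ', ends with ' COURSES'."""
    suffix = " COURSES"
    # string-length(strong) - 7 is the 1-based position of the last 8 characters
    tail = xpath_substring(text, len(text) - (len(suffix) - 1))
    return text.startswith("GENERAL ") and tail == suffix


headings = [
    "GENERAL ENGINEERING COURSES",  # heading from the question, upper-cased as in the XPath
    "GENERAL INFORMATION",          # hypothetical non-matching heading
    "ENGINEERING COURSES",          # hypothetical non-matching heading
]
matches = [h for h in headings if is_course_heading(h)]
print(matches)
```

Because ' COURSES' is 8 characters long, the start position is the string length minus 7, which is where the 7 in the XPath comes from.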

Google Sheet importxml - How to retrieve only the 5 first values?

I'm trying to use Google Sheets' IMPORTXML function to get a list of values, but I only need the first 5 values.
How can I do that?
My query: =IMPORTXML("https://muagame.vn/may-ps4.html","//h3")
You want to retrieve values from https://muagame.vn/may-ps4.html using the XPath //h3.
With //h3, 12 items are retrieved, and you want only the first 5.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
In this answer, the XPath is modified. Please modify the XPath in =IMPORTXML("https://muagame.vn/may-ps4.html","//h3") as follows.
From:
//h3
To:
//li[position()<=5]/h3
In the HTML data, each h3 tag is inside an li tag, so to retrieve the first 5 h3 items I used li[position()<=5].
Result:
In this case, the formula is =IMPORTXML("https://muagame.vn/may-ps4.html","//li[position()<=5]/h3").
Reference:
position
If I misunderstood your question and this was not the result you want, I apologize.
try:
=QUERY(IMPORTXML("https://muagame.vn/may-ps4.html", "//h3"), "limit 5")
or:
=ARRAY_CONSTRAIN(IMPORTXML("https://muagame.vn/may-ps4.html", "//h3"), 5, 1)

How to use Capybara Matchers when multiple checks are involved? [duplicate]

This question already has answers here:
RSpec: Expect to change multiple
(4 answers)
Closed 3 years ago.
It's easy to use a Capybara matcher on an element with a single check, like
expect(element).to have_selector('#selector')
But how do I achieve the same when asserting on an element that is part of a list, where multiple checks are needed?
I want to do something like:
<products>
<product>
<product>
<product>
</products>
def have_product(name,price)
products.any? {|product| product.have_text(name) && product.have_text(price)} # pseudo code
end
I want to be able to check whether a product with given name and price(both need to match) exists in the list.
The simplest solution, assuming you have already found the <products> element and the two texts don't need to be in the same product, would be
expect(products).to have_css('product', text: name).and(have_css('product', text: price))
If both texts need to be in the same product element, use a regex (name before price):
expect(products).to have_css('product', text: /#{name}.*#{price}/)
More advanced would be creating custom selector types for dealing with the types of objects in your UI.
UPDATE:
With the clarification of the original issue provided, the best solution is probably to use an optional filter block
expect(products).to have_css('product', text: name) { |node|
node.has_field?(with: price)
}
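One caveat with the regex approach above: a price such as $9.99 contains regex metacharacters, so interpolating it unescaped can make the match fail. The sketch below illustrates the effect in Python rather than Ruby (Ruby's counterpart is Regexp.escape). The product texts and the helper name same_product_pattern are hypothetical.

```python
import re


def same_product_pattern(name: str, price: str):
    """Build a pattern matching name followed by price, both taken literally."""
    return re.compile(re.escape(name) + r".*" + re.escape(price), re.DOTALL)


product_texts = ["Widget $9.99", "Gadget $19.99"]  # hypothetical element texts
name, price = "Widget", "$9.99"

# Unescaped: "$" is read as an end-of-string anchor, so nothing matches.
naive = re.compile(name + r".*" + price)
naive_hits = [t for t in product_texts if naive.search(t)]

# Escaped: "$" and "." are matched literally.
safe = same_product_pattern(name, price)
safe_hits = [t for t in product_texts if safe.search(t)]

print(naive_hits)
print(safe_hits)
```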

Xpath subscript returning all nodes, not just the requested one [duplicate]

This question already has answers here:
How to select first element via XPath?
(2 answers)
How to select the first element with a specific attribute using XPath
(9 answers)
Closed 3 years ago.
I'm trying the following XPath:
//*[local-name()='SN102'][1]
Using XPathTester, I saved my scenario
http://www.xpathtester.com/xpath/94ee37e08960247a7bf0619d38c52bee
Not every HLLoop1 has an SN102.
Otherwise, I could do this:
//*[local-name()='HLLoop1'][1]//*[local-name()='SN102']
I have simplified the sample data down to the following:
<ns0:X12_00401_856 xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:HLLoop1>
<ns0:SN1>
<SN102>1</SN102>
<SN103>EA</SN103>
<SN108>AC</SN108>
</ns0:SN1>
</ns0:HLLoop1>
<ns0:HLLoop1>
<ns0:SN1>
<SN102>2</SN102>
<SN103>EA</SN103>
<SN108>AC</SN108>
</ns0:SN1>
</ns0:HLLoop1>
</ns0:X12_00401_856>
The result is coming back with all nodes, not just the first one:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<SN102>1</SN102>
<SN102>2</SN102>
</result>
How do I select the first node only? It seems simple, and I'm sure I've done it before, but it's not working today.
I have a "Vendor Simulator" program that is building fake 856 data to send back, and I want to increase the first quantity only to force some error handling logic.
Just select the first element of the whole nodelist
(//*[local-name()='SN102'])[1]
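The original query //*[local-name()='SN102'][1] applies [1] relative to each parent, so it selects every SN102 that is the first SN102 child of its parent. It would return a single node only if all the SN102 elements were siblings. Wrapping the path in parentheses applies [1] to the combined node list instead.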
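The difference between the per-parent [1] and the first item of the whole node list can be reproduced with Python's standard library. This is only a sketch under some assumptions: ElementTree supports a subset of XPath and has no local-name(), so the un-prefixed SN102 elements are matched by tag name, and the sample data is trimmed from the question.

```python
import xml.etree.ElementTree as ET

XML = """<ns0:X12_00401_856 xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
  <ns0:HLLoop1><ns0:SN1><SN102>1</SN102></ns0:SN1></ns0:HLLoop1>
  <ns0:HLLoop1><ns0:SN1><SN102>2</SN102></ns0:SN1></ns0:HLLoop1>
</ns0:X12_00401_856>"""

root = ET.fromstring(XML)

# [1] is applied per parent: every SN102 that is the first SN102 child
# of its own parent qualifies, so both elements come back.
per_parent = [e.text for e in root.findall(".//SN102[1]")]

# Taking the first item of the whole result list mirrors (//SN102)[1].
first_overall = root.findall(".//SN102")[0].text

print(per_parent)
print(first_overall)
```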

Scraping number of likes and comments from an Instagram post via Google Sheet IMPORTXML

I am a noob at importXML. The XPath to the number of likes is
//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/a/span
So the formula for the scraping the number of likes from this post: https://www.instagram.com/p/BZLli5ll6yz/ should be:
=IMPORTXML("https://www.instagram.com/p/BZLli5ll6yz/", "//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/a/span")
Right? What am I missing?
Make sure that "react-root" inside the XPath is wrapped in single quotes: 'react-root'. The inner double quotes end the string early, whereas single quotes keep the whole XPath contained within the second argument.

Trying to exclude a portion of an xPath

I have looked through several posts about this, but have failed to apply the principles used to get the result I desire, so I'm going to just post my specific problem.
I am building a Google Sheet that enables the user to pull up Bible verses.
I have it all working, however I am running into an issue with a hidden element being pulled into my text().
FUNCTION:
=IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[@class='scripture']//span[2]//text()")
RESULT: You shall put out both male and female, putting them outside the camp, that they may not defile their camp, 1in the midst of which I dwell."
You can see the "1" that is showing up before the word "in"
I have found the xPath that pulls only that "1"
//*[@class='scripture']//span[2]//sup//text()
I am trying to remove that "1" from the text.
HELP PLEASE!!! :)
You can add a predicate to the end to exclude text nodes that are inside sup elements:
=IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[@class='scripture']//span[2]//text()[not(ancestor::sup)]")
This will retrieve only the text nodes that are not inside a sup element, but it will still result in having the verse spread out across two cells, because there are two text nodes. You can rectify this by wrapping this expression in a JOIN():
=JOIN("", IMPORTXML("http://www.biblestudytools.com/ESV/Numbers/5-3.html",
"//*[@class='scripture']//span[2]//text()[not(ancestor::sup)]"))

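To see why excluding the sup element leaves two text nodes, and why they then need joining, here is a rough Python sketch of the same idea. The markup is a hypothetical fragment shaped like the verse, not the real page, and text_nodes_outside is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the verse: a <sup> footnote marker inside the span.
SNIPPET = (
    "<span>that they may not defile their camp, "
    "<sup>1</sup>in the midst of which I dwell.</span>"
)


def text_nodes_outside(elem, skip_tag):
    """Return the text nodes under elem, skipping anything inside skip_tag elements."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if child.tag != skip_tag:
            parts.extend(text_nodes_outside(child, skip_tag))
        if child.tail:  # text following the child belongs to elem, not to the child
            parts.append(child.tail)
    return parts


nodes = text_nodes_outside(ET.fromstring(SNIPPET), "sup")
verse = "".join(nodes)  # same role as JOIN("", ...) in the sheet
print(len(nodes))
print(verse)
```

The text before and after the sup element are separate nodes, which is why the sheet spreads the verse across two cells until the result is joined.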