Xpath - Exclude elements within TD - xpath

I'm trying to use Chrome's Scraper extension with XPath. I've been able to scrape everything I need from a table, but I'm stuck in one spot. Here's the source:
<td>
<p class="pClass">
<a href="theurl" target="_blank">
<i class="iClass">someText</i>
Anchor text
</a>
</p>
</td>
I'm trying to grab just the URL, but when I use the XPath td[9]/p/a it also grabs the icon text, "someText". Is there a way to grab just the URL?

To extract the URL, add @href to your XPath expression. This should work: //td[9]/p/a/@href
To strip surrounding whitespace, you can use the XPath function normalize-space().
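Outside the Scraper extension, the same idea can be sketched in Python. This is only an illustration under a few assumptions: it uses the standard-library xml.etree.ElementTree, which supports a subset of XPath and cannot evaluate /@href itself, so the attribute is read with .get(); the wrapping table and row are made up so the cell parses on its own, and td[1] stands in for td[9].

```python
import xml.etree.ElementTree as ET

# The cell from the question, wrapped in a hypothetical table/row so it parses alone.
SNIPPET = """
<table><tr><td>
<p class="pClass">
<a href="theurl" target="_blank">
<i class="iClass">someText</i>
Anchor text
</a>
</p>
</td></tr></table>
""".strip()

root = ET.fromstring(SNIPPET)

# Rough equivalent of //td[1]/p/a/@href: locate the <a>, then read its attribute.
link = root.find(".//td[1]/p/a")
url = link.get("href")

# The visible anchor text follows the <i> icon, so ElementTree stores it as the
# icon's tail. Splitting and re-joining mimics what normalize-space() does.
anchor_text = " ".join((link.find("i").tail or "").split())

print(url)
print(anchor_text)
```

Selecting the attribute rather than the element is what keeps the icon text out of the result.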

Related

"Imported content is empty." error when scraping with ImportXML in GSheets

I need to scrape images' source URLs from a directory's linked web pages to columns into a Google Sheet.
I think using IMPORTXML function would be the easiest solution, but I get the #N/A "Imported content is empty." error every time.
I have tried to use this extension as well to define XPath, but still the same error.
The page's source code, where image source URL is:
<div class="centerer" id="rbt-gallery-img-1">
<i class="spinner">
<span></span>
</i>
<img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>
So I want to get "i.example.com/01.jpg" value to B2, followed by further images' URLs to adjacent cells.
The function I used is:
=IMPORTXML(A2,"//img[@class='centerer']/@data-lazy")
I tried using spinner instead of centerer, with the same result.
You can get the string i.example.com/01.jpg with the following XPath 1.0 expression:
substring-after(//div[@class='centerer']/img/@data-lazy,'//')
If you don't need to remove the leading //, you can simply use
//div[@class='centerer']/img/@data-lazy
So, in the first case, the Google Sheets expression could be
=IMPORTXML(A2,"substring-after(//div[@class='centerer']/img/@data-lazy,'//')")
and in the second it could be
=IMPORTXML(A2,"//div[@class='centerer']/img/@data-lazy")
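To check the logic away from Google Sheets, here is a minimal Python sketch. It assumes the markup is well-formed XML (real pages often are not) and uses the standard-library xml.etree.ElementTree, which cannot select attributes or call substring-after() in a path, so both steps are done in Python. The body wrapper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup from the question.
SNIPPET = """
<div class="centerer" id="rbt-gallery-img-1">
<i class="spinner">
<span></span>
</i>
<img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>
""".strip()

# Hypothetical wrapper so the <div> is a descendant of the parsed root.
root = ET.fromstring("<body>" + SNIPPET + "</body>")

# The class sits on the <div>, not on the <img>, so filter the <div>
# and then step down to its <img> child.
img = root.find(".//div[@class='centerer']/img")
lazy_url = img.get("data-lazy")

# Like substring-after(), str.partition yields "" when the separator is absent.
without_slashes = lazy_url.partition("//")[2]

print(lazy_url)
print(without_slashes)
```

The point it illustrates is that the predicate has to sit on the element that actually carries the class attribute.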

Get href attribute of a tag with id using XPath

How do I get the Xpath that returns:
//www.example.com
With the following DOM:
<a id="myId" href="//www.example.com">
Click here</a>
//a[@id='myId']/@href
That should work, just implement it into whatever language you're using xpath with.
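For instance, a short Python sketch might look like this. It assumes the standard-library xml.etree.ElementTree, which handles the [@id='myId'] predicate but not the trailing /@href step, so the attribute is read with .get(); the body wrapper is made up for the example.

```python
import xml.etree.ElementTree as ET

# The anchor from the question inside a hypothetical <body> wrapper.
root = ET.fromstring(
    '<body><a id="myId" href="//www.example.com">\nClick here</a></body>'
)

# Find the element by id, then read the attribute.
href = root.find(".//a[@id='myId']").get("href")
print(href)
```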

Xpath of a text containing Bold text

I am trying to click the link for the site www.qualtrapharma.com after searching Google for "qualtra", but I have trouble writing the XPath because the <cite> tag contains a <b> tag inside it. Can anyone suggest how to do this?
<div class="f kv" style="white-space:nowrap">
<cite class="vurls">
www.
<b>qualtra</b>
pharma.com/
</cite>
</div>
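You can overcome this by using '.' in the XPath. It refers to the current node, and when compared with a string it uses the node's string value, that is, the concatenated text of the node and all of its descendants, so the nested <b> does not get in the way.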
The XPath would look like the following:
//cite[.='www.qualtrapharma.com/']
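One caveat is that the string value includes any whitespace between the pieces, so an exact comparison only succeeds when the markup has none. The sketch below illustrates this in Python, assuming the standard-library xml.etree.ElementTree (its [.='text'] predicate needs Python 3.7 or later). The one-line variant of the markup and the closing </div> are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Markup laid out as in the question (the closing </div> is assumed).
MULTILINE = """
<div class="f kv" style="white-space:nowrap">
<cite class="vurls">
www.
<b>qualtra</b>
pharma.com/
</cite>
</div>
""".strip()

# Hypothetical variant with no whitespace between the text pieces.
ONE_LINE = '<div><cite class="vurls">www.<b>qualtra</b>pharma.com/</cite></div>'

def string_value(elem):
    """Concatenate the text of elem and all its descendants, like XPath's '.'."""
    return "".join(elem.itertext())

cite = ET.fromstring(MULTILINE).find("cite")
raw = string_value(cite)          # still contains the line breaks
compact = "".join(raw.split())    # all whitespace removed

# Exact string-value comparison against each layout.
query = ".//cite[.='www.qualtrapharma.com/']"
exact_multiline = ET.fromstring(MULTILINE).find(query)
exact_one_line = ET.fromstring(ONE_LINE).find(query)

print(repr(raw))
print(compact)
print(exact_multiline is not None, exact_one_line is not None)
```

If the real page has whitespace inside the cite element, a predicate such as contains(., 'qualtra') may be more forgiving than an exact match.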

anchor opening element on separate page not working?

I am trying to get my thumbnail anchors to open a different page at the specific element. I don't think it is working because there is a jQuery plugin on the page that the elements exist on, and I can't find a way to target them. When you click on an image, it opens the first image in the sequence rather than the one requested. How can this be solved?
Please see my page here http://i-n-t-e-l-l-i-g-e-n-t-s-i-a.com/melissafranklin.com/index.html
The href attribute of the <a> tag should match the src attribute of the <img> tag
Use this:
<a href="images/paintings/7copy.jpg">
<img src="images/paintings/7copy.jpg" alt="">
</a>
Instead of this:
<a href="paintings.html">
<img src="images/paintings/7copy.jpg" alt="">
</a>

Trying to create XPath from this HTML snippet

I have played for a while writing XPath but am unable to come up with exactly what I want.
I'm trying to write XPath for link(click1 and click2 in code snippet below) based on known text(myidentity in code snippet below). Can someone take a look into and suggest possible solution?
HTML code snippet:
<div class="abc">
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">
<img src="images/controls/inheritance.gif"/>
</a>
myidentity
<span>
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">click1</a>
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">click2</a>
</span>
</div>
You don't need to use XPath here, you could use a CSS locator. These are often faster and more compatible across different browsers.
css=div:contains(myidentity) > span a:nth-child(1) //click1
css=div:contains(myidentity) > span a:nth-child(2) //click2
Note that the > is only required to work around a bug in the CSS locator library used by Selenium.
Hard to say without seeing the rest of the HTML but the following should work:
//div[text()[contains(., "myidentity")]]/span/a
See Macro's answer - this form should be used.
//div[text()[contains(., "myidentity")]]/span/a[2]
The following only works with one section of text in the containing div.
You'll need to select based on the text containing your identity text.
Xpath for click1
//div[contains(text(),"myidentity")]/span/a[1]
Xpath for click2
//div[contains(text(),"myidentity")]/span/a[2]
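The difference between the two predicate forms comes down to which text nodes are tested: in XPath 1.0, contains(text(), ...) converts the node set to a string using only the first text node, while text()[contains(., ...)] tests every text node. The Python sketch below imitates both checks by hand, assuming the standard-library xml.etree.ElementTree (which has no contains()); the onclick attributes are dropped for brevity and direct_text_nodes is a helper made up for this example.

```python
import xml.etree.ElementTree as ET

# Snippet from the question, with the onclick attributes omitted.
SNIPPET = """
<div class="abc">
<a href="#">
<img src="images/controls/inheritance.gif"/>
</a>
myidentity
<span>
<a href="#">click1</a>
<a href="#">click2</a>
</span>
</div>
""".strip()

def direct_text_nodes(elem):
    """Return the element's own text nodes in document order (not descendants')."""
    nodes = [elem.text] if elem.text else []
    nodes += [child.tail for child in elem if child.tail]
    return nodes

div = ET.fromstring(SNIPPET)
nodes = direct_text_nodes(div)

# contains(text(), "myidentity"): only the first text node is considered.
first_node_matches = "myidentity" in nodes[0]

# text()[contains(., "myidentity")]: any text node may match.
any_node_matches = any("myidentity" in node for node in nodes)

# Equivalent of .../span/a[1] and .../span/a[2] once the div is accepted.
links = [a.text for a in div.findall("span/a")] if any_node_matches else []

print(first_node_matches, any_node_matches, links)
```

With this layout the first text node is just the line break before the first link, which is why the text()[contains(., ...)] form is the safer choice.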
