How to get the XPath of a link to a picture?

How do I write an XPath that extracts the link of an image? I want to extract this specific link:
http://insales.ru/images/bigpic.jpeg
I don't know how to specify the XPath. Do I have to include all the parent tags to get it, or can I go directly to the tag?
<div class="tango">
<div class="container-horizontal">
<div class="clip-horizontal">
<ul id="carousel" class="pagination jcarousel">
<li class="jcarousel-item">
<a rev=http://insales.ru/images/bigpic.jpeg href="http://insales.ru/images/pic2.jpeg">
<img src="http://insales.ru/images/thumb.jpeg">
</a>
</li>
</ul>
</div>
</div>
</div>
So my XPath would be:
//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
Or do I have to include all the parent div tags:
//div[@class="tango"]//div[@class="container-horizontal"]//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
But neither of these XPaths works.
How do I specify an XPath to extract this link: http://insales.ru/images/bigpic.jpeg?

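There are several options, but the shortest one is probably:
.//*[@id='carousel']/li/a/@rev
You don't need the parent <div>s: // matches the element at any depth. The link itself is stored in the rev attribute of the <a>, so the expression has to end with /@rev; stopping at /a/img would return the thumbnail image element instead.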
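The link in the question lives in the rev attribute of the <a>, not in the <img>. A minimal Python sketch of the same selection, assuming the snippet is made well-formed XML (rev value quoted, <img> self-closed) so the standard-library xml.etree.ElementTree can parse it. ElementTree only understands a subset of XPath, so the attribute is read with .get() rather than with a trailing /@rev.

```python
import xml.etree.ElementTree as ET

# The question's snippet, made well-formed XML (assumption): the rev value is
# quoted and <img> is self-closed so ElementTree's XML parser accepts it.
SNIPPET = """<div class="tango">
  <div class="container-horizontal">
    <div class="clip-horizontal">
      <ul id="carousel" class="pagination jcarousel">
        <li class="jcarousel-item">
          <a rev="http://insales.ru/images/bigpic.jpeg" href="http://insales.ru/images/pic2.jpeg">
            <img src="http://insales.ru/images/thumb.jpeg"/>
          </a>
        </li>
      </ul>
    </div>
  </div>
</div>"""


def carousel_big_pictures(markup):
    """Return the rev attribute of every <a> directly under ul#carousel > li."""
    root = ET.fromstring(markup)
    # Equivalent to //*[@id='carousel']/li/a; no parent <div>s are needed
    # because './/' searches all descendants.
    links = root.findall(".//*[@id='carousel']/li/a")
    return [a.get("rev") for a in links]


print(carousel_big_pictures(SNIPPET))
```

The same idea applies in any XPath 1.0 engine that supports attribute steps: select the element, then the attribute, without spelling out every ancestor.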

Related

XPath Query for URL Extraction

I need to extract http://site.ru/ from this code:
<div class="one">
<dl>
<dt class="two">
<span class="name">Site</span>
</dt>
<dd class="three">
<span class="js-pseudo-link" data-url="rAnDoMlEtTeRsAnDnUmBeRs" style>
<a href="http://site.ru/" class rel="nofollow" target="_blank" style> http://site.ru/ </a>
</span>
</dd>
</dl>
</div>
I use this XPath query: //div//dl//dd//span//a/@href
But it doesn't work. It doesn't return anything.
I'm a newbie in XPath.
Unfortunately, the data you are looking for sits in an empty span node (class js-pseudo-link). Its data-url attribute holds the base64-encoded link you want, and the node only gets populated after the page loads. IMPORTXML for some reason ignores nodes with no text, and there's no way to make it do otherwise. To get around this, it looks like you'll have to write an Apps Script that can handle empty nodes, or one that fetches the raw HTML and parses it.
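If the raw HTML is available (for example, fetched by a script rather than through IMPORTXML), the link can be recovered from the data-url attribute. The answer suggests Apps Script; as a language-neutral illustration, here is a minimal Python sketch of the same idea, assuming the attribute really holds a base64-encoded URL as the answer says. The value aHR0cDovL3NpdGUucnUv below is a hypothetical stand-in (it is base64 for http://site.ru/), since the question only shows a placeholder.

```python
import base64
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the question's markup. The data-url value is
# hypothetical: base64 of "http://site.ru/". The span is empty, as in the
# HTML served before the page's scripts run.
SNIPPET = """<div class="one">
  <dl>
    <dd class="three">
      <span class="js-pseudo-link" data-url="aHR0cDovL3NpdGUucnUv"></span>
    </dd>
  </dl>
</div>"""


def decode_pseudo_links(markup):
    """Return the decoded data-url of every span.js-pseudo-link in the markup."""
    root = ET.fromstring(markup)
    urls = []
    # Equivalent to //span[@class='js-pseudo-link']/@data-url
    for span in root.findall(".//span[@class='js-pseudo-link']"):
        encoded = span.get("data-url")
        if encoded:
            urls.append(base64.b64decode(encoded).decode("utf-8"))
    return urls


print(decode_pseudo_links(SNIPPET))
```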

How to properly get the value contained inside a section using XPath?

Having the following HTML (a snippet grabbed from the web page I want to scrape):
<div class="ulListContainer">
<section class="stockUpdater">
<ul class="column4">
<li>
<img src="1.png" alt="">
<strong>
Buy*
</strong>
<strong>
Sell*
</strong>
</li>
<li>
<header>
$USD
</header>
<span class="">
20.90
</span>
<span class="">
23.15
</span>
</li>
</ul>
<ul>...</ul>
</section>
</div>
How do I get the value of the 1st span in the 2nd li using XPath? The result should be 20.90.
I have tried //div[@class="ulListContainer"]/section/ul[1]/li[2]/span[1] but I am not getting any values. I must say this is being used from a Google Sheet with the IMPORTXML function (I'm not sure which version of XPath it uses). Can I get some help?
Update
Apparently Google Sheets does not support such a "complex" XPath expression, since the expression itself seems to work fine.
Update 1
As requested, I've shared the Google Sheet I am using to test this.
What you need is:
=IMPORTXML(A1;"//li[contains(text(),'USD')]/span[1]")
Removing section from your original XPath will work too:
=IMPORTXML(A1;"//div[@class='ulListContainer']/ul[1]/li[2]/span[1]")
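To check the selection logic offline, here is a minimal Python sketch using the standard-library ElementTree on the question's snippet (with <img> self-closed so it parses as XML). Run against the snippet exactly as posted, the positional path needs the section step; the suggestion to drop it presumably reflects the HTML that IMPORTXML actually receives, which this sketch cannot check. ElementTree supports positional predicates such as ul[1] but not contains(), so the text match on the <header> is done in Python. This illustrates the selection, not how IMPORTXML works internally.

```python
import xml.etree.ElementTree as ET

# The question's snippet, with <img> self-closed so it is well-formed XML.
SNIPPET = """<div class="ulListContainer">
  <section class="stockUpdater">
    <ul class="column4">
      <li>
        <img src="1.png" alt=""/>
        <strong>Buy*</strong>
        <strong>Sell*</strong>
      </li>
      <li>
        <header>$USD</header>
        <span class="">20.90</span>
        <span class="">23.15</span>
      </li>
    </ul>
    <ul>...</ul>
  </section>
</div>"""

root = ET.fromstring(SNIPPET)  # root is the div.ulListContainer itself

# Positional path, like section/ul[1]/li[2]/span[1] below the div.
by_position = root.find("section/ul[1]/li[2]/span[1]").text.strip()


def first_span_for(root, currency):
    """Like //li[contains(header, currency)]/span[1], with contains() done in Python."""
    for li in root.iter("li"):
        header = li.find("header")
        if header is not None and currency in (header.text or ""):
            span = li.find("span")  # find() returns the first matching child
            return span.text.strip() if span is not None else None
    return None


by_header = first_span_for(root, "USD")
print(by_position, by_header)
```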
Try this:
=IMPORTXML("URL","//span[1]")
Replace URL with the actual website URL.

Proper xpath Syntax for Extracting Two Text Values

I am trying to scrape a web page for the NAME OF COMPANY and the CITY AND STATE OF COMPANY shown below.
I have an XPath snippet that identifies both text elements at the same time:
//span[starts-with(@class,"text-align")]/text()[2]
This XPath snippet pulls the first text value (COMPANY NAME). How do I get the second text element (CITY, STATE)?
A snip of the web page code looks like this:
<div>
<ul class="pv-top-card-v3--experience-list">
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="position_see_more" data-ember-action="" data-ember-action-172="172">
<img src="https://media.licdn.com/dms/image/C4E0BAQFhA8h46hvabA/company-logo_100_100/0?e=1582761600&v=beta&t=VAeZqaGu3Lu6Ol_n5kiiI74FSRuSOZA1ggAI5qTVRjE" id="ember173" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember174" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE NAME OF A COMPANY
<!----></span>
</a>
</li>
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="education_see_more" data-ember-action="" data-ember-action-176="176">
<img src="https://media.licdn.com/dms/image/C560BAQEr2uQX-x2EwQ/company-logo_100_100/0?e=1582761600&v=beta&t=aDbYLUDMvlS4DpwOLjOaQj3Dj60C_cYLC5UUvGoyld0" id="ember177" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember178" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE CITY AND STATE OF COMPANY
<!----></span>
</a>
</li>
</ul>
</div>
The xpath string is picking up the two span elements using class. I can't use the span id attributes because they are dynamic and change with each page (one page per company).
Can someone advise how I extract the desired text?
Thanks.
Point to the li level:
//ul/li[2]/a/span[starts-with(@class,"text-align")]
li[2] selects the second list item (city and state); li[1] gives the company name.
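Since both spans share the same class prefix, the position of the <li> is what tells them apart. A minimal Python sketch with the standard-library ElementTree, assuming a trimmed, well-formed copy of the markup (image URLs, Ember attributes and the empty comments removed, <img> self-closed). ElementTree has no starts-with(), so the class test is done in Python.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (assumption: only the parts needed here).
SNIPPET = """<div>
  <ul class="pv-top-card-v3--experience-list">
    <li>
      <a class="pv-top-card-v3--experience-list-item" href="#">
        <img class="EntityPhoto-square-1"/>
        <span class="text-align-left ml2 t-14"> THIS IS THE NAME OF A COMPANY </span>
      </a>
    </li>
    <li>
      <a class="pv-top-card-v3--experience-list-item" href="#">
        <img class="EntityPhoto-square-1"/>
        <span class="text-align-left ml2 t-14"> THIS IS THE CITY AND STATE OF COMPANY </span>
      </a>
    </li>
  </ul>
</div>"""


def span_text(root, item):
    """Like //ul/li[item]/a/span[starts-with(@class,"text-align")], returning its text."""
    for span in root.findall(f".//ul/li[{item}]/a/span"):
        if span.get("class", "").startswith("text-align"):
            return span.text.strip()
    return None


root = ET.fromstring(SNIPPET)
company = span_text(root, 1)
city_state = span_text(root, 2)
print(company, "|", city_state)
```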

Xpath not just getting parent of html

I am trying to find the XPath for only the top-level (parent) items of a navigation bar. The path I am trying at the moment is //a[@class='unselectable'], applied to this piece of HTML:
<div class="PrimaryNavigationContainer">
<div class="PrimaryNavigation">
<div class="Menu">
<div>
<a href="http://www.blah.co.uk/brands.aspx" class="unselectable"><span>
Brands</span></a>
<div class="navCol">
<div>
<a class="NoLink unselectable"><span>Shop by Brand</span></a>
<div class="navCol subMenus">
<div>
<a href="http://www.blah.co.uk/blah/catlist_bd4.htm" class="unselectable"><span>
blah</span></a>
The XPath seems to bring up both the top-level categories and the subcategories. I know that is because the class is on both, but I'm not sure how to single out the parent from the child. Thanks for any help you can provide.
How about //div[@class="Menu"]/div/a[@class='unselectable']? This way you avoid selecting the a in the subMenus div.
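The difference between the two paths can be checked with a minimal Python sketch using the standard-library ElementTree. It assumes a trimmed, well-formed copy of the navigation markup (the missing closing tags are added and only one sub-menu link is kept), so it illustrates the selection rather than reproducing the real page.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the navigation markup (assumption: missing
# closing tags added, only one sub-menu link kept).
SNIPPET = """<div class="PrimaryNavigationContainer">
  <div class="PrimaryNavigation">
    <div class="Menu">
      <div>
        <a href="http://www.blah.co.uk/brands.aspx" class="unselectable"><span>Brands</span></a>
        <div class="navCol">
          <div>
            <a class="NoLink unselectable"><span>Shop by Brand</span></a>
            <div class="navCol subMenus">
              <div>
                <a href="http://www.blah.co.uk/blah/catlist_bd4.htm" class="unselectable"><span>blah</span></a>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>"""

root = ET.fromstring(SNIPPET)

# //a[@class='unselectable'] matches top-level and sub-menu links alike
# (an exact class match, so "NoLink unselectable" is not included).
everywhere = root.findall(".//a[@class='unselectable']")

# //div[@class="Menu"]/div/a[@class='unselectable'] only matches links that are
# direct children of the Menu's first-level <div>.
top_level = root.findall(".//div[@class='Menu']/div/a[@class='unselectable']")

print(len(everywhere), [a.get("href") for a in top_level])
```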

Select visible xpath in list

I am trying to get the error message from a page on a site. The list contains several possible errors, so I can't check by id, but I do know that the one with display:list-item is the one I want. This is my rule, but it doesn't seem to work. What is wrong with it? What I want returned is the error text in the element.
//*[@id='errors']/ul/li[contains(@style,'display:list-item')]
Example dom elements:
<div id="errors" class="some class" style="display: block;">
<div class="some other class"></div>
<div class="some other class 2">
<span class="displayError">Please correct the errors listed in red below:</span>
<ul>
<li style="display:none;" id="invalidId">Enter a valid id</li>
<li style="display:list-item;" id="genericError">Something bad happened</li>
<li style="display:none;" id="somethingBlah" ............ </li>
....
</ul>
</div>
The correct XPath should be:
//*[@id='errors']//ul/li[contains(@style,'display:list-item')]
After //*[@id='errors'] you need an extra /, because <ul> is not directly beneath it. Using // again scans all descendant elements for <ul>.
If you can avoid //, that is better: it is faster and less resource-consuming.
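The effect of the missing / can be reproduced with a minimal Python sketch using the standard-library ElementTree. It assumes a trimmed, well-formed copy of the markup, wrapped in a <body> element (an assumption) so that the errors div is a descendant of the parsed root. ElementTree has no contains(), so the style test is done in Python.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the error markup (assumption: wrapped in <body>
# so that the errors div is a descendant of the parsed root).
SNIPPET = """<body>
  <div id="errors" class="some class" style="display: block;">
    <div class="some other class"></div>
    <div class="some other class 2">
      <span class="displayError">Please correct the errors listed in red below:</span>
      <ul>
        <li style="display:none;" id="invalidId">Enter a valid id</li>
        <li style="display:list-item;" id="genericError">Something bad happened</li>
      </ul>
    </div>
  </div>
</body>"""

root = ET.fromstring(SNIPPET)
errors = root.find(".//*[@id='errors']")

# Like //*[@id='errors']/ul/li: nothing, because <ul> is not a direct child.
direct = errors.findall("ul/li")

# Like //*[@id='errors']//ul/li[contains(@style,'display:list-item')].
visible = [li.text for li in errors.findall(".//ul/li")
           if "display:list-item" in li.get("style", "")]

print(direct, visible)
```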
