Scrapy / XPATH : finding substring in image url

I have the following HTML source pattern :
<ul class="test_ul">
<li>
<img src="https://www.awebsite.com/image_1_a_test.png" />
</li>
<li>
<img src="https://www.awebsite.com/another_1_b_test.jpg" />
</li>
</ul>
Now I want to select only the path where the src points to an image that includes the "b_test" substring.
This is the string I use for the selector in Scrapy:
".//ul[@class='test_ul']//img[contains(@src,'b_test')]"
But this does not seem to work, and I don't see the error. I'm able to check for the full name, but not the substring.
Any help is greatly appreciated.

I think you should append /@src to get the URL path:
".//ul[@class='test_ul']//img[contains(@src,'b_test')]/@src"
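A minimal sketch of that expression run against the sample markup from the question. It assumes the lxml library is installed and uses it in place of a live Scrapy response; the variable names are illustrative only. Note that XPath addresses attributes with "@".

```python
from lxml import html

# Sample markup copied from the question, wrapped in <html><body>.
PAGE = """
<html><body>
<ul class="test_ul">
  <li><img src="https://www.awebsite.com/image_1_a_test.png" /></li>
  <li><img src="https://www.awebsite.com/another_1_b_test.jpg" /></li>
</ul>
</body></html>
"""

tree = html.document_fromstring(PAGE)

# contains(@src, 'b_test') keeps only <img> elements whose src holds the
# substring; the trailing /@src returns the attribute value itself.
matches = tree.xpath(
    ".//ul[@class='test_ul']//img[contains(@src, 'b_test')]/@src"
)
print(matches)  # ['https://www.awebsite.com/another_1_b_test.jpg']
```

Inside a Scrapy callback the same expression string would typically be passed to response.xpath(...) and read with .get() or .getall().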

Related

Changing the onclick URL on Tumblr photosets

I have a tumblr blog embedded into my website (iframe) and want all clicks to open in a new tab to land on the post detail url (e.g. https://diestadtgaertner.tumblr.com/post/657405245299818496). I already adapted the template to get this working for most post types by exchanging the respective href variable with "https://diestadtgaertner.tumblr.com/post/{PostID}" and add target="_blank". However, I can't get this to work for the pictureset. Does anyone know how this might work?
Help would be greatly appreciated!
Thanks & best,
Torge

You can edit your template so the photoset gets output into a normal div (I think the default is to load photosets inside an iframe themselves, which could be causing your issues).
This is the block from my tumblr template:
<ul>
...
{block:Photoset}
<li class="post photoset">
{block:Photos}
<img src="{PhotoURL-500}" {block:HighRes}style="display:none"{/block:HighRes} />
{block:HighRes}
<img src="{PhotoURL-HighRes}" class="highres" />
{/block:HighRes}
{/block:Photos}
{block:Caption}
<div class="description">{Caption}</div>
{/block:Caption}
<p>
<span class="icon-link ion-ios-infinite-outline"></span>
{block:Date}{DayOfMonthWithZero}.{MonthNumberWithZero}.{ShortYear}{/block:Date}
</p>
</li>
{/block:Photoset}
</ul>
In any case you could wrap the entire block in the Permalink href. Something like:
<ul>
...
{block:Photoset}
<li class="post photoset">
<a href="{Permalink}"> <!-- this permalink href now wraps the entire content of the post -->
{block:Photos}
<img src="{PhotoURL-500}" {block:HighRes}style="display:none"{/block:HighRes} />
{block:HighRes}
<img src="{PhotoURL-HighRes}" class="highres" />
{/block:HighRes}
{/block:Photos}
{block:Caption}
<div class="description">{Caption}</div>
{/block:Caption}
</a>
</li>
{/block:Photoset}
</ul>
The issue now is that the default click links for the images inside this post (if they exist) will no longer function normally.
It is difficult to test this without the link to your site, but I think updating your Tumblr template first should give you the result you are after. Of course, I would recommend backing up your code before making changes.

Can not get the value in the strong tag

The part of HTML code is;
*...
<ul class="daily_summary">
<li class="odd">
<span> TODAY'S <br> TEST <br> RESULT </span>
<strong class="todays-test-result">123.987</strong>
</li>
</ul>
...*
XPath code : //li/span
returns TODAY'S TEST RESULT
But
XPath code : //li/span[@strong='todays-test-result']
does not return the value (123.987).
How can I get the value with XPath?

The strong tag is not inside the span tag, so your XPath should not go through span. The following XPath should work: //li/strong[@class="todays-test-result"]
@Rixcy's comment is pretty helpful:
Also to add on to this, the @ selector is used for attributes on a tag, so in this case strong is the tag, but @class is the attribute. A useful resource for xpath is: devhints.io/xpath - @Rixcy
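As a rough check of the suggested expression, here is a small sketch that runs it against the snippet from the question. It assumes the lxml library is available; the PAGE and value names are made up for this illustration.

```python
from lxml import html

# Snippet from the question, wrapped in <html><body>.
PAGE = """
<html><body>
<ul class="daily_summary">
  <li class="odd">
    <span> TODAY'S <br> TEST <br> RESULT </span>
    <strong class="todays-test-result">123.987</strong>
  </li>
</ul>
</body></html>
"""

tree = html.document_fromstring(PAGE)

# <strong> is a sibling of <span>, so it is reached from <li> directly.
# @class tests the attribute; text() returns the element's text node.
value = tree.xpath('//li/strong[@class="todays-test-result"]/text()')
print(value)  # ['123.987']
```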

How to properly get the value contained inside a section using XPath?

Given the following HTML (snippet grabbed from the web page I want to scrape):
<div class="ulListContainer">
<section class="stockUpdater">
<ul class="column4">
<li>
<img src="1.png" alt="">
<strong>
Buy*
</strong>
<strong>
Sell*
</strong>
</li>
<li>
<header>
$USD
</header>
<span class="">
20.90
</span>
<span class="">
23.15
</span>
</li>
</ul>
<ul>...</ul>
</section>
</div>
How do I get the value of the first span in the second li using XPath? The result should be 20.90.
I have tried //div[@class="ulListContainer"]/section/ul[1]/li[2]/span[1] but I am not getting any values. I should mention that this is used in a Google Sheet through the IMPORTXML function (I am not sure which version of XPath it uses). Can I get some help?
Update
Apparently Google Sheets does not support such "complex" XPath expression since it seems to work fine:
Update 1
As requested I've shared the Google Sheet I am using to test this, here is the link

What you need is :
=IMPORTXML(A1;"//li[contains(text(),'USD')]/span[1]")
Removing section from your original XPath will work too:
=IMPORTXML(A1;"//div[@class='ulListContainer']/ul[1]/li[2]/span[1]")

Try this:
=IMPORTXML("URL","//span[1]")
Change URL to the actual website link/URL
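One way to check an expression independently of Google Sheets is to run it against the snippet with a regular XPath engine. The sketch below assumes the lxml library and uses a trimmed copy of the HTML from the question; it evaluates the asker's original expression (with section kept). IMPORTXML may parse the live page differently, so a match here does not guarantee the same result in a sheet.

```python
from lxml import html

# Trimmed copy of the snippet from the question.
PAGE = """
<html><body>
<div class="ulListContainer">
  <section class="stockUpdater">
    <ul class="column4">
      <li>
        <img src="1.png" alt="">
        <strong>Buy*</strong>
        <strong>Sell*</strong>
      </li>
      <li>
        <header>$USD</header>
        <span class="">20.90</span>
        <span class="">23.15</span>
      </li>
    </ul>
  </section>
</div>
</body></html>
"""

tree = html.document_fromstring(PAGE)

# Second <li> of the first <ul>, then its first <span>.
expr = '//div[@class="ulListContainer"]/section/ul[1]/li[2]/span[1]'
values = [span.text_content().strip() for span in tree.xpath(expr)]
print(values)
```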

Thymeleaf href url

I am trying to dynamically generate links for the content in my page by looping through a list, but I get 'parsing errors'.
I tried the syntax described in https://www.thymeleaf.org/doc/articles/standardurlsyntax.html:
<a th:href="@{/order/{id}/details(id=3,action='show_all')}">
Code:
<li th:each="param : ${paramList}">
<span th:text="${placeholder}">This is displaying the value of placeholder correctly</span>
<!-- The value I am trying to achieve is href="/member/team/ValueFromPlaceholderVariable?team=TeamName&page=PageName" -->
<a th:href="@{/member/team/{PlaceName}(PlaceName=${placeholder},team=${param.TeamName},page=${param.PageName})}">Page</a>
</li>
How can I generate the href link?

Remove the slash at the beginning:
<a th:href="@{member/team/{PlaceName}(PlaceName=${placeholder},team=${param.TeamName},page=${param.PageName})}">Page</a>

How to get xpath to a link to picture?

How do I write an XPath that extracts the link to an image? I want to extract this specific link:
http://insales.ru/images/bigpic.jpeg
I don't know how to specify the XPath. Do I have to include all parent tags to get it, or can I go directly to the tag?
<div class="tango">
<div class="container-horizontal">
<div class="clip-horizontal">
<ul id="carousel" class="pagination jcarousel">
<li class="jcarousel-item">
<a rev=http://insales.ru/images/bigpic.jpeg href="http://insales.ru/images/pic2.jpeg">
<img src="http://insales.ru/images/thumb.jpeg">
</a>
</li>
</ul>
</div>
</div>
</div>
So my XPath would be:
//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
Or do I have to include all parent div tags:
//div[@class="tango"]//div[@class="container-horizontal"]//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
Either way, neither of these XPaths works.
How do I specify an XPath to extract this link: http://insales.ru/images/bigpic.jpeg

There are several options but the shortest one is probably:
.//*[@id='carousel']/li/a/img
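Note that a path ending in /img selects the img element, whose src is the thumbnail. In the snippet from the question, the bigpic.jpeg URL sits in the rev attribute of the a element, so a path ending in /a/@rev is one way to reach it. The sketch below assumes the lxml library and a trimmed copy of the markup; the rev value is quoted here for clarity, which is a change from the original snippet.

```python
from lxml import html

# Trimmed copy of the snippet from the question (rev value quoted here).
PAGE = """
<html><body>
<div class="tango">
  <ul id="carousel" class="pagination jcarousel">
    <li class="jcarousel-item">
      <a rev="http://insales.ru/images/bigpic.jpeg"
         href="http://insales.ru/images/pic2.jpeg">
        <img src="http://insales.ru/images/thumb.jpeg">
      </a>
    </li>
  </ul>
</div>
</body></html>
"""

tree = html.document_fromstring(PAGE)

# The id is enough to anchor the path; parent divs are not required.
big = tree.xpath("//*[@id='carousel']/li/a/@rev")
thumb = tree.xpath("//*[@id='carousel']/li/a/img/@src")
print(big)    # ['http://insales.ru/images/bigpic.jpeg']
print(thumb)  # ['http://insales.ru/images/thumb.jpeg']
```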
