I am trying to fetch the relative link of an image in order to download it.
You can find an example at this address: https://www.laforet.com/agence-immobiliere/colombes/acheter/colombes/appartement-2-pieces-20909147
The part of the HTML I see in my browser is this one:
<a href="javascript:void(0)" class="property__fullsize" data-v-6ac7cd72="" data-v-476b61a8="">
<img src="/media/cache/office9/laforet_paris17villiers/catalog/images/pr_p/2/0/9/0/9/1/4/7/20909147c.jpg?method=max&size=medium&timestamp=1633799295" alt="" class="property__photo" data-v-e11dca30="" data-v-476b61a8="" data-v-6ac7cd72="">
<span class="svg-icon icon-full-size" data-v-10c7d875="" data-v-476b61a8="" data-v-6ac7cd72="">
<svg data-v-10c7d875="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 23.249 23.248" role="presentation" width="18" height="18" class="white">
<g data-v-10c7d875="" id="full-size" transform="translate(0 -0.002)">
<path data-v-10c7d875="" id="Tracé_1008" data-name="Tracé 1008" d="deleted for brevity"></path>
<path data-v-10c7d875="" id="Tracé_1009" data-name="Tracé 1009" d="deleted for brevity" transform="translate(-122.883 -122.879)"></path>
<path data-v-10c7d875="" id="Tracé_1010" data-name="Tracé 1010" d="deleted for brevity" transform="translate(-0.004 -122.879)"></path>
<path data-v-10c7d875="" id="Tracé_1011" data-name="Tracé 1011" d="deleted for brevity" transform="translate(-124.215 0)"></path>
</g>
</svg>
<!---->
</span>
</a>
I want to extract this part:
"/media/cache/office9/laforet_paris17villiers/catalog/images/pr_p/2/0/9/0/9/1/4/7/20909147c.jpg?method=max&size=medium&timestamp=1633799295"
The XPath to reach this part should be quite straightforward:
response.xpath('//*[@class="property__photo"]/@src').get()
or
response.css(".property__photo").getall()
There doesn't seem to be any trap; I think I reached the right part of the HTML.
The problem is that the output is this:
'<img src="data:image/svg+xml;charset=UTF-8,%3Csvg%20width%3D%221%22%20height%3D%221%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%20%25%7Bw%7D%20%25%7Bh%7D%22%20preserveAspectRatio%3D%22none%22%3E%3Crect%20width%3D%22100%25%22%20height%3D%22100%25%22%20style%3D%22fill%3Atransparent%3B%22%3E%3C%2Frect%3E%3C%2Fsvg%3E" alt="" width="1" height="1" class="property__photo" data-v-e11dca30 data-v-476b61a8>'
Why is there a difference between the HTML I see in my browser and the HTML Scrapy imported? Is that JavaScript? What kind of format is it? Can I extract an image from this string? Is this behaviour triggered by the webpage or by Scrapy?
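The string Scrapy returned is a data: URI: the media type (image/svg+xml;charset=UTF-8), a comma, then percent-encoded SVG markup embedded directly in the src attribute. A minimal sketch using only Python's standard library that decodes it; the helper name decode_data_uri is hypothetical, and it only handles non-base64 payloads like this one:

```python
from urllib.parse import unquote

def decode_data_uri(uri: str) -> tuple[str, str]:
    """Split a percent-encoded (non-base64) data: URI into (media type, payload)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:"):
        raise ValueError("not a data: URI")
    if header.endswith(";base64"):
        raise ValueError("base64 payloads are not handled in this sketch")
    return header[len("data:"):], unquote(payload)

# The src value from the Scrapy output above, split over several lines.
PLACEHOLDER = (
    "data:image/svg+xml;charset=UTF-8,"
    "%3Csvg%20width%3D%221%22%20height%3D%221%22%20"
    "xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20"
    "viewBox%3D%220%200%20%25%7Bw%7D%20%25%7Bh%7D%22%20"
    "preserveAspectRatio%3D%22none%22%3E"
    "%3Crect%20width%3D%22100%25%22%20height%3D%22100%25%22%20"
    "style%3D%22fill%3Atransparent%3B%22%3E%3C%2Frect%3E%3C%2Fsvg%3E"
)

media_type, svg = decode_data_uri(PLACEHOLDER)
print(media_type)
print(svg)
```

Decoded, the payload is a 1x1 SVG holding a single transparent rectangle, so there is no photo to recover from it. That pattern suggests a placeholder which the page's JavaScript swaps for the real URL after loading; Scrapy does not execute JavaScript, so it only ever sees the placeholder.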
EDIT
I decided to find another way to extract the links, as advised by @SuperUser.
I still do not know exactly which function changes the content of the HTML between what I see in my browser and what Scrapy downloads. The page contains numerous obfuscated JavaScript files and I cannot find the culprit. Nevertheless, I found a workaround by taking the content of other tags.
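Once a relative src like the one in the browser HTML is available, it has to be resolved against the page URL before it can be downloaded. A minimal sketch with urllib.parse.urljoin from the standard library, using the page URL and path from the question; inside a Scrapy callback, response.urljoin() performs the same resolution against the response URL:

```python
from urllib.parse import urljoin

# Page URL from the question and the relative src seen in the browser.
page_url = ("https://www.laforet.com/agence-immobiliere/colombes/acheter/"
            "colombes/appartement-2-pieces-20909147")
relative_src = ("/media/cache/office9/laforet_paris17villiers/catalog/images/"
                "pr_p/2/0/9/0/9/1/4/7/20909147c.jpg"
                "?method=max&size=medium&timestamp=1633799295")

# A path starting with "/" replaces the page path but keeps scheme and host;
# the query string of the relative link is preserved.
absolute_url = urljoin(page_url, relative_src)
print(absolute_url)
```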
You are trying to get an attribute value, so select it with ::attr().
Please use:
response.css('img.property__photo::attr(src)').extract()
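In the HTML Scrapy receives for this page, the src attribute holds the data: placeholder shown in the question, so ::attr(src) alone may still return it. A minimal sketch of a post-processing step that skips empty values and inline data: values, then resolves the rest against the page; the function name usable_image_urls and the example paths are hypothetical:

```python
from urllib.parse import urljoin

def usable_image_urls(srcs, base_url):
    """Drop empty values and inline data: placeholders, then make the rest absolute."""
    return [urljoin(base_url, src) for src in srcs
            if src and not src.startswith("data:")]

# Hypothetical inputs: a placeholder like the one Scrapy returned,
# a relative image path, and an empty src.
srcs = [
    "data:image/svg+xml;charset=UTF-8,%3Csvg%3E%3C%2Fsvg%3E",
    "/media/cache/example.jpg",
    "",
]
print(usable_image_urls(srcs, "https://www.laforet.com/agence-immobiliere/"))
```

Filtering after extraction keeps the selector itself simple; if every value turns out to be a placeholder, the real URL has to come from somewhere else in the raw HTML.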
Related
I am using Scrapy to scrape images. I notice that some image URLs are specified by @src, like the following:
<a href="http://www.wandoujia.com/apps/com.uu">
<img src="http://img.wdjimg.com/mms/icon/v1/5/09/14687d011083dc84036fc68dc3c80095_68_68.png" width="68" height="68" alt="UU电话" class="icon">
</a>
Some are different:
<a href="http://www.wandoujia.com/apps/com.hcsql.shengqiandianhua">
<img data-original="http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png" alt="省钱电话" class="icon lazy" width="68" height="68" src="http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png" style="display: block;">
</a>
I use the following code to extract them. The result is: 1) if only src occurs, @src is the real link of the image; 2) if data-original occurs, @data-original is the real link and @src is not. So my question is: what should I do if I want to extract the image URL in both cases?
sel.xpath('//a/img/@src').extract()
You can try:
sel.xpath('//a/img[not(@data-original)]/@src | //a/img/@data-original').extract()
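The XPath union above picks @data-original where it exists and @src otherwise, returning the values in document order. The same rule, sketched with the standard-library html.parser so it runs without Scrapy; the class name ImageURLCollector and the placeholder src /static/blank.gif are hypothetical, chosen to show that data-original wins when both attributes are present:

```python
from html.parser import HTMLParser

class ImageURLCollector(HTMLParser):
    """Collect one URL per <img>: data-original when present, otherwise src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Lazy-loaded images keep the real link in data-original.
        url = attributes.get("data-original") or attributes.get("src")
        if url:
            self.urls.append(url)

html = """
<a href="http://www.wandoujia.com/apps/com.uu">
<img src="http://img.wdjimg.com/mms/icon/v1/5/09/14687d011083dc84036fc68dc3c80095_68_68.png" class="icon">
</a>
<a href="http://www.wandoujia.com/apps/com.hcsql.shengqiandianhua">
<img data-original="http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png" class="icon lazy" src="/static/blank.gif">
</a>
"""

collector = ImageURLCollector()
collector.feed(html)
print(collector.urls)
```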
Sorry for my English.
I have installed the WordPress theme “Dynamix”. Dynamix uses the WPBakery Visual Composer plugin. When I insert a “Single Image” in a post, the image has an empty src. Example:
<a class="fancybox galleryimg blackwhite " style="width:200px" data-fancybox-group="gallerygdgrid_1" title="Controller" href="http://localhost/wp-content/uploads/2013/07/circle1.png">
<img class="gallery-img " width="200" height="200" alt="Controller" src="" style="visibility: visible; opacity: 1;">
</a>
When I insert an image using WordPress's built-in “Insert Media” window, everything works fine.
Where is the problem?
Thank you!
http://continent-news.info/page_20.html
In the left column, news items open in a popup window and are loaded with Ajax.
<div class="ukraine_mask1">
<svg height="116px" width="142px">
<defs>
<mask id="ukraine_mask1" maskContentUnits="userSpaceOnUse" maskUnits="userSpaceOnUse">
<image xlink:href="mask/ukraine_mask1.png" height="116px" width="142px">
</mask>
</defs>
<foreignObject class="recov_ukraine_mask1" style="mask: url(#ukraine_mask1);" height="100%" width="100%">
<div class="element_mask mask1_ukraine">
<img src="../inf_images/small/7812_8926.jpeg">
</div>
</foreignObject>
</svg>
</div>
Problem in Mozilla:
Some news items use a map with a mask. The first time, the mask works well. But if I click another news item with a mask, the mask is not re-applied... =( If I click the first news item again, it works.
The second time, the mask is not re-applied either. =(
If in Firebug I disable and then re-apply mask: url(#ukraine_mask1) on the foreignObject with class="recov_ukraine_mask1", it works again...
Does anyone have an idea how to solve this problem?
I tried adding a simple style in CSS, but it does not help. =(
I'm rendering an SVG image using
<img src="data:image/svg+xml;charset=utf-8;base64," + src />
where src is the Base64-encoded SVG image. Everything in the picture displays correctly except for the text in the fields, which is not displayed at all. This problem exists in Chrome, but not in Internet Explorer. Any ideas on how to get around this problem?
If I right-click on the displayed picture, download it and view it in Linux's Image Viewer, the text shows up perfectly again.
Edit: Example of SVG image:
<svg width="700" height="220" title="test2" version="1.1" xmlns="http://www.w3.org/2000/svg">
<text y="100" x="90" dy=".32em" text-anchor="end">
12
</text>
</svg>
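For reference, the JavaScript concatenation above builds a base64 data URI from the SVG markup. A minimal Python sketch of the same construction (the helper name svg_to_data_uri is hypothetical), which also makes it easy to decode the payload again and confirm the <text> element survived encoding:

```python
import base64

def svg_to_data_uri(svg_markup: str) -> str:
    """Base64-encode SVG markup as UTF-8 and add the same prefix as the <img src> above."""
    encoded = base64.b64encode(svg_markup.encode("utf-8")).decode("ascii")
    return "data:image/svg+xml;charset=utf-8;base64," + encoded

example_svg = ('<svg width="700" height="220" title="test2" version="1.1" '
               'xmlns="http://www.w3.org/2000/svg">'
               '<text y="100" x="90" dy=".32em" text-anchor="end">12</text></svg>')

uri = svg_to_data_uri(example_svg)
# Decoding the payload gives back the original markup, text element included.
payload = uri.split(",", 1)[1]
print(base64.b64decode(payload).decode("utf-8") == example_svg)  # True
```

If the round trip is lossless, the missing text is not an encoding problem and the cause lies in the markup itself.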
In my case, the problem was that I had created <text> with document.createElement('text'), and the resulting SVG looked like this:
<svg xmlns="http://www.w3.org/2000/svg">
<text xmlns y="100" x="90" dy=".32em" text-anchor="end">
12
</text>
</svg>
After encoding it to Base64 with window.btoa(), the text was not rendered either.
In my case, the cause was the empty xmlns attribute on the text tag.
Solution: I created the element using document.createElementNS('http://www.w3.org/2000/svg', 'text'), which creates an element with the specified namespace URI and qualified name.
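The cause described above can be checked offline: an element carrying an empty xmlns attribute is taken out of the SVG namespace, and per this answer the browser then does not render it as SVG text. A minimal sketch with Python's xml.etree.ElementTree that lists such elements; it assumes the serialized attribute has the explicit form xmlns="" (the snippet above shows a bare xmlns, which is not well-formed XML), and the function name elements_outside_svg_namespace is hypothetical:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def elements_outside_svg_namespace(svg_markup: str) -> list[str]:
    """Return the tags of elements that are not in the SVG namespace."""
    root = ET.fromstring(svg_markup)
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    prefix = "{" + SVG_NS + "}"
    return [el.tag for el in root.iter() if not el.tag.startswith(prefix)]

good = ('<svg xmlns="http://www.w3.org/2000/svg">'
        '<text y="100" x="90" dy=".32em" text-anchor="end">12</text></svg>')
# xmlns="" undeclares the default namespace for <text> only.
bad = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<text xmlns="" y="100" x="90" dy=".32em" text-anchor="end">12</text></svg>')

print(elements_outside_svg_namespace(good))  # []
print(elements_outside_svg_namespace(bad))   # ['text']
```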
This is the code I wrote to navigate to another JSP page when the image is clicked, but I don't get the expected result. Please help me out.
<td align="center" >
<img src="image.jsp?imgid=<%=rs.getInt(1)%>" width="100" height="100" onclick="ModelList.jsp" >
onclick is an event attribute, so it expects an event handler (e.g. a JavaScript statement or function call) to handle the event. You can do something like this:
onclick="window.location.href = 'ModelList.jsp'"
The following alteration will work:
<img src="image.jsp?imgid=<%=rs.getInt(1)%>" width="100" height="100" >