Count images in a page using Capybara - Ruby

I want to count the images displayed in a page using Capybara. The HTML is shown below. I use the following code to get the total count, but it returns 0, even though my page has more than 100 images.
c = page.all('.thumbnail_select').count
puts c # returns 0
HTML
<a class="thumbnail thumbnail_img_wrap">
<img alt="" src="test.jpg">
<div class="thumbnail_select">
<div class="thumail_selet_backnd"></div>
<div class="thumbil_selt_text">Click to Select</div>
</div>
<p>ucks</p>
<span class="info_icon"><span class="info_icon_img"></span></span>
</a>
<a class="thumbnail thumbnail_img_wrap">
<img alt="" src="test1.jpg">
<div class="thumbnail_select">
<div class="thumail_selet_backnd"></div>
<div class="thumbil_selt_text">Click to Select</div>
</div>
<p>ucks</p>
<span class="info_icon"><span class="info_icon1_img"></span></span>
</a>
.........
.........
How can I count the total number of images?

You have a few options.
Either find all divs with class thumbnail_select by using all("div[class='thumbnail_select']").count.
But this is an awkward way of doing it, since it looks for the divs and not the images.
A better way would be to look for all images using all("img").count, as long as no other images are present on the page.
If neither of these works, the problem might be that your page has not finished loading when you start looking for the images. In that case, put a page.should have_content check before the image count to make sure the page is loaded. Also note that Capybara's all only returns visible elements by default, so if the thumbnail_select overlay stays hidden until you hover, it is not counted; all('.thumbnail_select', visible: false) includes hidden elements as well.
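To see what each selector matches outside a browser, here is a minimal sketch that runs XPath equivalents of the queries above with Ruby's bundled REXML instead of Capybara. It assumes the two thumbnails from the question, closed into well-formed XHTML (REXML does not parse loose HTML), plus a hypothetical logo image added only to show why scoping matters. It does not model page loading or visibility.

```ruby
require 'rexml/document'

# Two thumbnails from the question (trimmed, <img> self-closed) plus a
# hypothetical logo that is not part of the gallery.
SAMPLE = <<~XHTML
  <body>
    <img alt="logo" src="logo.png"/>
    <a class="thumbnail thumbnail_img_wrap">
      <img alt="" src="test.jpg"/>
      <div class="thumbnail_select">
        <div class="thumbil_selt_text">Click to Select</div>
      </div>
    </a>
    <a class="thumbnail thumbnail_img_wrap">
      <img alt="" src="test1.jpg"/>
      <div class="thumbnail_select">
        <div class="thumbil_selt_text">Click to Select</div>
      </div>
    </a>
  </body>
XHTML

doc = REXML::Document.new(SAMPLE)

# One overlay div per thumbnail: counts images only indirectly.
select_divs = REXML::XPath.match(doc, "//div[@class='thumbnail_select']").size

# Every <img> on the page, including the unrelated logo.
all_images = REXML::XPath.match(doc, "//img").size

# Only images inside the thumbnail links.
thumb_images = REXML::XPath.match(doc, "//a[contains(@class, 'thumbnail_img_wrap')]/img").size

puts select_divs  # 2
puts all_images   # 3
puts thumb_images # 2
```

In Capybara itself, the scoped query would be the CSS selector page.all('a.thumbnail_img_wrap img').count.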

Related

Parsing through response created with XPath

Using Scrapy, I want to extract some data from a site with well-formed HTML. With XPath I can extract a list of items, but I cannot extract data from the elements in that list using XPath.
All XPaths have been tested using XPather. I have also tested the issue using a local file that contains the webpage, with the same result.
Here goes:
# Get the webpage
fetch("https://www.someurl.com")
# The following gives me the expected items from the HTML
products = response.xpath("//*[@id='product-list-146620']/div/div")
The items look like this:
<div data-pageindex="1" data-guid="13157582" class="col ">
<div class="item item-card item-card--static">
<div class="item-card__inner">
<div class="item__image item__image--overlay">
<a href="/www.something.anywhere?ref_gr=9801" class="ratio_custom" style="padding-bottom:100%">
</a>
</div>
<div class="item__text-container">
<div class="item__name">
<a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
</div>
</div>
</div>
</div>
</div>
When I use the following XPath to extract "The text I want", I don't get anything:
XPATH_PRODUCT_NAME = "/div/div/div/div/div[contains(@class,'item__name')]/a/text()"
products[0].xpath(XPATH_PRODUCT_NAME).extract()
The output is empty, why?
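A path that starts with / is absolute: even when you call products[0].xpath(...), it is evaluated from the root of the whole document, whose top element is <html>, so /div matches nothing. Start the path with . to make it relative to the current element.
Try the following code.
XPATH_PRODUCT_NAME = ".//div[@class='item__name']/a[@class='item__name-link']/text()"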
products[0].xpath(XPATH_PRODUCT_NAME).extract()
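Scrapy's selectors follow standard XPath 1.0 rules, so the difference between the two paths can be reproduced with any XPath 1.0 engine. The sketch below swaps in Ruby's bundled REXML for Scrapy (it is not Scrapy's API), with the product markup trimmed and wrapped in an assumed html/body skeleton so that the document root is <html>, as on a real page.

```ruby
require 'rexml/document'

# Trimmed product markup inside an assumed page skeleton.
PAGE = <<~XHTML
  <html><body>
    <div id="product-list-146620"><div>
      <div data-pageindex="1" data-guid="13157582" class="col ">
        <div class="item item-card item-card--static">
          <div class="item-card__inner">
            <div class="item__text-container">
              <div class="item__name">
                <a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div></div>
  </body></html>
XHTML

doc = REXML::Document.new(PAGE)
products = REXML::XPath.match(doc, "//*[@id='product-list-146620']/div/div")
product = products.first

# Absolute path: evaluated from the document root (<html>), so /div finds nothing.
absolute = REXML::XPath.match(product, "/div/div/div/div/div[contains(@class,'item__name')]/a/text()")

# Relative path: the leading '.' anchors the search at the product element.
relative = REXML::XPath.match(product, ".//div[@class='item__name']/a[@class='item__name-link']/text()")

puts absolute.size                 # 0
puts relative.map(&:value).inspect # ["The text I want"]
```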

Using page.at with CSS selector in Mechanize

I am trying to scrape a webpage with Mechanize, with the following structure:
<div id="searchResultsBox">
<div class="listings-wrap">
<div class="listings-header">
<div class="listing-cat">Category</div>
<div class="listing-name">Name</div>
</div>
<ul class="listings">
<li class="listing">
<a href="/ShowRatings.jsp?tid=1143052">
<span class="listing-cat">
<span class="icon"></span>
TEXT
</span>
<span class="listing-name">
<span class="main">TEXT</span>
<span class="sub">TEXT</span>
</span>
</a>
</li>
...
I want to navigate to the page behind the <a> element. Right now, I have:
require 'mechanize'
agent = Mechanize.new
page = agent.get("URL")
page = page.at('#searchResultsBox > div.listings-wrap > ul > li:nth-child(1) > a')
but it keeps returning nil (verified with puts page.class).
I also tried using sleep to give the page time to load before continuing.
Is there anything I am doing wrong? I thought using the CSS selector would do the trick.
Maybe the website content is loaded dynamically by JavaScript. Mechanize does not execute JavaScript, so waiting with sleep will not change what it receives.
Inspect the page returned by agent.get (for example with puts page.body) and check whether the content there is complete.
If the content is incomplete, the data must come from other requests to the server. You can find them with Chrome DevTools (or a similar tool): the Network tab shows all requests made by the website. Look for the one containing the data you need and then scrape it with Mechanize.
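One way to tell a bad selector from missing content is to run the same query against the markup you expect and against what the server actually returned. This sketch uses REXML with a hand-written XPath translation of the CSS selector (an assumption: REXML has no CSS support, whereas Mechanize's page.at uses Nokogiri). The SERVED string is a hypothetical shell page whose list would be filled in later by JavaScript.

```ruby
require 'rexml/document'

# XPath translation of '#searchResultsBox > div.listings-wrap > ul > li:nth-child(1) > a'.
# li[1] equals nth-child(1) here only because the <ul> holds nothing but <li> elements.
LINK_XPATH = "//*[@id='searchResultsBox']" \
             "/div[contains(concat(' ', normalize-space(@class), ' '), ' listings-wrap ')]" \
             "/ul/li[1]/a"

# Markup as seen in the browser after scripts have run (trimmed).
RENDERED = <<~XHTML
  <html><body><div id="searchResultsBox"><div class="listings-wrap">
    <ul class="listings">
      <li class="listing"><a href="/ShowRatings.jsp?tid=1143052">TEXT</a></li>
    </ul>
  </div></div></body></html>
XHTML

# Hypothetical raw server response: the container exists but the list is empty.
SERVED = <<~XHTML
  <html><body><div id="searchResultsBox"><div class="listings-wrap">
    <ul class="listings"></ul>
  </div></div></body></html>
XHTML

# Returns the first matching <a> element, or nil when nothing matches.
def first_link(markup)
  REXML::XPath.first(REXML::Document.new(markup), LINK_XPATH)
end

puts first_link(RENDERED).attributes['href'] # /ShowRatings.jsp?tid=1143052
p first_link(SERVED)                         # nil: same selector, missing content
```

If your real page.body behaves like SERVED, the selector is fine and the data has to come from the other request you find in the Network tab.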

Using scrapy extract the url of image

I am using Scrapy to scrape images. I notice that for some images the URL is given by @src, like the following:
<a href="http://www.wandoujia.com/apps/com.uu">
<img src="http://img.wdjimg.com/mms/icon/v1/5/09/14687d011083dc84036fc68dc3c80095_68_68.png" width="68" height="68" alt="UU电话" class="icon">
</a>
Some are different:
<a href="http://www.wandoujia.com/apps/com.hcsql.shengqiandianhua">
<img data-original="http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png" alt="省钱电话" class="icon lazy" width="68" height="68" src="http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png" style="display: block;">
</a>
I use the following code to extract them. The result is: 1) if only src occurs, @src is the real link of the image; 2) if data-original occurs, @data-original is the real link and @src is not. So my question is: what should I do to extract the image URL in both cases?
sel.xpath('/a/img/@src').extract()
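You can try selecting @src only from images that have no data-original attribute, and @data-original from those that do, combining the two with the XPath union operator |:
sel.xpath('//a/img[not(@data-original)]/@src | //a/img/@data-original').extract()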
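A minimal sketch of the union query, run with Ruby's bundled REXML as a stand-in for Scrapy's selector (same XPath 1.0 expression). The lazy image's src is set to a hypothetical placeholder, loading.gif, to mimic what a lazy-loading page may serve before JavaScript swaps in the real URL; the icon URLs are the ones from the question. Note that an XPath union does not promise the order you expect across engines, so compare as a set if order matters.

```ruby
require 'rexml/document'

PLAIN = "http://img.wdjimg.com/mms/icon/v1/5/09/14687d011083dc84036fc68dc3c80095_68_68.png"
LAZY  = "http://img.wdjimg.com/mms/icon/v1/6/44/a27006acfbe8b6aa39bee49c6f004446_68_68.png"

# The two anchors from the question, trimmed; loading.gif is a hypothetical placeholder.
HTML = <<~XHTML
  <div>
    <a href="http://www.wandoujia.com/apps/com.uu">
      <img src="#{PLAIN}" class="icon"/>
    </a>
    <a href="http://www.wandoujia.com/apps/com.hcsql.shengqiandianhua">
      <img data-original="#{LAZY}" class="icon lazy" src="loading.gif"/>
    </a>
  </div>
XHTML

doc = REXML::Document.new(HTML)

# @src only for images without data-original, plus @data-original where it exists.
xpath = "//a/img[not(@data-original)]/@src | //a/img/@data-original"
urls = REXML::XPath.match(doc, xpath).map(&:value)

puts urls # the two real icon URLs; the placeholder src is skipped
```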

Center image within div

I've looked around and tried the suggestions for centering an image, and it usually works fine, but in this case something isn't right.
If you go to the test page,
http://www.503rephotography.com/_temp/, you will see that the image is pushed a little to the right, and if you resize your browser window, it may shift even further from the center.
I'm new to CSS and may have messed something up that is preventing this from working; I used some tips from here to get the content div roughly centered on the page. Now I'm just trying to center an image within that div. Any help is much appreciated!
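You have to create a container div with margin-left: auto; and margin-right: auto; to center the content. The container also needs an explicit width, since auto margins only center a block that is narrower than its parent.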
<div id="container">
<div id="header">
<h1 id="logo">
<a href="http://www.503rephotography.com">
<img src="images/logo.png" alt="503 rephotography">
</a>
</h1>
<ul class="navbar">
<li class="button">SERVICES</li>
<li class="button">PORTFOLIO</li>
<li class="button">CONTACT 503</li>
</ul>
</div>
<div id="topbar"></div>
<div id="content">
<img src="http://www.503rephotography.com/_temp/slides/1.jpg">
<div class="sub">
<p>Content will go here....why can't I get this div box to be centered???</p>
</div>
</div>
</div>
Try this fiddle and see if it's what you need: http://jsfiddle.net/ftPa3/

How to stop auto-refresh onclick from thumbnails?

I have an image gallery on my site with thumbnails that show an enlarged image above the thumbnail row when clicked. I'm having an issue with auto-refresh: every time I click one of the thumbnails, the page refreshes, which restores it to the "master image".
I'm not using anything fancy to write this code (and sort of refuse to, since I believe all of this can be done with simple CSS and HTML), even though my knowledge of HTML is amateur at best.
Here's a sample of the code. Let me know if you need to see a different piece of it.
<div id="rightcol">
<img name="ImageOnly" src="#" /><img src="#" />
</div>
<div id="leftcol"><div>
<a href="" onclick="ImageOnly.src='#'"><img src="#" /></a>
</div></div>
Edit: Somehow I seem to have fixed this issue by changing
<a href="" onclick="ImageOnly.src='#'">
to
<a href="#" onclick="ImageOnly.src='#'">
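Not really sure why this worked, but I would love an explanation.
Why not just use some simple JavaScript with .innerHTML instead of trying to stop the refresh? The refresh happens because an empty href="" points to the current page, so clicking the link reloads it; href="#" only jumps to a fragment of the same page, which is why your change stopped the reload. With .innerHTML you can update rightcol directly from the click handler.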
HTML
<div id="rightcol">
<img name="ImageOnly.src" src='#' />
</div>
<div id="leftcol">
<img src="#" />
</div>
JavaScript
function ajaxMove(src)
{
var image = '<img src="'+src+'" alt="my transferred image" />';
document.getElementById('rightcol').innerHTML = image;
}
How is it used?
Call ajaxMove from the onclick event, passing the image's src.
The function builds an image tag from that src.
It then replaces the contents of the element with the id 'rightcol' with the new image tag.
Other options
You could also remove the href="#" from the <a> tag, work directly from the onclick event, and add style="cursor:pointer;". It will then look like a regular hyperlink, but without the refresh.
<a onclick="ajaxMove('ImageOnly.src')" style="cursor:pointer;" >Click Me</a>

Resources