Watir - scraping a grid of items - ruby

I'm trying to scrape the app URLs from a directory that's laid out in a grid:
<div id="mas-apps-list-tile-grid" class="mas-app-list">
<div class="solution-tile-container">
<div class="solution-tile-content-container">
<a href="url.com/app/345">
<div class="solution-tile-container">
<div class="solution-tile-content-container">
<a href="url.com/app/567">
... and so on
Here are my 2 lines of Watir code that are supposed to create an array with all URLs from a page:
company_listings = browser.div(id: 'mas-apps-list-tile-grid')
companies = company_listings.map { |div| div.a.href }
But instead of an array with URLs, 'companies' returns:
#<Watir::Map: located: false; {:id=>"mas-apps-list-tile-grid", :tag_name=>"div"} --> {:tag_name=>"map"}>
What am I doing wrong?

The #map method for a Watir::Element (or specifically Watir::Div in this case) returns a Watir::Map element. This is used for locating <map> tags/elements on the page.
In contrast, the #map method for a Watir::ElementCollection will iterate over each of the matching elements. This is what is missing.
You have a couple of options. If you want all the links in the grid, the most straightforward approach is to create a #links or #as element collection:
company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_hrefs = company_grid.links.map { |a| a.href }
If there are only some links you care about, you'll need to use the link's parents to narrow it down. For example, maybe it's just links located in a "solution-tile-content-container" div:
company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_listings = company_grid.divs(class: 'solution-tile-content-container')
company_hrefs = company_listings.map { |div| div.a.href }

Related

nightwatch select current element's inner div

i'm new to nightwatch and was wondering if there's any good way to select the inner element of a current element and then get the text? Assuming i have the following..and i'm trying to retrieve the text inside (a) tags of each (li).
so i would like to get 'text to retrieve' and 'text to retrieve 2'.
...
<div class="mywrapperhere">
<ul>
<li>
<a>.....
<div>text to retrieve</div>
</a>
</li>
<li>
<a>.....
<div>text to retrieve 2</div>
</a>
</li>
<li>...
...
</div>
I'm thinking along these lines..
module.exports = {
'Demo test 1' : function (browser) {
....
//some sort of selector then gets from the anchor list
...'.mywrapperhere li a') : {
..
//for each element of the anchor..
{
//is there anyway to get it through something like
element.('div').innerHTML eg..
//or am i forced to use browser.execute( ...getElementsByTag method
//to achieve this?
}
}
browser.end();
}
};
Looking at the nightwatch api, i couldn't find anything allows me to do that. I'm particularly looking at the 'Element State' examples that doesn't seem to have a way for me to select the current element state's child element :
http://nightwatchjs.org/api/elementIdAttribute.html
The reason why i had to loop through the anchor tag level is because i'll need to retrieve a few more data besides the one from div tag, thanks!
You can use elementIdElement and elementIdText to get text from a child element. First you can get all the li elements by using .elements(). Then you use elementIdElement to get a child element. Then you can use elementIdText to get the text of this child element. Here is an example that will allow you to get the text of both list items in your snippet and log the values to the console.
browser.elements('css selector', 'li', function(listItems) {
listItems.value.forEach(function(listItem) {
browser.elementIdElement(listItem.ELEMENT, 'css selector', 'a', function(anchor) {
browser.elementIdText(anchor.value.ELEMENT, function(text) {
console.log(text.value);
});
});
}, browser); //have to pass in browser for scoping
});

Scraping the href value of anchor in Ruby

I'm working on a project where I have to scrape a "website," which is just an HTML file in one of the local folders. I've been trying to scrape down to the href value (a URL) of the anchor tag for each student object. I am also scraping for other things, so ignore the rest. Here is what I have so far:
def self.scrape_index_page(index_url) #responsible for scraping the index page that lists all of the students
#return an array of hashes in which each hash represents one student.
html = index_url
doc = Nokogiri::HTML(open(html))
# doc.css(".student-name").first.text
# doc.css(".student-location").first.text
#student_card = doc.css(".student-card").first
#student_card.css("a").text
end
Here is one of the student profiles. They are all the same, so I'm just interested in scraping the href url value.
<div class="student-card" id="eric-chu-card">
<a href="students/eric-chu.html">
<div class="view-profile-div">
<h3 class="view-profile-text">View Profile</h3>
</div>
<div class="card-text-container">
<h4 class="student-name">Eric Chu</h4>
<p class="student-location">Glenelg, MD</p>
</div>
</a>
</div>
thanks for your help!
Once you get an anchor tag in Nokogiri, you can get the href like this:
anchor["href"]
So in your example, you could get the href by doing the following:
student_card = doc.css(".student-card").first
href = student_card.css("a").first["href"]
If you wanted to collect all of the href values at once, you could do something like this:
hrefs = doc.css(".student-card a").map { |anchor| anchor["href"] }
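The question asks for an array of hashes, one per student. Below is a minimal sketch of that shape. It deliberately swaps Nokogiri for REXML from Ruby's standard distribution so it runs without extra gems, and it assumes the markup is well-formed. The "roster" wrapper and the second student card are hypothetical, added only for illustration. With Nokogiri, the same hash would be built from card.css(...).text and card.css("a").first["href"].

```ruby
require "rexml/document"

# Sample markup: the first card is from the question (trimmed);
# the wrapper div and the second card are hypothetical.
html = <<~HTML
  <div class="roster">
    <div class="student-card" id="eric-chu-card">
      <a href="students/eric-chu.html">
        <div class="card-text-container">
          <h4 class="student-name">Eric Chu</h4>
          <p class="student-location">Glenelg, MD</p>
        </div>
      </a>
    </div>
    <div class="student-card" id="jane-doe-card">
      <a href="students/jane-doe.html">
        <div class="card-text-container">
          <h4 class="student-name">Jane Doe</h4>
          <p class="student-location">Springfield, IL</p>
        </div>
      </a>
    </div>
  </div>
HTML

doc = REXML::Document.new(html)

# Build one hash per student card, searching relative to each card.
students = REXML::XPath.match(doc, "//div[@class='student-card']").map do |card|
  {
    name: REXML::XPath.first(card, ".//h4[@class='student-name']").text,
    location: REXML::XPath.first(card, ".//p[@class='student-location']").text,
    profile_url: REXML::XPath.first(card, "./a").attributes["href"]
  }
end

puts students.inspect
```

Scoping each lookup to the card, rather than to the whole document, keeps every name paired with its own location and link.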

Selenium webdriver: How to find nested tags?

A webpage contains
<div class="divclass">
<ul>
<li>
"hello world 1"
<img src="abc1.jpg">
</li>
<li>
"hello world 2"
<img src="abc2.jpg">
</li>
</ul>
</div>
I am able to get data under div using
element = driver.find_element(class: "divclass")
element.text.split("\n")
But I want all links respective to the achieved data
I tried using
driver.find_elements(:css, "div.divclass a").map(&:text)
but failed.
How can I get related links to the data?
If you want to get the href attribute, try the code below (I am not familiar with Ruby, so I am posting the code in Java).
List<WebElement> elements = driver.findElements(By.xpath("//*[@class='divclass']//a"));
for(WebElement webElement:elements){
System.out.println(webElement.getAttribute("href"));
}
The XPath points to all the a tags under the div tag with the class name divclass.
If you want to get the text of all the links, you can use the code below:
List<WebElement> elements = driver.findElements(By.xpath("//*[@class='divclass']//a"));
for(WebElement webElement:elements){
System.out.println(webElement.getText());
}
Hope it helps.
In Ruby:
elements = driver.find_elements(:xpath, "//*[@class='divclass']//a")
list = elements.collect { |e| { e.text => e.attribute("href") } }
This returns the links paired with their text as an array of hashes.
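To see what the XPath above selects without starting a browser, here is a small offline sketch using REXML from Ruby's standard distribution instead of Selenium. The markup is hypothetical: the question's snippet contains no a tags, so anchors with made-up href values are assumed here purely for illustration.

```ruby
require "rexml/document"

# Hypothetical markup: the anchors and their href values are assumptions,
# since the markup posted in the question has no <a> elements.
html = <<~HTML
  <div class="divclass">
    <ul>
      <li><a href="abc1.html">hello world 1</a></li>
      <li><a href="abc2.html">hello world 2</a></li>
    </ul>
  </div>
HTML

doc = REXML::Document.new(html)

# //*[@class='divclass'] matches any element whose class attribute is
# exactly "divclass"; //a then selects every <a> descendant at any depth.
anchors = REXML::XPath.match(doc, "//*[@class='divclass']//a")

# Pair each link's text with its href, mirroring the Selenium snippet.
links = anchors.map { |a| { a.text => a.attributes["href"] } }

puts links.inspect
```

If the page really has no a elements under the div, the same expression returns an empty list, which would explain why the attempt in the question produced nothing.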

How to refer Element using xpath inside multiple <li> that in turn contains <a> tag

How can i refer multiple elements present under li tag using xpath?
<div id="accordian">
<ul>
<li>
<h3 class="classroom"></h3>
<ul style="display: block;">
<li>name1</li>
<li>name2</li>
<li>name3</li>
<li>name4</li>
</ul>
</li>
i am using Selenium Webdriver, I tried following code to refer the element, but it returns a blank value.
List<WebElement> listelement=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
for(WebElement list: listelement)
{
System.out.println(list.getText());
}
List<WebElement> list=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
Just add an a tag at the end of your XPath and this will work:
//div[@id='accordian']/ul/li/ul/li/a
*** This is a comment as I don't have access to Comments section ****
Hi,
Limit the XPath to /ul and don't use /li. It will return the list, and you can then iterate over its child elements.
xpath("//div[@id='accordian']/ul/li/ul")
I have doubts about the XPath you tried, but below is a way you can achieve it.
List<WebElement> list=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
System.out.println("No of names present="+ list.size());
// use of for loop for iteration
for(int i=0;i<list.size();i++)
{
System.out.println(list.get(i).getText());
}
System.out.println("-------------------------");
//use of for each for iteration
for(WebElement wb: list)
System.out.println(wb.getText());
Call getText() on the a tag elements. I always prefer CSS over XPath, so here is my solution:
By byCss = By.cssSelector("#accordian>ul>li>ul>li>a");
List<WebElement> listElement = driver.findElements(byCss);
for(WebElement list: listElement)
{
System.out.println(list.getText());
}
I got the same problem and struggled with it for few days. You can use href tag as well to fetch the elements. Also You can try using 'a' tag. It will be something like this:
List<WebElement> listelement=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li/a"));
for(WebElement list: listelement) {
String name= list.getAttribute("href");
System.out.println(name);
}
---should be comment, but don't have enough reputation---
I tried your solution on given HTML,
for me it is working fine for chromedriver and firefox.(printing all four values from list)
For the InternetExplorer driver I am not able to get the values, but that is because listElement.size() is 0.
You can try
element.getAttribute("value") or elem.getAttribute("innerHTML");
to check what is happening here.
swapnil, your code with xpath is working for me, I get all the 4 elements, still you can try this as well
List<WebElement> listElements = driver.findElements(By.tagName("a"));
for(WebElement a : listElements){
System.out.println(a.getText());
}
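The answers disagree on whether to end the path at li or at li/a. A quick offline check can settle it for any given markup. The sketch below uses REXML from Ruby's standard distribution rather than Selenium, and evaluates both paths against the HTML exactly as posted in the question, with the omitted closing tags filled in as an assumption.

```ruby
require "rexml/document"

# Markup from the question; the closing </li>, </ul> and </div> tags
# it left out are filled in here (an assumption).
html = <<~HTML
  <div id="accordian">
    <ul>
      <li>
        <h3 class="classroom"></h3>
        <ul style="display: block;">
          <li>name1</li>
          <li>name2</li>
          <li>name3</li>
          <li>name4</li>
        </ul>
      </li>
    </ul>
  </div>
HTML

doc = REXML::Document.new(html)

# Each "/" step selects direct children only, so this path reaches
# the inner <li> elements and nothing else.
names = REXML::XPath.match(doc, "//div[@id='accordian']/ul/li/ul/li").map(&:text)

# The same path with a trailing /a step; it can only match if the
# inner <li> elements actually contain <a> children.
anchors = REXML::XPath.match(doc, "//div[@id='accordian']/ul/li/ul/li/a")

puts names.inspect
puts anchors.length
```

On the markup as posted, the first path finds the four names and the second finds no elements, because the inner li tags hold plain text. If the real page wraps each name in an a tag, as the question title suggests, the /a variant is the one to use.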

How to perform click event on an element present in the anchor tag?

<div class="buttonClear_bottomRight">
<div class="buttonBlueOnWhite">
<a onclick="$find('{0}').close(true); callPostBackFromAlert();" href="#">Ok</a><div
class='rightImg'>
</div>
</div>
</div>
In the above code, I want to click the Ok button in the anchor tag. But no id is generated for it, so I cannot perform a click action on it directly. I tried the workaround below.
IElementContainer elm_container = (IElementContainer)pw.Element(Find.ByClass(classname));
foreach (Element element in elm_container.Elements)
{
if (element.TagName.ToString().ToUpper() == "A")
{
element.Click();
}
}
But here elm_container returns null for the initial instances, so we cannot traverse it. Is there an easier way to do this?
Try this...
Div div = browser.Div(Find.ByClass("buttonClear_bottomRight")).Div(Find.ByClass("buttonBlueOnWhite"));
Debug.Assert(div.Exists);
Link link = div.Link(lnk => lnk.GetAttributeValue("onclick").ToLower().Contains(".close(true)"));
Debug.Assert(link.Exists);
link.Click();
Hope it helps!
You can simply Click on the link by finding its text
var OkButton = Browser.Link(Find.ByText("Ok"));
if(!OkButton.Exists)
{
// Log error here
}
OkButton.Click();
Browser.WaitForComplete();
Or you can find the div containing the link like,
var ContainerDiv = Browser.Div(Find.ByClass("buttonBlueOnWhite"));
if(!ContainerDiv.Exists)
{
// Log error here
}
ContainerDiv.Links.First().Click();
Browser.WaitForComplete();
