Capybara: Link with HTML content not found

Given an HTML structure like this
<a href="..."><strong>My Link</strong></a>
the link is not caught by Capybara through the cucumber step
When I follow "My Link"
using the default web step:
When /^(?:|I )follow "([^"]*)"$/ do |link|
  click_link(link)
end
while this works:
When I follow "<strong>My Link</strong>"
I haven't been using Capybara for long, but I can see what causes the problem. So, on a more general level: what's the proper way to go about this? Surely this case has to be pretty common, right?
Any ideas and general musings about Cucumber abuse very welcome!

I would move the strong tag outside the anchor tag, or better, use CSS for that.
Or you can assign an id attribute to the link and use it instead of the content. This way is best if your app supports multiple languages.
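For example (the id here is made up): with <a id="my-link" href="..."><strong>My Link</strong></a> in the page, the step can simply call click_link('my-link'), since click_link matches on the id attribute as well as the link text.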
Otherwise you have to write some sort of XPath selector for such a special case:
find(:xpath, "//a[contains(., \"#{locator}\")]").click
Not tested, just an idea.
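As a minimal sketch, the same idea dropped into the web step looks like this; contains(.) compares against the anchor's whole string value, so text inside nested tags such as <strong> is matched:

When /^(?:|I )follow "([^"]*)"$/ do |link|
  # "." is the anchor's string value, including text from nested elements.
  find(:xpath, %{//a[contains(., "#{link}")]}).click
end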

Related

Setting the correct xpath

I'm trying to set the right XPath for use with RSelenium, but I'm not very experienced in this area, so any help would be much appreciated.
Since I'm not allowed to post pictures yet, I have added a link to a screenshot of the HTML.
I need R to scrape the dates (28-10-2020 - 13-11-2020), but so far I have not been able to set the correct XPath when using html_nodes.
I'm trying to scrape from sites like this one: https://www.boligsiden.dk/adresse/topperne-9-3-33-2620-albertslund-01650532___9__3__33
I usually do this in Python rather than R.
When you right-click on the element concerned in the browser's developer tools, you get a drop-down menu with an option to copy the XPath to the element.
Other than that, the site layout and the XPath might change, so a full XPath is only a good option in the short run. I'd rather prefer
driver.find_element_by_xpath('//button[contains(text(), "Login")]').click()
In your case that would be
driver.find_element_by_xpath("//*[contains(@class, 'u-pb-4 u-block')]")
I hope this helps; it is mostly the same across different languages.

Can Capybara select a checkbox or radio button within the scope of a specific field

I'm using Capybara on a form that has multiple checkbox fields with an "Other" option. The Capybara API gives us
page.check('Other')
but no way (that I can find) to limit the scope to a given field. You can limit the scope based on a CSS (or XPath) selector, but since no sensible one exists, this would require that I change the (ugly legacy) markup of the page just to accommodate Capybara, which seems like the wrong solution. (In a perfect world I'd have time to completely refactor the markup and wind up with something semantically sensible that also gave me a way of selecting a scope for Capybara, but this is not a perfect world, and I don't want to just jam in classes all over the place to accommodate Capybara.)
This
page.within('[name=FieldName]') do
  page.check('Other')
end
doesn't work, either, since Capybara is looking for a single parent node that it can use as the scope, and this gives a set of checkboxes. (It would be nice if Capybara supported that, but it doesn't.) It's like I'm passing a deck of cards to search through, and Capybara wants the box the cards go in, but I don't have any box.
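If the markup had a single wrapper, say a hypothetical <fieldset id="field-name"> around the group, the documented scoping would work:

page.within('#field-name') do
  page.check('Other')
end

But the legacy markup has no such wrapper.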
I'd like to be able to do something like this
page.check('Other', :in => 'FieldName')
but I can't find any way of doing that. As far as I can tell, the only options that can be passed in are text, visible, and exact. Am I missing something? Is there a way to do this without resorting to ugly workarounds?
Since you have a CSS selector that can find the checkbox, you can use the find method to locate it.
page.find(:css, '[name=FieldName][value=Other]')
Then to check the checkbox, use set (which is used by the check method):
page.find(:css, '[name=FieldName][value=Other]').set(true)
You could also use the click method:
page.find(:css, '[name=FieldName][value=Other]').click
This is not the most elegant solution, but since no one has posted a better one (so far), here's the best I've come up with.
Just execute a script to do what you need. In my case I'm using jQuery:
page.execute_script('$("[name=FieldName][value=Other]").trigger("click");')
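If this comes up in more than one step, the same workaround (reusing the FieldName/Other names from the examples above) can be wrapped in a small helper:

# Check the checkbox identified by its name/value pair.
def check_option(field_name, value)
  page.find(:css, "[name=#{field_name}][value=#{value}]").set(true)
end

check_option('FieldName', 'Other')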

I have an XPath written in Selenium RC and now I want to write the same XPath in Selenium WebDriver; below is the XPath

I have an XPath written in Selenium RC and now I want to write the same XPath in Selenium WebDriver. Below is the XPath:
"//table[#id='lgnLogin']/tbody/tr/td/table/tbody/tr[4]/td"
By using this XPath, I am capturing the error message displayed in my application, like "Please check your Password".
Now how can I write it in WebDriver? I have tried different ways, but none worked out. This is what I did in WebDriver:
String msg = driver.findElement(By.xpath("//*[@td='error2']")).toString();
Please help me out on this...
The XPath hasn't changed from Selenium RC to WebDriver.
If you were able to use the same XPath expression before, it should work in WebDriver too (in 99% of cases, as usual).
The code below should work but it's pretty hard to answer without seeing your HTML.
String msg = driver.findElement(By.xpath("//table[@id='lgnLogin']/tbody/tr/td/table/tbody/tr[4]/td")).getText();
//*[@td='error2'] - this expression looks bad, as it means "any element with an attribute named 'td' and value 'error2'". I suppose that you don't have 'td' attributes in your HTML.
Finally, I wouldn't recommend using such long XPath expressions as in your example. They might break due to any minor change in the layout. It's better to use specific attributes rather than a long hierarchy.
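For example, if the error cell carried a distinguishing attribute of its own (the class name here is a made-up illustration), a locator like //td[@class='error-message'] would survive layout changes that would break the long /tbody/tr[4]/td chain.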
A good way would be to start with the basics: http://seleniumhq.org/docs/03_webdriver.html#introducing-the-selenium-webdriver-api-by-example
What findElement does is just locate the element that you want to interact with. After you locate the element, you need to perform the action you want on it. In your case, you want the text inside the element, so you can look at the functions of the WebElement interface and use the appropriate method, in your case getText.

extract xpath

I want to retrieve the XPath of an attribute (for example, the "brand" of a product on a retailer website).
One way of doing it is using Firefox add-ons like XPather or XPath Checker, opening up the website in Firefox, and right-clicking the desired attribute I am interested in. This is OK, but I want to capture this information for many attributes, and right-clicking each and every attribute may be time-consuming. The other problem I have is that some attributes I may be interested in exist for one product, while other attributes exist for some other product, so I will have to go to that product and do it manually again.
Is there an automated or programmatic way of retrieving the XPath of the desired attributes from a website, rather than having to do this manually?
Note that not all websites serve valid XML that you can run XPath on.
That said, you should check out some HTML parsers that will allow you to use XPath on HTML even when it is not valid XML.
Since you did not specify the technology you are working with, I'll suggest the .NET HTML Agility Pack; if you need others, search for questions dealing with this here on SO.
The solution I use for this kind of thing is to write an XPath something like this:
//*[text()="Brand"]/following-sibling::*
//*[text()="Color"]/following-sibling::*
//*[text()="Size"]/following-sibling::*
//*[text()="Material"]/following-sibling::*
It works by finding all elements (labels) with the text you want and then looking at the next sibling in the HTML. Without a specific URL to look at, I can't help any further.
This is a generalised version; you can make more specific versions by replacing the asterisks with tag types, and you can navigate differently by replacing the following-sibling axis with something else.
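To do this programmatically rather than by hand, here is a minimal sketch using Nokogiri in Ruby; the HTML fragment and the label names are made up for illustration:

require 'nokogiri'

# Made-up product markup with label/value pairs as sibling elements.
html = Nokogiri::HTML(<<~HTML)
  <dl>
    <dt>Brand</dt><dd>Acme</dd>
    <dt>Color</dt><dd>Red</dd>
  </dl>
HTML

%w[Brand Color].each do |label|
  value = html.at_xpath(%{//*[text()="#{label}"]/following-sibling::*})
  puts "#{label}: #{value.text}" if value
end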
I use XPaths in import.io to make APIs for this kind of thing all the time. It's just a matter of finding an XPath that's generic enough to find the HTML no matter where it is on the page, but specific enough to get the right data.

What algorithms could I use to identify content on a web page

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
This is my personal favorite: VIPS: a Vision-based Page Segmentation Algorithm
First, if you need to parse a web page, I would use HTMLAgilityPack to transform it into XML. It will speed everything up and will enable you, using a simple XPath, to go directly to the BODY.
After that, you can iterate over all the divs (you can get all the DIV elements in a list from the Agility Pack) and get whatever you want.
There's a simple technique to do this, based on analysing how "noisy" the HTML is, i.e., what the ratio of markup to displayed text is throughout an HTML page. The Easy Way to Extract Useful Text from Arbitrary HTML describes this technique, giving some Python code to illustrate it.
Cf. also the HTML::ContentExtractor Perl module, which implements this idea. If you wanted to use this, it would make sense to clean the HTML first, using BeautifulSoup.
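As a rough sketch of the markup-density idea in Ruby with Nokogiri (the candidate tags and the scoring are my guesses, not the article's exact algorithm):

require 'nokogiri'

doc = Nokogiri::HTML(File.read('page.html'))  # input path is made up
doc.css('script, style').remove               # drop text that is never displayed

# Score each block-level candidate by visible text relative to total markup.
candidates = doc.css('div, article, section, td').map do |node|
  text  = node.text.gsub(/\s+/, ' ').strip
  ratio = text.length.to_f / (node.to_html.length + 1)
  [node, text.length, ratio]
end

# Favour blocks that are both long and text-dense; the weighting is a guess.
best = candidates.max_by { |_, len, ratio| len * ratio }
puts best.first.path if best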
I would recommend Vit Baisa's thesis on Web Content Cleaning; I think he has some code too, but I can't find a link for it. There is also a discussion of the very same problem on the natural language processing LingPipe blog.
