Choosing form fields by label using Mechanize? - ruby

I originally wrote 800 lines to do this, site by site. However, on talking to a couple of people, it seems like my code is way longer than it needs to be.
So, I've got an idea of what you'd do in Python, with a particular Egg, but I'm working with Ruby. So, does anyone have any idea how to enter details in a form field, based on what the label for it is, rather than the id/name? Using Mechanize.

Let's say your html looks like:
<label>Foo</label>
<input name="foo_field">
You can get the name of the input following a specific label:
name = page.at('label[text()="Foo"] ~ *[name]')[:name]
#=> "foo_field"
and use that to set the form value
form[name] = 'bar'

How to properly scrape filtered content using an XPath query in Google Sheets?

So, this is about content from a website that I want to pull into my Google Sheets, but I'm having difficulty understanding the class of the content.
target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get. Picture 1: The part which I want to scrape
And this is what the code looks like in inspector. Picture 2: The code shown in inspector
The target is inside a span element, but the span's attributes look very complicated to me, so I tried to simplify it using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span")
Picture 3: List is shown when putting the code
After some tries, I was able to get the right target, but it confuses me. I'm using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span[@class='last original'][1]")
Picture 4: The right target is shown when the xpath query is more specific
As you can see in the 2nd picture, 'last original' is not really the full name of the class. When I put 'last original ng-binding' instead, it gave me an error saying the imported content is empty.
So, please correct me if my code is wrong. Did it work by accident, and is there another, correct way?
How about this answer?
Modified formula 1:
When the class name can be either last original or last original ng-binding, how about the following xpath and formula?
=IMPORTXML(A1,"//span[contains(@class,'last original')][1]")
In this case, the URL https://www.cnbc.com/quotes/?symbol=XAU= is put in the cell "A1".
Here, //span[contains(@class,'last original')][1] is used as the xpath. It retrieves the value of the span whose class name includes last original, so it matches both last original and last original ng-binding.
Modified formula 2:
As another xpath, how about the following xpath and formula?
=IMPORTXML(A1,"//meta[@itemprop='price']/@content")
It seems that the value is also included in the metadata, so this sample retrieves the value from the metadata.
Reference:
IMPORTXML
To complete @Tanaike's answer, two alternatives:
=IMPORTXML(B2;"//span[@class='year high']")
"Year high" seems to always equal the current stock index value.
Or, with the value retrieved from the script element:
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note: I'm based in Europe, so my formulas use ; as the argument separator. Depending on your locale, you may need to replace ; with , in the formulas.

Ruby Dropdown select based on first character

I have a dropdown of vehicle makes that I want my users to start selecting as they type. First character typed should find the first character in the makes. The problem is that it searches anywhere in the make for a character and does not start at the first character like my users would like. For example... if you type an "r" you get: Alfa Romeo, Aston Martin, Chevrolet, Chrysler, etc... well before you get a Renault.
I create my list from the database. My haml looks like this:
.field-row
= render partial:'/makes/make_select', locals:{id:'make_id'}
That calls this _make_select.html.haml
= collection_select :vehicle, id, Make.all.order(:name), :id, :name, {prompt:true}, {title:'Select Make', class:'make-select', 'data-allow-empty' => 'no'}
I can't seem to find any docs on Ruby that show me the valid options for collection_select. Maybe there is an option that allows this?
I have read that I might need to use jQuery to accomplish this. Was just trying to figure out if there might be an easier way with just a simple option in the haml.
Let me know if there is anything else you would like to see.
thanks!
You could add the logic to your controller and filter with a SQL query. Something like this, depending on the database you're using:
@makes = Make.where('name LIKE ?', "#{params[:character]}%")
Check out the MySQL docs on pattern matching:
https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
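To make the difference between the two behaviours concrete, here is a minimal plain-Ruby sketch. The list of makes and the two helper names are made up for illustration; in the real app the filtering would be done by the database with the LIKE query above.

```ruby
# Hypothetical sample data; in the app this would come from the makes table.
MAKES = ['Alfa Romeo', 'Aston Martin', 'Chevrolet', 'Chrysler', 'Renault', 'Rover'].freeze

# Matches the typed text anywhere in the name
# (the behaviour described in the question).
def makes_containing(typed, makes = MAKES)
  makes.select { |name| name.downcase.include?(typed.downcase) }
end

# Matches only names that begin with the typed text,
# which is what a pattern like 'r%' asks the database for.
def makes_starting_with(typed, makes = MAKES)
  makes.select { |name| name.downcase.start_with?(typed.downcase) }
end

makes_starting_with('r') # => ["Renault", "Rover"]
```

With the substring version, typing "r" returns every make in the sample list, because each name contains an "r" somewhere.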
You should not put your model query inside a view; it belongs in the controller.
In your case above, I suggest you use https://github.com/argerim/select2-rails. It is very powerful and already has what you need.

Having trouble parsing these data in watir-webdriver

See hierarchy below:
All I need here is "Company Title", "Company Owner", "Company Owner Title", "Street Number Street Name", and "City, State Zipcode".
I tried b.div.span.bs, but that didn't work (bs because there are multiple blocks I'm gathering data from). I also thought I'd just try something like b.tds.split('<br>') and then replace all instances of tags and somehow delete empty array cells, but I found that each block is different, so the data don't align, i.e., Company Title might be in cell 1 for the first array, but then if Company Title isn't present (for the second block) then cell 1 would be Company Owner, which is conflicting... Anyway, just trying to find a clever way to get these data. Thank you.
Here is the actual HTML; however you must first click "View All".
You can split out everything inside the <div> and then split that by <br>. The first part is Company Title (if it exists) and then Company Owner is last/second.
The rest is ... trickier. Some are pretty straightforward in that Fax and Member Since have labels, so those are easy. The <a> is easy.
You could probably test for the phone number with a regex and then back up from there. If the one before the phone number isn't <a> then it's city, state zip and the one before that is the address. If one exists before that, it's the Company Owner Title.
Everything after the phone number in your examples has a label, so those are easy.
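A rough sketch of the "find the phone number, then back up" idea in plain Ruby. The phone pattern, the method name parse_block and the sample fragments are all assumptions for illustration; the real page may need a different pattern and extra handling for the <a> case.

```ruby
# Loose US-style phone pattern (an assumption; adjust to the real data).
PHONE = /\A\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\z/

# parts: the text fragments of one block, already split on <br> and stripped.
# Returns an empty hash when no phone number (or not enough context) is found.
def parse_block(parts)
  phone_index = parts.index { |part| part.match?(PHONE) }
  return {} if phone_index.nil? || phone_index < 2

  {
    phone:          parts[phone_index],
    city_state_zip: parts[phone_index - 1], # fragment just before the phone
    address:        parts[phone_index - 2]  # fragment before that
  }
end

# Hypothetical fragments from one block.
sample = ['Acme Corp', 'Jane Doe', 'President', '123 Main St',
          'Springfield, IL 62701', '(555) 123-4567', 'Fax: (555) 123-4568']
result = parse_block(sample)
```

Because the fields are located relative to the phone number rather than by fixed position, a block that is missing the Company Title still yields the right address and city.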
I'm not sure about all of your use cases, but often for pages where the DOM is not very helpful I just get the text and parse it with Ruby:
browser.td.text.split("\n").reject(&:empty?)
This doesn't directly answer the question, but it shows how I'd go about doing this using Nokogiri, which is the standard HTML/XML parser for Ruby:
require 'nokogiri'
doc = Nokogiri::HTML('<td><div></div><br>a<br>b<br>c</td>')
doc is Nokogiri's internal representation of the document.
We use landmarks in the markup to navigate and find things we want. In this case <div> is a good starting point:
doc.at('div').next_sibling.next_sibling.text # => "a"
next_sibling is how we tell Nokogiri to look at the next node. In this case it's stepping past the first <br> and looking at the "a" text node.
That'd result in unworkable code though, so there's a better way to go:
doc.search('td br').to_html # => "<br><br><br>"
That shows we can find all the <br> tags inside the <td>, so we just have to iterate over them and use them as our landmarks:
doc.search('td br').map{ |br| br.next_sibling.text } # => ["a", "b", "c"]

Map two element IDs to one page-object variable

I am using the page-object gem with Watir. During testing I found that I have a field that has the same contents and shows in the same location, but has one of two separate unique IDs. Which one it gets depends on how you arrived at the page.
I tried using XPath:
select_list(:selectionSpecial, :xpath => "//select[@id='t_id9' OR @id='t_id7']")
But was met with a script error.
They are static IDs, but I want to force them into one variable since that would allow me to use the "populate_page_with" feature.
I have a long winded way currently, but I am fishing for a more efficient way that works with the page object Features.
Does anyone know of a way to do this?
Your approach of using xpath can work. The problem is the syntax errors in the xpath selector. It should be:
"//select[@id='t_id9' or @id='t_id7']"
Note:
The start should be a // rather than a \
Using or is case-sensitive; it has to be lower case
There was also a missing closing ' for the first id attribute
Personally, I find css and xpath selectors harder to use. I would go with the id locator with a regex. The following gives the same results, but some will find it easier to read.
select_list(:selectionSpecial, :id => /^t_id(7|9)$/)
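As a quick check of what that regex accepts, here is a small plain-Ruby sketch. The candidate ids other than t_id7 and t_id9 are made up for illustration.

```ruby
# Same pattern as in the locator above.
ID_PATTERN = /^t_id(7|9)$/

# Hypothetical ids; only t_id7 and t_id9 come from the question.
candidates = %w[t_id7 t_id9 t_id8 t_id79 xt_id7]

matching = candidates.grep(ID_PATTERN)
# => ["t_id7", "t_id9"]

# In Ruby, ^ and $ anchor at line boundaries. \A and \z anchor at the
# start and end of the whole string, which is stricter.
STRICT_PATTERN = /\At_id(7|9)\z/
```

For single-line id values both patterns behave the same, so the stricter form is optional here.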

HtmlUnit getByXpath returns null

I am coding with Groovy; however, I don't believe it's a language-specific set of questions.
I actually have two questions
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling javascript creates a mess in my cmd prompt.
Second Question
I also want to get the image but am having trouble, because when I attempt to get the XPath (via Firebug) it shows up as: //*[@id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPATH was off by one in the predicate filter for the 4th div of the body; it should be the 3rd div. It appears the HTML for the site can and does change from when you originally snagged the XPATH using Firebug. You may need to adjust your XPATH to accommodate potential changes and be less sensitive to differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPATH that you listed will work. It may look odd/short (and may not be the most efficient), but // starts at the root node and looks throughout every node in the tree, * matches any element (including the img), and the [] predicate filter restricts it to those that have an id attribute whose value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]
I had the same problem. I solved it when I noticed the iframe tags on the page. Try calling
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position of the frame in the iframe collection. It worked for me!
Thanks a lot.
