How can I get Mechanize objects from Mechanize::Page's search method? - ruby

I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But using Mechanize::Page#search returns Nokogiri::XML::Elements which I can't use to fill and submit forms etc.
I'd really like to use pure CSS selectors, although matching on classes seems pretty straightforward with the various _with methods too. However, matching things like :not(.class) with them is pretty verbose compared to simply using CSS selectors, and I have no idea how to match on element hierarchy.
Is there a way to convert Nokogiri elements back to Mechanize objects or even better get them straight from the search method?

As stated in this answer, you can simply construct a new Mechanize::Form object from the Nokogiri::XML::Element you retrieved via Mechanize::Page#search or Mechanize::Page#at:
require 'mechanize'
a = Mechanize.new
page = a.get 'https://stackoverflow.com/'
# Get the search form via ID as a Nokogiri::XML::Element
form = page.at '#search'
# Convert it back to a Mechanize::Form object
form = Mechanize::Form.new form, a, page
# Use it!
form.q = 'Foobar'
result = form.submit
Note: You have to pass the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise it would just be a Mechanize::Form object without context.
There seems to be no central utility function to convert Nokogiri::XML::Elements to Mechanize elements; instead, the conversions are implemented where they are needed. Consequently, a method that searches the document by CSS or XPath and returns Mechanize elements where applicable would require a pretty big switch-case on the node type. Not exactly what I imagined.

Related

Besides Xpath, how do I capture custom attributes

I'm trying to capture the date of an object but the object does not have id or name attributes. Also there are multiple items with the class of "ng-binding" so I don't think I can use that.
Is there a way for me to capture values from objects using custom attributes? Meaning is there a way for me in Ruby to say
varObject = find_element(:ng-binding-html, "$ctrl.app.publishedDate")
The object I'm trying to capture is
<span ng-bind-html="$ctrl.app.publishedDate" class="ng-binding">11/20/2017</span>
I took a look at an older post that seems to describe nearly the same issue, but I wasn't sure it applied:
Selenium webdriver : how to find the element in DOM based on custom attribute
Thanks,
Scott
Is there a reason why you don't want to use XPath? The Ruby Selenium WebDriver find_element method that I think you refer to in your question accepts XPaths as an argument.
You could use the following XPath:
"//span[@ng-bind-html='$ctrl.app.publishedDate']"
Your Ruby code could be:
varObject = @driver.find_element(:xpath, "//span[@ng-bind-html='$ctrl.app.publishedDate']")
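The attribute test in this XPath can be tried without a browser. The following is a minimal sketch that uses Ruby's bundled REXML library as a stand-in for the live page; the wrapping div and the second span are made-up sample markup, and only the target span comes from the question.

```ruby
require 'rexml/document'

# Sample markup: two spans share the class, only one has the custom attribute.
html = <<~'HTML'
  <div>
    <span class="ng-binding">something else</span>
    <span ng-bind-html="$ctrl.app.publishedDate" class="ng-binding">11/20/2017</span>
  </div>
HTML

doc = REXML::Document.new(html)

# @name selects an attribute, so the predicate matches on the custom attribute's value.
xpath = "//span[@ng-bind-html='$ctrl.app.publishedDate']"
span = REXML::XPath.first(doc, xpath)

puts span.text
```

The same XPath string is what gets passed to find_element(:xpath, ...) in Selenium; there the text would be read with varObject.text.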

Best way to output the content i load from ajax

I want to load objects with ajax and then for every object make with options.
Which is better when I load content from the server via ajax:
put everything (including HTML tags) in a string variable named output, building it with += in a loop over the objects, then append it once
append the output for every object I load to a div
or some better solution
If there is a better solution, can anyone help me?
Well, that really depends on the technology stack you are using.
But in essence, what you describe is exactly what happens.
You have an array of products coming back from an ajax request.
You want to clear out the contents of the area where you are inserting them.
Then you iterate over your collection of items and, using some kind of template, generate the HTML for each one. The simplest template in plain JavaScript is string concatenation.
Then you concatenate them all and insert the result as innerHTML inside your container.
It may be ugly at first, but as you learn more you will improve.

How to load the jqgrid in a selector with context

In general we call jqGrid as in $("#grid_loc").jqGrid({});
But I want to specify a context, like $("#grid_loc",context).jqGrid({}). This is not working. Can somebody help with this?
I have to load server-side data using the url option.
In fact, I ran into this because I have tabs on my page.
In each tab I have to have a jqGrid: not different grids, but the same grid with different data.
Here I am getting the tab context using var tabset = $("div.tabset");
newdivid = $("div[class*='active_tab']",tabset).attr("id");
var newmenudivid = $("#"+newdivid);
And
the grid code as
$("#grid_workflow", newmenudivid).jqGrid({....});
I have been trying to find a way to do this. You can find some of my efforts in the comments section of the link
how to develop same jqgrid in multiple tabs
I was successful with overwriting the id for the same purpose, but that is not a good way. So I am forced to take another approach, i.e. a context.
I suppose you misunderstand some important things about the id attribute. The most important is that all elements on the page that have an id attribute must have a unique value for it. In other words, ids have to be unique over the whole HTML page.
So if you need to create, for example, three grids inside three tabs, you have to define a different id attribute for every grid, for example grid_workflow1, grid_workflow2, grid_workflow3. If you create the tabs and grids dynamically, you can keep a variable in the outer scope (for example a global variable) and increment it. You can construct the id of the grid from a prefix (like "grid_workflow") and the value of the variable. In this way you can create multiple grids with unique ids. Many JavaScript libraries generate unique id attributes this way. If you want, you can use the $.jgrid.randId() method, which returns unique strings that can be used as ids.
Regarding the syntax $("#grid_workflow", newmenudivid), you should understand one important thing: I would recommend never using it. It could help only if you have duplicate ids. In all other cases it will work exactly like $("#grid_workflow"), but more slowly. The reason is easy to understand. The web browser internally holds a list of all ids on the page, and if you use the getElementById method directly or indirectly (as in $("#grid_workflow")), finding the element with the required id is like searching an index in a database, so you get the best performance. If you use $("#grid_workflow", newmenudivid), you don't allow the web browser to use the index of elements by id, so the context forces a slow search through all child elements of newmenudivid. You should therefore avoid using a jQuery context with id selectors.

Inject JS variable into a haml template

I'm trying to use HTML5 localStorage with a Ruby haml template and need to be able to get the value of localStorage.getItem('myItem') to pass to a Java applet (code stripped down):
- content_box("MyBox") do
  %object{:classid => "clsid:xxx"}
    %param{:name => "myItem", :value => "javascript:localStorage.getItem('myItem')"}
    %comment
      %EMBED{:myItem => "javascript:localStorage.getItem('myItem')"}
        %noembed
Is there a good way to do this? I can do something like:
:javascript
  document.write("<param name='myItem' value='" + localStorage.getItem('myItem') + "'>");
but that's so ugly!
Note that this is an object I'm embedding, and need the value to be present before document_ready; I cannot select the object and append the value to it on document_ready. The only other way I can think of is to do an ajax submission to make the value a Ruby variable ahead of time, but that's really unnecessary.
Thanks!
Sometimes the only way that works is ugly.
If your data is stored on the client, creating a server request/page/action just to get the data and pass it straight back to the client in a different form is unnecessary, and arguably uglier.
Go with using javascript to add the <param> tag.
If the object depends on JavaScript anyway, you may as well just write the whole element with JavaScript instead of just the param. Then you can do it on document ready.

Use of Mechanize

I want to get responses from websites that take a simple input, which is also reflected in a parameter of the URL. Is it better to simply get the result using conventional methods, for example OpenURI.open_uri(...) with some parameter set, or is it better to use mechanize, extract the form, and get the result through submit?
The mechanize page gives an example of extracting a form and submitting it to get the search result from Google search. However, this much can be done simply as OpenURI.open_uri("http://www.google.com/search?q=...").read. Is there any reason I should try to use one way or the other?
There are lots of sites where it turns out to be easiest to use mechanize. If you need to log in and set a cookie before accessing the data, then mechanize is a simple way of doing this. Similarly, if there are lots of hidden fields that need to be matched (such as a CSRF token), then fetching the page using mechanize and submitting it with the data filled out is often a more foolproof method than crafting the URL yourself.
If it is a simple URI, like Google's search pages, then manually constructing it may be simpler.
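For the simple case, a minimal sketch of constructing the URI by hand with Ruby's standard uri library. The helper name search_url is hypothetical, and the host and path are taken from the example in the question; URI.encode_www_form takes care of escaping the input.

```ruby
require 'uri'

# Hypothetical helper: build the search URL with a properly escaped parameter.
def search_url(query)
  URI::HTTP.build(
    host: 'www.google.com',
    path: '/search',
    query: URI.encode_www_form(q: query)
  )
end

url = search_url('ruby mechanize & forms')
puts url
```

The resulting URI can then be fetched with open-uri, for example URI.open(url).read. As soon as cookies, logins, or hidden fields are involved, this approach gets brittle and mechanize is the safer choice.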
