I am writing a finder for pages with various, but finite, ids:
@field = ['name1', 'name2']
def fieldfind
  @field.each do |elem|
    out = elem if page.has_css?(elem)
  end
end
HTML
<input type='text' id = 'name1'>
For whatever reason, I cannot find name1. I tried find_field? and elem.to_s, but to no avail. Any ideas?
As Baldrick mentioned, the css locator is not right. However, after correcting that, you would still have a problem with the @field.each. each returns the whole @field array, not an element or the css of the field that exists.
If you want an element that matches one of the css locators in @field, try:
@field = ['#name2', '#name1']
def fieldfind
  matching_css = @field.find { |elem| page.has_css?(elem) }
  page.find(matching_css)
end
Or if you just want the matching css-locator:
@field = ['#name2', '#name1']
def fieldfind
  @field.find { |elem| page.has_css?(elem) }
end
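The difference between each and find can be checked in plain Ruby, without a browser. This is only a minimal sketch: the present_on_page list and the exists lambda are hypothetical stand-ins for page.has_css? and are not part of Capybara.

```ruby
# Hypothetical stand-in for the page: only '#name1' "exists" on it.
present_on_page = ['#name1']
exists = ->(selector) { present_on_page.include?(selector) }

candidates = ['#name2', '#name1']

# each always returns the receiver, whatever the block evaluates to.
each_result = candidates.each { |sel| sel if exists.call(sel) }

# find returns the first element for which the block is truthy (or nil).
find_result = candidates.find { |sel| exists.call(sel) }

p each_result
p find_result
```

Because each hands back the array it was called on, the value computed inside the block is discarded unless it is stored somewhere outside the block.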
You are missing the #, the css selector prefix for finding an element by id. It should be:
@field = ['#name1', '#name2']
...
You can use:
page.find(:css, "input[id=name1]")
If that works, go ahead and dynamically add the variable to your code.
The reason I'm suggesting you try to find an element first (rather than just improving your existing each block) is that your session object may be pointed at the wrong window or frame.
I have a string which looks like the following:
string = " <SET-TOPIC>INITIATE</SET-TOPIC>
<SETPROFILE>
<PROFILE-KEY>predicates_live</PROFILE-KEY>
<PROFILE-VALUE>yes</PROFILE-VALUE>
</SETPROFILE>
<think>
<set><name>first_time_initiate</name>yes</set>
</think>
<SETPROFILE>
<PROFILE-KEY>first_time_initiate</PROFILE-KEY>
<PROFILE-VALUE>YES</PROFILE-VALUE>
</SETPROFILE>"
My objective is to be able to read out each top-level tag that is in caps from the parse. I use a case statement to evaluate what the top-level key is, such as <SETPROFILE> (there can be lots of different values), and then run a method that does different things with the contents of the tag.
What this means is I need to be able to know very easily:
top_level_keys = ['SET-TOPIC', 'SET-PROFILE', 'SET-PROFILE']
and, when I pass in the key, know the full value:
parsed[0].value = {:PROFILE-KEY => predicates_live, :PROFILE-VALUE => yes}
parsed[0].key = ['SET-TOPIC']
I currently parse the whole string as follows:
doc = Nokogiri::XML::DocumentFragment.parse(string)
parsed = doc.search('*').each_with_object({}){ |n, h|
h[n.name] = n.text
}
As a result, I only parse and know of the second tag. The values from the first tag do not show up in the parsed variable.
I have control over what the tags are, if that helps.
But I need to be able to parse and know the contents of both tags as a result of the parse, because I need to apply a method for each instance of the node.
Note: the string also contains just regular text, both before, in between, and after the XML-like tags.
It depends on what you are trying to achieve. The problem is that you are overwriting existing hash keys with new values. The easiest way to collect all the values is to store them in an array:
parsed = doc.search('*').each_with_object({}) do |n, h|
  # h[n.name] = n.text :: removed because it overwrites earlier values
  (h[n.name] ||= []) << n.text
end
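The overwriting can be reproduced without Nokogiri. In this sketch the pairs array is a hypothetical stand-in for the (name, text) of each parsed node; the texts are made up for illustration.

```ruby
# Hypothetical (name, text) pairs standing in for the parsed nodes.
pairs = [
  ['SETPROFILE', 'predicates_live yes'],
  ['think', 'first_time_initiate yes'],
  ['SETPROFILE', 'first_time_initiate YES']
]

# Plain assignment: a later node with the same name replaces the earlier one.
overwritten = pairs.each_with_object({}) { |(name, text), h| h[name] = text }

# Array per key: every node with the same name is kept, in document order.
collected = pairs.each_with_object({}) { |(name, text), h| (h[name] ||= []) << text }

p overwritten['SETPROFILE']
p collected['SETPROFILE']
```

With the array version, each entry under a key can then be passed to the method that handles that tag.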
I have seen some examples of how to do this in Javascript or python, but am looking for how to find the text of the for attribute on a label. e.g.
<label for="thisIsTheTextNeeded">LabelText</label>
<input type="checkbox" id="thisIsTheTextNeeded">
We want to pick up the text from the for attribute on the label element. Then use that text as the id to find the checkbox.
The .NET solution might look like:
textWeNeed = selenium.getAttribute("//label[text()='LabelText']/@for");
I tried this in Ruby:
textWeNeed = @browser.find_element("xpath//label[text()='LabelText']/@for")
and get the error:
expected "xpath//label[text()=input_value]/@for":String to respond to #shift (ArgumentError)
Any ideas?
Here is how I fixed it. Thanks for all the help!
element = @browser.find_element(:xpath => "//label[text()='#{input_value}']")
attrValue = element.attribute('for')
listElement = @browser.find_element(:id => "#{attrValue}")
You should use the attribute method to get the value of the attribute. See the doc
element = @browser.find_element(:xpath, "//label[text()='LabelText']")
attrValue = element.attribute("for")
According to OP's comment and provided html I think the element needs some explicit wait
wait = Selenium::WebDriver::Wait.new(:timeout => 10) # seconds
element = wait.until { @browser.find_element(:xpath => "//label[contains(text(),'My list name')]") }
attrValue = element.attribute("for")
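The find_element function requires a Hash (or a locator type followed by a value) to search with xpath. The xpath must also select an element rather than an attribute, so read the attribute from the element afterwards.
The correct way is:
textWeNeed = @browser.find_element(xpath: "//label[text()='LabelText']").attribute('for')
The following (your code) is wrong:
textWeNeed = @browser.find_element("xpath//label[text()='LabelText']/@for")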
I found that in Ruby, when looking up by a String and presented with the error "String to respond to #shift", it is best to use find instead of find_element:
textWeNeed = @browser.find("xpath//label[text()='LabelText']/@for")
How do I create an object if one is not found? This is the query I was running:
@event_object = @event_entry.event_objects.find_all_by_plantype('dog')
and I was trying this:
@event_object = EventObject.new unless @event_entry.event_objects.find_all_by_plantype('dog')
but that does not seem to work. I know I'm missing something very simple like normal :( Thanks for any help!!! :)
find_all style methods return an array of matching records. That is an empty array if no matching records are found. And an empty array is truthy. Which means:
arr = []
if arr
  puts 'arr is considered truthy!' # this line will execute
end
Also, the dynamic finder methods (like find_by_whatever) are officially deprecated, so you shouldn't be using them.
You probably want something more like:
@event_object = @event_entry.event_objects.where(plantype: 'dog').first || EventObject.new
But you can also configure the event object better, since you obviously want it to belong to @event_entry.
@event_object = @event_entry.event_objects.where(plantype: 'dog').first
@event_object ||= @event_entry.event_objects.build(plantype: 'dog')
In this last example, we try to find an existing object by getting an array of matching records and asking for the first item. If there are no items, @event_object will be nil.
Then we use the ||= operator, which says "assign the value on the right if this is currently set to a falsy value". And nil is falsy. So if it's nil, we can build the object from the association it should belong to. And we can preset its attributes while we are at it.
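The same reasoning can be tried in plain Ruby, without ActiveRecord. This is a sketch under assumptions: the empty matches array stands in for the result of a lookup that found nothing, and the hash is a hypothetical placeholder for a newly built record.

```ruby
# What a find_all-style lookup returns when nothing matches.
matches = []

# An empty array is truthy, so the `unless` branch never runs.
fallback_used = false
fallback_used = true unless matches

# Array#first returns nil for an empty array, so ||= can supply a default.
record = matches.first
record ||= { plantype: 'dog' } # hypothetical stand-in for a built record

p fallback_used
p record
```

This is why the original `EventObject.new unless ...` line never created anything, while asking for the first item and then using ||= does.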
Why not use built-in query methods like find_or_create_by or find_or_initialize_by?
@event_object = @event_entry.event_objects.find_or_create_by(plantype: 'dog')
This will find an event_object belonging to @event_entry with plantype = 'dog'; if one does not exist, it will create one instead.
find_or_initialize_by is probably more what you want, as it will leave @event_object in an unsaved state with just the association and plantype set:
@event_object = @event_entry.event_objects.find_or_initialize_by(plantype: 'dog')
This assumes you are looking for a single event_object, as it will return the first one it finds with plantype = 'dog'. If more than one event_object can have plantype = 'dog' within the @event_entry scope, then this might not be the best solution, but it seems to fit your description.
I'm trying to scrape the cell values from an HTML table. Randomly, some of these cells are empty, and I can't guess which ones with any reliability.
Is there a way to fill a default value in for Nokogiri when it comes across an empty cell?
Thanks for any advice you can provide. Here's my code:
def scrape_stats
  stats = []
  (2002..2012).to_a.each do |year|
    url = "website/#{year}"
    doc = Nokogiri::HTML(open(url))
    rows = doc.at_css("body tbody").text.split(" ")
    (rows.count / 25).times do |i| # there are 25 columns per row
      stats << rows.shift(25)
    end
  end
end
It sounds like you want something like:
doc.search('td:empty').each{|n| n.content = 'default value'}
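A likely reason the empty cells cause trouble in the question's code is that the table text is split on whitespace, which drops empty cells and shifts every later column. The plain-Ruby sketch below illustrates this; the cells array and the 'n/a' default are hypothetical stand-ins for the text of each td in one row.

```ruby
# Hypothetical cell texts for one table row; the middle cell is empty.
cells = ['12', '', '7']

# Joining and splitting on whitespace silently drops the empty cell.
flattened = cells.join(' ').split(' ')

# Mapping cell by cell keeps the column count and fills in a default.
filled = cells.map { |text| text.strip.empty? ? 'n/a' : text }

p flattened
p filled
```

Setting a default on the empty cells before reading the text, or reading the cells one at a time, keeps each value in its own column.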
This would basically involve using the Nokogiri::XML::Node#add_child method (or the shorter version, Nokogiri::XML::Node#<<) to add a new child node containing the text you want to add to the empty cell.
See this question for an example:
How to add child nodes in NodeSet using Nokogiri
I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.
For example, given:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
I want to get a hash like:
{ 'data-age' => '50', 'data-location' => 'London' }
I was originally hoping use a wildcard as part of my CSS selector, e.g.
Nokogiri(html).css('span[@data-*]').size
but it seems that isn't supported.
Option 1: Grab all data elements
If all you need is to list all the page's data elements, here's a one-liner:
Hash[doc.xpath("//span/@*[starts-with(name(), 'data-')]").map{|e| [e.name,e.value]}]
Output:
{"data-age"=>"50", "data-location"=>"London"}
Option 2: Group results by tag
If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:
tags = []
datasets = "@*[starts-with(name(), 'data-')]"
# If you want any element, replace "span" with "*"
doc.xpath("//span[#{datasets}]").each do |tag|
  tags << Hash[tag.xpath(datasets).map{|a| [a.name,a.value]}]
end
Then tags is an array of hashes, one per tag, each mapping that tag's data attribute names to their values.
Option 3: Behavior like the jQuery datasets plugin
If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.
module Nokogiri
  module XML
    class Node
      def dataset
        Hash[self.xpath("@*[starts-with(name(), 'data-')]").map{|a| [a.name,a.value]}]
      end
    end
  end
end
Then you can find the dataset for a single element:
doc.at_css("span").dataset
Or get the dataset for a group of elements:
doc.css("span").map(&:dataset)
Example:
The following is the behavior of the dataset method above. Given the following lines in the HTML:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
<span data-age="40" data-location="Oxford" class="highlight">Jim Foggs</span>
The output would be:
[
{"data-location"=>"London", "data-age"=>"50"},
{"data-location"=>"Oxford", "data-age"=>"40"}
]
You can do this with a bit of xpath:
doc = Nokogiri.HTML(html)
data_attrs = doc.xpath "//span/@*[starts-with(name(), 'data-')]"
This gets all the attributes of span elements that start with 'data-'. (You might want to do this in two steps: first get all the elements you're interested in, then extract the data attributes from each in turn.)
Continuing the example (using the span in your question):
hash = data_attrs.each_with_object({}) do |n, hsh|
hsh[n.name] = n.value
end
puts hash
produces:
{"data-age"=>"50", "data-location"=>"London"}
Try looping through element.attributes while ignoring any attribute that does not start with data-.
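The filtering step might look like the sketch below. It assumes a plain Hash of attribute names to String values as a stand-in for the node's attributes; with Nokogiri, element.attributes maps names to attribute objects, so the values would need .value called on them.

```ruby
# Hypothetical attribute name/value pairs for the span in the question.
attributes = {
  'data-age' => '50',
  'data-location' => 'London',
  'class' => 'highlight'
}

# Keep only the entries whose name starts with 'data-'.
data_attrs = attributes.select { |name, _value| name.start_with?('data-') }

p data_attrs
```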
The Node#css docs mention a way to attach a custom pseudo-selector. This might look like the following for selecting nodes with attributes starting with 'data-':
Nokogiri(html).css('span:regex_attrs("^data-.*")', Class.new {
  def regex_attrs node_set, regex
    node_set.find_all { |node| node.attributes.keys.any? {|k| k =~ /#{regex}/ } }
  end
}.new)