Couldn't create a hash from option values and <div> text - ruby

HTML Code:
<div id="empid" title="Please first select a list to filter!"><input value="5418630" name="candidateprsonIds" type="checkbox">foo <input value="6360899" name="candidateprsonIds" type="checkbox"> bar gui<input value="9556609" name="candidateprsonIds" type="checkbox"> bab </div>
Using selenium-webdriver, I would like to get the following:
[[5418630,foo],[6360899,bar gui],[9556609,bab]]
Can it be done?
I tried the below code:
driver.find_elements(:id,"filtersetedit_fieldNames").each do |x|
puts x.text
end
But it gives me the data as a single string, "foo bar gui bab", on my console, so I couldn't figure out how to build the expected result above.
Any help in this regard?

The only way I know to get the text nodes like that would be to use the execute_script method.
The following script would give you the hash of option values and their following text.
#The div containing the checkboxes
checkbox_div = driver.find_element(:id => 'empid')
#Get all of the option values
option_values = checkbox_div.find_elements(:css => 'input').collect{ |x| x['value'] }
p option_values
#=> ["5418630", "6360899", "9556609"]
#Get all of the text nodes (by using javascript)
script = <<-SCRIPT
  var text_nodes = [];
  var children = arguments[0].childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    // nodeType 3 is Node.TEXT_NODE
    if (child.nodeType == 3) {
      text_nodes.push(child.nodeValue);
    }
  }
  return text_nodes;
SCRIPT
option_text = driver.execute_script(script, checkbox_div)
#Tidy up the text nodes to get rid of blanks and extra white space
option_text.collect!(&:strip).delete_if(&:empty?)
p option_text
#=> ["foo", "bar gui", "bab"]
#Combine the two arrays to create a hash (with key being the option value)
option_hash = Hash[*option_values.zip(option_text).flatten]
p option_hash
#=> {"5418630"=>"foo", "6360899"=>"bar gui", "9556609"=>"bab"}
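The question actually asks for an array of pairs. Once `option_values` and `option_text` are built as above, `zip` already produces that shape. `to_h` (Ruby 2.1+) is a shorter way to get the hash than `Hash[*...flatten]`. This is a minimal sketch using the sample values shown above. The integer conversion is an assumption based on the question's expected output.

```ruby
# Values as returned by the two lookups above
option_values = ["5418630", "6360899", "9556609"]
option_text   = ["foo", "bar gui", "bab"]

# Array of [value, text] pairs, with the value converted to an Integer
pairs = option_values.map(&:to_i).zip(option_text)
p pairs
#=> [[5418630, "foo"], [6360899, "bar gui"], [9556609, "bab"]]

# The same data as a hash keyed by the (string) option value
option_hash = option_values.zip(option_text).to_h
```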

Related

Using Nokogiri, how to convert html to text respecting block elements (ensuring they result in line breaks)

The Nokogiri #content method does not convert block elements into paragraphs, for example:
fragment = 'hell<span>o</span><p>world<p>I am Josh</p></p>'
Nokogiri::HTML(fragment).content
=> "helloworldI am Josh"
I would expect output:
=> "hello\n\nworld\n\nI am Josh"
How can I convert HTML to text so that block elements result in line breaks and inline elements are joined with no space?
You can use #before and #after to add newlines:
doc.search('p,div,br').each{ |e| e.after "\n" }
This is my solution:
fragment = 'hell<span>o</span><p>world<p>I am Josh</p></p>'
HtmlToText.process(fragment)
=> "hello\n\nworld\n\nI am Josh"
I traverse the Nokogiri tree, building a text string as I go and wrapping the text in "\n\n" for block elements and "" for inline elements. Then I use gsub to clean up the surplus \n characters. It's hacky, but it works.
require 'nokogiri'
class HtmlToText
  class << self
    def process html
      nokogiri = Nokogiri::HTML(html)
      text = ''
      nokogiri.traverse do |el|
        if el.class == Nokogiri::XML::Element
          sep = inline_element?(el) ? "" : "\n"
          if el.children.length <= 0
            text += "#{sep}"
          else
            text = "#{sep}#{sep}#{text}#{sep}#{sep}"
          end
        elsif el.class == Nokogiri::XML::Text
          text += el.text
        end
      end
      # Collapse runs of 3+ newlines, then trim leading/trailing newlines
      text.gsub(/\n{3,}/, "\n\n").gsub(/(\A\n+)|(\n+\z)/, "")
    end
    private
    def inline_element? el
      # el.try(:name) needs ActiveSupport; respond_to? works in plain Ruby
      el && el.respond_to?(:name) && inline_elements.include?(el.name)
    end
    def inline_elements
      %w(
        a abbr acronym b bdo big br button cite code dfn em i img input
        kbd label map object q samp script select small span strong sub
        sup textarea time tt var
      )
    end
  end
end
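In `process` above, each block element wraps the whole text accumulated so far, not just its own content, and the final gsub tidies up the extra newlines. A recursive walk can wrap only each element's own text instead. The sketch below swaps in Ruby's bundled REXML for Nokogiri, so it assumes the fragment is well-formed XML once wrapped in a root element. Real-world HTML often is not. The block-tag list is also an illustrative assumption.

```ruby
require 'rexml/document'

# Assumed set of block-level tags; extend as needed
BLOCK_TAGS = %w[p div h1 h2 h3 h4 h5 h6 ul ol li table tr blockquote pre].freeze

# Recursively collect text, wrapping each block element's own text in blank lines
def node_text(node)
  node.children.map do |child|
    case child
    when REXML::Text
      child.value
    when REXML::Element
      if child.name == 'br'
        "\n"
      elsif BLOCK_TAGS.include?(child.name)
        "\n\n#{node_text(child)}\n\n"
      else
        node_text(child) # inline element: no separator
      end
    else
      '' # comments, processing instructions, etc.
    end
  end.join
end

def fragment_to_text(fragment)
  doc = REXML::Document.new("<root>#{fragment}</root>")
  # Collapse runs of 3+ newlines and trim leading/trailing newlines
  node_text(doc.root).gsub(/\n{3,}/, "\n\n").gsub(/\A\n+|\n+\z/, '')
end

p fragment_to_text('hell<span>o</span><p>world<p>I am Josh</p></p>')
#=> "hello\n\nworld\n\nI am Josh"
```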

How to convert partial XML to hash in Ruby

I have a string that mixes plain text, extra spaces, and carriage returns with XML-like tags:
string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email#hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
I want to parse this the way Nori, Nokogiri, or Ox convert XML to a hash.
My goal is to be able to easily pull out the top level tags as keys and then know all the elements, something like:
Keys = ['SETPROFILE', 'SETPROFILE', 'SET-TOPIC', 'GET-OBJECT']
Values[0] = [{name => Joe}, {email => email#hi.com}]
Values[3] = [{collection => goals}, {value => walk up}]
I have seen several functions like that for well-formed XML, but all of my strings are only partially XML.
I started going down this line of thinking:
parsed = doc.search('*').each_with_object({}) do |n, h|
(h[n.name] ||= []) << n.text
end
I'd probably do something along these lines if I wanted the keys and values variables:
require 'nokogiri'
string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE>
<KEY>name</KEY>
<VALUE>Joe</VALUE>
</SETPROFILE>
<SETPROFILE>
<KEY>email</KEY>
<VALUE>Email#hi.com</VALUE>
</SETPROFILE>
<GET-RELATIONS>
<COLLECTION>goals</COLLECTION>
<VALUE>walk upstairs</VALUE>
</GET-RELATIONS>
So what do you think?
Is it true?
"
doc = Nokogiri::XML('<root>' + string + '</root>', nil, nil, Nokogiri::XML::ParseOptions::NOBLANKS)
nodes = doc.root.children.reject { |n| n.is_a?(Nokogiri::XML::Text) }.map { |node|
[
node.name, node.children.map { |c|
[c.name, c.content]
}.to_h
]
}
nodes
# => [["SET-TOPIC", {"text"=>" INITIATE "}],
# ["SETPROFILE", {"KEY"=>"name", "VALUE"=>"Joe"}],
# ["SETPROFILE", {"KEY"=>"email", "VALUE"=>"Email#hi.com"}],
# ["GET-RELATIONS", {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]]
From nodes it's possible to grab the rest of the detail:
keys = nodes.map(&:first)
# => ["SET-TOPIC", "SETPROFILE", "SETPROFILE", "GET-RELATIONS"]
values = nodes.map(&:last)
# => [{"text"=>" INITIATE "},
# {"KEY"=>"name", "VALUE"=>"Joe"},
# {"KEY"=>"email", "VALUE"=>"Email#hi.com"},
# {"COLLECTION"=>"goals", "VALUE"=>"walk upstairs"}]
values[0] # => {"text"=>" INITIATE "}
If you'd rather, it's possible to pre-process the DOM and remove the top-level text:
doc.root.children.select { |n| n.is_a?(Nokogiri::XML::Text) }.map(&:remove)
doc.to_xml
# => "<root><SET-TOPIC> INITIATE </SET-TOPIC><SETPROFILE><KEY>name</KEY><VALUE>Joe</VALUE></SETPROFILE><SETPROFILE><KEY>email</KEY><VALUE>Email#hi.com</VALUE></SETPROFILE><GET-RELATIONS><COLLECTION>goals</COLLECTION><VALUE>walk upstairs</VALUE></GET-RELATIONS></root>\n"
That makes it easier to work with the XML.
Wrap the string content in a node and you can parse it with Nokogiri. The text outside the XML segments becomes text nodes inside the new node.
str = "hi there. .... Is it true?"
doc = Nokogiri::XML("<wrapper>#{str}</wrapper>")
segments = doc.xpath('/*/SETPROFILE')
Now you can use "Convert a Nokogiri document to a Ruby Hash" to convert the segments into a hash.
However, if the plain text contains characters that must be escaped in XML (such as & or <), you'll need to find and escape them yourself.
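The same wrapper idea can be sketched without Nokogiri using Ruby's bundled REXML. This is a swapped-in parser, not what the answers above use. It assumes the plain text contains no unescaped & or <. A top-level element with no child elements is mapped to a "text" key, mirroring the Nokogiri output earlier. Trimming whitespace with strip is also an assumption. The sample string is a shortened copy of the one in the question.

```ruby
require 'rexml/document'

# Hypothetical helper: returns [[tag, {child_tag => text}], ...] for each
# top-level element in a string that mixes plain text and XML
def partial_xml_to_pairs(str)
  doc = REXML::Document.new("<root>#{str}</root>")
  # elements.to_a returns only child elements, skipping the stray top-level text
  doc.root.elements.to_a.map do |el|
    children = el.elements.to_a
    value =
      if children.empty?
        { 'text' => el.text.to_s.strip } # e.g. <SET-TOPIC> INITIATE </SET-TOPIC>
      else
        children.map { |c| [c.name, c.text.to_s.strip] }.to_h
      end
    [el.name, value]
  end
end

string = "hi there.
<SET-TOPIC> INITIATE </SET-TOPIC>
<SETPROFILE><KEY>name</KEY><VALUE>Joe</VALUE></SETPROFILE>
<SETPROFILE><KEY>email</KEY><VALUE>Email#hi.com</VALUE></SETPROFILE>
<GET-RELATIONS><COLLECTION>goals</COLLECTION><VALUE>walk upstairs</VALUE></GET-RELATIONS>
So what do you think?
Is it true?
"

nodes  = partial_xml_to_pairs(string)
keys   = nodes.map(&:first) # => ["SET-TOPIC", "SETPROFILE", "SETPROFILE", "GET-RELATIONS"]
values = nodes.map(&:last)
```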

how do I retrieve list of element attributes using watir webdriver

I am trying to write a watir webdriver script which retrieves the attributes of an element and then gets their values.
Given this element:
<input id="foobar" width="200" height="100" value="zoo" type="text"/>
I was hoping to do something like the following:
testElement = $b.element(:id, "foobar")
testElement.attributes.each do |attribute|
puts("#{attribute}: #{testElement.attribute_value(attribute)}")
end
and get:
id: foobar
width: 200
height: 100
value: zoo
type: text
I have seen people use JavaScript to get the list of attributes. The following shows how you could add a method to Watir::Element that returns the list (extending Watir is optional).
#Add the method to list attributes to all elements
require 'watir-webdriver'
module Watir
  class Element
    # Returns an array of "name: value" strings, one per attribute
    def list_attributes
      browser.execute_script(%Q[
        var s = [];
        var attrs = arguments[0].attributes;
        for (var l = 0; l < attrs.length; ++l) {
          var a = attrs[l];
          s.push(a.name + ': ' + a.value);
        }
        return s;],
        self)
    end
  end
end
#Example usage
browser = Watir::Browser.new
browser.goto('your.page.com')
el = browser.text_field(:id, 'foobar')
puts el.list_attributes
#=> ["width: 200", "type: text", "height: 100", "value: zoo", "id: foobar"]
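`list_attributes` returns strings like "width: 200". To get the name/value pairs the question asked for, split each string on the first ": " to turn the list into a hash. This is a minimal sketch that uses the sample output above as input rather than a live browser.

```ruby
# Sample output of list_attributes (from the example above)
list = ["width: 200", "type: text", "height: 100", "value: zoo", "id: foobar"]

# Split on the first ": " only, so values that contain ": " stay intact
attributes = list.map { |s| s.split(': ', 2) }.to_h

attributes.each { |name, value| puts "#{name}: #{value}" }
```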
The answer from Željko Filipin might be dated: as of Aug '15, the documentation he links to lists attribute_value as a method that extracts, well, the attribute value of a DOM element.
With commonwatir-4.0.0, watir-webdriver-0.6.11, and Ruby 2.2.2, attribute_value extracts, for example, the href of an a tag.
What are you trying to do?
As far as I can see, the Element class does not have an #attributes method: http://watir.github.com/watir-webdriver/doc/Watir/Element.html
You can get element's HTML and parse it:
browser.element.html
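This is a minimal sketch of that parsing step. It uses a regular expression over the element's opening tag instead of a full HTML parser. `attributes_from_html` is a hypothetical helper, not part of Watir. It assumes simple markup with quoted or unquoted name=value pairs. It skips valueless attributes such as `disabled`, does not decode entities, and would be confused by a `>` inside an attribute value.

```ruby
# Hypothetical helper: attribute hash from an element's outer HTML
def attributes_from_html(html)
  # Only look at the opening tag; "" if there is none
  open_tag = html[/\A\s*<[^>]*>/].to_s
  # name="value" | name='value' | name=value
  pattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/
  open_tag.scan(pattern).each_with_object({}) do |(name, dq, sq, bare), attrs|
    attrs[name] = dq || sq || bare
  end
end

html = '<input id="foobar" width="200" height="100" value="zoo" type="text"/>'
attributes_from_html(html).each { |name, value| puts "#{name}: #{value}" }
```

With Watir, the input would presumably be something like `browser.element(:id, 'foobar').html`.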

Convert HTML to plain text (with inclusion of <br>s)

Is it possible to convert HTML to plain text with Nokogiri while keeping <br /> tags as line breaks?
For example, given this HTML:
<p>ala ma kota</p> <br /> <span>i kot to idiota </span>
I want this output:
ala ma kota
i kot to idiota
When I just call Nokogiri::HTML(my_html).text, it drops the <br /> tag:
ala ma kota i kot to idiota
Instead of writing complex regexp I used Nokogiri.
Working solution (K.I.S.S!):
def strip_html(str)
document = Nokogiri::HTML.parse(str)
document.css("br").each { |node| node.replace("\n") }
document.text
end
Nothing like this exists by default, but you can easily hack something together that comes close to the desired output:
require 'nokogiri'
def render_to_ascii(node)
blocks = %w[p div address] # els to put newlines after
swaps = { "br"=>"\n", "hr"=>"\n#{'-'*70}\n" } # content to swap out
dup = node.dup # don't munge the original
# Get rid of superfluous whitespace in the source
dup.xpath('.//text()').each{ |t| t.content=t.text.gsub(/\s+/,' ') }
# Swap out the swaps
dup.css(swaps.keys.join(',')).each{ |n| n.replace( swaps[n.name] ) }
# Slap a couple newlines after each block level element
dup.css(blocks.join(',')).each{ |n| n.after("\n\n") }
# Return the modified text content
dup.text
end
frag = Nokogiri::HTML.fragment "<p>It is the end of the world
as we
know it<br>and <i>I</i> <strong>feel</strong>
<a href='blah'>fine</a>.</p><div>Capische<hr>Buddy?</div>"
puts render_to_ascii(frag)
#=> It is the end of the world as we know it
#=> and I feel fine.
#=>
#=> Capische
#=> ----------------------------------------------------------------------
#=> Buddy?
Try
Nokogiri::HTML(my_html.gsub('<br />',"\n")).text
Calling .text throws away link URLs, so I run this first to preserve them in the text version:
html_version.gsub!(/<a href.*(http:[^"']+).*>(.*)<\/a>/i) { "#{$2}\n#{$1}" }
That turns this:
<a href="http://google.com">link to google</a>
into this:
link to google
http://google.com
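Because that pattern uses greedy `.*`, two links on the same line are swallowed by a single match, and the first link is lost. A non-greedy variant is sketched below. It is still a regex over HTML, so it assumes simple markup where `href` is the first attribute and the URL starts with http: or https:.

```ruby
# Non-greedy: each <a ...>...</a> is matched separately
LINK = /<a\s+href=["'](https?:[^"']+)["'][^>]*>(.*?)<\/a>/im

# Replace each link with its text followed by its URL on the next line
def preserve_links(html)
  html.gsub(LINK) { "#{$2}\n#{$1}" }
end

html = %q(<a href="http://google.com">link to google</a> and <a href='https://ruby-lang.org'>Ruby</a>)
puts preserve_links(html)
```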
If you use HAML, you can render the HTML unescaped with raw, e.g.:
= raw @product.short_description

How to build, sort and print a tree of a sort?

This is more of an algorithmic dilemma than a language-specific problem, but since I'm currently using Ruby I'll tag this as such. I've already spent over 20 hours on this and I would've never believed it if someone told me writing a LaTeX parser was a walk in the park in comparison.
I have a loop to read hierarchies (that are prefixed with \m) from different files
art.tex: \m{Art}
graphical.tex: \m{Art}{Graphical}
me.tex: \m{About}{Me}
music.tex: \m{Art}{Music}
notes.tex: \m{Art}{Music}{Sheet Music}
site.tex: \m{About}{Site}
something.tex: \m{Something}
whatever.tex: \m{Something}{That}{Does Not}{Matter}
and I need to sort them alphabetically and print them out as a tree
About
  Me (me.tex)
  Site (site.tex)
Art (art.tex)
  Graphical (graphical.tex)
  Music (music.tex)
    Sheet Music (notes.tex)
Something (something.tex)
  That
    Does Not
      Matter (whatever.tex)
and in (X)HTML:
<ul>
<li>About</li>
<ul>
<li>Me</li>
<li>Site</li>
</ul>
<li>Art</li>
<ul>
<li>Graphical</li>
<li>Music</li>
<ul>
<li>Sheet Music</li>
</ul>
</ul>
<li>Something</li>
<ul>
<li>That</li>
<ul>
<li>Doesn't</li>
<ul>
<li>Matter</li>
</ul>
</ul>
</ul>
</ul>
using Ruby without Rails, which means that at least Array.sort and Dir.glob are available.
All of my attempts started from this skeleton (this part should work just fine):
def fss_brace_array(ss_input)#a concise version of another function; converts {1}{2}...{n} into an array [1, 2, ..., n] or returns an empty array
ss_output = ss_input[1].scan(%r{\{(.*?)\}})
rescue
ss_output = []
ensure
return ss_output
end
#define tree
s_handle = File.join(:content.to_s, "*")
Dir.glob("#{s_handle}.tex").each do |s_handle|
File.open(s_handle, "r") do |f_handle|
while s_line = f_handle.gets
if s_all = s_line.match(%r{\\m\{(\{.*?\})+\}})
s_all = s_all.to_a
#do something with tree, fss_brace_array(s_all) and s_handle
break
end
end
end
end
#do something else with tree
Important: I can't SSH into my Linux box from work right now, so I cannot test this code, not even the tiniest bit. It could have silly, obvious syntax or logic errors, since I wrote it from scratch right in the input box. But it LOOKS right... I think. I'll check it when I get home from work.
SOURCE = <<-'INPUT' # quoted heredoc keeps the \m backslashes intact
art.tex: \m{Art}
graphical.tex: \m{Art}{Graphical}
me.tex: \m{About}{Me}
music.tex: \m{Art}{Music}
notes.tex: \m{Art}{Music}{Sheet Music}
site.tex: \m{About}{Site}
something.tex: \m{Something}
whatever.tex: \m{Something}{That}{Does Not}{Matter}
INPUT
HREF = '#href'
def insert_leaves(tree, node_list)
  head = node_list[0] # `next` is a reserved word in Ruby, so it can't be a variable name
  rest = node_list[1..-1]
  tree[head] ||= {}
  if not rest.empty?
    insert_leaves(tree[head], rest)
  else
    tree[head]
    # recursively, this will fall out to be the final result, making the
    # function return the last (deepest) node inserted.
  end
end
tree = {}
SOURCE.each_line do |line|
  # strip removes the trailing newline, which would otherwise become a folder name
  href, folder_string = line.strip.split(': \\m') #=> ['art.tex','{Art}{Graphical}']
  folders = folder_string.scan(/[^{}]+/) #=> ['Art','Graphical']
  deepest_folder = insert_leaves(tree, folders)
  deepest_folder[HREF] = href
end
# After this insertion, tree looks like this:
#
# {
# About = {
# Me = {
# #href = me.tex
# }
# Site = {
# #href = site.tex
# }
# }
# Art = {
# Graphical = {
# #href = graphical.tex
# }
# ...
#
# Edge case: No category should be named '#href'.
def recursive_html_construction(branch, html)
  # abort if the only key is an href (Array#reject takes a block, not an argument)
  return if branch.keys.reject { |k| k == HREF }.empty?
  html << '<ul>'
  branch.keys.sort.each do |category|
    next if category == HREF # skip href entries.
    html << '<li>'
    if branch[category].key?(HREF)
      html << "<a href='#{branch[category][HREF]}'>#{category}</a>"
    else
      html << category
    end
    html << '</li>'
    recursive_html_construction(branch[category], html)
  end
  html << '</ul>'
end
html = ""
recursive_html_construction(tree, html)
puts html # => '<ul><li>About</li><ul><li><a href='me.tex'>Me</a></li><li>
# <a href='site.tex'>Site</a></li></ul><li>Art</li><ul><li>
# <a href='graphical.tex'>Graphical</a></li>...
I am not familiar with Ruby, but this is how you could do it in most languages:
- Make an empty tree structure.
- For each line:
  - For each {} element after \m:
    - If this element can be found in the tree at the same level, do nothing.
    - Otherwise, create a tree node that is a child of the previous element (or a root, if it is the first one).
  - At the end of the line, attach the foo.tex part to the leafmost node.
I do not know how this translates to Ruby, specifically, how tree structure is represented there.
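A Ruby translation of those steps might look like the sketch below. It is an illustration, not code from the thread. Each node is assumed to be a Hash with a children Hash and an optional file, and the helper names are hypothetical. It prints the indented plain-text tree from the question rather than HTML.

```ruby
# Hypothetical node layout: { children: { "Name" => node }, file: "x.tex" or nil }
def new_node
  { children: {}, file: nil }
end

# Build the tree: one path per line, file name attached to the leafmost node
def build_tree(lines)
  root = new_node
  lines.each do |line|
    file, hierarchy = line.strip.split(': ', 2)
    names = hierarchy.scan(/\{([^}]*)\}/).flatten # '\m{Art}{Music}' -> ["Art", "Music"]
    node = root
    # Reuse an existing child at each level, or create it
    names.each { |name| node = (node[:children][name] ||= new_node) }
    node[:file] = file
  end
  root
end

# Depth-first walk, children sorted alphabetically, two spaces per level
def tree_lines(node, depth = 0, out = [])
  node[:children].keys.sort.each do |name|
    child = node[:children][name]
    label = child[:file] ? "#{name} (#{child[:file]})" : name
    out << ('  ' * depth) + label
    tree_lines(child, depth + 1, out)
  end
  out
end

input = <<~'INPUT'
  art.tex: \m{Art}
  graphical.tex: \m{Art}{Graphical}
  me.tex: \m{About}{Me}
  music.tex: \m{Art}{Music}
  notes.tex: \m{Art}{Music}{Sheet Music}
  site.tex: \m{About}{Site}
  something.tex: \m{Something}
  whatever.tex: \m{Something}{That}{Does Not}{Matter}
INPUT

puts tree_lines(build_tree(input.lines))
```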
There was a question very similar to this one recently, take a look at mine and other answers:
How to handle recursive parent/child problems like this?
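Here's a working solution in Python, after simply pushing all of the entries into a sorted list of paths.
import operator
import itertools

def overput(input):
    # input: sorted list of paths, e.g. [['About', 'Me'], ['Art'], ...]
    # groupby merges consecutive paths that share the same first element
    for betweenput, output in itertools.groupby(input, key=operator.itemgetter(0)):
        yield '<li>'
        yield betweenput
        # drop the first element of each path; keep only paths with more levels
        output = [x[1:] for x in output if len(x) > 1]
        if output:
            yield '<ul>'
            for part in overput(output):
                yield part
            yield '</ul>'
        yield '</li>'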
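For comparison, here is the same grouping idea in Ruby. `Enumerable#chunk` groups consecutive elements that share a key, much like `itertools.groupby`, so the paths must be sorted first. This is a sketch with a hypothetical `nested_list` helper. Like the Python version, it nests each `<ul>` inside its parent `<li>`, which valid HTML expects, unlike the sibling `<ul>`s in the question. It does not HTML-escape the names.

```ruby
# paths: sorted arrays of category names, e.g. [["About", "Me"], ["Art"], ...]
def nested_list(paths)
  html = +''
  # chunk groups consecutive paths with the same first element
  paths.chunk(&:first).each do |name, group|
    html << '<li>' << name
    # drop the first level; keep only paths that go deeper
    rest = group.map { |path| path.drop(1) }.reject(&:empty?)
    html << '<ul>' << nested_list(rest) << '</ul>' unless rest.empty?
    html << '</li>'
  end
  html
end

paths = [
  %w[Art], %w[Art Graphical], %w[About Me], %w[Art Music],
  ['Art', 'Music', 'Sheet Music'], %w[About Site], %w[Something],
  ['Something', 'That', 'Does Not', 'Matter']
].sort

puts "<ul>#{nested_list(paths)}</ul>"
```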
