Why is my xpath matching an undesired element? - xpath

I am trying to locate elements with the class "foo"
<div id="foo1">
<div id = "foo2">
<div class = "foo">
</div>
</div>
</div>
This is my xpath:
/div/div/div[contains(@class,'foo')]
And this is the code it's finding:
<div id="foo1">
<div id = "foo2">
<div class = "foo-err">
</div>
</div>
</div>
The path is returning the div class = "foo-err" element

contains() is a substring match. It's basically saying "if 'foo' is ANYWHERE in the class attribute, match the element".
If you want an exact match, then try
[@class='foo']

If you need to match foo in a list of classes but not foo-err, you need a more complex construct: [contains(concat(' ', @class, ' '), ' foo ')], which looks for foo surrounded by spaces in the full class string, itself padded with spaces. Thus 'foo bar' -> ' foo bar ' -> contains ' foo ', but 'foobaz bar' -> ' foobaz bar ' -> does not contain ' foo '.
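The padding trick can be checked outside XPath. Below is a minimal plain-Ruby sketch of the same logic; has_class_token? is a hypothetical helper name used only for illustration, and like the XPath expression it assumes classes are separated by single spaces (not tabs or newlines).

```ruby
# Plain-Ruby mirror of: contains(concat(' ', @class, ' '), ' foo ')
# has_class_token? is a hypothetical name, not part of any library.
def has_class_token?(class_attr, token)
  padded = " #{class_attr} "      # concat(' ', @class, ' ')
  padded.include?(" #{token} ")   # contains(..., ' foo ')
end

['foo', 'foo bar', 'bar foo', 'foo-err', 'foobaz bar'].each do |value|
  puts "#{value.inspect} => #{has_class_token?(value, 'foo')}"
end
# "foo" => true
# "foo bar" => true
# "bar foo" => true
# "foo-err" => false
# "foobaz bar" => false
```

A plain contains(@class,'foo') corresponds to class_attr.include?('foo'), which is why it also matches foo-err.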

Related

Ruby remove new line character between html tags.

I would like to remove all the new line characters between
<div class="some class"> arbitrary amount of text here with possible new line characters </div>
Is this possible in ruby?
Yes, you can easily do this using the Nokogiri gem. For example:
require "rubygems"
require "nokogiri"
html = %q!
<div class="some class"> arbitrary amount of text
here with possible
new line
characters </div>
!
doc = Nokogiri::HTML::DocumentFragment.parse(html)
div = doc.at('div')
div.inner_html = div.inner_html.gsub(/[\n\r]/, " ").strip
puts html
puts '-' * 60
puts doc.to_s
When run will output this:
<div class="some class"> arbitrary amount of text
here with possible
new line
characters </div>
------------------------------------------------------------
<div class="some class">arbitrary amount of text here with possible new line characters</div>
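The string step can be tried on its own, without Nokogiri. Here is a minimal sketch, assuming the text may contain Windows-style \r\n line endings; in that case replacing each character separately would leave two spaces, so this variant collapses each run of line-break characters into one space. flatten_newlines is a hypothetical helper name.

```ruby
# flatten_newlines is a hypothetical helper, shown only to illustrate the gsub step.
def flatten_newlines(text)
  # Collapse every run of CR/LF characters into a single space, then trim the ends.
  text.gsub(/[\r\n]+/, ' ').strip
end

sample = " arbitrary amount of text\nhere with possible\r\nnew line\ncharacters "
puts flatten_newlines(sample)
# arbitrary amount of text here with possible new line characters
```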

How to define custom locating strategy for select

I am looking for a proper way to redefine/extend the locating strategy for a select tag in a GWT app.
From the HTML snippet you can see that the select tag is not visible.
So to select an option from the list I need to click the button tag and then select the needed li tag from the dropdown.
<div class="form-group">
<select class="bootstrap-select form-control" style="display: none; locator='gender">
<div class="btn-group">
<button class="dropdown-toggle" type="button" title="Male">
<div class="dropdown-menu open">
<ul class="dropdown-menu inner selectpicker" role="menu">
<li data-original-index="1"> (contains a>span with option text)
.....more options
</ul>
</div>
</div>
</div>
I see a dirty solution: implement a method in the BasePage class. But this approach loses the nice page_object sugar (options, get value, etc.):
def set_nationality(country, nationality='Nationality')
select = button_element(xpath: "//button[@title='#{nationality}']")
select.click
option = span_element(xpath: "//span[.='#{country}']")
option.when_visible
option.click
end
Is there a cleaner way to do this? Using PageObject::Widgets, maybe?
UPD: Here is what I expect to get:
def bool_list(name, identifier={:index => 0}, &block)
define_method("#{name}_btn_element") do
platform.send('button_for', identifier.clone + "//button")
end
define_method("#{name}?") do
platform.send('button_for', identifier.clone + "//button").exists?
end
define_method(name) do
return platform.select_list_value_for identifier.clone + '/select' unless block_given?
self.send("#{name}_element").value
end
define_method("#{name}=") do |value|
return platform.select_list_value_set(identifier.clone + '/select', value) unless block_given?
self.send("#{name}_element").select(value)
end
define_method("#{name}_options") do
element = self.send("#{name}_element")
(element && element.options) ? element.options.collect(&:text) : []
end
end
The select list appears to have the most identifying attributes, so I would use it as the base element of the widget. All of the other elements, i.e. the button and list items, would need to be located with respect to the select list. In this case, they all share the same div.form-group ancestor.
The widget could be defined as:
class BoolList < PageObject::Elements::SelectList
def select(value)
dropdown_toggle_element.click
option = span_element(xpath: "./..//span[.='#{value}']")
option.when_visible
option.click
end
def dropdown_toggle_element
button_element(xpath: './../div/button')
end
def self.accessor_methods(widget, name)
widget.send('define_method', "#{name}_btn_element") do
self.send("#{name}_element").dropdown_toggle_element
end
widget.send('define_method', "#{name}?") do
self.send("#{name}_btn_element").exists?
end
widget.send('define_method', name) do
self.send("#{name}_element").value
end
widget.send('define_method', "#{name}=") do |value|
self.send("#{name}_element").select(value)
end
widget.send('define_method', "#{name}_options") do
# Since the element is not displayed, we need to check the inner HTML
element = self.send("#{name}_element")
(element && element.options) ? element.options.map { |o| o.element.inner_html } : []
end
end
end
PageObject.register_widget :bool_list, BoolList, :select
Notice that all locators are in relation to the select list. As well, notice that we use the accessor_methods to add the extra methods to the page object.
The page object would then use the bool_list accessor method. Note that the identifier is for locating the select element, which we said would be the base element of the widget.
class MyPage
include PageObject
bool_list(:gender, title: 'Gender')
bool_list(:nationality, title: 'Nationality')
end
The page will now be able to call the following methods:
page.gender_btn_element.click
page.gender_btn_element.exists?
page.gender
page.gender = 'Female'
page.gender_options
page.nationality_btn_element.click
page.nationality_btn_element.exists?
page.nationality
page.nationality = 'Barbados'
page.nationality_options
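The accessor_methods hook relies on define_method to generate named methods when the page class is defined. Below is a minimal plain-Ruby sketch of that pattern, independent of the page-object gem: FakeDropdown, DropdownAccessors, dropdown, and DemoPage are all hypothetical names invented for this illustration, and no browser is involved.

```ruby
# Hypothetical stand-in for the widget element; not part of page-object.
class FakeDropdown
  attr_reader :value

  def initialize(options)
    @options = options
    @value = options.first
  end

  def select(value)
    raise ArgumentError, "unknown option: #{value}" unless @options.include?(value)
    @value = value
  end

  def options
    @options.dup
  end
end

# Hypothetical mixin that generates per-name accessors with define_method,
# in the same spirit as the accessor_methods hook.
module DropdownAccessors
  def dropdown(name, options)
    define_method("#{name}_element") do
      @dropdowns ||= {}
      @dropdowns[name] ||= FakeDropdown.new(options)
    end
    define_method(name) { send("#{name}_element").value }
    define_method("#{name}=") { |value| send("#{name}_element").select(value) }
    define_method("#{name}_options") { send("#{name}_element").options }
  end
end

class DemoPage
  extend DropdownAccessors
  dropdown :gender, %w[Male Female]
end

page = DemoPage.new
page.gender = 'Female'
puts page.gender                  # Female
puts page.gender_options.inspect  # ["Male", "Female"]
```

The point of the sketch is only that one declaration (dropdown :gender, ...) yields the reader, writer, and options methods; the real widget delegates to browser elements instead of an in-memory list.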

Replace markup (as a string) including certain inline elements [closed]

Closed 10 years ago.
My intent is to modify a sentence within a tag.
For example change:
<div id="1">
This is text in the TD with <strong> strong </strong> tags
<p>This is a child node. with <b> bold </b> tags</p>
<div id=2>
"another line of text to a link "
<p> This is text inside a div <em>inside<em> another div inside a paragraph tag</p>
</div>
</div>
To this:
<div id="1">
This is modified text in the TD with <strong> strong </strong> tags
<p>This is a child node. with <b> bold </b> tags</p>
<div id=2>
"another line of text to a link "
<p> This is text inside a div <em>inside<em> another div inside a paragraph tag</p>
</div>
</div>
Which would mean I need to traverse the nodes grabbing a tag and getting all the text & style nodes, but not grabbing the children tags. Modifying the sentences and putting them back. I would need to do this for each tag with full text until all the content was modified.
For example grabbing the text and style nodes for div#1 would be:
"This is text in the TD with strong tags"
but as you can see, none of the other text underneath would be grabbed. It should be accessible and modifiable through a variable.
div#1.text_with_formating= "This is modified text in the TD with <strong> strong </strong> tags"
The code below removes all content, not just the child tags; keeping content instead leaves everything, even the tags under div#1. Therefore, I'm not sure how to proceed.
Sanitize.clean(h,{:elements => %w[b em i strong u],:remove_contents=>'true'})
How would you recommend solving this?
If you want to find all the text nodes underneath an element, use:
text_pieces = div.xpath('.//text()')
If you want to find only the text that is an immediate child of an element, use:
text_pieces = div.xpath('text()')
For each text node, you can change the content any way you like. Just be sure to use my_text_node.content = ... instead of my_text_node.content.gsub!(...): content returns a new string, so mutating that string in place does not change the node.
# Replace text that is a direct child of an element
def gsub_my_text!( el, find, replace=nil, &block )
el.xpath('text()').each do |text|
next if text.content.strip.empty?
text.content = replace ? text.content.gsub(find,replace,&block) : text.content.gsub(find,&block)
end
end
# Replace text beneath an element.
def gsub_text!( el, find, replace=nil, &block )
el.xpath('.//text()').each do |text|
next if text.content.strip.empty?
text.content = replace ? text.content.gsub(find,replace,&block) : text.content.gsub(find,&block)
end
end
d1 = doc.at('#d1')
gsub_my_text!( d1, /[aeiou]+/ ){ |found| found.upcase }
puts d1
#=> <div id="d1">
#=> ThIs Is tExt In thE TD wIth <strong> strong </strong> tAgs
#=> <p>This is a child node. with <b> bold </b> tags</p>
#=> <div id="d2">
#=> "another line of text to a link "
#=> <p> This is text inside a div <em>inside<em> another div inside a paragraph tag</em></em></p>
#=> </div>
#=> </div>
gsub_text!( d1, /\w+/, '(\\0)' )
puts d1
#=> <div id="d1">
#=> (ThIs) (Is) (tExt) (In) (thE) (TD) (wIth) <strong> (strong) </strong> (tAgs)
#=> <p>(This) (is) (a) (child) (node). (with) <b> (bold) </b> (tags)</p>
#=> <div id="d2">
#=> "(another) (line) (of) (text) (to) (a) (link) "
#=> <p> (This) (is) (text) (inside) (a) (div) <em>(inside)<em> (another) (div) (inside) (a) (paragraph) (tag)</em></em></p>
#=> </div>
#=> </div>
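To see why assignment is needed rather than an in-place gsub!, here is a minimal sketch with a stand-in class. FakeTextNode is hypothetical and is not a Nokogiri class; it only assumes, as described above, that the content reader hands back a new string on every call.

```ruby
# FakeTextNode is a hypothetical stand-in, not a Nokogiri class.
class FakeTextNode
  def initialize(text)
    @text = text
  end

  # Assumption for this sketch: every call returns a fresh copy of the text.
  def content
    @text.dup
  end

  def content=(new_text)
    @text = new_text.to_s
  end
end

node = FakeTextNode.new('some text')
node.content.gsub!('text', 'words')                # mutates a throwaway copy
puts node.content                                  # some text
node.content = node.content.gsub('text', 'words')  # assigns the result back
puts node.content                                  # some words
```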
Edit: Here is code that allows you to extract runs of text+inline markup as a string, run a gsub on that, and replace the result with new markup.
require 'nokogiri'
doc = Nokogiri.HTML '<div id="d1">
Text with <strong>strong</strong> tag.
<p>This is a child node. with <b>bold</b> tags.</p>
<div id=d2>And now we are in another div.</div>
Hooray for <em>me!</em>
</div>'
module Enumerable
# http://stackoverflow.com/q/4800337/405017
def split_on() chunk{|o|!yield(o)||nil}.map{|b,a|b&&a}.compact end # items for which the block is true act as separators and are dropped
end
require 'set'
# Given a node, call gsub on the `inner_html`
def gsub_markup!( node, find, replace=nil, &replace_block )
allowed = Set.new(%w[strong b em i u strike])
runs = node.children.split_on{ |el| el.node_type==1 && !allowed.include?(el.name) }
runs.each do |nodes|
orig = nodes.map{ |node| node.node_type==3 ? node.content : node.to_html }.join
next if orig.strip.empty? # Skip whitespace-only nodes
result = replace ? orig.gsub(find,replace) : orig.gsub(find,&replace_block)
puts "I'm replacing #{orig.inspect} with #{result.inspect}" if $DEBUG
nodes[1..-1].each(&:remove)
nodes.first.replace(result)
end
end
d1 = doc.at('#d1')
$DEBUG = true
gsub_markup!( d1, /[aeiou]+/, &:upcase )
#=> I'm replacing "\n Text with <strong>strong</strong> tag.\n " with "\n TExt wIth <strOng>strOng</strOng> tAg.\n "
#=> I'm replacing "\n Hooray for <em>me!</em>\n" with "\n HOOrAy fOr <Em>mE!</Em>\n"
puts doc
#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body><div id="d1">
#=> TExt wIth <strong>strOng</strong> tAg.
#=> <p>This is a child node. with <b>bold</b> tags.</p>
#=> <div id="d2">And now we are in another div.</div>
#=> HOOrAy fOr <em>mE!</em>
#=> </div></body></html>
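The run-splitting step can be tried on a plain array. This is a standalone sketch built on Enumerable#chunk, which drops items whose key is nil and starts a new chunk after them; split_runs is a hypothetical name, and symbols stand in for the block-level elements that separate runs of text and inline markup.

```ruby
# split_runs is a hypothetical helper: items for which the block is true
# act as separators and are dropped; the items between them form runs.
def split_runs(items)
  items.chunk { |item| yield(item) ? nil : true }.map { |_key, run| run }
end

# Symbols play the role of block-level elements such as <p> and <div>.
children = ['Text with ', '<strong>strong</strong>', ' tag.', :p, ' ', :div, 'Hooray']
runs = split_runs(children) { |child| child.is_a?(Symbol) }
p runs
# [["Text with ", "<strong>strong</strong>", " tag."], [" "], ["Hooray"]]
```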
The easiest way would be:
div = doc.at('div#1')
div.replace div.to_s.sub('text', 'modified text')

How to use CSS selector with space in class name

I am trying to find CSS elements in a page, containing white space at the end of the class name:
@agent = Mechanize.new
page = @agent.get(somepage)
Where the tag is:
<div class="Example ">
When trying:
page.search('.Example')
the element is not found and when trying:
page.search('.Example ') <- space following the name
Nokogiri raises an exception:
Nokogiri::CSS::SyntaxError: unexpected '$' after 'DESCENDANT_SELECTOR'
Your implied premise, that a class cannot be found because it contains a space, is incorrect. Class names do not include spaces. Proof:
require 'nokogiri'
html = <<End
<html>
<span class="Example ">One</span>
<span class="Example foo">Two</span>
</html>
End
doc = Nokogiri::HTML(html)
puts doc.search('.Example')
Output:
<span class="Example ">One</span>
<span class="Example foo">Two</span>
So I think your HTML document simply doesn't have a class containing Example in it. If you provided the sample HTML, this question would have been easier to answer.
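The reason .Example still matches class="Example " is that a class attribute is read as a whitespace-separated list of names, so the trailing space never becomes part of any name. Here is a minimal plain-Ruby sketch of that tokenization; class_tokens and matches_class? are hypothetical helper names, not Nokogiri methods.

```ruby
# Hypothetical helpers that mimic how a class attribute is split into names.
def class_tokens(class_attr)
  class_attr.to_s.split   # split on whitespace, ignoring leading/trailing runs
end

def matches_class?(class_attr, name)
  class_tokens(class_attr).include?(name)
end

p class_tokens('Example ')               # ["Example"]
p class_tokens('Example foo')            # ["Example", "foo"]
p matches_class?('Example ', 'Example')  # true
```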
To find all elements having class attribute ending in whitespace:
page.search('*').select{|e| e[:class] =~ /\s$/}
If you specifically target the class attribute you can include spaces. In my case the class value had a space:
<p class="Event_CategoryTree category">
Here is how I targeted that element using Nokogiri:
page.at_css("[class='Event_CategoryTree category']")
You can use Xpath instead.
The following code will return all div containers whose class attribute is exactly "a class with spaces":
doc = Nokogiri::HTML(page)
result = doc.xpath('//div[@class="a class with spaces"]')

Inserting generated text into the same line as literal content with Haml

document.write('
- @thumbs.each_with_index do |attachment,index|
<div><img src="#..." /></div>
');
The code above outputs something like this:
document.write('
<div class="item" style="padding:20;float:left;"><div class="item" style="padding:20;float:left;">
');
Is there any way I can accomplish the same but without the breakline that HAML creates? I need to make it something like this:
document.write('<div class="item" style="padding:20;float:left;"><div class="item" style="padding:20;float:left;">');
Create and use a one_line block helper
Helper
def one_line(&block)
haml_concat capture_haml(&block).gsub("\n", '').gsub('\\n', "\n")
end
View
- one_line do
document.write('
- @thumbs.each_with_index do |attachment,index|
<div><img src="#..." /></div>
');
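The string transformation inside the helper can be tried without Haml. Below is a minimal sketch on a hand-written string standing in for the captured output; flatten_captured is a hypothetical name. It shows the two steps: real newlines are removed, then any literal backslash-n sequences are turned into real newlines, which gives a way to keep a line break on purpose.

```ruby
# flatten_captured is a hypothetical name for the gsub chain used in one_line.
def flatten_captured(text)
  text.gsub("\n", '')       # drop every real newline
      .gsub('\\n', "\n")    # turn a literal backslash + n into a real newline
end

captured = "document.write('\n<div>a</div>\n<div>b</div>\n');\n"
puts flatten_captured(captured)
# document.write('<div>a</div><div>b</div>');

# A literal \n (two characters) survives the first step and becomes a line break.
puts flatten_captured('first\nsecond' + "\n")
# first
# second
```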
You can use > and <
For example:
%ul<
- 1.upto(5) do |i|
%li> asdf
Will output a one line list.
In your case:
document.write('
- 1.upto(5) do |i|
%div>
%a{ :href => "#..." }>
%img{ :src => "#..." }>
');
Use string interpolation in your template to inline Ruby code:
document.write('#{@thumbs.map.with_index{ |a,i| '<div>...</div>' }.join}');
For example:
require 'haml'
template = IO.read('tmp.haml')
puts template
#=> document.write('#{ @a.map.with_index{ |n,i| "<div>#{n}-#{i}</div>" }.join }')
@a = %w[a b c]
puts Haml::Engine.new(template).render(self)
#=> document.write('<div>a-0</div><div>b-1</div><div>c-2</div>')
