With this xml:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
we want to find only
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
which are those nodes having an attribute where the attribute name begins with val
You can try this:
//div/@*[starts-with(name(.), 'val')]
Note that this selects the matching attributes themselves (every attribute of a div whose name starts with val), not the div elements.
Edit:
Sorry, I didn't realize you wanted to select the elements themselves. You could append parent::div, or use what you already did, but the proper way is to select the div elements directly:
//div[@*[starts-with(name(), 'val')]]
Have you tried .../@val* ?
which are those nodes having an attribute where the attribute name
begins with val
Use:
//div[@*[starts-with(name(), 'val')]]
This selects every div element in the document that has at least one attribute whose name starts with the string "val".
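For readers working outside XSLT, the same selection can be sketched in Python with only the standard library. This is an illustration, not part of the original answer: xml.etree.ElementTree implements only a subset of XPath and has no starts-with(), so the predicate is written as a plain Python test. The helper name divs_with_attr_prefix and the html wrapper element are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a single root element.
XML = """<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>"""


def divs_with_attr_prefix(root, prefix):
    """Return the div elements that have at least one attribute
    whose name starts with `prefix` (hypothetical helper)."""
    return [
        div
        for div in root.iter("div")
        if any(name.startswith(prefix) for name in div.attrib)
    ]


root = ET.fromstring(XML)
matches = divs_with_attr_prefix(root, "val")
print([div.text for div in matches])  # ['a', 'b', 'c']
```

A library with full XPath 1.0 support could evaluate the expression above directly; the filter here only mirrors its meaning.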
XSLT - based verification:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//div[@*[starts-with(name(), 'val')]]"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (produced from the provided XML fragment):
<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>
selects and outputs the wanted nodes:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
I have an XML structure that looks like this:
<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<p getelement="1234"></p>
</section>
</section>
</body>
</document>
I want to look up the attribute value defined in "getelement". I got this code from a friendly soul here:
//section[section/p[@xml:id=@getelement]]/subtitle
but it doesn't work, and I can't use current() since it is not supported in Arbortext.
Your expression compares two attributes of the same p element, but xml:id and getelement sit on different elements. You have to look up the getelement attribute separately:
//section[section/p[@xml:id=//@getelement]]/subtitle
Also note that xml:id attributes cannot start with digits.
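To make the logic of the corrected expression concrete, here is a rough Python sketch using only the standard library. It is an illustration under assumptions, not the Arbortext solution: ElementTree cannot compare two node sets inside a predicate, so the getelement values are collected first and then matched against the xml:id values. The function name referenced_subtitles is made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<p getelement="1234"></p>
</section>
</section>
</body>
</document>"""


def referenced_subtitles(root):
    """Subtitles of sections whose nested section/p carries an xml:id
    equal to some getelement value anywhere in the document."""
    # Counterpart of //@getelement: every getelement value in the tree.
    wanted = {
        el.get("getelement")
        for el in root.iter()
        if el.get("getelement") is not None
    }
    subtitles = []
    for section in root.iter("section"):
        # Counterpart of section/p/@xml:id relative to this section.
        ids = {p.get(XML_ID) for p in section.findall("section/p")}
        if ids & wanted:
            subtitles.extend(s.text for s in section.findall("subtitle"))
    return subtitles


print(referenced_subtitles(ET.fromstring(DOC)))  # ['Something again']
```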
Given this HTML:
<li class="check_boxes input optional" id="activity_roles_input">
<fieldset class="choices">
<legend class="label"><label>Roles</label></legend>
<input id="activity_roles_none" name="activity[role_ids][]" type="hidden" value="" />
<ol class="choices-group">
<li class="choice">
<label for="activity_role_ids_104">
<input id="activity_role_ids_104" name="activity[role_ids][]" type="checkbox" value="104" />Language Therapist
</label>
</li>
<li class="choice">
<label for="activity_role_ids_103">
<input id="activity_role_ids_103" name="activity[role_ids][]" type="checkbox" value="103" />Speech Therapist
</label>
</li>
</ol>
</fieldset>
</li>
I am trying to use Selenium and xpath with it.
I am trying to select the first 'checkbox' input element link.
I am having problems selecting the element.
I cannot use the db ID (104) as this is for repeated tests with new ID's each time. I need to select the 'first' input checkbox, based on it having the text for Language Therapist.
I have tried:
xpath=(//li[contains(@id,'activity_roles_input')])//input
and
xpath=(//li[contains(@id,'activity_roles_input')])//contains('Language Therapist")
but it is not finding the element.
When I do:
xpath=(//li[contains(@id,'activity_roles_input')])
it gets to the input set. The problem I am having is selecting the first input checkbox control for 'Language Therapist'.
First, find any <li> containing the text, and then look among its descendants for the first checkbox.
xpath=(//li[contains(., "Language Therapist")]/descendant::input[@type="checkbox"][1])
(From Michael)
The above worked for me. In the end I actually used
xpath=(//li[contains(@id,'activity_roles_input')]/descendant::input[@type="checkbox"][1])
because I liked identifying the element by its CSS ID.
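The behaviour of the text-based locator can be checked offline with a small Python sketch. This is only an approximation under assumptions: it parses the snippet as XML with the standard library instead of driving Selenium, and the function first_checkboxes is a made-up helper that mimics contains(., text) followed by descendant::input[@type="checkbox"][1] for each matching li.

```python
import xml.etree.ElementTree as ET

HTML = """<li class="check_boxes input optional" id="activity_roles_input">
<fieldset class="choices">
<legend class="label"><label>Roles</label></legend>
<input id="activity_roles_none" name="activity[role_ids][]" type="hidden" value="" />
<ol class="choices-group">
<li class="choice">
<label for="activity_role_ids_104">
<input id="activity_role_ids_104" name="activity[role_ids][]" type="checkbox" value="104" />Language Therapist
</label>
</li>
<li class="choice">
<label for="activity_role_ids_103">
<input id="activity_role_ids_103" name="activity[role_ids][]" type="checkbox" value="103" />Speech Therapist
</label>
</li>
</ol>
</fieldset>
</li>"""


def first_checkboxes(root, label_text):
    """For every li whose text content contains `label_text`, take its
    first checkbox descendant; return the distinct hits in order."""
    found = []
    for li in root.iter("li"):  # iter() also yields the root li itself
        if label_text not in "".join(li.itertext()):
            continue
        checkbox = next(
            (i for i in li.iter("input") if i.get("type") == "checkbox"),
            None,
        )
        if checkbox is not None and all(checkbox is not f for f in found):
            found.append(checkbox)
    return found


root = ET.fromstring(HTML)
print([c.get("id") for c in first_checkboxes(root, "Language Therapist")])
# ['activity_role_ids_104']
```

One caveat this makes visible: the outer li contains the text of every choice, so for "Speech Therapist" the same logic yields checkbox 104 first and 103 second. Restricting the match to li elements with class "choice" would likely be more robust if labels other than the first one need to be targeted.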
An interesting thing to notice when I run this small XSL against your XML:
XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="//li[@id ='activity_roles_input']">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
Roles
Language Therapist
Speech Therapist
You have
xpath=(//li[contains(@id,'activity_roles_input')])//input
Shouldn't that be
xpath=(//li[contains(@id,'activity_roles_input')]//input)
or rather
xpath=(//li[@id='activity_roles_input']//input)
?
xpath=(//li[@id='activity_roles_input']//input[1])
I have the HTML below and am using Watir to verify that March does not have a strike tag and that April, May, and June do. I'm pretty sure XPath is the key to my answer, but I have failed to come up with the right solution. Any help is greatly appreciated.
<div class="availability">
Available:
<ul>
<li><span class="month available">March</span></li>
<li><span class="month unavailable"><strike>April</strike></span></li>
<li><span class="month unavailable"><strike>May</strike></span></li>
<li><span class="month unavailable"><strike>June</strike></span></li>
</ul>
</div>
If you are using watir-webdriver, you can do:
#Create an array of the strike elements
months_with_strike = browser.elements(:tag_name, 'strike').collect(&:text)
#Determine if the specified month is in the array
months_with_strike.include?('March')
#=> false
months_with_strike.include?('April')
#=> true
Alternatively, if you only want to check for a single element:
browser.element(:tag_name => 'strike', :text => 'March').exists?
#=> false
browser.element(:tag_name => 'strike', :text => 'April').exists?
#=> true
The important part is that you can get custom elements by using the :tag_name as a locator.
Note: I would think this should also work in watir-classic, but for some reason I am getting exceptions.
Use (assuming the initial context node is the parent of the div element):
div/ul/li/span[not(strike)]
This selects every span element that does not have a strike child (and that is a child of an li that is a child of a ul that is a child of a div that is a child of the initial context node).
And use:
div/ul/li/span[strike]
This selects every span element that has a strike child (and that is a child of an li that is a child of a ul that is a child of a div that is a child of the initial context node).
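As a quick cross-check of the two expressions, here is a minimal Python sketch with the standard library. It is not Watir code and rests on assumptions: ElementTree has no not() function, so both predicates are expressed as Python tests, and because the parsed root is the div itself the path starts at ul. The function name split_by_strike is hypothetical.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="availability">
Available:
<ul>
<li><span class="month available">March</span></li>
<li><span class="month unavailable"><strike>April</strike></span></li>
<li><span class="month unavailable"><strike>May</strike></span></li>
<li><span class="month unavailable"><strike>June</strike></span></li>
</ul>
</div>"""


def split_by_strike(div):
    """Return (months without a strike child, months with one)."""
    spans = div.findall("ul/li/span")
    # span[not(strike)]: no strike element among the direct children.
    plain = ["".join(s.itertext()) for s in spans if s.find("strike") is None]
    # span[strike]: at least one strike element among the direct children.
    struck = ["".join(s.itertext()) for s in spans if s.find("strike") is not None]
    return plain, struck


plain, struck = split_by_strike(ET.fromstring(HTML))
print(plain)   # ['March']
print(struck)  # ['April', 'May', 'June']
```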
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="div/ul/li/span[not(strike)]"/>
==============
<xsl:copy-of select="div/ul/li/span[strike]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<div class="availability">
Available:
<ul>
<li><span class="month available">March</span></li>
<li><span class="month unavailable"><strike>April</strike></span></li>
<li><span class="month unavailable"><strike>May</strike></span></li>
<li><span class="month unavailable"><strike>June</strike></span></li>
</ul>
</div>
the two XPath expressions are evaluated and the results (selected nodes) are copied to the output, delimited by a visually distinctive delimiter string:
<span class="month available">March</span>
==============
<span class="month unavailable">
<strike>April</strike>
</span>
<span class="month unavailable">
<strike>May</strike>
</span>
<span class="month unavailable">
<strike>June</strike>
</span>
I have a string (@description) that contains HTML code and I want to extract the content between two elements. It looks something like this:
<b>Content title<b><br/>
*All the content I want to extract*
<a href="javascript:print()">
I've managed to do something like this
@want = @description.match(/Content title(.*?)javascript:print()/m)[1].strip
But obviously this solution is far from perfect, as I get some unwanted characters in my @want string.
Thanks for your help
Edit:
As requested in the comments, here is the full code:
I'm already parsing an HTML document, and the following code:
@description = @doc.at_css(".entry-content").to_s
puts @description
returns:
<div class="post-body entry-content">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"><br><br><div style="text-align: justify;">
Some text</div>
<b>More text</b><br><b>More text</b><br><br><ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br><b>Content Title</b><br>
Some text<br><br>
Some text(with links and images)<br>
Some text(with links and images)<br>
Some text(with links and images)<br>
<br><br><img src="http://url.com/photo.jpg">
<div style="clear: both;"></div>
</div>
The text can include more paragraphs, links, images, etc. but it always starts with the "Content Title" part and ends with the javascript reference.
This XPath expression selects all (sibling) nodes between the nodes $vStart and $vEnd:
$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
To obtain the full XPath expression to use in your specific case, simply substitute $vStart with:
/*/b[. = 'Content Title']
and substitute $vEnd with:
/*/a[@href = 'javascript:print()']
The final XPath expression after the substitutions is:
/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
Explanation:
This is a simple corollary of the Kayessian formula for the intersection of two nodesets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
In our case, the set of all nodes between the nodes $vStart and $vEnd is the intersection of two node-sets: all following siblings of $vStart and all preceding siblings of $vEnd.
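The intersection idea can also be sketched in Python with the standard library. This is a simplified illustration under stated assumptions, not the XPath evaluation itself: ElementTree has no separate text nodes (text between elements is stored in each element's tail), so the sketch intersects element siblings and gathers the text from the tails. The shortened DOC and the names siblings_between and texts_between are made up for the example.

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed stand-in for the document in the question.
DOC = (
    "<div>"
    "<b>More text</b><br/>"
    "<b>Content Title</b><br/>Some text<br/><br/>"
    "Some text(with links and images)<br/>"
    '<a href="javascript:print()"><img src="http://url.com/photo.jpg"/></a>'
    '<div style="clear: both;"></div>'
    "</div>"
)


def siblings_between(parent, start, end):
    """Element siblings after `start` and before `end`, computed as the
    intersection of the two sibling sets, as in the formula above."""
    children = list(parent)
    following = children[children.index(start) + 1:]
    preceding = children[:children.index(end)]
    return [n for n in following if any(n is p for p in preceding)]


def texts_between(start, between):
    """Non-empty text fragments located between the two boundaries."""
    tails = [start.tail] + [n.tail for n in between]
    return [t.strip() for t in tails if t and t.strip()]


root = ET.fromstring(DOC)
start = next(c for c in root if c.tag == "b" and c.text == "Content Title")
end = next(c for c in root if c.tag == "a" and c.get("href") == "javascript:print()")
between = siblings_between(root, start, end)
print([n.tag for n in between])       # ['br', 'br', 'br', 'br']
print(texts_between(start, between))  # ['Some text', 'Some text(with links and images)']
```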
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vStart" select="/*/b[. = 'Content Title']"/>
<xsl:variable name="vEnd" select="/*/a[@href = 'javascript:print()']"/>
<xsl:template match="/">
<xsl:copy-of select=
"$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
"/>
==============
<xsl:copy-of select=
"/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (converted to a well-formed XML document):
<div class="post-body entry-content">
<a href="http://www.photourl">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"/>
</a>
<br />
<br />
<div style="text-align: justify;">
Some text</div>
<b>More text</b>
<br />
<b>More text</b>
<br />
<br />
<ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br />
<b>Content Title</b>
<br />
Some text
<br />
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
<br />
<br />
<a href="javascript:print()">
<img src="http://url.com/photo.jpg"/>
</a>
<div style="clear: both;"></div>
</div>
the two XPath expressions (with and without variable references) are evaluated and the nodes selected in each case, conveniently delimited, are copied to the output:
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
==============
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
To test your HTML, I added tags around your code and then pasted it into a file:
xmllint --html --xpath '/html/body/div/text()' /tmp/l.html
output :
Some text
Some text
Some text
Some text
Now you can use an XPath module in Ruby and reuse the XPath expression.
You will find many examples by searching Stack Overflow.
team = hxs.select ('//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]')
The structure of the web site whose table I want to select is as follows:
<html>
<body>
<table>
<tbody>
<tr>
<td>...</td>
<td>...</td>
...
</tr>
</tbody>
</table>
</body>
</html>
Since there are multiple tables in the web site, I only want to select the one whose class is defined as "tablehead". Also, for that table, I only want to select the tags whose class attributes contain the string "player". My attempt above looks a bit spotty to begin with. I tried running the crawler, and it says that the line I produced above is an invalid xpath line. Any advice would be nice.
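I've come across this problem before. Try omitting tbody from the XPath expression: browsers often insert tbody into the rendered DOM even when the raw HTML that the crawler downloads does not contain it.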
//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]
Correcting this results in:
//table[@class='tablehead']/tbody/tr[contains(@class, 'player')]
This selects every tr whose class attribute contains the string "player" and that is a child of a tbody that is a child of any table in the document whose class attribute has the string value "tablehead".
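Here is a small Python sketch of the corrected selection using only the standard library. It is an illustration under assumptions rather than Scrapy code: ElementTree understands the [@class='tablehead'] predicate but has no contains() function, so the substring test on the row's class is done in Python. The sample markup and the function name player_rows are made up for the example.

```python
import xml.etree.ElementTree as ET

HTML = """<html>
<body>
<table class="tablehead">
<tbody>
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>
<tr class="colhead">
<td>name</td>
</tr>
</tbody>
</table>
<table class="other">
<tbody>
<tr class="player"><td>ignored</td></tr>
</tbody>
</table>
</body>
</html>"""


def player_rows(root):
    """Rows of tables with class 'tablehead' whose own class attribute
    contains the substring 'player'."""
    rows = root.findall(".//table[@class='tablehead']/tbody/tr")
    return [tr for tr in rows if "player" in tr.get("class", "")]


rows = player_rows(ET.fromstring(HTML))
print([tr.get("class") for tr in rows])  # ['major-player']
print([td.text for td in rows[0]])       # ['player1', 'player2']
```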
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"//table[@class='tablehead']
/tbody/tr[contains(@class, 'player')]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (made just a little bit more realistic):
<html>
<body>
<table class="tablehead">
<tbody>
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>
</tbody>
</table>
</body>
</html>
the Xpath expression is evaluated and the selected nodes (just one in this case) are copied to the output:
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>