<div class="sResMain">
<b>
dogukan1905
</b>
<img src="http://eu.ipstatic.net/images/male.gif" width="11" height="11" class="sResSex">
20
<br>
<div class="sResMainTxt">
<div class="sResTxtField">I study at aircraft technology...</div></div></div>
I want to select the number (20) that sits between the img and br tags, but I couldn't manage it.
From what you posted, the text that you are trying to parse belongs to <div class="sResMain">. Moreover, it is the only text that <div class="sResMain"> owns directly. Jsoup has a method that returns only the text belonging directly to a node (its immediate text-node children): try ownText() of Element.
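import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// htmlStr holds the HTML shown above
Document doc = Jsoup.parse(htmlStr);
Elements elements = doc.select(".sResMain");
for (Element e : elements) {
    String text = e.ownText();
    System.out.println(text);
}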
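To illustrate what "own text" means without Jsoup on the classpath, here is a minimal sketch using the JDK's built-in DOM parser. It assumes a well-formed variant of the posted markup (img and br self-closed), and the class name OwnTextSketch and helper ownText are hypothetical; it is not how Jsoup implements the method, only the same idea of collecting direct text-node children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnTextSketch {
    // Well-formed variant of the posted markup (img and br self-closed: an assumption).
    static final String XML =
        "<div class='sResMain'>\n"
      + "<b>dogukan1905</b>\n"
      + "<img src='http://eu.ipstatic.net/images/male.gif' width='11' height='11' class='sResSex'/>\n"
      + "20\n"
      + "<br/>\n"
      + "<div class='sResMainTxt'><div class='sResTxtField'>I study at aircraft technology...</div></div>\n"
      + "</div>";

    // Concatenates only the text nodes that are direct children of the root element.
    static String ownText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringBuilder sb = new StringBuilder();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    sb.append(child.getNodeValue());
                }
            }
            return sb.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ownText(XML)); // prints 20
    }
}
```

The text inside <b> and the nested divs is skipped because those text nodes are children of other elements, not of the outer div.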
I'm trying to select a node whose children do not contain some specific text.
For example:
<div class="b-margin">
<div class="tag">Pt</div>
<div class="tag">En</div>
</div>
<div class="b-margin">
<div class="tag">Ru</div>
<div class="tag">En</div>
</div>
How would I go about selecting the <div class="b-margin"> nodes that do not have a child with the text "Pt"?
Here is a simple XPath:
//div[@class='b-margin' and not(div[.='Pt'])]
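A quick way to check an expression like this is the JDK's javax.xml.xpath API. The sketch below assumes the two blocks from the question are wrapped in a hypothetical <root> element so they parse as one XML document; the class name NoPtSelect and the helper select are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NoPtSelect {
    // The question's markup, wrapped in <root> (an assumption) to make it a single document.
    static final String XML =
        "<root>"
      + "<div class='b-margin'><div class='tag'>Pt</div><div class='tag'>En</div></div>"
      + "<div class='b-margin'><div class='tag'>Ru</div><div class='tag'>En</div></div>"
      + "</root>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the second block has no child div whose text is exactly "Pt".
        System.out.println(select(XML, "//div[@class='b-margin' and not(div[.='Pt'])]")); // prints [RuEn]
    }
}
```

Note that div[.='Pt'] compares the child's whole string value, so a child containing "Pt " with trailing whitespace would not count as a match.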
Suppose I have this XML:
<body>
<div id="1"></div>
<a id = "1"></a>
<a id = "2"></a>
<a id = "3"></a>
<div id="2"></div>
<a id = "4"></a>
<a id = "5"></a>
<a id = "6"></a>
</body>
Given the element //div[@id='1'], how do I select its <a> elements (ids 1 to 3) while excluding the <a> elements with id 4 or higher, since they appear after <div id='2'>?
This is one possible XPath:
//div[@id='1']/following-sibling::a[preceding-sibling::div[1][@id='1']]
This XPath selects each a after div[@id='1'] whose nearest preceding sibling div is div[@id='1']. The following simpler XPath may be enough:
//a[preceding-sibling::div[1][@id='1']]
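Both expressions can be tried against the sample with the JDK's javax.xml.xpath API. This is a sketch only: the class name SiblingGroupSketch and the helper ids are hypothetical, and the empty elements from the question are written in self-closed form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SiblingGroupSketch {
    // Same structure as the question's XML, with empty elements self-closed.
    static final String XML =
        "<body>"
      + "<div id='1'/><a id='1'/><a id='2'/><a id='3'/>"
      + "<div id='2'/><a id='4'/><a id='5'/><a id='6'/>"
      + "</body>";

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // preceding-sibling::div[1] is the nearest preceding div, because
        // positions on a reverse axis count backwards from the context node.
        System.out.println(ids("//a[preceding-sibling::div[1][@id='1']]")); // prints [1, 2, 3]
    }
}
```

The [1] matters: without it, the a elements after <div id='2'> would also match, since div[@id='1'] is still one of their preceding siblings.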
I am writing a functional test script that needs to find a parent element by one of its descendants: if the descendant can be found, return the parent. For example:
<div class="contentPane">
<h2>Heading 1</h2>
<p id="first">FIRST TEXT</p>
</div>
<div class="contentPane">
<h2>Heading 2</h2>
<p id="second">SECOND TEXT</p>
</div>
<div class="contentPane">
<h2>Heading 2</h2>
<p id="third"></p>
</div>
I want to find the contentPane that contains the paragraph with id="second". My test case to find the parent is similar to this:
...
findAllCssSelector(".contentPane")
    .then(function(array, setContext){
        //for every element i in array
        //I want to call its findByCssSelector("#second")
        //and check if it is found. If it is
        //I want to return the ith element in array
        // to the command.
    })
    .findByTagName("h2")
    .getVisibleText()
    .then(function(text){
        assert.strictEqual(text, "Heading 2");
    })
....
...
How do I iterate through each array element and return the array element to the context stack?
For complex queries, XPath is generally much more efficient than manually searching through elements. You could query with something like:
.findByXpath('//div[@class="contentPane" and p[@id="second"]]')
This will find the first DIV with class "contentPane" that contains a P with id "second".
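Outside the test framework, the same "parent that has a matching child" predicate can be checked with the JDK's javax.xml.xpath API. In this sketch the three panes are wrapped in a hypothetical <body> element, and the class PaneLookupSketch and its helper are invented names; it only demonstrates the XPath, not the test framework's command chain.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PaneLookupSketch {
    // The question's markup, wrapped in <body> (an assumption) to form one document.
    static final String XML =
        "<body>"
      + "<div class='contentPane'><h2>Heading 1</h2><p id='first'>FIRST TEXT</p></div>"
      + "<div class='contentPane'><h2>Heading 2</h2><p id='second'>SECOND TEXT</p></div>"
      + "<div class='contentPane'><h2>Heading 2</h2><p id='third'></p></div>"
      + "</body>";

    // Returns the h2 text of the pane that has a child p with the given id,
    // or an empty string when no pane matches.
    static String headingOfPaneContaining(String paragraphId) {
        String expr = "//div[@class='contentPane' and p[@id='" + paragraphId + "']]/h2";
        try {
            // The two-argument evaluate returns the string value of the first matched node.
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(headingOfPaneContaining("second")); // prints Heading 2
    }
}
```

The predicate sits on the div, so the div itself is what gets selected; the p only acts as a condition. Building the expression by string concatenation is fine for a fixed id like this, but it is not safe for untrusted input.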
I'm trying to create an xpath to find an element which doesn't have any 'p', 'li', or 'span' preceding elements under a common parent. For example I have this structure:
<a>
<div>
<div/>
<div>
<div>
<div>
<p/>
</div>
<img/>
</div>
<div>
<ci/>
</div>
</div>
</div>
</a>
The node I'm interested in is the <img> element. So far I have this xpath:
count(/a/div[1]/div[position() = last()]//img[(count(preceding::*[name() = 'p' or name() = 'li' or name() = 'span']) = 0)]) > 0
I don't care whether any of the unwanted elements are under /a/div[1]/div[1]; I only care about those under /a/div[1]/div[2]. That said, preceding won't work as written, because it also looks under /a/div[1]/div[1], which I don't care about. The 'p' element in the above example can be nested in any number of divs.
EDIT:
I added the div containing the element <ci/>.
I was able to get this to work using the following:
count(/a/div[1]/div[position() = last()]//img[(count(preceding::*[(name() = 'p' or name() = 'li' or name() = 'span') and ancestor::div[parent::div[parent::a] and descendant::ci]]) = 0)]) > 0
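To see what the extra ancestor condition changes, here is a sketch that evaluates both expressions with the JDK's javax.xml.xpath API. The expression is written with balanced parentheses; the class name ScopedPrecedingSketch, the helper test, and the second sample document (where the only p sits under the first inner div) are assumptions added for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ScopedPrecedingSketch {
    // Structure from the question: the p is under the last inner div, before the img.
    static final String POSTED =
        "<a><div><div/><div><div><div><p/></div><img/></div><div><ci/></div></div></div></a>";
    // Hypothetical variant: the only p is under the first inner div, which should be ignored.
    static final String P_OUTSIDE =
        "<a><div><div><p/></div><div><div><div/><img/></div><div><ci/></div></div></div></a>";

    static final String UNSCOPED =
        "count(/a/div[1]/div[position() = last()]//img[(count(preceding::*"
      + "[name() = 'p' or name() = 'li' or name() = 'span']) = 0)]) > 0";
    // Counts a preceding p/li/span only if it sits inside the div that also contains <ci/>.
    static final String SCOPED =
        "count(/a/div[1]/div[position() = last()]//img[(count(preceding::*"
      + "[(name() = 'p' or name() = 'li' or name() = 'span')"
      + " and ancestor::div[parent::div[parent::a] and descendant::ci]]) = 0)]) > 0";

    // Evaluates a boolean-valued XPath expression against the given XML string.
    static boolean test(String xml, String expr) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(test(POSTED, SCOPED));      // false: a p precedes the img inside the last div
        System.out.println(test(P_OUTSIDE, UNSCOPED)); // false: preceding:: also sees the first div
        System.out.println(test(P_OUTSIDE, SCOPED));   // true: the p under the first div is ignored
    }
}
```

The scoping relies on <ci/> always being present under the div of interest; if that marker element can be absent, the ancestor test would need a different anchor.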
I am new to Nokogiri and so far most familiar with CSS selectors. I am trying to parse information from a table; below are a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.
Table:
<div class="holder">
<div class ="row">
<div class="c1">
<!-- Content I Don't need -->
</div>
<div class="c2">
<span class="data">
<!-- Content I Don't Need -->
</span>
</div>
</div>
...
<div class="row">
<div class="c1">
SPECIFIC TEXT
</div>
<div class="c2">
<span class="data">
What I want
</span>
</div>
</div>
</div>
My Script: (if SPECIFIC TEXT is found in the table it returns every "div.c2 span.data" variable - so I've either screwed up my knowledge of do loops or if statements)
data = []
page.agent.get(url)
page.search('div.row').each do |row_data|
if (row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip
temp = row_data.search('div.c2 span.data').text.strip
data << temp
end
end
There's no need to stop and insert ruby logic when you can extract what you need in a single CSS selector.
data = page.search('div.row > div.c1:contains("SPECIFIC TEXT") + div.c2 span.data')
This will include only those elements that match the selector (i.e. those that follow the SPECIFIC TEXT).
Here's where your logic may have gone wrong:
This code
if (row_data.search('div.c1:contains("SPECIFIC TEXT")'...
temp = row_data.search('div.c2 span.data')...
first searches the row for the specific text, then runs the second query from the same starting point. Note that .text.strip always returns a string, and in Ruby even an empty string is truthy, so the condition passes for every row and every row's data is collected. The key is the + in the CSS selector above, which matches only the element immediately following (i.e. the next sibling element). I'm assuming, of course, that the next element is always what you want.
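For comparison, the adjacent-sibling idea can also be written as XPath, where following-sibling::*[1] plays the role of the CSS +. The sketch below swaps in the JDK's javax.xml.xpath API rather than Nokogiri; the class name AdjacentSiblingSketch, the helper dataAfter, and the first row's placeholder contents are assumptions, and @class='row' is an exact-match simplification of the CSS class selector.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AdjacentSiblingSketch {
    // Two rows shaped like the question's table; the first row's text is a placeholder.
    static final String XML =
        "<div class='holder'>"
      + "<div class='row'><div class='c1'> OTHER TEXT </div>"
      + "<div class='c2'><span class='data'> Not wanted </span></div></div>"
      + "<div class='row'><div class='c1'> SPECIFIC TEXT </div>"
      + "<div class='c2'><span class='data'> What I want </span></div></div>"
      + "</div>";

    // Returns the trimmed span.data text from the div.c2 that immediately
    // follows a div.c1 containing the needle, or "" when nothing matches.
    static String dataAfter(String needle) {
        String expr = "normalize-space(//div[@class='row']/div[@class='c1']"
            + "[contains(., '" + needle + "')]"
            + "/following-sibling::*[1][self::div][@class='c2']//span[@class='data'])";
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataAfter("SPECIFIC TEXT")); // prints What I want
    }
}
```

Because the condition and the extraction live in one path expression, there is no separate truthiness check to get wrong: a row either matches the whole path or contributes nothing.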
I'd do
require 'nokogiri'
html = <<_
<div class="holder">
<div class ="row">
<div class="c1">
<!-- Content I Don't need -->
</div>
<div class="c2">
<span class="data">
<!-- Content I Don't Need -->
</span>
</div>
</div>
<div class="row">
<div class="c1">
SPECIFIC TEXT
</div>
<div class="c2">
<span class="data">
What I want
</span>
</div>
</div>
</div>
_
doc = Nokogiri::HTML(html)
css_string = 'div.row > div.c1[text()*="SPECIFIC TEXT"] + div.c2 span.data'
doc.at(css_string).text.strip
# => "What I want"
How those selectors work here:
[name*="value"] - Selects elements that have the specified attribute with a value containing a given substring.
Child Selector (“parent > child”) - Selects all direct child elements specified by "child" of elements specified by "parent".
Next Adjacent Selector (“prev + next”) - Selects all next elements matching "next" that are immediately preceded by a sibling "prev".
Class Selector (“.class”) - Selects all elements with the given class.
Descendant Selector (“ancestor descendant”) - Selects all elements that are descendants of a given ancestor.