<div class="a-row a-spacing-micro" style="">
<i class="a-icon a-icon-star-medium a-star-medium-4"></i>
<a data-analytics="{"name":"Review.FullReview"}" class="a-size-base a-link-normal a-color-base review-title a-text-bold" href="/gp/cdp/member-reviews/A19123D9G66E0O/ref=pdp_new_read_full_review_link?ie=UTF8&page=1&sort_by=MostRecentReview#R1Z0A6K9CROFFV"> <span>Good Cheap Knee Pads</span>
</a>
</div>
I have this HTML that I am scraping with XPath. What XPath would I use to just return the class "a-star-medium-4"?
Thanks!
Jeff
If it's only for this specific HTML, you can extract the class name starting with a-star with this XPath:
substring(string(//i/@class),string-length(substring-before(string(//i/@class),'a-star')) +1)
When applied to your example HTML this returns a-star-medium-4.
As explanation: string(//i/@class) returns the class attribute value a-icon a-icon-star-medium a-star-medium-4. To keep only the class name starting with a-star, substring-before() first yields everything that comes before a-star. The string-length() of that prefix, plus 1, is the position where a-star begins, and substring() returns the string from that position onward. Note that this also returns any class names that follow the a-star one, so it only works when that class comes last, as it does here.
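To check the expression outside a scraper, here is a minimal sketch using the XPath 1.0 engine that ships with the JDK (javax.xml.xpath). The markup is a simplified, well-formed stand-in for the HTML in the question (an assumption, since the JDK parser needs valid XML), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StarClassDemo {
    // Simplified, well-formed stand-in for the question's markup (assumption).
    static final String MARKUP =
        "<div class=\"a-row a-spacing-micro\">"
        + "<i class=\"a-icon a-icon-star-medium a-star-medium-4\"></i>"
        + "</div>";

    // The expression from the answer: cut off everything before 'a-star'.
    static final String EXPR =
        "substring(string(//i/@class),"
        + "string-length(substring-before(string(//i/@class),'a-star')) + 1)";

    // Evaluates EXPR against the given XML and returns the result as a string.
    static String starClass(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(EXPR, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(starClass(MARKUP)); // prints a-star-medium-4
    }
}
```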
I'm working on a project where I have to scrape a "website," which is just an HTML file in one of the local folders. I've been trying to scrape down to the href value (a URL) of the anchor tag for each student object. I am also scraping for other things, so ignore the rest. Here is what I have so far:
def self.scrape_index_page(index_url) #responsible for scraping the index page that lists all of the students
#return an array of hashes in which each hash represents one student.
html = index_url
doc = Nokogiri::HTML(open(html))
# doc.css(".student-name").first.text
# doc.css(".student-location").first.text
#student_card = doc.css(".student-card").first
#student_card.css("a").text
end
Here is one of the student profiles. They are all the same, so I'm just interested in scraping the href url value.
<div class="student-card" id="eric-chu-card">
<a href="students/eric-chu.html">
<div class="view-profile-div">
<h3 class="view-profile-text">View Profile</h3>
</div>
<div class="card-text-container">
<h4 class="student-name">Eric Chu</h4>
<p class="student-location">Glenelg, MD</p>
</div>
</a>
</div>
thanks for your help!
Once you get an anchor tag in Nokogiri, you can get the href like this:
anchor["href"]
So in your example, you could get the href by doing the following:
student_card = doc.css(".student-card").first
href = student_card.css("a").first["href"]
If you wanted to collect all of the href values at once, you could do something like this:
hrefs = doc.css(".student-card a").map { |anchor| anchor["href"] }
Let's say I have a DOM like this:
<div id="tabsmenu">
<ul>
<li class="one"><a href="#foo" class="ui-tabs-anchor">foo</a></li>
<li class="two"><a href="#baz" class="ui-tabs-anchor">baz</a></li>
</ul>
</div>
and I would like to get the href values from the <a> elements:
# desired output: ['#foo', '#baz']
How can I do this with XPath, combining the id of the container with the class of an element inside it?
Already tried:
some_doc.xpath('//a[@id="tabsmenu"]/[@class="ui-tabs-anchor"]/@href')
# select all href attributes of any a element that is inside the id tabsmenu and has the class attribute ui-tabs-anchor
EDIT - corrected tabmenu into tabsmenu
You're most likely looking for something like this:
//div[@id='tabsmenu']//a[@class='ui-tabs-anchor']/@href
That will get all href attributes that are part of an a tag with the class ui-tabs-anchor and inside a div element with the id tabsmenu.
Also you might want to take a look at this question:
Find out if class name contains certain text
The reason is that the class test matches only when the attribute value is exactly ui-tabs-anchor. If an additional class is ever added, such as class="ui-tabs-anchor disabled", the expression will no longer match.
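To make the difference concrete, here is a small sketch that runs an exact class match and a token-based match against the same markup, using the JDK's built-in XPath 1.0 engine. The markup is hypothetical: it assumes each li wraps an anchor with the class ui-tabs-anchor, and the second anchor carries an extra disabled class. The concat/normalize-space idiom is one common way to test for a single class token, not the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TabHrefDemo {
    // Hypothetical markup: the second anchor has an additional class.
    static final String MARKUP =
        "<div id=\"tabsmenu\"><ul>"
        + "<li class=\"one\"><a href=\"#foo\" class=\"ui-tabs-anchor\">foo</a></li>"
        + "<li class=\"two\"><a href=\"#baz\" class=\"ui-tabs-anchor disabled\">baz</a></li>"
        + "</ul></div>";

    // Matches only when the class attribute is exactly "ui-tabs-anchor".
    static final String EXACT =
        "//div[@id='tabsmenu']//a[@class='ui-tabs-anchor']/@href";

    // Matches when "ui-tabs-anchor" is one of the space-separated class tokens.
    static final String TOKEN =
        "//div[@id='tabsmenu']//a[contains(concat(' ', normalize-space(@class), ' '),"
        + " ' ui-tabs-anchor ')]/@href";

    // Returns the values of all attribute nodes selected by expr, in document order.
    static List<String> hrefs(String expr, String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(hrefs(EXACT, MARKUP)); // prints [#foo]
        System.out.println(hrefs(TOKEN, MARKUP)); // prints [#foo, #baz]
    }
}
```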
I'm using selenium webdriver to get some text on my webpage using xpath.
This is the code
<a class="ng-binding" data-toggle="tab" href="#tabCreatedByMe">
Created By Me
<span class="badge ng-binding">3</span>
</a>
I need to get the number '3'. This number changes every time.
I wrote this code, but it does not return anything:
public String getAmountSubtab1() throws InterruptedException {
    WebElement s = driver.findElement(By.xpath("//*[@class='badge ng-binding']"));
    return s.getText();
}
Suggestions?
Are you sure there is only one span with the class badge ng-binding? There might be another span earlier in the page with the same class name. I'd advise against relying on the class name alone to identify an element. Try this XPath instead; it should work:
//a[contains(text(), 'Created By Me')]/span
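As a quick sanity check that does not need a browser, the sketch below evaluates the expression against the fragment from the question with the JDK's own XPath 1.0 engine. This is not Selenium, so it says nothing about timing or visibility on the real page; it only shows what the expression selects in this isolated fragment. The class name BadgeDemo and its helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class BadgeDemo {
    // The fragment from the question, as a well-formed XML string.
    static final String MARKUP =
        "<a class=\"ng-binding\" data-toggle=\"tab\" href=\"#tabCreatedByMe\">\n"
        + "Created By Me\n"
        + "<span class=\"badge ng-binding\">3</span>\n"
        + "</a>";

    // Evaluates expr against MARKUP and returns the string value of the first match.
    static String eval(String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(MARKUP)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Locate the span through the link text, as suggested in the answer.
        System.out.println(eval("//a[contains(text(), 'Created By Me')]/span")); // prints 3
        // The class-based locator selects the same span in this fragment.
        System.out.println(eval("//*[@class='badge ng-binding']")); // prints 3
    }
}
```

Both expressions select the same span here, so if the class-based one fails on the real page, the cause is likely elsewhere on that page, for example another element with the same class appearing earlier.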
In this div:
<div class="black">
Vitamin
Watergate
</div>
I need an XPath expression that returns just the description text of each a href tag, in this example "Vitamin" and "Watergate".
//div[@class='black']/a[1]/text()
will return Vitamin and
//div[@class='black']/a[2]/text()
will return Watergate. Note that XPath positions start at 1, not 0.
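XPath positions are 1-based, which is easy to confirm with the JDK's built-in XPath 1.0 engine. In this sketch the markup is an assumption: the anchors are reconstructed with placeholder href values, since the original links are not shown. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class PositionDemo {
    // Hypothetical markup: the href values are placeholders, not from the question.
    static final String MARKUP =
        "<div class=\"black\">"
        + "<a href=\"#\">Vitamin</a>"
        + "<a href=\"#\">Watergate</a>"
        + "</div>";

    // Returns the text of the a element at the given position inside the div.
    static String textAt(int position) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//div[@class='black']/a[" + position + "]/text()";
        return xpath.evaluate(expr, new InputSource(new StringReader(MARKUP)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(textAt(1)); // prints Vitamin
        System.out.println(textAt(2)); // prints Watergate
        // Position 0 never matches, so the result is an empty string.
        System.out.println(textAt(0).isEmpty()); // prints true
    }
}
```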
Regards
You better google that first.
I'm trying to scrape only article text from web pages. I have discovered that the article is always surrounded with div tags. Unfortunately the class of these div tags is slightly different for each web page. I looked into using XPath but I don't think it will work due to the different class names. Is there a way I can get all the div tags and then get the class?
Examples
<div class="entry_single">
<p>I recently traveled without my notebook for the first time in ages.</p>
</div>
<div class="entry-content-pagination">
<p>Ward 9 Ald. Steven Dove</p>
</div>
That would be easier using LINQ, for example with Html Agility Pack:
foreach(HtmlNode div in doc.DocumentNode.Descendants("div"))
{
string className = div.GetAttributeValue("class", string.Empty);
// do something with class name
}
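XPath can also cope with class names that differ slightly, as long as they share something to match on. The sketch below is an alternative to the loop above, not a translation of it: it uses the JDK's built-in XPath 1.0 engine and assumes the article containers all have a class beginning with entry, as the two examples do. The body wrapper and the sidebar div are hypothetical additions, there to make the fragment well-formed and to show a non-matching div.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ArticleTextDemo {
    // The two example divs plus a hypothetical non-article div, wrapped in one root.
    static final String MARKUP =
        "<body>"
        + "<div class=\"entry_single\">"
        + "<p>I recently traveled without my notebook for the first time in ages.</p>"
        + "</div>"
        + "<div class=\"entry-content-pagination\">"
        + "<p>Ward 9 Ald. Steven Dove</p>"
        + "</div>"
        + "<div class=\"sidebar\"><p>Not article text</p></div>"
        + "</body>";

    // Assumption: every article container has a class that starts with "entry".
    static final String EXPR = "//div[starts-with(@class, 'entry')]/p";

    // Returns the text of every paragraph inside a matching div.
    static List<String> paragraphs(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Prints the two article paragraphs and skips the sidebar one.
        paragraphs(MARKUP).forEach(System.out::println);
    }
}
```

If the class names share no common prefix, contains(@class, '...') or a loop over all div elements like the one above may be a better fit.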