Selecting a specific div element with Xpath and Nokogiri? - xpath

I am relatively new to parsing and would like to get more practice. I want to parse the following URL: http://www.goodreads.com/quotes/tag/hard-work.
I want to grab all quotes tagged "hard-work". This is what the site code breaks down to:
<div class="content">
<div id="siteheader" class="uitext">
<div class="mainContentContainer ">
<div class="mainContent">
<div id="premiumAdTop">
<div class="mainContentFloat">
<div id="flashContainer"> </div>
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
Babe Ruth
</div>
Right now my code is:
require "rubygems"
require "open-uri"
require "nokogiri"
@page = Nokogiri::HTML(open("http://goodreads.com/quotes"))
@div = @page.xpath("html/body/div[1]")
But the results aren't giving me the output that I want.
I think I ought to call the methods each and collect but I just don't know how to get to the node that I want, which I believe is contained somewhere in here:
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
Babe Ruth
</div>
Can anyone point me in the right direction please? How far in do I have to go into the div class to get what I want?

You can use the XPath:
//div[@class = 'quoteText' and following-sibling::div[1][@class = 'quoteFooter' and .//a[@href and normalize-space() = 'hard-work']]]
to select all the div elements whose class is quoteText and which are followed by a div with class quoteFooter containing a link with hard-work.

Related

Error trying to get data using XPath on Google IMPORTXML function

I am trying to find the XPath to get 5 values of the following website: https://plataforma.penserico.com/dashboard/cp.pr?e=TRPL4
I want the values 7,59 2,04 1,81 7,60 7,59
For the first value I tried this command but I get #N/A:
=IMPORTXML("https://plataforma.penserico.com/dashboard/cp.pr?e=TRPL4";"//*[@id='j_idt104:0:j_idt109:1:chartPanel0']/div/span[1]")
The piece of HTML is like below:
<span id="j_idt104:0:j_idt109:1:chartPanel0">
<div class="c--anim-btn" style="color: #5DADE2;">
<span class="c-anim-btn">
7,59
</span>
<span>
<div style="font-size: 12px !important;">
<div style="width: 90%; left: 5%; position:relative;line-height:2em;white-space: nowrap;">
<div style="width:50%;float:left"><label class="idtri">1T:</label>2,04</div>
<div style="width:50%;float:right"><label class="idtri">2T:</label>1,81</div>
</div>
<div style="width: 90%; left: 5%; position:relative;line-height:2em;white-space: nowrap;">
<div style="width:50%;float:left"><label class="idtri">3T:</label>7,60</div>
<div style="width:50%;float:right"><label class="idtri">4T:</label>7,59</div>
</div>
</div>
</span>
</div></span>
What should the second parameter be to get the values I want?
Thank you
Replace your XPath with the following one to get all five values:
//tr[.//span[.='P/L']]/td[2]//text()[parent::span[@class='c-anim-btn'] or parent::div][normalize-space()]
Output (formula in C4):
EDIT: Individual XPaths, one per value:
//tr[.//span[.='P/L']]/td[2]//text()[parent::span[@class='c-anim-btn']]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[1]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[2]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[3]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[4]

get width of a div which was generated by javascript on runtime

I have a few columns that were generated by dhtmlx's JavaScript. The columns are generated at run time, which means that if I view the source of the page using Chrome's View Page Source, I won't see the generated code. But I can see it by right-clicking the element and selecting 'Inspect Element'. Here is part of the generated code that I copied from 'Inspect Element':
<div id="scheduler_here" class="dhx_cal_container dhx_scheduler_grid" style="width:100%;height:100%;">
<div class="dhx_cal_header" style="width: 1148px; height: 20px; left: -1px; top: 60px;">
<div class="dhx_grid_line">
<div style="width:169px;">Start Date</div>
<div style="width:169px;">Time</div>
<div style="width:169px;">Event</div>
<div style="width:169px;">Location</div>
<div style="width:169px;">Stakeholders</div>
<div style="width:169px;">Type</div>
</div>
</div>
<div class="dhx_cal_data" style="width: 1148px; height: 506px; left: 0px; top: 81px; overflow-y: auto;">
<div>
<div class="dhx_grid_v_border" style="left:184px" id="imincol0"></div>
<div class="dhx_grid_v_border" style="left:370px" id="imincol1"></div>
<div class="dhx_grid_v_border" style="left:556px" id="imincol2"></div>
<div class="dhx_grid_v_border" style="left:742px" id="imincol3"></div>
<div class="dhx_grid_v_border" style="left:928px" id="imincol4"></div>
</div>
<div class="dhx_grid_area"><table></table></div>
</div>
</div>
I'm trying to get the column width of imincol0, imincol1, imincol2 and so on, which you can see in the last part of the code. I have tried a few methods to get the width of the columns with these ids, but to no avail: I always get null.
If you use jQuery, you could do this:
var x = $('#imincol0').width();
If you're using pure js you could try this:
var x = document.getElementById('imincol0').offsetWidth;

Looking for same xpath for grid's column text from two different pages

In our application, the same grid appears on two pages, and I want to get the column text from both grids. However, the column text is wrapped in slightly different HTML on each page.
Page 1 grid HTML:
<div class="ngHeaderContainer" ng-style="headerStyle()" style="width: 598px; height: 30px;">
<div class="ngHeaderScroller" ng-style="headerScrollerStyle()" ng-header-row="" style="height: 30px;">
<div class="ngHeaderCell ng-scope col0 colt0" ng-class="col.colIndex()" ng-repeat="col in renderedColumns" ng-style="{ height: col.headerRowHeight }" style="height: 30px;">
<div class="ngVerticalBar ngVerticalBarVisible" ng-class="{ ngVerticalBarVisible: !$last }" ng-style="{height: col.headerRowHeight}" style="height: 30px;"> </div>
<div ng-header-cell="">
<div class="ngHeaderSortColumn " ng-class="{ 'ngSorted': !col.noSortVisible() }" ng-style="{'cursor': col.cursor}" style="cursor: pointer;" draggable="true">
<div class="ngHeaderText ng-binding colt0" ng-class="'colt' + col.index" ng-click="col.sort($event)">Request ID</div>
For this, I've written the XPath //div[@class='ngHeaderContainer']//div[@ng-header-cell='']//div[contains(@class,'ngHeaderText')]
Page 2 grid HTML
<div class="ngHeaderContainer" ng-style="headerStyle()" style="width: 598px; height: 30px;">
<div class="ngHeaderScroller" ng-style="headerScrollerStyle()" ng-header-row="" style="height: 30px;">
<div class="ngHeaderCell ng-scope col0 colt0" ng-class="col.colIndex()" ng-repeat="col in renderedColumns" ng-style="{ height: col.headerRowHeight }" style="height: 30px;">
<div class="ngVerticalBar ngVerticalBarVisible" ng-class="{ ngVerticalBarVisible: !$last }" ng-style="{height: col.headerRowHeight}" style="height: 30px;"> </div>
<div ng-header-cell="">
<div class="ng-scope ng-binding" ng-click="onColumnClick( 3, 'select', $event)">
Request ID
<img class="" ng-click="onColumnClick( 3, 'delete', $event)" src="styles/images/common/delete.png" ng-show="true">
<img>
</div>
For this, I've written the XPath //div[@class='ngHeaderContainer']//div[@ng-header-cell='']/div
I've written a class for the grid, and it has a method that returns the column names. Since the XPath that reaches the column name differs between the grids on the two pages, I can't use the same method for both.
Can someone please help me find an XPath that returns the column names of the grid on both pages?
Hopefully this XPath will do it. I ran into a similar issue. It should match the column-name element on both pages:
//*[contains(@class, 'ng-binding')]

simple css text image layout

I am having trouble getting some text to be next to an image. I have it working on one site: http://puckpros.edkatzman.com/
but not on another: http://petra.edkatzman.com/
and I can't see the difference. Can another pair of eyes help?
Here is the jsfiddle: http://jsfiddle.net/tangobango/rK2mG/
HTML:
<div id="primary" class="content-area">
<div id="content" class="site-content" role="main">
<div id="front-page">
<div id="owner-photo ">
<img src="http://petra.edkatzman.com/wp-content/uploads/2013/01/Ed-headshot-small.jpg" alt="Ed Katzman" >
</div>
<div id="owner-description ">
<h1><span class="drop-cap">Hi! </span>My name is Michael Jennings,
the owner and master craftsman of Petra Stoneworks. I have over 25
years experience working with a wide range of both valuable and everyday stone pieces.</h1>
<h3>We specialize in the expert repair of stone objects and the creation
of original pieces. Have a look at the portfolio of our work and contact
us with any questions or to start a discussion of how we might help you.</h3>
</div>
</div>
</div><!-- #content .site-content -->
</div><!-- #primary .content-area -->
CSS:
#front-page {
background-color:#ffffff;
padding-left: 10px;
padding-right: 10px;
padding-bottom: 10px;
padding-top:10px;
overflow: hidden;
}
#owner-photo {
width:246px;
height:246px;
float:left;
}
Thanks for including the jsFiddle. That was very helpful. Your problem is a simple typo.
<div id="owner-photo ">
There is a space in the id attribute. Delete that space and the div should float.
I am not really sure what the outlook is for your page but you might want to move the owner-photo ID onto the image itself and remove the potentially unnecessary div from your code.
The differences are:
First, put float:left in id="owner-photo"
<div id="owner-photo " style="float: left;">
<img src="http://petra.edkatzman.com/wp-content/uploads/2013/01/Ed-headshot-small.jpg" alt="Ed Katzman">
</div>
Second:
I don't know why, but don't use <h1>; use <p>.
In the example of http://puckpros.edkatzman.com/
first there is:
<img class="left_image" src="http://puckpros.edkatzman.com/wp-content/themes/PuckPros/images/services.png">
and then:
<p>something</p>
and this is the result:
Hope it helps :)

Element selection using xPath

I am trying to write test automation code and am having a hard time finding an element using XPath in the structure below.
<div id="270590-bar" class="chart-row clearfix" style="display: block;">
<div class="bar-col float">
<div class="bar-wrapper">
<div class="topic-name-wrapper" style="background-color: transparent;">Business</div>
<div class="bar" style="width:170px"></div>
</div>
<div style="float:left;position:relative; ">
<div class="level-dd-fake">Intermediate</div>
<select id="270590-level" class="level-dropdown level-select">
</div>
<div id="270590-un" class="topic unsubscribe" style="float:left; margin: 0px 0px 0px 1px !important;"></div>
</div>
There are several data rows, and each row uses the same markup as above.
Given the text inside class="topic-name-wrapper" (e.g. Business),
I want to select the dropdown element with class="level-dropdown level-select".
Hope the question is clear and any help on this is really appreciated.
As far as I understand, you need something like this:
//div[*[.='Business']]/following-sibling::div/select

Resources