Nokogiri: irregular divs - ruby

I'm trying to deal with irregular content in div elements, namely whatever comes after the h3 titles. There is no fixed content under the h3 headings, but I need to associate whatever text is there with its heading. There could be a ul, just a span, or both. The main thing is to avoid combining all the text under the different h3 headings.
I have been able to navigate to my div using the .css method. Each div contains one or more of four possible h3 headings, each followed by a comment, or by a list if there is more than one comment.
How can I separate whatever follows an h3 tag, stopping before the next h3 tag (if there is one)?
You can see a sample of the divs I'm working with here (I can already grab whatever is inside the h2 because it's the same for every div):
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Mar</span><strong>4</strong>
</div>Routine Inspection<small>Inspected Mar. 4, 2014</small>
</h2>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Sep</span><strong>4</strong>
</div>Re-inspection<small>Inspected Sep. 4, 2013</small>
</h2>
<h3>Not in compliance</h3>
<ul>
<li class="X">
<strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p>
</li>
</ul>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Aug</span><strong>30</strong>
</div>Routine Inspection<small>Inspected Aug. 30, 2013</small>
</h2>
<h3>Not in compliance</h3>
<ul>
<li class="X">
<strong>Washrooms are cleaned regularly</strong><p>Washrooms are to be kept clean, sanitary, in good repair and must be supplied with liquid soap in a dispenser, single service/paper towels, cloth roller towel or hot air dryer and hot and cold running water.</p>
</li>
<li class="X">
<strong>Building interior is well-maintained</strong><p>Walls, floors and ceilings are to be maintained and in good repair.</p>
</li>
<li class="X">
<strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p>
</li>
</ul>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>

Provided that:
you only have alternating h3 and ul elements until the end of the wrapping div,
no other element can appear in this structure in place of a ul,
no other element can appear in this structure in place of an h3,
and your example is representative, this should do the trick:
//ul[count(following-sibling::h3) = count(following-sibling::ul)]
If other elements can appear in place of the ul, but there is always exactly one of them between the h3s, you can use this expression:
//ul[count(following-sibling::h3) = count(following-sibling::*[not(local-name() = 'h3')])]
As for grouping each h3 element with the ul element immediately following it, I don't think that's feasible in XPath alone; you'll need to do it in Ruby. I suggest selecting the div elements and walking their children imperatively, counting the nodes and grouping each h3 with the ul that follows it.
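A minimal sketch of that imperative approach, written against Ruby's REXML library (which ships with Ruby) instead of Nokogiri so that it runs without extra gems; with Nokogiri the same loop would use `div.element_children` and `node.text`. Rather than counting odd and even positions, it walks each container's child elements in document order and starts a new group at every h3, so it also copes with a span, a ul, or both under one heading. The markup is trimmed from the question; the `<span>` under the second heading is hypothetical, and `group_by_heading` and `text_of` are illustrative helper names.

```ruby
require 'rexml/document'

# Trimmed from the question's markup; the <span> under the second
# heading is hypothetical, added to show a non-<ul> sibling.
SAMPLE = <<~HTML
  <div class="inspection_container">
    <h2 class="inspection_date_title">
      <div class="calendar_list"><span>Sep</span><strong>4</strong></div>Re-inspection<small>Inspected Sep. 4, 2013</small>
    </h2>
    <h3>Not in compliance</h3>
    <ul>
      <li class="X"><strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p></li>
    </ul>
    <h3>Actions taken by inspector</h3>
    <span>Hypothetical single comment</span>
    <ul>
      <li class="Comment"><strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator.</p></li>
    </ul>
  </div>
HTML

# Collects all descendant text of a node, collapsing runs of whitespace.
def text_of(node)
  node.children.map do |child|
    case child
    when REXML::Element then text_of(child)
    when REXML::Text then child.value
    else ''
    end
  end.join(' ').split.join(' ')
end

# Walks one container's child elements in document order and files every
# element after an <h3> under that heading, until the next <h3> starts a new group.
def group_by_heading(container)
  groups = {}
  current = nil
  container.each_element do |child|
    if child.name == 'h3'
      current = text_of(child)
      groups[current] = []
    elsif current # skips the <h2> and anything else before the first <h3>
      groups[current] << text_of(child)
    end
  end
  groups
end

doc = REXML::Document.new(SAMPLE)
REXML::XPath.each(doc, "//div[@class='inspection_container']") do |div|
  group_by_heading(div).each do |heading, texts|
    puts "#{heading}: #{texts.inspect}"
  end
end
```

Because each heading gets its own array, a ul with several li comments and a lone span both end up under the right h3 without their text being merged across headings.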

Related

How to keep whitespace using Thymeleaf's th:each for inline dom?

Using Thymeleaf th:each loop, whitespace is removed (or can't be added).
Thymeleaf code:
<div>
</div>
I expected:
<div>
Link 1
Link 2
Link 3
Link 4
Link 5
</div>
but the HTML was rendered as:
<div>Link 1Link 2Link 3Link 4Link 5</div>
How can I add whitespace (a new line in the HTML file) between the links using Thymeleaf th:each?
My Thymeleaf version is 3.0.12.RELEASE.
If you want the links to be arranged horizontally with a single white space between them (as opposed to arranging them vertically using display:block) then you can use the Thymeleaf synthetic <th:block> element (documented here):
<div>
<th:block th:each="item : ${items}">
</th:block>
</div>
This will give you the same layout as the one you show in your question when you run the first code snippet.
Update:
You can also use <span> instead of <th:block>, if you prefer:
<div>
<span th:each="item : ${items}">
</span>
</div>
This will give you the same end result (links arranged horizontally with a space between them), but the HTML generated to produce this layout will, of course, be slightly different.

How to get last locator of div using Robot Framework

I have no idea how to get the last locator inside a div.
I tried counting the elements in the div with Get Element Count, but it returned just 1.
Example HTML:
<div class="add-product">
<p data-aura-rendered-by="188:14729;a">
<span data-aura-rendered-by="191:14729;a">01-January</span>
<p data-aura-rendered-by="195:14729;a">
<span data-aura-rendered-by="198:14729;a">02-February</span>
<p data-aura-rendered-by="230:14729;a">
<span data-aura-rendered-by="233:14729;a">07-July</span>
</p>
</div>
I need to count all the elements in the div, or get the last one (07-July), but the div contains a different number of elements each time (it depends on the test data).
Use the following XPath; it will identify the last element, 07-July:
(//div[@class='add-product']//span)[last()]
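The parentheses matter: `//div[@class='add-product']//span[last()]` would apply `[last()]` separately within each parent `<p>`, and since each `<p>` holds one span it would match all three. The sketch below is a hedged illustration using Ruby's REXML rather than Robot Framework: it runs the inner span query and takes the count and the last match in Ruby. The markup is a well-formed trim of the question's example (the `<p>` elements are closed and the `data-aura-rendered-by` attributes dropped). In SeleniumLibrary, passing the same span XPath to Get Element Count should likewise return the number of entries.

```ruby
require 'rexml/document'

# Well-formed trim of the question's markup: each <p> is closed and the
# data-aura-rendered-by attributes are dropped.
MARKUP = <<~XML
  <div class="add-product">
    <p><span>01-January</span></p>
    <p><span>02-February</span></p>
    <p><span>07-July</span></p>
  </div>
XML

doc = REXML::Document.new(MARKUP)
# Every span inside the div, in document order, however many there are.
spans = REXML::XPath.match(doc, "//div[@class='add-product']//span")

puts spans.size      # how many entries the test data produced
puts spans.last.text # the final entry, the same node as (...)[last()]
```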

Removing empty nodes but keep nodes with image tags

I am trying to remove all the empty nodes, but my code also treats nodes that contain an img tag as empty. I need the nodes with img tags to remain. I also want to remove nodes that contain only whitespace or other non-printable characters. This is my current code:
$empties= $xpath->query('//*[not((*))]');
foreach($empties as $empty){
$empty->parentNode->removeChild($empty);
}
I need this to be removed:
<div class='blah'> </div>
and these to stay:
<div class='blah'><img src='bla'/></div>
<div class='blah'>some text</div>
I'm not sure you've fully specified which nodes you want to stay, but the following XPath is consistent with your stated needs:
//*[not(self::img) and not(*) and not(text()[normalize-space()])]
(Building on Martin's comment.)
This will select for removal all elements that are not <img>, and have no element children, and have no direct text node children that contain more than just whitespace.
First, let's clear up the ambiguity by using a more comprehensive example:
<div id="d1">
<div id="d2"/>
<div id="d3" class='blah'><img src='bla'/></div>
<div id="d4" class='blah'>some text</div>
<div id="d5" class='blah'> </div>
<div id="d6" class='blah'>
</div>
</div>
Then
//*[not(*) and text()[not(normalize-space())]]
says: select elements without child elements but with a child text node consisting only of whitespace.
For the above XML, it selects the d5 and d6 divs, not the img, and not the d1 through d4 divs.
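To compare the two answers side by side, here is a sketch that re-implements both predicates in plain Ruby over an REXML tree of the d1 to d6 example; Ruby's `strip` stands in for XPath's `normalize-space()`, and the helper names are hypothetical. The first answer's expression also selects the self-closing d2, because it has neither children nor text, while the second one only picks up elements whose text is all whitespace.

```ruby
require 'rexml/document'

# The d1-d6 example from the answer above.
SAMPLE = <<~XML
  <div id="d1">
    <div id="d2"/>
    <div id="d3" class='blah'><img src='bla'/></div>
    <div id="d4" class='blah'>some text</div>
    <div id="d5" class='blah'> </div>
    <div id="d6" class='blah'>
    </div>
  </div>
XML

# Values of the element's direct text children (whitespace included).
def direct_texts(el)
  el.texts.map(&:value)
end

# Mirrors //*[not(self::img) and not(*) and not(text()[normalize-space()])]
def first_answer?(el)
  el.name != 'img' && !el.has_elements? &&
    direct_texts(el).all? { |t| t.strip.empty? }
end

# Mirrors //*[not(*) and text()[not(normalize-space())]]
def second_answer?(el)
  !el.has_elements? && direct_texts(el).any? { |t| t.strip.empty? }
end

doc = REXML::Document.new(SAMPLE)
elements = REXML::XPath.match(doc, '//*') # every element in the document

p elements.select { |el| first_answer?(el) }.map { |el| el.attributes['id'] }.sort  # => ["d2", "d5", "d6"]
p elements.select { |el| second_answer?(el) }.map { |el| el.attributes['id'] }.sort # => ["d5", "d6"]
```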

SCSS List Item Color Iteration

I'm completely new to SCSS and I'm trying to set a background color to all items of a selector.
My CSS selector is the following, and it returns all items (of two separate UL lists):
#g-showcase .g-menu-item
I set a color array as:
$colors: #fad941, #ffffff, #e02520, #a6a6a6, #c6c6c6, #e02520;
I would like to iterate over my selector results and set a unique color from my color array (which could be larger than the above).
I started playing with some code, but I tackled it incorrectly: I'm iterating over the colors and not over the selector items (I don't know how to do that :( ).
@for $i from 1 through length($colors) {
#g-showcase li:nth-child(#{length($colors)}n+#{$i}) {
background: nth($colors, $i);
}
}
How could I achieve the desired result?
Thank you!
S.
The problem is that, as far as Sass is concerned, it has no idea how many li items your HTML contains. It is a pre-processor that never sees the DOM, so it wouldn't know when to stop generating CSS.
I assume what you're looking for is the ability to choose which color each li item gets as its background, rather than what you currently have, which applies the colors in the order they appear in the color array.
To do this, you could add some extra markup to your HTML for the generated CSS to target, and slightly tweak how you're creating the array by using a map instead. You might want to avoid polluting your HTML with extra markup, but the following would work.
$colorz: (
foo: #f24162,
bar: #591240,
fee: #4c5573,
fum: #6fa0a6,
eye: #71d9d9
);
@each $pointer, $bgcolor in $colorz
{
#g-showcase li[pointer="#{$pointer}"] {
background: $bgcolor;
}
}
<ul id="g-showcase">
<li class="g-menu-item" pointer='bar'>The quick</li>
<li class="g-menu-item" pointer='foo'>Brown Fox</li>
<li class="g-menu-item" pointer='fee'>Jumped over</li>
<li class="g-menu-item" pointer="bar">the lazy</li>
<li class='g-menu-item' pointer="eye">dog</li>
</ul>
<ul id="g-showcase">
<li class="g-menu-item" pointer="fum">...and other exciting stories</li>
<li class="g-menu-item">that you hear from time-to-time</li>
</ul>
Note: the above won't 'run' here because it's Sass, so there's a working version over on CodePen: http://codepen.io/anon/pen/GJLXMq
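To make the point about Sass never seeing the DOM concrete, here is a small Ruby stand-in (not Sass itself, and with only approximate output formatting) that expands the same map the way the @each loop does. It emits exactly one rule per map entry regardless of how many li elements exist, so an li without a pointer attribute, like the last one above, simply gets no background. `COLORZ` and `expand_rules` are hypothetical names.

```ruby
# Ruby stand-in for the @each loop above: same pointer => color pairs, in the
# same order (Ruby hashes, like Sass maps, preserve insertion order).
COLORZ = {
  'foo' => '#f24162',
  'bar' => '#591240',
  'fee' => '#4c5573',
  'fum' => '#6fa0a6',
  'eye' => '#71d9d9'
}.freeze

# One rule per map entry; the number of <li> elements never enters into it.
def expand_rules(colors)
  colors.map do |pointer, bgcolor|
    %(#g-showcase li[pointer="#{pointer}"] {\n  background: #{bgcolor};\n})
  end
end

puts expand_rules(COLORZ).join("\n")
```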

Get multiple results xpath div text and next div text

Using XPath, can I get two results out of a page at once? For example, with XPath I can get the first img element:
xpath='//div[@class="forecast-element graphic-box"]/img'
...and then the next sibling div with class="forecast-element".
I tried "and" without success:
xpath='//div[@class="forecast-element graphic-box"]/img and //div[@class="forecast-element"]'
and also:
xpath='//div[@class="forecast-element graphic-box"]/img and following-sibling:://div[@class="forecast-element graphic-box"]'
I have this HTML:
<div class='forecast-element graphic-box ' style="background-image:url('/assets/images/forecast/BluePattern.png');">
<h4 style="color: #FFF;">AVALANCHE DANGER <span style="margin-left: 60px;"> MORNING </span><span style="margin-left: 210px;"> AFTERNOON</span></h4>
<img src="/images/forecast/2014-11-23_teton_hazard.gif" alt="Teton Area avalanche hazard rating for 11/23/2014" />
<div style='margin: 2px auto;'><a href="/assets/pdfs/North%20American%20Danger%20Scale.pdf" style='font-size: 14px; color: #CCC;'>View full danger scale definitions</a>
</div>
<a href="/assets/pdfs/North%20American%20Danger%20Scale.pdf" style='font-size: 14px;'>
<img src="/assets/images/forecast/DangerScale.png" style="margin-left: 150px; margin-top: 5px;" alt='Avalanche danger scale ratings'>
</a>
</div>
<div class='forecast-element'>
<h3 style='margin-bottom: 8px;'>GENERAL AVALANCHE ADVISORY</h3>
Moderate to heavy snowfall combined with strong southwesterly to northwesterly ridgetop winds have created unstable avalanche conditions. New wind slabs have developed at the mid and upper elevations. Snowfall over the past 24 hours has also added weight to existing weak layers near the base of the snowpack. Early season snowfall can easily cloud people’s judgment. Cautious route finding and conservative decision making will be essential for safe travel in avalanche terrain today.</div>
I would like to use following-sibling, as the item comes right after the graphic-box element and there are other forecast-element divs in the HTML above and below it. By the way, I am using YQL, if that makes a difference.
Ideally the results would be (pseudocode):
xpath imgsrc = "/images/forecast/2014-11-23_teton_hazard.gif"
xpath text = "GENERAL AVALANCHE ADVISORY Moderate to heavy snowfall combined with strong southwesterly to northwesterly........"
Thanks!
To combine the results of several expressions, use the union operator, |.
Try this:
//div[@class="forecast-element graphic-box"]/img | //div[@class="forecast-element"]
As you mentioned, these are two separate queries. To select only the forecast-element div that immediately follows the graphic box, do this:
//div[@class="forecast-element graphic-box"]/img | //div[@class="forecast-element graphic-box"]/following-sibling::div[@class="forecast-element"][1]
In parsers that support XPath 2.0 or later, this will also work:
//div[@class="forecast-element graphic-box"]/(img|following-sibling::div[@class="forecast-element"][1])
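A hedged sketch of the union query, run with Ruby's REXML rather than YQL, whose engine may behave differently. Two details are worth checking against the real page. First, the graphic box's class attribute in the sample HTML ends with a space (`'forecast-element graphic-box '`), and XPath's `=` compares strings exactly, so a strict engine would not match it; the trimmed markup below drops that space, and `contains(@class, 'graphic-box')` is a more forgiving alternative. Second, `following-sibling::div[@class="forecast-element"]` on its own matches every later sibling with that class; appending `[1]` keeps only the nearest one. The third div in the sample and the `text_of` helper are hypothetical.

```ruby
require 'rexml/document'

# Trimmed, well-formed version of the page (the <img> is self-closed). The
# class value has no trailing space, and the third div is hypothetical: it
# stands in for the other forecast-element divs further down.
PAGE = <<~XML
  <body>
    <div class="forecast-element graphic-box">
      <h4>AVALANCHE DANGER</h4>
      <img src="/images/forecast/2014-11-23_teton_hazard.gif" alt="Teton Area avalanche hazard rating for 11/23/2014"/>
    </div>
    <div class="forecast-element">
      <h3>GENERAL AVALANCHE ADVISORY</h3>
      Moderate to heavy snowfall combined with strong southwesterly to northwesterly ridgetop winds have created unstable avalanche conditions.
    </div>
    <div class="forecast-element">
      <h3>HYPOTHETICAL LATER SECTION</h3>
    </div>
  </body>
XML

BOX = %q(//div[@class="forecast-element graphic-box"])
# Union of the image and only the nearest following forecast-element div.
QUERY = %(#{BOX}/img | #{BOX}/following-sibling::div[@class="forecast-element"][1])

# Collects all descendant text of an element, collapsing whitespace.
def text_of(el)
  el.children.map do |c|
    case c
    when REXML::Element then text_of(c)
    when REXML::Text then c.value
    else ''
    end
  end.join(' ').split.join(' ')
end

doc = REXML::Document.new(PAGE)
REXML::XPath.match(doc, QUERY).each do |node|
  if node.name == 'img'
    puts "imgsrc = #{node.attributes['src']}"
  else
    puts "text = #{text_of(node)}"
  end
end
```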
