Ruhoh - Insert Tag every x items - ruby

I'm new to Ruby and Ruhoh, and I'm trying to do something like "Rails each loop insert tag every 6 items?", but with Ruhoh.
Basically, I have a list of posts, and every 3 posts I want to start a new row div.
I have looked through all the Ruhoh documentation, and there doesn't appear to be an easy way to do this. I think I need to create a plugin in Ruhoh for a collection, but having no experience in Ruby, I don't really understand what I'm doing. Any help or guidance in the right direction would be great.
Cheers.

I'm fairly new to Ruby myself, but I think this solution meets your needs!
Create a new file called pages_collection_view_addons.rb in the plugin directory (if it doesn't already exist).
Add this to that file:
module PagesCollectionViewAddons
  def chunks(n = 3)
    # Get all the pages
    pages = all
    chunks = []
    # Split the 'pages' array into chunks of size n
    pages.each_slice(n) do |slice|
      chunks.push({ pieces: slice })
    end
    chunks
  end
end
# Inform Ruhoh of this new addon
Ruhoh::Resources::Pages::CollectionView.send(:include, PagesCollectionViewAddons)
In your template add something such as:
{{# posts.chunks }}
  <div class="row">
    {{# pieces }}
      <h1>{{ title }}</h1>
    {{/ pieces }}
  </div>
{{/ posts.chunks }}
This will iterate over each of the chunks where each chunk looks like:
{pieces: [post1, post2, post3]}
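To see what the chunking produces outside of Ruhoh, here is a plain-Ruby sketch. The post hashes are made up, and render_rows is a hypothetical helper that only imitates what the Mustache template above would output; it is not part of Ruhoh.

```ruby
# Stand-in for the posts Ruhoh would return from `all`; these hashes are made up.
posts = (1..7).map { |i| { 'title' => "Post #{i}" } }

# Same idea as the plugin's `chunks` method: group the posts n at a time and
# wrap each group in a hash so the template can loop over `pieces`.
def chunk_posts(posts, n = 3)
  posts.each_slice(n).map { |slice| { pieces: slice } }
end

# Hypothetical helper: one row div per chunk, imitating the Mustache template.
def render_rows(chunks)
  chunks.map do |chunk|
    titles = chunk[:pieces].map { |post| "<h1>#{post['title']}</h1>" }.join
    %(<div class="row">#{titles}</div>)
  end.join("\n")
end

chunks = chunk_posts(posts)
puts chunks.size          # 3 rows: 3 + 3 + 1 posts
puts render_rows(chunks)
```

Note that the last row simply holds whatever is left over (here a single post), so no empty placeholders are generated.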
Hope this helps.

Related

Parsing a nested tag, moving it outside of the parent, and changing its type using Nokogiri

I have HTML coming from an API that I want to clean up and format.
I'm trying to find any <strong> tag that is the first element inside a <p> tag, convert it to an <h4>, and move it out of the <p> so that it sits immediately before the paragraph.
For example:
<p><strong>This is what I want to pull out to an h4 tag.</strong>Here's the rest of the paragraph.</p>
becomes:
<h4>This is what I want to pull out to an h4 tag.</h4><p>Here's the rest of the paragraph.</p>
EDIT: Apologies for the question reading too much like 'please write this for me'. I posted the solution I came up with below. I just had to take the time to really learn how Nokogiri works; it is quite powerful, and it seems like you can do almost anything with it.
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.css("p").each do |paragraph|
  first = paragraph.children.first
  # Skip empty paragraphs and ones that don't start with a <strong> element
  if first && first.element? && first.name == "strong"
    first.name = 'h4'                      # rename <strong> to <h4>
    paragraph.add_previous_sibling(first)  # move it out, just before the <p>
  end
end
puts doc.to_html

scrapy xpath : selector with many <tr> <td>

Hello, I have a question.
I scraped a website with XPath, and the result looks like this:
[u'<tr>\r\n
<td>address1</td>\r\n
<td>phone1</td>\r\n
<td>map1</td>\r\n
</tr>',
u'<tr>\r\n
<td>address1</td>\r\n
<td>telephone1</td>\r\n
<td>map1</td>\r\n
</tr>'...
u'<tr>\r\n
<td>address100</td>\r\n
<td>telephone100</td>\r\n
<td>map100</td>\r\n
</tr>']
Now I need to use XPath to parse these results again.
I want to save the first <td> to address, the second to telephone, and the last one to map.
But I can't get it to work.
Please guide me. Thank you!
Here is my code. It's wrong: it picks up other things as well.
store = sel.xpath("")
for s in store:
    address = s.xpath("//tr/td[1]/text()").extract()
    tel = s.xpath("//tr/td[2]/text()").extract()
    map = s.xpath("//tr/td[3]/text()").extract()
As you can see in the Scrapy documentation, to work with relative XPaths you have to use the .// notation, which extracts elements relative to the previous selection; otherwise you get all matching elements from the whole document again. Here is the example from the Scrapy documentation referenced above:
For example, suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
divs = response.xpath('//div')
At first, you may be tempted to use the following approach, which is wrong, as it actually extracts all <p> elements from the document, not only those inside <div> elements:
for p in divs.xpath('//p'): # this is wrong - gets all <p> from the whole document
This is the proper way to do it (note the dot prefixing the .//p XPath):
for p in divs.xpath('.//p'): # extracts all <p> inside
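So in your case, since each item in store is a single <tr> (as in your output above), I think your code should be something like:
for s in store:
    # s is one <tr>, so select its own <td> children relative to it
    address = s.xpath("./td[1]/text()").extract_first()
    tel = s.xpath("./td[2]/text()").extract_first()
    map_link = s.xpath("./td[3]/text()").extract_first()  # avoid shadowing Python's built-in map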
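The absolute-versus-relative rule is a property of XPath itself, not of Scrapy, so it can be checked with any XPath engine. The sketch below uses Ruby's REXML (shipped with Ruby) rather than Scrapy's selectors, and the table contents are made up to match the shape of the scraped rows.

```ruby
require 'rexml/document'

# A made-up table shaped like the scraped rows: one <tr> per record.
xml = <<~XML
  <table>
    <tr><td>address1</td><td>phone1</td><td>map1</td></tr>
    <tr><td>address2</td><td>phone2</td><td>map2</td></tr>
  </table>
XML

doc  = REXML::Document.new(xml)
rows = REXML::XPath.match(doc, '//tr')
row  = rows.first

# Absolute path: evaluated from the document root, so the `row` context is ignored.
absolute = REXML::XPath.match(row, '//td').map(&:text)
# Relative path: only the <td> children of this particular row.
relative = REXML::XPath.match(row, './td').map(&:text)

puts absolute.size       # 6, i.e. every cell in the table
puts relative.inspect    # ["address1", "phone1", "map1"]

# Pulling named fields out of every row with relative paths.
records = rows.map do |tr|
  address, tel, map_link = REXML::XPath.match(tr, './td').map(&:text)
  { address: address, tel: tel, map: map_link }
end
puts records.inspect
```

The same split applies in Scrapy: the loop variable is the context, and only a path starting with . is evaluated against it.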
Hope this helps,

Parsing multiple lists in HTML file with Nokogiri

I'm trying to learn scripting with Ruby, and this is my first problem.
I have an HTML file which contains states and their cities. I need to be able to access the cities and know which state they belong to in my Ruby code, so I plan on parsing the HTML and creating a hash for each city, like this: {New York => New York City}.
I'm attempting to use Nokogiri, which I'm just learning now.
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
I'm using this to get the states into an array:
require 'rubygems'
require 'nokogiri'
page = Nokogiri::HTML(open("to_parse.html"))
states = Array.new(100)
index = 0
page.css('h4').each do |s|
  states[index] = s.text
  puts states[index]
  index += 1
end
This actually doesn't really help; I need to figure out how I can get Nokogiri to parse the elements of each list into hashes containing the city and its state. I'm not sure how to have a loop break when it finishes the city list of one state, and create a new set of hashes for the city list of the next state.
I'm thinking I'll have to create a hash for each list element and store the text of that list's h4 tag inside each hash, so I know which state the city belongs to, which is the part I'm not sure how to do.
Feel free to offer some advice on refactoring what I've got, as I know it could be done better.
XPath selectors can help you out here.
states = doc.css('li').map do |city|
  state = city.xpath('../preceding-sibling::h4[1]')
  [city.text, state.text]
end.to_h
#=> {'city' => 'State', ...}
This grabs all the li city elements, then traces back to their state. (The XPath reads like so: .. = up one level to the ul, preceding-sibling::h4 = the h4 elements before that ul, [1] = the nearest such element, because preceding-sibling counts backwards from the current node.)
Some comments on your code: in Ruby, you don't need to pre-size arrays, and with Enumerable methods like map you rarely need to track index variables in loops.
Note that the final to_h only works in Ruby 2.1 or greater.
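One thing to keep in mind: the question's example keys by state ({New York => New York City}), while to_h keys by city, so two cities with the same name in different states collapse into one entry. Below is a plain-Ruby sketch of grouping the other way; the pairs are hypothetical sample data standing in for what the map above returns before .to_h, and transform_values needs Ruby 2.4 or later.

```ruby
# Hypothetical [city, state] pairs, i.e. what the answer's `map` yields before `.to_h`.
pairs = [
  ['Albany', 'New York'],
  ['Buffalo', 'New York'],
  ['Austin', 'Texas'],
  ['Springfield', 'Illinois'],
  ['Springfield', 'Missouri']
]

# city => state: a city name listed under two states keeps only the last one.
city_to_state = pairs.to_h
puts city_to_state['Springfield']   # Missouri

# state => [cities]: group the pairs by state, then keep just the city names.
state_to_cities = pairs.group_by { |_city, state| state }
                       .transform_values { |group| group.map(&:first) }
puts state_to_cities.inspect
```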

Finding an Image Icon Next to a Text Item in Watir-WebDriver

I'm using watir-webdriver, and I need to determine whether an image appears before a particular item in a list.
More specifically, there is a section of the site where articles are uploaded. Those articles appear in a list. The structure looks like this:
<div id="article-resources">
  <ul class="components">
    ...
    <li>
      <div class="component">
        <img src="some/path/article.png">
        <div class="replies">
          <label>Replies</label>
        </div>
        <div class="subject">
          Saving the Day
        </div>
      </div>
    </li>
    ...
  </ul>
</div>
Each article appears as a separate li item. (The ellipses above just indicate that there can be many list items.)
What I want our automation to do is find out if the article has been appropriately given the image article.png. The trick is I need to make sure the actual article -- in the above case, "Saving the Day" -- has the image next to it. I can't just check for the image because there will be multiples.
So I figured I had to use XPath to solve this. Inspecting the element in Firefox gave me this XPath:
id("article-resources")/x:ul/x:li[2]/x:div/x:img
That does me no good, though, because the key discriminator seems to be the li[2], but I can't count on this article always being the second in the list.
So I tried this:
article_image = '//div[@class="component"]/a[contains(.,"Saving the Day")]/../img'
@browser.image(:xpath => article_image).exist?.should be_true
The output I get is:
expected: true value
got: false (RSpec::Expectations::ExpectationNotMetError)
So it's not finding the image which likely means I'm doing something wrong since I'm certain the test is on the correct page.
My thinking was that I could use the above to find any link (a) tags inside the div with class "component", check whether the link has the text, and then "back up" one level to see if an image is there.
I'm not even checking the exact image, which I probably should be. I'm just checking if there's an image at all.
So I guess my questions are:
What am I doing wrong with my XPath?
Is this even the best way to solve this problem?
Using Watir
There are a couple of approaches possible.
One way would be to find the link, go up to the component div, and then check for the image:
browser.link(:text => 'Saving the Day').parent.parent.image.present?
or
browser.div(:class => 'subject', :text => 'Saving the Day').parent.image.present?
Another approach, which is a little more robust to changes, is to find the component div that contains the link:
browser.divs(:class => 'component').find { |component|
  component.div(:class => 'subject', :text => 'Saving the Day').exists?
}.image.present?
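The top-down idea (find the component whose subject matches, then look for the image inside that same component) can be tried without a browser. This sketch uses Ruby's REXML instead of Watir, on markup modelled on the question; article_image_src is a hypothetical helper, the second article is invented to show the "no image" case, and <img> is self-closed because REXML only accepts well-formed XML.

```ruby
require 'rexml/document'

# Markup modelled on the question's structure; the second article has no image.
xml = <<~XML
  <div id="article-resources">
    <ul class="components">
      <li>
        <div class="component">
          <img src="some/path/article.png"/>
          <div class="replies"><label>Replies</label></div>
          <div class="subject">
            Saving the Day
          </div>
        </div>
      </li>
      <li>
        <div class="component">
          <div class="replies"><label>Replies</label></div>
          <div class="subject">
            Losing the Day
          </div>
        </div>
      </li>
    </ul>
  </div>
XML

# Hypothetical helper: find the component whose subject text equals `title`,
# then return the src of the <img> inside that same component (or nil).
def article_image_src(doc, title)
  components = REXML::XPath.match(doc, '//div[@class="component"]')
  component = components.find do |div|
    subject = REXML::XPath.first(div, './div[@class="subject"]')
    subject && subject.text.to_s.strip == title
  end
  return nil unless component

  img = REXML::XPath.first(component, './img')
  img && img.attributes['src']
end

doc = REXML::Document.new(xml)
p article_image_src(doc, 'Saving the Day')  # "some/path/article.png"
p article_image_src(doc, 'Losing the Day')  # nil, no image in that component
```

Returning the src rather than a boolean also lets the test check that the image is specifically article.png, which the question mentions it probably should do.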
Using XPath
The above could, of course, be done through XPath as well.
Here is your corrected XPath:
article_image = '//div[@class="component"]//a[contains(.,"Saving the Day")]/../../img'
puts browser.image(:xpath => article_image).present?
Or alternatively:
article_image = '//a[contains(.,"Saving the Day")]/../../img'
browser.image(:xpath => article_image).present?
Again, there is also the top down approach:
article_image = '//div[@class="component"][.//a[contains(.,"Saving the Day")]]/img'
browser.image(:xpath => article_image).present?
You can read more about these approaches and other options in the book Watirways.

Add properties to a page from a Jekyll plugin

Say I want to have a page with content like this:
<h1>{{page.comment_count}} Comment(s)</h1>
{% for c in page.comment_list %}
<div>
<strong>{{c.title}}</strong><br/>
{{c.content}}
</div>
{% endfor %}
There are no variables on the page named comment_count or comment_list by default; instead I want these variables to be added to the page from a Jekyll plugin. Where is a safe place I can populate those fields from without interfering with Jekyll's existing code?
Or is there a better way of achieving a list of comments like this?
Unfortunately, there currently isn't a way to add these attributes without messing with Jekyll's internals. We're on our way to adding hooks for #after_initialize, etc., but we aren't there yet.
My best suggestion is to add these attributes as I've done with my Octopress Date plugin on my blog. It uses Jekyll v1.2.0's Jekyll::Post#to_liquid method to add these attributes, which are collected via send(attr) on the Post:
class Jekyll::Post
  def comment_count
    comment_list.size
  end
  def comment_list
    YAML.safe_load_file("_comments/#{self.id}.yml")
  end
  # Convert this post into a Hash for use in Liquid templates.
  #
  # Returns <Hash>
  def to_liquid(attrs = ATTRIBUTES_FOR_LIQUID)
    super(attrs + %w[
      comment_count
      comment_list
    ])
  end
end
super(attrs + %w[ ... ]) will ensure that all the old attributes are still included, then collect the return values of the methods corresponding to the entries in the String array.
This is the best means of extending posts and pages so far.
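To illustrate the mechanism described above (a list of attribute names, each turned into a value via send(attr), extended through super), here is a self-contained sketch. LiquidAttributes and FakePost are hypothetical stand-ins, not Jekyll's classes, and the comment data is a literal array instead of a _comments YAML file; it shows the pattern only, not Jekyll's actual class hierarchy.

```ruby
# Hypothetical stand-in for the shared to_liquid logic; not Jekyll's source.
module LiquidAttributes
  # Build the Liquid hash by calling each named method, like send(attr) on a Post.
  def to_liquid(attrs)
    attrs.each_with_object({}) { |attr, hash| hash[attr] = send(attr) }
  end
end

class FakePost
  include LiquidAttributes

  ATTRIBUTES_FOR_LIQUID = %w[title url].freeze

  def title
    'Hello'
  end

  def url
    '/hello/'
  end

  # The real plugin reads _comments/<id>.yml; a literal array keeps this self-contained.
  def comment_list
    [{ 'title' => 'First!', 'content' => 'Nice post.' }]
  end

  def comment_count
    comment_list.size
  end

  # Extend the default attribute list instead of replacing it, then defer to the module.
  def to_liquid(attrs = ATTRIBUTES_FOR_LIQUID)
    super(attrs + %w[comment_count comment_list])
  end
end

liquid = FakePost.new.to_liquid
puts liquid.keys.inspect      # ["title", "url", "comment_count", "comment_list"]
puts liquid['comment_count']  # 1
```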
