I'm trying to create a Jekyll plugin that should go through all posts and render them with a different layout, but I can't figure out how to do it. Here's what I have so far:
module Jekyll
  class MyGenerator < Generator
    priority :low

    def generate(site)
      site.posts.docs.each do |doc|
        page = Page.new(site, site.source, File.dirname(doc.relative_path), doc.basename)
        page.do_layout(
          site.site_payload,
          'post' => Layout.new(site, site.source, '_layouts/my.html')
        )
        page.write(?)
        site.pages << page
      end
    end
  end
end
This code doesn't work.
In my code below, I'm rendering all my pages a second time with a null layout. The resulting files all have the suffix "_BARE".
module Jekyll
  class BareHtml < Page
    def initialize(site, base, dest_dir, src_dir, page)
      @site = site
      @base = base
      @dir = dest_dir
      @dest_dir = dest_dir
      @dest_name = page.basename
      file_name = "#{page.basename}_BARE.html"
      self.process(file_name)
      self.read_yaml(base, page.path)
      self.data['layout'] = nil ### <-- set the layout name here
    end
  end

  class BareHtmlGenerator < Generator
    safe true
    priority :low

    def generate(site)
      # Converter for .md > .html
      converter = site.find_converter_instance(Jekyll::Converters::Markdown)
      dest = site.dest
      src = site.source
      # Create destination path
      FileUtils.mkpath(dest) unless File.exist?(dest)
      site_pages = site.pages.dup
      site_pages.each do |page|
        bare = BareHtml.new(site, site.source, dest, src, page)
        bare.content = converter.convert(bare.content)
        bare.render(site.layouts, site.site_payload)
        bare.write(site.dest)
        site.pages << bare
      end
    end
  end
end
I am building a simple web spider using Sidekiq and Mechanize.
When I run this for one domain, it works fine. When I run it for multiple domains, it fails. I believe the reason is that web_page gets overwritten when instantiated by another Sidekiq worker, but I am not sure if that's true or how to fix it.
# my scrape_search controller's create action searches on google.
def create
  @scrape = ScrapeSearch.build(keywords: params[:keywords], profession: params[:profession])
  agent = Mechanize.new
  scrape_search = agent.get('http://google.com/') do |page|
    search_result = page.form...
    search_result.css("h3.r").map do |link|
      result = link.at_css('a')['href'] # Narrowing down to real search results
      @domain = Domain.new(some params)
      ScrapeDomainWorker.perform_async(@domain.url, @domain.id, remaining_keywords)
    end
  end
end
I'm creating a Sidekiq job per domain. Most of the domains I'm looking for should contain just a few pages, so there's no need for sub-jobs per page.
This is my worker:
class ScrapeDomainWorker
  include Sidekiq::Worker
  ...

  def perform(domain_url, domain_id, keywords)
    @domain = Domain.find(domain_id)
    @domain_link = @domain.protocol + '://' + domain_url
    @keywords = keywords

    # First we scrape the homepage and get the first links
    @domain.to_parse = ['/'] # to_parse is an array of PATHS to parse for the domain
    mechanize_path('/')
    @domain.verified << '/' # verified is an Array field containing valid domain paths
    get_paths(@web_page) # Now we should have to_parse populated with homepage links

    @domain.scraped = 1 # Loop counter
    while @domain.scraped < 100
      @domain.to_parse.each do |path|
        @domain.to_parse.delete(path)
        @domain.scraped += 1
        mechanize_path(path) # We create a Nokogiri HTML doc with mechanize for the valid path
        ...
        get_paths(@web_page) # Fire this to repopulate to_parse !!!
      end
    end
    @domain.save
  end

  def mechanize_path(path)
    agent = Mechanize.new
    begin
      @web_page = agent.get(@domain_link + path)
    rescue Exception => e
      puts "Mechanize Exception for #{path} :: #{e.message}"
    end
  end

  def get_paths(web_page)
    paths = web_page.links.map { |link| link.href.gsub((@domain.protocol + '://' + @domain.url), "") } ## This works when I scrape a single domain, but fails with ".gsub for nil" when I scrape a few domains.
    paths.uniq.each do |path|
      @domain.to_parse << path
    end
  end
end
This works when I scrape a single domain, but it fails with .gsub for nil on web_page when I scrape several domains.
You can wrap your code in another class, and then create an object of that class within your worker:
class ScrapeDomainWrapper
  def initialize(domain_url, domain_id, keywords)
    # ...
  end

  def mechanize_path(path)
    # ...
  end

  def get_paths(web_page)
    # ...
  end
end
And your worker:
class ScrapeDomainWorker
  include Sidekiq::Worker

  def perform(domain_url, domain_id, keywords)
    ScrapeDomainWrapper.new(domain_url, domain_id, keywords)
  end
end
Also, bear in mind that Mechanize::Page#links may be nil.
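For example, a defensive version of the question's get_paths might look like this (an untested sketch using the names from the question's worker; it guards both a page that failed to load and anchors without an href):

def get_paths(web_page)
  # Nothing to add if the fetch in mechanize_path failed or the page has no links
  return if web_page.nil? || web_page.links.nil?

  web_page.links.each do |link|
    next if link.href.nil? # an href-less anchor would otherwise raise "undefined method `gsub' for nil"
    path = link.href.gsub(@domain.protocol + '://' + @domain.url, '')
    @domain.to_parse << path unless @domain.to_parse.include?(path)
  end
end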
I'll apologize upfront, as I'm new to Ruby and Rails and I cannot for the life of me figure out how to use hashids in my project. The project is a simple image host. I already have it working using Base58 to encode the SQL ID and then decode it in the controller, but I wanted to make the URLs more random, hence the switch to hashids.
I have placed the hashids.rb file in my lib directory from here: https://github.com/peterhellberg/hashids.rb
Now some of the confusion starts here. Do I need to initialize hashids on every page that uses hashids.encode and hashids.decode via
hashids = Hashids.new("mysalt")
I found this post (http://zogovic.com/post/75234760043/youtube-like-ids-for-your-activerecord-models), which leads me to believe I can put it into an initializer; however, after doing that I am still getting NameError (undefined local variable or method `hashids' for ImageManager:Class).
So in my ImageManager.rb class I have:
require 'hashids'

class ImageManager
  class << self
    def save_image(imgpath, name)
      mime = %x(/usr/bin/exiftool -MIMEType #{imgpath})[34..-1].rstrip
      if mime.nil? || !VALID_MIME.include?(mime)
        return { status: 'failure', message: "#{name} uses an invalid format." }
      end
      hash = Digest::MD5.file(imgpath).hexdigest
      image = Image.find_by_imghash(hash)
      if image.nil?
        image = Image.new
        image.mimetype = mime
        image.imghash = hash
        unless image.save!
          return { status: 'failure', message: "Failed to save #{name}." }
        end
        unless File.directory?(Rails.root.join('uploads'))
          Dir.mkdir(Rails.root.join('uploads'))
        end
        # File.open(Rails.root.join('uploads', "#{Base58.encode(image.id)}.png"), 'wb') { |f| f.write(File.open(imgpath, 'rb').read) }
        File.open(Rails.root.join('uploads', "#{hashids.encode(image.id)}.png"), 'wb') { |f| f.write(File.open(imgpath, 'rb').read) }
      end
      link = ImageLink.new
      link.image = image
      link.save
      # return { status: 'success', message: Base58.encode(link.id) }
      return { status: 'success', message: hashids.encode(link.id) }
    end

    private

    VALID_MIME = %w(image/png image/jpeg image/gif)
  end
end
And in my controller I have:
require 'hashids'

class MainController < ApplicationController
  MAX_FILE_SIZE = 10 * 1024 * 1024
  MAX_CACHE_SIZE = 128 * 1024 * 1024

  @links = Hash.new
  @files = Hash.new
  @tstamps = Hash.new
  @sizes = Hash.new
  @cache_size = 0

  class << self
    attr_accessor :links
    attr_accessor :files
    attr_accessor :tstamps
    attr_accessor :sizes
    attr_accessor :cache_size
    attr_accessor :hashids
  end

  def index
  end

  def transparency
  end

  def image
    # @imglist = params[:id].split(',').map{ |id| ImageLink.find(Base58.decode(id)) }
    @imglist = params[:id].split(',').map{ |id| ImageLink.find(hashids.decode(id)) }
  end

  def image_direct
    # linkid = Base58.decode(params[:id])
    linkid = hashids.decode(params[:id])
    file =
      if Rails.env.production?
        puts "#{Base58.encode(ImageLink.find(linkid).image.id)}.png"
        File.open(Rails.root.join('uploads', "#{Base58.encode(ImageLink.find(linkid).image.id)}.png"), 'rb') { |f| f.read }
      else
        puts "#{hashids.encode(ImageLink.find(linkid).image.id)}.png"
        File.open(Rails.root.join('uploads', "#{hashids.encode(ImageLink.find(linkid).image.id)}.png"), 'rb') { |f| f.read }
      end
    send_data(file, type: ImageLink.find(linkid).image.mimetype, disposition: 'inline')
  end

  def upload
    imgparam = params[:image]
    if imgparam.is_a?(String)
      name = File.basename(imgparam)
      imgpath = save_to_tempfile(imgparam).path
    else
      name = imgparam.original_filename
      imgpath = imgparam.tempfile.path
    end
    File.chmod(0666, imgpath)
    %x(/usr/bin/exiftool -all= -overwrite_original #{imgpath})
    logger.debug %x(which exiftool)
    render json: ImageManager.save_image(imgpath, name)
  end

  private

  def save_to_tempfile(url)
    uri = URI.parse(url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = uri.scheme == 'https'
    http.start do
      resp = http.get(uri.path)
      file = Tempfile.new('urlupload', Dir.tmpdir, :encoding => 'ascii-8bit')
      file.write(resp.body)
      file.flush
      return file
    end
  end
end
Then in my image.html.erb view I have this:
<%
  @imglist.each_with_index { |link, i|
    id = hashids.encode(link.id)
    ext = link.image.mimetype.split('/')[1]
    if ext == 'jpeg'
      ext = 'jpg'
    end
    puts id + '.' + ext
%>
Now if I add
hashids = Hashids.new("mysalt")
in ImageManager.rb, main_controller.rb, and in my image.html.erb, I am getting this error:
ActionView::Template::Error (undefined method `id' for #<Array:0x000000062f69c0>)
So, all in all, implementing hashids.encode/decode is not as easy as implementing Base58.encode/decode, and I am confused about how to get it working... Any help would be greatly appreciated.
I would suggest loading it as a gem by including it into your Gemfile and running bundle install. It will save you the hassle of requiring it in every file and allow you to manage updates using Bundler.
Yes, you do need to initialize it with the same salt wherever it is going to be used. I would suggest defining the salt as a constant, perhaps in application.rb.
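For example, a minimal sketch combining both suggestions (the constant name and initializer filename are placeholders, and "mysalt" is the salt from the question):

# Gemfile
gem 'hashids'

# config/initializers/hashids.rb (or a constant in application.rb, as suggested)
HASHIDS = Hashids.new("mysalt")

# Then, anywhere in the app:
HASHIDS.encode(image.id)           # => a short obfuscated string
HASHIDS.decode(params[:id]).first  # decode returns an array of numbers, so take the first element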
The link you provided injects hashids into ActiveRecord, which means it will not work anywhere else. I would not recommend that approach, as it requires a high level of familiarity with Rails.
You might want to spend some time understanding ActiveRecord and ActiveModel. It will save you a lot of reinventing the wheel. :)
Before anything else, you should test whether Hashids is available in your project. You can run rails c in your project folder and do a small test:
>> my_id = ImageLink.last.id
>> puts Hashids.new("mysalt").encode(my_id)
If it's not working, add the gem to your Gemfile (which makes a lot more sense anyway).
Then, I think you should add a getter for your hash_id in your ImageLink model.
Even if you don't want to save the hash in the database, it has its place in your model. See virtual properties for more info.
Remember: "Skinny Controller, Fat Model".
class ImageLink < ActiveRecord::Base
  def hash_id
    # cache the hash; use the same salt everywhere
    @hash_id ||= Hashids.new("mysalt").encode(id)
  end

  def extension
    # you could add the logic for the extension here as well
    ext = image.mimetype.split('/')[1]
    if ext == 'jpeg'
      'jpg'
    else
      ext
    end
  end
end
Change the return value in your ImageManager#save_image:
link = ImageLink.new
link.image = image
# Be sure your link has been saved (validation errors, etc.)
if link.save
  { status: 'success', message: link.hash_id }
else
  { status: 'failure', message: link.errors.full_messages.join(", ") }
end
In your template
<%
  @imglist.each_with_index do |link, i|
    puts link.hash_id + '.' + link.extension
  end # <- I prefer do..end so you don't forget the closing brace
%>
All this code is not tested...
I was looking for something similar where I can disguise the ids of my records, and I came across acts_as_hashids.
https://github.com/dtaniwaki/acts_as_hashids
This little gem integrates seamlessly. You can still find your records through their ids, or with the hash. On nested records you can use the method with_hashids.
To get the hash, you call to_param on the object itself, which results in a string similar to ePQgabdg.
Since I just implemented this, I can't tell how useful this gem will be. So far I have only had to adjust my code a little bit.
I also gave the records a virtual attribute, hashid, so I can access it easily.
attr_accessor :hashid

after_find :set_hashid

private

def set_hashid
  self.hashid = self.to_param
end
I'm using a plugin to count page views for posts and pages based on Google Analytics. To display the page view count I'm using a Liquid tag {% pageview %}. Is there any way to add this data to YAML front matter, so it can be accessed in a list of popular posts on other pages by something like {{ page.views }}?
Here is the code for the Liquid tag in the plugin:
class PageViewTag < Liquid::Tag
  def initialize(name, marker, token)
    @params = Hash[*marker.split(/(?:: *)|(?:, *)/)]
    super
  end

  def render(context)
    site = context.environments.first['site']
    if !site['page-view']
      return ''
    end

    post = context.environments.first['post']
    if post == nil
      post = context.environments.first['page']
      if post == nil
        return ''
      end
    end

    pv = post['_pv']
    if pv == nil
      return ''
    end

    html = pv.to_s.reverse.gsub(/...(?=.)/,"\\&\u2009").reverse
    return html
  end # render
end # PageViewTag
How can I, instead of registering a Liquid tag, merge this data into the data of the post (a document in a collection) and use it via {{ page.views }}?
You can use a generator plugin to add some data['views'] to your posts or pages.
Here is the code for the plugin I made:
require 'jekyll'

module Jekyll
  class PageviewsData < Jekyll::Generator
    safe true
    priority :low

    def generate(site)
      # require ga-page-view plugin
      if !site.config['page-view']
        return
      end

      site.collections.each do |label, collection|
        collection.docs.each { |doc|
          pv = doc.data['_pv']
          views = pv.to_s.reverse.gsub(/...(?=.)/,"\\&\u2009").reverse
          doc.data.merge!('views' => views)
        }
      end
    end
  end
end
I'm trying to write a class Web in Ruby 2.0.0 that inherits from GEXF::Graph, but I am unable to get Graph methods like define_node_attribute to work. I'm a new Ruby programmer, so I expect I'm doing something goofy. Thanks.
webrun.rb
require 'rubygems'
require 'gexf'
require 'anemone'
require 'mechanize'
require_relative 'web'

web = Web.new
web.define_node_attribute(:url)
web.define_node_attribute(:links,
                          :type => GEXF::Attribute::BOOLEAN,
                          :default => true)
web.rb
require 'rubygems'
require 'gexf'
require 'anemone'
require 'mechanize'

class Web < GEXF::Graph
  attr_accessor :root
  attr_accessor :pages

  def initialize
    @pages = Array.new
  end

  def pages
    @pages
  end

  def add page
    @pages << page
  end

  def parse uri, protocol = 'http:', domain = 'localhost', file = 'index.html'
    u = uri.split('/')
    if n = /^(https?:)/.match(u[0])
      protocol = n[0]
      u.shift()
    end
    if u[0] == ''
      u.shift()
    end
    if n = /([\w\.]+\.(org|com|net))/.match(u[0])
      domain = n[0]
      u.shift()
    end
    if n = /(.*\.(html?|gif))/.match(u[-1])
      file = n[0]
      u.pop()
    end
    cnt = 0
    while u[cnt] == '..' do
      cnt = cnt + 1
      u.shift()
    end
    while cnt > 0 do
      cnt = cnt - 1
      u.shift()
    end
    directory = '/' + u.join('/')
    puts "protocol: " + protocol + " domain: " + domain + \
         " directory: " + directory + " file: " + file
    protocol + "//" + domain + directory + (directory[-1] == '/' ? '/' : '') + file
  end

  def crawl
    Anemone.crawl(@root) do |anemone|
      anemone.on_every_page do |sitepage|
        add sitepage
      end
    end
  end

  def save file
    f = File.open(file, mode = "w")
    f.write(to_xml)
    f.close()
  end
end
The issue is that you are overriding the GEXF::Graph initialize method without calling super. What you did essentially writes over the initialize method that needed to be called. To fix this, change your initialize method to call super first:
def initialize
  super
  @pages = Array.new
end
I have written a Jekyll plugin, "Tags", which generates a file and returns a string of links to that file.
Everything is fine, but if I write that file directly into the _site folder, it is removed. If I put that file outside the _site folder, it is not generated inside _site.
Where and how should I add my file so that it is available inside the _site folder?
You should use the Page class for this and call its render and write methods.
This is an example that generates the archive page on my blog:
module Jekyll
  class ArchiveIndex < Page
    def initialize(site, base, dir, periods)
      @site = site
      @base = base
      @dir = dir
      @name = 'archive.html'

      self.process(@name)
      self.read_yaml(File.join(base, '_layouts'), 'archive_index.html')
      self.data['periods'] = periods
    end
  end

  class ArchiveGenerator < Generator
    priority :low

    def generate(site)
      periods = site.posts.reverse.group_by{ |c| {"month" => Date::MONTHNAMES[c.date.month], "year" => c.date.year} }

      index = ArchiveIndex.new(site, site.source, '/', periods)
      index.render(site.layouts, site.site_payload)
      index.write(site.dest)

      site.pages << index
    end
  end
end