Avoid repeated calls to an API in Jekyll Ruby plugin - ruby

I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count from each page, but my Jekyll/Ruby-fu isn't up to scratch. I do not know how to write the plugin to run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a liquid block that can be put into my page layout:
class GoogleAnalytics < Liquid::Block
  def initialize(tag_name, markup, tokens)
    super # options that appear in block (between tag and endtag)
    # options = markup # optional options passed in by opening tag
  end

  def render(context)
    path = super
    # Read in credentials and authenticate
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    token = Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == cred[:ua] }
    # place query, customize to modify results
    data = Exits.results(profile,
                         :filters => { :page_path.eql => path },
                         :start_date => Chronic.parse("2011-01-01"))
    data.first.pageviews
  end
end
Full version of my plugin is here
How can I move all the calls to the API to some other function and make sure jekyll runs that once at the start, and then adjust the tag above to read that local data?
EDIT: It looks like this can be done with a Generator that writes the data to a file. See the example on this branch. Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22

To store the data, I had to:
1. Write a Generator class (see Jekyll wiki plugins) to call the API.
2. Convert the data to a hash (for easy lookup by path, see 5):
   result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]
3. Write the data hash to a JSON file.
4. Read in the data from the file in my existing Liquid block class. Note that the block tag works from the _includes dir, while the generator works from the root directory.
5. Match the page path, which is easy once the data is converted to a hash:
   result[path][1]
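The steps above can be sketched in plain Ruby, without Jekyll or garb, to show the write-once, read-many pattern. This is only a minimal sketch: `Row`, `write_pageview_cache`, `cached_pageviews`, and the file name `analytics.json` are hypothetical names chosen for illustration, and `Row` merely stands in for whatever objects the API query returns. In a real plugin, the generator would presumably do the writing and the Liquid block the reading.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for one row of the API results.
Row = Struct.new(:page_path, :exits, :pageviews)

# Steps 2 and 3: index the rows by path and write the hash to a JSON file.
def write_pageview_cache(rows, file)
  result = Hash[rows.collect { |row| [row.page_path, [row.exits, row.pageviews]] }]
  File.write(file, JSON.generate(result))
  result
end

# Steps 4 and 5: read the file back and look up one page.
# Returns nil when the path is not in the cache.
def cached_pageviews(file, path)
  entry = JSON.parse(File.read(file))[path]
  entry && entry[1]
end

rows = [Row.new("/index.html", 3, 120), Row.new("/about.html", 1, 45)]
cache_file = File.join(Dir.mktmpdir, "analytics.json")
write_pageview_cache(rows, cache_file)
puts cached_pageviews(cache_file, "/about.html") # prints 45
```

Returning nil for an unknown path lets the tag decide what to show for pages that have no recorded views yet.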
Code for the full plugin, showing how to create the generator and write files, etc, here
And thanks to Sija on GitHub for help on this.

Related

How to check that a PDF file has some link with Ruby/Rspec?

I am using prawnpdf/pdf-inspector to test that content of a PDF generated in my Rails app is correct.
I would like to check that the PDF file contains a link with a certain URL. I looked at yob/pdf-reader but haven't found any useful information on this topic.
Is it possible to test URLs within PDF with Ruby/RSpec?
I would like to be able to write something like the following:
expect(urls_in_pdf(pdf)).to include 'https://example.com/users/1'
The pdf-reader gem (https://github.com/yob/pdf-reader) provides a method called text on each page.
Do something like
pdf = PDF::Reader.new("tmp/pdf.pdf")
assert pdf.pages[0].text.include? 'https://example.com/users/1'
assuming what you are looking for is on the first page.
Since pdf-inspector seems to return only text, you could try using pdf-reader directly (pdf-inspector uses it anyway).
reader = PDF::Reader.new("somefile.pdf")
reader.pages.each do |page|
  puts page.raw_content # This should also give you the link
end
I only took a quick look at the GitHub page, so I am not sure exactly what raw_content returns. There is also a low-level method to access the objects of the PDF directly:
reader = PDF::Reader.new("somefile.pdf")
puts reader.objects.inspect
With that it should be possible to get the URL.
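As a rough sketch of the low-level route, the helper below pulls URLs out of link annotations. It assumes the PDF stores its links as annotations of subtype Link with a URI action (the usual layout, but not guaranteed for every generator), and that pdf-reader exposes them through `page.attributes[:Annots]` and `reader.objects.deref`; this has not been checked against a particular pdf-reader version. `uris_from_annotations` is a hypothetical name, and `urls_in_pdf` simply mirrors the helper name wished for in the question.

```ruby
# Collect link targets from a list of PDF annotation dictionaries.
# `deref` is any callable that resolves indirect references; with pdf-reader
# this is assumed to be reader.objects.method(:deref).
def uris_from_annotations(annots, deref = ->(obj) { obj })
  Array(deref.call(annots)).map do |ref|
    annot = deref.call(ref)
    next unless annot.is_a?(Hash) && annot[:Subtype] == :Link

    action = deref.call(annot[:A])
    action[:URI] if action.is_a?(Hash)
  end.compact
end

# Hypothetical helper matching the name used in the question.
def urls_in_pdf(path)
  require "pdf/reader" # loaded lazily so the helper above has no dependencies
  reader = PDF::Reader.new(path)
  reader.pages.flat_map do |page|
    uris_from_annotations(page.attributes[:Annots], reader.objects.method(:deref))
  end
end
```

With that in place, a spec along the lines of `expect(urls_in_pdf("tmp/pdf.pdf")).to include 'https://example.com/users/1'` becomes possible, provided the assumptions above hold for the PDF being tested.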

Querying Twilio calls list resource doesn't paginate the results using Ruby or PHP

According to Twilio's documentation here regarding "paging":
The list returned to you includes paging information. If you plan on requesting more records than will fit on a single page, you may want to use the provided nextpageuri rather than incrementing through the pages by page number.
It then gives an example:
# Initialize Twilio Client
@client = Twilio::REST::Client.new(account_sid, auth_token)
@client.calls.list.each do |call|
  puts call.direction
end
However, doing this just returns an array of all calls, there isn't any paging information or limiting of results or any "pages".
For my purposes I'm actually filtering the query like this:
@calls = @client.calls.list(
  start_time_after: @time,
  start_time_before: @another_time
)
Because my date filter range is a one-month period and there are currently about 4.5k calls to retrieve, it's taking quite a while to process (and sometimes it never finishes).
I'm using the twilio helper library ruby gem "twilio-ruby" and running ruby 2.5
I've also tried using PHP with the respective twilio helper library and have found the same result.
Using curl, however, does work and gives paging information. It's also incredibly fast compared to using the helper libraries.
Twilio developer evangelist here.
list will paginate through, loading all the resources it can.
There are other calls that will stream the API in a lazier fashion, if that is more useful for your use case. For example, you can use each and it will load the calls lazily until they have run out.
@calls = @client.calls.each(
  start_time_after: @time,
  start_time_before: @another_time
) do |call|
  puts call.direction
end
If you do want to paginate manually, you can use the page method to get a CallPage object and iterate from there.
page = @client.calls.page(
  start_time_after: @time,
  start_time_before: @another_time
)
while !page.nil? do
  page.each { |call| puts call.direction }
  page = page.next_page
end
Let me know if that helps at all.
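The manual loop above can also be wrapped in an Enumerator, so that the caller can stop early without fetching the remaining pages. This is only a sketch of the pattern in plain Ruby: `each_record` and `FakePage` are hypothetical names, and `FakePage` merely imitates the `each` and `next_page` interface used in the answer, so nothing here calls the Twilio API.

```ruby
# Wrap the manual pagination loop in an Enumerator so callers can stop early.
# `first_page` is any object that responds to `each` and `next_page`,
# where `next_page` returns nil after the last page.
def each_record(first_page)
  Enumerator.new do |yielder|
    page = first_page
    until page.nil?
      page.each { |record| yielder << record }
      page = page.next_page
    end
  end
end

# Hypothetical stand-in for a page object, used only to demonstrate the helper.
class FakePage
  attr_reader :next_page

  def initialize(records, next_page = nil)
    @records = records
    @next_page = next_page
  end

  def each(&block)
    @records.each(&block)
  end
end

last_page  = FakePage.new(%w[inbound outbound-api])
first_page = FakePage.new(%w[outbound-dial inbound], last_page)

# Only as many pages as needed are walked to produce three records.
puts each_record(first_page).first(3).inspect
# prints ["outbound-dial", "inbound", "inbound"]
```

Because `first(3)` stops the enumeration as soon as three records have been yielded, `next_page` is never called on the page that satisfied the request.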

Parsing Liquid in a Jekyll generator before converting to JSON

Best to start by saying that I am very new to Ruby and Liquid. I have searched around looking for some resource on this issue, but as yet haven't been able to find anything of real use.
I have a Jekyll site, which utilises the HTML5 History API. I have a Jekyll generator plugin which creates a single JSON file which holds all the post and page content, ready for use with HTML5 PushState and PopState. This part is functioning properly and is tested.
My problem comes when I have a post/page on the site which has Liquid tags in it. I am guessing I need to parse these Liquid tags to get the template output before I create my JSON object for each post/page. Here is what I have for pages as an example:
# Iterate over all pages
site.pages.each do |page|
  # Encode the page HTML content to JSON
  link = page.url
  @content = Liquid::Template.parse(page.content)
  hash[link] = { "body_class" => page.data['body_class'], "content" => converter.convert(@content.render), "title" => '<h1>' + page.data["content_title"] + '</h1>' }
end
At the moment, this basically removes all Liquid tags from the generated JSON file, leaving nothing in their place.
Here is my full generator file on Github which is based very heavily on nice work by Jezen Thomas.
The output JSON file is also in that repo with the site, or can be accessed quickly here. The blog.html content is the last item in the JSON file and shows the empty h1 and div tags which should have content.

Run a Liquid filter from a Jekyll plugin

Liquid has two filters named newline_to_br and escape.
I'm working on a Jekyll plugin that needs to run a string through those filters. Rather than install a separate gem which does this, or write my own code for it, is there any way to call those filters directly from inside of the plugin?
Those filters can become available with the line include Liquid::StandardFilters.
For example:
class PlaintextConverter
  include Liquid::StandardFilters

  def convert(content)
    content = escape(content)
    content = newline_to_br(content)
    content
  end
end
For a full list of functions that become available in this way, you can view the source of standardfilters.rb

How can you access page properties (YAML front matter) within a converter plugin

I'm writing a converter plugin for Jekyll and need access to some of the page header (YAML front matter) properties. Only the content is passed to the main converter method, and it does not seem possible to access the context.
Example:
module Jekyll
  class UpcaseConverter < Converter
    safe true
    priority :low

    def matches(ext)
      ext =~ /^\.upcase$/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      ###########
      #
      # It's here that I need access to the content page header data
      #
      #
      ###########
      content.upcase
    end
  end
end
Any ideas how I can access the page header data within a converter plugin?
Based on the Jekyll source code, it is not possible to retrieve the YAML front matter in a converter.
I see two solutions that could work depending on your situation.
Your file extension could be descriptive enough to provide the information you would have included in the front matter. It looks like the Converter plugin was designed to be this basic.
If modifying Jekyll is an option, you could change the Convertible.transform method to send the front matter to Converter.convert. The Converters that come with Jekyll would have to be modified as well. Fork it on GitHub and see if others like the idea. Here's where to start: https://github.com/mojombo/jekyll/blob/cb1a2d1818770ca5088818a73860198b8ccca27a/lib/jekyll/convertible.rb#L49
Good luck.
devnull, I ran into a similar situation and I figured a way of doing it.
In the converter, I registered a pre-render hook to pull YAML into a variable, so that in the actual convert method, I have access to the info I just pulled. Also, another post_render hook is needed to remove that piece of info since this should be a per-post data.
A side note: I found that convert is called twice, once for use in the HTML <meta> tag and once for the actual content. The hook is invoked only in the second case, not the first, so you may need to guard your convert function.
Another side note. I think having YAML in the converter is not unreasonable. Just like in pandoc where you can specify bibliography file in the YAML section and do other fine tuning, people should be given freedom to customize a single post using YAML, too.
def initialize(config)
  super(config)
  Jekyll::Hooks.register :posts, :pre_render do |post|
    if matches(post.data["ext"])
      # extract per post metadata, including those in YAML
      @myconfig["meta"] = post.data
      # you may need the path to the post: post.path
    end
  end
  Jekyll::Hooks.register :posts, :post_render do |post|
    if matches(post.data["ext"])
      # remove per post metadata
      @myconfig.delete("meta")
    end
  end
end

def convert(content)
  return content unless @myconfig["meta"]
  # actual conversion goes here
end
