Add URL field to Jekyll post yaml data - ruby

I want to programmatically append a single-line of YAML front data to all the source _posts files.
Background: I currently have a Jekyll-powered website that generates its URLs with the following plugin:
require "Date"
module Jekyll
class PermalinkRewriter < Generator
safe true
priority :low
def generate(site)
# Until Jekyll allows me to use :slug, I have to resort to this
site.posts.each do |item|
day_of_year = item.date.yday.to_s
if item.date.yday < 10
day_of_year = '00'+item.date.yday.to_s
elsif item.date.yday < 100
day_of_year = '0'+item.date.yday.to_s
end
item.data['permalink'] = '/archives/' + item.date.strftime('%g') + day_of_year + '-' + item.slug + '.html'
end
end
end
end
end
All this does is generate a URL like /archives/12001-post-title.html, which is the two-digit year (2012), followed by the day of the year on which the post was written (in this case, January 1st).
(Aside: I like this because it essentially creates a UID for every Jekyll post, which can then be sorted by name in the generated _site folder and end up in chronological order).
However, now I want to change the URL scheme for new posts I write, but I don't want this to break all my existing URLs, when the site is generated. So, I need a way to loop through my source _posts folder and append the plugin-generated ULR to each post's YAML data, with the URL: front matter.
I'm at a loss of how to do this. I know how to append lines to a text file with Ruby, but how do I do that for all my _posts files AND have that line contain the URL that would be generated by the plugin?

Et voilĂ  ! Tested on Jekyll 2.2.0
module Jekyll
class PermalinkRewriter < Generator
safe true
priority :low
def generate(site)
#site = site
site.posts.each do |item|
if not item.data['permalink']
# complete string from 1 to 999 with leading zeros (0)
# 1 -> 001 - 20 -> 020
day_of_year = item.date.yday.to_s.rjust(3, '0')
file_name = item.date.strftime('%g') + day_of_year + '-' + item.slug + '.html'
permalink = '/archives/' + file_name
item.data['permalink'] = permalink
# get post's datas
post_path = item.containing_dir(#site.source, "")
full_path = File.join(post_path, item.name)
file_yaml = item.data.to_yaml
file_content = item.content
# rewrites the original post with the new Yaml Front Matter and content
# writes 'in stone !'
File.open(full_path, 'w') do |f|
f.puts file_yaml
f.puts '---'
f.puts "\n\n"
f.puts file_content
end
Jekyll.logger.info "Added permalink " + permalink + " to post " + item.name
end
end
end
end
end

Related

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name != "text" # skip these nodes: if class isnt text then skip
if value.length > 0 # skip empty nodes
key = node.name.gsub(/wd:/,'').to_sym
if key == :Dataload_Request && !record.empty?
records << record
record = {}
elsif key[/^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/wd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not already in the set
keys << key
end
end
end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
print ''
print 'output files ready!'
print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
Be careful with sub and gsub as they don't do what you're thinking they do. Both expect a regular expression, but will take a string which they do a escape on before beginning their scan.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching and that gsub scans until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
xml_str = File.read(filename)
xml_str.gsub!('record:','') # remove the record: namespace
doc = Nokogiri::XML xml_str
csv_filename = filename.gsub('.xml', '.csv')
CSV.open(csv_filename, 'wb' ) do |row|
row << ['name', 'street_address', 'postal_code', 'age']
row << [
doc.xpath('//name').text,
doc.xpath('//Street_Address').text,
doc.xpath('//Postal_Code').text,
doc.xpath('//Age').text,
]
end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }

Appending a parameter's value in parameters.yml when deploying symfony application

First of all I've never used ruby and capifony so this question might be very easy for some of you but not for me. I tried to find examples related to my question but couldn't find any or I'm too new to ruby to miss!
When deploying my Symfony app, the value of a particular key in my parameters file should be dynamically appended so, rather than using exact generic value coming from distribution file below:
parameters_prod.yml.dist
parameters:
my_prefix: my_value_prefix_
Straight after deployment, parameters.yml should read like below:
parameters.yml
parameters:
my_prefix: my_value_prefix_8235647895
8235647895 part above is going to be timestamp.
How can I do this?
My current deploy.rb
namespace :mytest do
desc "Upload the parameters.yml"
task :upload_parameters do
origin_file = parameters_dir + "/" + parameters_file if parameters_dir && parameters_file
origin_file_append = parameters_dir + "/" + parameters_file_append if parameters_dir && parameters_file_append
if origin_file && File.exists?(origin_file)
relative_path = "app/config/parameters.yml"
files = [origin_file]
files << origin_file_append if origin_file_append && File.exists?(origin_file_append)
tmp_origin_file = origin_file + '.tmp'
File.open(tmp_origin_file, 'w') do |fo|
files.each do |file|
File.foreach(file) do |li|
fo.puts li
end
end
end
if shared_files && shared_files.include?(relative_path)
destination_file = shared_path + "/" + relative_path
else
destination_file = latest_release + "/" + relative_path
end
try_sudo "mkdir -p #{File.dirname(destination_file)}"
top.upload(tmp_origin_file, destination_file)
File.delete(tmp_origin_file)
end
end
end

Download a file only if it exists with ruby

I'm doing a scraper to download all the issues of The Exile available at http://exile.ru/archive/list.php?IBLOCK_ID=35&PARAMS=ISSUE.
So far, my code is like this:
require 'rubygems'
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"
for number in 120..290
numero = BASE_exile_URL + number.to_s + ".pdf"
puts "Downloading issue #{number}"
open(numero) { |f|
File.open("#{DATA_DIR}/#{number}.pdf",'w') do |file|
file.puts f.read
end
}
end
puts "done"
The thing is, a lot of the issue links are down, and the code creates a PDF for every issue and, if it's empty, it will leave an empty PDF. How can I change the code so that it can only create and copy a file if the link exists?
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
url_template = "http://exile.ru/docs/pdf/issues/exile%d.pdf"
filename_template = "#{DATA_DIR}/%d.pdf"
(120..290).each do |number|
pdf_url = url_template % number
print "Downloading issue #{number}"
# Opening the URL downloads the remote file.
open(pdf_url) do |pdf_in|
if pdf_in.read(4) == '%PDF'
pdf_in.rewind
File.open(filename_template % number,'w') do |pdf_out|
pdf_out.write(pdf_in.read)
end
print " OK\n"
else
print " #{pdf_url} is not a PDF\n"
end
end
end
puts "done"
open(url) downloads the file and provides a handle to a local temp file. A PDF starts with '%PDF'. After reading the first 4 characters, if the file is a PDF, the file pointer has to be put back to the beginning to capture the whole file when writing a local copy.
you can use this code to check if exist the file:
require 'net/http'
def exist_the_pdf?(url_pdf)
url = URI.parse(url_pdf)
Net::HTTP.start(url.host, url.port) do |http|
puts http.request_head(url.path)['content-type'] == 'application/pdf'
end
end
Try this:
require 'rubygems'
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"
for number in 120..290
numero = BASE_exile_URL + number.to_s + ".pdf"
open(numero) { |f|
content = f.read
if content.include? "Link is missing"
puts "Issue #{number} doesnt exists"
else
puts "Issue #{number} exists"
File.open("./#{number}.pdf",'w') do |file|
file.write(content)
end
end
}
end
puts "done"
The main thing I added is a check to see if the string "Link is missing". I wanted to do it using HTTP status codes but they always give a 200 back, which is not the best practice.
The thing to note is that with my code you always download the whole site to look for that string, but I don't have any other idea to fix it at the moment.

Calling multiple methods on a CSV object

I have constructed an Event Manager class that performs parsing actions on a CSV file, and produces html letters using erb. It is part of a jumpstart labs tutorial
The program works fine, but I am unable to call multiple methods on an object without the earlier methods interfering with the later methods. As a result, I have opted to create multiple objects to call instance methods on, which seems like a clunky inelegant solution. Is there a better way to do this, where I can create a single new object and call methods on it?
Like so:
eventmg = EventManager.new("event_attendees.csv")
eventmg.print_valid_phone_numbers
eventmg_2 = EventManager.new("event_attendees.csv")
eventmg_2.print_zipcodes
eventmg_3 = EventManager.new("event_attendees.csv")
eventmg_3.time_targeter
eventmg_4 = EventManager.new("event_attendees.csv")
eventmg_4.day_of_week
eventmg_5 = EventManager.new("event_attendees.csv")
eventmg_5.create_thank_you_letters
The complete code is as follows
require 'csv'
require 'sunlight/congress'
require 'erb'
class EventManager
INVALID_PHONE_NUMBER = "0000000000"
Sunlight::Congress.api_key = "e179a6973728c4dd3fb1204283aaccb5"
def initialize(file_name, list_selections = [])
puts "EventManager Initialized."
#file = CSV.open(file_name, {:headers => true,
:header_converters => :symbol} )
#list_selections = list_selections
end
def clean_zipcode(zipcode)
zipcode.to_s.rjust(5,"0")[0..4]
end
def print_zipcodes
puts "Valid Participant Zipcodes"
#file.each do |line|
zipcode = clean_zipcode(line[:zipcode])
puts zipcode
end
end
def clean_phone(phone_number)
converted = phone_number.scan(/\d/).join('').split('')
if converted.count == 10
phone_number
elsif phone_number.to_s.length < 10
INVALID_PHONE_NUMBER
elsif phone_number.to_s.length == 11 && converted[0] == 1
phone_number.shift
phone_number.join('')
elsif phone_number.to_s.length == 11 && converted[0] != 1
INVALID_PHONE_NUMBER
else
phone_number.to_s.length > 11
INVALID_PHONE_NUMBER
end
end
def print_valid_phone_numbers
puts "Valid Participant Phone Numbers"
#file.each do |line|
clean_number = clean_phone(line[:homephone])
puts clean_number
end
end
def time_targeter
busy_times = Array.new(24) {0}
#file.each do |line|
registration = line[:regdate]
prepped_time = DateTime.strptime(registration, "%m/%d/%Y %H:%M")
prepped_time = prepped_time.hour.to_i
# inserts filtered hour into the array 'list_selections'
#list_selections << prepped_time
end
# tallies number of registrations for each hour
i = 0
while i < #list_selections.count
busy_times[#list_selections[i]] += 1
i+=1
end
# delivers a result showing the hour and the number of registrations
puts "Number of Registered Participants by Hour:"
busy_times.each_with_index {|counter, hours| puts "#{hours}\t#{counter}"}
end
def day_of_week
busy_day = Array.new(7) {0}
d_of_w = ["Monday:", "Tuesday:", "Wednesday:", "Thursday:", "Friday:", "Saturday:", "Sunday:"]
#file.each do |line|
registration = line[:regdate]
# you have to reformat date because of parser format
prepped_date = Date.strptime(registration, "%m/%d/%y")
prepped_date = prepped_date.wday
# adds filtered day of week into array 'list selections'
#list_selections << prepped_date
end
i = 0
while i < #list_selections.count
# i is minus one since days of week begin at '1' and arrays begin at '0'
busy_day[#list_selections[i-1]] += 1
i+=1
end
#busy_day.each_with_index {|counter, day| puts "#{day}\t#{counter}"}
prepared = d_of_w.zip(busy_day)
puts "Number of Registered Participants by Day of Week"
prepared.each{|date| puts date.join(" ")}
end
def legislators_by_zipcode(zipcode)
Sunlight::Congress::Legislator.by_zipcode(zipcode)
end
def save_thank_you_letters(id,form_letter)
Dir.mkdir("output") unless Dir.exists?("output")
filename = "output/thanks_#{id}.html"
File.open(filename,'w') do |file|
file.puts form_letter
end
end
def create_thank_you_letters
puts "Thank You Letters Available in Output Folder"
template_letter = File.read "form_letter.erb"
erb_template = ERB.new template_letter
#file.each do |line|
id = line[0]
name = line[:first_name]
zipcode = clean_zipcode(line[:zipcode])
legislators = legislators_by_zipcode(zipcode)
form_letter = erb_template.result(binding)
save_thank_you_letters(id,form_letter)
end
end
end
The reason you're experiencing this problem is because when you apply each to the result of CSV.open you're moving the file pointer each time. When you get to the end of the file with one of your methods, there is nothing for anyone else to read.
An alternative is to read the contents of the file into an instance variable at initialization with readlines. You'll get an array of arrays which you can operate on with each just as easily.
"Is there a better way to do this, where I can create a single new object and call methods on it?"
Probably. If your methods are interfering with one another, it means you're changing state within the manager, instead of working on local variables.
Sometimes, it's the right thing to do (e.g. Array#<<); sometimes not (e.g. Fixnum#+)... Seeing your method names, it probably isn't.
Nail the offenders down and adjust the code accordingly. (I only scanned your code, but those Array#<< calls on an instance variable, in particular, look fishy.)

Why only the first link is fetched?

I'm trying to fetch news from Hacker News and write a link's title and URL to an HTML file. However, only the first link is getting written and others are not. What am I doing wrong?
require 'httparty'
def fetch(source)
response = HTTParty.get(source)
response["items"].each do |item|
return '' + item["title"] + ''
end
end
links = fetch('http://api.ihackernews.com/page')
File.open("/tmp/news.html", "w") do |f|
f.puts links
end
You shouldn't use return keyword in this case. It ends the method prematurely and returns only the first link. Use this instead:
require 'httparty'
def fetch(source)
response = HTTParty.get(source)
# convert response['items'] array to array of strings
response["items"].map do |item|
'' + item["title"] + ''
end
end
links = fetch('http://api.ihackernews.com/page')
links.length # => 30

Resources