Does a Markdown parser exist that can also generate Markdown in Ruby? - ruby

I want to parse a Markdown document so I get a tree structure that I am able to manipulate. Afterwards I want the output to be Markdown again.
Example:
# This is a title
And a short paragraph...
m = SomeLib.parse("# This is a tit...")
m.insert(1, "Here is a new paragraph") # or something similar
m.to_md
Should become
# This is a title
Here is a new paragraph
And a short paragraph...
Because I want to change the document heavily, I do not want to use regular expressions or similar techniques.
I looked into Maruku and BlueCloth but somehow I cannot generate Markdown again.

Probably not out of the box, but using redcarpet you could write a custom renderer to build your tree and then manipulate it.
Beware, though: in this case you can't reuse the Markdown and renderer instances, and every method in the custom renderer subclass is expected to return a string. Something like this could be a starting point:
class StackRenderer < Redcarpet::Render::Base
  attr_reader :items
  def initialize
    super
    @items = []
  end
  def header(title, level)
    items << { :text => title, :level => level, :type => :header }
    "#{'#' * level} #{title}\n\n"
  end
  def paragraph(text)
    items << { :text => text, :type => :paragraph }
    "#{text}\n\n"
  end
end
# example...
sr = StackRenderer.new
md = Redcarpet::Markdown.new(sr)
text = <<-EOF
# This is a title
And a short paragraph...
EOF
md.render(text) # => "# This is a title\n\nAnd a short paragraph...\n\n"
sr.items # => [{:type=>:header, :level=>1, :text=>"This is a title"},
# {:type=>:paragraph, :text=>"And a short paragraph..."}]
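Once sr.items holds the flat list, editing it and writing Markdown back out needs no further library support. A minimal sketch, assuming the item hashes have exactly the shape shown above; items_to_markdown is a hypothetical helper, not part of redcarpet:

```ruby
# Hypothetical helper: turn the collected items back into Markdown.
# Assumes only :header and :paragraph items, as in the example above.
def items_to_markdown(items)
  items.map do |item|
    case item[:type]
    when :header    then "#{'#' * item[:level]} #{item[:text]}"
    when :paragraph then item[:text]
    else raise ArgumentError, "unsupported item type: #{item[:type].inspect}"
    end
  end.join("\n\n") + "\n"
end

# Same structure that sr.items returned above.
items = [
  { :type => :header, :level => 1, :text => "This is a title" },
  { :type => :paragraph, :text => "And a short paragraph..." }
]

# Insert a new paragraph at position 1, as in the question.
items.insert(1, { :type => :paragraph, :text => "Here is a new paragraph" })

markdown = items_to_markdown(items)
puts markdown
```

Other element types (lists, emphasis, links and so on) would each need their own renderer method and their own case in the helper before a real document could make the round trip.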

Related

Ruby Yard documentation: how to add a "verbatim" (to generate something like a <pre> tag)

I want a piece of code, like a hash, to display with fixed typeface on the resulting html. Suppose this is the contents of my file:
=begin
One example of valid hash to this function is:
{
:name => "Engelbert",
:id => 1345
}
=end
def f hash_param
# ...
end
How to instruct yard (using the default of the version 0.9.15) so a yard doc file.rb will generate, for the hash example, the equivalent of adding 4 backslashes to the markdown format, or 4 starting empty spaces to stackoverflow, or the <pre> tag in html, resulting in a verbatim/fixed typeface format in the resulting html?
Expected output:
One example of valid hash to this function is:
{
:name => "Engelbert",
:id => 1345
}
EDIT
> gem install redcarpet
> yard doc --markup-provider redcarpet --markup markdown - file.rb
Should wrap the contents of file.rb within a <pre> tag, producing this page.
Use @example
Show an example snippet of code for an object. The first line is an optional title.
# @example One example of valid hash to this function is:
#   {
#     :name => "Engelbert",
#     :id => 1345
#   }
def f hash_param
  # ...
end
Maybe I don't get your question:
the equivalent of adding 4 backslashes to the markdown format, or 4 starting empty spaces to stackoverflow
If I use the 4 starting empty spaces in my code like this:
=begin
One example of valid hash to this function is:
    {
      :name => "Engelbert",
      :id => 1345
    }
=end
def f hash_param
# ...
end
then I get
But maybe you can also use @option:
@param hash_param
@option hash_param [String] :name The name of...
@option hash_param [Integer] :id The id of...
and you get:
Disclaimer: I used yard 0.9.26 for my examples.
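Combining the two suggestions above, a documented method might look like the following sketch. The method body, the tag descriptions and the @return tag are illustrative assumptions; only the use of @example, @param and @option comes from the answers.

```ruby
# Formats a person record (illustrative body; the original elides it).
#
# @example One example of valid hash to this function is:
#   f({
#     :name => "Engelbert",
#     :id   => 1345
#   })
#
# @param hash_param [Hash] the record to format
# @option hash_param [String] :name The name of the person
# @option hash_param [Integer] :id The id of the person
# @return [String]
def f(hash_param)
  "#{hash_param[:id]}: #{hash_param[:name]}"
end

puts f({ :name => "Engelbert", :id => 1345 })
```

The lines under @example are indented relative to the tag, which is how YARD tells the example body apart from the tag's title line.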

Ruby Conditional argument to method

I have some 'generic' methods that extract data based on CSS selectors that are usually the same across many websites. However, I have another method that accepts the CSS selector for a given website as an argument.
I need to call the get_title method if the title_selector argument is not passed. How can I do that?
Scrape method that accepts CSS selectors as arguments
def scrape(urls, item_selector, title_selector, price_selector, image_selector)
  collection = []
  urls.each do |url|
    doc = Nokogiri::HTML(open(url).read) # Opens URL
    @items = doc.css(item_selector)[0..1].map {|item| item['href']} # Sets items
    @items.each do |item| # Download each link and parse
      page = Nokogiri::HTML(open(item).read)
      collection << {
        :title => page.css(title_selector).text, # I guess I need conditional here
        :price => page.css(price_selector).text
      }
    end
    @collection = collection
  end
end
Generic title extractor
def get_title(doc)
if doc.at_css("meta[property='og:title']")
title = doc.css("meta[property='og:title']")
else doc.css('title')
title = doc.at_css('title').text
end
end
Use an or operator inside your page.css call. It will call get_title if title_selector is falsey (nil).
:title => page.css(title_selector || get_title(doc)).text,
I'm not sure what doc should actually be in this context, though.
EDIT
Given your comment below, I think you can just refactor get_title to handle all of the logic. Allow get_title to take an optional title_selector parameter and add this line to the top of your method:
return doc.css(title_selector).text if title_selector
Then, my original line becomes:
:title => get_title(page, title_selector)
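Put together, the refactored method could look like the sketch below. It assumes doc is a Nokogiri document (or anything that responds to css and at_css). Reading the content attribute of the og:title meta tag is an assumption about what the original intended, since a meta element carries no text of its own.

```ruby
# Sketch of get_title with an optional selector.
# doc is assumed to respond to #css and #at_css like a Nokogiri document.
def get_title(doc, title_selector = nil)
  # A site-specific selector wins when one is given.
  return doc.css(title_selector).text if title_selector

  # Otherwise fall back to the generic extraction.
  og_title = doc.at_css("meta[property='og:title']")
  return og_title['content'] if og_title

  title = doc.at_css('title')
  title ? title.text : nil
end
```

For callers to be able to omit the selector, scrape would also need to give title_selector a default of nil in its own signature.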

How to create a custom DSL in Ruby like YAML, Cucumber, Markdown, etc?

I currently have a Ruby-based DSL for creating slides that uses instance eval:
# slides.rb
slide {
title 'Ruby Programming'
subtitle 'A simple introduction'
bullet 'First bullet'
bullet 'Second bullet'
}
# implementation:
class DSL
  class Slide
    def title(title)
      @title = title
    end
    # ...etc...
  end
  def slide(&block)
    @slides << Slide.new.instance_eval(&block)
  end
end
dsl = DSL.new
dsl.instance_eval(File.read('slides.rb'))
Which results in something like this:
Ruby Programming
A simple introduction
First bullet
Second bullet
I would like to take this to the next level by creating a DSL that does not use Ruby syntax. Maybe something more like YAML or Markdown:
title: Ruby Programming
subtitle: A simple introduction
* First bullet
* Second bullet
How can I create a DSL/parser for this type of syntax?
Someone already mentioned Parslet, but I thought I would demo how easy it is.
require 'parslet'
require 'pp'
slides = <<EOS
title: Ruby Programming
subtitle: A simple introduction
* First bullet
* Second bullet
EOS
#Best to read the parser from the bottom up.
class SlidesParser < Parslet::Parser
rule(:eol) { str("\n") | any.absent? }
rule(:ws?) { match('[\s\t]').repeat(0) }
rule(:rest_of_line) { ws? >> (str("\n").absent? >> any).repeat(1).as(:text) }
rule(:title) { ws? >> str("title:")>> rest_of_line.as(:title) >> eol }
rule(:subtitle) { ws? >> str("subtitle:")>> rest_of_line.as(:subtitle) >> eol }
rule(:bullet) { ws? >> str("*") >> rest_of_line >> eol }
rule(:bullet_list) { bullet.repeat(1).as(:bullets) }
rule(:slide) { (title >> subtitle >> bullet_list).as(:slide) }
root(:slide)
end
# Note: parts can be made optional by adding a ".maybe" eg. => subtitle.maybe
result = SlidesParser.new.parse(slides)
pp result
#{:slide=>
# {:title=>{:text=>"Ruby Programming"@9},
# :subtitle=>{:text=>"A simple introduction"@38},
# :bullets=>[{:text=>"First bullet"@64}, {:text=>"Second bullet"@81}]}}
In Parslet, parsers are only part of the job, as they just convert text into a Ruby structure.
You then use transforms to match and replace tree nodes to build the structure you want.
# You can do lots of things here.. I am just replacing the 'text' element with their value
# You can use transforms to build your custom AST from the raw ruby tree
class SlidesTransform < Parslet::Transform
rule(:text => simple(:str)) { str }
# rule(
# :title => simple(:title),
# :subtitle => simple(:subtitle),
# :bullets => sequence(:bullets)) { Slide.new(title, subtitle, bullets) }
end
pp SlidesTransform.new.apply(result)
#{:slide=>
# {:title=>"Ruby Programming"@9,
# :subtitle=>"A simple introduction"@38,
# :bullets=>["First bullet"@64, "Second bullet"@81]}}
I believe Cucumber uses Ragel for its parser, here's a decent looking intro to it using Ruby...
Treetop is also pretty common, along with Parslet.
ANTLR, Rex and Racc... All kinds of ways to handle external DSLs.
Eloquent Ruby has a chapter on external DSL creation, from basic string parsing and regexes to using Treetop...
You can do rudimentary parsing with regexp. Something like this:
slides = <<EOS
title: Ruby Programming
subtitle: A simple introduction
* First bullet
* Second bullet
EOS
regexp = %r{
(title:\s+)(?<title>[^\n]*)|
(subtitle:\s+)(?<subtitle>[^\n]*)|
(\*\s+)(?<bullet>[^\n]*)
}x
tags = {
'title' => 'h1',
'subtitle' => 'h2',
'bullet' => 'li'
}
fUL = false
slides.lines.each {|line|
md = line.match(regexp)
md.names.select{|k| md[k]}.each {|k|
puts '<ul>' or fUL = true if k == 'bullet' && !fUL
puts '</ul>' or fUL = false if k != 'bullet' && fUL
puts "<#{tags[k]}>#{md[k]}</#{tags[k]}>"
}
}
puts '</ul>' if fUL
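The same line-by-line idea can build a data structure instead of printing HTML directly, which keeps parsing and rendering separate. A minimal sketch using only the standard library; parse_slide is a hypothetical name, and unrecognized lines are rejected rather than skipped:

```ruby
# Hypothetical line-based parser for the slide format in the question.
def parse_slide(source)
  slide = { :title => nil, :subtitle => nil, :bullets => [] }
  source.each_line do |line|
    line = line.strip
    next if line.empty?
    case line
    when /\Atitle:\s*(.*)\z/    then slide[:title] = $1
    when /\Asubtitle:\s*(.*)\z/ then slide[:subtitle] = $1
    when /\A\*\s*(.*)\z/        then slide[:bullets] << $1
    else raise ArgumentError, "cannot parse line: #{line.inspect}"
    end
  end
  slide
end

slides = <<EOS
title: Ruby Programming
subtitle: A simple introduction
* First bullet
* Second bullet
EOS

p parse_slide(slides)
```

This only scales to formats where every line can be understood on its own; nested or multi-line constructs are where a parser library such as Parslet or Treetop starts to pay off.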
Maybe it's worth a look at some current open-source implementations.
But I have to ask: why are you making your own? Why don't you use one that is already available? TOML is great.
Ruby parser implementation: https://github.com/jm/toml

Jekyll - generating JSON files alongside the HTML files

I'd like to make Jekyll create an HTML file and a JSON file for each page and post. This is to offer a JSON API of my Jekyll blog - e.g. a post can be accessed either at /posts/2012/01/01/my-post.html or /posts/2012/01/01/my-post.json
Does anyone know if there's a Jekyll plugin, or how I would begin to write such a plugin, to generate two sets of files side-by-side?
I was looking for something like this too, so I learned a bit of ruby and made a script that generates JSON representations of Jekyll blog posts. I’m still working on it, but most of it is there.
I put this together with Gruntjs, Sass, Backbonejs, Requirejs and Coffeescript. If you like, you can take a look at my jekyll-backbone project on Github.
# encoding: utf-8
#
# Title:
# ======
# Jekyll to JSON Generator
#
# Description:
# ============
# A plugin for generating JSON representations of your
# site content for easy use with JS MVC frameworks like Backbone.
#
# Author:
# ======
# Jezen Thomas
# jezenthomas@gmail.com
# http://jezenthomas.com
module Jekyll
require 'json'
class JSONGenerator < Generator
safe true
priority :low
def generate(site)
# Converter for .md > .html
converter = site.getConverterImpl(Jekyll::Converters::Markdown)
# Iterate over all posts
site.posts.each do |post|
# Encode the HTML to JSON
hash = { "content" => converter.convert(post.content)}
title = post.title.downcase.tr(' ', '-').delete("’!")
# Start building the path
path = "_site/dist/"
# Add categories to path if they exist
if (post.data['categories'].class == String)
path << post.data['categories'].tr(' ', '/')
elsif (post.data['categories'].class == Array)
path << post.data['categories'].join('/')
end
# Add the sanitized post title to complete the path
path << "/#{title}"
# Create the directories from the path
FileUtils.mkpath(path) unless File.exist?(path)
# Create the JSON file and inject the data
# (the block form closes the file once the data is written)
File.open("#{path}/raw.json", "w+") { |f| f.puts JSON.generate(hash) }
end
end
end
end
There are two ways you can accomplish this, depending on your needs. If you want to use a layout to accomplish the task, then you want to use a Generator. You would loop through each page of your site and generate a new .json version of the page. You could optionally make which pages get generated conditional upon the site.config or the presence of a variable in the YAML front matter of the pages. Jekyll uses a generator to handle slicing blog posts up into indices with a given number of posts per page.
The second way is to use a Converter (same link, scroll down). The converter will allow you to execute arbitrary code on your content to translate it to a different format. For an example of how this works, check out the markdown converter that comes with Jekyll.
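The path handling and the opt-out check that such a generator needs can be sketched without Jekyll itself. In the sketch below, json_path_for and json_enabled? are hypothetical helpers, front matter is assumed to arrive as a plain Hash, and the json key is an assumed convention rather than a Jekyll feature:

```ruby
require 'json'

# Map an HTML output path to its JSON sibling, e.g.
# /posts/2012/01/01/my-post.html -> /posts/2012/01/01/my-post.json
def json_path_for(html_path)
  html_path.sub(/\.html\z/, '.json')
end

# A page opts out of JSON generation with `json: false` in its front matter;
# pages that do not mention the key are included.
def json_enabled?(front_matter)
  front_matter.fetch('json', true) != false
end

front_matter = { 'title' => 'My post' }
html_path = '/posts/2012/01/01/my-post.html'

if json_enabled?(front_matter)
  payload = JSON.generate('title' => front_matter['title'], 'content' => '<p>Hello</p>')
  puts "#{json_path_for(html_path)}: #{payload}"
end
```

Inside a real generator the same two checks would run once per post or page before anything is written to disk.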
I think this is a cool idea!
Take a look at JekyllBot and the following code.
require 'json'
module Jekyll
class JSONPostGenerator < Generator
safe true
def generate(site)
site.posts.each do |post|
render_json(post,site)
end
site.pages.each do |page|
render_json(page,site)
end
end
def render_json(post, site)
#add `json: false` to YAML to prevent JSONification
if post.data.has_key? "json" and !post.data["json"]
return
end
path = post.destination( site.source )
#only act on post/pages index in /index.html
return if /\/index\.html$/.match(path).nil?
#change file path
path['/index.html'] = '.json'
#render post using no template(s)
post.render( {}, site.site_payload)
#prepare output for JSON
post.data["related_posts"] = related_posts(post,site)
output = post.to_liquid
output["next"] = output["next"].id unless output["next"].nil?
output["previous"] = output["previous"].id unless output["previous"].nil?
#write
#todo, figure out how to overwrite post.destination
#so we can just use post.write
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'w') do |f|
f.write(output.to_json)
end
end
def related_posts(post, site)
related = []
return related unless post.instance_of?(Post)
post.related_posts(site.posts).each do |post|
related.push :url => post.url, :id => post.id, :title => post.to_liquid["title"]
end
related
end
end
end
Both should do exactly what you want.

Setting a hyperlink text color in axlsx

I'm trying to set the foreground color of text in a hyperlink cell but it doesn't seem to work.
Using something like: sheet["A1"].color = "0000FF" works fine for a normal cell, but not for a hyperlinked cell
This code simply creates a link to cell D1 on the "Log" sheet (which works fine) but A1 never turns blue!
sheet.add_hyperlink :location => "'Log'!D1", :target => :sheet, :ref => "A1"
sheet["A1"].color = "0000FF"
Thanks!
There are two important things to do before applying a color to a link:
You have to define the color within a style, and
You have to know the exact address of the cell in question.
Styles are normally applied to rows, but in this case you want to apply one to a specific cell. This is possible, but you need to address the cell directly through the Sheet object. Also, somewhat counterintuitively, the 'add_hyperlink' method is available on the Sheet object, not the Cell, so beware of that as well.
Here is an example of how to apply a style to a cell containing a link:
p = Axlsx::Package.new
p.workbook do |wb|
wb.styles do |s|
blue_link = s.add_style :fg_color => '0000FF'
wb.add_worksheet(:name => "Anchor Link Test") do |sheet|
sheet.add_row ['Title', 'Link']
# Define the row here, we will use that later
row = sheet.add_row ['Google', 'Click to go']
# Add the hyperlink by addressing the column you have used and add 1 to the row's index value.
sheet.add_hyperlink :location => "http://www.google.com", :ref => "B#{row.index + 1}"
sheet["B#{row.index + 1}"].style = blue_link
end
s = p.to_stream()
File.open("anchor_link_test.xlsx", 'w') { |f| f.write(s.read) }
end
end
Final note: You might note that I have written this spreadsheet using the methods
s = p.to_stream()
File.open("anchor_link_test.xlsx", 'w') { |f| f.write(s.read) }
There is evidence presented on the Axlsx Github Issues Page which shows that this means of writing out the file is significantly faster than
p.serialize
Just thought that deserved mention somewhere on StackOverflow!
This seems to work:
require 'axlsx'
p = Axlsx::Package.new
ws = p.workbook.add_worksheet
ws.add_row ['hoge-hoge']
ws['A1'].color = '0000FF'
ws.add_hyperlink :location => 'F6', :target => :sheet, :ref => 'A1'
p.serialize 'where_is_my_color.xlsx'
Can you post a larger example of your code that does not set the color?
Apparently Axlsx is only applying custom styles to String data types. Fixed this by setting each column to type :string like this:
sheet.add_row [ "1", "2", "3" ], :types => [:string, :string, :string]
Thanks Randy!
