Ruby: how to prevent Redcarpet from rendering HTML code in the output?

I am using Redcarpet to render user-supplied data in a web page.
I see that it is very easy for a user to inject malicious HTML code.
I have tried several Redcarpet initializer options to keep malicious code from being rendered in the output, but nothing works:
Trying filter_html:
markdown =
  Redcarpet::Markdown.new(
    Redcarpet::Render::HTML,
    filter_html: true
  )
markdown.render("<style>style</style> <script>alert()</script>")
# => "<p><style>style</style> <script>alert()</script></p>\n"
Trying escape_html:
markdown =
  Redcarpet::Markdown.new(
    Redcarpet::Render::HTML,
    escape_html: true
  )
markdown.render("<style>style</style> <script>alert()</script>")
# => "<p><style>style</style> <script>alert()</script></p>\n"

These are options for the renderer, not the parser, so you need to pass them to the renderer, and then pass the configured renderer to the parser, e.g.:
markdown =
  Redcarpet::Markdown.new(
    Redcarpet::Render::HTML.new(escape_html: true),
    # other parser options here, e.g.
    autolink: true
  )

Related

Keep custom code block attributes in pandoc when converting to Markdown

I am converting an org file to Markdown (specifically commonmark). I am adding a custom attribute to my code blocks, which the commonmark writer does not support, and strips them from the code block during conversion. I am trying to find a way to keep my custom attributes.
This is what I have:
#+begin_src python :hl_lines "2"
def some_function():
    print("foo bar")
    return
#+end_src
This is what I want in my .md file:
``` python hl_lines="2"
def some_function():
    print("foo bar")
    return
```
After doing some research, I think a filter can solve my issue: I am now playing with panflute, a python lib for writing pandoc filters.
I found some relevant questions, but they apply to other conversions (rST -> html, rst -> latex) and I don't know enough Lua to translate the code into Python and the org -> md conversion.
Thanks for any help.
I was able to write a script, posting it here for future Python-based questions about pandoc filters.
The filter below requires panflute, but there are other libs for pandoc filters in Python.
import panflute
def keep_attributes_markdown(elem, doc, format="commonmark"):
    """Keep custom attributes specified in code block headers when exporting to Markdown"""
    if isinstance(elem, panflute.CodeBlock):
        # The first class is the language, e.g. "python" becomes ".python".
        language = "." + elem.classes[0] if elem.classes else ""
        attributes = " ".join(
            [key + "=" + value for key, value in elem.attributes.items()]
        )
        header = "``` { " + " ".join([language, attributes]).strip() + " }"
        panflute.debug(header)
        code = elem.text.strip()
        footer = "```"
        # Emit the opening fence, the code, and the closing fence as raw blocks.
        content = [
            panflute.RawBlock(header, format=format),
            panflute.RawBlock(code, format=format),
            panflute.RawBlock(footer, format=format),
        ]
        return content
def main(doc=None):
    return panflute.run_filter(keep_attributes_markdown, doc=doc)
if __name__ == "__main__":
    main()
You can now run the following command:
pandoc --from=org --to=commonmark --filter=/full/path/to/keep_attributes_markdown.py --output=target_file.md your_file.org

Declare additional dependency to sphinx-build in an extension

TL;DR: From a Sphinx extension, how do I tell sphinx-build to treat an additional file as a dependency? In my immediate use case, this is the extension's source code, but the question could equally apply to some auxiliary file used by the extension.
I'm generating documentation with Sphinx using a custom extension. I'm using sphinx-build to build the documentation. For example, I use this command to generate the HTML (this is the command in the makefile generated by sphinx-quickstart):
sphinx-build -b html -d _build/doctrees . _build/html
Since my custom extension is maintained together with the source of the documentation, I want sphinx-build to treat it as a dependency of the generated HTML (and LaTeX, etc.). So whenever I change my extension's source code, I want sphinx-build to regenerate the output.
How do I tell sphinx-build to treat an additional file as a dependency? The file is not mentioned in the toctree, since it isn't part of the source. Logically, this should be something I do from my extension's setup function.
Sample extension (my_extension.py):
from docutils import nodes
from docutils.parsers.rst import Directive
class Foo(Directive):
    def run(self):
        node = nodes.paragraph(text='Hello world\n')
        return [node]
def setup(app):
    app.add_directive('foo', Foo)
Sample source (index.rst):
.. toctree::
   :maxdepth: 2
.. foo::
Sample conf.py (basically the output of sphinx-quickstart plus my extension):
import sys
import os
sys.path.insert(0, os.path.abspath('.'))
extensions = ['my_extension']
templates_path = ['_templates']
source_suffix = '.rst'
master_doc = 'index'
project = 'Hello directive'
copyright = '2019, Gilles'
author = 'Gilles'
version = '1'
release = '1'
language = None
exclude_patterns = ['_build']
pygments_style = 'sphinx'
todo_include_todos = False
html_theme = 'alabaster'
html_static_path = ['_static']
htmlhelp_basename = 'Hellodirectivedoc'
latex_elements = {
}
latex_documents = [
    (master_doc, 'Hellodirective.tex', 'Hello directive Documentation',
     'Gilles', 'manual'),
]
man_pages = [
    (master_doc, 'hellodirective', 'Hello directive Documentation',
     [author], 1)
]
texinfo_documents = [
    (master_doc, 'Hellodirective', 'Hello directive Documentation',
     author, 'Hellodirective', 'One line description of project.',
     'Miscellaneous'),
]
Validation of a solution:
Run make html (or sphinx-build as above).
Modify my_extension.py to replace Hello world by Hello again.
Run make html again.
The generated HTML (_build/html/index.html) must now contain Hello again instead of Hello world.
It looks like the note_dependency method in the build environment API should do what I want. But when should I call it? I tried various events but none seemed to hit the environment object in the right state. What did work was to call it from a directive.
import os
from docutils import nodes
from docutils.parsers.rst import Directive
import sphinx.application
class Foo(Directive):
    def run(self):
        self.state.document.settings.env.note_dependency(__file__)
        node = nodes.paragraph(text='Hello done\n')
        return [node]
def setup(app):
    app.add_directive('foo', Foo)
If a document contains at least one foo directive, it'll get marked as stale when the extension that introduces this directive changes. This makes sense, although it could get tedious if an extension adds many directives or makes different changes. I don't know if there's a better way.
Inspired by Luc Van Oostenryck's autodoc-C.
As far as I know, app.env.note_dependency can be called from a doctree-read event handler to add any file as a dependency of the document currently being read.
So in your use case, I assume this would work:
from docutils import nodes
from sphinx.application import Sphinx
def doctree_read(app: Sphinx, doctree: nodes.document) -> None:
    # Record this extension's source file as a dependency of the document being read.
    app.env.note_dependency(__file__)
def setup(app: Sphinx):
    app.connect("doctree-read", doctree_read)

How to save a parsed Gherkin into a Feature File (Ruby)

I need to programmatically modify Cucumber feature files.
I have parsed a feature file using gherkin's gem 'gherkin/parser'.
The problem I find is that after parsing, I end up with a hash with the following data as example:
{:type=>:GherkinDocument, :feature=>{:type=>:Feature, :tags=>[], :location=>{:line=>1, :column=>1}, :language=>"en", :keyword=>"Feature", :name=>"MyFeature", :description=>" As an user\n I want to test a feature", :children=>[{:type=>:Scenario, :tags=>[{:type=>:Tag, :location=>{:line=>5, :column=>3}, :name=>"@MyTag"}], :location=>{:line=>6, :column=>3}, :keyword=>"Scenario", :name=>"My scenario", :steps=>[{:type=>:Step, :location=>{:line=>7, :column=>5}, :keyword=>"Given ", :text=>"I start the app"}, {:type=>:Step, :location=>{:line=>8, :column=>5}, :keyword=>"And ", :text=>"I generate a test user"}, {:type=>:Step, :location=>{:line=>9, :column=>5}, :keyword=>"And ", :text=>"I finish the flow"}]}]}, :comments=>[]}
Is it possible to convert this GherkinDocument generated by the parser back into a plain-text feature file and save it? What method or gem should I use?
According to the docs, you would use the Ruby Gherkin::Pickles::Compiler:
require 'gherkin/parser'
require 'gherkin/pickles/compiler'
parser = Gherkin::Parser.new
gherkin_document = parser.parse("Feature: ...")
# Make changes to gherkin_document
pickles = Gherkin::Pickles::Compiler.new.compile(gherkin_document)
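Separately from the compiler call above, the hash shown in the question already carries the keywords, names, tags, and steps needed to write the text back out by hand. The following is only a minimal sketch under that assumption: `feature_to_text` is a hypothetical helper, not part of the gherkin gem, the sample data is a trimmed copy of the question's hash (Gherkin tags start with `@`), and backgrounds, examples tables, doc strings, and comments are ignored.

```ruby
# Hypothetical helper: turn a parsed GherkinDocument hash (shape as in the
# question) back into feature-file text. Not part of the gherkin gem.
def feature_to_text(document)
  feature = document.fetch(:feature)
  lines = []

  tags = feature.fetch(:tags, []).map { |tag| tag[:name] }
  lines << tags.join(" ") unless tags.empty?
  lines << "#{feature[:keyword]}: #{feature[:name]}"
  # The description is emitted exactly as stored, including its indentation.
  lines << feature[:description] if feature[:description]

  feature.fetch(:children, []).each do |child|
    lines << ""
    child_tags = child.fetch(:tags, []).map { |tag| tag[:name] }
    lines << "  #{child_tags.join(' ')}" unless child_tags.empty?
    lines << "  #{child[:keyword]}: #{child[:name]}"
    child.fetch(:steps, []).each do |step|
      # Step keywords already end with a space ("Given ", "And ").
      lines << "    #{step[:keyword]}#{step[:text]}"
    end
  end

  lines.join("\n") + "\n"
end

# Trimmed sample in the same shape as the parser output in the question.
document = {
  type: :GherkinDocument,
  feature: {
    type: :Feature, tags: [], language: "en",
    keyword: "Feature", name: "MyFeature",
    description: "  As an user\n  I want to test a feature",
    children: [
      { type: :Scenario, tags: [{ type: :Tag, name: "@MyTag" }],
        keyword: "Scenario", name: "My scenario",
        steps: [
          { type: :Step, keyword: "Given ", text: "I start the app" },
          { type: :Step, keyword: "And ", text: "I generate a test user" }
        ] }
    ]
  },
  comments: []
}

puts feature_to_text(document)
```

The returned string can be passed to File.write to save it as a .feature file. Whether the hash keys match exactly depends on the gherkin version in use, so treat the key names as taken from the question rather than from the gem's documentation.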

Add CSS minifier to Sprockets

I have a web application which uses rack.
The code:
set :assets, (Sprockets::Environment.new { |env|
  env.js_compressor = Uglifier.new({
    :output => {
      :preserve_line => true,
      :bracketize => true,
      :beautify => true,
      :indent_level => 4,
      :semicolons => true,
    },
    :mangle => false
  })
  env.append_path(APP_ROOT + "/app/assets/images")
  env.append_path(APP_ROOT + "/app/assets/javascripts")
  env.append_path(APP_ROOT + "/app/assets/stylesheets")
})
I now want to add a CSS minifier to it.
Can someone explain why only JavaScript files are run through the JS compressor above?
Can I add something like env.css_compressor = YUI::CssCompressor.new() after the js_compressor assignment to get this done?
UPDATE: Well, the second one actually worked. But I have no clue how it worked :)
You hadn't set css_compressor on your Sprockets::Environment, so there was no compressor available to run on text/css assets. The JS compressor is applied only to JavaScript assets, which is why your stylesheets were left untouched.
puts Sprockets::Environment.instance_methods.inspect
#=> [..., :css_compressor, :css_compressor=, :js_compressor, :js_compressor=, ...]
To answer your question about how assets are loaded: yes, the default is to point to one load path, and you can also manipulate it to include others. From the README:
https://github.com/sstephenson/sprockets
The load path is an ordered list of directories that Sprockets uses to search for assets. To add a directory to your environment's load path, use the append_path and prepend_path methods.
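To make "ordered list of directories" concrete, here is a rough illustration of first-match-wins lookup. It is not Sprockets code: `LoadPath` is a hypothetical stand-in written with the standard library only, and the directory names are made up for the demonstration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for an ordered load path; not Sprockets' implementation.
class LoadPath
  def initialize
    @paths = []
  end

  # Add a directory at the end of the search order.
  def append_path(dir)
    @paths.push(dir)
  end

  # Add a directory at the front of the search order.
  def prepend_path(dir)
    @paths.unshift(dir)
  end

  # Return the first existing file with this logical path, or nil.
  def find(logical_path)
    @paths.each do |dir|
      candidate = File.join(dir, logical_path)
      return candidate if File.file?(candidate)
    end
    nil
  end
end

# Two directories that both contain a file named site.css.
root = Dir.mktmpdir
app_css    = File.join(root, "app", "stylesheets")
vendor_css = File.join(root, "vendor", "stylesheets")
[app_css, vendor_css].each do |dir|
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "site.css"), "/* from #{dir} */")
end

load_path = LoadPath.new
load_path.append_path(app_css)
load_path.append_path(vendor_css)
first_hit = load_path.find("site.css")   # found in app_css, which was appended first

load_path.prepend_path(vendor_css)
second_hit = load_path.find("site.css")  # vendor_css now comes first in the order

puts first_hit, second_hit
```

The point of the sketch is only the ordering: appending puts a directory last in the search, prepending puts it first, and the first directory containing the file wins.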

Pass dynamic content to template in Middleman

I'm building a static site using Middleman that has a portfolio section of all the client's recent projects.
The portfolio section will display project thumbnail images in a 3 x 3 gallery and, when clicked, each will open its corresponding HTML page inside a lightbox.
The layout for the pages inside the lightbox is the same, so rather than mark up each individual page, I thought there would be a way for Middleman to handle the content, served from a YAML data file (projects.yml).
Here's what I've got in my config.rb file:
###
# Page options, layouts, aliases and proxies
###
# A path which all have the same layout
with_layout :popup do
  page "/projects/*"
end
# Proxy (fake) files
# page "/this-page-has-no-template.html", :proxy => "/template-file.html" do
#   @which_fake_page = "Rendering a fake page with a variable"
# end
data.projects.details.each do |pd|
  proxy "/projects/#{pd[:client_name]}.html", "/projects/template.html", locals: { project: pd }, ignore: true
end
OK, so after some digging I came across the two posts below, which helped me understand how dynamic pages work in Middleman. (Unfortunately there's not a lot of documentation, and the Middleman example for dynamic pages is really basic.)
http://benfrain.com/understanding-middleman-the-static-site-generator-for-faster-prototyping/
http://forum.middlemanapp.com/discussion/134/best-way-to-use-yaml-same-html-but-parameter-driven-data-fixed/p1
My solution...
data/projects.yml (contains project details)
details:
  - client: "Company X"
    title: "Company X Event"
    video_url: ""
    logo:
      - "logo_companyx.gif"
    image_path: "/img/projects/companyx"
    total_images: 10
    content: "<p>Blah blah blah</p>"
    responsibilities:
      "<li>Something</li>
      <li>Some task</li>"
config.rb:
data.projects.details.each do |pd|
  proxy "/projects/#{pd[:client]}.html", "/projects/template.html", :layout => false, :locals => { :project => pd }, :ignore => true
end
The trick with the snippet above is to pass the entire project data object to the template via a proxy using locals, and to set the layout to false so the page doesn't inherit the default site layout (as I, or rather the client, want to display these in a lightbox popup).
The last step in the process was to create /projects/template.html.erb (in the source folder), declaring the following at the top of the template:
<% p = locals[:project] %>
This allowed me to output each property of the p object within template.html.erb.
e.g.:
<%= p[:title] %>
I hope this helps someone, as it took me a few days of playing around and LOTS of searching online for examples or hints.
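For readers who want to see the idea in isolation, the proxy-plus-locals approach amounts to rendering one ERB template once per YAML entry. The sketch below uses only Ruby's standard erb and yaml libraries, not Middleman's API; the inline YAML and template are stand-ins for data/projects.yml and template.html.erb, the second project and the slug rule for the path are invented for illustration, and keys are plain strings here rather than the symbol access used in the template above.

```ruby
require "erb"
require "yaml"

# Stand-in for data/projects.yml, trimmed to two fields per project.
projects = YAML.safe_load(<<~YML)
  details:
    - client: "Company X"
      title: "Company X Event"
    - client: "Company Y"
      title: "Company Y Launch"
YML

# Stand-in for source/projects/template.html.erb.
template = ERB.new("<h1><%= project['title'] %></h1>\n")

# Render the same template once per entry, passing the entry in as a local.
# The path format is an assumption made for this example.
pages = projects["details"].to_h do |project|
  path = "/projects/#{project['client'].downcase.tr(' ', '-')}.html"
  [path, template.result_with_hash(project: project)]
end

pages.each { |path, html| puts "#{path}: #{html}" }
```

In Middleman the loop lives in config.rb and the proxy call does the rendering, but the data flow is the same: each entry of the data file becomes the `project` local of one generated page.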
