Remove YAML header from markdown file

Remove YAML header from markdown file - ruby

How to remove a YAML header like this one from a text file in Ruby:
---
date: 2013-02-02 11:22:33
title: "Some Title"
Foo: Bar
...
---
(The YAML is surrounded by three dashes (-))
I tried
text.gsub(/---(.*)---/, '') # text is the variable which contains the full text of the file
but it didn't work.

The pattern /---(.|\n)*---/ is greedy: it matches from the first occurrence of --- to the last occurrence of --- and everything in between. That means if --- appears later in your file, you'll strip out not only the header but also some of the rest of the content.
This regex will remove only the YAML header:
/\A---(.|\n)*?---/
The \A ensures that it starts matching against the very first instance of --- and the ? makes the * be non-greedy, which makes it stop matching at the second instance of ---.
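A minimal sketch of the difference, assuming a made-up document whose body also contains a --- horizontal rule (the sample text and variable names are illustrative, not from the question):

```ruby
# Illustrative document: a YAML header followed by a body that itself
# contains a "---" horizontal rule.
text = <<~DOC
  ---
  date: 2013-02-02 11:22:33
  title: "Some Title"
  ---
  First paragraph.

  ---

  Second paragraph.
DOC

# Greedy: runs from the first "---" to the LAST "---" in the file.
greedy = text.sub(/---(.|\n)*---/, '')

# Anchored and non-greedy: runs from the start of the file to the next "---".
lazy = text.sub(/\A---(.|\n)*?---/, '')

puts greedy.include?('First paragraph.') # false: body text was lost
puts lazy.include?('First paragraph.')   # true: only the header was removed
```

The result keeps the newline that followed the closing ---, so calling lstrip on it is one way to tidy that up. An equivalent spelling is /\A---.*?---/m, because Ruby's m flag lets . match newlines.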

Found a solution, regex should be:
/---(.|\n)*---/

Related

before and after regex in replace block - caret isn't working as expected

I'm finding some Ansible behavior very difficult to understand. I'm trying to edit my /etc/postfix/master.cf file to turn on the submission block.
I copied the file to /tmp as I'm working on it, so my simple Ansible playbook should be:
- name: Edit the master.cf file
  replace:
    path: /tmp/master.cf
    after: '^#tlsproxy'
    before: '^#smtps'
    regexp: '^#(.*)$'
    replace: '\1'
But this doesn't work. It does work if I make one change, which is taking out the caret ^ in the before and after fields. This makes ... no sense to me at all. What makes even less sense, if I use:
after: '^#'
amazingly, it will do what I expect and uncomment all the lines after the first commented line.
But:
after: '^#t'
suddenly stops matching anything. I've read the Python page on regex and yet I'm baffled by this.
Any ideas? The playbook will work fine without the carets but I want to do this really correctly and match exactly what I want to match.
Thanks!

Ansible's replace module uses re.DOTALL, which means the . special character can match newlines.
Without re.MULTILINE, a caret matches only at the beginning of the search string. So when you specify after: '^#tlsproxy', the caret is really saying 'at the beginning of the search string, immediately followed by #tlsproxy', and because the search string is the whole file, the caret effectively means the beginning of the file.
Ansible tries to match this:
"Pattern for before/after params did not match the given file: ^#tlsproxy(?P<subsection>.*?)^#smtps"
With re.DOTALL, if you want to match the patterns from the beginning of a line, use a newline character instead:
- name: Edit the master.cf file
  replace:
    path: master.cf
    after: '\n#tlsproxy'
    before: '\n#smtps'
    regexp: '^#(.*)$'
    replace: '\1'
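The anchoring behavior can be reproduced outside Ansible. The sketch below is in Ruby rather than Python, so it relies on two stated substitutions: \A (start of string) stands in for Python's ^ without re.MULTILINE, and Ruby's m flag stands in for re.DOTALL. The file contents are a made-up, shortened stand-in for master.cf.

```ruby
# Made-up stand-in for part of master.cf; the real file has more columns.
master_cf = <<~CONF
  smtp inet
  #tlsproxy unix
  #submission inet
  #smtps inet
CONF

# \A is Ruby's "start of string", standing in for Python's ^ without
# re.MULTILINE; the m flag stands in for re.DOTALL.
anchored   = /\A#tlsproxy(?<subsection>.*?)\A#smtps/m
by_newline = /\n#tlsproxy(?<subsection>.*?)\n#smtps/m

# The file does not start with "#tlsproxy", so the anchored form finds nothing.
no_match = anchored.match(master_cf)

# The newline form finds the text between the two marker lines.
section = by_newline.match(master_cf)[:subsection]

# Ruby's ^ and $ are always line anchors, like Python's with re.MULTILINE.
uncommented = section.gsub(/^#(.*)$/, '\1')

p no_match     # nil
p uncommented  # " unix\nsubmission inet"
```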

How to convert Github-style Wiki page link to Markdown-style link in Bash script

This is my first question on Stack Overflow.
I am trying to write a Bash script to convert the kind of Github Wiki links generated for other internal Github Wiki pages into conventional Markdown-style links.
The Github Wiki link strings look like this:
[[An example of another page]]
I want to convert it to look like this:
[An example of another page](An-example-of-another-page.htm)
Documents have an unknown number of these links and I don't know the content.
Currently I have been playing around with one-line sed solutions given to other problems, like this one:
https://askubuntu.com/questions/1283471/inserting-text-to-existing-text-within-brackets
... with absolutely no success. I'm not even sure where to start with it.
Thanks.

You can try this sed
$ sed -E 's/\[(.[^]]*)\]/\1/g;s/\[(.[^]]*)]/&(\1)/g;:jump s/(\([^ \)]*)[ ]/\1-/;tjump' input_file
[An example of another page](An-example-of-another-page)
s/\[(.[^]]*)\]/\1/g - Remove brackets []
s/\[(.[^]]*)]/&(\1)/g - Duplicate the content inside brackets [], return the match &, then manipulate the match and add parenthesis (\1)
:jump s/(\([^ \)]*)[ ]/\1-/;tjump - Create a label jump, match the empty spaces within the match if it is within parenthesis and replace with -

You can use bash's internal regular expression support to find and replace instances of wiki linked [[text]] with [text](text.htm). The pattern you want to use is \[\[([^\]]*)\]\]
\[ and \] - escapes the left and right square brackets so that they aren't interpreted as meta-characters that let you match character classes
([^\]]*) captures all text inside the double brackets until the first right square bracket
From there you can evaluate this regex and use the $BASH_REMATCH array to extract and manipulate the text. You'll need to run this multiple times in order to match all instances in the string and then replace the string inline using the / and // operators.
Here's a sample script:
#!/usr/bin/env bash
wiki_string="Now, this is [[a story]] all about how
My life [[got flipped-turned upside down]]
And I'd [[like to take a minute]]
Just [[sit]] right there
I'll [[tell you]] how I [[became the prince]] of a town called Bel-Air"
printf 'Original: %s\n' "$wiki_string"
# find each instance of [[text]] and capture the text inside
# the square brackets
# if successful, BASH_REMATCH will contain the matched text and the
# captured value inside the parentheses
while [[ "$wiki_string" =~ \[\[([^\]]*)\]\] ]]; do
    # escape the [ and ] characters so we can replace [[text]]
    # with our modified value
    replace_text="${BASH_REMATCH[0]}"
    replace_text="${replace_text/\[\[/\\[\\[}"
    replace_text="${replace_text/\]\]/\\]\\]}"
    # Get the matched value inside the brackets
    link_text="${BASH_REMATCH[1]}"
    # store another copy of the text with the spaces replaced
    # with dashes and appending .htm
    link_target="${link_text// /-}.htm"
    # Finally, replace the matched [[text]] with [text](text.htm)
    wiki_string="${wiki_string//$replace_text/[$link_text]($link_target)}"
done
printf '\nUpdated: %s\n' "$wiki_string"
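For comparison only, the same transformation can be sketched in Ruby instead of Bash. The method name convert_wiki_links is hypothetical, and the .htm suffix simply follows the question's example; this is not part of the answer above.

```ruby
# Hypothetical helper: turns [[Page name]] into [Page name](Page-name.htm).
def convert_wiki_links(text)
  text.gsub(/\[\[([^\]]+)\]\]/) do
    link_text = Regexp.last_match(1)
    # Spaces become hyphens in the target only, not in the visible link text.
    "[#{link_text}](#{link_text.tr(' ', '-')}.htm)"
  end
end

sample = 'See [[An example of another page]] and [existing](link.htm).'
puts convert_wiki_links(sample)
# prints: See [An example of another page](An-example-of-another-page.htm) and [existing](link.htm).
```

Because the pattern requires double brackets, links that are already in Markdown style are left untouched.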

Thanks to HatLess for the answer which I adapted. The snippet below converts Github-style links into Markdown-style links, without the two issues that HatLess's solution had. Specifically this doesn't break pre-existing Markdown-style links and it doesn't replace spaces with hyphens within brackets unless part of a link.
sed -E 's/\[\[(.[^]]*)]]/&(support\-\1\.htm)/g;:jump s/(]\([^ \)]*)[ ]/\1-/;tjump;s/\[\[/\[/g;s/]]\(/]\(/g' | pandoc -t html

Parse Markdown that is separated by `#` with regex pattern

I'd like to parse Markdown that is separated by # (single hash).
I've been trying to do that with Ruby.
The code below outputs ["# Bob's markdown header 1\n\nsomething here.\n\n", "#", "# kitty's header 1\n\nmeow.\n\n"]
p \
arrayobj = <<-EOS.scan(/^#[^#]*/m)
# Bob's markdown header 1

something here.

## this is markdown header 2

yeah.

# kitty's header 1

meow.

EOS
However what I wanted is below.
["# Bob's markdown header 1\n\nsomething here.\n\n## this is markdown header 2\n\nyeah.\n\n", "# kitty's header 1\n\nmeow.\n\n"]
In that case, how do you parse the Markdown?

You may match a line starting with # not followed with another # and any amount of subsequent lines that do not start with such a standalone # char:
.scan(/^#(?!#).*(?:\R(?!#(?!#)).*)*/)
See the Ruby demo online.
Pattern details
^ - start of a line
#(?!#) - a # not followed with #
.* - the rest of the line
(?:\R(?!#(?!#)).*)* - zero or more consecutive occurrences of:
\R(?!#(?!#)) - any line break sequence (use \n for old Ruby versions) that is not followed with a standalone #
.* - the rest of the line.
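A short sketch of the pattern applied to the question's sample text (the constant name SECTION is illustrative):

```ruby
markdown = <<~EOS
  # Bob's markdown header 1

  something here.

  ## this is markdown header 2

  yeah.

  # kitty's header 1

  meow.
EOS

# A line starting with a single "#", plus every following line that does
# not itself start with a single "#".
SECTION = /^#(?!#).*(?:\R(?!#(?!#)).*)*/

sections = markdown.scan(SECTION)

puts sections.length                                   # 2
puts sections[0].include?('## this is markdown header 2') # true
puts sections[1].start_with?("# kitty's header 1")     # true
```

Note that the line break immediately before the next top-level header is not part of the preceding section, so each section may end with one newline fewer than the surrounding text suggests.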

Issue copying file into new file gsub with regex, variable and string?

I'm struggling with a script to target specific XML files in a directory and rename them as copies with a different name.
I put in the puts statements for debugging, and, from what I can tell, everything looks OK until the FileUtils.cp line. I tried this with simpler text and it worked, but my overly complicated cp(file, file.gsub()) seems to be causing problems that I can't figure out.
def piano_treatment(cats)
  FileUtils.chdir('12Piano')
  src = Dir.glob('*.xml')
  src.each do |file|
    puts file
    cats.each do |text|
      puts text
      if file =~ /#{text}--\d\d/
        puts "Match Found!!"
        puts FileUtils.pwd
        FileUtils.cp(file, file.gsub!(/#{text}--\d\d/, "#{text}--\d\dBass "))
      end
    end
  end
end
piano_treatment(cats)
I get the following output in Terminal:
12Piano--05Free Stuff--11Test.xml
05Free Stuff
Match Found!!
/Users/mbp/Desktop/Sibelius_Export/12Piano
cp 12Piano--05Free Stuff--ddBass Test.xml 12Piano--05Free Stuff--ddBass Test.xml
/Users/mbp/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:1551:in `stat': No such file or directory - 12Piano--05Free Stuff--ddBass Test.xml (Errno::ENOENT)
Why is \d\d showing up as "dd" when it should actually be numbers? Is this a single vs. double quote issue? Both yield errors.
Any suggestions are appreciated. Thanks.
EDIT One additional change was needed to this code. The FileUtils.chdir('12Piano') would change the directory for the first iteration of the loop, but it would revert to the source directory after that. Instead I did this:
def piano_treatment(cats)
  src = Dir.glob('12Piano/*.xml')
which sets the match path for the whole method.

Your replacement string is not a regex, so \d has no special meaning there. Inside a double-quoted Ruby string, "\d" is simply the letter d, which is why "dd" shows up in the output. You need to specify a group in your regex, and then you can use the captured group in your replacement string:
FileUtils.cp(file, file.gsub(/#{text}--(\d\d)/, "#{text}--\\1Bass "))
The parentheses in the regex form the group, which can be used (by number) in the replacement string: \1 for the first group, \2 for the second, etc. \0 refers to the entire regex match.
Update
Replaced gsub!() with gsub() and escaped the backslash in the replacement string (to treat \1 as the capture group, not a literal character... Doh!).
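A minimal sketch of the fix without touching the file system, using the file name and category from the question's output. Regexp.escape is an extra precaution that is not in the answer above.

```ruby
file = '12Piano--05Free Stuff--11Test.xml'
text = '05Free Stuff'

# In a double-quoted string, "\d" is just the letter "d".
puts "\d\d" # prints "dd"

# Regexp.escape is an added precaution (an assumption, not in the original
# answer) in case text ever contains regex metacharacters.
# "\\1" puts a literal \1 in the string, which gsub reads as capture group 1.
new_name = file.gsub(/#{Regexp.escape(text)}--(\d\d)/, "#{text}--\\1Bass ")

puts new_name # prints "12Piano--05Free Stuff--11Bass Test.xml"
```

Using gsub rather than gsub! also matters here: gsub! modifies file in place, so both arguments to FileUtils.cp end up being the new name, as the question's cp output shows.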

Escape an & (ampersand) at the start of a YAML entry?

An ampersand at the start of a YAML entry is normally seen as a label for a set of data that can be referenced later. How do you escape a legitimate ampersand at the start of a YAML entry? For example:
---
- news:
    news_text: &ldquo;Text!&rsquo;
I do not want &ldquo to be treated as a label within the YAML file. Rather, when I parse the YAML file, news_text should come back with the &ldquo; in the entry.

Just put quotes around the text
require 'yaml'
data = <<END
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
END
puts YAML::load(data).inspect
# produces => [{"news"=>{"news_text"=>"&ldquo;Text!&rsquo;"}}]

You probably can enclose the text in quotes:
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
Besides, you can probably just as well use the proper characters there:
---
- news:
    news_text: “Text!’
Putting escapes specific to a totally different markup language into a document written in another markup language seems ... odd to me, somehow.

Or you could put the string on the next line, if you put a '>' or '|' at the spot where the string used to be. Using the '|' character your parser will keep your custom line breaks, while '>' turns it into one long string, ignoring line breaks.
- news:
    news_text: >
      &ldquo;Text!&rsquo;

Putting the entire string in single quotes would do what you want:
---
- news:
    news_text: '&ldquo;Text!&rsquo;'
But, I think that any yaml library should be smart enough to do that for you?
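The suggestions above can be checked side by side with Ruby's standard yaml library. This is a sketch with illustrative variable names; it does not test the unquoted form, since how a parser treats it can vary.

```ruby
require 'yaml'

text = '&ldquo;Text!&rsquo;'

# Three ways of writing the value so the leading & is not read as an anchor.
double_quoted = YAML.load(%(news_text: "#{text}"))['news_text']
single_quoted = YAML.load("news_text: '#{text}'")['news_text']
folded        = YAML.load("news_text: >\n  #{text}\n")['news_text']

# Dumping from Ruby and loading again: the library handles the quoting.
round_trip = YAML.load({ 'news_text' => text }.to_yaml)['news_text']

p double_quoted == text  # true
p single_quoted == text  # true
p folded == "#{text}\n"  # true: a block scalar keeps one trailing newline
p round_trip == text     # true
```

The trailing newline from > or | can be dropped by writing >- or |- instead, if that matters for the output.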
