Remove header YAML with sed from a markdown file - bash

I have a Markdown file that I want to convert to HTML with pandoc, deleting the YAML header first. This is the command:
sed '/---/,/---/d' java.md | pandoc - -f markdown -t html5 --wrap=none -o java.html
and this is the header:
---
title: Instalar JAVA en Ubuntu
subtitle: Subtitle
author:
- I am an author
date: \today{}
---
The problem is that it also removes parts of the text wherever ------ appears.
What sed expression do I need?

Anchor the pattern with ^ (start of line) and $ (end of line) so that only lines consisting of exactly --- match; longer runs such as ------- are then left alone.
sed '/^---$/,/^---$/d' file.md
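Note that a sed range starts again whenever its first pattern matches, so the command above would also delete any later block enclosed between two --- lines (for example a pair of horizontal rules). Here is a minimal sketch that limits the deletion to the leading block. It assumes the header starts on line 1 and is closed by a --- line. The function name strip_front_matter is made up for illustration.

```shell
# Hypothetical helper: remove a YAML header only if it starts on line 1.
strip_front_matter() {
  awk '
    NR == 1 && /^---$/   { in_header = 1; next }  # opening delimiter on line 1
    in_header && /^---$/ { in_header = 0; next }  # closing delimiter
    !in_header                                    # print everything else
  ' "$@"
}

# Usage, mirroring the command from the question:
# strip_front_matter java.md | pandoc - -f markdown -t html5 --wrap=none -o java.html
```

Later --- lines are printed unchanged because the header flag is only ever set on the first line.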

Related

How to remove first 3 lines of multiple .md files with Mac terminal?

I need to remove the first three lines of multiple Markdown files structured like this:
|- Folder
| - 01-07-22.md
| - 02-07-22.md
| - 03-07-22.md
| - ...
I would like to do this with Mac Terminal (if possible) because I have no expertise with any coding language and thus I don't have any coding platform installed on my computer.
I would also like to know whether, besides deleting the first 3 lines, it is possible to add "##" at the very beginning of every document.
This works:
sed -i '' '1,3d' Folder/*.md
The following command also works:
sed -i '' '1i\
##' *.md
But it does not add a new line before the first line.
This does not work at all:
sed -i '' '1s
/^/##/' *.md
How do I add an empty line at the beginning and "##" at the beginning of what is now the second line? Explanation:
From this:
# First line of example .md file
Second line of example .md
...
To this:
### First line of example .md file
Second line of example .md
...
You should be able to use (GNU) sed. I hope that the Mac version supports all the required flags and features.
To delete the first 3 lines:
sed -i '1,3d' Folder/*.md
To prepend the line ## to every file:
sed -i '1i##' Folder/*.md
To prefix the existing first line with ##:
sed -i '1s/^/##/' Folder/*.md
The original files are overwritten without confirmation, but you can specify -i.bak to create backup files, e.g. Folder/01-07-22.md.bak.
BSD sed (the version shipped with macOS) always requires an argument after -i: use -i.bak or -i .bak to keep backups, or -i '' to disable backup file creation. GNU sed accepts only the attached form -i.bak, or a bare -i for no backup.
If prepending a line does not work, try a different syntax (the newline is important):
sed -i .bak '1i\
##' Folder/*.md
If that doesn't work either, there is another way to invoke sed:
sed -i .bak -e '1i\' -e '##' Folder/*.md
If you want to modify the first line and add an empty line before it, e.g. transforming
1
2
3
into
##1
2
3
use the following command:
sed -i .bak '1s/^/##/;1i\
' Folder/*.md
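Because the -i syntax differs between GNU and BSD sed, one way to sidestep the problem is to avoid -i and write through a temporary file. The sketch below is only an illustration of that idea. The helper name sed_in_place and the temporary file naming are assumptions, not part of any sed implementation.

```shell
# Hypothetical helper: apply a sed script to files without relying on -i.
# Usage: sed_in_place 'SCRIPT' file...
sed_in_place() {
  script=$1
  shift
  for f in "$@"; do
    tmp="$f.tmp.$$"
    # Only overwrite the original if sed succeeded.
    if sed -e "$script" "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # keeps the original file's permissions
      rm -f "$tmp"
    else
      rm -f "$tmp"
      return 1
    fi
  done
}

# Delete the first three lines, then prefix what is now the first line
# (originally line 4) with ##:
# sed_in_place '1,3d;4s/^/##/' Folder/*.md
```

No backup is kept here, so try it on a copy of the folder first.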

How to include the abstract in HTML output

A simple example with pandoc 2.19.2:
$ cat test.md
---
title: An Example
author: Luís
language: en-IE
abstract: |
  This is my abstract.
---
# Intro
Some text.
# Conclusion
More text.
$ pandoc -o test.html test.md
$ cat test.html
<h1 id="intro">Intro</h1>
<p>Some text.</p>
<h1 id="conclusion">Conclusion</h1>
<p>More text.</p>
The abstract does not appear in the HTML output, but in other formats it does (e.g. PDF). Is any extra parameter necessary for HTML?
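Pandoc 2.17 and later support this by default: the abstract is rendered by the default HTML template, and a template is only applied in standalone mode (-s/--standalone). For older pandoc versions a custom template has to be used. E.g., download the updated default.html5 template and pass it to pandoc via
pandoc --template=/path/to/default.html5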

BASH deleting HTML tags from the text file

I need to filter out all HTML tags from the text file (could be any sequence between <...>)
I came up with this command: cat my_file | sed 's/<[^>]*>//', but it only deletes the first tag on each line. How do I delete all the tags? Is the problem with the regular expression?
From the sed manual:
The s command can be followed by zero or more of the following flags:
g
Apply the replacement to all matches to the regexp, not just the first.
So
cat my_file | sed 's/<[^>]*>//g'
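A small sketch of the difference the g flag makes. The function name strip_tags is made up for illustration, and sed can read the file directly, so the cat is optional.

```shell
# Hypothetical helper: strip anything that looks like <...> within a line.
strip_tags() {
  sed 's/<[^>]*>//g' "$@"
}

# With g, every tag on the line is removed:
printf '%s\n' '<p>Hello <b>world</b></p>' | strip_tags
# prints: Hello world

# Usage on a file:
# strip_tags my_file
```

Since sed processes one line at a time, a tag whose < and > sit on different lines is not matched by this expression.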
If your intent is to remove all tags and keep only the text between them, use html2text or pup 'text{}':
https://github.com/ericchiang/pup
http://www.mbayer.de/html2text/
Other tools such as xidel and xmlstarlet can do this too.

Pandoc - Convert from Markdown to PDF and Set File Name From Metadata Title

The YAML metadata looks like this:
---
Tag: tag1, tag2
title: "Title I want to use"
Status: active
Name: "John Smith"
---
I currently use this command
for f in *.md; do pandoc "$f" -o "${f%.md}.pdf"; done
How do I set the command so that the file name will be taken from the title metadata?
You'll need a helper file title.plain with content
$title$
With that, the following command should do:
for f in *.md; do
pandoc "$f" -o "$(pandoc --template=title.plain -t plain "$f")".pdf
done
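A title can contain characters that are awkward in file names, or be missing altogether. The following is a rough sketch of one way to guard against that. The helper name safe_name and the fallback to the original base name are assumptions added for illustration, not something pandoc provides.

```shell
# Hypothetical helper: turn a title into a usable file name.
# $1 = title, $2 = fallback used when the title is empty.
safe_name() {
  name=$(printf '%s' "$1" | tr '/' '-')   # a slash would be read as a directory
  printf '%s\n' "${name:-$2}"
}

# Usage with the title.plain helper file described above:
# for f in *.md; do
#   title=$(pandoc --template=title.plain -t plain "$f")
#   pandoc "$f" -o "$(safe_name "$title" "${f%.md}").pdf"
# done
```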

Why does pandoc convert "<br>" to "+ \n" on html to asciidoc conversion?

I am trying to convert HTML to AsciiDoc using pandoc, but pandoc converts <br> tags into +\n instead of \n, as shown below. I also tried asciidoc-escaped_line_breaks, but nothing changed.
Terminal Command:
pandoc +RTS -K100000000 -RTS --wrap=preserve -f html -t asciidoc-escaped_line_breaks "input.html" -o "output.asciidoc"
input.html
s
<br>
s
output.asciidoc
s +
s
Expected Output:
s
s
Version: pandoc 1.19.2.4
The escaped_line_breaks extension is currently only implemented for Markdown, not for AsciiDoc.
You could use a pandoc Lua filter like the following to strip all LineBreak elements from the document:
function LineBreak()
  return {}
end
Save this as, e.g., strip-linebreaks.lua. Note that your pandoc version is very old; you need a newer one (2.0 or later) to use Lua filters. Then run:
pandoc -f html --lua-filter strip-linebreaks.lua -t asciidoc
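For a pandoc version without Lua filter support, a rough alternative is to post-process the AsciiDoc output and drop the trailing " +" markers. This is only a sketch: it assumes no line legitimately ends in a space followed by a plus sign, and the function name strip_hard_breaks is made up.

```shell
# Hypothetical helper: remove the " +" hard-line-break marker at line ends.
# In a basic regular expression the + is a literal character.
strip_hard_breaks() {
  sed 's/ +$//' "$@"
}

# Usage:
# pandoc -f html -t asciidoc input.html | strip_hard_breaks > output.asciidoc
```

Unlike the filter, this keeps the newline between the two lines, which corresponds to the expected output shown in the question.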
