Pandoc - Convert from Markdown to PDF and Set File Name From Metadata Title

The YAML metadata looks like this:
---
Tag: tag1, tag2
title: "Title I want to use"
Status: active
Name: "John Smith"
---
I currently use this command:
for f in *.md; do pandoc "$f" -o "${f%.md}.pdf"; done
How do I set the command so that the file name will be taken from the title metadata?

You'll need a helper template file, title.plain, with this content:
$title$
With that, the following command should work:
for f in *.md; do
pandoc "$f" -o "$(pandoc --template=title.plain -t plain "$f")".pdf
done
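The one-liner above uses the title verbatim as a file name, so a title containing "/" would be read as a directory path, and pandoc's plain writer wraps long text at 72 columns by default, which could put a newline in the name. Below is a minimal sketch under stated assumptions: pandoc is on PATH, title.plain sits in the current directory, and safe_name and convert_all are hypothetical helper names, not part of pandoc.

```shell
#!/bin/sh
# Sketch only: assumes pandoc is on PATH and title.plain (containing $title$)
# is in the current directory. safe_name and convert_all are hypothetical
# helper names, not part of pandoc.

# Replace "/" and ":" (awkward in file names) with "-".
safe_name() {
  printf '%s' "$1" | sed 's#[/:]#-#g'
}

# Convert every *.md file in the current directory to a PDF named after its
# title metadata, falling back to the source file name when there is no title.
convert_all() {
  for f in *.md; do
    [ -e "$f" ] || continue   # no matches: skip the literal "*.md"
    # --wrap=none keeps a long title on one line in the plain-text output
    title=$(pandoc --template=title.plain --wrap=none -t plain "$f")
    [ -n "$title" ] || title=${f%.md}
    pandoc "$f" -o "$(safe_name "$title").pdf"
  done
}
```

Run convert_all from the directory that holds the .md files; with the metadata in the question it should produce "Title I want to use.pdf".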

Related

Save command output to a file without timestamp info

If I run this command in my terminal (https://hub.getdbt.com/dbt-labs/codegen/latest/):
dbt run-operation generate_model_yaml --args "{\"model_name\": "bookings"}"
I get output that looks like this:
12:53:32 Running with dbt=1.0.1
12:53:34 version: 2
models:
  - name: bookings
    description: ""
    columns:
      - name: booking_id
        description: ""
      - name: masterclient_id
        description: ""
I want to save it to a file. If I do this:
dbt run-operation generate_model_yaml --args "{\"model_name\": "bookings"}" > test.yml
this also gets saved to the file:
12:53:32 Running with dbt=1.0.1
12:53:34
whereas my desired output is just this:
version: 2
models:
  - name: bookings
    description: ""
    columns:
      - name: booking_id
        description: ""
      - name: masterclient_id
        description: ""
How can I get rid of the extra timestamp info at the beginning and save the remaining output to a file?
If you're confident that the output is always structured with those exact two timestamps, you can do:
dbt run-operation generate_model_yaml \
--args "{\"model_name\": \"bookings\"}" \
| tail -n +2 | sed '1 s/[0-9:]* *//' > test.yml
tail -n +2 removes the first line. The sed command removes the timestamp and following whitespace from the second (now first) line.
A quick look at the relevant dbt docs shows this:
The YAML for a base model will be logged to the command line
So it doesn't seem that you can instruct dbt directly to output the YAML data without the logging timestamps.
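If you'd rather not depend on exactly one log line preceding the YAML, here is a minimal sketch that instead keys on the "version:" line. It assumes the YAML always begins on a line of the form "HH:MM:SS version: ..." and that everything before that line is log output; any log lines printed after the YAML would still be kept. strip_dbt_log is a hypothetical helper name.

```shell
# Hypothetical helper: strip_dbt_log filters dbt output read on stdin.
# Assumes the YAML starts on the line "HH:MM:SS version: ..." and that
# every line before it is log output to discard.
strip_dbt_log() {
  # 1st expression: remove the "HH:MM:SS " stamp in front of "version:".
  # 2nd expression: print from the first line starting with "version:"
  # to the end of input (-n suppresses all other lines).
  sed -n \
    -e 's/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]  *\(version:\)/\1/' \
    -e '/^version:/,$p'
}
```

Usage: dbt run-operation generate_model_yaml --args '{"model_name": "bookings"}' | strip_dbt_log > test.yml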

Remove header YAML with sed from a markdown file

I have a Markdown text that I want to convert to HTML with pandoc, removing the YAML header. This is the command:
sed '/---/,/---/d' java.md | pandoc - -f markdown -t html5 --wrap=none -o java.html
and this is the header:
---
title: Instalar JAVA en Ubuntu
subtitle: Subtitle
author:
- I am an author
date: \today{}
---
The problem is that it also removes parts of the text wherever a line like ------ appears.
What sed expression do I need?
You can use ^ (start of line) and $ (end of line) to prevent ------- from being matched.
sed '/^---$/,/^---$/d' file.md
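Anchoring fixes the ------ case, but the range still deletes any later pair of lines that are exactly --- (for example, horizontal rules) together with everything between them. A minimal sketch that removes only a header block starting on line 1 follows; it also accepts ... as the closing line, which pandoc allows for YAML metadata blocks. strip_front_matter is a hypothetical helper name, and awk is swapped in for sed because the state tracking is easier to read.

```shell
# Hypothetical helper: strip_front_matter [FILE...] (reads stdin if no FILE).
# Deletes a YAML block only when it starts on line 1 with "---"; it ends at
# the next line that is exactly "---" or "...". Later "---" lines are kept.
strip_front_matter() {
  awk '
    NR == 1 && $0 == "---" { in_header = 1; next }
    in_header && ($0 == "---" || $0 == "...") { in_header = 0; next }
    !in_header
  ' "$@"
}
```

Usage: strip_front_matter java.md | pandoc - -f markdown -t html5 --wrap=none -o java.html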

Why does pandoc convert "<br>" to "+ \n" in HTML to AsciiDoc conversion?

I am trying to convert HTML to AsciiDoc using pandoc, but pandoc converts <br> tags into +\n instead of \n, as shown below. I also tried asciidoc-escaped_line_breaks, but nothing changed.
Terminal Command:
`pandoc +RTS -K100000000 -RTS --wrap=preserve -f html -t asciidoc-escaped_line_breaks "input.html" -o "output.asciidoc"`
input.html
s
<br>
s
output.asciidoc
s +
s
Expected Output:
s
s
Version: pandoc 1.19.2.4
The escaped_line_breaks extension is currently only implemented for markdown, not for AsciiDoc.
You could use a pandoc Lua filter like the following to strip all LineBreak elements from the document:
-- Returning an empty list deletes each LineBreak element (e.g. from <br>)
function LineBreak()
  return {}
end
Save this as, e.g., strip-linebreaks.lua. Note that your pandoc version is very old: Lua filters require pandoc 2.0 or newer. Then run:
pandoc -f html --lua-filter strip-linebreaks.lua -t asciidoc input.html -o output.asciidoc

How do I append pandoc's conversion results to a file?

I am trying to append the results of my pandoc HTML-to-Markdown conversion to an .md file. The following command overwrites the existing file instead of appending to it. Is there a parameter that specifies appending?
pandoc -f html -t markdown -o output.md
So you want to append the output of pandoc to output.md? Use the shell's append redirection operator, >>:
pandoc -f html -t markdown >> output.md
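When appending several conversions in a row, note that pandoc's Markdown output ends with a single newline, so the next file's first paragraph could run straight into the previous file's last one. A minimal sketch that redirects the whole loop once and inserts a blank line after each file; append_html_as_markdown is a hypothetical helper name.

```shell
# Hypothetical helper: append_html_as_markdown OUT FILE...
# Converts each HTML FILE to Markdown with pandoc and appends it to OUT.
append_html_as_markdown() {
  out=$1
  shift
  for f in "$@"; do
    pandoc -f html -t markdown "$f"
    echo   # blank line so the next file starts a new Markdown block
  done >> "$out"   # ">>" appends; OUT is opened once for the whole loop
}
```

Usage: append_html_as_markdown output.md page1.html page2.html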

Use sed to find an ID in a .txt file and use the ID to rename the file

Using wget, a webpage is downloaded as a .txt file. For convenience, the saved file is named using part of the webpage's URL, e.g. wget http://www.example.com/page/12345/ -O 12345.txt.
I run the commands from a shell script (.sh file), since it can execute multiple commands one line at a time.
After a file is downloaded, I use sed to parse out the text I want to keep. Part of that text is a line like: blah blah Product ID a5678.
What I want is to use sed to find a5678 and use this to rename the file 12345.txt to a5678.txt.
# script.sh
wget http://www.example.com/page/12345/ -O 12345.txt
sed -i '' 's/pattern/replace/g' 12345.txt
# sed command to find a5678 (in the line "blah blah Product ID a5678")
# some more sed commands
# mv 12345.txt a5678.txt (or use a variable: $var.txt)?
How do I do this?
I may also want to use the same ID, a5678, to create a folder with the same name, so that the .txt file ends up inside it, like so: /a5678/a5678.txt.
mkdir a5678 && cd a5678   # (or mkdir $var && cd $var?)
I've searched for answers for half a day but can't find any. The closest I found is "Find instance of word in files and change it to the filename", but it is the exact opposite of what I want. I've also thought about using variables, e.g. https://askubuntu.com/questions/76808/how-do-i-use-variables-in-a-sed-command, but I don't know how to save the matched characters in a variable.
I'd very much appreciate some help. Thank you! I am on a Mac running Sierra.
I've kept this minimal, so fit it into your own logic:
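in=12345.txt
# Take the first word after "Product ID" on the first matching line
# (no trailing space required, so an ID at the end of the line matches too)
out=$(grep ' Product ID ' "$in" | head -n 1 | sed 's/.* Product ID \([^ ]*\).*/\1/')
mkdir -p "$out"
mv "$in" "$out/$out.txt"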
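The same idea can be wrapped in a function that refuses to move anything when no ID is found, so an unexpected page doesn't end up in a folder with an empty name. A minimal sketch, assuming the ID is the first word after "Product ID" and that the plain sed -n ... p form (which also works with the BSD sed on macOS) is enough; file_by_product_id is a hypothetical helper name.

```shell
# Hypothetical helper: file_by_product_id FILE
# Moves FILE to ID/ID.txt, where ID is the first word after "Product ID".
file_by_product_id() {
  in=$1
  # -n with the p flag prints only lines where the substitution matched;
  # head keeps the first ID if the page mentions several.
  id=$(sed -n 's/.*Product ID \([^ ]*\).*/\1/p' "$in" | head -n 1)
  if [ -z "$id" ]; then
    echo "no Product ID found in $in" >&2
    return 1
  fi
  mkdir -p "$id" && mv "$in" "$id/$id.txt"
}
```

Usage in script.sh: wget http://www.example.com/page/12345/ -O 12345.txt && file_by_product_id 12345.txt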
Thank you all! Inspired by your answers, I solved my problem like this (without using grep):
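in=12345
# Print only the text after "ID" (and any spaces) on lines matching /pattern/
out=$(sed -n '/pattern/ s/.*ID *//p' "$in.txt")
mv "$in.txt" "$out.txt"
# Leave the 12345 folder and rename it to the ID as well
cd ..
mv "$in" "$out"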
