Specifying papersize for md to pdf conversion

I've read you can specify papersize by pandoc -V papersize:"a4paper" .... This doesn't seem to work anymore. I've managed to specify B5 using
pandoc -V papersize:b5 -o out.pdf in.md
But this fails for e.g. b6. The following -V arguments also all appear to have no effect:
-V papersize:b5paper
-V papersize:"b5paper"
-V "papersize:b5paper"
What is the proper way to format this, or am I missing something more fundamental? In my opinion the documentation is far from clear, lacking e.g. any examples.

pandoc -s -V papersize:a4
Pandoc's LaTeX template will append paper to a4...

You can use either -V papersize:a4 per the manual or -V geometry:a4paper. The latter has a side-effect of giving smaller margins than the default template.

pandoc -V geometry:b5paper infile -o outfile.pdf
Using 'geometry' worked for me.
I presume the options here under 'Reference guide' would work (but I haven't tested).

In order to control papersize and margins you could run something along the lines of:
pandoc -V geometry:a4paper,margin=2cm in.md -o out.pdf
(you may try other options of the geometry.sty latex package)
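As a minimal sketch of the geometry approach above: the helper below only assembles the value passed to -V, so the paper size and margin can be varied from a script. The name geometry_arg and the 2cm default margin are assumptions made for this example, not part of pandoc. The geometry package documentation lists b6paper among its paper names, though that size was not tested here.

```shell
# Hypothetical helper: build the value for pandoc's -V option.
# $1 = paper name without the "paper" suffix (e.g. a4, b5, b6)
# $2 = margin, defaulting to 2cm (an arbitrary choice for this sketch)
geometry_arg() {
  printf 'geometry:%spaper,margin=%s\n' "$1" "${2:-2cm}"
}

geometry_arg b6          # prints geometry:b6paper,margin=2cm
geometry_arg a4 25mm     # prints geometry:a4paper,margin=25mm

# Intended use (requires pandoc and a LaTeX engine):
#   pandoc -V "$(geometry_arg b6)" in.md -o out.pdf
```

Keeping the argument in one place makes it easy to try other geometry options without retyping the whole command.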

Related

How to convert utf8 into binary

I have recently been studying binary code, and I want to know how to convert UTF-8 encoded text into binary.

I recommend using the command-line tool iconv.
The general syntax is:
$ iconv [options] -f from-encoding -t to-encoding inputfile(s) -o outputfile
Here is an online tutorial that might be of help:
https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/
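If "binary" here means the individual bits of each UTF-8 byte, iconv alone will not print them, since it only converts between encodings. A minimal sketch, assuming a POSIX shell with od and tr available; the function name utf8_to_bits is made up for this example, and it prints one byte per line as eight bits.

```shell
# Hypothetical helper: print each byte of the argument as 8 bits, one per line.
utf8_to_bits() {
  # od prints the bytes as unsigned decimals; tr puts one value per line
  printf '%s' "$1" | od -An -v -tu1 | tr -s ' ' '\n' |
  while read -r byte; do
    [ -n "$byte" ] || continue          # skip empty lines left by tr
    bits=""
    for i in 7 6 5 4 3 2 1 0; do        # most significant bit first
      bits="$bits$(( (byte >> i) & 1 ))"
    done
    printf '%s\n' "$bits"
  done
}

utf8_to_bits "A"   # prints 01000001
```

A character outside ASCII produces several lines, one for each byte of its UTF-8 encoding.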

How to get the highest numbered link from curl result?

I have created a small program consisting of a couple of shell scripts that work together. It is almost finished
and everything seems to work fine, except for one thing that I am not really sure how to do
and that I need in order to finish this project.
There seem to be many routes that can be taken, but I just can't get there.
I have some curl results with lots of unused data, including various links, and among all that data there is a bunch of similar links.
I only need to get (into a variable) the link with the highest number (without the always-same text).
The links are all similar and have this structure:
always same text
always same text
always same text
I was thinking about something like:
content="$(curl -s "$url/$param")"
linksArray= get from $content all links that are in the href section of the links
that contain "always same text"
declare highestnumber;
for file in $linksArray
do
href=${1##*/}
fullname=${href%.html}
OIFS="$IFS"
IFS='_'
read -a nameparts <<< "${fullname}"
IFS="$OIFS"
if ${nameparts[1]} > $highestnumber;
then
highestnumber=${nameparts[1]}
fi
done
echo ${nameparts[1]}_${highestnumber}.html
result:
https://always/same/link/unique-name_19.html
This was just my guess; any working code that can be run from a bash script is okay.
Thanks.
Update
I found this nice program, which is easily installed with:
# 64bit version
wget -O xidel/xidel_0.9-1_amd64.deb https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9/xidel_0.9-1_amd64.deb/download
apt-get -y install libopenssl
apt-get -y install libssl-dev
apt-get -y install libcrypto++9
dpkg -i xidel/xidel_0.9-1_amd64.deb
It looks awesome, but I'm not really sure how to tweak it to my needs.
Based on that link and the answer below, I guess a possible solution would be to
use xidel, or use "$ sed -n 's/.*href="\([^"]*\).*/\1/p' file" as suggested in this link, but tweaked to get the link together with its HTML tags, like:
< a href="https://always/same/link/same-name_17.html">always same text< /a>
then filter out everything that doesn't end with ( ">always same text< /a> )
and then use the grep and sort approach mentioned below.

Continuing from the comment, you can use grep, sort and tail to isolate the highest number in your list of similar links without too much trouble. For example, if your list of links is as you have described (I've saved them in a file dat/links.txt for the purpose of the example), you can easily isolate the highest-numbered link in a variable:
Example List
$ cat dat/links.txt
always same text
always same text
always same text
Parsing the Highest Numbered Link
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort | tail -n1); \
echo "myvar : '$myvar'"
myvar : 'https://always/same/link/same-name_19.html'
(note: the command above is all one line, separated by the line-continuation '\')
Applying Directly to Results of curl
Whether your list is in a file, or returned by curl -s, you can apply the same approach to isolate the highest number link in the returned list. You can use process substitution with the curl command alone, or you can pipe the results to grep. E.g. as noted in my original comment,
$ myvar=$(grep -o 'https:.*[.]html' < <(curl -s "$url/$param") | sort | tail -n1); \
echo "myvar : '$myvar'"
or pipe the result of curl to grep,
$ myvar=$(curl -s "$url/$param" | grep -o 'https:.*[.]html' | sort | tail -n1); \
echo "myvar : '$myvar'"
(same line continuation note.)
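One caveat with plain sort: it compares the links as text, so a link ending in _9.html would sort after one ending in _19.html. A minimal sketch of a numeric variant, assuming every link ends in _<number>.html as in the question; the sample page content and the variable names are made up for illustration.

```shell
# Sample input standing in for the output of: curl -s "$url/$param"
content='<a href="https://always/same/link/same-name_9.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
<a href="https://always/same/link/same-name_17.html">always same text</a>'

highest=$(printf '%s\n' "$content" |
  grep -o 'https://[^"]*_[0-9][0-9]*\.html' |   # isolate the links
  sed 's/.*_\([0-9][0-9]*\)\.html$/\1 &/' |     # prefix each link with its number
  sort -n |                                     # sort numerically on that prefix
  tail -n1 |                                    # keep the highest
  cut -d" " -f2-)                               # drop the prefix again

echo "highest : '$highest'"
```

With GNU sort, sort -V (version sort) may be a shorter alternative to the sed and cut steps, but it is not available everywhere.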

Why not use Xidel with xquery to sort the links and return the last?
xidel -q links.txt --xquery '(for $i in //@href order by $i return $i)[last()]' --input-format xml
The input-format parameter makes sure you don't need any html tags at the start and ending of your txt file.
If I'm not mistaken, in the latest Xidel the -q (quiet) param is replaced by -s (silent).

Pandoc: What are the available syntax highlighters?

Bullet point 18 of http://pandoc.org/demos.html#examples shows how to change the syntax highlighter used by giving an argument to --highlight-style. For example:
pandoc code.text -s --highlight-style pygments -o example18a.html
pandoc code.text -s --highlight-style kate -o example18b.html
pandoc code.text -s --highlight-style monochrome -o example18c.html
pandoc code.text -s --highlight-style espresso -o example18d.html
pandoc code.text -s --highlight-style haddock -o example18e.html
pandoc code.text -s --highlight-style tango -o example18f.html
pandoc code.text -s --highlight-style zenburn -o example18g.html
I am wondering if these are the only color schemes available. If not, how can I load a different syntax highlighter? Can I define my own?

Since pandoc 2.0.5, you can also use --print-highlight-style to output a theme file and edit it.
To me, the best way to use this option is to:
1. Pick a pleasant available style
2. Output its theme file
3. Edit the theme file
4. Use it!
1. Available Styles
Pick your style from among the ones that already exist:
2. Output its theme file
Once you have decided which style is closest to your needs, you can output its theme file using (for instance, for pygments, the default style):
pandoc --print-highlight-style pygments
so that you can store this style in a file, using, e.g.,
pandoc --print-highlight-style pygments > my_style.theme
With some shells, especially on Windows, using redirected output can lead to encoding problems. If that happens, use this instead:
pandoc -o my_style.theme --print-highlight-style pygments
3. Edit the file
Using the Skylighting JSON Themes guide, edit the file according to your need / taste.
4. Use the file
In the right folder, just use
pandoc my_file.md --highlight-style my_style.theme -o doc.html

If your pandoc --version indicates a release of 1.15.1 (from Oct 15, 2015) or newer, then you can check if the --bash-completion parameter works for you to get a full list of available built-in highlighting styles.
Run
pandoc --bash-completion
If it works, you'll see a lot of output. And it will be useful well beyond the original question above...
If --bash-completion works, then put this line towards the end of your ${HOME}/.bashrc file (on Mac OS X or Linux -- doesn't work on Windows yet):
eval "$(pandoc --bash-completion)"
Once you open a new terminal, you can use the pandoc command with "tab completion":
pandoc --h[tab]
will yield
--help --highlight-style --html-q-tags
pandoc --hi[tab]
will yield
pandoc --highlight-style
Answer to original question:
Now punch the [tab] key one more time, and you'll see
espresso haddock kate monochrome pygments tango zenburn
It's the list of all available syntax highlighters. To shorten the procedure, you could also type
pandoc --hi[tab][tab]
to get the same result.
Usefulness of Pandoc's tab completion beyond original question:
Pandoc's bash tab completion also works for all other commandline switches:
pandoc -h[tab]
yields this -- a list of all possible command line parameters:
Display all 108 possibilities? (y or n)
--ascii --indented-code-classes --template
--asciimathml --jsmath --title-prefix
--atx-headers --katex --to
--base-header-level --katex-stylesheet --toc
--bash-completion --latex-engine --toc-depth
--biblatex --latex-engine-opt --trace
--bibliography --latexmathml --track-changes
--chapters --listings --variable
--citation-abbreviations --mathjax --verbose
--columns --mathml --version
--csl --metadata --webtex
--css --mimetex --wrap
--data-dir --natbib --write
--default-image-extension --no-highlight -A
--dpi --no-tex-ligatures -B
--dump-args --no-wrap -D
--email-obfuscation --normalize -F
--epub-chapter-level --number-offset -H
--epub-cover-image --number-sections -M
--epub-embed-font --old-dashes -N
--epub-metadata --output -R
--epub-stylesheet --parse-raw -S
--extract-media --preserve-tabs -T
--file-scope --print-default-data-file -V
--filter --print-default-template -c
--from --read -f
--gladtex --reference-docx -h
--help --reference-links -i
--highlight-style --reference-odt -m
--html-q-tags --section-divs -o
--id-prefix --self-contained -p
--ignore-args --slide-level -r
--include-after-body --smart -s
--include-before-body --standalone -t
--include-in-header --tab-stop -v
--incremental --table-of-contents -w
One interesting use case for Pandoc's tab completion is this:
pandoc --print-default-d[tab][tab]
gives the list of completions for pandoc --print-default-data-file. This list gives you a unique insight into which data files your instance of Pandoc will load when it is doing its work. For example, you could investigate a detail of Pandoc's default ODT (OpenDocument Text file) output styling like this:
pandoc --print-default-data-file odt/content.xml \
| tr " " "\n" \
| tr "<" "\n" \
| grep --color "style"

The Pandoc README says:
--highlight-style=STYLE|FILE
Specifies the coloring style to be used in highlighted source code.
Options are pygments (the default), kate, monochrome,
breezeDark, espresso, zenburn, haddock, and tango.
For more information on syntax highlighting in pandoc, see
Syntax highlighting, below. See also
--list-highlight-styles.
Instead of a STYLE name, a JSON file with extension
.theme may be supplied. This will be parsed as a KDE
syntax highlighting theme and (if valid) used as the
highlighting style. To see a sample theme that can be
modified, pandoc --print-default-data-file default.theme.
The library skylighting (in older versions highlighting-kate) is used for the highlighting. If you don't like any of the provided color schemes, you can either:
Specify a .theme file as mentioned above,
when exporting to HTML, <span> tags are generated that you can style with your custom CSS, or
when exporting to LaTeX/PDF, you need to use a custom Pandoc LaTeX template and replace the $highlighting-macros$ part with your custom color definitions, as described in this issue.

If you are using Pandoc version 1.18 (released in October 2016) or later, a new answer is possible:
pandoc --list-highlight-languages
and
pandoc --list-highlight-styles
will give you all the info you were asking for.
Other new informational command line parameters added to v1.18 are:
pandoc --list-input-formats
pandoc --list-output-formats
pandoc --list-extensions
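Building on --list-highlight-styles, a small loop can render the same file once per built-in style so the results can be compared side by side. This is a sketch, assuming pandoc 1.18 or later is on the PATH; the function name render_all_styles and the example-STYLE.html output names are choices made for this example.

```shell
# Hypothetical helper: render one standalone HTML file per highlighting style.
# $1 = input file containing code blocks (e.g. code.text)
render_all_styles() {
  input=$1
  # --list-highlight-styles prints one style name per line
  for style in $(pandoc --list-highlight-styles); do
    pandoc "$input" -s --highlight-style "$style" -o "example-$style.html"
  done
}

# Usage: render_all_styles code.text
```

The function only defines the loop; nothing runs until it is called with an input file.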

Grep --byte-offset not returning the offset (Grep version 2.5.1)

Hi,
I am trying to get the position of a repeated string in a line using
Code:
grep -b -o "pattern"
On my server I am using GNU grep version 2.14 and the code works fine. However, when I deploy the same code on a different server that uses GNU grep version 2.5.1, the code does not work properly, even though the byte-offset option is available there. Any idea how to solve this?
Example:
Code:
export string="abc cat mat rat cat bat cat fat rat tat tat cat"
echo $string|grep -b -o "cat"
Expected output (and supported in grep 2.14):
4:cat
16:cat
24:cat
44:cat
But same code with grep version 2.5.1 is giving the following output:
0:cat
cat
cat
cat
Please suggest..

It was a bug in grep; some notes in its ChangeLog refer to it:
* src/grep.c (nlscan): Make this function more robust by removing
the undocumented assumption that its "lim" argument points
right after a line boundary. This will be used later to fix
--byte-offset's broken behavior. Patch #3769.
Use later versions (at least 2.5.3) where it seems fixed already.
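Where upgrading grep is not an option, the offsets can be computed another way. A minimal sketch using awk, assuming a fixed (non-regex) search string and newline-terminated input; byte_offsets is a name invented here, not a grep or awk feature.

```shell
# Hypothetical helper: print "offset:match" for every occurrence of a fixed string.
# $1 = the string to search for; input is read from stdin.
byte_offsets() {
  LC_ALL=C awk -v pat="$1" '
    {
      rest = $0; pos = base              # pos = offset of "rest" within the input
      while ((i = index(rest, pat)) > 0) {
        printf "%d:%s\n", pos + i - 1, pat
        pos += i - 1 + length(pat)       # skip past this match
        rest = substr(rest, i + length(pat))
      }
      base += length($0) + 1             # account for the line and its newline
    }'
}

string="abc cat mat rat cat bat cat fat rat tat tat cat"
echo "$string" | byte_offsets "cat"
```

For the example string from the question this prints the offsets 4, 16, 24 and 44, matching the output of grep 2.14.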

Extract a specific string from a curl'd result

Given this curl command:
curl --user-agent "fogent" --silent -o page.html "http://www.google.com/search?q=insansiate"
* Spelling is intentionally incorrect. I want to grab the suggestion as my result.
I want to be able to either grep into the page.html file perhaps with grep -oE or pipe it right from curl and never store a file.
The result should be: 'instantiate'
I need only the word 'instantiate', or the phrase, whatever google is auto correcting, is what I am after.
Here is the basic html that is returned:
<span class=spell style="color:#cc0000">Did you mean: </span><a href="/search?hl=en&ie=UTF-8&&sa=X&ei=VEMUTMDqGoOINraK3NwL&ved=0CB0QBSgA&q=instantiate&spell=1"class=spell><b><i>instantiate</i></b></a> <span class=std>Top 2 results shown</span>
So perhaps from/to of the string below, which I hope is unique enough to cover all my bases.
class=spell><b><i>instantiate</i></b></a>
I keep running into issues with greedy grep; perhaps I should run it though an html prettify tool first to get a line break or 50 in there. I don't know of any simple way to do so in bash, which is what I would ideally like this to be in. I really don't want to deal with firing up perl, and making sure I have the correct module.
Any suggestions, thank you?

As I'm sure you're aware, screen scraping is a delicate business. This command sequence is no exception since it relies on the specific structure of the page which could change at any time without notice.
grep -o 'Did you mean:\([^>]*>\)\{5\}' page.html | sed 's/.*<i>\([^<]*\)<.*/\1/'
In a pipe:
curl --user-agent "fogent" --silent "http://www.google.com/search?q=insansiate" | grep -o 'Did you mean:\([^>]*>\)\{5\}' | sed 's/.*<i>\([^<]*\)<.*/\1/'
This relies on finding five ">" characters between "Did you mean:" and the "</i>" after the word you're looking for.
Have you considered other methods of getting spelling suggestions or are you specifically interested in what Google provides?
If you have ispell or aspell installed, you can do:
echo insansiate | ispell -a
and parse the result.
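In their -a mode, ispell and aspell report a misspelled word on a line starting with &, followed by the word, a count, an offset, and a comma-separated list of suggestions. A sketch of pulling out the first suggestion; first_suggestion is a made-up helper name, and the sample line (including its suggestions) is illustrative input rather than captured ispell output.

```shell
# Hypothetical helper: print the first suggestion from "ispell -a" style output.
first_suggestion() {
  # Lines for misspelled words look like: & word count offset: sugg1, sugg2, ...
  awk -F': ' '/^&/ { split($2, s, ", "); print s[1]; exit }'
}

# Illustrative input only; real suggestions depend on the installed dictionary.
sample='& insansiate 2 0: instantiate, insatiate'
printf '%s\n' "$sample" | first_suggestion

# Intended use: echo insansiate | ispell -a | first_suggestion
```

Lines that do not start with & (the version banner, or words with no suggestions) are ignored, so the function prints nothing in those cases.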

xidel is a great utility for scraping web pages; it supports retrieving pages and extracting information in various query languages (CSS selectors, XPath).
In the case at hand, the simple CSS selector a.spell will do the trick.
xidel --user-agent "fogent" "http://google.com/search?q=insansiate" -e 'a.spell'
Note how xidel does its own page retrieval, so no need for curl in this case.
If, however, you needed curl for more exotic retrieval options, here's how you'd combine the two tools (line break for readability):
curl --user-agent "fogent" --silent "http://google.com/search?q=insansiate" |
xidel - -e 'a.spell'

curl --> tidy -asxml --> xmlstarlet sel

Edit: Sorry, did not see your Perl notice.
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
my $arg = shift // 'insansiate';
my $lwp = LWP::UserAgent->new(agent => 'Mozilla');
my $c = $lwp->get("http://www.google.com/search?q=$arg") or die $!;
my @content = split(/:/, $c->content);
for(@content) {
if(m;<b><i>(.+)</i></b>;) {
print "$1\n";
exit;
}
}
Running:
> perl google.pl
instantiate
> perl google.pl disconect
disconnect
