Pandoc: Error producing PDF. ! Undefined control sequence - pandoc

I want to convert a table written with Markdown to PDF. Within the Table there is a Cell with \\walhalla\path as content. I use this command to generate the output:
pandoc testMD.md -o testMD.pdf
So about as simple as it gets. I keep getting an error that has something to do with the backslashes. I assume the backslashes are interpreted as escape characters and are causing trouble becouse of it but is there any way for me to tell pandoc to just leave the backslashes as they are?
Here is the full Errormessage:
Error producing PDF.
! Undefined control sequence.
l.185 \textbackslash walhalla\path
I tried to specify the pdf-engine but that did not really change anything.
Just changing the content of the cell is not an option either as this is supposed to run as a CI-Pipeline to accept valid Markdown and convert it to pdf

Related

How to Escape Double Quotes from Ruby Page Object text

In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double-quotes, but when the page object pulls the text from the page, it pulls some other characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these - I'm not sure where I could use Regexs and be able to escape these odd characters.
Honestly you are over complicating your validation.
I would recommend simplifying what you are trying to do, start by asking yourself: Is the part in quotes a critical part of your validation?
If it is, isolate it by doing a String.contains("EMS")
If it is not, then you are probably doing too much work, only check for exactly what you need in validation:
String.beginsWith("Please select a category other than the Default")
With respect to the actual issue you are having, on a technical level you have an encoding issue. Encode your result string with utf-8 before you pass it to your validation and you will be fine.
Good luck
It's pretty likely that somewhere along the line encoded the string improperly. (A tipoff is the accented characters followed by ?.) It seems pretty likely that the quotes were converted to "smart quotes" somewhere. This table compares Window-1252 to UTF-8:
Code Point Characters UTF-8 Bytes
Unicode Windows
1252 Expected Actual
------ ---- - --- -----------
U+201C 0x93 “ “ %E2 %80 %9C
U+201D 0x94 ” †%E2 %80 %9D
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want have another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
For more debugging advice, see: 3 Steps to Fix Encoding Problems in Ruby and How to Get From They’re to They’re.

Render non english characters in asciidoctor-pdf

I am trying to write documentation with asciidoctor-pdf and I need to use characters like : ă,â,î,ş,ţ. The pdf output is rendered but the mentioned characters are rendered empty. I am not sure how to handle the issue.
For example:
I wrote this code:
= Document Title
Doc Writer <doc#example.com>
:doctype: book
:source-highlighter: coderay
:listing-caption: Listing
// Uncomment next line to set page size (default is Letter)
//:pdf-page-size: A4
A simple http://asciidoc.org[AsciiDoc] document.
== Introducţie
A paragraph followed by a simple list with square bullets.
And the result was the word Introducţie rendered as Introduc ie and finally the error:
/usr/local/rvm/gems/ruby-2.2.2/gems/pdf-core-0.2.5/lib/pdf/core/pdf_object.rb:55: warning: regexp match /.../n against to UTF-8 string
Can be a system encoding configuration problem?
Do I need to set different encoding configuration in ruby?
Thank you.
I think that if you want to be sure, you can always use the decimal entity references form. For the latin small Letter T with cedilla it is: ţ
Check this table for the complete list:
List of Unicode characters
In addition, if you want to use this special char in a title, there was an issue with it:
Section id with characters outside of Windows-1252 encoding causes warning
It seems to be fixed now, but I did not verify it.
One of possible ways to write such special characters in titles is to declare them in preamble of your asciidoc document, for example,
:t-cedil: ţ
and to call it in the main text
== pass:normal[Test-{t-cedil}]
So your title will look like
Test-ţ

How Do Validators Differentiate Between '&' and '&amp'?

Knowing that & is the html entity value of & - how do validators like w3c know this? Even when I look at my source code it's already been parsed into the correct value.
Your question is based on a false premise -- as Co_42 noted, & is not the "ASCII value" of '&'. It's a HTML character reference representing the character '&'. The ASCII value of '&' is 38 (or 0x26).
Your source code almost certainly consists of ASCII or Unicode text files. Those don't use HTML entities. If you have a string with an ampersand stored in the source code, it'll probably be stored with a bare "&". If there's a string literal somewhere containing actual HTML data, it may contain "&".
When you use some sort of tool or function to convert strings to text ready to put into for an HTML or XML document, any "&" will be (should be!) converted into "&".
When a program that reads HTML documents encounters an ASCII "&", it can assume that that's the beginning of a HTML character reference. This is okay because all ampersands in the actual text should have been converted into "&".
As a somewhat perverse example, if you open your source code in a word processor and save it as an HTML document, you'll find that in the actual file, "&" has been converted into "&" (and "&" has been converted into "&amp;"). If you then open that document in a browser, you'll find that the ampersands are displayed the same way they are when you view your source code in a text editor. The encoding step that happened when you saved the HTML document corresponds to the decoding step that happens when the browser displays it.
If you put something like "Fish & chips" directly into an actual HTML document, your HTML document will be invalid. Complicating the matter is the fact that programs such as browsers tend to try to recover from errors in document and display the documents anyway. As such, your browser may still display "Fish & chips" on the screen when you open your invalid document. However, a program such as the W3C validator, which is specifically meant to discover errors in HTML documents, will notify you that your document is invalid.

Extended charsets chars not reccognized and converting to ? mark

I have a string contain some special char like "\u2012" i.e. FIGURE DASH. When i am trying to print this on console I am getting a '?' mark instead of its symbol. I have an editor where in I can insert the symbol using alt+numpad like alt+2012. In editor it I could see the symbol save it in a xml file and get the value using nodevalue, I get a '?' mark.
To summerize I am facing problem to read extended latin a charset. What i need is When i insert such symbols and read it, i should get something like &#xXXXX;.
Please help!
TIA :)
Simply I have a String inpath = "À";, I want to get its unicode value..like &#xXXXX;
The default console encoding in Windows is some MS-DOS code page and they don't support the character. You can try running chcp 65001 before running the program but you might also need to change the console font as well.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8. You aren't doing that in many places. You need to explicitly write in your code to save and read the file in UTF-8, and not rely on the platform default encoding.

Parsing out abnormal characters

I have to work with text that was previously copy/pasted from an excel document into a .txt file. There are a few characters that I assume mean something to excel but that show up as an unrecognised character (i.e. that '?' symbol in gedit, or one of those rectangles in some other text editors.). I wanted to parse those out somehow, but I'm unsure of how to do so. I know regular expressions can be helpful, but there really isn't a pattern that matches unrecognisable characters. How should I set about doing this?
you could work with http://spreadsheet.rubyforge.org/ maybe to read / parse the data
I suppose you're getting these characters because the text file contains invalid Unicode characters, that means your '?'s and triangles could actually be unrecognized multi byte sequences.
If you want to properly handle the spreadsheet contents, i recommend you to first export the data to CSV using (Open|Libre)Office and choosing UTF-8 as file encoding.
https://en.wikipedia.org/wiki/Comma-separated_values
If you are not worried about multi byte sequences I find this regex to be handy:
line.gsub( /[^0-9a-zA-Z\-_]/, '*' )

Resources