QTextDocument print to pdf and unicode - windows

I am trying to print a PDF file from a QTextDocument. The content of the document is set with setHtml().
Simplified example:
QTextDocument document;
document.setHtml("<h1>My html \304\205</h1>"); // Octal encoded ą
QPrinter printer(QPrinter::HighResolution);
printer.setPageSize(QPrinter::A4);
printer.setOutputFormat(QPrinter::PdfFormat);
printer.setOutputFileName("cert.pdf");
document.print(&printer);
It does not work as expected on Windows (MSVC): I get a PDF file with "?" in place of most Polish characters. It works on Ubuntu.
On Windows it produces a PDF with an embedded subset of the Tahoma font. How can I force QPrinter or QPrintEngine to embed more characters from this (or any other) font?

As pepe suggested in the comments, I needed to do one of the following with this string:
Wrap it in QString::fromUtf8
Wrap it in tr() (in case of joining translated parts)
Use an HTML escape sequence (e.g. &#261; for ą)
My original HTML in the program was built from tr() parts, but I forgot to octal-escape some of them (which worked with GCC but not with MSVC, even with the source file saved as UTF-8 with BOM).
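To illustrate the byte-level idea behind these fixes, here is a minimal sketch in Python rather than Qt. It only shows that the octal escapes are the UTF-8 bytes of ą, that reading them as a single-byte code page yields different characters, and that a numeric character reference keeps the source pure ASCII. The choice of cp1250 is an assumption for the example; what MSVC or Qt actually does depends on the compiler settings and Qt version.

```python
import html

# The octal escapes \304\205 are the two UTF-8 bytes of U+0105 (a with ogonek).
raw = b"<h1>My html \304\205</h1>"

# Decoding as UTF-8 (the same idea as QString::fromUtf8) recovers the character.
as_utf8 = raw.decode("utf-8")

# Decoding with a single-byte code page turns the two bytes into two
# unrelated characters. cp1250 is only an example code page here.
as_cp1250 = raw.decode("cp1250")

# A numeric character reference sidesteps the issue: the source stays ASCII.
from_entity = html.unescape("<h1>My html &#261;</h1>")

print(from_entity == as_utf8)   # same text either way
print(as_cp1250 == as_utf8)     # the mis-decoded text differs
print(ascii(as_cp1250))
```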

Related

Interpreting a text character copied from a website and its format

I'm curious as to how this works from a low-level point of view.
I understand that computers deal with text characters using ASCII or Unicode.
For example, just now I copied a '€' character from a website to put in an email because the character is not on my keyboard.
How does Windows store this character? As a unique integer identifying this character? When I paste this character into an email or Word document, it even preserves its text format.
How does the email editor or Word know how to reproduce what I copied with exactly the same format? If the place I copied the character from was using its own special character encoding, would it turn into the wrong character when I pasted it into an email?
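As a rough illustration of the "unique integer" idea in the question: Unicode gives each character one number (its code point), and an encoding decides which bytes represent that number. The sketch below only demonstrates this distinction in Python; it does not reproduce how the Windows clipboard works internally. As far as I know, the clipboard's Unicode text format carries UTF-16, and formatting travels in separate clipboard formats, but treat that as background rather than something this code shows.

```python
euro = "\u20ac"  # the euro sign

# Unicode assigns the character a single number, its code point.
code_point = ord(euro)

# An encoding maps that number to concrete bytes; the bytes differ per encoding.
encodings = ["utf-8", "utf-16-le", "cp1252"]
encoded = {name: euro.encode(name) for name in encodings}

print(hex(code_point))
for name, data in encoded.items():
    print(name, data.hex())

# Reading bytes with the wrong encoding yields different characters,
# which is the "wrong character after pasting" scenario from the question.
wrong = encoded["utf-8"].decode("cp1252")
print(wrong == euro)
```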

ZPL fieldblock ^FB for unicode fonts

I am using a ZQ520, which already supports Unicode, and I am loading the font as follows:
^XA ^CWZ,E:TT0003M_.FNT^FS^XZ
I can use the font to print Arabic as follows:
^FO100,50^CI28^AZN,0,25^FD ARABIC TEXT HERE ^FS
It works fine, but when I use ^FB with ^FO, the Arabic letters get messed up and separated (in Arabic, they are connected). Here is an example:
^FO100,50^FB200,,,R,^CI28^AZN,0,25^FD ARABIC TEXT HERE^FS
So it seems that ^FB does not support the Unicode font. Page 187 of the manual says this:
The ^FB command does not support complex text. For complex text support, use ^TB.
And page 179:
The Field Block (^FB) command cannot support the large TrueType fonts.
Is there a way around this? Because Arabic is written right to left, I am trying to make the text right-aligned and multi-line, as some strings are long.
I managed to print word-wrapped Arabic text with ^TB using the following code. It may be useful to adapt for your own purposes.
^XA^LRN^CI28^CWZ,E:TT0003M_.FNT^FS
^FO600,10,2
^AZN,50,40
^TBN,600,100
^FH
^FD
arabic text here
^FS
^PQ1
^XZ
Useful links:
TB command (some extra info compared to below link): https://support.zebra.com/cpws/docs/zpl/TB_Command.pdf
Please note that it states the ^TB command must be issued after any ^Ax (font selection) command
ZPL Manual: https://www.zebra.com/content/dam/zebra/manuals/en-us/software/zpl-zbi2-pm-en.pdf

Arabic-English Transliteration using unsupported font

I am working on transliteration between Arabic and English text.
Here is the link which displays character by character replacement : https://github.com/Shnoulle/Ar-PHP/blob/master/Arabic/data/Transliteration.xml
Now issue is:
I am dealing with the fonts robert_bold.ttf and robert_regular_0.ttf, which have some special characters with an underline or overline, as in this screenshot.
I have the .ttf files, so I can see these fonts on my system. But in my application, and in the Transliteration.xml above, these characters show up as junk such as [, }, [ etc.
How can I add support for these unsupported characters to the Transliteration.xml file?
<pair>
<search>ي</search>
<replace>y</replace>
</pair>
<pair>
<search>ى</search>
<replace>a</replace>
</pair>
<pair>
<search>أ</search>
<replace>^</replace> <!-- Here is one of the characters: s_ (s with underscore, not supported) -->
</pair>
It seems that the font is not Unicode encoded but contains the underlined letters at some arbitrarily assigned codes. While this works up to a point, it does not work across applications, of course. It works only when that specific font is used.
The proper way is to use correct Unicode characters such as U+1E0F LATIN SMALL LETTER D WITH LINE BELOW “ḏ” and, for rendering, try to find fonts containing it.
An alternative is to use just basic Latin letters with some markup, say <u>d</u>. This means that the text must not be treated as plain text in later processing, and in rendering, the markup should be interpreted as requesting a line under the letter(s).
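A small sketch of the Unicode route, using Python's standard unicodedata module. It looks up U+1E0F, shows that it decomposes into a base letter plus a combining mark, and builds an underlined s from the same combining mark. The one-entry mapping table and the transliterate helper are hypothetical, for illustration only; the real pairs would come from whatever transliteration scheme you follow.

```python
import unicodedata

d_line_below = "\u1e0f"
print(unicodedata.name(d_line_below))

# Canonical decomposition: base letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", d_line_below)
print([f"U+{ord(c):04X}" for c in decomposed])

# The same combining mark can be attached to other base letters.
s_line_below = "s" + "\u0331"
print(unicodedata.name(s_line_below[1]))

# Hypothetical mapping: U+0630 (Arabic thal) to U+1E0F. Illustration only.
PAIRS = {"\u0630": d_line_below}

def transliterate(text: str) -> str:
    """Replace each character found in PAIRS, leaving the rest unchanged."""
    return "".join(PAIRS.get(ch, ch) for ch in text)

print(transliterate("\u0630") == d_line_below)
```

Because the replacement values are real Unicode characters, the result displays correctly in any font that covers them, not only in one custom-encoded font.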

UTF-8 but still not showing ÆØÅ (danish chars)

Take a look at this:
http://thebekker.dk/_skole/GFeksamen/
You can see that the 2nd menu item shows some weird sign instead of "Ø".
I've set UTF-8 in the meta tag, and even tried AddDefaultCharset UTF-8 in .htaccess...
Still no result. If I change to ISO-8859-1 it works fine, but that causes problems when I start making AJAX calls for content...
I don't get it.
How do I get it to use UTF-8 and show ÆØÅ?
If you declare that your content is encoded in UTF-8 with the meta tags or default charset, then your content needs to be actually encoded in UTF-8. The fact that it shows correctly when declaring your content to be encoded in ISO-8859 means that your content is actually encoded in ISO-8859. Save your source code file as UTF-8 or otherwise make sure that your content is UTF-8 encoded.
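The mismatch described above can be reproduced in a few lines. This is a minimal sketch in Python, not the site's actual code: it encodes ÆØÅ both ways and then decodes the ISO-8859-1 bytes as if they were UTF-8, which is what the browser is being asked to do.

```python
text = "\u00c6\u00d8\u00c5"  # the three Danish letters from the question

latin1_bytes = text.encode("iso-8859-1")  # one byte per letter
utf8_bytes = text.encode("utf-8")         # two bytes per letter

# File saved as ISO-8859-1 but declared as UTF-8: the bytes are not valid
# UTF-8, so the decoder substitutes replacement characters.
declared_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# File saved as UTF-8 and declared as UTF-8: round-trips cleanly.
correct = utf8_bytes.decode("utf-8")

print(latin1_bytes.hex(), utf8_bytes.hex())
print(declared_utf8 == text, correct == text)
```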
Saving the source file as "Western European (Windows)" in the EditPlus text editor did it for me, and in PHP I used utf8_encode.
You can also write these characters as Unicode character references, as for € and many others. In my company we work with many translations and languages such as French, which has many special characters.
Set your website encoding to UTF-8 and use encoding functions such as utf8_encode in PHP,
or convert manually: http://www.sql-und-xml.de/unicode-database/online-tools/

How to get glyph unicode representation of Unicode character

Windows uses the Uniscribe library to substitute typed Arabic and Indic characters based on their position. The new glyph still has the original Unicode value of the typed character, although it has its own dedicated representation in Unicode.
How can I get the Unicode value of what is actually displayed, not of what was typed?
There are lots of tools for this, such as ICU, Charmap and others. I recommend http://unicode.codeplex.com; it uses the Unicode Character Database to represent characters.
Note that Unicode only provides information about characters and does not specify how they are rendered; the charts only suggest what a character should look like. To view a given code point you therefore need a font that covers it, such as Arial Unicode MS, which is the largest and the best choice on the Windows platform.
Most characters are implemented in this font, but for newer characters you need an update for it (if one exists), or you can use a font that you know implements the characters you want.
Your interpretation of what is happening in Uniscribe is not correct.
Once you have glyphs, the original information is gone; there is no reliable way to go back to Unicode.
Even without going to Arabic, there is no way to distinguish if the glyph for the fi ligature (for example) comes from 'f' and 'i' (U+0066 U+0069) or from 'fi' (U+FB01).
(http://www.fileformat.info/info/unicode/char/fb01/index.htm)
Also, some of the resulting glyphs do not have a Unicode value associated with them, so there is no "Unicode of what is actually displayed".
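The fi example can be checked at the text level with Python's standard unicodedata module. This sketch only shows that the two spellings are different code point sequences and that compatibility normalization maps the ligature character to the plain letters. It says nothing about glyph IDs, and going from a rendered glyph back to text remains ambiguous, as explained above.

```python
import unicodedata

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, a single code point
letters = "fi"       # U+0066 U+0069, two code points

# Different code point sequences, even if a font draws them identically.
print(ligature == letters)
print(len(ligature), len(letters))

# Compatibility normalization maps the ligature character to the letters.
normalized = unicodedata.normalize("NFKC", ligature)
print(normalized == letters)
```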

Resources