Arabic-English Transliteration using unsupported font - utf-8

I am working on transliteration between Arabic and English text.
Here is the file that defines the character-by-character replacements: https://github.com/Shnoulle/Ar-PHP/blob/master/Arabic/data/Transliteration.xml
The issue is this:
I am dealing with the fonts robert_bold.ttf and robert_regular_0.ttf, which contain some special characters with an underline or overline.
I have the .ttf files, so I can see these characters on my system. But in my application, and in the Transliteration.xml above, they are treated as junk such as [, } and so on.
How can I add support for these unsupported characters to the Transliteration.xml file?
<pair>
<search>ي</search>
<replace>y</replace>
</pair>
<pair>
<search>ى</search>
<replace>a</replace>
</pair>
<pair>
<search>أ</search>
<replace>^</replace> <!-- here it should be one of those characters: s_ (s with underscore), which is not supported -->
</pair>

It seems that the font is not Unicode encoded but contains the underlined letters at some arbitrarily assigned codes. While this works up to a point, it does not work across applications, of course. It works only when that specific font is used.
The proper way is to use correct Unicode characters such as U+1E0F LATIN SMALL LETTER D WITH LINE BELOW “ḏ” and, for rendering, try to find fonts containing it.
An alternative is to use just basic Latin letters with some markup, say <u>d</u>. This means that the text must not be treated as plain text in later processing, and in rendering, the markup should be interpreted as requesting a line under the letter(s).
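
The first suggestion could look roughly like the Python sketch below. The mapping table is hypothetical: the codes the font really uses for its underlined letters have to be looked up in the .ttf file, and the "[" and "}" entries are only placeholders borrowed from the junk characters mentioned in the question. The sketch builds each underlined letter from a base letter plus U+0331 COMBINING MACRON BELOW and then normalizes to NFC, so that pairs with a precomposed form (such as d, which becomes U+1E0F) collapse to a single code point.

```python
import unicodedata

# COMBINING MACRON BELOW: draws a line under the preceding letter.
LINE_BELOW = "\u0331"

# Hypothetical mapping from the font's private codes to base letters.
# The real assignments must be read from the font itself.
LEGACY_TO_BASE = {"[": "s", "}": "d"}


def to_unicode(text: str) -> str:
    """Replace legacy font codes with real Unicode underlined letters."""
    out = []
    for ch in text:
        base = LEGACY_TO_BASE.get(ch)
        out.append(base + LINE_BELOW if base else ch)
    # NFC composes pairs that have a precomposed form (d + U+0331 -> U+1E0F);
    # pairs without one (s + U+0331) stay as two code points.
    return unicodedata.normalize("NFC", "".join(out))
```

The resulting strings can then be used as the <replace> values in Transliteration.xml, provided the rendering font contains the letters or supports the combining mark.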

Related

Cursor shifted after some spanish accent marks in RStudio editor

When editing lines of code in RStudio that contain Spanish accents (e.g. á, é...), the text I type appears one position before the cursor. For example, in:
a <- tibble(b = c("01", "02", "03", "04", "05"),
c = c("Amazonas", "Áncash", "Apurímac","Arequipa", "Ayacucho"))
if I place the cursor after the c in "Apurímac" and type an "o", I get "Apurímaoc" instead of "Apurímaco".
I've seen this happen in lines with Spanish accents (e.g. á, é...) and only after the accented characters. Surprisingly, it doesn't seem to happen after capitalized accented characters, like Á in "Áncash". I've tried changing the font in the RStudio settings, as suggested in several other answers, with no luck. I suspect it might be related to copying from the clipboard, but I'm not sure. The code runs fine, but it's quite annoying.
I'm running RStudio 1.4.1103 on macOS 11.4.
This occurs because RStudio's editor is not able to properly position the cursor in Unicode text that uses combining marks. The example in your case is í, which is made up of two code points:
LATIN SMALL LETTER I
COMBINING ACUTE ACCENT
See https://apps.timwhitlock.info/unicode/inspect?s=i%CC%81 for more details.
Compare this to the NFC normalization of that same character, í, which looks the same but is made up of a single code point:
LATIN SMALL LETTER I WITH ACUTE
See https://apps.timwhitlock.info/unicode/inspect?s=%C3%AD for more details.
Unfortunately, until this is resolved, the best solution is to use the NFC-normalized version of this character, that is, LATIN SMALL LETTER I WITH ACUTE. Alternatively, use a Unicode escape (e.g. "\u00ed") in place of that character.
See also: https://github.com/rstudio/rstudio/issues/8859
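
If many strings are affected, the normalization can be done in bulk before pasting the text into the editor. Below is a minimal Python sketch of the idea using the standard unicodedata module. The helper names has_combining_marks and to_nfc are made up for this illustration.

```python
import unicodedata


def has_combining_marks(text: str) -> bool:
    """True if the text contains any combining mark (e.g. U+0301)."""
    return any(unicodedata.combining(ch) for ch in text)


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark into a single code point."""
    return unicodedata.normalize("NFC", text)


decomposed = '"Apuri\u0301mac"'      # i followed by COMBINING ACUTE ACCENT
composed = to_nfc(decomposed)        # i with acute as one code point

print(has_combining_marks(decomposed), has_combining_marks(composed))
print(len(decomposed) - len(composed))   # one code point fewer after NFC
```

Both strings look the same on screen, but only the composed one avoids the combining mark that confuses the cursor.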

Xcode font not printing letters with accent marks correctly

I'm using the font Apple SD Gothic Neo. The letters print fine except when I have one with an accent mark, like ú:
This is not a custom font, and it happens on all font weights. If it makes a difference, I'm pulling the string from Firebase.
Why is this happening and what can I do?
Use a different font.
When a font lacks a glyph, that glyph is substituted from another font, resulting in a typographical mismatch. That’s what’s happening here. You are using a font that is very Unicode-incomplete for Latin alphabet characters. It is intended for Korean! Use a more appropriate font.

ZPL fieldblock ^FB for unicode fonts

I am using a ZQ520, which already supports Unicode, and I am loading the font as follows:
^XA ^CWZ,E:TT0003M_.FNT^FS^XZ
I can use the font to print Arabic as follows:
^FO100,50^CI28^AZN,0,25^FD ARABIC TEXT HERE ^FS
It works fine, but when I use ^FB with ^FO, the Arabic letters get messed up and separated (in Arabic, they are connected). Here is an example:
^FO100,50^FB200,,,R,^CI28^AZN,0,25^FD ARABIC TEXT HERE^FS
So it seems that ^FB does not support the Unicode font. Page 187 of the manual says:
"The ^FB command does not support complex text. For complex text support, use ^TB."
And page 179:
"The Field Block (^FB) command cannot support the large TrueType fonts."
Is there a way around this? Since Arabic is written right to left, I am trying to make the text right-aligned and multi-line, as some strings are long.
I managed to print out word wrapping Arabic text using ^TB using the following code. It may be useful to adapt for your own purposes.
^XA^LRN^CI28^CWZ,E:TT0003M_.FNT^FS
^FO600,10,2
^AZN,50,40
^TBN,600,100
^FH
^FD
arabic text here
^FS
^PQ1
^XZ
Useful links:
TB command (some extra info compared to below link): https://support.zebra.com/cpws/docs/zpl/TB_Command.pdf
Please note that it states the ^TB command must be issued after any ^Ax (font selection) command
ZPL Manual: https://www.zebra.com/content/dam/zebra/manuals/en-us/software/zpl-zbi2-pm-en.pdf
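
If the label is generated from a program, the template above can be filled in with a small helper. This is only a sketch in Python: the function name build_tb_label and its parameters are invented for the example, the ZPL commands are copied unchanged from the answer, and the text is inserted verbatim with no escaping. The result is encoded as UTF-8 because the template selects ^CI28.

```python
def build_tb_label(text: str, font_path: str = "E:TT0003M_.FNT",
                   width: int = 600, height: int = 100) -> bytes:
    """Fill the ^TB template from the answer with the given text."""
    zpl = (
        f"^XA^LRN^CI28^CWZ,{font_path}^FS\n"
        "^FO600,10,2\n"
        "^AZN,50,40\n"                 # font selection comes before ^TB
        f"^TBN,{width},{height}\n"     # text block width and height in dots
        "^FH\n"
        "^FD\n"
        f"{text}\n"                    # inserted as-is, no escaping attempted
        "^FS\n"
        "^PQ1\n"
        "^XZ\n"
    )
    return zpl.encode("utf-8")


label = build_tb_label("\u0645\u0631\u062d\u0628\u0627")  # sample Arabic word
```

How the bytes are sent to the printer (USB, Bluetooth, network) is outside the scope of this sketch.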

QTextDocument print to pdf and unicode

I am trying to print a PDF file from a QTextDocument. The content of the document is set by setHtml().
Simplified example:
QTextDocument document;
document.setHtml("<h1>My html \304\205</h1>"); // Octal encoded ą
QPrinter printer(QPrinter::HighResolution);
printer.setPageSize(QPrinter::A4);
printer.setOutputFormat(QPrinter::PdfFormat);
printer.setOutputFileName("cert.pdf");
document.print(&printer);
It does not work as expected on Windows (MSVC): I get a PDF file with "?" in place of most Polish characters. It works on Ubuntu.
On Windows it produces a PDF with an embedded subset of the Tahoma font. How can I force QPrinter or QPrintEngine to embed more characters from this (or any other) font?
As pepe suggested in the comments, I needed to do one of the following with this string:
wrap it in QString::fromUtf8
wrap it in tr() (when joining translated parts)
use an HTML numeric character reference (e.g. &#261; for ą)
My original HTML in the program was built from tr() parts, but I forgot to octal-escape some of them (which worked with GCC but not with MSVC, even with the source saved as UTF-8 with BOM).
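
Why the explicit UTF-8 conversion matters can be shown outside Qt. The Python sketch below takes the same two octal-escaped bytes and decodes them twice. Latin-1 is used here only as a stand-in for "some other 8-bit encoding"; which encoding the MSVC build actually applied is an assumption not confirmed in the thread.

```python
import html

# Same octal escapes as in the question: 0xC4 0x85 is the UTF-8 form of ą.
raw = b"<h1>My html \304\205</h1>"

as_utf8 = raw.decode("utf-8")       # what QString::fromUtf8 would see
as_latin1 = raw.decode("latin-1")   # illustrative wrong interpretation

print(as_utf8 == "<h1>My html \u0105</h1>")   # bytes read as one character
print(len(as_latin1) - len(as_utf8))          # wrong codec yields two chars

# The numeric character reference avoids the byte-level question entirely.
print(html.unescape("&#261;") == "\u0105")
```

A numeric reference is pure ASCII in the source file, so it does not depend on how the compiler reads the source encoding.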

How to get glyph unicode representation of Unicode character

Windows uses the Uniscribe library to substitute Arabic and Indic typed characters based on their position. The new glyph still has the original Unicode value of the typed character, although it has its own dedicated representation in Unicode.
How can I get the Unicode value of what is actually displayed, not of what is typed?
There are lots of tools for this, such as ICU, Charmap and others. I recommend http://unicode.codeplex.com, which uses the Unicode Character Database to represent characters.
Note that Unicode only provides information about characters and does not specify how they are rendered; it merely suggests that implementations draw them like its examples. So to view each code point you need a font with broad Unicode coverage, such as Arial Unicode MS, which is the largest and the best choice on the Windows platform.
Most characters are implemented in this font, but for new characters you need an update for it (if such an update exists), or you can use a font that you know implements the characters you want.
Your interpretation of what is happening in Uniscribe is not correct.
Once you have glyphs, the original information is gone; there is no reliable way to go back to Unicode.
Even without going to Arabic, there is no way to distinguish whether the glyph for the fi ligature (for example) comes from 'f' and 'i' (U+0066 U+0069) or from 'ﬁ' (U+FB01).
(http://www.fileformat.info/info/unicode/char/fb01/index.htm)
Also, some of the resulting glyphs do not have a Unicode value associated with them, so there is no "Unicode of what is actually displayed".
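
The ambiguity can be seen at the character level with a short Python sketch. It does not inspect the glyphs that Uniscribe produces; it only looks at the few shaped forms that happen to have their own code points (compatibility presentation forms) and shows that several of them fold back to the same typed character, so the shape alone does not identify what was typed. The helper name typed_form is made up for this example.

```python
import unicodedata


def typed_form(text: str) -> str:
    """Fold compatibility presentation forms back to ordinary characters."""
    return unicodedata.normalize("NFKC", text)


# The ligature character and the two-letter sequence are different strings
# that a font may draw with the same glyph.
ligature = "\ufb01"
print(ligature == "fi", typed_form(ligature) == "fi")

# Isolated, final, initial and medial presentation forms of ARABIC LETTER BEH.
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
print({typed_form(form) for form in beh_forms} == {"\u0628"})
```

Glyphs that have no code point at all, as mentioned above, cannot be handled this way.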

Resources