Cursor shifted after some Spanish accent marks in RStudio editor

When editing lines of code in RStudio that contain Spanish accents (e.g. á, é...), the text I type appears one position before the cursor. For example, in:
a <- tibble(b = c("01", "02", "03", "04", "05"),
c = c("Amazonas", "Áncash", "Apurímac","Arequipa", "Ayacucho"))
if I place the cursor after the c in "Apurímac" and type an "o", I get "Apurímaoc" instead of "Apurímaco".
I've seen this happen in lines with Spanish accents (e.g. á, é...), and only after the accented characters. Surprisingly, it doesn't seem to happen after capitalized accented characters, like Á in "Áncash". I've tried changing the font in the RStudio settings, as suggested here, here and here, with no luck. I suspect it might be related to copying from the clipboard, but I'm not sure about it. Though the code runs fine, it's quite annoying.
I'm running RStudio 1.4.1103 on macOS 11.4.

This occurs because RStudio's editor is not able to properly position the cursor in Unicode text that uses combining marks. The example in your case is í, which is made up of two code points:
LATIN SMALL LETTER I
COMBINING ACUTE ACCENT
See https://apps.timwhitlock.info/unicode/inspect?s=i%CC%81 for more details.
Compare this to the NFC normalization of that same character, í, which looks the same but is made up of a single code point:
LATIN SMALL LETTER I WITH ACUTE
See https://apps.timwhitlock.info/unicode/inspect?s=%C3%AD for more details.
Unfortunately, until this is resolved, the best solution is to use the NFC-normalized version of this character; that is, LATIN SMALL LETTER I WITH ACUTE. Alternatively, use a Unicode escape (e.g. "\u00ed") in place of that character.
See also: https://github.com/rstudio/rstudio/issues/8859
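The difference between the two forms can be checked programmatically. The following is a minimal sketch in Python (used here only for illustration; the helper names `code_point_names` and `to_nfc` are made up for this example) that lists the code points of each form and normalizes the decomposed one to NFC. In R, the stringi package offers a comparable `stri_trans_nfc` function.

```python
import unicodedata

# Decomposed form: base letter followed by a combining mark (two code points).
decomposed = "i\u0301"
# Precomposed (NFC) form: a single code point.
composed = "\u00ed"


def code_point_names(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


def to_nfc(text):
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


print(code_point_names(decomposed))
print(code_point_names(composed))
print(decomposed == composed)          # the raw strings differ
print(to_nfc(decomposed) == composed)  # they match after NFC normalization
```

Running source text through a normalization step like this before pasting it into the editor should leave only precomposed characters wherever a precomposed form exists.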

Related

Xcode font not printing letters with accent marks correctly

I'm using the font Apple SD Gothic Neo. The letters print fine except when I have one with an accent mark, like ú:
This is not a custom font, and it happens on all font weights. If it makes a difference, I'm pulling the string from Firebase.
Why is this happening and what can I do?
Use a different font.
When a font lacks a glyph, that glyph is substituted from another font, resulting in a typographical mismatch. That’s what’s happening here. You are using a font that is very Unicode-incomplete for Latin alphabet characters. It is intended for Korean! Use a more appropriate font.

Arabic NSString shows different letter order between Xcode debugger and log

I know nothing about Arabic writing, but we need to add support for it.
I'm getting confused about the letter order. As you can see in the screenshot, the order of the characters is different depending on the display method.
In Xcode I also noticed a different letter order in the preview and in the description.
Your screenshots have the same letter order but different layout directions, i.e. a different ordering of the fragments of text (e.g. words). If you remove the left-to-right fragment 12345 and the letter z, the string will look the same in both cases.
You can learn about the tricky bidirectional text layout in Wikipedia.
In your case, I believe that adding U+200F RIGHT-TO-LEFT MARK as the first character of your string will fix the problem. Be careful with the editor, though: the Xcode editor does not support bidirectional text well enough.
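To see why such a string is laid out in fragments, it can help to look at the bidirectional class of each character. Below is a rough sketch in Python; the sample string and the helper names `bidi_classes` and `with_rtl_mark` are assumptions made for illustration and are not taken from the question.

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def with_rtl_mark(text):
    """Prefix text with U+200F unless it already starts with one."""
    return text if text.startswith(RLM) else RLM + text


# Hypothetical sample: an Arabic word, then digits, then a Latin letter.
sample = "\u0645\u0631\u062d\u0628\u0627 12345 z"

# Arabic letters report "AL", digits "EN", Latin letters "L", spaces "WS".
print(bidi_classes(sample))
# The mark itself is a strong right-to-left character ("R").
print(bidi_classes(with_rtl_mark(sample))[0])
```

Because the mark is the first strong character, a renderer that derives the paragraph direction from the content should then treat the whole string as right-to-left.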

Arabic-English Transliteration using unsupported font

I am working on transliteration between Arabic and English text.
Here is the file that defines the character-by-character replacements: https://github.com/Shnoulle/Ar-PHP/blob/master/Arabic/data/Transliteration.xml
The issue is:
I am dealing with the fonts robert_bold.ttf and robert_regular_0.ttf, which contain some special characters with an underline or overline, as in this screenshot.
I have the .ttf files, so I can see these fonts on my system. But in my application, and in the Transliteration.xml file above, the characters are treated as junk like [, } [ etc.
How can I add support for these unsupported characters in the Transliteration.xml file?
<pair>
<search>ي</search>
<replace>y</replace>
</pair>
<pair>
<search>ى</search>
<replace>a</replace>
</pair>
<pair>
<search>أ</search>
<replace>^</replace> <!-- Here is one of the characters: s_ (s with underscore, not supported) -->
</pair>
It seems that the font is not Unicode encoded but contains the underlined letters at some arbitrarily assigned codes. While this works up to a point, it does not work across applications, of course. It works only when that specific font is used.
The proper way is to use correct Unicode characters such as U+1E0F LATIN SMALL LETTER D WITH LINE BELOW “ḏ” and, for rendering, try to find fonts containing it.
An alternative is to use just basic Latin letters with some markup, say <u>d</u>. This means that the text must not be treated as plain text in later processing, and in rendering, the markup should be interpreted as requesting a line under the letter(s).
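As a rough illustration of the Unicode-based approach, here is a small Python sketch. The mapping table is hypothetical: ي to y comes from the question's XML, while ذ to ḏ is only an assumed example pair, and the function name `transliterate` is made up. It also shows that ḏ decomposes into d plus U+0331 COMBINING MACRON BELOW, which suggests how a letter without a precomposed form, such as an underlined s, could be written as a base letter followed by that combining mark.

```python
import unicodedata

# Hypothetical replacement table, for illustration only.
TRANSLIT = str.maketrans({
    "\u064a": "y",       # ARABIC LETTER YEH -> y (as in the question's XML)
    "\u0630": "\u1e0f",  # ARABIC LETTER THAL -> d with line below (assumed pair)
})


def transliterate(text):
    """Replace each mapped Arabic letter with its Latin counterpart."""
    return text.translate(TRANSLIT)


d_line_below = "\u1e0f"
print(unicodedata.name(d_line_below))

# The precomposed letter is canonically equivalent to d + U+0331.
decomposed = unicodedata.normalize("NFD", d_line_below)
print([unicodedata.name(ch) for ch in decomposed])

# An underlined s built from a base letter and the same combining mark;
# NFC leaves it as two code points in Python's Unicode data.
s_line_below = "s\u0331"
print(len(unicodedata.normalize("NFC", s_line_below)))

print(transliterate("\u0630\u064a"))
```

Whether the result displays correctly still depends on the font containing these glyphs, as noted above.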

Does Google Chart support UTF-8 Characters?

I have a title and labels containing Unicode characters in a Google Chart, but they are not being displayed properly.
Here's an example: http://chart.apis.google.com/chart?chs=300x225&cht=p3&chco=344566,C4C4C4&chds=0,90&chma=70,70&choe=UTF-8&chtt=Test&chd=t:27933485,20611682,34172068&chl=Un%E9%A7%85xbr%E1%83%A6cker|Test1|Test2
As you can see, the characters do not appear correctly.
Is there a way to make Google Charts display UTF-8 characters properly? I've tried many things, but nothing has worked for me.
The problem appears to be the Unicode code points that you are providing (UTF-8 bytes E9 A7 85 -> U+99C5 and E1 83 A6 -> U+10E6). These characters do not appear to be displayed in a Google chart. Experiments with other code points (specified as UTF-8 in the same format as your query) appear to work fine.
The particular characters in your example (the first is from the CJK Unified Ideograms and the second from Georgian) are a little strange. You might want to double check that they are correct.
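The byte-to-code-point mapping mentioned above can be double-checked by decoding the percent-encoded label. A minimal Python sketch (the helper name `non_ascii` is made up for this example):

```python
import unicodedata
from urllib.parse import unquote

# The first label from the chl parameter of the chart URL in the question.
label = "Un%E9%A7%85xbr%E1%83%A6cker"

# Percent-decode the bytes and interpret them as UTF-8.
decoded = unquote(label, encoding="utf-8")


def non_ascii(text):
    """Return (code point, character name) pairs for non-ASCII characters."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]


for code_point, name in non_ascii(decoded):
    print(code_point, name)
```

This confirms that the query string itself is valid UTF-8, so the open question is whether those two characters are really the ones intended.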

How to get glyph unicode representation of Unicode character

Windows uses the Uniscribe library to substitute typed Arabic and Indic characters based on their position. The new glyph still has the original Unicode value of the typed character, although it has its own dedicated representation in Unicode.
How can I get the Unicode value of what is actually displayed, not of what was typed?
There are lots of tools for this, like ICU, Charmap and the rest. I myself recommend http://unicode.codeplex.com, which uses the Unicode Character Database to represent characters.
Note that Unicode is just information about characters and does not specify how they are rendered; the charts only suggest an appearance through their example glyphs. So to view each code point you need a font with broad Unicode coverage, like Arial Unicode MS, which is the largest and the best choice on the Windows platform.
Most characters are implemented in this font, but for new characters you need an update for it (if there is such an update), or you can use a font that you know implements the characters you want.
Your interpretation of what is happening in Uniscribe is not correct.
Once you have glyphs, the original information is gone; there is no reliable way to go back to Unicode.
Even without going to Arabic, there is no way to distinguish if the glyph for the fi ligature (for example) comes from 'f' and 'i' (U+0066 U+0069) or from 'fi' (U+FB01).
(http://www.fileformat.info/info/unicode/char/fb01/index.htm)
Also, some of the resulting glyphs do not have a Unicode value associated with them, so there is no "Unicode of what is actually displayed".
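The fi example can be inspected from the text side. The sketch below, in Python, only shows that U+FB01 and the two-letter sequence are different code point sequences that compatibility normalization folds together; it does not touch glyphs at all, since glyph selection happens inside the font and shaping engine.

```python
import unicodedata

ligature = "\ufb01"       # single code point: LATIN SMALL LIGATURE FI
letters = "\u0066\u0069"  # two code points: 'f' followed by 'i'

print(unicodedata.name(ligature))
print(ligature == letters)  # distinct strings

# Canonical normalization (NFC) keeps the ligature code point as it is...
print(unicodedata.normalize("NFC", ligature) == ligature)
# ...while compatibility normalization (NFKC) folds it into 'f' + 'i'.
print(unicodedata.normalize("NFKC", ligature) == letters)
```

A font may draw both inputs with the same ligature glyph, which is why the glyph alone cannot tell you which sequence was typed.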
