How do I prevent Turkish letters from dropping when using UIFont in cocos2d? - internationalization

I'm doing the following to create a label that I use as part of attribution for a photo:
CCLabelTTF *imageSourceLabel = [CCLabelTTF labelWithString:[_organism imageSource] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
Several of the image sources include Turkish letters. For example, in this URL:
http://commons.wikimedia.org/wiki/File:Şahlûr-33.jpg
This displays improperly in my iPad app; the Turkish letters are missing.
How do I create a label that will work with text like that in the URL above?
Edit:
Never mind... the problem is with exporting from Excel. See the comments on the answer below. This link provides some additional information: Excel to CSV with UTF8 encoding
Additional Edit:
Actually, it's still a problem, even after I export correctly and verify that I have the proper UTF-8 (or is it 16?) letters in the CSV file. For example, this string:
Dûrzan cîrano / CC BY-SA 3.0
Is displayed like this:
and this string:
Christian Mehlführer / CC-BY 2.5
is displayed like this:
It's definitely being processed improperly upon import, as CCLOG generates the following:
Photo Credit: Dûrzan cîrano / CC BY-SA 3.0
More Info:
Upon import, I'm storing the following value as a string in an array:
"Christian Mehlf\U00c3\U00bchrer / CC-BY 2.5"
Wikipedia says the UTF-8 value for ü, in hex, is C3 BC. It looks like the c3bc is in there, but masked as \U00c3\U00bc.
Is there any way to convert this properly? Or is something fundamentally broken at the CSV import level?
The solution is below.

There were several problems:
Excel on the Mac doesn't export UTF-8 properly. The solution I used was to paste the data into Google Spreadsheet and export from there. More info here: Excel to CSV with UTF8 encoding
Once I had the proper data in the CSV file, I realized that I was importing it with the wrong settings. I'm using parseCSV and needed to set _encoding in the -init method to NSUTF8StringEncoding instead of the default, NSISOLatin1StringEncoding.
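The effect of the wrong import encoding can be reproduced outside the app. What follows is a minimal Python sketch, not the parseCSV code itself, and the variable names are purely illustrative. It decodes correct UTF-8 bytes as ISO Latin-1, which yields the same \U00c3\U00bc pair seen in the question, and then reverses the mistake:

```python
# Illustration only: reproduce the mojibake from the question.
original = "Christian Mehlf\u00fchrer / CC-BY 2.5"  # contains ü (U+00FC)

# What a correctly exported UTF-8 CSV file contains (ü is the bytes C3 BC).
utf8_bytes = original.encode("utf-8")

# What a reader configured for ISO Latin-1 produces from those bytes:
# each byte becomes its own character, so ü turns into U+00C3 U+00BC.
misread = utf8_bytes.decode("latin-1")

# Undo the wrong decode: back to the raw bytes, then decode as UTF-8.
repaired = misread.encode("latin-1").decode("utf-8")

print(misread)
print(repaired)
```

The round trip only works because Latin-1 maps every byte to exactly one character; reading the file as UTF-8 in the first place avoids the problem entirely.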

If you try this:
[CCLabelTTF labelWithString:[[_organism imageSource] stringByUnescapingHTML] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
it will likely work better. I suspect your URL string is escaped HTML.

Related

How to make Zend Form element attributes encode HTML elements using correct encoding?

When using Zend\Form\Element\Select option that contains HTML entities, how do I encode it correctly?
Try 1:
I pass in 90&deg; and see it unconverted (the literal text 90&deg;) in my HTML select box, instead of the expected degree symbol (°)
Try 2:
I use ° directly in my label name, I see this: 90�
Zend Code
Chasing the Zend Form code appears to yield these lines:
https://github.com/zendframework/zend-form/blob/master/src/View/Helper/AbstractHelper.php#L248,
where $escape is the $this->getEscapeHtmlHelper() method.
and actual conversion happens here:
https://github.com/zendframework/zend-escaper/blob/master/src/Escaper.php#L369
Re-saving the source file that contained the degree symbol directly (°) using UTF-8 encoding seems to have done the job.
(Before it was encoded using ANSI)
Incidentally that degree symbol now shows up like this in my IDE: °
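As a rough illustration of why re-saving the file helped, here is a small Python sketch. It assumes that "ANSI" means Windows-1252 and that the consumer decodes the bytes as UTF-8; neither detail is stated in the question.

```python
label = "90\u00b0"  # "90°"

# The same text saved under two different file encodings.
ansi_bytes = label.encode("cp1252")  # "ANSI" (Windows-1252 assumed)
utf8_bytes = label.encode("utf-8")

# A consumer expecting UTF-8 cannot decode the lone 0xB0 byte from the
# ANSI file, so it substitutes U+FFFD, the replacement character.
seen_from_ansi = ansi_bytes.decode("utf-8", errors="replace")
seen_from_utf8 = utf8_bytes.decode("utf-8")

print(repr(ansi_bytes), repr(seen_from_ansi))
print(repr(utf8_bytes), repr(seen_from_utf8))
```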

Render non english characters in asciidoctor-pdf

I am trying to write documentation with asciidoctor-pdf and I need to use characters like : ă,â,î,ş,ţ. The pdf output is rendered but the mentioned characters are rendered empty. I am not sure how to handle the issue.
For example:
I wrote this code:
= Document Title
Doc Writer <doc@example.com>
:doctype: book
:source-highlighter: coderay
:listing-caption: Listing
// Uncomment next line to set page size (default is Letter)
//:pdf-page-size: A4
A simple http://asciidoc.org[AsciiDoc] document.
== Introducţie
A paragraph followed by a simple list with square bullets.
The result was the word Introducţie rendered as "Introduc ie", followed by this warning:
/usr/local/rvm/gems/ruby-2.2.2/gems/pdf-core-0.2.5/lib/pdf/core/pdf_object.rb:55: warning: regexp match /.../n against to UTF-8 string
Could this be a system encoding configuration problem?
Do I need to set different encoding configuration in ruby?
Thank you.
I think that if you want to be sure, you can always use the decimal entity reference form. For the Latin small letter t with cedilla it is: &#355;
Check this table for the complete list:
List of Unicode characters
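If looking the values up by hand gets tedious, the decimal references can be computed. This is a small Python sketch; the helper name to_decimal_entity is made up for illustration and is not part of Asciidoctor.

```python
import html

def to_decimal_entity(ch):
    """Return the decimal character reference for a single character."""
    return "&#{};".format(ord(ch))

# The characters mentioned in the question.
for ch in "\u0103\u00e2\u00ee\u015f\u0163":  # ă â î ş ţ
    reference = to_decimal_entity(ch)
    # Round-trip check: the reference decodes back to the same character.
    assert html.unescape(reference) == ch
    print(ch, reference)
```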
In addition, if you want to use this special char in a title, there was an issue with it:
Section id with characters outside of Windows-1252 encoding causes warning
It seems to be fixed now, but I did not verify it.
One possible way to write such special characters in titles is to declare them in the preamble of your AsciiDoc document, for example,
:t-cedil: ţ
and to call it in the main text
== pass:normal[Test-{t-cedil}]
So your title will look like
Test-ţ

Extended charset chars not recognized and converted to ? mark

I have a string containing a special character such as "\u2012", i.e. FIGURE DASH. When I try to print it on the console, I get a '?' instead of the symbol. I have an editor in which I can insert the symbol using Alt+numpad, e.g. Alt+2012. In the editor I can see the symbol, but when I save it in an XML file and read the value back using nodevalue, I get a '?'.
To summarize, I am having trouble reading characters from the Latin Extended-A charset. What I need is that when I insert such symbols and read them back, I get something like &#xXXXX;.
Please help!
TIA :)
Put simply, I have a String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;
The default console encoding in Windows is some MS-DOS code page and they don't support the character. You can try running chcp 65001 before running the program but you might also need to change the console font as well.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8. You aren't doing that in many places. You need to explicitly write in your code to save and read the file in UTF-8, and not rely on the platform default encoding.
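For the &#xXXXX; form the question asks about, the idea is just to format each character's code point in hexadecimal. The sketch below shows it in Python rather than the asker's language; to_hex_refs is a hypothetical helper name.

```python
def to_hex_refs(text):
    """Replace every non-ASCII character with a &#xXXXX; reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:04X};".format(ord(ch))
        for ch in text
    )

print(to_hex_refs("\u00c0"))    # À, LATIN CAPITAL LETTER A WITH GRAVE
print(to_hex_refs("a\u2012b"))  # FIGURE DASH between two ASCII letters
```

Python's built-in "xmlcharrefreplace" error handler does something similar but emits decimal references, e.g. "\u00c0".encode("ascii", "xmlcharrefreplace") gives b"&#192;".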

C# MVC3 and non-latin characters

I have my database results (áéíóúàâêô...) and when I display any of these characters I get codes like:
&#225;
My controller is like this:
ViewBag.EstadosDeAlma = (from e in db.EstadosDeAlma select e.Title).ToList();
My cshtml page is like this:
var data = '@foreach (dynamic item in ViewBag.EstadosDeAlma){ @(item + " ") }';
In addition, if I use any rich text editor such as TinyMCE, all non-Latin characters come out like this too.
What should I do to avoid this problem?
What output encoding are you using on your web pages? I would suggest using UTF-8 since you want a lot of non-ascii characters to work.
I think you should HTML encode/decode the values before comparing them.
Since you are using jQuery you can take advantage of the encoding functions built-in into it. For example:
$('<div/>').html('& #225;gil').html()
gives you "ágil" (note that I added an extra space between the & and the # so that Stack Overflow does not encode it; you won't need it)
This other question has more information about this.
HTML-encoding lost when attribute read from input field
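The decoding step that the jQuery snippet performs in the browser can also be sketched outside it. The following uses Python's standard html module purely to show what the numeric reference stands for; it is not a substitute for fixing the page encoding.

```python
import html

encoded = "&#225;gil"             # what the page currently shows
decoded = html.unescape(encoded)  # numeric reference -> actual character

print(decoded)
```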

InstallShield 2011 error 7185 importing Japanese strings in the string table of basic MSI project

I am trying to import Japanese strings into my "Basic MSI" project. It used to work without any issues, but now when I try to import some Japanese strings from a text file, it throws the following error (I have changed some of the personal data in the error message):
ISDEV : error -7185: The Japanese: 日本語 translation for string identifier IDS_XXXX_1111 includes characters that are not available on code page 932.
I think some of the characters in IDS_XXXX_1111 are not part of code page 932. How can I detect those characters using some tool?
Also, the documentation mentions changing some encoding settings to UTF-8 in InstallShield 2011; if you know how, please guide me.
Thanks in advance
Rahul
My favorite way to detect such characters is with Python. For example, reading a file like the InstallShield string tables in Python:
import codecs
# Read the string table as UTF-16 and test each line against code page 932.
with codecs.open("strings.txt", "r", "UTF-16") as strings:
    for line in strings:
        line = line.strip()
        try:
            line.encode("cp932")
        except UnicodeError:
            # Characters outside code page 932 are shown as "?".
            print("Can't encode: " + line.encode("cp932", "replace").decode("cp932"))
Your alternatives are to pinpoint the characters that cannot be represented on the relevant code page and replace them with ones that can, or to go to the Releases view and select yes for the Build UTF-8 Database setting.
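To pinpoint individual characters rather than whole lines, each character can be tested on its own. This is a minimal sketch; the function name unencodable_chars and the sample string are illustrative and have nothing to do with InstallShield itself.

```python
def unencodable_chars(text, encoding="cp932"):
    """Return (index, character) pairs that the encoding cannot represent."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch))
    return bad

# Japanese text followed by a Hangul syllable, which code page 932 lacks.
sample = "\u65e5\u672c\u8a9e \ud55c"
for index, ch in unencodable_chars(sample):
    print("position {}: U+{:04X}".format(index, ord(ch)))
```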
