C# MVC3 and non-latin characters

I have my database results (áéíóúàâêô...) and when I display any of these characters I get codes like:
&#225;
My controller is like this:
ViewBag.EstadosDeAlma = (from e in db.EstadosDeAlma select e.Title).ToList();
My cshtml page is like this:
var data = '@foreach (dynamic item in ViewBag.EstadosDeAlma){ @(item + " ") }';
In addition, if I use any rich text editor, such as TinyMCE, all non-Latin characters come out like this too.
What should I do to avoid this problem?

What output encoding are you using on your web pages? I would suggest UTF-8, since you want a lot of non-ASCII characters to work.
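If the site isn't already serving UTF-8, a minimal sketch of the usual ASP.NET setting (an assumption about your setup; it goes inside <system.web> in web.config):
<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />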

I think you should HTML encode/decode the values before comparing them.
Since you are using jQuery you can take advantage of the encoding functions built-in into it. For example:
$('<div/>').html('& #225;gil').html()
gives you "ágil" (notice that I added an extra space between the & and the # so that stackoverflow does not encode it, you won't need it)
This other question has more information about this.
HTML-encoding lost when attribute read from input field

Related

breakable slashes everywhere but URLs

I generate PDF (via LaTeX) from reStructuredText using Python Sphinx (1.4.6).
I use narrow table column headers with texts like "stuff/misc/other". I need the slashes to be breakable, so the table headers don't overflow into the next column.
The LaTeX solution is to use \BreakableSlash or \slash where necessary. I can use Python code to replace all slashes:
from sphinx.util.texescape import tex_replacements
# \BreakableSlash needs package hyphenat to be loaded
tex_replacements.append((u'/', ur'\BreakableSlash ') )
# tex_replacements.append((u'/', ur'\slash ') )
But that will break any URL like http://www.example.com/ into something like
http:\unhbox\voidb@x\penalty\@M\hskip\z@skip/\discretionary{-}{}{}\penalty\@M\hskip\z@skip\unhbox\voidb@x\penalty\@M\hskip\z@skip/\discretionary{-}{}{}\penalty\@M\hskip\z@skipwww.example.com
or
http:/\penalty\exhyphenpenalty/\penalty\exhyphenpenaltywww.example.com
I'd like to use a general solution that works in both cases, where the editor of the documentation can still write normal ReST and doesn't have to worry about LaTeX.
Any idea how to get classic slashes in URLs and breakable slashes everywhere else?
You have not really given data or source code and only asked for an idea, so I take the liberty of only sketching a solution in pseudocode:
Split the document into a list of strings at each position of a space using .split()
For each string, check whether it is a URL by comparing its left side to http:// (and maybe also ftp://, https:// or similar prefixes)
Do replacements, but only in strings that are not URLs
Recombine all strings, including the spaces, using a command such as " ".join(my_list) (see the sketch below)
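A minimal Python sketch of those four steps (the function name and the exact list of URL prefixes are my own assumptions; \BreakableSlash still needs the hyphenat package, as noted above):
URL_PREFIXES = ('http://', 'https://', 'ftp://')

def escape_slashes(text):
    words = text.split(' ')                    # 1. split at spaces
    for i, word in enumerate(words):
        if word.startswith(URL_PREFIXES):      # 2. looks like a URL?
            continue                           # 3. leave URLs untouched
        words[i] = word.replace('/', r'\BreakableSlash ')
    return ' '.join(words)                     # 4. recombine with the spaces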
One way to do it might be to write a Transform subclass and then register it with app.add_transform() in setup(app), so that it runs on every read.
I could use DefaultSubstitutions from transforms.py as a template for my own class.
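For illustration, a rough sketch of that approach (untested; the class name, the priority, and the choice to emit raw LaTeX nodes are my assumptions, and it only affects the LaTeX builder):
from docutils import nodes
from docutils.transforms import Transform

class BreakableSlashes(Transform):
    default_priority = 900  # run late, after substitutions are resolved

    def apply(self):
        for node in list(self.document.traverse(nodes.Text)):
            # Leave hyperlinks (URLs) and literal text alone.
            if isinstance(node.parent, (nodes.reference, nodes.literal)):
                continue
            parts = node.astext().split('/')
            if len(parts) < 2:
                continue
            new_nodes = []
            for i, part in enumerate(parts):
                if i:
                    # Raw LaTeX; needs hyphenat loaded, as noted above.
                    new_nodes.append(nodes.raw('', r'\BreakableSlash ', format='latex'))
                if part:
                    new_nodes.append(nodes.Text(part))
            node.parent.replace(node, new_nodes)

def setup(app):
    app.add_transform(BreakableSlashes)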

Render non-English characters in asciidoctor-pdf

I am trying to write documentation with asciidoctor-pdf and I need to use characters like ă, â, î, ş, ţ. The PDF output is rendered, but the mentioned characters are rendered empty. I am not sure how to handle the issue.
For example:
I wrote this code:
= Document Title
Doc Writer <doc@example.com>
:doctype: book
:source-highlighter: coderay
:listing-caption: Listing
// Uncomment next line to set page size (default is Letter)
//:pdf-page-size: A4
A simple http://asciidoc.org[AsciiDoc] document.
== Introducţie
A paragraph followed by a simple list with square bullets.
And the result was the word Introducţie rendered as Introduc ie and finally the error:
/usr/local/rvm/gems/ruby-2.2.2/gems/pdf-core-0.2.5/lib/pdf/core/pdf_object.rb:55: warning: regexp match /.../n against to UTF-8 string
Could it be a system encoding configuration problem?
Do I need to set different encoding configuration in ruby?
Thank you.
I think that if you want to be sure, you can always use the decimal entity reference form. For the Latin small letter t with cedilla it is &#355;.
Check this table for the complete list:
List of Unicode characters
In addition, if you want to use this special char in a title, there was an issue with it:
Section id with characters outside of Windows-1252 encoding causes warning
It seems to be fixed now, but I did not verify it.
One possible way to write such special characters in titles is to declare them in the preamble of your asciidoc document, for example,
:t-cedil: ţ
and then reference it in the main text:
== pass:normal[Test-{t-cedil}]
So your title will look like
Test-ţ

Printing superscript / subscript to zebra printer using ZPL

I'm trying to find a solution to print superscript using ZPL.
Example, if I have this string of ZPL:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FDHello, World!^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
Since there aren't any superscript characters, I could send this to my printer without issue. But if I wanted to use this string:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FDe = mc²^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
The superscript won't print natively. I think I need to access an international character set or something but I'm not sure how to do this, especially if I only need it for the one character. Do I need to change my entire character set, or do some sort of "replace" on it?
Note, we are generating ZPL code manually and shooting it directly at the printers (unfortunately this is our system), bypassing any drivers or 3rd party dev components of any kind.
Mark's answer gave me exactly what I needed to solve my issue. Here is additional information to further clarify the solution:
To use hex codes in your data, prefix the ^FD command with ^FH_. The ^FH tells the printer that the data in ^FD may contain hex values, and the _ defines the hex-code identifier, so the printer knows which parts of the data are hex codes rather than standard text.
I got this to work immediately, exactly as you mentioned. Then, testing against additional printers, I found (though I'm not sure why) that I didn't actually need to send the ^CI13 to specify code page 850; the ² appeared on all printers even when I didn't send the ^CI13.
In my .NET application, for some reason the ² didn't map to the hex code that the ZPL code page expected (the app converted ² to hex b2 instead of fd, though most standard characters converted to the same codes as the ZPL map). So in my application I created a conversion table: any character defined in my table is mapped to its ZPL hex code, and any character not defined is left as converted by the application.
I'd never used information from the non-default code page, and I didn't realize that with ^FH you could mix standard text with hex (I thought that if you used ^FH, all of the information in ^FD had to be hex). So the information Mark provided led me right down the correct path.
The final example to solve the problem, using the information Mark provided, is:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FH_" +
"^FDe = mc_fd^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
Try using ^CI13 to select code page 850, then use _fd in your string for the superscripted 2. The underscore is used to designate a hex character.
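For illustration, here is a small Python sketch of the conversion-table idea described above (the names and table contents are my own assumptions; only ² is mapped, to its code page 850 value FD):
ZPL_HEX_OVERRIDES = {
    '\u00b2': '_fd',  # superscript two is hex FD in code page 850
}

def to_zpl_field_data(text):
    # Escape text for use inside ^FD when ^FH_ is in effect.
    out = []
    for ch in text:
        if ch in ZPL_HEX_OVERRIDES:
            out.append(ZPL_HEX_OVERRIDES[ch])
        elif ch == '_':
            out.append('_5f')  # a literal underscore must itself be escaped
        else:
            out.append(ch)
    return ''.join(out)

label = '^XA^FO50,50^A0N50,50^FH_^FD' + to_zpl_field_data('e = mc\u00b2') + '^FS^XZ'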

How Do Validators Differentiate Between '&' and '&amp;'?

Knowing that &amp; is the HTML entity value of & - how do validators like w3c know this? Even when I look at my source code it's already been parsed into the correct value.
Your question is based on a false premise: as Co_42 noted, &amp; is not the "ASCII value" of '&'. It's an HTML character reference representing the character '&'. The ASCII value of '&' is 38 (or 0x26).
Your source code almost certainly consists of ASCII or Unicode text files. Those don't use HTML entities. If you have a string with an ampersand stored in the source code, it'll probably be stored with a bare "&". If there's a string literal somewhere containing actual HTML data, it may contain "&amp;".
When you use some sort of tool or function to convert strings to text ready to put into an HTML or XML document, any "&" will be (should be!) converted into "&amp;".
When a program that reads HTML documents encounters an ASCII "&", it can assume that it's the beginning of an HTML character reference. This is okay because all ampersands in the actual text should have been converted into "&amp;".
As a somewhat perverse example, if you open your source code in a word processor and save it as an HTML document, you'll find that in the actual file, "&" has been converted into "&amp;" (and "&amp;" has been converted into "&amp;amp;"). If you then open that document in a browser, you'll find that the ampersands are displayed the same way they are when you view your source code in a text editor. The encoding step that happened when you saved the HTML document corresponds to the decoding step that happens when the browser displays it.
If you put something like "Fish & chips" directly into an actual HTML document, your HTML document will be invalid. Complicating the matter is the fact that programs such as browsers tend to try to recover from errors in a document and display it anyway. As such, your browser may still display "Fish & chips" on the screen when you open your invalid document. However, a program such as the W3C validator, which is specifically meant to discover errors in HTML documents, will notify you that your document is invalid.
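That encode/decode round trip is easy to see with, for example, Python's standard html module:
from html import escape, unescape

escape('Fish & chips')        # 'Fish &amp; chips'  (safe to embed in HTML)
unescape('Fish &amp; chips')  # 'Fish & chips'      (what the browser displays)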

How do I prevent Turkish letters from dropping when using UIFont in cocos2d?

I'm doing the following to create a label that I use as part of attribution for a photo:
CCLabelTTF *imageSourceLabel = [CCLabelTTF labelWithString:[_organism imageSource] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
Several of the image sources include Turkish letters. For example, in this URL:
http://commons.wikimedia.org/wiki/File:Şahlûr-33.jpg
This displays improperly in my iPad app; the Turkish letters are missing.
How do I create a label that will work with text like that in the URL above?
Edit:
Nevermind... the problem is with exporting from Excel. See the comments on the answer below. This link provides some additional information: Excel to CSV with UTF8 encoding
Additional Edit:
Actually, it's still a problem, even after I export correctly and verify that I have the proper UTF-8 (or is it 16?) letters in the CSV file. For example, this string:
Dûrzan cîrano / CC BY-SA 3.0
is displayed with the accented characters dropped (screenshot omitted), and this string:
Christian Mehlführer / CC-BY 2.5
is likewise displayed with the ü dropped (screenshot omitted).
It's definitely being processed improperly upon import, as CCLOG generates the following:
Photo Credit: Dûrzan cîrano / CC BY-SA 3.0
More Info:
Upon import, I'm storing the following value as a string in an array:
"Christian Mehlf\U00c3\U00bchrer / CC-BY 2.5"
Wikipedia says the UTF-8 value for ü, in hex, is C3 BC. It looks like the c3bc is in there, but masked as \U00c3\U00bc.
Is there any way to convert this properly? Or is something fundamentally broken at the CSV import level?
The solution is below.
There were several problems:
Excel on the Mac doesn't export UTF-8 properly. The solution I used was to paste the data into Google Spreadsheet and export from there. More info here: Excel to CSV with UTF8 encoding
I realized that once I had the proper data in the CSV file, I was importing it with improper settings. I'm using parseCSV and needed to set _encoding in the -init method to NSUTF8StringEncoding instead of the default, NSISOLatin1StringEncoding.
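For what it's worth, the mismatch is easy to reproduce in a few lines of Python: decoding UTF-8 bytes as Latin-1 yields exactly the \U00c3\U00bc pair seen above:
utf8_bytes = 'ü'.encode('utf-8')                        # b'\xc3\xbc'
mojibake = utf8_bytes.decode('latin-1')                 # 'Ã¼' (U+00C3, U+00BC)
repaired = mojibake.encode('latin-1').decode('utf-8')   # 'ü' again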
if you try this:
[CCLabelTTF labelWithString:[[_organism imageSource] stringByUnescapingHTML] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
it will likely work better. I suspect your URL string is escaped HTML.
