Encoding issues with Microsoft Word characters in an AJAX request

I'm writing a function to convert MS Word-styled text into Adobe InDesign-formatted text (InDesign uses a kind of XML to indicate styling). The text is pasted into a TinyMCE rich text editor, which then sends the HTML-formatted code to a PHP function.
I've tried this function to clean up the code once it reaches my conversion code:
$text = iconv("windows-1250", "UTF-8", $html);
When I use any 'special' characters, things go wrong: £ signs, é (or any other accented characters), and the various 'curly' apostrophes/quote marks all seem to break things. For example, if I try to convert a £ sign, the code returns \u0141, and the Ł symbol is displayed onscreen when the function returns.
Does anybody know what I can do to prevent Word's weird characters breaking everything I'm doing?

I seem to have fixed this. I was using escape() to pass the values; replacing it with encodeURIComponent() (and removing the iconv() call in my PHP code) resolved the problem.
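Incidentally, the Ł was not random: escape() percent-encodes £ as the single byte %A3, and in windows-1250 byte 0xA3 is Ł, so the iconv() call above faithfully produced \u0141. A minimal sketch of the difference between the two encoders (the sendToConverter() helper and /convert.php endpoint are made up for illustration):

// escape() is deprecated and Latin-1 based; encodeURIComponent() emits UTF-8 bytes.
var pound = "\u00A3"; // £
console.log(escape(pound));             // "%A3"
console.log(encodeURIComponent(pound)); // "%C2%A3"

// Hypothetical AJAX call mirroring the TinyMCE-to-PHP setup described above.
function sendToConverter(html) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/convert.php");
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
  xhr.send("html=" + encodeURIComponent(html));
}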

Related

Special character ok in HTML, not ok in PDF

The code below in the template ensures that these special characters are printed correctly when the output type is HTML:
?replace("≥", "&ge;")?replace("≤", "&le;")
The result is:
Test special characters:
Greater_equal ≥
Less_equal ≤
When I change the output type from HTML to PDF, these characters are not printed anymore:
Test special characters:
Greater_equal
Less_equal
How can I make this work with PDF as the output type?
For anyone who encounters the same issue:
I ran into this recently, and what I found is that some fonts simply don't include these glyphs, so the greater-than-or-equal sign and similar symbols don't show up.
However, I tried the Arial font and it works, even when the character is added as an entity like '&ge;'.
Long story short: the solution is to switch to a font that supports these glyphs.

Pasting Arabic text to CKEditor numbers get changed to English rather than staying in Arabic

I'm pasting Arabic text from Microsoft Word into CKEditor. It comes over OK apart from the numbers, which should remain as Arabic/Hindi numerals: ١٢٣٤٥٦٧٨٩٠
but instead come out as Western/English: 1234567890.
Is there a way to stop this from happening or work around it somehow? E.g. a plugin, patch or something else?
I'm using CKEditor 4.17.1
Thanks
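Until a proper plugin turns up, one possible workaround is a paste listener that maps the digits back; a rough sketch for CKEditor 4 (untested, and assuming the digits really arrive as ASCII 0-9 in the paste data):

CKEDITOR.on('instanceReady', function (ev) {
  ev.editor.on('paste', function (evt) {
    // Replace each Western digit with the corresponding Arabic-Indic digit (U+0660-U+0669).
    evt.data.dataValue = evt.data.dataValue.replace(/[0-9]/g, function (d) {
      return String.fromCharCode(0x0660 + Number(d));
    });
  });
});

Note this blindly converts every pasted digit, including any that should stay Western, so a real fix would need some locale awareness.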

Is it possible to render zero-width Unicode characters as a special replacement character with a custom font?

I'm trying to figure out a way to render certain Unicode characters as a custom character instead of how they are supposed to appear.
For example, I would like the character U+0E4A to render as something else rather than how it currently appears in Windows.
I tried creating a quick custom font and replacing those glyphs, but it only seems to work in some programs. My font works correctly in LibreOffice Writer but won't display properly in WordPad. Replacing regular letters works fine, but other Unicode characters seem to revert to a default rendering and don't display correctly.
Here is a screenshot of my custom font in WordPad. As you can see, I made an obvious edit to the B character, but even though I did the same to the U+0E4A code point, it still renders as normal.
If there is a special font that already does this, that would probably save me the time of making a custom font, but either way I can't figure out how to render these characters as a custom character.

Elixir: getting gibberish for single and double quotes when using Earmark to render markdown in Windows 10

I ran into a problem while I was working with Earmark, an Elixir library for rendering markdown, on Windows 10. Whenever the text contained single or double quotes, the rendered markup appeared as gibberish on the console or in the rendered html file.
What makes no sense is that I can type single and double quotes and they render correctly on the console; the text only becomes gibberish once Earmark processes it.
With the help of a number of people in the Elixir community, I was able to find two solutions to the problem.
The problem stems from Elixir's use of UTF-8 as the default encoding for strings. That in itself is not a problem; it becomes one because the Windows console subsystem handles UTF-8 poorly.
Earmark uses smartypants to transform straight single and double quotes into curly single and double quotes. This is where the Windows console gets confused and prints gibberish.
Solutions:
For the html rendering
The best solution here is to add utf-8 encoding in the template for the final html page.
<meta charset="utf-8" />
If you don't care about smart quotes, you can also call Earmark with the smartypants option set to false to avoid using it:
Earmark.as_html(markdown, %Earmark.Options{smartypants: false})
For the console
Here you need to set the console font to Lucida Console and run the command below, according to this question on Stack Overflow:
chcp 65001
I used it and it worked on my Windows 10 machine.
Note: Thanks to Iinspectable for correcting a statement about Windows and UTF-8. :)

angular-translate and escaped characters

I have an app that uses AngularJS along with angular-translate to provide localization. The app currently uses only English and German.
On the login page is a required field, an email. If there is an error, the app displays "A valid email is required" in English.
In German (and forgive me if this is mangled, this is Google Translate, I don't know any German) the phrase is "Eine gültige E-Mail erforderlich".
In the second word you'll notice an international character: a "u" with two little dots over it. When the app is set to display in German, that character gets escaped and much weirdness happens on the screen.
Looking at the docs, it seems like using $translateProvider.useSanitizeValueStrategy() is supposed to handle this, but it doesn't. If I use $translateProvider.useSanitizeValueStrategy('escaped'), then it looks like this onscreen:
If I use $translateProvider.useSanitizeValueStrategy('sanitize') (which I'd really prefer, of course), then it looks like this:
I also happened to come across this article, which states that my *.js translation file needs to be UTF-8 encoded. I opened that file in Notepad++, changed the encoding to UTF-8 without BOM and saved it, but I'm still seeing the error. And VS really hates the file now.
I know it's a little late, but maybe others have similar problems:
Addressing the UI:
Are you using the attribute style, e.g.
<span translate="key"></span>
or inline style
<span>{{key | translate}}</span>
in your view?
I am working with the second style without issues.
Addressing your problem with UTF-8:
I am not using Visual Studio or Notepad++, so I don't know how Notepad++ handles the conversion. Possibly it does not convert the characters at all, but only changes the file's declared encoding to UTF-8.
Sublime Text 2 (1), on the other hand, offers a 'Save with Encoding' command, which converts all characters accordingly. I have stress-tested this conversion quite a bit, so I can recommend this approach with a clean conscience.
(1) I have no relation to Sublime Text; this is not meant to be any form of commercial advertisement.
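For reference, here is a minimal config sketch of the inline-style setup described above (the module name 'app' and the key EMAIL_REQUIRED are made up; the German string is taken verbatim from the question):

angular.module('app', ['pascalprecht.translate'])
  .config(['$translateProvider', function ($translateProvider) {
    // The file holding these strings must itself be saved as real UTF-8.
    $translateProvider.translations('de', {
      EMAIL_REQUIRED: 'Eine gültige E-Mail erforderlich'
    });
    $translateProvider.preferredLanguage('de');
    $translateProvider.useSanitizeValueStrategy('sanitize');
  }]);

And in the view: <span>{{'EMAIL_REQUIRED' | translate}}</span>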
