Converting Traditional Chinese into Simplified Chinese in Cocoa - macos

A customer asked for a text conversion feature from traditional Chinese to simplified Chinese.
I did a little research, and apparently this kind of conversion can be automated quite readily; Mac OS X even has a system-wide service for it. So there seems to be something built into OS X to perform the conversion, but I'm no expert at internationalization.
Does anybody know how to perform this miracle?
Best regards,
Frank

I think the functionality you are asking about exists inside an app that Apple ships with OS X; I don't know of any built-in functionality for it.
If you are trying to implement the conversion yourself, it looks like Wikimedia has some code that might do what you want, but you'll have to port it from PHP:
http://svn.wikimedia.org/doc/classZhConverter.html
Also, it may not be just a matter of converting the characters, although that's most of it. Some words and phrases also differ between mainland China (simplified characters) and other regions (traditional characters).
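The point about phrases can be sketched as a two-level lookup: try a phrase table first, then fall back to a per-character table. This is only an illustrative Python sketch, not the Wikimedia code and not a Cocoa API. The tables `PHRASES` and `CHARS` and the function `to_simplified` are hypothetical names, and a real converter needs far larger tables.

```python
# Hypothetical, tiny sample tables; a real converter needs thousands of entries.
PHRASES = {"軟體": "软件"}  # regional vocabulary: "software" is a different word
CHARS = {"軟": "软", "體": "体", "電": "电", "腦": "脑"}


def to_simplified(text, phrases=PHRASES, chars=CHARS):
    """Convert text using longest-match phrase lookup, then per-character mapping."""
    max_len = max((len(p) for p in phrases), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += n
                break
        else:
            # No phrase matched: map the single character, or keep it unchanged.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these sample tables, `to_simplified("軟體")` gives "软件", whereas a character-only mapping would give "软体".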

Related

Computer Aided Translation and Xcode

Is it somehow possible to use CAT (Computer-Aided Translation) tools like Swordfish in any sensible way to get i18n done? Copying and pasting strings is error-prone, and MS Word is not exactly a professional application for translations.
Is there any other app, system, or format that would work well with Xcode for that job?
It looks like AppleGlot is the kind of tool I was looking for. It doesn't translate, but it extracts strings and allows incremental localization. It has XLIFF support (as does Swordfish, mentioned above).
AppleGlot is available in the developer.apple.com area.

Internationalization English & Indian languages

I am working on a Java application that is primarily in English, but I also hope to support Indian languages like Hindi, Telugu and Gujarati. I am wondering what would be a good strategy for this.
I have seen I18N projects for languages based on the Latin script, but Hindi uses the Devanagari script, so it's a little more complicated.
Has anyone done anything close to this?
To be honest, the scripts and languages you mentioned are not necessarily very common in programming. Since you didn't mention whether it will be a desktop or a web application, it is hard to give you any advice beyond using the latest versions of Java (7) and ICU (49.1 or even 50M2).
That's because these releases support a newer version of the Unicode Standard, which plays a role here.
By the way, you might want to know that the Unicode Consortium is working on better support for Indian scripts. Developing applications with these languages in mind should therefore get easier in the future; for now you will probably struggle a bit.
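As a small illustration of why Devanagari text needs more care than Latin text, the sketch below inspects the word "Hindi" written in Devanagari. It uses Python's standard `unicodedata` module rather than Java or ICU, purely to keep the example short; the helper names `describe` and `count_marks` are made up for this example.

```python
import unicodedata

# "Hindi" in Devanagari, spelled out as code points to avoid ambiguity.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def count_marks(text):
    """Count combining marks (general categories Mn, Mc and Me)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))
```

The word consists of six code points, three of which are combining marks. Code that assumes one code point per visible character (truncating, reversing, moving a cursor) can therefore split a syllable, which is one reason to lean on a library such as ICU for text boundaries.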

How will I convert characters? Or other solutions

I found out (through my other question) that my IME outputs Hangul Compatibility Jamo (U+3130 – U+318F) instead of regular Hangul Jamo (U+1100 – U+11FF).
So I tried asking a question on Super User about other IMEs, but there are no replies yet.
Should I just convert it myself? What exactly does that entail? Is it too complicated? Any ideas on how to? Any help would be appreciated.
Language: Delphi
OS: WinXP
IME: Korean Input System (IME 2002)
There is no reason you could not write an interesting experimental editor control with its own built-in Unicode compose feature. Before you do that, however, you might look for a way to change the configuration of the IME. This seems to be a really interesting corner case you have to work with. I was already surprised to learn from your other question that Windows has the ability to handle raw input from keyboards.
I found that source code for something that says it is the Korean IME is available for Windows CE. You might learn something by studying it, even though it is for Windows CE rather than XP.
http://msdn.microsoft.com/en-us/library/ee491900.aspx
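On the "should I just convert it myself?" part: the Unicode character database gives each basic compatibility jamo a compatibility decomposition to a regular jamo, so compatibility normalization performs the mapping. The sketch below uses Python's `unicodedata` only to show the idea (the question is about Delphi, where an equivalent normalization routine would be needed); the function names are hypothetical.

```python
import unicodedata


def compat_jamo_to_jamo(text):
    """Map Hangul Compatibility Jamo to regular Hangul Jamo via NFKD.

    Note: NFKD also decomposes precomposed syllables and any other
    compatibility characters that happen to be in the text.
    """
    return unicodedata.normalize("NFKD", text)


def compat_jamo_to_syllables(text):
    """Apply the same mapping, then compose jamo into syllables (NFKC)."""
    return unicodedata.normalize("NFKC", text)
```

One caveat: simple consonants are mapped to their leading (choseong) forms, so a compatibility consonant typed as a syllable-final letter does not automatically become a trailing (jongseong) jamo. Handling that case would need extra logic on top of normalization.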

Ruby works well with Unicode characters in filenames on Mac OS X and on Linux, so why did it take at least 2 years to make it work on Windows?

Ruby works well with Unicode characters in file paths and filenames on Mac OS X and on Linux, so why did it take more than 2 years to make it work on Windows?
I was just looking at Google Code Jam, where people solve non-trivial problems within a few hours. At work, I can imagine a filename or path issue involving Unicode characters being solved within a day or two, a few days, or 1 or 2 weeks, even if the fix has to go into the standard library. But 2 years?
What might be the reason? I think Mac OS X and Linux may have worked as-is because they use UTF-8, and a lot of program code written for ASCII works with UTF-8 without any modification.
Windows might be returning the filenames or path in UTF-16, so it is more complicated, but there are functions to convert UTF-16 to UTF-8 and vice versa, so isn't it a fairly solvable problem?
It sure is a solvable problem, but I think no one on the core team uses Windows for development. For such topics, the OS X/Linux/BSD/... solution becomes available quickly because in most cases a single solution covers all of these platforms, and these are the platforms mainly used by the core developers and the people close to the core (i.e. those willing to come up with a fix and offer support). Also, keep in mind that Ruby's main use case is web apps, and, at least in Ruby land, it's rather uncommon to use Windows for deployment.
For desktop/console helper applications, Ruby is only popular on OS X, as I see it. On Linux, Python is rather dominant in this area, and on Windows there is no such thing (maybe VBScript, though), as you often don't have small applications interacting with each other (console programs, pipes, the KISS principle and the UNIX philosophy are all not very common on Windows; you have to write a service for anything, and so on). But I cannot really judge that, as I haven't used Windows in years. So only a handful of people really have this issue, and if none of them is willing to fix it, it takes two years.
Because you need a huge layer between the OS and your program to do every trivial operation.
Even fopen() does not work with UTF-8 on Windows. In other words, the reason is that the Windows Unicode API is... crap (sorry, all Windows developers).
So supporting Unicode on Windows is very hard, while all other OSes live happily with UTF-8.
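The UTF-16/UTF-8 conversion the question mentions is indeed short when taken on its own. The sketch below shows only that encoding round trip in Python; it says nothing about how Ruby's Windows support was actually implemented, and the function names are hypothetical.

```python
def utf16le_to_utf8(raw):
    """Re-encode UTF-16LE bytes (the form Windows wide-character APIs use) as UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16le(raw):
    """Re-encode UTF-8 bytes as UTF-16LE."""
    return raw.decode("utf-8").encode("utf-16-le")
```

As the answers suggest, the hard part is less the conversion itself than applying it consistently at every point where the runtime touches the file system.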

Pros and cons of using gettext instead of QObject.tr() for localization of PyQt4 application?

I have a couple of applications written in PyQt4 in which I've used the standard Python gettext library for internationalization and localization of the GUI. It works well for me. But I selected gettext just because I already had knowledge of and experience with gettext, and zero experience with the Qt4 tr() approach.
Now I'd like to compare both approaches properly and understand what I'm missing by using gettext instead of QObject.tr. Is there any serious reason why I should not use gettext for Qt4/PyQt4 applications?
In my understanding, the advantages of using gettext are:
GNU gettext is mature and seems to be the de facto standard in the GNU/Linux world.
There are plenty of dedicated editors for PO files to simplify translators' work, although the textual nature of PO templates makes them not strictly necessary.
There are even web services available that can be used for collaborative translation.
gettext is part of the Python standard library, so I don't need to install anything special to use it at runtime.
It has very good support for singular/plural form selection via ngettext().
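For readers unfamiliar with ngettext(), here is a minimal sketch of the plural selection mentioned in the last item. It uses `gettext.NullTranslations`, which has no catalog, as a stand-in for a real translation object; `files_message` is a hypothetical helper.

```python
import gettext

# NullTranslations has no catalog, so it falls back to the English rule
# (singular when n == 1, plural otherwise). With a real catalog loaded via
# gettext.translation(), the catalog's Plural-Forms header decides instead.
translations = gettext.NullTranslations()


def files_message(n):
    """Return a count message with the plural form selected by ngettext()."""
    template = translations.ngettext("%(n)d file", "%(n)d files", n)
    return template % {"n": n}
```

Languages with more than two plural forms are handled the same way from the caller's side: the code still passes `n`, and the translated catalog supplies as many forms as its plural rule needs.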
What I see as advantages of QObject.tr():
This is the native technology for Qt4/PyQt4, so maybe it will work better/faster (although I have no data to prove it).
The messages to translate may carry additional context information, which helps translators choose the best variant for homonyms; e.g. the English word "Letter" can be translated as "Character", "Mail" or even a kind of "Paper size", depending on the actual context.
What I see as disadvantages of QObject.tr() vs gettext:
I did not find in the Qt documentation how singular/plural selection is supported there.
The Qt4 TS translation template is in XML format and is therefore more complex to edit without a special editor (Qt Linguist), and it seems there are no other third-party solutions or web services. So translators would have to learn a new tool (if they are already familiar with PO tools).
But none of the items above is critical enough to say clearly that one tool is better than the other. And I don't want to start a flame war about which is better, because that's very subjective. I just want to know what I'm missing in the pros and cons of QObject.tr() vs gettext.
One simple reason to use QObject.tr() is:
It saves you the need to install gettext on Windows, making cross-platform work a bit easier.
I try to have as few binary dependencies as possible on Windows.
All of them have their pros and cons, but to weigh them more clearly you would first have to decide whether you're targeting a mobile environment or a desktop environment.
Within our company we use different methods, simply because the ideal solution does not exist yet.
For desktop development we use PO files, simply because the buttons are not scaled and therefore the text will fit.
For mobile development, the translation of a string depends on the button size, which can differ between landscape and portrait orientation.
This complicates things a little, because a PO file can hold just one translation of a given word.
So we selected XLIFF for this, so that we could assign unique IDs to a string.
This is not an easy task either, because there are no good solutions for converting .RC files to XLIFF files
(current tools convert ALL strings between "", which is of course unwanted behavior).
So I wrote a converter for this task.
However, plural forms are very important in localization, so a solution that lacks them is not a good localization solution.
Therefore, I would say go for gettext with PO files.
Greetings,
Floris.
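The unique-ID idea from the answer above can be sketched as follows: two XLIFF 1.2 translation units share the same source text but carry different IDs and targets. The sample document, the ID names and the German targets are invented for illustration, and `load_targets` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`; this is not the author's converter.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: the same source string, translated twice for different layouts.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="app.rc" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="btn.cancel.portrait">
        <source>Cancel</source>
        <target>Abbr.</target>
      </trans-unit>
      <trans-unit id="btn.cancel.landscape">
        <source>Cancel</source>
        <target>Abbrechen</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def load_targets(xliff_text):
    """Return a dict mapping each trans-unit id to its target text."""
    root = ET.fromstring(xliff_text)
    return {
        unit.get("id"): unit.findtext("x:target", namespaces=XLIFF_NS)
        for unit in root.iterfind(".//x:trans-unit", XLIFF_NS)
    }
```

Because lookups go through the ID rather than the source text, the application can request the short or the long translation depending on the button it is filling.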
At the current time, Qt does not handle plural forms when you're making use of QT_TRANSLATE_NOOP.
You could add that arguments are handled differently.
With Gettext, we can do
_("Hello %(name)s from %(city)s") % person.__dict__
whereas in PyQt, we do
self.tr("Hello %1 from %2").arg(person.name).arg(person.city)
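The gettext variant can be made runnable as below. This is a sketch under assumptions: `_` is bound to `gettext.NullTranslations().gettext` as a stand-in that returns the message unchanged, and the `Person` class, `greet` function and `REORDERED` string are hypothetical.

```python
import gettext

# Stand-in for a real catalog: returns every message unchanged.
_ = gettext.NullTranslations().gettext


class Person:
    def __init__(self, name, city):
        self.name = name
        self.city = city


def greet(person):
    """Format a translatable greeting using named placeholders."""
    return _("Hello %(name)s from %(city)s") % vars(person)


# A translated string may reorder named placeholders freely; this English
# rewording only imitates what a translator might do.
REORDERED = "From %(city)s: hello, %(name)s"
```

Named placeholders tell the translator what each value is, whereas Qt's numbered %1 and %2 only convey position, although both styles allow the values to be reordered in a translation.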
