Blackberry - Sorting with accents - sorting

I am trying to sort a SimpleSortingVector in BlackBerry that contains strings with French accents. The sort puts the items with accents at the very end of the list. How can I sort so that accented characters are grouped with their unaccented counterparts? Collator doesn't seem to work, I believe because I am building for too low a JRE version: my minimum is JRE 4.5.0.
ie
É = E
here is how I sort the vector:
ssv.setSortComparator(new Comparator()
{
public int compare(Object obj1, Object obj2)
{
String value = ((Item) obj1).getText();
String otherValue = ((Item) obj2).getText();
return value.compareTo(otherValue);
}
});
ssv.reSort();
Thanks,
DMan

OS 4.5 is a challenge. For OS 7, RIM added a string comparer to StringUtilities that can be configured the way you want:
StringUtilities.compare(String aString1, int aOffset1, int aLength1,
String aString2, int aOffset2, int aLength2,
int aLevel, int aLocale, int aFlags, int aFlagsMask)
Unfortunately, I am not aware of any built-in solutions for earlier versions of BBOS. You can build your own sorting table for French characters, and write a custom comparer if you only need to support French. If you're looking for global compatibility, that will get tedious though.
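A custom sorting table of the kind suggested above could look roughly like the following. This is a language-neutral sketch in Python rather than BlackBerry Java, and the table contents and the names `BASES`, `fold` and `sort_key` are illustrative assumptions, not part of any BlackBerry API. The same idea ports to a Comparator that folds both strings before calling compareTo.

```python
# Hypothetical folding table: base letter -> accented French variants.
BASES = {
    "a": "àâä", "c": "ç", "e": "éèêë", "i": "îï",
    "o": "ôö", "u": "ùûü", "y": "ÿ",
}

# Build a translation map covering both lowercase and uppercase forms.
FOLD = {}
for base, accented in BASES.items():
    for ch in accented:
        FOLD[ord(ch)] = base
        FOLD[ord(ch.upper())] = base.upper()

def fold(text):
    """Replace accented letters with their base letter and ignore case."""
    return text.translate(FOLD).lower()

def sort_key(text):
    # Compare on the folded form first; fall back to the raw string for ties.
    return (fold(text), text)

items = ["Zèbre", "Été", "Eau", "école", "Arbre"]
print(sorted(items, key=sort_key))
```

Folding to a key keeps the comparison itself trivial, at the cost of treating "é" and "e" as equal except as a tie-break.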


Generating nice looking BETA keys

I built a web application that is going to launch a beta test soon. I would really like to hand out beta invites and keys that look nice.
i.e. A3E6-7C24-9876-235B
That is 16 hexadecimal digits.
It looks like the typical beta key you might see.
My question is what is a standard way to generate something like this and make sure that it is unique and that it will not be easy for someone to guess a beta key and generate their own.
I have some ideas that would probably work for beta keys:
MD5 is secure enough for this, but it is long and ugly looking and could cause confusion between 0 and O, or 1 and l.
I could start off with a large hexadecimal number that is 16 digits in length. To prevent people from guessing what the next beta key might be increment the value by a random number each time. The range of numbers between 1111-1111-1111-1111 and eeee-eeee-eeee-eeee will have plenty of room to spare even if I am skipping large quantities of numbers.
I guess I am just wondering if there is a standard way for doing this that I am not finding with google. Is there a better way?
The canonical "unique identifying number" is a uuid. There are various forms - you can generate one from random numbers (version 4) or from a hash of some value (user's email + salt?) (versions 3 and 5), for example.
Libraries for java, python and a bunch more exist.
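As a rough illustration of the two UUID flavours mentioned above, here is a sketch using Python's standard `uuid` module. The salt value and the choice of `NAMESPACE_DNS` as the namespace are arbitrary assumptions made for the example.

```python
import uuid

SALT = "example-salt"  # hypothetical secret; choose your own

def random_key():
    """Version 4: built from random numbers."""
    return str(uuid.uuid4()).upper()

def key_for_email(email):
    """Version 5: a SHA-1 hash of a namespace plus a name (here email + salt)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, email + SALT)).upper()

print(random_key())
print(key_for_email("user@example.com"))
```

The version 5 key is reproducible for a given email and salt, so it can be re-derived instead of stored.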
PS I have to add that when I read your question title I thought you were looking for something cool and different. You might consider using an "interesting" word list and combining words with hyphens to encode a number (based on hash of email + salt). That would be much more attractive imho: "your beta code is secret-wombat-cookie-ninja" (I'm sure I read an article describing an example, but I can't find it now).
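The word-list idea might be sketched like this. The word list, the salt and the function name `word_code` are all made up for illustration; a real list would need to be much larger for the codes to be hard to guess.

```python
import hashlib

# Hypothetical word list; 16 entries so that a byte maps onto it without bias.
WORDS = [
    "secret", "wombat", "cookie", "ninja", "rocket", "pickle", "banjo", "comet",
    "walrus", "noodle", "pirate", "yeti", "mango", "robot", "falcon", "puzzle",
]

def word_code(email, salt, n_words=4):
    """Derive a hyphenated word code from a hash of salt + email."""
    digest = hashlib.sha256((salt + email).encode("utf-8")).digest()
    return "-".join(WORDS[b % len(WORDS)] for b in digest[:n_words])

print(word_code("user@example.com", "example-salt"))
```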
One way (C# but the code is simple enough to port to other languages):
private static readonly Random random = new Random(Guid.NewGuid().GetHashCode());
static void Main(string[] args)
{
string x = GenerateBetaString();
}
public static string GenerateBetaString()
{
const string alphabet = "ABCDEF0123456789";
string x = GenerateRandomString(16, alphabet);
return x.Substring(0, 4) + "-" + x.Substring(4, 4) + "-"
+ x.Substring(8, 4) + "-" + x.Substring(12, 4);
}
public static string GenerateRandomString(int length, string alphabet)
{
int maxlen = alphabet.Length;
StringBuilder randomChars = new StringBuilder(length);
for (int i = 0; i < length; i++)
{
randomChars.Append(alphabet[random.Next(0, maxlen)]);
}
return randomChars.ToString();
}
Output:
97A8-55E5-C6B8-959E
8C60-6597-B71D-5CAF
8E1B-B625-68ED-107B
A6B5-1D2E-8D77-EB99
5595-E8DC-3A47-0605
Doing it this way gives you precise control of the characters in the alphabet. If you need crypto-strength randomness (unlikely), use the crypto random class to generate random bytes (possibly mod the alphabet length).
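For comparison, here is a minimal Python sketch of the same generator using the standard `secrets` module, which draws from the operating system's cryptographic random source. The function name and defaults are assumptions for the example.

```python
import secrets

def generate_beta_key(alphabet="ABCDEF0123456789", groups=4, group_len=4):
    """Return a key such as 'A3E6-7C24-9876-235B' built from secure random picks."""
    chars = [secrets.choice(alphabet) for _ in range(groups * group_len)]
    return "-".join(
        "".join(chars[i:i + group_len]) for i in range(0, len(chars), group_len)
    )

# Collect keys in a set so that the (very unlikely) duplicate is discarded.
issued = set()
while len(issued) < 5:
    issued.add(generate_beta_key())
print(issued)
```

Dropping easily confused characters is just a matter of passing a different alphabet.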
Computing power is cheap: take your MD5 idea and run an "aesthetic" filter of your own devising over the candidates. The code below generates 2000 unique keys almost instantaneously, none of which contain a 0, 1, L or O character. Modify aesthetic to fit any additional criteria:
import random, hashlib

def potential_key():
    # Hash a random float and keep the first 16 hex digits.
    x = random.random()
    m = hashlib.md5()
    m.update(str(x).encode("ascii"))
    s = m.hexdigest().upper()[:16]
    return "%s-%s-%s-%s" % (s[:4], s[4:8], s[8:12], s[12:])

def aesthetic(s):
    # Reject keys that contain easily confused characters.
    bad_chars = ["0", "1", "L", "O"]
    for b in bad_chars:
        if b in s:
            return False
    return True

key_set = set()
while len(key_set) < 2000:
    k = potential_key()
    if aesthetic(k):
        key_set.add(k)
print(key_set)
Example keys:
'4297-CAC6-9DA8-625A', '43DD-2ED4-E4F8-3E8D', '4A8D-D5EF-C7A3-E4D5',
'A68D-9986-4489-B66C', '9B23-6259-9832-9639', '2C36-FE65-EDDB-2CF7',
'BFB6-7769-4993-CD86', 'B4F4-E278-D672-3D2C', 'EEC4-3357-2EAB-96F5',
'6B69-C6DA-99C3-7B67', '9ED7-FED5-3CC6-D4C6', 'D3AA-AF48-6379-92EF', ...

Find a Global Atom from a partial string

I can create a Global Atom using GlobalAddAtom and I can find that atom again using GlobalFindAtom if I already know the string associated with the atom. But is there a way to find all atoms whose associated string matches a given partial string?
For example, let's say I have an atom whose string is "Hello, World!" How can I later find that atom by searching for just "Hello"?
Unfortunately, the behavior you're describing is not possible for Atom Tables. This is because Atom Tables in Windows are basically Hash Tables, and the mapping process handles strings in entirety and not by parts.
Of course, it almost sounds like it would be possible, as quoted from the MSDN documentation:
Applications can also use local atom tables to save time when searching for a particular string. To perform a search, an application need only place the search string in the atom table and compare the resulting atom with the atoms in the relevant structures. Comparing atoms is typically faster than comparing strings.
However, they are referring to exact matches. This limitation probably seems dated compared to what is possible with resources currently available to software. However, Atoms have been available as far back as Win16 and in those times, this facility allowed a means for applications to manage string data effectively in minimal memory. Atoms are still used now to manage window class names, and still provide decent benefits in reducing the footprint of multiple stored copies of strings.
If you need to store string data efficiently and to be able to scan by partial starting matches, a trie (prefix tree) is likely to meet your needs; a Suffix Tree goes further and also handles matches anywhere inside the string.
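To make the alternative concrete, here is a small Python sketch of a prefix index built from nested dictionaries. It keeps its own copy of the strings and has nothing to do with the Windows atom table; the class and method names are invented for the example.

```python
_END = object()  # sentinel key marking "a stored string ends here"

class PrefixIndex:
    """Minimal trie: store strings, then list those starting with a prefix."""

    def __init__(self):
        self._root = {}

    def add(self, text):
        node = self._root
        for ch in text:
            node = node.setdefault(ch, {})
        node[_END] = text

    def find(self, prefix):
        # Walk down to the node for the prefix, if it exists.
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Collect every stored string below that node.
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is _END:
                    results.append(child)
                else:
                    stack.append(child)
        return sorted(results)

index = PrefixIndex()
for s in ["Hello, World!", "Help", "Goodbye"]:
    index.add(s)
print(index.find("Hello"))
```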
It can actually be done, but only by scanning them all. In LINQPad 5 this takes about 0.025 seconds on my machine, so it is quite fast. Here is an example implementation:
void Main()
{
    const string atomPrefix = "Hello";
    const int bufferSize = 1024;
    // String atoms occupy the range 0xC000 through 0xFFFF.
    const ushort smallestAtomIndex = 0xC000;
    var buffer = new StringBuilder(bufferSize);
    var results = new List<string>();
    for (ushort atomIndex = smallestAtomIndex; atomIndex < ushort.MaxValue; atomIndex++)
    {
        // A return value of 0 means there is no atom at this index.
        var resultLength = GlobalGetAtomName(atomIndex, buffer, bufferSize);
        if (resultLength > 0 && buffer.ToString().StartsWith(atomPrefix))
        {
            results.Add($"{buffer} - {atomIndex}");
        }
        buffer.Clear();
    }
    results.Dump();
}

[DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern uint GlobalGetAtomName(ushort atom, StringBuilder buffer, int size);

LINQ and CASE Sensitivity

I have this LINQ Query:
TempRecordList = new ArrayList(TempRecordList.Cast<string>().OrderBy(s => s.Substring(9, 30)).ToArray());
It works great and sorts accurately, but a little differently from what I want. Among the results of the query I see something like this:
Palm-Bouter, Peter
Palmer-Johnson, Sean
Whereas what I really need is to have names sorted like this:
Palmer-Johnson, Sean
Palm-Bouter, Peter
Basically I want the '-' character to be treated as sorting after any letter, so that names containing it show up later in an ascending sort.
Here is another example. I get:
Dias, Reginald
DiBlackley, Anton
Instead of:
DiBlackley, Anton
Dias, Reginald
As you can see, again, the order is switched due to how the uppercase letter 'B' is treated.
So my question is: what do I need to change in my LINQ query to make it return results in the order I specified? Any feedback would be greatly appreciated.
By the way, I tried using s.Substring(9, 30).ToLower() but that didn't help.
Thank you!
To customize the sorting order you will need to create a comparer class that implements the IComparer<string> interface. The OrderBy() method takes a comparer as its second parameter.
internal sealed class NameComparer : IComparer<string> {
private static readonly NameComparer DefaultInstance = new NameComparer();
static NameComparer() { }
private NameComparer() { }
public static NameComparer Default {
get { return DefaultInstance; }
}
public int Compare(string x, string y) {
int length = Math.Min(x.Length, y.Length);
for (int i = 0; i < length; ++i) {
if (x[i] == y[i]) continue;
if (x[i] == '-') return 1;
if (y[i] == '-') return -1;
return x[i].CompareTo(y[i]);
}
return x.Length - y.Length;
}
}
This works at least with the following test cases:
var names = new[] {
"Palmer-Johnson, Sean",
"Palm-Bouter, Peter",
"Dias, Reginald",
"DiBlackley, Anton",
};
var sorted = names.OrderBy(name => name, NameComparer.Default).ToList();
// sorted:
// [0]: "DiBlackley, Anton"
// [1]: "Dias, Reginald"
// [2]: "Palmer-Johnson, Sean"
// [3]: "Palm-Bouter, Peter"
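The same comparison rule can be expressed in Python, which may help when checking the logic outside .NET. This is only a sketch of the rule used by NameComparer above (ordinal character comparison, with '-' sorting after everything else); `name_compare` is a name chosen for the example.

```python
from functools import cmp_to_key

def name_compare(x, y):
    """Ordinal comparison, except that '-' sorts after any other character."""
    for a, b in zip(x, y):
        if a == b:
            continue
        if a == "-":
            return 1
        if b == "-":
            return -1
        return ord(a) - ord(b)
    # One string is a prefix of the other: the shorter one comes first.
    return len(x) - len(y)

names = [
    "Palmer-Johnson, Sean",
    "Palm-Bouter, Peter",
    "Dias, Reginald",
    "DiBlackley, Anton",
]
print(sorted(names, key=cmp_to_key(name_compare)))
```

"DiBlackley" lands before "Dias" only because uppercase 'B' has a lower code than lowercase 'a' in an ordinal comparison.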
As already mentioned, the OrderBy() method takes a comparer as a second parameter.
For strings, you don't necessarily have to implement an IComparer<string>. You might be fine with System.StringComparer.CurrentCulture (or one of the others in System.StringComparer).
In your exact case, however, there is no built-in comparer that also sorts '-' after letters.
OrderBy() returns results in ascending order.
e comes before h, thus the first result (remember you're comparing on a substring that starts with the character in the 9th position...not the beginning of the string) and i comes before y, thus the second. Case sensitivity has nothing to do with it.
If you want results in descending order, you should use OrderByDescending():
TempRecordList.Cast<string>()
    .OrderByDescending(s => s.Substring(9, 30)).ToArray();
You might want to just implement a custom IComparer object that will give a custom priority to special, upper-case and lower-case characters.
http://msdn.microsoft.com/en-us/library/system.collections.icomparer.aspx

How to convert Chinese characters to Pinyin [closed]

For sorting Chinese language text, I want to convert Chinese characters to Pinyin, properly separating each Chinese character and grouping successive characters together.
Can you please help me in this task by providing the logic or source code for doing this?
Please let me know if an open-source library already exists for this.
Short answer: you don't.
Long answer: There is no one-to-one mapping for 汉字 to 汉语拼音. Just some quick examples:
把 can be "ba" in the third tone or fourth tone.
了 can be "le" toneless or "liao" third tone.
乐 can be "le" or "yue", both in the fourth tone.
落 can be "luo", "la" or "lao", all in the fourth tone.
And so on. I have a beginners' book on this topic that has 207 examples. I stress that this is a beginners' book and is by no means complete. Each one has a page or two of examples of use and conditions under which you choose the appropriate pronunciation. It is not something that could be easily programmed (if at all).
And this doesn't even address the other slippery thing you want to deal with: the separation of characters into grouped words. The very notion of a word is a bit slippery in Chinese. There are two terms in Chinese that correspond roughly to "word": 字 and 词. The first is the character; the second is a group of characters that are put together into one concept. (I frequently get asked by Chinese speakers how many "words" I can read when they really mean "characters".) While in some cases the distinction is clear (the 词 "乌鸦", for example, is "crow": the two 字 must be together to express the idea properly and it would be incorrect to translate it as "black crow"), in others it is not so clear. What does "你好" translate to? Is it one word meaning, idiomatically, "hello"? Or is it two words translating literally to "you good"? Each of the characters involved stands alone or in groups with other words, but together they mean something entirely different from their individual meanings. Given this, how, precisely, do you plan to group the 汉语拼音 transliterations (which are difficult to impossible to get right in the first place!) into "words"?
While @JUST MY correct OPINION's answer addresses some of the difficulties of converting characters into pinyin, it is not an impossible problem to solve.
I have written a library (pinyinify) that solves this task with decent accuracy. Even though there is not a one-to-one mapping between characters and pinyin, my library can usually decide which pronunciation is correct. For example, "我受不了了" correctly converts to "wǒ shòubùliǎo le", with two different pronunciations of 了.
My approach to solving the problem is pretty simple:
First segment the text into words. For example, 我喜欢旅游 would be divided into three words: 我 喜欢 旅游. This is also not a simple process, but there are many libraries for it. jieba is one of the more popular libraries for this purpose.
Use a dictionary to convert the words into pinyin.
If the word is not in the dictionary, fall back to converting the individual characters to pinyin using their most common pronunciation.
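The three steps above can be sketched in a few lines of Python. This is not the pinyinify implementation: the two dictionaries are tiny made-up samples, and the segmenter is a naive greedy longest-match stand-in for a real library such as jieba.

```python
# Hypothetical sample data; a real system would load full dictionaries.
WORD_PINYIN = {"受不了": "shòubùliǎo", "喜欢": "xǐhuan", "旅游": "lǚyóu"}
CHAR_PINYIN = {"我": "wǒ", "了": "le", "受": "shòu", "不": "bù"}

def segment(text, vocabulary):
    """Greedy longest-match segmentation; unknown text falls back to single characters."""
    max_len = max(len(word) for word in vocabulary)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

def to_pinyin(text):
    parts = []
    for word in segment(text, WORD_PINYIN):
        if word in WORD_PINYIN:
            # Step 2: whole-word dictionary lookup.
            parts.append(WORD_PINYIN[word])
        else:
            # Step 3: per-character fallback using the most common reading.
            parts.append("".join(CHAR_PINYIN.get(ch, ch) for ch in word))
    return " ".join(parts)

print(to_pinyin("我受不了了"))
```

The word-level entry is what lets the first 了 come out as "liǎo" while the trailing one falls back to "le".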
CoreFoundation provides a function for this conversion:
CFMutableStringRef string = CFStringCreateMutableCopy(NULL, 0, CFSTR("中文"));
CFStringTransform(string, NULL, kCFStringTransformMandarinLatin, NO);
CFStringTransform(string, NULL, kCFStringTransformStripDiacritics, NO);
NSLog(@"%@", string);
The output is
zhong wen
The following code, written in C#, converts Chinese characters covered by the GB2312 encoding (commonly used Simplified Chinese characters) to pinyin. For example, it converts "今天天气不错" to "JinTianTianQiBuCuo".
Sometimes a Chinese character does not map one-to-one to a pinyin syllable; the pronunciation depends on the context. For example, "行" in "自行车" (bike) is pronounced "Xing", but in "银行" (bank) it is pronounced "Hang". If this matters to you, you will need a more complex solution.
Sorry for my poor English. I hope this helps a little.
public class ChineseToPinYin
{
private static int[] pyValue = new int[]
{
-20319,-20317,-20304,-20295,-20292,-20283,-20265,-20257,-20242,-20230,-20051,-20036,
-20032,-20026,-20002,-19990,-19986,-19982,-19976,-19805,-19784,-19775,-19774,-19763,
-19756,-19751,-19746,-19741,-19739,-19728,-19725,-19715,-19540,-19531,-19525,-19515,
-19500,-19484,-19479,-19467,-19289,-19288,-19281,-19275,-19270,-19263,-19261,-19249,
-19243,-19242,-19238,-19235,-19227,-19224,-19218,-19212,-19038,-19023,-19018,-19006,
-19003,-18996,-18977,-18961,-18952,-18783,-18774,-18773,-18763,-18756,-18741,-18735,
-18731,-18722,-18710,-18697,-18696,-18526,-18518,-18501,-18490,-18478,-18463,-18448,
-18447,-18446,-18239,-18237,-18231,-18220,-18211,-18201,-18184,-18183, -18181,-18012,
-17997,-17988,-17970,-17964,-17961,-17950,-17947,-17931,-17928,-17922,-17759,-17752,
-17733,-17730,-17721,-17703,-17701,-17697,-17692,-17683,-17676,-17496,-17487,-17482,
-17468,-17454,-17433,-17427,-17417,-17202,-17185,-16983,-16970,-16942,-16915,-16733,
-16708,-16706,-16689,-16664,-16657,-16647,-16474,-16470,-16465,-16459,-16452,-16448,
-16433,-16429,-16427,-16423,-16419,-16412,-16407,-16403,-16401,-16393,-16220,-16216,
-16212,-16205,-16202,-16187,-16180,-16171,-16169,-16158,-16155,-15959,-15958,-15944,
-15933,-15920,-15915,-15903,-15889,-15878,-15707,-15701,-15681,-15667,-15661,-15659,
-15652,-15640,-15631,-15625,-15454,-15448,-15436,-15435,-15419,-15416,-15408,-15394,
-15385,-15377,-15375,-15369,-15363,-15362,-15183,-15180,-15165,-15158,-15153,-15150,
-15149,-15144,-15143,-15141,-15140,-15139,-15128,-15121,-15119,-15117,-15110,-15109,
-14941,-14937,-14933,-14930,-14929,-14928,-14926,-14922,-14921,-14914,-14908,-14902,
-14894,-14889,-14882,-14873,-14871,-14857,-14678,-14674,-14670,-14668,-14663,-14654,
-14645,-14630,-14594,-14429,-14407,-14399,-14384,-14379,-14368,-14355,-14353,-14345,
-14170,-14159,-14151,-14149,-14145,-14140,-14137,-14135,-14125,-14123,-14122,-14112,
-14109,-14099,-14097,-14094,-14092,-14090,-14087,-14083,-13917,-13914,-13910,-13907,
-13906,-13905,-13896,-13894,-13878,-13870,-13859,-13847,-13831,-13658,-13611,-13601,
-13406,-13404,-13400,-13398,-13395,-13391,-13387,-13383,-13367,-13359,-13356,-13343,
-13340,-13329,-13326,-13318,-13147,-13138,-13120,-13107,-13096,-13095,-13091,-13076,
-13068,-13063,-13060,-12888,-12875,-12871,-12860,-12858,-12852,-12849,-12838,-12831,
-12829,-12812,-12802,-12607,-12597,-12594,-12585,-12556,-12359,-12346,-12320,-12300,
-12120,-12099,-12089,-12074,-12067,-12058,-12039,-11867,-11861,-11847,-11831,-11798,
-11781,-11604,-11589,-11536,-11358,-11340,-11339,-11324,-11303,-11097,-11077,-11067,
-11055,-11052,-11045,-11041,-11038,-11024,-11020,-11019,-11018,-11014,-10838,-10832,
-10815,-10800,-10790,-10780,-10764,-10587,-10544,-10533,-10519,-10331,-10329,-10328,
-10322,-10315,-10309,-10307,-10296,-10281,-10274,-10270,-10262,-10260,-10256,-10254
};
private static string[] pyName = new string[]
{
"A","Ai","An","Ang","Ao","Ba","Bai","Ban","Bang","Bao","Bei","Ben",
"Beng","Bi","Bian","Biao","Bie","Bin","Bing","Bo","Bu","Ca","Cai","Can",
"Cang","Cao","Ce","Ceng","Cha","Chai","Chan","Chang","Chao","Che","Chen","Cheng",
"Chi","Chong","Chou","Chu","Chuai","Chuan","Chuang","Chui","Chun","Chuo","Ci","Cong",
"Cou","Cu","Cuan","Cui","Cun","Cuo","Da","Dai","Dan","Dang","Dao","De",
"Deng","Di","Dian","Diao","Die","Ding","Diu","Dong","Dou","Du","Duan","Dui",
"Dun","Duo","E","En","Er","Fa","Fan","Fang","Fei","Fen","Feng","Fo",
"Fou","Fu","Ga","Gai","Gan","Gang","Gao","Ge","Gei","Gen","Geng","Gong",
"Gou","Gu","Gua","Guai","Guan","Guang","Gui","Gun","Guo","Ha","Hai","Han",
"Hang","Hao","He","Hei","Hen","Heng","Hong","Hou","Hu","Hua","Huai","Huan",
"Huang","Hui","Hun","Huo","Ji","Jia","Jian","Jiang","Jiao","Jie","Jin","Jing",
"Jiong","Jiu","Ju","Juan","Jue","Jun","Ka","Kai","Kan","Kang","Kao","Ke",
"Ken","Keng","Kong","Kou","Ku","Kua","Kuai","Kuan","Kuang","Kui","Kun","Kuo",
"La","Lai","Lan","Lang","Lao","Le","Lei","Leng","Li","Lia","Lian","Liang",
"Liao","Lie","Lin","Ling","Liu","Long","Lou","Lu","Lv","Luan","Lue","Lun",
"Luo","Ma","Mai","Man","Mang","Mao","Me","Mei","Men","Meng","Mi","Mian",
"Miao","Mie","Min","Ming","Miu","Mo","Mou","Mu","Na","Nai","Nan","Nang",
"Nao","Ne","Nei","Nen","Neng","Ni","Nian","Niang","Niao","Nie","Nin","Ning",
"Niu","Nong","Nu","Nv","Nuan","Nue","Nuo","O","Ou","Pa","Pai","Pan",
"Pang","Pao","Pei","Pen","Peng","Pi","Pian","Piao","Pie","Pin","Ping","Po",
"Pu","Qi","Qia","Qian","Qiang","Qiao","Qie","Qin","Qing","Qiong","Qiu","Qu",
"Quan","Que","Qun","Ran","Rang","Rao","Re","Ren","Reng","Ri","Rong","Rou",
"Ru","Ruan","Rui","Run","Ruo","Sa","Sai","San","Sang","Sao","Se","Sen",
"Seng","Sha","Shai","Shan","Shang","Shao","She","Shen","Sheng","Shi","Shou","Shu",
"Shua","Shuai","Shuan","Shuang","Shui","Shun","Shuo","Si","Song","Sou","Su","Suan",
"Sui","Sun","Suo","Ta","Tai","Tan","Tang","Tao","Te","Teng","Ti","Tian",
"Tiao","Tie","Ting","Tong","Tou","Tu","Tuan","Tui","Tun","Tuo","Wa","Wai",
"Wan","Wang","Wei","Wen","Weng","Wo","Wu","Xi","Xia","Xian","Xiang","Xiao",
"Xie","Xin","Xing","Xiong","Xiu","Xu","Xuan","Xue","Xun","Ya","Yan","Yang",
"Yao","Ye","Yi","Yin","Ying","Yo","Yong","You","Yu","Yuan","Yue","Yun",
"Za", "Zai","Zan","Zang","Zao","Ze","Zei","Zen","Zeng","Zha","Zhai","Zhan",
"Zhang","Zhao","Zhe","Zhen","Zheng","Zhi","Zhong","Zhou","Zhu","Zhua","Zhuai","Zhuan",
"Zhuang","Zhui","Zhun","Zhuo","Zi","Zong","Zou","Zu","Zuan","Zui","Zun","Zuo"
};
/// <summary>
/// 把汉字转换成拼音(全拼)
/// </summary>
/// <param name="hzString">汉字字符串</param>
/// <returns>转换后的拼音(全拼)字符串</returns>
public static string Convert(string hzString)
{
// 匹配中文字符
Regex regex = new Regex("^[\u4e00-\u9fa5]$");
byte[] array = new byte[2];
string pyString = "";
int chrAsc = 0;
int i1 = 0;
int i2 = 0;
char[] noWChar = hzString.ToCharArray();
for (int j = 0; j < noWChar.Length; j++)
{
// 中文字符
if (regex.IsMatch(noWChar[j].ToString()))
{
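// Encoding.Default is GB2312 only on some systems, so request it explicitly
// (on .NET Core / .NET 5+, register CodePagesEncodingProvider first).
array = System.Text.Encoding.GetEncoding("gb2312").GetBytes(noWChar[j].ToString());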
i1 = (short)(array[0]);
i2 = (short)(array[1]);
chrAsc = i1 * 256 + i2 - 65536;
if (chrAsc > 0 && chrAsc < 160)
{
pyString += noWChar[j];
}
else
{
// 修正部分文字
if (chrAsc == -9254) // 修正“圳”字
pyString += "Zhen";
else
{
for (int i = (pyValue.Length - 1); i >= 0; i--)
{
if (pyValue[i] <= chrAsc)
{
pyString += pyName[i];
break;
}
}
}
}
}
// 非中文字符
else
{
pyString += noWChar[j].ToString();
}
}
return pyString;
}
}
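The lookup in the class above works because the most common GB2312 characters are stored in pinyin order, so each syllable owns a contiguous range of codes. Here is a Python sketch of the same idea using a binary search, with only the first few boundaries from the table; anything beyond the excerpt would be mislabelled as the last syllable, so it is for illustration only. The function name `lookup` is invented for the example.

```python
import bisect

# Excerpt of the pyValue / pyName tables above (first seven entries only).
BOUNDARIES = [-20319, -20317, -20304, -20295, -20292, -20283, -20265]
NAMES = ["A", "Ai", "An", "Ang", "Ao", "Ba", "Bai"]

def lookup(ch):
    """Return the pinyin for one character, or the character itself if not covered."""
    try:
        raw = ch.encode("gb2312")
    except UnicodeEncodeError:
        return ch
    if len(raw) != 2:
        return ch  # ASCII and other single-byte characters pass through
    code = int.from_bytes(raw, "big") - 65536
    # Find the last boundary that is <= code.
    idx = bisect.bisect_right(BOUNDARIES, code) - 1
    return NAMES[idx] if idx >= 0 else ch

print("".join(lookup(ch) for ch in "啊爱安"))
```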
You can use the following method:
from __future__ import unicode_literals
from pypinyin import lazy_pinyin
hanzi_list = ['如何', '将', '汉字','转为', '拼音']
pinyin_list = [''.join(lazy_pinyin(_)) for _ in hanzi_list]
Output:
['ruhe', 'jiang', 'hanzi', 'zhuanwei', 'pinyin']
I had this problem and found a solution in PHP (which could be cleaner, I suppose, but it works). I had some trouble because the file given in this topic is keyed by hexadecimal Unicode code points.
1) Import the data from ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/data/Uni2Pinyin.gz (thanks pierr) to your database or whatever
2) Import your data in an array as $pinyinArray[$hexaUnicode] = $pinyin;
3) Use this code:
/*
* Decimal representation of $c
* function found there: http://www.cantonese.sheik.co.uk/phorum/read.php?2,19594
*/
function uniord($c)
{
    // Uses $c[0] because the $c{0} offset syntax was removed in PHP 8.
    $ud = 0;
    if (ord($c[0])>=0 && ord($c[0])<=127)
        $ud = ord($c[0]);
    if (ord($c[0])>=192 && ord($c[0])<=223)
        $ud = (ord($c[0])-192)*64 + (ord($c[1])-128);
    if (ord($c[0])>=224 && ord($c[0])<=239)
        $ud = (ord($c[0])-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
    if (ord($c[0])>=240 && ord($c[0])<=247)
        $ud = (ord($c[0])-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
    if (ord($c[0])>=248 && ord($c[0])<=251)
        $ud = (ord($c[0])-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
    if (ord($c[0])>=252 && ord($c[0])<=253)
        $ud = (ord($c[0])-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
    if (ord($c[0])>=254 && ord($c[0])<=255) //error
        $ud = false;
    return $ud;
}
/*
* Translate $string, a single Chinese character, to its hexadecimal Unicode code point
*/
function chineseToHexaUnicode($string) {
return strtoupper(dechex(uniord($string)));
}
/*
* Convert each character of $string to pinyin by looking it up in $pinyinArray
*/
function convertChineseToPinyin($string,$pinyinArray) {
$pinyinValue = '';
for ($i = 0; $i < mb_strlen($string);$i++)
$pinyinValue.=$pinyinArray[chineseToHexaUnicode(mb_substr($string, $i, 1))];
return $pinyinValue;
}
$string = '龙江省五大';
echo convertChineseToPinyin($string,$pinyinArray);
Output: (long2)(jiang1)(sheng3,xing3)(wu3)(da4,dai4)
Of course, $pinyinArray is your array of data (hexaUnicode => pinyin).
Hope it will help someone.
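For readers more comfortable with Python, the same lookup can be sketched as follows. The two-entry table is a stand-in for the imported Uni2Pinyin data, with values copied from the output above, and the function names are chosen for the example. The `uniord3` helper mirrors the three-byte branch of the PHP uniord() to show that it yields the same number as the built-in `ord`.

```python
def hexa_unicode(ch):
    """Upper-case hexadecimal code point, the key format used by the table."""
    return format(ord(ch), "X")

def uniord3(ch):
    """Manual decode of a three-byte UTF-8 sequence, as in the PHP code."""
    b = ch.encode("utf-8")
    return (b[0] - 224) * 4096 + (b[1] - 128) * 64 + (b[2] - 128)

# Hypothetical stand-in for the imported data (hexaUnicode => pinyin).
PINYIN_TABLE = {
    hexa_unicode("龙"): "(long2)",
    hexa_unicode("江"): "(jiang1)",
}

def convert_chinese_to_pinyin(text, table):
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(hexa_unicode(ch), ch) for ch in text)

print(convert_chinese_to_pinyin("龙江", PINYIN_TABLE))
```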
If you use Visual Studio, this might be an option:
Microsoft.International.Converters.PinYinConverter
How to install:
First, download the Visual Studio International Pack 2.0 from the official download page. Once the download is complete, run VSIPSetup.msi to install it. On an x86 operating system the default installation directory is C:\Program Files\Microsoft Visual Studio International Feature Pack 2.0.
After installation, you need to add references in VS to the following two libraries:
C:\Program Files\Microsoft Visual Studio International Pack\Simplified Chinese Pin-Yin Conversion Library (Pinyin)
and
C:\Program Files\Microsoft Visual Studio International Pack\Traditional Chinese to Simplified Chinese Conversion Library and Add-In Tool (Traditional and Simplified Huzhuan to)
How to use:
public static string GetPinyin(string str)
{
string r = string.Empty;
foreach (char obj in str)
{
try
{
ChineseChar chineseChar = new ChineseChar(obj);
string t = chineseChar.Pinyins[0].ToString();
r += t.Substring(0, t.Length - 1);
}
catch
{
r += obj.ToString();
}
}
return r;
}
Source:
http://www.programering.com/a/MzM3cTMwATA.html

Converting non-decimal numbers to another non-decimal

Not that it's a lot of work, but the only way I know to convert a non-decimal number to another non-decimal base is by converting the number to decimal first, then taking a second step to convert it to the new base. For example, to convert 456 (in base 7) to base 8, I would calculate the decimal value of 456 (which is 237), then convert that value into base 8 (giving 355)...
Is there a better way to go directly from 7 to 8? or any base to any other base for that matter?
Here's what I have:
//source_lang and target_lang are just the numeric symbols, they would be "0123456789" if they were decimal, and "0123456789abcdef" if hex.
private string translate(string num, string source_lang, string target_lang)
{
int b10 = 0;
string rv = "";
for (int i=num.Length-1; i>=0; i--){
b10 += source_lang.IndexOf( num[i] ) * ((int)Math.Pow(source_lang.Length, num.Length -1 - i));
}
while (b10 > 0) {
rv = target_lang[b10 % target_lang.Length] + rv;
b10 /= target_lang.Length;
}
return rv;
}
You're not really converting into base 10. You're converting it into a numeric data type instead of a string representation. If anything, you're converting it into binary :) It's worth distinguishing between "an integer" (which doesn't intrinsically have a base) and "the textual representation of an integer" (which does).
That seems like a sensible way to go, IMO. However, your conversion routines certainly aren't particularly efficient. I would separate out your code into Parse and Format methods, then the Convert method can be something like:
public static string Convert(string text, int sourceBase, int targetBase)
{
int number = Parse(text, sourceBase);
return Format(number, targetBase);
}
(You can use a string to represent the different bases if you want, of course. If you really need that sort of flexibility though, I'd be tempted to create a new class to represent a "numeric representation". That class should probably be the one to have Parse, Format and Convert in it.)
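A minimal Python sketch of the Parse / Format split, using digit strings to describe each base as in the question. The function names are chosen for the example; the intermediate value is a plain integer with no base of its own.

```python
def parse(text, digits):
    """Turn a textual representation into an integer."""
    base = len(digits)
    value = 0
    for ch in text:
        # digits.index raises ValueError on a symbol that is not in the base.
        value = value * base + digits.index(ch)
    return value

def format_number(number, digits):
    """Turn a non-negative integer into its textual representation."""
    base = len(digits)
    if number == 0:
        return digits[0]
    out = []
    while number > 0:
        number, remainder = divmod(number, base)
        out.append(digits[remainder])
    return "".join(reversed(out))

def convert(text, source_digits, target_digits):
    return format_number(parse(text, source_digits), target_digits)

print(convert("456", "0123456", "01234567"))
```

Accumulating with `value * base + digit` avoids the repeated Math.Pow calls in the original loop.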
