TreeView sorting in GTK# produces incorrect result - gtk#

We have a large C# application using the GTK# bindings. Recently we became aware of a very strange bug: When the user clicks on a column to sort it, it doesn't sort correctly. To be specific, GTK appears to be ignoring all punctuation (including whitespace) and sorting only on alphabetic characters.
Does anybody have any idea why on Earth it would do something like that?
We're not doing anything fancy with custom sorting or anything like that. I'm really puzzled as to why it doesn't just sort the strings in ASCII order.

We eventually solved this by adding a custom sorting function that just does a normal ASCII comparison.
I'm still baffled as to why this doesn't happen by default in the first place... so if anybody knows, write your answer here.

Related

Find but skip strings and comments?

One thing that constantly annoys me about VS is that when I do a Find or Find all, it looks in comments, strings, and other places. When I'm trying to find a particular bit of code, like and rent, it finds it all over. Is there a way to limit searches just to code?
Not sure if there is a specific setting to ignore comments, but you could do a regex find. For example, assuming you want to find "text", you could use this:
^(?!\s*?//).*?text
Caveats:
Assumes comments start with // as first non-whitespace characters. E.g. C# comment types
Doesn't work for comments at the end of code lines (only comments on their own lines)
Doesn't work with block comments, for example /* comment */
So overall it isn't perfect by any means, but depending how many hits you are getting, it might help to cut them down which can be useful if you have a lot of false positives in one-liner comments
The 'Find All References' function may suit you : it ignores all commented-out code and text in strings. CTRL+K, R is the keyboard shortcut.
(Note that it's designed for going from a specific instance of a search string to all other instances. so if you haven't already found an instance of what you're searching for, you would have to (temporarily) type one in to the editor window, then search. Also it's not available for all languages : I know it works fine for C#, though.)

Localizing spreadsheet cell names

I'm working on an application that has a spreadsheet-like interface. There is a grid of cells. Rows are numbered, and letters are used for the columns. So "names" like A2 and Q17 refer to cells in the grid.
I know I can use GetLocaleInfo(Ex) with LOCALE_SNATIVEDIGITS to get the appropriate digits for the user's locale, so I can format the row numbers. But I don't see something comparable for the locale-appropriate "alphabet".
I could imagine the same question arising for things like word processors that have an outline mode and need to be able to enumerate some list items with letters.
I've been pouring through the Windows NLS APIs, and I don't see anything like LOCALE_SNATIVEALPHABET nor do I see an API like EnumLocaleAlphabet. Am I missing such an API or am I stuck rolling my own?
Personally I haven't heard of such API. The closest to what you are asking would be ICU uchar's UBlockCode but it still won't give you concrete alphabet.
By the way, I don't think it is actually localize cell names unless you localize the whole User Interface. But in such case you may simply ask translators to provide valid cell symbols.
And this probably what you should do, because some writing systems do not have concept of alphabet at all. That is, it is called script, not alphabet. For example, I don't think it would be good idea to use Arabic for cell symbol (which glyph variant in such case?) nor I would use Chinese (all possible ideograms?).
My suggestion is: leave it to translators, if they want to localize it, that is OK, if they don't, just trust them, they really should know their craft.

Converting NSString to keyCode+modifiers for AXUIElementPostKeyboardEvent

Edit: Turns out, I was misled during my initial explorations of the accessibility APIs. Once I found the secure text field in the AX hierarchy, I was easily able to set the value. Not sure what to do with this question beyond that, but I wanted to update this for future searchers.
I'm working on some code that will post keyboard events to targeted applications using the Accessibility APIs. So far, I have been able to write a trivial app that allows me to type in a string value and then post keyboard events with those key codes to the targeted application. In reality, the strings would be read from another location.
What I have not yet been able to figure out is how to ascertain whether and which modifier keys should also be posted. For instance, when I type Hello, world! into my test application, the input is sent to the other application as hello, world1 because I am not yet including the modifier keys to create the upper case H and the exclamation point. This is made doubly complicated by multi-keystroke characters like é or ü. Sending é sends a raw e with no accent for example.
Is there a simple method I am overlooking for discerning the modifiers to combine with a keycode for creating a particular NSString or unichar? If not, does anyone have a suggestion of how to proceed? So far, the best I have come up with is calling UCKeyTranslate with all possible modifier combinations until I find one that matches the unichar I get using -[NSString characterAtIndex:] I'm not sure this is scalable or reliable, though, given the multi-keystroke nature of some characters as noted above.
Thanks in advance!
This probably won't help. But just in case: Is it really necessary to send keyboard events? Because that is going to get really difficult if you need to support, say, Kotoeri.
It's a simple matter to override insertText: and doCommandBySelector: and send the results of the key sequence, rather than the individual keystrokes.
I have found a example which does the trick but it's incomplete:It will not be a general solution in any case ...how can this handle multiple keyboard layouts ?
There is an cgquartz obsolete function to do so: CGPostKeyboardEvent (not sure it's possible to pass only the char?) may be can still be used (marked undocumented with some side effect to but .. ).
EDIT: UCKeyTranslate as a way to build a dictionary. Interesting but how the OS do this? A better answer should be hidden somewhere !

Can sorting Japanese kanji words be done programmatically?

I've recently discovered, to my astonishment (having never really thought about it before), machine-sorting Japanese proper nouns is apparently not possible.
I work on an application that must allow the user to select a hospital from a 3-menu interface. The first menu is Prefecture, the second is City Name, and the third is Hospital. Each menu should be sorted, as you might expect, so the user can find what they want in the menu.
Let me outline what I have found, as preamble to my question:
The expected sort order for Japanese words is based on their pronunciation. Kanji do not have an inherent order (there are tens of thousands of Kanji in use), but the Japanese phonetic syllabaries do have an order: あ、い、う、え、お、か、き、く、け、こ... and on for the fifty traditional distinct sounds (a few of which are obsolete in modern Japanese). This sort order is called 五十音順 (gojuu on jun , or '50-sound order').
Therefore, Kanji words should be sorted in the same order as they would be if they were written in hiragana. (You can represent any kanji word in phonetic hiragana in Japanese.)
The kicker: there is no canonical way to determine the pronunciation of a given word written in kanji. You never know. Some kanji have ten or more different pronunciations, depending on the word. Many common words are in the dictionary, and I could probably hack together a way to look them up from one of the free dictionary databases, but proper nouns (e.g. hospital names) are not in the dictionary.
So, in my application, I have a list of every prefecture, city, and hospital in Japan. In order to sort these lists, which is a requirement, I need a matching list of each of these names in phonetic form (kana).
I can't come up with anything other than paying somebody fluent in Japanese (I'm only so-so) to manually transcribe them. Before I do so though:
Is it possible that I am totally high on fire, and there actually is some way to do this sorting without creating my own mappings of kanji words to phonetic readings, that I have somehow overlooked?
Is there a publicly available mapping of prefecture/city names, from the government or something? That would reduce the manual mapping I'd need to do to only hospital names.
Does anybody have any other advice on how to approach this problem? Any programming language is fine--I'm working with Ruby on Rails but I would be delighted if I could just write a program that would take the kanji input (say 40,000 proper nouns) and then output the phonetic representations as data that I could import into my Rails app.
宜しくお願いします。
For Data, dig Google's Japanese IME (Mozc) data files here.
https://github.com/google/mozc/tree/master/src/data
There is lots of interesting data there, including IPA dictionaries.
Edit:
And you may also try Mecab, it can use IPA dictionary and can convert kanjis to katakana for most of the words
https://taku910.github.io/mecab/
and there is ruby bindings for that too.
https://taku910.github.io/mecab/bindings.html
and here is somebody tested, ruby with mecab with tagger -Oyomi
http://hirai2.blog129.fc2.com/blog-entry-4.html
just a quick followup to explain the eventual actual solution we used. Thanks to all who recommended mecab--this appears to have done the trick.
We have a mostly-Rails backend, but in our circumstance we didn't need to solve this problem on the backend. For user-entered data, e.g. creating new entities with Japanese names, we modified the UI to require the user to enter the phonetic yomigana in addition to the kanji name. Users seem accustomed to this. The problem was the large corpus of data that is built into the app--hospital, company, and place names, mainly.
So, what we did is:
We converted all the source data (a list of 4000 hospitals with name, address, etc) into .csv format (encoded as UTF-8, of course).
Then, for developer use, we wrote a ruby script that:
Uses mecab to translate the contents of that file into Japanese phonetic readings
(the precise command used was mecab -Oyomi -o seed_hospitals.converted.csv seed_hospitals.csv, which outputs a new file with the kanji replaced by the phonetic equivalent, expressed in full-width katakana).
Standardizes all yomikata into hiragana (because users tend to enter hiragana when manually entering yomikata, and hiragana and katakana sort differently). Ruby makes this easy once you find it: NKF.nkf("-h1 -w", katakana_str) # -h1 means to hiragana, -w means output utf8
Using the awesomely conveninent new Ruby 1.9.2 version of CSV, combine the input file with the mecab-translated file, so that the resulting file now has extra columns inserted, a la NAME, NAME_YOMIGANA, ADDRESS, ADDRESS_YOMIGANA, and so on.
Use the data from the resulting .csv file to seed our rails app with its built-in values.
From time to time the client updates the source data, so we will need to do this whenever that happens.
As far as I can tell, this output is good. My Japanese isn't good enough to be 100% sure, but a few of my Japanese coworkers skimmed it and said it looks all right. I put a slightly obfuscated sample of the converted addresses in this gist so that anybody who cared to read this far can see for themselves.
UPDATE: The results are in... it's pretty good, but not perfect. Still, it looks like it correctly phoneticized 95%+ of the quasi-random addresses in my list.
Many thanks to all who helped me!
Nice to hear people are working with Japanese.
I think you're spot on with your assessment of the problem difficulty. I just asked one of the Japanese guys in my lab, and the way to do it seems to be as you describe:
Take a list of Kanji
Infer (guess) the yomigana
Sort yomigana by gojuon.
The hard part is obviously step two. I have two guys in my lab: 高橋 and 高谷. Naturally, when sorting reports etc. by name they appear nowhere near each other.
EDIT
If you're fluent in Japanese, have a look here: http://mecab.sourceforge.net/
It's a pretty popular tool, so you should be able to find English documentation too (the man page for mecab has English info).
I'm not familiar with MeCab, but I think using MeCab is good idea.
Then, I'll introduce another method.
If your app is written in Microsoft VBA, you can call "GetPhonetic" function. It's easy to use.
see : http://msdn.microsoft.com/en-us/library/aa195745(v=office.11).aspx
Sorting prefectures by its pronunciation is not common. Most Japanese are used to prefectures sorted by 「都道府県コード」.
e.g. 01:北海道, 02:青森県, …, 13:東京都, …, 27:大阪府, …, 47:沖縄県
These codes are defined in "JIS X 0401" or "ISO-3166-2 JP".
see (Wikipedia Japanese) :
http://ja.wikipedia.org/wiki/%E5%85%A8%E5%9B%BD%E5%9C%B0%E6%96%B9%E5%85%AC%E5%85%B1%E5%9B%A3%E4%BD%93%E3%82%B3%E3%83%BC%E3%83%89

Windows Explorer sort method

I'm looking for an algorithm that sorts strings similar to the way files (and folders) are sorted in Windows Explorer. It seems that numeric values in strings are taken into account when sorted which results in something like
name 1, name 2, name 10
instead of
name 1, name 10, name 2
which you get with a regular string comparison.
I was about to start writing this myself but wanted to check if anyone had done this before and was willing to share some code or insights. The way I would approach this would be to add leading zeros to the numeric values in the name before comparing them. This would result in something like
name 00001, name 00010, name 00002
which when sorted with a regular string sort would give me the correct result.
Any ideas?
It's called "natural sort order". Jeff had a pretty extensive blog entry on it a while ago, which describes the difficulties you might overlook and has links to several implementations.
Explorer uses the API StrCmpLogicalW() for this kind of sorting (called 'natural sort order').
You don't need to write your own comparison function, just use the one that already exists.
A good explanation can be found here.
There is StrCmpLogicalW, but it's only available starting with Windows XP and only implemented as Unicode.
Some background information:
http://www.siao2.com/2006/10/01/778990.aspx
The way I understood it, Windows Explorer sorts as per your second example - it's always irritated me hugely that the ordering comes out 1, 10, 2. That's why most apps which write lots of files (like batch apps) always use fixed length filenames with leading 0's or whatever.
Your solution should work, but you'd need to be careful where the numbers were in the filename, and probably only use your approach if they were at the very end.
Have a look at
http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting
for some source code.
I also posted a related question with additional hints and pitfalls:
Sorting strings is much harder than you thought
I posted code (C#) and a description of the algorithm here:
Natural Sort Order in C#
This is a try to implement it in Java:
Java - Sort Strings like Windows Explorer
In short it splits the two Strings to compare in Letter - Digit Parts and compares this parts in a specific way to achieve this kind of sorting.

Resources