Format phone number in freemarker - freemarker

I have a 10 digit number , which needs to be displayed as formatted phone number.
Eg: 1234567890 needs to be formatted in (123) 456-7890
I have tried formatting it using built-ins for numbers in freemarker, but somehow still not able to get it in expected format.

I don't think that's possible with Java SimpleNumberFormat patterns, which is what FreeMarker uses when you write things like '0.##'. But there are no limitations with custom number formats (see http://freemarker.org/docs/pgui_config_custom_formats.html), like, you could have something like ${n?string.#phone} that can do all kind of Java logic.
However, I would like to note that perhaps there's a problem there in the data model itself. In reality, phone numbers are not numbers, but stings (or even structures). They can have significant characters in them like + (or even #). Not to mention /, in case you have to dial extensions.

Related

standardized international phone number field format as a string

I'd like to store phone numbers as unique user ids in my database/app which will initially roll out in the United States but could expand to other countries eventually.
My question is when storing phone numbers, what's a resilient way to store the number as a string so that I don't have any duplicate numbers from other countries overlap.
My initial thought is to do it this way
+1(212)555-5555
+{countryCode}({areaCode}){{subscriberCode}} *formatted with a hyphen for u.s numbers
Does that seem reasonable or are there any pitfalls to that? Should spaces be used? For instance I can't imagine other countries would use spaces or parenthesis in their subscriber codes... but maybe they do? It would also be nice if it followed the standard output format from ios and android phones' address books.
Here's what I'd say:
Use the plus. It indicates for certain that the country code follows and the number is not in a local format. You could also not store the plus and make an internal decision that all phone numbers will be stored with the country code, thus obviating the need for the plus.
Don't use any formatting in the storage of the number. Formatting is irrelevant when it comes to dialing and it makes searching and comparing more difficult.
Use a gem like phony_rails or phoney to format the number to local conventions when displaying.
So it looks like there is an international standard
http://en.wikipedia.org/wiki/E.164
And a node.js library that can format to that standard
https://github.com/aftership/node-phone

Using ASCII delimiters (29-31) in modern programming

I'm currently building a hash key string (collapsed from a map) where the values that are delimited by the special ASCII unit delimiter 31 (1F).
This nicely solves the problem of trying to guess what ASCII characters won't be used in the string values and I don't need to worry about escaping or quoting values etc.
However reading about the history of this is it appears to be a relic from the 1960s and I haven't seen many examples where strings are built and tokenised using this special character so it all seems too easy.
Are there any issues to using this delimiter in a modern application?
I'm currently doing this in a non-Unicode C++ application, however I'm interested to know how this applies generally in other languages such as Java, C# and with Unicode.
The lower 128 char map of ASCII is fully set in stone into the Unicode standard, this including characters 0->31. The only reason you don't see special ASCII chars in use in strings very often is simply because of human interfacing limitations: they do not visualize well (if at all) when displayed to screen or written to file, and you can't easily type them in from a keyboard either. They're also not allowed in un-escaped form within various popular 'human readable' file formats, such as XML.
For logical processing tasks within a program that do not need end-user interaction, however, they are perfectly suitable for whatever use you can find for them. Your particular use sounds novel and efficient and I think you should definitely run with it.
Your application is free to accept whatever binary format it pleases. However, if you need to embed arbitrary binary data in your input, you need to escape whatever delimiters or other special codes your format uses. This is true regardless of which ones you choose.
I'd also not ignore Unicode. It's 2012, by now it's rather silly to work with an outdated model for dealing with text. If your input data is textual, handle it as such.
The one issue that comes to mind is why invent another format instead of using XML or JSON; or if you need a compact encoding, a "binary" variant of those two (Fast Infoset, msgpack, who knows what else), or ASN.1? There's probably a whole bunch of other issues that you'll encounter when rolling your own that the design and tooling for those formats already solved.
I work with barcodes in a warehouse setting. We use ASCII code 31 as a field-separator so that a single scan can populate multiple data fields with a single scan. So, consider the ramifications if you think your hash key could end up on a barcode.

Localizing spreadsheet cell names

I'm working on an application that has a spreadsheet-like interface. There is a grid of cells. Rows are numbered, and letters are used for the columns. So "names" like A2 and Q17 refer to cells in the grid.
I know I can use GetLocaleInfo(Ex) with LOCALE_SNATIVEDIGITS to get the appropriate digits for the user's locale, so I can format the row numbers. But I don't see something comparable for the locale-appropriate "alphabet".
I could imagine the same question arising for things like word processors that have an outline mode and need to be able to enumerate some list items with letters.
I've been pouring through the Windows NLS APIs, and I don't see anything like LOCALE_SNATIVEALPHABET nor do I see an API like EnumLocaleAlphabet. Am I missing such an API or am I stuck rolling my own?
Personally I haven't heard of such API. The closest to what you are asking would be ICU uchar's UBlockCode but it still won't give you concrete alphabet.
By the way, I don't think it is actually localize cell names unless you localize the whole User Interface. But in such case you may simply ask translators to provide valid cell symbols.
And this probably what you should do, because some writing systems do not have concept of alphabet at all. That is, it is called script, not alphabet. For example, I don't think it would be good idea to use Arabic for cell symbol (which glyph variant in such case?) nor I would use Chinese (all possible ideograms?).
My suggestion is: leave it to translators, if they want to localize it, that is OK, if they don't, just trust them, they really should know their craft.

Can sorting Japanese kanji words be done programmatically?

I've recently discovered, to my astonishment (having never really thought about it before), machine-sorting Japanese proper nouns is apparently not possible.
I work on an application that must allow the user to select a hospital from a 3-menu interface. The first menu is Prefecture, the second is City Name, and the third is Hospital. Each menu should be sorted, as you might expect, so the user can find what they want in the menu.
Let me outline what I have found, as preamble to my question:
The expected sort order for Japanese words is based on their pronunciation. Kanji do not have an inherent order (there are tens of thousands of Kanji in use), but the Japanese phonetic syllabaries do have an order: あ、い、う、え、お、か、き、く、け、こ... and on for the fifty traditional distinct sounds (a few of which are obsolete in modern Japanese). This sort order is called 五十音順 (gojuu on jun , or '50-sound order').
Therefore, Kanji words should be sorted in the same order as they would be if they were written in hiragana. (You can represent any kanji word in phonetic hiragana in Japanese.)
The kicker: there is no canonical way to determine the pronunciation of a given word written in kanji. You never know. Some kanji have ten or more different pronunciations, depending on the word. Many common words are in the dictionary, and I could probably hack together a way to look them up from one of the free dictionary databases, but proper nouns (e.g. hospital names) are not in the dictionary.
So, in my application, I have a list of every prefecture, city, and hospital in Japan. In order to sort these lists, which is a requirement, I need a matching list of each of these names in phonetic form (kana).
I can't come up with anything other than paying somebody fluent in Japanese (I'm only so-so) to manually transcribe them. Before I do so though:
Is it possible that I am totally high on fire, and there actually is some way to do this sorting without creating my own mappings of kanji words to phonetic readings, that I have somehow overlooked?
Is there a publicly available mapping of prefecture/city names, from the government or something? That would reduce the manual mapping I'd need to do to only hospital names.
Does anybody have any other advice on how to approach this problem? Any programming language is fine--I'm working with Ruby on Rails but I would be delighted if I could just write a program that would take the kanji input (say 40,000 proper nouns) and then output the phonetic representations as data that I could import into my Rails app.
宜しくお願いします。
For Data, dig Google's Japanese IME (Mozc) data files here.
https://github.com/google/mozc/tree/master/src/data
There is lots of interesting data there, including IPA dictionaries.
Edit:
And you may also try Mecab, it can use IPA dictionary and can convert kanjis to katakana for most of the words
https://taku910.github.io/mecab/
and there is ruby bindings for that too.
https://taku910.github.io/mecab/bindings.html
and here is somebody tested, ruby with mecab with tagger -Oyomi
http://hirai2.blog129.fc2.com/blog-entry-4.html
just a quick followup to explain the eventual actual solution we used. Thanks to all who recommended mecab--this appears to have done the trick.
We have a mostly-Rails backend, but in our circumstance we didn't need to solve this problem on the backend. For user-entered data, e.g. creating new entities with Japanese names, we modified the UI to require the user to enter the phonetic yomigana in addition to the kanji name. Users seem accustomed to this. The problem was the large corpus of data that is built into the app--hospital, company, and place names, mainly.
So, what we did is:
We converted all the source data (a list of 4000 hospitals with name, address, etc) into .csv format (encoded as UTF-8, of course).
Then, for developer use, we wrote a ruby script that:
Uses mecab to translate the contents of that file into Japanese phonetic readings
(the precise command used was mecab -Oyomi -o seed_hospitals.converted.csv seed_hospitals.csv, which outputs a new file with the kanji replaced by the phonetic equivalent, expressed in full-width katakana).
Standardizes all yomikata into hiragana (because users tend to enter hiragana when manually entering yomikata, and hiragana and katakana sort differently). Ruby makes this easy once you find it: NKF.nkf("-h1 -w", katakana_str) # -h1 means to hiragana, -w means output utf8
Using the awesomely conveninent new Ruby 1.9.2 version of CSV, combine the input file with the mecab-translated file, so that the resulting file now has extra columns inserted, a la NAME, NAME_YOMIGANA, ADDRESS, ADDRESS_YOMIGANA, and so on.
Use the data from the resulting .csv file to seed our rails app with its built-in values.
From time to time the client updates the source data, so we will need to do this whenever that happens.
As far as I can tell, this output is good. My Japanese isn't good enough to be 100% sure, but a few of my Japanese coworkers skimmed it and said it looks all right. I put a slightly obfuscated sample of the converted addresses in this gist so that anybody who cared to read this far can see for themselves.
UPDATE: The results are in... it's pretty good, but not perfect. Still, it looks like it correctly phoneticized 95%+ of the quasi-random addresses in my list.
Many thanks to all who helped me!
Nice to hear people are working with Japanese.
I think you're spot on with your assessment of the problem difficulty. I just asked one of the Japanese guys in my lab, and the way to do it seems to be as you describe:
Take a list of Kanji
Infer (guess) the yomigana
Sort yomigana by gojuon.
The hard part is obviously step two. I have two guys in my lab: 高橋 and 高谷. Naturally, when sorting reports etc. by name they appear nowhere near each other.
EDIT
If you're fluent in Japanese, have a look here: http://mecab.sourceforge.net/
It's a pretty popular tool, so you should be able to find English documentation too (the man page for mecab has English info).
I'm not familiar with MeCab, but I think using MeCab is good idea.
Then, I'll introduce another method.
If your app is written in Microsoft VBA, you can call "GetPhonetic" function. It's easy to use.
see : http://msdn.microsoft.com/en-us/library/aa195745(v=office.11).aspx
Sorting prefectures by its pronunciation is not common. Most Japanese are used to prefectures sorted by 「都道府県コード」.
e.g. 01:北海道, 02:青森県, …, 13:東京都, …, 27:大阪府, …, 47:沖縄県
These codes are defined in "JIS X 0401" or "ISO-3166-2 JP".
see (Wikipedia Japanese) :
http://ja.wikipedia.org/wiki/%E5%85%A8%E5%9B%BD%E5%9C%B0%E6%96%B9%E5%85%AC%E5%85%B1%E5%9B%A3%E4%BD%93%E3%82%B3%E3%83%BC%E3%83%89

Formatting phone numbers

Our customers often fill out "incorrect" formatted phone-numbers. Do anyone know if there is any lib or standard to convert numbers into a more international style?
This is a Swedish example but we have customers around the globe and i don't what to manually handle implementations for everyone.
input often is like this: 0555 11122
and the wanted result is something like this: +46(0)555-11122
I can do the formatting myself but different countries have different variations and systems so a C/Java/C# lib or a standard method to handle this would be great.
Sound like a job for Regular Expressions.
You can do regex conversions in almost any language.
If you are doing this in C#, you would use Regex.Replace() with the proper regex expression for international numbers.
See:
What regular expression will match valid international phone numbers?

Resources