I have a situation where I need to convert between ISO 3166 country codes.
For example, using the ISO 3166 standard for country codes, IOT is the alpha-3 code for British Indian Ocean Territory and 086 is its numeric equivalent.
Another example would be the ISO 4217 standard for currency codes: 'UZS' is the alpha code for the Uzbekistani som and 860 is its numeric equivalent.
You can find machine-processable lists of ISO 3166 country codes in a few places, e.g.:
in plain text format: http://download.geonames.org/export/dump/countryInfo.txt
in JSON: https://github.com/mledoze/countries (check the file countries.json, which contains much more than just country codes; the README describes its structure).
See also Full list of ISO ALPHA-2 and ISO ALPHA-3 country codes on GIS Stack Exchange.
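Once such a list is loaded, the conversion itself is a two-way dictionary lookup. Here is a minimal sketch in Python, assuming a tiny hand-typed subset of ISO 3166 entries (a real table would be built from one of the files above); the function names are made up for illustration:

```python
# Illustrative subset only; a real table would be loaded from a full ISO 3166 list.
COUNTRY_ALPHA3_TO_NUMERIC = {
    "IOT": "086",  # British Indian Ocean Territory
    "USA": "840",
    "FRA": "250",
    "GBR": "826",
    "UZB": "860",
}
# Reverse mapping, built once from the forward table.
COUNTRY_NUMERIC_TO_ALPHA3 = {
    numeric: alpha3 for alpha3, numeric in COUNTRY_ALPHA3_TO_NUMERIC.items()
}


def country_alpha3_to_numeric(alpha3):
    """Return the zero-padded three-digit numeric code for an alpha-3 code."""
    return COUNTRY_ALPHA3_TO_NUMERIC[alpha3.upper()]


def country_numeric_to_alpha3(numeric):
    """Accept 86, "86" or "086" and return the alpha-3 code."""
    key = f"{int(numeric):03d}"  # numeric codes keep their leading zeros
    return COUNTRY_NUMERIC_TO_ALPHA3[key]


print(country_alpha3_to_numeric("IOT"))
print(country_numeric_to_alpha3(86))
```

Numeric codes are kept as strings so that the leading zero in 086 survives. The same pattern would work for ISO 4217 currency codes with a second pair of tables.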
I want to get an ISO 639-1 language string from an LCID. The problem is that 2052 (Simplified Chinese) and 1028 (Traditional Chinese) both return zh (Chinese) instead of zh-CN and zh-TW.
The code I use is
WCHAR locale[8];
GetLocaleInfoW(lcid, LOCALE_SISO639LANGNAME, locale, 8);
Is there a way to get the right code?
ISO 639-1 specifies 2-letter language names, so GetLocaleInfo() correctly returns "zh" for both Simplified and Traditional Chinese - they are not differentiated in the ISO 639-1 spec.
A call with LOCALE_SNAME instead always returns a string that also contains the sub-tag, e.g. "de-DE" or "de-AT".
Anything else, for example a 2-letter tag for "most" languages and a 5-character tag (xx-YY) for some "exceptions" (like Chinese, and which other ones?), is a custom scheme and would therefore require custom code.
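To illustrate what such custom code might look like, here is a small sketch in Python (chosen only for brevity; the same logic ports directly to C). It assumes the input is a locale name of the kind LOCALE_SNAME returns, and the exception set and function name are hypothetical choices, not part of any Windows API:

```python
# Languages for which the region sub-tag should be kept (an assumed, custom list).
KEEP_REGION_FOR = {"zh"}


def short_language_tag(locale_name):
    """Reduce e.g. "de-DE" to "de", but keep "zh-CN" and "zh-TW" apart."""
    parts = locale_name.split("-")
    language = parts[0].lower()
    # A region sub-tag is taken to be the first two-letter part after the language.
    region = next((p for p in parts[1:] if len(p) == 2 and p.isalpha()), None)
    if language in KEEP_REGION_FOR and region is not None:
        return f"{language}-{region.upper()}"
    return language


print(short_language_tag("de-AT"))
print(short_language_tag("zh-TW"))
```

Which languages belong in the exception set is an application decision; the sketch only shows where that decision would live.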
I switched from the old iText library to the iTextPdf library and noticed a problem. The new library sets the producer to a value that includes non-Unicode characters (the registered-trademark symbol ® and the copyright symbol ©). The problem is that validation programs that read this text choke on these characters.
Can I get iText to fix this (w/o paying for a license)? I am ok with iText getting credit. I just want the credits to be Unicode clean.
<</Producer(iText® 5.5.0 ©2000-2013 iText Group NV \(AGPL-version\))/ModDate(D:20150126155550-07'00')/CreationDate(D:20150126155550-07'00')>>
You are looking at the document information dictionary of a PDF, more exactly at the value of its Producer entry. It is specified as:
Producer text string (Optional) If the document was converted to PDF from another format, the name of the conforming product that converted it to PDF.
(Table 317 – Entries in the document information dictionary)
So the value must have the type text string. This in turn is specified as:
The text string type shall be used for character strings that shall be encoded in either PDFDocEncoding or the UTF-16BE Unicode character encoding scheme. PDFDocEncoding can encode all of the ISO Latin 1 character set and is documented in Annex D.
(section 7.9.2.2 Text String Type)
In Annex D you find:
CHAR  NAME         CHAR CODE (OCTAL)
                   STD   MAC   WIN   PDF
...
©     copyright    —     251   251   251
...
®     registered   —     250   256   256
...
(D.2 Latin Character Set and Encodings)
Thus, these characters are completely valid here and validators which choke on these characters are broken.
So you had better report this bug to the developers of the validators in question.
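As a quick cross-check of the octal codes quoted from Annex D, the following Python sketch converts them to bytes and decodes them. Python has no PDFDocEncoding codec, so this assumes Latin-1 as a stand-in for the PDF column at these two code points, cp1252 for the WIN column and mac_roman for the MAC column; the helper name is made up for illustration:

```python
def octal_code_to_byte(octal_code):
    """Turn an octal code from the Annex D table into a single byte."""
    return bytes([int(octal_code, 8)])


# PDF column: 251 and 256 (octal). Latin-1 is assumed to match at these positions.
assert octal_code_to_byte("251").decode("latin-1") == "©"
assert octal_code_to_byte("256").decode("latin-1") == "®"

# WIN column uses the same codes; MAC column uses 251 and 250.
assert octal_code_to_byte("256").decode("cp1252") == "®"
assert octal_code_to_byte("250").decode("mac_roman") == "®"

# The Producer value from the question therefore fits into single-byte text.
producer = "iText® 5.5.0 ©2000-2013 iText Group NV (AGPL-version)"
encoded = producer.encode("latin-1")
print(len(encoded) == len(producer))
```

A validator that reads the string as PDFDocEncoding gets both symbols back intact; one that assumes ASCII or UTF-8 does not.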
I am new to Altova StyleVision. I need to reformat a date from dd/mm/yy to dd/Mon/yyyy.
I have tried the options suggested in the Altova manual but it does not seem to recognize the format.
This question is almost three weeks old, so maybe you have found an answer in the meantime, for instance by looking it up on page 781 of XSLT 2.0 and XPath 2.0 Programmer's Reference by Michael Kay, or somewhere else on the web.
To achieve what you want in XSLT 2.0, we first need to convert your date-like string into international date-time notation. Luckily, we only need to care about the date-part, which should take the form YYYY-MM-DD.
After that, we "just" need to call format-date with the proper picture string:
<!-- first, convert your date into int'l date-notation with a regex -->
<xsl:variable
    name="date"
    select="replace('14/09/2014', '(\d+)/(\d+)/(\d+)', '$3-$2-$1')" />

<!-- then, use a properly formatted picture string to get the abbrev. month -->
<xsl:value-of
    select="format-date(xs:date($date), '[D01]/[MNn,3-3]/[Y]')" />
The output with conforming processors like Saxon or Exselt (didn't try Altova) is this: 14/Sep/2014.
The picture string works as follows (from that same book I quoted):
[xxxx] is a variable marker
[D01] formats the day-part as a two-digit day (leave out the zero if you don't want leading zeroes)
[MNn,3-3] formats the month (M) as a title-case name (Nn) with a minimum and a maximum width of three characters (3-3).
[Y] formats the year in the default format as a four-digit year.
If you need the full name of the month, remove the width-specifier. If you need some other output, check out the table in the XPath 3.0 specification for what you can use in the picture string.
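For readers who want to check the two steps outside an XSLT processor, here is a rough Python analogue. It is only an illustration of the same idea (regex reorder, then format with an abbreviated month), not something StyleVision runs; the month table is hard-coded to avoid depending on the system locale, and the function name is hypothetical:

```python
import re
from datetime import date

MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def reformat_date(text):
    """Convert dd/mm/yyyy into dd/Mon/yyyy."""
    # Step 1: reorder into ISO notation, like the replace() call above.
    iso_text = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\2-\1", text)
    # Step 2: parse as a real date (the xs:date cast), then format it.
    parsed = date.fromisoformat(iso_text)
    month = MONTH_ABBREVIATIONS[parsed.month - 1]
    return f"{parsed.day:02d}/{month}/{parsed.year}"


print(reformat_date("14/09/2014"))  # 14/Sep/2014
```

As with the xs:date cast, the parse step rejects input that is not a valid calendar date once reordered.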
Through the REST API of an application, I receive language codes of the following form: ll-Xxxx.
a two-letter lowercase language code (looks like ISO 639-1),
a dash,
a code going up to four letters, starting with an uppercase letter (looks like an ISO 639-3 macrolanguage code).
Some examples:
az-Arab Azerbaijani in the Arabic script
az-Cyrl Azerbaijani in the Cyrillic script
az-Latn Azerbaijani in the Latin script
sr-Cyrl Serbian in the Cyrillic script
sr-Latn Serbian in the Latin script
uz-Cyrl Uzbek in the Cyrillic script
uz-Latn Uzbek in the Latin script
zh-Hans Chinese in the simplified script
zh-Hant Chinese in the traditional script
From what I found online:
[ISO 639-1] is the first part of the ISO 639 series of international standards for language codes. Part 1 covers the registration of two-letter codes.
and
ISO 639-3 is an international standard for language codes. In defining some of its language codes, some are defined as macrolanguages [...]
Now I need to write a piece of code to verify that I receive a valid language code.
But since what I receive appears to be a mix of ISO 639-1 (two-letter language) and ISO 639-3 (macrolanguage) codes, which standard am I supposed to follow? Do these codes belong to some sort of mixed (perhaps common) standard?
The current reference for identifying languages is IETF BCP 47, which combines IETF RFC 5646 and RFC 4647.
Codes of the form ll-Xxxx combine an ISO 639-1 language code (two letters) and an ISO 15924 script code (four letters). BCP 47 recommends that language codes be written in lower case and that script codes be written "lowercase with the initial letter capitalized", but this is basically for readability.
BCP 47 also recommends that the language code should be the shortest available ISO 639 tag. So if a language is represented in both ISO 639-1 (two letters) and ISO 639-3 (three letters), then you should use the ISO 639-1 code.
According to RFC 5646 (page 4), a language tag can be written in the following form: [language]-[script].
language (2 or 3 letters) is the shortest ISO 639 code
script (4 letters) is an ISO 15924 code (see also RFC section)
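A syntax-only check for tags of this [language]-[script] shape could be sketched as follows in Python. It assumes only these two subtags are expected (regions, variants and extensions are deliberately rejected), it does not look anything up in the ISO 639 or ISO 15924 registries, and the function name is made up for illustration:

```python
import re

# language: 2 or 3 letters; optional script: exactly 4 letters.
# BCP 47 tags are case-insensitive, so both cases are accepted on input.
TAG_PATTERN = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?")


def normalize_language_tag(tag):
    """Return the tag in recommended casing, or None if the shape is wrong."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    language = match.group(1).lower()
    script = match.group(2)
    if script is None:
        return language
    # Script codes are conventionally written with an initial capital.
    return f"{language}-{script.capitalize()}"


print(normalize_language_tag("zh-Hans"))
print(normalize_language_tag("SR-LATN"))
```

To verify that a well-formed tag is also registered, the subtags would additionally have to be checked against the IANA Language Subtag Registry.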
What is the globally accepted way of displaying international currencies?
For example: US$20, $20, $20 (US), €20, 20€, etc?
If there are many ways to show each currency, what is a good general way of showing currency?
I didn't find any single way. That said:
Show the amount (obviously)
Show the ISO currency code
Optionally show a user-friendly symbol
Don't rely on $ or £ -- several currencies use these symbols. ISO currency codes make it unambiguous. I usually do:
[user-friendly-symbol][amount] [iso code]
For example, $100 USD or €2,000,000 EUR
For the thousand separator, I usually take the local user's preference, rather than trying to figure out if that currency is generally formatted with , or .
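The [user-friendly-symbol][amount] [iso code] layout can be sketched in a few lines of Python. The symbol table is a small assumed subset, the separators are passed in as the user's preference, and the function name is hypothetical:

```python
from decimal import Decimal

# Symbols for a few common currencies only (an assumed subset);
# any other currency is shown with its ISO code alone.
SYMBOLS = {"USD": "$", "EUR": "€", "GBP": "£", "JPY": "¥"}


def format_money(amount, iso_code, thousands_sep=",", decimal_sep="."):
    """Format as [symbol][amount] [ISO code] using the user's separators."""
    grouped = f"{amount:,}"  # e.g. 2000000 -> "2,000,000"
    # Swap in the preferred separators, using a placeholder to avoid clashes.
    grouped = (grouped.replace(",", "\0")
                      .replace(".", decimal_sep)
                      .replace("\0", thousands_sep))
    symbol = SYMBOLS.get(iso_code, "")
    return f"{symbol}{grouped} {iso_code}"


print(format_money(100, "USD"))
print(format_money(Decimal("1000000.00"), "EUR", thousands_sep=" ", decimal_sep=","))
```

Passing a Decimal rather than a float keeps the number of decimal places exactly as entered.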
See ISO 4217
ISO currency codes are the standard, although you might want to special-case certain common currencies (e.g. USD, GBP, JPY, EUR) and display their symbols too.
ISO 4217
This has been a popular issue around here. See if any of these help you out:
Best Practice - Format Multiple Currencies
Proper currency format when not displaying the native currency of a culture
Currency formatting
I think it's generally accepted this is the best way to do it:
USD$30
AUD$40
And these currencies are displayed like this by default:
£20
€20
The ISO 4217 currency code plus the value. So USD20, EUR20.
I do not work in this field, but I believe the amount should be displayed with the 3-letter code after it, like this:
20 EUR
105 GBP
86.4 USD
So many countries use dollars, francs (France excepted, since it uses euros now) and so on that a symbol alone is ambiguous.
There is still the problem of the separators:
1,000,000.00 USD but
1 000 000,00 EUR here in France