Mapping from locale to USB LANGID - internationalization

I am using libusb in a POSIXy environment (specifically FreeBSD but I hope to be fairly portable).
I want to fetch some strings from a USB device using libusb_get_string_descriptor but I'm not sure what value I should use for langid. I am aware of the official list from https://web.archive.org/web/20180829193331/http://www.usb.org/developers/docs/USB_LANGIDs.pdf/ but is there an easy way to map between the result of e.g. setlocale(LC_MESSAGES, NULL) and the LANGIDs in this PDF?
Is this even the correct approach? The sample code I have seen appears to always fetch the first string without worrying about language selection.

I haven't been able to find any mapping like you've described, and I'm not sure anyone has made one, specifically because the language names aren't standard across OSes.
In my (admittedly limited) experience, devices tend to ignore the langid and will always return the same string for a given index. However, I'm in the US and don't have access to equipment designed to support multiple languages, so this may not be true worldwide.
That said, every USB device that supports string descriptors at all is required to report at least one supported langid at string index zero, so you could fetch that first (with langid 0) and use it as a default.
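A minimal sketch of the "read string index zero first" idea, in C. It assumes the raw bytes were obtained with something like libusb_get_string_descriptor(handle, 0, 0, buf, sizeof buf); the parsing and selection below are plain C, so they can be tried without a device. The names parse_langids, choose_langid and the small locale table are hypothetical; the table only illustrates the shape of a mapping and is not a complete or standard one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USB_DT_STRING 0x03 /* bDescriptorType of a string descriptor */

/* Hypothetical, deliberately tiny locale-prefix -> LANGID table.
 * The LANGID values come from the USB-IF list; the table itself is
 * an illustration only. */
struct langid_map { const char *locale_prefix; uint16_t langid; };
static const struct langid_map k_langid_map[] = {
    { "en_US", 0x0409 },
    { "en_GB", 0x0809 },
    { "de_DE", 0x0407 },
    { "fr_FR", 0x040c },
    { "ja_JP", 0x0411 },
};

/* Parse string descriptor zero (bLength, bDescriptorType, then
 * little-endian 16-bit LANGIDs). Returns the number of LANGIDs
 * written to out (at most max_out), or -1 if the buffer is malformed. */
static int parse_langids(const unsigned char *desc, size_t len,
                         uint16_t *out, size_t max_out)
{
    assert(desc != NULL && out != NULL);
    if (len < 2 || desc[0] < 2 || desc[1] != USB_DT_STRING)
        return -1;
    size_t blen = desc[0];
    if (blen > len)
        blen = len; /* the read may have been truncated */
    size_t n = (blen - 2) / 2;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)(desc[2 + 2 * i] | (desc[3 + 2 * i] << 8));
    return (int)n;
}

/* Pick the LANGID matching the locale if the device lists it,
 * otherwise the device's first LANGID. Returns 0 if the device
 * reported none. */
static uint16_t choose_langid(const char *locale,
                              const uint16_t *supported, int n)
{
    if (n <= 0)
        return 0;
    if (locale != NULL) {
        for (size_t i = 0; i < sizeof k_langid_map / sizeof k_langid_map[0]; i++) {
            const char *prefix = k_langid_map[i].locale_prefix;
            if (strncmp(locale, prefix, strlen(prefix)) != 0)
                continue;
            for (int j = 0; j < n; j++)
                if (supported[j] == k_langid_map[i].langid)
                    return supported[j];
        }
    }
    return supported[0];
}
```

The locale argument could be the result of setlocale(LC_MESSAGES, NULL); anything the table does not know (including "C") simply falls back to the device's first LANGID, which matches what most sample code effectively does.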

Related

Steps to develop a multilingual web application

What are the steps to develop a multilingual web application?
Should I store the language texts and resources in a database, or should I use property files or resource files?
I understand that I need to use CurrentCulture with C#, along with CultureFormat etc.
I wanted to know your opinions on the steps to build a multilingual web application.
Doesn't have to be language specific. I'm just looking for steps to build this.

The specific mechanisms are different depending on the platform you are developing on.
As a cursory set of work items:
Separation of code from content. Generally, resources are compiled into assemblies with the help of resource files (in dot net) or stored in property files (in java, though there are other options), or some other location, and referred to by ID. If you want localization costs to be reasonable, you need to avoid changes to the IDs between releases, as most localization tools will treat new IDs as new content.
Identification of areas in the application which make assumptions about the locale of the user, especially date/time, currency, number formatting or input.
Create some mechanism for locale-specific CSS content; not all fonts work for all languages, and not all font-sizes are sane for all languages. Don't paint yourself into a corner of forcing Thai text to be displayed in 8 pt. Also, text directionality is going to be right-to-left for at least two languages.
Design your page content to reflow or resize reasonably when more or less content than you expect is present. Many languages expand 50-80% from English for short strings, and 30-40% for longer pieces of content (that's a rough rule of thumb, not a law).
Identify cultural presumptions made by your UI designers, and try to make them more neutral, or, if you've got money and sanity to burn, localizable. Mailboxes don't look the same everywhere, hand gestures aren't universal, and something that's cute or clever or relies on a visual pun won't necessarily travel well.
Choose appropriate encodings for your supported languages. It's now reasonable to use UTF-8 for all content that's sent to web browsers, regardless of language.
Choose appropriate collation for your databases, or enable alternate collations, if you are dealing with content in multiple languages in your databases. Case-insensitivity works differently in many languages than it does in English, and accent insensitivity is acceptable in some languages and generally inappropriate in others.
Don't assume words are delimited by spaces or that sentences are delimited by punctuation, if you're trying to support search.
Avoid:
Storing localized content in databases, unless there's a really, really, good reason. And then, think again. If you have content that is somewhat dynamic and representatives of each region need to customize it, it may be reasonable to store certain categories of content with an associated locale ID.
Trying to be clever with string concatenation. Also, try not to assume rules about pluralization or counting work the same for every culture. Make sure, at least, that the order of strings (and controls) can be specified with format strings that are typical of your platform, or well documented in your localization kit if you elect to roll your own for some reason.
Presuming that it's ok for code bugs to be fixed by localizers. That's generally not reasonable, at least if you want to deliver your product within a reasonable time at a reasonable cost; it's sometimes not even possible.
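To illustrate the point about letting format strings control word order instead of concatenating: a small C sketch using POSIX positional conversions (%1$d, %2$s), which snprintf supports on POSIX systems. The function name format_file_count and both templates are made up for the example, and it does not attempt to solve pluralization.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a "<count> files in <folder>" message from a template that
 * would normally come from a per-locale resource file. Argument 1 is
 * the count and argument 2 is the folder name; the template decides
 * the order in which they appear. */
static int format_file_count(char *out, size_t out_size, const char *tmpl,
                             int count, const char *folder)
{
    assert(out != NULL && tmpl != NULL && folder != NULL);
    return snprintf(out, out_size, tmpl, count, folder);
}

/* Hypothetical templates a localizer might supply. */
static const char k_tmpl_en[] = "%1$d files in %2$s";
static const char k_tmpl_de[] = "%2$s: %1$d Dateien";
```

Calling format_file_count(buf, sizeof buf, k_tmpl_de, 3, "Downloads") puts the folder name first without any change to the calling code, which is the property that concatenation cannot give you.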

The first step is to internationalize. The second step is to localize. The third step is to translate.

Portable keycodes in X11?

I want to get mapping-independent key codes, but documentation says that "keycode" in XKeyEvent structure depends on hardware and driver and I can't rely on it. How can I get some portable key codes like VK_* in Windows?

You want key syms, not key codes.
See XKeycodeToKeysym() and /usr/include/X11/keysymdef.h
To be strictly correct (especially with internationalization) you need a whole bunch of code along the lines of http://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkkeys-x11.c
However, if you're using raw Xlib instead of a toolkit you probably don't care about this kind of thing (if you do you're in for years of work), and XKeycodeToKeysym() is good enough for US keyboards.

Localization best practices

I'm starting to modify my app, which uses all hardcoded strings for errors, GUI, etc. I'm considering these two approaches, but let me know if there is an even better way:
-Put all strings in resource (.rc) files.
-Define all strings in a file, once for each language. Use a preprocessor define to decide which strings get compiled in.
Which of these two approaches is generally preferred?

Put all the strings in resource files. Once you've done that, there are several good translation packages available. One useful thing these packages do is allow you to get translation done by somebody who doesn't program.
Remember, also, that internationalization (i18n) is a large subject, and there's a lot of things to consider. It isn't just a matter of translating strings. Do a web search on it, at the very least. You might want to read a book on it: I used International Programming for Windows by Schmitt as a guide. It's an old book from Microsoft Press, and I had to get it through a used book service; most of the more modern stuff seems to be on internationalizing .NET apps.
Without knowing more about your project (what sort of software, who the intended audience is, what sort of organization you have, what sort of budget, why you're interested in internationalization, etc.), this is about the most I can tell you.

Generally you see locale specific resource files containing strings referenced by key. Compiling different versions for different locales is a very rigid solution and will be a maintenance nightmare. Using resource files also allows the user to have fallback locales.
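A rough sketch of what "strings referenced by key, with fallback locales" can look like, written in C with an in-memory table standing in for real resource files. Everything here (res_entry, lookup_string, the sample strings) is hypothetical; a real application would load the table from its resource format of choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct res_entry { const char *locale; const char *key; const char *text; };

/* Stand-in for per-locale resource files; "" is the default locale. */
static const struct res_entry k_resources[] = {
    { "",      "greeting", "Hello" },
    { "",      "farewell", "Goodbye" },
    { "de",    "greeting", "Hallo" },
    { "de",    "farewell", "Auf Wiedersehen" },
    { "de_AT", "greeting", "Servus" },
};

/* Look up key in exactly the locale named by the first locale_len
 * characters of locale. Returns NULL if there is no such entry. */
static const char *find_exact(const char *locale, size_t locale_len,
                              const char *key)
{
    for (size_t i = 0; i < sizeof k_resources / sizeof k_resources[0]; i++) {
        const struct res_entry *e = &k_resources[i];
        if (strlen(e->locale) == locale_len &&
            strncmp(e->locale, locale, locale_len) == 0 &&
            strcmp(e->key, key) == 0)
            return e->text;
    }
    return NULL;
}

/* Fallback chain: "de_AT" -> "de" -> default. If nothing matches,
 * the key itself is returned so the gap is visible in the UI. */
static const char *lookup_string(const char *locale, const char *key)
{
    assert(locale != NULL && key != NULL);
    size_t full = strcspn(locale, ".@");  /* drop ".UTF-8", "@euro" */
    size_t lang = strcspn(locale, "_.@"); /* language part only */
    const char *text = find_exact(locale, full, key);
    if (text == NULL && lang < full)
        text = find_exact(locale, lang, key);
    if (text == NULL)
        text = find_exact("", 0, key);
    return text != NULL ? text : key;
}
```

With this table, a de_AT user gets "Servus" for the greeting but the generic German farewell, and a locale with no resources at all still gets the default strings. A per-locale compiled build cannot do this kind of partial fallback.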

There's another approach: just put the strings in the source, wrapped in something like tr(" "), and use one of the tools that extracts them and converts them.
It works with any toolkit/GUI library.
You can mark text to be converted and text not to change (such as protocol strings or db keys).
It makes the source easier to read and search, instead of having to look up what IDS_MESSAGE34 means.
One problem with resource files, at least with Windows/MFC, is that you can't use the stringtable in dialogs. So you have some text in the stringtable and some in the dialog section, which you have to deal with separately.

ANSI or OEM Codepage when using MME and DirectMusic?

I noticed that when reading MIDI port names from MME, the names are multi-byte strings encoded using the ANSI Codepage, which my app uses by default. When receiving those names from the DirectMusic driver, the names are wide-character strings encoded with the OEM Codepage. See this article by Raymond Chen for a quick refresher on Codepages.
On my German system, this means that when using the current codepage, which turns out to be the ANSI one, I get "Audiogerät" from MME, and "Audiogeröt" from DirectMusic, the latter being wrong. This gets fixed when I treat that last name as OEM-encoded instead.
So how do I know with which codepage to decode those names? Why does the name coming from DirectMusic get encoded differently? Does it come from the USB driver? The COM framework? DirectMusic? How can I know for sure which codepage to use when reading the names of my MIDI ports?
For info:
I use the MultiByteToWideChar() and WideCharToMultiByte() functions to perform the conversions, with CP_ACP and CP_OEMCP as argument for the codepage to use.
I use midiInGetDeviceCaps() to get MIDI port information from the MME subsystem...
... and convert MIDIINCAPS.szPname using the CP_ACP (ANSI) codepage.
I use IDirectMusic8::EnumPort() to get port information from DirectMusic...
... and convert DMUS_PORTCAPS.wszDescription using the CP_OEMCP codepage.

I don't know for sure why the DirectMusic framework would use one set of codepages and MME another, but the solution on your end is probably to build an abstraction layer and then make specific implementations for each API. That way, the higher levels of your software don't need to concern themselves with details like this.
That said, the endpoint names definitely come from the OS. USB MIDI devices specify only endpoint types (i.e., either input or output, and the number), but the OS is free to interpret them as it sees fit, which is why they are localized.
There is not a specific API call (as far as I know) to find out which codepage the framework will deliver its strings in. However, DirectMusic does seem to use double wide characters with OEM codepage as a general convention, though I could not find this clearly stated in any of the MSDN docs. In the MSDN DirectMusic documentation about MIDI port capability structures, the description type clearly is defined as a WCHAR, and the Game Audio Programming book seems to also indicate that this type is an API-wide convention. While it's dangerous to assume that OEM is the default encoding for these chars, I can't find anything that says otherwise (and googling for "DirectMusic codepage" now lists this page as the top hit).
Edit: Check out this stackoverflow question on determining the current OS codepage. It is possible that the DirectMusic API sets the codepage in this manner.

There isn't really an automatic way to tell what codepage is used for these types of data. See here: How can I detect the encoding/codepage of a text file

Win32 File Name Comparison

Does anyone know what culture settings Win32 uses when dealing with case-insensitive files names?
Is this something that varies based on the user's culture, or are the casing rules that Win32 uses culture invariant?

An approximate answer is at Comparing Unicode file names the right way.
Basically, the recommendation is to uppercase both strings (using CharUpper, CharUpperBuff, or LCMapString), then compare using a binary comparison (i.e. memcmp or wmemcmp, not CompareString with an invariant locale). The file system doesn't do Unicode normalization, and the case rules are not dependent on locale settings.
There are unfortunate ambiguous cases when dealing with characters whose casing rules have changed across different versions of Unicode, but it's about as good as you can do.
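The "uppercase both, then compare binary" recipe can be sketched in portable C as follows. Note the big assumption: towupper is used here only as a stand-in for CharUpperBuffW/LCMapString so the example compiles anywhere. It follows the C library's current locale and does not use the same case table as the Windows file system, so on Windows you would call the Win32 functions instead. The function name filenames_equal_ignore_case is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Compare two file names by upper-casing each code unit and comparing
 * the results for exact equality. Because the upper-casing is one code
 * unit in, one code unit out, this is equivalent to upper-casing copies
 * of both strings and running wmemcmp on them.
 * No Unicode normalization is applied, mirroring the file system. */
static bool filenames_equal_ignore_case(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    size_t len = wcslen(a);
    if (wcslen(b) != len)
        return false;
    for (size_t i = 0; i < len; i++) {
        /* Stand-in for the Windows upper-casing call (see note above). */
        if (towupper((wint_t)a[i]) != towupper((wint_t)b[i]))
            return false;
    }
    return true;
}
```

For example, filenames_equal_ignore_case(L"README.TXT", L"readme.txt") is true, while names of different lengths are never equal because no normalization or expansion is performed.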

Comparing file names in native code and Don't compare filenames are a couple of good blog posts on this topic. The first has C/C++ code for OrdinalIgnoreCaseCompareStrings, and the second tells you how that doesn't always work for filenames and what to do to mitigate that.
Then there are the Unicode problems. While these new OrdinalIgnoreCase string comparison algorithms are great for your local NTFS drive, they might not yield the right answer on your FAT drive, or a network share.
So what's the answer? When possible, let the file system tell you. CreateFile can tell you if a given filename exists. Just pick the right creation disposition. If you need to compare two handles, you can often use GetFileInformationByHandle; look at dwVolumeSerialNumber/nFileIndexHigh/nFileIndexLow.

If you're using .NET, the official recommendation from Microsoft is to use StringComparison.OrdinalIgnoreCase for comparison and ToUpperInvariant for normalization (to be later compared using Ordinal comparison). This also applies to Registry keys and values, environment variables etc.
See New Recommendations for Using Strings in Microsoft .NET 2.0 for more details.
Note that while it's reliable on NTFS, it can fail with network shares, for example. See @SteveSteiner's answer and the links in his post for solutions.

According to Windows Driver Samples FastFAT and CDFS, it uses RtlUpcaseUnicodeString to convert a string to uppercase. According to a brief look in Ghidra, that uses an internal function named NLS_UPCASE, whose behavior is based on your current system codepage.
