What is a proper way to name message properties in i18n? - internationalization

We have a website which should be translated into different languages. Some of the wording is already in message properties files, ready for translation. I now want to add the rest of the text to these files.
What is a good way to name the text blocks?
<view>.<type>.<name>
We mostly have webpages, and some of the elements/modules are repeated across several pages.

As far as I know, no "standard" exists. Therefore it is pretty hard to tell what is a proper and what is an improper way of naming resource keys. However, based on my experience, I can recommend this approach:
property file name: <module>.properties
resource keys: <view or dialog>[.<sub-context>].<control-type>.<name>
We may discuss whether it is proper to put every string from one module into one property file - it is probably fine if updates don't happen often and there are not too many messages. Otherwise you might think about one file per view.
As for the key naming strategy: it is important for the Translator (sounds like the film with the honorable governor Arnold S., doesn't it?) to have a Context. The translation may actually depend on it, e.g. in Polish you would translate a message one way if it is a page/dialog/whatever title and in a totally different way if it is the text on a button.
One example of such resource key could be:
preferences.password_area.label.username=User name
It gives enough hints to the Translator about what it actually is, which should result in a correct translation...

We have come up with the following key naming convention (Java, btw) using dot notation and camel case:
Label Keys (form labels, page/form/app titles, etc...i.e., not full sentences; used in multiple UI locations):
If the label represents a Java field (i.e., a form field) and matches the form label: label.nameOfField
Else: label.sameAsValue
Examples:
label.firstName = First Name
label.lastName = Last Name
label.applicationTitle = Application Title
label.editADocument = Edit a Document
Content Keys:
projectName.uiPath.messageOrContentType.n.*
Where:
projectName is the short name of the project (or a derived name from the Java package)
uiPath is the UI navigation path to the content key
messageOrContentType (e.g., added, deleted, updated, info, warning, error, title, content, etc.) should be added based on the type of content. Example messages: (1) The page has been updated. (2) There was an error processing your request.
n.* handles the following cases: when there are multiple content areas on a single page (e.g., when the content is separated by an image, etc.), when content is in multiple paragraphs, or when content is in an (un)ordered list - a numeric identifier should be appended. Example: ...content.1, ...content.2
When there are multiple content areas on a page and one or more need to be further broken up (based on the HTML example above), a secondary numeric identifier may be appended to the key. Example: ...content.1.1, ...content.1.2
Examples:
training.mySetup.myInfo.content.1 = This is the first sentence of content 1. This is the second sentence of content 1. This content will be surrounded by paragraph tags.
training.mySetup.myInfo.content.2 = This is the first sentence of content 2. This is the second sentence of content 2. This content will also be surrounded by paragraph tags.
training.mySetup.myInfo.title = My Information
training.mySetup.myInfo.updated = Your personal information has been updated.
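At runtime these keys would typically be read from a resource bundle. Here is a minimal, hedged sketch of what that lookup could look like in plain Java; the bundle base name "messages" is an assumption for the example, and the keys are simply reused from above:

import java.util.Locale;
import java.util.ResourceBundle;

public class MessageLookup {
    public static void main(String[] args) {
        // Loads messages.properties / messages_de.properties / ... from the classpath
        // (the base name "messages" is only an assumption for this sketch).
        ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.GERMAN);

        // Label key: reusable anywhere in the UI, independent of location.
        System.out.println(bundle.getString("label.firstName"));

        // Content key: tied to a specific page via project name and UI path.
        System.out.println(bundle.getString("training.mySetup.myInfo.title"));
    }
}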
Advantages / Disadvantages:
+ Label keys can easily be reused; location is irrelevant.
+ For content keys that are not reused, locating the page on the UI will be simple and logical.
- It may not be clear to translators where label keys reside on the UI. This may be a non-issue for translators who do not navigate the pages, but may still be an issue for developers.
- If content keys must be used in more than one location on the UI (which is highly likely), the key name choice will not make sense in the other location(s). In our case, management is not concerned with a duplication of values for content areas, so we will be using different keys (to demonstrate the location on the UI) in this case.
Feedback on this convention - especially feedback that will improve it - would be much appreciated since we are currently revamping our resource bundles! :)

I'd propose the below convention
functionalcontext.subcontext.key
logicalcontext.subcontext.key
This way you can logically group all the common messages in a super context (id in the example below). There are a few things that aren't specific to any functional context (like lastName, etc.), which you can group into a logical context.
order.id=Order Id
order.submission.submit=Submit Order
name.last=Last Name

The method that I have personally used and liked the most so far is using the sentence to localise as the key itself. For example (please replace T with the right syntax depending on the language):
print(T("Hello world"))
In this case T will search for the key "Hello world". If it is not found, the key itself is returned; otherwise, the value of the key is returned.
This way you do not need to edit the message (in your default language), at least as long as you don't need to use parameters... It saved me a LOT of dev time
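As a rough illustration (not the poster's actual implementation), such a T function boils down to a dictionary lookup that falls back to the key itself. A minimal sketch in Java, where the T name and the in-memory map are assumptions for the example:

import java.util.HashMap;
import java.util.Map;

public class SentenceKeys {
    // Per-language dictionary: the default-language sentence is the key,
    // the translation is the value.
    private static final Map<String, String> DICT = new HashMap<>();
    static {
        DICT.put("Hello world", "Hallo Welt");
    }

    // Returns the translation if present, otherwise the key itself,
    // so untranslated strings still render in the default language.
    static String T(String key) {
        return DICT.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        System.out.println(T("Hello world"));     // "Hallo Welt"
        System.out.println(T("Not translated"));  // falls back to "Not translated"
    }
}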

Related

What is the essential difference between Document and Collection in YAML syntax?

Warning: This question is more philosophical than practical, but I find it fit to be asked and answered in practical contexts (forums like StackOverflow here, instead of the SoftwareEngineering Stack Exchange website), due to the way YAML is actually used de facto and the way its specification has evolved and gained features over time. Let's ask:
As opposed to formats/languages/protocols such as JSON, the YAML format allows you (according to this link, which seems pretty official, or at least an accurate and reliable source for understanding the YAML specification) to embed multiple 'Documents' within one file/stream, using the three-dashes marker ("---").
If so, it's hard to ignore the fact that the concept/model/idea of a 'Document' in YAML is no longer an external definition or "meta" directive that helps the human/parser organize multiple distinct documents alongside each other (similar to the way file systems define the concept of a "file" to organize different files, while each file in itself does not necessarily recognize that it is a file, or that it is part of a file system that wraps it, AFAIK).
Rather, when YAML allows for multi-Document YAML files that gather collections of Documents in a single YAML file (perhaps in a way analogous to the HTTP pipelining approach of the HTTP protocol), the concept/model/idea/goal of a Document receives a new, wider definition de facto, as part of the YAML grammar and its productions, and not just as an assistive concept or format description that helps describe the specification.
If so, with the Document being part of the language itself, what is the added value of this data structure compared to the existing, familiar and well-used good old data structure of a Collection (an array of items)?
I'm asking because I've seen in this link (here) a snippet (in the second example) which describes a YAML sequence that is actually a collection of logs. For some reason, the author of the example chose to present each log as a separate "Document" (separated with three dashes), gathered together in the same YAML stream/file, instead of writing a file that has a "Collection" of logs represented as an array. Why did he choose to do this? Is his choice fitting, correct, ideal?
I can speculate that the added value of the distinction between a Document and a Collection becomes relevant when using more advanced features of the YAML grammar, such as Anchors, Tags and References. I guess every Document provides a guarantee that all these identifiers form a unique set, with no collisions or duplicates among them. Am I right? And if so, is this the only advantage, or are there more justifications for the existence of these two rather similar data structures?
My best view for now is to see a Document as a "meta"-Collection that is stricter and lacks high-level logic, or as two different layers of collection schemes. Is that a correct, accurate way to see it?
And even if I am right, why, in the above example (of the logs document from the link), where no duplications, collisions, identifiers/anchors or compound structures are used, implied or expected at all, does the author still choose to represent the collection's items as separate documents? Is this just a not-so-successful choice of example? Or maybe I'm missing something, and this is a redundancy in the specification, or an evolving syntactic sugar born of practical needs?
Because the example was written on a website that looks serious, with official information written by professionals who dealt with the essence of the language and its definition, theory and philosophy (as opposed to practical uses in the wild), and also in light of the other examples provided there and the care with which they were written, I prefer not to assume that the example is simply imperfect or unfit, and to believe that there may be a good reason to choose to write it this way over another in the specific case shown.
First, let's look at the technical difference between the list of documents in a YAML stream and a YAML sequence (which is a collection of ordered items). For this, I'll discuss YAML tags, which are an advanced feature so I'll provide a quick overview:
YAML nodes can have tags, such as !!str (the official tag for string values) or !dice (a local tag that can be interpreted by your application but is unknown to others). This applies to all nodes: scalars, mappings and sequences. Nodes that do not have such a tag set in the source are assigned the non-specific tag ?, except for quoted scalars, which get ! instead. These non-specific tags are later resolved to specific tags, thereby defining which kind of data structure the node will be deserialized into.
YAML implementations in scripting languages, such as PyYAML, usually only implement resolution by looking at the node's value. For example, a scalar node containing true will become a boolean value, 42 will become an integer, and droggeljug will become a string.
YAML implementations for languages with static types, however, do this differently. For example, assume you deserialize your YAML into a Java class
public class Config {
    String name;
    int count;
}
Assume the YAML is
name: 42
count: five
The 42 will become a String despite the fact that it looks like a number. Likewise, five will generate an error because it is not a number; it won't be deserialized into a string. This means that it is not the content of the node that defines how it will be deserialized, but the path to the node.
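As a concrete (and hedged) illustration of this type-driven resolution, here is roughly how it plays out with a statically typed binding; Jackson's YAML module is my choice for the sketch, not something prescribed by the answer, and other bindings may behave slightly differently:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

public class TagResolutionDemo {
    public static class Config {
        public String name;
        public int count;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper yaml = new ObjectMapper(new YAMLFactory());

        // The scalar 42 is bound to the String field: the target type on the
        // path decides, not what the scalar looks like.
        Config ok = yaml.readValue("name: 42\ncount: 3\n", Config.class);
        System.out.println(ok.name + " / " + ok.count);   // prints: 42 / 3

        // "five" cannot be coerced into an int, so binding fails for this document.
        try {
            yaml.readValue("name: 42\ncount: five\n", Config.class);
        } catch (Exception e) {
            System.out.println("count rejected: " + e.getMessage());
        }
    }
}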
What does this have to do with documents? Well, the YAML spec says:
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.
So, the technical difference is: If you put your data into a single document with a collection at the top, the YAML processor is allowed to take into account the position of the data in the top-level collection when resolving a tag. However, when you put your data in different documents, the YAML processor must not depend on the position of the document in the YAML stream for resolving the tag.
What does this mean in practice? It means that YAML documents are structurally disjoint from one another. Whether a YAML document is valid or not must not depend on any preceding or succeeding documents. Consequently, even when deserialization runs into a semantic problem (such as with the five above) in one document, a following document may still be deserialized successfully.
The goal of this design is to be able to concatenate arbitrary YAML documents together without altering their semantics: A middleware component may, without understanding the semantics of the YAML documents, collect multiple streams together or split up a single stream. As long as they are syntactically correct, stream splitting and merging are sound operations that do not invalidate a YAML document even if another document is structurally invalid.
This design primarily focuses on sending and receiving data over networks. Of course, nowadays YAML is primarily used as a configuration language, which is why this feature is seldom used and of rather little importance.
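To make the "documents are independent" point tangible, here is a small sketch that parses a multi-document stream with SnakeYAML's loadAll; the library choice and the log-like content are mine, not the answer's. Each document is deserialized on its own, and concatenating two such streams still yields a valid stream:

import org.yaml.snakeyaml.Yaml;

public class MultiDocDemo {
    public static void main(String[] args) {
        // Two documents in one stream, each explicitly closed with "...".
        String stream =
                "---\n" +
                "event: login\n" +
                "user: alice\n" +
                "...\n" +
                "---\n" +
                "event: logout\n" +
                "user: alice\n" +
                "...\n";

        // loadAll parses the documents one by one; each iteration yields an
        // independent object (here, one Map per document).
        for (Object doc : new Yaml().loadAll(stream)) {
            System.out.println(doc);
        }
    }
}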
Edit: (Reply to comment)
What about edge cases, like a string-tagged Document that starts with a folded string, making even the following "---" and "..." just characters of the overall string?
That is not the case, see rules l-bare-document and c-forbidden. A line containing un-indented ... not followed by non-whitespace will always end a document if one is open.
Moreover, ... doesn't do anything if no document is open. This ensures that a stream merger can always append ... to a document to ensure that the current document is closed, but no additional one is created.
--- has widely been adopted as a separator between YAML documents (and, perhaps more prominently, between YAML front matter and content in tools like Jekyll) where ... would have been more appropriate, particularly in Jekyll. This gives the false impression that --- should be used by tooling to separate documents, when in reality ... is the syntactic element designed for that use case.

Using XPath to get strings between and inside tags

Super new to XPath, so forgive me if I stumble through terms. I'm using IMPORTXML() in a Google Sheet in order to pull info from a webpage. Basically, what I'm shooting for is to turn this [screenshot of the source page]
into [screenshot of the desired spreadsheet output].
What I can't figure out is how to pull info between the <br> nodes and pull the string from within the <a> node.
I've fumbled my way as far as =IMPORTXML($A$1, "//p/b[starts-with(text(), '"& $A4 &"')]/following-sibling::text()[1]") to get a return of 1 for Casting Time, but not any further.
The end goal is to do this for about a dozen different values across the page and cycle the checks through about 500 web pages, hence the cells in the formula. Any help would be appreciated.
Super in depth clarification section
Using XPath and a Google Sheet I am attempting to automatically make a roll20 formatted template macro for each spell on a spell casters list.
For example, for the Shaman Spell List I used //tr/td[1]/a[@href] and //tr/td[1]/a/@href to create side-by-side columns of spell names and their associated URLs.
Then on another page I can copy and paste the entire class spell list and use VLOOKUP to get the associated URLs while keeping the organized, level-sectioned tables like so (note: the hyperlinked spell names are rich text, so the internal URL is invisible to IMPORTXML, hence the extra step).
With a single class having upwards of 500+ spells, the ultimate goal is to create a series of IMPORTXML calls that look at the spell URL and pull relevant data from this particular section. For this example I'm using Arcane Mark.
The final goal is to use IMPORTXML to get each important category such as School, Casting Time, Target, Effect, Area, Range, etc. Put them in their respective columns and have a Concatenate I've written go through and pull all the various parts into one big formatted string compatible with the roll20 macro template to look like &{template:default} {{Name=Arcane mark}} {{School=Universal}} {{Casting Time=1 Standard Action}} {{Components=V,S}} {{Range=Touch}} {{Effect=One personal rune or mark, all of which must fit within 1 sq. ft.}} {{Duration=Permanent}} {{Saving Throw=None}} {{Spell Resistance=No}}
=ARRAYFORMULA(REGEXEXTRACT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("http://www.d20pfsrd.com/magic/all-spells/a/arcane-mark"),1000,5),
"where Col1 contains 'School'", 0)),,999^99)), A10&"\</b>\ (.+)\;"))

Kofax Separate Main Invoice from Supporting Document without using Separator sheet

When a batch gets created documents should get separated automatically without using separator sheet or Barcode separator.
How can I classify documents as Invoice or supporting document?
In our project we get many invoices with supporting document so the scanning person has to insert the separator sheets manually, so to avoid this we want to automatically classify the supporting documents.
In general the concept would be that you would enable separation in the project and then train your classes with examples to be used for the layout or content classifiers.
However, as I'm sure you've seen, the obstacle with invoices is that they are different enough between vendors that it would not reliably classify all to an Invoice class. Similarly with "Supporting Documents" which are likely to be very different from each other, so unfortunately there isn't a completely easy answer without separator sheets (or barcode stickers affixed to supporting docs).
What you might want to do is write code in one of the separation events, like the Document_AfterSeparate event. Despite the name, the document has not yet been split at this point, but the classifiers have run. See the Scripting Help topic "Server Script Events Sequence > Document Separation > Standard Document Separation" for more detail. Setting the SplitPage property on the CDocPage (pXDoc.CDoc.Pages.ItemByIndex(lPage).SplitPage) will allow you to use your own logic to determine which pages to separate.
For example if you know that you will always have single page invoices, you can split on the first page and classify accordingly. Or you can try to search for something that indicates the end of the invoice like "Total" or other characteristics. There is an example of how you can use locators to help separation in the Scripting Help topic "Script Samples > Use Locator Results for Standard Document Separation". The example uses a Barcode Locator, but the same concept works if you wanted to try it with a Format Locator or anything else.
Without separator sheets you will need smart classification software like Kofax Transformation Module (KTM). It's kind of expensive; you will need to verify the cost savings and ROI.

Internationalizing content-heavy pages into message files seems cumbersome in Play2?

Internationalization in Play2 can be done with Messages.get("home.title") and language files. But what about when you internationalize a page full of textual content and not just one specific header or link?
For example, writing a message file for a long page representing e.g. product info:
_First header_
Some paragraphs of text
...
_Tenth header_
Tenth paragraph and more text
Messagefile
a)
product.info = "<many paragraphs of text including headers>"
or splitting one page into html elements
b)
product.info.h1 = "<first header>"
product.info.p1 = "<first para>"
product.info.p2 = "<2nd para>"
To me, neither solution sounds right. In the first, having a vast value for a single key seems like bad convention, and in the latter, separating a single page into dozens of keys doesn't sound good either.
Big websites often follow the convention www.site.com/en-us/product/1 of having the language in the URL. So the question is: how do I do it this way, and is doing it this way a better approach at all? I could easily end up not just translating into a dozen languages but also making layout changes a dozen times over.
I could use global code snippets backed by the message file for elements that have little text and don't change often, e.g. navigation (/view/global/header/somenavbar.scala.html), but then I only end up with a complex folder structure.
Is there another way, a best practice, for internationalization in Play 2 other than the message file?
Take a look at Joscha Feth's solution in the play_authenticate Java sample.
There are templates for emails in 3 languages for email confirmation, password resetting, etc.
The template for each 'type' of email and each language is kept in a single file, i.e.:
_password_reset_en.scala.html
_password_reset_de.scala.html
_password_reset_pl.scala.html
_verify_email_en... etc
And for each 'type' there is a 'parent' template, which contains a condition (a common Scala match; check the Tags section of the template docs) that returns the rendered view depending on the detected language:
password_reset.scala.html
Finally, yes, at the beginning I also thought it was some kind of madness, but believe me, that technique can be useful. There's room for further improvement, I think. Maybe it would be better to move the language conditioning to the controller; I think that depends on many factors, and it would be great if you find the time to investigate this topic.

Best practice for key values in translation files

In general, translation methods take a key > value mapping and use the key to transform that into a value. Now I recognize two different methods of naming translation keys, and within my team we cannot come to a consensus on which seems to be the best method.
Method 1:
Use full English words or sentences:
Name => Name
Please enter your email address => Please enter your email address
Method 2:
Use keywords describing the situation:
NAME => Name
ENTER_EMAIL => Please enter your email address
I personally prefer method #1 because it directly shows the meaning of the message. If the translation is not present, you can fall back to the key, and this doesn't cause any problems. However, the method is cumbersome when a translation changes frequently, because all the files need to be updated. Also, for longer texts these keys become very large. This is solved by using keys like ENTER_EMAIL, but then the phrasing is completely out of context. The list of abstract translation keys would be huge, you need metadata for all the keys explaining their usage, and collisions can occur much more easily.
Is there a best of both worlds, or a third method? How do you use translation keys in your application? In our case it is a PHP-based web application, but I think the above problem is generic enough to talk about i18n in general.
This is a question that is also faced by iOS/OSX developers. And for them there is even a standard tool called genstrings which assumes method 1. But of course Apple developers don't have to use this tool--I don't.
While the safety net that you get with method 1 is nice (i.e. you can display the key if you somehow forgot to localize a string), it has the downside that it can lead to conflicting keys. Sometimes one identical piece of display text needs to be localized in two different ways, due to grammar rules or differences in context. For instance, the French translation for "E-mail" would be "E-mail" if it's a dialog title and "Envoyer un e-mail" if it's a button (in French the word "E-mail" is only a noun and can't be used as a verb, unlike in English where it's both a noun and a verb). With method 2 you could have keys EMAIL_TITLE and EMAIL_BUTTON to solve this issue, and as a bonus this would give a hint to translators to help them translate correctly.
One more advantage of method 2 is that you can change the English text without having to worry about updating the key in English and in all your localizations.
So I recommend method 2!
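As a toy illustration of that point (the lookup mechanism and key names here are invented for the sketch, not taken from any particular framework): with method 2, one piece of English display text can map to two keys, so the French bundle is free to translate each context differently, and a missing entry can still fall back to the key.

import java.util.Locale;
import java.util.Map;

public class KeyStyleDemo {
    static final Map<String, String> EN = Map.of(
            "EMAIL_TITLE",  "E-mail",
            "EMAIL_BUTTON", "E-mail");
    static final Map<String, String> FR = Map.of(
            "EMAIL_TITLE",  "E-mail",
            "EMAIL_BUTTON", "Envoyer un e-mail");

    static String t(Locale locale, String key) {
        Map<String, String> bundle = "fr".equals(locale.getLanguage()) ? FR : EN;
        // Fall back to the key itself if a translation is missing.
        return bundle.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        System.out.println(t(Locale.FRENCH, "EMAIL_TITLE"));   // E-mail
        System.out.println(t(Locale.FRENCH, "EMAIL_BUTTON"));  // Envoyer un e-mail
    }
}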
Why not use both worlds? I use method #1 for short strings, and method #2 for long strings that are full sentences. I am not afraid to mix both in the same file.
For example, in the following string the text may change if in a new app version the user experience is modified:
"screen description" = "Tap the plus button to add a new item. Tap an item for more options or to edit its details.";
So here it makes sense to apply method #2.
However, for simple strings like in the following example, method #1 is more useful:
"Preferences" = "Preferences";
In general when people try to standardize things it often appears restrictive to me. Personally, I prefer a more "anarchistic" approach where several methods are valid (not only as in this method #1 vs method #2 thread, but also for example when a team of developers fight over coding style).
