Creating a custom markup for a dual language text

I need to create a file that contains two versions of a short story (one in its original language and one translated into English) and shows the correspondence between the two versions sentence by sentence. For example, the markup needs to show that
Esto es el texto de la oracion 1.
corresponds with
This is the text of sentence 1.
What is the best approach to structuring this? I can use an existing markup language like XML or write something new for the purpose. Any ideas would be appreciated.
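One possible structure, sketched below under stated assumptions: wrap each aligned sentence pair in a single element, and tag each side with the standard `xml:lang` attribute. The element names are loosely modeled on TMX (Translation Memory eXchange, an existing XML format for aligned translations), but this is not a valid TMX file because the required header is omitted. The `<story>` root and the `aligned_pairs` helper are hypothetical. The example reads the pairs back with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document, loosely modeled on TMX's <tu>/<tuv>/<seg> elements:
# one <tu> per aligned sentence pair. Not a complete TMX file.
SAMPLE = """\
<story>
  <tu id="1">
    <tuv xml:lang="es"><seg>Esto es el texto de la oracion 1.</seg></tuv>
    <tuv xml:lang="en"><seg>This is the text of sentence 1.</seg></tuv>
  </tu>
</story>
"""


def aligned_pairs(xml_text: str, src: str = "es", tgt: str = "en") -> list[tuple[str, str, str]]:
    """Return (id, source sentence, target sentence) for every aligned unit."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        # Map each language code to the text of that variant's <seg>.
        by_lang = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        pairs.append((tu.get("id"), by_lang.get(src), by_lang.get(tgt)))
    return pairs


for uid, es, en in aligned_pairs(SAMPLE):
    print(f"{uid}: {es} <-> {en}")
```

Keeping both sentences inside one unit makes the correspondence explicit without cross-references. If one original sentence maps to two English sentences, a unit can simply hold a longer `<seg>`; if you need finer-grained many-to-many links, separate sentence lists joined by ID references are the usual alternative.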

Related

How to implement fuzzy search for Chinese pinyin and Japanese romaji?

I have some data in Chinese and Japanese, and I want to make it searchable by its romanization (Pinyin for Chinese, Romaji for Japanese). Assume that the romanizations are already provided, split into syllables.
For example, the text "示例文本" romanizes to ["shi", "li", "wen", "ben"].
Users should be able to match this by typing:
- whole syllables, with or without spaces, e.g. shi li wen ben or shiliwenben
- initials or the first few letters of each syllable, e.g. shlwb or slwb
- only part of the string, e.g. wenben or wb (these examples correspond to the last two syllables of the text above).
Is there an elegant way of implementing this?
(Note: I did not specify a programming language because I want to implement this in several languages. If your answer is language-specific or requires particular libraries, please say so. Thank you!)
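A minimal sketch of the matching rule described above, in Python only because the question is language-neutral. The rule assumed here: after removing spaces, the query must be spelled out by a non-empty prefix of each syllable in some contiguous run of syllables. That single rule covers whole syllables, initials, partial prefixes, and matches that start mid-string. It uses memoized recursion and assumes lowercase ASCII romanization that is already split into syllables; it does not handle typos or tone marks. The function name `matches` is hypothetical.

```python
from functools import lru_cache


def matches(syllables: list[str], query: str) -> bool:
    """True if `query` (spaces ignored) is formed by a non-empty prefix of each
    syllable in some contiguous run of `syllables`."""
    q = query.replace(" ", "").lower()
    if not q:
        return True
    sylls = [s.lower() for s in syllables]

    @lru_cache(maxsize=None)
    def match_from(i: int, j: int) -> bool:
        # i indexes the syllables, j indexes the query.
        if j == len(q):
            return True  # whole query consumed
        if i == len(sylls):
            return False  # ran out of syllables
        syl = sylls[i]
        # Try every non-empty prefix of syllable i that agrees with q at j.
        k = 0
        while k < len(syl) and j + k < len(q) and syl[k] == q[j + k]:
            k += 1
            if match_from(i + 1, j + k):
                return True
        return False

    # The match may start at any syllable (partial-string search).
    return any(match_from(start, 0) for start in range(len(sylls)))


text = ["shi", "li", "wen", "ben"]
for query in ["shi li wen ben", "shiliwenben", "shlwb", "slwb", "wenben", "wb"]:
    print(query, matches(text, query))
```

This scans one record at a time, which is fine for small data. For large collections you would probably pre-filter candidates (for example by an index of syllable initials) before running the exact check. Because spaces are simply dropped, a space typed inside a syllable is tolerated; a stricter variant could treat spaces as forced syllable boundaries.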

botframework v4: how to prevent the Translator Text API from translating the username in a waterfall dialog

I'm using the BotBuilder Samples demos 5.multi-turn-prompt and 17.multilingual-bot, combined into a single project.
How can I prevent the translator from automatically translating the name the user enters into another language?
For example, if I enter my name in Thai, the bot's response should not translate the name into en/es.
I found two ways to do this:
If you're using a language written in the Latin alphabet (such as English), you can check whether the userName appears in the turnContext text and wrap it in <div class="notranslate">USERNAME_HERE</div> so the translator leaves it alone.
You do have to remember to strip that wrapper markup from the translated text before it reaches the user.
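A minimal Python sketch of the string handling in this first approach (the BotBuilder samples themselves are JavaScript, so this is illustrative only). It assumes the translator is called with HTML text type, since Microsoft documents `class="notranslate"` for HTML input, and that the user name contains no HTML special characters. `protect_name` and `unprotect` are hypothetical helpers; the actual translation call is left out.

```python
import re

# Matches the wrapper even if whitespace inside the tag changes.
_WRAPPER = re.compile(r'<div\s+class="notranslate"\s*>(.*?)</div>', re.DOTALL)


def protect_name(text: str, user_name: str) -> str:
    """Wrap every occurrence of user_name so the translator skips it."""
    if not user_name:
        return text
    return text.replace(user_name, f'<div class="notranslate">{user_name}</div>')


def unprotect(text: str) -> str:
    """Remove the notranslate wrapper after translation, keeping the name."""
    return _WRAPPER.sub(r"\1", text)


outgoing = protect_name("Nice to meet you, Somchai!", "Somchai")
# ... send `outgoing` to the translator here, then:
print(unprotect(outgoing))
```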
If you're using a language with its own script (for example, Korean), you have to detect the userName in turnContext.onSendActivities, slice the name out, translate the text before and after it, and then join everything back together.
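The split-translate-rejoin idea can be sketched in Python as follows. `translate` is a placeholder for whatever function sends text to the translator (in the sample it would be called where onSendActivities processes outgoing activities), and `translate_around_name` is a hypothetical helper. It handles several occurrences of the name and skips whitespace-only fragments.

```python
from typing import Callable


def translate_around_name(text: str, user_name: str,
                          translate: Callable[[str], str]) -> str:
    """Translate the text around user_name, leaving the name itself untouched."""
    if not user_name:
        return translate(text)
    # Split on the name, translate each non-blank fragment, then rejoin.
    parts = text.split(user_name)
    translated = [translate(p) if p.strip() else p for p in parts]
    return user_name.join(translated)


# Stand-in "translator" for illustration: upper-cases its input.
print(translate_around_name("hello Somchai, welcome", "Somchai", str.upper))
```

One trade-off: each fragment is translated without the rest of the sentence as context, so translation quality may suffer compared with sending the whole sentence.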

Defining new language grammar rules?

How can I edit a .tagger file in Stanford NLP? I can't open and edit the file to define grammar rules for a new language so that the tagger generates part-of-speech tags.
The .tagger files are serialized statistical models used by a Maximum Entropy based sequence tagger. You can't edit them in any meaningful way.
If you want part-of-speech tags for a new language, you will have to create training data consisting of a large set of sentences in that language, with the correct part-of-speech tag for each word, and then train a new part-of-speech tagging model.
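To make "training data" concrete, here is a small Python sketch that writes hand-tagged sentences as plain text, one sentence per line, with each token written as word, separator, tag. This word_TAG layout is one common way to feed a tagger trainer, but it is an assumption here: check the Stanford tagger's documentation for the `trainFile` format options and make `sep` match its `tagSeparator` setting. The tags shown are English Penn Treebank tags for illustration; a new language would use its own tag set. `to_tagged_lines` is a hypothetical helper.

```python
def to_tagged_lines(sentences: list[list[tuple[str, str]]], sep: str = "_") -> str:
    """Serialize tagged sentences: one sentence per line, tokens as word<sep>TAG."""
    lines = []
    for sent in sentences:
        lines.append(" ".join(f"{word}{sep}{tag}" for word, tag in sent))
    return "\n".join(lines) + "\n"


corpus = [
    [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("test", "NN"), (".", ".")],
]
print(to_tagged_lines(corpus), end="")
```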

Automatic link to a different-language version of a document (Sphinx)

I'm creating a multi-page document in two languages (English and French), possibly with other languages to be added later. The URL of a given document takes the form prefix/en/name.html or prefix/fr/name.html, i.e. only the "en" or "fr" part differs. Is it possible to include some code in the main template (layout.html ... or elsewhere?) that would take the URL of the current (English) document, replace "/en/" with "/fr/", and insert the result as a link to the French version? Something like:
automatically retrieve:
prefix/en/this_document.html
transform into:
French
I essentially found the answer I needed in this post: https://groups.google.com/forum/#!topic/sphinx-users/Xmbs5AbnVKY
Basically, what I do is insert the following:
{{"English version"}}
where needed.
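The URL rewrite the question describes (swap the "en" path segment for "fr") can be sketched in plain Python. Sphinx's `html_context` setting in conf.py passes values to the HTML templates, so a function like this could presumably be exposed to layout.html that way; that wiring is an assumption, not the approach from the linked post. `switch_language` is a hypothetical helper, and it replaces only the first path segment that exactly equals `src`.

```python
def switch_language(url: str, src: str = "en", dst: str = "fr") -> str:
    """Replace the first path segment equal to `src` with `dst`."""
    parts = url.split("/")
    for i, part in enumerate(parts):
        if part == src:
            parts[i] = dst
            return "/".join(parts)
    raise ValueError(f"no '{src}' segment in {url!r}")


print(switch_language("prefix/en/this_document.html"))
```

Matching whole segments, rather than doing a plain string replace of "en", avoids rewriting file names that merely contain those letters.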

How to convert French accented words into codes for Magento 1.8?

I have two versions of my store, English and French, and I am doing the translation from English to French in app/locale/fr_FR/Mage_Page.csv.
I notice that I have to use some codes for certain French characters, such as En-tête de page for tête de page.
So if I have French words like 100% Magasinage sécurisé, how can I convert them into codes like En-tête de page?
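If you really do need codes, the following Python sketch uses the standard library's `html.entities.codepoint2name` table to turn each non-ASCII character into a named HTML entity, falling back to a numeric reference. It assumes the codes Magento expects are HTML entities; the exact codes from the question did not survive. As the answer to this question argues, saving the CSV as UTF-8 is probably the better fix. `to_html_entities` is a hypothetical helper.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with a named HTML entity if one exists,
    otherwise with a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII is left alone
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. é -> &eacute;
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_html_entities("100% Magasinage sécurisé"))
```

If numeric references are acceptable, `text.encode("ascii", "xmlcharrefreplace").decode("ascii")` does the same job in one line (é becomes `&#233;`).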
I think the problem comes from the encoding of your Mage_Page.csv file. On Linux you can see the encoding in the file's properties; on Windows you can check it with Notepad++. The link Richard B. gave should, in my opinion, solve the problem.
I use French on all the Magento websites we produce and have never had a problem like this, so it must come from the encoding of your file.
One last thing: if your file is UTF-8 encoded and still shows the problem, the text was probably imported from a file with a different encoding, and the editor tried but failed to convert the characters (this can happen with text from Excel).
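To follow this advice without an editor, here is a Python sketch that checks whether a file is valid UTF-8 and, if not, re-saves it as UTF-8. It assumes the non-UTF-8 file is Windows-1252 (`cp1252`), a common encoding for text exported from Excel on Windows, but that is a guess: pass the right `fallback` for your file. Back the file up first because it is overwritten; a few bytes are undefined in cp1252, so the fallback decode can still raise `UnicodeDecodeError`. `ensure_utf8` is a hypothetical helper.

```python
from pathlib import Path


def ensure_utf8(path: str, fallback: str = "cp1252") -> bool:
    """Return True if the file is already valid UTF-8. Otherwise decode it with
    `fallback`, rewrite it as UTF-8, and return False."""
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # Assumed source encoding; may raise if the guess is wrong.
        p.write_text(raw.decode(fallback), encoding="utf-8")
        return False
```

Usage: `ensure_utf8("app/locale/fr_FR/Mage_Page.csv")`.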
