I want to identify words like "sooooooooooooooo" and replace them with "so" in Spell Check. How can I achieve this? What do I write (a Filter, etc.), and where do I tweak the code for this?
Thanks!
You could use store_replacement; however, my understanding is that store_replacement has to be implemented by the underlying provider. Aspell implements it, so if you use the Aspell provider you can see it working like this (note that you will need to install Aspell and its dictionaries for this to work):
import enchant
# Get the broker.
b = enchant.Broker()
# Set the ordering on the broker so Aspell gets used first.
b.set_ordering("en_US", "aspell,myspell")
# Print a description of the broker just to see what's available.
print(b.describe())
# Get a US English dictionary.
d = b.request_dict("en_US")
# Print the provider of the US English dictionary.
print(d.provider)
# A test string.
s = 'sooooooooooooooo'
# Check that the word is not in the dictionary (not needed if we already know it isn't).
print(d.check(s))
# Print suggestions for the string before we change anything.
print(d.suggest(s))
# Store a replacement for our string as "so".
d.store_replacement(s, 'so')
# Print the suggestions again and see that "so" now appears at the front of the list.
print(d.suggest(s))
Output:
[<Enchant: Aspell Provider>, <Enchant: Ispell Provider>, <Enchant: Myspell Provider>, <Enchant: Hspell Provider>]
<Enchant: Aspell Provider>
False
['SO', 'so', 'spoor', 'sou', 'sow', 'soy', 'zoo', 'Soho', 'Soto', 'solo', 'soon', 'soot', 'shoo', 'soar', 'sour', 'shoos', 'sooth', 'sooty', 'Si', 'sootier', 'sough', 'SOP', 'sop', 'S', 'poo', 's', 'sooner', 'soothe', 'sorrow', 'Sir', 'Sui', 'sci', 'sir', 'poos', 'silo', 'soap', 'soil', 'soup', 'SA', 'SE', 'SS', 'SW', 'Se', 'soother', 'SOB', 'SOS', 'SOs', 'SRO', 'Soc', 'Sol', 'Son', 'sob', 'soc', 'sod', 'sol', 'son', 'sot', 'boo', 'coo', 'foo', 'goo', 'loo', 'moo', 'ooh', 'too', 'woo', 'CEO', "S's", 'SSA', 'SSE', 'SSS', 'SSW', 'Sue', 'Zoe', 'saw', 'say', 'sea', 'see', 'sew', 'sue', 'xor', 'Snow', 'Sony', 'Sosa', 'boos', 'bozo', 'coos', 'loos', 'moos', 'oohs', 'ooze', 'oozy', 'orzo', 'ouzo', 'sago', 'scow', 'sloe', 'slow', 'snow', 'soak']
['so', 'SO', 'spoor', 'sou', 'sow', 'soy', 'zoo', 'Soho', 'Soto', 'solo', 'soon', 'soot', 'shoo', 'soar', 'sour', 'shoos', 'sooth', 'sooty', 'Si', 'sootier', 'sough', 'SOP', 'sop', 'S', 'poo', 's', 'sooner', 'soothe', 'sorrow', 'Sir', 'Sui', 'sci', 'sir', 'poos', 'silo', 'soap', 'soil', 'soup', 'SA', 'SE', 'SS', 'SW', 'Se', 'soother', 'SOB', 'SOS', 'SOs', 'SRO', 'Soc', 'Sol', 'Son', 'sob', 'soc', 'sod', 'sol', 'son', 'sot', 'boo', 'coo', 'foo', 'goo', 'loo', 'moo', 'ooh', 'too', 'woo', 'CEO', "S's", 'SSA', 'SSE', 'SSS', 'SSW', 'Sue', 'Zoe', 'saw', 'say', 'sea', 'see', 'sew', 'sue', 'xor', 'Snow', 'Sony', 'Sosa']
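store_replacement only helps once you already know the exact stretched spelling. To catch arbitrary elongated words before they reach the spell checker, one option is a small pre-filter that collapses runs of repeated letters and keeps the first candidate the dictionary accepts. The sketch below is hypothetical (collapse_elongated is not part of pyenchant); it takes any check callable, so with the dictionary above you would pass d.check, while here a plain set stands in for the dictionary so the example runs without Aspell installed.

```python
import re
from typing import Callable

# Three or more consecutive copies of the same letter, e.g. "ooooo".
_RUN = re.compile(r"([A-Za-z])\1{2,}")


def collapse_elongated(word: str, check: Callable[[str], bool]) -> str:
    """Hypothetical helper: shrink stretched letters until the dictionary accepts the word.

    Runs of 3+ identical letters are first reduced to two ("sooo" -> "soo"),
    then to one ("so"); the first candidate that `check` accepts is returned.
    If no run is found or no candidate is accepted, the word is returned unchanged.
    """
    if not _RUN.search(word):
        return word
    for repl in (r"\1\1", r"\1"):
        candidate = _RUN.sub(repl, word)
        if check(candidate):
            return candidate
    return word


# A set stands in for the spell-check dictionary; with pyenchant, pass d.check instead.
words = {"so", "good", "cool"}
print(collapse_elongated("sooooooooooooooo", words.__contains__))  # so
print(collapse_elongated("goooood", words.__contains__))           # good
```

Trying the double-letter form first keeps words like "good" and "cool" intact; only when that fails does the helper fall back to a single letter.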
I am hoping this is something simple I am just overlooking. We have 3 Plone sites that are supposed to be exactly the same in their core setup, differing only in certain installed products and the actual content. I noticed our translations work on one site but not on the other two. So far I can't find any differences.
We are using i18ndude (version 3.3.3) with Plone 4.3.2. We do have custom products/types with our own domain, but the problem is not limited to those: nothing in the site is translated.
For testing, I tried simply grabbing and printing the browser's language, using both context.REQUEST['LANGUAGE'] and context.portal_languages.getPreferredLanguage(). In each attempt I set my browser language to 'es', 'en', or 'pt', as those are the languages we currently support. The Site Language in each site is set to English. Here are my test results:
Browser Language set to 'es':
Site A: returned 'es'
Site B: returned 'en'
Site C: returned 'en'
Browser Language set to 'en':
Site A: returned 'en'
Site B: returned 'en'
Site C: returned 'en'
Browser Language set to 'pt':
Site A: returned 'en'
Site B: returned 'en'
Site C: returned 'en'
Sites A and B are both on the same server, so I don't believe it's a missing server package. The buildouts for those two are almost identical; they differ only in a couple of eggs that seem unrelated to this issue.
I just don't understand why it isn't detecting the updated browser language at all; it seems to simply fall back to the site's default language, except in one scenario on one site. What is strange is that, to the best of my knowledge, these all used to work, and I am not sure when they stopped.
I did check context.portal_languages.getAvailableLanguages() to make sure the languages I am using are in there, and they are. I also checked the ownership and permissions of the locales and i18n directories; they match across sites and are set correctly.
EDIT
This is a script I quickly wrote to see which values Plone is getting:
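# Plone "Script (Python)": list the languages portal_languages knows about.
pl = context.portal_languages
langs = [str(language) for language in pl.getAvailableLanguages().keys()]
print langs
# Language that portal_languages negotiated for this request.
print "Preferred: ", pl.getPreferredLanguage()
ts = context.translation_service  # fetched but not used below
# Language set on the request, and the raw header sent by the browser.
print "Request Language: ", context.REQUEST['LANGUAGE']
print "Accept Language: ", context.REQUEST['HTTP_ACCEPT_LANGUAGE']
return printed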
This is my browser language setup when running this, listed from highest to lowest priority:
pt-br
pt
es
en
en-us
And this is my result for Site A, which seems to recognize Spanish but not Portuguese:
['gv', 'gu', 'gd', 'ga', 'gn', 'gl', 'lg', 'lb', 'ty', 'ln', 'tw', 'tt', 'tr', 'ts', 'li', 'tn', 'to', 'tl', 'lu', 'tk', 'th', 'ti', 'tg', 'as', 'te', 'ta', 'yi', 'yo', 'de', 'ko', 'da', 'dz', 'dv', 'qu', 'kn', 'lv', 'el', 'eo', 'en', 'zh', 'ee', 'za', 'uk', 'eu', 'zu', 'es', 'ru', 'rw', 'kl', 'rm', 'rn', 'ro', 'bn', 'be', 'bg', 'ba', 'wa', 'wo', 'bm', 'jv', 'bo', 'bh', 'bi', 'br', 'bs', 'ja', 'om', 'oj', 'la', 'oc', 'kj', 'lo', 'os', 'or', 'xh', 'ch', 'co', 'ca', 'ce', 'cy', 'cs', 'cr', 'cv', 'cu', 'ps', 'pt', 'lt', 'pa', 'pi', 'ak', 'pl', 'hz', 'hy', 'an', 'hr', 'am', 'ht', 'hu', 'hi', 'ho', 'ha', 'he', 'mg', 'uz', 'ml', 'mo', 'mn', 'mi', 'mh', 'mk', 'ur', 'mt', 'ms', 'mr', 'ug', 'my', 'ki', 'aa', 'ab', 'ae', 've', 'af', 'vi', 'is', 'vk', 'iu', 'it', 'vo', 'ii', 'ay', 'ik', 'ar', 'km', 'io', 'et', 'ia', 'az', 'ie', 'id', 'ig', 'ks', 'nl', 'nn', 'no', 'na', 'nb', 'nd', 'ne', 'ng', 'ny', 'kw', 'nr', 'nv', 'kv', 'fr', 'ku', 'fy', 'fa', 'kk', 'ff', 'fi', 'fj', 'ky', 'fo', 'ka', 'kg', 'ss', 'sr', 'sq', 'sw', 'sv', 'su', 'st', 'sk', 'kr', 'si', 'sh', 'so', 'sn', 'sm', 'sl', 'sc', 'sa', 'sg', 'se', 'sd']
Preferred: es
Request Language: es
Accept Language: pt-br,pt;q=0.8,es;q=0.6,en;q=0.4,en-us;q=0.2
And the results for Sites B and C:
['en-mp', 'gv', 'gu', 'fr-dj', 'fr-gb', 'en-na', 'en-ng', 'en-nf', 'zh-hk', 'gd', 'pt-br', 'ga', 'gn', 'gl', 'en-nu', 'en-fm', 'en-ag', 'ms-my', 'ty', 'tw', 'tt', 'tr', 'ts', 'ko-kp', 'tn', 'to', 'tl', 'tk', 'th', 'ti', 'tg', 'te', 'zh-sg', 'ta', 'fr-mq', 'de', 'da', 'ar-ae', 'es-ni', 'dz', 'en-kn', 'fr-ml', 'dv', 'en-ms', 'fr-mg', 'fr-sc', 'fr-vu', 'qu', 'ar-qa', 'es-bo', 'en-nz', 'fr-bj', 'en-ws', 'fr-bi', 'zh', 'en-lr', 'fr-ch', 'fr-bf', 'za', 'fr-be', 'en-lc', 'fr-rw', 'zu', 'ch-mp', 'ar-ly', 'en-gb', 'en-nr', 'es-pr', 'tr-bg', 'en-gh', 'en-gi', 'fr-km', 'es-py', 'en-gm', 'es-pe', 'es-pa', 'en-gu', 'en-gy', 'sw-tz', 'ms-sg', 'wa', 'pt-st', 'wo', 'pt-ao', 'jv', 'fr-cd', 'ja', 'en-vu', 'es-ar', 'fr-td', 'fr-tg', 'da-dk', 'ch', 'co', 'en-vg', 'en-bz', 'ca', 'en-us', 'ce', 'en-ai', 'en-bm', 'en-vi', 'cy', 'en-bn', 'cs', 'cr', 'fr-ci', 'cv', 'cu', 'en-bb', 'ps', 'ln-cg', 'pt', 'en-au', 'zh-tw', 'es-mx', 'de-de', 'pa', 'es-ve', 'en-as', 'en-er', 'pi', 'de-dk', 'pl', 'en-sb', 'ch-gu', 'es-hn', 'en-sc', 'fr-nc', 'it-hr', 'ar-eg', 'mg', 'pt-pt', 'ml', 'mo', 'mn', 'mi', 'mh', 'mk', 'mt', 'ms', 'mr', 'fr-fr', 'hu-si', 'my', 'sv-fi', 'fr-re', 'en-pk', 've', 'vi', 'is', 'vk', 'iu', 'it', 'vo', 'ii', 'ik', 'en-io', 'fr-cm', 'io', 'ia', 'ie', 'id', 'ig', 'es-cu', 'hu-hu', 'es-cr', 'es-cl', 'es-co', 'fr-wf', 'pt-mz', 'en-il', 'it-it', 'de-be', 'fr', 'en-ke', 'fr-ga', 'fr-pf', 'es-do', 'ar-ps', 'fy', 'fr-gn', 'fr-pm', 'en-ki', 'en-ug', 'fa', 'fr-gp', 'ff', 'fi', 'fj', 'fo', 'ar-kw', 'bn-sg', 'ss', 'sr', 'sq', 'sw', 'sv', 'su', 'st', 'sk', 'si', 'sh', 'so', 'sn', 'sm', 'sl', 'sc', 'sa', 'sg', 'se', 'sd', 'bn-in', 'fr-mc', 'sv-se', 'ar-bh', 'lg', 'lb', 'la', 'ln', 'lo', 'ss-za', 'li', 'lv', 'lt', 'lu', 'sw-ke', 'en-bw', 'yi', 'en-ph', 'en-pn', 'yo', 'en-ie', 'en-pg', 'pt-cv', 'hr-ba', 'bn-bd', 'en-pr', 'en-pw', 'ss-sz', 'ar-iq', 'de-ch', 'ar-il', 'es-sv', 'el', 'eo', 'en', 'ar-dz', 'ee', 'tn-bw', 'es-gq', 'fr-gf', 'es-gt', 'eu', 'et', 'de-lu', 'es', 'ru', 'rw', 'zh-cn', 'ar-td', 
'nl-nl', 'it-sm', 'it-si', 'rm', 'rn', 'ro', 'ar-sa', 'be', 'bg', 'ur-pk', 'ba', 'fr-ca', 'bm', 'bn', 'bo', 'bh', 'bi', 'fr-cg', 'fr-cf', 'es-us', 'el-cy', 'en-vc', 'sd-pk', 'ta-sg', 'br', 'bs', 'nl-an', 'sd-in', 'cs-cz', 'om', 'oj', 'fr-lb', 'en-fk', 'en-fj', 'oc', 'ln-cd', 'fr-lu', 'ar-om', 'de-at', 'os', 'or', 'tr-cy', 'xh', 'el-gr', 'de-li', 'ar-sy', 'en-jm', 'es-ec', 'ar-so', 'it-ch', 'en-ls', 'ar-sd', 'es-es', 'en-rw', 'tn-za', 'ar-jo', 'en-ky', 'en-bs', 'hz', 'ar-ma', 'da-gl', 'hy', 'en-mt', 'en-mu', 'nl-aw', 'en-mw', 'hr', 'en-tt', 'en-zw', 'ht', 'hu', 'en-to', 'ar-mr', 'hi', 'en-tk', 'ho', 'hr-hr', 'ha', 'en-tc', 'pt-gw', 'he', 'en-dm', 'fr-it', 'uz', 'en-et', 'ur-in', 'ur', 'tr-tr', 'uk', 'ms-bn', 'ug', 'aa', 'en-so', 'en-sl', 'ab', 'ae', 'en-sh', 'af', 'en-sg', 'ak', 'am', 'ko-kr', 'an', 'as', 'ar', 'en-sz', 'nl-be', 'ay', 'az', 'ar-lb', 'nl', 'nn', 'no', 'na', 'nb', 'nd', 'ne', 'ng', 'ny', 'ta-in', 'fr-yt', 'en-za', 'nr', 'nv', 'ar-ye', 'ar-tn', 'en-cm', 'en-ck', 'sr-ba', 'en-ca', 'ka', 'kg', 'en-gd', 'es-uy', 'kk', 'kj', 'ki', 'ko', 'kn', 'km', 'kl', 'ks', 'kr', 'fr-ad', 'kw', 'kv', 'ku', 'en-zm', 'ky', 'fr-ht', 'nl-sr']
Preferred: en
Request Language: en
Accept Language: pt-br,pt;q=0.8,es;q=0.6,en;q=0.4,en-us;q=0.2
I just noticed that the list of available languages from portal_languages differs between those sites. That adds to the strangeness, but maybe it's a hint to the culprit?
Sorry for the long post, just trying to give as much info as I can!
My suspicion was right: it was something simple I was overlooking. I'm posting my findings here.
In the ZMI, go to portal_languages and check these settings:
Default Language
Allowed Languages
ALL supported languages should be selected.
Negotiation Scheme
Make sure "Use browser language request negotiation" is checked
My issue was that only the default language was selected in the Allowed Languages list. I am not sure how or why it got reset like this. The Language Settings control panel did not show the Allowed Languages option; I had to go to the ZMI for it.
Apparently the changes mentioned by hvelarde did not update this setting either.
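The Allowed Languages setting explains the test results above. The sketch below is a simplified model of browser-language negotiation, not Plone's actual portal_languages code (the function names are hypothetical, and the real tool supports further negotiation options): it orders the Accept-Language tags by q-value and returns the first one that is allowed, falling back to the default. Fed the header from the question, it reproduces the observed results if we assume Site A allowed English and Spanish while Sites B and C allowed only English.

```python
from typing import Iterable, List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    weighted: List[Tuple[float, int, str]] = []
    for index, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # A tag without ;q= has the default weight of 1.
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q <= 0:
            continue  # q=0 means "not acceptable".
        weighted.append((-q, index, tag))
    # Sort by descending q; ties keep header order via the index.
    return [tag for _, _, tag in sorted(weighted)]


def negotiate(header: str, allowed: Iterable[str], default: str) -> str:
    """Simplified model: first requested tag that is allowed, else the default."""
    allowed_set = {lang.lower() for lang in allowed}
    for tag in parse_accept_language(header):
        if tag in allowed_set:
            return tag
    return default


header = "pt-br,pt;q=0.8,es;q=0.6,en;q=0.4,en-us;q=0.2"
print(negotiate(header, ["en", "es"], "en"))        # es (like Site A)
print(negotiate(header, ["en"], "en"))              # en (like Sites B and C)
print(negotiate(header, ["en", "es", "pt"], "en"))  # pt
```

In this model, Portuguese can only win once "pt" is in the allowed set, which matches the fix of selecting all supported languages under Allowed Languages.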
Search the instance part of your buildout for the environment variable zope_i18n_allowed_languages; it restricts the languages for which .po files are loaded, which speeds up Zope startup and reduces memory usage.
In your case, you should set it as follows:
[instance]
...
environment-vars =
PTS_LANGUAGES en es pt
zope_i18n_allowed_languages en es pt
zope_i18n_compile_mo_files true
For more information, see Maurits van Rees' "Internationalization in Plone 3.3 and 4.0".
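To double-check that the variable covers every language the site should serve, a small sketch like the one below compares the two. The helper name is hypothetical, and treating commas as separators is an assumption about how the value is split; the buildout above uses plain spaces, which is the safe choice.

```python
import os
from typing import FrozenSet, Mapping, Optional


def allowed_languages_from_env(
    environ: Mapping[str, str] = os.environ,
) -> Optional[FrozenSet[str]]:
    """Hypothetical helper: read zope_i18n_allowed_languages as a set of codes.

    Assumption: the value is a whitespace- (or comma-) separated list.
    Returns None when the variable is unset, i.e. no restriction applies.
    """
    value = environ.get("zope_i18n_allowed_languages")
    if value is None:
        return None
    return frozenset(value.replace(",", " ").split())


# Example: the site should serve en, es and pt, but the variable only lists two.
env = {"zope_i18n_allowed_languages": "en es"}
wanted = {"en", "es", "pt"}
allowed = allowed_languages_from_env(env)
print(sorted(wanted - allowed))  # ['pt'] -> no .po files would be loaded for pt
```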