Related
I have set up a CoreNLP server and am using Stanford NER to extract time periods from sentences.
If I use the online interactive demo at corenlp.run to parse the sentence
'Last year something happened.'
it shows 'DATE' and '2016'.
However, my own server, set up with the latest release of CoreNLP, only shows 'DATE'. What's more, when I use Python Requests to query my server's API with the same sentence, the first two tokens in the response contain the fields 'timex': {'type': 'DATE','tid': 't1', 'altValue': 'THIS P1Y OFFSET P-1Y'} and 'normalizedNER': 'THIS P1Y OFFSET P-1Y'.
If I just have to deal with the fact that my output is not as good as the demo's, then where is the Stanford NER or timex3 documentation explaining what THIS P1Y OFFSET P-1Y means or describing what other possible responses I might get in the normalizedNER field?
Here is the entire API response
[
{'word': 'Last', 'after': ' ', 'originalText': 'Last', 'timex': {'type': 'DATE', 'tid': 't1', 'altValue': 'THIS P1Y OFFSET P-1Y'}, 'pos': 'JJ', 'ner': 'DATE', 'lemma': 'last', 'normalizedNER': 'THIS P1Y OFFSET P-1Y', 'before': '', 'index': 1, 'characterOffsetBegin': 0, 'characterOffsetEnd': 4},
{'word': 'year', 'after': ' ', 'originalText': 'year', 'timex': {'type': 'DATE', 'tid': 't1', 'altValue': 'THIS P1Y OFFSET P-1Y'}, 'pos': 'NN', 'ner': 'DATE', 'lemma': 'year', 'normalizedNER': 'THIS P1Y OFFSET P-1Y', 'before': ' ', 'index': 2, 'characterOffsetBegin': 5, 'characterOffsetEnd': 9},
{'word': 'something', 'before': ' ', 'originalText': 'something', 'ner': 'O', 'lemma': 'something', 'after': ' ', 'characterOffsetEnd': 19, 'index': 3, 'characterOffsetBegin': 10, 'pos': 'NN'},
{'word': 'happened', 'before': ' ', 'originalText': 'happened', 'ner': 'O', 'lemma': 'happen', 'after': '', 'characterOffsetEnd': 28, 'index': 4, 'characterOffsetBegin': 20, 'pos': 'VBD'},
{'word': '.', 'before': '', 'originalText': '.', 'ner': 'O', 'lemma': '.', 'after': '', 'characterOffsetEnd': 29, 'index': 5, 'characterOffsetBegin': 28, 'pos': '.'}
]
If you closely look at the request made to corenlp server in the interactive demo, then you will see that current date is also sent as "date" parameter with the request.
For eg. If your sentence is "I went to school today.", then "today" has normalized ner is "2017-19-09" (current date).
If you dont pass the "date" parameter, "today" won't have the exact date as normalized ner.
Hope it makes sense.
Hi I have added a new feature to allow you to tell the pipeline to use the present date as the docDate when running, this is the main source of your issue. To get this feature you will have to use the latest version of Stanford CoreNLP available on GitHub.
Also, when you start the server you will have to use the -serverProperties option and supply a .properties file with these properties:
annotators = tokenize,ssplit,pos,lemma,ner,entitymentions
ner.usePresentDateForDocDate = true
If you do this it should work now and properly list 2016
I am hoping this is something simple I am just overlooking. We have 3 Plone sites that are supposed to be exactly the same in their core setup, only differing with certain products installed and the actual content. I noticed our translations are working on one site, and not on the other two. So far I can't find any differences.
We are using i18ndude (version 3.3.3) with Plone 4.3.2. We do have custom products/types with our own domain, but it is more than just those not working, it is everything in the site.
For testing, I have tried just grabbing and printing the browser's language. I did it with both context.REQUEST['LANGUAGE'] and context.portal_languages.getPreferredLanguage(). I set my browser language in each attempt to 'es', 'en', and 'pt', as those are the languages we are currently supporting. The Site Language in each site is set to English. Here are my test results:
Browser Language set to 'es':
Site A: returned 'es'
Site B: returned 'en'
Site C: returned 'en'
Browser Language set to 'en':
Site A: returned 'en'
Site B: returned 'en'
Site C: returned 'en'
Browser Language set to 'pt':
Site A: returned 'en'
Site B: returned 'en'
Site C: returned 'en'
Site A and B are both on the same server, so I don't believe its a missing server package. The buildouts are almost identical for those two, but the differences are just in a couple eggs that are seemingly unrelated to this issue.
I just don't understand why it isn't even detecting the updated browser language at all, it just defaults back to the site's preferred language it seems. Except for one scenario in one site. What is strange is, these all used to work to the best of my knowledge, and I am not sure when they stopped.
I did check context.portal_languages.getAvailableLanguages() just to make sure the ones I am using are in there, and they are. I also checked the ownership and permissions of the locales & i18n directories, those are all a match across sites and set accurately.
EDIT
This is a script I quickly wrote to see what all values Plone is getting:
pl = context.portal_languages
langs = [str(language) for language in pl.getAvailableLanguages().keys()]
print langs
print "Preferred: ", pl.getPreferredLanguage()
ts = context.translation_service
print "Request Language: ", context.REQUEST['LANGUAGE']
print "Accept Language: ", context.REQUEST['HTTP_ACCEPT_LANGUAGE']
return printed
This is my browser language setup when running this, listed by highest priority first:
pt-br
pt
es
en
en-us
And this is my result (site A, which seems to recognize Spanish, but not Portuguese):
['gv', 'gu', 'gd', 'ga', 'gn', 'gl', 'lg', 'lb', 'ty', 'ln', 'tw', 'tt', 'tr', 'ts', 'li', 'tn', 'to', 'tl', 'lu', 'tk', 'th', 'ti', 'tg', 'as', 'te', 'ta', 'yi', 'yo', 'de', 'ko', 'da', 'dz', 'dv', 'qu', 'kn', 'lv', 'el', 'eo', 'en', 'zh', 'ee', 'za', 'uk', 'eu', 'zu', 'es', 'ru', 'rw', 'kl', 'rm', 'rn', 'ro', 'bn', 'be', 'bg', 'ba', 'wa', 'wo', 'bm', 'jv', 'bo', 'bh', 'bi', 'br', 'bs', 'ja', 'om', 'oj', 'la', 'oc', 'kj', 'lo', 'os', 'or', 'xh', 'ch', 'co', 'ca', 'ce', 'cy', 'cs', 'cr', 'cv', 'cu', 'ps', 'pt', 'lt', 'pa', 'pi', 'ak', 'pl', 'hz', 'hy', 'an', 'hr', 'am', 'ht', 'hu', 'hi', 'ho', 'ha', 'he', 'mg', 'uz', 'ml', 'mo', 'mn', 'mi', 'mh', 'mk', 'ur', 'mt', 'ms', 'mr', 'ug', 'my', 'ki', 'aa', 'ab', 'ae', 've', 'af', 'vi', 'is', 'vk', 'iu', 'it', 'vo', 'ii', 'ay', 'ik', 'ar', 'km', 'io', 'et', 'ia', 'az', 'ie', 'id', 'ig', 'ks', 'nl', 'nn', 'no', 'na', 'nb', 'nd', 'ne', 'ng', 'ny', 'kw', 'nr', 'nv', 'kv', 'fr', 'ku', 'fy', 'fa', 'kk', 'ff', 'fi', 'fj', 'ky', 'fo', 'ka', 'kg', 'ss', 'sr', 'sq', 'sw', 'sv', 'su', 'st', 'sk', 'kr', 'si', 'sh', 'so', 'sn', 'sm', 'sl', 'sc', 'sa', 'sg', 'se', 'sd']
Preferred: es
Request Language: es
Accept Language: pt-br,pt;q=0.8,es;q=0.6,en;q=0.4,en-us;q=0.2
And results for Site B and C:
['en-mp', 'gv', 'gu', 'fr-dj', 'fr-gb', 'en-na', 'en-ng', 'en-nf', 'zh-hk', 'gd', 'pt-br', 'ga', 'gn', 'gl', 'en-nu', 'en-fm', 'en-ag', 'ms-my', 'ty', 'tw', 'tt', 'tr', 'ts', 'ko-kp', 'tn', 'to', 'tl', 'tk', 'th', 'ti', 'tg', 'te', 'zh-sg', 'ta', 'fr-mq', 'de', 'da', 'ar-ae', 'es-ni', 'dz', 'en-kn', 'fr-ml', 'dv', 'en-ms', 'fr-mg', 'fr-sc', 'fr-vu', 'qu', 'ar-qa', 'es-bo', 'en-nz', 'fr-bj', 'en-ws', 'fr-bi', 'zh', 'en-lr', 'fr-ch', 'fr-bf', 'za', 'fr-be', 'en-lc', 'fr-rw', 'zu', 'ch-mp', 'ar-ly', 'en-gb', 'en-nr', 'es-pr', 'tr-bg', 'en-gh', 'en-gi', 'fr-km', 'es-py', 'en-gm', 'es-pe', 'es-pa', 'en-gu', 'en-gy', 'sw-tz', 'ms-sg', 'wa', 'pt-st', 'wo', 'pt-ao', 'jv', 'fr-cd', 'ja', 'en-vu', 'es-ar', 'fr-td', 'fr-tg', 'da-dk', 'ch', 'co', 'en-vg', 'en-bz', 'ca', 'en-us', 'ce', 'en-ai', 'en-bm', 'en-vi', 'cy', 'en-bn', 'cs', 'cr', 'fr-ci', 'cv', 'cu', 'en-bb', 'ps', 'ln-cg', 'pt', 'en-au', 'zh-tw', 'es-mx', 'de-de', 'pa', 'es-ve', 'en-as', 'en-er', 'pi', 'de-dk', 'pl', 'en-sb', 'ch-gu', 'es-hn', 'en-sc', 'fr-nc', 'it-hr', 'ar-eg', 'mg', 'pt-pt', 'ml', 'mo', 'mn', 'mi', 'mh', 'mk', 'mt', 'ms', 'mr', 'fr-fr', 'hu-si', 'my', 'sv-fi', 'fr-re', 'en-pk', 've', 'vi', 'is', 'vk', 'iu', 'it', 'vo', 'ii', 'ik', 'en-io', 'fr-cm', 'io', 'ia', 'ie', 'id', 'ig', 'es-cu', 'hu-hu', 'es-cr', 'es-cl', 'es-co', 'fr-wf', 'pt-mz', 'en-il', 'it-it', 'de-be', 'fr', 'en-ke', 'fr-ga', 'fr-pf', 'es-do', 'ar-ps', 'fy', 'fr-gn', 'fr-pm', 'en-ki', 'en-ug', 'fa', 'fr-gp', 'ff', 'fi', 'fj', 'fo', 'ar-kw', 'bn-sg', 'ss', 'sr', 'sq', 'sw', 'sv', 'su', 'st', 'sk', 'si', 'sh', 'so', 'sn', 'sm', 'sl', 'sc', 'sa', 'sg', 'se', 'sd', 'bn-in', 'fr-mc', 'sv-se', 'ar-bh', 'lg', 'lb', 'la', 'ln', 'lo', 'ss-za', 'li', 'lv', 'lt', 'lu', 'sw-ke', 'en-bw', 'yi', 'en-ph', 'en-pn', 'yo', 'en-ie', 'en-pg', 'pt-cv', 'hr-ba', 'bn-bd', 'en-pr', 'en-pw', 'ss-sz', 'ar-iq', 'de-ch', 'ar-il', 'es-sv', 'el', 'eo', 'en', 'ar-dz', 'ee', 'tn-bw', 'es-gq', 'fr-gf', 'es-gt', 'eu', 'et', 'de-lu', 'es', 'ru', 'rw', 'zh-cn', 'ar-td', 'nl-nl', 'it-sm', 'it-si', 'rm', 'rn', 'ro', 'ar-sa', 'be', 'bg', 'ur-pk', 'ba', 'fr-ca', 'bm', 'bn', 'bo', 'bh', 'bi', 'fr-cg', 'fr-cf', 'es-us', 'el-cy', 'en-vc', 'sd-pk', 'ta-sg', 'br', 'bs', 'nl-an', 'sd-in', 'cs-cz', 'om', 'oj', 'fr-lb', 'en-fk', 'en-fj', 'oc', 'ln-cd', 'fr-lu', 'ar-om', 'de-at', 'os', 'or', 'tr-cy', 'xh', 'el-gr', 'de-li', 'ar-sy', 'en-jm', 'es-ec', 'ar-so', 'it-ch', 'en-ls', 'ar-sd', 'es-es', 'en-rw', 'tn-za', 'ar-jo', 'en-ky', 'en-bs', 'hz', 'ar-ma', 'da-gl', 'hy', 'en-mt', 'en-mu', 'nl-aw', 'en-mw', 'hr', 'en-tt', 'en-zw', 'ht', 'hu', 'en-to', 'ar-mr', 'hi', 'en-tk', 'ho', 'hr-hr', 'ha', 'en-tc', 'pt-gw', 'he', 'en-dm', 'fr-it', 'uz', 'en-et', 'ur-in', 'ur', 'tr-tr', 'uk', 'ms-bn', 'ug', 'aa', 'en-so', 'en-sl', 'ab', 'ae', 'en-sh', 'af', 'en-sg', 'ak', 'am', 'ko-kr', 'an', 'as', 'ar', 'en-sz', 'nl-be', 'ay', 'az', 'ar-lb', 'nl', 'nn', 'no', 'na', 'nb', 'nd', 'ne', 'ng', 'ny', 'ta-in', 'fr-yt', 'en-za', 'nr', 'nv', 'ar-ye', 'ar-tn', 'en-cm', 'en-ck', 'sr-ba', 'en-ca', 'ka', 'kg', 'en-gd', 'es-uy', 'kk', 'kj', 'ki', 'ko', 'kn', 'km', 'kl', 'ks', 'kr', 'fr-ad', 'kw', 'kv', 'ku', 'en-zm', 'ky', 'fr-ht', 'nl-sr']
Preferred: en
Request Language: en
Accept Language: pt-br,pt;q=0.8,es;q=0.6,en;q=0.4,en-us;q=0.2
I just noticed that the list of available languages from portal_languages is different between those sites. Adding to the strange, but maybe a hint to the culprit?
Sorry for the long post, just trying to give as much info as I can!
My suspicions were right about it being something simple I am overlooking. Posting my find here.
In the ZMI, go to portal_languages and check these settings:
Default Language
Allowed Languages
ALL supported languages should be selected.
Negotiation Scheme
Make sure "Use browser language request negotiation" is checked
My issue was that only the Default language was selected in the Allowed Languages selection list. I am not sure why it go reset like this or how. When using the Language Settings Control Panel I did not see the Allowed Languages option, had to go to ZMI for it.
Apparently the changes mentioned by hvelarde did not update this setting either.
Search the instance part of your buildout for the environment variable zope_i18n_allowed_languages; it is used to restrict the languages for which po files are loaded to speed up Zope startup time and use less memory.
In your case, you should set it as follows:
[instance]
...
environment-vars =
PTS_LANGUAGES en es pt
zope_i18n_allowed_languages en es pt
zope_i18n_compile_mo_files true
For more information check Maurits van Rees' Internationalization in Plone 3.3 and 4.0.
How can I import a customer's address line 2 using the AvS importer for Magento? The example shows only one line and the names do not match the Magento fields, so I'm not sure how to handle this.
$data = array(
array(
'email' => 'customer#company.com',
'_website' => 'base',
'group_id' => 1,
'firstname' => 'John',
'lastname' => 'Doe',
'_address_firstname' => 'John',
'_address_lastname' => 'Doe',
'_address_street' => 'Main Street 1',
'_address_postcode' => '12345',
'_address_city' => 'Springfield',
'_address_country_id' => 'US',
'_address_telephone' => '+1 2345 6789',
'_address_default_billing_' => 1,
'_address_default_shipping_' => 0,
));
I tried adding '_address_street2', but that does not work.
You can use the \n (newline) in the _address_street field as divider between the first and second line.
...
'_address_street' => "Main Street\n1",
...
Use double quotes for this entry in the array to have the \n be parsed as newline
The result will be stored as provided in the database table customer_address_entity_text, which is in Magento terms a multiline field. For displaying it in the Magento front and backend Magento will automatically split it up by the newline and place this in separate input fields.
Is there a best practice for normalizing British and American English in Elasticsearch?
Using a Synonym Token Filter requires an incredibly long configuration file. There are actually several thousand differently spelled words in UK and US English and it's almost impossible to find a really comprehensive list of words. Here's a list of almost 2.000 words, but it's far from being complete.
Preferably, I'd like to create an ES Analyzer/Filter with rules to transform US to UK English. Maybe that's the better approach, but I don't know where to start - which type of filters do I need for that? It doesn't have to cover everything - it should merely normalize most search terms. E.g. "grey" - "gray", "colour" - "color", "center" - "centre", etc.
Here's the approach I went for after fiddling around a while. It's a combination of basic rules, "fixes", and synonyms: First, apply a char_filter to enforce a set of basic spelling rules. It's not 100% correct, but it does the job pretty well:
"char_filter": {
"en_char_filter": { "type": "mapping", "mappings": [
# fixes
"aerie=>axerie", "aeroplane=>airplane", "aloe=>aloxe", "canoe=>canoxe", "coerce=>coxerce", "poem=>poxem", "prise=>prixse",
# whole words
"armour=>armor", "behaviour=>behavior", "centre=>center" "colour=>color", "clamour=>clamor", "draught=>draft", "endeavour=>endeavor", "favour=>favor", "flavour=>flavor", "harbour=>harbor", "honour=>honor",
"humour=>humor", "labour=>labor", "litre=>liter", "metre=>meter", "mould=>mold", "neighbour=>neighbor", "plough=>plow", "saviour=>savior", "savour=>savor",
# generic transformations
"ae=>e", "ction=>xion", "disc=>disk", "gramme=>gram", "isable=>izable", "isation=>ization", "ise=>ize", "ising=>izing", "ll=>l", "oe=>e", "ogue=>og", "sation=>zation", "yse=>yze", "ysing=>yzing"
] }
}
The "fixes" entry is there to prevent incorrect application of other rules. E.g. "prise=>prixse" prevents "prise" from getting changed into "prize", which has a different meaning. You may need to adapt this according to your own needs.
Next, include a synonym filter for catching the most frequently used exceptions:
"en_synonym_filter": { "type": "synonym", "synonyms": EN_SYNONYMS }
Here's our list of synonyms that includes the most important keywords for our use case. You may wish to adapt this list to your needs:
EN_SYNONYMS = (
"accolade, prize => award",
"accoutrement => accouterment",
"aching, pain => hurt",
"acw, anticlockwise, counterclockwise, counter-clockwise => ccw",
"adaptor => adapter",
"advocate, attorney, barrister, procurator, solicitor => lawyer",
"ageing => aging",
"agendas, agendum => agenda",
"almanack => almanac",
"aluminium => aluminum",
"america, united states, usa",
"amphitheatre => amphitheater",
"anti-aliased, anti-aliasing => antialiased",
"arbour => arbor",
"ardour => ardor",
"arse => ass",
"artefact => artifact",
"aubergine => eggplant",
"automobile, motorcar => car",
"axe => ax",
"bannister => banister",
"barbecue => bbq",
"battleaxe => battleax",
"baulk => balk",
"beetroot => beet",
"biassed => biased",
"biassing => biasing",
"biscuit => cookie",
"black american, african american, afro-american, negro",
"bobsleigh => bobsled",
"bonnet => hood",
"bulb, electric bulb, light bulb, lightbulb",
"burned => burnt",
"bussines, bussiness => business",
"business man, business people, businessman",
"business woman, business people, businesswoman",
"bussing => busing",
"cactus, cactuses => cacti",
"calibre => caliber",
"candour => candor",
"candy floss, candyfloss, cotton candy",
"car park, parking area, parking ground, parking lot, parking-lot, parking place, parking",
"carburettor => carburetor",
"castor => caster",
"cataloguing => cataloging",
"catboat, sailboat, sailing boat",
"champion, gainer, victor, win, winner => victory",
"chat => talk",
"chequebook => checkbook",
"chequer => checker",
"chequerboard => checkerboard",
"chequered => checkered",
"christmas tree ball, christmas tree ball ornament, christmas ball ornament, christmas bauble",
"christmas, x-mas => xmas",
"cinema => movies",
"clangour => clangor",
"clarinettist => clarinetist",
"conditioning => conditioner",
"conference => meeting",
"coriander => cilantro",
"corporate => company",
"cosmos, universe => outer space",
"cosy, cosiness => cozy",
"criminal => crime",
"curriculums => curricula",
"cypher => cipher",
"daddy, father, pa, papa => dad",
"defence => defense",
"defenceless => defenseless",
"demeanour => demeanor",
"departure platform, station platform, train platform, train station",
"dishrag => dish cloth",
"dishtowel, dishcloth => dish towel",
"doughnut => donut",
"downspout => drainpipe",
"drugstore => pharmacy",
"e-mail => email",
"enamoured => enamored",
"england => britain",
"english => british",
"epaulette => epaulet",
"exercise, excercise, training, workout => fitness",
"expressway, motorway, highway => freeway",
"facebook => facebook, social media",
"fanny => buttocks",
"fanny pack => bum bag",
"farmyard => barnyard",
"faucet => tap",
"fervour => fervor",
"fibre => fiber",
"fibreglass => fiberglass",
"flashlight => torch",
"flautist => flutist",
"flier => flyer",
"flower fly, hoverfly, syrphid fly, syrphus fly",
"foot-walk, sidewalk, sideway => pavement",
"football, soccer",
"forums => fora",
"fourth => 4",
"freshman => fresher",
"chips, fries, french fries",
"gaol => jail",
"gaolbird => jailbird",
"gaolbreak => jailbreak",
"gaoler => jailer",
"garbage, rubbish => trash",
"gasoline => petrol",
"gases, gasses",
"gauge => gage",
"gauged => gaged",
"gauging => gaging",
"gipsy, gipsies, gypsies => gypsy",
"glamour => glamor",
"glueing => gluing",
"gravesite, sepulchre, sepulture => sepulcher",
"grey => gray",
"greyish => grayish",
"greyness => grayness",
"groyne => groin",
"gryphon, griffon => griffin",
"hand shake, shake hands, shaking hands, handshake",
"haulier => hauler",
"hobo, homeless, tramp => bum",
"new year, new year's eve, hogmanay, silvester, sylvester",
"holiday => vacation",
"holidaymaker, holiday-maker, vacationer, vacationist => tourist",
"homosexual, fag => gay",
"inbox, letterbox, outbox, postbox => mailbox",
"independence day, 4th of july, fourth of july, july 4th, july 4, 4th july, july fourth, forth of july, 4 july, fourth july, 4th july",
"infant, suckling, toddler => baby",
"infeasible => unfeasible",
"inquire, inquiry => enquire",
"insure => ensure",
"internet, website => www",
"jelly => jam",
"jewelery, jewellery => jewelry",
"jogging => running",
"journey => travel",
"judgement => judgment",
"kerb => curb",
"kiwifruit => kiwi",
"laborer => worker",
"lacklustre => lackluster",
"ladybeetle, ladybird, ladybug => ladybird beetle",
"larrikin, scalawag, rascal, scallywag => naughty boy",
"leaf => leaves",
"licence, licenced, licencing => license",
"liquorice => licorice",
"lorry => truck",
"loupe, magnifier, magnifying, magnifying glass, magnifying lens, zoom",
"louvred => louvered",
"louvres => louver",
"lustre => luster",
"mail => post",
"mailman => postman",
"marriage, married, marry, marrying, wedding => wed",
"mayonaise => mayo",
"meagre => meager",
"misdemeanour => misdemeanor",
"mitre => miter",
"mom, momma, mummy, mother => mum",
"moonlight => moon light",
"moult => molt",
"moustache, moustached => mustache",
"nappy => diaper",
"nightlife => night life",
"normalcy => normality",
"octopus => kraken",
"odour => odor",
"odourless => odorless",
"offence => offense",
"omelette => omelet",
"# fix torres del paine",
"paine => painee",
"pajamas => pyjamas",
"pantyhose => tights",
"parenthesis, parentheses => bracket",
"parliament => congress",
"parlour => parlor",
"persnickety => pernickety",
"philtre => filter",
"phoney => phony",
"popsicle => iced-lolly",
"porch => veranda",
"pretence => pretense",
"pullover, jumper => sweater",
"pyjama => pajama",
"railway => railroad",
"rancour => rancor",
"rappel => abseil",
"row house, serial house, terrace house, terraced house, terraced housing, town house",
"rigour => rigor",
"rumour => rumor",
"sabre => saber",
"saltpetre => saltpeter",
"sanitarium => sanatorium",
"santa, santa claus, st nicholas, st nicholas day",
"sceptic, sceptical, scepticism, sceptics => skeptic",
"sceptre => scepter",
"shaikh, sheikh => sheik",
"shivaree => charivari",
"silverware, flatware => cutlery",
"simultaneous => simultanous",
"sleigh => sled",
"smoulder, smouldering => smolder",
"sombre => somber",
"speciality => specialty",
"spectre => specter",
"splendour => splendor",
"spoilt => spoiled",
"street => road",
"streetcar, tramway, tram => trolley-car",
"succour => succor",
"sulphate, sulphide, sulphur, sulphurous, sulfurous => sulfur",
"super hero, superhero => hero",
"surname => last name",
"sweets => candy",
"syphon => siphon",
"syphoning => siphoning",
"tack, thumb-tack, thumbtack => drawing pin",
"tailpipe => exhaust pipe",
"taleban => taliban",
"teenager => teen",
"television => tv",
"thank you, thanks",
"theatre => theater",
"tickbox => checkbox",
"ticked => checked",
"timetable => schedule",
"tinned => canned",
"titbit => tidbit",
"toffee => taffy",
"tonne => ton",
"transportation => transport",
"trapezium => trapezoid",
"trousers => pants",
"tumour => tumor",
"twitter => twitter, social media",
"tyre => tire",
"tyres => tires",
"undershirt => singlet",
"university => college",
"upmarket => upscale",
"valour => valor",
"vapour => vapor",
"vigour => vigor",
"waggon => wagon",
"windscreen, windshield => front shield",
"world championship, world cup, worldcup",
"worshipper, worshipping => worshiping",
"yoghourt, yoghurt => yogurt",
"zip, zip code, postal code, postcode",
"zucchini => courgette"
)
I realize that this answer departs somewhat from the OP's initial question, but if you just want to normalize American vs. British English spelling variants, you can look here for a manageably sized list (~1,700 replacements): http://www.tysto.com/uk-us-spelling-list.html. I'm sure there are others out there too that you could use to create a consolidated master list.
Apart from spelling variation, you must be very careful not to blithely replace words in isolation with their (assumed!) counterparts in American English. I would advise against all but the most solid of lexical replacements. E.g., I can't see anything bad happening from this one
"anticlockwise, counterclockwise, counter-clockwise => counter-clockwise"
but this one
"hobo, homeless, tramp => bum"
would index "A homeless man" => *"A bum man", which is nonsense. (Not to mention that hobos, the homeless and "tramps" are quite distinct -- http://knowledgenuts.com/2014/11/26/the-difference-between-hobos-tramps-and-bums/.)
In summary, apart from spelling variation, the American vs. British dialect divide is complicated and cannot be reduced to simple list look-ups.
P.S. If you really want to do this right (i.e., account for grammatical context, etc.), you would probably need a context-sensitive paraphrase model to "translate" British to American English (or the inverse, depending on your needs) before it ever hits the ES index. This could be done (with sufficient parallel data) using an off-the-shelf statistical translation model, or maybe even some custom, in-house software that uses natural language parsing, POS tagging, chunking, etc.
I am trying to explore the simple_aws gem. When I connect to cloudwatch to get the metric statistics I get an error as follows:
cw.get_metric_statistics(
:metric_name => metric_name,
:period => period,
:start_time => start_time,
:end_time => end_time,
:statistics => "Average",
:namespace => "AWS/EC2"
)
SimpleAWS::UnsuccessfulResponse: MissingParameter (400):
The parameter Namespace is required.
The parameter MetricName is required.
The parameter StartTime is required.
The parameter EndTime is required.
The parameter Period is required.
The parameter Statistics is required.
Later, I tried this:
cw.get_metric_statistics(
options => [
{:metric_name=>"CPUUtilization",
:period=>60,
:start_time => Time.now()-86400,
:end_time => Time.now()-3600,
:statistics => "Average"
}
]
)
But got the following error:
URI::InvalidComponentError: bad component(expected query component):
Action=GetMetricStatistics&{:metric_name=>"CPUUtilization"}.1.metric_name=CPUUtilization&{:metric_name=>"CPUUtilization"}.1.period=60&{:metric_name=>"CPUUtilization"}.1.start_time=2012-05-06%2014%3A25%3A28%20%2B0530&{:metric_name=>"CPUUtilization"}.1.end_time=2012-05-07%2013%3A25%3A28%20%2B0530&{:metric_name=>"CPUUtilization"}.1.statistics=Average&AWSAccessKeyId=AccessKey&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2012-05-07T08%3A55%3A28Z&Version=2010-08-01&Signature=Signature
one more try:
cw.get_metric_statistics(
namespace: 'AWS/EC2',
measure_name: 'CPUUtilization',
statistics: 'Average',
start_time: time-1000,
dimensions: "InstanceId=#{instance_id}"
)
ArgumentError: comparison of Array with Array failed
Can anybody please help find the correct syntax for issuing this command.
result = cw.get_metric_statistics(step,
start_time,
end_time,
metric,
'AWS/RDS',
'Average',
dimensions={'DBInstanceIdentifier': [indentifier]})
This also worked for me
I found that this works;
lat = cw.get_metric_statistics(
'MetricName' => 'Latency',
'Period' => 60,
'StartTime' => (Time.now() - 3600).iso8601,
'EndTime' => Time.now().iso8601,
'Statistics.member.1' => "Average",
'Namespace' => "AWS/ELB",
'Unit' => 'Seconds'
)
Firstly that the datetime is required in ISO8601 format, secondly that the parameters need to be cased correctly, thirdly the Unit parameter is required, and finally that Statistics needed a namespace(?) after it.
Hope this helps, even if it is a little late.