Elasticsearch synonym issue - elasticsearch
I've had a look at the other questions surrounding this problem, but they don't seem to help.
I need to change an input of "i phone" or "i Phone" so that it queries "iPhone" in Elasticsearch.
As you can see, I have tried almost everything I can think of, including simply "phone => iPhone" and leaving the "i" in there to hang around, possibly adding it to the stop words.
I've tried using "simple", "keyword", "standard" and "whitespace" for my custom analyzer.
Can anyone spot where I've gone wrong? This is the last problem before I can finish my project, so it'd be appreciated. Thanks
P.S. Bonus points if you include how I can do auto suggest on inputs, thanks
Below is my code
public static CreateIndexDescriptor GetMasterProductDescriptor(string indexName = "shopmaster")
{
    var indexDescriptor = new CreateIndexDescriptor(indexName)
        .Settings(s => s
            .Analysis(a => a
                .TokenFilters(t => t
                    .Stop("my_stop", st => st
                        .StopWords("_english_", "new", "cheap")
                        .RemoveTrailing()
                    )
                    .Synonym("my_synonym", st => st
                        .Synonyms(
                            "phone => iPhone"
                            //"i phone => iPhone",
                            //"i Phone => iPhone"
                        )
                    )
                    .Snowball("my_snowball", st => st
                        .Language(SnowballLanguage.English)
                    )
                )
                .Analyzers(an => an
                    .Custom("my_analyzer", ca => ca
                        .Tokenizer("simple")
                        .Filters(
                            "lowercase",
                            "my_stop",
                            "my_snowball",
                            "my_synonym"
                        )
                    )
                )
            )
        )
        .Mappings(
            ms => ms.Map<MasterProduct>(
                m => m.AutoMap()
                    .Properties(
                        ps => ps
                            .Nested<MasterProductAttributes>(p => p.Name(n => n.MasterAttributes))
                            .Nested<MasterProductAttributes>(p => p.Name(n => n.ProductAttributes))
                            .Nested<MasterProductAttributeType>(p => p.Name(n => n.MasterAttributeTypes))
                            .Nested<Feature>(p => p.Name(n => n.Features))
                            .Nested<RelatedProduct>(p => p.Name(n => n.RelatedProducts))
                            .Nested<MasterProductItem>(
                                p => p.Name(
                                    n => n.Products
                                )
                                .Properties(prop => prop.Boolean(
                                    b => b.Name(n => n.InStock)
                                ))
                            )
                            .Boolean(b => b.Name(n => n.InStock))
                            .Number(t => t.Name(n => n.UnitsSold).Type(NumberType.Integer))
                            .Text(
                                tx => tx.Name(e => e.ManufacturerName)
                                    .Fields(fs => fs.Keyword(ss => ss.Name("manufacturer"))
                                        .TokenCount(t => t.Name("MasterProductId")
                                            .Analyzer("my_analyzer")
                                        )
                                    )
                                    .Fielddata())
                            //.Completion(cm=>cm.Analyzer("my_analyser")
                    )
            )
        );
    return indexDescriptor;
}
The order of your filters matters!
You are applying lowercase, then a stemmer (snowball), then synonyms. Your synonyms contain capital letters, but by the time they are applied, lowercasing has already occurred. It's a good idea to apply lowercasing first, to make sure case doesn't affect matching of the synonyms, but in that case your replacements shouldn't have caps.
Stemmers should not be applied before synonyms (unless you know what you are doing, and are comparing post-stemming terms). Snowball, I believe, will transform 'iphone' to 'iphon', so this is another area where you are running into trouble.
Try this order instead:
"lowercase",
"my_synonym",
"my_stop",
"my_snowball",
(And don't forget to remove the caps from your synonyms)
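To see why the order matters, here is a toy simulation of the filter chain in Python. It is only a sketch under stated assumptions: `toy_stem` is a hypothetical stand-in for the snowball stemmer (it merely strips a trailing "e"), the stop filter is left out for brevity, and the synonym rule is written in lowercase as suggested above. None of this is Elasticsearch code; it only mimics how each filter sees the output of the previous one.

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]


def toy_stem(tokens):
    # Hypothetical stand-in for snowball: strip one trailing "e" from longer words.
    return [t[:-1] if len(t) > 3 and t.endswith("e") else t for t in tokens]


def make_synonym_filter(rules):
    # rules maps a tuple of input tokens to the list of tokens that replaces them.
    def apply(tokens):
        out, i = [], 0
        while i < len(tokens):
            for source, target in rules.items():
                if tuple(tokens[i:i + len(source)]) == source:
                    out.extend(target)
                    i += len(source)
                    break
            else:
                # No rule matched at this position: keep the token unchanged.
                out.append(tokens[i])
                i += 1
        return out
    return apply


def analyze(text, filters):
    tokens = text.split()  # crude whitespace tokenizer
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


synonyms = make_synonym_filter({("i", "phone"): ["iphone"]})
good_order = [lowercase, synonyms, toy_stem]  # synonyms before stemming
bad_order = [lowercase, toy_stem, synonyms]   # stemming before synonyms

print(analyze("i Phone", good_order))  # ['iphon']
print(analyze("iPhone", good_order))   # ['iphon']
print(analyze("i Phone", bad_order))   # ['i', 'phon']
```

With the synonym filter ahead of the stemmer, the query "i Phone" and the indexed "iPhone" end up as the same term. With the stemmer first, "phone" has already become "phon" and the rule no longer matches.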
Related
Inner Hits isn't working ElasticSearch
Good day. I'm using Elasticsearch/NEST to query against nested objects. What I've noticed is that my nested object is empty; however, the parent is still being returned despite there being no match.

ISearchResponse<Facility> responses = await this._elasticClient.SearchAsync<Facility>(a => a
    .Query(q => q
        .Bool(b => b
            .Must(m => m
                .Nested(n => n
                    .Query(nq => nq
                        .Term(t => t
                            .Field(f => f.Reviews.First().UserId)
                            .Value(user.Id)
                        )
                    )
                    .InnerHits(ih => ih.From(0).Size(1).Name("UserWithReview"))
                )
            )
        )
    )
);

When I look at the generated query, I'm even more confused about what is happening:

Successful low level call on POST: /dev/doc/_search?typed_keys=true
# Audit trail of this API call:
 - [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:00.9806442
# Request:
{}

As you can see, the request is empty.
You haven't defined the nested query with all the properties needed; it's missing the Path property, which tells Elasticsearch which document field (i.e. path) to execute the query on. Looking at the rest of the query, it looks like this should be the Reviews property:

ISearchResponse<Facility> responses = await this._elasticClient.SearchAsync<Facility>(a => a
    .Query(q => q
        .Bool(b => b
            .Must(m => m
                .Nested(n => n
                    .Path(f => f.Reviews) // <-- missing
                    .Query(nq => nq
                        .Term(t => t
                            .Field(f => f.Reviews.First().UserId)
                            .Value(user.Id)
                        )
                    )
                    .InnerHits(ih => ih.From(0).Size(1).Name("UserWithReview"))
                )
            )
        )
    )
);
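For comparison, this is roughly the request body one would expect once the path is set, sketched as a Python dict so it can be printed as JSON. The field names `reviews` and `reviews.userId` are assumptions (they depend on how the client maps the `Reviews` and `UserId` properties), and `nested_review_query` is a hypothetical helper, not part of NEST.

```python
import json


def nested_review_query(user_id):
    # Build the search body for a nested term query with inner hits.
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "reviews",  # assumed field name for Reviews
                            "query": {
                                "term": {"reviews.userId": {"value": user_id}}
                            },
                            "inner_hits": {
                                "from": 0,
                                "size": 1,
                                "name": "UserWithReview",
                            },
                        }
                    }
                ]
            }
        }
    }


print(json.dumps(nested_review_query(42), indent=2))
```

If the serialized request shows up as `{}` again, comparing it against a body like this one is a quick way to see which part was dropped.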
copy_to elasticsearch 6 analyzers
I am using Elasticsearch 6 with NEST for .NET. I used the _all field in order to search the whole index, but since it is now deprecated I need a new solution. I have found the copy_to option, which is good enough. My question is: for reasons specific to my project, I used to define which analyzer the _all field should use (ngram) and which search analyzer it should use (whitespace). Now, since the copy_to field is not declared in the mappings, I am unable to define them. Any idea would be appreciated.

var res = client1.CreateIndex(INDEX_NAME, desc => desc
    .InitializeUsing(indexState)
    .Settings(x => x
        .Analysis(g => g
            .Tokenizers(t => t
                .NGram("ngram_tokenizer", y => y
                    .MinGram(3)
                    .MaxGram(7)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )))
            .Analyzers(o => o
                .Custom("ngram_analyzer", w => w.Tokenizer("ngram_tokenizer").Filters("lowercase"))
                .Whitespace("whitespace_analyzer")
                .Standard("standard_analyzer", e => e.MaxTokenLength(1111)))))
    .Mappings(ms => ms
        .Map<SampleClass>(m => m
            .AutoMap() //Still auto map exists if there are attributes on the class definition
            .Properties(ps => ps //Override auto map
                .Text(s => s
                    .Name(n => n.SampleString)
                    .CopyTo(am=>am.Field("searchallfield")))
                .Number(s => s
                    .Name(n => n.SampleInteger))
                .Date(s => s
                    .Name(n => n.SampleDateTime)
                    .Format("MM-DD-YY"))
            ))));
Apparently you can define the copy_to field in the mappings:

.Map<SampleClass>(m => m
    .AutoMap() //Still auto map exists if there are attributes on the class definition
    .Properties(ps => ps //Override auto map
        .Text(yy=>yy
            .Name("searchallfield")
            .SearchAnalyzer("whitespace_analyzer")
            .Analyzer("ngram_analyzer"))
        .Text(s => s
            .Name(n => n.SampleString)
            .CopyTo(am=>am.Field("searchallfield")))
        .Number(s => s
            .Name(n => n.SampleInteger))
        .Date(s => s
            .Name(n => n.SampleDateTime))
    ))
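As a rough picture of what that mapping amounts to on the Elasticsearch side, here is the `properties` fragment sketched as a Python dict. The source field name `sampleString` is an assumption about how the client serializes `SampleString`, and `undeclared_copy_targets` is a hypothetical helper added only to check that every copy_to target is explicitly mapped.

```python
import json

# Sketch of the "properties" section only; index settings are omitted.
properties = {
    "searchallfield": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "whitespace_analyzer",
    },
    "sampleString": {  # assumed serialized name of SampleString
        "type": "text",
        "copy_to": "searchallfield",
    },
}


def undeclared_copy_targets(props):
    # Return copy_to targets that have no explicit mapping of their own.
    return sorted(
        field["copy_to"]
        for field in props.values()
        if "copy_to" in field and field["copy_to"] not in props
    )


print(json.dumps(properties, indent=2))
print(undeclared_copy_targets(properties))  # []
```

The point is simply that the target field carries the analyzers, while the source field only names it in `copy_to`.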
Get product id from recurring profile Array
I am using an observer on place order for a recurring profile:

public function SubscribePlan($observer)
{
    $profileIds = Mage::getSingleton('checkout/session')->getLastRecurringProfileIds();
    if ($profileIds && is_array($profileIds)) {
        $collection = Mage::getModel('sales/recurring_profile')->getCollection()
            ->addFieldToFilter('profile_id', array('in' => $profileIds));
        $profiles = array();
        foreach ($collection as $profile) {
            $profiles[] = $profile;
        }
        echo "<pre>";
        print_r($profiles);
        echo "</pre>";
        die('dead');
    }
}

Printing the array gives me output like this:

[profile_id] => 53
[state] => active
[customer_id] => 10
[store_id] => 1
[method_code] => paypal_express
[created_at] => 2014-06-25 06:04:43
[updated_at] => 2014-06-25 06:04:44
[reference_id] => I-KJXWM42XC01K
[subscriber_name] =>
[start_datetime] => 2014-06-25 06:04:44
[internal_reference_id] => 53-4ba91ddd43b6d2d377378a5aba7f3908
[schedule_description] => One Year
[suspension_threshold] => 5
[bill_failed_later] => 0
[period_unit] => year
[period_frequency] => 1
[period_max_cycles] =>
[billing_amount] => 100.0000
[trial_period_unit] =>
[trial_period_frequency] =>
[trial_period_max_cycles] =>
[trial_billing_amount] =>
[currency_code] => USD
[shipping_amount] => 5.0000
[tax_amount] =>
[init_amount] =>
[init_may_fail] => 0
[order_info] => a:56:{s:9:"entity_id";s:2:"72";s:8:"store_id";s:1:"1";s:10:"created_at";s:19:"2014-06-25 06:03:28";s:10:"updated_at";s:19:"2014-06-25 
06:04:38";s:12:"converted_at";N;s:9:"is_active";s:1:"1";s:10:"is_virtual";s:1:"0";s:17:"is_multi_shipping";s:1:"0";s:11:"items_count";i:1;s:9:"items_qty";d:1;s:13:"orig_order_id";s:1:"0";s:18:"store_to_base_rate";s:6:"1.0000";s:19:"store_to_quote_rate";s:6:"1.0000";s:18:"base_currency_code";s:3:"USD";s:19:"store_currency_code";s:3:"USD";s:19:"quote_currency_code";s:3:"USD";s:11:"grand_total";d:0;s:16:"base_grand_total";d:0;s:15:"checkout_method";N;s:11:"customer_id";s:2:"10";s:21:"customer_tax_class_id";s:1:"3";s:17:"customer_group_id";s:1:"1";s:14:"customer_email";s:17:"krn#ocodewire.com";s:15:"customer_prefix";N;s:18:"customer_firstname";s:5:"Karan";s:19:"customer_middlename";N;s:17:"customer_lastname";s:8:"Adhikari";s:15:"customer_suffix";N;s:12:"customer_dob";N;s:13:"customer_note";N;s:20:"customer_note_notify";s:1:"1";s:17:"customer_is_guest";s:1:"0";s:9:"remote_ip";s:14:"112.196.22.234";s:16:"applied_rule_ids";s:0:"";s:17:"reserved_order_id";s:9:"100000057";s:13:"password_hash";N;s:11:"coupon_code";N;s:20:"global_currency_code";s:3:"USD";s:19:"base_to_global_rate";s:6:"1.0000";s:18:"base_to_quote_rate";s:6:"1.0000";s:15:"customer_taxvat";N;s:15:"customer_gender";s:4:"male";s:8:"subtotal";d:0;s:13:"base_subtotal";d:0;s:22:"subtotal_with_discount";d:0;s:27:"base_subtotal_with_discount";d:0;s:10:"is_changed";s:1:"1";s:17:"trigger_recollect";i:0;s:17:"ext_shipping_info";N;s:15:"gift_message_id";N;s:13:"is_persistent";s:1:"0";s:15:"x_forwarded_for";N;s:17:"virtual_items_qty";i:0;s:15:"taxes_for_items";a:0:{}s:14:"can_apply_msrp";b:0;s:21:"totals_collected_flag";b:1;} [order_item_info] => a:74:{s:7:"item_id";s:2:"88";s:8:"quote_id";s:2:"72";s:10:"created_at";s:19:"2014-06-25 06:03:28";s:10:"updated_at";s:19:"2014-06-25 06:03:28";s:10:"product_id";s:1:"6";s:8:"store_id";s:1:"1";s:14:"parent_item_id";N;s:10:"is_virtual";s:1:"0";s:3:"sku";s:10:"one-yr-sub";s:4:"name";s:8:"One 
Year";s:11:"description";N;s:16:"applied_rule_ids";s:0:"";s:15:"additional_data";N;s:13:"free_shipping";s:1:"0";s:14:"is_qty_decimal";s:1:"0";s:11:"no_discount";s:1:"0";s:6:"weight";s:6:"0.0010";s:3:"qty";i:1;s:5:"price";d:100;s:10:"base_price";d:100;s:12:"custom_price";N;s:16:"discount_percent";i:0;s:15:"discount_amount";i:0;s:20:"base_discount_amount";i:0;s:11:"tax_percent";i:0;s:10:"tax_amount";i:0;s:15:"base_tax_amount";i:0;s:9:"row_total";d:100;s:14:"base_row_total";d:100;s:23:"row_total_with_discount";s:6:"0.0000";s:10:"row_weight";d:0.001000000000000000020816681711721685132943093776702880859375;s:12:"product_type";s:6:"simple";s:24:"base_tax_before_discount";N;s:19:"tax_before_discount";N;s:21:"original_custom_price";N;s:12:"redirect_url";N;s:9:"base_cost";N;s:14:"price_incl_tax";d:100;s:19:"base_price_incl_tax";d:100;s:18:"row_total_incl_tax";d:100;s:23:"base_row_total_incl_tax";d:100;s:17:"hidden_tax_amount";i:0;s:22:"base_hidden_tax_amount";i:0;s:15:"gift_message_id";N;s:20:"weee_tax_disposition";i:0;s:24:"weee_tax_row_disposition";i:0;s:25:"base_weee_tax_disposition";i:0;s:29:"base_weee_tax_row_disposition";i:0;s:16:"weee_tax_applied";s:6:"a:0:{}";s:23:"weee_tax_applied_amount";i:0;s:27:"weee_tax_applied_row_amount";i:0;s:28:"base_weee_tax_applied_amount";i:0;s:30:"base_weee_tax_applied_row_amnt";N;s:11:"qty_options";a:0:{}s:12:"tax_class_id";s:1:"0";s:12:"is_recurring";s:1:"1";s:9:"has_error";b:0;s:10:"is_nominal";b:1;s:22:"base_calculation_price";d:100;s:17:"calculation_price";d:100;s:15:"converted_price";d:100;s:19:"base_original_price";d:100;s:14:"taxable_amount";d:100;s:19:"base_taxable_amount";d:100;s:17:"is_price_incl_tax";b:0;s:14:"original_price";d:100;s:32:"base_weee_tax_applied_row_amount";i:0;s:25:"discount_tax_compensation";i:0;s:20:"base_shipping_amount";d:5;s:15:"shipping_amount";d:5;s:17:"nominal_row_total";d:105;s:22:"base_nominal_row_total";d:105;s:21:"nominal_total_details";a:0:{}s:15:"info_buyRequest";s:225:"a:4:{s:4:"uenc";s:124:"aHR0
cDovL2J3Y211bHRpbWVkaWEuY29tL0UvZXh0ZW5zaW9udGVzdC9pbmRleC5waHAvbXVsdGl2ZW5kb3IvdmVuZG9ycHJvZHVjdHMvc3Vic2NyaXB0aW9uLw,,";s:7:"product";s:1:"6";s:8:"form_key";s:16:"be2eDRXu1MC7OXfK";s:3:"qty";i:1;}";} [billing_address_info] => a:97:{s:10:"address_id";s:3:"145";s:8:"quote_id";s:2:"72";s:10:"created_at";s:19:"2014-06-25 06:03:28";s:10:"updated_at";s:19:"2014-06-25 06:04:38";s:11:"customer_id";s:2:"10";s:20:"save_in_address_book";s:1:"0";s:19:"customer_address_id";N;s:12:"address_type";s:7:"billing";s:5:"email";s:23:"sukhwantc#ocodewire.com";s:6:"prefix";N;s:9:"firstname";s:4:"test";s:10:"middlename";N;s:8:"lastname";s:4:"test";s:6:"suffix";N;s:7:"company";s:9:"OcodeTest";s:6:"street";s:9:"1 Main St";s:4:"city";s:8:"San Jose";s:6:"region";s:10:"California";s:9:"region_id";s:2:"12";s:8:"postcode";s:5:"95131";s:10:"country_id";s:2:"US";s:9:"telephone";s:10:"9085656554";s:3:"fax";N;s:15:"same_as_billing";s:1:"1";s:13:"free_shipping";i:0;s:22:"collect_shipping_rates";s:1:"0";s:15:"shipping_method";N;s:20:"shipping_description";N;s:6:"weight";i:0;s:8:"subtotal";i:0;s:13:"base_subtotal";i:0;s:22:"subtotal_with_discount";s:6:"0.0000";s:27:"base_subtotal_with_discount";s:6:"0.0000";s:10:"tax_amount";i:0;s:15:"base_tax_amount";i:0;s:15:"shipping_amount";i:0;s:20:"base_shipping_amount";i:0;s:19:"shipping_tax_amount";i:0;s:24:"base_shipping_tax_amount";i:0;s:15:"discount_amount";i:0;s:20:"base_discount_amount";i:0;s:11:"grand_total";d:0;s:16:"base_grand_total";d:0;s:14:"customer_notes";N;s:13:"applied_taxes";s:6:"a:0:{}";s:20:"discount_description";N;s:24:"shipping_discount_amount";N;s:29:"base_shipping_discount_amount";N;s:17:"subtotal_incl_tax";i:0;s:28:"base_subtotal_total_incl_tax";N;s:17:"hidden_tax_amount";N;s:22:"base_hidden_tax_amount";N;s:26:"shipping_hidden_tax_amount";N;s:29:"base_shipping_hidden_tax_amnt";N;s:17:"shipping_incl_tax";i:0;s:22:"base_shipping_incl_tax";i:0;s:6:"vat_id";N;s:12:"vat_is_valid";N;s:14:"vat_request_id";N;s:16:"vat_request_date";N;s:19:"vat_re
quest_success";N;s:15:"gift_message_id";N;s:24:"should_ignore_validation";b:1;s:16:"extra_tax_amount";i:0;s:21:"base_extra_tax_amount";i:0;s:28:"recurring_initial_fee_amount";i:0;s:33:"base_recurring_initial_fee_amount";i:0;s:16:"cached_items_all";a:0:{}s:20:"cached_items_nominal";a:0:{}s:23:"cached_items_nonnominal";a:0:{}s:30:"recurring_trial_payment_amount";i:0;s:35:"base_recurring_trial_payment_amount";i:0;s:23:"nominal_subtotal_amount";i:0;s:28:"base_nominal_subtotal_amount";i:0;s:9:"total_qty";i:0;s:19:"base_virtual_amount";i:0;s:14:"virtual_amount";i:0;s:22:"base_subtotal_incl_tax";i:0;s:23:"nominal_discount_amount";i:0;s:28:"base_nominal_discount_amount";i:0;s:16:"applied_rule_ids";s:0:"";s:19:"nominal_weee_amount";i:0;s:24:"base_nominal_weee_amount";i:0;s:18:"nominal_tax_amount";i:0;s:23:"base_nominal_tax_amount";i:0;s:11:"msrp_amount";i:0;s:16:"base_msrp_amount";i:0;s:19:"freeshipping_amount";i:0;s:24:"base_freeshipping_amount";i:0;s:11:"weee_amount";i:0;s:16:"base_weee_amount";i:0;s:18:"free_method_weight";i:0;s:19:"tax_shipping_amount";i:0;s:24:"base_tax_shipping_amount";i:0;s:16:"shipping_taxable";i:0;s:21:"base_shipping_taxable";i:0;s:20:"is_shipping_incl_tax";b:0;} [shipping_address_info] => a:103:{s:10:"address_id";s:3:"146";s:8:"quote_id";s:2:"72";s:10:"created_at";s:19:"2014-06-25 06:03:28";s:10:"updated_at";s:19:"2014-06-25 06:04:38";s:11:"customer_id";s:2:"10";s:20:"save_in_address_book";s:1:"0";s:19:"customer_address_id";N;s:12:"address_type";s:8:"shipping";s:5:"email";s:23:"sukhwantc#ocodewire.com";s:6:"prefix";N;s:9:"firstname";s:4:"test";s:10:"middlename";N;s:8:"lastname";s:4:"test";s:6:"suffix";N;s:7:"company";s:9:"OcodeTest";s:6:"street";s:9:"1 Main St";s:4:"city";s:8:"San 
Jose";s:6:"region";s:10:"California";s:9:"region_id";s:2:"12";s:8:"postcode";s:5:"95131";s:10:"country_id";s:2:"US";s:9:"telephone";s:10:"9085656554";s:3:"fax";N;s:15:"same_as_billing";s:1:"0";s:13:"free_shipping";i:0;s:22:"collect_shipping_rates";b:1;s:15:"shipping_method";s:17:"flatrate_flatrate";s:20:"shipping_description";s:17:"Flat Rate - Fixed";s:6:"weight";i:0;s:8:"subtotal";i:0;s:13:"base_subtotal";i:0;s:22:"subtotal_with_discount";s:6:"0.0000";s:27:"base_subtotal_with_discount";s:6:"0.0000";s:10:"tax_amount";i:0;s:15:"base_tax_amount";i:0;s:15:"shipping_amount";i:0;s:20:"base_shipping_amount";i:0;s:19:"shipping_tax_amount";i:0;s:24:"base_shipping_tax_amount";i:0;s:15:"discount_amount";i:0;s:20:"base_discount_amount";i:0;s:11:"grand_total";d:0;s:16:"base_grand_total";d:0;s:14:"customer_notes";N;s:13:"applied_taxes";s:6:"a:0:{}";s:20:"discount_description";s:0:"";s:24:"shipping_discount_amount";i:0;s:29:"base_shipping_discount_amount";i:0;s:17:"subtotal_incl_tax";i:0;s:28:"base_subtotal_total_incl_tax";N;s:17:"hidden_tax_amount";i:0;s:22:"base_hidden_tax_amount";i:0;s:26:"shipping_hidden_tax_amount";i:0;s:29:"base_shipping_hidden_tax_amnt";N;s:17:"shipping_incl_tax";i:0;s:22:"base_shipping_incl_tax";i:0;s:6:"vat_id";N;s:12:"vat_is_valid";N;s:14:"vat_request_id";N;s:16:"vat_request_date";N;s:19:"vat_request_success";N;s:15:"gift_message_id";N;s:24:"should_ignore_validation";b:1;s:16:"extra_tax_amount";i:0;s:21:"base_extra_tax_amount";i:0;s:28:"recurring_initial_fee_amount";i:0;s:33:"base_recurring_initial_fee_amount";i:0;s:16:"cached_items_all";a:0:{}s:20:"cached_items_nominal";a:0:{}s:23:"cached_items_nonnominal";a:0:{}s:30:"recurring_trial_payment_amount";i:0;s:35:"base_recurring_trial_payment_amount";i:0;s:23:"nominal_subtotal_amount";i:0;s:28:"base_nominal_subtotal_amount";i:0;s:9:"total_qty";i:0;s:19:"base_virtual_amount";i:0;s:14:"virtual_amount";i:0;s:22:"base_subtotal_incl_tax";i:0;s:15:"rounding_deltas";a:0:{}s:23:"nominal_discount_amount";i:0;s:28:"b
ase_nominal_discount_amount";i:0;s:16:"cart_fixed_rules";a:0:{}s:16:"applied_rule_ids";s:0:"";s:19:"nominal_weee_amount";i:0;s:24:"base_nominal_weee_amount";i:0;s:19:"applied_taxes_reset";b:1;s:18:"nominal_tax_amount";i:0;s:23:"base_nominal_tax_amount";i:0;s:31:"base_shipping_hidden_tax_amount";i:0;s:18:"free_method_weight";i:0;s:8:"item_qty";i:1;s:11:"region_code";s:2:"CA";s:11:"msrp_amount";i:0;s:16:"base_msrp_amount";i:0;s:19:"freeshipping_amount";i:0;s:24:"base_freeshipping_amount";i:0;s:11:"weee_amount";i:0;s:16:"base_weee_amount";i:0;s:19:"tax_shipping_amount";i:0;s:24:"base_tax_shipping_amount";i:0;s:16:"shipping_taxable";i:0;s:21:"base_shipping_taxable";i:0;s:20:"is_shipping_incl_tax";b:0;} [profile_vendor_info] => [additional_info] => Now I want to Pick Product id From That array ,How can i Do So?
I've spent some time digging through a few print_r($array) outputs and it is never much fun. Have you tried anything in particular that hasn't worked? Be sure to let us know what you've tried and what you've come up with.

I think I may be able to help you, though. I'm pretty sure that this will get you what you want:

$profiles[order_item_info][product_id]

When I tried this out in the terminal, I had to put the names in quotes like the following, or I got an "undefined constant" error for my test values. So if it throws an error like that, try this:

$profiles['order_item_info']['product_id']

I've been away from PHP for a while, but I'm pretty sure you can get what you need by digging into this multi-dimensional array.
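One caveat, judging from the dump in the question: `order_item_info` is printed as a PHP-serialized string (it starts with `a:74:{...`), so in PHP it would probably need to go through `unserialize()` before `product_id` can be read from it, and `$profiles` is a list with one entry per profile. For inspecting such a dump outside PHP, a small Python sketch can pull the value out with a regular expression. The `extract_product_id` helper is hypothetical and is a shortcut for this one key, not a general parser for the serialization format.

```python
import re

# Abridged start of the order_item_info value shown in the question.
ORDER_ITEM_INFO = (
    'a:74:{s:7:"item_id";s:2:"88";s:8:"quote_id";s:2:"72";'
    's:10:"created_at";s:19:"2014-06-25 06:03:28";'
    's:10:"updated_at";s:19:"2014-06-25 06:03:28";'
    's:10:"product_id";s:1:"6";s:8:"store_id";s:1:"1";'
)


def extract_product_id(serialized):
    """Return product_id from a PHP-serialized string, or None if absent."""
    # The value may be stored as a string (s:1:"6") or as an integer (i:6).
    match = re.search(
        r's:10:"product_id";(?:s:\d+:"(\d+)"|i:(\d+));', serialized
    )
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


print(extract_product_id(ORDER_ITEM_INFO))  # 6
```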
How to show additional option for size, brand in magento admin panel order description page
I am importing some orders programmatically, but I have a problem: how do I show additional attributes such as size, brand, etc. after the product name under "Items Ordered"? Please help me out; I have searched a lot for this but have not found an appropriate answer. http://inchoo.net/ecommerce/magento/programming-magento/programatically-create-customer-and-order-in-magento-with-full-blown-one-page-checkout-process-under-the-hood/

I actually have this information about the product:

Array
(
    [pr_id] => 30250
    [sku_id] => 10086663
    [dmn] => (blank)
    [sector] => (blank)
    [cat] => (blank)
    [brnd] => Mimi Holliday
    [supp] => Damaris Ltd
    [desc] => Black lace plunge bra with silk
    [clr] => Black
    [size] => 32B
    [st] => Shipped
    [cn_cause] =>
    [ret_cause] =>
    [sl_amt] => 16.99
    [l_s_ex_vt] => 14.46
    [l_s_ex_vt_a_vh] => 14.46
    [l_p_ex_vt] => 6.87
    [l_cpn] => 0
    [l_shp_cst] => 1.63
    [mg_b_ds] => 7.59
    [mg_b_ds_prct] => 0.52489626556017
    [mg_a_ds] => 7.59
    [mg_a_ds_prct] => 0.52489626556017
)

Size and brand are present here, and I want to show them.
Normalize British and American English for Elasticsearch
Is there a best practice for normalizing British and American English in Elasticsearch? Using a Synonym Token Filter requires an incredibly long configuration file. There are actually several thousand differently spelled words in UK and US English and it's almost impossible to find a really comprehensive list of words. Here's a list of almost 2,000 words, but it's far from complete. Preferably, I'd like to create an ES Analyzer/Filter with rules to transform US to UK English. Maybe that's the better approach, but I don't know where to start: which type of filters do I need for that? It doesn't have to cover everything; it should merely normalize most search terms, e.g. "grey" - "gray", "colour" - "color", "center" - "centre", etc.
Here's the approach I went for after fiddling around a while. It's a combination of basic rules, "fixes", and synonyms:

First, apply a char_filter to enforce a set of basic spelling rules. It's not 100% correct, but it does the job pretty well:

"char_filter": {
    "en_char_filter": {
        "type": "mapping",
        "mappings": [
            # fixes
            "aerie=>axerie", "aeroplane=>airplane", "aloe=>aloxe", "canoe=>canoxe", "coerce=>coxerce", "poem=>poxem", "prise=>prixse",
            # whole words
            "armour=>armor", "behaviour=>behavior", "centre=>center", "colour=>color", "clamour=>clamor", "draught=>draft", "endeavour=>endeavor", "favour=>favor", "flavour=>flavor", "harbour=>harbor", "honour=>honor", "humour=>humor", "labour=>labor", "litre=>liter", "metre=>meter", "mould=>mold", "neighbour=>neighbor", "plough=>plow", "saviour=>savior", "savour=>savor",
            # generic transformations
            "ae=>e", "ction=>xion", "disc=>disk", "gramme=>gram", "isable=>izable", "isation=>ization", "ise=>ize", "ising=>izing", "ll=>l", "oe=>e", "ogue=>og", "sation=>zation", "yse=>yze", "ysing=>yzing"
        ]
    }
}

The "fixes" entries are there to prevent incorrect application of other rules. E.g. "prise=>prixse" prevents "prise" from getting changed into "prize", which has a different meaning. You may need to adapt this according to your own needs.

Next, include a synonym filter for catching the most frequently used exceptions:

"en_synonym_filter": {
    "type": "synonym",
    "synonyms": EN_SYNONYMS
}

Here's our list of synonyms that includes the most important keywords for our use case. 
You may wish to adapt this list to your needs: EN_SYNONYMS = ( "accolade, prize => award", "accoutrement => accouterment", "aching, pain => hurt", "acw, anticlockwise, counterclockwise, counter-clockwise => ccw", "adaptor => adapter", "advocate, attorney, barrister, procurator, solicitor => lawyer", "ageing => aging", "agendas, agendum => agenda", "almanack => almanac", "aluminium => aluminum", "america, united states, usa", "amphitheatre => amphitheater", "anti-aliased, anti-aliasing => antialiased", "arbour => arbor", "ardour => ardor", "arse => ass", "artefact => artifact", "aubergine => eggplant", "automobile, motorcar => car", "axe => ax", "bannister => banister", "barbecue => bbq", "battleaxe => battleax", "baulk => balk", "beetroot => beet", "biassed => biased", "biassing => biasing", "biscuit => cookie", "black american, african american, afro-american, negro", "bobsleigh => bobsled", "bonnet => hood", "bulb, electric bulb, light bulb, lightbulb", "burned => burnt", "bussines, bussiness => business", "business man, business people, businessman", "business woman, business people, businesswoman", "bussing => busing", "cactus, cactuses => cacti", "calibre => caliber", "candour => candor", "candy floss, candyfloss, cotton candy", "car park, parking area, parking ground, parking lot, parking-lot, parking place, parking", "carburettor => carburetor", "castor => caster", "cataloguing => cataloging", "catboat, sailboat, sailing boat", "champion, gainer, victor, win, winner => victory", "chat => talk", "chequebook => checkbook", "chequer => checker", "chequerboard => checkerboard", "chequered => checkered", "christmas tree ball, christmas tree ball ornament, christmas ball ornament, christmas bauble", "christmas, x-mas => xmas", "cinema => movies", "clangour => clangor", "clarinettist => clarinetist", "conditioning => conditioner", "conference => meeting", "coriander => cilantro", "corporate => company", "cosmos, universe => outer space", "cosy, cosiness => cozy", 
"criminal => crime", "curriculums => curricula", "cypher => cipher", "daddy, father, pa, papa => dad", "defence => defense", "defenceless => defenseless", "demeanour => demeanor", "departure platform, station platform, train platform, train station", "dishrag => dish cloth", "dishtowel, dishcloth => dish towel", "doughnut => donut", "downspout => drainpipe", "drugstore => pharmacy", "e-mail => email", "enamoured => enamored", "england => britain", "english => british", "epaulette => epaulet", "exercise, excercise, training, workout => fitness", "expressway, motorway, highway => freeway", "facebook => facebook, social media", "fanny => buttocks", "fanny pack => bum bag", "farmyard => barnyard", "faucet => tap", "fervour => fervor", "fibre => fiber", "fibreglass => fiberglass", "flashlight => torch", "flautist => flutist", "flier => flyer", "flower fly, hoverfly, syrphid fly, syrphus fly", "foot-walk, sidewalk, sideway => pavement", "football, soccer", "forums => fora", "fourth => 4", "freshman => fresher", "chips, fries, french fries", "gaol => jail", "gaolbird => jailbird", "gaolbreak => jailbreak", "gaoler => jailer", "garbage, rubbish => trash", "gasoline => petrol", "gases, gasses", "gauge => gage", "gauged => gaged", "gauging => gaging", "gipsy, gipsies, gypsies => gypsy", "glamour => glamor", "glueing => gluing", "gravesite, sepulchre, sepulture => sepulcher", "grey => gray", "greyish => grayish", "greyness => grayness", "groyne => groin", "gryphon, griffon => griffin", "hand shake, shake hands, shaking hands, handshake", "haulier => hauler", "hobo, homeless, tramp => bum", "new year, new year's eve, hogmanay, silvester, sylvester", "holiday => vacation", "holidaymaker, holiday-maker, vacationer, vacationist => tourist", "homosexual, fag => gay", "inbox, letterbox, outbox, postbox => mailbox", "independence day, 4th of july, fourth of july, july 4th, july 4, 4th july, july fourth, forth of july, 4 july, fourth july, 4th july", "infant, suckling, toddler => 
baby", "infeasible => unfeasible", "inquire, inquiry => enquire", "insure => ensure", "internet, website => www", "jelly => jam", "jewelery, jewellery => jewelry", "jogging => running", "journey => travel", "judgement => judgment", "kerb => curb", "kiwifruit => kiwi", "laborer => worker", "lacklustre => lackluster", "ladybeetle, ladybird, ladybug => ladybird beetle", "larrikin, scalawag, rascal, scallywag => naughty boy", "leaf => leaves", "licence, licenced, licencing => license", "liquorice => licorice", "lorry => truck", "loupe, magnifier, magnifying, magnifying glass, magnifying lens, zoom", "louvred => louvered", "louvres => louver", "lustre => luster", "mail => post", "mailman => postman", "marriage, married, marry, marrying, wedding => wed", "mayonaise => mayo", "meagre => meager", "misdemeanour => misdemeanor", "mitre => miter", "mom, momma, mummy, mother => mum", "moonlight => moon light", "moult => molt", "moustache, moustached => mustache", "nappy => diaper", "nightlife => night life", "normalcy => normality", "octopus => kraken", "odour => odor", "odourless => odorless", "offence => offense", "omelette => omelet", "# fix torres del paine", "paine => painee", "pajamas => pyjamas", "pantyhose => tights", "parenthesis, parentheses => bracket", "parliament => congress", "parlour => parlor", "persnickety => pernickety", "philtre => filter", "phoney => phony", "popsicle => iced-lolly", "porch => veranda", "pretence => pretense", "pullover, jumper => sweater", "pyjama => pajama", "railway => railroad", "rancour => rancor", "rappel => abseil", "row house, serial house, terrace house, terraced house, terraced housing, town house", "rigour => rigor", "rumour => rumor", "sabre => saber", "saltpetre => saltpeter", "sanitarium => sanatorium", "santa, santa claus, st nicholas, st nicholas day", "sceptic, sceptical, scepticism, sceptics => skeptic", "sceptre => scepter", "shaikh, sheikh => sheik", "shivaree => charivari", "silverware, flatware => cutlery", 
"simultaneous => simultanous", "sleigh => sled", "smoulder, smouldering => smolder", "sombre => somber", "speciality => specialty", "spectre => specter", "splendour => splendor", "spoilt => spoiled", "street => road", "streetcar, tramway, tram => trolley-car", "succour => succor", "sulphate, sulphide, sulphur, sulphurous, sulfurous => sulfur", "super hero, superhero => hero", "surname => last name", "sweets => candy", "syphon => siphon", "syphoning => siphoning", "tack, thumb-tack, thumbtack => drawing pin", "tailpipe => exhaust pipe", "taleban => taliban", "teenager => teen", "television => tv", "thank you, thanks", "theatre => theater", "tickbox => checkbox", "ticked => checked", "timetable => schedule", "tinned => canned", "titbit => tidbit", "toffee => taffy", "tonne => ton", "transportation => transport", "trapezium => trapezoid", "trousers => pants", "tumour => tumor", "twitter => twitter, social media", "tyre => tire", "tyres => tires", "undershirt => singlet", "university => college", "upmarket => upscale", "valour => valor", "vapour => vapor", "vigour => vigor", "waggon => wagon", "windscreen, windshield => front shield", "world championship, world cup, worldcup", "worshipper, worshipping => worshiping", "yoghourt, yoghurt => yogurt", "zip, zip code, postal code, postcode", "zucchini => courgette" )
I realize that this answer departs somewhat from the OP's initial question, but if you just want to normalize American vs. British English spelling variants, you can look here for a manageably sized list (~1,700 replacements): http://www.tysto.com/uk-us-spelling-list.html. I'm sure there are others out there too that you could use to create a consolidated master list.

Apart from spelling variation, you must be very careful not to blithely replace words in isolation with their (assumed!) counterparts in American English. I would advise against all but the most solid of lexical replacements. E.g., I can't see anything bad happening from this one

"anticlockwise, counterclockwise, counter-clockwise => counter-clockwise"

but this one

"hobo, homeless, tramp => bum"

would index "A homeless man" => *"A bum man", which is nonsense. (Not to mention that hobos, the homeless and "tramps" are quite distinct -- http://knowledgenuts.com/2014/11/26/the-difference-between-hobos-tramps-and-bums/.)

In summary, apart from spelling variation, the American vs. British dialect divide is complicated and cannot be reduced to simple list look-ups.

P.S. If you really want to do this right (i.e., account for grammatical context, etc.), you would probably need a context-sensitive paraphrase model to "translate" British to American English (or the inverse, depending on your needs) before it ever hits the ES index. This could be done (with sufficient parallel data) using an off-the-shelf statistical translation model, or maybe even some custom, in-house software that uses natural language parsing, POS tagging, chunking, etc.
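Coming back to the char_filter answer above, the way its "fixes" entries shield a word from the generic rules can be illustrated with a toy re-implementation in Python. This is a sketch only: it assumes that a mapping char filter scans the text once and, at each position, applies the longest matching key (which is how I understand the documented behaviour), and `apply_mappings` is a hypothetical helper, not an Elasticsearch API.

```python
def apply_mappings(text, mappings):
    # Parse "key=>value" rules and try longer keys first at every position.
    rules = dict(rule.split("=>") for rule in mappings)
    keys = sorted(rules, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule starts here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


RULES = ["prise=>prixse", "colour=>color", "ise=>ize"]

print(apply_mappings("organise", RULES))       # organize
print(apply_mappings("colour", RULES))         # color
print(apply_mappings("prise", RULES))          # prixse
print(apply_mappings("prise", ["ise=>ize"]))   # prize
```

Because the whole-word key "prise" is longer than "ise", it wins at position 0 and the generic "ise=>ize" rule never sees the word. Without the fix, "prise" would be rewritten to "prize".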