How can I get a value from a field depending on language in Elasticsearch?

I have an ES index similar to the mapping below:
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"note": {
"type : text "
}
In the note field I have many values in Arabic, many in English, and some in other languages.
How can I get the Arabic values only?
Note: I have many millions of documents.

You can use multi-fields with language-specific analyzers to implement multilingual search:
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"en": {
"type": "text",
"analyzer": "english"
},
"ar": {
"type": "text",
"analyzer": "arabic"
}
}
}
}
}
}
For more detail, refer to the Medium blog post on multi-language search using Elasticsearch.
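With this mapping in place, a query can target a language-specific sub-field directly. A minimal sketch, assuming the question's note field is mapped with an ar sub-field in the same way (the index name and the Arabic term below are placeholders):

GET my-index/_search
{
  "query": {
    "match": {
      "note.ar": "الأخبار"
    }
  }
}

Only the arabic-analyzed sub-field is consulted here, so matching reflects Arabic stemming and stop words rather than the default analyzer.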

Related

Elasticsearch Not Returning Expected Results For Singular vs Plural

I'm currently having an issue where a particular search term returns no hits, and it's a bit perplexing to me:
Term: navy flower
The query ends up looking like:
(name: "navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
No hits.
If I change the term to navy flowers, I get 3 hits.
The mappings I currently have set up on the index are as follows:
{
"mappings": {
"_doc": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
I must be missing something obvious for the match not to work on the singular vs. plural form of the word.
As per your index mapping, you have not specified any analyzer, which means Elasticsearch uses the standard analyzer by default. The standard analyzer does not do stemming, since by default it has only 2 token filters:
Lower Case Token Filter
Stop Token Filter (disabled by default)
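You can verify this with the _analyze API; the standard analyzer leaves the plural form untouched:

GET _analyze
{
  "analyzer": "standard",
  "text": "navy flowers"
}

This returns the tokens navy and flowers with no stemming applied, which is why a query for flower cannot match a document containing flowers.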
To support your use case, you need the stemmer token filter in your analyzer. You can create a custom analyzer and configure it on the required field:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text",
"analyzer": "my_analyzer"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
After this, you can search with the query below:
GET test/_search?q=(name:"navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
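To confirm the stemming behaviour, you can run the custom analyzer through the _analyze API on the same index (a quick check, assuming the index is named test as above):

GET test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "navy flowers"
}

Both navy flower and navy flowers are reduced to the same stems at index and search time, so either form now matches.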

How to exclude fields from being indexed in ElasticSearch?

I am trying to utilize Elasticsearch to store large sets of data. Most of the data will be searchable; however, some fields are there just so the data is stored and returned upon request.
Here is my mapping:
{
"mappings": {
"properties": {
"amenities": {
"type": "completion"
},
"summary": {
"type": "text"
},
"street_number": {
"type": "text"
},
"street_name": {
"type": "text"
},
"street_suffix": {
"type": "text"
},
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"state_or_province": {
"type": "text"
},
"postal_code": {
"type": "text"
},
"mlsid": {
"type": "text"
},
"source_id": {
"type": "text"
},
"status": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"subtype": {
"type": "keyword"
},
"year_built": {
"type": "short"
},
"community": {
"type": "keyword"
},
"elementary_school": {
"type": "keyword"
},
"middle_school": {
"type": "keyword"
},
"jr_high_school": {
"type": "keyword"
},
"high_school": {
"type": "keyword"
},
"area_size": {
"type": "double"
},
"lot_size": {
"type": "double"
},
"bathrooms": {
"type": "double"
},
"bedrooms": {
"type": "double"
},
"listed_at": {
"type": "date"
},
"price": {
"type": "double"
},
"sold_at": {
"type": "date"
},
"sold_for": {
"type": "double"
},
"total_photos": {
"type": "short"
},
"formatted_addressLine": {
"type": "text"
},
"formatted_address": {
"type": "text"
},
"location": {
"type": "geo_point"
},
"price_changes": {
"type": "object"
},
"fields": {
"type": "object"
},
"deleted_at": {
"type": "date"
},
"is_available": {
"type": "boolean"
},
"is_unable_to_find_coordinates": {
"type": "boolean"
},
"source": {
"type": "keyword"
}
}
}
}
The fields and price_changes properties are there in case the user wants to read that info, but that info should not be searchable or indexed. fields holds a large list of key-value pairs, whereas price_changes holds multiple objects of the same type.
Currently, when I attempt to bulk create records, I get a Limit of total fields [1000] has been exceeded error. I am guessing this happens because every key-value pair in the fields collection is considered a separate field in Elasticsearch.
How can I store the fields and price_changes objects as non-searchable data, without indexing them or counting them toward the field count?
You can use the enabled setting at field level to store the fields without indexing them.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
"price_changes": {
"type": "object",
"enabled": false
}
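Applied to your mapping, both catch-all properties can be disabled the same way. Their contents are still stored in _source and returned with the document, but nothing inside them is parsed, indexed, or counted toward the total-fields limit:

"fields": {
  "type": "object",
  "enabled": false
},
"price_changes": {
  "type": "object",
  "enabled": false
}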
NOTE: Are you able to create an index using the mapping you gave in the question? It gives me a syntax error (duplicate key) at the "type" field; I think you are missing a closing bracket for the "city" field.

Elasticsearch - dynamic mapping with multi-field support

Is it possible to add new fields with multi-field support dynamically?
My index has properties that will only be known at indexing time, so these fields will be added through dynamic mapping.
But, when a new field is added dynamically, I need it to be mapped as text and with three sub-fields: keyword, date (if it fits with dynamic_date_formats) and long.
With these three sub-fields I will be able to search and aggregate many queries with maximum performance.
I know I could pre-map my index with these "dynamic fields" as a nested field with key and value properties, so that the value property carries the three sub-fields. But I don't want a nested key/value field, because it is not very fast when aggregating over a lot of documents.
I found it.
Dynamic templates are the answer.
Very simple :)
{
"mappings": {
"doc": {
"dynamic_templates": [
{
"objs": {
"match_mapping_type": "object",
"mapping": {
"type": "{dynamic_type}"
}
}
},
{
"attrs": {
"match_mapping_type": "*",
"mapping": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"long": {
"type": "long",
"ignore_malformed": true
},
"double": {
"type": "double",
"ignore_malformed": true
},
"date": {
"type": "date",
"format": "dd/MM/yyyy||dd/MM/yyyy HH:mm:ss||dd/MM/yyyy HH:mm",
"ignore_malformed": true
}
}
}
}
}
],
"dynamic": "strict",
"properties": {
"fixed": {
"properties": {
"aaa": {
"type": "text"
},
"bbb": {
"type": "long"
},
"ccc": {
"type": "date",
"format": "dd/MM/yyyy"
}
}
},
"dyn": {
"dynamic": true,
"properties": {
}
}
}
}
}
}
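To illustrate how the template kicks in: indexing a document with a previously unseen property under dyn creates a text field plus the raw/long/double/date sub-fields, so you can aggregate on the keyword variant right away. A sketch (the index name test and the color field are hypothetical; this assumes a pre-7.x cluster, since the mapping still uses the doc type):

PUT test/doc/1
{
  "fixed": { "aaa": "hello", "bbb": 1, "ccc": "01/01/2020" },
  "dyn": { "color": "red" }
}

GET test/_search
{
  "size": 0,
  "aggs": {
    "by_color": {
      "terms": { "field": "dyn.color.raw" }
    }
  }
}

Because the root mapping is dynamic: strict, only fields under dyn are accepted dynamically; "red" fails the long/double/date sub-fields silently thanks to ignore_malformed, leaving the text and raw variants usable.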

Elasticsearch for multiple language support

I am using elasticsearch 5.1.1.
I have a requirement wherein I want to index data in multiple languages.
I used the following mapping:
PUT http://localhost:9200/movies
{
"mappings": {
"title": {
"properties": {
"title": {
"type": "string",
"fields": {
"de": {
"type": "string",
"analyzer": "german"
},
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
},
"es": {
"type": "string",
"analyzer": "spanish"
}
}
}
}
}
}
}
When I try to insert some data:
POST http://localhost:9200/movies/movie/1
{
"title.en" :"abc123"
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[IQ7CUTp][127.0.0.1:9300][indices:data/write/index[p]]"
}
],
"type": "illegal_argument_exception",
"reason": "[title] is defined as an object in mapping [movie] but this name is already used for a field in other types"
},
"status": 400
}
Can someone point out what is wrong here?
The problem is that the title field is declared as a string, and you're trying to access the title.en sub-field as you would if title were an object field. You need to change your mapping like this instead, and then it will work:
{
"mappings": {
"title": {
"properties": {
"title": {
"type": "object", <--- change this
"properties": { <--- and this
"de": {
"type": "string",
"analyzer": "german"
},
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
},
"es": {
"type": "string",
"analyzer": "spanish"
}
}
}
}
}
}
}
As I can see, you have defined title both as a mapping type and as a property; the error message points at exactly this clash.
From your POST call, I see that the document type is movie. Do you really want title as a mapping type?
You should define the mapping for title inside the movie type.
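Putting the two answers together, a corrected 5.x setup might look like this sketch (only two languages shown; the others follow the same pattern):

PUT http://localhost:9200/movies
{
  "mappings": {
    "movie": {
      "properties": {
        "title": {
          "type": "object",
          "properties": {
            "en": { "type": "string", "analyzer": "english" },
            "de": { "type": "string", "analyzer": "german" }
          }
        }
      }
    }
  }
}

POST http://localhost:9200/movies/movie/1
{
  "title": { "en": "abc123" }
}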

Elasticsearch.js analyzer error using custom analyzer

I am using the latest version of elasticsearch.js and trying to create a custom path analyzer when indexing and creating the mapping for some posts.
The goal is to create keywords out of each segment of the path; as a start, however, I am simply trying to get the analyzer working.
Here is the elasticsearch.js create_mapped_index.js; you can see the custom analyzer near the top of the file:
var client = require('./connection.js');
client.indices.create({
index: "wcm-posts",
body: {
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"tokenizer": "wcm_path_tokenizer",
"type": "custom"
}
},
"tokenizer": {
"wcm_path_tokenizer": {
"type": "pattern",
"pattern": "/"
}
}
}
},
"mappings": {
"post": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"titles": {
"type": "object",
"properties": {
"main": { "type": "string" },
"subtitle": { "type": "string" },
"alternate": { "type": "string" },
"concise": { "type": "string" },
"seo": { "type": "string" }
}
},
"tags": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string" }
},
},
"main_taxonomies": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string", "index": "not_analyzed" },
"path": { "type": "string", "index": "wcm_path_analyzer" }
},
},
"categories": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"name": { "type": "string", "index": "not_analyzed" },
"slug": { "type": "string", "index": "not_analyzed" },
"path": { "type": "string", "index": "wcm_path_analyzer" }
},
},
"content_elements": {
"dynamic": "true",
"type": "nested",
"properties": {
"content": { "type": "string" }
}
}
}
}
}
}
}, function (err, resp, respcode) {
console.log(err, resp, respcode);
});
If the path fields are set to "not_analyzed", or the index parameter is omitted, then the index, mapping, and insertion of posts all work.
As soon as I try to use the custom analyzer on the main_taxonomies and categories path fields, as shown in the JSON above, I get this error:
response: '{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"wrong value for index [wcm_path_analyzer] for field [path]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [post]: wrong value for index [wcm_path_analyzer] for field [path]","caused_by":{"type":"mapper_parsing_exception","reason":"wrong value for index [wcm_path_analyzer] for field [path]"}},"status":400}',
toString: [Function],
toJSON: [Function] } { error:
{ root_cause: [ [Object] ],
type: 'mapper_parsing_exception',
reason: 'Failed to parse mapping [post]: wrong value for index [wcm_path_analyzer] for field [path]',
caused_by:
{ type: 'mapper_parsing_exception',
reason: 'wrong value for index [wcm_path_analyzer] for field [path]' } },
status: 400 } 400
Here is an example of the two objects that need the custom analyzer on the path field. I pulled this example after inserting 15 posts into the Elasticsearch index without using the custom analyzer:
"main_taxonomies": [
{
"id": "123",
"type": "category",
"name": "News",
"slug": "news",
"path": "/News/"
}
],
"categories": [
{
"id": "157",
"name": "Local News",
"slug": "local-news",
"path": "/News/Local News/",
"main": true
},
Up to this point I had googled similar questions; most said that people forgot to put the analyzers in settings or to add the parameters to the body, which I believe I have done correctly.
I have also reviewed the elasticsearch.js documentation and tried:
client.indices.putSettings({})
but for this to work the index needs to already exist with the mappings, otherwise it throws a 'no indices found' error.
I'm not sure where to go from here; your suggestions are appreciated.
So the final analyzer is:
var client = require('./connection.js');
client.indices.create({
index: "wcm-posts",
body: {
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"type" : "pattern",
"lowercase": true,
"pattern": "/"
}
}
}
},
"mappings": {
"post": {
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"client_id": { "type": "string", "index": "not_analyzed" },
"license_id": { "type": "string", "index": "not_analyzed" },
"origin_id": { "type": "string" },
...
...
"origin_slug": { "type": "string" },
"main_taxonomies_path": { "type": "string", "analyzer": "wcm_path_analyzer", "search_analyzer": "standard" },
"categories_paths": { "type": "string", "analyzer": "wcm_path_analyzer", "search_analyzer": "standard" },
"search_tags": { "type": "string" },
// See the custom analyzer set here --------------------------^
I did determine that, at least for the path and pattern analyzers, complex nested fields or objects cannot be used; flattened fields set to "type": "string" were the only way to get this to work.
I ended up not needing a custom tokenizer, as the pattern analyzer is full-featured and already includes a tokenizer.
I chose the pattern analyzer because it breaks on the pattern, leaving individual terms, whereas the path_hierarchy tokenizer segments the path into progressively longer prefixes rather than individual terms (I hope I'm correct in saying this; I base it on the documentation).
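A quick way to see the difference is the _analyze API. A sketch using a path value from the documents above, against the pattern-based wcm_path_analyzer defined earlier:

GET wcm-posts/_analyze
{
  "analyzer": "wcm_path_analyzer",
  "text": "/News/Local News"
}

GET _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/News/Local News"
}

The pattern analyzer yields the standalone terms news and local news, while the path_hierarchy tokenizer yields the prefixes /News and /News/Local News.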
Hope this helps someone else!
Steve
So I got it working ... I think the JSON objects were too complex, or it was the change of adding the analyzer to the field mappings that did the trick.
First I flattened the fields out to:
"main_taxonomies_path": "/News/",
"categories_paths": [ "/News/Local/", "/Business/Local/" ],
"search_tags": [ "montreal-3","laval-4" ],
Then I updated the analyzer to:
"settings": {
"analysis": {
"analyzer": {
"wcm_path_analyzer": {
"tokenizer": "wcm_path_tokenizer",
"type": "custom"
}
},
"tokenizer": {
"wcm_path_tokenizer": {
"type": "pattern",
"pattern": "/",
"replacement": ","
}
}
}
},
Notice that the analyzer 'type' is set to custom.
Then, when mapping these flattened fields:
"main_taxonomies_path": { "type": "string", "analyzer": "wcm_path_analyzer" },
"categories_paths": { "type": "string", "analyzer": "wcm_path_analyzer" },
"search_tags": { "type": "string" },
which, when searching, yields these values for the fields:
"main_taxonomies_path": "/News/",
"categories_paths": [ "/News/Local News/", "/Business/Local Business/" ],
"search_tags": [ "montreal-2", "laval-3" ],
So the custom analyzer does what it was set to do in this situation.
I'm not sure if I could apply type object to the main_taxonomies_path and categories_paths, so I will play around with this and see.
I will be refining the pattern searches to format the results differently, but I'm happy to have this working.
For completeness, I will post my final custom pattern analyzer, mapping, and results once I've completed this.
Regards,
Steve
