Apply analyzer on Object fields - elasticsearch

I have this analyzer:
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
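As a sanity check, the settings above can be exercised directly with the _analyze API (a quick sketch, assuming they were applied to an index called myindex; on older Elasticsearch versions the parameters may need to be passed as query parameters instead):
POST myindex/_analyze
{
  "analyzer": "word_join_analyzer",
  "text": "foo-bar"
}
With the keyword tokenizer plus this word_delimiter configuration you would expect tokens along the lines of foo-bar, foo, bar and foobar, since preserve_original keeps the input and catenate_all adds the joined form.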
I apply it on this field:
@Field(type = FieldType.Object, analyzer = "word_join_analyzer")
private Description description;
And here is the Description class:
public class Description {
    @JsonProperty("localizedDescriptions")
    private Map<String, String> descriptions = new HashMap<>();
}
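Serialized, such an object becomes a JSON object with one key per language, which is why Elasticsearch maps each language code as its own subfield (the values here are just made-up examples):
{
  "description": {
    "localizedDescriptions": {
      "en": "A tasty cake",
      "fr": "Un gâteau savoureux",
      "it": "Una torta gustosa"
    }
  }
}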
This is the resulting Elasticsearch mapping for this field:
{
  "description": {
    "properties": {
      "localizedDescriptions": {
        "properties": {
          "en": {
            "type": "string"
          },
          "fr": {
            "type": "string"
          },
          "it": {
            "type": "string"
          }
        }
      }
    }
  }
}
As you can see, the analyzer is not applied at all. It works well with string fields, but I'm having a hard time doing the same with Object types. Any ideas?
Thanks!
EDIT: I tried to use a dynamic template mapping:
{
  "iam": {
    "properties": {
      "dynamic_templates": [
        {
          "localized_strings_values": {
            "path_match": "description.localizedDescriptions.*",
            "mapping": {
              "type": "string",
              "analyzer": "word_join_analyzer"
            }
          }
        }
      ]
    }
  }
}
But I have this error:
Expected map for property [fields] on field [dynamic_templates] but got a class java.lang.String
Why do I get this error?

Finally solved this. The dynamic_templates block has to sit at the type level, next to properties, not inside it; nesting it under properties is what triggered the error above. This is the correct mapping:
{
  "cake": {
    "dynamic_templates": [
      {
        "localized_descriptions": {
          "path_match": "description.localizedDescriptions.*",
          "mapping": {
            "type": "string",
            "analyzer": "word_join_analyzer"
          }
        }
      }
    ]
  }
}
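To double-check that the template is being applied, index a document and look at the mapping Elasticsearch generates (a sketch; myindex and the sample value are placeholders, the type name cake matches the mapping above):
PUT myindex/cake/1
{
  "description": {
    "localizedDescriptions": {
      "en": "chocolate-cake"
    }
  }
}
GET myindex/_mapping
Each localizedDescriptions.* field should now show "analyzer": "word_join_analyzer" instead of a bare string mapping.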

Related

Custom analyzer, use case: zip-code [Elasticsearch]

Let there be an index/type named customers/customer.
Each document in this set has a zip code property.
Basically, a zip code can look like:
String-String (e.g. 8907-1009)
String String (e.g. 211-20)
String (e.g. 30200)
I'd like to set up my index analyzer so that as many documents as possible can match. Currently, I do it like this:
PUT /customers/
{
  "mappings": {
    "customer": {
      "properties": {
        "zip-code": {
          "type": "string",
          "index": "not_analyzed"
        },
        ... some string properties ...
      }
    }
  }
}
When I search for a document, I use this request:
GET /customers/customer/_search
{
"query":{
"prefix":{
"zip-code":"211-20"
}
}
}
That works if you search for the exact value. But if, for instance, the zip code is "200 30", then searching for "200-30" will not return any results.
I'd like to configure my index analyzer so that I don't have this problem.
Can someone help me?
Thanks.
P.S. If you want more information, please let me know ;)
As soon as you want to find variations, you don't want to use not_analyzed.
Let's try this with a different mapping:
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_code": {
"tokenizer": "standard",
"filter": [ ]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_code"
}
}
}
}
}
We're using the standard tokenizer; strings will be broken up into tokens at whitespace and punctuation marks (including dashes). You can see the actual tokens if you run the following query:
POST zip/_analyze
{
"analyzer": "zip_code",
"text": ["8907-1009", "211-20", "30200"]
}
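For "8907-1009", for example, the response should contain the two tokens 8907 and 1009, roughly like this (positions and offsets trimmed):
{
  "tokens": [
    { "token": "8907" },
    { "token": "1009" }
  ]
}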
Add your examples:
POST zip/_doc
{
"zip": "8907-1009"
}
POST zip/_doc
{
"zip": "211-20"
}
POST zip/_doc
{
"zip": "30200"
}
Now the query seems to work fine:
GET zip/_search
{
"query": {
"match": {
"zip": "211-20"
}
}
}
This will also work if you just search for "211". However, this might be too lenient, since it will also find "20", "20-211", "211-10",...
What you probably want is a phrase search where all the tokens in your query need to be in the field and also in the right order:
GET zip/_search
{
"query": {
"match_phrase": {
"zip": "211"
}
}
}
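The phrase requirement is easier to see with the full value (a hedged example against the same index): a phrase query for "211-20" matches the "211-20" document because the tokens 211 and 20 appear in that order, whereas a phrase query for "20-211" would not match it.
GET zip/_search
{
  "query": {
    "match_phrase": {
      "zip": "211-20"
    }
  }
}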
Addition:
If the ZIP codes have a hierarchical meaning (if you have "211-20" you want this to be found when searching for "211", but not when searching for "20"), you can use the path_hierarchy tokenizer.
So changing the mapping to this:
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_code": {
"tokenizer": "zip_tokenizer",
"filter": [ ]
}
},
"tokenizer": {
"zip_tokenizer": {
"type": "path_hierarchy",
"delimiter": "-"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_code"
}
}
}
}
}
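Before searching, you can check what the hierarchical tokenizer emits (a quick sketch against the index created above):
POST zip/_analyze
{
  "analyzer": "zip_code",
  "text": "8907-1009"
}
This should produce the tokens 8907 and 8907-1009, but never 1009 on its own, which explains the search behaviour described below.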
Using the same 3 documents from above you can use the match query now:
GET zip/_search
{
"query": {
"match": {
"zip": "1009"
}
}
}
"1009" won't find anything, but "8907" or "8907-1009" will.
If you want to also find "1009", but with a lower score, you'll have to analyze the zip code with both variations I have shown (combine the 2 versions of the mapping):
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_hierarchical": {
"tokenizer": "zip_tokenizer",
"filter": [ ]
},
"zip_standard": {
"tokenizer": "standard",
"filter": [ ]
}
},
"tokenizer": {
"zip_tokenizer": {
"type": "path_hierarchy",
"delimiter": "-"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_standard",
"fields": {
"hierarchical": {
"type": "text",
"analyzer": "zip_hierarchical"
}
}
}
}
}
}
}
Add a document with the inverse order to properly test it:
POST zip/_doc
{
"zip": "1009-111"
}
Then search both fields, but boost the one with the hierarchical tokenizer by 3:
GET zip/_search
{
"query": {
"multi_match" : {
"query" : "1009",
"fields" : [ "zip", "zip.hierarchical^3" ]
}
}
}
Then you can see that "1009-111" has a much higher score than "8907-1009".

How to use nested mapping in language analyzer

I am currently working with language analyzers in Elasticsearch. I found that if we want to use an analyzer when searching documents, we need to define it in the mapping.
In my case, this works fine when a document contains a normal text field, but when I apply the same setting to a nested field, the analyzer is not applied.
This is the code for the language analyzer:
PUT checkmap
{
"settings": {
"analysis": {
"analyzer": {
"stemmerenglish": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_stemmer"
]
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
}
}
},
"mappings": {
"dd": {
"properties": {
"Courses": {
"type": "nested",
"properties": {
"Sname": {
"type": "text",
"analyzer": "stemmerenglish",
"search_analyzer": "stemmerenglish"
}
}
}
}
}
}
}
Please help me out with the above problem.
You have to use a nested query for the nested type. Use the following query:
GET checkmap/_search
{
"query": {
"nested": {
"path": "Courses",
"query": {
"match": {
"Courses.Sname": {
"query": "Jump"
}
}
}
}
}
}
Read more here
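For completeness, here is a minimal test document the query above would match, assuming the checkmap mapping from the question (the course name is made up):
PUT checkmap/dd/1
{
  "Courses": [
    {
      "Sname": "Jumping and landing drills"
    }
  ]
}
The english stemmer reduces both Jumping in the document and Jump in the query (after lowercasing) to the same stem, so the match succeeds.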

Searching in all fields, case insensitive, and not analyzed

In Elasticsearch, how can I define a dynamic default mapping for any field (the fields are not predefined) that is searchable with spaces and case-insensitive values?
For example, if i have two documents:
PUT myindex/mytype/1
{
"transaction": "test"
}
and
PUT myindex/mytype/2
{
"transaction": "test SPACE"
}
I'd like to perform the following queries:
Querying: "test", Expected result: "test"
Querying: "test space", Expected result "test SPACE"
I've tried to use:
PUT myindex
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"test":{
"properties":{
"title":{
"analyzer":"analyzer_keyword",
"type":"string"
}
}
}
}
}
But it gives me both documents as results when looking for "test".
Apparently there was a mistake in running my query.
Here's a solution I found to this problem, using a multi-field query:
#any field mapping - not analyzed and case insensitive
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
},
"mappings": {
"doc": {
"dynamic_templates": [
{ "notanalyzed": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"analyzer":"analyzer_keyword"
}
}
}
]
}
}
}
#index test data
POST /test_index/doc/_bulk
{"index":{"_id":3}}
{"name":"Company Solutions", "a" : "a1"}
{"index":{"_id":4}}
{"name":"Company", "a" : "a2"}
#search for the document with name "company" and a "a2"
POST /test_index/doc/_search
{
"query" : {
"filtered" : {
"filter": {
"and": {
"filters": [
{
"query": {
"match": {
"name": "company"
}
}
},
{
"query": {
"match": {
"a": "a2"
}
}
}
]
}
}
}
}
}
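Note that the filtered query and the and filter used above come from older Elasticsearch releases and have since been removed; on newer versions the same search can be written as a bool query (a sketch using the same fields):
POST /test_index/doc/_search
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "name": "company" } },
        { "match": { "a": "a2" } }
      ]
    }
  }
}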

How to exclude inherited object properties from mappings

I'm trying to setup a mapping for an object that looks like this:
class TestObject
{
public long TestID { get; set; }
[ElasticProperty(Type = FieldType.Object)]
public Dictionary<long, List<DateTime>> Items { get; set; }
}
I use the following mapping code (where Client is IElasticClient):
this.Client.Map<TestObject>(m => m.MapFromAttributes());
I get the following mapping result:
{
"mappings": {
"testobject": {
"properties": {
"items": {
"properties": {
"comparer": {
"type": "object"
},
"count": {
"type": "integer"
},
"item": {
"type": "date",
"format": "dateOptionalTime"
},
"keys": {
"properties": {
"count": {
"type": "integer"
}
}
},
"values": {
"properties": {
"count": {
"type": "integer"
}
}
}
}
},
"testID": {
"type": "long"
}
}
}
}
This becomes a problem when I want to do a search like this:
{
"query_string": {
"query": "[2015-06-03T00:00:00.000 TO 2015-06-05T23:59:59.999]",
"fields": [
"items.*"
]
}
}
This causes exceptions, which I guess are because the fields in the items object are not all of the same type. What is the proper mapping for searches of this type?
I was able to fix this by using the following mapping:
this.Client.Map<TestObject>(m => m.MapFromAttributes())
.Properties(p => p
.Object<Dictionary<long, List<DateTime>>>(o => o.Name("items")));
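With that, items should end up in the mapping as a plain object field, roughly like this (a sketch; the exact output depends on the NEST version), instead of having the dictionary's CLR properties (comparer, count, keys, values) mapped individually:
"items": {
  "type": "object"
}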

ElasticSearch Snowball Analyzer not working with nested query

I have created an index with the following mapping
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"anlayzer": "snowball"
},
"fr-FR": {
"type": "string",
"anlayzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
I have put one record using
PUT http://localhost:9200/test1/searchText/1
{
"catalogue_product": {
"id": "18437",
"long_desc": {
"translation": {
"en-GB": "C120 - circuit breaker - C120H - 4P - 125A - B curve",
"fr-FR": "Disjoncteur C120H 4P 125A courbe B 15000A"
}
}
}
}
Then if I do a search for the word breaker inside catalogue_product.long_desc.translation.en-GB, I get the added record:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breaker"
}
}
}
}
}
If I replace the word breaker with breakers, I don't get any records, in spite of the en-GB field having analyzer=snowball in the mapping:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breakers"
}
}
}
}
}
I am going crazy with this. Where am I going wrong?
I tried a new mapping with the english analyzer instead of snowball, but that did not work either :(
Any help is appreciated.
Dude, it's a typo. It's analyzer, not anlayzer:
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"analyzer": "snowball"
},
"fr-FR": {
"type": "string",
"analyzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
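To convince yourself the fix works, check what the snowball analyzer does with the plural form (a quick sketch; on older Elasticsearch versions the analyzer and text may need to be passed as query parameters to _analyze instead):
POST http://localhost:9200/test1/_analyze
{
  "analyzer": "snowball",
  "text": "breakers"
}
Both breaker and breakers should be reduced to the stem breaker, so the nested match query from the question now finds the document for either form.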
