Apply analyzer on Object fields - elasticsearch

I have this analyzer:
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
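As a sanity check, the settings above can be exercised directly with the _analyze API (a quick sketch, assuming they were applied to an index called myindex; on older Elasticsearch versions the parameters may need to be passed as query parameters instead):
POST myindex/_analyze
{
  "analyzer": "word_join_analyzer",
  "text": "foo-bar"
}
With the keyword tokenizer plus this word_delimiter configuration you would expect tokens along the lines of foo-bar, foo, bar and foobar, since preserve_original keeps the input and catenate_all adds the joined form.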
I apply it on this field:
@Field(type = FieldType.Object, analyzer = "word_join_analyzer")
private Description description;
And here is the Description class:
public class Description {
    @JsonProperty("localizedDescriptions")
    private Map<String, String> descriptions = new HashMap<>();
}
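Serialized, such an object becomes a JSON object with one key per language, which is why Elasticsearch maps each language code as its own subfield (the values here are just made-up examples):
{
  "description": {
    "localizedDescriptions": {
      "en": "A tasty cake",
      "fr": "Un gâteau savoureux",
      "it": "Una torta gustosa"
    }
  }
}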
This is the resulting Elasticsearch mapping for this field:
{
  "description": {
    "properties": {
      "localizedDescriptions": {
        "properties": {
          "en": {
            "type": "string"
          },
          "fr": {
            "type": "string"
          },
          "it": {
            "type": "string"
          }
        }
      }
    }
  }
}
As you can see, the analyzer is not applied at all. It works well with string fields, but I'm having a hard time doing the same with Object types. Any ideas?
Thanks!
EDIT: I tried to use a dynamic template mapping:
{
  "iam": {
    "properties": {
      "dynamic_templates": [
        {
          "localized_strings_values": {
            "path_match": "description.localizedDescriptions.*",
            "mapping": {
              "type": "string",
              "analyzer": "word_join_analyzer"
            }
          }
        }
      ]
    }
  }
}
But I have this error:
Expected map for property [fields] on field [dynamic_templates] but got a class java.lang.String
Why do I get this error?

Finally solved this. The dynamic_templates block has to sit at the type level, next to properties, not inside it; nesting it under properties is what triggered the error above. This is the correct mapping:
{
  "cake": {
    "dynamic_templates": [
      {
        "localized_descriptions": {
          "path_match": "description.localizedDescriptions.*",
          "mapping": {
            "type": "string",
            "analyzer": "word_join_analyzer"
          }
        }
      }
    ]
  }
}
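To double-check that the template is being applied, index a document and look at the mapping Elasticsearch generates (a sketch; myindex and the sample value are placeholders, the type name cake matches the mapping above):
PUT myindex/cake/1
{
  "description": {
    "localizedDescriptions": {
      "en": "chocolate-cake"
    }
  }
}
GET myindex/_mapping
Each localizedDescriptions.* field should now show "analyzer": "word_join_analyzer" instead of a bare string mapping.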

Related

Custom analyzer, use case: zip-code [Elasticsearch]

Let there be an index/type named customers/customer.
Each document in this set has a zip code property.
Basically, a zip code can look like:
String-String (e.g. 8907-1009)
String String (e.g. 211-20)
String (e.g. 30200)
I'd like to set up my index analyzer so that as many documents as possible can match. Currently, I do it like this:
PUT /customers/
{
  "mappings": {
    "customer": {
      "properties": {
        "zip-code": {
          "type": "string",
          "index": "not_analyzed"
        },
        ... some string properties ...
      }
    }
  }
}
When I search for a document, I use this request:
GET /customers/customer/_search
{
"query":{
"prefix":{
"zip-code":"211-20"
}
}
}
That works if you search for the exact value. But if, for instance, the zip code is "200 30", then searching for "200-30" will not return any results.
I'd like to configure my index analyzer so that I don't have this problem.
Can someone help me?
Thanks.
P.S. If you want more information, please let me know ;)
As soon as you want to find variations, you don't want to use not_analyzed.
Let's try this with a different mapping:
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_code": {
"tokenizer": "standard",
"filter": [ ]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_code"
}
}
}
}
}
We're using the standard tokenizer; strings will be broken up into tokens at whitespace and punctuation marks (including dashes). You can see the actual tokens if you run the following query:
POST zip/_analyze
{
"analyzer": "zip_code",
"text": ["8907-1009", "211-20", "30200"]
}
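For "8907-1009", for example, the response should contain the two tokens 8907 and 1009, roughly like this (positions and offsets trimmed):
{
  "tokens": [
    { "token": "8907" },
    { "token": "1009" }
  ]
}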
Add your examples:
POST zip/_doc
{
"zip": "8907-1009"
}
POST zip/_doc
{
"zip": "211-20"
}
POST zip/_doc
{
"zip": "30200"
}
Now the query seems to work fine:
GET zip/_search
{
"query": {
"match": {
"zip": "211-20"
}
}
}
This will also work if you just search for "211". However, this might be too lenient, since it will also find "20", "20-211", "211-10",...
What you probably want is a phrase search where all the tokens in your query need to be in the field and also in the right order:
GET zip/_search
{
"query": {
"match_phrase": {
"zip": "211"
}
}
}
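The phrase requirement is easier to see with the full value (a hedged example against the same index): a phrase query for "211-20" matches the "211-20" document because the tokens 211 and 20 appear in that order, whereas a phrase query for "20-211" would not match it.
GET zip/_search
{
  "query": {
    "match_phrase": {
      "zip": "211-20"
    }
  }
}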
Addition:
If the ZIP codes have a hierarchical meaning (if you have "211-20" you want this to be found when searching for "211", but not when searching for "20"), you can use the path_hierarchy tokenizer.
So changing the mapping to this:
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_code": {
"tokenizer": "zip_tokenizer",
"filter": [ ]
}
},
"tokenizer": {
"zip_tokenizer": {
"type": "path_hierarchy",
"delimiter": "-"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_code"
}
}
}
}
}
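Before searching, you can check what the hierarchical tokenizer emits (a quick sketch against the index created above):
POST zip/_analyze
{
  "analyzer": "zip_code",
  "text": "8907-1009"
}
This should produce the tokens 8907 and 8907-1009, but never 1009 on its own, which explains the search behaviour described below.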
Using the same 3 documents from above you can use the match query now:
GET zip/_search
{
"query": {
"match": {
"zip": "1009"
}
}
}
"1009" won't find anything, but "8907" or "8907-1009" will.
If you want to also find "1009", but with a lower score, you'll have to analyze the zip code with both variations I have shown (combine the 2 versions of the mapping):
PUT zip
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"zip_hierarchical": {
"tokenizer": "zip_tokenizer",
"filter": [ ]
},
"zip_standard": {
"tokenizer": "standard",
"filter": [ ]
}
},
"tokenizer": {
"zip_tokenizer": {
"type": "path_hierarchy",
"delimiter": "-"
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"zip": {
"type": "text",
"analyzer": "zip_standard",
"fields": {
"hierarchical": {
"type": "text",
"analyzer": "zip_hierarchical"
}
}
}
}
}
}
}
Add a document with the inverse order to properly test it:
POST zip/_doc
{
"zip": "1009-111"
}
Then search both fields, but boost the one with the hierarchical tokenizer by 3:
GET zip/_search
{
"query": {
"multi_match" : {
"query" : "1009",
"fields" : [ "zip", "zip.hierarchical^3" ]
}
}
}
Then you can see that "1009-111" has a much higher score than "8907-1009".

How to use nested mapping in language analyzer

I am currently working with language analyzers in Elasticsearch. I found that if we want to use an analyzer when searching documents, we need to define it in the mapping.
In my case, this works fine when a document contains a normal text field, but when I apply the same setting to a nested field, the analyzer is not applied.
This is the code for the language analyzer:
PUT checkmap
{
"settings": {
"analysis": {
"analyzer": {
"stemmerenglish": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_stemmer"
]
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
}
}
},
"mappings": {
"dd": {
"properties": {
"Courses": {
"type": "nested",
"properties": {
"Sname": {
"type": "text",
"analyzer": "stemmerenglish",
"search_analyzer": "stemmerenglish"
}
}
}
}
}
}
}
Please help me out with the above problem.
You have to use a nested query for the nested type. Use the following query:
GET checkmap/_search
{
"query": {
"nested": {
"path": "Courses",
"query": {
"match": {
"Courses.Sname": {
"query": "Jump"
}
}
}
}
}
}
Read more here
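For completeness, here is a minimal test document the query above would match, assuming the checkmap mapping from the question (the course name is made up):
PUT checkmap/dd/1
{
  "Courses": [
    {
      "Sname": "Jumping and landing drills"
    }
  ]
}
The english stemmer reduces both Jumping in the document and Jump in the query (after lowercasing) to the same stem, so the match succeeds.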

Searching in all fields, case insensitive, and not analyzed

In Elasticsearch, how can I define a dynamic default mapping for any field (the fields are not predefined) that is searchable with spaces and case-insensitive values?
For example, if i have two documents:
PUT myindex/mytype/1
{
"transaction": "test"
}
and
PUT myindex/mytype/2
{
"transaction": "test SPACE"
}
I'd like to perform the following queries:
Querying: "test", Expected result: "test"
Querying: "test space", Expected result "test SPACE"
I've tried to use:
PUT myindex
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"test":{
"properties":{
"title":{
"analyzer":"analyzer_keyword",
"type":"string"
}
}
}
}
}
But it gives me both documents as results when looking for "test".
Apparently there was a mistake in running my query.
Here's a solution I found to this problem, using a multi-field query:
#any field mapping - not analyzed and case insensitive
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
},
"mappings": {
"doc": {
"dynamic_templates": [
{ "notanalyzed": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"analyzer":"analyzer_keyword"
}
}
}
]
}
}
}
#index test data
POST /test_index/doc/_bulk
{"index":{"_id":3}}
{"name":"Company Solutions", "a" : "a1"}
{"index":{"_id":4}}
{"name":"Company", "a" : "a2"}
#search for the document with name "company" and a "a2"
POST /test_index/doc/_search
{
"query" : {
"filtered" : {
"filter": {
"and": {
"filters": [
{
"query": {
"match": {
"name": "company"
}
}
},
{
"query": {
"match": {
"a": "a2"
}
}
}
]
}
}
}
}
}
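Note that the filtered query and the and filter used above come from older Elasticsearch releases and have since been removed; on newer versions the same search can be written as a bool query (a sketch using the same fields):
POST /test_index/doc/_search
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "name": "company" } },
        { "match": { "a": "a2" } }
      ]
    }
  }
}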

How to exclude inherited object properties from mappings

I'm trying to setup a mapping for an object that looks like this:
class TestObject
{
public long TestID { get; set; }
[ElasticProperty(Type = FieldType.Object)]
public Dictionary<long, List<DateTime>> Items { get; set; }
}
I use the following mapping code (where Client is IElasticClient):
this.Client.Map<TestObject>(m => m.MapFromAttributes());
I get the following mapping result:
{
"mappings": {
"testobject": {
"properties": {
"items": {
"properties": {
"comparer": {
"type": "object"
},
"count": {
"type": "integer"
},
"item": {
"type": "date",
"format": "dateOptionalTime"
},
"keys": {
"properties": {
"count": {
"type": "integer"
}
}
},
"values": {
"properties": {
"count": {
"type": "integer"
}
}
}
}
},
"testID": {
"type": "long"
}
}
}
}
This becomes a problem when I want to do a search like this:
{
"query_string": {
"query": "[2015-06-03T00:00:00.000 TO 2015-06-05T23:59:59.999]",
"fields": [
"items.*"
]
}
}
This causes exceptions, which I guess are because the fields in the items object are not all of the same type. What is the proper mapping for searches of this type?
I was able to fix this by using the following mapping:
this.Client.Map<TestObject>(m => m.MapFromAttributes())
.Properties(p => p
.Object<Dictionary<long, List<DateTime>>>(o => o.Name("items")));
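With that, items should end up in the mapping as a plain object field, roughly like this (a sketch; the exact output depends on the NEST version), instead of having the dictionary's CLR properties (comparer, count, keys, values) mapped individually:
"items": {
  "type": "object"
}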

ElasticSearch Snowball Analyzer not working with nested query

I have created an index with the following mapping
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"anlayzer": "snowball"
},
"fr-FR": {
"type": "string",
"anlayzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
I have put one record using
PUT http://localhost:9200/test1/searchText/1
{
"catalogue_product": {
"id": "18437",
"long_desc": {
"translation": {
"en-GB": "C120 - circuit breaker - C120H - 4P - 125A - B curve",
"fr-FR": "Disjoncteur C120H 4P 125A courbe B 15000A"
}
}
}
}
Then if I do a search for the word breaker inside catalogue_product.long_desc.translation.en-GB, I get the added record:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breaker"
}
}
}
}
}
If I replace the word breaker with breakers, I don't get any records, in spite of the en-GB field having analyzer=snowball in the mapping:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breakers"
}
}
}
}
}
I am going crazy with this. Where am I going wrong?
I tried a new mapping with the english analyzer instead of snowball, but that did not work either :(
Any help is appreciated.
Dude, it's a typo. It's analyzer, not anlayzer:
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"analyzer": "snowball"
},
"fr-FR": {
"type": "string",
"analyzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
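To convince yourself the fix works, check what the snowball analyzer does with the plural form (a quick sketch; on older Elasticsearch versions the analyzer and text may need to be passed as query parameters to _analyze instead):
POST http://localhost:9200/test1/_analyze
{
  "analyzer": "snowball",
  "text": "breakers"
}
Both breaker and breakers should be reduced to the stem breaker, so the nested match query from the question now finds the document for either form.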
