ElasticSearch Snowball Analyzer not working with nested query - elasticsearch

I have created an index with the following mapping
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"anlayzer": "snowball"
},
"fr-FR": {
"type": "string",
"anlayzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
I have put one record using
PUT http://localhost:9200/test1/searchText/1
{
"catalogue_product": {
"id": "18437",
"long_desc": {
"translation": {
"en-GB": "C120 - circuit breaker - C120H - 4P - 125A - B curve",
"fr-FR": "Disjoncteur C120H 4P 125A courbe B 15000A"
}
}
}
}
Then, if I search for the word breaker inside catalogue_product.long_desc.translation.en-GB, I get the added record:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breaker"
}
}
}
}
}
If I replace the word breaker with breakers, I don't get any records, even though the en-GB field has analyzer=snowball in the mapping:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breakers"
}
}
}
}
}
I am going crazy with this. Where am I going wrong?
I tried a new mapping with analyzer as english instead of snowball, but that did not work either :(
Any help is appreciated

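Dude, it's a typo. It's analyzer, not anlayzer. Because the misspelled key isn't a recognized mapping parameter, the field most likely fell back to the default analyzer, so breakers was never stemmed to breaker. Here is the corrected mapping: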
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"analyzer": "snowball"
},
"fr-FR": {
"type": "string",
"analyzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
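One way to confirm which analyzer the field actually uses is the _analyze API. This is a minimal sketch, assuming a version whose _analyze endpoint accepts a JSON body (2.x or later, as far as I know; on 1.x the same field and text would go in the query string instead). Sent as the body of POST http://localhost:9200/test1/_analyze:

```json
{
  "field": "catalogue_product.long_desc.translation.en-GB",
  "text": "breakers"
}
```

If the snowball analyzer was picked up, the token that comes back should be the stem breaker; if you still see breakers, the field is using the default analyzer. Note that an existing index keeps its old mapping for this field, so the index has to be recreated and the record indexed again before the fix takes effect.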

Related

Elasticsearch : using fuzzy search to find abbreviations

I have indexed text articles that mention company names, such as apple and lemonade, and I am trying to search for these companies by their abbreviations, such as APPL and LMND. Fuzzy search returns other results: for example, searching for LMND returns land, which appears in the text, but never lemonade, whichever parameters I try.
First question
Is fuzzy search a suitable solution for this kind of search?
Second question
What would be good ranges of parameter values for my problem?
UPDATE
I have tried a synonym filter:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonyms_filter": {
"type": "synonym",
"synonyms": [
"apple,APPL",
"lemonade,LMND"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"transcript_data": {
"properties": {
"words": {
"type": "nested",
"properties": {
"word": {
"type": "text",
"search_analyzer":"synonym_analyzer"
}
}
}
}
}
}
}
}
For the search I used:
{
"_source": false,
"query": {
"nested": {
"path": "transcript_data.words",
"query": {
"match": {
"transcript_data.words.word": "lmnd"
}
}
}
}
}
but it's not working
I believe the best option for you is synonyms; they do exactly what you need.
I'll leave an example and the link to an article explaining some details.
PUT teste
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonyms_filter": {
"type": "synonym",
"synonyms": [
"apple,APPL",
"lemonade,LMND"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"transcript_data": {
"properties": {
"words": {
"type": "nested",
"properties": {
"word": {
"type": "text",
"analyzer":"synonym_analyzer"
}
}
}
}
}
}
}
}
POST teste/_bulk
{"index":{}}
{"transcript_data": {"words":{"word":"apple"}}}
GET teste/_search
{
"query": {
"nested": {
"path": "transcript_data.words",
"query": {
"match": {
"transcript_data.words.word": "appl"
}
}
}
}
}
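To check what the analyzer produces for an abbreviation, the _analyze API can be run against the index created above. A small sketch, assuming the teste index and synonym_analyzer exactly as defined in this answer, sent as the body of GET teste/_analyze:

```json
{
  "analyzer": "synonym_analyzer",
  "text": "LMND"
}
```

If the synonym rules are applied, the response should contain lemonade alongside lmnd. If only lmnd comes back, the filter is not part of the analyzer that the field uses.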

Query hashmap structure with elasticsearch

I have two questions about mapping and querying a Java HashMap in Elasticsearch.
Does this mapping make sense in Elasticsearch (is it the correct way to map a HashMap)?
{
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
Here is some example data:
{
"itemsMap": {
"2021-12-31": {
"itemVal1": 100.0,
"itemVal2": 150.0
},
"2021-11-30": {
"itemVal1": 200.0,
"itemVal2": 50.0
}
}
}
My queries don't seem to work. For example:
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-11-30"
}
}
]
}
}
}
}
}
Am I doing something wrong? How can I query such a structure? I have the possibility to change the mapping if it's necessary.
Thanks
TL;DR
With the way you are uploading your data, nothing is stored in key.
You end up with fields named 2021-11-30 ... and key stays empty.
If you have a limited number of dates (fewer than 1000), this is a viable option; otherwise this format is not viable in the long run.
If you don't want to change your docs, here is the query:
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "itemsMap.2021-12-31"
}
}
]
}
}
}
}
}
To understand
If you inspect the mapping by querying the index
GET /<index_name>/_mapping
you will see that the number of fields named after your dates keeps growing.
And in all your docs, itemsMap.key is going to be empty (this explains why my previous answer did not work).
A more viable option
Keep your mapping and update the shape of your docs.
They will look like this:
{
"itemsMap": [
{
"key": "2021-12-31",
"value": { "itemVal1": 100, "itemVal2": 150 }
},
{
"key": "2021-11-30",
"value": { "itemVal1": 200, "itemVal2": 50 }
}
]
}
DELETE /71525899
PUT /71525899/
{
"mappings": {
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2022-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-10-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-12-31"
}
}
]
}
}
}
}
}
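With key mapped as a date inside the nested objects, range queries become possible too. This is a sketch under the same assumptions as above (index 71525899 and the mapping from this answer); the date bounds are just example values, and inner_hits is added so the response shows which entries matched. Sent as the body of GET /71525899/_search:

```json
{
  "query": {
    "nested": {
      "path": "itemsMap",
      "query": {
        "range": {
          "itemsMap.key": {
            "gte": "2021-12-01",
            "lte": "2021-12-31"
          }
        }
      },
      "inner_hits": {}
    }
  }
}
```

This kind of query is not available with the original document shape, where each date is a field name and not a value.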

In Elasticsearch, how to move data from one field into another field

I have an index with mappings that look like this:
"mappings": {
"default": {
"_all": {
"enabled": false
},
"properties": {
"Foo": {
"properties": {
"Bar": {
"type": "keyword"
}
}
}
}
}
I am trying to change the mapping to introduce a sub-field of Bar, called Code, whilst migrating the string currently in Bar into Bar.Code. Here is the new mapping:
"mappings": {
"default": {
"_all": {
"enabled": false
},
"properties": {
"Foo": {
"properties": {
"Bar": {
"properties": {
"Code": {
"type": "keyword"
}
}
}
}
}
}
}
In order to do this, I think I need to do a _reindex and specify a pipeline. Is that correct? If so, how does my pipeline access the original data?
I have tried variations on the following code, but without success:
PUT _ingest/pipeline/transformFooBar
{
"processors": [
{
"set": {
"field": "Bar.Code",
"value": "{{_source.Bar}}"
}
}
]
}
POST _reindex
{
"source": {
"index": "foo_v1"
},
"dest": {
"index": "foo_v2",
"pipeline": "transformFooBar"
}
}
Ah, I almost had the syntax right. The _source is not required:
// Create a pipeline with a SET processor
PUT _ingest/pipeline/transformFooBar
{
"processors": [
{
"set": {
"field": "Bar.Code",
"value": "{{Bar}}"
}
}
]
}
// Reindex using the above pipeline
POST _reindex
{
"source": {
"index": "foo_v1"
},
"dest": {
"index": "foo_v2",
"pipeline": "transformFooBar"
}
}
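Before running the reindex, the pipeline can be tried out with the simulate endpoint. This is only a sketch: the sample document below is made up, so substitute one shaped like your real documents. Sent as the body of POST _ingest/pipeline/transformFooBar/_simulate:

```json
{
  "docs": [
    {
      "_source": {
        "Bar": "ABC123"
      }
    }
  ]
}
```

The response contains, for each sample, either the transformed document or the error raised by the processor, so you can check the result without touching foo_v2.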

Partially matches the requirement in elastic-search query

I am trying to retrieve data from Elasticsearch based on two conditions: it should match both the jarFileName and the dependentClassName. The query works fine for jarFileName, but it matches dependentClassName only partially.
This is the query I used.
{
"query": {
"bool": {
"must": [
{
"match": {
"dependencies.dependedntClass": "java/lang/String"
}
},
{
"match": {
"JarFileName": {
"query": "Client.jar"
}
}
}
]
}
}
}
The query fully matches the jarFileName, but for the dependentClassName it matches and returns documents containing any part of the value. For example, if I use java/lang/String, it returns any document that has java, lang, or String in its dependentClassName. I think it's because of the "/". How can I correct this?
EDIT
I used this mapping:
{
"classdata": {
"properties": {
"dependencies": {
"type": "object",
"properties": {
"dependedntClass": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
You can set index to not_analyzed on dependencies.dependedntClass so that the string is not broken up by the standard analyzer. If you are using ES 2.x, the mapping below should work fine.
PUT /your_index
{
"mappings": {
"your_type":{
"properties": {
"dependencies":{
"type": "string",
"fields": {
"dependedntClass":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Then, your query should also work fine.
EDIT (if dependencies field is of nested type)
If your dependencies field is of nested or array type, change the mapping like this:
PUT /your_index
{
"mappings": {
"your_type":{
"properties": {
"dependencies":{
"type": "nested",
"properties": {
"dependedntClass":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
And the query should be changed as follows:
GET /your_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "dependencies",
"query": {
"match": {
"dependencies.dependedntClass": "java/lang/String"
}
}
}
},
{
"match": {
"JarFileName": {
"query": "Client.jar"
}
}
}
]
}
}
}
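To see why the analyzed field matched partial values, you can ask Elasticsearch how the standard analyzer tokenizes the class name. A minimal sketch, assuming a version whose _analyze endpoint accepts a JSON body (ES 2.x or later, as far as I know), sent as the body of GET /_analyze:

```json
{
  "analyzer": "standard",
  "text": "java/lang/String"
}
```

This should return three separate tokens (java, lang and string), which is why a match query on an analyzed field hits documents containing any one of them. With not_analyzed the whole value is indexed as a single term, so only the full string matches.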

Elasticsearch Aggregation - Unable to perform aggregation to object

I have a mapping with an inner object as follows:
{
"mappings": {
"_all": {
"enabled": false
},
"properties": {
"foo": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"address": {
"type": "object",
"properties": {
"address": {
"type": "string"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
When I try the following aggregation it does not return any data:
post data:*/foo/_search?search_type=count
{
"query": {
"match_all": {}
},
"aggs": {
"unique": {
"cardinality": {
"field": "address.city"
}
}
}
}
When I use city or address.city as the field, the aggregation returns zero, but when I use foo.address.city I get the correct response from Elasticsearch. This also affects Kibana's behavior.
Any ideas why this is happening? I saw there is a mapping refactoring that might affect this. I use Elasticsearch version 1.7.1.
To add to this: if I use the relative path in a search query as follows, it works normally:
"query": {
"filtered": {
"filter": {
"term": {
"address.city": "london"
}
}
}
}
Seems it's this same issue.
This is seen when the type name and a field name are the same.
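For reference, this is the aggregation from the question written with the full path, which is the form the asker reports as returning the correct count on 1.7.1. It is sent as the body of the same _search?search_type=count request:

```json
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "foo.address.city"
      }
    }
  }
}
```

Here the leading foo is the type name, which happens to collide with a field name in this mapping.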
