Elasticsearch Highlights document randomly - elasticsearch

When i run a search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query":"Rose wa",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
then following documents come in matches
"_index" : "product",
"_type" : "_doc",
"_id" : "52486",
"_score" : 19.770897,
"_source" : {",
"category_code" : "personalcare",
"name" : "Nivea Rosewater Face Wash (100 Ml)",
"category_name" : "Personal Care "
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "120830",
"_score" : 17.775726,
"_source" : {
"category_code" : "beverages",
"name" : "3 Roses 500G",
"category_name" : "Beverages"
},
"highlight" : {
"name" : [
"3 <em>Roses</em> 500G"
]
}
},
why some document have highlighted fields while other don't?
and how do i ensure that highlight is always present for matched documents

Related

Skip duplicates on field in a Elasticsearch search result

Is it possible to remove duplicates on a given field?
For example the following query:
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"_source": [
"name_admin",
"parent_sku",
"sku"
],
"size": 2
}
is retrieving
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central30603",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816401",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central156578",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816395",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
}
]
I'd like to skip duplicates on parent_sku so I only have one result per parent_sku like it's possible with suggestion by doing something like "skip_duplicates": true.
I know I cloud achieve this with an aggregation but I'd like to stick with a search, as my query is a bit more complicated and as I'm using the scroll API which doesn't work with aggregations.
Field collapsing should help here
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"collapse" : {
"field" : "parent_sku",
"inner_hits": {
"name": "parent",
"size": 1
}
},
"_source": false,
"size": 2
}
The above query will return one document par parent_sku.

Bool Filter not showing just the filtered data in Elastic Search

I have an index "tag_nested" which has data of following type :
{
"jobid": 1,
"table_name": "table_A",
"Tags": [
{
"TagType": "WorkType",
"Tag": "ETL"
},
{
"TagType": "Subject Area",
"Tag": "Telecom"
}
]
}
When I fire the query to filter data on "Tag" and "TagType" by firing following query :
POST /tag_nested/_search
{
"query": {
"bool": {
"must": {"match_all": {}},
"filter": [
{"term": {
"Tags.Tag.keyword": "ETL"
}},
{"term": {
"Tags.TagType.keyword": "WorkType"
}}
]
}
}
}
It gives me the following output. The problem I am facing is while the above query filters documents which doesn't have filtered data BUT it shows all the "Tags" of that document instead of just the filter one
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
},
{
"TagType" : "Subject Area",
"Tag" : "Telecom"
}
]
}
}
Instead of above result I want my output to be like :
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
}
]
}
}
Already answered here, here and here.
TL;DR you'll need to make your Tags field of type nested, resync your index & use inner_hits to only fetch the applicable tag group.

Elasticsearch neglecting special characters

word search in elasticsearch is working fine, but it seems to neglect all special characters. For example, i have this data (123) apple and 123 pear, but when i query "(123)", i expect "(123) apple" to be the first that appear instead of "123 pear". I have tried to change tokeniser from standard tokenizer to whitespace tokenizer, but still not working. Kindly advice. Thanks!
Data:
(123) apple
123 pear
Query: "(123)"
Expected:
(123) apple
123 pear
Actual result:
123 pear
(123) apple
I tried with whitespace tokenizer, it worked
PUT /index25
{
"mappings": {
"properties": {
"message":{
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
Data:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 1.0,
"_source" : {
"message" : "123 pear"
}
},
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 1.0,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "(123)"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 0.47000363,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "123"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 0.9808292,
"_source" : {
"message" : "123 pear"
}
}
]

Simple way to find which one are in the same company with me?

There is index have field like below, it saves who in which company and which position is
{
"createtime" : 1562844632272,
"post" : "director",
"personId" : 30007346088,
"comId" : 20010774891
}
now want to find the partners of someone, that is which person is in the same company. Now my implementation is
first find the person's related companies(at most 500)
{
"query": { "term": { "personId": 30007346088 } },
"sort": [ { "createtime": "desc" } ],
"_source": ["comId"],
"size":500
}
then find these companies' related persons and exclude the current person and remove duplicate partner(similarly at most 500 partners)
{
"query": {
"bool": {
"must": [{ "terms": { "comId": [20010774891,...] } } ],
"must_not": [ {"term":{"personId":30007346088}} ]
}
},
"aggs" : {
"personId" : {
"terms" : {
"field" : "personId",
"size": 500
}
}
},
"size":0
}
Obviously it's a little complicated, if could exist some more simple way to implement it?
It can work if data is stored in below format.
A unique document for each person , with document id same as person id and company stored as array
POST indexperson/_doc/1
{
"createtime": 1562844632272,
"personId": 1,
"company": [
{
"id": 100,
"post": "director"
},
{
"id": 100,
"post": "director"
}
]
}
Data:
[
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 2,
"company" : [
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 4,
"company" : [
{
"id" : 104,
"post" : "director"
}
]
}
}
]
Query:
Use (terms look up)[https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html]. Terms look up takes doc id as parameter
GET indexperson/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"company.id": {
"index": "indexperson",
"id": "1", --> get all docs in indexperson which match with company id
"path": "company.id"
}
}
}
],
"must_not": [
{
"term": {
"personId": {
"value": 2
}
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
}
]

Search in nested object

I'm having trouble making a query on elasticsearch 7.3
I create an index as this:
PUT myindex
{
"mappings": {
"properties": {
"files": {
"type": "nested"
}
}
}
}
After I create three documents:
PUT myindex/_doc/1
{
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
"files" : [
{
"filename" : "firstfilename.exe",
"datetime" : 111111111
},
{
"filename" : "secondfilename.exe",
"datetime" : 111111144
}
]
}
PUT myindex/_doc/2
{
"SHA256" : "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "thirdfilename.exe",
"datetime" : 111111133
},
{
"filename" : "fourthfilename.exe",
"datetime" : 111111122
}
]
}
PUT myindex/_doc/3
{
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "fifthfilename.exe",
"datetime" : 111111155
}
]
}
How can I get the last two files based on the datetime (ids: 1 and 3)?
I would SHA256 of the last two DATETIME ordered by DESC..
I did dozens of tests but none went well...
I don't write the code I tried because I'm really on the high seas ...
I would a result like this or similar:
{
"SHA256": [
"94ee05933....a2af6a22cc2",
"565e04933....a2af6a22c5a"
]
}
Query:
GET myindex/_search
{
"_source":"SHA256",
"sort": [
{
"files.datetime": {
"mode":"max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
]
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
]
}
]
In sort you will get the max date time value . So If you need to get file names too , you can add it in _source and use sort file to get appropriate file name.
A bit more complicated query this will give you exactly two values.
GET myindex/_search
{
"_source": "SHA256",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "files",
"query": {
"match_all": {}
},
"inner_hits": {
"size":1,
"sort": [
{
"files.datetime": "desc"
}
]
}
}
}
]
}
},
"sort": [
{
"files.datetime": {
"mode": "max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
[
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_nested" : {
"field" : "files",
"offset" : 0
},
"_score" : null,
"_source" : {
"filename" : "fifthfilename.exe",
"datetime" : 111111155
},
"sort" : [
111111155
]
}
]
}
}
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "files",
"offset" : 1
},
"_score" : null,
"_source" : {
"filename" : "secondfilename.exe",
"datetime" : 111111144
},
"sort" : [
111111144
]
}
]
}
}
}
}
]

Resources