Elasticsearch Highlights document randomly

Elasticsearch Highlights document randomly - elasticsearch

When i run a search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query":"Rose wa",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
then following documents come in matches
"_index" : "product",
"_type" : "_doc",
"_id" : "52486",
"_score" : 19.770897,
"_source" : {",
"category_code" : "personalcare",
"name" : "Nivea Rosewater Face Wash (100 Ml)",
"category_name" : "Personal Care "
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "120830",
"_score" : 17.775726,
"_source" : {
"category_code" : "beverages",
"name" : "3 Roses 500G",
"category_name" : "Beverages"
},
"highlight" : {
"name" : [
"3 <em>Roses</em> 500G"
]
}
},
why some document have highlighted fields while other don't?
and how do i ensure that highlight is always present for matched documents

Related

Skip duplicates on field in a Elasticsearch search result

Is it possible to remove duplicates on a given field?
For example the following query:
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"_source": [
"name_admin",
"parent_sku",
"sku"
],
"size": 2
}
is retrieving
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central30603",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816401",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central156578",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816395",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
}
]
I'd like to skip duplicates on parent_sku so I only have one result per parent_sku like it's possible with suggestion by doing something like "skip_duplicates": true.
I know I cloud achieve this with an aggregation but I'd like to stick with a search, as my query is a bit more complicated and as I'm using the scroll API which doesn't work with aggregations.

Field collapsing should help here
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"collapse" : {
"field" : "parent_sku",
"inner_hits": {
"name": "parent",
"size": 1
}
},
"_source": false,
"size": 2
}
The above query will return one document par parent_sku.

Bool Filter not showing just the filtered data in Elastic Search

I have an index "tag_nested" which has data of following type :
{
"jobid": 1,
"table_name": "table_A",
"Tags": [
{
"TagType": "WorkType",
"Tag": "ETL"
},
{
"TagType": "Subject Area",
"Tag": "Telecom"
}
]
}
When I fire the query to filter data on "Tag" and "TagType" by firing following query :
POST /tag_nested/_search
{
"query": {
"bool": {
"must": {"match_all": {}},
"filter": [
{"term": {
"Tags.Tag.keyword": "ETL"
}},
{"term": {
"Tags.TagType.keyword": "WorkType"
}}
]
}
}
}
It gives me the following output. The problem I am facing is while the above query filters documents which doesn't have filtered data BUT it shows all the "Tags" of that document instead of just the filter one
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
},
{
"TagType" : "Subject Area",
"Tag" : "Telecom"
}
]
}
}
Instead of above result I want my output to be like :
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
}
]
}
}

Already answered here, here and here.
TL;DR you'll need to make your Tags field of type nested, resync your index & use inner_hits to only fetch the applicable tag group.

Elasticsearch neglecting special characters

word search in elasticsearch is working fine, but it seems to neglect all special characters. For example, i have this data (123) apple and 123 pear, but when i query "(123)", i expect "(123) apple" to be the first that appear instead of "123 pear". I have tried to change tokeniser from standard tokenizer to whitespace tokenizer, but still not working. Kindly advice. Thanks!
Data:
(123) apple
123 pear
Query: "(123)"
Expected:
(123) apple
123 pear
Actual result:
123 pear
(123) apple

I tried with whitespace tokenizer, it worked
PUT /index25
{
"mappings": {
"properties": {
"message":{
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
Data:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 1.0,
"_source" : {
"message" : "123 pear"
}
},
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 1.0,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "(123)"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 0.47000363,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "123"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 0.9808292,
"_source" : {
"message" : "123 pear"
}
}
]

Simple way to find which one are in the same company with me?

There is index have field like below, it saves who in which company and which position is
{
"createtime" : 1562844632272,
"post" : "director",
"personId" : 30007346088,
"comId" : 20010774891
}
now want to find the partners of someone, that is which person is in the same company. Now my implementation is
first find the person's related companies(at most 500)
{
"query": { "term": { "personId": 30007346088 } },
"sort": [ { "createtime": "desc" } ],
"_source": ["comId"],
"size":500
}
then find these companies' related persons and exclude the current person and remove duplicate partner(similarly at most 500 partners)
{
"query": {
"bool": {
"must": [{ "terms": { "comId": [20010774891,...] } } ],
"must_not": [ {"term":{"personId":30007346088}} ]
}
},
"aggs" : {
"personId" : {
"terms" : {
"field" : "personId",
"size": 500
}
}
},
"size":0
}
Obviously it's a little complicated, if could exist some more simple way to implement it?

It can work if data is stored in below format.
A unique document for each person , with document id same as person id and company stored as array
POST indexperson/_doc/1
{
"createtime": 1562844632272,
"personId": 1,
"company": [
{
"id": 100,
"post": "director"
},
{
"id": 100,
"post": "director"
}
]
}
Data:
[
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 2,
"company" : [
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 4,
"company" : [
{
"id" : 104,
"post" : "director"
}
]
}
}
]
Query:
Use (terms look up)[https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html]. Terms look up takes doc id as parameter
GET indexperson/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"company.id": {
"index": "indexperson",
"id": "1", --> get all docs in indexperson which match with company id
"path": "company.id"
}
}
}
],
"must_not": [
{
"term": {
"personId": {
"value": 2
}
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
}
]

Search in nested object

I'm having trouble making a query on elasticsearch 7.3
I create an index as this:
PUT myindex
{
"mappings": {
"properties": {
"files": {
"type": "nested"
}
}
}
}
After I create three documents:
PUT myindex/_doc/1
{
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
"files" : [
{
"filename" : "firstfilename.exe",
"datetime" : 111111111
},
{
"filename" : "secondfilename.exe",
"datetime" : 111111144
}
]
}
PUT myindex/_doc/2
{
"SHA256" : "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "thirdfilename.exe",
"datetime" : 111111133
},
{
"filename" : "fourthfilename.exe",
"datetime" : 111111122
}
]
}
PUT myindex/_doc/3
{
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "fifthfilename.exe",
"datetime" : 111111155
}
]
}
How can I get the last two files based on the datetime (ids: 1 and 3)?
I would SHA256 of the last two DATETIME ordered by DESC..
I did dozens of tests but none went well...
I don't write the code I tried because I'm really on the high seas ...
I would a result like this or similar:
{
"SHA256": [
"94ee05933....a2af6a22cc2",
"565e04933....a2af6a22c5a"
]
}

Query:
GET myindex/_search
{
"_source":"SHA256",
"sort": [
{
"files.datetime": {
"mode":"max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
]
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
]
}
]
In sort you will get the max date time value . So If you need to get file names too , you can add it in _source and use sort file to get appropriate file name.
A bit more complicated query this will give you exactly two values.
GET myindex/_search
{
"_source": "SHA256",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "files",
"query": {
"match_all": {}
},
"inner_hits": {
"size":1,
"sort": [
{
"files.datetime": "desc"
}
]
}
}
}
]
}
},
"sort": [
{
"files.datetime": {
"mode": "max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
[
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_nested" : {
"field" : "files",
"offset" : 0
},
"_score" : null,
"_source" : {
"filename" : "fifthfilename.exe",
"datetime" : 111111155
},
"sort" : [
111111155
]
}
]
}
}
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "files",
"offset" : 1
},
"_score" : null,
"_source" : {
"filename" : "secondfilename.exe",
"datetime" : 111111144
},
"sort" : [
111111144
]
}
]
}
}
}
}
]

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Elasticsearch Highlights document randomly - elasticsearch

Related

Skip duplicates on field in a Elasticsearch search result

Bool Filter not showing just the filtered data in Elastic Search

Elasticsearch neglecting special characters

Simple way to find which one are in the same company with me?

Search in nested object

Categories

Resources