Bool Filter not showing just the filtered data in Elasticsearch

I have an index "tag_nested" that holds documents of the following form:
{
"jobid": 1,
"table_name": "table_A",
"Tags": [
{
"TagType": "WorkType",
"Tag": "ETL"
},
{
"TagType": "Subject Area",
"Tag": "Telecom"
}
]
}
I filter on "Tag" and "TagType" with the following query:
POST /tag_nested/_search
{
"query": {
"bool": {
"must": {"match_all": {}},
"filter": [
{"term": {
"Tags.Tag.keyword": "ETL"
}},
{"term": {
"Tags.TagType.keyword": "WorkType"
}}
]
}
}
}
It gives me the following output. The query does exclude documents that have no matching tag, but for each matching document it returns all of its "Tags" instead of only the one that matched the filter:
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
},
{
"TagType" : "Subject Area",
"Tag" : "Telecom"
}
]
}
}
Instead of the result above, I want the output to look like this:
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
}
]
}
}

This has already been answered in several earlier questions.
TL;DR: you'll need to map your Tags field as type nested, reindex your data, and use inner_hits to fetch only the matching tag objects.
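As a rough illustration of those steps, here is a minimal Python sketch that only builds the request bodies as dictionaries and reads the matched tags back out of a hit. It assumes Tag and TagType are mapped directly as keyword (so the terms are queried without the .keyword suffix), and the helper names nested_mapping, tag_query and matched_tags are made up for this example. It does not talk to a cluster.

```python
def nested_mapping():
    """Index body that maps Tags as nested (assumed keyword sub-fields)."""
    return {
        "mappings": {
            "properties": {
                "jobid": {"type": "integer"},
                "table_name": {"type": "keyword"},
                "Tags": {
                    "type": "nested",
                    "properties": {
                        "TagType": {"type": "keyword"},
                        "Tag": {"type": "keyword"},
                    },
                },
            }
        }
    }


def tag_query(tag, tag_type):
    """Search body: both terms must match inside the same Tags object."""
    return {
        # Leave Tags out of _source; the matching ones arrive via inner_hits.
        "_source": ["jobid", "table_name"],
        "query": {
            "nested": {
                "path": "Tags",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"Tags.Tag": tag}},
                            {"term": {"Tags.TagType": tag_type}},
                        ]
                    }
                },
                "inner_hits": {},
            }
        },
    }


def matched_tags(hit):
    """Pull the matching tag objects out of one search hit."""
    return [inner["_source"] for inner in hit["inner_hits"]["Tags"]["hits"]["hits"]]
```

The body from nested_mapping() would go into the create-index request for the new index, and tag_query("ETL", "WorkType") into POST /<index>/_search once the data has been reindexed.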

Related

How write a query on item in a collection in Elastic Search

This is my problem.
I have documents that each hold several states, and I'm looking for the documents that were in a given state at a given time. Sample:
{
"_index" : "toto-2022",
"_type" : "_doc",
"_id" : "9eba6fbbe4284f9d92ad183f5a4bab43",
"_score" : 1.0,
"_source" : {
"conversationId" : "9eba6fbbe4284f9d92ad183f5a4bab43",
"@timestamp" : "2022-11-10T14:17:53.372+0100",
"etats" : [
{
"date" : "2022-11-09T12:32:30.091+0100",
"niveauVisibilite" : 1,
"etat" : "ROUTEE",
"commentaire" : "Routée",
"rang" : 1
},
{
"date" : "2022-11-10T07:07:55.351+0100",
"niveauVisibilite" : 1,
"etat" : "TERMINEE",
"commentaire" : "Terminée",
"rang" : 2
}
]
}
},
{
"_index" : "toto-2022",
"_type" : "_doc",
"_id" : "476ffa93b550497da26380348d97e199",
"etats" : [
{
"date" : "2022-11-10T08:03:17.869+0100",
"niveauVisibilite" : 1,
"etat" : "ROUTEE",
"commentaire" : "Routée",
"rang" : 1
},
{
"date" : "2022-11-10T07:05:23.669+0100",
"niveauVisibilite" : 1,
"etat" : "TERMINEE",
"commentaire" : "Terminée",
"rang" : 2
}
]
}
I only want the document that has the status ROUTEE on 2022-11-09.
My query is:
GET /toto-2022/_search
{
"query":{
"bool":{
"must": [
{
"match": {"etats.etat": "ROUTEE"}
}
],
"filter":[
{"range" : {"etats.date" : {"gte" : "2022-11-09T00:00:00.000+0100","lt" : "2022-11-09T23:59:59.000+0100"}}}
]
}
}
}
But both documents are selected, and I would like to get only the first one.
Thank you for your response.
First, you need to define etats as a nested field. Elasticsearch flattens arrays of objects, so the relationship between the fields of each object in the array is lost. A nested field preserves each object in the array as a separate hidden document. You can read more in the documentation on the nested field type.
Once the mapping is changed, you can use a nested query to get the data. The empty inner_hits block makes the response include the inner documents that matched.
GET /toto-2022/_search
{
"query": {
"nested": {
"path": "etats",
"inner_hits": {},
"query": {
"bool": {
"filter": [
{"range" : {"etats.date" : {"gte" : "2022-11-09T00:00:00.000+0100","lt" : "2022-11-09T23:59:59.000+0100"}}}
],
"must": [
{
"match": {"etats.etat": "ROUTEE"}
}
]
}
}
}
}
}
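To make the difference between the two matching behaviours concrete, here is a small pure-Python sketch. It does not call Elasticsearch; it only mimics the semantics, and the match on etat is simplified to plain equality. doc_b is a hypothetical document invented for the illustration (it is not one of the documents from the question): its ROUTEE state is on 2022-11-10 and its TERMINEE state is on 2022-11-09.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"
START = datetime.strptime("2022-11-09T00:00:00.000+0100", FMT)
END = datetime.strptime("2022-11-09T23:59:59.000+0100", FMT)


def in_range(ts):
    """True when the timestamp falls in [START, END)."""
    return START <= datetime.strptime(ts, FMT) < END


def flattened_match(etats):
    """Object-array semantics: each condition may be met by a different element."""
    has_state = any(e["etat"] == "ROUTEE" for e in etats)
    has_date = any(in_range(e["date"]) for e in etats)
    return has_state and has_date


def nested_match(etats):
    """Nested semantics: a single element must meet both conditions.

    Returns the matching elements, similar in spirit to inner_hits.
    """
    return [e for e in etats if e["etat"] == "ROUTEE" and in_range(e["date"])]


doc_a = [
    {"date": "2022-11-09T12:32:30.091+0100", "etat": "ROUTEE"},
    {"date": "2022-11-10T07:07:55.351+0100", "etat": "TERMINEE"},
]
doc_b = [  # hypothetical data, made up for this sketch
    {"date": "2022-11-10T08:03:17.869+0100", "etat": "ROUTEE"},
    {"date": "2022-11-09T07:05:23.669+0100", "etat": "TERMINEE"},
]

print(flattened_match(doc_a), flattened_match(doc_b))        # True True
print(bool(nested_match(doc_a)), bool(nested_match(doc_b)))  # True False
```

With flattened semantics doc_b matches because one element supplies the state and another supplies the date; with nested semantics only doc_a is kept.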

How to get different aggregations for different params in bool in Elasticsearch

Document structure:
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title CPP",
"entity_type" : "type1",
"entity_score" : 185346,
"entity" : {
"customer_id" : "cid1",
"customer_name" : "cname1",
}
}
}
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title APP",
"entity_type" : "type1",
"entity_score" : 12,
"entity" : {
"customer_id" : "cid2",
"customer_name" : "cname2"
}
}
}
My query (the terms size of 4 is the total number of entity types):
GET /admin/_search
{
"size": 0,
"query" : {
"bool" : {
"should" : [
{
"query_string" : {"default_field" : "entity_title", "query" : "app*"}
},
{
"fuzzy": {"entity_title": {"value": "app"}}
}
]
}
},
"aggs": {
"by_entity_type": {
"terms": {
"field": "entity_type",
"size": 4
},
"aggs": {
"by_top_score": {"top_hits": {"size": 10, "sort": {"entity_score": {"order" : "desc", "mode" : "avg"}}}}
}
}
}
}
I need to:
Aggregate all search results by entity_type.
Sort the results of the matched query (query_string) by _score.
Sort the results of the fuzzy search by 'entity_score'.
Kindly help me fetch these, either as separate aggregations or within the same one.
Thanks.

Skip duplicates on field in a Elasticsearch search result

Is it possible to remove duplicates on a given field?
For example the following query:
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"_source": [
"name_admin",
"parent_sku",
"sku"
],
"size": 2
}
is retrieving
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central30603",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816401",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "central156578",
"_score" : 4.596813,
"_source" : {
"parent_sku" : "SSP57",
"sku" : "SSP57816395",
"name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
}
}
]
I'd like to skip duplicates on parent_sku so that I get only one result per parent_sku, as is possible with suggesters by setting "skip_duplicates": true.
I know I could achieve this with an aggregation, but I'd like to stick with a search, as my query is a bit more complicated and I'm using the scroll API, which doesn't work with aggregations.
Field collapsing should help here:
{
"query": {
"term": {
"name_admin": {
"value": "nike"
}
}
},
"collapse" : {
"field" : "parent_sku",
"inner_hits": {
"name": "parent",
"size": 1
}
},
"_source": false,
"size": 2
}
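The above query will return one document per parent_sku. Note that the Elasticsearch documentation states that collapse cannot be used together with scroll, so this approach does not combine with the scroll API mentioned in the question.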
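Conceptually, collapsing keeps only the best-ranked hit for each distinct value of the chosen field. The pure-Python sketch below mimics that idea on the two hits from the question plus one hypothetical third hit (its _id, parent_sku and score are invented for the example). It is only an illustration of the behaviour, not how Elasticsearch implements it.

```python
def collapse(hits, field):
    """Keep the highest-scoring hit for each distinct value of `field`."""
    seen = set()
    kept = []
    # sorted() is stable, so hits with equal scores keep their input order.
    for hit in sorted(hits, key=lambda h: h["_score"], reverse=True):
        key = hit["_source"][field]
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


hits = [
    {"_id": "central30603", "_score": 4.596813,
     "_source": {"parent_sku": "SSP57", "sku": "SSP57816401"}},
    {"_id": "central156578", "_score": 4.596813,
     "_source": {"parent_sku": "SSP57", "sku": "SSP57816395"}},
    {"_id": "hypothetical-1", "_score": 3.2,  # made-up hit for illustration
     "_source": {"parent_sku": "HYP01", "sku": "HYP01000001"}},
]

print([h["_id"] for h in collapse(hits, "parent_sku")])
# ['central30603', 'hypothetical-1']
```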

Elasticsearch Highlights document randomly

When I run this search:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query":"Rose wa",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
the following documents are returned as matches:
{
"_index" : "product",
"_type" : "_doc",
"_id" : "52486",
"_score" : 19.770897,
"_source" : {
"category_code" : "personalcare",
"name" : "Nivea Rosewater Face Wash (100 Ml)",
"category_name" : "Personal Care "
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "120830",
"_score" : 17.775726,
"_source" : {
"category_code" : "beverages",
"name" : "3 Roses 500G",
"category_name" : "Beverages"
},
"highlight" : {
"name" : [
"3 <em>Roses</em> 500G"
]
}
},
Why do some documents have highlighted fields while others don't?
And how do I ensure that a highlight is always present for matched documents?

Returning all documents when query string is empty

Say I have the following mapping:
{
'properties': {
'title': {'type': 'text'},
'created': {'type': 'text'}
}
}
Sometimes the user will query by created, and sometimes by title and created. In both cases I want the query JSON to be as similar as possible. What's a good way to create a query that filters only by created when the user is not using the title to query?
I tried something like:
{
bool: {
must: [
{range: {created: {gte: '2010-01-01'}}},
{query: {match_all: {}}}
]
}
}
But that didn't work. What would be the best way of writing this query?
Your query didn't work because created is of type text rather than date. Range queries on dates stored as strings will not work as expected, so you should change the mapping from text to date and reindex your data.
Reindex your data step by step with the new mappings.
Now, if I understand correctly, you want a generic query that filters on title and/or created depending on the user input.
In this case, my suggestion is to use a query_string query.
An example (version 7.4.x):
Mappings (created is now of type date instead of text)
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"created": {
"type": "date"
}
}
}
}
Index a few documents
PUT my_index/_doc/1
{
"title":"test1",
"created": "2010-01-01"
}
PUT my_index/_doc/2
{
"title":"test2",
"created": "2010-02-01"
}
PUT my_index/_doc/3
{
"title":"test3",
"created": "2010-03-01"
}
Search Query (created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "created:>=2010-02-01",
"fields" : ["created"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}]
Search Query (title)
GET my_index/_search
{
"query": {
"query_string": {
"query": "test2",
"fields" : ["title"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9808292,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
}
]
Search Query (title and created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "(created:>=2010-02-01) AND test3"
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808292,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.9808292,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}
]
}
About fields in query_string: you can list both fields there. If you omit fields, the query is applied to all fields in your mappings.
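To keep the request shape the same whether or not the user supplied a title, the query string can be assembled from whichever inputs are present, falling back to match_all when there are none. The following Python sketch shows one way this might look. The function name build_query and its parameters are hypothetical, and it assumes the title text contains no reserved query_string characters (real user input would need escaping first).

```python
def build_query(title=None, created_gte=None):
    """Build a search body from optional user inputs.

    Assumption: `title` has no reserved query_string characters.
    """
    clauses = []
    if created_gte:
        clauses.append(f"created:>={created_gte}")
    if title and title.strip():
        clauses.append(f"title:({title.strip()})")
    if not clauses:
        # Nothing to filter on: return every document.
        return {"query": {"match_all": {}}}
    return {"query": {"query_string": {"query": " AND ".join(clauses)}}}


print(build_query(created_gte="2010-02-01"))
print(build_query(title="test3", created_gte="2010-02-01"))
print(build_query())
```

The resulting dictionary can be sent as the body of GET my_index/_search.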
Hope this helps
