Elasticsearch: Restricting the search result in an array

My index metadata :
{
"never": {
"aliases": {},
"mappings": {
"userDetails": {
"properties": {
"Residence_address": {
"type": "nested",
"include_in_parent": true,
"properties": {
"Address_type": {
"type": "string",
"analyzer": "standard"
},
"Pincode": {
"type": "string",
"analyzer": "standard"
},
"address": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
},
"settings": {
"index": {
"creation_date": "1468850158519",
"number_of_shards": "5",
"number_of_replicas": "1",
"version": {
"created": "1060099"
},
"uuid": "v2njuC2-QwSau4DiwzfQ-g"
}
},
"warmers": {}
}
}
My setting :
POST never
{
"settings": {
"number_of_shards" : 5,
"analysis": {
"analyzer": {
"standard": {
"tokenizer": "keyword",
"filter" : ["lowercase","reverse"]
}
}
}
}
}
My data :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.375,
"hits": [
{
"_index": "never",
"_type": "userDetails",
"_id": "1",
"_score": 0.375,
"_source": {
"Residence_address": [
{
"address": "Omega Residency",
"Address_type": "Owned",
"Pincode": "500004"
},
{
"address": "Collage of Engineering",
"Address_type": "Rented",
"Pincode": "411005"
}
]
}
}
]
}
}
My query :
POST /never/_search?pretty
{
"query": {
"match": {
"Residence_address.address": "Omega"
}
}
}
My Result :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.375,
"hits": [
{
"_index": "never",
"_type": "userDetails",
"_id": "1",
"_score": 0.375,
"_source": {
"Residence_address": [
{
"address": "Omega Residency",
"Address_type": "Owned",
"Pincode": "500004"
},
{
"address": "Collage of Engineering",
"Address_type": "Rented",
"Pincode": "411005"
}
]
}
}
]
}
}
Is there any way to restrict my result to only the object containing address = Omega Residency, and NOT the other object with address = Collage of Engineering?

You can only do this with a nested query and inner_hits. I see that you have include_in_parent: true set, but you are not using nested queries. If you only want to get back the matched nested objects, you need to use inner_hits inside a nested query:
GET /never/_search?pretty
{
"_source": false,
"query": {
"nested": {
"path": "Residence_address",
"query": {
"match": {
"Residence_address.address": "Omega Residency"
}
},
"inner_hits" : {}
}
}
}
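To make it clearer where the matched objects end up, here is a minimal Python sketch that pulls them out of a search response. It assumes the response nests inner hits under hits.hits[].inner_hits.<path>.hits.hits[], which may differ between Elasticsearch versions. The helper name matched_nested_objects and the sample response are made up for illustration. Also note that with "_source": false the inner hits may or may not carry their own _source depending on version, so treat this as a sketch rather than a guaranteed recipe.

```python
from typing import Any, Dict, List


def matched_nested_objects(response: Dict[str, Any], path: str) -> List[Dict[str, Any]]:
    """Collect the _source of every inner hit recorded under `path`.

    Hypothetical helper: it only walks a response dict, it does not talk
    to Elasticsearch.
    """
    matched: List[Dict[str, Any]] = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matched.append(inner_hit.get("_source", {}))
    return matched


# Hand-written response fragment (assumption), trimmed to the relevant keys.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "Residence_address": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "address": "Omega Residency",
                                        "Address_type": "Owned",
                                        "Pincode": "500004",
                                    }
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

# Only the nested object that matched is returned, not its siblings.
print(matched_nested_objects(sample_response, "Residence_address"))
```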

Keyword normalizer not applied on document

I'm using Elasticsearch 6.8
here is my mapping
{
"index_patterns": [
"my_index_*"
],
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"lower_ascii_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"asciifolding"
]
}
},
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"audit_conformity": {
"dynamic": "false",
"properties": {
"country": {
"type": "keyword",
"normalizer": "my_normalizer"
},
[…]
Then I post a document with this body
{
"_source": {
"company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
"country": "MX",
"user_entity_id": "1"
}
}
When I search for the document, the country is still capitalized
GET /my_index_country/_search
I get
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index_country",
"_type": "my_index",
"_id": "LOT0fYIBCNP9gFG_7cet",
"_score": 1,
"_source": {
"_source": {
"company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
"country": "MX",
"user_entity_id": "1",
}
}
}
]
}
}
What am I doing wrong?
You are not doing anything wrong, but normalizers (like analyzers) never modify your source document, only what gets indexed from it.
This means the source document still holds MX, while underneath mx is what gets indexed for the country field.
If you want the country field to be lowercased in the source as well, use an ingest pipeline with a lowercase processor instead, which modifies your source document before it is indexed:
PUT _ingest/pipeline/lowercase-pipeline
{
"processors": [
{
"lowercase": {
"field": "country"
}
}
]
}
Then use it when indexing your documents:
PUT my_index_country/my_index/LOT0fYIBCNP9gFG_7cet?pipeline=lowercase-pipeline
{
"company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
"country": "MX",
"user_entity_id": "1"
}
GET my_index_country/my_index/LOT0fYIBCNP9gFG_7cet
Result =>
{
"company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
"country": "mx",
"user_entity_id": "1"
}
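To illustrate the difference between what is stored and what is indexed, here is a toy Python model. It is not Elasticsearch code. The functions lowercase_normalizer, lowercase_processor and index_document are hypothetical stand-ins for the normalizer, the ingest processor and the indexing step, and the model ignores everything else Elasticsearch does.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Doc = Dict[str, Any]


def lowercase_normalizer(value: str) -> str:
    """Stand-in for a keyword normalizer: only affects the indexed term."""
    return value.lower()


def lowercase_processor(doc: Doc, field: str) -> Doc:
    """Stand-in for an ingest lowercase processor: rewrites the document itself."""
    updated = dict(doc)
    if isinstance(updated.get(field), str):
        updated[field] = updated[field].lower()
    return updated


def index_document(
    doc: Doc, field: str, pipeline: Optional[Callable[[Doc], Doc]] = None
) -> Tuple[Doc, str]:
    """Return (stored _source, indexed term) for `field` in this toy model."""
    # The pipeline, if any, runs first and changes what is stored as _source.
    source = pipeline(doc) if pipeline else dict(doc)
    # The normalizer runs afterwards and never touches the stored source.
    indexed_term = lowercase_normalizer(source[field])
    return source, indexed_term


doc = {"country": "MX"}

# Normalizer only: the source keeps "MX", the indexed term is "mx".
print(index_document(doc, "country"))

# With the pipeline: the source itself becomes "mx".
print(index_document(doc, "country", pipeline=lambda d: lowercase_processor(d, "country")))
```

In both cases a term query for mx would find the document in this model. Only the second case changes what you see in _source.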

Elasticsearch returns NullPointerException during inner_hits query

I have an index that stores nested documents. I want to see these nested documents, so I used 'inner_hits' in the request, but Elasticsearch returns a NullPointerException. Has anyone run into this problem?
Request to elasticsearch using Postman:
GET http://localhost/my-index/_search
{
"query": {
"nested": {
"path": "address_object",
"query": {
"bool": {
"must": {
"term": {"address_object.city": "Paris"}
}
}
},
"inner_hits" : {}
}
}
}
Response with status code 200:
{
"took": 161,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 1,
"skipped": 0,
"failed": 1,
"failures": [
{
"shard": 0,
"index": "my-index",
"node": "DWdD83KaTmUiodENQkGDww",
"reason": {
"type": "null_pointer_exception",
"reason": null
}
}
]
},
"hits": {
"total": 6500039,
"max_score": 2.1761138,
"hits": []
}
}
Elasticsearch version: 6.2.4
Lucene version: 7.2.1
Update:
Mapping:
{
"my-index": {
"mappings": {
"mytype": {
"dynamic": "false",
"_source": {
"enabled": false
},
"properties": {
"adverts_count": {
"type": "integer",
"store": true
},
...
"address_object": {
"type": "nested",
"properties": {
"adverts_count": {
"type": "integer",
"store": true
},
"city": {
"type": "keyword",
"store": true
}
}
},
...
Sample document:
{
"_index": "my-index",
"_type": "mytype",
"_id": "XDWrGncBdwNBWGEagAM2",
"_score": 2.1587489,
"fields": {
"is_target_page_shown": [
0
],
"updated_at": [
1612264276
],
"is_shown": [
0
],
"nb_queries": [
1
],
"search_query": [
"phone"
],
"target_category": [
15
],
"adverts_count": [
1
]
}
}
Extra information:
If I remove "inner_hits": {} from the search request, Elasticsearch returns the nested documents (_index, _type, _id, _score), but no other fields (e.g. city).
Also, as suggested in the comments, I tried setting ignore_unmapped to true, but it didn't help. I get the same NullPointerException.
I tried to reproduce your issue, but since you have not provided proper sample documents (the one you provided doesn't contain the address_object properties), I used your mapping and the sample documents below.
PUT 71907588/_doc/1
{
"address_object" :{
"adverts_count" : 1,
"city": "paris"
}
}
PUT 71907588/_doc/2
{
"address_object" :{
"adverts_count" : 1,
"city": "blr"
}
}
And when I use the same search you provided:
POST 71907588/_search
{
"query": {
"nested": {
"path": "address_object",
"query": {
"bool": {
"must": {
"term": {
"address_object.city": "paris"
}
}
}
},
"inner_hits": {}
}
}
}
I get a proper response that matches the document with paris as city, as shown in the search response:
"hits": [
{
"_index": "71907588",
"_id": "1",
"_score": 0.6931471,
"_source": {
"address_object": {
"adverts_count": 1,
"city": "paris"
}
},
"inner_hits": {
"address_object": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.6931471,
"hits": [
{
"_index": "71907588",
"_id": "1",
"_nested": {
"field": "address_object",
"offset": 0
},
"_score": 0.6931471,
"_source": {
"city": "paris",
"adverts_count": 1
}
}
]
}
}
}
}
]

Is there any way to sort an index before the aggregation

I have the following index template
{
"index_patterns": "notificationtiles*",
"order": 1,
"version": 1,
"aliases": {
"notificationtiles": {}
},
"settings": {
"number_of_shards": 5,
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"influencerId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"friendId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"message": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"type": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"sponsorshipCharityId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"createdTimestampEpochInMilliseconds": {
"type": "date",
"format": "epoch_millis",
"index": false
},
"updatedTimestampEpochInMilliseconds": {
"type": "date",
"format": "epoch_millis",
"index": false
},
"createdDate": {
"type": "date"
},
"updatedDate": {
"type": "date"
}
}
}
}
with the following query
{
"query": {
"bool": {
"must": [
{
"match": {
"influencerId": "52407710-f7be-49c1-bc15-6d52363144a6"
}
},
{
"match": {
"type": "friend_completed_sponsorship"
}
}
]
}
},
"size": 0,
"aggs": {
"friendId": {
"terms": {
"field": "friendId",
"size": 2
},
"aggs": {
"latest": {
"top_hits": {
"sort": [
{
"createdDate": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"sponsorshipCharityId",
"message",
"createdDate"
]
},
"size": 1
}
}
}
}
}
}
which returns
{
"took": 72,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 12,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"friendId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 7,
"buckets": [
{
"key": "cf750fd8-998f-4dcd-9c88-56b2b6d6fce9",
"doc_count": 3,
"latest": {
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "416a8e07-fd72-46d4-ade1-b9442ef46978",
"_score": null,
"_source": {
"createdDate": "2020-06-24T17:35:17.816842Z",
"sponsorshipCharityId": "336de13c-f522-4796-9218-f373ff0b4373",
"message": "Contact Test 788826 Completed Sponsorship!"
},
"sort": [
1593020117816
]
}
]
}
}
},
{
"key": "93ab55c5-795f-44b0-900c-912e3e186da0",
"doc_count": 2,
"latest": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "66913b8f-94fe-49fd-9483-f332329b80dd",
"_score": null,
"_source": {
"createdDate": "2020-06-24T17:57:17.816842Z",
"sponsorshipCharityId": "dbad136c-5002-4470-b85d-e5ba1eff515b",
"message": "Contact Test 788826 Completed Sponsorship!"
},
"sort": [
1593021437816
]
}
]
}
}
}
]
}
}
}
However, I'd like the results to include the latest documents (ordered by createdDate desc), for example the following document
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
"_score": 1.0,
"_source": {
"id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
"influencerId": "52407710-f7be-49c1-bc15-6d52363144a6",
"friendId": "af342805-1990-4794-9d67-3bb2dd1e36dc",
"message": "Contact Test 788826 Completed Sponsorship!",
"type": "friend_completed_sponsorship",
"sponsorshipCharityId": "b2db72e6-a70e-414a-bf8b-558e6314e7ec",
"createdDate": "2020-06-25T17:35:17.816842Z",
"updatedDate": "2020-06-25T17:35:17.816876Z",
"createdTimestampEpochInMilliseconds": 1593021437817,
"updatedTimestampEpochInMilliseconds": 1593021437817
}
}
I need to get the 2 latest documents grouped by friendId, with the latest document per friendId. The grouping by friendId with the latest document per friendId works fine. However, I'm unable to sort the index by createdDate desc before the aggregation happens.
Essentially, I'd like to sort the index by createdDate desc before the aggregation takes place. I don't want a parent aggregation on createdDate, since that wouldn't result in unique friendId values. How can that be achieved?
It looks like you need to set the order property of your terms aggregation. By default, buckets are ordered by document count (doc_count), descending. You want them ordered by the maximum createdDate instead. So you should add a max sub-aggregation on createdDate, and then use the name of that sub-aggregation in the order property of the parent terms aggregation.
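As a rough sketch of that suggestion, the Python snippet below builds the request body from the question with the extra ordering added. The function name build_latest_per_friend_query and the sub-aggregation name latest_created are chosen for illustration and are not from the question. Only the "order" entry and the "max" sub-aggregation are new compared to the original query. I have not run this against the asker's index.

```python
import json
from typing import Any, Dict


def build_latest_per_friend_query(
    influencer_id: str, notification_type: str, groups: int = 2
) -> Dict[str, Any]:
    """Build a search body whose friendId buckets are ordered by newest createdDate."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"influencerId": influencer_id}},
                    {"match": {"type": notification_type}},
                ]
            }
        },
        "size": 0,
        "aggs": {
            "friendId": {
                "terms": {
                    "field": "friendId",
                    "size": groups,
                    # Order buckets by the max sub-aggregation defined below,
                    # instead of the default doc_count ordering.
                    "order": {"latest_created": "desc"},
                },
                "aggs": {
                    # Hypothetical name; it only has to match the key used in "order".
                    "latest_created": {"max": {"field": "createdDate"}},
                    "latest": {
                        "top_hits": {
                            "sort": [{"createdDate": {"order": "desc"}}],
                            "_source": {
                                "includes": [
                                    "sponsorshipCharityId",
                                    "message",
                                    "createdDate",
                                ]
                            },
                            "size": 1,
                        }
                    },
                },
            }
        },
    }


body = build_latest_per_friend_query(
    "52407710-f7be-49c1-bc15-6d52363144a6", "friend_completed_sponsorship"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request. With "size": 2 on the terms aggregation this should return the two friendId buckets with the most recent createdDate, each carrying its latest document in the top_hits result.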

Querying a string consisting exactly a part of a query

I have a field named "lang" which contains values such as "en_US", "en_GB", "ru_RU", etc., with this mapping
"lang": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
How can I filter for documents by the second part of the value, e.g. "US"?
One way to do it is to remove "index": "not_analyzed" from the upper-level field and set up a pattern analyzer for that field instead. Since you already have the "lang.raw" field set up, you'll still be able to get the untouched version for faceting or whatever.
So, to test it I set up an index like this:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"whitespace_underscore": {
"type": "pattern",
"pattern": "[\\s_]+",
"lowercase": false
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string"
},
"lang": {
"type": "string",
"index_analyzer": "whitespace_underscore",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
}
}
And added a few docs:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"doc1","lang":"en_US"}
{"index":{"_id":2}}
{"name":"doc2","lang":"en_GB"}
{"index":{"_id":3}}
{"name":"doc3","lang":"ru_RU"}
Now I can filter by "US" like this:
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"lang": "US"
}
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"name": "doc1",
"lang": "en_US"
}
}
]
}
}
And I can still get a list of values with a terms aggregation on "lang.raw":
POST /test_index/_search?search_type=count
{
"aggs": {
"lang_terms": {
"terms": {
"field": "lang.raw"
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"lang_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "en_GB",
"doc_count": 1
},
{
"key": "en_US",
"doc_count": 1
},
{
"key": "ru_RU",
"doc_count": 1
}
]
}
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/ac3f3fd66ea649c0c3a8010241d1f6981a7e012c
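For anyone who wants to see what the whitespace_underscore analyzer does to the values without spinning up a cluster, the splitting can be approximated in plain Python. This is only a sketch: pattern_tokenize is a made-up helper that mimics splitting on the pattern [\s_]+ with lowercasing turned off, not the real Lucene analyzer.

```python
import re
from typing import List


def pattern_tokenize(text: str, pattern: str = r"[\s_]+", lowercase: bool = False) -> List[str]:
    """Split `text` on `pattern` and drop empty tokens, roughly like a pattern analyzer."""
    tokens = [token for token in re.split(pattern, text) if token]
    return [token.lower() for token in tokens] if lowercase else tokens


for lang in ["en_US", "en_GB", "ru_RU"]:
    # Each value is split at the underscore into a language and a country token.
    print(lang, pattern_tokenize(lang))

# A term filter is not analyzed, so it has to match an indexed token exactly.
print("US" in pattern_tokenize("en_US"))  # True
print("us" in pattern_tokenize("en_US"))  # False, because lowercase is off
```

This is also why "lowercase": false matters in the analyzer settings: the term filter for "US" only matches because the indexed token keeps its original case.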

Not able to search for string within a string in elasticsearch index

I'm trying to set up the mapping for my elasticsearch instance with full name matching and partial name matching:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '{
"mappings": {
"venue": {
"properties": {
"location": {
"type": "geo_point"
},
"name": {
"fields": {
"name": {
"type": "string",
"analyzer": "full_name"
},
"partial": {
"search_analyzer": "full_name",
"index_analyzer": "partial_name",
"type": "string"
}
},
"type": "multi_field"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"swedish_snow": {
"type": "snowball",
"language": "Swedish"
},
"name_synonyms": {
"type": "synonym",
"synonyms_path": "name_synonyms.txt"
},
"name_ngrams": {
"side": "front",
"min_gram": 2,
"max_gram": 50,
"type": "edgeNGram"
}
},
"analyzer": {
"full_name": {
"filter": [
"standard",
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
},
"partial_name": {
"filter": [
"swedish_snow",
"lowercase",
"name_synonyms",
"name_ngrams",
"standard"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
}'
I fill it with some data:
curl -XPOST 'http://127.0.0.1:9200/_bulk?pretty=1' -d '
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "johnssons"}
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "johnsson"}
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "jöhnsson"}
'
Perform some searches to test,
Full name:
curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
"query": {
"bool": {
"should": [
{
"text": {
"name": {
"boost": 1,
"query": "johnsson"
}
}
},
{
"text": {
"name.partial": "johnsson"
}
}
]
}
}
}'
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.29834434,
"hits": [
{
"_index": "test",
"_type": "venue",
"_id": "CAO-dDr2TFOuCM4pFfNDSw",
"_score": 0.29834434,
"_source": {
"location": [
59.3366,
18.0315
],
"name": "johnsson"
}
},
{
"_index": "test",
"_type": "venue",
"_id": "UQWGn8L9Squ5RYDMd4jqKA",
"_score": 0.14663845,
"_source": {
"location": [
59.3366,
18.0315
],
"name": "johnssons"
}
}
]
}
}
Partial name:
curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
"query": {
"bool": {
"should": [
{
"text": {
"name": {
"boost": 1,
"query": "johns"
}
}
},
{
"text": {
"name.partial": "johns"
}
}
]
}
}
}'
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.14663845,
"hits": [
{
"_index": "test",
"_type": "venue",
"_id": "UQWGn8L9Squ5RYDMd4jqKA",
"_score": 0.14663845,
"_source": {
"location": [
59.3366,
18.0315
],
"name": "johnssons"
}
},
{
"_index": "test",
"_type": "venue",
"_id": "CAO-dDr2TFOuCM4pFfNDSw",
"_score": 0.016878016,
"_source": {
"location": [
59.3366,
18.0315
],
"name": "johnsson"
}
}
]
}
}
Name within name:
curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
"query": {
"bool": {
"should": [
{
"text": {
"name": {
"boost": 1,
"query": "johnssons"
}
}
},
{
"text": {
"name.partial": "johnssons"
}
}
]
}
}
}'
Result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.39103588,
"hits": [
{
"_index": "test",
"_type": "venue",
"_id": "UQWGn8L9Squ5RYDMd4jqKA",
"_score": 0.39103588,
"_source": {
"location": [
59.3366,
18.0315
],
"name": "johnssons"
}
}
]
}
}
As you can see I'm only getting one venue back which is johnssons. Shouldn't I get both johnssons and johnsson back? What am I doing wrong in my settings?
You are using the full_name analyzer as the search analyzer for the name.partial field. As a result, your query is translated into a query for the term johnssons, which doesn't match any term indexed in name.partial.
You can use the Analyze API to see how your records are indexed. For example, this command
curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=partial_name&pretty=1' -d 'johnssons'
will show you that during indexing the string "johnssons" is getting translated into the following terms: "jo", "joh", "john", "johns", "johnss", "johnsso", "johnsson". While this command
curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=full_name&pretty=1' -d 'johnssons'
will show you that during searching the string "johnssons" is getting translated into term "johnssons". As you can see there is no match between your search term and your data here.
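The mismatch can be reproduced with a few lines of Python. This is a simplified sketch, not the real analysis chain: edge_ngrams is a made-up helper that only mimics a front edge n-gram filter with min_gram 2 and max_gram 50, and the stem "johnsson" is taken from the Analyze API output described above rather than computed by a stemmer.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 50) -> List[str]:
    """Return the front edge n-grams of `token` between min_gram and max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Assumption from the answer: the partial_name analyzer reduces "johnssons"
# to the stem "johnsson" before the n-gram filter runs.
indexed_terms = edge_ngrams("johnsson")
print(indexed_terms)

# The full_name search analyzer leaves the query as a single term.
print("johnssons" in indexed_terms)  # False: no indexed term equals the query term
print("johns" in indexed_terms)      # True: a shorter prefix does match
```

So the query term is longer than anything stored for name.partial, which is why only the exact match on the name field comes back.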
