Finding which nested entry matched an elasticsearch query - elasticsearch

Say I'm indexing elasticsearch data like so:
{"entities": [
{
"type": "firstName",
"value": "Barack"
},
{
"type": "lastName",
"value": "Obama"
}
]}
I'd like users to be able to add custom attributes, so I don't know every possible value of "type" ahead of time.
My mappings might look like:
typename:
entities:
type: nested
If I do a match query for the text "Obama", with highlighting, is there a way to get back the full nested "entity" which matched? I would like to know if my query for "Obama" matched the firstName or the lastName.

I was able to solve this with inner_hits (thanks Andrei!):
{
"query": {
"nested": {
"path": "entities",
"query": {
"match": {"entities.value": "Obama"}
},
"inner_hits": {
"highlight": {
"fields": {
"entities.value": {}
}
}
}
}
}
}
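To read the matched entries back out of the response, something like the following sketch might help. It assumes the inner hits are returned under the nested path name (`entities`), which is the default when no `name` is set, and that the nested objects carry the `type` and `value` fields from the question. The helper `matched_entities` and the sample response are hypothetical and hand-written, not real server output.

```python
def matched_entities(response, path="entities"):
    """Yield (doc_id, nested_source, highlight) for every nested inner hit."""
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            yield hit["_id"], nested_hit["_source"], nested_hit.get("highlight", {})


# Hand-written sample shaped like a search response (an assumption, not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "entities": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "entities", "offset": 1},
                                    "_source": {"type": "lastName", "value": "Obama"},
                                    "highlight": {"entities.value": ["<em>Obama</em>"]},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

matches = list(matched_entities(sample_response))
for doc_id, entity, highlight in matches:
    # entity["type"] tells you whether firstName or lastName matched
    print(doc_id, entity["type"], entity["value"])
```

Each yielded `nested_source` is the whole nested object, so its `type` answers the "firstName or lastName" question directly.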


ElasticSearch autocomplete doesn't work with the middle words

Using python elasticsearch-dsl:
class Record(Document):
tags = Keyword()
tags_suggest = Completion(preserve_position_increments=False)
def clean(self):
self.tags_suggest = {
"input": self.tags
}
class Index:
name = 'my-index'
settings = {
"number_of_shards": 2,
}
When I index
r1 = Record(tags=['my favourite tag', 'my hated tag'])
r2 = Record(tags=['my good tag', 'my bad tag'])
And when I try to use autocomplete with the word in the middle:
dsl = Record.search()
dsl = dsl.suggest("auto_complete", "favo", completion={"field": "tags_suggest"})
search_response = dsl.execute()
for option in search_response.suggest.auto_complete[0].options:
print(option.to_dict())
It won't return anything, but it will when I search "my favo". Any good practices to fix that (make it return 'my favourite tag' when I request suggestions for "favo")?
Check Mapping
How a search behaves in Elasticsearch also depends on how you index your data. I would suggest looking at the index mapping with the request below:
curl -X GET "elasticsearch.url:port/index_name/_mapping?pretty"
Check how the data is being stored, in particular whether an analyzer or tokenizer is applied. If you have not specified an analyzer, Elasticsearch uses the standard analyzer by default and produces terms accordingly.
For your use case you need to apply a suitable analyzer, tokenizer and filters. In one case where I needed a LIKE-style query, I used an ngram token filter.
Solution
As far as I can see, you are using a suggester. The suggest feature suggests similar-looking terms based on the provided text.
If you want autocomplete, I would suggest using the search_as_you_type field type.
I tried to reproduce your use case, and the following worked for me.
Create Index
PUT /test1?pretty
{
"mappings": {
"properties": {
"tags": {
"type": "search_as_you_type"
}
}
}
}
Indexing data
POST test1/_doc?pretty
{
"tags":"my favourite tag"
}
POST test1/_doc?pretty
{
"tags":"my hated tag"
}
POST test1/_doc?pretty
{
"tags":"my good tag"
}
POST test1/_doc?pretty
{
"tags":"my bad tag"
}
Query with your keyword
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "my",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "bad",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "fav",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
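The three requests above differ only in the query text, so from Python the body could be produced by a small helper. This is a minimal sketch: `bool_prefix_query` is a hypothetical name, and the snippet only builds the dictionary, leaving the actual search call out so it has no dependencies.

```python
def bool_prefix_query(text, field="tags"):
    """Build a bool_prefix multi_match body over a search_as_you_type field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                # the root field plus the shingle subfields used in the requests above
                "fields": [field, f"{field}._2gram", f"{field}._3gram"],
            }
        }
    }


body = bool_prefix_query("fav")
print(body)
```

The resulting `body` has the same shape as the last request above and could be passed to whichever client you use.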
You can achieve this by setting the preserve_position_increments parameter to false in your mappings.
"tags_completion": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": false,
"preserve_position_increments": false,
"max_input_length": 50
}
You can query it in console like this:
GET /_search
{
"suggest" : {
"my-suggester": {
"prefix": "favou",
"completion": {
"field": "tags_completion",
"skip_duplicates": true,
"fuzzy": {
"fuzziness": 1
}
}
}
}
}

Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
"organization" : {
"dateUpdated" : 1395211600000,
"publications" : [
{
"dateCreated" : 1393801200000
},
{
"dateCreated" : 1401055200000
}
]
}
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
"query": {
"nested": {
"path": "publications",
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
}
}
]
}
}
}
}
}
My problem is that when I perform a nested query, the nested query does not have access to the root document values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass in a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if necessary.
Thanks.
You cannot access root values from a nested query context, because nested objects are indexed as separate documents. From the documentation:
The nested clause “steps down” into the nested comments field. It no
longer has access to fields in the root document, nor fields in any
other nested document.
You can get the desired results with the copy_to parameter. Another way would be to use include_in_parent or include_in_root, but they might be deprecated in the future, and they increase the index size because every field of the nested type is included in the root document. In this case copy_to is the better fit.
This is a sample index:
PUT nested_index
{
"mappings": {
"blogpost": {
"properties": {
"rootdate": {
"type": "date"
},
"copy_of_nested_date": {
"type": "date"
},
"comments": {
"type": "nested",
"properties": {
"nested_date": {
"type": "date",
"copy_to": "copy_of_nested_date"
}
}
}
}
}
}
}
Here every value of nested_date will be copied to copy_of_nested_date, so copy_of_nested_date will look something like [1401055200000,1393801200000,1221542100000]. You can then use a simple query like this to get the results:
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
}
}
]
}
}
}
You don't have to change your nested structure, but you would have to reindex the documents after adding copy_to to the publications dateCreated field.
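To make the idea concrete, here is a plain-Python emulation of what the copy_to approach computes, using the organization from the question. It does not talk to Elasticsearch; the helper names are hypothetical, the sorting is only a convenience of this sketch, and the comparison follows the question's condition (a publication created before dateUpdated).

```python
def copy_nested_dates(org):
    """Emulate copy_to: gather nested dateCreated values on the root, sorted ascending."""
    return sorted(pub["dateCreated"] for pub in org.get("publications", []))


def has_publication_before_update(org):
    """True if any publication was created before the organization was updated."""
    copied = copy_nested_dates(org)
    # The earliest copied date is enough to decide whether any publication predates dateUpdated
    return bool(copied) and copied[0] < org["dateUpdated"]


organization = {
    "dateUpdated": 1395211600000,
    "publications": [
        {"dateCreated": 1393801200000},
        {"dateCreated": 1401055200000},
    ],
}

result = has_publication_before_update(organization)
print(result)
```

Once the nested values live on the root document, the comparison no longer needs a nested query at all, which is the point of the copy_to workaround.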

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to", although it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
You can then query one of the fields, or several of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}
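For the original requirement (products tagged X or Y, categories tagged Z), the two per-type fields could be combined in one bool query. The sketch below only builds the request body in Python. `tags_by_type_query` is a hypothetical helper, and the lowercasing assumes the fields are analyzed with the default analyzer, as the lowercase "x" in the term query above suggests.

```python
def tags_by_type_query(product_tags, category_tags):
    """Build a bool/should body that restricts tags separately per document type."""
    return {
        "query": {
            "bool": {
                "should": [
                    # ptags only exists on 'product' documents, ctags only on 'category'
                    {"terms": {"ptags": [t.lower() for t in product_tags]}},
                    {"terms": {"ctags": [t.lower() for t in category_tags]}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = tags_by_type_query(["X", "Y"], ["Z"])
print(body)
```

Because each copied field is populated for one type only, each clause can only match documents of that type.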

Casting when querying ElasticSearch data

Is there a way in elasticsearch where I can cast a string to a long value at query time?
I have something like this in my document:
"attributes": [
{
"key": "age",
"value": "23"
},
{
"key": "name",
"value": "John"
},
],
I would like to write a query to get all the persons that have an age > 23. For that I need to cast the value to an int such that I can compare it when the key is age.
The above document is an example very specific to this problem.
I would greatly appreciate your help.
Thanks!
You can use scripting for that:
POST /index/type/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "foreach(attr : _source['attributes']) {if ( attr['key']=='age') { return attr['value'] > ageValue;} } return false;",
"params" : {
"ageValue" : 23
}
}
},
"query": {
"match_all": {}
}
}
}
}
UPD: Note that dynamic scripting must be enabled in elasticsearch.yml.
Also, I suppose you can achieve better query performance by refactoring your document structure and applying an appropriate mapping for the age field.
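One way the suggested refactoring might look is to cast the value once, before indexing, instead of at query time. The sketch below is only an illustration: the helper `promote_age` and the top-level `age` field are hypothetical and not part of the original document structure.

```python
def promote_age(doc):
    """Return a copy of doc with a numeric top-level 'age' taken from attributes."""
    result = dict(doc)
    for attr in doc.get("attributes", []):
        if attr.get("key") == "age":
            try:
                result["age"] = int(attr["value"])  # the cast the question asks about
            except (TypeError, ValueError):
                pass  # leave 'age' unset when the value is not a valid integer
    return result


person = {
    "attributes": [
        {"key": "age", "value": "23"},
        {"key": "name", "value": "John"},
    ]
}

indexed = promote_age(person)
print(indexed["age"])
```

With `age` stored as a number and mapped as a numeric type, a range query on that field could replace the script.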

Elastic search multiple terms in a dictionary

I have mapping like:
"profile": {
"properties": {
"educations": {
"properties": {
"university": {
"type": "string"
},
"graduation_year": {
"type": "string"
}
}
}
}
}
which holds the education history of people. Each person can have multiple educations. What I want to do is search for people who graduated from "SFU" in "2012". To do that I am using a filtered search:
"filtered": {
"filter": {
"and": [
{
"term": {
"educations.university": "SFU"
}
},
{
"term": {
"educations.graduation_year": "2012"
}
}
]
}
}
But this query finds documents that have "SFU" and "2012" anywhere in their educations, so the following document would match, which is wrong:
educations[0] = {"university": "SFU", "graduation_year": 2000}
educations[1] = {"university": "UBC", "graduation_year": 2012}
Is there any way I could filter both terms on each education?
You need to define educations as a nested type and use a nested filter on it. Otherwise Elasticsearch internally flattens the inner objects into a single object and returns the wrong results.
You can refer to these for detailed explanations and samples:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
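The flattening effect can be emulated in plain Python to see why the wrong document matches. This is only a sketch of the behaviour described above, not Elasticsearch code. The helper names are hypothetical, and years are compared as strings because the mapping declares graduation_year as a string.

```python
def flatten(educations):
    """Emulate how inner objects are flattened into per-field value lists."""
    flat = {"educations.university": [], "educations.graduation_year": []}
    for edu in educations:
        flat["educations.university"].append(edu["university"])
        flat["educations.graduation_year"].append(str(edu["graduation_year"]))
    return flat


def flattened_match(educations, university, year):
    """Both terms only need to appear somewhere in the flattened lists."""
    flat = flatten(educations)
    return (university in flat["educations.university"]
            and year in flat["educations.graduation_year"])


def nested_match(educations, university, year):
    """Both terms must match within the same education entry."""
    return any(
        e["university"] == university and str(e["graduation_year"]) == year
        for e in educations
    )


educations = [
    {"university": "SFU", "graduation_year": 2000},
    {"university": "UBC", "graduation_year": 2012},
]

print(flattened_match(educations, "SFU", "2012"))
print(nested_match(educations, "SFU", "2012"))
```

The flattened check succeeds because "SFU" and "2012" each appear somewhere in the document, while the per-entry check fails because no single education has both.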
