Elasticsearch OR filtered query does not return results - elasticsearch

I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR-query however (I am not sure which syntax is correct, the first syntax is taken from elasticsearch docs, the second is a working query taken from another project with the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Does not return anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.

When indexing "some name" , this is broken into tokens as follows -
"some name" => [ "some" , "name" ]
Now in a normal match query , it also does the same above process before matching result. If either "same" or "name" is present , that document is qualified as result
match query ("some name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for a exact token or term of "some name" which is not present.
term query ("some name") => search for term "some name"
Hence you wont be seeing any result.
Things should work fine if you make the field not_analyzed , but then make sure the case is also matching,
You can read more about the same here.

After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"someType": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
The or-filters were working. So the general answer is: If you have such a problem, check your mappings closely and rather do too much work on them than too little.
If someone has an explanation why this might be happening, I will gladly pass the answer mark on to it.

Related

How do I search documents with their synonyms in Elasticsearch?

I have an index with some documents. These documents have the field name. But now, my documents are able to have several names. And the number of names a document can have is uncertain. A document can have only one name, or there can be 10 names of one document.
The question is, how to organize my index, document and query and then search for 1 document by different names?
For example, there's a document with names: "automobile", "automobil", "自動車". And whenever I query one of these names, I should get this document. Can I create kind of an array of these names and build a query to search for each one? Or there's more appropriate way to do this.
Tldr;
I feels like you are looking for something like synonyms?
Solution
In the following example I am creating an index, with a specific text analyser.
This analyser, handle automobile, automobil and 自動車 as the same token.
PUT /74472994
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["synonym" ]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [ "automobile, automobil, 自動車" ]
}
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "synonym"
}
}
}
}
POST /74472994/_doc
{
"name": "automobile"
}
which allow me to perform the following request:
GET /74472994/_search
{
"query": {
"match": {
"name": "automobil"
}
}
}
GET /74472994/_search
{
"query": {
"match": {
"name": "自動車"
}
}
}
And always get:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7198386,
"hits": [
{
"_index": "74472994",
"_id": "ROfyhoQBcn6Q8d0DlI_z",
"_score": 1.7198386,
"_source": {
"name": "automobile"
}
}
]
}
}

How does "must" clause with an array of "match" clauses really mean?

I have an elasticsearch query which looks like this...
"query": {
"bool": {
"must": [{
"match": {"attrs.name": "username"}
}, {
"match": {"attrs.value": "johndoe"}
}]
}
}
... and documents in the index that look like this:
{
"key": "value",
"attrs": [{
"name": "username",
"value": "jimihendrix"
}, {
"name": "age",
"value": 23
}, {
"name": "alias",
"value": "johndoe"
}]
}
Which of the following does this query really mean?
Document should contain either attrs.name = username OR attrs.value = johndoe
Or, document should contain, both, attrs.name = username AND attrs.value = johndoe, even if they may match different elements in the attrs array (this would mean that the document given above would match the query)
Or, document should contain, both, attrs.name = username AND attrs.value = johndoe, but they must match the same element in the attrs array (which would mean that the document given above would not match the query)
Further, how do I write a query to express #3 from the list above, i.e. the document should match only if a single element inside the attrs array matches both the following conditions:
attrs.name = username
attrs.value = johndoe
Must stands for "And" so a document satisfying all the clauses in match query is returned.
Must will not satisfy point 1. Document should contain either attrs.name = username OR attrs.value = johndoe- you need a should clause which works like "OR"
Whether Must will satisfy Point 2 or point 3 depends on the type of "attrs" field.
If "attr" field type is object then fields are flattened that is no relationship maintained between different fields for array. So must query will return a document if any attrs.name="username" and attrs.value="John doe", even if they are not part of same object in that array.
If you want an object in an array to act like a separate document, you need to use nested field and use nested query to match documents
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {}, --> returns matched nested documents
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
hits in the response will contain all nested documents , to get all matched nested documents , inner_hits has to be specified
Based on your requirements you need to define your attrs field as nested, please refer nested type in Elasticsearch for more information. Disclaimer : it maintains the relationship but costly to query.
Answer to your other two questions also depends on what data type you are using please refer nested vs object data type for more details
Edit: solution using sample mapping, example docs and expected result
Index mapping using nested type
{
"mappings": {
"properties": {
"attrs": {
"type": "nested"
}
}
}
}
Index 2 sample doc one which severs the criteria and other which doesn't
{
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
}
Another which serves criteria
{
"attrs": [
{
"name": "username",
"value": "jimihendrix"
},
{
"name": "age",
"value": 23
},
{
"name": "alias",
"value": "johndoe"
}
]
}
And search query
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
And Search result
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_score": 1.7509375,
"_source": {
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
},
"inner_hits": {
"attrs": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7509375,
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "attrs",
"offset": 0
},
"_score": 1.7509375,
"_source": {
"name": "username",
"value": "johndoe"
}
}
]
}
}
}
}
]

elasticsearch search results with sub query

Getting started with elasticsearch, not sure if this is possible with one query along with pagination. I have a index with two types: user & blog. Example mapping:
"mappings": {
"user": {
"properties": {
"name" : { "type": "string" }
}
},
"blog": {
"properties": {
"title" : { "type": "string" },
"author_name" : { "type": "string" }
}
}
}
}
sample data
user:
[
{"name": "jemmy"},
{"name": "Tom"}
]
blog:
[
{"title": "foo bar", "author": "jemmy"},
{"title": "magic foo", "author": "Tom"},
{"title": "bigdata for dummies", "author": "Tom"},
{"title": "elasticsearch", "author": "Tom"},
{"title": "JS cookbook", "author": "jemmy"},
]
I'd like to query on the index such a way that when I search for blog it should do subquery on on each match. For example:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
}
}
Expected (pseudo) results:
[
{
title: "foo bar",
author_name: "Jemmy",
author_post_count: 2
},
{
title: "magic foo",
author_name: "Tom",
author_post_count: 3
}
]
Here author_post_count is blog post count that the user has authored. If it could return those blog posts instead of count that would be great too. Is this possible? Perhaps the term i'm using not right, but I hope my question is clear.
Try something like this:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
},
"aggs": {
"counting_posts": {
"global": {},
"aggs": {
"authors": {
"terms": {
"field": "author",
"size": 10
}
}
}
}
}
}
Be careful though with terms aggregation because it is considering the actual tokenized list of terms from the index, not what you actually index (lowercase/uppercase, tokenized in a way or another).

highlight does not properly work with exact phrase match search in elasticsearch

I am having problem with highlighting the exact phrase search. when the highlight comes back it highlights all the terms regardless of the exact search. see the example:
Put /test2
PUT /test2/all/1
{
"name" : "climate behaviour change"
}
PUT /test2/all/3
{
"name" : "climate behaviour change it is climate change"
}
and my mapping:
{
"test2": {
"mappings": {
"all": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
and when I run this query:
GET /test2/_search
{
"query": {
"query_string": {
"fields": ["name"],
"query": "\"climate change\""
}
}
,"highlight": {
"fields": {
"name": {"number_of_fragments": 0}
}
}
}
the result is
"hits": [
{
"_index": "test2",
"_type": "all",
"_id": "3",
"_score": 0.23013961,
"_source": {
"name": "climate behaviour change it is climate change"
},
"highlight": {
"name": [
"<em>climate</em> behaviour <em>change</em> it is <em>climate</em> <em>change</em>"
]
}
}
]
since I searched for exact climate change I would expect to have highlight for just the second part of the result, so climate behaviour change should not get highlighted. Does anyone know why this is like that?
or any other way I can get what I want?
By Default query_string uses OR operator see here
Rather use AND operator.
GET /test2/_search
{
"query": {
"query_string": {
"fields": ["name"],
"query": "\"climate change\"",
"default_operator": AND
}
}
,"highlight": {
"fields": {
"name": {"number_of_fragments": 0}
}
}
}

elasticsearch: How to rank first appearing words or phrases higher

For example, if I have the following documents:
1. Casa Road
2. Jalan Casa
Say my query term is "cas"... on searching, both documents have same scores. I want the one with casa appearing earlier (i.e. document 1 here) and to rank first in my query output.
I am using an edgeNGram Analyzer. Also I am using aggregations so I cannot use the normal sorting that happens after querying.
You can use the Bool Query to boost the items that start with the search query:
{
"bool" : {
"must" : {
"match" : { "name" : "cas" }
},
"should": {
"prefix" : { "name" : "cas" }
},
}
}
I'm assuming the values you gave is in the name field, and that that field is not analyzed. If it is analyzed, maybe look at this answer for more ideas.
The way it works is:
Both documents will match the query in the must clause, and will receive the same score for that. A document won't be included if it doesn't match the must query.
Only the document with the term starting with cas will match the query in the should clause, causing it to receive a higher score. A document won't be excluded if it doesn't match the should query.
This might be a bit more involved, but it should work.
Basically, you need the position of the term within the text itself and, also, the number of terms from the text. The actual scoring is computed using scripts, so you need to enable dynamic scripting in elasticsearch.yml config file:
script.engine.groovy.inline.search: on
This is what you need:
a mapping that is using term_vector set to with_positions, and edgeNGram and a sub-field of type token_count:
PUT /test
{
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions",
"index_analyzer": "edgengram_analyzer",
"search_analyzer": "keyword",
"fields": {
"word_count": {
"type": "token_count",
"store": "yes",
"analyzer": "standard"
}
}
}
}
}
},
"settings": {
"analysis": {
"filter": {
"name_ngrams": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "30"
}
},
"analyzer": {
"edgengram_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"name_ngrams"
],
"tokenizer": "standard"
}
}
}
}
}
test documents:
POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}
the query itself:
GET /test/test/_search
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"term": {
"text": {
"value": "cas"
}
}
},
"script_score": {
"script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
},
"boost_mode": "sum"
}
}
]
}
}
}
and the results:
"hits": {
"total": 2,
"max_score": 1.3715843,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1.3715843,
"_source": {
"text": "Casa Road"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.8715843,
"_source": {
"text": "Jalan Casa"
}
}
]
}

Resources