Elasticsearch: When just filtering, why use the filtered query type

What's the difference between
{
"query": {
"filtered": {
"filter": { "term": { "folder": "inbox" } }
}
}
}
and
{
"query": {
"term": { "folder": "inbox" }
}
}
It seems they both filter the index on the folder field by the inbox value.

A query can run in one of two contexts in Elasticsearch: query context and filter context. Query context answers how well a document matches the query, so it calculates a relevance score. Filter context answers only whether a document matches the query, and no scoring is done.
A query in query context tells you which documents match the query better: the higher the score, the more relevant the document.
A query in filter context behaves like a boolean condition: true if the document matches the query and false if it doesn't.
To answer your question: both queries match the same documents, but the first one does not calculate a score (so it is faster, because score calculation is skipped), whereas the second one calculates a score and is slower by comparison. So if you just want to filter, it is better to tell Elasticsearch that no score needs to be calculated by putting the query in filter context. This saves the computational cost of scoring, which is pure overhead when only filtering is required. That is why there are two types of context.
Sample output for 1st query (filter context):
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0, <-------- no scoring done
}
Sample output for 2nd query (query context):
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9808292 <-------- score calculated
}
So use query context to get relevant matches and filter context to filter out documents. You can use the combination of both as well.
You can read more on query and filter context in the Elasticsearch documentation.
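Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0; on later versions, filter context is expressed with the filter clause of a bool query. As a minimal sketch of the two shapes, here are request bodies built as plain Python dicts. The helper names and the subject field are made up for illustration, and nothing here talks to a cluster.

```python
import json


def filter_only(field, value):
    """Filter context only: documents either match or not, no relevance scoring."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


def scored_with_filter(text_field, text, filter_field, filter_value):
    """Score on a full-text match while restricting candidates with an unscored filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],             # query context
                "filter": [{"term": {filter_field: filter_value}}],  # filter context
            }
        }
    }


# "subject" is a hypothetical full-text field, used only for this example.
print(json.dumps(filter_only("folder", "inbox"), indent=2))
print(json.dumps(scored_with_filter("subject", "invoice", "folder", "inbox"), indent=2))
```

The second body is the combination mentioned above: the match clause contributes to the score, while the term clause only narrows the candidate set.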

I agree with the answer above, but there is one thing to add: query and filter clauses can be used together in one query to reduce execution time.

Related

Does Elasticsearch execute operations in a specific order?

I read that ES is near real-time, and therefore all index/create/update/delete etc. operations are not executed immediately.
Let's say I index 3 documents with the same id, in this order, with 1 millisecond between each, and then force a refresh:
{
"_id": "A",
"_source": { "text": "a" }
}
{
"_id": "A",
"_source": { "text": "b" }
}
{
"_id": "A",
"_source": { "text": "c" }
}
Then, if I search for a document with id "A", I will get 1 result, but which one?
When Elasticsearch performs a refresh, does it execute operations sequentially in the order in which they arrive?
In this instance it comes down to which indexing approach you take.
A bulk request does not guarantee that the order you submitted is the order in which the operations are applied. It might be the same order in (some of) your tests, but Elasticsearch provides no guarantee there.
You can manage this by specifying a version for your document, so that the highest version of a document is always the one that ends up indexed.
Indexing using 3 individual POSTs will be ordered, as you are making 3 separate and sequential requests one after the other. That's because each request has the same _id, so it is routed to the same shard and actioned in the order it is received.
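To make the versioning idea concrete, here is a small in-memory simulation. It assumes the answer refers to external versioning (the version and version_type=external request parameters), where a write is accepted only if its version is higher than the stored one. The store dict and the function name are illustrative and are not part of any Elasticsearch client.

```python
def apply_external_version(store, doc_id, version, source):
    """Accept a write only if its version is higher than the stored one."""
    current = store.get(doc_id)
    if current is not None and version <= current["version"]:
        return False  # stale write; a real cluster would report a version conflict
    store[doc_id] = {"version": version, "source": source}
    return True


store = {}
# The writes arrive out of order: version 3 ("c") lands before versions 1 and 2.
arrivals = [(3, {"text": "c"}), (1, {"text": "a"}), (2, {"text": "b"})]
results = [apply_external_version(store, "A", v, src) for v, src in arrivals]

print(results)     # [True, False, False]
print(store["A"])  # {'version': 3, 'source': {'text': 'c'}}
```

Whatever the arrival order, the document with the highest version wins, which is the property the answer relies on.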

match_phrase in Elasticsearch not working as expected

In my Elasticsearch index I have documents that contain a "fieldname" field with the values "abc" and "abc-def". When I use a match_phrase query to search for documents with fieldname "abc", it also returns documents with the value "abc-def". However, when I query for "abc-def" it works fine. My query is as follows:
GET my_index/_search
{
"query" : {
"match_phrase" : {"fieldname" : "abc"}
}
}
Can someone please help me in understanding the issue?
The match_phrase query analyzes the search term using the analyzer configured for the field (if no analyzer is specified, the standard analyzer is used by default).
A match_phrase query returns those documents whose field contains all the terms from the search term, in the same order.
In your case, "abc-def" is tokenized into "abc" and "def" (because of the standard analyzer). So when you run a match_phrase query for "abc-def", it searches for documents that contain both abc and def in that order, and therefore you get only 1 document in the result.
When you search for "abc", it looks for documents that have abc in the fieldname field. Since both documents contain the token abc, both are returned.
If you want to return only exact matching documents in the result, then you need to change the way the terms are analyzed.
If you have not explicitly defined any mapping, then you need to add .keyword to the fieldname field. With the default dynamic mapping, string fields get a .keyword sub-field of type keyword, which is not analyzed and stores the whole value as a single term (notice the ".keyword" after the field name).
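The difference can be mimicked outside Elasticsearch. The sketch below is only a rough stand-in: it approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters (the real analyzer follows Unicode text segmentation rules), and it treats a keyword field as an exact string comparison. All function names are made up for illustration.

```python
import re


def standard_like_tokens(text):
    """Rough approximation of the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(field_value, phrase):
    """True if the phrase tokens occur consecutively and in order in the field tokens."""
    doc = standard_like_tokens(field_value)
    query = standard_like_tokens(phrase)
    n = len(query)
    return n > 0 and any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


def keyword_matches(field_value, phrase):
    """A keyword field holds the whole value as one term, so only an exact match counts."""
    return field_value == phrase


print(standard_like_tokens("abc-def"))    # ['abc', 'def']
print(phrase_matches("abc-def", "abc"))   # True: the analyzed field contains the token abc
print(phrase_matches("abc", "abc-def"))   # False: the token def is missing
print(keyword_matches("abc-def", "abc"))  # False: the whole value differs
```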
Here is a working example with index data, search query and search result.
Index data:
{
"name":"abc-def"
}
{
"name":"abc"
}
Search Query:
{
"query": {
"match_phrase": {
"name.keyword": "abc"
}
}
}
Search Result:
"hits": [
{
"_index": "67394740",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"name": "abc"
}
}
]

Is it possible to retrieve only doc id and score from Elasticsearch without performing the fetch phase of search?

Understanding "Query Then Fetch" shows that an Elasticsearch query is a two step process of query (find/score/sort matching documents from all servers) and fetching (go back to the servers and collect the content of the matching documents).
Is there a way to retrieve only a list of sorted doc_id and score but avoid the fetch? I know that fetch can be avoided by setting size to 0... but I still need the matching docs and their scores and that would return none.
I figure I might be able to turn off _source, but I'm not sure that would work if, for example, the query portion of the search only knows the internal doc_id and needs to go and retrieve the public doc_id.
GET /_search
{
"_source": false,
"query" : {
"term" : { "user" : "kimchy" }
}
}
Of course, you have to use your own ids, not auto-generated ones.
The scores are separated from the docs' sources so I don't see why a fetch would be necessary to retrieve them.
You can surely turn off _source and then also sort by _id, like so:
GET your_index/_search
{
"_source": false,
"size": 200,
"sort": [
{
"_id": {
"order": "asc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
Interestingly enough, sorting the response by a field from the doc's _source seems to be ~3x faster than sorting by the internal _id (contrary to what I expected). I've tested this with quite a small index, ~1.5M docs. I wonder what you get when you run
GET your_index/_search?request_cache=false
{
"_source": false,
"size": 200,
"sort": [
{
"_id": {
"order": "asc"
}
}
]
}
and then replace _id with another sortable field from the doc's _source.
Indeed, by setting size to 0 we skip the fetch phase. In all other cases, if there is even a single hit, the fetch phase will be executed and there is no way to skip it.
As you correctly noted, the query phase doesn't know the real _ids of the matched documents, only their internal doc ids on their respective shards. The _ids are retrieved as part of the fetch phase; they are kept as a stored field in Lucene. _source is a separate stored field from _id, and it is also loaded during the fetch phase. To speed up the fetch phase, you can disable loading _source if you don't need it. Because it is a separate field from _id, disabling _source doesn't affect the correct loading of _ids.
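As a minimal sketch of the approach described above, the request body below turns off _source and a small helper pulls out only the ids and scores from a search response. The helper names and the sample response are hypothetical; the response dict simply mirrors the usual hits.hits layout and was not produced by a real cluster.

```python
def ids_only_body(query, size=200):
    """Search body that skips loading _source; hits still carry _id and _score."""
    return {"_source": False, "size": size, "query": query}


def ids_and_scores(response):
    """Extract (id, score) pairs from a search response, in the order returned."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


body = ids_only_body({"term": {"user": "kimchy"}})

# Hand-written sample shaped like a search response (illustrative values only).
sample_response = {
    "hits": {
        "hits": [
            {"_index": "your_index", "_id": "7", "_score": 1.3},
            {"_index": "your_index", "_id": "2", "_score": 0.9},
        ]
    }
}
print(ids_and_scores(sample_response))  # [('7', 1.3), ('2', 0.9)]
```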

Elastic Search | How to get original search query with corresponding match value

I'm using ElasticSearch as search engine for a human resource database.
The user submits a competence (e.g. 'disruption'), and ElasticSearch returns all users ordered by best match.
I have configured the field 'competences' to use synonyms, so 'innovation' would match 'disruption'.
I want to show the user (who is performing the search) how a particular search result matched the search query. For this I use the Explain API.
The query works as expected and returns an _explanation to each hit.
Details (simplified a bit) for a particular hit could look like the following:
{
description: "weight(Synonym(skills:innovation skills:disruption))",
value: 3.0988
}
Problem: I cannot see what the original search term was in the _explanation. (As the example above illustrates, I can see that some search query matched 'innovation' or 'disruption', but I need to know which skill the user actually searched for.)
Question: Is there any way to solve this issue (for example, passing a custom 'description' with info about the search query tag to the _explanation)?
Expected Result:
{
description: "weight(Synonym(skills:innovation skills:disruption))",
value: 3.0988,
customDescription: 'innovation'
}
Maybe you can put the original query in the _name field?
Like explained in https://qbox.io/blog/elasticsearch-named-queries:
GET /_search
{
"query": {
"query_string" : {
"default_field" : "skills",
"query" : "disruption",
"_name": "disruption"
}
}
}
You can then find the original query in the matched_queries section of the returned hit:
{
"_index": "testindex",
"_type": "employee",
"_id": "2",
"_score": 0.19178301,
"_source": {
"skills": "disruption"
},
"matched_queries": [
"disruption"
]
}
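A minimal sketch of how this could be wired up on the client side: one helper builds the named query, using the user's search term as the _name, and another reads matched_queries back out of each hit. The helper names are made up, and the sample response is hand-written to mirror the hit shown above.

```python
def named_skill_query(term, field="skills"):
    """query_string query tagged with the user's original term via _name."""
    return {
        "query": {
            "query_string": {"default_field": field, "query": term, "_name": term}
        }
    }


def original_terms(response):
    """Map each hit id to the names of the queries that matched it."""
    return {
        hit["_id"]: hit.get("matched_queries", [])
        for hit in response["hits"]["hits"]
    }


body = named_skill_query("disruption")

# Hand-written sample shaped like a search response (illustrative values only).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "testindex",
                "_id": "2",
                "_score": 0.19178301,
                "_source": {"skills": "disruption"},
                "matched_queries": ["disruption"],
            }
        ]
    }
}
print(original_terms(sample_response))  # {'2': ['disruption']}
```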
Add explain to this solution and I think it would work fine.

Elasticsearch query to get results irrespective of spaces in search text

I am trying to fetch data from Elasticsearch by matching on the name field. I have the following two records:
{
"_index": "sam_index",
"_type": "doc",
"_id": "key",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
and
{
"_index": "sam_index",
"_type": "doc",
"_id": "key1",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
When I search with text like sam, sample, Sa, etc., I am able to fetch both records using a match_phrase_prefix query. The query I tried is
GET sam_index/doc/_search
{
"query": {
"match_phrase_prefix" : {
"name": "sample"
}
}
}
I am not able to fetch the records when I search with the string samplen. I need to search and get results irrespective of spaces in the text. How can I achieve this in Elasticsearch?
First, you need to understand how Elasticsearch works and why it does or doesn't return a given result.
ES works on token matching. Documents that you index go through an analysis process, and the tokens it generates are stored in the inverted index, which is what searches run against.
When you run a query, the query also produces search tokens. For a term query these are the search text exactly as given; for a match query they are produced by the analyzer defined on the searched field. That is why it's important to understand the internals of your search query.
It's equally important to understand the mapping of your index: ES uses the standard analyzer by default on text fields.
You can use the Explain API to understand the internals of the query, such as which search tokens your query generated, how documents matched them, and on what basis the score was calculated.
In your case, I created the name field as text using the word-joined analyzer explained in "Ignore spaces in Elasticsearch", and I was able to get the document containing sample name when searching for samplen.
Let us know if you also want to achieve the same and if it solves your issue.
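The general idea can be sketched in plain Python. This does not reproduce the analyzer from the linked answer; it only assumes that such an analyzer lowercases the text and removes whitespace, so that the field and the search text are both compared as single joined tokens. The function names are illustrative.

```python
def joined(text):
    """Lowercase the text and drop all whitespace, yielding one joined token."""
    return "".join(text.lower().split())


def prefix_match(field_value, search_text):
    """True if the joined search text is a prefix of the joined field value."""
    return joined(field_value).startswith(joined(search_text))


print(joined("Sample Name"))                   # samplename
print(prefix_match("Sample Name", "samplen"))  # True
print(prefix_match("Sample Name", "sample n")) # True
print(prefix_match("Sample Name", "name"))     # False: not a prefix of the joined value
```

With the default standard analyzer the field is indexed as the two tokens sample and name, so nothing starts with samplen; joining the tokens first is what makes the prefix match possible.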

Resources