Hi I am using two search query which is giving similar result. what is difference between these two query simple query string and multi match?
1- simple_query_string
{
"size": 50,
"query": {
"bool": {
"should": [
{
"simple_query_string": {
"query": "text search",
"fields": [
"Field1^2",
"Field2^4",
"Field3^6",
"Field4^8",
"Field5^10",
"Field6^12",
"Field7^14",
"Field8^16",
"*^.1"
]
}
}
]
}
},
"sort": [
"_score",
{
"Field6.keyword": {
"order": "desc"
}
}
]
}
2- Multimatch query
GET index/_search
{
"size": 50,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "text search",
"fields": [
"Field1^2",
"Field2^4",
"Field3^6",
"Field4^8",
"Field5^10",
"Field6^12",
"Field7^14",
"Field8^16",
"*^.1"
],
"type": "most_fields"
}
}
]
Both query gives same result in same order. Is there any advantage of any query ?
Both queries are the same as they will be converted to same query string. If you use query string your query will be slightly faster as you Elastic doesn't need to rewrite your query.
All queries in Lucene undergo a "rewriting" process. A query (and its sub-queries) may be rewritten one or more times, and the process continues until the query stops changing. This process allows Lucene to perform optimizations, such as removing redundant clauses, replacing one query for a more efficient execution path, etc. For example a Boolean → Boolean → TermQuery can be rewritten to a TermQuery, because all the Booleans are unnecessary in this case. The rewriting process is complex and difficult to display, since queries can change drastically. Rather than showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). This value is cumulative and contains the total time for all queries being rewritten.
You can check your query performance and rewrite time by setting "profile": "true" in your query, for more information check official documentation of Elastic search here.
Related
When using elasticsearch-7 I'm confused by es compound queries syntax.
Though reading es documents repeatedly but i just find standard syntax of Boolean or Constant score seperately.
As it illuminate,i understand what is 'query context' and what is 'filter context'.But when combining these two query type in a single query i don't know what it mean.
Let's see a example:
GET /classes_test/_search
{
"size": "21",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"class_name": "29386556"
}
}
],
"should": [
{
"term": {
"master": "7033560"
}
},
{
"term": {
"assistant": "7033560"
}
},
{
"term": {
"students": "7033560"
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"term": {
"class_id": 0
}
}
],
"filter": [
{
"term": {
"class_status": "1"
}
}
]
}
}
}
}
}
This query can be executed and response well.Each item in response content has a '_score' value with 1.0.
So,is it mean that the sub bool query as a entirety is in a filter context though it has a 'must' and 'should'?
Also i found boolean query can have a constant score sub query.
Why es allow these syntax but has no more words to explain?
If you use a constant_score query, you'll never get scores different than 1.0, unless you specify boost parameters in which case the score will match those.
If you need scoring you obviously need to ditch constant_score.
In your case, your match query on class_name cannot yield any other score than 1 or 0 since this is basically a yes/no filter, not a matching based on full-text search.
To sum up, all your query executes in a filter context (hence score 0 or 1) since you don't rely on full-text search. So you get scoring whenever you use full-text search, not because you use a match query. In your case, you can merge all must constraints into filter, it won't make any difference since you only have filters (yes/no matches) and no full-text search.
I make several queries to ElasticSearch to retrieve documents by keywords (I match them by code or internal id's). I don't really care about scoring in those queries, just retrieving the documents.
Would wrapping the bool queries I use in a constant_score filter increase performance, or make sense whatsoever?
It make no sense. If you are using bool query then you can apply filter to them.
GET /_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
filter - The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.
Even more constant_score should be used for scoring so if there is match apply "boost" value as a score.
To Sum Up: Use filter for filter and constant_score when you need score
We just upgraded to Elasticsearch 2.3.1 (from 1.7) and we're getting strange search behavior that I can't explain. What seems to happen is that a search request containing a bool query and a sort clause is returning:
Documents that don't seem to match the given search terms in any way.
Wildly different estimates on the total of matching documents each request
A minimal example of a request with this behavior:
post pim_search_1/_search
{
"explain": false,
"track_scores": false,
"sort": [
{
"product_id": {
"order": "desc"
}
}
],
"query": {
"bool": {
"filter": [
{
"terms": {
"publication": [
"public"
]
}
},
{
"query_string": {
"query": "iphone",
"default_operator": "and"
}
}
]
}
}
}
So in this case, a query string for "iphone" returns no iPhones at all. Setting explain to true yields this for the documents that appear to have no matching terms at all:
"_explanation": {
"value": 0,
"description": "Failure to meet condition(s) of required/prohibited clause(s)",
"details": [
{
"value": 0,
"description": "no match on required clause (#ConstantScore(publication:public) #_all:iphone)",
So the document has no matching clauses, but it's still returned?
We've found two workarounds for this behavior:
Sort on _score or leave out the sort clause entirely. Sorting on anything else, like the field above or on _doc gives the wonky behavior.
Include track_scores : true on the request.
So it appears to have something to do with scoring and relevancy. But since we're sorting on a field of our own, we're not interested in relevancy or score. Without the workarounds, the max_score on the response is null and so is the _score of every document.
Is this behavior something that can be explained in any way, or should we be looking at cluster health/configuration/corruption? According to the cluster, its health is green and all shards for this index appear healthy. It's currently a small index with 3 shards (1 replica per shard) over 3 nodes.
Update
I've further investigated the issue and it seems cache related. Specifically, the fielddata cache for the _all field (I'm not very familiar with the internals of Elasticsearch, so please correct me if that's not a thing).
Steps to reproduce
I have a data set that reproduces the problem, leave a comment and I can send it to you.
Use the following query:
post pim_search_1/_search
{
"fields": [
"_all"
],
"explain": true,
"size": 100,
"sort": [
{
"product_id": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "_all",
"query": "surface",
"default_operator": "and"
}
}
],
"filter": [
{
"terms": {
"publication": [
"public"
]
}
}
]
}
}
}
Execute the query. You're searching for "surface" in the query string here and this should result in 22 hits total. This is correct. Execute this query a bunch of times (this seems to matter for step 2).
Change the query string to "iphone". This will result in 22 hits still, even though the dataset contains only one item that should match. The _explanation also mentions that the found documents don't actually match, like my example above.
Execute this: post pim_search_1/_cache/clear
Execute the query again for "iphone". It should now only return 1 hit, which is correct. Also execute this one a bunch of times.
Execute the query again for "surface", this will now return only 1 hit and again the _explanation states that it didn't get a match on the resulting document.
Remove the sort clause from the query and everything appears normal. The same is true for including "track_scores" : true.
Instead of _cache/clear it also works to just restart the cluster.
I say it's related to the _all field because changing the default_field of the query_string to the primitive_name field (an analyzed field) results in the correct behavior. For this example, I've made _all a stored field (it isn't normally with us) and it's returned in the search results so you can inspect it (doesn't appear to contain anything weird).
The above was done on a single node cluster (my local PC) on Elasticsearch 2.3.5.
This Github question seems to be about the same issue as mine, but could not be reproduced at the time and was closed.
This has been fixed in Elasticsearch 2.4:
https://github.com/elastic/elasticsearch/pull/20196
In my ES mapping I have an 'uri' field which is currently set to not_analysed and I'm not allowed to change the mapping.I wanted to search for uri parts with a query_string query like this (this ES query is autogenerated, that is why it is a bit complicated but let's just focus on the query_string part)
{
"sort": [{"updated": {"order": "desc"}}],
"query": {
"bool": {
"must":[{
"query_string": {
"query":"*w3\\.org\\/2014\\/01\\/a*",
"lowercase_expanded_terms": true,
"default_field": "uri"
}
}],
"minimum_number_should_match": 1
}
}, "size": 50}
Now it is usually working, but I've the following url stored (fictional url): http://w3.org/2014/01/Abc.html and this query does not bring it back because of the A-a difference. Setting the expanded terms to false also not solves this. What should I do for this query to be case insensitive?
Thanks for the help in advance.
From the docs, it seems like you need a new analyzer that first transforms to lowercase and then can run the search. Have you tried that?
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html
As I read it, your pattern, lowercase_expanded_terms, only applies to expansions, not to regular words
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
lowercase_expanded_terms
Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Default it true
Try to use match query instead of query string.
{
"sort": [
{
"updated": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"uri": "*w3\\.org\\/2014\\/01\\/a*"
}
}
]
}
},
"size": 50
}
Query string queries are not analyzed and but match queries are analyzed.
What is the difference between such query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test",
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), thus querying multiple fields with a single query doesn't necessarily mean analyzing the query only once.
Keep in mind that the query_string supports the lucene query syntax: AND and OR operators, querying on specific fields, wildcard, phrase queries etc. therefore it needs to be parsed, which I don't think makes a lot of difference here in terms of performance, but it is error prone and might lead to errors. If you don't need all that power, stick to the match query, and if you want to perform the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores returned if you compare the output of multiple match queries and your query_string might be quite different. Using a bool query you effectively build a lucene boolean query, while the query_string uses by default "use_dis_max":"true", which means it uses internally a dis_max query by default. Same happens using the multi_match query. If you set use_dis_max to false a bool query is going to be used internally instead.
I terms of performance, I would say that the second query will have performance benefits because, the first query requires the query string to be analyzed for all the four match sections, while in the second there is only one query string that needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevancy differences, but that you can always fire these two queries and see if there is any difference in relevance from the results fetched.