As per the elasticsearch 5.1 documentation, I have built the following query to implement a basic search functionality on a subset of the piece of software I am building. For some reason, this query never returns any results even if all of the fields are present. All users are guaranteed to have all of these fields, but to be safe I tested it with each individual field and got the same result each time.
"query": {
"multi_match": {
"fields": [
"displayName",
"title",
"team",
"teamLeader"
],
"query": "a",
"fuzziness": "AUTO"
}
}
}
I have also attempted using other types like best_fields, phrase_prefix, etc. to no avail. I know the data is there because my filter query works just fine, but suddenly no data returns after I add this section. Is there anything I can do to better debug this situation?
Related
What is the difference between simple_query_string and query_string in elastic search?
Which is better for searching?
In the elastic search simple_query_string documentation, they are written
Unlike the regular query_string query, the simple_query_string query
will never throw an exception and discards invalid parts of the
query.
but it not clear. Which one is better?
There is no simple answer. It depends :)
In general the query_string is dedicated for more advanced uses. It has more options but as you quoted it throws exception when sent query cannot be parsed as a whole. In contrary simple_query_string has less options but does not throw exception on invalid parts.
As an example take a look at two below queries:
GET _search
{
"query": {
"query_string": {
"query": "hyperspace AND crops",
"fields": [
"description"
]
}
}
}
GET _search
{
"query": {
"simple_query_string": {
"query": "hyperspace + crops",
"fields": [
"description"
]
}
}
}
Both are equivalent and return the same results from your index. But when you will break the query and sent:
GET _search
{
"query": {
"query_string": {
"query": "hyperspace AND crops AND",
"fields": [
"description"
]
}
}
}
GET _search
{
"query": {
"simple_query_string": {
"query": "hyperspace + crops +",
"fields": [
"description"
]
}
}
}
Then you will get results only from the second one (simple_query_string). The first one (query_string) will throw something like this:
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "Failed to parse query [hyperspace AND crops AND]",
"index_uuid": "FWz0DXnmQhyW5SPU3yj2Tg",
"index": "your_index_name"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
...
]
},
"status": 400
}
Hope you are now understand the difference with throwing/not throwing exception.
Which is better? If you want expose the search to some plain end users I would rather recommend to use simple_query_string. Thanks to that end user will get some result in each query case even if he made a mistake in a query. query_string is recommended for some more advanced users who will be trained in how is the correct query syntax so they will know why they do not have any results in every particular situation.
Adding to what #Piotr has mentioned,
What I understand is when you want the external users or consumers want to make use of the search solution, simple query string offers better solution in terms of error handling and limiting what kind of queries users can probably construct.
In other words, if the search solution is available publicly for any consumers to consume the solution, then I guess simple_query_string would make sense, however if I do know who my end-users are in a way I can drive them as what they are looking for, no reason why I cannot expose them via query_string
Also QueryStringQueryBuilder.java makes use of QueryStringQueryParser.java while SimpleQueryStringBuilder.java makes use of SimpleQueryStringQueryParser.java which makes me think that there would be certain limitations in parsing and definitely the creators wouldn't want many features to be managed by end-users. for .e.g dis-max and which is available in query_string.
Perhaps the main purpose of simple query string is to limit end-users to make use of simple querying for their purpose and devoid them of all forms of complex querying and advance features so that we have more control on our search engine (which I'm not really sure about but just a thought).
Plus the possibilities to misuse query_string might be more as only advanced users are capable of constructing certain complex queries in correct way which may be a bit too much for simple users who are looking for basic search solution.
Assuming I have an index with two fields: title and loc, I would like to search in this two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one which has "castle" in its title and "pontivy" in its loc. I tried to simplify the example and the base, it's a bit more complicated. So I tried this query, but it seems not accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
"query": {
"multi_match" : {
"query": "castle pontivy",
"fields": [ "title", "loc" ]
}
}
}
Is it the right way to search in various field and get the one which match the in all the fields?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user type "castle pontivy" and I want to get the "best" result for this query, which is the second because it contains "castle" in "title" and "pontivy" in "loc". In other words I want the result that has the best result in both fields.
As the other posted suggested, you could use a bool query but that might not work for your use case since you have a single search box that you want to query against multiple fields with.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
"query": {
"simple_query_string" : {
"query": "castle pontivy",
"fields": ["title", "loc"],
"default_operator": "and"
}
}
}
So this will try to give you the best documents that match both terms in either of those fields. The default operator is set as AND here because otherwise it is OR which might not give you the expected results.
It is worthwhile to experiment with other options available for this query type as well. You might also explore using a Query String query as it gives more flexibility but the Simple Query String term works very well for most cases.
This can be done by using bool type of query and then matching the fields.
GET _search
{
"query":
{
"bool": {"must": [{"match": {"title": "castle"}},{"match": {"loc": "pontivy"}}]
}
}
}
I'm using Elasticsearch 5.2. I'm executing the below query against an index that has only one document
Query:
GET test/val/_validate/query?pretty&explain=true
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "alkis stackoverflow",
"fields": [
"name",
"job"
],
"type": "most_fields",
"operator": "AND"
}
}
}
}
}
Document:
PUT test/val/1
{
"name": "alkis stackoverflow",
"job": "developer"
}
The explanation of the query is
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow))) #(#_type:val)
I read this as:
Field job must have alkis and stackoverflow
AND
Field name must have alkis and stackoverflow
This is not the case with my document though. The AND between the two fields is actually OR (as it seems from the result I'm getting)
When I change the type to best_fields I get
+(((+job:alkis +job:stackoverflow) | (+name:alkis +name:stackoverflow))) #(#_type:val)
Which is the correct explanation.
Is there a bug with the validate api? Have I misunderstood something? Isn't the scoring the only difference between these two types?
Since you picked the most_fields type with an explicit AND operator, the reasoning is that one match query is going to be generated per field and all terms must be present in a single field for a document to match, which is your case, i.e. both terms alkis and stackoverflow are present in the name field, hence why the document matches.
So in the explanation of the corresponding Lucene query, i.e.
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow)))
when no specific operator is specified between the terms, the default one is an OR
So you need to read this as: Field job must have both alkis and stackoverflow OR field name must have both alkis and stackoverflow.
The AND operator that you apply only concerns all the terms in your query but in regard to a single field, it's not an AND between all fields. Said differently, your query will be executed as a two match queries (one per field) in a bool/should clause, like this:
{
"query": {
"bool": {
"should": [
{ "match": { "job": "alkis stackoverflow" }},
{ "match": { "name": "alkis stackoverflow" }}
]
}
}
}
In summary, the most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. This is not your case and you'd probably better be using cross_fields or best_fields depending on your use case, but certainly not most_fields.
UPDATE
When using the best_fields type, ES generates a dis_max query instead of a bool/should and the | (which is not an OR !!) sign separates all sub-queries in a dis_max query.
I want to get the counts for two items in a column from my indexer in elastic search. There is a column called "category" in my indexer and it contains multiple entries. In which I am interested to get 'billing' and 'expenditure' for multiple dates. For which, I have written the below query and it is not working for two values. I can pull a single value either "billing" or "expenditure" but not two at once.
{"dates":
{"date_histogram":
{"field": "createdDateTime",
"interval": "day"},
"aggs": {"fields":
{"term":
{"field": "type",
"include": ["billing", "expenditure"]
}}}}}
The above code is not working in this case, to make it work I need to change the "include" line to
"include": "billing"
or
"include": "expenditure"
It would be great help, if someone look into this and help.:)
Below answers are working for my post above, now I have come across one more problem with the above post that:
In my 'type' field, I want to filter one more value called "spent on". Here the problem is -- ES considers this two worded word as two terms and the result is not as expected. Please help in this. Just want to filter this two worded word as a single word instead of two.
From ES docs( https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-aggregations-bucket-terms-aggregation.html?q=terms%20agg#_filtering_values):
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values [1.5.0]
Added in 1.5.0. support for arrays of values.
So its possible to use array since 1.5 version
"aggs": {
"aggterm": {
"terms": {
"field": "type",
"include" : ["billing", "expenditure"]
}
}
}
The include expects a regex pattern. As #jgr pointed out this is true only for versions of elasticsearch < 1.5.0. So for the example provided in query
would look something as below for versions < 1.5.0 :
"aggs": {
"aggterm": {
"terms": {
"field": "type",
"include" : "billing|expenditure"
}
}
}
If not the example you have in the OP should work
I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.