What's the difference between simple_query_string and query_string? - elasticsearch

I have a nested field, source, in my index that looks like this:
"source": [
{
"name": "source_c","type": "type_a"
},
{
"name": "source_c","type": "type_b"
}
]
I used a query_string query and a simple_query_string query to search for type_a and got two different results.
query_string
{
"size" : 3,
"query" : {
"bool" : {
"filter" : {
"query_string" : {
"query" : "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163459 hits in 294088 docs.
simple_query_string
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
I got 163505 hits in 294088 docs.
I randomly generated only three types: type_a, type_b and type_c. So the difference between 163459 and 163505 hits is very small relative to 294088 docs.
I only found one piece of information in the Elasticsearch Reference [2.1]:
Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query.
I don't think that is the reason for the difference.
What causes the slightly different results between query_string and simple_query_string?

As far as I know, nested query syntax is not supported for either query_string or simple_query_string. It is an open issue, and this is the PR regarding that issue.
Then how are you getting results? Here the Explain API will help you understand what is going on. Take this query:
{
"size": 3,
"query": {
"bool": {
"filter": {
"simple_query_string": {
"query": "source:\"source.type:=\"type_a\"\""
}
}
}
}
}
Have a look at the output and you will see:
"description": "ConstantScore(QueryWrapperFilter(_all:source _all:source.type _all:type_a))",
So what is happening here is that ES is looking for the terms source, source.type or type_a in the _all field; it finds type_a and returns the result.
You will find something similar for query_string using the Explain API.
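If you just want to see those explanations alongside your normal search hits, one option is to set "explain": true in the search body. Below is a minimal Python sketch that only builds the request body; the helper name with_explain is my own, and sending the body to your cluster is left out.

```python
import json


def with_explain(search_body):
    """Return a copy of a search request body with per-hit explanations enabled.

    Hypothetical helper: it only sets the top-level "explain" flag of the
    search request and leaves the original dict untouched.
    """
    body = dict(search_body)
    body["explain"] = True
    return body


# The simple_query_string request from the question.
search_body = {
    "size": 3,
    "query": {
        "bool": {
            "filter": {
                "simple_query_string": {
                    "query": 'source:"source.type:="type_a""'
                }
            }
        }
    },
}

explained = with_explain(search_body)
print(json.dumps(explained, indent=2))
```

Each hit in the response should then carry an _explanation section with a description like the one quoted above.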
Also, query_string and simple_query_string have different syntax; for example, field_name:search_text is not supported in simple_query_string.
The correct way to query nested objects is to use a nested query.
EDIT
This query will give you desired results.
{
"query": {
"nested": {
"path": "source",
"query": {
"term": {
"source.type": {
"value": "type_a"
}
}
}
}
}
}
Hope this helps!!
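For what it's worth, here is a small Python sketch of how a nested query like the one above could be built programmatically, extended so that several conditions must hold on the same nested object (for example name and type together). The helper name nested_terms_query is hypothetical, and it assumes the nested fields are indexed so that term matches the exact stored value.

```python
import json


def nested_terms_query(path, terms):
    """Build a nested query whose term clauses must all match the SAME nested object.

    Hypothetical helper: `path` is the nested field, `terms` maps full field
    names (e.g. "source.type") to exact values.
    """
    for field in terms:
        if not field.startswith(path + "."):
            raise ValueError(f"{field!r} is not under nested path {path!r}")
    clauses = [{"term": {field: {"value": value}}} for field, value in terms.items()]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"must": clauses}},
            }
        }
    }


body = nested_terms_query(
    "source", {"source.name": "source_c", "source.type": "type_a"}
)
print(json.dumps(body, indent=2))
```

With a single entry in terms this reduces to the same condition as the query above, just wrapped in a bool.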

According to the documentation, simple_query_string is meant to be used with unsafe input.
Users can enter anything and it will not throw an exception if the input is invalid; it simply discards the invalid parts.

Related

How to write ElasticSearch query with AND condition

I am trying to write an Elasticsearch query that searches the data with two conditions, something like the below:
{
"query": {
"match": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
}
The above query won't work because the [match] query doesn't support multiple fields. Is there a way I can achieve this?
You can use a bool query with a must clause.
must means: The clause (query) must appear in matching documents. These clauses must match, like logical AND.
To know about the difference between must and should refer to this SO answer
Adding a working example with sample docs and a search query.
Index Sample Data:
{
"trackingId":"track4324234234244",
"log_message":"downstream request-response"
}
{
"trackingId":"track4324234234244",
"log_message":"downstream"
}
{
"trackingId":"tracks4324234234244",
"log_message":"downstream request-response"
}
Search query:
{
"query": {
"bool": {
"must": [
{
"match": {
"trackingId": "track4324234234244"
}
},
{
"match": {
"log_message": {
"query": "downstream request-response",
"operator": "and"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.8570712,
"_source": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
]
Apart from bool, you can also use simple_query_string as shown below:
POST <your_index_name>/_search
{
"query": {
"simple_query_string": {
"fields": ["trackingId", "log_message"],
"query": "track4324234234244 downstream request-response",
"default_operator": "AND"
}
}
}
Note how I've just added all the terms and made use of default_operator: AND so that it returns only documents having all the terms present in the fields.
There is also query_string; however, I would recommend the query above, because query_string is strict: it throws an error if the query string has any syntax errors, while simple_query_string does not.
POST <your_index_name>/_search
{
"query": {
"query_string": {
"fields": ["trackingId", "log_message"],
"query": "(track4324234234244) AND (downstream request-response)",
"default_operator": "AND"
}
}
}
As for when to use simple_query_string: it is mostly useful when you want to expose the query string or terms to end users.
Hope that helps!

ElasticSearch must-terms does not return data

My Elasticsearch bool query with must and terms does not work: the data has the clientId value "08d71bc7-c4ab-6e1d-f858-cf3448242e8b", but the result is empty. I am using elasticsearch:6.7.1. Do you know what the problem is?
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{ "terms": { "clientId": ["08d71bc7-c4ab-6e1d-f858-cf3448242e8b", "08d71bc7-c4ab-6e1d-f858-cf3448242e8c"] } },
{
"query_string": {
"query": "*d*",
"fields": ["name", "description", "title"]
}
},
{ "query_string": { "query": "1", "fields": ["type"] } }
]
}
}
}
I share sample data
I haven't worked much with "query_string", but if you remove those clauses and run your query, I'm sure it should give you at least some results. If so, the "query_string" clauses are what is causing the problem.
I first recommend using "filter" instead of "must".
Consider using a regexp query in place of your first "query_string". I found here how to query multiple fields with regexp.
For the second, it would be enough to use "term" instead of "query_string".
Hope this is helpful! :D
The search results depend on how clientId is analyzed. If clientId is a 'keyword' field your query should work as expected, but if it is a 'text' field the value may be tokenized into smaller parts (split at the dashes).
You can check the clientId field's type in the index mappings, and also run the analyze API to check the tokenization: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
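To illustrate the tokenization point, here is a rough Python sketch. The function approx_standard_tokens is only my approximation of what the standard analyzer does to a value like this (lowercase, split on non-alphanumeric characters); the real analyzer follows Unicode segmentation rules, so treat the analyze API as the authority. The clientId.keyword sub-field is also an assumption: it exists only if the field was mapped dynamically with the default text plus keyword mapping.

```python
import json
import re


def approx_standard_tokens(value):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[0-9a-z]+", value.lower())


client_id = "08d71bc7-c4ab-6e1d-f858-cf3448242e8b"
tokens = approx_standard_tokens(client_id)
print(tokens)

# Request body for the analyze API (POST <your_index_name>/_analyze),
# which shows the tokens Elasticsearch really produces.
analyze_body = {"analyzer": "standard", "text": client_id}
print(json.dumps(analyze_body))

# terms looks up the exact value, so it needs a field that stores the
# whole UUID as one token. "clientId.keyword" is assumed, not confirmed.
terms_filter = {
    "query": {
        "bool": {
            "filter": [{"terms": {"clientId.keyword": [client_id]}}]
        }
    }
}
print(json.dumps(terms_filter, indent=2))
```

If the field is 'text', the full UUID is not among the indexed tokens, which is why a terms clause on the whole value finds nothing.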

elasticsearch added wildcard fails query

Works as expected:
{
"query": {
"query_string": {
"query": "Hofstetten-Grünau"
}
}
}
An added wildcard at the end delivers no results, and I wonder why:
{
"query": {
"query_string": {
"query": "Hofstetten-Grünau*"
}
}
}
How can I fix it?
Elasticsearch v5.3.2
This delivers results:
{
"query": {
"query_string": {
"query": "Hofstetten*"
}
}
}
I use a single search field. The end user can freely use wildcards as they see fit. A user might type in:
hofstetten grünau
+ort:hofstetten-grünau
+ort:Hofstetten-G*
So using a match query won't work for me.
I am using Jest (Java Annotations) as Mapping, and using "default" for this field. My index mapping declares nothing special for the field:
{
"mappings": {
"_default_": {
"date_detection": false,
"dynamic_templates": [{
}]
}
}
}
Adding the wildcard "*" at the end of your query string is causing the query analyzer to interpret the dash between "Hofstetten" and "Grünau" as a logical NOT operator. So you're actually searching for documents that contain Hofstetten but do NOT contain Grünau.
You can verify this by doing the following variations of your search:
"query": "Hofstetten-XXXXX" #should not return results
"query": "Hofstetten-XXXXX*" #should return results
To fix this I would recommend using a match query instead of a query_string query:
{"query": {"match": { "city": "Hofstetten-Grünau" }}}
(with whatever your appropriate field name is in place of city).

Search in every field with a fixed parameter

Perhaps it's a basic question: I need to search in every indexed field while requiring a specific fixed value for another field.
How can I do it?
Currently I have a simple: query( "aValue", array_of_models )
I tried many options without success, for example:
query({
"query": {
"bool": {
"query": "aValue",
"filter": {
"term": {
"published": "true"
}
}
}
}
})
I would prefer to avoid specifying the fields to search in, because I use the same search params for different models.
I found a solution; perhaps it's not optimized, but it works:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": "aValue"
}
}
],
"filter": {
"term": {
"published": true
}
}
}
}
}
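One thing to watch with the query above: per the bool query documentation, when a bool query in query context also has a filter (or must) clause, the should clauses are optional by default, so documents that pass the filter can match even without containing the search text. A sketch of a variant that makes the text match mandatory by using must is below. The helper name search_all_with_filter is hypothetical, and it still relies on the _all field being available in your version and mapping.

```python
import json


def search_all_with_filter(text, filter_field, filter_value, all_field="_all"):
    """Require a text match on the catch-all field AND an exact filter value.

    Hypothetical helper; `all_field` defaults to "_all", which assumes that
    field is enabled for the index.
    """
    return {
        "query": {
            "bool": {
                # must: the document has to match the text and is scored by it
                "must": [{"match": {all_field: text}}],
                # filter: yes/no condition that does not affect the score
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }


body = search_all_with_filter("aValue", "published", True)
print(json.dumps(body, indent=2))
```

Alternatively, keeping should and adding "minimum_should_match": 1 to the bool query ought to have the same effect.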
Not sure if I understood your intention correctly.
The _all field is enabled by default, so unless you have a special mapping, every indexed field value is added as a text string to the _all field.
You can use the
Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
Simple Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
With a simple query like this, that should work for you.
GET my_index/_search
{
"query": {
"simple_query_string": {
"query": "aValue",
"fields": []
}
}
}
Both query types have parameters that should suffice for your use case, IMHO.

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{ "query":
{ "match":
{ "last_name":
{ "query": "Beach",
"type": "phrase",
"fuzziness": 2
}
}
}
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a bool query with 2 should clauses.
The first is your current query, and the second is a query that only gives exact matches; give that one a big boost (like 10.0).
That should put your exact matches on top while still listing your partial matches.
I tried to edit "Constantijn"'s answer above to include a sample based on it, but the edit is still not appearing (pending approval). So I will just put a sample here instead...
{
"query": {
"bool": {
"should": [
{
"match": {
"last_name": {
"query": "Beach",
"fuzziness": 2,
"boost": 1
}
}
},
{
"match": {
"last_name": {
"query": "Beach",
"boost": 10
}
}
}
]
}
}
}
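If you build this query in code, the same idea can be parameterized. This is a minimal Python sketch; the helper name exact_first_query and its defaults are my own choices, not anything from the Elasticsearch API. Documents that match both clauses (the exact spelling) collect the score of the fuzzy clause plus the boosted exact clause, so they should rank above fuzzy-only hits.

```python
import json


def exact_first_query(field, text, fuzziness=2, exact_boost=10.0):
    """Bool/should query: a fuzzy match for recall plus a boosted exact match for ranking.

    Hypothetical helper; `fuzziness` may also be the string "AUTO".
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches "Beach" as well as near misses such as "Beech".
                    {"match": {field: {"query": text, "fuzziness": fuzziness}}},
                    # Matches only the term as typed and adds a large boost.
                    {"match": {field: {"query": text, "boost": exact_boost}}},
                ]
            }
        }
    }


body = exact_first_query("last_name", "Beach")
print(json.dumps(body, indent=2))
```

Since there are no must or filter clauses here, at least one should clause has to match, so fuzzy-only hits are still returned, just lower in the list.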

Resources