Elasticseach query filter/term not working when special characters are involved - elasticsearch

The following query is not working when "metadata.name" has "-" in the text like "demo-application-child3" . But if I remove "-" and make the query to "demoapplicationchild3". It works. The same with other field metadata.version. I've the data for both demoapplicationchild3 and demo-application-child3. suggestions please.
{
"query": {
"bool": {
"filter": [
{"term": { "metadata.name": "demo-application-child3" }},
{"term": { "metadata.version": "00.00.100" }}]
}
}
}

term queries are not analyzed see the official doc which clearly mention this
Returns documents that contain an exact term in a provided field.
Which clearly means that index time you are using some custom analyzer which is removing - and joining the tokens ie for demo-application-child3 your custom analyzer would be generating demoapplicationchild3 token, which you can easily confirm using the Analyze api.
If you want to get result either change term query to match query or use the .keyword suffix with your field if mappping is generated dynamically or create another field which is of type keyword which uses no-op analyzer.

Related

ElasticSearch: Using match_phrase for all fields

As a user of ElasticSearch 5, I have been using something like this to search for a given phrase in all fields:
GET /my_index/_search
{
"query": {
"match_phrase": {
"_all": "this is a phrase"
}
}
}
Now, the _all field is going away, and match_phrase does not seem to work like query_string, where you can simply use something like this to run a search for all fields:
"query": {
"query_string": {
"query": "word"
}
}
What is the alternative for a exact phrase search for all fields without using the _all field from version 6.0?
I have many fields per document so specifying all of them in the query is not really a solution for me.
You can find answer in Elasticsearch documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
It says:
Use a custom field and the mapping copy_to parameter
So, you have to create custom fields in source, and copy all other fields to it.

Elasticsearch wrong explanation validate api

I'm using Elasticsearch 5.2. I'm executing the below query against an index that has only one document
Query:
GET test/val/_validate/query?pretty&explain=true
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "alkis stackoverflow",
"fields": [
"name",
"job"
],
"type": "most_fields",
"operator": "AND"
}
}
}
}
}
Document:
PUT test/val/1
{
"name": "alkis stackoverflow",
"job": "developer"
}
The explanation of the query is
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow))) #(#_type:val)
I read this as:
Field job must have alkis and stackoverflow
AND
Field name must have alkis and stackoverflow
This is not the case with my document though. The AND between the two fields is actually OR (as it seems from the result I'm getting)
When I change the type to best_fields I get
+(((+job:alkis +job:stackoverflow) | (+name:alkis +name:stackoverflow))) #(#_type:val)
Which is the correct explanation.
Is there a bug with the validate api? Have I misunderstood something? Isn't the scoring the only difference between these two types?
Since you picked the most_fields type with an explicit AND operator, the reasoning is that one match query is going to be generated per field and all terms must be present in a single field for a document to match, which is your case, i.e. both terms alkis and stackoverflow are present in the name field, hence why the document matches.
So in the explanation of the corresponding Lucene query, i.e.
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow)))
when no specific operator is specified between the terms, the default one is an OR
So you need to read this as: Field job must have both alkis and stackoverflow OR field name must have both alkis and stackoverflow.
The AND operator that you apply only concerns all the terms in your query but in regard to a single field, it's not an AND between all fields. Said differently, your query will be executed as a two match queries (one per field) in a bool/should clause, like this:
{
"query": {
"bool": {
"should": [
{ "match": { "job": "alkis stackoverflow" }},
{ "match": { "name": "alkis stackoverflow" }}
]
}
}
}
In summary, the most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. This is not your case and you'd probably better be using cross_fields or best_fields depending on your use case, but certainly not most_fields.
UPDATE
When using the best_fields type, ES generates a dis_max query instead of a bool/should and the | (which is not an OR !!) sign separates all sub-queries in a dis_max query.

In Elasticsearch match query how to deal with slash

I have a match query searching for a type of doc:
{
"query": {
"bool": {
"should": {
"match": {
"ph1_enc": "EAAQnb1kMr/e2/ADqo"
}
}
}
}
}
"EAAQnb1kMr/e2/ADqo" is the string i'm trying to match, however in the search results I can see multiple records with substring "/e2/" are also returned.
Looks like "/e2/" is indexed separately, so that this could happen.I thought the match query is to do full-text match... Is it because I missed something when creating the template? Any idea?
Add-on instead of reindex, how to modify the query to match the exact value in the query?
Which analyzer do you set in the mapping to index your data?
If you are using the default one (standard analyzer), then according to the documentation, this uses the default tokenizer that seems to split also the text by slash ('/'). The documentation redirects here for more information about the tokenizer.
So, that will index the following words 'EAAQnb1kMr', 'e2', and 'ADqo'. Accordingly, your query value will also been analyzed the same way the field was indexed. That is why documents with 'e2' are also being returned.
If you don't need to tokenize the 'ph1_enc' field, you can just set its type in the mapping as 'keyword'.
"properties": {
"ph1_enc": {
"type": "keyword"
}
}
That will not analyze the field and it will match exactly while you query.
I hope that it helps.

How to deal with punctuation in an ElasticSearch field

I have a field in a document stored in Elastic Search, which I want to be analyzed as a full text field. In one case, it contains a value for the name field like this:
A&B Corp
I want to be able to search the documents for an auto-complete widget, using a query like this (suppose the user typed A&B into the autocomplete field). The intention is to match documents that contain the any terms with the typed prefix.
{ "query": {
"filtered": {
"query": {
"query_string": {
"query": "A&B*",
"fields": [
"firstName",
"lastName",
"name",
"key",
"email"
]
}
},
"filter": {
"terms": {
"environmentId": [
"foo"
]
}
}
}
}
}
```
My mapping for the name field looks like this:
"name": {
"type": "string"
},
But, I get no results. The query structure works for documents that don't have & in the field, so I'm pretty sure that is part of the problem.
But, I'm not sure how to deal with this. I am pretty sure I still want to analyze the field for full text search.
In addition, if I add a space before the * in the query (ie, "query": "A&B *",) then I get results including A&B, so I don't think it is just discarding the ampersand and treating the A and B as separate terms.
Should I change my mapping? The query?
The Query_string query has a set of reserved characters that needs to be escaped.
query_string : Read the reserved characters section
So to search for
'A&B' (or) 'A&B Corp' (or) 'A&B....'
Your query must be "A&B\\*" such that the query_string parser treats
it as a * wildcard operator.
While currently your query is searching for exact match of
"A&B*" it expects asterik to be part of your data.
And when you search "A&B *" the whitespace is a reserved
character so its
now searching for "A&B" (or) "*" and hence you get a match in this
case.

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.

Resources