Inconsistent behavior of ElasticSearch not_analyzed field - elasticsearch

I am using ES version 2.3. I have index some documents which have the structure like this :
{
"BUSINESSLINE" :"ABC CORP",
"NAME" : "John"
....
...
}
The field BUSINESSLINE is not_analyzed string.
The problem is that this query returns results :
{
"query": {
"multi_match" : {
"query": "ABC",
"fields": [ "_all" ]
}
}
}
But this one does not (It shows no hits!):
{
"query": {
"multi_match" : {
"query": "ABC",
"fields": [ "BUSINESSLINE " ]
}
}
}
Any help is appreciated, I tried to google and research but I am not able to able find any reason for this.
Thanks!

Yes, you are correct. The query matches the document because of _all filed which is a big string constructed by concatenating all fields by the space separator. And it is also analysed which is why your query is being matched.
You can read more about it here.

Related

is there match phrase any query in elasticsearch?

In elasticsearch match_phrase query will match full phrase.
match_phrase_prefix query will match phrase as prefix.
for example:
"my_field": "confidence ab"
will match: "confidence above" and "confidence about".
is there query for "match phrase any" like below example:
"my_field": "dence ab"
should fetch match: "confidence above" and "confidence about"
Thanks
There are 2 ways that you can do this
Store the field values as-is in ES by applying keyword analyzer type in mapping => Do a wildcard search
(OR)
Store the field using ngram tokenizer => Do search your data based on your requirement with or without using standard or keyword search analyzers
usually wildcard search are performance inefficient .
Please do let me know on your progress based on my above suggestions so that I can help you further if needed
You need to define the mapping of your field to keyword like below:
PUT test
{
"mappings": {
"properties": {
"name":{
"type": "keyword"
}
}
}
}
Then search over this field using wildcard like below:
GET test/_search
{
"query": {
"wildcard": {
"name": {
"value": "*dence ab*"
}
}
}
}
Please let me know if your have any problem with this.
In your case, the simplest solution is using Query string query or Simple query string query. The latter one is less strict with the query syntax error.
First, make sure that your field is mapped with type text. The example below create a mapping for field named my_field under the test-index.
{
"test-index" : {
"mappings" : {
"properties" : {
"my_field" : {
"type" : "text"
}
}
}
}
}
Then, for searching, use query string query with wild-cards.
{
"query": {
"query_string": {
"fields": ["my_field"],
"query": "*dence ab*"
}
}
}

elasticsearch multi_match with phrase_prefix not working

I am running the below search query on my index
{
"_source": "false",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["email","name", "company", "phone"],
"query": "tes",
"type" : "phrase_prefix"
}
}
]
}
},
"highlight": {
"fields": {"name": {}, "company" : {}, "email" : {}, "phone" : {}}
}
}
I have some sample data with the field values
name: test paddy
name : test user
name : test logger
name : test
When I run the above query, I do not get any results, but when I change it to "query": "test", I start seeing 1 result "test". I was expecting to see in both cases all the above names that i have. Am I missing something here?
UPDATE
I also noticed that this is working with text fields, but fails with keywords, long fields etc Also, when I tried
{ "query": {
"prefix" : { "phone" : 99 }
}
}
with number fields and keyword fields its working.
So is it like multi_match and prefix don't work well with keyword and number fields?
The issue was that I was running this on keyword fields. I changed it to text and worked like a beauty. Should have read the documentation more clearly!

How to make use of `gt` and `fields` in the same query in Elasticsearch

In my previous question, I was introduced to the fields in a query_string query and how it can help me to search nested fields of a document.
{
"query": {
"query_string": {
"fields": ["*.id","id"],
"query": "2"
}
}
}
But it only works for matching, what if I want to do some comparison? After some reading and testing, it seems queries like range do not support fields. Is there any way I can perform a range query, e.g. on a date, over a field that can be scattered anywhere in the document hierarchy?
i.e. considering the following document:
{
"id" : 1,
"Comment" : "Comment 1",
"date" : "2016-08-16T15:22:36.967489",
"Reply" : [ {
"id" : 2,
"Comment" : "Inner comment",
"date" : "2016-08-16T16:22:36.967489"
} ]
}
Is there a query searching over the date field (like date > '2016-08-16T16:00:00.000000') which matches the given document, because of the nested field, without explicitly giving the address to Reply.date? Something like this (I know the following query is incorrect):
{
"query": {
"range" : {
"date" : {
"gte" : "2016-08-16T16:00:00.000000",
},
"fields": ["date", "*.date"]
}
}
}
The range query itself doesn't support it, however, you can leverage the query_string query (again) and the fact that you can wildcard fields and that it supports range queries in order to achieve what you need:
{
"query": {
"query_string": {
"query": "\*date:[2016-08-16T16:00:00.000Z TO *]"
}
}
}
The above query will return your document because Reply.date matches *date

ElasticSearch - Search for complete phrase only

I am trying to create a search that will return me exactly what i requested.
For instance let's say i have 2 documents with a field named 'Val'
First doc have a value of 'a - Copy', second document is 'a - Copy (2)'
My goal is to search exactly the value 'a - Copy' and find only the first document in my returned results and not both of them with different similarity rankings
When i try most of the usual queries like:
GET test/_search
{
"query": {
"match": {
"Val": {
"query": "a - copy",
"type": "phrase"
}
}
}
}
or:
GET /test/doc/_search
{
"query": {
"query_string": {
"default_field": "Val",
"query": "a - copy"
}
}
}
I get both documents all the time
There is a very good documentation for finding exact values in ES:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
It shows you how to use the term filter and it mentions problems with analyzed fields, too.
To put it in a nutshell you need to run a term filter like this (I've put your values in):
GET /test/doc/_search
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"term" : {
"Val" : "a - copy"
}
}
}
}
}
However, this doesn't work with analyzed fields. You won't get any results.
To prevent this from happening, we need to tell Elasticsearch that
this field contains an exact value by setting it to be not_analyzed.
There are multiple ways to achieve that. e.g. custom field mappings.
Yes, you are getting that because your field is, most likely, analyzed and split into tokens.
You need an analyzer similar to this one
"custom_keyword_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
which uses the keyword tokenizer and the lowercase filter (I noticed you indexed upper case letters, but expect to search with lowercase letters).
And then use a term filter to search your documents.

Favor exact matches over nGram in elasticsearch

I am trying to map a field as nGram and 'exact' match, and make the exact matches appear first in the search results. This is an answer to a similar question, but I am struggling to make it work.
No matter what boost value I specify for the 'exact' field I get the same results order each time. This is how my field mapping looks:
"name" : {
"type" : "multi_field",
"fields" : {
"name" : {
"type" : "string",
"boost" : 2.0,
"analyzer" : "ngram"
},
"exact" : {
"type" : "string",
"boost" : 4.0,
"analyzer" : "simple",
"include_in_all" : false
}
}
}
And this is how the query looks like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"fields":["name","name.exact"],
"query":"Woods"
}
}
}
}
}
Understating how score is calculated
Elasticsearch has an option for producing an explanation with every search result. by setting the explain parameter to be true
POST <Index>/<Type>/_search?explain&format=yaml
{
"query" : " ....."
}
it will produce a lot of output for every hit and that can be overwhelming, but it worth taking some time to understand what it all means
the output of eplian might be harder to read in json, so adding format=yaml makes it easier to read
Understanding why a document is matched or not
you can pass the query to a specific document like below to see explanation how matching is being done.
GET <Index>/<type>/<id>/_explain
{
"query": "....."
}
The multi_field mapping is correct, but the search query needs to be changed like this:
{
"query": {
"filtered": {
"query": {
"multi_match": { # changed from "query_string"
"fields": ["name","name.exact"],
"query": "Woods",
# added this so the engine does a "sum of" instead of a "max of"
# this is deprecated in the latest versions but works with 0.x
"use_dis_max": false
}
}
}
}
}
Now the results take into account the 'exact' match and adds up to the score.

Resources