Elasticsearch - Search for complete phrase only

I am trying to create a search that returns exactly what I requested.
For instance, let's say I have 2 documents with a field named 'Val'.
The first document has the value 'a - Copy', the second has 'a - Copy (2)'.
My goal is to search for exactly the value 'a - Copy' and get only the first document back, not both of them with different relevance scores.
When I try the usual queries, such as:
GET test/_search
{
"query": {
"match": {
"Val": {
"query": "a - copy",
"type": "phrase"
}
}
}
}
or:
GET /test/doc/_search
{
"query": {
"query_string": {
"default_field": "Val",
"query": "a - copy"
}
}
}
I always get both documents.
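To see why both documents come back, here is a rough Python model (an illustration, not Elasticsearch code) of what the default standard analyzer does to these values and of the positional check a phrase query performs. The helper names are made up, and the regex tokenizer is only a simplification of the real standard tokenizer that holds for simple ASCII text like this.

```python
import re


def standard_analyze(text):
    """Simplified stand-in for the standard analyzer: split on non-word
    characters and lowercase each token (no stop words, no stemming)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def phrase_match(doc_value, query):
    """True if the analyzed query tokens appear at consecutive positions in
    the analyzed document value, i.e. a phrase query with slop 0."""
    doc_tokens = standard_analyze(doc_value)
    query_tokens = standard_analyze(query)
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


docs = {1: "a - Copy", 2: "a - Copy (2)"}

print(standard_analyze(docs[2]))  # ['a', 'copy', '2']  (the "-" and "()" disappear)
hits = [doc_id for doc_id, value in docs.items() if phrase_match(value, "a - copy")]
print(hits)  # [1, 2]
```

Both values contain the tokens a and copy next to each other, so both match; the extra token 2 only affects the second document's score.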

There is very good documentation on finding exact values in ES:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
It shows you how to use the term filter and also mentions the problems with analyzed fields.
In a nutshell, you need to run a term filter like this (I've put your values in):
GET /test/doc/_search
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"term" : {
"Val" : "a - Copy"
}
}
}
}
}
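However, this doesn't work with analyzed fields: you won't get any results.
To prevent this from happening, we need to tell Elasticsearch that this field contains an exact value by setting it to be not_analyzed (in Elasticsearch 5.0 and later, map it as type keyword instead).
There are multiple ways to achieve that, e.g. custom field mappings. Note that a not_analyzed field is matched case-sensitively, so the term must be exactly 'a - Copy'.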
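A rough Python model of the difference (illustration only; the index is just a dict and the helper names are hypothetical): a term query looks up its input literally, so it can only hit a not_analyzed field, and only with the exact original casing.

```python
import re


def standard_analyze(text):
    """Simplified standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """not_analyzed / keyword: the whole value is indexed as a single term."""
    return [text]


def build_index(docs, analyzer):
    """Toy inverted index mapping each indexed term to the ids containing it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, term):
    """A term query is not analyzed: the literal term is looked up as-is."""
    return sorted(index.get(term, set()))


docs = {1: "a - Copy", 2: "a - Copy (2)"}
analyzed_index = build_index(docs, standard_analyze)
exact_index = build_index(docs, not_analyzed)

print(term_query(analyzed_index, "a - Copy"))  # [] (only 'a', 'copy', '2' are indexed)
print(term_query(exact_index, "a - Copy"))     # [1]
print(term_query(exact_index, "a - copy"))     # [] (exact match is case-sensitive)
```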

Yes, you are getting that because your field is, most likely, analyzed and split into tokens.
You need an analyzer similar to this one:
"custom_keyword_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
which uses the keyword tokenizer and the lowercase filter (I noticed you indexed upper-case letters but expect to search with lower-case letters).
Set it as the analyzer of the Val field in your mapping, and then use a term filter to search your documents.
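With the keyword tokenizer plus the lowercase filter, the whole value becomes one lowercased term. A small Python model (hypothetical helper names, a dict standing in for the index) shows the consequence: a term query for 'a - copy' hits only the first document, but because term queries are not analyzed, you have to lowercase the search value yourself. A match query would run the field's analyzer on the input instead.

```python
def keyword_lowercase(text):
    """keyword tokenizer + lowercase filter: one token, lowercased."""
    return [text.lower()]


def build_index(docs, analyzer):
    """Toy inverted index mapping each indexed term to the ids containing it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, term):
    """Term queries skip analysis, so the term is looked up exactly as given."""
    return sorted(index.get(term, set()))


def match_query(index, text, analyzer):
    """A match query analyzes its input with the field's analyzer first."""
    hits = set()
    for term in analyzer(text):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {1: "a - Copy", 2: "a - Copy (2)"}
index = build_index(docs, keyword_lowercase)

print(term_query(index, "a - copy"))                      # [1]
print(term_query(index, "a - Copy"))                      # [] (not lowercased)
print(match_query(index, "a - Copy", keyword_lowercase))  # [1]
```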

Related

Is there a "match phrase any" query in Elasticsearch?

In Elasticsearch, the match_phrase query matches a full phrase, and the match_phrase_prefix query matches a phrase as a prefix.
For example:
"my_field": "confidence ab"
will match "confidence above" and "confidence about".
Is there a query for "match phrase any", as in the example below?
"my_field": "dence ab"
should match "confidence above" and "confidence about".
Thanks
There are 2 ways you can do this:
Store the field values as-is in ES by using the keyword type (or the keyword analyzer) in the mapping => do a wildcard search
(OR)
Store the field using an ngram tokenizer => search your data as required, with or without the standard or keyword search analyzers
Wildcard searches are usually inefficient.
Please let me know how you get on with these suggestions so that I can help you further if needed.
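To illustrate the ngram option, here is a rough Python model (not Elasticsearch code; the gram size of 3 and the helper names are assumptions for the example). It indexes every 3-character gram of the whole lowercased value, spaces included, and treats a document as a match when it contains all of the query's grams, roughly like a match query with operator and. This is an approximation: the grams can all be present without being contiguous, so expect occasional false positives.

```python
def ngrams(text, n=3):
    """All character n-grams of the lowercased value, spaces included."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_match(doc_value, query, n=3):
    """Match when every n-gram of the query also occurs in the document."""
    return ngrams(query, n) <= ngrams(doc_value, n)


values = ["confidence above", "confidence about", "evidence based"]
print([v for v in values if ngram_match(v, "dence ab")])
# ['confidence above', 'confidence about']
```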
You need to map your field as keyword, like below:
PUT test
{
"mappings": {
"properties": {
"name":{
"type": "keyword"
}
}
}
}
Then search over this field with a wildcard query, like below:
GET test/_search
{
"query": {
"wildcard": {
"name": {
"value": "*dence ab*"
}
}
}
}
Please let me know if you have any problems with this.
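The wildcard query on a keyword field compares the pattern against the whole, unanalyzed value. Python's fnmatch is a close enough stand-in to show the behaviour (an approximation: fnmatch also understands [...] character classes, which the Elasticsearch wildcard query does not; the helper name is made up). Note that the match is case-sensitive, so a value stored as "Confidence Above" is not found by *dence ab*.

```python
import fnmatch


def wildcard_query(values, pattern):
    """Keep the values whose whole string matches the pattern, case-sensitively
    (* = any run of characters, ? = exactly one character)."""
    return [value for value in values if fnmatch.fnmatchcase(value, pattern)]


values = ["confidence above", "confidence about", "Confidence Above", "evidence based"]
print(wildcard_query(values, "*dence ab*"))
# ['confidence above', 'confidence about']
```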
In your case, the simplest solution is to use a query string query or a simple query string query. The latter is more lenient about query syntax errors.
First, make sure that your field is mapped with type text. The mapping below (as returned by GET test-index/_mapping) shows a field named my_field in test-index mapped as text.
{
"test-index" : {
"mappings" : {
"properties" : {
"my_field" : {
"type" : "text"
}
}
}
}
}
Then, for searching, use a query string query with wildcards:
{
"query": {
"query_string": {
"fields": ["my_field"],
"query": "*dence ab*"
}
}
}
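One thing to be aware of: on a text field the wildcard clauses are matched against individual tokens, not against the whole value, and no single token contains a space. The rough Python model below assumes the query parser treats *dence and ab* as two separate clauses combined with the default OR operator (helper names are made up, and the regex tokenizer is only a simplification of the standard analyzer). Under that assumption the query also returns documents that satisfy just one of the clauses.

```python
import fnmatch
import re


def standard_analyze(text):
    """Simplified standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def query_string_wildcards(doc_value, query):
    """Split the query on whitespace into wildcard clauses and OR them
    together; each clause is tested against the document's individual tokens."""
    tokens = standard_analyze(doc_value)
    clauses = query.lower().split()
    return any(
        fnmatch.fnmatchcase(token, clause) for clause in clauses for token in tokens
    )


values = ["confidence above", "confidence about", "evidence based", "abstract art"]
print([v for v in values if query_string_wildcards(v, "*dence ab*")])
# ['confidence above', 'confidence about', 'evidence based', 'abstract art']
```

If you need every clause to match, setting default_operator to AND in the query_string query tightens this, although it still does not require the two parts to be adjacent.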

Elasticsearch term query not finding hits for alphanumeric keyword text

How would I use the term or terms query in the situation below?
The mapping for the field paidagentnumber is as follows:
"paidagentnumber": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
GET statements_full/statements/_search?q=paidagentnumber:"017M0"
finds 80 records in the system,
but the following GET does not return any records:
GET statements_full/statements/_search
{
"query": {
"term" : {"paidagentnumber":"017M0"}
}
}
But
GET statements_full/statements/_search
{
"query": {
"term" : {"paidagentnumber":"21212"}
}
}
does find hits, i.e. if the paidagentnumber value contains only digits (no letters), the term query finds a hit.
The paidagentnumber field is analyzed (since it is "type": "text"), so its contents are lowercased by the analyzer. Term queries are not analyzed: according to the docs, they find "the exact term specified in the inverted index", so the lowercasing is not applied to the query term, hence the mismatch. "017M0" is not an exact match for "017m0".
You have several options here:
Use a query_string query. This is the same type of query the q param creates, so if that works the way you want, use it:
"query": {
"query_string" : {
"query": "paidagentnumber:017M0"
}
}
If you want to use term queries, use a keyword field. You've already created one here, so just search the paidagentnumber.keyword field instead.
"query": {
"term" : {
"paidagentnumber.keyword": "017M0"
}
}
If you didn't want any text analysis on the paidagentnumber field in the first place, you could alternatively remove the text type from its definition:
"paidagentnumber": {
"type": "keyword",
"ignore_above": 256
}
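A rough Python model of the mismatch (illustration only: a dict stands in for the inverted index, the regex tokenizer approximates the standard analyzer, and the helper names are made up). The text field stores 017m0, the keyword sub-field stores 017M0 unchanged (unless it is longer than ignore_above), and the term query looks up exactly what you pass it.

```python
import re


def standard_analyze(text):
    """Simplified standard analyzer for the text field: split, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def keyword(text, ignore_above=256):
    """Keyword sub-field: the value as-is, skipped if longer than ignore_above."""
    return [text] if len(text) <= ignore_above else []


def build_index(docs, analyzer):
    """Toy inverted index mapping each indexed term to the ids containing it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, term):
    """Term queries are not analyzed: look the literal term up."""
    return sorted(index.get(term, set()))


docs = {1: "017M0", 2: "21212"}
text_index = build_index(docs, standard_analyze)  # paidagentnumber
keyword_index = build_index(docs, keyword)        # paidagentnumber.keyword

print(term_query(text_index, "017M0"))     # [] (indexed as '017m0')
print(term_query(text_index, "21212"))     # [2] (digits are unaffected by lowercasing)
print(term_query(keyword_index, "017M0"))  # [1]
```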

How to make use of `gt` and `fields` in the same query in Elasticsearch

In my previous question, I was introduced to the fields parameter of the query_string query and how it can help me search nested fields of a document.
{
"query": {
"query_string": {
"fields": ["*.id","id"],
"query": "2"
}
}
}
But it only works for matching; what if I want to do a comparison? After some reading and testing, it seems that queries like range do not support fields. Is there any way I can perform a range query, e.g. on a date, over a field that can appear anywhere in the document hierarchy?
For example, consider the following document:
{
"id" : 1,
"Comment" : "Comment 1",
"date" : "2016-08-16T15:22:36.967489",
"Reply" : [ {
"id" : 2,
"Comment" : "Inner comment",
"date" : "2016-08-16T16:22:36.967489"
} ]
}
Is there a query on the date field (like date > '2016-08-16T16:00:00.000000') that matches this document because of the nested field, without explicitly giving the path Reply.date? Something like this (I know the following query is incorrect):
{
"query": {
"range" : {
"date" : {
"gte" : "2016-08-16T16:00:00.000000",
},
"fields": ["date", "*.date"]
}
}
}
The range query itself doesn't support this. However, you can leverage the query_string query (again): it lets you wildcard field names and it supports range syntax, so you can achieve what you need with:
{
"query": {
"query_string": {
"query": "\\*date:[2016-08-16T16:00:00.000Z TO *]"
}
}
}
The above query will return your document because Reply.date matches *date.
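The rough Python model below (not Elasticsearch code; the helper names are made up) shows what *date refers to: object fields are named by their dotted path, so the document exposes date and Reply.date, and the wildcard pattern matches both. It compares the timestamps as strings, which only works because they share one ISO-8601 layout; Elasticsearch itself parses date fields into real dates.

```python
import fnmatch


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs, descending into objects and arrays
    of objects, e.g. ('Reply.date', '2016-08-16T16:22:36.967489')."""
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                yield from flatten(item, path + ".")
            else:
                yield path, item


def matching_fields(doc, field_pattern, gte):
    """Paths that match the wildcard pattern and hold a string value >= gte."""
    return [
        path
        for path, value in flatten(doc)
        if fnmatch.fnmatchcase(path, field_pattern)
        and isinstance(value, str)
        and value >= gte
    ]


doc = {
    "id": 1,
    "Comment": "Comment 1",
    "date": "2016-08-16T15:22:36.967489",
    "Reply": [{"id": 2, "Comment": "Inner comment", "date": "2016-08-16T16:22:36.967489"}],
}

print(matching_fields(doc, "*date", "2016-08-16T16:00:00.000000"))  # ['Reply.date']
```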

Elasticsearch doesn't return results for a specific term search

I am attempting to run a query that filters on a specific term. This is the query I am running:
{
"query": {
"filtered": {
"filter": {
"term": {
"tags": "sports"
}
}
}
},
"sort": {
"timestamp": "desc"
}
}
When I run the same query on a different field (e.g. "type": "blog_post") it works, so I am confident in the syntax.
I checked that tags is properly mapped (at "http://server_name/index/_mapping"), and it is.
I also checked that there are documents with "tags" : "sports" in Elasticsearch.
Any ideas what the issue could be? It is only that field, all others work, and "tags" is indexed.
What mapping/analyzer have you defined for the field "tags"? If you have not defined any analyzer, it is analyzed with the standard analyzer, which tokenizes and lowercases the value but does not stem it; if a stemming analyzer such as english is configured, "sports" is indexed as the stemmed token "sport".
A term search or term filter does not analyze its input and looks for an exact match, so a search for the term "sports" won't match the indexed token "sport".
You should either change the mapping for tags to "not_analyzed" or use a query other than term, such as a query string query.
Based on the use case you've described, I assume tags is mapped as an array of values. That said, the term filter can only be used for exact matches.
What I would try is to use a terms filter or an exists filter instead and change the query to this:
"terms" : { "tags" : ["sports"] }
or this (note that exists only checks that the field has a value, not which value it has):
"exists" : { "field" : "tags" }

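For reference, a rough Python model of how the two suggested filters behave on an array field (illustration only; the documents and helper names are made up and the values are treated as unanalyzed, keyword-style terms): terms matches when any indexed value equals any of the listed values, while exists only checks that the field has at least one non-null value.

```python
def as_list(value):
    """Normalize a field value: missing -> [], scalar -> [scalar]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def terms_filter(docs, field, values):
    """Ids of documents where any value of `field` equals any of `values`."""
    wanted = set(values)
    return [doc_id for doc_id, doc in docs.items() if wanted & set(as_list(doc.get(field)))]


def exists_filter(docs, field):
    """Ids of documents with at least one non-null value for `field`."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if any(v is not None for v in as_list(doc.get(field)))
    ]


docs = {
    1: {"type": "blog_post", "tags": ["sports", "news"]},
    2: {"type": "blog_post", "tags": ["tech"]},
    3: {"type": "blog_post"},
}

print(terms_filter(docs, "tags", ["sports"]))  # [1]
print(exists_filter(docs, "tags"))             # [1, 2]
```
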
Favor exact matches over nGram in Elasticsearch

I am trying to map a field both as nGram and as an 'exact' match, and make the exact matches appear first in the search results. This is an answer to a similar question, but I am struggling to make it work.
No matter what boost value I specify for the 'exact' field I get the same results order each time. This is how my field mapping looks:
"name" : {
"type" : "multi_field",
"fields" : {
"name" : {
"type" : "string",
"boost" : 2.0,
"analyzer" : "ngram"
},
"exact" : {
"type" : "string",
"boost" : 4.0,
"analyzer" : "simple",
"include_in_all" : false
}
}
}
And this is what the query looks like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"fields":["name","name.exact"],
"query":"Woods"
}
}
}
}
}
Understanding how the score is calculated
Elasticsearch has an option for producing an explanation with every search result: set the explain parameter to true.
POST <Index>/<Type>/_search?explain&format=yaml
{
"query" : " ....."
}
It will produce a lot of output for every hit, which can be overwhelming, but it is worth taking some time to understand what it all means.
The output of explain can be hard to read in JSON, so adding format=yaml makes it easier to read.
Understanding why a document is matched or not
You can run the query against a specific document, as below, to see an explanation of how the matching is done.
GET <Index>/<type>/<id>/_explain
{
"query": "....."
}
The multi_field mapping is correct, but the search query needs to be changed like this:
{
"query": {
"filtered": {
"query": {
"multi_match": { # changed from "query_string"
"fields": ["name","name.exact"],
"query": "Woods",
# added this so the engine does a "sum of" instead of a "max of"
# this is deprecated in the latest versions but works with 0.x
"use_dis_max": false
}
}
}
}
}
Now the results take the 'exact' match into account, and it adds to the score.
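To see why this changes the order, here is a small Python sketch of the two ways of combining per-field scores: dis_max keeps the best field score (plus tie_breaker times the rest), while use_dis_max: false adds them up like a bool query. The per-field scores are invented for illustration, and real scoring also involves normalization (and, in old versions, coord) that is ignored here.

```python
def dis_max(scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the sum of the other scores."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def sum_of(scores):
    """What use_dis_max: false amounts to: the field scores are added up."""
    return sum(scores)


# Hypothetical per-field scores: [name (ngram), name.exact]
docs = {
    "exact Woods": [1.2, 0.8],    # matches both fields
    "partial match": [1.3, 0.0],  # strong ngram overlap, no exact match
}


def rank(combine):
    """Document names ordered by combined score, best first."""
    return sorted(docs, key=lambda d: combine(docs[d]), reverse=True)


print(rank(dis_max))  # ['partial match', 'exact Woods']
print(rank(sum_of))   # ['exact Woods', 'partial match']
```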
