Elasticsearch term query issue

The record field dispatchvoucher of my document has the value "True". But when I searched with a term query, it couldn't find any record. When I changed the value in the query to "true", the result matched. What's the reason for this?

As mentioned in the documentation:
Avoid using the term query for text fields.
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
To search text field values, use the match query instead.
The standard analyzer is the default analyzer which is used if none is specified. It provides grammar-based tokenization.
GET /_analyze
{
  "analyzer": "standard",
  "text": "True"
}
The generated token is:
{
  "tokens": [
    {
      "token": "true",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
A term query returns documents that contain an exact term in a provided field. Since "True" gets tokenized to "true", when you use a term query for "dispatchvoucher": "True", it will not show any results.
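To illustrate the mismatch, here is a rough sketch in Python (an approximation for illustration, not Elasticsearch internals): the standard analyzer lowercases the value at index time, while the term query looks up its input verbatim:

```python
import re

def standard_analyze(text):
    # Rough stand-in for the standard analyzer:
    # grammar-based word splitting plus lowercasing.
    return [token.lower() for token in re.findall(r"\w+", text)]

# At index time the stored value is analyzed:
indexed_tokens = standard_analyze("True")
print(indexed_tokens)               # ['true']

# A term query does not analyze its input; it looks for the
# literal term in the inverted index:
print("True" in indexed_tokens)     # False -> no hit
print("true" in indexed_tokens)     # True  -> hit
```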
You can either change your index mapping to
{
  "mappings": {
    "properties": {
      "dispatchvoucher": {
        "type": "keyword"
      }
    }
  }
}
Or you can add .keyword to the dispatchvoucher field in your query. This targets the keyword sub-field, which stores the value verbatim instead of passing it through the standard analyzer (notice the ".keyword" after the dispatchvoucher field).
Here is a working example with index data, search query, and search result.
Index Data:
{
  "dispatchvoucher": "True"
}
Search Query:
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "dispatchvoucher.keyword": "True"
        }
      }
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "65605120",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.0,
    "_source": {
      "dispatchvoucher": "True"
    }
  }
]

Related

How to search exact text without matching case in Elasticsearch

I want to search for a user name in Elasticsearch. For this I want to match the exact user name, ignoring whether its case is capital or small. I'm using the following query:
QueryBuilder queryBuilder = QueryBuilders.termQuery("user_name.keyword", userName);
NativeSearchQuery build = new NativeSearchQueryBuilder().withQuery(queryBuilder).build();
List<Company> companies = elasticsearchTemplate.queryForList(build, User.class);
But it is also matching the case of the word. For example: if the user name is "Ram" and I search "ram", it does not return that name; if I search "Ram", it gives me the result. I want it to match only the word, not the case of the word. Please, someone, help me solve this problem. I searched a lot but couldn't find any solution.
The issue is that you are using user_name.keyword with a terms query. A terms query matches the exact word; instead of that, you can use a MatchQueryBuilder query.
Code:
QueryBuilder queryBuilder = QueryBuilders.matchQuery("user_name", userName);
NativeSearchQuery build = new NativeSearchQueryBuilder().withQuery(queryBuilder).build();
List<Company> companies = elasticsearchTemplate.queryForList(build, User.class);
When you use the .keyword field, Elasticsearch does not analyze the text, but if you use the text field itself, Elasticsearch analyzes it with the default analyzer for that field. The default (standard) analyzer basically lowercases your text. You can read about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
Since you want to do a case-insensitive search, you don't need to use .keyword.
Also, a term query matches exact terms, but again, since you want a case-insensitive search, you should use a match query, which by default internally lowercases your search text and then searches the field for it.
Now, since both your field and your search term are lowercased, you can do a case-insensitive search, but this will not do an exact match.
For doing an exact case-insensitive match, you need to update your index and use a normalizer with your keyword field, which guarantees that the analysis chain produces a single token while making the search case-insensitive. You can read more about it from here.
Index Creation:
curl -X PUT "localhost:9200/<index-name>" -H 'Content-Type: application/json' -d
{
  "settings": {
    "analysis": {
      "normalizer": {
        "case_insensitive_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "user_name": {
        "type": "keyword",
        "normalizer": "case_insensitive_normalizer"
      }
    }
  }
}
I have indexed these documents:
Doc1:
{
  "user_name": "Ram"
}
Doc2:
{
  "user_name": "Ram Mohan"
}
Search Query:
{
  "query": {
    "match": {
      "user_name": "ram"
    }
  }
}
Result:
"hits": [
  {
    "_source": {
      "user_name": "Ram"
    }
  }
]
Try using the Lowercase Token Filter in your index mapping.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenfilter.html
Such a token filter is applied at both index and search time, so "Ram" will be indexed as "ram", and if you then search for "rAm" it will be changed to "ram", so it will hit your document.
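As an illustration (a Python sketch, not Elasticsearch code), applying the same lowercase step on both the indexing side and the search side is what makes any casing of the query hit the document:

```python
def lowercase_filter(tokens):
    # Mimics the lowercase token filter: applied at index AND search time.
    return [t.lower() for t in tokens]

# Index side: the stored token is lowercased.
index = set(lowercase_filter(["Ram"]))          # {"ram"}

def search(term):
    # Search side: the query term goes through the same filter.
    [analyzed] = lowercase_filter([term])
    return analyzed in index

print(search("rAm"))  # True
print(search("RAM"))  # True
```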
If you want to do a case-insensitive match on a keyword field, you can use a normalizer with a lowercase filter:
The normalizer property of keyword fields is similar to analyzer
except that it guarantees that the analysis chain produces a single
token.
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
Data:
POST index41/_doc
{
  "name": "Ram"
}
Query:
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "ram"
      }
    }
  }
}
Result:
"hits" : [
  {
    "_index" : "index41",
    "_type" : "_doc",
    "_id" : "IyieGHIBZsF59xnAhb47",
    "_score" : 0.6931471,
    "_source" : {
      "name" : "Ram"
    }
  }
]
You can simply use a text field for your user_name field. A text field uses the standard analyzer by default, which lowercases the tokens, and the match query applies the same analyzer that was used at index time (in this case, standard), which gives you a case-insensitive search.
Tokens generated using the standard analyzer:
POST /_analyze
{
  "text": "ram",
  "analyzer": "standard"
}
{
  "tokens": [
    {
      "token": "ram",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

SuggestionBuilder with BoolQueryBuilder in Elasticsearch

I am currently using BoolQueryBuilder to build a text search, and I am having an issue with wrong spellings. When someone searches for "chiar" instead of "chair", I have to show them some suggestions.
I have gone through the documentation and observed that the SuggestionBuilder is useful for getting suggestions.
Can I send all the requests in a single query, so that I can show the suggestions if the result is zero?
There is no need to send different search terms (i.e. chair, chiar) to get suggestions; it's not efficient or performant, and you don't know all the combinations a user might misspell.
Instead, use the fuzzy query, or the fuzziness param in the match query itself, which can be used inside a bool query.
Let me show you an example using the match query with the fuzziness parameter.
Index definition:
{
  "mappings": {
    "properties": {
      "product": {
        "type": "text"
      }
    }
  }
}
Index a sample doc:
{
  "product": "chair"
}
Search query with the wrong term chiar:
{
  "query": {
    "match": {
      "product": {
        "query": "chiar",
        "fuzziness": "2"
      }
    }
  }
}
Control the fuzziness according to your application; note that Elasticsearch caps the edit distance at 2, so the valid values are 0, 1, 2, or "AUTO".
Search result:
"hits": [
  {
    "_index": "so_fuzzy",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.23014566,
    "_source": {
      "product": "chair"
    }
  }
]
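For intuition on what the fuzziness parameter measures: it is an edit distance (insertions, deletions, substitutions, and, by default, adjacent transpositions) between the query term and the indexed terms. Here is a minimal Python sketch of that distance (an illustration, not Elasticsearch's implementation), counting a transposition as a single edit, as Elasticsearch does when fuzzy_transpositions is enabled:

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert/delete/substitute cost 1,
    and swapping two adjacent characters also costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

# "chiar" is one transposition away from "chair", so any fuzziness >= 1 matches:
print(edit_distance("chiar", "chair"))  # 1
```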

Elasticsearch : Completion suggester not working with whitespace Analyzer

I am new to Elasticsearch and I am trying to create a demo of the completion suggester with the whitespace analyzer.
As per the documentation of the whitespace analyzer, it breaks text into terms whenever it encounters a whitespace character. So my question is: does it work with the completion suggester too?
For my completion suggester prefix "ela", I am expecting the output "Hello elastic search."
I know an easy solution for this is to add multi-field input as:
"suggest": {
  "input": ["Hello", "elastic", "search"]
}
However, if this is the solution, then what is the meaning of using an analyzer? Does an analyzer make sense in a completion suggester?
My mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "completion_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "my-type": {
      "properties": {
        "mytext": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "suggest": {
          "type": "completion",
          "analyzer": "completion_analyzer",
          "search_analyzer": "completion_analyzer",
          "max_input_length": 50
        }
      }
    }
  }
}
My document:
{
  "_index": "my-index",
  "_type": "my-type",
  "_id": "KTWJBGEBQk_Zl_sQdo9N",
  "_score": 1,
  "_source": {
    "mytext": "dummy text",
    "suggest": {
      "input": "Hello elastic search."
    }
  }
}
Search request:
{
  "suggest": {
    "test-suggest": {
      "prefix": "ela",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}
This search does not return the correct output, but if I use the prefix 'hel' I get the correct output: "Hello elastic search."
In brief, I would like to know: does the whitespace analyzer work with the completion suggester? And if there is a way to make it work, please suggest it.
PS: I have already looked at these links but didn't find a useful answer:
ElasticSearch completion suggester Standard Analyzer not working
What Elasticsearch Analyzer to use for this completion suggester?
I found this link useful: Word-oriented completion suggester (ElasticSearch 5.x). However, they do not use the completion suggester.
Thanks in advance.
Jimmy
The completion suggester cannot perform full-text queries, which means that it cannot return suggestions based on words in the middle of a multi-word field.
From ElasticSearch itself:
The reason is that an FST query is not the same as a full text query. We can't find words anywhere within a phrase. Instead, we have to start at the left of the graph and move towards the right.
As you discovered, the best alternative to the completion suggester that can match the middle of fields is an edge n-gram filter.
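For intuition, here is a rough Python sketch (an illustration, with made-up min_gram/max_gram values, not Elasticsearch internals) of why edge n-grams let a prefix like "ela" match a word in the middle of a phrase: each word contributes all of its prefixes at index time, so a plain lookup on "ela" succeeds:

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    # Index-time expansion: every word contributes all its prefixes
    # between min_gram and max_gram characters long.
    grams = set()
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])
    return grams

indexed = edge_ngrams("Hello elastic search")
print("ela" in indexed)  # True -- matches the mid-phrase word "elastic"
print("hel" in indexed)  # True
```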
I know this question is ages old, but have you tried having multiple suggestions, one based on a prefix and the next based on a regex?
Something like:
{
  "suggest": {
    "test-suggest-exact": {
      "prefix": "ela",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    },
    "test-suggest-regex": {
      "regex": ".*ela.*",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}
Use the results from the second suggestion when the first one is empty. The good thing is that meaningful phrases are returned by the Elasticsearch suggest.
A shingle-based approach, using a full query search and then aggregating based on search terms, sometimes gives broken phrases which are contextually wrong. I can write more if you are interested.

How to apply synonyms at query time instead of index time in Elasticsearch

According to the elasticsearch reference documentation, it is possible to:
Expansion can be applied either at index time or at query time. Each has advantages (⬆)︎ and disadvantages (⬇)︎. When to use which comes down to performance versus flexibility.
The advantages and disadvantages all make sense and for my specific use I want to make use of synonyms at query time. My use case is that I want to allow admin users in my system to curate these synonyms without having to reindex everything on an update. Also, I'd like to do it without closing and reopening the index.
The main reason I believe this is possible is this advantage:
(⬆)︎ Synonym rules can be updated without reindexing documents.
However, I can't find any documentation describing how to apply synonyms at query time instead of index time.
To use a concrete example, if I do the following (example stolen and slightly modified from the reference), it seems like this would apply the synonyms at index time:
/* NOTE: This was all run against elasticsearch 1.5 (if that matters; documentation is identical in 2.x) */
// Create our synonyms filter and analyzer on the index
PUT my_synonyms_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
// Create a mapping that uses this analyzer
PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}
// Some data
PUT my_synonyms_test/rulers/1
{
  "name": "Elizabeth II",
  "title": "Queen"
}
// A query which utilises the synonyms
GET my_synonyms_test/rulers/_search
{
  "query": {
    "match": {
      "title": "monarch"
    }
  }
}
// And we get our expected result back:
{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4142135,
    "hits": [
      {
        "_index": "my_synonyms_test",
        "_type": "rulers",
        "_id": "1",
        "_score": 1.4142135,
        "_source": {
          "name": "Elizabeth II",
          "title": "Queen"
        }
      }
    ]
  }
}
So my question is: how could I amend the above example so that I would be using the synonyms at query time?
Or am I barking up completely the wrong tree and can you point me somewhere else please? I've looked at plugins mentioned in answers to similar questions like https://stackoverflow.com/a/34210587/2240218 and https://stackoverflow.com/a/18481495/2240218 but they all seem to be a couple of years old and unmaintained, so I'd prefer to avoid these.
Simply use search_analyzer instead of analyzer in your mapping, and your synonym analyzer will only be used at search time:
PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string",
      "search_analyzer": "my_synonyms" <--- change this
    }
  }
}
To use the custom synonym filter at QUERY TIME instead of INDEX TIME, you first need to remove the analyzer from your mapping:
PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string"
    }
  }
}
You can then use the analyzer that makes use of the custom synonym filter as part of a query_string query:
GET my_synonyms_test/rulers/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "monarch",
      "analyzer": "my_synonyms"
    }
  }
}
I believe the query_string query is the only one that allows for specifying an analyzer since it uses a query parser to parse its content.
As you said, when using the analyzer only at query time, you won't need to re-index on every change to your synonyms collection.
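Conceptually, query-time synonym expansion just rewrites the query terms against the current synonym map before the search runs, which is why the documents themselves never need reindexing. A toy Python sketch of the idea (a flat map of equivalent terms and a dict-based "index" are assumptions for illustration, not Elasticsearch's actual implementation):

```python
# Admin-curated synonym groups; editable at any time without touching documents.
SYNONYMS = [{"queen", "monarch"}]

def expand(term):
    # Query-time expansion: replace a term with its whole synonym group.
    for group in SYNONYMS:
        if term in group:
            return group
    return {term}

# The index stores only the original token:
index = {"queen": {"doc1"}}          # doc1: {"title": "Queen"}

def search(term):
    hits = set()
    for t in expand(term.lower()):
        hits |= index.get(t, set())
    return hits

print(search("monarch"))  # {'doc1'} -- found without reindexing
```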
Apart from using the search_analyzer, you can refresh the synonym list by closing and reopening the index after making changes to the synonym file.
Below are the commands to do so:
curl -XPOST 'localhost:9200/index_name/_close'
curl -XPOST 'localhost:9200/index_name/_open'
After this, your synonym list will be refreshed automatically, without the need to reingest the data.
I followed this reference Elasticsearch — Setting up a synonyms search to configure the synonyms in ES

Elastic Search Term Query Not Matching URLs

I am a beginner with Elasticsearch and I have been working on a POC for the last week.
I have a URL field as part of my document, which contains URLs in the following format: "http://www.example.com/foo/navestelre-04-cop".
I cannot define a mapping for my whole object, as every object has different keys except the URL.
Here is how I am creating my index:
POST
{
  "settings" : {
    "number_of_shards" : 5,
    "mappings" : {
      "properties" : {
        "url" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}
I am keeping my URL field not_analyzed, as I have learned from some resources that marking a field as not_analyzed will prevent it from being tokenized, so I can look for an exact match on that field in a term query.
I have also tried using the whitespace analyzer, since the URL value does not contain any whitespace characters. But again I am unable to get a successful hit.
Below is my term query:
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "url": "http://www.example.com/foo/navestelre-04-cop"
        }
      }
    }
  }
}
I am guessing the problem is somewhere in the analyzers and tokenizers, but I am unable to find a solution. Any kind of help would be great to enhance my knowledge and help me reach a solution.
Thanks in advance.
You have the right idea, but it looks like some small mistakes in your settings request are leading you astray. Here is the final index request:
POST /test
{
  "settings": {
    "number_of_shards": 5
  },
  "mappings": {
    "url_test": {
      "properties": {
        "url": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Notice the added url_test type in the mapping. This lets ES know that your mapping applies to this document type. Also, settings and mappings are different keys of the root object, so they have to be separated. Because your initial settings request was malformed, ES just ignored it and used the standard analyzer on your document, which is why your query could not match it. I point you to the ES Mapping docs.
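To see why the malformed request broke the query: with the standard analyzer silently applied, the URL is split into many lowercase tokens, so no single indexed term equals the full URL string that the term query looks for. A rough Python illustration (an approximation of the analyzer, not Elasticsearch internals):

```python
import re

def standard_analyze(text):
    # Rough stand-in for the standard analyzer applied to a URL.
    return [t.lower() for t in re.findall(r"[a-z0-9]+", text, re.I)]

url = "http://www.example.com/foo/navestelre-04-cop"
print(standard_analyze(url))
# ['http', 'www', 'example', 'com', 'foo', 'navestelre', '04', 'cop']

# A term query looks for one exact term; the whole URL is not among the tokens:
print(url in standard_analyze(url))  # False -> no hit

# With "index": "not_analyzed", the field is stored as a single term:
print(url in [url])                  # True  -> the term query matches
```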
We can index two documents to test with:
POST /test/url_test/1
{
  "url": "http://www.example.com/foo/navestelre-04-cop"
}
POST /test/url_test/2
{
  "url": "http://stackoverflow.com/questions/37326126/elastic-search-term-query-not-matching-urls"
}
And then execute your unmodified search query:
GET /test/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "url": "http://www.example.com/foo/navestelre-04-cop"
        }
      }
    }
  }
}
Yields this result:
"hits": [
  {
    "_index": "test",
    "_type": "url_test",
    "_id": "1",
    "_score": 1,
    "_source": {
      "url": "http://www.example.com/foo/navestelre-04-cop"
    }
  }
]
