Less restrictive search doesn't return any hits in ElasticSearch - elasticsearch

The query below returns hits, for example where name is "Balances by bank":
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","descrip","notes"]
}
}
}
So why this doesn't return anything? Note that the query is less restrictive, the word is "Balance" and not "Balances" with an s.
GET /_search
{ "query": {
"multi_match": { "query": "Balance",
"fields": ["name","descrip","notes"]
}
}
}
What search would return both?

You need to change your mapping to be able to do that.
If you didn't specified a mapping with specific analyzers when creating your index, elasticsearch will use the default mapping and analyzer.
The default mapping will map each text field as both text and keyword, so you will be able to performe full text search (match part of the string) and keyword search (match the whole string), but it will use the standard analyzer.
With the standard analyzer your example Balances by bank becomes the following list of tokens: [Balances, by, bank], those items are added to the inverted index and elasticsearch can find the documents when you search for any of them.
When you search for just Balance, this term does not exist in the inverted index and elasticsearch returns nothing.
To be able to return both Balance and Balances you need to change your mapping and use the analyzer for the english language, this analyzer will reduce your terms to their stem and match Balance, Balances as also Balancing, Balanced, Balancer etc.
Look at this part of the documentation to see how the analysis process work.
And of course, you can also search for Balance* and it will return both Balance and Balances, but it is a different query.

Related

Elasticsearch: What is the difference between a match and a term in a filter?

I was following an ES tutorial, and at some point I wrote a query using term in the filter instead the recommended solution using match. My understanding is that match was used in the query part to get scoring, while term was used in the filter part to just remove hits before enter the query part. To my surprise match also works in the filter part.
What is the difference between:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"match": {
"category.keyword": "News"
}
}
}
}
}
and:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"term": {
"category.keyword": "News"
}
}
}
}
}
Both returns the same hits, and the score is 0 for all hits.
What is the behaviour or match in a filter clause? I would expect it to yield some score, but it does not.
What I thought:
term : does not analyze either the parameter or the field, and it is a yes/no scenario.
match : analyzes parameter and field and calculates a score of how good they match.
But when using match against a keyword in the filter part of the query, how does it behave?
The match query is a high-level query that resorts to using a term query if it needs to.
Scoring has nothing to do with using match instead of term. Scoring kicks in when you use bool/must/should instead of bool/filter.
Here is how the match query works:
First, it checks the type of the field.
If it's a text field then the value will be analyzed, either with the analyzer specified in the query (if any), or with the search- or index-time analyzer specified in the mapping.
If it's a keyword field (like in your case), then the input is not analyzed and taken "as is"
Since you're using the match query on a keyword field and your input is a single term, nothing is analyzed and the match query resorts to using a term query underneath. This is why you're seeing the same results.
In general, it's always best to use a match query as it is smart enough to know what to do given the field you're querying and the input data you're searching for.
You can read more about the difference between the two here.

ElasticSearch Search query is not case sensitive

I am trying to search query and it working fine for exact search but if user enter lowercase or uppercase it does not work as ElasticSearch is case insensitive.
example
{
"query" : {
"bool" : {
"should" : {
"match_all" : {}
},
"filter" : {
"term" : {
"city" : "pune"
}
}
}
}
}
it works fine when city is exactly "pune", if we change text to "PUNE" it does not work.
ElasticSearch is case insensitive.
"Elasticsearch" is not case-sensitive. A JSON string property will be mapped as a text datatype by default (with a keyword datatype sub or multi field, which I'll explain shortly).
A text datatype has the notion of analysis associated with it; At index time, the string input is fed through an analysis chain, and the resulting terms are stored in an inverted index data structure for fast full-text search. With a text datatype where you haven't specified an analyzer, the default analyzer will be used, which is the Standard Analyzer. One of the components of the Standard Analyzer is the Lowercase token filter, which lowercases tokens (terms).
When it comes to querying Elasticsearch through the search API, there are a lot of different types of query to use, to fit pretty much any use case. One family of queries such as match, multi_match queries, are full-text queries. These types of queries perform analysis on the query input at search time, with the resulting terms compared to the terms stored in the inverted index. The analyzer used by default will be the Standard Analyzer as well.
Another family of queries such as term, terms, prefix queries, are term-level queries. These types of queries do not analyze the query input, so the query input as-is will be compared to the terms stored in the inverted index.
In your example, your term query on the "city" field does not find any matches when capitalized because it's searching against a text field whose input underwent analysis at index time. With the default mapping, this is where the keyword sub field could help. A keyword datatype does not undergo analysis (well, it has a type of analysis with normalizers), so can be used for exact matching, as well as sorting and aggregations. To use it, you would just need to target the "city.keyword" field. An alternative approach could also be to change the analyzer used by the "city" field to one that does not use the Lowercase token filter; taking this approach would require you to reindex all documents in the index.
Elasticsearch will analyze the text field lowercase unless you define a custom mapping.
Exact values (like numbers, dates, and keywords) have the exact value
specified in the field added to the inverted index in order to make
them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index. There are many ways to analyze text:
the default standard analyzer drops most punctuation, breaks up text
into individual words, and lower cases them.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
So if you want to use a term query — analyze the term on your own before querying. Or just lowercase the term in this case.
To Solve this issue i create custom normalization and update mapping to add,
before we have to delete index and add it again
First Delete the index
DELETE PUT http://localhost:9200/users
now create again index
PUT http://localhost:9200/users
{
"settings": {
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"user": {
"properties": {
"city": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}

Search text in elastic search ignoring uppercase and lowercase alphabet

First of all i am new to elastic search. I have field skillName:"Android Sdk". I map this field as keyword in elastic search. But problem is that when i search by something like
POST _search
{
"query": {
"match" : { "skillName" : "Android sdk" }
}
}
sdk is small in search query. It does not give me any result. How can i search ignoring lower or upper case of text when field is mapped as keyword
Yes, it's ignoring the case different from the original, since you used keyword analyzer, which didn't do anything with the token, but rather preserving it as it is. In your case it will do a match only if you query exact same token
So, I would propose to change this behaviour and at least apply lowercase token filter, so you will be able to match terms with different register.
To search case insensitive on a keyword field you need to use a normalizer, which was introduced in 5.2.0. See here for an example.
You can apply different analyzers to same « field » and have one for full text search and another one for sorting, aggregations.
Try the following:
{
"query": {
"query_string": {
"fields": [
"skillName"
],
"query": "Android sdk"
}
}
}

What is the difference between a term query and a match one?

I have documents with string fields which are not analyzed (enforced by a mapping or set globally). I am trying to understand what is the practical difference between
{
"query": {
"bool": {
"must": [
{"match": {"hostname": "hello"}},
]
}
}
}
and
{
"query": {
"term": {
"hostname": "hello"
}
}
}
I saw in the documentation for term queries that there is a difference when the strings are analyzed (which is not my case). Is there a reason to use term vs match?
In a term query, the searched term (i.e. hello) is not analyzed and is matched exactly as is against the terms present in the inverted index.
In a match query, the searched term (i.e. hello) is analyzed first and then matched against the terms present in the inverted index.
In your case, since hostname is not_analyzed in your mapping, your first choice should be to use a term query since it makes no sense to analyze a term at search time for searching the same term that hasn't been analyzed in the first place at indexing time.

Elasticsearch how to match documents for which the field tokens are a sub-set of the query tokens

I have a keyword/key-phrase field I tokenize using standard analyser. I want this field to match if if there is a search phrase that has all tokens of this field in it.
For example if the field value is "veni, vidi, vici" and the search phrase is "Ceaser veni,vidi,vici" I want this search phrase to match but search phrase "veni, vidi" not match.
I also need "vidi, veni, vici" (weird!) to match. So the positions and ordering of the terms is not really important. A phrase match would not quite work for me I think.
I can use "bool query" with "minimum_should_match" parameter for this specific example but that is not really what I want as minimum should match is about ratio/number of tokens in the search phrase.
Pure ES solution would go like this. You will need two requests.
1) First you need to pass user query through analyze api to get all the search tokens.
curl -XGET 'localhost:9200/_analyze' -d '
{
"analyzer" : "standard",
"text" : "Ceaser veni,vidi,vici"
}'
you will get 4 tokens ceaser, veni, vidi, vici . You need to pass these tokens as an array to next search request.
2) We need to search for documents whose tokens are subset of search tokens.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"query": {
"match": {
"title": "Ceaser veni,vidi,vici"
}
}
},
{
"script": {
"script": "if(search_tokens.containsAll(doc['title'].values)){return true;}",
"params": {
"search_tokens": [
"ceaser",
"veni",
"vidi",
"vici"
]
}
}
}
]
}
}
}
}
}
Here job of first match query inside the filter is to narrow down the documents on which script should run. containsAll method will check if the documents tokens are sublist of search tokens. This will be slow but will do the job with your current set up. One big improvement you can do is store tokens as an array so that doc['title'].values can be replaced with that field which will improve the script.
Hope this helps!
No built-in solution but this works:
Add an extra field with the number of terms in the field for each document. So in your "veni, vidi, vici" example, you would have a field like "field_term_count" : 3.
Perform a separate match search for each token in the search query.
Sum the number of searches that matched for each document with at least one match (e.g. a hashtable with key of document ID and value of count).
Compare the number of matches in 3 to the "field_term_count" field for each of the documents with matches. If they are equal then the document is a match.
Then "Ceaser veni,vidi,vici" will match but the search phrases "veni, vidi" will not, as desired. It should be quite fast for reasonable numbers of matches.

Resources