Elasticsearch Filtering woes, uppercase vs lowercase field filtering - elasticsearch

I have a field in my index called "status" of keyword type.
When I try to filter with {"term": {"status": "Publish"}}, it returns no hits.
When I try to filter with {"term": {"status": "publish"}}, it returns the correct result.
This would make sense if the statuses had been stored in lowercase, but they are actually capitalized.
My kibana GET returns the products with "status": "Publish".
I also remember inserting the statuses with uppercase values. So why can I only filter by lowercase?
The big caveat, and I know it's suspicious to do this, is that I attempted to add the "status" mapping after the item indices were already created. That's my main suspect right now for why this is happening.
Does anyone know why the filtering only works with lowercase values when the actual value in the mapping is uppercase?

If no analyzer is specified, a text field uses the standard analyzer by default. So, Publish gets indexed as publish.
If you have not explicitly defined any mapping, then you need to add .keyword to the status field (notice the ".keyword" after the status field in the query). This targets the keyword sub-field that dynamic mapping creates, which stores the value unanalyzed, instead of the analyzed text field.
The term query does not apply any analyzer to the search term, so it only looks for that exact term in the inverted index. So to search for the exact term, you need to use status.keyword OR change the mapping of the field.
{
  "query": {
    "term": {
      "status.keyword": "Publish"
    }
  }
}
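To make the mechanics concrete, here is a rough Python sketch; it is not Elasticsearch code. The function standard_analyze is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, which approximates the standard analyzer without reproducing it, and the two dictionaries merely play the roles of status and status.keyword.

```python
import re

def standard_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def keyword_analyze(text):
    """Stand-in for a keyword field: the whole value is kept as one unchanged term."""
    return [text]

def build_index(values, analyze):
    """Build a tiny inverted index: term -> set of document ids."""
    index = {}
    for doc_id, value in enumerate(values):
        for term in analyze(value):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, term):
    """Like a term query: the search term is looked up exactly, with no analysis."""
    return index.get(term, set())

statuses = ["Publish", "Draft"]
text_index = build_index(statuses, standard_analyze)    # plays the role of "status"
keyword_index = build_index(statuses, keyword_analyze)  # plays the role of "status.keyword"

print(term_query(text_index, "Publish"))     # set(): the analyzed index only holds "publish"
print(term_query(text_index, "publish"))     # {0}
print(term_query(keyword_index, "Publish"))  # {0}
```

In this toy model the capitalized term only exists in the keyword-style index, which mirrors why the term filter on status.keyword finds "Publish".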

Related

Elasticsearch: How does search work when using combination of analyzers?

I'm a novice to Elasticsearch (ES), messing around with the analyzers. As the documentation states, the analyzer can be specified at "index time" and at "search time", depending on the use case.
My document has a text field title, and I have defined the following mapping that introduces a sub-field custom:
PUT index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "custom": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}
So if I have the text "email-id is someid#someprovider.com", the standard analyzer would analyze the text into the following tokens during indexing:
[email, id, is, someid, someprovider.com].
However, whenever I query the field title.custom (with different variations of query terms), I get no hits.
This is what I think is happening when I query with the keyword email:
It gets analyzed by the keyword analyzer.
The value of the field title.custom is also analyzed by the keyword analyzer (analysis on tokens), resulting in the same set of tokens as mentioned earlier.
An exact match should happen on the email token, returning the document.
Clearly this is not the case and there are gaps in my understanding.
I would like to know what exactly is happening during search.
On a generic level, I would like to know how the analysis and search happens when combination of search and index analyzer is specified.
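search_analyzer is set to "keyword" for title.custom, so the whole query string is treated as a single search term.
That single term is compared with the tokens the standard analyzer produced at index time, so a query matches title.custom only when the entire query string is exactly one of the indexed tokens. A multi-word query such as "email-id is someid#someprovider.com" stays one term and equals none of them.
search_analyzer is applied at search time and overrides the default, which is to analyze the query with the same analyzer that was applied at indexing time.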
Good question. To keep it simple, let me go through the different cases one by one.
Which analyzer plays a role depends on the following:
The type of query (match is an analyzed query, while term is not).
By default, an analyzed query such as match applies to the search term the same analyzer that was used on the field at index time.
If you override the default by specifying search_analyzer on a field, then at query time that analyzer creates the tokens from the search term, and these are matched against the tokens generated at index time by the field's index analyzer (standard is the default).
Using the three points above and the explain API, you can figure out what is happening in your case.
Let me know if you need further information; I would be happy to explain further.
The difference between match and term queries, and the Analyze API for inspecting the tokens, will be helpful as well.
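The three points above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch itself: standard_analyze is a hypothetical approximation of the standard analyzer (lowercasing plus a crude tokenizer that keeps dotted names together), and the functions term_query and match_query are illustrative names, not real API calls.

```python
import re

def standard_analyze(text):
    """Rough approximation of the standard analyzer (an assumption, not the real tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def keyword_analyze(text):
    """Keyword analyzer: the whole input becomes a single, unchanged token."""
    return [text]

def term_query(indexed_terms, text):
    """term: the search text is not analyzed at all."""
    return text in indexed_terms

def match_query(indexed_terms, text, search_analyzer):
    """match: the search text is analyzed, then each resulting token is looked up."""
    return any(token in indexed_terms for token in search_analyzer(text))

# Index time: title.custom is analyzed with the standard analyzer.
indexed_terms = set(standard_analyze("email-id is someid#someprovider.com"))
print(sorted(indexed_terms))

# term query: "Email" is looked up as is, and only "email" was indexed.
print(term_query(indexed_terms, "Email"))

# Default behaviour: the search analyzer is the same as the index analyzer.
print(match_query(indexed_terms, "Email-ID", standard_analyze))

# search_analyzer "keyword": the query string stays one token.
print(match_query(indexed_terms, "email-id", keyword_analyze))
print(match_query(indexed_terms, "email", keyword_analyze))
```

Under this simplified model, the keyword search analyzer only produces a match when the complete query string happens to equal one indexed token. Running the Analyze API against the real index remains the reliable way to see the actual tokens.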

elasticsearch wild card query not working

I seem to be running into a peculiar issue. When I run my query with the match directive as below, I get a hit:
{
  "query": {
    "match": {
      "value.account.names.lastName": "*GUILLERMO*"
    }
  }
}
Now when I use the wildcard query with the same value, as below, I don't get a hit.
{
  "query": {
    "wildcard": {
      "value.account.names.lastName": "*GUILLERMO*"
    }
  }
}
I am really lost as to what the issue may be. Many thanks in advance for any input.
Assuming you are running the wildcard query against an analyzed field, Elasticsearch is behaving correctly. As the Elasticsearch documentation states, the wildcard query operates at the term level. When you index a document whose name field contains the string "Guillermo del Toro", the value of that field is lowercased and split into three tokens: "guillermo", "del" and "toro". When you then run the wildcard query *GUILLERMO* against the name field, Elasticsearch compares the query string as it is with every single token, trying to find a match. Here you do not get a hit simply because your query string is in uppercase and the analyzed tokens are in lowercase.
Running wildcard queries against an analyzed field is probably a bad idea, but if it is strictly required I would recommend using the built-in name.keyword field instead of the name field (though you will again face the problem of case sensitivity). A better solution is to create your own lowercased, not-analyzed field for that purpose.
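The comparison described above can be imitated with Python's fnmatch module. This is only a sketch of the behaviour the answer describes: the helper wildcard_query is a hypothetical name, the token list is written by hand rather than produced by Elasticsearch, and fnmatchcase also understands [...] character classes, which the wildcard query does not.

```python
from fnmatch import fnmatchcase

# Terms assumed to be in the index after analyzing "Guillermo del Toro".
analyzed_terms = ["guillermo", "del", "toro"]
# A keyword-style field keeps the original value as one term.
keyword_terms = ["Guillermo del Toro"]

def wildcard_query(terms, pattern):
    """Compare the pattern, unchanged and case-sensitively, with each indexed term."""
    return [term for term in terms if fnmatchcase(term, pattern)]

print(wildcard_query(analyzed_terms, "*GUILLERMO*"))  # []
print(wildcard_query(analyzed_terms, "*guillermo*"))  # ['guillermo']
print(wildcard_query(keyword_terms, "*Guillermo*"))   # ['Guillermo del Toro']
print(wildcard_query(keyword_terms, "*guillermo*"))   # []
```

The last line shows the case-sensitivity problem that remains on the keyword field, which is why a lowercased, not-analyzed copy of the value is suggested.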

difference between a field and the field.keyword

If I add a document with several fields to an Elasticsearch index, when I view it in Kibana, I get each time the same field twice. One of them will be called
some_field
and the other one will be called
some_field.keyword
Where does this behaviour come from and what is the difference between both of them?
PS: one of them is aggregatable (not sure what that means) and the other (without keyword) is not.
Update: A short answer would be that type: text is analyzed, meaning the value is broken up into distinct words when stored, which allows free-text searches on one or more words in the field. The .keyword field takes the same input and keeps it as one large string, meaning it can be aggregated on, and you can use wildcard searches on it. Aggregatable means you can use it in aggregations in Elasticsearch, which resemble a SQL GROUP BY if you are familiar with that. In Kibana you would probably use the .keyword field with aggregations to count distinct values etc.
Please take a look at this article about text vs. keyword.
Briefly: since Elasticsearch 5.0 the string type has been replaced by the text and keyword types. Since then, when you do not specify an explicit mapping, for a simple document with a string:
{
  "some_field": "string value"
}
the dynamic mapping below will be created:
{
  "some_field": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
As a consequence, it will both be possible to perform full-text search on some_field, and keyword search and aggregations using the some_field.keyword field.
I hope this answers your question.
Look at this issue. There is some explanation of your question in it. Roughly speaking some_field is analyzed and can be used for fulltext search. On the other hand some_field.keyword is not analyzed and can be used in term queries or in aggregation.
I will try to answer your questions one by one.
Where does this behavior come from?
It was introduced in Elasticsearch 5.0.
What is the difference between the two?
some_field is used for full text search and some_field.keyword is used for keyword searching.
Full text searching is used when we want the individual tokens of a field's value to be included in the search. For instance, if you are searching for all the hotel names that have "farm" in them, such as hay farm house, Windy harbour farm house etc.
Keyword searching is used when we want to include the whole value of the field in the search and not individual tokens from the value. For example, suppose you are indexing documents with a city field. Aggregating on the analyzed text field would give separate counts for "new" and "york", whereas aggregating on the keyword field gives one count for "new york", which is usually the expected behavior.
From Elasticsearch 5.0 onwards, strings are mapped as both text and keyword by default.
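The "new york" example can be illustrated with a small Python sketch. It is a toy model under stated assumptions: terms_aggregation is a hypothetical helper that counts documents per term, as_text stands in for analysis by lowercasing and splitting on whitespace, and the city values are made up for illustration.

```python
from collections import Counter

def as_text(value):
    """Rough stand-in for an analyzed text field: lowercase and split into words."""
    return value.lower().split()

def as_keyword(value):
    """Keyword field: the whole value is a single term."""
    return [value]

def terms_aggregation(values, analyze):
    """Count, per term, how many documents contain it."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts

cities = ["New York", "New York", "New Delhi"]  # illustrative data

print(terms_aggregation(cities, as_text))
print(terms_aggregation(cities, as_keyword))
```

With the text-style terms the buckets are the individual words, while the keyword-style terms give one bucket per full city name.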

How Elasticsearch multi matching with _all work?

I wanted to know how multi matching with _all works. Let's say I have the following query:
"multi_match": {
  "query": x,
  "type": "phrase",
  "fields": "_all"
}
Does it search all available fields for the particular phrase and return a record only if the phrase exists in all fields? What if some of the fields have it and some others do not?
The _all field is just a field that concatenates the values of all your fields into one big string and then analyzes it, using the standard analyzer unless another one is defined. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
It is possible to exclude some fields from the _all field when defining your mapping, using the 'include_in_all' parameter https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
So does it make sense to use a phrase query on the concatenation of all your fields? Rather not. I would say that multi_match lets you achieve goals similar to the _all field: you can search multiple fields in one query. But when using the _all field you can just use a 'match' query.
The _all field (which can no longer be enabled for indices created in 6.0 or later) indexes all the values from your JSON document, whatever field they appear in.

ElasticSearch: How to specify specific fields to search at?

Right now in my mapping, I am setting "include_in_all" to true, which means all the fields are included in _all field.
However, when I am searching, instead of wasting space by putting everything in the _all field, I want to specify the particular fields to search in (taking into account the boost scores in the mapping).
How do I create a query that tells Elasticsearch to look only at specific fields (not just one) and take into account the boosting I gave them in my mapping?
Start with a multi_match query. It allows you to query multiple fields, giving them different weights, and it's usually the way to go when you have a search box.
{
  "multi_match": {
    "query": "this is a test",
    "fields": ["subject^2", "message"]
  }
}
The query_string is more powerful but more dangerous too since it's parsed and can break. Use it only if you need it.
You don't need to keep data in _all field to query for a field.
You can use query_string or bool queries to search over multiple fields.
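If the request body is assembled in application code, a small helper can generate the field^boost syntax shown above. The following is a minimal sketch: multi_match_query is a hypothetical helper name, the field names and boosts are taken from the example, and sending the body to a cluster is left out on purpose.

```python
import json

def multi_match_query(text, field_boosts):
    """Build a multi_match request body from a mapping of field name -> boost."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"  # a boost of 1 is the default, so omit it
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

body = multi_match_query("this is a test", {"subject": 2, "message": 1})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Kibana or passed to whichever client you use. Keeping the boosts in one dictionary makes them easy to tune without editing query strings by hand.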
