Elasticsearch sorting on string not returning expected results

When sorting on a string field with multiple words, Elasticsearch splits the string value into terms and uses the min or max term as the sort value. For example, when sorting on a field with the value "Eye of the Tiger" in ascending order, the sort value is "Eye", and when sorting in descending order the value is "Tiger".
Let's say I have "Eye of the Tiger" and "Wheel of Death" as entries in my index. When I do an ascending sort on this field, I would expect "Eye of the Tiger" to come first, since "E" comes before "W". Instead, "Wheel of Death" comes up first, because "D" (from "Death") is the min term of that value, while "E" (from "Eye") is the min term of "Eye of the Tiger".
Does anyone know how to turn off this behavior and just do a regular sort on this string field?

As mconlin mentioned, if you want the sort to behave as you described you need to map the field with "index": "not_analyzed" so it is sorted on the whole, unanalyzed value. If you also want to keep the field tokenized for full-text search, this post by sloan shows a great example. Using a multi-field to keep two different mappings for one field is very common in Elasticsearch.
Hope this helps, let me know if I can offer more explanation.
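A minimal sketch of such a multi-field mapping (the index, type, and field names are illustrative; "not_analyzed" is the pre-5.x mapping syntax this answer refers to):

PUT my-index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "standard",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

You would then run full-text queries against "title" and sort on "title.raw".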

If you want the sorting to be case-insensitive, "index": "not_analyzed" doesn't work, so I created a custom sort analyzer.
index-settings.yml
index :
  analysis :
    analyzer :
      sort :
        type : custom
        tokenizer : keyword
        filter : [lowercase]
Mapping:
...
"articleName": {
  "type": "string",
  "analyzer": "standard",
  "fields": {
    "sort": {
      "type": "string",
      "analyzer": "sort"
    }
  }
}
...
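With that mapping in place, a search that sorts on the lowercased sub-field could look roughly like this (index name and query are illustrative):

GET my-index/_search
{
  "query": { "match": { "articleName": "tiger" } },
  "sort": [
    { "articleName.sort": { "order": "asc" } }
  ]
}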

Related

When trying to index in Elasticsearch 7.8.1, an error occurs saying "field" is too large, must be <= 32766. Is there a solution?

When trying to index in Elasticsearch 7.8.1, an error occurs saying "testField" is too large, must be <= 32766. Is there a solution?
Field Info
"testField":{
"type": "keyword",
"index": false
}
This is a known issue, and it is not yet clear what the best way to solve it is. Lucene enforces a maximum term length of 32766 bytes, beyond which the document is rejected.
Until this gets solved, there are two immediate options you can choose from:
A. Use a script ingest processor to truncate the value to at most 32766 bytes.
PUT _ingest/pipeline/truncate-pipeline
{
  "description": "truncate",
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.testField != null && ctx.testField.length() > 32766) {
            ctx.testField = ctx.testField.substring(0, 32766);
          }
        """
      }
    }
  ]
}
PUT my-index/_doc/123?pipeline=truncate-pipeline
{ "testField": "hgvuvhv....sjdhbcsdc" }
B. Use a text field with an appropriate analyzer that would truncate the value, but you'd lose the ability to aggregate and sort on that field.
If you want to keep your field as a keyword, I'd go with option A.
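For option B, one possible shape of the index settings (a sketch, not the original answer's code; the truncate token filter does exist, but its length is counted in characters, so pick a smaller length if your data can contain multi-byte characters that would still exceed the 32766-byte limit):

PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "truncate_32k": {
          "type": "truncate",
          "length": 32766
        }
      },
      "analyzer": {
        "truncating_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["truncate_32k"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "testField": {
        "type": "text",
        "analyzer": "truncating_keyword"
      }
    }
  }
}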

Non indexed field in sort clause

I have a field in my Elasticsearch index mapping which will not be used for any searching, but I require it in the sort clause of the query. Is it possible to put "index": "false" in the mapping definition?
Basically, in the mapping:
"name":{
"type": "keyword",
"index": "false"
}
And in the query:
"sort" : [
{"name" : {"order" : "asc"}}
]
Please read about the index option in the official Elasticsearch documentation, which says:
The index option controls whether field values are indexed. It accepts
true or false and defaults to true. Fields that are not indexed are
not queryable.
So, in your case, you are explicitly setting it to false, which means you cannot include it in a query, and sort queries will also not work on this field.
You can easily verify this yourself by creating one such field in your index and seeing whether it lets you sort on that field.
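For instance, a quick check along these lines (index name and document are made up, and the 7.x-style mapping syntax is assumed; whether the sort is rejected can depend on your Elasticsearch version):

PUT sort-test
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword", "index": false }
    }
  }
}

PUT sort-test/_doc/1
{ "name": "alpha" }

GET sort-test/_search
{
  "sort": [ { "name": { "order": "asc" } } ]
}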

ElasticSearch Search query is not case sensitive

I am trying a search query and it works fine for an exact match, but if the user enters lowercase or uppercase it does not work, as Elasticsearch is case insensitive.
Example:
{
  "query": {
    "bool": {
      "should": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": "pune"
        }
      }
    }
  }
}
It works fine when the city is exactly "pune"; if we change the text to "PUNE" it does not work. ElasticSearch is case insensitive.
"Elasticsearch" is not case-sensitive. A JSON string property will be mapped as a text datatype by default (with a keyword datatype sub or multi field, which I'll explain shortly).
A text datatype has the notion of analysis associated with it; at index time, the string input is fed through an analysis chain, and the resulting terms are stored in an inverted index data structure for fast full-text search. With a text datatype where you haven't specified an analyzer, the default analyzer will be used, which is the Standard Analyzer. One of the components of the Standard Analyzer is the Lowercase token filter, which lowercases tokens (terms).
When it comes to querying Elasticsearch through the search API, there are a lot of different types of query to use, to fit pretty much any use case. One family of queries such as match, multi_match queries, are full-text queries. These types of queries perform analysis on the query input at search time, with the resulting terms compared to the terms stored in the inverted index. The analyzer used by default will be the Standard Analyzer as well.
Another family of queries such as term, terms, prefix queries, are term-level queries. These types of queries do not analyze the query input, so the query input as-is will be compared to the terms stored in the inverted index.
In your example, your term query on the "city" field does not find any matches when capitalized because it's searching against a text field whose input underwent analysis at index time. With the default mapping, this is where the keyword sub field could help. A keyword datatype does not undergo analysis (well, it has a type of analysis with normalizers), so can be used for exact matching, as well as sorting and aggregations. To use it, you would just need to target the "city.keyword" field. An alternative approach could also be to change the analyzer used by the "city" field to one that does not use the Lowercase token filter; taking this approach would require you to reindex all documents in the index.
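To illustrate the keyword sub-field approach (assuming the default dynamic mapping, so a "city.keyword" sub-field exists), a term query against it matches the original, un-analyzed value exactly as it was indexed:

{
  "query": {
    "bool": {
      "should": { "match_all": {} },
      "filter": {
        "term": { "city.keyword": "Pune" }
      }
    }
  }
}

Here "Pune" must match the stored value character for character, so this gives exact (and still case-sensitive) matching, which is also what makes the sub-field usable for sorting and aggregations.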
Elasticsearch analyzes text fields and lowercases them unless you define a custom mapping.
Exact values (like numbers, dates, and keywords) have the exact value
specified in the field added to the inverted index in order to make
them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index. There are many ways to analyze text:
the default standard analyzer drops most punctuation, breaks up text
into individual words, and lower cases them.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
So if you want to use a term query, analyze the term on your own before querying, or in this case just lowercase the term.
To solve this issue I created a custom normalizer and updated the mapping to use it. Before that, we have to delete the index and create it again.
First, delete the index:
DELETE http://localhost:9200/users
Now create the index again:
PUT http://localhost:9200/users
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "city": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
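Once the index is recreated and the documents are reindexed, a term query should match regardless of the case of the input, because the normalizer lowercases both the indexed value and the query term (a sketch against the same local index as above):

GET http://localhost:9200/users/_search
{
  "query": {
    "term": { "city": "PUNE" }
  }
}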

Ngram Tokenizer on field, not on query

I'm having trouble finding a solution for a use case here.
Basically, it's pretty simple: I need to perform a "contains" query, like a SQL LIKE '%...%'.
I've seen there is a regexp query, which I actually managed to get working perfectly, but as it seems to scale badly, I'm trying out n-grams. Now, I've played around with them before and know "how they work", but the behaviour isn't the one I expect.
Basically, I've configured my analyzer with min_gram = 2, max_gram = 20. Say I index a user called "Christophe". I want the query "Chris" to match, which it does, since "Chris" is a 5-gram of "Christophe". The problem is, "Risotto" matches as well, because it also gets broken down into n-grams, and ultimately "is" is a 2-gram of "Christophe", so it matches too.
What I need is for the analyzer to break down the indexed field into n-grams at indexing time, and compare those to the FULL text of the query. "Risotto" should match "Risotto", "XXXRisottoXXX" and so on, but not "Risolo" or anything else where only some of the n-grams match.
Is there any solution?
You need to use the search_analyzer setting to have distinct index-time and search-time analyzers.
Sample from docs:
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
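The docs sample assumes an "autocomplete" analyzer already defined in the index settings. For the "contains" use case above, it could be built on an ngram tokenizer, roughly like this (names are illustrative; the gram sizes mirror the question's min_gram = 2 / max_gram = 20, and on Elasticsearch 6.x and later you also need to raise index.max_ngram_diff to allow such a wide range; drop that setting on older versions):

PUT my-index
{
  "settings": {
    "index": {
      "max_ngram_diff": 18
    },
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

With this setup, the query string is analyzed by the standard analyzer, so "Risotto" stays a single term and only matches documents whose indexed n-grams contain "risotto", such as "Risotto" or "XXXRisottoXXX", but not "Christophe".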

Elasticsearch 5.0.2 failed to search for keyword

I am indexing a very simple item with a field defined as such:
"comid": {
"type": "keyword",
"store": "false",
"index": "no",
"include_in_all": false,
"doc_values": false
}
I then ingested a single item where comid = "this is an id".
When I query the item with the exact match "this is an id",
I get this error:
cannot search on field [comid] since it is not indexed
Maybe I misunderstood the documentation, but I thought we were able to search on keyword fields (using exact match)?
I think I could get around this problem by changing the type from keyword to text and then using a keyword analyzer (which is a no-op, if I understood it correctly), but it seems weird to do this for every keyword-type field.
I must be missing something obvious here?
If it's not indexed, the field's value is only stored in the document, not in the index, so you cannot search using that field.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html
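If the field does need to support the exact-match lookup described in the question, a sketch of the same mapping without disabling indexing (keeping the 5.x-style options used above) would be:

"comid": {
  "type": "keyword",
  "include_in_all": false
}

A keyword field that is indexed supports exact matching with a term query (index name is illustrative):

GET my-index/_search
{
  "query": {
    "term": { "comid": "this is an id" }
  }
}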
