How can an Elasticsearch index be made globally case-sensitive? - elasticsearch

By default, ES is case-insensitive. There are examples (e.g. "case insensitive search in elasticsearch") of how to define an analyzer for a specific field in ES.
I have a large number of data types with varying fields being loaded, and it's totally impractical for me to set the analyzer on fields by name.
I was previously using Solr, and accomplished a globally case-sensitive search by using dynamicFields for all of my data, and editing schema.xml to modify the "text" fieldtype to remove the LowerCaseFilterFactory from the analyzer.
How can I do something similar in ES?

Have a look at the elasticsearch documentation for the Analysis index module. There's a Default analyzers section which says:
The default logical name allows one to configure an analyzer that will
be used both for indexing and for searching APIs. The default_index
logical name can be used to configure a default analyzer that will be
used just when indexing, and the default_search can be used to
configure a default analyzer that will be used just when searching.
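I guess that is what you're looking for. It's also good to know that the default analyzer in Elasticsearch is the standard analyzer, which lowercases tokens, so defining your own default analyzer without a lowercase filter makes the whole index case-sensitive.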
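As a rough sketch of what such a setting could look like, the snippet below builds an index-creation body that overrides the "default" analyzer with a custom one using the standard tokenizer and no lowercase filter. The body is only constructed and printed here; how it is sent (client library, curl, index name) is left open, and the exact settings layout should be checked against the docs for your ES version.

```python
import json

# Index-creation body that overrides the logical "default" analyzer.
# Assumption: a custom analyzer with the standard tokenizer and an empty
# filter list, i.e. no lowercase filter, so token case is preserved.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [],
                }
            }
        }
    }
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```

Because the analyzer is registered under the name "default", it applies to both indexing and searching for every analyzed field that does not specify its own analyzer.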

Related

Elasticsearch 7 - Sort on custom field of multi-field property

I am working on upgrading a system at work from using ES1 to ES7.
Part of the ES1 implementation included a custom plugin to add an analyzer for custom sorting. The custom sorting behavior we have is similar to "natural sort", but extended to deal with legal codes. For example, it will sort 1.1.1 before 1.10.1. We've been calling this "legal sort". We used this plugin to add an extra .legalsort field to multi-field properties in our index, and then we would sort based on this field when searching.
I am currently trying to adapt the main logic for indexing and searching to ES7. I am not trying to replace the "legal sort" plugin yet. When trying to implement sorting for searches, I ran into the error "Fielddata is disabled on text fields by default". The solution I've seen suggested for that is to add a .keyword field for any text properties, which will be used for sorting and aggregation. This "works", but I don't see how I can then apply our old logic of sorting based on a .legalsort field.
Is there a way to sort on a field other than .keyword, which can use a custom analyzer, like we were able to in ES1?
The important aspect is not the name of your field (like *.keyword), but the type of the field. For exact match searches, sorting and aggregation, the type of the field should be “keyword”.
If you only use the legalsort field for display, sorting, aggregations or exact match, simply change the type from “text” to “keyword”.
If you want to use the same information for both purposes, it’s recommended to make it a multi-field by itself. Use the “keyword”-type field for sorting, aggregations and exact match search and use the “text”-type field for full-text search.
Having two types available for the two purposes is a significant improvement over the single string type you had in ES 1.0. When you sorted in ES 1.0, the information stored in the inverted index had to be uninverted and kept in RAM. This data structure is called fielddata. It was unbounded and often caused out-of-memory exceptions. Newer versions of Lucene introduced an alternative data structure, which resides on disk (and in the file system cache), as a “replacement” for fielddata. It is named doc values and allows sorting on huge amounts of data without consuming a significant amount of heap. The only drawback: doc values are not available for analyzed text (fields of type text), hence the need for a field of type keyword.
You could also set the mapping parameter “fielddata” to true for your legalsort field, enabling fielddata for this particular field to get back the previous behaviour, with all its drawbacks.
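A minimal sketch of the multi-field approach, assuming an ES7-style typeless mapping: the property name "section" is hypothetical, while the sub-field name "legalsort" follows the question. The sub-field is of type keyword, so it can be sorted on without fielddata.

```python
import json

# Hypothetical property "section": full text in the main field,
# plus a keyword sub-field named "legalsort" used only for sorting.
mapping_body = {
    "mappings": {
        "properties": {
            "section": {
                "type": "text",
                "fields": {
                    "legalsort": {"type": "keyword"},
                },
            }
        }
    }
}

# Search body that sorts on the keyword sub-field instead of the text field.
search_body = {
    "query": {"match_all": {}},
    "sort": [{"section.legalsort": {"order": "asc"}}],
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Note that a plain keyword field sorts by its raw term values, so this sketch only shows that the sort field does not have to be called .keyword; it does not reproduce the "legal sort" ordering itself.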

Elasticsearch set default field analyzer for index

I was wondering if it is possible to modify the behaviour of ES when dynamically mapping a field. In my case I don't want ES to analyze anything. Most of the fields I have are mapped as text by ES when the field occurs for the first time.
The correct mapping for our application, though, is keyword 99% of the time, since we don't want the tokenizer to run on it. Can we modify the behaviour so that new fields are always mapped as keyword (unless defined otherwise in the index mapping, of course)?
Cheers and thanks!
You can use dynamic templates to solve your issue. The Elasticsearch guide also has a snippet that is suitable for your case.
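A sketch of such a dynamic template, assuming an ES7-style typeless mapping. The template name "strings_as_keywords" is arbitrary; the important parts are match_mapping_type and the target mapping.

```python
import json

# Dynamic template: any newly seen field that ES detects as a string
# is mapped as keyword instead of the default text type.
mapping_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(mapping_body, indent=2))
```

Fields that are defined explicitly under "properties" in the same mapping are not affected by the template.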

What's the difference between indexing a document after creating an index mapping and creating a document directly by indexing in Elasticsearch?

I'm confused about mapping and indexing. As far as I know, mapping an index is like defining a schema for its documents.
My point is that there are several ways to create a document:
1) map an index first, then index documents
2) index documents directly, and the mapping is created at the same time.
So why do I have to define a mapping in some cases?
Elasticsearch allows you to skip defining a mapping for fields because it can detect field types and apply its default mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
It's good practice to always define the mapping explicitly; relying on the ES detection algorithm can cause unpredictable results.
If you really need some dynamic mappings, for example because you don't know all the required fields while defining the mapping, you can use something like dynamic templates or a default mapping.
There are several known limitations with default mappings (for example, a string is mapped as text with a keyword sub-field, and that sub-field ignores values longer than 256 characters). That might work for most cases where the data comes from logs. But defining the mapping yourself gives you more control over what kind of indexing is done and whether a field is indexed at all. Depending on your use case, the preferred option (defining the mapping vs. on-the-fly mapping) might be different.
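To make the contrast concrete, here is a sketch of the two options. The first dict shows the field mapping that recent ES versions generate by default for a newly seen string; the second is an explicit mapping with hypothetical field names ("title", "status") and "dynamic" set to "strict" so that unmapped fields are rejected instead of guessed.

```python
import json

# What dynamic mapping produces by default for a new string field:
# an analyzed text field plus a keyword sub-field capped at 256 characters.
dynamic_string_default = {
    "type": "text",
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},
    },
}

# Explicit mapping (field names are hypothetical). "dynamic": "strict"
# makes ES reject documents containing fields that are not listed here.
explicit_mapping_body = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "title": {"type": "text"},
            "status": {"type": "keyword"},
        },
    }
}

print(json.dumps(dynamic_string_default, indent=2))
print(json.dumps(explicit_mapping_body, indent=2))
```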

Elasticsearch: Show which analyzer was used in analyze api

I'm trying to figure out how elasticsearch analyzers work exactly and I'm using the _analyze api e.g. _analyze?text=http://www.google.com
Does elasticsearch provide the information of which analyzer was used?
Although the information provided is step by step of the analysis performed, some analyzers may produce the same output so instead of trying to force a different output in order to check which analyzer was used, I was wondering if this can be provided by the api.
I'm using ElasticSearch 1.7.5
It will not give you the analyzer being used because it's supposed to be specified either in the command itself with ?analyzer= or using the analyzer from the index or from the field that's being used in the command.
Also, there are rules related to which analyzer is being used and you should be able to determine from these which one is actually applied: https://www.elastic.co/guide/en/elasticsearch/guide/current/_controlling_analysis.html#_default_analyzers
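Since the analyzer has to be chosen by the caller, one way to compare analyzers is to name them explicitly in each request. The helper below is a hypothetical sketch that only builds the query-string form of the _analyze URL used in the question (ES 1.x style; later versions expect the parameters in a JSON body instead). The base URL and the function name are assumptions.

```python
from urllib.parse import urlencode


def analyze_url(base_url, text, analyzer=None, index=None):
    """Build an _analyze URL, optionally scoped to an index and an analyzer."""
    # With an index, the index-level default analyzer applies when none is named.
    path = f"/{index}/_analyze" if index else "/_analyze"
    params = {"text": text}
    if analyzer:
        params["analyzer"] = analyzer
    return f"{base_url}{path}?{urlencode(params)}"


# Assumed local node address; compare the output of two named analyzers.
for name in ("standard", "whitespace"):
    print(analyze_url("http://localhost:9200", "http://www.google.com", analyzer=name))
```

Running the same text through each named analyzer makes it clear which output belongs to which analyzer, without relying on the response to say so.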

How to find the globally defined analyzer name in Elastic search?

When searching in Elasticsearch, by default, the globally defined analyzer is used. How can I find out what this analyzer is? We are using an Elasticsearch SaaS provider, so I want to find out what the setting is.
As far as I am aware, Elasticsearch will use the Standard Analyzer as default if none other is specified upon index creation.
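One way to check this on a hosted cluster is to fetch the index settings (GET /<index>/_settings) and look for an analyzer named "default_search" or "default". The sketch below works on an already parsed settings response; the function name and the sample response are hypothetical, and the fallback to "standard" reflects the default described above.

```python
def default_search_analyzer(settings_response, index_name):
    """Return (name, config) of the index-level default search analyzer."""
    analyzers = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
        .get("analysis", {})
        .get("analyzer", {})
    )
    # "default_search" takes precedence over "default" at search time.
    for name in ("default_search", "default"):
        if name in analyzers:
            return name, analyzers[name]
    # Nothing configured: ES falls back to the standard analyzer.
    return "standard", None


# Hypothetical response of GET /my_index/_settings.
sample = {
    "my_index": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "default": {"type": "custom", "tokenizer": "standard"}
                    }
                }
            }
        }
    }
}

print(default_search_analyzer(sample, "my_index"))
print(default_search_analyzer({"other": {"settings": {"index": {}}}}, "other"))
```

Field-level analyzer settings in the mapping still override the index-level default, so the mapping is worth checking as well.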
