Retrieve variants generated by Elasticsearch on ingest - elasticsearch

Elasticsearch versions 7 and 8.
When Elasticsearch ingests data, it generates certain phonetic keys for the tokens (and other types depending on the analyzer you specify). Is there a way to retrieve and view these for a given document via query?

You can use the analyze API on specific index and provide the text of your field in your documents to see the tokens generated by Elasticsearch.
Please refer to the examples given in the documentation.

Related

How about including JSON doc version? Is it possible for elastic search, to include different versions of JSON docs, to save and to search?

We are using ElasticSearch to save and manage information on complex transactions. We might need to add more information for every transaction, on the near future.
How about including JSON doc version?
Is it possible for elastic search, to include different versions of JSON docs, to save and to search?
How does this affects performance on ElasticSearch?
It's completely possible, By default elastic uses the dynamic mappings for every new documents such as your JSON documents to index them. For each field in your documents elastic creates a table called inverted_index and the search queries executed against them so regardless of your field variation as long as you know which field you want to execute query the data throughput and performance will not be affected.

Retrieve synonyms list from database in Elasticsearch

Is it possible to configure Elasticsearch to retrieve synonyms list from database/sql instead of file?
Unfortunately it is not possible.
ES supports file or online synonyms. It supports solr synonyms format..
doc
But you can add all the synonyms along with original field as an array in ES. While querying, you can use that array field and retrieve the original field.

Is there any tool out there for generating elasticsearch mapping

Mostly what I do is to assemble the mapping by hand. Choosing the correct types myself.
Is there any tool which facilitates this?
For example which will read a class (c#,java..etc) and choosing the closest ES types accordingly.
I've never seen such a tool, however I know that ElasticSearch has a REST API over HTTP.
So you can create a simple HTTP query with JSON body that will depict your object with your fields: field names + types (Strings, numbers, booleans) - pretty much like a Java/C# class that you've described in the question.
Then you can ask the ES to store the data in the non-existing index (to "index" your document in ES terms). It will index the document, but it will also create an index, and the most importantly for your question, will create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation
And Here you can find the API for querying the mapping structure
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
syncing some sample documents w/o a mapping
investigating what mapping was auto generated and
dropping the index & using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
Currently, there is no such tool available to generate the mapping for elastic.
It is a kind of similar thing as we have to design a database in MySQL.
But if we want such kind of thing then we use Mongo DB which requires no predefined schema.
But Elastic comes with its very dynamic feature, which allows us to play around it. One of the most important features of Elasticsearch is that it tries to get out of your way and let you start exploring your data as quickly as possible like the mongo schema which can be manipulated dynamically.
To index a document, you don’t need to first define a mapping or schema and define your fields along with their data type .
You can just index a document and the index, type, and fields will be created automatically.
For further details you can go through the below documentation:
Elastic Dynamic Mapping

Elasticsearch: Show which analyzer was used in analyze api

I'm trying to figure out how elasticsearch analyzers work exactly and I'm using the _analyze api e.g. _analyze?text=http://www.google.com
Does elasticsearch provide the information of which analyzer was used?
Although the information provided is step by step of the analysis performed, some analyzers may produce the same output so instead of trying to force a different output in order to check which analyzer was used, I was wondering if this can be provided by the api.
I'm using ElasticSearch 1.7.5
It will not give you the analyzer being used because it's supposed to be specified either in the command itself with ?analyzer= or using the analyzer from the index or from the field that's being used in the command.
Also, there are rules related to which analyzer is being used and you should be able to determine from these which one is actually applied: https://www.elastic.co/guide/en/elasticsearch/guide/current/_controlling_analysis.html#_default_analyzers

Where do i apply analyzers and token filters in elasticsearch while indexing documents?

I am trying to implement an analyzer (uppercase) and index some documents after that in elasticsearch. My question is, am i following the correct procedure?
Implement your analyzer (containing index and type name), which would create the index if it doesnt exist
Then index the documents with the same index and type name as above during which stream of text would pass through the analyzer and then would be saved in index.
Is this the correct way to go about it?
I indexed some documents with and without using analyzers, checked the contents of index before/after using Facets, and they were no different.
The content is not supposed to be different. How it's indexed is. You should recognize the difference because queries would have different results, like some documents are found which weren't without the analyzers, and viceversa.
Try for instance a March Query.
The _score may and should also change

Resources