I'm trying to figure out exactly how Elasticsearch analyzers work, and I'm using the _analyze API, e.g. _analyze?text=http://www.google.com
Does Elasticsearch report which analyzer was used?
The response shows the analysis step by step, but some analyzers produce the same output. Rather than crafting input that forces a different output just to check which analyzer ran, I was wondering whether the API can provide this information.
I'm using Elasticsearch 1.7.5.
It will not tell you which analyzer was used, because you are expected to specify it: either in the request itself with ?analyzer=, or implicitly through the index or the field that the request targets.
Also, there are rules that determine which analyzer is applied, and you should be able to work out from them which one was actually used: https://www.elastic.co/guide/en/elasticsearch/guide/current/_controlling_analysis.html#_default_analyzers
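Since the API does not echo the analyzer back, one way to remove the ambiguity is to always name it in the request. The following is a minimal sketch that only builds the request URL for a 1.x-style _analyze call; the host, the index name "myindex" and the field name "url" are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def analyze_url(text, index=None, analyzer=None, field=None,
                host="http://localhost:9200"):
    """Build an _analyze URL that states explicitly how the text is analyzed.

    Pass `analyzer` to name an analyzer directly, or `index` plus `field`
    to use the analyzer configured for that field. Nothing is sent here;
    the function only returns the URL string.
    """
    if analyzer and field:
        raise ValueError("pass either analyzer or field, not both")
    if field and not index:
        raise ValueError("field requires an index")

    path = "/{}/_analyze".format(index) if index else "/_analyze"
    params = {}
    if analyzer:
        params["analyzer"] = analyzer
    if field:
        params["field"] = field
    params["text"] = text  # urlencode percent-escapes ':' and '/'
    return host + path + "?" + urlencode(params)


explicit = analyze_url("http://www.google.com", analyzer="standard")
by_field = analyze_url("http://www.google.com", index="myindex", field="url")
```

Running the same text through two explicitly named analyzers and comparing the token lists is then a direct way to see where they differ.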
Elasticsearch versions 7 and 8.
When Elasticsearch ingests data, it generates phonetic keys for the tokens (and other token types, depending on the analyzer you specify). Is there a way to retrieve and view these for a given document via a query?
You can use the analyze API on a specific index and pass it the text of the field from your document to see the tokens Elasticsearch generates.
Please refer to the examples given in the documentation.
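As a rough sketch of that suggestion, the snippet below builds the method, path and JSON body for an index-level _analyze request that reuses the analyzer of a mapped field. It does not contact a cluster; the index name "people" and the field name "name.phonetic" are hypothetical.

```python
import json


def analyze_request(index, text, field=None, analyzer=None):
    """Return (method, path, body) for an index-level _analyze request.

    Exactly one of `field` (use the analyzer mapped to that field) or
    `analyzer` (name an analyzer directly) must be given.
    """
    if (field is None) == (analyzer is None):
        raise ValueError("specify exactly one of field or analyzer")

    body = {"text": text}
    if field is not None:
        body["field"] = field
    else:
        body["analyzer"] = analyzer
    return "POST", "/{}/_analyze".format(index), json.dumps(body)


# Hypothetical index and field names, for illustration only.
method, path, body = analyze_request("people", "Jonathan Smith",
                                     field="name.phonetic")
```

The body can be sent with any HTTP client; the response lists the tokens the field's analyzer would produce for the supplied text.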
I changed the analyzer on a field from the simple analyzer to the standard analyzer and tested it locally, and it's working fine without re-indexing all my documents in ES.
But according to this SO post and this ES doc, it looks like we need to re-index if we add or change the analyzer on a field.
I am confused, because it's working fine now. Re-indexing would take a considerable amount of time, and I want to avoid it if it's not required.
Let me know if anybody has faced a similar situation and what they did.
Edit: I am using ES 1.7. I changed the analyzer on the field and just restarted the app; I think my app simply pushes the latest mapping to ES.
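If you change an analyzer, you do need to reindex your data, or at the very least the field whose analyzer was changed. Documents that are already indexed keep the tokens produced by the old analyzer; only documents indexed after the change are analyzed with the new one.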
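To illustrate why a quick local test can look fine even though old documents are affected, here is a toy simulation. The two functions are crude regex approximations of the simple and standard analyzers, not the real Lucene implementations, and the sample text is made up.

```python
import re


def simple_like(text):
    """Rough stand-in for the simple analyzer: letter runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def standard_like(text):
    """Rough stand-in for the standard analyzer: letter/digit runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


# Tokens stored for a document that was indexed BEFORE the analyzer change.
indexed_tokens = set(simple_like("iPhone5s case"))

# The same search, analyzed with the old and with the new analyzer.
query_old = simple_like("iphone5s")
query_new = standard_like("iphone5s")

old_matches = all(t in indexed_tokens for t in query_old)
new_matches = all(t in indexed_tokens for t in query_new)

# For plain words both analyzers agree, so such a test hides the problem.
plain_word_agrees = simple_like("Quick test") == standard_like("Quick test")
```

In this sketch the old document is found with the old analyzer but not with the new one, while plain words tokenize identically under both, which is one possible reason a small local test appears to work.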
I want to build Elasticsearch queries using the Java API, and I want to know how I can use Lucene analyzers in Elasticsearch Java programs. I have checked QueryBuilders and tried to use an analyzer directly, as below.
QueryBuilder builder = QueryBuilders.matchQuery(searchString, fields).analyzer("porterstem");
But it turned out to be wrong. If anyone has tried this, could you please give me some information?
You should define your analyzer in the mapping.
That way the analyzer will be used both at index time and at query time.
Analyzers are used to analyze the documents that you index. Analysis means splitting the text into tokens and normalizing them, for example by lowercasing the text of your indexed documents. This analysis step is what makes searching effective and fast.
You can specify an analyzer in the query, but that only controls how the query text is analyzed. The stored documents are analyzed once, at index time, rather than on every search, which would be expensive; this keeps query time low and results fast.
So specify your analyzers in the mapping to search efficiently.
For more information about analyzers, see:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/Analyzer.html
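A minimal sketch of what "define the analyzer in the mapping" could look like, written as Python dictionaries that would be serialized to JSON when creating the index. The analyzer name "my_porter" and the field name "title" are hypothetical, and the "string" field type assumes a 1.x-era cluster (later versions use "text"); the exact mapping layout depends on the Elasticsearch version.

```python
import json

ANALYZER_NAME = "my_porter"  # hypothetical name, chosen for this example


def analysis_settings():
    """Index settings defining a custom analyzer with Porter stemming.

    The porter_stem token filter expects lowercased input, so the
    lowercase filter is placed before it.
    """
    return {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    }


def field_mapping(field_type="string"):
    """Mapping fragment for one field that uses the custom analyzer."""
    return {"type": field_type, "analyzer": ANALYZER_NAME}


settings_json = json.dumps({"settings": analysis_settings()}, sort_keys=True)
title_mapping = field_mapping()  # would go under properties -> "title"
```

With the analyzer attached to the field in the mapping, a match query on that field picks it up at search time without naming it in the query.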
By default, ES is case-insensitive. There are examples (e.g. "case insensitive search in elasticsearch") of how to define an analyzer for a specific field in ES.
I have a large number of data types with varying fields being loaded, and it's totally impractical for me to set the analyzer on fields by name.
I was previously using Solr, and accomplished a globally case-sensitive search by using dynamicFields for all of my data, and editing schema.xml to modify the "text" fieldtype to remove the LowerCaseFilterFactory from the analyzer.
How can I do something similar in ES?
Have a look at the elasticsearch documentation for the Analysis index module. There's a Default analyzers section which says:
The default logical name allows one to configure an analyzer that will
be used both for indexing and for searching APIs. The default_index
logical name can be used to configure a default analyzer that will be
used just when indexing, and the default_search can be used to
configure a default analyzer that will be used just when searching.
I guess that is what you're looking for. It's probably good to know that the default analyzer in Elasticsearch is the standard analyzer (Lucene's StandardAnalyzer).
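As a sketch of how the default logical name might be used for a globally case-sensitive search, the index settings below register a custom analyzer called default that tokenizes with the standard tokenizer and applies no token filters, so nothing is lowercased. This is one possible configuration under that assumption, not the only one, and it would need to be checked against your Elasticsearch version.

```python
import json


def case_sensitive_default_settings():
    """Index settings whose `default` analyzer keeps the original case.

    The analyzer is a custom one: standard tokenizer, empty filter list,
    which means no lowercase filter is applied at index or search time.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [],
                    }
                }
            }
        }
    }


# JSON body that could be supplied when creating an index.
create_index_body = json.dumps(case_sensitive_default_settings(), indent=2)
```

Because the analyzer is registered under the name default, it applies to every analyzed field that does not set its own analyzer, which avoids configuring fields one by one.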
I have implemented my own Lucene Analyzer. How can I use it with Elasticsearch?
You will need to implement AnalysisBinderProcessor, which makes your analyzer available to Elasticsearch, and then wrap it in an Elasticsearch plugin. The simplest way to do this is to start from one of the many examples available on GitHub.