Update Index Settings Analyzer in Java - elasticsearch

I want to update a filter in my analyzer. I found UpdateSettingsRequestBuilder, but it seems to require the whole updated settings string. Can we pass just the updated filter, as we can with the Elasticsearch REST API?
My index settings:
"settings": {
"number_of_shards": "3",
"index.mapping.total_fields.limit": 10000,
"analysis": {
"filter": {
"minimal_english": {
"type": "stemmer",
"language": "minimal_english"
},
"synonym_graph": {
"type": "synonym_graph",
"updateable": "true"
}
}
}
}
Elasticsearch request:
PUT /test_index2/_settings
{
"analysis" : {
"filter": {
"synonym_graph": {
"type": "synonym_graph",
"updateable": "true",
"synonyms": ["i-phone, i phone => iphone"]
}
}
}
}
Is there a way to pass just this filter from Java, as in the Elasticsearch request above, to update the filter?

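Updating index settings with the put settings API is currently not possible directly through the Spring Data Elasticsearch API.
You can create such a request yourself and send it to Elasticsearch using the ElasticsearchRestTemplate.execute() method:
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
ElasticsearchRestTemplate restTemplate; // injected
// set up the request: the target index and the settings to change go here
UpdateSettingsRequest updateSettingsRequest = new UpdateSettingsRequest();
AcknowledgedResponse response = restTemplate.execute(client ->
    client.indices().putSettings(updateSettingsRequest, RequestOptions.DEFAULT));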
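As a rough illustration of sending only the changed filter: to my knowledge analysis settings are static, so the index has to be closed before the update and reopened afterwards. The sketch below (Python, standard library only) builds that three-step sequence with the partial body from the question. The helper names `filter_update_requests` and `send`, and the `http://localhost:9200` address mentioned afterwards, are assumptions made for this example and not part of any Elasticsearch client. The same partial body is what the `UpdateSettingsRequest` in the Java snippet would need to carry.

```python
import json
import urllib.request


def filter_update_requests(index, filter_name, filter_definition):
    """Return the (method, path, body) steps for updating one analysis filter."""
    partial_settings = {"analysis": {"filter": {filter_name: filter_definition}}}
    return [
        ("POST", f"/{index}/_close", None),  # assumed necessary: analysis settings are static
        ("PUT", f"/{index}/_settings", partial_settings),
        ("POST", f"/{index}/_open", None),
    ]


def send(base_url, method, path, body=None):
    """Send one request with the standard library and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the filter from the question is sent, not the whole settings document.
steps = filter_update_requests(
    "test_index2",
    "synonym_graph",
    {
        "type": "synonym_graph",
        "updateable": "true",
        "synonyms": ["i-phone, i phone => iphone"],
    },
)

for method, path, body in steps:
    print(method, path, json.dumps(body) if body is not None else "")
```

Against a running cluster, each step could then be passed to `send("http://localhost:9200", *step)`. Nothing is sent by the block itself; it only prints the planned requests.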

Related

Elasticsearch - Do searches for alternative country codes

I have a document with a field called 'countryCode'. I have a term query that searches on its keyword value, but I am running into inconsistent values:
Some records say UK and others say GB
Some records say US and others say USA
And the list goes on.
Can I instruct my index to handle all those variations somehow, instead of having to expand the terms in my query filter?
What you are looking for is a way to have your tokens match equivalent tokens that may not share any characters. At search time, the usual way to do this is with synonyms.
Elasticsearch lets you configure synonyms so that your queries use them and return results accordingly.
I have configured a field with a custom analyzer that uses a synonym token filter. Below is a sample mapping and query so that you can play with it and see if it fits your needs.
Mapping
PUT my_index
{
"settings": {
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"usa, us",
"uk, gb"
]
}
},
"analyzer": {
"my_synonyms": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonym_filter"
]
}
}
}
},
"mappings": {
"mydocs": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_synonyms"
}
}
}
}
}
Sample Document
POST my_index/mydocs/1
{
"name": "uk is pretty cool country"
}
When you run the query below, it returns the document above, even though the document contains "uk" rather than "gb".
Query
GET my_index/mydocs/_search
{
"query": {
"match": {
"name": "gb"
}
}
}
Refer to their official documentation to understand more on this. Hope this helps!
To handle this within ES itself, without using Logstash, I'd suggest a simple ingest pipeline with a gsub processor that updates the field in place:
{
"gsub": {
"field": "countryCode",
"pattern": "GB",
"replacement": "UK"
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html
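One thing to keep in mind with gsub is that the pattern is a regular expression, so an unanchored "GB" would also rewrite that substring inside longer values. Here is a minimal sketch in Python (standard library only) that generates one anchored gsub processor per variant and previews the effect locally with `re.sub`. The `VARIANTS` table and the helper names are made up for this example, and the local preview only approximates what Elasticsearch would do with these simple patterns.

```python
import re

# Hypothetical variant table: canonical code -> other spellings seen in the data.
VARIANTS = {"UK": ["GB"], "US": ["USA"]}


def gsub_processors(field, variants):
    """Build one anchored gsub processor per variant spelling."""
    processors = []
    for canonical, spellings in variants.items():
        for spelling in spellings:
            processors.append({
                "gsub": {
                    "field": field,
                    # Anchors make the pattern match the whole value only.
                    "pattern": f"^{re.escape(spelling)}$",
                    "replacement": canonical,
                }
            })
    return processors


def simulate(value, processors):
    """Apply the processors locally with re.sub to preview the result."""
    for processor in processors:
        spec = processor["gsub"]
        value = re.sub(spec["pattern"], spec["replacement"], value)
    return value


# Body that could be stored as an ingest pipeline definition.
pipeline = {
    "description": "normalise country codes",
    "processors": gsub_processors("countryCode", VARIANTS),
}

for code in ("GB", "USA", "UK", "GBP"):
    print(code, "->", simulate(code, pipeline["processors"]))
```

Since an ingest pipeline runs at index time, documents that are already stored would presumably need to be reindexed through it before their values change.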

Mapping definition for [fields] has unsupported parameters: [analyzer : case_sensitive]

In my search engine, users can select to search case-sensitively or not. If they choose to, the query will search on fields which use a custom case-sensitive analyser. This is my setup:
GET /candidates/_settings
{
"candidates": {
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "candidates",
"creation_date": "1528210812046",
"analysis": {
"analyzer": {
"case_sensitive": {
"filter": [
"stop",
"porter_stem"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
...
}
}
}
}
So I have created a custom analyser called case_sensitive taken from this answer. I am trying to define my mapping as follows:
PUT /candidates/_mapping/candidate
{
"properties": {
"first_name": {
"type": "text",
"fields": {
"case": {
"type": "text",
"analyzer": "case_sensitive"
}
}
}
}
}
So, when querying, for a case-sensitive match, I can do:
"simple_query_string": {
"query": "text to search",
"fields": [
"first_name.case"
]
}
I am not even getting to the last step, because I get the error described in the title when I try to define the mapping. (The full stack trace was posted as an image.)
I initially thought that my error was similar to this one, but I think that issue is only related to using the keyword tokenizer and not the standard one.
In this mapping definition, I was actually trying to adjust the mapping for several different fields, not just first_name. One of these fields has the type long, and that was the mapping definition throwing the error. When I remove it from the mapping definition, everything works as expected. However, I am unsure why this fails for this data type.
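On why the long field fails: as far as I know, `analyzer` is a parameter of analyzed field types such as text, and numeric types like long are not analyzed, which would fit the "unsupported parameters" error in the title. A small Python sketch that checks a mapping for this before sending it is shown below. The set of types that accept an analyzer is a simplifying assumption, and the `age` field is hypothetical, standing in for the long field that failed.

```python
# Assumption: only these field types accept "analyzer" (a simplification).
ANALYZED_TYPES = {"text", "completion"}


def misplaced_analyzers(properties, prefix=""):
    """Return dotted paths of fields that set "analyzer" on a non-analyzed type."""
    problems = []
    for name, definition in properties.items():
        path = prefix + name
        field_type = definition.get("type", "object")
        if "analyzer" in definition and field_type not in ANALYZED_TYPES:
            problems.append(path)
        # Recurse into multi-fields ("fields") and object sub-properties.
        for key in ("fields", "properties"):
            problems += misplaced_analyzers(definition.get(key, {}), path + ".")
    return problems


mapping = {
    "first_name": {
        "type": "text",
        "fields": {"case": {"type": "text", "analyzer": "case_sensitive"}},
    },
    # Hypothetical long field with the same multi-field pattern applied to it.
    "age": {
        "type": "long",
        "fields": {"case": {"type": "long", "analyzer": "case_sensitive"}},
    },
}

print(misplaced_analyzers(mapping))
```

If the case-sensitive variant is only needed for text searches, leaving numeric fields out of that pattern (as the question ended up doing) avoids the error.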

How to ingest an elastic search database definition file

I have a file with some definitions for an Elasticsearch database in the following format:
PUT /drafts
{
"settings": {
"max_result_window" : "100000"
}
}
PUT /drafts/draft/_mapping
{
"draft":{
"properties":{
"id":{
"type":"keyword"
},
"analysis_id":{
"type":"keyword"
}
}
}
}
PUT /variants
{
"settings": {
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase"]
}
}
},
"max_result_window" : "100000"
}
}
How can I load all of that into my Elasticsearch server in one go?
I tried the _bulk API call, but that does not seem to work:
curl localhost:9200/_bulk -d @file
I have seen this format in the Elasticsearch tutorials, but they never state how to run those files. Is it even possible?
The Bulk API is for indexing, updating, and deleting documents. The operations in your file create indices with settings or alter mappings, and that is not possible with the Bulk API.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
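Since each block in such a file is an ordinary REST call, one option is to split the file into requests and send them one at a time. Here is a minimal sketch in Python (standard library only). It assumes every request starts with a line of the form `METHOD path` and that the body, if any, is plain JSON; `parse_console_file`, `send_all`, and the shortened `SAMPLE` text are names and data invented for this example.

```python
import json
import re
import urllib.request

# A request starts with an HTTP method followed by a path on its own line.
REQUEST_LINE = re.compile(r"^(GET|PUT|POST|DELETE|HEAD)\s+(\S+)$")


def parse_console_file(text):
    """Split console-style text into (method, path, body) tuples."""
    blocks = []
    for line in text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if match:
            path = "/" + match.group(2).lstrip("/")
            blocks.append([match.group(1), path, []])
        elif blocks:
            blocks[-1][2].append(line)  # body line of the current request
    parsed = []
    for method, path, lines in blocks:
        raw = "\n".join(lines).strip()
        parsed.append((method, path, json.loads(raw) if raw else None))
    return parsed


def send_all(base_url, requests):
    """Send the parsed requests in order and print each HTTP status."""
    for method, path, body in requests:
        data = None if body is None else json.dumps(body).encode("utf-8")
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            print(method, path, response.status)


# Shortened excerpt of the file from the question.
SAMPLE = """PUT /drafts
{
  "settings": {
    "max_result_window" : "100000"
  }
}
PUT /drafts/draft/_mapping
{
  "draft": {"properties": {"id": {"type": "keyword"}}}
}
"""

parsed = parse_console_file(SAMPLE)
for method, path, body in parsed:
    print(method, path)
```

With a server running, `send_all("http://localhost:9200", parsed)` would replay the file in order; the address is an assumption. Pasting the file into the Kibana Dev Tools console, which is where this format comes from, is another option.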

Elasticsearch Mapping - Rename existing field

Is there any way I can rename an element in an existing Elasticsearch mapping without having to add a new element?
If so, what's the best way to do it in order to avoid breaking the existing mapping?
e.g. from fieldCamelcase to fieldCamelCase
{
"myType": {
"properties": {
"timestamp": {
"type": "date",
"format": "date_optional_time"
},
"fieldCamelcase": {
"type": "string",
"index": "not_analyzed"
},
"field_test": {
"type": "double"
}
}
}
}
You could do this by creating an Ingest pipeline, that contains a Rename Processor in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
"description" : "describe pipeline",
"processors" : [
{
"rename": {
"field": "fieldCamelcase",
"target_field": "fieldCamelCase"
}
}
]
}
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "my_rename_pipeline"
}
}
Note that you need to be running Elasticsearch 5.x or later in order to use ingest. If you're running an earlier version, you'll have to go with what @Val mentioned in his comment :)
To rename a field in ES versions above 5, where the missing query has been removed, use the _update_by_query API with a must_not/exists clause instead:
Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "NEW_FIELD_NAME"
}
}
}
},
"script" : {
"inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
}
}
First of all, you must understand how Elasticsearch and Lucene store data: in immutable segments (this is easy to read up on online).
So any solution will delete and re-create documents, and will either change the mapping or create a new index with a new mapping.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
"query": {
"missing": {
"field": "fieldCamelCase"
}
},
"script" : {
"inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
}
}
Starting with ES 6.4 you can use field aliases, which provide the functionality you're looking for with close to zero work or resources.
Do note that aliases can only be used for searching, not for indexing new documents.
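For reference, a field alias is declared in the mapping as a field of type `alias` whose `path` points at the existing field. The Python sketch below only builds that mapping body for the field names from the question; `alias_mapping` is a helper invented for this example, and the exact put-mapping URL to send the body to depends on the Elasticsearch version (with or without a type name), so it is left out.

```python
import json


def alias_mapping(alias_name, target_field):
    """Build a put-mapping body that exposes target_field under alias_name."""
    return {"properties": {alias_name: {"type": "alias", "path": target_field}}}


# Expose the existing fieldCamelcase under the corrected name fieldCamelCase.
body = alias_mapping("fieldCamelCase", "fieldCamelcase")
print(json.dumps(body, indent=2))
```

Queries can then refer to `fieldCamelCase`, while documents are still indexed with the original `fieldCamelcase` field.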

Elasticsearch completion - generating input list with analyzers

I've had a look at this article: https://www.elastic.co/blog/you-complete-me
However, it requires writing some logic in the client to create multiple "input" values. Is there a way to define an analyzer (maybe using shingle or ngram/edge-ngram) that will generate the multiple input terms?
Here's what I tried (and it obviously doesn't work):
DELETE /products/
PUT /products/
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type":"shingle",
"max_shingle_size":5,
"min_shingle_size":2
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"autocomplete_filter"
],
"tokenizer": "standard"
}
}
}
},
"mappings": {
"product": {
"properties": {
"name": {"type": "string"
,"copy_to": ["name_suggest"]
}
,"name_suggest": {
"type": "completion",
"payloads": false,
"analyzer": "autocomplete"
}
}
}
}
}
PUT /products/product/1
{
"name": "Apple iPhone 5"
}
PUT /products/product/2
{
"name": "iPhone 4 16GB"
}
PUT /products/product/3
{
"name": "iPhone 3 GS 16GB black"
}
PUT /products/product/4
{
"name": "Apple iPhone 4 S 16 GB white"
}
PUT /products/product/5
{
"name": "Apple iPhone case"
}
POST /products/_suggest
{
"suggestions": {
"text":"i"
,"completion":{
"field": "name_suggest"
}
}
}
I don't think there's a direct way to achieve this.
I'm not sure why you would need to store ngrammed tokens, considering Elasticsearch already stores the 'input' text as an FST structure. Newer releases also allow for fuzziness in the suggest query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html#fuzzy
I can understand the need for something like a shingle analyzer to generate the inputs for you, but there doesn't seem to be a way to do that yet. Having said that, the _analyze endpoint can be used to generate tokens from the analyzer of your choice, and those tokens can be passed to the 'input' field (with or without any other added logic). This way you won't have to replicate your analyzer logic in your application code. That's the only way I can think of to achieve the desired input field.
Hope it helps.
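To make the _analyze idea a bit more concrete, here is a minimal Python sketch of the client-side half: it takes a response in the general shape the _analyze endpoint returns (a "tokens" list whose entries carry a "token" string) and turns it into the 'input' list of the completion field. The `sample_response` is hand-written for illustration and is not real output from the `autocomplete` analyzer; the helper names are assumptions too.

```python
def inputs_from_tokens(analyze_response):
    """Collect unique token strings from an _analyze response, keeping order."""
    seen, inputs = set(), []
    for entry in analyze_response.get("tokens", []):
        token = entry["token"]
        if token not in seen:
            seen.add(token)
            inputs.append(token)
    return inputs


def suggest_document(name, analyze_response):
    """Build a document whose completion field carries the generated inputs."""
    return {
        "name": name,
        "name_suggest": {"input": inputs_from_tokens(analyze_response)},
    }


# Hand-written sample in the general shape of an _analyze response (not real output).
sample_response = {
    "tokens": [
        {"token": "apple", "position": 0},
        {"token": "apple iphone", "position": 0},
        {"token": "iphone", "position": 1},
        {"token": "apple", "position": 2},
    ]
}

document = suggest_document("Apple iPhone 5", sample_response)
print(document)
```

The resulting document would be indexed in place of the plain `{"name": ...}` documents from the question, so the suggester sees every generated input.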

Resources