Adding a filter to Elasticsearch returns nothing

I have a single-node Elasticsearch 7.9 instance set up with 0 documents, and I am trying to add a filter by following the documentation example. When I issue a PUT on the index my_web (an index that exists) with this body:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stop" ]
        }
      }
    }
  }
}
I get no response from the server. If I issue a GET on _ml/filters/, it responds showing there are 0 filters.
Do I have badly formed JSON or something? It's pretty frustrating not to receive any response at all...

I had two issues.
My application wasn't surfacing exception errors, so Elasticsearch was in fact reporting an error that I never saw. Thanks to @Val in the comments for pushing me in the right direction on this.
Once I had that resolved, I could see a resource_already_exists_exception, because the command was trying to create an index that already existed (I thought it would just replace it).
The solution, found in the Elasticsearch documentation: to add an analyzer, you must close the index, define the analyzer, and reopen the index:
POST /my-index-000001/_close
PUT /my-index-000001/_settings
{
  "analysis": {
    "analyzer": {
      "content": {
        "type": "custom",
        "tokenizer": "whitespace"
      }
    }
  }
}
POST /my-index-000001/_open
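Applied to the index from the question, the same close/update/open sequence would look roughly like this (a sketch assuming the index name my_web and the analyzer definition from the original PUT):
POST /my_web/_close

PUT /my_web/_settings
{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "tokenizer": "whitespace",
        "filter": [ "stop" ]
      }
    }
  }
}

POST /my_web/_open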

Related

How to update the data type of a field in Elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently mapped as a string. The index name is health_gateway.
I have made some changes in the Python code that generates the data, so this field Data.CPU has now become an integer. But Elasticsearch is still showing it as a string. How can I update its data type?
I tried running the below command in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc" : {
    "properties" : {
      "Data.CPU" : {"type" : "integer"}
    }
  }
}
But it gave me this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status" : 400
}
There is also this document which says the data type can be converted using mutate, but I am not able to understand it properly.
I do not want to delete the index and recreate it, as I have created a visualization based on this index and deleting the index would delete that too. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
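For illustration, the scripted field on the Kibana index pattern could be a short Painless snippet along these lines (a sketch; it assumes the raw value is indexed under a keyword sub-field named Data.CPU.keyword, which is hypothetical and may differ in your mapping):
// Kibana scripted field (type: number), written in Painless.
// Data.CPU.keyword is a hypothetical keyword sub-field; adjust to your mapping.
doc['Data.CPU.keyword'].size() == 0 ? 0 : Integer.parseInt(doc['Data.CPU.keyword'].value)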
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than the field really being called Data.CPU with a literal dot:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
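A sketch of that flow, assuming a new index named health_gateway_v2 (a made-up name for illustration) with Data.CPU mapped as an integer:
PUT health_gateway_v2
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": { "type": "integer" }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "health_gateway" },
  "dest": { "index": "health_gateway_v2" }
}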
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
With the below mapping, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
Or you can create a new index with the correct mapping and reindex the data into it using the reindex API.

Adding a default value to an existing mapping in Elasticsearch

I have an index with mapping. I decided to add a new field to existing mapping:
{
  "properties": {
    "sexifield": {
      "type": "keyword",
      "null_value": "NULL"
    }
  }
}
As far as I understand, the field should appear in existing documents when I reindex. So when I use the reindex API:
{
  "source": {
    "index": "index_v1"
  },
  "dest": {
    "index": "index_v2",
    "version_type": "external"
  }
}
I see that the mapping for index_v2 does not contain sexifield, and the documents do not contain it either. Also, this operation took less than 60 ms.
Please point out what I am not understanding here...
Adding new documents to the first index (via the Java API, for an entity that does not have this sexifield field, so presumably Elasticsearch should add the default value for me) also does not create this additional field.
Thanks in advance for tips.
Regards
Great question, +1 (I learned something while solving your problem).
I don't know how to make reindexing take the second (destination) mapping into account, but here is how I would update all the documents in the reindexed index once the reindexing from the original index is done. I am still researching whether there is a way to apply the default values defined in the mapping of the second index during the reindex itself, but for now see if this solution helps:
POST /index_v2/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.sexifield = params.null_value",
    "params": {
      "null_value": "NULL"
    }
  }
}
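For what it's worth, the likely root cause in the question is that _reindex does not copy the source index's mappings or settings; if index_v2 does not already exist, it is auto-created with dynamic mappings only. Creating the destination index with the new field before reindexing gives the expected mapping, along these lines. Note also that null_value only applies when a document explicitly contains "sexifield": null; documents that omit the field entirely are left as-is, which is why new documents without the field never pick up the default.
PUT index_v2
{
  "mappings": {
    "properties": {
      "sexifield": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}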

Remove custom analyzer / filter on Elasticsearch for all new indexes

We tried adding a custom analyzer / lowercase filter to all new indexes in Elasticsearch. It looks something like this:
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"filter": [
"lowercase"
],
"type": "custom",
"char_filter": []
}
}
},
This is automatically applied to all new indexes. How do I remove it? I realize I cannot remove it from existing indexes, but how do I stop it from being automatically added to new ones?
It appears that these settings are located somewhere in my master "template". I can see the template using GET /_template, which contains all of the unwanted lowercase normalizers... but how do I remove them?
Thanks!
Here is how you can delete an index template:
DELETE /_template/template_1
Also, in the future, if you want to add a new custom analyzer to any template, first create a test index with your custom analyzer and then check that it gives the desired results with the following:
GET <index_name>/_analyze
{
  "analyzer" : "analyzer_name",
  "text" : "this is a test"
}
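If you would rather keep the template and only drop the normalizer, another option is to overwrite it: fetch the current body, delete the "analysis" block from its settings, and PUT the edited body back under the same name (a sketch; template_1 and the index pattern are placeholders for whatever GET /_template actually returns):
# 1. Fetch the current template body.
GET /_template/template_1

# 2. PUT the whole body back under the same name, minus the
#    "analysis" section under "settings". The values below are
#    placeholders; reuse what the GET returned.
PUT /_template/template_1
{
  "index_patterns": ["my-index-*"],
  "settings": {},
  "mappings": {}
}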

How to test the analyzer-smartcn plugin for Elasticsearch on a local machine

I installed the smartcn plugin on my Elasticsearch node, restarted Elasticsearch, and tried to create an index with these settings:
PUT /test_chinese
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "smartcn"
          }
        }
      }
    }
  }
}
However, when I run this in Marvel, I get the error below, and I see a bunch of errors in Elasticsearch:
"error": "IndexCreationException[[test_chinese] failed to create index]; nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [smartcn] or tokenizer for [default]]; nested: NoClassSettingsException[Failed to load class setting [type] with value [smartcn]]; nested: ClassNotFoundException[org.elasticsearch.index.analysis.smartcn.SmartcnAnalyzerProvider]; ",
"status": 400
Any ideas what I might be missing?
I figured it out. I had manually installed the plugin from a zip, and that was causing issues... I reinstalled it the right way, using the version specific to Elasticsearch 1.7, and it worked.
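For anyone hitting the same ClassNotFoundException, a quick sanity check is to confirm the plugin is actually loaded and then exercise the analyzer directly, without creating an index (a sketch; on modern versions the plugin is installed with bin/elasticsearch-plugin install analysis-smartcn, while 1.x required a zip matched to the Elasticsearch version):
# List installed plugins (run from the Elasticsearch home directory):
bin/elasticsearch-plugin list

# Then test the analyzer on its own:
GET /_analyze
{
  "analyzer": "smartcn",
  "text": "我爱北京天安门"
}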

How to get the analyzed text from the Elasticsearch database

I need to get the analyzed text from the Elasticsearch database. I know that I can apply an analyzer to any text using the analyze API; however, since the text has already been analyzed during indexing, there should be a way to access the analyzed data directly.
Here is what I am currently doing, using the analyze API and the Python Elasticsearch client:
res = es.indices.analyze(index=app.config['ES_ARXIV_PAPER_INDEX'],
                         body={"char_filter": ["html_strip"],
                               "tokenizer": "standard",
                               "filter": ["lowercase", "stop", "snowball"],
                               "text": text})
tokens = []
for token in res['tokens']:
    tokens.append(token['token'])
print("tokens = ", tokens)
I noticed that this procedure is actually quite slow, so getting the tokens directly from the indexed data should be much faster.
Using the termvectors API should do the job, but you must specify the id of every entry, and it is fastest when term vectors are enabled (stored) in the mapping; otherwise they are computed on the fly. If you don't want that, then you are already using the correct method.
Example below:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text"
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "my_field": "this is a test"
}

GET /my_index/my_type/1/_termvectors?fields=*
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-vector.html
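Since the question uses the Python client, the equivalent call might look roughly like this (a sketch; termvectors is the elasticsearch-py method corresponding to the console request above, and the index/type/id values mirror that example):
# Fetch term vectors for one document; the analyzed terms
# are keyed under term_vectors -> <field> -> terms.
res = es.termvectors(index="my_index", doc_type="my_type", id=1,
                     fields="my_field")
tokens = list(res["term_vectors"]["my_field"]["terms"].keys())
print("tokens =", tokens)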
