Elasticsearch - Wrong field type

I'm running Elasticsearch 6.8.
I tried to add a keyword-type field to my index mapping.
What I want is a mapping where my_field looks like this:
"my_field": {
"type": "keyword"
}
So in order to do that, I added a field to my mapping:
"properties": {
...
"my_field": {
"type": "keyword",
"norms": false
},
...
}
But currently, it gives me something like:
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I need this keyword type because I need to aggregate on it, and with a text type I get:
Fielddata is disabled on text fields by default. Set fielddata=true on [my_field] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
But I'm not able to set fielddata to true.
I tried several things, like creating a new index instead of updating the existing one, but none of these attempts worked.
Does anyone know how to get the correct field type? (the solution I'd prefer)
Or how to set fielddata to true in the mapping?
Best regards,
Jules

I set fielddata to true on a text field by creating the index with the curl command below on Elasticsearch 6.x:
curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'
And it created the index with the proper mapping:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}
The mapping API then returns the JSON response below.
{
  "my_index" : {
    "mappings" : {
      "type" : {
        "properties" : {
          "my_field" : {
            "type" : "text",
            "fielddata" : true
          }
        }
      }
    }
  }
}
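If you would rather have the asker's preferred solution (a true keyword field) than fielddata, keep in mind that an existing field's type cannot be changed in place. A minimal sketch, assuming a new index named my_index_v2: create the new index with the explicit keyword mapping, then copy the data over with the _reindex API.
curl -X PUT "localhost:9200/my_index_v2?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "type": {
      "properties": {
        "my_field": {
          "type": "keyword",
          "norms": false
        }
      }
    }
  }
}'
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}'
Aggregations on my_field then work directly, with no fielddata needed.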

Related

Why is Elasticsearch keyword search not working?

I use NLog to write log messages to Elasticsearch; the index structure is here:
"mappings": {
"logevent": {
"properties": {
"#timestamp": {
"type": "date"
},
"MachineName": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"level": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"message": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
I was able to get results using a text search:
GET /webapi-2022.07.28/_search
{
"query": {
"match": {
"message": "ERROR"
}
}
}
Result:
"hits" : [
{
"_index" : "webapi-2022.07.28",
"_type" : "logevent",
"_id" : "IFhYQoIBRhF4cR9wr-ja",
"_score" : 4.931916,
"_source" : {
"#timestamp" : "2022-07-28T01:07:58.8822339Z",
"level" : "Error",
"message" : """2022-07-28 09:07:58.8822|ERROR|AppSrv.Filter.AccountAuthorizeAttribute|[KO17111808]-[172.10.2.200]-[ERROR]-"message"""",
"MachineName" : "WIN-EPISTFOBD41"
}
}
//.....
]
But when I use the keyword field, I get nothing:
GET /webapi-2022.07.28/_search
{
"query": {
"term": {
"message.keyword": "ERROR"
}
}
}
I tried both term and match; the result is the same.
This is happening because your message field does not contain just ERROR; the .keyword field stores the whole string, other text included. You need to use a full-text search in your case; the .keyword field is only useful for exact matches.
If your message field contained only the string ERROR, then searching on .keyword would produce a result; you can test it yourself by indexing a sample document.
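Following that suggestion, a minimal sketch you can run to convince yourself (the index name test-keyword and the one-word document are hypothetical; dynamic mapping gives message the same text-plus-keyword layout as above):
PUT /test-keyword/logevent/1
{
  "message": "ERROR"
}
GET /test-keyword/_search
{
  "query": {
    "term": {
      "message.keyword": "ERROR"
    }
  }
}
Here the term query returns the document, because the whole field value is exactly ERROR. Note also that the mapping's ignore_above: 256 means messages longer than 256 characters are never indexed into the .keyword subfield at all.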

Elassandra multiple type mapping

I'm using Elassandra to search in mail: Cassandra stores the mail and Elasticsearch searches those mails.
My problem is that since Elasticsearch 6, we can't use multiple types in one mapping. Here is my mapping:
"mappings": {
"mail__mail": {
"discover" : ".*",
"properties": {
"mailfrom": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
}
}
},
"subject": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
}
}
},
"date" : {
"type" : "date"
},
"folderid" : {
"type" : "text"
}
}
},
"mail__account" : {
"discover" : ".*",
"properties": {
"userId" : {
"type" : "Integer"
}
}
}
}
How can I use Elasticsearch 6 to search in multiple Cassandra tables?
Since ES6 you need to map 1 table per index.
Searching multiple indexes:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html
As @Alex said, you need to map one table per ES index, but you can create multiple ES indexes per keyspace, mapping to different tables.
You have to specify a keyspace name as an index setting. This is done with the following syntax:
curl -XPUT "http://localhost:9200/your_index/" -d '{
"settings" : { "keyspace" : "your_keyspace" },
"mappings" : {
"your_table" : {
"properties" : {
...
}
}
}
}
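For example, a minimal sketch under the question's schema (the keyspace name mail and the table names mail and account are assumptions; discover auto-maps the table's columns, as in the question's mapping):
curl -XPUT "http://localhost:9200/mail_mail/" -H 'Content-Type: application/json' -d '{
  "settings" : { "keyspace" : "mail" },
  "mappings" : {
    "mail" : { "discover" : ".*" }
  }
}'
curl -XPUT "http://localhost:9200/mail_account/" -H 'Content-Type: application/json' -d '{
  "settings" : { "keyspace" : "mail" },
  "mappings" : {
    "account" : { "discover" : ".*" }
  }
}'
A single query can then span both tables by listing both indices:
curl -XGET "http://localhost:9200/mail_mail,mail_account/_search"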

ElasticSearch terms aggregation on whole field

This is the mapping for my field:
{
"product" : {
"mappings" : {
"product" : {
"filters.brand" : {
"full_name" : "filters.brand",
"mapping" : {
"brand" : {
"type" : "text",
"fielddata" : true
}
}
}
}
}
}
}
I'm trying to get unique brands with doc counts using the following curl:
curl -XGET 'http://localhost:9200/product/_search?pretty' -H 'Content-Type: application/json' -d'
{
"aggs": {
"domains": {
"terms": {
"field": "filters.brand",
"missing": "N/A",
"size": 10,
"order": {
"_count": "desc"
}
}
}
}
}'
It works OK, except that it returns counts by field tokens, not by the whole field.
For example, I have the brand "Absolut Joy", and it is counted as separate tokens.
How do I get an aggregation on the whole field?
Elasticsearch version: 5.3.1
Thank you
You can update the mapping of filters.brand as follows:
{
"mapping": {
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
And update the aggregation query to use "field": "filters.brand.keyword".
Use of fielddata: true on text fields is not advised.
Refer: Before-enabling-field-data
For using the same field for different purposes, refer: use-multi-fields
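Putting both pieces together, a minimal sketch (the _update_by_query step is one way to re-index in place; documents indexed before the mapping change will not show up under the new subfield until they are re-indexed):
curl -XPUT 'http://localhost:9200/product/_mapping/product' -H 'Content-Type: application/json' -d'
{
  "properties": {
    "filters": {
      "properties": {
        "brand": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  }
}'
curl -XPOST 'http://localhost:9200/product/_update_by_query?conflicts=proceed&pretty'
curl -XGET 'http://localhost:9200/product/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "domains": {
      "terms": {
        "field": "filters.brand.keyword",
        "missing": "N/A",
        "size": 10,
        "order": { "_count": "desc" }
      }
    }
  }
}'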
Another solution is to change the analyzer and search_analyzer:
"analyzer": "keyword",
"search_analyzer": "keyword"

Elasticsearch Index template lost raw string mapping

I'm running a small ELK 5.4.0 stack server on a single node. When I started, I just took all the defaults, which meant 5 shards for each index. I didn't want the overhead of all those shards, so I created an index template like so:
PUT /_template/logstash
{
"template": "logstash*",
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
This worked fine, but I just realized that all my raw fields are now missing in ES. For example, "uri" is one of my indexed fields, and I used to get "uri.raw" as an unanalyzed version of it; since I updated the template, that is gone. Looking at the current template shows:
GET /_template/logstash
Returns:
{
"logstash": {
"order": 0,
"template": "logstash*",
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "0"
}
},
"mappings": {},
"aliases": {}
}
}
It seems that the mappings have gone missing. I can pull the mappings off an earlier index
GET /logstash-2017.03.01
and compare it with a recent one
GET /logstash-2017.08.01
Here I see that back in March there was a mapping structure like
mappings: {
"logs": {
"_all": {...},
"dynamic_templates": {...},
"properties": {...}
},
"_default_": {
"_all": {...},
"dynamic_templates": {...},
"properties": {...}
}
}
and now I have only
mappings: {
"logs": {
"properties": {...}
}
}
The dynamic_templates hash holds the information about creating "raw" fields.
My guess is that I need to update my index template to
PUT /_template/logstash
{
"template": "logstash*",
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"logs": {
"_all": {...},
"dynamic_templates": {...},
},
"_default_": {
"_all": {...},
"dynamic_templates": {...},
"properties": {...}
}
}
IOW, everything but logs.properties (which holds the current list of fields being sent over by logstash).
But I'm not an ES expert and now I'm a bit worried. My original index template didn't work out the way I thought it would. Is my above plan going to work? Or am I going to make things worse? Must you always include everything when you create an index template? And where did the mappings for the older indexes, before I had a template file, come from?
When Logstash first starts, the elasticsearch output plugin installs its own index template with the _default_ template and dynamic_templates, as you correctly figured out.
Every time Logstash creates a new logstash-* index (i.e. every day), the template is leveraged and the index is created with the proper mapping(s) present in the template.
What you need to do now is simply to take the official logstash template that you have overridden and reinstall it like this (but with the modified shard settings):
PUT /_template/logstash
{
"template" : "logstash-*",
"version" : 50001,
"settings" : {
"index.refresh_interval" : "5s"
"index.number_of_shards": 1,
"index.number_of_replicas": 0
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "norms" : false},
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
"keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date", "include_in_all": false },
"#version": { "type": "keyword", "include_in_all": false },
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
}
Another way you could have done it is to not overwrite the logstash template, but to use another id, such as _template/my_logstash; at index creation time, both templates would kick in, and the index would get the mappings from the official logstash template plus the shard settings from yours, as sketched below.
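A minimal sketch of that second approach (the template name my_logstash is just an example; "order": 1 makes it win over the official template's order 0 wherever the two overlap):
PUT /_template/my_logstash
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}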

Multiple document types with same mapping in Elasticsearch

I have an index named test which can be associated with any number of document types, named sub_text_1 through sub_text_n, all with the same mapping.
Is there any way to set up the index so that all document types share the same mapping for their documents? I.e. test/sub_text_1/_mapping should be the same as test/sub_text_2/_mapping.
Otherwise, if I have something like 1000 document types, I will have 1000 copies of the same mapping, one per document type.
UPDATE:
PUT /test_index/
{
"settings": {
"index.store.type": "default",
"index": {
"number_of_shards": 5,
"number_of_replicas": 1,
"refresh_interval": "60s"
},
"analysis": {
"filter": {
"porter_stemmer_en_EN": {
"type": "stemmer",
"name": "porter"
},
"default_stop_name_en_EN": {
"type": "stop",
"name": "_english_"
},
"snowball_stop_words_en_EN": {
"type": "stop",
"stopwords_path": "snowball.stop"
},
"smart_stop_words_en_EN": {
"type": "stop",
"stopwords_path": "smart.stop"
},
"shingle_filter_en_EN": {
"type": "shingle",
"min_shingle_size": "2",
"max_shingle_size": "2",
"output_unigrams": true
}
}
}
}
}
Intended mapping:
{
"sub_text" : {
"properties" : {
"_id" : {
"include_in_all" : false,
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"alternate_id" : {
"include_in_all" : false,
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"text" : {
"type" : "multi_field",
"fields" : {
"text" : {
"type" : "string",
"store" : true,
"index" : "analyzed",
},
"pdf": {
"type" : "attachment",
"fields" : {
"pdf" : {
"type" : "string",
"store" : true,
"index" : "analyzed",
}
}
}
}
}
}
}
}
I want this mapping to be an individual mapping for each sub_text I create, so that I can change it for one sub_text without affecting the others; e.g. I may want to add two custom analyzers to sub_text1 and three analyzers to sub_text3, while the rest stay the same.
UPDATE:
PUT /my-index/document_set/_mapping
{
"properties": {
"type": {
"type": "string",
"index": "not_analyzed"
},
"doc_id": {
"type": "string",
"index": "not_analyzed"
},
"plain_text": {
"type": "string",
"store": true,
"index": "analyzed"
},
"pdf_text": {
"type": "attachment",
"fields": {
"pdf_text": {
"type": "string",
"store": true,
"index": "analyzed"
}
}
}
}
}
POST /my-index/document_set/1
{
"type": "d1",
"doc_id": "1",
"plain_text": "simple text for doc1."
}
POST /my-index/document_set/2
{
"type": "d1",
"doc_id": "2",
"pdf_text": "cGRmIHRleHQgaXMgaGVyZS4="
}
POST /my-index/document_set/3
{
"type": "d2",
"doc_id": "3",
"plain_text": "simple text for doc3 in d2."
}
POST /my-index/document_set/4
{
"type": "d2",
"doc_id": "4",
"pdf_text": "cGRmIHRleHQgaXMgaGVyZSBpbiBkMi4="
}
GET /my-index/document_set/_search
{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"type" : "d1"
}
}
}
}
}
This gives me the documents of type "d1". How do I add analyzers only to documents of type "d1"?
At the moment, a possible solution is to use index templates or dynamic mappings. However, they do not allow wildcard type matching, so you would have to use the _default_ root type to apply the mappings to all types in the index; it would then be up to you to ensure that all your types fit the same dynamic mapping. This template example may work for you:
curl -XPUT localhost:9200/_template/template_1 -d '
{
"template" : "test",
"mappings" : {
"_default_" : {
"dynamic": true,
"properties": {
"field1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'
Do not do this.
Otherwise, if I have something like 1000 document types, I will have 1000 copies of the same mapping, one per document type.
You're exactly right. For every additional _type with an identical mapping you are needlessly adding to the size of your index's mapping. They will not be merged, nor will any compression save you.
A much better solution is to simply create a shared _type and a field that represents the intended type. This completely avoids the wasted mappings and all of the negatives associated with them, including an unnecessary increase in your cluster state's size.
From there, you can imitate what Elasticsearch is doing for you and filter on your custom type without ballooning your mappings.
$ curl -XPUT localhost:9200/my-index -d '{
"mappings" : {
"my-type" : {
"properties" : {
"type" : {
"type" : "string",
"index" : "not_analyzed"
},
# ... whatever other mappings exist ...
}
}
}
}'
Then, for any search against sub_text1 (etc.), you can use a term (for one) or terms (for more than one) filter to imitate the _type filter that would otherwise happen for you.
$ curl -XGET localhost:9200/my-index/my-type/_search -d '{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"type" : "sub_text1"
}
}
}
}
}'
This does the same thing as the _type filter, and you can create filtered aliases (via the _aliases API) if you want the higher-level search capability without exposing the filtering logic to clients, as sketched below.
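For instance, a minimal sketch of such a filtered alias (names follow the earlier examples):
$ curl -XPOST localhost:9200/_aliases -d '{
  "actions" : [
    {
      "add" : {
        "index" : "my-index",
        "alias" : "sub_text1",
        "filter" : { "term" : { "type" : "sub_text1" } }
      }
    }
  ]
}'
Searching sub_text1 then behaves like searching the old per-type endpoint, with the filter applied automatically:
$ curl -XGET localhost:9200/sub_text1/_search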
