Elassandra multiple type mapping - elasticsearch

I'm using Elassandra to search mail: Cassandra stores the mail and Elasticsearch searches it.
My problem is that since Elasticsearch 6, we can't use multiple types in one index. Here is my mapping:
"mappings": {
"mail__mail": {
"discover" : ".*",
"properties": {
"mailfrom": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
}
}
},
"subject": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
}
}
},
"date" : {
"type" : "date"
},
"folderid" : {
"type" : "text"
}
}
},
"mail__account" : {
"discover" : ".*",
"properties": {
"userId" : {
"type" : "integer"
}
}
}
}
How can I use Elasticsearch 6 to search across multiple Cassandra tables?

Since ES6 you need to map 1 table per index.
Searching multiple indexes:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html

As @Alex said, you need to map one table per ES index, but you can create multiple ES indexes per keyspace, each mapping to a different table.
You have to specify the keyspace name as an index setting. This is done with the following syntax:
curl -XPUT "http://localhost:9200/your_index/" -H 'Content-Type: application/json' -d '{
"settings" : { "keyspace" : "your_keyspace" },
"mappings" : {
"your_table" : {
"properties" : {
...
}
}
}
}'
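Once each table has its own index, one search request can cover several of them by comma-separating the index names in the URL. The sketch below only builds the request path and body; the index names mail_index and account_index and the helper multi_index_search are made up for illustration.

```python
import json


def multi_index_search(indexes, query):
    """Return (path, body) for a search that spans several indexes."""
    # Elasticsearch accepts a comma-separated list of index names in the URL.
    path = "/" + ",".join(indexes) + "/_search"
    body = json.dumps({"query": query})
    return path, body


path, body = multi_index_search(
    ["mail_index", "account_index"],  # hypothetical index names
    {"match": {"subject": "invoice"}},
)
print("GET", path)
print(body)
```

The resulting path and body can be sent with curl or any HTTP client. Indexes that do not contain the queried field simply contribute no hits.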

Related

Copying co-ordinates to field geo_point type using copy_to in Elasticsearch

I am trying to work with geo data in Elasticsearch. I have an index with two separate fields, latitude and longitude, both stored as floating-point numbers. I want to use Elasticsearch's copy_to feature to copy both values into a third field of type geo_point. I tried that, but it does not work as intended.
{
"mappings": {
"properties": {
"unique_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"location_data": {
"properties": {
"latitude": {
"type": "float",
"copy_to": "last_location"
},
"longitude": {
"type": "float",
"copy_to": "last_location"
},
"last_location": {
"type": "geo_point"
}
}
}
}
}
}
When I index a sample document such as
{
"unique_id": "12345_mytest",
"location_data": {
"latitude": 37.16,
"longitude": -124.76
}
}
the new mapping shows that a last_location field, which was supposed to exist only inside the location_data object, has also been created at the root level, with type float instead of geo_point.
{
"mappings" : {
"properties" : {
"last_location" : {
"type" : "float"
},
"location_data" : {
"properties" : {
"last_location" : {
"type" : "geo_point",
"store" : true
},
"latitude" : {
"type" : "float",
"copy_to" : [
"last_location"
]
},
"longitude" : {
"type" : "float",
"copy_to" : [
"last_location"
]
}
}
},
"unique_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
}
}
}
}
Furthermore, when I query the field I do not get the expected results.
This doesn't work. Is there another way to do it? I know I could do it at the source or by altering the data before indexing, but I don't have the luxury of doing that right away. Any approach that only changes the mapping is most welcome. Thanks in advance for any pointers.
Thanks
Ashit

Elasticsearch - Wrong field type

I'm running on ElasticSearch 6.8.
I tried to add a keyword type field to my index mapping.
What I want is a mapping where my_field looks like this:
"my_field": {
"type": "keyword"
}
So in order to do that, I added a field to my mapping:
"properties": {
...
"my_field": {
"type": "keyword",
"norms": false
},
...
}
But currently, it gives me something like:
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I need this keyword type because I need to aggregate on it, and with a text type, it gave me:
Fielddata is disabled on text fields by default. Set fielddata=true on [my_field] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
But I'm not able to set fielddata to true.
I tried many things, like creating a new index instead of updating one, but none of these attempts worked.
Does anyone know how to get the correct field type? (This is the solution I prefer.)
Or how to set fielddata to true in the mapping?
Best regards,
Jules
I set fielddata to true on a text field by using the curl command below on Elasticsearch 6.x:
curl -X POST "localhost:9200/my_index/type?pretty" -H 'Content-Type: application/json' -d'
{
"mappings" :{
"properties": {
"my_field": {
"type": "text",
"fielddata": true
}
}
}
}'
The request succeeded and created the index:
{
"_index" : "my_index",
"_type" : "type",
"_id" : "3Jl0F3EBg44VI1hJVGnz",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
The mapping API then returns the JSON response below.
{
"my_index": {
"mappings": {
"type": {
"properties": {
"mappings": {
"properties": {
"properties": {
"properties": {
"my_field": {
"properties": {
"fielddata": {
"type": "boolean"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
}
}
}
}
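The mapping response above lists mappings, properties and fielddata as ordinary document fields, which suggests the request body was indexed as a document (a POST to my_index/type) rather than applied as a mapping. Below is a minimal sketch, assuming Elasticsearch 6.x where the mappings object is keyed by the type name, of the body one would send with PUT my_index when creating a fresh index. The helper name create_index_body is made up for illustration.

```python
import json


def create_index_body(type_name, field, field_mapping):
    """Build a 6.x-style create-index body with one explicitly mapped field."""
    # In 6.x the "mappings" object is keyed by the type name.
    return {"mappings": {type_name: {"properties": {field: field_mapping}}}}


# Option preferred in the question: a plain keyword field.
keyword_body = create_index_body("type", "my_field", {"type": "keyword"})

# Alternative from the question: keep text and enable fielddata.
fielddata_body = create_index_body(
    "type", "my_field", {"type": "text", "fielddata": True}
)

print(json.dumps(keyword_body, indent=2))
print(json.dumps(fielddata_body, indent=2))
```

Either body has to be sent before any document containing my_field is indexed, because the type of an existing field cannot be changed from text to keyword in place.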

Custom indexing template is not being applied

I have a project where I am to analyze and visualize access log data. I use Logstash to send data to Elasticsearch and then visualize some stuff with Kibana.
Everything worked fine until I discovered that I needed the Path Hierarchy Analyzer to show what I want. I now have a custom template (JSON) and changed the output section of my Logstash configuration. But when I index data, my template is not applied.
(Version 5.2 of Elasticsearch and Logstash; I can't update, since that is the version in use where I work.)
My JSON file is valid. As far as the input and filters go, my Logstash configuration is fine, too. I guess I made a mistake in the output.
I already tried setting manage_template to false. I also tried template_overwrite => "false" just for the sake of it.
I tried creating the index first (Kibana Dev Tools) and populating it afterwards. I created the index template and then the index. That way my template was applied, and when I created the index pattern, everything seemed correct. Then I indexed one of my log files and ended up with a Courier Fetch Error. http://localhost:9200/_all/_mapping?pretty=1 showed me that while indexing my data a default template was being used instead of my custom one. Nothing was different from before adding a custom template.
I searched the web and read everything I could find on Stack Overflow and in the Elastic forum about custom templates not being applied. I tried all the solutions provided there, which is why I ended up opting for a custom template saved locally, with the path provided in my Logstash output. But I am all out of ideas now.
This is the output of my logstash configuration:
output {
elasticsearch {
hosts => ["localhost:9200"]
template => "/etc/logstash/conf.d/template.json"
index => "beam-%{+YYYY.MM.dd}"
manage_template => "true"
template_overwrite => "true"
document_type => "beamlogs"
}
stdout {
codec => rubydebug
}
}
And this is my custom template:
{
"template": "beam_custom",
"index_patterns": "beam-*",
"order" : 5,
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"custom_path_tree": {
"tokenizer": "custom_hierarchy"
},
"custom_path_tree_reversed": {
"tokenizer": "custom_hierarchy_reversed"
}
},
"tokenizer": {
"custom_hierarchy": {
"type": "path_hierarchy",
"delimiter": "/"
},
"custom_hierarchy_reversed": {
"type": "path_hierarchy",
"delimiter": "/",
"reverse": "true"
}
}
}
},
"mappings": {
"beamlogs": {
"properties": {
"object": {
"type": "text",
"fields": {
"tree": {
"type": "text",
"analyzer": "custom_path_tree"
},
"tree_reversed": {
"type": "text",
"analyzer": "custom_path_tree_reversed"
}
}
},
"referral": {
"type": "text",
"fields": {
"tree": {
"type": "text",
"analyzer": "custom_path_tree"
},
"tree_reversed": {
"type": "text",
"analyzer": "custom_path_tree_reversed"
}
}
},
"@timestamp" : {
"type" : "date"
},
"action" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"datetime" : {
"type" : "date",
"format": "time_no_millis",
"fields" : {
"keyword" : {
"type": "keyword"
}
}
},
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"info" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"page" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"path" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"result" : {
"type" : "long"
},
"s_direct" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"s_limit" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"s_mobile" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"s_terms" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"size" : {
"type" : "long"
},
"sort" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
}
}
}
}
}
After indexing my data this is part of what I get with http://localhost:9200/_all/_mapping?pretty=1
"datetime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"object" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
datetime should not have the type text. But worse than that, fields like object.tree are not even created.
I really don't care about the wrong mapping for datetime, but I need to get the Path Hierarchy Analyzer to work. I just don't know what to do anymore.
So. What I just tried was creating the index template in Kibana.
PUT _template/beam_custom
/followed by what is in my template.json
I then checked if the template was created.
GET _template/beam_custom
The output was this:
{
"beam_custom": {
"order": 100,
"template": "beam_custom",
"settings": {
"index": {
"analysis": {
"analyzer": {
"custom_path_tree_reversed": {
"tokenizer": "custom_hierarchy_reversed"
},
"custom_path_tree": {
"tokenizer": "custom_hierarchy"
}
},
"tokenizer": {
"custom_hierarchy": {
"type": "path_hierarchy",
"delimiter": "/"
},
...
So I guess creating the template worked.
Then I created an index
PUT beam-2019.07.15
But when I checked the index, I got this:
{
"beam-2019.07.15": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"creation_date": "1563044670605",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "rGzplctSQDmrI_NSlt47hQ",
"version": {
"created": "5061699"
},
"provided_name": "beam-2019.07.15"
}
}
}
}
Shouldn't the index pattern have been recognized? I think this is the heart of the problem. I thought that my template would have been used and the output should have been something like this instead:
{
"beam-2019.07.15": {
"aliases": {},
"mappings": {
"logs": {
"properties": {
"@timestamp": {
"type": "date"
},
"action": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},...
Why doesn't it recognize the pattern?
So, I found the mistake.
When I looked up how to build my own template, at some point I looked at the documentation for the current version. But in 5.2, "index_patterns" doesn't exist (it was introduced in 6.0, where it replaced "template").
"template": "beam_custom",
"index_patterns": "beam-*",
This doesn't work then, of course.
Instead, I dropped the "index_patterns" line and defined my pattern in the template-parameter.
"template": ["beam-*"],
//rest
This fixed the problem. After that, my pattern was recognized.
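For reference, here is a trimmed sketch of a 5.x-style template body, assuming the 5.x format in which "template" carries the index pattern as a single string. The helper legacy_template is made up, and fnmatchcase only approximates how Elasticsearch matches the pattern against index names.

```python
import json
from fnmatch import fnmatchcase


def legacy_template(pattern, order, settings, mappings):
    """Build a 5.x-style index template body (pattern lives in "template")."""
    return {
        "template": pattern,
        "order": order,
        "settings": settings,
        "mappings": mappings,
    }


body = legacy_template(
    "beam-*",
    5,
    {"number_of_shards": 1},
    {"beamlogs": {"properties": {"size": {"type": "long"}}}},
)
print(json.dumps(body, indent=2))

# Rough check of which pattern would cover a daily index name.
index_name = "beam-2019.07.15"
print(fnmatchcase(index_name, "beam-*"))       # pattern from the fix
print(fnmatchcase(index_name, "beam_custom"))  # value from the original template
```

The second check illustrates why the original template never applied: with "template" set to beam_custom, no beam-* index name matches it.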
Yet I am facing a different problem now. The Path Hierarchy Analyzer is not working properly. object.tree and the rest of the fields I want are not being created.
GET beam-*/_search
{
"query": {
"term": {
"object.tree": "/belletristik/"
}
}
}
yields nothing, though I should have a few hundred hits. Looking at my data, there are no analyzed fields for my paths. Any ideas?
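One thing that may be worth checking here: a term query is not analyzed, so its value has to equal an indexed token exactly. The sketch below is a rough pure-Python imitation of what a path_hierarchy tokenizer emits with default, non-reversed settings. It is not the Lucene implementation, and the sample path is made up.

```python
def path_hierarchy_tokens(text, delimiter="/"):
    """Imitate forward path_hierarchy tokenization for simple inputs."""
    tokens = []
    for i, ch in enumerate(text):
        # Every delimiter after position 0 closes one prefix token.
        if ch == delimiter and i > 0:
            tokens.append(text[:i])
    if text:
        tokens.append(text)  # the full value is always the last token
    return tokens


# Hypothetical value of the "object" field.
tokens = path_hierarchy_tokens("/belletristik/krimi/titel.html")
print(tokens)
print("/belletristik/" in tokens)  # term used in the query above
print("/belletristik" in tokens)   # same term without the trailing slash
```

Under this imitation the prefix tokens carry no trailing delimiter, so a term of "/belletristik" may match where "/belletristik/" does not. Also note that multi-fields such as object.tree never appear in _source; they only show up in the index mapping.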

Multiple document types with same mapping in Elasticsearch

I have an index named test which can be associated with n document types named sub_text_1 to sub_text_n, all of which will have the same mapping.
Is there any way to set up an index so that all document types share the same mapping? I.e. test/sub_text1/_mapping should be the same as test/sub_text2/_mapping.
Otherwise if I have like 1000 document types, I will be having 1000 mappings of the same type referring to each document type.
UPDATE:
PUT /test_index/
{
"settings": {
"index.store.type": "default",
"index": {
"number_of_shards": 5,
"number_of_replicas": 1,
"refresh_interval": "60s"
},
"analysis": {
"filter": {
"porter_stemmer_en_EN": {
"type": "stemmer",
"name": "porter"
},
"default_stop_name_en_EN": {
"type": "stop",
"name": "_english_"
},
"snowball_stop_words_en_EN": {
"type": "stop",
"stopwords_path": "snowball.stop"
},
"smart_stop_words_en_EN": {
"type": "stop",
"stopwords_path": "smart.stop"
},
"shingle_filter_en_EN": {
"type": "shingle",
"min_shingle_size": "2",
"max_shingle_size": "2",
"output_unigrams": true
}
}
}
}
}
Intended mapping:
{
"sub_text" : {
"properties" : {
"_id" : {
"include_in_all" : false,
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"alternate_id" : {
"include_in_all" : false,
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"text" : {
"type" : "multi_field",
"fields" : {
"text" : {
"type" : "string",
"store" : true,
"index" : "analyzed"
},
"pdf": {
"type" : "attachment",
"fields" : {
"pdf" : {
"type" : "string",
"store" : true,
"index" : "analyzed"
}
}
}
}
}
}
}
}
I want this mapping to be an individual mapping for each sub_text I create, so that I can change it for one sub_text without affecting the others. E.g. I may want to add two custom analyzers to sub_text1 and three analyzers to sub_text3, while the rest stay the same.
UPDATE:
PUT /my-index/document_set/_mapping
{
"properties": {
"type": {
"type": "string",
"index": "not_analyzed"
},
"doc_id": {
"type": "string",
"index": "not_analyzed"
},
"plain_text": {
"type": "string",
"store": true,
"index": "analyzed"
},
"pdf_text": {
"type": "attachment",
"fields": {
"pdf_text": {
"type": "string",
"store": true,
"index": "analyzed"
}
}
}
}
}
POST /my-index/document_set/1
{
"type": "d1",
"doc_id": "1",
"plain_text": "simple text for doc1."
}
POST /my-index/document_set/2
{
"type": "d1",
"doc_id": "2",
"pdf_text": "cGRmIHRleHQgaXMgaGVyZS4="
}
POST /my-index/document_set/3
{
"type": "d2",
"doc_id": "3",
"plain_text": "simple text for doc3 in d2."
}
POST /my-index/document_set/4
{
"type": "d2",
"doc_id": "4",
"pdf_text": "cGRmIHRleHQgaXMgaGVyZSBpbiBkMi4="
}
GET /my-index/document_set/_search
{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"type" : "d1"
}
}
}
}
}
This gives me the documents related to type "d1". How can I add analyzers only to documents of type "d1"?
At the moment a possible solution is to use index templates or dynamic mapping. However, they do not allow wildcard type matching, so you would have to use the _default_ root type to apply the mappings to all types in the index. It would then be up to you to ensure that all your types fit the same dynamic mapping. This template example may work for you:
curl -XPUT localhost:9200/_template/template_1 -d '
{
"template" : "test",
"mappings" : {
"_default_" : {
"dynamic": true,
"properties": {
"field1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'
Do not do this.
Otherwise if I have like 1000 document types, I will be having 1000 mappings of the same type referring to each document type.
You're exactly right. For every additional _type with an identical mapping you are needlessly adding to the size of your index's mapping. They will not be merged, nor will any compression save you.
A much better solution is to simply create a shared _type and to create a field that represents the intended type. This completely avoids having wasted mappings and all of the negatives associated with it, including an unnecessary increase for your cluster state's size.
From there, you can imitate what Elasticsearch is doing for you and filter on your custom type without ballooning your mappings.
$ curl -XPUT localhost:9200/my-index -d '{
"mappings" : {
"my-type" : {
"properties" : {
"type" : {
"type" : "string",
"index" : "not_analyzed"
},
# ... whatever other mappings exist ...
}
}
}
}'
Then, for any search against sub_text1 (etc.), you can use a term (for one) or terms (for more than one) filter to imitate the _type filter that would otherwise be applied for you.
$ curl -XGET localhost:9200/my-index/my-type/_search -d '{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"type" : "sub_text1"
}
}
}
}
}'
This does the same thing as the _type filter. You can also create aliases that contain the filter if you want higher-level search capability without exposing the filtering logic to clients.
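The filtered aliases mentioned above can be sketched as follows. This builds the body for a POST to /_aliases with one alias per logical type, each carrying a term filter on the custom type field. The helper name filtered_alias_actions and the choice to name each alias after its type value are assumptions for illustration.

```python
import json


def filtered_alias_actions(index, type_values, field="type"):
    """Build an _aliases request body with one filtered alias per type value."""
    actions = []
    for value in type_values:
        actions.append(
            {
                "add": {
                    "index": index,
                    "alias": value,  # alias named after the logical type
                    "filter": {"term": {field: value}},
                }
            }
        )
    return {"actions": actions}


alias_body = filtered_alias_actions("my-index", ["sub_text1", "sub_text2"])
print(json.dumps(alias_body, indent=2))  # body for POST /_aliases
```

A client can then search the alias sub_text1 directly and only see documents whose type field equals sub_text1.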

Elasticsearch: multiple languages in two fields when the query's language is unknown or mixed

I am new to Elasticsearch, and I am not sure how to proceed in my situation.
I have the following mapping:
{
"mappings": {
"book": {
"properties": {
"title": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
},
"keyword": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
}
}
}
}
}
A sample document may have two languages for the same field of the same book. Here are two example documents:
{
"title" : {
"en": "hello",
"ar": "مرحبا"
},
"keyword" : {
"en": "world",
"ar": "عالم"
}
}
{
"title" : {
"en": "Elasticsearch"
},
"keyword" : {
"en": "full-text index"
}
}
When I know which language is used in the query, I am able to build the query as follows (when English is used):
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en" ]
}
}
Based on my current document mapping, how can I build a query if
the query language is unknown or
is mixed with English and Arabic?
Thanks for any input!
Regards.
p.s. I am also open to any improvement to the above mapping.
the query language is unknown
You can use the same multi_match query, but across all the fields. For example,
assuming you are using the keyword analyzer:
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en", "title.ar", "keyword.ar" ]
}
}
is mixed with English and Arabic
You need to change the analyzer to standard and then you can perform the same query.
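The all-fields query above can be generated rather than written by hand, which helps when more languages are added later. This is a minimal sketch; the helper name all_language_query and its default arguments are made up and simply mirror the mapping in the question.

```python
import json


def all_language_query(text, base_fields=("title", "keyword"), languages=("en", "ar")):
    """Build a multi_match query over every <field>.<language> sub-field."""
    fields = [f"{base}.{lang}" for lang in languages for base in base_fields]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


search_body = all_language_query("keywords")
print(json.dumps(search_body, indent=2, ensure_ascii=False))
```

Because multi_match analyzes the query text separately for each field, title.en is searched with the english analyzer and title.ar with the arabic one, as long as the mapping keeps those analyzers.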
Thanks
