How to build a parent/child mapping for Elasticsearch?

I tried to use the following mapping to index my data:
{
"mappings": {
"chow-demo": {
"properties": {
"#fields": {
"dynamic": "true",
"properties": {
"asgid": {
"type": "string",
"analyzer": "keyword"
},
"asid": {
"type": "long"
},
"astid": {
"type": "long"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"httpcode": {
"type": "long"
},
"oid": {
"type": "string"
},
"onid": {
"type": "long"
},
"ptrnr": {
"analyzer": "keyword",
"type": "string"
},
"pguid": {
"analyzer": "keyword",
"type": "string"
},
"ptid": {
"type": "long"
},
"sid": {
"type": "long"
},
"src_url": {
"analyzer": "keyword",
"type": "string"
},
"title": {
"analyzer": "keyword",
"type": "string"
},
"ts": {
"type": "long"
}
}
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"#message": {
"type": "string"
},
"#source": {
"type": "string"
},
"#type": {
"analyzer": "keyword",
"type": "string"
},
"#tags": {
"type": "string"
},
"#source_host": {
"type": "string"
},
"#source_path": {
"type": "string"
}
}
},
"chow-clfg": {
"_parent": {
"type": "chow-demo"
},
"dynamic": "true",
"properties": {
"_ttl": {
"enabled": true,
"default": "1h"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"count": {
"type": "long"
}
}
}
}
}
I tried to populate the parent type "chow-demo" without populating the child type "chow-clfg", and the documents were refused. (No documents were indexed into Elasticsearch.)
When I remove the child mapping for "chow-clfg", indexing works as usual. Hence I have the following questions:
Is my mapping structure wrong?
Must the parent and child be indexed together before the data can be successfully indexed?
I really need help with this question for my project to progress! Thanks!

Yes, your mapping is wrong. The _ttl element should be one level higher in the chow-clfg type. In other words _ttl should be on the same level as _parent. However, I am not quite sure how this problem can affect your ability to index.
Parents and children don't have to be indexed together.
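The fix described in this answer can be sketched as follows: _ttl moves out of properties and sits next to _parent in the child type. This is only a sketch of the "chow-clfg" type, and it assumes an Elasticsearch version that still supports _ttl and the string type (both were dropped in later major versions); the parent type "chow-demo" is unchanged and omitted here.

```json
{
  "chow-clfg": {
    "_parent": { "type": "chow-demo" },
    "_ttl": { "enabled": true, "default": "1h" },
    "dynamic": "true",
    "properties": {
      "clfg": { "type": "string", "analyzer": "keyword" },
      "#timestamp": { "type": "date", "format": "dateOptionalTime" },
      "count": { "type": "long" }
    }
  }
}
```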

Related

What is the equivalent type ElasticSearch "keyword" in Bigquery?

I would like to load data from BigQuery to ElasticSearch using these custom mappings.
{
"properties": {
"#timestamp": {
"type": "date"
},
"affordable": {
"type": "boolean"
},
"building_type": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"category_1_name": {
"type": "keyword"
},
"category_2_name": {
"type": "keyword"
},
"category_3_name": {
"type": "keyword"
},
"company_id": {
"type": "long"
}
}
}
I'm not sure what the equivalent type for "keyword" is in BigQuery. I'd appreciate your help! Thanks

How to exclude fields from being indexed in ElasticSearch?

I am trying to use Elasticsearch to store large sets of data. Most of the data will be searchable; however, some fields will be there only so that the data is stored and returned upon request.
Here is my mapping
{
"mappings": {
"properties": {
"amenities": {
"type": "completion"
},
"summary": {
"type": "text"
},
"street_number": {
"type": "text"
},
"street_name": {
"type": "text"
},
"street_suffix": {
"type": "text"
},
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"state_or_province": {
"type": "text"
},
"postal_code": {
"type": "text"
},
"mlsid": {
"type": "text"
},
"source_id": {
"type": "text"
},
"status": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"subtype": {
"type": "keyword"
},
"year_built": {
"type": "short"
},
"community": {
"type": "keyword"
},
"elementary_school": {
"type": "keyword"
},
"middle_school": {
"type": "keyword"
},
"jr_high_school": {
"type": "keyword"
},
"high_school": {
"type": "keyword"
},
"area_size": {
"type": "double"
},
"lot_size": {
"type": "double"
},
"bathrooms": {
"type": "double"
},
"bedrooms": {
"type": "double"
},
"listed_at": {
"type": "date"
},
"price": {
"type": "double"
},
"sold_at": {
"type": "date"
},
"sold_for": {
"type": "double"
},
"total_photos": {
"type": "short"
},
"formatted_addressLine": {
"type": "text"
},
"formatted_address": {
"type": "text"
},
"location": {
"type": "geo_point"
},
"price_changes": {
"type": "object"
},
"fields": {
"type": "object"
},
"deleted_at": {
"type": "date"
},
"is_available": {
"type": "boolean"
},
"is_unable_to_find_coordinates": {
"type": "boolean"
},
"source": {
"type": "keyword"
}
}
}
}
The fields and price_changes properties are there in case the user wants to read that info, but that info should not be searchable or indexed. The fields property holds a large list of key-value pairs, whereas price_changes holds multiple objects of the same type.
Currently, when I attempt to bulk create records, I get the error "Limit of total fields [1000] has been exceeded". I am guessing this happens because every key-value pair in the fields collection is counted as a field in Elasticsearch.
How can I store the fields and price_changes objects as non-searchable data, so that they are neither indexed nor counted toward the field limit?
You could use the enabled property at field level to store the fields without indexing them.
Read here https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
"price_changes": {
"type": "object",
"enabled": false
}
NOTE: Are you able to create an index using the mapping you gave in the question? It gives me syntax errors (duplicate key) at the "type" field. I think you are missing a closing brace for the "city" field.
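As a minimal sketch, the same setting can be applied to both objects from the question. The other fields are omitted for brevity. With enabled set to false, the object's contents are kept in _source but are not parsed, so their keys should not be added to the mapping.

```json
{
  "mappings": {
    "properties": {
      "price_changes": {
        "type": "object",
        "enabled": false
      },
      "fields": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```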

Unable to apply new index template

I am currently trying to update an index template on Elasticsearch 6.7/6.8.
Templates are stored in the code and are applied each time my API starts.
There are no errors; the request returns 200.
For example, here is a template I am currently using:
{
"index_patterns": [ "*-ec2-reports" ],
"version": 11,
"mappings": {
"ec2-report": {
"properties": {
"account": {
"type": "keyword"
},
"reportDate": {
"type": "date"
},
"reportType": {
"type": "keyword"
},
"instance": {
"properties": {
"id": {
"type": "keyword"
},
"region": {
"type": "keyword"
},
"state": {
"type": "keyword"
},
"purchasing": {
"type": "keyword"
},
"keyPair": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"platform": {
"type": "keyword"
},
"tags": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"costs": {
"type": "object"
},
"stats": {
"type": "object",
"properties": {
"cpu": {
"type": "object",
"properties": {
"average": {
"type": "double"
},
"peak": {
"type": "double"
}
}
},
"network": {
"type": "object",
"properties": {
"in": {
"type": "double"
},
"out": {
"type": "double"
}
}
},
"volumes": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"read": {
"type": "double"
},
"write": {
"type": "double"
}
}
}
}
},
"recommendation": {
"type": "object",
"properties": {
"instancetype": {
"type": "keyword"
},
"reason": {
"type": "keyword"
},
"newgeneration": {
"type": "keyword"
}
}
}
}
}
},
"_all": {
"enabled": false
},
"numeric_detection": false,
"date_detection": false
}
}
}
I'd like to add a new keyword field under the properties object like this:
"exampleField": {
"type": "keyword"
}
but it seems the template is not applied to existing indices.
When data is inserted into a specific index that uses the template, the field is mapped like this:
"exampleField": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
because the template has not been updated beforehand.
I would expect it to be like:
"exampleField": {
"type": "keyword"
}
in the index and in the template.
Does someone have any idea on how to have this result?
Thank you, Alexandre.

Why does the keyword type take up much more space than text in elasticsearch?

env: Elasticsearch 5.5.1
There are two indices in my Elasticsearch cluster,
and the only difference between them is the message field: its type is keyword in index1 and text in index2.
To make sure the result is not affected by other fields, I removed the message field and compared the sizes before and after:
Before removing the message field:
After removing the message field:
Obviously the message field takes up a lot of space, and the keyword type takes up much more than text, but I don't know why keyword takes up so much more space than text.
Can anyone help me?
The following is index1's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and settings info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "3",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index1",
"creation_date": "1531389785215",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "zd8oVbwUQbys1UJ8hJZRmQ",
"version": {
"created": "5050099"
}
}
}
The following is index2's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"CommunicationReturnCode": {
"type": "keyword"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"action": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "text"
},
"msgid": {
"type": "keyword"
},
"msgname": {
"type": "keyword"
},
"nodetype": {
"type": "keyword"
},
"orgid": {
"type": "keyword"
},
"orgname": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"processingState": {
"type": "keyword"
},
"processingStatecode": {
"type": "keyword"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"thread": {
"type": "keyword"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"transDescription": {
"type": "keyword"
},
"transactionErrorCode": {
"type": "keyword"
},
"transactionTimeConsuming": {
"type": "keyword"
},
"transcode": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and settings info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "2",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index2",
"creation_date": "1531467294314",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "yROU2MrMTzip4VXH_zWEXQ",
"version": {
"created": "5050099"
}
}
}
The following is the file structure of one of the shards for the index with the text type field:
and for the index with the keyword type field:
You can assume that both folders contain the same number of documents, and that the only difference is the type of the message field.
Could you explain it?
Thank you so much!
In Elasticsearch, keyword fields have doc_values enabled by default, while text fields do not. This means that for your keyword field the whole value is also stored in a column-oriented fashion, so that aggregations and sorting can be performed without relying on fielddata.
Also, once you tokenize a string (with stemming, lowercasing, etc.), you can achieve much better compression.
You can try disabling doc_values on that field if you don't perform aggregations or sorting on it.
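As a minimal sketch of that last suggestion, the message field in index1 could be mapped as shown here. This assumes that nothing aggregates or sorts on message. As far as I know, doc_values cannot be changed on an existing field, so this would mean creating a new index with this mapping and reindexing into it.

```json
{
  "mappings": {
    "system": {
      "properties": {
        "message": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}
```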

Upgrading to Elasticsearch 5.2

I have the following legacy mapping code that works in ES 1.7 but fails in 5.2. The things that fail are multi_field and path, which are no longer supported. The documentation mentions that these were removed but fails to provide a remedy beyond suggesting to use copy_to. Can someone give a bit more detail on that?
{
"sample": {
"_parent": {
"type": "security"
},
"properties": {
"securityDocumentId": {
"type": "string",
"index": "not_analyzed",
"include_in_all": false
},
"id": {
"type": "multi_field",
"path": "full",
"fields": {
"indexer_sample_id": {
"type": "string"
},
"id": {
"type": "string",
"include_in_all": false
}
}
},
"sampleid": {
"type": "multi_field",
"path": "just_name",
"fields": {
"sampleid": {
"type": "string",
"analyzer": "my_analyzer"
},
"sample.sampleid": {
"type": "string",
"analyzer": "my_analyzer"
},
"sample.sampleid.sort": {
"type": "string",
"analyzer": "case_insensitive_sort_analyzer"
},
"sample.sampleid.name.autocomplete": {
"type": "string",
"analyzer": "autocomplete"
}
}
},
The path option's default value was full, so you can leave it out, since it was deprecated in 2.0. The path value just_name doesn't exist anymore and you MUST reference all your fields by their full path name. The multi-fields can be rewritten very simply:
{
"sample": {
"_parent": {
"type": "security"
},
"properties": {
"securityDocumentId": {
"type": "keyword",
"include_in_all": false
},
"id": {
"type": "text",
"fields": {
"indexer_sample_id": {
"type": "text"
},
"id": {
"type": "text",
"include_in_all": false
}
}
},
"sampleid": {
"type": "text",
"fields": {
"sampleid": {
"type": "text",
"analyzer": "my_analyzer"
},
"sample.sampleid": {
"type": "text",
"analyzer": "my_analyzer"
},
"sample.sampleid.sort": {
"type": "text",
"analyzer": "case_insensitive_sort_analyzer"
},
"sample.sampleid.name.autocomplete": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
Note that I'm not sure of the usefulness and added value of the id sub-fields.
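For illustration only, here is one way the sampleid multi-field might look with short sub-field names, which are then addressed by their full paths sampleid.sort and sampleid.autocomplete. The sub-field names sort and autocomplete are hypothetical replacements for the dotted names above, and the three analyzers are assumed to be defined in the index settings. The fielddata flag is my own assumption, on the understanding that 5.x does not allow sorting on a text field unless fielddata is enabled.

```json
{
  "sample": {
    "_parent": {
      "type": "security"
    },
    "properties": {
      "sampleid": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "sort": {
            "type": "text",
            "analyzer": "case_insensitive_sort_analyzer",
            "fielddata": true
          },
          "autocomplete": {
            "type": "text",
            "analyzer": "autocomplete"
          }
        }
      }
    }
  }
}
```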
