I am trying to utilize ElasticSearch to store large sets of data. Most of the data will be searchable, however, there are some field that will be there just so the data is stored and returned upon request.
Here is my mapping
{
"mappings": {
"properties": {
"amenities": {
"type": "completion"
},
"summary": {
"type": "text"
},
"street_number": {
"type": "text"
},
"street_name": {
"type": "text"
},
"street_suffix": {
"type": "text"
},
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"state_or_province": {
"type": "text"
},
"postal_code": {
"type": "text"
},
"mlsid": {
"type": "text"
},
"source_id": {
"type": "text"
},
"status": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"subtype": {
"type": "keyword"
},
"year_built": {
"type": "short"
},
"community": {
"type": "keyword"
},
"elementary_school": {
"type": "keyword"
},
"middle_school": {
"type": "keyword"
},
"jr_high_school": {
"type": "keyword"
},
"high_school": {
"type": "keyword"
},
"area_size": {
"type": "double"
},
"lot_size": {
"type": "double"
},
"bathrooms": {
"type": "double"
},
"bedrooms": {
"type": "double"
},
"listed_at": {
"type": "date"
},
"price": {
"type": "double"
},
"sold_at": {
"type": "date"
},
"sold_for": {
"type": "double"
},
"total_photos": {
"type": "short"
},
"formatted_addressLine": {
"type": "text"
},
"formatted_address": {
"type": "text"
},
"location": {
"type": "geo_point"
},
"price_changes": {
"type": "object"
},
"fields": {
"type": "object"
},
"deleted_at": {
"type": "date"
},
"is_available": {
"type": "boolean"
},
"is_unable_to_find_coordinates": {
"type": "boolean"
},
"source": {
"type": "keyword"
}
}
}
}
The fields and price_changes properties are there in case the user want to read that info. But that info should not be searchable or indexed. The fields holds a large list of key-value pairs whereas price_changes fields hold multiple objects of the same type.
Currently, when I attempt to bulk create records, I get Limit of total fields [1000] has been exceeded error. I am guessing this error is happening because every key-value pair in the fields collection is considered a field in elasticsearch.
How can I store the fields and the price_changes object as non-searchable data and not index it or count it toward the fields count?
You could use the enabled property at field level to store the fields without indexing them.
Read here https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
"price_changes": {
"type": "object",
"enabled": false
}
NOTE: Are you able to create an index using the mapping you gave in the question? It gives me syntax errors(Duplicate key) at "type" field. I think you are missing a closing bracket for "city" field.
Related
I would like to load data from BigQuery to ElasticSearch using these custom mappings.
{
"properties": {
"#timestamp": {
"type": "date"
},
"affordable": {
"type": "boolean"
},
"building_type": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"category_1_name": {
"type": "keyword"
},
"category_2_name": {
"type": "keyword"
},
"category_3_name": {
"type": "keyword"
},
"company_id": {
"type": "long"
}
}
I'm not sure what is the equivalent type for "keyword" that i can use in BigQuery. Appreciate your help! Thanks
I am currently trying to update an index template on Elastic Search 6.7/6.8.
Templates are stored in the code and are applied each time my API starts.
There are no errors, the request returns 200.
For example, here is a template i am currently using:
{
"index_patterns": [ "*-ec2-reports" ],
"version": 11,
"mappings": {
"ec2-report": {
"properties": {
"account": {
"type": "keyword"
},
"reportDate": {
"type": "date"
},
"reportType": {
"type": "keyword"
},
"instance": {
"properties": {
"id": {
"type": "keyword"
},
"region": {
"type": "keyword"
},
"state": {
"type": "keyword"
},
"purchasing": {
"type": "keyword"
},
"keyPair": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"platform": {
"type": "keyword"
},
"tags": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"costs": {
"type": "object"
},
"stats": {
"type": "object",
"properties": {
"cpu": {
"type": "object",
"properties": {
"average": {
"type": "double"
},
"peak": {
"type": "double"
}
}
},
"network": {
"type": "object",
"properties": {
"in": {
"type": "double"
},
"out": {
"type": "double"
}
}
},
"volumes": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"read": {
"type": "double"
},
"write": {
"type": "double"
}
}
}
}
},
"recommendation": {
"type": "object",
"properties": {
"instancetype": {
"type": "keyword"
},
"reason": {
"type": "keyword"
},
"newgeneration": {
"type": "keyword"
}
}
}
}
}
},
"_all": {
"enabled": false
},
"numeric_detection": false,
"date_detection": false
}
}
}
I'd like to add a new keyword field under the properties object like this :
"exampleField": {
"type": "keyword"
}
but it seems the template is not applied to existing indexes.
When data is inserted into a specific index which use the template, it is stored like this:
"exampleField": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
because the template has not been updated beforehand.
I would expect it to be like:
"exampleField": {
"type": "keyword"
}
in the index and in the template.
Does someone have any idea on how to have this result?
Thank you, Alexandre.
I just deployed a small application that loads a few thousand docs into an index and when working with production data i get an error in my search request.
http code is 400 and the error is
{
"error": {
"root_cause": [
{
"type": "number_format_exception",
"reason": "empty String"
}
],
"type": "number_format_exception",
"reason": "empty String"
},
"status": 400
}
Okay i kind of get it that my mapping defines some numeric field which i oviously dont store correctly, but how am i supposed to find that field?
each doc contains hundereds of fields.... i mean, really?
I tried looking in /var/log/elasticsearch but nothing useful there...
Please help me i need to get the thing going
I defined my fields as integer which should hold arrays and might be empty. Could that be a problem?
My ES Version is 6.6.0
Update:
The error does occur while searching, during index all is fine
My mapping for that index:
{
"development-object-1551202425": {
"mappings": {
"_doc": {
"dynamic": "false",
"properties": {
"accommodation": {
"properties": {
"badges": {
"properties": {
"maskedProp1": {
"type": "boolean"
},
"maskedProp2": {
"type": "boolean"
},
"maskedProp3": {
"type": "boolean"
},
"maskedProp4": {
"type": "boolean"
},
"maskedProp5": {
"type": "boolean"
},
"maskedProp6": {
"type": "boolean"
}
}
},
"businessTypes": {
"type": "integer"
},
"classification": {
"properties": {
"classification": {
"type": "keyword"
},
"classificationValue": {
"type": "short"
}
}
},
"endowments": {
"type": "integer"
},
"hasPrice": {
"type": "boolean"
},
"lowestPrice": {
"type": "float"
},
"metascore": {
"type": "short"
},
"rating": {
"type": "short"
},
"regionscore": {
"type": "short"
}
}
},
"certificates": {
"type": "integer"
},
"geoLocation": {
"type": "geo_point"
},
"id": {
"type": "text"
},
"isAccommodation": {
"type": "boolean"
},
"location": {
"properties": {
"maskedProp1": {
"type": "integer"
},
"maskedProp2": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fielddata": true
},
"zipcodes": {
"type": "integer"
}
}
},
"maskedProp1": {
"type": "integer"
},
"maskedProp2": {
"type": "integer"
},
"description": {
"type": "text"
},
"sortTitle": {
"type": "keyword"
},
"title": {
"type": "text"
}
}
}
}
}
}
The name consists of an environment string (development) and a timestamp appended (i work with automatic index switching and query for the alias, that does is called {env}-{name}-current.
In my case the error was an empty "size" parameter in the query, i tried to find the error in my filters and did not see that...
A more verbose error message (at least at what property or setting the error occured) could save thousands of hours of debugging all around the world i guess xD.
For now you would have to take apart your dsl section by section to find the issue.
env: ElasticSearch 5.5.1
First there are two indexs in my elasticsearch
and the only different of two index is the message field, the field's type of message in index1 is keyword, and in index2 is text.
To ensure that it is not affected by other fields,I remove the message field and compare before and after result:
Before remove message field:
after remove message field i got:
Obvious the message field takes up a lot of space,and the type of keyword take up much more than text,but I don't know why keyword take up much more size than text?
so, is there anyone help me ?
Following is the index of index1's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and settings info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "3",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index1",
"creation_date": "1531389785215",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "zd8oVbwUQbys1UJ8hJZRmQ",
"version": {
"created": "5050099"
}
}
}
Following is the index of index2's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"CommunicationReturnCode": {
"type": "keyword"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"action": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "text"
},
"msgid": {
"type": "keyword"
},
"msgname": {
"type": "keyword"
},
"nodetype": {
"type": "keyword"
},
"orgid": {
"type": "keyword"
},
"orgname": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"processingState": {
"type": "keyword"
},
"processingStatecode": {
"type": "keyword"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"thread": {
"type": "keyword"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"transDescription": {
"type": "keyword"
},
"transactionErrorCode": {
"type": "keyword"
},
"transactionTimeConsuming": {
"type": "keyword"
},
"transcode": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and setting info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "2",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index2",
"creation_date": "1531467294314",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "yROU2MrMTzip4VXH_zWEXQ",
"version": {
"created": "5050099"
}
}
}
Following are one of the index's file structure of the two shards about the text type field:
and the keyword type field:
And you can believe that there are same number of documents in two folder, and the only difference of the field is the type of message field.
Could you explain it?
Thank you so much!
In Elasticsearch keyword fields have doc_values enabled by default, while text fields does not. This means that on your keyword fields it will store the whole field in a column-oriented fashion, in order to be able to perform aggregations or sorting, without relying on fielddata.
Also, Once you tokenize a string, with stemming, lowercasing, etc, you can achieve much better compression.
You can try to disable doc_values on that field if you don't perform aggregations or sorting on it.
I tried to use the following mapping to index my data:
{
"mappings": {
"chow-demo": {
"properties": {
"#fields": {
"dynamic": "true",
"properties": {
"asgid": {
"type": "string",
"analyzer": "keyword"
},
"asid": {
"type": "long"
},
"astid": {
"type": "long"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"httpcode": {
"type": "long"
},
"oid": {
"type": "string"
},
"onid": {
"type": "long"
},
"ptrnr": {
"analyzer": "keyword",
"type": "string"
},
"pguid": {
"analyzer": "keyword",
"type": "string"
},
"ptid": {
"type": "long"
},
"sid": {
"type": "long"
},
"src_url": {
"analyzer": "keyword",
"type": "string"
},
"title": {
"analyzer": "keyword",
"type": "string"
},
"ts": {
"type": "long"
}
}
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"#message": {
"type": "string"
},
"#source": {
"type": "string"
},
"#type": {
"analyzer": "keyword",
"type": "string"
},
"#tags": {
"type": "string"
},
"#source_host": {
"type": "string"
},
"#source_path": {
"type": "string"
}
}
},
"chow-clfg": {
"_parent": {
"type": "chow-demo"
},
"dynamic": "true",
"properties": {
"_ttl": {
"enabled": true,
"default": "1h"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"count": {
"type": "long"
}
}
}
}
}
I tried to populate the parent type "chow-demo" without populating the child type "chow-clfg", and the document refused to index. (No documents were indexed into Elasticsearach)
When I take out the child mapping for "chow-clfg", it does indexing properly as usual. Hence I have the following question:
Is my mapping structure wrong?
Must the parent and child be indexed together at the same time before the data can be successfully indexed?
Really need help in this question for my project to progress! Thanks!
Yes, your mapping is wrong. The _ttl element should be one level higher in the chow-clfg type. In other words _ttl should be on the same level as _parent. However, I am not quite sure how this problem can affect your ability to index.
Parents and children don't have to be indexed together.