I write data using ES BulkProcessor(I tried python script, storm es-bolt, flink es-sink), but the index rate is so slow after create index mapping.
Situation 1: Leave all index settings as its default, index rate can reach about 10000+.
Situation 2: Just create index mapping, index rate fall to 3000.
I use the same data, same code, same machines.
result
flink es-sink write json data to es:
My data
repeat write the same data below(the message field is the raw log, it's about 7KB size, the delete some content for exceeding the question limit):
{
"_index": "nyc_flink_test997",
"_type": "doc",
"_id": "k8uS92cBOH4ugSIjCzmn",
"_score": 1,
"_source": {
"exception": "false",
"log_id": "8F71AF1606EE46BFA9D57AA2282D8596",
"offset": "2368",
"message_length": "2103",
"level": "INFO",
"source": "/opt/hadoop/elastic-stack/s_login/Gusermanager.usermanager.s_login.20.log",
"sessionid": "provider-60-2883b4bd3ff2b",
"associate_id": "33d081b83a0654a2",
"message": """
[16:41:33.376][I][ec4edfe0b2584b73]log start:53F9A1A1E71044E281755E930E1B004C
[16:41:33.376][T][ec4edfe0b2584b73]入参0=__REQ__
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2815)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2322)
at cn.com.agree.addal.cp.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:46)
at tc.bank.aesb.mbs.MBS_DBIMPL.PyDBGetSel(MBS_DBIMPL.java:1624)
at tc.bank.aesb.mbs.MBS_DBIMPL.PyDBExecOneSQL(MBS_DBIMPL.java:466)
at tc.bank.aesb.mbs.MBS_DBIMPL.PyDBExecGrpSQL(MBS_DBIMPL.java:123)
at tc.bank.aesb.mbs.B_MBS_DataBase.B_DBUnityRptOpr(B_MBS_DataBase.java:121)
at CUST.CustomerInfoQry.TCustomerInfoQry$Step1$Node4.execute(TCustomerInfoQry.java:200)
at CUST.CustomerInfoQry.TCustomerInfoQry$Step1.execute(TCustomerInfoQry.java:113)
at CUST.CustomerInfoQry.TCustomerInfoQry.execute(TCustomerInfoQry.java:76)
at cn.com.agree.afa.svc.javaengine.JavaEngine.execute(JavaEngine.java:237)
at cn.com.agree.afa.svc.handler.TradeHandler.handle(TradeHandler.java:62)
[16:41:33.414][I][ec4edfe0b2584b73]log end:53F9A1A1E71044E281755E930E1B004C
""",
"exec_ip": "10.88.188.167",
"start_time": "2018-12-09 16:46:14.764",
"group_v2": "Gusermanager",
"script_exec_time": "1",
"trade_exec_time": "2"
}
}
index mapping
{
"mappings": {
"doc":{
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"norms": false,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
],
"properties": {
"#timestamp": {
"type": "date"
},
"#version": {
"type": "keyword"
},
"geoip": {
"dynamic": true,
"properties": {
"ip": {
"type": "ip"
},
"location": {
"type": "geo_point"
},
"latitude": {
"type": "half_float"
},
"longitude": {
"type": "half_float"
}
}
},
"exception": {
"type": "boolean"
},
"message":{
"type":"text",
"norms": false,
"analyzer": "ik_max_word"
},
"associate_id": {
"type": "text",
"analyzer": "ik_max_word"
},
"end_time": {
"type": "date",
"format": "date_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis||HH:mm:ss.SSS"
},
"start_time": {
"type": "date",
"format": "date_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis||HH:mm:ss.SSS"
},
"exec_ip": {
"type": "ip"
},
"level": {
"type": "keyword"
},
"script_exec_time": {
"type": "long"
},
"trade_exec_time": {
"type": "long"
},
"sessionid": {
"type": "text",
"analyzer": "ik_max_word"
},
"log_id": {
"type": "text",
"analyzer": "ik_max_word"
},
"discard_time": {
"type": "long"
},
"scene_code": {
"type": "text",
"analyzer": "ik_max_word"
},
"service_code": {
"type": "text",
"analyzer": "ik_max_word"
},
"group": {
"type": "text"
},
"group_v2" :{
"type": "text",
"analyzer": "ik_max_word"
},
"message_length":{
"type": "long"
},
"log_filename":{
"type": "text",
"analyzer": "ik_max_word"
},
"ingest_time":{
"type": "date"
}
}
}
}
}
I tried writing with python scirpts, storm es-bolt, the result is same, index rate falls after create index mapping. Can anyone give some ideas about it. Thanks in advance.
Related
Fairly new to Elastic Search so may have to bare with me, I'm running into a problem where if I search for a document using 20 characters or less, the document appears, however any more characters within the same word within the query, I get no results:
Using 'phenoxymethylpenicillin' brings no documents.
Using 'phenoxymethylpenicil' brings back documents.
This is the query I'm trying to use:
{
"match_phrase": {
"genericNames.name": {
"query": "phenoxymethylpenicillin",
"slop": 15,
"zero_terms_query": "NONE",
"boost": 1.0
}
}
}
Here is the full query: https://pastebin.com/DEJvP2uS
Like I said, I'm fairly new to this, it may be a point of not looking in the correct area.
So my question is, what possible areas would cause this and why?
Thanks!
Edit:
Provided is an extract from one of the documents from the sample data. I can't show a lot of it due a lot of it being sensitive, luckily the names from sample data I can share. This is from the data I'm trying to search for:
"genericNames":[
{
"nameType":1,
"name":"Phenoxymethylpenicillin 250mg tablets",
"nameChangeCode":"0000",
"nameBasisCode":"0001",
"nameTypeDescription":"Name",
"startDate":"1948-01-01T00:00:00.000000+0000",
"endDate":"3456-02-01T00:00:00.000000+0000"
},
{
"nameType":5,
"name":"Penicillin V 250mg tablets",
"nameTypeDescription":"Alternative Name 3",
"startDate":"1948-01-01T00:00:00.000000+0000",
"endDate":"3456-02-01T00:00:00.000000+0000"
}
],
I have also provided the index mapping as it may provide extra information:
{
"amp": {
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ampId": {
"type": "long"
},
"amppId": {
"type": "long"
},
"attributes": {
"type": "nested",
"properties": {
"attributeQualifier": {
"type": "keyword"
},
"attributeType": {
"type": "integer"
},
"attributeTypeDescription": {
"type": "keyword"
},
"attributeValue": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"countryId": {
"type": "long"
},
"decodedValue": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"dictionaries": {
"type": "nested",
"properties": {
"abbreviation": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"description": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"dictId": {
"type": "integer"
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"excipients": {
"type": "nested",
"properties": {
"basisOfStrengthCode": {
"type": "keyword"
},
"bossId": {
"type": "long"
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"id": {
"type": "long"
},
"ingredientNames": {
"properties": {
"endDate": {
"type": "date"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"startDate": {
"type": "date"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"strengthDenominatorUnitOfMeasureCode": {
"type": "keyword"
},
"strengthDenominatorValue": {
"type": "keyword"
},
"strengthNumeratorUnitOfMeasureCode": {
"type": "keyword"
},
"strengthNumeratorValue": {
"type": "keyword"
},
"strengthVal": {
"type": "keyword"
},
"unitOfMeasure": {
"type": "keyword"
}
}
},
"extractableEntry": {
"type": "boolean"
},
"genericNames": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"name": {
"type": "text",
"ignore_above": 256,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"nameBasisCode": {
"type": "keyword"
},
"nameChangeCode": {
"type": "keyword"
},
"nameType": {
"type": "integer"
},
"nameTypeDescription": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"id": {
"type": "keyword"
},
"ingredients": {
"type": "nested",
"properties": {
"basisOfStrengthCode": {
"type": "keyword"
},
"bossId": {
"type": "long"
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"id": {
"type": "long"
},
"ingredientNames": {
"properties": {
"endDate": {
"type": "date"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"startDate": {
"type": "date"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"strengthDenominatorUnitOfMeasureCode": {
"type": "keyword"
},
"strengthDenominatorValue": {
"type": "keyword"
},
"strengthNumeratorUnitOfMeasureCode": {
"type": "keyword"
},
"strengthNumeratorValue": {
"type": "keyword"
},
"strengthVal": {
"type": "keyword"
},
"unitOfMeasure": {
"type": "keyword"
}
}
},
"invalidEntry": {
"type": "boolean"
},
"pitId": {
"type": "integer"
},
"ppaCodes": {
"type": "nested",
"properties": {
"code": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"proprietaryNames": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"name": {
"type": "text",
"ignore_above": 256,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"nameBasisCode": {
"type": "keyword"
},
"nameChangeCode": {
"type": "keyword"
},
"nameType": {
"type": "integer"
},
"nameTypeDescription": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"qpuUomCde": {
"type": "keyword"
},
"qpuVal": {
"type": "keyword"
},
"qtyUomCde": {
"type": "keyword"
},
"qtyVal": {
"type": "keyword"
},
"snomedCodes": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"ppaNextNo": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"snomed": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"snomedDescriptions": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"ppaNextNo": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"snomed": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"suppliers": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"id": {
"type": "long"
},
"names": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"nameBasisCode": {
"type": "keyword"
},
"nameChangeCode": {
"type": "keyword"
},
"nameType": {
"type": "integer"
},
"nameTypeDescription": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
},
"udfs": {
"type": "nested",
"properties": {
"ddIndicator": {
"type": "integer"
},
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"udfsUomCode": {
"type": "keyword"
},
"udfsValue": {
"type": "keyword"
},
"vmpUomCode": {
"type": "keyword"
}
}
},
"vmpId": {
"type": "long"
},
"vmppId": {
"type": "long"
},
"vtms": {
"type": "nested",
"properties": {
"endDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
},
"id": {
"type": "long"
},
"startDate": {
"type": "date",
"format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
}
}
}
}
}
}
}
Edit: Added link to full query - https://pastebin.com/DEJvP2uS
Edit: Settings for index:
{
"index": {
"max_ngram_diff": "20",
"analysis": {
"filter": {
"autocomplete_suffix_filter": {
"type": "ngram",
"min_gram": "1",
"max_gram": "20"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete_index": {
"filter": [
"lowercase",
"autocomplete_filter",
"autocomplete_suffix_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"autocomplete_search": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1"
}
}
This must be happening due to the custom analyzer which you have on your genericNames.name field, you have different custom analyzer, index time you are using the autocomplete_index and search time autocomplete_search analyzer, but the definition of these analyzers is not provided in the question, only mapping part is provided.
Please provide the output of _setting API on your index, refer https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more info.
You need to check the tokens generated for phenoxymethylpenicillin using the analyze API for both autocomplete_index and autocomplete_search analyzer and you will notice the difference.
In the index mapping provided above, genericNames is of the nested type so you need to use nested query
Adding a working example using the same index data as provided above along with search query and search result.
Search Query:
{
"query": {
"nested": {
"path": "genericNames",
"query": {
"bool": {
"must": [
{
"match": {
"genericNames.name": "phenoxymethylpenicillin"
}
}
]
}
},
"inner_hits":{}
}
}
}
Search Result:
"hits": [
{
"_index": "64817981",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "genericNames",
"offset": 0
},
"_score": 0.7361701,
"_source": {
"nameType": 1,
"name": "Phenoxymethylpenicillin 250mg tablets",
"nameChangeCode": "0000",
"nameBasisCode": "0001",
"nameTypeDescription": "Name",
"startDate": "1948-01-01T00:00:00.000000+0000",
"endDate": "3456-02-01T00:00:00.000000+0000"
}
}
]
I am totally new to elastic search. So please forgive me if this is a stupid Question and my Questions might have been answered somewhere else already but I couldn't find it.
I want to use Elastic Search as a search engine for PDF'S and docx's in my network. I used fscrawler to ingest the PDF's to elastic search. Since the documents I want to ingest are in several languages I wanted to use n-graming for stemming. To do so I wanted to update my mapping like this
PUT test/_mappings/_all
{
"mappings": {
"title": {
"properties": {
"title": {
"type": "text",
"fields": {
"de": {
"type": "string",
"analyzer": "german"
},
"en": {
"type": "string",
"analyzer": "english"
},
"general": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
}
}
And now I get this Errormessage
{ "error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [mappings : {title={properties={title={type=text,
fields={de={type=string, analyzer=german}, en={type=string,
analyzer=english}, general={type=string, analyzer=trigrams}}}}}}]"
}
],
"type": "mapper_parsing_exception",
"reason": "Root mapping definition has unsupported parameters: [mappings : {title={properties={title={type=text,
fields={de={type=string, analyzer=german}, en={type=string,
analyzer=english}, general={type=string, analyzer=trigrams}}}}}}]"
}, "status": 400 }
Do you have any idea how i can fix this? Or do you have an idea how I can ingest the files with the right mapping without using fscrawler?
those are my settings
{
"test": {
"settings": {
"index": {
"mapping": {
"total_fields": {
"limit": "2000"
}
},
"number_of_shards": "5",
"provided_name": "test",
"creation_date": "1542031632596",
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3"
}
},
"analyzer": {
"fscrawler_path": {
"tokenizer": "fscrawler_path"
},
"trigrams": {
"filter": [
"lowercase",
"trigrams_filter"
],
"type": "custom",
"tokenizer": "standard"
}
},
"tokenizer": {
"fscrawler_path": {
"type": "path_hierarchy"
}
}
},
"number_of_replicas": "1",
"uuid": "7L3QE5_xRACECVbTFlFY-Q",
"version": {
"created": "6040399"
}
}
}
}
}
My mapping
{
"test": {
"mappings": {
"_doc": {
"dynamic_templates": [
{
"raw_as_text": {
"path_match": "meta.raw.*",
"mapping": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
}
}
}
],
"properties": {
"attachment": {
"type": "binary"
},
"attributes": {
"properties": {
"group": {
"type": "keyword"
},
"owner": {
"type": "keyword"
}
}
},
"content": {
"type": "text"
},
"file": {
"properties": {
"checksum": {
"type": "keyword"
},
"content_type": {
"type": "keyword"
},
"created": {
"type": "date",
"format": "dateOptionalTime"
},
"extension": {
"type": "keyword"
},
"filename": {
"type": "keyword",
"store": true
},
"filesize": {
"type": "long"
},
"indexed_chars": {
"type": "long"
},
"indexing_date": {
"type": "date",
"format": "dateOptionalTime"
},
"last_accessed": {
"type": "date",
"format": "dateOptionalTime"
},
"last_modified": {
"type": "date",
"format": "dateOptionalTime"
},
"url": {
"type": "keyword",
"index": false
}
}
},
"meta": {
"properties": {
"altitude": {
"type": "text"
},
"author": {
"type": "text"
},
"comments": {
"type": "text"
},
"contributor": {
"type": "text"
},
"coverage": {
"type": "text"
},
"created": {
"type": "date",
"format": "dateOptionalTime"
},
"creator_tool": {
"type": "keyword"
},
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"description": {
"type": "text"
},
"format": {
"type": "text"
},
"identifier": {
"type": "text"
},
"keywords": {
"type": "text"
},
"language": {
"type": "keyword"
},
"latitude": {
"type": "text"
},
"longitude": {
"type": "text"
},
"metadata_date": {
"type": "date",
"format": "dateOptionalTime"
},
"modifier": {
"type": "text"
},
"print_date": {
"type": "date",
"format": "dateOptionalTime"
},
"publisher": {
"type": "text"
},
"rating": {
"type": "byte"
},
"relation": {
"type": "text"
},
"rights": {
"type": "text"
},
"source": {
"type": "text"
},
"title": {
"type": "text"
},
"type": {
"type": "text"
}
}
},
"path": {
"properties": {
"real": {
"type": "keyword",
"fields": {
"fulltext": {
"type": "text"
},
"tree": {
"type": "text",
"analyzer": "fscrawler_path",
"fielddata": true
}
}
},
"root": {
"type": "keyword"
},
"virtual": {
"type": "keyword",
"fields": {
"fulltext": {
"type": "text"
},
"tree": {
"type": "text",
"analyzer": "fscrawler_path",
"fielddata": true
}
}
}
}
}
}
}
}
}
}
env: ElasticSearch 5.5.1
First there are two indexs in my elasticsearch
and the only different of two index is the message field, the field's type of message in index1 is keyword, and in index2 is text.
To ensure that it is not affected by other fields,I remove the message field and compare before and after result:
Before remove message field:
after remove message field i got:
Obvious the message field takes up a lot of space,and the type of keyword take up much more than text,but I don't know why keyword take up much more size than text?
so, is there anyone help me ?
Following is the index of index1's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and settings info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "3",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index1",
"creation_date": "1531389785215",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "zd8oVbwUQbys1UJ8hJZRmQ",
"version": {
"created": "5050099"
}
}
}
Following is the index of index2's mapping info:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"CommunicationReturnCode": {
"type": "keyword"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"action": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "text"
},
"msgid": {
"type": "keyword"
},
"msgname": {
"type": "keyword"
},
"nodetype": {
"type": "keyword"
},
"orgid": {
"type": "keyword"
},
"orgname": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"processingState": {
"type": "keyword"
},
"processingStatecode": {
"type": "keyword"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"thread": {
"type": "keyword"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"transDescription": {
"type": "keyword"
},
"transactionErrorCode": {
"type": "keyword"
},
"transactionTimeConsuming": {
"type": "keyword"
},
"transcode": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and setting info:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "2",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index2",
"creation_date": "1531467294314",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "yROU2MrMTzip4VXH_zWEXQ",
"version": {
"created": "5050099"
}
}
}
Following are one of the index's file structure of the two shards about the text type field:
and the keyword type field:
And you can believe that there are same number of documents in two folder, and the only difference of the field is the type of message field.
Could you explain it?
Thank you so much!
In Elasticsearch keyword fields have doc_values enabled by default, while text fields does not. This means that on your keyword fields it will store the whole field in a column-oriented fashion, in order to be able to perform aggregations or sorting, without relying on fielddata.
Also, Once you tokenize a string, with stemming, lowercasing, etc, you can achieve much better compression.
You can try to disable doc_values on that field if you don't perform aggregations or sorting on it.
I have the following legacy mapping code that works in ES 1.7 but fails in 5.2. The things that fail are multi_field is not supported as well as path. The documentation mentions that these fields were removed but fails to provide the remedy beyond suggesting to use copy_to. Cans someone give a bit more details on that.
{
"sample": {
"_parent": {
"type": "security"
},
"properties": {
"securityDocumentId": {
"type": "string",
"index": "not_analyzed",
"include_in_all": false
},
"id": {
"type": "multi_field",
"path": "full",
"fields": {
"indexer_sample_id": {
"type": "string"
},
"id": {
"type": "string",
"include_in_all": false
}
}
},
"sampleid": {
"type": "multi_field",
"path": "just_name",
"fields": {
"sampleid": {
"type": "string",
"analyzer": "my_analyzer"
},
"sample.sampleid": {
"type": "string",
"analyzer": "my_analyzer"
},
"sample.sampleid.sort": {
"type": "string",
"analyzer": "case_insensitive_sort_analyzer"
},
"sample.sampleid.name.autocomplete": {
"type": "string",
"analyzer": "autocomplete"
}
}
},
The path option's default value was full, so you can leave it out since it way deprecated in 2.0. The path value just_name doesn't exist anymore and you MUST reference all your fields by their full path name. The multi-fields can be rewritten very simply:
{
"sample": {
"_parent": {
"type": "security"
},
"properties": {
"securityDocumentId": {
"type": "keyword",
"include_in_all": false
},
"id": {
"type": "text",
"fields": {
"indexer_sample_id": {
"type": "text"
},
"id": {
"type": "text",
"include_in_all": false
}
}
},
"sampleid": {
"type": "text",
"fields": {
"sampleid": {
"type": "text",
"analyzer": "my_analyzer"
},
"sample.sampleid": {
"type": "text",
"analyzer": "my_analyzer"
},
"sample.sampleid.sort": {
"type": "text",
"analyzer": "case_insensitive_sort_analyzer"
},
"sample.sampleid.name.autocomplete": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
Note that I'm not sure of the usefulness and added value of the id sub-fields
I tried to use the following mapping to index my data:
{
"mappings": {
"chow-demo": {
"properties": {
"#fields": {
"dynamic": "true",
"properties": {
"asgid": {
"type": "string",
"analyzer": "keyword"
},
"asid": {
"type": "long"
},
"astid": {
"type": "long"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"httpcode": {
"type": "long"
},
"oid": {
"type": "string"
},
"onid": {
"type": "long"
},
"ptrnr": {
"analyzer": "keyword",
"type": "string"
},
"pguid": {
"analyzer": "keyword",
"type": "string"
},
"ptid": {
"type": "long"
},
"sid": {
"type": "long"
},
"src_url": {
"analyzer": "keyword",
"type": "string"
},
"title": {
"analyzer": "keyword",
"type": "string"
},
"ts": {
"type": "long"
}
}
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"#message": {
"type": "string"
},
"#source": {
"type": "string"
},
"#type": {
"analyzer": "keyword",
"type": "string"
},
"#tags": {
"type": "string"
},
"#source_host": {
"type": "string"
},
"#source_path": {
"type": "string"
}
}
},
"chow-clfg": {
"_parent": {
"type": "chow-demo"
},
"dynamic": "true",
"properties": {
"_ttl": {
"enabled": true,
"default": "1h"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"count": {
"type": "long"
}
}
}
}
}
I tried to populate the parent type "chow-demo" without populating the child type "chow-clfg", and the document refused to index. (No documents were indexed into Elasticsearach)
When I take out the child mapping for "chow-clfg", it does indexing properly as usual. Hence I have the following question:
Is my mapping structure wrong?
Must the parent and child be indexed together at the same time before the data can be successfully indexed?
Really need help in this question for my project to progress! Thanks!
Yes, your mapping is wrong. The _ttl element should be one level higher in the chow-clfg type. In other words _ttl should be on the same level as _parent. However, I am not quite sure how this problem can affect your ability to index.
Parents and children don't have to be indexed together.