Extracting data from CSV fails - Elasticsearch

So I tried to extract data from a CSV line, but it seems I failed.
I have already tried several times, but I always get it wrong.
This is my message data:
"message" : """42307;"FX2CHTPEKAFB";"PACKING CYL COP JUPITER Z FUBORU";"PCS";"";"";"";"";;"""""
This is my pattern:
"patterns": ["""%{DATA:id_product};"%{DATA:code_product}";"%{DATA:name_product}";"%{DATA:satuan_product}";"%{DATA:merek_vehicle}";"%{DATA:jenis_vehicle}";"%{DATA:merek_product}";"%{DATA:part_number}";%{DATA:weight:float};"%{DATA:unit_weight}""""]
My result:
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "exception",
"reason" : """java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [42305;"FX4PER000501I";"PER DPN F-50 DH-0005-01 48110-87624-01 MITS";"PCS";"DAIHATSU";"";"INDOSPRING";"";;]""",
"header" : {
"processor_type" : "grok"
}
}
],
"type" : "exception",
"reason" : """java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [42305;"FX4PER000501I";"PER DPN F-50 DH-0005-01 48110-87624-01 MITS";"PCS";"DAIHATSU";"";"INDOSPRING";"";;]""",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : """java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [42305;"FX4PER000501I";"PER DPN F-50 DH-0005-01 48110-87624-01 MITS";"PCS";"DAIHATSU";"";"INDOSPRING";"";;]""",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : """Provided Grok expressions do not match field value: [42305;"FX4PER000501I";"PER DPN F-50 DH-0005-01 48110-87624-01 MITS";"PCS";"DAIHATSU";"";"INDOSPRING";"";;]"""
}
},
"header" : {
"processor_type" : "grok"
}
}

Grok doesn't like %{DATA:weight:float} here; note that in this row the weight value is empty (there is nothing between the two semicolons).
If you remove :float, the pattern becomes:
%{DATA:id_product};"%{DATA:code_product}";"%{DATA:name_product}";"%{DATA:satuan_product}";"%{DATA:merek_vehicle}";"%{DATA:jenis_vehicle}";"%{DATA:merek_product}";"%{DATA:part_number}";%{DATA:weight};"%{DATA:unit_weight}"
You will get:
{
"name_product": "PACKING CYL COP JUPITER Z FUBORU",
"jenis_vehicle": "",
"satuan_product": "PCS",
"weight": "",
"id_product": "42307",
"merek_vehicle": "",
"code_product": "FX2CHTPEKAFB",
"merek_product": "",
"part_number": "",
"unit_weight": ""
}
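To see what the pattern captures, here is a rough Python re-creation of it. This is an approximation, not grok itself: each %{DATA:name} is treated as the non-greedy regex (?P<name>.*?), the pattern is anchored at both ends, and the helper names (parse, weight_as_float) are made up for illustration. It shows that weight comes out as an empty string for the 42307 row, which a plain float conversion cannot handle. It also shows something worth checking: the row quoted in the error message (42305) ends with ;; and has no quotes around its last field, so it contains 14 double quotes while the pattern requires 16. A pattern with quotes around unit_weight therefore cannot match that row with or without :float; such rows may need an additional pattern (the grok processor accepts a list of patterns).

```python
import re

# Approximation of the grok pattern from the question: %{DATA:name} -> (?P<name>.*?).
# Anchored with ^ and $ here for clarity (an assumption, not grok's exact behavior).
QUOTED_FIELDS = ["code_product", "name_product", "satuan_product", "merek_vehicle",
                 "jenis_vehicle", "merek_product", "part_number"]
PATTERN = re.compile(
    r"^(?P<id_product>.*?);"
    + "".join(f'"(?P<{name}>.*?)";' for name in QUOTED_FIELDS)
    + r'(?P<weight>.*?);"(?P<unit_weight>.*?)"$'
)


def parse(line):
    """Hypothetical helper: return the captures as strings, or None if no match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


def weight_as_float(fields):
    """Hypothetical helper: convert weight to float, treating an empty capture as None."""
    raw = fields["weight"].strip()
    return float(raw) if raw else None


ok_row = '42307;"FX2CHTPEKAFB";"PACKING CYL COP JUPITER Z FUBORU";"PCS";"";"";"";"";;""'
error_row = '42305;"FX4PER000501I";"PER DPN F-50 DH-0005-01 48110-87624-01 MITS";"PCS";"DAIHATSU";"";"INDOSPRING";"";;'

fields = parse(ok_row)
print(fields["name_product"], repr(fields["weight"]))  # weight is an empty string
print(weight_as_float(fields))   # None, whereas float("") would raise ValueError
print(parse(error_row))          # None: the last field has no surrounding quotes
```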


Why can't I pass index field value into painless script?

Here's my index:
PUT my-index-000001/_doc/1
{
"virtual": "/testss/3-1.pdf",
"file": "3-1",
"caseno": "testss"
}
I am trying to pass the file value "3-1" into the following script and then conditionally either return the value or divide it by 100:
GET my-index-000001/_search
{
"script_fields": {
"mynewfield": {
"script": {
"source":"""
List i=Arrays.asList(doc['file'].value.splitOnToken("-"));
if (i.length==1){
return Float.parseFloat(i[0]);
}
if (i.length==2){
return Float.parseFloat(i[0])+Float.parseFloat(i[1])/100;
}
"""
}
}
}
}
And I get the following error:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:757)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:116)",
"org.elasticsearch.index.query.QueryShardContext.lambda$lookup$0(QueryShardContext.java:330)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:97)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:94)",
"java.base/java.security.AccessController.doPrivileged(AccessController.java:312)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:94)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
"i=Arrays.asList(doc['file'].value.splitOnToken(\"-\"));\n ",
" ^---- HERE"
],
"script" : "\n \n List i=Arrays.asList(doc['file'].value.splitOnToken(\"-\"));\n if (i.length==1){\n return Float.parseFloat(i[0]);\n }\n if (i.length==2){\n return Float.parseFloat(i[0])+Float.parseFloat(i[1])/100;\n }\n \n \n ",
"lang" : "painless",
"position" : {
"offset" : 39,
"start" : 19,
"end" : 78
}
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "my-index-000001",
"node" : "wJdb2G1VQCyaDNduQLS4SQ",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:757)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:116)",
"org.elasticsearch.index.query.QueryShardContext.lambda$lookup$0(QueryShardContext.java:330)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:97)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:94)",
"java.base/java.security.AccessController.doPrivileged(AccessController.java:312)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:94)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
"i=Arrays.asList(doc['file'].value.splitOnToken(\"-\"));\n ",
" ^---- HERE"
],
"script" : "\n \n List i=Arrays.asList(doc['file'].value.splitOnToken(\"-\"));\n if (i.length==1){\n return Float.parseFloat(i[0]);\n }\n if (i.length==2){\n return Float.parseFloat(i[0])+Float.parseFloat(i[1])/100;\n }\n \n \n ",
"lang" : "painless",
"position" : {
"offset" : 39,
"start" : 19,
"end" : 78
},
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [file] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
}
}
]
},
"status" : 400
}
What should I do differently here?
I tried referencing file: "3-1" with doc['file'].value in the script, but it doesn't seem to work.
This ended up working for me:
POST my-index-000001/_update_by_query
{
"script": {
"lang": "painless",
"source": """
ctx._source.caseno = Arrays.asList(ctx._source.virtual.splitOnToken('/'))[1];
ctx._source.file = Arrays.asList(Arrays.asList(ctx._source.virtual.splitOnToken('/'))[2].splitOnToken('.'))[0];
List i = Arrays.asList(Arrays.asList(Arrays.asList(ctx._source.virtual.splitOnToken('/'))[2].splitOnToken('.'))[0].splitOnToken('-'));
if (i.length == 1) { ctx._source.mynewfield = Float.parseFloat(i[0]); }
if (i.length == 2) { ctx._source.mynewfield = Float.parseFloat(i[0]) + Float.parseFloat(i[1]) / 100; }
"""
}
}
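As for the original error: the message says file is a text field, and text fields have no doc values, which is why doc['file'].value fails in script_fields. If the index was created through dynamic mapping, there is usually also a file.keyword sub-field that could be read with doc['file.keyword'].value instead; that is an assumption about this index, so check the mapping first. The splitting logic in the update script is dense, so here is a Python mirror of it for checking expected values outside Elasticsearch. The function names are hypothetical, and Python floats are 64-bit while the Painless script uses 32-bit Float, so trailing digits can differ.

```python
def file_to_number(file_value):
    """Hypothetical mirror of the script's rule: "3" -> 3.0, "3-1" -> 3 + 1/100."""
    parts = file_value.split("-")
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) == 2:
        return float(parts[0]) + float(parts[1]) / 100
    return None  # like the Painless script, there is no branch for 3+ parts


def fields_from_virtual(virtual):
    """Hypothetical mirror of the update-by-query script, driven by 'virtual'."""
    segments = virtual.split("/")            # "/testss/3-1.pdf" -> ["", "testss", "3-1.pdf"]
    caseno = segments[1]
    file_value = segments[2].split(".")[0]   # "3-1.pdf" -> "3-1"
    return {"caseno": caseno, "file": file_value,
            "mynewfield": file_to_number(file_value)}


print(fields_from_virtual("/testss/3-1.pdf"))
```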

Getting a timestamp exception when I try to update an unrelated field using painless in elasticsearch

I'm trying to run the following script:
POST /data_hip/_update/1638643727.0
{
"script":{
"source":"ctx._source.avgmer=4;"
}
}
But I am getting the following error.
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse field [#timestamp] of type [date] in document with id '1638643727.0'. Preview of field's value: '1.638642742E12'"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse field [#timestamp] of type [date] in document with id '1638643727.0'. Preview of field's value: '1.638642742E12'",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "failed to parse date field [1.638642742E12] with format [epoch_millis]",
"caused_by" : {
"type" : "date_time_parse_exception",
"reason" : "Failed to parse with all enclosed parsers"
}
}
},
"status" : 400
}
This is strange, because on queries (not updates) the date-time is parsed fine.
The timestamp field mapping is as follows:
"#timestamp": {
"type":"date",
"format":"epoch_millis"
},
I am running Elasticsearch 7+.
EDIT:
Adding my index settings:
{
"data_hip" : {
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "data_hip",
"creation_date" : "1638559533343",
"number_of_replicas" : "1",
"uuid" : "CHjkvSdhSgySLioCju9NqQ",
"version" : {
"created" : "7150199"
}
}
}
}
}
I'm not running an ingest pipeline.
The problem is the scientific notation (the 'E12' suffix) in a field that ES expects to contain a plain integer number of milliseconds.
Using this reprex:
PUT so_test
{
"mappings": {
"properties": {
"ts": {
"type": "date",
"format": "epoch_millis"
}
}
}
}
# this works
POST so_test/_doc/
{
"ts" : "123456789"
}
# this does not, throws the same error you have IRL
POST so_test/_doc/
{
"ts" : "123456789E12"
}
I'm not sure how/where those values are creeping in, but they are there in the document you are passing to ES.
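Where the values come from is not established here. The document id 1638643727.0 looks like a floating-point timestamp, which hints (only a guess) that the client produces floats. It is also possible that the value is stored as a floating-point number in _source and only turns into 1.638642742E12 when the update script re-serializes the document, which would explain why only updates fail. Either way, one option is to send timestamps as plain integer milliseconds. A minimal Python sketch with a hypothetical helper name:

```python
from decimal import Decimal


def to_epoch_millis(value):
    """Hypothetical helper: normalize a millisecond timestamp to a plain int.

    Accepts ints, floats, or strings in scientific notation such as
    "1.638642742E12". Fractional milliseconds are truncated toward zero.
    Decimal avoids float rounding when parsing the string form.
    """
    return int(Decimal(str(value)))


for raw in ["1.638642742E12", 1638642742000.0, 1638642742000]:
    print(repr(raw), "->", to_epoch_millis(raw))
```

The resulting int can then be used as the #timestamp value in the document body, which an epoch_millis date field parses without the E12 problem.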

Mapping text field to integer field in Elasticsearch

I imported the data into my Elasticsearch and later on I was trying to change the field from text to integer but I'm getting an error:
Root mapping definition has unsupported parameters: [include_type_name : false] [_doc : {properties={year={type=integer}}}]
My query:
PUT index-csv/_mapping
{
"include_type_name": "false",
"_doc": {
"properties": {
"year": {
"type": "integer"
}
}
}
}
And error message:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [include_type_name : false] [_doc : {properties={year={type=integer}}}]"
}
],
"type" : "mapper_parsing_exception",
"reason" : "Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters: [include_type_name : false] [_doc : {properties={year={type=integer}}}]",
"caused_by" : {
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [include_type_name : false] [_doc : {properties={year={type=integer}}}]"
}
},
"status" : 400
}
I'm a novice to Elasticsearch; how can I resolve this?
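Mapping types were deprecated in 7.0.0 (the APIs are typeless by default in 7.x, and types are removed entirely in 8.0), and consequently "include_type_name" is deprecated as well. It is also a URL query parameter rather than a field of the request body, which is why it is reported as an unsupported parameter.
This should be the correct syntax: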
PUT index-csv/_mapping
{
"properties": {
"year": {
"type": "integer"
}
}
}
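The corrected syntax fixes the parse error, but Elasticsearch does not allow changing the type of an existing field, so if year is already mapped as text, that PUT will be rejected with a different error. The usual route is to create a new index with the integer mapping and copy the data over with the _reindex API; numeric fields coerce strings such as "2019" into numbers by default during that copy. The Python sketch below only builds the two request bodies and does not talk to a cluster; the helper names and the target index name index-csv-v2 are hypothetical.

```python
import json


def typeless_mapping(field, es_type):
    """Typeless (7.x style) mapping: no "_doc" wrapper and no include_type_name."""
    return {"properties": {field: {"type": es_type}}}


def retype_field_requests(source_index, dest_index, field, es_type):
    """Hypothetical helper: (method, path, body) tuples to copy an index into one
    where `field` has a new type. A field's type cannot be changed in place, so
    create the destination index with the new mapping, then _reindex into it."""
    return [
        ("PUT", f"/{dest_index}", {"mappings": typeless_mapping(field, es_type)}),
        ("POST", "/_reindex",
         {"source": {"index": source_index}, "dest": {"index": dest_index}}),
    ]


for method, path, body in retype_field_requests("index-csv", "index-csv-v2", "year", "integer"):
    print(method, path)
    print(json.dumps(body, indent=2))
```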

Elasticsearch: simple comparison of two fields with Painless

I'm trying to run a query such as SELECT * FROM indexPeople WHERE info.Age > info.AgeExpectancy
Note that the two fields are NOT nested; they are just properties of a JSON object.
POST /indexPeople/_search
{
"from" : 0,
"size" : 200,
"query" : {
"bool" : {
"filter" : [
{
"bool" : {
"must" : [
{
"script" : {
"script" : {
"source" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless"
},
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"info"
],
"excludes" : [ ]
}
}
However, this query fails with:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
"doc['info.Age'].value > doc['info.AgeExpectancy'].value",
" ^---- HERE"
],
"script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless",
"position" : {
"offset" : 22,
"start" : 0,
"end" : 70
}
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "indexPeople",
"node" : "c_Dv3IrlQmyvIVpLoR9qVA",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
"doc['info.Age'].value > doc['info.AgeExpectancy'].value",
" ^---- HERE"
],
"script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless",
"position" : {
"offset" : 22,
"start" : 0,
"end" : 70
},
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
}
}
]
},
"status" : 400
}
Is there a way to achieve this?
What is the best way to debug it? I wanted to print the objects or look at the logs (which aren't there), but I couldn't find a way to do either.
The mapping is:
{
"mappings": {
"_doc": {
"properties": {
"info": {
"properties": {
"Age": {
"type": "long"
},
"AgeExpectancy": {
"type": "long"
}
}
}
}
}
}
}
Perhaps you have already solved the issue. The reason the query failed is clear from the error:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
Basically, one or more documents are missing at least one of the queried fields. You can get the result you need by checking with an if that both fields exist; if either one is missing, simply return false:
{
"script": """
if (doc['info.Age'].size() > 0 && doc['info.AgeExpectancy'].size() > 0) {
return doc['info.Age'].value > doc['info.AgeExpectancy'].value;
}
return false;
"""
}
I tested it with an Elasticsearch 7.10.2 and it works.
What is the best way to debug it?
That is a tough question; perhaps someone has a better answer. I'll list some options. Obviously, debugging starts with reading the error messages carefully.
PAINLESS LAB
If you have a fairly recent version of Kibana, you can use the Painless Lab to simulate your documents and get errors faster, in a more focused environment.
KIBANA Scripted Field
You can try to create a boolean scripted field named condition in the index pattern. Before clicking Create, remember to click "Preview result" to see the value computed for your documents.
MINIMAL EXAMPLE
Create a minimal example to reduce the complexity.
For this answer I used a sample index with four documents with all possible cases.
No info: { "message": "ok"}
Info.Age but not AgeExpectancy: {"message":"ok","info":{"Age":14}}
Info.AgeExpectancy but not Age: {"message":"ok","info":{"AgeExpectancy":12}}
Info.Age and AgeExpectancy: {"message":"ok","info":{"Age":14, "AgeExpectancy": 12}}
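To sanity-check the guarded condition against these four cases without a cluster, here is a Python mirror of its logic. It is an approximation: doc['info.X'].size() > 0 is modeled as the key being present under info, and both function names are hypothetical. The unguarded version fails on documents missing a field, much like the illegal_state_exception in the question.

```python
def age_exceeds_expectancy(doc):
    """Hypothetical mirror of the guarded Painless condition."""
    info = doc.get("info", {})
    if "Age" in info and "AgeExpectancy" in info:  # ~ size() > 0 for both fields
        return info["Age"] > info["AgeExpectancy"]
    return False


def unguarded_condition(doc):
    """Hypothetical mirror of the original script: raises KeyError on missing fields."""
    return doc["info"]["Age"] > doc["info"]["AgeExpectancy"]


# The four sample documents from the minimal example.
sample_docs = [
    {"message": "ok"},
    {"message": "ok", "info": {"Age": 14}},
    {"message": "ok", "info": {"AgeExpectancy": 12}},
    {"message": "ok", "info": {"Age": 14, "AgeExpectancy": 12}},
]

matching = [d for d in sample_docs if age_exceeds_expectancy(d)]
print(matching)  # [{'message': 'ok', 'info': {'Age': 14, 'AgeExpectancy': 12}}]
```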

Elasticsearch not allowing brackets for fields name for scripting

I'm trying to use the update API. I've inserted a document that contains a list.
INSERT:
curl -XPOST "http://localhost:9200/t/t/1" -d'
{
"hobbies(first)" : ["a", "b"]
}'
UPDATE QUERY:
curl -XPOST localhost:9200/t/t/1/_update?pretty -d '{ "script" : {
"inline": "ctx._source.hobbies(first).add(params.new_hobby)",
"params" : {
"new_hobby" : "c"
}
}
}'
ERROR:
{
"error" : {
"root_cause" : [
{
"type" : "remote_transport_exception",
"reason" : "[aaBiwwv][172.17.0.2:9300][indices:data/write/update[s]]"
}
],
"type" : "illegal_argument_exception",
"reason" : "failed to execute script",
"caused_by" : {
"type" : "script_exception",
"reason" : "compile error",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Variable [first] is not defined."
},
"script_stack" : [
"ctx._source.hobbies(first).add(params.new_hob ...",
" ^---- HERE"
],
"script" : "ctx._source.hobbies(first).add(params.new_hobby)",
"lang" : "painless"
}
},
"status" : 400
}
When I tried to update the list, I got the error above. I noticed that if I remove the part with brackets ('(first)') from the field name, it works. How can I write an update query for a field name that contains brackets?
Thanks in advance.
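This is a horrible convention for field names; just stick with alphanumerics (please keep in mind that someone after you has to maintain this, so it is much nicer to work with clean data in Elasticsearch). The error occurs because Painless parses ctx._source.hobbies(first) as a call to a method named hobbies with the argument first, hence "Variable [first] is not defined". Use map (bracket) access on ctx._source instead: ctx._source['hobbies(first)'].add(params.new_hobby)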
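When sending that script with curl, the single quotes in ['hobbies(first)'] collide with the single quotes around the -d '...' body. One way around it (a sketch, not the only option) is to use a double-quoted Painless string literal for the key and let a JSON encoder handle the escaping; Python is used here only to build and print the request body, and to mirror on a plain dict what the script does. The question's cluster uses the older "inline" key; newer versions expect "source", which is what this sketch assumes.

```python
import json

# Painless accepts double-quoted string literals, so the body ends up with no
# single quotes that would break a shell's single-quoted -d '...' argument.
script_source = 'ctx._source["hobbies(first)"].add(params.new_hobby)'

body = {
    "script": {
        "source": script_source,  # use "inline" instead on older clusters
        "params": {"new_hobby": "c"},
    }
}
payload = json.dumps(body)
print(payload)

# What the script does to the stored document, mirrored on a plain dict:
doc_source = {"hobbies(first)": ["a", "b"]}
doc_source["hobbies(first)"].append(body["script"]["params"]["new_hobby"])
print(doc_source)  # {'hobbies(first)': ['a', 'b', 'c']}
```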
