Fluentd JSON string parsing with multiple data types in an array - Elasticsearch

I am trying to set up a logging pipeline with Fluentd and Elasticsearch. One of my log patterns looks like the following:
{
  "key": "value",
  "inputs": [
    [
      "2023-01-16T04:45:12.238Z",
      {
        "type": "channel",
        "subtype": "profile",
        "data": {
          "firstName": "Customer"
        }
      }
    ]
  ]
}
The issue with this structure is that the first element of the inner array is a date string rather than an object. Whenever Fluentd tries to write such an event to ES, it throws an exception with error code 400 and the following message:
#0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: illegal_argument_exception [reason]: 'can't merge a non object mapping [data.inputs] with an object mapping'" location=nil
When I remove the date from the array, it syncs to Elasticsearch correctly. What is the way forward?
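One possible direction, shown only as an untested sketch: reshape the mixed-type array into objects before the event reaches Elasticsearch, for example with Fluentd's record_transformer filter (the match tag and the new key names below are assumptions, not taken from the original setup):
<filter app.logs>
  @type record_transformer
  enable_ruby true
  <record>
    # Turn each ["<timestamp>", {...}] pair into {"timestamp" => ..., "event" => {...}}
    # so every element of "inputs" has a single, consistent type.
    inputs ${record["inputs"].map { |pair| {"timestamp" => pair[0], "event" => pair[1]} }}
  </record>
</filter>
With every array element mapped to an object of one shape, Elasticsearch sees a uniform type for the inputs elements, which should avoid the non-object/object mapping conflict.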

Related

Defining an array of json docs in Elasticsearch Painless Lab

I'm trying to define some docs in ES Painless Lab, to test some logic before running it on the actual index, but I can't figure out how to do it, and the docs are not helping either. There is very little documentation on the actual syntax, and it's not much help for someone with no Java background.
If I try to define a doc like this:
def docs = [{ "id": 1, "name": "Apple" }];
I get an error:
Unhandled Exception illegal_argument_exception
invalid sequence of tokens near ['{'].
Stack:
[
"def docs = [{ \"id\": 1, \"name\": \"Apple ...",
" ^---- HERE"
]
If I want to do it the Java way:
String message;
JSONObject json = new JSONObject();
json.put("test1", "value1");
message = json.toString();
I'm also getting an error:
Unhandled Exception illegal_argument_exception
invalid declaration: cannot resolve type [JSONObject]
Stack:
[
"... ring message;\nJSONObject json = new JSONObject();\n ...",
" ^---- HERE"
]
So what's the proper way to define an array of json objects to play with in Painless Lab?
After more experimenting, I found out that the docs can be passed in the parameters tab as:
{
  "docs": [
    { "id": 1, "name": "Apple" },
    { "id": 2, "name": "Pear" },
    { "id": 3, "name": "Pineapple" }
  ]
}
and then access it from the code as
def doc = params.docs[1];
return doc["name"];
I'd still be interested in how to define an object or array in the code itself.
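For what it's worth, Painless does support list and map initializers directly in code; a map literal uses [key: value] rather than JSON's braces, which is why the {...} attempt above fails. A standalone sketch, not tied to any index:
def docs = [
  ["id": 1, "name": "Apple"],
  ["id": 2, "name": "Pear"],
  ["id": 3, "name": "Pineapple"]
];
return docs[1]["name"]; // returns "Pear"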

Jackson deserialization with Spring Boot: get the field names present in the request along with the respective field mapping

I have a requirement to throw a different error for each of the scenarios below, and there are many such fields, not just one. For example:
{
  "id": 1,
  "name": "nameWithSpecialChar$"
}
Here it should throw an error for the special character.
{
  "id": 1,
  "name": null
}
Here it should throw a null-field error.
{
  "id": 1
}
Here it should throw a missing-field error.
Handling the first and second scenarios is easy, but for the third: is there any way, with Jackson, to get a list of the names of the fields that were present in the input JSON at deserialization time?
One way I am able to do it is by mapping the request to a JsonNode, checking whether the required field nodes are present, then deserializing the JsonNode manually and validating the rest of the members, as below:
public ResponseEntity<?> myGetRequest(@RequestBody JsonNode requestJsonNode) {
    if (!requestJsonNode.has("name")) {
        // throw some error
    }
    // objectMapper is an injected ObjectMapper instance
    MyRequest request = objectMapper.convertValue(requestJsonNode, MyRequest.class);
    validateIfFieldsAreInvalid(request);
    // ...
}
But I do not like this approach; is there any other way of doing it?
You can define a JSON schema and validate your object against it. In your example, your schema may look like this:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": {
      "description": "The identifier",
      "type": "integer"
    },
    "name": {
      "description": "The item name",
      "type": "string",
      "pattern": "^[a-zA-Z]*$"
    }
  },
  "required": [ "id", "name" ]
}
To validate your object, you could use the json-schema-validator library. This library is built on Jackson. Since you're using Spring Boot anyway, you already have Jackson imported.
The example code looks more or less like this:
// Requires the json-schema-validator artifact (e.g. com.github.java-json-tools:json-schema-validator).
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;

String schema = "<define your schema here>";
String data = "<put your data here>";
JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
ObjectMapper m = new ObjectMapper();
JsonSchema jsonSchema = factory.getJsonSchema(m.readTree(schema));
JsonNode json = m.readTree(data);
ProcessingReport report = jsonSchema.validate(json);
System.out.println(report);
The report includes detailed errors for different input cases. For example, with this input
{
  "id": 1,
  "name": "nameWithSpecialChar$"
}
this output is printed out
--- BEGIN MESSAGES ---
error: ECMA 262 regex "^[a-zA-Z]*$" does not match input string "nameWithSpecialChar$"
level: "error"
schema: {"loadingURI":"#","pointer":"/properties/name"}
instance: {"pointer":"/name"}
domain: "validation"
keyword: "pattern"
regex: "^[a-zA-Z]*$"
string: "nameWithSpecialChar$"
--- END MESSAGES ---
Or, instead of just printing out the report, you can loop through all errors and apply your specific logic:
for (ProcessingMessage message : report) {
    // Add your logic here, e.g. inspect message.asJson() to branch on
    // the failing keyword ("pattern", "required", "type", ...)
}
You could check the example code to gain more information about how to use the library.

Publishing Avro messages using Kafka REST Proxy throws "Conversion of JSON to Avro failed"

I am trying to publish a message whose schema has a union for one field:
{
  "name": "somefield",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
Publishing the message using the Kafka REST Proxy keeps throwing the following error whenever somefield has a populated array:
{
  "error_code": 42203,
  "message": "Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Expected start-union. Got START_ARRAY"
}
The same schema with somefield set to null works fine.
The Java classes are generated from the Avro schemas in the Spring Boot project using the Gradle plugin. When I publish a message with the array populated, using the generated Java classes and Spring's KafkaTemplate, the message is published correctly with the correct schema (the schema is taken from the generated Avro SpecificRecord). When I copy the same JSON value and schema and publish via the REST proxy, it fails with the above error.
I have these content types in the API call
accept:application/vnd.kafka.v2+json, application/vnd.kafka+json, application/json
content-type:application/vnd.kafka.avro.v2+json
What am I missing here? Any pointers to troubleshoot the issue are appreciated.
The messages I tested with were
{
  "somefield": null
}
and
{
  "somefield": [
    { "field1": "hello" }
  ]
}
However, the Avro JSON encoding wraps a non-null union value in a single-key object whose key names the selected branch, so it should instead be passed as
{
  "somefield": {
    "array": [
      { "field1": "hello" }
    ]
  }
}
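For reference, a hedged sketch of the corresponding REST Proxy call with the union wrapped (host, topic name, and schema ID are placeholders, not from the original setup):
curl -X POST http://localhost:8082/topics/my-topic \
  -H "Content-Type: application/vnd.kafka.avro.v2+json" \
  -d '{
        "value_schema_id": 1,
        "records": [
          { "value": { "somefield": { "array": [ { "field1": "hello" } ] } } }
        ]
      }'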

How to rename a nested field containing dots with elasticsearch rename processor and ingest pipeline

I have a field in elasticsearch (5.5.1) which I need to rename because the name contains a '.' and it is causing various problems. The field I want to rename is nested inside another field.
I am trying to use a Rename Processor in an Ingest Pipeline to do a Reindex as described here: https://stackoverflow.com/a/43142634/5114
Here is my pipeline simulation request (you can copy this verbatim into the Dev Tools utility in Kibana to test it):
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "rename nested fields to remove dot",
    "processors": [
      {
        "rename": {
          "field": "message.message.group1",
          "target_field": "message_group1"
        }
      },
      {
        "rename": {
          "field": "message.message.group2",
          "target_field": "message.message_group2"
        }
      }
    ]
  },
  "docs": [
    {
      "_type": "status",
      "_id": "1509533940000-m1-bfd7183bf036bd346a0bcf2540c05a70fbc4d69e",
      "_version": 5,
      "_score": null,
      "_source": {
        "message": {
          "_job-id": "AV8wHJEaa4J0sFOfcZI5",
          "message.group1": 0,
          "message.group2": "foo"
        },
        "timestamp": 1509533940000
      }
    }
  ]
}
The problem is that I get an error when trying to use my pipeline:
{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "exception",
            "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
            "header": {
              "processor_type": "rename"
            }
          }
        ],
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "field [message.message.group1] doesn't exist"
          }
        },
        "header": {
          "processor_type": "rename"
        }
      }
    }
  ]
}
I think the problem is caused by the field "message.group1" being inside another field ("message"). I'm not sure how to refer to the field I want in the context of the processor. It seems that there could be ambiguity between cases of nested fields, fields containing dots and nested fields containing dots.
I'm looking for the correct way to reference these fields or, if Elasticsearch cannot do what I want, confirmation that it is not possible. If Elasticsearch can do this, it will probably be very fast; otherwise I will have to write an external script to pull the documents, transform them, and re-save them to the new index.
OK, after investigating the Elasticsearch code, I think I know why this won't work.
First we look at the Elasticsearch Rename Processor:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RenameProcessor.java#L76-L84
Object value = document.getFieldValue(field, Object.class);
document.removeField(field);
try {
    document.setFieldValue(targetField, value);
} catch (Exception e) {
    // setting the value back to the original field shouldn't fail as we just fetched it from that field:
    document.setFieldValue(field, value);
    throw e;
}
What this is doing is looking for the field to rename, getting its value, then removing the field and adding a new field with the same value but with the new name.
Now we look at what happens in document.getFieldValue:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L101-L108
public <T> T getFieldValue(String path, Class<T> clazz) {
    FieldPath fieldPath = new FieldPath(path);
    Object context = fieldPath.initialContext;
    for (String pathElement : fieldPath.pathElements) {
        context = resolve(pathElement, path, context);
    }
    return cast(path, context, clazz);
}
Notice it uses a FieldPath object to represent the path to the field in the document.
Now look at how the FieldPath represents the path:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L688
this.pathElements = newPath.split("\\.");
This is splitting the path on any "." character, because that is the delimiter between path elements in field names.
The problem is that the source document has a field named "message.group1", so we need a way to reference it. Just splitting the path on "." does not account for field names that themselves contain a "."; we would need a syntax more like JavaScript's, where brackets and quotes can make the dot mean something different.
If the source documents were all transformed before saving, so that a "." in a field name turned that field into a nested object, then this path scheme would work. But with source documents whose field names contain ".", we cannot reference those fields in certain contexts.
To solve my problem and reindex my index, I wrote a Python script which pulled batches of documents, transformed them, and bulk-inserted them into a new index. This is basically what the Elasticsearch reindex API does, but I did it in Python instead.
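A minimal sketch of that approach, assuming the elasticsearch-py client; the host, index names, and the transform are placeholders mirroring the renames attempted above:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

def transform(hit):
    src = hit["_source"]
    msg = src.get("message", {})
    # Move the dotted keys, mirroring the two rename processors above.
    if "message.group1" in msg:
        src["message_group1"] = msg.pop("message.group1")
    if "message.group2" in msg:
        msg["message_group2"] = msg.pop("message.group2")
    return {"_index": "new-index", "_source": src}

helpers.bulk(es, (transform(hit) for hit in helpers.scan(es, index="old-index")))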
More than two years later, I came across the same issue. You can manage to have your dotted properties expanded to real nested objects with the dot_expander processor:
Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor
Issue 37507 on Elasticsearch's Github pointed me in the right direction.
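Applied to the simulation above, a hedged, untested sketch might look like this (field names taken from the original document; "docs" elided, same as in the original request):
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "expand the dotted field, then rename it",
    "processors": [
      {
        "dot_expander": {
          "path": "message",
          "field": "message.group1"
        }
      },
      {
        "rename": {
          "field": "message.message.group1",
          "target_field": "message_group1"
        }
      }
    ]
  },
  "docs": [ ... ]
}
The dot_expander turns the literal "message.group1" key inside "message" into a real nested object, after which the rename processor can address it as message.message.group1.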

Indexing metadata field in ElasticSearch

I have a metadata field inside the model I'm indexing, but when I index a field inside metadata that was previously indexed as another type, I get a "no mapping" error. How can I disable dynamic mapping for the metadata field only?
If I previously indexed this document:
{
  ...
  "metadata": {
    "key": {
      "value": "test"
    }
  },
  ...
}
Then, if I index this document:
{
  ...
  "metadata": {
    "key": "test"
  },
  ...
}
I get the "tried to parse as object, but got EOF, has a concrete value been provided to it?" error because metadata[key] is no longer an object. But this might happen when indexing metadata.
Thanks,
Pedro
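One possibility, sketched here as an assumption rather than a confirmed fix: if you control the index mapping, you can declare metadata as a non-parsed object with "enabled": false, so Elasticsearch stores its contents without mapping or indexing them, and conflicting shapes under metadata no longer cause errors. The index name is a placeholder and the syntax is for recent Elasticsearch versions; note the sub-fields then become unsearchable:
PUT my-index
{
  "mappings": {
    "properties": {
      "metadata": {
        "type": "object",
        "enabled": false
      }
    }
  }
}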
