I have to store documents with a single field that contains a single JSON object. This object has variable depth and a variable schema.
I configured a mapping like this:
"mappings": {
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "object"
}
}
}
It works fine, and Elasticsearch creates and updates the mapping as documents are received.
The problem is that after some mapping updates, it rejects new documents and does not update the mapping anymore. When this happens I switch to a new index, and mapping updates occur again for that index. I'd like to know the right solution.
For example, the first document is:
{
  personalInfo: {
    firstName: "tom"
  },
  moviesStatistics: {
    count: 100
  }
}
The second document, which will update the Elasticsearch mapping, is:
{
  personalInfo: {
    firstName: "tom",
    lastName: "hanks"
  },
  moviesStatistics: {
    count: 100
  },
  education: {
    title: "a title..."
  }
}
Elasticsearch creates the mapping from doc1 and updates it with doc2, doc3, ... until a certain number of documents have been received. After that, it starts to reject every document that does not match the fields already in the mapping.
In the end I found the solution in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.13//dynamic-field-mapping.html
We can use dynamic mapping and simply use this mapping:
"mappings": {
"dynamic": "true"
}
You should also change some of the default mapping limits, which are described here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.13//mapping-settings-limit.html
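In particular, rejections that start after a certain number of documents are most likely caused by the index.mapping.total_fields.limit setting, which defaults to 1000 fields per index. A minimal sketch of creating such an index with dynamic mapping and raised limits; the index name and the limit values here are illustrative and should be tuned to your data:

PUT my-index
{
  "settings": {
    "index.mapping.total_fields.limit": 5000,
    "index.mapping.depth.limit": 50
  },
  "mappings": {
    "dynamic": "true"
  }
}

Note that every distinct field still costs cluster state and memory, so raising these limits trades safety for flexibility.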
I have a requirement to throw different errors for different scenarios like the ones below, and there are many such fields, not just one.
For example:
{
"id": 1,
"name": "nameWithSpecialChar$"
}
Here it should throw an error for the special character.
{
"id": 1,
"name": null
}
Here it should throw a field-is-null error.
{
"id": 1
}
Here it should throw a field-is-missing error.
Handling the 1st and 2nd scenarios is easy, but for the 3rd one, is there any way with Jackson to get a list of the field names that were present in the input JSON at deserialization time?
One way I am able to do it is by mapping the request to a JsonNode, checking that nodes are present for the required fields, and then deserializing that JsonNode manually and validating the rest of the members, as below.
public ResponseEntity myGetRequest(@RequestBody JsonNode requestJsonNode) {
    if (!requestJsonNode.has("name")) {
        throw new IllegalArgumentException("name is missing"); // or your own error type
    }
    // objectMapper is an injected com.fasterxml.jackson.databind.ObjectMapper instance
    MyRequest request = objectMapper.convertValue(requestJsonNode, MyRequest.class);
    validateIfFieldsAreInvalid(request);
    // ...
}
But I do not like this approach. Is there any other way of doing it?
You can define a JSON schema and validate your object against it. In your example, your schema may look like this:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"id": {
"description": "The identifier",
"type": "integer"
},
"name": {
"description": "The item name",
"type": "string",
"pattern": "^[a-zA-Z]*$"
}
},
"required": [ "id", "name" ]
}
To validate your object, you could use the json-schema-validator library, which is built on Jackson. Since you're using Spring Boot anyway, you already have Jackson on the classpath.
The example code looks more or less like this:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;

String schema = "<define your schema here>";
String data = "<put your data here>";
JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
ObjectMapper m = new ObjectMapper();
JsonSchema jsonSchema = factory.getJsonSchema(m.readTree(schema)); // parse the schema once and reuse it
JsonNode json = m.readTree(data);
ProcessingReport report = jsonSchema.validate(json); // collects all violations
System.out.println(report);
The report includes detailed errors for different input cases. For example, with this input
{
"id": 1,
"name": "nameWithSpecialChar$"
}
this output is printed:
--- BEGIN MESSAGES ---
error: ECMA 262 regex "^[a-zA-Z]*$" does not match input string "nameWithSpecialChar$"
level: "error"
schema: {"loadingURI":"#","pointer":"/properties/name"}
instance: {"pointer":"/name"}
domain: "validation"
keyword: "pattern"
regex: "^[a-zA-Z]*$"
string: "nameWithSpecialChar$"
--- END MESSAGES ---
Or, instead of just printing out the report, you can loop through all the errors and apply your own logic:
for (ProcessingMessage message : report) {
// Add your logic here
}
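Each ProcessingMessage can be inspected as JSON, and the keyword field visible in the report above ("pattern", "required", "type", ...) identifies the kind of violation. A rough sketch of mapping that onto your three scenarios; the exception types are placeholders for your own:

for (ProcessingMessage message : report) {
    JsonNode details = message.asJson();
    switch (details.path("keyword").asText()) {
        case "required": // the "name" property was missing entirely
            throw new FieldMissingException(details.toString());
        case "type":     // e.g. null where a string was expected
            throw new FieldNullException(details.toString());
        case "pattern":  // the regex rejected a special character
            throw new FieldFormatException(details.toString());
    }
}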
You could check the example code to gain more information about how to use the library.
I have a field in Elasticsearch (5.5.1) which I need to rename because the name contains a '.', and that is causing various problems. The field I want to rename is nested inside another field.
I am trying to use a Rename Processor in an Ingest Pipeline to do a Reindex as described here: https://stackoverflow.com/a/43142634/5114
Here is my pipeline simulation request (you can copy this verbatim into the Dev Tools utility in Kibana to test it):
POST _ingest/pipeline/_simulate
{
"pipeline" : {
"description": "rename nested fields to remove dot",
"processors": [
{
"rename" : {
"field" : "message.message.group1",
"target_field" : "message_group1"
}
},
{
"rename" : {
"field" : "message.message.group2",
"target_field" : "message.message_group2"
}
}
]
},
"docs":[
{
"_type": "status",
"_id": "1509533940000-m1-bfd7183bf036bd346a0bcf2540c05a70fbc4d69e",
"_version": 5,
"_score": null,
"_source": {
"message": {
"_job-id": "AV8wHJEaa4J0sFOfcZI5",
"message.group1": 0,
"message.group2": "foo"
},
"timestamp": 1509533940000
}
}
]
}
The problem is that I get an error when trying to use my pipeline:
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"header": {
"processor_type": "rename"
}
}
],
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "field [message.message.group1] doesn't exist"
}
},
"header": {
"processor_type": "rename"
}
}
}
]
}
I think the problem is caused by the field "message.group1" being inside another field ("message"). I'm not sure how to refer to that field in the context of the processor. It seems there could be ambiguity between nested fields, fields containing dots, and nested fields containing dots.
I'm looking for the correct way to reference these fields, or, if Elasticsearch cannot do what I want, confirmation that this is not possible. If Elasticsearch can do this, it will probably go very fast; otherwise I have to write an external script to pull the documents, transform them, and re-save them to the new index.
OK, after investigating the Elasticsearch code, I think I know why this won't work.
First we look at the Elasticsearch Rename Processor:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RenameProcessor.java#L76-L84
Object value = document.getFieldValue(field, Object.class);
document.removeField(field);
try {
document.setFieldValue(targetField, value);
} catch (Exception e) {
// setting the value back to the original field shouldn't as we just fetched it from that field:
document.setFieldValue(field, value);
throw e;
}
What this does is look up the field to rename and get its value, then remove the field and add a new field with the same value under the new name.
Now we look at what happens in document.getFieldValue:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L101-L108
public <T> T getFieldValue(String path, Class<T> clazz) {
FieldPath fieldPath = new FieldPath(path);
Object context = fieldPath.initialContext;
for (String pathElement : fieldPath.pathElements) {
context = resolve(pathElement, path, context);
}
return cast(path, context, clazz);
}
Notice it uses a FieldPath object to represent the path to the field in the document.
Now look at how the FieldPath represents the path:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L688
this.pathElements = newPath.split("\\.");
This splits the path on every "." character, because that is the delimiter between path elements in field names.
The problem is that the source document has a field literally named "message.group1", and we need to be able to reference it. Simply splitting the path on "." does not account for field names that contain a "." themselves. We would need a syntax more like JavaScript's, where brackets and quotes can make the dot mean something different.
If the source documents were all transformed so that a "." in a field name turned that field into an object before saving, then this path scheme would work. But with source documents whose field names contain ".", we cannot reference those fields in certain contexts.
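To make the ambiguity concrete: after splitting on ".", the path message.message.group1 can only address the first shape below, while the actual source document has the second shape, where "message.group1" is a single key, which is why the lookup fails with "field doesn't exist":

{ "message": { "message": { "group1": 0 } } }
{ "message": { "message.group1": 0 } }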
To solve my problem and reindex my index, I wrote a Python script which pulled batches of documents, transformed them, and bulk-inserted them into a new index. This is basically what the Elasticsearch reindex API does, but I did it in Python instead.
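For reference, a minimal sketch of that approach using the elasticsearch-py client; the index names and the exact rename rules are illustrative, and error handling is omitted:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(["localhost:9200"])

def actions():
    # Stream every document out of the old index
    for hit in scan(es, index="old-index"):
        src = hit["_source"]
        msg = src.get("message", {})
        # Move the dotted keys to names without dots
        if "message.group1" in msg:
            src["message_group1"] = msg.pop("message.group1")
        if "message.group2" in msg:
            msg["message_group2"] = msg.pop("message.group2")
        yield {"_index": "new-index", "_type": hit["_type"], "_id": hit["_id"], "_source": src}

# Bulk-insert the transformed documents into the new index
bulk(es, actions())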
More than two years later, I came across the same issue. You can have your dotted properties expanded into real nested objects with the dot_expander processor:
Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor
Issue 37507 on Elasticsearch's GitHub pointed me in the right direction.
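Applied to the simulation request from the question, a sketch might look like the following; the dot_expander steps turn the literal "message.group1" and "message.group2" keys into nested objects (the path option is needed because the dotted fields sit inside the message object), after which the rename processors can address them:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "expand dotted fields, then rename",
    "processors": [
      { "dot_expander": { "field": "message.group1", "path": "message" } },
      { "dot_expander": { "field": "message.group2", "path": "message" } },
      { "rename": { "field": "message.message.group1", "target_field": "message_group1" } },
      { "rename": { "field": "message.message.group2", "target_field": "message.message_group2" } }
    ]
  },
  "docs": [ ... ]
}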
Every time I follow the instructions about Create Index, Mapping and Add Data in Elasticsearch, I get the error below.
I'm using Postman.
First of all, I create the index:
POST http://localhost:9200/schools
(actually, I have to use PUT to create it successfully)
Next, I create the mapping and add data:
POST http://localhost:9200/schools/_bulk
Request Body
{
"index":{
"_index":"schools", "_type":"school", "_id":"1"
}
}
{
"name":"Central School", "description":"CBSE Affiliation", "street":"Nagan",
"city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405],
"fees":2000, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5"
}
{
"index":{
"_index":"schools", "_type":"school", "_id":"2"
}
}
{
"name":"Saint Paul School", "description":"ICSE
Afiliation", "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075",
"location":[28.5733056, 77.0122136], "fees":5000,
"tags":["Good Faculty", "Great Sports"], "rating":"4.5"
}
{
"index":{"_index":"schools", "_type":"school", "_id":"3"}
}
{
"name":"Crescent School", "description":"State Board Affiliation", "street":"Tonk Road",
"city":"Jaipur", "state":"RJ", "zip":"176114","location":[26.8535922, 75.7923988],
"fees":2500, "tags":["Well equipped labs"], "rating":"4.5"
}
But all I receive is just:
{
"error": {
"root_cause": [
{
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#681c6189; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#681c6189; line: 2, column: 3]"
}
],
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#681c6189; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#681c6189; line: 2, column: 3]"
},
"status": 500
}
This is because your request body JSON is malformed. I'd advise checking with just one entry until you can get it into Elasticsearch, then add the others.
The following JSON is valid, though I'm not sure if it provides the structure you want:
{
"index":{
"_index":"schools", "_type":"school", "_id":"1"
},
"name":"Central School", "description":"CBSE Affiliation", "street":"Nagan",
"city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405],
"fees":2000, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5"
}
You can use a JSON formatting and validation tool to make sure your JSON is valid. Here are some examples:
http://jsonformatter.org/
https://jsonformatter.curiousconcept.com/
I found something similar to my problem, and it solved it!
Elasticsearch Bulk API - Unexpected end-of-input: expected close marker for ARRAY
To load data into Elasticsearch, use the '/_bulk' REST API endpoint, which expects the following newline-delimited JSON (NDJSON) structure:
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
An example curl request:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'elasticsearchhost:port/index-name-sample/_bulk?pretty' --data-binary @sample.json
In your case, the request will be as follows:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/schools/_bulk?pretty' --data-binary @schools-sample.json
The schools-sample.json content:
{"index":{"_index":"schools", "_type":"school", "_id":"1"}}
{"name":"Central School", "description":"CBSE Affiliation", "street":"Nagan","city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405],"fees":2000, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5"}
{"index":{"_index":"schools", "_type":"school", "_id":"2"}}
{"name":"Saint Paul School", "description":"ICSE Afiliation", "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075","location":[28.5733056, 77.0122136], "fees":5000,"tags":["Good Faculty", "Great Sports"], "rating":"4.5"}
Important: the final line of data must end with a newline character \n (each newline character may be preceded by a carriage return \r). Otherwise, you will get an error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
}
],
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
},
"status" : 400
}
{ "index":{"_index":"schools", "_type":"school", "_id":"1" }}
{ "name":"Central School", "description":"CBSE Affiliation", "street":"Nagan", "city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405], "fees":2000, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5\n"}
{ "index":{ "_index":"schools", "_type":"school", "_id":"2" }}
{ "name":"Saint Paul School", "description":"ICSE Afiliation", "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075","location":[28.5733056, 77.0122136], "fees":5000,"tags":["Good Faculty", "Great Sports"], "rating":"4.5\n" }
{ "index":{"_index":"schools", "_type":"school", "_id":"3"}}
{ "name":"Crescent School", "description":"State Board Affiliation", "street":"Tonk Road", "city":"Jaipur", "state":"RJ", "zip":"176114","location":[26.8535922, 75.7923988],"fees":2500, "tags":["Well equipped labs"], "rating":"4.5\n"}
I have a metadata field inside the model I'm indexing, but when I index a field inside metadata that was previously indexed as another type, I get a "no mapping" error. How can I disable dynamic mapping for the metadata field only?
If I previously indexed this document:
{
...
"metadata": {
"key": {
"value": "test"
}
},
...
}
Then, if I index this document:
{
...
"metadata": {
"key": "test"
},
...
}
I get the "tried to parse as object, but got EOF, has a concrete value been provided to it?" error because metadata.key is no longer an object. But this can legitimately happen with the data that goes into metadata.
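Is disabling the metadata field in the mapping the right approach here? A minimal sketch of what I have in mind, assuming enabled: false is the correct setting (the index name is illustrative); my understanding is that Elasticsearch would then keep metadata in _source but never parse or index its contents, so conflicting shapes under it would no longer cause errors:

PUT my-index
{
  "mappings": {
    "properties": {
      "metadata": {
        "type": "object",
        "enabled": false
      }
    }
  }
}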
Thanks,
Pedro