How to reject "invalid" documents in ElasticSearch

We're currently using a Couchbase plugin (transport-couchbase) to transport and index the data into ElasticSearch (http://docs.couchbase.com/couchbase-elastic-search/).
I've taken a look at ElasticSearch's mapping documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
My understanding is that if you rely on defaults for ElasticSearch, once a document gets indexed, ElasticSearch will create a dynamic mapping for that document type. This is what we've defaulted to.
We ran into an issue where, after a specific document type had already been mapped, the transport plugin inserted an "invalid" document (a field's type had changed, e.g. from string to array). ElasticSearch threw an exception and essentially broke the replication from Couchbase to ElasticSearch. The exception looks like this:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [xyz]
java.lang.RuntimeException: indexing error MapperParsingException[failed to parse [doc.myfield]]; nested: ElasticsearchIllegalArgumentException[unknown property [xyz]]
Is there a way we can configure ElasticSearch so that "invalid" documents simply get filtered without throwing exception and breaking the replication?
Thanks.

One option is to control dynamic mapping in the type's mapping, e.g. "dynamic": "strict" (reject documents that contain unmapped fields) or "dynamic": false (silently ignore unmapped fields):
{
  "tweet" : {
    "dynamic": "strict",
    "properties" : {
      "message" : {"type" : "string", "store" : true }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
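A minimal sketch of applying such a mapping with the ES 1.x put-mapping API (the index name myindex is a placeholder, not from the question):
curl -XPUT 'http://localhost:9200/myindex/tweet/_mapping' -d '
{
  "tweet" : {
    "dynamic" : false,
    "properties" : {
      "message" : {"type" : "string", "store" : true}
    }
  }
}'
With "dynamic": false, fields that are not already in the mapping are simply ignored rather than dynamically mapped. Note that this only covers unexpected new fields; an already-mapped field that arrives with a conflicting type can still fail to parse.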

Related

Upgrading elasticsearch: what is the state of "types" in version 7?

I am in the process of upgrading Elasticsearch. I upgraded Elasticsearch from 6.8 to 7.17 and I upgraded the JavaScript client to @elastic/elasticsearch 7.17.0. I then deleted my old indices, put the mappings in place, and tried to reindex the data coming from another database.
Now I am struggling with the current state of types in Elasticsearch 7.17. I know that an index can only have one type of document, and it looks like the type parameter of the JavaScript client is deprecated, but it still seems to be required. When I make a call to client.index() it complains about a missing type parameter:
ConfigurationError: Missing required parameter: type
And the error stack points to this block of code:
await client.index({
  index: indexName,
  id: obj.id,
  body: obj.body,
});
My mappings look like this:
{
  "author_index" : {
    "mappings" : {
      "dynamic" : "false",
      "properties" : {
        "articleCount" : {
          "type" : "integer"
        }
        // ...
      }
    }
  }
}
Should I still be specifying the type? Why does the client require it when it's deprecated? What am I missing?
Replace your type with _doc and it should work: the type concept is deprecated, but you still need to pass the _doc placeholder in the API calls.
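Applied to the call from the question, that would look roughly like this (a sketch using the same indexName and obj variables as above):
await client.index({
  index: indexName,
  id: obj.id,
  type: '_doc', // deprecated in 7.x, but this client version still expects the placeholder
  body: obj.body,
});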

Elastic Search scan operation not working

I'm performing some operations in Dataflow and putting documents into an ElasticSearch index. While trying to fetch documents from Kibana, I'm not able to fetch more than 10 records at a time, so I used the scan operation and also provided the size in the URL. Now I'm getting a "scan operation not supported" error:
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "No search type for [scan]"
}
],
"type" : "illegal_argument_exception",
"reason" : "No search type for [scan]"
},
So, is there any way to get more than 10 docs from Kibana at the same time? I'm using Kibana 7.7.0. Thanks in advance.
search_type=scan was supported until Elasticsearch 2.1 and was then removed.
You're probably running something newer than ES 2.1.
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-request-search-type.html
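To pull back more than the default 10 hits on a newer cluster, the size parameter goes in the search request itself; a minimal sketch from the Kibana Dev Tools console, with my-index standing in for the real index name:
GET my-index/_search
{
  "size": 100,
  "query": { "match_all": {} }
}
For result sets too large to fetch in one request, the scroll API (or search_after) is the usual replacement for the removed scan search type.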

Kafka Elasticsearch Connector Timestamps

I can see this has been discussed a few times (here, for instance), but I think the solutions are out of date due to breaking changes in Elasticsearch.
I'm trying to convert a long/epoch field in the JSON on my Kafka topic to an Elasticsearch date type as it is pushed through the connector.
When I try to add a dynamic mapping, my Kafka Connect updates fail because I'm trying to apply two mapping types, _doc and kafkaconnect, to the same index. This was a breaking change around version 6, I believe, where you can only have one mapping type per index.
{
  "index_patterns": [ "depart_details" ],
  "mappings": {
    "dynamic_templates": [
      {
        "scheduled_to_date": {
          "match": "scheduled",
          "mapping": {
            "type": "date"
          }
        }
      }
    ]
  }
}
I've now focused on trying to translate the message at the source, in the connector, by converting the field to a timestamp, time, or date:
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field" : "scheduled",
"transforms.TimestampConverter.target.type": "Timestamp"
However, any messages I try to send through this transformer fail with
Caused by: org.apache.kafka.connect.errors.DataException: Java class class java.util.Date does not have corresponding schema type.
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:604)
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:668)
at org.apache.kafka.connect.json.JsonConverter.convertToJsonWithoutEnvelope(JsonConverter.java:574)
at org.apache.kafka.connect.json.JsonConverter.fromConnectData(JsonConverter.java:324)
at io.confluent.connect.elasticsearch.DataConverter.getPayload(DataConverter.java:181)
at io.confluent.connect.elasticsearch.DataConverter.convertRecord(DataConverter.java:163)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.tryWriteRecord(ElasticsearchWriter.java:285)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:270)
at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put(ElasticsearchSinkTask.java:169)
This seems like a really common thing to need to do, but I don't see how to get a date or time field into Elastic through this connector in version 7.
The Confluent documentation states that the ES connector is currently not supported with ES 7.
According to this issue, it might suffice to change type.name=kafkaconnect to type.name=_doc in your connector configuration.
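A sketch of the relevant part of the sink connector configuration, combining that type.name change with the transform settings already shown in the question (the rest of the connector configuration is assumed):
"type.name": "_doc",
"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field": "scheduled",
"transforms.TimestampConverter.target.type": "Timestamp"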

Logstash/ElasticSearch: guesses wrong for datatype for field

The log files I'm trying to import into Logstash contain a field that sometimes looks like a date/time and sometimes does not. Unfortunately, the first occurrence looked like a date/time, and someone (logstash or elasticsearch) decided to define the field as a date/time. When trying to import a later log record, Elasticsearch throws an exception:
Failed to execute [index ...]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [#fields.field99]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:320)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:587)
...
Caused by: java.lang.IllegalArgumentException: Invalid format: "empty"
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:747)
...
Question: How do I tell logstash/elasticsearch to not define this field as a date/time? I would like all the fields from my log (except the one explicit timestamp field) to be defined as just text.
Question: It appears that logstash gives up trying to import records from the log file after seeing this one record that Elasticsearch throws an exception on. How can I tell logstash to ignore this exception and keep trying to import the other records from the log file?
I found the answer to my first question myself.
Before adding data through Logstash, I had to set the defaults for Elasticsearch to treat the field as "string" instead of "date".
I did this by creating a template file, defaults_for_elasticsearch.js, like this:
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [{
        "fields_template": {
          "mapping": { "type": "string" },
          "path_match": "#fields.*"
        }
      }]
    }
  }
}
and telling Elasticsearch to use it before adding any data through Logstash:
curl -XPUT 'http://localhost:9200/_template/template_logstash/' -d @defaults_for_elasticsearch.js
Hope this helps someone else.
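As a quick sanity check (not part of the original answer), the registered template can be read back before starting Logstash:
curl -XGET 'http://localhost:9200/_template/template_logstash?pretty'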

Disable date detection in Tire's elasticsearch mapping

I'm indexing a document with a property obj_properties, which is a hash of property name -> property value. Elasticsearch is inferring that some of the property values are dates, leading to the following error when it encounters a subsequent value for the same property that can't be parsed as a date.
org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field <NON-DATE FIELD within obj_properties>
So, I'd like to disable date detection for obj_properties and anything nested within it. Per
http://elasticsearch-users.115913.n3.nabble.com/Date-Detection-not-always-wanted-tp1638890p1639415.html
(Note: I believe the linked post contains a typo -- the field should be date_formats rather than date_format, but I've tried it both ways.)
I've created the following mapping:
mapping do
  indexes :name
  indexes :obj_properties, type: "object", date_formats: "none"
end
but I continue to receive the same exception. The properties in obj_properties are not known ahead of time, so it's not possible to create an exhaustive mapping of types. Any ideas? Is disabling date detection the correct approach?
You can turn off date detection for a particular type by specifying it in the mapping:
curl -XPUT 'http://127.0.0.1:9200/myindex/?pretty=1' -d '
{
  "mappings" : {
    "mytype" : {
      "date_detection" : 0
    }
  }
}
'
or for all types in an index by specifying it in the default mapping:
curl -XPUT 'http://127.0.0.1:9200/myindex/?pretty=1' -d '
{
  "mappings" : {
    "_default_" : {
      "date_detection" : 0
    }
  }
}
'
In Tire, the same thing can be expressed by passing date_detection: false to the mapping block:
mapping(date_detection: false) do
  indexes :name
  indexes :obj_properties, type: "object"
end
Then curl 'http://127.0.0.1:9200/myindex/_mapping?pretty=1' will include date_detection = false, as mentioned here. Although I believe this applies to the entire index - not a particular field.
