I have managed to create an import from Kafka to Elasticsearch using Kafka Connect.
Connector-config:
{
"name": "raw-customer-equipment",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": 1,
"topics": "raw.customer.equipment",
"key.ignore": true,
"value.converter.schemas.enable": false,
"schema.ignore": true,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"connection.url": "<elastic-url>",
"connection.username": "<user>",
"connection.password": "<pwd>",
"type.name": "_doc" }
}
However, Elasticsearch doesn't seem to be able to map the imported JSON data. When peeking at it in Kibana, the imported data doesn't seem to be searchable.
{
"_index": "raw.customer.equipment",
"_type": "_doc",
"_id": "raw.customer.equipment+1+929943",
"_version": 1,
"_score": 0,
"_source": {
"ifstats_list": [
{
"Event Time": "1589212678436",
"AP_list": [
{
"AP ID": 1,
"AP Alias": "PRIV0"
},
{
"AP ID": 2,
"AP Alias": "VID1"
},
{
"AP ID": 5,
"AP Alias": "VID1_BH"
}
],
"Device Type": "<type>",
...
"Associated Stations": [
{
"Packets sent": 11056613,
"Packets received": 304744,
"Multiple Retries Count": 0,
"Channel STA": 6,
"MAC Address": "<mac>",
....
},
{
....
}]
....
I want to be able to query by, for instance, "MAC Address", but Elasticsearch seems to just handle the imported data as one big text chunk.
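For reference, this is roughly the kind of query I would like to be able to run once the fields are mapped individually (just a sketch; the field path is taken from the document above):
curl -XGET '<elastic-url>/raw.customer.equipment/_search' -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "ifstats_list.Associated Stations.MAC Address": "<mac>"
    }
  }
}'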
I guess something in the Kafka connector setup is missing or wrong, but I fail to see what.
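To check what Elasticsearch has actually inferred from the imported documents, the index mapping can be dumped, for example:
curl -XGET '<elastic-url>/raw.customer.equipment/_mapping?pretty'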
As you might have guessed, I'm new to Elasticsearch, and I'm not the one who is supposed to use the data in the end.
Any help appreciated
BR
Edit:
Added the connector config by request.
This is the payload:
{
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn",
}
The result is in the form:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn"
}
}
This is a document where I want to increment a counter field every time the document gets updated.
We have to add a new field named counter_value.
Expected result:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn",
"counter_value": 1
}
}
You can just increment the counter via scripting, see here and here. However, Elasticsearch already has a version field. Depending on your use case, it might be enough to add the version parameter to your query, as described here:
curl -XGET 'http://localhost:9200/rwe/_search?version=true'
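If you go the scripting route, a minimal sketch of a scripted update could look like this (assuming Elasticsearch 7.x, where the update endpoint is POST <index>/_update/<id>):
curl -XPOST 'http://localhost:9200/rwe/_update/8wEed3ABcYN_H8khP4hB' -H 'Content-Type: application/json' -d '{
  "script": {
    "lang": "painless",
    "source": "ctx._source.counter_value = ctx._source.counter_value == null ? 1 : ctx._source.counter_value + 1"
  }
}'
This initializes counter_value to 1 on the first update and increments it on every subsequent one.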
I have a Kibana instance which stores log data from our Java apps in per-day indices, like logstash-java-beats-2019.09.01. Since the number of indices could get pretty big in the future, I want to create a rollup job to be able to archive old logs in a separate index, something like logstash-java-beats-rollup. A typical document in the logstash-java-beats-2019.09.01 index looks like this:
{
"_index": "logstash-java-beats-2019.10.01",
"_type": "_doc",
"_id": "C9mfhG0Bf_Fr5GBl6kTg",
"_version": 1,
"_score": 1,
"_source": {
"#timestamp": "2019-10-01T00:02:13.756Z",
"ecs": {
"version": "1.0.0"
},
"event_timestamp": "2019-10-01 00:02:13,756",
"log": {
"offset": 5729359,
"file": {
"path": "/var/log/application-name/application.log"
}
},
"tags": [
"service-name",
"location",
"beats_input_codec_plain_applied"
],
"loglevel": "WARN",
"java_class": "java.class.name",
"message": "Log message here",
"host": {
"name": "host-name-beat"
},
"#version": "1",
"agent": {
"hostname": "host-name",
"id": "a34af368-3359-495a-9775-63502693d148",
"ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
"type": "filebeat",
"version": "7.2.0",
"name": "host-name-beat"
},
"input": {
"type": "log"
}
}
}
So I created a rollup job with the following config:
{
"config": {
"id": "Test 2 job",
"index_pattern": "logstash-java-beats-2*",
"rollup_index": "logstash-java-beats-rollup",
"cron": "0 0 * * * ?",
"groups": {
"date_histogram": {
"fixed_interval": "1000ms",
"field": "#timestamp",
"delay": "1d",
"time_zone": "UTC"
}
},
"metrics": [],
"timeout": "20s",
"page_size": 1000
},
"status": {
"job_state": "stopped",
"current_position": {
"#timestamp.date_histogram": 1567933199000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 1840,
"documents_processed": 5322525,
"rollups_indexed": 1838383,
"trigger_count": 1,
"index_time_in_ms": 1555018,
"index_total": 1839,
"index_failures": 0,
"search_time_in_ms": 59059,
"search_total": 1840,
"search_failures": 0
}
}
but it fails to roll up the data with the following exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
...
The logstash-java-beats-rollup index is empty, even though some stats are available for the rollup job.
I'm using Elasticsearch v7.2.0.
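The mapping of the rollup index that the exception refers to can be inspected with, for example (host assumed to be local):
curl -XGET 'http://localhost:9200/logstash-java-beats-rollup/_mapping?pretty'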
Could you please explain what is wrong with the data, or with the rollup job configuration?
I want to get a specific embedded array from a MongoDB document and add a new document to that embedded array using the "MongoDB.Driver" .NET driver.
I am inserting the document as:
{
"_id": "5c41b5c6b0ce0437dc576c53",
"ProjectId": "234",
"OwnerId": "62",
"ProjectName": "proj4h46m",
"FileDetails": [
{
"TotalWord": "-1",
"RepeatedWord": "-1",
"TMWordCount": "-1",
"TranslationRequired": "-1",
"ParentFileName": "test",
"ChildFileName": "test_AR-SA",
"Status": "Newly Uploaded"
}
]
}
I expect to get "FileDetails" array from it and add new doc and update to mongodb. as shown below:
{
"_id": "5c41b5c6b0ce0437dc576c53",
"ProjectId": "234",
"OwnerId": "62",
"ProjectName": "proj4h46m",
"FileDetails": [
{
"TotalWord": "-1",
"RepeatedWord": "-1",
"TMWordCount": "-1",
"TranslationRequired": "-1",
"ParentFileName": "test",
"ChildFileName": "test_AR-SA",
"Status": "Newly Uploaded"
},
{
"TotalWord": "10",
"RepeatedWord": "3",
"TMWordCount": "12",
"TranslationRequired": "1",
"ParentFileName": "test2",
"ChildFileName": "test_AR-KSA",
"Status": "Newly Uploaded"
}
]
}
I got this working by using the method below (legacy MongoDB C# driver API):
// Match the parent document by ProjectId
var query2 = Query.EQ("ProjectId", "234");
// New entry as a JSON string (quotes doubled inside the verbatim string)
var document = @"{""TotalWord"": ""10"", ""RepeatedWord"": ""3"", ""TMWordCount"": ""12"", ""TranslationRequired"": ""1"", ""ParentFileName"": ""test2"", ""ChildFileName"": ""test_AR-KSA"", ""Status"": ""Newly Uploaded""}";
// Parse the JSON and push it onto the embedded FileDetails array
var update = Update.Push("FileDetails", BsonDocument.Parse(document));
collec.Update(query2, update);
I have a Kafka Connect flow of mongodb -> kafka connect -> elasticsearch sending data end to end OK, but the payload document ends up as a JSON-encoded string. Here's my source MongoDB document.
{
"_id": "1541527535911",
"enabled": true,
"price": 15.99,
"style": {
"color": "blue"
},
"tags": [
"shirt",
"summer"
]
}
And here's my mongodb source connector configuration:
{
"name": "redacted",
"config": {
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"databases": "redacted.redacted",
"initial.import": "true",
"topic.prefix": "redacted",
"tasks.max": "8",
"batch.size": "1",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": false,
"compression.type": "none",
"mongo.uri": "mongodb://redacted:27017/redacted",
"analyze.schema": false,
"schema.name": "__unused__",
"transforms": "RenameTopic",
"transforms.RenameTopic.type":
"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RenameTopic.regex": "redacted.redacted_Redacted",
"transforms.RenameTopic.replacement": "redacted"
}
}
Over in elasticsearch, it ends up looking like this:
{
"_index" : "redacted",
"_type" : "kafka-connect",
"_id" : "{\"schema\":{\"type\":\"string\",\"optional\":true},\"payload\":\"1541527535911\"}",
"_score" : 1.0,
"_source" : {
"ts" : 1541527536,
"inc" : 2,
"id" : "1541527535911",
"database" : "redacted",
"op" : "i",
"object" : "{ \"_id\" : \"1541527535911\", \"price\" : 15.99,
\"enabled\" : true, \"tags\" : [\"shirt\", \"summer\"],
\"style\" : { \"color\" : \"blue\" } }"
}
}
I'd like to use 2 Single Message Transforms:
ExtractField to grab object, which is a string of JSON
Something to parse that JSON into an object, or just let the normal JsonConverter handle it, as long as it ends up properly structured in Elasticsearch.
I've attempted to do it with just ExtractField in my sink config, but I see this error logged by Kafka:
kafka-connect_1 | org.apache.kafka.connect.errors.ConnectException:
Bulk request failed: [{"type":"mapper_parsing_exception",
"reason":"failed to parse",
"caused_by":{"type":"not_x_content_exception",
"reason":"Compressor detection can only be called on some xcontent bytes or
compressed xcontent bytes"}}]
Here's my elasticsearch sink connector configuration. In this version, I have things working but I had to code a custom ParseJson SMT. It's working well, but if there's a better way or a way to do this with some combination of built-in stuff (converters, SMTs, whatever works), I'd love to see that.
{
"name": "redacted",
"config": {
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"batch.size": 1,
"connection.url": "http://redacted:9200",
"key.converter.schemas.enable": true,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"schema.ignore": true,
"tasks.max": "1",
"topics": "redacted",
"transforms": "ExtractFieldPayload,ExtractFieldObject,ParseJson,ReplaceId",
"transforms.ExtractFieldPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldPayload.field": "payload",
"transforms.ExtractFieldObject.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldObject.field": "object",
"transforms.ParseJson.type": "reaction.kafka.connect.transforms.ParseJson",
"transforms.ReplaceId.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceId.renames": "_id:id",
"type.name": "kafka-connect",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false
}
}
I am not sure about your Mongo connector; I don't recognize the class or the configurations... Most people probably use the Debezium Mongo connector.
I would set it up this way, though:
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": true,
The schemas.enable setting is important; that way the internal Connect data classes know how to convert to/from other formats.
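For reference, with schemas.enable=true the JsonConverter reads and writes an envelope of this shape (the fields here are just illustrative):
{
  "schema": {
    "type": "struct",
    "optional": false,
    "fields": [
      { "field": "id", "type": "string", "optional": true },
      { "field": "price", "type": "float64", "optional": true }
    ]
  },
  "payload": {
    "id": "1541527535911",
    "price": 15.99
  }
}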
Then, in the sink, you again need to use the JSON deserializer (via the converter) so that it creates a full object rather than a plain-text string, as you currently see in Elasticsearch ({\"schema\":{\"type\":\"string\").
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": false,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": true
And if this doesn't work, then you might have to manually create your index mapping in Elasticsearch ahead of time so it knows how to actually parse the strings you are sending it.
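If you do end up creating the mapping manually, a sketch could look something like this (assuming a pre-7.x Elasticsearch so the mapping type matches the sink's type.name, with field types guessed from your sample document):
curl -XPUT 'http://redacted:9200/redacted' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "kafka-connect": {
      "properties": {
        "id":      { "type": "keyword" },
        "price":   { "type": "double" },
        "enabled": { "type": "boolean" },
        "tags":    { "type": "keyword" },
        "style":   { "properties": { "color": { "type": "keyword" } } }
      }
    }
  }
}'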
I am new to Elasticsearch. I have created an index "cmn" with a type "mention". I am trying to import data from my existing Solr setup to Elasticsearch, so I want to map an existing field to the _id field.
I have created the following file under /config/mappings/cmn/:
{
"mappings": {
"mentions":{
"_id" : {
"path" : "docKey"
}
}
}
}
But this doesn't seem to be working; every time I index a record, an _id like the following is generated:
"_index": "cmn",
"_type": "mentions",
"_id": "k4E0dJr6Re2Z39HAIjYMmg",
"_score": 1
Also, the mapping is not reflected. I have also tried the following option:
{
"mappings": {
"_id" : {
"path" : "docKey"
}
}
}
SAMPLE DOCUMENT: Basically a tweet.
{
"usrCreatedDate": "2012-01-24 21:34:47",
"sex": "U",
"listedCnt": 2,
"follCnt": 432,
"state": "Southampton",
"classified": 0,
"favCnt": 468,
"timeZone": "Casablanca",
"twitterId": 473333038,
"lang": "en",
"stnostem": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"sourceId": "tw",
"timestamp": "2014-04-09T22:58:00.396Z",
"sentiment": 0,
"updatedOnGMTDate": "2014-04-09T22:56:57.000Z",
"userLocation": "Southampton",
"age": 0,
"priorityScore": 57.4700012207031,
"statusCnt": 14612,
"name": "YazzyK",
"profilePicUrl": "http://pbs.twimg.com/profile_images/453578494556270594/orsA0pKi_normal.jpeg",
"mentions": "",
"sourceStripped": "Instagram",
"collectionName": "STREAMING",
"tags": "557/161/193/197",
"msgid": 1397084280396.33,
"_version_": 1464949081784713200,
"url2": "{\"urls\":[{\"url\":\"http://t.co/YbPFrXlpuh\",\"expandedURL\":\"http://instagram.com/p/mliZbgxVZm/\",\"displayURL\":\"instagram.com/p/mliZbgxVZm/\",\"start\":88,\"end\":110}]}",
"links": "http://t.co/YbPFrXlpuh",
"retweetedStatus": "",
"twtScreenName": "YazKader",
"postId": "454030232501358592",
"country": "Bermuda",
"message": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"source": "Instagram",
"parentStatusId": -1,
"bio": "Live and breathe Fashion. Persian and proud- Instagram: #Yazkader",
"createdOnGMTDate": "2014-04-09T22:56:57.000Z",
"searchText": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"isFavorited": "False",
"frenCnt": 214,
"docKey": "tw_454030232501358592"
}
Also, how can we create a unique mapping for each "TYPE" and not just for the index?
Thanks
Do it like this.
Put the mapping as:
PUT index_name/type_name/_mapping
{
"type_name": {
"_id": {
"path": "docKey"
},
"properties": {
"docKey": {
"type": "string"
}
}
}
}
And it will work (when you index a document containing docKey, its _id is set from that value). You shouldn't have to provide the full mapping.
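For example (placeholder names, and assuming an Elasticsearch 1.x cluster, since the _id path setting was removed in 2.0), after indexing a document that contains docKey it comes back with that value as its _id:
POST index_name/type_name
{
  "docKey": "tw_454030232501358592",
  "sourceId": "tw"
}

GET index_name/type_name/tw_454030232501358592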