Versions for integration of Apache Flink, Elasticsearch and Kafka

I am having trouble choosing compatible versions of Flink, Kafka and Elasticsearch. I'm using Flink 1.8.1, but I don't know which version of Kafka to use with it. I also want to use version 6 of Elasticsearch. Which versions do you think are suitable for Flink, Kafka and Elasticsearch?
The linked page lists a Kafka version, but its comments column describes it as beta.

As listed in the table, Kafka 0.11 (and higher) will work fine. The beta label applies to the Flink connector, not to Kafka itself.
Plus, Kafka Connect for Elasticsearch, should you choose to use it, works with Elasticsearch 6.

As @cricket_007 said, it's safe to use the Kafka connector even though it is labeled beta (a label that should be removed, as this connector has now been battle-tested in production for over a year).
The setup Kafka -> Flink -> ES6 is quite common, so you can and should use recent versions of all the components involved.
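The table the first answer refers to is not reproduced here, so the following is only a rough sketch of how it maps a Kafka broker version to a Flink connector artifact. The artifact names are recalled from the Flink 1.8 connector documentation and assume Scala 2.11 builds; the function names are hypothetical. Check the documentation for your exact Flink release before relying on it.

```python
# Sketch of the Kafka connector table from the Flink 1.8 docs.
# Assumption: artifact names use the Scala 2.11 suffix and are quoted
# from memory; verify them against the docs for your Flink version.

def parse_version(text):
    """Turn a version string such as '0.11.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# (lowest Kafka version, first version NOT covered, connector artifact)
DEDICATED_CONNECTORS = [
    ((0, 8), (0, 9), "flink-connector-kafka-0.8_2.11"),
    ((0, 9), (0, 10), "flink-connector-kafka-0.9_2.11"),
    ((0, 10), (0, 11), "flink-connector-kafka-0.10_2.11"),
    ((0, 11), (1, 0), "flink-connector-kafka-0.11_2.11"),
]

# The "universal" connector, the one labeled beta in Flink 1.7/1.8.
UNIVERSAL_CONNECTOR = "flink-connector-kafka_2.11"


def pick_connector(kafka_version):
    """Return the connector artifact suggested for a Kafka broker version."""
    version = parse_version(kafka_version)
    for low, high, artifact in DEDICATED_CONNECTORS:
        if low <= version < high:
            return artifact
    if version >= (1, 0):
        return UNIVERSAL_CONNECTOR
    raise ValueError("no connector known for Kafka " + kafka_version)


print(pick_connector("0.11.0.3"))
print(pick_connector("2.3.0"))
```

For Elasticsearch 6, the matching sink artifact in Flink 1.8 is, to my knowledge, flink-connector-elasticsearch6_2.11.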

Related

Most useful plugins for ElasticSearch

What are some of the most widely used Elasticsearch plugins?
For example, plugins for monitoring, mapping, or analysis.
The OP didn't mention which ES version is in use. I am suggesting the plugins below because they are easy to set up, free, and provide an admin interface for an Elasticsearch cluster.
For ES versions below 2.x I would recommend the kopf plugin, and for the latest versions of ES, Cerebro, which is by the same author as kopf.
It offers an easy way of performing common tasks on an Elasticsearch cluster. Not every single API is covered by this plugin, but it does offer a REST client which allows you to explore the full potential of the Elasticsearch API.
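As a point of comparison, the kind of read-only calls such an admin UI makes can be sketched with the Python standard library alone. This is a minimal sketch, not how kopf or Cerebro are implemented; the base URL (a local, unsecured node on port 9200) and the helper names are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def build_get(path, params=None, base_url=BASE_URL):
    """Build (but do not send) a GET request for an Elasticsearch endpoint."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(request, timeout=5):
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Two read-only endpoints that admin tools commonly surface.
health_request = build_get("_cluster/health")
indices_request = build_get("_cat/indices", {"format": "json"})

print(health_request.full_url)
print(indices_request.full_url)
```

Calling fetch_json(health_request) against a running cluster would return the health document as a dictionary; nothing is sent until that call is made.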

I am new to Apache Storm and would like to know the key differences between Storm 1.1 and Storm 2.0

I was trying to find any major differences between Storm 1.1 and Storm 2.0.
Is there any difference in setting up a cluster for either version?
(I read on the official website about the new Java-based implementation, but has anyone seen any practical difference between these two versions?)
In addition to reading the changelog at https://www.apache.org/dist/storm/apache-storm-2.0.0/RELEASE_NOTES.html, you can look at https://issues.apache.org/jira/browse/STORM-2306?focusedCommentId=16291947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16291947 for some performance numbers. You can also run your own benchmarks of course.

Couchbase plugin for ElasticSearch deprecated?

I was reading https://www.elastic.co/blog/deprecating-rivers, which states that ES rivers (plugins) are being deprecated, i.e. any plugin integrated directly with the Elasticsearch server will no longer work beyond ES 3.x.
The Couchbase plugin is one of that kind.
I searched all the documentation for the Couchbase plugin at http://developer.couchbase.com/documentation/server/4.5/connectors/elasticsearch-2.1/elastic-intro.html but could not find out whether it uses the deprecated mechanism or not.
Does anyone know? Should we keep using the Couchbase plugin, or should we start planning to write data directly to ES from our application?
We have Couchbase data being replicated to ES using the Couchbase plugin and XDCR.
I'm the maintainer of the Couchbase ES transport plugin. As Roi mentioned in his answer, the plugin doesn't use rivers, so it won't be deprecated. It currently supports any version of ES from 1.3 to 2.x, and I'm working on adding support for 5.x. It's taking a bit longer, because ES 5.x broke some configuration sharing features in unexpected ways.
I'd suggest always looking at our github repo for the latest plugin releases:
https://github.com/couchbaselabs/elasticsearch-transport-couchbase
The Couchbase plugin does not use rivers; there is a separate river plugin, which is no longer valid.
Take a look here: https://github.com/couchbaselabs/elasticsearch-transport-couchbase

Elasticsearch / Storm integration methods

Looking for a simple integration path between Elasticsearch and Apache Storm. Support for this is included in the elasticsearch-hadoop library, but this brings tons of dependencies on the Hadoop stack: from Hive to Cascading, that I simply don't need. Has anyone out there succeeded in this integration without bringing in elasticsearch-hadoop? Thanks.
In my project we're using the RabbitMQ river to index the Storm output. It's a very efficient and convenient way to write to Elasticsearch. You basically put the messages on the queue and the river does the rest. If something gets stuck, the data is simply buffered on the queue.
So I would say, use this river approach for writing and the Elasticsearch Java API for reading, as Kit Menke suggests (or the Jest client, which we've found cool and which offers an async API based on Apache HttpAsyncClient, though we're not reading from Elasticsearch in the Storm topology but in different services).
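Since rivers are deprecated (as the Couchbase question on this page notes), the same buffer-then-write idea can be sketched in application code against the Elasticsearch bulk API. This is a minimal sketch, not the river's implementation: the class name, the index name, and the default batch size are hypothetical, and the `_type` field assumes ES 6.x, which still expects a type.

```python
import json


class BulkBuffer:
    """Collects documents and renders them as an Elasticsearch _bulk body."""

    def __init__(self, index, doc_type="_doc", batch_size=500):
        self.index = index
        self.doc_type = doc_type  # assumption: ES 6.x, which still expects a type
        self.batch_size = batch_size
        self._lines = []
        self._count = 0

    def add(self, doc, doc_id=None):
        """Queue one document; return a bulk body once the batch is full."""
        meta = {"_index": self.index, "_type": self.doc_type}
        if doc_id is not None:
            meta["_id"] = doc_id
        # The bulk format is one action line followed by one source line.
        self._lines.append(json.dumps({"index": meta}))
        self._lines.append(json.dumps(doc))
        self._count += 1
        if self._count >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Return the pending bulk body (newline-terminated) and reset."""
        if not self._lines:
            return None
        body = "\n".join(self._lines) + "\n"
        self._lines = []
        self._count = 0
        return body


buffer = BulkBuffer("events", batch_size=2)
first = buffer.add({"word": "storm", "count": 1})
second = buffer.add({"word": "flink", "count": 2}, doc_id="x")
```

Here `first` is None because the batch is not yet full, while `second` holds the body that would be sent with a POST to the `_bulk` endpoint using the `application/x-ndjson` content type. In a Storm bolt, `add` would be called per tuple and `flush` on a tick or at shutdown.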

Rhadoop with Elasticsearch-hadoop

I am using Hadoop with Elasticsearch as the data store (no HDFS).
Do you know whether elasticsearch-hadoop lets the two work together?
If not, do you know how I could run analytics for my project?
Yes, there is a connector for Elasticsearch and Hadoop that is built and released by Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
They just released the GA version 2.0 - here's the blog post about it:
http://www.elasticsearch.org/blog/es-hadoop-2-0-g/

Resources