RHadoop with elasticsearch-hadoop

I am using Hadoop with data stored in Elasticsearch (no HDFS).
Do you know whether elasticsearch-hadoop can make the two work together?
If not, do you know how I can run analytics for my project?

Yes, there is a connector for Elasticsearch and Hadoop that is built and released by Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
They have just released the GA version, 2.0; here is the blog post about it:
http://www.elasticsearch.org/blog/es-hadoop-2-0-g/

Related

Versions for integrating Apache Flink, Elasticsearch and Kafka

I am having trouble matching the versions of Flink, Kafka and Elasticsearch. I'm using Flink 1.8.1, but I don't know which Kafka version to use. I also want to use Elasticsearch 6. Which versions of Flink, Kafka and Elasticsearch work well together?
The page I linked lists a version of Kafka, but in its comments section that version is described as beta.
As listed in the table, Kafka 0.11 (and higher) will work fine. The beta label refers to a version of the Flink connector, not to Kafka itself.
In addition, Kafka Connect for Elasticsearch, should you choose to use it, works with Elasticsearch 6.
As @cricket_007 said, it's safe to use the Kafka connector even though it is labeled beta (the label should be removed, as this connector has now been battle-tested in production for over a year).
The Kafka -> Flink -> ES6 setup is quite common, so you can and should use recent versions of all the components involved.
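To make the version rules above concrete, here is a minimal Scala sketch that maps a Kafka broker version and an Elasticsearch major version to Flink 1.8.x connector artifacts. The artifact IDs follow the Flink 1.8 connector documentation (the universal Kafka connector supports brokers 0.11.0 or newer); the `_2.11` Scala suffix is an assumption, and `FlinkConnectorChoice` with its helpers are hypothetical names. Double-check the exact artifacts against the docs for your release.

```scala
// Hypothetical helper: maps a Kafka broker version and an Elasticsearch major
// version to Flink 1.8.x connector artifacts. The "_2.11" suffix is assumed.
object FlinkConnectorChoice {

  // Scala binary version of the Flink build (assumed; use "_2.12" if that is yours).
  val scalaSuffix = "_2.11"

  // The universal Kafka connector supports brokers 0.11.0 or newer.
  def kafkaConnector(brokerMajor: Int, brokerMinor: Int): Option[String] =
    if (brokerMajor >= 1 || brokerMinor >= 11) Some("flink-connector-kafka" + scalaSuffix)
    else None // older brokers need a version-specific connector instead

  // Elasticsearch sink per ES major version; other versions are not covered here.
  def elasticsearchConnector(esMajor: Int): Option[String] = esMajor match {
    case 5 => Some("flink-connector-elasticsearch5" + scalaSuffix)
    case 6 => Some("flink-connector-elasticsearch6" + scalaSuffix)
    case _ => None
  }
}
```

For Flink 1.8.1 with a Kafka 2.x broker and Elasticsearch 6, both helpers return an artifact, which matches the setup recommended above.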

Is the Couchbase plugin for Elasticsearch deprecated?

I was reading https://www.elastic.co/blog/deprecating-rivers, which states that ES rivers (a kind of plugin) are being deprecated, i.e. any plugin directly integrated with the Elasticsearch server will no longer work from ES 3.x onwards.
The Couchbase plugin is one of that kind.
I searched all the Couchbase plugin documentation at http://developer.couchbase.com/documentation/server/4.5/connectors/elasticsearch-2.1/elastic-intro.html but could not find out whether it uses the deprecated mechanism.
Does anyone know? Should we keep using the Couchbase plugin, or should we start planning to write data to ES directly from our application?
Our Couchbase data is currently replicated to ES using the Couchbase plugin and XDCR.
I'm the maintainer of the Couchbase ES transport plugin. As Roi mentions in his answer, the plugin doesn't use rivers, so it won't be deprecated. It currently supports any version of ES from 1.3 to 2.x, and I'm working on adding support for 5.x. It's taking a bit longer, because ES 5.x broke some configuration sharing features in unexpected ways.
I'd suggest always checking our GitHub repo for the latest plugin releases:
https://github.com/couchbaselabs/elasticsearch-transport-couchbase
The Couchbase plugin does not use rivers; there is a separate river plugin, which is no longer valid.
Take a look here: https://github.com/couchbaselabs/elasticsearch-transport-couchbase

How to copy data from Cassandra to Elasticsearch?

How can I copy data from Cassandra to Elasticsearch? Should I use Spark, is there some convenient plugin/other tool to do it?
Cassandra version is 2.1.5 (DSC)
Spark version is 1.2.1
Elasticsearch version is 2.2.0
EDIT:
I'm trying to achieve that with Spark:
import org.elasticsearch.spark._
import org.elasticsearch.spark.rdd.EsSpark
val json_rdd = ...
EsSpark.saveToEs(json_rdd, "index_name")
I'm trying to follow the documentation at https://www.elastic.co/guide/en/elasticsearch/hadoop/2.2/spark.html, but I can't find where the connection to Elasticsearch is made.
1) Install Spark in stand-alone mode, co-locate Spark workers on Cassandra nodes
2) Use the Spark-Cassandra connector to fetch data locally out of Cassandra
3) Use the Spark-ES connector to push data to ES
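On the "where is the connection" part of the question: elasticsearch-hadoop reads its endpoint from the `es.nodes` and `es.port` settings (defaulting to `localhost:9200`), and the Spark-Cassandra connector reads `spark.cassandra.connection.host`; both are normally set on the Spark configuration rather than in the `saveToEs` call. The sketch below only assembles those settings as a plain map so it runs without Spark; `PipelineSettings` and `connectionSettings` are hypothetical names, and the host names in any example are placeholders.

```scala
// Hypothetical helper gathering the connection settings for a
// Cassandra -> Spark -> Elasticsearch job.
object PipelineSettings {

  def connectionSettings(cassandraHost: String,
                         esNodes: Seq[String],
                         esPort: Int = 9200): Map[String, String] = {
    require(esNodes.nonEmpty, "at least one Elasticsearch node is required")
    Map(
      // Read by the Spark-Cassandra connector to locate the cluster.
      "spark.cassandra.connection.host" -> cassandraHost,
      // Read by elasticsearch-hadoop; when unset it falls back to localhost,
      // which is why the documentation examples show no explicit connection.
      "es.nodes" -> esNodes.mkString(","),
      "es.port" -> esPort.toString,
      // Let the connector create the target index if it does not exist yet.
      "es.index.auto.create" -> "true"
    )
  }
}
```

In the actual job these could be applied with `new SparkConf().setAll(settings)`. The data would then be read with the Spark-Cassandra connector (`sc.cassandraTable("keyspace", "table")`) and written with `EsSpark.saveJsonToEs(rdd, "index/type")` for an RDD of JSON strings, or `EsSpark.saveToEs` for maps and case classes; the 2.x connector expects the resource in `index/type` form.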

How to find the recommended Elasticsearch version for a Logstash release

I am a newbie to Logstash. While studying a Logstash sample, I noticed that it said:
Each release of Logstash has a recommended version of Elasticsearch you should use.
But I couldn't find that recommendation, and I didn't see it in the Logstash release notes either. For example, I am using Logstash 1.5.0; how do I know which Elasticsearch version I should use? The sample above says I could use version 1.5.1.
All Logstash versions from 1.4.2 onward can use any Elasticsearch version from 1.x onward.
It is recommended to use an Elasticsearch version above 1.1.1, because a small vulnerability in earlier versions has since been patched.
Using 1.5.0 and 1.5.1 will be perfectly fine.
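The rule of thumb in this answer can be written as a small Scala check. This is a sketch that encodes only the two statements above (Logstash 1.4.2 or later; Elasticsearch above 1.1.1); `LogstashEsCompat` and `isRecommended` are hypothetical names, and the parser assumes plain numeric versions such as `1.5.0`.

```scala
// Hypothetical helper encoding the answer's rule of thumb.
object LogstashEsCompat {

  // Parses "1.5.0" into List(1, 5, 0); suffixes like "-beta1" are not handled.
  private def parse(v: String): List[Int] = v.split('.').toList.map(_.toInt)

  // Compares versions component by component, padding the shorter with zeros.
  private def compare(a: String, b: String): Int = {
    val (x, y) = (parse(a), parse(b))
    val n = math.max(x.length, y.length)
    x.padTo(n, 0).zip(y.padTo(n, 0))
      .map { case (p, q) => Integer.compare(p, q) }
      .find(_ != 0)
      .getOrElse(0)
  }

  // Logstash 1.4.2 or later, paired with Elasticsearch strictly above 1.1.1.
  def isRecommended(logstash: String, elasticsearch: String): Boolean =
    compare(logstash, "1.4.2") >= 0 && compare(elasticsearch, "1.1.1") > 0
}
```

With this check, the Logstash 1.5.0 / Elasticsearch 1.5.1 pair from the question is accepted, while Elasticsearch 1.1.1 itself is not.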
You can find notes on the recommended Elasticsearch version in the Logstash web documentation > Output plugins > Elasticsearch. Here is the link for the current Logstash version:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
For instance, if you want to know about the recommended Elasticsearch version for Logstash 1.4.2, you can visit the old documentation, here:
http://logstash.net/docs/1.4.2/outputs/elasticsearch
Try changing the version from the URL to get specific details for this version. For instance, for Logstash 1.3.2 you can use:
http://logstash.net/docs/1.3.2/outputs/elasticsearch
In each of these documentation references you'll find a section "VERSION NOTE" regarding the recommended ElasticSearch version.
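The URL-editing trick can be captured as a tiny helper. It assumes the `http://logstash.net/docs/<version>/outputs/elasticsearch` pattern from the 1.4.2 and 1.3.2 examples holds for the older release you are interested in (current releases are documented at elastic.co instead); `LogstashDocs.esOutputDocUrl` is a hypothetical name.

```scala
// Hypothetical helper: builds the old logstash.net documentation URL for the
// Elasticsearch output plugin, following the pattern of the examples above.
object LogstashDocs {
  def esOutputDocUrl(logstashVersion: String): String = {
    // Only accept plain "major.minor.patch" versions such as 1.4.2.
    require(logstashVersion.matches("""\d+\.\d+\.\d+"""), s"unexpected version: $logstashVersion")
    s"http://logstash.net/docs/$logstashVersion/outputs/elasticsearch"
  }
}
```

Open the returned page and look for its "VERSION NOTE" section.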

Cassandra with Hive

I am new to Cassandra and Hive. I want to integrate Cassandra with Hadoop/Hive. How can I do that?
You're in luck: DataStax just released Brisk, a Cassandra distribution integrating Hadoop and Hive.
http://www.datastax.com/products/brisk
You can look into WSO2 BAM2 to get an idea of how Hive and Cassandra can be integrated.
https://svn.wso2.org/repos/wso2/carbon/platform/branches/4.0.0/components/bam2/
You need a Cassandra storage library for Hive, written in Java.
Here is one: https://github.com/dvasilen/Hive-Cassandra
or mine: https://github.com/2013Commons/hive-cassandra
