How to copy data from Cassandra to Elasticsearch? - elasticsearch

How can I copy data from Cassandra to Elasticsearch? Should I use Spark, or is there a convenient plugin or other tool for this?
Cassandra version is 2.1.5 dsc
Spark version is 1.2.1
Elasticsearch version is 2.2.0
EDIT:
I'm trying to achieve that with Spark:
import org.elasticsearch.spark._
import org.elasticsearch.spark.rdd.EsSpark
json_rdd = ...
EsSpark.saveToEs(json_rdd, "index_name")
I'm trying to follow the https://www.elastic.co/guide/en/elasticsearch/hadoop/2.2/spark.html documentation, but I can't find where the connection to Elasticsearch happens.

1) Install Spark in stand-alone mode, co-locate Spark workers on Cassandra nodes
2) Use the Spark-Cassandra connector to fetch data locally out of Cassandra
3) Use the Spark-ES connector to push data to ES
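The three steps above can be sketched roughly as follows. This is a minimal, untested sketch: it assumes the spark-cassandra-connector and elasticsearch-hadoop jars are on the classpath and that the master is supplied through spark-submit. The host names, keyspace, table, column names and index/type are hypothetical placeholders. It also shows where the Elasticsearch connection is configured: elasticsearch-hadoop reads es.nodes and es.port from the Spark configuration (falling back to localhost:9200) rather than through an explicit connect call.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // adds sc.cassandraTable(...)
import org.elasticsearch.spark._        // adds rdd.saveToEs(...)

object CassandraToEs {
  // Both connectors pick up their connection settings from the Spark configuration.
  // The host names passed in are hypothetical placeholders.
  def connectionSettings(cassandraHost: String, esHost: String): Map[String, String] =
    Map(
      "spark.cassandra.connection.host" -> cassandraHost,
      "es.nodes" -> esHost,
      "es.port" -> "9200"
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cassandra-to-es")
    connectionSettings("cassandra-host", "es-host").foreach { case (k, v) => conf.set(k, v) }
    val sc = new SparkContext(conf)

    // Step 2: read rows from Cassandra (keyspace, table and columns are assumptions).
    val docs = sc.cassandraTable("my_keyspace", "my_table")
      .map(row => Map("id" -> row.getString("id"), "name" -> row.getString("name")))

    // Step 3: each Map becomes one Elasticsearch document; the resource is index/type.
    docs.saveToEs("index_name/type_name")
    sc.stop()
  }
}
```

With workers co-located on the Cassandra nodes, as in step 1, the connector can read each partition from the local node, so only the write to Elasticsearch crosses the network.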

Related

Versions for integration of apache flink, elasticsearch and kafka

I have problems with the different versions of Flink, Kafka and Elasticsearch. I'm using Flink 1.8.1, but I don't know which version of Kafka to use. I also want to use version 6 of Elasticsearch. Which versions do you think are suitable for Flink, Kafka and Elasticsearch?
The page I found covers a version of Kafka, but in its comments section it is described as a beta.
As listed in the table, Kafka 0.11 (and higher) will work fine. The beta is a version of the Flink connector, not of Kafka itself.
Plus, Kafka Connect for Elasticsearch, should you choose to use it, works with Elasticsearch 6.
As @cricket_007 said, it's safe to use the Kafka connector, even though it is labeled beta (a label that should be removed, as this connector has now been battle-tested in production for over a year).
The setup Kafka -> Flink -> ES6 is quite common, so you can and should use recent versions of all involved components.
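As an illustration of the Kafka -> Flink -> ES6 setup, here is a minimal, untested sketch for Flink 1.8. It assumes the universal Kafka connector (flink-connector-kafka) and the Elasticsearch 6 connector (flink-connector-elasticsearch6) are on the classpath. The broker address, topic, group id, Elasticsearch host, index name and the "data" field are hypothetical placeholders.

```scala
import java.util.Properties

import org.apache.flink.api.common.functions.RuntimeContext
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.elasticsearch.{ElasticsearchSinkFunction, RequestIndexer}
import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.http.HttpHost
import org.elasticsearch.client.Requests

object KafkaToEs6 {
  // Wraps a raw Kafka record in a one-field document ("data" is a hypothetical field name).
  def toDocument(element: String): java.util.Map[String, String] = {
    val doc = new java.util.HashMap[String, String]()
    doc.put("data", element)
    doc
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka source: broker, group id and topic are assumptions.
    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka-host:9092")
    props.setProperty("group.id", "flink-to-es")
    val source = new FlinkKafkaConsumer[String]("my-topic", new SimpleStringSchema(), props)
    val stream: DataStream[String] = env.addSource(source)

    // Elasticsearch 6 sink: one index request per incoming record.
    val hosts = new java.util.ArrayList[HttpHost]()
    hosts.add(new HttpHost("es-host", 9200, "http"))
    val sinkBuilder = new ElasticsearchSink.Builder[String](
      hosts,
      new ElasticsearchSinkFunction[String] {
        override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer): Unit = {
          val request = Requests.indexRequest()
            .index("my-index")
            .`type`("_doc")
            .source(toDocument(element))
          indexer.add(request)
        }
      })

    stream.addSink(sinkBuilder.build())
    env.execute("kafka-to-es6")
  }
}
```

The connector artifacts should match the Flink version in use (1.8.1 in the question), which is usually the main thing to get right in such a setup.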

Apache Hive on Apache Spark

Has anyone worked with this configuration: Apache Hive on Apache Spark?
What is the latest version compatibility for this configuration?
I want to implement this in my production systems. Kindly help with the compatibility matrix for Apache Hadoop, Apache Hive, Apache Spark and Apache Zeppelin.
You have to use hive2 (0.11+) and Spark 2.2.0, and in hive-site.xml you have to set Spark as the execution engine (property hive.execution.engine set to spark) so that your queries run on top of Spark.
Hive 2 also offers other options such as Tez and LLAP. For more information, see the document Hive on Spark: Getting Started.
Follow the Apache Hive installation tutorial, and then just copy hive-site.xml to $APACHE_HOME/conf.
Hive is moving to rely only on the Tez execution engine. Please build all new workloads on MapReduce or Tez.

Rhadoop with Elasticsearch-hadoop

I am using Hadoop with a database from Elasticsearch (no HDFS).
Do you know if RHadoop and elasticsearch-hadoop can work together?
If not, do you know how I could run analytics for my project?
Yes, there is a connector for Elasticsearch and Hadoop that is built and released by Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
They just released the GA version 2.0 - here's the blog post about it:
http://www.elasticsearch.org/blog/es-hadoop-2-0-g/

Cassandra and Hadoop

I am new to Cassandra and Hadoop. I am trying to read Cassandra data on an hourly basis and dump it into HDFS. Cassandra and Hadoop are on different clusters. Any pointers on clients or APIs I could use to do this are much appreciated.
I recommend Java because Hadoop and Cassandra are both Java based. Astyanax is a good Java Cassandra API.
I've used org.apache.hadoop to write to HDFS using Java but there might be something better out there.
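For the HDFS side, writing through org.apache.hadoop might look roughly like the sketch below. It is written in Scala for illustration, but the same FileSystem calls are available from Java. Reading from Cassandra is left out: the lines iterator stands in for rows already fetched with whatever client you choose, and the namenode URI, base directory and file naming are hypothetical.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.net.URI
import java.nio.charset.StandardCharsets
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HourlyHdfsDump {
  private val hourFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")

  // Builds one target file per hour under baseDir (the layout is an assumption).
  def hourlyPath(baseDir: String, hour: ZonedDateTime): String =
    s"$baseDir/${hourFormat.format(hour)}/part-0.txt"

  // Writes the given lines to a file on the remote HDFS cluster,
  // overwriting the file if it already exists.
  def writeLines(hdfsUri: String, target: String, lines: Iterator[String]): Unit = {
    val fs = FileSystem.get(new URI(hdfsUri), new Configuration())
    val out = new BufferedWriter(
      new OutputStreamWriter(fs.create(new Path(target), true), StandardCharsets.UTF_8))
    try lines.foreach { line => out.write(line); out.newLine() }
    finally out.close()
  }
}
```

An hourly job (for example, one started by cron) could call hourlyPath with the current hour and pass the result to writeLines together with a URI such as hdfs://namenode-host:8020, which is again a placeholder.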

Cassandra with Hive

I am new to Cassandra and Hive. I want to integrate Cassandra with Hadoop and Hive. How can I integrate Cassandra with Hive?
You're in luck: DataStax just released Brisk, a Cassandra distribution integrating Hadoop and Hive.
http://www.datastax.com/products/brisk
You can look into WSO2 BAM2 to get an idea about Hive-Cassandra integration.
https://svn.wso2.org/repos/wso2/carbon/platform/branches/4.0.0/components/bam2/
You need a Cassandra Java storage library.
Here is one: https://github.com/dvasilen/Hive-Cassandra
or my own: https://github.com/2013Commons/hive-cassandra
