I have installed Cassandra 3.11.3 on my Ubuntu virtual machine. I have also installed the ELK stack (Elasticsearch, Logstash, Kibana).
How can I visualize the Cassandra data in Kibana using the ELK stack? Please let me know the detailed configuration I will need in order to get data from the Cassandra database into a Kibana dashboard.
I did something similar using Kafka, with the following structure:
Cassandra -> Confluent Kafka -> Elastic search.
It's pretty easy to do, as the connectors are provided by Confluent.
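For the Kafka -> Elasticsearch half, Confluent's Elasticsearch sink connector is the usual choice; a minimal sketch of its config might look like the following (the topic name and ES address are placeholders, and the Cassandra -> Kafka half needs its own source connector, whose class depends on which Cassandra connector you pick):

```
# elasticsearch-sink.properties -- minimal sketch, values are placeholders
name=cassandra-to-es-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
# Kafka topic(s) the Cassandra source connector writes to (assumed name)
topics=cassandra-events
# Elasticsearch endpoint
connection.url=http://localhost:9200
type.name=_doc
# Let Elasticsearch assign document IDs when the Kafka records have no keys
key.ignore=true
```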
But if you only need to visualize the data, you can try Banana, which gels well with Cassandra.
Note: Banana is a fork of Kibana.
I have a K8s cluster up and running, with Elasticsearch and Kibana deployed on it.
I need to populate Elasticsearch with roughly 25 to 50 GB of random data for testing. Is there an easy way to achieve this? I'm a newbie to ES and K8s, so any inputs or pointers would be a great help.
You can use Logstash to ingest data into Elasticsearch. Logstash supports a wide range of input plugins, from elasticsearch and log4j to S3. You can ingest data from any of the sources that Logstash supports as an input plugin:
https://www.elastic.co/guide/en/logstash/current/input-plugins.html
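For example, a rough sketch that uses the generator input plugin to pump synthetic documents into Elasticsearch (the event count, message text, index name, and ES service address are placeholders you would adjust for your cluster):

```
# generate-test-data.conf -- minimal sketch, values are placeholders
input {
  generator {
    # Number of synthetic events to emit; raise this until you hit the volume you need
    count => 10000000
    message => "test event for load testing"
  }
}
output {
  elasticsearch {
    # In-cluster Elasticsearch service; name and port depend on your K8s deployment
    hosts => ["http://elasticsearch:9200"]
    index => "test-data-%{+YYYY.MM.dd}"
  }
}
```

You can also enrich each event with extra fields via filters if you need more realistic documents.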
I have a Hadoop/Spark cluster set up via Ambari (HDP 2.6.2.0). Now that I have my cluster running, I want to feed some data into it. We have an Elasticsearch cluster on premise (version 5.6). I want to set up the ES-Hadoop Connector (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/doc-sections.html) that Elastic provides, so I can dump some data from Elastic to HDFS.
I grabbed the ZIP file with the JARS and followed the directions on a blog post at CERN:
https://db-blog.web.cern.ch/blog/prasanth-kothuri/2016-05-integrating-hadoop-and-elasticsearch-%E2%80%93-part-2-%E2%80%93-writing-and-querying
So far, this seems reasonable, but I have some questions:
We have SSL/TLS setup on our Elasticsearch cluster, so when I perform a query, I obviously get an error using the example on the blog. What do I need to do on my Hadoop/Spark side and on the Elastic side to make this communication work?
I read that I need to add those JARs to the Spark classpath - is there a rule of thumb as to where I should put them on my cluster? I assume one of my Spark client nodes, but I am not sure. Also, once I put them there, is there a way to add them to the classpath so that all of my nodes / client nodes have the same classpath? Maybe something in Ambari provides that?
Basically what I am looking for is to be able to perform a query to ES from Spark that triggers a job that tells ES to push "X" amount of data to my HDFS. Based on what I can read on the Elastic site, this is how I think it should work, but I am really confused by the documentation. It's lacking and has confused both me and my Elastic team. Can someone provide some clear directions or some clarity around what I need to do to set this up?
For the project setup part of the question you can take a look at
https://github.com/zouzias/elasticsearch-spark-example
which is a project template integrating Elasticsearch with Spark.
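On the SSL and classpath questions: es-hadoop exposes es.net.ssl.* settings that you can pass as Spark configuration (prefixed with spark.), and the connector JAR can be handed to Spark with --jars so it reaches the executors. A rough sketch, where the JAR version, hostnames, truststore path, and credentials are placeholders for your environment:

```
# Minimal sketch -- JAR version, hosts, truststore path, and credentials are placeholders
spark-shell \
  --jars /opt/es-hadoop/elasticsearch-spark-20_2.11-5.6.3.jar \
  --conf spark.es.nodes=es-node1.example.com,es-node2.example.com \
  --conf spark.es.port=9200 \
  --conf spark.es.net.ssl=true \
  --conf spark.es.net.ssl.truststore.location=file:///etc/pki/es-truststore.jks \
  --conf spark.es.net.ssl.truststore.pass=changeit \
  --conf spark.es.net.http.auth.user=hdfs_reader \
  --conf spark.es.net.http.auth.pass=secret
```

From there, reading an index through the org.elasticsearch.spark.sql data source gives you a DataFrame you can write out to HDFS.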
Can Kafka be used as a messaging service between Oracle and Elasticsearch? Are there any downsides to this approach?
Kafka Connect provides you with a JDBC source connector and an Elasticsearch sink connector.
No downsides that I am aware of, other than service maintenance.
Feel free to use Logstash instead, but Kafka provides better resiliency and scalability.
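A rough sketch of the JDBC source side, pointed at Oracle (connection details, table, column, and topic prefix are placeholders; you would pair it with a matching Elasticsearch sink connector):

```
# oracle-jdbc-source.properties -- minimal sketch, values are placeholders
name=oracle-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1
connection.user=kafka_connect
connection.password=secret
# Poll for new rows using an auto-incrementing primary key column
mode=incrementing
incrementing.column.name=ID
table.whitelist=ORDERS
# Each table is published to a topic named <prefix><table>, e.g. oracle-ORDERS
topic.prefix=oracle-
```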
I have tried this in the past with SQL Server instead of Oracle, and it works great. I am sure you could take the same approach with Oracle, since the Logstash JDBC plugin I am going to describe below supports Oracle DB.
So basically you would need a Logstash JDBC input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html) that points to your Oracle DB instance and pushes the rows over to Kafka using the Kafka output plugin (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-kafka.html).
To read the contents back from Kafka, you would need another Logstash instance (this is the indexer) with the Kafka input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html). Finally, use the Elasticsearch output plugin in the Logstash indexer configuration file to push the events to Elasticsearch.
So the pipeline would look like this:
Oracle -> Logstash Shipper -> Kafka -> Logstash Indexer -> Elasticsearch.
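A rough sketch of the two Logstash configs (JDBC URL, credentials, query, broker addresses, topic, and index name are all placeholders):

```
# shipper.conf -- Oracle -> Kafka (minimal sketch, values are placeholders)
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    jdbc_driver_library => "/opt/drivers/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # Only fetch rows added/updated since the last run
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    schedule => "* * * * *"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topic_id => "oracle-events"
    codec => json
  }
}

# indexer.conf -- Kafka -> Elasticsearch (minimal sketch)
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["oracle-events"]
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["http://es1:9200", "http://es2:9200"]
    index => "oracle-%{+YYYY.MM.dd}"
  }
}
```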
Overall, I think this is a pretty scalable way to push events from your DB to Elasticsearch. As for downsides: at times it can feel like there is one component too many in the pipeline, which can be frustrating, especially when you have failures. You need to put appropriate controls and monitoring in place at every level to make sure the data aggregation pipeline described above keeps functioning. Give it a try, and good luck!
I have a Couchbase cluster set up as the primary source for data. From this, a subset of the data is synced to an Elasticsearch cluster via the Couchbase Transport Plugin for Elasticsearch (https://github.com/couchbaselabs/elasticsearch-transport-couchbase), which sets up an XDCR stream from Couchbase to Elasticsearch.
Due to some issues with the Elasticsearch cluster, all data needs to be synced again from Couchbase to Elasticsearch. I have tried recreating the XDCR stream, but that does not seem to help, as it only copies a very small subset of documents. Is there a way to achieve this?
Additional details
Couchbase version: 3.1.0
Number of Couchbase documents: 50K+
Documents synced to Elasticsearch: around 700 (expected 20K+)
If a document in Couchbase is modified, it is successfully synced to Elasticsearch
The issue you're experiencing is likely in one of the following: XDCR, the Couchbase Transport Plugin for Elasticsearch, or Elasticsearch itself.
Start by checking for XDCR errors. You can find your XDCR logs using these instructions. Be aware that the Transport Plugin uses XDCR v1 and almost everything else in Couchbase uses v2.
Consult the advice in troubleshooting the Couchbase Transport Plugin for Elasticsearch. The instructions should work for you even though they are from the 4.0 docs.
Pay attention to how your documents are being mapped to Elasticsearch. You mention that you're expecting only a subset of documents to be synced to Elasticsearch, so it's possible that you have lost a setting or misconfigured something. You can enable logging and observe a small set of test data. At TRACE level, you should be able to see each document that is inspected.
If all of that fails, make sure the basics are working by indexing the beer sample dataset, following the directions in the Couchbase docs. ES is probably not the issue, but testing with a fresh ES instance will rule out problems on that side.
I am trying to build a log pipeline using RabbitMQ + ELK on Windows Servers.
RabbitMQ --> Logstash --> ElasticSearch --> Kibana.
Ideally I want to have 2 instances of RabbitMQ, 2 of Logstash, 3 of Elasticsearch, and 1 of Kibana.
Has anyone set up something like this? I know we can set up an Elasticsearch cluster easily by setting the cluster name in the yml. What is the mechanism for Logstash to write to the ES cluster?
Should I set up RabbitMQ + Logstash combos on each instance, so that if the MQs are behind a load balancer, each MQ has its own Logstash instance and from there the data goes to the cluster?
Technically you could write directly from Logstash to ES using the elasticsearch output plugin, or the elasticsearch_http output plugin (if you are using an ES version not compatible with Logstash). That said, for an enterprise scenario where you need fault tolerance and have to handle volume, it's a good idea to have RabbitMQ/Redis in between.
Your config above looks good, although the input to your Rabbit cluster would come from one or more Logstash shippers (instances running on the client machines where the logs live) that point to an HA RabbitMQ cluster. Then a Logstash indexer has its input configured to read from the RabbitMQ queue(s) and its output pointed at the Elasticsearch cluster.
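To answer the "how does Logstash write to the ES cluster" part: the elasticsearch output simply takes a list of node addresses and spreads requests across them. A rough sketch of an indexer config, using current-Logstash syntax (queue name and hostnames are placeholders):

```
# indexer.conf -- RabbitMQ -> Elasticsearch (minimal sketch, values are placeholders)
input {
  rabbitmq {
    host => "rabbitmq-lb.example.com"   # load balancer / cluster entry point
    queue => "logstash"
    durable => true
  }
}
output {
  elasticsearch {
    # List all three ES nodes so the indexer can spread the writes
    hosts => ["http://es1:9200", "http://es2:9200", "http://es3:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```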
Hope that helps.
It's not recommended to push data directly from Logstash to ES.
ES writes are slow, so under heavy load you can lose data.
The idea is to add a proxy between Logstash and ES:
Logstash --> Proxy --> Elasticsearch
Logstash supports Redis and RabbitMQ as such a proxy.
The proxy can absorb large inputs and act as a queuing mechanism.
The Logstash docs put Redis forward as the primary choice (because of its simplicity of setup and monitoring).
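A rough sketch of what the Redis hand-off looks like on each side (host, key, and ES addresses are placeholders):

```
# shipper.conf -- Logstash -> Redis (minimal sketch, values are placeholders)
output {
  redis {
    host => "redis.example.com"
    data_type => "list"
    key => "logstash"
  }
}

# indexer.conf -- Redis -> Elasticsearch
input {
  redis {
    host => "redis.example.com"
    data_type => "list"
    key => "logstash"
  }
}
output {
  elasticsearch {
    hosts => ["http://es1:9200", "http://es2:9200", "http://es3:9200"]
  }
}
```

The Redis list acts as the buffer: the shipper pushes events onto it, and the indexer drains it at whatever rate Elasticsearch can absorb.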