I need to sync my Oracle data with Elasticsearch. When I searched on the net, I found three ways it can be done:
Using Logstash JDBC input plugin
Using Kafka Connect JDBC
Using Elasticsearch JDBC input plugin
I am using JDK 1.8 and Elasticsearch 7.2.
I want to use the JDBC input plugin; I searched for an installer and steps to configure it, but was not able to find them.
Can you please guide me on that?
I would have gone for the Logstash JDBC input plugin. It should be pretty straightforward, and the continuous syncing is well documented (see the sketch below).
PS: "Elasticsearch JDBC input plugin" are you sure there is such a thing? Generally you don't want to do any blocking (especially IO) calls in Elasticsearch — that's why Rivers were removed and we're careful not to add any such problems again.
I am using Elasticsearch 7.1. It comes with log4j2.11.1.jar. The problem comes when I try to set up a remote data store with log4j2 running as a TcpSocketServer. I would then use the log4j logging API in different Java applications to transmit logs over to the remote data store to analyse. However, from the log4j2 Java documentation, I found out that the TcpSocketServer has been taken out.
How did you manage to configure a remote data store with the latest log4j2 library? Is there any working architecture layout which still fits my use case?
Elasticsearch is not a great log shipper; also, what happens if the network is down? We're generally going down the route that Beats should take over that part, so Filebeat with the Elasticsearch module here: https://www.elastic.co/guide/en/beats/filebeat/7.1/filebeat-module-elasticsearch.html
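As a rough illustration of the Filebeat approach (not tied to any particular setup), a minimal filebeat.yml that tails a Java application's log files and ships them to a remote Elasticsearch could look like this; the log path and host are assumptions for the example:

```
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      # Hypothetical location of the application's log4j output files.
      - /var/log/myapp/*.log

output.elasticsearch:
  # Placeholder address of the remote Elasticsearch data store.
  hosts: ["https://remote-es-host:9200"]
```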
Can Kafka be used as a messaging service between Oracle and Elasticsearch? Are there any downsides to this approach?
Kafka Connect provides a JDBC source connector and an Elasticsearch sink connector (a rough sketch of both configurations follows below).
No downsides that I am aware of, other than service maintenance.
Feel free to use Logstash instead, but Kafka provides better resiliency and scalability.
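As a sketch only, the two connectors could be registered with the Kafka Connect REST API using configurations along these lines; the connector classes are the Confluent JDBC source and Elasticsearch sink, and the table, column, topic, and host names are placeholders:

```
{
  "name": "oracle-products-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",
    "connection.user": "app_user",
    "connection.password": "app_password",
    "table.whitelist": "PRODUCTS",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "LAST_MODIFIED",
    "incrementing.column.name": "ID",
    "topic.prefix": "oracle-"
  }
}
```

```
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://localhost:9200",
    "topics": "oracle-PRODUCTS",
    "key.ignore": "true"
  }
}
```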
I have tried this in the past with SQL Server instead of Oracle and it works great. You could take the same approach with Oracle as well, since the Logstash JDBC plugin that I am going to describe below has support for Oracle DB.
So basically you would need a Logstash JDBC input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html that points to your Oracle DB instance and pushes the rows over to Kafka using the Kafka Output plugin https://www.elastic.co/guide/en/logstash/current/plugins-outputs-kafka.html.
Now, to read the contents from Kafka you would need another Logstash instance (this is the indexer) and the Kafka input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html. Finally, use the Elasticsearch output plugin in the Logstash indexer configuration file to push the events to Elasticsearch.
So the pipeline would look like this:
Oracle -> Logstash Shipper -> Kafka -> Logstash Indexer -> Elasticsearch.
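A rough sketch of the two Logstash configuration files in that pipeline; the broker address, topic, query, and index name are placeholders, and the JDBC settings follow the same pattern as the plugin documentation linked above:

```
# shipper.conf: Oracle -> Kafka
input {
  jdbc {
    jdbc_driver_library => "/opt/jdbc/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"
    jdbc_user => "app_user"
    jdbc_password => "app_password"
    # Poll every minute; :sql_last_value is the time of the previous run.
    schedule => "* * * * *"
    statement => "SELECT * FROM products WHERE last_modified > :sql_last_value"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topic_id => "oracle-products"
    codec => json
  }
}
```

```
# indexer.conf: Kafka -> Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["oracle-products"]
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "products"
    document_id => "%{id}"
  }
}
```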
So overall I think this is a pretty scalable way to push events from your DB to Elasticsearch. As for downsides, at times it can feel like there is one component too many in the pipeline, which can be frustrating, especially when you have failures. So you need to put appropriate controls and monitoring in place at every level to make sure the data aggregation pipeline described above keeps functioning. Give it a try and good luck!
I saw that Logstash is used to sync data between a SQL Server database and Elasticsearch 5.
In this example, it is shown that Logstash can use the JDBC plugin to import data from a database.
But when I look at the available plugins, I notice one named Beats, which also looks like it can be used for importing data.
I probably misunderstood, so can anybody explain to me what the Beats plugin is used for and how it is used by Logstash, please?
Logstash currently has 52 ways of getting input. As you've seen, jdbc and beats are two. Each of the inputs serves a different use case. As described in the doc, jdbc is used to "ingest data in any database with a JDBC interface" while beats is used to "receive events from the Elastic Beats framework".
Depending on your needs, you would choose the appropriate input plugin.
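To make the difference concrete, here is a minimal sketch of the two input styles: with jdbc, Logstash itself pulls rows from the database on a schedule; with beats, Logstash just listens on a port and a Beats agent such as Filebeat pushes events to it. The SQL Server connection details and the port are placeholders:

```
input {
  # Pull model: Logstash runs the query against the database on a schedule.
  jdbc {
    jdbc_driver_library => "/opt/jdbc/mssql-jdbc.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    jdbc_user => "app_user"
    jdbc_password => "app_password"
    schedule => "* * * * *"
    statement => "SELECT * FROM dbo.products"
  }

  # Push model: Logstash listens and Beats agents send events to this port.
  beats {
    port => 5044
  }
}
```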
I wasn't able to find out how to crawl a website and index the data into Elasticsearch. I managed to do that with the Nutch + Solr combination, and since Nutch should be able to export data directly to Elasticsearch from version 1.8 onwards (source), I tried to use Nutch again. Nevertheless, I didn't succeed. After trying to invoke
$ bin/nutch elasticindex
I get:
Error: Could not find or load main class elasticindex
I don't insist on using Nutch. I just need the simplest way to crawl websites and index them into Elasticsearch. The problem is that I wasn't able to find any step-by-step tutorial, and I'm quite new to these technologies.
So the question is: what would be the simplest solution to integrate a crawler with Elasticsearch? If possible, I would be grateful for a step-by-step solution.
Did you have a look at the River Web plugin? https://github.com/codelibs/elasticsearch-river-web
It provides a good How To section, including creating the required indexes, scheduling (based on Quartz), authentication (basic and NTLM are supported), metadata extraction, ...
Might be worth having a look at the elasticsearch river plugins overview as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#river
Since the River plugins have been deprecated, it may be worth having a look at ManifoldCF or Norconex Collectors.
You can evaluate indexing Common Crawl metadata into Elasticsearch using Hadoop:
When working with big volumes of data, Hadoop provides all the power to parallelize the data ingestion.
Here is an example that uses Cascading to index directly into Elasticsearch:
http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch
The process involves the use of a Hadoop cluster (EMR in this example) running the Cascading application that indexes the JSON metadata directly into Elasticsearch.
Cascading source code is also available to understand how to handle the data ingestion in Elasticsearch.
I'm trying to install Grafana to work with an OpenTSDB data source. I'd like to know what I should do to install it without Elasticsearch.
I'm using Grafana with InfluxDB and I'm not using Elasticsearch.
Grafana 2 is out in beta and I've been using it in production for a while. Grafana 2 now has its own data store, which uses either MySQL or SQLite. But you can always use Elasticsearch as well. You can read more about it here.
Update: Stable version of Grafana 2 is now out, and it just works.
Grafana is a frontend; you will need some kind of database to store values and configuration in. I just grabbed the .tar.gz file from Grafana's downloads page, created a config.js, and pointed it at my InfluxDB server. No Elasticsearch here, either.
You might want to take a look at gofana which will allow you to run Grafana without Elasticsearch. It's a self-contained binary that allows you to store dashboards on the filesystem and not in Elasticsearch or InfluxDB. It also supports HTTPS and basic authentication.
Note: I'm the author of gofana.