export data from elasticsearch to neo4j

I'm looking for the best method to export data from Elasticsearch.
Is there something better than running a query with from/size until all the data is exported?
Specifically, I want to copy parts of it to Neo4j. Is there any plugin for that?

I'm not aware of any plugin that does what you want.
But you can write one. I recommend using Jest, because the default Elasticsearch Java client uses a different Lucene version than Neo4j, and those versions are incompatible.
A second option is to export the data from Elasticsearch to CSV and then use LOAD CSV in Neo4j. This approach is good enough if the import is a one-time operation.
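As a rough illustration of the Jest route, here is a minimal sketch that pulls documents over Jest's HTTP client and writes them to a CSV file that Neo4j's LOAD CSV can then consume. The index name "people", the "id" and "name" fields, the output path and the node address are all assumptions; error handling and proper paging (e.g. the scroll API) are omitted.

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

import java.io.PrintWriter;
import java.util.Map;

public class EsToCsvExport {
    public static void main(String[] args) throws Exception {
        // Build a Jest client against a local Elasticsearch node (assumed address).
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("http://localhost:9200").build());
        JestClient client = factory.getObject();

        // match_all query; a real export would page with from/size or the scroll API.
        String query = "{ \"query\": { \"match_all\": {} }, \"size\": 1000 }";
        Search search = new Search.Builder(query)
                .addIndex("people")   // hypothetical index
                .build();
        SearchResult result = client.execute(search);

        // Write one CSV row per hit; "id" and "name" are assumed source fields.
        try (PrintWriter out = new PrintWriter("people.csv")) {
            out.println("id,name");
            for (SearchResult.Hit<Map, Void> hit : result.getHits(Map.class)) {
                Map source = hit.source;
                out.printf("%s,%s%n", source.get("id"), source.get("name"));
            }
        }
        client.shutdownClient();
    }
}

The resulting file could then be loaded with a Cypher statement along the lines of LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row CREATE (:Person {id: row.id, name: row.name}).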


How can I aggregate metrics per day in a Grafana table or metric?

I would like to add a metric to Grafana in a Ruby project.
What are the parameters? What gem can I use?
Is there a manual?
You should first look into datasources for Grafana: http://docs.grafana.org/features/datasources/ Datasources are the programs Grafana can interact with to generate a graph, so you need to install one of them on some device. Grafana itself does not store any data; it "just" sends queries to a datasource and renders the returned data.
There are a lot of possible datasources for Grafana, as you can see. Commonly used ones are Graphite (my favourite) and InfluxDB (easy setup), but a standard SQL database could also be the way to go for you. When researching the possible datasources you can also search for Ruby gems. I found one for InfluxDB, maintained by InfluxData itself: https://github.com/influxdata/influxdb-ruby

crawler + elasticsearch integration

I wasn't able to find out how to crawl a website and index the data into Elasticsearch. I managed to do that with the combination nutch + solr, and since Nutch should be able to export data directly to Elasticsearch from version 1.8 on (source), I tried to use Nutch again. Nevertheless, I didn't succeed. After trying to invoke
$ bin/nutch elasticindex
I get:
Error: Could not find or load main class elasticindex
I don't insist on using Nutch. I just need the simplest way to crawl websites and index them into Elasticsearch. The problem is that I wasn't able to find any step-by-step tutorial, and I'm quite new to these technologies.
So the question is: what would be the simplest way to integrate a crawler with Elasticsearch? If possible, I would be grateful for a step-by-step solution.
Did you have a look at the River Web plugin? https://github.com/codelibs/elasticsearch-river-web
It provides a good How To section, including creating the required indexes, scheduling (based on Quartz), authentication (basic and NTLM are supported), metadata extraction, ...
Might be worth having a look at the elasticsearch river plugins overview as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#river
Since the River plugins have been deprecated, it may be worth having a look at ManifoldCF or Norconex Collectors.
You can also evaluate indexing Common Crawl metadata into Elasticsearch using Hadoop:
When working with large volumes of data, Hadoop provides the means to parallelize the data ingestion.
Here is an example that uses Cascading to index directly into Elasticsearch:
http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch
The process involves a Hadoop cluster (EMR in this example) running a Cascading application that indexes the JSON metadata directly into Elasticsearch.
The Cascading source code is also available, so you can see how the data ingestion into Elasticsearch is handled.
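For orientation, a minimal Cascading flow of this kind might look like the sketch below. It assumes the elasticsearch-hadoop connector is on the classpath; the HDFS path, index/type name, and node address are placeholders, not values from the linked post.

import java.util.Properties;

import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import org.elasticsearch.hadoop.cascading.EsTap;

public class CommonCrawlIndexer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("es.nodes", "localhost:9200"); // assumed Elasticsearch address
        props.setProperty("es.input.json", "true");      // records are already JSON

        // Read JSON metadata lines from HDFS (placeholder path).
        Tap source = new Hfs(new TextLine(), "hdfs:///commoncrawl/metadata");
        // Write each record into a hypothetical "commoncrawl/metadata" index/type.
        Tap sink = new EsTap("commoncrawl/metadata");

        Pipe pipe = new Pipe("index-metadata");
        new HadoopFlowConnector(props).connect(source, sink, pipe).complete();
    }
}

On a cluster this would be packaged as a job jar and submitted to EMR; the Cascading planner takes care of turning the flow into MapReduce jobs.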

Run query on couchbase data imported using sqoop and hadoop connector

I am using Sqoop with the Couchbase Hadoop connector to import some data from Couchbase into HDFS.
As stated in
http://docs.couchbase.com/hadoop-plugin-1.1/#limitations
querying is not supported for Couchbase.
I am looking for a way to run a query when using the Hadoop connector.
For example, I have two documents in the database as follows:
{'doctype':'a'}
and
{'doctype':'b'}
I need to get only the docs where doctype = 'a'.
Is there a way to do this?
If you just want to select data from Couchbase, you don't need the Hadoop connector... you can simply use a Couchbase view that filters on doc.doctype == 'a'.
See the Couchbase views documentation.
On the other hand, I recommend using the new N1QL query functionality from Couchbase. It is quite a flexible query language (similar to SQL); see the online N1QL tutorial.
Note: if you look at the compatibility requirements for N1QL, it needs Couchbase v2.2 or higher (see N1QL Compatibility). You will need to deploy the Couchbase N1QL query server and point it at your existing Couchbase v2.2 cluster; see: Couchbase N1QL queries on server.
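As a rough sketch of both options from Java (using the Couchbase Java SDK 2.x, which is an assumption here, as are the node address, bucket, design document, and view names), querying a filtering view and running the equivalent N1QL statement could look like this:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.N1qlQueryRow;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

public class DoctypeQueries {
    public static void main(String[] args) {
        Cluster cluster = CouchbaseCluster.create("localhost"); // assumed node
        Bucket bucket = cluster.openBucket("default");          // assumed bucket

        // Option 1: a view whose map function only emits docs with doctype == 'a',
        // e.g. function (doc, meta) { if (doc.doctype == 'a') emit(meta.id, null); }
        ViewResult viewResult = bucket.query(ViewQuery.from("docs", "by_doctype_a"));
        for (ViewRow row : viewResult) {
            System.out.println(row.id());
        }

        // Option 2: the equivalent N1QL query (requires an N1QL-enabled cluster).
        N1qlQueryResult n1qlResult = bucket.query(
                N1qlQuery.simple("SELECT META(d).id FROM `default` d WHERE d.doctype = 'a'"));
        for (N1qlQueryRow row : n1qlResult) {
            System.out.println(row.value());
        }

        cluster.disconnect();
    }
}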
Another alternative to Sqoop for the above requirement is 'Couchdoop'.
Couchdoop uses views to fetch data from Couchbase, so you can write a view for exactly what you need and use Couchdoop to hit that view and fetch the data.
https://github.com/Avira/couchdoop
Worked for me.

Elasticsearch CrateData Compatibility?

All,
I've been playing around with CrateData and was wondering whether you can use existing Elasticsearch tools such as drivers and add-ons like Logstash. For example, can you use an Elasticsearch river (http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/) for data ingest and then use the CrateData query engine, etc. against that data? Can incoming JSON objects be mapped to a table? Are there plans to support, or maintain, such coexistence?
Thanks!
You can use existing tools for Elasticsearch with Crate if those tools use the REST API. In order to do so you'll have to enable the ES REST API in the crate.yml file. There is a setting to do so:
es.api.enabled: true
Elasticsearch plugins won't work without minor modifications, as Crate and Elasticsearch aren't binary compatible. Elasticsearch has a shading step in its Maven configuration, so the Elasticsearch jar contains different namespaces than Crate does, since Crate doesn't use shading.
So if you wanted to use a plugin you'd have to adjust the namespaces/imports and compile it against Crate.

How to implement faster search in website using Apache Solr?

I want to use Apache Solr on my website in order to make search faster.
I need Java code to index data from a MySQL database so that I can perform faster searches.
Can anybody please tell me how to implement this?
You can start by looking at the Solr DataImportHandler.
This will enable you to index data from the DB into Solr.
You will still need to configure Solr for good performance, though, and that depends on how much and what kind of data you have.
If you specifically want to use Java code to add data to the index, you should use the SolrJ client. For the specific case of adding data, focus on the Adding Data to Solr section. However, as @Jayendra pointed out, you can use means other than Java, like the DataImportHandler, to load data into Solr. Also, please refer to the Integrating Solr page for a list of additional Solr client/language bindings.
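For illustration, a minimal SolrJ sketch for indexing rows pulled from MySQL could look like the following. The Solr URL, core name, table, and field names are assumptions, and a recent SolrJ client (with HttpSolrClient.Builder) is presumed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MysqlToSolr {
    public static void main(String[] args) throws Exception {
        // Assumed Solr core URL and MySQL connection details.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();
        try (Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/shop", "user", "password");
             Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, description FROM products")) {

            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("name", rs.getString("name"));
                doc.addField("description", rs.getString("description"));
                solr.add(doc);   // buffer the document for indexing
            }
            solr.commit();       // make the documents searchable
        }
        solr.close();
    }
}

The DataImportHandler achieves the same result declaratively (via a data-config.xml), so custom Java code like this is only worth it if you need more control over the import.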
