Elasticsearch CrateData Compatibility?

All,
I've been playing around with CrateData, and was wondering if you can utilize existing Elasticsearch tools such as drivers and add-ons like Logstash. For example, can you use an Elasticsearch river (http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/) for data ingest, then use the CrateData query engine, etc. against that data? Can incoming JSON objects be mapped to a table? Are there plans to support or maintain this kind of coexistence?
Thanks!

You can use existing tools for Elasticsearch with Crate if those tools use the REST API. In order to do so you'll have to enable the ES REST API in the crate.yml file. There is a setting to do so:
es.api.enabled: true
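Once that setting is enabled, existing ES tooling should be able to talk to Crate over HTTP. As a rough, unverified sketch (the port and the exact set of supported endpoints depend on your Crate version; 4200 is Crate's default HTTP port, so adjust host/port for your setup):
# assumption: with es.api.enabled set, ES-style endpoints such as _search are served on Crate's HTTP port
curl -s 'http://localhost:4200/_search?q=*'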
Elasticsearch plugins won't work without minor modifications, as Crate and Elasticsearch aren't binary compatible. Elasticsearch has a shading step in its Maven configuration, so the Elasticsearch JAR contains different namespaces than Crate does, since Crate doesn't use shading.
So if you wanted to use a plugin you'd have to adjust the namespaces/imports and compile it against Crate.

Related

Metricbeat modules

Why are there two modules in Metricbeat for Elasticsearch?
Elasticsearch
Elasticsearch-xpack
Both have the same configuration in the modules.d directory.
The Kibana page for the Elasticsearch module suggests using the elasticsearch module.
But the documentation of the Elasticsearch module suggests the latter one. Reference:
Alternatively, run metricbeat modules disable elasticsearch and metricbeat modules enable elasticsearch-xpack.
It's so confusing. I think that if I need to use ES with X-Pack, then I should use the latter module. But from 6.7.0 onwards, ES ships the basic X-Pack features with the open source one.
Thanks.
The configurations are almost the same: elasticsearch-xpack has the option xpack.enabled: true, which is not present in the elasticsearch module, and in the elasticsearch-xpack module you also do not specify any metricsets.
If you are using the monitoring UI in Kibana, then you should use the elasticsearch-xpack module, which will collect the metrics that Kibana needs.
If you are not using the monitoring UI in Kibana, or are not even using Kibana and just want to collect the metrics, then you need to use the elasticsearch module and specify the metricsets that you want to collect.
The elasticsearch-xpack module is just the elasticsearch module without any metricsets configured and with the option xpack.enabled: true.
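For illustration, the two files in modules.d look roughly like this (hosts, period and default metricsets vary between versions, so treat this as a sketch rather than the exact shipped config):
# modules.d/elasticsearch.yml - you pick the metricsets yourself
- module: elasticsearch
  metricsets:
    - node
    - node_stats
  period: 10s
  hosts: ["http://localhost:9200"]
# modules.d/elasticsearch-xpack.yml - no metricsets, xpack.enabled instead
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]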

how to implement elasticsearch

Can Kibana's console (in Dev Tools) be used for writing and implementing Elasticsearch queries? I am new to Elasticsearch and very confused when it comes to doing hands-on work with it. Thank you in advance.
Kibana Dev Tools makes calling the Elasticsearch APIs easier, so you can develop whatever you want in it, for example building aggregation calls or query strings against the APIs.
In your application, on the other hand, you should use an SDK such as Elasticsearch JS for JavaScript, so the queries and aggregations you developed in Kibana can be reused in your application. You can also monitor your shard health, put mappings for your indexes, and use more functionality described in the documentation. You can find the JS API documentation here.
You can use Kibana Dev Tools to invoke REST API commands to perform cluster-level actions such as taking snapshots, restores, etc., and also to index simple documents. But if you are looking to write data to Elasticsearch on a regular basis, like ingesting server/app logs or server metrics (CPU, memory, disk usage, etc.), you should look at installing Filebeat or Metricbeat.
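For example, here are two minimal requests you could paste into the Dev Tools console, first indexing a document and then running a search with an aggregation (the index and field names are made up for illustration):
PUT my-index/_doc/1
{
  "service": "web-01",
  "cpu": 42.5
}
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "avg_cpu": { "avg": { "field": "cpu" } }
  }
}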

How can I aggregate metrics per day in Grafana, in a table or metric?

I would like to add a metric to Grafana in a Ruby project.
What are the parameters? What gem can I use?
Is there a manual?
You should first look into datasources for Grafana: http://docs.grafana.org/features/datasources/. Datasources are the programs Grafana can interact with to generate a graph, so you need to install one of them somewhere. Grafana itself does not store any data; it "just" sends queries to a datasource and renders the data.
As you can see, there are a lot of possible datasources for Grafana. Commonly used ones are Graphite (my favourite) and InfluxDB (easy setup), but a standard SQL database could also be the way to go for you. When researching the possible datasources you can also search for Ruby gems. I found one for InfluxDB, maintained by InfluxData itself: https://github.com/influxdata/influxdb-ruby
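To make that concrete, here is a rough Ruby sketch using the influxdb-ruby gem linked above; the database, measurement and field names are made up, and in Grafana you would then query that measurement grouped by time(1d) to aggregate per day:
require 'influxdb'
# hypothetical database name - adapt to your project
influxdb = InfluxDB::Client.new 'myapp_metrics', host: 'localhost'
# write one data point wherever your Ruby app records a metric
influxdb.write_point('page_views',
  values: { count: 1 },
  tags:   { controller: 'orders', action: 'create' })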

export data from elasticsearch to neo4j

I'm looking for the best method to export data from Elasticsearch.
Is there something better than running a query with from/size until all the data is exported?
Specifically, I want to copy parts of it to Neo4j, if there is any plugin for that.
I'm not aware of any plugin which can do what you want.
But you can write one. I recommend you use Jest, because the default Elasticsearch Java client uses a different Lucene version than Neo4j, and those versions are incompatible.
The second option is to export data from Elasticsearch to CSV and then use LOAD CSV in Neo4j. This approach is good enough if importing the data is a one-time operation.
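As a rough sketch of that CSV route for a one-off export of a modestly sized index (the index, field and label names are made up):
# flatten the Elasticsearch hits into a CSV file with jq
curl -s 'http://localhost:9200/my-index/_search?size=10000' |
  jq -r '.hits.hits[]._source | [.id, .name] | @csv' > export.csv
# then, in Neo4j, load it with Cypher's LOAD CSV, e.g.:
# LOAD CSV FROM 'file:///export.csv' AS row
# CREATE (:Item {id: row[0], name: row[1]});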

crawler + elasticsearch integration

I wasn't able to find out how to crawl a website and index the data into Elasticsearch. I managed to do that with the combination Nutch + Solr, and since Nutch should be able to export data directly to Elasticsearch from version 1.8 onwards (source), I tried to use Nutch again. Nevertheless I didn't succeed. After trying to invoke
$ bin/nutch elasticindex
I get:
Error: Could not find or load main class elasticindex
I don't insist on using Nutch. I just need the simplest way to crawl websites and index them into Elasticsearch. The problem is that I wasn't able to find any step-by-step tutorial, and I'm quite new to these technologies.
So the question is: what would be the simplest way to integrate a crawler with Elasticsearch? If possible, I would be grateful for any step-by-step solution.
Did you have a look at the River Web plugin? https://github.com/codelibs/elasticsearch-river-web
It provides a good How To section, including creating the required indexes, scheduling (based on Quartz), authentication (basic and NTLM are supported), metadata extraction, ...
Might be worth having a look at the elasticsearch river plugins overview as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#river
Since the River plugins have been deprecated, it may be worth having a look at ManifoldCF or Norconex Collectors.
You can evaluate indexing Common Crawl metadata into Elasticsearch using Hadoop:
When working with big volumes of data, Hadoop provides all the power to parallelize the data ingestion.
Here is an example that uses Cascading to index directly into Elasticsearch:
http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch
The process involves the use of a Hadoop cluster (EMR in this example) running the Cascading application, which indexes the JSON metadata directly into Elasticsearch.
The Cascading source code is also available, so you can see how the data ingestion into Elasticsearch is handled.
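If you just want to see what the indexing side amounts to (not the Cascading job itself), the metadata ultimately goes through Elasticsearch's bulk API; a minimal hand-rolled equivalent, with made-up index and field names:
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @- <<'EOF'
{"index":{"_index":"commoncrawl","_type":"metadata"}}
{"url":"http://example.com/","contentType":"text/html","length":12345}
EOF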
