I'm using JanusGraph with ElasticSearch and Cassandra.
My question is how JanusGraph stores data when I create a new entity, given that I'm using two databases (Cassandra and ElasticSearch).
I understand that ElasticSearch is used as the index backend and Cassandra is the storage backend, but:
What does JanusGraph do when I persist new data? Does it duplicate the same data in both Cassandra and ElasticSearch (since ElasticSearch is also a database)?
If the answer to the first question is yes: when we run a query that traverses the graph, will JanusGraph run it against Cassandra, and switch to ElasticSearch when the query is a full-text search?
If the answer to the first question is no: is all the data stored in Cassandra, with JanusGraph somehow using the ElasticSearch index to search the Cassandra database?
The graph itself is stored in Cassandra; ElasticSearch indexes the data stored in Cassandra.
When a graph traversal can use the search index, JanusGraph uses it to find the matching elements and then retrieves the data from Cassandra. Cheers!
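To make the split between the two backends concrete, here is a minimal toy model in plain Python. It is only a sketch of the idea, not JanusGraph's API: `ToyGraph`, `storage`, `search_index`, and `text_search` are made-up names, with a dict standing in for Cassandra and an inverted index standing in for ElasticSearch. It assumes that only properties covered by an index are copied to the index backend.

```python
from collections import defaultdict


class ToyGraph:
    """Toy model of a graph store with a storage backend and an index backend.

    `storage` stands in for Cassandra and `search_index` for ElasticSearch.
    All names here are hypothetical; this is not the JanusGraph API.
    """

    def __init__(self, indexed_keys):
        self.indexed_keys = set(indexed_keys)
        self.storage = {}                     # vertex id -> all properties
        self.search_index = defaultdict(set)  # (key, token) -> vertex ids

    def add_vertex(self, vertex_id, **properties):
        # The full vertex always goes to the storage backend.
        self.storage[vertex_id] = dict(properties)
        # Only indexed properties are also copied to the index backend.
        for key, value in properties.items():
            if key in self.indexed_keys:
                for token in str(value).lower().split():
                    self.search_index[(key, token)].add(vertex_id)

    def text_search(self, key, word):
        # Step 1: the index backend returns the ids of matching vertices.
        ids = self.search_index.get((key, word.lower()), set())
        # Step 2: the vertices themselves are read from the storage backend.
        return [self.storage[i] for i in sorted(ids)]


graph = ToyGraph(indexed_keys=["bio"])
graph.add_vertex(1, name="Ada", bio="wrote the first program")
graph.add_vertex(2, name="Alan", bio="designed a universal machine")
print(graph.text_search("bio", "program"))
```

In this model the `name` property never reaches the index, so a search on it finds nothing even though the value is still stored.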
When I search in Elasticsearch, does Elasticsearch send a REST API request to the original database?
Or does Elasticsearch hold the original data itself?
I know Elasticsearch holds indexed data, but I'm not certain whether it also holds the original data.
Elasticsearch is a database itself, so if you want data from an external source (e.g. a SQL database) to be in Elasticsearch, you need to index that data into Elasticsearch first and then search against it.
So no, the REST API will not query the original DB; it queries the data you have in Elasticsearch.
You can read more about the process here:
https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-db-logstash.html
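As a rough sketch of the "index first, then search" step, the snippet below reads rows from a SQL database and builds a newline-delimited JSON body in the shape that Elasticsearch's `_bulk` endpoint accepts (an action line followed by a source line for each document). An in-memory SQLite database stands in for the external source, and the `products` table, the index name, and the helper `rows_to_bulk_payload` are assumptions made for illustration. Nothing is sent over the network here.

```python
import json
import sqlite3


def rows_to_bulk_payload(rows, index_name):
    """Build an NDJSON body for the _bulk endpoint from database rows.

    Hypothetical helper: each row must have an `id` column, which becomes
    the document id; the remaining columns become the document source.
    """
    lines = []
    for row in rows:
        doc = dict(row)
        doc_id = doc.pop("id")
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document body itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# SQLite stands in for the original SQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "kettle", 19.5), (2, "teapot", 12.0)],
)
rows = conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()

payload = rows_to_bulk_payload(rows, "products")
print(payload)
```

The resulting body would then be POSTed to `/_bulk` with the `application/x-ndjson` content type. After that, searches hit the copies held in Elasticsearch, not the SQL table.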
So far, all the articles I have read about Cassandra mention denormalizing/duplicating data to improve read performance (e.g. [ebay blog](http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/) and cassandra blog). But to me it seems that this only applies if you are using Cassandra as the main database for querying.
Currently I have ElasticSearch indexing my Cassandra DB (where everything is still normalized). Does it still make sense to denormalize my Cassandra DB, given that all my queries actually go through ElasticSearch (i.e. it returns a list of ids and I fetch those ids directly from Cassandra)?
I want to run Elasticsearch alongside my Tomcat server, pull data from a database, and put it into an Elasticsearch index. Any pointers will help.
Elasticsearch has a Java API that you can use for this: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html
If you are new to ES, the Definitive Guide is a very good document: https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
By the way, Elasticsearch is a full-text search engine. If you are looking for an in-memory data solution, maybe you should consider something like Apache Ignite: http://ignite.apache.org/
I went through the Couchbase XDCR replication documentation, but failed to understand the points below:
1. Couchbase replicates all the data in a bucket to Elasticsearch in batches, and Elasticsearch indexes this data for real-time statistical data. My question is: if all the data is replicated to Elasticsearch, then Elasticsearch acts like a database that can hold a huge amount of data. So can we replace Couchbase with Elasticsearch?
2. How is the data sent, in JSON form, to d3.js to display a statistical graph?
All of the data is replicated to Elasticsearch, but it is not held there by default. The indexes are created, but the documents are discarded. Elasticsearch is not a database and does not perform like one, certainly not at the level of Couchbase. Take a look at this presentation, which talks about performance and why Cochbas
If your data are not critical or if you have another source of truth, you can use Elasticsearch only.
Otherwise, I'd keep Couchbase and Elasticsearch.
There is a resiliency page on the Elastic.co website which describes known potential problems: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
My 2 cents.
I'm new to ElasticSearch and am trying to figure out the best way to index 1 terabyte of data in Cassandra.
The two options that I understand right now are:
1. Move the data periodically to ElasticSearch using the Cassandra-River plugin and then index the data there.
Advantage: Search queries have no impact on Cassandra load
Disadvantage: Have to sync the data periodically
2. Without moving the data, run ElasticSearch on Cassandra to index the data (not sure how this would be done).
Advantage: Data always in sync
Disadvantage: Impacts Cassandra performance?
Any thoughts would be appreciated.
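For the periodic-sync option, one common pattern is to remember a watermark (the newest modification time already copied) and move only rows changed since then. The sketch below shows that idea in plain Python under loud assumptions: the rows carry an `updated_at` field, a list stands in for the Cassandra table, a dict stands in for the ElasticSearch index, and `sync_changed_rows` is a hypothetical helper, not part of any plugin. It does not handle deletes, and a real Cassandra table would need a data model that makes "changed since" queries efficient.

```python
def sync_changed_rows(source_rows, index, last_synced):
    """Copy rows modified after `last_synced` into `index`.

    Returns the new watermark: the newest `updated_at` seen so far.
    """
    newest = last_synced
    for row in source_rows:
        if row["updated_at"] > last_synced:
            # Overwrites any older copy of the same document.
            index[row["id"]] = row["body"]
            newest = max(newest, row["updated_at"])
    return newest


# A list stands in for the source table, a dict for the search index.
source = [
    {"id": "a", "body": "first", "updated_at": 10},
    {"id": "b", "body": "second", "updated_at": 20},
]
index = {}

# First run copies everything.
watermark = sync_changed_rows(source, index, last_synced=0)

# Later, one row is added and one is edited; only those two are copied.
source.append({"id": "c", "body": "third", "updated_at": 30})
source[0] = {"id": "a", "body": "first, edited", "updated_at": 35}
watermark = sync_changed_rows(source, index, last_synced=watermark)

print(index, watermark)
```

Between runs the index lags behind the source by at most one sync interval, which is the trade-off named in the first option's disadvantage.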
Perhaps, in the context of ElasticSearch 1.4 and above, just using ElasticSearch as both datastore and search engine might be a simpler and more elegant option.
Add more nodes to scale.