JanusGraph: Some key(s) on index byAllTweetFields do not currently have status(es) [REGISTERED]

I have a lot of data in JanusGraph and I created an index, but after the index was created none of the existing data was indexed.
The line of code which prints the log is this:
ManagementSystem.awaitGraphIndexStatus(jGraph, index.name).timeout(10, ChronoUnit.DAYS).call()
The log is:
GraphIndexStatusWatcher - Some key(s) on index byAllTweetFields do not currently have status(es) [REGISTERED]: idCode=INSTALLED,text=INSTALLED,title=INSTALLED,urgencyLevel=INSTALLED
I used Cassandra and ElasticSearch as backend DBs.
What have I missed?

Related

When I search in Elasticsearch, does Elasticsearch send REST API requests to the original DB?

When I search in Elasticsearch, does Elasticsearch send REST API requests to the original DB?
Or does Elasticsearch hold the original data itself?
I found that Elasticsearch holds the indexed data, but I can't be certain that it holds the original data.
Elasticsearch is a database itself, so if you want data from an external source (e.g. a SQL database) to be in Elasticsearch, you need to index that data into Elasticsearch first and then search against it.
So no, the REST API will not query the original DB; it queries the data you have in Elasticsearch.
You can read more about the process here:
https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-db-logstash.html
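To make the index-first, search-second flow in the answer above concrete, here is a minimal sketch that only builds the request bodies and does not send anything. The index name `products`, the sample rows, and the helper names `build_bulk_body` and `build_match_query` are assumptions made for illustration; the bodies follow the NDJSON layout of the `_bulk` endpoint and the shape of a `match` query.

```python
import json

def build_bulk_body(index_name, rows, id_field):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint (hypothetical helper)."""
    lines = []
    for row in rows:
        # Action line: index this document under an explicit _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(row[id_field])}}))
        # Source line: the document itself, copied from the external database row.
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

def build_match_query(field, text):
    """Build a search body; it runs against the copy held in Elasticsearch."""
    return {"query": {"match": {field: text}}}

# Assumed sample rows standing in for an external SQL table.
rows = [{"id": 1, "name": "red bicycle"}, {"id": 2, "name": "blue kettle"}]
bulk_body = build_bulk_body("products", rows, "id")
search_body = build_match_query("name", "bicycle")
```

The first body would be posted to `_bulk` to copy the rows in, and the second to `products/_search`; at no point does the search step touch the original database.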

BigQuery to Elasticsearch (avoid adding duplicate documents to Elasticsearch)

I am trying to sync data between BigQuery and Elasticsearch using the job template provided in GCP. The issue is that BigQuery sends all the documents every time the job runs, and because Elasticsearch assigns its own document ID (_id), duplicate documents are created.
Is there a way to configure the _id field when sending data from BigQuery to Elasticsearch?
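The idea behind the question can be sketched as follows: if the `_id` is derived from the row's own key instead of being generated, re-sending the same row overwrites the existing document rather than adding a new one. This is only an illustration under assumptions: a plain dict stands in for the Elasticsearch index, the field `order_id` and the helpers `stable_id` and `run_sync` are hypothetical, and how the GCP job template would be configured to do this is not shown.

```python
import hashlib
import json

def stable_id(row, key_fields):
    """Derive a deterministic document ID from the row's key fields (hypothetical helper)."""
    key = json.dumps([row[f] for f in key_fields], default=str)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def run_sync(index, rows, key_fields):
    """Simulate one job run; `index` is a dict standing in for an Elasticsearch index."""
    for row in rows:
        # Indexing under an existing _id replaces the document instead of duplicating it.
        index[stable_id(row, key_fields)] = row
    return index

# Assumed sample rows standing in for the BigQuery table.
rows = [{"order_id": 17, "status": "open"}, {"order_id": 18, "status": "paid"}]
index = {}
run_sync(index, rows, ["order_id"])
run_sync(index, rows, ["order_id"])  # second job run sends the same rows again
```

After both runs the simulated index still holds one document per `order_id`.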

JanusGraph - How is the data stored in ElasticSearch and Cassandra?

I'm using JanusGraph with ElasticSearch and Cassandra.
My question is how JanusGraph stores the data when I create a new entity, given that I'm using two databases (Cassandra and ElasticSearch).
I could understand that ElasticSearch is used as index backend and Cassandra is the storage, but:
What does JanusGraph do when I persist new data? Does it duplicate the same data in both Cassandra and ElasticSearch (since ElasticSearch is also a database)?
If the answer to the first question is yes, then when we run a query that traverses the graph, does JanusGraph run it on Cassandra, and switch to ElasticSearch when the query is a full-text search?
If the answer to the first question is no, is all the data stored in Cassandra, with JanusGraph somehow using the ElasticSearch index to search the Cassandra database?
ElasticSearch indexes the data stored in Cassandra.
When you do graph traversals, it uses the search index to retrieve the data from Cassandra. Cheers!
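The division of labour described in this answer can be pictured with a toy model. It is not JanusGraph's implementation: two plain dicts stand in for Cassandra and ElasticSearch, and the class `ToyGraphStore` and its methods are invented for illustration. The property names `title` and `urgencyLevel` are borrowed from the log in the main question.

```python
class ToyGraphStore:
    """Toy model: one dict plays the storage backend, another the index backend."""

    def __init__(self):
        self.storage = {}     # stands in for Cassandra: vertex id -> all properties
        self.text_index = {}  # stands in for ElasticSearch: token -> set of vertex ids

    def add_vertex(self, vertex_id, properties, indexed_keys):
        # The full record is written to the storage backend only.
        self.storage[vertex_id] = dict(properties)
        # Only the indexed property values are tokenized into the search index.
        for key in indexed_keys:
            for token in str(properties.get(key, "")).lower().split():
                self.text_index.setdefault(token, set()).add(vertex_id)

    def text_search(self, token):
        # The index answers "which ids match"; the data itself comes from storage.
        ids = sorted(self.text_index.get(token.lower(), set()))
        return [self.storage[i] for i in ids]

store = ToyGraphStore()
store.add_vertex(1, {"title": "Graph databases", "urgencyLevel": 2}, ["title"])
store.add_vertex(2, {"title": "Search engines", "urgencyLevel": 1}, ["title"])
hits = store.text_search("graph")
```

In this model the search step never returns data of its own; it only narrows down which records to load from storage.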

Best way to synchronize Elasticsearch with MySQL

I am using Elasticsearch in my Spring Web MVC project (spring-data-elasticsearch) and need to keep it synchronized with the database (MySQL).
I index documents from my app: whenever a new entity is added to the DB tables, the service layer also requests that the document be indexed in Elasticsearch.
The DB tables and the Elasticsearch index hold the same data. For delete and update operations I use the same approach, applying the change to both Elasticsearch and the DB table, and it works fine.
Now I want to know what the disadvantages of this approach are.
Or is there a better way to keep the Elasticsearch index up to date with the DB? I tried Logstash, but what about deleted entities?
The disadvantage of synchronous indexing is that there is no retry if an error occurs while the index data is being created.
In your place I would create a cron job or batch job (how often it is triggered depends on how much data is updated and how important index freshness is).
This job would record its execution status with logs,
so you would have a clear picture of your index and no missing data.
For indexing you can have a FULL index mode and an UPDATE index mode (for the latter you should add an update date column to your tables).
For the indexing strategy you have two options to choose from.
TWO_PHASES: you need a master and a slave; while indexing runs on the master, the slave responds to requests, and when indexing is over you synchronize.
DIRECT_MODE: drop the index and create a new one.
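The FULL and UPDATE modes suggested above can be sketched with a single query on an update-date column. This is a minimal sketch under assumptions: an in-memory SQLite database stands in for MySQL, and the table `product`, the column `updated_at`, and the helper `rows_changed_since` are hypothetical names. Timestamps are stored as ISO 8601 strings so that string comparison matches chronological order.

```python
import sqlite3

def rows_changed_since(conn, last_sync):
    """Return rows whose update date is newer than the last successful sync."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM product WHERE updated_at > ? ORDER BY id",
        (last_sync,),
    )
    return cur.fetchall()

# In-memory SQLite stands in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "kettle", "2024-01-01T10:00:00"),
        (2, "bicycle", "2024-01-03T09:30:00"),
    ],
)

# FULL mode: an empty marker sorts before every timestamp, so all rows are selected.
full_batch = rows_changed_since(conn, "")
# UPDATE mode: only rows touched after the previous job run are re-indexed.
update_batch = rows_changed_since(conn, "2024-01-02T00:00:00")
```

A row that has been hard-deleted no longer shows up in such a query, so deletions need separate handling, for example a soft-delete flag that the job can see.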

Where / How ElasticSearch stores logs received from Logstash?

Disclaimer: I am very new to ELK Stack, so this question can be very basic.
I am setting up ELK stack now. I have below basic questions about ElasticSearch.
1. What storage model does Elasticsearch follow?
For example, Oracle uses a relational model, Alfresco uses a "document model" and Apache Jackrabbit uses a "hierarchical model".
2. Is log data stored in Elasticsearch persistent/permanent? Or does Elasticsearch delete log data after a certain period?
3. How do we manage/back up this data?
4. Are the log/data files in Elasticsearch human-readable?
Any help/route to documentation will be appreciated.
The storage model is a document model. Everything is a document. Documents are of a particular type and they are stored in an index.
Data sent to ES is stored on disk. It can then be read, searched or deleted through a REST API.
The data is managed through the REST API. Usually for log centralisation, the logs are stored in date-based indices (one index for today, one for yesterday and so on), so to delete the logs from one day, you delete the relevant index. Curator can help in this case. ES also offers a backup and restore module.
To access the data in ES, you'll have to use the REST API or the Kibana client.
Documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
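The date-based index layout mentioned in the answer above lends itself to a simple retention rule: work out which daily indices are older than the retention window and delete those. The sketch below only computes the names and deletes nothing. The naming pattern `logstash-YYYY.MM.DD`, the index names in `existing`, and the helpers `daily_index_name` and `indices_to_delete` are assumptions for illustration.

```python
from datetime import date, datetime, timedelta

def daily_index_name(day, prefix="logstash-"):
    """Name of the index holding one day's logs (assumed naming pattern)."""
    return prefix + day.strftime("%Y.%m.%d")

def indices_to_delete(existing, today, keep_days, prefix="logstash-"):
    """Return daily indices older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days - 1)
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not a daily log index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)

# Assumed list of index names, as a cluster might report them.
existing = ["logstash-2024.03.01", "logstash-2024.03.02", "logstash-2024.03.03", "app-metrics"]
expired = indices_to_delete(existing, date(2024, 3, 3), keep_days=2)
```

Each name in `expired` would then be removed with one index-delete request, which is the kind of housekeeping a tool like Curator automates.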
