JanusGraph not able to find a suitable index for an index-enabled property key - Elasticsearch

I'm working on a JanusGraph application. To improve Gremlin query performance we create two mixed indexes, one for vertices and one for edges.
JanusGraph can answer indexed queries for property keys that were created and added to the index in the same transaction as the index itself. But if I create and index a new property key in a later transaction, JanusGraph cannot use the index for it and instead does a complete graph scan.
Using the JanusGraph management API I verified that all property keys are indexed and ENABLED; even so, JanusGraph scans the complete graph when querying on an indexed property key.
Is there anything I'm missing? Any help would be greatly appreciated.
Backend index engine -> Elasticsearch
Backend storage -> Cassandra
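For reference, this is roughly how I checked the status via the management API (a minimal sketch; 'vertexIndex' is a placeholder for the real index name):
mgmt = graph.openManagement()
idx = mgmt.getGraphIndex('vertexIndex')
idx.getFieldKeys().each { k ->
    println k.name() + ' -> ' + idx.getIndexStatus(k)   // every key reports ENABLED here
}
mgmt.rollback()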

I have faced this problem once. Try reindexing the index that was created in the other transaction; it worked for me, and I hope it works for you too.
Please find the steps below:
To reindex:
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('IndexName')
mgmt.updateIndex(i, SchemaAction.REINDEX).get()   // .get() blocks until the reindex job finishes
mgmt.commit()
To wait until the index becomes ENABLED:
ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()
NOTE: if this call reports that the index is still not ENABLED, retry the same command two or three times; it eventually succeeds once the reindex job has finished.
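If the index seems stuck before ENABLED even after the reindex, explicitly enabling it first may also help. A sketch under the same assumptions (same 'IndexName' placeholder):
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('IndexName')
mgmt.updateIndex(i, SchemaAction.ENABLE_INDEX).get()   // request the ENABLED state explicitly
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()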

Related

How to get the currently used shards in Elasticsearch or OpenSearch

My OpenSearch cluster sometimes hits this error when I add a new index:
Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
so I have to keep increasing cluster.max_shards_per_node.
Is there any way to check how many shards are currently in use, so I can avoid this error?
The best way to see indexing and search activity is with a monitoring system; one option for Elasticsearch is Opster, which you can try for free here:
https://opster.com/
For a manual check, you can try the following APIs.
You can sort your indices by their creation date string (cds). That helps you see which indices are oldest and gives you an overview of your indices (and their shards):
GET _cat/indices?v&h=index,cds&s=cds
You can also check the index stats to see whether there is any search or indexing activity.
To check all indices: GET _all/_stats
To check a single index: GET index_name/_stats
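To see the total number of shards currently open in the cluster (the number the [1000]/[1000] limit is checked against), the cluster health API should also work:
GET _cluster/health?filter_path=active_shards,active_primary_shards
Here active_shards counts every open primary and replica shard across the cluster.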

Cannot create mixed index in JanusGraph

I'm using JanusGraph 0.5.2 with Cassandra and Elasticsearch, and I want to create mixed indices.
I followed the docs and created the script below. Basically I close all open transactions and then create the mixed index.
size = graph.getOpenTransactions().size();
for (i = 0; i < size; i++) { graph.getOpenTransactions().getAt(0).rollback() }
mgmt = graph.openManagement()
taxNoKey = mgmt.getPropertyKey('taxNo')
mgmt.buildIndex('taxNo_mixed', Vertex.class).addKey(taxNoKey).buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'taxNo_mixed').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get()
mgmt.commit()
After mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get() I get the error below:
ERROR org.janusgraph.graphdb.database.management.ManagementLogger -
Evicted [2#7f00010124289-ivis-SYS-7039A-I1] from cache but waiting too
long for transactions to close. Stale transaction alert on:
[standardjanusgraphtx[0x332460d4], standardjanusgraphtx[0x3de388c0],
standardjanusgraphtx[0x39dc0ba4], standardjanusgraphtx[0x33efa7d4]]
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics#3054cdd3
My graph is not big: it contains 200k nodes and 400k edges.
I'm copy-pasting this into the Gremlin shell; is that OK?
Should there be any specific Elasticsearch settings for creating an index?
Any help is appreciated, thanks.
JanusGraph can also have problems creating indices when one of the instances that once opened the graph was not properly closed. JanusGraph provides the following manual procedure to force-close such instances afterwards:
mgmt = graph.openManagement()
mgmt.getOpenInstances() //all open instances
==>7f0001016161-dunwich1(current)
==>7f0001016161-atlantis1
mgmt.forceCloseInstance('7f0001016161-atlantis1') //remove an instance
mgmt.commit()
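Building on the listing above, a small sketch that force-closes every stale instance except the current one (it relies on the '(current)' suffix shown in the output):
mgmt = graph.openManagement()
mgmt.getOpenInstances().each { instance ->
    if (!instance.contains('(current)')) mgmt.forceCloseInstance(instance)   // skip the live instance
}
mgmt.commit()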

Update indices in Elasticsearch on adding new documents to my database

I'm new to Elasticsearch but have to work with it. I have successfully set it up using Logstash to connect to my Oracle database (one particular table). Now, if new records are added to the table I built the index on, what should be done?
I have thought of two solutions:
Re-build the indices by running the Logstash conf file.
On insert into the table, also POST to Elasticsearch.
The first solution is not working as it should: if 'users' is the table I updated with new records, then after re-building the indices for the 'users' table, the new records should also be reflected in the Logstash/Elasticsearch query results, but they are not.
Even just the first solution would do as a POC.
So, any help is appreciated.
Thank you Val for pointing me in the right direction.
For the first, brute-force solution, the fix was the document type in the Logstash conf file:
{"document_type":"same_type"}
This must be consistent with the previously used type. The first time I had run it with a different type (Same_type); after adding the new records I used same_type, so Elasticsearch threw a multiple-mapping rejection exception.
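For illustration, in regular Logstash conf syntax the relevant output section would look something like this (the hosts and index values are assumptions, not taken from my setup):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users"
    document_type => "same_type"   # must match the type used on the first run
  }
}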
For further clarification, I looked it up here.
Thank you guys.

Does Elasticsearch cluster setup lead to deletion of existing indexes?

As of now my Elasticsearch setup exists on only one machine. Now I want to set up an Elasticsearch cluster using two nodes. If I make my existing machine the master and the new machine a data node, will my existing indexes and data be lost from my master/existing machine?
In my experiment the data did get lost; please correct me if I am wrong.
Please visit this link for the same.
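For what it's worth, joining a second node normally requires only matching cluster settings, and it should not by itself delete existing indices as long as cluster.name and path.data stay unchanged. A minimal elasticsearch.yml sketch for the existing machine (names and addresses are made up; node.master/node.data are the legacy pre-7.x settings):
cluster.name: my-cluster                                            # must be identical on both nodes
node.name: node-1
node.master: true
node.data: true
network.host: 192.168.1.10
discovery.zen.ping.unicast.hosts: ["192.168.1.10", "192.168.1.11"]  # both machines listed
The new machine would use the same cluster.name, its own node.name and address, and node.data: true.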

How do you update or sync with a JDBC river?

A question about rivers and syncing data with a production database using Elasticsearch:
Are rivers suited only for the initial bulk load of data, or do they somehow listen for or monitor changes?
If I have a nightly import of data, is it better to just delete the rivers and indexes and re-create both?
If I update or change a river, do I have to delete and re-create the index?
How do I set up a schedule for a river to fetch new data periodically? Can it store the last max id so it can run diff queries in the SQL to select into the river?
Any suggestions on a better way to keep the database and Elasticsearch in sync, without calling individual index update functions with a PUT command?
All of the Elasticsearch rivers are different: some are provided directly by Elasticsearch, many more are developed by third parties:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
Each operates differently, so to answer your questions you have to choose a specific river. In your case, since you're looking to index data from a production database, I'll assume the JDBC river is what you would use:
https://github.com/jprante/elasticsearch-river-jdbc
This river will index data from your JDBC source, including picking up changes. It can do so on a schedule (there is detailed documentation on the schedule parameter on this page: https://github.com/jprante/elasticsearch-river-jdbc). However, this river will not pick up deletes:
https://github.com/jprante/elasticsearch-river-jdbc/issues/213
You may find this discussion useful concerning how to work around the lack of delete support by building a new river/index daily and using index aliases: ElasticSearch river JDBC MySQL not deleting records
You can also map the id column in your DB to _id using an alias in the SQL select; that way Elasticsearch can identify whether a document has changed or not.
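For illustration, here is a hedged sketch of a JDBC river registration that combines the schedule parameter with the id-to-_id alias (the connection details, table name, and cron expression are placeholders):
PUT _river/my_jdbc_river/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/mydb",
    "user" : "user",
    "password" : "password",
    "sql" : "select id as _id, * from orders",
    "schedule" : "0 0/60 0-23 ? * *"
  }
}
With the _id alias, each scheduled run overwrites the existing documents instead of creating duplicates.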
