Cannot create mixed index in JanusGraph

I'm using JanusGraph 0.5.2 with Cassandra and Elasticsearch. I want to create mixed indices.
I followed the docs and wrote the script below. It closes all open transactions and then creates the mixed index.
size = graph.getOpenTransactions().size();
for(i=0;i<size;i++) {graph.getOpenTransactions().getAt(0).rollback()}
mgmt = graph.openManagement()
taxNoKey = mgmt.getPropertyKey('taxNo')
mgmt.buildIndex('taxNo_mixed', Vertex.class).addKey(taxNoKey).buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'taxNo_mixed').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get()
mgmt.commit()
After running mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get(), I get the error below.
ERROR org.janusgraph.graphdb.database.management.ManagementLogger -
Evicted [2#7f00010124289-ivis-SYS-7039A-I1] from cache but waiting too
long for transactions to close. Stale transaction alert on:
[standardjanusgraphtx[0x332460d4], standardjanusgraphtx[0x3de388c0],
standardjanusgraphtx[0x39dc0ba4], standardjanusgraphtx[0x33efa7d4]]
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics#3054cdd3
My graph is not big; it contains 200k nodes and 400k edges.
I'm copy-pasting the script into the Gremlin shell. Is that OK?
Are any specific settings needed in Elasticsearch to create an index?
Any help is appreciated, thanks.

JanusGraph can also have problems creating indices when an instance that once opened the graph was not closed properly. JanusGraph provides the following manual procedure to force such an instance closed afterwards:
mgmt = graph.openManagement()
mgmt.getOpenInstances() //all open instances
==>7f0001016161-dunwich1(current)
==>7f0001016161-atlantis1
mgmt.forceCloseInstance('7f0001016161-atlantis1') //remove an instance
mgmt.commit()

Related

How to get the number of shards currently in use in Elasticsearch or OpenSearch

My OpenSearch cluster sometimes hits this error when I add a new index:
Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
So I have to increase cluster.max_shards_per_node.
Is there a way to check how many shards are currently in use, so I can avoid this error?
The best way to see indexing and search activity is to use a monitoring system. One such system for Elasticsearch is Opster, which you can try for free at the following link.
https://opster.com/
For the manual check and sort, you can try the following APIs.
You can sort your indices by the creation date string (cds). This shows which indices are the oldest and gives you an overview of your indices (and therefore your shards).
GET _cat/indices?v&h=index,cds&s=cds
You can also check the index stats to see whether there is any search or indexing activity.
To check all indices, use GET _all/_stats
To check a single index, use GET index_name/_stats
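To answer the "how many shards am I using" part directly, the numbers in the error message can be approximated from the cluster health response (GET _cluster/health); GET _cat/shards?v lists the individual shards. The Python sketch below is only an illustration under stated assumptions: the helper name shard_headroom and the sample values are made up, the limit is taken as cluster.max_shards_per_node multiplied by the number of data nodes (frozen nodes are ignored), and the open-shard count is approximated as active + initializing + unassigned shards.

```python
def shard_headroom(health, max_shards_per_node=1000):
    """Estimate remaining shard capacity from a _cluster/health response body.

    `health` is the parsed JSON of GET _cluster/health.
    `max_shards_per_node` mirrors cluster.max_shards_per_node (default 1000).
    """
    # Approximation: every shard copy of an open index counts, assigned or not.
    open_shards = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    # Assumption: the cluster-wide limit is the per-node limit times the data nodes.
    limit = max_shards_per_node * health["number_of_data_nodes"]
    return {"open_shards": open_shards, "limit": limit, "remaining": limit - open_shards}


# Illustrative values only, chosen to match the [1000]/[1000] case in the error above.
sample_health = {
    "number_of_data_nodes": 1,
    "active_shards": 600,
    "initializing_shards": 0,
    "unassigned_shards": 400,
}
report = shard_headroom(sample_health)
print(report)
```

When "remaining" is smaller than the number of shards a new index would add (primaries plus replicas), creating that index can be expected to fail with the validation error quoted in the question.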

JanusGraph not able to find a suitable index for an index-enabled property key

I'm working on a JanusGraph application. To improve Gremlin query performance we are creating two mixed indexes, one for vertices and one for edges.
JanusGraph can use the indexes for property keys that are created and indexed at the time of index creation, i.e. in the same transaction. If I create and index a new property key in a later transaction, JanusGraph does not use the index to query it and does a complete graph scan instead.
Using the JanusGraph management API, I checked that all property keys are indexed and enabled; even so, JanusGraph scans the complete graph when querying on an indexed property key.
Is there anything I'm missing? Any help would be greatly appreciated.
Backend index engine -> ElasticSearch
Backend Storage -> Cassandra
I faced this problem once. Try reindexing the index that was created in the other transaction. It worked for me; I hope it works for you too.
The steps are below.
To reindex:
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('IndexName')
mgmt.updateIndex(i, SchemaAction.REINDEX).get() // wait for the reindex job to finish
mgmt.commit()
To wait for the index to become enabled:
ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()
NOTE: if the call reports "false" (the index has not reached ENABLED yet), run the same command (ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()) two or three more times. It should succeed eventually.
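The advice in the note amounts to polling a status check until it succeeds. A minimal Python sketch of that loop follows; it calls no JanusGraph API. The names wait_until and fake_status_check are hypothetical, and the check callable stands in for whatever reports that the index has reached ENABLED.

```python
import time


def wait_until(check, attempts=3, delay_seconds=1.0):
    """Call `check` up to `attempts` times; return the attempt that succeeded, else None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before trying again
    return None


# Hypothetical stand-in for the status check: it succeeds on the third call.
calls = {"count": 0}


def fake_status_check():
    calls["count"] += 1
    return calls["count"] >= 3


succeeded_on = wait_until(fake_status_check, attempts=5, delay_seconds=0)
print(succeeded_on)
```

Bounding the number of attempts keeps the loop from hanging forever if the index never leaves its current state, which is a sign that something else (for example a stale instance or an open transaction) is blocking it.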

How does the 'delete index' command work inside ES?

Are there any risks when using the 'delete index' command on a running ES cluster?
Will this command cost too much CPU or memory?
Deleting indices is normally fast, and Elasticsearch has not actually deleted all the documents when it sends the success response to a delete-index request.
Elasticsearch mainly updates the cluster state (maintained on all the nodes of the cluster) to mark the indices as deleted. Most of the heavy lifting goes into updating the cluster state and related structures such as the routing table and metadata.
The main method for this in the Elasticsearch source code (in MetadataDeleteIndexService) helps explain what I described above and the internals of index deletion.
An important code snippet from that method:
RoutingTable.Builder routingTableBuilder = RoutingTable.builder(currentState.routingTable());
Metadata.Builder metadataBuilder = Metadata.builder(meta);
ClusterBlocks.Builder clusterBlocksBuilder = ClusterBlocks.builder().blocks(currentState.blocks());
final IndexGraveyard.Builder graveyardBuilder = IndexGraveyard.builder(metadataBuilder.indexGraveyard());
final int previousGraveyardSize = graveyardBuilder.tombstones().size();
for (final Index index : indices) {
    String indexName = index.getName();
    logger.info("{} deleting index", index);
    routingTableBuilder.remove(indexName);
    clusterBlocksBuilder.removeIndexBlocks(indexName);
    metadataBuilder.remove(indexName);
}
// add tombstones to the cluster state for each deleted index
final IndexGraveyard currentGraveyard = graveyardBuilder.addTombstones(indices).build(settings);
metadataBuilder.indexGraveyard(currentGraveyard); // the new graveyard set on the metadata
logger.trace("{} tombstones purged from the cluster state. Previous tombstone size: {}. Current tombstone size: {}.",
    graveyardBuilder.getNumPurged(), previousGraveyardSize, currentGraveyard.getTombstones().size());
Coming to your questions: are there any risks when using the 'delete index' command on a running ES cluster, and will this command cost too much CPU or memory?
No, there is no risk in sending a delete-index request to a running Elasticsearch cluster unless you have a huge cluster state. As mentioned, the actual deletion of the index data happens asynchronously; the request itself only updates various state and flags, so it does not use much CPU.
You can also enable trace logging for org.elasticsearch.cluster.metadata.MetadataDeleteIndexService to see the tombstone counts logged by the code snippet above.
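The shape of that cluster-state update can be mimicked with a toy model. The Python sketch below is not Elasticsearch code: the dictionary layout, the delete_indices helper and the max_tombstones parameter are all assumptions made for illustration. It only shows that a delete removes a few map entries and appends tombstones, without touching any documents.

```python
def delete_indices(state, names, max_tombstones=500):
    """Return (new_state, purged_count) for a toy cluster state with `names` removed."""
    doomed = set(names)
    # Drop the deleted indices from each per-index section of the state.
    new_state = {
        section: {k: v for k, v in state[section].items() if k not in doomed}
        for section in ("routing_table", "blocks", "metadata")
    }
    # Record a tombstone per deleted index, then purge the oldest beyond the cap.
    graveyard = list(state["graveyard"]) + list(names)
    purged = max(0, len(graveyard) - max_tombstones)
    new_state["graveyard"] = graveyard[purged:]
    return new_state, purged


# Toy state with two indices and one existing tombstone (illustrative values).
state = {
    "routing_table": {"logs-1": "shard routing", "logs-2": "shard routing"},
    "blocks": {"logs-1": [], "logs-2": []},
    "metadata": {"logs-1": {"shards": 5}, "logs-2": {"shards": 5}},
    "graveyard": ["old-index"],
}
new_state, purged = delete_indices(state, ["logs-1"], max_tombstones=1)
print(sorted(new_state["metadata"]), new_state["graveyard"], purged)
```

In this model the work done depends on the number of entries in the state, not on how much data the deleted index held, which matches the point above that the request is cheap unless the cluster state itself is very large.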

Creating Indexes in JanusGraph with DynamoDB and ElasticSearch as backend

I am trying to create indexes with Elasticsearch as the index backend.
I have the following in dynamodb.properties:
index.search.backend=elasticsearch
index.search.hostname=10.0.0.55
The gremlin server comes up without any exceptions.
I am able to connect to the gremlin server from gremlin console.
:remote connect tinkerpop.server conf/remote.yaml session
:remote console
I created the following vertices in the graph:
gremlin> g.V().properties()
==>vp[name->sandeep]
==>vp[name->uday]
I am trying to create an index on the property "name", but it gets stuck in the "INSTALLED" state.
graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
3589249 [gremlin-server-session-1] INFO org.janusgraph.graphdb.database.management.GraphIndexStatusWatcher - Some key(s) on index byNameComposite do not currently have status(es) [REGISTERED]: name=INSTALLED
3589709 [gremlin-server-worker-1] WARN org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor - Script evaluation exceeded the configured threshold for request [RequestMessage{, requestId=40375b29-d180-4732-9816-24870fb1b3b1, op='eval', processor='session', args={gremlin=mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call(), session=ef0c3a0e-bef0-4a93-bc29-7869c1fd24db, bindings={}, manageTransaction=false, batchSize=64}}]
java.util.concurrent.TimeoutException: Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()]
I tried the same thing on my local JanusGraph with ES and Cassandra, and I was able to use the index there.
Please help me out.
Any help is appreciated.
Thanks
Sandeep

How to disable shard rebalancing in Elasticsearch while allowing new indices to be allocated?

I am using Elasticsearch version 1.0.1 and want to achieve two things at the same time:
1. Allow new indices to be created (the primary and replica shards need to be allocated as per the usual logic).
2. Prevent existing shards from being rebalanced on node failure.
What combination of settings will allow me to achieve both? I tried the settings from the cluster module documented at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html, but I am unable to achieve both at the same time.
Thanks,
