Creating Indexes in JanusGraph with DynamoDB and ElasticSearch as backend

I am trying to create indexes with Elasticsearch.
I have the following in dynamodb.properties:
index.search.backend=elasticsearch
index.search.hostname=10.0.0.55
The Gremlin server comes up without any exceptions, and I am able to connect to it from the Gremlin console:
:remote connect tinkerpop.server conf/remote.yaml session
:remote console
I created the following vertices in the graph:
gremlin> g.V().properties()
==>vp[name->sandeep]
==>vp[name->uday]
I am trying to create an index on the property "name", but it gets stuck in the "INSTALLED" state.
graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
3589249 [gremlin-server-session-1] INFO org.janusgraph.graphdb.database.management.GraphIndexStatusWatcher - Some key(s) on index byNameComposite do not currently have status(es) [REGISTERED]: name=INSTALLED
3589709 [gremlin-server-worker-1] WARN org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor - Script evaluation exceeded the configured threshold for request [RequestMessage{, requestId=40375b29-d180-4732-9816-24870fb1b3b1, op='eval', processor='session', args={gremlin=mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call(), session=ef0c3a0e-bef0-4a93-bc29-7869c1fd24db, bindings={}, manageTransaction=false, batchSize=64}}]
java.util.concurrent.TimeoutException: Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()]
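The TimeoutException above is Gremlin Server's scriptEvaluationTimeout firing, not an index failure in itself: the watcher was still waiting when the 30-second evaluation limit was reached. If the wait simply needs more time, both limits can be raised from the console; a minimal sketch, assuming a driver-based remote and the default watcher API:
// raise the console's remote evaluation timeout to 5 minutes (value in ms)
:remote config timeout 300000
// wait up to 10 minutes for the index status instead of the default
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').timeout(10, java.time.temporal.ChronoUnit.MINUTES).call()
An index that stays in INSTALLED even after a longer wait usually points to another open JanusGraph instance or a stale transaction blocking registration, as the related questions below discuss.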
I tried the same thing in my local JanusGraph with Elasticsearch and Cassandra and was able to use the index.
Please help me out; any help is appreciated.
Thanks
Sandeep

Related

Cannot create mixed index in JanusGraph

I'm using JanusGraph 0.5.2 with Cassandra and Elasticsearch. I wanted to create mixed indices.
I followed the docs and created my script as below. Basically, I'm closing all open transactions and then creating the mixed index.
size = graph.getOpenTransactions().size()
for (i = 0; i < size; i++) { graph.getOpenTransactions().getAt(0).rollback() }
mgmt = graph.openManagement()
taxNoKey = mgmt.getPropertyKey('taxNo')
mgmt.buildIndex('taxNo_mixed', Vertex.class).addKey(taxNoKey).buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'taxNo_mixed').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get()
mgmt.commit()
After mgmt.updateIndex(mgmt.getGraphIndex("taxNo_mixed"), SchemaAction.REINDEX).get(), I get the error below.
ERROR org.janusgraph.graphdb.database.management.ManagementLogger -
Evicted [2#7f00010124289-ivis-SYS-7039A-I1] from cache but waiting too
long for transactions to close. Stale transaction alert on:
[standardjanusgraphtx[0x332460d4], standardjanusgraphtx[0x3de388c0],
standardjanusgraphtx[0x39dc0ba4], standardjanusgraphtx[0x33efa7d4]]
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics@3054cdd3
My graph is not big; it contains 200k nodes and 400k edges.
I'm copy-pasting this into the Gremlin shell. Is that OK?
Should there be any specific settings in Elasticsearch for creating an index?
Any help is appreciated, thanks
JanusGraph can also have problems creating indices when one of the instances that once opened the graph was not properly closed. JanusGraph has the following manual procedure to force closure afterwards:
mgmt = graph.openManagement()
mgmt.getOpenInstances() //all open instances
==>7f0001016161-dunwich1(current)
==>7f0001016161-atlantis1
mgmt.forceCloseInstance('7f0001016161-atlantis1') //remove an instance
mgmt.commit()
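Once the stale instance is removed, registration can complete, and the index still has to be enabled before queries will use it. A sketch of the remaining lifecycle steps, reusing the index name from the question above:
mgmt = graph.openManagement()
// ENABLE_INDEX moves a REGISTERED index to ENABLED for new writes;
// use SchemaAction.REINDEX instead if existing data must be backfilled as well
mgmt.updateIndex(mgmt.getGraphIndex('taxNo_mixed'), SchemaAction.ENABLE_INDEX).get()
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'taxNo_mixed').status(SchemaStatus.ENABLED).call()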

JanusGraph not able to find a suitable index for an index-enabled property key

I'm working on a JanusGraph application. To improve Gremlin query performance we are creating two mixed indexes, one for vertices and one for edges.
JanusGraph can query the indexes for property keys that were created and indexed at the time of index creation, i.e. in the same transaction. If I create and index a new property key in a later transaction, JanusGraph is not able to query it using the index; instead, it does a complete graph scan.
Using the JanusGraph management API I checked that all property keys are indexed and enabled; even then, JanusGraph scans the complete graph when querying on an indexed property key.
Is there anything I'm missing? Any help would be greatly appreciated.
Backend index engine -> ElasticSearch
Backend Storage -> Cassandra
I have faced this problem once. Try reindexing the index once (the index created in the other transaction); it worked for me. Hope it works for you too.
Please find the steps below:
To reindex:
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('IndexName')
mgmt.updateIndex(i, SchemaAction.REINDEX).get() // block until the reindex job completes
mgmt.commit()
To enable the index:
ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()
NOTE: if enabling the index returns "false", try running the same command (ManagementSystem.awaitGraphIndexStatus(graph, 'IndexName').status(SchemaStatus.ENABLED).call()) two or three times. It should eventually succeed.
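Once the index is ENABLED, a quick way to confirm a traversal actually uses it is to profile the query: an index-backed lookup reports the backend query in the profile, whereas an unindexed one iterates over all vertices. A sketch; the property key and value are placeholders:
// profile() reports whether the has() step was answered by the index
// or by a full graph scan
g.V().has('someIndexedKey', 'someValue').profile()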

Elasticsearch not accepting data

I have an Elasticsearch cluster set up on Kubernetes. Recently Logstash was not able to push any data to the cluster because one of the nodes in the cluster was out of disk space.
This was the error in Logstash:
[Ruby-0-Thread-13#[main]>worker1: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:383] elasticsearch - retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
The ES master had marked the node as read-only because disk usage had crossed the flood-stage threshold:
[WARN ][o.e.c.r.a.DiskThresholdMonitor] [es-master-65ccf55794-pm4xz] flood stage disk watermark [95%] exceeded on [SaRCGuyyTBOxTjNtvjui-g][es-data-1][/data/data/nodes/0] free: 9.1gb[2%], all indices on this node will be marked read-only
Following this, I freed up space on that node and it now has enough available (almost 50%). But Logstash is still not able to push data to Elasticsearch and is logging the same error as above.
I have the following questions
Will Elasticsearch recover from this automatically?
If not, should I restart the cluster? Is it enough if I just restart the data nodes, or should I restart the master and ingest nodes as well?
Is there any way to mark the indices writable again without a restart?
You have to manually reset the read-only block on your indices.
See the documentation for the cluster.routing.allocation.disk.watermark.flood_stage setting:
The index block must be released manually once there is enough disk
space available to allow indexing operations to continue.
PUT /<your index name>/_settings
{
"index.blocks.read_only_allow_delete": null
}
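If several indices were marked read-only, the same setting can be cleared on all of them in one call; a sketch using the _all wildcard, assuming your cluster permits wildcard settings updates:
PUT /_all/_settings
{
"index.blocks.read_only_allow_delete": null
}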

Where can I find the data sent from Kafka to Elasticsearch, knowing that I am using an Elasticsearch sink?

This is my output. I think everything is working fine:
[2018-07-06 22:27:51,458] INFO [Consumer clientId=consumer-4, groupId=connect-elk-sink] Setting newly assigned partitions [elk-test-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-07-06 22:27:51,471] INFO [Consumer clientId=consumer-4, groupId=connect-elk-sink] Resetting offset for partition elk-test-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
How could I visualize the data in Kibana?
You can use the Elasticsearch REST API directly to hit the index that you set up in your Connect config, or the /_cat endpoint, I believe.
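For example, assuming the connector's default topic-to-index mapping, so that the index is named after the elk-test topic seen in the log above:
GET /elk-test/_search
GET /_cat/indices?v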
However, Kibana would be easier for visualization. Before you can do anything in Kibana, you must set up an index pattern via the Settings panel; it will prompt for a time field, which can be ignored if you don't have one. You can then go to the Discover panel and query the data as needed.
If no index pattern can be created, then Connect either doesn't have permission to create the index, there is no data flowing, or some other error has occurred.
Your log output is not an indication that things are actually working; it simply says that the consumer offset has been reset to the beginning of the Kafka partition.

When will Elasticsearch try to recover its indexes?

I am trying to fix an issue related to Elasticsearch in our production environment which is not always reproducible. I am using Elasticsearch 1.5 (yet to upgrade).
The issue is that while trying to create an index I get an IndexAlreadyExistsException, because the call to
client.admin().indices().prepareExists(index).get().isExists();
returns false while Elasticsearch is in recovery mode, and when I then try to create the index I get that exception.
Below are a few links to issues which say that Elasticsearch returns false while recovering indexes:
8945
8105
As I am not always able to reproduce the issue, I am not able to test my fix, which is to check the cluster health before checking isExists().
My question is: when will Elasticsearch start recovery?
You can use the prepareHealth() admin method to wait for the cluster to reach a given status before doing your index maintenance operations:
ClusterHealthResponse health = client.admin().cluster().prepareHealth(index)
.setWaitForGreenStatus()
.get();
Or you can wait for the whole cluster to turn green:
ClusterHealthResponse health = client.admin().cluster().prepareHealth()
.setWaitForGreenStatus()
.get();
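Combining this with the existence check from the question, the proposed fix could look like the sketch below (ES 1.x Java client; the yellow-status threshold and 30-second timeout are assumptions, and TimeValue is org.elasticsearch.common.unit.TimeValue):
// wait until the index has recovered far enough that metadata
// calls such as prepareExists() give reliable answers
client.admin().cluster().prepareHealth(index)
        .setWaitForYellowStatus()
        .setTimeout(TimeValue.timeValueSeconds(30))
        .get();

// only now is the existence check trustworthy
if (!client.admin().indices().prepareExists(index).get().isExists()) {
    client.admin().indices().prepareCreate(index).get();
}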
