Gremlin-Python: connecting to an existing JanusGraph

I have created a graph using the Gremlin Console:
gremlin> ConfiguredGraphFactory.graphNames
==>MYGRAPH
gremlin> ConfiguredGraphFactory.getConfiguration('MYGRAPH')
==>storage.backend=cql
==>graph.graphname=MYGRAPH
==>storage.hostname=127.0.0.1
==>Template_Configuration=false
gremlin> g.V().properties()
==>vp[name->SFO]
==>vp[country->USA]
==>vp[name->ALD]
==>vp[country->IND]
==>vp[name->BLR]
==>vp[country->IND]
gremlin>
I want to connect to MYGRAPH using gremlin-python.
Can someone please tell me how to access the graph named "MYGRAPH" using gremlin-python?
Thanks in advance...

First of all, you will need to install some jar files so that JanusGraph can handle gremlin-python scripts:
./bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.9
Please note that the version of gremlin-python you install must match the TinkerPop version that your JanusGraph release is compatible with. You can find compatibility information on the JanusGraph releases page. For example, JanusGraph 0.2.2 is compatible with TinkerPop 3.2.9.
Next, you need to start a JanusGraph server that uses ConfiguredGraphFactory. You just have to use the file conf/gremlin-server/gremlin-server-configuration.yaml from the distribution:
bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
This file differs from the traditional conf/gremlin-server/gremlin-server.yaml in these few lines:
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties
}
Then we need to load the graph MYGRAPH in the server's initialization script. Create an init script scripts/init.groovy; here you can load as many different graphs as you want:
def globals = [:]
// Open the graph previously created through ConfiguredGraphFactory
myGraph = ConfiguredGraphFactory.open("MYGRAPH")
// Bind its traversal source under the name remote clients will reference
globals << [myGraphTraversal : myGraph.traversal()]
Make sure this script is executed when Gremlin Server starts by referencing it in conf/gremlin-server/gremlin-server-configuration.yaml:
scriptEngines: {
gremlin-groovy: {
imports: [java.lang.Math],
staticImports: [java.lang.Math.PI],
scripts: [scripts/init.groovy]}}
Finally, in your Python project, install the gremlin-python package that matches the TinkerPop version of your JanusGraph release. For JanusGraph 0.2.2, this is version 3.2.9:
pip install gremlin-python==3.2.9
Start a Python shell and start coding:
>>> from gremlin_python import statics
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.process.graph_traversal import __
>>> from gremlin_python.process.strategies import *
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
>>> graph = Graph()
>>> myGraphTraversal = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','myGraphTraversal'))
>>> myGraphTraversal.V().count().next()  # a terminal step like next() actually submits the traversal
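For completeness, a minimal end-to-end sketch (assuming the server configured above is listening on localhost:8182 and the init script bound the traversal source as myGraphTraversal); the remote connection should be closed when you are done:
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to the traversal source bound by scripts/init.groovy
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'myGraphTraversal')
g = Graph().traversal().withRemote(conn)

print(g.V().count().next())                          # number of vertices
print(g.V().has('name', 'SFO').valueMap().toList())  # properties of one vertex

conn.close()  # close the underlying websocket connection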

Related

Dialogflow CX - Location settings have to be initialized - FAILED_PRECONDITION

I am automating Dialogflow CX using the Python client libraries, including agent/intent/entity creation/update/deletion.
On the first run, however, I encounter the error below from Python.
If I log in to the Console, set the location from there, and rerun the code, it works fine and I am able to create the agent.
I followed this GCP documentation page:
https://cloud.google.com/dialogflow/cx/docs/concept/region
I am looking for a way to automate the region & location setting before running the Python code. Kindly provide me with the code.
Below is the code I am using to create the agent.
Error -
google.api_core.exceptions.FailedPrecondition: 400 com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION"
debug_error_string = "{"created":"#1622183899.891000000","description":"Error received from peer ipv4:142.250.195.170:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"com.google.apps.framework.request.FailedPreconditionException: Location settings have to be initialized before creating the agent in location: us-east1. Code: FAILED_PRECONDITION","grpc_status":9}"
main.py -
# Import libraries
import google.auth
import google.auth.transport.requests
from google.cloud import dialogflowcx as df
from google.protobuf.field_mask_pb2 import FieldMask
import os, time
import pandas as pd

# Function - Authentication
def gcp_auth():
    cred, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
    auth_req = google.auth.transport.requests.Request()
    cred.refresh(auth_req)

# Function - Create Agent
# Note: time_zone is added as a parameter here; the original snippet referenced it without defining it.
def create_agent(agent_name, agent_description, language_code, time_zone, location_id, location_path):
    # Regional agents must go through the region-specific endpoint
    if location_id == "global":
        agentsClient = df.AgentsClient()
    else:
        agentsClient = df.AgentsClient(client_options={"api_endpoint": f"{location_id}-dialogflow.googleapis.com:443"})
    agent = df.Agent(display_name=agent_name, description=agent_description, default_language_code=language_code, time_zone=time_zone, enable_stackdriver_logging=True)
    createAgentRequest = df.CreateAgentRequest(agent=agent, parent=location_path)
    agent = agentsClient.create_agent(request=createAgentRequest)
    return agent
Currently, Dialogflow does not support configuring the location settings through the API, so you cannot initialize location settings programmatically; you can only set the location through the Console.
As an alternative, since the location setting has to be initialized only once per region per project, you could set the location once in the Console and then automate the agent creation process; some useful links: 1 and 2.
On the other hand, if you would find this feature useful, you can file a Feature Request here. It will be evaluated by Google's product team.
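Until such an API exists, a defensive pattern is to catch the precondition error and surface an actionable message. A minimal sketch (not an official workaround), assuming the create_agent function from the question above is in scope:
from google.api_core.exceptions import FailedPrecondition

def create_agent_safely(*args, **kwargs):
    try:
        return create_agent(*args, **kwargs)
    except FailedPrecondition as exc:
        # Location settings can currently only be initialized in the Console
        raise RuntimeError(
            "Location settings are not initialized for this region. "
            "Open the Dialogflow CX Console once, set the location, then rerun."
        ) from exc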
Many thanks Alexandre Moraes. I have raised a feature request for the same.

Elasticsearch query in Julia

How do I connect Julia with Elasticsearch? Has anyone ever tried it, or found a package that is ready to use?
I know that in Julia we can use Python packages, but I still have no idea how to use them.
Here it is:
# Installation
using Conda
Conda.add("elasticsearch")

# Loading the module and getting a connection
using PyCall
elasticsearch = pyimport("elasticsearch")
es = elasticsearch.Elasticsearch() # <== this is the connection to ES
es.info() # connection information

# Put some data
dat = Dict("a1"=>"blaaa", "a2"=>"hello")
res = es.index(index="data", doc_type="data", id="1", body=dat)

# Fetch some data
q1 = Dict("query"=>Dict("match"=>Dict("a1"=>Dict("query"=>"blaaa"))))
es.search(index="data", body=q1)["hits"]["hits"]
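For reference, these are the same calls in plain Python with elasticsearch-py (the library PyCall is delegating to), assuming a local Elasticsearch node on the default port:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to localhost:9200 by default
print(es.info())      # cluster information

# put some data
doc = {"a1": "blaaa", "a2": "hello"}
es.index(index="data", doc_type="data", id="1", body=doc)

# fetch some data
q = {"query": {"match": {"a1": {"query": "blaaa"}}}}
print(es.search(index="data", body=q)["hits"]["hits"])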

Setup and configuration of JanusGraph for a Spark cluster and Cassandra

I am running JanusGraph (0.1.0) with Spark (1.6.1) on a single machine.
I did my configuration as described here.
When accessing the graph on the gremlin-console with the SparkGraphComputer, it is always empty. I cannot find any errors in the log files; it is just an empty graph.
Is anyone using JanusGraph with Spark who can share their configuration and properties?
Using a JanusGraph directly, I get the expected output:
gremlin> graph=JanusGraphFactory.open('conf/test.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> g=graph.traversal()
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
14:26:10 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>1000001
gremlin>
Using a HadoopGraph with Spark as GraphComputer, the graph is empty:
gremlin> graph=GraphFactory.open('conf/test.properties')
==>hadoopgraph[cassandrainputformat->gryooutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>0
My conf/test.properties:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# Titan Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.keyspace=janusgraph
#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.keyspace=janusgraph
cassandra.input.predicate=0c00020b0001000000000b000200000000020003000800047fffffff0000
cassandra.input.columnfamily=edgestore
cassandra.range.batch.size=2147483647
#
# SparkGraphComputer Configuration
#
spark.master=spark://127.0.0.1:7077
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=100g
gremlin.spark.persistContext=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
HDFS seems to be configured correctly as described here:
gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_178390072_1, ugi=cassandra (auth:SIMPLE)]]]
Try fixing these properties:
janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.keyspace=janusgraph
Replace with:
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
storage.cassandra.keyspace=janusgraph
The default keyspace name is janusgraph, so despite the mistakes in the property names, I don't think you would have observed this problem unless you had loaded your data using a different keyspace name.
The latter property is described in the Configuration Reference. Also, keep an eye on this open issue to improve the docs for Hadoop-Graph usage.
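Putting the fix in context, the keyspace-related part of conf/test.properties would then read (a sketch, assuming the data was loaded into the default janusgraph keyspace):
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=janusgraph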

Error on installing Titan DB on Windows

Following the official Titan DB guide here and trying to run the command:
graph = TitanFactory.open('conf/titan-cassandra-es.properties')
I got this error:
Backend shorthand unknown: conf/titan-cassandra-es.properties
Obviously, the reason is the incorrect path to the titan-cassandra-es.properties file, so I changed it to:
graph = TitanFactory.open('../conf/titan-cassandra-es.properties')
and got this error:
Encountered unregistered class ID: 141.
The error happens in the following version:
titan-0.5.4-hadoop2
On titan-1.0.0-hadoop2, instead of this error message I get this one:
Invalid import definition: 'com.thinkaurelius.titan.hadoop.MapReduceIndexManagement'; reason: startup failed: script14747941661821834264593.groovy: 1: unable to resolve class com.thinkaurelius.titan.hadoop.MapReduceIndexManagement # line 1, column 1. import com.thinkaurelius.titan.hadoop.MapReduceIndexManagement ^
1 error
And on titan-1.0.0-hadoop2 I get this one:
The input line is too long.
The syntax of the command is incorrect.
Does anyone know how to handle this issue?
It seems like you have not even managed to get Titan 1 to start up yet.
I do not believe Titan 1 supports Windows out of the box, i.e. the downloadable package will not just work on Windows.
That said, I have managed to get Titan DB 1 to work on Windows. To do so, all you have to do is install Cassandra 2.x on Windows; this guide may help you out. Start Cassandra and enable thrift connections, as shown below.
With that done, you should be able to get Titan doing basic operations on Windows. From there you may find dealing with your current errors easier.
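For reference, thrift can be enabled either in Cassandra's configuration file or at runtime (a sketch, assuming a default Cassandra 2.x install):
# in cassandra.yaml (restart required):
start_rpc: true

# or on an already-running node:
nodetool enablethrift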
Side note: Windows support for Titan 0.5.x may be more substantial, so you could look into that as well.

RethinkDB import error

I'm trying to import a CSV or JSON file into RethinkDB, but I always get the same error:
rethinkdb import -f ~/Downloads/convertcsv.json --table test.stats --format json
[ ] 0%
0 rows imported in 1 table
'indexes'
In file: /home/xxxxx/Downloads/convertcsv.json
Errors occurred during import
I don't see anything in the logs, and the same files import fine on my laptop.
The import creates the table, but that's about it.
My system:
- Ubuntu 10.10
- Python 2.7.8
- rethinkdb 1.16.0+1~0utopic (GCC 4.9.1)
I already tried reinstalling RethinkDB and sudo pip2 install --upgrade rethinkdb. Not sure what else I can do.
This appears to have been an oversight when adding export/import of secondary indexes: the import script looks for the 'indexes' field in the table info, which doesn't exist when importing a single file. This can be worked around by providing the flag --no-secondary-indexes. A fix was released in the RethinkDB Python driver version 1.16.0-2; see GitHub issue #3278 for details.
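With the workaround flag, the import command from the question becomes:
rethinkdb import -f ~/Downloads/convertcsv.json --table test.stats --format json --no-secondary-indexes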
