Connect JanusGraph management API to remote JanusGraph Server - janusgraph

I have a running instance of JanusGraph Server and I can connect using EmptyGraph for reads and writes. But I cannot use EmptyGraph to create an instance of JanusGraphManagement. I want to use the API to define my schema, but the only options I can find are:
use Cluster to create a Client to submit command strings
connect an embedded JanusGraph instance directly to my Cassandra backend
I'd prefer to do everything through the JanusGraph Server. Is there a way to do this using the Java API, or am I stuck with only the two options above?

As described in the updated JanusGraph docs here:
The described connection uses GraphBinary and the janusgraph-driver, which doesn't allow accessing internal JanusGraph components such as the ManagementSystem. To access the ManagementSystem, you have to submit scripts (see Submitting Scripts) or access JanusGraph directly by opening a JanusGraph instance locally.
So for now, no other options exist.
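For the script-submission route the docs point to, here is a minimal sketch using the TinkerPop driver, assuming a JanusGraph Server on localhost:8182 with the default Groovy script engine and the default graph binding; the host, port, and the 'person' label are placeholders:

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class RemoteSchemaExample {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port -- point these at your JanusGraph Server.
        Cluster cluster = Cluster.build("localhost").port(8182).create();
        Client client = cluster.connect();
        try {
            // The schema is defined inside a Groovy script that runs server-side,
            // where the ManagementSystem is available via graph.openManagement().
            String script =
                "mgmt = graph.openManagement();\n" +
                "if (mgmt.getVertexLabel('person') == null) {\n" +
                "  mgmt.makeVertexLabel('person').make();\n" +
                "}\n" +
                "mgmt.commit();\n" +
                "'schema updated'";
            String result = client.submit(script).one().getString();
            System.out.println(result);
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```

The client only sends a string; all ManagementSystem work happens inside the server's script engine, which is why this works over the wire while the janusgraph-driver connection alone does not.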

Related

Dataflow PubSub to Elasticsearch Template proxy

We need to create a Dataflow job that ingests from PubSub to Elasticsearch, but the job cannot make outbound internet connections to reach Elastic Cloud.
Is there a way to pass proxy parameters to the Dataflow vm on creation time?
I found this article, but the proxy parameters are part of a Maven app and I'm not sure how to use them here.
https://leifengblog.net/blog/run-dataflow-jobs-in-a-shared-vpc-on-gcp/
Thanks
To reach an external endpoint you'll need to configure internet access and firewall settings. Depending on your use case, your VMs may also need access to other resources; you can check in this document which method you'll need to configure for Dataflow. Before selecting a method, please check the document on how to specify a network or a subnetwork.
In GCP, you can enable Private Google Access on a subnetwork, and the VMs in that subnetwork will be able to reach all the GCP endpoints (Dataflow, BigQuery, etc.) even if they have private IPs only. There is no need to set up a proxy. See this document.
For instance, for Java pipelines, I normally use private IPs only for the Dataflow workers, and they are able to reach Pubsub, BigQuery, Bigtable, etc.
For Python pipelines, if you have external dependencies, the workers will need to reach PyPI, and for that you need internet connectivity. If you want to use private IPs in Python pipelines, you can ship those external dependencies in a custom container, so the workers don't need to download them.
You can use Maven right after you write your pipeline: you must create and stage your template file (with mvn), and you can follow this example.
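For the Java case described above, the private-IP setup is mostly a matter of pipeline options. A minimal sketch, assuming a Beam/Dataflow Java pipeline and placeholder project, region, and subnetwork names, with Private Google Access enabled on that subnetwork:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class PrivateWorkersExample {
    public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("my-project");   // placeholder project id
        options.setRegion("us-central1");   // placeholder region
        // Workers get private IPs only; they still reach Pub/Sub, BigQuery, etc.
        // through Private Google Access on the subnetwork below.
        options.setUsePublicIps(false);
        options.setSubnetwork("regions/us-central1/subnetworks/my-subnet"); // placeholder

        Pipeline pipeline = Pipeline.create(options);
        // ... add the PubsubIO read and Elasticsearch write transforms here ...
        pipeline.run();
    }
}
```

The same settings can be passed on the command line (--usePublicIps=false, --subnetwork=...) when launching a prebuilt template, so nothing in the pipeline code itself has to change.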

Flush Redis AWS ElastiCache with Laravel

I am using Laravel 5.5 in my application hosted on AWS; for caching I'm using Redis on ElastiCache. After some research I was able to configure it (using a cluster), and it works fine; however, Laravel is unable to flush in redis-cluster:
Cannot use 'FLUSHDB' with redis-cluster
After some digging I learned there is a bug in Laravel that does not allow flush in redis-cluster. I'm wondering: is there a way to use Redis in ElastiCache in a "non-cluster" way?
When I created the Redis instance I did not select "Cluster Mode enabled", but apparently it was still created as a cluster.
If you don't want to use clusters, configure your config/database.php file such that there's no clusters key in the redis connection.
Check out the docs to learn how: https://laravel.com/docs/5.5/redis#configuration

Load-balancing settings via Spring AWS libraries for multiple RDS Read Only Replicas

If there are multiple read replicas, where can load-balancing-related settings be specified when using the Spring AWS libraries?
Read replicas have their own endpoint address, similar to the original RDS instance. Your application will need to take care of using all the replicas and switching between them. You'd need to introduce this algorithm into your application so it automatically detects which RDS instance it should connect to in turn (see the sketch after the link below). The following links can help:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Replication.html#Overview.ReadReplica
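Spring itself gives you a convenient hook for the "switch between them in turn" part: a routing DataSource. A minimal sketch of the round-robin idea, assuming two hypothetical replica endpoints and MySQL-style JDBC URLs; it implements only the in-application rotation the answer describes, not failover handling:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class ReadReplicaRoutingDataSource extends AbstractRoutingDataSource {

    private final AtomicInteger counter = new AtomicInteger();
    private final int replicaCount;

    public ReadReplicaRoutingDataSource(Map<Object, Object> replicas) {
        this.replicaCount = replicas.size();
        setTargetDataSources(replicas);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        // Simple round-robin over the replica keys 0..n-1.
        return Math.floorMod(counter.getAndIncrement(), replicaCount);
    }

    // Example wiring with two hypothetical replica endpoints.
    public static DataSource build() {
        Map<Object, Object> replicas = new HashMap<>();
        replicas.put(0, new DriverManagerDataSource(
            "jdbc:mysql://my-db-replica-1.example.us-east-1.rds.amazonaws.com:3306/mydb"));
        replicas.put(1, new DriverManagerDataSource(
            "jdbc:mysql://my-db-replica-2.example.us-east-1.rds.amazonaws.com:3306/mydb"));
        ReadReplicaRoutingDataSource ds = new ReadReplicaRoutingDataSource(replicas);
        ds.afterPropertiesSet(); // resolves the target DataSources
        return ds;
    }
}
```

You would expose the routing DataSource as the bean used by your read-only repositories, keeping a separate DataSource pointed at the primary instance for writes.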

Accessing Hadoop data using REST service

I am trying to update the HDP architecture so that data residing in Hive tables can be accessed by REST APIs. What are the best approaches to expose data from HDP to other services?
This is my initial idea:
I am storing data in Hive tables and I want to expose some of the information through a REST API, so I thought that using HCatalog/WebHCat would be the best solution. However, I found out that it only allows querying metadata.
What are the options that I have here?
Thank you
You can very well use WebHDFS, which is basically a REST service over HDFS.
Please see documentation below:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
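A minimal sketch of what a WebHDFS call looks like from plain Java, assuming a NameNode reachable at namenode:50070 and a hypothetical file path under the Hive warehouse directory:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsReadExample {
    public static void main(String[] args) throws Exception {
        // op=OPEN reads a file; the NameNode host/port and the path are placeholders.
        URL url = new URL(
            "http://namenode:50070/webhdfs/v1/user/hive/warehouse/mytable/part-00000"
            + "?op=OPEN&user.name=hdfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(true); // the NameNode redirects to a DataNode
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Note that WebHDFS exposes the raw files under the table's HDFS location, not SQL-level results, which is why the HiveServer2 suggestion further down may be a better fit for table data.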
The REST API gateway for the Apache Hadoop ecosystem is called Knox.
I would check it out before exploring any other options. In other words, do you have any reason to avoid using Knox?
What version of HDP are you running?
The Knox component has been available for quite a while and is manageable via Ambari.
Can you get an instance of HiveServer2 running in HTTP mode?
This would give you SQL access through JDBC/ODBC drivers without requiring Hadoop config and binaries (other than those required for the drivers) on the client machines.
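A minimal sketch of the HiveServer2-in-HTTP-mode route, assuming the hive-jdbc driver is on the classpath and using placeholder host, port (commonly 10001), and httpPath values:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveHttpJdbcExample {
    public static void main(String[] args) throws Exception {
        // transportMode=http tells the driver to talk to HiveServer2 over HTTP;
        // host, port, and httpPath are placeholders for your cluster's settings.
        String url = "jdbc:hive2://hiveserver-host:10001/default;"
                   + "transportMode=http;httpPath=cliservice";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Because the transport is plain HTTP, this also plays well with a Knox gateway in front of HiveServer2 if you go that route.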

Connect Hadoop cluster to multiple Google Cloud Storage buckets in multiple Google Projects

Is it possible to connect my Hadoop cluster to multiple Google Cloud projects at once?
I can easily use any Google Cloud Storage bucket in a single Google project via the Google Cloud Storage Connector, as explained in this thread: Migrating 50TB data from local Hadoop cluster to Google Cloud Storage. But I can't find any documentation or example of how to connect to two or more Google Cloud projects from a single map-reduce job. Do you have any suggestion/trick?
Thanks a lot.
Indeed, it is possible to connect your cluster to buckets from multiple different projects at once. Ultimately, if you're using the instructions for a service-account keyfile, the GCS requests are performed on behalf of that service account, which can be treated more or less like any other user. You can either add the service account email your-service-account-email@developer.gserviceaccount.com to all the different cloud projects owning buckets you want to process (using the permissions section of cloud.google.com/console and simply adding that email address like any other member), or you can set GCS-level access to add that service account like any other user.
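The connector-side configuration stays the same no matter how many projects own the buckets; the cross-project part is handled purely by the IAM grants described above. A minimal sketch, assuming the classic GCS connector property names (verify them against the connector version you run) and a placeholder project id and keyfile path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MultiProjectGcsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder project id and keyfile path; property names follow the
        // classic GCS connector and may differ in newer releases.
        conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        conf.set("fs.gs.project.id", "project-a");
        conf.set("google.cloud.auth.service.account.enable", "true");
        conf.set("google.cloud.auth.service.account.json.keyfile", "/etc/hadoop/conf/gcs-key.json");

        // Buckets are addressed only by name, so as long as the service account
        // has been granted access in each owning project, both of these resolve
        // from the same job:
        FileSystem fsA = new Path("gs://bucket-in-project-a/").getFileSystem(conf);
        FileSystem fsB = new Path("gs://bucket-in-project-b/").getFileSystem(conf);
        System.out.println(fsA.exists(new Path("gs://bucket-in-project-a/data/")));
        System.out.println(fsB.exists(new Path("gs://bucket-in-project-b/data/")));
    }
}
```

In a map-reduce job you would simply reference gs:// paths from both projects as inputs and outputs; the connector does not care which project owns which bucket.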
