JanusGraph doesn't allow setting the vertex id even after setting the property `graph.set-vertex-id=true`

I'm running a JanusGraph server backed by Cassandra. It doesn't allow me to use custom vertex ids.
I see the following log message when the JanusGraph Gremlin server starts:
Local setting graph.set-vertex-id=true (Type: FIXED) is overridden by globally managed value (false). Use the ManagementSystem interface instead of the local configuration to control this setting
I even tried to set this property via the management API, still with no luck:
gremlin> mgmt = graph.openManagement()
gremlin> mgmt.set('graph.set-vertex-id', true)

As the log message already states, this config option has the mutability FIXED which means that it is a global configuration option. Global configuration is described in this section of the JanusGraph documentation.
It states that:
Global configuration options apply to all instances in a cluster.
JanusGraph stores these configuration options in its storage backend which is Cassandra in your case. This ensures that all JanusGraph instances have the same values for these configuration values. Any changes that are made to these options in a local file are ignored because of this. Instead, you have to use the management API to change them which will update them in the storage backend.
But that is already what you tried with mgmt.set(). This doesn't work in this case however because this specific config option has the mutability level FIXED. The JanusGraph documentation describes this as:
FIXED: Like GLOBAL, but the value cannot be changed once the JanusGraph cluster is initialized.
So this value really cannot be changed in an existing JanusGraph cluster. Your only option is to start with a new cluster if you really need to change it.
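For completeness, on a cluster that was initialized with `graph.set-vertex-id=true` from the very beginning, a custom id can be passed through the normal TinkerPop API. A sketch only: the properties file path and the `customId` variable are placeholders, and the supplied id has to be valid in JanusGraph's id space.

```
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
gremlin> v = graph.addVertex(T.id, customId)
```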
It is of course unfortunate that the error message suggested to use the management API even though it doesn't work in this case. I have created an issue with the JanusGraph project to improve this error message to avoid such confusion in the future: https://github.com/JanusGraph/janusgraph/issues/3206

Related

Fixing ERROR: cluster setting 'kv.rangefeed.enabled' is currently overridden by the operator in CockroachDB Serverless

I'm following a guide for setting up a changefeed on CockroachDB, but right from the start I get the error cluster setting 'kv.rangefeed.enabled' is currently overridden by the operator. How can I enable changefeeds?
In CockroachDB Serverless, it's not necessary to set kv.rangefeed.enabled; you can simply skip that part of the setup. If you're setting up a changefeed that writes to external endpoints, you may need to have a credit card on file in your Serverless account, but you can keep your spend limit set to $0 and still run changefeeds.
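With that step skipped, the changefeed itself is created with plain SQL. A sketch only: the table name and sink URI below are placeholders, not values from the guide.

```sql
-- Changefeed writing to an external sink (table and URI are placeholders)
CREATE CHANGEFEED FOR TABLE my_table
  INTO 'kafka://broker.example.com:9092';
```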

V10 Map node and database configuration by environment

I can't seem to find this in the docs.
If I have a flow with a Map node, and that node has a database insert as one of its outputs, I can configure that just fine. What I can't figure out is how to change the database target when I go from environment to environment (dev to test to production). In v7 I could switch this with a property file and the mqsibaroverride command, but in v10 I no longer see the database instance name in the output of mqsireadbar.
Anyone know what the 'new' way to do this is?
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/cm28825_.htm
I found it: you have to use a JDBCProvider. Apparently, you can no longer use ODBC for this.
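Defining the provider per environment is done with a configurable service on each node. A rough sketch of the command; the node name, object name, and property values here are all placeholders, and the full property list is on the Knowledge Center page linked above:

```
mqsicreateconfigurableservice MYNODE -c JDBCProviders -o MyJDBCProvider \
  -n databaseName,serverName,portNumber \
  -v MYDB,dbhost.example.com,50000
```

Since each environment's node carries its own JDBCProviders definition, the flow can stay unchanged while the database target differs per environment.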

How do you use s3a with spark 2.1.0 on aws us-east-2?

Background
I have been working on getting a flexible setup for myself to use spark on aws with docker swarm mode. The docker image I have been using is configured to use the latest spark, which at the time is 2.1.0 with Hadoop 2.7.3, and is available at jupyter/pyspark-notebook.
This is working, and I have been just going through to test out the various connectivity paths that I plan to use. The issue I came across is with the uncertainty around the correct way to interact with s3. I have followed the trail on how to provide the dependencies for spark to connect to data on aws s3 using the s3a protocol, vs s3n protocol.
I finally came across the hadoop aws guide and thought I was following how to provide the configuration. However, I was still receiving the 400 Bad Request error, as seen in this question that describes how to fix it by defining the endpoint, which I had already done.
I ended up being too far off the standard configuration by being on us-east-2, making me uncertain if I had a problem with the jar files. To eliminate the region issue, I set things back up on the regular us-east-1 region, and I was able to finally connect with s3a. So I have narrowed down the problem to the region, but thought I was doing everything required to operate on the other region.
Question
What is the correct way to use the configuration variables for hadoop in spark to use us-east-2?
Note: This example uses local execution mode to simplify things.
import os
import pyspark
I can see in the notebook console that these dependencies download after creating the context; adding them took me from being completely broken to getting the Bad Request error.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'
conf = pyspark.SparkConf('local[1]')
sc = pyspark.SparkContext(conf=conf)
sql = pyspark.SQLContext(sc)
For the AWS config, I tried both the method below and, using the conf above, the equivalent conf.set(spark.hadoop.fs.<config_string>, <config_value>) pattern; the difference is that this way I set the values on conf before creating the Spark context.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.access.key", access_id)
hadoop_conf.set("fs.s3a.secret.key", access_key)
One thing to note is that I also tried the alternative us-east-2 endpoint, s3-us-east-2.amazonaws.com.
I then read some parquet data off of s3.
df = sql.read.parquet('s3a://bucket-name/parquet-data-name')
df.limit(10).toPandas()
Again, after moving the EC2 instance to us-east-1 and commenting out the endpoint config, the above works for me. It seems like the endpoint config isn't being used for some reason.
us-east-2 is a V4-auth-only S3 region, so, as you attempted, the fs.s3a.endpoint value must be set.
If it's not being picked up, then assume the config you are setting isn't the one being used to access the bucket. Know that Hadoop caches filesystem instances by URI, even when the config changes. The first attempt to access a filesystem fixes the config, even when it's lacking auth details.
Some tactics:
Set the value in spark-defaults.
Using the config you've just created, try to explicitly load the filesystem via a call to FileSystem.get(new URI("s3a://bucket-name/parquet-data-name"), myConf), which returns the bucket's filesystem with that config (unless one is already cached). I don't know how to make that call from Python, though.
Set the property "fs.s3a.impl.disable.cache" to true to bypass the cache before the get.
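The tactics above can be sketched in Python. This is an illustration, not the original poster's code: the helper name is mine, and every endpoint/credential value is a placeholder. The idea is to build each S3A option as a "spark.hadoop."-prefixed key and apply it to the SparkConf before the SparkContext (and hence the first FileSystem instance) exists.

```python
def s3a_spark_options(endpoint, access_id, access_key, disable_cache=True):
    """Build the spark.hadoop.* options that configure the S3A filesystem."""
    opts = {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_id,
        "spark.hadoop.fs.s3a.secret.key": access_key,
    }
    if disable_cache:
        # Avoid reusing a FileSystem instance that was cached with stale config.
        opts["spark.hadoop.fs.s3a.impl.disable.cache"] = "true"
    return opts

# Applying it with pyspark would look like (requires pyspark on the path):
# conf = pyspark.SparkConf().setMaster("local[1]")
# for key, value in s3a_spark_options("s3.us-east-2.amazonaws.com",
#                                     access_id, access_key).items():
#     conf.set(key, value)
# sc = pyspark.SparkContext(conf=conf)
```

Setting these before the context is created means the first filesystem lookup, the one Hadoop caches, already carries the right endpoint.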
Adding more diagnostics on bad-auth errors, along with a wiki page, is a feature listed for S3A phase III. If you were to add it, along with a test, I could review it and get it in.

Disabling/Pause database replication using ML-Gradle

I want to disable the Database Replication from the replica cluster in MarkLogic 8 using ML-Gradle. After updating the configurations, I also want to re-enable it.
There are tasks for enabling and disabling flexrep in ml-gradle, but I couldn't find any such thing for Database Replication. How can this be done?
ml-gradle uses the Management API to handle configuration changes. Database Replication is controlled by sending a PUT command to /manage/v2/databases/[id-or-name]/properties. Update your ml-config/databases/content-database.json file (example that does not include that property) to include database-replication, including replication-enabled: true.
To see what that object should look like, you can send a GET request to the properties endpoint.
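As a rough sketch of what that file fragment could look like, based only on the property names mentioned above; the exact shape should be confirmed against the GET response, and the database name is a placeholder:

```json
{
  "database-name": "content-database",
  "database-replication": {
    "replication-enabled": true
  }
}
```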
You can create your own command to set replication-enabled - see https://github.com/rjrudin/ml-gradle/wiki/Writing-your-own-management-task
I'll also add a ticket for making official commands - e.g. mlEnableReplication and mlDisableReplication, with those defaulting to the content database, and allowing for any database to be specified.

How to restrict index creation/deletion on Elasticsearch cluster?

How do I authenticate/secure index creation and deletion operations in an Elasticsearch 1.0.0 cluster? I would also like to know how to disable the delete-index operation in the Elasticsearch HQ plugin. I tried the following settings in the elasticsearch.yml file, but they still allow users to perform the operations.
action.disable_delete_all_indices: true
action.auto_create_index: false
Appreciate any inputs.
Write a custom ConnectionPool class, use it instead of the default connection pools that ship with the client, and make the auth parameters mandatory.
That way, you can authenticate the user every time.
You can use Pimple, a simple PHP dependency injection container.
example:
$elasticsearch_params['connectionParams']['auth'] =
    array($collection['username'], $collection['password'], 'Basic');
