Fixing ERROR: cluster setting 'kv.rangefeed.enabled' is currently overridden by the operator in CockroachDB Serverless - cockroachdb

I'm following a guide for setting up a changefeed on CockroachDB, but right from the start I get the error cluster setting 'kv.rangefeed.enabled' is currently overridden by the operator. How can I enable changefeeds?

In CockroachDB Serverless, it's not necessary to set kv.rangefeed.enabled--you can just skip that part of the setup. If you're setting up a changefeed to write to external endpoints, you may need to have a credit card on file in your Serverless account, but you can keep your spend limit set to $0 and still run changefeeds.

Related

Janusgraph doesn't allow to set vertex Id even after setting this property `graph.set-vertex-id=true`

I'm running janusgraph server backed by Cassandra. It doesn't allow me to use custom vertex ids.
I do see the following log when the janusgraph gremlin server is starting.
Local setting graph.set-vertex-id=true (Type: FIXED) is overridden by globally managed value (false). Use the ManagementSystem interface instead of the local configuration to control this setting
Even tried to set this property via management API and still no luck.
gremlin> mgmt = graph.openManagement()
gremlin> mgmt.set('graph.set-vertex-id', true)
As the log message already states, this config option has the mutability FIXED which means that it is a global configuration option. Global configuration is described in this section of the JanusGraph documentation.
It states that:
Global configuration options apply to all instances in a cluster.
JanusGraph stores these configuration options in its storage backend which is Cassandra in your case. This ensures that all JanusGraph instances have the same values for these configuration values. Any changes that are made to these options in a local file are ignored because of this. Instead, you have to use the management API to change them which will update them in the storage backend.
But that is already what you tried with mgmt.set(). This doesn't work in this case however because this specific config option has the mutability level FIXED. The JanusGraph documentation describes this as:
FIXED: Like GLOBAL, but the value cannot be changed once the JanusGraph cluster is initialized.
So, this value can really not be changed in an existing JanusGraph cluster. Your only option is to start with a new cluster if you really need to change this value.
It is of course unfortunate that the error message suggested to use the management API even though it doesn't work in this case. I have created an issue with the JanusGraph project to improve this error message to avoid such confusion in the future: https://github.com/JanusGraph/janusgraph/issues/3206

Replica database is not accessible in Read Scale Availability Group in SQL Server 2017 Standard edition

I'm looking into ways of replicating databases from On-Premise environments to Azure and one of the options I found was setting up a Read-Scale availability group.
The reason I'm using a Read-Scale and not an Always On availability group, is because I don't won't to use SQL Server Enterprise edition due to the cost.
I followed a tutorial from Microsoft (MS TUTORIAL) to set this all up and in the end, I think I got it working as my database appeared on the Azure environment.
However, the problem is that my replica always stays in the Synchronizing state - which is probably due to the fact I chose Asynchronous Replication by using the AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT command - but even worse is that I can't access the database itself.
Each time I try to fire a query against it, it comes back with an Object is not accessible exception.
After some reading, I found that the cause of this might be because my replica doesn't have a secondary role. Trying to set this via the ... SECONDARY_ROLE({ALLOW_CONNECTIONS = ALL})... command, clearly states that this feature is not available in the Standard version of SQL Server.
My whole confusion comes from the fact that on the Microsoft documentation (MS DOCS), it says that With availability groups, one or more secondary replicas can be configured to support read-only access to secondary databases. which is exactly what I'm not succeeding in.
Did anybody have the same issue, or knows how to configure the Read-Scale availability group on SQL Server Standard so my second replica is accessible and readable as well?
P.S. I did look at the actual SQL Replication with Transaction Replication, but there are quite a bit of moving parts there, so I'm exploring all options before making a decision.
Based on a twitter conversation I found out that you will need to create a snapshot of the database in Secondary Replica in order to read from there.
Please read this tweeter thread.
I also added a suggestion in the feedback channel to fix the documentation.

AWS RDS database can't read record that was just written to database

I'm seeing an error with some Laravel code that uses an AWS RDS database. The code writes a record to the database and then immediately does a search to load that record using the primary key and gets no results.
If I try it manually afterwards I find the record. If I insert a 1-second sleep in the code it works correctly.
I've tried this using Laravel's separate settings for read and write hosts. I've also tried setting them to the same host and only using one host. The result is always the same. However other environments with the same configuration do not have the error.
Is there an option in RDS that needs to be changed to have the record available immediately after it's written.
The error is due to the mySQL master-slave replication lag.
A common mistake is to use a mySQL cluster and then perform a read
immediately after a write.
Since the read occurs on one of the slave/read hosts and the write occurs on the master, the data would not be replicated at the time of the read.
There are a couple of ways to rectify the error:
The read immediately after must be performed on the master (not the slave). Even though you've mentioned that you changed it to a single host, often people make a mistake while switching the connection. Refer this SO post to properly switch connections in Laravel
An easier way may be to use the sticky database option in Laravel. Beware: this may cause performance issues if not used carefully for only the use case you desire. From the docs:
The sticky option is an optional value that can be used to allow the
immediate reading of records that have been written to the database
during the current request cycle.
If the sticky option is enabled and a "write" operation has been
performed against the database during the current request cycle, any
further "read" operations will use the "write" connection.
The most "non-obvious" way is to NOT perform a read immediately after a write. Think about whether this can be avoided depending on your use case.
Other methods: refer this SO post

How do you use s3a with spark 2.1.0 on aws us-east-2?

Background
I have been working on getting a flexible setup for myself to use spark on aws with docker swarm mode. The docker image I have been using is configured to use the latest spark, which at the time is 2.1.0 with Hadoop 2.7.3, and is available at jupyter/pyspark-notebook.
This is working, and I have been just going through to test out the various connectivity paths that I plan to use. The issue I came across is with the uncertainty around the correct way to interact with s3. I have followed the trail on how to provide the dependencies for spark to connect to data on aws s3 using the s3a protocol, vs s3n protocol.
I finally came across the hadoop aws guide and thought I was following how to provide the configuration. However, I was still receiving the 400 Bad Request error, as seen in this question that describes how to fix it by defining the endpoint, which I had already done.
I ended up being too far off the standard configuration by being on us-east-2, making me uncertain if I had a problem with the jar files. To eliminate the region issue, I set things back up on the regular us-east-1 region, and I was able to finally connect with s3a. So I have narrowed down the problem to the region, but thought I was doing everything required to operate on the other region.
Question
What is the correct way to use the configuration variables for hadoop in spark to use us-east-2?
Note: This example uses local execution mode to simplify things.
import os
import pyspark
I can see in the console for the notebook these download after creating the context, and adding these took me from being completely broken, to getting the Bad Request error.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'
conf = pyspark.SparkConf('local[1]')
sc = pyspark.SparkContext(conf=conf)
sql = pyspark.SQLContext(sc)
For the aws config, I tried both the below method and by just using the above conf, and doing conf.set(spark.hadoop.fs.<config_string>, <config_value>) pattern equivalent to what I do below, except doing it this was I set the values on conf before creating the spark context.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.access.key", access_id)
hadoop_conf.set("fs.s3a.secret.key", access_key)
One thing to note, is that I also tried an alternative endpoint for us-east-2 of s3-us-east-2.amazonaws.com.
I then read some parquet data off of s3.
df = sql.read.parquet('s3a://bucket-name/parquet-data-name')
df.limit(10).toPandas()
Again, after moving the EC2 instance to us-east-1, and commenting out the endpoint config, the above works for me. To me, it seems like endpoint config isn't being used for some reason.
us-east-2 is a V4 auth S3 instance so, as you attemped, the fs.s3a.endpoint value must be set.
if it's not being picked up then assume the config you are setting isn't the one being used to access the bucket. Know that Hadoop caches filesystem instances by URI, even when the config changes. The first attempt to access a filesystem fixes, the config, even when its lacking in auth details.
Some tactics
set the value is spark-defaults
using the config you've just created, try to explicitly load the filesystem via a call to Filesystem.get(new URI("s3a://bucket-name/parquet-data-name", myConf) will return the bucket with that config (unless it's already there). I don't know how to make that call in .py though.
set the property "fs.s3a.impl.disable.cache" to true to bypass the cache before the get command
Adding more more diagnostics on BadAuth errors, along with a wiki page, is a feature listed for S3A phase III. If you were to add it, along with a test, I can review it and get it in

Disabling/Pause database replication using ML-Gradle

I want to disable the Database Replication from the replica cluster in MarkLogic 8 using ML-Gradle. After updating the configurations, I also want to re-enable it.
There are tasks for enabling and disabling flexrep in ML Gradle. But I couldn't found any such thing for Database Replication. How can this be done?
ml-gradle uses the Management API to handle configuration changes. Database Replication is controlled by sending a PUT command to /manage/v2/databases/[id-or-name]/properties. Update your ml-config/databases/content-database.json file (example that does not include that property) to include database-replication, including replication-enabled: true.
To see what that object should look like, you can send a GET request to the properties endpoint.
You can create your own command to set replication-enabled - see https://github.com/rjrudin/ml-gradle/wiki/Writing-your-own-management-task
I'll also add a ticket for making official commands - e.g. mlEnableReplication and mlDisableReplication, with those defaulting to the content database, and allowing for any database to be specified.

Resources